Recent Advancements and Next Steps

Congratulations! You have made it to the final chapter. We have come a long way! We started off with the very basics of RL, such as MDPs, Monte Carlo methods, and TD learning, and moved on to advanced deep reinforcement learning algorithms such as DQN, DRQN, and A3C. We have also learned about interesting state-of-the-art policy gradient methods such as DDPG, PPO, and TRPO, and we built a car-racing agent as our final project. But RL still has a lot more for us to explore, with new advancements arriving every day. In this chapter, we will learn about some of these advancements, followed by hierarchical and inverse RL.

In this chapter, you will learn the following:

  • Imagination-augmented agents (I2A)
  • Learning from human preference
  • Deep Q learning from demonstrations
  • Hindsight experience replay
  • Hierarchical reinforcement learning
  • Inverse...

Imagination-augmented agents

Are you a fan of the game of chess? If I asked you to play chess, how would you play the game? Before moving any piece on the chessboard, you might imagine the consequences of each possible move and make the move you think would help you win. So, basically, before taking an action, you imagine its consequences and, if they are favorable, you proceed with that action; otherwise, you refrain from performing it.

Similarly, imagination-augmented agents are equipped with imagination: before taking any action in an environment, they imagine the consequences of taking it and, if they think the action will provide a good reward, they perform it. They likewise imagine the consequences of taking different actions. Augmenting agents with imagination is the next big step towards general artificial intelligence.
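
To make this concrete, the following is a minimal sketch of the imagination idea, assuming a learned environment model whose predict method returns a predicted next state and reward. The env_model, policy, and actions objects are all hypothetical; the actual I2A architecture goes further, encoding imagined rollouts with a neural network and feeding them into the policy alongside a model-free path:

import numpy as np

def imagine_rollout(env_model, state, action, policy, depth=3):
    # Imagine the return of taking `action` now and then following
    # `policy` for `depth` more imagined steps, using the learned model.
    next_state, reward = env_model.predict(state, action)
    total_reward = reward
    for _ in range(depth):
        action = policy(next_state)
        next_state, reward = env_model.predict(next_state, action)
        total_reward += reward
    return total_reward

def act_with_imagination(env_model, state, actions, policy):
    # Evaluate each candidate action by its imagined return and perform
    # the one that looks most favorable; unfavorable actions are never
    # executed in the real environment.
    imagined = [imagine_rollout(env_model, state, a, policy) for a in actions]
    return actions[int(np.argmax(imagined))]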

Now we will see how imagination...

Learning from human preference

Learning from human preference is a major breakthrough in RL. The algorithm was proposed by researchers at OpenAI and DeepMind. The idea behind the algorithm is to make the agent learn according to human feedback. Initially, the agent acts randomly, and then two video clips of the agent performing an action are given to a human. The human inspects the video clips and tells the agent which clip is better, that is, in which video the agent is performing the task better and will get closer to achieving the goal. Once this feedback is given, the agent tries to perform the actions preferred by the human and sets its reward accordingly. Designing reward functions is one of the major challenges in RL, so having the human interact directly with the agent helps us to overcome this challenge and also helps us to minimize the writing of complex goal functions...
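
As a rough illustration, here is a minimal sketch of fitting a reward model from such pairwise preferences, in the spirit of the OpenAI and DeepMind work: the probability that the human prefers one clip is modelled as a softmax over the summed predicted rewards of the two clips. The network architecture and the feature sizes obs_dim and act_dim are arbitrary assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes for the per-timestep (state, action) features.
obs_dim, act_dim = 8, 2

# A small reward model r(s, a); the architecture is an assumption.
reward_model = nn.Sequential(
    nn.Linear(obs_dim + act_dim, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

def preference_loss(clip_a, clip_b, human_prefers_a):
    # clip_a, clip_b: tensors of shape (timesteps, obs_dim + act_dim).
    r_a = reward_model(clip_a).sum()
    r_b = reward_model(clip_b).sum()
    logits = torch.stack([r_a, r_b]).unsqueeze(0)  # shape (1, 2)
    target = torch.tensor([0 if human_prefers_a else 1])
    return F.cross_entropy(logits, target)

Minimizing this loss over many human judgments yields a reward function that the agent can then be trained on with any standard RL algorithm.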

Deep Q learning from demonstrations

We have learned a lot about DQN. We started off with vanilla DQN and then saw various improvements, such as double DQN, the dueling network architecture, and prioritized experience replay. We also learned to build a DQN to play Atari games. We stored the agent's interactions with the environment in the experience buffer and made the agent learn from those experiences. But the problem was that it took a lot of training time to improve performance. That is fine for learning in simulated environments, but when we make our agent learn in a real-world environment, it causes a lot of problems. To overcome this, researchers from Google's DeepMind introduced an improvement on DQN called deep Q learning from demonstrations (DQfD).

If we already have some demonstration data, then we can directly add those demonstrations to the experience replay...
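
For instance, here is a minimal sketch of the pre-loading idea; the DemoReplayBuffer class and its parameters are illustrative rather than the paper's implementation:

import random
from collections import deque

class DemoReplayBuffer:
    # Demonstration transitions are kept permanently; only the agent's
    # own experience is evicted when the buffer is full.
    def __init__(self, demo_transitions, capacity=100000):
        self.demo = list(demo_transitions)
        self.agent = deque(maxlen=capacity)

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size, demo_fraction=0.25):
        # Mix demonstration and self-generated experience. Full DQfD
        # additionally uses prioritized sampling and extra losses (a
        # large-margin supervised loss and n-step returns) on the
        # demonstration transitions.
        n_demo = min(int(batch_size * demo_fraction), len(self.demo))
        batch = random.sample(self.demo, n_demo)
        n_agent = min(batch_size - n_demo, len(self.agent))
        batch += random.sample(list(self.agent), n_agent)
        return batch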

Hindsight experience replay

We have seen how experience replay is used in DQN to avoid correlated experiences. We also learned that prioritized experience replay improves on vanilla experience replay by prioritizing each experience with its TD error. Now we will look at a new technique called hindsight experience replay (HER), proposed by OpenAI researchers for dealing with sparse rewards. Do you remember how you learned to ride a bike? On your first try, you wouldn't have balanced the bike properly. You would have failed several times before balancing correctly. But all those failures don't mean you didn't learn anything. The failures would have taught you how not to balance a bike. Even though you did not learn to ride the bike (the goal), you learned a different goal: you learned how not to balance a bike. This is how we humans learn, right? We...
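
To make the idea concrete, here is a minimal sketch of hindsight relabelling, assuming each transition carries an explicit goal; achieved_goal and reward_fn are hypothetical helpers that extract what a state accomplishes and score it against a goal:

from collections import namedtuple

Transition = namedtuple('Transition',
                        ['state', 'action', 'reward', 'next_state', 'goal'])

def her_relabel(episode, achieved_goal, reward_fn):
    # Store an extra copy of each transition in which the desired goal
    # is replaced by the goal actually achieved at the end of the
    # episode: a failure for the original goal becomes a success for
    # the substituted one, so the agent still receives a learning
    # signal under sparse rewards.
    final_goal = achieved_goal(episode[-1].next_state)
    relabelled = []
    for t in episode:
        new_reward = reward_fn(t.next_state, final_goal)
        relabelled.append(t._replace(goal=final_goal, reward=new_reward))
    return relabelled

Both the original and the relabelled transitions are then added to the replay buffer.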

Hierarchical reinforcement learning

The problem with RL is that it does not scale well to large state and action spaces, which ultimately leads to the curse of dimensionality. Hierarchical reinforcement learning (HRL) is proposed to solve the curse of dimensionality: we decompose large problems into small subproblems in a hierarchy. Let's say the agent's goal is to reach its home from school. Here the problem is split into a set of subgoals, such as going out of the school gate, booking a cab, and so on.

There are different methods used in HRL, such as state-space decomposition, state abstraction, and temporal abstraction. In state-space decomposition, we decompose the state space into different subspaces and try to solve the problem in these smaller subspaces. Breaking down the state space also allows faster exploration, as the agent does not need to explore...
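
As a rough sketch of the temporal-abstraction idea, the loop below assumes a hypothetical high-level meta_policy that picks subgoals, a low-level sub_policy that issues primitive actions to achieve them, and a reached helper; the environment follows the Gym step interface:

def run_hierarchical_episode(env, meta_policy, sub_policy, reached,
                             max_subgoal_steps=50):
    state = env.reset()
    done = False
    while not done:
        # The high-level policy picks a subgoal (for example, "reach
        # the school gate"); the low-level policy then works on it
        # until it is reached or a step budget runs out.
        subgoal = meta_policy(state)
        for _ in range(max_subgoal_steps):
            action = sub_policy(state, subgoal)
            state, reward, done, info = env.step(action)
            if done or reached(state, subgoal):
                break
    return state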

Inverse reinforcement learning

So, what did we do in RL? We tried to find the optimal policy given the reward function. Inverse reinforcement learning is just the inverse of reinforcement learning: the optimal policy is given and we need to find the reward function. But why is inverse reinforcement learning helpful? Designing a reward function is not a simple task, and a poor reward function will lead to bad behavior by the agent. We do not always know the proper reward function, but we do know the right policy, that is, the right action in each state. So this optimal policy is fed to the agent by human experts, and the agent tries to learn the reward function. For example, consider an agent learning to walk in a real-world environment; it is difficult to design the reward function for all the actions it will perform. Instead, we can feed the agent demonstrations (optimal...
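
As a simple illustration, many IRL methods assume the reward is linear in state features, R(s) = w · φ(s), and adjust w so that the agent's average feature counts move toward the expert's. The sketch below shows one such feature-matching update step; phi and the trajectory lists are assumptions:

import numpy as np

def irl_weight_update(w, expert_trajs, agent_trajs, phi, lr=0.01):
    # Raise the reward on features the expert visits more often than
    # the current agent, and lower it on the opposite, so the learned
    # reward comes to explain the expert's behavior.
    expert_feats = np.mean([phi(s) for traj in expert_trajs for s in traj],
                           axis=0)
    agent_feats = np.mean([phi(s) for traj in agent_trajs for s in traj],
                          axis=0)
    return w + lr * (expert_feats - agent_feats)

After each update, the agent is retrained under the new reward and fresh agent trajectories are collected.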

Summary

In this chapter, we have learned about several recent advancements in RL. We saw how the I2A architecture uses the imagination core for forward planning, and then how agents can be trained according to human preference. We also learned about DQfD, which boosts performance and reduces the training time of DQN by learning from demonstrations. Then we looked at hindsight experience replay, where we learned how agents learn from failures.

Next, we learned about hierarchical RL, where the goal is decomposed into a hierarchy of subgoals. We learned about inverse RL, where the agent tries to learn the reward function given the policy. RL is evolving every day with interesting advancements; now that you understand various reinforcement learning algorithms, you can build agents to perform various tasks and contribute to RL research.

...

Questions

The question list is as follows:

  1. What is imagination in an agent?
  2. What is the imagination core?
  3. How do the agents learn from human preference?
  4. How is DQfD different from DQN?
  5. What is hindsight experience replay?
  6. What is the need for hierarchical reinforcement learning?
  7. How does inverse reinforcement learning differ from reinforcement learning?

Further reading
