You're reading from Hands-On Reinforcement Learning with Python

Product type: Book
Published in: Jun 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788836524
Edition: 1st

Author: Sudharsan Ravichandiran

Sudharsan Ravichandiran is a data scientist and artificial intelligence enthusiast. He holds a Bachelor's degree in Information Technology from Anna University. His research focuses on practical implementations of deep learning and reinforcement learning, including natural language processing and computer vision. He is an open-source contributor and loves answering questions on Stack Overflow.

The Asynchronous Advantage Actor Critic Network

In the previous chapters, we saw how cool the Deep Q Network (DQN) is and how it succeeded in generalizing its learning to play a series of Atari games with human-level performance. The problem we faced, however, is that it requires a large amount of computation power and training time. So, Google's DeepMind introduced a new algorithm called the Asynchronous Advantage Actor Critic (A3C) algorithm, which dominates other deep reinforcement learning algorithms because it requires less computation power and training time. The main idea behind A3C is that it uses several agents learning in parallel and aggregates their overall experience. In this chapter, we will see how A3C networks work. Following this, we will learn how to build an agent to drive up a mountain using A3C.

In this chapter, you will learn the following:

  • The...

The Asynchronous Advantage Actor Critic

The A3C network arrived like a storm and took over from the DQN. Aside from the advantages stated previously, it also yields good accuracy compared to other algorithms, and it works well in both continuous and discrete action spaces. It uses several agents, each of which learns in parallel with a different exploration policy in its own copy of the actual environment. The experience obtained by these agents is then aggregated into a global agent. The global agent is also called the master network or global network, and the other agents are called the workers. Now, we will see in detail how A3C works and how it differs from the DQN algorithm.
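To make the master/worker relationship concrete, here is a minimal, illustrative sketch (not the book's implementation, and not DeepMind's) of several workers pushing updates to a shared global network asynchronously. The GlobalNetwork and Worker classes, the parameter size, and the random stand-in gradients are all assumptions made for illustration; in real A3C the gradients come from the actor and critic losses computed on each worker's own rollouts, and the updates are usually applied lock-free.

import threading

import numpy as np

class GlobalNetwork:
    """Master (global) network: holds the shared parameters that every worker updates."""
    def __init__(self, n_params):
        self.params = np.zeros(n_params)
        self.lock = threading.Lock()   # simplification; real A3C often updates lock-free

    def get_params(self):
        with self.lock:
            return self.params.copy()

    def apply_gradients(self, grads, lr=0.01):
        # Workers push their gradients here asynchronously.
        with self.lock:
            self.params -= lr * grads

class Worker(threading.Thread):
    """Each worker explores its own copy of the environment with its own
    exploration policy and periodically syncs with the global network."""
    def __init__(self, name, global_net, n_updates=100):
        super().__init__(name=name)
        self.global_net = global_net
        self.n_updates = n_updates

    def run(self):
        for _ in range(self.n_updates):
            local_params = self.global_net.get_params()    # pull the latest weights
            # Stand-in for the real A3C gradients (actor loss + critic loss).
            grads = np.random.randn(*local_params.shape) * 0.1
            self.global_net.apply_gradients(grads)         # push the update to the master

global_net = GlobalNetwork(n_params=8)
workers = [Worker('worker_%d' % i, global_net) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print('final shared parameters:', global_net.get_params())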

The three As

Before diving in, what does A3C mean? What do the...

Driving up a mountain with A3C

Let's understand A3C with a mountain car example. Our agent is the car, and it is placed between two mountains. The goal of our agent is to drive up the mountain on the right. However, the car can't drive up the mountain in one pass; it has to drive back and forth to build up momentum. A higher reward is assigned if our agent spends less energy on driving up. Credit for the code used in this section goes to Stefan Boschenriedter (https://github.com/stefanbo92/A3C-Continuous). The environment is shown as follows:
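To get a feel for the task before looking at the full A3C code, the following is a minimal sketch that loads a continuous mountain car environment from OpenAI Gym and steps it a few times. It assumes the standard MountainCarContinuous-v0 environment and the classic Gym API (reset returns the state, step returns a 4-tuple); it is not the notebook's code, and the environment name is an assumption rather than something stated in this excerpt.

import gym

# Assumed environment: the standard continuous mountain car task from OpenAI Gym.
env = gym.make('MountainCarContinuous-v0')

state = env.reset()
print('state (position, velocity):', state)
print('action space (engine force):', env.action_space)

# Push right with full force for a few steps. A single push is not enough
# to climb the hill, which is why the agent has to swing back and forth
# to build up momentum.
for _ in range(5):
    state, reward, done, info = env.step([1.0])
    print(state, reward, done)

env.close()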

Okay, let's get to the coding! The complete code is available as a Jupyter notebook, with an explanation, here (https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/10.%20Aysnchronous%20Advantage%20Actor%20Critic%20Network/10.5%20Drive%20up%20the%20Mountain%20Using%20A3C.ipynb...
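As a rough orientation before opening the notebook, and not as its exact code, the following sketch shows the kind of actor-critic network typically used for this continuous-action task, assuming TensorFlow 1.x (the version the book targets): the actor outputs the mean and standard deviation of a Gaussian policy over the engine force, and the critic outputs the state value. The layer sizes, names, and action bound are illustrative assumptions.

import tensorflow as tf
import numpy as np

# Illustrative dimensions for the mountain car task: a 2-dimensional state
# (position, velocity) and a 1-dimensional continuous action (engine force)
# bounded between -1 and +1.
state_dim, action_dim, action_bound = 2, 1, 1.0

state = tf.placeholder(tf.float32, [None, state_dim], name='state')

# Actor: outputs the mean and standard deviation of a Gaussian policy.
actor_hidden = tf.layers.dense(state, 200, tf.nn.relu6, name='actor_hidden')
mu = tf.layers.dense(actor_hidden, action_dim, tf.nn.tanh, name='mu') * action_bound
sigma = tf.layers.dense(actor_hidden, action_dim, tf.nn.softplus, name='sigma') + 1e-4
policy = tf.distributions.Normal(mu, sigma)

# Critic: outputs the state value V(s), which is used to compute the advantage.
critic_hidden = tf.layers.dense(state, 100, tf.nn.relu6, name='critic_hidden')
value = tf.layers.dense(critic_hidden, 1, name='value')

# Sample an action from the Gaussian policy and clip it to the valid range.
action = tf.clip_by_value(tf.squeeze(policy.sample(1), axis=0),
                          -action_bound, action_bound)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    s = np.array([[-0.5, 0.0]])  # car near the bottom of the valley, at rest
    print(sess.run([action, value], feed_dict={state: s}))

In the full A3C implementation, each worker owns a copy of a network like this, computes the actor and critic losses (with an entropy bonus to encourage exploration) on its own rollouts, and pushes the resulting gradients to the global network.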

Summary

In this chapter, we learned how the A3C network works. In A3C, Asynchronous implies multiple agents working independently by interacting with their own copies of the environment; Advantage refers to the advantage function, which is the difference between the Q function and the value function; and Actor Critic refers to the actor critic architecture, where the actor network is responsible for generating a policy and the critic network evaluates the policy generated by the actor network. We saw how A3C works and how to solve the mountain car problem using this algorithm.
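In symbols, the advantage of taking action a in state s is:

    A(s, a) = Q(s, a) - V(s)

that is, how much better (or worse) a particular action is than the policy's average behaviour in that state.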

In the next chapter, Chapter 11, Policy Gradients and Optimization, we will see policy gradient methods that directly optimize the policy without requiring the Q function.

Questions

The question list is as follows:

  1. What is A3C?
  2. What do the three As signify?
  3. Name one advantage of A3C over DQN.
  4. What is the difference between global and worker nodes?
  5. Why do we add entropy to our loss function?
  6. Explain the workings of A3C.

Further reading
