Double DQN, Dueling Architectures, and Rainbow

We discussed the Deep Q-Network (DQN) algorithm in the previous chapter, coded it in Python and TensorFlow, and trained it to play Atari Breakout. In DQN, the same Q-network is used both to select and to evaluate an action. Unfortunately, this is known to overestimate the Q values, which results in over-optimistic value estimates. To mitigate this, DeepMind released another paper that proposed decoupling action selection from action evaluation. This is the crux of the Double DQN (DDQN) architecture, which we will investigate in this chapter.

Later still, DeepMind released another paper that proposed a Q-network architecture with two output streams: one representing the state value, V(s), and the other the advantage of taking a given action in that state, A(s,a). DeepMind then combined these two to compute the Q...

Technical requirements

To successfully complete this chapter, knowledge of the following will help significantly:

  • Python (2 or 3)
  • NumPy
  • Matplotlib
  • TensorFlow (version 1.4 or higher)
  • Dopamine (we will discuss this in more detail later)

Understanding Double DQN

DDQN is an extension of DQN in which action selection and action evaluation are decoupled in the Bellman update. Specifically, in DDQN, we evaluate the target network's Q function at the action that greedily maximizes the primary network's Q function. First, we will look at the vanilla DQN target for the Bellman update step, and then extend it to the DDQN target; this decoupling is the crux of the DDQN algorithm. We will then code DDQN in TensorFlow to play Atari Breakout. Finally, we will compare and contrast the two algorithms: DQN and DDQN.

Updating the Bellman equation

In vanilla DQN, the target for the Bellman update is this:

y_t^{DQN} = r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a; \theta_t^{-})

Here, θt represents the model parameters of the primary (online) network, and θt⁻ represents the parameters of the target network, a periodically updated copy of the primary network...
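To make the difference to DDQN concrete, here is a minimal NumPy sketch of the two targets for a single transition; the names q_online_next and q_target_next are placeholders for the next-state Q values predicted by the primary and target networks, not variables from the chapter's code:

import numpy as np

def dqn_target(reward, q_target_next, gamma=0.99, done=False):
    # Vanilla DQN: the target network both selects and evaluates the action.
    return reward + (0.0 if done else gamma * np.max(q_target_next))

def ddqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    # DDQN: the primary (online) network selects the action,
    # and the target network evaluates it.
    best_action = np.argmax(q_online_next)
    return reward + (0.0 if done else gamma * q_target_next[best_action])

The only difference between the two is which network chooses the maximizing action; in both cases the target network supplies the bootstrapped value.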

Understanding dueling network architectures

We will now look at dueling network architectures. In DQN, DDQN, and other DQN variants in the literature, the focus has primarily been on the algorithm, that is, on how to update the value-function neural networks efficiently and stably. While this is crucial for developing robust RL algorithms, a parallel and complementary way to advance the field is to innovate and develop novel neural network architectures that are well suited to model-free RL. This is precisely the idea behind dueling network architectures, another contribution from DeepMind.

The steps involved in dueling architectures are as follows:

  1. Comparing the dueling network architecture with the standard DQN architecture
  2. Computing Q(s,a)
  3. Subtracting the average of the advantage from the advantage function

As we saw in the previous chapter, the output of the Q-network...
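As a rough sketch of how such a two-stream head could be wired up in TensorFlow 1.x, assuming a flattened convolutional feature tensor named features and an integer n_actions (both hypothetical names, not taken from the book's code):

import tensorflow as tf

def dueling_head(features, n_actions):
    # Value stream: a single scalar V(s) per state.
    value = tf.layers.dense(features, 512, activation=tf.nn.relu)
    value = tf.layers.dense(value, 1)

    # Advantage stream: one output per action, A(s, a).
    advantage = tf.layers.dense(features, 512, activation=tf.nn.relu)
    advantage = tf.layers.dense(advantage, n_actions)

    # Combine the streams: Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)).
    q_values = value + (advantage - tf.reduce_mean(advantage, axis=1, keepdims=True))
    return q_values

Subtracting the mean advantage makes the decomposition identifiable: without it, a constant could be shifted freely between V(s) and A(s,a) without changing Q(s,a).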

Understanding Rainbow networks

We will now move on to Rainbow networks, which are a confluence of several DQN improvements. Since the original DQN paper, a number of improvements have been proposed with notable success. This motivated DeepMind to combine them into an integrated agent, which they refer to as the Rainbow DQN. Specifically, six DQN improvements are combined into one integrated Rainbow DQN agent. These six improvements are summarized as follows:

  • DDQN
  • Dueling network architecture
  • Prioritized experience replay
  • Multi-step learning
  • Distributional RL
  • Noisy nets

DQN improvements

We have already seen DDQN and dueling network architectures and have coded them in TensorFlow...
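As a small illustration of one of the remaining ingredients, multi-step learning replaces the one-step bootstrapped target with an n-step return. A minimal sketch, with hypothetical argument names, could look like this:

import numpy as np

def n_step_return(rewards, bootstrap_value, gamma=0.99):
    # rewards: the n rewards r_{t+1}, ..., r_{t+n} collected along the trajectory.
    # bootstrap_value: the value estimate at the n-th next state,
    # for example max_a Q(s_{t+n}, a) from the target network.
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g + (gamma ** len(rewards)) * bootstrap_value

With n = 1, this reduces to the familiar one-step target used by DQN and DDQN.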

Running a Rainbow network on Dopamine

In 2018, some engineers at Google released an open source, lightweight, TensorFlow-based framework for training RL agents, called Dopamine. Dopamine, as you may already know, is the name of an organic chemical that plays an important role in the brain. We will use Dopamine to run Rainbow.

The Dopamine framework is based on four design principles:

  • Easy experimentation
  • Flexible development
  • Compact and reliable
  • Reproducible

To download Dopamine from GitHub, type the following command in a Terminal:

git clone https://github.com/google/dopamine.git

We can test whether Dopamine was successfully installed by typing the following commands into a Terminal:

cd dopamine
export PYTHONPATH=${PYTHONPATH}:.
python tests/atari_init_test.py

The output of this will look something like the following:

2018-10-27 23:08:17.810679: I tensorflow/core/platform/cpu_feature_guard...
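Once this test passes, a Rainbow agent can be trained with the training script that ships with the repository. The exact entry point and flags depend on the Dopamine version you cloned, so treat the following as an assumption based on the README at the time of writing and check the repository for the current invocation:

python -um dopamine.atari.train \
  --agent_name=rainbow \
  --base_dir=/tmp/dopamine \
  --gin_files='dopamine/agents/rainbow/configs/rainbow.gin'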

Summary

In this chapter, we were introduced to DDQN, dueling network architectures, and the Rainbow DQN. We extended our previous DQN code to DDQN and dueling architectures and tried them out on Atari Breakout. We can clearly see that the average episode rewards are higher with these improvements, which makes them a natural choice. We then looked at Google's Dopamine framework and used it to train a Rainbow DQN agent. Dopamine includes several other RL algorithms, and you are encouraged to dig deeper and try them out as well.

This chapter was a good deep dive into the DQN variants, and we covered a lot of ground as far as coding RL algorithms is concerned. In the next chapter, we will learn about our next RL algorithm, Deep Deterministic Policy Gradient (DDPG), which is our first Actor-Critic RL algorithm and our first continuous...

Questions

  1. Why does DDQN perform better than DQN?
  2. How does the dueling network architecture help in the training?
  3. Why does prioritized experience replay speed up the training?
  4. How do sticky actions help in the training?

Further reading

  • Deep Reinforcement Learning with Double Q-learning, Hado van Hasselt, Arthur Guez, and David Silver, arXiv:1509.06461 (the DDQN paper): https://arxiv.org/abs/1509.06461
  • Rainbow: Combining Improvements in Deep Reinforcement Learning, Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, and David Silver, arXiv:1710.02298 (the Rainbow DQN paper): https://arxiv.org/abs/1710.02298
  • Prioritized Experience Replay, Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver, arXiv:1511.05952: https://arxiv.org/abs/1511.05952
  • Multi-Step Reinforcement Learning: A Unifying Algorithm, Kristopher de Asis, J. Fernando Hernandez-Garcia, G. Zacharias Holland, and Richard S. Sutton, arXiv:1703.01327: https://arxiv.org/pdf/1703.01327.pdf
  • Noisy Networks for Exploration...