Atari Games with Deep Q Network

Deep Q Network (DQN) is one of the most popular and widely used deep reinforcement learning (DRL) algorithms. In fact, it created a lot of buzz in the reinforcement learning (RL) community after its release. The algorithm was proposed by researchers at Google's DeepMind and achieved human-level performance on many Atari games by taking only the game screen as input.

In this chapter, we will explore how DQN works and also learn how to build a DQN that plays any Atari game by taking only the game screen as input. We will look at some of the improvements made to the DQN architecture, such as double DQN and the dueling network architecture.

In this chapter, you will learn about:

  • Deep Q Networks (DQNs)
  • Architecture of DQN
  • Building an agent to play Atari games
  • Double DQN
  • Prioritized experience replay
...

What is a Deep Q Network?

Before going ahead, let us first recap the Q function. What is a Q function? A Q function, also called the state-action value function, specifies how good an action a is in the state s. We store the values of all possible actions in each state in a table called a Q table, and we pick the action that has the maximum value in a state as the optimal action. Remember how we learned this Q function? We used Q learning, which is an off-policy temporal difference learning algorithm for estimating the Q function. We looked at this in Chapter 5, Temporal Difference Learning.
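
As a quick refresher, here is a minimal tabular Q learning update (a sketch of my own, not code from the book; the state and action counts, learning rate, and discount factor are illustrative assumptions):

import numpy as np

n_states, n_actions = 10, 4
alpha, gamma = 0.1, 0.99                  # learning rate and discount factor
Q = np.zeros((n_states, n_actions))       # the Q table

def q_learning_update(s, a, r, s_next):
    # Off-policy TD update: the target uses the max over next-state actions
    td_target = r + gamma * np.max(Q[s_next])
    td_error = td_target - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error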

So far, we have seen environments with a finite number of states and a limited set of actions, where we performed an exhaustive search through all possible state-action pairs to find the optimal Q values. Think of an environment where we have a very large number of states and, in each state, we have...

Architecture of DQN

Now that we have a basic understanding of DQN, we will go into detail about how DQN works and the architecture of DQN for playing Atari games. We will look at each component and then we will view the algorithm as a whole.

Convolutional network

The first layer of DQN is the convolutional network, and the input to the network will be a raw frame of the game screen. So, we take a raw frame and pass it to the convolutional layers to understand the game state. But the raw frames are 210 x 160 pixels with a 128-color palette, and it would clearly take a lot of computation and memory if we fed the raw pixels directly. So, we downsample the frames to 84 x 84 and convert the RGB values to grayscale values...
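
To make the preprocessing concrete, here is a minimal sketch (my own illustration, not the book's exact code) that converts a raw 210 x 160 RGB frame to an 84 x 84 grayscale image using channel averaging and a simple nearest-neighbour resize:

import numpy as np

def preprocess_observation(obs):
    # obs is the raw 210 x 160 x 3 RGB frame returned by the environment
    img = obs.astype(np.float32).mean(axis=2)            # grayscale by averaging the channels
    rows = np.linspace(0, img.shape[0] - 1, 84).astype(int)
    cols = np.linspace(0, img.shape[1] - 1, 84).astype(int)
    img = img[rows][:, cols]                             # nearest-neighbour resize to 84 x 84
    return img / 255.0                                   # scale pixel values to [0, 1]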

Building an agent to play Atari games

Now we will see how to build an agent to play any Atari game. You can get the complete code as a Jupyter notebook with the explanation here (https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/08.%20Atari%20Games%20with%20DQN/8.8%20Building%20an%20Agent%20to%20Play%20Atari%20Games.ipynb).

First, we import all the necessary libraries:

import numpy as np
import gym
import tensorflow as tf
from tensorflow.contrib.layers import flatten, conv2d, fully_connected
from collections import deque, Counter
import random
from datetime import datetime

We can use any of the Atari gaming environments given here: http://gym.openai.com/envs/#atari.

In this example, we use the Pac-Man game environment:

env = gym.make("MsPacman-v0")
n_outputs = env.action_space.n
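
As a quick sanity check (not part of the book's code), we can run a few random steps to confirm the environment and the action space behave as expected with the classic gym API used here:

obs = env.reset()
for _ in range(10):
    action = env.action_space.sample()        # pick a random action
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
print(obs.shape, n_outputs)                   # (210, 160, 3) and 9 for MsPacman-v0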

The Pac-Man environment is shown here:

Now we define a...

Double DQN

Deep Q learning is pretty cool, right? It has generalized its learning to play any Atari game. But the problem with DQN is that it tends to overestimate Q values. This is because of the max operator in the Q learning equation. The max operator uses the same value for both selecting and evaluating an action. What do I mean by that? Let's suppose we are in a state s and we have five actions, a1 to a5. Let's say a3 is the best action. When we estimate the Q values for all these actions in the state s, the estimated Q values will have some noise and differ from the actual values. Due to this noise, action a2 might get a higher value than the optimal action a3. Now, if we select the best action as the one that has the maximum value, we will end up selecting the suboptimal action a2 instead of the optimal action a3.
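
A tiny numerical illustration of this effect (my own sketch, not from the book): even when the true Q values of all five actions are identical, taking the max over noisy estimates is biased upwards:

import numpy as np

np.random.seed(0)
true_q = np.zeros(5)                                    # actions a1 to a5 are equally good
noisy_q = true_q + np.random.normal(size=(100000, 5))   # noisy estimates of the Q values
print(true_q.max())                                     # 0.0, the true maximum
print(noisy_q.max(axis=1).mean())                       # roughly 1.16, a clear overestimate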

We can solve this problem by having two separate Q functions, each...

Prioritized experience replay

In the DQN architecture, we use experience replay to remove correlations between the training samples. However, uniformly sampling transitions from the replay memory is not an optimal method. Instead, we can prioritize transitions and sample according to priority. Prioritizing transitions helps the network learn swiftly and effectively. How do we prioritize the transitions? We prioritize the transitions that have a high TD error. We know that the TD error specifies the difference between the estimated Q value and the actual Q value. So, transitions with a high TD error are the transitions we have to focus on and learn from, because those are the transitions that deviate most from our estimate. Intuitively, let us say you try to solve a set of problems, but you fail to solve two of them. You then give priority to those two problems alone to focus...
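
Here is a minimal sketch of proportional prioritization (an illustration of the idea described above, not the book's implementation), where the probability of sampling a transition grows with its absolute TD error:

import numpy as np

alpha = 0.6                                    # how strongly the TD error shapes the priority
epsilon = 1e-5                                 # keeps every transition sampleable

td_errors = np.array([0.1, 2.0, 0.5, 0.05])    # one TD error per stored transition
priorities = (np.abs(td_errors) + epsilon) ** alpha
probs = priorities / priorities.sum()          # sampling probabilities

batch = np.random.choice(len(td_errors), size=2, p=probs, replace=False)
print(batch)                                   # transitions with larger TD error appear more often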

Dueling network architecture

We know that the Q function specifies how good it is for an agent to perform an action a in the state s, and the value function specifies how good it is for an agent to be in a state s. Now we introduce a new function called the advantage function, which can be defined as the difference between the Q function and the value function. The advantage function specifies how good it is for an agent to perform an action a compared to the other actions.

Thus, the value function specifies the goodness of a state and the advantage function specifies the goodness of an action. What would happen if we were to combine the value function and the advantage function? It would tell us how good it is for an agent to perform an action a in a state s, which is exactly our Q function. So we can define our Q function as the sum of a value function and an advantage function, as...
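
The following small numeric sketch (my own illustration, not the book's code) shows how the two streams are typically combined; in practice the advantage is centred by subtracting its mean over actions so that the decomposition is identifiable:

import numpy as np

value = np.array([[1.5]])                        # value stream output, shape (batch, 1)
advantage = np.array([[0.2, -0.1, 0.4, -0.5]])   # advantage stream output, shape (batch, n_actions)

# Q(s, a) = V(s) + (A(s, a) - mean over actions of A(s, a))
q_values = value + (advantage - advantage.mean(axis=1, keepdims=True))
print(q_values)                                  # one Q value per action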

Summary

In this chapter, we have learned about one of the most popular deep reinforcement learning algorithms, called DQN. We saw how deep neural networks are used to approximate the Q function. We also learned how to build an agent to play Atari games. Later, we looked at several advancements to DQN, such as double DQN, which is used to avoid overestimating Q values. We then looked at prioritized experience replay, which prioritizes transitions by their TD error, and the dueling network architecture, which breaks down the Q function computation into two streams, called the value stream and the advantage stream.

In the next chapter, Chapter 9, Playing Doom with a Deep Recurrent Q Network, we will look at a really cool variant of DQN called DRQN, which makes use of an RNN for approximating the Q function.

Questions

The question list is as follows:

  1. What is DQN?
  2. What is the need for experience replay?
  3. Why do we keep a separate target network?
  4. Why is DQN overestimating?
  5. How does double DQN avoid overestimating the Q value?
  6. How are experiences prioritized in prioritized experience replay?
  7. What is the need for the dueling architecture?