You're reading from Hands-On Reinforcement Learning with Python

Product type: Book
Published in: Jun 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788836524
Edition: 1st Edition
Author: Sudharsan Ravichandiran

Sudharsan Ravichandiran is a data scientist and artificial intelligence enthusiast. He holds a Bachelors in Information Technology from Anna University. His area of research focuses on practical implementations of deep learning and reinforcement learning including natural language processing and computer vision. He is an open-source contributor and loves answering questions on Stack Overflow.

Playing Doom with a Deep Recurrent Q Network

In the last chapter, we saw how to build an agent that plays Atari games using a Deep Q Network (DQN). We used a neural network to approximate the Q function, a convolutional neural network (CNN) to interpret the game screen, and the past four game screens to better capture the current game state. In this chapter, we will learn how to improve the performance of our DQN by taking advantage of a recurrent neural network (RNN). We will also look at what a partially observable Markov Decision Process (POMDP) is and how we can solve it using a Deep Recurrent Q Network (DRQN). Following this, we will learn how to build an agent to play the game Doom using a DRQN. Finally, we will see a variant of DRQN called Deep Attention Recurrent Q Network (DARQN), which augments the attention...

DRQN

So, why do we need DRQN when our DQN performed at a human level on Atari games? To answer this question, let us first understand the partially observable Markov Decision Process (POMDP). An environment is called a partially observable MDP when only limited information about it is available to the agent. In the previous chapters, we worked with fully observable MDPs, where we know all possible states and actions. The agent might be unaware of the transition and reward probabilities, but it had complete knowledge of the environment's states and actions. The frozen lake environment is an example: we clearly know all of its states and actions, so we could easily model it as a fully observable MDP. Most real-world environments, however, are only partially observable; we cannot see all the states. Consider the agent learning to walk in...
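
To make partial observability concrete, here is a minimal sketch. The corridor layout, the `observe` function, and the observation window are hypothetical illustrations, not code from the book. Two different positions produce the same observation, so a single observation cannot identify the state, while a short history of observations can. This is the kind of gap a recurrent layer is meant to fill.

```python
from collections import deque

# Hypothetical 1-D corridor. The agent sees only the tile it stands on,
# never its position, so several positions look identical ("floor").
CORRIDOR = ["wall", "floor", "door", "floor", "floor", "goal"]


def observe(position):
    """Return the partial observation: the tile under the agent."""
    return CORRIDOR[position]


def walk_right(start, steps, window=2):
    """Walk right from `start` and return the last `window` observations."""
    history = deque(maxlen=window)
    position = start
    history.append(observe(position))
    for _ in range(steps):
        # Stop at the end of the corridor instead of walking off it.
        position = min(position + 1, len(CORRIDOR) - 1)
        history.append(observe(position))
    return tuple(history)
```

Positions 1 and 3 both yield the observation "floor", but `walk_right(0, 1)` and `walk_right(2, 1)` return different histories, so an agent that remembers what it saw one step earlier can tell the two situations apart.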

Training an agent to play Doom

Doom is a very popular first-person shooter game. The goal of the game is to kill monsters. Doom is another example of a partially observable MDP, as the agent's (player's) view is limited to 90 degrees; the agent has no idea about the rest of the environment. Now, we will see how we can use DRQN to train our agent to play Doom.

Instead of OpenAI Gym, we will use the ViZDoom package to simulate the Doom environment and train our agent. To learn more about the ViZDoom package, check out its official website at http://vizdoom.cs.put.edu.pl/. We can install ViZDoom with the following command:

pip install vizdoom

ViZDoom provides many Doom scenarios, which can be found in the package folder vizdoom/scenarios.
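
As a rough sketch of how a scenario might be loaded and played, the following runs one episode with random actions. The helper names `one_hot_actions` and `run_random_episode`, the default of three buttons, and the frame repeat of 4 are assumptions made for illustration; they are not taken from the book. The number of buttons must match the scenario's configuration file.

```python
import random


def one_hot_actions(n_buttons):
    """Build one action per button: a list with a single 1 for the pressed button."""
    return [[1 if i == j else 0 for j in range(n_buttons)] for i in range(n_buttons)]


def run_random_episode(config_path, n_buttons=3, frame_repeat=4):
    """Play one episode with random actions and return the total reward.

    `n_buttons` is an assumption; it must equal the number of available
    buttons declared in the scenario's .cfg file.
    """
    # Imported here so one_hot_actions can be used without ViZDoom installed.
    from vizdoom import DoomGame

    game = DoomGame()
    game.load_config(config_path)     # scenario settings from a .cfg file
    game.set_window_visible(False)    # run without opening a game window
    game.init()

    actions = one_hot_actions(n_buttons)
    game.new_episode()
    while not game.is_episode_finished():
        # The screen buffer is the agent's limited view of the world;
        # a trained agent would feed it to its network instead of ignoring it.
        frame = game.get_state().screen_buffer
        game.make_action(random.choice(actions), frame_repeat)

    total_reward = game.get_total_reward()
    game.close()
    return total_reward
```

To try it, pass the path of a scenario configuration file from the vizdoom/scenarios folder to `run_random_episode`. A random policy like this is only useful as a smoke test before training begins.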

Basic Doom game...

DARQN

We have improved our DQN architecture by adding a recurrent layer, which captures temporal dependency, and we called it DRQN. Can we improve the DRQN architecture further? Yes: we can add an attention layer on top of the convolutional layer. So, what is the function of the attention layer? Attention carries the literal meaning of the word: the network focuses on the parts of the input that matter most. Attention mechanisms are widely used in image captioning, object detection, and so on. Consider a neural network captioning an image; to understand what is in the image, the network has to attend to specific objects in it while generating the caption.

Similarly, when we add the attention layer to our DRQN, we can select and pay attention to small regions of the image, and ultimately this reduces the number of parameters in the network and also reduces...
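
The idea of weighting image regions can be sketched in a few lines of plain Python. This is a simplified illustration of soft attention, not the DARQN architecture itself: the dot-product scoring, the function names, and the toy feature vectors are all assumptions, and a real model would learn the scoring function and operate on convolutional feature maps.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(regions, hidden):
    """Blend region feature vectors according to their relevance to `hidden`.

    Assumption: relevance is a plain dot product between each region's
    features and the recurrent hidden state.
    """
    scores = [sum(r * h for r, h in zip(region, hidden)) for region in regions]
    weights = softmax(scores)
    size = len(regions[0])
    # The context vector is the attention-weighted average of all regions.
    context = [
        sum(w * region[i] for w, region in zip(weights, regions))
        for i in range(size)
    ]
    return weights, context
```

Calling `soft_attention([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [2.0, 0.0])` gives the first region the largest weight, because it aligns best with the hidden state. Hard attention would instead pick a single region rather than blending all of them.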

Summary

In this chapter, we learned how DRQN remembers information about previous states and how it overcomes the problem of partial observability in an MDP. We saw how to train our agent to play the game Doom using a DRQN algorithm. We also learned about DARQN, an improvement to DRQN that adds an attention layer on top of the convolutional layer. Following this, we saw the two types of attention mechanism, namely soft and hard attention.

In the next chapter, Chapter 10, Asynchronous Advantage Actor Critic Network, we will learn about another interesting deep reinforcement learning algorithm called Asynchronous Advantage Actor Critic network.

Questions

The question list is as follows:

  1. What is the difference between DQN and DRQN?
  2. What are the shortcomings of DQN?
  3. How do we set up an experience replay in DQN?
  4. What is the difference between DRQN and DARQN?
  5. Why do we need DARQN?
  6. What are the different types of attention mechanism?
  7. Why do we set a living reward in Doom?

Further reading
