DQN Extensions

Since DeepMind published its paper on the deep Q-network (DQN) model (https://deepmind.com/research/publications/playing-atari-deep-reinforcement-learning) in 2015, many improvements have been proposed, along with tweaks to the basic architecture, that significantly improve the convergence, stability, and sample efficiency of DeepMind's basic DQN. In this chapter, we will take a deeper look at some of those ideas.

Very conveniently, in October 2017, DeepMind published a paper called Rainbow: Combining Improvements in Deep Reinforcement Learning ([1] Hessel and others, 2017), which presented the seven most important improvements to DQN; some were invented in 2015, while others were much more recent. In this paper, state-of-the-art results on the Atari games suite were reached just by combining those seven methods. This chapter will go through all of those methods. We will analyze the ideas behind them, alongside how they can be implemented and compared to the...

Basic DQN

To get started, we will implement the same DQN method as in Chapter 6, Deep Q-Networks, but leveraging the high-level libraries described in Chapter 7, Higher-Level RL Libraries. This will make our code much more compact, which is good, as non-relevant details won't distract us from the method's logic.

At the same time, the purpose of this book is not to teach you how to use the existing libraries, but rather how to develop intuition about RL methods and, if necessary, implement everything from scratch. From my perspective, this is a much more valuable skill, as libraries come and go, but true understanding of the domain will allow you to quickly make sense of other people's code and apply it consciously.

In the basic DQN implementation, we have three modules:

  • Chapter08/lib/dqn_model.py: the DQN neural network (NN), which is the same as in Chapter 6, so I won't repeat it
  • Chapter08/lib/common.py: common functions and declarations shared by...

N-step DQN

The first improvement that we will implement and evaluate is quite an old one. It was first introduced in the paper Learning to Predict by the Methods of Temporal Differences, by Richard Sutton ([2] Sutton, 1988). To get the idea, let's look at the Bellman update used in Q-learning once again:

Q(s_t, a_t) = r_t + γ max_a Q(s_{t+1}, a)

This equation is recursive, which means that we can expand Q(s_{t+1}, a_{t+1}) using the same formula, which gives us this result:

Q(s_t, a_t) = r_t + γ max_a [r_{a,t+1} + γ max_{a'} Q(s_{t+2}, a')]

The value r_{a,t+1} means the local reward at time t+1, after issuing action a. However, if we assume that action a at the step t+1 was chosen optimally, or close to optimally, we can omit the max_a operation and obtain this:

Q(s_t, a_t) = r_t + γ r_{t+1} + γ² max_{a'} Q(s_{t+2}, a')

This value can be unrolled again and again any number of times. As you may guess, this unrolling can be easily applied to our DQN update by replacing one-step transition sampling with longer transition sequences of n steps. To understand why this unrolling will help us to speed up training, let's consider the example illustrated...
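
To make this concrete, the following is a minimal sketch (not the code from the book's library) of how several consecutive one-step transitions can be collapsed into a single n-step transition; the Transition namedtuple and the function name are illustrative only:

```python
import collections

# Illustrative container; the book's library uses its own experience classes.
Transition = collections.namedtuple(
    "Transition", ["state", "action", "reward", "done", "next_state"])


def n_step_transition(steps, gamma=0.99):
    """Collapse consecutive one-step transitions into one n-step transition.

    The accumulated reward is r_t + gamma*r_{t+1} + ... + gamma^{n-1}*r_{t+n-1},
    and next_state is the state observed after the last step (or the terminal
    state if the episode ended inside the window).
    """
    total_reward = 0.0
    for idx, step in enumerate(steps):
        total_reward += (gamma ** idx) * step.reward
        if step.done:
            # Episode ended inside the window: stop accumulating.
            return Transition(steps[0].state, steps[0].action,
                              total_reward, True, step.next_state)
    last = steps[-1]
    return Transition(steps[0].state, steps[0].action,
                      total_reward, False, last.next_state)
```

During training, the Bellman target for such a transition bootstraps with gamma**n instead of gamma, which is the only other change needed in the loss calculation.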

Double DQN

The next fruitful idea on how to improve a basic DQN came from DeepMind researchers in the paper titled Deep Reinforcement Learning with Double Q-Learning ([3] van Hasselt, Guez, and Silver, 2015). In the paper, the authors demonstrated that the basic DQN tends to overestimate values for Q, which may be harmful to training performance and sometimes can lead to suboptimal policies. The root cause of this is the max operation in the Bellman equation, but the strict proof is too complicated to write down here. As a solution to this problem, the authors proposed modifying the Bellman update a bit.

In the basic DQN, our target value for Q looked like this:

Q(s_t, a_t) = r_t + γ max_a Q'(s_{t+1}, a)

Q'(s_{t+1}, a) is the Q-value calculated using our target network, which we sync with the trained network every n steps. The authors of the paper proposed choosing actions for the next state using the trained network, but taking the values of Q from the target network. So, the new expression for the target Q-values will look...
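
As a quick illustration, here is a minimal sketch of the Double DQN target calculation in PyTorch; the function name is mine, and net and tgt_net are assumed to be the online and target Q-networks:

```python
import torch


@torch.no_grad()
def double_dqn_targets(next_states, rewards, dones, net, tgt_net, gamma=0.99):
    """Compute target Q-values the Double DQN way.

    Actions for the next states are selected by the online (trained) network,
    but their values are taken from the target network.
    """
    # Action selection by the online network
    next_actions = net(next_states).argmax(dim=1)
    # Action evaluation by the target network
    next_q = tgt_net(next_states).gather(
        1, next_actions.unsqueeze(-1)).squeeze(-1)
    # dones is assumed to be a boolean tensor; no bootstrapping after terminals
    next_q[dones] = 0.0
    return rewards + gamma * next_q
```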

Noisy networks

The next improvement that we are going to look at addresses another RL problem: exploration of the environment. The paper that we will draw from is called Noisy Networks for Exploration ([4] Fortunato and others, 2017) and it has a very simple idea for learning exploration characteristics during training instead of having a separate schedule related to exploration.

Classical DQN achieves exploration by choosing random actions with a specially defined hyperparameter, epsilon, which is slowly decreased over time from 1.0 (fully random actions) to some small value such as 0.1 or 0.02. This process works well for simple environments with short episodes and without much non-stationarity during the game; but even in such simple cases, it requires tuning to make the training process efficient.
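
For reference, a typical linear epsilon schedule looks like the following sketch (the class and parameter names are illustrative, not the book's library API):

```python
class EpsilonTracker:
    """Linearly decay epsilon from eps_start to eps_final over eps_frames frames."""

    def __init__(self, eps_start=1.0, eps_final=0.02, eps_frames=100_000):
        self.eps_start = eps_start
        self.eps_final = eps_final
        self.eps_frames = eps_frames

    def epsilon(self, frame_idx):
        # Fully random at the beginning, mostly greedy after eps_frames steps
        return max(self.eps_final,
                   self.eps_start - frame_idx / self.eps_frames)
```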

In the Noisy Networks paper, the authors proposed a quite simple solution that, nevertheless, works well. They add noise to the weights of fully connected layers of the network and adjust...
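
The following is a sketch of the simpler of the two variants from the paper, a linear layer with independent Gaussian noise on every weight; it is close in spirit to the chapter's implementation, but treat the exact names and the sigma_init constant as illustrative:

```python
import torch
import torch.nn as nn


class NoisyLinear(nn.Linear):
    """Fully connected layer with learnable, per-weight Gaussian noise.

    Each weight w is replaced by w + sigma * epsilon, where sigma is trained
    by SGD and epsilon is resampled on every forward pass, so the amount of
    exploration is learned instead of being scheduled externally.
    """

    def __init__(self, in_features, out_features, sigma_init=0.017, bias=True):
        super().__init__(in_features, out_features, bias=bias)
        self.sigma_weight = nn.Parameter(
            torch.full((out_features, in_features), sigma_init))
        self.register_buffer("epsilon_weight",
                             torch.zeros(out_features, in_features))
        if bias:
            self.sigma_bias = nn.Parameter(
                torch.full((out_features,), sigma_init))
            self.register_buffer("epsilon_bias", torch.zeros(out_features))

    def forward(self, x):
        # Sample fresh noise for every forward pass
        self.epsilon_weight.normal_()
        bias = self.bias
        if bias is not None:
            self.epsilon_bias.normal_()
            bias = bias + self.sigma_bias * self.epsilon_bias
        weight = self.weight + self.sigma_weight * self.epsilon_weight
        return nn.functional.linear(x, weight, bias)
```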

Prioritized replay buffer

The next very useful idea on how to improve DQN training was proposed in 2015 in the paper Prioritized Experience Replay ([7] Schaul and others, 2015). This method tries to improve the efficiency of samples in the replay buffer by prioritizing those samples according to the training loss.

The basic DQN used the replay buffer to break the correlation between immediate transitions in our episodes. As we discussed in Chapter 6, Deep Q-Networks, the examples we experience during the episode will be highly correlated, as most of the time, the environment is "smooth" and doesn't change much according to our actions. However, the stochastic gradient descent (SGD) method assumes that the data we use for training has an i.i.d. property. To solve this problem, the classic DQN method uses a large buffer of transitions, randomly sampled to get the next training batch.

The authors of the paper questioned this uniform random sample policy and proved...
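
A minimal sketch of proportional prioritization is shown below; it uses a plain array instead of the more efficient segment (sum) tree, and all names are illustrative rather than the book's library code:

```python
import numpy as np


class PrioritizedReplayBuffer:
    """Replay buffer that samples transitions proportionally to their priority.

    New transitions get the current maximum priority so that they are seen at
    least once; after a training step, priorities are set to |loss| + eps.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.pos = 0

    def append(self, transition):
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.buffer)].astype(np.float64)
        probs = prios ** self.alpha
        probs /= probs.sum()
        indices = np.random.choice(len(self.buffer), batch_size, p=probs)
        samples = [self.buffer[i] for i in indices]
        # Importance-sampling weights compensate for the non-uniform sampling
        weights = (len(self.buffer) * probs[indices]) ** (-beta)
        weights /= weights.max()
        return samples, indices, weights.astype(np.float32)

    def update_priorities(self, indices, losses):
        for idx, loss in zip(indices, losses):
            self.priorities[idx] = abs(loss) + 1e-5
```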

Dueling DQN

This improvement to DQN was proposed in 2015, in the paper called Dueling Network Architectures for Deep Reinforcement Learning ([8] Wang et al., 2015). The core observation of this paper is that the Q-values, Q(s, a), that our network is trying to approximate can be divided into two quantities: the value of the state, V(s), and the advantage of actions in this state, A(s, a).

You have seen the quantity V(s) before, as it was the core of the value iteration method from Chapter 5, Tabular Learning and the Bellman Equation. It is just equal to the discounted expected reward achievable from this state. The advantage A(s, a) is supposed to bridge the gap from V(s) to Q(s, a), as, by definition, Q(s, a) = V(s) + A(s, a). In other words, the advantage A(s, a) is just the delta, saying how much extra reward some particular action from the state brings us. The advantage could be positive or negative and, in general, can have any magnitude. For example, at some tipping point, the...
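
A sketch of how the two paths can be combined at the output is shown below (the convolutional feature extractor is assumed to exist elsewhere; names and sizes are illustrative). Subtracting the mean advantage keeps V(s) and A(s, a) identifiable, since only their sum is constrained by the Q-learning loss:

```python
import torch
import torch.nn as nn


class DuelingHead(nn.Module):
    """Combine a value path and an advantage path into Q-values:
    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, feature_size, n_actions, hidden=256):
        super().__init__()
        self.value = nn.Sequential(
            nn.Linear(feature_size, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))
        self.advantage = nn.Sequential(
            nn.Linear(feature_size, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, features):
        val = self.value(features)          # shape (batch, 1)
        adv = self.advantage(features)      # shape (batch, n_actions)
        return val + (adv - adv.mean(dim=1, keepdim=True))
```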

Categorical DQN

The last, and the most complicated, method in our DQN improvements toolbox is from a very recent paper, published by DeepMind in June 2017, called A Distributional Perspective on Reinforcement Learning ([9] Bellemare, Dabney, and Munos, 2017).

In the paper, the authors questioned the fundamental piece of Q-learning, the Q-value, and tried to replace it with a more generic Q-value probability distribution. Let's try to understand the idea. Both the Q-learning and value iteration methods work with the values of actions or states represented as simple numbers showing how much total reward we can achieve from a state, or from an action taken in a state. However, is it practical to squeeze all possible future rewards into one number? In complicated environments, the future could be stochastic, giving us different values with different probabilities.

For example, imagine the commuter scenario when you regularly drive from home to work. Most of the time...
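
As a preview of what "replacing Q-values with distributions" means in code, here is a sketch of a C51-style output head; the atom count and value range follow the common defaults from the paper, and all names are illustrative:

```python
import torch
import torch.nn as nn

N_ATOMS = 51              # number of support atoms (the "C51" in the agent's name)
V_MIN, V_MAX = -10, 10    # range of returns covered by the support


class DistributionalHead(nn.Module):
    """Output a probability distribution over fixed return values per action.

    The expected value of each distribution recovers an ordinary Q-value,
    which is still what the agent uses to pick the greedy action.
    """

    def __init__(self, feature_size, n_actions):
        super().__init__()
        self.n_actions = n_actions
        self.fc = nn.Linear(feature_size, n_actions * N_ATOMS)
        self.register_buffer("support",
                             torch.linspace(V_MIN, V_MAX, N_ATOMS))

    def forward(self, features):
        logits = self.fc(features).view(-1, self.n_actions, N_ATOMS)
        probs = torch.softmax(logits, dim=2)          # one distribution per action
        q_values = (probs * self.support).sum(dim=2)  # expected returns
        return probs, q_values
```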

Combining everything

You have now seen all the DQN improvements mentioned in the paper Rainbow: Combining Improvements in Deep Reinforcement Learning, but we covered them incrementally, which helped you to understand the idea and implementation of each improvement. The main point of the paper was to combine those improvements and check the results. In the final example, I've decided to exclude categorical DQN and double DQN from the final system, as they haven't shown much improvement on our guinea pig environment. If you want, you can add them and try using a different game. The complete example is available in Chapter08/08_dqn_rainbow.py.

First of all, we need to define our network architecture and the methods that have contributed to it:

  • Dueling DQN: our network will have two separate paths, one for the distribution of the state value and one for the distribution of advantages. On the output, both paths will be summed together, providing the final value probability distributions...

Summary

In this chapter, we have walked through and implemented a lot of DQN improvements that have been discovered by researchers since the first DQN paper was published in 2015. This list is far from complete. First of all, for the list of methods, I used the paper Rainbow: Combining Improvements in Deep Reinforcement Learning, which was published by DeepMind, so the list of methods is definitely biased toward DeepMind papers. Secondly, RL is so active nowadays that new papers come out almost every day, which makes it very hard to keep up, even if we limit ourselves to one kind of RL model, such as DQN. The goal of this chapter was to give you a practical view of the different ideas that the field has developed.

In the next chapter, we will continue discussing practical DQN applications from an engineering perspective by talking about ways to improve DQN performance without touching the underlying method.

References

  1. Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver, 2017, Rainbow: Combining Improvements in Deep Reinforcement Learning. arXiv:1710.02298
  2. Richard S. Sutton, 1988, Learning to Predict by the Methods of Temporal Differences. Machine Learning 3(1):9-44
  3. Hado Van Hasselt, Arthur Guez, David Silver, 2015, Deep Reinforcement Learning with Double Q-Learning. arXiv:1509.06461v3
  4. Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg, 2017, Noisy Networks for Exploration. arXiv:1706.10295v1
  5. Marc Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos, 2016, Unifying Count-Based Exploration and Intrinsic Motivation. arXiv:1606.01868v2
  6. Jarryd Martin, Suraj Narayanan Sasikumar, Tom Everitt, Marcus Hutter, 2017, Count-Based...