Higher-Level RL Libraries

In Chapter 6, Deep Q-Networks, we implemented the deep Q-network (DQN) model published by DeepMind in 2015 (https://deepmind.com/research/publications/playing-atari-deep-reinforcement-learning). This paper had a significant effect on the RL field by demonstrating that, despite common belief, it's possible to use nonlinear approximators in RL. This proof of concept stimulated great interest in the deep Q-learning field and in deep RL in general.

In this chapter, we will take another step towards practical RL by discussing higher-level RL libraries, which will allow you to build your code from higher-level blocks and focus on the details of the method that you are implementing. Most of the chapter describes the PyTorch Agent Net (PTAN) library, which will be used in the rest of the book to avoid code repetition, so it will be covered in detail.

We will cover:

  • The motivation for using high-level libraries, rather than reimplementing everything...

Why RL libraries?

Our implementation of basic DQN in Chapter 6, Deep Q-Networks, wasn't very long or complicated: about 200 lines of training code plus 120 lines in environment wrappers. When you are becoming familiar with RL methods, it is very useful to implement everything yourself to understand how things actually work. However, the more involved you become in the field, the more often you will realize that you are writing the same code over and over again.

This repetition comes from the generality of RL methods. As we already discussed in Chapter 1, What Is Reinforcement Learning?, RL is quite flexible and many real-life problems fall into the environment-agent interaction scheme. RL methods don't make many assumptions about the specifics of observations and actions, so code implemented for the CartPole environment will be applicable to Atari games (maybe with some minor tweaks).

Writing the same code over and over again is not very efficient, as bugs might...

The PTAN library

The library is available on GitHub: https://github.com/Shmuma/ptan. All the subsequent examples were implemented using version 0.6 of PTAN, which can be installed in your virtual environment by running the following:

pip install ptan==0.6
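
If you want to verify the installation, a quick check from the Python interpreter is enough. The pkg_resources lookup below is just one convenient way to read the installed version; it is not something PTAN itself provides:

import pkg_resources
import ptan  # will fail here if the package (or its torch/gym dependencies) is missing

print(pkg_resources.get_distribution("ptan").version)  # expected output: 0.6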

The original goal of PTAN was to simplify my RL experiments, and it tries to keep the balance between two extremes:

  • Import the library and then write one line with tons of parameters to train one of the provided methods, like DQN (a very vivid example is the OpenAI Baselines project)
  • Implement everything from scratch

The first approach is very inflexible. It works well when you are using the library the way it is supposed to be used. But if you want to do something fancy, you will quickly find yourself hacking the library and fighting with the constraints that I imposed, rather than solving the problem you want to solve.

The second extreme gives too much freedom and requires implementing replay buffers...
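
To make the cost of that second extreme concrete, here is roughly the kind of replay buffer you end up rewriting in every from-scratch project. This is only an illustrative sketch (the class name and the transition layout are arbitrary), not PTAN's implementation:

import collections
import random

class SimpleReplayBuffer:
    # A bare-bones buffer: store transitions, sample uniform random minibatches
    def __init__(self, capacity):
        self.buffer = collections.deque(maxlen=capacity)

    def append(self, transition):
        # transition could be a (state, action, reward, done, next_state) tuple
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

PTAN provides this and the other recurring pieces (agents, action selectors, experience sources) out of the box, so you only write the parts that are specific to your method.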

The PTAN CartPole solver

Let's now take the PTAN classes (without Ignite so far) and try to combine everything together to solve our first environment: CartPole. The complete code is in Chapter07/06_cartpole.py. I will show only the important parts of the code related to the material that we have just covered.

# The Q-network: the simple two-layer feed-forward NN used for CartPole before
net = Net(obs_size, HIDDEN_SIZE, n_actions)
# Target network keeping a periodically synced copy of the weights
tgt_net = ptan.agent.TargetNet(net)
# Greedy selector wrapped into epsilon-greedy for exploration (epsilon starts at 1)
selector = ptan.actions.ArgmaxActionSelector()
selector = ptan.actions.EpsilonGreedyActionSelector(
    epsilon=1, selector=selector)
# The agent converts observations into actions using the network and the selector
agent = ptan.agent.DQNAgent(net, selector)
# The experience source produces (first state, action, discounted reward, last state) transitions
exp_source = ptan.experience.ExperienceSourceFirstLast(
    env, agent, gamma=GAMMA)
# The replay buffer stores those transitions and samples training batches from them
buffer = ptan.experience.ExperienceReplayBuffer(
    exp_source, buffer_size=REPLAY_SIZE)

In the beginning, we create the NN (the simple two-layer feed-forward NN that we used for CartPole before), the target NN, the epsilon-greedy action selector, and the DQNAgent. Then the experience source and replay...
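
The rest of Chapter07/06_cartpole.py drives these objects in a training loop. The following condensed sketch shows how such a loop can be organized around the PTAN pieces created above; the hyperparameter values and the batch-unpacking code are illustrative and not necessarily identical to the book's example:

import numpy as np
import torch
import torch.nn.functional as F

BATCH_SIZE = 16        # illustrative values, not the book's exact settings
SYNC_EVERY = 10
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

step = 0
while True:
    step += 1
    # ask the experience source (through the buffer) to play one step in the env
    buffer.populate(1)
    if len(buffer) < BATCH_SIZE:
        continue

    # sample a batch of ExperienceFirstLast entries and unpack them into tensors
    batch = buffer.sample(BATCH_SIZE)
    states = torch.tensor(np.array([e.state for e in batch], dtype=np.float32))
    actions = torch.tensor([e.action for e in batch])
    rewards = torch.tensor([e.reward for e in batch], dtype=torch.float32)
    dones = torch.tensor([e.last_state is None for e in batch])
    next_states = torch.tensor(np.array(
        [e.state if e.last_state is None else e.last_state for e in batch],
        dtype=np.float32))

    # standard DQN target: r + gamma * max_a' Q_tgt(s', a'), zeroed for terminal steps
    q_v = net(states).gather(1, actions.unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        next_q_v = tgt_net.target_model(next_states).max(1)[0]
        next_q_v[dones] = 0.0
        target_q = rewards + GAMMA * next_q_v

    loss = F.mse_loss(q_v, target_q)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # decay exploration and periodically sync the target network
    selector.epsilon = max(0.02, selector.epsilon - 1e-4)
    if step % SYNC_EVERY == 0:
        tgt_net.sync()

A real run would also track episode rewards reported by the experience source and stop once CartPole is solved; that bookkeeping is omitted here for brevity.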

Other RL libraries

As we discussed earlier, there are several RL-specific libraries available. Overall, TensorFlow is more popular than PyTorch, as it is more widespread in the deep learning community. The following is my (very biased) list of libraries:

  • Keras-RL: started by Matthias Plappert in 2016, this includes basic deep RL methods. As suggested by the name, this library was implemented using Keras, which is a higher-level wrapper around TensorFlow (https://github.com/keras-rl/keras-rl).
  • Dopamine: a library from Google published in 2018. It is TensorFlow-specific, which is not surprising for a library from Google (https://github.com/google/dopamine).
  • Ray: a library for distributed execution of machine learning code. It includes RL utilities as part of the library (https://github.com/ray-project/ray).
  • TF-Agents: another library from Google published in 2018 (https://github.com/tensorflow/agents).
  • ReAgent: a library from Facebook Research. It uses PyTorch...

Summary

In this chapter, we talked about higher-level RL libraries, their motivation, and their requirements. Then we took a deep look into the PTAN library, which will be used in the rest of the book to simplify example code.

In the next chapter, we will return to DQN methods by exploring extensions that researchers and practitioners have discovered since the classic DQN introduction to improve the stability and performance of the method.
