Higher-Level RL Libraries

In Chapter 6, Deep Q-Networks, we implemented the deep Q-network (DQN) model published by DeepMind in 2015 (https://deepmind.com/research/publications/playing-atari-deep-reinforcement-learning). This paper had a significant effect on the RL field by demonstrating that, despite common belief, it's possible to use nonlinear approximators in RL. This proof of concept stimulated great interest in the deep Q-learning field and in deep RL in general.

In this chapter, we will take another step towards practical RL by discussing higher-level RL libraries, which will allow you to build your code from higher-level blocks and focus on the details of the method that you are implementing. Most of the chapter describes the PyTorch Agent Net (PTAN) library, which will be used in the rest of the book to avoid code repetition, so it will be covered in detail.

We will cover:

  • The motivation for using high-level libraries, rather than reimplementing everything...

Why RL libraries?

Our implementation of basic DQN in Chapter 6, Deep Q-Networks, wasn't very long or complicated: about 200 lines of training code plus 120 lines in environment wrappers. When you are becoming familiar with RL methods, it is very useful to implement everything yourself to understand how things actually work. However, the more involved you become in the field, the more often you will realize that you are writing the same code over and over again.

This repetition comes from the generality of RL methods. As we already discussed in Chapter 1, What Is Reinforcement Learning?, RL is quite flexible and many real-life problems fall into the environment-agent interaction scheme. RL methods don't make many assumptions about the specifics of observations and actions, so code implemented for the CartPole environment will be applicable to Atari games (maybe with some minor tweaks).

Writing the same code over and over again is not very efficient, as bugs might...

The PTAN library

The library is available on GitHub: https://github.com/Shmuma/ptan. All the subsequent examples were implemented using version 0.6 of PTAN, which can be installed in your virtual environment by running the following:

pip install ptan==0.6
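
If you want to verify the installation, a quick check from the Python interpreter is enough. The pkg_resources lookup below is just one convenient way to read the installed version; it is not something PTAN itself provides:

import pkg_resources
import ptan  # will fail here if the package (or its torch/gym dependencies) is missing

print(pkg_resources.get_distribution("ptan").version)  # expected output: 0.6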

The original goal of PTAN was to simplify my RL experiments, and it tries to keep the balance between two extremes:

  • Import the library and then write one line with tons of parameters to train one of the provided methods, like DQN (a very vivid example is the OpenAI Baselines project)
  • Implement everything from scratch

The first approach is very inflexible. It works well when you are using the library the way it is supposed to be used. But if you want to do something fancy, you will quickly find yourself hacking the library and fighting with the constraints that I imposed, rather than solving the problem you want to solve.

The second extreme gives too much freedom and requires implementing replay buffers...
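
To make the cost of that second extreme concrete, here is roughly the kind of replay buffer you end up rewriting in every from-scratch project. This is only an illustrative sketch (the class name and the transition layout are arbitrary), not PTAN's implementation:

import collections
import random

class SimpleReplayBuffer:
    # A bare-bones buffer: store transitions, sample uniform random minibatches
    def __init__(self, capacity):
        self.buffer = collections.deque(maxlen=capacity)

    def append(self, transition):
        # transition could be a (state, action, reward, done, next_state) tuple
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

PTAN provides this and the other recurring pieces (agents, action selectors, experience sources) out of the box, so you only write the parts that are specific to your method.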

The PTAN CartPole solver

Let's now take the PTAN classes (without Ignite so far) and try to combine everything together to solve our first environment: CartPole. The complete code is in Chapter07/06_cartpole.py. I will show only the important parts of the code related to the material that we have just covered.

# The Q-network: the simple two-layer feed-forward NN used for CartPole before
net = Net(obs_size, HIDDEN_SIZE, n_actions)
# Target network keeping a periodically synced copy of the weights
tgt_net = ptan.agent.TargetNet(net)
# Greedy selector wrapped into epsilon-greedy for exploration (epsilon starts at 1)
selector = ptan.actions.ArgmaxActionSelector()
selector = ptan.actions.EpsilonGreedyActionSelector(
    epsilon=1, selector=selector)
# The agent converts observations into actions using the network and the selector
agent = ptan.agent.DQNAgent(net, selector)
# The experience source produces (first state, action, discounted reward, last state) transitions
exp_source = ptan.experience.ExperienceSourceFirstLast(
    env, agent, gamma=GAMMA)
# The replay buffer stores those transitions and samples training batches from them
buffer = ptan.experience.ExperienceReplayBuffer(
    exp_source, buffer_size=REPLAY_SIZE)

In the beginning, we create the NN (the simple two-layer feed-forward NN that we used for CartPole before), the target NN, the epsilon-greedy action selector, and the DQNAgent. Then the experience source and replay...
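
The rest of Chapter07/06_cartpole.py drives these objects in a training loop. The following condensed sketch shows how such a loop can be organized around the PTAN pieces created above; the hyperparameter values and the batch-unpacking code are illustrative and not necessarily identical to the book's example:

import numpy as np
import torch
import torch.nn.functional as F

BATCH_SIZE = 16        # illustrative values, not the book's exact settings
SYNC_EVERY = 10
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

step = 0
while True:
    step += 1
    # ask the experience source (through the buffer) to play one step in the env
    buffer.populate(1)
    if len(buffer) < BATCH_SIZE:
        continue

    # sample a batch of ExperienceFirstLast entries and unpack them into tensors
    batch = buffer.sample(BATCH_SIZE)
    states = torch.tensor(np.array([e.state for e in batch], dtype=np.float32))
    actions = torch.tensor([e.action for e in batch])
    rewards = torch.tensor([e.reward for e in batch], dtype=torch.float32)
    dones = torch.tensor([e.last_state is None for e in batch])
    next_states = torch.tensor(np.array(
        [e.state if e.last_state is None else e.last_state for e in batch],
        dtype=np.float32))

    # standard DQN target: r + gamma * max_a' Q_tgt(s', a'), zeroed for terminal steps
    q_v = net(states).gather(1, actions.unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        next_q_v = tgt_net.target_model(next_states).max(1)[0]
        next_q_v[dones] = 0.0
        target_q = rewards + GAMMA * next_q_v

    loss = F.mse_loss(q_v, target_q)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # decay exploration and periodically sync the target network
    selector.epsilon = max(0.02, selector.epsilon - 1e-4)
    if step % SYNC_EVERY == 0:
        tgt_net.sync()

A real run would also track episode rewards reported by the experience source and stop once CartPole is solved; that bookkeeping is omitted here for brevity.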

Other RL libraries

As we discussed earlier, there are several RL-specific libraries available. Overall, TensorFlow is more popular than PyTorch, as it is more widespread in the deep learning community. The following is my (very biased) list of libraries:

  • Keras-RL: started by Matthias Plappert in 2016, this includes basic deep RL methods. As suggested by the name, this library was implemented using Keras, which is a higher-level wrapper around TensorFlow (https://github.com/keras-rl/keras-rl).
  • Dopamine: a library from Google published in 2018. It is TensorFlow-specific, which is not surprising for a library from Google (https://github.com/google/dopamine).
  • Ray: a library for distributed execution of machine learning code. It includes RL utilities as part of the library (https://github.com/ray-project/ray).
  • TF-Agents: another library from Google published in 2018 (https://github.com/tensorflow/agents).
  • ReAgent: a library from Facebook Research. It uses PyTorch...

Summary

In this chapter, we talked about higher-level RL libraries, their motivation, and their requirements. Then we took a deep look into the PTAN library, which will be used in the rest of the book to simplify example code.

In the next chapter, we will return to DQN methods by exploring extensions that researchers and practitioners have discovered since the classic DQN introduction to improve the stability and performance of the method.
