Deep Reinforcement Learning Hands-On, Second Edition

Product type: Book
Published: Jan 2020
Publisher: Packt
ISBN-13: 9781838826994
Pages: 826
Edition: 2nd
Author: Maxim Lapan

Table of Contents (28 chapters)

Preface
1. What Is Reinforcement Learning?
2. OpenAI Gym
3. Deep Learning with PyTorch
4. The Cross-Entropy Method
5. Tabular Learning and the Bellman Equation
6. Deep Q-Networks
7. Higher-Level RL Libraries
8. DQN Extensions
9. Ways to Speed up RL
10. Stocks Trading Using RL
11. Policy Gradients – an Alternative
12. The Actor-Critic Method
13. Asynchronous Advantage Actor-Critic
14. Training Chatbots with RL
15. The TextWorld Environment
16. Web Navigation
17. Continuous Action Space
18. RL in Robotics
19. Trust Regions – PPO, TRPO, ACKTR, and SAC
20. Black-Box Optimization in RL
21. Advanced Exploration
22. Beyond Model-Free – Imagination
23. AlphaGo Zero
24. RL in Discrete Optimization
25. Multi-agent RL
Other Books You May Enjoy
Index

Preface

The topic of this book is reinforcement learning (RL), which is a subfield of machine learning (ML); it focuses on the general and challenging problem of learning optimal behavior in a complex environment. The learning process is driven only by the reward value and observations obtained from the environment. This model is very general and can be applied to many practical situations, from playing games to optimizing complex manufacturing processes.
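The interaction model described above, an agent receiving observations and rewards from an environment and acting on it, can be sketched in a few lines of Python. The environment and the random agent here are my own toy stand-ins for illustration, not code from the book:

```python
import random

class CoinFlipEnv:
    """Toy environment: guess a coin flip; reward 1.0 for a correct guess."""
    def reset(self):
        self.coin = random.choice([0, 1])
        return 0  # the observation carries no useful information here

    def step(self, action):
        reward = 1.0 if action == self.coin else 0.0
        self.coin = random.choice([0, 1])
        return 0, reward, False  # observation, reward, done flag

env = CoinFlipEnv()
obs = env.reset()
total_reward = 0.0
for _ in range(100):
    action = random.choice([0, 1])      # a random agent; RL replaces this
    obs, reward, done = env.step(action)
    total_reward += reward              # the only learning signal available
print(total_reward)
```

Everything an RL method has to work with is this stream of observations and rewards; the chapters that follow are about replacing the random action choice with learned behavior.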

Due to its flexibility and generality, the field of RL is developing very quickly and attracting lots of attention, both from researchers who are trying to improve existing methods or create new methods and from practitioners interested in solving their problems in the most efficient way.

Why I wrote this book

This book was written as an attempt to fill the obvious gap in practical and structured information about RL methods and approaches. On the one hand, there is lots of research activity all around the world. New research papers are being published almost every day, and a large portion of deep learning (DL) conferences, such as Neural Information Processing Systems (NeurIPS) or the International Conference on Learning Representations (ICLR), are dedicated to RL methods. There are also several large research groups focusing on the application of RL methods to robotics, medicine, multi-agent systems, and others.

Information about recent research is widely available, but it is too specialized and abstract to be easily understandable. Even worse is the situation surrounding the practical aspects of RL, as it is not always obvious how to make the step from an abstract method, described in its mathematics-heavy form in a research paper, to a working implementation solving an actual problem.

This makes it hard for somebody interested in the field to get a clear understanding of the methods and ideas behind papers and conference talks. There are some very good blog posts about various RL aspects that are illustrated with working examples, but the limited format of a blog post allows authors to describe only one or two methods, without building a complete structured picture and showing how different methods are related to each other. This book is my attempt to address this issue.

The approach

Another aspect of the book is its orientation to practice. Every method is implemented for various environments, from the very trivial to the quite complex. I’ve tried to make the examples clean and easy to understand, which was made possible by the expressiveness and power of PyTorch. On the other hand, the complexity and requirements of the examples are oriented to RL hobbyists without access to very large computational resources, such as clusters of graphics processing units (GPUs) or very powerful workstations. This, I believe, will make the fun-filled and exciting RL domain accessible to a much wider audience than just research groups or large artificial intelligence companies. This is still deep RL, so access to a GPU is highly recommended. Approximately half of the examples in the book will benefit from being run on a GPU.

In addition to traditional medium-sized examples of environments used in RL, such as Atari games or continuous control problems, the book contains several chapters (10, 14, 15, 16, and 18) that contain larger projects, illustrating how RL methods can be applied to more complicated environments and tasks. These examples are still not full-sized, real-life projects (they would occupy a separate book on their own), but just larger problems illustrating how the RL paradigm can be applied to domains beyond the well-established benchmarks.

Another thing to note about the examples in the first three parts of the book is that I’ve tried to make them self-contained, with the source code shown in full. Sometimes this has led to the repetition of code pieces (for example, the training loop is very similar in most of the methods), but I believe that giving you the freedom to jump directly into the method you want to learn is more important than avoiding a few repetitions. All examples in the book are available on GitHub: https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On-Second-Edition, and you’re welcome to fork them, experiment, and contribute.

Who this book is for

The main target audience is people who have some knowledge of ML, but want to get a practical understanding of the RL domain. The reader should be familiar with Python and the basics of DL and ML. An understanding of statistics and probability is an advantage, but is not absolutely essential for understanding most of the book’s material.

What this book covers

Chapter 1, What Is Reinforcement Learning?, contains an introduction to RL ideas and the main formal models.

Chapter 2, OpenAI Gym, introduces the practical aspects of RL, using the open source library Gym.

Chapter 3, Deep Learning with PyTorch, gives a quick overview of the PyTorch library.

Chapter 4, The Cross-Entropy Method, introduces one of the simplest methods in RL to give you an impression of RL methods and problems.
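To give a taste of the idea behind the method, here is a minimal cross-entropy sketch on a toy optimization problem (my own illustration, not the book's code): repeatedly sample candidates from a Gaussian, keep an elite fraction by score, and refit the Gaussian to the elites.

```python
import random
import statistics

def score(x):
    # Toy objective with its maximum at x = 3
    return -(x - 3.0) ** 2

random.seed(0)  # for reproducibility of this sketch
mu, sigma = 0.0, 5.0
for _ in range(30):
    samples = [random.gauss(mu, sigma) for _ in range(100)]
    elites = sorted(samples, key=score, reverse=True)[:20]  # top 20%
    mu = statistics.mean(elites)             # refit the distribution
    sigma = statistics.stdev(elites) + 1e-3  # small floor avoids collapse
print(round(mu, 2))  # converges near 3.0
```

In the chapter, the same select-the-elite-and-refit loop is applied to whole episodes, with a neural network in place of the Gaussian.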

Chapter 5, Tabular Learning and the Bellman Equation, introduces the value-based family of RL methods.
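As a flavor of what value-based methods look like, here is a minimal value-iteration sketch for a tiny deterministic chain MDP (my own illustration; the chapter develops the theory properly):

```python
# States 0..3 in a chain; moving right from state 2 into the terminal
# state 3 yields reward 1.0. Value iteration applies the Bellman update
# V(s) = r + gamma * V(s') until the values stop changing.
GAMMA = 0.9
N = 4
values = [0.0] * N

for _ in range(100):
    for s in range(N - 1):  # state 3 is terminal, its value stays 0
        reward = 1.0 if s + 1 == N - 1 else 0.0
        values[s] = reward + GAMMA * values[s + 1]  # Bellman update

print([round(v, 3) for v in values])  # [0.81, 0.9, 1.0, 0.0]
```

The values decay geometrically with distance from the reward, which is exactly the discounting the Bellman equation encodes.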

Chapter 6, Deep Q-Networks, describes deep Q-networks (DQNs), an extension of the basic value-based methods, allowing us to solve a complicated environment.

Chapter 7, Higher-Level RL Libraries, describes the library PTAN, which we will use in the book to simplify the implementations of RL methods.

Chapter 8, DQN Extensions, gives a detailed overview of a modern extension to the DQN method, to improve its stability and convergence in complex environments.

Chapter 9, Ways to Speed up RL, provides an overview of ways to make the execution of RL code faster.

Chapter 10, Stocks Trading Using RL, is the first practical project and focuses on applying the DQN method to stock trading.

Chapter 11, Policy Gradients – an Alternative, introduces another family of RL methods that is based on policy learning.

Chapter 12, The Actor-Critic Method, describes one of the most widely used methods in RL.

Chapter 13, Asynchronous Advantage Actor-Critic, extends the actor-critic method with parallel environment communication, which improves stability and convergence.

Chapter 14, Training Chatbots with RL, is the second project and shows how to apply RL methods to natural language processing problems.

Chapter 15, The TextWorld Environment, covers the application of RL methods to interactive fiction games.

Chapter 16, Web Navigation, is another long project that applies RL to web page navigation using the MiniWoB set of tasks.

Chapter 17, Continuous Action Space, describes the specifics of environments using continuous action spaces and various methods.

Chapter 18, RL in Robotics, covers the application of RL methods to robotics problems. In this chapter, I describe the process of building and training a small hardware robot with RL methods.

Chapter 19, Trust Regions – PPO, TRPO, ACKTR, and SAC, is yet another chapter about continuous action spaces describing the trust region set of methods.

Chapter 20, Black-Box Optimization in RL, shows another set of methods that don’t use gradients in their explicit form.

Chapter 21, Advanced Exploration, covers different approaches that can be used for better exploration of the environment.

Chapter 22, Beyond Model-Free – Imagination, introduces the model-based approach to RL and uses recent research results about imagination in RL.

Chapter 23, AlphaGo Zero, describes the AlphaGo Zero method and applies it to the game Connect 4.

Chapter 24, RL in Discrete Optimization, describes the application of RL methods to the domain of discrete optimization, using the Rubik’s Cube as an environment.

Chapter 25, Multi-agent RL, introduces a relatively new direction of RL methods for situations with multiple agents.

To get the most out of this book

All the chapters in this book describing RL methods have the same structure: in the beginning, we discuss the motivation of the method, its theoretical foundation, and the idea behind it. Then, we follow several examples of the method applied to different environments with the full source code.

You can use the book in different ways:

  1. To quickly become familiar with a method, you can read only the introductory part of the relevant chapter.
  2. To get a deeper understanding of the way the method is implemented, you can read the code and the comments around it.
  3. To gain a deep familiarity with the method (the best way to learn, I believe), you can try to reimplement the method and make it work, using the provided source code as a reference point.

In any case, I hope the book will be useful for you!

Download the example code files

You can download the example code files for this book from your account at www.packt.com/. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at http://www.packt.com.
  2. Select the Support tab.
  3. Click on Code Downloads.
  4. Enter the name of the book in the Search box and follow the on-screen instructions.

Once the file is downloaded, please make sure that you extract it using the latest version of:

  • WinRAR / 7-Zip for Windows
  • Zipeg / iZip / UnRarX for Mac
  • 7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On-Second-Edition. In case there’s an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781838826994_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: “Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system.”

A block of code is set as follows:

def grads_func(proc_name, net, device, train_queue):
    envs = [make_env() for _ in range(NUM_ENVS)]
    agent = ptan.agent.PolicyAgent(
        lambda x: net(x)[0], device=device, apply_softmax=True)
    exp_source = ptan.experience.ExperienceSourceFirstLast(
        envs, agent, gamma=GAMMA, steps_count=REWARD_STEPS)
    batch = []
    frame_idx = 0
    writer = SummaryWriter(comment=proc_name)

Any command-line input or output is written as follows:

rl_book_samples/Chapter11$ ./02_a3c_grad.py --cuda -n final

Bold: Indicates a new term, an important word, or words that you see on the screen, for example, in menus or dialog boxes. For example: “Select System info from the Administration panel.”

Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.
