Chapter 13: Exploring Advanced Topics

In this chapter, we cover several advanced topics in reinforcement learning. First, we go deeper into distributed reinforcement learning, building on our discussion in the previous chapters; it is a key topic for creating scalable training architectures. Next, we present curiosity-driven reinforcement learning to handle hard-exploration problems that are not solvable by traditional exploration techniques. Finally, we discuss offline reinforcement learning, which leverages offline datasets rather than environment interactions to obtain good policies. All of these are hot research areas that you will hear more about over the next several years.

So, in this chapter, you will learn about the following:

  • Diving deeper into distributed reinforcement learning
  • Exploring curiosity-driven reinforcement learning
  • Offline reinforcement learning

Let's get started!

Diving deeper into distributed reinforcement learning

As we mentioned in the earlier chapters, training sophisticated reinforcement learning agents requires massive amounts of data. While one critical area of research is increasing sample efficiency in RL, a complementary direction is how to best utilize compute power and parallelization to reduce the wall-clock time and cost of training. We already covered, implemented, and used distributed RL algorithms and libraries in the earlier chapters, so this section extends those discussions given the importance of the topic. Here, we present additional material on state-of-the-art distributed RL architectures, algorithms, and libraries, starting with SEED RL, an architecture designed for massive and efficient parallelization.
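Before diving in, here is a quick refresher on what distributed training has looked like in code so far: a minimal sketch of scaling up experience collection with Ray RLlib, the library we used in the earlier chapters. The environment name, worker counts, and stopping criterion below are placeholders, and the exact config keys can vary between RLlib versions.

```python
import ray
from ray import tune

ray.init()

# Scale data collection by adding rollout workers, while learning
# remains centralized on the driver process (and its GPU, if any).
tune.run(
    "APEX",  # distributed prioritized experience replay (Ape-X DQN)
    stop={"timesteps_total": 2_000_000},
    config={
        "env": "CartPole-v0",      # placeholder environment
        "num_workers": 8,          # parallel rollout workers collecting experience
        "num_envs_per_worker": 4,  # vectorized environments inside each worker
        "num_gpus": 1,             # GPU(s) reserved for the central learner
    },
)
```

Adding workers increases the experience throughput feeding the learner; the architectures below are about turning that knob much further while keeping the hardware efficiently utilized.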

Scalable, efficient deep reinforcement learning: SEED RL

Let's begin the discussion by revisiting the Ape-X architecture...
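As a reminder, in Ape-X each actor keeps its own copy of the policy network, runs inference locally to select actions, and ships the resulting experience to a central replay buffer. SEED RL moves that inference to the central learner: actors only step their environments and stream observations to the learner, which batches them and evaluates the policy in one place on an accelerator. The following toy sketch, which is not the official SEED RL implementation, illustrates only this data flow; the policy, environment steps, and batching logic are stand-ins:

```python
import queue
import threading
import numpy as np

def toy_policy(obs_batch, params):
    # Stand-in for a neural network forward pass on an accelerator.
    return np.argmax(obs_batch @ params, axis=1)

def actor(actor_id, obs_q, act_qs, n_steps):
    # SEED RL-style actor: it only steps the environment and holds no policy copy.
    obs = np.random.rand(4)                  # stand-in for env.reset()
    for _ in range(n_steps):
        obs_q.put((actor_id, obs))           # ship the observation to the central learner
        action = act_qs[actor_id].get()      # wait for the centrally computed action
        obs = np.random.rand(4)              # stand-in for env.step(action)

def central_inference(obs_q, act_qs, params, total_requests):
    # Central learner: batches observations from all actors and runs inference in one place.
    served = 0
    while served < total_requests:
        requests = [obs_q.get()]
        while not obs_q.empty():             # opportunistically batch whatever has queued up
            requests.append(obs_q.get())
        ids, observations = zip(*requests)
        actions = toy_policy(np.stack(observations), params)
        for actor_id, act in zip(ids, actions):
            act_qs[actor_id].put(act)
        served += len(requests)

n_actors, n_steps = 3, 5
params = np.random.rand(4, 2)                # stand-in for the policy parameters
obs_q = queue.Queue()
act_qs = [queue.Queue() for _ in range(n_actors)]

actor_threads = [threading.Thread(target=actor, args=(i, obs_q, act_qs, n_steps))
                 for i in range(n_actors)]
learner_thread = threading.Thread(target=central_inference,
                                  args=(obs_q, act_qs, params, n_actors * n_steps))
for t in actor_threads:
    t.start()
learner_thread.start()
for t in actor_threads:
    t.join()
learner_thread.join()
```

Because the policy now lives in a single place, actors become very lightweight and the learner's accelerators can be kept busy with large inference batches, at the cost of an extra round trip per environment step.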

Exploring curiosity-driven reinforcement learning

When we discussed the R2D2 agent, we mentioned that there were only a few Atari games left in the benchmark set in which the agent could not exceed human performance. The remaining challenge for the agent was to solve hard-exploration problems, which have very sparse and/or misleading rewards. Later work out of Google DeepMind addressed those challenges as well, with agents called Never Give Up (NGU) and Agent57 reaching superhuman performance in all 57 games used in the benchmark. In this section, we are going to discuss these agents and the methods they used for effective exploration.
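At the heart of these methods is the idea of giving the agent an intrinsic reward for visiting states it cannot yet predict well. To give a concrete flavor before we define things properly, here is a minimal sketch of such a prediction-error-based bonus in the spirit of random network distillation (see the references at the end of the chapter); the network sizes, dimensions, and learning rate are toy stand-ins rather than the exact NGU/Agent57 formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed, randomly initialized "target" network and a trainable "predictor";
# both are single linear layers here purely for illustration.
W_target = rng.normal(size=(4, 8))
W_pred = rng.normal(size=(4, 8)) * 0.1

def intrinsic_reward(obs, lr=0.1):
    """Curiosity bonus = the predictor's error on the target's features for this state."""
    global W_pred
    target_feat = obs @ W_target            # fixed random embedding of the observation
    pred_feat = obs @ W_pred                # predictor's current guess
    error = pred_feat - target_feat
    bonus = float(np.mean(error ** 2))      # large for novel states, small for familiar ones
    W_pred -= lr * 2 * np.outer(obs, error) / error.size  # one gradient step on the MSE
    return bonus

obs = rng.normal(size=4)                    # stand-in for an observation from the environment
print([round(intrinsic_reward(obs), 3) for _ in range(5)])  # the bonus shrinks on repeats
```

The agent is then trained on the extrinsic reward from the environment plus a scaled version of this bonus, which stays large for novel states and decays as states become familiar.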

Let's dive in by describing the concepts of hard-exploration and curiosity-driven learning.

Curiosity-driven learning for hard-exploration problems

Let's consider the simple grid world illustrated in Figure 13.7:

Figure 13.7 – A hard-exploration grid-world problem

Assume the following...

Offline reinforcement learning

Offline reinforcement learning is about training agents using data recorded during the prior interactions of some agent with the environment (likely a non-RL agent, such as a human), as opposed to interacting with the environment directly. It is also called batch reinforcement learning. In this section, we look into some of the key components of offline RL, starting with an overview of how it works.
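To keep one concrete picture in mind throughout, here is a minimal sketch of what "offline" means mechanically: the training loop sweeps over a fixed log of transitions and never calls the environment. The tiny tabular dataset and Q-learning update below are illustrative stand-ins for the deep, real-world case.

```python
import numpy as np

# Logged transitions (state, action, reward, next_state) produced by some prior
# agent, for example a human operator or a rule-based controller.
dataset = [
    (0, 1, 0.0, 1),
    (1, 0, 0.0, 2),
    (2, 1, 1.0, 3),
    (3, 0, 0.0, 3),
]

n_states, n_actions, gamma, lr = 4, 2, 0.99, 0.1
Q = np.zeros((n_states, n_actions))

# Offline training loop: repeatedly sweep the fixed batch; env.step() is never called.
for _ in range(200):
    for s, a, r, s_next in dataset:
        td_target = r + gamma * Q[s_next].max()
        Q[s, a] += lr * (td_target - Q[s, a])

# A greedy policy extracted purely from the logged data. State-action pairs that
# never appear in the log keep their initial value and are never corrected.
policy = Q.argmax(axis=1)
print(policy)
```

State-action pairs that never appear in the log are never updated, which already hints at why the coverage and quality of the dataset matter so much in this setting.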

An overview of how offline reinforcement learning works

In offline RL, the agent does not directly interact with the environment to explore and learn a policy. Figure 13.12 contrasts this to on-policy and off-policy settings.

Figure 13.12 – Comparison of on-policy, off-policy, and offline deep RL (adapted from Levine 2020).

Let's unpack what this figure illustrates:

  • In on-policy RL, the agent collects a batch of experiences with each policy. Then, it uses this batch to update the policy. This cycle repeats until...

Summary

In this chapter, we covered several advanced topics that are hot areas of research. Distributed reinforcement learning is key to scaling RL experiments efficiently. Curiosity-driven RL makes it possible to solve hard-exploration problems through effective exploration strategies. And finally, offline RL has the potential to transform how RL is used for real-world problems by leveraging the data logs already available for many processes.

With this chapter, we conclude the part of our book on algorithmic and theoretical discussions. The remaining chapters will be more applied, starting with robotics applications in the next chapter.

References

  1. Espeholt, Lasse, et al. (2020). SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference. arXiv. URL: http://arxiv.org/abs/1910.06591.
  2. Weng, Lilian. (2020). Exploration Strategies in Deep Reinforcement Learning. Lil'Log. URL: https://bit.ly/3mRohHL.
  3. DeepMind. (2020). Agent57: Outperforming the Human Atari Benchmark. DeepMind Blog. URL: https://bit.ly/3mVaZu4.
  4. OpenAI. (2018). Reinforcement Learning with Prediction-Based Rewards. OpenAI Blog. URL: https://openai.com/blog/reinforcement-learning-with-prediction-based-rewards/.
  5. Pathak, D. (2018). Large-Scale Study of Curiosity-Driven Learning. YouTube. URL: https://youtu.be/C3yKgCzvE_E.
  6. Levine, Sergey. (2020). Decisions from Data: How Offline Reinforcement Learning Will Change How We Use Machine Learning. Medium. URL: https://bit.ly/3gjq8Tk.
  7. Agarwal, R. et al. (2020). Offline Reinforcement Learning Workshop. Neural Information Processing Systems (NeurIPS). URL: https://offline...