Chapter 11: Achieving Generalization and Overcoming Partial Observability

Deep reinforcement learning (RL) has achieved what was impossible with earlier AI methods, such as beating world champions in games like Go, Dota 2, and StarCraft II. Yet, applying RL to real-world problems is still challenging. Two important obstacles are generalizing trained policies to a broad set of environment conditions and developing policies that can handle partial observability. As we will see in this chapter, these are closely related challenges, for which we will present solution approaches.

Here is what we will cover in this chapter:

  • Focusing on generalization in reinforcement learning
  • Enriching agent experience via domain randomization
  • Using memory to overcome partial observability
  • Quantifying generalization via CoinRun

These topics are critical to understand for a successful implementation of RL in real-world settings. So, let's dive right in...

Focusing on generalization in reinforcement learning

The core goal in most machine learning projects is to obtain models that work beyond training, under a broad set of conditions at test time. Yet, when you start learning about RL, efforts to prevent overfitting and achieve generalization are not always at the forefront of the discussion, unlike in supervised learning. In this section, we discuss what leads to this discrepancy, describe how generalization is closely related to partial observability in RL, and present a general recipe to handle these challenges.

Generalization and overfitting in supervised learning

When we train an image recognition or forecasting model, what we really want to achieve is high accuracy on unseen data. After all, we already know the labels for the data at hand. We use various methods to this end:

  • We use separate training, dev, and test sets for model training, hyperparameter selection, and model performance...

Enriching agent experience via domain randomization

Domain randomization (DR) is simply about randomizing the parameters defining (part of) the environment during training to enrich the training data. It is a useful technique for obtaining policies that are robust and generalizable in both fully and partially observable environments; a minimal wrapper implementing the idea is sketched below. In this section, we first present a classification of such parameters, in other words, different dimensions of randomization. Then, we discuss two curriculum learning approaches to guide RL training along those dimensions.
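As a concrete illustration, here is a minimal sketch of DR as a gym wrapper that re-samples environment parameters at every episode reset. The parameter names and the set_parameter hook are hypothetical; substitute whatever interface your simulator exposes for overriding such constants.

```python
# A minimal sketch of domain randomization as a gym wrapper.
import gym
import numpy as np


class DomainRandomizationWrapper(gym.Wrapper):
    """Re-samples environment parameters at every episode reset."""

    def __init__(self, env, param_ranges):
        super().__init__(env)
        # param_ranges maps a parameter name to a (low, high) interval,
        # e.g., {"gravity": (8.0, 12.0), "friction": (0.5, 1.5)}.
        self.param_ranges = param_ranges

    def reset(self, **kwargs):
        # Draw a fresh value for each randomized parameter and push it
        # into the underlying simulator before the episode starts.
        for name, (low, high) in self.param_ranges.items():
            value = np.random.uniform(low, high)
            # set_parameter is a hypothetical hook; replace it with the
            # mechanism your simulator provides.
            self.env.unwrapped.set_parameter(name, value)
        return self.env.reset(**kwargs)
```

With such a wrapper, each episode exposes the agent to a slightly different environment, which is exactly the kind of experience enrichment DR is after.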

Dimensions of randomization

Rivlin (2019) offers a useful categorization of how two environments belonging to the same problem class (e.g., autonomous driving) can differ, as follows.

Different observations for the same/similar states

In this case, two environments emit different observations although the underlying state and transition functions are the same or very similar. An example of this is the same Atari game scene but with different...

Using memory to overcome partial observability

A memory is nothing but a way of processing a sequence of observations as the input to the agent policy. If you have worked with other types of sequence data with neural networks, such as in time series prediction or natural language processing (NLP), you can adopt similar approaches to use an observation memory as the input to your RL model; a quick way of doing this in RLlib is sketched below.
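For instance, RLlib, which we use throughout the book, can attach an LSTM memory to the policy network through its model config. Below is a minimal sketch; the environment name and the hyperparameter values are only illustrative.

```python
# A minimal sketch of enabling recurrent memory in an RLlib policy.
import ray
from ray import tune

ray.init()
tune.run(
    "PPO",
    stop={"timesteps_total": 1000000},
    config={
        "env": "CartPole-v0",  # substitute your (partially observable) env
        "model": {
            "use_lstm": True,       # feed observation sequences to an LSTM
            "lstm_cell_size": 256,  # size of the LSTM state
            "max_seq_len": 20,      # length of the input sequences
        },
    },
)
```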

Let's go into more detail about how this can be done.

Stacking observations

A simple way of passing an observation sequence to the model is to stitch the observations together and treat the stack as a single observation. Denoting the observation at time $t$ as $o_t$, we can form a new observation $\tilde{o}_t$ to be passed to the model as follows:

$$\tilde{o}_t = [o_{t-m+1}, \ldots, o_{t-1}, o_t]$$

where $m$ is the length of the memory. Of course, for $t < m$, we need to somehow initialize the earlier parts of the memory, such as using vectors of zeros that have the same dimension as $o_t$. A minimal gym wrapper implementing this is sketched below.
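The following sketch assumes vector observations and the classic gym API; a complete implementation would also expand the wrapped environment's observation_space accordingly.

```python
# A minimal sketch of observation stacking with a fixed-length deque.
from collections import deque

import gym
import numpy as np


class StackObservations(gym.Wrapper):
    """Concatenates the last m observations into a single model input."""

    def __init__(self, env, m=4):
        super().__init__(env)
        self.m = m
        self.memory = deque(maxlen=m)

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        # For t < m, initialize the memory with zero vectors of the
        # same dimension as the observation.
        for _ in range(self.m - 1):
            self.memory.append(np.zeros_like(obs))
        self.memory.append(obs)
        return np.concatenate(self.memory)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.memory.append(obs)
        # For image observations, you would stack along the channel
        # axis instead of concatenating flat vectors.
        return np.concatenate(self.memory), reward, done, info
```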

In fact, simply stacking observations is how the original DQN work handled...

Quantifying generalization via CoinRun

There are various ways of testing whether certain algorithms/approaches generalize to unseen environment conditions better than others, such as:

  • Creating validation and test environments with separate sets of environment parameters
  • Assessing policy performance in real-life deployment

The latter is not always practical, since real-life deployment may not be an option. The challenge with the former is to achieve consistency and to ensure that the validation/test data are indeed not used in training. It is also possible to overfit to the validation environment when too many models are tried based on validation performance. One approach to overcoming these challenges is to use procedurally generated environments. To this end, OpenAI has created the CoinRun environment to benchmark algorithms on their generalization capabilities, as the sketch after this paragraph illustrates. Let's look into it in more detail.
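For reference, a train/test split on procedurally generated levels could look like the following sketch, which uses the procgen release of CoinRun; the level counts are only illustrative.

```python
# A minimal sketch of a generalization train/test split on the
# procgen version of CoinRun (pip install procgen).
import gym

# Train on a fixed, finite set of 200 procedurally generated levels.
train_env = gym.make("procgen:procgen-coinrun-v0",
                     num_levels=200, start_level=0)

# Evaluate on a disjoint set of levels the agent has never seen.
test_env = gym.make("procgen:procgen-coinrun-v0",
                    num_levels=1000, start_level=10000)
```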

CoinRun environment

In the CoinRun environment, we have...

Summary

In this chapter, we covered important topics in RL: generalization and partial observability, which are key for real-world applications. Note that this is an active research area: treat our discussion here as directional suggestions and as the first methods to try for your problem. New approaches come out periodically, so watch out for them. The important thing is to always keep an eye on generalization and partial observability for a successful RL implementation outside of video games. In the next chapter, we will take our expedition to the next level with meta-learning. So, stay tuned!

References

  1. Cobbe, K., Klimov, O., Hesse, C., Kim, T., & Schulman, J. (2018). Quantifying Generalization in Reinforcement Learning. Retrieved from ArXiv: https://arxiv.org/abs/1812.02341
  2. Lee, K., Lee, K., Shin, J., & Lee, H. (2020). Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning. Retrieved from ArXiv: https://arxiv.org/abs/1910.05396
  3. Rivlin, O. (2019, Nov 21). Generalization in Deep Reinforcement Learning. Retrieved from Towards Data Science: https://towardsdatascience.com/generalization-in-deep-reinforcement-learning-a14a240b155b
  4. Parisotto, E., et al. (2019). Stabilizing Transformers for Reinforcement Learning. Retrieved from ArXiv: https://arxiv.org/abs/1910.06764