Chapter 8: Model-Based Methods

All of the deep reinforcement learning (RL) algorithms we have covered so far were model-free, meaning they did not assume any knowledge of the environment's transition dynamics but instead learned from sampled experiences. In fact, this was a quite deliberate departure from dynamic programming methods, made to free us from requiring a model of the environment. In this chapter, we swing the pendulum back a little and discuss a class of methods that rely on such a model, called model-based methods. In some problems, these methods can improve sample efficiency by several orders of magnitude, which makes them very appealing, especially when collecting experience is costly, as in robotics. Having said that, we still will not assume that such a model is readily available; instead, we will discuss how to learn one. Once we have a model, it can be used for decision-time planning and for improving the performance of model-free methods.

This important...

Introducing model-based methods

Imagine that you are traveling in a car on an undivided road when you face the following situation: a car coming from the opposite direction suddenly approaches you fast in your lane as it passes a truck. Chances are your mind automatically simulates different scenarios of how the next moments might unfold:

  • The other car might go back to its lane right away or drive even faster to pass the truck as soon as possible.
  • Another scenario could be the car swerving to your right, although this is unlikely (in right-hand traffic).

The driver (possibly you) then evaluates the likelihood and risk of each scenario, along with their own possible actions, and makes a decision to continue the journey safely.

In a less sensational example, consider a game of chess. Before making a move, a player "simulates" many scenarios in their head and assesses the possible outcomes of several moves down...

Planning through a model

In this section, we first define what it means to plan through a model in the sense of optimal control. Then, we will cover several planning methods, including the cross-entropy method (CEM) and the covariance matrix adaptation evolution strategy (CMA-ES). You will also see how these methods can be parallelized using the Ray library. Now, let's get started with the problem definition.
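
As a preview of the kind of decision-time planning this section builds up to, here is a minimal CEM sketch that searches for a good action sequence through a model. The `model_step(state, action)` function, the horizon, and the population sizes are illustrative assumptions for this sketch, not the exact implementation we will develop:

```python
import numpy as np

def plan_cem(model_step, state, action_dim, horizon=15, pop_size=200,
             n_elite=20, n_iters=5):
    """Plan an action sequence with the cross-entropy method (CEM).

    model_step(state, action) -> (next_state, reward) stands in for a
    given or learned model of the environment dynamics (an assumption
    for this sketch).
    """
    # Start from a broad Gaussian over action sequences.
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))

    for _ in range(n_iters):
        # Sample a population of candidate action sequences.
        candidates = mean + std * np.random.randn(pop_size, horizon, action_dim)

        # Evaluate each candidate by rolling it out through the model.
        returns = np.zeros(pop_size)
        for i, actions in enumerate(candidates):
            s, total = state, 0.0
            for a in actions:
                s, r = model_step(s, a)
                total += r
            returns[i] = total

        # Refit the Gaussian to the elite (highest-return) candidates.
        elite = candidates[np.argsort(returns)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6

    # Return the first action of the best sequence found (MPC style).
    return mean[0]
```

In a model predictive control setting, the agent executes only this first action in the real environment and replans at the next step.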

Defining the optimal control problem

In RL, or in control problems in general, we care about the actions an agent takes because there is a task that we want it to achieve. We express this task as a mathematical objective so that we can use mathematical tools to figure out the actions that accomplish it; in RL, that objective is the expected sum of discounted rewards. You of course know all this, as it is what we have been doing all along, but this is a good time to reiterate it: we are essentially solving an optimization problem here.
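
To make this concrete, the planning problem can be written roughly as follows, where the notation is illustrative rather than the book's exact formulation: we search for an action sequence over a horizon $T$ that maximizes the expected discounted return under the environment dynamics $p$:

$$\max_{a_0, a_1, \dots, a_T} \; \mathbb{E}\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right] \quad \text{subject to} \quad s_{t+1} \sim p(\cdot \mid s_t, a_t)$$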

Now, let's assume that we are...

Learning a world model

In the introduction to this chapter, we reminded you how we departed from dynamic programming methods to avoid assuming that a model of the environment is available and accessible. Now that we are coming back to models, we also need to discuss how a world model can be learned when it is not available. In particular, in this section, we discuss what we aim to learn as a model, when we may want to learn it, a general procedure for learning a model, how to improve that procedure by incorporating model uncertainty, and what to do when we have complex observations. Let's dive in!

Understanding what model means

Based on what we have done so far, you might think of a model of the environment as something equivalent to the simulation of the environment in your mind. However, model-based methods don't require the full fidelity of a simulation. Instead, what we expect to get from a model is the next state given the current state and action...
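
As a rough illustration of the kind of model we will learn, the sketch below fits a small neural network that predicts the next state and the reward from the current state and action, using transitions already collected from the environment. The architecture, hyperparameters, and function names here are illustrative assumptions, not the exact implementation discussed in this section:

```python
import numpy as np
import tensorflow as tf

def build_dynamics_model(obs_dim, action_dim):
    """A small feed-forward network mapping (state, action) to
    (next_state, reward). Sizes and architecture are illustrative."""
    inputs = tf.keras.Input(shape=(obs_dim + action_dim,))
    x = tf.keras.layers.Dense(256, activation="relu")(inputs)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    # Predict the next state and the scalar reward jointly.
    outputs = tf.keras.layers.Dense(obs_dim + 1)(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

def fit_dynamics_model(model, states, actions, next_states, rewards, epochs=20):
    """Supervised learning on collected transitions: (s, a) -> (s', r)."""
    X = np.concatenate([states, actions], axis=-1)
    y = np.concatenate([next_states, rewards[:, None]], axis=-1)
    model.fit(X, y, batch_size=256, epochs=epochs, verbose=0)
    return model
```

A common extension, which the uncertainty discussion later in this section motivates, is to train an ensemble of such networks on bootstrapped data and use their disagreement as an estimate of model uncertainty.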

Unifying model-based and model-free approaches

When we went from dynamic programming-based approaches to Monte Carlo and temporal-difference methods in Chapter 5, Solving the Reinforcement Learning Problem, our motivation was that it is limiting to assume that the environment's transition probabilities are known. Now that we know how to learn the environment dynamics, we will leverage that to find a middle ground. It turns out that with a learned model of the environment, learning with model-free methods can be accelerated. To that end, in this section, we first refresh our minds on Q-learning and then introduce a class of methods called Dyna.

Refresher on Q-learning

Let's start with remembering the definition of the action-value function:

$$q_\pi(s, a) = \mathbb{E}\left[R_{t+1} + \gamma\, q_\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a\right]$$

The expectation operator here is needed because the transition into the next state is probabilistic, so $S_{t+1}$ is a random variable, along with $R_{t+1}$. On the other hand, if we know the probability distributions of $S_{t+1}$ and $R_{t+1}$, we can...
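
To preview how a learned model plugs into Q-learning, here is a minimal tabular sketch of the standard Dyna-Q algorithm. The environment interface (a classic Gym-style `env` with discrete, hashable states and the older 4-tuple `step()` return), the hyperparameters, and the table-based "model" are assumptions for illustration, not necessarily the book's implementation:

```python
import random
from collections import defaultdict

def dyna_q(env, episodes=200, alpha=0.1, gamma=0.99,
           epsilon=0.1, planning_steps=10):
    """Tabular Dyna-Q sketch: Q-learning on real experience plus extra
    updates on transitions replayed from a (memorized) model."""
    Q = defaultdict(float)   # Q[(state, action)]
    model = {}               # model[(state, action)] = (reward, next_state)

    def greedy(s):
        return max(range(env.action_space.n), key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            a = env.action_space.sample() if random.random() < epsilon else greedy(s)
            s_next, r, done, _ = env.step(a)

            # (a) Direct RL: the standard Q-learning update on real experience.
            target = r + (0.0 if done else gamma * Q[(s_next, greedy(s_next))])
            Q[(s, a)] += alpha * (target - Q[(s, a)])

            # (b) Model learning: memorize the observed transition
            # (terminal information is ignored for simplicity).
            model[(s, a)] = (r, s_next)

            # (c) Planning: extra Q-learning updates on "imagined"
            # transitions sampled from the model.
            for _ in range(planning_steps):
                (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
                p_target = pr + gamma * Q[(ps_next, greedy(ps_next))]
                Q[(ps, pa)] += alpha * (p_target - Q[(ps, pa)])

            s = s_next
    return Q
```

The `planning_steps` parameter controls how many imagined updates are made per real environment step, which is where the sample-efficiency gain of Dyna comes from.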

Summary

In this chapter, we covered model-based methods. We started the chapter by describing how we humans use the world models in our brains to plan our actions. Then, we introduced several methods that can be used to plan an agent's actions in an environment when a model is available. These were derivative-free search methods, and for CEM and CMA-ES, we implemented parallelized versions. As a natural follow-up, we then looked into how a world model can be learned and used for planning or for developing policies. That section contained some important discussions about model uncertainty and how learned models can suffer from it. At the end of the chapter, we unified the model-free and model-based approaches in the Dyna framework.

As we conclude our discussion on model-based RL, we proceed to the next chapter for yet another exciting topic: multi-agent RL. Take a break, and we will see you soon!

References

  1. Levine, Sergey. (2019). Optimal Control and Planning. CS285 Fa19 10/2/19. YouTube. URL: https://youtu.be/pE0GUFs-EHI
  2. Levine, Sergey. (2019). Model-Based Reinforcement Learning. CS285 Fa19 10/7/19. YouTube. URL: https://youtu.be/6JDfrPRhexQ
  3. Levine, Sergey. (2019). Model-Based Policy Learning. CS285 Fa19 10/14/19. YouTube. URL: https://youtu.be/9AbBfIgTzoo
  4. Ha, David, and Jürgen Schmidhuber. (2018). World Models. arXiv.org. URL: https://arxiv.org/abs/1803.10122
  5. Mania, Horia, et al. (2018). Simple Random Search Provides a Competitive Approach to Reinforcement Learning. arXiv.org. URL: http://arxiv.org/abs/1803.07055
  6. Jospin, Laurent Valentin, et al. (2020). Hands-on Bayesian Neural Networks – a Tutorial for Deep Learning Users. arXiv.org. URL: http://arxiv.org/abs/2007.06823
  7. Joseph, Trist'n. (2020). Bootstrapping Statistics. What It Is and Why It's Used. Medium. URL: https://bit.ly/3fOlvjK
  8. Sutton, Richard S. (1991). Dyna...