
You're reading from Reinforcement Learning Algorithms with Python
Book, 1st edition, published in Oct 2019 by Packt. ISBN-13: 9781789131116. Reading level: Beginner.
Author: Andrea Lonza

Andrea Lonza is a deep learning engineer with a great passion for artificial intelligence and a desire to create machines that act intelligently. He has acquired expert knowledge in reinforcement learning, natural language processing, and computer vision through academic and industrial machine learning projects. He has also participated in several Kaggle competitions, achieving high results. He is always looking for compelling challenges and loves to prove himself.

Model-Based RL

Reinforcement learning algorithms are divided into two classes: model-free methods and model-based methods. The two classes differ in the assumptions they make about the model of the environment. Model-free algorithms learn a policy purely from interactions with the environment, without knowing anything about its dynamics, whereas model-based algorithms rely on a model of the environment's dynamics (either given or learned) and use that knowledge to decide on their next actions.
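
To make the distinction concrete, here is a minimal sketch in Python. Everything in it is illustrative rather than code from this book: a tabular Q-function stands in for a model-free learner, and placeholder dynamics, reward, and value functions stand in for a model of the environment:

```python
import numpy as np

# Model-free: the agent improves its estimates purely from sampled
# transitions; it never queries the environment's dynamics directly.
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # One tabular Q-learning step from a single observed transition
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

# Model-based: the agent queries a model of the dynamics to evaluate
# candidate actions before committing to one (one-step lookahead here).
def one_step_lookahead(s, actions, dynamics, reward_fn, value_fn):
    # dynamics(s, a) -> predicted next state; all three functions are
    # placeholders for a known (or learned) model of the environment
    return max(actions,
               key=lambda a: reward_fn(s, a) + value_fn(dynamics(s, a)))
```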

In this chapter, we'll give you a comprehensive overview of model-based approaches, highlighting their advantages and disadvantages vis-à-vis model-free approaches, and the differences that arise when the model is known versus when it has to be learned. This distinction is important because it influences how problems are approached and which tools are used to solve them. After this introduction...

Model-based methods

Model-free algorithms are a formidable class of algorithms, able to learn very complex policies and accomplish objectives in complicated environments. As demonstrated by recent work from OpenAI (https://openai.com/five/) and DeepMind (https://deepmind.com/blog/article/alphastar-mastering-real-time-strategy-game-starcraft-ii), these algorithms can show long-term planning, teamwork, and adaptation to unexpected situations in challenging games such as StarCraft II and Dota 2.

Trained agents have been able to beat top professional players. However, the biggest downside is the huge number of games that have to be played to train them. In fact, to achieve these results, the algorithms were scaled massively so that the agents could play hundreds of years' worth of games against themselves. But...

Combining model-based with model-free learning

We just saw how planning can be computationally expensive both during training and at runtime, and how, in more complex environments, planning algorithms are unable to achieve good performance. The other strategy that we briefly hinted at is to learn a policy. A policy is much faster at inference time, as it doesn't have to plan at each step.
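
To see where the runtime cost of planning comes from, consider the following hedged sketch of a random-shooting planner. The interface is assumed for illustration (model(s, a) returns the predicted next state, reward_fn(s, a) the predicted reward): at every real step, the planner evaluates n_candidates * horizon simulated transitions, whereas a policy would need a single forward pass:

```python
import numpy as np

def plan_action(state, model, reward_fn, action_dim,
                n_candidates=1000, horizon=20):
    """Random shooting: simulate many random action sequences through
    the model and return the first action of the best sequence."""
    best_return, best_action = -np.inf, None
    for _ in range(n_candidates):
        seq = np.random.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total = state, 0.0
        for a in seq:                 # simulated steps, no real interaction
            total += reward_fn(s, a)
            s = model(s, a)
        if total > best_return:
            best_return, best_action = total, seq[0]
    # The entire search is repeated from scratch at the next real step
    return best_action
```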

A simple yet effective way to learn a policy is to combine model-based with model-free learning. With the latest innovations in model-free algorithms, this combination has gained popularity and is the most common approach to date. The algorithm we'll develop in the next section, ME-TRPO, is one such method. Let's dive further into these algorithms.
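
In broad strokes, the combination works as in the sketch below: real interactions are used to fit a dynamics model, and a model-free algorithm then updates the policy on cheap imagined rollouts generated by that model. Note that this is a simplified sketch under assumed interfaces (the fit, update, act, step, and reset methods and the step counts are hypothetical), not the pseudocode from the A useful combination section:

```python
def collect_rollouts(step_fn, reset_fn, policy, n_steps):
    """Generic rollout collector: step_fn can be the real environment
    or the learned model, so imagined data is gathered the same way."""
    data, s = [], reset_fn()
    for _ in range(n_steps):
        a = policy.act(s)
        s_next, r = step_fn(s, a)
        data.append((s, a, r, s_next))
        s = s_next
    return data

def train(env, model, policy, n_iters=100):
    real_data = []
    for _ in range(n_iters):
        # 1) Collect a little real experience with the current policy
        real_data += collect_rollouts(env.step, env.reset, policy, 1000)
        # 2) Fit the dynamics model on all the real transitions so far
        #    (a supervised learning problem: (s, a) -> s')
        model.fit(real_data)
        # 3) Update the policy with a model-free algorithm (for example,
        #    TRPO) using cheap imagined rollouts from the model
        for _ in range(50):
            imagined = collect_rollouts(model.step, model.reset, policy, 1000)
            policy.update(imagined)
    return policy
```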

A useful combination

...

ME-TRPO applied to an inverted pendulum

Many variants exist of the vanilla model-based and model-free algorithm introduced in the pseudocode in the A useful combination section. Almost all of them propose a different way of dealing with the imperfections of the model of the environment.

Dealing with these imperfections is key to reaching the same performance as model-free methods, since models learned from complex environments will always have some inaccuracies. The main challenge, therefore, is to estimate or control the uncertainty of the model so as to stabilize and accelerate the learning process.

ME-TRPO proposes using an ensemble of models to maintain an estimate of the model's uncertainty and to regularize the learning process. The models are deep neural networks trained with different weight initializations and different training data. Together, they provide a more robust overall model of the environment that is less prone...
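
As a rough illustration of the ensemble idea (in the spirit of ME-TRPO, though not its exact implementation), here is a hedged sketch. The make_model factory and the fit/predict methods are assumed interfaces; bootstrap resampling gives each network different training data, and sampling a random member per simulated step exposes the models' disagreement to the policy:

```python
import random

class DynamicsEnsemble:
    def __init__(self, make_model, k=5):
        # make_model() must return a freshly (randomly) initialized
        # network with fit(batch) and predict(s, a) methods
        self.models = [make_model() for _ in range(k)]

    def fit(self, transitions):
        for m in self.models:
            # Bootstrap resample: every model trains on different data
            batch = random.choices(transitions, k=len(transitions))
            m.fit(batch)

    def predict(self, s, a):
        # Pick a random ensemble member at each simulated step, so the
        # models' disagreement perturbs imagined rollouts and the policy
        # cannot exploit the errors of any single model
        return random.choice(self.models).predict(s, a)
```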

Summary

In this chapter, we took a break from model-free algorithms and started discussing and exploring algorithms that learn from a model of the environment. We looked at the key motivations behind this change of paradigm. We then distinguished between the two main cases that arise when dealing with a model: the first, in which the model is already known, and the second, in which the model has to be learned.

Moreover, we learned how the model can be used either to plan the next actions or to learn a policy. There's no fixed rule for choosing one over the other; generally, the choice depends on the complexity of the action and observation spaces and on the inference speed. We then investigated the advantages and disadvantages of model-based algorithms and deepened our understanding of how a policy can be learned by combining...

Questions

  1. Would you use a model-based or a model-free algorithm if you had only 10 games in which to train your agent to play checkers?
  2. What are the disadvantages of model-based algorithms?
  3. If a model of the environment is unknown, how can it be learned?
  4. Why are data aggregation methods used?
  5. How does ME-TRPO stabilize training?
  6. How does using an ensemble of models improve policy learning?

Further reading
