Beyond Model-Free – Imagination

Model-based methods allow us to reduce the number of interactions with the environment by building a model of the environment and using it during training. In this chapter, we will:

  • Take a brief look at the model-based methods in reinforcement learning (RL)
  • Reimplement the architecture described by DeepMind researchers in the paper Imagination-Augmented Agents for Deep Reinforcement Learning (https://arxiv.org/abs/1707.06203), which adds imagination to agents

Model-based methods

To begin, let's discuss the difference between the model-free approach that we have used throughout the book and model-based methods, including their strengths and weaknesses and where they might be applicable.

Model-based versus model-free

In The taxonomy of RL methods section in Chapter 4, The Cross-Entropy Method, we saw several different angles from which we can classify RL methods. We distinguished three main aspects:

  • Value-based and policy-based
  • On-policy and off-policy
  • Model-free and model-based

We have seen plenty of examples of methods on both sides of the first two categories, but all the methods that we have covered so far were 100% model-free. However, this doesn't mean that model-free methods are more important or better than their model-based counterparts. Historically, thanks to their sample efficiency, model-based methods have been used in robotics and other industrial control applications. This has also happened...
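In code, "a model of the environment" usually means a learned function that takes the current observation and an action and predicts the next observation and the immediate reward. The following is a minimal PyTorch sketch of such an interface; the class name, layer sizes, and the encoding of the action as constant planes are illustrative assumptions, not the exact model used later in this chapter:

import torch
import torch.nn as nn

class EnvironmentModel(nn.Module):
    # Hypothetical sketch: given the current observation and an action,
    # predict the next observation and the immediate reward.
    def __init__(self, obs_shape, n_actions):
        super().__init__()
        self.n_actions = n_actions
        # the action is appended to the observation as one-hot planes
        in_channels = obs_shape[0] + n_actions
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, obs_shape[0], kernel_size=3, padding=1),
        )
        self.reward = nn.Sequential(
            nn.Flatten(),
            nn.Linear(obs_shape[0] * obs_shape[1] * obs_shape[2], 1),
        )

    def forward(self, obs, actions):
        batch, _, h, w = obs.shape
        # broadcast every action into a constant h x w plane
        act_planes = torch.zeros(batch, self.n_actions, h, w,
                                 device=obs.device)
        act_planes[torch.arange(batch), actions] = 1.0
        x = torch.cat([obs, act_planes], dim=1)
        next_obs = self.conv(x)          # predicted next observation
        reward = self.reward(next_obs)   # predicted immediate reward
        return next_obs, reward

Once such a model is trained, the agent can "try out" actions against it instead of the real environment, which is where the sample-efficiency gain comes from.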

The imagination-augmented agent

The overall idea of the new architecture, called imagination-augmented agent (I2A), is to allow the agent to imagine future trajectories using the current observations and incorporate these imagined paths into its decision process. The high-level architecture is shown in the following diagram:

Figure 22.1: The I2A architecture

The agent consists of two different paths used to transform the input observation: model-free and imagination. The model-free path is a standard set of convolution layers that transforms the input image into high-level features. The other path, imagination, consists of a set of trajectories imagined from the current observation. The trajectories are called rollouts, and they are produced for every available action in the environment. Every rollout consists of a fixed number of steps into the future, and on every step, a special model, called the environment model (EM) (not to be confused with the expectation maximization method...
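To make the data flow concrete, here is a hedged sketch of how the two paths could be combined; the submodule names (model_free, env_model, rollout_policy, rollout_encoder) and all sizes are assumptions used for illustration, not the chapter's actual implementation:

import torch
import torch.nn as nn

class I2A(nn.Module):
    # Illustrative sketch of the I2A forward pass.
    def __init__(self, model_free, env_model, rollout_policy,
                 rollout_encoder, n_actions, rollout_steps,
                 mf_features, enc_features):
        super().__init__()
        self.model_free = model_free            # obs -> conv features
        self.env_model = env_model              # (obs, action) -> (next_obs, reward)
        self.rollout_policy = rollout_policy    # small policy used inside rollouts
        self.rollout_encoder = rollout_encoder  # imagined trajectory -> vector
        self.n_actions = n_actions
        self.rollout_steps = rollout_steps
        hidden = mf_features + n_actions * enc_features
        self.policy = nn.Linear(hidden, n_actions)
        self.value = nn.Linear(hidden, 1)

    def forward(self, obs):
        mf = self.model_free(obs)               # model-free path
        encoded = []
        for first_action in range(self.n_actions):
            # One rollout per available action, starting with that action.
            # The EM is pre-trained and kept fixed, so no gradients here.
            with torch.no_grad():
                cur_obs = obs
                action = torch.full((obs.size(0),), first_action,
                                    dtype=torch.long, device=obs.device)
                steps = []
                for _ in range(self.rollout_steps):
                    next_obs, reward = self.env_model(cur_obs, action)
                    steps.append((next_obs, reward))
                    action = self.rollout_policy(next_obs).argmax(dim=1)
                    cur_obs = next_obs
            # encode the imagined (observation, reward) sequence
            encoded.append(self.rollout_encoder(steps))
        x = torch.cat([mf] + encoded, dim=1)    # join both paths
        return self.policy(x), self.value(x)

The key design choice is that the final policy and value heads see both the model-free features and the encoded rollouts, so the agent itself learns how much to trust the imagination.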

I2A on Atari Breakout

The training path of I2A is a bit complicated and includes a lot of code and several steps. To understand it better, let's start with a brief overview. In this example, we will implement the I2A architecture described in the paper [2], adapted to Atari environments, and test it on the Breakout game. The overall goal is to check the training dynamics and the effect of imagination augmentation on the final policy.

Our example consists of three parts, which correspond to different steps in the training:

  1. The baseline advantage actor-critic (A2C) agent in Chapter22/01_a2c.py. The resulting policy is used to obtain observations for EM training.
  2. The EM training in Chapter22/02_imag.py. It uses the policy obtained in the previous step to train the EM in an unsupervised way (a sketch of one training step follows the list). The result is the EM weights.
  3. The final I2A agent training in Chapter22/03_i2a.py. In this step, we use the EM from step 2 to train a full I2A agent, which combines the model...
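To illustrate step 2, here is a minimal sketch of what one EM training step could look like, assuming the EnvironmentModel interface sketched earlier; the function name and the plain mean squared error losses are assumptions, not necessarily the exact losses used in Chapter22/02_imag.py:

import torch.nn.functional as F

def train_em_batch(env_model, optimizer, obs, actions, next_obs, rewards):
    # One hypothetical EM update: predict the next observation and reward
    # from (observation, action) pairs collected with the baseline policy.
    optimizer.zero_grad()
    pred_obs, pred_reward = env_model(obs, actions)
    loss = F.mse_loss(pred_obs, next_obs) + \
           F.mse_loss(pred_reward.squeeze(-1), rewards)
    loss.backward()
    optimizer.step()
    return loss.item()

The training is unsupervised in the sense that the targets (the next observation and the reward) come for free from the recorded transitions; no extra labels are required.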

Experiment results

In this section, we will take a look at the results of our multistep training process.

The baseline agent

To train the agent, run Chapter22/01_a2c.py with the optional --cuda flag to enable the graphics processing unit (GPU) and the required -n option, which sets the experiment name used in TensorBoard and in the name of the directory where models are saved.

Chapter22$ ./01_a2c.py --cuda -n tt
AtariA2C(
  (conv): Sequential(
    (0): Conv2d(2, 32, kernel_size=(8, 8), stride=(4, 4))
    (1): ReLU()
    (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
    (3): ReLU()
    (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
    (5): ReLU()
  )
  (fc): Sequential(
    (0): Linear(in_features=3136, out_features=512, bias=True)
    (1): ReLU()
  )
  (policy): Linear(in_features=512, out_features=4, bias=True)
  (value): Linear(in_features=512, out_features=1, bias=True)
)
4: done 13 episodes, mean_reward=0.00, best_reward=0.00, speed=696.72
9: done 12 episodes, mean_reward...
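For reference, the printed summary corresponds to a network along the following lines. The layer sizes are taken directly from the output above (two input frames, four Breakout actions, and 3136 = 64 * 7 * 7 features for 84x84 inputs); the forward pass is an assumption based on the standard A2C layout:

import torch.nn as nn

class AtariA2C(nn.Module):
    def __init__(self, in_channels=2, n_actions=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(3136, 512),
            nn.ReLU(),
        )
        self.policy = nn.Linear(512, n_actions)  # action logits
        self.value = nn.Linear(512, 1)           # state-value estimate

    def forward(self, x):
        feats = self.conv(x).flatten(start_dim=1)
        feats = self.fc(feats)
        return self.policy(feats), self.value(feats)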

Summary

In this chapter, we discussed the model-based approach to RL and implemented one of the recent research architectures from DeepMind, which augments a model-free agent with a model of the environment. This architecture joins the model-free and model-based paths into one, allowing the agent to decide which knowledge to use.

In the next chapter, we will take a look at a recent DeepMind breakthrough in the area of full-information games: the AlphaGo Zero algorithm.

References

  1. Reinforcement Learning with Unsupervised Auxiliary Tasks by Max Jaderberg, Volodymyr Mnih, and others (arXiv:1611.05397)
  2. Imagination-Augmented Agents for Deep Reinforcement Learning by Théophane Weber, Sébastien Racanière, and others (arXiv:1707.06203)