
You're reading from Python Reinforcement Learning Projects

Product type: Book
Published in: Sep 2018
Publisher: Packt
ISBN-13: 9781788991612
Pages: 296
Edition: 1st Edition
Authors (3): Sean Saito, Yang Wenzhuo, Rajalingappaa Shanmugamani

Chapter 5. Building Virtual Worlds in Minecraft

In the two previous chapters, we discussed the deep Q-learning (DQN) algorithm for playing Atari games and the Trust Region Policy Optimization (TRPO) algorithm for continuous control tasks. Compared to traditional reinforcement learning algorithms that do not use deep neural networks to approximate the value function or the policy function, these algorithms achieved great success on complex problems. Their main disadvantage, especially for DQN, is that training converges very slowly; for example, training an agent to play Atari games takes about one week. For more complex games, even one week of training is insufficient.

This chapter will introduce a more complicated example, Minecraft, which is a popular online video game created by Swedish game developer Markus Persson and later developed by Mojang. You will learn how to launch a Minecraft environment using OpenAI Gym and play different missions. In order to build...

Introduction to the Minecraft environment


The original OpenAI Gym does not contain the Minecraft environment. We need to install a Minecraft environment bundle, available at https://github.com/tambetm/gym-minecraft. This bundle is built on Microsoft's Malmö, a platform for AI experimentation and research built on top of Minecraft.

Before installing the gym-minecraft package, Malmö should first be downloaded from https://github.com/Microsoft/malmo. We can download the latest pre-built version from https://github.com/Microsoft/malmo/releases. After unzipping the package, go to the Minecraft folder and run launchClient.bat on Windows, or launchClient.sh on Linux/macOS, to launch a Minecraft environment. If it launches successfully, we can now install gym-minecraft with the following commands:

python3 -m pip install gym
python3 -m pip install pygame

git clone https://github.com/tambetm/minecraft-py.git
cd minecraft-py
python setup.py install

git clone https://github.com/tambetm...
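
Once these packages are installed and a Malmö client is running, a Minecraft mission can be created like any other Gym environment. The snippet below is a minimal sketch; the mission ID MinecraftBasic-v0 and the optional init call are taken from the gym-minecraft documentation and may differ between versions:

import gym
import gym_minecraft  # registers the Minecraft mission environments with Gym

env = gym.make('MinecraftBasic-v0')   # assumed mission ID; see the gym-minecraft README for others
# env.init(start_minecraft=None)      # some versions require an explicit init before reset
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()           # random actions, just to verify the connection
    obs, reward, done, info = env.step(action)
env.close()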

Data preparation


In the Atari environment, recall that there are three modes for each Atari game, for example, Breakout, BreakoutDeterministic, and BreakoutNoFrameskip, and each mode has two versions, for example, Breakout-v0 and Breakout-v4. The main difference between the three modes is the frameskip parameter, which indicates the number of frames (steps) over which a single action is repeated. This frame-skipping technique allows us to play more games without significantly increasing the runtime.
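
As a brief illustration, the three Breakout modes can be created as follows (the frameskip behavior described in the comments follows the standard Gym Atari registrations):

import gym

env_default       = gym.make('Breakout-v0')              # frameskip sampled randomly from {2, 3, 4}
env_deterministic = gym.make('BreakoutDeterministic-v4') # fixed frameskip of 4
env_noframeskip   = gym.make('BreakoutNoFrameskip-v4')   # frameskip of 1 (every frame is observed)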

However, in the Minecraft environment, there is only one mode, where the frameskip parameter is equal to one. Therefore, in order to apply the frame-skipping technique, we need to explicitly repeat an action frameskip times during one timestep. Besides this, the frame images returned by the step function are RGB images. Similar to the Atari environment, the observed frame images are converted to grayscale and then resized to 84x84. The following code provides the wrapper...
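
The chapter's own wrapper is cut off in this excerpt. As a rough stand-in, here is a minimal sketch of such a wrapper, assuming OpenCV for the grayscale conversion and resizing (the class name and helper are our own, not the book's code):

import cv2

class MinecraftWrapper:
    """Hypothetical wrapper: repeats actions and preprocesses frames."""

    def __init__(self, env, frameskip=4):
        self.env = env
        self.frameskip = frameskip

    def _preprocess(self, frame):
        # Convert the RGB frame to grayscale and resize it to 84x84
        gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        return cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)

    def reset(self):
        return self._preprocess(self.env.reset())

    def step(self, action):
        # Explicitly repeat the action, accumulating the reward over the skipped frames
        total_reward = 0.0
        for _ in range(self.frameskip):
            frame, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return self._preprocess(frame), total_reward, done, info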

Asynchronous advantage actor-critic algorithm


In the previous chapters, we discussed DQN for playing Atari games and the DPG and TRPO algorithms for continuous control tasks. Recall that DQN has the following architecture:

At each timestep t, the agent observes the frame image x_t and selects an action a_t based on the current learned policy. The emulator (the Minecraft environment) executes this action and returns the next frame image x_{t+1} and the corresponding reward r_t. The quadruplet (x_t, a_t, r_t, x_{t+1}) is then stored in the experience memory and is taken as a sample for training the Q-network by minimizing the empirical loss function via stochastic gradient descent.
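
For concreteness, here is a small illustrative sketch (not the book's code) of how one sampled quadruplet contributes to the Q-learning loss; q_values and target_q_values stand in for the outputs of the online and target networks, and gamma is the discount factor:

import numpy as np

def td_loss(q_values, target_q_values, a_t, r_t, done, gamma=0.99):
    # q_values: Q(x_t, .) from the online network; target_q_values: Q(x_{t+1}, .) from the target network
    target = r_t if done else r_t + gamma * np.max(target_q_values)
    td_error = target - q_values[a_t]
    return td_error ** 2  # squared TD error, minimized by stochastic gradient descent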

Deep reinforcement learning algorithms based on experience replay have achieved unprecedented success in playing Atari games. However, experience replay has several disadvantages:

  • It uses more memory and computation per real interaction
  • It requires off-policy learning algorithms that can update from data generated by an older policy

In order...
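
The rest of this section is not included in this excerpt. As a rough orientation only, the A3C objective combines a policy-gradient term weighted by the advantage, a value-regression term, and an entropy bonus (the entropy_beta attribute in the implementation below weighs this bonus). The following sketch, with all names and weightings our own assumptions, shows how such a loss could be assembled in TensorFlow:

import tensorflow as tf

def a3c_loss(policy, value, actions, returns, n_actions, entropy_beta=0.01):
    # policy: (batch, n_actions) action probabilities; value: (batch,) state-value estimates
    # actions: (batch,) taken actions; returns: (batch,) n-step returns
    action_onehot = tf.one_hot(actions, n_actions)
    log_policy = tf.log(tf.clip_by_value(policy, 1e-10, 1.0))
    log_pi_a = tf.reduce_sum(log_policy * action_onehot, axis=1)
    advantage = returns - tf.stop_gradient(value)        # do not backprop the policy loss into the value head
    policy_loss = -tf.reduce_mean(log_pi_a * advantage)  # policy-gradient term
    value_loss = 0.5 * tf.reduce_mean(tf.square(returns - value))  # value-regression term
    entropy = -tf.reduce_mean(tf.reduce_sum(policy * log_policy, axis=1))
    return policy_loss + value_loss - entropy_beta * entropy  # entropy bonus encourages exploration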

Implementation of A3C


We will now look at how to implement A3C using Python and TensorFlow. Here, the policy network and value network share the same feature representation. We implement two kinds of policies: one is based on the CNN architecture used in DQN, and the other is based on LSTM.

We implement the FFPolicy class for the policy based on CNN:

import tensorflow as tf

class FFPolicy:

    def __init__(self, input_shape=(84, 84, 4), n_outputs=4, network_type='cnn'):

        self.width = input_shape[0]
        self.height = input_shape[1]
        self.channel = input_shape[2]
        self.n_outputs = n_outputs
        self.network_type = network_type
        self.entropy_beta = 0.01  # weight of the entropy bonus in the A3C loss

        # Input placeholder for a batch of stacked frames (channels-first layout)
        self.x = tf.placeholder(dtype=tf.float32, 
                                shape=(None, self.channel, self.width, self.height))
        self.build_model()

The constructor requires three arguments:

  1.  input_shape
  2. n_outputs
  3. network_type

 

input_shape is the size of the input image. After data preprocessing, the input is an 84x84x4...
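
The build_model call at the end of the constructor is not reproduced in this excerpt. As a rough orientation only, a shared CNN trunk with separate policy and value heads, as described above, could look like the following sketch (the layer sizes and names are our own assumptions, not the book's exact code):

import tensorflow as tf

def build_model(self):
    # The placeholder is stored channels-first, so transpose to NHWC before the conv layers
    x = tf.transpose(self.x, [0, 2, 3, 1]) / 255.0
    conv1 = tf.layers.conv2d(x, filters=16, kernel_size=8, strides=4, activation=tf.nn.relu)
    conv2 = tf.layers.conv2d(conv1, filters=32, kernel_size=4, strides=2, activation=tf.nn.relu)
    feature = tf.layers.dense(tf.layers.flatten(conv2), 256, activation=tf.nn.relu)
    # Policy head: a probability distribution over the n_outputs actions
    self.policy = tf.layers.dense(feature, self.n_outputs, activation=tf.nn.softmax)
    # Value head: a scalar estimate of the state value, sharing the same feature representation
    self.value = tf.squeeze(tf.layers.dense(feature, 1), axis=1)

The policy head is used to sample actions, while the value head provides the baseline for the advantage in the A3C update. A hypothetical instantiation would be policy = FFPolicy(input_shape=(84, 84, 4), n_outputs=env.action_space.n).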

Experiments


The full implementation of the A3C algorithm can be downloaded from our GitHub repository (https://github.com/PacktPublishing/Python-Reinforcement-Learning-Projects). There are three environments we can test in our implementation. The first one is the special game, demo, introduced in Chapter 3, Playing Atari Games. For this game, A3C only needs to launch two agents to achieve good performance. Run the following command in the src folder:

python3 train.py -w 2 -e demo

The first argument, -w, or --num_workers, indicates the number of launched agents. The second argument, -e, or --env, specifies the environment, for example, demo. The other two environments are Atari and Minecraft. For Atari games, A3C requires at least 8 agents running in parallel. Typically, launching 16 agents can achieve better performance:

python3 train.py -w 8 -e Breakout

For Breakout, A3C takes about 2-3 hours to achieve a score of 300. If you have a decent PC with more than 8 cores, it is better to test it...

Summary


This chapter introduced the Gym Minecraft environment, available at https://github.com/tambetm/gym-minecraft. You have learned how to launch a Minecraft mission and how to implement an emulator for it. The most important part of this chapter was the asynchronous reinforcement learning framework. You learned what the shortcomings of DQN are, and why DQN is difficult to apply to complex tasks. Then, you learned how to apply the asynchronous reinforcement learning framework to the actor-critic method derived from REINFORCE, which led us to the A3C algorithm. Finally, you learned how to implement A3C using TensorFlow and how to handle multiple terminals using tmux. The tricky part of the implementation is the handling of the globally shared parameters, which involves creating a cluster of TensorFlow servers. Readers who want to learn more about this can visit https://www.tensorflow.org/deploy/distributed.

In the following chapters, you will learn more about how to apply reinforcement learning algorithms...
