Python Reinforcement Learning Projects


Chapter 3. Playing Atari Games

Can a machine learn how to play video games by itself and beat human players? Solving this problem is the first step toward general artificial intelligence (AI) in the field of gaming. The key technique for creating an AI player is deep reinforcement learning. In 2015, Google's DeepMind, one of the foremost AI/machine learning research teams (famous for building AlphaGo, the machine that beat Go champion Lee Sedol), proposed the deep Q-learning algorithm to build an AI player that can learn to play Atari 2600 games and surpass a human expert on several of them. This work made a great impact on AI research, showing the possibility of building general AI systems.

In this chapter, we will introduce how to use gym to play Atari 2600 games, and then explain why the deep Q-learning algorithm works and how to implement it using TensorFlow. The goal is to understand deep reinforcement learning algorithms and how to apply them to solve real tasks. This...

Introduction to Atari games


Atari, Inc. was an American video game developer and home computer company founded in 1972 by Nolan Bushnell and Ted Dabney. In 1976, Bushnell developed the Atari Video Computer System, or Atari VCS (later renamed the Atari 2600). The Atari VCS was a flexible console capable of playing all of the existing Atari games; the package included the console itself, two joysticks, a pair of paddles, and the Combat game cartridge. The following screenshot depicts an Atari console:

The Atari 2600 has more than 500 games, published by Atari, Sears, and various third parties. Famous titles include Breakout, Pac-Man, Pitfall!, Atlantis, Seaquest, and Space Invaders.

As a direct result of the North American video game crash of 1983, Atari, Inc. was closed and its properties were split up in 1984. The home computing and game console divisions of Atari were sold to Jack Tramiel in July 1984 under the name Atari Corporation.

For readers who are interested in playing Atari games, here are several online...

Building an Atari emulator


OpenAI gym provides an Atari 2600 game environment with a Python interface. The games are simulated by the Arcade Learning Environment (ALE), which uses the Stella Atari emulator. For more details, read the following papers:

  • M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, The Arcade Learning Environment: An Evaluation Platform for General Agents, Journal of Artificial Intelligence Research (2012)
  • Stella: A Multi-Platform Atari 2600 VCS Emulator, http://stella.sourceforge.net/

Getting started

If you don't have a full install of OpenAI gym, you can install the Atari environment dependencies via the following:

pip install gym[atari]

This requires the cmake build tools. The command automatically compiles the Arcade Learning Environment and its Python interface, atari-py. Compilation takes a few minutes on a typical laptop, so go have a cup of coffee.

After the Atari environment is installed, try the following:

import gym
atari = gym.make('Breakout-v0')
atari.reset()
atari...
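
A complete minimal interaction loop, sketched here under the classic gym API (where step returns four values; newer gym and gymnasium releases changed this signature), might look like the following:

import gym

atari = gym.make('Breakout-v0')     # create the Breakout environment
observation = atari.reset()         # initial screen image, shape (210, 160, 3)

for _ in range(1000):
    atari.render()                            # draw the current frame
    action = atari.action_space.sample()      # pick a random action
    observation, reward, done, info = atari.step(action)
    if done:                                  # all lives lost; start over
        observation = atari.reset()

atari.close()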

Data preparation


Careful readers may have noticed that a suffix, v0, follows each game name, and may have come up with the following questions: what does v0 mean, and is it allowable to replace it with v1 or v2? This suffix is related to the data preprocessing step for the screen images (observations) extracted from the Atari environment.

There are three modes for each game, for example, Breakout, BreakoutDeterministic, and BreakoutNoFrameskip, and each mode has two versions, for example, Breakout-v0 and Breakout-v4. The main difference between the three modes is the value of the frameskip parameter in the Atari environment, which indicates the number of frames (steps) for which a single action is repeated. This is called the frame-skipping technique, and it allows us to play more games without significantly increasing the runtime.
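
You can check how each mode configures frame skipping by inspecting the underlying environment. The frameskip attribute used below belongs to gym's atari-py-based AtariEnv at the time of writing; treat this as an illustrative sketch:

import gym

# Print the frameskip setting of each mode; a tuple such as (2, 5)
# means the skip is sampled randomly from that range at every step
for name in ['Breakout-v0', 'BreakoutDeterministic-v0', 'BreakoutNoFrameskip-v0']:
    env = gym.make(name)
    print(name, env.unwrapped.frameskip)
    env.close()

Typically, Breakout-v0 reports (2, 5), BreakoutDeterministic-v0 reports 4, and BreakoutNoFrameskip-v0 reports 1.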

For Breakout, frameskip is randomly sampled from 2 to 5. The following screenshots show the frame images returned by the step function when the action LEFT...

Deep Q-learning


Here comes the fun part: the brain design of our AI Atari player. The core algorithm is based on deep reinforcement learning, or deep RL. To understand it properly, some basic mathematical formulations are required. Deep RL is a perfect combination of deep learning and traditional reinforcement learning. Without understanding the basic concepts of reinforcement learning, it is difficult to apply deep RL correctly in real applications; for example, someone might try to use deep RL without properly defining the state space, reward, and transitions.

Don't be afraid of the formulations: we only need high-school-level mathematics, and we will not go deep into the mathematical proofs of why traditional reinforcement learning algorithms work. The goal of this chapter is to learn the basic Q-learning algorithm, to see how to extend it into the deep Q-learning algorithm (DQN), and to understand the intuition behind these algorithms. Besides...
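
As a concrete preview, the heart of tabular Q-learning is a one-line update rule. The following sketch is generic; the state and action counts, alpha, and gamma are illustrative values, not taken from this book:

import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))   # tabular Q-value estimates
alpha, gamma = 0.1, 0.99              # learning rate and discount factor

def q_update(state, action, reward, next_state, done):
    # Bellman target: immediate reward plus the discounted value of the
    # best next action; at the end of an episode there is no next state
    target = reward if done else reward + gamma * np.max(Q[next_state])
    # Nudge the current estimate toward the target
    Q[state, action] += alpha * (target - Q[state, action])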

Implementation of DQN


This section shows you how to implement all the components of the deep Q-learning algorithm, namely the Q-network, replay memory, trainer, and Q-learning optimizer, with Python and TensorFlow.
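
Before diving into the Q-network, here is a minimal replay memory sketch to fix ideas; the class name and the uniform-sampling strategy are illustrative assumptions, not necessarily the book's exact design:

import random
from collections import deque

class ReplayMemory:
    """A fixed-capacity buffer of (state, action, reward, next_state, done)."""

    def __init__(self, capacity=100000):
        # Old transitions are discarded automatically once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def append(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random minibatches break the strong correlation
        # between consecutive frames
        return random.sample(self.buffer, batch_size)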

We will implement the QNetwork class for the Q-network that we discussed in the previous section, which is defined as follows:

import tensorflow as tf  # TensorFlow 1.x (this code uses the tf.placeholder API)

class QNetwork:

    def __init__(self, input_shape=(84, 84, 4), n_outputs=4, 
                 network_type='cnn', scope='q_network'):

        self.width = input_shape[0]
        self.height = input_shape[1]
        self.channel = input_shape[2]
        self.n_outputs = n_outputs
        self.network_type = network_type
        self.scope = scope

        # Input frame images, shape (batch, channel, width, height)
        self.x = tf.placeholder(dtype=tf.float32, 
                                shape=(None, self.channel, 
                                       self.width, self.height))
        # Target estimates of the Q-value, one scalar per sample
        self.y = tf.placeholder(dtype=tf.float32, shape=(None,))
       ...
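
Assuming the standard DQN architecture (three convolutional layers followed by a fully connected layer), the constructor might continue along these lines; this sketch uses the TF 1.x layers API and is not the book's verbatim code:

        # Hypothetical continuation (standard DQN convnet; TF 1.x layers API)
        with tf.variable_scope(self.scope):
            # Convert from (batch, channel, width, height) to the NHWC
            # layout that tf.layers expects by default
            x = tf.transpose(self.x, [0, 2, 3, 1])
            conv1 = tf.layers.conv2d(x, 32, 8, 4, activation=tf.nn.relu)
            conv2 = tf.layers.conv2d(conv1, 64, 4, 2, activation=tf.nn.relu)
            conv3 = tf.layers.conv2d(conv2, 64, 3, 1, activation=tf.nn.relu)
            flat = tf.layers.flatten(conv3)
            dense = tf.layers.dense(flat, 512, activation=tf.nn.relu)
            self.q = tf.layers.dense(dense, self.n_outputs)  # one Q-value per action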

Experiments


The full implementation of the deep Q-learning algorithm can be downloaded from GitHub (link xxx). To train our AI player for Breakout, run the following command from the src folder:

python train.py -g Breakout -d gpu

train.py takes two arguments. One is -g or --game, which indicates the name of the game you want to train on. The other is -d or --device, which specifies the device (CPU or GPU) you want to use to train the Q-network.
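
For reference, a minimal parser matching these two flags could look like the following; this is a hypothetical sketch, and the book's actual train.py may differ in its details:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-g', '--game', default='Breakout',
                    help='name of the game to train on')
parser.add_argument('-d', '--device', default='cpu', choices=['cpu', 'gpu'],
                    help='device used to train the Q-network')
args = parser.parse_args()
print(args.game, args.device)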

For Atari games, even with a high-end GPU, it takes 4 to 7 days for our AI player to achieve human-level performance. In order to test the algorithm quickly, a special game called demo is implemented as a lightweight benchmark. Run the demo via the following:

python train.py -g demo -d cpu

 

The demo game is based on the GridWorld game at https://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html.

In this game, a robot in a 2D grid world has nine eyes pointing in different directions, and each eye senses three values along its direction...

Summary


Congratulations! You have just learned four important things. The first is how to set up an Atari game emulator using gym, and how to play Atari games for fun. The second is how to preprocess data in reinforcement learning tasks such as Atari games; in practical machine learning applications, you will spend a great deal of time understanding and refining data, which strongly affects the performance of an AI system. The third is the deep Q-learning algorithm: you learned the intuition behind it, for example, why the replay memory is necessary, why the target network is needed, and where the update rule comes from. The final one is how to implement DQN using TensorFlow and how to visualize the training process. Now you are ready for the more advanced topics that we will discuss in the following chapters.

In the next chapter, you will learn how to simulate classic control tasks, and how to implement the state...
