
Hands-On Intelligent Agents with OpenAI Gym: Your guide to developing AI agents using deep reinforcement learning

By Palanisamy P

Product Details

Publication date: Jul 31, 2018
Length: 254 pages
Edition: 1st
Language: English
ISBN-13: 9781788836579
Vendor: OpenAI


Introduction to Intelligent Agents and Learning Environments

Greetings! Welcome to the first chapter of this book. This book will introduce you to the awesome OpenAI Gym learning environment and guide you through an exciting journey to equip you with the skills to train state-of-the-art, artificial intelligence agent systems. It will help you develop hands-on experience with reinforcement learning and deep reinforcement learning through practical projects, ranging from developing an autonomous, self-driving car to developing Atari game-playing agents that can surpass human performance. By the time you complete the book, you will be in a position to explore the endless possibilities of using artificial intelligence to solve algorithmic tasks, play games, and solve control problems.

The following topics will be covered in this chapter:

  • Understanding intelligent agents and learning environments
  • Understanding what OpenAI Gym is all about
  • Different categories of tasks/environments that are available, with a brief description of what each category is suitable for
  • Understanding the key features of OpenAI Gym
  • Getting an idea about what you can do with the OpenAI Gym toolkit
  • Creating and visualizing your first Gym environment

Let's start our journey by understanding what an intelligent agent is.

What is an intelligent agent?

A major goal of artificial intelligence is to build intelligent agents. The essential characteristics of an intelligent agent are perceiving its environment, understanding what it perceives, reasoning and learning to plan, and making decisions and acting on them. We will begin this first chapter by understanding what an intelligent agent is, starting from the basic definition of an agent and then adding intelligence on top of that.

An agent is an entity that acts based on the observation (perception) of its environment. Humans and robots are examples of agents with physical forms.

A human or an animal is an example of an agent that uses its organs (eyes, ears, nose, skin, and so on) as sensors to observe/perceive its environment and acts using its physical body (arms, hands, legs, head, and so on). A robot uses its sensors (cameras, microphones, LiDAR, radar, and so on) to observe/perceive its environment and acts using its physical robotic body (robotic arms, robotic hands/grippers, robotic legs, speakers, and so on).

Software agents are computer programs that are capable of making decisions and taking actions through interaction with their environment. A software agent can be embodied in a physical form, such as a robot. Autonomous agents are entities that make decisions autonomously and take actions based on their understanding of and reasoning about their observations of their environment.

An intelligent agent is an autonomous entity that can learn and improve based on its interactions with its environment. An intelligent agent is capable of analyzing its own behavior and performance using its observations.

In this book, we will develop intelligent agents to solve sequential decision-making problems that can be solved using a sequence of (independent) decisions/actions in a (loosely) Markovian environment, where feedback in the form of reward signals is available (through percepts), at least in some environmental conditions.

Learning environments

A learning environment is an integral component of the system in which an intelligent agent is trained. It defines the problem or the task for the agent to complete.

A problem or task in which the outcome depends on a sequence of decisions made or actions taken is a sequential decision-making problem. Here are some of the varieties of learning environments:

  • Fully observable versus partially observable
  • Deterministic versus stochastic
  • Episodic versus sequential
  • Static versus dynamic
  • Discrete versus continuous
  • Discrete state space versus continuous state space
  • Discrete action space versus continuous action space

In this book, we will be using learning environments implemented using the OpenAI Gym Python library, as it provides a simple and standard interface and environment implementations, along with the ability to implement new custom environments.

In the following subsections, we will get a glimpse of the OpenAI Gym toolkit. This section is geared towards familiarizing a complete newbie with the OpenAI Gym toolkit. No prior knowledge or experience is assumed. We will first try to get a feel for the Gym toolkit and walk through the various environments that are available under different categories. We will then discuss the features of Gym that might be of interest to you, irrespective of the application domain that you are interested in. We'll then briefly discuss what the value proposition of the Gym toolkit is and how you can utilize it. We will be building several cool and intelligent agents in subsequent chapters, building on top of the Gym toolkit. So, this chapter is really the foundation for all that. We will also be quickly creating and visualizing our first OpenAI Gym environment towards the end of this chapter. Excited? Let's jump right in.

What is OpenAI Gym?

OpenAI Gym is an open source toolkit that provides a diverse collection of tasks, called environments, with a common interface for developing and testing your intelligent agent algorithms. The toolkit introduces a standard Application Programming Interface (API) for interfacing with environments designed for reinforcement learning. Each environment has a version attached to it, which ensures meaningful comparisons and reproducible results with the evolving algorithms and the environments themselves.

The Gym toolkit, through its various environments, provides an episodic setting for reinforcement learning, where an agent's experience is broken down into a series of episodes. In each episode, the initial state of the agent is randomly sampled from a distribution, and the interaction between the agent and the environment proceeds until the environment reaches a terminal state. Do not worry if you are not familiar with reinforcement learning. You will be introduced to reinforcement learning in Chapter 2, Reinforcement Learning and Deep Reinforcement Learning.
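
To make this concrete, here is a minimal sketch of the episodic agent-environment loop, written against the classic Gym API used throughout this book (env.reset() returns an observation, and env.step() returns a four-tuple); the randomly sampled action is only a stand-in for a real agent's decision:

import gym

env = gym.make('CartPole-v0')

# An agent's experience is broken into episodes: each episode begins from a
# freshly sampled initial state and runs until the environment reports that
# a terminal state has been reached (done == True).
for episode in range(3):
    observation = env.reset()  # sample an initial state for this episode
    done = False
    total_reward = 0.0
    while not done:
        action = env.action_space.sample()  # placeholder for an agent's policy
        observation, reward, done, info = env.step(action)
        total_reward += reward
    print('Episode {} finished with return {}'.format(episode, total_reward))

env.close()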

Some of the basic environments available in the OpenAI Gym library are shown in the following screenshot:

Examples of basic environments available in the OpenAI Gym with a short description of the task

At the time of writing this book, OpenAI Gym natively includes about 797 environments spread over different categories of tasks. The famous Atari category has the largest share, with about 116 environments (half with screen inputs and half with RAM inputs)! The categories of tasks/environments supported by the toolkit are listed here:

  • Algorithmic
  • Atari
  • Board games
  • Box2D
  • Classic control
  • Doom (unofficial)
  • Minecraft (unofficial)
  • MuJoCo
  • Soccer
  • Toy text
  • Robotics (newly added)

The various types of environment (or task) available under the different categories, along with a brief description of each, are given next. Keep in mind that you may need some additional tools and packages installed on your system to run environments in each of these categories. Do not worry! We will go over every single step you need to take to get any environment up and running in the upcoming chapters. Stay tuned!

We will now see the previously mentioned categories in detail, as follows:

  • Algorithmic environments: They provide tasks that require an agent to perform computations, such as the addition of multi-digit numbers, copying data from an input sequence, reversing sequences, and so on.
  • Atari environments: These offer interfaces to several classic Atari console games. These environment interfaces are wrappers on top of the Arcade Learning Environment (ALE). They provide the game's screen images or RAM as input to train your agents.
  • Board games: This category has the environment for the popular game Go on 9 x 9 and 19 x 19 boards. For those of you who have been following the recent breakthroughs by Google's DeepMind in the game of Go, this might be very interesting. DeepMind developed an agent named AlphaGo, which used reinforcement learning and other learning and planning techniques, including Monte Carlo tree search, to beat the top-ranked human Go players in the world, including Fan Hui and Lee Sedol. DeepMind also published their work on AlphaGo Zero, which was trained from scratch, unlike the original AlphaGo, which used sample games played by humans. AlphaGo Zero surpassed the original AlphaGo's performance. Later, AlphaZero was published; it is an autonomous system that learned to play chess, Go, and Shogi using self-play (without any human supervision for training) and reached performance levels higher than the previous systems developed.
  • Box2D: This is an open source physics engine used for simulating rigid bodies in 2D. The Gym toolkit has a few continuous control tasks that are developed using the Box2D simulator:
A sample list of environments built using the Box2D simulator

The tasks include training a bipedal robot to walk, navigating a lunar lander to its landing pad, and training a race car to drive around a race track. Exciting! In this book, we will train an AI agent using reinforcement learning to drive a race car around the track autonomously! Stay tuned.

  • Classic control: This category has many tasks that were widely used in reinforcement learning literature in the past, and they formed the basis for some of the early development and benchmarking of reinforcement learning algorithms. For example, one of the environments available under the classic control category is the Mountain Car environment, which was first introduced in 1990 by Andrew Moore (Dean of the School of Computer Science at CMU, Pittsburgh) in his PhD thesis. This environment is still sometimes used as a test bed for reinforcement learning algorithms. You will create your first OpenAI Gym environment from this category in just a few moments, towards the end of this chapter!
  • Doom: This category provides an environment interface for the popular first-person shooter game Doom. It is an unofficial, community-created Gym environment category based on ViZDoom, a Doom-based AI research platform that provides an easy-to-use API suitable for developing intelligent agents from raw visual inputs. It enables the development of AI bots that can play several challenging rounds of the Doom game using only the screen buffer! If you have played this game, you know how thrilling and difficult it is to progress through some of the rounds without losing lives! Although it does not have the cool graphics of some of the newer first-person shooters, it is a great game. In recent times, several studies in machine learning, especially in deep reinforcement learning, have utilized the ViZDoom platform and developed new algorithms to tackle the goal-directed navigation problems encountered in the game. You can visit ViZDoom's research web page (http://vizdoom.cs.put.edu.pl/research) for a list of research studies that use this platform. The following screenshot lists some of the missions that are available as separate environments in the Gym for training your agents:
List of missions or rounds available in Doom environments
  • Minecraft: This is another great platform, and game AI developers in particular might be very interested in this environment. Minecraft is a popular video game among hobbyists. The Minecraft Gym environment was built using Microsoft's Project Malmo, which is a platform for artificial intelligence experimentation and research built on top of Minecraft. Some of the missions that are available as environments in the OpenAI Gym are shown in the following screenshot. These environments provide inspiration for developing solutions to the challenging new problems presented by this unique platform:
Environments in MineCraft available in OpenAI Gym
  • MuJoCo: Are you interested in robotics? Do you dream of developing algorithms that can make a humanoid walk and run, or do a backflip like Boston Dynamics' Atlas robot? You can! You will be able to apply the reinforcement learning methods you learn in this book in the OpenAI Gym MuJoCo environments to develop your own algorithms that can make a 2D robot walk, run, swim, or hop, or make a 3D multi-legged robot walk or run! Several cool, real-world, robot-like environments are available under the MuJoCo category.
  • Soccer: This is an environment suitable for training multiple agents that can cooperate with each other. The soccer environments available through the Gym toolkit have continuous state and action spaces. Wondering what that means? You will learn all about it when we talk about reinforcement learning in the next chapter. For now, here is a simple explanation: a continuous state and action space means that both the action that an agent can take and the input that the agent receives are continuous values. This means that they can take any real number value between, say, 0 and 1 (0.5, 0.005, and so on), rather than being limited to a few discrete sets of values, such as {1, 2, 3}; the short snippet after this list shows how to inspect an environment's spaces. There are three types of soccer environment. The plain soccer environment initializes a single opponent on the field and gives a reward of +1 for scoring a goal and 0 otherwise. In order to score a goal, an agent needs to learn to identify the ball, approach the ball, and kick the ball towards the goal. Sound simple enough? It is really hard for a computer to figure that out on its own, especially when all you say is +1 when it scores a goal and 0 in any other case, with no other clues! You can develop agents that will learn all about soccer by themselves and learn to score goals using the methods that you will learn in this book.
  • Toy text: OpenAI Gym also has some simple text-based environments under this category. These include some classic problems such as Frozen Lake, where the goal is to find a safe path to cross a grid of ice and water tiles. It is categorized under toy text because it uses a simpler environment representation—mostly through text.
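
To make the discrete-versus-continuous distinction concrete, here is a small snippet that inspects the action spaces of a discrete and a continuous environment (a sketch assuming a standard Gym installation; MountainCarContinuous-v0 belongs to the classic control category, so no extra packages are needed, and the exact printed representations may vary slightly between Gym versions):

import gym

# CartPole has a Discrete action space: the valid actions are the
# integers 0 and 1 (push the cart left or right).
discrete_env = gym.make('CartPole-v0')
print(discrete_env.action_space)         # Discrete(2)

# MountainCarContinuous has a continuous (Box) action space: a valid
# action is any real-valued force within the bounds below.
continuous_env = gym.make('MountainCarContinuous-v0')
print(continuous_env.action_space)       # Box(1,)
print(continuous_env.action_space.low)   # lower bound of the action value
print(continuous_env.action_space.high)  # upper bound of the action value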

With that, you have a very good overview of all the different categories and types of environment that are available as part of the OpenAI Gym toolkit. It is worth noting that the release of the OpenAI Gym toolkit was accompanied by an OpenAI Gym website (gym.openai.com), which maintained a scoreboard for every algorithm that was submitted for evaluation. It showcased the performance of user-submitted algorithms, and some submissions were also accompanied by detailed explanations and source code. Unfortunately, OpenAI decided to withdraw support for the evaluation website. The service went offline in September 2017.

Now you have a good picture of the various categories of environment available in OpenAI Gym and what each category provides you with. Next, we will look at the key features of OpenAI Gym that make it an indispensable component in many of today's advancements in intelligent agent development, especially those that use reinforcement learning or deep reinforcement learning.

Understanding the features of OpenAI Gym

In this section, we will take a look at the key features that have made the OpenAI Gym toolkit very popular in the reinforcement learning community and led to it becoming widely adopted.

Simple environment interface

OpenAI Gym provides a simple and common Python interface to environments. Specifically, the interface takes an action as input and, at each step, returns an observation, a reward, a done flag, and an optional info object based on that action. If this does not make perfect sense to you yet, do not worry. We will go over the interface again in a more detailed manner to help you understand; this paragraph is just to give you an overview and to make it clear how simple it is. This simple and convenient interface provides great flexibility, as users can design and develop their agent algorithms based on any paradigm they like, rather than being constrained to any particular one.
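
As a quick illustration of just how small this interface is, here is a minimal sketch using the classic Gym API (one reset, then one step with a randomly sampled action):

import gym

env = gym.make('CartPole-v0')
observation = env.reset()

# The entire contract in one line: an action goes in, and an
# (observation, reward, done, info) tuple comes back.
action = env.action_space.sample()
observation, reward, done, info = env.step(action)
print(reward, done, info)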

Comparability and reproducibility

We intuitively feel that we should be able to compare the performance of an agent or an algorithm on a particular task to the performance of another agent or algorithm on the same task. For example, if an agent gets a score of 1,000 on average in the Atari game Space Invaders, we should be able to tell that this agent is performing worse than an agent that scores 5,000 on average in the same game with the same amount of training time. But what happens if the scoring system for the game is slightly changed? Or if the environment interface is modified to include additional information about the game states that would give the second agent an advantage? That would make the score-to-score comparison unfair, right?

To handle such changes in the environment, OpenAI Gym uses strict versioning for environments. The toolkit guarantees that if there is any change to an environment, it will be accompanied by a different version number. Therefore, if the original version of the Atari Space Invaders game environment was named SpaceInvaders-v0 and there were some changes made to the environment to provide more information about the game states, then the environment's name would be changed to SpaceInvaders-v1. This simple versioning system makes sure we are always comparing performance measured on the exact same environment setup. This way, the results obtained are comparable and reproducible.
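
Because the version suffix is part of the environment ID itself, two versions of a task are treated as entirely separate environments. A quick sketch (CartPole is used here purely for illustration; CartPole-v1 differs from CartPole-v0 in details such as the episode length limit):

import gym

env_v0 = gym.make('CartPole-v0')
env_v1 = gym.make('CartPole-v1')  # a different environment, not an in-place upgrade
print(env_v0.spec.id)  # CartPole-v0 -- the version is baked into the ID
print(env_v1.spec.id)  # CartPole-v1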

Ability to monitor progress

All the environments available as part of the Gym toolkit are equipped with a monitor. This monitor logs every time step of the simulation and every reset of the environment. What this means is that the environment automatically keeps track of how our agent is learning and adapting with every step. You can even configure the monitor to automatically record videos of the game while your agent is learning to play. How cool is that?
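
As a sketch of how this looks in code with the Gym releases contemporary with this book (the Monitor wrapper lived in gym.wrappers at the time and was removed from later releases; video recording additionally assumes ffmpeg is available on your system):

import gym
from gym import wrappers

env = gym.make('CartPole-v0')
# Wrapping the environment with a Monitor logs every step and every reset
# to the given directory, and records videos of episodes as the agent acts.
env = wrappers.Monitor(env, '/tmp/cartpole-monitor', force=True)

observation = env.reset()
done = False
while not done:
    observation, reward, done, info = env.step(env.action_space.sample())
env.close()  # finalizes the monitor's log and video files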

What can you do with the OpenAI Gym toolkit?

The Gym toolkit provides a standardized way of defining the interface for environments developed for problems that can be solved using reinforcement learning. If you are familiar with or have heard of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), you may realize how much of an impact a standard benchmarking platform can have on accelerating research and development. For those of you who are not familiar with ILSVRC, here is a brief summary: it is a competition where the participating teams evaluate the supervised learning algorithms they have developed for the given dataset and compete to achieve higher accuracy on several visual recognition tasks. This common platform, coupled with the success of deep neural network-based algorithms popularized by AlexNet (https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf), paved the way for the deep learning era we are in at the moment.

In a similar way, the Gym toolkit provides a common platform to benchmark reinforcement learning algorithms and encourages researchers and engineers to develop algorithms that can achieve higher rewards for several challenging tasks. In short, the Gym toolkit is to reinforcement learning what ILSVRC is to supervised learning.

Creating your first OpenAI Gym environment

We will be going over the steps to set up the OpenAI Gym dependencies and other tools required for training your reinforcement learning agents in detail in Chapter 3, Getting Started with OpenAI Gym and Deep Reinforcement Learning. This section provides a quick way to get started with the OpenAI Gym Python API on Linux and macOS using virtualenv, so that you can get a sneak peek into the Gym!

macOS and Ubuntu Linux systems come with Python installed by default. You can check which version of Python is installed by running python --version from a terminal window. If this returns Python followed by a version number, then you are good to proceed to the next steps! If you get an error saying the Python command was not found, then you have to install Python. Please refer to the detailed installation section in Chapter 3, Getting Started with OpenAI Gym and Deep Reinforcement Learning, of this book:

  1. Install virtualenv:
$ pip install virtualenv
If pip is not installed on your system, you can install it by typing sudo easy_install pip.
  2. Create a virtual environment named openai-gym using the virtualenv tool:
$ virtualenv openai-gym
  3. Activate the openai-gym virtual environment:
$ source openai-gym/bin/activate
  4. Install all the packages for the Gym toolkit from upstream:
$ pip install -U gym
If you get a permission denied error or a failed with error code 1 message when you run the pip install command, it is most likely because the permissions on the directory you are trying to install the package to (the openai-gym directory inside virtualenv, in this case) require special/root privileges. You can either run sudo -H pip install -U gym[all] to solve the issue or change the permissions on the openai-gym directory by running sudo chmod -R o+rw ~/openai-gym.
  5. Test to make sure the installation is successful:
$ python -c 'import gym; gym.make("CartPole-v0");'
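
If that command exits without an error, the installation worked. For a slightly more informative sanity check, here is an optional one-liner (a sketch, not from the book) that also prints the environment's observation space, action space, and an initial observation:
$ python -c 'import gym; env = gym.make("CartPole-v0"); print(env.observation_space, env.action_space, env.reset())'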

Creating and visualizing a new Gym environment

In just a minute or two, you will create an instance of an OpenAI Gym environment to get started!

Let's open a new Python prompt and import the gym module:

>>> import gym

Once the gym module is imported, we can use the gym.make method to create our new environment like this:

>>> env = gym.make('CartPole-v0')
>>> env.reset()
>>> env.render()

This will bring up a window rendering the CartPole environment: a cart with a pole balanced on top of it. Hooray!

Summary

Congrats on completing the first chapter! Hope you had fun creating your own environment. In this chapter, you learned what OpenAI Gym is all about, what features it provides, and what you can do with the toolkit. You now have a very good idea about OpenAI Gym. In the next chapter, we will go over the basics of reinforcement learning to give you a good foundation, which will help you build your cool intelligent agents as you progress through the book. Excited? Move on to the next chapter!


Key benefits

  • Explore the OpenAI Gym toolkit and interface to use over 700 learning tasks
  • Implement agents to solve simple to complex AI problems
  • Study learning environments and discover how to create your own

Description

Many real-world problems can be broken down into tasks that require a series of decisions to be made or actions to be taken. The ability to solve such tasks without being explicitly programmed requires a machine to be artificially intelligent and capable of learning to adapt. This book is an easy-to-follow guide to implementing learning algorithms for software agents in order to solve discrete and continuous sequential decision-making and control tasks. Hands-On Intelligent Agents with OpenAI Gym takes you through the process of building intelligent agent algorithms using deep reinforcement learning, starting with the implementation of the building blocks for configuring, training, logging, visualizing, testing, and monitoring the agent. You will walk through the process of building intelligent agents from scratch to perform a variety of tasks. In the closing chapters, the book provides an overview of the latest learning environments and learning algorithms, along with pointers to more resources that will help you take your deep reinforcement learning skills to the next level.

What you will learn

  • Explore intelligent agents and learning environments
  • Understand the basics of RL and deep RL
  • Get started with OpenAI Gym and PyTorch for deep reinforcement learning
  • Discover deep Q-learning agents to solve discrete optimal control tasks
  • Create custom learning environments for real-world problems
  • Apply a deep actor-critic agent to drive a car autonomously in CARLA
  • Use the latest learning environments and algorithms to upgrade your intelligent agent development skills


Table of Contents

Preface
Introduction to Intelligent Agents and Learning Environments
Reinforcement Learning and Deep Reinforcement Learning
Getting Started with OpenAI Gym and Deep Reinforcement Learning
Exploring the Gym and its Features
Implementing your First Learning Agent - Solving the Mountain Car problem
Implementing an Intelligent Agent for Optimal Control using Deep Q-Learning
Creating Custom OpenAI Gym Environments - CARLA Driving Simulator
Implementing an Intelligent & Autonomous Car Driving Agent using Deep Actor-Critic Algorithm
Exploring the Learning Environment Landscape - Roboschool, Gym-Retro, StarCraft-II, DeepMindLab
Exploring the Learning Algorithm Landscape - DDPG (Actor-Critic), PPO (Policy-Gradient), Rainbow (Value-Based)
Other Books You May Enjoy
