
Deep Neuroevolution

In this chapter, you will learn about the deep neuroevolution method, which can be used to train Deep Neural Networks (DNNs). DNNs are conventionally trained using backpropagation, which performs gradient descent on the error, computing the gradient with respect to the weights of the connections between neural nodes. Although gradient-based learning is a powerful technique that ushered in the current era of deep machine learning, it has drawbacks, such as long training times and enormous computing power requirements.

In this chapter, we will demonstrate how deep neuroevolution methods can be used for reinforcement learning and how they can considerably outperform traditional gradient-based methods of training DNNs, such as DQN and A3C. By the end of this chapter, you will have a solid understanding of deep neuroevolution methods, and you'll also have practical...

Technical requirements

Deep neuroevolution for deep reinforcement learning

In this book, we have already covered how the neuroevolution method can be applied to solve simple reinforcement learning (RL) tasks, such as single- and double-pole balancing in Chapter 4, Pole-Balancing Experiments. However, while the pole-balancing experiment is exciting and easy to conduct, it is pretty simple and operates with tiny artificial neural networks. In this chapter, we will discuss how to apply neuroevolution to reinforcement learning problems that require immense ANNs to approximate the value function of the RL algorithm.

The RL algorithm learns through trial and error. Almost all variants of RL algorithms try to optimize a value function, which estimates the expected future reward for each state (or state-action pair) of the system; the agent uses these estimates to select the appropriate action to perform in the next time step. The most widely used classical version of the RL algorithm...
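
To make the value-function idea concrete, here is a minimal tabular Q-learning sketch in Python. It is purely illustrative: the state and action counts, learning rate, and update loop are assumptions made for the example, and the experiments in this chapter replace the table with a deep neural network.

```python
import numpy as np

# Minimal tabular Q-learning sketch (illustrative only).
# Q[s, a] estimates the expected future reward of taking action a in state s.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1
rng = np.random.default_rng(0)

def choose_action(state):
    # Epsilon-greedy trial and error: mostly exploit the current value
    # estimates, occasionally explore a random action.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state):
    # Temporal-difference update: move Q[s, a] toward the Bellman target.
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
```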

Evolving an agent to play the Frostbite Atari game using deep neuroevolution

Recently, classic Atari games were encapsulated by the Arcade Learning Environment (ALE) and became a benchmark for testing different implementations of RL algorithms. Algorithms that are tested against the ALE are required to read the game state from the pixels of the game screen and devise sophisticated control logic that allows the agent to win the game. Thus, the task of the algorithm is to evolve an understanding of the game situation in terms of the game character and its adversaries. Also, the algorithm needs to interpret the reward signal that's received from the game screen in the form of the final game score at the end of a single game run.
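
As a rough sketch of what reading the game state from pixels looks like in practice, the following snippet drives a Frostbite environment with random actions through the OpenAI Gym wrapper around the ALE. The environment ID and the four-tuple step API are assumptions that depend on your gym/atari-py versions.

```python
import gym

# Illustrative: the agent observes only the raw RGB pixels of the game
# screen, and the reward is the change in game score per step.
env = gym.make('FrostbiteNoFrameskip-v4')
obs = env.reset()
print(obs.shape)  # (210, 160, 3) with classic gym versions

total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()  # random policy, for demonstration
    obs, reward, done, info = env.step(action)
    total_reward += reward

print('episode score:', total_reward)
```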

The Frostbite Atari game

...

Training an agent to play the Frostbite game

Now that we have discussed the theory behind the game-playing agent's implementation, we are ready to start working on it. Our implementation is based on the source code provided by Uber AI Labs on GitHub at https://github.com/uber-research/deep-neuroevolution. The repository contains implementations of two approaches to training DNNs: a CPU-based one for multicore systems (scaling up to 720 cores) and a GPU-based one. We are interested in the GPU-based implementation because most practitioners don't have access to such behemoths of technology as a PC with 720 CPU cores, while it is fairly easy to get access to a modern Nvidia GPU.

Next, we'll discuss the implementation details.
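
One detail worth previewing: the genetic algorithm in this line of work stores each individual not as a vector of millions of floats but as a compact list of random seeds, reconstructing the weights by replaying seeded Gaussian perturbations. The sketch below illustrates only the encoding idea; the function names and constants are ours for illustration, not the repository's API.

```python
import numpy as np

SIGMA = 0.002         # mutation power (illustrative value)
N_PARAMS = 4_000_000  # size of the controller ANN's parameter vector

def decode(seeds):
    # The first seed initializes the parameter vector; each subsequent
    # seed replays one mutation. Storing only seeds keeps a genotype a
    # few bytes long instead of ~16 MB of float32 weights.
    rng = np.random.RandomState(seeds[0])
    theta = rng.randn(N_PARAMS).astype(np.float32) * 0.01
    for seed in seeds[1:]:
        rng = np.random.RandomState(seed)
        theta += SIGMA * rng.randn(N_PARAMS).astype(np.float32)
    return theta

def mutate(seeds, rng):
    # Mutation appends a fresh seed rather than copying millions of floats.
    return seeds + [int(rng.integers(2**31 - 1))]

# Usage: a parent genotype and one mutated offspring.
parent = [42]
child = mutate(parent, np.random.default_rng(1))
weights = decode(child)
```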

...

Running the Frostbite Atari experiment

Now that we have discussed all the particulars of the experiment's implementation, it is time to run the experiment. However, the first thing we need to do is create an appropriate work environment, which we'll discuss next.

Setting up the work environment

The work environment for training the agent to play Atari games assumes that a large controller ANN needs to be trained in the process. As we already stated, the controller ANN has more than 4 million trainable parameters and requires considerable computational resources to evaluate. Fortunately, modern GPU accelerators allow the execution of massively parallel computations. This feature is convenient...
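
For a sense of where those millions of parameters come from, here is a PyTorch sketch of a DQN-style convolutional controller that maps a stack of four 84x84 game frames to per-action scores. The architecture is the classic one from the DQN literature and is an assumption on our part; the repository's actual controller is written in TensorFlow and is larger, at over 4 million parameters.

```python
import torch
import torch.nn as nn

class Controller(nn.Module):
    # DQN-style CNN: 4 stacked grayscale frames in, one score per action out.
    def __init__(self, n_actions=18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):
        return self.net(x)

model = Controller()
# Roughly 1.7 million parameters for this classic variant; the controller
# used in this chapter is larger, at over 4 million.
print(sum(p.numel() for p in model.parameters()))
```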

Visual inspector for neuroevolution

During the neuroevolution process, we are evolving a population of individuals. Each individual is evaluated against the test environment (such as an Atari game), and a reward score is collected per individual for each generation of evolution. To explore the overall dynamics of the neuroevolution process, we need a tool that can visualize the cloud of results for all individuals in each generation. It is also useful to see how the fitness score of the elite individual changes, so that we can track the progress of the evolutionary process.

To address these requirements, the researchers from Uber AI developed the VINE tool, which we'll discuss next.
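
The snippet below mimics, with synthetic data, the kind of plot VINE produces: a per-generation cloud of individual fitness scores with the elite individual's score traced on top. It is a matplotlib illustration of the idea, not the VINE tool itself.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic fitness data: pop_size individuals per generation, with the
# population mean drifting upward as evolution progresses.
rng = np.random.default_rng(42)
n_generations, pop_size = 30, 100
fitness = rng.normal(loc=np.linspace(100, 4000, n_generations)[:, None],
                     scale=300.0, size=(n_generations, pop_size))

# The fitness cloud: one gray dot per individual per generation.
for gen in range(n_generations):
    plt.scatter([gen] * pop_size, fitness[gen], s=4, alpha=0.3, c='gray')
# The elite trace: the best score in each generation.
plt.plot(fitness.max(axis=1), c='red', label='elite fitness')
plt.xlabel('generation')
plt.ylabel('fitness (game score)')
plt.legend()
plt.show()
```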

Setting up the work environment

...

Exercises

  1. Try to increase the population_size parameter in the experiment and see what happens.
  2. Try to generate experiment results that can be visualized using VINE. You can use the master_extract_parent_ga and master_extract_cloud_ga helper functions in the ga.py script to do this.

Summary

In this chapter, we discussed how neuroevolution can be used to train large ANNs with more than 4 million trainable parameters. You learned how to apply this learning method to create successful agents that can play classic Atari games by learning the game rules solely from observing the game screen. By completing the Atari game-playing experiment described in this chapter, you learned about CNNs and how they can be used to map high-dimensional inputs, such as game screen observations, to the appropriate game actions. You now have a solid understanding of how a CNN can be used for value function approximation in deep RL when training is guided by the deep neuroevolution algorithm.

With the knowledge that you've acquired from this chapter, you will be able to apply deep neuroevolution methods in domains with high-dimensional input data...
