You're reading from Hands-On Neuroevolution with Python.

Product type: Book
Published: Dec 2019
Publisher: Packt
ISBN-13: 9781838824914
Pages: 368
Edition: 1st Edition
Languages: Python
Concepts: Neural Networks
Author: Iaroslav Omelianenko

Pole-Balancing Experiments

In this chapter, you will learn about a classic reinforcement learning experiment that is also an established benchmark for testing implementations of control strategies. We consider three modifications of the cart-pole balancing experiment and develop control strategies that can stabilize cart-pole apparatuses of the given configurations. You will learn how to write accurate simulations of real-life physical systems and how to use them to define the objective function for the NEAT algorithm. After this chapter, you will be ready to apply the NEAT algorithm to implement controllers that can directly control physical appliances.

In this chapter, we will cover the following topics:

The single-pole balancing problem in reinforcement learning
Implementation of the simulator of the cart-pole...

Technical requirements

The following technical requirements should be met to execute the experiments described in this chapter:

Windows 8/10, macOS 10.13 or newer, modern Linux
Anaconda Distribution version 2019.03 or newer

The code for this chapter can be found at https://github.com/PacktPublishing/Hands-on-Neuroevolution-with-Python/tree/master/Chapter4

The single-pole balancing problem

The single-pole balancer (or inverted pendulum) is an unstable pendulum that has its center of mass above its pivot point. It can be stabilized by applying external forces under the control of a specialized system that monitors the angle of the pole and moves the pivot point horizontally back and forth under the center of mass as it starts to fall. The single-pole balancer is a classic problem in dynamics and control theory that is used as a benchmark for testing control strategies, including strategies based on reinforcement learning methods. We are particularly interested in the implementation of the specific control algorithm that uses neuroevolution-based methods to stabilize the inverted pendulum for a given amount of time.

The experiment described in this chapter considers the simulation of the inverted pendulum implemented as a cart that...

Objective function for a single-pole balancing experiment

Our goal is to create a pole balancing controller that's able to maintain a system in a stable state within defined constraints for as long as possible, but at least for the expected number of time steps specified in the experiment configuration (500,000). Thus, the objective function must optimize the duration of stable pole-balancing and can be defined as the logarithmic difference between the expected number of steps and the actual number of steps obtained during the evaluation of the phenotype ANN. The loss function is given as follows:

$$\mathcal{L} = \frac{\log t_{max} - \log t_{eval}}{\log t_{max}}$$

In this experiment, $t_{max}$ is the expected number of time steps from the configuration of the experiment, and $t_{eval}$ is the actual number of time steps during which the controller was able to maintain a stable pole balancer state within allowed bounds (refer to the reinforcement signal definition...
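The loss described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the logarithmic difference is normalized by the logarithm of the expected step count so that the loss stays in the [0, 1] range, and the fitness is taken as the complement of the loss. The function name `eval_fitness` and its parameter names are chosen here for illustration and are not confirmed by the text.

```python
import math


def eval_fitness(steps: int, max_steps: int = 500_000) -> float:
    """Map the number of balanced time steps to a fitness score in [0, 1].

    Assumption: loss = (log(max_steps) - log(steps)) / log(max_steps)
    and fitness = 1 - loss.
    """
    if steps <= 0:
        # log(0) is undefined, so an immediate failure gets zero fitness.
        return 0.0
    if steps >= max_steps:
        # The controller balanced the pole for the whole expected duration.
        return 1.0
    loss = (math.log(max_steps) - math.log(steps)) / math.log(max_steps)
    return 1.0 - loss


if __name__ == "__main__":
    for balanced_steps in (10, 1_000, 100_000, 500_000):
        print(balanced_steps, round(eval_fitness(balanced_steps), 3))
```

The two early returns keep the function defined at the boundaries, where a controller either fails immediately or balances the pole for the full expected duration.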

The single-pole balancing experiment

Now that we have defined and implemented an objective function, along with a simulation of the cart-pole apparatus dynamics, we are ready to write the source code that runs the neuroevolutionary process with the NEAT algorithm. We will use the same NEAT-Python library as in the XOR experiment in the previous chapter, but with the NEAT hyperparameters adjusted appropriately. The hyperparameters are stored in the single_pole_config.ini file, which can be found in the source code repository related to this chapter. You need to copy this file into your local Chapter4 directory, which should already contain the Python script with the cart-pole simulator we created earlier.

Hyperparameter selection

...

Exercises

Try increasing the value of the node_add_prob parameter and see what happens. Does the algorithm produce any hidden nodes, and if so, how many?
Try decreasing or increasing the compatibility_threshold value. What happens if you set it to 2.0 or 6.0? Can the algorithm find a solution in each case?
Try setting the elitism value to zero in the DefaultReproduction section. How long does the evolutionary process take to find an acceptable solution in this case?
Set the survival_threshold value to 0.5 in the DefaultReproduction section. See how this affects speciation during evolution. Why does it have this effect?
Increase the additional_num_runs and additional_steps values by an order of magnitude to examine further how well the found control strategy generalizes. Is the algorithm still able to find a winning solution?

The last exercise will lead to an increase...
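One way to work through the configuration exercises above is to generate variants of the configuration programmatically rather than editing the file by hand. The following is a minimal sketch that uses only the standard-library `configparser` module. The `BASE_CONFIG` fragment and its values are hypothetical placeholders, not the contents of single_pole_config.ini, and the helper name `make_variant` is an assumption introduced for illustration.

```python
import configparser

# Hypothetical fragment in the INI layout used by NEAT-Python.
# The values are placeholders, NOT those from single_pole_config.ini.
BASE_CONFIG = """
[DefaultGenome]
node_add_prob = 0.02

[DefaultSpeciesSet]
compatibility_threshold = 4.0

[DefaultReproduction]
elitism = 2
survival_threshold = 0.2
"""


def make_variant(base_text, section, key, value):
    """Return a parser holding base_text with one option overridden."""
    parser = configparser.ConfigParser()
    parser.read_string(base_text)
    if not parser.has_option(section, key):
        # Guard against typos that would silently add a new option.
        raise KeyError(f"{section}.{key} is not in the base configuration")
    parser.set(section, key, str(value))
    return parser


# Variant for the elitism exercise: keep no elite genomes.
variant = make_variant(BASE_CONFIG, "DefaultReproduction", "elitism", 0)
print(variant.get("DefaultReproduction", "elitism"))
```

To use a variant in an experiment, write it to a file with `parser.write(file_object)` and pass that file's path to the experiment in place of the original configuration.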

The double-pole balancing problem

The single-pole balancing problem is easy enough for the NEAT algorithm, which can quickly find the optimal control strategy to maintain a stable system state. To make the experiment more challenging, we present a more advanced version of the cart-pole balancing problem. In this version, two poles are connected to the moving cart by a hinge.

A schema of the new cart-poles apparatus is as follows:

[Figure: The cart-poles apparatus with two poles]

Before we move to the implementation details of the experiment, we need to define the state variables and equations of motion for the simulation of the double-pole balancing system.

The system state and equations of motion

The goal of the controller is...

Objective function for a double-pole balancing experiment

The objective function for this problem is similar to the objective function defined earlier for the single-pole balancing problem. It is given by the following equations:

$$\mathcal{L} = \frac{\log t_{max} - \log t_{eval}}{\log t_{max}}$$

$$\mathcal{F} = 1 - \mathcal{L}$$

In these equations, $t_{max}$ is the expected number of time steps specified in the configuration of the experiment (100,000), and $t_{eval}$ is the actual number of time steps during which the controller was able to maintain a stable state of the pole balancer within the specified limits.

We use a logarithmic scale because most trials fail within the first few hundred steps, whereas we are testing against 100,000 steps. On a logarithmic scale, fitness scores are better distributed, so even the small step counts of failed trials yield clearly distinguishable scores.

The first of the preceding equations defines the loss, which is in the [0,1] range, and the second is a fitness score...
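A short worked example illustrates the effect of the logarithmic scale. It assumes that the loss is the logarithmic difference normalized by the logarithm of the expected step count and that the fitness is the complement of the loss. The symbols $t_{max}$ (expected steps) and $t_{eval}$ (achieved steps) are labels used here for illustration.

```latex
% Assumed definitions: loss is the normalized log difference, fitness is 1 - loss.
\mathcal{F}_{log} = 1 - \frac{\log t_{max} - \log t_{eval}}{\log t_{max}}
                  = \frac{\log t_{eval}}{\log t_{max}},
\qquad
\mathcal{F}_{lin} = \frac{t_{eval}}{t_{max}}

% With t_{max} = 100{,}000:
t_{eval} = 100:  \quad \mathcal{F}_{lin} = 0.001, \quad \mathcal{F}_{log} = \tfrac{2}{5} = 0.4
t_{eval} = 1000: \quad \mathcal{F}_{lin} = 0.01,  \quad \mathcal{F}_{log} = \tfrac{3}{5} = 0.6
```

On a linear scale, the two failed trials differ by less than 0.01 and are nearly indistinguishable, whereas on the logarithmic scale they differ by 0.2, which gives selection a usable gradient among early failures.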

Double-pole balancing experiment

This experiment uses a version of the double-pole balancing problem that assumes full knowledge of the current system state, including the angular velocities of the poles and the velocity of the cart. The success criterion in this experiment is to keep both poles balanced for 100,000 steps, or approximately 33 minutes of simulated time. A pole is considered balanced when it stays within ±36 degrees of vertical, while the cart remains within ±2.4 meters of the track's center.
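The success criterion can be expressed as a simple bounds check plus a step count. The sketch below is illustrative only: the ±36 degree and ±2.4 meter limits are the values conventionally used for this benchmark and are treated as assumptions here, the 0.02 s time step is inferred from 100,000 steps corresponding to roughly 33 minutes, and the names `is_balanced` and `simulated_minutes` are hypothetical.

```python
import math

# Assumed limits (conventional values for the double-pole benchmark).
MAX_POLE_ANGLE_RAD = math.radians(36.0)
MAX_CART_OFFSET_M = 2.4
# Assumed time step, inferred from 100,000 steps being about 33 minutes.
TIME_STEP_S = 0.02


def is_balanced(cart_x, pole1_angle, pole2_angle,
                max_x=MAX_CART_OFFSET_M, max_angle=MAX_POLE_ANGLE_RAD):
    """Return True while the cart and both poles stay within the limits.

    cart_x is the offset from the track's center in meters; the pole
    angles are measured from vertical in radians.
    """
    return (abs(cart_x) <= max_x
            and abs(pole1_angle) <= max_angle
            and abs(pole2_angle) <= max_angle)


def simulated_minutes(steps, dt=TIME_STEP_S):
    """Convert a number of simulation steps to minutes of simulated time."""
    return steps * dt / 60.0


if __name__ == "__main__":
    print(is_balanced(0.5, 0.1, -0.2))
    print(round(simulated_minutes(100_000), 1))
```

A simulation loop would call `is_balanced` after every step and stop counting as soon as it returns False; the resulting count is the number of steps the controller survived.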

Hyperparameter selection

Compared to the previous experiment described in this chapter, double-pole balancing is much harder to solve due to its complex motion dynamics. Thus, the search space for a successful control...

Exercises

Try setting the node_add_prob parameter value to 0.02 in the configuration file and see what happens.
Change the seed value of the random number generator and see what happens. Was a solution found with a new value? How is it different from what we have presented in this chapter?

Summary

In this chapter, we learned how to implement control strategies for controllers that can maintain a stable state of a cart-pole apparatus with one or two poles mounted on top. We improved our Python skills and expanded our knowledge of the NEAT-Python library by implementing accurate simulations of physical apparatuses, which were used to define the objective functions for the experiments. We also learned about two methods for the numerical approximation of differential equations, Euler's method and the Runge-Kutta method, and implemented them in Python.
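As a reminder of how the two numerical methods differ, here is a minimal, generic sketch that is not the chapter's cart-pole code. It applies a forward Euler step and a classical fourth-order Runge-Kutta step to the scalar test equation dy/dt = -y, whose exact solution is known. The function names and the test equation are assumptions chosen for illustration; a cart-pole simulator would apply the same update to a vector of state variables.

```python
import math


def euler_step(f, t, y, dt):
    """Advance y by one forward Euler step of size dt."""
    return y + dt * f(t, y)


def rk4_step(f, t, y, dt):
    """Advance y by one classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt / 2 * k1)
    k3 = f(t + dt / 2, y + dt / 2 * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(step, f, y0, dt, n_steps):
    """Apply the given step function n_steps times, starting at t = 0."""
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = step(f, t, y, dt)
        t += dt
    return y


def decay(t, y):
    """Right-hand side of the test equation dy/dt = -y."""
    return -y


exact = math.exp(-1.0)  # exact solution y(1) for y(0) = 1
euler_error = abs(integrate(euler_step, decay, 1.0, 0.1, 10) - exact)
rk4_error = abs(integrate(rk4_step, decay, 1.0, 0.1, 10) - exact)

if __name__ == "__main__":
    print(f"Euler error: {euler_error:.6f}")
    print(f"RK4 error:   {rk4_error:.2e}")
```

With the same step size, the Runge-Kutta step costs four evaluations of the right-hand side instead of one, but its error on this test problem is several orders of magnitude smaller.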

We found that the initial conditions that determine the neuroevolutionary process, such as a random seed number, have a significant impact on the performance of the algorithm. These values determine the entire sequence of numbers that will be generated by a random number generator. They serve as a random attractor that can amplify...