Hands-On Neuroevolution with Python

Product type: Book
Published: Dec 2019
Publisher: Packt
ISBN-13: 9781838824914
Pages: 368
Edition: 1st Edition
Author: Iaroslav Omelianenko

Table of Contents

  • Preface
  • Section 1: Fundamentals of Evolutionary Computation Algorithms and Neuroevolution Methods
      • Overview of Neuroevolution Methods
      • Python Libraries and Environment Setup
  • Section 2: Applying Neuroevolution Methods to Solve Classic Computer Science Problems
      • Using NEAT for XOR Solver Optimization
      • Pole-Balancing Experiments
      • Autonomous Maze Navigation
      • Novelty Search Optimization Method
  • Section 3: Advanced Neuroevolution Methods
      • Hypercube-Based NEAT for Visual Discrimination
      • ES-HyperNEAT and the Retina Problem
      • Co-Evolution and the SAFE Method
      • Deep Neuroevolution
  • Section 4: Discussion and Concluding Remarks
      • Best Practices, Tips, and Tricks
      • Concluding Remarks
  • Other Books You May Enjoy

Pole-Balancing Experiments

In this chapter, you will learn about a classic reinforcement learning experiment that is also an established benchmark for testing implementations of control strategies. We consider three modifications of the cart-pole balancing experiment and develop control strategies that stabilize cart-pole apparatuses of the given configurations. You will learn how to write accurate simulations of real-life physical systems and how to use them to define the objective function for the NEAT algorithm. After this chapter, you will be ready to apply the NEAT algorithm to implement controllers that can directly control physical devices.

In this chapter, we will cover the following topics:

  • The single-pole balancing problem in reinforcement learning
  • Implementation of the simulator of the cart-pole...

Technical requirements

The single-pole balancing problem

The single-pole balancer (or inverted pendulum) is an unstable pendulum that has its center of mass above its pivot point. It can be stabilized by applying external forces under the control of a specialized system that monitors the angle of the pole and moves the pivot point horizontally back and forth under the center of mass as the pole starts to fall. The single-pole balancer is a classic problem in dynamics and control theory, and it is used as a benchmark for testing control strategies, including those based on reinforcement learning methods. We are particularly interested in implementing a control algorithm that uses neuroevolution-based methods to stabilize the inverted pendulum for a given amount of time.

The experiment described in this chapter considers the simulation of the inverted pendulum implemented as a cart that...
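To make the idea of a cart-pole simulation concrete, here is a minimal sketch of one simulation step using the textbook frictionless cart-pole equations and explicit Euler integration. The physical constants, the failure limits (2.4 m and 12 degrees), the bang-bang force rule, and every function name are assumptions chosen for illustration; they are not taken from the book's source code.

```python
import math

# All constants below are ASSUMED, conventional benchmark values.
GRAVITY = 9.8             # m/s^2
MASS_CART = 1.0           # kg
MASS_POLE = 0.1           # kg
HALF_POLE_LENGTH = 0.5    # m, pivot to the pole's center of mass
FORCE_MAG = 10.0          # N, magnitude of the push applied to the cart
TAU = 0.02                # s, integration time step
X_LIMIT = 2.4             # m, assumed allowed cart displacement
THETA_LIMIT = math.radians(12)  # rad, assumed allowed pole angle


def cart_pole_step(x, x_dot, theta, theta_dot, force):
    """Advance the state by one time step with explicit Euler integration."""
    total_mass = MASS_CART + MASS_POLE
    pole_mass_length = MASS_POLE * HALF_POLE_LENGTH
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass

    return (
        x + TAU * x_dot,
        x_dot + TAU * x_acc,
        theta + TAU * theta_dot,
        theta_dot + TAU * theta_acc,
    )


def run_simulation(controller, max_steps, state=(0.0, 0.0, 0.05, 0.0)):
    """Return the number of steps the controller keeps the system in bounds.

    `controller` is a hypothetical callable that maps the state tuple to a
    value in [0, 1]; values above 0.5 push the cart right, others push left.
    """
    for step in range(max_steps):
        x, _, theta, _ = state
        if abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT:
            return step
        force = FORCE_MAG if controller(state) > 0.5 else -FORCE_MAG
        state = cart_pole_step(*state, force)
    return max_steps
```

In an experiment of this kind, the controller callable would wrap the activation of an evolved network, and the returned step count would feed the objective function.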

Objective function for a single-pole balancing experiment

Our goal is to create a pole-balancing controller that can keep the system in a stable state within the defined constraints for as long as possible, and at least for the expected number of time steps specified in the experiment configuration (500,000). Thus, the objective function must optimize the duration of stable pole balancing. It can be defined through the logarithmic difference between the expected number of steps and the actual number of steps obtained during the evaluation of the phenotype ANN. The loss function is given as follows:

$$\mathcal{L} = \frac{\log t_{max} - \log t}{\log t_{max}}$$

In this experiment, $t_{max}$ is the expected number of time steps from the configuration of the experiment, and $t$ is the actual number of time steps during which the controller was able to maintain a stable pole balancer state within allowed bounds (refer to the reinforcement signal definition...
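The objective described above can be sketched in a few lines of Python. This is a sketch under stated assumptions: the loss is taken to be the logarithmic difference normalized by the logarithm of the expected step count, the fitness is taken to be the complement of the loss, and the function name and the handling of the edge cases are illustrative choices.

```python
import math


def eval_fitness(steps, max_steps=500_000):
    """Map the number of balanced steps to a fitness score in [0, 1].

    ASSUMPTION: loss = (log(max_steps) - log(steps)) / log(max_steps),
    and fitness = 1 - loss.
    """
    if steps >= max_steps:
        return 1.0  # the controller survived the whole evaluation
    if steps <= 0:
        return 0.0  # avoid log(0) when the controller fails immediately
    loss = (math.log(max_steps) - math.log(steps)) / math.log(max_steps)
    return 1.0 - loss
```

Guarding the zero-step case explicitly keeps the logarithm defined, and clamping at the expected step count guarantees that a fully successful controller receives the maximal score.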

The single-pole balancing experiment

Now that we have defined and implemented an objective function along with a simulation of the cart-pole apparatus dynamics, we are ready to write the source code that runs the neuroevolutionary process with the NEAT algorithm. We will use the same NEAT-Python library as in the XOR experiment in the previous chapter, but with the NEAT hyperparameters adjusted appropriately. The hyperparameters are stored in the single_pole_config.ini file, which can be found in the source code repository related to this chapter. Copy this file into your local Chapter4 directory, which should already contain the Python script with the cart-pole simulator we created earlier.
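A minimal sketch of such a runner is shown below. It uses the standard NEAT-Python entry points (`neat.Config`, `neat.Population`, `neat.StdOutReporter`, and `neat.nn.FeedForwardNetwork.create`); the helper names `make_eval_genomes` and `run_experiment`, the `evaluate_net` callable, and the default number of generations are assumptions made for illustration and are not the book's code.

```python
def make_eval_genomes(evaluate_net, create_net):
    """Build the fitness callback in the form NEAT-Python expects.

    `evaluate_net` is a hypothetical callable that scores a phenotype network;
    `create_net` turns a genome into a network.
    """
    def eval_genomes(genomes, config):
        for _genome_id, genome in genomes:
            net = create_net(genome, config)
            genome.fitness = evaluate_net(net)
    return eval_genomes


def run_experiment(config_file, evaluate_net, n_generations=100):
    """Run the evolutionary process and return the best genome found."""
    import neat  # third-party NEAT-Python package, imported lazily

    config = neat.Config(
        neat.DefaultGenome,
        neat.DefaultReproduction,
        neat.DefaultSpeciesSet,
        neat.DefaultStagnation,
        config_file,
    )
    population = neat.Population(config)
    population.add_reporter(neat.StdOutReporter(True))

    eval_genomes = make_eval_genomes(
        evaluate_net, neat.nn.FeedForwardNetwork.create
    )
    return population.run(eval_genomes, n_generations)
```

Under these assumptions, a call such as `run_experiment("single_pole_config.ini", evaluate_net)` would start the evolution, where `evaluate_net` runs the cart-pole simulation with the network as the controller and returns its fitness.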

Hyperparameter selection

...

Exercises

  1. Try increasing the value of the node_add_prob parameter and see what happens. Does the algorithm produce any hidden nodes and, if so, how many?
  2. Try decreasing or increasing the compatibility_threshold value. What happens if you set it to 2.0 or 6.0? Can the algorithm find a solution in each case?
  3. Try setting the elitism value to zero in the DefaultReproduction section and see what happens. How long does the evolutionary process take to find an acceptable solution in this case?
  4. Set the survival_threshold value to 0.5 in the DefaultReproduction section and see how this affects speciation during evolution. Why does it have this effect?
  5. Increase the additional_num_runs and additional_steps values by an order of magnitude to examine further how well the found control strategy generalizes. Is the algorithm still able to find a winning solution?
The last exercise will lead to an increase...
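When working through exercises like these, it can be convenient to generate config variants programmatically rather than editing the file by hand. The sketch below uses only Python's standard `configparser` module. The sample config text and its values are placeholders, not the contents of single_pole_config.ini, and the helper name is hypothetical; the section names follow the NEAT-Python convention.

```python
import configparser
import io

# PLACEHOLDER config fragment with illustrative values only.
SAMPLE_CONFIG = """
[DefaultGenome]
node_add_prob = 0.02

[DefaultSpeciesSet]
compatibility_threshold = 4.0

[DefaultReproduction]
elitism = 2
survival_threshold = 0.2
"""


def override_hyperparameters(config_text, overrides):
    """Return new config text with the given (section, option) values replaced."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    for (section, option), value in overrides.items():
        if not parser.has_option(section, option):
            raise KeyError(f"{section}.{option} not found in config")
        parser.set(section, option, str(value))
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Example: variants for the elitism and survival_threshold exercises.
variant = override_hyperparameters(
    SAMPLE_CONFIG,
    {
        ("DefaultReproduction", "elitism"): 0,
        ("DefaultReproduction", "survival_threshold"): 0.5,
    },
)
```

The resulting text can be written to a separate file so that each experiment run keeps its own record of the hyperparameters used.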

The double-pole balancing problem

The single-pole balancing problem is easy for the NEAT algorithm, which can quickly find an optimal control strategy that maintains a stable system state. To make the experiment more challenging, we present a more advanced version of the cart-pole balancing problem. In this version, two poles are connected to the moving cart by a hinge.

A schema of the new cart-poles apparatus is shown in the following figure:

[Figure: The cart-poles apparatus with two poles]

Before we move to the implementation details of the experiment, we need to define the state variables and equations of motion for the simulation of the double-pole balancing system.

The system state and equations of motion

The goal of the controller is...
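The double-pole equations of motion are not reproduced in this excerpt, so the following sketch only illustrates the numerical side: how a system of first-order differential equations can be advanced with an explicit Euler step or with a classical fourth-order Runge-Kutta step. The harmonic oscillator used as the test system and all function names are assumptions chosen for illustration.

```python
import math


def euler_step(f, t, y, h):
    """One explicit Euler step for the system y' = f(t, y)."""
    dy = f(t, y)
    return [yi + h * di for yi, di in zip(y, dy)]


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
    k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)])
    return [
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def integrate(step, f, y0, h, n_steps):
    """Apply `step` repeatedly, starting from y0 at t = 0."""
    t, y = 0.0, list(y0)
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y


def harmonic_oscillator(t, y):
    """Illustrative system x'' = -x, written as [x, v]' = [v, -x]."""
    x, v = y
    return [v, -x]


# Compare both methods against the exact solution x(1) = cos(1).
euler_x = integrate(euler_step, harmonic_oscillator, [1.0, 0.0], 0.01, 100)[0]
rk4_x = integrate(rk4_step, harmonic_oscillator, [1.0, 0.0], 0.01, 100)[0]
euler_error = abs(euler_x - math.cos(1.0))
rk4_error = abs(rk4_x - math.cos(1.0))
```

For the same step size, the Runge-Kutta step costs four evaluations of the derivative function instead of one, but its error is several orders of magnitude smaller, which matters for a simulation that must stay accurate over many steps.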

Objective function for a double-pole balancing experiment

The objective function for this problem is similar to the one defined earlier for the single-pole balancing problem. It is given by the following equations:

$$\mathcal{L} = \frac{\log t_{max} - \log t}{\log t_{max}}$$

$$\mathcal{F} = 1 - \mathcal{L}$$

In these equations, $t_{max}$ is the expected number of time steps specified in the configuration of the experiment (100,000), and $t$ is the actual number of time steps during which the controller was able to maintain a stable state of the pole balancer within the specified limits.

We use a logarithmic scale because most trials fail within the first few hundred steps, while we are testing against 100,000 steps. On a logarithmic scale, the fitness scores are better distributed, so even failed trials with a small number of steps can be told apart.

The first of the preceding equations defines the loss, which is in the [0,1] range, and the second is a fitness score...
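The effect of the logarithmic scale can be checked with a short worked example. The sketch below compares a naive linear score (steps divided by the expected steps) with a logarithmic score, assuming the loss is the logarithmic difference normalized by the logarithm of the expected step count and the fitness is its complement. The function names and the sample step counts are illustrative.

```python
import math

MAX_STEPS = 100_000  # expected number of steps stated in the text


def linear_fitness(steps, max_steps=MAX_STEPS):
    """Naive score: the fraction of the expected steps that was reached."""
    return steps / max_steps


def log_fitness(steps, max_steps=MAX_STEPS):
    """ASSUMED form: 1 - (log(max_steps) - log(steps)) / log(max_steps)."""
    loss = (math.log(max_steps) - math.log(steps)) / math.log(max_steps)
    return 1.0 - loss


# Print both scores for a few typical early failures and for full success.
for steps in (100, 300, 1000, MAX_STEPS):
    print(steps, round(linear_fitness(steps), 4), round(log_fitness(steps), 4))
```

On the linear scale, trials that fail after 100 and after 1,000 steps differ by less than 0.01, so selection can hardly tell them apart. On the logarithmic scale, the same trials score about 0.4 and 0.6, which gives evolution a usable gradient among early failures.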

Double-pole balancing experiment

This experiment uses a version of the double-pole balancing problem that assumes full knowledge of the current system state, including the angular velocities of the poles and the velocity of the cart. The success criterion in this experiment is to keep both poles balanced for 100,000 steps, or approximately 33 minutes of simulated time. A pole is considered balanced when it stays within ±36 degrees of vertical, while the cart remains within ±2.4 meters of the track's center.
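The success criterion can be expressed as a simple bounds check. In the sketch below, the limits of 36 degrees and 2.4 meters are the conventional values for this benchmark and should be treated as assumptions, as should the 0.02-second time step, which is inferred from 100,000 steps corresponding to roughly 33 minutes. The function names are illustrative.

```python
import math

MAX_ANGLE = math.radians(36)  # ASSUMED allowed deviation from vertical, rad
MAX_POSITION = 2.4            # ASSUMED allowed distance from track center, m
STEP_SECONDS = 0.02           # ASSUMED time step, inferred from ~33 minutes


def is_balanced(x, theta1, theta2):
    """Return True while the cart and both poles stay within the limits."""
    return (
        abs(x) <= MAX_POSITION
        and abs(theta1) <= MAX_ANGLE
        and abs(theta2) <= MAX_ANGLE
    )


def simulated_minutes(steps, step_seconds=STEP_SECONDS):
    """Convert a number of simulation steps to minutes of simulated time."""
    return steps * step_seconds / 60.0
```

A simulation loop would call `is_balanced` after every integration step and stop counting steps as soon as it returns False.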

Hyperparameter selection

Compared to the previous experiment described in this chapter, double-pole balancing is much harder to solve due to its complex motion dynamics. Thus, the search space for a successful control...

Exercises

  1. Try setting the node_add parameter value to 0.02 in the configuration file and see what happens.
  2. Change the seed value of the random number generator and see what happens. Was a solution found with a new value? How is it different from what we have presented in this chapter?
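The role of the seed in the second exercise can be illustrated with Python's standard `random` module: the seed fixes the entire sequence of pseudo-random numbers, so two runs with the same seed make identical stochastic choices, while a different seed leads to a different trajectory. This sketch assumes that the experiment draws its randomness from a generator seeded this way; the function name is illustrative.

```python
import random


def sample_sequence(seed, n=5):
    """Return the first n pseudo-random numbers produced for a given seed."""
    rng = random.Random(seed)  # independent generator, global state untouched
    return [rng.random() for _ in range(n)]


same_a = sample_sequence(42)
same_b = sample_sequence(42)
different = sample_sequence(43)
```

Here `same_a` and `same_b` are identical, whereas `different` is not, which is why recording the seed alongside the results makes an evolutionary run reproducible.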

Summary

In this chapter, we learned how to implement control strategies that can maintain a stable state of a cart-pole apparatus with one or two poles mounted on top. We improved our Python skills and expanded our knowledge of the NEAT-Python library by implementing accurate simulations of physical apparatuses, which were used to define the objective functions for the experiments. In addition, we learned about two methods for the numerical approximation of differential equations, the Euler and Runge-Kutta methods, and implemented them in Python.

We found that the initial conditions that determine the neuroevolutionary process, such as a random seed number, have a significant impact on the performance of the algorithm. These values determine the entire sequence of numbers that will be generated by a random number generator. They serve as a random attractor that can amplify...
