You're reading from Python Reinforcement Learning Projects

Product type: Book
Published in: Sep 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788991612
Edition: 1st Edition
Authors (3):
Sean Saito

Sean Saito is the youngest ever Machine Learning Developer at SAP and the first bachelor hired for the position. He currently researches and develops machine learning algorithms that automate financial processes. He graduated from Yale-NUS College in 2017 with a Bachelor of Science degree (with Honours), where he explored unsupervised feature extraction for his thesis. Having a profound interest in hackathons, Sean represented Singapore during Data Science Game 2016, the largest student data science competition. Before attending university in Singapore, Sean grew up in Tokyo, Los Angeles, and Boston.

Yang Wenzhuo

Yang Wenzhuo works as a Data Scientist at SAP, Singapore. He received a bachelor's degree in computer science from Zhejiang University in 2011 and a Ph.D. in machine learning from the National University of Singapore in 2016. His research focuses on optimization in machine learning and deep reinforcement learning. He has published papers at top machine learning and computer vision conferences, including ICML and CVPR, and in operations research journals, including Mathematical Programming.

Rajalingappaa Shanmugamani

Rajalingappaa Shanmugamani is currently working as an Engineering Manager for a deep learning team at Kairos. Previously, he worked as a Senior Machine Learning Developer at SAP, Singapore, and at various startups developing machine learning products. He has a Master's degree from the Indian Institute of Technology Madras. He has published articles in peer-reviewed journals and conferences and has submitted applications for several patents in the area of machine learning. In his spare time, he teaches programming and machine learning to school students and engineers.

Chapter 2. Balancing CartPole

In this chapter, you will learn about the CartPole balancing problem. The CartPole is an inverted pendulum, where the pole is balanced against gravity. Traditionally, this problem is solved by control theory, using analytical equations. However, in this chapter, we will solve the problem with machine learning.

The following topics will be covered in this chapter:

  • Installing OpenAI Gym
  • The different environments of Gym

OpenAI Gym


OpenAI is a non-profit organization dedicated to researching artificial intelligence. Visit https://openai.com for more information about the mission of OpenAI. The technologies developed by OpenAI are free for anyone to use.

Gym

Gym is a toolkit for benchmarking AI-based tasks. Its interface is easy to use, and its goal is to enable reproducible research. Visit https://gym.openai.com for more information about Gym. An agent can be trained inside Gym to learn activities such as playing games or walking. Gym is organized as a library of environments, each of which poses a problem for the agent to solve.

The standard set of problems provided in Gym includes the following:

  • CartPole
  • Pendulum
  • Space Invaders
  • Lunar Lander
  • Ant
  • Mountain Car
  • Acrobot
  • Car Racing
  • Bipedal Walker

Any algorithm can be trained on these activities in Gym. Because all of the problems share the same interface, any general reinforcement learning algorithm can be applied to them through that interface.
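The idea of a shared interface can be pictured with a minimal sketch. The class below is a hypothetical stand-in written in plain Python, not Gym's own code; the names ToyEnvironment and run_episode and the corridor task are assumptions made for illustration. It mirrors the reset/step pattern, in which each step returns an observation, a reward, a done flag, and an info dictionary, so that the same loop can drive any environment that follows the pattern.

```python
import random


class ToyEnvironment:
    """Hypothetical stand-in for an environment: a corridor of positions 0..4.

    The agent starts in the middle; the episode ends at either end, and only
    the right end gives a reward.
    """

    def __init__(self, size=5):
        self.size = size
        self.position = size // 2

    def reset(self):
        # Put the agent back at the start and return the first observation.
        self.position = self.size // 2
        return self.position

    def step(self, action):
        # Action 0 moves left, action 1 moves right.
        self.position += 1 if action == 1 else -1
        done = self.position in (0, self.size - 1)
        reward = 1.0 if self.position == self.size - 1 else 0.0
        return self.position, reward, done, {}


def run_episode(env, policy, max_steps=100):
    """Run one episode with any policy that maps an observation to an action."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


if __name__ == "__main__":
    random.seed(0)
    # A policy that ignores the observation and acts at random.
    print(run_episode(ToyEnvironment(), lambda observation: random.choice([0, 1])))
```

Because run_episode only relies on reset and step, swapping in a different environment or a different policy requires no change to the loop itself.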

Installation 

Gym is used primarily through its Python interface. Once you...

Markov models


The problem is set up as a reinforcement learning problem and solved by trial and error. The environment is described using state_values, and the state_values are changed by actions. The actions are determined by an algorithm, based on the current state_value, in order to reach a particular target state_value. A model of this kind is termed a Markov model. In general, past state_values do influence future state_values, but here we assume that the current state_value encodes all of the previous state_values. There are two types of state_values: observable and non-observable. A model that also takes the non-observable state_values into account is called a Hidden Markov model.
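The assumption that the current state_value is enough can be made concrete with a small sketch. The two state_values, the two actions, and every probability below are made up for illustration, and the names TRANSITIONS, next_state, and simulate are hypothetical. The point is only that the function sampling the next state_value takes the current state_value and the chosen action, and nothing from the earlier history.

```python
import random

# Hypothetical transition probabilities: TRANSITIONS[(state, action)][next_state].
TRANSITIONS = {
    ("upright", "hold"): {"upright": 0.9, "tilted": 0.1},
    ("upright", "push"): {"upright": 0.6, "tilted": 0.4},
    ("tilted", "hold"): {"upright": 0.1, "tilted": 0.9},
    ("tilted", "push"): {"upright": 0.7, "tilted": 0.3},
}


def next_state(state, action, rng):
    """Sample the next state from the current state and action only."""
    row = TRANSITIONS[(state, action)]
    candidates = list(row)
    weights = [row[candidate] for candidate in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def simulate(start, policy, steps, rng):
    """Return the visited states; the policy sees only the current state."""
    trajectory = [start]
    for _ in range(steps):
        current = trajectory[-1]
        trajectory.append(next_state(current, policy(current), rng))
    return trajectory


if __name__ == "__main__":
    # Push only when the state is "tilted"; otherwise hold.
    policy = lambda state: "push" if state == "tilted" else "hold"
    print(simulate("upright", policy, 10, random.Random(0)))
```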

CartPole

At each step, several variables of the cart and pole can be observed, such as the position, velocity, angle, and angular velocity. The possible actions of the cart are to move right or left:

  1. state_values: Four dimensions of continuous values...
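A trial and error approach to this setup can be sketched in plain Python. The block below is an illustration under stated assumptions, not the chapter's own code and not Gym's implementation: it simulates a simplified cart and pole with commonly used textbook constants (all of which are assumptions here), represents the state_values as four continuous numbers, offers two actions (0 for left, 1 for right), and tries random linear policies, keeping whichever one balances the pole longest. The names step, run_episode, and random_search are hypothetical.

```python
import math
import random

# Assumed physical constants for the simplified simulation.
GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
POLE_HALF_LENGTH = 0.5
FORCE = 10.0
TIME_STEP = 0.02
MAX_ANGLE = 12 * math.pi / 180  # the pole counts as fallen beyond 12 degrees
MAX_POSITION = 2.4              # the cart counts as out of bounds beyond this


def step(state, action):
    """Advance the four state_values by one time step (Euler integration)."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass
    return (
        x + TIME_STEP * x_dot,
        x_dot + TIME_STEP * x_acc,
        theta + TIME_STEP * theta_dot,
        theta_dot + TIME_STEP * theta_acc,
    )


def run_episode(weights, rng, max_steps=200):
    """Return how many steps a linear policy keeps the pole balanced."""
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    for t in range(max_steps):
        score = sum(w * s for w, s in zip(weights, state))
        action = 1 if score > 0 else 0  # push right if the score is positive
        state = step(state, action)
        if abs(state[0]) > MAX_POSITION or abs(state[2]) > MAX_ANGLE:
            return t + 1
    return max_steps


def random_search(trials, rng):
    """Trial and error: sample random weights and keep the best performer."""
    best_weights, best_steps = None, -1
    for _ in range(trials):
        weights = [rng.uniform(-1, 1) for _ in range(4)]
        steps = run_episode(weights, rng)
        if steps > best_steps:
            best_weights, best_steps = weights, steps
    return best_weights, best_steps


if __name__ == "__main__":
    weights, steps = random_search(trials=100, rng=random.Random(0))
    print("best weights:", weights, "steps balanced:", steps)
```

How many trials are needed before a policy lasts the full episode varies with the random seed, so the trial count here is only a starting point.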

Summary


In this chapter, you learned about the OpenAI Gym, used in reinforcement learning projects. You saw several examples of the training platform provided out of the box. Then, we formulated the problem of the CartPole, and made the CartPole balance by using a trial and error approach.

In the next chapter, you will learn how to play Atari games by using the Gym and a reinforcement learning approach. 
