
You're reading from Deep Learning with TensorFlow and Keras, Third Edition

Product type: Book
Published in: Oct 2022
Publisher: Packt
ISBN-13: 9781803232911
Edition: Third
Authors (3):
Amita Kapoor

Amita Kapoor is an accomplished AI consultant and educator, with over 25 years of experience. She has received international recognition for her work, including the DAAD fellowship and the Intel Developer Mesh AI Innovator Award. She is a highly respected scholar in her field, with over 100 research papers and several best-selling books on deep learning and AI. After teaching for 25 years at the University of Delhi, Amita took early retirement and turned her focus to democratizing AI education. She currently serves as a member of the Board of Directors for the non-profit Neuromatch Academy, fostering greater accessibility to knowledge and resources in the field. Following her retirement, Amita also founded NePeur, a company that provides data analytics and AI consultancy services. In addition, she shares her expertise with a global audience by teaching online classes on data science and AI at the University of Oxford.

Antonio Gulli

Antonio Gulli has a passion for establishing and managing global technological talent for innovation and execution. His core expertise is in cloud computing, deep learning, and search engines. Currently, Antonio works for Google in the Cloud Office of the CTO in Zurich, working on Search, Cloud Infra, Sovereignty, and Conversational AI.

Sujit Pal

Sujit Pal is a Technology Research Director at Elsevier Labs, an advanced technology group within the Reed-Elsevier Group of companies. His interests include semantic search, natural language processing, machine learning, and deep learning. At Elsevier, he has worked on several initiatives involving search quality measurement and improvement, image classification and duplicate detection, and annotation and ontology development for medical and scientific corpora.


Reinforcement Learning

This chapter introduces Reinforcement Learning (RL), the least explored yet most promising of the learning paradigms. Reinforcement learning is very different from the supervised and unsupervised models we covered in earlier chapters. Starting from a clean slate (that is, with no prior information), an RL agent goes through multiple stages of trial and error and learns to achieve a goal, with feedback from the environment as its only input. Research in RL by OpenAI suggests that continuous competition can drive the evolution of intelligence. Many deep learning practitioners believe that RL will play an important role in realizing the big AI dream: Artificial General Intelligence (AGI). This chapter delves into different RL algorithms. The following topics will be covered:

  • What RL is and its terminology
  • How to use the OpenAI Gym interface
  • Applications of RL
  • Deep Q-Networks
  • Policy...

An introduction to RL

What is common between a baby learning to walk, birds learning to fly, and an RL agent learning to play an Atari game? Well, all three involve:

  • Trial and error: The child (or the bird) tries various ways, fails many times, and succeeds in some ways before it can really walk (or fly). The RL agent plays many games, winning some and losing many, before it can become reliably successful.
  • Goal: The child has the goal to walk, the bird to fly, and the RL agent to win the game.
  • Interaction with the environment: The only feedback they have is from their environment.

So, the first questions that arise are: what is RL, and how does it differ from supervised and unsupervised learning? Anyone who owns a pet knows that the best strategy for training it is rewarding desirable behavior and disciplining bad behavior. RL, also called learning with a critic, is a learning paradigm where the agent learns in the same manner. The agent...
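The three ingredients above, trial and error, a goal, and interaction with the environment, can be captured in a minimal agent-environment loop. The following sketch uses a hypothetical toy environment (the GridEnvironment class is our own invention, not from any RL library):

```python
import random

class GridEnvironment:
    """A toy 1-D world: the agent starts at position 0 and must reach position 4."""
    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); reward is given only on reaching the goal
        self.position = max(0, self.position + action)
        done = self.position == 4
        reward = 1.0 if done else 0.0
        return self.position, reward, done

random.seed(0)
env = GridEnvironment()
state = env.reset()
total_reward, done = 0.0, False
while not done:
    action = random.choice([-1, 1])        # trial and error: no prior knowledge
    state, reward, done = env.step(action)
    total_reward += reward                 # the only input is environment feedback
print(total_reward)  # 1.0, earned on the step that reaches the goal
```

The agent here never improves; a real RL algorithm would use the reward signal to prefer actions that move it toward the goal.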

Simulation environments for RL

As mentioned earlier, trial and error is an important component of any RL algorithm. Therefore, it makes sense to first train our RL agent in a simulated environment.

Today, a large number of platforms exist for creating simulation environments. Some popular ones are:

  • OpenAI Gym: This contains a collection of environments that we can use to train our RL agents. In this chapter, we’ll be using the OpenAI Gym interface.
  • Unity ML-Agents SDK: It allows developers to transform games and simulations created using the Unity editor into environments where intelligent agents can be trained using DRL, evolutionary strategies, or other machine learning methods through a simple-to-use Python API. It works with TensorFlow and provides the ability to train intelligent agents for 2D/3D and VR/AR games. You can learn more about it here: https://github.com/Unity-Technologies/ml-agents.
  • Gazebo: In Gazebo, we can...

An introduction to OpenAI Gym

We will be using OpenAI Gym to provide an environment for our agent. OpenAI Gym is an open source toolkit to develop and compare RL algorithms. It contains a variety of simulated environments that can be used to train agents and develop new RL algorithms.

The first thing to do is install OpenAI Gym. The following command will install the minimal gym package:

pip install gym

If you want to install all (free) gym modules, add [all] after it:

pip install gym[all]

The MuJoCo environment requires purchasing a license. Atari- and Box2D-based games need extra dependencies; for example, to install Box2D:

pip install box2d-py

OpenAI Gym provides a variety of environments, from simple text-based to three-dimensional games. The environments supported can be grouped as follows:

  • Algorithms: Contains environments that involve performing computations such as addition. While we can easily perform the computations...
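Once installed, every Gym environment is driven through the same reset/step interface. The sketch below runs one episode of CartPole-v1 with a random, untrained policy; because the reset() and step() return signatures changed in gym version 0.26, the code unpacks both variants defensively:

```python
import gym  # installed via: pip install gym

env = gym.make("CartPole-v1")

# gym >= 0.26 returns (obs, info) from reset(); older versions return obs alone.
out = env.reset()
obs = out[0] if isinstance(out, tuple) else out

done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()        # a random (untrained) policy
    result = env.step(action)
    if len(result) == 5:                      # gym >= 0.26: (obs, reward, terminated, truncated, info)
        obs, reward, terminated, truncated, _ = result
        done = terminated or truncated
    else:                                     # gym < 0.26: (obs, reward, done, info)
        obs, reward, done, _ = result
    total_reward += reward
env.close()
print(total_reward)
```

CartPole grants a reward of 1 per timestep, so the printed value equals the episode length; a random policy typically survives only a couple of dozen steps.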

Deep Q-networks

Deep Q-Networks, DQNs for short, are deep neural networks designed to approximate the Q-function (the state-action value function). They are among the most popular value-based reinforcement learning algorithms. The model was proposed by DeepMind in 2013, in the paper entitled Playing Atari with Deep Reinforcement Learning. The most important contribution of this paper was that the raw state space was used directly as input to the network; the input features were not hand-crafted as in earlier RL implementations. Also, the agent could be trained with exactly the same architecture to play different Atari games and obtain state-of-the-art results.
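One ingredient of the DQN training procedure is experience replay [3]: past transitions are stored in a buffer and sampled uniformly at random for training, which breaks the correlation between consecutive experiences. Here is a minimal sketch (the class and method names are our own, and it assumes uniform sampling rather than the prioritized variant of [4]):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions.

    Sampling batches uniformly at random decorrelates consecutive experiences,
    which stabilizes training of the Q-network."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # old transitions are evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        # Transpose rows of transitions into columns: states, actions, ...
        return [list(column) for column in zip(*batch)]

    def __len__(self):
        return len(self.buffer)

# Fill the buffer with dummy transitions and draw one training batch.
buf = ReplayBuffer(capacity=100)
for t in range(50):
    buf.push(t, t % 2, 1.0, t + 1, t == 49)
states, actions, rewards, next_states, dones = buf.sample(8)
print(len(states))  # 8
```

In a full DQN, each sampled batch would be used to compute targets with a (periodically frozen) target network and to take one gradient step on the Q-network.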

This model is an extension of the simple Q-learning algorithm. In Q-learning algorithms, a Q-table is maintained as a cheat sheet. After each action, the Q-table is updated using the Bellman equation [5]:

Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max_a′ Q(s′, a′)]

where α is the learning rate, and its value lies in the range [0,1]. The first term represents...
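The tabular update above takes only a few lines of NumPy. This is a generic sketch of the update rule; the state/action counts and hyperparameter values are arbitrary illustrations:

```python
import numpy as np

n_states, n_actions = 5, 2
alpha, gamma = 0.1, 0.9            # learning rate and discount factor
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """One Q-table update: blend the old estimate with the bootstrapped target."""
    target = r + gamma * np.max(Q[s_next])          # r + gamma * max_a' Q(s', a')
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target

q_update(s=0, a=1, r=1.0, s_next=1)
print(Q[0, 1])  # 0.1  (old value 0.0 blended with target 1.0 at step size 0.1)
```

Repeating this update over many episodes makes Q converge to the optimal state-action values for small, discrete problems; DQN replaces the table with a neural network when the state space is too large to enumerate.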

Deep deterministic policy gradient

The DQN and its variants have been very successful in solving problems where the state space is continuous and action space is discrete. For example, in Atari games, the input space consists of raw pixels, but actions are discrete—[up, down, left, right, no-op]. How do we solve a problem with continuous action space? For instance, say an RL agent driving a car needs to turn its wheels: this action has a continuous action space.

One way to handle this situation is by discretizing the action space and continuing with a DQN or its variants. However, a better solution would be to use a policy gradient algorithm. In policy gradient methods, the policy is approximated directly.

A neural network is used to approximate the policy; in its simplest form, the network learns to select actions that maximize the reward by adjusting its weights via steepest gradient ascent, hence the name policy gradients.
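To make the idea concrete, here is a sketch of the simplest possible policy gradient (REINFORCE-style) on a hypothetical three-armed bandit: a softmax policy over logits theta is nudged along the gradient of the log-probability of each sampled action, weighted by the reward received. All names and payoff values are illustrative, not from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3
theta = np.zeros(n_actions)                # policy parameters (one logit per action)
true_rewards = np.array([0.2, 0.5, 0.9])   # hypothetical payoff of each bandit arm

def softmax(x):
    z = np.exp(x - x.max())                # subtract max for numerical stability
    return z / z.sum()

lr = 0.1
for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(n_actions, p=probs)     # sample an action from the policy
    r = true_rewards[a]
    # REINFORCE: grad of log pi(a) w.r.t. theta is (one_hot(a) - probs);
    # ascend that gradient, scaled by the reward.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += lr * r * grad_log_pi

print(np.argmax(softmax(theta)))  # the policy should favor the highest-payoff arm
```

Actor-critic methods such as DDPG build on this idea, using a second (critic) network instead of the raw reward to weight the policy update.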

In this...

Summary

Reinforcement learning has seen a lot of progress in recent years. To summarize all of it in a single chapter is not possible. However, in this chapter, we focused on recent, successful RL algorithms. The chapter started by introducing the important concepts in the RL field, its challenges, and the solutions for moving forward. Next, we delved into two important RL algorithms: DQN and DDPG. Toward the end of the chapter, we covered important topics in the field of deep reinforcement learning.

In the next chapter, we will move on to applying what we have learned to production.

References

  1. MIT Technology Review covers OpenAI experiments on reinforcement learning: https://www.technologyreview.com/s/614325/open-ai-algorithms-learned-tool-use-and-cooperation-after-hide-and-seek-games/
  2. Coggan, Melanie. (2014). Exploration and Exploitation in Reinforcement Learning. Research supervised by Prof. Doina Precup, CRA-W DMP Project at McGill University.
  3. Lin, Long-Ji. (1993). Reinforcement learning for robots using neural networks. No. CMU-CS-93-103. Carnegie-Mellon University Pittsburgh PA School of Computer Science.
  4. Schaul, Tom, John Quan, Ioannis Antonoglou, and David Silver. (2015). Prioritized Experience Replay. arXiv preprint arXiv:1511.05952
  5. Sutton, R. and Barto, A. Reinforcement Learning: An Introduction, Chapter 4. MIT Press: https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf
  6. Dabney W., Rowland M., Bellemare M G., and Munos R. (2018). Distributional Reinforcement Learning with Quantile Regression. In Thirty-Second AAAI...