Deep Learning with TensorFlow 2 and Keras - Second Edition

Reinforcement Learning

This chapter introduces reinforcement learning (RL), one of the least explored and yet most promising learning paradigms. Reinforcement learning is very different from the supervised and unsupervised learning models we covered in earlier chapters. Starting from a clean slate (that is, with no prior information), the RL agent goes through many rounds of trial and error and learns to achieve a goal, with feedback from the environment as its only input. Recent research in RL by OpenAI suggests that continuous competition can drive the evolution of intelligence [1]. Many deep learning practitioners believe that RL will play an important role in the big AI dream: Artificial General Intelligence (AGI). This chapter delves into different RL algorithms, covering the following topics:

  • What RL is and its terminology
  • How to use the OpenAI Gym interface
  • Deep Q-Networks
  • Policy gradients

Introduction

What is common between a baby learning to walk, a bird learning to fly, and an RL agent learning to play an Atari game? Well, all three involve:

  • Trial and error: The child (or the bird) tries various ways, fails many times, and succeeds in some before it can really stand (or fly). The RL agent plays many games, winning some and losing many, before it can become reliably successful.
  • Goal: The child has the goal to stand, the bird to fly, and the RL agent has the goal to win the game.
  • Interaction with the environment: The only feedback they have is from their environment.

So, the first question that arises is: what is RL, and how is it different from supervised and unsupervised learning? Anyone who owns a pet knows that the best strategy for training it is rewarding desirable behavior and punishing bad behavior. RL, also called learning with a critic, is a learning paradigm where the agent learns in the same manner. The agent here corresponds...

Introduction to OpenAI Gym

As mentioned earlier, trial and error is an important component of any RL algorithm. Therefore, it makes sense to first train our RL agent in a simulated environment.

Today there exist a large number of platforms for creating such environments. Some popular ones are:

  • OpenAI Gym: It contains a collection of environments that we can use to train our RL agents. In this chapter we'll be using the OpenAI Gym interface; a minimal usage sketch appears right after this list.
  • Unity ML-Agents SDK: It allows developers to transform games and simulations created using the Unity editor into environments where intelligent agents can be trained using DRL, evolutionary strategies, or other machine learning methods through a simple-to-use Python API. It works with TensorFlow and provides the ability to train intelligent agents for 2D/3D and VR/AR games. You can learn more about it here: https://github.com/Unity-Technologies/ml-agents.
  • Gazebo: In Gazebo, we can build three-dimensional...
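
The Gym interface is simple: create an environment, reset it to get an initial observation, and repeatedly step it with an action to receive the next observation, a reward, and a done flag. The following is a minimal sketch using the classic CartPole environment and a random agent; it assumes the classic `gym` API (pre-0.26, where `step` returns a 4-tuple), which matches the versions current when this book was published:

```python
import gym  # pip install gym

# Create the environment and run a random agent for a fixed number of steps.
env = gym.make("CartPole-v1")
obs = env.reset()  # initial observation

for _ in range(200):
    action = env.action_space.sample()           # pick a random action
    obs, reward, done, info = env.step(action)   # apply it to the environment
    if done:  # episode ended (pole fell over or time limit reached)
        obs = env.reset()

env.close()
```

A trained agent simply replaces `env.action_space.sample()` with a policy that maps observations to actions.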

Deep Q-Networks

Deep Q-networks, DQNs for short, are deep neural networks designed to approximate the Q-function (the state-action value function), and DQN is one of the most popular value-based reinforcement learning algorithms. The model was proposed by Google's DeepMind at NIPS 2013, in the paper entitled Playing Atari with Deep Reinforcement Learning. The most important contribution of this paper was that the raw state space was used directly as input to the network; the input features were not hand-crafted as in earlier RL implementations. Also, the agent could be trained with exactly the same architecture to play different Atari games and obtain state-of-the-art results.
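
In TensorFlow 2.x with Keras (the stack used throughout this book), such a Q-network can be sketched as a small fully connected model that maps a state vector to one Q-value per discrete action. The layer sizes below are illustrative assumptions, not the architecture from the paper (which used convolutions over raw pixels):

```python
import tensorflow as tf

# A minimal sketch of a Q-network: state in, one Q-value per action out.
def build_q_network(state_dim=4, num_actions=2):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(state_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_actions)  # linear outputs: Q(s, a) for each a
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
    return model

# The greedy action for a batch of one state s would then be:
# tf.argmax(model(s[None, :])[0])
```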

This model is an extension of the simple Q-learning algorithm. In Q-learning algorithms, a Q-table is maintained as a cheat sheet. After each action the Q-table is updated using the Bellman equation [5]:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

Here $\alpha$ is the learning rate, and its value lies in the range [0, 1]. The first term represents the...
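
In code, the tabular update is essentially a one-liner. The following is a small illustrative sketch (the names and the integer state/action encoding are assumptions for illustration, not from the book):

```python
import numpy as np

# Tabular Q-learning update: a direct translation of the equation above.
# Q is a (num_states, num_actions) array indexed by integer states/actions.
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    td_target = r + gamma * np.max(Q[s_next])  # r_{t+1} + gamma * max_a Q(s', a)
    Q[s, a] += alpha * (td_target - Q[s, a])   # move Q(s, a) toward the target
```

A DQN replaces the table with a neural network and minimizes the squared difference between $Q(s_t, a_t)$ and the same target.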

Deep deterministic policy gradient

DQN and its variants have been very successful in solving problems where the state space is continuous and the action space is discrete. For example, in Atari games the input space consists of raw pixels, but the actions are discrete: [up, down, left, right, no-op]. How do we solve a problem with a continuous action space? For instance, an RL agent driving a car needs to turn its steering wheel, an action with a continuous range of values. One way to handle this situation is by discretizing the action space and continuing with DQN or its variants, as in the sketch below. However, a better solution is to use a policy gradient algorithm. In policy gradient methods, the policy is approximated directly.
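
Discretization itself is simple; the sketch below (illustrative values, not from the book) maps a continuous steering command onto a small set of bins so that a DQN-style agent with discrete actions can still be used:

```python
import numpy as np

# Map a continuous steering value in [-1, 1] onto 5 discrete actions.
STEERING_BINS = np.linspace(-1.0, 1.0, 5)  # [-1., -0.5, 0., 0.5, 1.]

def to_discrete(steering):
    # Index of the bin closest to the requested continuous value.
    return int(np.argmin(np.abs(STEERING_BINS - steering)))
```

The cost is a coarse policy, and the number of discrete actions explodes as action dimensions are added, which is why policy gradient methods are preferred here.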

A neural network is used to approximate the policy. In its simplest form, the neural network learns a policy for selecting actions that maximize the rewards by adjusting its weights using gradient ascent; hence the name: policy gradients.
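
For the deterministic case that gives DDPG its name, the actor network maps a state directly to a continuous action. The following is a minimal sketch of such an actor in TensorFlow 2.x Keras; all dimensions and the action bound are illustrative assumptions, not values from the book:

```python
import tensorflow as tf

# A minimal deterministic actor: state in, continuous action out.
# tanh bounds the raw output to [-1, 1]; we then scale to the task's range.
def build_actor(state_dim=3, action_dim=1, action_bound=2.0):
    inputs = tf.keras.Input(shape=(state_dim,))
    x = tf.keras.layers.Dense(64, activation="relu")(inputs)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    raw = tf.keras.layers.Dense(action_dim, activation="tanh")(x)
    action = tf.keras.layers.Lambda(lambda t: t * action_bound)(raw)
    return tf.keras.Model(inputs, action)
```

In full DDPG, this actor is trained by ascending a separate critic network's estimate of $Q(s, \mu(s))$, where $\mu$ is the actor.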

In this section we will focus...

Summary

Reinforcement learning has seen a lot of progress in recent years, and summarizing all of it in a single chapter is not possible. Instead, this chapter focused on recent successful RL algorithms. It started by introducing the important concepts in the RL field, its challenges, and the solutions available to move forward. Next, we delved into two important RL algorithms: DQN and DDPG. Toward the end, the chapter covered important topics in the field of deep reinforcement learning. In the next chapter, we will move on to applying what we have learned to production.

References

  1. https://www.technologyreview.com/s/614325/open-ai-algorithms-learned-tool-use-and-cooperation-after-hide-and-seek-games/
  2. Coggan, Melanie. Exploration and Exploitation in Reinforcement Learning. Research supervised by Prof. Doina Precup, CRA-W DMP Project at McGill University (2004).
  3. Lin, Long-Ji. Reinforcement Learning for Robots Using Neural Networks. Technical Report CMU-CS-93-103, Carnegie Mellon University, School of Computer Science, 1993.
  4. Schaul, Tom, John Quan, Ioannis Antonoglou, and David Silver. Prioritized Experience Replay. arXiv preprint arXiv:1511.05952 (2015).
  5. Sutton, Richard, and Andrew Barto. Reinforcement Learning: An Introduction, Chapter 4. MIT Press. https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf
  6. Dabney, Will, Mark Rowland, Marc G. Bellemare, and Rémi Munos. Distributional Reinforcement Learning with Quantile Regression. In Thirty-Second AAAI...