Chapter 9. Deep Reinforcement Learning

Reinforcement Learning (RL) is a framework that is used by an agent for decision-making. The agent is not necessarily a software entity such as in video games. Instead, it could be embodied in hardware such as a robot or an autonomous car. An embodied agent is probably the best way to fully appreciate and utilize reinforcement learning since a physical entity interacts with the real-world and receives responses.

The agent is situated within an environment. The environment has a state that can be partially or fully observable. The agent has a set of actions that it can use to interact with its environment. The result of an action transitions the environment to a new state. A corresponding scalar reward is received after executing an action. The goal of the agent is to maximize the accumulated future reward by learning a policy that will decide which action to take given a state.
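This perception-action-reward cycle can be written down as a short loop. The following is a minimal sketch in Python of the interaction just described; the Agent and Environment objects and their methods (reset(), step(), act(), learn()) are hypothetical placeholders used only to illustrate the flow, not code from this chapter:

# minimal sketch of the perception-action-reward loop; Agent and Environment and their
# methods (reset, step, act, learn) are hypothetical placeholders, not code from this chapter
def run_episode(agent, env, max_steps=100):
    state = env.reset()                      # initial (partially or fully observable) state
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)            # the policy decides which action to take
        next_state, reward, done = env.step(action)    # environment transitions to a new state
        agent.learn(state, action, reward, next_state) # improve the policy from experience
        total_reward += reward               # the agent maximizes the accumulated reward
        state = next_state
        if done:                             # a terminal state ends the episode
            break
    return total_reward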

Reinforcement learning has a strong similarity to human...

Principles of reinforcement learning (RL)

Figure 9.1.1 shows the perception-action-learning loop that is used to describe RL. The environment is a soda can sitting on the floor. The agent is a mobile robot whose goal is to pick up the soda can. It observes the environment around it and tracks the location of the soda can through an onboard camera. The observation is summarized in a form of state which the robot will use to decide which action to take. The actions it takes may pertain to low-level control such as the rotation angle/speed of each wheel, rotation angle/speed of each joint of the arm, and whether the gripper is open or close.

Alternatively, the actions may be high-level control moves such as moving the robot forward/backward, steering with a certain angle, and grab/release. Any action that moves the gripper away from the soda receives a negative reward. Any action that closes the gap between the gripper location and the soda receives a positive reward. When the robot...

The Q value

An important question is: if the RL problem is to find the optimal policy $\pi^*$, how does the agent learn by interacting with the environment? Equation 9.1.3 does not explicitly indicate the action to try and the succeeding state to compute the return. In RL, we find that it's easier to learn $\pi^*$ by using the Q value:

$\pi^* = \underset{a}{\operatorname{argmax}}\, Q(s, a)$ (Equation 9.2.1)

Where:

$V^*(s) = \underset{a}{\max}\, Q(s, a)$ (Equation 9.2.2)

In other words, instead of finding the policy that maximizes the value for all states, Equation 9.2.1 looks for the action that maximizes the quality (Q) value for all states. After finding the Q value function, $V^*$ and hence $\pi^*$ are determined by Equations 9.2.2 and 9.1.3 respectively.

If for every action, the reward and the next state can be observed, we can formulate the following iterative or trial and error algorithm to learn the Q value:

$Q(s, a) = r + \gamma \underset{a'}{\max}\, Q(s', a')$ (Equation 9.2.3)

For notational simplicity, s' and a' are the next state and action respectively. Equation 9.2.3 is known as the Bellman Equation, which is the core...
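As a concrete illustration of this trial-and-error update, the following sketch stores the Q value of every (state, action) pair in a NumPy array and overwrites each visited entry according to Equation 9.2.3. The variable names and table sizes are assumptions for illustration, not code from the chapter's listing:

import numpy as np

# illustrative sizes and discount factor; these are assumptions, not values from Listing 9.3.1
n_states, n_actions = 6, 4
gamma = 0.9
q_table = np.zeros((n_states, n_actions))   # one Q value per (state, action) pair

def update_q_table(state, action, reward, next_state):
    # Equation 9.2.3: Q(s, a) = r + gamma * max_a' Q(s', a')
    q_table[state, action] = reward + gamma * np.max(q_table[next_state])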

Q-Learning example

To illustrate the Q-Learning algorithm, we need to consider a simple deterministic environment, as shown in the following figure. The environment has six states. The rewards for allowed transitions are shown. The reward is non-zero in two cases: a transition to the Goal (G) state has a +100 reward, while moving into the Hole (H) state has a -100 reward. These two states are terminal states and constitute the end of one episode from the Start state:


Figure 9.3.1: Rewards in a simple deterministic world

To formalize the identity of each state, we use a (row, column) identifier as shown in the following figure. Since the agent has not learned anything yet about its environment, the Q-Table, also shown in the following figure, has zero initial values. In this example, the discount factor $\gamma = 0.9$. Recall that in the estimate of the current Q value, the discount factor determines the weight of future Q values as a function of the number of steps, $\gamma^k$. In Equation 9.2.3, we only consider the...

Q-Learning in Python

The environment and the Q-Learning algorithm discussed in the previous section can be implemented in Python. Since the policy is just a simple table, there is no need for Keras at this point. Listing 9.3.1 shows q-learning-9.3.1.py, the implementation of the simple deterministic world (environment, agent, action, and Q-Table algorithms) using the QWorld class. For conciseness, the functions dealing with the user interface are not shown.

In this example, the environment dynamics are represented by self.transition_table. At every action, self.transition_table determines the next state. The reward for executing an action is stored in self.reward_table. The two tables are consulted every time an action is executed by the step() function. The Q-Learning algorithm is implemented by the update_q_table() function. Every time the agent needs to decide which action to take, it calls the act() function. The action may be randomly drawn or decided by the policy using the Q-Table...
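Since the full Listing 9.3.1 is not reproduced here, the following is a minimal sketch of how a QWorld-style class could be organized, assuming the tables and methods described above; the exact contents of q-learning-9.3.1.py may differ:

import numpy as np

class QWorld:
    # minimal sketch of a table-based Q-Learning agent in a deterministic world
    def __init__(self, n_states=6, n_actions=4, gamma=0.9, epsilon=0.9):
        self.gamma = gamma                  # discount factor
        self.epsilon = epsilon              # probability of exploring a random action
        self.n_actions = n_actions
        # environment dynamics: next state for every (state, action) pair
        self.transition_table = np.zeros((n_states, n_actions), dtype=int)
        # reward for executing an action in a given state
        self.reward_table = np.zeros((n_states, n_actions))
        # Q-Table, initialized to zero since nothing has been learned yet
        self.q_table = np.zeros((n_states, n_actions))

    def step(self, state, action):
        # consult the two tables to execute an action
        next_state = self.transition_table[state, action]
        reward = self.reward_table[state, action]
        return next_state, reward

    def act(self, state):
        # explore randomly or exploit the policy encoded in the Q-Table
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax(self.q_table[state]))

    def update_q_table(self, state, action, reward, next_state):
        # Q-Learning update (Equation 9.2.3)
        self.q_table[state, action] = reward + self.gamma * np.max(self.q_table[next_state])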

Nondeterministic environment

In the event that the environment is nondeterministic, both the reward and the outcome of an action are probabilistic. The new system is a stochastic MDP. To reflect the nondeterministic reward, the new value function is:

$V^{\pi}(s_t) = \mathbb{E}\left[R_t\right] = \mathbb{E}\left[\sum_{k=0}^{T} \gamma^{k} r_{t+k}\right]$ (Equation 9.4.1)

The Bellman equation is modified as:

$Q(s, a) = \mathbb{E}_{s'}\left[r + \gamma \underset{a'}{\max}\, Q(s', a')\right]$ (Equation 9.4.2)

Temporal-difference learning

Q-Learning is a special case of a more generalized Temporal-Difference Learning or TD-Learning, $TD(\lambda)$. More specifically, it's a special case of one-step TD-Learning, $TD(0)$:

$Q(s, a) = Q(s, a) + \alpha\left[r + \gamma \underset{a'}{\max}\, Q(s', a') - Q(s, a)\right]$ (Equation 9.5.1)

In the equation, $\alpha$ is the learning rate. We should note that when $\alpha = 1$, Equation 9.5.1 is similar to the Bellman equation. For simplicity, we'll refer to Equation 9.5.1 as Q-Learning or generalized Q-Learning.

Previously, we referred to Q-Learning as an off-policy RL algorithm since it learns the Q value function without directly using the policy that it is trying to optimize. An example of an on-policy one-step TD-learning algorithm is SARSA, which is similar to Equation 9.5.1:

$Q(s, a) = Q(s, a) + \alpha\left[r + \gamma Q(s', a') - Q(s, a)\right]$ (Equation 9.5.2)

The main difference is the use of the policy that is being optimized to determine a'. The terms s, a, r, s' and a' (thus the name SARSA) must be known to update the Q value function at every iteration. Both Q-Learning and SARSA use existing estimates...
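To make the off-policy versus on-policy distinction concrete, the two one-step updates can be written side by side. This is a minimal sketch assuming a NumPy Q-Table and a learning rate alpha; notice that SARSA needs the next action a' actually chosen by the policy, while Q-Learning maximizes over all next actions:

import numpy as np

def q_learning_update(q_table, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # off-policy (Equation 9.5.1): the target maximizes over all next actions
    td_target = r + gamma * np.max(q_table[s_next])
    q_table[s, a] += alpha * (td_target - q_table[s, a])

def sarsa_update(q_table, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # on-policy (Equation 9.5.2): the target uses the next action chosen by the policy
    td_target = r + gamma * q_table[s_next, a_next]
    q_table[s, a] += alpha * (td_target - q_table[s, a])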

Q-Learning on OpenAI gym

Before presenting another example, we need a suitable RL simulation environment. Otherwise, we can only run RL simulations on very simple problems like the one in the previous example. Fortunately, OpenAI created Gym (https://gym.openai.com).

Gym is a toolkit for developing and comparing RL algorithms. It works with most deep learning libraries, including Keras. Gym can be installed by running the following command:

$ sudo pip3 install gym

Gym has several environments that an RL algorithm can be tested against, such as toy text, classic control, algorithmic, Atari, and 2D/3D robots. For example, FrozenLake-v0 (Figure 9.5.1) is a toy text environment similar to the simple deterministic world used in the Q-Learning in Python example. FrozenLake-v0 is a 4 x 4 grid of 16 states. The state marked S is the starting state, F is the frozen part of the lake which is safe, H is the Hole state that should be avoided, and G is the Goal state where the frisbee...
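As a quick check of the installation, a random agent can interact with FrozenLake-v0 in a few lines. The sketch below uses the classic gym interface of this edition (reset() returning a state, step() returning four values); later gym/gymnasium releases change these signatures:

import gym

env = gym.make("FrozenLake-v0")            # toy text environment
state = env.reset()                        # start a new episode
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()     # random action, no learning yet
    state, reward, done, info = env.step(action)
    total_reward += reward
env.render()                               # print the grid and the agent's position
print("total reward:", total_reward)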

Deep Q-Network (DQN)

Using the Q-Table to implement Q-Learning is fine in small discrete environments. However, when the environment has numerous states or is continuous, as in most cases, a Q-Table is not feasible or practical. For example, if we are observing a state made of four continuous variables, the size of the table is infinite. Even if we attempt to discretize the four variables into 1,000 values each, the total number of rows in the table is a staggering 1000^4 = 1e12. Even after training, the table is sparse: most of the cells in this table are zero.

A solution to this problem is called DQN [2], which uses a deep neural network to approximate the Q-Table, as shown in Figure 9.6.1. There are two approaches to building the Q-network:

  1. The input is the state-action pair, and the prediction is the Q value
  2. The input is the state, and the prediction is the Q value for each action

The first option is not optimal since the network will be called a number of times equal to the number of...

DQN on Keras

To illustrate DQN, the CartPole-v0 environment of the OpenAI Gym is used. CartPole-v0 is a pole balancing problem. The goal is to keep the pole from falling over. The environment is 2D. The action space is made of two discrete actions (left and right movements). However, the state space is continuous and is made of four variables:

  1. Linear position
  2. Linear velocity
  3. Angle of rotation
  4. Angular velocity

The CartPole-v0 is shown in Figure 9.6.1.

Initially, the pole is upright. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical or the cart moves more than 2.4 units from the center. The CartPole-v0 problem is considered solved if the average reward is 195.0 over 100 consecutive trials:


Figure 9.6.1: The CartPole-v0 environment
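The four state variables and the two discrete actions can be inspected directly from the environment. This is only an illustrative check using the classic gym interface, not part of Listing 9.6.1:

import gym

env = gym.make("CartPole-v0")
print(env.observation_space)    # Box(4,): position, velocity, angle, angular velocity
print(env.action_space)         # Discrete(2): push the cart to the left or to the right
state = env.reset()
print(state)                    # four floats close to zero at the start of an episode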

Listing 9.6.1 shows the DQN implementation for CartPole-v0. The DQNAgent class represents the agent using DQN. Two Q-Networks are created (a minimal construction sketch follows this list):

  1. Q-Network or Q in Algorithm 9.6.1
  2. Target Q-Network or Qtarget...
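A minimal sketch of how these two networks could be built in Keras is shown below: a small MLP maps the 4-dimensional state to one Q value per action, and an identically structured target network periodically copies the trained weights. The layer sizes and helper names are assumptions for illustration, not necessarily those of Listing 9.6.1:

from keras.layers import Dense, Input
from keras.models import Model

def build_q_network(state_dim=4, n_actions=2):
    # MLP that maps the state to one Q value per action (the second approach above)
    inputs = Input(shape=(state_dim,), name='state')
    x = Dense(64, activation='relu')(inputs)
    x = Dense(64, activation='relu')(x)
    q_values = Dense(n_actions, activation='linear', name='q_values')(x)
    model = Model(inputs, q_values)
    model.compile(loss='mse', optimizer='adam')
    return model

# Q-Network trained at every step and target Q-Network updated only periodically
q_model = build_q_network()
target_q_model = build_q_network()

def update_target_weights():
    # copy the trained weights into the target network to reduce non-stationarity
    target_q_model.set_weights(q_model.get_weights())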

Double Q-Learning (DDQN)

In DQN, the target Q-Network selects and evaluates every action, resulting in an overestimation of the Q value. To resolve this issue, DDQN [3] proposes using the Q-Network to choose the action and the target Q-Network to evaluate it.

In DQN as summarized by Algorithm 9.6.1, the estimate of the Q value in line 10 is:

$Q_{max} = r_{j+1} + \gamma \underset{a_{j+1}}{\max}\, Q_{target}\left(s_{j+1}, a_{j+1}\right)$

$Q_{target}$ chooses and evaluates the action $a_{j+1}$.

DDQN proposes to change line 10 to:

$Q_{max} = r_{j+1} + \gamma Q_{target}\left(s_{j+1}, \underset{a_{j+1}}{\operatorname{argmax}}\, Q\left(s_{j+1}, a_{j+1}\right)\right)$

The term $\underset{a_{j+1}}{\operatorname{argmax}}\, Q\left(s_{j+1}, a_{j+1}\right)$ lets Q choose the action. Then this action is evaluated by $Q_{target}$.

In Listing 9.6.1, both DQN and DDQN are implemented. Specifically, for DDQN, the modification to the Q value computation performed by the get_target_q_value() function is highlighted:

# compute Q_max
# use of target Q Network solves the non-stationarity problem
def get_target_q_value(self, next_state, reward):
    # max Q value among next state's actions
    if self.ddqn:
        # DDQN: current Q Network selects the action
        # a'_max = argmax_a' Q(s', a')
        action = np.argmax(self.q_model.predict(next_state)[0])
        # target Q Network evaluates the selected action
        q_value = self.target_q_model.predict(next_state)[0][action]
    else:
        # DQN: the target Q Network both selects and evaluates
        # Q_max = max_a' Q_target(s', a')
        q_value = np.amax(self.target_q_model.predict(next_state)[0])
    # Q_max = reward + gamma * Q_max
    return reward + self.gamma * q_value
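For context, the following sketch shows how such a target value is typically consumed during experience replay: a random mini-batch is drawn from the replay buffer, targets are computed with the function above, and the Q-Network is fit on them. The buffer, batch size, and training call are illustrative assumptions rather than the exact replay() of Listing 9.6.1:

import random
import numpy as np

# method of the DQNAgent class (sketch): train on a random mini-batch of experiences
def replay(self, batch_size=64):
    batch = random.sample(self.memory, batch_size)   # random sampling breaks sample correlation
    states, targets = [], []
    for state, action, reward, next_state, done in batch:
        # start from the current predictions so only the chosen action's target changes
        q_values = self.q_model.predict(state)[0]
        # terminal states have no future reward to bootstrap from
        q_values[action] = reward if done else self.get_target_q_value(next_state, reward)
        states.append(state[0])
        targets.append(q_values)
    self.q_model.fit(np.array(states), np.array(targets), epochs=1, verbose=0)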

Conclusion

In this chapter, we've been introduced to DRL, a powerful technique believed by many researchers to be the most promising lead towards artificial intelligence. Together, we've gone over the principles of RL. RL is able to solve many toy problems, but the Q-Table is unable to scale to more complex real-world problems. The solution is to learn the Q-Table using a deep neural network. However, training deep neural networks in RL is highly unstable due to sample correlation and the non-stationarity of the target Q-Network.

DQN proposed a solution to these problems by using experience replay and separating the target network from the Q-Network under training. DDQN suggested further improving the algorithm by separating action selection from action evaluation to minimize the overestimation of the Q value. There are other improvements proposed for DQN. Prioritized experience replay [6] argues that the experience buffer should not be sampled uniformly. Instead, experiences that...

References

  1. Sutton and Barto. Reinforcement Learning: An Introduction, 2017 (http://incompleteideas.net/book/bookdraft2017nov5.pdf).
  2. Volodymyr Mnih and others. Human-level control through deep reinforcement learning. Nature 518.7540, 2015: 529 (http://www.davidqiu.com:8888/research/nature14236.pdf).
  3. Hado Van Hasselt, Arthur Guez, and David Silver. Deep Reinforcement Learning with Double Q-Learning. AAAI. Vol. 16, 2016 (http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/download/12389/11847).
  4. Kai Arulkumaran and others. A Brief Survey of Deep Reinforcement Learning. arXiv preprint arXiv:1708.05866, 2017 (https://arxiv.org/pdf/1708.05866.pdf).
  5. David Silver. Lecture Notes on Reinforcement Learning (http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html).
  6. Tom Schaul and others. Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015 (https://arxiv.org/pdf/1511.05952.pdf).
  7. Ziyu Wang and others. Dueling Network Architectures for Deep Reinforcement Learning. arXiv preprint arXiv:1511...