Search icon
Subscription
0
Cart icon
Close icon
You have no products in your basket yet
Save more on your purchases!
Savings automatically calculated. No voucher code required
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
TensorFlow 2 Reinforcement Learning Cookbook
TensorFlow 2 Reinforcement Learning Cookbook

TensorFlow 2 Reinforcement Learning Cookbook: Over 50 recipes to help you build, train, and deploy learning agents for real-world applications

By Palanisamy P
$15.99 per month
Book Jan 2021 472 pages 1st Edition
eBook
$35.99 $9.99
Print
$48.99
Subscription
$15.99 Monthly
eBook
$35.99 $9.99
Print
$48.99
Subscription
$15.99 Monthly

What do you get with a Packt Subscription?

Free for first 7 days. $15.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details


Publication date : Jan 15, 2021
Length 472 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781838982546
Vendor :
Google
Category :
Languages :
Table of content icon View table of contents Preview book icon Preview Book

TensorFlow 2 Reinforcement Learning Cookbook

Chapter 2: Implementing Value-Based, Policy-Based, and Actor-Critic Deep RL Algorithms

This chapter provides a practical approach to building value-based, policy-based, and actor-critic algorithm-based reinforcement learning (RL) agents. It includes recipes for implementing value iteration-based learning agents and breaks down the implementation details of several foundational algorithms in RL into simple steps. The policy gradient-based agent and the actor-critic agent make use of the latest major version of TensorFlow 2.x to define the neural network policies.

The following recipes will be covered in this chapter:

  • Building stochastic environments for training RL agents
  • Building value-based (RL) agent algorithms
  • Implementing temporal difference learning
  • Building Monte Carlo prediction and control algorithms for RL
  • Implementing the SARSA algorithm and an RL agent
  • Building a Q-learning agent
  • Implementing policy gradients
  • Implementing actor-critic...

Technical requirements

The code in this book has been tested extensively on Ubuntu 18.04 and Ubuntu 20.04, and should work with later versions of Ubuntu if Python 3.6+ is available. With Python 3.6 installed, along with the necessary Python packages listed at the beginning of each recipe, the code should run fine on Windows and Mac OS X too. It is advised that you create and use a Python virtual environment named tf2rl-cookbook to install the packages and run the code in this book. Installing Miniconda or Anaconda for Python virtual environment management is recommended.

The complete code for each recipe in each chapter is available here: https://github.com/PacktPublishing/Tensorflow-2-Reinforcement-Learning-Cookbook.

Building stochastic environments for training RL agents

To train RL agents for the real world, we need learning environments that are stochastic, since real-world problems are stochastic in nature. This recipe will walk you through the steps for building a Maze learning environment to train RL agents. The Maze is a simple, stochastic environment where the world is represented as a grid. Each location on the grid can be referred to as a cell. The goal of an agent in this environment is to find its way to the goal state. Consider the maze shown in the following diagram, where the black cells represent walls:

Figure 2.1 – The Maze environment

The agent's location is initialized to be at the top-left cell in the Maze. The agent needs to find its way around the grid to reach the goal located at the top-right cell in the Maze, collecting a maximum number of coins along the way while avoiding walls. The location of the goal, coins, walls, and the agent...

Building value-based reinforcement learning agent algorithms

Value-based reinforcement learning works by learning the state-value function or the action-value function in a given environment. This recipe will show you how to create and update the value function for the Maze environment to obtain an optimal policy. Learning value functions, especially in model-free RL problems where a model of the environment is not available, can prove to be quite effective, especially for RL problems with low-dimensional state space.

Upon completing this recipe, you will have an algorithm that can generate the following optimal action sequence based on value functions:

Figure 2.3 – Optimal action sequence generated by a value-based RL algorithm with state values represented through a jet color map

Let's get started.

Getting ready

To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run pip install numpy...

Implementing temporal difference learning

This recipe will walk you through how to implement the temporal difference (TD) learning algorithm. TD algorithms allow us to incrementally learn from incomplete episodes of agent experiences, which means they can be used for problems that require online learning capabilities. TD algorithms are useful in model-free RL settings as they do not depend on a model of the MDP transitions or rewards. To visually understand the learning progression of the TD algorithm, this recipe will also show you how to implement the GridworldV2 learning environment, which looks as follows when rendered:

Figure 2.6 – The GridworldV2 learning environment 2D rendering with state values and grid cell coordinates

Getting ready

To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run pip install numpy gym. If the following import statements run without issues, you are ready to get...

Building Monte Carlo prediction and control algorithms for RL

This recipe provides the ingredients for building a Monte Carlo prediction and control algorithm so that you can build your RL agents. Similar to the temporal difference learning algorithm, Monte Carlo learning methods can be used to learn both the state and the action value functions. Monte Carlo methods have zero bias since they learn from complete episodes with real experience, without approximate predictions. These methods are suitable for applications that require good convergence properties. The following diagram illustrates the value that's learned by the Monte Carlo method for the GridworldV2 environment:

Figure 2.10 – Monte Carlo prediction of state values (left) and state-action values (right)

Getting ready

To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run pip install -r requirements.txt. If the following import...

Implementing the SARSA algorithm and an RL agent

This recipe will show you how to implement the State-Action-Reward-State-Action (SARSA) algorithm, as well as how to develop and train an agent using the SARSA algorithm so that it can act in a reinforcement learning environment. The SARSA algorithm can be applied to model-free control problems and allows us to optimize the value function of an unknown MDP.

Upon completing this recipe, you will have a working RL agent that, when acting in the GridworldV2 environment, will generate the following state-action value function using the SARSA algorithm:

Figure 2.15 – Rendering of the GridworldV2 environment – each triangle represents the action value of taking that directional action in that grid state

Getting ready

To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run pip install -r requirements.txt. If the following import statements run...

Building a Q-learning agent

This recipe will show you how to build a Q-learning agent. Q-learning can be applied to model-free RL problems. It supports off-policy learning and therefore provides a practical solution to problems where available experiences were/are collected using some other policy or by some other agent (even humans).

Upon completing this recipe, you will have a working RL agent that, when acting in the GridworldV2 environment, will generate the following state-action value function using the SARSA algorithm:

Figure 2.18 – State-action values obtained using the Q-learning algorithm

Getting ready

To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run pip install -r requirements.txt. If the following import statements run without issues, you are ready to get started:

import numpy as np
import random

Now, let's begin.

How to do it…

Let's implement...

Implementing policy gradients

Policy gradient algorithms are fundamental to reinforcement learning and serve as the basis for several advanced RL algorithms. These algorithms directly optimize for the best policy, which can lead to faster learning compared to value-based algorithms. Policy gradient algorithms are effective for problems/applications with high-dimensional or continuous action spaces. This recipe will show you how to implement policy gradient algorithms using TensorFlow 2.0. Upon completing this recipe, you will be able to train an RL agent in any compatible OpenAI Gym environment.

Getting ready

To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run pip install -r requirements.txt. If the following import statements run without issues, you are ready to get started:

import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow import keras
from tensorflow.keras import layers
import numpy as...

Implementing actor-critic RL algorithms

Actor-critic algorithms allow us to combine value-based and policy-based reinforcement learning – an all-in-one agent. While policy gradient methods directly search and optimize the policy in the policy space, leading to smoother learning curves and improvement guarantees, they tend to get stuck at the local maxima (for a long-term reward optimization objective). Value-based methods do not get stuck at local optimum values, but they lack convergence guarantees, and algorithms such as Q-learning tend to have high variance and are not very sample-efficient. Actor-critic methods combine the good qualities of both value-based and policy gradient-based algorithms. Actor-critic methods are also more sample-efficient. This recipe will make it easy for you to implement an actor-critic-based RL agent using TensorFlow 2.x. Upon completing this recipe, you will be able to train the actor-critic agent in any OpenAI Gym-compatible reinforcement learning...

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Develop and deploy deep reinforcement learning-based solutions to production pipelines, products, and services
  • Explore popular reinforcement learning algorithms such as Q-learning, SARSA, and the actor-critic method
  • Customize and build RL-based applications for performing real-world tasks

Description

With deep reinforcement learning, you can build intelligent agents, products, and services that can go beyond computer vision or perception to perform actions. TensorFlow 2.x is the latest major release of the most popular deep learning framework used to develop and train deep neural networks (DNNs). This book contains easy-to-follow recipes for leveraging TensorFlow 2.x to develop artificial intelligence applications. Starting with an introduction to the fundamentals of deep reinforcement learning and TensorFlow 2.x, the book covers OpenAI Gym, model-based RL, model-free RL, and how to develop basic agents. You'll discover how to implement advanced deep reinforcement learning algorithms such as actor-critic, deep deterministic policy gradients, deep-Q networks, proximal policy optimization, and deep recurrent Q-networks for training your RL agents. As you advance, you’ll explore the applications of reinforcement learning by building cryptocurrency trading agents, stock/share trading agents, and intelligent agents for automating task completion. Finally, you'll find out how to deploy deep reinforcement learning agents to the cloud and build cross-platform apps using TensorFlow 2.x. By the end of this TensorFlow book, you'll have gained a solid understanding of deep reinforcement learning algorithms and their implementations from scratch.

What you will learn

Build deep reinforcement learning agents from scratch using the all-new TensorFlow 2.x and Keras API Implement state-of-the-art deep reinforcement learning algorithms using minimal code Build, train, and package deep RL agents for cryptocurrency and stock trading Deploy RL agents to the cloud and edge to test them by creating desktop, web, and mobile apps and cloud services Speed up agent development using distributed DNN model training Explore distributed deep RL architectures and discover opportunities in AIaaS (AI as a Service)

What do you get with a Packt Subscription?

Free for first 7 days. $15.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details


Publication date : Jan 15, 2021
Length 472 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781838982546
Vendor :
Google
Category :
Languages :

Table of Contents

11 Chapters
Preface Chevron down icon Chevron up icon
1. Chapter 1: Developing Building Blocks for Deep Reinforcement Learning Using Tensorflow 2.x Chevron down icon Chevron up icon
2. Chapter 2: Implementing Value-Based, Policy-Based, and Actor-Critic Deep RL Algorithms Chevron down icon Chevron up icon
3. Chapter 3: Implementing Advanced RL Algorithms Chevron down icon Chevron up icon
4. Chapter 4: Reinforcement Learning in the Real World – Building Cryptocurrency Trading Agents Chevron down icon Chevron up icon
5. Chapter 5: Reinforcement Learning in the Real World – Building Stock/Share Trading Agents Chevron down icon Chevron up icon
6. Chapter 6: Reinforcement Learning in the Real World – Building Intelligent Agents to Complete Your To-Dos Chevron down icon Chevron up icon
7. Chapter 7: Deploying Deep RL Agents to the Cloud Chevron down icon Chevron up icon
8. Chapter 8: Distributed Training for Accelerated Development of Deep RL Agents Chevron down icon Chevron up icon
9. Chapter 9: Deploying Deep RL Agents on Multiple Platforms Chevron down icon Chevron up icon
10. Other Books You May Enjoy Chevron down icon Chevron up icon

Customer reviews

Filter icon Filter
Top Reviews
Rating distribution
Empty star icon Empty star icon Empty star icon Empty star icon Empty star icon 0
(0 Ratings)
5 star 0%
4 star 0%
3 star 0%
2 star 0%
1 star 0%

Filter reviews by


No reviews found
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is included in a Packt subscription? Chevron down icon Chevron up icon

A subscription provides you with full access to view all Packt and licnesed content online, this includes exclusive access to Early Access titles. Depending on the tier chosen you can also earn credits and discounts to use for owning content

How can I cancel my subscription? Chevron down icon Chevron up icon

To cancel your subscription with us simply go to the account page - found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription - From here you will see the ‘cancel subscription’ button in the grey box with your subscription information in.

What are credits? Chevron down icon Chevron up icon

Credits can be earned from reading 40 section of any title within the payment cycle - a month starting from the day of subscription payment. You also earn a Credit every month if you subscribe to our annual or 18 month plans. Credits can be used to buy books DRM free, the same way that you would pay for a book. Your credits can be found in the subscription homepage - subscription.packtpub.com - clicking on ‘the my’ library dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled? Chevron down icon Chevron up icon

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title? Chevron down icon Chevron up icon

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles? Chevron down icon Chevron up icon

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date? Chevron down icon Chevron up icon

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready? Chevron down icon Chevron up icon

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access? Chevron down icon Chevron up icon

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered? Chevron down icon Chevron up icon

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content? Chevron down icon Chevron up icon

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access? Chevron down icon Chevron up icon

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls in to place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.