Chapter 3: Implementing Advanced RL Algorithms

This chapter provides short and crisp recipes to implement advanced Reinforcement Learning (RL) algorithms and agents from scratch using TensorFlow 2.x. It includes recipes to build Deep Q-Networks (DQN), Double and Dueling Double Deep Q-Networks (DDQN, DDDQN), Deep Recurrent Q-Networks (DRQN), Asynchronous Advantage Actor-Critic (A3C), Proximal Policy Optimization (PPO), and Deep Deterministic Policy Gradient (DDPG) agents.

The following recipes are discussed in this chapter:

  • Implementing the Deep Q-Learning algorithm, DQN, and Double-DQN agent
  • Implementing the Dueling DQN agent
  • Implementing the Dueling Double DQN algorithm and DDDQN agent
  • Implementing the Deep Recurrent Q-Learning algorithm and DRQN agent
  • Implementing the Asynchronous Advantage Actor-Critic algorithm and A3C agent
  • Implementing the Proximal Policy Optimization algorithm and PPO agent
  • Implementing the Deep Deterministic Policy Gradient algorithm and DDPG agent

Technical requirements

The code in this book has been extensively tested on Ubuntu 18.04 and Ubuntu 20.04, and should work on later versions of Ubuntu as long as Python 3.6+ is available. With Python 3.6+ and the necessary Python packages (listed at the start of each recipe) installed, the code should run fine on Windows and macOS too. It is advisable to create and use a Python virtual environment named tf2rl-cookbook to install the packages and run the code in this book; Miniconda or Anaconda is recommended for managing Python virtual environments.

The complete code for each recipe in each chapter is available here: https://github.com/PacktPublishing/Tensorflow-2-Reinforcement-Learning-Cookbook.

Implementing the Deep Q-Learning algorithm, DQN, and Double-DQN agent

The DQN agent uses a deep neural network to learn the Q-value function. DQN is a powerful algorithm for environments with discrete action spaces and marked a notable milestone in the history of deep reinforcement learning when it mastered Atari games.

The Double-DQN agent uses two deep neural networks with identical architectures but different weights, since they are updated at different times. The second (target) network is a copy of the main network's weights from some point in the past (typically from the last episode); using it to evaluate the actions selected by the main network reduces the overestimation of Q-values.
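
To make the difference concrete, the following is a minimal sketch (not the recipe's full implementation) of how the TD targets differ between the two agents. It assumes a main network q_net and a target network target_q_net with identical architectures, and an illustrative discount factor:

import tensorflow as tf

gamma = 0.99  # discount factor (illustrative value)

def dqn_targets(target_q_net, rewards, next_states, dones):
    # Vanilla DQN: the target network both selects and evaluates the
    # next action, which tends to overestimate Q-values.
    max_next_q = tf.reduce_max(target_q_net(next_states), axis=1)
    return rewards + gamma * (1.0 - dones) * max_next_q

def double_dqn_targets(q_net, target_q_net, rewards, next_states, dones):
    # Double-DQN: the main network selects the next action, while the
    # target network evaluates it, reducing the overestimation bias.
    best_actions = tf.argmax(q_net(next_states), axis=1)
    next_q = tf.gather(target_q_net(next_states), best_actions, batch_dims=1)
    return rewards + gamma * (1.0 - dones) * next_q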

By the end of this recipe, you will have implemented a complete DQN and Double-DQN agent from scratch using TensorFlow 2.x that is ready to be trained in any discrete action-space RL environment.

Let's get started.

Getting ready

To complete this recipe, you will first need to activate the tf2rl-cookbook Conda Python virtual environment and pip install -r requirements.txt.

Implementing the Dueling DQN agent

A Dueling DQN agent explicitly estimates two quantities through a modified network architecture:

  • State values, V(s)
  • Advantage values, A(s, a)

The state value estimates the value of being in state s, and the advantage value represents the advantage of taking action a in state s. The key idea of estimating these two quantities explicitly and separately is what enables Dueling DQN to perform better than DQN. This recipe will walk you through the steps to implement a Dueling DQN agent from scratch using TensorFlow 2.x.
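
As a preview, here is a minimal sketch of a dueling Q-network in TensorFlow 2.x (the layer sizes are illustrative, not the recipe's exact values). The two streams are recombined as Q(s, a) = V(s) + (A(s, a) - mean(A(s, ·))); subtracting the mean advantage keeps the V/A decomposition identifiable:

import tensorflow as tf
from tensorflow.keras import layers

def build_dueling_q_network(state_dim, num_actions):
    """Dueling architecture: shared trunk, then separate V and A streams."""
    states = layers.Input(shape=(state_dim,))
    x = layers.Dense(64, activation="relu")(states)
    x = layers.Dense(64, activation="relu")(x)
    v = layers.Dense(1)(x)            # state-value stream: scalar V(s)
    a = layers.Dense(num_actions)(x)  # advantage stream: A(s, a) per action
    # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
    q = layers.Lambda(
        lambda t: t[0] + (t[1] - tf.reduce_mean(t[1], axis=1, keepdims=True))
    )([v, a])
    return tf.keras.Model(inputs=states, outputs=q)

# Example with CartPole-like dimensions (an assumption for illustration):
model = build_dueling_q_network(state_dim=4, num_actions=2)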

Getting ready

To complete this recipe, you will first need to activate the tf2rl-cookbook Conda Python virtual environment and pip install -r requirements.txt. If the following import statements run without issues, you are ready to get started!

import argparse
import os
import random
from collections import deque
from datetime import datetime
import gym
import numpy as np
import tensorflow as tf
from...

Implementing the Dueling Double DQN algorithm and DDDQN agent

Dueling Double DQN (DDDQN) combines the benefits of Double Q-learning and the dueling architecture. Double Q-learning corrects DQN's tendency to overestimate action values, while the dueling architecture uses a modified network to separately learn the state-value function (V) and the advantage function (A). This explicit separation allows the algorithm to learn faster, especially when there are many actions to choose from and when the actions are very similar to each other. Because every action's Q-value shares the state-value stream, the dueling agent can improve its estimate of a state's value from even a single action taken there, unlike the DQN agent, which learns nothing about actions it has not yet taken. By the end of this recipe, you will have a complete implementation of the DDDQN agent.
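
As a minimal sketch (the optimizer and hyperparameter values here are illustrative, and q_net/target_q_net are assumed to be two dueling networks with identical architectures), a DDDQN training step simply pairs the dueling architecture with the Double-DQN target:

import tensorflow as tf

gamma = 0.99  # illustrative discount factor
optimizer = tf.keras.optimizers.Adam(1e-3)

def dddqn_train_step(q_net, target_q_net, states, actions, rewards,
                     next_states, dones):
    # Double-DQN target: the online dueling network selects the next
    # action; the target dueling network evaluates it.
    best_actions = tf.argmax(q_net(next_states), axis=1)
    next_q = tf.gather(target_q_net(next_states), best_actions, batch_dims=1)
    targets = rewards + gamma * (1.0 - dones) * next_q
    with tf.GradientTape() as tape:
        q_values = tf.gather(q_net(states), actions, batch_dims=1)
        loss = tf.reduce_mean(tf.square(targets - q_values))
    grads = tape.gradient(loss, q_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_net.trainable_variables))
    return loss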

Getting ready

To complete this recipe, you will first need to activate the tf2rl-cookbook Conda Python virtual environment and pip install -r requirements.txt.

Implementing the Deep Recurrent Q-Learning algorithm and DRQN agent

DRQN uses a recurrent neural network to learn the Q-value function, which makes it better suited to reinforcement learning in partially observable environments. The recurrent layers in the DRQN allow the agent to learn by integrating information from a temporal sequence of observations. For example, a DRQN agent can infer the velocity of moving objects in the environment without any changes to its inputs (for example, no frame stacking is required). By the end of this recipe, you will have a complete DRQN agent ready to be trained in an RL environment of your choice.
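
As a preview, here is a minimal recurrent Q-network sketch, assuming observations are fed to the agent as sequences of length seq_len (the layer sizes are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

def build_drqn(seq_len, obs_dim, num_actions):
    """Q-network with an LSTM that integrates a sequence of observations."""
    obs_seq = layers.Input(shape=(seq_len, obs_dim))
    x = layers.Dense(64, activation="relu")(obs_seq)  # per-timestep encoder
    x = layers.LSTM(64)(x)                            # temporal integration
    q_values = layers.Dense(num_actions)(x)           # one Q-value per action
    return tf.keras.Model(inputs=obs_seq, outputs=q_values)

# Example: sequences of 4 observations from a CartPole-like env (assumption)
model = build_drqn(seq_len=4, obs_dim=4, num_actions=2)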

Getting ready

To complete this recipe, you will first need to activate the tf2rl-cookbook Conda Python virtual environment and pip install -r requirements.txt. If the following import statements run without issues, you are ready to get started!

import tensorflow as tf
from datetime import datetime
import os
from tensorflow.keras.layers...

Implementing the Asynchronous Advantage Actor-Critic algorithm and A3C agent

The A3C algorithm builds upon the Actor-Critic class of algorithms by using deep neural networks to approximate both the actor and the critic: the actor learns the policy function, while the critic estimates the value function. The asynchronous nature of the algorithm lets multiple workers explore different parts of the state space in parallel, enabling faster learning and convergence. Unlike DQN agents, which rely on an experience replay memory, the A3C agent uses its multiple workers to gather a varied stream of samples for learning. By the end of this recipe, you will have a complete script to train an A3C agent in any continuous action-space environment of your choice!
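
At its core, each A3C worker computes an advantage actor-critic loss on its own rollouts and applies the resulting gradients to the shared global networks. The following is a minimal sketch of those losses for a continuous action space with a diagonal Gaussian policy; the function names, policy-head outputs (mu, log_std), and the entropy coefficient are illustrative, and the full recipe additionally runs several such workers in parallel threads:

import numpy as np
import tensorflow as tf

def gaussian_log_prob(actions, mu, log_std):
    # Log-density of a diagonal Gaussian policy N(mu, exp(log_std)^2)
    var = tf.exp(2.0 * log_std)
    return tf.reduce_sum(
        -0.5 * (tf.square(actions - mu) / var + 2.0 * log_std
                + np.log(2.0 * np.pi)),
        axis=-1,
    )

def a3c_losses(mu, log_std, values, actions, returns, entropy_coef=0.01):
    # returns are n-step discounted returns bootstrapped from the critic
    advantages = returns - values
    log_probs = gaussian_log_prob(actions, mu, log_std)
    # Actor: policy gradient with the advantage treated as a constant
    policy_loss = -tf.reduce_mean(log_probs * tf.stop_gradient(advantages))
    # Critic: regress V(s) toward the observed returns
    value_loss = tf.reduce_mean(tf.square(advantages))
    # Entropy bonus (closed form for a Gaussian) encourages exploration
    entropy = tf.reduce_sum(log_std + 0.5 * np.log(2.0 * np.pi * np.e), axis=-1)
    entropy_loss = -entropy_coef * tf.reduce_mean(entropy)
    return policy_loss, value_loss, entropy_loss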

Getting ready

To complete this recipe, you will first need to activate the tf2rl-cookbook Conda Python virtual environment and pip install -r requirements.txt. If the following import statements run without issues, you are ready to get started!

Implementing the Proximal Policy Optimization algorithm and PPO agent

The Proximal Policy Optimization (PPO) algorithm builds upon Trust Region Policy Optimization (TRPO), which constrains each policy update so that the new policy stays within a trust region around the old policy. PPO simplifies this core idea by using a clipped surrogate objective function that is easier to implement, yet quite powerful and efficient. It is one of the most widely used RL algorithms, especially for continuous control problems. By the end of this recipe, you will have built a PPO agent that you can train in your RL environment of choice.
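
The heart of PPO is the clipped surrogate objective, which limits how far a single update can move the probability ratio between the new and old policies. Here is a minimal sketch (the clipping threshold of 0.2 is the PPO paper's default, used here for illustration):

import tensorflow as tf

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s)
    ratio = tf.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = tf.clip_by_value(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (minimum) objective and negate it, since
    # optimizers minimize while the surrogate is to be maximized.
    return -tf.reduce_mean(tf.minimum(unclipped, clipped))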

Getting ready

To complete this recipe, you will first need to activate the tf2rl-cookbook Conda Python virtual environment and pip install -r requirements.txt. If the following import statements run without issues, you are ready to get started!

import argparse
import os
from datetime import datetime
import gym
import numpy as np
import tensorflow as tf
from...

Implementing the Deep Deterministic Policy Gradient algorithm and DDPG agent

Deterministic Policy Gradient (DPG) is an Actor-Critic RL algorithm that uses two neural networks: one for estimating the action-value function, and the other for estimating the optimal deterministic policy. The Deep Deterministic Policy Gradient (DDPG) agent builds upon DPG and is quite sample-efficient compared to vanilla Actor-Critic agents due to its use of deterministic action policies. By completing this recipe, you will have a powerful agent that can be trained efficiently in a variety of RL environments.
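
As a sketch (the model interfaces, optimizer settings, and hyperparameter values are assumptions, not the recipe's exact code), one DDPG update consists of a TD-style critic update, an actor update that follows the critic's gradient through the deterministic policy, and a soft (Polyak) update of the target networks:

import tensorflow as tf

gamma, tau = 0.99, 0.005  # illustrative discount and soft-update rate
actor_opt = tf.keras.optimizers.Adam(1e-3)
critic_opt = tf.keras.optimizers.Adam(1e-3)

def ddpg_update(actor, critic, target_actor, target_critic,
                states, actions, rewards, next_states, dones):
    # rewards and dones are assumed to have shape (batch, 1)
    # Critic: regress Q(s, a) toward r + gamma * Q'(s', mu'(s'))
    next_actions = target_actor(next_states)
    target_q = target_critic([next_states, next_actions])
    y = rewards + gamma * (1.0 - dones) * target_q
    with tf.GradientTape() as tape:
        critic_loss = tf.reduce_mean(tf.square(y - critic([states, actions])))
    grads = tape.gradient(critic_loss, critic.trainable_variables)
    critic_opt.apply_gradients(zip(grads, critic.trainable_variables))

    # Actor: maximize Q(s, mu(s)) by minimizing its negation
    with tf.GradientTape() as tape:
        actor_loss = -tf.reduce_mean(critic([states, actor(states)]))
    grads = tape.gradient(actor_loss, actor.trainable_variables)
    actor_opt.apply_gradients(zip(grads, actor.trainable_variables))

    # Polyak (soft) update of the target networks
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for w, tw in zip(net.weights, target.weights):
            tw.assign(tau * w + (1.0 - tau) * tw)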

Getting ready

To complete this recipe, you will first need to activate the tf2rl-cookbook Conda Python virtual environment and pip install -r requirements.txt. If the following import statements run without issues, you are ready to get started!

import argparse
import os
import random
from collections import deque
from datetime import datetime
import gym
import numpy as np
import...