Capstone Project – Car Racing Using DQN

In the last few chapters, we have learned how deep Q learning works by approximating the Q function with a neural network. Following this, we have seen various improvements to the Deep Q Network (DQN), such as double Q learning, the dueling network architecture, and the Deep Recurrent Q Network (DRQN). We have seen how a DQN makes use of a replay buffer to store the agent's experience and trains the network with mini-batches of samples from the buffer. We have also implemented DQNs for playing Atari games and a DRQN for playing the Doom game. In this chapter, let's get into the detailed implementation of a dueling DQN, which is essentially the same as a regular DQN, except that the final fully connected layer is broken down into two streams, namely a value stream and an advantage stream, and these two streams are then combined by an aggregate layer to estimate the Q value.

Environment wrapper functions

The credit for the code used in this chapter goes to Giacomo Spigler's GitHub repository (https://github.com/spiglerg/DQN_DDQN_Dueling_and_DDPG_Tensorflow). Throughout this chapter, the code is explained line by line. For the complete, structured code, check the above GitHub repository.

First, we import all the necessary libraries:

import numpy as np
import tensorflow as tf
import gym
from gym.spaces import Box
from scipy.misc import imresize
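# Note: scipy.misc.imresize is deprecated and was removed in SciPy 1.3;
# with newer SciPy versions, cv2.resize (cv2 is imported below) can be used instead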
import random
import cv2
import time
import logging
import os
import sys

We define the EnvWrapper class, which holds some of the environment wrapper functions:

class EnvWrapper:

We define the __init__ method and initialize variables:

    def __init__(self, env_name, debug=False):

Initialize the gym environment:

        self.env = gym.make(env_name)

Get the action_space:

        self.action_space = self.env.action_space...
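The rest of the wrapper is truncated in this excerpt. As a rough sketch of the kind of preprocessing such a wrapper performs on the raw game screen (the function name preprocess_frame and the 84 x 84 output size are illustrative assumptions, not the book's exact code):

import cv2
import numpy as np

def preprocess_frame(frame, width=84, height=84):
    # Convert the RGB game screen to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    # Shrink the frame to the network's input resolution
    resized = cv2.resize(gray, (width, height), interpolation=cv2.INTER_AREA)
    # Scale pixel intensities to [0, 1]
    return resized.astype(np.float32) / 255.0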

Dueling network

Now, we build our dueling DQN. We build three convolutional layers followed by two fully connected layers, and the final fully connected layer is split into two separate streams: a value stream and an advantage stream. We then use an aggregate layer, which combines the value stream and the advantage stream, to compute the Q value. The dimensions of these layers are given as follows:

  • Layer 1: 32 8x8 filters with stride 4 + ReLU
  • Layer 2: 64 4x4 filters with stride 2 + ReLU
  • Layer 3: 64 3x3 filters with stride 1 + ReLU
  • Layer 4a: 512-unit fully connected layer + ReLU
  • Layer 4b: 512-unit fully connected layer + ReLU
  • Layer 5a: 1-unit FC + ReLU (state value)
  • Layer 5b: FC with one unit per action + ReLU (advantage value)
  • Layer 6: Aggregate layer, V(s) + A(s,a)

class QNetworkDueling(QNetwork):

We define the __init__ method to initialize all layers:


    def __init__(self, input_size, output_size...
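The rest of the class is truncated here. As a minimal sketch of what the dueling head computes, assuming the TensorFlow 1.x tf.layers API rather than the book's own layer helpers (the function name dueling_head is an illustrative assumption):

import tensorflow as tf

def dueling_head(features, num_actions):
    # Value stream (Layers 4a and 5a): a single scalar V(s) per state
    value_hidden = tf.layers.dense(features, 512, activation=tf.nn.relu)
    value = tf.layers.dense(value_hidden, 1)

    # Advantage stream (Layers 4b and 5b): one A(s, a) per action
    advantage_hidden = tf.layers.dense(features, 512, activation=tf.nn.relu)
    advantage = tf.layers.dense(advantage_hidden, num_actions)

    # Aggregate layer (Layer 6): combine the two streams; subtracting the
    # mean advantage keeps the value and advantage estimates identifiable
    q_values = value + (advantage - tf.reduce_mean(advantage, axis=1, keepdims=True))
    return q_values

Here, features is the flattened output of the last convolutional layer, and num_actions is the size of the action space.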

Replay memory

Now, we build the experience replay buffer, which is used for storing all the agent's experience. We sample a minibatch of experience from the replay buffer for training the network:

class ReplayMemoryFast:

First, we define the __init__ method and initialize the buffer size:


    def __init__(self, memory_size, minibatch_size):

        # max number of samples to store
        self.memory_size = memory_size

        # minibatch size
        self.minibatch_size = minibatch_size

        self.experience = [None]*self.memory_size
        self.current_index = 0
        self.size = 0

Next, we define the store function for storing the experiences:

    def store(self, observation, action, reward, newobservation, is_terminal):

Store the experience as a tuple (current state, action, reward, next state, is it a terminal state):

        self.experience[self.current_index] = (observation...
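The store method is cut off here. As a minimal sketch of how such a ring buffer typically completes the store and sample steps (the wrap-around arithmetic and the sample method are assumptions about the implementation, not the book's exact code):

    def store(self, observation, action, reward, newobservation, is_terminal):
        # Write the transition at the current index and advance the write
        # position, wrapping around once the buffer is full
        self.experience[self.current_index] = (observation, action, reward,
                                               newobservation, is_terminal)
        self.current_index = (self.current_index + 1) % self.memory_size
        self.size = min(self.size + 1, self.memory_size)

    def sample(self):
        # Uniformly sample a minibatch of stored transitions
        if self.size < self.minibatch_size:
            return []
        sampled_indices = random.sample(range(self.size), self.minibatch_size)
        return [self.experience[i] for i in sampled_indices]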

Training the network

Now, we will see how to train the network.

First, we define the DQN class and initialize all variables in the __init__ method:

class DQN(object):
    def __init__(self, state_size,
                 action_size,
                 session,
                 summary_writer=None,
                 exploration_period=1000,
                 minibatch_size=32,
                 discount_factor=0.99,
                 experience_replay_buffer=10000,
                 target_qnet_update_frequency=10000,
                 initial_exploration_epsilon=1.0,
                 final_exploration_epsilon=0.05,
                 reward_clipping=-1,
                 ):

Initialize all variables:

   
        self.state_size = state_size
        self.action_size = action_size

        self.session = session
        ...
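The training logic itself is truncated here. Two small pieces of it can be sketched from the parameters above: the epsilon-greedy schedule, which anneals the exploration rate from initial_exploration_epsilon to final_exploration_epsilon over exploration_period steps, and the one-step Q-learning target used to fit the network. The function names below are illustrative assumptions, not the book's code:

import numpy as np

def current_epsilon(step, exploration_period=1000,
                    initial_epsilon=1.0, final_epsilon=0.05):
    # Linearly anneal epsilon from its initial to its final value
    fraction = min(float(step) / exploration_period, 1.0)
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)

def compute_td_targets(rewards, next_q_values, terminals, discount_factor=0.99):
    # One-step Q-learning target: r + gamma * max_a' Q_target(s', a'),
    # with no bootstrapping on terminal transitions
    max_next_q = np.max(next_q_values, axis=1)
    return rewards + discount_factor * (1.0 - terminals) * max_next_q

Here, next_q_values are the target network's Q values for the next states in the minibatch, and terminals is 1.0 for terminal transitions and 0.0 otherwise.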

Car racing

So far, we have seen how to build a dueling DQN. Now, we will see how to use our dueling DQN to play the car racing game.

First, let's import our necessary libraries:

import gym
import time
import logging
import os
import sys
import tensorflow as tf

Initialize all of the necessary variables:

# The listing sets up the Seaquest Atari environment; Gym's car racing
# environment is 'CarRacing-v0'
ENV_NAME = 'Seaquest-v0'
TOTAL_FRAMES = 20000000
MAX_TRAINING_STEPS = 20*60*60/3
TESTING_GAMES = 30
MAX_TESTING_STEPS = 5*60*60/3
TRAIN_AFTER_FRAMES = 50000
epoch_size = 50000
MAX_NOOP_START = 30
LOG_DIR = 'logs'
outdir = 'results'
# tf.train.SummaryWriter was removed in TensorFlow 1.0; the equivalent
# call in TensorFlow 1.x is tf.summary.FileWriter
logger = tf.summary.FileWriter(LOG_DIR)
# Initialize the TensorFlow session
session = tf.InteractiveSession()
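The agent below needs an environment instance; the excerpt does not show it being created, but with the EnvWrapper class defined earlier it would look like this:

env = EnvWrapper(ENV_NAME)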

Build the agent:

agent = DQN(state_size=env.observation_space.shape,
            action_size=env.action_space.n,
            session=session,
            summary_writer=logger,
            exploration_period=1000000,
            minibatch_size=32,
            discount_factor=0...
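The remainder of the listing, the interaction loop that actually trains the agent, is not shown in this excerpt. A rough sketch of such a loop is given below; the method names agent.action(), agent.store(), and agent.train() are assumptions about the DQN class, and it assumes the wrapper exposes Gym-style reset() and step() methods:

# Hypothetical interaction loop using the constants defined above
num_frames = 0
while num_frames < TOTAL_FRAMES:
    state = env.reset()
    done = False
    while not done:
        action = agent.action(state, training=True)    # epsilon-greedy action
        next_state, reward, done, _ = env.step(action)
        agent.store(state, action, reward, next_state, done)
        if num_frames > TRAIN_AFTER_FRAMES:
            agent.train()                              # one minibatch update
        state = next_state
        num_frames += 1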

Summary

In this chapter, we have learned how to implement a dueling DQN in detail. We started off with the basic environment wrapper functions for preprocessing our game screens, and then we defined the QNetworkDueling class. Here, we implemented a dueling Q network, which splits the final fully connected layer of the DQN into a value stream and an advantage stream and then combines these two streams to compute the Q value. Following this, we saw how to create a replay buffer, which stores the agent's experience and from which we sample minibatches of experience for training the network. Finally, we initialized our car racing environment using OpenAI's Gym and trained our agent. In the next chapter, Chapter 13, Recent Advancements and Next Steps, we will look at some of the recent advancements in RL.

Questions

The question list is as follows:

  1. What is the difference between a DQN and a dueling DQN?
  2. Write the Python code for a replay buffer.
  3. What is a target network?
  4. Write the Python code for a prioritized experience replay buffer.
  5. Create a Python function to decay an epsilon-greedy policy.
  6. How does a dueling DQN differ from a double DQN?
  7. Create a Python function for updating primary network weights to the target network.
