You're reading from Hands-On Intelligent Agents with OpenAI Gym

Product type: Book
Published in: Jul 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788836579
Edition: 1st
Author: Palanisamy P

Praveen Palanisamy works on developing autonomous intelligent systems. He is currently an AI researcher at General Motors R&D. He develops planning and decision-making algorithms and systems that use deep reinforcement learning for autonomous driving. Previously, he was at the Robotics Institute, Carnegie Mellon University, where he worked on autonomous navigation, including perception and AI for mobile robots. He has experience developing complete, autonomous, robotic systems from scratch.

Implementing an Intelligent and Autonomous Car Driving Agent using the Deep Actor-Critic Algorithm

In Chapter 6, Implementing an Intelligent Agent for Optimal Control using Deep Q-Learning, we implemented agents using deep Q-learning to solve discrete control tasks, which involve discrete actions or decisions to be made. We saw how they can be trained to play video games such as Atari games, just like we do: by looking at the game screen and pressing the buttons on the gamepad/joystick. We can use such agents to pick the best choice from a finite set of choices, make decisions, or perform actions where the number of possible decisions or actions is finite and typically small. There are numerous real-world problems that can be solved with an agent that can learn to take optimal discrete actions. We saw some examples in Chapter 6, Implementing an Intelligent Agent for Optimal Discrete...

The deep n-step advantage actor-critic algorithm

In our deep Q-learner-based intelligent agent implementation, we used a deep neural network as the function approximator to represent the action-value function. The agent then used the action-value function to come up with a policy; in particular, we used the ε-greedy algorithm in our implementation. So, we understand that ultimately the agent has to know what actions are good to take given an observation/state. Instead of parametrizing or approximating a state/action value function and then deriving a policy based on that function, can we not parametrize the policy directly? Yes, we can! That is the exact idea behind policy gradient methods.
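As a minimal, illustrative sketch (assuming PyTorch and a discrete action space; the class and layer sizes here are our own choices, not necessarily those of the book's implementation), directly parametrizing the policy might look like this:

```python
import torch
import torch.nn as nn


class PolicyNetwork(nn.Module):
    """Maps an observation directly to a distribution over actions."""

    def __init__(self, obs_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, act_dim),
        )

    def forward(self, obs):
        # The logits define a categorical distribution over discrete actions,
        # so the agent can act without consulting a separate value function
        return torch.distributions.Categorical(logits=self.net(obs))


# Sampling an action directly from the parametrized policy:
policy = PolicyNetwork(obs_dim=4, act_dim=2)
action = policy(torch.randn(4)).sample()
```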

In the following subsections, we will briefly look at policy gradient-based learning methods and then transition to actor-critic methods that combine and make use...

Implementing a deep n-step advantage actor-critic agent

We have prepared ourselves with all the background information required to implement the deep n-step advantage actor-critic (A2C) agent. Let's look at an overview of the agent implementation process and then jump right into the hands-on implementation.

The following is the high-level flow of our A2C agent (a minimal sketch of steps 2 to 4 follows the list):

  1. Initialize the actor's and critic's networks.
  2. Use the current policy of the actor to gather n-step experiences from the environment and calculate the n-step returns.
  3. Calculate the actor's and critic's losses.
  4. Perform the stochastic gradient descent optimization step to update the actor's and critic's parameters.
  5. Repeat from step 2.
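To make steps 2 to 4 concrete, here is a minimal sketch (PyTorch assumed; the function names and signatures are illustrative, not the repository's exact code) of computing the n-step returns and the two losses:

```python
import torch


def calc_n_step_returns(rewards, final_value, gamma=0.99):
    """Bootstrapped n-step returns: G_t = r_t + gamma * G_{t+1},
    seeded with the critic's value estimate for the last state."""
    g, returns = final_value, []
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    return torch.stack(returns)


def calc_losses(log_probs, values, returns):
    # Advantage: how much better the n-step return was than the critic predicted
    advantages = returns - values
    # Actor loss: maximize log-probability of actions weighted by the advantage;
    # detach() keeps actor gradients from flowing into the critic
    actor_loss = -(log_probs * advantages.detach()).mean()
    # Critic loss: regress the value estimates toward the n-step returns
    critic_loss = advantages.pow(2).mean()
    return actor_loss, critic_loss
```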

We will implement the agent in a Python class named DeepActorCriticAgent. You will find the full implementation in this book's code repository, under the eighth chapter's directory: ch8...

Training an intelligent and autonomous driving agent

We now have all the pieces we need to accomplish our goal for this chapter, which is to put together an intelligent, autonomous driving agent and then train it to drive a car autonomously in the photo-realistic CARLA driving environment, which we developed as a learning environment using the Gym interface in the previous chapter. The agent training process can take a while. Depending on the hardware of the machine you train the agent on, it may take anywhere from a few hours for simpler environments (such as Pendulum-v0, CartPole-v0, and some of the Atari games) to a few days for complex environments (such as the CARLA driving environment). To first get a good understanding of the training process and how to monitor progress while the agent is training, we will start with a few simple examples to walk...
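As an illustrative warm-up (a self-contained sketch using the classic gym API of the book's era; a random policy stands in for the agent so the snippet runs on its own), the environment interaction loop at the heart of training looks like this:

```python
import gym

env = gym.make("CartPole-v0")
for episode in range(3):
    obs, done, total_reward = env.reset(), False, 0.0
    while not done:
        # A trained agent would pick an action here; we sample randomly
        action = env.action_space.sample()
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    print(f"Episode {episode}: reward = {total_reward}")
env.close()
```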

Summary

In this chapter, we got hands-on with an actor-critic architecture-based deep reinforcement learning agent, starting from the basics. We began with an introduction to policy gradient-based methods and walked through the step-by-step process of representing the objective function for policy gradient optimization, understanding the likelihood ratio trick, and finally deriving the policy gradient theorem. We then looked at how the actor-critic architecture makes use of the policy gradient theorem, using an actor component to represent the agent's policy and a critic component to represent the state/action/advantage value function, depending on the implementation of the architecture. With an intuitive understanding of the actor-critic architecture, we moved on to the A2C algorithm and discussed the six steps involved in it. We then discussed the n-step return...

