You're reading from  Reinforcement Learning with TensorFlow

Product type: Book
Published in: Apr 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788835725
Edition: 1st Edition
Author: Sayon Dutta

Sayon Dutta is an Artificial Intelligence researcher and developer. A graduate of IIT Kharagpur, he owns the software copyright for Mobile Irrigation Scheduler. At present, he is an AI engineer at Wissen Technology. He co-founded Marax AI Inc., an AI startup focused on AI-powered customer churn prediction. With over 2.5 years of experience in AI, he spends most of his time implementing AI research papers for industrial use cases, and weightlifting.

Chapter 4. Policy Gradients

So far, we have seen how to derive implicit policies from a value function with the value-based approach. Here, the agent tries to learn the policy directly. The underlying idea is similar: as the agent gains experience, it changes its policy in response to what it has observed.

Value iteration, policy iteration, and Q-learning fall under the value-based approach, which is rooted in dynamic programming. The policy optimization approach, by contrast, is built on policy gradients, and combining it with ideas from policy iteration gives rise to actor-critic algorithms.

In the dynamic programming method, there is a set of self-consistent equations that the Q and V values must satisfy. Policy optimization is different: the policy is learned directly rather than derived from a value function.
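
To make the contrast concrete, the self-consistent equations mentioned here are commonly written as the Bellman expectation equations, while policy optimization is usually stated as gradient ascent on expected return. The notation below (transition probabilities $P$, reward $R$, discount factor $\gamma$, policy parameters $\theta$, step size $\alpha$) is assumed for illustration and does not come from this excerpt:

```latex
% Value-based view: V and Q must satisfy these self-consistent equations
V^{\pi}(s)   = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
               \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]
Q^{\pi}(s,a) = \sum_{s'} P(s' \mid s, a)
               \left[ R(s, a, s') + \gamma \sum_{a'} \pi(a' \mid s') Q^{\pi}(s', a') \right]

% Policy-based view: adjust the policy parameters directly
J(\theta) = \mathbb{E}_{\tau \sim \pi_{\theta}} \left[ \sum_{t} \gamma^{t} r_{t} \right],
\qquad
\theta \leftarrow \theta + \alpha \, \nabla_{\theta} J(\theta)
```

In the first pair, the policy is recovered afterwards from the values. In the second, the policy parameters themselves are the quantity being updated.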

Thus, value-based methods learn a value function from which we derive an implicit policy, whereas policy-based methods learn no value function and instead learn the policy directly. The actor-critic method is more advanced...
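
The difference between an implicit policy and a directly learned one can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the book's implementation: the function names (`greedy_policy_from_q`, `policy_gradient_step`), the tabular softmax parameterization, and the toy numbers are all hypothetical, and the update shown is a single REINFORCE-style step.

```python
import math


def greedy_policy_from_q(q_table):
    """Value-based: the policy is implicit, read off as argmax over Q-values."""
    return [max(range(len(row)), key=row.__getitem__) for row in q_table]


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(theta, state, action, ret, lr=0.1):
    """Policy-based: one REINFORCE-style update on the policy parameters.

    theta[state][a] is the preference for action a in the given state
    (an assumed tabular softmax policy). For a softmax policy, the gradient
    of log pi(action | state) with respect to theta[state][a] is
    (1 if a == action else 0) - pi(a | state).
    """
    probs = softmax(theta[state])
    new_theta = [row[:] for row in theta]  # leave the input untouched
    for a, p in enumerate(probs):
        grad_log = (1.0 if a == action else 0.0) - p
        new_theta[state][a] += lr * ret * grad_log
    return new_theta


# Value-based: pick the best action per state from a toy Q-table.
q_table = [[1.0, 2.0], [0.5, 0.1]]
implicit_policy = greedy_policy_from_q(q_table)

# Policy-based: no value function, just nudge the policy toward an
# action that was followed by a positive return.
theta = [[0.0, 0.0], [0.0, 0.0]]
prob_before = softmax(theta[0])[1]
theta = policy_gradient_step(theta, state=0, action=1, ret=1.0)
prob_after = softmax(theta[0])[1]

print(implicit_policy, prob_before, prob_after)
```

In the value-based half, nothing about the policy is stored; it is recomputed from the Q-table whenever needed. In the policy-based half, the only learned quantity is `theta`, and a positive return raises the probability of the action that was taken.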
