You're reading from Reinforcement Learning with TensorFlow

Product type: Book
Published in: Apr 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788835725
Edition: 1st Edition
Author: Sayon Dutta

Sayon Dutta is an Artificial Intelligence researcher and developer. A graduate of IIT Kharagpur, he owns the software copyright for Mobile Irrigation Scheduler. At present, he is an AI engineer at Wissen Technology. He co-founded the AI startup Marax AI Inc., which focuses on AI-powered customer churn prediction. With over 2.5 years of experience in AI, he spends most of his time implementing AI research papers for industrial use cases, and weightlifting.

Policy objective functions


Let's now discuss how to optimize a policy. In policy-based methods, our main objective is, given a policy $\pi_\theta(s, a)$ with parameter vector $\theta$, to find the best values of that parameter vector. To decide which values are best, we measure the quality $J(\theta)$ of the policy $\pi_\theta$ for different values of the parameter vector $\theta$.
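To make the idea of scoring a parameter vector concrete, here is a minimal sketch, not the book's code. It assumes a hypothetical single-step problem with two actions whose expected rewards are fixed (`REWARDS`), and a softmax policy in which $\theta$ holds one preference per action. In that setting $J(\theta)$ is simply the expected reward obtained by acting with $\pi_\theta$. The names `softmax_policy` and `objective` are illustrative only.

```python
import math

# Hypothetical expected rewards for a two-action, single-step problem (assumption).
REWARDS = [1.0, 3.0]

def softmax_policy(theta):
    """pi_theta(a): softmax over theta, one preference value per action."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def objective(theta):
    """J(theta): expected reward when acting with pi_theta in this one-step problem."""
    probs = softmax_policy(theta)
    return sum(p * r for p, r in zip(probs, REWARDS))

# Measure the quality J(theta) of the policy for different parameter vectors.
candidates = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
scores = {tuple(th): objective(th) for th in candidates}
best_theta = max(candidates, key=objective)
print(scores)
print("best theta:", best_theta)
```

Here $\theta = [0, 2]$ scores highest (about 2.76) because it places most of the probability on the higher-reward action. Policy optimization methods search for such a $\theta$ systematically instead of enumerating a handful of candidates.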

Before discussing the optimization methods, let's first figure out the different ways to measure the quality of a policy $\pi_\theta$:

  • If it's an episodic environment, $J(\theta)$ can be the value function of the start state $s_1$. That is, if the episode starts from state $s_1$, its value is the expected sum of rewards from that state onwards. Therefore, $J_1(\theta) = V^{\pi_\theta}(s_1)$.
  • If it's a continuing environment, $J(\theta)$ can be the average value function of the states. So, if the environment goes on forever, the quality of the policy can be measured as the sum, over all states $s$, of the probability of being in state $s$, that is $d^{\pi_\theta}(s)$, times the value of that state $V^{\pi_\theta}(s)$, that is, the expected sum of rewards from that state onward. Therefore...
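Both measures can be computed exactly for a tiny example. The sketch below is an illustration under stated assumptions, not the book's implementation. It assumes a hypothetical two-state environment in which a fixed $\pi_\theta$ has already been folded into a state-to-state transition matrix `P` and an expected reward vector `R`, and it assumes a discount factor `GAMMA` so that $V^{\pi_\theta}(s)$ stays finite when the environment never terminates. It evaluates $V^{\pi_\theta}$ by iterative policy evaluation, takes $V^{\pi_\theta}(s_1)$ as the start-state measure, estimates $d^{\pi_\theta}(s)$ as the stationary distribution of the Markov chain induced by $\pi_\theta$ using power iteration, and then sums $d^{\pi_\theta}(s)\,V^{\pi_\theta}(s)$ over all states, as described for the continuing case.

```python
# Hypothetical two-state chain induced by a fixed policy pi_theta (assumption):
# P[s][s2] = probability of moving from state s to state s2 under pi_theta,
# R[s]     = expected immediate reward in state s under pi_theta.
P = [[0.9, 0.1],
     [0.5, 0.5]]
R = [1.0, 0.0]
GAMMA = 0.9  # assumed discount factor, keeps V finite in a continuing environment

def evaluate_values(P, R, gamma, iters=1000):
    """Iterative policy evaluation: V(s) = R(s) + gamma * sum_s2 P(s, s2) * V(s2)."""
    n = len(R)
    V = [0.0] * n
    for _ in range(iters):
        V = [R[s] + gamma * sum(P[s][s2] * V[s2] for s2 in range(n)) for s in range(n)]
    return V

def stationary_distribution(P, iters=1000):
    """Power iteration for d^{pi_theta}: long-run probability of being in each state."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        d = [sum(d[s] * P[s][s2] for s in range(n)) for s2 in range(n)]
    return d

V = evaluate_values(P, R, GAMMA)
d = stationary_distribution(P)

# Episodic measure: value of the start state s_1 (index 0 in this sketch).
J_start = V[0]
# Continuing measure: average value, sum over s of d(s) * V(s).
J_avg_value = sum(d_s * v_s for d_s, v_s in zip(d, V))

print("V:", V)
print("d:", d)
print("J_1 (start value):", J_start)
print("J_avV (average value):", J_avg_value)
```

For these numbers the stationary distribution is $[5/6, 1/6]$, the start-state value is about 8.59, and the average value is about 8.33. In a real agent, changing $\theta$ changes `P` and `R`, and therefore both scores.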