You're reading from  Statistics for Machine Learning

Product type: Book
Published in: Jul 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788295758
Edition: 1st Edition
Author (1)
Pratap Dangeti

Pratap Dangeti develops machine learning and deep learning solutions for structured, image, and text data at TCS, analytics and insights, innovation lab in Bangalore. He has acquired a lot of experience in both analytics and data science. He received his master's degree from IIT Bombay in its industrial engineering and operations research program. He is an artificial intelligence enthusiast. When not working, he likes to read about next-gen technologies and innovative methodologies.

Q-learning - off-policy TD control


Q-learning is an off-policy TD control algorithm and one of the most widely used methods in practical reinforcement learning applications. In this case, the learned action-value function, Q, directly approximates $q_*$, the optimal action-value function, independent of the policy being followed. This simplifies the analysis of the algorithm and enabled early convergence proofs. The policy still has an effect, in that it determines which state-action pairs are visited and updated. However, all that is required for correct convergence is that all pairs continue to be updated. This is a minimal requirement in the sense that any method guaranteed to find optimal behavior in the general case must require it. The algorithm is shown in the following steps:

  1. Initialize:
  2. Repeat (for each episode):
    • Initialize S
    • Repeat (for each step of episode):
      • Choose A from S using policy derived from Q (for example...
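
The loop structure above can be illustrated with a minimal tabular sketch. Everything specific in it is an assumption made for illustration and is not taken from the book: the five-state chain environment, the `step` helper, the ε-greedy behaviour policy, the zero initialization of Q, and the values of `alpha`, `gamma`, and `epsilon`. The update it applies is the standard Q-learning rule, $Q(S,A) \leftarrow Q(S,A) + \alpha\,[R + \gamma \max_a Q(S',a) - Q(S,A)]$.

```python
import random


def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical deterministic chain.

    States are 0..n_states-1; the last state is terminal.
    Action 1 moves right, action 0 moves left (state 0 is a wall).
    Entering the terminal state yields reward 1, every other move 0.
    """
    rng = random.Random(seed)
    terminal = n_states - 1
    # Assumed initialization: Q(s, a) = 0 everywhere; the terminal row is never updated.
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def step(s, a):
        # Hypothetical environment dynamics, used only for this sketch.
        s_next = min(s + 1, terminal) if a == 1 else max(s - 1, 0)
        reward = 1.0 if s_next == terminal else 0.0
        return s_next, reward

    for _ in range(episodes):
        s = 0  # initialize S
        while s != terminal:  # repeat for each step of the episode
            # Behaviour policy: epsilon-greedy with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[s])
                a = rng.choice([i for i, q in enumerate(Q[s]) if q == best])
            s_next, r = step(s, a)
            # Off-policy target: bootstrap from the greedy value of the next state,
            # regardless of which action the behaviour policy takes there.
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q


if __name__ == "__main__":
    Q = q_learning()
    for s, row in enumerate(Q):
        print(s, [round(q, 3) for q in row])
```

The `max(Q[s_next])` term is what makes the method off-policy: the behaviour policy explores with probability `epsilon`, yet the update always backs up the value of the greedy action, so the table moves toward the optimal action values as long as every state-action pair keeps being visited.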