Temporal-difference learning

Q-Learning is a special case of a more generalized Temporal-Difference Learning, or TD-Learning, TD($\lambda$). More specifically, it is a special case of one-step TD-Learning, TD(0):

$$Q(s, a) = Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right) \qquad \text{(Equation 9.5.1)}$$

In the equation, $\alpha$ is the learning rate. Note that when $\alpha = 1$, Equation 9.5.1 is similar to the Bellman equation. For simplicity, we'll refer to Equation 9.5.1 as Q-Learning or generalized Q-Learning.
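
As a minimal illustration of Equation 9.5.1, the following sketch applies the generalized Q-Learning update for a single transition $(s, a, r, s')$. The tabular setup, table shape, and the names `alpha` and `gamma` are illustrative assumptions, not code from the book:

```python
import numpy as np

# Hypothetical tabular setup: 16 discrete states, 4 actions (illustrative only)
n_states, n_actions = 16, 4
q_table = np.zeros((n_states, n_actions))

alpha = 0.1   # learning rate (alpha in Equation 9.5.1)
gamma = 0.9   # discount factor


def q_learning_update(s, a, r, s_next):
    """One-step TD(0) / generalized Q-Learning update (Equation 9.5.1).

    The target uses the max over next-state actions, so the policy being
    optimized is never consulted to pick a' (off-policy).
    """
    td_target = r + gamma * np.max(q_table[s_next])
    td_error = td_target - q_table[s, a]
    q_table[s, a] += alpha * td_error
```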

Previously, we referred to Q-Learning as an off-policy RL algorithm since it learns the Q value function without directly using the policy that it is trying to optimize. An example of an on-policy one-step TD-Learning algorithm is SARSA, which is similar to Equation 9.5.1:

$$Q(s, a) = Q(s, a) + \alpha \left( r + \gamma Q(s', a') - Q(s, a) \right) \qquad \text{(Equation 9.5.2)}$$

The main difference is the use of the policy that is being optimized to determine $a'$. The terms $s$, $a$, $r$, $s'$, and $a'$ (thus the name SARSA) must be known to update the Q value function at every iteration. Both Q-Learning and SARSA use existing estimates...
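
For contrast, here is a similar sketch of the SARSA update in Equation 9.5.2, continuing the hypothetical tabular setup above. The epsilon-greedy helper is an assumption used only to make the on-policy choice of $a'$ concrete:

```python
def epsilon_greedy(s, epsilon=0.1):
    """Sample an action from the policy being optimized (assumed epsilon-greedy)."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(q_table[s]))


def sarsa_update(s, a, r, s_next):
    """One-step SARSA update (Equation 9.5.2).

    a' is drawn from the same policy that is being improved (on-policy),
    unlike the max over actions used by Q-Learning.
    """
    a_next = epsilon_greedy(s_next)
    td_target = r + gamma * q_table[s_next, a_next]
    q_table[s, a] += alpha * (td_target - q_table[s, a])
    return a_next  # reused as the next action when stepping the environment
```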
