8.5 Using uncertainty estimates for smarter reinforcement learning
Reinforcement learning aims to develop machine learning techniques capable of learning from their environment. There’s a clue to the fundamental principle behind reinforcement learning in its name: the aim is to reinforce successful behaviour. Generally speaking, in reinforcement learning, we have an agent capable of executing a number of actions in an environment. Following these actions, the agent receives feedback from the environment, and this feedback is used to allow the agent to build a better understanding of which actions are more likely to lead to a positive outcome given the current state of the environment.
Formally, we can describe this using a set of states, S, a set of actions A, which map from a current state s to a new state s′, and a reward function, R(s,s′), describing the reward for the transition between the current state, s, and the new state, s′. The set of states comprises...