Building Monte Carlo prediction and control algorithms for RL
This recipe provides the ingredients for building a Monte Carlo prediction and control algorithm so that you can build your RL agents. Like temporal difference learning, Monte Carlo learning methods can be used to learn both the state and the action value functions. Monte Carlo methods produce unbiased value estimates because they learn from the actual returns of complete episodes of real experience rather than bootstrapping from approximate value predictions, though this comes at the cost of higher variance. These methods are suitable for applications that require good convergence properties. The following diagram illustrates the value that's learned by the Monte Carlo method for the GridworldV2 environment:
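As a rough illustration of the idea described above, the following is a minimal sketch of first-visit Monte Carlo prediction (not the book's implementation, and independent of the GridworldV2 environment): state values are estimated by averaging the discounted returns observed after the first visit to each state in complete episodes. The `generate_episode` callable is a hypothetical stand-in for rolling out a policy in an environment.

```python
from collections import defaultdict


def first_visit_mc_prediction(generate_episode, num_episodes=1000, gamma=1.0):
    """Estimate V(s) by averaging first-visit returns over complete episodes.

    `generate_episode` is a hypothetical callable (an assumption, not part of
    the book's code) that returns one complete episode as a list of
    (state, reward) pairs, where `reward` is received on leaving `state`.
    """
    returns_sum = defaultdict(float)  # cumulative return observed per state
    returns_count = defaultdict(int)  # number of first visits per state
    V = {}                            # current value estimates

    for _ in range(num_episodes):
        episode = generate_episode()
        states = [s for s, _ in episode]
        G = 0.0
        # Walk the episode backwards, accumulating the discounted return G.
        for t in range(len(episode) - 1, -1, -1):
            s, r = episode[t]
            G = r + gamma * G
            # Update only on the first visit to s within this episode.
            if s not in states[:t]:
                returns_sum[s] += G
                returns_count[s] += 1
                V[s] = returns_sum[s] / returns_count[s]
    return V
```

For example, on a deterministic two-state episode `[("A", 0.0), ("B", 1.0)]` with `gamma=1.0`, every return equals 1.0, so the estimates converge immediately to `V["A"] == V["B"] == 1.0`. Because each update uses a full, real return, the estimate is unbiased; the averaging over many episodes is what drives the variance down.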
Getting ready
To complete this recipe, you will need to activate the tf2rl-cookbook Python/conda virtual environment and run pip install -r requirements.txt. If the following import...