MAML in Reinforcement Learning
The algorithm for MAML in the reinforcement learning setting is given as follows:
- Say we have a model $f$ parameterized by a parameter $\theta$, and we have a distribution over tasks $p(T)$. First, we randomly initialize the model parameter $\theta$.
- Sample a batch of tasks $T_i$ from the distribution of tasks, that is, $T_i \sim p(T)$.
- For each task $T_i$:
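  - Sample $k$ trajectories using $f_\theta$ and prepare the training dataset: $\mathcal{D}_i = \{(x_1, a_1, \ldots, x_H)\}$, where $x_t$ is the state and $a_t$ the action at time step $t$, and $H$ is the horizon.
  - Train the model $f_\theta$ on the training dataset $\mathcal{D}_i$ and compute the loss $\mathcal{L}_{T_i}(f_\theta)$. In reinforcement learning, this loss is the negative expected return, $\mathcal{L}_{T_i}(f_\theta) = -\mathbb{E}\left[\sum_{t=1}^{H} R_i(x_t, a_t)\right]$, where the expectation is over trajectories generated by $f_\theta$ in task $T_i$ and $R_i$ is that task's reward function.
  - Minimize the loss using gradient descent and get the optimal parameter $\theta'_i$ as $\theta'_i = \theta - \alpha \nabla_\theta \mathcal{L}_{T_i}(f_\theta)$, where $\alpha$ is the inner-loop learning rate.
  - Sample $k$ trajectories using $f_{\theta'_i}$ and prepare the test dataset: $\mathcal{D}'_i = \{(x_1, a_1, \ldots, x_H)\}$.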
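- Now, we minimize the loss on the test dataset $\mathcal{D}'_i$. Parameterize the model $f$ with the optimal parameter $\theta'_i$ calculated in the previous step and compute the loss $\mathcal{L}_{T_i}(f_{\theta'_i})$. Calculate the gradients of the loss and update our randomly initialized parameter $\theta$ using our test (meta-training) dataset: $\theta = \theta - \beta \nabla_\theta \sum_{T_i \sim p(T)} \mathcal{L}_{T_i}(f_{\theta'_i})$, where $\beta$ is the meta learning rate. Because $\theta'_i$ is itself a function of $\theta$, this gradient is taken through the inner update.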
- Repeat...
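The steps above can be sketched in Python with NumPy. This is a minimal illustration under several assumptions that are not part of the algorithm as stated: each task is a toy multi-armed bandit, so every trajectory has horizon $H = 1$ and consists of a single action and reward; the model $f_\theta$ is a softmax policy whose logits are $\theta$; the loss gradient is the REINFORCE (score-function) estimate; and the meta update uses the first-order approximation, which evaluates the gradient of $\mathcal{L}_{T_i}(f_{\theta'_i})$ at $\theta'_i$ instead of differentiating through the inner update. The task distribution `sample_task` and all hyperparameter values (`alpha`, `beta`, `k`, and so on) are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(logits - logits.max())
    return z / z.sum()


def sample_task(rng, n_arms=4):
    """Hypothetical p(T): a bandit task where arm 0 or arm 1 pays reward 1."""
    rewards = np.zeros(n_arms)
    rewards[rng.integers(2)] = 1.0
    return rewards


def sample_trajectories(theta, rewards, k, rng):
    """Sample k one-step trajectories (action, reward) with the policy f_theta."""
    actions = rng.choice(len(theta), size=k, p=softmax(theta))
    return actions, rewards[actions]


def loss_grad(theta, actions, returns):
    """REINFORCE estimate of grad_theta L(f_theta), where L = -E[return].

    For a softmax policy, grad log pi(a) = one_hot(a) - pi.
    """
    probs = softmax(theta)
    grad = np.zeros_like(theta)
    for a, ret in zip(actions, returns):
        grad_log_pi = -probs.copy()
        grad_log_pi[a] += 1.0
        grad -= ret * grad_log_pi  # minus sign: minimizing the negative return
    return grad / len(actions)


def maml_rl(n_arms=4, n_iters=200, n_tasks=4, k=20, alpha=0.5, beta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.01, size=n_arms)  # randomly initialize theta
    for _ in range(n_iters):
        meta_grad = np.zeros_like(theta)
        for _ in range(n_tasks):  # batch of tasks T_i ~ p(T)
            rewards = sample_task(rng, n_arms)
            # Training dataset D_i: k trajectories sampled with f_theta
            actions, returns = sample_trajectories(theta, rewards, k, rng)
            # Inner step: theta'_i = theta - alpha * grad L_{T_i}(f_theta)
            theta_i = theta - alpha * loss_grad(theta, actions, returns)
            # Test dataset D'_i: k trajectories sampled with f_{theta'_i}
            actions, returns = sample_trajectories(theta_i, rewards, k, rng)
            # First-order approximation: gradient of L_{T_i}(f_{theta'_i})
            # taken at theta'_i, ignoring the dependence of theta'_i on theta
            meta_grad += loss_grad(theta_i, actions, returns)
        theta = theta - beta * meta_grad  # meta update with the summed gradient
    return theta
```

For example, `softmax(maml_rl())` gives the meta-learned initial policy. With this hypothetical task distribution it should shift probability toward arms 0 and 1, the only arms that ever pay a reward, so that a single inner gradient step can adapt to either task. Replacing the first-order step with the exact gradient through $\theta'_i$ would require an automatic-differentiation library rather than plain NumPy.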