DAgger
The algorithm for DAgger is given as follows:
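- Initialize an empty dataset $D \leftarrow \emptyset$.
- Initialize a policy $\hat{\pi}_1$.
- For iterations i = 1 to N:
  - Create a policy $\pi_i = \beta_i \pi^* + (1 - \beta_i) \hat{\pi}_i$, a mixture of the expert policy $\pi^*$ and the current learned policy $\hat{\pi}_i$ with mixing coefficient $\beta_i \in [0, 1]$.
  - Generate a trajectory using the policy $\pi_i$.
  - Create a dataset $D_i$ by collecting the states visited by the policy $\pi_i$ and the actions for those states provided by the expert. Thus, $D_i = \{(s, \pi^*(s))\}$.
  - Aggregate the dataset as $D \leftarrow D \cup D_i$.
  - Train a classifier on the updated dataset $D$ and extract a new policy $\hat{\pi}_{i+1}$.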
- Return the best of the trained policies, as measured on validation data.
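
The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the 1-D integer environment, the `expert_policy` that steps toward the origin, the 1-nearest-neighbour "classifier", and the `beta_schedule` default are all hypothetical choices made only to keep the example self-contained.

```python
import random


def expert_policy(state):
    """Hypothetical expert: always step toward the origin."""
    return -1 if state > 0 else 1


def train_classifier(dataset):
    """Fit a 1-nearest-neighbour classifier on (state, action) pairs.

    Returns a policy function mapping a state to the action of the
    closest labelled state. Stands in for any supervised learner.
    """
    data = list(dataset)

    def policy(state):
        _, action = min(data, key=lambda pair: abs(pair[0] - state))
        return action

    return policy


def rollout(policy, start_state, horizon):
    """Run `policy` for `horizon` steps and return the visited states."""
    states, state = [], start_state
    for _ in range(horizon):
        states.append(state)
        state += policy(state)
    return states


def dagger(n_iterations=5, horizon=20,
           beta_schedule=lambda i: 0.5 ** (i - 1), seed=0):
    rng = random.Random(seed)
    dataset = []                                   # aggregated dataset, initially empty
    learned = lambda state: rng.choice([-1, 1])    # arbitrary initial policy

    for i in range(1, n_iterations + 1):
        beta = beta_schedule(i)                    # beta = 1 on the first iteration

        # Mixture policy: at each step, follow the expert with probability
        # beta, otherwise follow the current learned policy.
        def mixed(state, learned=learned, beta=beta):
            if rng.random() < beta:
                return expert_policy(state)
            return learned(state)

        # Visit states with the mixture, but label them with expert actions.
        states = rollout(mixed, rng.randint(-10, 10), horizon)
        new_data = [(s, expert_policy(s)) for s in states]

        dataset.extend(new_data)                   # aggregate
        learned = train_classifier(dataset)        # retrain on all data so far

    return learned, dataset


if __name__ == "__main__":
    policy, data = dagger()
    print(len(data), policy(5), policy(-5))
```

For simplicity, this sketch returns the most recently trained policy rather than comparing the policies from every iteration on validation data. The key point it illustrates is that the states come from the learner's own (mixed) rollouts while the labels always come from the expert.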