MaxEnt Inverse Reinforcement Learning
The algorithm for maximum entropy inverse reinforcement learning is given as follows:
- Initialize the parameter $\theta$ and gather the expert demonstrations $D$
- For $N$ iterations:
- Compute the reward function $R_\theta(\tau) = \theta^T f_\tau$, where $f_\tau$ is the feature vector of the trajectory $\tau$
- Compute the policy using value iteration with the reward function obtained in the previous step
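- Compute the state visitation frequency $p(s \mid \theta)$ using the policy obtained in the previous step
- Compute the gradient with respect to $\theta$, that is, $\nabla_\theta L(\theta) = \tilde{f} - \sum_s p(s \mid \theta) f_s$, where $\tilde{f}$ is the feature expectation of the expert demonstrations and $f_s$ is the feature vector of state $s$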
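- Update the value of $\theta$ as $\theta = \theta + \alpha \nabla_\theta L(\theta)$, where $\alpha$ is the learning rate and $\nabla_\theta L(\theta)$ is the gradient computed in the previous step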
- Compute the reward function
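
The steps above can be sketched in Python for a small tabular problem. This is a minimal illustration under several assumptions that are not part of the text: a known transition tensor `P` of shape (states, actions, states), one-hot state features, a finite horizon equal to the demonstration length, and a soft (log-sum-exp) variant of value iteration so that the resulting policy is stochastic. The function names `value_iteration`, `state_visitation`, `maxent_irl`, and `chain_mdp`, as well as the toy demonstrations, are hypothetical.

```python
import numpy as np

def value_iteration(P, reward, gamma=0.9, n_iters=100):
    """Soft value iteration (assumed variant); returns a policy of shape (S, A)."""
    V = np.zeros(P.shape[0])
    for _ in range(n_iters):
        Q = reward[:, None] + gamma * (P @ V)            # action values, (S, A)
        Qmax = Q.max(axis=1, keepdims=True)
        # Numerically stable log-sum-exp over actions
        V = (Qmax + np.log(np.exp(Q - Qmax).sum(axis=1, keepdims=True))).ravel()
    policy = np.exp(Q - V[:, None])
    return policy / policy.sum(axis=1, keepdims=True)

def state_visitation(P, policy, start_dist, horizon):
    """Expected number of visits to each state over `horizon` time steps."""
    d = start_dist.copy()
    total = d.copy()
    for _ in range(horizon - 1):
        # Propagate the state distribution one step under the policy
        d = np.einsum("s,sa,sat->t", d, policy, P)
        total += d
    return total

def maxent_irl(P, features, demos, n_iters=200, lr=0.1, gamma=0.9):
    n_states = P.shape[0]
    horizon = len(demos[0])
    # Initialize the parameter theta
    theta = np.zeros(features.shape[1])
    # Feature expectation of the expert demonstrations
    f_expert = np.mean([features[traj].sum(axis=0) for traj in demos], axis=0)
    # Empirical start-state distribution taken from the demonstrations
    starts = [traj[0] for traj in demos]
    start_dist = np.bincount(starts, minlength=n_states) / len(demos)
    for _ in range(n_iters):
        reward = features @ theta                         # reward per state
        policy = value_iteration(P, reward, gamma)        # policy for this reward
        svf = state_visitation(P, policy, start_dist, horizon)
        grad = f_expert - features.T @ svf                # gradient w.r.t. theta
        theta = theta + lr * grad                         # gradient ascent step
    return features @ theta

def chain_mdp(n_states=5):
    """Deterministic chain: action 0 moves left, action 1 moves right."""
    P = np.zeros((n_states, 2, n_states))
    for s in range(n_states):
        P[s, 0, max(s - 1, 0)] = 1.0
        P[s, 1, min(s + 1, n_states - 1)] = 1.0
    return P

P = chain_mdp()
features = np.eye(5)                                      # one-hot state features
demos = [[0, 1, 2, 3, 4, 4, 4, 4]] * 3                    # hypothetical expert trajectories
reward = maxent_irl(P, features, demos)
print(np.round(reward, 2))
```

On this toy chain the expert walks to the last state and stays there, so the recovered reward should be highest at that state. For larger problems the one-hot features would normally be replaced by a lower-dimensional feature map, and the learning rate and iteration counts here are arbitrary choices.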