References
- OpenAI. (2018). Spinning Up. URL: https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html
- Williams, R. (1992). Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 8, 229-256. URL: https://link.springer.com/article/10.1007/BF00992696
- Sutton, R. et al. (1999). Policy Gradient Methods for Reinforcement Learning with Function Approximation. NIPS. URL: https://bit.ly/3lOMFs7
- Silver, D. et al. (2014). Deterministic Policy Gradient Algorithms. ICML, Proceedings of Machine Learning Research, 32. URL: http://proceedings.mlr.press/v32/silver14.pdf
- Mnih, V. et al. (2016). Asynchronous Methods for Deep Reinforcement Learning. arXiv. URL: http://arxiv.org/abs/1602.01783
- Gu, S. et al. (2016). Continuous Deep Q-Learning with Model-Based Acceleration. arXiv. URL: http://arxiv.org/abs/1603.00748
- Schulman, J. et al. (2017). Trust Region Policy Optimization. arXiv. URL: http://arxiv.org/abs/1502...