Further Reading
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al. (2018): https://arxiv.org/abs/1810.04805
- RoBERTa: A Robustly Optimized BERT Pretraining Approach, Liu et al. (2019): https://arxiv.org/abs/1907.11692