References
- Hugging Face Reformer: https://huggingface.co/transformers/model_doc/reformer.html?highlight=reformer
- Hugging Face DeBERTa: https://huggingface.co/transformers/model_doc/deberta.html
- Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, 2020, DeBERTa: Decoding-enhanced BERT with Disentangled Attention: https://arxiv.org/abs/2006.03654
- Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby, 2020, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929
- OpenAI: https://openai.com/
- William Fedus, Barret Zoph, Noam Shazeer, 2021, Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity: https://arxiv.org/abs/2101.03961
- Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal...