References
- Peters, M.E., Ammar, W., Bhagavatula, C., & Power, R. (2017). Semi-supervised sequence tagging with bidirectional language models. arXiv. https://arxiv.org/abs/1705.00108
- Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv. https://arxiv.org/abs/1802.05365
- Howard, J., & Ruder, S. (2018). Universal Language Model Fine-tuning for Text Classification. arXiv. https://arxiv.org/abs/1801.06146
- Cheng, J., Dong, L., & Lapata, M. (2016). Long Short-Term Memory-Networks for Machine Reading. arXiv. https://arxiv.org/abs/1601.06733
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. arXiv. https://arxiv.org/abs/1706.03762
- Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv. https://arxiv.org/abs/1810.04805