Mastering Transformers

Product type: Book
Published: Sep 2021
Publisher: Packt
ISBN-13: 9781801077651
Pages: 374
Edition: 1st
Authors (2): Savaş Yıldırım, Meysam Asgari-Chenaghlu

Table of Contents (16 chapters)

Preface
1. Section 1: Introduction – Recent Developments in the Field, Installations, and Hello World Applications
2. Chapter 1: From Bag-of-Words to the Transformer
3. Chapter 2: A Hands-On Introduction to the Subject
4. Section 2: Transformer Models – From Autoencoding to Autoregressive Models
5. Chapter 3: Autoencoding Language Models
6. Chapter 4: Autoregressive and Other Language Models
7. Chapter 5: Fine-Tuning Language Models for Text Classification
8. Chapter 6: Fine-Tuning Language Models for Token Classification
9. Chapter 7: Text Representation
10. Section 3: Advanced Topics
11. Chapter 8: Working with Efficient Transformers
12. Chapter 9: Cross-Lingual and Multilingual Language Modeling
13. Chapter 10: Serving Transformer Models
14. Chapter 11: Attention Visualization and Experiment Tracking
15. Other Books You May Enjoy
