You're reading from  Mastering Transformers

Product type: Book
Published in: Sep 2021
Publisher: Packt
ISBN-13: 9781801077651
Edition: 1st Edition
Authors (2):
Savaş Yıldırım

Savaş Yıldırım graduated from the Istanbul Technical University Department of Computer Engineering and holds a Ph.D. degree in Natural Language Processing (NLP). Currently, he is an associate professor at the Istanbul Bilgi University, Turkey, and is a visiting researcher at the Ryerson University, Canada. He is a proactive lecturer and researcher with more than 20 years of experience teaching courses on machine learning, deep learning, and NLP. He has significantly contributed to the Turkish NLP community by developing a lot of open source software and resources. He also provides comprehensive consultancy to AI companies on their R&D projects. In his spare time, he writes and directs short films, and enjoys practicing yoga.

Meysam Asgari-Chenaghlu

Meysam Asgari-Chenaghlu is an AI manager at Carbon Consulting and is also a Ph.D. candidate at the University of Tabriz. He has been a consultant for Turkey's leading telecommunication and banking companies. He has also worked on various projects, including natural language understanding and semantic search.

Preface

We've seen big changes in Natural Language Processing (NLP) over the last 20 years. During this time, we have experienced different paradigms and finally entered a new era dominated by the magical transformer architecture. This deep learning architecture came about by inheriting many approaches, among them contextual word embeddings, multi-head self-attention, positional encoding, parallelizable architectures, model compression, transfer learning, and cross-lingual models. Building on various neural NLP approaches, the transformer architecture gradually evolved into an attention-based encoder-decoder architecture and continues to evolve to this day. New successful variants of this architecture keep appearing in the literature, including great models that use only its encoder part, such as BERT, or only its decoder part, such as GPT.

Throughout the book, we will touch on these NLP approaches and will be able to work with transformer models easily thanks to the Transformers library from the Hugging Face community. We will provide the solutions step by step to a wide variety of NLP problems, ranging from summarization to question-answering. We will see that we can achieve state-of-the-art results with the help of transformers.

Who this book is for

This book is for deep learning researchers, hands-on NLP practitioners, and machine learning/NLP educators and students who want to start their journey with the transformer architecture. Beginner-level machine learning knowledge and a good command of Python will help you get the most out of this book.

What this book covers

Chapter 1, From Bag-of-Words to the Transformers, provides a brief introduction to the history of NLP, providing a comparison between traditional methods, deep learning models such as CNNs, RNNs, and LSTMs, and transformer models.

Chapter 2, A Hands-On Introduction to the Subject, takes a deeper look at how a transformer model can be used. Tokenizers and models such as BERT will be described with hands-on examples.

Chapter 3, Autoencoding Language Models, is where you will gain knowledge about how to train autoencoding language models on any given language from scratch. This training will include pretraining and the task-specific training of models.

Chapter 4, Autoregressive and Other Language Models, explores the theoretical details of autoregressive language models and teaches you how to pretrain them on your own corpus. You will learn how to pretrain a language model such as GPT-2 on your own text and use it in various tasks such as language generation.

Chapter 5, Fine-Tuning Language Models for Text Classification, is where you will learn how to configure a pre-trained model for text classification and how to fine-tune it for any text classification downstream task, such as sentiment analysis or multi-class classification.

Chapter 6, Fine-Tuning Language Models for Token Classification, teaches you how to fine-tune language models for token classification tasks such as NER, POS tagging, and question-answering.

Chapter 7, Text Representation, is where you will learn about text representation techniques and how to efficiently utilize the transformer architecture, especially for unsupervised tasks such as clustering, semantic search, and topic modeling.

Chapter 8, Working with Efficient Transformers, shows you how to make efficient models out of trained models by using distillation, pruning, and quantization. Then, you will gain knowledge about efficient sparse transformers, such as Linformer and BigBird, and how to work with them.

Chapter 9, Cross-Lingual and Multilingual Language Modeling, is where you will learn about multilingual and cross-lingual language model pretraining and the difference between monolingual and multilingual pretraining. Causal language modeling and translation language modeling are the other topics covered in the chapter.

Chapter 10, Serving Transformer Models, details how to serve transformer-based NLP solutions in environments where CPU/GPU is available. It also describes using TensorFlow Extended (TFX) for machine learning deployment.

Chapter 11, Attention Visualization and Experiment Tracking, will cover two different technical concepts: attention visualization and experiment tracking. We will practice them using sophisticated tools such as exBERT and BertViz.

To get the most out of this book

To follow this book, you need to have a basic knowledge of the Python programming language. It is also required that you know the basics of NLP, deep learning, and how deep neural networks work.

Important note

All the code in this book was executed with Python 3.6, since some of the libraries were still in development for Python 3.9.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Mastering-Transformers. If there's an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Code in Action

The Code in Action videos for this book can be viewed at https://bit.ly/3i4vFzJ.

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781801077651_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Sequences that are shorter than max_sen_len (maximum sentence length) are padded with a PAD value until they are max_sen_len in length."

A block of code is set as follows:

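# `sentences` is assumed to be a list of whitespace-separated strings
max_sen_len = max(len(s.split()) for s in sentences)
# Put the padding token first so that it gets index 0
words = ["PAD"] + list(set(w for s in sentences for w in s.split()))
word2idx = {w: i for i, w in enumerate(words)}
max_words = max(word2idx.values()) + 1
idx2word = {i: w for i, w in enumerate(words)}
# Encode each sentence as a list of word indices (not yet padded)
train = [[word2idx[w] for w in s.split()] for s in sentences]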
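
The padding mentioned in the example sentence above is not performed by this block itself. A minimal sketch of that step might look as follows; the sample sentences and the pad_sequence helper are assumptions made for illustration and are not taken from the book's code.

```python
# Hypothetical sample data; the book's own `sentences` list is not shown here.
sentences = [
    "transformers are powerful",
    "attention is all you need",
    "hello world",
]

max_sen_len = max(len(s.split()) for s in sentences)

# Sorting keeps the vocabulary order deterministic; index 0 is reserved for PAD.
words = ["PAD"] + sorted({w for s in sentences for w in s.split()})
word2idx = {w: i for i, w in enumerate(words)}


def pad_sequence(ids, length, pad_id=0):
    """Right-pad a list of word indices with pad_id up to the given length."""
    return ids + [pad_id] * (length - len(ids))


# Encode each sentence, then pad every encoded sentence to max_sen_len.
train = [[word2idx[w] for w in s.split()] for s in sentences]
padded = [pad_sequence(ids, max_sen_len, word2idx["PAD"]) for ids in train]

for row in padded:
    print(row)
```

Every row of padded has the same length, max_sen_len, which is what allows the sentences to be stacked into a single rectangular batch. Padding on the right is only one possible choice here.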

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

$ conda activate transformers
$ conda install -c conda-forge tensorflow

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "We must now take care of the computational cost of a particular model for a given environment (Random Access Memory (RAM), CPU, and GPU) in terms of memory usage and speed."

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you've read Mastering Transformers, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
