Mastering Transformers

Chapter 11: Attention Visualization and Experiment Tracking

In this chapter, we will cover two different technical concepts, attention visualization and experiment tracking, and we will practice them with sophisticated tools such as exBERT and BertViz. These tools provide important functions for interpretability and explainability. First, we will discuss how to visualize the inner workings of attention using these tools. It is important to interpret the learned representations and to understand the information encoded by the self-attention heads of a Transformer. We will see that certain heads correspond to certain aspects of syntax or semantics. Second, we will learn how to track experiments by logging them and then monitoring them with TensorBoard and Weights & Biases (W&B). These tools let us efficiently host and track experimental results, such as loss and other metrics, which helps us optimize model training. You will learn how to use exBERT and BertViz to see the...

Technical requirements

The code for this chapter can be found at https://github.com/PacktPublishing/Mastering-Transformers/tree/main/CH11, which is the GitHub repository for this book. We will be using Jupyter notebooks to run our coding exercises, which require Python 3.6.0 or above, and the following packages need to be installed (a minimal installation cell is sketched after the list):

  • tensorflow
  • pytorch
  • transformers >= 4.0.0
  • tensorboard
  • wandb
  • bertviz
  • ipywidgets
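
A minimal installation cell for a Jupyter notebook is sketched below. It simply installs the packages listed above in one step; the exact versions and environment management are left to the reader:

    # Run once in a notebook cell; adjust versions to your environment.
    # Note: the pytorch requirement is installed via the pip package name "torch".
    !pip install tensorflow torch "transformers>=4.0.0" tensorboard wandb bertviz ipywidgets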

Check out the following link to see the Code in Action video:

https://bit.ly/3iM4Y1F

Interpreting attention heads

As with most Deep Learning (DL) architectures, neither the success of Transformer models nor how they learn has been fully understood, but we do know that Transformers remarkably learn many linguistic features of language. A significant amount of the learned linguistic knowledge is distributed both in the hidden states and in the self-attention heads of the pre-trained model. Substantial recent studies have been published and many tools developed to understand and better explain these phenomena.

Thanks to tools from the Natural Language Processing (NLP) community, we are able to interpret the information learned by the self-attention heads of a Transformer model. The heads lend themselves naturally to interpretation because each head produces attention weights between tokens. In the experiments later in this section, we will see that certain heads correspond to particular aspects of syntax or semantics. We can also observe surface-level patterns and many other linguistic...
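
To make this concrete, here is a minimal sketch of inspecting attention heads with BertViz's head view in a notebook. The model name and example sentence are illustrative choices, not taken from the book's own notebook:

    from transformers import AutoTokenizer, AutoModel
    from bertviz import head_view

    # Load a pre-trained model and ask it to return its attention weights.
    model_name = "bert-base-uncased"  # illustrative choice
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_attentions=True)

    sentence = "The cat sat on the mat because it was tired."
    inputs = tokenizer.encode(sentence, return_tensors="pt")
    outputs = model(inputs)

    attention = outputs.attentions                       # one attention tensor per layer
    tokens = tokenizer.convert_ids_to_tokens(inputs[0])

    # Renders an interactive view of every layer and head inside the notebook.
    head_view(attention, tokens)

The same attention and tokens objects can also be passed to BertViz's model_view function to get a bird's-eye view of all layers and heads at once.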

Tracking model metrics

So far, we have trained language models and simply analyzed the final results; we have not observed the training process or compared training runs with different options. In this section, we will briefly discuss how to monitor model training. To do so, we will look at how to track the training of the models we developed earlier, in Chapter 5, Fine-Tuning Language Models for Text Classification.

Two important tools have been developed in this area: one is TensorBoard and the other is W&B. With the former, we save the training results to a local drive and visualize them at the end of the experiment. With the latter, we can monitor the model-training progress live on a cloud platform.
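
Both tools integrate with the Hugging Face Trainer through its TrainingArguments. The following is a minimal sketch of such a setup, assuming a text classification task in the spirit of Chapter 5; the model, dataset, directory names, and run name are illustrative choices rather than the book's exact configuration, and the datasets library is assumed to be installed as well:

    from datasets import load_dataset
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    # Illustrative model and data; Chapter 5 covers the full fine-tuning recipe.
    model_name = "distilbert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    imdb = load_dataset("imdb")
    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
    train_dataset = imdb["train"].shuffle(seed=42).select(range(2000)).map(tokenize, batched=True)
    eval_dataset = imdb["test"].shuffle(seed=42).select(range(500)).map(tokenize, batched=True)

    training_args = TrainingArguments(
        output_dir="./results",              # checkpoints are written here
        logging_dir="./logs",                # TensorBoard event files are written here
        logging_steps=50,                    # log the loss every 50 steps
        evaluation_strategy="steps",         # also log evaluation metrics periodically
        report_to=["tensorboard", "wandb"],  # enable both integrations
        run_name="text-classification-demo", # shown as the run name in W&B
    )

    trainer = Trainer(model=model, args=training_args,
                      train_dataset=train_dataset, eval_dataset=eval_dataset)
    trainer.train()

Once training starts, running tensorboard --logdir ./logs shows the curves locally, and the same run appears in the W&B web interface after wandb login has been executed.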

This section provides only a short introduction to these tools; covering them in detail is beyond the scope of this chapter.

Let's start with TensorBoard.

Tracking model training with TensorBoard

TensorBoard is a visualization...

Summary

In this chapter, we introduced two different technical concepts: attention visualization and experiment tracking. We first visualized attention heads with the exBERT online interface. Then we studied BertViz, writing Python code to produce its three visualizations: head view, model view, and neuron view. The BertViz interface gave us more control, so that we could work with different language models and observe how attention weights between tokens are computed. These tools provide us with important functions for interpretability and explainability. We also learned how to track our experiments in order to obtain higher-quality models and perform error analysis. We used two tools to monitor training, TensorBoard and W&B, which let us effectively track experiments and optimize model training.

Congratulations! You've finished reading this book, demonstrating great perseverance and persistence throughout this journey. You can...

References

  • Hoover, B., Strobelt, H. and Gehrmann, S., 2019. exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models.
  • Vig, J., 2019. A multiscale visualization of attention in the Transformer model. arXiv preprint arXiv:1906.05714.
  • Clark, K., Khandelwal, U., Levy, O. and Manning, C.D., 2019. What does BERT look at? An analysis of BERT's attention. arXiv preprint arXiv:1906.04341.
  • Biewald, L., 2020. Experiment tracking with Weights & Biases. Software available from wandb.com.
  • Rogers, A., Kovaleva, O. and Rumshisky, A., 2020. A primer in BERTology: What we know about how BERT works. Transactions of the Association for Computational Linguistics, 8, pp.842-866.
  • W&B: https://wandb.ai
  • TensorBoard: https://www.tensorflow.org/tensorboard
  • exBERT on Hugging Face: https://huggingface.co/exbert
  • exBERT: https://exbert.net/
