Mastering Transformers

Chapter 11: Attention Visualization and Experiment Tracking

In this chapter, we will cover two different technical concepts, attention visualization and experiment tracking, and we will practice them with sophisticated tools such as exBERT and BertViz. These tools provide important functions for interpretability and explainability. First, we will discuss how to visualize the inner workings of attention using these tools. It is important to interpret the learned representations and to understand the information encoded by the self-attention heads of a Transformer. We will see that certain heads correspond to certain aspects of syntax or semantics. Second, we will learn how to track experiments by logging them and then monitoring them with TensorBoard and Weights & Biases (W&B). These tools let us efficiently host and track experimental results, such as loss and other metrics, which helps us optimize model training. You will learn how to use exBERT and BertViz to see the...

Technical requirements

The code for this chapter can be found at https://github.com/PacktPublishing/Mastering-Transformers/tree/main/CH11, which is the GitHub repository for this book. We will be using Jupyter notebooks to run our coding exercises, which require Python 3.6.0 or above, and the following packages need to be installed (a minimal installation cell is sketched after the list):

  • tensorflow
  • pytorch
  • transformers >= 4.0.0
  • tensorboard
  • wandb
  • bertviz
  • ipywidgets
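
A minimal installation cell for a Jupyter notebook is sketched below. It simply installs the packages listed above in one step; the exact versions and environment management are left to the reader:

    # Run once in a notebook cell; adjust versions to your environment.
    # Note: the pytorch requirement is installed via the pip package name "torch".
    !pip install tensorflow torch "transformers>=4.0.0" tensorboard wandb bertviz ipywidgets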

Check out the following link to see the Code in Action video:

https://bit.ly/3iM4Y1F

Interpreting attention heads

As with most Deep Learning (DL) architectures, neither the success of Transformer models nor how they learn has been fully understood, but we do know that Transformers remarkably learn many linguistic features of language. A significant amount of the learned linguistic knowledge is distributed both in the hidden states and in the self-attention heads of the pre-trained model. Substantial recent studies have been published and many tools developed to understand and better explain these phenomena.

Thanks to tools from the Natural Language Processing (NLP) community, we are able to interpret the information learned by the self-attention heads of a Transformer model. The heads lend themselves naturally to interpretation because each head produces attention weights between tokens. In the experiments later in this section, we will see that certain heads correspond to particular aspects of syntax or semantics. We can also observe surface-level patterns and many other linguistic...
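
To make this concrete, here is a minimal sketch of inspecting attention heads with BertViz's head view in a notebook. The model name and example sentence are illustrative choices, not taken from the book's own notebook:

    from transformers import AutoTokenizer, AutoModel
    from bertviz import head_view

    # Load a pre-trained model and ask it to return its attention weights.
    model_name = "bert-base-uncased"  # illustrative choice
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_attentions=True)

    sentence = "The cat sat on the mat because it was tired."
    inputs = tokenizer.encode(sentence, return_tensors="pt")
    outputs = model(inputs)

    attention = outputs.attentions                       # one attention tensor per layer
    tokens = tokenizer.convert_ids_to_tokens(inputs[0])

    # Renders an interactive view of every layer and head inside the notebook.
    head_view(attention, tokens)

The same attention and tokens objects can also be passed to BertViz's model_view function to get a bird's-eye view of all layers and heads at once.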

Tracking model metrics

So far, we have trained language models and simply analyzed the final results; we have not observed the training process or compared training runs with different options. In this section, we will briefly discuss how to monitor model training. To do so, we will look at how to track the training of the models we developed earlier, in Chapter 5, Fine-Tuning Language Models for Text Classification.

Two important tools have been developed in this area: one is TensorBoard and the other is W&B. With the former, we save the training results to a local drive and visualize them at the end of the experiment. With the latter, we can monitor the model-training progress live on a cloud platform.
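
Both tools integrate with the Hugging Face Trainer through its TrainingArguments. The following is a minimal sketch of such a setup, assuming a text classification task in the spirit of Chapter 5; the model, dataset, directory names, and run name are illustrative choices rather than the book's exact configuration, and the datasets library is assumed to be installed as well:

    from datasets import load_dataset
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    # Illustrative model and data; Chapter 5 covers the full fine-tuning recipe.
    model_name = "distilbert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    imdb = load_dataset("imdb")
    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
    train_dataset = imdb["train"].shuffle(seed=42).select(range(2000)).map(tokenize, batched=True)
    eval_dataset = imdb["test"].shuffle(seed=42).select(range(500)).map(tokenize, batched=True)

    training_args = TrainingArguments(
        output_dir="./results",              # checkpoints are written here
        logging_dir="./logs",                # TensorBoard event files are written here
        logging_steps=50,                    # log the loss every 50 steps
        evaluation_strategy="steps",         # also log evaluation metrics periodically
        report_to=["tensorboard", "wandb"],  # enable both integrations
        run_name="text-classification-demo", # shown as the run name in W&B
    )

    trainer = Trainer(model=model, args=training_args,
                      train_dataset=train_dataset, eval_dataset=eval_dataset)
    trainer.train()

Once training starts, running tensorboard --logdir ./logs shows the curves locally, and the same run appears in the W&B web interface after wandb login has been executed.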

This section provides only a short introduction to these tools; covering them in detail is beyond the scope of this chapter.

Let's start with TensorBoard.

Tracking model training with TensorBoard

TensorBoard is a visualization...

Summary

In this chapter, we introduced two different technical concepts: attention visualization and experiment tracking. We first visualized attention heads with the exBERT online interface. Then we studied BertViz, writing Python code to produce its three visualizations: head view, model view, and neuron view. The BertViz interface gave us more control, so that we could work with different language models and observe how attention weights between tokens are computed. These tools provide us with important functions for interpretability and explainability. We also learned how to track our experiments in order to obtain higher-quality models and perform error analysis. We used two tools to monitor training, TensorBoard and W&B, which let us effectively track experiments and optimize model training.

Congratulations! You've finished reading this book, demonstrating great perseverance and persistence throughout this journey. You can...

References

  • Hoover, B., Strobelt, H. and Gehrmann, S., 2019. exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models.
  • Vig, J., 2019. A multiscale visualization of attention in the Transformer model. arXiv preprint arXiv:1906.05714.
  • Clark, K., Khandelwal, U., Levy, O. and Manning, C.D., 2019. What does BERT look at? An analysis of BERT's attention. arXiv preprint arXiv:1906.04341.
  • Biewald, L., 2020. Experiment tracking with Weights & Biases. Software available from wandb.com.
  • Rogers, A., Kovaleva, O. and Rumshisky, A., 2020. A primer in BERTology: What we know about how BERT works. Transactions of the Association for Computational Linguistics, 8, pp.842-866.
  • W&B: https://wandb.ai
  • TensorBoard: https://www.tensorflow.org/tensorboard
  • exBERT on Hugging Face: https://huggingface.co/exbert
  • exBERT: https://exbert.net/
