
You're reading from Transformers for Natural Language Processing and Computer Vision - Third Edition

Product type: Book
Published in: Feb 2024
Publisher: Packt
ISBN-13: 9781805128724
Edition: 3rd Edition
Author: Denis Rothman

Denis Rothman graduated from Sorbonne University and Paris-Diderot University, designing one of the very first word2matrix patented embedding and patented AI conversational agents. He began his career authoring one of the first AI cognitive Natural Language Processing (NLP) chatbots applied as an automated language teacher for Moet et Chandon and other companies. He authored an AI resource optimizer for IBM and apparel producers. He then authored an Advanced Planning and Scheduling (APS) solution used worldwide.

Transformer visualization with BertViz

Jesse Vig's article, A Multiscale Visualization of Attention in the Transformer Model, 2019, recognizes the effectiveness of transformer models. However, Jesse Vig explains that deciphering the attention mechanism is challenging. The paper describes the process of BertViz, a visualization tool. BertViz can visualize attention head activity and interpret a transformer model's behavior.

BertViz was first designed to visualize BERT and GPT models. In this section, we will visualize the activity of a BERT model.

Some tools use the term "interpretable," stressing the "why" of an output. Others use the term "explainable" to describe "how" an output is reached. Finally, some don't apply the nuance and use the terms loosely, because "why" can sometimes mean "how" to explain why! We will use the terms loosely, as the tools in this chapter most often do.

Let...
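BertViz renders the weights that each attention head assigns between tokens. As a minimal sketch of what those weights are (not BertViz's own code), here is the scaled dot-product attention of a single head in pure Python; the tokens and query/key vectors are invented for illustration, not real BERT parameters:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """Attention weights of one head: softmax(QK^T / sqrt(d)), row per token."""
    d = len(queries[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights.append(softmax(scores))
    return weights

# Toy 2-dimensional query/key vectors for three tokens.
tokens = ["the", "cat", "sat"]
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

for tok, row in zip(tokens, attention_weights(Q, K)):
    print(tok, [round(w, 2) for w in row])
```

Each row is one token's attention distribution over all tokens; these are exactly the values BertViz draws as lines of varying thickness between words.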

Interpreting Hugging Face transformers with SHAP

In this section, we will interpret Hugging Face transformers with SHAP. The Hugging Face platform provides an interface to an impressive list of transformer models. The section is divided into two parts:

  • Introducing SHAP
  • Explaining Hugging Face outputs with SHAP

Introducing SHAP

In Game Theory, a Shapley value expresses the distribution of the total value of a game among "players" through their marginal contributions. In a sentence, the words are the "players." Each word has a score, and the total score is the value of the game. The value of each word is calculated over all the permutations of the sentence. The goal is to see how each word changes the meaning of a sentence.

For example, there are seven words in the following sentence:

"I love playing chess with my friends"

The total number of permutations is 7! = 7x6x5x4x3x2x1 = 5,040. The immediate conclusion is that SHAP will be challenging for a long text. However...

Transformer visualization via dictionary learning

Transformer visualization via dictionary learning is based on transformer factors. The goal is to analyze words in their context.

Transformer factors

A transformer factor is an embedding vector that contains contextualized words. A word without context can have many meanings, creating a polysemy issue. For example, the word separate can be a verb or an adjective. Furthermore, separate can mean disconnect, discriminate, scatter, and many other things. Yun et al. (2021) thus created an embedding vector with contextualized words. A word embedding vector can be constructed with sparse linear representations of transformer factors. For example, depending on the context of the sentences in a dataset, separate can be represented as:

separate = 0.3 "keep apart" + 0.3 "distinct" + 0.1 "discriminate" + 0.1 "sever" + 0.1 "disperse" + 0.1 "scatter"...
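The linear superposition above can be made concrete with a short sketch. The factor names, vectors, and coefficients below are invented for illustration (real transformer factors are learned by dictionary learning over a model's hidden states), but the reconstruction step is the same weighted sum:

```python
# Hypothetical transformer factors: each factor is a direction (here 3-d)
# associated with one contextual sense of "separate".
factors = {
    "keep apart":   [1.0, 0.0, 0.0],
    "distinct":     [0.0, 1.0, 0.0],
    "discriminate": [0.0, 0.0, 1.0],
}

# Sparse coefficients, as in the equation above (most factors get weight 0).
alphas = {"keep apart": 0.3, "distinct": 0.3, "discriminate": 0.1}

def superpose(alphas, factors):
    """Reconstruct a contextualized embedding as a weighted sum of factors."""
    dim = len(next(iter(factors.values())))
    vec = [0.0] * dim
    for name, a in alphas.items():
        for i, x in enumerate(factors[name]):
            vec[i] += a * x
    return vec

print(superpose(alphas, factors))  # [0.3, 0.3, 0.1]
```

Because only a few coefficients are nonzero, the representation is sparse: the active factors name which senses of the word the context has selected.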

Other Interpretable AI Tools

There are many other methods and tools to interpret transformer models. We will briefly examine two efficient tools: LIT and OpenAI’s GPT-4 explainer. Let's now begin with the intuitive LIT tool.

LIT

LIT's visual interface will help you find examples that the model processes incorrectly, analyze similar examples, see how the model behaves when you change a context, and explore other language issues related to transformer models.

LIT does not display the activities of the attention heads as BertViz does. However, it's worth analyzing why things went wrong and trying to find solutions.

You can choose a Uniform Manifold Approximation and Projection (UMAP) visualization or a PCA projector representation. PCA makes linear projections along the directions of greatest variance. UMAP breaks its projections down into mini-clusters. Both approaches make sense depending on how far you want to go when analyzing the output...
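To see what a PCA projection does, here is a deliberately simplified sketch (not LIT's implementation, which projects high-dimensional embeddings): it finds the first principal component of 2-D points from their covariance matrix, i.e., the single direction of greatest variance onto which the points would be projected. The data points are invented:

```python
import math

def pca_first_component(points):
    """First principal component of 2-D points: the unit direction of
    greatest variance, from the 2x2 covariance matrix [[a, b], [b, c]]."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / n
    c = sum((p[1] - my) ** 2 for p in points) / n
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Leading eigenvalue of the symmetric 2x2 covariance matrix...
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # ...and its eigenvector (b, lam - a), normalized to unit length.
    vx, vy = (b, lam - a) if abs(b) > 1e-12 else ((1.0, 0.0) if a >= c else (0.0, 1.0))
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Hypothetical 2-D "embedding" points lying roughly along y = x.
pts = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.05)]
print(pca_first_component(pts))  # close to (0.707, 0.707), the y = x direction
```

UMAP, by contrast, is nonlinear: rather than one global direction, it preserves local neighborhoods, which is why its output tends toward mini-clusters.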

Summary

Transformer models are trained to resolve word-level polysemy disambiguation and low-level, mid-level, and high-level dependencies. The process is achieved by training million- to trillion-parameter models. The task of interpreting these giant models seems daunting. However, several tools are emerging.

We first installed BertViz. We learned how to interpret the computations of the attention heads with an interactive interface. We saw how words interacted with other words at each layer. We introduced ExBERT, another approach to visualizing BERT, among other models.

The chapter continued by defining SHAP and revealing the contribution of each word processed by Hugging Face transformers.

We then ran transformer visualization via dictionary learning with LIME. A user can choose a transformer factor to analyze and visualize the evolution of its representation from the lower layers to the higher layers of the transformer. The factor will progressively go from polysemy disambiguation...

Questions

  1. BertViz only shows the output of the last layer of the BERT model. (True/False)
  2. BertViz shows the attention heads of each layer of a BERT model. (True/False)
  3. BertViz shows how the tokens relate to each other. (True/False)
  4. LIT shows the inner workings of attention heads like BertViz. (True/False)
  5. Probing is a way for an algorithm to predict language representations. (True/False)
  6. NER is a probing task. (True/False)
  7. PCA and UMAP are non-probing tasks. (True/False)
  8. LIME is model agnostic. (True/False)
  9. Transformers deepen the relationships of the tokens layer by layer. (True/False)
  10. OpenAI Large Language Models (LLMs) can explain LLMs. (True/False)

References

  • BertViz: https://github.com/jessevig/BertViz
  • Zeyu Yun, Yubei Chen, Bruno A. Olshausen, Yann LeCun, 2021, Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors: https://arxiv.org/abs/2103.15949
  • Hugging Face with Slundberg SHAP: https://github.com/slundberg/SHAP
  • Transformer visualization via dictionary learning: https://transformervis.github.io/transformervis/
  • OpenAI, Language models can explain neurons in language models: https://openai.com/research/language-models-can-explain-neurons-in-language-models
  • OpenAI neuron explainer paper: https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html
  • LIT: https://pair-code.github.io/lit/
