Applying Transformers to Legal and Financial Documents for AI Text Summarization

We explored the architecture, training, fine-tuning, and usage of several transformer ecosystems in the first seven chapters. In Chapter 7, The Rise of Suprahuman Transformers with GPT-3 Engines, we discovered that OpenAI has begun to experiment with zero-shot models that require no fine-tuning and no development, and that can be implemented in a few lines of code.

The underlying concept of this evolution is that transformers strive to teach a machine to understand a language and to express itself in a human-like manner. Thus, we have gone from training a model to teaching languages to machines.

Raffel et al. (2019) designed a transformer meta-model based on a simple assertion: every NLP problem can be represented as a text-to-text function. Every type of NLP task requires some kind of text context that generates some form of text response.

A text-to-text representation of any NLP task provides...
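
To make this idea concrete, here is a minimal sketch in Python of the text-to-text convention; the prefix strings shown are the ones used by the public T5 checkpoints, and the example sentences are illustrative only, not taken from the book:

    # Text-to-text: every NLP task becomes "prefixed input text -> output text".
    # The prefixes below are those used by the public T5 checkpoints.
    task_inputs = {
        "translation":   "translate English to German: The house is wonderful.",
        "summarization": "summarize: The law regarding corporations prescribes that "
                         "a corporation can be incorporated in the state of Montana "
                         "to serve any lawful purpose.",
        "acceptability": "cola sentence: The course is jumping well.",
        "similarity":    "stsb sentence1: The cat sat on the mat. "
                         "sentence2: A cat was sitting on the mat.",
    }
    # One model, one training objective: map each input string to a target string.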

Designing a universal text-to-text model

Google’s NLP technical revolution started in 2017 with Vaswani et al.’s original Transformer. Attention Is All You Need toppled more than 30 years of artificial intelligence belief in RNNs and CNNs applied to NLP tasks. It took us from the stone age of NLP/NLU to the 21st century in a long-overdue evolution.

Chapter 7, The Rise of Suprahuman Transformers with GPT-3 Engines, summed up a second revolution that boiled up and erupted between Google’s Vaswani et al. (2017) original Transformer and OpenAI’s Brown et al. (2020) GPT-3 transformers. The original Transformer was focused on performance to prove that attention was all we needed for NLP/NLU tasks.

OpenAI’s second revolution, through GPT-3, focused on taking transformer models from fine-tuned pretrained models to few-shot trained models that required no fine-tuning. The second revolution was to show that a machine can learn a language and apply it to...

Text summarization with T5

NLP summarization tasks extract the succinct parts of a text. This section will start by presenting the Hugging Face resources we will use in this chapter. Then we will initialize a T5-large transformer model. Finally, we will see how to use T5 to summarize any document, including legal and corporate documents.
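
The steps described above can be previewed in the following minimal sketch, assuming the Hugging Face transformers library, PyTorch, and the public t5-large checkpoint; the generation settings (beam count, length limits) are illustrative choices, not values prescribed by the book:

    # Minimal sketch: load t5-large and summarize a document with the
    # "summarize:" task prefix. Assumes transformers and torch are installed.
    import torch
    from transformers import T5Tokenizer, T5ForConditionalGeneration

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = T5Tokenizer.from_pretrained("t5-large")
    model = T5ForConditionalGeneration.from_pretrained("t5-large").to(device)

    def summarize(text, max_length=120):
        # T5 expects the task to be stated as a plain-text prefix.
        inputs = tokenizer(
            "summarize: " + " ".join(text.split()),
            return_tensors="pt", truncation=True, max_length=512,
        ).to(device)
        summary_ids = model.generate(
            inputs["input_ids"],
            num_beams=4, min_length=30, max_length=max_length,
            no_repeat_ngram_size=3, early_stopping=True,
        )
        return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    # Any document can be passed in, for example the corporate sample used later:
    document = ("The law regarding corporations prescribes that a corporation "
                "can be incorporated in the state of Montana to serve any "
                "lawful purpose...")
    print(summarize(document))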

Let’s begin by introducing Hugging Face’s framework.

Hugging Face

Hugging Face designed a framework to implement Transformers at a higher level. We used Hugging Face to fine-tune a BERT model in Chapter 3, Fine-Tuning BERT Models, and train a RoBERTa model in Chapter 4, Pretraining a RoBERTa Model from Scratch.

To expand our knowledge, we needed to explore other approaches, such as Trax, in Chapter 6, Machine Translation with the Transformer, and OpenAI’s models, in Chapter 7, The Rise of Suprahuman Transformers with GPT-3 Engines. This chapter will use Hugging Face’s framework again and explain more about the...

Summarization with GPT-3

It was essential to understand the architecture of the T5 transformer. We will also see how GPT-3 engines behave on one of the texts. The goal is not to benchmark companies and models, but for an Industry 4.0 AI Guru to have a broad knowledge of NLP.

First, go to https://openai.com/, then sign up and sign in.

Then go to the examples page and select Summarize for a 2nd grader:


Figure 8.8: GPT-3 examples page

A window will open, and we can enter our prompt.

We submit the text T of the corporate sample from the previous section to the GPT-3 model; a sketch of how the full prompt can be assembled programmatically follows the list below.

The prompt is P = E + T + S:

  • E tells the model to make the explanation simple:

    My second grader asked me what this passage means:

  • The text T is the same as in the previous section and is in quotes:

    """The law regarding corporations prescribes that a corporation can be incorporated in the state of Montana to serve any lawful purpose...

Summary

In this chapter, we saw how the T5 transformer models standardized the input of the encoder and decoder stacks of the original Transformer. The original Transformer architecture has an identical structure for each block (or layer) of the encoder and decoder stacks. However, the original Transformer did not have a standardized input format for NLP tasks.

Raffel et al. (2019) designed a standard input for a wide range of NLP tasks by defining a text-to-text model. They added a prefix to an input sequence, indicating the NLP problem type to solve. This led to a standard text-to-text format. The Text-To-Text Transfer Transformer (T5) was born. We saw that this deceptively simple evolution made it possible to use the same model and hyperparameters for a wide range of NLP tasks. The invention of T5 takes the standardization process of transformer models a step further.

We then implemented a T5 model that could summarize any text. We tested the model on texts that were not...

Questions

  1. T5 models only have encoder stacks like BERT models. (True/False)
  2. T5 models have both encoder and decoder stacks. (True/False)
  3. T5 models use relative positional encoding, not absolute positional encoding. (True/False)
  4. Text-to-text models are only designed for summarization. (True/False)
  5. Text-to-text models apply a prefix to the input sequence that determines the NLP task. (True/False)
  6. T5 models require specific hyperparameters for each task. (True/False)
  7. One of the advantages of text-to-text models is that they use the same hyperparameters for all NLP tasks. (True/False)
  8. T5 transformers do not contain a feedforward network. (True/False)
  9. Hugging Face is a framework that makes transformers easier to implement. (True/False)
  10. OpenAI’s transformer engines are game changers. (True/False)

References
