Applying Transformers to Legal and Financial Documents for AI Text Summarization

We explored the architecture, training, fine-tuning, and usage of several transformer ecosystems in the first seven chapters. In Chapter 7, The Rise of Suprahuman Transformers with GPT-3 Engines, we discovered that OpenAI has begun to experiment with zero-shot models that require no fine-tuning and no development, and that can be implemented in a few lines of code.

The underlying concept of this evolution is that transformers strive to teach a machine to understand a language and to express itself in a human-like manner. Thus, we have gone from training a model to teaching languages to machines.

Raffel et al. (2019) designed a transformer meta-model based on a simple assertion: every NLP problem can be represented as a text-to-text function. Every type of NLP task requires some kind of text context that generates some form of text response.

A text-to-text representation of any NLP task provides...
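
To make this idea concrete, here is a minimal sketch in Python of the text-to-text convention; the prefix strings shown are the ones used by the public T5 checkpoints, and the example sentences are illustrative only, not taken from the book:

    # Text-to-text: every NLP task becomes "prefixed input text -> output text".
    # The prefixes below are those used by the public T5 checkpoints.
    task_inputs = {
        "translation":   "translate English to German: The house is wonderful.",
        "summarization": "summarize: The law regarding corporations prescribes that "
                         "a corporation can be incorporated in the state of Montana "
                         "to serve any lawful purpose.",
        "acceptability": "cola sentence: The course is jumping well.",
        "similarity":    "stsb sentence1: The cat sat on the mat. "
                         "sentence2: A cat was sitting on the mat.",
    }
    # One model, one training objective: map each input string to a target string.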

Designing a universal text-to-text model

Google’s NLP technical revolution started in 2017 with Vaswani et al.’s original Transformer. Attention Is All You Need toppled more than 30 years of artificial intelligence belief in RNNs and CNNs applied to NLP tasks. It took us from the stone age of NLP/NLU to the 21st century in a long-overdue evolution.

Chapter 7, The Rise of Suprahuman Transformers with GPT-3 Engines, summed up a second revolution that boiled up and erupted between Google’s Vaswani et al. (2017) original Transformer and OpenAI’s Brown et al. (2020) GPT-3 transformers. The original Transformer was focused on performance to prove that attention was all we needed for NLP/NLU tasks.

OpenAI’s second revolution, through GPT-3, focused on taking transformer models from fine-tuned pretrained models to few-shot trained models that required no fine-tuning. The second revolution was to show that a machine can learn a language and apply it to...

Text summarization with T5

NLP summarization tasks extract the succinct parts of a text. This section will start by presenting the Hugging Face resources we will use in this chapter. Then we will initialize a T5-large transformer model. Finally, we will see how to use T5 to summarize any document, including legal and corporate documents.
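
The steps described above can be previewed in the following minimal sketch, assuming the Hugging Face transformers library, PyTorch, and the public t5-large checkpoint; the generation settings (beam count, length limits) are illustrative choices, not values prescribed by the book:

    # Minimal sketch: load t5-large and summarize a document with the
    # "summarize:" task prefix. Assumes transformers and torch are installed.
    import torch
    from transformers import T5Tokenizer, T5ForConditionalGeneration

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = T5Tokenizer.from_pretrained("t5-large")
    model = T5ForConditionalGeneration.from_pretrained("t5-large").to(device)

    def summarize(text, max_length=120):
        # T5 expects the task to be stated as a plain-text prefix.
        inputs = tokenizer(
            "summarize: " + " ".join(text.split()),
            return_tensors="pt", truncation=True, max_length=512,
        ).to(device)
        summary_ids = model.generate(
            inputs["input_ids"],
            num_beams=4, min_length=30, max_length=max_length,
            no_repeat_ngram_size=3, early_stopping=True,
        )
        return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    # Any document can be passed in, for example the corporate sample used later:
    document = ("The law regarding corporations prescribes that a corporation "
                "can be incorporated in the state of Montana to serve any "
                "lawful purpose...")
    print(summarize(document))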

Let’s begin by introducing Hugging Face’s framework.

Hugging Face

Hugging Face designed a framework to implement Transformers at a higher level. We used Hugging Face to fine-tune a BERT model in Chapter 3, Fine-Tuning BERT Models, and train a RoBERTa model in Chapter 4, Pretraining a RoBERTa Model from Scratch.

To expand our knowledge, we needed to explore other approaches, such as Trax, in Chapter 6, Machine Translation with the Transformer, and OpenAI’s models, in Chapter 7, The Rise of Suprahuman Transformers with GPT-3 Engines. This chapter will use Hugging Face’s framework again and explain more about the...

Summarization with GPT-3

It was essential to understand the architecture of the T5 transformer. We will also see how GPT-3 engines behave on one of the texts. The goal is not to benchmark companies and models, but for an Industry 4.0 AI Guru to have a broad knowledge of NLP.

First, go to https://openai.com/, then sign up and sign in.

Then go to the examples page and select Summarize for a 2nd grader:


Figure 8.8: GPT-3 examples page

A window will open, and we can enter our prompt.

We submit the text T of the corporate sample from the previous section to the GPT-3 model; a sketch of how the full prompt can be assembled programmatically follows the list below.

The prompt is P = E + T + S:

  • E tells the model to make the explanation simple:

    My second grader asked me what this passage means:

  • The text T is the same as in the previous section and is in quotes:

    """The law regarding corporations prescribes that a corporation can be incorporated in the state of Montana to serve any lawful purpose...

Summary

In this chapter, we saw how the T5 transformer models standardized the input of the encoder and decoder stacks of the original Transformer. The original Transformer architecture has an identical structure for each block (or layer) of the encoder and decoder stacks. However, the original Transformer did not have a standardized input format for NLP tasks.

Raffel et al. (2019) designed a standard input for a wide range of NLP tasks by defining a text-to-text model. They added a prefix to an input sequence, indicating the NLP problem type to solve. This led to a standard text-to-text format. The Text-To-Text Transfer Transformer (T5) was born. We saw that this deceptively simple evolution made it possible to use the same model and hyperparameters for a wide range of NLP tasks. The invention of T5 takes the standardization process of transformer models a step further.

We then implemented a T5 model that could summarize any text. We tested the model on texts that were not...

Questions

  1. T5 models only have encoder stacks like BERT models. (True/False)
  2. T5 models have both encoder and decoder stacks. (True/False)
  3. T5 models use relative positional encoding, not absolute positional encoding. (True/False)
  4. Text-to-text models are only designed for summarization. (True/False)
  5. Text-to-text models apply a prefix to the input sequence that determines the NLP task. (True/False)
  6. T5 models require specific hyperparameters for each task. (True/False)
  7. One of the advantages of text-to-text models is that they use the same hyperparameters for all NLP tasks. (True/False)
  8. T5 transformers do not contain a feedforward network. (True/False)
  9. Hugging Face is a framework that makes transformers easier to implement. (True/False)
  10. OpenAI’s transformer engines are game changers. (True/False)

References
