Transformers for Natural Language Processing - Second Edition

By Denis Rothman | Published by Packt, March 2022 | ISBN-13: 9781803247335 | 602 pages

Table of Contents (25 chapters)

Preface
1. What are Transformers?
2. Getting Started with the Architecture of the Transformer Model
3. Fine-Tuning BERT Models
4. Pretraining a RoBERTa Model from Scratch
5. Downstream NLP Tasks with Transformers
6. Machine Translation with the Transformer
7. The Rise of Suprahuman Transformers with GPT-3 Engines
8. Applying Transformers to Legal and Financial Documents for AI Text Summarization
9. Matching Tokenizers and Datasets
10. Semantic Role Labeling with BERT-Based Transformers
11. Let Your Data Do the Talking: Story, Questions, and Answers
12. Detecting Customer Emotions to Make Predictions
13. Analyzing Fake News with Transformers
14. Interpreting Black Box Transformer Models
15. From NLP to Task-Agnostic Transformer Models
16. The Emergence of Transformer-Driven Copilots
17. The Consolidation of Suprahuman Transformers with OpenAI’s ChatGPT and GPT-4
Other Books You May Enjoy
Index
Appendix I — Terminology of Transformer Models
Appendix II — Hardware Constraints for Transformer Models
Appendix III — Generic Text Completion with GPT-2
Appendix IV — Custom Text Completion with GPT-2
Appendix V — Answers to the Questions

The Emergence of Transformer-Driven Copilots

When Industry 4.0 (I4.0) reaches maturity, it will all be about machine-to-machine connections, communication, and decision-making. AI will be primarily embedded in ready-to-use pay-as-you-go cloud AI solutions. Big tech will absorb the most talented AI specialists to create APIs, interfaces, and integration tools.

AI specialists will go from development to design, becoming architects, integrators, and cloud AI pipeline administrators. Thus, AI is becoming a job for consultant engineers more than for developer engineers.

Chapter 1, What Are Transformers?, introduced foundation models, transformers that can do NLP tasks they were not trained for. Chapter 15, From NLP to Task-Agnostic Transformer Models, expanded foundation model transformers to task-agnostic models that can perform vision tasks, NLP tasks, and much more.

This chapter will extend task-agnostic OpenAI GPT-3 models to a wide range of copilot tasks. A new generation...

Prompt engineering

Speaking a specific language is not hereditary. There is no language center in our brain that contains the language of our parents. Our brain wires our neurons early in our lives to speak, read, write, and understand a language. Each human has different language circuitry, depending on their cultural background and on how people communicated with them in their early years.

As we grow up, we discover that much of what we hear is chaos: unfinished sentences, grammar mistakes, misused words, bad pronunciation, and many other distortions.

We use language to convey a message. We quickly find that we need to adapt our language to the person or audience we address. We might have to try additional “inputs” or “prompts” to obtain the result (“output”) we expect. Foundation-level transformer models such as GPT-3 can perform hundreds of tasks in an indefinite number of ways. We must learn the language of transformer prompts...
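
As a minimal illustration of this trial-and-error loop (assuming the beta-era openai Python package with an API key in the OPENAI_API_KEY environment variable; the engine name and prompts are only examples), we can send the same task phrased as two different prompts and compare the outputs:

    import os
    import openai  # beta-era openai package (pre-1.0 Completion API)

    openai.api_key = os.getenv("OPENAI_API_KEY")

    # The same task phrased two ways. Small wording changes can produce
    # very different outputs, which is why we must learn the "language"
    # of transformer prompts.
    prompts = [
        "Correct this sentence: she no went to the market.",
        "Rewrite the following sentence in correct English:\nShe no went to the market.\nCorrect sentence:",
    ]

    for prompt in prompts:
        response = openai.Completion.create(
            engine="davinci",  # illustrative engine name
            prompt=prompt,
            max_tokens=30,
            temperature=0,  # deterministic output makes comparison easier
        )
        print(repr(prompt), "->", response["choices"][0]["text"].strip())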

Copilots

Welcome to the world of AI-driven development copilots powered by OpenAI and available in Visual Studio.

GitHub Copilot

Let’s begin with GitHub Copilot:

https://github.com/github/copilot-docs

In this section, we will use GitHub Copilot with PyCharm (JetBrains):

https://github.com/github/copilot-docs/tree/main/docs/jetbrains

Follow the instructions in the documentation to install the GitHub Copilot plugin and activate OpenAI GitHub Copilot in PyCharm.

Working with GitHub Copilot is a four-step process (see Figure 16.7):

  1. OpenAI Codex is trained on public code and text on the internet.
  2. The trained model is plugged into the GitHub Copilot service.
  3. The GitHub Copilot service manages the back-and-forth flow between the code we write in an editor (in this case, PyCharm) and OpenAI Codex. The service displays suggestions and then sends our interactions back for improvement.
  4. The code editor is our development workspace.
Figure 16.7: The four-step GitHub Copilot process
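
The following hypothetical exchange illustrates steps 3 and 4; the suggested body is the kind of completion Copilot might propose in PyCharm, not a recorded output:

    # We type a comment and a function signature in the editor; GitHub
    # Copilot (backed by OpenAI Codex) proposes a body that we can
    # accept, edit, or reject.

    # compute the moving average of a list of prices over a window
    def moving_average(prices, window):
        # --- a suggestion of the kind Copilot might offer ---
        averages = []
        for i in range(len(prices) - window + 1):
            averages.append(sum(prices[i:i + window]) / window)
        return averages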

Domain-specific GPT-3 engines

This section explores GPT-3 engines that can perform domain-specific tasks. We will run three models in the three subsections of this section:

  • Embedding2ML to use GPT-3 to provide embeddings as inputs to ML algorithms
  • Instruct series to ask GPT-3 to provide instructions for any task (a minimal sketch follows this list)
  • Content filter to filter out bias or any form of unacceptable input and output
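
As a quick hedged sketch of the Instruct series (davinci-instruct-beta was the beta-era engine identifier and may since have changed), we can ask GPT-3 for step-by-step instructions:

    import os
    import openai  # beta-era openai package (pre-1.0 Completion API)

    openai.api_key = os.getenv("OPENAI_API_KEY")

    # Ask an instruct engine for step-by-step instructions for a task.
    response = openai.Completion.create(
        engine="davinci-instruct-beta",  # beta-era instruct engine name
        prompt="Explain, step by step, how to set up a Python virtual environment.",
        max_tokens=120,
        temperature=0,
    )
    print(response["choices"][0]["text"].strip())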

Open Domain_Specific_GPT_3_Functionality.ipynb.

We will begin with embedding2ML (embeddings as an input to ML).

Embedding2ML

OpenAI has trained several embedding models of different dimensions, with different capabilities:

  • Ada (1,024 dimensions)
  • Babbage (2,048 dimensions)
  • Curie (4,096 dimensions)
  • Davinci (12,288 dimensions)

You will find more information on each engine on OpenAI’s website:

https://beta.openai.com/docs/guides/embeddings.

The Davinci model offers embeddings with 12,288 dimensions...
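
As a minimal sketch of embedding2ML (the engine name below is an era-specific assumption; check OpenAI’s embeddings guide for current identifiers), we can request embeddings for a few texts and hand the vectors to a classic scikit-learn algorithm:

    import os
    import numpy as np
    import openai  # beta-era openai package (pre-1.0)
    from sklearn.cluster import KMeans

    openai.api_key = os.getenv("OPENAI_API_KEY")

    texts = [
        "The invoice was paid on time.",
        "Payment received for the March invoice.",
        "The weather is sunny today.",
        "It will rain tomorrow afternoon.",
    ]

    # One embedding vector per input text.
    response = openai.Embedding.create(
        input=texts,
        engine="text-similarity-davinci-001",  # era-specific engine name
    )
    vectors = np.array([item["embedding"] for item in response["data"]])

    # Feed the 12,288-dimension vectors to a classic ML algorithm: here,
    # k-means clustering separates the finance texts from the weather texts.
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(vectors)
    print(labels)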

Transformer-based recommender systems

Transformer models learn sequences. Learning language sequences is a great place to start, considering the billions of messages posted on social media and cloud platforms every day. Consumer behaviors, images, and sounds can also be represented as sequences.

In this section, we will first create a general-purpose sequence graph and then build a general-purpose transformer-based recommender in Google Colaboratory. We will then see how to deploy them in metahumans.

Let’s first define general-purpose sequences.

General-purpose sequences

Many activities can be represented by entities and the links between them, and are thus organized as sequences. For example, a video on YouTube can be an entity A, and a link can be the behavior of a person going from video A to video E.

Another example is a bad fever being an entity F, and the link being the inference a doctor may make leading to a micro-decision B. The purchase of product...
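
A minimal sketch of how such general-purpose sequences can be generated for training (the entity names and transition weights below are invented for illustration; this is not the chapter’s actual MDP program):

    import random

    # A general-purpose sequence graph: entities (videos, symptoms,
    # products, ...) and weighted links representing transitions.
    graph = {
        "A": [("B", 0.5), ("E", 0.5)],
        "B": [("C", 0.7), ("F", 0.3)],
        "C": [("D", 1.0)],
        "D": [("A", 1.0)],
        "E": [("F", 1.0)],
        "F": [("A", 1.0)],
    }

    def random_walk(start, length):
        """Sample one behavior sequence by walking the weighted links."""
        sequence = [start]
        node = start
        for _ in range(length - 1):
            successors, weights = zip(*graph[node])
            node = random.choices(successors, weights=weights)[0]
            sequence.append(node)
        return sequence

    # Generate training sequences; a transformer such as RoBERTa can then
    # be pretrained on these token sequences to learn their structure.
    dataset = [" ".join(random_walk("A", 8)) for _ in range(5)]
    print("\n".join(dataset))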

Computer vision

This book is about NLP, not computer vision. However, in the previous section, we implemented general-purpose sequences that can be applied to many domains. Computer vision is one of them.

The title of the article by Dosovitskiy et al. (2021) says it all: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. The authors processed images as sequences of 16x16 patches. The results proved their point.

Google has made vision transformers available in a Colaboratory notebook. Open Vision_Transformer_MLP_Mixer.ipynb in the Chapter16 directory of this book’s GitHub repository.

The Vision_Transformer_MLP_Mixer.ipynb notebook contains a transformer computer vision model written in JAX. JAX combines Autograd and XLA: it can differentiate Python and NumPy functions, and it speeds them up through compilation and parallelization.
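
A minimal sketch of those two capabilities (jax.grad for differentiation and jax.jit for XLA compilation are JAX’s standard entry points):

    import jax
    import jax.numpy as jnp

    # A plain Python/NumPy-style function...
    def loss(w):
        return jnp.sum((w * 2.0 - 1.0) ** 2)

    # ...that JAX differentiates automatically (the Autograd lineage)...
    grad_loss = jax.grad(loss)

    # ...and compiles to fast XLA code.
    fast_grad = jax.jit(grad_loss)

    w = jnp.array([0.5, 1.0, 1.5])
    print(grad_loss(w))  # gradients via autodiff: [0., 4., 8.]
    print(fast_grad(w))  # same values, XLA-compiled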

The notebook is self-explanatory. You can explore it to see how it works. However, bear in mind that when Industry...

Humans and AI copilots in metaverses

Humans and metahuman AI are merging into metaverses. Exploring metaverses is beyond the scope of this book. The toolbox provided by this book shows the path to metaverses populated by humans and metahuman AI.

Avatars, computer vision, and video game experience will make our communication with others immersive. We will go from looking at smartphones to being in locations with others.

From looking at to being in

The evolution from “looking at” to “being in” is a natural one. We invented computers, added screens, then invented smartphones, and now use apps for video meetings.

Now we can enter virtual reality for all types of meetings and activities.

We will use Facebook’s metaverse, for example, on our smartphone to feel present in the same location as the people (personal and professional) we meet. Feeling present will no doubt be a major evolution in smartphone communication.

Feeling present somewhere is quite different...

Summary

This chapter described the rise of AI copilots with human-level decision-making capability. Industry 4.0 has opened the door to machine interconnectivity. Machine-to-machine micro-decision making will speed up transactions. AI copilots will boost our productivity in a wide range of domains.

We saw how to use OpenAI Codex to generate source code while we code and even with natural language instructions.

We built a transformer-based recommender system using a dataset generated by the MDP program to train a RoBERTa transformer model. The dataset structure was a multi-purpose sequence model. A metahuman can thus acquire multi-domain recommender functionality.

The chapter then showed how a vision transformer could classify images processed as sequences of information.

Finally, we saw that the metaverse would make recommendations visible through a metahuman interface or invisible in deeply embedded functions in social media, for example.

Transformers have emerged...

Questions

  1. AI copilots that can generate code automatically do not exist. (True/False)
  2. AI copilots will never replace humans. (True/False)
  3. GPT-3 engines can only do one task. (True/False)
  4. Transformers can be trained to be recommenders. (True/False)
  5. Transformers can only process language. (True/False)
  6. A transformer sequence can only contain words. (True/False)
  7. Vision transformers cannot equal CNNs. (True/False)
  8. AI robots with computer vision do not exist. (True/False)
  9. It is impossible to produce Python source code automatically. (True/False)
  10. We might one day become the copilots of robots. (True/False)

References

Dosovitskiy, A., et al., 2021, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929
