Appendix III — Generic Text Completion with GPT-2

This appendix is a detailed explanation of the Generic text completion with GPT-2 section in Chapter 7, The Rise of Suprahuman Transformers with GPT-3 Engines. It describes how to implement a GPT-2 transformer model for generic text completion.

You can read about how the notebook is used directly in Chapter 7, or build and run the program in this appendix to gain a deeper understanding of how a GPT model works.

We will clone the OpenAI GPT-2 repository, download the 345M-parameter GPT-2 transformer model, and interact with it. We will enter context sentences and analyze the text generated by the transformer. The goal is to see how it creates new content.

This section is divided into nine steps. Open OpenAI_GPT_2.ipynb in Google Colaboratory. The notebook is in the AppendixIII directory of the GitHub repository of this book. You will notice that the notebook is divided into the same nine steps and cells.

Step 1: Activating the GPU

We must activate the GPU to run our GPT-2 345M-parameter transformer model.

To activate the GPU, open the Runtime menu, select Change runtime type, and choose GPU as the hardware accelerator to get the most out of the VM:

Figure III.1: The GPU hardware accelerator
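
Once the runtime restarts with the GPU attached, you can confirm that it is visible. This quick check is not part of the original notebook, just a minimal sketch:

#@title Optional: confirming the GPU is visible
import tensorflow as tf
print(tf.test.gpu_device_name())  # prints a device name such as /device:GPU:0 when a GPU is active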

Activating the GPU is a prerequisite for the performance we need to enter the world of GPT transformers. So let's now clone the OpenAI GPT-2 repository.

Step 2: Cloning the OpenAI GPT-2 repository

OpenAI still lets us download GPT-2 for now. This may be discontinued in the future, or maybe we will get access to more resources. At this point, the evolution of transformers and their usage moves so fast that nobody can foresee how the market will evolve, even the major research labs themselves.

We will clone OpenAI’s GitHub directory on our VM:

#@title Step 2: Cloning the OpenAI GPT-2 Repository
!git clone https://github.com/openai/gpt-2.git

When the cloning is over, you should see the repository appear in the file manager:

Figure III.2: Cloned GPT-2 repository

Click on src, and you will see that the Python files we need from OpenAI to run our model are installed:

Figure III.3: The GPT-2 Python files to run a model
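
Equivalently, you can list the source files from a code cell; this quick listing is not part of the original notebook:

#@title Optional: listing the GPT-2 source files
!ls /content/gpt-2/src  # expect encoder.py, model.py, sample.py, and interactive_conditional_samples.py, among others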

You can see that we do not have the Python training files we need. We will install them when we train the GPT-2 model in the Training a GPT-2 language model section of Appendix IV, Custom Text Completion with GPT-2.

Step 3: Installing the requirements

The requirements will be installed automatically:

#@title Step 3: Installing the requirements
import os  # if the VM restarts, os must be imported again
os.chdir("/content/gpt-2")  # move into the cloned repository
!pip3 install -r requirements.txt

When running cell by cell, we might have to restart the VM and thus import os again.

The requirements for this notebook are:

  • fire 0.1.3 to generate command-line interfaces (CLIs)
  • regex 2017.4.5 for regex usage
  • requests 2.21.0, an HTTP library
  • tqdm 4.31.1 to display a progress meter for loops
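
These correspond to the pinned versions in the repository's requirements.txt, which at the time of writing reads:

fire>=0.1.3
regex==2017.4.5
requests==2.21.0
tqdm==4.31.1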

You may be asked to restart the notebook.

Do not restart it now. Let’s wait until we check the version of TensorFlow.

Step 4: Checking the version of TensorFlow

The GPT-2 345M transformer model provided by OpenAI uses TensorFlow 1.x. This will lead to several warnings when running the program. However, we will ignore them and run at full speed on the thin ice of training GPT models ourselves with our modest machines.
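
The version-check cell itself is not reproduced in this excerpt; a minimal sketch of what it does follows (the %tensorflow_version magic was the Colab mechanism for selecting TF 1.x and has since been retired):

#@title Step 4: Checking the Version of TensorFlow (sketch)
%tensorflow_version 1.x
import tensorflow as tf
print(tf.__version__)  # the notebook expects a 1.x version such as 1.15.2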

In the 2020s, GPT models have reached 175 billion parameters, making it impossible for us to train them ourselves efficiently without having access to a supercomputer. The number of parameters will only continue to increase.

The research labs of corporate giants such as Facebook AI, OpenAI, and Google Research/Brain are speeding toward super-transformers and are leaving what they can for us to learn and understand. Unfortunately, they do not have time to go back and update all the models they share. However, we still have this notebook!

TensorFlow 2.x is the latest TensorFlow version. However, older programs can still be helpful. This is one reason why Google...

Step 5: Downloading the 345M-parameter GPT-2 model

We will now download the trained 345M-parameter GPT-2 model:

#@title Step 5: Downloading the 345M parameter GPT-2 Model
# run the download script and pass the model name as an argument
import os  # import os again if the runtime was restarted
os.chdir("/content/gpt-2")
!python3 download_model.py '345M'

The path to the model directory is:

/content/gpt-2/models/345M

It contains the information we need to run the model:

Figure III.4: The GPT-2 Python files of the 345M-parameter model

The hparams.json file contains the definition of the GPT-2 model:

  • "n_vocab": 50257, the size of the vocabulary of the model
  • "n_ctx": 1024, the context size
  • "n_embd": 1024, the embedding size
  • "n_head": 16, the number of heads
  • "n_layer": 24, the number of layers

encoder.json and vocab.bpe contain the tokenized vocabulary and the BPE word pairs. If necessary, take a few...

Steps 6-7: Intermediate instructions

In this section, we will go through Steps 6, 7, and 7a, which are intermediate steps leading to Step 8, in which we will define and activate the model.

We want to print UTF-encoded text to the console when we are interacting with the model:

#@title Step 6: Printing UTF encoded text to the console
import os  # a !export only affects its own subshell in Colab, so set the variable from Python
os.environ["PYTHONIOENCODING"] = "UTF-8"

We want to make sure we are in the src directory:

#@title Step 7: Project Source Code
import os # import after runtime is restarted
os.chdir("/content/gpt-2/src")

We are ready to interact with the GPT-2 model. We could run it directly with a command, as we will do in the Training a GPT-2 language model section of Appendix IV, Custom Text Completion with GPT-2. However, in this section, we will go through the main aspects of the code.
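
Since interactive_conditional_samples.py exposes its interact_model function through the fire library (listed in the requirements above), that direct command would look something like the following sketch, assuming the 345M model has been downloaded as shown earlier:

!python3 interactive_conditional_samples.py --model_name '345M' --models_dir '/content/gpt-2/models'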

interactive_conditional_samples.py first imports the necessary modules required to interact with the model:

#@title Step 7a: Interactive Conditional Samples...
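
The cell title above is truncated; for reference, the import block at the top of interactive_conditional_samples.py in the OpenAI repository looks like this:

import fire
import json
import os
import numpy as np
import tensorflow as tf

import model, sample, encoder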

Steps 7b-8: Importing and defining the model

We will now activate the interaction with the model through interactive_conditional_samples.py.

We need to import three modules that are also in /content/gpt-2/src:

import model, sample, encoder

The three programs are:

  • model.py defines the model's structure: the hyperparameters, the multi-head attention tf.matmul operations, the activation functions, and all the other properties.
  • sample.py processes the interaction and controls the sample that will be generated. It makes sure that the tokens are more meaningful.

    Softmax values can sometimes be blurry, like looking at an image in low definition. sample.py contains a variable named temperature that controls how sharp the distribution is: a temperature below 1 makes the values sharper, increasing the higher probabilities and softening the lower ones (see the sketch after this list).

    sample.py can activate Top-k sampling. Top-k sampling sorts the probability distribution of a predicted sequence and keeps only the higher probability values at the head of the distribution, restricting sampling to those top k tokens.
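
To make temperature and Top-k sampling concrete, here is a small NumPy sketch, independent of the GPT-2 code, that reproduces both ideas on a toy distribution:

import numpy as np

def apply_temperature(logits, temperature=1.0):
    # dividing the logits by a temperature below 1 sharpens the softmax:
    # higher probabilities increase and lower ones soften
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract the max for numerical stability
    return exp / exp.sum()

def top_k_filter(probs, k):
    # keep only the k most probable tokens and renormalize
    filtered = np.zeros_like(probs)
    top = np.argsort(probs)[-k:]  # indices of the k highest probabilities
    filtered[top] = probs[top]
    return filtered / filtered.sum()

logits = [2.0, 1.0, 0.5, 0.1]
print(apply_temperature(logits, 1.0))                   # ~[0.57 0.21 0.13 0.09]
print(apply_temperature(logits, 0.5))                   # sharper: ~[0.83 0.11 0.04 0.02]
print(top_k_filter(apply_temperature(logits, 1.0), 2))  # ~[0.73 0.27 0.   0.  ]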

Step 9: Interacting with GPT-2

In this section, we will interact with the GPT-2 345M model.

More messages will appear when the system runs, but as long as Google Colaboratory maintains TensorFlow 1.x, we can run the model with this notebook. If this notebook becomes obsolete one day, we might have to use GPT-3 engines instead, or Hugging Face GPT-2 wrappers, for example, which might themselves be deprecated in the future.

In the meantime, GPT-2 is still in use, so let's interact with the model!

To interact with the model, run the interact_model cell:

#@title Step 9: Interacting with GPT-2
# positional arguments, as the notebook's interact_model defines them:
# model_name, seed, nsamples, batch_size, length, temperature, top_k, models_dir
interact_model('345M', None, 1, 1, 300, 1, 0, '/content/gpt-2/models')

You will be prompted to enter some context:

Figure III.5: Context input for text completion

You can try any type of context you wish since this is a standard GPT-2 model.

We can try a sentence written by Immanuel Kant:

Human reason, in one sphere of its cognition...

