The buzz about Generative Pre-trained Transformer Version 3 (GPT-3) started with a blog post from a leading Artificial Intelligence (AI) research lab, OpenAI, on June 11, 2020. The post began as follows:
Online demos from early beta testers soon followed—some seemed too good to be true. GPT-3 was writing articles, penning poetry, answering questions, chatting with lifelike responses, translating text from one language to another, summarizing complex documents, and even writing code. The demos were incredibly impressive—things we hadn't seen a general-purpose AI system do before—but equally impressive was that many of the demos were created by people with a limited or no formal background in AI and Machine Learning (ML). GPT-3 had raised the bar, not just in terms of the technology, but also in terms of AI accessibility.
GPT-3 is a general-purpose language processing AI model that practically anybody can understand and start using in a matter of minutes. You don't need a Doctor of Philosophy (PhD) in computer science—you don't even need to know how to write code. In fact, everything you'll need to get started is right here in this book. We'll begin in this chapter with the following topics:
- Introduction to GPT-3
- Democratizing NLP
- Understanding prompts, completions, and tokens
- Introducing Davinci, Babbage, Curie, and Ada
- Understanding GPT-3 risks
This chapter requires you to have access to the OpenAI Application Programming Interface (API). You can register for API access by visiting https://openai.com/.
Introduction to GPT-3
In short, GPT-3 is a language model: a statistical model that calculates the probability distribution over a sequence of words. In other words, GPT-3 is a system for guessing which text comes next when text is given as an input.
NLP is a branch of AI that focuses on the use of natural human language for various computing applications. NLP is a broad category that encompasses many different types of language processing tasks, including sentiment analysis, speech recognition, machine translation, text generation, and text summarization, to name but a few.
In NLP, language models are used to calculate the probability distribution over a sequence of words. Language models are essential because of the extremely complex and nuanced nature of human languages. For example, pay in full and painful or tee time and teatime sound alike but have very different meanings. A phrase such as she's on fire could be literal or figurative, and words such as big and large can be used interchangeably in some cases but not in others—for example, using the word big to refer to an older sibling wouldn't have the same meaning as using the word large. Thus, language models are used to deal with this complexity, but that's easier said than done.
While understanding things such as word meanings and their appropriate usage seems trivial to humans, NLP tasks can be challenging for machines. This is especially true for more complex language processing tasks such as recognizing irony or sarcasm—tasks that even challenge humans at times.
Today, the best technical approach to a given NLP task depends on the task. So, most of the best-performing, state-of-the-art (SOTA) NLP systems are specialized systems that have been fine-tuned for a single purpose or a narrow range of tasks. Ideally, however, a single system could successfully handle any NLP task. That's the goal of GPT-3: to provide a general-purpose AI system for NLP. So, even though the best-performing NLP systems today tend to be specialized, purpose-built systems, GPT-3 achieves SOTA performance on a number of common NLP tasks, showing the potential for a future general-purpose NLP system that could provide SOTA performance for any NLP task.
What exactly is GPT-3?
Although GPT-3 is a general-purpose NLP system, it really just does one thing: it predicts what comes next based on the text that is provided as input. But it turns out that, with the right architecture and enough data, this one thing can handle a stunning array of language processing tasks.
GPT-3 is the third version of the GPT language model from OpenAI. So, although it started to become popular in the summer of 2020, the first version of GPT was announced 2 years earlier, and the following version, GPT-2, was announced in February 2019. But even though GPT-3 is the third version, the general system design and architecture hasn't changed much from GPT-2. There is one big difference, however, and that's the size of the dataset that was used for training.
GPT-3 was trained with a massive dataset comprised of text from the internet, books, and other sources, containing roughly 57 billion words and 175 billion parameters. That's 10 times larger than GPT-2 and the next-largest language model. To put the model size into perspective, the average human might read, write, speak, and hear upward of a billion words in an entire lifetime. So, GPT-3 has been trained on an estimated 57 times the number of words most humans will ever process.
The GPT-3 language model is massive, so it isn't something you'll be downloading and dabbling with on your laptop. But even if you could (which you can't because it's not available to download), it would cost millions of dollars in computing resources each time you wanted to build the model. This would put GPT-3 out of reach for most small companies and virtually all individuals if you had to rely on your own computer resource to use it. Thankfully, you don't. OpenAI makes GPT-3 available through an API that is both affordable and easy to use. So, anyone can use some of the most advanced AI ever created!
Anyone can use GPT-3 with access to the OpenAI API. The API is a general-purpose text in, text out interface that could be used for virtually any language task. To use the API, you simply pass in text and get a text response back. The task might be to do sentiment analysis, write an article, answer a question, or summarize a document. It doesn't matter, as far as the API is concerned—it's all done the same way, which makes using the API easy enough for just about anyone to use, even non-programmers.
The text you pass in is referred to as a prompt, and the returned text is called a completion. A prompt is used by GPT-3 to determine how best to complete the task. In the simplest case, a prompt can provide a few words to get started with. For example, if the prompt was If today is Monday, tomorrow is, GPT-3 would likely respond with Tuesday, along with some additional text such as If today is Tuesday, tomorrow is Wednesday, and so on. This means that what you get out of GPT-3 depends on what you send to it.
As you might guess, the quality of a completion depends heavily on the prompt. GPT-3 uses all of the text in a prompt to help generate the most relevant completion. Each and every word, along with how the prompt is structured, helps improve the language model prediction results. So, understanding how to write and test prompts is the key to unlocking GPT-3's true potential.
Understanding prompts, completions, and tokens
Literally any text can be used as a prompt—send some text in and get some text back. However, as entertaining as it can be to see what GPT-3 does with random strings, the real power comes from understanding how to write effective prompts.
Prompts are how you get GPT-3 to do what you want. It's like programming, but with plain English. So, you have to know what you're trying to accomplish, but rather than writing code, you use words and plain text.
When you're writing prompts, the main thing to keep in mind is that GPT-3 is trying to figure out which text should come next, so including things such as instructions and examples provides context that helps the model figure out the best possible completion. Also, quality matters— for example, spelling, unclear text, and the number of examples provided will have an effect on the quality of the completion.
Another key consideration is the prompt size. While a prompt can be any text, the prompt and the resulting completion must add up to fewer than 2,048 tokens. We'll discuss tokens a bit later in this chapter, but that's roughly 1,500 words.
So, a prompt can be any text, and there aren't hard and fast rules that must be followed like there are when you're writing code. However, there are some guidelines for structuring your prompt text that can be helpful in getting the best results.
Different kinds of prompts
- Zero-shot prompts
- One-shot prompts
- Few-shot prompts
A zero-shot prompt is the simplest type of prompt. It only provides a description of a task, or some text for GPT-3 to get started with. Again, it could literally be anything: a question, the start of a story, instructions—anything, but the clearer your prompt text is, the easier it will be for GPT-3 to understand what should come next. Here is an example of a zero-shot prompt for generating an email message. The completion will pick up where the prompt ends—in this case, after
Write an email to my friend Jay from me Steve thanking him for covering my shift this past Friday. Tell him to let me know if I can ever return the favor. Subject:
The following screenshot is taken from a web-based testing tool called the Playground. We'll discuss the Playground more in Chapter 2, GPT-3 Applications and Use Cases, and Chapter 3, Working with the OpenAI Playground, but for now we'll just use it to show the completion generated by GPT-3 as a result of the preceding prompt. Note that the original prompt text is bold, and the completion shows as regular text:
So, a zero-shot prompt is just a few words or a short description of a task without any examples. Sometimes this is all GPT-3 needs to complete the task. Other times, you may need to include one or more examples. A prompt that provides a single example is referred to as a one-shot prompt.
A one-shot prompt provides one example that GPT-3 can use to learn how to best complete a task. Here is an example of a one-shot prompt that provides a task description (the first line) and a single example (the second line):
A list of actors in the movie Star Wars 1. Mark Hamill: Luke Skywalker
From just the description and the one example, GPT-3 learns what the task is and that it should be completed. In this example, the task is to create a list of actors from the movie Star Wars. The following screenshot shows the completion generated from this prompt:
The one-shot prompt works great for lists and commonly understood patterns. But sometimes you'll need more than one example. When that's the case you'll use a few-shot prompt.
A few-shot prompt provides multiple examples—typically, 10 to 100. Multiple examples can be useful for showing a pattern that GPT-3 should continue. Few-shot prompts and more examples will likely increase the quality of the completion because the prompt provides more for GPT-3 to learn from.
Here is an example of a few-shot prompt to generate a simulated conversation. Notice that the examples provide a back-and-forth dialog, with things that might be said in a conversation:
This is a conversation between Steve, the author of the book Exploring GPT-3 and someone who is reading the book. Reader: Why did you decide to write the book? Steve: Because I'm super fascinated by GPT-3 and emerging technology in general. Reader: What will I learn from this book? Steve: The book provides an introduction to GPT-3 from OpenAI. You'll learn what GPT-3 is and how to get started using it. Reader: Do I need to be a coder to follow along? Steve: No. Even if you've never written a line of code before, you'll be able to follow along just fine. Reader:
In the following screenshot, you can see that GPT-3 continues the simulated conversation that was started in the examples provided in the prompt:
The OpenAI API can handle a variety of tasks. The possibilities range from generating original stories to performing complex text analysis, and everything in between. To get familiar with the kinds of tasks GPT-3 can perform, OpenAI provides a number of prompt examples. You can find example prompts in the Playground and in the OpenAI documentation.
In the Playground, the examples are referred to as presets. Again, we'll cover the Playground in detail in Chapter 3, Working with the OpenAI Playground, but the following screenshot shows some of the presets that are available:
Example prompts are also available in the OpenAI documentation. The OpenAI documentation is excellent and includes a number of great prompt examples, with links to open and test them in the Playground. The following screenshot shows an example prompt from the OpenAI documentation. Notice the Open this example in Playground link below the prompt example. You can use that link to open the prompt in the Playground:
Again, a completion refers to the text that is generated and returned as a result of the provided prompt/input. You'll also recall that GPT-3 was not specifically trained to perform any one type of NLP task—it's a general-purpose language processing system. However, GPT-3 can be shown how to complete a given task using a prompt. This is called meta-learning.
With most NLP systems, the data used to teach the system how to complete a task is provided when the underlying ML model is trained. So, to improve results for a given task, the underlying training must be updated, and a new version of the model must be built. GPT-3 works differently, as it isn't trained for any specific task. Rather, it was designed to recognize patterns in the prompt text and to continue the pattern(s) by using the underlying general-purpose model. This approach is referred to as meta-learning because the prompt is used to teach GPT-3 how to generate the best possible completion, without the need for retraining. So, in effect, the different prompt types (zero-shot, one-shot, and few-shot) can be used to program GPT-3 for different types of tasks, and you can provide a lot of instructions in the prompt—up to 2,048 tokens. Alright—now is a good time to talk about tokens.
When a prompt is sent to GPT-3, it's broken down into tokens. Tokens are numeric representations of words or—more often—parts of words. Numbers are used for tokens rather than words or sentences because they can be processed more efficiently. This enables GPT-3 to work with relatively large amounts of text. That said, as you've learned, there is still a limit of 2,048 tokens (approximately ~1,500 words) for the combined prompt and the resulting generated completion.
You can stay under the token limit by estimating the number of tokens that will be used in your prompt and resulting completion. On average, for English words, every four characters represent one token. So, just add the number of characters in your prompt to the response length and divide the sum by four. This will give you a general idea of the tokens required. This is helpful if you're trying to get an idea of how many tokens are required for a number of tasks.
Another way to get the token count is with the token count indicator in the Playground. This is located just under the large text input, on the bottom right. The magnified area in the following screenshot shows the token count. If you hover your mouse over the number, you'll also see the total count with the completion. For our example, the prompt Do or do not. There is no try.—the wise words from Master Yoda—uses 10 tokens and 74 tokens with the completion:
While understanding tokens is important for staying under the 2,048 token limit, they are also important to understand because tokens are what OpenAI uses as the basis for usage fees. Overall token usage reporting is available for your account at https://beta.openai.com/account/usage. The following screenshot shows an example usage report. We'll discuss this more in Chapter 3, Working with the OpenAI Playground:
In addition to token usage, the other thing that affects the costs associated with using GPT-3 is the engine you choose to process your prompts. The engine refers to the language model that will be used. The main difference between the engines is the size of the associated model. Larger models can complete more complex tasks, but smaller models are more efficient. So, depending on the task complexity, you can significantly reduce costs by using a smaller model. The following screenshot shows the model pricing at the time of publishing. As you can see, the cost differences can be significant:
Introducing Davinci, Babbage, Curie, and Ada
The massive dataset that is used for training GPT-3 is the primary reason why it's so powerful. However, bigger is only better when it's necessary—and more power comes at a cost. For those reasons, OpenAI provides multiple models to choose from. Today there are four primary models available, along with a model for content filtering and instruct models.
The available models or engines (as they're also referred to) are named
Ada. Of the four,
Davinci is the largest and most capable.
Davinci can perform any tasks that any other engine can perform.
Babbage is the next most capable engine, which can do anything that
Ada can do.
Ada is the least capable engine, but the best-performing and lowest-cost engine.
When you're getting started and for initially testing new prompts, you'll usually want to begin with Davinci , then try,
Curie to see if one of them can complete the task faster or more cost-effectively. The following is an overview of each engine and the types of tasks that might be best suited for each. However, keep in mind that you'll want to test. Even though the smaller engines might not be trained with as much data, they are all still general-purpose models.
Davinci is the most capable model and can do anything that any other model can do, and much more—often with fewer instructions.
Davinci is able to solve logic problems, determine cause and effect, understand the intent of text, produce creative content, explain character motives, and handle complex summarization tasks.
Curie tries to balance power and speed. It can do anything that
Babbage can do but it's also capable of handling more complex classification tasks and more nuanced tasks like summarization, sentiment analysis, chatbot applications, and Question and Answers.
Babbage is a bit more capable than
Ada but not quite as performant. It can perform all the same tasks as
Ada, but it can also handle a bit more involved classification tasks, and it's well suited for semantic search tasks that rank how well documents match a search query.
Ada is usually the fastest model and least costly. It's best for less nuanced tasks—for example, parsing text, reformatting text, and simpler classification tasks. The more context you provide
Ada, the better it will likely perform.
Content filtering model
These are models that are built on top of the
Curie models. Instruct models are tuned to make it easier to tell the API what you want it to do. Clear instructions can often produce better results than the associated core model.
A snapshot in time
A final note to keep in mind about all of the engines is that they are all a snapshot in time, meaning the data used to train them cuts off on the date the model was built. So, GPT-3 is not working with up-to-the-minute or even up-to-the-day data—it's likely weeks or months old. OpenAI is planning to add more continuous training in the future, but today this is a consideration to keep in mind.
All of the GPT-3 models are extremely powerful and capable of generating text that is indistinguishable from human-written text. This holds tremendous potential for all kinds of potential applications. In most cases, that's a good thing. However, not all potential use cases are good.
Understanding GPT-3 risks
GPT-3 is a fantastic technology, with numerous practical and valuable potential applications. But as is often the case with powerful technologies, with its potential comes risk. In GPT-3's case, some of those risks include inappropriate results and potentially malicious use cases.
Inappropriate or offensive results
GPT-3 generates text so well that it can seem as though it is aware of what it is saying. It's not. It's an AI system with an excellent language model—it is not conscious in any way, so it will never willfully say something hurtful or inappropriate because it has no will. That said, it can certainly generate inappropriate, hateful, or malicious results—it's just not intentional.
Nevertheless, understanding that GPT-3 can and will likely generate offensive text at times needs to be understood and considered when using GPT or making GPT-3 results available to others. This is especially true for results that might be seen by children. We'll discuss this more and look at how to deal with it in Chapter 6, Content Filtering.
Potential for malicious use
It's not hard to imagine potentially malicious or harmful uses for GPT-3. OpenAI even describes how GPT-3 could be weaponized for misinformation campaigns or for creating fake product reviews. But OpenAI's declared mission is to ensure that artificial general intelligence benefits all of humanity. Hence, pursuing that mission includes taking responsible steps to prevent their AI from being used for the wrong purposes. So, OpenAI has implemented an application approval process for all applications that will use GPT-3 or the OpenAI API.
But as application developers, this is something we also need to consider. When we build an application that uses GPT-3, we need to consider if and how the application could be used for the wrong purposes and take the necessary steps to prevent it. We'll talk more about this in Chapter 10, Going Live with OpenAI-Powered Apps.
In this chapter, you learned that GPT-3 is a general-purpose language model for processing virtually any language processing task. You learned how GPT-3 works at a high level, along with key terms and concepts. We introduced the available models and discussed how all GPT-3 applications must go through an approval process to prevent potentially inappropriate or harmful results.
In the next chapter, we'll discuss different ways to use GPT-3 and look at specific GPT-3 use case examples.