The architecture of OpenAI GPT transformer models

In 2020, Brown et al. (2020) described the training of an OpenAI GPT-3 model containing 175 billion parameters on huge datasets, such as the 400 billion byte-pair-encoded tokens extracted from Common Crawl data. OpenAI ran the training on a Microsoft Azure supercomputer with 285,000 CPUs and 10,000 GPUs.

The machine intelligence of OpenAI's GPT-3 models and their supercomputer led Brown et al. (2020) to zero-shot experiments. The idea was to use a trained model for downstream tasks without further training its parameters. The goal was for a trained model to go directly into multi-task production with an API that could even perform tasks it wasn't trained for.

The era of suprahuman cloud AI models was born. OpenAI's API requires no high-level software skills or AI knowledge. You might wonder why I use the term "suprahuman." GPT-3 and GPT-4 models (and soon more powerful ones) can perform many...
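The difference between zero-shot use and the few-shot setting studied by Brown et al. (2020) is easiest to see in the prompts themselves. The snippet below is a minimal illustration; the translation task and example sentences are assumptions for demonstration, not taken from the book. In neither case are the model's parameters updated: the task is specified entirely in the prompt.

    # Zero-shot: only an instruction and the input, no examples.
    zero_shot_prompt = (
        "Translate the following English sentence into French:\n"
        "Sentence: The weather is beautiful today.\n"
        "Translation:"
    )

    # Few-shot: the same task, preceded by a handful of worked examples.
    # The examples act as in-context demonstrations, not as training data.
    few_shot_prompt = (
        "Translate English to French.\n"
        "English: Good morning. -> French: Bonjour.\n"
        "English: Thank you very much. -> French: Merci beaucoup.\n"
        "English: The weather is beautiful today. -> French:"
    )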

OpenAI Models as Assistants

Generative AI and Generative Pre-trained Transformer (GPT) assistants will pervade everyday applications. From a software development perspective, nothing will ever return to the old days. ChatGPT-like models will boost the productivity of everyday software development. In this section, we will unleash the power of GPTs to use them as assistants to explain OpenAI models and engines.

Go to this link to access ChatGPT Plus: https://chat.openai.com/

If you don't wish to subscribe, you can try OpenAI's free version: https://openai.com/chatgpt

ChatGPT Plus offers services such as GPT-3.5, GPT-4, and Plugins. The cutoff date of GPT-4's training might limit its responses since the available data is in the past. Therefore, plugins can come in handy, such as the Bing feature that has been implemented in OpenAI. However, the plugins are continually updated, and some are retired. ChatGPT Plus models will evolve, and you will grow with them!

You can...

Getting Started with the GPT-4 API

OpenAI has some of the most powerful transformer engines in the world. One GPT-4 model can perform hundreds of tasks. GPT-3 can do many tasks it wasn't trained for.

This section will use the API in Getting_Started_GPT_4_API.ipynb. To use GPT-3, go to OpenAI's website, https://openai.com/, and sign up. We can run the examples provided by OpenAI to get started. We are once again relying on assistants.

Running our first NLP task with GPT-4

Let's start using GPT-4 in a few steps. Go to Google Colab and open Getting_Started_GPT_4_API.ipynb, which is in the chapter directory of the book's GitHub repository. You do not need to change the hardware settings of the notebook. We are using an API, so we will not need much local computing power for the tasks in this section. The steps of this section are the same ones as in the notebook. Running an NLP task is done in three simple steps:

Step 1: Installing OpenAI and Step 2: Entering the API key

Steps 1 and 2 are the...
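For orientation, here is a minimal sketch of what Steps 1 and 2 (plus a first request) typically look like with the openai Python client. It is an illustration under assumptions, not the notebook's exact code: the interface shown is that of version 1.x of the openai package, and the model name and prompt are placeholders.

    # Step 1: install the OpenAI Python client (run in a notebook cell)
    # !pip install openai

    # Step 2: enter the API key and create a client
    from openai import OpenAI

    client = OpenAI(api_key="YOUR_API_KEY")  # replace with your own key

    # Step 3: run a first NLP task, for example a grammar correction
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a grammar correction assistant."},
            {"role": "user", "content": "Correct this sentence: She no went to the market."},
        ],
    )
    print(response.choices[0].message.content)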

Retrieval Augmented Generation (RAG) with GPT-4

In this section, we will build an introductory program that implements Retrieval Augmented Generation (RAG). Document retrieval is not new. Knowledge bases have been around since the arrival of queries on databases decades ago. Generative AI isn't new, either. RNNs were AI-driven text generators years ago. Taking these factors into account, we can say that RAG is not an innovation but an improvement that compensates for the lack of precision, training data, and responses of generative AI models. It can also avoid fine-tuning a model in some instances. There are also different ways of performing augmented generation, as we will see (a minimal sketch of the core retrieve-and-generate loop follows this list), among which:

  • Chapter 11, Leveraging LLM Embeddings as an Alternative to Fine-Tuning, is where we will implement embedded data.
  • Chapter 15, Guarding the Giants: Mitigating Risks in Large Language Models, in which one of the mitigating solutions is to implement knowledge bases.
  • Chapter 20: Beyond Human-Designed...
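As announced above, here is a minimal RAG sketch under stated assumptions: it uses the openai Python client's embeddings and chat endpoints, a tiny in-memory document list, and cosine similarity for retrieval. The model names, documents, and helper functions are illustrative, not the book's program.

    import numpy as np
    from openai import OpenAI

    client = OpenAI(api_key="YOUR_API_KEY")  # replace with your own key

    # A tiny in-memory "knowledge base" (illustrative documents)
    documents = [
        "GPT-3 was trained on about 400 billion byte-pair-encoded tokens.",
        "RAG retrieves relevant documents and adds them to the prompt.",
        "Fine-tuning updates model parameters on a task-specific dataset.",
    ]

    def embed(texts):
        # One embedding vector per input text
        result = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return [np.array(item.embedding) for item in result.data]

    doc_vectors = embed(documents)

    def retrieve(query, k=1):
        # Rank documents by cosine similarity to the query embedding
        q = embed([query])[0]
        scores = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v)) for v in doc_vectors]
        best = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)[:k]
        return [documents[i] for i in best]

    query = "How does RAG improve a generative model's answers?"
    context = "\n".join(retrieve(query))

    # Augment the prompt with the retrieved context before calling GPT-4
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    print(response.choices[0].message.content)

The retrieval step is what distinguishes RAG from plain prompting: the generated answer is grounded in documents that were never part of the model's training data, which is why RAG can compensate for missing or outdated knowledge without fine-tuning.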

Summary

We began the chapter by seeing how OpenAI Generative Pre-trained Transformer (GPT) models are General Purpose Technologies (GPTs). As such, they improve rapidly, and their diffusion is highly pervasive. The Generative AI functionality of OpenAI's models has opened the horizons for mainstream applications.

We continued by examining the improvements in the architecture of GPTs through decoder stacks, scaling, and machine power. These improvements led to the Big Bang creation of ChatGPT, which spread into mainstream everyday lives, pervading applications such as search engines, Office tools, and more.

We started with some of the many generative transformer assistants, including ChatGPT, New Bing, GitHub Copilot, Microsoft 365 Office Copilot, and OpenAI's Playground. Our journey then led to building several examples with the OpenAI GPT-4 API, such as grammar corrections, translations, and more. Finally, we built an example of how Retrieval Augmented Generation (RAG) can...

Questions

  1. A zero-shot method trains the parameters once. (True/False)
  2. Gradient updates are performed when running zero-shot models. (True/False)
  3. GPT models only have a decoder stack. (True/False)
  4. OpenAI GPT models are not GPTs. (True/False)
  5. The diffusion of generative transformer models is very slow in everyday applications. (True/False)
  6. GPT-3 models have been useless since GPT-4 was made public. (True/False)
  7. ChatGPT models are not completion models. (True/False)
  8. Gradio is a transformer model. (True/False)
  9. Supercomputers with 285,000 CPUs do not exist. (True/False)
  10. Supercomputers with thousands of GPUs are game-changers in AI. (True/False)

References

OpenAI, 2023, GPT-4 Technical Report: https://arxiv.org/pdf/2303.08774.pdf

Further Reading

  • Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman, 2019, SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems: https://w4ngatang.github.io/static/papers/superglue.pdf
  • Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei, 2020, Language Models are Few-Shot Learners: https://arxiv.org/abs/2005.14165
  • Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman, 2019, GLUE: A Multi-Task Benchmark and...

Our book's Discord space

Join the book's Discord workspace: https://www.packt.link/Transformers


Questions

  1. It is useless to fine-tune an OpenAI model. (True/False)
  2. Any pretrained OpenAI model can do the task we need without fine-tuning. (True/False)
  3. We don’t need to prepare a dataset to fine-tune an OpenAI model. (True/False)
  4. We don’t need one if no datasets are available on the web (follow-up question to Question 3). (True/False)
  5. We don’t need to keep track of the fine-tunes we created. (True/False)
  6. As of January 2024, anybody can access our fine-tunes. (True/False)
  7. A standard model can sometimes produce a similar output to a fine-tuned model. (True/False)
  8. GPT-4 cannot be fine-tuned. (True/False)
  9. GPT-3 cannot be fine-tuned. (True/False)
  10. We can provide raw data with no preparation for fine-tuning. (True/False)

Further reading

  • Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, and Sanjeev Arora, 2023, Fine-Tuning Language Models with Just Forward Passes: https://arxiv.org/abs/2305.17333

