The architecture of OpenAI GPT transformer models

In 2020, Brown et al. (2020) described the training of an OpenAI GPT-3 model containing 175 billion parameters on huge datasets, such as the 400 billion byte-pair-encoded tokens extracted from Common Crawl data. OpenAI ran the training on a Microsoft Azure supercomputer with 285,000 CPUs and 10,000 GPUs.

The machine intelligence of OpenAI's GPT-3 models and their supercomputer led Brown et al. (2020) to zero-shot experiments. The idea was to use a trained model for downstream tasks without further training its parameters. The goal was for a trained model to go directly into multi-task production with an API that could even perform tasks it wasn't trained for.

The era of suprahuman cloud AI models was born. OpenAI's API requires no high-level software skills or AI knowledge. You might wonder why I use the term "suprahuman." GPT-3 and GPT-4 models (and soon more powerful ones) can perform many...
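The difference between zero-shot use and the few-shot setting studied by Brown et al. (2020) is easiest to see in the prompts themselves. The snippet below is a minimal illustration; the translation task and example sentences are assumptions for demonstration, not taken from the book. In neither case are the model's parameters updated: the task is specified entirely in the prompt.

    # Zero-shot: only an instruction and the input, no examples.
    zero_shot_prompt = (
        "Translate the following English sentence into French:\n"
        "Sentence: The weather is beautiful today.\n"
        "Translation:"
    )

    # Few-shot: the same task, preceded by a handful of worked examples.
    # The examples act as in-context demonstrations, not as training data.
    few_shot_prompt = (
        "Translate English to French.\n"
        "English: Good morning. -> French: Bonjour.\n"
        "English: Thank you very much. -> French: Merci beaucoup.\n"
        "English: The weather is beautiful today. -> French:"
    )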

OpenAI Models as Assistants

Generative AI and Generative Pre-trained Transformer (GPT) assistants will pervade everyday applications. From a software development perspective, nothing will ever return to the old days. ChatGPT-like models will boost the productivity of everyday software development. In this section, we will unleash the power of GPTs to use them as assistants to explain OpenAI models and engines.

Go to this link to access ChatGPT Plus: https://chat.openai.com/

If you don't wish to subscribe, you can try OpenAI's free version: https://openai.com/chatgpt

ChatGPT Plus offers services such as GPT-3.5, GPT-4, and Plugins. The cutoff date of GPT-4's training might limit its responses since the available data is in the past. Therefore, plugins can come in handy, such as the Bing feature that has been implemented in OpenAI. However, the plugins are continually updated, and some are retired. ChatGPT Plus models will evolve, and you will grow with them!

You can...

Getting Started with the GPT-4 API

OpenAI has some of the most powerful transformer engines in the world. One GPT-4 model can perform hundreds of tasks. GPT-3 can do many tasks it wasn't trained for.

This section will use the API in Getting_Started_GPT_4_API.ipynb. To use GPT-3, go to OpenAI's website, https://openai.com/, and sign up. We can run the examples provided by OpenAI to get started. We are once again relying on assistants.

Running our first NLP task with GPT-4

Let's start using GPT-4 in a few steps. Go to Google Colab and open Getting_Started_GPT_4_API.ipynb, which is in the chapter directory of the book's GitHub repository. You do not need to change the hardware settings of the notebook. We are using an API, so we will not need much local computing power for the tasks in this section. The steps of this section are the same ones as in the notebook. Running an NLP task is done in three simple steps:

Step 1: Installing OpenAI and Step 2: Entering the API key

Steps 1 and 2 are the...
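For orientation, here is a minimal sketch of what Steps 1 and 2 (plus a first request) typically look like with the openai Python client. It is an illustration under assumptions, not the notebook's exact code: the interface shown is that of version 1.x of the openai package, and the model name and prompt are placeholders.

    # Step 1: install the OpenAI Python client (run in a notebook cell)
    # !pip install openai

    # Step 2: enter the API key and create a client
    from openai import OpenAI

    client = OpenAI(api_key="YOUR_API_KEY")  # replace with your own key

    # Step 3: run a first NLP task, for example a grammar correction
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a grammar correction assistant."},
            {"role": "user", "content": "Correct this sentence: She no went to the market."},
        ],
    )
    print(response.choices[0].message.content)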

Retrieval Augmented Generation (RAG) with GPT-4

In this section, we will build an introductory program that implements Retrieval Augmented Generation (RAG). Document retrieval is not new. Knowledge bases have been around since the arrival of queries on databases decades ago. Generative AI isn't new, either. RNNs were AI-driven text generators years ago. Taking these factors into account, we can say that RAG is not an innovation but an improvement that compensates for the lack of precision, training data, and responses of generative AI models. It can also avoid fine-tuning a model in some instances. There are also different ways of performing augmented generation, as we will see (a minimal sketch of the core retrieve-and-generate loop follows this list), among which:

  • Chapter 11, Leveraging LLM Embeddings as an Alternative to Fine-Tuning, is where we will implement embedded data.
  • Chapter 15, Guarding the Giants: Mitigating Risks in Large Language Models, in which one of the mitigating solutions is to implement knowledge bases.
  • Chapter 20: Beyond Human-Designed...
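As announced above, here is a minimal RAG sketch under stated assumptions: it uses the openai Python client's embeddings and chat endpoints, a tiny in-memory document list, and cosine similarity for retrieval. The model names, documents, and helper functions are illustrative, not the book's program.

    import numpy as np
    from openai import OpenAI

    client = OpenAI(api_key="YOUR_API_KEY")  # replace with your own key

    # A tiny in-memory "knowledge base" (illustrative documents)
    documents = [
        "GPT-3 was trained on about 400 billion byte-pair-encoded tokens.",
        "RAG retrieves relevant documents and adds them to the prompt.",
        "Fine-tuning updates model parameters on a task-specific dataset.",
    ]

    def embed(texts):
        # One embedding vector per input text
        result = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return [np.array(item.embedding) for item in result.data]

    doc_vectors = embed(documents)

    def retrieve(query, k=1):
        # Rank documents by cosine similarity to the query embedding
        q = embed([query])[0]
        scores = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v)) for v in doc_vectors]
        best = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)[:k]
        return [documents[i] for i in best]

    query = "How does RAG improve a generative model's answers?"
    context = "\n".join(retrieve(query))

    # Augment the prompt with the retrieved context before calling GPT-4
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    print(response.choices[0].message.content)

The retrieval step is what distinguishes RAG from plain prompting: the generated answer is grounded in documents that were never part of the model's training data, which is why RAG can compensate for missing or outdated knowledge without fine-tuning.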

Summary

We began the chapter by seeing how OpenAI Generative Pre-trained Transformer (GPT) models are General Purpose Technologies (GPTs). As such, they improve rapidly, and their diffusion is highly pervasive. The Generative AI functionality of OpenAI's models has opened the horizons for mainstream applications.

We continued by examining the improvements in the architecture of GPTs through decoder stacks, scaling, and machine power. These improvements led to the Big Bang creation of ChatGPT, which spread into mainstream everyday lives, pervading applications such as search engines, Office tools, and more.

We started with some of the many generative transformer assistants, including ChatGPT, New Bing, GitHub Copilot, Microsoft 365 Office Copilot, and OpenAI's Playground. Our journey then led to building several examples with the OpenAI GPT-4 API, such as grammar corrections, translations, and more. Finally, we built an example of how Retrieval Augmented Generation (RAG) can...

Questions

  1. A zero-shot method trains the parameters once. (True/False)
  2. Gradient updates are performed when running zero-shot models. (True/False)
  3. GPT models only have a decoder stack. (True/False)
  4. OpenAI GPT models are not GPTs. (True/False)
  5. The diffusion of generative transformer models is very slow in everyday applications. (True/False)
  6. GPT-3 models have been useless since GPT-4 was made public. (True/False)
  7. ChatGPT models are not completion models. (True/False)
  8. Gradio is a transformer model. (True/False)
  9. Supercomputers with 285,000 CPUs do not exist. (True/False)
  10. Supercomputers with thousands of GPUs are game-changers in AI. (True/False)

References

OpenAI, 2023, GPT-4 Technical Report: https://arxiv.org/pdf/2303.08774.pdf

Further Reading

  • Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman, 2019, SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems: https://w4ngatang.github.io/static/papers/superglue.pdf
  • Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei, 2020, Language Models are Few-Shot Learners: https://arxiv.org/abs/2005.14165
  • Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman, 2019, GLUE: A Multi-Task Benchmark and...

Our book's Discord space

Join the book's Discord workspace: https://www.packt.link/Transformers


Questions

  1. It is useless to fine-tune an OpenAI model. (True/False)
  2. Any pretrained OpenAI model can do the task we need without fine-tuning. (True/False)
  3. We don’t need to prepare a dataset to fine-tune an OpenAI model. (True/False)
  4. We don’t need one if no datasets are available on the web (follow-up question to Question 3). (True/False)
  5. We don’t need to keep track of the fine-tunes we created. (True/False)
  6. As of January 2024, anybody can access our fine-tunes. (True/False)
  7. A standard model can sometimes produce a similar output to a fine-tuned model. (True/False)
  8. GPT-4 cannot be fine-tuned. (True/False)
  9. GPT-3 cannot be fine-tuned. (True/False)
  10. We can provide raw data with no preparation for fine-tuning. (True/False)

Further reading

  • Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, and Sanjeev Arora, 2023, Fine-Tuning Language Models with Just Forward Passes: https://arxiv.org/abs/2305.17333

