You're reading from Transformers for Natural Language Processing and Computer Vision - Third Edition

Product typeBook

Published inFeb 2024

Reading LevelN/a

PublisherPackt

ISBN-139781805128724

Edition3rd Edition

Languages

Python

Tools

PyTorch

Concepts

Deep Learning

Author (1)

Denis Rothman

What this book covers

Part I: The Foundations of Transformers

Chapter 1, What Are Transformers?, explains, at a high level, what transformers and Foundation Models are. We will first unveil the incredible power of the deceptively simple O(1) time complexity of transformer models that changed everything. We will continue to discover how a hardly known transformer algorithm in 2017 rose to dominate so many domains and brought us Foundation Models.

Chapter 2, Getting Started with the Architecture of the Transformer Model, goes through the background of NLP to understand how RNN, LSTM, and CNN architectures were abandoned and how the transformer architecture opened a new era. We will go through the Original Transformer’s architecture through the unique Attention Is All You Need approach invented by the Google Research and Google Brain authors. We will describe the theory of transformers. We will get our hands dirty in Python to see how multi-attention head sublayers work.

Chapter 3, Emergent vs Downstream Tasks: The Unseen Depths of Transformers, bridges the gap between the functional and mathematical architecture of transformers by introducing emergence. We will then see how to measure the performance of transformers before exploring several downstream tasks, such as the Standard Sentiment TreeBank (SST-2), linguistic acceptability, and Winograd schemas.

Chapter 4, Advancements in Translations with Google Trax, Google Translate, and Gemini, goes through machine translation in three steps. We will first define what machine translation is. We will then preprocess a Workshop on Machine Translation (WMT) dataset. Finally, we will see how to implement machine translations.

Chapter 5, Diving into Fine-Tuning through BERT, builds on the architecture of the Original Transformer. Bidirectional Encoder Representations from Transformers (BERT) takes transformers into a vast new way of perceiving the world of NLP. Instead of analyzing a past sequence to predict a future sequence, BERT attends to the whole sequence! We will first go through the key innovations of BERT’s architecture and then fine-tune a BERT model by going through each step in a Google Colaboratory notebook. Like humans, BERT can learn tasks and perform other new ones without having to learn the topic from scratch.

Chapter 6, Pretraining a Transformer from Scratch through RoBERTa, builds a RoBERTa transformer model from scratch using the Hugging Face PyTorch modules. The transformer will be both BERT-like and DistilBERT-like. First, we will train a tokenizer from scratch on a customized dataset. Finally, we will put the knowledge acquired in this chapter to work and pretrain a Generative AI customer support model on X (formerly Twitter) data.

Part II: The Rise of Suprahuman NLP

Chapter 7, The Generative AI Revolution with ChatGPT, goes through the tremendous improvements and diffusion of ChatGPT models into the everyday lives of developers and end-users. We will first examine the architecture of OpenAI’s GPT models before working with the GPT-4 API and its hyperparameters to implement several NLP examples. Finally, we will learn how to obtain better results with Retrieval Augmented Generation (RAG). We will implement an example of automated RAG with GPT-4.

Chapter 8, Fine-Tuning OpenAI GPT Models, explores fine-tuning to make sense of the choices we can make for a project to go in this direction or not. We will introduce risk management perspectives. We will prepare a dataset and fine-tune a cost-effective babbage-02 model for a completion task.

Chapter 9, Shattering the Black Box with Interpretable Tools, lifts the lid on the black box that is transformer models by visualizing their activity. We will use BertViz to visualize attention heads, Language Interpretability Tool (LIT) to carry out a Principal Component Analysis (PCA), and LIME to visualize transformers via dictionary learning. OpenAI LLMs will take us deeper and visualize the activity of a neuron in a transformer with an interactive interface. This approach opens the door to GPT-4 explaining a transformer, for example.

Chapter 10, Investigating the Role of Tokenizers in Shaping Transformer Models, introduces some tokenizer-agnostic best practices to measure the quality of a tokenizer. We will describe basic guidelines for datasets and tokenizers from a tokenization perspective. We will explore word and subword tokenizers and show how a tokenizer can shape a transformer model’s training and performance. Finally, we will build a function to display and control token-ID mappings.

Chapter 11, Leveraging LLM Embeddings as an Alternative to Fine-Tuning, explains why searching with embeddings can sometimes be a very effective alternative to fine-tuning. We will go through the advantages and limits of this approach. We will go through the fundamentals of text embeddings. We will build a program that reads a file, tokenizes it, and embeds it with Gensim and Word2Vec. We will implement a question-answering program on sports events and use OpenAI Ada to embed Amazon fine food reviews. By the end of the chapter, we will have taken a system from prompt design to advanced prompt engineering using embeddings for RAG.

Chapter 12, Toward Syntax-Free Semantic Role Labeling with ChatGPT and GPT-4, goes through the revolutionary concepts of syntax-free, nonrepetitive stochastic models. We will use ChatGPT Plus with GPT-4 to run easy to complex Semantic Role Labeling (SRL) samples. We will see how a general-purpose, emergent model reacts to our SRL requests. We will progressively push the transformer model to the limits of SRL.

Chapter 13, Summarization with T5 and ChatGPT, goes through the concepts and architecture of the T5 transformer model. We will then apply T5 to summarize documents with Hugging Face models. The examples in this chapter will be legal and medical to explore domain-specific summarization beyond simple texts. We are not looking for an easy way to implement NLP but preparing ourselves for the reality of real-life projects. We will then compare T5 and ChatGPT approaches to summarization.

Chapter 14, Exploring Cutting-Edge LLMs with Vertex AI and PaLM 2, examines Pathways to understand PaLM. We will continue and look at the main features of PaLM (Pathways Language Model), a decoder-only, densely activated, and autoregressive transformer model with 540 billion parameters trained on Google’s Pathways system. We will see how Google PaLM 2 can perform a chat task, a discriminative task (such as classification), a completion task (also known as a generative task), and more. We will implement the Vertex AI PaLM 2 API for several NLP tasks, including question-answering and summarization. Finally, we will go through Google Cloud’s fine-tuning process.

Chapter 15, Guarding the Giants: Mitigating Risks in Large Language Models, examines the risks of LLMs, risk management, and risk mitigation tools. The chapter explains hallucinations, memorization, risky emergent behavior, disinformation, influence operations, harmful content, adversarial attacks (“jailbreaks”), privacy, cybersecurity, overreliance, and memorization. We will then go through some risk mitigation tools through advanced prompt engineering, such as implementing a moderation model, a knowledge base, keyword parsing, prompt pilots, post-processing moderation, and embeddings.

Part III: Generative Computer Vision: A New Way to See the World

Chapter 16, Beyond Text: Vision Transformers in the Dawn of Revolutionary AI, explores the innovative transformer models that respect the basic structure of the Original Transformer but make some significant changes. We will discover powerful computer vision transformers like ViT, CLIP, DALL-E, and GPT-4V. We will implement vision transformers in code, including GPT-4V, and expand the text-image interactions of DALL-3 to divergent semantic association. We will take OpenAI models into the nascent world of highly divergent semantic association creativity.

Chapter 17, Transcending the Image-Text Boundary with Stable Diffusion, delves into to diffusion models, introducing Stable Vision, which has created a disruptive generative image AI wave rippling through the market. We will then dive into the principles, math, and code of the remarkable Keras Stable Diffusion model. We will go through each of the main components of a Stable Diffusion model and peek into the source code provided by Keras and run the model. We will run a text-to-video synthesis model with Hugging Face and a video-to-text task with Meta’s TimeSformer.

Chapter 18, Hugging Face AutoTrain: Training Vision Models without Coding, explores how to train a vision transformer using Hugging Face’s AutoTrain. We will go through the automated training process and discover the unpredictable problems that show why even automated ML requires human AI expertise. The goal of this chapter is also to show how to probe the limits of a computer vision model, no matter how sophisticated it is.

Chapter 19, On the Road to Functional AGI with HuggingGPT and its Peers, shows how we can use cross-platform chained models to solve difficult image classification problems. We will put HuggingGPT and Google Cloud Vision to work to identify easy, difficult, and very difficult images. We will go beyond classical pipelines and explore how to chain heterogeneous competing models.

Chapter 20, Beyond Human-Designed Prompts with Generative Ideation, explores generative ideation, an ecosystem that automates the production of an idea to text and image content. The development phase requires highly skilled human AI experts. For an end user, the ecosystem is a click-and-run experience. By the end of this chapter, we will be able to deliver ethical, exciting, generative ideation to companies with no marketing resources. We will be able to expand generative ideation to any field in an exciting, cutting-edge, yet ethical ecosystem.

The rest of the page is locked

You have been reading a chapter from

Transformers for Natural Language Processing and Computer Vision - Third Edition

Published in: Feb 2024Publisher: PacktISBN-13: 9781805128724

Author (1)

Denis Rothman

Denis Rothman graduated from Sorbonne University and Paris-Diderot University, designing one of the very first word2matrix patented embedding and patented AI conversational agents. He began his career authoring one of the first AI cognitive Natural Language Processing (NLP) chatbots applied as an automated language teacher for Moet et Chandon and other companies. He authored an AI resource optimizer for IBM and apparel producers. He then authored an Advanced Planning and Scheduling (APS) solution used worldwide.
Read more about Denis Rothman

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages

You're reading from Transformers for Natural Language Processing and Computer Vision - Third Edition

What this book covers

Part I: The Foundations of Transformers

Part II: The Rise of Suprahuman NLP

Part III: Generative Computer Vision: A New Way to See the World

Unlock this book and the full library FREE for 7 days

Author (1)

Et al.

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

Mastering Tableau 2023

Building AI Applications with ChatGPT APIs

Building AI Applications with ChatGPT APIs

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

Modern Data Architecture on AWS

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

TinyML Cookbook