You're reading from Transformers for Natural Language Processing - Second Edition (2nd Edition, Packt, March 2022, ISBN-13 9781803247335).

Author: Denis Rothman

Denis Rothman graduated from Sorbonne University and Paris-Diderot University, designing one of the very first word2matrix patented embeddings and patented AI conversational agents. He began his career authoring one of the first AI cognitive Natural Language Processing (NLP) chatbots, applied as an automated language teacher for Moet et Chandon and other companies. He authored an AI resource optimizer for IBM and apparel producers. He then authored an Advanced Planning and Scheduling (APS) solution used worldwide.
Chapter 1, What are Transformers?

  1. We are still in the Third Industrial Revolution. (True/False)

    False. Eras in history indeed overlap. However, the Third Industrial Revolution focused on making the world digital. The Fourth Industrial Revolution has begun to connect everything to everything else: systems, machines, bots, robots, algorithms, and more.

  1. The Fourth Industrial Revolution is connecting everything to everything else. (True/False)

    True. This leads to an increasing amount of automated decisions that formerly required human intervention.

  1. Industry 4.0 developers will sometimes have no AI development to do. (True/False)

    True. In some projects, AI will be an online service that requires no development.

  1. Industry 4.0 developers might have to implement transformers from scratch. (True/False)

    True. In some projects, not all, standard online services or APIs might not satisfy the needs of a project. There...

Chapter 2, Getting Started with the Architecture of the Transformer Model

  1. NLP transduction can encode and decode text representations. (True/False)

    True. NLP transduction converts sequences (written or oral) into numerical representations, processes them, and decodes the results back into text.

  1. Natural Language Understanding (NLU) is a subset of Natural Language Processing (NLP). (True/False)

    True.

  1. Language modeling algorithms generate probable sequences of words based on input sequences. (True/False)

    True.

  1. A transformer is a customized LSTM with a CNN layer. (True/False)

    False. A transformer does not contain an LSTM or a CNN at all.

  1. A transformer does not contain LSTM or CNN layers. (True/False)

    True.

  1. Attention examines all the tokens in a sequence, not just the last one. (True/False)

    True.

  1. A transformer does not use positional encoding...
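The answers above note that attention weighs every token in a sequence against every other token, not just the last one. Here is a minimal sketch of scaled dot-product attention in NumPy with toy matrices (not code from the book) to make that concrete:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy scaled dot-product attention: every query attends to every key."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # similarity of each token with all tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the whole sequence
    return weights @ V                             # weighted sum over all tokens

# Three tokens with four-dimensional vectors (random toy values)
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```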

Chapter 3, Fine-Tuning BERT Models

  1. BERT stands for Bidirectional Encoder Representations from Transformers. (True/False)

    True.

  1. BERT is a two-step framework. Step 1 is pretraining. Step 2 is fine-tuning. (True/False)

    True.

  1. Fine-tuning a BERT model implies training parameters from scratch. (True/False)

    False. BERT fine-tuning is initialized with the trained parameters of pretraining.

  1. BERT only pretrains using all downstream tasks. (True/False)

    False.

  1. BERT pretrains on Masked Language Modeling (MLM). (True/False)

    True.

  1. BERT pretrains on Next Sentence Prediction (NSP). (True/False)

    True.

  1. BERT pretrains on mathematical functions. (True/False)

    False.

  1. A question-answer task is a downstream task. (True/False)

    True.

  1. A BERT pretraining model does not require tokenization. (True/False)

    False.

    ...
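To illustrate the Masked Language Modeling objective mentioned in the answers above, a quick sketch with the Hugging Face pipeline API (the bert-base-uncased checkpoint is an assumed example, not necessarily the one used in the chapter):

```python
from transformers import pipeline

# Fill-mask predicts the token hidden behind [MASK], which is how BERT is pretrained (MLM)
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("Paris is the [MASK] of France."):
    print(prediction["token_str"], round(prediction["score"], 3))
```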

Chapter 4, Pretraining a RoBERTa Model from Scratch

  1. RoBERTa uses a byte-level byte-pair encoding tokenizer. (True/False)

    True.

  1. A trained Hugging Face tokenizer produces merges.txt and vocab.json. (True/False)

    True.

  1. RoBERTa does not use token-type IDs. (True/False)

    True.

  1. DistilBERT has 6 layers and 12 heads. (True/False)

    True.

  1. A transformer model with 80 million parameters is enormous. (True/False)

    False. 80 million parameters is a small model.

  1. We cannot train a tokenizer. (True/False)

    False. A tokenizer can be trained.

  1. A BERT-like model has six decoder layers. (True/False)

    False. The BERT-like model built in this chapter has six encoder layers, not decoder layers.

  1. MLM predicts a word contained in a mask token in a sentence. (True/False)

    True.

  1. A BERT-like model has no self-attention sublayers. (True/False)

    False. BERT has self-attention sublayers in each of its encoder layers.
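As a sketch of how a byte-level BPE tokenizer can be trained to produce merges.txt and vocab.json, assuming the Hugging Face tokenizers library; the corpus path and output directory are placeholders:

```python
import os
from tokenizers import ByteLevelBPETokenizer

# Train a byte-level BPE tokenizer on a plain-text corpus (placeholder path)
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["my_corpus.txt"],
    vocab_size=52_000,
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

os.makedirs("my_tokenizer", exist_ok=True)
tokenizer.save_model("my_tokenizer")  # writes vocab.json and merges.txt
```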

Chapter 5, Downstream NLP Tasks with Transformers

  1. Machine intelligence uses the same data as humans to make predictions. (True/False)

    True and False.

    True. In some cases, machine intelligence surpasses humans by processing massive amounts of data, extracting meaning, and performing tasks that would take humans centuries.

    False. For NLU, humans have access to more information through their senses. Machine intelligence relies only on the data humans provide, in all types of media.

  1. SuperGLUE is more difficult than GLUE for NLP models. (True/False)

    True.

  1. BoolQ expects a binary answer. (True/False)

    True.

  1. WiC stands for Words in Context. (True/False)

    True.

  1. Recognizing Textual Entailment (RTE) detects whether one sequence entails another sequence. (True/False)

    True.

  1. A Winograd schema predicts whether a verb is spelled correctly. (True/False)

    False. A Winograd schema disambiguates pronouns in a sentence; it does not check spelling.
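To see that BoolQ expects a binary answer, a small sketch using the Hugging Face datasets library (assuming the super_glue dataset configuration is available):

```python
from datasets import load_dataset

# BoolQ pairs a passage with a yes/no question; the label is binary (0 or 1)
boolq = load_dataset("super_glue", "boolq", split="validation")
example = boolq[0]
print(example["question"])
print(example["label"])  # 0 or 1: the answer is binary
```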

Chapter 6, Machine Translation with the Transformer

  1. Machine translation has now exceeded human baselines. (True/False)

    False. Machine translation remains one of the most challenging NLP tasks and has not yet consistently exceeded human baselines.

  1. Machine translation requires large datasets. (True/False)

    True.

  1. There is no need to compare transformer models using the same datasets. (True/False)

    False. The only way to compare different models is to use the same datasets.

  1. BLEU is the French word for blue and is the acronym of an NLP metric. (True/False)

    True. BLEU stands for Bilingual Evaluation Understudy Score, making it easy to remember.

  1. Smoothing techniques enhance BLEU. (True/False)

    True. Smoothing techniques, such as Chen-Cherry smoothing, improve the BLEU evaluation of short sequences (see the sketch after this chapter's answers).

  1. German-English is the same as English-German for machine translation. (True/False)

    False. Representing German and then translating into another language is not the same process as representing English and translating into another...
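The answers above mention BLEU and smoothing. A minimal NLTK sketch (toy sentences, assuming nltk is installed) shows how Chen-Cherry smoothing stabilizes the score on short sequences:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "sat", "on", "the", "mat"]]
candidate = ["the", "cat", "is", "on", "the", "mat"]

chencherry = SmoothingFunction()
raw = sentence_bleu(reference, candidate)  # can collapse to ~0 on short sentences with no 4-gram match
smoothed = sentence_bleu(reference, candidate, smoothing_function=chencherry.method3)
print(round(raw, 4), round(smoothed, 4))
```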

Chapter 7, The Rise of Suprahuman Transformers with GPT-3 Engines

  1. A zero-shot method trains the parameters once. (True/False)

    False. No parameters are trained.

  1. Gradient updates are performed when running zero-shot models. (True/False)

    False.

  1. GPT models only have a decoder stack. (True/False)

    True.

  1. It is impossible to train a 117M GPT model on a local machine. (True/False)

    False. We trained one in this chapter.

  1. It is impossible to train the GPT-2 model with a specific dataset. (True/False)

    False. We trained one in this chapter.

  1. A GPT-2 model cannot be conditioned to generate text. (True/False)

    False. We implemented this in this chapter.

  1. A GPT-2 model can analyze the context of input and produce completion content. (True/False)

    True.

  1. We cannot interact with a 345M-parameter GPT model on a machine with fewer than eight GPUs. ...
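To illustrate conditioning a GPT-2 model on an input context, as described in the answers above, a quick sketch with the Hugging Face pipeline and the public gpt2 checkpoint (an assumed example, not the exact setup of the chapter):

```python
from transformers import pipeline, set_seed

# Condition GPT-2 on a context and let it generate completion content
generator = pipeline("text-generation", model="gpt2")
set_seed(42)
outputs = generator("The Fourth Industrial Revolution connects", max_length=30, num_return_sequences=2)
for out in outputs:
    print(out["generated_text"])
```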

Chapter 9, Matching Tokenizers and Datasets

  1. A tokenized dictionary contains every word that exists in a language. (True/False)

    False.

  1. Pretrained tokenizers can encode any dataset. (True/False)

    False.

  1. It is good practice to check a database before using it. (True/False)

    True.

  1. It is good practice to eliminate obscene data from datasets. (True/False)

    True.

  1. It is good practice to delete data containing discriminating assertions. (True/False)

    True.

  1. Raw datasets might sometimes produce relationships between noisy content and useful content. (True/False)

    True.

  1. A standard pretrained tokenizer contains the English vocabulary of the past 700 years. (True/False)

    False.

  1. Old English can create problems when encoding data with a tokenizer trained in modern English. (True/False)

    True.

  1. Medical and other types of jargon...
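The answers above point out that a tokenizer trained on modern English can struggle with archaic words and jargon. A short sketch, assuming the bert-base-uncased tokenizer, shows rare words being split into subword pieces:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Common modern words usually stay whole; archaic or jargon words are split into '##' subword pieces
print(tokenizer.tokenize("knight"))
print(tokenizer.tokenize("whilom"))                  # archaic word, likely broken into several subwords
print(tokenizer.tokenize("electroencephalography"))  # medical jargon, also split into pieces
```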

Chapter 10, Semantic Role Labeling with BERT-Based Transformers

  1. Semantic Role Labeling (SRL) is a text generation task. (True/False)

    False.

  1. A predicate is a noun. (True/False)

    False.

  1. A verb is a predicate. (True/False)

    True.

  1. Arguments can describe who and what is doing something. (True/False)

    True.

  1. A modifier can be an adverb. (True/False)

    True.

  1. A modifier can be a location. (True/False)

    True.

  1. A BERT-based model contains encoder and decoder stacks. (True/False)

    False.

  1. A BERT-based SRL model has standard input formats. (True/False)

    True.

  1. Transformers can solve any SRL task. (True/False)

    False.
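The answers above describe predicates, arguments, and modifiers. The following is a purely hypothetical SRL frame for one sentence, shown as a plain data structure to illustrate the labels (it is not the chapter's AllenNLP output format):

```python
# Hypothetical SRL frame for "Bob bought a car in Paris yesterday."
srl_frame = {
    "predicate": "bought",        # the verb is the predicate
    "ARG0": "Bob",                # who is doing the action
    "ARG1": "a car",              # what the action is done to
    "ARGM-LOC": "in Paris",       # a modifier can be a location
    "ARGM-TMP": "yesterday",      # a modifier can be an adverbial expression of time
}
print(srl_frame["predicate"], "->", srl_frame["ARG0"], "/", srl_frame["ARG1"])
```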

Chapter 11, Let Your Data Do the Talking: Story, Questions, and Answers

  1. A trained transformer model can answer any question. (True/False)

    False.

  1. Question-answering requires no further research. It is perfect as it is. (True/False)

    False.

  1. Named Entity Recognition (NER) can provide useful information when looking for meaningful questions. (True/False)

    True.

  1. Semantic Role Labeling (SRL) is useless when preparing questions. (True/False)

    False.

  1. A question generator is an excellent way to produce questions. (True/False)

    True.

  1. Implementing question-answering requires careful project management. (True/False)

    True.

  1. ELECTRA models have the same architecture as GPT-2. (True/False)

    False.

  1. ELECTRA models have the same architecture as BERT but are trained as discriminators. (True/False)

    True.

  1. NER can recognize a location...
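As a brief illustration of extractive question answering over a short story, a sketch with the Hugging Face pipeline (the default QA checkpoint is assumed):

```python
from transformers import pipeline

# Extractive question answering: the answer is a span of the story
qa = pipeline("question-answering")
story = ("Jo and Maria took a taxi to the airport on Monday. "
         "Their flight to Rome left two hours late.")
result = qa(question="Where did Jo and Maria fly to?", context=story)
print(result["answer"], round(result["score"], 3))
```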

Chapter 12, Detecting Customer Emotions to Make Predictions

  1. It is not necessary to pretrain transformers for sentiment analysis. (True/False)

    False.

  1. A sentence is always positive or negative. It cannot be neutral. (True/False)

    False.

  1. The principle of compositionality signifies that a transformer must grasp every part of a sentence to understand it. (True/False)

    True.

  1. RoBERTa-large was designed to improve the pretraining process of transformer models. (True/False)

    True.

  1. A transformer can provide feedback that informs us of whether a customer is satisfied or not. (True/False)

    True.

  1. If the sentiment analysis of a product or service is consistently negative, it helps us make appropriate decisions to improve our offer. (True/False)

    True.

  1. If a model fails to provide a good result on a task, it requires more training or fine-tuning before changing models...
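To show how a transformer can report whether a customer is satisfied, a minimal sentiment-analysis sketch with the Hugging Face pipeline (default checkpoint assumed, not necessarily the RoBERTa-large setup of the chapter):

```python
from transformers import pipeline

# Classify customer feedback as positive or negative
classifier = pipeline("sentiment-analysis")
reviews = [
    "The delivery was fast and the product works perfectly.",
    "The support team never answered my emails.",
]
for review, result in zip(reviews, classifier(reviews)):
    print(result["label"], round(result["score"], 3), "-", review)
```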

Chapter 13, Analyzing Fake News with Transformers

  1. News labeled as fake news is always fake. (True/False)

    False.

  1. News that everybody agrees with is always accurate. (True/False)

    False.

  1. Transformers can be used to run sentiment analysis on Tweets. (True/False)

    True.

  1. Key entities can be extracted from Facebook messages with a DistilBERT model running NER. (True/False)

    True.

  1. Key verbs can be identified from YouTube chats with BERT-based models running SRL. (True/False)

    True.

  1. Emotional reactions are a natural first response to fake news. (True/False)

    True.

  1. A rational approach to fake news can help clarify one’s position. (True/False)

    True.

  1. Connecting transformers to reliable websites can help somebody understand why some news is fake. (True/False)

    True.

  1. Transformers can make summaries of reliable websites...
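To illustrate extracting key entities from short messages, a sketch with the Hugging Face token-classification pipeline (its default checkpoint is assumed; the chapter's DistilBERT setup may differ):

```python
from transformers import pipeline

# Named Entity Recognition on a short message, with entities grouped into spans
ner = pipeline("ner", aggregation_strategy="simple")
text = "The report was shared on Twitter by a journalist based in Berlin."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```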

Chapter 14, Interpreting Black Box Transformer Models

  1. BERTViz only shows the output of the last layer of the BERT model. (True/False)

    False. BERTViz displays the outputs of all the layers.

  1. BERTViz shows the attention heads of each layer of a BERT model. (True/False)

    True.

  1. BERTViz shows how the tokens relate to each other. (True/False)

    True.

  1. LIT shows the inner workings of the attention heads like BERTViz. (True/False)

    False. LIT does not display the inner workings of the attention heads the way BERTViz does. However, LIT provides non-probing analyses such as PCA and UMAP projections.

  1. Probing is a way for an algorithm to predict language representations. (True/False)

    True.

  1. NER is a probing task. (True/False)

    True.

  1. PCA and UMAP are non-probing tasks. (True/False)

    True.

  1. LIME is model-agnostic. (True/False)

    True.

  1. Transformers deepen the relationships of the tokens layer by layer. (True/False)

    True.

    ...
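A notebook sketch of BERTViz's head_view, which displays the attention heads of every layer as described in the answers above (assumes the bertviz package and a Jupyter or Colab environment):

```python
from transformers import BertTokenizer, BertModel
from bertviz import head_view

# Load BERT with attention outputs enabled so every layer's heads can be visualized
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

inputs = tokenizer.encode("The cat sat on the mat", return_tensors="pt")
outputs = model(inputs)
attention = outputs.attentions                        # one attention tensor per layer, all heads
tokens = tokenizer.convert_ids_to_tokens(inputs[0])
head_view(attention, tokens)                          # interactive view of all layers and heads
```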

Chapter 15, From NLP to Task-Agnostic Transformer Models

  1. Reformer transformer models don’t contain encoders. (True/False)

    False. Reformer transformer models contain encoders.

  1. Reformer transformer models don’t contain decoders. (True/False)

    False. Reformer transformer models contain encoders and decoders.

  1. The inputs are stored layer by layer in Reformer models. (True/False)

    False. The inputs are recomputed at each layer during backpropagation instead of being stored, thus saving memory.

  1. DeBERTa transformer models disentangle content and positions. (True/False)

    True.

  1. It is necessary to test the hundreds of pretrained transformer models before choosing one for a project. (True/False)

    True and False. You can try all of the models, or you can choose a very reliable model and implement it to fit your needs.

  1. The latest transformer model is always the best. (True/False)

    True and false. A lot of research...
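As a sketch of running a pretrained Reformer model, assuming the transformers library and the public google/reformer-crime-and-punishment checkpoint (an assumed example, not necessarily the chapter's):

```python
from transformers import ReformerModelWithLMHead, ReformerTokenizer

# Reformer uses reversible layers: layer inputs are recomputed during backpropagation
# rather than stored, which saves memory on long sequences.
tokenizer = ReformerTokenizer.from_pretrained("google/reformer-crime-and-punishment")
model = ReformerModelWithLMHead.from_pretrained("google/reformer-crime-and-punishment")

input_ids = tokenizer("A few months later", return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_length=60, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0]))
```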

Chapter 16, The Emergence of Transformer-Driven Copilots

  1. AI copilots that can generate code automatically do not exist. (True/False)

    False. GitHub Copilot, for example, is now in production.

  1. AI copilots will never replace humans. (True/False)

    True and false. AI will take over many tasks in sales, support, maintenance, and other domains. However, many complex tasks will still require human intervention.

  1. GPT-3 engines can only do one task. (True/False)

    False. GPT-3 engines can do a wide variety of tasks.

  1. Transformers can be trained to be recommenders. (True/False)

    True. Transformers have gone from language sequences to sequences in many domains.

  1. Transformers can only process language. (True/False)

    False. Once transformers are trained for language sequences, they can analyze many other types of sequences.

  1. A transformer sequence can only contain words. (True/False) ...

Chapter 17, The Consolidation of Suprahuman Transformers with OpenAI’s ChatGPT and GPT-4

  1. GPT-4 is sentient. (True/False)

    False. GPT-4 is a mathematical algorithm. It does not need to be sentient to learn statistical patterns to do a wide variety of tasks.

  2. ChatGPT can replace a human expert. (True/False)

    False. ChatGPT can produce results based on its datasets. However, it cannot make subject matter expert (SME) decisions.

  3. GPT-4 can generate source code for any task. (True/False)

    False. GPT-4 can generate source code for many tasks. However, for complex problems, human intervention is required.

  4. Advanced prompt engineering is intuitive. (True/False)

    False. Advanced prompt engineering has become a skill that is based on in-depth knowledge of transformers. Advanced prompt engineering involves building knowledge bases, multiple types of objects for the completion of APIs, and understanding the many models available.

  5. The most...
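Advanced prompt engineering, as described in the answer above, typically combines a system role, task instructions, and context. A minimal sketch assuming the openai Python package's ChatCompletion interface (pre-1.0 versions); the model name and key handling are placeholders:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; keep real keys out of source code

# A system message frames the assistant's behavior; the user message carries the task and context
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain in two sentences why transformers replaced RNNs for translation."},
    ],
    temperature=0.2,
)
print(response["choices"][0]["message"]["content"])
```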