Chapter 6: Fine-Tuning Language Models for Token Classification

In this chapter, we will learn how to fine-tune language models for token classification tasks such as Named Entity Recognition (NER), Part-of-Speech (POS) tagging, and Question Answering (QA). We will focus on BERT more than on other language models and see how to apply it to NER, POS tagging, and QA. You will also become familiar with the theoretical background of these tasks, including their standard datasets and how to work with them. After finishing this chapter, you will be able to perform any token classification task with Transformers.

Specifically, we will cover fine-tuning BERT for token classification problems such as NER and POS tagging, fine-tuning a language model for an NER problem, and framing the QA problem as start/end token classification.

The following topics will be...

Technical requirements

We will be using a Jupyter notebook to run our coding exercises. Python 3.6+ and the following packages need to be installed:

  • sklearn
  • transformers 4.0+
  • datasets
  • seqeval
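
These can be installed from a notebook cell. The following is a minimal sketch (scikit-learn is the pip package name for sklearn; pinning exact versions is left to you):

    !pip install scikit-learn "transformers>=4.0.0" datasets seqeval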

All notebooks with coding exercises will be available at the following GitHub link: https://github.com/PacktPublishing/Mastering-Transformers/tree/main/CH06.

Check out the following link to see the Code in Action video: https://bit.ly/2UGMQP2

Introduction to token classification

Token classification is the task of assigning a label to each token in a token sequence: given a sequence of tokens, the model must classify every token into a class. POS tagging and NER are the two best-known tasks in this category, but QA is another major NLP task that can also be framed this way. We will discuss the basics of these three tasks in the following sections.
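
As a rough illustration (a toy example, not code from the book), token classification pairs every input token with exactly one label, for instance a POS tag:

    # Toy example: one POS label per token (hypothetical tags for illustration).
    tokens = ["Transformers", "are", "powerful", "models", "."]
    pos_tags = ["NOUN", "VERB", "ADJ", "NOUN", "PUNCT"]
    for token, tag in zip(tokens, pos_tags):
        print(f"{token}\t{tag}")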

Understanding NER

One of the best-known tasks in the category of token classification is NER – recognizing whether each token is an entity and identifying the type of each detected entity. A text can contain multiple entities at the same time – person names, locations, organizations, and other types of entities. The following text is a clear example of NER:

George Washington is one of the presidents of the United States of America.

George Washington is a person name, while the United States of America is a location name. A sequence...
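
To see this in action, the transformers pipeline API can tag such a sentence with a ready-made NER model. This is a minimal sketch assuming a reasonably recent transformers release; the pipeline's default checkpoint is an assumption, not the model fine-tuned later in this chapter:

    from transformers import pipeline

    # Loads a default pretrained NER model (an assumption, for illustration only).
    ner = pipeline("ner", aggregation_strategy="simple")
    sentence = "George Washington is one of the presidents of the United States of America."
    for entity in ner(sentence):
        print(entity["word"], entity["entity_group"], round(entity["score"], 3))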

Fine-tuning language models for NER

In this section, we will learn how to fine-tune BERT for an NER task. We will start by loading the conll2003 dataset with the datasets library.

The dataset card is accessible at https://huggingface.co/datasets/conll2003. The following screenshot shows this dataset card from the HuggingFace website:

Figure 6.4 – CONLL2003 dataset card from HuggingFace

From this screenshot, you can see that models trained on this dataset are listed in the right panel. The card also describes the dataset, such as its size and its characteristics:

  1. To load the dataset, the following commands are used:
    import datasets
    conll2003 = datasets.load_dataset("conll2003")

    A download progress bar will appear, and once downloading and caching have finished, the dataset will be ready to use; you can then inspect it as shown in the sketch after this step. The following screenshot shows the progress bars:

    Figure 6.5 – Downloading and preparing...
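
    Once the dataset is cached, a quick sanity check (a short sketch, not code from the book) is to look at one training example and the NER label names the dataset provides:

    # Inspect the tokens and NER tag IDs of the first training example,
    # then map the tag IDs to their human-readable names.
    print(conll2003["train"][0]["tokens"])
    print(conll2003["train"][0]["ner_tags"])
    print(conll2003["train"].features["ner_tags"].feature.names)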

Question answering using token classification

A QA problem is generally defined as an NLP problem in which a model is given a text and a question and must return an answer. Usually, the answer can be found in the original text, but there are different approaches to this problem. In the case of Visual Question Answering (VQA), the question is about a visual entity or visual concept rather than text, although the question itself is still expressed as text.

Some examples of VQA are as follows:

Figure 6.10 – VQA examples

Most models intended for VQA are multimodal: they must understand the visual context along with the question and generate the answer accordingly. However, unimodal (fully textual) QA, or simply QA, is based on a textual context and textual questions with corresponding textual answers:

  1. SQuAD is one of the most well-known datasets in the field of QA. To see and examine examples from SQuAD, you can use the following code:
    from...
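
The key idea behind treating QA as token classification is that an extractive QA model produces a start logit and an end logit for every token; the answer is the span between the best start and end positions. The following is a minimal sketch of this framing (the checkpoint name is an assumption for illustration, not the model used in the book, and PyTorch is assumed to be installed):

    from transformers import AutoTokenizer, AutoModelForQuestionAnswering
    import torch

    # A SQuAD-fine-tuned checkpoint, used here only for illustration.
    model_name = "distilbert-base-cased-distilled-squad"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    question = "Who is mentioned as a president of the United States?"
    context = "George Washington is one of the presidents of the United States of America."
    inputs = tokenizer(question, context, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # The highest-scoring start and end logits mark the answer span.
    start = int(outputs.start_logits.argmax())
    end = int(outputs.end_logits.argmax())
    print(tokenizer.decode(inputs["input_ids"][0][start:end + 1]))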

Summary

In this chapter, we discussed how to fine-tune a pretrained model for any token classification task. Fine-tuning models on NER and QA problems was explored, and using the pretrained and fine-tuned models on specific tasks with pipelines was detailed with examples. We also learned about the various preprocessing steps for these two tasks. Saving pretrained models that are fine-tuned on specific tasks was another major learning point of this chapter. Finally, we saw how to train models with a limited input size on tasks such as QA, where sequences can be longer than the model input, and how to use tokenizers more efficiently to split documents with a document stride.

In the next chapter, we will discuss text representation methods using Transformers. By studying that chapter, you will learn how to perform zero-/few-shot learning and semantic text clustering.
