Machine Learning Part 3 – Transformers and Large Language Models

In this chapter, we will cover the currently best-performing techniques in natural language processing (NLP) – transformers and pretrained models. We will discuss the concepts behind transformers and include examples of using transformers and large language models (LLMs) for text classification. The code for this chapter will be based on the TensorFlow/Keras Python libraries and the cloud services provided by OpenAI.

The topics covered in this chapter are important because although transformers and LLMs are only a few years old, they have become the state of the art for many different types of NLP applications. In fact, LLM systems such as ChatGPT have been widely covered in the press, and you have undoubtedly encountered references to them. You have probably even used their online interfaces. In this chapter, you will learn how to work with the technology behind these systems, which should be part of the...

Technical requirements

The code that we will go over in this chapter makes use of a number of open source software libraries and resources. We have used many of these in earlier chapters, but we will list them here for convenience:

  • The TensorFlow machine learning libraries: hub, text, and tf-models
  • The Python numerical package, NumPy
  • The Matplotlib plotting and graphical package
  • The IMDb movie reviews dataset
  • scikit-learn’s sklearn.model_selection module, to split the data into training, validation, and test sets
  • A BERT model from TensorFlow Hub: we’re using 'small_bert/bert_en_uncased_L-4_H-512_A-8', but you can use any other BERT model you like, bearing in mind that larger models may take much longer to train

Note that we have kept the models relatively small here so that they don’t require an especially powerful computer. The examples in this chapter were tested on a Windows 10 machine with an Intel 3.4 GHz...
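
Assuming this environment, a minimal setup along the following lines should cover the imports used in this chapter's examples. This is a sketch, not the chapter's exact code; the pip package names shown are the usual distribution names for the libraries listed above.

    # Suggested installation (pip package names; exact versions not prescribed):
    #   pip install tensorflow tensorflow-hub tensorflow-text tf-models-official
    #   pip install numpy matplotlib scikit-learn

    import numpy as np
    import matplotlib.pyplot as plt
    import tensorflow as tf
    import tensorflow_hub as hub
    import tensorflow_text as text  # registers the custom ops used by BERT preprocessing
    from official.nlp import optimization  # AdamW optimizer from tf-models-official
    from sklearn.model_selection import train_test_split

    print("TensorFlow version:", tf.__version__)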

Overview of transformers and LLMs

Transformers and LLMs are currently the best-performing technologies for natural language understanding (NLU). This does not mean that the approaches covered in earlier chapters are obsolete; depending on the requirements of a specific NLP project, some of the simpler approaches may be more practical or cost-effective. In this chapter, you will learn about the more recent approaches so that you can make that decision for your own projects.

There is a great deal of information about the theoretical aspects of these techniques available on the internet, but here we will focus on applications and explore how these technologies can be applied to solving practical NLU problems.

As we saw in Chapter 10, recurrent neural networks (RNNs) have been very effective in NLP because they do not assume that the elements of the input, specifically words, are independent of one another, and so they are able to take into account sequences of input elements, such as the order...

BERT and its variants

As an example of an LLM technology based on transformers, we will demonstrate the use of BERT (Bidirectional Encoder Representations from Transformers), a widely used open source NLP approach developed by Google that is the foundation of many of today’s state-of-the-art NLP systems. The source code for BERT is available at https://github.com/google-research/bert.

BERT’s key technical innovation is that its training is bidirectional; that is, it takes both the previous and following words in the input into account. A second innovation is that BERT’s pretraining uses a masked language model objective, where the system masks out a word in the training data and attempts to predict it from the surrounding context.
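
To make the masked language model idea concrete, here is a quick, self-contained illustration using the Hugging Face transformers library (which is not otherwise used in this chapter's code) to ask a pretrained BERT model to fill in a masked word:

    # Illustration only: masked-word prediction with pretrained BERT, via the
    # Hugging Face transformers library (not the chapter's TensorFlow Hub code).
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    # [MASK] is BERT's mask token; the model predicts the hidden word.
    for prediction in fill_mask("The movie was absolutely [MASK]."):
        print(prediction["token_str"], round(prediction["score"], 3))

The model ranks candidate words for the masked position by probability, drawing on both the left and right context of the mask.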

BERT also uses only the encoder part of the encoder-decoder architecture because, unlike machine translation systems, it focuses only on understanding; it doesn’t produce language.

Another advantage of BERT over the systems we’ve discussed earlier in this book is that the training process...

Using BERT – a classification example

In this example, we’ll use BERT for classification, using the movie review dataset we saw in earlier chapters. We will start with a pretrained BERT model and fine-tune it to classify movie reviews. This is a process that you can follow if you want to apply BERT to your own data.

Using BERT for specific applications starts with one of the pretrained models available from TensorFlow Hub (https://tfhub.dev/tensorflow) and then fine-tuning it with training data that is specific to the application. It is recommended to start with one of the small BERT models, which have the same architecture as BERT but are faster to train. Generally, the smaller models are less accurate, but if their accuracy is adequate for the application, it isn’t necessary to take the extra time and computer resources that would be needed to use a larger model. There are many models of various sizes that can be downloaded from TensorFlow Hub.
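
As a sketch of what this looks like in code (following the pattern of TensorFlow's text-classification tutorial; the TensorFlow Hub handles and version suffixes below are assumptions, so substitute whichever encoder and matching preprocessing model you choose):

    # A minimal sketch of a BERT fine-tuning classifier. The Hub handles are
    # assumptions; pair any BERT encoder with its matching preprocessing model.
    import tensorflow as tf
    import tensorflow_hub as hub
    import tensorflow_text as text  # needed for the preprocessing ops

    preprocess_handle = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
    encoder_handle = ("https://tfhub.dev/tensorflow/small_bert/"
                      "bert_en_uncased_L-4_H-512_A-8/2")

    def build_classifier_model():
        # Raw strings go in; the preprocessing layer tokenizes and pads them.
        text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name="text")
        preprocessing = hub.KerasLayer(preprocess_handle, name="preprocessing")
        encoder_inputs = preprocessing(text_input)
        # trainable=True fine-tunes BERT's weights along with the classifier head.
        encoder = hub.KerasLayer(encoder_handle, trainable=True, name="BERT_encoder")
        outputs = encoder(encoder_inputs)
        net = outputs["pooled_output"]  # a single vector summarizing the input
        net = tf.keras.layers.Dropout(0.1)(net)
        # One logit for binary (positive/negative) classification.
        net = tf.keras.layers.Dense(1, activation=None, name="classifier")(net)
        return tf.keras.Model(text_input, net)

    model = build_classifier_model()

The model can then be compiled with a binary cross-entropy loss and fine-tuned on the labeled movie reviews in the usual Keras fashion.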

BERT models...

Cloud-based LLMs

Recently, a number of cloud-based pretrained LLMs have appeared that show very impressive performance. These newer models are based on the same principles as BERT, and their performance is largely due to the fact that they have been trained on much larger amounts of data. In contrast to BERT, however, they are too large to be downloaded and used locally, and some are closed and proprietary and can’t be downloaded for that reason. Because they cannot be downloaded, it is important to keep in mind that they aren’t appropriate for every application. Specifically, if there are any privacy or security concerns regarding the data, it may not be a good idea to send it to the cloud for processing. Some of these systems are GPT-2, GPT-3, GPT-4, ChatGPT, and OPT-175B, and new LLMs are being published on a frequent...
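
As a concrete illustration of the cloud-based workflow, here is a minimal, hypothetical sketch using the openai Python package's current client interface (which may differ from the version used when this chapter was written; the model name and prompt wording are assumptions, and an OPENAI_API_KEY environment variable must be set):

    # Hypothetical sketch: sentiment classification via OpenAI's cloud API.
    # Note that the review text leaves your machine, which is exactly the
    # privacy consideration discussed above.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    review = "This movie was a complete waste of two hours."
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model name; use any available chat model
        messages=[{
            "role": "user",
            "content": f"Classify this movie review as positive or negative:\n{review}",
        }],
    )
    print(response.choices[0].message.content)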

Summary

This chapter covered the current best-performing techniques in NLP – transformers and pretrained models. In addition, we demonstrated how they can be applied to processing your own application-specific data, using both local pretrained models and cloud-based models.

Specifically, you learned about the basic concepts behind attention, transformers, and pretrained models, and then applied the BERT pretrained transformer system to a classification problem. Finally, we looked at using the cloud-based GPT-3 systems for generating data and for processing application-specific data.

In Chapter 12, we will turn to a different topic – unsupervised learning. Up to this point, all of our models have been supervised, which you will recall means that the data has been annotated with the correct processing result. Next, we will discuss applications of unsupervised learning. These applications include topic modeling and clustering. We will also talk about the value...
