Machine Learning Part 1 – Statistical Machine Learning

In this chapter, we will discuss how to apply classical statistical machine learning techniques such as Naïve Bayes, term frequency-inverse document frequency (TF-IDF), support vector machines (SVMs), and conditional random fields (CRFs) to common natural language processing (NLP) tasks such as classification (or intent recognition) and slot filling.

There are two aspects of these classical techniques that we need to consider: representations and models. Representation refers to the format of the data that we are going to analyze. You will recall from Chapter 7 that it is standard to represent NLP data in formats other than lists of words. Numeric representation formats such as vectors make it possible to use widely available numeric processing techniques, and consequently open up many possibilities for processing. In Chapter 7, we also explored data representations such as the count bag of words (BoW), TF...

A quick overview of evaluation

Before we look at how different statistical techniques work, we need a way to measure their performance, and there are a couple of important considerations to review first. The first is the metric, or score, that we assign to the system’s processing. The most common and simplest metric is accuracy: the number of correct responses divided by the overall number of attempts. For example, if we measure the performance of a movie review classifier by having it classify 100 reviews as positive or negative, and it classifies 75 of them correctly, its accuracy is 75%. A closely related metric is error rate, which is, in a sense, the opposite of accuracy because it measures how often the system makes a mistake. In this example, the error rate would be 25%.
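
In code, both metrics are one-liners. Here is a minimal sketch using scikit-learn’s accuracy_score; the labels and predictions are made up for illustration:

from sklearn.metrics import accuracy_score

# Made-up gold labels and system outputs for ten movie reviews
y_true = ["pos", "neg", "pos", "pos", "neg", "neg", "pos", "neg", "pos", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "pos", "pos"]

accuracy = accuracy_score(y_true, y_pred)  # correct responses / total attempts
error_rate = 1.0 - accuracy                # how often the system made a mistake
print(f"accuracy = {accuracy:.0%}, error rate = {error_rate:.0%}")  # 80%, 20%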

We will only make use of accuracy in this chapter, although there are more precise and informative metrics that are actually...

Representing documents with TF-IDF and classifying with Naïve Bayes

In addition to evaluation, two important topics in the general paradigm of machine learning are representation and processing algorithms. Representation involves converting a text, such as a document, into a numerical format that preserves relevant information about the text. This information is then analyzed by the processing algorithm to perform the NLP application. You’ve already seen a common approach to representation, TF-IDF, in Chapter 7. In this section, we will cover using TF-IDF with a common classification approach, Naïve Bayes. We will explain both techniques and show an example.
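
As a preview of that combination, here is a minimal sketch, assuming scikit-learn and a few made-up reviews standing in for a real corpus:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up reviews standing in for the movie review corpus
train_texts = ["a wonderful, moving film", "dull and far too long",
               "the acting was superb", "a waste of two hours"]
train_labels = ["pos", "neg", "pos", "neg"]

# TF-IDF turns each text into a numeric vector; Naive Bayes classifies it
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)
print(model.predict(["a superb and moving film"]))  # likely ['pos'] on this toy data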

Summary of TF-IDF

You will recall the discussion of TF-IDF from Chapter 7. TF-IDF is based on the intuitive goal of trying to find words in documents that are particularly diagnostic of their classification topic. Words that are relatively infrequent in the whole corpus, but which are relatively common in...
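
To make that intuition concrete, one common formulation (scikit-learn’s TfidfVectorizer computes a smoothed, normalized variant of this) weights a term t in a document d as follows:

tfidf(t, d) = tf(t, d) × log(N / df(t))

Here, tf(t, d) is the number of times t occurs in d, N is the total number of documents in the corpus, and df(t) is the number of documents that contain t. A word that occurs often in one document but rarely across the corpus receives a high weight, which is exactly the diagnostic behavior described above.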

Classifying documents with Support Vector Machines (SVMs)

SVMs are a popular and robust tool for text classification in applications such as intent recognition and chatbots. Unlike neural networks, which we will discuss in the next chapter, SVMs usually train relatively quickly and normally don’t require enormous amounts of data. That makes them a good choice for applications that have to be deployed quickly, perhaps as a preliminary step in the development of a larger-scale application.

The basic idea behind SVMs is that if we represent documents as n-dimensional vectors (for example, the TF-IDF vectors that we discussed in Chapter 7), we want to identify a hyperplane that acts as a boundary separating the documents into two categories with as large a margin as possible.

An illustration of using SVMs on the movie review data is shown here. We start, as usual, by importing the data and creating a train/test split:

import numpy...
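
For orientation, here is a minimal sketch of what such a pipeline can look like, assuming scikit-learn, with LinearSVC as one reasonable SVM implementation and made-up reviews standing in for the movie review corpus:

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Made-up reviews standing in for the movie review corpus
reviews = ["a wonderful, moving film", "dull and far too long",
           "the acting was superb", "a waste of two hours"] * 25
labels = ["pos", "neg", "pos", "neg"] * 25

# Train/test split, then TF-IDF vectors feeding a linear SVM
X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.2, random_state=42)
vectorizer = TfidfVectorizer()
clf = LinearSVC()
clf.fit(vectorizer.fit_transform(X_train), y_train)
preds = clf.predict(vectorizer.transform(X_test))
print(f"accuracy = {accuracy_score(y_test, preds):.0%}")

Note that the vectorizer is fit only on the training texts and then applied to the test texts, so that no information from the test set leaks into the representation.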

Slot-filling with CRFs

In Chapter 8, we discussed the popular application of slot-filling, and we used the spaCy rule engine to find slots for the restaurant search application shown in Figure 8.9. This required writing rules for finding the fillers of each slot in the application. This approach can work fairly well if the potential slot fillers are known in advance, but if they aren’t, it won’t be possible to write rules. For example, with the rules in the code following Figure 8.9, if a user asked for a new cuisine, say, Thai, the rules wouldn’t be able to recognize “Thai” as a new filler for the CUISINE slot, nor “not too far away” as a filler for the LOCATION slot. Statistical methods, which we will discuss in this section, can help with this problem.

With statistical methods, the system does not use rules but looks for patterns in its training data that can be applied to new examples. Statistical methods...
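
As a minimal sketch of this idea, the following assumes the sklearn-crfsuite package (one common CRF implementation, not necessarily the one used in this chapter) and a few made-up BIO-tagged restaurant queries:

import sklearn_crfsuite  # pip install sklearn-crfsuite

# Made-up BIO-tagged restaurant queries: (token, slot tag) pairs
train_sents = [
    [("find", "O"), ("thai", "B-CUISINE"), ("food", "O"), ("nearby", "B-LOCATION")],
    [("cheap", "O"), ("italian", "B-CUISINE"), ("places", "O"), ("downtown", "B-LOCATION")],
    [("any", "O"), ("mexican", "B-CUISINE"), ("restaurants", "O"), ("near", "B-LOCATION"), ("me", "I-LOCATION")],
]

def token_features(tokens, i):
    # Minimal per-token features; real systems add context windows, POS tags, etc.
    word = tokens[i]
    return {"word": word.lower(), "suffix3": word[-3:], "is_first": i == 0}

def sent_to_features(tokens):
    return [token_features(tokens, i) for i in range(len(tokens))]

X_train = [sent_to_features([w for w, _ in sent]) for sent in train_sents]
y_train = [[tag for _, tag in sent] for sent in train_sents]

# The CRF learns tag-transition patterns as well as per-token features
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict([sent_to_features(["find", "thai", "food", "downtown"])]))

Because the model learns from token features and tag-transition patterns rather than from fixed word lists, it can, given enough training data, tag fillers it was never explicitly told about.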

Summary

This chapter has explored some of the basic and most useful classical statistical techniques for NLP. They are especially valuable for small projects that start out without a large amount of training data, and for the exploratory work that often precedes a large-scale project.

We started by learning some basic evaluation concepts, particularly accuracy, and we also looked at some confusion matrices. We then learned how to apply Naïve Bayes classification to texts represented in TF-IDF format, and worked through the same classification task using a more modern technique, SVMs. Comparing the results produced by Naïve Bayes and SVMs, we saw that the SVMs performed better. We then turned our attention to a related NLP task, slot-filling. We learned about different ways to represent slot-tagged data and finally illustrated CRFs with a restaurant recommendation task. These are all standard approaches that are good to have...
