You're reading from Natural Language Understanding with Python

Product type Book

Published in Jun 2023

Publisher Packt

ISBN-13 9781804613429

Pages 326 pages

Edition 1st Edition

Concepts

Machine Learning

Author (1):

Deborah A. Dahl

Table of Contents (21) Chapters

Preface

Part 1: Getting Started with Natural Language Understanding Technology

Chapter 1: Natural Language Understanding, Related Technologies, and Natural Language Applications

Chapter 2: Identifying Practical Natural Language Understanding Problems

Part 2: Developing and Testing Natural Language Understanding Systems

Chapter 3: Approaches to Natural Language Understanding – Rule-Based Systems, Machine Learning, and Deep Learning

Chapter 4: Selecting Libraries and Tools for Natural Language Understanding

Chapter 5: Natural Language Data – Finding and Preparing Data

Chapter 6: Exploring and Visualizing Data

Chapter 7: Selecting Approaches and Representing Data

Chapter 8: Rule-Based Techniques

Chapter 9: Machine Learning Part 1 – Statistical Machine Learning

Chapter 10: Machine Learning Part 2 – Neural Networks and Deep Learning Techniques

Chapter 11: Machine Learning Part 3 – Transformers and Large Language Models

Chapter 12: Applying Unsupervised Learning Approaches

Chapter 13: How Well Does It Work? – Evaluation

Part 3: Systems in Action – Applying Natural Language Understanding at Scale

Chapter 14: What to Do If the System Isn’t Working

Chapter 15: Summary and Looking to the Future

Index

What to Do If the System Isn’t Working

In this chapter, we will discuss how to improve systems. If the first round of training fails to produce satisfactory performance, or if the real-world scenario that the system addresses changes, we need to modify something to improve the system’s performance. We will cover techniques such as adding new data and changing the structure of an application, while ensuring that new data doesn’t degrade the performance of the existing system. This is a big topic, and there are many ways to improve the performance of natural language understanding (NLU) systems. It isn’t possible to cover all of them here, but this chapter should give you a good perspective on the most important options and techniques.

We will cover the following topics in this chapter:

Figuring out that a...

Technical requirements

We will be using the following data and software to run the examples in this chapter:

Our usual development environment – that is, Python 3 and Jupyter Notebook
The TREC dataset
The Matplotlib and Seaborn packages, which we will use to display graphical charts
pandas and NumPy for numerical manipulation of data
The BERT NLU system, previously used in Chapter 11 and Chapter 13
The Keras machine learning library, for working with BERT
NLTK, which we will use for generating new data
An OpenAI API key, which we will use to access the OpenAI tools

Figuring out that a system isn’t working

Figuring out whether a system is working as well as it should is important, both during initial development and during ongoing deployment. We’ll start by looking at poor performance during initial development.

Initial development

The primary techniques we will use to determine that our system isn’t working as well as we’d like are the evaluation techniques we learned about in Chapter 13, which we will apply in this chapter. We will also use confusion matrices to identify specific classes that perform worse than the others.
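As a rough illustration of how a confusion matrix can point to a weak class, the following sketch counts (gold, predicted) label pairs and derives per-class recall from them. The labels are made-up examples loosely modeled on TREC's coarse question classes, and the function names are hypothetical; this is not the chapter's own code.

```python
from collections import Counter

def confusion_counts(gold, predicted):
    """Count how often each (gold_label, predicted_label) pair occurs."""
    return Counter(zip(gold, predicted))

def per_class_recall(gold, predicted):
    """Recall per gold class: correct predictions / examples of that class."""
    pairs = confusion_counts(gold, predicted)
    totals = Counter(gold)
    return {label: pairs[(label, label)] / totals[label] for label in totals}

# Hypothetical gold labels and model predictions (assumed for illustration).
gold = ["LOC", "LOC", "HUM", "HUM", "NUM", "NUM", "ABBR", "ABBR"]
predicted = ["LOC", "LOC", "HUM", "LOC", "NUM", "NUM", "DESC", "ENTY"]

recall = per_class_recall(gold, predicted)
weakest = min(recall, key=recall.get)  # class with the lowest recall

# List classes from worst to best so problem classes appear first.
for label, value in sorted(recall.items(), key=lambda item: item[1]):
    print(f"{label}: recall={value:.2f}")
print("Weakest class:", weakest)
```

Sorting by recall puts the classes that most need attention at the top, which is usually easier to scan than the full matrix when there are many classes.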

It is always a good idea to look at the dataset at the outset and check the balance of categories because unbalanced data is a common source of problems. Unbalanced data does not necessarily mean that there will be accuracy problems, but it’s valuable to understand our class balance at the beginning. That way, we will be prepared to address accuracy...
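A class balance check of this kind can be sketched in a few lines. The example below assumes the labels are available as a plain Python list; the label values and the helper names `class_balance` and `imbalance_ratio` are hypothetical and chosen only for illustration.

```python
from collections import Counter

def class_balance(labels):
    """Return each class's share of the dataset, largest class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}

def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical label list (assumed for illustration).
labels = ["ENTY"] * 5 + ["HUM"] * 4 + ["ABBR"] * 1

shares = class_balance(labels)
ratio = imbalance_ratio(labels)

for label, share in shares.items():
    print(f"{label}: {share:.0%}")
print(f"Largest class is {ratio:.1f}x the smallest")
```

A high ratio is not proof of an accuracy problem, but it tells us which classes to watch when we later examine per-class results.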

Fixing accuracy problems

In this section, we will look at two strategies for fixing performance problems. The first addresses issues by changing the data, and the second addresses issues that require restructuring the application. Generally, changing the data is easier, and it is the better strategy when it is important to keep the structure of the application the same – that is, when we don’t want to remove classes or introduce new ones. We’ll start with changing the data and then discuss restructuring the application.

Changing data

Changing data can greatly improve the performance of your system; however, you won’t always have this option. For example, you might not have control over the dataset if you work with a standard dataset that you intend to compare to other researchers’ work. You can’t change the data if you are in that situation because if you do, your system’s performance won’...

Moving on to deployment

If we’ve fixed the performance issues we’ve discussed so far, we will have trained a model that meets our performance expectations, and we can move on to deployment, when the system is installed and does the task that it was designed for. Like any software, a deployed NLU model can run into system and hardware problems, such as network issues, scalability limits, and general software bugs. We won’t discuss these kinds of problems because they aren’t specific to NLU.

The next section covers how to address NLU performance problems that occur after deployment.

Problems after deployment

After an NLU system is developed and put into place in an application, it still requires monitoring. Once the system has reached an acceptable level of performance and has been deployed, it can be tempting to leave it alone and assume that it doesn’t need any more attention, but this is not the case. At the very least, the deployed system will receive a continuous stream of new data that can be challenging to the existing system if it is different from the training data in some way. On the other hand, if it is not different, it can be used as new training data. Clearly, it is better to detect performance problems from internal testing than to learn about them from negative customer feedback.
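One simple way to get an early signal that incoming data differs from the training data is to measure how many of its tokens never appeared in training. The sketch below is an assumed, deliberately crude heuristic (whitespace tokenization, no thresholds), not a method prescribed by the chapter; the texts and function names are hypothetical.

```python
def vocabulary(texts):
    """Collect the set of lowercased whitespace tokens in a list of texts."""
    return {token for text in texts for token in text.lower().split()}

def oov_rate(training_texts, new_texts):
    """Fraction of tokens in new_texts that never occur in training_texts."""
    known = vocabulary(training_texts)
    tokens = [token for text in new_texts for token in text.lower().split()]
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if token not in known)
    return unknown / len(tokens)

# Hypothetical training data and two batches of post-deployment input.
training_texts = ["what is the capital of france", "who wrote hamlet"]
similar_texts = ["who wrote the capital"]
shifted_texts = ["cancel my subscription"]

similar_rate = oov_rate(training_texts, similar_texts)
shifted_rate = oov_rate(training_texts, shifted_texts)

print(f"Similar batch OOV rate: {similar_rate:.2f}")
print(f"Shifted batch OOV rate: {shifted_rate:.2f}")
```

A batch with a much higher rate than usual is a candidate for manual review; a batch that looks like the training data is a candidate for annotation and reuse as new training data.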

At a high level, we can think of new performance problems as being due either to a change in the system itself or to a change in the deployment context.

Changes in system performance due to system changes should be detected by testing before the new system...
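A pre-release check of this kind can be sketched as a comparison of per-class accuracy between the current and the candidate system on the same held-out test set. The predictions below are invented, and the `regressions` helper and its `tolerance` parameter are hypothetical names used only for this illustration.

```python
from collections import Counter

def per_class_accuracy(gold, predicted):
    """Fraction of examples of each gold class that were predicted correctly."""
    totals = Counter(gold)
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)
    return {label: correct[label] / totals[label] for label in totals}

def regressions(gold, old_predictions, new_predictions, tolerance=0.0):
    """Return classes whose accuracy dropped by more than `tolerance`.

    Each value is an (old_accuracy, new_accuracy) pair.
    """
    old = per_class_accuracy(gold, old_predictions)
    new = per_class_accuracy(gold, new_predictions)
    return {
        label: (old[label], new[label])
        for label in old
        if new[label] < old[label] - tolerance
    }

# Hypothetical held-out labels and predictions from two system versions.
gold = ["LOC", "LOC", "HUM", "HUM"]
old_predictions = ["LOC", "LOC", "HUM", "LOC"]
new_predictions = ["LOC", "HUM", "HUM", "HUM"]

dropped = regressions(gold, old_predictions, new_predictions)
for label, (before, after) in dropped.items():
    print(f"{label}: accuracy fell from {before:.2f} to {after:.2f}")
```

Comparing per class rather than overall matters here: in this example both versions get three of four examples right, so overall accuracy hides the fact that the new version became worse on one class while improving on another.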

Summary

In this chapter, you have learned about a number of important strategies to improve the performance of NLU applications. You first learned how to do an initial survey of the data and identify possible problems with the training data. Then, you learned how to find and diagnose problems with accuracy. You then learned about different strategies to improve performance – specifically, adding data and restructuring the application. The final topic we covered was a review of problems that can occur in deployed applications and how they can be addressed.

In the final chapter, we will provide an overview of the book and a look to the future. We will discuss where there is potential for improvement in the state of the art of NLU performance, as well as faster training, more challenging applications, and what we can expect from NLU technology as the new LLMs become more widely used.