Natural Language Understanding with Python


Product type: Book
Published: Jun 2023
Publisher: Packt
ISBN-13: 9781804613429
Pages: 326
Edition: 1st Edition
Author: Deborah A. Dahl

Table of Contents

  • Preface
  • Part 1: Getting Started with Natural Language Understanding Technology
  • Chapter 1: Natural Language Understanding, Related Technologies, and Natural Language Applications
  • Chapter 2: Identifying Practical Natural Language Understanding Problems
  • Part 2: Developing and Testing Natural Language Understanding Systems
  • Chapter 3: Approaches to Natural Language Understanding – Rule-Based Systems, Machine Learning, and Deep Learning
  • Chapter 4: Selecting Libraries and Tools for Natural Language Understanding
  • Chapter 5: Natural Language Data – Finding and Preparing Data
  • Chapter 6: Exploring and Visualizing Data
  • Chapter 7: Selecting Approaches and Representing Data
  • Chapter 8: Rule-Based Techniques
  • Chapter 9: Machine Learning Part 1 – Statistical Machine Learning
  • Chapter 10: Machine Learning Part 2 – Neural Networks and Deep Learning Techniques
  • Chapter 11: Machine Learning Part 3 – Transformers and Large Language Models
  • Chapter 12: Applying Unsupervised Learning Approaches
  • Chapter 13: How Well Does It Work? – Evaluation
  • Part 3: Systems in Action – Applying Natural Language Understanding at Scale
  • Chapter 14: What to Do If the System Isn’t Working
  • Chapter 15: Summary and Looking to the Future
  • Index
  • Other Books You May Enjoy

What to Do If the System Isn’t Working

This chapter discusses how to improve a system that isn’t performing well enough. If the first round of training fails to produce satisfactory performance, or if the real-world scenario that the system addresses changes, we need to modify something to improve the system’s performance. We will discuss techniques such as adding new data and changing the structure of an application, while ensuring that new data doesn’t degrade the performance of the existing system. This is a big topic, and there is a lot of room to explore how to improve the performance of natural language understanding (NLU) systems. It isn’t possible to cover every possibility here, but this chapter should give you a good perspective on the most important options and techniques.

We will cover the following topics in this chapter:

  • Figuring out that a...

Technical requirements

We will be using the following data and software to run the examples in this chapter:

  • Our usual development environment – that is, Python 3 and Jupyter Notebook
  • The TREC dataset
  • The Matplotlib and Seaborn packages, which we will use to display graphical charts
  • pandas and NumPy for numerical manipulation of data
  • The BERT NLU system, previously used in Chapter 11 and Chapter 13
  • The Keras machine learning library, for working with BERT
  • NLTK, which we will use for generating new data
  • An OpenAI API key which we will use to access the OpenAI tools
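
Before running the examples, it can help to confirm that these packages are importable. The following is a minimal sketch rather than code from the book: the import names (`matplotlib`, `seaborn`, `pandas`, `numpy`, `keras`, `nltk`, `openai`) and the `OPENAI_API_KEY` environment variable are assumptions about how the environment is set up, and the helper name `check_environment` is hypothetical.

```python
import importlib.util
import os

# Assumed import names for the packages listed above; adjust to your setup.
REQUIRED_PACKAGES = ["matplotlib", "seaborn", "pandas", "numpy", "keras", "nltk", "openai"]


def check_environment(packages=REQUIRED_PACKAGES):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
    # Assumption: the API key is supplied through this environment variable.
    print("OPENAI_API_KEY set:", bool(os.environ.get("OPENAI_API_KEY")))
```

The check only reports whether each package can be found; it does not import the packages or verify their versions, and it does not download the TREC dataset.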

Figuring out that a system isn’t working

Figuring out whether a system is working as well as it should is important both during initial development and during ongoing deployment. We’ll start by looking at poor performance during initial development.

Initial development

The primary techniques we will use to determine that our system isn’t working as well as we’d like are the evaluation techniques we learned about in Chapter 13, which we will apply in this chapter. We will also use confusion matrices to detect specific classes that perform worse than the others.
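
To illustrate how a confusion matrix exposes a weak class, here is a minimal sketch that uses only the standard library. The labels are toy values borrowed from TREC's coarse category names, not real results, and the function names `confusion_counts` and `per_class_recall` are hypothetical.

```python
from collections import Counter


def confusion_counts(gold, predicted):
    """Count (gold label, predicted label) pairs; off-diagonal pairs are errors."""
    return Counter(zip(gold, predicted))


def per_class_recall(gold, predicted):
    """Fraction of each gold class that was predicted correctly."""
    totals = Counter(gold)
    counts = confusion_counts(gold, predicted)
    return {label: counts[(label, label)] / totals[label] for label in totals}


# Toy labels for illustration only (not real TREC results).
gold = ["LOC", "LOC", "NUM", "NUM", "NUM", "ABBR", "ABBR"]
pred = ["LOC", "LOC", "NUM", "NUM", "LOC", "NUM", "ABBR"]

recall = per_class_recall(gold, pred)
worst = min(recall, key=recall.get)
print(recall, "weakest class:", worst)
```

In this toy example, the class with the lowest recall is the one to investigate first. In practice, the same counts could be loaded into a pandas DataFrame and drawn as a Seaborn heatmap.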

It is always a good idea to look at the dataset at the outset and check the balance of categories because unbalanced data is a common source of problems. Unbalanced data does not necessarily mean that there will be accuracy problems, but it’s valuable to understand our class balance at the beginning. That way, we will be prepared to address accuracy...
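
A quick way to inspect class balance is to count the labels. The sketch below assumes the labels are available as a plain Python list; the helper names `class_balance` and `imbalance_ratio` and the toy data are illustrative only.

```python
from collections import Counter


def class_balance(labels):
    """Return each class's share of the dataset, largest class first."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.most_common()}


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Toy data for illustration only.
labels = ["ENTY"] * 6 + ["HUM"] * 3 + ["ABBR"] * 1
shares = class_balance(labels)
ratio = imbalance_ratio(labels)
print(shares, ratio)
```

A large ratio does not by itself mean the model will be inaccurate, but it indicates which classes to watch when looking at per-class results.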

Fixing accuracy problems

In this section, we will look at fixing performance problems through two strategies. The first one involves issues that can be addressed by changing data, and the second strategy involves issues that require restructuring the application. Generally, changing the data is easier, and it is a better strategy if it is important to keep the structure of the application the same – that is, we don’t want to remove classes or introduce new classes. We’ll start by discussing changing the data and then discuss restructuring the application.
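
As a rough illustration of the two strategies, the sketch below shows one simple way to change the data (random oversampling of smaller classes) and one simple way to restructure an application (merging classes through a label mapping). Both are assumptions chosen for illustration, not necessarily the techniques used later in the chapter, and the names `oversample` and `merge_classes` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample(texts, labels, seed=0):
    """Changing the data: duplicate random examples until every class
    is as large as the largest class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def merge_classes(labels, mapping):
    """Restructuring: relabel classes via mapping; unmapped labels are unchanged."""
    return [mapping.get(label, label) for label in labels]


# Toy data for illustration only.
texts = ["where is Rome", "where is Oslo", "where is Lima", "what is NASA"]
labels = ["LOC", "LOC", "LOC", "ABBR"]
balanced_texts, balanced_labels = oversample(texts, labels)
print(Counter(balanced_labels))
print(merge_classes(["ABBR", "DESC", "LOC"], {"ABBR": "DESC"}))
```

Oversampling should be applied to the training split only, so that duplicated examples do not leak into the test data. Merging classes changes what the application can distinguish, which is why it counts as restructuring rather than a data change.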

Changing data

Changing data can greatly improve the performance of your system; however, you won’t always have this option. For example, you might not have control over the dataset if you work with a standard dataset that you intend to compare to other researchers’ work. You can’t change the data if you are in that situation because if you do, your system’s performance won’...

Moving on to deployment

If we’ve fixed the performance issues we’ve discussed so far, we will have trained a model that meets our performance expectations, and we can move on to deployment, where the system is installed and performs the task it was designed for. Like any software, a deployed NLU model can run into system and hardware problems, such as network issues, scalability issues, and general software problems. We won’t discuss these kinds of problems because they aren’t specific to NLU.

The next section covers how to address NLU performance problems that occur after deployment.

Problems after deployment

After an NLU system is developed and put into place in an application, it still requires monitoring. Once the system has reached an acceptable level of performance and has been deployed, it can be tempting to leave it alone and assume that it doesn’t need any more attention, but this is not the case. At the very least, the deployed system will receive a continuous stream of new data that can be challenging to the existing system if it is different from the training data in some way. On the other hand, if it is not different, it can be used as new training data. Clearly, it is better to detect performance problems from internal testing than to learn about them from negative customer feedback.
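
One simple signal that incoming data differs from the training data is the share of tokens the system has never seen. The sketch below is an illustrative assumption, not a method from the book: it uses naive lowercased whitespace tokenization, and the names `vocabulary` and `oov_rate` are hypothetical.

```python
def vocabulary(texts):
    """Lowercased whitespace tokens seen in a collection of texts."""
    return {token for text in texts for token in text.lower().split()}


def oov_rate(new_texts, train_vocab):
    """Fraction of tokens in new_texts that never occurred in the training data."""
    tokens = [token for text in new_texts for token in text.lower().split()]
    if not tokens:
        return 0.0
    unseen = sum(1 for token in tokens if token not in train_vocab)
    return unseen / len(tokens)


# Toy data for illustration only.
train = ["what is the capital of france", "who wrote hamlet"]
incoming = ["what is the capital of peru", "who wrote dune"]
rate = oov_rate(incoming, vocabulary(train))
print(f"out-of-vocabulary rate: {rate:.2f}")
```

Tracking a figure like this over time, alongside the evaluation metrics, can flag a shift in the deployment context before customers report problems. A rising rate is only a hint; the new data still has to be inspected to see whether accuracy is actually affected.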

At a high level, we can think of new performance problems as being due either to a change in the system itself or to a change in the deployment context.

Changes in system performance due to system changes should be detected by testing before the new system...
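
A pre-release check of this kind can be sketched as a comparison between the old and new models' predictions on the same held-out test set. This is a minimal illustration under stated assumptions: predictions are plain lists of labels, the `tolerance` parameter is hypothetical, and the function names are not taken from the book.

```python
def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def regressions(gold, old_pred, new_pred):
    """Indices that the old model got right and the new model gets wrong."""
    return [
        i
        for i, (g, o, n) in enumerate(zip(gold, old_pred, new_pred))
        if o == g and n != g
    ]


def safe_to_release(gold, old_pred, new_pred, tolerance=0.0):
    """True if the new model's accuracy is at most `tolerance` below the old one's."""
    return accuracy(gold, new_pred) >= accuracy(gold, old_pred) - tolerance


# Toy predictions for illustration only.
gold = ["LOC", "NUM", "HUM", "LOC"]
old_pred = ["LOC", "NUM", "LOC", "LOC"]
new_pred = ["LOC", "LOC", "HUM", "LOC"]
print(regressions(gold, old_pred, new_pred), safe_to_release(gold, old_pred, new_pred))
```

Overall accuracy can stay the same while individual items regress, as in this toy example, so listing the regressed items is often more informative than the single accuracy figure.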

Summary

In this chapter, you have learned about a number of important strategies to improve the performance of NLU applications. You first learned how to do an initial survey of the data and identify possible problems with the training data. Then, you learned how to find and diagnose problems with accuracy. We then described different strategies to improve performance – specifically, adding data and restructuring the application. The final topic we covered was a review of problems that can occur in deployed applications and how they can be addressed.

In the final chapter, we will provide an overview of the book and a look to the future. We will discuss where there is potential for improvement in the state of the art of NLU performance, as well as faster training, more challenging applications, and what we can expect from NLU technology as the new LLMs become more widely used.
