You're reading from Natural Language Understanding with Python

Product type: Book
Published in: Jun 2023
Publisher: Packt
ISBN-13: 9781804613429
Edition: 1st Edition
Author (1)
Deborah A. Dahl

Deborah A. Dahl is the principal at Conversational Technologies, with over 30 years of experience in natural language understanding technology. She has developed numerous natural language processing systems for research, commercial, and government applications, including a system for NASA, and speech and natural language components on Android. She has taught over 20 workshops on natural language processing, consulted on many natural language processing applications for her customers, and written over 75 technical papers. This is Deborah's fourth book on natural language understanding topics. Deborah has a PhD in linguistics from the University of Minnesota and postdoctoral studies in cognitive science from the University of Pennsylvania.

Summary and Looking to the Future

In this chapter, we will review the book as a whole and look toward the future. We will discuss the potential for improved accuracy and faster training, more challenging applications, and future directions for both practical systems and research.

We will cover the following topics in this chapter:

  • Overview of the book
  • Potential for better accuracy and faster training
  • Other areas for improvement
  • Applications that are beyond the current state of the art
  • Future directions

The first section of this chapter is an overall summary of the topics covered in this book.

Overview of the book

This book has covered the basics of natural language understanding (NLU), the technology that enables computers to process natural language and apply the results to a wide variety of practical applications.

The goal of this book has been to provide a solid grounding in NLU using the Python programming language. This grounding will enable you not only to select the right tools and software libraries for developing your own applications but will also provide you with the background you need to independently make use of the many resources available on the internet. You can use these resources to expand your knowledge and skills as you take on more advanced projects and to keep up with the many new tools that are becoming available as this rapidly advancing technology continues to improve.

In this book, we’ve discussed three major topics:

  • In Part 1, we covered background information and how to get started
  • In Part 2, we went over Python tools...

Potential for improvement – better accuracy and faster training

At the beginning of Chapter 13, we listed several criteria that can be used to evaluate NLU systems. The one we usually think of first is accuracy – that is, given a specific input, did the system provide the right answer? Although in a particular application we may ultimately give another criterion priority over accuracy, accuracy remains essential.

Better accuracy

As we saw in Chapter 13, even our best-performing system, the large Bidirectional Encoder Representations from Transformers (BERT) model, only achieved an F1 score of 0.85 on the movie review dataset, meaning that 15% of its classifications were incorrect. State-of-the-art LLM-based research systems currently report an accuracy of 0.93 on this dataset, which still means that the system makes many errors (Ding et al., 2021; see Further reading).
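To make the F1 metric concrete, here is a minimal pure-Python sketch of how precision, recall, and F1 are computed for a binary classifier. The gold labels and predictions below are invented for illustration; in practice you would use a library function such as scikit-learn's `f1_score`:

```python
def f1_score(y_true, y_pred):
    """Compute the F1 score for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)  # fraction of predicted positives that are correct
    recall = tp / (tp + fn)     # fraction of actual positives that were found
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold labels and classifier predictions for eight reviews
gold = [1, 0, 1, 1, 0, 1, 0, 1]
pred = [1, 0, 0, 1, 0, 1, 1, 1]
print(f1_score(gold, pred))  # ≈ 0.8
```

Because F1 is the harmonic mean of precision and recall, it penalizes a classifier that trades one for the other, which is why it is often preferred over raw accuracy on imbalanced datasets.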

Applications that are beyond the current state of the art

This section discusses several applications that are not yet possible but are theoretically feasible. In some cases, they could probably be achieved if the right training data and computing resources were available; in others, they might require new algorithmic insights. In all of these examples, it is interesting to consider how these and other futuristic applications might be accomplished.

Processing very long documents

Current LLMs have relatively small limits on the length of documents (or prompts) they can process. For example, GPT-4 can only handle texts of up to 8,192 tokens (https://platform.openai.com/docs/models/gpt-4), which is around 16 single-spaced pages. Clearly, this means that many existing documents can’t be fully analyzed with these cloud systems. If you are doing a typical classification task, you can train your own model, for example, with a Term frequency-inverse document...
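One common workaround for these length limits is to split a long document into overlapping windows that each fit within the model's context, process the windows separately, and combine the results. The sketch below illustrates the chunking step only; the 8,192-token limit matches the GPT-4 figure above, while the overlap size and the stand-in token list are illustrative assumptions:

```python
def chunk_tokens(tokens, max_len=8192, overlap=128):
    """Split a token sequence into overlapping windows of at most max_len tokens."""
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    step = max_len - overlap  # how far each new window advances
    return [tokens[i:i + max_len] for i in range(0, len(tokens), step)]

# A stand-in "document" of 20,000 token IDs
doc = list(range(20_000))
chunks = chunk_tokens(doc)
print(len(chunks))     # 3 windows
print(len(chunks[0]))  # 8192
print(chunks[1][0])    # 8064 -- the second window repeats the last 128 tokens of the first
```

The overlap ensures that sentences falling on a window boundary are seen whole in at least one chunk; how to merge per-chunk results (for example, by voting or averaging classifier scores) depends on the task.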

Future directions in NLU technology and research

While the recent improvements in NLU technology based on transformers and LLMs, which we reviewed in Chapter 11, have resulted in very impressive capabilities, it is important to point out that there are many topics in NLU that are far from solved. In this section, we will look at some of the most active research areas – extending NLU to new languages, speech-to-speech translation, multimodal interaction, and avoiding bias.

Quickly extending NLU technologies to new languages

A precise count of the number of currently spoken languages is difficult to obtain. However, according to WorldData.info, there are currently about 6,500 languages spoken throughout the world (https://www.worlddata.info/languages/index.php#:~:text=There%20are%20currently%20around%206%2C500,of%20Asia%2C%20Australia%20and%20Oceania). Some languages, such as Mandarin, English, Spanish, and Hindi, are spoken by many millions of people, while other languages...

Summary

In this chapter, we have summarized the previous chapters of the book, reviewed some areas where NLU technology still faces challenges, and discussed directions in which it could improve in the future. NLU is an extremely dynamic, fast-moving field, and it will clearly continue to develop in many exciting directions. With this book, you have gained foundational knowledge of NLU that will enable you not only to build NLU systems for your current applications but also to take advantage of technological advances as NLU continues to evolve. I hope you will build on the information in this book to create innovative and useful applications that use NLU to solve practical as well as scientific problems.

Further reading

SiYu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, and Haifeng Wang. 2021. ERNIE-Doc: A Retrospective Long-Document Modeling Transformer. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2914–2927, Online. Association for Computational Linguistics.

Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The Long-Document Transformer. arXiv:2004.05150.

