Machine Learning Part 2 – Neural Networks and Deep Learning Techniques

Neural networks (NNs) only became popular in natural language understanding (NLU) around 2010 but have since been widely applied to many problems. In addition, NNs have many applications to problems outside natural language processing (NLP), such as image classification. The fact that NNs are a general approach that can be applied across different research areas has led to some interesting synergies across these fields.

In this chapter, we will cover the application of machine learning (ML) techniques based on NNs to problems such as NLP classification. We will also cover several different kinds of commonly used NNs—specifically, fully connected multilayer perceptrons (MLPs), convolutional NNs (CNNs), and recurrent NNs (RNNs)—and show how they can be applied to problems such as classification and information extraction. We will also discuss fundamental NN concepts such as hyperparameters...

Basics of NNs

The basic concepts behind NNs have been studied for many years but have only fairly recently been applied to NLP problems on a large scale. Currently, NNs are one of the most popular tools for solving NLP tasks. NNs are a large and very actively researched field, so we won't be able to give you a comprehensive understanding of NNs for NLP. However, we will attempt to provide you with some basic knowledge that will let you apply NNs to your own problems.

NNs are inspired by some properties of the animal nervous system. Specifically, animal nervous systems consist of a network of interconnected cells, called neurons, that transmit information throughout the network with the result that, given an input, the network produces an output that represents a decision about the input.

Artificial NNs (ANNs) are designed to model this process in some respects. The decision about how to react to the inputs is determined by a sequence of processing steps starting with...
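To make this concrete, here is a minimal sketch (not from the book) of a single artificial neuron in Python: it computes a weighted sum of its inputs plus a bias and passes the result through an activation function. The sigmoid activation and the example numbers are illustrative choices.

```python
# A minimal sketch (not from the book) of a single artificial neuron:
# a weighted sum of the inputs plus a bias, passed through an
# activation function. The sigmoid and the example numbers are
# illustrative choices.
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, then a nonlinear activation
    return sigmoid(np.dot(weights, inputs) + bias)

# Three inputs, three corresponding weights, and a bias
print(neuron(np.array([0.5, -1.2, 3.0]), np.array([0.4, 0.1, -0.6]), bias=0.2))
```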

Example – MLP for classification

We will review basic NN concepts by looking at the MLP, which is conceptually one of the most straightforward types of NNs. The example we will use is the classification of movie reviews into reviews with positive and negative sentiments. Since there are only two possible categories, this is a binary classification problem. We will use the Sentiment Labelled Sentences Data Set (From Group to Individual Labels using Deep Features, Kotzias et al., KDD 2015, https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences), available from the University of California, Irvine. Start by downloading the data and unzipping it into a directory in the same directory as your Python script. You will see a directory called sentiment labelled sentences that contains the actual data in a file called imdb_labelled.txt. You can install the data into another directory of your choosing, but if you do, be sure to modify the filepath_dict variable accordingly.
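As a rough illustration of what loading this data might look like, the following hedged sketch reads the IMDb file with pandas. The filepath_dict variable mirrors the one the text mentions, but the exact structure of the book's own listing may differ.

```python
# A hedged sketch of loading the data (the book's own listing may
# differ). filepath_dict mirrors the variable the text mentions; the
# path assumes the data was unzipped next to this script.
import pandas as pd

filepath_dict = {
    "imdb": "sentiment labelled sentences/imdb_labelled.txt",
}

frames = []
for source, filepath in filepath_dict.items():
    # Each line in the file is "sentence<TAB>label", where the label is
    # 0 (negative sentiment) or 1 (positive sentiment)
    df = pd.read_csv(filepath, names=["sentence", "label"], sep="\t")
    df["source"] = source  # record which corpus the sentence came from
    frames.append(df)

df = pd.concat(frames)
print(df.head())
```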

...

Hyperparameters and tuning

Figure 10.4 clearly shows that increasing the number of training epochs is not going to improve performance on this task. The best validation accuracy seems to be about 80% after 10 epochs. However, 80% accuracy is not very good. How can we improve it? Here are some ideas. None of them is guaranteed to work, but each is worth experimenting with:

  • If more training data is available, increase the amount of training data.
  • Investigate preprocessing techniques that can remove noise from the training data, for example, stopword removal, removing non-words such as numbers and HTML tags, stemming and lemmatization, and lowercasing. Details on these techniques were covered in Chapter 5.
  • Change the learning rate; for example, lowering the learning rate might improve the ability of the network to avoid local minima (see the sketch after this list).
  • Decrease the batch size (also shown in the sketch after this list).
  • Change the number of layers and the number of neurons in each layer...
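To show where two of these hyperparameters live in code, here is a hedged Keras sketch (not the book's listing) of a small MLP for the binary sentiment task, with a lowered learning rate and a decreased batch size. The random arrays stand in for the vectorized reviews, and all sizes are illustrative assumptions.

```python
# A hedged sketch (not the book's listing) of a small Keras MLP for the
# binary sentiment task, showing where the learning rate and batch size
# are set. The random arrays stand in for the vectorized reviews, and
# all sizes here are illustrative assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

rng = np.random.default_rng(0)
X_train = rng.random((748, 2500)).astype("float32")  # stand-in BoW vectors
y_train = rng.integers(0, 2, size=748)               # stand-in 0/1 labels

model = keras.Sequential([
    layers.Input(shape=(2500,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # single binary output
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),  # lowered from the 1e-3 default
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

history = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=16,         # decreased from the common default of 32
    validation_split=0.2,
)
```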

Moving beyond MLPs – RNNs

RNNs are a type of NN that can take into account the order of items in an input. In the MLP example discussed previously, the vector representing the entire input (that is, the complete document) was fed to the NN at once, so the network had no way of taking into account the order of words in the document. However, this is clearly an oversimplification in the case of text data, since the order of words can be very important to the meaning. RNNs take the order of words into account by feeding earlier outputs back in as inputs at later time steps. This can be especially helpful in NLP problems where the order of words is very important, such as named entity recognition (NER), part-of-speech (POS) tagging, or slot labeling.

A diagram of a unit of an RNN is shown in Figure 10.5:

Figure 10.5 – A unit of an RNN

The unit is shown at time t. The input at time t, x(t), is passed to the activation...
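To make the recurrence concrete, here is a hedged NumPy sketch (not the book's code) of a single step of a simple Elman-style RNN unit, following the standard recurrence h(t) = tanh(W_x x(t) + W_h h(t-1) + b). The dimensions and the random weights are illustrative assumptions.

```python
# A hedged NumPy sketch (not the book's code) of one step of a simple
# Elman-style RNN unit, following the standard recurrence
#     h(t) = tanh(W_x @ x(t) + W_h @ h(t-1) + b)
# The dimensions and the random weights are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3

W_x = rng.standard_normal((hidden_dim, input_dim))   # input-to-hidden weights
W_h = rng.standard_normal((hidden_dim, hidden_dim))  # recurrent hidden-to-hidden weights
b = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    # The new hidden state depends on both the current input and the
    # previous hidden state, which is how word order is captured
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

h = np.zeros(hidden_dim)
for x_t in rng.standard_normal((5, input_dim)):  # a 5-step input sequence
    h = rnn_step(x_t, h)
print(h)
```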

Looking at another approach – CNNs

CNNs are very popular for image recognition tasks, but they are less often used than RNNs for NLP tasks because they don't take into account the temporal order of items in the input. However, they can be useful for document classification tasks. As you will recall from earlier chapters, the representations that are often used in classification, such as bag of words (BoW) and TF-IDF, depend only on the words that occur in the document, so effective classification can often be accomplished without taking word order into account.

To classify documents with CNNs, we can represent a text as an array of vectors, where each word is mapped to a vector in a space derived from the full vocabulary. We can use word2vec, which we discussed in Chapter 7, to represent word vectors. Training a CNN for text classification with Keras is very similar to the training process that we worked through for MLP classification. We create a sequential model as...
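As a hedged sketch of what such a model might look like (the book's own listing may differ), the following Keras sequential model convolves over windows of word vectors. A trainable Embedding layer stands in here for the pretrained word2vec vectors mentioned above, and the vocabulary size, embedding size, and sequence length are illustrative.

```python
# A hedged sketch (the book's own listing may differ) of a sequential
# Keras model for text classification with a 1D convolution over word
# vectors. A trainable Embedding layer stands in for the pretrained
# word2vec vectors mentioned in the text; all sizes are illustrative.
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embedding_dim, max_len = 5000, 100, 100

model = keras.Sequential([
    layers.Input(shape=(max_len,)),               # a document as a sequence of word indices
    layers.Embedding(vocab_size, embedding_dim),  # map each word index to a vector
    layers.Conv1D(128, 5, activation="relu"),     # convolve over 5-word windows
    layers.GlobalMaxPooling1D(),                  # keep the strongest response per filter
    layers.Dense(10, activation="relu"),
    layers.Dense(1, activation="sigmoid"),        # binary classification output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```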

Summary

In this chapter, we have explored applications of NNs to document classification in NLP. We covered the basic concepts of NNs, reviewed a simple MLP, and applied it to a binary classification problem. We also provided some suggestions for improving performance by tuning hyperparameters. Finally, we discussed two more advanced types of NNs: RNNs and CNNs.

In Chapter 11, we will cover the currently best-performing techniques in NLP—transformers and pretrained models.
