
You're reading from Neural Network Projects with Python

Product type: Book
Published in: Feb 2019
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781789138900
Edition: 1st Edition

Author: James Loy

James Loy has more than five years of expert experience in data science in the finance and healthcare industries. He has worked with the largest bank in Singapore to drive innovation and improve customer loyalty through predictive analytics. He also has experience in the healthcare sector, where he applied data analytics to improve decision-making in hospitals. He has a master's degree in computer science from Georgia Tech, with a specialization in machine learning. His research interests include deep learning and applied machine learning, as well as developing computer-vision-based AI agents for automation in industry. He writes on Towards Data Science, a popular machine learning website with more than 3 million views per month.

Sentiment Analysis of Movie Reviews Using LSTM

In previous chapters, we looked at neural network architectures, such as the basic MLP and feedforward neural networks, for classification and regression tasks. We then looked at CNNs, and we saw how they are used for image recognition tasks. In this chapter, we will turn our attention to recurrent neural networks (RNNs), in particular long short-term memory (LSTM) networks, and how they can be applied to sequential problems, such as Natural Language Processing (NLP). We will develop and train an LSTM network to predict the sentiment of movie reviews on IMDb.

In this chapter, we'll cover the following topics:

  • Sequential problems in machine learning
  • NLP and sentiment analysis
  • Introduction to RNNs and LSTM networks
  • Analysis of the IMDb movie reviews dataset
  • Word embeddings
  • A step-by-step guide to building and training an LSTM...

Technical requirements

The Python libraries required for this chapter are as follows:

  • matplotlib 3.0.2
  • Keras 2.2.4
  • seaborn 0.9.0
  • scikit-learn 0.20.2

The code for this chapter can be found in the GitHub repository for the book.

To download the code onto your computer, you may run the following git clone command:

$ git clone https://github.com/PacktPublishing/Neural-Network-Projects-with-Python.git

After the process is complete, there will be a folder entitled Neural-Network-Projects-with-Python. Enter the folder by running the following:

$ cd Neural-Network-Projects-with-Python

To install the required Python libraries in a virtual environment, run the following command:

$ conda env create -f environment.yml

Note that Anaconda must be installed on your computer before you run this command. To enter the virtual environment, run the following command:

$ conda activate...

Sequential problems in machine learning

Sequential problems are a class of problems in machine learning in which the order of the features presented to the model is important for making predictions. Sequential problems are commonly encountered in the following scenarios:

  • NLP, including sentiment analysis, language translation, and text prediction
  • Time series predictions

For example, let's consider the text prediction problem, as shown in the following screenshot, which falls under NLP:

Human beings have an innate ability for this, and it is trivial for us to know that the word in the blank is probably the word Japanese. The reason for this is that as we read the sentence, we process the words as a sequence. The sequence of the words captures the information required to make the prediction. By contrast, if we discard the sequential information and only consider the words...
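To make the point concrete, here is a minimal Python sketch; the two reviews are our own illustrative examples, not taken from the chapter. A bag-of-words representation, which only counts words, cannot tell the two reviews apart, while the ordered list of words can:

```python
from collections import Counter


def bag_of_words(sentence):
    """Order-free representation: maps each word to its count."""
    return Counter(sentence.lower().split())


def as_sequence(sentence):
    """Order-preserving representation: the words in reading order."""
    return sentence.lower().split()


a = "the movie was good not bad"
b = "the movie was bad not good"

print(bag_of_words(a) == bag_of_words(b))  # True: identical word counts
print(as_sequence(a) == as_sequence(b))    # False: the word order differs
```

Because both reviews contain exactly the same words, any model that sees only word counts must give them the same prediction, even though "not bad" and "not good" mean different things.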

NLP and sentiment analysis

NLP is a subfield of artificial intelligence (AI) that is concerned with the interaction between computers and human languages. As early as the 1950s, scientists were interested in designing intelligent machines that could understand human languages. Early efforts to create a language translator focused on the rule-based approach, where a group of linguistic experts handcrafted a set of rules to be encoded in machines. However, this rule-based approach produced sub-optimal results, and it was often impossible to convert these rules from one language to another, which made scaling up difficult. For many decades, little progress was made in NLP, and human language remained a goal that AI couldn't reach, until the resurgence of deep learning.

With the proliferation of deep learning and neural networks in the image classification...

RNN

Up until now, we have used neural networks such as the MLP, feedforward neural network, and CNN in our projects. The limitation of these neural networks is that they accept only a fixed-size input vector (such as an image) and output another fixed-size vector. The high-level architecture of these neural networks can be summarized by the following diagram:

This restrictive architecture makes it difficult for these networks to work with sequential data. To work with sequential data, a neural network needs to take in one piece of the data at each time step, in the order in which it appears. This is the idea behind an RNN, whose high-level architecture is shown in the following diagram:

From the previous diagram, we can see that an RNN is a multi-layered neural network. We can break up the raw input, splitting it into time steps. For example, if the raw input is a sentence, we can...
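The unrolled diagram can be sketched in a few lines of NumPy. This is a minimal illustration assuming the standard tanh recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the weight names, sizes, and random values are hypothetical, and this is not the Keras implementation. Note that the same weights are applied at every time step, which is why the unrolled network looks like several copies of one layer:

```python
import numpy as np


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a simple (Elman-style) RNN over a sequence.

    inputs: array of shape (T, input_dim), one row per time step.
    Returns the hidden state after every step, shape (T, hidden_dim).
    """
    hidden_dim = W_hh.shape[0]
    h = np.zeros(hidden_dim)          # initial hidden state h_0
    states = []
    for x_t in inputs:                # consume the sequence in order
        # the same W_xh, W_hh and b_h are reused at every time step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 3, 4    # hypothetical sizes
X = rng.normal(size=(T, input_dim))
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

H = rnn_forward(X, W_xh, W_hh, b_h)
print(H.shape)  # (5, 4)
```

Because the hidden state is carried from one step to the next, feeding the same inputs in a different order generally produces a different final state, which is exactly the sensitivity to order that sequential problems need.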

The LSTM network

LSTMs are a variation of RNNs, and they solve the long-term dependency problem faced by conventional RNNs. Before we dive into the technicalities of LSTMs, it is useful to understand the intuition behind them.

LSTMs – the intuition

As we explained in the previous section, LSTMs were designed to overcome the problem with long-term dependencies. Let's assume we have this movie review:

Our task is to predict whether the reviewer liked the movie. As we read this review, we immediately understand that this review is positive. In particular, the following words (highlighted) are the most important:

If we think about it, only the highlighted words are important, and we can ignore the rest of the words...
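An LSTM keeps a separate cell state and uses gates to decide what to forget, what to write, and what to expose as output. The sketch below shows one step of a standard LSTM cell in NumPy. The gate ordering [forget, input, candidate, output] in the stacked weight matrices is a choice made here for readability (Keras stacks its gates in its own order), and all sizes and weights are hypothetical:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) bias,
    stacked (in this sketch) as [forget, input, candidate, output].
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: how much old cell state to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new information to write
    g = np.tanh(z[2 * H:3 * H])    # candidate values that could be written
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell to expose
    c_t = f * c_prev + i * g       # the cell state carries long-term information
    h_t = o * np.tanh(c_t)         # the hidden state passed to the next step
    return h_t, c_t


rng = np.random.default_rng(1)
D, H = 3, 2                        # hypothetical input and hidden sizes
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(6, D)):   # a sequence of 6 time steps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (2,) (2,)
```

Pushing the forget gate close to 1 and the input gate close to 0 lets the cell state pass through a step almost unchanged; this is the mechanism that allows an LSTM to carry information, such as an early signal that a review is positive, across many unimportant words.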

The IMDb movie reviews dataset

At this point, let's take a quick look at the IMDb movie reviews dataset before we start building our model. It is always good practice to understand our data before we build our model.

The IMDb movie reviews dataset is a corpus of movie reviews posted on the popular movie reviews website https://www.imdb.com/. Each movie review has a label indicating whether the review is positive (1) or negative (0).

The IMDb movie reviews dataset is provided in Keras, and we can import it by simply calling the following code:

from keras.datasets import imdb
training_set, testing_set = imdb.load_data(index_from = 3)
X_train, y_train = training_set
X_test, y_test = testing_set

We can print out the first movie review as follows:

print(X_train[0])

We'll see the following output:

[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36...
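Each integer stands for a word. Keras provides imdb.get_word_index(), which maps each word to its frequency rank (1 for the most common word). With index_from=3, load_data shifts every rank by 3 so that the lowest indices are free for special purposes: 0 is left unused and serves naturally as a padding value, 1 marks the start of a review (start_char), and 2 replaces out-of-vocabulary words (oov_char). The sketch below decodes a review using a small hypothetical word index in place of the real one, which has to be downloaded:

```python
# Hypothetical stand-in for imdb.get_word_index(): word -> frequency rank
word_index = {"the": 1, "and": 2, "a": 3, "of": 4, "this": 11, "was": 13, "film": 19}

INDEX_FROM = 3  # matches index_from=3 passed to imdb.load_data()

# Shift every rank by INDEX_FROM, then name the reserved low indices
index_to_word = {rank + INDEX_FROM: word for word, rank in word_index.items()}
index_to_word[0] = "<PAD>"
index_to_word[1] = "<START>"
index_to_word[2] = "<UNK>"


def decode_review(encoded):
    """Map a list of integer indices back to a readable string."""
    return " ".join(index_to_word.get(i, "<UNK>") for i in encoded)


print(decode_review([1, 14, 22, 16]))  # <START> this film was
```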

Representing words as vectors

So far, we have looked at what RNNs and LSTM networks represent. There remains an important question we need to address: how do we represent words as input data for our neural network? In the case of CNNs, we saw how images are essentially three-dimensional arrays, with dimensions given by the image width, the image height, and the number of channels (three channels for color images). The values in the array represent the intensity of each pixel in each channel.

One-hot encoding

How do we create a similar vector/matrix for words so that they can be used as input to our neural network? In earlier chapters, we saw how categorical variables such as the day of the week can be one-hot encoded to numerical...
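As a minimal sketch of where this leads, the code below one-hot encodes words from a hypothetical four-word vocabulary. With a realistic vocabulary (for example, the 10,000 words kept by num_words = 10000 later in this chapter), each vector would have 10,000 entries, almost all of them zero. The second half illustrates why an embedding layer is a natural replacement: multiplying a one-hot vector by an embedding matrix simply selects one row of that matrix, so each word ends up as a short dense vector. The matrix values here are random; in a real model they are learned during training:

```python
import numpy as np

vocab = ["good", "bad", "movie", "plot"]          # hypothetical vocabulary
word_to_idx = {w: i for i, w in enumerate(vocab)}


def one_hot(word):
    """Vector of length len(vocab) with a single 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[word_to_idx[word]] = 1.0
    return vec


print(one_hot("movie"))  # [0. 0. 1. 0.]

# An embedding is, in effect, a trainable matrix with one row per word.
# Multiplying a one-hot vector by it just picks out that word's row,
# which is why frameworks implement embeddings as a cheap table lookup.
embedding_dim = 2                                  # hypothetical size
E = np.random.default_rng(0).normal(size=(len(vocab), embedding_dim))
print(np.allclose(one_hot("movie") @ E, E[word_to_idx["movie"]]))  # True
```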

Model architecture

Let's take a look at the model architecture of our IMDb movie review sentiment analyzer, shown in the following diagram:

This should be fairly familiar to you by now! Let's go through each component briefly.
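The consolidated listing at the end of this chapter imports an Embedding layer, an LSTM layer, and a Dense layer. One quick way to get a feel for the size of such a model is to count its trainable parameters. The sketch below assumes a 10,000-word vocabulary, a 128-dimensional embedding, 128 LSTM units, and a single sigmoid output unit; the embedding and LSTM sizes are hypothetical and not necessarily the values used in the book:

```python
def embedding_params(vocab_size, embedding_dim):
    # one trainable vector per word in the vocabulary
    return vocab_size * embedding_dim


def lstm_params(input_dim, units):
    # 4 gate blocks, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # a weight per input for each unit, plus one bias per unit
    return input_dim * units + units


# Hypothetical sizes; substitute the values used in your own model
VOCAB, EMB_DIM, LSTM_UNITS = 10000, 128, 128

sizes = {
    "Embedding": embedding_params(VOCAB, EMB_DIM),
    "LSTM": lstm_params(EMB_DIM, LSTM_UNITS),
    "Dense (sigmoid output)": dense_params(LSTM_UNITS, 1),
}
for name, n in sizes.items():
    print(f"{name:24s}{n:>10,d}")
print(f"{'Total':24s}{sum(sizes.values()):>10,d}")
```

With these sizes the embedding layer dominates (1,280,000 parameters), the LSTM layer has 131,584, and the output layer has 129, for 1,411,713 in total. The counts should agree with what model.summary() reports for standard Keras layers of the same sizes.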

Input

The input to our neural network will be IMDb movie reviews, in the form of English sentences. As we've seen, the dataset provided in Keras has already encoded the English words into numbers, as neural networks require numerical inputs. However, there remains a problem we need to address. Movie reviews have different lengths, so if we were to represent each review as a vector, different reviews would have different vector lengths, which is not...
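The usual fix, used later in this chapter through sequence.pad_sequences with maxlen=100, is to pad or truncate every review to the same length. The pure-Python function below is a hypothetical stand-in, not the Keras implementation. It mimics the pad_sequences defaults as we understand them (padding='pre' and truncating='pre'): zeros are added at the front of short reviews, and long reviews keep only their last maxlen words:

```python
def pad_sequences_sketch(sequences, maxlen, value=0):
    """Pad or truncate each sequence to exactly `maxlen` items (maxlen > 0)."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]             # drop the earliest items
        padded.append([value] * (maxlen - len(seq)) + seq)  # left-pad short ones
    return padded


reviews = [[1, 14, 22], [1, 5, 6, 7, 8, 9]]
print(pad_sequences_sketch(reviews, maxlen=4))
# [[0, 1, 14, 22], [6, 7, 8, 9]]
```

Padding with 0 works because load_data does not assign index 0 to any word, so the model can learn to treat it as "no word here".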

Model building in Keras

We're finally ready to start building our model in Keras. As a reminder, the model architecture that we're going to use is shown in the previous section.

Importing data

First, let's import the dataset. The IMDb movie reviews dataset is already provided in Keras, so we can import it directly:

from keras.datasets import imdb

The imdb module provides a load_data function, which takes the following important argument:

  • num_words: This is defined as the maximum number of unique words to be loaded. Only the n most common unique words (as they appear in the dataset) will be loaded. A smaller n makes training faster, at the expense of accuracy. Let's set num_words = 10000...
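To see what num_words does to the data, here is a rough sketch of its effect. It assumes Keras's default behavior as we understand it, in which any word index at or above num_words is replaced by the out-of-vocabulary marker oov_char (2 by default). The function name cap_vocabulary is hypothetical:

```python
def cap_vocabulary(sequences, num_words, oov_char=2):
    """Replace every index >= num_words with the out-of-vocabulary marker.

    Only the most frequent words (the lowest indices) survive unchanged.
    """
    return [[w if w < num_words else oov_char for w in seq] for seq in sequences]


print(cap_vocabulary([[1, 14, 22, 16, 43, 530, 973, 1622]], num_words=500))
# [[1, 14, 22, 16, 43, 2, 2, 2]]
```

Capping the vocabulary also bounds the input dimension of the embedding layer, since no index can exceed num_words - 1.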

Analyzing the results

Let's plot the training and validation accuracy per epoch for the three different models. First, we plot the results for the model trained using the sgd optimizer:

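from matplotlib import pyplot as plt

# SGD_score is assumed to be the History object returned by model.fit()
# for the SGD-trained model; Keras 2.2.4 stores accuracy under 'acc' and
# 'val_acc' (newer versions use 'accuracy' and 'val_accuracy')
plt.plot(range(1, 11), SGD_score.history['acc'], label='Training Accuracy')
plt.plot(range(1, 11), SGD_score.history['val_acc'],
         label='Validation Accuracy')
plt.axis([1, 10, 0, 1])  # 10 epochs on the x axis, accuracy from 0 to 1
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Train and Validation Accuracy using SGD Optimizer')
plt.legend()
plt.show()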

We get the following output:

Did you notice anything wrong? The training and validation accuracy are both stuck at 50%! Essentially, this shows that the training has failed and our neural network performs no better than a random coin toss for this binary classification task. Clearly, the sgd optimizer is not...
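Why does 50% signal failure? On a binary task with balanced classes, a model that ignores its input entirely scores about 50%, whether it always predicts the same class or guesses at random. The sketch below checks this on synthetic, perfectly balanced labels (not the IMDb data):

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic balanced labels: half negative (0), half positive (1)
y_true = np.array([0, 1] * 5000)

constant_pred = np.ones_like(y_true)                   # always predict "positive"
coin_toss_pred = rng.integers(0, 2, size=y_true.size)  # guess at random

print((constant_pred == y_true).mean())           # 0.5
print(round((coin_toss_pred == y_true).mean(), 2))  # roughly 0.5
```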

Putting it all together

We have covered a lot in this chapter. Let's consolidate all our code here:

from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM
from matplotlib import pyplot as plt
from sklearn.metrics import confusion_matrix
import seaborn as sns

# Import IMDB dataset
training_set, testing_set = imdb.load_data(num_words = 10000)
X_train, y_train = training_set
X_test, y_test = testing_set

print("Number of training samples = {}".format(X_train.shape[0]))
print("Number of testing samples = {}".format(X_test.shape[0]))

# Zero-Padding
X_train_padded = sequence.pad_sequences(X_train, maxlen=100)
X_test_padded = sequence.pad_sequences(X_test, maxlen=100)

print("X_train vector shape = {}".format...
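The listing imports confusion_matrix from scikit-learn, but the part that uses it is cut off above. As a hedged sketch of the underlying computation, the NumPy function below (a hypothetical helper) turns sigmoid outputs into class predictions with an assumed threshold of 0.5 and counts the four outcomes, using the same [[TN, FP], [FN, TP]] layout that sklearn.metrics.confusion_matrix returns for labels 0 and 1. The labels and probabilities are made up for illustration:

```python
import numpy as np


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return [[TN, FP], [FN, TP]] for binary labels and predicted probabilities."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)  # probability -> class
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    return np.array([[tn, fp], [fn, tp]])


# Made-up labels and sigmoid outputs (not results from the book)
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.1, 0.7, 0.8, 0.4, 0.9, 0.2]
print(confusion_counts(y_true, y_prob))
# [[2 1]
#  [1 2]]
```

The diagonal holds the correct predictions, so accuracy is the diagonal sum divided by the total: 4 out of 6 in this toy example.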

Summary

In this chapter, we created an LSTM-based neural network that can predict the sentiment of movie reviews with 85% accuracy. We first looked at the theory behind recurrent neural networks and LSTMs, and we understood that they are a special class of neural network designed to handle sequential data, where the order of the data matters.

We also looked at how we can convert sequential data such as a paragraph of text into a numerical vector, as input for neural networks. We saw how word embeddings can reduce the dimensionality of such a numerical vector into something more manageable for training neural networks, without necessarily losing information. A word embedding layer does this by learning which words are similar to one another and placing such words close together in the transformed vector space.

We also looked at how we can easily construct an LSTM neural network in...

Questions

  1. What are sequential problems in machine learning?

Sequential problems are a class of problems in machine learning in which the order of the features presented to the model is important for making predictions. Examples of sequential problems include NLP problems (for example, speech and text) and time series problems.

  2. What are some reasons that make it challenging for AI to solve sentiment analysis problems?

Human languages often contain words that have different meanings, depending on the context. It is therefore important for a machine learning model to fully understand the context before making a prediction. Furthermore, sarcasm is common in human languages, which is difficult for an AI-based model to comprehend.

  3. How is an RNN different from a CNN?

RNNs can be thought of as multiple, recursive copies of a single neural network. Each layer in an RNN passes its...
