Automated Machine Learning with AutoKeras

Product type: Book
Published: May 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800567641
Edition: 1st Edition
Author: Luis Sobrecueva

Luis Sobrecueva is a senior software engineer and ML/DL practitioner currently working at Cabify. He has been a contributor to the OpenAI project as well as one of the contributors to the AutoKeras project.

Chapter 7: Sentiment Analysis Using AutoKeras

Let's start by defining the unusual term in the title. Sentiment analysis is a term widely used in text classification: it means using natural language processing (NLP) together with machine learning (ML) to interpret and classify the emotions expressed in text.

To get an idea of this, let's imagine the task of determining whether a review of a film is positive or negative. You could do this yourself just by reading it, right? However, if our boss sends us a list of 1,000 movie reviews to classify by tomorrow, things become complicated. That's where sentiment analysis becomes an interesting option.

In this chapter, we will use a text classifier to extract sentiments from text data. Most of the concepts of text classification were already explained in Chapter 5, Text Classification and Regression Using AutoKeras, so in this chapter, we will apply them in a practical way by implementing a sentiment predictor...

Technical requirements

All the code examples in this book are available as Jupyter notebooks that can be downloaded from https://github.com/PacktPublishing/Automated-Machine-Learning-with-AutoKeras.

Since code cells can be executed, each notebook can install its own requirements; you just need to add a code snippet that installs what you need. For this reason, each notebook begins with an environment setup cell that installs AutoKeras and its dependencies.

So, to run the code examples for this chapter, you only need a computer running Ubuntu Linux with Jupyter Notebook installed, which you can set up with the following command:

$ apt-get install python3-pip jupyter-notebook

Alternatively, you can also run these notebooks using Google Colaboratory, in which case you will only need a web browser. See the AutoKeras with Google Colaboratory section of Chapter 2, Getting Started with AutoKeras, for more details. Furthermore, in the Installing AutoKeras section of that chapter...

Creating a sentiment analyzer

The model we are going to create will be a binary sentiment classifier (1 = Positive, 0 = Negative) trained on the IMDb sentiments dataset. This is a dataset for binary sentiment classification that contains 25,000 sentiment-labeled movie reviews for training and 25,000 for testing:

Figure 7.1 – Example of sentiment analysis being used on two samples

Similar to the Reuters example from Chapter 5, Text Classification and Regression Using AutoKeras, each review is encoded as a list of word indexes (integers). For convenience, words are indexed by their overall frequency in the dataset. So, for instance, the integer 3 encodes the third most frequent word in the data.
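As a rough illustration of this frequency-based encoding, the following sketch builds a word index from a tiny made-up corpus and uses it to encode and decode a review. The corpus and the helper names (build_word_index, encode, decode) are assumptions for illustration only. The real IMDb data ships pre-encoded and reserves a few low indexes for special tokens such as padding and unknown words, which this sketch ignores.

```python
from collections import Counter


def build_word_index(corpus):
    """Rank words by overall frequency: index 1 is the most frequent word."""
    counts = Counter(word for text in corpus for word in text.split())
    # Sort by descending count, then alphabetically so ties are deterministic
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ranked, start=1)}


def encode(text, word_index):
    """Turn a sentence into a list of integer word indexes."""
    return [word_index[word] for word in text.split()]


def decode(indexes, word_index):
    """Turn a list of integer word indexes back into a sentence."""
    index_word = {rank: word for word, rank in word_index.items()}
    return " ".join(index_word[i] for i in indexes)


# Hypothetical toy corpus, not taken from the IMDb dataset
corpus = [
    "the movie was great",
    "the movie was bad",
    "the plot was great",
    "the ending",
]
word_index = build_word_index(corpus)
encoded = encode("the movie was great", word_index)
print(encoded)                      # [1, 4, 2, 3]
print(decode(encoded, word_index))  # the movie was great
```

In this toy corpus, "the" is the most frequent word and gets index 1, while "great" is the third most frequent and gets index 3.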

The notebook that contains the complete source code can be found at https://github.com/PacktPublishing/Automated-Machine-Learning-with-AutoKeras/blob/main/Chapter07/Chapter7_IMDB_sentiment_analysis.ipynb.

Now, let's have a look at the relevant...

Creating the sentiment predictor

Now, we will use the AutoKeras TextClassifier to find the best classification model. Just for this example, we will set max_trials (the maximum number of different Keras models to try) to 2. We do not need to set the epochs parameter; instead, we define an EarlyStopping callback with a patience of 2, so that training stops if the validation loss does not improve for two consecutive epochs:

clf = ak.TextClassifier(max_trials=2)
cbs = [tf.keras.callbacks.EarlyStopping(patience=2)]
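To make the effect of patience=2 concrete, here is a minimal pure-Python sketch of patience-based stopping. It is a simplified model of the idea, not the Keras implementation; the function name epochs_run and the validation-loss values are made up for illustration.

```python
def epochs_run(val_losses, patience=2):
    """Return how many epochs run before patience-based stopping triggers."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: remember it and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # stop after this epoch
    return len(val_losses)  # never stopped early


# Hypothetical validation losses, one value per epoch
losses = [0.45, 0.31, 0.28, 0.30, 0.29, 0.27]
print(epochs_run(losses))  # 5
```

Here the loss stops improving after epoch 3, so training halts at the end of epoch 5 and the sixth epoch never runs.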

Let's run the training process and search for the optimal classifier for the training dataset:

clf.fit(x_train, y_train, callbacks=cbs)

Here is the output:

Figure 7.3 – Notebook output of text classifier training

The previous output shows that the accuracy on the training dataset is increasing.

As we can see, we are getting a loss of 0.28 on the validation set. This isn't bad just for a few minutes of training...

Evaluating the model

Now, it's time to evaluate the best model with the testing dataset:

clf.evaluate(x_test, y_test)

Here is the output:

782/782 [==============================] - 41s 52ms/step - loss: 0.3118 - accuracy: 0.8724
 
[0.31183066964149475, 0.8723599910736084]

The two values returned are the loss and the accuracy on the test set. As we can see, an accuracy of 0.8724 (about 87%) is a really good final result for the time we've invested.
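As a hedged illustration of what a loss and an accuracy value like these measure, the sketch below computes binary cross-entropy and accuracy by hand for four made-up predictions. It assumes a binary cross-entropy loss and a 0.5 decision threshold, which the chapter does not state explicitly; the helper names and the sample values are hypothetical.

```python
import math


def binary_cross_entropy(y_true, y_prob):
    """Average negative log-likelihood of the true labels."""
    eps = 1e-7  # keep probabilities away from 0 and 1 to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)


# Hypothetical labels and predicted probabilities of the positive class
y_true = [1, 0, 1, 1]
y_prob = [0.9, 0.2, 0.4, 0.8]

print(round(binary_cross_entropy(y_true, y_prob), 3))  # 0.367
print(accuracy(y_true, y_prob))                        # 0.75
```

The third sample is misclassified (0.4 is below the threshold), which lowers the accuracy to 0.75 and contributes most of the loss.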

Visualizing the model

Now, we can view a brief summary of the architecture of the best generated model:

model = clf.export_model()
model.summary()

Here is the output:

Figure 7.4 – Best model architecture summary

As we can see, AutoKeras has chosen a convolutional model (Conv1D) for this task, just as it did in the classification example in Chapter 5, Text Classification and Regression Using AutoKeras. As we explained at the beginning of that chapter, this kind of architecture works really well when the order of the input sentences is not important for the prediction; there are no correlations between the different movie reviews.

Here is a visual representation of this:

Figure 7.5 – Best model architecture visualization graph

As you already know, generating the models and choosing the best one is done by AutoKeras automatically, but let's explain these blocks in more detail.

Each block represents...
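To give some intuition for what a one-dimensional convolution does with a sequence, here is a minimal pure-Python sketch. It slides a small filter over a sequence and then keeps the strongest response, so the same local pattern is detected wherever it occurs. The filter weights, the input sequences, and the use of global max pooling are assumptions for illustration; they are not taken from the exported AutoKeras model.

```python
def conv1d_valid(seq, kernel):
    """Slide the kernel over the sequence with no padding and stride 1."""
    k = len(kernel)
    return [
        sum(seq[i + j] * kernel[j] for j in range(k))
        for i in range(len(seq) - k + 1)
    ]


def global_max_pool(features):
    """Keep only the strongest filter response over the whole sequence."""
    return max(features)


# Hypothetical filter that responds to the local pattern [1, 0, 1]
kernel = [1.0, -1.0, 1.0]

# The same pattern placed at the start and at the end of a sequence
early = [1, 0, 1, 0, 0, 0, 0]
late = [0, 0, 0, 0, 1, 0, 1]

print(conv1d_valid(early, kernel))                   # [2.0, -1.0, 1.0, 0.0, 0.0]
print(global_max_pool(conv1d_valid(early, kernel)))  # 2.0
print(global_max_pool(conv1d_valid(late, kernel)))   # 2.0
```

Both sequences yield the same pooled value, which shows how a convolutional filter followed by pooling reacts to the presence of a pattern rather than to its position.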

Analyzing the sentiment in specific sentences

Now, let's take a look at some predicted samples from the test set:

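import tensorflow as tf
# Silence TensorFlow warnings so that only our printed results are shown
tf.get_logger().setLevel('ERROR')

def get_sentiment(val):
    # Map the numeric class (1/0) to a readable label
    return "Positive" if val == 1 else "Negative"

# Compare the true and predicted labels for the first 10 test reviews
for i in range(10):
    print(x_test[i])
    # predict() expects a batch, so pass a one-element slice
    prediction = clf.predict(x_test[i:i+1])[0][0]
    print("label: %s, prediction: %s" % (get_sentiment(y_test[i][0]), get_sentiment(prediction)))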

Here is the output of the preceding code:

Figure 7.6 – Some predictions based on the first 10 sentences of the test dataset

As you can see, the model predictions match every label for the first 10 samples in the test dataset.

Summary

In this chapter, we learned about the importance of sentiment analysis in the real world, as well as how to extract sentiments from text data and how to implement a sentiment predictor in just a few lines of code.

In the next chapter, we will cover a very interesting topic: we will use AutoKeras to classify news topics based on their content by using a text classifier.
