You're reading from Automated Machine Learning with AutoKeras

Product type: Book

Published in: May 2021

Publisher: Packt

ISBN-13: 9781800567641

Pages: 194 pages

Edition: 1st Edition

Languages: Python

Concepts: Deep Learning

Author (1): Luis Sobrecueva

Chapter 5: Text Classification and Regression Using AutoKeras

In this chapter, we will focus on the use of AutoKeras to work with text (a sequence of words).

In the previous chapter, we saw that there is a specialized type of network suitable for image processing, called a convolutional neural network (CNN). In this chapter, we will see what recurrent neural networks (RNNs) are and how they work. An RNN is a type of neural network that is well suited to working with text.

We will also use a classifier and a regressor to solve text-based tasks. By the end of the chapter, you will have learned how to use AutoKeras to solve a wide variety of text-based problems, such as extracting emotions from tweets, detecting spam in emails, and so on.

In this chapter, we will cover the following topics:

Working with text data
Understanding RNNs—what are these neural networks and how do they work?
One-dimensional CNNs (Conv1D)
Creating an email spam detector...

Technical requirements

All coding examples in this book are available as Jupyter notebooks that can be downloaded from the following link: https://github.com/PacktPublishing/Automated-Machine-Learning-with-AutoKeras.

Because code cells can be executed, each notebook can install its own requirements through a code snippet. For this reason, each notebook begins with a code cell for environment setup, which installs AutoKeras and its dependencies.

So, to run the coding examples, you only need a computer running Ubuntu Linux, on which you can install Jupyter Notebook with the following command line:

$ apt-get install python3-pip jupyter-notebook

Alternatively, you can also run these notebooks using Google Colaboratory, in which case you will only need a web browser. For further details, see the AutoKeras with Google Colaboratory section in Chapter 2, Getting Started with AutoKeras. Furthermore, in the Installing AutoKeras...

Working with text data

AutoKeras allows us to quickly and easily create high-performance models for solving text-based tasks.

Text is an excellent source of information to feed DL models. There is a multitude of text-based sources, such as social media, chats, emails, articles, and books, and countless text-based tasks to automate, such as the following:

Translation: Convert source text in one language to text in another language.
Conversational bots: Simulate human conversation using ML models.
Sentiment analysis: Classification of emotions by analyzing text data.
Spam classifiers: Email classification using machine learning models.
Document summarizers: Generate summaries of documents automatically.
Text generators: Generate text from scratch automatically.

As with other types of data, AutoKeras will do all the preprocessing so that we can pass the text directly to our model, but before starting with the practical examples, let...
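To give an idea of what such preprocessing involves, the following is a minimal sketch in plain Python of turning raw text into fixed-length sequences of integers. It is an illustration under our own assumptions, not the preprocessing that AutoKeras actually performs: the helper names (build_vocabulary, vectorize), the regular expression used to split words, and the reserved `<pad>` and `<unk>` entries are all hypothetical.

```python
import re


def build_vocabulary(texts):
    """Map each distinct lowercase word to an integer index.

    Index 0 is reserved for padding and index 1 for unknown words
    (both reservations are assumptions made for this sketch).
    """
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def vectorize(text, vocab, max_len):
    """Turn a text into a list of exactly max_len word indices."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    ids = [vocab.get(word, vocab["<unk>"]) for word in words][:max_len]
    # Pad short texts so that every vector has the same length
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


emails = ["Win a free prize now", "Are we still meeting now"]
vocab = build_vocabulary(emails)
vectors = [vectorize(email, vocab, max_len=6) for email in emails]
print(vectors)
```

Each email becomes a list of six integers, where a word shared by both emails (now) maps to the same index in both. A model can consume sequences of this kind, while raw strings cannot be fed to a neural network directly.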

Understanding RNNs

A common feature of all the neural networks seen so far is that they have no memory. Networks formed by fully connected or convolutional layers process each input independently, in isolation from the other inputs. An RNN, however, takes "the past" into account by using its previous output as its state. An RNN layer therefore has two inputs: the standard input for the current element of the sequence and the output produced for the previous element, as seen in the following diagram:

Figure 5.2 – RNN loop unfolded

The RNN implements this memory feature with an internal loop over the entire sequence of elements. Let's explain it with some pseudocode, as follows:

state = 0
for input in input_sequence:
    output = f(input, state)
    state = output
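The loop above can be turned into a small runnable sketch. Here we assume that f is a tanh applied to a weighted sum of the current input and the previous state, which is the form used by a simple RNN layer; the weight values and the names rnn_step and run_rnn are arbitrary choices made for illustration, not taken from the notebook.

```python
import math


def rnn_step(x, state, w_in, w_state, bias):
    """One RNN step: tanh(x . w_in + state . w_state + bias)."""
    size = len(bias)
    output = []
    for j in range(size):
        total = bias[j]
        total += sum(x[i] * w_in[i][j] for i in range(len(x)))
        total += sum(state[k] * w_state[k][j] for k in range(size))
        output.append(math.tanh(total))
    return output


def run_rnn(input_sequence, w_in, w_state, bias):
    """Run the loop over a whole sequence, feeding each output back as state."""
    state = [0.0] * len(bias)
    outputs = []
    for x in input_sequence:
        output = rnn_step(x, state, w_in, w_state, bias)
        outputs.append(output)
        state = output  # the previous output becomes the memory
    return outputs


# Arbitrary weights: 2 input features, 2 state units (assumed values)
w_in = [[0.5, -0.3], [0.2, 0.8]]
w_state = [[0.1, 0.4], [-0.2, 0.3]]
bias = [0.0, 0.0]

# The same vector appears twice in the sequence
outputs = run_rnn([[1.0, 0.0], [1.0, 0.0]], w_in, w_state, bias)
print(outputs)
```

Although both elements of the sequence are identical, the two outputs differ, because the second step also receives the output of the first one as its state. A network without memory would produce the same output for both.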

There are several types of RNN architectures with much more...

One-dimensional CNNs (Conv1D)

Another architecture to take into account when working with text is the one-dimensional CNN (Conv1D). It is based on the same principle as the 2D CNN that we saw in Chapter 4, Image Classification and Regression Using AutoKeras: the network learns patterns through filters, but here the filters slide along a text sequence rather than across an image.

An example of a one-dimensional CNN is shown in the following diagram:

Figure 5.3 – 1D convolution over text sequences

Note that when the order of the elements in the sequence matters for the prediction, RNNs are much more effective. For this reason, one-dimensional CNNs are often combined with RNNs to create high-performance models. The exhaustive search performed by AutoKeras takes both into account to find the best model.
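The filter mechanism described in this section can be sketched in a few lines of plain Python. This is a simplified illustration with a single hand-written filter and a ReLU activation, both of which are our own assumptions; in a real Conv1D layer there are many filters and their weights are learned during training. The token vectors below are made-up two-dimensional embeddings.

```python
def conv1d(sequence, kernel, bias=0.0):
    """Slide one filter over a sequence of token vectors (no padding).

    sequence: list of T vectors of dimension D
    kernel:   list of K weight vectors of dimension D (the filter width is K)
    Returns T - K + 1 features, one per window position.
    """
    width = len(kernel)
    features = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = bias
        for vector, weights in zip(window, kernel):
            total += sum(v * w for v, w in zip(vector, weights))
        features.append(max(0.0, total))  # ReLU activation
    return features


# Five tokens, each represented by a made-up 2-dimensional embedding
sequence = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0]]

# A width-2 filter that responds most strongly to the pattern
# "[1, 0] followed by [0, 1]"
kernel = [[1, 0], [0, 1]]

features = conv1d(sequence, kernel)
print(features)       # one activation per window position
print(max(features))  # global max pooling keeps the strongest match
```

The filter gives its highest activation at the position where its pattern occurs, wherever that is in the sequence. This is why a 1D convolution is good at spotting local patterns such as short phrases, while it carries no memory of what came earlier in the text.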

Now, let's put the learned concepts into practice with some practical...

Creating an email spam detector

The model we are going to create will detect spam in a dataset of emails. This is a small dataset of 5,572 emails, each labeled in a spam column.

The notebook with the complete source code can be found at the following link:

https://colab.research.google.com/github/PacktPublishing/Automated-Machine-Learning-with-AutoKeras/blob/main/Chapter05/Chapter5_SpamDetector.ipynb

Let's now have a look at the relevant cells of the notebook in detail, as follows:

Installing AutoKeras: As mentioned in other examples, the following snippet at the top of the notebook installs AutoKeras and its dependencies using the pip package manager:
```
!pip3 install autokeras
```
Importing needed packages: The following lines load tensorflow, pandas, numpy, and autokeras, along with the model_selection module from sklearn, as the dependencies for this project:
```
import tensorflow as tf
import pandas as pd 
import numpy as np
import autokeras as ak
from sklearn import model_selection...
```
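As a rough idea of where these imports lead, the following sketch splits a list of emails into training and test sets and hands the training part to an AutoKeras text classifier. It is not the notebook's code: split_dataset and train_spam_classifier are hypothetical helpers, the split is written in plain Python instead of using model_selection, and the max_trials and epochs values are arbitrary. The TextClassifier calls assume the AutoKeras 1.x API.

```python
import random


def split_dataset(texts, labels, test_fraction=0.2, seed=42):
    """Shuffle and split texts and labels, keeping each pair aligned.

    Hypothetical helper written for this sketch.
    """
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [texts[i] for i in train_idx]
    x_test = [texts[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def train_spam_classifier(x_train, y_train, max_trials=2, epochs=2):
    """Search for a text classification model (assumes AutoKeras 1.x)."""
    # Imported here so that split_dataset works without AutoKeras installed
    import numpy as np
    import autokeras as ak

    clf = ak.TextClassifier(max_trials=max_trials, overwrite=True)
    # AutoKeras expects NumPy arrays of strings and labels
    clf.fit(np.array(x_train), np.array(y_train), epochs=epochs)
    return clf
```

With the emails and their spam labels loaded into two lists, calling split_dataset and then train_spam_classifier on the training part would start the model search. Raising max_trials lets AutoKeras explore more candidate architectures at the cost of a longer run.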

Predicting news popularity in social media

In this section, we will create a model that predicts the popularity score of an article on social media platforms, based on its text. For this, we will train the model with a News Popularity dataset collected between 2015 and 2016 (https://archive.ics.uci.edu/ml/datasets/News+Popularity+in+Multiple+Social+Media+Platforms).

As we want to approximate a score (number of likes), we will use a text regressor for this task.

In the next screenshot, you can see some samples taken from this dataset:

Figure 5.8 – A few samples from the News Popularity dataset

This notebook with the complete source code can be found at https://colab.research.google.com/github/PacktPublishing/Automated-Machine-Learning-with-AutoKeras/blob/main/Chapter05/Chapter5_SpamDetector.ipynb.

We will now explain the relevant code cells of the notebook in detail, as follows:

Getting the articles dataset: Before training, we...

Summary

In this chapter, we have learned how neural networks work with text data, what recurrent neural networks are, and how they work.

We've also put these concepts into practice, using the power of AutoKeras, by implementing a spam predictor and a news popularity regressor in just a few lines of code.

Now that we have learned how to work with text, we are ready to move on to the next chapter, where we will learn how to work with structured data by implementing classification and regression models using AutoKeras.