You're reading from Advanced Deep Learning with R

Product type: Book
Published in: Dec 2019
Reading level: Expert
Publisher: Packt
ISBN-13: 9781789538779
Edition: 1st

Author: Bharatendra Rai

Bharatendra Rai is a chairperson and professor of business analytics, and the director of the Master of Science in Technology Management program at the Charlton College of Business at UMass Dartmouth. He received a Ph.D. in industrial engineering from Wayne State University, Detroit. He received a master's in quality, reliability, and OR from Indian Statistical Institute, India. His current research interests include machine learning and deep learning applications. His deep learning lecture videos on YouTube are watched in over 198 countries. He has over 20 years of consulting and training experience in industries such as software, automotive, electronics, food, chemicals, and so on, in the areas of data science, machine learning, and supply chain management.

Text Classification Using Convolutional Recurrent Neural Networks

Convolutional neural networks (CNNs) have been found to be useful for capturing high-level local features from data, while recurrent neural networks (RNNs), such as long short-term memory (LSTM) networks, have been found to be useful for capturing long-term dependencies in sequence data such as text. Using CNN and RNN layers in the same model architecture gives rise to what's called a convolutional recurrent neural network (CRNN).

This chapter illustrates how to apply convolutional recurrent neural networks to text classification problems, combining the advantages of RNN and CNN architectures. The steps involved in this process include text data preparation, defining a convolutional recurrent network model, training the model, and model assessment.

More specifically, in this chapter...

Working with the reuter_50_50 dataset

In the previous chapters, when dealing with text data, we made use of data that had already been converted into a sequence of integers for developing deep network models. In this chapter, we will use text data that needs to be converted into a sequence of integers. We will start by reading the data that we will use to illustrate how to develop a text classification deep network model. We will also explore the dataset that we'll use so that we have a better understanding of it.

In this chapter, we will make use of the keras, deepviz, and readtext libraries, as shown in the following code:

# Libraries used
library(keras)
library(deepviz)
library(readtext)

For illustrating the steps involved in developing a convolutional recurrent network model, we will make use of the reuter_50_50 text dataset, which is available from the UCI Machine Learning...
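A hedged sketch of reading the data with readtext follows. The directory path and the C50train folder layout (one subfolder per author, containing that author's articles) are assumptions about where the downloaded archive has been extracted:

```r
library(readtext)

# Read all training articles into a data frame; readtext returns
# one row per file, with a doc_id (file path) and a text column.
# Assumes the reuter_50_50 archive is unzipped into ./C50
train <- readtext("C50/C50train/*/*.txt")

# Recover each author's name from the file path to use as the label
train$author <- basename(dirname(train$doc_id))
```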

Preparing the data for model building

In this section, we will prepare the data so that we can develop an author classification model. We will start by tokenizing the text data, which is available in the form of articles, and converting it into sequences of integers. We will also assign a unique integer to identify each author. Subsequently, we will use padding and truncation so that the integer sequences representing the articles by the 50 authors all have the same length. We will end this section by partitioning the training data into train and validation datasets and then one-hot encoding the response variable.

Tokenization and converting text into a sequence of integers

We will start by carrying out tokenization...
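The tokenization, padding, and one-hot encoding steps described above can be sketched with the keras tokenizer functions. The limits of 500 words and a sequence length of 300 follow the values mentioned in this chapter, but the variable names (train$text, trainy_int) and the exact code are illustrative assumptions:

```r
library(keras)

# Build a tokenizer restricted to the 500 most frequent words
tokenizer <- text_tokenizer(num_words = 500) %>%
  fit_text_tokenizer(train$text)

# Convert each article into a sequence of word indices
trainseq <- texts_to_sequences(tokenizer, train$text)

# Pad or truncate every sequence to a common length of 300
trainx <- pad_sequences(trainseq, maxlen = 300)

# One-hot encode the integer author labels (0 to 49)
trainy <- to_categorical(trainy_int, num_classes = 50)
```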

Developing the model architecture

In this section, we will make use of convolutional and LSTM layers in the same network. The convolutional recurrent network architecture can be captured in the form of a simple flowchart:

Here, we can see that the flowchart contains embedding, convolutional 1D, max-pooling, LSTM, and dense layers. Note that the embedding layer is always the first layer in the network and is commonly used for applications involving text data. The main purpose of the embedding layer is to map each unique word (500 words in our example) to a vector that is smaller in size, whose dimension we specify using output_dim. In the convolutional layer, we will use the relu activation function. Similarly, the activation functions used for the LSTM and dense layers will be tanh and softmax, respectively.

We can use the following...
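A minimal sketch of this architecture is shown below. The output_dim, filter, kernel, and unit values are illustrative assumptions; only the layer types, their order, and the activations follow the description above:

```r
library(keras)

# Embedding -> Conv1D -> max pooling -> LSTM -> dense, as in the flowchart
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 500,     # 500 unique words
                  output_dim = 32,     # assumed embedding size
                  input_length = 300) %>%
  layer_conv_1d(filters = 32, kernel_size = 5, activation = "relu") %>%
  layer_max_pooling_1d(pool_size = 4) %>%
  layer_lstm(units = 32, activation = "tanh") %>%
  layer_dense(units = 50, activation = "softmax")  # one unit per author
```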

Compiling and fitting the model

In this section, we will compile the model and then train it with the fit function, using the training and validation datasets. We will also plot the loss and accuracy values obtained while training the model.

Compiling the model

For compiling the model, we will use the following code:

# Compile model
model %>% compile(optimizer = "adam",
                  loss = "categorical_crossentropy",
                  metrics = c("acc"))

Here, we've specified the adam optimizer. We're using categorical_crossentropy as the loss function since the labels are based on 50 authors. For the metrics, we've specified the accuracy of the author's classification...
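Fitting the compiled model can be sketched as follows; the epoch and batch-size values, and the validx/validy names for the validation partition, are assumptions:

```r
# Train with the training data, monitoring the validation partition
history <- model %>% fit(trainx, trainy,
                         epochs = 30,
                         batch_size = 32,
                         validation_data = list(validx, validy))

# Plot the loss and accuracy curves for training and validation
plot(history)
```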

Evaluating the model and predicting classes

In this section, we will evaluate the model based on our training and test data. We will obtain the accuracy with which each author is correctly classified, using a confusion matrix for the training and test data to gain further insights. We will also use bar plots to visualize the accuracy of identifying each author.

Model evaluation with training data

First, we will evaluate the model's performance using training data. Then, we will use the model to predict the class representing each of the 50 authors. The code for evaluating the model is as follows:

# Loss and accuracy
model %>% evaluate(trainx, trainy)

$loss
[1] 1.45669

$acc
[1] 0.5346288

Here, we can see that, by using the training data...
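Class prediction and the confusion matrix for the training data can be sketched as follows (predict_classes is the older keras interface for obtaining class predictions, consistent with this book's era; the trainy_int name for the integer labels is an assumption):

```r
# Predicted author classes (integers 0 to 49)
pred <- model %>% predict_classes(trainx)

# Confusion matrix of actual versus predicted authors
tab <- table(Actual = trainy_int, Predicted = pred)

# Per-author accuracy from the diagonal of the row-normalized matrix
diag(prop.table(tab, 1))
```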

Performance optimization tips and best practices

In this section, we will explore changes that we can make to the model architecture and other settings to improve author classification performance. We will carry out two experiments. In both experiments, we will increase the number of most frequent words from 500 to 1,500, increase the length of the integer sequences from 300 to 400, and add a dropout layer after the pooling layer.

Experimenting with reduced batch size

The code that we'll be using for this experiment is as follows:

# Model architecture
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 1500,
                  output_dim...
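A complete version of this modified architecture might look like the following sketch. The dropout rate, embedding size, and layer sizes are illustrative assumptions; only the 1,500-word limit, the 400-length sequences, and the dropout layer after pooling follow the experiment description above:

```r
library(keras)

model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 1500,     # 1,500 most frequent words
                  output_dim = 32,      # assumed embedding size
                  input_length = 400) %>%
  layer_conv_1d(filters = 32, kernel_size = 5, activation = "relu") %>%
  layer_max_pooling_1d(pool_size = 4) %>%
  layer_dropout(rate = 0.25) %>%        # dropout added after pooling
  layer_lstm(units = 32, activation = "tanh") %>%
  layer_dense(units = 50, activation = "softmax")
```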

Summary

In this chapter, we illustrated the steps for developing a convolutional recurrent neural network for author classification based on articles that they have written. Convolutional recurrent neural networks combine the advantages of two networks into one network. On one hand, convolutional networks can capture high-level local features from the data, while, on the other hand, recurrent networks can capture long-term dependencies in the data involving sequences.

First, convolutional recurrent neural networks extract features using a one-dimensional convolutional layer. These extracted features are then passed to the LSTM recurrent layer to obtain hidden long-term dependencies, which are then passed to a fully connected dense layer. This dense layer obtains the probability of the correct classification of each author based on the data in the articles. Although we used a convolutional...
