Reader small image

You're reading from  Natural Language Processing with TensorFlow

Product typeBook
Published inMay 2018
Reading LevelBeginner
PublisherPackt
ISBN-139781788478311
Edition1st Edition
Languages
Right arrow
Authors (2):
Thushan Ganegedara
Thushan Ganegedara
author image
Thushan Ganegedara

Thushan is a seasoned ML practitioner with 4+ years of experience in the industry. Currently he is a senior machine learning engineer at Canva; an Australian startup that founded the online visual design software, Canva, serving millions of customers. His efforts are particularly concentrated in the search and recommendations group working on both visual and textual content. Prior to Canva, Thushan was a senior data scientist at QBE Insurance; an Australian Insurance company. Thushan was developing ML solutions for use-cases related to insurance claims. He also led efforts in developing a Speech2Text pipeline there. He obtained his PhD specializing in machine learning from the University of Sydney in 2018.
Read more about Thushan Ganegedara

View More author details
Right arrow

Chapter 5. Sentence Classification with Convolutional Neural Networks

In this chapter, we will discuss a type of neural networks known as Convolutional Neural Networks (CNNs). CNNs are quite different from fully connected neural networks and have achieved state-of-the-art performance in numerous tasks. These tasks include image classification, object detection, speech recognition, and of course, sentence classification. One of the main advantages of CNNs is that compared to a fully connected layer, a convolution layer in a CNN has a much smaller number of parameters. This allows us to build deeper models without worrying about memory overflow. Also, deeper models usually lead to better performance.

We will introduce you to what a CNN is in detail by discussing different components found in a CNN and what makes CNNs different from their fully connected counterparts. Then we will discuss the various operations used in CNNs, such as the convolution and pooling operations, and certain hyperparameters...

Introducing Convolution Neural Networks


In this section, you will learn about CNNs. Specifically, you will first get an understanding of the sort of operations present in a CNN, such as convolution layers, pooling layers, and fully connected layers. Next, we will briefly see how all of these are connected to form an end-to-end model. Then we will dive into the details of each of these operations, define them mathematically, and learn how the various hyperparameters involved with these operations change the output produced by them.

CNN fundamentals

Now, let's explore the fundamental idea behind a CNN without delving into too much technical detail. As noted in the preceding paragraph, a CNN is a stack of layers, such as convolution layers, pooling layers, and fully connected layers. We will discuss each of these to understand their role in the CNN.

Initially, the input is connected to a set of convolution layers. These convolution layers slide a patch of weights (sometimes called the convolution...

Understanding Convolution Neural Networks


Now let's walk through the technical details of a CNN. First, we will discuss the convolution operation and introduce some terminology, such as filter size, stride, and padding. In brief, filter size refers to the window size of the convolution operation, stride refers to the distance between two movements of the convolution window, and padding refers to the way you handle boundaries of the input. We will also discuss an operation that is known as deconvolution or transposed convolution. Then we will discuss the details of the pooling operation. Finally, we will discuss how to connect fully connected layers and the two-dimensional outputs produced by the convolution and pooling layers and how to use the output for classification or regression.

Convolution operation

In this section, we will discuss the convolution operation in detail. First we will discuss the convolution operation without stride and padding, next we will describe the convolution operation...

Exercise – image classification on MNIST with CNN


This will be our first example of using a CNN for a real-world machine learning task. We will classify images using a CNN. The reason for not starting with an NLP task is that applying CNNs to NLP tasks (for example, sentence classification) is not very straightforward. There are several tricks involved in using CNNs for such a task. However, originally, CNNs were designed to cope with image data. Therefore, let's start there and then find our way through to see how CNNs apply to NLP tasks.

About the data

In this exercise, we will use a dataset well-known in the computer vision community: the MNIST dataset. The MNIST dataset is a database of labeled images of handwritten digits from 0 to 9. The dataset contains three different subdatasets: the training, validation, and test sets. We will train on the training set and evaluate the performance of our model on the unseen test dataset. We will use the validation dataset to improve the performance...

Using CNNs for sentence classification


Though CNNs have mostly been used for computer vision tasks, nothing stops them from being used in NLP applications. One such application for which CNNs have been used effectively is sentence classification.

In sentence classification, a given sentence should be classified to a class. We will use a question database, where each question is labeled by what the question is about. For example, the question "Who was Abraham Lincoln?" will be a question and its label will be Person. For this we will use a sentence classification dataset available at http://cogcomp.org/Data/QA/QC/; here you will find 1,000 training sentences and their respective labels and 500 testing sentences.

We will use the CNN network introduced in the paper by Yoon Kim, Convolutional Neural Networks for Sentence Classification, to understand the value of CNNs for NLP tasks. However, using CNNs for sentence classification is somewhat different from the MNIST example we discussed, because...

Summary


In this chapter, we discussed CNNs and their various applications. First, we went through a detailed explanation about what CNNs are and their ability to excel at machine learning tasks. Next we decomposed the CNN into several components, such as convolution and pooling layers, and discussed in detail how these operators work. Furthermore, we discussed several hyperparameters that are related to these operators such as filter size, stride, and padding. Then, to illustrate the functionality of CNNs, we walked through a simple example of classifying images of handwritten digits to the corresponding image. We also did a bit of analysis to see why the CNN fails to recognize some images correctly. Finally, we started talking about how CNNs are applied for NLP tasks. Concretely, we discussed an altered architecture of CNNs that can be used to classify sentences. We then implemented this particular CNN architecture and tested it on an actual sentence classification task.

In the next chapter...

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Natural Language Processing with TensorFlow
Published in: May 2018Publisher: PacktISBN-13: 9781788478311
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (2)

author image
Thushan Ganegedara

Thushan is a seasoned ML practitioner with 4+ years of experience in the industry. Currently he is a senior machine learning engineer at Canva; an Australian startup that founded the online visual design software, Canva, serving millions of customers. His efforts are particularly concentrated in the search and recommendations group working on both visual and textual content. Prior to Canva, Thushan was a senior data scientist at QBE Insurance; an Australian Insurance company. Thushan was developing ML solutions for use-cases related to insurance claims. He also led efforts in developing a Speech2Text pipeline there. He obtained his PhD specializing in machine learning from the University of Sydney in 2018.
Read more about Thushan Ganegedara