You're reading from The Natural Language Processing Workshop

Product type Book

Published in Aug 2020

Publisher Packt

ISBN-13 9781800208421

Pages 452 pages

Edition 1st Edition

Languages

Python

Authors (6):

Rohan Chopra

Aniruddha M. Godbole

Nipun Sadvilkar

Muzaffar Bashir Shah

Sohom Ghosh

Dwight Gunning

Table of Contents (10) Chapters

Preface

1. Introduction to Natural Language Processing

2. Feature Extraction Methods

3. Developing a Text Classifier

4. Collecting Text Data with Web Scraping and APIs

5. Topic Modeling

6. Vector Representation

7. Text Generation and Summarization

8. Sentiment Analysis

Appendix

8. Sentiment Analysis

Overview

This chapter introduces you to one of the most exciting applications of natural language processing—that is, sentiment analysis. You will explore the various tools used to perform sentiment analysis, such as popular NLP libraries and deep learning frameworks. You will then perform sentiment analysis on given text data using the powerful textblob library. You will load textual data and perform preprocessing on it to fine-tune the results of your sentiment analysis program. By the end of the chapter, you will be able to train a sentiment analysis model.

Introduction

In the previous chapter, we looked at text generation, paraphrasing, and summarization, all of which can be immensely useful in helping us focus on only the essential and meaningful parts of the text corpus. This, in turn, helps us to further refine the results of our NLP project. In this chapter, we will look at sentiment analysis, which, as the name suggests, is the area of NLP that involves teaching computers how to identify the sentiment behind written content or parsed audio—that is, audio converted to text. Adding this ability to automatically detect sentiment in large volumes of text and speech opens new possibilities for us to write useful software.

In sentiment analysis, we try to build models that detect how people feel. This starts with determining what kind of feeling we want to detect. Our application may attempt to determine the level of human emotion (most often, whether a person is sad or happy; satisfied or dissatisfied; or interested or disinterested...

Tools Used for Sentiment Analysis

Many tools are capable of analyzing sentiment, and each has its advantages and disadvantages. We will look at each of them in detail.

NLP Services from Major Cloud Providers

All major cloud service providers, such as Amazon, Microsoft, Google, and IBM, offer online sentiment analysis. You can usually find it as part of their text analysis services or general machine learning services. These services package all the necessary algorithms behind the provider's API, so you only need to provide the text or audio sources, and in return, the service gives you a measure of the sentiment. The result is usually a standard, simple rating, such as positive, negative, or neutral, with a numeric score that usually ranges between 0 and 1.
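As a rough illustration of how a numeric score in the 0 to 1 range relates to a positive, negative, or neutral rating, the following sketch maps one to the other. The function name, the thresholds, and the shape of the response dictionary are all hypothetical; each provider defines its own response format and scoring rules, so treat this only as a sketch of the idea.

```python
def label_from_score(score, negative_below=0.4, positive_above=0.6):
    """Map a sentiment score in [0, 1] to a coarse label.

    The two thresholds are assumptions chosen for illustration only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score < negative_below:
        return "negative"
    if score > positive_above:
        return "positive"
    return "neutral"


# Hypothetical response payload; real services define their own formats.
response = {
    "text": "The delivery was quick and the product works well.",
    "score": 0.87,
}
label = label_from_score(response["score"])
```

Keeping the thresholds as parameters makes it easy to widen or narrow the neutral band for a given application.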

The following are the advantages and disadvantages of NLP services...

The textblob library

textblob is a Python library used for NLP, as we've seen in the previous chapters. It has a simple API and is probably the easiest way to get started with sentiment analysis. textblob is built on top of the NLTK library but is much easier to use. In the following sections, we will work through an exercise and an activity to get a better understanding of how to use textblob for sentiment analysis.

Exercise 8.01: Basic Sentiment Analysis Using the textblob Library

In this exercise, we will perform sentiment analysis on a given text. For this, we will be using the TextBlob class of the textblob library. Follow these steps to complete this exercise:

Open a Jupyter notebook.
Insert a new cell and add the following code to import the TextBlob class from the textblob library:
```
from textblob import TextBlob
```
Create a variable named sentence and assign it a string. Insert a new cell and add the following code to implement this:
```
sentence = "but...
```
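Independently of the exercise's own sentence, a minimal sketch of reading sentiment from a TextBlob object might look like the following. It relies on the `sentiment` property of `TextBlob`, which returns a polarity and a subjectivity value. The helper names `analyze` and `polarity_label`, and the 0.1 threshold, are assumptions made for illustration and are not part of the textblob library.

```python
def polarity_label(polarity, threshold=0.1):
    """Turn a polarity value in [-1, 1] into a coarse label.

    The threshold is an assumption, not something textblob defines.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def analyze(sentence):
    """Return (polarity, subjectivity, label) for a sentence using textblob."""
    # Imported here so the helper above can be used without textblob installed.
    from textblob import TextBlob

    sentiment = TextBlob(sentence).sentiment
    return (
        sentiment.polarity,
        sentiment.subjectivity,
        polarity_label(sentiment.polarity),
    )
```

Calling `analyze` with any string in a notebook cell returns the two scores along with the derived label. Polarity lies between -1 and 1, and subjectivity lies between 0 and 1.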

Understanding Data for Sentiment Analysis

Sentiment analysis is a type of text classification. Sentiment analysis models are usually trained on supervised datasets, that is, datasets in which each text is labeled with the target variable, usually a column that specifies the sentiment of the text. This is the value we want to predict for unseen text.
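To make the idea of a labeled dataset concrete, here is a small sketch that parses tab-separated text in which each line holds a sentence followed by its sentiment label. The sample sentences, the 1/0 label encoding, and the `load_labeled` helper are assumptions made for illustration; they are not taken from the datasets used in this chapter.

```python
import csv
import io
from collections import Counter

# Hypothetical tab-separated data: text, then label (1 = positive, 0 = negative).
RAW = (
    "Great phone, works perfectly.\t1\n"
    "Stopped charging after a week.\t0\n"
    "Best pasta I have had in years.\t1\n"
)


def load_labeled(handle):
    """Read (text, label) pairs from a tab-separated file-like object."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(text, int(label)) for text, label in reader]


rows = load_labeled(io.StringIO(RAW))

# The label column is the target variable a model would learn to predict.
label_counts = Counter(label for _, label in rows)
```

Checking `label_counts` before training is a quick way to see whether the classes are balanced.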

Exercise 8.02: Loading Data for Sentiment Analysis

In this exercise, we will load data that could be used to train a sentiment analysis model. For this exercise, we will be using three datasets—namely Amazon, Yelp, and IMDb.

Note

You can find the data being used in this exercise here: https://packt.live/2XgeQqJ.

Follow these steps to implement this exercise:

Open a Jupyter notebook.
Insert a new cell and add the following code to import the necessary libraries:
```
import pandas as pd
pd.set_option('display.max_colwidth', 200)
```
This imports the pandas library. It also sets the display...

Training Sentiment Models

The end product of any sentiment analysis project is a sentiment model. This is an object containing a stored representation of the data on which it was trained. Such a model has the ability to predict sentiment values for text that it has not seen before. To develop a sentiment analysis model, the following steps should be taken:

The document dataset must be split into train and test datasets. The test dataset is normally a fraction of the overall dataset, usually between 5% and 40%, depending on the total number of examples available. If the dataset is very large, a smaller test fraction can be used.
Next, the text should be preprocessed by stripping unwanted characters, removing stop words, and performing other common preprocessing steps.
The text should be converted to numeric vector representations in order to extract the features. These representations are used for training machine learning models...
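The three steps listed so far (splitting, preprocessing, and vectorizing) can be sketched with only the Python standard library, as shown below. The toy documents, the tiny stop-word list, the 20% test fraction, and all function names are assumptions made for illustration; a real project would typically rely on an NLP or machine learning library for each of these steps.

```python
import random
import re
from collections import Counter

# Hypothetical toy data: (text, label) pairs, 1 = positive, 0 = negative.
DOCS = [
    ("I loved this phone, it works great!", 1),
    ("Terrible battery and a poor screen.", 0),
    ("The food was delicious and the staff were friendly.", 1),
    ("The plot was dull and the acting was worse.", 0),
    ("Great value, would buy again.", 1),
]

# A tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "and", "the", "was", "were", "it", "this", "i", "would"}


def split_train_test(docs, test_fraction=0.2, seed=0):
    """Shuffle the documents and hold out a fraction of them for testing."""
    shuffled = docs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def preprocess(text):
    """Lowercase, replace non-letter characters, and drop stop words."""
    tokens = re.sub(r"[^a-z\s]", " ", text.lower()).split()
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocabulary(token_lists):
    """Map each distinct training token to a column index."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    return {token: i for i, token in enumerate(vocab)}


def vectorize(tokens, vocabulary):
    """Turn a token list into a bag-of-words count vector."""
    counts = Counter(tokens)
    return [counts.get(token, 0) for token in vocabulary]


train, test = split_train_test(DOCS)
train_tokens = [preprocess(text) for text, _ in train]
vocabulary = build_vocabulary(train_tokens)
X_train = [vectorize(tokens, vocabulary) for tokens in train_tokens]
X_test = [vectorize(preprocess(text), vocabulary) for text, _ in test]
```

Note that the vocabulary is built from the training documents only, so words that appear only in the test documents are ignored when those documents are vectorized.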

Summary

We started our journey into NLP with basic text analytics and text preprocessing techniques, such as tokenization, stemming, lemmatization, and lowercase conversion, to name a few. We then explored ways to represent our text data in numerical form so that machines can process it and various algorithms can be applied to it. After getting some practical knowledge of topic modeling, we moved on to text vectorization, and finally, in this chapter, we explored sentiment analysis. This included different tools for sentiment analysis, from online services offered by cloud providers to deep learning frameworks. More importantly, we learned how to load data and train a model that can predict sentiment.