
You're reading from The Natural Language Processing Workshop

Product type: Book
Published in: Aug 2020
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781800208421
Edition: 1st Edition
Authors (6):
Rohan Chopra

Rohan Chopra graduated from Vellore Institute of Technology with a bachelor's degree in computer science. Rohan has more than 2 years of experience in designing, implementing, and optimizing end-to-end deep neural network systems. His research is centered around the use of deep learning to solve computer vision-related problems, and he has hands-on experience working on self-driving cars. He is a data scientist at Absolutdata.

Aniruddha M. Godbole

Aniruddha M. Godbole is a data science consultant with interdisciplinary expertise in computer science, applied statistics, and finance. He has a master's degree in data science from Indiana University, USA, and an MBA in finance from the National Institute of Bank Management, India. He has authored papers in computer science and finance and has been an occasional opinion pages contributor to Mint, a leading business newspaper in India. He has fifteen years of experience.

Nipun Sadvilkar

Nipun Sadvilkar is a senior data scientist at a US healthcare company, where he leads a team of data scientists and subject matter experts to design and build a clinical NLP engine that revamps medical coding workflows, enhances coder efficiency, and accelerates the revenue cycle. He has more than 3 years of experience building NLP solutions and web-based data science platforms in healthcare, finance, media, and psychology. His interests lie at the intersection of machine learning and software engineering, with a fair understanding of the business domain. He is a member of the regional and national Python community. He is the author of pySBD, an open-source Python NLP library for sentence segmentation that is recognized by the ExplosionAI (spaCy) and AllenAI (scispaCy) organizations.

Muzaffar Bashir Shah

Muzaffar Bashir Shah is a software developer with vast experience in machine learning, natural language processing (NLP), text analytics, and data science. He holds a master's degree in computer science from the University of Kashmir and currently works at Datoin, a Bangalore-based startup.

Sohom Ghosh

Sohom Ghosh is a passionate data detective with expertise in natural language processing. He has worked extensively in the data science arena with a specialization in deep learning-based text analytics, NLP, and recommendation systems. He has publications in several international conferences and journals.

Dwight Gunning

Dwight Gunning is a data scientist at FINRA, a financial services regulator in the US. He has extensive experience in Python-based machine learning and hands-on experience with the most popular NLP tools, such as NLTK, gensim, and spaCy.

8. Sentiment Analysis

Overview

This chapter introduces you to one of the most exciting applications of natural language processing: sentiment analysis. You will explore the various tools used to perform sentiment analysis, such as popular NLP libraries and deep learning frameworks. You will then perform sentiment analysis on given text data using the textblob library. You will load textual data and preprocess it to improve the results of your sentiment analysis program. By the end of the chapter, you will be able to train a sentiment analysis model.

Introduction

In the previous chapter, we looked at text generation, paraphrasing, and summarization, all of which can be immensely useful in helping us focus on only the essential and meaningful parts of the text corpus. This, in turn, helps us to further refine the results of our NLP project. In this chapter, we will look at sentiment analysis, which, as the name suggests, is the area of NLP that involves teaching computers how to identify the sentiment behind written content or parsed audio—that is, audio converted to text. Adding this ability to automatically detect sentiment in large volumes of text and speech opens new possibilities for us to write useful software.

In sentiment analysis, we try to build models that detect how people feel. This starts with determining what kind of feeling we want to detect. Our application may attempt to determine the level of human emotion (most often, whether a person is sad or happy; satisfied or dissatisfied; or interested or disinterested...

Tools Used for Sentiment Analysis

There are a lot of tools capable of analyzing sentiment. Each tool has its advantages and disadvantages. We will look at each of them in detail.

NLP Services from Major Cloud Providers

All major cloud service providers, such as Amazon, Microsoft, Google, and IBM, offer online sentiment analysis. You can usually find it as part of their text analysis services or general machine learning services. Online services offer the convenience of packaging all the algorithms needed for sentiment analysis behind the provider's API. To use such services, you provide the text or audio sources, and in return, the services provide a measure of the sentiment. This is usually a simple label, such as positive, negative, or neutral, together with a score that usually ranges between 0 and 1.

The following are the advantages and disadvantages of NLP services...

The textblob library

textblob is a Python library used for NLP, as we've seen in the previous chapters. It has a simple API and is probably the easiest way to begin with sentiment analysis. textblob is built on top of the NLTK library but is much easier to use. In the following sections, we will do an exercise and an activity to get a better understanding of how we can use textblob for sentiment analysis.

Exercise 8.01: Basic Sentiment Analysis Using the textblob Library

In this exercise, we will perform sentiment analysis on a given text. For this, we will be using the TextBlob class of the textblob library. Follow these steps to complete this exercise:

  1. Open a Jupyter notebook.
  2. Insert a new cell and add the following code to import the TextBlob class from the textblob library:
    from textblob import TextBlob
  3. Create a variable named sentence and assign it a string. Insert a new cell and add the following code to implement this:
    sentence = "but...

Understanding Data for Sentiment Analysis

Sentiment analysis is a type of text classification, and sentiment analysis models are usually trained on supervised datasets. A supervised dataset is one that is labeled with the target variable, usually a column that specifies the sentiment of each text. This is the value we want to predict for unseen text.
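To make the idea of a labeled dataset concrete, here is a minimal sketch that builds one with pandas. The rows, the tab-separated layout, and the column names text and sentiment are all assumptions made for illustration; they are not the book's data files.

```python
import io

import pandas as pd

# Hypothetical tab-separated reviews: the text, then a 0/1 sentiment label.
raw = (
    "Great phone, battery lasts all day.\t1\n"
    "Stopped working after a week.\t0\n"
    "The plot was dull and predictable.\t0\n"
    "Friendly staff and delicious food.\t1\n"
)

# Read the in-memory data as if it were a file with no header row.
reviews = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    header=None,
    names=["text", "sentiment"],  # illustrative column names
)

print(reviews)
# Count how many examples carry each label.
print(reviews["sentiment"].value_counts())
```

Here the sentiment column is the target variable: a model is trained on the text column to predict it.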

Exercise 8.02: Loading Data for Sentiment Analysis

In this exercise, we will load data that could be used to train a sentiment analysis model. For this exercise, we will be using three datasets—namely Amazon, Yelp, and IMDb.

Note

You can find the data being used in this exercise here: https://packt.live/2XgeQqJ.

Follow these steps to implement this exercise:

  1. Open a Jupyter notebook.
  2. Insert a new cell and add the following code to import the necessary libraries:
    import pandas as pd
    pd.set_option('display.max_colwidth', 200)

    This imports the pandas library. It also sets the display...

Training Sentiment Models

The end product of any sentiment analysis project is a sentiment model. This is an object containing a stored representation of the data on which it was trained. Such a model has the ability to predict sentiment values for text that it has not seen before. To develop a sentiment analysis model, the following steps should be taken:

  1. The document dataset must be split into train and test datasets. The test dataset is a fraction of the overall dataset, usually between 5% and 40%, depending on the total number of examples available. If the dataset is very large, a smaller test fraction can be used.
  2. Next, the text should be preprocessed by stripping unwanted characters, removing stop words, and performing other common preprocessing steps.
  3. The text should be converted to numeric vector representations in order to extract the features. These representations are used for training machine learning models...
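The steps listed above can be sketched with scikit-learn as follows. This is one possible arrangement under stated assumptions, not necessarily the approach the chapter takes: the toy data is hypothetical, and the choice of TF-IDF features and logistic regression is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 1 = positive, 0 = negative.
texts = [
    "I love this product", "Absolutely fantastic experience",
    "Great value and quality", "Works perfectly, very happy",
    "Best purchase this year", "Terrible quality, broke quickly",
    "I hate the design", "Awful customer support",
    "Waste of money", "Very disappointing results",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Step 1: hold out 20% of the examples as the test dataset.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Steps 2 and 3: lowercase, drop English stop words, and convert the
# text to TF-IDF vectors (fit on the training data only).
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Train a classifier on the vectors and predict the held-out examples.
model = LogisticRegression()
model.fit(X_train_vec, y_train)
predictions = model.predict(X_test_vec)

print("predictions:", list(predictions))
print("test accuracy:", model.score(X_test_vec, y_test))
```

With so few examples the accuracy is not meaningful; the point is the order of operations, in particular fitting the vectorizer on the training split only so that no information leaks from the test split.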

Summary

We started our journey into NLP with basic text analytics and text preprocessing techniques, such as tokenization, stemming, lemmatization, and lowercase conversion, to name a few. We then explored ways to represent our text data in numerical form so that machines can process it and we can apply various algorithms to it. After getting some practical knowledge of topic modeling, we moved on to text vectorization, and finally, in this chapter, we explored sentiment analysis. This included different tools for sentiment analysis, from services offered by cloud providers to deep learning frameworks. More importantly, we learned how to load data and train a model that we can use to predict sentiment.
