You're reading from Mastering Data Mining with Python - Find patterns hidden in your data

Product type: Book

Published in: Aug 2016

Reading level: Intermediate

ISBN-13: 9781785889950

Edition: 1st Edition

Languages: Python

Tools: NLTK, Scikit-learn

Concepts: Data Mining

Author: Megan Squire

Chapter 5. Sentiment Analysis in Text

One of the most powerful skills we can master in data mining is learning how to deal with large amounts of unstructured or semi-structured textual data. Textual data, sometimes just called text, is important because it is everywhere, and because it conveys so much detail about the human experience in so many formats: books, news media, journals, government reports, case law, e-mail messages, chat logs, product reviews, and so on. We also find text data in places we might not expect. For example, when the spoken word is written down it also becomes text, as do song lyrics and video transcripts. When we look at the code that makes up web pages and computer programs, we find text. When we need a computer to leave a record of what activities have transpired, we have it create a text log file. When we need a common, universally interoperable medium for communicating between devices, we often use plain text to do so.

Over the next few chapters, we will be exploring...

What is sentiment analysis?

Many texts contain language that can be described as emotional. Whether to express the feelings of the writer, or to inspire a particular feeling in the reader, human language can convey anger, disappointment, disgust, joy, happiness, amusement, and so on. Discovering this type of emotional content can tell us a great deal about the writer, including what the writer's intention was and the expected response of the reader. Even noticing the absence of emotional content in a text can be interesting. Once we understand how to discern the emotional content of a text, or lack thereof, we can compare texts and writers to each other in terms of the emotional content, we can compare emotional content over time, and we can sometimes even predict how a reader will respond to a particular text.

Analyzing a text for its emotional content can take many forms. In this chapter, we will be primarily concerned with sentiment analysis, sometimes called opinion mining. Sentiment...

The basics of sentiment analysis

To begin a sentiment mining project, we first need to understand how opinions are structured in text so that we can find the best way to train the computer to deal with them. Opinion mining and sentiment analysis are considered sub-problems of the much larger field of natural language processing (NLP), and as such, they are subject to many of the same unsolved issues in trying to account for all the quirks of human communication. However, sentiment mining is restricted in an important way: its goal is not to understand the statements people make, but simply to determine their tone. As we will see later, no single strategy for finding the sentiment of a given text is likely to be perfect, but this may not matter much if the volume of data is large and the stakes are comparatively low.

The structure of an opinion

Each opinion typically has a target. If we read the sentence, "This was the worst movie I ever saw," the target of that opinion is the movie. In...

Sentiment analysis algorithms

Suppose we want to broadly classify the sentiment of a text as positive or negative. We may choose to model the opinion mining task as a classification problem, one that can be solved with supervised machine learning techniques such as a Naïve Bayes classifier (NBC). Given a set of positive text features and a set of negative text features, an NBC strategy allows us to take a new text and classify it as more positive or more negative, based on the observations we have made in the past about similar texts. The machine learning literature is replete with examples of supervised classification, and it is a very reliable approach for certain types of problems.
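To make the idea concrete, here is a minimal sketch of a Naïve Bayes sentiment classifier written from scratch with only the Python standard library. It is an illustration under stated assumptions, not the implementation used by NLTK or by this chapter: the four training sentences are made up, the function names `train` and `classify` are hypothetical, features are simply lowercased whitespace-separated words, and add-one (Laplace) smoothing is one common choice among several.

```python
import math
from collections import Counter

def train(labeled_texts):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training texts
    for text, label in labeled_texts:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts

def classify(text, word_counts, doc_counts):
    """Return the label with the highest log-probability for the text."""
    vocabulary = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Start from the log prior: how common this label was in training.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing out a label.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data, invented purely for illustration.
training = [
    ("I loved this wonderful movie", "pos"),
    ("what a great and fun film", "pos"),
    ("this was the worst movie I ever saw", "neg"),
    ("a boring and terrible film", "neg"),
]
word_counts, doc_counts = train(training)

print(classify("a wonderful and fun movie", word_counts, doc_counts))  # pos
print(classify("the worst boring film", word_counts, doc_counts))      # neg
```

With only four training sentences the classifier is easy to fool, which is exactly why the quality and coverage of the training examples matter so much.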

The trick with this type of classification scheme, of course, is being able to count on our past observations as reliable indicators of future ones. These training examples are critically important, and they are the basis for the success of the entire scheme. After all, if we choose...

Sentiment mining application

In this section, we will look at building an application to do sentiment analysis on text using the NLTK tools. There are several different options for how to direct NLTK to do sentiment analysis on text, so our experiments with these various methods will teach us a bit about what is going on inside NLTK and also about how sentiment analysis works.

You might recall that we installed and tested NLTK in Chapter 1, Expanding Your Data Mining Toolbox, and we used NLTK for entity matching back in Chapter 3, Entity Matching, so if you skipped those chapters, you may need to install or upgrade NLTK now. To do this from within Anaconda, open the Tools menu, select Open a terminal, and type:

conda upgrade nltk

This will fetch all the relevant NLTK packages and upgrade your Anaconda installation.
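One way to confirm that the upgrade took effect is to ask Python which version of the package it can see. The following is a small sketch that uses only the standard library (`importlib.metadata`, available in Python 3.8 and later); the helper name `installed_version` is hypothetical, and the check assumes you run it with the same interpreter that Anaconda manages.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string for a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

version = installed_version("nltk")
if version is None:
    print("NLTK is not installed in this environment")
else:
    print("NLTK version:", version)
```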

Motivating the project

With this housekeeping task finished, we are ready to start thinking about what kind of sentiment analysis we want to experiment with. Throughout this book...

Summary

Having finished this chapter, we now have a functional understanding of how sentiment analysis works, and we have compared many of the strategies that mainstream sentiment analysis tools use to accomplish this goal. We paid special attention to the VADER tool, which comes standard with the Python NLTK, since it is well-tested and straightforward to use. To learn how to use its sentiment intensity scoring system, we calculated the sentiment for a few different real-world datasets, including messy chat data and somewhat more structured e-mail data.
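As a reminder of what that scoring looks like in code, here is a minimal sketch that calls NLTK's `SentimentIntensityAnalyzer`. It assumes NLTK is installed and that the `vader_lexicon` resource has already been fetched (for example with `nltk.download("vader_lexicon")`); the wrapper name `vader_scores` and the sample sentence are hypothetical, and the wrapper returns `None` rather than failing when either prerequisite is missing.

```python
def vader_scores(text):
    """Return VADER polarity scores for text, or None if NLTK or its
    vader_lexicon resource is not available in this environment."""
    try:
        from nltk.sentiment.vader import SentimentIntensityAnalyzer
        analyzer = SentimentIntensityAnalyzer()
    except (ImportError, LookupError):
        return None
    # The result is a dict with the keys 'neg', 'neu', 'pos', and 'compound'.
    return analyzer.polarity_scores(text)

scores = vader_scores("This was the worst movie I ever saw.")
if scores is None:
    print("VADER is unavailable; install NLTK and download vader_lexicon")
else:
    print(scores)
```

The `compound` value is a single normalized score between -1 (most negative) and +1 (most positive), which makes it convenient for comparing many texts at once.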

In the next chapter, we will continue to hone our skills in text mining, but instead of looking at the emotion conveyed by an entire sentence, we will focus on locating entities within sentences. This task, called named entity recognition, is loosely related to the entity matching task we looked at in Chapter 3, Entity Matching, in that both involve working with entities such as people or organizations. However...

Author (1)

Megan Squire

Megan Squire is a professor of computing sciences at Elon University. Her primary research interest is in collecting, cleaning, and analyzing data about how free and open source software is made. She is one of the leaders of the FLOSSmole.org, FLOSSdata.org, and FLOSSpapers.org projects.