Reader small image

You're reading from  Hadoop Real-World Solutions Cookbook - Second Edition

Product typeBook
Published inMar 2016
Publisher
ISBN-139781784395506
Edition2nd Edition
Right arrow
Author (1)
Tanmay Deshpande
Tanmay Deshpande
author image
Tanmay Deshpande

Tanmay Deshpande is a Hadoop and big data evangelist. He currently works with Schlumberger as a Big Data Architect in Pune, India. He has interest in a wide range of technologies, such as Hadoop, Hive, Pig, NoSQL databases, Mahout, Sqoop, Java, cloud computing, and so on. He has vast experience in application development in various domains, such as oil and gas, finance, telecom, manufacturing, security, and retail. He enjoys solving machine-learning problems and spends his time reading anything that he can get his hands on. He has great interest in open source technologies and has been promoting them through his talks. Before Schlumberger, he worked with Symantec, Lumiata, and Infosys. Through his innovative thinking and dynamic leadership, he has successfully completed various projects. He regularly blogs on his website http://hadooptutorials.co.in. You can connect with him on LinkedIn at https://www.linkedin.com/in/deshpandetanmay/. He has also authored Mastering DynamoDB, published in August 2014, DynamoDB Cookbook, published in September 2015, Hadoop Real World Solutions Cookbook-Second Edition, published in March 2016, Hadoop: Data Processing and Modelling, published in August, 2016, and Hadoop Blueprints, published in September 2016, all by Packt Publishing.
Read more about Tanmay Deshpande

Right arrow

Twitter sentiment analysis using Hive


Twitter is one of the most important data sources that helps you to know the sentiments behind various things. In this recipe, we will take a look at how to perform sentiment analysis using Hive on Twitter data.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Hive installed on it. Here, I am using Hive 1.2.1.

How to do it...

First of all, we need a dataset to perform this recipe. We will be using a dataset that can be found at http://s3.amazonaws.com/hw-sandbox/tutorial13/SentimentFiles.zip.

Next, we will unzip this data and upload it on HDFS. The zip contains three folders: the first for raw Twitter data, the second for a dictionary, and the third for a time zone:

hadoop fs -mkdir /data
hadoop fs -put tweets_raw /data
hadoop fs -put time_zone_map /data
hadoop fs -put dictionary /data

We use Hive's JSON SerDe jar to read the tweeter data, as shown here:

ADD JAR json-serde-1.1.9.9-Hive1.2-jar-with...
lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Hadoop Real-World Solutions Cookbook - Second Edition
Published in: Mar 2016Publisher: ISBN-13: 9781784395506

Author (1)

author image
Tanmay Deshpande

Tanmay Deshpande is a Hadoop and big data evangelist. He currently works with Schlumberger as a Big Data Architect in Pune, India. He has interest in a wide range of technologies, such as Hadoop, Hive, Pig, NoSQL databases, Mahout, Sqoop, Java, cloud computing, and so on. He has vast experience in application development in various domains, such as oil and gas, finance, telecom, manufacturing, security, and retail. He enjoys solving machine-learning problems and spends his time reading anything that he can get his hands on. He has great interest in open source technologies and has been promoting them through his talks. Before Schlumberger, he worked with Symantec, Lumiata, and Infosys. Through his innovative thinking and dynamic leadership, he has successfully completed various projects. He regularly blogs on his website http://hadooptutorials.co.in. You can connect with him on LinkedIn at https://www.linkedin.com/in/deshpandetanmay/. He has also authored Mastering DynamoDB, published in August 2014, DynamoDB Cookbook, published in September 2015, Hadoop Real World Solutions Cookbook-Second Edition, published in March 2016, Hadoop: Data Processing and Modelling, published in August, 2016, and Hadoop Blueprints, published in September 2016, all by Packt Publishing.
Read more about Tanmay Deshpande