
You're reading from R Machine Learning By Example

Product type: Book
Published in: Mar 2016
Reading level: Intermediate
ISBN-13: 9781784390846
Edition: 1st Edition

Chapter 7. Social Media Analysis – Analyzing Twitter Data

Connected is the word that describes life in the 21st century. Though various factors contribute to our being connected, one aspect has played a pivotal role: the Web. The Web, which has made distance an irrelevant metric and blurred socio-economic boundaries, is a world in itself, and we are all part of it. The Web, or the Internet in particular, has been a central entity in this data-driven revolution. As we have seen in the previous chapters, for most modern-day problems it is the Web/Internet (the two terms are used interchangeably from here on) that acts as the source of data. Be it e-commerce platforms or the financial domain, the Internet provides us with huge amounts of data every second. Within this virtual world lies another ocean of data, one that touches our lives at a very personal level. Social networks, or social media, are a behemoth of information and the topic of this chapter.

In the previous chapter, we covered the financial...

Social networks (Twitter)


We all use social networks day in and day out. There are numerous social networks catering to all sorts of ideologies and philosophies, but Facebook and Twitter (along with a couple of others) have become synonymous with the term social network itself. These two social networks are popular not only because of their uniqueness and quality of service but also because they let us interact in a very intuitive way. As with the recommendation engines used in e-commerce websites (see Chapter 4, Building a Product Recommendation System), social networks existed long before Facebook, Twitter, or even the Internet.

Social networks have interested scientists and mathematicians alike. Their study is an interdisciplinary topic that spans, but is not limited to, sociology, psychology, biology, economics, communication studies, and information science. Various theories have been developed to analyze social networks and their impact on human lives in the form of factors...

Data mining @social networks


We have traveled quite a distance so far through the chapters of this book, understanding various concepts and learning some amazing algorithms. We have even worked on projects that have applications in our daily lives. In short, we have done data mining without using the term explicitly. Let us now take this opportunity to formally define data mining.

Mining, in the classical sense of the word, refers to the extraction of useful minerals from the Earth (as in coal mining). In the context of the information age, mining refers to the extraction of useful information from large pools of data. Looked at carefully, then, knowledge mining or Knowledge Discovery from Data (KDD) is a more accurate name than data mining. As with many keywords, however, the short and catchy term wins attention, so you will often find Knowledge Discovery from Data and data mining used interchangeably, and rightly so. The process...

Getting started with Twitter APIs


Twitter is as much a delight for tweeple (people who use Twitter to tweet) as it is for data scientists. The APIs and their documentation are kept up to date and are easy to use. Let us get started with the APIs.

Overview

Twitter offers one of the easiest yet most powerful sets of APIs of any social network. These APIs have been used by Twitter itself as well as by data scientists to understand the dynamics of the Twitter world. Twitter APIs make use of four different objects, namely:

  • Tweets: A tweet is the central entity that defines Twitter itself. As discussed in the previous section, a tweet contains far more information (metadata) than just the content/message of the tweet.

  • Users: Anybody or anything that can tweet, follow, or perform any of Twitter's actions is a user. Twitter is unique in its definition of a user, who need not be human. @MarsCuriosity is one such popular nonhuman Twitter handle, with over 2 million followers!

  • Entities: These are structured...
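To make the idea of tweet metadata concrete, the following sketch models a single tweet as a nested R list. It is a simplified, hypothetical example. The field names (`id_str`, `created_at`, `text`, `source`, `user`, `entities`) follow the Twitter REST API v1.1 JSON layout, but every value is made up. Nested structures, such as hashtag entities, are flattened to plain character vectors.

```r
# A hypothetical, simplified tweet modelled as a nested R list.
# Field names follow the Twitter REST API v1.1 JSON layout; all values are made up.
tweet <- list(
  id_str     = "1000000000000000001",
  created_at = "Fri Jan 01 09:30:00 +0000 2016",
  text       = "Starting the year right #NewYearsResolution #fitness @gym_buddy",
  source     = "Twitter for Android",   # simplified: the real field holds an HTML link
  user = list(                          # the Users object, embedded in the tweet
    screen_name     = "example_user",
    followers_count = 42L
  ),
  entities = list(                      # the Entities object (flattened here)
    hashtags      = c("NewYearsResolution", "fitness"),
    user_mentions = c("gym_buddy"),
    urls          = character(0)
  )
)

# Everything except the message text itself is metadata about the tweet.
metadata_fields <- setdiff(names(tweet), "text")
print(metadata_fields)

# Nested metadata is reached with ordinary list indexing.
cat("Author:", tweet$user$screen_name, "\n")
cat("Hashtags:", paste(tweet$entities$hashtags, collapse = ", "), "\n")
```

Even this reduced tweet has five top-level metadata fields around a single line of text. The real object carries many more.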

Twitter data mining


Now that we have tested our tools, libraries, and connections to Twitter APIs, the time has come to begin our search for the hidden treasures in Twitter land. Let's wear our data miner's cap and start digging!

In this section, we will work with Twitter data gathered by searching for keywords (or hashtags, in Twitter vocabulary) and from user timelines. Using this data, we will uncover some interesting insights with the help of different functions and utilities from twitteR and other R packages.
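Counting hashtags is one of the simplest ways to start mining search results. The following minimal sketch uses only base R. The four tweet texts are hypothetical stand-ins for the text of tweets returned by a keyword search. The regular expression treats a hashtag as `#` followed by letters, digits, or underscores, which simplifies Twitter's own hashtag rules.

```r
# Hypothetical tweet texts standing in for the results of a keyword search.
tweets <- c(
  "New year, new me! #resolutions #fitness",
  "Going to read more books this year #Resolutions",
  "Day 1 at the gym #fitness #NewYear",
  "No resolutions for me this time"
)

# Extract every hashtag: a '#' followed by letters, digits or underscores.
# Tweets without hashtags contribute nothing after unlist().
hashtags <- unlist(regmatches(tweets, gregexpr("#[[:alnum:]_]+", tweets)))

# Normalise case so that #Resolutions and #resolutions are counted together.
hashtags <- tolower(hashtags)

# Tabulate and sort in decreasing order of frequency.
hashtag_freq <- sort(table(hashtags), decreasing = TRUE)
print(hashtag_freq)
```

Lower-casing before counting is a design choice. Without it, the same resolution hashtag would be split across two table entries.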

Note

Please note that our process will implicitly follow the steps outlined for data mining. For the sake of brevity, we may not mention each of the steps explicitly. We are mining for some gold-plated insights; rest assured, nothing is skipped!

Every year, we begin with a new zeal to achieve great feats and improve upon our shortcomings. Most of us make promises to ourselves in the form of New Year's resolutions. Let us explore what tweeple are doing with their...

Challenges with social network data mining


Before we close the chapter, let us look at the different challenges posed by social networks to the process of data mining. The following points present a few arguments, questions, and challenges:

  • No doubt the data generated by social networks qualifies as big data in every respect. It has enough volume, velocity, and variety to overwhelm any system. Yet, interestingly, the challenge with such a huge source of data is a lack of sufficiently granular data. If we zoom into our data sets and try to use the data on a per-user basis, we find that there isn't enough of it for some of the most common tasks, such as making recommendations!

  • Social networks such as Twitter serve millions of users who create and share tons of data every second. To keep their systems up and running at all times, they limit the amount of data that can be tapped through their APIs (security is also a major reason behind these limits). These limits put...
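The first challenge, a lack of per-user granularity, can be checked on any collected sample by counting tweets per author. The following sketch uses base R and a hypothetical vector of author screen names. In practice, these names would come from the user metadata of the collected tweets.

```r
# Hypothetical sample: the screen name of the author of each collected tweet.
authors <- c("ann", "ann", "ann", "bob", "cat",
             "dan", "dan", "eve", "fay", "gus")

# Number of tweets each user contributed to the sample.
tweets_per_user <- table(authors)

# Share of users who appear with only a single tweet.
single_tweet_share <- mean(tweets_per_user == 1)

cat(sprintf("%d users, %.0f%% of them with a single tweet\n",
            length(tweets_per_user), 100 * single_tweet_share))

# Distribution of tweets per user (min, quartiles, mean, max).
print(summary(as.vector(tweets_per_user)))
```

When most users contribute only one or two tweets, per-user tasks such as recommendations have very little to work with, even if the sample as a whole is large.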

Summary


Social network analysis is one of the trending topics in the world of data science. As we have seen throughout the chapter, these platforms not only provide us with ways to connect but also present a unique opportunity to study human dynamics at a global scale. Through this chapter, we have learned some interesting techniques. We started off by understanding data mining in the social network context, followed by the importance of visualizations. We focused on Twitter and understood its different objects and the APIs to manipulate them. We used various R packages, such as twitteR and tm, to connect to Twitter and to collect and manipulate data for our analysis. We used data from Twitter to learn about word frequencies and associations, popular devices used by tweeple, and hierarchical clustering, and we even touched upon topic modeling. We used ggplot2 and wordcloud to visualize our results throughout. Finally, we presented some of the challenges posed by social networks to the data mining process in general...
