R: Mining spatial, text, web, and social media data

Create data mining algorithms

Bater Makhabel et al.

Book Details

ISBN-13: 9781788293747
Paperback: 651 pages

Book Description

Data mining is the first step toward understanding and making sense of large volumes of data. Properly mined data forms the basis of all the analysis and computation performed on it. This Learning Path takes you from the basics of data mining to advanced techniques, and ends with a specialized branch of the field: social media mining.

You will learn how to manipulate data with R using code snippets and how to mine frequent patterns, associations, and correlations while working with R programs. You will discover how to write code for various prediction models, stream data, and time-series data. You will also be introduced to solutions written in R based on R Hadoop projects.

Once you are comfortable with data mining in R, you will move on to applying your knowledge in end-to-end data mining projects. You will learn how to apply different mining concepts to statistical and data applications in a wide range of fields. By this stage, you will be able to work through complex data mining cases and handle the issues you encounter during projects.

After this, you will gain hands-on experience generating insights from social media data. You will get detailed instructions on how to obtain, process, and analyze a variety of socially generated data, along with the theoretical background needed to interpret your findings accurately. The R code and example data provided can serve as a springboard for your own analyses of business, social, or political data.

This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products:

  • Learning Data Mining with R by Bater Makhabel
  • R Data Mining Blueprints by Pradeepta Mishra
  • Social Media Mining with R by Nathan Danneman and Richard Heimann

Table of Contents

Chapter 1: Warming Up
Big data
Data source
Data mining
Social network mining
Text mining
Web data mining
Why R?
Statistics
Machine learning
Data attributes and description
Data cleaning
Data integration
Data dimension reduction
Data transformation and discretization
Visualization of results
Time for action
Summary
Chapter 2: Mining Frequent Patterns, Associations, and Correlations
An overview of associations and patterns
Market basket analysis
Hybrid association rules mining
Mining sequence dataset
The R implementation
High-performance algorithms
Time for action
Summary
Chapter 3: Classification
Classification
Generic decision tree induction
High-value credit card customers classification using ID3
Web spam detection using C4.5
Web key resource page judgment using CART
Trojan traffic identification method and Bayes classification
Identify spam e-mail and Naïve Bayes classification
Rule-based classification of player types in computer games and rule-based classification
Time for action
Summary
Chapter 4: Advanced Classification
Ensemble (EM) methods
Biological traits and the Bayesian belief network
Protein classification and the k-Nearest Neighbors algorithm
Document retrieval and Support Vector Machine
Classification using frequent patterns
Classification using the backpropagation algorithm
Time for action
Summary
Chapter 5: Cluster Analysis
Search engines and the k-means algorithm
Automatic abstraction of document texts and the k-medoids algorithm
The CLARA algorithm
CLARANS
Unsupervised image categorization and affinity propagation clustering
News categorization and hierarchical clustering
Time for action
Summary
Chapter 6: Advanced Cluster Analysis
Customer categorization analysis of e-commerce and DBSCAN
Clustering web pages and OPTICS
Visitor analysis in the browser cache and DENCLUE
Recommendation system and STING
Web sentiment analysis and CLIQUE
Opinion mining and WAVE clustering
User search intent and the EM algorithm
Customer purchase data analysis and clustering high-dimensional data
SNS and clustering graph and network data
Time for action
Summary
Chapter 7: Outlier Detection
Credit card fraud detection and statistical methods
Activity monitoring – the detection of fraud involving mobile phones and proximity-based methods
Intrusion detection and density-based methods
Intrusion detection and clustering-based methods
Monitoring the performance of the web server and classification-based methods
Detecting novelty in text, topic detection, and mining contextual outliers
Collective outliers on spatial data
Outlier detection in high-dimensional data
Time for action
Summary
Chapter 8: Mining Stream, Time-series, and Sequence Data
The credit card transaction flow and STREAM algorithm
Predicting future prices and time-series analysis
Stock market data and time-series clustering and classification
Web click streams and mining symbolic sequences
Mining sequence patterns in transactional databases
Time for action
Summary
Chapter 9: Graph Mining and Network Analysis
Graph mining
Mining frequent subgraph patterns
Social network mining
Time for action
Summary
Chapter 10: Mining Text and Web Data
Text mining and TM packages
Text summarization
The question answering system
Genre categorization of web pages
Categorizing newspaper articles and newswires into topics
Web usage mining with web logs
Time for action
Summary
Chapter 11: Data Manipulation Using In-built R Data
What is data mining?
Introduction to the R programming language
Data type conversion
Sorting and merging dataframes
Indexing or subsetting dataframes
Date and time formatting
Creating new functions
Loop concepts - the for loop
Loop concepts - the repeat loop
Loop concepts - while conditions
Apply concepts
String manipulation
NA and missing value management
Missing value imputation techniques
Summary
Chapter 12: Exploratory Data Analysis with Automobile Data
Univariate data analysis
Bivariate analysis
Multivariate analysis
Understanding distributions and transformation
Interpreting distributions
Variable binning or discretizing continuous data
Contingency tables, bivariate statistics, and checking for data normality
Hypothesis testing
Non-parametric methods
Summary
Chapter 13: Visualize Diamond Dataset
Data visualization using ggplot2
Using plotly
Creating geo mapping
Summary
Chapter 14: Regression with Automobile Data
Regression introduction
Linear regression
Stepwise regression method for variable selection
Logistic regression
Cubic regression
Penalized regression
Summary
Chapter 15: Market Basket Analysis with Groceries Data
Introduction to Market Basket Analysis
Practical project
Summary
Chapter 16: Clustering with E-commerce Data
Understanding customer segmentation
Various clustering methods available
References
Summary
Chapter 17: Building a Retail Recommendation Engine
What is recommendation?
Assumptions
What method to apply when
Limitations of collaborative filtering
Practical project
Summary
Chapter 18: Dimensionality Reduction
Why dimensionality reduction?
Practical project around dimensionality reduction
Parametric approach to dimension reduction
References
Summary
Chapter 19: Applying Neural Network to Healthcare Data
Introduction to neural networks
Understanding the math behind the neural network
Neural network implementation in R
Neural networks for prediction
Neural networks for classification
Neural networks for forecasting
Merits and demerits of neural networks
References
Summary
Chapter 20: Going Viral
Social media mining using sentiment analysis
The state of communication
What is Big Data?
Human sensors and honest signals
Quantitative approaches
Summary
Chapter 21: Getting Started with R
Why R?
Quick start
Vectors, sequences, and combining vectors
A quick example – creating data frames and importing files
Visualization in R
Style and workflow
Additional resources
Summary
Chapter 22: Mining Twitter with R
Why Twitter data?
Obtaining Twitter data
Preliminary analyses
Summary
Chapter 23: Potentials and Pitfalls of Social Media Data
Opinion mining made difficult
Sentiment and its measurement
The nature of social media data
Traditional versus nontraditional social data
Measurement and inferential challenges
Summary
Chapter 24: Social Media Mining – Fundamentals
Key concepts of social media mining
Good data versus bad data
Understanding sentiments
Sentiment polarity – data and classification
Supervised social media mining – lexicon-based sentiment
Supervised social media mining – Naive Bayes classifiers
Unsupervised social media mining – Item Response Theory for text scaling
Summary
Chapter 25: Social Media Mining – Case Studies
Introductory considerations
Case study 1 – supervised social media mining – lexicon-based sentiment
Case study 2 – Naive Bayes classifier
Case study 3 – IRT models for unsupervised sentiment scaling
Summary

What You Will Learn

  • Discover how to manipulate data in R
  • Get to know top classification algorithms written in R
  • Explore solutions written in R based on R Hadoop projects
  • Apply data management skills in handling large data sets
  • Acquire knowledge about neural network concepts and their applications in data mining
  • Create predictive models for classification, prediction, and recommendation
  • Use various libraries on R CRAN for data mining
  • Discover more about the potential of data, its pitfalls, and inferential gotchas
  • Gain an insight into the concepts of supervised and unsupervised learning
  • Delve into exploratory data analysis
  • Understand the minute details of sentiment analysis

