Learning Data Mining with R

Bater Makhabel

Develop key skills and techniques with R to create and customize data mining algorithms
RRP $29.99
RRP $49.99
Print + eBook

Book Details

ISBN 13: 9781783982103
Paperback: 314 pages

About This Book

  • Develop a sound strategy for solving predictive modeling problems using the most popular data mining algorithms
  • Gain an understanding of the major methods of predictive modeling
  • Packed with practical advice and tips to help you get to grips with data mining

Who This Book Is For

This book is intended for the budding data scientist or quantitative analyst with only a basic exposure to R and statistics. It assumes familiarity with only the very basics of R, such as the main data types, simple functions, and how to move data around. No prior experience with data mining packages is necessary; however, you should have a basic understanding of data mining concepts and processes.

Table of Contents

Chapter 1: Warming Up
Big data
Data source
Data mining
Social network mining
Text mining
Web data mining
Why R?
Machine learning
Data attributes and description
Data cleaning
Data integration
Data dimension reduction
Data transformation and discretization
Visualization of results
Time for action
Chapter 2: Mining Frequent Patterns, Associations, and Correlations
An overview of associations and patterns
Market basket analysis
Hybrid association rules mining
Mining sequence dataset
The R implementation
High-performance algorithms
Time for action
Chapter 3: Classification
Generic decision tree induction
High-value credit card customers classification using ID3
Web spam detection using C4.5
Web key resource page judgment using CART
Trojan traffic identification method and Bayes classification
Identify spam e-mail and Naïve Bayes classification
Rule-based classification of player types in computer games and rule-based classification
Time for action
Chapter 4: Advanced Classification
Ensemble (EM) methods
Biological traits and the Bayesian belief network
Protein classification and the k-Nearest Neighbors algorithm
Document retrieval and Support Vector Machine
Classification using frequent patterns
Classification using the backpropagation algorithm
Time for action
Chapter 5: Cluster Analysis
Search engines and the k-means algorithm
Automatic abstraction of document texts and the k-medoids algorithm
The CLARA algorithm
Unsupervised image categorization and affinity propagation clustering
News categorization and hierarchical clustering
Time for action
Chapter 6: Advanced Cluster Analysis
Customer categorization analysis of e-commerce and DBSCAN
Clustering web pages and OPTICS
Visitor analysis in the browser cache and DENCLUE
Recommendation system and STING
Web sentiment analysis and CLIQUE
Opinion mining and WAVE clustering
User search intent and the EM algorithm
Customer purchase data analysis and clustering high-dimensional data
SNS and clustering graph and network data
Time for action
Chapter 7: Outlier Detection
Credit card fraud detection and statistical methods
Activity monitoring – the detection of fraud involving mobile phones and proximity-based methods
Intrusion detection and density-based methods
Intrusion detection and clustering-based methods
Monitoring the performance of the web server and classification-based methods
Detecting novelty in text, topic detection, and mining contextual outliers
Collective outliers on spatial data
Outlier detection in high-dimensional data
Time for action
Chapter 8: Mining Stream, Time-series, and Sequence Data
The credit card transaction flow and STREAM algorithm
Predicting future prices and time-series analysis
Stock market data and time-series clustering and classification
Web click streams and mining symbolic sequences
Mining sequence patterns in transactional databases
Time for action
Chapter 9: Graph Mining and Network Analysis
Graph mining
Mining frequent subgraph patterns
Social network mining
Time for action
Chapter 10: Mining Text and Web Data
Text mining and TM packages
Text summarization
The question answering system
Genre categorization of web pages
Categorizing newspaper articles and newswires into topics
Web usage mining with web logs
Time for action

What You Will Learn

  • Discover how you can manipulate data with R using code snippets
  • Get to know the top classification algorithms written in R
  • Develop best practices in the fields of graph mining and network analysis
  • Find solutions for mining text and web data with appropriate support from R
  • Familiarize yourself with algorithms written in R for spatial data mining, text mining, and web data mining
  • Explore solutions written in R based on RHadoop projects

In Detail

Dealing with the array of problems that you may encounter during complex statistical projects can be difficult. If you have only a basic knowledge of R, this book will provide you with the skills and knowledge to successfully create and customize the most popular data mining algorithms to overcome these difficulties.

You will learn how to manipulate data with R using code snippets and be introduced to mining frequent patterns, associations, and correlations while working with R programs. Discover how to write code for various prediction models, stream data, and time-series data. You will also be introduced to solutions written in R based on RHadoop projects. You will finish this book feeling confident in your ability to know which data mining algorithm to apply in any situation.

