You're reading from R for Data Science Cookbook, 1st Edition (Packt Publishing, July 2016, ISBN-13: 9781784390815).
Author: Yu-Wei Chiu (David Chiu)

Yu-Wei Chiu (David Chiu) is the founder of LargitData (www.LargitData.com), a startup that focuses on big data and machine learning products. He previously worked at Trend Micro as a software engineer, where he was responsible for building big data platforms for business intelligence and customer relationship management systems. In addition to being a startup entrepreneur and data scientist, he specializes in using Spark and Hadoop to process big data and in applying data mining techniques for data analysis. Yu-Wei is also a professional lecturer who has delivered lectures on big data and machine learning in R and Python and given tech talks at a variety of conferences. In 2015, Yu-Wei wrote Machine Learning with R Cookbook (Packt Publishing), and in 2013 he reviewed Bioinformatics with R Cookbook (Packt Publishing). For more information, please visit his personal website at www.ywchiu.com.

Acknowledgement

I have immense gratitude for my family and friends for supporting and encouraging me to complete this book. I would like to sincerely thank my mother, Ming-Yang Huang (Miranda Huang); my mentor, Man-Kwan Shan; the proofreader of this book, Brendan Fisher; the members of LargitData; the Data Science Program (DSP); and other friends who have offered their support.

Chapter 9. Rule and Pattern Mining with R

This chapter covers the following topics:

  • Transforming data into transactions

  • Displaying transactions and associations

  • Mining associations with the Apriori rule

  • Pruning redundant rules

  • Visualizing association rules

  • Mining frequent itemsets with Eclat

  • Creating transactions with temporal information

  • Mining frequent sequential patterns with cSPADE

Introduction


Many readers will be familiar with the story of Wal-Mart placing beer next to diapers in its stores because it found that purchases of the two products were highly correlated. This is a classic example of what data mining is about: it can help us discover how items are associated within a transaction dataset. With this skill, a business can explore the relationships between items and sell correlated items together to increase sales.

As an alternative to identifying correlated items with association mining, another popular application of data mining is to discover frequent sequential patterns from transaction datasets that carry temporal information. This has a number of applications, including predicting customers' shopping sequences and analyzing web clickstreams and biological sequences.

The recipes in this chapter cover creating and inspecting transaction datasets, performing association analysis with the Apriori algorithm, visualizing associations in various graph formats, and finding...

Transforming data into transactions


Before using any rule mining algorithm, we need to transform data from the data frame format into transactions. In this example, we demonstrate how to transform a purchase order dataset into transactions with the arules package.
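As a minimal, self-contained illustration of the conversion (toy data with invented user and product names, not the chapter's dataset), a data frame of user and product pairs can be split by user and coerced to transactions:

```r
# Toy sketch: convert user-product pairs to transactions.
# The names (u1, P01, ...) are invented for illustration only.
library(arules)

df <- data.frame(User    = c("u1", "u1", "u2", "u3"),
                 Product = c("P01", "P02", "P02", "P03"),
                 stringsAsFactors = FALSE)

# One transaction per user, containing the products that user bought
trans_toy <- as(split(df$Product, df$User), "transactions")
trans_toy   # transactions in sparse format with 3 transactions and 3 items
```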

Getting ready

Download the product_by_user.RData dataset from the https://github.com/ywchiu/rcookbook/raw/master/chapter9/product_by_user.RData GitHub link.

How to do it…

Perform the following steps to create transactions:

  1. First, install and load the arules package:

    > install.packages("arules")
    > library(arules)
    
  2. Use the load function to load purchase orders by user into an R session:

    > load("product_by_user.RData")
    
  3. Last, convert the data.table (or data.frame) into transactions with the as function:

    > trans = as(product_by_user$Product, "transactions")
    > trans
    transactions in sparse format with
     32539 transactions (rows) and
     20054 items (columns)
    

How it works…

Before mining a frequent item set or association rule, it is...

Displaying transactions and associations


The arules package uses its transactions class to store transaction data. As such, we must use the generic function provided by arules to display transactions and association rules. In this recipe, we illustrate how to plot transactions and association rules with various functions in the arules package.

Getting ready

Ensure you have completed the previous recipe by generating transactions and storing these in a variable named trans.

How to do it…

Perform the following steps to display transactions and associations:

  1. First, obtain a LIST representation of the transaction data:

    > head(LIST(trans),3)
    $'00001'
    [1] "P0014520085"
    
    $'00002'
    [1] "P0018800250"
    
    $'00003'
    [1] "P0003926850034" "P0013344760004" "P0013834251"    "P0014251480003"
    
  2. Next, use the summary function to show a summary of the statistics and details of the transactions:

    > summary(trans)
    transactions as itemMatrix in sparse format with
     32539 rows (elements/itemsets/transactions) and
     20054...
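Beyond LIST() and summary(), item frequencies are often worth inspecting. Here is a sketch using the Groceries sample dataset that ships with arules (the trans variable from the previous recipe would work the same way):

```r
library(arules)
data(Groceries)   # sample transaction data bundled with arules

# Support (relative frequency) of the five most common items
head(sort(itemFrequency(Groceries), decreasing = TRUE), 5)

# A bar chart of the top ten items (opens a plot device)
# itemFrequencyPlot(Groceries, topN = 10)
```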

Mining associations with the Apriori rule


Association mining is a technique that can discover interesting relationships hidden in a transaction dataset. This approach first finds all frequent itemsets and generates strong association rules from frequent itemsets. In this recipe, we will introduce how to perform association analysis using the Apriori rule.
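To make the idea concrete before applying it to our transactions, here is a self-contained toy run (the basket contents are invented for illustration):

```r
library(arules)

# Four invented baskets echoing the beer-and-diapers story
baskets <- list(c("beer", "diapers"),
                c("beer", "diapers", "chips"),
                c("chips", "soda"),
                c("beer", "diapers", "soda"))
toy_trans <- as(baskets, "transactions")

# Keep rules seen in at least half the baskets with confidence >= 0.8
toy_rules <- apriori(toy_trans,
                     parameter = list(supp = 0.5, conf = 0.8, target = "rules"))
inspect(toy_rules)   # e.g. {diapers} => {beer}, support 0.75, confidence 1
```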

Getting ready

Ensure you have completed the previous recipe by generating transactions and storing these in a variable, trans.

How to do it…

Please perform the following steps to analyze association rules:

  1. Use apriori to discover rules with support over 0.001 and confidence over 0.1:

    > rules <- apriori(trans, parameter = list(supp = 0.001, conf = 0.1, target= "rules"))
    > summary(rules)
    set of 6 rules
     
     rule length distribution (lhs + rhs):sizes
     2 
     6 
     
        Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
           2       2       2       2       2       2 
     
     summary of quality measures:
         support           confidence          lift...

Pruning redundant rules


Among generated rules, we sometimes find repeated or redundant rules (for instance, one rule is the super rule of another rule). In this recipe, we will show how to prune (or remove) repeated or redundant rules.

Getting ready

Ensure you have completed the previous recipe by generating rules and storing them in a variable named rules.

How to do it…

Perform the following steps to prune redundant rules:

  1. First, you need to identify the redundant rules:

    > rules.sorted = sort(rules, by="lift")
    > subset.matrix = is.subset(rules.sorted, rules.sorted)
    > subset.matrix[lower.tri(subset.matrix, diag=T)] = NA
    > redundant = colSums(subset.matrix, na.rm=T) >= 1
    
  2. You can then remove the redundant rules:

    > rules.pruned = rules.sorted[!redundant]
    > inspect(rules.pruned)
      lhs                 rhs              support     confidence lift    
    1 {P0014252070}    => {P0014252066}    0.001321491 0.2704403  27.32874
    5 {P0014252055}    => {P0014252066...
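Assuming a reasonably recent arules version, a related and simpler pruning is available through the built-in is.redundant() helper instead of the manual is.subset() matrix; a self-contained sketch on the bundled Groceries data:

```r
library(arules)
data(Groceries)   # sample transaction data bundled with arules

g_rules  <- apriori(Groceries,
                    parameter = list(supp = 0.01, conf = 0.4))
g_sorted <- sort(g_rules, by = "lift")

# is.redundant() flags a rule when a more general rule (same RHS,
# subset of its LHS) already has at least the same confidence
g_pruned <- g_sorted[!is.redundant(g_sorted)]
c(before = length(g_sorted), after = length(g_pruned))
```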

Visualizing association rules


To explore the relationship between items, one can visualize the association rules. In the following recipe, we introduce how to use the arulesViz package to visualize association rules.

Getting ready

Ensure you have completed the previous recipe by generating pruned rules and storing them in a variable named rules.pruned.

How to do it…

Please perform the following steps to visualize association rules:

  1. First, install and load the arulesViz package:

    > install.packages("arulesViz")
    > library(arulesViz)
    
  2. You can then make a scatterplot from the pruned rules:

    > plot(rules.pruned)
    

    Figure 3: The scatterplot of pruned rules

  3. We can also present the rules in a grouped matrix:

    > plot(rules.pruned,method="grouped")
    

    Figure 4: The grouped matrix for three rules

  4. Alternatively, we can use a graph to present the rules:

    > plot(rules.pruned,method="graph")
    

    Figure 5: The graph for three rules

How it works…

As an alternative to presenting association rules as...

Mining frequent itemsets with Eclat


As the Apriori algorithm performs a breadth-first search to scan the complete database, support counting is rather time-consuming. Alternatively, if the database fits into memory, one can use the Eclat algorithm, which performs a depth-first search to count supports. The Eclat algorithm, therefore, runs much more quickly than the Apriori algorithm. In this recipe, we introduce how to use the Eclat algorithm to generate a frequent itemset.
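As a self-contained taste of the function before the recipe (using the Groceries sample data bundled with arules rather than our trans variable):

```r
library(arules)
data(Groceries)

# Depth-first search for itemsets appearing in at least 5% of transactions
freq_sets <- eclat(Groceries,
                   parameter = list(support = 0.05, maxlen = 5))

# The three most frequent itemsets
inspect(head(sort(freq_sets, by = "support"), 3))
```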

Getting ready

Ensure you have completed the first recipe of this chapter by generating transactions and storing them in a variable named trans.

How to do it…

Please perform the following steps to generate a frequent itemset using the Eclat algorithm:

  1. Similar to the Apriori method, we can use the eclat function to generate a frequent itemset:

    > frequentsets = eclat(trans, parameter = list(support = 0.01, maxlen = 10))
    
  2. We can then obtain the summary information from the generated frequent itemset:

    > summary(frequentsets)
    set of...

Creating transactions with temporal information


In addition to mining interesting associations within the transaction database, we can mine interesting sequential patterns using transactions with temporal information. In the following recipe, we demonstrate how to create transactions with temporal information from a web traffic dataset.

Getting ready

Download the traffic.RData dataset from the https://github.com/ywchiu/rcookbook/raw/master/chapter9/traffic.RData GitHub link.

We can then generate transactions from the loaded dataset for frequent sequential pattern mining.

How to do it…

Perform the following steps to create transactions with temporal information:

  1. First, install and load the arulesSequences package:

    > install.packages("arulesSequences")
    > library(arulesSequences)
    
  2. Load web traffic data into an R session:

    > load('traffic.RData')
    
  3. Create the transaction data with temporal information:

    > traffic_data <- data.frame(item = traffic$Page)
    > traffic.tran <- as(traffic_data...
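Since the listing above is cut off, here is a self-contained sketch of the general pattern: attach a sequenceID (who) and an eventID (when, in order) to each transaction via transactionInfo(). The toy page names are invented.

```r
library(arulesSequences)

# Toy click log: two visitors, each viewing two pages in order
log_df <- data.frame(item = factor(c("/", "/about", "/", "/products")))

# Each row becomes one single-item transaction
log_tran <- as(log_df, "transactions")

# Temporal information required by cSPADE:
# sequenceID groups events by visitor, eventID orders them in time
transactionInfo(log_tran)$sequenceID <- c(1, 1, 2, 2)
transactionInfo(log_tran)$eventID    <- c(1, 2, 1, 2)
log_tran
```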

Mining frequent sequential patterns with cSPADE


One of the best-known frequent sequential pattern mining algorithms is SPADE (Sequential PAttern Discovery using Equivalence classes), which exploits the vertical database format to intersect ID-lists with an efficient lattice search. Its constrained variant, cSPADE, additionally allows us to place constraints on the mined sequences. In this recipe, we will demonstrate how to use cSPADE to mine frequent sequential patterns.
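Before mining our own traffic data, the function can be tried on the small zaki example sequence database that ships with arulesSequences:

```r
library(arulesSequences)
data(zaki)   # tiny sequence database bundled with arulesSequences

# Frequent sequential patterns present in at least 40% of the sequences
seq_patterns <- cspade(zaki, parameter = list(support = 0.4))
inspect(seq_patterns)
summary(seq_patterns)
```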

Getting ready

Ensure you have completed the previous recipe by generating transactions with temporal information and storing them in a variable named traffic.tran.

How to do it…

Please perform the following steps to mine frequent sequential patterns:

  1. First, use the cspade function to generate frequent sequential patterns:

    > frequent_pattern <- cspade(traffic.tran, parameter = list(support = 0.50))
    > inspect(frequent_pattern)
        items                           support 
      1 <{item=/}>                  1.00...