Learning Predictive Analytics with R

Product type Book
Published in Sep 2015
Publisher Packt
ISBN-13 9781782169352
Pages 332 pages
Edition 1st Edition
Author Eric Mayor


Chapter 7. Exploring Association Rules with Apriori

Association rules allow us to explore the relationships between items and sets of items. Such items can be as diverse as the contents of a market basket, the words used in sentences, or the components of food products. Let's go back to the first example: transactions in a shop. Each transaction is composed of one or more items. We are interested in transactions of at least two items because the purchase of a single item cannot reveal any relationship between items. Imagine customers purchase the following sets of items, where each row represents a transaction. We will examine this example more thoroughly in this section:

  • Cherry coke, chips, lemon

  • Cherry coke, chicken wings, lemon

  • Cherry coke, chips, chicken wings, lemon

  • Chips, chicken wings, lemon

  • Cherry coke, lemon, chips, chocolate cake

At first sight, you will notice that there seems to be an association between purchases of cherry coke and lemon, as four...
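
The counts behind this first impression can be checked in base R. The following is a minimal sketch, assuming the five transactions are stored as a list of character vectors; the object names (`transactions`, `item_counts`, `n_both`) are chosen here for illustration and do not come from the book.

```r
# The five example transactions, one character vector per basket
transactions <- list(
  c("cherry coke", "chips", "lemon"),
  c("cherry coke", "chicken wings", "lemon"),
  c("cherry coke", "chips", "chicken wings", "lemon"),
  c("chips", "chicken wings", "lemon"),
  c("cherry coke", "lemon", "chips", "chocolate cake")
)

# Number of baskets in which each item appears
item_counts <- sort(table(unlist(transactions)), decreasing = TRUE)
print(item_counts)

# Number of baskets that contain both cherry coke and lemon
has_both <- sapply(transactions, function(basket) {
  all(c("cherry coke", "lemon") %in% basket)
})
n_both <- sum(has_both)
print(n_both)
```

Counting by hand quickly becomes impractical as the number of items grows, which is why the measures introduced next are computed programmatically.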

Apriori – basic concepts


Some concepts related to apriori need to be understood before going further in this chapter: association rules, itemsets, support, confidence, and lift.

Association rules

An association rule is the explicit statement of a relationship in the data, in the form X => Y, where X (the antecedent) can be composed of one or several items. X is called an itemset. In the examples that follow, Y (the consequent) is always a single item. For instance, if we want to promote the purchase of lemons, we might be interested in the antecedents of lemon.

Itemsets

Frequent itemsets are items or collections of items that occur frequently in transactions. Lemon is the most frequent itemset in the previous example, followed by cherry coke and chips. Itemsets are considered frequent if they occur more frequently than a specified threshold. This threshold is called minimal support. The omission of itemsets with support less than the minimal support is called support...
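
As a rough illustration, the measures named above can be computed by hand for the example transactions. The sketch below assumes the standard definitions: support is the share of transactions containing an itemset, confidence of X => Y is support(X and Y) divided by support(X), and lift is confidence divided by support(Y). The helper functions and the threshold `min_support` are hypothetical names and values chosen for this example.

```r
transactions <- list(
  c("cherry coke", "chips", "lemon"),
  c("cherry coke", "chicken wings", "lemon"),
  c("cherry coke", "chips", "chicken wings", "lemon"),
  c("chips", "chicken wings", "lemon"),
  c("cherry coke", "lemon", "chips", "chocolate cake")
)

# Support: share of baskets that contain every item of the itemset
support <- function(itemset, baskets) {
  mean(sapply(baskets, function(basket) all(itemset %in% basket)))
}

# Confidence of X => Y: support of X and Y together, divided by support of X
confidence <- function(x, y, baskets) {
  support(c(x, y), baskets) / support(x, baskets)
}

# Lift of X => Y: confidence divided by the support of Y
lift <- function(x, y, baskets) {
  confidence(x, y, baskets) / support(y, baskets)
}

# Single items whose support exceeds an illustrative threshold
min_support <- 0.5  # assumption: arbitrary value for this example
items <- sort(unique(unlist(transactions)))
item_support <- sapply(items, support, baskets = transactions)
frequent_items <- names(item_support)[item_support > min_support]
print(frequent_items)

# Measures for the rule {cherry coke} => {lemon}
print(support(c("cherry coke", "lemon"), transactions))
print(confidence("cherry coke", "lemon", transactions))
print(lift("cherry coke", "lemon", transactions))
```

Because lemon appears in every basket, the rule {cherry coke} => {lemon} has a lift of 1 despite its high support and confidence: knowing that a basket contains cherry coke tells us nothing new about lemon.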

The inner working of apriori


The goal of apriori is to compute the frequent itemsets and the association rules in an efficient way, as well as to compute support and confidence for these. Going into the details of these computations is beyond the scope of this chapter. In what follows, we briefly examine how itemset generation and rule generation are accomplished.

Generating itemsets with support-based pruning

The most straightforward way to compute frequent itemsets would be to consider all the possible itemsets and discard those with support lower than minimal support. This is particularly inefficient, as generating itemsets and then discarding them is a waste of computation power. The goal is, of course, to generate only the itemsets that are useful for the analysis: those with support higher than minimal support. Let's continue with our previous example. The following table presents the same data using a binary representation:
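
A binary representation of this kind can also be built in R and used to illustrate support-based pruning. The following is a simplified sketch, not the implementation used by the arules package: it relies on the fact that an itemset containing an infrequent item cannot itself be frequent, so candidate pairs are built only from frequent single items. The threshold `min_support` and all object names are assumptions made for this example.

```r
transactions <- list(
  c("cherry coke", "chips", "lemon"),
  c("cherry coke", "chicken wings", "lemon"),
  c("cherry coke", "chips", "chicken wings", "lemon"),
  c("chips", "chicken wings", "lemon"),
  c("cherry coke", "lemon", "chips", "chocolate cake")
)
items <- sort(unique(unlist(transactions)))

# One row per transaction, one column per item; 1 means the item is present
binary <- t(sapply(transactions, function(basket) as.integer(items %in% basket)))
dimnames(binary) <- list(paste0("T", seq_along(transactions)), items)
print(binary)

min_support <- 0.5  # assumption: arbitrary threshold for this example

# Level 1: keep only the single items with support above the threshold
support_1 <- colMeans(binary)
frequent_1 <- names(support_1)[support_1 > min_support]

# Level 2: candidate pairs are generated from frequent single items only,
# so pairs involving an infrequent item are never generated or counted
candidates_2 <- combn(frequent_1, 2, simplify = FALSE)
support_2 <- sapply(candidates_2, function(pair) {
  mean(rowSums(binary[, pair]) == 2)
})
names(support_2) <- sapply(candidates_2, paste, collapse = " + ")
frequent_2 <- support_2[support_2 > min_support]
print(frequent_2)
```

With five items there are ten possible pairs, but only six candidates are examined here because chocolate cake is discarded at the first level. The saving grows quickly with the number of items and the size of the itemsets.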

Analyzing data with apriori in R


In this section, we will continue with another supermarket example and analyze associations in the Groceries dataset. In order to use this dataset and to explore association rules in R, we need to install and load the arules package:

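install.packages("arules")  # install the package (only needed once)
library(arules)             # load the package, which provides apriori()
data(Groceries)             # load the Groceries transactions dataset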
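
Before mining rules, it can help to take a quick look at the data. The following sketch uses functions from the arules package to summarize the dataset, show a few baskets, and list the most frequent items; the choice of three baskets and five items is arbitrary.

```r
library(arules)
data(Groceries)

# Overview: number of transactions and items, most frequent items, basket sizes
summary(Groceries)

# Contents of the first three baskets
inspect(Groceries[1:3])

# Relative frequency (support) of each item, most frequent first
item_freq <- sort(itemFrequency(Groceries), decreasing = TRUE)
print(head(item_freq, 5))
```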

Using apriori for basic analysis

We can now explore relationships between purchased products in this dataset. The dataset is already in a form that apriori can use (transactions). We will first use the default parameters as follows:

rules = apriori(Groceries)

The output is provided in the following screenshot:

Running apriori on the Groceries dataset with default parameters

The first line of the output shows the parameters used in the analysis (in this case, the defaults). Around the middle of the output (where the arrow is), we see that there are 169 items in 9835 transactions in this dataset, and that 0 rules have been found (see the second to last line). If you try this with your own...
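
The defaults in arules are a minimum support of 0.1 and a minimum confidence of 0.8, which is demanding for a dataset with this many items. As a hedged illustration, the thresholds can be lowered through the `parameter` argument of `apriori()`. The values below are assumptions chosen for this sketch, not values taken from the text.

```r
library(arules)
data(Groceries)

# Illustrative thresholds (assumptions): lower support and confidence,
# and at least two items per rule so that the antecedent is never empty
rules <- apriori(Groceries,
                 parameter = list(support = 0.01,
                                  confidence = 0.5,
                                  minlen = 2))

# Number of rules found with these thresholds
print(length(rules))

# The five rules with the highest lift
inspect(head(sort(rules, by = "lift"), 5))
```

Sorting by lift rather than by confidence helps to surface rules whose consequent is not simply an item that appears in most baskets.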

Summary


In this chapter, we discovered some important concepts regarding association rules. In particular, we examined how important the support, confidence, and lift measures are in the assessment of association rules, and saw that high support and confidence do not necessarily mean that an association rule is useful. We uncovered the efficient working of the apriori algorithm for mining association rules and used apriori in R to mine several datasets. We have also seen that it is often necessary to recode some variables before being able to analyze the data. Finally, we discovered grouped matrix-based visualization.

In the next chapter, we will examine statistical distributions and correlations.


Table columns (binary representation of the transactions): Transaction, Cherry Coke, Chicken wings, Chips, Chocolate cake...