Chapter 5. Association Rule Learning

In many practical applications, data comes in the form of lists (ordered or unordered): grocery lists, playlists, visited locations or URLs, app logs, and so on. Sometimes those lists are generated as a byproduct of business processes, but they still contain potentially useful information and insights for process improvement. To extract some of that hidden knowledge, we can use a special kind of unsupervised learning known as association rule mining. In this chapter, we are going to build an app that analyzes your shopping lists to learn your preferences in the form of rules such as "If you've bought oatmeal and cornflakes, you are also likely to buy milk." Such rules can be used to create an adaptable user experience, for instance, contextual suggestions or reminders.

In this chapter, we will cover the following topics:

  • Association rules
  • Association measures
  • Association rule mining algorithms
  • Building an adaptable user experience

Seeing association rules


There are many situations where we are interested in patterns of co-occurring items. For example, marketers want to know which goods are often bought together, clinical personnel need to know which symptoms are associated with certain medical conditions, and in information security we want to know which activity patterns are associated with intrusion or fraud. All of these problems share a common structure: there are items (goods, symptoms, records in logs) organized into transactions (a shopping list, a medical case, a user session). Given data of this kind, we can analyze it to find association rules, such as If the client bought a lemon and some cookies, they are also likely to buy tea, or in more formal notation: (cookies, lemon → tea).
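To make this structure concrete, here is a minimal sketch of these notions in Swift; the type names and sample items are illustrative only:

typealias Item = String
typealias Transaction = Set<Item>

let transactions: [Transaction] = [
    ["cookies", "lemon", "tea"],
    ["cookies", "lemon", "tea", "milk"],
    ["bread", "milk"],
]
// The rule (cookies, lemon → tea) holds in this toy data: every transaction
// that contains cookies and lemon also contains tea.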

Note

We will use pictograms throughout this chapter to facilitate the visual notation of item sets and rules, for example: {🍪, 🍋 → 🍵}.

These rules allow us to make informed decisions, such as putting associated items on the same...

Defining data structures


By the end of this chapter, we want a working rule learning algorithm called Apriori. We will learn about the algorithm's details later; for now, we only want to define the data structures that we will work with throughout the chapter, along with some utility functions.

The generic structure for the algorithm is as follows:

public struct Apriori<Item: Hashable & Equatable> { 

In the simplest case, the ordering of the items in the transaction doesn't matter, and neither does their number nor the associated timestamps. This means that we consider our item sets and transactions as mathematical or Swift sets:

public typealias ItemSet = Set<Item> 

The type parameter Item represents the type of the items in your transactions. Next, we have to implement some structures for subsets and rules:

class Subsets: Sequence {
  var subsets: [ItemSet]
  init(_ set: ItemSet) {
    self.subsets = Array(set).combinations().map(Set.init)
  }
  func makeIterator() -> AnyIterator<ItemSet> {
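    // The method body is not shown in the original; a minimal completion
    // (an assumption, not necessarily the book's exact code):
    return AnyIterator(subsets.makeIterator())
  }
}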

Using association measures to assess rules


Look at these two rules:

  • {Oatmeal, cornflakes → Milk}
  • {Dog food, paperclips → Washing powder}

Intuitively, the second rule looks more unlikely than the first one, doesn't it? How can we tell that for sure, though? In this case, we need some quantitative measures that will show us how likely each rule is. What we are looking for here are association measures, as we call them in machine learning and data mining. Rule mining algorithms revolve around this notion in a similar manner to how distance-based algorithms revolve around distance metrics. In this chapter, we're going to use four association measures: support, confidence, lift, and conviction (see Table 5.1).
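Before we rely on these measures, it may help to see them spelled out. The following is a small, self-contained sketch of the four measures over a list of transactions; the function names and signatures are illustrative assumptions, not the API we build later in this chapter:

// Illustrative definitions of the four association measures.
func support(_ items: Set<String>, in transactions: [Set<String>]) -> Double {
    // Fraction of transactions that contain all of the given items.
    let count = transactions.filter { items.isSubset(of: $0) }.count
    return Double(count) / Double(transactions.count)
}

func confidence(_ ifPart: Set<String>, _ thenPart: Set<String>,
                in transactions: [Set<String>]) -> Double {
    // Of the transactions containing the if-part, the fraction that also
    // contain the then-part.
    return support(ifPart.union(thenPart), in: transactions)
        / support(ifPart, in: transactions)
}

func lift(_ ifPart: Set<String>, _ thenPart: Set<String>,
          in transactions: [Set<String>]) -> Double {
    // How much more often the two parts co-occur than expected if they
    // were statistically independent (1.0 means no association).
    return support(ifPart.union(thenPart), in: transactions)
        / (support(ifPart, in: transactions) * support(thenPart, in: transactions))
}

func conviction(_ ifPart: Set<String>, _ thenPart: Set<String>,
                in transactions: [Set<String>]) -> Double {
    // Expected frequency of the if-part occurring without the then-part
    // under independence, divided by the observed frequency of such misses.
    return (1 - support(thenPart, in: transactions))
        / (1 - confidence(ifPart, thenPart, in: transactions))
}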

Note that these measures tell us nothing about how useful or interesting the rules are, but only quantify their probabilistic characteristics. A rule's usefulness and practicality can be hard to grasp mathematically and often requires human judgment in each case. As usual in statistics, interpreting...

Decomposing the problem


The task of extracting all association rules with the given confidence and support from the dataset is non-trivial. Let's approach it by decomposing it into smaller subtasks:

  • Find all item sets with the support above the given threshold
  • Generate all possible rules from the item sets that have confidence above the given threshold

Generating all possible rules


We need a method that generates all possible combinations of an array's elements. Combinations are enumerated via the binary representation of subset indices, as shown in the following snippet:

import Foundation // for pow

public extension Array {
    // Enumerates all 2^count subsets of the array: the binary representation
    // of each number in 0..<2^count selects which elements are included.
    func combinations() -> [[Element]] {
        if isEmpty { return [] }
        let numberOfSubsets = Int(pow(2, Double(count)))
        var result = [[Element]]()
        for i in 0..<numberOfSubsets {
            var remainder = i
            var index = 0
            var combination = [Element]()
            // Read the bits of i from least to most significant; a set bit
            // includes the element at that position.
            while remainder > 0 {
                if remainder % 2 == 1 {
                    combination.append(self[index])
                }
                index += 1
                remainder /= 2
            }
            result.append(combination)
        }
        return result
    }
}

The following usage example:

let array = [1,2,3] 
print(array.combinations()) 

Produces:

[[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]

Finding frequent item sets


The first step of the algorithm that we implement is based on the support measure. This function returns a set of all item sets with support larger than minSupport:

func frequentItemSets(minSupport: Double) -> Set<ItemSet> {

    var itemSets = Set<ItemSet>()

    let emptyItemSet: ItemSet = ItemSet()

    // The empty item set occurs in every transaction, so its supporters
    // are all transaction indices; the search grows item sets from it.
    supporters[emptyItemSet] = Array(0 ..< transactions.count)

Here we use the priority queue data structure to keep track of possible extensions.

Note

There is no priority queue implementation in the Foundation or Swift standard libraries, and standard data structures are out of the scope of this book. We are using the open source implementation by David Kopec (MIT license): https://github.com/davecom/SwiftPriorityQueue.

To make it work with item sets, we had to change the code a bit: instead of being parameterized over Comparable types, it is now parameterized over types conforming to the Equatable protocol:

    var queue = PriorityQueue...
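The supporters dictionary filled in during the search maps each item set to the indices of the transactions that contain it. As a rough sketch of what this enables, computing the support of an item set reduces to counting stored indices (the method name below is an assumption, not necessarily the book's exact code):

func support(_ itemSet: ItemSet) -> Double {
    // Fraction of all transactions that contain the given item set.
    return Double(supporters[itemSet]?.count ?? 0) / Double(transactions.count)
}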

The Apriori algorithm


The most famous algorithm for association rule learning is Apriori. It was proposed by Agrawal and Srikant in 1994. The input of the algorithm is a dataset of transactions where each transaction is a set of items. The output is a collection of association rules whose support and confidence are greater than some specified thresholds. The name comes from the Latin phrase a priori (literally, "from what is before") because of one smart observation behind the algorithm: if an item set is infrequent, then we can be sure in advance that all its supersets are also infrequent. For example, if {oatmeal, cornflakes} occurs in too few transactions, no larger set such as {oatmeal, cornflakes, milk} can be frequent either, so all such extensions can be pruned without counting them.

You can implement Apriori with the following steps (a compact sketch follows the list):

  1. Count the support of all item sets of length 1; in other words, calculate the frequency of every item in the dataset.
  2. Drop the item sets that have support lower than the threshold.
  3. Store all the remaining item sets.
  4. Extend each stored item set by one element with all possible extensions. This step is known as candidate generation.
  5. Calculate the support value of each...
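Putting these steps together, here is a compact, self-contained sketch of the level-wise Apriori loop over string items. To keep it short, it uses naive one-item extensions for candidate generation rather than the classical join-and-prune step, and every name in it is illustrative; it is not the implementation we develop below:

func aprioriFrequentItemSets(_ transactions: [Set<String>],
                             minSupport: Double) -> Set<Set<String>> {
    // Fraction of transactions containing the given item set.
    func support(_ items: Set<String>) -> Double {
        Double(transactions.filter { items.isSubset(of: $0) }.count)
            / Double(transactions.count)
    }
    let allItems = Set(transactions.flatMap { $0 })
    // Steps 1-3: count singletons and store the frequent ones.
    var current = Set(allItems.map { Set([$0]) }.filter { support($0) >= minSupport })
    var frequent = current
    while !current.isEmpty {
        // Step 4: candidate generation, extending each stored set by one item.
        var candidates = Set<Set<String>>()
        for itemSet in current {
            for item in allItems where !itemSet.contains(item) {
                candidates.insert(itemSet.union([item]))
            }
        }
        // Step 5: keep the candidates whose support clears the threshold, repeat.
        current = candidates.filter { support($0) >= minSupport }
        frequent.formUnion(current)
    }
    return frequent
}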

Implementing Apriori in Swift


What follows is a simplified version of the code that can be found in the supplementary materials. We will skip some of the less important parts here.

The main method, which returns association rules with the given support and confidence, is as follows:

public func associationRules(minSupport: Double, minConfidence: Double) -> [Rule] {
    var rules = [Rule]()
    let frequent = frequentItemSets(minSupport: minSupport)

    for itemSet in frequent {
        // Split each frequent item set into every (if-part, then-part) pair
        // and keep the rules whose confidence clears the threshold.
        for (ifPart, thenPart) in nonOverlappingSubsetPairs(itemSet) {
            if confidence(ifPart, thenPart) >= minConfidence {
                let rule = Rule(ifPart: convertIndexesToItems(ifPart),
                                thenPart: convertIndexesToItems(thenPart))
                rules.append(rule)
            }
        }
    }

    return rules
}
func nonOverlappingSubsetPairs(_ itemSet: ItemSet) -> [(ItemSet, ItemSet)] { 
    var result = [(ItemSet, ItemSet)]() 
    let ifParts = Subsets(itemSet) 
 ...
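The listing is cut off at this point. A plausible way to finish the pairing, assuming Subsets enumerates every subset of the item set (an illustrative sketch, not necessarily the book's exact code), is:

    for ifPart in ifParts {
        // Skip the empty set and the full item set: either would leave
        // one side of the rule empty.
        if ifPart.isEmpty || ifPart.count == itemSet.count { continue }
        let thenPart = itemSet.subtracting(ifPart)
        result.append((ifPart, thenPart))
    }
    return result
}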

Running Apriori


And finally, this is how we use the algorithm with our toy example:

let transactions = [["
", "
", "
", "
"], ["
", "
", "
"], ["
", "
", "
"], ["
", "
"], ["
", "
"], ["
", "
"], ["
"] ] let apriori = Apriori<String>(transactions: transactions) let rules = apriori.associationRules(minSupport: 0.3, minConfidence: 0.5) for rule in rules { print(rule) print("Confidence: ", apriori.confidence(rule), "Lift: ", apriori.lift(rule), "Conviction: ", apriori.conviction(rule)) }

It produces the following:

(The item sets in each rule were rendered as pictograms that did not survive extraction.)

{ … → … } Confidence: 0.8 Lift: 1.4 Conviction: 2.14285714285714
{ … → … } Confidence: 1.0 Lift: 1.4 Conviction: inf
{ … → … } Confidence: 0.75 Lift: 1.3125 Conviction: 1.71428571428571
{ … → … } Confidence: 0.75 Lift: 1.3125 Conviction: 1.71428571428571

Let's analyze what's going on here. The second rule has the maximum confidence as well as conviction...

Running Apriori on real-world data


In this example, we collected real-world shopping lists from one household and composed a small but nevertheless realistic dataset. Let's see if we can extract any meaningful rules from it using our algorithm. Please note that this dataset is extremely small; for any production application of Apriori, you will need much larger datasets:

let transactions =  
[["Grapes", "Cheese"], 
["Cheese", "Milk"], 
["Apples", "Oranges", "Cheese", "Gingerbread", "Marshmallows", "Eggs", "Canned vegetables"], 
["Tea", "Apples", "Bagels", "Marshmallows", "Icecream", "Canned vegetables"], 
["Cheese", "Buckwheat", "Cookies", "Oatmeal", "Banana", "Butter", "Bread", "Apples", "Baby puree"], 
["Baby puree", "Cookies"], 
["Cookies"], 
["Chicken", "Grapes", "Pizza", "Cheese", "Marshmallows", "Cream"], 
["Potatoes"], 
["Chicken"], 
["Сornflakes", "Cookies", "Oatmeal"], 
["Tea"], 
["Chicken"], 
["Chicken", "Eggs", "Cheese", "Oatmeal", "Bell pepper", "Bread", "Chocolate...

The pros and cons of Apriori


The pros of Apriori are as follows:

  • It is the simplest and easiest-to-understand of the association rule learning algorithms
  • The resulting rules are intuitive and easy to communicate to an end user
  • It doesn't require labeled data as it is fully unsupervised; as a result, you can use it in many different situations because unlabeled data is often more accessible
  • Many extensions have been proposed on top of it for different use cases; for example, there are association rule learning algorithms that take into account the ordering of items, their number, and associated timestamps
  • The algorithm is exhaustive, so it finds all the rules with the specified support and confidence

The cons of Apriori are as follows:

  • If the dataset is small, the algorithm can find many false associations that happened simply by chance. You can address this issue by evaluating obtained rules on the held-out test data for the support, confidence, lift, and conviction values.
  • As Agrawal...

Building an adaptable user experience


Human-computer interaction is never easy. Computers don't understand speech, sentiments, or body language. However, we are all used to communicating with our smart devices using not-so-smart buttons, drop-downs, pickers, switches, checkboxes, sliders, and hundreds of other controls. Together, these controls form a kind of language that is commonly referred to as the UI. Slowly but unavoidably, machine learning has made its way into all areas where computers interact directly with humans: voice input, handwriting input, lip reading, gesture recognition, body pose estimation, facial emotion recognition, sentiment analysis, and so on. This may not be immediately obvious, but machine learning is the future of both UI and UX. Today, machine learning is already changing the way users interact with their devices. Machine learning-based solutions are likely to become widely adopted in UIs because of their convenience. Furthermore, ranking, contextual suggestions, automatic...

Summary


In this chapter, we explored association rule learning, which is a branch of unsupervised learning. We implemented the Apriori algorithm, which can be used to find patterns in the form of rules in different transactional datasets. Apriori's classical use case is market basket analysis. However, it is also important conceptually, because rule learning algorithms bridge the gap between classical artificial intelligence approaches (logic programming, concept learning, graph search, and so on) and logic-based machine learning (decision trees).

In the following chapter, we're going to return to supervised learning, but this time we will switch our attention from non-parametric models, such as KNN and k-means, to parametric linear models. We will also discuss linear regression and the gradient descent optimization method.

Bibliography


  1. Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur, Dynamic itemset counting and implication rules for market basket data, in Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 255-264, Tucson, Arizona, USA, May 1997
  2. Rakesh Agrawal and Ramakrishnan Srikant, Fast Algorithms for Mining Association Rules, in Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pages 487-499, Santiago, Chile, September 1994, available at: http://www.vldb.org/conf/1994/P487.PDF

 
