Chapter 5. Collaborative Filtering

"People who bought this also bought these."

When you shop online, the site recommends items to you based on your shopping history and the products that you have looked at and probably liked. The system that generates these recommendations is known as a recommendation engine. Several state-of-the-art algorithms are used to build recommendation engines; in this chapter, two such families of algorithms will be discussed, namely "collaborative filtering" and "content-based filtering".

Objective


After reading this chapter, you will understand how some sites recommend items based on the products you have rated and on your browsing history. You will also understand the mathematics behind collaborative filtering and how it can be applied to your own problem domain.

Different recommendation algorithms you will learn


The following algorithms will be discussed at length:

  • General non-personalized recommendations for a category of items

  • User-user collaborative filtering

  • User-user collaborative filtering with variations on similarity measures

  • Item-item collaborative filtering

  • Item-item collaborative filtering with variations

You will also learn how to evaluate recommendation engines using several evaluation metrics for different purposes, such as prediction accuracy, ranking accuracy in the context of top-N recommendations, and so on.

Vocabulary of collaborative filtering


The design of collaborative filters is influenced by two factors, which are as follows:

  • Users (the people for whom the recommendation is being provided)

  • Items (the products for which the recommendation is being provided)

Based on these two entities, there are two major varieties of collaborative filtering methods. The first of these methods takes the similarity between users to recommend items. This is known as User-User collaborative filtering or User k-Nearest Neighbors.

If the number of users of a recommender system is denoted by m and the number of items by n, and if m >> n (m is much greater than n), then user-user collaborative filtering suffers from performance problems. In that case, item-item collaborative filtering (which relies on the similarity between items) is often implemented instead.
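At the heart of user-user filtering is a similarity measure between two users' rating vectors. As a minimal F# sketch, assuming ratings are kept as in-memory maps from item id to rating (the data and the cosineSimilarity function are illustrative, not the book's code), the similarity could be computed like this:

```fsharp
// Ratings for two hypothetical users: item id -> rating
let ratingsOfA = Map.ofList [1, 4.0; 2, 5.0; 3, 3.0]
let ratingsOfB = Map.ofList [1, 5.0; 2, 4.0; 4, 2.0]

// Cosine similarity computed over the items both users have rated
let cosineSimilarity (a : Map<int, float>) (b : Map<int, float>) =
    let keysOf m = m |> Map.toSeq |> Seq.map fst |> Set.ofSeq
    let common = Set.intersect (keysOf a) (keysOf b)
    if Set.isEmpty common then 0.0
    else
        let dot   = common |> Seq.sumBy (fun i -> a.[i] * b.[i])
        let normA = common |> Seq.sumBy (fun i -> a.[i] * a.[i]) |> sqrt
        let normB = common |> Seq.sumBy (fun i -> b.[i] * b.[i]) |> sqrt
        dot / (normA * normB)

printfn "similarity = %f" (cosineSimilarity ratingsOfA ratingsOfB)
```

Pearson correlation, adjusted cosine, and the other variations on similarity measures listed above can be swapped in by replacing this function.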

In the next few sections, these two algorithms will be discussed at length.

Baseline predictors


Before delving into true collaborative filtering, let's look at some baseline predictors that can predict ratings for new users who haven't rated anything yet; for such users, it is almost impossible to determine a neighborhood. For them, a basic baseline rating can be the average of all ratings. The problem of applying collaborative filtering to predict ratings for new users is referred to as Cold Start in the collaborative filtering literature.

The baseline predictor is normally denoted by $b_{u,i}$ for user $u$ and item $i$. The base case, where the baseline is set equal to the global average $\mu$ of all ratings, is given by the following formula:

$$b_{u,i} = \mu$$

However, this can be optimized using the average of that user's ratings for other items (if any are available), or the average rating for that particular item (given by all other users). In these two cases, the baseline predictor is calculated using the following formulae, where $I_u$ is the set of items rated by user $u$ and $U_i$ is the set of users who have rated item $i$:

$$b_{u,i} = \bar{r}_u = \frac{1}{|I_u|}\sum_{j \in I_u} r_{u,j} \qquad\text{and}\qquad b_{u,i} = \bar{r}_i = \frac{1}{|U_i|}\sum_{v \in U_i} r_{v,i}$$

Note that the subscript is used to denote...
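A minimal F# sketch of these baseline predictors, assuming the ratings are held as an array of (user, item, rating) triples (the data and names are illustrative, not the book's code), could look like this:

```fsharp
// Rating triples: (userId, itemId, rating)
let ratings = [| (1, 10, 4.0); (1, 11, 3.0); (2, 10, 5.0); (3, 12, 2.0) |]

// Global mean of all ratings
let globalMean = ratings |> Array.averageBy (fun (_, _, r) -> r)

// Baseline from the user's own average rating, falling back to the global mean
let userBaseline userId =
    match ratings |> Array.filter (fun (u, _, _) -> u = userId) with
    | [||] -> globalMean
    | own  -> own |> Array.averageBy (fun (_, _, r) -> r)

// Baseline from the item's average rating, falling back to the global mean
let itemBaseline itemId =
    match ratings |> Array.filter (fun (_, i, _) -> i = itemId) with
    | [||]    -> globalMean
    | forItem -> forItem |> Array.averageBy (fun (_, _, r) -> r)

printfn "global=%f  user 1=%f  item 10=%f" globalMean (userBaseline 1) (itemBaseline 10)
```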

Item-item collaborative filtering


As in the user-user approach, similarities can be computed and used to recommend items, except that here the similarities are between items rather than users. The idea is to locate similar items based on who is purchasing them. This approach is generally taken when the user base grows large. So, if the number of users is denoted by M and the number of items by N, then when M >> N (read: M is much greater than N) this approach yields good results. Most e-commerce sites use this algorithm to recommend items to users.

Amazon's "People who bought this also bought these" is a simple output of this approach. In a very simple form, devoid of any details about how ratings are persisted, you can think of it as an inverted index from items to the users who bought them:

Items | Users
I1    | U1, U2, U3, U5
I2    | U3, U1
I3    | U1, U5

Now, if a new user buys item I2, then the system will list all the other items that have been purchased by U3 and U1, which are I1 and I3. On the other hand, if the new user buys item I3, then the system will...
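A minimal F# sketch of this inverted-index style of lookup, assuming the purchase table above is held in memory (names and data shapes are illustrative, not the book's code), might be:

```fsharp
// Item -> users who bought it, mirroring the table above
let boughtBy =
    Map.ofList [
        "I1", set ["U1"; "U2"; "U3"; "U5"]
        "I2", set ["U3"; "U1"]
        "I3", set ["U1"; "U5"] ]

// "People who bought this also bought": every other item purchased by
// at least one of the users who purchased the given item
let alsoBought item =
    match Map.tryFind item boughtBy with
    | None -> Set.empty
    | Some users ->
        boughtBy
        |> Map.filter (fun other buyers ->
            other <> item && not (Set.isEmpty (Set.intersect buyers users)))
        |> Map.toSeq
        |> Seq.map fst
        |> Set.ofSeq

printfn "%A" (alsoBought "I2")   // set ["I1"; "I3"]
```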

Top-N recommendations


So far, the discussion has been about prediction. A recommendation is nothing but a sorting of items by their predicted rating values. So, for a new user who has rated a few items, the system calculates rating predictions for several other items and then presents them as a list sorted in descending order of predicted rating.
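As a sketch, once a rating predictor is available, producing a top-N list is just predict, sort, and take; the predictRating function below is an assumed placeholder, not the book's implementation:

```fsharp
// Assumed placeholder: plug in any rating predictor from this chapter
let predictRating (userId : int) (itemId : int) : float =
    float ((userId + itemId) % 5) + 1.0   // dummy prediction for illustration

// Top-N recommendation: predict, sort descending, take the first N items
let topN n userId candidateItems =
    candidateItems
    |> List.map (fun item -> item, predictRating userId item)
    |> List.sortByDescending snd
    |> List.truncate n

printfn "%A" (topN 3 1 [10 .. 20])
```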

Evaluating recommendations


How good a collaborative filtering system is can be broadly determined by measuring three types of accuracy metrics, namely:

  • Prediction Accuracy Metrics

    These measures help us understand how accurately the recommender predicts ratings. They work by calculating the differences between the actual ratings of previously rated items and the ratings estimated by the recommender system.

  • Decision Support Metrics (a.k.a. Confusion Matrix)

    These measures are computed from the confusion matrix, in the same way the performance of a supervised learning algorithm is evaluated, and indicate how well the recommender separates the items a user will like from those they will not (a sketch computing precision and recall follows this list).

  • Ranking Accuracy Metrics

    These metrics are used to find out how well the recommender has placed the items in the final recommended list.
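For example, precision and recall are typical decision support metrics; a minimal F# sketch, assuming illustrative sets of recommended and relevant items, is shown below:

```fsharp
// Items the recommender returned for a user, and the items the user actually liked
let recommended = set [1; 2; 3; 4; 5]
let relevant    = set [2; 3; 7; 9]

let hits = Set.intersect recommended relevant |> Set.count |> float

// Precision: fraction of recommended items that were relevant
let precision = hits / float (Set.count recommended)
// Recall: fraction of relevant items that were recommended
let recall    = hits / float (Set.count relevant)

printfn "precision = %f, recall = %f" precision recall
```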

Prediction accuracy

These metrics help us to understand how good the predicted ratings are. Here are some of the prediction accuracy metrics that are frequently used:

Note

$\hat{r}_{u,i}$ denotes the predicted rating for user $u$ on item $i$, and $r_{u,i}$ is the actual rating. So, the closer the value of these metrics is to zero, the better the prediction algorithm...
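Two prediction accuracy metrics that are commonly used for this purpose are the mean absolute error (MAE) and the root mean squared error (RMSE); the following minimal F# sketch, using illustrative data, computes both:

```fsharp
// Pairs of (actual rating, predicted rating) for previously rated items
let pairs = [| (4.0, 3.5); (3.0, 3.0); (5.0, 4.2); (2.0, 2.8) |]

// Mean absolute error: average magnitude of the prediction errors
let mae =
    pairs |> Array.averageBy (fun (actual, predicted) -> abs (actual - predicted))

// Root mean squared error: square root of the average squared error
let rmse =
    pairs
    |> Array.averageBy (fun (actual, predicted) -> (actual - predicted) ** 2.0)
    |> sqrt

printfn "MAE = %f, RMSE = %f" mae rmse
```

MAE penalizes all errors equally, while RMSE penalizes larger errors more heavily.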

Ranking accuracy metrics


A third view of the task of a recommender system is that it ranks all items with respect to a user (or ranks all user-item pairs), such that higher-ranked recommendations are more likely to be relevant to the user. Individual rating predictions may be incorrect, but as long as the order is captured correctly, ranking accuracy measures will credit the system with high accuracy.

Prediction-rating correlation

If the variance of one variable can be explained by the variance of another, the two variables are said to correlate. Let $i_1, i_2, \ldots, i_n$ be items, let $x_1, x_2, \ldots, x_n$ be their true ranks, and let $y_1, y_2, \ldots, y_n$ be the ranks predicted by the recommender system (that is, $x_k$ is the true rank of the $k$-th item and $y_k$ is its predicted rank). Let $\bar{x}$ be the mean of the $x_k$ and $\bar{y}$ be the mean of the $y_k$. Spearman's correlation is defined as follows:

$$\rho = \frac{\sum_{k}(x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k}(x_k - \bar{x})^2}\;\sqrt{\sum_{k}(y_k - \bar{y})^2}}$$

The following code finds the coefficient:

This produces the following output:

val p : float = 0.9338995047...
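A minimal F# sketch of this computation, assuming two illustrative rank arrays xs and ys as sample data (so the value produced will differ from the output above), could look like the following:

```fsharp
// True ranks and predicted ranks for the same items (illustrative sample data)
let xs = [| 1.0; 2.0; 3.0; 4.0; 5.0 |]
let ys = [| 2.0; 1.0; 3.0; 5.0; 4.0 |]

// Spearman's correlation: Pearson correlation computed on the ranks
let spearman (xs : float[]) (ys : float[]) =
    let mx = Array.average xs
    let my = Array.average ys
    let cov  = Array.map2 (fun x y -> (x - mx) * (y - my)) xs ys |> Array.sum
    let varX = xs |> Array.sumBy (fun x -> (x - mx) ** 2.0)
    let varY = ys |> Array.sumBy (fun y -> (y - my) ** 2.0)
    cov / sqrt (varX * varY)

let p = spearman xs ys
printfn "%f" p
```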

Working with real movie review data (MovieLens)


You can download the MovieLens 100K dataset (for collaborative filtering) from http://files.grouplens.org/datasets/movielens/ml-100k/.

This dataset contains 100,000 movie ratings given by 943 users for 1,682 movies. The ratings are stored in the u.data file at http://files.grouplens.org/datasets/movielens/ml-100k/u.data.

Each user has rated at least 20 movies. The users and items are numbered consecutively from 1, and the data is randomly ordered. Each record is a tab-separated list of:

user id | item id | rating | timestamp

The timestamps are in Unix seconds since 1/1/1970 UTC.

The following C# program translates this data into an F# array so that it can be fed to the collaborative filtering algorithms implemented earlier in the chapter:

https://gist.github.com/sudipto80/606418978f4a86fe93aa

Once you generate this array, you can then plug this into the algorithms described earlier.
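If you prefer to do this step directly in F#, a minimal sketch that parses u.data into an array of (user id, item id, rating) tuples might look like this; the local file path is an assumption:

```fsharp
open System.IO

// Parse the tab-separated MovieLens file: user id, item id, rating, timestamp
let loadRatings (path : string) =
    File.ReadAllLines path
    |> Array.map (fun line ->
        let parts = line.Split('\t')
        int parts.[0], int parts.[1], float parts.[2])

// Assumed local path to the downloaded u.data file
let ratings = loadRatings "u.data"
printfn "Loaded %d ratings" ratings.Length
```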

Summary


In this chapter, the most commonly used memory-based approaches to recommendation were discussed. There are several other approaches to building recommender systems that were not discussed here, such as model-based approaches and hybrid recommender systems, which take cues from several other recommendation algorithms to produce their final recommendations. However, we hope this chapter gave you a useful introduction and hands-on guide to implementing these collaborative filtering ideas. All source code is available at https://gist.github.com/sudipto80/7002e66350ca7b2a7551.
