You're reading from Hands-On Recommendation Systems with Python

Product type: Book
Published in: Jul 2018
Reading level: Expert
Publisher: Packt
ISBN-13: 9781788993753
Edition: 1st Edition
Author: Rounak Banik

Rounak Banik is a Young India Fellow and an ECE graduate from IIT Roorkee. He has worked as a software engineer at Parceed, a New York start-up, and Springboard, an EdTech start-up based in San Francisco and Bangalore. He has also served as a backend development instructor at Acadview, teaching Python and Django to around 35 college students from Delhi and Dehradun. He is an alumnus of Springboard's data science career track. He has given talks at the SciPy India Conference and published popular tutorials on Kaggle and DataCamp.

Building Collaborative Filters

In the previous chapter, we mathematically defined the collaborative filtering problem and gained an understanding of various data mining techniques that we assumed would be useful in solving this problem.

The time has finally come for us to put our skills to the test. In the first section, we will construct a well-defined framework that will allow us to build and test our collaborative filtering models effortlessly. This framework will consist of the data, the evaluation metric, and a corresponding function to compute that metric for a given model.
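To make the framework concrete, here is a minimal sketch of its scoring piece. It makes two assumptions: root mean squared error (RMSE) is the evaluation metric, and a model is any function that maps a `(user_id, movie_id)` pair to a predicted rating. The names `rmse` and `score` are hypothetical, not taken from the chapter.

```python
import math
from typing import Callable, Iterable, Sequence, Tuple

# One held-out rating: (user_id, movie_id, actual_rating)
Rating = Tuple[int, int, float]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("expected two non-empty sequences of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def score(model: Callable[[int, int], float], test: Iterable[Rating]) -> float:
    """Run `model` on every held-out (user, movie) pair and report RMSE against the true ratings."""
    test = list(test)
    actual = [rating for _, _, rating in test]
    predicted = [model(user, movie) for user, movie, _ in test]
    return rmse(actual, predicted)


# Usage: a baseline "model" that predicts 3.0 for every pair
test_set = [(1, 10, 4.0), (2, 10, 2.0), (1, 20, 3.0)]
baseline_error = score(lambda user, movie: 3.0, test_set)
```

Lower is better. A model that reproduces every held-out rating exactly scores 0.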

Technical requirements

The framework

Just like the knowledge-based and content-based recommenders, we will build our collaborative filtering models in the context of movies. Since collaborative filtering demands data on user behavior, we will be using a different dataset known as MovieLens.

The MovieLens dataset

The MovieLens dataset is made publicly available by GroupLens Research, a computer science lab at the University of Minnesota. It is one of the most popular benchmark datasets for testing the potency of collaborative filtering models, and it is available in most recommender libraries and packages.

MovieLens gives us user ratings on a variety of movies and is available in various sizes. The full version consists of more than...
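Loading the ratings is mostly a matter of parsing a file. The sketch below assumes the tab-separated `user id, item id, rating, timestamp` layout of the MovieLens 100K `u.data` file and performs a plain random train/test split. The function names, the 25% test fraction, and the sample rows are illustrative assumptions; the rows are not real MovieLens records.

```python
import io
import random


def load_ratings(fileobj):
    """Parse 'user<TAB>item<TAB>rating<TAB>timestamp' lines into (user, item, rating) tuples."""
    ratings = []
    for line in fileobj:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating, _timestamp = line.split("\t")
        ratings.append((int(user), int(item), float(rating)))
    return ratings


def train_test_split(ratings, test_frac=0.25, seed=42):
    """Shuffle a copy of the ratings with a fixed seed and hold out `test_frac` of them."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]


# Made-up rows standing in for open("u.data"); these are NOT real MovieLens records
sample = io.StringIO("1\t10\t4\t100\n1\t20\t3\t101\n2\t10\t5\t102\n3\t30\t2\t103\n")
ratings = load_ratings(sample)
train, test = train_test_split(ratings)
```

A per-user (stratified) split is a common refinement. It ensures that every user in the test set also has ratings in the training set.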

User-based collaborative filtering

In Chapter 1, Getting Started with Recommender Systems, we learned what user-based collaborative filters do: they find users similar to a particular user and then recommend products that those users have liked to the first user.
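The core of that idea fits in a few lines of plain Python. This sketch makes three assumptions:

- Ratings are stored as `{user: {movie: rating}}`.
- User similarity is cosine similarity, with unrated movies treated as 0.
- A rating of 4 or more counts as "liked".

The function names, the threshold, and the toy data are hypothetical.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity of two users' rating dicts, treating unrated movies as 0."""
    dot = sum(a[m] * b[m] for m in a.keys() & b.keys())
    norm = math.sqrt(sum(r * r for r in a.values())) * math.sqrt(sum(r * r for r in b.values()))
    return dot / norm if norm else 0.0


def recommend(ratings, user, k=2, like_threshold=4.0):
    """Movies liked by the k users most similar to `user` that `user` has not rated yet."""
    scored = [(cosine_sim(ratings[user], row), other)
              for other, row in ratings.items() if other != user]
    neighbors = [other for _, other in sorted(scored, reverse=True)[:k]]
    seen = ratings[user].keys()
    picks = set()
    for other in neighbors:
        picks.update(m for m, r in ratings[other].items() if r >= like_threshold and m not in seen)
    return sorted(picks)


ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "carol": {"C": 5, "E": 5},
}
print(recommend(ratings, "alice", k=1))  # bob is alice's closest match, so this prints ['D']
```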

In this section, we will implement this idea in code. We will build filters of increasing complexity and gauge their performance using the framework we constructed in the previous section.
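As an illustration of "increasing complexity", here are two rating predictors one might compare with such a framework:

- the plain mean of everyone else's ratings for the movie;
- a mean weighted by how similar each rater is to the target user.

The similarity scores are passed in as toy values; in practice they would come from a measure such as cosine similarity. The function names are hypothetical, and the 3.0 fallback for movies nobody else rated is an assumption.

```python
def predict_mean(ratings, user, movie, default=3.0):
    """Unweighted mean of the ratings other users gave `movie`."""
    others = [row[movie] for u, row in ratings.items() if u != user and movie in row]
    return sum(others) / len(others) if others else default


def predict_weighted_mean(ratings, sim, user, movie, default=3.0):
    """Mean of other users' ratings for `movie`, weighted by their (positive) similarity to `user`."""
    num = den = 0.0
    for other, row in ratings.items():
        s = sim.get(user, {}).get(other, 0.0)
        if other != user and movie in row and s > 0:
            num += s * row[movie]
            den += s
    return num / den if den else default


ratings = {"u1": {"m1": 4}, "u2": {"m1": 5, "m2": 3}, "u3": {"m1": 1, "m2": 4}}
sim = {"u1": {"u2": 0.9, "u3": 0.1}}  # toy similarity scores, not computed from data
```

The weighted version leans toward u2, the user u1 most resembles. It predicts about 3.1 for `("u1", "m2")`, where the plain mean gives 3.5.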

To aid us in this process, let's first build a ratings matrix (described in Chapter 1, Getting Started with Recommender Systems, and Chapter 5, Getting Started with Data Mining Techniques) where each row represents a user and each column represents a movie. Therefore, the value in the ith row and jth column will denote the rating given by user i to movie j. As usual, pandas gives us a very useful function, called pivot_table, to...
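pandas' `pivot_table` performs this reshaping in a single call. To show what the reshaping actually does, the sketch below builds the same user-by-movie layout in plain Python, with `None` marking movies a user has not rated. The function name is hypothetical. By default, `pivot_table` averages duplicate (user, movie) entries; this version simply keeps the last one.

```python
def ratings_matrix(ratings):
    """Pivot (user, movie, rating) triples into a users x movies grid.

    Returns (users, movies, grid) where grid[i][j] is the rating users[i] gave movies[j],
    or None if that pair is missing.
    """
    users = sorted({u for u, _, _ in ratings})
    movies = sorted({m for _, m, _ in ratings})
    row_of = {u: i for i, u in enumerate(users)}
    col_of = {m: j for j, m in enumerate(movies)}
    grid = [[None] * len(movies) for _ in users]
    for u, m, r in ratings:
        grid[row_of[u]][col_of[m]] = r
    return users, movies, grid


users, movies, grid = ratings_matrix([(1, 10, 4.0), (1, 20, 3.0), (2, 10, 5.0)])
```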

Item-based collaborative filtering

Item-based collaborative filtering is essentially user-based collaborative filtering where the users now play the role that items played, and vice versa.

In item-based collaborative filtering, we compute the pairwise similarity of every item in the inventory. Then, given user_id and movie_id, we compute the weighted mean of the ratings the user has given to all the items they have rated, where each rating is weighted by that item's similarity to movie_id. The basic idea behind this model is that a user is likely to give similar ratings to two items that are similar to each other.
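A sketch of that prediction step follows. It rests on these assumptions:

- Ratings are stored as `{user: {movie: rating}}`.
- Item-item cosine similarity treats each movie as a vector of the ratings users gave it, with missing ratings counted as 0.
- Only positively similar items contribute weights.
- A hypothetical default of 3.0 is returned when nothing usable is found.

All names are illustrative.

```python
import math
from collections import defaultdict


def item_vectors(ratings):
    """Invert {user: {item: rating}} into {item: {user: rating}}."""
    items = defaultdict(dict)
    for user, row in ratings.items():
        for item, r in row.items():
            items[item][user] = r
    return items


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors, treating missing entries as 0."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_item_based(ratings, user, movie, default=3.0):
    """Weighted mean of the user's own ratings, weighted by each rated item's similarity to `movie`."""
    items = item_vectors(ratings)
    if movie not in items:
        return default
    num = den = 0.0
    for other, r in ratings.get(user, {}).items():
        if other == movie:
            continue
        s = cosine(items[movie], items[other])
        if s > 0:
            num += s * r
            den += s
    return num / den if den else default


ratings = {"u1": {"A": 5, "B": 1}, "u2": {"A": 5, "B": 1, "C": 5}, "u3": {"B": 2, "C": 1}}
prediction = predict_item_based(ratings, "u1", "C")
```

This version recomputes item vectors on every call to keep the example short. A practical implementation would compute the item-item similarity matrix once and reuse it.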

Building an item-based collaborative filter is left as an exercise to the reader. The steps involved are exactly the same except now, as mentioned earlier, the movies and users have swapped places.

Model-based approaches

The collaborative filters we have built thus far are known as memory-based filters. This is because they only make use of similarity metrics to come up with their results. They do not learn any parameters from the data or assign classes/clusters to the data. In other words, they do not make use of machine learning algorithms.

In this section, we will take a look at some filters that do. We spent an entire chapter looking at various supervised and unsupervised learning techniques. The time has finally come to see them in action and test their potency.

Clustering

In our weighted mean-based filter, we took every user into consideration when trying to predict the final rating. In contrast, our demographic-based...
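The kNN-based model mentioned in this chapter's summary can be approximated by a small change to a similarity-weighted mean: instead of averaging over every user who rated the movie, average over only the k most similar ones. This is an illustrative sketch, not the surprise library's implementation. Similarities are supplied as toy values, and the function name and the 3.0 fallback are assumptions.

```python
import heapq


def predict_knn(ratings, sim, user, movie, k=2, default=3.0):
    """Similarity-weighted mean over only the k most similar users who rated `movie`."""
    candidates = []
    for other, row in ratings.items():
        s = sim.get(user, {}).get(other, 0.0)
        if other != user and movie in row and s > 0:
            candidates.append((s, other))
    top = heapq.nlargest(k, candidates)  # the k highest-similarity (score, user) pairs
    den = sum(s for s, _ in top)
    if not den:
        return default
    return sum(s * ratings[other][movie] for s, other in top) / den


ratings = {"u": {}, "a": {"m": 5}, "b": {"m": 4}, "c": {"m": 1}}
sim = {"u": {"a": 0.9, "b": 0.8, "c": 0.1}}  # toy similarity scores
```

With k=2, the weakly similar user c is dropped, and the prediction rises from about 4.33 to about 4.53.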

Summary

This brings us to the end of our discussion on collaborative filters. In this chapter, we built various kinds of user-based collaborative filters and, by extension, learned to build item-based collaborative filters as well.

We then shifted our focus to model-based approaches that rely on machine learning algorithms to churn out predictions. We were introduced to the surprise library and used it to implement a clustering model based on kNN. We then took a look at an approach to using supervised learning algorithms to predict the missing values in the ratings matrix. Finally, we gained a layman's understanding of the singular-value decomposition algorithm and implemented it using surprise.
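For intuition about the latent-factor model behind surprise's `SVD` class, here is a stripped-down sketch. Each user and each movie gets a short vector of learned factors, and the predicted rating is their dot product. The vectors are fitted with stochastic gradient descent on the known ratings. surprise's `SVD` additionally learns a global mean and per-user and per-item bias terms, which this sketch omits. The function names, hyperparameters, and toy data are arbitrary assumptions.

```python
import math
import random


def factorize(ratings, n_factors=2, lr=0.02, reg=0.02, epochs=200, seed=0):
    """Fit user factors P and item factors Q so that dot(P[u], Q[i]) approximates r_ui.

    `ratings` is a list of (user, item, rating); training is plain SGD with L2 regularization.
    """
    rng = random.Random(seed)
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in sorted({u for u, _, _ in ratings})}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in sorted({i for _, i, _ in ratings})}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            for f in range(n_factors):
                pf, qf = pu[f], qi[f]
                pu[f] += lr * (err * qf - reg * pf)
                qi[f] += lr * (err * pf - reg * qf)
    return P, Q


def predict(model, user, item):
    """Dot product of the learned user and item factors (KeyError for unseen users/items)."""
    P, Q = model
    return sum(a * b for a, b in zip(P[user], Q[item]))


def train_rmse(model, ratings):
    """RMSE of the model on the ratings it was trained on."""
    return math.sqrt(sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings) / len(ratings))


ratings = [
    ("ann", "m1", 5), ("ann", "m2", 4), ("ann", "m3", 1),
    ("ben", "m1", 4), ("ben", "m2", 5), ("ben", "m4", 1),
    ("cat", "m3", 5), ("cat", "m4", 4), ("cat", "m1", 1),
    ("dan", "m3", 4), ("dan", "m4", 5), ("dan", "m2", 2),
]
model = factorize(ratings)
print(round(predict(model, "ann", "m4"), 2))  # a rating ann never gave
```

The missing entries of the ratings matrix are filled by these dot products. Training error alone says little about quality, so a held-out split is what a fair comparison needs.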

All the recommenders we've built so far reside only inside our Jupyter Notebooks. In the next chapter, we will learn how to deploy our models to the web, where they can be used...
