You're reading from  Apache Mahout Essentials

Product type: Book
Published in: Jun 2015
Reading level: Intermediate
Publisher
ISBN-13: 9781783554997
Edition: 1st Edition
Author (1)
Jayani Withanawasam

Jayani Withanawasam is an R&D engineer and a senior software engineer at Zaizi Asia, where she focuses on applying machine learning techniques to provide smart content management solutions. She is currently pursuing an MSc degree in artificial intelligence at the University of Moratuwa, Sri Lanka, and has completed her BE in software engineering (with first class honors) at the University of Westminster, UK. She has more than 6 years of industry experience, and she has worked in areas such as machine learning, natural language processing, and semantic web technologies during her tenure. She is passionate about working with semantic technologies and big data.

Chapter 4. Recommendations

This chapter covers the recommendation techniques used in Apache Mahout. We will discuss the related MapReduce- and Spark-based implementations using a real-world example, with Java code examples as well as command-line executions.

We will cover the following topics:

  • Collaborative versus content-based filtering

  • User-based recommenders

  • Data models

  • Similarity

  • Neighborhoods

  • Recommenders

  • Item-based recommenders with Spark

  • Matrix factorization-based recommenders

    • SVD recommenders

    • ALS-WR

  • Evaluation techniques

  • Recommendation tips and tricks

 

"A lot of times, people don't know what they want until you show it to them."

 
 --Steve Jobs

Before we proceed with the chapter, let's think about the significance of the preceding quote for a moment.

  • How many times have you come across relevant items to buy, which were suggested by Amazon recommendations?

  • How many times has Facebook suggested friends whom you had not noticed earlier?

  • How...

Collaborative versus content-based filtering


There are two main approaches you can take when it comes to filtering information.

Content-based filtering

Content-based filtering is an unsupervised mechanism based on the attributes of the items and the preferences and model of the user.

For example, if a user views a movie with a certain set of attributes, such as genre, actors, and awards, the system recommends items with similar attributes. The preferences of the user (for example, previous "likes" for movies) are mapped to the attributes or features of the recommended item.

User ratings are not required in this approach. However, it requires considerable effort for feature (attribute) extraction, and it is also less precise than collaborative filtering approaches, which we will discuss later.
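As a minimal sketch of the attribute matching described above, the following plain Java example scores candidate movies by how much their attributes overlap with the attributes of movies a user has liked. The class name, the attribute strings, and the use of Jaccard overlap as the score are illustrative assumptions; none of them comes from Mahout.

```java
import java.util.HashSet;
import java.util.Set;

final class ContentBasedSketch {

    // Jaccard overlap: |A intersect B| / |A union B|, or 0 when both sets are empty.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Hypothetical attribute profile built from movies the user liked.
        Set<String> userProfile = Set.of("sci-fi", "space", "drama");
        // Hypothetical candidate items described by their attributes.
        Set<String> candidateA = Set.of("sci-fi", "space", "action");
        Set<String> candidateB = Set.of("romance", "comedy");

        System.out.println("Candidate A score: " + jaccard(userProfile, candidateA));
        System.out.println("Candidate B score: " + jaccard(userProfile, candidateB));
    }
}
```

Candidate A shares two of four distinct attributes with the profile, so it scores higher than candidate B, which shares none. Note that no ratings from other users are involved.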

Collaborative filtering

Collaborative filtering approaches consider the notion of similarity between items and users. The features of a product or the properties of users...

User-based recommenders


In user-based recommenders, users similar to a given user are identified within a neighborhood, and items are recommended based on what those similar users have already bought or viewed but the given user has not.

For example, as shown in the following figure, if Nimal likes the movies Interstellar (2014) and Lucy (2014) and Sunil also likes the same movies, and in addition, if Sunil likes The Matrix (1999) as well, then we can recommend The Matrix (1999) to Nimal, as the chances are that Nimal and Sunil are like-minded people.
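The Nimal and Sunil example can be sketched in plain Java as follows. This is only an illustration of the idea, not Mahout's implementation: the class name, the `minOverlap` threshold, and the use of a shared-likes count as the notion of "similar user" are all assumptions made for this sketch.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class UserBasedSketch {

    // Recommend items liked by users who share at least minOverlap liked items
    // with the target user, excluding items the target already likes.
    static Set<String> recommend(String target, Map<String, Set<String>> likes, int minOverlap) {
        Set<String> own = likes.getOrDefault(target, Set.of());
        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> entry : likes.entrySet()) {
            if (entry.getKey().equals(target)) {
                continue;
            }
            Set<String> shared = new LinkedHashSet<>(entry.getValue());
            shared.retainAll(own);
            if (shared.size() >= minOverlap) {
                for (String item : entry.getValue()) {
                    if (!own.contains(item)) {
                        result.add(item);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> likes = Map.of(
            "Nimal", Set.of("Interstellar (2014)", "Lucy (2014)"),
            "Sunil", Set.of("Interstellar (2014)", "Lucy (2014)", "The Matrix (1999)"));

        // Sunil shares two liked movies with Nimal, so Sunil's extra movie is suggested.
        System.out.println(recommend("Nimal", likes, 2));
    }
}
```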

A real-world example – movie recommendations

Let's explain this approach using a real-world example on a movie recommendation site, as shown in the following figure:

Users who watched the movies (items) rated them according to their preferences. The rating is a value between 1 (lowest) and 10 (highest).

The user, item, and preferences (ratings) information is given in the following table; you need to save this data as movies.csv in order...

Item-based recommenders


An item-based recommender measures the similarities between different items and picks the k most similar items to a given item in order to predict a rating or make a recommendation for a given user.

For the movie recommendation scenario, an item-based recommender works as given in the following figure:

Let's say both Sunil and Roshan like the movies Interstellar (2014) and Star Wars (1977). Then, we can infer that Interstellar (2014) and Star Wars (1977) could be similar items. So, when Nimal likes Interstellar (2014), we recommend Star Wars (1977) to Nimal based on our previous observation.
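The inference in this scenario can be sketched in plain Java by counting how many users like both of two items and treating that count as the item-to-item similarity. The class and method names are hypothetical, and co-occurrence counting is just one simple similarity chosen for illustration; it is not the measure used in the Mahout code in this section.

```java
import java.util.Map;
import java.util.Set;

final class ItemBasedSketch {

    // Number of users who like both items: a simple co-occurrence similarity.
    static int coOccurrence(String itemA, String itemB, Map<String, Set<String>> likes) {
        int count = 0;
        for (Set<String> liked : likes.values()) {
            if (liked.contains(itemA) && liked.contains(itemB)) {
                count++;
            }
        }
        return count;
    }

    // Pick the catalog item that co-occurs most often with seedItem,
    // or null when nothing co-occurs with it.
    static String mostSimilarItem(String seedItem, Set<String> catalog, Map<String, Set<String>> likes) {
        String best = null;
        int bestCount = 0;
        for (String candidate : catalog) {
            if (candidate.equals(seedItem)) {
                continue;
            }
            int count = coOccurrence(seedItem, candidate, likes);
            if (count > bestCount) {
                bestCount = count;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> likes = Map.of(
            "Sunil", Set.of("Interstellar (2014)", "Star Wars (1977)"),
            "Roshan", Set.of("Interstellar (2014)", "Star Wars (1977)"),
            "Nimal", Set.of("Interstellar (2014)"));
        Set<String> catalog = Set.of("Interstellar (2014)", "Star Wars (1977)", "Lucy (2014)");

        // Nimal likes Interstellar, so look up the item most similar to it.
        System.out.println(mostSimilarItem("Interstellar (2014)", catalog, likes));
    }
}
```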

The following is the Java code example for item-based recommenders:

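// Load the user, item, and preference data from the CSV file
DataModel model = new FileDataModel(new File("movies.csv"));

// Measure how similar two items are from the ratings they received
ItemSimilarity itemSimilarity = new EuclideanDistanceSimilarity(model);

Recommender itemRecommender = new GenericItemBasedRecommender(model, itemSimilarity);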

List<RecommendedItem> itemRecommendations = itemRecommender...

Matrix factorization-based recommenders


So far, we have discussed two main collaborative filtering approaches, namely user-based and item-based recommenders.

Even though they are capable of providing users with relevant recommendations, a major challenge that these approaches face is the sparsity of large datasets. Not all users will provide ratings on all the available items. Also, new items and new users tend to lack sufficient historical data to predict good recommendations. This is known as the cold start problem.
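To make the notion of sparsity concrete, the following sketch computes the fraction of user-item cells that have no rating. The class name, the sample matrix, and the convention of marking a missing rating with `NaN` are assumptions made for this illustration.

```java
final class SparsitySketch {

    // Fraction of user-item cells without a rating; NaN marks a missing rating.
    static double sparsity(double[][] ratings) {
        int total = 0;
        int missing = 0;
        for (double[] row : ratings) {
            for (double rating : row) {
                total++;
                if (Double.isNaN(rating)) {
                    missing++;
                }
            }
        }
        return total == 0 ? 0.0 : (double) missing / total;
    }

    public static void main(String[] args) {
        double x = Double.NaN; // shorthand for "not rated"
        // Hypothetical ratings: rows are users, columns are items.
        double[][] ratings = {
            {8.0, x, x, x},
            {x, x, 6.0, x},
        };
        System.out.println("Sparsity: " + sparsity(ratings));
    }
}
```

In this toy matrix, six of the eight cells are empty. Real rating datasets are typically far sparser, which leaves neighborhood-based methods with little overlap between users or items to compare.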

Further, recommendation algorithms must remain scalable while also performing well on sparse datasets.

Some users also tend to be biased in their ratings, and the previous approaches make no attempt to correct this bias. Nor do they exploit hidden patterns between the features of items and the features of users that lead to certain ratings.
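One common way to reduce rating bias, shown here only as a sketch, is to subtract each user's mean rating from that user's ratings before comparing users. The class name and sample ratings are hypothetical, and mean-centering is an illustrative choice rather than a description of what Mahout does internally.

```java
import java.util.Arrays;

final class BiasSketch {

    // Average of a user's ratings (0 for a user with no ratings).
    static double mean(double[] ratings) {
        if (ratings.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double rating : ratings) {
            sum += rating;
        }
        return sum / ratings.length;
    }

    // Subtract the user's own mean so that generous and strict raters become comparable.
    static double[] center(double[] ratings) {
        double userMean = mean(ratings);
        double[] centered = new double[ratings.length];
        for (int i = 0; i < ratings.length; i++) {
            centered[i] = ratings[i] - userMean;
        }
        return centered;
    }

    public static void main(String[] args) {
        double[] generousUser = {8.0, 9.0, 10.0}; // hypothetical ratings
        double[] strictUser = {2.0, 3.0, 4.0};    // same taste, lower scale

        System.out.println(Arrays.toString(center(generousUser)));
        System.out.println(Arrays.toString(center(strictUser)));
    }
}
```

After centering, both users have the same relative preferences even though their raw ratings differ by six points.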

Matrix factorization is another way of doing collaborative filtering, which...

Singular value decomposition


Using Singular Value Decomposition (SVD), we can come up with a more generalized set of features to represent the user-item preferences for a large dataset using dimensionality reduction techniques. This approach helps to represent users in fewer dimensions.
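The effect of this dimensionality reduction can be sketched as follows: once every user and every item is described by a short vector of latent features, a rating is predicted as the dot product of the two vectors. The class name and feature values below are hypothetical and hand-picked; in practice, a factorizer learns them from the data.

```java
final class LatentFactorSketch {

    // Dot product of two equally sized feature vectors.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Feature vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical vectors with 3 latent features each.
        double[] userFeatures = {1.0, 2.0, 0.5};
        double[] itemFeatures = {2.0, 1.5, 4.0};

        // Predicted preference of this user for this item.
        System.out.println("Predicted rating: " + dot(userFeatures, itemFeatures));
    }
}
```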

The following is the Java code example for SVD using ALS-WR as the factorizer. The number of target features is given as 3, lambda (the regularization parameter) is given as 0.065, and the number of iterations is given as 1:

DataModel svdmodel = new FileDataModel(new File("movies.csv"));

// 3 target features, lambda = 0.065, 1 iteration
ALSWRFactorizer factorizer = new ALSWRFactorizer(svdmodel, 3, 0.065, 1);

Recommender svdrecommender = new SVDRecommender(svdmodel, factorizer);

// Request 1 recommendation for the user with ID 3
for (RecommendedItem recommendation : svdrecommender.recommend(3, 1)) {
  System.out.println(recommendation);
}

The following is the output of the preceding code:

RecommendedItem[item:3, value:7.2046385]
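To give some intuition for the lambda value passed to the factorizer, the following sketch computes a generic regularized squared error for a single known rating. The penalty term grows with the size of the feature vectors, which discourages overfitting. This is a simplified, assumed form for illustration only; it is not the exact objective that ALS-WR minimizes, and the class name and sample values are hypothetical.

```java
final class RegularizedLossSketch {

    // Dot product of two equally sized vectors.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Squared prediction error plus lambda times the squared norms of both vectors.
    static double loss(double rating, double[] user, double[] item, double lambda) {
        double error = rating - dot(user, item);
        double penalty = dot(user, user) + dot(item, item);
        return error * error + lambda * penalty;
    }

    public static void main(String[] args) {
        double[] user = {1.0, 2.0, 0.5}; // hypothetical user features
        double[] item = {2.0, 1.5, 4.0}; // hypothetical item features

        System.out.println("Without regularization: " + loss(7.5, user, item, 0.0));
        System.out.println("With lambda = 0.065:    " + loss(7.5, user, item, 0.065));
    }
}
```

A larger lambda penalizes large feature values more heavily, trading some accuracy on known ratings for better generalization to unseen ones.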

The following is the command-line execution...

Summary


The Apache Mahout recommendations module helps you recommend items that users have not seen before, based on their previous preferences. Mahout implements the collaborative filtering approach; user-based recommendations, item-based recommendations, and matrix factorization are its key collaborative filtering techniques.
