Chapter 4. Evaluating the Recommender Systems
The previous chapter showed you how to build recommender systems. There are a few options, and some of them can be developed using the recommenderlab package. In addition, each technique has some parameters. After we build the models, how can we decide which one to use? How can we determine their parameters? We can first test the performance of several models and/or parameter configurations and then choose the one that performs best.
This chapter will show you how to evaluate recommender models, compare their performances, and choose the most appropriate model. In this chapter, we will cover the following topics:
Preparing the data to evaluate performance
Evaluating the performance of some models
Choosing the best performing models
Optimizing model parameters
Preparing the data to evaluate the models
To evaluate models, you need to build them with some data and test them on some other data. This chapter will show you how to prepare the two sets of data. The recommenderlab package contains prebuilt tools that help in this task.
The target is to define two datasets, which are as follows:
A training set: The models are built using this data
A test set: The models are applied to this data and evaluated
In order to evaluate the models, we need to compare the recommendations with the user preferences. In order to do so, we need to forget about some user preferences in the test set and see whether the techniques are able to identify them. For each user in the test set, we ignore some purchases and build the recommendations based on the others. Let's load the packages:
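The loading step can be sketched as follows; recommenderlab is the package used throughout the chapter, and ggplot2 is assumed here only because it is a common choice for charting the results later on:

```r
# recommenderlab provides the recommenders and the evaluation tools
library(recommenderlab)
# ggplot2 is assumed here for visualizing the evaluation results
library(ggplot2)
```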
The dataset that we will use is called MovieLense. Let's define ratings_movies containing only the most relevant users and movies:
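A minimal sketch of this selection; the relevance thresholds (users who rated more than 50 movies and movies rated by more than 100 users) are assumed from the previous chapter:

```r
# Load the MovieLense rating matrix bundled with recommenderlab
data(MovieLense)

# Keep only users with more than 50 ratings and movies with more
# than 100 ratings (thresholds assumed from the previous chapter)
ratings_movies <- MovieLense[rowCounts(MovieLense) > 50,
                             colCounts(MovieLense) > 100]
```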
Evaluating recommender techniques
This chapter will show you two popular approaches to evaluate recommendations. They are both based on the cross-validation framework described in the previous section.
The first approach is to evaluate the ratings estimated by the algorithm. The other approach is to evaluate the recommendations directly. There is a subsection for each approach.
In order to recommend items to new users, collaborative filtering estimates the ratings of items that are not yet purchased. Then, it recommends the top-rated items. At the moment, let's forget about the last step. We can evaluate the model by comparing the estimated ratings with the real ones.
First, let's prepare the data for validation, as shown in the previous section. Since k-fold cross-validation is the most accurate approach, we will use it here:
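A sketch of the k-fold setup and of the rating comparison it enables; the parameter values (4 folds, 15 given items, a rating threshold of 3) are assumptions for illustration:

```r
# Build a 4-fold cross-validation scheme: for each test user, 15 ratings
# are given to the recommender ("known") and the rest are held out ("unknown")
eval_sets <- evaluationScheme(data = ratings_movies,
                              method = "cross-validation",
                              k = 4,
                              given = 15,
                              goodRating = 3)

# Train a model on the training part of the current fold
rec <- Recommender(data = getData(eval_sets, "train"), method = "IBCF")

# Estimate ratings starting from the known part of the test users
pred <- predict(rec, newdata = getData(eval_sets, "known"), type = "ratings")

# Compare the estimated ratings with the real, held-out ones
calcPredictionAccuracy(x = pred,
                       data = getData(eval_sets, "unknown"),
                       byUser = FALSE)  # returns RMSE, MSE and MAE
```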
Identifying the most suitable model
The previous section showed you how to evaluate a model. The performance indices are useful for comparing different models and/or parameters. By applying different techniques to the same data, we can compare their performance indices and pick the most appropriate recommender. Since there are different evaluation metrics, there is no single objective way to do it.
The starting point is the k-fold evaluation framework that we defined in the previous section. It is stored inside eval_sets.
In order to compare different models, we first need to define them. Each model is stored in a list with its name and parameters. The components of the list are as follows:
name: This is the model name.
param: This is a list of its parameters. It can be NULL if all the parameters are left at their defaults.
For instance, this is how we can define an item-based collaborative filtering model by setting the k parameter to 20:
In order to evaluate...
This chapter showed you how to evaluate the performance of different models in order to choose the most accurate one. There are different ways to evaluate performance, and they might lead to different choices. The most suitable evaluation metric depends on the business target. This is an example of how business needs and data should be combined to achieve the final result.
The next chapter will explain a complete use case in which we will prepare the data, build different models, and test them.