When you go to a bookstore, you generally have a particular book in mind that you want to buy, and you look for it on the shelves. In a bookstore, the top-selling books of the moment are usually displayed up front, and the remaining inventory is arranged (sorted) on the shelves. A typical small bookstore might stock a few thousand books, or perhaps more. In short, the full range of physical products on offer is right in front of you as a customer, and you can pick and choose what you like at that moment. Physical stores keep their top products in front because those sell best, but there is no way to arrange the products according to the choices or preferences of each individual customer who walks in. This is not the case when you visit a popular online e-commerce store such as Amazon or Walmart. There could be a million if not a billion products on Amazon when you go to buy...
You're reading from Big Data Analytics with Java
Before we dig deeper into the concepts behind recommendation systems, let's look at two real-world examples of recommendation engines that we might use on a daily basis. The examples are shown in the following screenshots. The first screenshot is from Amazon.com, where we can see a section called Customers who bought this also bought, and the second is from YouTube.com, where we see a section called Recommended:
As you can see, the screenshot, which we have taken from http://www.amazon.com, shows a list of books on Java. If you search Amazon.com for books with the keyword Core Java, you will get a list of books on Core Java. If you now click on one of those Core Java books, you will be directed to a page with its full description: its price, author, reviews, and so on. At the bottom of this section, you will find a link, as shown above, that mentions Customers...
In content-based recommendations, the recommendation system checks for similarity between items based on their attributes or content and then proposes similar items to the end users. For example, if the recommendation system has to show movies similar to a given movie, it might compare attributes of the movie such as the director, the actors, and the genre. Similarly, if a news website has to show similar news, the system might check for the presence of certain words within the news articles to build its similarity criteria. The recommendations are therefore based on actual content, whether in the form of tags, metadata, or content from the item itself (as in the case of news articles).
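To make the word-based criterion for news articles concrete, here is a minimal sketch in Java that represents each article as a bag of words and compares two articles with cosine similarity. This is one common way to build such a criterion, not necessarily the one used in this book; the class name `ArticleSimilarity`, the simple tokenizer (lowercase letters and digits only, with no stop-word removal or TF-IDF weighting), and the sample headlines are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ArticleSimilarity {

    // Builds a simple bag-of-words vector: lowercase word -> number of occurrences.
    // Assumption: anything that is not a letter or digit separates words.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine similarity between two term-count vectors: dot(a, b) / (|a| * |b|).
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * (double) other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an article with no words is treated as similar to nothing
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a term-count vector.
    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Hypothetical headlines, used only to exercise the sketch.
        String target = "Spark release adds faster machine learning library";
        String[] candidates = {
            "New machine learning library released for Spark",
            "Local team wins the football championship"
        };
        Map<String, Integer> t = termCounts(target);
        for (String c : candidates) {
            System.out.printf("%.3f  %s%n", cosine(t, termCounts(c)), c);
        }
    }
}
```

The first candidate shares four words with the target and scores well above the second, which shares none; a news site would then show the highest-scoring articles as "similar news".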
Let's try to understand content-based recommendation using the following diagram:
As you can see in the preceding diagram, there are four movies, each with a specific director and genre...
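The diagram is not reproduced here, so the following Java sketch uses four hypothetical movies (`Movie A` to `Movie D`, with made-up directors and genres) to show how attribute-based similarity could be computed. It treats each movie as a set of attribute values and ranks the other movies by Jaccard similarity (shared attributes divided by all distinct attributes). The class and method names, and the choice of Jaccard similarity, are illustrative assumptions rather than the book's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MovieContentRecommender {

    // A movie described only by its attributes (hypothetical example data).
    static final class Movie {
        final String title;
        final String director;
        final String genre;

        Movie(String title, String director, String genre) {
            this.title = title;
            this.director = director;
            this.genre = genre;
        }

        // Prefix each value with its attribute name so that a director and a
        // genre with the same text are never counted as a match.
        Set<String> features() {
            Set<String> f = new HashSet<>();
            f.add("director=" + director);
            f.add("genre=" + genre);
            return f;
        }
    }

    // Jaccard similarity: size of the intersection divided by size of the union.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    // Returns the other movies ordered from most to least similar to 'target'.
    static List<Movie> similarTo(Movie target, List<Movie> catalog) {
        List<Movie> result = new ArrayList<>();
        for (Movie m : catalog) {
            if (m != target) {
                result.add(m);
            }
        }
        Set<String> targetFeatures = target.features();
        result.sort(Comparator.comparingDouble(
                (Movie m) -> jaccard(targetFeatures, m.features())).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Movie> catalog = List.of(
                new Movie("Movie A", "Director X", "Thriller"),
                new Movie("Movie B", "Director X", "Thriller"),
                new Movie("Movie C", "Director Y", "Thriller"),
                new Movie("Movie D", "Director Z", "Comedy"));
        Movie target = catalog.get(0);
        for (Movie m : similarTo(target, catalog)) {
            System.out.printf("%s -> %.2f%n", m.title, jaccard(target.features(), m.features()));
        }
    }
}
```

Comparing prefixed feature strings keeps the sketch short. Note that no user history is involved: the ranking comes entirely from the items' own attributes, which is what lets a content recommender work without historical data.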
In this chapter, we learned about recommendation engines. We saw the two types of recommendation engines, that is, content-based recommenders and collaborative filtering recommenders. We learned that content-based recommenders can be built with little or no historical data, because they rely on the attributes of the items themselves: we use those attributes to find items that are similar to each other and recommend them. Later, we worked on a collaborative filtering example using the same MovieLens dataset and the Apache Spark alternating least squares (ALS) recommender. We learned that collaborative filtering relies on historical data about users' activity: from this data, users similar to a given user are identified, and the products those similar users liked are recommended to that user.
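The chapter's collaborative filtering example relies on Spark's ALS recommender, which learns latent factors from the rating matrix. As a much simpler illustration of the idea summarized above (find users similar to a given user, then recommend what they liked), here is a sketch of user-based collaborative filtering in plain Java. This is a different technique from ALS, swapped in only for illustration; the class name `UserBasedRecommender`, the cosine similarity over rating vectors, the weighted-sum scoring, and the tiny hypothetical ratings map (not MovieLens data) are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UserBasedRecommender {

    // Cosine similarity between two users' rating vectors; an item one user
    // has not rated contributes 0 to the dot product.
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
            normA += e.getValue() * e.getValue();
        }
        for (double r : b.values()) {
            normB += r * r;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Recommends up to n items the user has not rated yet. Each candidate item is
    // scored by summing other users' ratings of it, weighted by how similar each
    // of those users is to the target user.
    static List<String> recommend(String user, Map<String, Map<String, Double>> ratings, int n) {
        Map<String, Double> mine = ratings.getOrDefault(user, Map.of());
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> other : ratings.entrySet()) {
            if (other.getKey().equals(user)) {
                continue; // do not compare the user with themselves
            }
            double sim = similarity(mine, other.getValue());
            if (sim <= 0.0) {
                continue; // no overlap in rated items: this user tells us nothing
            }
            for (Map.Entry<String, Double> r : other.getValue().entrySet()) {
                if (!mine.containsKey(r.getKey())) { // skip items already rated
                    scores.merge(r.getKey(), sim * r.getValue(), Double::sum);
                }
            }
        }
        List<String> items = new ArrayList<>(scores.keySet());
        items.sort((x, y) -> Double.compare(scores.get(y), scores.get(x))); // highest first
        return new ArrayList<>(items.subList(0, Math.min(n, items.size())));
    }

    public static void main(String[] args) {
        // Hypothetical ratings on a 1-5 scale: user -> (movie id -> rating).
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        ratings.put("alice", Map.of("m1", 5.0, "m2", 4.0));
        ratings.put("bob", Map.of("m1", 5.0, "m2", 5.0, "m3", 4.0));
        ratings.put("carol", Map.of("m1", 1.0, "m4", 5.0));
        ratings.put("dave", Map.of("m5", 5.0));
        System.out.println(recommend("alice", ratings, 2)); // prints [m3, m4]
    }
}
```

Here bob, who rated m1 and m2 much like alice, dominates the score for m3; carol's m4 still appears but with a low weight, and dave, who shares no movies with alice, is ignored.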
In the next chapter, we will learn about two important algorithms from the world of unsupervised learning that help us form clusters, or groups, in unlabeled data. We will also see how these algorithms help us segment the important customers...