
You're reading from Frank Kane's Taming Big Data with Apache Spark and Python

Product type: Book
Published in: Jun 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781787287945
Edition: 1st Edition
Author (1)
Frank Kane

Frank Kane has spent nine years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers all the time. He holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology and teaches others about big data analysis.

Item-based collaborative filtering in Spark, cache(), and persist()


We're now going to cover a topic that's near and dear to my heart: collaborative filtering. Have you ever been to a site like amazon.com and seen something like "people who bought this also bought," or seen "similar movies" suggested on imdb.com? I used to work on that. In this section, I'm going to show you some general algorithms for how that works under the hood. Now, I can't tell you exactly how Amazon does it, because Jeff Bezos would hunt me down and probably do terrible things to me, but I can tell you some generally known techniques that you can build upon to do something similar. Let's talk about a technique called item-based collaborative filtering and discuss how it works. We'll apply it to our MovieLens data to figure out which movies are similar to each other based on user ratings.
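To make the idea a little more concrete before going further, here is a minimal plain-Python sketch of one common way to score movie similarity from user ratings. It is not the code from this book, and it does not use Spark: the toy `ratings` list, the `item_similarities` function name, and the choice of cosine similarity are all assumptions made for illustration. The idea is to find every pair of movies that the same user rated, collect the two ratings side by side, and then measure how closely those rating pairs agree.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy data (not MovieLens): (user_id, movie_id, rating)
ratings = [
    (1, "A", 5.0), (1, "B", 4.0), (1, "C", 1.0),
    (2, "A", 4.0), (2, "B", 5.0), (2, "C", 2.0),
    (3, "A", 1.0), (3, "B", 2.0), (3, "C", 5.0),
]


def item_similarities(ratings):
    """Return {(movie1, movie2): (cosine_score, num_co_raters)}."""
    # Step 1: group each user's ratings together.
    by_user = defaultdict(list)
    for user, movie, rating in ratings:
        by_user[user].append((movie, rating))

    # Step 2: for every pair of movies rated by the same user,
    # record the two ratings side by side. Sorting keeps each
    # pair in a consistent (movie1 < movie2) order.
    pair_ratings = defaultdict(list)
    for user_ratings in by_user.values():
        for (m1, r1), (m2, r2) in combinations(sorted(user_ratings), 2):
            pair_ratings[(m1, m2)].append((r1, r2))

    # Step 3: cosine similarity over the co-rated rating pairs
    # (an assumed metric; others would work here too).
    result = {}
    for pair, rs in pair_ratings.items():
        sum_xy = sum(x * y for x, y in rs)
        sum_xx = sum(x * x for x, _ in rs)
        sum_yy = sum(y * y for _, y in rs)
        denom = sqrt(sum_xx) * sqrt(sum_yy)
        score = sum_xy / denom if denom else 0.0
        result[pair] = (score, len(rs))
    return result


sims = item_similarities(ratings)
for pair, (score, count) in sorted(sims.items()):
    print(pair, round(score, 4), count)
```

In this toy data, movies A and B are rated alike by every user, so their pair gets the highest score, while A and C get the lowest. The co-rater count is kept alongside each score because a high score backed by only one or two users is much less trustworthy than one backed by many.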

We're doing some pretty complicated and advanced stuff at this point in the book. The good news is this is probably...
