
You're reading from Fast Data Processing with Spark 2 - Third Edition

Product type: Book
Published in: Oct 2016
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781785889271
Edition: 3rd Edition
Author (1)
Holden Karau

Holden Karau is a software development engineer and is active in open source. She has worked on a variety of search, classification, and distributed systems problems at IBM, Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics degree in computer science. Other than software, she enjoys playing with fire and hula hoops, and welding.

Recommendation


Recommendation systems are among the most visible and popular machine learning applications on the Web, from Amazon to LinkedIn to Walmart. Recommendation algorithms fall into roughly five general mechanisms: knowledge-based, demographic-based, content-based, collaborative filtering (item-based or user-based), and latent factor-based. Of these, collaborative filtering is the most widely used and, unfortunately, very computationally intensive.
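To make the user-based flavor of collaborative filtering concrete, here is a minimal pure-Python sketch. It is an illustration only, not code from the book or from Spark: the toy `ratings` data and the helper names `cosine_similarity` and `predict` are hypothetical, and cosine similarity over co-rated items is just one common choice of similarity measure.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
    "bob": {"m1": 5.0, "m2": 3.0, "m3": 4.0, "m4": 5.0},
    "carol": {"m1": 1.0, "m2": 5.0, "m4": 2.0},
}


def cosine_similarity(a, b):
    """Cosine similarity of two users, computed over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        weighted_sum += sim * their_ratings[item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else None


if __name__ == "__main__":
    print(predict(ratings, "alice", "m4"))
```

Every prediction compares the target user against every other user, which hints at why this family of methods becomes expensive as the number of users and items grows.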

Spark implements a scalable variation, the Alternating Least Squares (ALS) algorithm authored by Yehuda Koren, available at http://dl.acm.org/citation.cfm?id=1608614. It is a user-based collaborative filtering mechanism that uses the latent factors method of learning, which can scale to a large dataset. Let's quickly use the MovieLens medium dataset to implement a recommendation system using Spark. While the model development patterns follow what we have seen so far, there are...
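The idea behind alternating least squares can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not Spark's implementation: it uses a single latent factor per user and per item (rank 1), so each least-squares step has a scalar closed form, and the toy `ratings` triples, the function name `als_rank1`, and the parameters `reg` and `iterations` are all hypothetical.

```python
# Hypothetical toy data: (user, item, rating) triples generated from
# user factors [1, 2, 1.5] and item factors [2, 1, 2], with some pairs left out.
ratings = [
    (0, 0, 2.0), (0, 1, 1.0),
    (1, 0, 4.0), (1, 2, 4.0),
    (2, 1, 1.5), (2, 2, 3.0),
]


def als_rank1(ratings, n_users, n_items, reg=0.01, iterations=50):
    """Rank-1 alternating least squares on observed ratings only.

    Each step holds one side fixed and solves a regularized least-squares
    problem for the other side; with one factor the solution is
    sum(r * f) / (sum(f ** 2) + reg).
    """
    user_f = [1.0] * n_users
    item_f = [1.0] * n_items
    for _ in range(iterations):
        # Hold item factors fixed, solve for each user factor.
        for u in range(n_users):
            seen = [(i, r) for (uu, i, r) in ratings if uu == u]
            num = sum(r * item_f[i] for i, r in seen)
            den = sum(item_f[i] ** 2 for i, _ in seen) + reg
            user_f[u] = num / den
        # Hold user factors fixed, solve for each item factor.
        for i in range(n_items):
            seen = [(u, r) for (u, ii, r) in ratings if ii == i]
            num = sum(r * user_f[u] for u, r in seen)
            den = sum(user_f[u] ** 2 for u, _ in seen) + reg
            item_f[i] = num / den
    return user_f, item_f


if __name__ == "__main__":
    user_f, item_f = als_rank1(ratings, n_users=3, n_items=3)
    # Predicted rating for a pair that was never observed: user 0, item 2.
    print(user_f[0] * item_f[2])
```

A predicted rating is the product of a user factor and an item factor. A full implementation uses vectors of several latent factors and solves a small linear system at each step, and because each user (or item) update depends only on the fixed opposite side, the updates within a step can run in parallel.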
