
You're reading from Big Data Analytics with Java

Product type: Book
Published in: Jul 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781787288980
Edition: 1st Edition
Author (1)
RAJAT MEHTA

The author is a VP (Technical Architect) in technology at JP Morgan Chase in New York. He is a Sun Certified Java Developer and has worked on Java-related technologies for more than 16 years. For the past few years, his role has heavily involved using the big data stack and running analytics on it. He also contributes to various open source projects available on his GitHub repository and is a frequent writer for developer magazines.

Summary


In this chapter, we learned about clustering and saw how it groups items so that the items in each group are similar to one another in some way. Clustering is an example of unsupervised learning, and several popular clustering algorithms ship by default with Apache Spark. We covered two clustering approaches. The first was k-means, which groups together items that are close to each other according to a distance measure such as Euclidean distance. The second was bisecting k-means, which is essentially an improvement on regular k-means and is created by combining hierarchical and k-means clustering. We also applied clustering to a sample retail dataset from UCI. In this case study, we segmented the customers of an online e-commerce store using clustering and tried to identify its important customers.
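To make the k-means idea concrete, here is a minimal sketch in plain Java of the assign-and-recompute loop driven by Euclidean distance. It is not the Apache Spark implementation used in the chapter; the class name `KMeansSketch`, its method names, and the sample points are all hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Apache Spark or the book's code.
public class KMeansSketch {

    // Euclidean distance between two points with the same number of dimensions.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Index of the centroid closest to the given point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (euclidean(point, centroids[c]) < euclidean(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Repeats "assign points, then recompute centroids" until assignments stop
    // changing or maxIterations is reached. Updates centroids in place and
    // returns the cluster index of each point.
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int k = centroids.length;
        int dims = points[0].length;
        int[] labels = new int[points.length];
        Arrays.fill(labels, -1); // -1 means "not assigned yet"

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int c = nearest(points[p], centroids);
                if (c != labels[p]) {
                    labels[p] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // assignments are stable, so the centroids are too
            }

            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[labels[p]]++;
                for (int d = 0; d < dims; d++) {
                    sums[labels[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // leave an empty cluster's centroid where it is
                }
                for (int d = 0; d < dims; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Made-up 2D points forming two visibly separate groups.
        double[][] points = {
            {1.0, 1.0}, {1.5, 2.0}, {1.0, 0.5},
            {8.0, 8.0}, {9.0, 9.5}, {8.5, 9.0}
        };
        // Initial centroids picked by hand for this sketch.
        double[][] centroids = {{1.0, 1.0}, {8.0, 8.0}};
        int[] labels = cluster(points, centroids, 10);
        System.out.println(Arrays.toString(labels));
    }
}
```

With these sample points, the first three land in cluster 0 and the last three in cluster 1. Bisecting k-means, as described above, would instead start from a single cluster and repeatedly split one cluster in two with a k-means step of this kind; that variant is not shown here.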

In the next...
