
From: Hands-On Data Analysis with Scala
Product type: Book
Published: May 2019
Reading level: Expert
Publisher: Packt
ISBN-13: 9781789346114
Edition: 1st Edition
Author: Rajesh Gupta

Rajesh is a hands-on big data tech lead and enterprise architect with extensive experience in the full life cycle of software development. He has successfully architected, developed, and deployed highly scalable data solutions using the Spark, Scala, and Hadoop technology stack for several enterprises. A passionate, hands-on technologist, Rajesh holds master's degrees in Mathematics and Computer Science from BITS, Pilani (India).

Streaming a k-means clustering algorithm using Spark

The k-means algorithm is an unsupervised machine learning (ML) clustering algorithm. Its objective is to find k centers (centroids) and assign each data point to its nearest center, thereby partitioning the data into k clusters. The algorithm is most commonly implemented with batch-oriented processing, but streaming-based variants are also available. These have the following properties:

  • The k clusters are built from the initial data
  • As new data arrives in mini-batches, the existing k clusters are updated to compute the new k clusters
  • It is also possible to control the decay, that is, how quickly older data loses significance
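The mini-batch update with decay described in the list above can be sketched for a single cluster in plain Scala. This is a minimal sketch, not the book's code, assuming the weighted-average rule described in the Spark MLlib documentation for streaming k-means: c' = (c * n * a + x * m) / (n * a + m), where c is the cluster's current center, n its accumulated weight, x the mean of the m new points assigned to it in the mini-batch, and a the decay factor. The function name updateCenter and the weight update n' = n * a + m are illustrative assumptions.

```scala
// Hypothetical helper (not a library API): forgetful update of ONE cluster
// center after a mini-batch, using the weighted-average rule
//   c' = (c * n * decay + x * m) / (n * decay + m)
// decay = 1.0 keeps all history; decay = 0.0 uses only the newest batch.
def updateCenter(
    center: Vector[Double],    // current center c
    weight: Double,            // accumulated weight n of that center
    batchMean: Vector[Double], // mean x of the points assigned in this batch
    batchCount: Long,          // number m of points assigned in this batch
    decay: Double              // decay factor a in [0, 1]
): (Vector[Double], Double) = {
  val oldWeight = weight * decay         // discount the older data
  val newWeight = oldWeight + batchCount // assumed weight update n' = n * a + m
  if (newWeight == 0.0) (center, newWeight) // nothing to learn from
  else {
    // Per-dimension weighted average of the old center and the batch mean
    val updated = center.zip(batchMean).map { case (c, x) =>
      (c * oldWeight + x * batchCount) / newWeight
    }
    (updated, newWeight)
  }
}

// Example: updateCenter(Vector(0.0, 0.0), 10.0, Vector(10.0, 10.0), 10L, 1.0)
// returns (Vector(5.0, 5.0), 20.0)
```

With a decay of 1.0, the old center and the new batch carry equal weight in this example, so the center moves halfway toward the batch mean; with 0.0, the history is discarded and the center jumps to the batch mean.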

At a high level, these steps are quite similar to the streaming word count problem that we solved earlier. The goal of the k-means algorithm is to partition the data into k clusters. If the...
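One way to read the comparison with word count: within each mini-batch, every point can be mapped to a (clusterId, (point, 1)) pair, much as each word is mapped to (word, 1), and the pairs are then summed per key. The sketch below shows that shape with plain Scala collections under that interpretation; sqDist, closest, and batchStats are hypothetical names, not part of the book or any library.

```scala
// Hypothetical helper: squared Euclidean distance between two points.
def sqDist(a: Vector[Double], b: Vector[Double]): Double =
  a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

// Hypothetical helper: index of the center nearest to a point.
def closest(point: Vector[Double], centers: Vector[Vector[Double]]): Int =
  centers.indices.minBy(i => sqDist(point, centers(i)))

// Word-count-style aggregation over one mini-batch:
//   map:    point -> (clusterId, (point, 1L))   like word -> (word, 1)
//   reduce: sum points and counts per clusterId, like summing counts per word
// Returns clusterId -> (mean of the assigned points, number of points).
def batchStats(
    batch: Seq[Vector[Double]],
    centers: Vector[Vector[Double]]
): Map[Int, (Vector[Double], Long)] =
  batch
    .map(p => (closest(p, centers), (p, 1L)))
    .groupBy(_._1)
    .map { case (id, pairs) =>
      val (sum, count) = pairs.map(_._2).reduce { (a, b) =>
        (a._1.zip(b._1).map { case (x, y) => x + y }, a._2 + b._2)
      }
      id -> (sum.map(_ / count), count)
    }
```

On a Spark DStream, the same shape would be a map followed by reduceByKey, as in streaming word count. Spark MLlib also provides a ready-made StreamingKMeans class in org.apache.spark.mllib.clustering.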
