
You're reading from Learning Predictive Analytics with Python

Product type: Book
Published in: Feb 2016
Reading level: Intermediate
ISBN-13: 9781783983261
Edition: 1st Edition
Authors (2):
Ashish Kumar

Ashish Kumar is a seasoned data science professional, a published author, and a thought leader in the field of data science and machine learning. An IIT Madras graduate and a Young India Fellow, he has around 7 years of experience implementing and deploying data science and machine learning solutions for challenging industry problems, in both hands-on and leadership roles. His core areas of expertise include Natural Language Processing, IoT analytics, R Shiny product development, and ensemble ML methods. He is fluent in Python and R and teaches a popular ML course at Simplilearn. When not crunching data, Ashish sneaks off to the nearest hip beach and enjoys the company of his Kindle. He also trains and mentors data science aspirants and fledgling start-ups.

Mathematics behind clustering


Earlier in this chapter, we discussed how a measure of similarity or dissimilarity is needed to cluster observations. In this section, we will see what those measures are and how they are used.

Distances between two observations

If we consider each observation as a point in an n-dimensional space, where n is the number of columns (variables) in the dataset, we can calculate the mathematical distance between any two points. The smaller the distance, the more similar the two observations are, and points that lie close to each other are grouped together into the same cluster.
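To make the idea concrete, here is a minimal Python sketch of one common choice of distance, the Euclidean distance d(a, b) = sqrt((a1 - b1)^2 + (a2 - b2)^2 + ... + (an - bn)^2). The text above does not name a specific measure at this point, so treat this as an illustration rather than the chapter's method; the function name `euclidean_distance` and the three students' marks are hypothetical and are not taken from the book's dataset.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two observations given as equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("observations must have the same number of variables")
    # Sum the squared differences over all n dimensions, then take the square root
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical marks (English, Maths, Science); illustrative values only
s1 = (70, 80, 75)
s2 = (72, 78, 76)
s3 = (40, 95, 60)

print(euclidean_distance(s1, s2))            # 3.0
print(round(euclidean_distance(s1, s3), 2))  # 36.74
```

Student `s2` is much closer to `s1` than `s3` is, so a distance-based clustering algorithm would be more inclined to place the first two students in the same cluster.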

There are many ways of calculating distance, and different algorithms use different distance measures. Let us look at these measures with a few examples. To illustrate them, consider a sample dataset of 10 observations with three variables each. The following dataset contains the percentage marks obtained by 10 students in English, Maths, and Science:


Student | English | ...
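Because different algorithms rely on different distance measures, the sketch below computes three widely used ones (Manhattan, Euclidean, and Chebyshev, also called the maximum distance) for the same pair of students. Which measures the chapter actually covers is not visible in this excerpt, and the function names and marks are hypothetical, made up for illustration rather than taken from the table above.

```python
import math

def manhattan(a, b):
    # Sum of absolute differences along each variable
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebyshev(a, b):
    # Largest absolute difference along any single variable
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical marks (English, Maths, Science) for two students
a = (70, 80, 75)
b = (40, 95, 60)

for name, fn in [("Manhattan", manhattan), ("Euclidean", euclidean), ("Chebyshev", chebyshev)]:
    print(f"{name}: {fn(a, b):.2f}")
# Manhattan: 60.00
# Euclidean: 36.74
# Chebyshev: 30.00
```

For any pair of points, Chebyshev <= Euclidean <= Manhattan, so the same two students can look nearer or farther apart depending on the measure, which in turn can change how observations are grouped into clusters.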