You're reading from Python Data Mining Quick Start Guide

Product type: Book
Published in: Apr 2019
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781789800265
Edition: 1st Edition
Author (1)
Nathan Greeneltch

Nathan Greeneltch, PhD is an ML engineer at Intel Corp and resident data mining and analytics expert in the AI consulting group. He's worked with Python analytics in both the start-up realm and the large-scale manufacturing sector over the course of the last decade. Nathan regularly mentors new hires and engineers fresh to the field of analytics, with impromptu chalk talks and division-wide knowledge-sharing sessions at Intel. In his past life, he was a physical chemist studying surface enhancement of the vibration signals of small molecules, a topic on which he wrote a doctoral thesis while at Northwestern University in Evanston, IL. Nathan hails from the southeastern United States, with family in equal parts from Arkansas and Florida.

Introducing clustering concepts

Grouping and clustering methods have a very simple goal, and I want you to keep this goal in mind throughout this entire chapter.

The goal of clustering: Group similar things together, while separating dissimilar things.

That is the beginning and end of the motivation, but of course, as with other data mining tasks, the devil is in the details.

So, let's start the discussion by brainstorming what types of mathematical machinery we will need to get this task done right. We will need a quantitative way to describe the following three things:

  1. Location of group: A way to define where a group is in space that spans multiple dimensions
  2. Similarity: What it means to be similar and dissimilar to other data points
  3. Termination condition: When to stop grouping, preferably without human intervention
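
To make the three items above concrete, here is a minimal sketch of one common way to quantify each of them. The specific choices are assumptions made for illustration and are not taken from the chapter: the mean of the points (a centroid) for location, Euclidean distance for similarity, and a "centroids have stopped moving" check for termination. The function names `centroid`, `euclidean`, and `has_converged`, and the tolerance `tol`, are hypothetical.

```python
import math


def centroid(points):
    """Location of a group: the per-dimension mean of its points (assumed choice)."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


def euclidean(a, b):
    """Similarity: straight-line distance; a smaller value means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def has_converged(old_centroids, new_centroids, tol=1e-6):
    """Termination condition: stop once no centroid has moved farther than tol."""
    return all(
        euclidean(old, new) <= tol
        for old, new in zip(old_centroids, new_centroids)
    )


# A tiny two-dimensional group used only for illustration.
group = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
center = centroid(group)
print(center)
print(euclidean((0.0, 0.0), (3.0, 4.0)))
print(has_converged([center], [center]))
```

Other definitions are equally valid. For example, a different distance measure would change what "similar" means, and a fixed number of passes could replace the movement check as the stopping rule.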

If we can find a way to get these three things defined...
