You're reading from Machine Learning for Developers (1st Edition, Packt, Oct 2017, ISBN-13: 9781786469878).

Author: Rodolfo Bonnin
Rodolfo Bonnin is a systems engineer and Ph.D. student at Universidad Tecnológica Nacional, Argentina. He has also pursued parallel programming and image understanding postgraduate courses at Universität Stuttgart, Germany. He has been doing research on high-performance computing since 2005 and began studying and implementing convolutional neural networks in 2008, writing a CPU- and GPU-supporting neural network feedforward stage. More recently he's been working in the field of fraud pattern detection with neural networks and is currently working on signal classification using machine learning techniques. He is also the author of Building Machine Learning Projects with Tensorflow and Machine Learning for Developers by Packt Publishing.

Clustering

Congratulations! You have finished this book's introductory section, in which you have explored a great number of topics, and if you were able to follow it, you are prepared to start the journey of understanding the inner workings of many machine learning models.

In this chapter, we will explore some effective and simple approaches for automatically finding interesting data conglomerates, and so begin to research the reasons for natural groupings in data.

This chapter covers the following topics:

  • A line-by-line implementation of an example of the K-means algorithm, with explanations of the data structures and routines
  • A thorough explanation of the k-nearest neighbors (K-NN) algorithm, using a code example to explain the whole process
  • Additional methods of determining the optimal number of groups representing a set of samples
...

Grouping as a human activity

Humans typically tend to agglomerate everyday elements into groups of similar features. This feature of the human mind can also be replicated by an algorithm. Conversely, one of the simplest operations that can be initially applied to any unlabeled dataset is to group elements around common features.

As we have described, at this stage in the discipline's development, clustering is taught as an introductory topic, applied to the simplest categories of element sets.

But as an author, I recommend researching this domain further, because the community hints that the performance of current models will plateau before full generalization of tasks in AI is reached. And what kinds of methods are the main candidates for the next stages of crossing the frontier towards AI? Unsupervised methods, in the form of very sophisticated...

Automating the clustering process

The grouping of information for clustering follows a common pattern across all techniques. Basically, we have an initialization stage, followed by the iterative insertion of new elements, after which the group relationships are updated. This process continues until the stopping criterion is met, at which point the group characterization is finished. The following flow diagram illustrates this process:

General scheme for a clustering algorithm
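The stages just described (initialization, iterative assignment of elements, updating of group relationships, and a stop check) can be sketched as a generic loop. This is only an illustrative skeleton with hypothetical callback names, not an API from any library:

```python
import numpy as np

def generic_clustering(data, init_groups, assign, update, max_iters=100):
    """Generic clustering loop: initialize the groups, then alternate
    element assignment and group updates until the stopping criterion
    (no further change, or max_iters) is met."""
    groups = init_groups(data)                    # initialization stage
    assignments = assign(data, groups)
    for _ in range(max_iters):
        assignments = assign(data, groups)        # insert elements into groups
        new_groups = update(data, assignments)    # update group relationships
        if np.allclose(new_groups, groups):       # stopping criterion: no change
            break
        groups = new_groups
    return groups, assignments
```

Each concrete technique in this chapter fills in `init_groups`, `assign`, and `update` in its own way; K-means, for example, uses centroids as the group representation.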

After we get a clear sense of the overall process, let's start working with several cases where this scheme is applied, starting with K-means.

Finding a common center - K-means

Here we go! After the necessary preparatory review, we will finally start to learn from data; in this case, we are looking to label data we observe in real life.

In this case, we have the following elements:

  • A set of N-dimensional elements of numeric type
  • A predetermined number of groups (this is tricky because we have to make an educated guess)
  • A set of common representative points for each group (called centroids)

The main objective of this method is to split the dataset into an arbitrary number of clusters, each of which can be represented by the mentioned centroids.
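Ahead of the line-by-line implementation promised earlier, a compact NumPy sketch of that objective may help: pick k initial centroids at random, assign every sample to its nearest centroid, recompute each centroid as the mean of its cluster, and repeat until the centroids stop moving. This is only an illustrative sketch, not the chapter's full implementation:

```python
import numpy as np

def kmeans(samples, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct samples as the starting centroids
    centroids = samples[rng.choice(len(samples), size=k, replace=False)]
    labels = np.zeros(len(samples), dtype=int)
    for _ in range(max_iters):
        # Assign each sample to its nearest centroid (squared Euclidean distance)
        distances = ((samples[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = distances.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned samples
        # (keeping the old centroid if a cluster ends up empty)
        new_centroids = np.array([
            samples[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # stopping criterion
            break
        centroids = new_centroids
    return centroids, labels
```

Note that the result depends on the (random) initialization and on the guessed k, which is exactly why choosing the number of groups is the tricky part mentioned above.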

The word centroid comes from mathematics, and carries over into calculus and physics. Here we find a classical representation of the analytical calculation of a triangle's centroid:

Graphical depiction of the centroid finding scheme for a triangle
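Concretely, a triangle's centroid is the arithmetic mean of its three vertices: G = ((x1 + x2 + x3)/3, (y1 + y2 + y3)/3). A quick check with NumPy (the vertex values here are illustrative):

```python
import numpy as np

# Vertices of a triangle (illustrative values)
vertices = np.array([[0.0, 0.0],
                     [6.0, 0.0],
                     [0.0, 3.0]])

# The centroid is the mean of the vertices, coordinate by coordinate
centroid = vertices.mean(axis=0)
print(centroid)  # [2. 1.]
```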

The centroid...

Nearest neighbors

K-NN is another classical method of clustering. It builds groups of samples, supposing that each new sample will have the same class as its neighbors, without looking for a global representative central sample. Instead, it examines each new sample's neighborhood, looking for the most frequent class within it.
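That "most frequent class in the neighborhood" criterion can be sketched in a few lines. This is an illustrative helper (not the implementation given later in the chapter), using Euclidean distance and a majority vote:

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, new_sample, k=3):
    """Label a new sample with the most frequent class among its
    k nearest (Euclidean) neighbors in the already-labeled set."""
    distances = np.linalg.norm(train_x - new_sample, axis=1)
    nearest = np.argsort(distances)[:k]   # indices of the k closest samples
    votes = Counter(train_y[nearest])     # count the classes in the neighborhood
    return votes.most_common(1)[0][0]     # majority class wins
```

No central representative is ever computed; the decision is purely local to each new sample.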

Mechanics of K-NN

K-NN can be implemented in many configurations, but in this chapter we will use the semi-supervised approach: starting from a certain number of already-assigned samples, we will later guess the cluster membership of new samples using the main criterion described above.

In the following diagram, we have a breakdown of the algorithm. It can be summarized with the following steps:

Flowchart for the K-NN...

K-NN sample implementation

For this simple implementation of the K-NN method, we will use the NumPy and Matplotlib libraries. Also, as we will be generating a synthetic dataset for better comprehension, we will use the make_blobs method from scikit-learn, which will generate well-defined and separated groups of information so we have a sure reference for our implementation.

Importing the required libraries:

    import numpy as np
    import matplotlib
    import matplotlib.pyplot as plt
    # sklearn.datasets.samples_generator was removed in scikit-learn 0.24;
    # make_blobs is now imported directly from sklearn.datasets:
    from sklearn.datasets import make_blobs
    %matplotlib inline

So, it's time to generate the data samples for this example. The parameters of make_blobs are the number of samples, the number of features or dimensions, the quantity of centers or groups, whether the samples have to be shuffled, and the standard deviation of the cluster, to control how dispersed...
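As a sketch of that call, with illustrative parameter values (not necessarily the ones the full example uses):

```python
from sklearn.datasets import make_blobs

# 200 two-dimensional samples in 4 well-separated groups; shuffle mixes the
# samples, and cluster_std controls how dispersed each group is
samples, labels = make_blobs(n_samples=200, n_features=2, centers=4,
                             shuffle=True, cluster_std=0.8, random_state=42)
print(samples.shape, labels.shape)  # (200, 2) (200,)
```

Because `make_blobs` also returns the ground-truth group of each sample, we have a sure reference against which to check our own implementation's assignments.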

Summary

In this chapter, we have covered the simplest yet still very useful machine learning models in an eminently practical way, to get us started on the complexity scale.

In the following chapter, where we will cover several regression techniques, it will be time to tackle a new type of problem that we have not worked on so far (even though it is possible to approach it with clustering methods): regression, using new mathematical tools for approximating unknown values. In it, we will model past data using mathematical functions, and try to predict new outputs based on those modeling functions.

