
You're reading from Machine Learning with R - Third Edition

Product type: Book
Published: Apr 2019
Publisher: Packt
ISBN-13: 9781788295864
Pages: 458
Edition: 3rd Edition
Concepts: Machine Learning
Author: Brett Lantz


Chapter 9. Finding Groups of Data – Clustering with k-means

Have you ever spent time watching a crowd? If so, you are likely to have seen some recurring personalities. Perhaps a certain type of person, identified by a freshly pressed suit and a briefcase, comes to typify the "fat cat" business executive. A 20-something wearing skinny jeans, a flannel shirt, and sunglasses might be dubbed a "hipster," while a woman unloading children from a minivan may be labeled a "soccer mom."

Of course, these types of stereotypes are dangerous to apply to individuals, as no two people are exactly alike. Yet, understood as a way to describe a collective, the labels capture some underlying aspect of similarity shared among the individuals within the group.

As you will soon learn, the act of clustering, or spotting patterns in data, is not much different from spotting patterns in groups of people. This chapter describes:

The ways clustering tasks differ from the classification tasks we examined previously
How...

Understanding clustering

Clustering is an unsupervised machine learning task that automatically divides the data into clusters, or groups of similar items. It does this without having been told ahead of time how the groups should look. Because we may not even know what we're looking for, clustering is used for knowledge discovery rather than prediction. It provides insight into the natural groupings found within data.

Without advance knowledge of what constitutes a cluster, how can a computer possibly know where one group ends and another begins? The answer is simple: clustering is guided by the principle that items inside a cluster should be very similar to each other, but very different from those outside. The definition of similarity might vary across applications, but the basic idea is always the same: group the data such that related elements are placed together.
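To make this principle concrete, here is a minimal Python sketch (the book itself works in R). It assumes Euclidean distance as the measure of similarity and uses a small, hypothetical set of two-dimensional points; the function name `within_between` is illustrative only. For a given assignment of points to clusters, it averages the distances between pairs that share a cluster and between pairs that do not. A grouping that follows the principle should show a small within-cluster average and a large between-cluster average.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def within_between(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance),
    computed over every pair of points and split by whether the pair shares a label."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = euclidean(points[i], points[j])
        # Pairs in the same cluster count toward "within", all others toward "between".
        (within if labels[i] == labels[j] else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical toy data: two visibly separate groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
good = [0, 0, 0, 1, 1, 1]  # keeps nearby points together
bad = [0, 1, 0, 1, 0, 1]   # mixes the two groups

print("good:", within_between(points, good))
print("bad: ", within_between(points, bad))
```

With these points, the `good` assignment has a much smaller within-cluster average than the `bad` one, and a larger between-cluster average, which is exactly the pattern the principle asks for.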

The resulting clusters can then be used for action. For instance, you might find clustering methods employed in applications such...

Finding teen market segments using k-means clustering

Interacting with friends on a social networking service (SNS) such as Facebook, Tumblr, and Instagram has become a rite of passage for teenagers around the world. Having a relatively large amount of disposable income, these adolescents are a coveted demographic for businesses hoping to sell snacks, beverages, electronics, and hygiene products.

The many millions of teenage consumers using such sites have attracted the attention of marketers struggling to find an edge in an increasingly competitive market. One way to gain this edge is to identify segments of teenagers who share similar tastes, so that clients can avoid targeting advertisements to teens with no interest in the product being sold. For instance, sporting apparel is likely to be a difficult sell to teens with no interest in sports.

Given the text of teenagers' SNS pages, we can identify groups that share common interests such as sports, religion, or music. Clustering can automate...
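As a rough illustration of how k-means could group such profiles, the following is a minimal pure-Python sketch of the standard assign-then-update loop (Lloyd's algorithm). Everything in it is hypothetical: the three features stand for counts of sports, religion, and music terms on a profile, and the nine profiles are made up. The deterministic farthest-first choice of starting centers is a simplification made here for reproducibility; many implementations start from randomly chosen points instead. In R, the stats package's `kmeans()` function performs this kind of clustering.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def kmeans(points, k, max_iter=100):
    # Farthest-first initialization (an assumption for this sketch): start with
    # the first point, then repeatedly add the point farthest from all chosen centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the loop has converged
        labels = new_labels
        # Update step: move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster ends up empty
                centers[c] = mean(members)
    return labels, centers


# Hypothetical profiles: (sports terms, religion terms, music terms).
profiles = [
    (5, 0, 1), (6, 1, 0), (4, 0, 0),  # mostly sports
    (0, 5, 1), (1, 6, 0), (0, 4, 1),  # mostly religion
    (0, 1, 6), (1, 0, 5), (0, 0, 7),  # mostly music
]
labels, centers = kmeans(profiles, k=3)
print(labels)
print(centers)
```

With these toy profiles, each group of three ends up sharing a cluster label. Because k-means only converges to a local optimum, real data usually calls for trying several starting configurations and comparing the results.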

Summary

Our findings support the popular adage that "birds of a feather flock together." By using machine learning methods to cluster teenagers with others who have similar interests, we were able to develop a typology of teenage identities that was predictive of personal characteristics such as gender and number of friends. These same methods can be applied to other contexts with similar results.

This chapter covered only the fundamentals of clustering. There are many variants of the k-means algorithm, as well as many other clustering algorithms that bring unique biases and heuristics to the task. Based on the foundation in this chapter, you will be able to understand these clustering methods and apply them to new problems.

In the next chapter, we will begin to look at methods for measuring the success of a learning algorithm that are applicable across many machine learning tasks. While our process has always devoted some effort to evaluating the success of learning, in order to obtain the...