Machine Learning with R - Third Edition


Product type: Book
Published: Apr 2019
Publisher: Packt
ISBN-13: 9781788295864
Pages: 458
Edition: 3rd
Author: Brett Lantz

Table of Contents (18 chapters)

  • Machine Learning with R - Third Edition

  • Contributors

  • Preface

  • Other Books You May Enjoy

  • Leave a review - let other readers know what you think

  • Introducing Machine Learning

  • Managing and Understanding Data

  • Lazy Learning – Classification Using Nearest Neighbors

  • Probabilistic Learning – Classification Using Naive Bayes

  • Divide and Conquer – Classification Using Decision Trees and Rules

  • Forecasting Numeric Data – Regression Methods

  • Black Box Methods – Neural Networks and Support Vector Machines

  • Finding Patterns – Market Basket Analysis Using Association Rules

  • Finding Groups of Data – Clustering with k-means

  • Evaluating Model Performance

  • Improving Model Performance

  • Specialized Machine Learning Topics

  • Index

Chapter 9. Finding Groups of Data – Clustering with k-means

Have you ever spent time watching a crowd? If so, you are likely to have seen some recurring personalities. Perhaps a certain type of person, identified by a freshly pressed suit and a briefcase, comes to typify the "fat cat" business executive. A 20-something wearing skinny jeans, a flannel shirt, and sunglasses might be dubbed a "hipster," while a woman unloading children from a minivan may be labeled a "soccer mom."

Of course, these types of stereotypes are dangerous to apply to individuals, as no two people are exactly alike. Yet, understood as a way to describe a collective, the labels capture some underlying aspect of similarity shared among the individuals within the group.

As you will soon learn, the act of clustering, or spotting patterns in data, is not much different from spotting patterns in groups of people. This chapter describes:

  • The ways clustering tasks differ from the classification tasks we examined previously

  • How...

Understanding clustering


Clustering is an unsupervised machine learning task that automatically divides the data into clusters, or groups of similar items. It does this without having been told ahead of time how the groups should look. Because we may not even know what we're looking for, clustering is used for knowledge discovery rather than prediction. It provides insight into the natural groupings found within data.

Without advance knowledge of what constitutes a cluster, how can a computer possibly know where one group ends and another begins? The answer is simple: clustering is guided by the principle that items inside a cluster should be very similar to each other, but very different from those outside. The definition of similarity might vary across applications, but the basic idea is always the same: group the data such that related elements are placed together.
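
To make this principle concrete, the following is a minimal sketch using the kmeans() function from base R's stats package. The data is made up purely for illustration (two hypothetical groups of two-dimensional points), and the names group_a, group_b, pts, and fit are assumptions rather than anything taken from the chapter.

```r
# Toy data: two hypothetical groups of points in two dimensions
set.seed(123)
group_a <- matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2)  # 20 points near (0, 0)
group_b <- matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2)  # 20 points near (5, 5)
pts <- rbind(group_a, group_b)

# Ask k-means for two clusters; nstart repeats the search from
# several random starting positions and keeps the best result
fit <- kmeans(pts, centers = 2, nstart = 10)

# Cluster assignment for each of the 40 points
fit$cluster

# Total squared distance of points to their own cluster center
# (small when items inside a cluster are similar)
fit$tot.withinss

# Squared distance attributable to separation between cluster centers
# (large when clusters differ from one another)
fit$betweenss
```

On data this cleanly separated, the between-cluster sum of squares should be far larger than the within-cluster total, which is one numeric expression of "similar inside, different outside." Real data is rarely this tidy.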

The resulting clusters can then be used for action. For instance, you might find clustering methods employed in applications such...

Finding teen market segments using k-means clustering


Interacting with friends on a social networking service (SNS) such as Facebook, Tumblr, and Instagram has become a rite of passage for teenagers around the world. Having a relatively large amount of disposable income, these adolescents are a coveted demographic for businesses hoping to sell snacks, beverages, electronics, and hygiene products.

The many millions of teenage consumers using such sites have attracted the attention of marketers struggling to find an edge in an increasingly competitive market. One way to gain this edge is to identify segments of teenagers who share similar tastes, so that clients can avoid targeting advertisements to teens with no interest in the product being sold. For instance, sporting apparel is likely to be a difficult sell to teens with no interest in sports.

Given the text of teenagers' SNS pages, we can identify groups that share common interests such as sports, religion, or music. Clustering can automate...
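
As a rough illustration of the idea, the sketch below clusters simulated counts of interest-related terms. Everything here is hypothetical: the interests data frame, its three columns (borrowed from the interests named above), and the simulated values are assumptions for demonstration and are not the dataset used in the chapter.

```r
# Hypothetical counts of how often each interest appears on a teen's page
set.seed(42)
n <- 30
interests <- data.frame(
  sports   = c(rpois(n, 6),   rpois(n, 0.3), rpois(n, 0.3)),
  religion = c(rpois(n, 0.3), rpois(n, 6),   rpois(n, 0.3)),
  music    = c(rpois(n, 0.3), rpois(n, 0.3), rpois(n, 6))
)

# Standardize each column so that no single term dominates the
# distance calculation
interests_z <- as.data.frame(scale(interests))

# Look for three segments; nstart tries multiple random starts
fit <- kmeans(interests_z, centers = 3, nstart = 25)

# Number of teens assigned to each segment
fit$size

# Average standardized interest level per segment; a large positive
# value marks the interest that characterizes that segment
round(fit$centers, 2)
```

Reading the rows of fit$centers is how a segment would be given a label: a cluster whose center is well above average on sports and below average elsewhere could be described as a sports-oriented segment.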

Summary


Our findings support the popular adage that "birds of a feather flock together." By using machine learning methods to cluster teenagers with others who have similar interests, we were able to develop a typology of teenage identities that was predictive of personal characteristics such as gender and number of friends. These same methods can be applied to other contexts with similar results.

This chapter covered only the fundamentals of clustering. There are many variants of the k-means algorithm, as well as many other clustering algorithms that bring unique biases and heuristics to the task. Based on the foundation in this chapter, you will be able to understand these clustering methods and apply them to new problems.

In the next chapter, we will begin to look at methods for measuring the success of a learning algorithm that are applicable across many machine learning tasks. While our process has always devoted some effort to evaluating the success of learning, in order to obtain the...
