
You're reading from  MATLAB for Machine Learning - Second Edition

Product type: Book
Published in: Jan 2024
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781835087695
Edition: 2nd Edition
Author: Giuseppe Ciaburro

Giuseppe Ciaburro holds a PhD and two master's degrees. He works at the Built Environment Control Laboratory - Università degli Studi della Campania "Luigi Vanvitelli". He has over 25 years of work experience in programming, first in the field of combustion and then in acoustics and noise control. His core programming knowledge is in MATLAB, Python and R. As an expert in AI applications to acoustics and noise control problems, Giuseppe has wide experience in researching and teaching. He has several publications to his credit: monographs, scientific journals, and thematic conferences. He was recently included in the world's top 2% scientists list by Stanford University (2022).

Clustering Analysis and Dimensionality Reduction

Clustering techniques aim to uncover concealed patterns or groupings within a dataset. These algorithms detect groupings without relying on any predefined labels; instead, they form clusters based on the similarity between elements. Dimensionality reduction, on the other hand, transforms a dataset with numerous variables into one with fewer dimensions while preserving the relevant information. Feature selection methods attempt to identify a subset of the original variables, while feature extraction reduces data dimensionality by transforming it into new features. This chapter shows us how to divide data into clusters, or groupings of similar items. We’ll also learn how to select the features that best represent a set of data.

In this chapter, we will cover the following main topics:

  • Understanding clustering – basic concepts and methods
  • Understanding hierarchical clustering
  • Partitioning-based clustering algorithms with MATLAB
  • Grouping data using similarity measures
  • Discovering dimensionality reduction techniques
  • Feature selection and feature extraction using MATLAB

Technical requirements

In this chapter, we will introduce basic concepts relating to clustering and dimensionality reduction. To understand these topics, a basic knowledge of algebra and mathematical modeling is needed. You will also need a working knowledge of the MATLAB environment.

To work with the MATLAB code in this chapter, you need the following files (available on GitHub at https://github.com/PacktPublishing/MATLAB-for-Machine-Learning-second-edition):

  • Minerals.xls
  • PeripheralLocations.xls
  • YachtHydrodynamics.xlsx
  • SeedsDataset.xlsx

Understanding clustering – basic concepts and methods

Clustering is a fundamental concept in data analysis, aiming to identify meaningful groupings or patterns within a dataset. It partitions data points into distinct clusters based on their similarity or proximity to one another. In both clustering and classification, our goal is to discover the underlying rules that enable us to assign observations to the correct class. However, clustering differs from classification in that it must also identify a meaningful subdivision into classes. In classification, we benefit from the target variable, which provides the class information in the training set. In contrast, clustering lacks such additional information, so the classes must be deduced by analyzing the spatial distribution of the data. Dense areas in the data correspond to groups of similar observations. If we can identify observations that are similar to each other but distinct from those in...
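The notion of proximity can be made concrete with a pairwise distance matrix: small entries mark dense areas where similar observations gather. A minimal sketch using MATLAB's `pdist` and `squareform` functions (Statistics and Machine Learning Toolbox), on hypothetical data invented for illustration:

```matlab
% Hypothetical data: 5 observations with 2 features each.
X = [1.0 2.0; 1.1 2.1; 5.0 8.0; 5.2 7.9; 9.0 1.0];

D = pdist(X);           % condensed vector of pairwise Euclidean distances
Dmat = squareform(D);   % symmetric 5-by-5 distance matrix

% Small off-diagonal entries reveal two tight groups (rows 1-2 and
% rows 3-4), while row 5 lies far from both.
disp(Dmat)
```

Other metrics can be substituted via `pdist(X, 'cityblock')`, `'correlation'`, and so on, which matters later when we compare clustering methods that accept non-Euclidean distances.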

Understanding hierarchical clustering

Hierarchical clustering is a method of clustering that creates a hierarchy or tree-like structure of clusters. It iteratively merges or splits clusters based on the similarity or dissimilarity between data points. The resulting structure is often represented as a dendrogram, which visualizes the relationships and similarities among the data points.

There are two main types of hierarchical clustering:

  • Agglomerative hierarchical clustering: This starts with each data point considered an individual cluster and progressively merges similar clusters until all data points belong to a single cluster. At the beginning, each data point is treated as a separate cluster, and in each iteration, the two most similar clusters are merged into a larger one. This process continues until all data points are in one cluster. The merging process is guided by a distance or similarity measure, such as Euclidean distance or correlation.
  • Divisive hierarchical clustering: This takes the opposite, top-down approach, starting with all data points in a single cluster and recursively splitting clusters until each data point forms its own cluster...
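The agglomerative procedure described above maps directly onto MATLAB's `linkage`, `dendrogram`, and `cluster` functions (Statistics and Machine Learning Toolbox). A minimal sketch, on hypothetical data generated for illustration:

```matlab
% Agglomerative hierarchical clustering on hypothetical 2-D data.
rng(1);                               % for reproducibility
X = [randn(10,2); randn(10,2) + 4];   % two loosely separated groups

Z = linkage(X, 'ward');    % build the merge tree (Ward's linkage)
dendrogram(Z)              % visualize the hierarchy of merges

idx = cluster(Z, 'MaxClust', 2);   % cut the tree into two clusters
```

The second argument of `linkage` selects the merging criterion (`'single'`, `'complete'`, `'average'`, `'ward'`, ...), and the dendrogram's vertical axis shows the dissimilarity at which each merge occurred, which helps in choosing where to cut the tree.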

Partitioning-based clustering algorithms with MATLAB

Partitioning-based clustering is a type of clustering algorithm that aims to divide a dataset into distinct groups or partitions. In this approach, each data point is assigned to exactly one cluster, and the goal is to minimize the intra-cluster distance while maximizing the inter-cluster distance. The most popular partitioning-based clustering algorithms include k-means, k-medoids, and fuzzy c-means. These algorithms vary in their approach and objectives, but they all aim to partition the data into well-separated clusters based on some distance or similarity measure.

Introducing the k-means algorithm

One of the most well-known partitioning-based clustering algorithms is k-means. In k-means clustering, the algorithm attempts to partition the data into k clusters, where k is a predefined number specified by the user. The algorithm iteratively assigns data points to the nearest cluster centroid and recalculates the...
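In MATLAB, this assign-and-recompute loop is wrapped in the `kmeans` function, and `silhouette` gives a quick check of how well separated the resulting clusters are. A minimal sketch, on hypothetical data generated for illustration:

```matlab
% k-means with k = 3 on hypothetical 2-D data.
rng(1);                                            % for reproducibility
X = [randn(20,2); randn(20,2) + 5; randn(20,2) - 5];   % three groups

[idx, C] = kmeans(X, 3);   % idx: cluster index per row, C: 3-by-2 centroids

silhouette(X, idx)   % silhouette plot: values close to 1 indicate points
                     % well matched to their own cluster
```

Because k-means depends on the random initial centroids, options such as `kmeans(X, 3, 'Replicates', 10)` rerun the algorithm several times and keep the best solution.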

Grouping data using similarity measures

The k-medoids algorithm is a variation of the k-means algorithm that uses medoids (actual data points) as representatives of each cluster instead of centroids. Unlike the k-means algorithm, which calculates the mean of the data points within each cluster, the k-medoids algorithm selects the most centrally located data point within each cluster as the medoid. This makes k-medoids more robust to outliers and suitable for data with non-Euclidean distances.

Here are some key differences between k-medoids and k-means:

  • Representative points: In k-medoids, the representatives of each cluster are actual data points from the dataset (medoids), while in k-means, the representatives are the centroids, which are calculated as the mean of the data points.
  • Distance measure: The distance measure used in k-means is typically the Euclidean distance. On the other hand, k-medoids can handle various distance measures, including non-Euclidean distances...
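Both differences show up directly in MATLAB's `kmedoids` function, which accepts a `'Distance'` option and returns medoids that are actual rows of the data. A minimal sketch, on hypothetical data generated for illustration:

```matlab
% k-medoids with a non-Euclidean (city block) distance.
rng(1);                               % for reproducibility
X = [randn(15,2); randn(15,2) + 6];   % two hypothetical groups

[idx, M] = kmedoids(X, 2, 'Distance', 'cityblock');

% Unlike k-means centroids, each medoid is an actual observation:
% every row of M also appears as a row of X.
ismember(M, X, 'rows')
```

Swapping `'cityblock'` for `'euclidean'`, `'cosine'`, or a custom distance function changes only this one option, which is what makes k-medoids convenient for data where the mean is not meaningful.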

Discovering dimensionality reduction techniques

Dimensionality reduction is a technique used in machine learning and data analysis to reduce the number of variables or features in a dataset. The goal of dimensionality reduction is to simplify the data while retaining important information, thereby improving the efficiency and effectiveness of subsequent analysis tasks.

High-dimensional datasets can be challenging to work with due to several reasons:

  • Curse of dimensionality: As the number of features increases, the data becomes more sparse, making it difficult to find meaningful patterns or relationships
  • Computational complexity: Many algorithms and models become computationally expensive as the dimensionality of the data increases, requiring more time and resources for analysis
  • Overfitting: High-dimensional data is more susceptible to overfitting, where a model becomes too specialized to the training data and fails to generalize well to new data

Dimensionality...

Feature selection and feature extraction using MATLAB

In MATLAB, there are several built-in functions and toolboxes that can be used for dimensionality reduction. In the following sections, we will explore some practical examples of dimensionality reduction algorithms in the MATLAB environment.
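As a first taste, principal component analysis (PCA) is a standard feature extraction technique available through MATLAB's `pca` function. A minimal sketch, on hypothetical correlated data generated for illustration:

```matlab
% PCA as a feature extraction step on hypothetical data.
rng(1);                             % for reproducibility
X = randn(100, 5) * randn(5, 5);    % 100 observations, 5 correlated features

[coeff, score, latent] = pca(X);    % coeff: loadings, score: transformed
                                    % features, latent: variance per component

explained = 100 * latent / sum(latent);   % percent variance explained
X2 = score(:, 1:2);   % keep only the first two principal components
```

Inspecting `explained` shows how much information is retained: if the first two components account for most of the variance, `X2` is a faithful low-dimensional substitute for `X`.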

Stepwise regression for feature selection

Regression analysis is a valuable approach for understanding the impact of independent variables on a dependent variable. It allows us to identify predictors that hold greater influence over the model’s response. Stepwise regression is a variable selection method used to choose a subset of predictors that exhibit the strongest relationship with the dependent variable. There are three common variable selection algorithms:

  • Forward method: The forward method starts with an empty model, where no predictors are initially selected. In the first step, the variable showing the most significant association at a statistical level is added. In...
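The forward method described above is what MATLAB's `stepwiselm` function performs when started from a constant-only model. A minimal sketch, on hypothetical data where only two of four predictors actually drive the response:

```matlab
% Forward stepwise selection starting from a constant-only model.
rng(1);                                         % for reproducibility
X = randn(100, 4);                              % four candidate predictors
y = 3*X(:,1) - 2*X(:,3) + 0.1*randn(100, 1);    % only x1 and x3 matter

mdl = stepwiselm(X, y, 'constant', 'Upper', 'linear');
disp(mdl.Formula)   % the selected model typically keeps only the
                    % statistically significant predictors
```

At each step, `stepwiselm` adds (or removes) the term whose p-value crosses the entry (or exit) threshold, stopping when no remaining candidate improves the model, which is exactly the iterative selection the forward method describes.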

Summary

In this chapter, we gained knowledge about performing accurate cluster analysis in the MATLAB environment. Our exploration began by understanding the measurement of similarity, including concepts such as element proximity, similarity, and dissimilarity measures. We delved into different methods for grouping objects, namely hierarchical clustering and partitioning clustering.

Regarding partitioning clustering, we focused on the k-means method. We learned how to iteratively locate k centroids, each representing a cluster. We also examined the effectiveness of cluster separation and how to generate a silhouette plot using cluster indices obtained from k-means. The silhouette value for each data point serves as a measure of its similarity to other points within its own cluster, compared to points in other clusters. Furthermore, we delved into k-medoids clustering, which involves identifying the centers of clusters using medoids instead of centroids. We learned the procedure...
