You're reading from  MATLAB for Machine Learning - Second Edition

Product type: Book
Published in: Jan 2024
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781835087695
Edition: 2nd Edition
Author (1)
Giuseppe Ciaburro

Giuseppe Ciaburro holds a PhD and two master's degrees. He works at the Built Environment Control Laboratory - Università degli Studi della Campania "Luigi Vanvitelli". He has over 25 years of work experience in programming, first in the field of combustion and then in acoustics and noise control. His core programming knowledge is in MATLAB, Python and R. As an expert in AI applications to acoustics and noise control problems, Giuseppe has wide experience in researching and teaching. He has several publications to his credit: monographs, scientific journals, and thematic conferences. He was recently included in the world's top 2% scientists list by Stanford University (2022).

Working with Data in MATLAB

Today, the amount of data generated is enormous. Computers, smart TVs, smartphones, home appliances, credit cards, sensors, public and private transport, and automation systems all produce data effortlessly, and these are just a handful of examples. Such data is stored and used for various purposes; one notable application is data analysis with machine learning (ML) algorithms. This chapter covers the process of importing and organizing data in MATLAB. To do this, you first need to become familiar with the MATLAB workspace to streamline operations. The chapter then examines the different formats available for collected data and provides guidance on importing and exporting data to and from MATLAB. It also covers the data types suitable for managing grouping variables and categorical data. The section wraps...

Technical requirements

In this chapter, we will introduce basic ML concepts. To understand these topics, you need a basic knowledge of algebra and mathematical modeling, as well as a working knowledge of the MATLAB environment.

To work with the MATLAB code in this chapter, you need the following files (available on GitHub at https://github.com/PacktPublishing/MATLAB-for-Machine-Learning-second-edition):

  • IrisData.csv
  • matrix.mat
  • matrix.txt
  • NumMatrix.txt
  • ItalianMuseum.xlsx
  • DataItalianCities.txt
  • coliseum.jpg
  • Apollo13.wav
  • CleaningData.xlsx
  • GlassIdentificationDataSet.xlsx

Importing data into MATLAB

The exchange of data between the analysis environment and external devices plays a pivotal role in data analysis. Importing data refers to the process of bringing external data into a software or platform for further analysis, processing, or storage. In MATLAB, this workflow consists of the following steps:

  1. Prepare your data: Ensure that your data is in a compatible format such as a text file, spreadsheet (CSV or Excel), or a supported file format (MAT or HDF). Make sure the data is organized and structured properly.
  2. Navigate to the Import Tool: MATLAB provides an interactive tool called the Import Tool that simplifies the data import process. To access it, you can either click on the Import Data button on the MATLAB toolbar or type uiimport in the MATLAB command window and press Enter. Alternatively, the importdata() function loads data from a file programmatically, without opening the tool:
Figure 2.1 – Import Data procedure

  3. Select...
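
As a rough illustration of the programmatic route, the following sketch writes a small numeric matrix to a text file and loads it back with importdata(). The file name importdata_demo.txt and the matrix M are hypothetical and are not among the chapter's data files; the sketch also assumes a MATLAB release that includes writematrix() (R2019a or later).

```matlab
% Hypothetical demo file in the system temp folder (not a chapter data file)
demoFile = fullfile(tempdir, 'importdata_demo.txt');

% Small numeric matrix used only for this illustration
M = [1 2 3; 4 5 6];

% Save the matrix as comma-separated text
writematrix(M, demoFile, 'Delimiter', ',');

% Load the file back, stating the delimiter explicitly
A = importdata(demoFile, ',');

disp(A);
```

For a purely numeric file like this one, importdata() is expected to return a plain numeric matrix; files with header text typically come back as a structure instead.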

Reading ASCII-delimited files

The readmatrix() function in MATLAB allows you to read the contents of a text file into a matrix. It is a convenient way to load numerical data from a delimited or fixed-width text file. The basic syntax for using readmatrix is as follows:

NumMatrix = readmatrix('NumMatrix.txt');

The function will attempt to infer the delimiter used in the file automatically. You can also specify additional options to customize the behavior of readmatrix, such as specifying the range of rows or columns to read, handling missing data, specifying the delimiter explicitly, and more. Here’s an example:

NumMatrix = readmatrix('NumMatrix.txt', 'Range', 'A1:C3', 'Delimiter', ',');

In this example, the Range option is used to specify that only the data in the range A1 to C3 should be read, and the Delimiter option specifies that the data is comma-separated. The following results are returned:

NumMatrix...
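
To try the two calls above without the chapter's NumMatrix.txt, a self-contained sketch can create its own comma-separated file first. The file name readmatrix_demo.txt and the use of magic(4) as sample data are assumptions made for illustration only.

```matlab
% Hypothetical demo file in the system temp folder (not a chapter data file)
demoFile = fullfile(tempdir, 'readmatrix_demo.txt');

% 4x4 numeric sample matrix, saved as comma-separated text
A = magic(4);
writematrix(A, demoFile, 'Delimiter', ',');

% Read the whole file; the delimiter is inferred automatically
allData = readmatrix(demoFile);

% Read only the top-left 3x3 block, with the delimiter given explicitly
block = readmatrix(demoFile, 'Range', 'A1:C3', 'Delimiter', ',');

disp(size(allData));
disp(block);
```

Here the Range value uses spreadsheet-style corners, so 'A1:C3' selects rows 1 to 3 and columns 1 to 3 of the text file.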

Exporting data from MATLAB

Exporting data represents a crucial activity for a data scientist: it allows you to perform advanced data analysis using specialized tools and libraries commonly used in data science. By exporting data from MATLAB to formats compatible with these tools, you can leverage their extensive functionality for exploratory data analysis (EDA), statistical modeling, ML, and visualization. Exporting data from MATLAB serves several purposes and can be beneficial in various scenarios:

  • Data sharing: Exporting data allows you to share your results, findings, or processed data with others who may not have direct access to MATLAB
  • External analysis: Exporting data enables you to analyze your MATLAB data using external software tools or programming languages
  • Documentation and reporting: Exported data can be used for documentation and reporting purposes
  • Data backup: Exporting data serves as a form of data backup
  • Integration with other systems: Exporting...
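
The scenarios above can be sketched with two common export routes: a CSV file for sharing with other tools and a MAT file for backup. The table contents and the file names export_demo.csv and export_demo.mat are hypothetical and chosen only for this example.

```matlab
% Small illustrative table (made-up values, not from the chapter's datasets)
T = table([5.1; 4.9], [3.5; 3.0], ...
    'VariableNames', {'SepalLength', 'SepalWidth'});

% Export to CSV so that tools outside MATLAB can read the data
csvFile = fullfile(tempdir, 'export_demo.csv');
writetable(T, csvFile);

% Export to a MAT file as a MATLAB-native backup of the variable
matFile = fullfile(tempdir, 'export_demo.mat');
save(matFile, 'T');

% Read both files back to confirm the round trip
T2 = readtable(csvFile);
S = load(matFile);

disp(T2);
disp(S.T);
```

CSV is the more portable choice for data sharing and external analysis, while the MAT file preserves MATLAB data types exactly.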

Working with different types of data

Working with different types of data involves understanding their specific formats and applying appropriate techniques for manipulation and analysis. Here are some common types of data and general considerations for working with them.

Working with images

In MATLAB, working with images involves loading, displaying, and performing various operations on image data. Here’s a brief overview of how to work with images in MATLAB:

  • Loading an image: We can load an image into MATLAB using the imread() function. It reads the image file and returns a numeric array representing the image data. Here’s an example:
    img = imread('coliseum.jpg');
  • Displaying an image: MATLAB provides the imshow() function to display an image. It opens a separate window and shows the image. Here’s an example:
    imshow(img);
  • Manipulating an image: MATLAB offers a wide range of functions to perform various operations on images. For example...
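
A minimal sketch of this load-and-manipulate workflow follows. So that it runs without coliseum.jpg, it first writes a synthetic image to a hypothetical file named image_demo.png; the crop region and the plain channel average used as a grayscale stand-in are illustrative choices, not the chapter's own operations.

```matlab
% Build a synthetic 40x60 RGB image with a constant red channel
rgb = zeros(40, 60, 3, 'uint8');
rgb(:, :, 1) = 200;

% Save it to a hypothetical demo file and load it back with imread()
demoImg = fullfile(tempdir, 'image_demo.png');
imwrite(rgb, demoImg);
img = imread(demoImg);

% Inspect the dimensions: rows, columns, and color channels
[rows, cols, channels] = size(img);
fprintf('%d x %d pixels, %d channels\n', rows, cols, channels);

% Crop a region by indexing rows 11-30 and columns 21-50
cropped = img(11:30, 21:50, :);

% Simple grayscale approximation: unweighted mean of the three channels
gray = uint8(mean(img, 3));
```

Calling imshow(cropped) or imshow(gray) afterward would display the results in a figure window.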

Exploring data wrangling

Data wrangling, also known as data munging or data preprocessing, refers to the process of cleaning, transforming, and preparing raw data for analysis. It involves several tasks, such as handling missing or inconsistent data, removing duplicates, reshaping data formats, and merging multiple datasets. Common techniques used in data wrangling include the following:

  • Data cleaning: Identifying and handling missing values, outliers, and errors in the dataset. This may involve imputing missing values, removing outliers, or correcting errors.
  • Data transformation: Modifying the structure or format of the data to make it compatible with the desired analysis or modeling techniques. This can include tasks such as changing variable types, scaling numerical values, or encoding categorical variables.
  • Data integration: Combining multiple datasets or data sources into a single unified dataset. This may involve joining datasets based on common variables or merging...
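
The techniques listed above can be sketched on a small made-up table. All values and variable names here (ID, Value, Group, Region) are hypothetical, and filling a missing value with the column mean is just one of several possible imputation choices.

```matlab
% Made-up table with one duplicated row and one missing value
T = table([1; 2; 2; 3; 4], [10; 20; 20; NaN; 40], ...
    {'a'; 'b'; 'b'; 'a'; 'c'}, ...
    'VariableNames', {'ID', 'Value', 'Group'});

% Data cleaning: drop the duplicated row
T = unique(T);

% Data cleaning: count missing entries, then impute them with the column mean
nMissing = sum(ismissing(T.Value));
T.Value = fillmissing(T.Value, 'constant', mean(T.Value, 'omitnan'));

% Data transformation: convert the text labels to a categorical variable
T.Group = categorical(T.Group);

% Data integration: add a column from a second table that shares the ID key
info = table([1; 2; 3; 4], {'north'; 'south'; 'east'; 'west'}, ...
    'VariableNames', {'ID', 'Region'});
merged = join(T, info, 'Keys', 'ID');

disp(merged);
```

If dropping incomplete rows is preferable to imputing them, rmmissing(T) could replace the fillmissing() step.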

Discovering exploratory statistics

Exploratory statistics refers to the initial phase of data analysis where various statistical techniques are employed to understand the main characteristics of a dataset. Many techniques are available, but the most widely used is the following.

EDA

EDA is an approach to analyzing data that focuses on understanding the main characteristics, patterns, and relationships within a dataset. It involves using statistical techniques and visualizations to summarize and explore the data to gain insights and formulate hypotheses. Here are some key steps and techniques involved in EDA:

  • Data summary: Start by examining the basic summary statistics of the dataset, such as the mean, median, standard deviation, minimum, maximum, and so on. This gives an initial understanding of the central tendency, spread, and distribution of the data.
  • Data distribution: Examine the distribution of individual variables to identify any skewness or non-normality...
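
As a small sketch of the data summary step, the statistics named above can be computed with base MATLAB functions. The sample vector x is made up for illustration.

```matlab
% Made-up sample used only to illustrate the summary statistics
x = [2 4 4 4 5 5 7 9];

xMean   = mean(x);      % central tendency: arithmetic mean
xMedian = median(x);    % central tendency: middle value
xStd    = std(x);       % spread: sample standard deviation (divides by n-1)
xStdPop = std(x, 1);    % spread: population standard deviation (divides by n)
xMin    = min(x);
xMax    = max(x);

fprintf('mean = %.2f, median = %.2f, std = %.3f\n', xMean, xMedian, xStd);
fprintf('min = %d, max = %d\n', xMin, xMax);
```

A mean noticeably above the median, as in this sample, is a first hint of right skew and is worth checking with a histogram.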

Introducing exploratory visualization

Exploratory visualization is a crucial step in the data analysis process, allowing us to gain insights and understand underlying patterns, relationships, and trends within our data. It involves creating visual representations of the data to explore its various attributes and uncover potential patterns or anomalies. The primary goal of exploratory visualization is to visually inspect the data, identify any interesting features, and generate hypotheses for further investigation. By leveraging the power of visual perception, we can better understand complex datasets and make informed decisions. MATLAB provides a variety of functions and tools for exploratory data visualization. Here are some commonly used functions for exploratory visualization:

  • plot(): This function is used to create line plots, scatter plots, or any custom plot by specifying x and y coordinates.
  • histogram(): This function creates histograms to visualize the distribution...
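
A short sketch of the first two functions in this list follows. The sine curve and the random sample are synthetic data chosen for illustration, and the figure is created invisibly so that the sketch also runs without a display.

```matlab
% Synthetic data: a sine curve and 1,000 normally distributed values
rng(0);                          % fix the seed for reproducibility
x = linspace(0, 2*pi, 100);
y = sin(x);
data = randn(1000, 1);

% Create the figure without showing it; set 'Visible' to 'on' to display it
fig = figure('Visible', 'off');

% Line plot of y against x
subplot(1, 2, 1);
plot(x, y);
xlabel('x');
ylabel('sin(x)');
title('Line plot');

% Histogram of the sample using 20 bins
subplot(1, 2, 2);
h = histogram(data, 20);
xlabel('Value');
ylabel('Count');
title('Histogram');
```

The histogram handle h exposes the bin counts in h.Values, which can be useful when the distribution needs to be inspected numerically as well as visually.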

Understanding advanced data preprocessing techniques in MATLAB

Having introduced data preprocessing techniques, in this section we will apply some of them in MATLAB with practical examples. We will cover min-max scaling and z-score standardization, two common techniques used to normalize data in ML. Both rescale numerical data to a common scale, making it easier for ML algorithms to learn from the data.

Data normalization for feature scaling

Data normalization is a preprocessing step used to scale and standardize data to a common range or distribution. It aims to bring different features or variables to a comparable scale, ensuring that no single feature dominates the analysis due to its larger magnitude. Normalizing data can also help improve the performance of certain ML algorithms. There are various techniques for data normalization, including the following:

  • Min-Max scaling: This method scales the data to a...
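
The two techniques named in this section, min-max scaling and z-score standardization, can be sketched as follows. The feature vector x is made up, the formulas are the standard textbook definitions, and the comparison with normalize() assumes a MATLAB release that provides that function (R2018a or later).

```matlab
% Made-up feature values used only for illustration
x = [2; 4; 6; 8; 10];

% Min-max scaling: map the values linearly onto the range [0, 1]
xMinMax = (x - min(x)) / (max(x) - min(x));

% Z-score standardization: zero mean and unit (sample) standard deviation
xZ = (x - mean(x)) / std(x);

% The same results using the built-in normalize() function
xMinMaxBuiltin = normalize(x, 'range');
xZBuiltin = normalize(x, 'zscore');

disp([x xMinMax xZ]);
```

Min-max scaling preserves the shape of the distribution within fixed bounds but is sensitive to outliers, whereas z-scores are unbounded and express each value in units of standard deviation from the mean.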

Summary

In this chapter, we began our exploration of the MATLAB desktop and its convenient interaction features. We familiarized ourselves with the MATLAB Toolstrip, which is organized into various tabs. Subsequently, we delved into the importing capabilities of MATLAB, enabling us to read diverse types of data resources. We acquired knowledge on how to import data into MATLAB interactively and programmatically. Moreover, we comprehended the process of exporting data from the workspace and working with media files.

Next, we embarked on the challenging task of data preparation. We learned various techniques, including identifying missing values, modifying data types, replacing missing values, removing incomplete entries, organizing tables, identifying outliers, and consolidating multiple data sources. Following that, we explored exploratory statistics techniques, which enabled us to derive insightful features guiding us in selecting appropriate tools for extracting knowledge from...
