Data Labeling in Machine Learning with Python


Product type: Book
Published: Jan 2024
Publisher: Packt
ISBN-13: 9781804610541
Pages: 398
Edition: 1st Edition
Author: Vijaya Kumar Suda

Table of Contents (18 chapters)

Preface

Part 1: Labeling Tabular Data
Chapter 1: Exploring Data for Machine Learning
Chapter 2: Labeling Data for Classification
Chapter 3: Labeling Data for Regression

Part 2: Labeling Image Data
Chapter 4: Exploring Image Data
Chapter 5: Labeling Image Data Using Rules
Chapter 6: Labeling Image Data Using Data Augmentation

Part 3: Labeling Text, Audio, and Video Data
Chapter 7: Labeling Text Data
Chapter 8: Exploring Video Data
Chapter 9: Labeling Video Data
Chapter 10: Exploring Audio Data
Chapter 11: Labeling Audio Data
Chapter 12: Hands-On Exploring Data Labeling Tools

Index
Other Books You May Enjoy

Summary

In this chapter, we learned how to use pandas and Matplotlib to analyze a dataset and understand its features and the correlations between them. This understanding of the data and its patterns is needed to build rules for labeling raw data before that data is used to train ML models or fine-tune LLMs.
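As a reminder of what this kind of exploration looks like, here is a minimal sketch using pandas. The DataFrame and its column names (`age`, `hours_per_week`, `occupation`, `high_income`) are made up for illustration and are not the dataset used in the chapter.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration only.
df = pd.DataFrame({
    "age": [25, 38, 47, 52, 29, 61],
    "hours_per_week": [40, 50, 60, 45, 35, 20],
    "occupation": ["sales", "tech", "exec", "tech", "sales", "exec"],
    "high_income": [0, 1, 1, 1, 0, 0],
})

# Summary statistics for the numeric columns.
print(df.describe())

# Pairwise Pearson correlations between the numeric columns only.
corr = df.select_dtypes("number").corr()

# How strongly each numeric feature correlates with the candidate label.
print(corr["high_income"].sort_values(ascending=False))
```

Features that correlate noticeably with the target are natural candidates for the labeling rules discussed later.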

We also worked through examples of aggregating columns by categorical values using groupby and mean. We then wrapped that logic in reusable functions that take column names as arguments and return the aggregates for one or more columns.
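One way such a reusable helper might look is sketched below. The function name `mean_by_group` and the sample data are hypothetical and may differ from the functions defined in the chapter.

```python
import pandas as pd

def mean_by_group(df, group_col, value_cols):
    """Return the mean of one or more columns for each value of group_col.

    Hypothetical helper for illustration; value_cols may be a single
    column name or a list of column names.
    """
    cols = [value_cols] if isinstance(value_cols, str) else list(value_cols)
    return df.groupby(group_col)[cols].mean()

# Hypothetical toy data; the column names are assumptions for illustration only.
df = pd.DataFrame({
    "age": [25, 38, 47, 52, 29, 61],
    "hours_per_week": [40, 50, 60, 45, 35, 20],
    "occupation": ["sales", "tech", "exec", "tech", "sales", "exec"],
})

# Aggregate a single column, then several columns, by the same category.
single = mean_by_group(df, "occupation", "hours_per_week")
multi = mean_by_group(df, "occupation", ["age", "hours_per_week"])
print(single)
print(multi)
```

Accepting either a string or a list keeps the call site short for the common single-column case while still supporting several columns at once.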

Finally, we saw how the ydata-profiling library offers a fast way to explore data with a single line of Python code, so we do not need to remember many pandas functions. The resulting report gives detailed statistics for each variable, along with missing values, correlations, interactions, and duplicate rows.
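A minimal sketch of that workflow follows, assuming the `ydata-profiling` package is installed. The wrapper name `write_profile_report`, the default output path, and the report title are hypothetical; the one-line call itself is `ProfileReport(df)`.

```python
import pandas as pd

def write_profile_report(df, output_path="eda_report.html", title="EDA report"):
    """Write a ydata-profiling HTML report for df and return its path.

    Hypothetical wrapper for illustration. The import is done inside the
    function so this module still loads when the optional ydata-profiling
    dependency is not installed.
    """
    from ydata_profiling import ProfileReport

    report = ProfileReport(df, title=title)  # the one-line analysis
    report.to_file(output_path)              # save the report as HTML
    return output_path

# Example call (commented out because it needs ydata-profiling and writes a file):
# df = pd.DataFrame({"age": [25, 38, 47], "hours_per_week": [40, 50, 60]})
# write_profile_report(df)
```

Opening the generated HTML file in a browser shows the per-variable statistics, missing values, correlations, interactions, and duplicate rows mentioned above.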

Once we have a good sense of our data from exploratory data analysis (EDA), we will be able to build the rules for creating labels for the dataset.

In the next chapter, we will see how to build these rules using Python libraries such as snorkel and compose to label an unlabeled dataset. We will also explore other methods, such as pseudo-labeling and K-means clustering, for data labeling.
