Data-Centric Machine Learning with Python

Product type Book
Published in Feb 2024
Publisher Packt
ISBN-13 9781804618127
Pages 378 pages
Edition 1st Edition
Authors (3): Jonas Christensen, Nakul Bajaj, Manmohan Gosada

Table of Contents (17) Chapters

Preface
Part 1: What Data-Centric Machine Learning Is and Why We Need It
  • Chapter 1: Exploring Data-Centric Machine Learning
  • Chapter 2: From Model-Centric to Data-Centric – ML’s Evolution
Part 2: The Building Blocks of Data-Centric ML
  • Chapter 3: Principles of Data-Centric ML
  • Chapter 4: Data Labeling Is a Collaborative Process
Part 3: Technical Approaches to Better Data
  • Chapter 5: Techniques for Data Cleaning
  • Chapter 6: Techniques for Programmatic Labeling in Machine Learning
  • Chapter 7: Using Synthetic Data in Data-Centric Machine Learning
  • Chapter 8: Techniques for Identifying and Removing Bias
  • Chapter 9: Dealing with Edge Cases and Rare Events in Machine Learning
Part 4: Getting Started with Data-Centric ML
  • Chapter 10: Kick-Starting Your Journey in Data-Centric Machine Learning
Index
Other Books You May Enjoy

Data augmentation and resampling techniques

Class imbalance is common in datasets that contain rare events. It can degrade a model’s performance, because a model trained on imbalanced data tends to be biased toward the majority class. To address this, we will explore two resampling techniques:

  • Oversampling: Increasing the number of instances in the minority class by generating synthetic samples
  • Undersampling: Reducing the number of instances in the majority class to balance class distribution
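As a rough illustration of the two ideas, the sketch below balances a toy dataset using only the Python standard library. It is a simplified stand-in, not the chapter's own code: the oversampler here duplicates existing minority rows at random instead of generating synthetic ones, and the function names `random_oversample` and `random_undersample`, as well as the toy data, are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class.

    Simplest form of oversampling: rows are copied, not synthesized.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so every class matches the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
X = [[float(i), float(i) * 2.0] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print("oversampled class counts: ", dict(Counter(y_over)))
print("undersampled class counts:", dict(Counter(y_under)))
```

Oversampling keeps every original row but repeats minority rows, which can encourage overfitting to those rows. Undersampling discards majority rows, which loses information. Resampling is normally applied to the training split only, so that evaluation data keeps its original class distribution.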

Let’s discuss these resampling techniques in more detail.

Oversampling using SMOTE

Synthetic Minority Over-sampling TEchnique (SMOTE) is a widely used resampling method for addressing class imbalance in machine learning datasets, especially when dealing with rare events or minority classes. SMOTE generates synthetic samples for the minority class by interpolating between existing minority class samples. This technique aims to balance class distribution...
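The interpolation step can be sketched in a few lines of plain Python. This is a minimal, simplified illustration of the idea, not a reference implementation: the function name `smote_sketch`, its parameters, and the toy points are assumptions made for this example, and it picks base samples at random instead of following any particular library's sampling scheme.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours by Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point somewhere on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors.
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [3.0, 3.0]]
new_points = smote_sketch(minority, n_synthetic=6, k=2)

for point in new_points:
    print([round(v, 3) for v in point])
```

Because each synthetic point lies on a line segment between two real minority samples, it stays inside the region the minority class already occupies. In practice, the imbalanced-learn library offers a `SMOTE` class in `imblearn.over_sampling` with a `fit_resample(X, y)` method, which is usually preferable to a hand-rolled version.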