You're reading from Data Labeling in Machine Learning with Python

Product type: Book
Published in: Jan 2024
Publisher: Packt
ISBN-13: 9781804610541
Edition: 1st Edition
Concepts: Machine Learning
Author: Vijaya Kumar Suda

Labeling Image Data Using Data Augmentation

In this chapter, we will learn how to label image data using data augmentation for semi-supervised machine learning. We will use the CIFAR-10 dataset and the MNIST dataset of handwritten digits to generate labels using data augmentation. From there, we will build image classification machine learning models.

Data augmentation plays a crucial role in data labeling by enhancing the diversity, size, and quality of the dataset. Data augmentation techniques generate additional samples by applying label-preserving transformations, such as rotations, shifts, flips, and zooms, to existing labeled data. This effectively increases the size of the dataset, providing more examples for training and improving the model's ability to generalize.
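A minimal sketch of this idea in NumPy, assuming grayscale images stored as an array of shape (samples, height, width) with one integer label per image: each transformed copy inherits the label of the image it came from, so the labeled set grows without new annotation. The helper name `augment_with_labels` and the choice of a flip plus a fixed 2-pixel shift are illustrative assumptions, not the book's method.

```python
import numpy as np

def augment_with_labels(images, labels, shift=2):
    """Return the original images plus flipped and shifted copies (hypothetical helper).

    Each transformed copy inherits the label of the image it was made from,
    so the labeled dataset grows without any extra manual annotation.
    Assumes images has shape (samples, height, width) and shift > 0.
    """
    # Mirror each image left-to-right (axis 2 is the width axis).
    flipped = images[:, :, ::-1]
    # Shift each image `shift` pixels to the right, filling the gap with zeros.
    shifted = np.zeros_like(images)
    shifted[:, :, shift:] = images[:, :, :-shift]
    augmented_images = np.concatenate([images, flipped, shifted])
    # The labels are simply repeated: the transformations preserve the class.
    augmented_labels = np.concatenate([labels, labels, labels])
    return augmented_images, augmented_labels

# Toy data: four random 8x8 "images" with labels 0-3.
rng = np.random.default_rng(0)
x = rng.random((4, 8, 8))
y = np.array([0, 1, 2, 3])
x_aug, y_aug = augment_with_labels(x, y)
print(x_aug.shape, y_aug.shape)  # (12, 8, 8) (12,)
```

Which transformations are safe depends on the data: a horizontal flip keeps the class of most CIFAR-10 photos (a flipped cat is still a cat), but it can change or destroy the meaning of a handwritten digit, so digit pipelines usually rely on small rotations, shifts, and zooms instead.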

In this chapter, we will cover the following:

How to prepare training data with image data augmentation and implement support vector machines
How to implement convolutional neural networks with augmented image data

Technical requirements

For this chapter, we will use the CIFAR-10 dataset, a publicly available image dataset consisting of 60,000 32x32 color images in 10 classes (http://www.cs.toronto.edu/~kriz/cifar.html), along with the well-known MNIST dataset of 70,000 28x28 grayscale images of handwritten digits.

Training support vector machines with augmented image data

Support Vector Machines (SVMs) are widely used in machine learning to solve classification problems. They are known for their strong accuracy and, through kernel functions, their ability to learn non-linear decision boundaries in complex, high-dimensional data. As with most supervised models, however, one challenge in training SVMs is obtaining a large and diverse labeled dataset. In this section, we will discuss the importance of data augmentation in training SVMs for image classification problems. We will also provide Python code examples for each technique.

Figure 6.1 – SVM separates class A and class B with the largest margin

SVMs are a type of supervised learning algorithm used for classification and regression analysis. Although they were originally designed for classification tasks, SVMs can also be adapted for anomaly or outlier detection (for example, the one-class SVM).
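To illustrate these three uses, here is a sketch with scikit-learn's `SVC`, `SVR`, and `OneClassSVM` on small synthetic datasets; the data and the hyperparameter values (`C=10.0`, `nu=0.05`) are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.svm import SVC, SVR, OneClassSVM

rng = np.random.default_rng(42)

# Classification: two Gaussian blobs labeled 0 and 1.
X_cls = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y_cls = np.array([0] * 50 + [1] * 50)
clf = SVC(kernel="rbf").fit(X_cls, y_cls)

# Regression: a noisy sine curve.
X_reg = np.sort(rng.uniform(0, 5, (80, 1)), axis=0)
y_reg = np.sin(X_reg).ravel() + rng.normal(0, 0.1, 80)
reg = SVR(kernel="rbf", C=10.0).fit(X_reg, y_reg)

# Outlier detection: fit on "normal" points only; predict() returns
# +1 for inliers and -1 for outliers.
X_normal = rng.normal(0, 1, (100, 2))
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_normal)

print(clf.predict([[-2, -2], [2, 2]]))     # should be classes 0 and 1
print(reg.predict([[1.5]]))                # should be close to sin(1.5)
print(detector.predict([[0, 0], [8, 8]]))  # should be inlier (+1), outlier (-1)
```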

The objective of SVMs is to find the hyperplane that maximizes the margin between two classes of data. The hyperplane...
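For a linear SVM with decision function w·x + b, the two margin boundaries are the hyperplanes w·x + b = +1 and w·x + b = -1, which lie 2/‖w‖ apart, so maximizing the margin is equivalent to minimizing ‖w‖. The sketch below, on a hypothetical six-point toy dataset, reads w and b from a fitted scikit-learn `SVC` and computes that width; a very large `C` is used as an approximation of a hard-margin SVM.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes in 2-D (toy data for illustration).
X = np.array([[0, 0], [0, 1], [1, 0],     # class A
              [3, 3], [3, 4], [4, 3]])    # class B
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM (margin violations are
# penalized so heavily that none occur on separable data).
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane w.x + b = 0
b = clf.intercept_[0]
# Distance between the margin hyperplanes w.x + b = +1 and w.x + b = -1.
margin = 2 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("support vectors:", clf.support_vectors_)
print("margin width:", margin)  # about 3.54, i.e. 2.5 * sqrt(2), for this data
```

The support vectors are the training points that sit exactly on the margin boundaries (decision value of about ±1); moving any other point, without crossing the margin, leaves the hyperplane unchanged.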

Implementing an SVM with data augmentation in Python

In this section, we will provide a step-by-step guide to implementing an SVM with data augmentation in Python using the CIFAR-10 dataset. We will start by introducing the CIFAR-10 dataset and then load it in Python. Next, we will preprocess the data for SVM training and train an SVM with the default hyperparameters on the original dataset. Finally, we will train and evaluate the SVM on an augmented dataset and show that its performance improves.
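The following sketch outlines the preprocessing, baseline, and augmented-training steps. To stay self-contained, it uses random arrays that only mimic CIFAR-10's layout (uint8 images of shape 32x32x3 and labels of shape (n, 1), which is how `keras.datasets.cifar10.load_data()` returns them); substitute the real arrays to get meaningful numbers. The horizontal flip is an assumed, simple stand-in for the augmentation step, not necessarily the transformation the book uses.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Stand-in data with CIFAR-10's layout: uint8 images of shape 32x32x3 and
# labels of shape (n, 1). Replace with the arrays from cifar10.load_data().
x_train = rng.integers(0, 256, size=(200, 32, 32, 3), dtype=np.uint8)
y_train = rng.integers(0, 10, size=(200, 1))
x_test = rng.integers(0, 256, size=(50, 32, 32, 3), dtype=np.uint8)
y_test = rng.integers(0, 10, size=(50, 1))

def to_features(images):
    """Scale pixels to [0, 1] and flatten each image to a 3072-long vector."""
    return (images.astype(np.float32) / 255.0).reshape(len(images), -1)

# SVC expects 1-D label arrays, so drop the trailing axis.
y_train_1d, y_test_1d = y_train.ravel(), y_test.ravel()

# 1. Baseline: SVC with default hyperparameters on the original images.
baseline = SVC().fit(to_features(x_train), y_train_1d)
base_acc = accuracy_score(y_test_1d, baseline.predict(to_features(x_test)))

# 2. Augmented: add a horizontally flipped copy of every training image
#    (axis 2 is the width axis); each copy keeps its original label.
x_aug = np.concatenate([x_train, x_train[:, :, ::-1, :]])
y_aug = np.concatenate([y_train_1d, y_train_1d])
augmented = SVC().fit(to_features(x_aug), y_aug)
aug_acc = accuracy_score(y_test_1d, augmented.predict(to_features(x_test)))

print(f"baseline accuracy:  {base_acc:.3f}")
print(f"augmented accuracy: {aug_acc:.3f}")
```

On random data both accuracies hover around chance level (roughly 0.1 for 10 classes), so the comparison says nothing until real images are used. Note also that scikit-learn's `SVC` fit time grows at least quadratically with the number of samples, so training on all 50,000 CIFAR-10 images plus augmented copies is slow; working with a subset is a common compromise.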

Introducing the CIFAR-10 dataset

The CIFAR-10 dataset is a commonly used image classification dataset that consists of 60,000 32x32 color images in 10 classes. The classes are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The dataset is divided into 50,000 training images and 10,000 testing images. The dataset is preprocessed in a way that the training set and test set have an equal number of images...
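Keras and the official dataset files encode the classes as the integers 0 to 9 in the order listed above. Here is a small sketch for decoding labels and checking class balance, where `y_example` is a hypothetical stand-in for the labels returned by `cifar10.load_data()`; on the real training labels, `class_counts` should report 5,000 images per class.

```python
import numpy as np

# CIFAR-10 label indices 0-9 map to class names in this order.
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

def class_counts(labels):
    """Count images per class; labels may be shaped (n,) or (n, 1)."""
    counts = np.bincount(np.ravel(labels), minlength=len(CIFAR10_CLASSES))
    return dict(zip(CIFAR10_CLASSES, counts.tolist()))

# Hypothetical labels standing in for y_train from cifar10.load_data().
y_example = np.array([[0], [3], [3], [9], [5]])
print([CIFAR10_CLASSES[i] for i in y_example.ravel()])
# ['airplane', 'cat', 'cat', 'truck', 'dog']
print(class_counts(y_example)["cat"])  # 2
```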

Image classification using the SVM with data augmentation on the MNIST dataset

Let's see how we can apply data augmentation to image classification using an SVM on the MNIST dataset. The steps are the same as in the previous CIFAR-10 example; only the dataset changes:

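import tensorflow as tf
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from keras.datasets import mnist
from keras.preprocessing.image import ImageDataGenerator
# load MNIST dataset (28x28 grayscale digits)
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# ImageDataGenerator expects rank-4 input (samples, height, width, channels),
# so add a single grayscale channel
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)
# normalize pixel values between 0 and 1
x_train = x_train / 255.0
x_test = x_test / 255.0
# convert labels to one-hot encoded vectors
# (scikit-learn's SVC expects 1-D integer labels, so recover them with
# argmax(axis=1) before fitting the classifier)
y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)
# create image data generator for data augmentation: random rotations of up
# to 20 degrees, shifts of up to 10% of width/height, zoom in [0.8, 1.2]
datagen = ImageDataGenerator(rotation_range=20,
    width_shift_range=0.1, height_shift_range=0.1, zoom_range=0.2)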
# fit image data...
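Because the snippet above stops before training, here is a self-contained sketch of the complete flow under different assumptions: scikit-learn's bundled 8x8 `load_digits` images replace 28x28 MNIST (so nothing is downloaded), and `scipy.ndimage.rotate` and `scipy.ndimage.shift` replace `ImageDataGenerator` to produce random rotations of up to 20 degrees and shifts of up to 10%, mirroring the generator's `rotation_range` and shift settings (zoom is omitted). Rather than refitting the SVM on each generated batch, it builds one enlarged training set and fits once, since `SVC` has no incremental `partial_fit` method.

```python
import numpy as np
from scipy.ndimage import rotate, shift
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# scikit-learn's bundled 8x8 digits stand in for 28x28 MNIST (no download).
digits = load_digits()
images, labels = digits.images, digits.target  # images: (1797, 8, 8), values 0-16
images = images / 16.0                         # scale pixels to [0, 1]

x_train, x_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.3, random_state=0, stratify=labels)

def augment(batch, rng):
    """Return one randomly rotated and shifted copy of each image."""
    out = np.empty_like(batch)
    for i, img in enumerate(batch):
        angle = rng.uniform(-20, 20)             # like rotation_range=20
        dy, dx = rng.uniform(-0.1, 0.1, 2) * 8   # like 10% height/width shifts
        moved = rotate(img, angle, reshape=False, order=1, mode="constant")
        out[i] = shift(moved, (dy, dx), order=1, mode="constant")
    return out

def flatten(batch):
    """SVC needs a 2-D feature matrix: one row of 64 pixels per image."""
    return batch.reshape(len(batch), -1)

rng = np.random.default_rng(0)
# Original training images plus two augmented copies; labels are repeated.
x_aug = np.concatenate([x_train, augment(x_train, rng), augment(x_train, rng)])
y_aug = np.concatenate([y_train, y_train, y_train])

svm_orig = SVC().fit(flatten(x_train), y_train)
svm_aug = SVC().fit(flatten(x_aug), y_aug)

print("original :", svm_orig.score(flatten(x_test), y_test))
print("augmented:", svm_aug.score(flatten(x_test), y_test))
```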

Convolutional neural networks using augmented image data

Convolutional Neural Networks (CNNs) have revolutionized the field of computer vision with exceptional performance on image-related tasks such as object detection, image classification, and segmentation. However, large annotated datasets for training CNNs are often hard to obtain. One effective way to overcome this limitation is image data augmentation.

Let’s start from scratch and explain what CNNs are and how they work. Imagine you have a picture, say a photo of a cat, and you want to teach a computer how to recognize that it’s a cat. CNNs are like a special type of computer program that helps computers understand and recognize things in images, just like how you recognize objects in photos.

An image is made up of tiny dots called pixels. Each pixel has a color, and when you put them all together, you get an image. The more...
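To make this concrete: a grayscale image is just a 2-D array of pixel intensities (a color image adds a third axis for the red, green, and blue channels). A CNN's convolutional layer slides a small filter (kernel) over that array and computes a weighted sum at each position. The NumPy sketch below shows that operation with a hand-picked vertical-edge filter, which is an illustrative assumption (a real CNN learns its filter weights during training); like most deep learning libraries, it computes cross-correlation, meaning the filter is not flipped.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    weighted sum at each position, as a convolutional layer does."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 5x5 grayscale "image": dark on the left (0), bright on the right (1).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-picked vertical-edge filter (a trained CNN would learn its weights).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = conv2d_valid(image, kernel)
print(feature_map)
# Every row is [3. 3. 0.]: large responses where the 3x3 window straddles
# the dark-to-bright edge, zero where the window is uniformly bright.
```

Stacking many such filters, with non-linearities and pooling in between, lets a CNN build up from edges to textures to object parts.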

Summary

In this chapter, we covered a variety of image data augmentation techniques. We learned how to implement an SVM with data augmentation in Python using the scikit-learn and Keras libraries. We first trained an SVM with the default hyperparameters and evaluated the classifier's performance on the original dataset. We then implemented an SVM with data augmentation and trained the classifier on each batch of training data generated by the ImageDataGenerator object. Finally, we evaluated the performance of the classifier on the augmented dataset.

We also saw how to implement a CNN using augmentation with the CIFAR-10 dataset. Using data augmentation, we were able to improve the accuracy of the classifier on the augmented dataset. This demonstrates the effectiveness of data augmentation in improving the performance of machine learning models, especially in cases where the available dataset is limited.

Data augmentation can reduce the need for manual annotation by creating...


Vijaya Kumar Suda is a seasoned data and AI professional boasting over two decades of expertise collaborating with global clients. Having resided and worked in diverse locations such as Switzerland, Belgium, Mexico, Bahrain, India, Canada, and the USA, Vijaya has successfully assisted customers spanning various industries. Currently serving as a senior data and AI consultant at Microsoft, he is instrumental in guiding industry partners through their digital transformation endeavors using cutting-edge cloud technologies and AI capabilities. His proficiency encompasses architecture, data engineering, machine learning, generative AI, and cloud solutions.