You're reading from Data Labeling in Machine Learning with Python

Product type: Book
Published in: Jan 2024
Publisher: Packt
ISBN-13: 9781804610541
Edition: 1st Edition
Author (1)
Vijaya Kumar Suda

Vijaya Kumar Suda is a seasoned data and AI professional boasting over two decades of expertise collaborating with global clients. Having resided and worked in diverse locations such as Switzerland, Belgium, Mexico, Bahrain, India, Canada, and the USA, Vijaya has successfully assisted customers spanning various industries. Currently serving as a senior data and AI consultant at Microsoft, he is instrumental in guiding industry partners through their digital transformation endeavors using cutting-edge cloud technologies and AI capabilities. His proficiency encompasses architecture, data engineering, machine learning, generative AI, and cloud solutions.

Labeling Image Data Using Data Augmentation

In this chapter, we will learn how to label image data using data augmentation for semi-supervised machine learning. We will use the CIFAR-10 dataset and the MNIST dataset of handwritten digits to generate labels using data augmentation. From there, we will build an image classification model.

Data augmentation plays a crucial role in data labeling by enhancing the diversity, size, and quality of the dataset. Data augmentation techniques generate additional samples by applying transformations to existing data. This effectively increases the size of the dataset, providing more examples for training and improving the model’s ability to generalize.
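To make the idea concrete, here is a minimal NumPy sketch of how transformations can turn a small labeled set into a larger one. The function name `augment_with_labels` and the toy data are hypothetical, and the sketch assumes that the chosen transformations (a horizontal flip and a one-pixel shift) do not change the class of an image, which is not true for every dataset.

```python
import numpy as np

def augment_with_labels(images, labels):
    """Return the originals plus a flipped and a shifted copy of each image.

    Each transformed copy inherits the label of the image it came from.
    """
    flipped = images[:, :, ::-1]                 # mirror each image left to right
    shifted = np.roll(images, shift=1, axis=2)   # shift one pixel right (wraps around)
    aug_images = np.concatenate([images, flipped, shifted], axis=0)
    aug_labels = np.concatenate([labels, labels, labels], axis=0)
    return aug_images, aug_labels

# Hypothetical toy data: four 8x8 grayscale "images" in two classes
rng = np.random.default_rng(0)
images = rng.random((4, 8, 8))
labels = np.array([0, 1, 0, 1])

aug_images, aug_labels = augment_with_labels(images, labels)
print(aug_images.shape, aug_labels.shape)  # (12, 8, 8) (12,)
```

The labeled set grows from 4 to 12 examples without any additional manual annotation, because every new image reuses an existing label.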

In this chapter, we will cover the following:

  • How to prepare training data with image data augmentation and implement support vector machines
  • How to implement convolutional neural networks with augmented image data

Technical requirements

For this chapter, we will use the CIFAR-10 dataset, which is a publicly available image dataset consisting of 60,000 32x32 color images in 10 classes (http://www.cs.toronto.edu/~kriz/cifar.html), along with the famous MNIST handwritten digits dataset.

Training support vector machines with augmented image data

Support Vector Machines (SVMs) are widely used in machine learning to solve classification problems. SVMs are known for their high accuracy and ability to handle complex datasets. One of the challenges in training SVMs is the availability of large and diverse datasets. In this section, we will discuss the importance of data augmentation in training SVMs for image classification problems. We will also provide Python code examples for each technique.

Figure 6.1 – SVM separates class A and class B with the largest margin

SVMs are a type of supervised learning algorithm used for classification and regression analysis. Although they were originally designed for classification tasks, they can also be adapted for anomaly or outlier detection.

The objective of SVMs is to find the hyperplane that maximizes the margin between two classes of data. The hyperplane...
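The margin idea can be illustrated with a few lines of NumPy. This is only a sketch under stated assumptions: the toy points and the hyperplane parameters `w` and `b` are hand-picked for illustration, not fitted by an SVM solver, and the hyperplane is scaled so that the closest points of each class satisfy |w·x + b| = 1, in which case the margin width equals 2 / ||w||.

```python
import numpy as np

# Hypothetical, linearly separable toy data: two points per class
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1, 1, -1, -1])

# Hand-picked hyperplane w.x + b = 0 (an assumption, not a fitted model)
w = np.array([0.5, 0.5])
b = -1.0

scores = X @ w + b                               # signed score of each point
predictions = np.sign(scores)                    # which side of the hyperplane
distances = np.abs(scores) / np.linalg.norm(w)   # geometric distance to the hyperplane
margin = 2.0 / np.linalg.norm(w)                 # width between the two margin boundaries

print(predictions)        # predicted class signs, one per point
print(distances.min())    # distance of the closest point, half the margin
print(margin)
```

Here the closest point of each class lies exactly half a margin away from the hyperplane; those closest points play the role of the support vectors.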

Implementing an SVM with data augmentation in Python

In this section, we will provide a step-by-step guide to implementing an SVM with data augmentation in Python using the CIFAR-10 dataset. We will start by introducing the CIFAR-10 dataset and then move on to loading the dataset in Python. We will then preprocess the data for SVM training and implement an SVM with the default hyperparameters on the original dataset. Next, we will train and evaluate the SVM on an augmented dataset to show that its performance improves.
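The preprocessing step can be sketched as follows. An SVM works on one feature vector per sample, so each image has to be flattened into a 1-D array, and scaling the pixels to the range [0, 1] is a common choice. The helper name `flatten_and_scale` is hypothetical, and the batch below is random placeholder data with the CIFAR-10 image shape rather than the real dataset.

```python
import numpy as np

def flatten_and_scale(images):
    """Scale uint8 pixels to [0, 1] and flatten each image to a 1-D feature vector."""
    scaled = images.astype("float32") / 255.0
    return scaled.reshape(len(images), -1)

# Placeholder batch with the CIFAR-10 image shape (random pixels, not real images)
rng = np.random.default_rng(42)
fake_batch = rng.integers(0, 256, size=(5, 32, 32, 3), dtype=np.uint8)

features = flatten_and_scale(fake_batch)
print(features.shape)  # (5, 3072)
```

Each 32x32 color image becomes a vector of 32 * 32 * 3 = 3,072 features, which is the shape a classical classifier such as an SVM would typically be trained on.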

Introducing the CIFAR-10 dataset

The CIFAR-10 dataset is a commonly used image classification dataset that consists of 60,000 32x32 color images in 10 classes. The classes are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The dataset is divided into 50,000 training images and 10,000 testing images. The dataset is preprocessed in a way that the training set and test set have an equal number of images...

Image classification using the SVM with data augmentation on the MNIST dataset

Let us see how we can apply data augmentation for image classification using an SVM with the MNIST dataset. All the steps are similar to the previous example with the CIFAR-10 dataset, except the dataset itself:

import tensorflow as tf
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from keras.datasets import mnist
from keras.preprocessing.image import ImageDataGenerator
# load MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# normalize pixel values between 0 and 1
x_train = x_train / 255.0
x_test = x_test / 255.0
# convert labels to one-hot encoded vectors
y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)
# create image data generator for data augmentation
datagen = ImageDataGenerator(rotation_range=20,
    width_shift_range=0.1, height_shift_range=0.1, zoom_range=0.2)
# fit image data...
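One point worth keeping in mind with the listing above: scikit-learn's SVC expects a 2-D feature matrix and a 1-D array of class labels, so one-hot encoded labels and 28x28 images have to be converted before fitting. The sketch below shows one way to do that with NumPy alone; the arrays are hypothetical placeholders, and this is not necessarily how the rest of the chapter's listing handles it.

```python
import numpy as np

# Hypothetical integer labels for five digit images
int_labels = np.array([3, 0, 9, 3, 7])

# One-hot encode the labels for 10 classes
one_hot = np.eye(10)[int_labels]

# Recover 1-D integer class labels from the one-hot rows
recovered = np.argmax(one_hot, axis=1)

# Flatten placeholder 28x28 images into 784-dimensional feature vectors
fake_digits = np.zeros((5, 28, 28))
flat = fake_digits.reshape(len(fake_digits), -1)

print(one_hot.shape, recovered, flat.shape)
```

The same two conversions would apply to each augmented batch before it is passed to the classifier.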

Convolutional neural networks using augmented image data

Convolutional Neural Networks (CNNs) have revolutionized the field of computer vision by demonstrating exceptional performance in various image-related tasks such as object detection, image classification, and segmentation. However, the availability of large, annotated datasets for training CNNs is often a challenge. Fortunately, one effective approach to overcome this limitation is through the use of image data augmentation techniques.

Let’s start from scratch and explain what CNNs are and how they work. Imagine you have a picture, say a photo of a cat, and you want to teach a computer how to recognize that it’s a cat. CNNs are like a special type of computer program that helps computers understand and recognize things in images, just like how you recognize objects in photos.

An image is made up of tiny dots called pixels. Each pixel has a color, and when you put them all together, you get an image. The more...
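The core operation a CNN applies to this grid of pixels can be sketched in plain NumPy: a small filter slides over the image and, at each position, the overlapping pixels are multiplied by the filter weights and summed. The function name `convolve2d_valid`, the toy image, and the hand-written edge filter are illustrative assumptions; in a real CNN the filter weights are learned during training, and deep learning libraries implement this far more efficiently.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 grayscale image: dark (0) on the left, bright (1) on the right
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-written filter that responds to dark-to-bright vertical edges
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = convolve2d_valid(image, kernel)
print(feature_map)
# [[3. 3. 0.]
#  [3. 3. 0.]
#  [3. 3. 0.]]
```

The output is large where the window covers the dark-to-bright edge and zero where the window is uniformly bright, which is how a filter highlights a particular pattern in an image.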

Summary

In this chapter, we covered a variety of image data augmentation techniques. We learned how to implement an SVM with data augmentation in Python using the scikit-learn and Keras libraries. We first implemented an SVM with the default hyperparameters and evaluated the performance of the classifier on the original dataset. We then implemented an SVM with data augmentation and trained the classifier on each batch of training data generated by the ImageDataGenerator object. Finally, we evaluated the performance of the classifier on the augmented dataset.

We also saw how to implement a CNN using augmentation with the CIFAR-10 dataset. Training on the augmented dataset improved the accuracy of the classifier. This demonstrates the effectiveness of data augmentation in improving the performance of machine learning models, especially in cases where the available dataset is limited.

Data augmentation can reduce the need for manual annotation by creating...
