You're reading from Data Labeling in Machine Learning with Python

Product type: Book
Published in: Jan 2024
Publisher: Packt
ISBN-13: 9781804610541
Edition: 1st Edition
Author (1)
Vijaya Kumar Suda

Vijaya Kumar Suda is a seasoned data and AI professional boasting over two decades of expertise collaborating with global clients. Having resided and worked in diverse locations such as Switzerland, Belgium, Mexico, Bahrain, India, Canada, and the USA, Vijaya has successfully assisted customers spanning various industries. Currently serving as a senior data and AI consultant at Microsoft, he is instrumental in guiding industry partners through their digital transformation endeavors using cutting-edge cloud technologies and AI capabilities. His proficiency encompasses architecture, data engineering, machine learning, generative AI, and cloud solutions.

Using summary statistics to generate housing price labels

In this section, we generate house price labels using summary statistics computed from a small set of labeled housing price data. This is useful in real-world projects where there is insufficient labeled data for a regression task. In such scenarios, we generate labels by creating rules based on those summary statistics.

The approach relies on the underlying trends in the data. We first compute the mean of each feature in the labeled training dataset to summarize it. We then use a distance metric to find the closest match for each unlabeled data point, which receives its label from the labeled data.
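The idea described above can be sketched on toy data. This is a minimal illustration under stated assumptions, not necessarily the exact procedure used later in the chapter: the column names (`area`, `bedrooms`, `price`), the values, the helper `nearest_label`, the choice to compute feature means per distinct price, and the use of Euclidean distance are all hypothetical choices made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
labeled = pd.DataFrame({
    "area": [1000, 1100, 2000, 2100],
    "bedrooms": [2, 2, 4, 4],
    "price": [200000, 200000, 400000, 400000],
})
unlabeled = pd.DataFrame({
    "area": [1050, 1950],
    "bedrooms": [2, 4],
})

features = ["area", "bedrooms"]

# Summary statistic: mean of each feature for every distinct price label.
label_means = labeled.groupby("price")[features].mean()

def nearest_label(row):
    # Euclidean distance from this row to each label's mean feature vector
    # (assumed metric; other distance metrics could be substituted).
    distances = np.sqrt(((label_means - row[features]) ** 2).sum(axis=1))
    # The index of label_means holds the price labels.
    return distances.idxmin()

# Assign each unlabeled row the price whose feature means are closest.
unlabeled["predicted_price"] = unlabeled.apply(nearest_label, axis=1)
print(unlabeled)
```

Because the features here are on very different scales, `area` dominates the distance. In practice it is usually worth standardizing the features before computing distances.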

Let’s load the data from the housing.csv file using pandas:

import pandas as pd
# Load the labeled data
df_labeled = pd.read_csv('housing.csv')
# Preview the first few rows of the DataFrame
print(df_labeled.head())

Here’s the output:

Figure 3.1 – Snippet of the DataFrame
...