
You're reading from Automated Machine Learning with AutoKeras

Product type: Book
Published in: May 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800567641
Edition: 1st Edition
Author (1)
Luis Sobrecueva

Luis Sobrecueva is a senior software engineer and ML/DL practitioner currently working at Cabify. He has contributed to the OpenAI project and is one of the contributors to the AutoKeras project.

Chapter 6: Working with Structured Data Using AutoKeras

In this chapter, we will focus on using AutoKeras to work with structured data, also known as tabular data. We will learn how to explore this type of dataset and what techniques to apply to solve problems based on this data source.

Once you've completed this chapter, you will be able to explore a structured dataset, transform it, and use it as a data source for specific models, as well as create your own classification and regression models to solve tasks based on structured data.

Specifically, in this chapter, we will cover the following topics:

  • Understanding structured data
  • Working with structured data
  • Creating a structured data classifier to predict Titanic survivors
  • Creating a structured data regressor to predict Boston house prices

Technical requirements

All the coding examples in this book are available as Jupyter notebooks in this book's GitHub repository. The housing price notebook for this chapter can be opened in Google Colaboratory at https://colab.research.google.com/github/PacktPublishing/Automated-Machine-Learning-with-AutoKeras/blob/main/Chapter06/Chapter6_HousingPricePredictor.ipynb.

Since code cells can be executed, each notebook can install its own requirements. For this reason, at the beginning of each notebook, there is a code cell for environment setup, which installs AutoKeras and its dependencies.
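A setup cell of this kind could look roughly like the following sketch, which checks for missing packages and installs them with pip. The helper names `missing_packages` and `install_packages` are hypothetical and are not taken from the book's notebooks; `autokeras` is assumed to be the package name on PyPI.

```python
import importlib.util
import subprocess
import sys


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def install_packages(names):
    """Install packages with pip, using the interpreter that runs the notebook."""
    if names:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *names])


# Example call for a first notebook cell (hypothetical requirement list):
# install_packages(missing_packages(["autokeras"]))
```

Calling pip through `sys.executable` installs into the same environment the notebook kernel is using, which avoids a common mismatch between the system pip and the kernel's Python.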

To run the coding examples in this book, you only need a computer running Ubuntu Linux with Jupyter Notebook installed, which you can set up with the following command:

$ apt-get install python3-pip jupyter-notebook

Alternatively, you can also run these notebooks using Google Colaboratory. In that case, you will only need a web browser. For further details, see the AutoKeras with Google...

Understanding structured data

Structured data is essentially tabular data; that is, data organized into rows and columns, as in a database table. These tables contain two types of data, as follows:

  • Numerical data: This is data that is expressed on a numerical scale. It takes one of two forms, as follows:

    a. Continuous: Data that can take any value in an interval, such as temperature, speed, height, and so on. For example, a person's height could be any value (within the range of human heights), not just certain fixed heights.

    b. Discrete: Data that can take only countable, indivisible values, typically integers, such as counters. Examples include the population of a country, the amount of money in a bank account (counted in its smallest currency unit), and so on.

  • Categorical data: This is data that can take only a specific set of values corresponding to possible categories. It is divided into the following types:

    a. Binary: Data that can take only two values (0/1).

    b. Ordinal: Data...
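To make these distinctions concrete, here is a minimal sketch that guesses the kind of a column from its values. The function `column_kind` and its labels are hypothetical, and the rules are a rough heuristic for illustration only: it does not try to tell ordinal data apart from other categorical data, since that depends on the meaning of the values rather than on their type.

```python
def column_kind(values):
    """Guess the kind of a column from its values (a rough heuristic)."""
    if not values:
        raise ValueError("cannot classify an empty column")
    distinct = set(values)
    # Only 0/1 values: treat the column as a binary category.
    if distinct <= {0, 1}:
        return "categorical/binary"
    # Whole numbers only: counters and similar discrete quantities.
    if all(isinstance(v, int) for v in values):
        return "numerical/discrete"
    # Any mix of ints and floats: values on a continuous scale.
    if all(isinstance(v, (int, float)) for v in values):
        return "numerical/continuous"
    # Everything else (for example, strings) is a set of category labels.
    return "categorical"
```

For instance, a column of heights such as `[1.62, 1.80, 1.75]` would be reported as continuous, while a column of counts such as `[3, 0, 7]` would be reported as discrete.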

Working with structured data

AutoKeras allows us to quickly and easily create high-performance models for solving tasks based on structured data.

Depending on the format of each column, AutoKeras will preprocess it automatically before feeding it to the model. For instance, if a column contains text, AutoKeras will convert it into an embedding; if the column values are fixed categories, it will convert them into one-hot encoded arrays; and so on.
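As an illustration of what one-hot encoding means, the following sketch encodes a categorical column by hand. It shows the general idea only and is not AutoKeras's internal implementation; the function name `one_hot_encode` and the sample values are made up for this example.

```python
def one_hot_encode(values):
    """Map each category to a one-hot vector, ordering categories alphabetically."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)  # one slot per known category
        row[index[value]] = 1        # mark the slot of this value's category
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot_encode(["First", "Third", "Second", "Third"])
print(categories)  # ['First', 'Second', 'Third']
print(encoded)     # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
```

Each row contains exactly one 1, so the model sees the categories as unrelated options rather than as numbers with an implied order.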

In the following sections, we will see how easy it is to work with tabular datasets.

Creating a structured data classifier to predict Titanic survivors

This model will predict whether a Titanic passenger will survive the sinking of the ship based on characteristics that have been extracted from the Titanic Kaggle dataset. Although luck was an important factor in survival, some groups of people were more likely to survive than others.

This dataset is split into a training set and a test set. Both include passenger information such as name, age, sex, socioeconomic class, and so on.

The train dataset (train.csv) contains details about a subset of the passengers on board (891, to be exact) and indicates whether each of them survived in the survived column.

The test dataset (test.csv) will be used in the final evaluation and contains similar information for the other 418 passengers.

AutoKeras will find patterns in the train data to predict whether these other 418 passengers on board (found in test.csv) survived.
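The workflow described above can be sketched as follows. This is a minimal outline, not the book's notebook: the function names, the default `max_trials` and `epochs` values, and the file paths you pass in are assumptions, and the AutoKeras part assumes a release that provides `StructuredDataClassifier`. The first helper uses only the standard library to check the claim that some groups were more likely to survive than others.

```python
import csv
from collections import defaultdict


def survival_rate_by(csv_path, group_column, label_column="survived"):
    """Return {group value: fraction of rows whose label is 1} for a CSV file."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row[group_column]] += 1
            survivors[row[group_column]] += int(row[label_column])
    return {group: survivors[group] / totals[group] for group in totals}


def train_titanic_classifier(train_csv, label_column="survived",
                             max_trials=3, epochs=10):
    """Search for a classifier on a CSV file and return the fitted AutoKeras task."""
    # Imported here so the helper above works without AutoKeras installed.
    import autokeras as ak

    clf = ak.StructuredDataClassifier(max_trials=max_trials, overwrite=True)
    # When x is a CSV path, y is the name of the label column in that file.
    clf.fit(train_csv, label_column, epochs=epochs)
    return clf


# Example usage (paths are placeholders for your local copies of the data):
# print(survival_rate_by("train.csv", "sex"))
# clf = train_titanic_classifier("train.csv")
# predictions = clf.predict("test.csv")
```

The column names are parameters because they must match the header of your CSV file exactly; adjust `label_column` and `group_column` if your copy of the dataset uses different names or capitalization.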

The full source code notebook...

Creating a structured data regressor to predict Boston house prices

In the following example, we will try to predict the median home price in a Boston suburb in the mid-1970s, given data features about the suburb at that time, such as the crime rate, the local property tax rate, and so on.

We will create a model that predicts the house price for a specific suburb based on its features. For this, we will train the model with the boston_housing dataset, which we have added to our repository (https://github.com/PacktPublishing/Automated-Machine-Learning-with-AutoKeras/blob/main/boston.csv). The dataset we will use is relatively small: 506 samples, divided into 404 training samples and 102 test samples. Note that the dataset isn't normalized, which means that each feature in the input data has its values on a different scale. For example, some columns have values in the 0 to 1 range, while others are between 1 and 12, 0 and 100, and so on. So...
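The text is cut off at this point, so the following is only a sketch of one common way to handle features on different scales, not necessarily the step the book takes next. It assumes an 80/20 split (which reproduces the 404/102 sizes for 506 rows) and standardizes each column using statistics computed from the training rows only. All function names are hypothetical, and the final helper assumes an AutoKeras release that provides `StructuredDataRegressor`.

```python
from statistics import mean, pstdev


def split_rows(rows, train_fraction=0.8):
    """Split rows into (train, test), keeping the original order."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_scaler(train_rows):
    """Compute a (mean, standard deviation) pair per column from training rows."""
    columns = list(zip(*train_rows))
    # A constant column has zero deviation; use 1.0 to avoid dividing by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]


def standardize(rows, scaler):
    """Rescale every column with the given per-column statistics."""
    return [[(v - m) / s for v, (m, s) in zip(row, scaler)] for row in rows]


def train_price_regressor(x_train, y_train, max_trials=3, epochs=10):
    """Search for a regressor on in-memory data and return the fitted task."""
    # Imported here so the helpers above work without AutoKeras installed.
    import autokeras as ak
    import numpy as np

    reg = ak.StructuredDataRegressor(max_trials=max_trials, overwrite=True)
    reg.fit(np.array(x_train), np.array(y_train), epochs=epochs)
    return reg
```

Computing the scaler from the training rows and reusing it for the test rows keeps information about the test set from leaking into preprocessing. If the rows are sorted in any meaningful way, shuffle them before calling `split_rows`.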

Summary

In this chapter, we learned what structured data is and its different categories, how to feed our AutoKeras models with different structured data formats (pandas, CSV files, and so on), and how to load and explore tabular datasets using some pandas functions.

Finally, we applied these concepts by creating a powerful structured data classifier model to predict Titanic survivors and a powerful structured data regressor model to predict Boston house prices.

With that, you have learned the basics of how to tackle any problem based on structured data using AutoKeras. With these techniques, any CSV file can be a dataset that you can train your model with.

In the next chapter, we will learn how to perform sentiment analysis on texts using AutoKeras.
