You're reading from Automated Machine Learning with AutoKeras

Product type: Book

Published in: May 2021

Reading level: Beginner

Publisher: Packt

ISBN-13: 9781800567641

Edition: 1st Edition

Languages

Python

Tools

Keras

Concepts

Deep Learning

Author (1)

Luis Sobrecueva

Chapter 6: Working with Structured Data Using AutoKeras

In this chapter, we will focus on using AutoKeras to work with structured data, also known as tabular data. We will learn how to explore this type of dataset and what techniques to apply to solve problems based on this data source.

Once you've completed this chapter, you will be able to explore a structured dataset, transform it, and use it as a data source for specific models, as well as create your own classification and regression models to solve tasks based on structured data.

Specifically, in this chapter, we will cover the following topics:

Understanding structured data
Working with structured data
Creating a structured data classifier to predict Titanic survivors
Creating a structured data regressor to predict Boston house prices

Technical requirements

All the coding examples in this book are available as Jupyter notebooks that can be downloaded from this book's GitHub repository: https://colab.research.google.com/github/PacktPublishing/Automated-Machine-Learning-with-AutoKeras/blob/main/Chapter06/Chapter6_HousingPricePredictor.ipynb.

Because notebook code cells are executable, each notebook can install its own requirements. For this reason, every notebook begins with an environment setup cell that installs AutoKeras and its dependencies.

So, to run the coding examples in this book, you only need a computer running Ubuntu Linux and to install Jupyter Notebook with the following command:

$ apt-get install python3-pip jupyter-notebook

Alternatively, you can also run these notebooks using Google Colaboratory. In that case, you will only need a web browser. For further details, see the AutoKeras with Google...

Understanding structured data

Structured data is essentially tabular data; that is, data organized into the rows and columns of a table, as in a database. These tables contain two types of structured data, as follows:

Numerical data: This is data that is expressed on a numerical scale. Furthermore, it is represented in two ways, as follows:
a. Continuous: Data that can take any value in an interval, such as temperature, speed, height, and so on. For example, a person's height could be any value (within the range of human heights), not just certain fixed heights.
b. Discrete: Data that can take only non-divisible integer values, such as counters. Examples include the amount of money in a bank account, the population of a country, and so on.
Categorical data: This is data that can take only a specific set of values corresponding to possible categories. In turn, they are divided into the following categories:
a. Binary: Data that can only accept two values (0/1)
b. Ordinal: Data...
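
The distinctions above can be made concrete with a small sketch. The following plain-Python function guesses the kind of a column from its values alone. The function name guess_column_kind, the sample columns, and the rule of treating any two-valued column as binary are illustrative assumptions, not AutoKeras behavior.

```python
def guess_column_kind(values):
    """Guess a coarse data kind for one column of a table."""
    distinct = set(values)
    # bool is a subclass of int in Python, so exclude it from the numeric check.
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if len(distinct) == 2:
        return "binary"
    if is_number:
        # Whole numbers only: treat the column as discrete (for example, counters).
        if all(float(v).is_integer() for v in values):
            return "discrete"
        return "continuous"
    return "categorical"


# Hypothetical columns of a small passenger table.
table = {
    "height_m": [1.62, 1.80, 1.75, 1.68],
    "siblings": [0, 2, 1, 0],
    "survived": [0, 1, 1, 0],
    "deck": ["A", "C", "B", "C"],
}

kinds = {name: guess_column_kind(col) for name, col in table.items()}
print(kinds)
```

A heuristic like this cannot tell whether a categorical column is ordinal, because the ordering of categories is domain knowledge that is not visible in the raw values.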

Working with structured data

AutoKeras allows us to quickly and easily create high-performance models for solving tasks based on structured data.

Depending on the format of each column, AutoKeras will preprocess it automatically before feeding it to the model. For instance, if a column contains text, AutoKeras will convert it into an embedding; if the column values are fixed categories, it will convert them into one-hot encoded arrays; and so on.
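
To make the one-hot case concrete, here is a minimal plain-Python sketch of what one-hot encoding produces for a column of fixed categories. It illustrates the idea only and is not the code AutoKeras runs internally; the function name one_hot_encode and the sample values are made up for this example.

```python
def one_hot_encode(values):
    """Map each category to a vector with a single 1 at that category's index."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded


# Hypothetical socioeconomic class column.
categories, encoded = one_hot_encode(["third", "first", "third", "second"])
print(categories)  # ['first', 'second', 'third']
print(encoded)     # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

Each row contains exactly one 1, so the model receives no artificial ordering between the categories.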

In the following sections, we will see how easy it is to work with tabular datasets.

Creating a structured data classifier to predict Titanic survivors

This model will predict whether a Titanic passenger will survive the sinking of the ship based on characteristics that have been extracted from the Titanic Kaggle dataset. Although luck was an important factor in survival, some groups of people were more likely to survive than others.

The dataset is split into a training set and a test set. Both have the same structure and include passenger information such as name, age, sex, socioeconomic class, and so on.

The train dataset (train.csv) contains details about a subset of the passengers on board (891, to be exact) and records whether each of them survived in the survived column.

The test dataset (test.csv) will be used in the final evaluation and contains similar information for the other 418 passengers.

AutoKeras will find patterns in the train data to predict whether these other 418 passengers on board (found in test.csv) survived.
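
The workflow described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the chapter's notebook code: it assumes an AutoKeras 1.x release that provides StructuredDataClassifier, and the helper names read_header and train_and_predict as well as the max_trials and epochs values are hypothetical choices for illustration.

```python
import csv


def read_header(csv_path):
    """Return the column names from the first row of a CSV file."""
    with open(csv_path, newline="") as f:
        return next(csv.reader(f))


def train_and_predict(train_csv, test_csv, label="survived",
                      max_trials=3, epochs=10):
    """Search for a classifier on train_csv and predict labels for test_csv."""
    # Fail early if the label column is missing from the training file.
    if label not in read_header(train_csv):
        raise ValueError(f"label column {label!r} not found in {train_csv}")

    # Imported lazily so read_header works without AutoKeras installed.
    # Assumption: an AutoKeras 1.x release with StructuredDataClassifier.
    import autokeras as ak

    clf = ak.StructuredDataClassifier(max_trials=max_trials, overwrite=True)
    # For structured data tasks, fit accepts a CSV path and the label column name.
    clf.fit(train_csv, label, epochs=epochs)
    # The test file must contain the same feature columns as the training file.
    return clf.predict(test_csv)
```

A call such as train_and_predict("train.csv", "test.csv") would then return one predicted label per row of the test file. Higher max_trials values let AutoKeras explore more candidate models at the cost of longer search time.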

The full source code notebook...

Creating a structured data regressor to predict Boston house prices

In the following example, we will try to predict the median home price in a Boston suburb in the mid-1970s, given features that describe the suburb at that time, such as the crime rate, the local property tax rate, and so on.

We will create a model that estimates the house price of a specific suburb based on its features. For this, we will train the model with the boston_housing dataset, which we must add to our repository (https://github.com/PacktPublishing/Automated-Machine-Learning-with-AutoKeras/blob/main/boston.csv). The dataset is relatively small: 506 samples, split into 404 training samples and 102 test samples. Note that the dataset isn't normalized, which means that each feature in the input data uses a different scale. For example, some columns have values in the 0 to 1 range, while others range from 1 to 12, 0 to 100, and so on. So...
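
One common way to handle features on different scales is feature-wise standardization: subtract each column's mean and divide by its standard deviation. The following plain-Python sketch shows the idea. Whether the chapter's notebook applies exactly this step is not visible in this excerpt, and the function names and sample rows are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(rows):
    """Compute per-column mean and standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Rescale every value to (value - mean) / std for its column."""
    return [
        [(v - m) / s for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical rows with three features on very different scales.
train_rows = [[0.1, 4.0, 20.0], [0.5, 8.0, 60.0], [0.9, 12.0, 100.0]]
test_rows = [[0.5, 6.0, 80.0]]

means, stds = fit_scaler(train_rows)
train_scaled = transform(train_rows, means, stds)
# The test rows reuse the statistics computed on the training rows.
test_scaled = transform(test_rows, means, stds)
print(train_scaled)
print(test_scaled)
```

The statistics are computed on the training rows only and then reused for the test rows, so no information from the test set leaks into preprocessing.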

Summary

In this chapter, we learned what structured data is and its different categories, how to feed our AutoKeras models with different structured data formats (pandas, CSV files, and so on), and how to load and explore tabular datasets using some pandas functions.

Finally, we applied these concepts by creating a powerful structured data classifier model to predict Titanic survivors and a powerful structured data regressor model to predict Boston house prices.

With that, you have learned the basics of how to tackle any problem based on structured data using AutoKeras. With these techniques, any CSV file can be a dataset that you can train your model with.

In the next chapter, we will learn how to perform sentiment analysis on texts using AutoKeras.

Author (1)

Luis Sobrecueva

Luis Sobrecueva is a senior software engineer and ML/DL practitioner currently working at Cabify. He has been a contributor to the OpenAI project as well as one of the contributors to the AutoKeras project.