You're reading from Automated Machine Learning

Product type: Book
Published: Feb 2021
Publisher: Packt
ISBN-13: 9781800567689
Pages: 312
Edition: 1st Edition
Languages: Python
Concepts: Machine Learning
Author: Adnan Masood


Chapter 2: Automated Machine Learning, Algorithms, and Techniques

"Machine intelligence is the last invention that humanity will ever need to make."

– Nick Bostrom

"The key to artificial intelligence has always been the representation."

– Jeff Hawkins

"By far, the greatest danger of artificial intelligence is that people conclude too early that they understand it."

– Eliezer Yudkowsky

Automating the automation sounds like one of those wonderful Zen meta ideas, but learning to learn is not without its challenges. In the last chapter, we covered the Machine Learning (ML) development life cycle and defined automated ML, with a brief overview of how it works.

In this chapter, we will explore under-the-hood technologies, techniques, and tools used to make automated ML possible. Here, you will see how AutoML actually works, the algorithms and techniques of automated feature engineering, automated model and hyperparameter...

Automated ML – Opening the hood

To oversimplify, a typical ML pipeline comprises data cleaning, feature selection, pre-processing, model development, deployment, and consumption steps, as seen in the following workflow:

Figure 2.1 – The ML life cycle
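The model-building stages of this workflow can be roughly mapped onto a scikit-learn `Pipeline`. Deployment and consumption are out of scope here. This is a minimal sketch that assumes scikit-learn and its bundled breast cancer dataset. The specific steps (median imputation, `SelectKBest` with `k=10`, standardization, and logistic regression) are illustrative choices, not the book's own pipeline.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Each named step stands in for one stage of the life cycle, in the order listed above.
pipeline = Pipeline([
    ("cleaning", SimpleImputer(strategy="median")),       # fill missing values (a no-op here: this dataset has none)
    ("feature_selection", SelectKBest(f_classif, k=10)),  # keep the 10 highest-scoring features
    ("preprocessing", StandardScaler()),                  # scale the selected features
    ("model", LogisticRegression(max_iter=1000)),         # model development
])

pipeline.fit(X_train, y_train)
test_accuracy = pipeline.score(X_test, y_test)
print(f"Held-out accuracy: {test_accuracy:.3f}")
```

Bundling the stages this way means every step is fit only on training data. It also lets an AutoML tool treat the whole chain as one object to search over.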

The goal of automated ML is to simplify and democratize the steps of this pipeline so that it is accessible to citizen data scientists. Originally, the key focus of the automated ML community was model selection and hyperparameter tuning, that is, finding the best-performing model for the job and the corresponding parameters that work best for the problem. In recent years, however, the focus has shifted to include the entire pipeline, as shown in the following diagram:

Figure 2.2 – A simplified AutoML pipeline by Waring et al.
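The combined algorithm selection and hyperparameter optimization (CASH) formulation is one way to picture searching over model choice and parameters together. In CASH, the choice of model becomes one more hyperparameter. This sketch assumes scikit-learn and expresses the idea with `GridSearchCV`. The `model` step of a pipeline is allowed to take several different estimators, each with its own small grid. The grids are illustrative assumptions; real AutoML systems search far larger spaces with smarter strategies than an exhaustive grid.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),  # placeholder; the search swaps this step
])

# Each dict is one branch of the search space: a model choice plus its own hyperparameters.
search_space = [
    {"model": [LogisticRegression(max_iter=1000)], "model__C": [0.1, 1.0, 10.0]},
    {"model": [SVC()], "model__C": [0.1, 1.0, 10.0], "model__kernel": ["linear", "rbf"]},
    {"model": [RandomForestClassifier(random_state=0)], "model__n_estimators": [50, 200]},
]

search = GridSearchCV(pipeline, search_space, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best model:", search.best_params_["model"])
print(f"Best CV accuracy: {search.best_score_:.3f}")
```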

The notion of meta-learning, that is, learning to learn, is an overarching theme in the automated ML landscape. Meta-learning techniques are used...
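Warm-starting is one common form of meta-learning. Each dataset is described by cheap meta-features, and the previously seen dataset that looks most similar is identified. The search then starts from the configuration that worked best there. Systems such as auto-sklearn use a more elaborate version of this idea. The sketch below is a simplified illustration assuming scikit-learn. The three meta-features, the random forest grid, and the choice of "past" and "new" datasets are all hypothetical choices made for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer, load_digits, load_iris, load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Hypothetical search space shared by all tasks in this sketch.
PARAM_GRID = {"n_estimators": [50, 100], "max_depth": [None, 5]}


def meta_features(X, y):
    """Describe a dataset with simple, log-scaled size meta-features."""
    n_samples, n_features = X.shape
    n_classes = len(np.unique(y))
    return np.log([n_samples, n_features, n_classes])


# 1. Build a small meta-knowledge base from "past" tasks: meta-features -> best config found.
knowledge_base = []
for name, loader in [("iris", load_iris), ("wine", load_wine),
                     ("breast_cancer", load_breast_cancer)]:
    X, y = loader(return_X_y=True)
    search = GridSearchCV(RandomForestClassifier(random_state=0), PARAM_GRID, cv=3)
    search.fit(X, y)
    knowledge_base.append((name, meta_features(X, y), search.best_params_))

# 2. For a new task, find the most similar past task in meta-feature space.
X_new, y_new = load_digits(return_X_y=True)
new_meta = meta_features(X_new, y_new)
distances = [np.linalg.norm(new_meta - meta) for _, meta, _ in knowledge_base]
nearest_name, _, warm_start_config = knowledge_base[int(np.argmin(distances))]

# 3. Start the new task from that past task's best configuration.
model = RandomForestClassifier(random_state=0, **warm_start_config)
scores = cross_val_score(model, X_new, y_new, cv=3)
print(f"Most similar past task: {nearest_name}")
print(f"Warm-start config: {warm_start_config}")
print(f"CV accuracy on new task: {scores.mean():.3f}")
```

In a real system, the warm-start configuration would seed a further search rather than be the final answer.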

Automated feature engineering

Feature engineering is the art and science of extracting and selecting the right attributes from the dataset. It is an art because it requires not only subject matter expertise but also domain knowledge and an understanding of ethical and social concerns. From a scientific perspective, the importance of a feature is highly correlated with its resulting impact on the outcome. Feature importance in predictive modeling measures how much a feature influences the target, which makes it easier, in retrospect, to rank attributes by their impact. The following diagram shows the iterative process of automated feature generation, in which candidate features are generated, ranked, and then selected for the final feature set:

Figure 2.5 – Iterative feature generation process by Zöller et al., Benchmark and survey of automated ML frameworks, 2020
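The generate, rank, and select loop in Figure 2.5 can be sketched in a few lines. This is a minimal sketch assuming scikit-learn, and several choices in it are illustrative rather than taken from any particular framework. Candidates are generated as pairwise products of the current features. Ranking uses mutual information via `mutual_info_classif`. The number of `rounds` and `top_k` are arbitrary settings.

```python
import itertools

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif


def generate_candidates(X, names):
    """Create candidate features as pairwise products of the current features."""
    existing = set(names)
    new_cols, new_names = [], []
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        # Sort the pair so a*b and b*a get the same canonical name.
        name = "(" + "*".join(sorted((names[i], names[j]))) + ")"
        if name in existing:  # skip products already in the feature set
            continue
        new_cols.append(X[:, i] * X[:, j])
        new_names.append(name)
    return np.column_stack(new_cols), new_names


def iterative_feature_generation(X, y, names, rounds=2, top_k=8):
    """Repeat: generate candidates, rank them alongside current features, keep the top_k."""
    current_X, current_names = X, list(names)
    for _ in range(rounds):
        cand_X, cand_names = generate_candidates(current_X, current_names)
        pool_X = np.hstack([current_X, cand_X])
        pool_names = current_names + cand_names
        # Rank every feature in the pool by mutual information with the target.
        scores = mutual_info_classif(pool_X, y, random_state=0)
        keep = np.argsort(scores)[::-1][:top_k]
        current_X = pool_X[:, keep]
        current_names = [pool_names[k] for k in keep]
    return current_X, current_names


data = load_breast_cancer()
X = data.data[:, :6]  # first six raw features, to keep the example small
names = list(data.feature_names[:6])
X_final, final_names = iterative_feature_generation(X, data.target, names)
print(final_names)
```

Because raw features stay in the pool each round, the loop can keep an original attribute if no generated candidate beats it.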

Extracting a feature from the...

Hyperparameter optimization

Due to its ubiquity and ease of framing, hyperparameter optimization is sometimes regarded as synonymous with automated ML. Hyperparameter optimization is also dubbed hyperparameter tuning or hyperparameter learning; depending on the search space, if you also include features, it is known as automated pipeline learning. All these terms can be a bit daunting for something as simple as finding the right parameters for a model, but graduating students must publish, and I digress.

There are a couple of key points about hyperparameters that are important to note as we look further into these constructs. It is well established that default parameters are not optimized for any particular problem. Olson et al., in their NIH paper, demonstrated that relying on default parameters is almost always a bad idea. Olson mentions that "Tuning often improves an algorithm's accuracy by 3–5%, depending on the algorithm…. In some cases, parameter tuning led to CV accuracy improvements...
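The effect Olson et al. describe can be explored on a small scale by comparing a model's default hyperparameters with a cross-validated grid search. The sketch below assumes scikit-learn. The dataset, the `SVC` estimator, and the grid values are illustrative choices. The size of any improvement depends on the problem, so no particular gain should be read into this exact setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Baseline: SVC with scikit-learn's default hyperparameters (C=1.0, gamma="scale").
default_score = cross_val_score(SVC(), X, y, cv=5).mean()

# Tuned: grid search over C and gamma, scored on the same 5 stratified folds.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": ["scale", 1e-3, 1e-4]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(f"Default CV accuracy: {default_score:.3f}")
print(f"Tuned CV accuracy:   {search.best_score_:.3f} with {search.best_params_}")
```

Note that `best_score_` is measured on the same folds that were used to pick the configuration, so it is somewhat optimistic. Nested cross-validation gives a fairer estimate of the tuned model.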

Neural architecture search

Selecting models can be challenging. In the case of regression, that is, predicting a numerical value, you can choose from linear regression, decision trees, random forest, lasso and ridge regression, elastic net, gradient boosting methods (including XGBoost), and SVMs, among many others.

For classification, in other words, separating things into classes, you have logistic regression, random forest, AdaBoost, gradient boosting, and SVM-based classifiers at your disposal.
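The simplest, non-automated version of this choice is to score each candidate classifier on identical cross-validation folds and keep the best one. This is a minimal sketch assuming scikit-learn. It uses the classifier families listed above with default settings. Wrapping each one in a scaling pipeline and using the wine dataset are illustrative choices.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "svm": SVC(),
}

# Reuse the same folds for every candidate so the comparison is fair.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, estimator in candidates.items():
    model = make_pipeline(StandardScaler(), estimator)
    results[name] = cross_val_score(model, X, y, cv=cv).mean()

best_name = max(results, key=results.get)
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>20}: {score:.3f}")
print("Selected:", best_name)
```

AutoML tools automate and extend this loop by tuning each candidate's hyperparameters as well, rather than comparing default settings only.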

Neural architecture search (NAS) starts with the notion of a search space, which defines which architectures can be represented in principle. Next, a search strategy outlines how to explore that space while balancing the exploration-exploitation trade-off. Finally, a performance estimation strategy estimates each candidate's performance, which includes training and validating the architecture.
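A deliberately tiny example can make the three components concrete, under strong simplifying assumptions. The search space is the number and width of hidden layers in a scikit-learn `MLPClassifier`. The search strategy is plain random search, which is pure exploration with no exploitation. Performance estimation is accuracy on a held-out validation split. Practical NAS methods use much richer spaces and strategies such as reinforcement learning, evolutionary search, or gradient-based relaxation.

```python
import random

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# 1. Search space: which architectures can be represented in principle.
SEARCH_SPACE = {"n_layers": [1, 2, 3], "width": [16, 32, 64, 128]}


def sample_architecture(rng):
    """2. Search strategy (random search): draw one architecture from the space."""
    n_layers = rng.choice(SEARCH_SPACE["n_layers"])
    return tuple(rng.choice(SEARCH_SPACE["width"]) for _ in range(n_layers))


def estimate_performance(hidden_layers, X_train, y_train, X_val, y_val):
    """3. Performance estimation: train (early stopping caps the cost), then score on validation data."""
    model = MLPClassifier(hidden_layer_sizes=hidden_layers, max_iter=200,
                          early_stopping=True, random_state=0)
    model.fit(X_train, y_train)
    return model.score(X_val, y_val)


X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_val = scaler.transform(X_train), scaler.transform(X_val)

rng = random.Random(0)
history = []
for _ in range(5):
    arch = sample_architecture(rng)
    history.append((arch, estimate_performance(arch, X_train, y_train, X_val, y_val)))

best_arch, best_score = max(history, key=lambda item: item[1])
print(f"Best architecture: {best_arch}, validation accuracy: {best_score:.3f}")
```

Swapping `sample_architecture` for a strategy that proposes new candidates based on `history` would move the search toward exploitation.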

There are several techniques for performing the exploration of search...

Summary

Today, the success of ML within an enterprise largely depends on human ML experts who can construct business-specific features and workflows. Automated ML aims to change this by providing off-the-shelf ML methods that can be used without expert knowledge. To understand how automated ML works, we need to review its four underlying subfields, or pillars: hyperparameter optimization, automated feature engineering, neural architecture search, and meta-learning.

In this chapter, we explained what is under the hood in terms of the technologies, techniques, and tools used to make automated ML possible. We hope that this chapter has introduced you to automated ML techniques and that you are now ready to do a deeper dive into the implementation phase.

In the next chapter, we will review the open source tools and libraries that implement these algorithms to get a hands-on overview of how to use these concepts in practice, so...