You're reading from Automated Machine Learning

Product type: Book
Published in: Feb 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800567689
Edition: 1st Edition
Author: Adnan Masood

Adnan Masood, PhD is an artificial intelligence and machine learning researcher, visiting scholar at Stanford AI Lab, software engineer, Microsoft MVP (Most Valuable Professional), and Microsoft's regional director for artificial intelligence. As chief architect of AI and machine learning at UST Global, he collaborates with Stanford AI Lab and MIT CSAIL, and leads a team of data scientists and engineers building artificial intelligence solutions to produce business value and insights that affect a range of businesses, products, and initiatives.

Chapter 3: Automated Machine Learning with Open Source Tools and Libraries

"Empowerment of individuals is a key part of what makes open source work since, in the end, innovations tend to come from small groups, not from large, structured efforts."

– Tim O'Reilly

"In open source, we feel strongly that to really do something well, you have to get a lot of people involved."

– Linus Torvalds

In the previous chapter, you looked under the hood of automated machine learning (ML) technologies, techniques, and tools. You learned how AutoML actually works – that is, the algorithms and techniques of automated feature engineering, automated model and hyperparameter tuning, and automated deep learning. You also explored Bayesian optimization, reinforcement learning, evolutionary algorithms, and various gradient-based approaches by looking at their use in automated ML.

However, as a hands-on engineer, you probably don't get the...

Technical requirements

The technical requirements for this chapter are as follows:

The open source ecosystem for AutoML

A review of the history of automated ML shows that, in the early days, the focus was on hyperparameter optimization. Earlier tools, such as AutoWeka and HyperoptSkLearn, and later TPOT, originally focused on finding the most suitable hyperparameters for a model, typically with Bayesian optimization techniques. Over time, this focus shifted left to include model selection and eventually grew to cover the entire pipeline, including feature selection, preprocessing, construction, and data cleaning. The following table shows some of the prominent automated ML tools that are available, including TPOT, AutoKeras, auto-sklearn, and Featuretools, along with their optimization techniques, ML tasks, and training frameworks:

Figure 3.1 – Features of automated ML frameworks

For several of the examples in this chapter, we will be using the MNIST database of handwritten digits. We will be using the scikit-learn...

Introducing TPOT

The Tree-based Pipeline Optimization Tool, or TPOT for short, is a product of the University of Pennsylvania's Computational Genetics Lab. TPOT is an automated ML tool written in Python that builds and optimizes ML pipelines with genetic programming. Built on top of scikit-learn, TPOT helps automate the feature selection, preprocessing, construction, model selection, and parameter optimization processes by "exploring thousands of possible pipelines to find the best one". It is one of the few toolkits with a short learning curve.

The toolkit is available on GitHub to be downloaded: github.com/EpistasisLab/tpot.
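To make the idea of evolving pipelines more concrete, here is a toy sketch in plain Python. It is not TPOT's implementation: it uses a simplified, mutation-only evolutionary loop rather than full genetic programming, and the search space, the stage names, and the toy_fitness scoring function are all invented for illustration. In a real run, the fitness would be something like cross-validated accuracy.

```python
import random

# Hypothetical search space: one choice per pipeline stage.
SEARCH_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "selector": ["none", "variance", "k_best"],
    "model": ["tree", "forest", "logreg"],
    "depth": [2, 4, 8, 16],
}


def toy_fitness(pipeline):
    """Stand-in for a cross-validated score; the numbers are made up."""
    score = 0.5
    score += {"none": 0.0, "standard": 0.1, "minmax": 0.05}[pipeline["scaler"]]
    score += {"none": 0.0, "variance": 0.05, "k_best": 0.1}[pipeline["selector"]]
    score += {"tree": 0.05, "forest": 0.2, "logreg": 0.1}[pipeline["model"]]
    score -= abs(pipeline["depth"] - 8) * 0.005
    return score


def random_pipeline(rng):
    """Draw one random choice for every stage."""
    return {stage: rng.choice(options) for stage, options in SEARCH_SPACE.items()}


def mutate(pipeline, rng):
    """Copy a pipeline and re-draw exactly one stage."""
    child = dict(pipeline)
    stage = rng.choice(list(SEARCH_SPACE))
    child[stage] = rng.choice(SEARCH_SPACE[stage])
    return child


def evolve(generations=10, population_size=20, seed=0):
    """Keep the better half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=toy_fitness, reverse=True)
        survivors = population[: population_size // 2]
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=toy_fitness)


best = evolve()
print(best, round(toy_fitness(best), 3))
```

Because the better half of the population always survives, the best score never gets worse from one generation to the next. The generations and population_size arguments here only control this toy loop.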

To explain the framework, let's start with a minimal working example. For this example, we will be using the MNIST database of handwritten digits:

  1. Create a new Colab notebook and run pip install TPOT. TPOT can be directly used from the command line or via Python code:

    Figure 3.3 – Installing TPOT on a Colab notebook...

Introducing Featuretools

Featuretools is an excellent Python framework that helps with automated feature engineering by using Deep Feature Synthesis (DFS). Feature engineering is a tough problem due to its very nuanced nature. However, this open source toolkit, with its robust timestamp handling and reusable feature primitives, provides a framework for building and extracting combinations of features and assessing their impact.
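As a rough illustration of what a feature primitive does, the following plain-Python sketch applies a few aggregation primitives across a parent-child relationship, which is the core idea behind Deep Feature Synthesis at depth 1. It does not use the Featuretools API. The customers and transactions tables, the synthesize helper, and the feature naming are assumptions made up for this example.

```python
from statistics import mean

# Hypothetical parent table (one row per customer) and child table.
customers = [{"customer_id": 1}, {"customer_id": 2}]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]

# Reusable aggregation primitives: each maps a list of values to one number.
AGG_PRIMITIVES = {"SUM": sum, "MEAN": mean, "MAX": max, "COUNT": len}


def synthesize(parents, children, key, child_name, column):
    """Build one feature per primitive for every parent row."""
    grouped = {}
    for row in children:
        grouped.setdefault(row[key], []).append(row[column])

    feature_matrix = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        features = dict(parent)
        for name, func in AGG_PRIMITIVES.items():
            # None when a parent has no child rows to aggregate.
            features[f"{name}({child_name}.{column})"] = func(values) if values else None
        feature_matrix.append(features)
    return feature_matrix


feature_matrix = synthesize(
    customers, transactions, "customer_id", "transactions", "amount"
)
for row in feature_matrix:
    print(row)
```

Stacking such primitives over several related tables, and over the outputs of earlier primitives, is what lets the number of candidate features grow quickly without hand-written code for each one.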

The toolkit is available on GitHub to be downloaded: https://github.com/FeatureLabs/featuretools/. The following steps will guide you through how to install Featuretools, as well as how to run an automated ML experiment using the library. Let's get started:

  1. To start Featuretools in Colab, you will need to use pip to install the package. In this example, we will try to create features for the Boston Housing Prices dataset:

    Figure 3.19 – AutoML with Featuretools – installing Featuretools

    In this experiment, we will be using the Boston Housing Prices dataset...

Introducing Microsoft NNI

Microsoft Neural Network Intelligence (NNI) is an open source platform that addresses the three key areas of any automated ML life cycle: automated feature engineering, architecture search (also referred to as neural architecture search, or NAS), and hyperparameter tuning (also called hyperparameter optimization, or HPO). The toolkit also offers model compression features and operationalization. NNI comes with many hyperparameter tuning algorithms already built in.

A high-level architecture diagram of NNI is as follows:

Figure 3.26 – Microsoft NNI high-level architecture

NNI has several state-of-the-art hyperparameter optimization algorithms built in, and they are called tuners. The list includes TPE, Random Search, Anneal, Naive Evolution, SMAC, Metis Tuner, Batch Tuner, Grid Search, GP Tuner, Network Morphism, Hyperband, BOHB, PPO Tuner, and PBT Tuner.
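To show what a tuner does in principle, here is a minimal plain-Python sketch of the simplest one in the list, random search. It does not call NNI. The search-space dictionary loosely mirrors the _type/_value style used in NNI search-space files, but treat that detail as an assumption. The toy_trial objective is hypothetical and stands in for a real training run that reports a validation metric.

```python
import math
import random

# Assumed search-space layout, loosely modeled on NNI-style JSON.
SEARCH_SPACE = {
    "batch_size": {"_type": "choice", "_value": [16, 32, 64, 128]},
    "lr": {"_type": "loguniform", "_value": [1e-4, 1e-1]},
}


def sample(space, rng):
    """Draw one hyperparameter configuration from the search space."""
    params = {}
    for name, spec in space.items():
        if spec["_type"] == "choice":
            params[name] = rng.choice(spec["_value"])
        elif spec["_type"] == "loguniform":
            low, high = spec["_value"]
            params[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            raise ValueError(f"unsupported type: {spec['_type']}")
    return params


def toy_trial(params):
    """Hypothetical objective; a real trial would train and evaluate a model."""
    lr_penalty = abs(math.log10(params["lr"]) + 2)  # best near lr = 1e-2
    batch_penalty = abs(params["batch_size"] - 64) / 128
    return 1.0 - 0.1 * lr_penalty - 0.1 * batch_penalty


def random_search(space, trials=50, seed=0):
    """Run independent trials and keep the best configuration seen."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        params = sample(space, rng)
        score = toy_trial(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(SEARCH_SPACE)
print(best_params, round(best_score, 3))
```

The other tuners differ mainly in how they pick the next configuration: rather than sampling independently, they use the scores of earlier trials to decide where to look next.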

The toolkit is available on GitHub to be downloaded: https://github.com/microsoft/nni. More information...

Introducing auto-sklearn

scikit-learn (also known as sklearn) is a very popular ML library for Python development – so popular that it has its own memes:

Figure 3.40 – An ML meme

As part of this ecosystem and based on Efficient and Robust Automated Machine Learning by Feurer et al., auto-sklearn is an automated ML toolkit that performs algorithm selection and hyperparameter tuning using Bayesian optimization, meta-learning, and ensemble construction.

The toolkit is available on GitHub to be downloaded: github.com/automl/auto-sklearn.

auto-sklearn touts its ease of use for performing automated ML since it's a four-line automated ML solution:

Figure 3.41 – AutoML with auto-sklearn – getting started

If the preceding syntax looks familiar, then it's because this is how scikit-learn does predictions and therefore makes auto-sklearn one of the easiest libraries to use. auto-sklearn uses...
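The scikit-learn-style pattern of constructing an estimator, calling fit, and then calling predict can be sketched without any library. The toy below is not auto-sklearn and does not use its API. ToyAutoClassifier, the two candidate models, and the data are hypothetical, and a simple holdout comparison stands in for the Bayesian optimization, meta-learning, and ensemble construction that the real toolkit performs.

```python
class MajorityClass:
    """Baseline candidate: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestNeighbor:
    """Candidate: predicts the label of the closest training row."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        def closest(row):
            dists = [sum((a - b) ** 2 for a, b in zip(row, seen)) for seen in self.X_]
            return self.y_[dists.index(min(dists))]

        return [closest(row) for row in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class ToyAutoClassifier:
    """Hypothetical wrapper: picks the best candidate on a holdout split."""

    def __init__(self, candidates=(MajorityClass, NearestNeighbor)):
        self.candidates = candidates

    def fit(self, X, y):
        split = max(1, int(len(X) * 0.75))
        X_fit, y_fit, X_val, y_val = X[:split], y[:split], X[split:], y[split:]
        scored = []
        for cls in self.candidates:
            model = cls().fit(X_fit, y_fit)
            scored.append((accuracy(y_val, model.predict(X_val)), cls))
        self.best_score_, best_cls = max(scored, key=lambda item: item[0])
        self.model_ = best_cls().fit(X, y)  # refit the winner on all data
        return self

    def predict(self, X):
        return self.model_.predict(X)


X_train = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1],
           [0.1, 0.2], [1.1, 1.0], [0.0, 0.0], [1.0, 1.0]]
y_train = [0, 0, 1, 1, 0, 1, 0, 1]
X_test = [[0.05, 0.05], [0.95, 1.05]]

# The familiar pattern: construct, fit, predict.
automl = ToyAutoClassifier()
automl.fit(X_train, y_train)
predictions = automl.predict(X_test)
print(predictions)
```

The point of the sketch is the interface: the caller never names a specific algorithm, and the search happens inside fit.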

AutoKeras

Keras is one of the most widely used deep learning frameworks and is an integral part of the TensorFlow 2.0 ecosystem. AutoKeras is based on the paper by Jin et al. (https://arxiv.org/abs/1806.10282), which proposed "a novel method for efficient neural architecture search with network morphism, enabling Bayesian optimization". The premise of AutoKeras is that, because existing neural architecture search algorithms such as NASNet and PNAS are computationally expensive, using Bayesian optimization to guide network morphism is an efficient way to explore the search space.

The toolkit is available on GitHub to be downloaded: github.com/jhfjhfj1/autokeras.

The following steps will guide you through how to install AutoKeras and how to run an automated ML experiment using the library. Let's get started:

  1. To get started with AutoKeras, run the following install commands in Colab or in a Jupyter Notebook. Doing this will install...

AutoGluon – the AutoML toolkit for deep learning

From AWS Labs, with the goal of democratization of ML in mind, AutoGluon is described as being developed to enable "easy-to-use and easy-to-extend AutoML with a focus on deep learning and real-world applications spanning image, text, or tabular data". AutoGluon, an integral part of AWS's automated ML strategy, enables both junior and seasoned data scientists to build deep learning models and end-to-end solutions with ease. Like other automated ML toolkits, AutoGluon offers network architecture search, model selection, and the ability for you to improve custom models.

The toolkit is available on GitHub to be downloaded: https://github.com/awslabs/autogluon.

Summary

In this chapter, you reviewed some major open source tools that are used for AutoML, including TPOT, AutoKeras, auto-sklearn, Featuretools, and Microsoft NNI. These tools have been provided to help you understand the concepts we discussed in Chapter 2, Automated Machine Learning, Algorithms, and Techniques, and the underlying approaches that are used in each of these libraries.

In the next chapter, we will do an in-depth review of commercial automated ML offerings, starting with the Microsoft Azure platform.

Further reading

For more information on the topics that were covered in this chapter, please refer to the resources and links:
