
You're reading from Practical Machine Learning on Databricks

Product type: Book
Published in: Nov 2023
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781801812030
Edition: 1st Edition
Author: Debu Sinha

Debu is an experienced Data Science and Engineering leader with deep expertise in Software Engineering and Solutions Architecture. With over 10 years in the industry, Debu has a proven track record in designing scalable Software Applications, Big Data, and Machine Learning systems. As Lead ML Specialist on the Specialist Solutions Architect team at Databricks, Debu focuses on AI/ML use cases in the cloud and serves as an expert on LLMs, Machine Learning, and MLOps. With prior experience as a startup co-founder, Debu has demonstrated skills in team-building, scaling, and delivering impactful software solutions. An established thought leader, Debu has received multiple awards and regularly speaks at industry events.


Create a Baseline Model Using Databricks AutoML

In the last chapter, we explored MLflow and its components. After running the notebook from Chapter 4, Understanding MLflow Components on Databricks, you may have noticed how easy it is to start tracking your ML model training in Databricks using the integrated MLflow tracking server. In this chapter, we will cover another unique feature of Databricks: AutoML.

Databricks AutoML, like all the other features that are part of the Databricks workspace, is fully integrated with MLflow features and the Feature Store.

At the time of writing, Databricks AutoML supports classification, regression, and forecasting use cases using traditional ML algorithms, not deep learning. You can find a list of supported algorithms in the second section of this chapter.

You can use AutoML with a table registered in Databricks’ Hive metastore, feature tables, or even upload a new file using the...

Technical requirements

To work through this chapter, you'll need the following:

  • The notebooks accompanying Chapter 3, which ingest raw data from a CSV file into a Delta table and register a new feature table, must already have been executed.

Understanding the need for AutoML

If you have never worked with any AutoML framework before, you might be wondering what AutoML is and when and how it can be useful.

AutoML simplifies the machine learning model development process by automating various tasks. It automatically generates baseline models tailored to your dataset and provides preconfigured notebooks to kickstart your projects. This saves valuable time in the initial stages of model development: instead of manually crafting models from scratch, AutoML offers a quick and efficient way to obtain baselines, making it valuable to beginners and experienced data scientists alike.
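The kind of baseline sweep AutoML automates can be sketched in plain scikit-learn: fit a few candidate models on a training split, score them on a validation split, and keep the best. This is our own minimal illustration (the synthetic dataset and the candidate list are assumptions for the sketch, not what Databricks AutoML actually uses internally):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real training table
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

# Candidate models a baseline sweep might evaluate
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

# Score each candidate on the held-out split and keep the best
scores = {name: model.fit(X_train, y_train).score(X_val, y_val)
          for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

AutoML does the same search at scale, with broader search spaces, automatic preprocessing, and every result logged for later comparison.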

AutoML makes machine learning accessible not only to experienced practitioners but also to citizen data scientists and business subject matter experts. While undoubtedly a powerful tool, AutoML also grapples with significant limitations. One notable challenge is its...

Understanding AutoML in Databricks

Databricks AutoML uses a glass-box approach to AutoML. When you use Databricks AutoML, either through the UI or through the supported Python API, it logs every combination of model and hyperparameters (a trial) as an MLflow run and generates a Python notebook with the source code for each trial. The results of all these trials are logged to the MLflow tracking server, and each trial can be compared and reproduced. Since you have access to the source code, data scientists can easily rerun a trial after modifying the code. We will look at this in more detail when we go over the example.

Databricks AutoML also prepares the dataset for training and then performs model training and hyperparameter tuning on the Databricks cluster. One important thing to keep in mind here is that Databricks AutoML spreads hyperparameter tuning trials across the cluster. A trial is a unique configuration of hyperparameters associated with the...
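To make the notion of a trial concrete, here is a small sketch of how a set of trials can be enumerated from a hyperparameter search space. The parameter names and values are hypothetical, chosen only for illustration; Databricks AutoML defines its own search space per algorithm:

```python
from itertools import product

# Hypothetical search space -- Databricks AutoML builds its own per algorithm
search_space = {
    "max_depth": [3, 5, 10],
    "learning_rate": [0.01, 0.1],
    "n_estimators": [100, 200],
}

# Each trial is one unique combination of hyperparameter values;
# AutoML distributes these trials across the worker nodes of the cluster.
keys = list(search_space)
trials = [dict(zip(keys, values)) for values in product(*search_space.values())]

print(len(trials))  # 3 * 2 * 2 = 12 distinct trials
```

Because each trial is independent of the others, the cluster can evaluate many of them in parallel, which is what makes distributing the tuning work straightforward.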

Running AutoML on our churn prediction dataset

Let’s take a look at how to use Databricks AutoML with our bank customer churn prediction dataset.

If you executed the notebooks from Chapter 3, Utilizing the Feature Store, you will have the raw data available as a Delta table named raw_data in your Hive metastore. In the Chapter 3 code, we read a CSV file containing the raw data from our Git repository, wrote it out as a Delta table, and registered it in the integrated metastore; take a look at cmd 15 in your notebook. In your environment, the dataset may come from another data pipeline or be uploaded directly to the Databricks workspace using the Upload file functionality.
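Besides the UI flow we walk through next, an AutoML experiment can also be launched from the Python API. The sketch below is illustrative rather than taken from the chapter's notebooks: the table name follows the chapter (raw_data), while the target column, metric, and timeout are assumptions you should adjust to your own schema. It only runs inside a Databricks notebook, where `spark` is predefined:

```python
from databricks import automl

# Load the raw churn table registered in Chapter 3
df = spark.table("raw_data")  # `spark` is predefined in Databricks notebooks

# Launch an AutoML classification experiment; every trial is logged to MLflow
summary = automl.classify(
    dataset=df,
    target_col="Exited",      # assumed churn label column -- match your schema
    primary_metric="f1",
    timeout_minutes=30,
)

# Inspect the best trial produced by the experiment
print(summary.best_trial.model_path)
```

The returned summary links each trial to its MLflow run and generated notebook, which is the glass-box behavior described earlier.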

To view the tables, you need to have your cluster up and running.

Figure 5.1 – The location of the raw dataset


Let’s create our first Databricks AutoML experiment.

Important note

Make sure that before following the next steps, you have a cluster up and running...

Summary

In this chapter, we covered the importance of AutoML and how it can help data scientists get started and become productive with the problem at hand. We then covered the Databricks AutoML glass-box approach, which makes it easy to interpret model results and automatically capture lineage. We also learned how Databricks AutoML integrates with the MLflow tracking server within the Databricks workspace.

In the next chapters, we will go over managing your ML model’s life cycle using the MLflow model registry and Webhooks in more detail.

Further reading

