You're reading from  Practical Machine Learning on Databricks

Product type: Book
Published in: Nov 2023
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781801812030
Edition: 1st Edition
Author (1)
Debu Sinha

Debu is an experienced Data Science and Engineering leader with deep expertise in Software Engineering and Solutions Architecture. With over 10 years in the industry, Debu has a proven track record in designing scalable Software Applications, Big Data, and Machine Learning systems. As Lead ML Specialist on the Specialist Solutions Architect team at Databricks, Debu focuses on AI/ML use cases in the cloud and serves as an expert on LLMs, Machine Learning, and MLOps. With prior experience as a startup co-founder, Debu has demonstrated skills in team-building, scaling, and delivering impactful software solutions. An established thought leader, Debu has received multiple awards and regularly speaks at industry events.

Utilizing the Feature Store

In the last chapter, we briefly touched upon what a feature store is and what sets Databricks Feature Store apart.

This chapter takes a more hands-on approach: we will use Databricks Feature Store to register our first feature table and discuss the concepts related to it.

We will be covering the following topics:

  • Diving into feature stores and the problems they solve
  • Discovering feature stores on the Databricks platform
  • Registering your first feature table in Databricks Feature Store

Technical requirements

All the code is available in the GitHub repository at https://github.com/PacktPublishing/Practical-Machine-Learning-on-Databricks and is self-contained. To execute the notebooks, you can import the code repository directly into your Databricks workspace using Repos, which we discussed in the second chapter.

A working knowledge of the Delta format is required. If you are new to the Delta format, check out https://docs.databricks.com/en/delta/index.html and https://docs.databricks.com/en/delta/tutorial.html before going forward.

Diving into feature stores and the problems they solve

As more teams in the organization start to use AI and ML to solve various business use cases, it becomes necessary to have a centralized, reusable, and easily discoverable feature repository. This repository is called a feature store.

All curated features live in centralized, governed, access-controlled storage, such as a curated data lake. Different data science teams can be granted access to feature tables based on their needs. Just as we can track data lineage in an enterprise data lake, we can track the lineage of a feature table logged in Databricks Feature Store. We can also see all the downstream models that consume features from a registered feature table.
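To make the idea concrete, here is a toy, in-memory sketch of the bookkeeping a feature store performs: which feature tables exist, which teams may read them, where they came from, and which models consume them. Every name in it (`FeatureRegistry`, `register_table`, `grant`, `read`, `lineage`, and the table and team names) is hypothetical and chosen only for illustration; it is not the Databricks Feature Store API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FeatureTable:
    """Metadata a feature store might keep for one feature table (illustrative)."""
    name: str
    primary_keys: List[str]
    source_tables: List[str]                             # upstream lineage
    allowed_teams: Set[str] = field(default_factory=set)
    consumers: Set[str] = field(default_factory=set)     # downstream models


class FeatureRegistry:
    """Hypothetical in-memory registry; not the Databricks API."""

    def __init__(self) -> None:
        self._tables: Dict[str, FeatureTable] = {}

    def register_table(self, table: FeatureTable) -> None:
        if table.name in self._tables:
            raise ValueError(f"{table.name} is already registered")
        self._tables[table.name] = table

    def grant(self, table_name: str, team: str) -> None:
        self._tables[table_name].allowed_teams.add(team)

    def read(self, table_name: str, team: str, model: str) -> FeatureTable:
        table = self._tables[table_name]
        if team not in table.allowed_teams:
            raise PermissionError(f"{team} cannot read {table_name}")
        table.consumers.add(model)  # record the downstream consumer
        return table

    def lineage(self, table_name: str) -> Dict[str, List[str]]:
        table = self._tables[table_name]
        return {
            "upstream": list(table.source_tables),
            "downstream": sorted(table.consumers),
        }


registry = FeatureRegistry()
registry.register_table(
    FeatureTable(
        name="customer_features",
        primary_keys=["customer_id"],
        source_tables=["raw.transactions"],
    )
)
registry.grant("customer_features", "churn_team")
registry.read("customer_features", team="churn_team", model="churn_model_v1")
print(registry.lineage("customer_features"))
```

A team that has not been granted access gets a `PermissionError` from `read`, and every successful read leaves a trace in the table's downstream lineage.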

Large organizations can have hundreds of data science teams tackling different business questions, each with its own domain knowledge and expertise. Feature engineering often requires heavy processing. Without a feature store...

Discovering feature stores on the Databricks platform

Each Databricks workspace has its own feature store. At the time of writing this book, Databricks Feature Store only supports the Python API. The latest Python API reference is located at https://docs.databricks.com/applications/machine-learning/feature-store/python-api.html.

Databricks Feature Store is fully integrated with Managed MLflow and other Databricks components. As a result, models deployed through MLflow can automatically retrieve their features at both training and inference time. The following sections cover the exact steps for defining a feature table and using it for model training and inference.
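As a rough sketch of that integration, the snippet below uses the `databricks.feature_store` Python client to build a training set from a feature table, log a model together with its feature lookups, and later score a batch that carries only the lookup key. The table, column, and model names are hypothetical, the estimator is assumed to be an unfitted scikit-learn model, and the code assumes a Databricks ML runtime where the packages are preinstalled. The API reference linked above remains the authority on exact signatures.

```python
def train_and_log(labels_df, model,
                  table_name="default.customer_features",
                  lookup_key="customer_id",
                  label="churned",
                  registered_model_name="churn_model"):
    """Train `model` on features looked up from a feature table, then log it.

    Assumptions: `labels_df` is a Spark DataFrame holding `lookup_key` and
    `label`; `model` is an unfitted scikit-learn estimator; every name is a
    placeholder.
    """
    # Imported lazily: these packages ship with the Databricks ML runtime.
    import mlflow.sklearn
    from databricks.feature_store import FeatureLookup, FeatureStoreClient

    fs = FeatureStoreClient()

    # Declare which table to join and on which key. With no feature_names
    # given, all feature columns except the primary keys are looked up.
    lookups = [FeatureLookup(table_name=table_name, lookup_key=lookup_key)]

    training_set = fs.create_training_set(
        df=labels_df,
        feature_lookups=lookups,
        label=label,
        exclude_columns=[lookup_key],
    )
    train_pdf = training_set.load_df().toPandas()
    model.fit(train_pdf.drop(columns=[label]), train_pdf[label])

    # Logging through the client packages the lookup metadata with the model.
    fs.log_model(
        model,
        artifact_path="model",
        flavor=mlflow.sklearn,
        training_set=training_set,
        registered_model_name=registered_model_name,
    )
    return model


def score(batch_df, model_uri="models:/churn_model/1"):
    """Score a Spark DataFrame that carries only the lookup key column(s)."""
    from databricks.feature_store import FeatureStoreClient

    return FeatureStoreClient().score_batch(model_uri, batch_df)
```

Because the model is logged with its training set, the caller of `score` does not need to know which feature table or columns the model was trained on.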

Let’s look at some of the key concepts and terminology associated with Databricks Feature Store.

Feature table

As the name suggests, a feature store stores the features that data scientists generate through feature engineering for a particular problem.

These features...
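As a minimal sketch, registering a feature table through the Python client might look as follows. It assumes a Databricks ML runtime, a Spark DataFrame of engineered features, an existing target database, and hypothetical database, table, and column names. It is not the code from the book's repository.

```python
def register_feature_table(features_df,
                           name="feature_store_demo.customer_features",
                           primary_keys=("customer_id",),
                           description="Illustrative customer-level features"):
    """Create a feature table from a Spark DataFrame and return its metadata.

    All default values are placeholders chosen for this example.
    """
    # Imported lazily: the package is preinstalled on the Databricks ML runtime.
    from databricks.feature_store import FeatureStoreClient

    fs = FeatureStoreClient()

    # create_table registers the table's metadata and writes the DataFrame
    # to the backing Delta table under the given name.
    return fs.create_table(
        name=name,
        primary_keys=list(primary_keys),
        df=features_df,
        description=description,
    )
```

Later batches of features can be added to the same table with the client's `write_table` method, for example `fs.write_table(name=name, df=new_features_df, mode="merge")`.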

Registering your first feature table in Databricks Feature Store

Before we get started, download the code from the Git repository accompanying this book (https://github.com/debu-sinha/Practical_Data_Science_on_Databricks.git).

We will use the Databricks Repos feature to clone the GitHub repo.

To clone the code repository, complete the following steps:

  1. Click on the Repos tab and select your username:
Figure 3.1 – A screenshot displaying the Repos tab

Important note

In light of a recent user interface update, the 'Repos' section has been moved and can now be accessed by clicking on the 'Workspaces' icon, as illustrated in the following image.

Despite this change, the workflow outlined in this chapter remains applicable.

  2. Right-click and add the repo:
Figure 3.2 – A screenshot displaying how to clone the code for this chapter (step 2)

...
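The same clone can also be scripted rather than clicked through. The sketch below is an alternative to the UI steps, not the book's procedure: it builds a request for the Databricks Repos REST API (`POST /api/2.0/repos`) using only the Python standard library. The workspace host, access token, and target path are placeholders that you would replace with your own values.

```python
import json
import urllib.request


def build_clone_request(host, token, repo_url, path):
    """Build (but do not send) a request that clones `repo_url` into `path`."""
    payload = {"url": repo_url, "provider": "gitHub", "path": path}
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/repos",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_clone_request(
    host="https://example.cloud.databricks.com",   # placeholder workspace URL
    token="<personal-access-token>",               # placeholder credential
    repo_url="https://github.com/debu-sinha/Practical_Data_Science_on_Databricks.git",
    path="/Repos/<your-username>/Practical_Data_Science_on_Databricks",
)

# To perform the clone, send the request from a trusted environment:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the payload before anything touches the workspace.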

Summary

In this chapter, we developed a deeper understanding of feature stores and the problems they solve, and took a detailed look at the feature store implementation within the Databricks environment. We also went through an exercise to register our first feature table, which we will use to create our first ML model in the MLflow chapter.

Next, we will cover MLflow in detail.
