
Chapter 9: DevOps Integrations and Implementing CI/CD for Azure Databricks

DevOps is at the core of most projects and organizations these days. It enables organizations to build their applications and solutions and deploy them quickly to various environments by providing a framework for seamless deployment. In this chapter, you will learn how Azure DevOps is used for Continuous Integration and Continuous Deployment of Azure Databricks notebooks. Knowing how Azure DevOps works will help you to plan, develop, deliver, and operate your end-to-end business applications.

In this chapter, we're going to cover the following main topics:

  • How to integrate Azure DevOps with an Azure Databricks notebook
  • Using GitHub for Azure Databricks notebook version control
  • Understanding the CI/CD process for Azure Databricks
  • How to set up an Azure DevOps pipeline for deploying notebooks
  • Deploying notebooks to multiple environments
  • Enabling CI/CD in an Azure DevOps build and release pipeline
  • Deploying an Azure Databricks service using an Azure DevOps release pipeline

Technical requirements

To follow along with the examples shown in the recipes, you will need to have the following:

  • An Azure subscription and the required permissions on the subscription, as mentioned in the Technical requirements section of Chapter 1, Creating Azure Databricks Service.
  • An Azure Databricks premium workspace, which we will use throughout this chapter. There is no need to spin up a cluster in the workspace, as we are not running any notebooks.
  • An Azure DevOps repository. Ensure that the Azure DevOps Services organization is linked to the same Azure AD tenant as Databricks. If you don't already have a repository, you can follow the steps at the following link to create one: https://docs.microsoft.com/en-in/azure/devops/repos/git/create-new-repo?view=azure-devops.

Once you have the repository created, you can get started with this chapter.
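
If you prefer to script this prerequisite rather than use the Azure DevOps portal, the repository can also be created through the Azure DevOps Git REST API. The following Python sketch is only an illustration: the organization, project, and repository names are placeholders, and the personal access token (PAT) needs the Code (Read & write) scope.

    import requests

    # Placeholder values -- substitute your own organization, project, and PAT.
    ORG = "myorg"
    PROJECT = "databricks-cicd"
    PAT = "<azure-devops-pat>"

    # Azure DevOps REST APIs accept a PAT via Basic auth with a blank username.
    url = (f"https://dev.azure.com/{ORG}/{PROJECT}"
           f"/_apis/git/repositories?api-version=6.0")
    resp = requests.post(url, auth=("", PAT),
                         json={"name": "databricks-notebooks"})
    resp.raise_for_status()
    print("Repository URL:", resp.json()["remoteUrl"])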

How to integrate Azure DevOps with an Azure Databricks notebook

Nowadays, DevOps is an integral part of any project and, among the other services and features Azure DevOps provides, it is heavily used for deploying resources and artifacts to various environments. Integrating Azure DevOps with Azure Databricks lets teams and organizations put their notebooks under source control for collaboration, and it enables the enterprise practice of Continuous Integration and Continuous Deployment for Azure Databricks resources and notebooks. In this recipe, you will learn how to integrate Azure DevOps with Azure Databricks.

Getting ready

Before starting with this recipe, you need to ensure that you have the resources created as mentioned in the Technical requirements section of the current chapter.

The Azure DevOps Services organization must be linked to the same Azure AD tenant that the Azure Databricks resource is part of. You need to set the Git provider to Azure DevOps...
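
The remainder of this recipe walks through the workspace UI, but the same link-up can also be scripted. The following is a minimal sketch using the Databricks Git credentials REST API (POST /api/2.0/git-credentials); the workspace URL, tokens, and username are placeholders you would replace with your own values.

    import requests

    # Placeholders -- use your own workspace URL and tokens.
    DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
    HEADERS = {"Authorization": "Bearer <databricks-pat>"}

    # Register an Azure DevOps Services Git credential for the signed-in user.
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/git-credentials",
        headers=HEADERS,
        json={
            "git_provider": "azureDevOpsServices",
            "git_username": "user@contoso.com",
            "personal_access_token": "<azure-devops-pat>",
        },
    )
    resp.raise_for_status()
    print(resp.json())  # returns the stored credential's ID on success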

Using GitHub for Azure Databricks notebook version control

Apart from using Azure DevOps for version control of your Azure Databricks notebooks, you can also use GitHub. Which version control system to use depends on organizational, project, and business needs. In this recipe, you will learn how to integrate a GitHub repository with Azure Databricks to version control your notebooks.

Getting ready

Before starting with this recipe, you need to ensure that you have generated a personal access token, which is used for authentication to GitHub. To generate a personal access token, go through the steps mentioned at the following link: https://docs.github.com/en/github/authenticating-to-github/keeping-your-account-and-data-secure/creating-a-personal-access-token.

You only need to select the repo scope before generating the token, as shown in the following screenshot; the other scopes are not required:

Figure 9.14 – GitHub Personal access...
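
As with Azure DevOps, the GitHub integration can be scripted as an alternative to the UI steps on workspaces where the Repos feature is available. The sketch below uses the Databricks Git credentials and Repos REST APIs; the host, tokens, and repository names are placeholders.

    import requests

    # Placeholders -- substitute your own workspace URL, tokens, and repo.
    DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
    HEADERS = {"Authorization": "Bearer <databricks-pat>"}

    # Step 1: store the GitHub PAT (repo scope) as the Git credential.
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/git-credentials",
        headers=HEADERS,
        json={
            "git_provider": "gitHub",
            "git_username": "<github-username>",
            "personal_access_token": "<github-pat>",
        },
    )
    resp.raise_for_status()

    # Step 2: clone the repository into the workspace under /Repos.
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/repos",
        headers=HEADERS,
        json={
            "url": "https://github.com/<owner>/<repo>.git",
            "provider": "gitHub",
            "path": "/Repos/<user>/<repo>",
        },
    )
    resp.raise_for_status()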

Understanding the CI/CD process for Azure Databricks

In this recipe, we will look at the advantages of using the Continuous Integration and Continuous Delivery (CI/CD) process while working with Azure Databricks.

CI/CD is a method of frequently integrating code changes into a repository and deploying them to other environments. It offers an automated way of integrating, testing, and deploying code changes. CI in Azure Databricks enables developers to regularly build, test, and merge code changes into a shared repository. It addresses the very common problem in the software development life cycle of multiple developers working on the same code, or on multiple branches, creating conflicts with one another.

CD refers to continuous delivery and/or continuous deployment. Continuous delivery means that developers' changes are automatically tested and merged into or uploaded to a repository, such as one hosted in Azure DevOps or GitHub. From there, changes can be deployed to other...

How to set up an Azure DevOps pipeline for deploying notebooks

An Azure DevOps release pipeline gives users a way to automate the deployment of various Azure resources, such as Azure Databricks and Azure SQL Database, to different environments such as dev, test, UAT, and production. It helps project teams streamline the deployment process and establish a consistent deployment framework for everything they deploy to Azure. We can use Azure DevOps release pipelines to deploy Azure Databricks artifacts, such as notebooks and libraries, to various environments.

Setting up the DevOps build and release pipeline will enable you to implement CI/CD for Azure Databricks notebooks.
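
At its core, the release pipeline's deployment step copies notebook source files from the build artifact into the target workspace. The following Python sketch shows the idea using the Databricks workspace import REST API; the artifact folder, target path, and environment variable names are illustrative, not prescribed by the recipe.

    import base64
    import os
    import requests

    # In a release task these would come from pipeline variables (see the
    # Deploying notebooks to multiple environments recipe).
    HOST = os.environ["DATABRICKS_HOST"]
    HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
    ARTIFACT_DIR = "notebooks"        # folder published by the build pipeline
    TARGET_ROOT = "/Shared/deployed"  # workspace folder to deploy into

    # Make sure the target folder exists, then import every notebook in it.
    requests.post(f"{HOST}/api/2.0/workspace/mkdirs",
                  headers=HEADERS,
                  json={"path": TARGET_ROOT}).raise_for_status()

    for name in os.listdir(ARTIFACT_DIR):
        if not name.endswith(".py"):
            continue
        with open(os.path.join(ARTIFACT_DIR, name), "rb") as f:
            content = base64.b64encode(f.read()).decode()
        resp = requests.post(
            f"{HOST}/api/2.0/workspace/import",
            headers=HEADERS,
            json={"path": f"{TARGET_ROOT}/{name[:-3]}",  # strip ".py"
                  "format": "SOURCE",
                  "language": "PYTHON",
                  "content": content,
                  "overwrite": True},
        )
        resp.raise_for_status()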

Getting ready

Before getting started on this recipe, you should have completed the first recipe of this chapter, in which you integrated Azure DevOps with an Azure Databricks workspace. After you have checked in all your notebooks, you can move on to the next section of this recipe.

You...

Deploying notebooks to multiple environments

The Azure DevOps CI/CD process can be used to deploy Azure resources and artifacts to various environments from the same release pipeline. We can also tailor the deployment sequence to the needs of a project or application. For example, you can deploy notebooks to the test environment first; if that deployment succeeds, deploy them to UAT; and later, once the changes are approved, deploy them to the production environment. In this recipe, you will learn how to create variable groups, map them to specific environments, and use the variable values at runtime to deploy Azure Databricks notebooks to different environments.

In a variable group, we create a set of variables whose values can be used in the release pipeline, either across all stages or scoped to one specific stage. For example, if we have two stages in the release pipeline that deploy to different environments...
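
To make the variable-group mechanics concrete: when a release stage runs a script task, Azure DevOps exposes the non-secret pipeline variables as environment variables (names uppercased, with dots replaced by underscores), while secret variables must be mapped into the task explicitly. The variable names below are illustrative only; a deployment script simply reads whichever values the stage's variable group supplies.

    import os

    # Non-secret variables arrive as environment variables automatically;
    # DATABRICKS_TOKEN is assumed to be a secret mapped into the task's env.
    databricks_host = os.environ["DATABRICKS_HOST"]    # differs per stage
    databricks_token = os.environ["DATABRICKS_TOKEN"]  # mapped secret
    target_folder = os.environ.get("NOTEBOOK_TARGET", "/Shared/deployed")

    print(f"Deploying notebooks to {databricks_host} under {target_folder}")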

Enabling CI/CD in an Azure DevOps build and release pipeline

Having learned in the preceding recipes how to create DevOps pipelines and the concepts behind the CI/CD process for Azure Databricks notebooks, in this recipe you will implement CI/CD for the Azure DevOps pipeline that you created in the How to set up an Azure DevOps pipeline for deploying notebooks recipe.

In this recipe, you will learn how to enable a build to be triggered automatically when changes are merged into the main branch. You will then learn how to trigger the release pipeline automatically when the build succeeds, deploying the artifacts (notebooks) to the different Databricks workspaces.
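
The triggers themselves are configured on the pipelines, but the condition the continuous-deployment trigger acts on, that is, the latest main-branch build completing successfully, can also be inspected through the Azure DevOps Build REST API. The following is a hedged sketch; the organization, project, PAT, and build definition ID are placeholders.

    import requests

    # Placeholders -- substitute your own values.
    ORG, PROJECT = "myorg", "databricks-cicd"
    BUILD_DEFINITION_ID = 12  # hypothetical ID of the build pipeline

    url = (f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/build/builds"
           f"?definitions={BUILD_DEFINITION_ID}"
           f"&branchName=refs/heads/main&statusFilter=completed"
           f"&$top=1&api-version=6.0")
    resp = requests.get(url, auth=("", "<azure-devops-pat>"))
    resp.raise_for_status()

    latest = resp.json()["value"][0]
    print(latest["buildNumber"], latest["result"])  # e.g. "20210901.1 succeeded"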

Getting ready

Before getting started, you need to complete the following two recipes of this chapter:

  • How to set up an Azure DevOps pipeline for deploying notebooks
  • Deploying notebooks to multiple environments

The following is the flow for the automated CI/CD process...

Deploying an Azure Databricks service using an Azure DevOps release pipeline

Azure DevOps release pipelines can be used to automate the deployment of Azure resources to different environments. In this recipe, you will learn how to deploy Azure Databricks to a specific resource group in a subscription. Knowing this process will not only help you deploy the Azure Databricks service but will also enable you to deploy other Azure resources.

Getting ready

In this recipe, we will be using ARM templates to deploy Azure Databricks resources from Azure DevOps pipelines. To use the ARM templates in the Azure DevOps release pipeline, you will have to check in the ARM JSON files. You can download the JSON files from the following link: https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.databricks/databricks-workspace.
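
Inside the release pipeline this deployment is typically done with an ARM template deployment task, but the same call can be made from Python with the Azure SDK, which can help when testing the template locally. The following is a minimal sketch, assuming the quickstart template's azuredeploy.json is in the current folder, that the azure-identity and azure-mgmt-resource packages are installed, and that the parameter names match the quickstart template; the subscription, resource group, and workspace names are placeholders.

    import json
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resource import ResourceManagementClient

    # Placeholders -- in the release pipeline, the service connection
    # supplies the credentials instead.
    client = ResourceManagementClient(DefaultAzureCredential(),
                                      "<subscription-id>")

    with open("azuredeploy.json") as f:  # the checked-in quickstart template
        template = json.load(f)

    # Deploy the template to the target resource group incrementally.
    poller = client.deployments.begin_create_or_update(
        "<resource-group>",
        "databricks-workspace-deployment",
        {
            "properties": {
                "mode": "Incremental",
                "template": template,
                "parameters": {
                    "workspaceName": {"value": "<workspace-name>"},
                    "pricingTier": {"value": "premium"},
                },
            }
        },
    )
    print(poller.result().properties.provisioning_state)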

The following is a screenshot of the folder in the Azure DevOps repo where we have checked in the JSON files:

...

Authors (2)

Phani Raj

Phani Raj is an experienced data architect and product manager with 15 years of experience working with customers to build data platforms both on-premises and in the cloud. He has designed and implemented large-scale big data solutions for customers across different verticals. His passion for continuous learning and for adapting to the dynamic nature of technology underscores his role as a trusted advisor in the realms of data architecture, data science, and product management.

Vinod Jaiswal

Vinod Jaiswal is an experienced data engineer who excels in transforming raw data into valuable insights. With over 8 years of experience with Databricks, he designs and implements data pipelines, optimizes workflows, and crafts scalable solutions for intricate data challenges. Collaborating seamlessly with diverse teams, Vinod empowers them with the tools and expertise to leverage data effectively. His dedication to staying up to date with the latest data engineering trends ensures cutting-edge, robust solutions. Beyond his technical prowess, Vinod is a proficient educator: through presentations and mentoring, he shares his expertise, enabling others to harness the power of data within the Databricks ecosystem.