
Chapter 9: DevOps Integrations and Implementing CI/CD for Azure Databricks

DevOps is at the core of most projects and organizations these days. It enables organizations to build their applications and solutions and deploy them quickly to various environments by providing a framework for seamless deployment. In this chapter, you will learn how Azure DevOps is used for Continuous Integration and Continuous Deployment of Azure Databricks notebooks. Knowing how Azure DevOps works will help you to plan, develop, deliver, and operate your end-to-end business applications.

In this chapter, we're going to cover the following main topics:

  • How to integrate Azure DevOps with an Azure Databricks notebook
  • Using GitHub for Azure Databricks notebook version control
  • Understanding the CI/CD process for Azure Databricks
  • How to set up an Azure DevOps pipeline for deploying notebooks
  • Deploying notebooks to multiple environments
  • Enabling CI/CD in an Azure DevOps build and release pipeline
  • Deploying an Azure Databricks service using an Azure DevOps release pipeline

Technical requirements

To follow along with the examples shown in the recipes, you will need to have the following:

  • An Azure subscription and the required permissions on the subscription, as mentioned in the Technical requirements section of Chapter 1, Creating Azure Databricks Service.
  • An Azure Databricks premium workspace, which we will use throughout this chapter. There is no need to spin up a cluster in the workspace, as we are not running any notebooks.
  • An Azure DevOps repository. Ensure that the Azure DevOps Services organization is linked to the same Azure AD tenant as Databricks. If you don't already have a repository, you can follow the steps at the following link to create one: https://docs.microsoft.com/en-in/azure/devops/repos/git/create-new-repo?view=azure-devops.

Once you have the repository created, you can get started with this chapter.
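
If you prefer to script this prerequisite rather than use the Azure DevOps portal, the repository can also be created through the Azure DevOps Git REST API. The following Python sketch is only an illustration: the organization, project, and repository names are placeholders, and the personal access token (PAT) needs the Code (Read & write) scope.

    import requests

    # Placeholder values -- substitute your own organization, project, and PAT.
    ORG = "myorg"
    PROJECT = "databricks-cicd"
    PAT = "<azure-devops-pat>"

    # Azure DevOps REST APIs accept a PAT via Basic auth with a blank username.
    url = (f"https://dev.azure.com/{ORG}/{PROJECT}"
           f"/_apis/git/repositories?api-version=6.0")
    resp = requests.post(url, auth=("", PAT),
                         json={"name": "databricks-notebooks"})
    resp.raise_for_status()
    print("Repository URL:", resp.json()["remoteUrl"])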

How to integrate Azure DevOps with an Azure Databricks notebook

Nowadays, DevOps is an integral part of any project and, among the other services and features Azure DevOps provides, it is heavily used for deploying resources and artifacts to various environments. Integrating Azure DevOps with Azure Databricks lets teams and organizations put their notebooks under source control for collaboration, and it enables the enterprise practice of Continuous Integration and Continuous Deployment for Azure Databricks resources and notebooks. In this recipe, you will learn how to integrate Azure DevOps with Azure Databricks.

Getting ready

Before starting with this recipe, you need to ensure that you have the resources created as mentioned in the Technical requirements section of the current chapter.

The Azure DevOps Services organization must be linked to the same Azure AD tenant that the Azure Databricks resource is part of. You need to set the Git provider to Azure DevOps...
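
The remainder of this recipe walks through the workspace UI, but the same link-up can also be scripted. The following is a minimal sketch using the Databricks Git credentials REST API (POST /api/2.0/git-credentials); the workspace URL, tokens, and username are placeholders you would replace with your own values.

    import requests

    # Placeholders -- use your own workspace URL and tokens.
    DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
    HEADERS = {"Authorization": "Bearer <databricks-pat>"}

    # Register an Azure DevOps Services Git credential for the signed-in user.
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/git-credentials",
        headers=HEADERS,
        json={
            "git_provider": "azureDevOpsServices",
            "git_username": "user@contoso.com",
            "personal_access_token": "<azure-devops-pat>",
        },
    )
    resp.raise_for_status()
    print(resp.json())  # returns the stored credential's ID on success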

Using GitHub for Azure Databricks notebook version control

Apart from using Azure DevOps for version control of your Azure Databricks notebooks, you can also use GitHub. Which version control system to use depends on organizational, project, and business needs. In this recipe, you will learn how to integrate a GitHub repository with Azure Databricks to version control your notebooks.

Getting ready

Before starting with this recipe, you need to ensure that you have generated a personal access token, which is used for authentication to GitHub. To generate a personal access token, go through the steps mentioned at the following link: https://docs.github.com/en/github/authenticating-to-github/keeping-your-account-and-data-secure/creating-a-personal-access-token.

You only need to select the repo scope before generating the token, as shown in the following screenshot; the other scopes are not required:

Figure 9.14 – GitHub Personal access...
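
As with Azure DevOps, the GitHub integration can be scripted as an alternative to the UI steps on workspaces where the Repos feature is available. The sketch below uses the Databricks Git credentials and Repos REST APIs; the host, tokens, and repository names are placeholders.

    import requests

    # Placeholders -- substitute your own workspace URL, tokens, and repo.
    DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
    HEADERS = {"Authorization": "Bearer <databricks-pat>"}

    # Step 1: store the GitHub PAT (repo scope) as the Git credential.
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/git-credentials",
        headers=HEADERS,
        json={
            "git_provider": "gitHub",
            "git_username": "<github-username>",
            "personal_access_token": "<github-pat>",
        },
    )
    resp.raise_for_status()

    # Step 2: clone the repository into the workspace under /Repos.
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/repos",
        headers=HEADERS,
        json={
            "url": "https://github.com/<owner>/<repo>.git",
            "provider": "gitHub",
            "path": "/Repos/<user>/<repo>",
        },
    )
    resp.raise_for_status()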

Understanding the CI/CD process for Azure Databricks

In this recipe, we will look at the advantages of using the Continuous Integration and Continuous Delivery (CI/CD) process while working with Azure Databricks.

CI/CD is a method of frequently integrating code changes into a repository and deploying them to other environments. It offers an automated way of integrating, testing, and deploying code changes. CI in Azure Databricks enables developers to regularly build, test, and merge code changes into a shared repository. It addresses the very common problem in the software development life cycle of multiple developers working on the same code, or on multiple branches, creating conflicts with one another.

CD refers to continuous delivery and/or continuous deployment. Continuous delivery means that developers' changes are automatically tested and merged into or uploaded to a repository, such as one hosted in Azure DevOps or GitHub. From there, changes can be deployed to other...

How to set up an Azure DevOps pipeline for deploying notebooks

An Azure DevOps release pipeline gives users a way to automate the deployment of various Azure resources, such as Azure Databricks and Azure SQL Database, to different environments such as dev, test, UAT, and production. It helps project teams streamline the deployment process and establish a consistent deployment framework for everything they deploy to Azure. We can use Azure DevOps release pipelines to deploy Azure Databricks artifacts, such as notebooks and libraries, to various environments.

Setting up the DevOps build and release pipeline will enable you to implement CI/CD for Azure Databricks notebooks.
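
At its core, the release pipeline's deployment step copies notebook source files from the build artifact into the target workspace. The following Python sketch shows the idea using the Databricks workspace import REST API; the artifact folder, target path, and environment variable names are illustrative, not prescribed by the recipe.

    import base64
    import os
    import requests

    # In a release task these would come from pipeline variables (see the
    # Deploying notebooks to multiple environments recipe).
    HOST = os.environ["DATABRICKS_HOST"]
    HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
    ARTIFACT_DIR = "notebooks"        # folder published by the build pipeline
    TARGET_ROOT = "/Shared/deployed"  # workspace folder to deploy into

    # Make sure the target folder exists, then import every notebook in it.
    requests.post(f"{HOST}/api/2.0/workspace/mkdirs",
                  headers=HEADERS,
                  json={"path": TARGET_ROOT}).raise_for_status()

    for name in os.listdir(ARTIFACT_DIR):
        if not name.endswith(".py"):
            continue
        with open(os.path.join(ARTIFACT_DIR, name), "rb") as f:
            content = base64.b64encode(f.read()).decode()
        resp = requests.post(
            f"{HOST}/api/2.0/workspace/import",
            headers=HEADERS,
            json={"path": f"{TARGET_ROOT}/{name[:-3]}",  # strip ".py"
                  "format": "SOURCE",
                  "language": "PYTHON",
                  "content": content,
                  "overwrite": True},
        )
        resp.raise_for_status()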

Getting ready

Before getting started on this recipe, you should have completed the first recipe of this chapter, in which you integrated Azure DevOps with an Azure Databricks workspace. After you have checked in all your notebooks, you can move on to the next section of this recipe.

You...

Deploying notebooks to multiple environments

The Azure DevOps CI/CD process can be used to deploy Azure resources and artifacts to various environments from the same release pipeline. We can also tailor the deployment sequence to the needs of a project or application. For example, you can deploy notebooks to the test environment first; if that deployment succeeds, deploy them to UAT; and later, once the changes are approved, deploy them to the production environment. In this recipe, you will learn how to create variable groups, map them to specific environments, and use the variable values at runtime to deploy Azure Databricks notebooks to different environments.

In a variable group, we create a set of variables whose values can be used in the release pipeline, either across all stages or scoped to one specific stage. For example, if we have two stages in the release pipeline that deploy to different environments...
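
To make the variable-group mechanics concrete: when a release stage runs a script task, Azure DevOps exposes the non-secret pipeline variables as environment variables (names uppercased, with dots replaced by underscores), while secret variables must be mapped into the task explicitly. The variable names below are illustrative only; a deployment script simply reads whichever values the stage's variable group supplies.

    import os

    # Non-secret variables arrive as environment variables automatically;
    # DATABRICKS_TOKEN is assumed to be a secret mapped into the task's env.
    databricks_host = os.environ["DATABRICKS_HOST"]    # differs per stage
    databricks_token = os.environ["DATABRICKS_TOKEN"]  # mapped secret
    target_folder = os.environ.get("NOTEBOOK_TARGET", "/Shared/deployed")

    print(f"Deploying notebooks to {databricks_host} under {target_folder}")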

Enabling CI/CD in an Azure DevOps build and release pipeline

Having learned in the preceding recipes how to create DevOps pipelines and the concepts behind the CI/CD process for Azure Databricks notebooks, in this recipe you will implement CI/CD for the Azure DevOps pipeline that you created in the How to set up an Azure DevOps pipeline for deploying notebooks recipe.

In this recipe, you will learn how to enable a build to be triggered automatically when changes are merged into the main branch. You will then learn how to trigger the release pipeline automatically when the build succeeds, deploying the artifacts (notebooks) to the different Databricks workspaces.
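
The triggers themselves are configured on the pipelines, but the condition the continuous-deployment trigger acts on, that is, the latest main-branch build completing successfully, can also be inspected through the Azure DevOps Build REST API. The following is a hedged sketch; the organization, project, PAT, and build definition ID are placeholders.

    import requests

    # Placeholders -- substitute your own values.
    ORG, PROJECT = "myorg", "databricks-cicd"
    BUILD_DEFINITION_ID = 12  # hypothetical ID of the build pipeline

    url = (f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/build/builds"
           f"?definitions={BUILD_DEFINITION_ID}"
           f"&branchName=refs/heads/main&statusFilter=completed"
           f"&$top=1&api-version=6.0")
    resp = requests.get(url, auth=("", "<azure-devops-pat>"))
    resp.raise_for_status()

    latest = resp.json()["value"][0]
    print(latest["buildNumber"], latest["result"])  # e.g. "20210901.1 succeeded"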

Getting ready

Before getting started, you need to complete the following two recipes of this chapter:

  • How to set up an Azure DevOps pipeline for deploying notebooks
  • Deploying notebooks to multiple environments

The following is the flow for the automated CI/CD process...

Deploying an Azure Databricks service using an Azure DevOps release pipeline

Azure DevOps release pipelines can be used to automate the deployment of Azure resources to different environments. In this recipe, you will learn how to deploy Azure Databricks to a specific resource group in a subscription. Knowing this process will not only help you deploy the Azure Databricks service but will also enable you to deploy other Azure resources.

Getting ready

In this recipe, we will be using ARM templates to deploy Azure Databricks resources from Azure DevOps pipelines. To use the ARM templates in the Azure DevOps release pipeline, you will have to check in the ARM JSON files. You can download the JSON files from the following link: https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.databricks/databricks-workspace.
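
Inside the release pipeline this deployment is typically done with an ARM template deployment task, but the same call can be made from Python with the Azure SDK, which can help when testing the template locally. The following is a minimal sketch, assuming the quickstart template's azuredeploy.json is in the current folder, that the azure-identity and azure-mgmt-resource packages are installed, and that the parameter names match the quickstart template; the subscription, resource group, and workspace names are placeholders.

    import json
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resource import ResourceManagementClient

    # Placeholders -- in the release pipeline, the service connection
    # supplies the credentials instead.
    client = ResourceManagementClient(DefaultAzureCredential(),
                                      "<subscription-id>")

    with open("azuredeploy.json") as f:  # the checked-in quickstart template
        template = json.load(f)

    # Deploy the template to the target resource group incrementally.
    poller = client.deployments.begin_create_or_update(
        "<resource-group>",
        "databricks-workspace-deployment",
        {
            "properties": {
                "mode": "Incremental",
                "template": template,
                "parameters": {
                    "workspaceName": {"value": "<workspace-name>"},
                    "pricingTier": {"value": "premium"},
                },
            }
        },
    )
    print(poller.result().properties.provisioning_state)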

The following is a screenshot of the folder in the Azure DevOps repo where we have checked in the JSON files:

...

Authors (2)

Phani Raj

Phani Raj is an experienced data architect and product manager with 15 years of experience working with customers to build data platforms both on-premises and in the cloud. He has designed and implemented large-scale big data solutions for customers across different verticals. His passion for continuous learning and for adapting to the dynamic nature of technology underscores his role as a trusted advisor in the realms of data architecture, data science, and product management.

Vinod Jaiswal

Vinod Jaiswal is an experienced data engineer who excels in transforming raw data into valuable insights. With over 8 years of experience with Databricks, he designs and implements data pipelines, optimizes workflows, and crafts scalable solutions for intricate data challenges. Collaborating seamlessly with diverse teams, Vinod empowers them with the tools and expertise to leverage data effectively. His dedication to staying up to date with the latest data engineering trends ensures cutting-edge, robust solutions. Beyond his technical prowess, Vinod is a proficient educator: through presentations and mentoring, he shares his expertise, enabling others to harness the power of data within the Databricks ecosystem.