Azure Databricks is a high-performance Apache Spark-based platform that has been optimized for the Microsoft Azure cloud.
It offers three environments for building and developing data applications:
- Databricks Data Science and Engineering: This provides an interactive workspace that enables collaboration between data engineers, data scientists, machine learning engineers, and business analysts and allows you to build big data pipelines.
- Databricks SQL: This allows you to run ad hoc SQL queries on your data lake and supports multiple visualization types to explore your query results.
- Databricks Machine Learning: This provides an end-to-end machine learning environment for feature development, model training, experiment tracking, and model serving and management.
In this chapter, we will cover how to create an Azure Databricks service using the Azure portal, the Azure CLI, and ARM templates. We will learn about different...
To follow along with the examples in this chapter, you will need to have the following:
- An Azure subscription
- The Azure CLI installed on one of the following platforms (a quick verification snippet follows this list):
(a) Windows: Install the Azure CLI for Windows | Microsoft Docs (https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-windows?tabs=azure-cli)
(b) macOS: Install the Azure CLI for macOS | Microsoft Docs (https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-macos)
(c) Linux: Install the Azure CLI for Linux manually | Microsoft Docs (https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-linux?pivots=apt)
- You can find the scripts for this chapter at https://github.com/PacktPublishing/Azure-Databricks-Cookbook/tree/main/Chapter01; the Chapter-01 folder contains the scripts for this chapter.
- As an Azure AD user, you will need the Contributor role in your subscription to create the Azure Databricks service via the Azure portal. You must also be the admin...
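Once the Azure CLI is installed, you can quickly verify the installation and sign in to the subscription you will be deploying into. A minimal check; the subscription value is a placeholder you should replace with your own:

```bash
az --version      # confirm the CLI is installed and on your PATH
az login          # opens a browser window for interactive sign-in

# Select the subscription to deploy into (placeholder value)
az account set --subscription "<subscription-id-or-name>"
```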
Creating a Databricks workspace in the Azure portal
There are multiple ways to create an Azure Databricks service. This recipe focuses on creating the service in the Azure portal, a method that is usually used for learning purposes or ad hoc requests. The preferred methods for creating services are Azure PowerShell, the Azure CLI, and ARM templates.
By the end of this recipe, you will have learned how to create an Azure Databricks service instance using the Azure portal.
You will need access to an Azure subscription and the Contributor role in it.
How to do it…
Follow these steps to create a Databricks service using the Azure portal:
- Log into the Azure portal (https://portal.azure.com) and click on Create a resource. Then, search for Databricks and click on Create:
- Create a new resource group (CookbookRG, which can also be created from the command line, as sketched below) or pick any existing resource group...
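As referenced in the resource group step above, the CookbookRG resource group can also be created ahead of time from a command line; a minimal sketch, where the name and region are example values:

```bash
# Create the resource group used throughout this chapter
# (CookbookRG and eastus are example values; adjust to your environment)
az group create --name CookbookRG --location eastus
```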
Creating a Databricks service using the Azure CLI (command-line interface)
By the end of this recipe, you will know how to use the Azure CLI to deploy Azure Databricks. Knowing how to deploy resources using the CLI will help you automate deployments from your DevOps pipelines or run them from a PowerShell terminal.
Azure hosts Azure Cloud Shell, which can be used to work with Azure services. Cloud Shell comes with the Azure CLI preinstalled, so commands can be executed without installing anything in our local environment.
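As a preview of what this recipe builds up to, the following is a minimal sketch of creating a workspace with the Azure CLI. The resource group, workspace name, region, and SKU are example values, and the databricks commands ship as a separate CLI extension:

```bash
# The az databricks commands are provided by a CLI extension; add it once
az extension add --name databricks

# Create the workspace (all values below are examples; adjust as needed)
az databricks workspace create \
  --resource-group CookbookRG \
  --name CookbookDatabricksCLI \
  --location eastus \
  --sku standard
```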
You can find out how to create an Azure AD app and a service principal (SP) in the Azure portal by going to Microsoft identity platform | Microsoft...
Creating a Databricks service using Azure Resource Manager (ARM) templates
Using ARM templates is a well-known method for deploying resources in Azure.
By the end of this recipe, you will have learned how to deploy an Azure Databricks workspace using ARM templates. ARM templates can be deployed from an Azure DevOps pipeline, as well as by using PowerShell or CLI commands.
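As a sketch of how such a deployment looks from the CLI, the command below deploys a template that declares a Microsoft.Databricks/workspaces resource. The template file and parameter names here are hypothetical; use the ones provided in the chapter's GitHub folder:

```bash
# Deploy an ARM template defining a Microsoft.Databricks/workspaces resource.
# azuredeploy.json and the workspaceName parameter are hypothetical names.
az deployment group create \
  --resource-group CookbookRG \
  --template-file azuredeploy.json \
  --parameters workspaceName=CookbookDatabricksARM
```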
You can find out how to create an Azure AD app and a service principal in the Azure portal by going to Microsoft identity platform | Microsoft Docs (https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal).
For service principal authentication...
Adding users and groups to the workspace
In this recipe, we will learn how to add users and groups to the workspace so that they can collaborate when creating data applications. This exercise will provide you with a detailed understanding of how users are created in a workspace. You will also learn about the different permissions that can be granted to users.
Log into the Databricks workspace as an Azure Databricks admin. Before you add a user to the workspace, ensure that the user exists in Azure Active Directory.
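If you later want to automate what this recipe does through the Admin Console, the same operation is exposed by the Databricks SCIM API; a minimal sketch, assuming a workspace URL, a PAT (covered later in this chapter), and a placeholder user:

```bash
# Add a user to the workspace via the SCIM API instead of the Admin Console.
# <databricks-instance> and <personal-access-token> are placeholders.
curl -X POST "https://<databricks-instance>/api/2.0/preview/scim/v2/Users" \
  -H "Authorization: Bearer <personal-access-token>" \
  -H "Content-Type: application/scim+json" \
  -d '{
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "userName": "user@example.com"
      }'
```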
How to do it…
Follow these steps to create users and groups from the Admin Console:
Creating a cluster from the user interface (UI)
In this recipe, we will look at the different types of clusters and cluster modes in Azure Databricks and how to create them. Understanding cluster types will help you determine the right type of cluster you should use for your workload and usage pattern.
Before you get started, ensure you have created an Azure Databricks workspace, as shown in the preceding recipes.
How to do it…
Follow these steps to create a cluster via the UI:
- After launching your workspace, click on the Cluster option from the left-hand pane:
- Provide a name for your cluster and select a cluster mode based on your scenario. Here, we are selecting Standard.
- We are not selecting a pool here. Let's go with the latest version of Spark (3.0.1) and the latest Databricks runtime (7.4) available at the time of writing this book (a REST API equivalent is sketched below).
- The possibility to...
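The cluster configured in the preceding steps can also be created without the UI by calling the Clusters REST API; a minimal sketch, where the workspace URL and token are placeholders and the node type is an example value:

```bash
# Create a Standard-mode cluster equivalent to the UI example above.
# <databricks-instance>, <personal-access-token>, and the node type are
# placeholder/example values.
curl -X POST "https://<databricks-instance>/api/2.0/clusters/create" \
  -H "Authorization: Bearer <personal-access-token>" \
  -H "Content-Type: application/json" \
  -d '{
        "cluster_name": "cookbook-cluster",
        "spark_version": "7.4.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2
      }'
```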
Getting started with notebooks and jobs in Azure Databricks
In this recipe, we will import a notebook into our workspace and learn how to execute and schedule it using jobs. By the end of this recipe, you will know how to import, create, execute, and schedule Notebooks in Azure Databricks.
Ensure the Databricks cluster is up and running. Clone the cookbook repository from https://github.com/PacktPublishing/Azure-Databricks-Cookbook to any location on your laptop/PC. You will find the required demo files in the chapter-01 folder.
How to do it…
Let's dive into importing the Notebook into our workspace:
- First, let's create a simple Notebook that will be used to create a new job and schedule it.
- In the cloned repository, go to chapter-01. You will find a file called DemoRun.dbc. You can import the .dbc file into your workspace by right-clicking the Shared folder in the workspace and selecting the Import option (an API-based alternative is sketched below):
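The same import can be scripted against the Workspace API, which is handy when you need to set up several workspaces; a minimal sketch, assuming DemoRun.dbc is in your current directory and using placeholder values for the workspace URL and token:

```bash
# Import DemoRun.dbc into the /Shared folder via the Workspace API
# instead of the UI. <databricks-instance> and <personal-access-token>
# are placeholders; the target path is an example.
curl -X POST "https://<databricks-instance>/api/2.0/workspace/import" \
  -H "Authorization: Bearer <personal-access-token>" \
  -F path=/Shared/DemoRun \
  -F format=DBC \
  -F content=@DemoRun.dbc
```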
Authenticating to Databricks using a PAT
Azure Databricks supports two types of tokens for authenticating to its REST APIs: personal access tokens (PATs) and Azure Active Directory tokens. This recipe focuses on PATs.
A PAT is used as an alternative to a password to authenticate and access the Databricks REST APIs. By the end of this recipe, you will have learned how to use PATs to access the Spark managed tables that we created in the preceding recipes using Power BI Desktop and create basic visualizations.
Users can create PATs and use them in REST API requests. Tokens have optional expiration dates and can be revoked.
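To illustrate how a PAT is used in a REST API request, the following sketch passes it as a bearer token on a simple call that lists the clusters in the workspace; the host and token are placeholders:

```bash
# Use a PAT as a bearer token on a Databricks REST API call.
# <databricks-instance> and <personal-access-token> are placeholders.
curl -X GET "https://<databricks-instance>/api/2.0/clusters/list" \
  -H "Authorization: Bearer <personal-access-token>"
```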
How to do it…
This section will show you how to generate PATs using the Azure Databricks UI. Apart from the UI, you can also use the Token API to generate and revoke tokens. However, there...