
Chapter 5: Integrating with Azure Key Vault, App Configuration, and Log Analytics

These days, most data processing and data ingestion pipelines read and write data from and to external sources such as Azure Blob storage, ADLS Gen2, Azure SQL, Cosmos DB, and Synapse. To access these sources, you will need credentials. Storing credentials such as a storage account key or a SQL password in a notebook is not acceptable from a security standpoint and is not a recommended approach when deploying data ingestion or processing pipelines to production. To overcome this, Microsoft provides Azure Key Vault, where you can store credentials such as usernames and passwords and access them securely from your notebooks. Apart from Key Vault, we can also use App Configuration to store these values and access them from Databricks notebooks.

By the end of this chapter, you will have learned how to create an Azure Key Vault from the Azure portal and the Azure CLI, as well as how to access Key...

Technical requirements

To follow along with the examples in this chapter, you will need the following:

Creating an Azure Key Vault to store secrets using the UI

In this recipe, you will learn how to create an Azure Key Vault instance via the Azure portal and how to create secrets in it. Once the Key Vault instance and secrets have been created, we can refer to those secrets from a notebook so that passwords are never exposed to our users.

Getting ready

Before starting, you need to ensure you have contributor access to the subscription or are the owner of the resource group.

How to do it…

In this section, you will learn how to create an Azure Key Vault via the Azure portal. Let's get started:

  1. From the Azure portal home page, search for Key Vault and select the Create button.
  2. Provide the Resource group and Key vault name, and select the Standard pricing tier:

    Figure 5.1 –...

Creating an Azure Key Vault to store secrets using ARM templates

In this recipe, you will learn how to create an Azure Key Vault instance and secrets using the Azure CLI and ARM templates. Knowing how to deploy templates from the CLI is useful when you want to automate deployments from your DevOps pipelines.

Getting ready

In this recipe, we will use a service principal to authenticate to Azure so that we can create an Azure Key Vault resource from the Azure CLI. Follow the steps mentioned in the following link to create a service principal:

https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal

You can find the ARM template JSON files and the required PowerShell script at https://github.com/PacktPublishing/Azure-Databricks-Cookbook/tree/main/Chapter05/Code.
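
To give a sense of what this recipe automates, the following is a minimal Azure CLI sketch. Every name, path, and credential below is a placeholder rather than a value from the book's scripts:

```bash
# Sign in with the service principal created above (placeholder credentials).
az login --service-principal \
  --username "<appId>" --password "<clientSecret>" --tenant "<tenantId>"

# Deploy a Key Vault ARM template into an existing resource group
# (hypothetical template and parameter names).
az deployment group create \
  --resource-group "<resourceGroup>" \
  --template-file keyvault-template.json \
  --parameters keyVaultName="<vaultName>"

# Once the vault exists, add a secret to it.
az keyvault secret set \
  --vault-name "<vaultName>" --name "<secretName>" --value "<secretValue>"
```

Running the same commands from a DevOps pipeline task is what makes this approach repeatable, which is the point of preferring the CLI over the portal here.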

You can find the Parquet files that were...

Using Azure Key Vault secrets in Azure Databricks

In this recipe, you will learn how to use the secrets stored in Azure Key Vault in an Azure Databricks notebook. You can store information such as storage account keys or SQL Database passwords in Azure Key Vault as secrets, ensuring sensitive information is not exposed.

Getting ready

To use Azure Key Vault secrets in Databricks notebooks, we must create a secret scope in Databricks. The following steps will show you how to create a secret scope in Azure Databricks:

  1. Open the secret scope creation page by appending #secrets/createScope to your Databricks workspace URL, that is, <Databricks URL>/#secrets/createScope. You can find the Databricks URL on the Overview page of your Azure Databricks service in the Azure portal. The following screenshot shows the Databricks URL. The secret scope URL will be similar to https://xxxxxx.azuredatabricks.net/#secrets/createScope:

    Figure 5.7 – Databricks service URL

  2. Once you have opened the secret scope URL, provide a name for your scope. Manage Principal can be either...
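
Once the scope has been created, a notebook can read secrets with the dbutils.secrets utility. The following is a minimal sketch; the scope and secret names are placeholders for whatever you created, not values from this recipe:

```python
# Placeholder scope and secret names; substitute the ones you created.
storage_account_key = dbutils.secrets.get(
    scope="keyvault-scope", key="storage-account-key"
)

# Use the secret to configure access to a storage account
# (placeholder account name).
spark.conf.set(
    "fs.azure.account.key.<storageaccount>.dfs.core.windows.net",
    storage_account_key,
)

print(storage_account_key)  # Databricks prints [REDACTED], not the value
```

Databricks redacts secret values in notebook output, so even an accidental print shows [REDACTED] rather than the key itself.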

Creating an App Configuration resource

Azure App Configuration is a service for managing application configuration settings, such as feature flags or any other static keys that are required across applications. It provides a centralized location to store configurations that can be referred to later. When you are working in a cloud environment, components are distributed across multiple machines, and keeping a separate copy of the application configuration with each component can cause issues such as inconsistent configuration values, which are hard to troubleshoot. In such scenarios, you can use Azure App Configuration to store your application settings in a centralized location where they can be changed dynamically without requiring a redeployment.

In this recipe, you will learn how to create an App Configuration resource in the Azure portal using a UI.

Getting ready

Before starting, you need to ensure you have contributor access to the subscription or are the owner...

Using App Configuration in an Azure Databricks notebook

In this recipe, you will learn how to use the secrets stored as configuration values in the App Configuration service in an Azure Databricks notebook.

Getting ready

You can follow along by running the steps in the 5_2.Using App Config in Notebooks notebook at https://github.com/PacktPublishing/Azure-Databricks-Cookbook/tree/main/Chapter05/.

You can find the Parquet files that will be used in this recipe at https://github.com/PacktPublishing/Azure-Databricks-Cookbook/tree/main/Common/Customer/parquetFiles.

To execute the code mentioned in the notebook, we need to get the connection string of the App Configuration service we have created. The following screenshot shows how to get the connection string for your App Configuration service. The name of the App Configuration service in this example is DevAppconfigurationRes:

Figure 5.13 – Creating a key-value configuration value
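
With the connection string in hand, a notebook can read key-values through the azure-appconfiguration Python package. This is a hedged sketch, assuming the package has been installed on the cluster as a library; the connection string, key name, and storage account below are placeholders:

```python
from azure.appconfiguration import AzureAppConfigurationClient

# Placeholder: paste the connection string copied from the portal, or better,
# read it from a Key Vault-backed secret scope as in the previous recipe.
connection_string = "<your-App-Configuration-connection-string>"
client = AzureAppConfigurationClient.from_connection_string(connection_string)

# Read a single key-value pair, for example a storage account key stored
# under a hypothetical key name.
setting = client.get_configuration_setting(key="storage-account-key")

# Use the retrieved value to configure access to ADLS Gen2
# (placeholder account name).
spark.conf.set(
    "fs.azure.account.key.<storageaccount>.dfs.core.windows.net",
    setting.value,
)
```

In practice, you would keep the connection string itself in Key Vault rather than pasting it into a notebook, for the same reasons discussed earlier in this chapter.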

In the next...

Creating a Log Analytics workspace

As more and more Azure services are used to build enterprise solutions, there needs to be a centralized location where we can collect performance and application metrics for the various services, which helps us understand how each service is functioning. Every Azure resource has a set of resource logs that provide information about the operations performed on that service, as well as its health. With the help of Azure Monitor Logs, we can collect data from resource logs, as well as performance metrics from applications and virtual machines, in a common Log Analytics workspace. We can use these metrics to identify trends, understand the performance of a service, or even find anomalies. We can analyze the data captured in a Log Analytics workspace using log queries written in Kusto Query Language (KQL) and perform various types of data analytics operations. In...

Integrating a Log Analytics workspace with Azure Databricks

In this recipe, you will learn how to integrate a Log Analytics workspace with Azure Databricks. You will also learn how to send Databricks service metrics and Databricks application metrics to the Log Analytics workspace.

Getting ready

Before starting, ensure that you have Contributor access to the subscription. To send Databricks service metrics to a Log Analytics workspace, you will need the Azure Databricks Premium tier.

You can follow along by running the steps in the 5_1.Using Azure Key Vault Secrets in Notebooks notebook. This will allow you to send Databricks log data to a Log Analytics workspace (https://github.com/PacktPublishing/Azure-Databricks-Cookbook/tree/main/Chapter05/).

How to do it…

To send metrics to the Log Analytics workspace, we need to turn on diagnostics in the Azure Databricks service. Following this, we must send the service or application...
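
While service metrics flow through the diagnostic settings described above, one hedged way to push custom application metrics from a notebook is the Log Analytics HTTP Data Collector API. This is not necessarily how the book's notebook does it; it is a sketch against the documented REST API, with the secret scope, key names, and log type as placeholders:

```python
import base64
import datetime
import hashlib
import hmac
import json

import requests

# Placeholder secret scope and key names; the workspace ID and primary key
# come from the Log Analytics workspace settings in the portal.
workspace_id = dbutils.secrets.get(scope="keyvault-scope", key="la-workspace-id")
shared_key = dbutils.secrets.get(scope="keyvault-scope", key="la-primary-key")

def build_signature(date: str, content_length: int) -> str:
    """Build the SharedKey authorization header per the Data Collector API docs."""
    string_to_sign = (
        f"POST\n{content_length}\napplication/json\nx-ms-date:{date}\n/api/logs"
    )
    digest = hmac.new(
        base64.b64decode(shared_key), string_to_sign.encode("utf-8"), hashlib.sha256
    ).digest()
    return f"SharedKey {workspace_id}:{base64.b64encode(digest).decode('utf-8')}"

def post_metrics(records: list, log_type: str = "DatabricksAppMetrics") -> None:
    """Send a batch of JSON records to a custom table named <log_type>_CL."""
    body = json.dumps(records)
    rfc1123_date = datetime.datetime.utcnow().strftime("%a, %d %b %Y %H:%M:%S GMT")
    headers = {
        "Content-Type": "application/json",
        "Authorization": build_signature(rfc1123_date, len(body)),
        "Log-Type": log_type,
        "x-ms-date": rfc1123_date,
    }
    url = (
        f"https://{workspace_id}.ods.opinsights.azure.com"
        "/api/logs?api-version=2016-04-01"
    )
    requests.post(url, data=body, headers=headers).raise_for_status()

post_metrics([{"stage": "ingest", "rowsProcessed": 1000}])
```

Records sent this way land in a custom table in the workspace and can then be queried with KQL alongside the diagnostic logs emitted by the Databricks service itself.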
