Azure Data Engineering Cookbook

By Ahmad Osama

About this book

Data engineering is one of the fastest-growing job areas, as data engineers are the ones who ensure that data is extracted and provisioned and is of the highest quality for data analysis. This book uses various Azure services to implement and maintain the infrastructure needed to extract data from multiple sources, and then transform and load it for data analysis.

It takes you through different techniques for performing big data engineering using Microsoft Azure Data services. It begins by showing you how Azure Blob storage can be used for storing large amounts of unstructured data and how to use it for orchestrating a data workflow. You'll then work with different Cosmos DB APIs and Azure SQL Database. Moving on, you'll discover how to provision an Azure Synapse database and find out how to ingest and analyze data in Azure Synapse. As you advance, you'll cover the design and implementation of batch processing solutions using Azure Data Factory, and understand how to manage, maintain, and secure Azure Data Factory pipelines. You’ll also design and implement batch processing solutions using Azure Databricks and then manage and secure Azure Databricks clusters and jobs. In the concluding chapters, you'll learn how to process streaming data using Azure Stream Analytics and Data Explorer.

By the end of this Azure book, you'll have gained the knowledge you need to be able to orchestrate batch and real-time ETL workflows in Microsoft Azure.

Publication date: April 2021
Publisher: Packt
Pages: 454
ISBN: 9781800206557

 

Chapter 1: Working with Azure Blob Storage

Azure Blob storage is a highly scalable and durable object-based cloud storage solution from Microsoft. Blob storage is optimized to store large amounts of unstructured data such as log files, images, video, and audio.

It is an important building block of an Azure data engineering solution, where it can serve as both a data source and a destination. As a source, it can be used to stage unstructured data, such as application logs, images, and video and audio files. As a destination, it can be used to store the result of a data pipeline.

In this chapter, we'll learn to read, write, manage, and secure Azure Blob storage and will cover the following recipes:

  • Provisioning an Azure storage account using the Azure portal
  • Provisioning an Azure storage account using PowerShell
  • Creating containers and uploading files to Azure Blob storage using PowerShell
  • Managing blobs in Azure Storage using PowerShell
  • Managing...
 

Technical requirements

For this chapter, the following are required:

  • An Azure subscription
  • Azure PowerShell

The code samples can be found at https://github.com/PacktPublishing/azure-data-engineering-cookbook.

 

Provisioning an Azure storage account using the Azure portal

In this recipe, we'll provision an Azure storage account using the Azure portal. Azure Blob storage is one of the four storage services available in Azure Storage. The other three are Table storage, Queue storage, and Azure Files (file shares).

Getting ready

Before you start, open a web browser and go to the Azure portal at https://portal.azure.com.

How to do it…

The steps for this recipe are as follows:

  1. In the Azure portal, select Create a resource and choose Storage account – blob, file, table, queue. Alternatively, search for storage accounts in the search bar. Do not choose Storage accounts (classic).
  2. A new page, Create storage account, will open. There are five tabs on the Create storage account page – Basics, Networking, Advanced, Tags, and Review + create.
  3. In the Basics tab, we need to provide the Azure Subscription, Resource group, Storage account name, Location, Performance, Account kind,...
 

Provisioning an Azure storage account using PowerShell

PowerShell is a scripting language used to programmatically manage various tasks. In this recipe, we'll learn to provision an Azure storage account using PowerShell.

Getting ready

Before you start, log in to your Azure subscription from the PowerShell console. To do this, execute the following command in a new PowerShell window:

Connect-AzAccount

Then, follow the instructions to log in to the Azure account.

How to do it…

The steps for this recipe are as follows:

  1. Execute the following command in a PowerShell window to create a new resource group. If you want to create the Azure storage account in an existing resource group, this step isn't required:
    New-AzResourceGroup -Name Packtade-powershell -Location 'East US'

    You should get the following output:

    Figure 1.5 – Creating a new resource group

  2. Execute the following command to create a new Azure storage account...
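
The command for this step is cut off in this excerpt. As a rough sketch of what creating the account might look like, assuming the Az.Storage module is installed and you are already logged in: the resource group and location come from step 1, the account name packtadestorage is borrowed from the later recipes, and the SKU, kind, and access tier are illustrative assumptions rather than the book's settings.

```powershell
# Resource group and location as created in step 1.
$resourcegroup = "Packtade-powershell"
$location = "East US"

# Assumed account name (used in later recipes). Storage account names must be
# globally unique and consist of 3-24 lowercase letters and digits.
$storageaccountname = "packtadestorage"

# Create a general-purpose v2 account with locally redundant storage.
# SkuName, Kind, and AccessTier are illustrative choices, not the book's values.
New-AzStorageAccount -ResourceGroupName $resourcegroup `
    -Name $storageaccountname `
    -Location $location `
    -SkuName Standard_LRS `
    -Kind StorageV2 `
    -AccessTier Hot
```

If the name is already taken by another Azure customer, the command fails, so you may need to pick a different value for $storageaccountname.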
 

Creating containers and uploading files to Azure Blob storage using PowerShell

In this recipe, we'll create a new container and upload files to Azure Blob storage using PowerShell.

Getting ready

Before you start, perform the following steps:

  1. Make sure you have an existing Azure storage account. If not, create one by following the Provisioning an Azure storage account using PowerShell recipe.
  2. Log in to your Azure subscription in PowerShell. To log in, run the Connect-AzAccount command in a new PowerShell window and follow the instructions.

How to do it…

The steps for this recipe are as follows:

  1. Execute the following commands to create the container in an Azure storage account:
    $storageaccountname="packtadestorage"
    $containername="logfiles"
    $resourcegroup="packtade"
    #Get the Azure Storage account context
    $storagecontext = (Get-AzStorageAccount -ResourceGroupName $resourcegroup -Name $storageaccountname...
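
The listing above is truncated in this excerpt. The following is a minimal sketch of how the container could be created and a file uploaded, assuming the context is taken from the storage account's Context property. The local file path and blob name are hypothetical and are not taken from the book.

```powershell
$storageaccountname = "packtadestorage"
$containername = "logfiles"
$resourcegroup = "packtade"

# Hypothetical local file, used only for illustration.
$localfile = "C:\ade\Logfile1.txt"

# Get the storage account context, which the blob cmdlets use to authenticate.
$storagecontext = (Get-AzStorageAccount -ResourceGroupName $resourcegroup -Name $storageaccountname).Context

# Create a private container (no anonymous access).
New-AzStorageContainer -Name $containername -Context $storagecontext -Permission Off

# Upload the local file to the container as a blob named Logfile1.txt.
Set-AzStorageBlobContent -File $localfile -Container $containername -Blob "Logfile1.txt" -Context $storagecontext
```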
 

Managing blobs in Azure Storage using PowerShell

In this recipe, we'll learn to perform various management tasks on an Azure blob. We'll copy blobs between containers, list all blobs in a container, modify a blob access tier, download blobs from Microsoft Azure to a local system, and delete a blob from Azure Storage.

Getting ready

Before you start, perform the following steps:

  1. Make sure you have an existing Azure storage account. If not, create one by following the Provisioning an Azure storage account using PowerShell recipe.
  2. Make sure you have an existing Azure storage container. If not, create one by following the Creating containers and uploading files to Azure Blob storage using PowerShell recipe.
  3. Log in to your Azure subscription in PowerShell. To log in, run the Connect-AzAccount command in a new PowerShell window and follow the instructions.

How to do it…

Let's begin by copying blobs between containers.

Copying blobs...
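
The remainder of this recipe is not included in the excerpt. As a hedged sketch of the copy, list, download, and delete tasks described above, the following uses standard Az.Storage cmdlets. The destination container, blob name, and download folder are hypothetical, and both containers are assumed to exist already.

```powershell
$resourcegroup = "packtade"
$storageaccountname = "packtadestorage"
$storagecontext = (Get-AzStorageAccount -ResourceGroupName $resourcegroup -Name $storageaccountname).Context

# Hypothetical names: both containers must already exist.
$sourcecontainer = "logfiles"
$destcontainer = "textfiles"
$blobname = "Logfile1.txt"

# Start a copy between containers in the same account. The copy is asynchronous.
Start-AzStorageBlobCopy -SrcContainer $sourcecontainer -SrcBlob $blobname `
    -DestContainer $destcontainer -DestBlob $blobname -Context $storagecontext

# Block until the copy has finished before touching the source blob.
Get-AzStorageBlobCopyState -Container $destcontainer -Blob $blobname -Context $storagecontext -WaitForComplete

# List all blobs in the destination container.
Get-AzStorageBlob -Container $destcontainer -Context $storagecontext |
    Select-Object Name, Length, LastModified

# Download the blob to a local folder (assumed to exist).
Get-AzStorageBlobContent -Container $destcontainer -Blob $blobname `
    -Destination "C:\ade\downloads\" -Context $storagecontext

# Delete the original blob from the source container.
Remove-AzStorageBlob -Container $sourcecontainer -Blob $blobname -Context $storagecontext
```

Changing a blob's access tier is left out of this sketch because the call used for it differs between Az.Storage versions.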

 

Managing an Azure blob snapshot in Azure Storage using PowerShell

An Azure blob snapshot is a point-in-time copy of a blob. A snapshot can be used as a blob backup. In this recipe, we'll learn to create, list, promote, and delete an Azure blob snapshot.

Getting ready

Before you start, perform the following steps:

  1. Make sure you have an existing Azure storage account. If not, create one by following the Provisioning an Azure storage account using PowerShell recipe.
  2. Make sure you have an existing Azure storage container. If not, create one by following the Creating containers and uploading files to Azure Blob storage using PowerShell recipe.
  3. Make sure you have existing blobs/files in an Azure storage container. If not, you can upload blobs according to the previous recipe.
  4. Log in to your Azure subscription in PowerShell. To log in, run the Connect-AzAccount command in a new PowerShell window and follow the instructions.

How to do it…

Let...

 

Configuring blob life cycle management for blob objects using the Azure portal

Azure Storage provides different blob access tiers such as Hot, Cool, and Archive. Each access tier has a different storage and data transfer cost. Applying a proper life cycle rule to move a blob among different access tiers helps optimize the cost. In this recipe, we'll learn to apply a life cycle rule to a blob using the Azure portal.

Getting ready

Before you start, perform the following steps:

  1. Make sure you have an existing Azure storage account. If not, create one by following the Provisioning an Azure storage account using PowerShell recipe.
  2. Make sure you have an existing Azure storage container. If not, create one by following the Creating containers and uploading files to Azure Blob storage using PowerShell recipe.
  3. Make sure you have existing blobs/files in an Azure storage container. If not, you can upload blobs in accordance with the previous recipe. Then, log in to the...
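
This recipe uses the Azure portal, and its steps are not included in the excerpt. For comparison, a similar life cycle rule can be sketched with the Az.Storage management policy cmdlets. The rule name, prefix, and day thresholds below are assumptions chosen for illustration, and the sketch assumes a general-purpose v2 (or Blob storage) account.

```powershell
$resourcegroup = "packtade"
$storageaccountname = "packtadestorage"

# Move blobs to Cool 30 days after the last modification, then to Archive after 90 days.
# Both thresholds are illustrative.
$action = Add-AzStorageAccountManagementPolicyAction -BaseBlobAction TierToCool `
    -daysAfterModificationGreaterThan 30
$action = Add-AzStorageAccountManagementPolicyAction -InputObject $action `
    -BaseBlobAction TierToArchive -daysAfterModificationGreaterThan 90

# Limit the rule to block blobs in the (assumed) logfiles container.
$filter = New-AzStorageAccountManagementPolicyFilter -PrefixMatch "logfiles/" -BlobType blockBlob

# Hypothetical rule name.
$rule = New-AzStorageAccountManagementPolicyRule -Name "TierLogfiles" -Action $action -Filter $filter

# Apply the policy. This replaces any management policy already set on the account.
Set-AzStorageAccountManagementPolicy -ResourceGroupName $resourcegroup `
    -StorageAccountName $storageaccountname -Rule $rule
```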
 

Configuring a firewall for an Azure storage account using the Azure portal

Storage account access can be restricted to an IP or range of IPs by whitelisting the allowed IPs in the storage account firewall. In this recipe, we'll learn to restrict access to an Azure storage account using a firewall.

Getting ready

Before you start, perform the following steps:

  1. Open a web browser and go to the Azure portal at https://portal.azure.com.
  2. Make sure you have an existing storage account. If not, create one using the Provisioning an Azure storage account using the Azure portal recipe.

How to do it…

To provide access to an IP or range of IPs, follow the given steps:

  1. In the Azure portal, locate and open the Azure storage account. In our case, the storage account is packtadestorage.
  2. On the storage account page, under the Settings section, locate and select Firewalls and virtual networks.
  3. As the packtadestorage account was created with public...
 

Configuring virtual networks for an Azure storage account using the Azure portal

A storage account can be public (accessible to everyone), public with access restricted to an IP or range of IPs, or private with access limited to selected virtual networks. In this recipe, we'll learn to restrict access to an Azure storage account to a virtual network.

Getting ready

Before you start, perform the following steps:

  1. Open a web browser and go to the Azure portal at https://portal.azure.com.
  2. Make sure you have an existing storage account. If not, create one using the Provisioning an Azure storage account using the Azure portal recipe.

How to do it…

To restrict access to a virtual network, follow the given steps:

  1. In the Azure portal, locate and open the storage account. In our case, it's packtadestorage. On the storage account page, under the Settings section, locate and select Firewalls and virtual networks | Selected networks:

    Figure 1.25 – Azure Storage...

 

Configuring a firewall for an Azure storage account using PowerShell

In this recipe, we'll enable firewall rules for an Azure storage account using PowerShell.

Getting ready

Before you start, perform the following steps:

  1. Make sure you have an existing Azure storage account. If not, create one by following the Provisioning an Azure storage account using PowerShell recipe.
  2. Log in to your Azure subscription in PowerShell. To log in, run the Connect-AzAccount command in a new PowerShell window and follow the instructions.

How to do it…

The steps for this recipe are as follows:

  1. Execute the following command to deny access from all networks:
    Update-AzStorageAccountNetworkRuleSet -ResourceGroupName packtADE -Name packtadestorage -DefaultAction Deny

    You should get a similar output to that shown in the following screenshot:

    Figure 1.29 – Denying access to all networks

  2. Execute the following commands to add a firewall rule for the client...
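
The commands for this step are cut off in this excerpt. A minimal sketch of adding an IP rule and then reviewing it might look like the following. The IP address is a placeholder from a documentation-only range, so replace it with your client's public IP or a CIDR range.

```powershell
$resourcegroup = "packtADE"
$storageaccountname = "packtadestorage"

# Placeholder public IP (documentation range); replace with your own address.
$clientip = "203.0.113.10"

# Allow the client IP through the storage account firewall.
Add-AzStorageAccountNetworkRule -ResourceGroupName $resourcegroup `
    -Name $storageaccountname -IPAddressOrRange $clientip

# Review the IP rules now configured on the account.
(Get-AzStorageAccountNetworkRuleSet -ResourceGroupName $resourcegroup `
    -Name $storageaccountname).IpRules
```

Because step 1 sets the default action to Deny, requests from any address that is not listed in a rule are rejected.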
 

Configuring virtual networks for an Azure storage account using PowerShell

In this recipe, we'll learn to limit access to an Azure storage account to a particular virtual network using PowerShell.

Getting ready

Before you start, perform the following steps:

  1. Make sure you have an existing Azure storage account. If not, create one by following the Provisioning an Azure storage account using PowerShell recipe.
  2. Log in to your Azure subscription in PowerShell. To log in, run the Connect-AzAccount command in a new PowerShell window and follow the instructions.

How to do it…

The steps for this recipe are as follows:

  1. Execute the following command to disable all networks for an Azure storage account:
    $resourcegroup = "packtADE"
    $location="eastus"
    Update-AzStorageAccountNetworkRuleSet -ResourceGroupName $resourcegroup -Name packtadestorage -DefaultAction Deny
  2. Execute the following command to create a new virtual network ...
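
The remaining commands are truncated in this excerpt. As a sketch under stated assumptions, the virtual network could be created with a subnet that has the Microsoft.Storage service endpoint enabled, and that subnet then added to the storage account's network rules. The network name, subnet name, and address ranges are hypothetical.

```powershell
$resourcegroup = "packtADE"
$location = "eastus"

# Hypothetical subnet with the storage service endpoint enabled.
$subnetconfig = New-AzVirtualNetworkSubnetConfig -Name "default" `
    -AddressPrefix "10.1.0.0/24" -ServiceEndpoint "Microsoft.Storage"

# Hypothetical virtual network containing that subnet.
$vnet = New-AzVirtualNetwork -ResourceGroupName $resourcegroup -Location $location `
    -Name "packtVnet" -AddressPrefix "10.1.0.0/16" -Subnet $subnetconfig

# Look up the subnet's resource ID and allow it on the storage account.
$subnet = Get-AzVirtualNetworkSubnetConfig -Name "default" -VirtualNetwork $vnet
Add-AzStorageAccountNetworkRule -ResourceGroupName $resourcegroup `
    -Name packtadestorage -VirtualNetworkResourceId $subnet.Id
```

The service endpoint on the subnet is what lets the storage account identify traffic as coming from that subnet, so the rule has no effect without it.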
 

Creating an alert to monitor an Azure storage account

We can create alerts on several of the available metrics to monitor an Azure storage account. To create an alert, we need to define the trigger condition and the action to be performed when the alert is triggered. In this recipe, we'll create an alert that sends an email if the used capacity metric for an Azure storage account exceeds 5 MB. The 5 MB threshold is not a recommended value; it is deliberately kept low to demonstrate the alert functionality.

Getting ready

Before you start, perform the following steps:

  1. Open a web browser and log in to the Azure portal at https://portal.azure.com.
  2. Make sure you have an existing storage account. If not, create one using the Provisioning an Azure storage account using the Azure portal recipe.

How to do it…

Follow the given steps to create an alert:

  1. In the Azure portal, locate and open the storage account. In our case, the storage account is packtadestorage...
 

Securing an Azure storage account with SAS using PowerShell

A shared access signature (SAS) provides more granular access to blobs by specifying an expiry limit, specific permissions, and IPs.

Using SAS, we can grant different permissions to users or applications on different blobs, based on the requirement. For example, if an application needs to read one file/blob from a container, instead of providing access to all the files in the container, we can use SAS to provide read access to only the required blob.

In this recipe, we'll learn to create and use SAS to access blobs.

Getting ready

Before you start, go through the following steps:

  • Make sure you have an existing Azure storage account. If not, create one by following the Provisioning an Azure storage account using PowerShell recipe.
  • Make sure you have an existing Azure storage container. If not, create one by following the Creating containers and uploading files to Azure Blob storage using PowerShell recipe...
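
The steps of this recipe are not included in the excerpt. As a minimal sketch of the idea described above, the following creates a read-only SAS for a single blob and uses it to download that blob. The blob name, the two-hour lifetime, and the output path are assumptions for illustration.

```powershell
$resourcegroup = "packtade"
$storageaccountname = "packtadestorage"
$containername = "logfiles"

# Hypothetical blob that is assumed to exist in the container.
$blobname = "Logfile1.txt"

$storagecontext = (Get-AzStorageAccount -ResourceGroupName $resourcegroup -Name $storageaccountname).Context

# Create a read-only SAS for this one blob, valid for two hours over HTTPS only.
# -FullUri returns the blob URL with the SAS token appended.
$sasuri = New-AzStorageBlobSASToken -Container $containername -Blob $blobname `
    -Permission r `
    -StartTime (Get-Date) `
    -ExpiryTime (Get-Date).AddHours(2) `
    -Protocol HttpsOnly `
    -FullUri `
    -Context $storagecontext

# Anyone holding the URL can read the blob until the token expires.
Invoke-WebRequest -Uri $sasuri -OutFile "C:\ade\Logfile1.txt"
```

To also restrict the token to specific callers, New-AzStorageBlobSASToken accepts an -IPAddressOrRange parameter.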

About the Author

  • Ahmad Osama

    Ahmad Osama works for Pitney Bowes Pvt. Ltd. as a technical architect and is a former Microsoft Data Platform MVP. In his day job, he develops and maintains high-performance on-premises and cloud SQL Server OLTP environments, and deploys and automates tasks using PowerShell. When not working, Ahmad blogs at DataPlatformLabs and can be found glued to his Xbox.
