Azure Data Engineering Cookbook - Second Edition

By Nagaraj Venkatesan, Ahmad Osama
About this book
The famous quote 'Data is the new oil' seems more true every day as the key to most organizations' long-term success lies in extracting insights from raw data. One of the major challenges organizations face in leveraging value out of data is building performant data engineering pipelines for data visualization, ingestion, storage, and processing. This second edition of the immensely successful book by Ahmad Osama brings to you several recent enhancements in Azure data engineering and shares approximately 80 useful recipes covering common scenarios in building data engineering pipelines in Microsoft Azure. You’ll explore recipes from Azure Synapse Analytics workspaces Gen 2 and get to grips with Synapse Spark pools, SQL Serverless pools, Synapse integration pipelines, and Synapse data flows. You’ll also understand Synapse SQL Pool optimization techniques in this second edition. Besides Synapse enhancements, you’ll discover helpful tips on managing Azure SQL Database and learn about security, high availability, and performance monitoring. Finally, the book takes you through overall data engineering pipeline management, focusing on monitoring using Log Analytics and tracking data lineage using Azure Purview. By the end of this book, you’ll be able to build superior data engineering pipelines and will have an invaluable go-to guide to refer back to.
Publication date:
September 2022
Publisher
Packt
Pages
608
ISBN
9781803246789

 

Creating and Managing Data in Azure Data Lake

Azure Data Lake is a highly scalable and durable object-based cloud storage solution from Microsoft. It is optimized to store large amounts of structured and semi-structured data such as logs, application data, and documents.

Azure Data Lake can be used as a data source and destination in data engineering projects. As a source, it can be used to stage structured or semi-structured data. As a destination, it can be used to store the result of a data pipeline.

Azure Data Lake is provisioned as a storage account in Azure, capable of storing files (blobs), tables, or queues. This book will focus on Azure Data Lake storage accounts used for storing blobs/files.

In this chapter, we will learn how to provision, manage, and upload data into Data Lake accounts and will cover the following recipes:

  • Provisioning an Azure storage account using the Azure portal
  • Provisioning an Azure storage account using PowerShell
  • Creating containers...
 

Technical requirements

For this chapter, the following are required:

  • An Azure subscription
  • Azure PowerShell

The code samples can be found at https://github.com/PacktPublishing/Azure-Data-Engineering-Cookbook-2nd-edition.

 

Provisioning an Azure storage account using the Azure portal

In this recipe, we will provision an Azure storage account using the Azure portal. Azure Blob storage is one of the four storage services available in Azure Storage. The other storage services are Table, Queue, and File Share. Table storage is used to store non-relational structured data as key-value pairs, queue storage is used to store messages as queues, and file share is used for creating file share directories/mount points that can be accessed using the NFS/SMB protocols. This chapter will focus on storing data using the Blob storage service.

Getting ready

Before you start, open a web browser and go to the Azure portal at https://portal.azure.com. Ensure that you have an Azure subscription. Install Azure PowerShell on your machine; instructions for installing it can be found at https://docs.microsoft.com/en-us/powershell/azure/install-az-ps?view=azps-6.6.00.

How to do it…

The steps for this recipe...

 

Provisioning an Azure storage account using PowerShell

PowerShell is a scripting language used to programmatically manage various tasks. In this recipe, we will learn how to provision an Azure storage account using PowerShell.

Getting ready

Before you start, you need to log in to the Azure subscription from the PowerShell console. To do this, execute the following command in a new PowerShell window:

Connect-AzAccount

Then, follow the instructions to log in to the Azure account.

How to do it…

The steps for this recipe are as follows:

  1. Execute the following command in a PowerShell window to create a new resource group. If you want to create the Azure storage account in an existing resource group, this step isn't required:
    New-AzResourceGroup -Name Packtade-powershell -Location 'East US'

You should get the following output:

Figure 1.5 – Creating a new resource group

  2. Execute the following command...
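
The rest of this step is cut off in the excerpt, so the following is only a minimal sketch of how a storage account might be provisioned with Azure PowerShell, not the book's exact command. The resource group is the one created in step 1; the account name `packtadestoragedemo`, the `Standard_LRS` SKU, and the `StorageV2` kind are assumptions chosen for illustration.

```powershell
# Assumed values for illustration; only the resource group name comes from step 1.
$resourcegroup = "Packtade-powershell"
$storageaccountname = "packtadestoragedemo"   # hypothetical; must be globally unique, 3-24 lowercase letters and digits
$location = "East US"

# Create a general-purpose v2 storage account with locally redundant storage.
New-AzStorageAccount `
    -ResourceGroupName $resourcegroup `
    -Name $storageaccountname `
    -Location $location `
    -SkuName Standard_LRS `
    -Kind StorageV2

# Confirm that the account exists and check its provisioning state.
Get-AzStorageAccount -ResourceGroupName $resourcegroup -Name $storageaccountname |
    Select-Object StorageAccountName, Location, Kind, ProvisioningState
```

If the chosen name is already taken by another Azure customer, `New-AzStorageAccount` fails, so it can help to check the name first with `Get-AzStorageAccountNameAvailability -Name $storageaccountname`.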
 

Creating containers and uploading files to Azure Blob storage using PowerShell

In this recipe, we will create a new container and upload files to Azure Blob storage using PowerShell.

Getting ready

Before you start, perform the following steps:

  1. Make sure you have an existing Azure storage account. If not, create one by following the Provisioning an Azure storage account using the Azure portal recipe.
  2. Log in to your Azure subscription in PowerShell. To log in, run the Connect-AzAccount command in a new PowerShell window and follow the instructions.

How to do it…

The steps for this recipe are as follows:

  1. Execute the following commands to create the container in an Azure storage account:
    $storageaccountname="packtadestoragev2"
    $containername="logfiles"
    $resourcegroup="packtadestorage"
    #Get the Azure Storage account context
    $storagecontext = (Get-AzStorageAccount -ResourceGroupName $resourcegroup -Name $storageaccountname...
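
The listing above is cut off in the excerpt. As a hedged sketch of what the recipe title describes, creating a container and uploading a file could look like the following. The variable values are taken from the listing; reading the `.Context` property, the `-Permission Off` setting, the Hot tier, and the local path `C:\ADECookbook\Logfiles\Logfile1.txt` are assumptions, not the book's code.

```powershell
$storageaccountname = "packtadestoragev2"
$containername = "logfiles"
$resourcegroup = "packtadestorage"

# Get the storage account context, which carries the credentials
# used by the container and blob cmdlets below.
$storagecontext = (Get-AzStorageAccount -ResourceGroupName $resourcegroup -Name $storageaccountname).Context

# Create the container with anonymous access switched off.
New-AzStorageContainer -Name $containername -Context $storagecontext -Permission Off

# Upload one local file (hypothetical path) as a blob in the Hot access tier.
Set-AzStorageBlobContent `
    -File "C:\ADECookbook\Logfiles\Logfile1.txt" `
    -Container $containername `
    -Blob "Logfile1.txt" `
    -Context $storagecontext `
    -StandardBlobTier Hot

# List the blobs in the container to confirm the upload.
Get-AzStorageBlob -Container $containername -Context $storagecontext |
    Select-Object Name, Length, LastModified
```

To upload a whole folder, one option is to pipe `Get-ChildItem -File` over the folder into `Set-AzStorageBlobContent`, which then uses each file name as the blob name.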
 

Managing blobs in Azure Storage using PowerShell

In this recipe, we will learn how to perform various management tasks on an Azure blob. We will perform operations such as copying, listing, modifying, deleting, and downloading files from Azure Blob storage.

Getting ready

Before you start, perform the following steps:

  1. Make sure you have an existing Azure storage account. If not, create one by following the Provisioning an Azure storage account using PowerShell recipe.
  2. Make sure you have an existing Azure storage container. If not, create one by following the Creating containers and uploading files to Azure Blob storage using PowerShell recipe.
  3. Log in to your Azure subscription in PowerShell. To log in, run the Connect-AzAccount command in a new PowerShell window and follow the instructions.

How to do it…

Let's perform the following operations in this recipe:

  1. Copy files/blobs between two blob storage containers.
  2. List files from...
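
The list of operations is cut off in the excerpt. The sketch below illustrates four of the operations named in the recipe introduction (copying, listing, downloading, and deleting) under stated assumptions: the account, resource group, and `logfiles` container come from the previous recipe, while the `textfiles` container, the blob name `Logfile1.txt`, and the download path are hypothetical.

```powershell
$storagecontext = (Get-AzStorageAccount -ResourceGroupName "packtadestorage" -Name "packtadestoragev2").Context
$sourcecontainer = "logfiles"
$destcontainer = "textfiles"    # hypothetical second container
$blobname = "Logfile1.txt"      # hypothetical blob that already exists in the source container

# Create the destination container for the copy.
New-AzStorageContainer -Name $destcontainer -Context $storagecontext -Permission Off

# Copy: start a server-side copy between the two containers, then wait for it to finish.
Start-AzStorageBlobCopy `
    -SrcContainer $sourcecontainer -SrcBlob $blobname `
    -DestContainer $destcontainer -DestBlob $blobname `
    -Context $storagecontext -DestContext $storagecontext
Get-AzStorageBlobCopyState -Container $destcontainer -Blob $blobname -Context $storagecontext -WaitForComplete

# List: show the blobs now present in the destination container.
Get-AzStorageBlob -Container $destcontainer -Context $storagecontext |
    Select-Object Name, Length, LastModified

# Download: save the copied blob to a local file (hypothetical path), overwriting any existing file.
Get-AzStorageBlobContent -Container $destcontainer -Blob $blobname `
    -Destination "C:\ADECookbook\Downloads\Logfile1.txt" -Context $storagecontext -Force

# Delete: remove the copied blob from the destination container.
Remove-AzStorageBlob -Container $destcontainer -Blob $blobname -Context $storagecontext
```

The copy runs on the service side, so the data does not pass through the local machine; waiting on the copy state before listing or downloading avoids reading a blob that is still being written.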
 

Configuring blob lifecycle management for blob objects using the Azure portal

Azure Storage provides different blob access tiers such as Hot, Cool, and Archive. Each access tier has a different storage and data transfer cost. Applying a proper lifecycle rule to move a blob among different access tiers helps optimize the cost. In this recipe, we will learn how to apply a lifecycle rule to a blob using the Azure portal.

Getting ready

Before you start, perform the following steps:

  1. Make sure you have an existing Azure storage account. If not, create one by following the Provisioning an Azure storage account using PowerShell recipe.
  2. Make sure you have an existing Azure storage container. If not, create one by following the Creating containers and uploading files to Azure Blob storage using PowerShell recipe.
  3. Make sure you have existing blobs/files in an Azure storage container. If not, you can upload blobs in accordance with the previous recipe. Then, log in to the...
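
This recipe configures the rule in the Azure portal. For comparison, a similar rule can be scripted with Azure PowerShell. The sketch below is an assumption-laden illustration rather than the book's method: the rule name `age-out-logfiles`, the `logfiles/` prefix, and the 30, 90, and 365-day thresholds are all hypothetical.

```powershell
$resourcegroup = "packtadestorage"
$storageaccountname = "packtadestoragev2"

# Build the actions, all measured in days since the blob was last modified:
# move to Cool after 30 days, to Archive after 90 days, and delete after 365 days.
$action = Add-AzStorageAccountManagementPolicyAction -BaseBlobAction TierToCool -DaysAfterModificationGreaterThan 30
$action = Add-AzStorageAccountManagementPolicyAction -InputObject $action -BaseBlobAction TierToArchive -DaysAfterModificationGreaterThan 90
$action = Add-AzStorageAccountManagementPolicyAction -InputObject $action -BaseBlobAction Delete -DaysAfterModificationGreaterThan 365

# Limit the rule to block blobs whose path starts with the logfiles container.
$filter = New-AzStorageAccountManagementPolicyFilter -PrefixMatch "logfiles/" -BlobType blockBlob

# Combine the actions and the filter into a named rule.
$rule = New-AzStorageAccountManagementPolicyRule -Name "age-out-logfiles" -Action $action -Filter $filter

# Apply the policy. This replaces any lifecycle policy already set on the account.
Set-AzStorageAccountManagementPolicy -ResourceGroupName $resourcegroup -StorageAccountName $storageaccountname -Rule $rule
```

The applied policy can be inspected afterward with `Get-AzStorageAccountManagementPolicy -ResourceGroupName $resourcegroup -StorageAccountName $storageaccountname`. Lifecycle rules are evaluated by the platform on a schedule, so tier changes are not immediate.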
About the Authors
  • Nagaraj Venkatesan

    Nagaraj Venkatesan works as a cloud solution architect at Microsoft. At Microsoft, he works with some of the largest companies in the world, solving their complex data engineering problems and helping them build effective solutions using cutting-edge technologies based on Azure. Nagaraj, based out of Singapore, is a popular member of the data and AI community and is a regular speaker at several international data and AI conferences. He is a two-time Microsoft Most Valuable Professional (MVP) award winner, in 2016 and 2017. Nagaraj shares his technical expertise through his blog and on his YouTube channel called DataChannel. He also holds a master’s degree in computing from the National University of Singapore.

  • Ahmad Osama

    Ahmad Osama works for Pitney Bowes Pvt. Ltd. as a technical architect and is a former Microsoft Data Platform MVP. In his day job, he develops and maintains highly performant on-premises and cloud SQL Server OLTP environments, and deploys and automates tasks using PowerShell. When not working, Ahmad blogs at DataPlatformLabs and can be found glued to his Xbox.
