Creating and Managing Data in Azure Data Lake
Azure Data Lake is a highly scalable and durable object-based cloud storage solution from Microsoft. It is optimized to store large amounts of structured and semi-structured data such as logs, application data, and documents.
Azure Data Lake can be used as a data source and destination in data engineering projects. As a source, it can be used to stage structured or semi-structured data. As a destination, it can be used to store the result of a data pipeline.
Azure Data Lake is provisioned as a storage account in Azure, capable of storing files (blobs), tables, or queues. This book will focus on Azure Data Lake storage accounts used for storing blobs/files.
In this chapter, we will learn how to provision, manage, and upload data into Data Lake accounts and will cover the following recipes:
- Provisioning an Azure storage account using the Azure portal
- Provisioning an Azure storage account using PowerShell
- Creating containers...
Technical requirements
For this chapter, the following are required:
- An Azure subscription
- Azure PowerShell
The code samples can be found at https://github.com/PacktPublishing/Azure-Data-Engineering-Cookbook-2nd-edition.
Provisioning an Azure storage account using the Azure portal
In this recipe, we will provision an Azure storage account using the Azure portal. Azure Blob storage is one of the four storage services available in Azure Storage. The other storage services are Table, Queue, and File Share. Table storage stores non-relational structured data as key-value pairs, Queue storage stores messages in queues, and File Share provides file share directories/mount points that can be accessed using the NFS/SMB protocols. This chapter will focus on storing data using the Blob storage service.
Getting ready
Before you start, open a web browser and go to the Azure portal at https://portal.azure.com. Ensure that you have an Azure subscription. Install Azure PowerShell on your machine; instructions for installing it can be found at https://docs.microsoft.com/en-us/powershell/azure/install-az-ps?view=azps-6.6.00.
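Before following the installation link, it can help to check whether Azure PowerShell is already present. The following is a minimal sketch, assuming the Az module is installed from the PowerShell Gallery for the current user only; the choice of scope and repository is an assumption, not a requirement of this recipe.

```powershell
# Look for an existing installation of the Az module and keep the newest version found.
$azModule = Get-Module -Name Az -ListAvailable |
    Sort-Object Version -Descending |
    Select-Object -First 1

if ($null -eq $azModule) {
    # Assumption: install for the current user only, so no administrator rights are needed.
    Install-Module -Name Az -Repository PSGallery -Scope CurrentUser -Force
}
else {
    Write-Output "Az module version $($azModule.Version) is already installed."
}
```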
How to do it…
The steps for this recipe...
Provisioning an Azure storage account using PowerShell
PowerShell is a command-line shell and scripting language used to automate administrative tasks, including the management of Azure resources. In this recipe, we will learn how to provision an Azure storage account using PowerShell.
Getting ready
Before you start, you need to log in to the Azure subscription from the PowerShell console. To do this, execute the following command in a new PowerShell window:
Connect-AzAccount
Then, follow the instructions to log in to the Azure account.
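If the account has access to more than one subscription, it is worth confirming which one the session is using before creating resources. The following is a small sketch of that check; the subscription value is a placeholder and must be replaced with your own.

```powershell
# Show the account and subscription that the current session is using.
Get-AzContext

# List every subscription that the signed-in account can access.
Get-AzSubscription

# Switch to a specific subscription.
# The value below is a placeholder, not a real subscription name or ID.
Set-AzContext -Subscription "<your-subscription-name-or-id>"
```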
How to do it…
The steps for this recipe are as follows:
- Execute the following command in a PowerShell window to create a new resource group. If you want to create the Azure storage account in an existing resource group, this step isn't required:
New-AzResourceGroup -Name Packtade-powershell -Location 'East US'
You should get the following output:
Figure 1.5 – Creating a new resource group
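Once the resource group exists, a storage account can be created in it with New-AzStorageAccount. The following is a minimal sketch rather than the recipe's exact listing: the storage account name, SKU, kind, and access tier shown here are assumptions chosen for illustration.

```powershell
# Assumed values for illustration. Storage account names must be globally unique,
# 3 to 24 characters long, and contain only lowercase letters and digits.
$resourcegroup      = "Packtade-powershell"
$storageaccountname = "packtadestoragev2"   # assumed name, borrowed from a later recipe
$location           = "East US"

# Create a general-purpose v2 account with locally redundant storage.
New-AzStorageAccount `
    -ResourceGroupName $resourcegroup `
    -Name $storageaccountname `
    -Location $location `
    -SkuName Standard_LRS `
    -Kind StorageV2 `
    -AccessTier Hot
```

To enable the hierarchical namespace used by Data Lake Storage Gen2, the -EnableHierarchicalNamespace $true parameter can be added to the same command when the account is created.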
Creating containers and uploading files to Azure Blob storage using PowerShell
In this recipe, we will create a new container and upload files to Azure Blob storage using PowerShell.
Getting ready
Before you start, perform the following steps:
- Make sure you have an existing Azure storage account. If not, create one by following the Provisioning an Azure storage account using the Azure portal recipe.
- Log in to your Azure subscription in PowerShell. To log in, run the Connect-AzAccount command in a new PowerShell window and follow the instructions.
How to do it…
The steps for this recipe are as follows:
- Execute the following commands to create the container in an Azure storage account:
$storageaccountname="packtadestoragev2"
$containername="logfiles"
$resourcegroup="packtadestorage"
#Get the Azure Storage account context
$storagecontext = (Get-AzStorageAccount -ResourceGroupName $resourcegroup -Name $storageaccountname...
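The listing above is cut short in this text. The following is not its continuation but an independent sketch of the same task: getting the storage context, creating a container, and uploading a file. The sample file name, its contents, and the private access setting are assumptions made for illustration.

```powershell
$storageaccountname = "packtadestoragev2"
$containername      = "logfiles"
$resourcegroup      = "packtadestorage"

# Get the storage account context, which later cmdlets use to authenticate.
$storagecontext = (Get-AzStorageAccount -ResourceGroupName $resourcegroup -Name $storageaccountname).Context

# Create the container with no anonymous access (assumed setting).
New-AzStorageContainer -Name $containername -Context $storagecontext -Permission Off

# Create a small local file to upload (hypothetical sample data).
$localfile = Join-Path ([System.IO.Path]::GetTempPath()) "sample.log"
Set-Content -Path $localfile -Value "sample log line"

# Upload the file as a blob named sample.log.
Set-AzStorageBlobContent -File $localfile -Container $containername -Blob "sample.log" -Context $storagecontext
```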
Managing blobs in Azure Storage using PowerShell
In this recipe, we will learn how to perform various management tasks on an Azure blob. We will perform operations such as copying, listing, modifying, deleting, and downloading files from Azure Blob storage.
Getting ready
Before you start, perform the following steps:
- Make sure you have an existing Azure storage account. If not, create one by following the Provisioning an Azure storage account using PowerShell recipe.
- Make sure you have an existing Azure storage container. If not, create one by following the Creating containers and uploading files to Azure Blob storage using PowerShell recipe.
- Log in to your Azure subscription in PowerShell. To log in, run the Connect-AzAccount command in a new PowerShell window and follow the instructions.
How to do it…
Let's perform the following operations in this recipe:
- Copy files/blobs between two blob storage containers.
- List files from...
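The operations listed above can be sketched with the Az.Storage cmdlets as follows. This is an illustrative outline, not the recipe's own listing: the blob name sample.log and the destination container logfiles-archive are hypothetical, and both containers are assumed to exist already.

```powershell
$storageaccountname = "packtadestoragev2"
$resourcegroup      = "packtadestorage"
$srccontainer       = "logfiles"
$destcontainer      = "logfiles-archive"   # hypothetical; must already exist
$blobname           = "sample.log"         # hypothetical blob name

$storagecontext = (Get-AzStorageAccount -ResourceGroupName $resourcegroup -Name $storageaccountname).Context

# Copy a blob from one container to another within the same account.
Start-AzStorageBlobCopy -SrcContainer $srccontainer -SrcBlob $blobname `
    -DestContainer $destcontainer -DestBlob $blobname `
    -Context $storagecontext -DestContext $storagecontext

# Wait for the copy to finish and show its final state.
Get-AzStorageBlobCopyState -Container $destcontainer -Blob $blobname -Context $storagecontext -WaitForComplete

# List the blobs in the destination container.
Get-AzStorageBlob -Container $destcontainer -Context $storagecontext |
    Select-Object Name, Length, LastModified

# Download the blob to the local temp folder, overwriting any existing file.
Get-AzStorageBlobContent -Container $destcontainer -Blob $blobname `
    -Destination ([System.IO.Path]::GetTempPath()) -Context $storagecontext -Force

# Delete the copied blob.
Remove-AzStorageBlob -Container $destcontainer -Blob $blobname -Context $storagecontext
```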
Configuring blob lifecycle management for blob objects using the Azure portal
Azure Storage provides different blob access tiers, such as Hot, Cool, and Archive. Each access tier has a different storage and data transfer cost. Applying a suitable lifecycle rule to move blobs between access tiers helps optimize storage costs. In this recipe, we will learn how to apply a lifecycle rule to a blob using the Azure portal.
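This recipe uses the Azure portal, but the same kind of rule can also be expressed in PowerShell. The following is a hedged sketch of that alternative, not the method used in this recipe: the rule name, the logfiles/ prefix, and the 30/90/365-day thresholds are all assumptions chosen for illustration.

```powershell
$storageaccountname = "packtadestoragev2"
$resourcegroup      = "packtadestorage"

# Build the actions: move to Cool after 30 days, to Archive after 90 days,
# and delete after 365 days since the last modification (assumed thresholds).
$action = Add-AzStorageAccountManagementPolicyAction -BaseBlobAction TierToCool -DaysAfterModificationGreaterThan 30
$action = Add-AzStorageAccountManagementPolicyAction -InputObject $action -BaseBlobAction TierToArchive -DaysAfterModificationGreaterThan 90
$action = Add-AzStorageAccountManagementPolicyAction -InputObject $action -BaseBlobAction Delete -DaysAfterModificationGreaterThan 365

# Restrict the rule to block blobs whose path starts with the logfiles container (assumed prefix).
$filter = New-AzStorageAccountManagementPolicyFilter -PrefixMatch "logfiles/" -BlobType blockBlob

# Combine the actions and filter into a named rule (hypothetical rule name).
$rule = New-AzStorageAccountManagementPolicyRule -Name "move-old-logs" -Action $action -Filter $filter

# Apply the rule to the storage account. This replaces any existing management policy.
Set-AzStorageAccountManagementPolicy -ResourceGroupName $resourcegroup -StorageAccountName $storageaccountname -Rule $rule
```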
Getting ready
Before you start, perform the following steps:
- Make sure you have an existing Azure storage account. If not, create one by following the Provisioning an Azure storage account using PowerShell recipe.
- Make sure you have an existing Azure storage container. If not, create one by following the Creating containers and uploading files to Azure Blob storage using PowerShell recipe.
- Make sure you have existing blobs/files in an Azure storage container. If not, you can upload blobs in accordance with the previous recipe. Then, log in to the...