You're reading from Data Engineering with Apache Spark, Delta Lake, and Lakehouse

Product type: Book
Published in: Oct 2021
Publisher: Packt
ISBN-13: 9781801077743
Edition: 1st Edition
Author (1)
Manoj Kukreja

Manoj Kukreja is a Principal Architect at Northbay Solutions who specializes in creating complex Data Lakes and Data Analytics Pipelines for large-scale organizations such as banks, insurance companies, universities, and US/Canadian government agencies. Previously, he worked for Pythian, a large managed service provider where he was leading the MySQL and MongoDB DBA group and supporting large-scale data infrastructure for enterprises across the globe. With over 25 years of IT experience, he has delivered Data Lake solutions using all major cloud providers including AWS, Azure, GCP, and Alibaba Cloud. On weekends, he trains groups of aspiring Data Engineers and Data Scientists on Hadoop, Spark, Kafka and Data Analytics on AWS and Azure Cloud.

Configuring data destinations

Once the batch and streaming ingestion pipelines are invoked, they fetch data from the data sources and write the results to the data destination. The data destination for the bronze layer is Azure Data Lake Storage Gen2:

  1. We will now use the Azure CLI to create an Azure Data Lake Storage account. Copy each of the following commands, line by line, and paste them inside the Cloud Shell window. Then, press Enter:
    STORAGEACCOUNTNAME="traininglakehouse"
    RESOURCEGROUPNAME="training_rg"
    LOCATION="eastus"
    # The owner tag value contains a space, so it is quoted to keep it a single tag.
    az storage account create --name "$STORAGEACCOUNTNAME" --resource-group "$RESOURCEGROUPNAME" --kind StorageV2 --location "$LOCATION" --hns true --sku Standard_LRS --tags "owner=data engineering" project=lakehouse environment=development

    If the preceding commands are successful, you should see an output that looks like this:

    Figure 5.19 – Output of the creation of the Data Lake Storage account

  2. You...
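
As a supplementary sketch for step 1 (not part of the book's listing): storage account names must be 3 to 24 characters long and contain only lowercase letters and digits, so it can help to check the name locally before running az storage account create, and to confirm afterwards that the hierarchical namespace was enabled. The helper names is_valid_storage_account_name and show_lake_settings are hypothetical, and the sketch assumes that isHnsEnabled is the property az storage account show reports for an account created with --hns true.

```shell
# Hypothetical helper: returns 0 when the name follows the storage account
# naming rule (3-24 characters, lowercase letters and digits only).
is_valid_storage_account_name() {
  name=$1
  len=${#name}
  # Reject names that are too short or too long.
  [ "$len" -ge 3 ] && [ "$len" -le 24 ] || return 1
  # Reject any character outside a-z and 0-9 (listed explicitly so the
  # check does not depend on the shell's locale).
  case $name in
    *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
  esac
  return 0
}

# Hypothetical helper: prints the kind, SKU, and hierarchical namespace flag
# of an existing account. Requires the Azure CLI and an active login.
show_lake_settings() {
  az storage account show \
    --name "$1" \
    --resource-group "$2" \
    --query "{kind:kind, sku:sku.name, hierarchicalNamespace:isHnsEnabled}" \
    --output table
}

STORAGEACCOUNTNAME="traininglakehouse"

if is_valid_storage_account_name "$STORAGEACCOUNTNAME"; then
  echo "name format ok: $STORAGEACCOUNTNAME"
else
  echo "invalid storage account name: $STORAGEACCOUNTNAME" >&2
fi
```

After the account has been created, running show_lake_settings "$STORAGEACCOUNTNAME" "$RESOURCEGROUPNAME" in Cloud Shell should list the settings chosen in step 1. The name check is purely local; it does not tell you whether the name is already taken by another Azure customer.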