
You're reading from Azure Data Engineering Cookbook

Product type: Book
Published in: Apr 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800206557
Edition: 1st Edition
Author: Ahmad Osama

Ahmad Osama works for Pitney Bowes Pvt. Ltd. as a technical architect and is a former Microsoft Data Platform MVP. In his day job, he develops and maintains high-performance on-premises and cloud SQL Server OLTP environments and handles deployment and task automation using PowerShell. When not working, Ahmad blogs at DataPlatformLabs and can be found glued to his Xbox.

Chapter 6: Data Flows in Azure Data Factory

In this chapter, we'll look at two data flow activities: the mapping data flow and the wrangling data flow. Data flow activities provide a code-free way to transform data on the fly as it moves through a pipeline.

Incremental data loading is a very common scenario in which only new or changed data from a source is loaded to a destination, rather than reloading the full dataset on every run. There are multiple ways to implement incremental data loading. This chapter provides one implementation that you can adapt to your own environment as required.

The wrangling data flow provides a code-free UI that can be used to clean and transform data using Power Query. This makes it easy for non-developers to clean and transform data and to build data pipelines quickly.

In this chapter, we'll cover the following recipes:

  • Implementing incremental data loading with a mapping data flow
  • Implementing a wrangling data flow

    Note...

Technical requirements

For this chapter, the following are required:

  • A Microsoft Azure subscription
  • PowerShell 7
  • Microsoft Azure PowerShell

Implementing incremental data loading with a mapping data flow

A mapping data flow provides a code-free environment for data transformation. We use the UI to design the ETL logic; when the pipeline runs, Data Factory provisions a Spark cluster, translates the data flow into Spark code, and executes it.

In this recipe, we'll look at one of the approaches to implement incremental data loading using a mapping data flow.

Getting ready

To get started, do the following:

  1. Log in to https://portal.azure.com using your Azure credentials.
  2. Open a new PowerShell prompt. Execute the following command to log in to your Azure account from PowerShell:
    Connect-AzAccount
  3. You will need an existing data factory. If you don't have one, create one by executing the following PowerShell script: ~/azure-data-engineering-cookbook\Chapter04\3_CreatingAzureDataFactory.ps1.
  4. Create an Azure storage account and upload files to the ~/Chapter06/Data folder in orders/datain...

Implementing a wrangling data flow

A wrangling data flow performs code-free data preparation at scale by integrating Power Query to prepare and transform data. The Power Query code is translated to Spark and executed on a Spark cluster.

In this recipe, we'll implement a wrangling data flow to read the orders.txt file, clean the data, calculate the total sales by country and customer name, and insert the data into an Azure SQL Database table.
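The recipe performs these steps with Power Query inside a wrangling data flow. As a rough illustration of the same logic outside Data Factory, the plain-Python sketch below reads order records, cleans them, totals sales by country and customer name, and writes the result to a table. The layout of orders.txt isn't shown in this excerpt, so the pipe delimiter, the column names (customer_name, country, amount), the sample rows, and the cleaning rules are all hypothetical, and an in-memory SQLite database stands in for Azure SQL Database.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical layout for orders.txt: pipe-delimited with a header row.
SAMPLE = """customer_name|country|amount
 Alice |UK|10.5
Bob|France|20
Alice|UK| 4.5
|UK|7
Carol|France|n/a
"""


def clean_rows(lines):
    """Yield (country, customer_name, amount), skipping rows that can't be used."""
    for row in csv.DictReader(lines, delimiter="|"):
        country = (row.get("country") or "").strip()
        customer = (row.get("customer_name") or "").strip()
        if not country or not customer:
            continue  # drop rows missing a grouping key
        try:
            amount = float((row.get("amount") or "").strip())
        except ValueError:
            continue  # drop rows whose amount is missing or not numeric
        yield country, customer, amount


def total_sales(rows):
    """Sum amounts per (country, customer_name)."""
    totals = defaultdict(float)
    for country, customer, amount in rows:
        totals[(country, customer)] += amount
    return dict(totals)


def write_totals(conn, totals):
    """Load the aggregates into a table (SQLite standing in for Azure SQL Database)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sales_by_customer (
               country       TEXT NOT NULL,
               customer_name TEXT NOT NULL,
               total_sales   REAL NOT NULL,
               PRIMARY KEY (country, customer_name))"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sales_by_customer VALUES (?, ?, ?)",
        [(country, customer, total) for (country, customer), total in totals.items()],
    )
    conn.commit()


if __name__ == "__main__":
    totals = total_sales(clean_rows(io.StringIO(SAMPLE)))
    conn = sqlite3.connect(":memory:")
    write_totals(conn, totals)
    print(conn.execute("SELECT * FROM sales_by_customer ORDER BY total_sales").fetchall())
    # → [('UK', 'Alice', 15.0), ('France', 'Bob', 20.0)]
```

Keying the table on (country, customer_name) and using INSERT OR REPLACE means rerunning the load overwrites existing totals instead of duplicating them.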

Getting ready

To get started, do the following:

  1. Log in to https://portal.azure.com using your Azure credentials.
  2. Open a new PowerShell prompt. Execute the following command to log in to your Azure account from PowerShell:
    Connect-AzAccount
  3. You will need an existing data factory. If you don't have one, create one by executing the ~/azure-data-engineering-cookbook\Chapter04\3_CreatingAzureDataFactory.ps1 PowerShell script.
  4. Create an Azure storage account and upload the files to the ~/Chapter06/Data folder...