Reader small image

You're reading from  Azure Data Engineer Associate Certification Guide

Product typeBook
Published inFeb 2022
PublisherPackt
ISBN-139781801816069
Edition1st Edition
Tools
Concepts
Right arrow
Author (1)
Newton Alex
Newton Alex
author image
Newton Alex

Newton Alex leads several Azure Data Analytics teams in Microsoft, India. His team contributes to technologies including Azure Synapse, Azure Databricks, Azure HDInsight, and many open source technologies, including Apache YARN, Apache Spark, and Apache Hive. He started using Hadoop while at Yahoo, USA, where he helped build the first batch processing pipelines for Yahoo's ad serving team. After Yahoo, he became the leader of the big data team at Pivotal Inc., USA, where he was responsible for the entire open source stack of Pivotal Inc. He later moved to Microsoft and started the Azure Data team in India. He has worked with several Fortune 500 companies to help build their data systems on Azure.
Read more about Newton Alex

Right arrow

Splitting data

ADF provides multiple ways to split data in a pipeline. The important ones are Conditional Split and cloning (New branch). While Conditional Split is used to split data based on certain conditions, the New branch option is used to just copy the entire dataset for a new execution flow. We have already seen an example of Conditional Split in Figure 8.11. Let's see how we can create a new branch in the data pipeline.

In order to create a new branch, just click on the + icon next to any data source artifact (such as the DriverCSV11 block shown in the following screenshot). From there, you can choose the New branch option:

Figure 8.22 – New branch option in ADF

Apart from these two options, ADF also provides the ability to split the input files into multiple sub-files using partitions. Let's see how to accomplish that next.

File splits

In order to use file splits, create a new Sink artifact, and in the Optimize tab, just...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Azure Data Engineer Associate Certification Guide
Published in: Feb 2022Publisher: PacktISBN-13: 9781801816069

Author (1)

author image
Newton Alex

Newton Alex leads several Azure Data Analytics teams in Microsoft, India. His team contributes to technologies including Azure Synapse, Azure Databricks, Azure HDInsight, and many open source technologies, including Apache YARN, Apache Spark, and Apache Hive. He started using Hadoop while at Yahoo, USA, where he helped build the first batch processing pipelines for Yahoo's ad serving team. After Yahoo, he became the leader of the big data team at Pivotal Inc., USA, where he was responsible for the entire open source stack of Pivotal Inc. He later moved to Microsoft and started the Azure Data team in India. He has worked with several Fortune 500 companies to help build their data systems on Azure.
Read more about Newton Alex