
You're reading from Azure Data Factory Cookbook - Second Edition

Product type: Book
Published in: Feb 2024
Publisher: Packt
ISBN-13: 9781803246598
Edition: 2nd Edition
Authors (4):

Dmitry Foshin

Dmitry Foshin is a business intelligence team leader whose main goal is delivering business insights to the management team through data engineering, analytics, and visualization. He has led and executed complex full-stack BI solutions (from ETL processes to building DWH and reporting) using Azure technologies, Data Lake, Data Factory, Databricks, MS Office 365, Power BI, and Tableau. He has also successfully launched numerous data analytics projects – both on-premises and cloud – that help achieve corporate goals in international FMCG companies, banking, and manufacturing industries.

Tonya Chernyshova

Tonya Chernyshova is an experienced Data Engineer with over 10 years in the field, including time at Amazon. Specializing in Data Modeling, Automation, Cloud Computing (AWS and Azure), and Data Visualization, she has a strong track record of delivering scalable, maintainable data products. Her expertise drives data-driven insights and business growth, showcasing her proficiency in leveraging cloud technologies to enhance data capabilities.

Dmitry Anoshin

Dmitry Anoshin is a data-centric technologist and a recognized expert in building and implementing big data and analytics solutions. He has a successful track record of implementing business and digital intelligence projects in numerous industries, including retail, finance, marketing, and e-commerce. Dmitry possesses in-depth knowledge of digital/business intelligence, ETL, data warehousing, and big data technologies. He has extensive experience in the data integration process and is proficient in using various data warehousing methodologies. Dmitry has constantly exceeded project expectations when he has worked in the financial, machine tool, and retail industries. He has completed a number of multinational full BI/DI solution life cycle implementation projects. With expertise in data modeling, Dmitry also has a background and business experience in multiple relational databases, OLAP systems, and NoSQL databases. He is also an active speaker at data conferences and helps people adopt cloud analytics.

Xenia Ireton

Xenia Ireton is a Senior Software Engineer at Microsoft. She has extensive knowledge of building distributed services, data pipelines, and data warehouses.

The Best Practices of Working with ADF

Welcome to the final chapter of Azure Data Factory Cookbook, where we delve into the best practices for working with Azure Data Factory (ADF) and Azure Synapse. Throughout this cookbook, we’ve explored a multitude of recipes and techniques to help you harness the power of ADF for your data integration and transformation needs. In this closing chapter, we’ll guide you through essential considerations, strategies, and practical recipes that will elevate your ADF projects to new heights of efficiency, security, and scalability.

We will cover the following list of recipes in this chapter:

  • Setting up roles and permissions with access levels in ADF
  • Setting up Meta ETL with ADF
  • Leveraging ADF scalability: Performance tuning of an ADF pipeline
  • Using ADF disaster recovery built-in features
  • Change Data Capture
  • Managing Data Factory costs with FinOps

Technical requirements

For this chapter, you will need the following:

Setting up roles and permissions with access levels in ADF

ADF is built on principles of collaboration, and to work effectively you will need to grant access privileges to other users and teams. By its very nature, ADF relies on integration with other services; therefore, entities such as users, service principals, and managed identities will require access to resources within your ADF instance. User access management is a pivotal feature of ADF.

Similar to many Azure services, ADF relies on Role-Based Access Control (RBAC). RBAC enables fine-grained definitions of roles that can be granted, or assigned, to users, groups, service principals, or managed identities. These role assignments determine who can perform specific actions, such as viewing or making changes to pipelines, datasets, linked services, and other components, and ultimately govern access to your data workflows.
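If you script your environment setup, role assignments can be created programmatically as well. The following is a minimal sketch using the Azure SDK for Python (azure-identity plus azure-mgmt-authorization 1.0+); the subscription, resource group, factory name, and principal object ID are placeholders to substitute, and the GUID is the definition ID of the built-in Data Factory Contributor role at the time of writing.

import uuid

from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

# Placeholder identifiers -- substitute your own values.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"
principal_id = "<object-id-of-user-group-or-managed-identity>"

# Built-in "Data Factory Contributor" role definition ID.
DATA_FACTORY_CONTRIBUTOR = "673868aa-7521-48a0-acc6-0f60742d39f5"

# Scope the assignment to a single factory so the principal is granted
# no more access than it needs.
scope = (
    f"/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}"
    f"/providers/Microsoft.DataFactory/factories/{factory_name}"
)

client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)

client.role_assignments.create(
    scope=scope,
    role_assignment_name=str(uuid.uuid4()),  # role assignment names are GUIDs
    parameters=RoleAssignmentCreateParameters(
        role_definition_id=(
            f"/subscriptions/{subscription_id}/providers"
            f"/Microsoft.Authorization/roleDefinitions/{DATA_FACTORY_CONTRIBUTOR}"
        ),
        principal_id=principal_id,
    ),
)

Scoping the assignment to a single factory, rather than to the resource group or subscription, keeps each grant as narrow as possible.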

Imagine a scenario where a company is using ADF to orchestrate its data pipelines, which involves...

Setting up Meta ETL with ADF

When faced with the task of copying a vast number of objects, such as thousands of tables, or loading data from a diverse range of sources, an effective approach is to leverage a control table that contains a list of object names along with their required copy behaviors. By employing parameterized pipelines, these object names and behaviors can be read from the control table and applied to the jobs accordingly. “Copy behaviors” refer to the specific actions or configurations associated with copying each object. These behaviors can include parameters such as source and destination locations, data transformation requirements, scheduling preferences, error-handling strategies, and any other settings relevant to the copying process.
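To make the pattern concrete, here is a minimal sketch of the driver logic. It assumes a hypothetical control table dbo.etl_control (object_name, source_path, sink_path, is_active) in Azure SQL Database, pyodbc to read it, and a parameterized pipeline named CopyObjectPipeline triggered through azure-mgmt-datafactory; all of these names are illustrative. Inside ADF itself, the same fan-out is typically built with a Lookup activity feeding a ForEach.

import pyodbc
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Illustrative names -- substitute your own.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"
conn_str = "<odbc-connection-string-to-the-control-database>"

adf = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Read the control table: one row per object, with its copy behavior.
with pyodbc.connect(conn_str) as conn:
    rows = conn.execute(
        "SELECT object_name, source_path, sink_path "
        "FROM dbo.etl_control WHERE is_active = 1"
    ).fetchall()

# Fan each row out to a run of the parameterized pipeline. The pipeline
# itself never changes when objects are added to or removed from the
# control table.
for object_name, source_path, sink_path in rows:
    run = adf.pipelines.create_run(
        resource_group,
        factory_name,
        "CopyObjectPipeline",  # hypothetical parameterized pipeline
        parameters={
            "objectName": object_name,
            "sourcePath": source_path,
            "sinkPath": sink_path,
        },
    )
    print(f"{object_name}: started run {run.run_id}")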

Unlike traditional methods that require redeploying pipelines whenever the object list needs modification (e.g., adding or removing objects), utilizing a control table allows for swift and straightforward updates...

Leveraging ADF scalability: Performance tuning of an ADF pipeline

Due to its serverless architecture, ADF is inherently scalable, dynamically adjusting its resource allocation to meet workload demands without the need for users to manage physical servers. This flexible architecture offers users various techniques to enhance the performance of their data pipelines.

One approach for improving performance involves harnessing the power of parallelism, such as incorporating a ForEach activity into your pipelines. The ForEach activity allows for the parallel processing of data by iterating over a collection of items, executing a specified set of activities for each item in parallel. This can significantly reduce overall execution time, especially when dealing with large datasets or when multiple independent tasks can be processed concurrently.
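The sketch below shows one way to express this, using the azure-mgmt-datafactory model classes to define a pipeline whose ForEach activity fans a file list out to a hypothetical child pipeline named ProcessFile; all names here are assumptions, and the same definition is more commonly authored in ADF Studio. Setting is_sequential=False together with batch_count caps how many iterations run concurrently (ADF allows up to 50).

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ExecutePipelineActivity,
    Expression,
    ForEachActivity,
    ParameterSpecification,
    PipelineReference,
    PipelineResource,
)

subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"

adf = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# For each file name, invoke a (hypothetical) child pipeline that
# processes a single file.
per_item = ExecutePipelineActivity(
    name="ProcessOneFile",
    pipeline=PipelineReference(reference_name="ProcessFile"),
    parameters={"fileName": "@item()"},
    wait_on_completion=True,
)

for_each = ForEachActivity(
    name="ForEachFile",
    items=Expression(value="@pipeline().parameters.fileList"),
    is_sequential=False,  # run iterations in parallel...
    batch_count=10,       # ...at most 10 at a time
    activities=[per_item],
)

adf.pipelines.create_or_update(
    resource_group,
    factory_name,
    "ParallelFileProcessing",
    PipelineResource(
        parameters={"fileList": ParameterSpecification(type="Array")},
        activities=[for_each],
    ),
)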

For example, suppose you have a pipeline that needs to process data from multiple files stored in Azure Blob Storage. By using a ForEach...

Using ADF disaster recovery built-in features

ADF provides organizations with the tools they need to effortlessly create, schedule, and oversee data pipelines, facilitating the seamless movement and transformation of data. Maintaining data availability and keeping downtime to a minimum are pivotal aspects of preserving business operations. In this recipe, we’ll guide you through the process of designing a disaster recovery solution for your ADF as the ETL/ELT engine for data movement and transformation.

Getting ready

Before we start, please ensure that you have an Azure subscription and are familiar with the basics of Azure resources such as the Azure portal, creating and deleting Azure resources, and creating pipelines in ADF.

How to do it...

Before diving into disaster recovery planning, it’s crucial to understand that ADF is a Platform-as-a-Service (PaaS) offering by Azure. Azure PaaS provides a ready-to-develop and deploy infrastructure, including...
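One practical building block for such a plan is keeping a redeployable copy of your factory definition. If Git integration is not already providing this, the following minimal sketch (assuming azure-identity and azure-mgmt-resource) exports the resource group's ARM template so the factory can be recreated in a secondary region:

import json

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.resource.resources.models import ExportTemplateRequest

subscription_id = "<subscription-id>"
resource_group = "<resource-group-containing-the-factory>"

client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

# Export an ARM template for everything in the resource group;
# redeploying it to a paired region recreates the factory and its
# pipelines, datasets, and linked services.
poller = client.resource_groups.begin_export_template(
    resource_group,
    ExportTemplateRequest(
        resources=["*"],
        options="IncludeParameterDefaultValues",
    ),
)
template = poller.result().template

with open("adf_dr_template.json", "w") as f:
    json.dump(template, f, indent=2)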

Change Data Capture

The Change Data Capture (CDC) tool in Azure Data Factory enables real-time data synchronization by efficiently tracking and capturing only the changed data. It optimizes data integration workflows, reduces processing time, and ensures data consistency across systems. With built-in connectors and support for hybrid environments, CDC empowers organizations to stay up to date with analytics and reporting.

Getting ready

Before getting started with the recipe, log in to your Microsoft Azure account.

We assume you have a pre-configured resource group and storage account with Azure Data Lake Gen2, Azure Data Factory, and Azure SQL Database. To set these up, please refer to Chapter 1, Getting Started with ADF, and the Creating and executing our first job in ADF recipe.

  • In Azure SQL Database, you will need the movielens CSV files loaded into the dbo schema with the following table name: dbo.movielens_ratings (enabling SQL-side CDC on this table is sketched after this list).
  • In Azure Data Lake Gen2...
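For an Azure SQL Database source, native SQL-side CDC is enabled per database and then per table with the sys.sp_cdc_* system procedures. Here is a minimal sketch using pyodbc against the ratings table above; the connection string is a placeholder, enabling CDC requires db_owner rights, and the feature is not available on the smallest service tiers.

import pyodbc

# Placeholder connection string to the Azure SQL Database that holds
# the movielens tables.
conn_str = "<odbc-connection-string>"

with pyodbc.connect(conn_str, autocommit=True) as conn:
    cursor = conn.cursor()

    # Enable CDC at the database level (requires db_owner).
    cursor.execute("EXEC sys.sp_cdc_enable_db")

    # Track changes on the ratings table used in this recipe.
    cursor.execute(
        """
        EXEC sys.sp_cdc_enable_table
            @source_schema = N'dbo',
            @source_name   = N'movielens_ratings',
            @role_name     = NULL
        """
    )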

Managing Data Factory costs with FinOps

Data Factory is a crucial service for data processing in Azure, but managing its costs effectively is essential to avoid unexpected expenses.

FinOps is a set of practices and principles that help organizations manage their cloud costs efficiently. It involves collaboration between finance, IT, and business teams to optimize cloud spending, allocate costs accurately, and drive accountability. The goal of FinOps is to strike a balance between cost optimization and enabling cloud innovation.

Examples of applying FinOps principles to ADF include:

  • Resource Right-sizing: Analyze the compute resources used by your Data Factory pipelines and adjust them based on actual workload requirements. For instance, if certain pipelines consistently underutilize resources, consider downsizing the compute instances to save costs (a back-of-the-envelope illustration follows this list).
  • Schedule Optimization: Leverage Data Factory’s scheduling capabilities to run pipelines during off-peak...
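To put arithmetic behind the right-sizing bullet above: Copy activity data movement is billed per DIU-hour, so running with fewer DIUs can cost less even when the copy takes longer. The rate below is purely illustrative; check the Azure pricing page for your region.

# Back-of-the-envelope Copy activity cost model. The rate is purely
# illustrative -- look up the data movement price for your region.
DIU_HOUR_RATE = 0.25  # assumed $/DIU-hour for Azure IR data movement

def copy_cost(dius: int, duration_minutes: float) -> float:
    """Estimated cost of one Copy activity run."""
    return dius * (duration_minutes / 60) * DIU_HOUR_RATE

# A copy that auto-scales to 16 DIUs and finishes in 20 minutes...
print(f"16 DIUs, 20 min: ${copy_cost(16, 20):.2f}")   # ~$1.33
# ...versus right-sizing to 4 DIUs, even if it then takes 45 minutes.
print(f" 4 DIUs, 45 min: ${copy_cost(4, 45):.2f}")    # $0.75

Here the slower, right-sized run costs roughly half as much, which is the trade-off FinOps asks you to weigh consciously rather than by default.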