
You're reading from Azure Data Factory Cookbook - Second Edition

Product type: Book
Published in: Feb 2024
Publisher: Packt
ISBN-13: 9781803246598
Edition: 2nd Edition
Authors (4):
Dmitry Foshin

Dmitry Foshin is a business intelligence team leader whose main goals are delivering business insights to the management team through data engineering, analytics, and visualization. He has led and executed complex full-stack BI solutions (from ETL processes to building DWHs and reporting) using Azure technologies, Data Lake, Data Factory, Databricks, MS Office 365, Power BI, and Tableau. He has also successfully launched numerous data analytics projects – both on-premises and cloud – that help achieve corporate goals in international FMCG companies, banking, and manufacturing industries.

Tonya Chernyshova

Tonya Chernyshova is an experienced Data Engineer with over 10 years in the field, including time at Amazon. Specializing in Data Modeling, Automation, Cloud Computing (AWS and Azure), and Data Visualization, she has a strong track record of delivering scalable, maintainable data products. Her expertise drives data-driven insights and business growth, showcasing her proficiency in leveraging cloud technologies to enhance data capabilities.

Dmitry Anoshin

Dmitry Anoshin is a data-centric technologist and a recognized expert in building and implementing big data and analytics solutions. He has a successful track record of implementing business and digital intelligence projects in numerous industries, including retail, finance, marketing, and e-commerce. Dmitry possesses in-depth knowledge of digital/business intelligence, ETL, data warehousing, and big data technologies. He has extensive experience in the data integration process and is proficient in using various data warehousing methodologies. Dmitry has consistently exceeded project expectations working in the financial, machine tool, and retail industries. He has completed a number of multinational full BI/DI solution life cycle implementation projects. With expertise in data modeling, Dmitry also has a background and business experience in multiple relational databases, OLAP systems, and NoSQL databases. He is also an active speaker at data conferences and helps people adopt cloud analytics.

Xenia Ireton

Xenia Ireton is a Senior Software Engineer at Microsoft. She has extensive knowledge in building distributed services, data pipelines, and data warehouses.


Azure Data Factory Cookbook, Second Edition: A data engineer's guide to building and managing ETL and ELT pipelines with data integration


  1. Chapter 1: Getting Started with ADF
  2. Chapter 2: Orchestration and Control Flow
  3. Chapter 3: Setting up Synapse Analytics
  4. Chapter 4: Working with Data Lake and Spark Pools
  5. Chapter 5: Working with Big Data: Databricks
  6. ...

Introduction to the Azure data platform

The Azure data platform provides us with a number of data services for databases, data storage, and analytics. In Figure 1.1, you can find a list of these services and their purpose:

Figure 1.1: Azure data platform services

Using Azure data platform services can help you build a modern analytics solution that is secure and scalable. The following diagram shows an example of a typical modern cloud analytics architecture:

Figure 1.2: Modern analytics solution architecture

You can find most of the Azure data platform services in this diagram. ADF is the core service for data movement and transformation.

Let’s learn more about the reference architecture in Figure 1.2. It starts with the source systems: we can collect data from files, databases, APIs, IoT devices, and so on. Then, we can use Event Hubs for streaming data and ADF for batch operations. ADF will push data into Azure Data Lake as a staging area, and then we can prepare data for...

Creating and executing our first job in ADF

ADF allows us to create workflows for transforming and orchestrating data movement. You may think of ADF as an Extract, Transform, Load (ETL) tool for the Azure cloud and the Azure data platform. ADF is a fully managed, serverless cloud service: we don’t need to deploy any hardware or software, and we pay only for what we use. Often, ADF is referred to as code-free ETL as a service, or a managed service. The key operations of ADF are listed here:

  • Ingest: Allows us to collect data and load it into Azure data platform storage or any other target location. ADF has 90+ data connectors.
  • Control flow: Allows us to design code-free extract-and-load workflows.
  • Data flow: Allows us to design code-free data transformations.
  • Schedule: Allows us to schedule ETL jobs.
  • Monitor: Allows us to monitor ETL jobs.

We have learned about the key operations of ADF. Next, we should try them.
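Before we do, note that behind the scenes, each of these operations is backed by JSON: every pipeline, dataset, and linked service we create in ADF is stored as a JSON definition (we will meet these raw JSON files again in the Azure Bicep recipe). As a minimal sketch only – the pipeline and dataset names below are hypothetical placeholders, not objects we have created yet – a pipeline with a single copy activity looks roughly like this:

    {
        "name": "CopyBlobPipeline",
        "properties": {
            "activities": [
                {
                    "name": "CopyFromInputToOutput",
                    "type": "Copy",
                    "inputs": [
                        { "referenceName": "InputDataset", "type": "DatasetReference" }
                    ],
                    "outputs": [
                        { "referenceName": "OutputDataset", "type": "DatasetReference" }
                    ],
                    "typeProperties": {
                        "source": { "type": "BlobSource" },
                        "sink": { "type": "BlobSink" }
                    }
                }
            ]
        }
    }

You rarely have to write this JSON by hand – the ADF UI generates it for you – but knowing it is there helps when debugging pipelines or putting them under version control.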

Getting ready

...

Creating an ADF pipeline using the Copy Data tool

We just reviewed how to create an ADF job using the UI. However, we can also use the Copy Data tool (CDT). The CDT allows us to load data into Azure storage faster: we don’t need to set up linked services, pipelines, and datasets as we did in the previous recipe. In other words, depending on your task, you can use the ADF UI or the CDT. Usually, we will use the CDT for simple load operations, when we have lots of data files and would like to ingest them into Data Lake as fast as possible.

Getting ready

In this recipe, we will use the CDT to perform the same task: copying data from one folder to another.

How to do it...

We have already created an ADF job with the UI. Now, let’s review the CDT:

  1. In the previous recipe, we created the Azure Blob storage account and container. We will use the same file and the same container. However, we have to delete the file from the output location.
  2. ...

Creating an ADF pipeline using Python

We can use PowerShell, .NET, and Python for ADF deployment and data integration automation. Here is an extract from the Microsoft documentation:

“Azure Automation delivers a cloud-based automation and configuration service that provides consistent management across your Azure and non-Azure environments. It consists of process automation, update management, and configuration features. Azure Automation provides complete control during deployment, operations, and decommissioning of workloads and resources.”

In this recipe, we want to cover the Python scenario because Python is one of the most popular languages for analytics and data engineering. We will use Jupyter Notebook with example code.

You can use Jupyter notebooks or Visual Studio Code notebooks.

Getting ready

For this exercise, we will use Python to create a data pipeline and copy our file from one folder to another. We need to use the azure...
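The remainder of this recipe is truncated in this preview, but as a minimal sketch of the approach, the following code creates a data factory using the azure-identity and azure-mgmt-datafactory packages (our assumption of suitable libraries; the subscription ID, resource group, and factory names are hypothetical placeholders, and the resource group is assumed to already exist):

    # A minimal sketch, assuming the azure-identity and azure-mgmt-datafactory
    # packages are installed; all names below are hypothetical placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import Factory

    subscription_id = "<your-subscription-id>"
    rg_name = "ADFCookbook"          # an existing resource group
    df_name = "ADFCookbookFactory"   # data factory names must be globally unique

    # Picks up whatever credential is available: az login, environment variables, etc.
    credential = DefaultAzureCredential()
    adf_client = DataFactoryManagementClient(credential, subscription_id)

    # Create (or update) the data factory itself
    df = adf_client.factories.create_or_update(
        rg_name, df_name, Factory(location="eastus")
    )
    print(df.provisioning_state)  # "Succeeded" once the factory is ready

From here, the same client exposes methods for linked services, datasets, and pipelines, so a whole copy pipeline can be defined and triggered from a notebook.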

Creating a data factory using PowerShell

Often, we don’t have access to the UI, and we want to create our infrastructure as code. Infrastructure as code is easy to maintain and deploy, and it allows us to track versions and manage changes through code commits and change requests. In this recipe, we will use PowerShell to create a data factory. If you have never used PowerShell before, you can find information on how to install it on your machine at the end of this recipe.

Getting ready

For this exercise, we will use PowerShell to create a data pipeline and copy our file from one folder to another.

How to do it…

Let’s create an ADF job using PowerShell:

  1. In the case of macOS, we can run the following command to install PowerShell:
    brew install powershell/tap/powershell
    
  2. Check that it is working:
    pwsh
    

    Optionally, we can download PowerShell for our OS from https://github.com/PowerShell/PowerShell/.

    ...
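The remaining steps are truncated in this preview, but as a rough sketch of where they lead – assuming the Az PowerShell module (including Az.DataFactory) is installed, and using hypothetical resource group and factory names – creating the factory itself looks like this:

    # Sign in and create a resource group to hold the factory
    Connect-AzAccount
    New-AzResourceGroup -Name "ADFCookbook" -Location "EastUS"

    # Create the data factory; the name must be globally unique
    Set-AzDataFactoryV2 -ResourceGroupName "ADFCookbook" `
        -Name "ADFCookbookFactory" -Location "EastUS"

Because these commands are just text, they can live in a script under version control, which is exactly the infrastructure-as-code benefit described above.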

Using templates to create ADF pipelines

Modern organizations operate in a fast-paced environment. It is important to deliver insights faster and have shorter analytics iterations. Moreover, Microsoft found that many organizations have similar use cases for their modern cloud analytics deployments. As a result, Azure provides a number of predefined ADF templates. For example, if you have data in Amazon S3 and you want to copy it into Azure Data Lake, you can find a specific template for this operation; or, say you want to move an on-premises Oracle data warehouse to the Azure Synapse Analytics data warehouse – you are covered with ADF templates.

Getting ready

ADF provides us with templates in order to accelerate data engineering development. In this recipe, we will review the common templates and see how to use them.

How to do it...

We will find and review an existing template using Data Factories:

  1. In the Azure portal, choose Data Factories.
  2. Open our...

Creating an Azure Data Factory using Azure Bicep

Azure Bicep is a domain-specific language that offers a more readable and maintainable approach to creating and managing Azure resources. It simplifies the process of creating, deploying, and managing ADF resources, reducing the complexity and tediousness of managing raw JSON files. In this recipe, we will create an Azure Data Factory using Azure Bicep and the Visual Studio Code Azure Bicep extension. The Azure Bicep extension for Visual Studio Code provides syntax highlighting, code snippets, and IntelliSense to make working with Azure Bicep files more efficient.

Getting ready

Before diving into the creation of an Azure Data Factory using Azure Bicep and Visual Studio Code, ensure that you have the necessary prerequisites in place:

  • An active Azure subscription
  • Visual Studio Code installed on your local machine
  • Azure CLI installed on your local machine
  • Azure Bicep CLI extension installed on your local machine
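Once these are in place, a minimal sketch of a Bicep file for a data factory needs only one resource block (the factory name here is a hypothetical placeholder and must be globally unique):

    // main.bicep – a minimal sketch; the factory name is a hypothetical placeholder
    resource dataFactory 'Microsoft.DataFactory/factories@2018-06-01' = {
      name: 'adfcookbookfactory'
      location: resourceGroup().location
      identity: {
        type: 'SystemAssigned'   // lets the factory authenticate to other Azure services
      }
    }

It can then be deployed into an existing resource group with the Azure CLI, for example: az deployment group create --resource-group ADFCookbook --template-file main.bicep.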
