Managing Deployment Processes with Azure DevOps

Azure DevOps offers a comprehensive set of development collaboration, continuous integration, and continuous delivery tools. With Azure Repos, you can collaborate on code development using free public and private Git repositories, pull requests, and code reviews. Meanwhile, Azure Pipelines enables you to implement a streamlined build, test, and deployment pipeline for any application.

In this chapter, we will delve into setting up CI and CD for data analytics solutions in Azure Data Factory (ADF) using Azure DevOps.

Continuous Integration (CI) is a practice where code changes are regularly integrated into a shared repository, ensuring that each change is automatically built, tested, and validated.

Continuous Deployment (CD) is the automatic deployment of changes to production or staging environments after they pass CI tests.

By implementing CI/CD in the context of data analytics, you can streamline the development process...

Technical requirements

For this chapter, you will need the following:

Now that you’ve got a sense of what this chapter covers, along with the technical requirements, let’s dive into the recipes.

Setting up Azure DevOps

To get the most out of Azure DevOps integration with ADF, you first need to create an account within Azure DevOps, link it with your Data Factory, and make sure everything is set up and ready to work. This recipe will take you through the steps on how to accomplish that.

Getting ready

Before we start, please ensure that you have an Azure subscription and are familiar with the basics of working with Azure resources, such as navigating the Azure portal, creating and deleting Azure resources, and creating pipelines in ADF.

How to do it...

In this recipe, you will learn how to create an Azure DevOps account, create a new project, connect a DevOps organization with Azure Active Directory, link it with your ADF, and set up a code repository:

  1. Navigate to https://azure.microsoft.com/en-us/products/devops/.
  2. You will see the following screen. Click on Start free to begin creating your Azure DevOps account:

    Figure 9.1: Starting your free Azure DevOps...
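The recipe walks through the browser-based setup. If you prefer to script it, the same project setup can be driven from the Azure DevOps CLI extension. The following is a minimal sketch; the organization URL and project name are placeholders:

    # Add the Azure DevOps extension to the Azure CLI (one-time setup)
    az extension add --name azure-devops

    # Sign in and point the CLI at your organization (placeholder URL)
    az login
    az devops configure --defaults organization=https://dev.azure.com/<your-organization>

    # Create a project backed by a Git repository (placeholder name)
    az devops project create --name <your-project> --source-control git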

Publishing changes to ADF

Collaboration on code development involves using Git. In this recipe, you will learn how to create an ADF pipeline in Azure DevOps Git and publish changes from your master branch to ADF.

Getting ready

Before we start, please ensure that you have an Azure subscription and are familiar with the basics of Azure resources, such as navigating the Azure portal, creating and deleting Azure resources, and creating pipelines in Azure Data Factory (ADF).

Additionally, you will need an Azure DevOps project created and linked to your ADF. If you haven’t set up this connection yet, you can refer to the preceding recipe, titled Setting up Azure DevOps, for step-by-step instructions on how to do so.

How to do it...

We are going to create a new ADF pipeline in the master branch of Azure DevOps Git and publish the changes to Data Factory:

  1. Create a new ADF pipeline with the Wait activity in the master branch. Please refer to Chapter 2...
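    Once the pipeline is saved, ADF stores it in the Git repository as a JSON definition. For reference, a pipeline containing a single Wait activity looks roughly like the following sketch; the pipeline and activity names are illustrative:

    {
        "name": "WaitPipeline",
        "properties": {
            "activities": [
                {
                    "name": "Wait1",
                    "type": "Wait",
                    "typeProperties": {
                        "waitTimeInSeconds": 30
                    }
                }
            ]
        }
    }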

Deploying your features into the master branch

Now that we have covered how to publish changes from the master branch to Data Factory, we are going to look at how to merge new branches into the master branch. There are several reasons for creating new branches. When implementing changes to your project, it is common practice to create a feature branch, develop your changes there, and then merge them into the master branch. Some teams working in an Agile environment create a branch per user story; other teams may have a branch per developer. In all these situations, the main purpose is to avoid introducing breaking changes when releasing to the production environment.
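As an illustration, a typical feature-branch flow looks like the following from the command line; the branch name and commit message are illustrative, and when working in ADF Studio you would normally create and switch branches with the built-in branch selector instead:

    # Start from the latest master
    git checkout master
    git pull origin master

    # Create a feature branch (the name is illustrative)
    git checkout -b feature/add-wait-pipeline

    # Commit your changes on the feature branch
    git add .
    git commit -m "Add wait pipeline"

    # Push the branch and open a pull request into master in Azure Repos
    git push -u origin feature/add-wait-pipeline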

Getting ready

Before we start, please ensure that you have an Azure subscription and are familiar with the basics of working with Azure resources, such as navigating the Azure portal, creating and deleting Azure resources, and creating pipelines in ADF. Also, you will need an Azure DevOps project created and linked to your...

Getting ready for the CI/CD of ADF

CD includes the deployment of ADF pipelines between different environments - that is, development, testing, and production. The best practice, and the most secure way of configuring your pipelines in the CI/CD process, is to use Azure Key Vault (AKV) instead of embedding connection strings directly. AKV is used in Azure Data Factory pipelines for CD because it provides a highly secure and centrally managed solution for storing and safeguarding sensitive information such as connection strings, passwords, and authentication tokens. It ensures controlled access, facilitates secret rotation, enhances auditing capabilities, and integrates seamlessly with ADF, making it the best practice for securing pipelines in the CD process.

In this recipe, you will learn what you need to set up before creating a CD process, and how to establish AKV and connect it with ADF and an Azure storage account.
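To illustrate the end result, here is a sketch of a linked service definition that reads a storage connection string from AKV instead of embedding it; the linked service and secret names are placeholders:

    {
        "name": "AzureBlobStorageLinkedService",
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {
                "connectionString": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": "AzureKeyVaultLinkedService",
                        "type": "LinkedServiceReference"
                    },
                    "secretName": "storage-connection-string"
                }
            }
        }
    }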

Getting ready

Before we start, please ensure that you have an Azure subscription...

Creating an Azure pipeline for CD

The Pipelines service of Azure DevOps helps you automate your release cycle between different environments - that is, development, testing, and production. In this recipe, you will learn how to create an Azure pipeline and connect it to the Azure Data Factory instances associated with different environments.

Getting ready

Before we start, please ensure that you have an Azure subscription and are familiar with the basics of working with Azure resources, such as navigating the Azure portal, creating and deleting Azure resources, and creating pipelines in ADF. Also, you will need an Azure DevOps project created and linked to your ADF.

How to do it...

We are going to create a new pipeline in Azure DevOps and set it up to release changes from development to the testing environment:

  1. Go to your Azure DevOps account, and click Pipelines | Releases | New release pipeline | Empty job. You will see the following screen:

    Figure 9.35: Creating a new release pipeline...
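    The recipe continues in the classic release pipeline editor. As an aside, the same deployment can also be expressed as a YAML pipeline that deploys the ARM template ADF publishes to the adf_publish branch; the following is a sketch, with the service connection, subscription, resource group, and factory names as placeholders:

    # Sketch of a YAML pipeline deploying ADF ARM templates to the test environment
    trigger:
      branches:
        include:
          - adf_publish

    steps:
      - task: AzureResourceManagerTemplateDeployment@3
        inputs:
          deploymentScope: 'Resource Group'
          azureResourceManagerConnection: '<service-connection-name>'
          subscriptionId: '<subscription-id>'
          resourceGroupName: '<test-resource-group>'
          location: 'East US'
          templateLocation: 'Linked artifact'
          csmFile: '<factory-name>/ARMTemplateForFactory.json'
          csmParametersFile: '<factory-name>/ARMTemplateParametersForFactory.json'
          overrideParameters: '-factoryName <test-factory-name>'
          deploymentMode: 'Incremental'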

Install and configure Visual Studio to work with ADF deployment

Using Visual Studio for ADF deployment provides a powerful and feature-rich development environment that improves efficiency, collaboration, and code management throughout the ADF development life cycle. In this recipe, you will learn how to install and configure Visual Studio.

Getting ready

Before we start, please ensure you have an Azure subscription and are familiar with the basics of working with Azure resources, such as navigating the Azure portal, creating and deleting Azure resources, and creating pipelines in ADF. Before moving ahead, make sure you have downloaded and installed Visual Studio Code from the official website: https://code.visualstudio.com/.

How to do it...

We are going to create a new project in Azure DevOps and Visual Studio:

  1. Let’s go to Azure DevOps and create a new project. In the Advanced settings, choose Git and click Create:

    Figure 9.48: Creating a new project in Azure DevOps

  2. ...

Setting up ADF as a Visual Studio project

In this recipe, you will learn how to create a data factory using Visual Studio and the Azure CLI, define linked services, datasets, and pipelines in JSON format, and deploy changes to Azure DevOps.

Getting ready

Before we start, please ensure you have installed Visual Studio and have cloned the project from Azure DevOps to your local machine and configured it in Visual Studio (the steps are described in the previous recipe). Also, download and install the Azure CLI from the official website (https://docs.microsoft.com/en-us/cli/azure/install-azure-cli).

How to do it...

  1. Open a terminal in Visual Studio Code by going to View | Terminal. Sign in to your Azure account using the Azure CLI by running the following command:
    az login
    

    Follow the instructions to authenticate.

  2. In the terminal, use the Azure CLI to create a new ADF instance, such as this example:
    az datafactory create --location...
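    A complete version of this command might look as follows; note that az datafactory lives in a CLI extension that the Azure CLI offers to install on first use, and the resource names and region here are placeholders:

    # Create a resource group, then a data factory inside it (names are placeholders)
    az group create --name <resource-group> --location eastus
    az datafactory create --resource-group <resource-group> --name <factory-name> --location eastus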

Running Airflow DAGs with ADF

ADF’s Managed Airflow service provides a streamlined and effective solution for creating and managing Apache Airflow environments, simplifying the execution of data pipelines at scale. Apache Airflow, an open-source platform, empowers users to programmatically design, schedule, and monitor intricate data workflows. By defining tasks as operators and arranging them into directed acyclic graphs (DAGs), Airflow facilitates the representation of data pipelines. It enables scheduled or event-triggered execution of DAGs, real-time workflow monitoring, and task status visibility, making it a popular choice in data engineering and science for orchestrating data pipelines due to its adaptability, extensibility, and user-friendliness. In this recipe, we will run an existing pipeline with Managed Airflow.
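As a reference point, a minimal DAG file that Managed Airflow can pick up and schedule looks like the following sketch; the DAG ID, schedule, and task are illustrative:

    # A minimal Airflow DAG; the DAG ID, schedule, and task are illustrative
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="adf_managed_airflow_demo",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # A single task that echoes a message to the task log
        hello = BashOperator(task_id="hello", bash_command="echo 'Hello from Managed Airflow'")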

Getting ready

Before we start, please ensure that you have an Azure subscription, Azure storage account, and ADF pipeline set up.

How to do it...
