You're reading from Azure Data Engineering Cookbook

Product type: Book
Published in: Apr 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800206557
Edition: 1st Edition
Author: Ahmad Osama

Ahmad Osama works for Pitney Bowes Pvt. Ltd. as a technical architect and is a former Microsoft Data Platform MVP. In his day job, he develops and maintains high-performance on-premises and cloud SQL Server OLTP environments, and deploys and automates tasks using PowerShell. When not working, Ahmad blogs at DataPlatformLabs and can be found glued to his Xbox.

Chapter 7: Azure Data Factory Integration Runtime

The Azure Data Factory Integration Runtime (IR) is the compute infrastructure that is responsible for executing data flows, pipeline activities, data movement, and SQL Server Integration Services (SSIS) packages. There are three types of IR: Azure, self-hosted, and Azure SSIS.

The Azure IR is the default IR and is created automatically with every new data factory. It can run data flows, data movement, and pipeline activities.

A self-hosted IR can be installed on-premises or on a virtual machine running Windows. It can work with data stored on-premises or in the cloud, and it supports data movement and pipeline activities.

The Azure SSIS IR is used to lift and shift existing SQL Server Integration Services (SSIS) packages to Azure.
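
To see which IRs a data factory currently has and what type each one is, something like the following can be used. This is a minimal sketch: the resource group and data factory names are placeholders, not values from this chapter.

```powershell
# Placeholder names (assumptions); replace with your own.
$resourceGroup = 'my-resource-group'
$dataFactory   = 'my-data-factory'

# List the integration runtimes defined in the data factory,
# showing only the name, type, and description of each one.
Get-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $resourceGroup `
    -DataFactoryName $dataFactory |
    Select-Object -Property Name, Type, Description
```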

In this chapter, we'll learn how to use a self-hosted IR and Azure SSIS IR through the following recipes:

  • Configuring a self-hosted IR
  • Configuring a shared self-hosted IR
  • Migrating an SSIS package to Azure...

Technical requirements

For this chapter, the following are required:

  • A Microsoft Azure subscription
  • PowerShell 7
  • Microsoft Azure PowerShell
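
The requirements above can be checked from a PowerShell prompt. The following is a rough sketch, assuming the Az module is installed from the PowerShell Gallery for the current user only.

```powershell
# Show the PowerShell version; the Major value should be 7 or higher.
$PSVersionTable.PSVersion

# List installed versions of Az.Accounts (part of Microsoft Azure PowerShell).
# No output means the Az module is not installed.
Get-Module -ListAvailable -Name Az.Accounts

# Install the Az module for the current user if it is missing.
Install-Module -Name Az -Scope CurrentUser -Repository PSGallery
```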

Configuring a self-hosted IR

In this recipe, we'll learn how to configure a self-hosted IR and then use the IR to copy files from on-premises to Azure Storage using the Copy data activity.

Getting ready

To get started, do the following:

  1. Log in to https://portal.azure.com using your Azure credentials.
  2. Open a new PowerShell prompt. Execute the following command to log in to your Azure account from PowerShell:
    Connect-AzAccount
  3. You will need an existing Data Factory account. If you don't have one, create one by executing the ~/azure-data-engineering-cookbook\Chapter04\3_CreatingAzureDataFactory.ps1 PowerShell script.
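
After logging in, it can help to confirm that the session points at the intended subscription and that the data factory exists. This is a sketch only; the subscription, resource group, and data factory names are placeholders.

```powershell
# Show the account and subscription the current session is signed in to.
Get-AzContext

# Switch to another subscription if needed (placeholder name).
Set-AzContext -Subscription 'my-subscription-name'

# Confirm the data factory exists (placeholder names).
# The cmdlet reports an error if the factory cannot be found.
Get-AzDataFactoryV2 -ResourceGroupName 'my-resource-group' -Name 'my-data-factory'
```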

How to do it…

To configure a self-hosted IR, follow these steps:

  1. In the Azure portal, open Data Factory, and then open the Manage tab:

    Figure 7.1 – Opening the Manage tab – Data Factory

  2. Select New, and then select Azure, Self-Hosted:

    Figure 7.2 – Selecting the Azure, Self-Hosted runtime...
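
The recipe uses the Azure portal, but the self-hosted IR definition can also be created from PowerShell. The following is a minimal sketch of that alternative, not the recipe's own method; all names are placeholders. The authentication keys it returns are what the IR software on the on-premises machine uses to register with the data factory.

```powershell
# Placeholder names (assumptions); replace with your own.
$resourceGroup = 'my-resource-group'
$dataFactory   = 'my-data-factory'
$irName        = 'selfhosted-onpremise'

# Register a self-hosted IR definition in the data factory.
Set-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $resourceGroup `
    -DataFactoryName $dataFactory `
    -Name $irName `
    -Type SelfHosted `
    -Description 'Self-hosted IR for on-premises file copy'

# Retrieve the authentication keys used to register the on-premises node.
Get-AzDataFactoryV2IntegrationRuntimeKey `
    -ResourceGroupName $resourceGroup `
    -DataFactoryName $dataFactory `
    -Name $irName
```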

Configuring a shared self-hosted IR

A shared self-hosted IR, as the name suggests, can be shared among more than one data factory. This lets pipelines in several data factories run on a single self-hosted IR. In this recipe, we'll learn how to share a self-hosted IR.

Getting ready

To get started, do the following:

  1. Log in to https://portal.azure.com using your Azure credentials.
  2. Open a new PowerShell prompt. Execute the following command to log in to your Azure account from PowerShell:
    Connect-AzAccount
  3. You will need an existing Data Factory account. If you don't have one, create one by executing the ~/azure-data-engineering-cookbook\Chapter04\3_CreatingAzureDataFactory.ps1 PowerShell script.
  4. You need a self-hosted IR. If you don't have one, follow the previous recipe to create one.

How to do it…

Let's start by creating a new Azure data factory:

  1. Execute the following PowerShell command to create an Azure data factory...
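
The recipe's own commands are not reproduced here. As a rough outline of the overall flow, sharing typically involves creating the second data factory, granting its managed identity access to the existing IR, and creating a linked IR that points to it. The sketch below assumes placeholder names throughout and uses the Contributor role for the grant.

```powershell
# All names below are placeholders (assumptions), not values from the recipe.
$resourceGroup = 'my-resource-group'
$sharedFactory = 'my-data-factory'         # factory that owns the self-hosted IR
$sharedIrName  = 'selfhosted-onpremise'    # existing self-hosted IR to share
$linkedFactory = 'my-second-data-factory'  # factory that will reuse the IR
$linkedIrName  = 'linked-selfhosted-ir'
$location      = 'eastus'

# Create the second data factory; the returned object carries its managed identity.
$factory = Set-AzDataFactoryV2 `
    -ResourceGroupName $resourceGroup `
    -Name $linkedFactory `
    -Location $location

# Look up the existing self-hosted IR to get its resource ID.
$sharedIr = Get-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $resourceGroup `
    -DataFactoryName $sharedFactory `
    -Name $sharedIrName

# Grant the second factory's managed identity access to the shared IR.
New-AzRoleAssignment `
    -ObjectId $factory.Identity.PrincipalId `
    -RoleDefinitionName 'Contributor' `
    -Scope $sharedIr.Id

# Create a linked IR in the second factory that points to the shared IR.
Set-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $resourceGroup `
    -DataFactoryName $linkedFactory `
    -Name $linkedIrName `
    -Type SelfHosted `
    -SharedIntegrationRuntimeResourceId $sharedIr.Id `
    -Description 'Linked to the shared self-hosted IR'
```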

Migrating an SSIS package to Azure Data Factory

SQL Server Integration Services (SSIS) is a widely used on-premises ETL tool. In this recipe, we'll learn how to migrate an existing SSIS package to Azure Data Factory.

We'll do this by configuring an Azure SSIS IR, deploying the SSIS package to the SSISDB catalog hosted in Azure SQL Database, and then executing the package using the Execute SSIS Package activity.
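
The first of those steps, provisioning the Azure SSIS IR, might be scripted roughly as follows. This is a sketch under stated assumptions rather than the recipe's method: every name, the node size, the node count, and the pricing tier are placeholder choices, and the Azure SQL Database server is assumed to exist already.

```powershell
# Placeholder values (assumptions); replace with your own.
$resourceGroup = 'my-resource-group'
$dataFactory   = 'my-data-factory'
$ssisIrName    = 'my-azure-ssis-ir'
$location      = 'eastus'
$sqlServer     = 'my-sql-server.database.windows.net'

# Prompt for the Azure SQL Database server admin login used to create SSISDB.
$catalogCredential = Get-Credential -Message 'Azure SQL Database server admin login'

# Define the Azure SSIS IR, hosting the SSISDB catalog on the given server.
Set-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $resourceGroup `
    -DataFactoryName $dataFactory `
    -Name $ssisIrName `
    -Type Managed `
    -Location $location `
    -NodeSize 'Standard_D8_v3' `
    -NodeCount 1 `
    -Edition 'Standard' `
    -MaxParallelExecutionsPerNode 2 `
    -CatalogServerEndpoint $sqlServer `
    -CatalogAdminCredential $catalogCredential `
    -CatalogPricingTier 'S1'

# Start the IR; provisioning the nodes can take a while.
Start-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $resourceGroup `
    -DataFactoryName $dataFactory `
    -Name $ssisIrName `
    -Force
```

An Azure SSIS IR is billed while it is running, so it is worth stopping it with Stop-AzDataFactoryV2IntegrationRuntime when it is not in use.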

Getting ready

To get started, do the following:

  1. Log in to https://portal.azure.com using your Azure credentials.
  2. Open a new PowerShell prompt. Execute the following command to log in to your Azure account from PowerShell:
    Connect-AzAccount
  3. You will need an existing Data Factory account. If you don't have one, create one by executing the ~/azure-data-engineering-cookbook\Chapter04\3_CreatingAzureDataFactory.ps1 PowerShell script.
  4. Provision an Azure storage account and upload files to it using ~/Chapter06/2_UploadDatatoAzureStorage.ps1.

How to do it…

We'll start...

Executing an SSIS package with an on-premises data store

An SSIS package may use an on-premises data store as its source or destination; for example, files in an on-premises file store may need to be uploaded to Azure SQL Database, or the source may be an on-premises database. In such cases, the Azure SSIS IR must be able to connect to the on-premises data store. There are two ways to do that:

  • Configure Azure SSIS to connect to on-premises using a Point-to-Site VPN, Site-to-Site VPN, or ExpressRoute. There are three steps to this:

    1) Set up a Point-to-Site VPN, Site-to-Site VPN, or ExpressRoute between on-premises and Azure.

    2) Create a virtual network and join the Azure SSIS IR with the virtual network.

    3) Create a virtual network gateway.

  • Configure Azure SSIS to use a self-hosted IR as a proxy to connect to on-premises.

In this recipe, we'll explore the second option. For more information on the first option, see https://docs.microsoft...
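
As a rough illustration of the second option, an existing Azure SSIS IR can be pointed at a self-hosted IR that acts as its proxy, with an Azure Blob Storage linked service used for staging. The sketch below assumes that the self-hosted IR and the staging linked service already exist, that the Azure SSIS IR is stopped while it is reconfigured, and that all names are placeholders.

```powershell
# Placeholder names (assumptions); replace with your own.
$resourceGroup        = 'my-resource-group'
$dataFactory          = 'my-data-factory'
$ssisIrName           = 'my-azure-ssis-ir'
$proxyIrName          = 'selfhosted-onpremise'      # self-hosted IR used as the proxy
$stagingLinkedService = 'my-blob-staging-service'   # Azure Blob Storage linked service
$stagingPath          = 'ssis-staging'              # blob container used for staging

# Point the Azure SSIS IR at the self-hosted IR and the staging location.
Set-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $resourceGroup `
    -DataFactoryName $dataFactory `
    -Name $ssisIrName `
    -DataProxyIntegrationRuntimeName $proxyIrName `
    -DataProxyStagingLinkedServiceName $stagingLinkedService `
    -DataProxyStagingPath $stagingPath

# Start the Azure SSIS IR again so that the new settings take effect.
Start-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $resourceGroup `
    -DataFactoryName $dataFactory `
    -Name $ssisIrName `
    -Force
```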
