Using the Lookup, Web, and Execute Pipeline activities

In this recipe, we shall implement error handling logic for our pipeline – similar to the previous recipe, but with a more sophisticated design: we shall isolate the error handling flow in its own pipeline. Our main parent pipeline will then call the child pipeline. This recipe also introduces three very useful activities to the user: Lookup, Web, and Execute Pipeline. The recipe will illustrate how to retrieve information from an Azure SQL table and how to invoke other Azure services from the pipeline.

Getting ready

We shall be using all the Azure services that are mentioned in the Technical requirements section at the beginning of the chapter. In addition, this recipe requires a table to store the email addresses of the status email recipients. Please refer to the Technical requirements section for the table creation scripts and instructions.

We shall be building a pipeline that sends an email in the case of failure. There is no activity in ADF capable of sending emails, so we shall be using the Azure Logic Apps service. Follow these steps to create an instance of this service:

  1. In the Azure portal, look for Logic Apps in Azure services. Then, use the Add button to create a new logic app.
  2. Name your logic app ADF-Email-LogicApp and fill in the Subscription, Resource Group, and Region information fields. Click on Create and wait until your logic app is deployed. Then, click on Go to Resource.
  3. In the Logic App Designer, select the When a HTTP request is received trigger:
    Figure 2.28 – HTTP trigger

  4. In the displayed tile, click on Use sample payload to generate schema, and use the following code block:
    {
      "subject": "<subject of the email message>",
      "messageBody": "<body of the email message>",
      "emailAddress": "<email-address>"
    }

    Enter the text in the textbox as shown in the following figure:

    Figure 2.29 – Configuring a logic app – the capture message body

  5. Click on the Next Step button and choose the email service that you want to use to send the notification emails. For the purposes of this tutorial, we shall use Gmail.

    Note:

    Even though we use Gmail for the purposes of this tutorial, you can also send emails using Office 365 Outlook or Outlook.com. In the See also section of this recipe, we included a link to a tutorial on how to send emails using those providers.

  6. Select Gmail from the list of services and Send Email from Actions. Log in with your account credentials:

    Figure 2.30 – Configuring a logic app – specifying an email service

  7. From the Add new parameter dropdown, check the Subject and Body checkboxes:
    Figure 2.31 – Configuring a logic app – specifying the Body, Subject, and Recipient fields

  8. Place your cursor inside the To text field and enter @{triggerBody()['emailAddress']}.
  9. In a similar way, enter @{triggerBody()['subject']} in the Subject text field.
  10. Finally, in the Body text box, enter @{triggerBody()['messageBody']}:
    Figure 2.32 – Configuring a logic app – specifying the To, Subject, and Body values

  11. Save your logic app. In the first tile, you should see that the HTTP POST URL field has been populated. This is the URL we'll use to invoke the logic app from the Data Factory pipeline, so copy it for later. A sketch of the payload the pipeline will send to this URL follows these steps.
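
The following is a minimal sketch of a request body that the Data Factory pipeline will POST to this URL. It matches the sample payload from step 4; the subject and message text come from the expression we will build in the child pipeline, and the email address shown here is only a placeholder:

    {
      "emailAddress": "<recipient email address>",
      "subject": "ADF Pipeline Failure",
      "messageBody": "ADF Pipeline Failed"
    }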

How to do it…

First, we shall create the child pipeline to retrieve the email addresses of the email recipients and send the status email:

  1. Create a new pipeline and name it pl_orchestration_recipe_5_child.
  2. From the Activities pane, select a Lookup activity and add it to the pipeline canvas. Configure it in the following way:

    (a) In the General tab, change the activity name to Get Email Recipients.

    (b) In the Settings tab, select AzureSQLTables as the value for Source dataset, and specify EmailRecipients for tableName.

    (c) Also, in the Settings tab, select the Use Query radio button and enter SELECT * FROM EmailRecipients into the text box. Make sure to uncheck the First row only checkbox at the bottom. Your Settings tab should look similar to the following figure:

    Figure 2.33 – The Get Email Recipients activity settings

  3. Next, add a ForEach activity to the canvas and configure it in the following way:

    In the Settings tab, enter @activity('Get Email Recipients').output.value into the Items textbox.

  4. Click on the pencil icon within the ForEach activity. This will open a new canvas. Add a Web activity onto this canvas.

    We shall now configure the Web activity. First, go to the General tab, and rename it Send Email. Then, in the URL text field, paste the URL for the logic app (which you created in the Getting ready section).

    From the Method dropdown, select POST.

    In the Headers section, click on the New button to add a header. Enter Content-Type into the Name text box and application/json into the Value textbox.

    In the Body text box, enter the following text (be sure to copy the quotes accurately):

    @json(concat('{"emailAddress": "', item().emailAddress, '", "subject": "ADF Pipeline Failure", "messageBody": "ADF Pipeline Failed"}'))

    Your Settings tab should look similar to Figure 2.34:

    Figure 2.34 – The Send Email activity settings

  5. Run this pipeline in Debug mode and verify that it works. You should have some test email addresses in the EmailRecipients table in order to test your pipeline. You can also verify that the email was sent out by going to the ADF-Email-LogicApp UI in the Azure portal and examining the run in the Overview pane:
    Figure 2.35 – Logic Apps portal view

  6. We are ready to design the parent pipeline, which will invoke the child pipeline we just tested. For this, clone the pipeline we designed in the Chaining and branching activities within your pipeline recipe. Rename your clone pl_orchestration_recipe_5_parent.
  7. In this pipeline, delete the On Failure Stored Procedure activity, and instead add an Execute Pipeline activity to the canvas. Connect it to the red square in the Copy From Blob to Azure SQL activity.
  8. Configure the Execute Pipeline activity:

    In the General tab, change the name to Send Email On Failure.

    In the Settings tab, specify the name of the invoked pipeline as pl_orchestration_recipe_5_child. (A JSON sketch of the resulting activity appears after these steps.)

  9. The parent pipeline should already be configured with the incorrect table name in the Copy activity sink (we deliberately misconfigured it in order to test the On Failure flow). Verify that this is still the case and run the pipeline in Debug mode:
    Figure 2.36 – Parent pipeline after execution with an incorrect tableName value

  10. Verify that the email was sent to the recipients.
  11. Publish your changes to save them.
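
If you open the parent pipeline's underlying JSON (available via the Code view in the authoring UI), the Send Email On Failure activity should look roughly like the following sketch. The activity and pipeline names are the ones used in this recipe; the exact JSON generated by the authoring UI may differ slightly:

    {
      "name": "Send Email On Failure",
      "type": "ExecutePipeline",
      "dependsOn": [
        {
          "activity": "Copy From Blob to Azure SQL",
          "dependencyConditions": [ "Failed" ]
        }
      ],
      "typeProperties": {
        "pipeline": {
          "referenceName": "pl_orchestration_recipe_5_child",
          "type": "PipelineReference"
        },
        "waitOnCompletion": true
      }
    }

The "Failed" dependency condition is the JSON counterpart of the red square connector: the child pipeline is invoked only when the Copy activity fails.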

How it works…

In this recipe, we introduced the concept of parent and child pipelines and used the pipeline hierarchy to incorporate error handling functionality. This technique offers several benefits:

  • It allows us to reuse existing pipelines.
  • It makes it easier to design/debug parts of the pipeline separately.
  • Finally, it lets users build workflows with more than 40 activities by splitting them across pipelines (Microsoft limits the number of activities in a single pipeline).

To craft the child pipeline, we started by adding a Lookup activity to retrieve a list of email recipients from the database table. This is a very common use for the Lookup activity: fetching a dataset for subsequent processing. In the configuration, we specified a query for the dataset retrieval: SELECT * FROM EmailRecipients. We can also use a more sophisticated query to filter the email recipients, or we can retrieve all the data by selecting the Table radio button. The ability to specify a query gives users a lot of choice and flexibility in filtering a dataset or using field projections with very little effort.
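
Because First row only was unchecked, the Lookup activity returns every matching row in a value array. As a sketch (assuming the EmailRecipients table exposes an emailAddress column, which the Web activity's body expression relies on), its output looks roughly like this:

    {
      "count": 2,
      "value": [
        { "emailAddress": "<first recipient>" },
        { "emailAddress": "<second recipient>" }
      ]
    }

The ForEach Items expression @activity('Get Email Recipients').output.value points at the value array, and inside the loop item().emailAddress resolves to a single recipient's address.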

The list of email recipients was processed by the ForEach activity. We encountered the ForEach activity in the previous recipe. However, inside the ForEach activity, we introduced a new kind of activity: the Web activity, which we configured to invoke a simple logic app. This illustrates the power of the Web activity: it enables the user to invoke external REST APIs without leaving the Data Factory pipeline.
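
For reference, a Web activity configured as in this recipe is represented in the pipeline JSON roughly as shown below. This is a sketch based on the settings from the How to do it… section; the authoring UI may serialize the body expression slightly differently:

    {
      "name": "Send Email",
      "type": "WebActivity",
      "typeProperties": {
        "url": "<HTTP POST URL of ADF-Email-LogicApp>",
        "method": "POST",
        "headers": {
          "Content-Type": "application/json"
        },
        "body": {
          "value": "@json(concat('{\"emailAddress\": \"', item().emailAddress, '\", \"subject\": \"ADF Pipeline Failure\", \"messageBody\": \"ADF Pipeline Failed\"}'))",
          "type": "Expression"
        }
      }
    }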

There's more…

There is another ADF activity that offers the user an option to integrate external APIs into a pipeline: the Webhook activity. It has a lot of similarities to the Web activity, with two major differences:

  • The Webhook activity always passes an implicit callBackUri property to the external service, along with the other parameters you specify in the request body. It then waits for the invoked web application to call back on that URI; if the callback is not received within the configurable timeout period, the Webhook activity fails. The Web activity does not have a callBackUri property, and while it does have a timeout, it is fixed at 1 minute and cannot be configured.

    This feature of the Webhook activity can be used to control the execution flow of the pipeline – for example, to wait for user input into a web form before proceeding with further steps.

  • The Web activity allows users to pass linked services and datasets. This can be used for data movement to a remote endpoint. The Webhook activity does not offer this capability.
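
As a hedged illustration, if a Webhook activity were configured with the same body as our Send Email Web activity, the invoked service would receive a payload roughly like the following; the callBackUri value is generated by Data Factory for each activity run, and the service is expected to POST to it to signal completion:

    {
      "emailAddress": "<recipient email address>",
      "subject": "ADF Pipeline Failure",
      "messageBody": "ADF Pipeline Failed",
      "callBackUri": "<URI generated by Data Factory for this activity run>"
    }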

See also

For more information about the Webhook activity, refer to the Microsoft documentation:

https://docs.microsoft.com/azure/data-factory/control-flow-webhook-activity

If you want to learn how to configure a logic app to send emails using providers other than Gmail, follow this tutorial:

https://docs.microsoft.com/azure/logic-apps/tutorial-process-email-attachments-workflow
