Chapter 4: Using Synapse Pipelines to Orchestrate Your Data

Bringing data into Synapse is definitely a big first step, but it's not the final destination. There are still many hurdles to cross before you can start adding any flavor to your data. A Synapse pipeline comprises datasets and activities, and a key advantage is that you can reuse the same dataset across various pipelines. Synapse supports various data stores and lets you transform your data without writing any code. In this chapter, we will learn how to create Azure Synapse pipelines to orchestrate your data.

In this chapter, we will cover the following topics:

  • Introducing Synapse pipelines
  • Creating linked services
  • Defining source and target datasets
  • Using various activities in Synapse pipelines
  • Scheduling Synapse pipelines
  • Creating pipelines using samples

Technical requirements

Before you start orchestrating your data, certain prerequisites apply, as outlined here:

  • You should have an Azure subscription, or contributor-level access to an existing subscription.
  • Create your Synapse workspace in this subscription. You can follow the instructions in Chapter 1, Introduction to Azure Synapse, to do so.
  • Create a Structured Query Language (SQL) pool and a Spark pool on Azure Synapse. This was covered in Chapter 2, Considerations for Your Compute Environment.
  • You must have an Azure Data Lake Storage Gen2 account with two containers, demozipfiles-ch04 and demozipfilestating-ch04, with read/write permissions.
  • Download the sample archive from http://bit.ly/ch04-prerequisites and extract it to get two zipped files, SampleUserData09262020.zip and SampleUserData09272020.zip.
  • Upload these two zipped files to the demozipfiles-ch04 container in your Azure Data Lake Storage...

Introducing Synapse pipelines

Synapse pipelines are used to perform Extract, Transform, and Load (ETL) operations on data. This service is similar to Azure Data Factory, but these pipelines can be created within Synapse Studio itself. In this section, we are going to learn how to create a pipeline that copies data from different sources to Azure Synapse Analytics. We will also see how we can use multiple activities within the same pipeline and create dependencies to connect one activity to another.

The following screenshot shows a Copy data activity in a Synapse pipeline:

Figure 4.1 – A screenshot of a Synapse pipeline in Synapse Studio

These pipelines comprise various components, and we are going to learn about these components in brief in the following sections.
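
Although we will build everything through the Synapse Studio UI, every pipeline is saved in the workspace as a JSON definition, and Synapse Studio lets you view this JSON from the pipeline canvas. The following is a minimal, illustrative sketch of a pipeline containing a single Copy data activity; the pipeline, activity, and dataset names are placeholders rather than objects we create in this chapter:

    {
      "name": "Pipeline_CopySample",
      "properties": {
        "activities": [
          {
            "name": "CopyFromDataLakeToSynapse",
            "type": "Copy",
            "inputs": [
              { "referenceName": "DS_SourceFiles", "type": "DatasetReference" }
            ],
            "outputs": [
              { "referenceName": "DS_SynapseTable", "type": "DatasetReference" }
            ],
            "typeProperties": {
              "source": {
                "type": "DelimitedTextSource",
                "storeSettings": { "type": "AzureBlobFSReadSettings" }
              },
              "sink": { "type": "SqlDWSink" }
            }
          }
        ]
      }
    }

The activities array can hold many activities, and the inputs and outputs arrays reference the datasets that the Copy data activity reads from and writes to.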

Integration runtime

An Integration Runtime (IR) is a compute infrastructure used by Azure Data Factory or Synapse pipelines to provide data...
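
We will not go deeper into integration runtimes here, but it is worth knowing that they, too, are described by small JSON definitions in the workspace. As a rough, illustrative sketch, assuming the default Azure integration runtime that every workspace ships with, it looks something like this:

    {
      "name": "AutoResolveIntegrationRuntime",
      "properties": {
        "type": "Managed",
        "typeProperties": {
          "computeProperties": {
            "location": "AutoResolve"
          }
        }
      }
    }

A self-hosted integration runtime, used to reach data stores inside a private network, uses the SelfHosted type instead.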

Creating linked services

Linked services define the connection information needed for a Synapse pipeline to connect to an external data source. A linked service is not specific to any one pipeline; the same linked service can be used by multiple pipelines at the same time if they share the same data source.

In this example, we are going to create a linked service for Azure SQL Database (which is our data source), with Synapse as our target.

Before we proceed with the steps to create the linked services for the source and target, make sure you have met all the technical requirements outlined at the start of this chapter. Then, proceed as follows (after the steps, we will take a quick look at the JSON definition behind a linked service):

  1. Launch Synapse Studio by clicking on the Synapse Studio link in your Synapse workspace.
  2. Click on Linked services under the Manage tab, and click on + New to create a new linked service, as illustrated in the following screenshot:
    Figure 4.8 – Creating linked services in Azure Synapse

  3. Select Azure Data Lake Storage...
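
Once a linked service has been created through this wizard, it is stored in the workspace as a JSON definition. The following is a minimal, illustrative sketch of an Azure Data Lake Storage Gen2 linked service that authenticates with an account key; the name, URL, and key are placeholders, and in practice you may prefer a managed identity or a secret kept in Azure Key Vault:

    {
      "name": "LS_DataLake_Gen2",
      "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
          "url": "https://<your-storage-account>.dfs.core.windows.net",
          "accountKey": {
            "type": "SecureString",
            "value": "<your-account-key>"
          }
        }
      }
    }

A linked service for Azure SQL Database follows the same pattern, with a type of AzureSqlDatabase and a connectionString property in place of the URL and key.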

Defining source and target datasets

Datasets are created in a pipeline to identify the data stored in your various data sources, in different formats such as tables, files, folders, and documents. A dataset can be used by multiple activities or pipelines.

Before we start applying any transformations to the data, we need the required datasets in place. Follow these instructions to create a dataset for the source (we will examine the JSON definition this produces after the steps):

  1. Go to the Data tab in Synapse Studio and click on + on the Data canvas, as highlighted in the following screenshot:
    Figure 4.12 – Creating a dataset in Synapse Studio

  2. Select Integration dataset from the dropdown, and select the required data store from the list of all available data stores appearing in the Integration dataset window. In this example, we are going to select Azure Data Lake Storage Gen2 as our data store, and then click on Continue.
  3. Select the DelimitedText format for your data from the list of all available options...
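
For reference, a DelimitedText dataset created on Azure Data Lake Storage Gen2 through these steps resolves to a JSON definition along the following lines. This is a minimal, illustrative sketch: the dataset and linked service names are placeholders, and the container is the one listed in the technical requirements:

    {
      "name": "DS_SourceZipFiles",
      "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
          "referenceName": "LS_DataLake_Gen2",
          "type": "LinkedServiceReference"
        },
        "typeProperties": {
          "location": {
            "type": "AzureBlobFSLocation",
            "fileSystem": "demozipfiles-ch04"
          },
          "columnDelimiter": ",",
          "firstRowAsHeader": true
        }
      }
    }

If the files in the container are zipped, the dataset also carries a compression setting (ZipDeflate) so that the pipeline can unzip them while copying.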

Using various activities in Synapse pipelines

Synapse pipelines give you the option to add a wide variety of activities; however, we will cover just a couple of them in this section. After the steps, we will also look at how such an activity appears in the pipeline's JSON. Proceed as follows:

  1. Navigate to the Integrate tab on Synapse Studio and click on + to select Pipeline out of the other available options, as illustrated in the following screenshot:
    Figure 4.18 – Creating a Synapse pipeline in Synapse Studio

  2. Fill in the name and description in the Properties window of the pipeline that you created in the preceding step and click on Publish all to save the changes.
  3. Let's add some activities to the canvas. We are going to select the Get Metadata activity from the list of all available activities to begin with, as illustrated in the following screenshot:
    Figure 4.19 – Adding the Get Metadata activity to the Synapse pipeline canvas

  4. Provide a name for this activity in the General tab. We are going to enter GetMetadataForZipFiles...
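
To give you a sense of what is produced behind the canvas, here is a minimal, illustrative sketch of how a Get Metadata activity and a dependent downstream activity appear inside the pipeline's activities array; the dataset name and the downstream Wait activity are placeholders used only to show the dependency:

    {
      "name": "GetMetadataForZipFiles",
      "type": "GetMetadata",
      "typeProperties": {
        "dataset": {
          "referenceName": "DS_SourceZipFiles",
          "type": "DatasetReference"
        },
        "fieldList": [ "childItems", "lastModified" ]
      }
    },
    {
      "name": "WaitForDemo",
      "type": "Wait",
      "dependsOn": [
        {
          "activity": "GetMetadataForZipFiles",
          "dependencyConditions": [ "Succeeded" ]
        }
      ],
      "typeProperties": { "waitTimeInSeconds": 1 }
    }

The dependsOn section is what Synapse Studio generates when you draw the success connector between two activities on the canvas, and downstream activities can read the collected metadata through an expression such as @activity('GetMetadataForZipFiles').output.childItems.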

Scheduling Synapse pipelines

Azure Synapse pipelines allow you to run a pipeline just once, or to trigger it manually whenever you need to run it. However, Synapse pipelines also enable you to schedule a pipeline to run at regular intervals.

With Synapse pipelines, scheduling a pipeline is just a matter of a few clicks. The following instructions will help you schedule your pipeline (the JSON definition behind such a trigger is shown after the steps):

  1. Go to the Triggers page under the Manage tab in Synapse Studio and click on + New at the top of the screen, as illustrated in the following screenshot:
    Figure 4.27 – A screenshot of the Triggers page in Synapse Studio

  2. Provide a name and description for your trigger. It's better to include your pipeline's name in the trigger's name so that, in case of any failure, it will be easy to identify the corresponding pipeline. The fields are shown in the following screenshot:
    Figure 4.28 – Creating a trigger for the Pipeline_Gen2_Synapse pipeline...
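
Like pipelines and datasets, the trigger you configure here is stored as a JSON definition. The following is a minimal, illustrative sketch of a schedule trigger that runs the Pipeline_Gen2_Synapse pipeline once a day; the trigger name and start time are placeholders:

    {
      "name": "TRG_Pipeline_Gen2_Synapse_Daily",
      "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
          "recurrence": {
            "frequency": "Day",
            "interval": 1,
            "startTime": "2021-06-01T00:00:00Z",
            "timeZone": "UTC"
          }
        },
        "pipelines": [
          {
            "pipelineReference": {
              "referenceName": "Pipeline_Gen2_Synapse",
              "type": "PipelineReference"
            }
          }
        ]
      }
    }

Keep in mind that a trigger only begins firing after it has been started and the changes have been published to the workspace.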

Creating pipelines using samples

Synapse provides various sample pipelines that can help you build a production-ready pipeline in just a few steps.

We will go through the following steps to create pipelines using samples provided by Synapse:

  1. Go to the Integrate tab on the Synapse Studio screen.
  2. Go to the sample center by clicking on Browse samples, as highlighted in the following screenshot:
    Figure 4.31 – A screenshot of the Browse samples link under the Integrate tab in Synapse Studio

  3. You can see sample datasets, notebooks, and SQL scripts in the sample center. Let's try to use one of the sample notebooks. Go to the Notebooks section, select Getting Started with Delta Lake, and click on Continue. The following screenshot provides an overview of the sample center:

    Figure 4.32 – A screenshot of the sample center in Synapse Studio

  4. On the next screen, you can see a preview of the notebook that you selected. Click on Next after going...

Summary

So far, we have learned how to create linked services, datasets, pipelines, and triggers. We learned how we can use multiple activities together in a pipeline, and we gained a fair understanding of variables and parameters in Synapse pipelines. Synapse provides sample pipelines, and because it's important to know how to use them, we also covered how to get started with these samples in this chapter.

Synapse supports a wide range of data stores and many ways to transform your data, but we could only cover a couple of transformations in this chapter. However, now that you are comfortable with Synapse pipelines, it will be easy for you to add any activity to a pipeline as per your business requirements. You can go to http://bit.ly/transform-data-on-synapse if you want to learn more about any specific activity.

We will talk about a couple of other activities throughout the book that will give you more clarity on Synapse pipelines.

In...
