Chapter 4: Using Synapse Pipelines to Orchestrate Your Data

Bringing data into Synapse is definitely a big first step, but it's not the final destination. There are still many hurdles to cross before you can start adding any flavor to your data. A Synapse pipeline comprises datasets and activities, and a key advantage is that you can reuse the same dataset across various pipelines. Synapse supports various data stores and lets you transform your data without writing any code. In this chapter, we will learn how to create Azure Synapse pipelines to orchestrate your data.

In this chapter, we will cover the following topics:

  • Introducing Synapse pipelines
  • Creating linked services
  • Defining source and target datasets
  • Using various activities in Synapse pipelines
  • Scheduling Synapse pipelines
  • Creating pipelines using samples

Technical requirements

Before you start orchestrating your data, certain prerequisites apply, as outlined here:

  • You should have an Azure subscription, or contributor-level access to an existing subscription.
  • Create your Synapse workspace in this subscription. You can follow the instructions in Chapter 1, Introduction to Azure Synapse, to do so.
  • Create a Structured Query Language (SQL) pool and a Spark pool on Azure Synapse. This was covered in Chapter 2, Considerations for Your Compute Environment.
  • You must have an Azure Data Lake Storage Gen2 account with two containers, demozipfiles-ch04 and demozipfilestating-ch04, with read/write permissions.
  • Download the sample archive from http://bit.ly/ch04-prerequisites and extract it to get two zipped files, SampleUserData09262020.zip and SampleUserData09272020.zip.
  • Upload these two zipped files to the demozipfiles-ch04 container in your Azure Data Lake Storage...

Introducing Synapse pipelines

Synapse pipelines are used to perform Extract, Transform, and Load (ETL) operations on data. This service is similar to Azure Data Factory, but these pipelines can be created within Synapse Studio itself. In this section, we are going to learn how to create a pipeline that copies data from different sources to Azure Synapse Analytics. We will also see how we can use multiple activities within the same pipeline and create dependencies to connect one activity to another.

The following screenshot shows a Copy data activity in a Synapse pipeline:

Figure 4.1 – A screenshot of a Synapse pipeline in Synapse Studio

These pipelines comprise various components, and we are going to learn about these components in brief in the following sections.
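
Although we will build everything through the Synapse Studio UI, every pipeline is saved in the workspace as a JSON definition, and Synapse Studio lets you view this JSON from the pipeline canvas. The following is a minimal, illustrative sketch of a pipeline containing a single Copy data activity; the pipeline, activity, and dataset names are placeholders rather than objects we create in this chapter:

    {
      "name": "Pipeline_CopySample",
      "properties": {
        "activities": [
          {
            "name": "CopyFromDataLakeToSynapse",
            "type": "Copy",
            "inputs": [
              { "referenceName": "DS_SourceFiles", "type": "DatasetReference" }
            ],
            "outputs": [
              { "referenceName": "DS_SynapseTable", "type": "DatasetReference" }
            ],
            "typeProperties": {
              "source": {
                "type": "DelimitedTextSource",
                "storeSettings": { "type": "AzureBlobFSReadSettings" }
              },
              "sink": { "type": "SqlDWSink" }
            }
          }
        ]
      }
    }

The activities array can hold many activities, and the inputs and outputs arrays reference the datasets that the Copy data activity reads from and writes to.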

Integration runtime

An Integration Runtime (IR) is a compute infrastructure used by Azure Data Factory or Synapse pipelines to provide data...
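
We will not go deeper into integration runtimes here, but it is worth knowing that they, too, are described by small JSON definitions in the workspace. As a rough, illustrative sketch, assuming the default Azure integration runtime that every workspace ships with, it looks something like this:

    {
      "name": "AutoResolveIntegrationRuntime",
      "properties": {
        "type": "Managed",
        "typeProperties": {
          "computeProperties": {
            "location": "AutoResolve"
          }
        }
      }
    }

A self-hosted integration runtime, used to reach data stores inside a private network, uses the SelfHosted type instead.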

Creating linked services

Linked services define the connection information needed for a Synapse pipeline to connect to an external data source. A linked service is not specific to any one pipeline; the same linked service can be used by multiple pipelines at the same time if they share the same data source.

In this example, we are going to create a linked service for Azure SQL Database (which is our data source), with Synapse as our target.

Before we proceed with the steps to create the linked services for the source and target, make sure you have met all the technical requirements outlined at the start of this chapter. Then, proceed as follows (after the steps, we will take a quick look at the JSON definition behind a linked service):

  1. Launch Synapse Studio by clicking on the Synapse Studio link in your Synapse workspace.
  2. Click on Linked services under the Manage tab, and click on + New to create a new linked service, as illustrated in the following screenshot:
    Figure 4.8 – Creating linked services in Azure Synapse

  3. Select Azure Data Lake Storage...
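
Once a linked service has been created through this wizard, it is stored in the workspace as a JSON definition. The following is a minimal, illustrative sketch of an Azure Data Lake Storage Gen2 linked service that authenticates with an account key; the name, URL, and key are placeholders, and in practice you may prefer a managed identity or a secret kept in Azure Key Vault:

    {
      "name": "LS_DataLake_Gen2",
      "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
          "url": "https://<your-storage-account>.dfs.core.windows.net",
          "accountKey": {
            "type": "SecureString",
            "value": "<your-account-key>"
          }
        }
      }
    }

A linked service for Azure SQL Database follows the same pattern, with a type of AzureSqlDatabase and a connectionString property in place of the URL and key.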

Defining source and target datasets

Datasets are created in a pipeline to identify the data stored in your various data sources, in different formats such as tables, files, folders, and documents. A dataset can be used by multiple activities or pipelines.

Before we start applying any transformations to the data, we need the required datasets in place. Follow these instructions to create a dataset for the source (we will examine the JSON definition this produces after the steps):

  1. Go to the Data tab in Synapse Studio and click on + on the Data canvas, as highlighted in the following screenshot:
    Figure 4.12 – Creating a dataset in Synapse Studio

  2. Select Integration dataset from the dropdown, and select the required data store from the list of all available data stores appearing in the Integration dataset window. In this example, we are going to select Azure Data Lake Storage Gen2 as our data store, and then click on Continue.
  3. Select the DelimitedText format for your data from the list of all available options...
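
For reference, a DelimitedText dataset created on Azure Data Lake Storage Gen2 through these steps resolves to a JSON definition along the following lines. This is a minimal, illustrative sketch: the dataset and linked service names are placeholders, and the container is the one listed in the technical requirements:

    {
      "name": "DS_SourceZipFiles",
      "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
          "referenceName": "LS_DataLake_Gen2",
          "type": "LinkedServiceReference"
        },
        "typeProperties": {
          "location": {
            "type": "AzureBlobFSLocation",
            "fileSystem": "demozipfiles-ch04"
          },
          "columnDelimiter": ",",
          "firstRowAsHeader": true
        }
      }
    }

If the files in the container are zipped, the dataset also carries a compression setting (ZipDeflate) so that the pipeline can unzip them while copying.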

Using various activities in Synapse pipelines

Synapse pipelines give you the option to add a wide variety of activities; however, we will cover just a couple of them in this section. After the steps, we will also look at how such an activity appears in the pipeline's JSON. Proceed as follows:

  1. Navigate to the Integrate tab on Synapse Studio and click on + to select Pipeline out of the other available options, as illustrated in the following screenshot:
    Figure 4.18 – Creating a Synapse pipeline in Synapse Studio

  2. Fill in the name and description in the Properties window of the pipeline that you created in the preceding step and click on Publish all to save the changes.
  3. Let's add some activities to the canvas. We are going to select the Get Metadata activity from the list of all available activities to begin with, as illustrated in the following screenshot:
    Figure 4.19 – Adding the Get Metadata activity to the Synapse pipeline canvas

  4. Provide a name for this activity in the General tab. We are going to enter GetMetadataForZipFiles...
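
To give you a sense of what is produced behind the canvas, here is a minimal, illustrative sketch of how a Get Metadata activity and a dependent downstream activity appear inside the pipeline's activities array; the dataset name and the downstream Wait activity are placeholders used only to show the dependency:

    {
      "name": "GetMetadataForZipFiles",
      "type": "GetMetadata",
      "typeProperties": {
        "dataset": {
          "referenceName": "DS_SourceZipFiles",
          "type": "DatasetReference"
        },
        "fieldList": [ "childItems", "lastModified" ]
      }
    },
    {
      "name": "WaitForDemo",
      "type": "Wait",
      "dependsOn": [
        {
          "activity": "GetMetadataForZipFiles",
          "dependencyConditions": [ "Succeeded" ]
        }
      ],
      "typeProperties": { "waitTimeInSeconds": 1 }
    }

The dependsOn section is what Synapse Studio generates when you draw the success connector between two activities on the canvas, and downstream activities can read the collected metadata through an expression such as @activity('GetMetadataForZipFiles').output.childItems.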

Scheduling Synapse pipelines

Azure Synapse pipelines allow you to run a pipeline just once, or to trigger it manually whenever you need to run it. However, Synapse pipelines also enable you to schedule a pipeline to run at regular intervals.

With Synapse pipelines, scheduling a pipeline is just a matter of a few clicks. The following instructions will help you schedule your pipeline (the JSON definition behind such a trigger is shown after the steps):

  1. Go to the Triggers page under the Manage tab in Synapse Studio and click on + New at the top of the screen, as illustrated in the following screenshot:
    Figure 4.27 – A screenshot of the Triggers page in Synapse Studio

  2. Provide a name and description for your trigger. It's better to include your pipeline's name in the trigger's name so that, in case of any failure, it will be easy to identify the corresponding pipeline. The fields are shown in the following screenshot:
    Figure 4.28 – Creating a trigger for the Pipeline_Gen2_Synapse pipeline...
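
Like pipelines and datasets, the trigger you configure here is stored as a JSON definition. The following is a minimal, illustrative sketch of a schedule trigger that runs the Pipeline_Gen2_Synapse pipeline once a day; the trigger name and start time are placeholders:

    {
      "name": "TRG_Pipeline_Gen2_Synapse_Daily",
      "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
          "recurrence": {
            "frequency": "Day",
            "interval": 1,
            "startTime": "2021-06-01T00:00:00Z",
            "timeZone": "UTC"
          }
        },
        "pipelines": [
          {
            "pipelineReference": {
              "referenceName": "Pipeline_Gen2_Synapse",
              "type": "PipelineReference"
            }
          }
        ]
      }
    }

Keep in mind that a trigger only begins firing after it has been started and the changes have been published to the workspace.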

Creating pipelines using samples

Synapse provides various sample pipelines that can help you build a production-ready pipeline in just a few steps.

We will go through the following steps to create pipelines using samples provided by Synapse:

  1. Go to the Integrate tab on the Synapse Studio screen.
  2. Go to the sample center by clicking on Browse samples, as highlighted in the following screenshot:
    Figure 4.31 – A screenshot of the Browse samples link under the Integrate tab in Synapse Studio

  3. You can see sample datasets, notebooks, and SQL scripts in the sample center. Let's try to use one of the sample notebooks. Go to the Notebooks section, select Getting Started with Delta Lake, and click on Continue. The following screenshot provides an overview of the sample center:

    Figure 4.32 – A screenshot of the sample center in Synapse Studio

  4. On the next screen, you can see a preview of the notebook that you selected. Click on Next after going...

Summary

So far, we have learned how to create linked services, datasets, pipelines, and triggers. We learned how we can use multiple activities together in a pipeline, and we gained a fair understanding of variables and parameters in Synapse pipelines. Synapse provides sample pipelines, and because it's important to know how to use them, we also covered how to get started with these samples in this chapter.

Synapse supports a wide range of data stores and many ways to transform your data, but we could only cover a couple of transformations in this chapter. However, now that you are comfortable with Synapse pipelines, it will be easy for you to add any activity to a pipeline as per your business requirements. You can go to http://bit.ly/transform-data-on-synapse if you want to learn more about any specific activity.

We will talk about a couple of other activities throughout the book that will give you more clarity on Synapse pipelines.

In...
