
You're reading from Limitless Analytics with Azure Synapse (1st Edition), published by Packt in June 2021. ISBN-13: 9781800205659. Reading level: Beginner.
Author: Prashant Kumar Mishra

Prashant Kumar Mishra is an engineering architect at Microsoft. He has more than 10 years of professional expertise in the Microsoft data and AI segment as a developer, consultant, and architect. He has been focused on Microsoft Azure Cloud technologies for several years now and has helped various customers in their data journey. He prefers to share his knowledge with others to make the data community stronger day by day through his blogs and meetup groups.

Chapter 3: Bringing Your Data to Azure Synapse

Data has been the backbone of many leading enterprises for decades. You can bring data from a wide range of sources into Azure Synapse through several means and start analyzing it immediately.

So far, you have learned about the Synapse workspace, as well as the architecture and components of Synapse SQL and Synapse Spark. You can create your SQL or Spark pool on Azure Synapse. So now, it's time to take the next step and bring your data to Azure Synapse.

The following topics will be the focus of this chapter, along with the various concepts that you must be aware of before you decide which method to use:

  • Using Synapse pipelines to import data
  • Using Azure Data Factory to import data
  • Using SQL Server Integration Services to import data
  • Using a COPY statement to import data

Technical requirements

To follow the instructions in the sections below, a number of prerequisites need to be fulfilled before we proceed:

  • You should have an Azure subscription, or access to another subscription with contributor-level access.
  • Create a Synapse workspace in this subscription. You can follow the instructions in Chapter 1, Introduction to Azure Synapse, to create your Synapse workspace.
  • Create a SQL pool and a Spark pool on Azure Synapse. This was covered in Chapter 2, Considerations for Your Compute Environment.
  • You must have created a storage account, or have the requisite permissions to access Azure Data Lake Storage. You can go to the following link, https://azure.microsoft.com/en-us/resources/videos/creating-your-first-adls-gen2-data-lake/, to create a new storage account if you are doing so for the first time.
  • You must have SQL Server installed on your machine to follow this chapter. If you do not have a SQL...

Using Synapse pipelines to import data

Data ingestion is one of the most critical aspects of data analytics, and there are many tools available for data movement. The challenge is deciding which tool is the most efficient for your environment. In this section, we are going to show how you can use the built-in orchestration tools available in Synapse for data ingestion. There are two options available under the Orchestrate tab. The first is Pipeline, which can be used for data ingestion but also allows you to add transformation logic. The second is the Copy Data tool, which only moves data, without implementing any transformation logic. In a nutshell, use the Copy Data tool if you need a pipeline just for data ingestion, and use Pipeline if you also need to apply business logic to your data.

The following screenshot shows the Integrate hub of Synapse Studio, where we are going to use the Copy Data tool or Pipeline options...
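Behind the Studio UI, both options produce a pipeline that Synapse stores as a JSON definition. As a rough, hand-simplified illustration (you never need to author this yourself — the Copy Data tool generates it for you), a minimal pipeline with a single Copy activity looks something like the following; the pipeline and dataset names here are hypothetical:

```json
{
  "name": "CopySalesDataPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyFromDataLakeToSqlPool",
        "type": "Copy",
        "inputs": [
          { "referenceName": "SourceDataLakeDataset", "type": "DatasetReference" }
        ],
        "outputs": [
          { "referenceName": "SinkSqlPoolDataset", "type": "DatasetReference" }
        ],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "SqlDWSink", "allowCopyCommand": true }
        }
      }
    ]
  }
}
```

The `inputs` and `outputs` reference dataset definitions (created for you by the wizard) that hold the actual connection details for the source and destination.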

Bringing data to your Synapse SQL pool using Copy Data tool

The Copy Data tool makes it very easy to bring your data to Azure Synapse. It is not much different from using the Copy activity in Azure Data Factory, except that you do not have to spin up a separate service for data ingestion in Azure Synapse. Make sure you have fulfilled all of the technical requirements before you start following these steps:

  1. Click on the Copy Data tool, as highlighted in Figure 3.1. This will open a new window where you need to provide the source and destination connection details.
  2. Provide an appropriate name for your pipeline, along with a brief description.
  3. You can choose to run this pipeline once only, or you can schedule it to run regularly. For this example, we are going to schedule our pipeline to run on a daily basis.

    Click on Run regularly on schedule and select the Schedule trigger type.

  4. Provide an appropriate value for Start Date (UTC). This is auto-populated with the current date...
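For reference, the daily schedule configured in step 3 corresponds to a schedule trigger definition along these lines (the trigger name and start time below are placeholders):

```json
{
  "name": "DailyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2021-06-01T00:00:00Z",
        "timeZone": "UTC"
      }
    }
  }
}
```

Changing the frequency in the wizard (hourly, weekly, and so on) simply changes the `frequency` and `interval` values in this recurrence block.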

Using Azure Data Factory to import data

Data Factory pipelines and Synapse pipelines have almost identical features. The only major difference lies in how you create your pipeline: you need to spin up another resource in Azure if you want to use Data Factory for data ingestion, whereas you can create pipelines within Synapse directly without leaving your Synapse workspace.

As we have already covered the Copy Data tool in Synapse, which works exactly like the Copy Data tool in Data Factory, in this section we will create a pipeline in Data Factory to bring data to Synapse. Make sure you have satisfied all the prerequisites mentioned in the Technical requirements section:

  1. Log in to the Azure portal, at https://portal.azure.com.
  2. Click on Create a resource on the Azure home page and search for Data Factory in the Search Marketplace bar.
  3. Select Data Factory from the search results and then click on Create.
  4. Select the subscription and resource group where you want to create your...
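If you prefer infrastructure-as-code over clicking through the portal, the same Data Factory resource can also be declared in an ARM template. A minimal fragment would look roughly like this (the factory name and region are placeholders):

```json
{
  "type": "Microsoft.DataFactory/factories",
  "apiVersion": "2018-06-01",
  "name": "myDataFactoryName",
  "location": "eastus",
  "identity": { "type": "SystemAssigned" }
}
```

Either way, the result is the same: a Data Factory instance in your chosen resource group, ready for you to author pipelines in.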

Using SQL Server Integration Services to import data

SSIS is an ETL tool used for data ingestion and orchestration. It is included with the SQL Server license, so if you already have a SQL Server license, you may want to use SSIS as your ETL tool instead of paying for another ETL service. However, it is important to understand the pros and cons of using SSIS packages rather than an Azure PaaS service such as Data Factory.

So far, we have covered Synapse pipelines and Data Factory, and now it's time to learn how to bring data to Azure Synapse using SQL Server Integration Services. Follow these steps to create your SSIS package in Visual Studio:

  1. Launch Visual Studio and create a new Integration Services project in Visual Studio.
  2. Enter the project name and provide a file location for saving SSIS packages.
  3. Add Data Flow Task from the SSIS Toolbox to the canvas.
  4. Go to the Connection Managers window and right-click to select...

Using a COPY statement to import data

There are various ways to bring data from different sources to Azure Synapse SQL. However, it is recommended that you use a COPY statement if your data resides in an Azure Storage account. The best part of this technique is that you can copy the data by running a single T-SQL statement. The syntax for the COPY statement is very simple, with a set of arguments to choose from; you decide which arguments to use with your COPY statement.

The following is the syntax that can be customized as per your business requirements to bring the data from an external source to the Azure Synapse SQL pool:

COPY INTO [schema.]table_name
[(Column_list)] 
FROM '<external_location>' [,...n]
WITH  
 ( 
 [FILE_TYPE = {'CSV' | 'PARQUET' | 'ORC'} ]
 [,FILE_FORMAT = EXTERNAL FILE FORMAT OBJECT ]
 [,CREDENTIAL = (AZURE CREDENTIAL) ]
 [,ERRORFILE = '[http(s)://storageaccount...
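As a minimal sketch of the syntax above — assuming a hypothetical storage account (mystorageaccount) and container (mycontainer) holding CSV files, and a table dbo.Sales that already exists in your SQL pool — a concrete COPY statement could look like this:

```sql
COPY INTO dbo.Sales
FROM 'https://mystorageaccount.blob.core.windows.net/mycontainer/sales/*.csv'
WITH
(
    FILE_TYPE = 'CSV',
    -- authenticate with the storage account key; a placeholder secret is shown here
    CREDENTIAL = (IDENTITY = 'Storage Account Key', SECRET = '<your-storage-account-key>'),
    -- skip the header row of each CSV file
    FIRSTROW = 2
);
```

All of the account, container, and table names here are illustrative; substitute your own, and choose the credential type that matches how you authenticate to your storage account.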

Summary

In this chapter, we covered various ways to bring your data to Azure Synapse. We will be using these techniques in upcoming chapters as your requirements dictate. You are the best person to decide which tool is the best fit for your business, but it is important to understand all of these tools before you commit to any one of them.

In this chapter, we have covered data ingestion without any orchestration. However, in the next chapter you will learn how to orchestrate your data in Azure Synapse by using Synapse pipelines.

