Chapter 3: Analyzing Data with Azure Synapse Analytics

Azure Synapse Analytics, formerly known as Azure SQL Data Warehouse, combines data warehousing and big data analytics to provide a unified experience for extracting, loading, and transforming data. Azure Synapse provides the following features: Synapse SQL for T-SQL-based analytics (SQL pools and SQL on-demand), Spark analytics, Synapse pipelines, and Synapse Studio. At the time of writing this book, all features except Synapse SQL are in preview.

Azure Synapse Analytics is worth learning because data warehousing is a core part of data engineering and big data solutions.

Azure Synapse SQL can be used to quickly load data from sources and perform transformations. Transformation queries run fast because Azure Synapse SQL uses a massively parallel processing (MPP) architecture to process them.

In this chapter, we'll cover the following recipes:

  • Provisioning and connecting to an Azure Synapse SQL pool using PowerShell
  • Pausing or resuming a Synapse SQL pool using PowerShell
  • Scaling an Azure Synapse SQL pool instance using PowerShell
  • Loading data into a SQL pool using PolyBase with T-SQL
  • Loading data into a SQL pool using the COPY INTO statement
  • Implementing workload management in an Azure Synapse SQL pool
  • Optimizing queries using materialized views in Azure Synapse Analytics

Technical requirements

The following tools are required for the recipes in this chapter:

  • A Microsoft Azure subscription
  • PowerShell 7
  • Microsoft Azure PowerShell
  • SQL Server Management Studio (SSMS) or Azure Data Studio

Provisioning and connecting to an Azure Synapse SQL pool using PowerShell

In this recipe, we'll provision an Azure Synapse SQL pool using PowerShell. Provisioning a Synapse SQL pool uses the same commands that are used for provisioning an Azure SQL database, but with different parameters.

Getting ready

Before you start, log in to Azure from PowerShell. To do this, execute the following command and follow the instructions to log in to Azure:

Connect-AzAccount

How to do it…

Follow the given steps to provision a new Azure Synapse SQL pool:

  1. Execute the following command to create a new resource group:
    #Create resource group
    New-AzResourceGroup -Name packtade -Location centralus
  2. Execute the following command to create a new Azure SQL server:
    #Create credential object for the Azure SQL Server admin credential
    $sqladminpassword = ConvertTo-SecureString 'Sql@Server@1234' -AsPlainText -Force
    $sqladmincredential = New-Object System.Management.Automation...
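The credential object is then used to create the logical SQL server and the Synapse SQL pool itself. The following is a minimal sketch of how the remaining commands might look; the admin login name (sqladmin), server name (packtadesqlserver), pool name (packtdw), and client IP address are hypothetical placeholders, and the cmdlets are the same Az.Sql cmdlets used to provision an Azure SQL database:

#Complete the admin credential object from step 2 (login name is a placeholder)
$sqladminpassword = ConvertTo-SecureString 'Sql@Server@1234' -AsPlainText -Force
$sqladmincredential = New-Object System.Management.Automation.PSCredential('sqladmin', $sqladminpassword)

#Create the logical Azure SQL server that hosts the Synapse SQL pool
New-AzSqlServer -ResourceGroupName packtade -ServerName packtadesqlserver -Location centralus -SqlAdministratorCredentials $sqladmincredential

#Allow your client IP through the server firewall (replace with your own IP address)
New-AzSqlServerFirewallRule -ResourceGroupName packtade -ServerName packtadesqlserver -FirewallRuleName AllowClient -StartIpAddress 1.2.3.4 -EndIpAddress 1.2.3.4

#Create the Synapse SQL pool as a database with the DataWarehouse edition at DW100c
New-AzSqlDatabase -ResourceGroupName packtade -ServerName packtadesqlserver -DatabaseName packtdw -Edition DataWarehouse -RequestedServiceObjectiveName DW100c

Once the pool is online, you can connect to packtadesqlserver.database.windows.net from SSMS or Azure Data Studio with the sqladmin credentials and select the packtdw database.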

Pausing or resuming a Synapse SQL pool using PowerShell

In an Azure Synapse SQL pool, the compute and storage are decoupled and are therefore charged separately. This means that if we aren't running any queries and are not using the compute, we can pause the Synapse SQL pool to save on compute cost. We'll still be charged for the storage.

A data warehouse is typically used only when it needs to be refreshed with new data or when computations are required to prepare summary tables for reports. We can therefore save compute costs by pausing the pool when it isn't being used.

In this recipe, we'll learn how to pause and resume a Synapse SQL pool.

Getting ready

Before you start, log in to Azure from PowerShell. To do this, execute the following command and follow the instructions to log in to Azure:

Connect-AzAccount

You need a Synapse SQL pool to perform the steps in this recipe. If you don't have an existing Synapse SQL pool, you can create one using the steps from the Provisioning and connecting to an Azure Synapse SQL pool using PowerShell recipe.
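The following is a minimal sketch of the pause and resume commands, assuming the hypothetical packtade resource group, packtadesqlserver server, and packtdw pool names from the provisioning sketch:

#Check the current state of the pool; Status returns Online or Paused
(Get-AzSqlDatabase -ResourceGroupName packtade -ServerName packtadesqlserver -DatabaseName packtdw).Status

#Pause the pool to stop compute billing; storage continues to be billed
Suspend-AzSqlDatabase -ResourceGroupName packtade -ServerName packtadesqlserver -DatabaseName packtdw

#Resume the pool when the data warehouse is needed again
Resume-AzSqlDatabase -ResourceGroupName packtade -ServerName packtadesqlserver -DatabaseName packtdw

Note that pausing cancels any in-flight queries, so make sure no critical loads are running before you pause the pool.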

Scaling an Azure Synapse SQL pool instance using PowerShell

An Azure Synapse SQL pool can be scaled up or down according to usage or workload requirements. For example, consider a scenario where we are at performance level DW100c and a new workload requires a higher performance level. To support the new workload, we can scale up to a higher performance level, say DW400c, and scale back down to DW100c when the workload finishes.

In this recipe, we'll learn how to scale up or scale down an Azure Synapse SQL pool using PowerShell.

Getting ready

Before you start, log in to Azure from PowerShell. To do this, execute the following command and follow the instructions to log in to Azure:

Connect-AzAccount

You need a Synapse SQL pool to perform the steps in the recipe. If you don't have an existing Synapse SQL pool, you can create one using the steps from the Provisioning and connecting to an Azure Synapse SQL pool using PowerShell recipe.

How to do it…
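The following is a minimal sketch of the scale-up and scale-down commands, again assuming the hypothetical packtade resource group, packtadesqlserver server, and packtdw pool names; Set-AzSqlDatabase is the same cmdlet used to change the service objective of an Azure SQL database:

#Scale up to DW400c to support the heavier workload
Set-AzSqlDatabase -ResourceGroupName packtade -ServerName packtadesqlserver -DatabaseName packtdw -RequestedServiceObjectiveName DW400c

#Scale back down to DW100c once the workload finishes
Set-AzSqlDatabase -ResourceGroupName packtade -ServerName packtadesqlserver -DatabaseName packtdw -RequestedServiceObjectiveName DW100c

The pool must be online (not paused) to scale, and scaling briefly takes it offline while compute is reallocated, cancelling any running queries.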

Loading data into a SQL pool using PolyBase with T-SQL

PolyBase allows you to query external data in Hadoop, Azure Blob storage, or Azure Data Lake Storage from SQL Server using T-SQL. In this recipe, we'll import data from a CSV file in an Azure Data Lake Storage account into an Azure Synapse SQL pool using PolyBase.

Getting ready

Before you start, log in to Azure from PowerShell. To do this, execute the following command and follow the instructions to log in to Azure:

Connect-AzAccount

You need a Synapse SQL pool to perform the steps in the recipe. If you don't have an existing Synapse SQL pool, you can create one using the steps from the Provisioning and connecting to an Azure Synapse SQL pool using PowerShell recipe.

How to do it…

Follow the given steps to import data into Azure Synapse SQL using PolyBase:

  1. Execute the following command to create an Azure Data Lake Storage account and upload the data:
    #Create a new Azure Data Lake Storage...
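The following is a minimal sketch of how the rest of the load might look: it creates a Data Lake Storage Gen2 account and then defines the PolyBase objects (a database scoped credential, an external data source, a file format, and an external table) before loading the data with CREATE TABLE AS SELECT. The storage account, container, table, and column names, the passwords, and the storage key are hypothetical placeholders; the T-SQL is executed through Invoke-Sqlcmd from the SqlServer PowerShell module, but you can run the same statements from SSMS:

#Create a Data Lake Storage Gen2 account (hierarchical namespace enabled); the name is a placeholder
New-AzStorageAccount -ResourceGroupName packtade -Name packtadedatalake -Location centralus -SkuName Standard_LRS -Kind StorageV2 -EnableHierarchicalNamespace $true

#T-SQL to define the PolyBase objects and load the data; assumes orders.csv has been uploaded to the orders folder of the data container
$polybaseSql = @"
-- Create a master key if the database doesn't already have one
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Str0ng@Passw0rd1';

-- Credential holding the storage account key (placeholder secret)
CREATE DATABASE SCOPED CREDENTIAL adls_credential
WITH IDENTITY = 'user', SECRET = '<storage-account-key>';

-- External data source pointing at the data container of the storage account
CREATE EXTERNAL DATA SOURCE adls_source
WITH (TYPE = HADOOP,
      LOCATION = 'abfss://data@packtadedatalake.dfs.core.windows.net',
      CREDENTIAL = adls_credential);

-- CSV file format: comma-delimited, skip the header row
CREATE EXTERNAL FILE FORMAT csv_format
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2));

-- External table over the CSV file(s); columns are hypothetical
CREATE EXTERNAL TABLE dbo.orders_external
(orderid INT, orderdate DATE, amount DECIMAL(10, 2))
WITH (LOCATION = '/orders/', DATA_SOURCE = adls_source, FILE_FORMAT = csv_format);

-- Load the data into the SQL pool with CREATE TABLE AS SELECT
CREATE TABLE dbo.orders
WITH (DISTRIBUTION = ROUND_ROBIN)
AS SELECT * FROM dbo.orders_external;
"@

Invoke-Sqlcmd -ServerInstance 'packtadesqlserver.database.windows.net' -Database 'packtdw' -Username 'sqladmin' -Password 'Sql@Server@1234' -Query $polybaseSql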

Loading data into a SQL pool using the COPY INTO statement

The COPY INTO statement provides a faster and easier way to bulk insert data from Azure storage. We can use one T-SQL COPY INTO statement to ingest data instead of creating multiple database objects. At the time of writing this book, the COPY INTO statement is in preview.

In this recipe, we'll use the COPY INTO statement to load data into an Azure Synapse SQL pool.

Getting ready

Before you start, log in to Azure from PowerShell. To do this, execute the following command and follow the instructions to log in to Azure:

Connect-AzAccount

You need a Synapse SQL pool to perform the steps in the recipe. If you don't have an existing Synapse SQL pool, you can create one using the steps from the Provisioning and connecting to an Azure Synapse SQL pool using PowerShell recipe.

How to do it…

Follow the given steps to import data into a Synapse SQL pool from Azure Data Lake Storage Gen2:

    ...
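The following is a minimal sketch of the load. The storage account, container, file, table, and column names and the storage key are hypothetical placeholders, and the T-SQL is executed through Invoke-Sqlcmd from the SqlServer PowerShell module; you can equally run the same statements from SSMS or Azure Data Studio:

#T-SQL to create the target table and bulk load the CSV with COPY INTO
$copySql = @"
-- Target table must exist before COPY INTO; columns are hypothetical
CREATE TABLE dbo.orders_staging
(orderid INT, orderdate DATE, amount DECIMAL(10, 2))
WITH (DISTRIBUTION = ROUND_ROBIN);

-- Bulk load the CSV directly from storage, skipping the header row
COPY INTO dbo.orders_staging
FROM 'https://packtadedatalake.blob.core.windows.net/data/orders.csv'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY = 'Storage Account Key', SECRET = '<storage-account-key>'),
    FIRSTROW = 2
);
"@

Invoke-Sqlcmd -ServerInstance 'packtadesqlserver.database.windows.net' -Database 'packtdw' -Username 'sqladmin' -Password 'Sql@Server@1234' -Query $copySql

Unlike the PolyBase approach, no external data source, file format, or external table needs to be created, which is why COPY INTO is the simpler option for ad hoc loads.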

Implementing workload management in an Azure Synapse SQL pool

A data warehouse usually has mixed workloads, such as data imports or data warehouse updates, reporting queries, aggregation queries, and data exports. Running all of these queries in parallel results in contention for resources in the data warehouse.

Workload management uses workload classification, workload importance, and isolation to provide better control over how the workload uses the system resources.

In this recipe, we'll learn how to prioritize a workload using workload classifiers and workload importance.

Getting ready

Before you start, open SSMS and log in to Azure SQL Server.

You need a Synapse SQL pool to perform the steps in this recipe. If you don't have an existing Synapse SQL pool, you can create one using the steps from the Provisioning and connecting to an Azure Synapse SQL pool using PowerShell recipe.

The recipe requires the orders table in the Synapse SQL pool. If you don't have...
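The following is a minimal sketch of raising the importance of a reporting user's queries. The workload group, classifier, and user names are hypothetical, and the T-SQL is executed through Invoke-Sqlcmd from the SqlServer PowerShell module; the same statements can be run directly in SSMS:

#Connection details for the pool (hypothetical names and credentials)
$conn = @{
    ServerInstance = 'packtadesqlserver.database.windows.net'
    Database       = 'packtdw'
    Username       = 'sqladmin'
    Password       = 'Sql@Server@1234'
}

#Create a workload group that reserves 25% of resources and caps usage at 50%
Invoke-Sqlcmd @conn -Query "CREATE WORKLOAD GROUP wgReports WITH (MIN_PERCENTAGE_RESOURCE = 25, CAP_PERCENTAGE_RESOURCE = 50, REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5);"

#Classify queries from the reporting user into the group with HIGH importance
Invoke-Sqlcmd @conn -Query "CREATE WORKLOAD CLASSIFIER wcReports WITH (WORKLOAD_GROUP = 'wgReports', MEMBERNAME = 'reportuser', IMPORTANCE = HIGH);"

Queries submitted by reportuser now run in the wgReports group and, when resources are contended, are scheduled ahead of requests with NORMAL importance.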

Optimizing queries using materialized views in Azure Synapse Analytics

Views are an old concept in SQL Server and are often used to encapsulate complex queries into virtual tables. We can then replace the query with the virtual table wherever required. A standard view is just a name given to the complex query. Whenever we query the standard view, it accesses the underlying tables in the query to fetch the result set.

Materialized views, unlike standard views, store the view data physically, just like a table, instead of as a virtual table. The view data is refreshed automatically whenever the underlying tables are updated.

In this recipe, we'll learn how to optimize queries using materialized views.

Getting ready

Before you start, open SSMS and log in to Azure SQL Server.

You need a Synapse SQL pool to perform the steps in this recipe. If you don't have an existing Synapse SQL pool, you can create one using the steps from the Provisioning and connecting to an Azure Synapse SQL pool using PowerShell recipe.
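The following is a minimal sketch of a materialized view over a hypothetical dbo.orders table (the orderdate column and a non-nullable amount column are assumed), executed through Invoke-Sqlcmd from the SqlServer PowerShell module; the same T-SQL can be run from SSMS:

#Connection details for the pool (hypothetical names and credentials)
$conn = @{
    ServerInstance = 'packtadesqlserver.database.windows.net'
    Database       = 'packtdw'
    Username       = 'sqladmin'
    Password       = 'Sql@Server@1234'
}

#Materialized view pre-aggregating orders per day; the data is stored and maintained physically
$mvSql = @"
CREATE MATERIALIZED VIEW dbo.mvOrdersByDate
WITH (DISTRIBUTION = HASH(orderdate))
AS
SELECT orderdate, COUNT_BIG(*) AS ordercount, SUM(amount) AS totalamount
FROM dbo.orders
GROUP BY orderdate;
"@

Invoke-Sqlcmd @conn -Query $mvSql

Aggregation queries that group orders by orderdate can now be answered from the materialized view by the optimizer instead of scanning dbo.orders, without any change to the query text.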
