Chapter 3: Analyzing Data with Azure Synapse Analytics

Azure Synapse Analytics, formerly known as Azure SQL Data Warehouse, combines data warehousing and big data analytics to provide a unified experience for extracting, loading, and transforming data. Azure Synapse provides the following features: Synapse SQL for T-SQL-based analytics (SQL pools and SQL on-demand), Spark analytics, Synapse pipelines, and Synapse Studio. At the time of writing this book, all features except Synapse SQL are in preview.

Azure Synapse Analytics is worth learning because data warehousing is a core part of data engineering and big data solutions.

Azure Synapse SQL can be used to quickly load data from sources and perform transformations. Transformation queries run fast because Azure Synapse SQL uses a massively parallel processing (MPP) architecture to process them.

In this chapter, we'll cover the following recipes:

  • Provisioning and connecting to an Azure Synapse SQL pool using PowerShell
  • Pausing or resuming a Synapse SQL pool using PowerShell
  • Scaling an Azure Synapse SQL pool instance using PowerShell
  • Loading data into a SQL pool using PolyBase with T-SQL
  • Loading data into a SQL pool using the COPY INTO statement
  • Implementing workload management in an Azure Synapse SQL pool
  • Optimizing queries using materialized views in Azure Synapse Analytics

Technical requirements

The following tools are required for the recipes in this chapter:

  • A Microsoft Azure subscription
  • PowerShell 7
  • Microsoft Azure PowerShell
  • SQL Server Management Studio (SSMS) or Azure Data Studio

Provisioning and connecting to an Azure Synapse SQL pool using PowerShell

In this recipe, we'll provision an Azure Synapse SQL pool using PowerShell. Provisioning a Synapse SQL pool uses the same commands that are used for provisioning an Azure SQL database, but with different parameters.

Getting ready

Before you start, log in to Azure from PowerShell. To do this, execute the following command and follow the instructions to log in to Azure:

Connect-AzAccount

How to do it…

Follow the given steps to provision a new Azure Synapse SQL pool:

  1. Execute the following command to create a new resource group:
    #Create resource group
    New-AzResourceGroup -Name packtade -Location centralus
  2. Execute the following command to create a new Azure SQL server:
    #Create credential object for the Azure SQL Server admin credential
    $sqladminpassword = ConvertTo-SecureString 'Sql@Server@1234' -AsPlainText -Force
    $sqladmincredential = New-Object System.Management.Automation...
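The credential object is then used to create the logical SQL server and the Synapse SQL pool itself. The following is a minimal sketch of how the remaining commands might look; the admin login name (sqladmin), server name (packtadesqlserver), pool name (packtdw), and client IP address are hypothetical placeholders, and the cmdlets are the same Az.Sql cmdlets used to provision an Azure SQL database:

#Complete the admin credential object from step 2 (login name is a placeholder)
$sqladminpassword = ConvertTo-SecureString 'Sql@Server@1234' -AsPlainText -Force
$sqladmincredential = New-Object System.Management.Automation.PSCredential('sqladmin', $sqladminpassword)

#Create the logical Azure SQL server that hosts the Synapse SQL pool
New-AzSqlServer -ResourceGroupName packtade -ServerName packtadesqlserver -Location centralus -SqlAdministratorCredentials $sqladmincredential

#Allow your client IP through the server firewall (replace with your own IP address)
New-AzSqlServerFirewallRule -ResourceGroupName packtade -ServerName packtadesqlserver -FirewallRuleName AllowClient -StartIpAddress 1.2.3.4 -EndIpAddress 1.2.3.4

#Create the Synapse SQL pool as a database with the DataWarehouse edition at DW100c
New-AzSqlDatabase -ResourceGroupName packtade -ServerName packtadesqlserver -DatabaseName packtdw -Edition DataWarehouse -RequestedServiceObjectiveName DW100c

Once the pool is online, you can connect to packtadesqlserver.database.windows.net from SSMS or Azure Data Studio with the sqladmin credentials and select the packtdw database.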

Pausing or resuming a Synapse SQL pool using PowerShell

In an Azure Synapse SQL pool, the compute and storage are decoupled and are therefore charged separately. This means that if we aren't running any queries and are not using the compute, we can pause the Synapse SQL pool to save on compute cost. We'll still be charged for the storage.

A data warehouse is typically used only when it needs to be refreshed with new data or when computations are required to prepare summary tables for reports. We can therefore save compute costs by pausing the pool when it isn't being used.

In this recipe, we'll learn how to pause and resume a Synapse SQL pool.

Getting ready

Before you start, log in to Azure from PowerShell. To do this, execute the following command and follow the instructions to log in to Azure:

Connect-AzAccount

You need a Synapse SQL pool to perform the steps in this recipe. If you don't have an existing Synapse SQL pool, you can create one using the steps from the Provisioning and connecting to an Azure Synapse SQL pool using PowerShell recipe.
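The following is a minimal sketch of the pause and resume commands, assuming the hypothetical packtade resource group, packtadesqlserver server, and packtdw pool names from the provisioning sketch:

#Check the current state of the pool; Status returns Online or Paused
(Get-AzSqlDatabase -ResourceGroupName packtade -ServerName packtadesqlserver -DatabaseName packtdw).Status

#Pause the pool to stop compute billing; storage continues to be billed
Suspend-AzSqlDatabase -ResourceGroupName packtade -ServerName packtadesqlserver -DatabaseName packtdw

#Resume the pool when the data warehouse is needed again
Resume-AzSqlDatabase -ResourceGroupName packtade -ServerName packtadesqlserver -DatabaseName packtdw

Note that pausing cancels any in-flight queries, so make sure no critical loads are running before you pause the pool.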

Scaling an Azure Synapse SQL pool instance using PowerShell

An Azure Synapse SQL pool can be scaled up or down according to usage or workload requirements. For example, consider a scenario where we are at performance level DW100c and a new workload requires a higher performance level. To support the new workload, we can scale up to a higher performance level, say DW400c, and scale back down to DW100c when the workload finishes.

In this recipe, we'll learn how to scale up or scale down an Azure Synapse SQL pool using PowerShell.

Getting ready

Before you start, log in to Azure from PowerShell. To do this, execute the following command and follow the instructions to log in to Azure:

Connect-AzAccount

You need a Synapse SQL pool to perform the steps in the recipe. If you don't have an existing Synapse SQL pool, you can create one using the steps from the Provisioning and connecting to an Azure Synapse SQL pool using PowerShell recipe.

How to do it…
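The following is a minimal sketch of the scale-up and scale-down commands, again assuming the hypothetical packtade resource group, packtadesqlserver server, and packtdw pool names; Set-AzSqlDatabase is the same cmdlet used to change the service objective of an Azure SQL database:

#Scale up to DW400c to support the heavier workload
Set-AzSqlDatabase -ResourceGroupName packtade -ServerName packtadesqlserver -DatabaseName packtdw -RequestedServiceObjectiveName DW400c

#Scale back down to DW100c once the workload finishes
Set-AzSqlDatabase -ResourceGroupName packtade -ServerName packtadesqlserver -DatabaseName packtdw -RequestedServiceObjectiveName DW100c

The pool must be online (not paused) to scale, and scaling briefly takes it offline while compute is reallocated, cancelling any running queries.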

Loading data into a SQL pool using PolyBase with T-SQL

PolyBase allows you to query external data in Hadoop, Azure Blob storage, or Azure Data Lake Storage from SQL Server using T-SQL. In this recipe, we'll import data from a CSV file in an Azure Data Lake Storage account into an Azure Synapse SQL pool using PolyBase.

Getting ready

Before you start, log in to Azure from PowerShell. To do this, execute the following command and follow the instructions to log in to Azure:

Connect-AzAccount

You need a Synapse SQL pool to perform the steps in the recipe. If you don't have an existing Synapse SQL pool, you can create one using the steps from the Provisioning and connecting to an Azure Synapse SQL pool using PowerShell recipe.

How to do it…

Follow the given steps to import data into Azure Synapse SQL using PolyBase:

  1. Execute the following command to create an Azure Data Lake Storage account and upload the data:
    #Create a new Azure Data Lake Storage...
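The following is a minimal sketch of how the rest of the load might look: it creates a Data Lake Storage Gen2 account and then defines the PolyBase objects (a database scoped credential, an external data source, a file format, and an external table) before loading the data with CREATE TABLE AS SELECT. The storage account, container, table, and column names, the passwords, and the storage key are hypothetical placeholders; the T-SQL is executed through Invoke-Sqlcmd from the SqlServer PowerShell module, but you can run the same statements from SSMS:

#Create a Data Lake Storage Gen2 account (hierarchical namespace enabled); the name is a placeholder
New-AzStorageAccount -ResourceGroupName packtade -Name packtadedatalake -Location centralus -SkuName Standard_LRS -Kind StorageV2 -EnableHierarchicalNamespace $true

#T-SQL to define the PolyBase objects and load the data; assumes orders.csv has been uploaded to the orders folder of the data container
$polybaseSql = @"
-- Create a master key if the database doesn't already have one
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Str0ng@Passw0rd1';

-- Credential holding the storage account key (placeholder secret)
CREATE DATABASE SCOPED CREDENTIAL adls_credential
WITH IDENTITY = 'user', SECRET = '<storage-account-key>';

-- External data source pointing at the data container of the storage account
CREATE EXTERNAL DATA SOURCE adls_source
WITH (TYPE = HADOOP,
      LOCATION = 'abfss://data@packtadedatalake.dfs.core.windows.net',
      CREDENTIAL = adls_credential);

-- CSV file format: comma-delimited, skip the header row
CREATE EXTERNAL FILE FORMAT csv_format
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2));

-- External table over the CSV file(s); columns are hypothetical
CREATE EXTERNAL TABLE dbo.orders_external
(orderid INT, orderdate DATE, amount DECIMAL(10, 2))
WITH (LOCATION = '/orders/', DATA_SOURCE = adls_source, FILE_FORMAT = csv_format);

-- Load the data into the SQL pool with CREATE TABLE AS SELECT
CREATE TABLE dbo.orders
WITH (DISTRIBUTION = ROUND_ROBIN)
AS SELECT * FROM dbo.orders_external;
"@

Invoke-Sqlcmd -ServerInstance 'packtadesqlserver.database.windows.net' -Database 'packtdw' -Username 'sqladmin' -Password 'Sql@Server@1234' -Query $polybaseSql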

Loading data into a SQL pool using the COPY INTO statement

The COPY INTO statement provides a faster and easier way to bulk insert data from Azure storage. We can use one T-SQL COPY INTO statement to ingest data instead of creating multiple database objects. At the time of writing this book, the COPY INTO statement is in preview.

In this recipe, we'll use the COPY INTO statement to load data into an Azure Synapse SQL pool.

Getting ready

Before you start, log in to Azure from PowerShell. To do this, execute the following command and follow the instructions to log in to Azure:

Connect-AzAccount

You need a Synapse SQL pool to perform the steps in the recipe. If you don't have an existing Synapse SQL pool, you can create one using the steps from the Provisioning and connecting to an Azure Synapse SQL pool using PowerShell recipe.

How to do it…

Follow the given steps to import data into a Synapse SQL pool from Azure Data Lake Storage Gen2:

    ...
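The following is a minimal sketch of the load. The storage account, container, file, table, and column names and the storage key are hypothetical placeholders, and the T-SQL is executed through Invoke-Sqlcmd from the SqlServer PowerShell module; you can equally run the same statements from SSMS or Azure Data Studio:

#T-SQL to create the target table and bulk load the CSV with COPY INTO
$copySql = @"
-- Target table must exist before COPY INTO; columns are hypothetical
CREATE TABLE dbo.orders_staging
(orderid INT, orderdate DATE, amount DECIMAL(10, 2))
WITH (DISTRIBUTION = ROUND_ROBIN);

-- Bulk load the CSV directly from storage, skipping the header row
COPY INTO dbo.orders_staging
FROM 'https://packtadedatalake.blob.core.windows.net/data/orders.csv'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY = 'Storage Account Key', SECRET = '<storage-account-key>'),
    FIRSTROW = 2
);
"@

Invoke-Sqlcmd -ServerInstance 'packtadesqlserver.database.windows.net' -Database 'packtdw' -Username 'sqladmin' -Password 'Sql@Server@1234' -Query $copySql

Unlike the PolyBase approach, no external data source, file format, or external table needs to be created, which is why COPY INTO is the simpler option for ad hoc loads.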

Implementing workload management in an Azure Synapse SQL pool

A data warehouse usually has mixed workloads, such as data imports or data warehouse updates, reporting queries, aggregation queries, and data exports. Running all of these queries in parallel results in contention for resources in the data warehouse.

Workload management uses workload classification, workload importance, and isolation to provide better control over how the workload uses the system resources.

In this recipe, we'll learn how to prioritize a workload using workload classifiers and workload importance.

Getting ready

Before you start, open SSMS and log in to Azure SQL Server.

You need a Synapse SQL pool to perform the steps in this recipe. If you don't have an existing Synapse SQL pool, you can create one using the steps from the Provisioning and connecting to an Azure Synapse SQL pool using PowerShell recipe.

The recipe requires the orders table in the Synapse SQL pool. If you don't have...
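The following is a minimal sketch of raising the importance of a reporting user's queries. The workload group, classifier, and user names are hypothetical, and the T-SQL is executed through Invoke-Sqlcmd from the SqlServer PowerShell module; the same statements can be run directly in SSMS:

#Connection details for the pool (hypothetical names and credentials)
$conn = @{
    ServerInstance = 'packtadesqlserver.database.windows.net'
    Database       = 'packtdw'
    Username       = 'sqladmin'
    Password       = 'Sql@Server@1234'
}

#Create a workload group that reserves 25% of resources and caps usage at 50%
Invoke-Sqlcmd @conn -Query "CREATE WORKLOAD GROUP wgReports WITH (MIN_PERCENTAGE_RESOURCE = 25, CAP_PERCENTAGE_RESOURCE = 50, REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5);"

#Classify queries from the reporting user into the group with HIGH importance
Invoke-Sqlcmd @conn -Query "CREATE WORKLOAD CLASSIFIER wcReports WITH (WORKLOAD_GROUP = 'wgReports', MEMBERNAME = 'reportuser', IMPORTANCE = HIGH);"

Queries submitted by reportuser now run in the wgReports group and, when resources are contended, are scheduled ahead of requests with NORMAL importance.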

Optimizing queries using materialized views in Azure Synapse Analytics

Views are an old concept in SQL Server and are often used to encapsulate complex queries into virtual tables. We can then replace the query with the virtual table wherever required. A standard view is just a name given to the complex query. Whenever we query the standard view, it accesses the underlying tables in the query to fetch the result set.

Materialized views, unlike standard views, store the view data physically, just like a table, instead of as a virtual table. The view data is refreshed automatically whenever the underlying tables are updated.

In this recipe, we'll learn how to optimize queries using materialized views.

Getting ready

Before you start, open SSMS and log in to Azure SQL Server.

You need a Synapse SQL pool to perform the steps in this recipe. If you don't have an existing Synapse SQL pool, you can create one using the steps from the Provisioning and connecting to an Azure Synapse SQL pool using PowerShell recipe.
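The following is a minimal sketch of a materialized view over a hypothetical dbo.orders table (the orderdate column and a non-nullable amount column are assumed), executed through Invoke-Sqlcmd from the SqlServer PowerShell module; the same T-SQL can be run from SSMS:

#Connection details for the pool (hypothetical names and credentials)
$conn = @{
    ServerInstance = 'packtadesqlserver.database.windows.net'
    Database       = 'packtdw'
    Username       = 'sqladmin'
    Password       = 'Sql@Server@1234'
}

#Materialized view pre-aggregating orders per day; the data is stored and maintained physically
$mvSql = @"
CREATE MATERIALIZED VIEW dbo.mvOrdersByDate
WITH (DISTRIBUTION = HASH(orderdate))
AS
SELECT orderdate, COUNT_BIG(*) AS ordercount, SUM(amount) AS totalamount
FROM dbo.orders
GROUP BY orderdate;
"@

Invoke-Sqlcmd @conn -Query $mvSql

Aggregation queries that group orders by orderdate can now be answered from the materialized view by the optimizer instead of scanning dbo.orders, without any change to the query text.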
