Chapter 4: Understanding Synapse SQL Pools and SQL Options

In this chapter, you will learn what Massively Parallel Processing (MPP) means in terms of a cloud PaaS database service. You will examine the concepts of distributing and replicating data in a database. Furthermore, you will see how to manage the workload in this database to your benefit by avoiding early scaling and leveraging all the performance capabilities of the service. Partitioning will extend your options when you work with massive amounts of data and need to grow from terabytes to petabytes. You will also learn how to load data efficiently into your database service.

Finally, we will have a look at the next evolutionary steps of the SQL pools in Azure Synapse and other SQL components, such as SQL on-demand compute.

At the end, we will compare other SQL database services and their options in Azure and see how they may fit into your architecture.

You will find the following sections covered in this chapter...

Uncovering MPP in the cloud – the power of 60

Synapse SQL – the MPP database service formerly called Azure SQL Data Warehouse – is already well known. It has reached its second generation, and the third evolutionary step is right around the corner: Gen 3 is expected in 2021, so it may already be available, at least as a preview, by the time this book is published.

When we examine Synapse Analytics, we find some interesting concepts. Sure, the idea of MPP is not new; it has already been implemented in several appliances. Microsoft's Parallel Data Warehouse (PDW), now called the Analytics Platform System (APS) (https://docs.microsoft.com/en-us/sql/analytics-platform-system/home-analytics-platform-system-aps-pdw?view=aps-pdw-2016-au7), is one of them. There are also some quite successful on-premises solutions from other vendors on the market – Teradata, Greenplum, and DB2 UDB, to name just a few.
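
To make the idea of the 60 distributions more concrete, the following T-SQL sketch shows the three distribution options a dedicated SQL pool offers – hash-distributed, round-robin, and replicated tables. The table and column names are purely illustrative and not taken from the book:

-- Hash-distributed fact table: each row is assigned to one of the
-- 60 distributions based on the hash of ProductKey.
CREATE TABLE dbo.FactSales
(
    SaleDateKey int NOT NULL,
    ProductKey  int NOT NULL,
    Amount      decimal(18, 2) NOT NULL
)
WITH (DISTRIBUTION = HASH(ProductKey), CLUSTERED COLUMNSTORE INDEX);

-- Round-robin staging table: rows are spread evenly across the distributions.
CREATE TABLE stg.SalesLoad
(
    SaleDateKey int,
    ProductKey  int,
    Amount      decimal(18, 2)
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);

-- Replicated dimension table: a full copy is cached on every compute node,
-- which avoids data movement when joining to the fact table.
CREATE TABLE dbo.DimProduct
(
    ProductKey  int NOT NULL,
    ProductName nvarchar(100)
)
WITH (DISTRIBUTION = REPLICATE, CLUSTERED INDEX (ProductKey));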

...

Provisioning a Synapse dedicated SQL pool

You have two options when it comes to creating a Synapse dedicated SQL pool: a standalone version and a version that is integrated into the Synapse Analytics workspace. As we will need the Synapse workspace several more times later in this book, for example, when you examine the Synapse Spark pools in Chapter 6, Using Synapse Spark Pools, we will provision a workspace directly (a T-SQL sketch for the standalone option follows these steps):

  1. Click + Create a resource in the Azure portal and type Synapse into the search field. You can select Synapse Analytics directly from the quick results, or press Enter and select Synapse Analytics from the full search results. On the Overview blade, click Create to start provisioning.
  2. On the following Basics blade, fill in the corresponding fields. Select the subscription where you want to create your environment and, in the Resource Group field, either select an existing resource group or create a new one.
  3. In the Managed Resource Group field, add...
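
If you go for the standalone version instead, the dedicated SQL pool (formerly Azure SQL Data Warehouse) can also be created with T-SQL, assuming you already have an Azure SQL logical server to host it. The following is only a minimal sketch; the database name and performance level are placeholders:

-- Run against the master database of an existing logical SQL server.
-- 'DW100c' is the smallest Gen2 performance level; choose what your workload needs.
CREATE DATABASE SalesDW
(
    EDITION = 'DataWarehouse',
    SERVICE_OBJECTIVE = 'DW100c'
);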

Talking about partitioning

When you need to load massive amounts of data into your database, partitioning might be another optimization option. However, you should only start considering partitioning when you are genuinely confronted with massive amounts of data.

Do you remember the math of the CCI and why it only performs well when you load around 63 to 100 million rows into a table (see the preceding section, Understanding CCI)? Now, you need to add another factor to this equation: the number of partitions that you are planning for your table.

Let's assume that you want to have one partition for every month (the most typical usage of partitions) in your table, and you plan to load data for 5 years into your database. This adds another factor of 60 to the preceding calculation: 60 distributions x 1,048,576 rows per distribution x 60 months in the database. This results in 3,774,873,600 rows that your table needs to hold as a minimum in order for the CCI to be built over all the partitions...
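
To make this concrete, a partitioned, hash-distributed table with monthly boundaries could be declared roughly as follows; the table, columns, and boundary values are purely illustrative:

-- One partition per month; RANGE RIGHT puts each boundary value
-- into the partition to its right.
CREATE TABLE dbo.FactSalesPartitioned
(
    SaleDateKey int NOT NULL,      -- e.g. 20210115 for January 15, 2021
    ProductKey  int NOT NULL,
    Amount      decimal(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(ProductKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION
    (
        SaleDateKey RANGE RIGHT FOR VALUES
        (20210201, 20210301, 20210401)   -- extend with one boundary per month
    )
);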

Implementing workload management

You have read about control nodes and compute nodes in the preceding sections, and in the Scaling the database section you will learn that you can add more compute nodes when your workload needs more power. But keep in mind that adding compute nodes will also add cost to your bill at the end of the month.

Tip

Try to identify time slots when your database is not under heavy load and either scale it back down or, if you can afford to, pause it. You can save a lot of money by doing so. Development and testing environments are also good candidates for switching the database off when they aren't needed.

Fortunately, the Synapse product group has added some options to the database that help you optimize the usage of a given compute environment before you need to scale the database. Workload management enables you to set different priorities (workload importance) for different processes, to isolate...
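
As an example, a workload group and a classifier for a nightly load account could be sketched roughly like this; the group, classifier, and user names are made up, and the percentages depend entirely on your environment:

-- Reserve resources for data loads and cap how much they can consume.
CREATE WORKLOAD GROUP wgDataLoads
WITH
(
    MIN_PERCENTAGE_RESOURCE = 30,              -- guaranteed share
    CAP_PERCENTAGE_RESOURCE = 60,              -- upper limit
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5     -- grant per single request
);

-- Route requests from the loading account into that group with high importance.
CREATE WORKLOAD CLASSIFIER wcNightlyLoad
WITH
(
    WORKLOAD_GROUP = 'wgDataLoads',
    MEMBERNAME = 'loader_user',
    IMPORTANCE = HIGH
);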

Scaling the database

In Figure 4.13, in the Understanding concurrency and memory settings section, you already learned about the available resource settings that you can configure within Synapse Analytics. But how do we get there?

When you browse to your Synapse Analytics service in your portal, you can scale the database from there. In the Overview blade, you will find the Scale link right next to the Resume or Pause button (displayed depending on the running state of the database):

Figure 4.15 – The Resume and Scale buttons on the Overview blade

When you click on the Scale button, you are taken to the Scaling blade of the database. By dragging the slider in the Scale your system section, you can add or remove resources in your database:

Figure 4.16 – Scaling your database

If your database contains workload groups for workload management, they will also be displayed here with an overview of their possible resource...
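
If you prefer scripting over the portal, a standalone dedicated SQL pool can also be scaled with T-SQL from the master database of its logical server; the pool name and target performance level below are placeholders:

-- Change the performance level (DWUc) of a dedicated SQL pool.
-- Run this against the master database, not against the pool itself.
ALTER DATABASE SalesDW
MODIFY (SERVICE_OBJECTIVE = 'DW200c');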

Loading data

With all the parallel options that the database can offer, you will want to use them when you load data, too. Remember the purpose of the control and the compute nodes? When loading data into your database, you want to use a technique that involves the compute nodes as much as possible.

Using the COPY statement

The COPY statement will support you in doing so. It talks directly to the compute nodes and therefore uses the full parallelism that the database can offer. It comes as part of the T-SQL dialect of the Synapse Analytics database and offers many options to influence the loading of data into the database.
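
A minimal sketch of such a load could look like the following; the storage account, container, and target table are hypothetical, and the authentication option depends on your setup:

-- Load Parquet files from a staging container straight into a distributed table,
-- letting the compute nodes read the files in parallel.
COPY INTO dbo.FactSales
FROM 'https://mystorageaccount.blob.core.windows.net/staging/sales/*.parquet'
WITH
(
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);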

When you talk to the control node instead, in contrast to what the COPY statement does, you will create a bottleneck during your load. The load would be single-threaded, and all the rows that need to be written to the database would first flow through the control node and would then be spread to the distributions using...

Understanding other SQL options in Azure

In Chapter 2, Connecting Requirements and Technology, we talked about different size options for your modern data warehouse. For the M size, we added an Azure SQL database for the presentation layer. When we compare an Azure SQL database to a Synapse Analytics database, the main difference is the SMP character of the Azure SQL database. See also Figure 4.1 for a comparison of SMP versus MPP.

In an Azure SQL database, data is stored in a row orientation by default, just as it is in SQL Server. In general, you can think of an Azure SQL database as a single database that you would spin up and use. Almost all of the functionality of a SQL Server database is available in an Azure SQL database as well.

You will also have the option to create CCIs on your tables. This will give you high analytical performance on your data stored there. In comparison to Synapse Analytics, you won't get the same scale-out architecture with...
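
Creating such an index on an existing table is a one-liner; the table name here is just an example:

-- Convert an existing row-store table into columnstore format
-- to speed up analytical scans and aggregations.
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
ON dbo.FactSales;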

Summary

In this chapter, you examined the Synapse Analytics database relational storage option. You learned about the MPP architecture and the control and compute nodes, as well as how tables can be distributed or replicated in a database and how partitioning influences the data in the database.

You read about the CCI and how you can benefit from its performance.

Furthermore, you learned about resource allocation, concurrency, and the limits of the DWUc configurations. We touched on workload management and how you can optimize your database workload before you need to scale the database and pay more money for it. But when you need to scale for more concurrency and memory, and therefore more performance, you now know where to search for this functionality.

At the end, you covered the basics of loading data and why it is a good idea to use the COPY statement. You then learned how to maintain your statistics and how to rebuild your CCI to optimize it.

Finally, we compared...
