Limitless Analytics with Azure Synapse

Product type: Book
Published: Jun 2021
Publisher: Packt
ISBN-13: 9781800205659
Pages: 392
Edition: 1st Edition
Author: Prashant Kumar Mishra

Table of Contents (20) Chapters

Preface

Section 1: The Basics and Key Concepts
  • Chapter 1: Introduction to Azure Synapse
  • Chapter 2: Considerations for Your Compute Environment

Section 2: Data Ingestion and Orchestration
  • Chapter 3: Bringing Your Data to Azure Synapse
  • Chapter 4: Using Synapse Pipelines to Orchestrate Your Data
  • Chapter 5: Using Synapse Link with Azure Cosmos DB

Section 3: Azure Synapse for Data Scientists and Business Analysts
  • Chapter 6: Working with T-SQL in Azure Synapse
  • Chapter 7: Working with R, Python, Scala, .NET, and Spark SQL in Azure Synapse
  • Chapter 8: Integrating a Power BI Workspace with Azure Synapse
  • Chapter 9: Perform Real-Time Analytics on Streaming Data
  • Chapter 10: Generate Powerful Insights on Azure Synapse Using Azure ML

Section 4: Best Practices
  • Chapter 11: Performing Backup and Restore in Azure Synapse Analytics
  • Chapter 12: Securing Data on Azure Synapse
  • Chapter 13: Managing and Monitoring Synapse Workloads
  • Chapter 14: Coding Best Practices

Other Books You May Enjoy

Chapter 2: Considerations for Your Compute Environment

This chapter covers the analytics runtimes available with Azure Synapse. You will learn about the concepts of SQL Pool, SQL on-demand, and Spark pool. After completing this chapter, you will be able to decide which analytics runtime will be suitable for solving your business problem.

SQL Pool and SQL on-demand are both part of the Structured Query Language (SQL) engine, but they differ in how they are provisioned. When you create a SQL pool, you provision databases under a logical server in your subscription; this means you pay for the SQL engine for as long as it is running, until the SQL pool is paused. SQL on-demand, by contrast, is intended for cases where you want to use the SQL engine to run your workloads only for a short duration.

On the other hand, Spark pool works with the Apache Spark engine, deeply integrated with Azure Synapse. This gives you the option to configure your Spark pool with just a few clicks, along with...

Technical requirements

To follow the instructions in the following sections, you need to meet these prerequisites:

  • You need an Azure subscription of your own, or contributor-level access to another subscription.
  • You need to have your Synapse workspace on this subscription. You can follow the instructions from Chapter 1, Introduction to Azure Synapse, to create your Synapse workspace.

Introducing SQL Pool

SQL Pool uses a scale-out, node-based architecture with one control node and multiple compute nodes for distributed computational processing. The control node is the single point of contact through which end users interact with all compute nodes. It runs the Massively Parallel Processing (MPP) engine, which passes an operation to multiple compute nodes so that they do their work in parallel. MPP databases are optimized for analytical workloads, such as aggregating and processing large datasets. In this type of architecture, each compute node (also called a processing unit) works independently, with its own operating system and dedicated memory.
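The control-node and compute-node split described above can be sketched in a few lines of Python. This is only a toy model, not how SQL Pool is implemented: threads stand in for compute nodes, the rows are spread round-robin as an assumption, and the function names (distribute_rows, compute_node_partial, control_node_average) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute_rows(rows, node_count):
    """Spread rows round-robin across node_count compute nodes (assumed strategy)."""
    shards = [[] for _ in range(node_count)]
    for index, row in enumerate(rows):
        shards[index % node_count].append(row)
    return shards


def compute_node_partial(shard):
    """Work done independently on one compute node: a partial SUM and COUNT."""
    return sum(shard), len(shard)


def control_node_average(rows, node_count=4):
    """Control node: fan the operation out, then merge the partial results."""
    shards = distribute_rows(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_node_partial, shards))
    total = sum(partial[0] for partial in partials)
    count = sum(partial[1] for partial in partials)
    return total / count if count else None


# Hypothetical data: the values 1..100 standing in for a column to aggregate.
sales = list(range(1, 101))
average = control_node_average(sales, node_count=4)
print(average)
```

The point of the sketch is that no compute node sees the whole dataset; each one returns a small partial result, and only the control node combines them into the final answer.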

In this section, you will learn about the architecture of SQL Pool, which will help you in understanding data distribution across various nodes in SQL Pool. We will cover how to create a SQL pool using both the Azure portal and Synapse Studio in the following section.

Creating a SQL pool

In this section, you will...

Understanding Synapse SQL on-demand

SQL on-demand is a serverless distributed data processing system that enables you to analyze your big data faster. There is no need to set up infrastructure or maintain a cluster to start using SQL on-demand, so you can start querying data as soon as the workspace is created.

In this section, we are going to talk about the architecture and components of Synapse SQL on-demand, the benefits of using SQL on-demand, and how you can query files in your Azure Storage accounts using SQL on-demand.
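As a rough illustration of querying files in a storage account, the following Python helper assembles a T-SQL statement that uses OPENROWSET, the function SQL on-demand exposes for reading files in place. The helper name build_openrowset_query, the storage URL, and the choice of formats are assumptions made for this sketch; the chapter's own examples may differ.

```python
def build_openrowset_query(file_url, file_format="PARQUET", top=10):
    """Return a T-SQL string that previews rows from files in Azure Storage.

    Only PARQUET and CSV are accepted here to keep the sketch small.
    """
    normalized_format = file_format.upper()
    if normalized_format not in {"PARQUET", "CSV"}:
        raise ValueError(f"Unsupported format in this sketch: {file_format}")
    if top <= 0:
        raise ValueError("top must be a positive integer")

    # Double any single quotes so the URL stays a valid T-SQL string literal.
    escaped_url = file_url.replace("'", "''")
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{escaped_url}',\n"
        f"    FORMAT = '{normalized_format}'\n"
        ") AS [result];"
    )


# Hypothetical storage account and folder; replace with your own path.
query = build_openrowset_query(
    "https://examplestorage.dfs.core.windows.net/data/sales/*.parquet",
    file_format="parquet",
    top=5,
)
print(query)
```

The resulting text can be pasted into a SQL script that is connected to the SQL on-demand endpoint; no table has to be created or loaded first, which is the main appeal of the serverless model.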

SQL on-demand architecture and components

SQL on-demand is serverless, so scaling automatically accommodates the resource requirements for any query. The SQL on-demand architecture also has a control node, a compute node, DMS, and Azure Storage, but it does not have an MPP engine; instead, it uses a Distributed Query Processing (DQP) engine.

The architecture, as illustrated in the following screenshot, explains how a control node leverages a DQP engine...

Understanding Spark pool

Apache Spark is a very fast unified analytics engine for big data and machine learning.

Synapse Spark Pool is one of Microsoft's implementations of Apache Spark in Azure. A Synapse Analytics workspace has a Spark engine built in, along with notebook support. Because Synapse Spark supports C#, we can write Spark .NET code directly within notebooks. You can also write your code in Python, Scala, and SQL.

One Spark pool can be accessed by multiple users, but for every user, one new Spark instance will be created. A Spark instance is also dependent on the Spark pool capacity: if there is enough capacity in the pool to run multiple queries, the existing instance will be able to process the job; otherwise, a new instance will be created to process the job.
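The allocation rule in the preceding paragraph can be modeled with a small Python sketch. It is a deliberately simplified picture under stated assumptions, not the actual Synapse scheduler: capacity is counted in whole nodes, every new instance gets a fixed size, and the class and field names (SparkPool, SparkInstance, instance_nodes) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SparkInstance:
    owner: str
    node_capacity: int
    nodes_in_use: int = 0

    def free_nodes(self):
        return self.node_capacity - self.nodes_in_use


@dataclass
class SparkPool:
    max_nodes: int        # total nodes the pool may hand out (assumed unit)
    instance_nodes: int   # size given to each new instance (assumption)
    instances: list = field(default_factory=list)

    def allocated_nodes(self):
        return sum(instance.node_capacity for instance in self.instances)

    def submit(self, user, nodes_needed):
        """Return 'reused', 'created', or 'rejected' for a submitted job."""
        # 1. Reuse the user's existing instance when it still has room.
        for instance in self.instances:
            if instance.owner == user and instance.free_nodes() >= nodes_needed:
                instance.nodes_in_use += nodes_needed
                return "reused"
        # 2. Otherwise create a new instance if the pool can still fit one.
        fits_instance = nodes_needed <= self.instance_nodes
        fits_pool = self.allocated_nodes() + self.instance_nodes <= self.max_nodes
        if fits_instance and fits_pool:
            self.instances.append(
                SparkInstance(user, self.instance_nodes, nodes_in_use=nodes_needed)
            )
            return "created"
        # 3. No capacity at either level.
        return "rejected"


pool = SparkPool(max_nodes=10, instance_nodes=5)
results = [
    pool.submit("alice", 3),  # first job for alice: new instance
    pool.submit("alice", 2),  # fits in alice's existing instance
    pool.submit("bob", 4),    # a different user gets a separate instance
    pool.submit("alice", 1),  # alice's instance and the pool are both full
]
print(results)
```

In this model, two users never share an instance, and a second job from the same user only triggers a new instance when the existing one has no room left.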

The following diagram displays different components of Apache Spark on Azure Synapse:  

Figure 2.17 – Apache Spark in Azure Synapse Analytics

Let's...

Summary

In this chapter, we covered the concepts of Synapse SQL and Synapse Spark. After going through this chapter, you have learned how to create your SQL pool, how to use SQL on-demand, and how to use Spark pool, as well as learning how to change DWUs for your SQL pool using both the Azure portal and Synapse Studio.

You can refer to other books to learn more about Apache Spark. In this chapter, we have tried to cover the Apache Spark concepts that are most relevant to Synapse.

We have used Azure Data Studio in a couple of places, to give you an idea of how it works. We will be seeing Azure Data Studio again, later on. I personally like to use Azure Data Studio because it offers a very smooth SQL coding experience with built-in features such as multiple tab windows, a rich SQL editor, code navigation, and source control integration.

In the next chapter, we are going to talk about various ways to bring your data to Azure Synapse.
