You're reading from  Limitless Analytics with Azure Synapse

Product type: Book
Published in: Jun 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800205659
Edition: 1st Edition
Author (1)
Prashant Kumar Mishra

Prashant Kumar Mishra is an engineering architect at Microsoft. He has more than 10 years of professional expertise in the Microsoft data and AI segment as a developer, consultant, and architect. He has been focused on Microsoft Azure Cloud technologies for several years now and has helped various customers in their data journey. He prefers to share his knowledge with others to make the data community stronger day by day through his blogs and meetup groups.

Technical requirements

Before you start orchestrating your data, there are certain prerequisites that you should meet:

Enabling the analytical store in Cosmos DB

You can enable Synapse Link on Cosmos DB directly from the Azure portal:

  1. Log in to the Azure portal at https://portal.azure.com.
  2. Go to your Cosmos DB account and click on Data Explorer.
  3. Click on the Enable link while creating a new container:
    Figure 5.1 – Enabling Azure Synapse Link on a Cosmos DB account

  4. You can click on the Features tab to verify whether Azure Synapse Link is enabled or not. You have the option to enable it from there as well if it is not enabled yet:
    Figure 5.2 – Verifying the status of Azure Synapse Link under the Features tab of a Cosmos DB account

    Important note

    The analytical store can only be enabled for new containers.

  5. After you enable the analytical store, the new container is created with an Analytical Storage Time to Live property. The default value is -1, which means infinite retention; however, we can change this value to any number of days and as many...
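The same setting can also be applied from code. The following is a minimal sketch, assuming the azure-cosmos Python SDK (which this section does not use) and assuming its `analytical_storage_ttl` argument is expressed in seconds; the helper names, the container name, and the partition key path are hypothetical and only serve the illustration:

```python
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60


def analytical_ttl_seconds(retention_days: Optional[int] = None) -> int:
    """Translate a retention period in days into an analytical TTL value.

    None maps to -1, the default described above (infinite retention).
    Assumption: the SDK expects the TTL in seconds.
    """
    if retention_days is None:
        return -1
    if retention_days <= 0:
        raise ValueError("retention_days must be a positive number of days")
    return retention_days * SECONDS_PER_DAY


def create_analytical_container(database, container_id: str,
                                partition_key_path: str,
                                retention_days: Optional[int] = None):
    """Create a container with the analytical store enabled.

    `database` is expected to be an azure.cosmos DatabaseProxy. The import is
    done lazily so the TTL helper above can be used without the SDK installed.
    """
    from azure.cosmos import PartitionKey

    return database.create_container_if_not_exists(
        id=container_id,
        partition_key=PartitionKey(path=partition_key_path),
        analytical_storage_ttl=analytical_ttl_seconds(retention_days),
    )


# Hypothetical usage, given an existing DatabaseProxy named `database`:
# create_analytical_container(database, "orders", "/customerId", retention_days=30)
```

Keeping the days-to-seconds conversion in its own function makes the retention policy easy to check before any container is created.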

Data storage

A Cosmos DB analytical store is fully isolated from transactional workloads. The operational data in a Cosmos DB container is internally stored in row-based transactional stores in order to allow fast transactional reads and writes.

Running complex queries against your transactional workload is not recommended because they can degrade the performance of the application that depends on it. Ideally, you should add an analytical data layer on top of the Cosmos DB transactional data if you want to perform complex operations on the data. The major caveat of this architecture is the ETL operation needed to sync data between the transactional and analytical data stores. This additional step can increase the Total Cost of Ownership (TCO) and adds the overhead of keeping the data in sync at all times.

With this new feature of Synapse Link, Cosmos DB gives you the flexibility to enable an analytical store within your Cosmos DB account without performing an ETL operation. Both the data layers are...

Querying the Cosmos DB analytical store

With Azure Synapse, you can choose between Spark and SQL as your compute environment. You can query a Cosmos DB analytical store using Spark and SQL Serverless; however, this feature is not available with SQL provisioned as of now.

Let's learn how to query data in the analytical store of a Cosmos DB container.

Querying with Azure Synapse Spark

Azure Synapse Spark allows you to analyze data in your Synapse Link-enabled Azure Cosmos DB containers. You can query an analytical store from Spark in two ways:

  • Loading data to a Spark DataFrame
  • Creating a Spark table

A Spark DataFrame uses metadata that is cached for the lifetime of the Spark session, so any change in the source data will not be reflected in the DataFrame until you start a new Spark session. With a Spark table, by contrast, the metadata of the analytical store is reloaded on every query execution.
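The two approaches can be sketched as follows. This is a minimal illustration, assuming a Synapse notebook where a `spark` session is already available and the `cosmos.olap` format with its `spark.synapse.linkedService` and `spark.cosmos.container` options is supported; the function names, the linked service name, and the container name are hypothetical:

```python
from typing import Dict


def cosmos_olap_options(linked_service: str, container: str) -> Dict[str, str]:
    """Build the connector options that point at an analytical store."""
    return {
        "spark.synapse.linkedService": linked_service,
        "spark.cosmos.container": container,
    }


def load_as_dataframe(spark, linked_service: str, container: str):
    """Option 1: load the analytical store into a Spark DataFrame.

    The DataFrame works from metadata cached for the Spark session.
    """
    options = cosmos_olap_options(linked_service, container)
    return spark.read.format("cosmos.olap").options(**options).load()


def create_table_sql(table: str, linked_service: str, container: str) -> str:
    """Option 2: build the statement that defines a Spark table.

    Queries against the table reload the analytical store metadata each time.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS {table} USING cosmos.olap OPTIONS ("
        f"spark.synapse.linkedService '{linked_service}', "
        f"spark.cosmos.container '{container}')"
    )


# Hypothetical usage inside a Synapse notebook:
# df = load_as_dataframe(spark, "CosmosDbLinkedService", "orders")
# spark.sql(create_table_sql("orders_olap", "CosmosDbLinkedService", "orders"))
# spark.sql("SELECT COUNT(*) FROM orders_olap").show()
```

Both paths read the same analytical store; the choice mainly depends on whether you need changes in the source data to show up without restarting the Spark session.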

You can ingest data to the analytical store of...

Summary

In this chapter, we covered Azure Synapse Link, which is a new feature added to Azure Synapse, and we learned a step-by-step process to query data directly from an Azure Cosmos DB account. This feature dispenses with the need for ETL processes to bring data from a Cosmos DB account to Synapse. Now, we know that we can write queries directly on Cosmos DB data by creating corresponding linked services. We also saw how the transactional store syncs the data in the analytical store through auto-sync, and we learned about modes of schema representation in the analytical store. We used the Python language in this chapter; however, you are free to use any supported language that you are comfortable with.

There are many possible use cases of Azure Synapse Link. You can find a couple of these use cases mentioned in Microsoft Docs: https://docs.microsoft.com/en-us/azure/cosmos-db/synapse-link-use-cases.

In the next chapter, we are going to get some good coding experience on Azure...
