Chapter 2: Connecting Requirements and Technology

When you examine what cloud vendors offer for building an analytical data estate, there is an abundance of options. Many data architects invest a lot of time in carving out the right technology and the right tools to build on. This chapter will focus on the reference architecture of the modern data warehouse and introduce the suitable Microsoft Azure services with a short description, as well as the information you will need to select the right service to reach your goal.

In this chapter, we will cover the following topics:

  • Formulating your requirements
  • Understanding basic architecture patterns
  • Finding the right Azure tool for the right purpose
  • Understanding industry data models
  • Defining T-shirt sizes
  • Understanding the supporting services

Formulating your requirements

I won't try to write an exhaustive requirements engineering paragraph here. Almost every data architect already uses a comprehensive method to collect requirements and derive suitable artifacts from them. Instead, let's emphasize the direction to take and the results to focus on once the requirements have been gathered and engineered.

Asking in the right direction

When you are planning a modern data warehouse, you need to come up with questions that must be answered. Let's go through some examples here to give you an idea; you will find more in the Questions section at the end of this chapter:

  • For the main storage component, what is your expected volume of data?
  • How will you put data in your system? Are you already using an ETL/ELT tool, and can it connect to all the sources and targets that you need to maintain?
  • How will data be transformed throughout the system? Do you want to perform...

Understanding basic architecture patterns

In this section, we will look at the basic architecture pattern of the modern data warehouse (see the following diagram) and examine its components in a bit more detail:

Figure 2.1 – High-level modern data warehouse architecture

We'll now dive into the scalable components in the following section.

Examining the scalable storage component

This points to a storage component that must be able to store any amount of data in any format. We need to be able to create folder structures of the necessary depth, and the component needs to offer high throughput when writing as well as reading data. Additionally, security is of high importance: we need at least a Portable Operating System Interface (POSIX)-like interface with the option to configure access control at both the file and folder level. This means we must be able to control who can read, write, or execute content that is stored here.
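
To make the file- and folder-level access control a bit more tangible, the following minimal Python sketch sets a POSIX-style ACL on a folder in Azure Data Lake Storage Gen2 using the azure-storage-file-datalake SDK. The account, container, folder, and object ID are placeholders, not values from this chapter, and the ACL shown is just one possible configuration.

```python
# A minimal sketch of setting POSIX-like ACLs on an ADLS Gen2 folder.
# Assumes the azure-storage-file-datalake and azure-identity packages;
# account, container, folder, and object ID below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = DefaultAzureCredential()
service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",
    credential=credential,
)

file_system = service.get_file_system_client("raw")
directory = file_system.get_directory_client("sales/2021")

# Grant a specific Azure AD object (user, group, or service principal)
# read and execute rights on the folder, using POSIX rwx notation.
object_id = "<aad-object-id>"
directory.set_access_control(
    acl=f"user::rwx,group::r-x,other::---,user:{object_id}:r-x"
)
```

Directories also accept default ACL entries (prefixed with default:) so that newly created child items inherit their permissions.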

As...

Finding the right Azure tool for the right purpose

Now that we have a more detailed generic picture of the components that make up a modern data warehouse, let's examine the available Azure services and try to identify the ones that are suitable for your requirements. All the following services are classified as Platform-as-a-Service (PaaS) components. And, as a consultant will always tell you: it depends.

While going through your engineered requirements, you should have a picture in mind of where you want to go. You will know about the data volume, data formats and sources, your transformations, the presentation strategy, and the answers to the questions from the first section of this chapter (see Asking in the right direction).

The answers to the questions about volume, for example, together with the data formats, will play a vital role in selecting your storage component. They will also point you to the database service that you will choose to implement your presentation layer in. There...

Understanding industry data models

Industry data models have been developed by different companies for a long time. Microsoft started to offer this approach with the Common Data Model (CDM), which spans all three cloud pillars of Microsoft (Azure, Office 365, and Dynamics 365). The goal is to easily interconnect data services with each other and follow a predefined structure that is known by all the components involved. If, for example, sales data is read from Dynamics 365, its structure is defined and already available for Data Factory. It can be written to Azure Data Lake Storage or Synapse SQL pools, and from there Power BI already knows the structure and can offer reports and dashboards.

The CDM can be adjusted to your needs. This means you, as the developer, can add attributes to existing entities, create additional entities, or skip the ones you don't need.
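
To illustrate what such a shared, predefined structure looks like, here is a small Python sketch that writes a simplified model.json-style metadata file of the kind a CDM folder in the data lake carries. The entity and attribute names are invented for illustration, and the schema is reduced to its essentials; consult the CDM documentation for the authoritative format.

```python
# An illustrative sketch of the metadata a CDM folder carries alongside
# its data files. Entity and attribute names are invented examples and
# the structure is simplified.
import json

model = {
    "name": "SalesModel",
    "version": "1.0",
    "entities": [
        {
            "$type": "LocalEntity",
            "name": "SalesOrder",
            "attributes": [
                {"name": "orderId", "dataType": "string"},
                {"name": "orderDate", "dataType": "dateTime"},
                {"name": "totalAmount", "dataType": "decimal"},
            ],
        }
    ],
}

# Consumers such as Data Factory and Power BI can read this metadata from
# the folder to understand the data's structure without redefining it.
with open("model.json", "w") as f:
    json.dump(model, f, indent=2)
```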

To learn more about industry data models, check out Chapter 13, Introducing Industry Data Models.

Note

...

Thinking about different sizes

When you look at your engineered requirements and go through all the Azure data services that can form a modern data warehouse, you still might find it complex to decide which services to pick. And even with the generic reference architecture in mind, there is no silver bullet that you can simply provision and be done with it. But let's examine some considerations about sizes, performance, and cost. One of the beauties of the cloud in general, and the ADS framework on Azure in particular, is that you can always switch gears once you recognize that your system is too small, that you need far more punch for your calculations, or that you need to ensure speed for your users. Don't get me wrong: some services can be cumbersome to replace. However, it is still far easier to replace something than to rebuild everything from scratch.
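
As an example of how lightweight such a gear switch can be, the following sketch resizes a Synapse dedicated SQL pool to a different performance level with a single T-SQL statement, executed here from Python via pyodbc. The server, credentials, pool name, and the DW200c target are placeholders, and the exact endpoint depends on your setup; treat this as an assumption-laden sketch rather than a recipe.

```python
# A minimal sketch of "switching gears": scaling a Synapse dedicated SQL
# pool to another performance level. All names and credentials below are
# placeholders for illustration.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;"
    "DATABASE=master;"
    "UID=sqladminuser;PWD=<password>",
    autocommit=True,  # ALTER DATABASE must not run inside a transaction
)

# Resize the pool; the operation runs asynchronously on the service side
# and the pool is briefly unavailable while it scales.
conn.cursor().execute(
    "ALTER DATABASE my_dwh MODIFY (SERVICE_OBJECTIVE = 'DW200c');"
)
```

The point is not the specific statement, but that resizing is an operation you can script and revisit rather than a migration project.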

Let's check out three different sizes of modern data warehouse, using S, M, and L as rough indicators of our requirements.

Planning...

Understanding the supporting services

Regardless of the size you are planning for, some services might catch your attention anyway. This section will give you an overview of the supporting services that you might need in your modern data warehouse approach.

Requiring data governance

When you start building your modern data warehouse, there is one thing that you need to get right: data governance! There are too many data lakes out there that have mutated into data swamps, and you'll find many data warehouses that have lost all credibility because their users can't find the right data or only find outdated information. The relationships between data sources might not be clear to self-service BI users, leading them to produce the wrong figures. Plus, dimension data, which describes the measures that you want to report on, might not be up to date or might rely on the wrong sources. Alternatively, your users might not be able to recognize the date and time of...

Summary

In this chapter, we talked about requirements engineering and the importance of asking the right questions. The answers, once engineered in a structured approach, will be vital to deciding on the building blocks and tools that will be used to create our modern data warehouse.

We built a generic reference architecture for a modern data warehouse and examined the building blocks that form this approach. We also talked about their major functionalities.

We then learned about industry models for acceleration purposes and how they can help you kickstart your project.

Finally, we explored three suggestions for sizes and mapped different Azure data services to them. Additionally, you learned about the supporting services that will complete your architecture. And don't get me wrong, there are even more to discover.

What I want to express here is that there are many ways to tackle challenges in a modern cloud-based environment and that, as we mentioned previously, there...

Questions

The following are additional questions from the Asking in the right direction section:

  • General questions: Your modern data warehouse may need to hold data for a longer period. Do you need different access tiers (hot, cool, or archive)? Is older data accessed less often? How do you need to design the access rights to the data? Are there only automated processes, or will users want to access data themselves? Do you need to establish replication for reliability, and to what extent? (A short sketch of switching a blob's access tier follows this list.)
  • Data loading: Are you planning for a new ETL/ELT tool? Do you want to run it in the cloud or on-premises? What are your expectations for the availability of connectors? What are your expectations for usability, or do you want to code your data transport layer yourself? Which language do you prefer for this? What volume of data will need to be transported? Do you need parallel processes? Do you expect scalability?
  • Data transformation: Do you want to perform data transformation...
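
To make the access tier question concrete, here is a minimal Python sketch, using the azure-storage-blob SDK, that moves a single blob to the archive tier. The account, container, and blob names are placeholders; whether you tier individual blobs, use a lifecycle management policy, or skip tiering altogether is exactly the kind of requirement the questions above should settle.

```python
# A minimal sketch of moving rarely used data to a cheaper access tier.
# Account, container, and blob names are placeholders for illustration.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://mydatalake.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)

blob = service.get_blob_client(container="raw", blob="sales/2018/orders.parquet")

# Archived blobs are the cheapest to store but must be rehydrated back to
# the hot or cool tier before they can be read again.
blob.set_standard_blob_tier("Archive")
```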