
You're reading from Azure Data and AI Architect Handbook

Product type: Book
Published in: Jul 2023
Publisher: Packt
ISBN-13: 9781803234861
Edition: 1st Edition
Authors (2):
Olivier Mertens

Olivier Mertens is a cloud solution architect for Azure data and AI at Microsoft, based in Dublin, Ireland. In this role, he assists organizations in designing their enterprise-scale data platforms and analytical workloads. Alongside his role as an architect, Olivier leads the technical AI expertise for Microsoft EMEA in the corporate market. This includes leading knowledge sharing and internal upskilling, as well as solving highly complex or strategic customer AI cases. Before his time at Microsoft, he worked as a data scientist at a Microsoft partner in Belgium. Olivier lectures on generative AI and AI solution architectures and is a keynote speaker on AI. He holds a master's degree in information management, a postgraduate degree as an AI business architect, and a bachelor's degree in business management.

Breght Van Baelen
Breght Van Baelen

Breght Van Baelen is a Microsoft employee based in Dublin, Ireland, and works as a cloud solution architect for the data and AI pillar in Azure. He provides guidance to organizations building large-scale analytical platforms and data solutions. In addition, Breght was chosen as an advanced cloud expert for Power BI and is responsible for providing technical expertise in Europe, the Middle East, and Africa. Before his time at Microsoft, he worked as a data consultant at Microsoft Gold Partners in Belgium. Breght led a team of eight data and AI consultants as a data science lead. Breght holds a master's degree in computer science from KU Leuven, specializing in AI. He also holds a bachelor's degree in computer science from the University of Hasselt.

Storing Data for Consumption

This chapter will explore the critical topic of early data orchestration and storage design. As companies gather increasingly massive amounts of data, it becomes more important to establish best practices for managing and storing that data efficiently.

We will begin by examining how to classify data as structured, semi-structured, or unstructured, and how to determine its use case. We will then look at how the data will be used, covering the differences between ACID and non-ACID transactions, SQL and NoSQL databases, and OLAP and OLTP systems. Finally, we will focus on when to choose which storage service in Azure, such as Azure Cosmos DB, Azure SQL Database, or Azure Blob Storage, based on your data platform's specific functional and technical requirements.

By the end of this chapter, you will have a firm grasp of the fundamental principles of data storage design, as well as the tools and techniques available for constructing a robust...

Classifying the data type

First, we will explore how the architect can classify different types of data. Data can be classified into three different types:

  • Structured data
  • Semi-structured data
  • Unstructured data

We will also examine various file types associated with each type of data, as different file formats have their own characteristics, benefits, and drawbacks. For each data type, a solid understanding of these file types and their features can help to optimize storage costs, retrieval speeds, and scalability.
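To make the point about storage costs concrete, here is a small Python sketch that serializes the same records as CSV and as JSON Lines and compares their size. The sample records and the helper names `to_csv` and `to_json_lines` are hypothetical and exist only for illustration; a real format decision would also weigh columnar formats such as Parquet, which this sketch does not cover.

```python
import csv
import io
import json

# Hypothetical sample records; any tabular data would do.
records = [
    {"order_id": i, "customer": f"C{i % 7:03d}", "amount": round(i * 1.25, 2)}
    for i in range(1, 1001)
]

def to_csv(rows):
    """Serialize rows as CSV: field names appear once, in the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_json_lines(rows):
    """Serialize rows as JSON Lines: every row repeats its field names."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"

csv_bytes = len(to_csv(records).encode("utf-8"))
jsonl_bytes = len(to_json_lines(records).encode("utf-8"))

print(f"CSV:        {csv_bytes:>7} bytes")
print(f"JSON Lines: {jsonl_bytes:>7} bytes")
```

The JSON Lines output is larger because each record carries its own keys, which is also what makes it tolerant of records with differing fields. That trade-off between compactness and flexibility is typical of the choices a data architect makes between file formats.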

Note that there can be some ambiguity about which file format falls under which data type. In particular, file formats such as CSV and Avro are often classified as either structured or semi-structured, depending on whom you ask and which definition they use. For the data architect, however, the exact label matters little. What is important is knowing which file type is optimal in which scenario.
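One way to capture this ambiguity in tooling is to map each file extension to a set of candidate data types rather than to a single label. The sketch below assumes a hypothetical mapping `FORMAT_CLASSES` and a hypothetical helper `classify`; the assignments are one possible convention, not an authoritative classification.

```python
from pathlib import Path

# Hypothetical mapping for illustration only. CSV and Avro are listed under
# both structured and semi-structured, mirroring the ambiguity in the text.
FORMAT_CLASSES = {
    ".csv": {"structured", "semi-structured"},
    ".avro": {"structured", "semi-structured"},
    ".parquet": {"structured"},
    ".json": {"semi-structured"},
    ".xml": {"semi-structured"},
    ".jpg": {"unstructured"},
    ".png": {"unstructured"},
    ".mp4": {"unstructured"},
    ".pdf": {"unstructured"},
}

def classify(filename: str) -> set[str]:
    """Return the candidate data types for a file, based on its extension."""
    suffix = Path(filename).suffix.lower()
    return FORMAT_CLASSES.get(suffix, {"unknown"})

for name in ["sales.csv", "events.json", "scan.PDF", "orders.parquet"]:
    label = " or ".join(sorted(classify(name)))
    print(f"{name}: {label}")
```

Returning a set keeps the ambiguous cases explicit, so downstream logic can decide based on the scenario rather than on a forced single label.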

Structured data

Structured data...

Determining how the data will be used

The aforementioned data types are stored in either a data lake or a database. How the data will be used determines which service it should be stored in.

As described in the previous chapters, a data lake is a centralized repository that allows data to be stored in its raw format without the need for predefined schemas. Data lakes are often used for big data and analytics workloads, as they enable storing and processing large amounts of data from various sources in a flexible way.
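The schema-on-read idea behind a data lake can be sketched in Python with a local folder standing in for lake storage. This is an illustration under stated assumptions: the folder layout, file names, field names, and the `read_with_schema` helper are all hypothetical, and a real Azure data lake would be accessed through the Azure Storage SDKs or an analytics engine rather than the local file system.

```python
import json
import tempfile
from pathlib import Path

# A temporary folder stands in for a data lake container (hypothetical layout).
lake = Path(tempfile.mkdtemp()) / "raw" / "events"
lake.mkdir(parents=True)

# Ingest: files are written as-is, with no schema declared up front.
# The two sources do not even agree on their field names.
(lake / "web.jsonl").write_text(
    '{"user": "u1", "action": "click", "ts": "2023-07-01T10:00:00"}\n'
    '{"user": "u2", "action": "view"}\n'
)
(lake / "mobile.jsonl").write_text(
    '{"user_id": "u3", "event": "install", "device": "ios"}\n'
)

def read_with_schema(folder: Path):
    """Apply a schema only at read time, reconciling differing source fields."""
    for path in sorted(folder.glob("*.jsonl")):
        for line in path.read_text().splitlines():
            raw = json.loads(line)
            yield {
                "user": raw.get("user") or raw.get("user_id"),
                "action": raw.get("action") or raw.get("event"),
                "source": path.stem,
            }

for row in read_with_schema(lake):
    print(row)
```

Nothing stopped the mismatched files from landing in storage; the cost of that flexibility is paid by whoever reads the data and has to reconcile the fields.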

A database, on the other hand, can store structured (and, in some cases, semi-structured) data that is organized in a specific way, typically with a defined schema and defined relationships between the data. This organization makes the data easy to search, sort, and manipulate, which is why databases are often used for transactional workloads.
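By contrast, a database enforces its schema when data is written and can group changes into transactions. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a relational database; the `accounts` table and the `transfer` helper are hypothetical. It shows a constraint rejecting invalid data and a failed transaction being rolled back as a whole, which is the atomicity property of ACID.

```python
import sqlite3

# An in-memory SQLite database stands in for a relational store.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id      TEXT PRIMARY KEY,
        balance REAL NOT NULL CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100.0), ("B", 50.0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move an amount between accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so neither update is kept.
        return False

print(transfer(conn, "A", "B", 30.0))   # valid transfer, committed
print(transfer(conn, "B", "A", 500.0))  # would make B negative, rolled back
print(dict(conn.execute("SELECT id, balance FROM accounts")))
```

In the failed transfer, the credit to account A had already been applied when the debit from B violated the constraint; the rollback removes it, so the data never reflects half a transaction.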

Relational databases

Structured data is often stored and queried using relational databases. These databases utilize...

Choosing the right storage solution on Azure

Now that we’ve reviewed various storage concepts, let’s examine the Azure storage options available to the cloud solution architect and how they correspond to OLTP, OLAP, and NoSQL.
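Before mapping services to workloads, it can help to see the difference in query shape between OLTP and OLAP. The sketch below again uses `sqlite3` and a hypothetical `orders` table: the OLTP part commits many small single-row transactions and performs a point lookup by key, while the OLAP part scans the whole table to aggregate it. SQLite is only a convenient stand-in here; on Azure, these two patterns would typically be served by different services.

```python
import random
import sqlite3

random.seed(42)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
regions = ["EMEA", "AMER", "APAC"]

# OLTP-style: many small transactions, each touching a single row.
for i in range(1, 1001):
    with conn:  # each order is committed as its own transaction
        conn.execute(
            "INSERT INTO orders (id, region, amount) VALUES (?, ?, ?)",
            (i, random.choice(regions), round(random.uniform(5, 500), 2)),
        )

# OLTP-style read: fetch one row by its key.
order = conn.execute("SELECT region, amount FROM orders WHERE id = ?", (123,)).fetchone()

# OLAP-style: scan many rows and aggregate them instead of fetching them.
totals = conn.execute(
    "SELECT region, COUNT(*), ROUND(SUM(amount), 2) "
    "FROM orders GROUP BY region ORDER BY region"
).fetchall()

print("point lookup:", order)
for region, n, total in totals:
    print(f"{region}: {n} orders, {total} total")
```

Row stores with indexes are built for the first pattern; analytical engines, which often store data by column, are built for the second.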

Azure OLTP services

For OLTP scenarios, we will discuss the following:

  • SQL Server on Azure virtual machines
  • Azure SQL Managed Instance
  • Azure SQL Database

Briefly put, choosing an OLTP service on Azure comes down to deciding on the right SQL option. The level of manageability is a key difference between the options: SQL Server on virtual machines is an Infrastructure-as-a-Service (IaaS) solution, while Azure SQL Managed Instance and Azure SQL Database are Platform-as-a-Service (PaaS) solutions. The differences are captured in Figure 5.4:

Figure 5.4 – The difference in the level of management between the three cloud-based SQL options
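The manageability trade-off can be expressed as a simple rule of thumb: start from the most managed option and move toward IaaS only when a requirement forces it. The sketch below is a deliberately simplified heuristic, not an official Microsoft decision tree or the authors' method; the `Requirements` flags and the `suggest_sql_option` helper are hypothetical, and a real decision would also weigh cost, compatibility, and migration effort.

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    """Hypothetical requirement flags for an OLTP workload."""
    needs_os_access: bool = False          # e.g. control over the operating system
    needs_instance_features: bool = False  # e.g. SQL Agent jobs, cross-database queries

def suggest_sql_option(req: Requirements) -> str:
    """Return the most managed SQL option that still meets the requirements.

    Simplified heuristic: only drop to a less managed option when a
    requirement rules out the more managed one.
    """
    if req.needs_os_access:
        return "SQL Server on Azure virtual machines (IaaS)"
    if req.needs_instance_features:
        return "Azure SQL Managed Instance (PaaS)"
    return "Azure SQL Database (PaaS)"

print(suggest_sql_option(Requirements()))
print(suggest_sql_option(Requirements(needs_instance_features=True)))
print(suggest_sql_option(Requirements(needs_os_access=True)))
```

Ordering the checks from least to most managed makes the PaaS option the default, so the team takes on extra management overhead only when a concrete requirement demands it.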

As with any IaaS versus PaaS situation, it...

Summary

To summarize, this chapter provided you with valuable skills and lessons related to storage design. We learned how to classify data as structured, semi-structured, or unstructured, which is essential for choosing the right type of storage solution. Next, we determined how the data will be used and covered key concepts such as ACID transactions, SQL and NoSQL databases, and OLAP and OLTP systems. Finally, we learned how to determine whether a scenario requires an OLTP, OLAP, or NoSQL solution and how to choose the matching storage service in Azure. For each of the three, you will have a set of solid and powerful services to choose from.

These skills and lessons are vital for businesses and organizations that manage large amounts of data. By understanding how to classify data and choose the right data serving method, companies can ensure their data platform is efficient, scalable, and capable of supporting their business needs. Choosing the right storage service in Azure can help...
