You're reading from Azure Data and AI Architect Handbook

Product type: Book

Published in: Jul 2023

Publisher: Packt

ISBN-13: 9781803234861

Edition: 1st Edition

Tools

Azure Databricks

Concepts

Data Science

Authors (2):

Olivier Mertens

Breght Van Baelen

Storing Data for Consumption

This chapter will explore the critical topic of early data orchestration and storage design. As companies gather increasingly massive amounts of data, it becomes more important to establish best practices for managing and storing that data efficiently.

We will begin by examining how to classify data as structured, semi-structured, or unstructured. We will then look at how the data will be used, covering the differences between ACID and non-ACID transactions, SQL and NoSQL databases, and OLAP and OLTP systems. Finally, we will focus on when to choose which storage service in Azure, such as Azure Cosmos DB, Azure SQL Database, or Azure Blob Storage, based on your data platform’s specific functional and technical requirements.

By the end of this chapter, you will have a firm grasp of the fundamental principles of data storage design, as well as the tools and techniques available for constructing a robust...

Classifying the data type

First, we will explore how the architect can classify different types of data. Data can be classified into three different types:

Structured data
Semi-structured data
Unstructured data

We will also examine various file types associated with each type of data, as different file formats have their own characteristics, benefits, and drawbacks. For each data type, a solid understanding of these file types and their features can help to optimize storage costs, retrieval speeds, and scalability.

Note that there can be some ambiguity about which file format falls under which data type. In particular, file formats such as CSV and Avro are often classified as either structured or semi-structured, depending on whom you ask and what their exact definition is. However, the exact classification matters little to the data architect. What matters is knowing which file type is optimal in which scenario.
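To make the distinction more concrete, the following minimal sketch compares a CSV file with a JSON file. The sample records and the helper names `csv_columns` and `json_key_sets` are invented for illustration. The sketch only shows one common way of looking at the difference: every CSV record shares a single header row, while JSON records can each carry their own fields and nesting.

```python
import csv
import io
import json

# Hypothetical sample data, invented for illustration only.
CSV_TEXT = "id,name,city\n1,Ada,London\n2,Linus,Helsinki\n"
JSON_TEXT = (
    '[{"id": 1, "name": "Ada", "address": {"city": "London"}},'
    ' {"id": 2, "name": "Linus", "tags": ["kernel"]}]'
)


def csv_columns(text: str) -> list[str]:
    """Return the single header row shared by every record in the CSV."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader.fieldnames or [])


def json_key_sets(text: str) -> list[set[str]]:
    """Return the top-level keys of each record; they may differ per record."""
    return [set(record) for record in json.loads(text)]


columns = csv_columns(CSV_TEXT)
key_sets = json_key_sets(JSON_TEXT)

print(columns)   # one fixed set of columns for the whole file
print(key_sets)  # one set of keys per record
```

The fixed header is one reason CSV is sometimes counted as structured, and the per-record variation is typical of what is usually called semi-structured data. As noted above, the label matters less than knowing which format suits the scenario.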

Structured data

Structured data...

Determining how the data will be used

The aforementioned data types are stored in either a data lake or a database. How the data will be used will determine in which service the data needs to be stored.

As described in the previous chapters, a data lake is a centralized repository that allows data to be stored in its raw format without the need for predefined schemas. Data lakes are often used for big data and analytics workloads, as they enable storing and processing large amounts of data from various sources in a flexible way.
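Storing data in its raw format means that structure is applied only when the data is read, an approach often called schema-on-read. The sketch below illustrates the idea with plain Python. The sample events and the function `read_temperatures` are hypothetical, and a real data lake would typically be read with a processing engine instead of a loop over strings.

```python
import json

# Hypothetical raw events as they might land in a data lake: one JSON
# document per line, stored as-is with no schema enforced on write.
RAW_LINES = [
    '{"device": "a1", "temp_c": 21.5}',
    '{"device": "b2", "temp_c": "19", "firmware": "2.0"}',
    '{"device": "c3"}',
]


def read_temperatures(lines):
    """Apply a schema at read time: keep the device and a float temperature."""
    rows = []
    for line in lines:
        record = json.loads(line)
        if "temp_c" not in record:
            continue  # this reader skips records lacking the field it needs
        rows.append((record["device"], float(record["temp_c"])))
    return rows


temperatures = read_temperatures(RAW_LINES)
print(temperatures)
```

All three records were accepted on write even though their fields and types differ. Only this particular reader decides which fields it needs and how to interpret them, so another consumer could read the same files with a different schema.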

A database, on the other hand, can store structured (and, in some cases, semi-structured) data that is organized in a specific way, typically with a defined schema and defined relationships between the data. This form of organization makes it easy to search, sort, and manipulate the data, and is often used for transactional workloads.
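As a minimal sketch of a defined schema, defined relationships, and a transactional workload, the example below uses SQLite from the Python standard library as a stand-in for a relational database. The `customers` and `orders` tables are invented for illustration, and an Azure service would be accessed through its own driver and connection string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign-key relationships when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

# The schema is defined up front, including the relationship between tables.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " amount REAL NOT NULL)"
)

# A successful transaction: both rows are committed together.
with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
    conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 42.0)")

# A failing transaction: the second insert references a customer that does
# not exist, so the whole transaction, including the first insert, is rolled back.
rejected = False
try:
    with conn:
        conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 10.0)")
        conn.execute("INSERT INTO orders (customer_id, amount) VALUES (99, 5.0)")
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rejected, order_count)
```

Unlike the data lake, the database rejects data that does not fit its schema at write time, and the all-or-nothing behavior of the failed transaction is the kind of guarantee that transactional workloads rely on.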

Relational databases

Structured data is often stored and queried using relational databases. These databases utilize...

Choosing the right storage solution on Azure

Now that we’ve reviewed various storage concepts, let’s examine the Azure storage options available to the cloud solution architect and how they correspond to OLTP, OLAP, and NoSQL.

Azure OLTP services

For OLTP scenarios, we will discuss the following:

SQL Server on Azure virtual machines
Azure SQL Managed Instance
Azure SQL Database

Briefly put, choosing an OLTP service on Azure comes down to deciding on the right SQL option. The level of manageability is a key difference between options, with SQL Server on virtual machines being an Infrastructure-as-a-Service (IaaS) solution, while Azure SQL Managed Instance and Azure SQL Database come as Platform-as-a-Service (PaaS) solutions. The differences are captured in Figure 5.4:

Figure 5.4 – The difference in the level of management between the three cloud-based SQL options

As with any IaaS versus PaaS situation, it...

Summary

To summarize, this chapter covered the core skills and lessons of storage design. We learned how to classify data as structured, semi-structured, or unstructured, which is essential for choosing the right type of storage solution. Next, we determined how the data will be used and covered key concepts such as ACID transactions, SQL and NoSQL databases, and OLAP and OLTP systems. Finally, we learned how to decide whether a given scenario requires an OLTP, OLAP, or NoSQL solution and which Azure storage service to choose for it. For each of the three, you have a set of solid and powerful services to choose from.

These skills and lessons are vital for businesses and organizations that manage large amounts of data. By understanding how to classify data and choose the right data serving method, companies can ensure their data platform is efficient, scalable, and capable of supporting their business needs. Choosing the right storage service in Azure can help...


Authors (2)

Olivier Mertens

Olivier Mertens is a cloud solution architect for Azure data and AI at Microsoft, based in Dublin, Ireland. In this role, he assists organizations in designing their enterprise-scale data platforms and analytical workloads. Next to his role as an architect, Olivier leads the technical AI expertise for Microsoft EMEA in the corporate market. This includes leading knowledge sharing and internal upskilling, as well as solving highly complex or strategic customer AI cases. Before his time at Microsoft, he worked as a data scientist at a Microsoft partner in Belgium. Olivier is a lecturer for generative AI and AI solution architectures, a keynote speaker for AI, and holds a master's degree in information management, a postgraduate degree as an AI business architect, and a bachelor's degree in business management.

Breght Van Baelen

Breght Van Baelen is a Microsoft employee based in Dublin, Ireland, and works as a cloud solution architect for the data and AI pillar in Azure. He provides guidance to organizations building large-scale analytical platforms and data solutions. In addition, Breght was chosen as an advanced cloud expert for Power BI and is responsible for providing technical expertise in Europe, the Middle East, and Africa. Before his time at Microsoft, he worked as a data consultant at Microsoft Gold Partners in Belgium. Breght led a team of eight data and AI consultants as a data science lead. Breght holds a master's degree in computer science from KU Leuven, specializing in AI. He also holds a bachelor's degree in computer science from the University of Hasselt.
