Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Azure Data and AI Architect Handbook

You're reading from  Azure Data and AI Architect Handbook

Product type Book
Published in Jul 2023
Publisher Packt
ISBN-13 9781803234861
Pages 284 pages
Edition 1st Edition
Languages
Authors (2):
Olivier Mertens Olivier Mertens
Profile icon Olivier Mertens
Breght Van Baelen Breght Van Baelen
Profile icon Breght Van Baelen
View More author details

Table of Contents (18) Chapters

Preface Part 1: Introduction to Azure Data Architect
Chapter 1: Introduction to Data Architectures Chapter 2: Preparing for Cloud Adoption Part 2: Data Engineering on Azure
Chapter 3: Ingesting Data into the Cloud Chapter 4: Transforming Data on Azure Chapter 5: Storing Data for Consumption Part 3: Data Warehousing and Analytics
Chapter 6: Data Warehousing Chapter 7: The Semantic Layer Chapter 8: Visualizing Data Using Power BI Chapter 9: Advanced Analytics Using AI Part 4: Data Security, Governance, and Compliance
Chapter 10: Enterprise-Level Data Governance and Compliance Chapter 11: Introduction to Data Security Index Other Books You May Enjoy

Data Warehousing

As businesses collect ever-increasing amounts of data, data warehousing has become a critical component in managing and analyzing data. Cloud-based data warehousing provides a flexible and cost-effective solution for building systems that support data analytics as well as reporting. Here, we’ll explore the fundamental concepts of data warehousing, as well as the different approaches to designing a data warehouse.

We will start by discussing the two main approaches to data warehousing: the normalized approach by Bill Inmon (Inmon, William H. (1992). Building the Data Warehouse. Boston: QED Technical Pub. Group. ISBN 0-89435-404-3. OCLC 24846118) and the dimensional approach by Ralph Kimball (Ralph Kimball and Margy Ross (26 April 2002). The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second ed.). Wiley. ISBN 0-471-20024-7). Inmon’s approach emphasizes the importance of data integration and consistency, while Kimball’s...

Fundamental concepts of data warehousing

An (enterprise) data warehouse, often abbreviated as DW or DWH, is a specialized system utilized for analyzing and reporting data. It acts as a centralized hub where data from different sources is consolidated and organized, serving as a vital component of business intelligence (BI). Businesses need to make informed decisions by learning from data from the past as well as examining present data. To accomplish this, both sets of data are stored in a single location called the data warehouse. Operational systems such as customer relationship management (CRM) systems (sales) or marketing are often data sources of a data warehouse and may require cleansing and curating before they can be utilized for analysis and reporting.

The design of a data warehouse includes two essential concepts: extract, transform, and load (ETL) and extract, load, and transform (ELT). These processes involve extracting data from source systems and transforming it into...

Approaches to data warehousing

There are two popular approaches to designing a data warehouse, namely the normalized approach by Bill Inmon and the dimensional approach by Ralph Kimball. In this section, we’ll explain the two different approaches further in detail.

Normalized approach by Bill Inmon

The first approach we would like to introduce is the normalized approach by Bill Inmon. Bill Inmon was one of the first people to define the term data warehousing and defined it as a “subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management’s decisions.” This means a data warehouse focuses on subjects, such as products and customers, and stores historical data for an indefinite time.

The Inmon design approach constructs a logical model for each subject that includes all relevant attributes, relationships, dependencies, and affiliations. This logical model uses a normalized structure (3NF) to minimize data redundancy...

SCDs

In data warehousing, an SCD is used to manage infrequent changes to the values of a business entity over time, as opposed to changes that occur on a set schedule. Star schema design theory defines various types of SCDs, with Type 1 and Type 2 being the most common. In practice, a dimension table may support a combination of SCD types, such as Type 3 and Type 6, for tracking historical changes. Understanding the differences between these SCD types can be useful for designing an effective data warehousing solution.

Type 1 SCD

A Type 1 SCD design approach updates the dimension table with the most recent changes every time updates in the sources are seen. This method is typically used for storing supplementary values such as email addresses or company names of customers. If a customer’s contact information is altered, an update will be made to the dimension table to reflect the changes. The key of the table, such as CustomerID, will remain unchanged, ensuring that links...

Building a data warehouse in the cloud

Data warehouses can be built with different Azure services. Traditional data warehouses used to be built on-premises with databases in SQL servers. When moving to the cloud, this changed to either SQL server on Azure VMs (Infrastructure as a Service, or IaaS) or Azure SQL Database or Managed Instance (Platform as a Service, or PaaS), depending on how Microsoft-managed the database needed to be. Building SQL databases feel very familiar to building data warehouses as they are also often used operationally as a backend for applications. However, data warehouses are built for analytical purposes, not operational purposes, and thus have different needs, as outlined here:

  • Queries against operational databases are often frequent and simple in nature (small reads and writes), whereas queries against analytical data warehouses are infrequent and complex in nature (often with lots of joins and aggregates).
  • Data warehouses are often nonvolatile...

Summary

In this chapter on data warehousing in the cloud, we covered key concepts and approaches to data warehousing, including the normalized approach by Bill Inmon and the dimensional approach by Ralph Kimball. We also explored building a data warehouse in the cloud using Azure SQL Database and Synapse SQL, including dedicated pools and serverless pools with the medallion architecture.

By the end of this chapter, readers should have a solid understanding of the different data warehousing approaches and the benefits of building a data warehouse in the cloud. They should also be able to build a data warehouse using Azure SQL Database and Synapse SQL. These skills are essential for data architects looking to design and implement data warehousing solutions in the cloud.

The semantic layer is the next logical step in building a data platform after the data warehouse. It enables users to easily access and analyze data without needing to understand the complex underlying data model...

lock icon The rest of the chapter is locked
You have been reading a chapter from
Azure Data and AI Architect Handbook
Published in: Jul 2023 Publisher: Packt ISBN-13: 9781803234861
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime}