You're reading from Azure Data and AI Architect Handbook

Product type Book

Published in Jul 2023

Publisher Packt

ISBN-13 9781803234861

Pages 284 pages

Edition 1st Edition

Languages

Concepts

Data Science

Authors (2):

Olivier Mertens

Breght Van Baelen

View More author details

Table of Contents (18) Chapters

Preface

Part 1: Introduction to Azure Data Architect

Chapter 1: Introduction to Data Architectures

Chapter 2: Preparing for Cloud Adoption

Part 2: Data Engineering on Azure

Chapter 3: Ingesting Data into the Cloud

Chapter 4: Transforming Data on Azure

Chapter 5: Storing Data for Consumption

Part 3: Data Warehousing and Analytics

Chapter 6: Data Warehousing

Chapter 7: The Semantic Layer

Chapter 8: Visualizing Data Using Power BI

Chapter 9: Advanced Analytics Using AI

Part 4: Data Security, Governance, and Compliance

Chapter 10: Enterprise-Level Data Governance and Compliance

Chapter 11: Introduction to Data Security

Index

Why subscribe?

Other Books You May Enjoy

The Semantic Layer

Positioned after the data has been modeled in a data warehouse or lakehouse, the semantic layer acts as a performance-enhancing data querying layer for reporting, offering a more efficient and business-friendly view of the data. By employing aggregations, calculations, and business-oriented terms and concepts, the semantic layer simplifies the complexity of technical data structures and caters to the needs of business users. The end result is a well-defined data model that effectively translates the underlying data structures of the warehouse or lakehouse into business-oriented terms. The value of this is to allow reports to more quickly query and visualize data, as well as allow business people to understand the data structures and extract insights from them.

In this chapter, we’re going to cover the following main topics:

Multidimensional versus tabular models
The VertiPaq engine for tabular models
Modes in tabular models
Tools for...

Multidimensional versus tabular models

The semantic layer can be constructed using two different approaches: multi-dimensional models and tabular models. Multi-dimensional models, which represent an earlier technique, involve the creation of what are known as OLAP cubes. These cubes are designed to capture and express the multidimensional nature of the underlying data. To better understand this concept, one can envision an extension of a two-dimensional table in a spreadsheet, where data can be aggregated across not just two axes but multiple dimensions.

To illustrate the practical application of multi-dimensional models, let’s consider a scenario where a data warehouse contains detailed purchase records at the lowest level of granularity. The CEO of the organization wants to translate this data into a structure that allows them to report on revenue trends per country per product. In this case, a multidimensional cube would be created, aggregating the purchase data across...

The VertiPaq engine for tabular models

The VertiPaq engine is an in-memory database engine that is responsible for storing the data model in memory. This engine leverages advanced compression algorithms and memory management techniques to fit the data model in memory:

Columnar compression: VertiPaq stores data in a columnar format, where each column is compressed independently. This allows for efficient compression based on the characteristics of the data in each column. Columnar compression eliminates redundant values and exploits data patterns within a column, resulting in significant space savings.
Hash or dictionary encoding: VertiPaq uses dictionary encoding to compress repetitive values within a column. It creates a dictionary that maps unique values to numeric identifiers, then replaces the actual values with these compact identifiers. As a result, the storage required for repeated values is greatly reduced.
Run-length encoding: This technique is used to compress...

Modes in tabular models

By default, tabular models use an Import mode to load data into memory. An ETL tool such as Power Query extracts the data from data sources, transforms it, and loads it into memory. Afterward, DAX queries can be performed against the in-memory database to calculate and aggregate the data. When tabular models query data residing in memory, processing can be very fast, but the data also needs to be refreshed with the ETL tool every now and then to reflect the most recent changes.

DirectQuery mode is very different from Import mode. Queries are run against the underlying data sources instead of the in-memory database. This means the data is always up-to-date and no refreshes of the in-memory database need to be scheduled, but latencies are higher, meaning performance is worse. Another benefit of DirectQuery mode is that the data model can grow beyond the memory size limits as no copy of the data is kept in memory.

When performing a DAX query against a tabular...

Tools for the semantic layer

There are several different tools for implementing the semantic layer, as follows:

SQL Server Analysis Services (SSAS): SSAS is part of the SQL Server services. It is primarily deployed on-premises, meaning it is hosted within an organization’s local infrastructure. SSAS offers a robust and scalable platform for building multidimensional and tabular models. Initially, only multidimensional models were supported, but due to popular demand, tabular model support was added later. Multidimensional models are designed for complex analytical scenarios and support features such as hierarchies, calculated members, and advanced aggregation capabilities. Tabular models, on the other hand, provide a simpler and more intuitive approach to data modeling, allowing users to create analytical models using tables and relationships. SSAS also includes advanced features such as data mining and data visualization.

The following architecture diagram (Figure...

Summary

In this chapter, we discussed how the semantic layer is used to create a more performant layer for reporting. We learned the semantic layer can be implemented using either multidimensional or tabular models. Multidimensional models are based on the concept of cubes and provide analytical capabilities by aggregating the data. Tabular models, on the other hand, utilize a columnar in-memory technology known as the VertiPaq engine, which enables faster processing and compression. Tabular models are well suited for scenarios where fast query performance and self-service analytics are paramount.

The VertiPaq engine was explained as a key component of tabular models and powers their impressive performance capabilities. By leveraging in-memory storage and columnar data structures, the VertiPaq engine optimizes data compression and enables efficient query execution. This engine plays a significant role in the success of tabular models, allowing for interactive and near-real-time...