Reader small image

You're reading from  Data Modeling with Snowflake

Product typeBook
Published inMay 2023
PublisherPackt
ISBN-139781837634453
Edition1st Edition
Right arrow
Author (1)
Serge Gershkovich
Serge Gershkovich
author image
Serge Gershkovich

Serge Gershkovich is a seasoned data architect with decades of experience designing and maintaining enterprise-scale data warehouse platforms and reporting solutions. He is a leading subject matter expert, speaker, content creator, and Snowflake Data Superhero. Serge earned a bachelor of science degree in information systems from the State University of New York (SUNY) Stony Brook. Throughout his career, Serge has worked in model-driven development from SAP BW/HANA to dashboard design to cost-effective cloud analytics with Snowflake. He currently serves as product success lead at SqlDBM, an online database modeling tool.
Read more about Serge Gershkovich

Right arrow

Modeling Facts for Rapid Analysis

Fact tables are used to store the quantitative measurements of business operations or events such as sales, employee headcounts, or website traffic. Because they contain the official record of business transactions, fact tables are a prime target for operational analytics. Fact tables aggregate metrics such as sales totals and active users, as well as historical trends (deltas), such as the margin impact of daily returns or same-day bookings before cancelations.

Because business needs vary by industry and sector, various fact table models exist to fit these different demands. Facts such as product sales and returns are erratic, while others, such as manufacturing or fulfillment, follow a predictable pattern. The fact tables supporting such processes must anticipate not only the nature of the data they aim to capture but also the organization’s analytical needs to allow for efficient reporting.

This chapter will cover the various fact table...

Technical requirements

The scripts used to instantiate and load the examples in this chapter are available in the following GitHub repo: https://github.com/PacktPublishing/Data-Modeling-with-Snowflake/tree/main/ch14. While the key section of each script will be explained in the latter half of this chapter, please refer to the repo for the complete code used to maintain and schedule the loading of the objects discussed, as it is too long to reprint here.

Fact table types

By capturing the daily operational transactions of an organization, fact tables tend to contain large amounts of records that are constantly growing. By analyzing the data in fact tables, analysts and business users glean insights into business performance and identify trends and patterns. Considering these demands, fact tables must be designed in such a way that balances data loading efficiency with analytical needs and query patterns.

After nearly 20 years and three editions, the definitive guide to designing fact tables remains The Data Warehouse Toolkit. In it, authors Ralph Kimball and Margy Ross expertly cover the fundamentals of dimensional modeling, fact table design, and related industry case studies. This chapter will not attempt to replicate the content to which Kimball and Ross have dedicated nearly a quarter of their book. Instead, we will focus on what The Data Warehouse Toolkit does not cover: database-specific transformations for managing and maintaining...

Fact table measures

Numerical measures associated with a business transaction are called facts, and they fall into three basic categories:

  • Additive facts – These are measures that can be summed across any dimension. Additive facts are the most common type of fact in a DWH, allowing for a wide variety of analytical calculations and insights. These values can be aggregated across any combination of dimensions, such as time, geography, or product. Examples of additive facts include sales revenue, profit, and quantity sold.
  • Semi-additive facts – These are measures that can be summed across some dimensions but not all. These measures are usually numeric values that can only be aggregated across certain dimensions, such as customers or products. Examples of semi-additive facts include account balance and inventory levels, respectively. Semi-additive facts require special handling in data analysis to ensure that the aggregation is done correctly and does not spill...

Getting the facts straight

The facts in a source system are recorded in real time and updated in case of adjustments. By definition, they are always current. A DWH has a much harder task because it needs to capture and report current facts and track historical changes. Suppose an order was adjusted from containing two items to one. The DWH must find a way to report that a change was made while avoiding the issue of double-counting (as the total quantity is now one, not three).

The task of historical tracking is made even more complicated when the facts are not point-in-time transactions but intervals such as advertising campaigns or employee hires and leavers. In such cases, tabulating the cost of a department can no longer be accomplished by simple aggregation because employees can come and go at various intervals.

Operating a business is messy and complex, and the data that it generates is no exception. Employees come and go, orders are returned, and in some cases, records...

Maintaining fact tables using Snowflake features

In this section, we will practice creating and maintaining the fact table techniques discussed previously using available Snowflake features. Like in previous chapters, we will load data from the snowflake_sample_data.tpch_sf10 schema, which will serve as the sample set. We will then simulate source system updates by randomly loading records from this sample.

The first two exercises will use data from the LINEITEM table. To continue, create a schema to house the exercises from this chapter and instantiate the source system table, as well as the DWH landing area.

Open the first file in this chapter’s repository, ch_14.1_reverse_balance_fact.sql, and run the first three steps:

CREATE OR REPLACE SCHEMA ch14_facts;
CREATE OR REPLACE TABLE source_system_lineitem ...;
CREATE OR REPLACE TABLE src_lineitem ...;

These examples require lineitem orders to be loaded and processed in their entirety (containing all constituent...

Summary

In a DWH, fact tables present an additional challenge on top of merely capturing the latest values – they must also be able to capture and reconcile historical changes in a way that allows users to flexibly and cost-effectively query them to resolve business questions, because when it comes to operational analytics, analyzing changes, variations, and what didn’t happen can be just as valuable as the current state of truth.

Various types of fact tables exist to help an organization meet these demanding analytical needs, such as transactional, snapshot, and accumulating snapshot fact tables, among others. These fact tables must differentiate between the various kinds of measures they store (e.g., additive, semi-additive, and non-additive) because each is treated differently when updating or recording changes.

To help data teams construct and maintain these tables in Snowflake, this chapter dissected some of the toughest challenges in maintaining fact tables...

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Data Modeling with Snowflake
Published in: May 2023Publisher: PacktISBN-13: 9781837634453
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €14.99/month. Cancel anytime

Author (1)

author image
Serge Gershkovich

Serge Gershkovich is a seasoned data architect with decades of experience designing and maintaining enterprise-scale data warehouse platforms and reporting solutions. He is a leading subject matter expert, speaker, content creator, and Snowflake Data Superhero. Serge earned a bachelor of science degree in information systems from the State University of New York (SUNY) Stony Brook. Throughout his career, Serge has worked in model-driven development from SAP BW/HANA to dashboard design to cost-effective cloud analytics with Snowflake. He currently serves as product success lead at SqlDBM, an online database modeling tool.
Read more about Serge Gershkovich