You're reading from Data Modeling with Snowflake

Product type Book

Published in May 2023

Publisher Packt

ISBN-13 9781837634453

Pages 324 pages

Edition 1st Edition

Languages

Concepts

Data Engineering

Author (1):

Serge Gershkovich

Table of Contents (24) Chapters

Preface

1. Part 1: Core Concepts in Data Modeling and Snowflake Architecture

2. Chapter 1: Unlocking the Power of Modeling

3. Chapter 2: An Introduction to the Four Modeling Types

4. Chapter 3: Mastering Snowflake’s Architecture

5. Chapter 4: Mastering Snowflake Objects

6. Chapter 5: Speaking Modeling through Snowflake Objects

7. Chapter 6: Seeing Snowflake’s Architecture through Modeling Notation

8. Part 2: Applied Modeling from Idea to Deployment

9. Chapter 7: Putting Conceptual Modeling into Practice

10. Chapter 8: Putting Logical Modeling into Practice

11. Chapter 9: Database Normalization

12. Chapter 10: Database Naming and Structure

13. Chapter 11: Putting Physical Modeling into Practice

14. Part 3: Solving Real-World Problems with Transformational Modeling

15. Chapter 12: Putting Transformational Modeling into Practice

16. Chapter 13: Modeling Slowly Changing Dimensions

17. Chapter 14: Modeling Facts for Rapid Analysis

18. Chapter 15: Modeling Semi-Structured Data

19. Chapter 16: Modeling Hierarchies

20. Chapter 17: Scaling Data Models through Modern Techniques

21. Index

Why subscribe?

22. Other Books You May Enjoy

Appendix

A look at relational and transformational modeling

The previous section describes how modeling varies between operational and data warehouse scenarios. Before exploring the modeling process in detail, it’s helpful to understand the look and feel of relational and transformational modeling and what we’re working toward. Before proceeding, it would help to summarize the main differences between transactional databases and data warehouses. You can see what these are in the following table:

Transactional Database	Data Warehouse
Supports daily operations	Provides operational insight
Operates on single records	Summarizes many records
Accurate as of the present instant	Historical snapshots over time
Single source of truth (SSOT), non-redundant	Redundant to support different analyses
Data models defined by business operations	Data models generated by business questions
Static and structured data model	Inherited structure and dynamically transformed data model
Single-application data	Multiple sources of converging data

Figure 1.4 – Common differences between transactional databases and warehouses

Given these differences, the following sections demonstrate what modeling looks like in each system and what it aims to achieve.

What modeling looks like in operational systems

Completely ignoring the modeling workflow that got us here, which will be covered in later chapters, we can observe an example of the type of modeling most commonly seen in transactional systems. The physical diagram in Figure 1.5 serves both as a blueprint for declaring the required tables and a guide to understanding their business context.

Following modeling conventions (don’t worry if they are still unfamiliar—they will be covered thoroughly in the coming chapters), we can infer a lot of information from this simple diagram. For example, a person is uniquely identified by an eight-digit identifier (the primary key) and must have a Social Security number (SSN), driver’s license, name, and birth date.

The one-to-many relationship between the two tables establishes that while a person does not necessarily need to have an account created, an account must belong to just one person:

Figure 1.5 – A physical model using crow’s foot notation

These details, combined with the list of attributes, data types, and constraints, not only dictate what kinds of data can be written to these tables but also provide an idea of how the business operates. So, how does this differ in analytical databases?

What modeling looks like in analytical systems

In a data warehouse scenario, the PERSON and ACCOUNT tables would not be defined from scratch—they would be extracted from the source in which they exist and loaded—bringing both structure and data into the process. Then, the analytical transformations begin in answer to the organization’s business questions. This is a process known as Extract Transform Load (ETL). (Although ELT has become the preferred processing order, the original term stuck.)

Suppose the management team wanted to analyze which age groups (by decade) were opening which account types and they wanted to store the result in a separate table for an independent analysis.

The following diagram shows the resulting relational model of an object obtained through transformational analysis but provides no business context:

Figure 1.6 – A relational model of a transformational requirement

Although physical modeling could describe such a table (as seen in Figure 1.6)—containing the account type with age and count of accounts as integers—such a model would fail to communicate the most relevant details, presented here:

The logic used to perform the analysis
The relationship between the source tables and the output

The business requirement for ACCOUNT_TYPE_AGE_ANALYSIS in this example purposely excludes the source key fields from the target table, preventing the possibility of establishing any relational links. However, the relational model still serves a vital role: it tells us how the sources are related and how to join them correctly to produce the required analysis.

The logic could then be constructed by joining PERSON and ACCOUNT, as shown here:

CREATE TABLE account_types_age_analysis AS
SELECT
    a.account_type,
    ROUND(DATEDIFF(years, p.birth_date, CURRENT_DATE()), -1
    ) AS age_decade,
    COUNT(a.account_id) AS total_accounts
    FROM account AS  a
     INNER JOIN person AS p
     ON a.person_id = p.person_id
GROUP BY     1, 2;

Although there is no relational connection between ACCOUNT_TYPE_AGE_ANALYSIS and its sources, there is still a clear dependency on them and their columns. Instead of using ERDs, which convey entities and relationships, transformational pipelines are visualized through a lineage diagram. This type of diagram gives a column-level mapping from source to target, including all intermediate steps, as shown here:

Figure 1.7 – Transformational modeling seen visually

Paired with the SQL logic used to construct it, the lineage graph gives a complete picture of the transformational relationship between sources and targets in an analytical/warehousing scenario.

Having witnessed both relational and analytical approaches to modeling, it is clear that both play a vital role in navigating the complex dynamic environments that one is liable to encounter in an enterprise-scale Snowflake environment.

Although we have only skimmed the surface of what modeling entails and the unique features of the Snowflake platform that can be leveraged to this end, this chapter has hopefully given you an idea of the vital role that modeling plays in building, maintaining, and documenting database systems. Before diving into the specifics of verbal, technical, and visual modeling semantics of modeling in the chapters to come, let’s review what we learned.

You're reading from Data Modeling with Snowflake

Table of Contents (24) Chapters

A look at relational and transformational modeling

What modeling looks like in operational systems

What modeling looks like in analytical systems

Authors (1)

Personalised recommendations for you