Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Data Modeling with Snowflake

You're reading from  Data Modeling with Snowflake

Product type Book
Published in May 2023
Publisher Packt
ISBN-13 9781837634453
Pages 324 pages
Edition 1st Edition
Languages
Author (1):
Serge Gershkovich Serge Gershkovich
Profile icon Serge Gershkovich

Table of Contents (24) Chapters

Preface 1. Part 1: Core Concepts in Data Modeling and Snowflake Architecture
2. Chapter 1: Unlocking the Power of Modeling 3. Chapter 2: An Introduction to the Four Modeling Types 4. Chapter 3: Mastering Snowflake’s Architecture 5. Chapter 4: Mastering Snowflake Objects 6. Chapter 5: Speaking Modeling through Snowflake Objects 7. Chapter 6: Seeing Snowflake’s Architecture through Modeling Notation 8. Part 2: Applied Modeling from Idea to Deployment
9. Chapter 7: Putting Conceptual Modeling into Practice 10. Chapter 8: Putting Logical Modeling into Practice 11. Chapter 9: Database Normalization 12. Chapter 10: Database Naming and Structure 13. Chapter 11: Putting Physical Modeling into Practice 14. Part 3: Solving Real-World Problems with Transformational Modeling
15. Chapter 12: Putting Transformational Modeling into Practice 16. Chapter 13: Modeling Slowly Changing Dimensions 17. Chapter 14: Modeling Facts for Rapid Analysis 18. Chapter 15: Modeling Semi-Structured Data 19. Chapter 16: Modeling Hierarchies 20. Chapter 17: Scaling Data Models through Modern Techniques 21. Index 22. Other Books You May Enjoy Appendix

A look at relational and transformational modeling

The previous section describes how modeling varies between operational and data warehouse scenarios. Before exploring the modeling process in detail, it’s helpful to understand the look and feel of relational and transformational modeling and what we’re working toward. Before proceeding, it would help to summarize the main differences between transactional databases and data warehouses. You can see what these are in the following table:

Transactional Database

Data Warehouse

Supports daily operations

Provides operational insight

Operates on single records

Summarizes many records

Accurate as of the present instant

Historical snapshots over time

Single source of truth (SSOT), non-redundant

Redundant to support different analyses

Data models defined by business operations

Data models generated by business questions

Static and structured data model

Inherited structure and dynamically transformed data model

Single-application data

Multiple sources of converging data

Figure 1.4 – Common differences between transactional databases and warehouses

Given these differences, the following sections demonstrate what modeling looks like in each system and what it aims to achieve.

What modeling looks like in operational systems

Completely ignoring the modeling workflow that got us here, which will be covered in later chapters, we can observe an example of the type of modeling most commonly seen in transactional systems. The physical diagram in Figure 1.5 serves both as a blueprint for declaring the required tables and a guide to understanding their business context.

Following modeling conventions (don’t worry if they are still unfamiliar—they will be covered thoroughly in the coming chapters), we can infer a lot of information from this simple diagram. For example, a person is uniquely identified by an eight-digit identifier (the primary key) and must have a Social Security number (SSN), driver’s license, name, and birth date.

The one-to-many relationship between the two tables establishes that while a person does not necessarily need to have an account created, an account must belong to just one person:

Figure 1.5 – A physical model using crow’s foot notation

Figure 1.5 – A physical model using crow’s foot notation

These details, combined with the list of attributes, data types, and constraints, not only dictate what kinds of data can be written to these tables but also provide an idea of how the business operates. So, how does this differ in analytical databases?

What modeling looks like in analytical systems

In a data warehouse scenario, the PERSON and ACCOUNT tables would not be defined from scratch—they would be extracted from the source in which they exist and loaded—bringing both structure and data into the process. Then, the analytical transformations begin in answer to the organization’s business questions. This is a process known as Extract Transform Load (ETL). (Although ELT has become the preferred processing order, the original term stuck.)

Suppose the management team wanted to analyze which age groups (by decade) were opening which account types and they wanted to store the result in a separate table for an independent analysis.

The following diagram shows the resulting relational model of an object obtained through transformational analysis but provides no business context:

Figure 1.6 – A relational model of a transformational requirement

Figure 1.6 – A relational model of a transformational requirement

Although physical modeling could describe such a table (as seen in Figure 1.6)—containing the account type with age and count of accounts as integers—such a model would fail to communicate the most relevant details, presented here:

  • The logic used to perform the analysis
  • The relationship between the source tables and the output

The business requirement for ACCOUNT_TYPE_AGE_ANALYSIS in this example purposely excludes the source key fields from the target table, preventing the possibility of establishing any relational links. However, the relational model still serves a vital role: it tells us how the sources are related and how to join them correctly to produce the required analysis.

The logic could then be constructed by joining PERSON and ACCOUNT, as shown here:

CREATE TABLE account_types_age_analysis AS
SELECT
    a.account_type,
    ROUND(DATEDIFF(years, p.birth_date, CURRENT_DATE()), -1
    ) AS age_decade,
    COUNT(a.account_id) AS total_accounts
    FROM account AS  a
     INNER JOIN person AS p
     ON a.person_id = p.person_id
GROUP BY     1, 2;

Although there is no relational connection between ACCOUNT_TYPE_AGE_ANALYSIS and its sources, there is still a clear dependency on them and their columns. Instead of using ERDs, which convey entities and relationships, transformational pipelines are visualized through a lineage diagram. This type of diagram gives a column-level mapping from source to target, including all intermediate steps, as shown here:

Figure 1.7 – Transformational modeling seen visually

Figure 1.7 – Transformational modeling seen visually

Paired with the SQL logic used to construct it, the lineage graph gives a complete picture of the transformational relationship between sources and targets in an analytical/warehousing scenario.

Having witnessed both relational and analytical approaches to modeling, it is clear that both play a vital role in navigating the complex dynamic environments that one is liable to encounter in an enterprise-scale Snowflake environment.

Although we have only skimmed the surface of what modeling entails and the unique features of the Snowflake platform that can be leveraged to this end, this chapter has hopefully given you an idea of the vital role that modeling plays in building, maintaining, and documenting database systems. Before diving into the specifics of verbal, technical, and visual modeling semantics of modeling in the chapters to come, let’s review what we learned.

You have been reading a chapter from
Data Modeling with Snowflake
Published in: May 2023 Publisher: Packt ISBN-13: 9781837634453
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €14.99/month. Cancel anytime}