You're reading from Data Modeling with Snowflake

Product type: Book
Published: May 2023
Publisher: Packt
ISBN-13: 9781837634453
Pages: 324
Edition: 1st
Concepts: Data Engineering
Author: Serge Gershkovich

Table of Contents (24 chapters)

Preface

1. Part 1: Core Concepts in Data Modeling and Snowflake Architecture

2. Chapter 1: Unlocking the Power of Modeling

3. Chapter 2: An Introduction to the Four Modeling Types

4. Chapter 3: Mastering Snowflake’s Architecture

5. Chapter 4: Mastering Snowflake Objects

6. Chapter 5: Speaking Modeling through Snowflake Objects

7. Chapter 6: Seeing Snowflake’s Architecture through Modeling Notation

8. Part 2: Applied Modeling from Idea to Deployment

9. Chapter 7: Putting Conceptual Modeling into Practice

10. Chapter 8: Putting Logical Modeling into Practice

11. Chapter 9: Database Normalization

12. Chapter 10: Database Naming and Structure

13. Chapter 11: Putting Physical Modeling into Practice

14. Part 3: Solving Real-World Problems with Transformational Modeling

15. Chapter 12: Putting Transformational Modeling into Practice

16. Chapter 13: Modeling Slowly Changing Dimensions

17. Chapter 14: Modeling Facts for Rapid Analysis

18. Chapter 15: Modeling Semi-Structured Data

19. Chapter 16: Modeling Hierarchies

20. Chapter 17: Scaling Data Models through Modern Techniques

21. Index

Why subscribe?

22. Other Books You May Enjoy

Appendix

Speaking Modeling through Snowflake Objects

In its purest form, relational modeling (normalized tables with strictly enforced physical constraints) is most often found in online transaction processing (OLTP) databases. Transactional databases store the latest (as-is) version of business information, unlike data warehouses, which store historical snapshots and track changes in the information over time, allowing for additional (as-at) analysis across a temporal dimension.
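To make the as-is versus as-at distinction concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a local stand-in for a warehouse database, so it runs without a Snowflake account. The CUSTOMER_HISTORY table, its VALID_FROM column, and the city_as_at helper are hypothetical names chosen for illustration. The table keeps every version of a customer's city, so one query can return either the latest (as-is) value or the value that was current on an earlier date (as-at).

```python
import sqlite3

# Sketch only: SQLite stands in for a data warehouse.
# CUSTOMER_HISTORY, VALID_FROM, and city_as_at are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE CUSTOMER_HISTORY (
           CUSTOMER_ID INTEGER NOT NULL,
           CITY        TEXT    NOT NULL,
           VALID_FROM  TEXT    NOT NULL
       )"""
)
# VALID_FROM holds ISO-8601 date strings, which compare correctly as text.
conn.executemany(
    "INSERT INTO CUSTOMER_HISTORY VALUES (?, ?, ?)",
    [(1, "Boston", "2022-01-01"), (1, "Denver", "2023-03-15")],
)


def city_as_at(conn, customer_id, as_at_date):
    """Return the city that was current for the customer on as_at_date."""
    row = conn.execute(
        """SELECT CITY FROM CUSTOMER_HISTORY
           WHERE CUSTOMER_ID = ? AND VALID_FROM <= ?
           ORDER BY VALID_FROM DESC
           LIMIT 1""",
        (customer_id, as_at_date),
    ).fetchone()
    return row[0] if row else None


# As-is: the latest version, the only one a typical OLTP table would keep.
print(city_as_at(conn, 1, "9999-12-31"))  # Denver
# As-at: the version that was current at an earlier point in time.
print(city_as_at(conn, 1, "2022-06-30"))  # Boston
```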

However, this does not mean that relational modeling concepts do not apply in an online analytical processing (OLAP) database—quite the contrary. A data warehouse not only replicates existing entities and relations from transactional systems but also needs to manage the added task of conforming dimensions from other sources and joining them together in downstream transformations and analyses.

Another reason to master the common language of modeling is the Hybrid Unistore tables described in the previous...

Entities as tables

Before delving into database details, let’s recall the concept of an entity at the business level: a person, object, place, event, or concept relevant to the business for which an organization wants to maintain information. In other words, an entity is a business-relevant concept with common properties. A rule of thumb for identifying and naming entities is that they conform to singular English nouns, for example, customer, item, and reservation.

The obvious candidate for storing and maintaining information in Snowflake is a table. Through SQL, tables give users a standard and familiar way to access and manipulate entity details. As we saw in the last chapter, Snowflake tables come in several flavors, offering different backup and recovery options. Beyond choosing a table type that provides adequate Time Travel and Fail-safe coverage, there is little to manage, and Snowflake tables live up to the company's claim of near-zero maintenance: there are no indexes, tablespaces, or partitions...

Attributes as columns

As the previous section established, an entity is a business-relevant concept for which an organization wishes to maintain information. Attributes, defined with the business team during conceptual modeling or loaded from existing source data during the ETL process, are properties that describe the entity and are stored as columns. Attributes can be descriptive (such as NAME, ADDRESS, and QUANTITY) or metadata (such as ETL_SOURCE and LOAD_DATE).
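A minimal sketch of an entity realized as a table, assuming Python's built-in sqlite3 module as a local stand-in for Snowflake (SQLite's TEXT and INTEGER play the role of Snowflake types such as VARCHAR and NUMBER). The CUSTOMER columns and the load_customers helper are hypothetical. NAME and ADDRESS are descriptive attributes, while ETL_SOURCE and LOAD_DATE are metadata attributes supplied by the load process rather than by the business.

```python
import sqlite3

# Sketch only: SQLite stands in for Snowflake; CUSTOMER's columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE CUSTOMER (
           CUSTOMER_ID INTEGER NOT NULL,
           NAME        TEXT,                               -- descriptive
           ADDRESS     TEXT,                               -- descriptive
           ETL_SOURCE  TEXT,                               -- metadata
           LOAD_DATE   TEXT DEFAULT CURRENT_TIMESTAMP      -- metadata
       )"""
)


def load_customers(conn, rows, source):
    """Insert descriptive values; stamp each row with its source system.

    LOAD_DATE is not passed in, so the column default fills it.
    """
    conn.executemany(
        "INSERT INTO CUSTOMER (CUSTOMER_ID, NAME, ADDRESS, ETL_SOURCE) "
        "VALUES (?, ?, ?, ?)",
        [(cid, name, addr, source) for cid, name, addr in rows],
    )


load_customers(conn, [(1, "Ada", "12 Main St")], source="crm_export")
print(conn.execute(
    "SELECT NAME, ETL_SOURCE, LOAD_DATE IS NOT NULL FROM CUSTOMER"
).fetchone())  # ('Ada', 'crm_export', 1)
```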

The nature of the attribute—whether numeric, string, date, or other—is an essential detail for understanding the business requirement at the conceptual level and selecting the right data type at the physical level. Snowflake offers basic data types found in other databases (such as VARCHAR, DATE, and INTEGER) and less-common ones (such as VARIANT and GEOGRAPHY), which offer exciting possibilities for modeling and working with table contents.

Let us get to know Snowflake data...

Constraints and enforcement

The remainder of this chapter deals with table constraints, so we first need to understand what they are and note several important details about their use in Snowflake. In the ANSI-SQL standard, constraints define integrity and consistency rules for data stored in tables. Snowflake supports four constraint types:

PRIMARY KEY
UNIQUE
FOREIGN KEY
NOT NULL

Since the function of each of these constraints is covered later in this chapter, this section will be limited to their enforcement.

Enforcement, on the part of the database, means actively monitoring the integrity rules of a given constraint when DML operations are performed on a table. By enforcing a constraint, a database ensures that an error is raised when the constraint is violated, and the offending DML operation is not allowed to complete.

For example, a NOT NULL constraint on a column indicates that this column cannot contain NULL values. By enforcing...
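The enforcement behavior described above can be sketched with Python's built-in sqlite3 module standing in for Snowflake; both databases enforce NOT NULL. In SQLite, the multi-row INSERT below fails as a whole: an error is raised, and none of the statement's rows are kept. The CUSTOMER table and its rows are hypothetical.

```python
import sqlite3

# Sketch only: SQLite stands in for Snowflake; CUSTOMER is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMER (CUSTOMER_ID INTEGER NOT NULL, NAME TEXT NOT NULL)"
)
conn.execute("INSERT INTO CUSTOMER VALUES (1, 'Ada')")
conn.commit()

try:
    # One statement, two rows; the second row violates NOT NULL on NAME.
    conn.execute("INSERT INTO CUSTOMER VALUES (2, 'Grace'), (3, NULL)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)

# The offending statement was backed out entirely, so only Ada remains.
print(conn.execute("SELECT COUNT(*) FROM CUSTOMER").fetchone()[0])  # 1
```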

Identifiers as primary keys

Tables store information for business entities using attributes of relevant data types. A row in the CUSTOMER table holds information for a given customer, and a row in the ORDERS table represents an order—or does it? Perhaps in this example, orders can contain multiple products and span just as many rows. To determine a unique instance of an entity, an identifier—or primary key (PK), if referring to a physical database—is used.

A PK is a column or set of columns whose values uniquely determine an instance of an entity. Only one PK can be defined per table. From a business perspective, a PK represents a single entity instance. To return to the previous example—is an order a single row containing one product, or does our organization allow multiple products per order? PKs inform database users of what that reality looks like at the table level.

The following figure shows some sample data from a fictitious ORDERS table.

...
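Because an identifier must pick out exactly one row, one way to test a proposed PK against sample data is to compare the row count with the number of distinct, non-NULL key combinations. This is a sketch using Python's built-in sqlite3 module as a local stand-in for Snowflake; the ORDERS rows and the is_unique_identifier helper are hypothetical. A check like this matters in Snowflake because PK constraints on standard tables are recorded but not enforced.

```python
import sqlite3

# Sketch only: SQLite stands in for Snowflake; the ORDERS rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ORDERS (ORDER_ID INTEGER, PRODUCT_ID INTEGER, QUANTITY INTEGER)"
)
conn.executemany(
    "INSERT INTO ORDERS VALUES (?, ?, ?)",
    [(100, 1, 2), (100, 2, 1), (101, 1, 5)],  # order 100 contains two products
)


def is_unique_identifier(conn, table, columns):
    """True if no key column is NULL and each key combination appears once.

    Hypothetical helper. Table and column names are interpolated directly,
    so they must be trusted identifiers, never user input.
    """
    cols = ", ".join(columns)
    no_nulls = " AND ".join(f"{c} IS NOT NULL" for c in columns)
    total, distinct = conn.execute(
        f"""SELECT (SELECT COUNT(*) FROM {table}),
                   (SELECT COUNT(*) FROM (SELECT DISTINCT {cols}
                                          FROM {table}
                                          WHERE {no_nulls}))"""
    ).fetchone()
    return total == distinct


print(is_unique_identifier(conn, "ORDERS", ["ORDER_ID"]))                # False
print(is_unique_identifier(conn, "ORDERS", ["ORDER_ID", "PRODUCT_ID"]))  # True
```

With these rows, ORDER_ID alone fails the test because order 100 spans two products, while the composite (ORDER_ID, PRODUCT_ID) passes. That is the same question the ORDERS example raises about what a single order looks like at the table level.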

Alternate keys as unique constraints

Suppose we were modeling an EMPLOYEE table that contains an EMPLOYEE_ID column, a unique business identifier, and a SOCIAL_SECURITY_ID column holding Social Security numbers, which are government-issued personal identifiers. Either column would satisfy the PK requirement of uniquely identifying a record in the EMPLOYEE table, but recall that a table may only be assigned one PK. To let database users know that another column (or set of columns) satisfies the conditions for a PK when a PK already exists, alternate keys (AKs), declared as UNIQUE constraints, can also be defined.

In the previous example, the EMPLOYEE table had two valid PK candidates: EMPLOYEE_ID and SOCIAL_SECURITY_ID. In OLTP databases, the column or columns that act as the organizational business key should be made the primary key. In a data warehouse, where business keys from multiple source systems may be loaded, a surrogate key would be used instead. By this convention, the EMPLOYEE table should be modeled with EMPLOYEE_ID...
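A sketch of the EMPLOYEE example, using Python's built-in sqlite3 module as a local stand-in for Snowflake. EMPLOYEE_ID is declared as the PK and SOCIAL_SECURITY_ID as an alternate key through a UNIQUE constraint. The constraint names, the NAME column, and the rows are hypothetical. One difference is worth flagging: SQLite enforces UNIQUE and rejects the duplicate insert at the end, whereas a Snowflake standard table would accept it, because Snowflake does not enforce UNIQUE there.

```python
import sqlite3

# Sketch only: SQLite stands in for Snowflake; names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE EMPLOYEE (
           EMPLOYEE_ID        INTEGER NOT NULL,
           SOCIAL_SECURITY_ID TEXT    NOT NULL,
           NAME               TEXT,
           CONSTRAINT PK_EMPLOYEE     PRIMARY KEY (EMPLOYEE_ID),
           CONSTRAINT AK_EMPLOYEE_SSN UNIQUE (SOCIAL_SECURITY_ID)
       )"""
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(1, "111-11-1111", "Ada"), (2, "222-22-2222", "Grace")],
)

# Both the PK and the AK locate exactly one employee.
by_pk = conn.execute("SELECT NAME FROM EMPLOYEE WHERE EMPLOYEE_ID = 2").fetchall()
by_ak = conn.execute(
    "SELECT NAME FROM EMPLOYEE WHERE SOCIAL_SECURITY_ID = '222-22-2222'"
).fetchall()
print(by_pk, by_ak)  # [('Grace',)] [('Grace',)]

# SQLite enforces the UNIQUE constraint, so this duplicate SSN is rejected.
# A Snowflake standard table would accept the row instead.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (3, '111-11-1111', 'Alan')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```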

Relationships as foreign keys

Any business can be broken down into a set of entities and their interactions. For example, customers place orders for items provided by suppliers while applying promotion codes from an active marketing campaign—which is still a very narrow slice of what goes on in a typical organization. So far, this chapter has focused on the entities themselves: orders, items, and suppliers, for example. Now it is time to focus on the interactions—or relationships, as they are called in modeling parlance—such as placing orders, providing items, and applying promotions.

When business entities are related, their corresponding tables must have a way to capture the details of the interaction. When a customer orders an item, the order details must capture who the customer is and what items they ordered. Remember, PKs identify a unique record in a table. Therefore, when two tables share a relationship, the PKs of one must be included in the other to...
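The relationship between orders and customers can be sketched by carrying the customer's PK into ORDERS as a foreign key. This uses Python's built-in sqlite3 module as a local stand-in for Snowflake, with SQLite's foreign key enforcement switched off so that, as with a Snowflake standard table, the FOREIGN KEY is declared but not enforced. The table contents and the FK_ORDERS_CUSTOMER name are hypothetical. The LEFT JOIN at the end is a standard SQL anti-join for finding orphaned child rows.

```python
import sqlite3

# Sketch only: SQLite stands in for Snowflake; names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is ON; OFF mimics a
# declared-but-unenforced relationship.
conn.execute("PRAGMA foreign_keys = OFF")
conn.execute("CREATE TABLE CUSTOMER (CUSTOMER_ID INTEGER PRIMARY KEY, NAME TEXT)")
conn.execute(
    """CREATE TABLE ORDERS (
           ORDER_ID    INTEGER PRIMARY KEY,
           CUSTOMER_ID INTEGER NOT NULL,
           CONSTRAINT FK_ORDERS_CUSTOMER FOREIGN KEY (CUSTOMER_ID)
               REFERENCES CUSTOMER (CUSTOMER_ID)
       )"""
)
conn.execute("INSERT INTO CUSTOMER VALUES (1, 'Ada')")
# Order 11 references customer 99, which does not exist.
conn.executemany("INSERT INTO ORDERS VALUES (?, ?)", [(10, 1), (11, 99)])

# Orders whose CUSTOMER_ID has no matching CUSTOMER row are orphans.
orphans = conn.execute(
    """SELECT o.ORDER_ID, o.CUSTOMER_ID
       FROM ORDERS o
       LEFT JOIN CUSTOMER c ON c.CUSTOMER_ID = o.CUSTOMER_ID
       WHERE c.CUSTOMER_ID IS NULL"""
).fetchall()
print(orphans)  # [(11, 99)]
```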

Mandatory columns as NOT NULL constraints

When defining attributes for an entity, the question of which ones are mandatory and which are optional inevitably arises. As with most modeling decisions, the answer depends on the business context more than any technical database property. The same attribute, for example, the email address for CUSTOMER, may be mandatory for an online store but optional for a brick-and-mortar retailer. In the latter case, not having an email address means missing sales announcements, while in the former, it may mean being unable to access the website.

When moving from a conceptual model to a physical Snowflake design, mandatory columns can be defined through the NOT NULL constraint. The NOT NULL constraint is declared inline, next to the corresponding column, and does not need to be given a name. Unlike the other three constraint types, it cannot be declared out of line.
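To illustrate how the business context, not the database, drives the NOT NULL decision, here is a sketch using Python's built-in sqlite3 module as a local stand-in for Snowflake. The make_customer_table and accepts_customer_without_email helpers are hypothetical. The same CUSTOMER table is created with or without an inline NOT NULL on EMAIL, mirroring the online store and brick-and-mortar cases.

```python
import sqlite3


def make_customer_table(conn, email_mandatory):
    """Create CUSTOMER; EMAIL gets an inline NOT NULL only if the business requires it."""
    email_col = "EMAIL TEXT NOT NULL" if email_mandatory else "EMAIL TEXT"
    conn.execute(
        f"CREATE TABLE CUSTOMER (CUSTOMER_ID INTEGER NOT NULL, NAME TEXT, {email_col})"
    )


def accepts_customer_without_email(email_mandatory):
    """Try to insert a customer with no email; report whether the row was accepted."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Snowflake here
    make_customer_table(conn, email_mandatory)
    try:
        conn.execute("INSERT INTO CUSTOMER (CUSTOMER_ID, NAME) VALUES (1, 'Ada')")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


print(accepts_customer_without_email(email_mandatory=True))   # False (online store)
print(accepts_customer_without_email(email_mandatory=False))  # True (brick-and-mortar)
```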

The format for adding a NOT NULL constraint to a column is as follows:

<col1_name...

Summary

This chapter discussed how to transition from logical modeling concepts to physical Snowflake objects. During this process, we learned how Snowflake handles tables of near-infinite size by breaking them down into manageable micro-partitions and how these partitions can be clustered to optimize query and DML performance.

After that, we learned how to define attributes by understanding Snowflake’s data types and their properties. Snowflake offers a variety of functions to make working with data types easier and more performant, to say nothing of the powerful options it offers for semi-structured data.

Before diving into individual constraint types, we examined what database constraints are and how Snowflake organizes and enforces them, depending on the type of table to which they are applied.

We saw why unique identifiers are vital for defining tables and how Snowflake manages this through the PK constraint. PKs help make life easier for database users by helping...