
You're reading from  Data Modeling with Snowflake

Product type: Book
Published in: May 2023
Publisher: Packt
ISBN-13: 9781837634453
Edition: 1st Edition
Author: Serge Gershkovich

Serge Gershkovich is a seasoned data architect with decades of experience designing and maintaining enterprise-scale data warehouse platforms and reporting solutions. He is a leading subject matter expert, speaker, content creator, and Snowflake Data Superhero. Serge earned a bachelor of science degree in information systems from the State University of New York (SUNY) Stony Brook. Throughout his career, Serge has worked in model-driven development from SAP BW/HANA to dashboard design to cost-effective cloud analytics with Snowflake. He currently serves as product success lead at SqlDBM, an online database modeling tool.

Speaking Modeling through Snowflake Objects

In its purest form, relational modeling (normalized tables with strictly enforced physical constraints) is most often found in online transaction processing (OLTP) databases. Transactional databases store the latest (as-is) version of business information, unlike data warehouses, which store historical snapshots and track changes in the information over time, allowing for additional (as-at) analysis across a temporal dimension.

However, this does not mean that relational modeling concepts do not apply in an online analytical processing (OLAP) database—quite the contrary. A data warehouse not only replicates existing entities and relations from transactional systems but also needs to manage the added task of conforming dimensions from other sources and joining them together in downstream transformations and analyses.

Another reason to master the common language of modeling is the Hybrid Unistore tables described in the previous...

Entities as tables

Before delving into database details, let’s recall the concept of an entity at the business level: a person, object, place, event, or concept relevant to the business for which an organization wants to maintain information. In other words, an entity is a business-relevant concept with common properties. A rule of thumb for identifying and naming entities is that they conform to singular English nouns, for example, customer, item, and reservation.

The obvious candidate for storing and maintaining information in Snowflake is a table. Through SQL, tables give users a standard and familiar way to access and manipulate entity details. As we saw in the last chapter, Snowflake tables come in several flavors, offering different backup and recovery options. Once a table type that provides adequate Time Travel and Fail-safe has been selected, Snowflake tables live up to the company’s claim of near-zero maintenance: there are no indexes, tablespaces, or partitions...
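To make the choice of table type concrete, here is a minimal sketch in Snowflake SQL. The table and column names (customer, customer_stage, customer_id, name) and the retention values are illustrative assumptions, not taken from the book.

```sql
-- Permanent table (the default type): covered by Time Travel and Fail-safe.
-- The retention value of 1 day is an assumption chosen for illustration.
CREATE TABLE customer (
    customer_id NUMBER(38,0),
    name        VARCHAR
)
DATA_RETENTION_TIME_IN_DAYS = 1;

-- Transient table: no Fail-safe period, and Time Travel is switched off here.
-- A hypothetical staging table whose contents can be reloaded from source.
CREATE TRANSIENT TABLE customer_stage (
    customer_id NUMBER(38,0),
    name        VARCHAR
)
DATA_RETENTION_TIME_IN_DAYS = 0;
```

Neither statement declares an index, tablespace, or partition; the only storage decision expressed is the table type and its retention period.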

Attributes as columns

Recall from the previous section that an entity is a business-relevant concept for which an organization wishes to maintain information. Attributes, whether defined with the business team during conceptual modeling or loaded from existing source data during the ETL process, are properties that describe the entity and are stored as columns. Attributes can be descriptive (such as NAME, ADDRESS, and QUANTITY) or metadata (such as ETL_SOURCE and LOAD_DATE).

The nature of the attribute—whether numeric, string, date, or other—is an essential detail for understanding the business requirement at the conceptual level and selecting the right data type at the physical level. Snowflake offers basic data types found in other databases (such as VARCHAR, DATE, and INTEGER) and less-common ones (such as VARIANT and GEOGRAPHY), which offer exciting possibilities for modeling and working with table contents.
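As a rough illustration of attributes mapped to columns and data types, the sketch below defines a hypothetical CUSTOMER table. NAME, ADDRESS, ETL_SOURCE, and LOAD_DATE echo the attribute examples above; every other column name, and the shape of the VARIANT content, is an assumption made for the example.

```sql
CREATE TABLE customer (
    -- descriptive attributes
    customer_id    INTEGER,
    name           VARCHAR,
    address        VARCHAR,
    signup_date    DATE,
    profile        VARIANT,        -- semi-structured details, e.g. a JSON document
    home_location  GEOGRAPHY,      -- geospatial point for the customer's address
    -- metadata attributes
    etl_source     VARCHAR,
    load_date      TIMESTAMP_NTZ
);

-- Reading one (assumed) key out of the VARIANT column and casting it to a string
SELECT
    customer_id,
    profile:preferred_channel::VARCHAR AS preferred_channel
FROM customer;
```

The second statement hints at why VARIANT is useful for modeling: nested source data can be stored as loaded and projected into typed columns at query time.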

Let us get to know Snowflake data...

Constraints and enforcement

The remainder of this chapter deals with table constraints, so it is necessary to understand what they are and to mention several important details regarding their use in Snowflake. In the ANSI-SQL standard, constraints define integrity and consistency rules for data stored in tables. Snowflake supports four constraint types:

  • PRIMARY KEY
  • UNIQUE
  • FOREIGN KEY
  • NOT NULL

Since the function of each of these constraints is covered later in this chapter, this section will be limited to their enforcement.

Enforcement, on the part of the database, means actively monitoring the integrity rules of a given constraint when DML operations are performed on a table. By enforcing a constraint, a database ensures that an error is raised when the constraint is violated, and the offending DML operation is not allowed to complete.

For example, a NOT NULL constraint on a column indicates that this column cannot contain NULL values. By enforcing...
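The difference between declaring and enforcing a constraint can be sketched as follows. This assumes a standard (non-hybrid) Snowflake table, where NOT NULL is enforced while PRIMARY KEY, UNIQUE, and FOREIGN KEY are recorded as metadata but not checked during DML. The table name customer_demo and its columns are hypothetical.

```sql
CREATE OR REPLACE TABLE customer_demo (
    customer_id NUMBER(38,0) NOT NULL,
    name        VARCHAR      NOT NULL,
    CONSTRAINT pk_customer_demo PRIMARY KEY (customer_id)
);

-- Rejected with an error: NOT NULL is enforced, so the row is not inserted.
INSERT INTO customer_demo (customer_id, name) VALUES (1, NULL);

-- Accepted.
INSERT INTO customer_demo (customer_id, name) VALUES (1, 'Alice');

-- Also accepted on a standard table: the PRIMARY KEY is declared but not
-- enforced, so the duplicate customer_id does not raise an error.
INSERT INTO customer_demo (customer_id, name) VALUES (1, 'Bob');
```

Because the duplicate is not rejected, uniqueness on such tables has to be guaranteed by the loading process rather than by the database.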

Identifiers as primary keys

Tables store information for business entities using attributes of relevant data types. A row in the CUSTOMER table holds information for a given customer, and a row in the ORDERS table represents an order—or does it? Perhaps in this example, orders can contain multiple products and span just as many rows. To determine a unique instance of an entity, an identifier—or primary key (PK), if referring to a physical database—is used.

A PK is a column or set of columns whose values uniquely determine an instance of an entity. Only one PK can be defined per table. From a business perspective, a PK represents a single entity instance. To return to the previous example—is an order a single row containing one product, or does our organization allow multiple products per order? PKs inform database users of what that reality looks like at the table level.
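The question of what one row in ORDERS represents can be answered directly in the table definition. The sketch below assumes the multi-product case, where an order spans one row per product; the column names are hypothetical and are not the book's sample data.

```sql
-- One row per product within an order: the PK is the combination of
-- order_id and product_id, not order_id alone.
CREATE TABLE orders (
    order_id    NUMBER(38,0) NOT NULL,
    product_id  NUMBER(38,0) NOT NULL,
    quantity    INTEGER,
    order_date  DATE,
    CONSTRAINT pk_orders PRIMARY KEY (order_id, product_id)
);
```

Had the business allowed only one product per order, the same constraint would list order_id alone, and a reader of the DDL would know that each row is a complete order.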

The following figure shows some sample data from a fictitious ORDERS table.

...

Alternate keys as unique constraints

Suppose we were modeling an EMPLOYEE table that contains an EMPLOYEE_ID column—a unique business identifier—and Social Security numbers—government-issued personal identifiers. Either column would satisfy the PK requirement of uniquely identifying a record in the EMPLOYEE table, but recall that a table may only be assigned one PK. To let database users know that another column (or columns) satisfies the conditions for a PK when a PK already exists, alternate keys (AKs) or UNIQUE constraints can also be defined.

In the previous example, the EMPLOYEE table had two valid PK candidates: EMPLOYEE_ID and SOCIAL_SECURITY_ID. In OLTP databases, the column or columns that act as the organizational business key should be made the primary. In a data warehouse, where business keys from multiple source systems may be loaded, a surrogate key would be used instead. By this convention, the EMPLOYEE table should be modeled with EMPLOYEE_ID...
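A minimal sketch of the EMPLOYEE example follows, assuming that EMPLOYEE_ID is chosen as the PK and SOCIAL_SECURITY_ID is declared as the alternate key. The data types, the NAME column, and the constraint names are assumptions.

```sql
CREATE TABLE employee (
    employee_id         NUMBER(38,0) NOT NULL,
    social_security_id  VARCHAR(11)  NOT NULL,
    name                VARCHAR,
    -- the business identifier chosen as the primary key
    CONSTRAINT pk_employee PRIMARY KEY (employee_id),
    -- the alternate key: also unique per employee, but not the PK
    CONSTRAINT ak_employee_ssn UNIQUE (social_security_id)
);
```

Naming the constraints (here with pk_ and ak_ prefixes, a convention assumed for this example) makes the intent visible to anyone inspecting the table's metadata.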

Relationships as foreign keys

Any business can be broken down into a set of entities and their interactions. For example, customers place orders for items provided by suppliers while applying promotion codes from an active marketing campaign—which is still a very narrow slice of what goes on in a typical organization. So far, this chapter has focused on the entities themselves: orders, items, and suppliers, for example. Now it is time to focus on the interactions—or relationships, as they are called in modeling parlance—such as placing orders, providing items, and applying promotions.

When business entities are related, their corresponding tables must have a way to capture the details of the interaction. When a customer orders an item, the order details must capture who the customer is and what items they ordered. Remember, PKs identify a unique record in a table. Therefore, when two tables share a relationship, the PKs of one must be included in the other to...
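The relationship between customers and orders might be expressed as in the sketch below, where the PK of CUSTOMER is carried into ORDERS and declared as a foreign key. All column and constraint names are hypothetical, and a single-row-per-order design is assumed for brevity.

```sql
CREATE TABLE customer (
    customer_id NUMBER(38,0) NOT NULL,
    name        VARCHAR,
    CONSTRAINT pk_customer PRIMARY KEY (customer_id)
);

CREATE TABLE orders (
    order_id    NUMBER(38,0) NOT NULL,
    customer_id NUMBER(38,0) NOT NULL,   -- who placed the order
    order_date  DATE,
    CONSTRAINT pk_orders PRIMARY KEY (order_id),
    -- the customer's PK, included here to capture the relationship
    CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id)
        REFERENCES customer (customer_id)
);
```

The foreign key documents how the two tables join, which is useful to users and modeling tools even where the database does not enforce it.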

Mandatory columns as NOT NULL constraints

When defining attributes for an entity, the question of which ones are mandatory and which are optional inevitably arises. As with most modeling decisions, the answer depends on the business context more than any technical database property. The same attribute, for example, the email address for CUSTOMER, may be mandatory for an online store but optional for a brick-and-mortar retailer. In the latter case, not having an email address means missing sales announcements, while in the former, it may mean being unable to access the website.

When moving from a conceptual model to a physical Snowflake design, mandatory columns can be defined through the NOT NULL constraint. The NOT NULL constraint is declared inline next to the corresponding column and does not need to be given a name. Consequently, NOT NULL constraints cannot be declared out of line.

The format for adding a NOT NULL constraint to a column is as follows:

<col1_name...
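As a concrete instance of an inline NOT NULL declaration, the sketch below models the online-store case, where the customer's email address is mandatory. The table layout and the PHONE column are assumptions added for contrast.

```sql
CREATE TABLE customer (
    customer_id NUMBER(38,0) NOT NULL,
    email       VARCHAR      NOT NULL,   -- mandatory: needed to access the website
    phone       VARCHAR                  -- optional: NULL values are allowed
);

-- If the business rule changes (for example, a brick-and-mortar retailer
-- for whom email is optional), the constraint can be removed from the column.
ALTER TABLE customer ALTER COLUMN email DROP NOT NULL;
```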

Summary

This chapter discussed how to transition from logical modeling concepts to physical Snowflake objects. During this process, we learned how Snowflake handles tables of near-infinite size by breaking them down into manageable micro-partitions and how these partitions can be clustered to optimize query and DML performance.

After that, we learned how to define attributes by understanding Snowflake’s data types and their properties. Snowflake offers a variety of functions to make working with data types easier and more performant, to say nothing of the powerful options it offers for semi-structured data.

Before diving into individual constraint types, we examined what database constraints are and how Snowflake organizes and enforces them depending on the type of table where they are applied.

We saw why unique identifiers are vital for defining tables and how Snowflake manages this through the PK constraint. PKs help make life easier for database users by helping...
