What do you get with Print?

Instant access to your digital copy whilst your Print order is Shipped

Paperback book shipped to your preferred address

Redeem a companion digital copy on all Print orders

Access this title in our online reader with advanced features

DRM FREE - Read whenever, wherever and however you want

AI Assistant (beta) to help accelerate your learning

Data Modeling with Snowflake

Unlocking the Power of Modeling

The word modeling has come to mean very different things in the half a century that it has been practiced in database systems. This opening chapter prefaces the book’s overall aim to demystify modeling, along with its applications, methodologies, and benefits. Throughout this journey, the concept of modeling will unfold into a set of methods and terms that help organizations design and manage data and, more importantly, help them understand themselves.

In its broadest interpretation, modeling is a selective simplification that aids in navigating or designing something more complex. Any system can be broken down into smaller, more manageable pieces. Manipulating any piece individually may be straightforward, but doing so without regard to an overall strategy is a tenuous proposal that is sure to encumber scalability and maintenance down the line.

While modeling is generally considered database-agnostic, modern cloud data platforms, such as Snowflake, present their users with many unique features thanks to their innovative architecture and consumption-based pricing. A clear and forward-looking design that takes advantage of the native features of the platform that supports it is the key to building cost-effective solutions capable of meeting and anticipating business needs.

As the analytical requirements of a data-driven organization are notoriously complex and constantly evolving, modeling must keep pace and accompany data teams from idea to execution. To achieve this, modeling must go beyond the structure and relationships of database tables and embrace the transformational logic that moves and shapes the underlying data. Only by leaning into the specifics of Snowflake features and architecture can a model be built efficiently from beginning to end.

In this chapter, we’re going to cover the following main topics:

Recognizing the utility of models in our daily lives
Getting a glimpse of modeling conventions in action
Getting acquainted with the tools in the modeling toolkit
Uncovering the benefits of modeling for enterprise teams
Incorporating modeling into strategic planning
Understanding modeling applications for transactional and analytical systems

Leveraging the modeling toolkit

Before we continue, we need to formally delineate three distinct concepts often used together in the service of modeling to make it simpler to refer to a specific tool in the modeling toolkit in later sections. By understanding where each piece fits in the broader domain of database design and management, diving into deeper technical concepts later in the book will become more meaningful and easier to digest.

The three components are listed here:

Natural language semantics—words
Technical semantics—SQL
Visual semantics—diagrams

Let’s discuss each of these in detail, as follows:

Natural language semantics: Terminology employed in communicating details of a model between people. These are agreed-upon words that employ pre-defined conventions to encapsulate more complex concepts in simpler terms. For example, when both parties involved in a verbal exchange understand the concept of a surrogate key, it saves them from having to explain that it is a unique identifier for a table record that holds no intrinsic business meaning, such as an integer sequence or a hash value.

To ensure effective technical conversations, it helps to be fluent in the semantics of modeling. Not only does it save time by succinctly communicating a complex concept, but it also saves even more time by not miscommunicating it. A waiter would return different foods when ordering chips in London rather than in Los Angeles. A properly modeled database would never return different records for the same surrogate key.

Technical semantics: SQL is a domain-specific language used to manage data in a Relational Database Management System (RDBMS). Unlike a general-purpose language (for example, YAML or Python), domain-specific languages have a much smaller application but offer much richer nuance and precision. While it can’t format a website or send an email, SQL allows us to create the structure of our database and manipulate its contents.

SQL bridges modeling concepts (expressed in words or images) and what is physically defined in the database. Snowflake uses an American National Standards Institute (ANSI)-compliant SQL syntax, meaning its basic commands (such as SELECT, UPDATE, DELETE, INSERT, and WHERE) are compatible with other database vendors who use this standard. Snowflake also offers many extra functions, clauses, and conventions that go beyond ANSI-standard SQL and give users added flexibility to manage the database.

Unfortunately, due to its domain-specific nature, SQL presents a significant limitation: it can only express what the database explicitly understands. While SQL can define table structure and precisely manipulate data, it is too detailed to easily articulate the underlying business requirements.

Visual semantics: Through their simplicity, images can convey a density of information that other forms of language simply cannot. In modeling, diagrams combine the domain-specific precision of SQL with the nuance of natural language. This gives diagrams a lot to work with to capture a data model’s business meaning and technical specifics.

To start, diagrams vary in the level of detail they present—giving the observer exactly what they’re looking for without overwhelming (or underwhelming) them with information. Most importantly, the semantic conventions used in diagrams are universal and can be understood by people besides data analysts and engineers. Yes—modeling diagrams are considered technical drawings; they represent strict technical concepts through agreed-upon visual conventions. However, in their simplest form, models can be understood almost intuitively with no prior knowledge. Even at the more advanced levels, such as logical and physical, learning to read a model is much simpler than learning SQL.

When all these semantics come together and are understood by the entire organization, they form a ubiquitous language, a concept first described by Eric Evans in Domain-Driven Design. Modeling then forms a part of the vocabulary that is understood universally throughout the organization to describe its business and store the data assets that support it. But that is just one of the many benefits that modeling provides.

The benefits of database modeling

Tactics without strategy is the noise before defeat. (Sun Tzu)

For many people, database modeling brings to mind stale diagrams, arcane symbols, or extra work at the end of a project. Only a decade ago, fueled by the rise of distributed computing in the early 2000s—which popularized the concept of big data—the notion that modeling is dead gained notoriety. More precisely, it was thought that cheap and near-limitless computing power had made planning and designing a thing of the past. It was said that flexible semi-structured data formats and the ability to parse them on the fly—known as schema-on-read—had made modeling obsolete.

Eventually, operating and maintenance costs caught up with reality and revealed two great shortcomings of the schema-on-read approach. One is that no matter how data is structured, it must be functionally bound to the business that it helps support. In other words, semi-structured formats are neither a panacea nor an excuse to forgo the process of business validation. The second—and most important—is that a model is not simply the shape that data takes once uploaded to a database, but rather, the blueprint for business operations, without which it is impossible to build sustainable architectures.

Sustainable solutions require a long-term strategy to ensure their design matches the underlying business model. Without this, schema-on-read (discussed in Chapter 15, Modeling Semi-Structured Data), star schema (discussed in Chapter 17, Scaling Data Models through Modern Techniques), or any other schema are narrow-sighted tactics that lead nowhere. But done right, modeling makes developing database architectures more agile and helps the project evolve from the idea phase to implementation. At every stage of development, the model serves as a guide for supporting the conversations necessary to propel the design into the next phase and provide additional business context. Once implemented, the model becomes a living document that helps users understand, navigate, and evolve the system it helped create.

While every organization models in the technical sense—creating tables and transforming data—not everyone models strategically, end to end, in the broad sense of the word—thereby foregoing the long-term benefits. Some of these benefits include the following:

Consensus and visibility of the broader business model
More productive conversations with business teams
Better quality of requirements
Higher signal, lower noise in technical conversations
Cross-platform, cross-domain, and widely understood conventions
Big-picture visual overview of the business and its database footprint
Preliminary designs become implementation blueprints
Accelerating onboarding of new team members
Making data more accessible and unlocking self-service within organizations
Keeping the database landscape manageable at scale
Getting a handle on complex data pipelines

To demonstrate the difficulties of working without formal modeling, we can take a simple schema based on Snowflake’s shared TPC-H dataset (available in the shared database called SNOWFLAKE_SAMPLE_DATA), which, at first glance, looks like this:

Figure 1.2 – A list of tables in the Snowsight UI

While these tables have been modeled in the strict sense of the word and even contain data, we get very little information on what that data represents, how it relates to data in other tables, or where it fits in the broad context of business operations.

Intuition suggests that SALES_ORDER and CUSTOMER share a relationship, but this assertion needs to be tested. Even in this trivial example of only eight tables, it will take considerable time to thoroughly sift through the data to understand its context.

The irony is that many of the details we’re looking for are already baked into the design of the physical tables, having been modeled at some point in the past. We just can’t see them. Without a map, the terrain is lost from view.

Here is the same set of tables visualized through a modeling convention called an Entity-Relationship Diagram (ERD):

Figure 1.3 – A conceptual model using crow’s foot notation

At a glance, the big picture comes into focus. Diagrams such as this one allow us to understand the business concepts behind the data and ensure they are aligned. Having a visual model also lets us zoom out from individual tables and understand the semantics of our business: What are the individual pieces involved and how do they interact? This global perspective gives everyone in the organization a means of finding and making sense of data assets without requiring a technical background—thus, business analysts or new hires can unlock the value of the information without any help from the data team.

As the organization grows, expanding in personnel and data assets, it will inevitably become too big for any person, or even a team of people, to coordinate. Here, organizations that have embraced data modeling will stand out from those that did not. Modeling can be the thing that helps organizations scale their data landscape, or it can be the technical debt that holds them back.

Yet, for all its benefits, modeling is not a cookie-cutter solution that guarantees success. There are many approaches to modeling and various modeling methodologies that are suited for different workloads. Throughout this book, we will tackle the fundamentals of modeling that will allow you to understand these differences and apply the best solution using a first-principles approach. First, we will begin by breaking down the two main database use cases and observing the role modeling plays in each of them.

Operational and analytical modeling scenarios

The relational database as we know it today emerged in the 1970s—allowing organizations to store their data in a centralized repository instead of on individual tapes. Later that decade, Online Transaction Processing (OLTP) emerged, enabling faster access to data and unlocking new uses for databases such as booking and bank teller systems. This was a paradigm shift for databases, which evolved from data archives to operational systems.

Due to limited resources, data analysis could not be performed on the same database that ran operational processes. The need to analyze operational data gave rise, in the 1980s, to Management Information Systems (MIS), or Decision Support Systems (DSS) as they later became known. Data would be extracted from the operational database to the DSS, where it could be analyzed according to business needs. OLTP architecture is not best suited for the latter case, so Online Analytical Processing (OLAP) emerged to enable users to analyze multidimensional data from multiple perspectives using complex queries. This is the same paradigm used today by modern data platforms such as Snowflake.

The approach to storing and managing data in OLAP systems fundamentally differs from the operational or transactional database. Data in OLAP systems is generally stored in data warehouses (also known as DWs or DWHs)—centralized repositories that store structured data from various sources for the purpose of analysis and decision-making. While the transactional system keeps the up-to-date version of the truth and is generally concerned with individual records, the data warehouse snapshots many historical versions and aggregates volumes of data to satisfy various analytical needs.

Data originates in the transactional database when daily business operations (for example, bookings, sales, withdrawals) are recorded. In contrast, the warehouse does not create but rather loads extracted information from one or various source systems. The functional differences between transactional databases and warehouses present different modeling challenges.

A transactional system must be modeled to fit the nature of the data it is expected to process. This means knowing the format, relationships, and attributes required for a transaction.

The main concern of a transactional database model is the structure and relationships between its tables.

By contrast, the data warehouse loads existing data from the source system. A data warehouse isn’t concerned with defining a single transaction but with analyzing multitudes of transactions across various dimensions to answer business questions. To do this, a data warehouse must transform the source data to satisfy multiple business analyses, which often means creating copies with varying granularity and detail.

Modeling in a data warehouse builds upon the relational models of its source systems by conforming common elements and transforming the data using logic.

Wait—if transformational logic is a core concept in data warehouse modeling, why is it so consistently absent from modeling discussions? Because in order to do transformational modeling justice, one must forgo the universality of general modeling principles and venture into the realm of platform specifics (that is, syntax, storage, and memory utilization). This book, in contrast, will embrace Snowflake specifics and go beyond physical modeling by diving into the transformation logic behind the physical tables. This approach provides a fuller understanding of the underlying modeling concepts and equips the reader with the required SQL recipes to not only build models but to load and automate them in the most efficient way possible. As we’ll see in later chapters, this is where Snowflake truly shines and confers performance and cost-saving benefits.

Is Snowflake limited to OLAP?

Snowflake’s primary use case is that of a data warehouse—with all the OLAP properties to enable multidimensional analysis at scale over massive datasets. However, at the 2022 Snowflake Summit, the company announced a new table type called Hybrid Unistore, which features both an OLTP-storage table and an OLAP analysis table under one semantic object. This announcement means Snowflake users can now design transactional OLTP database schemas while leveraging the analytical performance that Snowflake is known for. Hybrid Unistore tables are discussed in more detail in later chapters.

Although OLAP and OLTP systems are optimized for different kinds of database operations, they are still databases at heart and operate on the same set of objects (such as tables, constraints, and views) using SQL. However, each use case requires very different approaches to modeling the data within. The following section demonstrates what modeling will typically look like in each scenario.

A look at relational and transformational modeling

The previous section describes how modeling varies between operational and data warehouse scenarios. Before exploring the modeling process in detail, it’s helpful to understand the look and feel of relational and transformational modeling and what we’re working toward. Before proceeding, it would help to summarize the main differences between transactional databases and data warehouses. You can see what these are in the following table:

Transactional Database	Data Warehouse
Supports daily operations	Provides operational insight
Operates on single records	Summarizes many records
Accurate as of the present instant	Historical snapshots over time
Single source of truth (SSOT), non-redundant	Redundant to support different analyses
Data models defined by business operations	Data models generated by business questions
Static and structured data model	Inherited structure and dynamically transformed data model
Single-application data	Multiple sources of converging data

Figure 1.4 – Common differences between transactional databases and warehouses

Given these differences, the following sections demonstrate what modeling looks like in each system and what it aims to achieve.

What modeling looks like in operational systems

Completely ignoring the modeling workflow that got us here, which will be covered in later chapters, we can observe an example of the type of modeling most commonly seen in transactional systems. The physical diagram in Figure 1.5 serves both as a blueprint for declaring the required tables and a guide to understanding their business context.

Following modeling conventions (don’t worry if they are still unfamiliar—they will be covered thoroughly in the coming chapters), we can infer a lot of information from this simple diagram. For example, a person is uniquely identified by an eight-digit identifier (the primary key) and must have a Social Security number (SSN), driver’s license, name, and birth date.

The one-to-many relationship between the two tables establishes that while a person does not necessarily need to have an account created, an account must belong to just one person:

Figure 1.5 – A physical model using crow’s foot notation

These details, combined with the list of attributes, data types, and constraints, not only dictate what kinds of data can be written to these tables but also provide an idea of how the business operates. So, how does this differ in analytical databases?

What modeling looks like in analytical systems

In a data warehouse scenario, the PERSON and ACCOUNT tables would not be defined from scratch—they would be extracted from the source in which they exist and loaded—bringing both structure and data into the process. Then, the analytical transformations begin in answer to the organization’s business questions. This is a process known as Extract Transform Load (ETL). (Although ELT has become the preferred processing order, the original term stuck.)

Suppose the management team wanted to analyze which age groups (by decade) were opening which account types and they wanted to store the result in a separate table for an independent analysis.

The following diagram shows the resulting relational model of an object obtained through transformational analysis but provides no business context:

Figure 1.6 – A relational model of a transformational requirement

Although physical modeling could describe such a table (as seen in Figure 1.6)—containing the account type with age and count of accounts as integers—such a model would fail to communicate the most relevant details, presented here:

The logic used to perform the analysis
The relationship between the source tables and the output

The business requirement for ACCOUNT_TYPE_AGE_ANALYSIS in this example purposely excludes the source key fields from the target table, preventing the possibility of establishing any relational links. However, the relational model still serves a vital role: it tells us how the sources are related and how to join them correctly to produce the required analysis.

The logic could then be constructed by joining PERSON and ACCOUNT, as shown here:

CREATE TABLE account_types_age_analysis AS
SELECT
    a.account_type,
    ROUND(DATEDIFF(years, p.birth_date, CURRENT_DATE()), -1
    ) AS age_decade,
    COUNT(a.account_id) AS total_accounts
    FROM account AS  a
     INNER JOIN person AS p
     ON a.person_id = p.person_id
GROUP BY     1, 2;

Although there is no relational connection between ACCOUNT_TYPE_AGE_ANALYSIS and its sources, there is still a clear dependency on them and their columns. Instead of using ERDs, which convey entities and relationships, transformational pipelines are visualized through a lineage diagram. This type of diagram gives a column-level mapping from source to target, including all intermediate steps, as shown here:

Figure 1.7 – Transformational modeling seen visually

Paired with the SQL logic used to construct it, the lineage graph gives a complete picture of the transformational relationship between sources and targets in an analytical/warehousing scenario.

Having witnessed both relational and analytical approaches to modeling, it is clear that both play a vital role in navigating the complex dynamic environments that one is liable to encounter in an enterprise-scale Snowflake environment.

Although we have only skimmed the surface of what modeling entails and the unique features of the Snowflake platform that can be leveraged to this end, this chapter has hopefully given you an idea of the vital role that modeling plays in building, maintaining, and documenting database systems. Before diving into the specifics of verbal, technical, and visual modeling semantics of modeling in the chapters to come, let’s review what we learned.

Key benefits

Learn core modeling techniques tied to practical examples using native Snowflake architecture

Adopt a universal modeling language to communicate business value to functional teams

Go beyond physical modeling with SQL recipes to transform and shape your Snowflake data

Description

The Snowflake Data Cloud is one of the fastest-growing platforms for data warehousing and application workloads. Snowflake's scalable, cloud-native architecture and expansive set of features and objects enables you to deliver data solutions quicker than ever before. Yet, we must ensure that these solutions are developed using recommended design patterns and accompanied by documentation that’s easily accessible to everyone in the organization. This book will help you get familiar with simple and practical data modeling frameworks that accelerate agile design and evolve with the project from concept to code. These universal principles have helped guide database design for decades, and this book pairs them with unique Snowflake-native objects and examples like never before – giving you a two-for-one crash course in theory as well as direct application. By the end of this Snowflake book, you’ll have learned how to leverage Snowflake’s innovative features, such as time travel, zero-copy cloning, and change-data-capture, to create cost-effective, efficient designs through time-tested modeling principles that are easily digestible when coupled with real-world examples.

Who is this book for?

This book is for developers working with SQL who are looking to build a strong foundation in modeling best practices and gain an understanding of where they can be effectively applied to save time and effort. Whether you’re an ace in SQL logic or starting out in database design, this book will equip you with the practical foundations of data modeling to guide you on your data journey with Snowflake. Developers who’ve recently discovered Snowflake will be able to uncover its core features and learn to incorporate them into universal modeling frameworks.

What you will learn

Discover the time-saving benefits and applications of data modeling

Learn about Snowflake's cloud-native architecture and its features

Understand and apply modeling techniques using Snowflake objects

Universal modeling concepts and language through Snowflake objects

Get comfortable reading and transforming semistructured data

Learn directly with pre-built recipes and examples

Learn to apply modeling frameworks from Star to Data Vault

What do you get with Print?

Instant access to your digital copy whilst your Print order is Shipped

Paperback book shipped to your preferred address

Redeem a companion digital copy on all Print orders

Access this title in our online reader with advanced features

DRM FREE - Read whenever, wherever and however you want

AI Assistant (beta) to help accelerate your learning

Frequently bought together

$51.99

$64.99

$49.99

Total $ 166.97

Filter reviews by

All

Packt verified reviews

Feefo verified reviews

Amazon verified reviews

Ed Jul 18, 2023

Took me three days to read it. Super informative and helpful.

Amazon Verified review

Amazon Customer Jul 17, 2023

kashif Aug 21, 2023

The book is very comprehensive and takes you through step by step, It depends on where you like to start if you fall in beginner category then go from first chapter/ intermediate then start from chapter 7, Even through I feel it first edition is for beginner/intermediate level people but I would still say anyone can pick it and give a read. i haven't seen any book coming close to this book anywhereIt definitely emphasis the best recipes, these recipes can be still applied in other modern DWH.I would rate it 4.5 and encourage everyone to go for it, Even through it doesn't build complex modules and break the ice , Its still worth reading, I hope second edition will give more focus on breaking the ice with complex examples and give practical examples for data mesh. Since the whole book is practical was expecting something in data vault / data mesh also in deep.Eagerly waiting for second edition

Anna Jul 19, 2023

Data modeling has been partially forgotten in the cloud data warehouse scenario and now making a legendary comeback. At the same time, data professionals have been raving about Snowflake Data Cloud.This book offers a clear and practical approach to understanding the combo: data modeling within the Snowflake realm. From fundamental principles to real examples, it's a must-have resource for both beginners and experienced professionals.The author is a good writer. He does a fantastic job explaining hard technical concepts in an easy-to-read & understandable format. We need more educational materials like this.Highly recommended!

Marie Aug 10, 2023

Not being from a technical background, this book explains data modeling, the importance of it and the process and an understandable way. I was able to really learn and take in a lot of information, and I did not feel overwhelmed as I had expected to. Great read!

Data Modeling with Snowflake: A practical guide to accelerating Snowflake development using universal data modeling techniques

What do you get with Print?

Data Modeling with Snowflake

Unlocking the Power of Modeling

Technical requirements

Modeling with purpose

Leveraging the modeling toolkit

The benefits of database modeling

Operational and analytical modeling scenarios

A look at relational and transformational modeling

What modeling looks like in operational systems

What modeling looks like in analytical systems

Summary

Further reading

References

Page 1 of 10

Key benefits

Description

Who is this book for?

What you will learn

Product Details

What do you get with Print?

Product Details

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

Filter reviews by

People who bought this also bought

About the author

FAQs

Data Modeling with Snowflake: A practical guide to accelerating Snowflake development using universal data modeling techniques

What do you get with Print?

Contact Details

Shipping Address

Billing Address

Key benefits

Description

Who is this book for?

What you will learn

Product Details

What do you get with Print?

Contact Details

Shipping Address

Billing Address

Product Details

Packt Subscriptions

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

Filter reviews by

People who bought this also bought

About the author

FAQs

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access