Reader small image

You're reading from  Data Modeling with Snowflake

Product typeBook
Published inMay 2023
PublisherPackt
ISBN-139781837634453
Edition1st Edition
Right arrow
Author (1)
Serge Gershkovich
Serge Gershkovich
author image
Serge Gershkovich

Serge Gershkovich is a seasoned data architect with decades of experience designing and maintaining enterprise-scale data warehouse platforms and reporting solutions. He is a leading subject matter expert, speaker, content creator, and Snowflake Data Superhero. Serge earned a bachelor of science degree in information systems from the State University of New York (SUNY) Stony Brook. Throughout his career, Serge has worked in model-driven development from SAP BW/HANA to dashboard design to cost-effective cloud analytics with Snowflake. He currently serves as product success lead at SqlDBM, an online database modeling tool.
Read more about Serge Gershkovich

Right arrow

Putting Transformational Modeling into Practice

In the preceding chapters, we went from gathering requirements with business teams to creating and deploying a physical data model to Snowflake, which aligns with our organization’s operations. Now it is time to leverage Snowflake’s powerful query processing engine and its full-featured library of functions and data manipulation features to creatively transform data to answer business questions.

While physical modeling creates objects by defining the structure, transformational modeling uses logic—selecting existing data and creating a new object from the query result. However, query processing in Snowflake comes at the cost of compute credits. This chapter will cover the best practices for writing efficient queries in Snowflake to help control costs and increase performance.

As we build transformational models, we will also learn how to monitor their performance and detect issues using the Snowflake query profile...

Technical requirements

The examples and exercises in this chapter require data to execute. The script used to load the model created in the previous chapter with sample data is create_physical_model_w_data.sql, and is included in the following GitHub repository: https://github.com/PacktPublishing/Data-Modeling-with-Snowflake/tree/main/ch12. The complete transformational model for the exercise at the end of the chapter is also included (create_loyalty_customer.sql), although readers are encouraged to recreate it from scratch.

Separating the model from the object

The ability to instantly scale up warehouses gives Snowflake users easy control over query performance and duration. However, increased warehouse size comes at the price of compute credits. Even keeping the warehouse size constant, changes in data volume and query patterns can cause performant and cost-effective data sources to degrade. To mitigate performance degradation, a view may need to be materialized as a table, or a table may need to become a materialized view.

However, even when converting from a view to a table, the transformational logic stays constant. While traditional modeling advice advocates differentiating views and other objects through suffixes (e.g., CUSTOMER_V), Snowflake users are encouraged to avoid such conventions. Orienting object names to their contents (e.g., CUSTOMER, DIM_DATE) rather than their object type allows modelers to easily pivot between them without breaking downstream dependencies.

Shaping transformations through relationships

The exercises in previous chapters devoted much attention to relational constraints and how they help ensure data integrity, consistency, and quality while making data management easier. Exactly how much easier will be the recurring theme of this chapter.

Even simple business questions, such as What is the customer and location name for the customer with identifier 775699?, require us to refer to the relational model to find the tables in which these attributes are stored and the foreign key columns that can be used to join them.

Figure 12.1 – Foreign key constraint between location and customer tables

Figure 12.1 – Foreign key constraint between location and customer tables

Using the information from the physical model, we can construct a query to obtain customer and location names for customer 775699 as follows:

SELECT c.name AS customer_name , l.name AS location_name FROM customer c
JOIN location l ON c.location_id = l.location_id
WHERE customer_id = 775699;

And...

Join elimination using constraints

In the previous chapter, we set the RELY property on our constraints to pave the way for the performance gains we will now explore. Snowflake uses the RELY property to perform join elimination—avoiding unnecessary or redundant joins in queries. Depending on table size, joins can be memory-intensive, so avoiding them when possible can significantly improve performance and save compute credits.

Even if a join is specified in the query, but no columns from the joined table are selected as part of the result, the RELY property will tell the Snowflake query engine to avoid performing the join.

If we modify the previous query—joining CUSTOMER and LOCATION tables—but only request information from CUSTOMER, the RELY property will help us avoid the unnecessary join operation.

Figure 12.3 – A query with join elimination

Figure 12.3 – A query with join elimination

A look at the query profile (which we will explore later in this chapter) confirms...

Joins and set operators

In a database, there are two ways to bring sources of data together: joins and set operators. A join combines rows from two or more tables based on related columns, and a set operator—such as UNION—combines the results of multiple SELECT statements. But, you can also use join on multiple SELECT statements.

An easier way to think about it is that joins combine data horizontally—across related rows—and set operators work vertically. The following example shows how identical records would look in a JOIN or UNION result.

Figure 12.8 – An example of a JOIN and a UNION operation

Figure 12.8 – An example of a JOIN and a UNION operation

JOIN and UNION are just some of the tools at the developer’s disposal. Let’s review all the join and set operators that Snowflake provides and briefly cover some of their use cases.

The following diagram shows the join and set operators available in Snowflake.

Figure 12.9 – Snowflake joins and set operators

Figure 12.9 –...

Performance considerations and monitoring

Because the Snowflake services layer takes care of query tuning and optimization, writing efficient queries and following best practices should be the developer’s only concern. To do this, Snowflake provides a visual query profile to break down query execution into constituent substeps and help identify potential issues. A sample query profile can be seen in the following screenshot:

Figure 12.10 – Snowflake query profile

Figure 12.10 – Snowflake query profile

A good way to familiarize yourself with the query profile screen is by reviewing the kinds of query issues that it is designed to identify. The numbers in the query profile correspond to the topics listed in the following section.

Common query problems

Most day-to-day query issues can be identified by paying attention to the information displayed in the query profile and following the query best practices described here:

  • Queuing – When Snowflake determines that...

Putting transformational modeling into practice

In the last chapter, we deployed our sample physical model. Here, we will populate it with data and create a transformational model to satisfy a business requirement using the techniques and best practices covered in the preceding sections.

Transformational modeling requires data. The script to populate our operational schema with sample data can be found in the Git repository for this chapter. Please run the script titled create_physical_model_w_data.sql to recreate the physical model and load it with data from the SNOWFLAKE_SAMPLE_DATA database if you have not already done so.

After running the script, all the transactional tables in the schema will have been populated with data. However, the LOYALTY_CUSTOMER table is not transactional; it needs to be created through a transformational model. Just as the data model in the previous chapters took shape after getting to know the workings of our company by communicating with the business...

Summary

While a physical model reflects the company’s business model, transformational models typically focus on addressing analytical business questions. Transformational models depend on existing data to perform their functions and, since data grows and changes over time, the object type used to represent the underlying logic may also change. By keeping object names constant, users can make adjustments and pivot between different database objects without breaking downstream processes.

The relationships established in the physical models can inform transformational designs as well as improving performance through join elimination by using the RELY property. Users can track performance to spot potential query issues using the query profile in the Snowflake UI. The query profile is a valuable tool for identifying performance issues such as exploding joins and inefficient pruning.

While transformational modeling is performed on top of the physical model, the two should always...

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Data Modeling with Snowflake
Published in: May 2023Publisher: PacktISBN-13: 9781837634453
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €14.99/month. Cancel anytime

Author (1)

author image
Serge Gershkovich

Serge Gershkovich is a seasoned data architect with decades of experience designing and maintaining enterprise-scale data warehouse platforms and reporting solutions. He is a leading subject matter expert, speaker, content creator, and Snowflake Data Superhero. Serge earned a bachelor of science degree in information systems from the State University of New York (SUNY) Stony Brook. Throughout his career, Serge has worked in model-driven development from SAP BW/HANA to dashboard design to cost-effective cloud analytics with Snowflake. He currently serves as product success lead at SqlDBM, an online database modeling tool.
Read more about Serge Gershkovich