Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Data Modeling with Snowflake

You're reading from  Data Modeling with Snowflake

Product type Book
Published in May 2023
Publisher Packt
ISBN-13 9781837634453
Pages 324 pages
Edition 1st Edition
Languages
Author (1):
Serge Gershkovich Serge Gershkovich
Profile icon Serge Gershkovich

Table of Contents (24) Chapters

Preface 1. Part 1: Core Concepts in Data Modeling and Snowflake Architecture
2. Chapter 1: Unlocking the Power of Modeling 3. Chapter 2: An Introduction to the Four Modeling Types 4. Chapter 3: Mastering Snowflake’s Architecture 5. Chapter 4: Mastering Snowflake Objects 6. Chapter 5: Speaking Modeling through Snowflake Objects 7. Chapter 6: Seeing Snowflake’s Architecture through Modeling Notation 8. Part 2: Applied Modeling from Idea to Deployment
9. Chapter 7: Putting Conceptual Modeling into Practice 10. Chapter 8: Putting Logical Modeling into Practice 11. Chapter 9: Database Normalization 12. Chapter 10: Database Naming and Structure 13. Chapter 11: Putting Physical Modeling into Practice 14. Part 3: Solving Real-World Problems with Transformational Modeling
15. Chapter 12: Putting Transformational Modeling into Practice 16. Chapter 13: Modeling Slowly Changing Dimensions 17. Chapter 14: Modeling Facts for Rapid Analysis 18. Chapter 15: Modeling Semi-Structured Data 19. Chapter 16: Modeling Hierarchies 20. Chapter 17: Scaling Data Models through Modern Techniques 21. Index 22. Other Books You May Enjoy Appendix

Putting Transformational Modeling into Practice

In the preceding chapters, we went from gathering requirements with business teams to creating and deploying a physical data model to Snowflake, which aligns with our organization’s operations. Now it is time to leverage Snowflake’s powerful query processing engine and its full-featured library of functions and data manipulation features to creatively transform data to answer business questions.

While physical modeling creates objects by defining the structure, transformational modeling uses logic—selecting existing data and creating a new object from the query result. However, query processing in Snowflake comes at the cost of compute credits. This chapter will cover the best practices for writing efficient queries in Snowflake to help control costs and increase performance.

As we build transformational models, we will also learn how to monitor their performance and detect issues using the Snowflake query profile...

Technical requirements

The examples and exercises in this chapter require data to execute. The script used to load the model created in the previous chapter with sample data is create_physical_model_w_data.sql, and is included in the following GitHub repository: https://github.com/PacktPublishing/Data-Modeling-with-Snowflake/tree/main/ch12. The complete transformational model for the exercise at the end of the chapter is also included (create_loyalty_customer.sql), although readers are encouraged to recreate it from scratch.

Separating the model from the object

The ability to instantly scale up warehouses gives Snowflake users easy control over query performance and duration. However, increased warehouse size comes at the price of compute credits. Even keeping the warehouse size constant, changes in data volume and query patterns can cause performant and cost-effective data sources to degrade. To mitigate performance degradation, a view may need to be materialized as a table, or a table may need to become a materialized view.

However, even when converting from a view to a table, the transformational logic stays constant. While traditional modeling advice advocates differentiating views and other objects through suffixes (e.g., CUSTOMER_V), Snowflake users are encouraged to avoid such conventions. Orienting object names to their contents (e.g., CUSTOMER, DIM_DATE) rather than their object type allows modelers to easily pivot between them without breaking downstream dependencies.

Shaping transformations through relationships

The exercises in previous chapters devoted much attention to relational constraints and how they help ensure data integrity, consistency, and quality while making data management easier. Exactly how much easier will be the recurring theme of this chapter.

Even simple business questions, such as What is the customer and location name for the customer with identifier 775699?, require us to refer to the relational model to find the tables in which these attributes are stored and the foreign key columns that can be used to join them.

Figure 12.1 – Foreign key constraint between location and customer tables

Figure 12.1 – Foreign key constraint between location and customer tables

Using the information from the physical model, we can construct a query to obtain customer and location names for customer 775699 as follows:

SELECT c.name AS customer_name , l.name AS location_name FROM customer c
JOIN location l ON c.location_id = l.location_id
WHERE customer_id = 775699;

And...

Join elimination using constraints

In the previous chapter, we set the RELY property on our constraints to pave the way for the performance gains we will now explore. Snowflake uses the RELY property to perform join elimination—avoiding unnecessary or redundant joins in queries. Depending on table size, joins can be memory-intensive, so avoiding them when possible can significantly improve performance and save compute credits.

Even if a join is specified in the query, but no columns from the joined table are selected as part of the result, the RELY property will tell the Snowflake query engine to avoid performing the join.

If we modify the previous query—joining CUSTOMER and LOCATION tables—but only request information from CUSTOMER, the RELY property will help us avoid the unnecessary join operation.

Figure 12.3 – A query with join elimination

Figure 12.3 – A query with join elimination

A look at the query profile (which we will explore later in this chapter) confirms...

Joins and set operators

In a database, there are two ways to bring sources of data together: joins and set operators. A join combines rows from two or more tables based on related columns, and a set operator—such as UNION—combines the results of multiple SELECT statements. But, you can also use join on multiple SELECT statements.

An easier way to think about it is that joins combine data horizontally—across related rows—and set operators work vertically. The following example shows how identical records would look in a JOIN or UNION result.

Figure 12.8 – An example of a JOIN and a UNION operation

Figure 12.8 – An example of a JOIN and a UNION operation

JOIN and UNION are just some of the tools at the developer’s disposal. Let’s review all the join and set operators that Snowflake provides and briefly cover some of their use cases.

The following diagram shows the join and set operators available in Snowflake.

Figure 12.9 – Snowflake joins and set operators

Figure 12.9 –...

Performance considerations and monitoring

Because the Snowflake services layer takes care of query tuning and optimization, writing efficient queries and following best practices should be the developer’s only concern. To do this, Snowflake provides a visual query profile to break down query execution into constituent substeps and help identify potential issues. A sample query profile can be seen in the following screenshot:

Figure 12.10 – Snowflake query profile

Figure 12.10 – Snowflake query profile

A good way to familiarize yourself with the query profile screen is by reviewing the kinds of query issues that it is designed to identify. The numbers in the query profile correspond to the topics listed in the following section.

Common query problems

Most day-to-day query issues can be identified by paying attention to the information displayed in the query profile and following the query best practices described here:

  • Queuing – When Snowflake determines that...

Putting transformational modeling into practice

In the last chapter, we deployed our sample physical model. Here, we will populate it with data and create a transformational model to satisfy a business requirement using the techniques and best practices covered in the preceding sections.

Transformational modeling requires data. The script to populate our operational schema with sample data can be found in the Git repository for this chapter. Please run the script titled create_physical_model_w_data.sql to recreate the physical model and load it with data from the SNOWFLAKE_SAMPLE_DATA database if you have not already done so.

After running the script, all the transactional tables in the schema will have been populated with data. However, the LOYALTY_CUSTOMER table is not transactional; it needs to be created through a transformational model. Just as the data model in the previous chapters took shape after getting to know the workings of our company by communicating with the business...

Summary

While a physical model reflects the company’s business model, transformational models typically focus on addressing analytical business questions. Transformational models depend on existing data to perform their functions and, since data grows and changes over time, the object type used to represent the underlying logic may also change. By keeping object names constant, users can make adjustments and pivot between different database objects without breaking downstream processes.

The relationships established in the physical models can inform transformational designs as well as improving performance through join elimination by using the RELY property. Users can track performance to spot potential query issues using the query profile in the Snowflake UI. The query profile is a valuable tool for identifying performance issues such as exploding joins and inefficient pruning.

While transformational modeling is performed on top of the physical model, the two should always...

lock icon The rest of the chapter is locked
You have been reading a chapter from
Data Modeling with Snowflake
Published in: May 2023 Publisher: Packt ISBN-13: 9781837634453
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at ₹800/month. Cancel anytime}