Our Inside Data Engineering Newsletter gives data engineers and practitioners what they often lack today: clear, real-world insights—where every byte tells a story.Subscribe here to stay ahead in data engineeringIntroductionIn modern analytics environments, few modeling challenges are as important or as misunderstood as Slowly Changing Dimensions (SCDs). Business attributes do not stand still. Customers change status, relocate, upgrade tiers, and evolve over time. The real challenge is not just recording those changes. It is preserving the right version of history so that facts remain analytically correct.In Snowflake-powered data warehouses, modeling dimensions properly means balancing performance, maintainability, and historical accuracy. Should attributes be overwritten, versioned, split into mini-dimensions, or tracked with effective dates and surrogate keys?This article explores the full spectrum of SCD strategies from Type 0 through Type 7 using a practical CUSTOMER example. You will see how different change-tracking techniques impact reporting, how surrogate keys and validity intervals maintain point-in-time accuracy, and how Snowflake-native features can be used to implement these patterns efficiently.If you would like to follow along, the scripts used in these examples are available here: https://github.com/PacktPublishing/Data-Modeling-with-Snowflake2E/tree/main/ch14Dimensions overviewA dimension unifies (or conforms) similar attributes from one or various source systems into a single table under a common unique identifier known as a business key. A single surrogate key can also be used in place of multi-column business or primary keys. The unique key of a dimension table plays a critical role in identifying dimension records and allows the database team to track and maintain changes over time. A dimension table can be structured in predetermined ways to allow for different types of change tracking depending on the business requirement.SCD typesAttributes within a dimension have differing requirements for durability and change tracking. Some attributes are updated directly, while others require historical snapshots, yet others cannot change at all. This section will cover the types of SCDs, or update profiles, that a given attribute in a dimension can have.It’s important to note that the dimension type may not necessarily apply across all dimension attributes equally. Within the same dimension table, some attributes may be overwritten while others may not. By understanding SCD types and when to use them, database developers can implement the proper table structure and update techniques to satisfy the organization’s reporting and analytics requirements.Example scenarioTo explain the various SCD types, we will use a simplified CUSTOMER dimension as an example and track the change as it would appear under each configuration.Suppose our fact table stores order details from customer X, made on the first of every month in2022. Thanks to X’s patronage, their customer status went from new to active midway through the year. Not only do we want to track when the change occurred but we want to tie the correct status to the recorded sales facts (that is, the customer is active today, but half their orders were made with a status of new).The change in customer status is displayed here as it currently appears in the source system and data warehouse landing area:Figure: A changed record in the source system and data warehouse raw schema With this scenario in mind, let’s explore the SCD types.Type 0: Maintain originalIronically, the first SCD—Type 0—does not change. Type 0 dimensions are intended for durable attributes that cannot change due to their business nature. Examples of Type 0 attributes include birth dates, calendar dates, and any attribute recorded at record creation that needs to be tracked as a baseline, such as original price, weight, or date of first login.Type 1: OverwriteType 1 attributes do not require historical tracking and may be directly overwritten with an UPDATE statement. Sometimes, the latest attribute value is all that the business cares about. For example, our organization demands the latest customer status, and previous values are irrelevant. Maintaining a Type 1 dimension is relatively simple—for example, if the status changes, it is updated directly in the customer dimension, as illustrated here:Figure: New updated for STATUS change in a Type 1 SCDHowever, overwriting values is often not enough—a historical value must also be preserved.Type 2: Add a new rowFor some attributes, an organization must register the latest value and maintain prior historical records. Type 2 attributes generate a new row every time a change is recorded. Generating new rows for a given business key means that uniqueness is violated unless a time dimension (the effective date) is added to the primary key. The effective date of a Type 2 SCD not only separates historical values for a given business key but also allows those records to be tied to fact tables at a given point in time.Maintaining a Type 2 SCD requires creating new rows when record changes are detected and additional metadata columns to track them. A single record in our example would generate the following change in a Type 2 table:Figure: New row generated for a change in a Type 2 SCDThe following metadata fields make working with Type 2 attributes easier:Validity intervals: Because the business key is being duplicated with each change, another column must be added to the primary key to maintain uniqueness. Validity intervals (also named valid_from/to, start/end_date) provide the additional unique value for the primary key and timestamp when the change occurred, allowing facts to be linked with the correct point-in-time dimension value. The TO_DATE column also provides a flag for identifying the latest record using the standard surrogate high date of 9999-12-31.Hash: Using a hashing function, such as MD5, provides a quick and standard way to identify when record changes occur. This concept is borrowed from Data Vault (discussed in Chapter 18, Scaling Data Models Through Modern Techniques). When there are many Type2 attributes in a table, instead of checking for changes one by one, hash all of them into a single column and compare them in a single go, as follows:Create the hash field: SELECT MD5 (Col1 || Col2 || ... || ColN) AS hashCompare the hash field: IFF(hash_new = hash_old, 'same', 'changed')Type 3: Add a new columnType 3 dimensions track changes by adding a new column to store the previous value when a change occurs. The original column is updated and not renamed to avoid breaking any existing downstream references. An effective date metadata column records the time of the change, allowing analytics processes to use the new or historical value based on their validity period.An example of a status update in a Type 3 attribute is given here:Figure: New row column created for a change in a Type 3 SCDAlthough Type 3 is easier to maintain than Type 2, the limitation is storing multiple changes. While Type 2 attributes can change as often as needed, generating new rows each time, Type 3 can only show one change without creating additional columns—not a scalable design if regular changes occur.Type 4: Add a mini dimensionWhen SCDs become quickly changing dimensions—due to rapidly changing attributes—the number of records that Type 2 dimensions generate can cause performance issues. This is especially true with dimensions containing many records—as in millions of rows or more.In a Type 4 scenario, the solution is to split the frequently changing attributes into a separate mini dimension. To further curtail the number of records, the values in the mini dimension can be banded within business-agreed value ranges that provide a meaningful breakdown for analysis. The mini dimension has its own surrogate key and does not contain the main dimension foreign key—allowing both to retain a relatively low cardinality. However, to tie the main dimension to the mini, the mini dimension foreign key must be included in the fact table (as the main dimension appears at the time of the generated fact).On a diagram, the arrangement of a Type 4 dimension would look like this:Figure: A Type 4 SCD on a relational diagramFor our example, the business wants to track the length in months for how long a customer has been active, as well as their total yearly spend at the time of each sale. To avoid generating a record for each month and order placed, the business teams have agreed to group the MONTHS_ACTIVE attribute into two categories (less than or greater than five months) and band the sales volume into three groups. The mini dimension would need to contain every possible (or allowable by existing business rules) combination of groupings.Our example would look like this (notice how the profile ID changes throughout the year as a function of the customer’s attributes):Figure: Mini dimension and foreign key in a fact table in a Type 4 SCDWhile this arrangement satisfies the reporting requirement, bridging dimension tables via a fact encumbers analysis on the dimension itself. To unify the main and mini dimensions into one, a Type 5 SCD is used.Type 5: Type 4 mini dimension + Type 1A Type 5 SCD is an extension of the Type 4 mini-dimension technique—adding the mini-dimension key as a Type 1 attribute in the main dimension (hence the name, 4+1 = 5). This approach affords the performance gains of a Type 4 dimension by avoiding the explosive growth of rapidly changing Type 2 records and gives users a simple way to unify the main dimension with the mini dimension through a common join column.On a diagram, the arrangement of a Type 5 dimension would look like this:Figure: A Type 5 SCD and related view on a relational diagramNotice that to further simplify the user experience, a view is created over the main and mini dimensions to give the users a single entity to work with. Analysis of the fact table becomes more versatile by allowing users to join on one entity (the view) instead of the main and mini dimensions if historical values are not required.The same scenario described in the section on Type 4 would look like this under Type 5:Figure: Mini-dimension and a related view in a Type 5 SCDUnfortunately, Type 4, and by extension, Type 5, suffer from the inconvenience of calculating the mini-dimension value to include it as part of each fact. The performance implications involved in adding the mini-dimension foreign key to the fact table should outweigh the performance gain in reducing the number of dimension records through the use of the mini dimension.Type 6: The Type 1, 2, and 3 hybridA Type 6 SCD is so named because it combines the techniques of Type 1, 2, and 3 (1+2+3 = 6) dimensions into one table. Based on business needs, users will demand different levels of historical values to achieve a balance of detail and flexibility in their analytics.Suppose our customer X from previous examples began to relocate—moving headquarters to Mexico in 2023, then to Brazil in 2024. A Type 6 approach yields a dimension table that gives analysts every possible temporal attribute value in every snapshot: a Type 1 current value, a Type 2 effective dated value, and a Type 3 previous value.To recap the status and country changes mentioned in this example, a snapshot of the source system over time is presented here:Figure: Source system showing changes for customer XIn a business scenario where the customer status needed Type 2 and the country was presented as Type 1, 2, and 3, the resulting table would look like this (the HASH column is now calculated as a function of STATUS and COUNTRY):Figure: Type 1, 2, and 3 columns combine in a Type 6 SCDType 7: Complete as-at flexibilityBusiness users across all cultures and industries have a penchant for changing their minds. The Type 7 approach gives database modelers a way to deliver the needed historical attribute no matter the criteria or temporal reference point requested.A Type 7 dimension (unimaginatively named as the number that follows 6) includes a natural key and a surrogate key in a Type 2 table structure and embeds both in the fact table.A method for generating surrogate keysAn efficient—and data vault-inspired—way to generate a surrogate key for Type 2 records is to use an MD5 hash on the compound primary key (in this example, CUSTOMER_ID and FROM_DATE):SELECT MD5(customer_id || from_date) AS customer_skeyIn a Type 7 configuration, a surrogate key is added to an otherwise Type 2 structure and is embedded in the fact (the latest SKEY as of the creation of each fact record). Based on the example scenario from the Type 6 section, the tables would look like this:Figure: A Type 7 SCD offers complete analytical flexibilityA Type 7 SCD allows business users to select the appropriate customer attributes based on the following criteria:The most recent or current information (that is, TO_DATE = '9999-12-31')The primary effective date on the fact record (that is, LOAD_DATE)When the user changes their mind, any date associated with the fact record (that is, ORDER_ DATE or SHIPMENT_DATE)Here is how those queries might look:--get current
SELECT < fact and attribute fields >
FROM order o
INNER JOIN customer c USING(customer_id)
WHERE c.to_date = '9999-12-31'
--get dimension values as at the primary effective date on the fact record
SELECT < fact and attribute fields >
FROM order o
INNER JOIN customer c USING(customer_skey)
--get dimension values as-at any date on the fact record
--example will use SHIPMENT_DATE
SELECT < fact and attribute fields >
FROM order o
INNER JOIN customer c USING(customer_skey)
AND o.shipment_date BETWEEN c.from_date AND c.to_dateNow that you have a general understanding of the different SCD types, let’s recap before detailing the Snowflake recipes used to construct them.Overview of SCD typesThe following screenshot summarizes the seven SCD types covered in the previous section, including their maintenance strategy and usage. While eight (including Type 0) SCDs may seem like a lot, most database designs rarely go beyond Type 3, as the first four SCD types strike an acceptable balance of performance, maintainability, and historical reporting needs:Figure: A comparison of SCD typesNow, let’s see how to build SCDs with maximal efficiency using Snowflake-specific features.ConclusionSlowly Changing Dimensions are not a one-size-fits-all solution. Some attributes should never change (Type 0), others can be safely overwritten (Type 1), and many require full historical tracking (Type 2). As business complexity grows, hybrid approaches such as Types 4, 5, 6, and 7 offer increasing levels of analytical flexibility. They help organizations answer not only what is true now, but also what was true at the time of each transaction.Choosing the right SCD type is ultimately a business decision supported by sound modeling principles. The goal remains consistent: preserve the integrity of historical analysis while keeping data structures scalable and performant.For a deeper dive into implementing SCDs in Snowflake, including practical SQL recipes, hashing strategies, dynamic approaches, and performance considerations, you can learn more in the book Data Modeling with Snowflake, Second Edition by Serge Gershkovich. Modeling guides are often steeped in theory. This book’s innovative approach combines practical modeling concepts with Snowflake best practices and unique features - allowing you to create efficient designs that leverage the power of the Data Cloud. Author BioSerge Gershkovich is a seasoned data architect with decades of experience designing and maintaining enterprise-scale data warehouse platforms and reporting solutions. He is a leading subject matter expert, speaker, content creator, and Snowflake Data Superhero. Serge earned a bachelor of science degree in information systems from the State University of New York (SUNY) Stony Brook. Throughout his career, Serge has worked in model-driven development from SAP BW/HANA to dashboard design to cost-effective cloud analytics with Snowflake. He currently serves as product success lead at SqlDBM, an online database modeling tool.
Read more