Reader small image

You're reading from  Data Modeling with Snowflake

Product typeBook
Published inMay 2023
PublisherPackt
ISBN-139781837634453
Edition1st Edition
Right arrow
Author (1)
Serge Gershkovich
Serge Gershkovich
author image
Serge Gershkovich

Serge Gershkovich is a seasoned data architect with decades of experience designing and maintaining enterprise-scale data warehouse platforms and reporting solutions. He is a leading subject matter expert, speaker, content creator, and Snowflake Data Superhero. Serge earned a bachelor of science degree in information systems from the State University of New York (SUNY) Stony Brook. Throughout his career, Serge has worked in model-driven development from SAP BW/HANA to dashboard design to cost-effective cloud analytics with Snowflake. He currently serves as product success lead at SqlDBM, an online database modeling tool.
Read more about Serge Gershkovich

Right arrow

Database Normalization

Previous chapters have explored the method of capturing the real-world business workings of an organization and modeling them using visual semantics. The resulting model and accompanying diagrams make it easy for the domain and data teams to reach a consensus about the business’s fundamental entities and the interactions between them. However, as the modeling process approaches the physical stage, we should understand that many ways exist to structure the data at the database level. The process of dividing the data into smaller, modularized segments is known as normalization. In this chapter, we will understand how it works and the advantages and disadvantages that go along with it.

Normalization is not a binary classification but a spectrum of ever-increasing rules a design must satisfy to achieve a stated level. While normalization contains the root word normal, along with the obvious positive connotations, more normalization does not necessarily...

An overview of database normalization

Database normalization is the process of organizing a database in a way that reduces redundancy and dependency within its tables. This is achieved by breaking a large table into smaller ones and linking them through foreign key relationships. Doing so leads to fewer data inconsistencies and improved data integrity. A normalized database results in a modular design that is easy to scale and modify.

Normalization occurs through escalating stages of formal rules called normal forms, ranging from the first normal form (1NF) to the sixth normal form (6NF)—although the first through third are most commonly used and are sufficient for most use cases.

Each normal form builds on the requirements of its predecessor and adds additional criteria that every database table must satisfy. A normal form is considered satisfied when every table in the database meets the criteria laid out for it (and, by extension, its predecessors).

Neglecting normalization...

Data anomalies

Bad data comes in many flavors. From misspellings to improper encoding, some data quality issues can not be avoided. However, denormalized designs make it possible to walk headlong into several well-known and preventable blunders.

To understand how normalization prevents data anomalies, we need to unpack the dual dangers it mitigates: redundancy and dependency:

  • Redundancy: Repeated data, whether within one or across multiple tables. When data values are duplicated, synchronizing everything through DML operations becomes harder.
  • Dependency: When the value of one attribute depends on the value of another. Dependencies can be functional (such as a person’s age attribute depending on their name) or multivalued (such as name, age, and hobby stored in a single table would make it impossible to delete a hobby without deleting all the people who practice it).

With these dangers in mind, let’s review the kinds of data anomalies that have the...

Database normalization through examples

Normalization was first proposed by Edgar F. Codd, the inventor of the relational model for database management. Codd introduced normalization in 1NF form and later extended it to 2NF and 3NF. Later, Codd, working with Raymond F. Boyce, developed Boyce-Codd Normal Form (BCNF) also called 3.5NF.

As database theory continues to develop, subsequent normal forms have been proposed up to 6NF, following the progression from least to most restrictive. While it is important to understand normalization in all its forms, it is also essential to recognize that typical business scenarios do not require going past 3NF.

To understand how normal forms organize a database and prevent data anomalies, we will run through an exercise of taking a wholly denormalized dataset and running it through the normalization rules required to satisfy each form. The exercises that follow use examples of data from the music industry. But, unlike the wide world of music...

Data models on a spectrum of normalization

The prior sections demonstrate how, as we move up the normal form scale, redundancy and the potential for data anomalies decrease while raising the number of tables and associated complexity. With this in mind, we can visualize where various data modeling patterns fall on the normal form spectrum to help understand their suitability in design scenarios. While the dimensional modeling process was discussed in Chapter 7, Putting Conceptual Modeling into Practice, the associated schema patterns (star and snowflake) are explained in further detail in Chapter 17, Scaling Data Models through Modern Techniques, along with Data Vault (where they are often implemented at the information mart layer). However, visualizing them in the context of normal forms will help lay the groundwork for later understanding their design. Also, bear in mind that the following assertions speak to general tendencies, not hard-set rules.

The following image displays...

Summary

Through various examples in this chapter, we saw how the process of normalization organizes a database in a way that reduces redundancy and dependency within its tables.

Dependency and redundancy in database tables increase the likelihood of data anomalies, which come in many forms. Update anomalies occur due to redundancy, which makes it possible to update some, but not all, of the associated records. Physical (as opposed to logical) dependencies are the root cause of insertion and deletion anomalies. When too many details of varying granularity are bunched into a single table, inserting or deleting records that do not match all the criteria becomes difficult. The domain anomaly is the hardest to spot because it requires functional knowledge of the data in question.

Database normalization can be applied through escalating stages of formal rules called normal forms, ranging from 1NF to 6NF to avoid such anomalies. The most commonly used normal forms are the first through...

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Data Modeling with Snowflake
Published in: May 2023Publisher: PacktISBN-13: 9781837634453
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €14.99/month. Cancel anytime

Author (1)

author image
Serge Gershkovich

Serge Gershkovich is a seasoned data architect with decades of experience designing and maintaining enterprise-scale data warehouse platforms and reporting solutions. He is a leading subject matter expert, speaker, content creator, and Snowflake Data Superhero. Serge earned a bachelor of science degree in information systems from the State University of New York (SUNY) Stony Brook. Throughout his career, Serge has worked in model-driven development from SAP BW/HANA to dashboard design to cost-effective cloud analytics with Snowflake. He currently serves as product success lead at SqlDBM, an online database modeling tool.
Read more about Serge Gershkovich