The third normal form (3NF) is the second level of database normalization. You will need to complete 2NF before beginning this step. The primary reason to implement 3NF is to ensure that your tables aren't created so that dependencies between columns may cause inconsistency. Generally, if a database is described as normalized, then it's normalized according to the third normal form. This step will ensure that you are doing the following:
- Meeting the requirements of 2NF: You will need to make sure that your tables comply with 2NF before proceeding with 3NF.
- No attributes depend on other non-key attributes: This means that you will need to look at your tables and see whether more fields can be split into other tables since they aren't dependent on a key.
Working from our 2NF example, you can further split the franchise table into a franchise table and a team table. The rank of the team isn't dependent on the primary key of the franchise table.
In 3NF, our franchise table becomes the following two tables.
franchID becomes the primary key in the franchise table:
|
franchID
|
franchname
|
|
PHI
|
Philadelphia Phillies
|
|
CHI
|
Chicago Cubs
|
|
PIT
|
Pittsburg Pirates
|
The team table has a primary key of teamID and a foreign key of franchID referring to the franchise table:
|
franchID
|
teamID
|
rank
|
yearID
|
|
PHI
|
PHI
|
2
|
2004
|
|
PHI
|
PHI
|
1
|
2000
|
|
PHI
|
PHI
|
3
|
2001
|
|
CHI
|
CHI
|
8
|
2015
|
|
CHI
|
CHI
|
8
|
2016
|
|
CHI
|
CHI
|
6
|
2017
|
|
PIT
|
PIT
|
17
|
2015
|
|
PIT
|
PIT
|
9
|
2002
|
|
PIT
|
PIT
|
1
|
2004
|
To summarize the process of taking data from denormalized to the third normal, here's a diagram of the changes that are made:
The preceding diagram shows how you went from denormalized to 3NF. Following the rules of the normal forms, you took a single table and turned it into four tables. To begin with, you split up one denormalized table into two tables as part of 1NF. 1NF ensured that you didn't have duplicate data and repeating groups. This resulted in a player and franchise table. Next, you split the tables out into three tables as part of 2NF. 2NF resolved the issue of not giving each table a specific purpose, resulting in a player, franchise, and batting table. For the final step, you split the tables into four tables as part of 3NF. 3NF ensured that you didn't have any fields in a table that weren't dependent on the primary key, resulting in a player, franchise, batting, and team table.
By going from a denormalized table to 3NF, you accomplished several things, including ensuring that you don't have duplicate data, that you have keys linking data to each other in the tables, that you have a single purpose for each table, and that you have minimized the storage and compute costs for your queries.
Even adhering to the third normal form can be taken to extremes, so while the third normal form is desirable, it's not always required. For instance, with zip codes, you could create a table with just zip codes since they may be duplicated in a table with users' addresses, but this may degrade performance instead of helping performance.