Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Learn SQL Database Programming

You're reading from   Learn SQL Database Programming Query and manipulate databases from popular relational database servers using SQL

Arrow left icon
Product type Paperback
Published in May 2020
Publisher Packt
ISBN-13 9781838984762
Length 564 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Josephine Bush Josephine Bush
Author Profile Icon Josephine Bush
Josephine Bush
Arrow right icon
View More author details
Toc

Table of Contents (24) Chapters Close

Preface 1. Section 1: Database Fundamentals
2. Introduction to Relational Database Management Systems FREE CHAPTER 3. Installing and Using MySQL Workbench 4. Understanding Data Types 5. Designing and Creating a Database 6. Importing and Exporting Data 7. Section 2: Basic SQL Querying
8. Querying a Single Table 9. Querying Multiple Tables 10. Modifying Data and Table Structures 11. Section 3: Advanced SQL Querying
12. Working with Expressions 13. Grouping and Summarizing Data 14. Advanced Querying Techniques 15. Programmable Objects 16. Section 4: Presenting Your Findings
17. Exploring and Processing Your Data 18. Telling a Story with Your Data 19. Section 5: SQL Best Practices
20. Best Practices for Designing and Querying 21. SQL Appendix 22. Assessments 23. Other Books You May Enjoy

Database normalization

Database normalization is the process of putting your raw data into tables using rules to avoid redundant data, optimize database performance, and ensure data integrity.

Without proper normalization, not only can you have data redundancy, which uses additional storage space, but it can be more difficult to update and maintain the database without data loss.

Normalization requires forms. Forms are sets of rules to follow to normalize your data into database tables. There are three forms that we will discuss: the first normal form, the second normal form, and the third normal form. Each of these forms has a set of rules to ensure that your database complies with the form. Each of the forms builds on the previous forms.

The first normal form

The first normal form (1NF) is the first level of database normalization. You will need to complete this step before proceeding to other database normalization forms. The primary reason to implement 1NF is to eliminate repeating groups. This ensures that you can use simple SQL statements to query the data. It also ensures that you aren't duplicating data, which uses additional storage and computing time. This step will ensure that you are doing the following:

  • Defining data, columns, and data types and putting related data into columns
  • Eliminating repeating groups of data:
  • This means that you will not have repeating columns, such as Year1, Year2, Year3, but instead will have a column that is named Year, and each row in the table will be a different year.
  • Another example of this is not having multiple values in the same field, such as 1985, 1987, 1989, but instead placing each year in a row.
  • This means that there are no exact duplicate rows. The example following this bullet list will explain this concept in more depth.
  • Creating a primary key for each table

In the following example, you could make the first column the primary key in the people table and the foreign key in the salaries table. In the salaries table, you could create a new primary key or create a composite key that is an amalgamation of multiple fields.

Here is a denormalized sample table:

Name

birthYear

Salary

SalaryYear

Jim Jones

1981

2750000, 4500000

2010, 2011

Joe Smith

1974

10600000

2014

There is a right way and wrong way to normalize this table. Let's go over the wrong way first:

Name

birthYear

Salary1

Salary2

SalaryYear1

SalaryYear2

Jim Jones

1981

2750000

4500000

2010

2011

Joe Smith

1974

10600000

2014

The preceding design has introduced new problems. Even though it doesn't have groups of repeating data in one column, the salary is limited to two values. What if a player has more than two salaries? You don't have anywhere to put another salary without adding a third column. This also wastes space for those players that only have one salary, and searching through this table for a player with a specific salary becomes difficult. The same goes for the SalaryYear columns.

The right way to normalize the denormalized table to the first normal form is to ensure that there aren't repeating groups, as shown in the following table. The people table with player information would look like the following:

playerID

nameFirst

nameLast

birthYear

jjones01

Jim

Jones

1981

jsmith01

Joe

Smith

1974

The Salary value has been removed and placed in another table with the playerID field linking them to each other; therefore, the salaries table will look like the following:

salaryID

playerID

salary

year

1

jjones01

2750000

2010

2

jjones01

4500000

2011

3

jsmith01

10600000

2014

Let's go through a denormalization example by looking at the following table:

playerID namefirst namelast birthYear franchID franchname teamID RBI rank yearID
abbotpa01 Paul Abbott 1967 PHI Philadelphia Phillies PHI 2 2 2004
abreubo01 Bobby Abreu 1974 PHI Philadelphia Phillies PHI 79 1 2000
abreubo01 Bobby Abreu 1974 PHI Philadelphia Phillies PHI 110 3 2001
alcanar01 Arismendy Alcantara 1991 CHI Chicago Cubs CHI 1 8 2015
almoral01 Albert Almora 1994 CHI Chicago Cubs CHI 14 8 2016
almoral01 Albert Almora 1994 CHI Chicago Cubs CHI 46 6 2017
alvarpe01 Pedro Alvarez 1987 PIT Pittsburg Pirates PIT 77 17 2015
alvarto01 Tony Alvarez 1979 PIT Pittsburg Pirates PIT 2 9 2002
alvarto01 Tony Alvarez 1979 PIT Pittsburg Pirates PIT 8 1 2004

To meet the requirements of 1NF, you would need to split this table into multiple tables. Depending on the table you are trying to normalize, you might not need to split it if it's already following the rules of 1NF.

This table only contains the information about the player and has a primary key of playerID:

playerID

namefirst

namelast

birthYear

abbotpa01

Paul

Abbott

1967

abreubo01

Bobby

Abreu

1974

alcanar01

Arismendy

Alcantara

1991

almoral01

Albert

Almora

1994

alvarpe01

Pedro

Alvarez

1987

alvarto01

Tony

Alvarez

1979

The other table would contain the rest of the fields from the denormalized table. The following table has a foreign key relationship to the preceding table regarding playerID:

playerID

franchID

franchname

teamID

RBI

rank

yearID

abbotpa01

PHI

Philadelphia Phillies

PHI

2

2

2004

abreubo01

PHI

Philadelphia Phillies

PHI

79

1

2000

abreubo01

PHI

Philadelphia Phillies

PHI

110

3

2001

alcanar01

CHI

Chicago Cubs

CHI

1

8

2015

almoral01

CHI

Chicago Cubs

CHI

14

8

2016

almoral01

CHI

Chicago Cubs

CHI

46

6

2017

alvarpe01

PIT

Pittsburg Pirates

PIT

77

17

2015

alvarto01

PIT

Pittsburg Pirates

PIT

2

9

2002

alvarto01

PIT

Pittsburg Pirates

PIT

8

1

2004

The second normal form

The second normal form (2NF) is the second level of database normalization. You will need to complete 1NF before beginning this step. The primary reason to implement 2NF is to narrow tables down to a single purpose, which makes it easier to use and design tables. This step will ensure that you do the following:

  • Meet the requirements of 1NF: You will need to implement 1NF before you can use 2NF.
  • Remove partial dependencies: This will entail narrowing tables down to a single purpose where possible.

Starting with the tables from our 1NF example, you can break these down further into additional tables. You will still have the same player table from 1NF since it serves a single purpose of giving us player information. The franchise table has multiple purposes with RBI and rank, so since RBI isn't related to the franchise, you will split the franchise table into two.

The franchise table has all the franchise and team information in it now, and the RBI columns and related columns can be split out into a batting table. The franchise table still has a primary key of franchID and playerID with a foreign key referring back to the player table on playerID:

playerID

franchID

franchname

teamID

rank

yearID

abbotpa01

PHI

Philadelphia Phillies

PHI

2

2004

abreubo01

PHI

Philadelphia Phillies

PHI

1

2000

abreubo01

PHI

Philadelphia Phillies

PHI

3

2001

alcanar01

CHI

Chicago Cubs

CHI

8

2015

almoral01

CHI

Chicago Cubs

CHI

8

2016

almoral01

CHI

Chicago Cubs

CHI

6

2017

alvarpe01

PIT

Pittsburg Pirates

PIT

17

2015

alvarto01

PIT

Pittsburg Pirates

PIT

9

2002

alvarto01

PIT

Pittsburg Pirates

PIT

1

2004

The batting table has a primary key of playerID and teamID and has a foreign key of playerID to the player table and a foreign key of teamID to the franchise table:

playerID

teamID

RBI

yearID

abbotpa01

PHI

2

2004

abreubo01

PHI

79

2000

abreubo01

PHI

110

2001

alcanar01

CHI

1

2015

almoral01

CHI

14

2016

almoral01

CHI

46

2017

alvarpe01

PIT

77

2015

alvarto01

PIT

2

2002

alvarto01

PIT

8

2004

Since you split the franchise table into two tables—one table with franchise information and one table with batting information—these tables now comply with 2NF because each table is serving a single purpose.

The third normal form

The third normal form (3NF) is the second level of database normalization. You will need to complete 2NF before beginning this step. The primary reason to implement 3NF is to ensure that your tables aren't created so that dependencies between columns may cause inconsistency. Generally, if a database is described as normalized, then it's normalized according to the third normal form. This step will ensure that you are doing the following:

  • Meeting the requirements of 2NF: You will need to make sure that your tables comply with 2NF before proceeding with 3NF.
  • No attributes depend on other non-key attributes: This means that you will need to look at your tables and see whether more fields can be split into other tables since they aren't dependent on a key.

Working from our 2NF example, you can further split the franchise table into a franchise table and a team table. The rank of the team isn't dependent on the primary key of the franchise table.

In 3NF, our franchise table becomes the following two tables.

franchID becomes the primary key in the franchise table:

franchID

franchname

PHI

Philadelphia Phillies

CHI

Chicago Cubs

PIT

Pittsburg Pirates

The team table has a primary key of teamID and a foreign key of franchID referring to the franchise table:

franchID

teamID

rank

yearID

PHI

PHI

2

2004

PHI

PHI

1

2000

PHI

PHI

3

2001

CHI

CHI

8

2015

CHI

CHI

8

2016

CHI

CHI

6

2017

PIT

PIT

17

2015

PIT

PIT

9

2002

PIT

PIT

1

2004

To summarize the process of taking data from denormalized to the third normal, here's a diagram of the changes that are made:

The preceding diagram shows how you went from denormalized to 3NF. Following the rules of the normal forms, you took a single table and turned it into four tables. To begin with, you split up one denormalized table into two tables as part of 1NF. 1NF ensured that you didn't have duplicate data and repeating groups. This resulted in a player and franchise table. Next, you split the tables out into three tables as part of 2NF. 2NF resolved the issue of not giving each table a specific purpose, resulting in a player, franchise, and batting table. For the final step, you split the tables into four tables as part of 3NF. 3NF ensured that you didn't have any fields in a table that weren't dependent on the primary key, resulting in a player, franchise, batting, and team table.

By going from a denormalized table to 3NF, you accomplished several things, including ensuring that you don't have duplicate data, that you have keys linking data to each other in the tables, that you have a single purpose for each table, and that you have minimized the storage and compute costs for your queries.

Even adhering to the third normal form can be taken to extremes, so while the third normal form is desirable, it's not always required. For instance, with zip codes, you could create a table with just zip codes since they may be duplicated in a table with users' addresses, but this may degrade performance instead of helping performance.

CONTINUE READING
83
Tech Concepts
36
Programming languages
73
Tech Tools
Icon Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.
Icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Icon 50+ new titles added per month and exclusive early access to books as they are being written.
Learn SQL Database Programming
You have been reading a chapter from
Learn SQL Database Programming
Published in: May 2020
Publisher: Packt
ISBN-13: 9781838984762
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Modal Close icon
Modal Close icon