In the realm of database management, tables and relationships form the backbone of organizing and connecting data in SQL Server. With the power of structured data organization and connectivity, SQL Server enables businesses and organizations to efficiently store, retrieve, and analyze vast amounts of information.
The SQL CREATE DATABASE statement
All tables in SQL Server are stored in a repository known as a database. A database is a collection of tables related to a specific entity. For instance, we can have separate databases for insurance company providers such as Blue Cross Blue Shield and Cigna. Having one database for each entity helps maintain and scale the database and its dataset for the future. The owner of the database is known as a database administrator who holds admin privileges to the database. Only a database administrator can provide or revoke access to a database.
- The syntax for creating a database is
CREATE
DATABASE Database_name
- The preceding statement will create a database with the name
Database_name
- The syntax for deleting a database is
DROP
DATABASE Database_name
- The syntax for viewing the entire database is
SHOW DATABASE
The SQL CREATE TABLE statement
The CREATE TABLE
statement is used to create a new table, which is a combination of rows and columns, in a specified database:
CREATE TABLE DATABASENAME.TABLE_NAME(
Column 1 datatype,
Column 2 datatype,
Column 3 datatype,
);
For example. we can create a walmart.customer_info
table as follows:
(Customer_ID int,
Name varchar(255),
Address varchar(255),
Email varchar(255),
Phone varchar(255)
)
In the preceding code, the Customer_ID
column is defined as an integer data type, while the Name
column is defined as a varchar data type. This means that the Name
column can hold both letters and numbers, up to a maximum length of 255 characters. The code will create an empty table with these columns.
Customer_ID
|
Name
|
Address
|
Phone
|
Email
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Figure 1.5 – customer_info table
Note
Post-creation of this table, the database administrator has to provide read access to all the table users so that they can access the data within it.
SQL DROP TABLE versus TRUNCATE TABLE
If the requirement is to delete the entire table along with its schema, the SQL syntax used is DROP TABLE table_name
. However, if the need is to delete only the content from within the table and retain the table structure, the SQL syntax used is TRUNCATE TABLE table_name
. After truncation, the table will still exist but with 0 rows within it.
SQL ALTER TABLE
The SQL syntax of ALTER TABLE
is used to insert, update, or delete data or columns in a pre-created table:
ALTER TABLE Walmart.customer_infoDROP COLUMN Email;
The following table will be created after the execution of the drop statement:
Customer_ID
|
Name
|
Address
|
Phone
|
|
|
|
|
|
|
|
|
|
|
|
|
Figure 1.6 – Email column dropped
SQL constraints
Constraints are a set of predefined rules that a database administrator or table creator defines to ensure that the table and its data are unique and clean. They can be defined at the table or column level. Some commonly used constraints include the following:
- The UNIQUE constraint: Ensures all columns have a unique value.
- The NOT NULL constraint: Ensures all columns have some value.
- PRIMARY KEY: A unique value at each row level.
- FOREIGN KEY: A relational key, which is a copy of a primary key column from a different table within the same database. A foreign key is used to ensure that the communication between two tables is maintained and never destroyed by accidentally dropping the foreign key column. More on this will be discussed in the following sections.
SQL keys
In RDBMSes, the term “relational” refers to the relationship between tables in a database that allows for the retrieval of necessary data. This relationship is established through the use of keys such as primary keys, foreign keys, and candidate keys.
Customer_Id
|
Name
|
Passport_Number
|
DOB
|
1
|
Adam
|
L08790
|
7/11/1990
|
2
|
James
|
L08791
|
8/6/1992
|
3
|
Paul
|
L08792
|
3/4/1993
|
Figure 1.7 – Candidate keys
Candidate keys
A candidate key is a set of one or more columns that can uniquely identify a record in a table. It can be used as a primary key as it cannot be null and must be unique. A candidate key is a super key with no repeating attributes. Out of all the candidate keys that are possible for a table, only one key can be used to retrieve unique rows from the table. This key is called the primary key. It is important to remember that the candidate key used as a primary key should be unique and have non-null attributes.
In the customer table shown in Figure 1.8, we have one candidate key – Passport_Number – which is unique, whereas the Customer_ID column is a primary key.
Customer_ID
|
1
|
2
|
3
|
Name
|
Adam
|
James
|
Paul
|
Passport_Number
|
L08790
|
L08791
|
L08792
|
DOB
|
7/11/1990
|
8/6/1992
|
3/4/1993
|
Figure 1.8 – Candidate key
Primary keys
A primary key is an attribute used to uniquely identify a row in a table. In the table mentioned previously, Customer_ID would be the primary key, while Passport_Number would not be a primary key as it contains confidential information.
Customer_ID
|
1
|
2
|
3
|
➡
|
Primary Key
|
Name
|
Adam
|
James
|
Paul
|
Passport_Number
|
L08790
|
L08791
|
L08792
|
|
|
DOB
|
7/11/1990
|
8/6/1992
|
3/4/1993
|
|
|
Figure 1.9 – Primary key
Alternate keys
A candidate key that hasn’t already been assigned as a primary key is known as an alternate key that can uniquely identify a row. In the following table, Customer_ID is the primary key and Passport_Number is a candidate key; therefore, License_Number can be an alternate key as it can also uniquely identify a customer.
Customer_ID
|
1
|
2
|
3
|
➡
|
Alternate Key
|
Name
|
Adam
|
James
|
Paul
|
Passport_Number
|
L08790
|
L08791
|
L08792
|
DOB
|
7/11/1990
|
8/6/1992
|
3/4/1993
|
License_Number
|
L01YZ
|
L02ER
|
L03PX
|
Figure 1.10 – Alternate key
Super keys
If more than one attribute is assigned as the primary key and it still uniquely identifies a row within a table, then it becomes a super key.
For example, Customer_ID
+ Name
is a super key, as the name of a customer may not be unique, but when combined with Customer_ID
, then it becomes unique.
Customer_ID
|
1
|
2
|
3
|
}➡
|
Super Key
|
Name
|
Adam
|
James
|
Paul
|
Passport_Number
|
L08790
|
L08791
|
L08792
|
|
|
DOB
|
7/11/1990
|
8/6/1992
|
3/4/1993
|
|
|
Figure 1.11 – Super key
Composite keys
If the table does not have an individual attribute that can qualify as a candidate key, then we need to select two or more columns to create a unique key, which is known as a composite key.
For example, if we do not have a customer ID or passport number, we can use a composite primary key consisting of the full name and date of birth.
There is still the possibility of duplicate rows in this scenario if both the name and date of birth have the same value.
Customer_ID
|
1
|
2
|
3
|
}➡
|
Composite Key
|
Name
|
Adam
|
James
|
Paul
|
Passport_Number
|
L08790
|
L08791
|
L08792
|
DOB
|
7/11/1990
|
8/6/1992
|
3/4/1993
|
Figure 1.12 – Composite key
Surrogate key
A surrogate key is a key that is generated by the system and has no business meaning. The values generated for the keys are sequential and act as a primary key. When we don’t have a proper primary key for the table, a surrogate key is generated to uniquely identify the data. In such scenarios, the surrogate key becomes the primary key.
For example, let’s consider that we are creating a database for addresses.
Address_ID
|
Street_Name
|
City
|
State
|
Zipcode
|
1
|
Jefferson St
|
Dallas
|
Texas
|
38256
|
2
|
Thomas St
|
Memphis
|
Tennessee
|
38257
|
3
|
James St
|
Chicago
|
Illinois
|
33189
|
4
|
Perkins St
|
Miami
|
Florida
|
23487
|
Figure 1.13 – Address table with Address_ID as a surrogate key
Here Address_ID
is the surrogate key as it is generated systematically and is used to uniquely identify the rows. It holds no business value.
Primary keys in detail
A primary key is a specially created attribute to uniquely identify each record in a table and has the following features:
- It holds unique values for each record/row
- It can’t have null values
The primary key and foreign key are core principles used to establish relationships between tables in a relational database.
A few examples of primary keys include Social Security Number (SSN), passport number, and driver’s license number. These are used to uniquely identify a person.
To ensure unique identification, a composite primary key is sometimes created using a combination of columns.
Foreign keys in detail
A foreign key is a column or combination of columns that creates a link between data in one table (the referencing table) and another table that holds the primary key values (the referenced table). It creates cross-references between tables by referencing the primary key (unique values) in another table.
The table that has the primary key is called the parent table or referenced table, and the table that has a foreign key is called the referencing table or child table.
The column that has a foreign key must have a corresponding value in its related table. This ensures referential integrity.
The FOREIGN KEY
constraint is used to prevent actions that would destroy links between tables. Essentially, it ensures that if a column value A refers to a column value B, then column value B must exist.
For example, let’s consider the Orders
table and the Customers
table. In this case, the customer_ID
column in the Orders
table refers to the Customer_ID
column in the Customers
table. Here are some key points regarding the relationship between these tables:
- Any value that is updated or inserted in the
customer_id
attribute (foreign key) of the Orders
table must exactly match a value in the Customer_ID
attribute (primary key) of the Customers
table or be NULL
to ensure the relation is maintained.
- The values in the
ID
attribute (the Customer_ID
primary key) of the Customers
table that are referencing the customer_ID
attribute (foreign key) of the Orders
table cannot be updated or deleted unless cascading rules are applied (cascading actions will be discussed shortly). However, the values of ID
in the Customers
table that are not present in the Customer_ID
attribute of the Orders
table can be deleted or updated.
Figure 1.14 – Foreign key illustration
Cascading actions
Cascading actions in SQL refer to the automatic propagation of changes in a parent table to related child tables through foreign key constraints. It enables actions such as deletion or modification in the parent table to automatically affect corresponding records in the child tables. This ensures data integrity and simplifies data management by reducing manual updates.
DELETE CASCADE
This ensures that when a row with a primary key is deleted from a parent table, the corresponding row in the child table is also deleted.
UPDATE CASCADE
This ensures that when a referencing row that is a primary key is updated in a parent table, then the same is updated in the child table as well.
Using a foreign key eliminates the need to store data repeatedly. Since we can directly reference primary keys in another table, we don’t have to store that data again in every table.
Take the following example:
Figure 1.15 – Customers table
Order_ID
|
Customer_ID
|
OrderDate
|
ShippingDate
|
ShippingStatus
|
O1
|
1
|
1/1/2022
|
1/7/2022
|
Delivered
|
O2
|
1
|
9/1/2022
|
9/4/2022
|
In Progress
|
O3
|
2
|
12/20/2022
|
12/31/2022
|
Not Started
|
O4
|
3
|
8/15/2022
|
8/20/2022
|
Delivered
|
O5
|
4
|
5/31/2022
|
5/31/2022
|
Delivered
|
Figure 1.16 – Orders table
As you can see, Customer_ID
in the Orders
table is a foreign key that can be used to establish a connection between the Orders
and Customers
tables. This allows us to retrieve customer details from the Orders
table and vice versa.
Database relationships
Database relationships are used to build well-defined table structures and establish relationships between different tables. With the correct relationships, it helps to standardize data quality and eliminate data redundancy.
Different types of database relationships include the following:
- One-to-one relationships
- One-to-many relationships
- Many-to-many relationships
A database entity can be a customer, product, order, unit, object, or any other item that has data stored in the database. Typically, entities are represented by tables in the database.
An entity relationship diagram, commonly known as an ER diagram or ERD, is a flowchart that illustrates how different entities/tables are related to each other.
One-to-many relationships
One-to-many is the most common type of relationship, in which one record in one entity/table can be associated with multiple records in another entity/table.
Figure 1.17 – One-to-many relationship
Example
Let’s consider the Customers
and Orders
tables. In this case, one customer can have multiple orders, establishing a one-to-many relationship between them. Customer ID
in the Customers
table serves as the primary key and is associated with unique values, representing each customer uniquely. On the other hand, Customer ID
in the Orders
table acts as the foreign key and can have multiple instances, indicating that multiple orders can be associated with the same customer.
In this case, a single customer ordered multiple products, creating a one-to-many relationship where each product was associated with one customer.
One-to-one relationships
A one-to-one relationship is a type of relationship between two tables in which one record in one table is associated with only one record in another table.
Figure 1.18 – One-to-one relationship
Example
In a school database, each student is assigned a unique student_ID, and each student_ID is linked to only one student.
In a country database, each country is associated with one capital city, and each capital city is associated with only one country.
Many-to-many relationships
A many-to-many relationship is one in which multiple records in one table are associated with multiple records in another table.
Figure 1.19 – Many-to-many relationship
Example
Let’s consider the Customers
and Products
tables. Customers can purchase multiple products, and each product can be purchased by different customers.
In a relational database, a direct many-to-many relationship is typically not permitted between two tables. For example, in a bank transaction database with multiple invoices having the same number, it can be difficult to map the correct invoice and retrieve the necessary information when a customer makes an inquiry. To address this issue, many-to-many relation tables are often broken down into two one-to-one relationship tables by introducing a third table known as a join table. The join table holds the primary key of both tables as a foreign key and may also contain other necessary attributes.
For example, let’s consider the Products
and Distributor
tables.
Here we have the following attributes in each table.
The attributes for the Distributor
table are as follows:
id
: The distributor’s ID is the primary key used to identify the distributor
distributors_name
: The distributor’s name
distributors_address
: The distributor’s address
distributors_city
: The city where the distributor is located
distributors_state
: The state where the distributor is located
The attributes for the Products
table are as follows:
id
: The product’s ID is the primary key used to identify the product ID
product_name
: The product’s name
product_description
: The product’s description
price
: The product’s price per unit
Multiple products can be ordered by multiple distributors, and each distributor can order different products. Therefore, this information needs to be transformed into a relational database model, which would look something like this:
Figure 1.20 – Relational database model
This is known as a join table, and it contains two attributes that serve as foreign keys referencing the primary keys of the original tables.
id
references the distributor ID from the distributor table.
product_id
references the product_id
column in the product
table.
These two together serve as the primary key for this table.
However, this information is not sufficient. It would be better to add more attributes to this table.
Figure 1.21 – Adding more attributes to the relational database model
So, we have now converted it into a relational database model and have the data in a cleaner form. Additionally, I have changed the table name to make it more aligned with the data and named it Orders
. We have also added the following additional attributes to the table:
order_id
: Contains the unique order ID for each order placed by the customer
distributor_id
: The distributor’s ID is the unique identifier for each distributor
product_id
: The unique identifier for each product, which can be ordered
order_date
: The order date
quantity
: The number of units ordered
total_price
: The total price of the orders
Note
There are two ways that you can create a join table.
The first way to create a join table is by having the table contain only foreign keys that reference other tables. This is the most common and straightforward approach. In this case, the join table primarily serves as a mapping table, capturing the relationships between the records in the two tables it connects. For example, let’s consider two tables: “Students
” and “Courses
.” If multiple students can enroll in multiple courses, we need a way to represent this many-to-many relationship. We can create a join table named “Enrollments
” that contains foreign keys referencing the primary keys of both the “Students” and “Courses” tables. Each row in the “Enrollments” table represents a specific student’s enrollment in a particular course. The second way to create a join table involves adding additional attributes, effectively turning the join table into a new entity. In this approach, the join table gains its own meaning and significance beyond simply representing the relationship between the two tables. For instance, let’s consider a scenario where we have two tables: “Employees
” and “Projects
.” Instead of just recording the relationships between employees and projects, we may want to include additional attributes in the join table, such as the date an employee was assigned to a project or their role on that project. In this case, the join table, which we can call “Assignments,” becomes a separate entity that stores specific information about the association between employees and projects. By adding these additional attributes, the join table becomes more than just a mapping table and can provide more context or details about the relationship between the two tables it connects.
You should always model tables according to the requirements, and there are no hard and fast rules for doing so.