1. SQL Basics
This chapter covers the very basic concepts of SQL that will get you started with writing simple commands. By the end of this chapter, you will be able to identify the difference between structured and unstructured data, explain the basic SQL concepts, create tables using the CREATE statement, and insert values into tables using SQL commands.
The vast majority of companies today work with large amounts of data. This could be product information, customer data, client details, employee data, and so on. Most people who are new to working with data will do so using spreadsheets. Software such as Microsoft Excel has many tools for manipulating and analyzing data, but as the volume and complexity of the data you're working with increases, these tools may become inefficient.
A more powerful and controlled way of working with data is to store it in a database and use SQL to access and manipulate it. SQL works extremely well for organized data and can be used very effectively to insert, retrieve, and manipulate data with just a few lines of code. In this chapter, we'll get an introduction to SQL and see how to create databases and tables, as well as how to insert values into them.
For most companies, storing and retrieving data is a day-to-day activity. Based on how data is stored, we can broadly classify data as structured or unstructured. Unstructured data, simply put, is data that is not well-organized. Documents, PDFs, and videos fall into this category—they contain a mixture of different data types (text, images, audio, video, and so on) that have no consistent relationship between them. Media and publishing are examples of industries that deal with unstructured data such as this.
In this book, our focus will be on structured data. Structured data is organized according to a consistent structure. As such, structured data can be easily organized into tables. Thanks to its consistent organization, working with structured data is easier, and it can be processed more effectively. Tables are collections of entities or tuples (rows) and attributes (columns).
For example, consider the following table:
For each row, there is a clear relationship; a given student takes a particular subject and achieves a specific score in that subject. The columns are also known as fields, while the rows are known as records.
Data that is presented in tabular form can be stored in a relational database. Relational databases, as the name suggests, store data that has a certain relationship with another piece of data. A Relational Database Management System (RDBMS) is a system that's used to manage relational data. SQL works very well with relational data. Popular RDBMSs include Microsoft SQL Server, MySQL, and Oracle. Throughout this book, we will be working with MySQL. We can use various SQL commands to work with data in relational databases. We'll have a brief look at them in the next section.
An Overview of Basic SQL Commands
SQL (often pronounced "sequel") stands for Structured Query Language. A query in SQL is constructed using different commands. These commands are classified into what are called sublanguages of SQL. Even if you think you know them already, give this a read to see if these seem more relatable to you. There are five sublanguages in SQL, as follows:
- Data Definition Language (DDL): As the name suggests, the commands in this category define a table, a database, or anything within them. Any command that creates something in SQL is part of DDL; the CREATE statement used later in this chapter is one example.
The following table shows the DDL commands:
- Data Manipulation Language (DML): In DML, you do not deal with the containers of data but the data itself. When you must update the data itself, or perform calculations or operations on it, you use the DML. The commands that form part of this language (or sublanguage) include INSERT, which is covered later in this chapter.
DML allows you to work on the data without modifying the container or stored procedures. A copy of the data is created and the operations are performed on this copy of the data. These operations are performed using the DML. The following table shows the DML commands:
- Data Control Language (DCL): When we sit back and think about what the word control means in the context of data, we think of allowing and disallowing actions on the data. In SQL terms, or in terms of data, this is about authorization. Therefore, the commands that fall in this category, such as REVOKE, control access to the data. The following table explains them:
- Transaction Control Language (TCL): Anything that makes a change to the data is called a transaction. When you perform a data manipulation operation, the manipulation happens to data in a temporary location and not the table or database itself. The result is shown after the operation. In order to write or remove something from the database, you need to use a command to ask the database to update itself with the new content. Applying these changes to the database is called a transaction and is done using the TCL. The commands associated with this language include ROLLBACK. The following table explains these commands in detail:
- Data Query Language (DQL): The final part of this section regarding the classification of commands is the DQL. This is used to fetch data from the database with the SELECT command. It's explained in detail in the following table:
We'll look at these queries in detail in later chapters.
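The five sublanguages can be sketched side by side in one short script. This is only an illustration, assuming a MySQL server and an account with administrative privileges; the database `sublanguage_demo`, the table `scores`, the user `demo_reader`, and its password are hypothetical names that do not appear elsewhere in this book.

```sql
-- Hypothetical scratch database; every name here is illustrative only.
CREATE DATABASE IF NOT EXISTS sublanguage_demo;
USE sublanguage_demo;

-- DDL: define a container for data.
CREATE TABLE IF NOT EXISTS scores (
    student VARCHAR(30),
    score   INT,
    PRIMARY KEY (student)
);

-- TCL + DML: open a transaction, change data, then make the change permanent.
START TRANSACTION;
INSERT INTO scores (student, score) VALUES ('Sample Student', 80);
UPDATE scores SET score = 85 WHERE student = 'Sample Student';
COMMIT;

-- TCL: a change made inside a transaction can be undone with ROLLBACK.
START TRANSACTION;
DELETE FROM scores WHERE student = 'Sample Student';
ROLLBACK;

-- DQL: read the data back; the row should still be present.
SELECT student, score FROM scores;

-- DCL: allow, then disallow, read access for a hypothetical user.
CREATE USER IF NOT EXISTS 'demo_reader'@'localhost' IDENTIFIED BY 'Change-Me-123!';
GRANT SELECT ON sublanguage_demo.scores TO 'demo_reader'@'localhost';
REVOKE SELECT ON sublanguage_demo.scores FROM 'demo_reader'@'localhost';
```

The script inserts a row, so running it a second time would be rejected by the primary key; drop the scratch database first if you want to repeat it.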
An interesting point to note is that the create database command is not part of the regular SQL standard. However, it is supported by almost all database products today. The create database statement is straightforward. You just need to issue a database name along with the command, followed by a semicolon.
Let's start by creating a simple example database. We'll call it studentdemo. To create the studentdemo database with the default configuration, use the following command:
create database studentdemo;
To run this statement, click the Execute button (shaped like a lightning bolt).
In the Action Output pane, the successful completion of a command will appear. You will also be able to see the newly created database in the Schemas tab.
SQL keywords are not case sensitive. This implies CREATE DATABASE studentdemo; is the same as create database studentdemo;.
We cannot have multiple databases with the same name. If you try to run the query again, you'll get an error reporting that the database already exists.
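As an aside, one way to make the statement safe to rerun is sketched here, assuming MySQL. The IF NOT EXISTS clause and the SHOW DATABASES statement are MySQL features that this chapter does not otherwise cover.

```sql
-- Creates the database only when it is missing, so rerunning the script
-- produces a warning rather than an error.
CREATE DATABASE IF NOT EXISTS studentdemo;

-- Lists the databases visible to the current user; studentdemo should appear.
SHOW DATABASES;
```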
The Use of Semicolons
As you may have noticed, there's a semicolon (;) at the end of the statement, indicating that the statement ends there. Whether it is required depends on the database system you are using; some of them require a semicolon at the end of each statement and some don't, but you can still add it without worrying about the results.
In general, it's good practice to use a semicolon at the end of a statement as it could play a significant role when we have multiple SQL statements or while writing a function or a trigger. This will be explained in more detail in the upcoming chapters. Throughout this book, we will use semicolons at the end of each statement.
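A minimal sketch of why the terminator matters when several statements are submitted together. The column alias `label` is purely illustrative.

```sql
-- Two statements sent in one batch; each semicolon marks where one ends
-- and the next begins.
SELECT 'first statement' AS label;
SELECT 'second statement' AS label;
```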
Data Types in SQL
Like every other programming language, SQL also has data types. Every piece of data that is entered into a database must comply with the data types and their formats. This implies that any data that you store is either a number, a character, or some other data type. Those are the basic data types. There are some special data types as well.
For instance, "00:43 on Monday, 1 April 2019" is a combination of letters, numbers, and punctuation. However, when we see something like this, we immediately recognize it as a date and time. A data type is the type of value that can be stored in a system. Examples of data types include STRING, as well as combinations of simpler types.
Since there is a large number of data types, most languages classify them. Here, we will go through some of the most common ones. The idea here is to get you acquainted with the data types, not to give you a complete rundown of them, as this would overwhelm you with hardly any significant returns. Moreover, once the concept is clear, you will be able to adapt to the rest of the data types with little effort.
In the interest of better data integrity and modeling, it is critical to select the right data type for the situation. It may seem trivial when the database is small, but with a larger database, it becomes difficult to manage. As a programmer, it is your responsibility to model your data in the right way.
In order to keep this simple, let's broadly classify the data types into five categories:
- Numeric data types: Numeric data types include everything that involves numbers, such as integers (small/big), floating- and fixed-point decimal numbers, and real numbers. Here are some of the most common ones:
- Fixed and varying length characters and text: Performance is key when selecting either fixed- or variable-length characters. When you know that a certain piece of data will always have a fixed number of characters, use the fixed width. For example, if you know that the employee code will always be 4 characters long, you can use CHAR. If a certain column holds only six characters, you are better off specifying that width so that the space used will be limited. By doing this, you will get better performance by not using up more resources than required. When you are unsure of the number of characters, you don't want to be limited by a fixed width, so you should ideally use character types of varying lengths. An example of this is a person's first name, where the length of the name is not fixed.
You can also store values shorter than the declared width in a fixed-width CHAR column, although VARCHAR is the type designed for varying lengths. For instance, in a CHAR field that accepts up to six characters, you can enter data that is three characters long. However, the three unused character positions will be right-padded, meaning that they are stored as actual spaces. When the data is retrieved, these trailing spaces will be trimmed. If you don't want them to be trimmed, you can set a flag in SQL that tells SQL to preserve the spaces during retrieval. There are also situations where you would remove spaces explicitly using the TRIM string function, for example, to enhance data security.
Unicode character and string data types are different. They are prefixed with N, such as NTEXT. Also, note that not all SQL implementations support Unicode data types.
Unicode character data types consume twice the storage space compared to non-Unicode character data types.
The other character-based data type is TEXT. This can store textual data up to a certain limit, which may vary with the system. For instance, MS SQL supports text up to 2 GB in size.
- Binary data types: Binary forms of data are also allowed in SQL. For instance, an IMAGE would be an object of binary form.
- Miscellaneous data types: Miscellaneous data types include most of the now-popular data types, such as Binary Large Object (BLOB), Character Large Object (CLOB), XML, and JSON. We have included DATETIME in this class as well.
Character and binary large objects include types such as files. For instance, a film stored on Netflix is a binary large object. So would be an application package such as an EXE or an MSI, or other types of files such as PDFs.
SQL Server 2016 supports JSON, which it stores in character data types rather than in a dedicated type; the CHAR family of types is used for non-Unicode strings.
MySQL version 5.7.8 supports a native JSON data type.
- Proprietary types: In the real world, there is hardly a pure SQL implementation that is favored by enterprises. Different businesses have different requirements, and to cater to these requirements, SQL implementations have created their own data types. For instance, Microsoft SQL Server has MONEY as a data type.
Not all data types are supported by all vendors. For instance, Oracle's implementation of SQL does not support DATETIME, while MySQL does not support CLOB. Therefore, the flavor of SQL is an important consideration when designing your database schema.
As we mentioned previously, this is not an exhaustive list of all data types. Your flavor of SQL will have its own supporting set of data types. Read the documentation that comes with the product kit to find out what it supports—as a programmer or a SQL administrator, it is you who decides what is necessary. This book will empower you to do that.
The size limits illustrated in Figure 1.9 are only indicative. Just as different flavors of databases may have different data types, they may have different limits as well. The documentation that accompanies the product you plan to use will have this information.
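To make the categories above concrete, here is a small sketch that declares one column for several of the types discussed and then inspects how fixed- and variable-width character columns treat trailing spaces. It assumes MySQL 5.7.8 or later with default settings; the database `datatype_demo`, the table `type_samples`, and all column names are hypothetical.

```sql
-- Hypothetical scratch database so that studentdemo is left untouched.
CREATE DATABASE IF NOT EXISTS datatype_demo;
USE datatype_demo;

CREATE TABLE IF NOT EXISTS type_samples (
    sample_id    INT,            -- numeric: whole numbers
    price        DECIMAL(8, 2),  -- numeric: fixed-point, two digits after the point
    fixed_code   CHAR(6),        -- fixed-width character data
    varying_code VARCHAR(6),     -- variable-width character data
    created_at   DATETIME,       -- date and time in a single value
    attributes   JSON            -- native JSON type (MySQL 5.7.8 or later)
);

-- The VARCHAR value deliberately carries two trailing spaces.
INSERT INTO type_samples
    (sample_id, price, fixed_code, varying_code, created_at, attributes)
VALUES
    (1, 19.99, 'abc', 'abc  ', '2019-04-01 00:43:00', '{"color": "red"}');

-- Compare the stored lengths, and the effect of TRIM on the VARCHAR value.
SELECT
    CHAR_LENGTH(fixed_code)         AS fixed_len,
    CHAR_LENGTH(varying_code)       AS varying_len,
    CHAR_LENGTH(TRIM(varying_code)) AS trimmed_len
FROM type_samples
WHERE sample_id = 1;
```

With default settings, the CHAR value should come back without its padding (length 3), while the VARCHAR value should keep the trailing spaces that were entered (length 5) until TRIM removes them. Other database products may behave differently.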
Creating Simple Tables
After creating the database, we want to create a table. The create table statement is part of the SQL standard. The create table statement allows you to configure your table, your columns, and all your relations and constraints. Along with the create table command, you're going to pass the table name and a list of column definitions. At the minimum for every column, you must provide the column name and the data type the column will hold.
Let's say you want to add a table called Student to the previously created database, studentdemo, and you want this table to contain the following details:
- Student name: The student's full name.
- Student ID: A value to identify each student uniquely.
- Grade: Each student is graded as A, B, or C based on their performance.
- Age: The age of the student.
- Course: The course they are enrolled on.
To achieve this, we need to complete a two-step process:
- To set the current database as studentdemo, enter the following code in the new query tab:
USE studentdemo;
You can open a new query tab by clicking New Query Tab.
- Create a table named Student in studentdemo with the following columns:
create table Student (
    StudentID CHAR(4),
    StudentName VARCHAR(30),
    grade CHAR(1),
    age INT,
    course VARCHAR(50),
    PRIMARY KEY (StudentID)
);
The preceding code creates a Student table with the following columns:
- StudentID will contain four-character values. Values such as 'SSSS' are valid inputs and can be stored in this column.
- grade will contain just a single character. Values such as 'z' are valid inputs.
- StudentName will contain variable-length values, which can be 30 characters in size at most. Values such as 'Smith' are valid inputs.
- course will also contain variable-length values, which can be 50 characters in size at most.
- age will be an integer value. Values such as 345 are valid.
StudentID is defined as the primary key. This implies that all the values in the StudentID field will be unique, and no value can be null. You can uniquely identify any record in the Student table using StudentID. We will learn about primary keys in detail in Chapter 3, Normalization.
NULL is used to represent missing values.
Notice that we have provided the PRIMARY KEY constraint for StudentID because we require this to be unique.
Once your table has been created successfully, you will see it in the Schemas tab.
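The table can also be checked from a query tab instead of the Schemas tab. This is a minimal sketch, assuming MySQL; SHOW TABLES and DESCRIBE are MySQL statements that this chapter does not otherwise introduce.

```sql
USE studentdemo;

-- Lists the tables in the current database; Student should appear.
SHOW TABLES;

-- Shows each column with its data type, whether it accepts NULL,
-- and whether it is part of a key.
DESCRIBE Student;
```

In the DESCRIBE output, the StudentID row is where the effect of the PRIMARY KEY constraint should be visible, since the column is marked as a key and as not accepting NULL.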
Exercise 1.01: Building the PACKT_ONLINE_SHOP Database
In this exercise, we're going to start building the database for a Packt Online Shop, a store that sells a variety of items to customers. We will be using the MySQL Community Server in this book. The Packt Online Shop has been working on spreadsheets so far, but as they plan to scale up, they realize that this is not a feasible option, and so they wish to move toward data management through SQL. The first step in this process will be to create a database named PACKT_ONLINE_SHOP with a table for storing their customer details. Perform the following steps to complete this exercise:
- Create a database using the create database statement:
create database PACKT_ONLINE_SHOP;
- Switch to this database:
use PACKT_ONLINE_SHOP;
- Create the Customers table:
create table Customers (
    FirstName varchar(50),
    MiddleName varchar(50),
    LastName varchar(50),
    HomeAddress varchar(250),
    Email varchar(200),
    Phone varchar(50),
    Notes varchar(250)
);
Like varchar, nvarchar is a variable-length data type; however, in nvarchar, the data is stored in Unicode, not in ASCII. Therefore, columns defined with nvarchar can contain values in other languages as well. nvarchar requires 2 bytes per character, whereas varchar uses 1 byte.
- Execute the statement by clicking the Execute button:
- Review the table by right-clicking the table in the Schemas tab and clicking Select Rows - Limit 1000 in the contextual menu:
This runs a simple Select query. You will learn about the Select statement in Chapter 4, The SELECT Statement. The top 1,000 rows are displayed. Since we have not inserted values into the table yet, we are only able to view the column headers.
If you are working on Microsoft SQL Server, you can do this by right-clicking the table in the Object Explorer window and then selecting Select Top 1000 Rows.
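The menu action used in the review step can be approximated by typing a query yourself. The following is a sketch of a roughly equivalent statement, assuming MySQL; the exact text that the tool generates may differ.

```sql
USE PACKT_ONLINE_SHOP;

-- Returns at most 1,000 rows; on an empty table only the column headers show.
SELECT * FROM Customers LIMIT 1000;

-- Lists the columns and their data types without reading any rows.
DESCRIBE Customers;
```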
In the next section, we will look at inserting values into tables.
Populating Your Tables
Once the table has been created, the next logical step is to insert values into the table. To do this, SQL provides the INSERT statement. Let's try adding a row of data to the Student table of the studentdemo database that we created previously.
Here is the SQL statement to achieve this. First, switch to the studentdemo database and enter the following query:
USE studentdemo;
INSERT INTO Student (StudentID, StudentName, grade, age, course)
VALUES ('S001', 'Prashanth Jayaram', 'A', 36, 'Computer Science');
If you check the contents of the database after running this query, you should see something like this:
To see the contents of this database, follow the process you used in the earlier exercises. Right-click the table and choose Select Rows - Limit 1000.
Adding single rows like this in multiple queries will be time-consuming. We can add multiple rows by writing a query like the following one:
INSERT INTO Student (StudentID, StudentName, grade, age, course)
VALUES
    ('S002', 'Frank Solomon', 'B', 35, 'Physics'),
    ('S003', 'Rachana Karia', 'B', 36, 'Electronics'),
    ('S004', 'Ambika Prashanth', 'C', 35, 'Mathematics');
The preceding query looks like this on the Query tab.
When you run the query, all three rows will be added with a single query:
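The inserted rows can also be checked with a query rather than through the menu. This is a minimal sketch, assuming both INSERT statements above have been run exactly once; the alias `student_count` is illustrative.

```sql
USE studentdemo;

-- Lists every row in a predictable order.
SELECT StudentID, StudentName, grade, age, course
FROM Student
ORDER BY StudentID;

-- Counts the rows; the single-row insert plus the three-row insert
-- should give four rows in total.
SELECT COUNT(*) AS student_count FROM Student;
```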
Exercise 1.02: Inserting Values into the Customers Table of the PACKT_ONLINE_SHOP Database
Now that we have the Customers table ready, let's insert values into the table using a single query. We have the data from an already existing Excel spreadsheet. We will be using that data to write our query. Here is what the Excel file looks like:
You can find the CSV format of the file here: https://packt.live/369ytTu.
To move this data into the database, we will need to perform the following steps:
- Switch to the PACKT_ONLINE_SHOP database:
USE PACKT_ONLINE_SHOP;
- Insert the values based on the Excel spreadsheet provided. Wherever we have blank data, we will use NULL:
INSERT INTO Customers (FirstName, MiddleName, LastName, HomeAddress, Email, Phone, Notes)
VALUES
    ('Joe', 'Greg', 'Smith', '2356 Elm St.', '[email protected]', '(310) 555-1212', 'Always gets products home delivered'),
    ('Grace', 'Murray', 'Hopper', '123 Compilation Street', '[email protected]', '(818) 555-3678', 'Compiler pioneer'),
    ('Ada', NULL, 'Lovelace', '22 Algorithm Way', '[email protected]', '(717) 555-3457', 'First software engineer'),
    ('Joseph', 'Force', 'Crater', '1313 Mockingbird Lane', '[email protected]', '(212) 555-5678', 'Works everyday'),
    ('Jacqueline', 'Jackie', 'Cochran', '1701 Flightspeed Avenue', '[email protected]', '(717) 555-3457', 'Researcher'),
    (NULL, 'Paul', 'Jones', '126 Bonhomme Richard Ave.', '[email protected]', '(216) 555-6232', 'Admiral');
- When you execute the query and check the contents of the Customers table, you should see the following output.
With this, you have successfully populated the Customers table.
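To confirm that the blank spreadsheet cells were stored as NULL, the rows with missing name parts can be listed. This is a minimal sketch, assuming the INSERT statement from this exercise has been run; the IS NULL test is standard SQL, although it is not covered until later in the book.

```sql
USE PACKT_ONLINE_SHOP;

-- NULL must be tested with IS NULL; a comparison such as MiddleName = NULL
-- never matches any row.
SELECT FirstName, MiddleName, LastName
FROM Customers
WHERE FirstName IS NULL
   OR MiddleName IS NULL;
```

With the six rows from this exercise, the query should return the Lovelace row (no middle name) and the Jones row (no first name).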
Activity 1.01: Inserting Values into the Products Table in the PACKT_ONLINE_SHOP Database
Now that we've migrated the customer data into the database, the next step is to migrate the product data from the Excel spreadsheet to the database. The data to be entered into the database can be found at https://packt.live/2ZnJiyZ.
Here is a screenshot of the Excel spreadsheet:
- Create a table called
- Create the columns as present in the Excel sheet.
- Use the INSERT statement to input the required data into the table.
The solution for this activity can be found via this link.
In this chapter, we had a look at the different types of data and how data is stored in relational databases. We also had a brief look at the different commands available in SQL. We specifically focused on creating databases and tables within the databases, as well as how we can easily insert values into tables.
In the next chapter, we will look at how we can modify the data, the properties of tables, and databases, and build complex tables.