Databases play a vital role in our life. Almost every application relies on a database for its operation. A database could be a flat file with a few bytes of data or a more complex system holding petabytes of data. Designing a database model is essential for an application to work effectively. Since the dawn of database systems, SQL databases ruled this technology space. SQL databases work on a relational database model that uses a tabular structure in which each element has a defined relationship with the other. However, recently, new generations of NoSQL databases have been created to combat the rising demands of data, and they are seeing widespread adoption from both technology giants and startups alike. This new wave of NoSQL databases has created a new school of thought, which requires a radically different thought process when compared to SQL databases.
This chapter explores the major differences between SQL and NoSQL databases, and provides an insight into what it takes to think in the NoSQL way. Here, you will also learn how to make decisions based on reliability, performance, and complexity.
A new application almost always starts with data modeling. In traditional data modeling for relational databases, this process is known as data normalization, which focuses on removing data redundancy and minimizing data dependency. The modeling formulates a series of tables, which are defined by its schema (rows and columns of the table).
Why normalization? Data normalization helps to reduce dependency and redundancy of data, which makes it easier to update the data, while maintaining its consistency.
Normalization ensures that the update is done only once for an entity. Let's take a typical example to showcase how data normalization works:
ID |
Name |
Subject |
---|---|---|
1 |
John |
Biology |
2 |
Smith |
Math |
3 |
John |
Math |
4 |
Scott |
Biology |
5 |
Smith |
Physics |
In this example, the data in the Name field is repeating as one student can take multiple subjects. In normalization, it is required to have data that is tabular with each row-column intersection having one value. The redundancy of data can create inconsistencies in the data during updates, where there is a real possibility of the same data having different values. To make the preceding table normalized, we can consider adding another column for extra subjects, which looks something like this:
ID |
Name |
Subject_1 |
Subject_2 |
---|---|---|---|
1 |
John |
Math |
Biology |
2 |
Smith |
Physics |
Math |
3 |
Scott |
Biology |
Null |
The preceding structure creates its own problems. For instance, if you want to know all the students who are taking a Math class, you have to check two columns. The problem gets more complex if there is a student who is taking more than two subjects, which requires you to add a column to the table. Updating and deleting a record becomes more problematic.
In this scenario, to maintain consistency and to avoid redundancy, we have to create two tables. The first table will contain all the student information as follows:
Student_ID |
Name |
---|---|
1 |
John |
2 |
Smith |
3 |
Scott |
The second table will contain the information related to the subjects offered as follows:
Subject_ID |
Subject_name |
---|---|
1 |
Biology |
2 |
Math |
3 |
Physics |
Now, we have to create a map of uniquely identifiable IDs in the form of a table that shows which student is taking what subjects:
Student_ID |
Subject_ID |
---|---|
1 |
1 |
1 |
2 |
2 |
2 |
2 |
3 |
3 |
1 |
In the preceding data model, we do not have to worry about data inconsistencies. This allows us to manipulate the data easily, and allows insertions and deletions of data without having to worry about artifacts. However, a problem arises when reading the data. If you want to know the name of the student who is taking Math, the SQL query will be as follows:
SELECT a.Name from STUDENTS a JOIN STU_SUB_MAP b ON a.student_id=b.student_id JOIN SUBJECTS c ON c.Subject_id = b.Subject_id where c.subject_name='Math'
The result of the query will look like the following table:
Name |
---|
John |
Smith |
Even though we have got the correct result from the database, the problem lies in the amount of work the database needs to do to get this result. It has to join three different tables, read the data, and display the results. The JOIN
operation is a relatively expensive operation, which can be mitigated to some extent by using indexes. However, as the table size increases, the amount of time the database takes to compute also increases.
Though normalization gives us highly structured tables with the utmost consistency, sometimes we need to keep the data in a denormalized format. This is usually necessary in scenarios where performance takes precedence over structure. A denormalized structure is more beneficial if your application has much more reads than writes.
The main advantage of NoSQL is that there is no concept of normalization. This is the reason why you get more performance from a NoSQL database when seen against a normalized SQL database. There is a trade-off in that you are sacrificing data consistency in the NoSQL database, but the benefits achieved in doing so are higher. Clearly, NoSQL databases are built with one central feature, which is performance. So in order to achieve performance, the data consistency and reliability are sacrificed at various levels.
Redis, as explained by the creator, is a key-value store. The data is always stored as key and value pairs. This means a lot of work needs to go into data modeling. There are no easy normalization steps like in SQL that instantly give us the tables. The data modeling needs to account for all the application data and closely binds to requirements of the application. In Redis, the key should be a string, but the value can be of many types, such as strings, lists, sets, hashes, and many more.
Note
Redis is not a SQL replacement. It works completely out of memory and is best suited for faster writes and random reads. In most cases, it cannot be used as the only data store in your application stack.
Revisiting the same student subject schema, there are many possibilities to store the same data in Redis. It also offers convenience to the user in terms of storing and retrieving the data. For example, we can use hashes on the student subject schema as follows:
HMSET JOHN subject1 Math subject2 Biology HMSET SMITH subject1 Math subject2 Physics HMSET SCOTT subject1 Biology
Tip
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
The above command uses the hashes data type in Redis and creates three hashes with the student name as key. If you want to get all the subjects taken by a student, you can use the following command:
HVALS SMITH
The preceding command will give the following output:
1) "Math" 2) "Physics"
A limitation with the previous structure is that it is difficult to update or delete a value from the student field due to the named field name for the hashes. We can achieve a similar result by using SET
as follows:
SADD student:JOHN Math Biology SADD student:SMITH Math Physics SADD student:SCOTT Biology
If we would like to know the subjects taken by Scott, the command is simple: SMEMBERS SCOTT
. In case we need to query for the subject, we can store the data as follows:
SADD subject:Math JOHN SMITH SADD subject:Biology JOHN SCOTT SADD subject:Physics SMITH
So, if we need to know the students taking Biology, we can use SMEMBERS Biology
.
Note
If we are interested in knowing the students who are taking both Math and Biology, we can use SINTER Math Biology
.
As we can see, data modeling on Redis entirely depends on what data we want to read back from it. However, in the case of an update or delete, we need to update both the subject set and student set.
The decision to use different data types and formats majorly depends on the access patterns of our application. We need to store denormalized data in case of reads and store partial normalized data in case of equal reads and writes to maintain consistency. The flexibility of Redis lets your application, if used correctly, take advantage of its powerful ultrafast data store. We will look into various data modeling examples in the upcoming chapters, where we will delve into more complex use case patterns. In the next chapter, we will look into how the transactions and locks work in Redis in comparison to the traditional SQL systems.