Home Data Redis Stack for Application Modernization

Redis Stack for Application Modernization

By Luigi Fugaro , Mirko Ortensi
books-svg-icon Book
eBook $29.99 $20.98
Print $36.99
Subscription $15.99 $10 p/m for three months
$10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats, plus a monthly download credit
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with video?
Stream this video
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
What do you get with Exam Trainer?
Flashcards, Mock exams, Exam Tips, Practice Questions
Access these resources with our interactive certification platform
Mobile compatible-Practice whenever, wherever, however you want
BUY NOW $10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!
eBook $29.99 $20.98
Print $36.99
Subscription $15.99 $10 p/m for three months
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats, plus a monthly download credit
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with video?
Stream this video
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
What do you get with Exam Trainer?
Flashcards, Mock exams, Exam Tips, Practice Questions
Access these resources with our interactive certification platform
Mobile compatible-Practice whenever, wherever, however you want
  1. Free Chapter
    Chapter 1: Introducing Redis Stack
About this book
In modern applications, efficiency in both operational and analytical aspects is paramount, demanding predictable performance across varied workloads. This book introduces you to Redis Stack, an extension of Redis and guides you through its broad data modeling capabilities. With practical examples of real-time queries and searches, you’ll explore Redis Stack’s new approach to providing a rich data modeling experience all within the same database server. You’ll learn how to model and search your data in the JSON and hash data types and work with features such as vector similarity search, which adds semantic search capabilities to your applications to search for similar texts, images, or audio files. The book also shows you how to use the probabilistic Bloom filters to efficiently resolve recurrent big data problems. As you uncover the strengths of Redis Stack as a data platform, you’ll explore use cases for managing database events and leveraging introduce stream processing features. Finally, you’ll see how Redis Stack seamlessly integrates into microservices architectures, completing the picture. By the end of this book, you’ll be equipped with best practices for administering and managing the server, ensuring scalability, high availability, data integrity, stored functions, and more.
Publication date:
December 2023
Publisher
Packt
Pages
336
ISBN
9781837638185

 

Introducing Redis Stack

Redis has achieved several important milestones since its inception in 2009, from taking the lead as the most popular key-value data store, according to the ranking published every month by the website DB-Engines (and the sixth among all database systems), up to establishing the record as the most downloaded container image on Docker. Not to mention that Redis has been the most loved database for five years in a row, according to the Developer Survey published by Stack Overflow in the years 2016-2021. And, for sure, you, or a friend of yours, have used it for some reason, work, or hobby.

If you are reading this book, chances are you have programmed an application using a Redis server, or at least you know what it is and what it is used for. In this chapter, we’ll recap what made Redis the most famous caching system in the world and we’ll share some anecdotes about the development undertaken by its creator, Salvatore Sanfilippo. We won’t stay long on the story of Redis, though, because this book is about application modernization. As you read through, you will discover how the original database, designed for speed and simplicity, has evolved to resolve many of the new challenges of this age, without compromising on the ease of adoption, flexibility, and, above all, speed.

Redis Stack is an extension of Redis presented in 2022, which introduces JSON, vector, and time series data modeling capabilities, all supporting real-time queries and searches. Redis Stack represents a new approach to providing a rich data modeling experience all within the same database server. It introduces features such as vector similarity search to query structured and unstructured data (for example, text, images, or audio files) and delivers probabilistic Bloom filters to efficiently resolve recurrent big data problems. Redis Stack is also a data platform that supports event-driven programming and introduces stream processing features. By the end of this chapter, you will understand what Redis Stack is and how it enhances the Redis server with many new capabilities. Above all, you will learn the motivation behind Redis Stack and why multi-model databases can increase the speed of technological innovation for organizations of all sizes. In this chapter, we are going to cover the following topics:

  • Exploring the history of Redis
  • The open source project
  • From key-value to multi-model real-time databases
  • Redis Stack deployment types
 

Technical requirements

To follow along with the examples in the chapter, you will need the following:

  • Redis Stack Server version 7.2 or later installed on your development environment. Alternatively, you can create a free Redis Cloud subscription to achieve a free plan and use a managed Redis Stack database. Refer to Chapter 3, Getting Started with Redis Stack.
  • The dataset used in the examples – a conversion of the rows in the popular MySQL World database to Redis Hash data types. Find and download it from this book’s repository if you’d like to test the examples that we propose in this chapter: https://github.com/PacktPublishing/Redis-Stack-for-Application-Modernization.
 

Exploring the history of Redis

Redis was conceived and designed in 2009 by the Italian software engineer Salvatore Sanfilippo as a solution to scaling LLOOGG, an online analytics server co-founded with Fabio Pitrola that empowered web admins to track user activities. Challenged by the scalability limitations of MySQL, Salvatore decided to rethink the concept of key-value storage and design something that would (admittedly) be different from Memcached, while preserving its simplicity and speed. The first beta release was shared on Google Code on February 25, 2009. A few months later, in September 2009, the first stable release, Redis 1.0, was published as a tar package of less than 200 KB.

Redis has been designed to offer an alternative for problems where relational databases (RDBMSs) are not a good fit because there is something wrong if we use an RDBMS for all kinds of work. However, in comparison to other data storage options that became popular when the NoSQL wave shook the world of databases (Memcached, the key-value data store released in 2003, or MongoDB, the document store released in 2009, and many more), Redis has its roots in computer science and makes a rich variety of data structures available. This is one of the distinguishing features of Redis and the likely reason that fostered its adoption by software engineers and developers – presenting data structures such as hashes, lists, sets, bitmaps, and so on that are familiar to software engineers so they could transfer the programming logic to data modeling without any lengthy and computationally expensive data transformation. Viewed in this light, we could say that Redis is about persisting the data structures of a programming language. An example of the simplicity of storing a Python dictionary in a Redis hash data structure follows:

user = {"name":"John",
        "surname":"Smith",
        "company":"Redis",
        "department":"Sales"}
r.hset("user:{}".format(str(2345)), mapping=user)

In the same way, adding elements to a Redis Set can be done using Python lists:

languages = ['Python', 'C++', 'JavaScript']
r.sadd("coding", *languages)

In these examples, the user dictionary and the languages list are stored without transformations, and this is one of the advantages that Redis data structures offer to developers: simplifying data modeling and reducing the transformational overhead required to convert the data in a format that can be mapped to the data store (thus reducing the so-called impedance mismatch).

There was a short gap between the first release and its adoption by Instagram and GitHub. If we try to dig into the reasons that made Redis so popular, we can mention a few, among which we count the speed and simplicity of deployment. Beyond the user experience, Redis is an act of dedication and passion, and as we read in Redis’s own manifesto, code is like poetry; it’s not just something we write to reach some practical result. People love beautiful stories and simplicity and everybody should fight against complexity.

What is surely true is that Redis is an idea to solve problems where relational databases, still tied to rigid paradigms, wouldn’t fit the purpose. It is the product of creativity, inspiration, and love for things done manually, where good design and craftsmanship intertwine to accomplish something that simply works. An intimate artwork. And we like to recall Salvatore’s words about the creative approach when writing Redis:

My wife claims I wrote it mostly while sitting on the WC for the first years, on a MacBook Air 11. Would be nice to tell her she is wrong, but she happens to be perfectly right about the matter.

From the most-used thinking room in Sicily to becoming the most-loved and used key-value database in the world, this is the story we have decided to tell in this book, and we are sure you will find the journey through the pages an exciting adventure.

One of the guiding principles behind Redis is being open source and driven by a community of enthusiast contributors. We’ll explore that in the next section.

 

The open source project

The success of a technical project is always measurable in terms of the innovation of the proposal, simplicity of use, exhaustive documentation, high performance, low footprint, and stability, among other aspects. However, and this is true for many things, at the end of the day what matters is the capacity to resolve a problem and the impact of the solution. Organizations that decide to add new technology to their stack face several challenges to understand, prototype, validate, and set up a plan to deploy test environments together with a release strategy, a maintenance plan, and, finally, a plan to develop competence. Success stories require careful planning. From these many perspectives, Redis is considered first-in-class, and in this book, we will expose many of the reasons that made Redis the de-facto standard among the in-memory data stores in the world. But even before digging into the features of Redis Stack, Redis, as an open source project, has undoubtedly added value to many businesses:

  • A variety of options exist to get it running close to the application. It is available as a managed service in every public cloud provider, it can be installed from the source code or as a binary file, and Docker images are available for all the versions and flavors.
  • It has good documentation and a command reference, together with examples (from the https://redis.io/ website).
  • It is straightforward to set up and test. The source code is self-contained and does not depend on external libraries.
  • Client libraries for the most popular programming languages are available and supported (Java, JavaScript, Python, Go, and C#/.NET).
  • The well-known and permissive BSD license grants the freedom to use, modify, and distribute Redis, among other advantages. Users can test and run Redis in production without any concerns.

These reasons, together with the fact that it’s very easy to learn Redis, make it an attractive option to set up and use. On a computer configured to build C projects, pulling the source code from the GitHub repository, compiling it, and running the server can be done in less than a minute:

git clone https://github.com/redis/redis.git
cd redis/
make
./src/redis-server &
./src/redis-cli PING
PONG

The open source project delivers the core Redis server plus additional utilities, such as these:

  • redis-cli, the command-line interface to administer the server and manage the data. This utility assists also in configuring scalable deployments with Redis Cluster and high availability with replication and Sentinel. Among other features, it includes auto-completion, online help for single commands (for example, HELP HSET), or by group of commands (for example, HELP @hash, to learn about the commands that can be used with the Hash data structure). Just type HELP to understand how to make use of the online help.
  • redis-benchmark, a simple benchmarking utility to perform batches of tests for different data structures. Useful to evaluate how well the server performs on determined hardware.
  • redis-sentinel, the agent that automates the management of replicated topologies and provides clients with a discovery service.
  • create-cluster, a utility useful to set up a Redis Cluster environment for testing.
  • redis-check-rdb and redis-check-aof, utilities to health check AOF and RDB persistence files.

Now that we have reviewed the basic principles behind Redis and its utilities, we are ready to dive into the world of data modeling. This journey will take us from relational databases to Redis core data structures, and we will see how the multi-model capabilities of Redis Stack simplify many data modeling problems.

 

From key-value to multi-model real-time databases

The core data structures that are available out of the box in the Redis server solve a variety of problems when it comes to mapping entities and relationships. To start with concrete examples of modeling using Redis, the usual option to store an object is the Hash data structure, while collections can be stored using Sets, Sorted Sets, or Lists (among other options because a collection can be modeled in several other ways). In this section, we will introduce the multi-model features of Redis Stack using a comprehensive approach, which may be useful for those who are used to storing data using the relational paradigm, which implies organizing the data in rows and columns of a table.

Consider the requirement to model a list of cities. Using the relational data model, we can define a table using the SQL data definition language (DDL) instruction CREATE TABLE as follows:

CREATE TABLE `city` (
  `ID` int NOT NULL AUTO_INCREMENT,
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `CountryCode` (`CountryCode`)
)

This table definition defines attributes for the city entity and specifies a primary key on an integer identifier (a surrogate key, in this case, provided the uniqueness of the attributes is not guaranteed for the city entity). The DDL command also defines an index on the CountryCode attribute. Data encoding, collation, and the specific technology adopted as the storage engine are not relevant in this context. We are focused on understanding the model and the ability that we have to query it.

Primary key lookup

Primary key lookup is the most efficient way to access data in a relational table. Filtering the table on the primary key attribute is as easy as executing the SQL SELECT statement:

SELECT * FROM city WHERE ID=653;
+-----+--------+-------------+----------+------------+
| ID  | Name   | CountryCode | District | Population |
+-----+--------+-------------+----------+------------+
| 653 | Madrid | ESP         | Madrid   |    2879052 |
+-----+--------+-------------+----------+------------+
1 row in set (0.00 sec)

Modeling a city using one of the Redis core data structures leads to mapping the data in the SQL table to Hashes, so we can store the attributes as field-value pairs, with the key name including the primary key:

127.0.0.1:6379> HSET city:653 Name "Madrid" CountryCode "ESP" District "Madrid" Population 2879052

The HGETALL command can be used to retrieve the entire hash with minimal overhead (HGETALL has direct access to the value in the Redis keyspace):

HGETALL city:653
1) "Name"
2) "Madrid"
3) "CountryCode"
4) "ESP"
5) "District"
6) "Madrid"
7) "Population"
8) "2879052"

In addition, we can limit the bandwidth usage caused by the entire row transfer to the client and select only specific attributes. The SQL syntax is as follows:

SELECT Name, Population FROM city WHERE ID=653;
+--------+------------+
| Name   | Population |
+--------+------------+
| Madrid |    2879052 |
+--------+------------+
1 row in set (0.00 sec)

In this analogy between the relational model and Redis, the command is HGET (or HMGET for multiple values):

127.0.0.1:6379> HMGET city:653 Name Population
1) "Madrid"
2) "2879052"

While we need to extract data based on the primary key identifier, the solution is at hand in both the relational database and in Redis. Things get more complicated if we want to perform lookup and search queries on the dataset. In the next examples, we’ll see how the complexity and performance of such operations may vary substantially.

Secondary key lookup

Primary key lookups are efficient: after all, the primary key is an index, and it guarantees direct access to the table row. But what if we want to search for cities by filtering on an attribute? Let’s try an indexed search against our relational database over the CountryCode column, which has a secondary index:

mysql> SELECT Name FROM city WHERE CountryCode = "ESP";
+--------------------------------+
| Name                           |
+--------------------------------+
| Madrid                         |
| Barcelona                      |
| [...]                          |
+--------------------------------+
59 rows in set (0.02 sec)

This is an efficient search because the table defines an index on the CountryCode column. To continue the comparison of the relational database versus Redis, we will need to execute the same query against the stored Hashes. For this demonstration, we will assume that we have migrated the city table to Hashes in the Redis server. By design, Redis has no secondary indexing feature for any of the core data structures, which means that we should scan all the Hashes prefixed by the “city:” namespace, then read the city name from every Hash and check whether it matches our search term. The following example performs a non-blocking scan of the keyspace, filtering on the key name (“city:*”) in batches of configurable size (three, in the example):

127.0.0.1:6379> SCAN 0 MATCH city:* COUNT 3
1) "512"
2) 1) "city:4019"
   2) "city:9"
   3) "city:103"

The client should now extract the CountryCode value from every city, compare it to the search term, and repeat until the scan is concluded. This is obviously a time-consuming and expensive approach. There are ways to improve the efficiency of such batched operations. We will explore three standard options and then show how to resolve the problem using the Redis Stack capabilities:

  • Pipelining
  • Using functions
  • Using indexes
  • Redis Stack capabilities

We will look at these in detail next.

Pipelining

The first approach to reducing the overhead of the search operation is to use pipelining, which is supported by all major client libraries. Pipelining collects a batch of commands, delivers them to the server, and collects the outputs from the server immediately before returning the result to the client. This option dramatically reduces the latency of the overall operation, as it saves on the roundtrip time to the server (an analogy that works is going to the supermarket once to purchase 30 items rather than going 30 times and purchasing one item on every visit). The pros and cons of pipelining are as follows:

  • Pros: Saves on roundtrip time and does not block the server, as the server executes a batch of commands and returns the results to the client. Therefore, it increases overall system throughput. Pipelining is especially useful when batching operations.
  • Cons: The complexity of the operation is proportional to the number and complexity of the operations in the pipeline that are executed by the server. This may increase the memory usage on the server, as it keeps the intermediate results in memory until all commands in the pipeline are processed. The client manages multiple responses, which adds complexity to its business logic, especially when it has to deal with errors of some operations in the pipeline.

Using functions

Lua scripting and functions (functions were introduced in Redis 7.0 and represent an evolution of Lua scripting for remote server execution) help to offload the client and remove network latency. The search is local to the server and close to the data (equivalent to the concept of stored procedures). The following function is an example of local search:

#!lua name=mylib
local function city_by_cc(keys, args)
   local match, cursor = {}, "0";
   repeat
      local ret = redis.call("SCAN", cursor, "MATCH", "city:*", "COUNT", 100);
      local cities = ret[2];
        for i = 1, #cities do
         local keyname = cities[i];
         local ccode = redis.call('HMGET',keyname,'Name','CountryCode')
         if ccode[2] == args[1] then
            match[#match + 1] = ccode[1];
         end;
        end;
        cursor = ret[1];
      until cursor == "0";
   return match;
end
redis.register_function('city_by_cc', city_by_cc)

In this function, we do the following:

  1. We perform a scan of the entire keyspace, filtering by the “city:*” prefix, which means that we will iterate through all the keys in the Redis server database.
  2. For every key returned by the SCAN command, we retrieve the name and CountryCode of the city using the HMGET command.
  3. If CountryCode matches our search filter, we add the city to an output array.
  4. When the scan is completed, we return the array to the client.

Type the code into the mylib.lua file and import the library as follows:

cat mylib.lua | redis-cli -x FUNCTION LOAD

The function can be invoked using the following command:

127.0.0.1:6379> FCALL city_by_cc 0 "ESP"
 1) "A Coru\xf1a (La Coru\xf1a)"
 2) "Almer\xeda"
[...]
59) "Barakaldo"

The pros and cons of using functions are as follows:

  • Pros: The operation is executed on the server, and the client does not experiment with any overhead.
  • Cons: The complexity of the operation is linear, and the function (like any other Lua script or function) blocks the server. Any other concurrent operation must wait until the execution of the function is completed. Long scans make the server appear stuck to other clients.

Using indexes

Data scans, wherever they are executed (client or server side), are slow and ineffective in satisfying real-time requirements. This is especially true when the keyspace stores millions of keys or more. An alternative approach for search operations using the Redis core data structures is to create a secondary index. There are many options to do this using Redis collections. As an example, we can create an index of Spanish cities using a Set as follows:

SADD city:esp "Sevilla" "Madrid" "Barcelona" "Valencia" "Bilbao" "Las Palmas de Gran Canaria"

This data structure has interesting properties for our needs. We can retrieve all the Spanish cities in a single command:

127.0.0.1:6379> SMEMBERS city:esp
1) "Madrid"
2) "Sevilla"
3) "Valencia"
4) "Barcelona"
5) "Bilbao"
6) "Las Palmas de Gran Canaria"

Or we can check whether a specific city is in Spain using SISMEMBER, a constant time-complexity command:

127.0.0.1:6379> SISMEMBER city:esp "Madrid"
(integer) 1

And we can even search the index for cities having a name that matches a pattern:

127.0.0.1:6379> SSCAN city:esp 0 MATCH B*
1) "0"
2) 1) "Barcelona"
   2) "Bilbao"

We can refine our search requirements and design an index that considers the population. In such a case we could use a Sorted Set and Set the population as the score:

127.0.0.1:6379> ZADD city:esp 2879052 "Madrid" 701927 "Sevilla" 1503451 "Barcelona" 739412 "Valencia" 357589 "Bilbao" 354757 "Las Palmas de Gran Canaria"
(integer) 6

The main feature of the Sorted Set data structure is that its members are stored in an ordered tree-like structure (Redis uses a skiplist data structure), and with that, it is possible to execute low-complexity range searches. As an example, let’s retrieve Spanish cities with more than 2 million inhabitants:

127.0.0.1:6379> ZRANGE city:esp 2000000 +inf BYSCORE
1) "Madrid"

We can also check whether a city belongs to the index of Spanish cities:

127.0.0.1:6379> ZRANK city:esp Madrid
(integer) 5

In the former example, the ZRANK command informs us that the city Madrid belongs to the index and is fifth highest in the ranking. This solution resolves the overhead caused by having to scan the entire keyspace looking for matches.

The drawback of such a manual approach to indexing the data is that indexes need to reflect the data at any time. Considering scenarios where we want to add or remove a city from our database, we need to perform the two operations of removing the city Hash and updating the index, atomically. We can use a Redis transaction to perform atomic changes on both the data and the index:

127.0.0.1:6379> MULTI
OK
127.0.0.1:6379(TX)> DEL city:653
QUEUED
127.0.0.1:6379(TX)> ZREM city:esp "Madrid"
QUEUED
127.0.0.1:6379(TX)> EXEC
1) (integer) 1
2) (integer) 1

Custom secondary indexes come at a price, though, because complex searches become hard to manage using multiple data structures. Indexes must be maintained, and the complexity of such solutions may get out of hand, putting the consistency of search operations at risk. The pros and cons of using indexing are as follows:

  • Pros: Simple and fast search operations are possible using Redis core data structures to create a secondary index
  • Cons: The secondary index needs to be maintained, and search operations on multiple fields (what is called a composite index in relational databases) are not immediate and need thoughtful planning, implementation, and maintenance

Next, we will examine the capabilities of Redis Stack.

Redis Stack capabilities

Caching is one of the frequent use cases for which Redis shines as the best-in-class storage solution. This is because it stores data in memory, and offers real-time performance. It is also lightweight, as data structures are optimized to consume little memory. Redis does not need any complex configuration or maintenance and it is open source, so there is no reason not to give it a try. As a real-time data storage, it seems plausible that complex search operations may not be the primary use case users are interested in when using Redis. After all, fast retrieval of data by key is what made Redis so versatile as a cache or as a session store.

However, if in addition to the ability to use core data structures to store the data, we ensure that fast searches can be performed (besides primary key lookup), it is possible to think beyond the basic caching use case and start looking at Redis as a full-fledged database, capable of high-speed searches.

So far, we have presented simple and common search problems and both solutions using the traditional SQL approach and possible data modeling strategies using Redis core data structures. In the following sections, we will show how Redis Stack resolves query and search use cases and extends the core features of Redis with an integrated modeling and developing experience. We will introduce the following capabilities:

  • Querying, indexing, and searching documents
  • Time series data modeling
  • Probabilistic data structures
  • Programmability

Let’s discuss each of these capabilities in detail.

Querying, indexing, and searching documents

Redis Stack complements Redis with the ability to create secondary indexes on Hashes or JSON documents, the two document types supported by Redis Stack. The search examples seen so far can be resolved with the indexing features. To perform an indexed search, we create an index against the hashes modeling the cities using the following syntax:

FT.CREATE city_idx
ON HASH
PREFIX 1 city:
SCHEMA Name AS name TEXT
CountryCode AS countrycode TAG SORTABLE
Population AS population NUMERIC SORTABLE

The FT.CREATE command instructs the server to perform the following operations:

  1. Create an index for the desired values of the Hash document.
  2. Scan the keyspace and retrieve the documents prefixed by the “hash:” string.
  3. Create the index corresponding to the desired data structure and, as specified by the FT.CREATE command, the Hash in this case. The indexes defined in this example are of the following types:
    • TEXT, which enables full-text search on the Name field
    • TAG SORTABLE, which enables an exact-match search against the CountryCode field and enables high-performance sorting by the value of the attribute
    • NUMERIC SORTABLE, which enables range queries against the Population field and enables high-performance sorting by the value of the attribute

As soon as the indexing operation against the relevant data – all the keys prefixed by “hash:”– is completed, we can execute the queries and searches seen so far, and more. The syntax in the following example executes a search of all the cities with the value “ESP” in the TAG field type and returns only the name of the cities, sorted in lexicographical order. Finally, the first three results are returned using the LIMIT option. Note that this query is executed against the new city_idx index, and not directly against the data:

127.0.0.1:6379> FT.SEARCH city_idx '@countrycode:{ESP}' RETURN 1 name SORTBY name LIMIT 0 3
1) (integer) 59
2) "city:670"
3) 1) "name"
   2) "A Coru\xc3\xb1a (La Coru\xc3\xb1a)"
4) "city:690"
5) 1) "name"
   2) "Albacete"
6) "city:687"
7) 1) "name"
   2) "Alcal\xc3\xa1 de Henares"

It is possible to combine several textual queries/filters in the same index. Using exact-match and full-text search, we can verify whether Madrid is a Spanish city:

127.0.0.1:6379> FT.SEARCH city_idx '@name:Madrid @countrycode:{ESP}' RETURN 1 name
1) (integer) 1
2) "city:653"
3) 1) "name"
   2) "Madrid"

In a previous example, the range search was executed using the ZRANGE data structure. Using the indexing capability of Redis Stack, we can execute range searches using the NUMERIC field type. So, if we want to retrieve the Spanish cities with more than 2 million inhabitants, we will write the following search query:

127.0.0.1:6379> FT.SEARCH city_idx '@countrycode:{ESP}' FILTER population 2000000 +inf RETURN 1 name
1) (integer) 1
2) "city:653"
3) 1) "name"
   2) "Madrid"

Redis Stack offers flexibility and concise syntax to combine several field types, of which we have seen only a limited but representative number of examples. Once the index is created, the user can go ahead and use it, and add new documents or update existing ones. The database maintains the indexes updated synchronously as soon as documents are created or changed.

Besides full-text, exact-match, and range searches, we can also perform data aggregation (as we would in a relational database using the GROUP BY statement). If we would like to retrieve the three most populated countries, sorted in descending order, we would solve the problem in SQL as follows:

SELECT CountryCode,
SUM(Population) AS sum
FROM city
GROUP BY CountryCode
ORDER BY sum DESC
LIMIT 3;
+-------------+-----------+
| CountryCode | sum       |
+-------------+-----------+
| CHN         | 175953614 |
| IND         | 123298526 |
| BRA         |  85876862 |
+-------------+-----------+
3 rows in set (0.01 sec)

We can perform complex aggregations with the FT.AGGREGATE command. Using the following command, we can perform a real-time search and aggregation to compute the total population of the top three countries by summing up the inhabitants of the cities per country:

127.0.0.1:6379> FT.AGGREGATE city_idx * GROUPBY 1 @countrycode REDUCE SUM 1 @population AS sum SORTBY 2 @sum DESC LIMIT 0 3
1) (integer) 232
2) 1) "countrycode"
   2) "chn"
   3) "sum"
   4) "175953614"
3) 1) "countrycode"
   2) "ind"
   3) "sum"
   4) "123298526"
4) 1) "countrycode"
   2) "bra"
   3) "sum"
   4) "85876862"

To summarize this brief introduction where we addressed the search and aggregation capabilities, it is worth mentioning that there are multiple types of searches, such as phonetic matching, auto-completion suggestions, geo searches, or a spellchecker to help design great applications. We will cover them in depth in Chapter 5, Redis Stack as a Document Store, where we showcase Redis Stack as a document store.

Besides modeling objects as Hash, it is possible to store, update, and retrieve JSON documents. The JSON format needs no introduction, as it permeates data pipelines including heterogeneous subsystems, protocols, databases, and so on. Redis Stack delivers this capability out of the box and manages JSON documents in a similar way to Hashes, which means that it is possible to store, index, and search JSON objects and work with them using JSONPath syntax:

  1. To illustrate the syntax to store, search, and retrieve JSON data along the lines of the previous examples, let’s store city objects formatted as JSON:
    JSON.SET city:653 $ '{"Name":"Madrid", "CountryCode":"ESP", "District":"Madrid", "Population":2879052}'
    JSON.SET city:5 $ '{"Name":"Amsterdam", "CountryCode":"NLD", "District":"Noord-Holland", "Population":731200}'
    JSON.SET city:1451 $ '{"Name":"Tel Aviv-Jaffa", "CountryCode":"ISR", "District":"Tel Aviv", "Population":348100}'
  2. We don’t need anything else to start working with the JSON documents stored in Redis Stack. We can then perform basic retrieval operations on entire documents:
    127.0.0.1:6379> JSON.GET city:653
    "{\"Name\":\"Madrid\",\"CountryCode\":\"ESP\",\"District\":\"Madrid\",\"Population\":2879052}"
  3. We can also retrieve the desired property (or multiple properties at once) stored on a certain path, with fast access guaranteed, because the document is stored in a tree structure:
    127.0.0.1:6379> JSON.GET city:653 $.Name
    "[\"Madrid\"]"
    127.0.0.1:6379> JSON.GET city:653 $.Name $.CountryCode
    "{\"$.Name\":[\"Madrid\"],\"$.CountryCode\":[\"ESP\"]}"
  4. As we have seen for Hash documents, we can index JSON documents using a similar syntax and perform search operations. The following command creates an index for all the JSON documents with the city: prefix in the database:
    FT.CREATE city_idx ON JSON PREFIX 1 city: SCHEMA $.Name AS name TEXT $.CountryCode AS countrycode TAG SORTABLE $.Population AS population NUMERIC SORTABLE
  5. And using the FT.SEARCH command with an identical syntax as seen for the Hash documents, we can perform search operations:
    127.0.0.1:6379> FT.SEARCH city_idx '@countrycode:{ESP}' FILTER population 2000000 +inf RETURN 1 name
    1) (integer) 1
    2) "city:653"
    3) 1) "name"
       2) "Madrid"

Unlike Hash documents, the JSON supports nested levels (up to 128) and can store properties, objects, arrays, and geographical locations at any level in a tree-like structure, so the JSON format opens up a variety of use cases using a compact and flexible data structure.

Time series data modeling

Time series databases do not need any long introduction: they are data structures that can store data points happening at a certain time, indicated by a Unix timestamp expressed in milliseconds, with an associated numeric data value, typically with double precision. This data structure applies to many use cases, such as monitoring entities over time or tracking user activities for a determined service. Redis Stack has an integrated time series database that offers many useful features to manage the data points, for querying and searching, and provides convenient formatting commands for data processing and visualization. Beginning with time series modeling is straightforward:

  1. We can create a time series from the command-line interface (or from any of the client libraries that support time series):
    TS.CREATE "app:monitor:temp"
  2. Storing samples into the time series can be done with the TS.ADD command. If we would like to store the temperature measured by the sensor of a meteorological station captured every few seconds, the commands would be as follows:
    127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
    (integer) 1675632813307
    127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
    (integer) 1675632818179
    127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
    (integer) 1675632824174
    127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20.1"
    (integer) 1675632829519
    127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
    (integer) 1675632835052
  3. We are instructing the database to insert the sample at the current time, so we specify the * argument. We can finally retrieve the samples stored in the time series for the desired interval:
    127.0.0.1:6379> "TS.RANGE" "app:monitor:temp" "1675632818179" "1675632829519"
    1) 1) (integer) 1675632818179
       2) 20
    2) 1) (integer) 1675632824174
       2) 20
    3) 1) (integer) 1675632829519
       2) 20.1

We have just scratched the surface of using time series with Redis Stack, because data may be aggregated, down-sampled, and indexed to address many different uses.

Probabilistic data structures

Deterministic data structures – all those structures that store and return the same data that was stored (such as Strings, Sets, Hashes, and the rest of Redis structures) – are a good solution for standard amounts of data, but they may become inadequate due to the constantly growing volumes of data that systems must handle. Redis offers several options to store and present data to extract different types of insights. Strings are an example because they can be encoded as integers and used as counters:

127.0.0.1:6379> INCR cnt
(integer) 1
127.0.0.1:6379> INCRBY cnt 3
(integer) 4

Strings can also be managed down to the bit level to store multiple integer counters of variable length and stored at different offsets of a single string to reduce storage overheads using the bitfield data structure:

127.0.0.1:6379> BITFIELD cnt INCRBY i5 0 5
1) (integer) 5
127.0.0.1:6379> BITFIELD cnt INCRBY i5 0 5
1) (integer) 10
127.0.0.1:6379> BITFIELD cnt GET i5 0
1) (integer) 10

Regular counters, sets, and hash tables perform well for any amount of data but handling large amounts of data represents a challenge to scale the resources of the machine where Redis Stack is running, because of its memory requirements.

Deterministic data structures have given way to probabilistic data structures because of the need to scale up to large quantities of data and give a reasonably approximated answer to questions such as the following:

  • How many different pages has the user visited so far?
  • What are the top players with the highest score?
  • Has the user already seen this ad?
  • How many unique values have appeared so far in the data stream?
  • How many values in the data stream are smaller than a given value?

In the attempt to give an answer to the first question in the list, we could calculate the hash of the URL of the visited page and store it in a Redis collection, such as a Set, and then retrieve the cardinality of the structure using the SCARD command. While this solution works very well (and is deterministically exact), scaling it to many users and many visited pages represents a cost.

Let’s consider an example with a probabilistic data structure. HyperLogLog estimates the cardinality of a set with minimal memory usage and computational overhead without compromising the accuracy of the results, while consuming only a fraction of memory and CPU, so you would count the visited pages and get an estimation as follows:

127.0.0.1:6379> PFADD pages "https://redis.com/" "https://redis.io/docs/stack/bloom/" "https://redis.io/docs/data-types/hyperloglogs/"
(integer) 1
127.0.0.1:6379> PFCOUNT pages
(integer) 3

Redis reports the following memory usage for HyperLogLog:

127.0.0.1:6379> MEMORY USAGE pages
(integer) 96

Attempting to resolve the same problem using a Set and storing the hashes for these URLs would be done as follows:

127.0.0.1:6379> SADD hashpages "522195171ed14f78e1f33f84a98f0de6" "f5518a82f8be40e2994fdca7f71e090d" "c4e78b8c136f6e1baf454b7192e89cd1"
(integer) 3
127.0.0.1:6379> MEMORY USAGE hashpages
(integer) 336

Probabilistic data structures trade accuracy for time and space efficiency and give an answer to this and other questions by addressing several data analysis problems against big amounts of data and, most relevantly, efficiently.

Programmability

Redis Stack embeds a serverless engine for event-driven data processing allowing users to write and run their own functions on data stored in Redis. The functions are implemented in JavaScript and executed by the engine upon user invocation or in response to events such as changes to data, execution of commands, or when events are added to a Redis Stream data structure. It is also possible to configure timed executions, so periodical maintenance operations can be scheduled.

Redis Stack minimizes the execution time by running the functions as close as possible to the data, improving data locality, minimizing network congestion, and increasing the overall throughput of the system.

With this capability, it is possible to implement event-driven data flows, thus opening the doors to many use cases, such as the following:

  1. A basic library including a function can be implemented in text files, as in the following snippet:
    #!js api_version=1.0 name=lib
    redis.registerFunction('hello', function(){
        return 'Hello Gears!';
    });
  2. The lib.js file containing this function can then be imported into Redis Stack:
    redis-cli -x TFUNCTION LOAD < ./lib.js
  3. It can then be executed on demand from the command-line interface:
    127.0.0.1:6379>  TFCALL lib.hello 0
    "Hello Gears!"
  4. Things become more interesting if we subscribe to data changes as follows:
    redis.registerKeySpaceTrigger("key_logger", "user:", function(client, data){
        if (data.event == 'del'){
            client.call("INCR", "removed");
            redis.log(JSON.stringify(data));
            redis.log("A user has been removed");
        }
    });

    In this function, we do the following:

    • We are subscribing to events against the keys prefixed by the “user: namespace
    • We check the command that triggered the event, and if it is a deletion, we act and specify what’s going to happen next
    • The triggered action will be the increment of a counter, and it will also write a message into the server’s log
  5. To test this function, we proceed to create and delete a user profile:
    127.0.0.1:6379> HSET user:123 name "John" last "Smith"
    (integer) 2
    127.0.0.1:6379> DEL user:123
    (integer) 1
  6. A quick check of the server’s log verifies that the condition has been met, and the information logged:
    299:M 05 Feb 2023 19:13:09.004 * <redisgears_2> {"event":"del","key":"user:123","key_raw":{}}
    299:M 05 Feb 2023 19:13:09.005 * <redisgears_2> A user has been removed

    And the counter has increased:

    127.0.0.1:6379> GET removed
    "1"

Through this book, we will come to understand the differences between Lua scripts, Redis functions, and JavaScript functions, and we will explore the many possible programmability features along with proposals to resolve challenging problems with simple solutions.

So, what is Redis Stack?

Redis Stack combines the speed and stability of the Redis server with a set of well-established capabilities and integrates them into a compact solution that is easy to install and manage – Redis Stack Server. The RedisInsight desktop application is a visualization tool and data manager that complements Redis Stack Server with a set of functionalities useful for visualizing data stored by different models as well as providing interactive tutorials with popular examples, and more.

To complete the picture, the Redis Stack Client SDK includes the most popular client libraries to develop against Redis Stack in the Java, Python, and JavaScript programming languages.

Figure 1.1 – The Redis Stack logo

Figure 1.1 – The Redis Stack logo

Redis Stack empowers users with the liberty to use it for free in development and production environments and merges the open source BSD-licensed Redis with search and query capabilities, JSON support, time series handling, and probabilistic data structures. It is available under a dual license, specifically the Redis Source Available License (RSALv2) and the Server Side Public License (SSPL).

So, in a few examples, we have introduced new possibilities to modernize applications, and now we owe you an answer to the original question, “What is Redis Stack?

Key-value storage

To define what Redis Stack is, we need to go back for a moment to its origins, because Redis is the spinal cord of Redis Stack. Redis was born as in-memory storage to accelerate massive amounts of queries and achieve sub-millisecond latency while optimizing memory usage and maximizing the ease of adoption and administration. It appeared at the same time as other solutions taking part in the NoSQL wave and deviating from relational modeling. While the key-value Memcached store was an already established solution, Redis became popular too as a type of key-value storage. So, we can surely say that Redis Stack can be used as a key-value store.

Data structure server

However, considering Redis Stack as a simple key-value data store is reductive. Redis is best known for its flexibility in storing collections such as Hashes, Sets, Sorted Sets or Lists, Bitmaps and Bitfields, Streams, HyperLogLog probabilistic data structures, and geo indexes. And, together with data structures, its efficient low-complexity algorithms make storing and searching data a joy for developers. We can certainly say that Redis Stack is also a data structure store.

Multi-model database

The features introduced so far are integrated into Redis Stack Server and extend the Redis server, turning the data structure server into a multi-model database. This provides a rich data modeling experience where multiple heterogeneous data structures such as documents, vectors, and time series coexist in the same database. Software architects will appreciate the variety of possibilities for designing new solutions without multiple specialized databases and software developers will be empowered with a rich set of client libraries that improve the ease of software design. Database administrators will discover how shallow the learning curve is to learn to administer a single database rather than installing, configuring, and maintaining several data stores.

Data platform

The characteristics discussed so far, together with stream processing and the possibility to execute JavaScript functions for event-driven development, push Redis Stack beyond the boundaries of the multi-model database definition. Combining Redis, the key-value data store that is popular as a cache, with advanced data structures and multi-model design, and with the capability of a message broker with event-driven programming features, turns Redis Stack into a powerful data platform.

We have completed the Redis Stack walk-through, and to conclude this chapter, we will briefly discuss how to install it using different methods.

 

Redis Stack deployment types

We have completed an overview of Redis Stack and its key differentiators from the Redis server. In the next chapters, we will dive into the many use cases that can be solved and will discuss lots of examples and code snippets. For the time being, you can start planning your next Redis Stack-based modern application and think about the platform that will host the data store.

Redis Stack is available on all main operating systems (Linux, Mac, Windows) in binary format. It is also available as a Docker image, so you can start it right now by launching a container on your machine as follows:

docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest

Redis Stack is free, and you can install, manage, and deploy it in production without any license fee. It’s Redis, after all. You can also install RedisInsight and connect it to Redis Stack Server to see how easy is to bring your data under control.

If you don’t want to install Redis Stack, you can also create a free Redis Cloud account at https://redis.com/try-free/. You can get a 30 MB forever-free database and a public endpoint to use it from your laptop. No VPN is needed, and no certificate setup is required. You can choose where to create your free instance, for example on Amazon AWS, Google Cloud, or Microsoft Azure.

Be prepared, because if you haven’t already thought, “I didn’t know that Redis could do this,” we will surprise you with the many things you will be able to do, for free, with Redis Stack!

 

Summary

In this chapter, we have introduced Redis Stack starting from its foundation, the open source Redis server. We have introduced the multi-model approach of Redis Stack with examples, and we have performed simple searches beyond primary key lookup. You have learned about the syntax of the commands to use Redis Stack as a document store capable of storing Hash and JSON documents, and as a time series store, to store data points and search through them. Finally, we explored probabilistic data structures and have shown examples of database programmability.

In Chapter 2, Developing Modern Use Cases with Redis Stack, we will see that Redis Stack can be used in many different scenarios. From an in-memory, real-time cache and session store, to storing leaderboards, or being used as a message broker in a microservice architecture, you will learn that Redis Stack can be a better fit than deploying multiple specialized databases and messaging solutions.

About the Authors
  • Luigi Fugaro

    Luigi Fugaro's first encounter with computers was in the early 80s when he was a kid. He started with a Commodore Vic-20, passing through a Sinclair, a Commodore 64, and an Atari ST 1040, where he spent days and nights giving breath mints to Otis. In 1998, he started his career as a webmaster doing HTML, JavaScript, Applets, and some graphics with Paint Shop Pro. He then switched to Delphi, Visual Basic, and then started working on Java projects. He has been developing all kinds of web applications, dealing with backend and frontend frameworks. In 2012, he started working for Red Hat and is now an architect in the EMEA Middleware team. He has authored WildFly Cookbook and Mastering JBoss Enterprise Application Platform 7 by Packt Publishing.

    Browse publications by this author
  • Mirko Ortensi

    Mirko Ortensi earned a degree in Electronic Engineering and a Master's degree in Software Engineering. Mirko's career has spanned several roles from Software Engineering to Customer Support, particularly centered around distributed database systems. As a Senior Technical Enablement Architect at Redis, Mirko shares technical knowledge about Redis's products and services.

    Browse publications by this author
Redis Stack for Application Modernization
Unlock this book and the full library FREE for 7 days
Start now