Redis has achieved several important milestones since its inception in 2009: it leads the ranking of key-value data stores published every month by the website DB-Engines (and ranks sixth among all database systems), and it holds the record as the most downloaded container image on Docker Hub. Not to mention that Redis was voted the most loved database five years in a row, from 2017 to 2021, in the Developer Survey published by Stack Overflow. And chances are that you, or a colleague of yours, have already used it for work or for a hobby project.
If you are reading this book, chances are you have programmed an application using a Redis server, or at least you know what it is and what it is used for. In this chapter, we’ll recap what made Redis the most famous caching system in the world and we’ll share some anecdotes about the development undertaken by its creator, Salvatore Sanfilippo. We won’t stay long on the story of Redis, though, because this book is about application modernization. As you read through, you will discover how the original database, designed for speed and simplicity, has evolved to resolve many of the new challenges of this age, without compromising on the ease of adoption, flexibility, and, above all, speed.
Redis Stack is an extension of Redis presented in 2022, which introduces JSON, vector, and time series data modeling capabilities, all supporting real-time queries and searches. Redis Stack represents a new approach to providing a rich data modeling experience all within the same database server. It introduces features such as vector similarity search to query structured and unstructured data (for example, text, images, or audio files) and delivers probabilistic Bloom filters to efficiently resolve recurrent big data problems. Redis Stack is also a data platform that supports event-driven programming and introduces stream processing features. By the end of this chapter, you will understand what Redis Stack is and how it enhances the Redis server with many new capabilities. Above all, you will learn the motivation behind Redis Stack and why multi-model databases can increase the speed of technological innovation for organizations of all sizes. In this chapter, we are going to cover the following topics:
To follow along with the examples in the chapter, you will need the following:
Redis was conceived and designed in 2009 by the Italian software engineer Salvatore Sanfilippo as a solution to scaling LLOOGG, an online analytics server co-founded with Fabio Pitrola that empowered web admins to track user activities. Challenged by the scalability limitations of MySQL, Salvatore decided to rethink the concept of key-value storage and design something that would (admittedly) be different from Memcached, while preserving its simplicity and speed. The first beta release was shared on Google Code on February 25, 2009. A few months later, in September 2009, the first stable release, Redis 1.0, was published as a tar package of less than 200 KB.
Redis was designed to offer an alternative for problems where relational database management systems (RDBMSs) are not a good fit; after all, there is something wrong if we use an RDBMS for every kind of workload. However, in comparison to other data stores that became popular when the NoSQL wave shook the world of databases (Memcached, the key-value data store released in 2003, or MongoDB, the document store released in 2009, among many others), Redis has its roots in computer science and makes a rich variety of data structures available. This is one of the distinguishing features of Redis and the likely reason it was adopted so readily by software engineers and developers: it presents data structures such as hashes, lists, sets, and bitmaps that are already familiar to them, so programming logic can be transferred to data modeling without any lengthy and computationally expensive data transformation. Viewed in this light, we could say that Redis is about persisting the data structures of a programming language. An example of the simplicity of storing a Python dictionary in a Redis hash data structure follows:
import redis

r = redis.Redis()  # assumes a Redis server on localhost:6379

user = {"name":"John", "surname":"Smith", "company":"Redis", "department":"Sales"}
r.hset("user:{}".format(str(2345)), mapping=user)
In the same way, adding elements to a Redis Set can be done using Python lists:
languages = ['Python', 'C++', 'JavaScript']
r.sadd("coding", *languages)
In these examples, the user dictionary and the languages list are stored without transformations, and this is one of the advantages that Redis data structures offer to developers: simplifying data modeling and reducing the transformational overhead required to convert the data in a format that can be mapped to the data store (thus reducing the so-called impedance mismatch).
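To make the mapping concrete, the sketch below (plain Python, independent of any client library; hset_args is a hypothetical helper, not part of any Redis client) shows how a dictionary flattens into the field-value argument list that the HSET command ultimately receives; no structural transformation is involved:

```python
def hset_args(key: str, mapping: dict) -> list:
    """Flatten a dict into the flat argument list sent with an HSET command."""
    args = [key]
    for field, value in mapping.items():
        args += [field, str(value)]
    return args

user = {"name": "John", "surname": "Smith", "company": "Redis", "department": "Sales"}
print(hset_args("user:2345", user))
# ['user:2345', 'name', 'John', 'surname', 'Smith', 'company', 'Redis', 'department', 'Sales']
```

The dictionary's fields travel to the server exactly as they exist in the program, which is the essence of the reduced impedance mismatch.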
The gap between the first release and its adoption by companies such as Instagram and GitHub was remarkably short. If we try to dig into the reasons that made Redis so popular, we can mention a few, among which we count its speed and simplicity of deployment. Beyond the user experience, Redis is an act of dedication and passion; as we read in Redis’s own manifesto, code is like poetry: it’s not just something we write to reach some practical result. People love beautiful stories and simplicity, and everybody should fight against complexity.
What is surely true is that Redis is an idea to solve problems where relational databases, still tied to rigid paradigms, wouldn’t fit the purpose. It is the product of creativity, inspiration, and love for things done manually, where good design and craftsmanship intertwine to accomplish something that simply works. An intimate artwork. And we like to recall Salvatore’s words about the creative approach when writing Redis:
My wife claims I wrote it mostly while sitting on the WC for the first years, on a MacBook Air 11. Would be nice to tell her she is wrong, but she happens to be perfectly right about the matter.
From the most-used thinking room in Sicily to becoming the most-loved and used key-value database in the world, this is the story we have decided to tell in this book, and we are sure you will find the journey through the pages an exciting adventure.
One of the guiding principles behind Redis is being open source and driven by a community of enthusiast contributors. We’ll explore that in the next section.
The success of a technical project is always measurable in terms of the innovation of the proposal, simplicity of use, exhaustive documentation, high performance, low footprint, and stability, among other aspects. However, and this is true for many things, at the end of the day what matters is the capacity to resolve a problem and the impact of the solution. Organizations that decide to add new technology to their stack face several challenges: they need to understand, prototype, and validate it, then set up test environments together with a release strategy, a maintenance plan, and, finally, a plan to develop competence. Success stories require careful planning. From these many perspectives, Redis is considered first in class, and in this book, we will explore many of the reasons that made Redis the de facto standard among in-memory data stores. But even before digging into the features of Redis Stack, Redis, as an open source project, has undoubtedly added value to many businesses:
These reasons, together with the fact that it’s very easy to learn Redis, make it an attractive option to set up and use. On a computer configured to build C projects, pulling the source code from the GitHub repository, compiling it, and running the server can be done in less than a minute:
git clone https://github.com/redis/redis.git
cd redis/
make
./src/redis-server &
./src/redis-cli PING
PONG
The open source project delivers the core Redis server plus additional utilities, such as these:
Now that we have reviewed the basic principles behind Redis and its utilities, we are ready to dive into the world of data modeling. This journey will take us from relational databases to Redis core data structures, and we will see how the multi-model capabilities of Redis Stack simplify many data modeling problems.
The core data structures available out of the box in the Redis server solve a variety of problems when it comes to mapping entities and relationships. To start with concrete examples of modeling in Redis, the usual option for storing an object is the Hash data structure, while collections can be stored using Sets, Sorted Sets, or Lists (among other options, because a collection can be modeled in several other ways). In this section, we will introduce the multi-model features of Redis Stack using a comparative approach, which may be useful for those who are used to the relational paradigm, where data is organized in the rows and columns of a table.
Consider the requirement to model a list of cities. Using the relational data model, we can define a table using the SQL data definition language (DDL) instruction CREATE TABLE as follows:
CREATE TABLE `city` (
  `ID` int NOT NULL AUTO_INCREMENT,
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `CountryCode` (`CountryCode`)
)
This table definition defines attributes for the city entity and specifies a primary key on an integer identifier (a surrogate key, in this case, provided the uniqueness of the attributes is not guaranteed for the city entity). The DDL command also defines an index on the CountryCode attribute. Data encoding, collation, and the specific technology adopted as the storage engine are not relevant in this context. We are focused on understanding the model and the ability that we have to query it.
Primary key lookup is the most efficient way to access data in a relational table. Filtering the table on the primary key attribute is as easy as executing the SQL SELECT statement:
SELECT * FROM city WHERE ID=653;
+-----+--------+-------------+----------+------------+
| ID  | Name   | CountryCode | District | Population |
+-----+--------+-------------+----------+------------+
| 653 | Madrid | ESP         | Madrid   | 2879052    |
+-----+--------+-------------+----------+------------+
1 row in set (0.00 sec)
Modeling a city using one of the Redis core data structures leads to mapping the data in the SQL table to Hashes, so we can store the attributes as field-value pairs, with the key name including the primary key:
127.0.0.1:6379> HSET city:653 Name "Madrid" CountryCode "ESP" District "Madrid" Population 2879052
The HGETALL command can be used to retrieve the entire hash with minimal overhead (HGETALL has direct access to the value in the Redis keyspace):
HGETALL city:653
1) "Name"
2) "Madrid"
3) "CountryCode"
4) "ESP"
5) "District"
6) "Madrid"
7) "Population"
8) "2879052"
In addition, we can limit the bandwidth usage caused by the entire row transfer to the client and select only specific attributes. The SQL syntax is as follows:
SELECT Name, Population FROM city WHERE ID=653;
+--------+------------+
| Name   | Population |
+--------+------------+
| Madrid | 2879052    |
+--------+------------+
1 row in set (0.00 sec)
In this analogy between the relational model and Redis, the command is HGET (or HMGET for multiple values):
127.0.0.1:6379> HMGET city:653 Name Population
1) "Madrid"
2) "2879052"
As long as we only need to retrieve data by its primary key identifier, the solution is at hand both in the relational database and in Redis. Things get more complicated if we want to perform lookup and search queries on the dataset. In the next examples, we’ll see how the complexity and performance of such operations may vary substantially.
Primary key lookups are efficient: after all, the primary key is an index, and it guarantees direct access to the table row. But what if we want to search for cities by filtering on an attribute? Let’s try an indexed search against our relational database over the CountryCode column, which has a secondary index:
mysql> SELECT Name FROM city WHERE CountryCode = "ESP";
+--------------------------------+
| Name                           |
+--------------------------------+
| Madrid                         |
| Barcelona                      |
| [...]                          |
+--------------------------------+
59 rows in set (0.02 sec)
This is an efficient search because the table defines an index on the CountryCode column. To continue the comparison of the relational database versus Redis, we will need to execute the same query against the stored Hashes. For this demonstration, we will assume that we have migrated the city table to Hashes in the Redis server. By design, Redis has no secondary indexing feature for any of the core data structures, which means that we should scan all the Hashes prefixed by the “city:” namespace, then read the city name from every Hash and check whether it matches our search term. The following example performs a non-blocking scan of the keyspace, filtering on the key name (“city:*”) in batches of configurable size (three, in the example):
127.0.0.1:6379> SCAN 0 MATCH city:* COUNT 3
1) "512"
2) 1) "city:4019"
   2) "city:9"
   3) "city:103"
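The scan-and-filter loop the client has to run can be sketched in plain Python. The dictionary below is a toy stand-in for the Redis keyspace, and cities_by_country is a hypothetical helper, not part of any client API:

```python
# Toy keyspace: key name -> Hash fields (a stand-in for the Redis server).
keyspace = {
    "city:653": {"Name": "Madrid", "CountryCode": "ESP"},
    "city:5": {"Name": "Amsterdam", "CountryCode": "NLD"},
    "city:34": {"Name": "Barcelona", "CountryCode": "ESP"},
}

def cities_by_country(keyspace: dict, country_code: str) -> list:
    """Emulate SCAN + HMGET: visit every city Hash and filter client side."""
    matches = []
    for keyname, fields in keyspace.items():
        if keyname.startswith("city:") and fields["CountryCode"] == country_code:
            matches.append(fields["Name"])
    return matches

print(sorted(cities_by_country(keyspace, "ESP")))  # ['Barcelona', 'Madrid']
```

Every Hash is visited, so the cost of the search grows linearly with the size of the keyspace.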
The client should now extract the CountryCode value from every city, compare it to the search term, and repeat until the scan is concluded. This is obviously a time-consuming and expensive approach. There are ways to improve the efficiency of such batched operations. We will explore three standard options and then show how to resolve the problem using the Redis Stack capabilities:
We will look at these in detail next.
The first approach to reducing the overhead of the search operation is to use pipelining, which is supported by all major client libraries. Pipelining collects a batch of commands, delivers them to the server, and collects the outputs from the server immediately before returning the result to the client. This option dramatically reduces the latency of the overall operation, as it saves on the roundtrip time to the server (an analogy that works is going to the supermarket once to purchase 30 items rather than going 30 times and purchasing one item on every visit). The pros and cons of pipelining are as follows:
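To illustrate the saving, the toy client below merely counts round trips; it is a stand-in, not the real redis-py API (which exposes pipelining through a pipeline() object). The arithmetic is the point: 30 commands cost 30 round trips unpipelined and a single round trip pipelined:

```python
class ToyClient:
    """Counts network round trips; a hypothetical stand-in for a Redis client."""
    def __init__(self):
        self.round_trips = 0

    def execute(self, *command):
        self.round_trips += 1          # one round trip per command
        return "OK"

    def execute_pipeline(self, commands):
        self.round_trips += 1          # one round trip for the whole batch
        return ["OK"] * len(commands)

client = ToyClient()
for i in range(30):
    client.execute("HGET", f"city:{i}", "Name")   # 30 round trips

pipelined = ToyClient()
pipelined.execute_pipeline([("HGET", f"city:{i}", "Name") for i in range(30)])
print(client.round_trips, pipelined.round_trips)  # 30 1
```

With a network latency of, say, 1 ms per round trip, the batched version saves 29 ms regardless of how fast the server executes each command.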
Lua scripting and functions (functions were introduced in Redis 7.0 and represent an evolution of Lua scripting for remote server execution) help to offload the client and remove network latency. The search is local to the server and close to the data (equivalent to the concept of stored procedures). The following function is an example of local search:
#!lua name=mylib

local function city_by_cc(keys, args)
  local match, cursor = {}, "0";
  repeat
    local ret = redis.call("SCAN", cursor, "MATCH", "city:*", "COUNT", 100);
    local cities = ret[2];
    for i = 1, #cities do
      local keyname = cities[i];
      local ccode = redis.call('HMGET', keyname, 'Name', 'CountryCode')
      if ccode[2] == args[1] then
        match[#match + 1] = ccode[1];
      end;
    end;
    cursor = ret[1];
  until cursor == "0";
  return match;
end

redis.register_function('city_by_cc', city_by_cc)
In this function, we do the following:
Type the code into the mylib.lua file and import the library as follows:
cat mylib.lua | redis-cli -x FUNCTION LOAD
The function can be invoked using the following command:
127.0.0.1:6379> FCALL city_by_cc 0 "ESP"
 1) "A Coru\xf1a (La Coru\xf1a)"
 2) "Almer\xeda"
[...]
59) "Barakaldo"
The pros and cons of using functions are as follows:
Data scans, wherever they are executed (client or server side), are slow and ineffective in satisfying real-time requirements. This is especially true when the keyspace stores millions of keys or more. An alternative approach for search operations using the Redis core data structures is to create a secondary index. There are many options to do this using Redis collections. As an example, we can create an index of Spanish cities using a Set as follows:
SADD city:esp "Sevilla" "Madrid" "Barcelona" "Valencia" "Bilbao" "Las Palmas de Gran Canaria"
This data structure has interesting properties for our needs. We can retrieve all the Spanish cities in a single command:
127.0.0.1:6379> SMEMBERS city:esp
1) "Madrid"
2) "Sevilla"
3) "Valencia"
4) "Barcelona"
5) "Bilbao"
6) "Las Palmas de Gran Canaria"
Or we can check whether a specific city is in Spain using SISMEMBER, a command with constant time complexity:
127.0.0.1:6379> SISMEMBER city:esp "Madrid"
(integer) 1
And we can even search the index for cities having a name that matches a pattern:
127.0.0.1:6379> SSCAN city:esp 0 MATCH B*
1) "0"
2) 1) "Barcelona"
   2) "Bilbao"
We can refine our search requirements and design an index that considers the population. In such a case, we could use a Sorted Set and set the population as the score (note that a Redis key holds a single data type, so the Set created earlier must be deleted before reusing the key name):
127.0.0.1:6379> ZADD city:esp 2879052 "Madrid" 701927 "Sevilla" 1503451 "Barcelona" 739412 "Valencia" 357589 "Bilbao" 354757 "Las Palmas de Gran Canaria"
(integer) 6
The main feature of the Sorted Set data structure is that its members are stored in an ordered tree-like structure (Redis uses a skiplist data structure), and with that, it is possible to execute low-complexity range searches. As an example, let’s retrieve Spanish cities with more than 2 million inhabitants:
127.0.0.1:6379> ZRANGE city:esp 2000000 +inf BYSCORE
1) "Madrid"
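The reason ordered storage makes such queries cheap can be illustrated in plain Python with a sorted list and a binary search. This is a conceptual sketch only; Redis uses a skiplist, which, unlike a Python list, also supports fast insertion:

```python
import bisect

# (population, city) pairs kept sorted by population, like a Sorted Set.
ranked = sorted([
    (354757, "Las Palmas de Gran Canaria"), (357589, "Bilbao"),
    (701927, "Sevilla"), (739412, "Valencia"),
    (1503451, "Barcelona"), (2879052, "Madrid"),
])

def range_by_score(ranked, min_score):
    """Return members with score >= min_score; the seek is O(log n)."""
    start = bisect.bisect_left(ranked, (min_score,))
    return [city for _, city in ranked[start:]]

print(range_by_score(ranked, 2_000_000))  # ['Madrid']
```

Because the data is kept in score order, the query only needs to locate the lower bound and walk forward, instead of examining every member.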
We can also check whether a city belongs to the index of Spanish cities:
127.0.0.1:6379> ZRANK city:esp Madrid
(integer) 5
In this example, the ZRANK command informs us that the city Madrid belongs to the index and has rank 5; ranks are zero-based and ordered by ascending score, so Madrid is the most populated of the six cities in the index. This solution removes the overhead of scanning the entire keyspace looking for matches.
The drawback of such a manual approach to indexing is that the indexes need to reflect the data at all times. In a scenario where we remove a city from our database, we need to perform two operations, removing the city Hash and updating the index, atomically. We can use a Redis transaction to perform atomic changes on both the data and the index:
127.0.0.1:6379> MULTI
OK
127.0.0.1:6379(TX)> DEL city:653
QUEUED
127.0.0.1:6379(TX)> ZREM city:esp "Madrid"
QUEUED
127.0.0.1:6379(TX)> EXEC
1) (integer) 1
2) (integer) 1
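The discipline the transaction enforces, never touching the data without also touching the index, can be captured in a small helper. The class below is a pure-Python toy store, not a Redis client:

```python
class CityStore:
    """Toy store keeping a primary record and a per-country index in sync."""
    def __init__(self):
        self.data = {}     # "city:653" -> fields
        self.index = {}    # "ESP" -> set of city names

    def add(self, key, fields):
        self.data[key] = fields
        self.index.setdefault(fields["CountryCode"], set()).add(fields["Name"])

    def remove(self, key):
        # Both operations happen together, mirroring MULTI/EXEC.
        fields = self.data.pop(key)
        self.index[fields["CountryCode"]].discard(fields["Name"])

store = CityStore()
store.add("city:653", {"Name": "Madrid", "CountryCode": "ESP"})
store.remove("city:653")
print(store.index["ESP"])  # set()
```

Every new query pattern (by district, by population band, and so on) would require yet another index and yet more bookkeeping of this kind, which is exactly the maintenance burden discussed next.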
Custom secondary indexes come at a price, though, because complex searches become hard to manage using multiple data structures. Indexes must be maintained, and the complexity of such solutions may get out of hand, putting the consistency of search operations at risk. The pros and cons of using indexing are as follows:
Next, we will examine the capabilities of Redis Stack.
Caching is one of the most frequent use cases for which Redis shines as a best-in-class storage solution: it stores data in memory and offers real-time performance, it is lightweight, as its data structures are optimized to consume little memory, and it needs no complex configuration or maintenance. It is also open source, so there is no reason not to give it a try. As real-time data storage, it seems plausible that complex search operations may not be the primary use case users have in mind for Redis. After all, fast retrieval of data by key is what made Redis so versatile as a cache or as a session store.
However, if in addition to the ability to use core data structures to store the data, we ensure that fast searches can be performed (besides primary key lookup), it is possible to think beyond the basic caching use case and start looking at Redis as a full-fledged database, capable of high-speed searches.
So far, we have presented simple and common search problems, together with solutions based on the traditional SQL approach and possible data modeling strategies using the Redis core data structures. In the following sections, we will show how Redis Stack resolves query and search use cases and extends the core features of Redis with an integrated modeling and development experience. We will introduce the following capabilities:
Let’s discuss each of these capabilities in detail.
Redis Stack complements Redis with the ability to create secondary indexes on Hashes or JSON documents, the two document types supported by Redis Stack. The search examples seen so far can be resolved with the indexing features. To perform an indexed search, we create an index against the hashes modeling the cities using the following syntax:
FT.CREATE city_idx ON HASH PREFIX 1 city: SCHEMA Name AS name TEXT CountryCode AS countrycode TAG SORTABLE Population AS population NUMERIC SORTABLE
The FT.CREATE command instructs the server to perform the following operations:
As soon as the indexing operation against the relevant data (all the keys prefixed by “city:”) is completed, we can execute the queries and searches seen so far, and more. The syntax in the following example executes a search of all the cities with the value “ESP” in the TAG field type and returns only the name of the cities, sorted in lexicographical order. Finally, the first three results are returned using the LIMIT option. Note that this query is executed against the new city_idx index, and not directly against the data:
127.0.0.1:6379> FT.SEARCH city_idx '@countrycode:{ESP}' RETURN 1 name SORTBY name LIMIT 0 3
1) (integer) 59
2) "city:670"
3) 1) "name"
   2) "A Coru\xc3\xb1a (La Coru\xc3\xb1a)"
4) "city:690"
5) 1) "name"
   2) "Albacete"
6) "city:687"
7) 1) "name"
   2) "Alcal\xc3\xa1 de Henares"
It is possible to combine several textual queries/filters in the same index. Using exact-match and full-text search, we can verify whether Madrid is a Spanish city:
127.0.0.1:6379> FT.SEARCH city_idx '@name:Madrid @countrycode:{ESP}' RETURN 1 name
1) (integer) 1
2) "city:653"
3) 1) "name"
   2) "Madrid"
In a previous example, the range search was executed with the ZRANGE command against a Sorted Set. Using the indexing capability of Redis Stack, we can execute range searches using the NUMERIC field type. So, if we want to retrieve the Spanish cities with more than 2 million inhabitants, we will write the following search query:
127.0.0.1:6379> FT.SEARCH city_idx '@countrycode:{ESP}' FILTER population 2000000 +inf RETURN 1 name
1) (integer) 1
2) "city:653"
3) 1) "name"
   2) "Madrid"
Redis Stack offers flexibility and a concise syntax to combine several field types, of which we have seen only a limited but representative set of examples. Once the index is created, the user can go ahead and use it, adding new documents or updating existing ones; the database keeps the indexes updated synchronously as soon as documents are created or changed.
Besides full-text, exact-match, and range searches, we can also perform data aggregation (as we would in a relational database using the GROUP BY statement). If we would like to retrieve the three most populated countries, sorted in descending order, we would solve the problem in SQL as follows:
SELECT CountryCode, SUM(Population) AS sum FROM city GROUP BY CountryCode ORDER BY sum DESC LIMIT 3;
+-------------+-----------+
| CountryCode | sum       |
+-------------+-----------+
| CHN         | 175953614 |
| IND         | 123298526 |
| BRA         | 85876862  |
+-------------+-----------+
3 rows in set (0.01 sec)
We can perform complex aggregations with the FT.AGGREGATE command. Using the following command, we can perform a real-time search and aggregation to compute the total population of the top three countries by summing up the inhabitants of the cities per country:
127.0.0.1:6379> FT.AGGREGATE city_idx * GROUPBY 1 @countrycode REDUCE SUM 1 @population AS sum SORTBY 2 @sum DESC LIMIT 0 3
1) (integer) 232
2) 1) "countrycode"
   2) "chn"
   3) "sum"
   4) "175953614"
3) 1) "countrycode"
   2) "ind"
   3) "sum"
   4) "123298526"
4) 1) "countrycode"
   2) "bra"
   3) "sum"
   4) "85876862"
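In spirit, the aggregation above is a grouped sum. A plain-Python rendering over a short list of made-up (country, population) rows makes the GROUPBY, REDUCE SUM, and SORTBY steps explicit:

```python
from collections import defaultdict

# Made-up sample rows, not the real dataset totals.
rows = [("CHN", 7472000), ("CHN", 3896000), ("IND", 10500000), ("BRA", 9968000)]

totals = defaultdict(int)
for country, population in rows:
    totals[country] += population          # GROUPBY + REDUCE SUM per group

# SORTBY sum DESC, LIMIT 0 3
top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top)  # [('CHN', 11368000), ('IND', 10500000), ('BRA', 9968000)]
```

The difference is that FT.AGGREGATE performs this computation server side, over the index, without shipping the rows to the client.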
To summarize this brief introduction to the search and aggregation capabilities, it is worth mentioning that there are further types of search, such as phonetic matching, auto-completion suggestions, geo searches, and a spellchecker, to help design great applications. We will cover them in depth in Chapter 5, Redis Stack as a Document Store.
Besides modeling objects as Hash, it is possible to store, update, and retrieve JSON documents. The JSON format needs no introduction, as it permeates data pipelines including heterogeneous subsystems, protocols, databases, and so on. Redis Stack delivers this capability out of the box and manages JSON documents in a similar way to Hashes, which means that it is possible to store, index, and search JSON objects and work with them using JSONPath syntax:
JSON.SET city:653 $ '{"Name":"Madrid", "CountryCode":"ESP", "District":"Madrid", "Population":2879052}'
JSON.SET city:5 $ '{"Name":"Amsterdam", "CountryCode":"NLD", "District":"Noord-Holland", "Population":731200}'
JSON.SET city:1451 $ '{"Name":"Tel Aviv-Jaffa", "CountryCode":"ISR", "District":"Tel Aviv", "Population":348100}'
127.0.0.1:6379> JSON.GET city:653
"{\"Name\":\"Madrid\",\"CountryCode\":\"ESP\",\"District\":\"Madrid\",\"Population\":2879052}"
127.0.0.1:6379> JSON.GET city:653 $.Name
"[\"Madrid\"]"
127.0.0.1:6379> JSON.GET city:653 $.Name $.CountryCode
"{\"$.Name\":[\"Madrid\"],\"$.CountryCode\":[\"ESP\"]}"
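Conceptually, a path such as $.Name is just a walk through the parsed document. The minimal sketch below (json_path is a hypothetical helper that handles only dotted object access, far short of full JSONPath) shows the idea in plain Python:

```python
import json

def json_path(doc, path):
    """Resolve a minimal '$.a.b' dotted path against a parsed JSON document."""
    node = doc
    for part in path.lstrip("$.").split("."):
        node = node[part]
    return node

doc = json.loads('{"Name": "Madrid", "CountryCode": "ESP", "Population": 2879052}')
print(json_path(doc, "$.Name"))  # Madrid
```

Redis Stack evaluates such paths server side, so only the selected fragment of the document travels over the network.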
FT.CREATE city_idx ON JSON PREFIX 1 city: SCHEMA $.Name AS name TEXT $.CountryCode AS countrycode TAG SORTABLE $.Population AS population NUMERIC SORTABLE
127.0.0.1:6379> FT.SEARCH city_idx '@countrycode:{ESP}' FILTER population 2000000 +inf RETURN 1 name
1) (integer) 1
2) "city:653"
3) 1) "name"
   2) "Madrid"
Unlike Hash documents, JSON documents support nested levels (up to 128) and can store properties, objects, arrays, and geographical locations at any level of a tree-like structure, so the JSON format opens up a variety of use cases with a compact and flexible data structure.
Time series databases do not need any long introduction: a time series is a data structure that stores data points, each consisting of a Unix timestamp expressed in milliseconds and an associated numeric value, typically with double precision. This data structure applies to many use cases, such as monitoring entities over time or tracking user activities for a given service. Redis Stack has an integrated time series database that offers many useful features to manage data points, query and search them, and format them conveniently for data processing and visualization. Getting started with time series modeling is straightforward:
TS.CREATE "app:monitor:temp"
127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
(integer) 1675632813307
127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
(integer) 1675632818179
127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
(integer) 1675632824174
127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20.1"
(integer) 1675632829519
127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
(integer) 1675632835052
127.0.0.1:6379> "TS.RANGE" "app:monitor:temp" "1675632818179" "1675632829519"
1) 1) (integer) 1675632818179
   2) 20
2) 1) (integer) 1675632824174
   2) 20
3) 1) (integer) 1675632829519
   2) 20.1
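The range query boils down to a search over timestamp-ordered samples. Here is a plain-Python sketch, with a sorted list of (timestamp, value) pairs standing in for the Redis time series:

```python
import bisect

samples = [  # (Unix ms timestamp, value), kept in time order
    (1675632813307, 20.0), (1675632818179, 20.0), (1675632824174, 20.0),
    (1675632829519, 20.1), (1675632835052, 20.0),
]

def ts_range(samples, start_ms, end_ms):
    """Return samples with start_ms <= timestamp <= end_ms, like TS.RANGE."""
    lo = bisect.bisect_left(samples, (start_ms,))
    hi = bisect.bisect_right(samples, (end_ms, float("inf")))
    return samples[lo:hi]

print(ts_range(samples, 1675632818179, 1675632829519))
```

Because samples arrive in time order, the structure stays sorted for free, and every range query is a pair of binary searches plus a contiguous read.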
We have just scratched the surface of using time series with Redis Stack, because data may be aggregated, down-sampled, and indexed to address many different uses.
Deterministic data structures, that is, all those structures that store and return exactly the data that was stored (such as Strings, Sets, Hashes, and the rest of the Redis structures), are a good solution for moderate amounts of data, but they may become inadequate for the constantly growing volumes of data that systems must handle. Redis offers several options to store and present data to extract different types of insights. Strings are an example, because they can be encoded as integers and used as counters:
127.0.0.1:6379> INCR cnt
(integer) 1
127.0.0.1:6379> INCRBY cnt 3
(integer) 4
Strings can also be managed down to the bit level: using the BITFIELD command, multiple integer counters of variable length can be stored at different offsets of a single string, reducing storage overhead:
127.0.0.1:6379> BITFIELD cnt INCRBY i5 0 5
1) (integer) 5
127.0.0.1:6379> BITFIELD cnt INCRBY i5 0 5
1) (integer) 10
127.0.0.1:6379> BITFIELD cnt GET i5 0
1) (integer) 10
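What BITFIELD does can be mimicked with shifts and masks. The sketch below packs signed 5-bit counters into a single Python integer; it is a conceptual illustration only, and Redis addresses bit offsets within the string differently:

```python
WIDTH = 5  # signed 5-bit counters, like BITFIELD's i5 type

def get_counter(packed: int, index: int) -> int:
    """Read the signed 5-bit counter stored at slot `index` (two's complement)."""
    raw = (packed >> (index * WIDTH)) & ((1 << WIDTH) - 1)
    return raw - (1 << WIDTH) if raw & (1 << (WIDTH - 1)) else raw

def incr_counter(packed: int, index: int, delta: int) -> int:
    """Return a new packed value with counter `index` incremented by `delta`."""
    new = (get_counter(packed, index) + delta) & ((1 << WIDTH) - 1)  # wrap at 5 bits
    cleared = packed & ~(((1 << WIDTH) - 1) << (index * WIDTH))
    return cleared | (new << (index * WIDTH))

packed = 0
packed = incr_counter(packed, 0, 5)   # slot 0: 5
packed = incr_counter(packed, 0, 5)   # slot 0: 10
packed = incr_counter(packed, 1, -3)  # slot 1: -3
print(get_counter(packed, 0), get_counter(packed, 1))  # 10 -3
```

Two counters fit in 10 bits instead of two full keys, which is the storage saving BITFIELD delivers at scale.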
Regular counters, sets, and hash tables work for any amount of data in principle, but because of their memory requirements, handling very large datasets challenges the resources of the machine where Redis Stack is running.
Deterministic data structures give way to probabilistic data structures when we need to scale to large quantities of data and a reasonably approximated answer is acceptable for questions such as the following:
In the attempt to give an answer to the first question in the list, we could calculate the hash of the URL of the visited page and store it in a Redis collection, such as a Set, and then retrieve the cardinality of the structure using the SCARD command. While this solution works very well (and is deterministically exact), scaling it to many users and many visited pages represents a cost.
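That deterministic approach, hashing each URL and adding it to a set, looks like this in plain Python, with hashlib standing in for the hashing and a Python set for the Redis Set:

```python
import hashlib

visited = set()
for url in [
    "https://redis.com/",
    "https://redis.io/docs/stack/bloom/",
    "https://redis.com/",                     # a repeat visit is deduplicated
]:
    visited.add(hashlib.md5(url.encode()).hexdigest())

# The equivalent of SCARD: exact, but memory grows with every distinct element.
print(len(visited))  # 2
```

The count is exact, but the set must retain one hash per distinct page, which is precisely the cost that probabilistic structures avoid.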
Let’s consider an example with a probabilistic data structure. HyperLogLog estimates the cardinality of a set with minimal memory usage and computational overhead, trading a small, bounded error (under 1% in the Redis implementation) for a large saving in memory and CPU, so you would count the visited pages and get an estimation as follows:
127.0.0.1:6379> PFADD pages "https://redis.com/" "https://redis.io/docs/stack/bloom/" "https://redis.io/docs/data-types/hyperloglogs/"
(integer) 1
127.0.0.1:6379> PFCOUNT pages
(integer) 3
Redis reports the following memory usage for HyperLogLog:
127.0.0.1:6379> MEMORY USAGE pages
(integer) 96
Attempting to resolve the same problem using a Set and storing the hashes for these URLs would be done as follows:
127.0.0.1:6379> SADD hashpages "522195171ed14f78e1f33f84a98f0de6" "f5518a82f8be40e2994fdca7f71e090d" "c4e78b8c136f6e1baf454b7192e89cd1"
(integer) 3
127.0.0.1:6379> MEMORY USAGE hashpages
(integer) 336
Probabilistic data structures trade a small amount of accuracy for time and space efficiency, answering this and other questions and addressing several data analysis problems over big amounts of data, and, most relevantly, doing so efficiently.
Redis Stack embeds a serverless engine for event-driven data processing, allowing users to write and run their own functions on data stored in Redis. The functions are implemented in JavaScript and executed by the engine upon user invocation or in response to events, such as changes to data, the execution of commands, or entries added to a Redis Stream data structure. It is also possible to configure timed executions, so periodic maintenance operations can be scheduled.
Redis Stack minimizes the execution time by running the functions as close as possible to the data, improving data locality, minimizing network congestion, and increasing the overall throughput of the system.
With this capability, it is possible to implement event-driven data flows, thus opening the doors to many use cases, such as the following:
#!js api_version=1.0 name=lib

redis.registerFunction('hello', function(){
    return 'Hello Gears!';
});
redis-cli -x TFUNCTION LOAD < ./lib.js
127.0.0.1:6379> TFCALL lib.hello 0
"Hello Gears!"
redis.registerKeySpaceTrigger("key_logger", "user:", function(client, data){
    if (data.event == 'del'){
        client.call("INCR", "removed");
        redis.log(JSON.stringify(data));
        redis.log("A user has been removed");
    }
});
In this function, we do the following:
127.0.0.1:6379> HSET user:123 name "John" last "Smith"
(integer) 2
127.0.0.1:6379> DEL user:123
(integer) 1
299:M 05 Feb 2023 19:13:09.004 * <redisgears_2> {"event":"del","key":"user:123","key_raw":{}}
299:M 05 Feb 2023 19:13:09.005 * <redisgears_2> A user has been removed
And the counter has increased:
127.0.0.1:6379> GET removed
"1"
Throughout this book, we will come to understand the differences between Lua scripts, Redis functions, and JavaScript functions, and we will explore the many programmability features along with proposals to resolve challenging problems with simple solutions.
Redis Stack combines the speed and stability of the Redis server with a set of well-established capabilities and integrates them into a compact solution that is easy to install and manage – Redis Stack Server. The RedisInsight desktop application is a visualization tool and data manager that complements Redis Stack Server with a set of functionalities useful for visualizing data stored by different models as well as providing interactive tutorials with popular examples, and more.
To complete the picture, the Redis Stack Client SDK includes the most popular client libraries to develop against Redis Stack in the Java, Python, and JavaScript programming languages.
Figure 1.1 – The Redis Stack logo
Redis Stack merges the open source BSD-licensed Redis with search and query capabilities, JSON support, time series handling, and probabilistic data structures, and it is free to use in both development and production environments. It is available under a dual license, specifically the Redis Source Available License (RSALv2) and the Server Side Public License (SSPL).
So, in a few examples, we have introduced new possibilities to modernize applications, and now we owe you an answer to the original question, “What is Redis Stack?”
To define what Redis Stack is, we need to go back for a moment to its origins, because Redis is the spinal cord of Redis Stack. Redis was born as an in-memory store designed to serve massive amounts of queries with sub-millisecond latency while optimizing memory usage and maximizing ease of adoption and administration. It appeared at the same time as other solutions in the NoSQL wave that deviated from relational modeling. While the key-value store Memcached was an already established solution, Redis quickly became popular as a key-value store as well. So, we can surely say that Redis Stack can be used as a key-value store.
However, considering Redis Stack as a simple key-value data store is reductive. Redis is best known for its flexibility in storing collections such as Hashes, Sets, Sorted Sets, Lists, Bitmaps and Bitfields, Streams, the HyperLogLog probabilistic data structure, and geospatial indexes. And, together with data structures, its efficient low-complexity algorithms make storing and searching data a joy for developers. We can certainly say that Redis Stack is also a data structure store.
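As a quick illustration of the data structure store in action, a Sorted Set can maintain a leaderboard ordered by score with a couple of commands (the key and member names here are purely illustrative):

```
127.0.0.1:6379> ZADD leaderboard 100 "alice" 85 "bob"
(integer) 2
127.0.0.1:6379> ZRANGE leaderboard 0 -1 REV WITHSCORES
1) "alice"
2) "100"
3) "bob"
4) "85"
```

Redis keeps the set ordered on insertion, so reading the top entries does not require any sorting at query time.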
The features introduced so far are integrated into Redis Stack Server and extend the Redis server, turning the data structure server into a multi-model database. This provides a rich data modeling experience where multiple heterogeneous data structures such as documents, vectors, and time series coexist in the same database. Software architects will appreciate the variety of possibilities for designing new solutions without multiple specialized databases, and software developers will be empowered with a rich set of client libraries that improve the ease of software design. Database administrators will discover how shallow the learning curve is when administering a single database rather than installing, configuring, and maintaining several data stores.
The characteristics discussed so far, together with stream processing and the possibility to execute JavaScript functions for event-driven development, push Redis Stack beyond the boundaries of the multi-model database definition. Combining Redis, the key-value data store that is popular as a cache, with advanced data structures and multi-model design, and with the capability of a message broker with event-driven programming features, turns Redis Stack into a powerful data platform.
We have completed the Redis Stack walk-through, and to conclude this chapter, we will briefly discuss how to install it using different methods.
We have completed an overview of Redis Stack and its key differentiators from the Redis server. In the next chapters, we will dive into the many use cases that can be solved and will discuss lots of examples and code snippets. For the time being, you can start planning your next Redis Stack-based modern application and think about the platform that will host the data store.
Redis Stack is available on all main operating systems (Linux, Mac, Windows) in binary format. It is also available as a Docker image, so you can start it right now by launching a container on your machine as follows:
docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest
Redis Stack is free, and you can install, manage, and deploy it in production without any license fee. It’s Redis, after all. You can also install RedisInsight and connect it to Redis Stack Server to see how easy it is to bring your data under control.
If you don’t want to install Redis Stack, you can also create a free Redis Cloud account at https://redis.com/try-free/. You can get a 30 MB forever-free database and a public endpoint to use it from your laptop. No VPN is needed, and no certificate setup is required. You can choose where to create your free instance, for example on Amazon AWS, Google Cloud, or Microsoft Azure.
Be prepared, because if you haven’t already thought, “I didn’t know that Redis could do this,” we will surprise you with the many things you will be able to do, for free, with Redis Stack!
In this chapter, we have introduced Redis Stack starting from its foundation, the open source Redis server. We have introduced the multi-model approach of Redis Stack with examples, and we have performed simple searches beyond primary key lookup. You have learned about the syntax of the commands to use Redis Stack as a document store capable of storing Hash and JSON documents, and as a time series store, to store data points and search through them. Finally, we explored probabilistic data structures and have shown examples of database programmability.
In Chapter 2, Developing Modern Use Cases with Redis Stack, we will see that Redis Stack can be used in many different scenarios. From an in-memory, real-time cache and session store, to storing leaderboards, or being used as a message broker in a microservice architecture, you will learn that Redis Stack can be a better fit than deploying multiple specialized databases and messaging solutions.