(For more resources related to this topic, see here.)
Redis provides configuration settings for persistence and for enabling durability of data depending on the project statement.
- If durability of data is critical
- If durability of data is not important
You can achieve persistence of data using the snapshotting mode, which is the simplest mode in Redis. Depending on the configuration, Redis saves a dump of all the data sets in its memory into a single RDB file. The interval in which Redis dumps the memory can be configured to happen every X seconds or after Y operations. Consider an example of a moderately busy server that receives 15,000 changes every minute over its 1 GB data set in memory. Based on the snapshotting rule, the data will be stored every 60 seconds or whenever there are at least 15,000 writes. So the snapshotting runs every minute and writes the entire data of 1 GB to the disk, which soon turns ugly and very inefficient.
To solve this particular problem, Redis provides another way of persistence, Append-only file (AOF), which is the main persistence option in Redis. This is similar to journal files, where all the operations performed are recorded and replayed in the same order to rebuild the exact state.
Redis's AOF persistence supports three different modes:
- No fsync: In this mode, we take a chance and let the operating system decide when to flush the data. This is the fastest of the three modes.
- fsync every second: This mode is a compromised middle point between performance and durability. Data will be flushed using fsync every second. If the disk is not able to match the write speed, the fsync can take more than a second, in which case Redis delays the write up to another second. So this mode guarantees a write to be committed to OS buffers and transferred to the disk within 2 seconds in the worstcase scenario.
- fsync always: This is the last and safest mode. This provides complete durability of data at a heavy cost to performance. In this mode, the data needs to be written to the file and synced with the disk using fsync before the client receives an acknowledgment. This is the slowest of all three modes.
How to do it...
First let us see how to configure snapshotting, followed by the Append-only file method:
In Redis, we can configure when a new snapshot of the data set will be performed. For example, Redis can be configured to dump the memory if the last dump was created more than 30 seconds ago and there are at least 100 keys that are modified or created.
Snapshotting should be configured in the /etc/redis/6379.conf file. The configuration can be as follows:
save 900 1
save 60 10000
The first line translates to take a snapshot of data after 900 seconds if at least one key has changed, while the second line translates to snapshotting every 60 seconds if 10,000 keys have been modified in the meantime.
The configuration parameter rdbcompression defines whether the RDB file is to be compressed or not. There is a trade-off between the CPU and RDB dump file size.
We are interested in changing the dump's filename using the dbfilename parameter. Redis uses the current folder to create the dump files. For convenience, it is advised to store the RDB file in a separate folder.
Let us run a small test to make sure the RDB dump is working. Start the server again. Connect to the server using redis-cli, as we did already. To test whether our snapshotting is working, issue the following commands:
SET Key Value
After the SAVE command, a file should be created in the folder /var/lib/redis with the name redis-snapshot.rdb. This confirms that our installation is able to take a snapshot of our data into a file.
Now let us see how to configure persistence in Redis using the AOF method:
The configuration for persistence through AOF also goes into the same file located in /etc/redis/6379.conf. By default, the Append-only mode is not enabled. Enable it using the appendonly parameter.
Also, if you would like to specify a filename for the AOF log, uncomment the line and change the filename.
The appendfsync everysec command provides a good balance between performance and durability.
Redis needs to know when it has to rewrite the AOF file. This will be decided based on two configuration parameters, as follows:
Unless the minimum size is reached and the percentage of the increase in size when compared to the last rewrite is less than 100 percent, the AOF rewrite will not be performed.
How it works...
First let us see how snapshotting works.
When one of the criteria is met, Redis forks the process. The child process starts writing the RDB file to the disk at the folder specified in our configuration file. Meanwhile, the parent process continues to serve the requests. The problem with this approach is that the parent process stores the keys, which change during this snapshotting by the child, in the extra memory. In the worst-case scenario, if all the keys are modified, the memory usage spikes to roughly double.
Be aware that the bigger the RDB file, the longer it takes Redis to restore the data on startup.
Corruption of the RDB file is not possible as it is created by the append-only method from the data in Redis's memory, by the child process. The new RDB file is created as a temporary file and is then renamed to the destination file using the atomic rename system call once the dump is completed.
AOF's working is simple. Every time a write operation is performed, the command operation gets logged into a logfile. The format used in the logfile is the same as the format used by clients to communicate to the server. This helps in easy parsing of AOF files, which brings in the possibility of replaying the operation in another Redis instance. Only the operations that change the data set are written to the log. This log will be used on startup to reconstruct the exact data.
As we are continuously writing the operations into the log, the AOF file explodes in size as compared to the amount of operations performed. So, usually, the size of the AOF file is larger than the RDB dump. Redis manages the increasing size of the data log by compacting the file in a non-blocking manner periodically. For example, say a specific key, key1, has changed 100 times using the SET command. In order to recreate the final state in the last minute, only the last SET command is required. We do not need information about the previous 99 SET commands. This might look simple in theory, but it gets complex when dealing with complex data structures and operations such as union and intersection. Due to this complexity, it becomes very difficult to compress the existing file.
To reduce the complexity of compacting the AOF, Redis starts with the data in the memory and rewrites the AOF file from scratch. This is more similar to the snapshotting method. Redis forks a child process that recreates the AOF file and performs an atomic rename to swap the old file with a new one. The same problem, of the requirement of extra memory for operations performed during the rewrite, is present here. So the memory required can spike up to two times based on the operations while writing an AOF file.
Both snapshotting and AOF have their own advantages and limitations, which makes it ideal to use both at the same time. Let us now discuss the major advantages and limitations in the snapshotting method.
Advantages of snapshotting
The advantages of configuring snapshotting in Redis are as follows:
- RDB is a single compact file that cannot get corrupted due to the way it is created. It is very easy to implement.
- This dump file is perfect to take backups and for disaster recovery of remote servers. The RDB file can just be copied and saved for future recoveries.
- In comparison, this approach has little or no influence over performance as the only work the parent process needs to perform is forking a child process. The parent process will never perform any disk operations; they are all performed by the child process.
- As an RDB file can be compressed, it provides a faster restart when compared to the append-only file method.
Limitations of snapshotting
Snapshotting, in spite of the advantages mentioned, has a few limitations that you should be aware of:
- The periodic background save can result in significant loss of data in case of server or hardware failure.
- The fork() process used to save the data might take a moment, during which the server will stop serving clients. The larger the data set to be saved, the longer it takes the fork() process to complete.
- The memory needed for the data set might double in the worst-case scenario, when all the keys in the memory are modified while snapshotting is in progress.
What should we use?
Now that we have discussed both the modes of persistence Redis provides us with, the big question is what should we use? The answer to this question is entirely based on our application and requirements. In cases where we expect good durability, both snapshotting and AOF can be turned on and be made to work in unison, providing us with redundant persistence. Redis always restores the data from AOF wherever applicable, as it is supposed to have better durability with little loss of data. Both RDB and AOF files can be copied and stored for future use or for recovering another instance of Redis.
In a few cases, where performance is very critical, memory usage is limited, and persistence is also paramount, persistence can be turned off completely. In these cases, replication can be used to get durability. Replication is a process in which two Redis instances, one master and one slave, are in sync with the same data. Clients are served by the master, and the master server syncs the data with a slave.
Replication setup for persistence
Consider a setup as shown in the preceding image; that is:
- Master instance with no persistence
- Slave instance with AOF enabled
In this case, the master does not need to perform any background disk operations and is fully dedicated to serve client requests, except for a trivial slave connection. The slave server configured with AOF performs the disk operations. As mentioned before, this file can be used to restore the master in case of a disaster.
Persistence in Redis is a matter of configuration, balancing the trade-off between performance, disk I/O, and data durability. If you are looking for more information on persistence in Redis, you will find the article by Salvatore Sanfilippo at http://oldblog.antirez.com/post/redis-persistence-demystified.html interesting.
This article helps you to understand the persistence option available in Redis, which could ease your efforts of adding Redis to your application stack.
Resources for Article :
- Using Execnet for Parallel and Distributed Processing with NLTK [Article]
- Parsing Specific Data in Python Text Processing [Article]
- Python Text Processing with NLTK: Storing Frequency Distributions in Redis [Article]