You're reading from Elasticsearch 5.x Cookbook - Third Edition
Elasticsearch is very commonly used as a datastore for logs and other kinds of data, so if you store valuable data, you also need tools to back up and restore it to support disaster recovery.
In the first versions of Elasticsearch, the only viable solution was to dump your data with a complete scan and then reindex it. As Elasticsearch matured into a complete product, it gained native functionality to back up and restore data.
In this chapter, we'll see how to configure a shared storage via NFS for storing your backups, and how to execute and restore a backup.
In the last recipe of the chapter, we will see how to use the reindex functionality to clone data between different Elasticsearch clusters. This approach is very useful if you are not able to use the standard backup/restore functionality, for example when moving from an old Elasticsearch version to a new one.
Elasticsearch provides a built-in system to rapidly back up and restore your data. When working with live data, keeping a backup is complex because of the large number of concurrency problems.
Elasticsearch can create snapshots of individual indices (or aliases), or of an entire cluster, in a remote repository.
Before executing a snapshot, a repository must be created; this is where your backups/snapshots will be stored.
You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.
We need to edit config/elasticsearch.yml and add the directory of your backup repository:
path.repo: /tmp/
For our examples, we'll be using the /tmp directory available in every Unix system. Generally, in a production cluster, this directory should be a shared repository...
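With path.repo in place and the node restarted, the repository itself still has to be registered through the snapshot API. The following is a minimal sketch, not necessarily the recipe's exact command: the repository name my_repository is taken from the URLs used later in the chapter, while the location /tmp/my_repository, the compress setting, the ES_URL variable, and the register_repo and show_repo functions are assumptions made for illustration.

```shell
#!/bin/sh
# Assumed base URL of a local node; override by exporting ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Body for a shared-filesystem ("fs") repository.
# The location is hypothetical; it must live under a path.repo directory.
REPO_BODY='{ "type": "fs", "settings": { "location": "/tmp/my_repository", "compress": true } }'

# Registers (or updates) the repository when called.
register_repo() {
  # The explicit content type is harmless here and required by newer versions.
  curl -XPUT "$ES_URL/_snapshot/my_repository?pretty" \
    -H 'Content-Type: application/json' -d "$REPO_BODY"
}

# Shows the stored definition, useful to confirm the registration.
show_repo() {
  curl -XGET "$ES_URL/_snapshot/my_repository?pretty"
}
```

Calling register_repo and then show_repo should echo the repository definition back, provided the node can write to the chosen location.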
In the previous recipe, we defined a repository: the place where we will store the backups. Now we can create snapshots of indices; a snapshot is a full backup of an index at the exact instant the command is called.
For every repository it's possible to define multiple snapshots.
You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.
To correctly execute the following command, the repository created in the previous recipe is required.
To manage a snapshot, we will perform the following steps:
To create a snapshot called snap_1 for the test and test1 indices, the HTTP method is PUT and the curl command is as follows:
curl -XPUT "http://localhost:9200/_snapshot/my_repository/snap_1?wait_for_completion=true" -d '{ "indices...
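Because the command above is cut off in this excerpt, the following sketch shows what a complete snapshot request could look like. It is an illustration under assumptions, not the book's exact body: the ignore_unavailable and include_global_state values, the ES_URL variable, and the create_snapshot and list_snapshots functions are chosen here for the example.

```shell
#!/bin/sh
# Assumed base URL of a local node; override by exporting ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Illustrative body: restrict the snapshot to two indices, skip missing ones,
# and leave the cluster-wide state out of the snapshot.
SNAPSHOT_BODY='{ "indices": "test,test1", "ignore_unavailable": true, "include_global_state": false }'

# Creates snap_1 and blocks until the snapshot has finished.
create_snapshot() {
  curl -XPUT "$ES_URL/_snapshot/my_repository/snap_1?wait_for_completion=true&pretty" \
    -H 'Content-Type: application/json' -d "$SNAPSHOT_BODY"
}

# Lists every snapshot stored in the repository.
list_snapshots() {
  curl -XGET "$ES_URL/_snapshot/my_repository/_all?pretty"
}
```

Running create_snapshot and then list_snapshots is one way to check that snap_1 was recorded in the repository.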
Once you have snapshots of your data, the data can be restored. The restore process is very fast: the indexed shard data is simply copied onto the nodes and activated.
You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.
To correctly execute the following command, the backup created in the previous recipe is required.
To restore a snapshot, we will perform the following steps:
To restore a snapshot called snap_1 for the test and test1 indices, the HTTP method is POST and the curl command is:
curl -XPOST "http://localhost:9200/_snapshot/my_repository/snap_1/_restore?pretty" -d '{ "indices": "test-index,test-2", "ignore_unavailable": "true", "include_global_state": false, "rename_pattern": "test-(.+...
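The rename_pattern option renames indices as they are restored, using a regular expression with capture groups. The sketch below previews such a rename locally with sed, without contacting the cluster. Both values are assumptions: the pattern test-(.+) is a guess at the truncated one above, and the copy_ prefix is hypothetical. Elasticsearch applies Java regular expressions and refers to groups as $1, so sed -E only approximates the behaviour for simple patterns like this one.

```shell
#!/bin/sh
# Hypothetical values; the pattern completes the truncated one by assumption.
RENAME_PATTERN='test-(.+)'
SED_REPLACEMENT='copy_\1'   # in the restore body this would be written "copy_$1"

# Prints the name an index would receive after the rename.
# Names that do not match the pattern are printed unchanged.
preview_rename() {
  printf '%s\n' "$1" | sed -E "s/$RENAME_PATTERN/$SED_REPLACEMENT/"
}

preview_rename test-index   # prints copy_index
preview_rename test-2       # prints copy_2
```

Previewing the mapping this way helps avoid restoring over an index that already exists under the target name.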
The snapshot and restore APIs are very fast and the preferred way to back up data, but they have some limitations, such as:
The backup is a safe Lucene index copy, so it depends on the Elasticsearch version used. If you are switching from a version of Elasticsearch that is prior to version 5.x, it's not possible to restore old indices.
It's not possible to restore backups of a newer Elasticsearch version in an older version. The restore is only forward-compatible.
It's not possible to restore partial data from a backup.
To be able to copy data in this scenario, the solution is to use the reindex API using a remote server.
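As a rough sketch of what such a request can look like, the block below prepares a reindex call that pulls an index from a remote cluster into the local one. The remote host old-cluster:9200, the index names test and test_copy, the ES_URL variable, and the reindex_from_remote function are all hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Destination cluster, assumed to be a local node; override by exporting ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical remote host and index names.
# The destination node must whitelist the remote in elasticsearch.yml, e.g.:
#   reindex.remote.whitelist: "old-cluster:9200"
REINDEX_BODY='{ "source": { "remote": { "host": "http://old-cluster:9200" }, "index": "test" }, "dest": { "index": "test_copy" } }'

# Copies documents from the remote index into the local destination index.
reindex_from_remote() {
  curl -XPOST "$ES_URL/_reindex?pretty" \
    -H 'Content-Type: application/json' -d "$REINDEX_BODY"
}
```

The request is sent to the destination cluster, which then reads from the remote one, so only the destination needs the whitelist setting.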
You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.