Chapter 10: Backups and Restoring Data

Elasticsearch is commonly used as a data store for logs and other kinds of data. Therefore, if you store valuable data, you will also need tools to back up and restore that data to support disaster recovery.

In earlier versions of Elasticsearch, the only viable solution was to dump your data with a complete scan and then reindex it. As Elasticsearch has matured into a complete product, it now provides native functionality to back up and restore your data.

In this chapter, we'll explore how you can configure shared storage, using a Network File System (NFS), to store your backups and how to execute and restore a backup. In the case of cloud and managed deployments, you can use other supported repositories such as Google Cloud Storage, Azure Blob Storage, and Amazon S3.

In the last recipe of the chapter, we will demonstrate how you can use the reindex functionality to clone data between different Elasticsearch clusters. This approach...

Managing repositories

Elasticsearch provides a built-in system to rapidly back up and restore your data. Keeping a backup of live data is complex because of the large number of concurrency problems involved (such as records being written during the backup phase).

Elasticsearch's snapshot functionality allows you to create snapshots of individual indices (or aliases), or of an entire cluster, in a remote repository.

Before starting to execute a snapshot, a repository must be created – this is where your backups or snapshots will be stored.
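As a minimal sketch of registering a shared filesystem repository (the repository name my_fs_backup and the location /mnt/backups are illustrative assumptions; the location must also be listed under path.repo in elasticsearch.yml on every node):

    # Register a shared filesystem ("fs") repository for snapshots
    PUT /_snapshot/my_fs_backup
    {
      "type": "fs",
      "settings": {
        "location": "/mnt/backups/my_fs_backup",
        "compress": true
      }
    }

The compress setting compresses the snapshot's index metadata files, which is usually a reasonable default for backups.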

Getting ready

You will need an up-and-running Elasticsearch installation – similar to the one that we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

In order to execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/) or Postman (https://www.getpostman.com/). Additionally, you can use the Kibana console as it provides code completion and better character escaping for Elasticsearch.

Executing a snapshot

In the previous recipe, we defined a repository – the place where the backups will be stored. Now we can create snapshots of indices – a full backup of an index taken at the exact instant that the command is called.

For each repository, it's possible to define multiple snapshots.

Getting ready

You will need an up-and-running Elasticsearch installation – similar to the one that we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/) or Postman (https://www.getpostman.com/). Additionally, you can use the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, the repository that was created in the Managing repositories recipe is required.

To have a mybooks-1 index with a record, we will execute the following command...
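The command itself is truncated above; as a hypothetical sketch (the document body, the snapshot name snap_1, and the repository name my_fs_backup are illustrative), indexing a record and then taking a snapshot could look like this:

    # Index a sample record into mybooks-1 (the body is illustrative)
    PUT /mybooks-1/_doc/1
    {
      "title": "Test book",
      "position": 1
    }

    # Snapshot all the mybooks-* indices into the repository,
    # waiting for the snapshot to complete before returning
    PUT /_snapshot/my_fs_backup/snap_1?wait_for_completion=true
    {
      "indices": "mybooks-*",
      "ignore_unavailable": true,
      "include_global_state": false
    }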

Restoring a snapshot

Once you have snapshots of your data, they can be restored. The restoration process is very fast: the indexed shard data is simply copied onto the nodes and activated.

Getting ready

You will need an up-and-running Elasticsearch installation – similar to the one that we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/) or Postman (https://www.getpostman.com/). Additionally, you can use the Kibana console as it provides code completion and better character escaping for Elasticsearch.

To correctly execute the following commands, the backup that was created in the Executing a snapshot recipe is required.

How to do it...

To restore a snapshot, we will perform the following steps:

  1. To restore a snapshot called snap_1 for the mybooks-* indices, the HTTP method that we use is POST. The command is as...
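The command is truncated above; as a sketch using the same illustrative repository name my_fs_backup, the restore request could look like this:

    # Restore the mybooks-* indices from the snap_1 snapshot
    POST /_snapshot/my_fs_backup/snap_1/_restore
    {
      "indices": "mybooks-*",
      "ignore_unavailable": true,
      "include_global_state": false
    }

Note that you cannot restore over an open index with the same name; the index must be closed or deleted first, or renamed during the restore using the rename_pattern and rename_replacement parameters.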

Setting up an NFS share for backups

Managing the repository (where the data is stored) is the most crucial part of Elasticsearch backup management. Due to Elasticsearch's natively distributed architecture, the snapshot and restore processes are designed to operate across the whole cluster.

During a snapshot, the shards are copied to the defined repository. If this repository is local to the nodes, then the backup data is spread across all the nodes. For this reason, it's necessary to have shared repository storage if you have a multi-node cluster.

A common approach is to use an NFS share, as it's very easy to set up and very fast (standard Windows Samba shares can also be used).

Getting ready

We have a network with the following nodes:

  • Host server: 192.168.1.30 (where we will store the backup data)
  • Elasticsearch master node 1: 192.168.1.40
  • Elasticsearch data node 1: 192.168.1.50
  • Elasticsearch data node 2: 192.168.1.51

You will need...
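As a sketch under the addresses listed above (the export path /mnt/backups and the matching mount point are illustrative assumptions), the host server and the Elasticsearch nodes could be wired together like this:

    # On the host server (192.168.1.30): export the backup directory
    # by adding a line like this to /etc/exports
    /mnt/backups 192.168.1.40(rw,sync,no_subtree_check) 192.168.1.50(rw,sync,no_subtree_check) 192.168.1.51(rw,sync,no_subtree_check)

    # On every Elasticsearch node: mount the share at the same path
    sudo mount -t nfs 192.168.1.30:/mnt/backups /mnt/backups

    # On every node, register the path in elasticsearch.yml and restart:
    # path.repo: ["/mnt/backups"]

With the share mounted at the same path on all nodes, the fs repository from the Managing repositories recipe can point at a location under /mnt/backups.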

Reindexing from a remote cluster

The snapshot and restore APIs are very fast and are the preferred way to back up data. However, they do have some limitations:

  • The backup is a safe Lucene index copy, so it depends on the Elasticsearch version being used. If you are switching from a version of Elasticsearch earlier than 5.x, then it's not possible to restore the old indices.
  • It's not possible to restore the backups of a newer Elasticsearch version in an older version; the restore is only forward compatible.
  • It's not possible to restore partial data from a backup.

To be able to copy data in these scenarios, the solution is to use the reindex API with a remote source cluster.

Getting ready

You will need an up-and-running Elasticsearch installation – similar to the one that we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

In order to execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/) or Postman (https://www.getpostman.com/). Additionally, you can use the Kibana console as it provides code completion and better character escaping for Elasticsearch.
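As a sketch (the remote host 192.168.1.227:9200 and the index names are illustrative assumptions), the remote cluster must first be allowed in elasticsearch.yml on the destination cluster, after which a reindex call can pull the data:

    # In elasticsearch.yml on every node of the destination cluster
    # (the host and port are illustrative):
    # reindex.remote.whitelist: "192.168.1.227:9200"

    # Pull the mybooks-1 index from the remote cluster
    POST /_reindex
    {
      "source": {
        "remote": {
          "host": "http://192.168.1.227:9200"
        },
        "index": "mybooks-1"
      },
      "dest": {
        "index": "mybooks-1-copy"
      }
    }

Because reindex copies data at the document level rather than the Lucene segment level, this approach also works across major versions, which is what makes it suitable for the migration scenarios listed above.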
