You're reading from Elasticsearch 7.0 Cookbook - Fourth Edition

Product type: Book
Published in: Apr 2019
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781789956504
Edition: 4th Edition

Author: Alberto Paro

Alberto Paro is an engineer, manager, and software developer. He currently works as technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, natural language processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching how to effectively use big data solutions, NoSQL data stores, and related technologies.

Backups and Restoring Data

Elasticsearch is commonly used as a data store for logs and other kinds of data. Therefore, if you store valuable data, then you will also need tools to back up and restore this data to support disaster recovery.

In the earlier versions of Elasticsearch, the only viable solution was to dump your data with a complete scan and then reindex it. As Elasticsearch matured into a complete product, it added native functionality to back up and restore data.

In this chapter, we'll explore how you can configure shared storage using the Network File System (NFS) to store your backups, and how to execute and restore a backup.

In the last recipe of the chapter, we will demonstrate how to use the reindex functionality to clone data between different Elasticsearch clusters. This approach is very useful if you are not able to use standard backup...

Managing repositories

Elasticsearch provides a built-in system to rapidly back up and restore your data. When working with live data, keeping a consistent backup is complex due to concurrency issues.

An Elasticsearch snapshot allows you to create snapshots of individual indices (or aliases), or an entire cluster, in a remote repository.

Before starting to execute a snapshot, a repository must be created – this is where your backups or snapshots will be stored.
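As a sketch of what registering a repository looks like, here is a shared filesystem repository created via curl (the repository name `my_backup` and the location are illustrative; the path must also be listed under `path.repo` in `elasticsearch.yml` on every node):

```shell
# Register a shared filesystem ("fs") repository named "my_backup".
# The location must be reachable by every node and whitelisted in path.repo.
curl -X PUT "localhost:9200/_snapshot/my_backup" \
  -H 'Content-Type: application/json' -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/my_backup",
    "compress": true
  }
}'
```

`compress` enables compression of the snapshot metadata files; the index data itself is stored as-is.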

Getting ready

You will need an up-and-running Elasticsearch installation – similar to the one that we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

In order to execute the commands...

Executing a snapshot

In the previous recipe, we defined a repository – that is, the place where the backups will be stored. Now we can create snapshots of indices – full backups of an index's state at the exact instant the command is called.

For every repository, it's possible to define multiple snapshots.
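A minimal sketch of taking a snapshot (the repository name `my_backup`, the snapshot name `snapshot_1`, and the index names are illustrative):

```shell
# Create a snapshot of selected indices and block until it completes.
# Without wait_for_completion=true, the call returns immediately and the
# snapshot runs in the background.
curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true" \
  -H 'Content-Type: application/json' -d '{
  "indices": "index-1,index-2",
  "ignore_unavailable": true,
  "include_global_state": false
}'
```

Omitting the `indices` field snapshots the entire cluster.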

Getting ready

You will need an up-and-running Elasticsearch installation – similar to the one that we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as Curl (https://curl.haxx.se/) or Postman (https://www.getpostman.com/). You can use the Kibana console as it provides code completion...

Restoring a snapshot

Once you have snapshots of your data, it can be restored. The restore process is very fast – the indexed shard data is simply copied to the nodes and activated.
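A sketch of a restore call (names are the same illustrative ones used above; an open index with the same name must be closed or deleted first, which is why renaming on restore is a common pattern):

```shell
# Restore one index from snapshot_1, renaming it on the way in so it
# does not clash with the live index.
curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore" \
  -H 'Content-Type: application/json' -d '{
  "indices": "index-1",
  "rename_pattern": "index-(.+)",
  "rename_replacement": "restored-index-$1"
}'
```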

Getting ready

You will need an up-and-running Elasticsearch installation – similar to the one that we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

To execute the commands, any HTTP client can be used, such as Curl (https://curl.haxx.se/) or Postman (https://www.getpostman.com/). You can use the Kibana console as it provides code completion and better character escaping for Elasticsearch.

In order to correctly execute the following commands, the backup that was created in...

Setting up an NFS share for backups

Managing the repository (where the data is stored) is the most crucial part of Elasticsearch backup management. Due to its native distributed architecture, both snapshot and restore are designed to work cluster-wide.

During a snapshot, the shards are copied to the defined repository. If this repository is local to the nodes, then the backup data is spread across all the nodes. For this reason, it's necessary to have shared repository storage if you have a multinode cluster.

A common approach is to use NFS, as it's very easy to set up and a very quick solution (additionally, standard Windows Samba shares can also be used).
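The setup can be sketched roughly as follows (the server address `192.168.1.10`, the subnet, and the paths are all illustrative assumptions; adapt them to your network):

```shell
# --- On the NFS server ---
# Export a directory to the cluster's subnet by adding a line to /etc/exports:
#   /mnt/shared-backups 192.168.1.0/24(rw,sync,no_subtree_check)
# Then reload the export table:
sudo exportfs -ra

# --- On every Elasticsearch node ---
# Mount the share at the same local path on each node:
sudo mkdir -p /mnt/backups
sudo mount -t nfs 192.168.1.10:/mnt/shared-backups /mnt/backups

# Finally, whitelist the mount point in elasticsearch.yml on every node:
#   path.repo: ["/mnt/backups"]
# and restart the node so the setting takes effect.
```

The key point is that every node sees the same data at the same path, so any node can write its shards to the repository during a snapshot.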

Getting ready

We have a network with the following...

Reindexing from a remote cluster

The snapshot and restore APIs are very fast and are the preferred way to back up data, but they do have some limitations:

  • The backup is a safe Lucene index copy, so it depends on the Elasticsearch version that is used. If you are switching from a version of Elasticsearch that is prior to version 5.x, then it's not possible to restore the old indices.
  • It's not possible to restore the backups of a newer Elasticsearch version in an older version; the restore is only forward-compatible.
  • It's not possible to restore partial data from a backup.

To be able to copy data in this scenario, the solution is to use the reindex API using a remote server.
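A sketch of a remote reindex call (the remote host and index names are illustrative; the remote host must be whitelisted via `reindex.remote.whitelist` in `elasticsearch.yml` on the destination cluster):

```shell
# Pull documents from an index on a remote cluster into a local index.
# The destination cluster fetches batches over HTTP from the source.
curl -X POST "localhost:9200/_reindex" \
  -H 'Content-Type: application/json' -d '{
  "source": {
    "remote": {
      "host": "http://old-cluster.example.com:9200"
    },
    "index": "index-1"
  },
  "dest": {
    "index": "index-1"
  }
}'
```

Because the documents are re-ingested as JSON rather than copied as Lucene segments, this works across major versions and for partial data, which is exactly what snapshot/restore cannot do.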

Getting ready

You will need an up-and-running...
