Chapter 10. Backups and Security
Data backups and data security are among the most important concerns of any organization. It is even more important to design and implement business continuity plans that address data loss, whatever its cause. Elasticsearch is not a database and does not provide the backup and security functionality that databases offer, but it still gives you ways to achieve both. Let's learn how you can create cost-effective and robust backup plans for your Elasticsearch clusters.
In this chapter, we will cover the following topics:
Introducing backup and restore mechanisms
Securing an Elasticsearch cluster
Load balancing using Nginx
Introducing backup and restore mechanisms
In Elasticsearch, you can implement backup and restore functionality in two different ways, depending on your requirements and the effort you are willing to invest. You can either write your own scripts to back up and restore data manually, or opt for the more automated and feature-rich Backup-Restore API offered by Elasticsearch.
Backup using snapshot API
A snapshot is a backup of a complete cluster or of selected indices. The best thing about snapshots is that they are incremental: each new snapshot stores only the data that has changed since the previous snapshot.
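As a rough illustration of how the snapshot workflow maps onto REST calls, the following minimal sketch only builds the requests for registering a shared file system repository, taking a snapshot, and restoring it. The helper function names, the repository name `my_backup`, the snapshot name `snapshot_1`, and the location `/mnt/backups` are all hypothetical; sending the requests with an HTTP client against a running node is left out and would depend on your setup.

```python
import json
from urllib.parse import quote


def register_repository_request(repo, location):
    """Build the request that registers a shared file system ("fs") repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return "PUT", f"/_snapshot/{quote(repo, safe='')}", body


def create_snapshot_request(repo, snapshot, indices=None):
    """Build the request that snapshots the whole cluster or selected indices."""
    body = {}
    if indices:
        # Restrict the snapshot to a comma-separated list of indices.
        body["indices"] = ",".join(indices)
    path = (
        f"/_snapshot/{quote(repo, safe='')}/{quote(snapshot, safe='')}"
        "?wait_for_completion=true"
    )
    return "PUT", path, body


def restore_snapshot_request(repo, snapshot):
    """Build the request that restores a previously taken snapshot."""
    path = f"/_snapshot/{quote(repo, safe='')}/{quote(snapshot, safe='')}/_restore"
    return "POST", path, {}


if __name__ == "__main__":
    # Hypothetical names, used for illustration only.
    requests_to_send = [
        register_repository_request("my_backup", "/mnt/backups"),
        create_snapshot_request("my_backup", "snapshot_1", ["index_1", "index_2"]),
        restore_snapshot_request("my_backup", "snapshot_1"),
    ]
    for method, path, body in requests_to_send:
        print(method, path, json.dumps(body))
```

Because snapshots are incremental, repeating the snapshot request later under a new snapshot name against the same repository copies only the data that changed in the meantime.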
Life was not so easy before the release of Elasticsearch Version 1.0.0. This release not only introduced powerful aggregation functionalities to Elasticsearch, but also brought in the Snapshot Restore API to create backups and restore them on the fly. Initially, this API supported only a shared file system, but it gradually became possible to use it on AWS to create...
Securing an Elasticsearch cluster
Elasticsearch does not have any default security mechanisms. Anyone can destroy your entire data collection with just a single command. However, with the growing demand for securing Elasticsearch clusters, the Elastic team has launched a new product called Shield that provides a complete security solution, including authentication, encryption, role-based access control, IP filtering, field- and document-level security, and audit logging. If you cannot afford Shield, there are other ways to protect Elasticsearch. One way is to not expose Elasticsearch publicly and to put a firewall in front of it that allows access from only a limited number of IPs. The other way is to wrap Elasticsearch in a reverse proxy that provides access control and SSL encryption. In this chapter, we will see how you can secure your Elasticsearch cluster using basic HTTP authentication behind a reverse proxy.
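To make the mechanism concrete, here is a minimal sketch of what basic HTTP authentication involves: the client sends an `Authorization` header containing the Base64-encoded `username:password` pair, and whatever sits in front of Elasticsearch decodes and checks it. In the setup described in this chapter the reverse proxy performs that check; the Python functions below are hypothetical helpers that only illustrate the idea and are not part of Elasticsearch or Nginx.

```python
import base64
import hmac


def basic_auth_header(username, password):
    """Build the Authorization header value a client would send to the proxy."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def is_authorized(header_value, expected_user, expected_password):
    """Check a Basic Authorization header against one expected credential pair."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed Base64 as well as bytes that are not valid UTF-8.
        return False
    user, separator, password = decoded.partition(":")
    if not separator:
        return False
    # Compare both parts in constant time before combining the results.
    user_ok = hmac.compare_digest(user.encode("utf-8"), expected_user.encode("utf-8"))
    password_ok = hmac.compare_digest(
        password.encode("utf-8"), expected_password.encode("utf-8")
    )
    return user_ok and password_ok


if __name__ == "__main__":
    # Hypothetical credentials, used for illustration only.
    header = basic_auth_header("elastic_user", "s3cret")
    print(is_authorized(header, "elastic_user", "s3cret"))
```

Base64 is an encoding, not encryption, so the credentials are readable by anyone who can observe the traffic. This is why the reverse proxy approach pairs basic authentication with SSL encryption.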
In the remaining sections, we will go on to learn how to use Nginx to secure...
In this chapter, you learned how to create data backups of an Elasticsearch cluster and restore them into the same or another cluster. You also learned how to secure Elasticsearch clusters and load balance them using Nginx.
Finally, we have reached the end of the book, and we hope that you have had a pleasant reading experience. Elasticsearch is vast, and covering every tiny detail in this book was not possible. However, in keeping with its goal, the book covers almost every "essential" topic of Elasticsearch, so that developers can start from scratch and go on to manage and scale an Elasticsearch cluster on their own. Most interestingly, this book serves both Java and Python programmers under one roof.
Not only has Elasticsearch matured, but the community around this technology is also much more mature now. If you face any issue, you can post your questions to the official user discussion group: https://discuss.elastic.co.
We also suggest you keep visiting the official blog of Elasticsearch at...