Back up Elasticsearch with S3-compatible providers
Elasticsearch is a popular search engine and database used in applications where search and analytics are important. It has served as a primary database in applications like HipChat, storing billions of messages while keeping them searchable.
While very feature-complete for use cases like that, Elasticsearch is also young compared to other popular datastores like MySQL, and it has a real disadvantage when used as a permanent datastore: backups.
In the early days of Elasticsearch, backup was crude: you shut down your node, or flushed its contents to disk, and copied the data storage directory on the hard drive. Copying a data directory isn't very convenient for high-uptime applications, however.
In later versions, ES introduced snapshots, which let you make a complete copy of an index. As of version 2, there are several different snapshot repository plugins available:
- HDFS
- Amazon S3
- Azure
- File system/Directory
File System
For the file system repository type, Elasticsearch requires that the same directory be mounted on all nodes in the cluster. This gets inconvenient fast as your ES cluster grows.
The mount could be NFS, CIFS, SSHFS or similar, and you can use a program like AutoFS to keep the mount available.
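For reference, registering a shared-filesystem repository looks like this – a sketch assuming the shared mount lives at /mnt/es-backups and that the path is whitelisted via path.repo in elasticsearch.yml on every node:

POST http://es-node:9200/_snapshot/fs_backups
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es-backups"
  }
}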
On clusters with a few nodes, I haven't had good luck with shared mounts – even with AutoFS, the connection can be unstable and lead to errors from Elasticsearch, and I've also seen nodes crash when the repository mount went offline.
S3/Azure
Then there are S3 and Azure. They work great – provided nothing prevents you from storing your data with a third-party, American-owned cloud provider. It's plug and play.
S3 Compatible
If you can't use S3 for some reason, there are other providers offering cloud storage services compatible with the S3 API.
If you prefer an on-prem solution, you can use a storage engine that supports the API. Minio is a server written in Go that's very easy to get started with; more complex options include Riak S2 and Ceph.
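If you want to kick the tires locally, Minio is a single binary – a minimal sketch (on startup, the server prints its generated access and secret keys, which you'll need for the repository settings later):

# Download the minio binary for your platform, then point it at a data directory:
./minio server /data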
Creating an S3-compatible repository works the same as creating an Amazon S3 repository. You need to install the cloud-aws plugin in ES, and add the following line to the elasticsearch.yml config file:
cloud.aws.signer: S3SignerType
Not adding this line will result in errors like these:
com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon S3; Status Code: 400; Error Code: InvalidArgument; Request ID: null)
and
The request signature we calculated does not match the signature you provided
By default, the signer is AWSS3SignerType, which prevents you from using an S3-compatible storage repository.
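For completeness, installing the plugin is a one-liner on ES 2.x – a sketch, assuming a standard install layout (adjust the path for your packaging):

# Run on each node from the Elasticsearch home directory, then restart the node:
bin/plugin install cloud-aws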
Setting the repository up in ES is similar to the AWS type, except you also specify an endpoint. For example, with a provider like Dunkel.de, you'd add a repository like this:
POST http://es-node:9200/_snapshot/backups
{
  "type": "s3",
  "settings": {
    "bucket": "backups",
    "base_path": "/snapshots",
    "endpoint": "s3-compatible.example.com",
    "protocol": "https",
    "access_key": "Ze5Zepu0Cofax8",
    "secret_key": "Qepi7Pe0Foj2RuNat2Fox8Zos7YuNat2Fox8Zos7Yu"
  }
}
To learn more about the snapshot endpoints, see the ES documentation.
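Before trusting the repository with real backups, it's worth checking that every node can actually reach the bucket. ES has had a repository verification endpoint since version 1.4; the repository name backups is from the example above:

POST http://es-node:9200/_snapshot/backups/_verify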
If you take a lot of different backups, I'd also recommend taking a look at the kopf ES plugin, which has a nice web interface for creating, restoring and otherwise administering snapshots.
Periodic snapshots
I've had success setting up snapshots using cron jobs. Here's an example of how to automate them.
On one of the ES nodes, simply add a cron job that fires a request to ES, like this one, which creates a snapshot named after the current date:
0,30 * * * * curl -XPUT 'http://127.0.0.1:9200/_snapshot/backups/'$(date +\%d-\%m-\%Y-\%H-\%M-\%S)
This will create a snapshot in the backups repository with a name like "20-12-2016-11-30-00" – the current date and time. You can also use a similar command to create a new ES repository every month, for example, so you can periodically take a complete snapshot of the cluster.
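Here's a sketch of that monthly cron job, reusing the bucket, endpoint and (fake) credentials from the repository example above and stamping both the repository name and base_path with the current month (note that % must be escaped in crontabs):

0 0 1 * * curl -XPUT 'http://127.0.0.1:9200/_snapshot/backups-'$(date +\%Y-\%m) -d '{"type":"s3","settings":{"bucket":"backups","base_path":"/snapshots/'$(date +\%Y-\%m)'","endpoint":"s3-compatible.example.com","protocol":"https","access_key":"Ze5Zepu0Cofax8","secret_key":"Qepi7Pe0Foj2RuNat2Fox8Zos7YuNat2Fox8Zos7Yu"}}'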
If you want a little more control, Elastic provides a nice tool called Curator, which lets you easily organise repositories and snapshots, delete old indices, and more. Instead of doing a curl request in a cron job, you write a Curator configuration which you run from a cron job – it gives you more flexibility.
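As an illustration, here's a sketch of a Curator action file that deletes snapshots older than 30 days – this uses Curator 4-style YAML, and the repository name backups comes from the example above:

actions:
  1:
    action: delete_snapshots
    description: "Delete snapshots in the backups repository older than 30 days"
    options:
      repository: backups
    filters:
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 30

You'd then run it from cron with something like curator --config config.yml delete_old_snapshots.yml, where delete_old_snapshots.yml is the hypothetical file above.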
Concurrency errors with snapshots
This section isn't S3 specific, but I've run into these issues so often that I thought I'd write a little about them.
Elasticsearch can be extremely finicky when there are network timeouts while doing snapshots, for example, and you won't get much help from the official ES documentation.
For example, you may find that a snapshot is stuck: it's IN_PROGRESS, but it never finishes. You can then do a DELETE /_snapshot/<repository_name>/<snapshot_name>, after which its status will be ABORTED. Then you might find you're stuck again: it stays ABORTED forever, and when you try to DELETE it again, you'll get this:
{ "error": { "root_cause": [ { "type": "concurrent_snapshot_execution_exception", "reason": "[<repository_name>:<snapshot_name>] another snapshot is currently running cannot delete" } ], "type": "concurrent_snapshot_execution_exception", "reason": "[<repository_name>:<snapshot_name>] another snapshot is currently running cannot delete" }, "status": 503 }
Now, trying to create another snapshot gets you this:
{ "error": { "root_cause": [ { "type": "concurrent_snapshot_execution_exception", "reason": "[<repository_name>:<snapshot_name>] a snapshot is already running" } ], "type": "concurrent_snapshot_execution_exception", "reason": "[<repository_name>:<snapshot_name>] a snapshot is already running" }, "status": 503 }
The only way I've found to fix this is either a rolling restart (restart one node, then the next) or a complete restart of the whole cluster. That's it.