Archive for January, 2017

Jan 31 Back up Elasticsearch with S3 compatible providers

Elasticsearch is a popular search engine and database used in applications where search and analytics are important. It has served as a primary database in applications such as HipChat, storing billions of messages while keeping them searchable.

While it is very feature-complete for use cases like that, Elasticsearch is also new compared to other popular datastores like MySQL, and that shows in one area that matters when it is used as a permanent datastore: backups.

In the early days of Elasticsearch, backup was crude. You shut down your node, or flushed its contents to disk, and copied the data storage directory on the hard drive. Copying a data directory isn't very convenient for high-uptime applications, however.

In later versions, ES introduced snapshots, which let you take a complete copy of an index. As of version 2, there are several snapshot repository types available:

  • HDFS
  • Amazon S3
  • Azure
  • File system/Directory

File System

For the file system repository type, Elasticsearch requires the same shared directory to be mounted on all nodes in the cluster. This gets inconvenient fast as your ES cluster grows.

The mount type could be NFS, CIFS, SSHFS or similar. To make sure the mount is always available, you can use a program like AutoFS.

On clusters with a few nodes, I haven't had good luck with it – even with AutoFS, the connection can be unstable and lead to errors from Elasticsearch, and I've also seen nodes crash when the repository mount went offline.

S3/Azure

Then there's S3 and Azure. They work great – provided that nothing prevents you from storing your data with a third-party, American-owned cloud provider. It's plug and play.

S3 Compatible

If for some reason you can't use S3, there are other providers offering cloud storage services that are compatible with the S3 API.

If you prefer an on-prem solution, you can use a storage engine that supports the S3 API. Minio is a server written in Go that's very easy to get started with. More complex tools include Riak S2 and Ceph.

Creating an S3 compatible repository is the same as creating an Amazon S3 repository. You need to install the cloud-aws plugin in ES, and in the elasticsearch.yml config file, you need to add the following line:

cloud.aws.signer: S3SignerType

Not adding this line will result in errors like these:

com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon S3; Status Code: 400; Error Code: InvalidArgument; Request ID: null)

and

The request signature we calculated does not match the signature you provided

By default, the signer is AWSS3SignerType, and that prevents you from using an S3 compatible storage repository.

Setting the repository up in ES is similar to the AWS type, except you also specify an endpoint. For example, with the provider Dunkel.de, you'd add a repository like this:

POST http://es-node:9200/_snapshot/backups
{
  "type": "s3",
  "settings": {
    "bucket": "backups",
    "base_path": "/snapshots",
    "endpoint": "s3-compatible.example.com",
    "protocol": "https",
    "access_key": "Ze5Zepu0Cofax8",
    "secret_key": "Qepi7Pe0Foj2RuNat2Fox8Zos7YuNat2Fox8Zos7Yu"
  }
}
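
If you'd rather script the registration than send it by hand, the request above can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names and default values are my own, and the host, bucket and keys are placeholders you would replace with your provider's details.

```python
import json
import urllib.request


def build_repository_request(es_url, repo_name, bucket, endpoint,
                             access_key, secret_key,
                             base_path="/snapshots", protocol="https"):
    """Build (but do not send) the request that registers an S3-type repository."""
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,
            "base_path": base_path,
            "endpoint": endpoint,      # the S3 compatible provider's host
            "protocol": protocol,
            "access_key": access_key,
            "secret_key": secret_key,
        },
    }
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/_snapshot/{repo_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register_repository(request, timeout=30):
    """Send the request and return the decoded JSON response from ES."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling register_repository(build_repository_request("http://es-node:9200", "backups", "backups", "s3-compatible.example.com", access_key, secret_key)) would then send the same request as the example above. Keeping the build step separate from the send step makes it easy to inspect the request before it goes anywhere.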

To learn more about the snapshot endpoints, see the snapshot and restore section of the ES documentation.

If you take a lot of different backups, I'd also recommend taking a look at the kopf ES plugin, which has a nice web interface for creating, restoring and otherwise administering snapshots.

Periodical snapshots

I've had success setting up snapshots using cronjobs. Here's an example of how to take snapshots automatically.

On one of the ES nodes, add a cronjob which fires a simple request to ES, like this one, which creates a snapshot named after the current date and time:

0,30 * * * * curl -XPUT "http://127.0.0.1:9200/_snapshot/backups/$(date +\%d-\%m-\%Y-\%H-\%M-\%S)"

This will create a snapshot in the backups repository with a name like "20-12-2016-11-30-00" – the current date and time. You can also use a similar command to create a new ES repository every month, for example, so you can periodically take a complete snapshot of the cluster.
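
The same naming scheme can be sketched in Python, which is handy if the cronjob later grows into a script. This is only an illustration: the function names are my own, and the URL and repository name are taken from the cron example above.

```python
from datetime import datetime
import urllib.request


def snapshot_name(now):
    """Format a timestamp the same way as the cron job's date call."""
    return now.strftime("%d-%m-%Y-%H-%M-%S")


def build_snapshot_request(now, es_url="http://127.0.0.1:9200", repo_name="backups"):
    """Build (but do not send) the PUT request that starts a snapshot."""
    return urllib.request.Request(
        url=f"{es_url}/_snapshot/{repo_name}/{snapshot_name(now)}",
        method="PUT",
    )
```

Passing build_snapshot_request(datetime.now()) to urllib.request.urlopen would fire the request, just like the curl call does.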

If you want a little more control, Elastic provides a nice tool called Curator which lets you easily manage repositories and snapshots, delete old indexes, and more. Instead of doing a curl request in a cronjob, you write a Curator script and run that from the cronjob, which gives you more flexibility.
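
If Curator is more than you need, the clean-up decision itself is small enough to hand-roll. The sketch below is not Curator's API, just an assumption-laden illustration: it takes a list of snapshot names that follow the date pattern from the cron example and picks out the ones older than a cut-off. The function name and the 30-day default are my own.

```python
from datetime import datetime, timedelta


def expired_snapshots(names, now, max_age_days=30):
    """Return the names whose embedded timestamp is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    expired = []
    for name in names:
        try:
            taken_at = datetime.strptime(name, "%d-%m-%Y-%H-%M-%S")
        except ValueError:
            continue  # skip names that do not follow the cron job's pattern
        if taken_at < cutoff:
            expired.append(name)
    return expired
```

Each returned name could then be removed with a DELETE request against the snapshot's URL in the repository.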

Concurrency errors with snapshots

This section isn't S3 specific, but I've run into these issues so often that I thought I'd write a little about them.

Elasticsearch can be extremely finicky when, for example, there are network timeouts while it is taking a snapshot, and you won't get any help from the official ES documentation.

For example, a snapshot may get stuck: its status is IN_PROGRESS, but it never finishes. You can then issue a DELETE _snapshot/<repository_name>/<snapshot_name>, and its status changes to ABORTED. Then you may find yourself stuck again: it stays at ABORTED forever, and when you try to DELETE it again, you get this:

{
  "error": {
    "root_cause": [
      {
        "type": "concurrent_snapshot_execution_exception",
        "reason": "[<repository_name>:<snapshot_name>] another snapshot is currently running cannot delete"
      }
    ],
    "type": "concurrent_snapshot_execution_exception",
    "reason": "[<repository_name>:<snapshot_name>] another snapshot is currently running cannot delete"
  },
  "status": 503
}

Now, trying to create another snapshot gets you this:

{
  "error": {
    "root_cause": [
      {
        "type": "concurrent_snapshot_execution_exception",
        "reason": "[<repository_name>:<snapshot_name>] a snapshot is already running"
      }
    ],
    "type": "concurrent_snapshot_execution_exception",
    "reason": "[<repository_name>:<snapshot_name>] a snapshot is already running"
  },
  "status": 503
}

The only way to fix this is to do either a rolling restart (i.e. restart one node, then the next), or a complete restart of the whole cluster. That's it.
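
A snapshot script can't fix this state, but it can at least recognise it and alert you instead of failing silently. Here is a minimal sketch that checks an ES error body for the exception type shown in the responses above. The function name is my own, and it assumes the error has the same JSON shape as those examples.

```python
import json

CONCURRENT_SNAPSHOT_ERROR = "concurrent_snapshot_execution_exception"


def is_concurrent_snapshot_error(response_text):
    """Return True if an ES error body reports a concurrent snapshot conflict."""
    try:
        payload = json.loads(response_text)
    except ValueError:
        return False  # not JSON at all, so not the error we are looking for
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    if not isinstance(error, dict):
        return False
    if error.get("type") == CONCURRENT_SNAPSHOT_ERROR:
        return True
    # Fall back to the nested root causes, which carry the same type field.
    return any(
        isinstance(cause, dict) and cause.get("type") == CONCURRENT_SNAPSHOT_ERROR
        for cause in (error.get("root_cause") or [])
    )
```

If this returns True for several scheduled snapshots in a row, that's a good hint the cluster is in the stuck state described above and needs a restart.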

Jan 15 Simple Mac window management with BetterTouchTool

As a software developer, I work not only with lots of different windows on my computer screen, but with lots of different sets of windows. I depend on windows being not only in different places, but also in different sizes. As such, I need to manage all these windows in some way.

For example, I often need to have 3 browser windows open. Maybe one for documentation, one for a project management tool and one for testing. And then I'd of course want a text editor. Maybe for a while I'd like one of the windows to take up more space, so I move one to a different screen and make the other window larger.

It would take me a while to manually drag these windows to their right places.

Luckily, a program for Mac called BetterTouchTool allows me to easily define sets of hotkeys that carry out all this moving and sizing of windows. I find that it speeds up my workflow a lot – I can easily organise my desktop.

It's even preferable to the Windows 7-style drag-to-maximize Snap feature since I don't have to use my mouse at all.

Here are the shortcuts I've defined:

Use the link below to download a BTT preset of these shortcuts.

Did you create any cool sets of shortcuts or workflow improvements with BetterTouchTool you want to share? Let us know in the comments.