Last week, Elasticsearch organised their first conference dedicated to the ELK stack
(Elasticsearch, Logstash, Kibana). It was two days filled with sessions about search, ranging from in-depth
Elasticsearch sessions to use cases presented by users.

Elasticsearch becomes Elastic

Elasticsearch was also the name of the company behind the three ELK products. During the keynote they
announced a brand change: Elasticsearch, the company,
becomes “Elastic”. This will definitely cause less confusion, now that the company has a different name than one of its
products. The brand change was well prepared: a new logo and website were launched during the
keynote. I had wondered why there were zero goodies on the first morning of the conference. After the keynote, the
stickers and t-shirts showed up with the new logo and name.

Elastic acquires Found

Another great announcement was made during the keynote: the acquisition of Found. Found offers
Elasticsearch as a service and provides hosted Elasticsearch solutions, so they handle your cluster for you.

Elasticsearch 2.0

Although they didn’t want to make any promises about future releases (features and release dates), they did mention a
few awesome features. They are working on friendlier error messages: no more vague query parsing errors, but errors
that point out exactly what is wrong. Along the same lines, they are also working on simplifying the Query DSL.

A new API will be introduced for re-indexing an index. At the moment you have to create a new index and move the data
yourself. This will become built-in functionality.
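
To give an idea of what "moving the data yourself" involves today, here is a minimal sketch in Python. It only shows
the transformation step: turning hits read from the old index into index actions for the new one. The index names and
documents are made up for illustration, and the action shape (`_index`, `_id`, `_source`) is an assumption about what
your bulk helper expects; in practice you would read the hits with a scroll and send the actions through the bulk API.

```python
def to_bulk_actions(hits, target_index):
    """Turn search hits from the old index into index actions for the new one."""
    for hit in hits:
        yield {
            "_index": target_index,     # write into the new index
            "_id": hit["_id"],          # keep the original document id
            "_source": hit["_source"],  # copy the document body unchanged
        }


# Hypothetical hits, shaped like the entries under "hits.hits" in a search response.
old_hits = [
    {"_index": "posts_v1", "_id": "1", "_source": {"title": "Hello"}},
    {"_index": "posts_v1", "_id": "2", "_source": {"title": "World"}},
]

actions = list(to_bulk_actions(old_hits, "posts_v2"))
```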

Some others:

  • stricter mapping
  • reducers (for computation on aggregations)
  • optimising the cluster state (only push changes to nodes instead of full cluster state)
  • scripting (looking for other solutions to prevent security issues)
  • faster recovery (easier/faster upgrading when in readonly mode)
  • easier installation of plugins

Aggregations

This was an in-depth talk about aggregations. It made it clear to me, again, what kind of complex problems Elasticsearch
solves for you, while still keeping them so accessible.

They built aggregations with speed in mind, which might have an impact on accuracy. Aggregations are calculated on each
shard (in memory) and then merged together on a coordinating node. This might cause some loss of accuracy (for example,
the top 5 on each shard doesn’t necessarily add up to the top 5 overall). They pointed out the importance of shard_size
for this problem: it lets each shard return more candidates than the size that is finally returned to the client.
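
The effect is easy to reproduce with a small simulation. The sketch below is not how Elasticsearch implements it; it
is a toy model in Python, with made-up term counts, where each "shard" reports only its own top terms and the results
are merged afterwards. The function name and its parameters are my own.

```python
from collections import Counter


def top_terms(shards, size, shard_size=None):
    """Merge per-shard top terms and return the overall top `size`."""
    shard_size = shard_size or size
    merged = Counter()
    for shard in shards:
        # Each shard only reports its own top `shard_size` terms.
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# Hypothetical term counts per shard. "c" is the real winner (7 + 7 = 14),
# but it is never the top term on any single shard.
shards = [
    {"a": 10, "b": 8, "c": 7},
    {"d": 9, "e": 8, "c": 7},
]

print(top_terms(shards, size=1))                # [('a', 10)]  inaccurate
print(top_terms(shards, size=1, shard_size=3))  # [('c', 14)]  accurate
```

A bigger shard_size costs more memory and network traffic per shard, so it is a trade-off rather than a free fix.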

Aggregations work on all the documents in your result set. This means that it is far more efficient to limit your result
set by adding filters/queries outside the aggregations instead of inside each aggregation.
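
As an illustration, a request body along these lines filters once at the top level, so every aggregation runs on the
already reduced result set. This is only a sketch written as a Python dict: the field names (`status`, `tag`, `price`)
and aggregation names are invented, and the exact query syntax for filtering depends on your Elasticsearch version.

```python
import json


def build_search_body(status, size=5):
    """Build a search body that filters once, outside the aggregations."""
    return {
        "size": 0,  # we only want aggregation results, no hits
        "query": {
            # One filter limits the result set for all aggregations below.
            "bool": {"filter": [{"term": {"status": status}}]}
        },
        "aggs": {
            "top_tags": {"terms": {"field": "tag", "size": size}},
            "avg_price": {"avg": {"field": "price"}},
        },
    }


body = build_search_body("published")
print(json.dumps(body, indent=2))
```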

The entire talk was filled with examples and walkthroughs of what happens behind the scenes, which made the topics much
easier to understand. I had never heard of breadth-first vs depth-first collection before, and
their examples made it much clearer.
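
As far as I understood it: depth-first builds the full tree of nested buckets and prunes it afterwards, while
breadth-first first decides which top-level buckets survive and only then computes the nested aggregations for those.
On a terms aggregation this is controlled with collect_mode. The sketch below, again as a Python dict, uses
hypothetical field and aggregation names.

```python
def build_nested_terms_body(collect_mode="breadth_first"):
    """Build a nested terms aggregation with an explicit collect mode."""
    return {
        "size": 0,
        "aggs": {
            "actors": {
                "terms": {
                    "field": "actors",
                    "size": 10,
                    # Pick the top 10 actors first, then compute co-stars
                    # only for those buckets.
                    "collect_mode": collect_mode,
                },
                "aggs": {
                    "costars": {"terms": {"field": "actors", "size": 5}}
                },
            }
        },
    }


nested_body = build_nested_terms_body()
```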

This is definitely a talk I’ll watch again when it comes online.

Language clients

Elasticsearch provides a few language-specific clients, and this talk explained why
certain decisions were made. Each language has its own features, but they try to implement a similar interface in
all languages (naming, for example).

The most important quote, for me: “Nobody should have a reason to not use the client”. That’s why they provide low-level
clients. They initially started building the clients because none of the community clients implemented Elasticsearch fully.
They don’t want to enforce things or make assumptions for their users.

We made a slightly higher-level client that builds on the PHP client: elasticsearcher.

ELK in the wild

An entire track was dedicated to use cases. Companies that use the ELK stack gave insight into how they are
using it and how they implemented it. Some names: GitHub, Yelp, Facebook, NASA, US Government, …

The U.S. Geological Survey explained how they use Elasticsearch to store tweets about earthquakes. Because the
data can be queried live, they can detect earthquakes minutes after they occur, which turned out to be a faster way to
detect earthquakes than the sensors they have in the field.

NASA talked about how the data sent back by the Mars Rover is stored in Elasticsearch, allowing NASA to quickly query the
data.

Awesome stuff.

Book

They gave away a pre-release edition of the book “Elasticsearch: The Definitive Guide”.
I was one of the lucky ones to get one (only the first X got it). I have read the first chapters and have already learned
quite a few things. I’m looking forward to finishing it.