Elasticsearch Performance – Impact of number of Shards

In this article I discuss the impact of the number of shards on Elasticsearch (ES) indexing and update performance.

We tested with 5 parallel clients, each indexing 5 documents per batch, and kept increasing the number of shards to see the impact on performance. We tested with 100K and 1M documents. Each document was around 400KB in size.
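To make the load pattern concrete, here is a minimal sketch of how such a client setup could look. It is not the harness we actually used: it assumes the batches go through the bulk API, and the index name `shard_test`, the type name `doc`, the `id` field and the `send` callback are all hypothetical placeholders. The `send` function stands in for whatever posts the body to the `_bulk` endpoint.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Iterator, List

BATCH_SIZE = 5    # documents per bulk request, as in the test
NUM_CLIENTS = 5   # parallel clients, as in the test


def batches(docs: Iterable[Dict], size: int = BATCH_SIZE) -> Iterator[List[Dict]]:
    """Yield consecutive lists of at most `size` documents."""
    batch: List[Dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(index: str, doc_type: str, batch: List[Dict]) -> str:
    """Build the newline-delimited JSON body expected by the _bulk endpoint."""
    lines = []
    for doc in batch:
        # "index" creates the document or replaces an existing one with the same id.
        # The "id" field on each document is an assumption of this sketch.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def run_clients(docs: Iterable[Dict], send: Callable[[str], None],
                index: str = "shard_test", doc_type: str = "doc") -> int:
    """Send every batch through `send` using NUM_CLIENTS worker threads."""
    # For brevity all bodies are built up front; with 400KB documents
    # a real run would stream them instead of holding them in memory.
    bodies = [bulk_body(index, doc_type, b) for b in batches(docs)]
    with ThreadPoolExecutor(max_workers=NUM_CLIENTS) as pool:
        list(pool.map(send, bodies))
    return len(bodies)
```

Passing a function that performs the HTTP POST as `send` turns this into a simple load generator; passing `print` is enough to inspect the request bodies.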

We tested against Elasticsearch version 5.1.1.

We ran our tests on a single ES node without any replicas. We tested the same index with 1, 2, 4, 8 and 16 shards. The best performance we got was with a shard count between 4 and 8.
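The number of primary shards cannot be changed in place on an existing index, so each shard count needs a freshly created index. The sketch below only builds the index-creation request bodies for the shard counts we tested; the index names `shard_test_<n>` are hypothetical, and the actual HTTP call is left out.

```python
import json

SHARD_COUNTS = [1, 2, 4, 8, 16]  # shard counts used in the test


def index_settings(num_shards: int) -> dict:
    """Request body for creating an index with the given number of
    primary shards and no replicas, matching the single-node setup."""
    return {
        "settings": {
            "index": {
                "number_of_shards": num_shards,
                "number_of_replicas": 0,
            }
        }
    }


if __name__ == "__main__":
    for n in SHARD_COUNTS:
        # Each body would be sent with PUT to the (hypothetical) index name.
        print(f"PUT /shard_test_{n}")
        print(json.dumps(index_settings(n), indent=2))
```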

Our node was a blank node with no other indexes or shards on it.

With 1 shard it took around 60 minutes to index or update 100K documents.

With 2 shards it took around 45-50 minutes to index or update 100K documents.

With 4 shards it took around 35-40 minutes to index or update 100K documents.

With 8 shards it took around 30-35 minutes to index or update 100K documents.

With 16 shards we got performance similar to 8 shards.

We then increased our document count to 1 million with 6 shards and got proportional performance, i.e. around 300-350 minutes to index or update 1 million documents.
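The timings are easier to compare as throughput. The sketch below converts the figures reported above into documents per minute and an approximate data rate. It assumes exactly 400KB per document and decimal units (1 MB = 1000 KB), so the MB/s values are rough estimates rather than measurements.

```python
DOC_SIZE_KB = 400  # approximate document size from the test

# Reported (fastest, slowest) minutes for 100K documents, by shard count.
RUNS_100K = {1: (60, 60), 2: (45, 50), 4: (35, 40), 8: (30, 35)}
# Reported range for 1 million documents on 6 shards.
RUN_1M = (300, 350)


def docs_per_minute(docs: int, minutes: float) -> float:
    """Average indexing rate in documents per minute."""
    return docs / minutes


def mb_per_second(docs: int, minutes: float, doc_size_kb: int = DOC_SIZE_KB) -> float:
    """Approximate data rate, assuming every document has the same size."""
    total_mb = docs * doc_size_kb / 1000
    return total_mb / (minutes * 60)


if __name__ == "__main__":
    for shards, (fast, slow) in RUNS_100K.items():
        print(f"{shards:>2} shards: "
              f"{docs_per_minute(100_000, slow):.0f}-{docs_per_minute(100_000, fast):.0f} docs/min, "
              f"{mb_per_second(100_000, slow):.1f}-{mb_per_second(100_000, fast):.1f} MB/s")
    fast, slow = RUN_1M
    print(f"1M docs, 6 shards: "
          f"{docs_per_minute(1_000_000, slow):.0f}-{docs_per_minute(1_000_000, fast):.0f} docs/min")
```

The 1 million document run works out to the same documents-per-minute range as the 8-shard run with 100K documents, which is what "proportional" means here. Keep in mind that the two runs used different shard counts (6 and 8), so the comparison is only approximate.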

Some of the blogs I researched suggest 3 shards per index per node. They basically suggest limiting the shard size in line with the JVM heap size, i.e. around 20-30 GB, so if you plan to index 100GB of data, 3-4 shards should be enough, but if you plan to index 1TB of data you might need 30-40 shards.
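That sizing rule is simple arithmetic: divide the expected data volume by the target shard size and round up. The sketch below uses target sizes of 25 GB and 30 GB, which are my own picks from the quoted 20-30 GB range, and treats 1TB as 1000 GB.

```python
import math


def shards_needed(total_gb: float, target_shard_gb: float) -> int:
    """Smallest shard count that keeps every shard at or below the target size,
    assuming data is spread evenly across shards."""
    return math.ceil(total_gb / target_shard_gb)


if __name__ == "__main__":
    for total_gb in (100, 1000):
        for target_gb in (25, 30):
            print(f"{total_gb} GB at {target_gb} GB per shard: "
                  f"{shards_needed(total_gb, target_gb)} shards")
```

With these targets the formula gives 4 shards for 100GB and 34 to 40 shards for 1TB, in line with the figures above. At the lower end of the range, 20 GB per shard, the same formula gives 5 and 50 shards.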

But from our test we could see that the number of documents did not impact performance that much; rather, the indexing time grew in proportion to the document count.