Elasticsearch Performance – Impact of parallel clients

In this article I discuss the impact of parallel clients on Elasticsearch (ES) indexing and update performance.

We tested against an index with a single shard, indexing 5 documents per batch, and kept increasing the number of parallel clients to see the impact on performance. The test set was 100K documents of around 400KB each.
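To make the batching concrete, here is a minimal sketch of how 5-document batches could be built. It assumes each batch was sent through the ES `_bulk` endpoint, which the test description above does not state explicitly. The index name `perf-test`, the type name `doc`, and the helper names are hypothetical. Our real clients were Java based; Python is used here only for brevity.

```python
import json
from itertools import islice


def batches(items, size=5):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_bulk_body(docs, index, doc_type):
    """Render (doc_id, source) pairs as an NDJSON body for POST /_bulk.

    Each document becomes two lines: an action line and a source line.
    ES 5.x expects a mapping type, so `_type` is included.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical tiny documents; the real ones were around 400KB each.
    docs = [(str(i), {"body": "x" * 10}) for i in range(12)]
    for batch in batches(docs, 5):
        body = build_bulk_body(batch, "perf-test", "doc")
        print(len(batch), "docs,", body.count("\n"), "lines")
```

With batches of 5, the 100K documents translate into 20,000 requests per run, so per-request overhead is a meaningful part of the total time.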

Our parallel clients were Java based and emulated the service instances that do the actual indexing and updates against ES in production.
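The shape of such a test harness can be sketched as follows. This is not our actual client code: it is a simplified illustration in Python (the real clients were Java), where a thread pool stands in for the parallel clients and `fake_send` is a hypothetical placeholder for the call that sends one batch to ES.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_clients(batch_list, num_clients, send_batch):
    """Send every batch using `num_clients` worker threads.

    `send_batch` takes one batch and returns the number of documents sent.
    Returns (total documents sent, elapsed seconds).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        # Each worker thread plays the role of one parallel client.
        counts = list(pool.map(send_batch, batch_list))
    elapsed = time.perf_counter() - start
    return sum(counts), elapsed


def fake_send(batch):
    """Hypothetical stand-in for an ES request: sleeps instead of doing I/O."""
    time.sleep(0.01)
    return len(batch)


if __name__ == "__main__":
    # 40 batches of 5 dummy documents each.
    batch_list = [list(range(i, i + 5)) for i in range(0, 200, 5)]
    for clients in (5, 10, 15, 20):
        sent, secs = run_clients(batch_list, clients, fake_send)
        print(f"{clients} clients: {sent} docs in {secs:.2f}s")
```

Replacing `fake_send` with a function that posts a batch to the cluster would turn this into a basic load generator. The timings it prints with the sleep-based stub say nothing about ES itself.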

We tested against Elasticsearch version 5.1.1.

We ran the tests on a single ES node without any replicas, using the same index with 5, 10, 15 and 20 parallel clients. The best performance came with between 15 and 20 parallel clients.

The node was otherwise unused: apart from 6 unrelated shards it held no other indexes.

With 5 parallel clients it took around 58-60 minutes to index or update 100K documents. We achieved disk throughput of around 80-85MB/s.

With 10 parallel clients it took around 33-36 minutes to index or update 100K documents. We achieved disk throughput of around 131-143MB/s.

With 15 parallel clients it took around 25-28 minutes to index or update 100K documents. We achieved disk throughput of around 154-164MB/s.

With 20 parallel clients it took around 23-28 minutes to index or update 100K documents. We achieved disk throughput of around 168-170MB/s.
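To compare the four runs more directly, the reported durations can be converted into documents per second and a speedup relative to the 5-client run. The sketch below uses the midpoint of each reported time range, which is my simplification and not something we measured separately.

```python
TOTAL_DOCS = 100_000

# Reported duration ranges in minutes, keyed by number of parallel clients.
RUNS = {5: (58, 60), 10: (33, 36), 15: (25, 28), 20: (23, 28)}


def midpoint(low, high):
    """Midpoint of a reported range (an assumption, not a measured value)."""
    return (low + high) / 2


def docs_per_second(minutes, total_docs=TOTAL_DOCS):
    """Average documents indexed per second for a run of `minutes`."""
    return total_docs / (minutes * 60)


def summarize(runs):
    """Return (clients, minutes, docs/s, speedup vs fewest clients) rows."""
    baseline = midpoint(*runs[min(runs)])
    rows = []
    for clients in sorted(runs):
        minutes = midpoint(*runs[clients])
        rows.append((clients, minutes, docs_per_second(minutes), baseline / minutes))
    return rows


if __name__ == "__main__":
    for clients, minutes, rate, speedup in summarize(RUNS):
        print(f"{clients:>2} clients: {minutes:5.1f} min, "
              f"{rate:5.1f} docs/s, {speedup:.2f}x")
```

On these midpoints the rate works out to roughly 28, 48, 63 and 65 documents per second. Doubling from 5 to 10 clients gives about a 1.7x speedup, while going from 15 to 20 clients adds very little, which matches the observation that the best performance sits between 15 and 20 clients.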

During my research I found no material on the impact of parallel clients on Elasticsearch performance.