In this article I discuss the impact of auto-generated document IDs versus external document IDs on Elasticsearch (ES) indexing performance.
When indexing documents in ES, we can either let ES auto-generate the document ID or pass our own external document ID in the IndexRequest. Auto-generated IDs are considered faster: because ES generates a unique ID itself, it does not have to look up whether that ID already exists. With an external ID, ES must first check whether a document with that ID already exists. If it does not, ES indexes the document under that ID; if it does, ES overwrites the existing document.
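To make the difference concrete, here is a minimal sketch in Python of how the two methods differ at the request level when documents are sent through the ES bulk endpoint. The only difference is whether the action line carries an `_id`. The helper name `bulk_body` and the index name `perf-test` are made up for illustration, and the sketch only builds the request body; it does not talk to a cluster.

```python
import json


def bulk_body(index_name, docs, ids=None):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk endpoint.

    ids=None  -> action lines omit "_id", so ES generates the ID itself.
    ids=[...] -> each action line carries the caller-supplied "_id".
    (bulk_body is a hypothetical helper, not part of any ES client.)
    """
    if ids is not None and len(ids) != len(docs):
        raise ValueError("ids and docs must have the same length")
    lines = []
    for position, doc in enumerate(docs):
        action = {"index": {"_index": index_name}}
        if ids is not None:
            action["index"]["_id"] = str(ids[position])
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(doc))     # document source line
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


docs = [{"title": "first"}, {"title": "second"}]
print(bulk_body("perf-test", docs))                 # auto-generated IDs
print(bulk_body("perf-test", docs, ids=[101, 102])) # external IDs
```

Sending the second body twice would leave two documents in the index, each overwritten once, whereas sending the first body twice would leave four documents.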
We decided to run a test to measure the impact of auto-generated and external document IDs on indexing performance.
We tested against an index with a single shard, indexing 5 documents per batch from 5 parallel clients. We indexed 100K documents twice: once with auto-generated document IDs and once with external document IDs.
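A setup like this can be sketched as a small Python harness: 100K documents in batches of 5 gives 20,000 batches, spread over 5 worker threads. This is not the code we ran, only an illustration of the shape of the test. `fake_send_batch` is a hypothetical stand-in for a real bulk request, so the timing it produces says nothing about ES.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_batches(total_docs, batch_size):
    """Yield lists of document numbers, batch_size at a time."""
    for start in range(0, total_docs, batch_size):
        yield list(range(start, min(start + batch_size, total_docs)))


def run_benchmark(send_batch, total_docs=100_000, batch_size=5, clients=5):
    """Push every batch through `clients` threads.

    Returns (documents_sent, elapsed_seconds).
    """
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        sent = sum(pool.map(send_batch, make_batches(total_docs, batch_size)))
    return sent, time.perf_counter() - started


def fake_send_batch(batch):
    """Hypothetical stub: a real test would issue a bulk request here."""
    return len(batch)


sent, elapsed = run_benchmark(fake_send_batch)
print(f"sent {sent} documents in {elapsed:.2f}s")
```

To compare the two methods, the same harness would be run twice with a `send_batch` that differs only in whether it supplies document IDs.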
Indexing 100K documents took 56 minutes with auto-generated document IDs and 58 minutes with external document IDs.
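Expressed as throughput, these timings work out as follows. The short Python calculation below uses only the numbers reported above; the function name is illustrative.

```python
def docs_per_second(total_docs, minutes):
    """Average indexing throughput over the whole run."""
    return total_docs / (minutes * 60)


auto_rate = docs_per_second(100_000, 56)
external_rate = docs_per_second(100_000, 58)
# Extra time taken by external IDs, relative to the auto-generated run.
extra_time_pct = (58 - 56) / 56 * 100

print(round(auto_rate, 1))       # 29.8 docs/s
print(round(external_rate, 1))   # 28.7 docs/s
print(round(extra_time_pct, 1))  # 3.6 percent longer
```

Each figure comes from a single run per method, so part of the gap may be run-to-run variation.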
So the auto-generated document ID method was faster, but only by a small margin: 2 minutes, or roughly 3.6%. If the document count runs into the millions or billions, however, that difference could have a significant impact on total indexing time. In many cases the external document ID method is still preferred, as it gives us better control over the document ID.