Deep Dive Into Elasticsearch
Kunal Kapoor
Software Consultant
Knoldus Software LLP
● What is Elasticsearch
● Getting Started
● Key Terminologies
● CRUD Operations
● Understanding the physical layout
● What happens when you index a document
● How to make an inverted index mutable
● How per-segment search works
● How a delete operation works
● Segment Merging
What is Elasticsearch
● Search engine based on Lucene.
● Provides near real-time search
● Distributed
● Fault Tolerant
● Notable users:
– Facebook
– GitHub
– LinkedIn
Getting Started
● Download the elasticsearch distribution from
● To start the elasticsearch server run the following
command from within the extracted directory
– ./bin/elasticsearch
● Once the server or node is created you can check the
health of your cluster by running
– curl 'localhost:9200/_cat/health?v'
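
The same health check can be scripted. The following is a minimal sketch using only the Python standard library; it assumes a node listening on localhost:9200, as in the curl command above. The helper names `health_url` and `fetch_health` are illustrative and not part of any Elasticsearch client.

```python
from urllib.request import urlopen


def health_url(host: str = "localhost", port: int = 9200) -> str:
    """Build the same URL that the curl command above requests."""
    return f"http://{host}:{port}/_cat/health?v"


def fetch_health(host: str = "localhost", port: int = 9200, timeout: float = 5.0) -> str:
    """Return the plain-text health table. Requires a running node."""
    with urlopen(health_url(host, port), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, once a node is running:
#   print(fetch_health())
```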

(Lucene indices)
Inverted Index
● Data structure storing a mapping, from content such as
words or numbers, to its locations in a database file, or a
set of documents.
● Provides full-text search
● Consists of 2 parts
– Sorted Dictionary
– Postings
● Immutable

Terms          Document1  Document2
elasticsearch  ✓          ✓
is             ✓          ✓
an             ✓          -
awesome        ✓          -
search         ✓          -
engine         ✓          -
not            -          ✓
a              -          ✓
database       -          ✓
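
The table above can be reproduced with a few lines of Python. This is only a sketch of the idea, not how Lucene stores its index: the two document texts are assumptions reconstructed from the terms in the table, and `build_inverted_index` and `search` are hypothetical helper names.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Sorted dictionary: terms in lexicographic order.
    # Postings: the sorted document ids for each term.
    return {term: sorted(postings[term]) for term in sorted(postings)}


def search(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Assumed source sentences, reconstructed from the table above.
DOCS = {
    "Document1": "Elasticsearch is an awesome search engine",
    "Document2": "Elasticsearch is not a database",
}
INDEX = build_inverted_index(DOCS)
```

Looking up a term is a dictionary access followed by reading its postings list, which is what makes full-text search cheap once the index has been built.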
How to make an inverted index mutable
● Earlier, the whole inverted index would be rewritten to
disk with the changes.
● Very costly approach
● Lucene introduced the concept of per-segment search.
● Now a Lucene index would mean a collection of
segments plus a commit point.
● A commit point is a file that contains the list of segments
that are ready for search.
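
The "segments plus a commit point" idea can be sketched as a small data model. This is an illustration under stated assumptions, not Lucene's on-disk format: `Segment` and `LuceneIndexSketch` are hypothetical names, and a segment is reduced to a mapping from a term to document ids.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """A mini inverted index (term -> tuple of doc ids); never modified once written."""
    name: str
    postings: dict


@dataclass
class LuceneIndexSketch:
    # Every segment written so far, keyed by name.
    segments: dict = field(default_factory=dict)
    # Commit point: names of the segments that are ready for search.
    commit_point: list = field(default_factory=list)

    def add_segment(self, segment, commit=True):
        """Store a new segment; list it in the commit point only if committed."""
        self.segments[segment.name] = segment
        if commit:
            # A new commit point is written rather than editing the old one.
            self.commit_point = self.commit_point + [segment.name]

    def search(self, term):
        """Query each committed segment in turn and combine the hits."""
        hits = []
        for name in self.commit_point:
            hits.extend(self.segments[name].postings.get(term, ()))
        return sorted(set(hits))
```

Changes are expressed by adding segments, never by rewriting existing ones, so the cost of an update no longer depends on the size of the whole index.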
How per-segment search works
● New documents are collected in an in-memory buffer.
● Every so often, the buffer is committed (refresh)
– A new supplementary segment with a commit point is
written to file-system cache.
– The transaction log is updated with the request for a
full commit later.
● The buffer is cleared and the segment is made available
for search.
(Figure: in-memory buffer and transaction log)
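
The refresh cycle described above can be simulated in a few lines. This is a toy model, not Elasticsearch's implementation: `ShardSketch` is a hypothetical class, segments live in plain Python dictionaries instead of the file-system cache, and the sketch appends each operation to the transaction log as it arrives, which is an assumption about timing.

```python
class ShardSketch:
    def __init__(self):
        self.buffer = []    # in-memory buffer of (doc_id, text); not yet searchable
        self.translog = []  # transaction log: operations awaiting a full commit
        self.segments = []  # searchable segments, each a dict: term -> set of doc ids

    def index(self, doc_id, text):
        """Collect a new document in the buffer and record it in the log."""
        self.buffer.append((doc_id, text))
        self.translog.append(("index", doc_id, text))

    def refresh(self):
        """Turn the buffer into a new segment and make it searchable."""
        if not self.buffer:
            return
        segment = {}
        for doc_id, text in self.buffer:
            for term in text.lower().split():
                segment.setdefault(term, set()).add(doc_id)
        self.segments.append(segment)
        self.buffer.clear()  # the log is kept for the later full commit

    def search(self, term):
        """Search every segment; buffered documents are invisible."""
        hits = set()
        for segment in self.segments:
            hits |= segment.get(term.lower(), set())
        return sorted(hits)
```

A document indexed into the sketch is not returned by `search` until `refresh` has run, which mirrors the "near real-time" behaviour: the delay between indexing and visibility is the refresh interval.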

● The Definitive Guide by Clinton Gormley and Zachary

  3. A node has multiple shards within it. An ES index can span multiple nodes through shards. A shard is the lowest-level worker and contains the data that is inserted into the index.
  4. A Lucene index, or shard, contains various segments that are like mini indices. These mini indices contain the data structures required by Elasticsearch to provide near-real-time search.