Elasticsearch Tuning
Elasticsearch…When?
• Use cases
1. Search
a. When you need to provide a search experience for users
b. Features like full-text search, custom document scoring, suggestions, result highlighting, etc.
2. Logging
a. Highly scalable ingestion rates
b. Full text search
c. Fast aggregations
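The sketch below illustrates the two use cases by building request bodies for the `_search` endpoint:

- For search, a full-text `match` query with highlighting.
- For logging, an hourly `date_histogram` aggregation.

The field names `title` and `@timestamp` are hypothetical. The code assumes Elasticsearch 7.2 or later, where `calendar_interval` replaced the older `interval` parameter. It only builds JSON and does not contact a cluster.

```python
import json


def full_text_search(text: str, field: str = "title") -> dict:
    """Body for POST /<index>/_search: analyzed full-text match with highlighting."""
    return {
        "query": {"match": {field: text}},     # relevance-ranked full-text query
        "highlight": {"fields": {field: {}}},  # return highlighted fragments per hit
    }


def log_volume_per_hour(ts_field: str = "@timestamp") -> dict:
    """Body for POST /<index>/_search: hourly document counts for log data."""
    return {
        "size": 0,  # aggregations only, skip returning hits
        "aggs": {
            "per_hour": {
                "date_histogram": {"field": ts_field, "calendar_interval": "1h"}
            }
        },
    }


print(json.dumps(full_text_search("quick brown fox"), indent=2))
print(json.dumps(log_volume_per_hour(), indent=2))
```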
Elasticsearch…When Not?
• The query DSL is less common and less flexible than SQL in PostgreSQL
• Every field is indexed by default, which adds indexing and storage overhead
• Less control over consistency (no transactions)
• You need to understand tokenizers and analyzers to know how data is stored
and how to query it
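Two points from this slide show up directly in request bodies.

The first is indexing overhead. The mapping below, a minimal sketch, sets `"dynamic": false` so that unknown fields are kept in `_source` but not indexed. It also marks a hypothetical `raw_payload` field with `"index": false`.

The second is analysis. On an analyzed `text` field, a `term` query for `Quick` finds nothing, because the standard analyzer indexed the lowercased term `quick`. A `match` query analyzes its input the same way and does match. All field names here are illustrative assumptions.

```python
import json

# Mapping sketch: only the fields we actually query are indexed.
mapping = {
    "mappings": {
        "dynamic": False,  # unmapped fields stay in _source but are not indexed
        "properties": {
            "message": {"type": "text"},    # analyzed: lowercased and split into terms
            "status": {"type": "keyword"},  # not analyzed: matched as one exact value
            "raw_payload": {"type": "keyword", "index": False},  # kept, not searchable
        },
    }
}

# Body for POST /_analyze: shows which terms the standard analyzer produces.
analyze_request = {"analyzer": "standard", "text": "Quick-Brown Fox"}

# Exact term lookup: "Quick" was indexed as "quick", so this finds nothing.
term_query = {"query": {"term": {"message": "Quick"}}}
# The query text is analyzed like the field, so this matches.
match_query = {"query": {"match": {"message": "Quick"}}}

for body in (mapping, analyze_request, term_query, match_query):
    print(json.dumps(body))
```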
Tuning For Indexing
• Disable refresh and replicas for initial loads
• Make sure the indexing buffer is large enough
• Give memory to the filesystem cache
• Use auto-generated IDs
• Disable swapping
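For an initial load, refresh and replicas can be switched off through the index `_settings` API and restored afterwards. The sketch below builds both request bodies and includes an unused helper that would send them with `urllib`.

Several details are assumptions:

- The cluster is at `http://localhost:9200` and the index is a hypothetical `my-index`.
- The restored values (`1s` refresh, one replica) are common defaults; adjust them to your own setup.

The indexing buffer (`indices.memory.index_buffer_size`) and swapping (`bootstrap.memory_lock`) are node-level settings in `elasticsearch.yml`, not index settings, so they are not shown here.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unsecured cluster
INDEX = "my-index"                # hypothetical index name


def initial_load_settings() -> dict:
    # "-1" disables periodic refresh; no replicas means each doc is indexed once.
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def after_load_settings(refresh: str = "1s", replicas: int = 1) -> dict:
    # Restore search visibility and redundancy once the load has finished.
    return {"index": {"refresh_interval": refresh, "number_of_replicas": replicas}}


def put_settings(body: dict, index: str = INDEX) -> int:
    """PUT /<index>/_settings against a live cluster; returns the HTTP status."""
    req = urllib.request.Request(
        f"{ES_URL}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    print(json.dumps(initial_load_settings()))
    print(json.dumps(after_load_settings()))
    # put_settings(initial_load_settings())  # uncomment when a cluster is available
```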
Tuning For Indexing (contd.)
• Increase the refresh interval
• Use bulk requests
• Use multiple workers/threads to send data
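Bulk requests, auto-generated IDs and parallel senders fit together as follows. This minimal sketch builds newline-delimited `_bulk` bodies whose action lines omit `_id`, so Elasticsearch generates the IDs. It then cuts the documents into batches and hands each batch to a thread pool.

`send_bulk` is a hypothetical callable supplied by the caller, for example one that POSTs to `/_bulk` with `Content-Type: application/x-ndjson`. The batch size of 1,000 and the four workers are illustrative values, not figures from the slides.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


def bulk_body(docs: List[dict], index: str) -> str:
    """NDJSON body for POST /_bulk; no _id, so Elasticsearch auto-generates IDs."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document source
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def batches(docs: Iterable, size: int) -> Iterator[List]:
    """Group an iterable into lists of at most `size` items."""
    batch: List = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def parallel_bulk(docs: Iterable[dict], index: str,
                  send_bulk: Callable[[str], None],
                  batch_size: int = 1000, workers: int = 4) -> int:
    """Send bulk batches from several threads; returns the number of batches."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submits every batch up front; a production loader would bound this queue.
        futures = [pool.submit(send_bulk, bulk_body(b, index))
                   for b in batches(docs, batch_size)]
        for f in futures:
            f.result()  # re-raise any error from a worker thread
    return len(futures)


if __name__ == "__main__":
    received: List[str] = []
    count = parallel_bulk(({"n": i} for i in range(2500)), "my-index", received.append)
    print(count)  # 3 batches: 1000 + 1000 + 500 documents
```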
Tuning For Searching
• Faster hardware
• Document modelling
• Pre-index data
• Map identifiers as keywords
• Search rounded dates
• Force merge read-only indices
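Several of these tips come down to the shape of the mapping and the query. The sketch below covers four of them:

- **Identifiers as keywords.** A numeric product identifier is mapped as `keyword`, since it is only looked up by exact value and never range-queried.
- **Pre-indexed data.** A price bucket is computed at index time so searches can use a plain `terms` aggregation.
- **Rounded dates.** The date filter uses rounded `now` date math, so repeated requests within the same minute can reuse cached filter results.
- **Force merge.** The endpoint for a read-only index is noted in a comment.

Field names, bucket boundaries and the index name are hypothetical.

```python
import json

# Mapping: exact-match identifiers as keyword; bucket field filled at index time.
mapping = {
    "mappings": {
        "properties": {
            "product_id": {"type": "keyword"},   # looked up by exact value only
            "price": {"type": "double"},
            "price_range": {"type": "keyword"},  # pre-indexed bucket
            "created": {"type": "date"},
        }
    }
}


def price_range(price: float) -> str:
    """Compute the bucket before indexing instead of using a range aggregation."""
    if price < 10:
        return "0-10"
    if price < 100:
        return "10-100"
    return "100-"


def recent_docs_query() -> dict:
    # "/m" rounds to the minute, so identical filters within a minute are cacheable.
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"created": {"gte": "now-1h/m", "lte": "now/m"}}}
                ]
            }
        },
        "aggs": {"by_price": {"terms": {"field": "price_range"}}},
    }


# For an index that no longer receives writes:
#   POST /my-old-index/_forcemerge?max_num_segments=1

doc = {"product_id": "12345", "price": 42.0}
doc["price_range"] = price_range(doc["price"])
print(json.dumps(doc))
print(json.dumps(recent_docs_query()))
```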
Tuning For Searching (contd.)
• Index sorting
• Use preference to optimize cache utilization
• Warm up global ordinals
• Search as few fields as possible
• Avoid scripts
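Most of these tips are declarative settings. The sketch below builds an index-creation body with three of them:

- Index sorting on a hypothetical `timestamp` field.
- `eager_global_ordinals` on a `keyword` field assumed to be used in `terms` aggregations.
- `copy_to`, so a query searches one combined field instead of several.

It also builds a search URL with a `preference` string, here a hypothetical session ID. This routes one user's repeated searches to the same shard copies, where cached results are more likely to be reused. The query itself avoids scripts. All names are illustrative assumptions.

```python
import json
from urllib.parse import urlencode

create_index_body = {
    "settings": {
        "index": {
            # Index sorting can only be defined when the index is created.
            "sort.field": "timestamp",
            "sort.order": "desc",
        }
    },
    "mappings": {
        "properties": {
            "timestamp": {"type": "date"},
            # Build global ordinals at refresh time, not on the first aggregation.
            "category": {"type": "keyword", "eager_global_ordinals": True},
            # Copy both text fields into one so queries target a single field.
            "name": {"type": "text", "copy_to": "name_and_plot"},
            "plot": {"type": "text", "copy_to": "name_and_plot"},
            "name_and_plot": {"type": "text"},
        }
    },
}


def search_url(index: str, session_id: str) -> str:
    """Search path with a custom preference so a user hits the same shard copies."""
    return f"/{index}/_search?" + urlencode({"preference": session_id})


# One field, no script: cheaper than a multi_match over name and plot.
query = {"query": {"match": {"name_and_plot": "space opera"}}}

print(json.dumps(create_index_body["settings"]))
print(search_url("movies", "session-42"))
print(json.dumps(query))
```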
Elasticsearch vs Solr
Feature | Solr/SolrCloud | Elasticsearch
Community and developers | Apache Software Foundation and community support | Single commercial entity and its employees
Node discovery | Apache ZooKeeper, mature and battle-tested in a large number of projects | Zen, built into Elasticsearch itself; requires dedicated master nodes to be split-brain proof
Shard placement | Static in nature; requires manual work to migrate shards | Dynamic; shards can be moved on demand depending on the cluster state
Caches | Global, invalidated with each segment change | Per segment, better for dynamically changing data
Analytics engine | Facets and powerful streaming aggregations | Sophisticated and highly flexible aggregations
Optimized query execution | Currently none | Faster range queries depending on the context
Search speed | Best for static data | Very good for rapidly changing data
