The audience will learn how to use Elasticsearch efficiently and reliably as the data in their applications scales up. The talk covers tuning Elasticsearch and configuring its internal queues and buffers for heavy indexing. Another takeaway will be some insight into Elasticsearch internals.
Elasticsearch tuning
1. Elasticsearch…When?
• Use cases
1. Search
a. When you need to provide a search experience for users
b. Features like full-text search, custom document scoring, suggestions, result highlighting, etc.
2. Logging
a. Highly scalable ingestion rates
b. Full text search
c. Fast aggregations
2. Elasticsearch…When Not?
• The query DSL is less common and less flexible than PostgreSQL's SQL
• Everything is indexed by default, which adds indexing overhead
• Less control over consistency (no transactions)
• You need to understand tokenizers and analyzers to know how data is stored and how to query it
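To show why analyzers matter when querying, here is a toy sketch in plain Python. The function names are hypothetical, and `toy_analyze` is NOT Lucene's standard tokenizer; it only imitates the two steps that matter here (split into tokens, then lowercase). A term-level query is compared verbatim against the indexed terms, while a full-text query runs the same analysis on the query text first.

```python
import re


def toy_analyze(text: str) -> list[str]:
    """Rough stand-in for an analyzer: split on non-word characters, then lowercase.

    Assumption: real analyzers (e.g. the standard analyzer) have more rules;
    this only mimics the tokenize + lowercase-filter steps.
    """
    tokens = re.findall(r"\w+", text)
    return [t.lower() for t in tokens]


def term_matches(indexed_terms: list[str], query_term: str) -> bool:
    # Term-level queries are not analyzed: the raw string must equal an indexed term
    return query_term in indexed_terms


def match_matches(indexed_terms: list[str], query_text: str) -> bool:
    # Full-text queries analyze the query text first, then look for any matching term
    return any(t in indexed_terms for t in toy_analyze(query_text))


terms = toy_analyze("The Quick-Brown fox!")
print(terms)                          # ['the', 'quick', 'brown', 'fox']
print(term_matches(terms, "Quick"))   # False: "Quick" was stored as "quick"
print(match_matches(terms, "Quick"))  # True: the query text is lowercased too
```

The mismatch between the stored terms and the raw query string is exactly the kind of surprise that comes from not knowing which analyzer a field uses.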
3. Tuning For Indexing
• Disable refresh and replicas for initial loads
• Indexing buffer size
• Give memory to filesystem cache
• Use auto-generated ids
• Disable swapping
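A minimal sketch of the settings behind several of these points, written as Python dicts that would be serialized to JSON (index level) or written into `elasticsearch.yml` (node level). Assumptions: the helper names are hypothetical, and the restored values (`1s`, one replica) are examples rather than recommendations. `index.refresh_interval`, `index.number_of_replicas`, `indices.memory.index_buffer_size` (default 10% of heap) and `bootstrap.memory_lock` are real Elasticsearch settings; no HTTP calls are made here.

```python
import json


def initial_load_settings() -> dict:
    # Body for PUT /<index>/_settings before a large initial load:
    # "-1" turns off periodic refresh, 0 replicas avoids indexing every document twice
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval: str = "1s", replicas: int = 1) -> dict:
    # Body for PUT /<index>/_settings once the load is done (values are assumptions)
    return {"index": {"refresh_interval": refresh_interval, "number_of_replicas": replicas}}


def node_settings_yaml(index_buffer_size: str = "10%") -> str:
    # Node-level settings belong in elasticsearch.yml, not in the index settings API
    settings = {
        "indices.memory.index_buffer_size": index_buffer_size,  # heap share for indexing buffers
        "bootstrap.memory_lock": True,  # lock the JVM heap in RAM so it is not swapped out
    }
    lines = []
    for key, value in settings.items():
        text = str(value).lower() if isinstance(value, bool) else str(value)
        lines.append(f"{key}: {text}")
    return "\n".join(lines)


print(json.dumps(initial_load_settings()))
print(node_settings_yaml())
```

Whether to raise the buffer above the default is workload-dependent and worth measuring; memory that is not given to the heap stays available to the filesystem cache.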
4. Tuning For Indexing (contd.)
• Increase the refresh interval
• Use bulk requests
• Use multiple workers/threads to send data
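A sketch of bulk indexing with auto-generated ids and several workers, assuming Python's standard library only. The bulk API takes newline-delimited JSON: an action line, then the document, with a required trailing newline; leaving `_id` out of the action line lets Elasticsearch generate ids and skip checking for an existing document with the same id. `send_bulk` is a hypothetical stand-in for an HTTP POST to `/_bulk` that just counts documents, and the batch size and worker count are assumptions to tune against your cluster.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator

# Body for PUT /<index>/_settings when near-real-time visibility is not needed
REFRESH_SETTINGS = {"index": {"refresh_interval": "30s"}}


def chunked(docs: list[dict], size: int) -> Iterator[list[dict]]:
    # Split the documents into fixed-size batches, one bulk request each
    for i in range(0, len(docs), size):
        yield docs[i:i + size]


def bulk_body(index: str, docs: Iterable[dict]) -> str:
    # No "_id" in the action line, so Elasticsearch auto-generates ids
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a final newline


def send_bulk(body: str) -> int:
    # Hypothetical stub: a real client would POST `body` to /_bulk here.
    # Each document contributes two lines (action + source).
    return len(body.splitlines()) // 2


def parallel_load(index: str, docs: list[dict], batch_size: int = 1000, workers: int = 4) -> int:
    bodies = [bulk_body(index, batch) for batch in chunked(docs, batch_size)]
    # Several workers keep multiple bulk requests in flight at once
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_bulk, bodies))


docs = [{"n": i} for i in range(2500)]
print(parallel_load("logs", docs, batch_size=1000, workers=3))  # 2500
```

If a real cluster starts rejecting bulk requests (HTTP 429), that is a sign the worker count or batch size is too high.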
5. Tuning For Searching
• Faster hardware
• Document modelling
• Pre-index data
• Mapping identifiers as keywords
• Search rounded dates
• Force merge read-only indices
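A sketch of three of these points as request bodies built in Python, assuming hypothetical field names (`order_id`, `created_at`). Identifiers that are only looked up by exact value are mapped as `keyword`, which suits `term` queries better than numeric types (those are optimized for ranges). Date math such as `now-1h/m` rounds to the minute, so repeated queries within that minute are identical and can be served from cache. The force merge path uses the real `_forcemerge` API with `max_num_segments`; it should only be run on indices that no longer receive writes.

```python
def orders_mapping() -> dict:
    # Body for PUT /<index> (hypothetical field names)
    return {"mappings": {"properties": {
        "order_id": {"type": "keyword"},  # identifier: exact-match lookups only
        "created_at": {"type": "date"},
    }}}


def recent_order_query(order_id: str) -> dict:
    # Body for GET /<index>/_search; both clauses are filters (no scoring, cacheable)
    return {"query": {"bool": {"filter": [
        {"term": {"order_id": order_id}},
        # "/m" rounds "now" down to the minute instead of using the exact millisecond
        {"range": {"created_at": {"gte": "now-1h/m", "lte": "now/m"}}},
    ]}}}


def force_merge_path(index: str, max_segments: int = 1) -> str:
    # POST this path for a read-only index to merge it down to fewer segments
    return f"/{index}/_forcemerge?max_num_segments={max_segments}"


print(force_merge_path("logs-2020.01"))  # /logs-2020.01/_forcemerge?max_num_segments=1
```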
6. Tuning For Searching (contd.)
• Index sorting
• Use the preference parameter to optimize cache utilization
• Warming up global ordinals
• Search as few fields as possible
• Avoid scripts
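The points on this slide can be sketched as request bodies too. Assumptions: all field and index names are hypothetical, and index sorting only helps when queries sort on the same field (it must be set when the index is created). `eager_global_ordinals` builds a keyword field's global ordinals at refresh time instead of on the first terms aggregation; `copy_to` gathers several fields into one, so a query searches a single field; and a `preference` string (any value not starting with `_`, such as a session id) routes a user's repeated searches to the same shard copies, which keeps their caches warm. The query body uses a plain `match` with no scripts.

```python
from urllib.parse import quote


def create_index_body() -> dict:
    # Body for PUT /<index>; index sorting can only be defined at creation time
    return {
        "settings": {"index": {
            "sort.field": "timestamp",  # segments are stored sorted by this field
            "sort.order": "desc",
        }},
        "mappings": {"properties": {
            "timestamp": {"type": "date"},
            # Build global ordinals eagerly, at refresh, for a frequently aggregated field
            "status": {"type": "keyword", "eager_global_ordinals": True},
            "title": {"type": "text", "copy_to": "search_text"},
            "body": {"type": "text", "copy_to": "search_text"},
            "search_text": {"type": "text"},  # query this single field instead of two
        }},
    }


def search_path(index: str, session_id: str) -> str:
    # Same preference string -> same shard copies -> better cache reuse
    return f"/{index}/_search?preference={quote(session_id)}"


def search_body(text: str) -> dict:
    # One field, plain match query, no script_score or scripted fields
    return {"query": {"match": {"search_text": text}}}


print(search_path("products", "user-42"))  # /products/_search?preference=user-42
```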
7. Elasticsearch vs Solr
| Feature | Solr/SolrCloud | Elasticsearch |
|---|---|---|
| Community and developers | Apache Software Foundation and community support | Single commercial entity and its employees |
| Node discovery | Apache ZooKeeper, mature and battle-tested in a large number of projects | Zen, built into Elasticsearch itself; requires dedicated master nodes to be split-brain proof |
| Shard placement | Static in nature; requires manual work to migrate shards | Dynamic; shards can be moved on demand depending on the cluster state |
| Caches | Global, invalidated with each segment change | Per segment, better for dynamically changing data |
| Analytics engine | Facets and powerful streaming aggregations | Sophisticated and highly flexible aggregations |
| Optimized query execution | Currently none | Faster range queries depending on the context |
| Search speed | Best for static data | Very good for rapidly changing data |