Elasticsearch
For BEGINNERS
Neil Baker
Software Engineer (Zapelin S.L.)
What is elasticsearch?
Elasticsearch is a free and open source distributed inverted index created by Shay Banon.
Built on top of Apache Lucene
- Lucene is one of the most popular Java-based full-text search index implementations.
First public release, v0.4, in February 2010.
Developed in Java, so inherently cross-platform.
Which companies use elasticsearch?
Why Elasticsearch?
Easy to scale
Everything is one JSON call away (RESTful API)
Unleashed power of Lucene under the hood
Excellent Query DSL
Multi-tenancy
Support for advanced search features (Full Text)
Configurable and Extensible
Document Oriented
Schema free
Conflict management
Active community
Easy to Scale
Elasticsearch allows you to start small and grows
with your business. It is built to scale horizontally out
of the box.
As you need more capacity, just add more nodes and
let the cluster reorganize itself to take advantage of
the extra hardware.
RESTful API
Elasticsearch is API driven. Almost any action can be performed through a simple
RESTful API using JSON over HTTP. Client libraries already exist for most popular
languages.
Responses are always in JSON, which is both machine and human readable.
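To make "JSON over HTTP" concrete, here is a minimal Python sketch that prepares, but does not send, a request to index one document. The helper name `build_index_request` is hypothetical, and the host, index, and type names are assumptions for illustration; it uses only the standard library.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Prepare (but do not send) a PUT request that would index one document."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed local node and example names; nothing is sent over the network here.
request = build_index_request(
    "http://localhost:9200", "test-data", "cities", 21, {"city": "Boston"}
)
print(request.get_method(), request.full_url)

# To actually send it against a running node:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```

The request body and the reply are both plain JSON, so any language with an HTTP client can talk to the cluster the same way.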
Excellent Query DSL
The REST API exposes a rich and capable query DSL that is nonetheless easy to use. Every query is just a
JSON object that can contain practically any type of query, or several of them combined.
Expressing the yes/no parts of a query as filters lets Elasticsearch cache them, which speeds
up common queries and complex queries whose parts can be reused.
Faceting, another very common search feature, is requested alongside a search and returned together with the
search results, ready for you to use.
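Because every query is just a JSON object, queries can be composed from small pieces. The sketch below builds a `bool` query that combines a scoring `match` clause with a non-scoring `term` filter. The helper functions are hypothetical conveniences, not part of any client library, and the field names are assumptions.

```python
import json


def match(field, text):
    """Full-text clause: contributes to the relevance score."""
    return {"match": {field: text}}


def term(field, value):
    """Exact-value clause: a yes/no test, suitable for use as a filter."""
    return {"term": {field: value}}


def bool_query(must=(), filters=()):
    """Combine scoring clauses (must) with non-scoring clauses (filter)."""
    return {"bool": {"must": list(must), "filter": list(filters)}}


search_body = {
    "query": bool_query(
        must=[match("city", "boston")],
        filters=[term("abbreviation", "ma")],
    )
}
print(json.dumps(search_body, indent=2))
```

The printed object is what would be sent as the body of a `_search` request.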
Per-operation Persistence
Elasticsearch puts your data safety first. Document changes are recorded in
transaction logs on multiple nodes in the cluster to minimize the chance of any
data loss.
Multi-tenancy
You can host multiple indexes on one Elasticsearch installation, whether a single node
or a cluster. Each index can have multiple "types", which group different kinds of
documents within that index.
The nice thing is that you can query multiple types and multiple indexes
with one simple query. This opens up quite a lot of options.
Support for advanced search features (Full Text)
Elasticsearch uses Lucene under the covers to provide the most powerful full text search
capabilities available in any open source product.
Search comes with multi-language support, a powerful query language, support for
geolocation, context-aware did-you-mean suggestions, autocomplete and search snippets.
Script support in filters and scorers.
Configurable and Extensible
Many Elasticsearch settings can be changed while Elasticsearch is running, but some require a restart (and in some cases
reindexing). Most settings can also be changed through the REST API.
Elasticsearch has several extension points: site plugins (which let you serve static content from ES, such as monitoring JavaScript apps),
rivers (for feeding data into Elasticsearch), and plugins that let you add modules or components within Elasticsearch itself. This allows you
to swap out almost every part of Elasticsearch fairly easily, if you so choose.
If you need to add REST endpoints to your Elasticsearch cluster, that is easily done as well.
Document Oriented
Store complex real-world entities in Elasticsearch as structured JSON
documents. All fields are indexed by default, and all of those indices can be
used in a single query to return results at breathtaking speed.
Schema free
Elasticsearch allows you to get started easily. Toss it a JSON document
and it will try to detect the data structure, index the data and make it
searchable. Later, apply your domain-specific knowledge of your data
to customize how it is indexed.
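The idea of detecting a document's structure can be illustrated with a toy type-inference routine. This is only a sketch: the function names are hypothetical, the type labels are simplified, and Elasticsearch's real dynamic mapping has more rules (date detection, for example) than are shown here.

```python
def infer_field_type(value):
    """Guess a simplified field type from a JSON value (illustrative only)."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return {"properties": infer_mapping(value)}
    return "unknown"


def infer_mapping(document):
    """Build a field -> type description for every field in the document."""
    return {field: infer_field_type(value) for field, value in document.items()}


city = {
    "city": "Boston",
    "rank": 21,
    "land_area": 48.277,
    "location": {"lat": 42.332},
}
mapping = infer_mapping(city)
print(mapping)
```

An explicit mapping replaces this guesswork once you know how each field should be analyzed.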
Conflict management
Optimistic version control can be used where needed to ensure that data is
never lost due to conflicting changes from multiple processes.
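Optimistic version control can be sketched with a small in-memory store: a write states which version it expects to replace, and is rejected if another process has changed the document in the meantime. The class and exception names are hypothetical; this illustrates the idea and is not Elasticsearch's implementation.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the document."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Reject the write if the caller's view of the document is out of date.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, source)
        return new_version


store = VersionedStore()
v1 = store.put("21", {"population2010": 617594})
v2 = store.put("21", {"population2012": 636479}, expected_version=v1)

# A second writer that still believes the document is at version 1 is rejected.
try:
    store.put("21", {"population2012": 0}, expected_version=v1)
    conflict_detected = False
except VersionConflict:
    conflict_detected = True
print(v1, v2, conflict_detected)
```

The rejected writer can re-read the document and retry, so no update is silently overwritten.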
Active community
The community, besides creating nice tools and plugins, is very
helpful and supportive. The overall vibe is really great, and this is an
important metric for any OSS project.
There are also some books currently being written by community
members, and many blog posts around the net sharing experiences
and knowledge.
BASIC CONCEPTS
Cluster :
A cluster consists of one or more nodes which share the same cluster name. Each cluster has a single master node which is chosen automatically by
the cluster and which can be replaced if the current master node fails.
Node :
A node is a running instance of elasticsearch which belongs to a cluster. Multiple nodes can be started on a single server for testing purposes, but
usually you should have one node per server.
At startup, a node will use unicast (or multicast, if specified) to discover an existing cluster with the same cluster name and will try to join that cluster.
Index :
An index is like a ‘database’ in a relational database. It has a mapping which defines multiple types.
An index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.
Type :
A type is like a ‘table’ in a relational database. Each type has a list of fields that can be specified for documents of that type. The mapping defines
how each field in the document is analyzed.
Document :
A document is a JSON document which is stored in elasticsearch. It is like a row in a table in a relational database. Each document is stored in an
index and has a type and an id.
A document is a JSON object (also known in other languages as a hash / hashmap / associative array) which contains zero or more fields, or
key-value pairs. The original JSON document that is indexed will be stored in the _source field, which is returned by default when getting or searching
for a document.
Field :
A document contains a list of fields, or key-value pairs. The value can be a simple (scalar) value (e.g. a string, integer, date), or a nested structure like
an array or an object. A field is similar to a column in a table in a relational database.
The mapping for each field has a field ‘type’ (not to be confused with document type) which indicates the type of data that can be stored in that field,
e.g. integer, string, object. The mapping also allows you to define (amongst other things) how the value for a field should be analyzed.
Mapping :
A mapping is like a ‘schema definition’ in a relational database. Each index has a mapping, which defines each type within the index, plus a number
of index-wide settings. A mapping can either be defined explicitly, or it will be generated automatically when a document is indexed.
Shard :
A shard is a single Lucene instance. It is a low-level “worker” unit which is managed automatically by elasticsearch. An index is a logical namespace
which points to primary and replica shards.
Elasticsearch distributes shards amongst all nodes in the cluster, and can move shards automatically from one node to another in the case of node
failure, or the addition of new nodes.
Primary Shard :
Each document is stored in a single primary shard. When you index a document, it is indexed first on the primary shard, then on all replicas of the
primary shard. By default, an index has 5 primary shards. You can specify fewer or more primary shards to scale the number of documents that
your index can handle.
Replica Shard :
Each primary shard can have zero or more replicas. A replica is a copy of the primary shard, and has two purposes:
1) increase failover: a replica shard can be promoted to a primary shard if the primary fails.
2) increase performance: get and search requests can be handled by primary or replica shards.
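How primaries and replicas might be spread over nodes can be sketched with a simple round-robin placement that never puts a replica on the same node as its primary. The function `allocate_shards` and the node names are hypothetical, and the real allocator weighs many more factors; this sketch also raises an error where a real cluster would simply leave the extra replicas unassigned.

```python
def allocate_shards(nodes, primaries, replicas):
    """Return {node: [shard labels]} with every copy of a shard on a different node."""
    if replicas >= len(nodes):
        raise ValueError("need more nodes than replicas to place every copy")
    layout = {node: [] for node in nodes}
    for shard in range(primaries):
        for copy in range(replicas + 1):
            # Consecutive copies of one shard land on consecutive nodes,
            # so a primary (copy 0) and its replicas never share a node.
            node = nodes[(shard + copy) % len(nodes)]
            label = f"P{shard}" if copy == 0 else f"R{shard}"
            layout[node].append(label)
    return layout


layout = allocate_shards(["node-1", "node-2", "node-3"], primaries=5, replicas=1)
for node, shards in layout.items():
    print(node, shards)
```

If any single node in this layout fails, every shard still has one copy on a surviving node, which is what allows a replica to be promoted to primary.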
ElasticSearch Routing
All of your data lives in a primary shard, somewhere in the cluster. You may have five shards or five hundred, but any particular
document is located in only one of them. Routing is the process of determining which shard that document will reside in.
Without a routing value, Elasticsearch has no idea where to look for your document at search time. The documents were spread across the cluster, so
Elasticsearch has no choice but to broadcast the request to all shards. This is a non-negligible overhead and can easily impact
performance.
Wouldn’t it be nice if we could tell Elasticsearch which shard the document lives in? Then it would only have to search one shard
to find the document(s) that you need.
Routing ensures that all documents with the same routing value are placed on the same shard, eliminating the need to broadcast
searches.
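The routing rule boils down to hashing the routing value and taking the remainder modulo the number of primary shards. The sketch below uses CRC32 as a stand-in hash purely for illustration (Elasticsearch uses its own hash function internally), and `pick_shard` is a hypothetical name.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value to a shard number: hash(routing) % number_of_primary_shards."""
    # CRC32 is an assumption standing in for the real hash function.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


# Every document routed by the same value resolves to the same shard.
orders = [("order-1", "customer-a"), ("order-2", "customer-a"), ("order-3", "customer-b")]
for doc_id, routing in orders:
    print(doc_id, "->", "shard", pick_shard(routing, 5))
```

The formula also shows why the number of primary shards cannot be changed after an index is created: a different modulus would send existing routing values to different shards.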
Cluster Architecture
Index Request
Search Request
ELASTIC VS. MySQL
Elasticsearch    MySQL
Indices          Database
Types            Tables
Documents        Rows
Keys             Columns
Performance
(Core i7, 2 GHz, 8 GB RAM, 128 GB SSD)
Insert of 10 million records:
Elasticsearch: 23 minutes
MySQL without index: 56 minutes
MySQL with index: 228 minutes
Select name and first name of 100 entries:
Elasticsearch: 5 ms
MySQL: 9 ms
Select of 100 full entries:
Elasticsearch: 5 ms
MySQL: 9 ms
Select of the next 100 full entries:
Elasticsearch: 4 ms
MySQL: 18 ms
INSTALL ELASTIC
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.1.1.tar.gz
tar -xzf elasticsearch-5.1.1.tar.gz
cd elasticsearch-5.1.1/
./bin/elasticsearch
Is it running?
GET http://localhost:9200/?pretty
Response :
{
"name": "Vivisector",
"cluster_name": "elasticsearch",
"version": {
"number": "2.3.3",
"build_hash": "218bdf10790eef486ff2c41a3df5cfa32dadcfde",
"build_timestamp": "2016-05-17T15:40:04Z",
"build_snapshot": false,
"lucene_version": "5.5.0"
},
"tagline": "You Know, for Search"
}
LET'S PLAY WITH ELASTICSEARCH
Indexing a document
Request :
$ curl -XPUT "http://localhost:9200/test-data/cities/21" -d '{
"rank": 21,
"city": "Boston",
"state": "Massachusetts",
"population2010": 617594,
"land_area": 48.277,
"location": {
"lat": 42.332,
"lon": 71.0202 },
"abbreviation": "MA"
}'
Response : {"ok":true,"_index":"test-data","_type":"cities","_id":"21","_version":1}
Getting a document
Request:
$ curl -XGET "http://localhost:9200/test-data/cities/21?pretty"
Response:
{
"_index" : "test-data",
"_type" : "cities",
"_id" : "21",
"_version" : 1,
"exists" : true, "_source" : {
"rank": 21,
"city": "Boston",
"state": "Massachusetts",
"population2010": 617594,
"land_area": 48.277,
"location": {
"lat": 42.332,
"lon": 71.0202 },
"abbreviation": "MA"
}
}
Updating a document
Request :
$ curl -XPUT "http://localhost:9200/test-data/cities/21" -d '{
"rank": 21,
"city": "Boston",
"state": "Massachusetts",
"population2010": 617594,
"population2012": 636479,
"land_area": 48.277,
"location": {
"lat": 42.332,
"lon": 71.0202 },
"abbreviation": "MA"
}'
Response : {"ok":true,"_index":"test-data","_type":"cities","_id":"21","_version":2}
Searching
Searching and querying take the form: http://localhost:9200/[index]/[type]/[operation]
Search across all indexes and all types
http://localhost:9200/_search
Search across all types in the test-data index.
http://localhost:9200/test-data/_search
Search explicitly for documents of type cities within the test-data index.
http://localhost:9200/test-data/cities/_search
Search explicitly for documents of type cities within the test-data index using paging.
http://localhost:9200/test-data/cities/_search?size=5&from=10
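The URL patterns above can be assembled programmatically. The following sketch uses a hypothetical helper, `search_url`, that takes an optional index, an optional type, and optional paging parameters. Paging parameters are passed as a dict because `from` is a reserved word in Python.

```python
from urllib.parse import urlencode


def search_url(base_url, index=None, doc_type=None, params=None):
    """Build a _search URL of the form [base]/[index]/[type]/_search?[params]."""
    parts = [base_url.rstrip("/")]
    if index:
        parts.append(index)
        if doc_type:
            parts.append(doc_type)
    parts.append("_search")
    url = "/".join(parts)
    if params:
        url += "?" + urlencode(params)
    return url


base = "http://localhost:9200"
print(search_url(base))
print(search_url(base, "test-data"))
print(search_url(base, "test-data", "cities"))
print(search_url(base, "test-data", "cities", {"size": 5, "from": 10}))
```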
There are 3 different types of search queries:
- Full Text Search (query string)
- Structured Search (filter)
- Analytics (facets)
Full Text Search (query string)
In this case you will be searching in bits of natural language for (partially) matching query strings. The Query DSL
alternative for searching for “Boston” in all documents would look like:
Request:
$ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty=true" -d '{
"query": { "query_string": { "query": "boston" }}}'
Response: {
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0 },
"hits" : {
"total" : 1,
"max_score" : 6.1357985,
"hits" : [ {
"_index" : "test-data",
"_type" : "cities",
"_id" : "21",
"_score" : 6.1357985, "_source" : {"rank":"21","city":"Boston",...}
} ]
}
}...
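On the client side, the interesting part of such a response is the inner `hits.hits` array. The sketch below parses a trimmed copy of the response shown above and pulls out the id, score, and source of each hit. The `_source` is shortened to one field here, and `summarize_hits` is a hypothetical helper.

```python
import json

# Trimmed copy of the search response above; only one source field is kept.
SAMPLE_RESPONSE = """
{
  "took": 5,
  "timed_out": false,
  "hits": {
    "total": 1,
    "max_score": 6.1357985,
    "hits": [
      {"_index": "test-data", "_type": "cities", "_id": "21",
       "_score": 6.1357985, "_source": {"city": "Boston"}}
    ]
  }
}
"""


def summarize_hits(response_text):
    """Return (id, score, source) for every hit in a search response."""
    response = json.loads(response_text)
    return [
        (hit["_id"], hit["_score"], hit["_source"])
        for hit in response["hits"]["hits"]
    ]


for doc_id, score, source in summarize_hits(SAMPLE_RESPONSE):
    print(doc_id, score, source["city"])
```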
Structured Search (filter)
Structured search is about interrogating data that has inherent structure. Dates, times and numbers are all structured: they have a precise format that you
can perform logical operations on. Common operations include comparing ranges of numbers or dates, or determining which of two values is larger.
With structured search, the answer to your question is always a yes or no; something either belongs in the set or it does not. Structured search does not
worry about document relevance or scoring; it simply includes or excludes documents.
Request:
$ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty=true" -d '{
"query": { "bool": { "filter": { "term": { "city": "boston" }}}}}'
$ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty" -d '{
"query": {
"range": {
"population2012": {
"from": 500000,
"to": 1000000
}}}}'
$ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty" -d '{
"query": { "bool": { "should": [{ "match": { "state": "Texas"} }, {"match": { "state":
"California"} }],
"must": { "range": { "population2012": { "from": 500000, "to": 1000000 } } },
"minimum_should_match": 1}}}'
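The yes/no nature of structured search can be mimicked over a plain list of dicts: each filter is a predicate, and a document is either in the result set or it is not, with no scoring involved. The helpers and the second sample document below are made up for illustration.

```python
def term_filter(field, value):
    """Keep documents whose field equals the value exactly."""
    return lambda doc: doc.get(field) == value


def range_filter(field, gte, lte):
    """Keep documents whose field lies in the inclusive range [gte, lte]."""
    return lambda doc: field in doc and gte <= doc[field] <= lte


def apply_filters(docs, *filters):
    """A document is included only if every filter says yes."""
    return [doc for doc in docs if all(f(doc) for f in filters)]


cities = [
    {"city": "Boston", "state": "Massachusetts", "population2012": 636479},
    # Made-up document, present only to give the filters something to exclude.
    {"city": "Smallville", "state": "Massachusetts", "population2012": 1200},
]

large = apply_filters(cities, range_filter("population2012", 500000, 1000000))
print([doc["city"] for doc in large])
```

Because the answer for each document is a plain boolean, the result of a filter can be cached and reused across queries.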
Analytics (facets)
Requests of this type will not return a list of matching documents, but a statistical breakdown of the documents.
Elasticsearch has functionality called aggregations, which allows you to generate sophisticated analytics over your data. It is similar to GROUP BY in SQL.
Request:
$ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty=true" -d '{
"aggs": { "all_states": { "terms": { "field": "state" }}}}'
Response:
{ ...
"hits": { ... },
"aggregations": {
"all_states": {
"buckets": [
{"key": "massachusetts ", "doc_count": 2},
{"key": "danbury", "doc_count": 1}
]
}}}
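The GROUP BY analogy can be made concrete with a few lines of Python that count documents per distinct field value and return them in a bucket shape similar to the response above. The function name and the sample documents are assumptions for illustration; the real aggregation runs across the shards of the cluster.

```python
from collections import Counter


def terms_aggregation(docs, field):
    """Count documents per distinct value of a field, most frequent first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common()
    ]


# Made-up sample documents with already-lowercased state values.
docs = [
    {"city": "boston", "state": "massachusetts"},
    {"city": "worcester", "state": "massachusetts"},
    {"city": "austin", "state": "texas"},
]
buckets = terms_aggregation(docs, "state")
print(buckets)
```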
ElasticSearch Monitoring
ElasticSearch-Head - https://github.com/mobz/elasticsearch-head
Marvel - http://www.elasticsearch.org/guide/en/marvel/current/#_marvel_8217_s_dashboards
Paramedic - https://github.com/karmi/elasticsearch-paramedic
Bigdesk - https://github.com/lukas-vlcek/bigdesk/
ElasticSearch Limitations
Security : Elasticsearch does not provide any built-in authentication or access control functionality.
Transactions : There is not much support for transactions or processing on data manipulation.
Durability : ES is distributed and fairly stable, but backups and durability are not as high a priority as in other data stores.
Large Computations: Commands for searching data are not suited to "large" scans of data and advanced computation on the db side.
Data Availability : ES makes data available in "near real-time", which may require additional considerations in your application (e.g. on a
comments page where a user adds a new comment, refreshing the page might not actually show the new post because the index is still
updating).
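The "near real-time" behaviour can be pictured as a two-stage index: newly indexed documents sit in a buffer and only become searchable after a refresh. The class below is a toy model with hypothetical names. In a real cluster the refresh happens periodically in the background and can also be requested explicitly.

```python
class NearRealTimeIndex:
    """Toy model: documents become searchable only after a refresh."""

    def __init__(self):
        self._buffer = []      # indexed but not yet visible to searches
        self._searchable = []  # visible to searches

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        # Move everything indexed since the last refresh into the searchable set.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, predicate):
        return [doc for doc in self._searchable if predicate(doc)]


comments = NearRealTimeIndex()
comments.index({"comment": "first!"})
before_refresh = comments.search(lambda doc: True)  # the new comment is not visible yet
comments.refresh()
after_refresh = comments.search(lambda doc: True)   # now it is
print(before_refresh, after_refresh)
```

An application that must show a user their own new comment immediately can read it back by id or keep it client-side instead of relying on search.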
Open-Source Libraries
https://github.com/elasticsearch
http://stackoverflow.com/questions/tagged/elasticsearch
stackoverflow
Get started.
www.elasticsearch.org
About me
Founder and Technical Director of Terions Communication LTD (London/Berlin, 1996-2006)
- Datacenter Operator for Press and Image Agencies and Part of the DPA (German Press Agency)
Software Developer Zapelin S.L (Adeje)
- Development of IPTV Solutions for Hotels e.g.
RHCE - Red Hat Certified Engineer
CCDP - Cisco Certified Design Professional
DBA - Oracle Certified Professional, MySQL 5 Database Administrator
Contact for Questions: neil@cconnect.es
