Debugging and
Testing ES Systems
Chris Birchall
2013/8/29
Elasticsearch Study Meetup #1 (Elasticsearch 勉強会 第1回)
#elasticsearchjp
Elasticsearch and me
● At Infoscience, helped build a log
management product based on ES +
Hadoop
● At M3, ES evangelist (??)
○ Maintain ES cluster
○ Help dev teams integrate ES into their apps
Twitter: @cbirchall
Github: https://github.com/cb372
Search at M3
● Using ES for all new services
○ Search, recommendation (MoreLikeThis)
● Slowly migrating other services from Solr
● A few legacy services use Lucene directly
● Running all indices on one ES cluster
● Kuromoji for Japanese content
Debugging
Mostly debugging of queries
● “Why doesn’t doc X match query Y?”
● “Why does this search return no results?”
Operational issues are very rare
● ES’s clustering magic is surprisingly
stable!
● No performance issues so far
Debugging - Step 1
Check for typos!
ES will silently ignore many typos in
settings/mapping definitions
Typo - Example
Let’s create a new index...
$ curl -X PUT localhost:9200/myapp -d '{
  "settings": {
    "number_of_shards": 3
  },
  "mapping" : {
    "article" : {
      "_source": { "enabled": false },
      "properties": {
        "title": { "type": "string", "store": "true" },
        "body": { "type": "string", "store": "true" },
        ...
      }
    },
    ...
  }
}'
Typo - Example (cont’d)
Response from ES:
{"ok":true,"acknowledged":true}
OK, seems fine...
Typo - Example (cont’d)
Now check the mappings...
$ curl 'localhost:9200/myapp/_mappings?pretty'
Response from ES:
{
  "myapp" : { }
}
Eh? Where are my lovingly-crafted mappings?!
Typo - Example (cont’d)
Oops! The key should be "mappings", not "mapping":
$ curl -X PUT localhost:9200/myapp -d '{
  "settings": {
    "number_of_shards": 3
  },
  "mappings" : {
    "article" : {
      "_source": { "enabled": false },
      "properties": {
        "title": { "type": "string", "store": "true" },
        "body": { "type": "string", "store": "true" },
        ...
      }
    },
    ...
  }
}'
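Since ES accepts the misspelled key without complaint, one option is to check the request body yourself before sending it. The following is only a sketch, not something from the talk: the function name `find_suspect_keys` and the set of known top-level keys are assumptions, and the set should be extended to match whatever your ES version accepts.

```python
import difflib

# Assumption: the top-level keys we expect in a create-index body.
# Extend this set to match what your ES version accepts.
KNOWN_TOP_LEVEL_KEYS = {"settings", "mappings"}


def find_suspect_keys(body):
    """Map each unknown top-level key to its closest known key (or None)."""
    suspects = {}
    for key in body:
        if key in KNOWN_TOP_LEVEL_KEYS:
            continue
        close = difflib.get_close_matches(key, sorted(KNOWN_TOP_LEVEL_KEYS), n=1)
        suspects[key] = close[0] if close else None
    return suspects


# The body from the slide, trimmed, with the "mapping" typo left in.
index_body = {
    "settings": {"number_of_shards": 3},
    "mapping": {"article": {"_source": {"enabled": False}}},
}

for bad_key, suggestion in find_suspect_keys(index_body).items():
    print(f"unknown key {bad_key!r}, did you mean {suggestion!r}?")
```

Running a check like this in the script that creates the index turns a silent no-op into a visible warning.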
Debugging - Step 2
Set up a local environment
● Makes it easy to wipe & rebuild index
Setting up a local env (OSX)
# Install
$ brew install elasticsearch
# Kuromoji plugin (optional)
$ /usr/local/opt/elasticsearch/bin/plugin -install \
    elasticsearch/elasticsearch-analysis-kuromoji/1.5.0
# Start
$ elasticsearch
# Create index
$ curl -X PUT localhost:9200/my_app -d '{ ... }'
# Insert some documents
$ curl -X PUT localhost:9200/my_app/my_type/1 -d '{ ... }'
$ curl -X PUT localhost:9200/my_app/my_type/2 -d '{ ... }'
# Done!
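The wipe-and-rebuild cycle is worth scripting once you do it more than a few times. The sketch below lists the requests such a script would send, in the same order as the curl commands above, preceded by a DELETE of the old index. The base URL, the helper names `rebuild_plan` and `run`, and the example documents are all hypothetical; the slide elides the real index and document bodies.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local ES on the default port


def rebuild_plan(index, index_body, docs, doc_type="my_type"):
    """Return the (method, url, body) requests that wipe and rebuild an index."""
    plan = [
        ("DELETE", f"{BASE_URL}/{index}", None),     # wipe the old index
        ("PUT", f"{BASE_URL}/{index}", index_body),  # recreate with settings/mappings
    ]
    for doc_id, doc in docs.items():
        plan.append(("PUT", f"{BASE_URL}/{index}/{doc_type}/{doc_id}", doc))
    return plan


def run(plan):
    """Send each request in order. Needs a running ES node."""
    for method, url, body in plan:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            url, data=data, method=method,
            headers={"Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(request) as response:
                print(method, url, response.status)
        except urllib.error.HTTPError as err:
            # e.g. 404 when deleting an index that does not exist yet
            print(method, url, err.code)


# Hypothetical index body and documents, for illustration only.
plan = rebuild_plan(
    "my_app",
    {"settings": {"number_of_shards": 1}},
    {1: {"title": "first"}, 2: {"title": "second"}},
)
for method, url, _ in plan:
    print(method, url)
```

Calling `run(plan)` against a local node performs the rebuild; printing the plan first is a cheap way to confirm what will be deleted.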
Useful commands - Analyze
$ curl 'localhost:9200/myindex/_analyze?pretty' \
    -d '東京特許許可局許可局長'
{
"tokens" : [ {
"token" : "東京",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 1
}, {
"token" : "特許",
"start_offset" : 2,
"end_offset" : 4,
"type" : "word",
...
How is my document/query being tokenized?
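When the token list gets long, it can help to reduce the response to (token, start, end) triples and line them up against the original text. A minimal sketch, assuming the response has the shape shown above; the sample is trimmed to the two tokens visible on the slide, and `token_spans` is a hypothetical helper name. It also assumes the offsets can be used directly as Python string indices, which holds for the characters in this example.

```python
TEXT = "東京特許許可局許可局長"

# Trimmed to the two tokens shown on the slide; the rest is elided there.
sample_response = {
    "tokens": [
        {"token": "東京", "start_offset": 0, "end_offset": 2, "type": "word"},
        {"token": "特許", "start_offset": 2, "end_offset": 4, "type": "word"},
    ]
}


def token_spans(analyze_response):
    """Return (token, start_offset, end_offset) for each token, in order."""
    return [
        (t["token"], t["start_offset"], t["end_offset"])
        for t in analyze_response["tokens"]
    ]


for token, start, end in token_spans(sample_response):
    # Show which slice of the original text produced each token.
    print(f"{start:>2}-{end:<2} {TEXT[start:end]} -> {token}")
```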
Useful commands - Explain
$ curl 'localhost:9200/kuro/docs/123/_explain?pretty' \
    -d '{ "query": { "term": { "body": "東京" } } }'
{
...
"matched" : true,
"explanation" : {
"value" : 0.375,
"description" : "weight(body:東京 in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.375,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0"
...
Why does this document (not) match this query?
Specify the document ID in the URL (123 in this example).
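The nested explanation is easier to scan as an indented outline. The sketch below flattens the "explanation" object into one line per node. `render_explanation` is a hypothetical helper, and the sample contains only the four nodes visible on the slide; the real response continues where the slide shows "...".

```python
# Only the nodes visible on the slide; the real response is longer.
sample_explanation = {
    "value": 0.375,
    "description": "weight(body:東京 in 0) [PerFieldSimilarity], result of:",
    "details": [{
        "value": 0.375,
        "description": "fieldWeight in 0, product of:",
        "details": [{
            "value": 1.0,
            "description": "tf(freq=1.0), with freq of:",
            "details": [{"value": 1.0, "description": "termFreq=1.0"}],
        }],
    }],
}


def render_explanation(node, depth=0):
    """Flatten a nested explanation into indented 'value description' lines."""
    lines = ["  " * depth + f"{node['value']} {node['description']}"]
    for child in node.get("details", []):
        lines.extend(render_explanation(child, depth + 1))
    return lines


for line in render_explanation(sample_explanation):
    print(line)
```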
Tuning queries
Parameters to tweak
● default_operator (AND/OR)
● auto_generate_phrase_queries
● minimum_should_match
● Stop words/tags
● Kuromoji
○ Segmentation mode
○ Reading form filter
○ Disable Kuromoji! (for some fields)
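To make the first three knobs concrete, here is a sketch of a `query_string` request body that sets them. The helper name `build_query`, the field names (borrowed from the earlier mapping example) and the chosen values are illustrative assumptions, and `auto_generate_phrase_queries` in particular should be checked against the ES version you run.

```python
import json


def build_query(text, fields, default_operator="OR",
                minimum_should_match=None,
                auto_generate_phrase_queries=False):
    """Build a query_string search body with the tuning parameters set."""
    query_string = {
        "query": text,
        "fields": list(fields),
        "default_operator": default_operator,
        "auto_generate_phrase_queries": auto_generate_phrase_queries,
    }
    if minimum_should_match is not None:
        query_string["minimum_should_match"] = minimum_should_match
    return {"query": {"query_string": query_string}}


# Two variants of the same search to compare side by side.
strict = build_query("東京 病院", ["title", "body"], default_operator="AND")
loose = build_query("東京 病院", ["title", "body"], minimum_should_match="75%")

print(json.dumps(strict, ensure_ascii=False, indent=2))
print(json.dumps(loose, ensure_ascii=False, indent=2))
```

Generating the variants from one function keeps everything except the parameter under test identical between runs.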
Why disable Kuromoji?
Problem: occasionally weird tokenization
Phrase → Terms
特定医療法人財団 日本会 東日本病院 (document field) → 特定、医療、法人、財団、日本、会、東日本、病院
東日本 (query) → 東日、東日本、本
東日本病院 (query) → 東、東日本、日本、病院
● AND query will fail, because not all terms match
● OR query will match any document with 病院 → low precision
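The effect can be reproduced without ES by treating each tokenization as a set of terms. The sketch below uses the term lists from this slide; the second document, which contains only 病院, is a hypothetical stand-in for an unrelated hospital, and `matches` is a simplified model of AND/OR matching, not ES's actual scoring.

```python
# Term lists taken from the slide.
DOC_TERMS = {"特定", "医療", "法人", "財団", "日本", "会", "東日本", "病院"}
QUERIES = {
    "東日本": ["東日", "東日本", "本"],
    "東日本病院": ["東", "東日本", "日本", "病院"],
}

# Hypothetical unrelated document that merely contains 病院.
OTHER_DOC_TERMS = {"病院"}


def matches(query_terms, doc_terms, operator):
    """AND: every query term must be in the doc. OR: at least one must be."""
    hits = [term in doc_terms for term in query_terms]
    return all(hits) if operator == "AND" else any(hits)


for phrase, terms in QUERIES.items():
    missing = [t for t in terms if t not in DOC_TERMS]
    print(phrase,
          "AND:", matches(terms, DOC_TERMS, "AND"),
          "OR:", matches(terms, DOC_TERMS, "OR"),
          "missing:", missing)

# The OR query also matches the unrelated document, hence the low precision.
print(matches(QUERIES["東日本病院"], OTHER_DOC_TERMS, "OR"))
```

Both AND queries fail against the intended document because of the extra terms (東日 and 本, or 東), while the OR form of 東日本病院 also matches the unrelated document.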
Useful plugin - Head
$ bin/plugin -install mobz/elasticsearch-head
http://mobz.github.io/elasticsearch-head/
Testing
Main goal: Ensure that queries return the
results that we expect
● Test coverage of representative queries
○ Freedom to tune for a given query without
breaking other queries
Ideally, tests should:
● Run fast
● Run standalone (i.e. no need to have an
ES server running)
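One way to get this kind of coverage is a table of representative queries and the document IDs each one is expected to return, checked after every tuning change. The sketch below shows only that shape. The search function is injected, so `fake_search` and its tiny index are stand-ins; in a real suite that function would call an in-process or local ES node. All names and data here are hypothetical.

```python
def failing_cases(search, cases):
    """Run every (query, expected_ids) case and collect the mismatches."""
    failures = []
    for query, expected_ids in cases:
        actual_ids = search(query)
        if actual_ids != expected_ids:
            failures.append((query, expected_ids, actual_ids))
    return failures


# Stand-in for a real search call: a fixed query -> document IDs lookup.
FAKE_INDEX = {"tokyo": ["1", "2"], "osaka": ["3"]}


def fake_search(query):
    return FAKE_INDEX.get(query, [])


# Representative queries and the results we expect from each.
CASES = [
    ("tokyo", ["1", "2"]),
    ("osaka", ["3"]),
    ("kyoto", []),
]

for query, expected, actual in failing_cases(fake_search, CASES):
    print(f"{query!r}: expected {expected}, got {actual}")
```

Because every case runs on every change, tuning for one query cannot silently break another.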
Testing - Java
elasticsearch-test is awesome
● DSL to set up/tear down ES
● Annotations + JUnit runner
● ES runs in-process
○ No need to start an external ES server
● Index is stored in-memory
○ Runs quickly
https://github.com/tlrx/elasticsearch-test
https://github.com/cb372/elasticsearch-test-example
Testing - Java
Simple elasticsearch-test example
Testing - Ruby
Simple Rails + Tire + RSpec example
https://github.com/cb372/elasticsearch-rspec-example
We’re hiring!
http://bit.ly/m3jobs
