ElasticSearch



Friday, March 8, 13
Links


                      • https://bitbucket.org/lsdr/es/overview
                      • https://confluence.abril.com.br/x/J5I_AQ


Installation

                      • Mac OS X:
                       •   brew install elasticsearch

                      • CentOS 6.x:
                       •   no official RPM :-(

                       •   https://gist.github.com/lsdr/5117589

Setup

                 cluster.name: buffalo
                 node.name:    "Bruce Smith"

                 path.data:    /usr/local/var/elasticsearch/
                 path.logs:    /usr/local/var/log/elasticsearch/
                 path.plugins: /usr/local/var/lib/elasticsearch/plugins

                 network.host: 127.0.0.1




                          enough to bring up a local server!

Setup++
                      • Node configuration:

                                             Master: TRUE     Master: FALSE
                         Data: TRUE          Development      Workhorse
                         Data: FALSE         Coordinator      Load Balancer

                          http://elasticsearch.org/guide/reference/modules/node.html

Setup++
                      • Number of shards and number of replicas
                        •   replicas can be increased at runtime, shards
                            cannot

                      • “Mandatory” plugins
                        •   the node only starts if they are present

                      • JVM tuning
                      • Other modules: Thrift, JMX
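
The point about replicas can be illustrated with the index settings API. This is a minimal sketch, assuming a node on localhost:9200 and an existing index; the helper names `replicas_body` and `set_replicas` are hypothetical and exist only for this example.

```shell
# Assumption: a local node; override ES_URL to point elsewhere.
ES_URL="${ES_URL:-localhost:9200}"

# Print the settings body that asks for N replicas per shard.
replicas_body() {
  printf '{"index": {"number_of_replicas": %d}}' "$1"
}

# Apply the new replica count to an existing index (needs a running node).
# The shard count is fixed when the index is created and is not changed here.
set_replicas() {
  curl -XPUT "$ES_URL/$1/_settings" -d "$(replicas_body "$2")"
}
```

For example, `set_replicas world 2` would ask for two replicas of each shard of the `world` index.
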
“Hello World!”


                      $ curl -XGET 'localhost:9200/world'


                      No handler found for uri [/world] and method [GET]




“Hello World!”

                      $ curl -XPOST "localhost:9200/world/hello" -d '{
                         "text": "hello world",
                         "lang": "en"
                      }'




“Hello World!”

                      {
                          "ok": true,
                          "_index": "world",
                          "_type": "hello",
                          "_id": "A5HX8IhTR0CzMNWHBPhEqA",
                          "_version": 1
                      }




                                POST: auto-generated id | PUT: “manual” id
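
The POST/PUT distinction can be sketched as a small helper that picks the HTTP method depending on whether an id is supplied. The helper names `doc_url` and `index_doc` are hypothetical, and the document body is the one from the previous slide.

```shell
# Assumption: a local node; override ES_URL to point elsewhere.
ES_URL="${ES_URL:-localhost:9200}"
DOC='{"text": "hello world", "lang": "en"}'

# Print the target URL: index/type/id when an id is given, index/type otherwise.
doc_url() {
  if [ -n "$3" ]; then
    echo "$ES_URL/$1/$2/$3"
  else
    echo "$ES_URL/$1/$2"
  fi
}

# Index $DOC (needs a running node): PUT with an explicit id,
# POST when the id should be generated by ElasticSearch.
index_doc() {
  if [ -n "$3" ]; then method=PUT; else method=POST; fi
  curl -X"$method" "$(doc_url "$1" "$2" "$3")" -d "$DOC"
}
```

`index_doc world hello` lets the server pick the id, while `index_doc world hello 1` stores the document under id `1`.
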


“Hello World!”
                      $ curl -XGET 'localhost:9200/world/_count'


                      {
                          "count":1,
                          "_shards":
                            {
                              "total": 3,
                              "successful": 3,
                              "failed":0
                            }
                      }




“Hello World!”
                      $ curl -XGET 'localhost:9200/world/_search'


                      "hits": [
                        {
                          "_index": "world",
                          "_type": "hello",
                          "_id": "A5HX8IhTR0CzMNWHBPhEqA",
                          "_score": 1.0,
                          "_source": {"text": "hello world", "lang": "en"}
                        }
                      ]




“Hello World!”
                      $ curl -XGET 'localhost:9200/world/hello/_mapping'
                      {
                          "hello": {
                            "properties": {
                              "lang": {
                                 "type": "string"
                              },
                              "text": {
                                 "type": "string"
                              }
                            }
                          }
                      }


Mapping

            Mapping is the process of defining how a document
             should be mapped to the Search Engine, including
             its searchable characteristics such as which fields
               are searchable and if/how they are tokenized.

                      http://elasticsearch.org/guide/reference/mapping/




Querying
                      • URI Request
                       •   Does not expose all ES features

                       •   /guide/reference/api/search/uri-request.html

                      • Query DSL
                       •   POST-based (no cache!)

                       •   /guide/reference/query-dsl/
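
The two query styles can be compared side by side. This is a sketch under assumptions: the helper names are hypothetical, and a term query is used for the Query DSL side only because it is the simplest to read, not because the deck prescribes it.

```shell
# Assumption: a local node; override ES_URL to point elsewhere.
ES_URL="${ES_URL:-localhost:9200}"

# URI request: the whole query travels in the q= parameter (field:value).
uri_search_url() {
  echo "$ES_URL/$1/_search?q=$2:$3"
}

# Query DSL: the query is a JSON body; here a term query on a single field.
term_query_body() {
  printf '{"query": {"term": {"%s": "%s"}}}' "$1" "$2"
}

# Both calls need a running node.
search_uri() { curl -XGET "$(uri_search_url "$1" "$2" "$3")"; }
search_dsl() { curl -XPOST "$ES_URL/$1/_search" -d "$(term_query_body "$2" "$3")"; }
```

`search_uri world text hello` and `search_dsl world lang en` both query the `world` index created in the “Hello World!” slides.
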



Querying

                      • Play with querying articles!

                      • Simple queries work, but...
                        •   facets break?




Mapping

              By default, there isn’t a need to define an explicit
              mapping, (...) Only when the defaults need to be
             overridden must a mapping definition be provided.


                      http://elasticsearch.org/guide/reference/mapping/




Mapping

                      • Overriding is not trivial
                      • May involve reindexing
                      • This is the team's job
                       •   “data massagers”
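
As an illustration of an override, the sketch below creates an index with an explicit mapping that keeps one string field untokenized. It assumes the `string` type shown in the `_mapping` output earlier and the `not_analyzed` index option of ElasticSearch versions from that period; the helper names are hypothetical. Because the mapping is supplied at index creation, existing documents would have to be indexed again into the new index.

```shell
# Assumption: a local node; override ES_URL to point elsewhere.
ES_URL="${ES_URL:-localhost:9200}"

# Print an index-creation body with an explicit mapping: the given field of
# the given type is a string stored as one untokenized term.
# Assumption: "string" / "not_analyzed" as used by ES versions of this era.
mapping_body() {
  printf '{"mappings": {"%s": {"properties": {"%s": {"type": "string", "index": "not_analyzed"}}}}}' "$1" "$2"
}

# Create the index with that mapping (needs a running node).
create_index() {
  curl -XPUT "$ES_URL/$1" -d "$(mapping_body "$2" "$3")"
}
```

For example, `create_index world_v2 hello lang` would create a hypothetical `world_v2` index whose `lang` field is not analyzed.
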




Analyzer


             $ curl -XGET 'localhost:9200/_analyze?analyzer=standard' -d 'Esporte::Futebol'


             $ curl -XGET 'localhost:9200/_analyze?analyzer=keyword' -d 'Esporte::Futebol'




River
                      • Creates indices/mappings if they do not exist
                        •   keep the limitations in mind

                      • Pulling
                      • elasticsearch-river-mongo
                        •   Blows up if mongo is down

                        •   Slow (gets lost?) when it comes back



River

                      • Install the River (plugin)
                      • Create the River
                      • mongorestore


River

             $ES_HOME/bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/1.6.0

             $ES_HOME/bin/plugin -url https://github.com/downloads/richardwilly98/elasticsearch-river-mongodb/elasticsearch-river-mongodb-1.6.1.zip -install river-mongodb




River
                      curl -XPUT "localhost:9200/_river/v/_meta" -d '{
                         "type": "mongodb",
                         "mongodb": {
                            "db": "alx_midia",
                            "collection": "videos",
                            "servers": [
                              { "host": "localhost", "port": "27017" }
                            ]
                         },
                         "index": {
                            "name": "videos",
                            "type": "documents"
                         }
                      }'

                                        source ("mongodb") - destination ("index")
River
                      curl -XGET "localhost:9200/_river/v/_meta"
                      {
                          "_index": "_river",
                          "_id": "_meta",
                          "exists": true,
                          "_source": {
                              "type": "mongodb",
                              "mongodb": {
                                  "db": "alx_midia",
                                  "collection": "videos",
                                  "servers": [
                                       { "host": "localhost", "port": "27017" }
                                  ]
                              },
                              "index": {
                                  "name": "videos",
                                  "type": "documents"
                              }
                          }
                      }


River


                      $ mongorestore --host localhost --port 27017 \
                          --noIndexRestore alx_midia




River

           [videos] creating index, cause [api], shards [3]/[1], mappings []
           [_river] update_mapping [v] (dynamic)
           [mongodb][v] No known previous slurping time for this collection
           [_river] update_mapping [v] (dynamic)
           [videos] update_mapping [documents] (dynamic)
           Indexed 100 insertions 0, updates, 0 deletions, 100 documents per second
           Indexed 100 insertions 0, updates, 0 deletions, 100 documents per second
           Indexed 81 insertions 0, updates, 0 deletions, 81 documents per second
           Indexed 100 insertions 0, updates, 0 deletions, 100 documents per second
           Indexed 15 insertions 0, updates, 0 deletions, 15 documents per second




River

                      • In normal operation, there is no
                        “restore” in case of failure
                      • A “recrawling” solution needs to be
                        thought out




