Using Solr to Search and Analyze Logs

Radu Gheorghe
@sematext

@radu0gheorghe
Logsene
Kibana
Elasticsearch API
Logstash
syslog receiver

syslogd
Solr for Indexing and Searching Logs
What about Solr?

defining and handling logs in general

4 sets of tools to send logs to Solr

Performance tuning and SolrCloud
Defining and Handling Logs
(story time!)
syslog

syslog

?
syslog

syslog
Requirements
1) What’s wrong? (for debugging)

http://eddysuaib.com/wp-content/uploads/2012/12/Keyword-icon.png
Problem

looooots of messages coming in

http://www.sciencesurvivalblog.com/getting-published/unfinished-manuscripts_2346
Solved with no indexing

BUT
Elasticsearch
Requirements
1) What’s wrong? ✓

2) What will go wrong? (stats)
Parsing Raw Logs
still slow

BUT

format changes

user item time
mickey mouse 10
Parsing Raw Logs
still slow

BUT

format changes: add error code

mickey mouse 0 10
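Why the format change hurts can be shown with a minimal sketch of positional parsing. It assumes the space-separated layout from the slide (user, item, time); the function name and field names are illustrative and not from the talk.

```python
def parse_positional(line, fields=("user", "item", "time")):
    """Split a space-separated log line and label the values by position."""
    return dict(zip(fields, line.split()))


# Original format: labels and values line up.
old = parse_positional("mickey mouse 10")

# After an error code is added before the time, the same parser
# quietly stores the error code as "time" and drops the real time.
new = parse_positional("mickey mouse 0 10")

print(old)
print(new)
```

Nothing fails loudly here, which is the problem: the stats are simply wrong until someone notices and updates the parser.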
Facets. Logging in JSON
2013-11-06… mickey mouse

{
"date": "2013-11-06",
"message": "mickey mouse"
}
Facets. Logging in JSON
2013-11-06… mickey mouse

{
"date": "2013-11-06",
"message": "mickey mouse"
}

2013-11-06… @cee:{"user": "mickey"}

{
"date": "2013-11-06",
"user": "mickey"
}
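The difference between the two messages can be sketched in a few lines: a message that starts with the `@cee:` cookie carries a JSON payload that becomes fields, anything else stays plain text. This is an illustration only, not how rsyslog's mmjsonparse is implemented; `parse_cee` is a hypothetical helper name.

```python
import json

CEE_COOKIE = "@cee:"


def parse_cee(message):
    """Return the JSON payload of a CEE-tagged message as a dict, or None."""
    text = message.lstrip()
    if not text.startswith(CEE_COOKIE):
        return None  # plain-text message, nothing structured to extract
    payload = text[len(CEE_COOKIE):].strip()
    try:
        doc = json.loads(payload)
    except ValueError:
        return None  # cookie present, but the payload is not valid JSON
    return doc if isinstance(doc, dict) else None


print(parse_cee('@cee:{"user": "mickey"}'))
print(parse_cee("mickey mouse"))
```

With structured fields such as `user`, faceting no longer depends on the position of a value inside the message text.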
Requirements
1) What’s wrong? ✓

2) What will go wrong? ✓
3) Handle logs like production data ✓
Requirements
1) What’s wrong? ✓

2) What will go wrong? ✓

3) Handle logs like production data ✓

What is a log?
How to handle logs?
4 Ways of Sending Logs to Solr
logger

Logstash

files
Schemaless

% cd solr-4.5.1/example/
% mv solr solr.bak

% cp -R example-schemaless/solr/ .
Automatic ID generation
solrconfig.xml
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  ……..

  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
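The processor above assigns ids on the Solr side. If ids should be assigned before documents leave the client instead, a rough client-side equivalent could look like the sketch below. `ensure_id` is a hypothetical helper, and the `id` field name is taken from the config above.

```python
import uuid


def ensure_id(doc, field="id"):
    """Return a copy of doc with a random UUID in `field` if none is set."""
    with_id = dict(doc)
    if not with_id.get(field):
        with_id[field] = str(uuid.uuid4())
    return with_id


print(ensure_id({"hello": "world"}))
```

Documents that already carry an id are left alone, so a re-sent document overwrites its earlier copy instead of creating a duplicate.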
mmjsonparse
/dev/log

logger

omprog + script
/dev/log -> parse -> format -> send to Solr
% logger '@cee: {"hello": "world"}'

rsyslog.conf
module(load="imuxsock") # version 7+
/dev/log -> parse -> format -> send to Solr
...

module(load="mmjsonparse")
action(type="mmjsonparse")
/dev/log -> parse -> format -> send to Solr
...
template(name="CEE"
         type="list") {
  property(name="$!all-json")
  constant(value="\n")
}
/dev/log -> parse -> format -> send to Solr
...
action(type="mmjsonparse")
template(name="CEE"
…

module(load="omprog")
if $parsesuccess == "OK" then action(type="omprog"
binary="/opt/json-to-solr.py"
template="CEE")
/dev/log -> parse -> format -> send to Solr
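import json, pysolr, sys

solr = pysolr.Solr('http://localhost:8983/solr/')
while True:
    # rsyslog's omprog writes one JSON document per line to stdin
    line = sys.stdin.readline()
    if not line:  # EOF: rsyslog closed the pipe
        break
    doc = json.loads(line)
    solr.add([doc])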
Morphline
Solr Sink
Avro
Avro -> buffer -> parse -> send to Solr
https://github.com/mpercy/flume-log4j-example
flume.conf
agent.sources = avroSrc
agent.sources.avroSrc.type = avro
agent.sources.avroSrc.bind = 0.0.0.0
agent.sources.avroSrc.port = 41414
Avro -> buffer -> parse -> send to Solr

flume.conf
agent.channels = solrMemoryChannel
agent.channels.solrMemoryChannel.type = memory
agent.sources.avroSrc.channels = solrMemoryChannel
Avro -> buffer -> parse -> send to Solr
flume.conf
agent.sinks = solrSink
agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.morphlineFile = conf/morphline.conf
agent.sinks.solrSink.channel = solrMemoryChannel
Avro -> buffer -> parse -> send to Solr
morphline.conf
...
commands : [
{ readLine { charset : UTF-8 }}
{ grok {
dictionaryFiles : [conf/grok-patterns]
expressions : {
message : """%{INT:pid} %{DATA:message}"""
...
https://github.com/cloudera/search/tree/master/samples/solr-nrt/grok-dictionaries
Avro -> buffer -> parse -> send to Solr
morphline.conf
SOLR_LOCATOR : {
collection : collection1
#zkHost : "127.0.0.1:2181"
solrUrl : "http://localhost:8983/solr/"
}
...
commands : [
...
{ loadSolr {
solrLocator : ${SOLR_LOCATOR}
...
fluent-logger

fluent-plugin-solr
fluent-logger -> fluentd -> fluent-plugin-solr
% pip install fluent-logger

from fluent import sender,event
sender.setup('solr.test')
event.Event('forward', {'hello': 'world'})
fluent-logger -> fluentd -> fluent-plugin-solr
<source>
type forward
</source>
<match solr.**>
type solr
host localhost
port 8983
core collection1
</match>
fluent-logger -> fluentd -> fluent-plugin-solr
% gem install fluent-plugin-solr

https://github.com/btigit/fluent-plugin-solr

out_solr.rb
doc = Solr::Document.new(:hello => record["hello"])
grok filter
file input

file

solr_http output

Logstash
file input -> grok filter -> solr_http output
% echo '2 world' >> /tmp/testlog

logstash.conf:
input {
file { path => "/tmp/testlog" }
}
file input -> grok filter -> solr_http output
logstash.conf:
filter {
grok {
match => ["message", "%{NUMBER:pid} %{GREEDYDATA:hello}"]
}
}

{"pid": "2", "hello":"world"}
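For readers who have not used grok, the pattern above behaves roughly like a regular expression with named groups. The sketch below is an approximation: Logstash's real NUMBER pattern is more permissive than the one used here, and `grok_like` is a hypothetical name.

```python
import re

# Rough stand-in for "%{NUMBER:pid} %{GREEDYDATA:hello}":
# NUMBER is approximated as an optionally signed integer or decimal,
# GREEDYDATA as ".*".
PATTERN = re.compile(r"(?P<pid>[+-]?\d+(?:\.\d+)?) (?P<hello>.*)")


def grok_like(message):
    """Return the named captures as a dict, or None if the line does not match."""
    match = PATTERN.match(message)
    return match.groupdict() if match else None


print(grok_like("2 world"))
```

As in the grok output above, the captured `pid` is a string; converting it to a number is a separate step.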
file input -> grok filter -> solr_http output
logstash.conf:
output {
solr_http { # master or v1.2.3+
solr_url => "http://localhost:8983/solr"
}
}
Fast and Cloud
“It Depends”

load test

monitor: SPM
20% off: LR2013SPM20
http://www.bigskytech.com/wp-content/uploads/2011/02/guage.png
|>>>>|Single Core: # of docs/update

http://static.memrise.com.s3.amazonaws.com/uploads/blog-pictures/Simpsons_Updates.bmp
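Sending several documents per update usually means grouping them on the client. A minimal sketch of that grouping follows; the batch size is an assumption to be found by load testing, and `batches` is a hypothetical helper.

```python
def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, partially filled batch


docs = ({"n": i} for i in range(5))
for batch in batches(docs, 2):
    # with pysolr, each batch would be one request: solr.add(batch)
    print(len(batch))
```

A long-running shipper would probably also flush on a timer so that a half-full batch does not wait indefinitely.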
|>>>>|Single Core: Commits

<autoSoftCommit>
<maxTime>...

<autoCommit>
<openSearcher>false
<maxTime>???
<ramBufferSizeMB>???

http://cache.desktopnexus.com/thumbnails/1306-bigthumbnail.jpg
http://www.musicfestivaljunkies.com/wp-content/uploads/2012/01/HardLogo.png
|>>>>|Single Core: Size and Merges

omitNorms="true"
omitTermFreqAndPositions="true"

<mergeFactor>??

http://sweetclipart.com/multisite/sweetclipart/files/scissors_blue_silver.png
http://mergewords.com/gfx/logo-big.png
|>>>>|Single Core: Caches

facets

<fieldValueCache ...
size="???"
autowarmCount="0"

changing data
to sort&facet

docValues="true"

http://vector-magz.com/wp-content/uploads/2013/06/diamond-clip-art4.png
http://www.clker.com/cliparts/1/f/6/3/11971228961330048838SaraSara_Ice_cube_2.svg.med.png
http://clipartist.info/RSS/openclipart.org/2011/May/02-Monday/migrating_penguin_penguinmigrating-555px.png
SolrCloud: ZooKeeper

bin/zkServer.sh start
OR
java -DzkRun … -jar start.jar
http://www.clker.com/cliparts/c/a/8/d/1331060720387485902Roaring%20Tiger.svg.hi.png
http://fc03.deviantart.net/fs71/f/2012/196/6/a/piggy_back_rides_are_the_best_rides__by_yipped-d57b3sh.png
SolrCloud: ZooKeeper

zkcli.sh -cmd upconfig \
  -zkhost SERVER:2181 \
  -confdir solr/collection1/conf/ \
  -confname start
-Dbootstrap_confdir=solr/collection1/conf -Dcollection.configName=start

http://www.clker.com/cliparts/c/a/8/d/1331060720387485902Roaring%20Tiger.svg.hi.png
http://fc03.deviantart.net/fs71/f/2012/196/6/a/piggy_back_rides_are_the_best_rides__by_yipped-d57b3sh.png
SolrCloud: Start Nodes

java -DzkHost=SERVER:2181 -jar start.jar
Timed Collections
collections: 04Nov  05Nov  06Nov  07Nov

optimize
search latest
search all
index
Collections API

action=DELETE
&name=05Nov

05Nov  06Nov  07Nov  08Nov

action=CREATE
&name=08Nov
&numShards=4
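The daily rotation can be scripted by building the two Collections API requests shown above. This sketch only builds the URLs; the base URL, the three-day retention, and the function names are assumptions for illustration, and a real CREATE call may need further parameters (a config name, for example) depending on the cluster.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def collection_name(day):
    """Name a daily collection the way the slides do, e.g. 08Nov."""
    return f"{day.day:02d}{MONTHS[day.month - 1]}"


def collections_api_url(base, **params):
    """Build a Collections API URL under the given Solr base URL."""
    return base.rstrip("/") + "/admin/collections?" + urlencode(params)


def rotation_urls(base, today, keep_days=3, num_shards=4):
    """URLs that create today's collection and delete the expired one."""
    expired = today - timedelta(days=keep_days)
    return [
        collections_api_url(base, action="CREATE",
                            name=collection_name(today), numShards=num_shards),
        collections_api_url(base, action="DELETE",
                            name=collection_name(expired)),
    ]


for url in rotation_urls("http://localhost:8983/solr", date(2013, 11, 8)):
    print(url)
```

Deleting a whole collection is much cheaper than deleting old documents from a single large index, which is the main reason for splitting by time.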
Aliases. Optimize

07Nov/update?optimize=true

05Nov  06Nov  07Nov  08Nov

action=CREATEALIAS
&name=LATEST
&collections=08Nov

action=CREATEALIAS
&name=ALL
&collections=06Nov,07Nov,08Nov
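The two alias requests can be generated the same way. The sketch below builds the URLs only. Note that the Solr Collections API names the list parameter `collections` (plural); the base URL and the `alias_url` helper are assumptions for illustration.

```python
from urllib.parse import urlencode


def alias_url(base, alias, collections):
    """Build a CREATEALIAS request pointing `alias` at one or more collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    # safe="," keeps the collection list readable instead of encoding commas as %2C
    return base.rstrip("/") + "/admin/collections?" + urlencode(params, safe=",")


base = "http://localhost:8983/solr"
print(alias_url(base, "LATEST", ["08Nov"]))
print(alias_url(base, "ALL", ["06Nov", "07Nov", "08Nov"]))
```

Applications then query LATEST or ALL and never need to know the name of the current day's collection.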
Solr for Indexing and Searching Logs
logs = production data

Logstash

commits
docs/update
mergeFactor

docValues
caches
omit*

time
Collections API
aliases
optimize
We’re hiring!

sematext.com/about/jobs
Thank you!

radu.gheorghe@sematext.com
@radu0gheorghe

@sematext

And @ our booth :)
