This document summarizes Sadayuki Furuhashi's background and open source projects, and provides an overview of Fluentd. Fluentd is an open source data collection tool that allows filtering, buffering, and routing logs and event data to various outputs such as databases, cloud services, and analysis systems. It has a simple core with plugins that provide extensibility and features like high availability, load balancing, and more.
10.
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB and S3
<match **>
  type copy
  <store>
    type mongo
    host mongo.example.com
    capped
    capped_size 200m
  </store>
  <store>
    type s3
    path archive/
  </store>
</match>
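The routing in the configuration above can be modeled in plain Ruby: an event's tag is checked against each `<match>` pattern in order, and `copy` hands the event to every destination it contains. The sketch below is a simplified, hypothetical model, not Fluentd's own matcher. It supports only whole-part `*` (exactly one tag part) and `**` (zero or more parts), and it represents each destination as a lambda.

```ruby
# Simplified, hypothetical model of Fluentd tag matching and the copy output.
# Only whole-part `*` and `**` wildcards are handled.

# True if +tag+ (e.g. "web.access") matches +pattern+ (e.g. "web.*").
def tag_match?(pattern, tag)
  match_parts(pattern.split("."), tag.split("."))
end

def match_parts(pats, parts)
  return parts.empty? if pats.empty?

  head, *rest = pats
  case head
  when "**"
    # `**` may consume zero or more tag parts.
    (0..parts.size).any? { |i| match_parts(rest, parts.drop(i)) }
  when "*"
    # `*` consumes exactly one tag part.
    !parts.empty? && match_parts(rest, parts.drop(1))
  else
    !parts.empty? && parts.first == head && match_parts(rest, parts.drop(1))
  end
end

# A copy output passes every event to each of its destinations.
def copy_output(destinations)
  ->(tag, time, record) { destinations.each { |d| d.call(tag, time, record) } }
end

# Send the event to the first matching pattern; <match> blocks are tried in order.
def route(routes, tag, time, record)
  _, output = routes.find { |pattern, _| tag_match?(pattern, tag) }
  output&.call(tag, time, record)
end

mongo = []
s3 = []
routes = {
  "**" => copy_output([
    ->(tag, time, record) { mongo << [tag, time, record] },
    ->(tag, time, record) { s3 << [tag, time, record] }
  ])
}

route(routes, "web.access", 1328319231, { "path" => "/buyItem" })
```

After the call, both `mongo` and `s3` hold the same event, which is what `type copy` is for.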
Fluentd
13. Fluentd in Treasure Data
[Diagram: watch server running Fluentd; out_metricsense → Librato Metrics for realtime analysis (✓ streaming aggregation); out_tdlog → Treasure Data for historical analysis]
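"Streaming aggregation" means summarizing events as they arrive, so that a metrics service receives one value per time window instead of every raw event. The sketch below is a hypothetical illustration of that idea (per-tag, per-minute count and sum). It is not how out_metricsense is implemented; the class and field names are made up for the example.

```ruby
# Hypothetical sketch of streaming aggregation: events are folded into
# per-(tag, minute) buckets, and only the summaries are flushed downstream.
class MinuteAggregator
  def initialize
    # [tag, minute_start] => { "count" => n, "sum" => total }
    @buckets = Hash.new { |h, k| h[k] = { "count" => 0, "sum" => 0 } }
  end

  # Add one event's numeric +value+ (time is Unix seconds).
  def add(tag, time, value)
    minute = time - (time % 60)
    bucket = @buckets[[tag, minute]]
    bucket["count"] += 1
    bucket["sum"] += value
  end

  # Return all summaries and start fresh, as a periodic flush would.
  def flush
    result = @buckets.map { |(tag, minute), stats| [tag, minute, stats] }
    @buckets.clear
    result
  end
end

agg = MinuteAggregator.new
agg.add("myapp.buylog", 1328319231, 150)
agg.add("myapp.buylog", 1328319235, 50)
agg.add("myapp.buylog", 1328319300, 70)
summaries = agg.flush
```

Three raw events become two summaries: one for the minute starting at 1328319180 (count 2, sum 200) and one for the minute starting at 1328319300 (count 1, sum 70).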
14. Internal Architecture
Input Plugin → Buffer Plugin → Output Plugin
Each event has a time, a tag, and a record:
time: 2012-02-04 01:33:51
tag: myapp.buylog
record: {"user": "me", "path": "/buyItem", "price": 150, "referer": "/landing"}
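The Input → Buffer → Output flow and the (time, tag, record) event can be sketched as a toy in-memory pipeline. This is a hypothetical model for illustration: `Event`, `MemoryBuffer`, and `ArrayOutput` are not Fluentd classes, and the slide's timestamp is assumed to be UTC when converted to Unix seconds.

```ruby
# Hypothetical model of Input -> Buffer -> Output. Every event is a
# (time, tag, record) triple; time is stored as integer Unix seconds.
Event = Struct.new(:time, :tag, :record)

# Collects events and hands them to the output in one batch on flush.
class MemoryBuffer
  def initialize(output)
    @output = output
    @events = []
  end

  def push(event)
    @events << event
  end

  def flush
    @output.write(@events.dup) unless @events.empty?
    @events.clear
  end
end

# Stand-in output that just remembers what it was given.
class ArrayOutput
  attr_reader :written

  def initialize
    @written = []
  end

  def write(events)
    @written.concat(events)
  end
end

output = ArrayOutput.new
buffer = MemoryBuffer.new(output)

# Input side: assume the slide's timestamp is UTC.
time = Time.utc(2012, 2, 4, 1, 33, 51).to_i
record = { "user" => "me", "path" => "/buyItem", "price" => 150, "referer" => "/landing" }
buffer.push(Event.new(time, "myapp.buylog", record))
buffer.flush
```

Separating the buffer from the output is what lets the output write in batches and retry without the input having to know about it.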
15. Architecture :: Input plugins
Input plugins
✓ Receive logs
✓ Or pull logs from data sources
✓ In a non-blocking manner
Examples: HTTP+JSON (in_http), File tail (in_tail), Syslog (in_syslog), ...
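As an example of the HTTP+JSON input, a client can post an event to in_http, which takes the tag from the URL path and the record from the `json` form parameter. The sketch below assumes a local Fluentd with in_http on its default port 8888; the helper name `build_fluentd_http_request` is hypothetical. It only builds the request, and the commented line shows how it could be sent.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: build a POST for in_http. The URL path is the tag
# and the record goes in the `json` form field.
def build_fluentd_http_request(tag, record, host: "localhost", port: 8888)
  uri = URI("http://#{host}:#{port}/#{tag}")
  request = Net::HTTP::Post.new(uri)
  request.set_form_data("json" => record.to_json)
  [uri, request]
end

uri, request = build_fluentd_http_request("myapp.access", { "user" => "me" })

# To send it (requires a running Fluentd with in_http):
#   Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```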
24. out_cassandra
require 'cassandra'

class CassandraOutput < Fluent::BufferedOutput
  Fluent::Plugin.register_output('cassandra', self)

  config_param :keyspace, :string
  config_param :columnfamily, :string
  config_param :host, :string, :default => 'localhost'
  config_param :port, :integer, :default => 9160

  def start
    super
    @connection = Cassandra.new(@keyspace, "#{@host}:#{@port}")
  end

  # Called once per event; the returned string is appended to a buffer chunk.
  def format(tag, time, record)
    record['tag'] = tag
    record['time'] = time
    record.to_msgpack
  end

  # Called with a whole chunk when the buffer is flushed.
  def write(chunk)
    chunk.msgpack_each do |record|
      # Row key is "<tag>_<time>".
      @connection.insert(@columnfamily, "#{record["tag"]}_#{record["time"]}", record)
    end
  end
end
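The `format`/`write` split in the Cassandra plugin can be tried without Fluentd or Cassandra. In the hypothetical sketch below, `JsonLinesChunk` stands in for a buffer chunk, and each formatted event is a JSON line instead of MessagePack so that only the standard library is needed. The row key uses the same `<tag>_<time>` scheme as the plugin.

```ruby
require "json"

# Hypothetical stand-in for a buffer chunk: it concatenates the strings
# returned by format, here JSON lines instead of MessagePack.
class JsonLinesChunk
  def initialize
    @data = +""
  end

  def <<(formatted)
    @data << formatted
  end

  # Counterpart of chunk.msgpack_each: yield each decoded record.
  def each_record
    @data.each_line { |line| yield JSON.parse(line) }
  end
end

# Same shape as CassandraOutput#format, but JSON and non-mutating.
def format_event(tag, time, record)
  record.merge("tag" => tag, "time" => time).to_json + "\n"
end

# Same row-key scheme as CassandraOutput#write.
def row_key(record)
  "#{record["tag"]}_#{record["time"]}"
end

chunk = JsonLinesChunk.new
chunk << format_event("myapp.buylog", 1328319231, { "user" => "me" })
chunk << format_event("myapp.buylog", 1328319232, { "user" => "you" })

rows = {}
chunk.each_record { |record| rows[row_key(record)] = record }
```

In this model, two events with the same tag in the same second get the same key, so the later one replaces the earlier. A real plugin may want a more unique key if that matters.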
25. Use cases
http://www.slideshare.net/tagomoris/rubykaigi-2013-111130
“Complex Event Processing on Ruby, Fluentd and Norikra”
TAGOMORI Satoshi, RubyKaigi 2013
http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013
“Log analysis system with Hadoop”
NHN Japan Corp., Hadoop Conference Japan 2013
http://www.slideshare.net/sylvainkalache/fluentd-at-slideshare
“fluentd at slideshare”
@SylvainKalache, Fluentd meetup
26. Use cases
http://www.slideshare.net/frsyuki/how-24042353
“How we use Fluentd in Treasure Data”
Sadayuki Furuhashi, Fluentd meetup at slideshare
http://www.slideshare.net/sematext/solr-for-indexing-and-searching-logs
“Using Solr to Search and Analyze Logs”
Radu Gheorghe
http://docs.fluentd.org/articles/free-alternative-to-splunk-by-fluentd
“Free Alternative to Splunk Using Fluentd”
27. Expected discussions...
> Who is using Fluentd?
> What are the differences compared to XYZ?
> Is there a plugin to send/receive data to/from XYZ?
> How can my system XYZ send data to Fluentd?
> Does Fluentd really work in the case of XYZ?
29.
class SomeInput < Fluent::Input
  Fluent::Plugin.register_input('myin', self)

  config_param :tag, :string

  def start
    Thread.new {
      while true
        time = Fluent::Engine.now
        record = {"user"=>1, "size"=>1}
        Fluent::Engine.emit(@tag, time, record)
        sleep 1  # avoid emitting in a tight busy loop
      end
    }
  end

  def shutdown
    # ...
  end
end

<source>
  type myin
  tag myapp.api.heartbeat
</source>
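A thread-based input like `SomeInput` needs a way for `shutdown` to end its loop. One common pattern, sketched here as a hypothetical standalone class rather than Fluentd code, is to keep a handle on the thread plus a stop flag, then clear the flag and `join` the thread. A `Queue` stands in for `Fluent::Engine.emit`.

```ruby
# Hypothetical sketch of a stoppable emit loop for a thread-based input.
class HeartbeatLoop
  def initialize(interval, &emit)
    @interval = interval
    @emit = emit
    @running = false
  end

  def start
    @running = true
    @thread = Thread.new do
      while @running
        @emit.call(Time.now.to_i, { "user" => 1, "size" => 1 })
        sleep @interval
      end
    end
  end

  def shutdown
    @running = false  # the loop exits after its current iteration
    @thread.join      # wait for the thread to finish
  end
end

queue = Queue.new
heartbeat = HeartbeatLoop.new(0.01) { |time, record| queue << [time, record] }
heartbeat.start
sleep 0.05
heartbeat.shutdown
```

The flag is a plain instance variable for brevity; a `Mutex` or a `Queue`-based signal would make the hand-off explicit.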
30.
require 'json'

class SomeOutput < Fluent::BufferedOutput
  Fluent::Plugin.register_output('myout', self)

  config_param :myparam, :string

  # One JSON array per line: [tag, time, record]
  def format(tag, time, record)
    [tag, time, record].to_json + "\n"
  end

  def write(chunk)
    puts chunk.read
  end
end

<match **>
  type myout
  myparam foobar
</match>
31.
class MyTailInput < Fluent::TailInput
  Fluent::Plugin.register_input('mytail', self)

  def configure_parser(conf)
    # ...
  end

  # Parse "user<TAB>item" lines; the event time is when the line is read.
  def parse_line(line)
    array = line.split("\t")
    record = {"user"=>array[0], "item"=>array[1]}
    time = Fluent::Engine.now
    return time, record
  end
end

<source>
  type mytail
</source>
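The parsing rule in `MyTailInput#parse_line` can be checked outside Fluentd. In this standalone sketch, `parse_tsv_line` is a hypothetical name, `Time.now.to_i` stands in for `Fluent::Engine.now`, and `chomp` is added on the assumption that a trailing newline may be present.

```ruby
# Standalone version of the tab-separated parse_line logic.
# `now` replaces Fluent::Engine.now so the function runs without Fluentd.
def parse_tsv_line(line, now: Time.now.to_i)
  user, item = line.chomp.split("\t")
  [now, { "user" => user, "item" => item }]
end

time, record = parse_tsv_line("alice\tbook\n", now: 1328319231)
```

Passing `now:` explicitly makes the result deterministic, which is convenient for testing a parser before wiring it into a plugin.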