Sadayuki Furuhashi
Founder & Software Architect
Set Up Once, Collect More.
Treasure Data, Inc.
Self-introduction
> Sadayuki Furuhashi
github/twitter: @frsyuki
> Treasure Data, Inc.
Founder & Software Architect
> Open source projects
MessagePack - efficient object serializer
Fluentd - data collection tool
ServerEngine - Ruby framework to build multiprocess servers
LS4 - distributed object storage system (suspended)
kumofs - distributed key-value data store (suspended)
What’s Fluentd?
An extensible & reliable data collection tool
Extensible: simple core + plugins
Reliable: buffering, HA (failover), load balancing, etc.
Like syslogd
[Diagram: sources (Apache access logs; frontend and backend app logs; syslogd system logs; your system) -> Fluentd (filter / buffer / routing) -> destinations (Blueflood for metrics; MongoDB, MySQL and Hadoop for analysis; Amazon S3 for archiving)]
Plugin types: Input Plugins, Buffer Plugins, (Filter Plugins), Output Plugins
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB and S3
<match **>
  type copy

  <store>
    type mongo
    host mongo.example.com
    capped
    capped_size 200m
  </store>

  <store>
    type s3
    path archive/
  </store>
</match>
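The `<match **>` pattern in the configuration routes every tag to the copy output. As a rough illustration of how tag patterns select events, the sketch below assumes the documented Fluentd rules that `*` matches exactly one dot-separated tag part and `**` matches zero or more parts. `match_tag?` and `parts_match?` are hypothetical helpers written for this note, not Fluentd's own matcher, and they ignore other pattern forms such as `{a,b}`.

```ruby
# Simplified tag matcher (illustration only, not Fluentd's implementation).
# pattern_parts and tag_parts are arrays of dot-separated components.
def parts_match?(pattern_parts, tag_parts)
  # An exhausted pattern matches only an exhausted tag.
  return tag_parts.empty? if pattern_parts.empty?

  head, *rest = pattern_parts
  case head
  when '**'
    # '**' may consume zero or more tag parts.
    (0..tag_parts.length).any? { |n| parts_match?(rest, tag_parts.drop(n)) }
  when '*'
    # '*' consumes exactly one tag part.
    !tag_parts.empty? && parts_match?(rest, tag_parts.drop(1))
  else
    # A literal part must be equal to the next tag part.
    tag_parts.first == head && parts_match?(rest, tag_parts.drop(1))
  end
end

def match_tag?(pattern, tag)
  parts_match?(pattern.split('.'), tag.split('.'))
end
```

With these rules, `match_tag?('**', 'web.access')` is true, while `match_tag?('web.*', 'web.access.error')` is false because `*` covers only one part.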
Fluentd
Fluentd - Set Up Once, Collect More
Fluentd in Treasure Data
[Diagram: API servers (Rails app) and worker servers (Ruby app) each run a local Fluentd, with a queue (PerfectQueue) between them. Apps log through fluent-logger-ruby + in_forward, scripts feed in through in_exec, and each local Fluentd sends to the watch server through out_forward.]
[Diagram: the watch server does streaming aggregation, then sends to Librato Metrics (out_metricsense) for realtime analysis and to Treasure Data (out_tdlog) for historical analysis.]
Fluentd Internal Architecture
Input Plugin -> Buffer Plugin -> Output Plugin
An event consists of a time, a tag, and a record:
time: 2012-02-04 01:33:51
tag: myapp.buylog
record: {
  "user": "me",
  "path": "/buyItem",
  "price": 150,
  "referer": "/landing"
}
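To make the time / tag / record triple concrete, here is a minimal sketch that models the sample event in plain Ruby. The `Event` struct and its `to_json_line` method are made up for illustration and are not Fluentd classes; the slide gives no time zone, so UTC is assumed.

```ruby
require 'json'

# Hypothetical stand-in for an event: a time, a tag, and a record.
Event = Struct.new(:time, :tag, :record) do
  # Serialize as one JSON line: [tag, unix_time, record]
  def to_json_line
    [tag, time.to_i, record].to_json + "\n"
  end
end

event = Event.new(
  Time.utc(2012, 2, 4, 1, 33, 51), # assumption: the slide's timestamp read as UTC
  'myapp.buylog',
  { 'user' => 'me', 'path' => '/buyItem', 'price' => 150, 'referer' => '/landing' }
)

puts event.to_json_line
```

The tag is what routing works on, while the record is an ordinary hash that any output plugin can serialize in its own way.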
Architecture :: Input plugins
✓ Receive logs
✓ Or pull logs from data sources
✓ in a non-blocking manner
Input plugins:
HTTP+JSON (in_http)
File tail (in_tail)
Syslog (in_syslog)
...
Architecture :: Output plugins
✓ Write or send event logs
Output plugins:
File (out_file)
Amazon S3 (out_s3)
MongoDB (out_mongo)
...
Architecture :: Buffer plugins
✓ Improve performance
✓ Provide reliability
✓ Provide thread-safety
Buffer plugins:
Memory (buf_memory)
File (buf_file)
[Diagram: Input -> buffer (chunk, chunk, chunk) -> output]
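One way to picture the chunk queue: formatted records are appended to the current chunk, and once the chunk would grow past a size limit it moves to a queue for the output to write. The sketch below is a toy in-memory model under that assumption. `ToyBuffer` and `chunk_limit` are illustrative names, and the thread-safety and on-disk persistence that real buffer plugins provide are left out.

```ruby
# Toy buffer (illustration only): groups formatted records into chunks.
class ToyBuffer
  attr_reader :queue

  def initialize(chunk_limit)
    @chunk_limit = chunk_limit # max bytes per chunk (illustrative parameter)
    @current = +''             # chunk currently being filled
    @queue = []                # full chunks waiting for the output
  end

  # Append one formatted record; start a new chunk if the limit would be exceeded.
  def append(data)
    if !@current.empty? && @current.bytesize + data.bytesize > @chunk_limit
      @queue << @current
      @current = +''
    end
    @current << data
  end

  # Move the current chunk to the queue, for example on a flush interval.
  def flush
    return if @current.empty?

    @queue << @current
    @current = +''
  end

  # Hand queued chunks to the output one at a time, oldest first.
  def each_chunk
    yield @queue.shift until @queue.empty?
  end
end
```

Because the output sees whole chunks rather than single records, it can write in batches, which is where the performance gain comes from.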
[Diagram: Apache writes /var/log/access.log; Fluentd reads it with in_tail and buffers with buf_file under /var/log/fluentd/buffer before sending to outputs such as Amazon S3 and Hadoop]
✓ buffering for any outputs,
✓ retrying automatically,
✓ with exponential wait,
✓ and persistence on a disk.
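Automatic retrying with exponential wait can be sketched as follows. This is a minimal illustration, not Fluentd's actual retry code: `with_retries`, `max_retries`, `base_wait` and the injectable `sleeper` are names invented here, and the wait simply doubles after each failure.

```ruby
# Run the block, retrying on failure with a wait that doubles each time.
# All parameter names are illustrative assumptions.
def with_retries(max_retries: 5, base_wait: 1.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    yield
  rescue StandardError
    # Give up and re-raise once the retry budget is spent.
    raise if attempt >= max_retries

    sleeper.call(base_wait * (2**attempt)) # 1x, 2x, 4x, 8x ... base_wait
    attempt += 1
    retry
  end
end
```

A caller would wrap the write step, for example `with_retries { upload(chunk) }` where `upload` is whatever sends the chunk. Combined with a file buffer, the chunk survives on disk while the retries run.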
[Diagram: applications, log files, HTTP, etc. feed a tier of Fluentd instances, which forward to downstream Fluentd instances monitored by heartbeat]
✓ load balancing or active-backup
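As a rough sketch of the two modes, the code below picks a destination from servers whose heartbeat is still considered alive. Active servers are used round-robin (load balancing), and standby servers are used only when no active server is alive (active-backup). The `Server` struct, the `ServerPicker` class and the `alive` flag are hypothetical, and heartbeat detection itself is not shown.

```ruby
# Hypothetical model of a forwarding destination.
Server = Struct.new(:name, :standby, :alive)

class ServerPicker
  def initialize(servers)
    @servers = servers
    @counter = 0
  end

  # Return the next server to send to, or nil if none is alive.
  def pick
    alive = @servers.select(&:alive)
    candidates = alive.reject(&:standby)
    candidates = alive if candidates.empty? # fall back to standby servers
    return nil if candidates.empty?

    server = candidates[@counter % candidates.length]
    @counter += 1
    server
  end
end
```

When `pick` returns nil, the sender keeps the data in its buffer and tries again later, which is how buffering and failover work together.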
out_cassandra
require 'cassandra'

class CassandraOutput < Fluent::BufferedOutput
  Fluent::Plugin.register_output('cassandra', self)

  config_param :keyspace, :string
  config_param :columnfamily, :string
  config_param :host, :string, :default => 'localhost'
  config_param :port, :int, :default => 9160

  # Open the Cassandra connection when the plugin starts.
  def start
    super
    @connection = Cassandra.new(@keyspace, "#{@host}:#{@port}")
  end

  # Serialize each event, with its tag and time, into the buffer chunk.
  def format(tag, time, record)
    record['tag'] = tag
    record['time'] = time
    record.to_msgpack
  end

  # Write every buffered record in the chunk to the column family.
  def write(chunk)
    chunk.msgpack_each do |record|
      @connection.insert(@columnfamily, "#{record['tag']}_#{record['time']}", record)
    end
  end
end
Use cases
http://www.slideshare.net/tagomoris/rubykaigi-2013-111130
“Complex Event Processing on Ruby, Fluentd and Norikra”
TAGOMORI Satoshi, RubyKaigi 2013
http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013
“Log analysis system with Hadoop”
NHN Japan Corp., Hadoop Conference Japan 2013
http://www.slideshare.net/sylvainkalache/fluentd-at-slideshare
“fluentd at slideshare”
@SylvainKalache, Fluentd meetup
Use cases
http://www.slideshare.net/frsyuki/how-24042353
“How we use Fluentd in Treasure Data”
Sadayuki Furuhashi, Fluentd meetup at slideshare
http://www.slideshare.net/sematext/solr-for-indexing-and-searching-logs
“Using Solr to Search and Analyze Logs”
Radu Gheorghe
http://docs.fluentd.org/articles/free-alternative-to-splunk-by-fluentd
“Free Alternative to Splunk Using Fluentd”
Expected discussions...
> Who is using Fluentd?
> What are the differences compared to XYZ?
> Is there a plugin to send/recv data to/from XYZ?
> How can my system XYZ send data to Fluentd?
> Does Fluentd really work in case of XYZ?
Links
http://fluentd.org/plugin/
class SomeInput < Fluent::Input
  Fluent::Plugin.register_input('myin', self)

  config_param :tag, :string

  # Emit a sample record from a background thread.
  def start
    Thread.new {
      while true
        time = Fluent::Engine.now
        record = {"user"=>1, "size"=>1}
        Fluent::Engine.emit(@tag, time, record)
      end
    }
  end

  def shutdown
    ...
  end
end

<source>
  type myin
  tag myapp.api.heartbeat
</source>
require 'json'

class SomeOutput < Fluent::BufferedOutput
  Fluent::Plugin.register_output('myout', self)

  config_param :myparam, :string

  # Store one JSON array per line in the buffer chunk.
  def format(tag, time, record)
    [tag, time, record].to_json + "\n"
  end

  # Print the whole chunk when it is flushed.
  def write(chunk)
    puts chunk.read
  end
end

<match **>
  type myout
  myparam foobar
</match>
class MyTailInput < Fluent::TailInput
  Fluent::Plugin.register_input('mytail', self)

  def configure_parser(conf)
    ...
  end

  # Parse one tab-separated line into a time and a record.
  def parse_line(line)
    array = line.split("\t")
    record = {"user"=>array[0], "item"=>array[1]}
    time = Fluent::Engine.now
    return time, record
  end
end

<source>
  type mytail
</source>
