Structured logging
Reliable forwarding
Pluggable architecture
http://fluentd.org/
Agenda
> Background
> Overview
> Product Comparison
> Use cases
Background
Data Processing
Collect Store Process Visualize
Data source
Reporting
Monitoring
Related Products
Store Process
Cloudera
Horton Works
Treasure Data
Collect Visualize
Tableau
Excel
R
easier & shorter time
???
The basics of fluentd
Before Fluentd
Application
・・・
Server2
Application
・・・
Server3
Application
・・・
Server1
FluentLog
High Latency!
must wait for a day...
After Fluentd
Application
・・・
Server2
Application
・・・
Server3
Application
・・・
Server1
In streaming!
Fluentd Fluentd Fluentd
Fluentd Fluentd
Overview
> Open source log collector written in Ruby
> Reliable, scalable and easy to extend
> Using rubygems ecosystem for plugins
In short
It’s like syslogd, but
uses JSON for log messages
tail
insert
event
buffering
127.0.0.1 - - [11/Dec/2012:07:26:27] "GET / ...
127.0.0.1 - - [11/Dec/2012:07:26:30] "GET / ...
127.0.0.1 - - [11/Dec/2012:07:26:32] "GET / ...
127.0.0.1 - - [11/Dec/2012:07:26:40] "GET / ...
127.0.0.1 - - [11/Dec/2012:07:27:01] "GET / ...
...
Fluentd
Web Server
2012-02-04 01:33:51
apache.log
{
  "host": "127.0.0.1",
  "method": "GET",
  ...
}
Example (apache to mongo)
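The tail-and-parse step in this example can be sketched in plain Ruby. This is only an illustration: the regular expression is a simplified stand-in (not Fluentd's built-in apache parser), the helper names are hypothetical, and the sample line adds a timezone offset and a request tail that the truncated lines on the slide do not show.

```ruby
require "time"

# Simplified pattern for illustration only; not Fluentd's built-in "apache" parser.
def access_log_pattern
  /\A(?<host>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+)/
end

# Turn one raw log line into a [time, tag, record] event, or nil if it does not match.
def parse_access_line(line, tag)
  m = access_log_pattern.match(line)
  return nil unless m

  time = Time.strptime(m[:time], "%d/%b/%Y:%H:%M:%S %z").to_i
  record = { "host" => m[:host], "method" => m[:method], "path" => m[:path] }
  [time, tag, record]
end

# Hypothetical sample line (offset and request tail are assumptions of this sketch).
line = '127.0.0.1 - - [11/Dec/2012:07:26:27 +0000] "GET / HTTP/1.1" 200 777'
time, tag, record = parse_access_line(line, "apache.access")
puts "#{Time.at(time).utc.strftime('%Y-%m-%d %H:%M:%S')} #{tag} #{record}"
```

The point of the sketch is the shape of the result: a raw text line goes in, and a time, a tag and a structured record come out.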
Event structure (log message)
✓ Time
> unit is seconds by default
> from data source or adding parsed time
✓ Tag
> for message routing
✓ Record
> JSON format
> MessagePack internally
> structured, not free-form text
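As a rough sketch of this structure, an event can be modelled as a time in seconds, a tag and a record hash. The field names and the build_event helper are assumptions for illustration, and JSON from the standard library is used here in place of MessagePack, which would need an extra gem.

```ruby
require "json"

# An event is a (time, tag, record) triple. Field names here are illustrative.
def build_event(tag, record, time = Time.now)
  { "time" => time.to_i, "tag" => tag, "record" => record }
end

event = build_event(
  "apache.access",
  { "host" => "127.0.0.1", "method" => "GET" },
  Time.utc(2012, 2, 4, 1, 33, 51)
)

# Serialize and restore the event; the record survives as structured data.
encoded = JSON.generate(event)
decoded = JSON.parse(encoded)
puts encoded
```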
Pluggable Architecture
Engine
Input (pluggable)
> Forward
> HTTP
> File tail
> dstat
> ...
Buffer (pluggable)
> File
> Memory
Output (pluggable)
> Forward
> File
> MongoDB
> ...
Output
> rewrite
> ...
# Ruby
Fluent.open("myapp")
Fluent.event("login", {"user" => 38})
#=> 2012-12-11 07:56:01 myapp.login {"user":38}
> Ruby
> Java
> Perl
> PHP
> Python
> D
> Scala
> ...
Application
Time:Tag:Record
Client libraries
Configuration and operation
> No central / master node
> HTTP include helps configuration sharing
> Operation depends on your environment
> Use your daemon management
> Use Chef in Treasure Data
> Apache-like syntax and Ruby DSL
# receive events via HTTP
<source>
  type http
  port 8888
</source>

# read logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache
  tag apache.access
</source>

# save access logs to MongoDB
<match apache.access>
  type mongo
  database apache
  collection log
</match>

# save alerts to a file
<match alert.**>
  type file
  path /var/log/fluent/alerts
</match>

# forward other logs to servers
<match **>
  type forward
  <server>
    host 192.168.0.11
    weight 20
  </server>
  <server>
    host 192.168.0.12
    weight 60
  </server>
</match>

include http://example.com/conf
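The match sections above route events by tag, and the first matching section wins. A minimal sketch of that routing follows, assuming the usual wildcard meaning ("*" matches exactly one tag part, "**" matches zero or more parts). The function names and the output symbols are hypothetical; this is not Fluentd's own matcher.

```ruby
# Match dot-separated tag parts against pattern parts.
def parts_match?(pattern_parts, tag_parts)
  return tag_parts.empty? if pattern_parts.empty?

  head, *rest = pattern_parts
  case head
  when "**"
    # "**" may consume zero or more tag parts.
    (0..tag_parts.length).any? { |n| parts_match?(rest, tag_parts.drop(n)) }
  when "*"
    # "*" consumes exactly one tag part.
    !tag_parts.empty? && parts_match?(rest, tag_parts.drop(1))
  else
    tag_parts.first == head && parts_match?(rest, tag_parts.drop(1))
  end
end

def tag_match?(pattern, tag)
  parts_match?(pattern.split("."), tag.split("."))
end

# Return the output of the first rule whose pattern matches the tag.
def route(rules, tag)
  rule = rules.find { |pattern, _output| tag_match?(pattern, tag) }
  rule && rule.last
end

# Same order as the match sections in the configuration above.
rules = [
  ["apache.access", :mongo],
  ["alert.**", :file],
  ["**", :forward],
]

%w[apache.access alert.disk.full app.login].each do |tag|
  puts "#{tag} -> #{route(rules, tag)}"
end
```

Because the catch-all pattern comes last, only events that no earlier section claims are forwarded.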
Reliability (core + plugin)
> Buffering
> Use file buffer for persistent data
> Each buffer chunk has an ID, so writes can be made idempotent
> Retrying
> Error handling
> transaction, failover, etc on forward plugin
> secondary for backup
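One way chunk IDs can make retries safe is sketched below: a receiver remembers the IDs it has already stored, so a chunk that is sent twice is written once. This is an assumed illustration of the idea, not Fluentd's internal code; receive_chunk and the chunk layout are hypothetical.

```ruby
require "securerandom"
require "set"

# Hypothetical receiver: remembers chunk IDs so a retried chunk is stored only once.
def receive_chunk(seen_ids, store, chunk)
  return :duplicate if seen_ids.include?(chunk[:id])

  seen_ids << chunk[:id]
  store.concat(chunk[:events])
  :stored
end

seen_ids = Set.new
store = []
chunk = { id: SecureRandom.hex(16), events: [{ "user" => 38 }, { "user" => 39 }] }

first = receive_chunk(seen_ids, store, chunk)
# For example, the ack was lost, so the sender retries the same chunk.
retried = receive_chunk(seen_ids, store, chunk)
puts "#{first}, #{retried}, stored=#{store.size}"
```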
Plugins - use rubygems
$ fluent-gem search -rd fluent-plugin
$ fluent-gem search -rd fluent-mixin
$ fluent-gem install fluent-plugin-mongo
http://www.fluentd.org/plugins
in_tail
✓ read a log file
✓ read log files in directory
✓ custom regexp
✓ custom parser in Ruby
Apache
access.log
Fluentd
Supported format:
> apache
> apache2
> syslog
> nginx
> json
> csv
> tsv
> ltsv
> none
> multiline
Fluentd
out_mongo
Apache
buffer
access.log
✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
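Exponential retry wait can be sketched as a wait time that doubles after every failed attempt, up to a cap. The parameter names below are hypothetical and are not Fluentd configuration keys; the sleep function is injectable so the example runs without actually waiting.

```ruby
# Wait times for successive retries: initial_wait doubled each time, capped at max_wait.
# Parameter names are hypothetical, not Fluentd configuration keys.
def retry_waits(initial_wait, retries, max_wait = Float::INFINITY)
  (0...retries).map { |n| [initial_wait * (2**n), max_wait].min }
end

# Calls the block until it succeeds or the waits run out; sleeper is injectable for testing.
def with_retries(waits, sleeper = ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    wait = waits[attempts - 1]
    raise if wait.nil? # no retries left: re-raise the last error
    sleeper.call(wait)
    retry
  end
end

# Simulated flush that fails three times, then succeeds.
slept = []
result = with_retries(retry_waits(1.0, 5, 10.0), ->(s) { slept << s }) do |attempt|
  raise IOError, "destination unavailable" if attempt < 4
  "flushed on attempt #{attempt}"
end
puts result
puts "waited: #{slept.inspect}"
```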
Fluentd
out_webhdfs
buffer
✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
✓ slice files based on time
2013-01-01/01/access.log.gz
2013-01-01/02/access.log.gz
2013-01-01/03/access.log.gz
...
HDFS
✓ custom text formatter
Apache
access.log
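Time-based slicing as in the paths above can be sketched by deriving the file path from each event's time. The helper names and the event layout are assumptions for illustration; a real output plugin would also handle buffering and compression.

```ruby
# Hypothetical helper: build an hourly-sliced path like the ones listed above.
def sliced_path(time, basename = "access.log.gz")
  time.getutc.strftime("%Y-%m-%d/%H/") + basename
end

# Group events (with epoch-second times) by the file they would be written to.
def group_by_slice(events)
  events.group_by { |event| sliced_path(Time.at(event[:time])) }
end

events = [
  { time: Time.utc(2013, 1, 1, 1, 5).to_i, line: "a" },
  { time: Time.utc(2013, 1, 1, 1, 50).to_i, line: "b" },
  { time: Time.utc(2013, 1, 1, 2, 0).to_i, line: "c" },
]

groups = group_by_slice(events)
groups.each { |path, batch| puts "#{path}: #{batch.size} event(s)" }
```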
out_copy + other plugins
✓ routing based on tags
✓ copy to multiple storages
Amazon S3
Hadoop
Fluentd
buffer
Apache
access.log
out_forward
apache
✓ automatic fail-over
✓ load balancing
Fluentd
Apache
buffer
access.log
✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
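Load balancing with fail-over can be sketched as a weighted pick among the servers that are currently reachable. This is an assumed illustration, not the algorithm out_forward actually uses; pick_server and the :alive flag are hypothetical, and the hosts and weights are taken from the forward configuration shown earlier.

```ruby
# Pick a server in proportion to its weight, skipping servers marked as down.
# "point" is a number in [0, 1), e.g. from rand; passing it in keeps the pick testable.
def pick_server(servers, point)
  alive = servers.select { |server| server[:alive] }
  return nil if alive.empty?

  threshold = point * alive.sum { |server| server[:weight] }
  alive.each do |server|
    threshold -= server[:weight]
    return server if threshold < 0
  end
  alive.last
end

servers = [
  { host: "192.168.0.11", weight: 20, alive: true },
  { host: "192.168.0.12", weight: 60, alive: true },
]

picks = Array.new(1000) { pick_server(servers, rand)[:host] }
puts picks.tally.inspect

# Fail-over: when a server is down, all traffic goes to the remaining one.
servers[1][:alive] = false
puts pick_server(servers, rand)[:host]
```

With weights 20 and 60, roughly three quarters of the picks go to the second server while both are up.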
Forward topology
[Diagram: Fluentd nodes forwarding events to other Fluentd nodes, with send/ack]
Inputs: Access logs (Apache), App logs (Frontend, Backend), System logs (syslogd), Databases
Fluentd: filter / buffer / routing
Outputs: Alerting (Nagios), Analysis (MongoDB, MySQL, Hadoop), Archiving (Amazon S3)
td-agent
> Open source distribution package of fluentd
> ETL part of Treasure Data
> deb, rpm, dmg (since td-agent 2.0)
> Including useful components
> ruby, jemalloc, fluentd
> 3rd party gems: td, mongo, webhdfs, etc…
> http://packages.treasure-data.com/
v1
> New features without breaking compatibility
> Filter, Label and better error handling
> ServerEngine-based: multi-process, signal, etc.
> New configuration and DSL format
> JRuby and Windows support
> github issue: Plan for v1 release #251
Use cases
Treasure Data
Frontend
Job Queue
Worker
Hadoop
Hadoop
Fluentd
Applications push
metrics to Fluentd

(via local Fluentd)
Librato
Metrics
for realtime analysis
Treasure
Data
for historical analysis
Fluentd sums up data per minute

(partial aggregation)
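Partial aggregation of this kind can be sketched as summing metric values into one-minute buckets before they are sent on. The event fields and the sum_per_minute helper are assumptions for illustration, not the setup used at Treasure Data.

```ruby
# Sum metric values per (minute, name) bucket. Event field names are illustrative.
def sum_per_minute(events)
  totals = Hash.new(0)
  events.each do |event|
    minute = event[:time] - (event[:time] % 60) # floor to the start of the minute
    totals[[minute, event[:name]]] += event[:value]
  end
  totals
end

base = Time.utc(2012, 12, 11, 7, 26, 0).to_i
events = [
  { time: base + 27, name: "login", value: 1 },
  { time: base + 40, name: "login", value: 1 },
  { time: base + 61, name: "login", value: 1 },
]

totals = sum_per_minute(events)
totals.each do |(minute, name), sum|
  puts "#{Time.at(minute).utc.strftime('%H:%M')} #{name} #{sum}"
end
```

Three raw events become two aggregated rows, which is what reduces the volume sent downstream.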
hundreds of app servers
sends event logs
sends event logs
sends event logs
Rails app td-agent
td-agent
td-agent
Google
Spreadsheet
Treasure Data
MySQL
Logs are available
after several mins.
Daily/Hourly
Batch
KPI visualization
Feedback rankings
Rails app
Rails app
Unlimited scalability
Flexible schema
Realtime
Less performance impact
Cookpad
✓ Over 100 RoR servers (2012/2/4)
http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013
NHN Japan
by @tagomoris
✓ 16 nodes
✓ 120,000+ lines/sec
✓ 400Mbps at peak
✓ 1.5+ TB/day (raw)
Web Servers
Fluentd Cluster
Archive Storage (scribed)
Fluentd Watchers
Graph Tools
Notifications (IRC)
Hadoop Cluster CDH4 (HDFS, YARN)
webhdfs
Huahin Manager
hive server
STREAM
Shib ShibUI
BATCH
SCHEDULED BATCH
Other use cases
> Collect sensor logs
> Embedded devices, Raspberry Pi, etc.
> Integrated with Elasticsearch and Kibana
> Integrated with Norikra CEP engine



http://www.fluentd.org/guides
Other companies
http://www.fluentd.org/testimonials
> Fluentd is a widely-used log collector
> There are many use cases
> Many contributors and plugins
> Keep it simple
> Easy to integrate with your environment
Conclusion
