Fluentd
the missing log collector


         fluentd.org
Sadayuki Furuhashi
Self-introduction

>   Sadayuki Furuhashi
    twitter/github: @frsyuki

>   Treasure Data, Inc.
    Founder & Software Architect

>   Open source projects
    MessagePack - “It’s like JSON. but fast and small”
    Fluentd - “Log everything in JSON”
Today’s topic:

Make log collection easy
     using Fluentd
Reporting & Monitoring

Collect → Store → Process → Visualize → Reporting & Monitoring
easier & shorter time

Collect → Store → Process → Visualize
  Store / Process:  Hadoop / Hive, MongoDB, Treasure Data
  Visualize:        Excel, Tableau, R
How to shorten here? The Collect step, with easier & shorter time.

Collect → Store → Process → Visualize
  Store / Process:  Hadoop / Hive, MongoDB, Treasure Data
  Visualize:        Excel, Tableau, R
Fluentd Users
How does Fluentd work?
Fluentd = syslogd + many more features:
  ✓ Plugins
  ✓ JSON
Access logs (Apache), app logs (frontend, backend), system logs (syslogd), databases
        → filter / buffer / routing →
Alerting (Nagios), analysis (MongoDB, MySQL, Hadoop), archiving (Amazon S3)
Plugin architecture: Input Plugins → Buffer Plugins (Filter Plugins) → Output Plugins
A log event travels from Input Plugins to Output Plugins and carries three parts:

  time:   2012-02-04 01:33:51
  tag:    myapp.buylog
  record: JSON
    {
      "user": "me",
      "path": "/buyItem",
      "price": 150,
      "referer": "/landing"
    }
in_tail: reads a file and parses lines

  apache → access.log → fluentd (in_tail)

  ✓ reads a log file
  ✓ custom regexp
  ✓ custom parser in Ruby
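
A minimal in_tail source along the lines of this slide; the file path, pos_file location, and tag are placeholders, not from the deck:

<source>
  type tail
  path /var/log/httpd/access.log        # file to follow (hypothetical path)
  pos_file /var/log/fluentd/access.pos  # remembers the read position across restarts
  format apache2                        # built-in Apache parser; a custom regexp also works (format /.../)
  tag web.access
</source>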
failure handling & retrying

  apache → access.log → fluentd (in_tail → buffer)

  ✓ retries automatically
  ✓ exponential retry wait
  ✓ buffer persisted to a file
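
A sketch of how buffering and retrying are configured on a buffered output; the destination and values below are illustrative, not from the deck:

<match web.access>
  type forward
  buffer_type file                          # persist buffered chunks on disk
  buffer_path /var/log/fluentd/buffer/web   # buffer survives a fluentd restart
  flush_interval 10s
  retry_wait 1s                             # first retry after 1s, doubled each time (exponential back-off)
  retry_limit 17
  <server>
    host aggregator.example.com             # hypothetical destination
    port 24224
  </server>
</match>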
routing / copying

  apache → access.log → fluentd (in_tail → buffer) → Hadoop
                                                   → Amazon S3

  ✓ routing based on tags
  ✓ copying to multiple storages
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB and S3
<match **>
  type copy
  <store>
    type mongo
    host mongo.example.com
    capped
    capped_size 200m
  </store>
  <store>
    type s3
    path archive/
  </store>
</match>
forwarding

  many fluentd forwarders ── send / ack ──→ a few aggregator fluentd nodes
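
A minimal sketch of both sides of this topology; host names are placeholders:

# forwarder side: send every event to an aggregator fluentd
<match **>
  type forward
  <server>
    host aggregator.example.com
    port 24224
  </server>
</match>

# aggregator side: accept events forwarded from other fluentd nodes
<source>
  type forward
  port 24224
</source>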
Fluentd = syslogd + many more features:
  ✓ Plugins
  ✓ JSON
Fluentd - plugin distribution platform



$ fluent-gem search -rd fluent-plugin


$ fluent-gem install fluent-plugin-mongo




                               117 plugins!
Treasure Data?

Collect → Store → Process → Visualize
  Store / Process:  Hadoop / Hive, MongoDB, Treasure Data
  Visualize:        Excel, Tableau, R

Our company provides Treasure Data, covering the Store and Process steps.
We’re Hiring!
careers@treasure-data.com
 http://www.treasure-data.com/careers/
Backup slides
Fluentd and Flume NG - configuration

Fluentd:

<source>
  type forward
  port 24224
</source>

<match **>
  type file
  path /var/log/logs
</match>

Flume NG:

# source
host1.sources = avro-source1
host1.sources.avro-source1.type = avro
host1.sources.avro-source1.bind = 0.0.0.0
host1.sources.avro-source1.port = 41414
host1.sources.avro-source1.channels = ch1

# channel
host1.channels = ch_avro_log
host1.channels.ch_avro_log.type = memory

# sink
host1.sinks = log-sink1
host1.sinks.log-sink1.type = logger
host1.sinks.log-sink1.channel = ch1
Fluentd and Flume NG - topology

  Fluentd:   many fluentd ── send / ack ──→ fluentd aggregators ──→ fluentd
  Flume NG:  many Agents ── send / ack ──→ Collectors ──→ Collector
out_hdfs

  apache → access.log → fluentd (in_tail → buffer) → fluentd / fluentd / fluentd

  ✓ automatic fail-over
  ✓ load balancing
  ✓ retries automatically
  ✓ exponential retry wait
  ✓ buffer persisted to a file
  ✓ slices files based on time:
      2013-01-01/01/access.log.gz
      2013-01-01/02/access.log.gz
      2013-01-01/03/access.log.gz
      ...
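
The fail-over and load-balancing checkmarks describe what Fluentd's out_forward output does when several downstream servers are listed; a minimal sketch with placeholder hosts and an optional standby node:

<match **>
  type forward
  <server>
    host fluentd-1.example.com
    port 24224
  </server>
  <server>
    host fluentd-2.example.com   # traffic is balanced across reachable servers
    port 24224
  </server>
  <server>
    host fluentd-3.example.com
    port 24224
    standby                      # used only when the primary servers are unreachable
  </server>
</match>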
out_s3

  apache → access.log → fluentd (in_tail → buffer) → Amazon S3

  ✓ slices files based on time:
      2013-01-01/01/access.log.gz
      2013-01-01/02/access.log.gz
      2013-01-01/03/access.log.gz
      ...
  ✓ retries automatically
  ✓ exponential retry wait
  ✓ buffer persisted to a file
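
A minimal out_s3 sketch (fluent-plugin-s3); the bucket, credentials, and buffer path are placeholders, not from the deck:

<match web.access>
  type s3
  aws_key_id YOUR_AWS_KEY_ID
  aws_sec_key YOUR_AWS_SECRET_KEY
  s3_bucket my-log-archive                  # hypothetical bucket name
  path archive/
  buffer_type file
  buffer_path /var/log/fluentd/buffer/s3
  time_slice_format %Y-%m-%d/%H             # one object per hour; stored gzipped by default
  time_slice_wait 10m                       # wait for late events before uploading a slice
</match>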
out_hdfs

  ✓ custom text formatter

  apache → access.log → fluentd (in_tail → buffer) → HDFS

  ✓ slices files based on time:
      2013-01-01/01/access.log.gz
      2013-01-01/02/access.log.gz
      2013-01-01/03/access.log.gz
      ...
  ✓ retries automatically
  ✓ exponential retry wait
  ✓ buffer persisted to a file
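
The out_hdfs shown here corresponds to an HDFS output plugin; a minimal sketch assuming fluent-plugin-webhdfs, with a placeholder NameNode host and paths:

<match web.access>
  type webhdfs
  host namenode.example.com                 # hypothetical NameNode
  port 50070                                # WebHDFS REST port
  path /log/access/%Y%m%d_%H/access.log     # sliced into one file per hour
  buffer_type file
  buffer_path /var/log/fluentd/buffer/webhdfs
</match>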
