Log analysis system
                         with Hadoop
              in livedoor 2013 Winter
                                      2013/01/20
              Hadoop Conference Japan 2013 Winter

                   TAGOMORI Satoshi (@tagomoris)
                                NHN Japan Corp.
Monday, January 21, 2013
TAGOMORI SATOSHI (@TAGOMORIS)
                             NHN JAPAN CORP.
         WEB SERVICE BUSINESS DIVISION DEVELOPMENT DEPARTMENT 2
                   (IN JAN 2012, LIVEDOOR -> NHN JAPAN)


livedoor in NHN Japan



large scale web services
              400+ Web Servers


              5Gbps @ Aug 2009
              15Gbps @ Aug 2011
              20+Gbps @ Jan 2013
               (direct outbound + CDN)

giant access log traffic

              At Aug 2011 (HCJ2011)
               From 96 servers
               580GB/day



giant access log traffic
              NOW (At Jan 2013 HCJ2013W)
               From 320+ servers
               1.5+ TB/day (raw)
               5,300,000,000+ lines/day
               120,000+ lines/sec (peak time)
               400Mbps log traffic
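As a rough sanity check on the figures above, the sketch below derives per-second and per-line averages from the stated daily totals. It assumes decimal units (1 TB = 10^12 bytes) and an 86,400-second day, neither of which is stated on the slide, and the function name is illustrative. The slide does not say how the 400Mbps figure was measured, so this estimate covers raw payload only and does not try to reproduce it.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def traffic_summary(bytes_per_day, lines_per_day, peak_lines_per_sec):
    """Derive averages from daily totals (raw payload only, no protocol overhead)."""
    avg_bytes_per_line = bytes_per_day / lines_per_day
    return {
        "avg_lines_per_sec": lines_per_day / SECONDS_PER_DAY,
        "avg_bytes_per_line": avg_bytes_per_line,
        # bits per second, expressed in megabits
        "avg_mbps": bytes_per_day * 8 / SECONDS_PER_DAY / 1e6,
        # peak rate, assuming peak-time lines have the average length
        "peak_payload_mbps": peak_lines_per_sec * avg_bytes_per_line * 8 / 1e6,
    }


# Figures from the slide: 1.5 TB/day, 5.3 billion lines/day, 120,000 lines/sec at peak.
summary = traffic_summary(1.5e12, 5.3e9, 120_000)
for name, value in summary.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions the average line is a few hundred bytes and the average line rate is about half the stated peak rate.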
What we want to do
              COUNT PV,UU and others (daily)
              COUNT Service metrics (daily/hourly)
              FIND Unexpected Errors [4xx,5xx] (immediately)
              CHECK Response Times (immediately)
              SEARCH Logs when trouble occurs (hourly/immediately)


Batches and Streams
              Hadoop is for batches
              High performance batch is important
              HDFS has good performance
              Stream log writing and calculations
                  are also VERY VERY IMPORTANT
              Hybrid System:
              Stream processing + Batch
System Overview
                                                            Archive
                                                            Storage
     Web
    Servers                   Fluentd                      (scribed)
                              Cluster
                                                           Notifications
                    STREAM                                    (IRC)
                                        Fluentd
                                        Watchers
                                                              Graph
                                                              Tools

                    webhdfs                           SCHEDULED
                                            BATCH       BATCH
                                    hive
                 Hadoop Cluster    server
                                                    Shib      ShibUI
                 (HDFS, YARN)     Huahin
                                  Manager


Hadoop in livedoor 2013
              18 nodes (Master 3 + Slave 15)
               120core, 180GB RAM, 100TB HDFS
              CDH4.1.2
               NameNode HA(QJM), WebHDFS
               YARN, Hive + HiveServer1

Fluentd in livedoor 2013
              16 nodes (Deliver 4 + Worker 10 + Watcher 2)
              Fluentd (latest release / trunk)
                Ruby-based message transfer daemon
                Many plugins from rubygems.org


Hadoop/Fluentd engineer
              in livedoor 2013



                       1 person.



Processes Overview
              Log collection / Archiving
              Parse / Transform / Add flags
              Load into Hive tables
              On-demand queries
              Scheduled queries
              Stream aggregations + Notifications
Past and present
              1st gen: Fully batch (late 2011)

                Scribed + Hadoop

              2nd gen: Partially stream processing (early 2012)

                Fluentd + Hadoop

              3rd gen: Fully stream processing (late 2012)

                Fluentd + Hadoop + Graph Tools

              4th gen: New Cluster with CDH4 (early 2013)

BREAK.




1st gen: First impl.                             Archive
                                                               Storage
     Web
    Servers                                                   (scribed)
                                Scribed


                     STREAM


                                (LIBHDFS)



                                               BATCH
                 Hadoop Cluster        hive
                                      server
                    CDH3b2                             Shib
                 (Hadoop Streaming)


Shib: Hive Web Client




                  https://github.com/tagomoris/shib
1st gen: Fully batch
              Log collection / Archiving     Scribed(libhdfs)


              Parse / Transform / Add flags          Hadoop
                                                   Streaming

              Load into Hive tables
                                      HiveServer
              On-demand queries         + Shib

              Scheduled queries
              Stream aggregations + Notifications
1st gen: Fully batch
        Simplicity: easy to implement
        Shib: easy to run on-demand query
        Latency: hourly rotation + import batch
        Performance: import batch needs CPU
        Scribed: libhdfs dependency problem

2nd gen: +Fluentd
                                                                Archive
                                                                Storage
     Web
    Servers                       Fluentd                      (scribed)
                                  Cluster


                    STREAM




                   Cloudera Hoop
                                                BATCH
                 Hadoop Cluster         hive
                                       server
                    CDH3u2                              Shib
                                      Huahin
                     (Hive)           Manager


Fluentd stream processing
        out_exec_filter
              any filter program that uses STDIN/STDOUT
              compatible with Hadoop Streaming!
        out_hoop
              output plugin that writes to HDFS via Hoop
              Hoop: a.k.a. HttpFs in Hadoop 2.0.x
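The STDIN/STDOUT contract above means a filter can be an ordinary line-oriented program, the same shape as a Hadoop Streaming mapper. The sketch below is a minimal example under assumptions: the four tab-separated input fields (time, status, response time, path) and the two added flag columns are hypothetical, not the layout used at livedoor.

```python
import io
import sys

STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif")  # hypothetical list


def transform(line):
    """Turn one tab-separated record into an output record, or None to drop it."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        return None  # malformed input: skip rather than crash the stream
    time, status, response_time, path = fields
    is_error = "1" if status[:1] in ("4", "5") else "0"
    is_static = "1" if path.endswith(STATIC_SUFFIXES) else "0"
    return "\t".join([time, status, response_time, path, is_error, is_static])


def run_filter(stdin, stdout):
    """Read records line by line, write transformed records, flush each one."""
    for line in stdin:
        out = transform(line)
        if out is not None:
            stdout.write(out + "\n")
            stdout.flush()  # a streaming consumer should not wait on our buffer


# Demo with in-memory streams; as a real filter process this would be
# run_filter(sys.stdin, sys.stdout).
demo_in = io.StringIO("2013-01-20T10:00:00\t404\t1200\t/missing.html\n")
run_filter(demo_in, sys.stdout)
```

Because the function only touches the two streams it is given, the same script can be exercised in a unit test, in a shell pipeline, or as a batch mapper without changes.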
Fluentd stream processing
  [Diagram. Components shown, left to right:
   Web Servers; Fluentd deliver nodes; Fluentd worker nodes; Hoop Server; HDFS]
Huahin Manager
              REST API for:
               JobTracker (MRv1)
               ResourceManager (YARN)
               HiveServer


                  http://huahinframework.org/huahin-manager/


2nd gen: +Fluentd
              Log collection / Archiving           Fluentd


              Parse / Transform / Add flags            Fluentd

              Load into Hive tables
                                      HiveServer
              On-demand queries         + Shib

              Scheduled queries
              Stream aggregations + Notifications
2nd gen: +Fluentd
        Compatibility:
         RPC based HDFS/JobTracker Access
        Performance: import needs no CPU
        (Load Only)
        Latency: hourly rotation only
        Latency: hourly rotation for any queries
        Hoop Server: SPOF / traffic bottleneck
3rd gen: ++++++
                                                            Archive
                                                            Storage
     Web
    Servers                   Fluentd                      (scribed)
                              Cluster
                                                           Notifications
                    STREAM                                    (IRC)
                                        Fluentd
                                        Watchers
                                                              Graph
                                                              Tools

                    webhdfs                           SCHEDULED
                                            BATCH       BATCH
                 Hadoop Cluster     hive
                                   server
                    CDH3u5                          Shib      ShibUI
                                  Huahin
                     (Hive)       Manager


WebHDFS (CDH3u5 or CDH4)
      HttpFs (Hoop):  Client --HTTP--> httpfs server --Java Native--> NameNode, DataNodes
      WebHDFS:        Client --HTTP--> NameNode, DataNodes
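The difference between the two access paths shows up in how a file is created over WebHDFS: the client first asks the NameNode, which answers with a redirect, and then sends the data straight to a DataNode. The sketch below follows the two-step CREATE flow of the WebHDFS REST API using only the standard library. The host, port, and file path passed in are placeholders, and error handling is reduced to a minimum.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def webhdfs_path(hdfs_path, op, **params):
    """Build the request path for a WebHDFS operation, e.g. op=CREATE."""
    query = urlencode({"op": op, **params})
    return "/webhdfs/v1" + quote(hdfs_path) + "?" + query


def create_file(namenode_host, namenode_port, hdfs_path, data):
    """Two-step create: NameNode redirect first, then PUT the bytes to a DataNode."""
    # Step 1: ask the NameNode where to write; no file data is sent yet.
    nn = http.client.HTTPConnection(namenode_host, namenode_port)
    nn.request("PUT", webhdfs_path(hdfs_path, "CREATE", overwrite="false"))
    resp = nn.getresponse()
    resp.read()
    location = resp.getheader("Location")
    nn.close()
    if resp.status != 307 or location is None:
        raise RuntimeError(f"expected a 307 redirect, got {resp.status}")

    # Step 2: send the file content to the DataNode named in the redirect.
    target = urlsplit(location)
    dn = http.client.HTTPConnection(target.hostname, target.port)
    dn.request("PUT", target.path + "?" + target.query, body=data)
    resp = dn.getresponse()
    resp.read()
    dn.close()
    return resp.status == 201  # 201 Created on success


# Only the path builder runs here; create_file needs a live cluster.
print(webhdfs_path("/logs/2013/access.log", "CREATE", overwrite="false"))
```

With HttpFs the client would send the same kind of request to the httpfs server instead, which then talks to the cluster on the client's behalf; that single proxy is what the 2nd gen slide lists as a SPOF and traffic bottleneck.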
Fluentd online aggregation

        Semi-realtime aggregation to:
              count HTTP error responses
              calculate avg/%tiles of response time
              draw graphs immediately
        Many plugins for real time aggregation
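The aggregation described above can be sketched for one time window as follows. This is an illustration in plain Python, not the Fluentd plugins themselves: the record shape (status code, response time) is an assumption, and the percentile uses the nearest-rank method, which may differ from what the actual plugins compute.

```python
from collections import Counter


def status_class(status):
    """Map an HTTP status code to its class label, e.g. 404 -> '4xx'."""
    return f"{status // 100}xx"


def percentile(sorted_values, pct):
    """Nearest-rank percentile over an already sorted, non-empty list."""
    n = len(sorted_values)
    rank = max(1, (pct * n + 99) // 100)  # integer ceil(pct * n / 100)
    return sorted_values[rank - 1]


def aggregate(records, percentiles=(90, 95, 98, 99)):
    """Summarize one window of (status, response_time) records."""
    counts = Counter()
    times = []
    for status, response_time in records:
        counts[status_class(status)] += 1
        times.append(response_time)
    times.sort()
    result = {"counts": dict(counts)}
    if times:
        result["avg"] = sum(times) / len(times)
        for pct in percentiles:
            result[f"p{pct}"] = percentile(times, pct)
    return result


# One small window of hypothetical records.
window = [(200, 12.0), (200, 18.0), (302, 5.0), (404, 9.0), (500, 950.0)]
print(aggregate(window))
```

Running this once per window and forwarding each result to a graphing tool gives the semi-realtime view the slide describes.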

Graph Tools:
              GrowthForecast / HRForecast


         Graph drawing tools whose values are updated
              via very simple HTTP requests
         GrowthForecast: Real-time values
         HRForecast: Summarized (past) values
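A "very simple HTTP request" here can be as small as one POST per value. The sketch below builds such a request for GrowthForecast, assuming its documented update endpoint of the form /api/<service>/<section>/<graph> with a `number` form parameter. The base URL and the graph names are placeholders, and a form-encoded body is assumed to be accepted in place of the multipart form used in the project's curl examples.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_update_request(base_url, service, section, graph, number):
    """Build a POST that sets one graph's current value."""
    path = "/api/" + "/".join(quote(part, safe="") for part in (service, section, graph))
    body = urlencode({"number": int(number)}).encode("ascii")
    return Request(base_url.rstrip("/") + path, data=body, method="POST")


def post_value(base_url, service, section, graph, number, timeout=5):
    """Send the update; needs a reachable GrowthForecast instance."""
    request = build_update_request(base_url, service, section, graph, number)
    with urlopen(request, timeout=timeout) as response:
        return response.status == 200


# Build (but do not send) an update for a hypothetical 5xx counter graph.
req = build_update_request("http://localhost:5125", "blog", "http_status", "5xx_count", 12)
print(req.get_method(), req.full_url, req.data)
```

Keeping request construction separate from sending makes it easy to check what would be posted before pointing the code at a real server.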


HTTP Status/Response Time
              on GrowthForecast
   HTTP STATUS: 2XX(BLUE),3XX(GREEN),4XX(ORANGE), 5XX(RED)




   HTTP RESPONSE TIMES: AVG, [90, 95, 98, 99]PERCENTILE




                  http://kazeburo.github.com/GrowthForecast/
ShibUI




ShibUI




                 https://github.com/kazeburo/hrforecast

3rd gen: +++++++
              Log collection / Archiving           Fluentd


              Parse / Transform / Add flags            Fluentd

              Load into Hive tables
                                      HiveServer
              On-demand queries         + Shib


              Scheduled queries       ShibUI
                                                        Fluentd
              Stream aggregations + Notifications
3rd gen: +++++++
        NO SPOF: for data stream
        Real time monitoring
        Queries for services:
              Scheduled queries, Visualization
        Latency: hourly rotation for any queries
        SPOF: NameNode (VIP & DRBD is xxxx...)
4th gen: NOW
                                                            Archive
                                                            Storage
     Web
    Servers                   Fluentd                      (scribed)
                              Cluster
                                                           Notifications
                    STREAM                                    (IRC)
                                        Fluentd
                                        Watchers
                                                              Graph
                                                              Tools

                    webhdfs                           SCHEDULED
                                            BATCH       BATCH
                 Hadoop Cluster     hive
                                   server
                     CDH4                           Shib      ShibUI
                                  Huahin
                 (HDFS, YARN)     Manager


4th gen: CDH4.1.2
        NO SPOF: QJM based NameNode HA
        Performance: YARN (?)
        Latency: multiple rotation in an hour
              with hive table schema change
        Nothing remains that needs improvement!

Good parts for solo engineer:

              RPC: Loosely-coupled architecture
               High compatibility / Low maintenance cost

              Open Source
               All components are OSS

              Open knowledge
               Widely covered in blog posts and presentations



OUR DRIVER IS
                "OPENNESS"




              thanks to crouton & @kbysmnr !
Software list:

              https://ccp.cloudera.com/display/SUPPORT/Downloads
              http://fluentd.org/
              http://fluentd.org/plugin/
              https://github.com/tagomoris/fluent-agent-lite
              https://github.com/tagomoris/shib
              https://github.com/tagomoris/shibui
              http://huahinframework.org/huahin-manager/
              http://kazeburo.github.com/GrowthForecast/
              http://github.com/kazeburo/hrforecast




See also:
          Hadoop and Subsystem in livedoor (2011)
              http://www.slideshare.net/tagomoris/hadoop-and-subsystems-in-livedoor-hcj11f


          Distributed message stream processing on Fluentd
              http://www.slideshare.net/tagomoris/distributed-stream-processing-on-fluentd-fluentd


          Hive Tools in NHN Japan
              http://www.slideshare.net/tagomoris/hive-tools-in-nhn-japan-hadoopreading


          OSS based large scale log aggregation in livedoor
              http://www.slideshare.net/tagomoris/oss-nhntech


          Fluentd and WebHDFS
              http://www.slideshare.net/tagomoris/fluentd-and-webhdfs




