Log analysis with Hadoop in livedoor 2013
- 1. Log analysis system with Hadoop in livedoor 2013 Winter
2013/01/20
Hadoop Conference Japan 2013 Winter
TAGOMORI Satoshi (@tagomoris)
NHN Japan Corp.
Monday, January 21, 2013
- 2. TAGOMORI SATOSHI (@TAGOMORIS)
NHN JAPAN CORP.
WEB SERVICE BUSINESS DIVISION DEVELOPMENT DEPARTMENT 2
(IN JAN 2012, LIVEDOOR -> NHN JAPAN)
- 7. large scale web services
400+ Web Servers
5Gbps @ Aug 2009
15Gbps @ Aug 2011
20+Gbps @ Jan 2013
(direct outbound + CDN)
- 8. giant access log traffic
At Aug 2011 (HCJ2011)
From 96 servers
580GB/day
- 9. giant access log traffic
NOW (At Jan 2013 HCJ2013W)
From 320+ servers
1.5+ TB/day (raw)
5,300,000,000+ lines/day
120,000+ lines/sec (peak time)
400Mbps log traffic
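A back-of-the-envelope check of the figures above, as a rough sketch only. It assumes decimal units (1 TB = 10^12 bytes) and a uniform average line size; the slide states neither, so the derived numbers are estimates.

```python
# Figures quoted on the slide (Jan 2013).
LINES_PER_DAY = 5_300_000_000
BYTES_PER_DAY = 1.5e12        # assumption: 1.5 TB means 1.5 * 10**12 bytes
PEAK_LINES_PER_SEC = 120_000
SECONDS_PER_DAY = 24 * 60 * 60

# Average rates implied by the daily totals.
avg_lines_per_sec = LINES_PER_DAY / SECONDS_PER_DAY
avg_bytes_per_line = BYTES_PER_DAY / LINES_PER_DAY
avg_mbps = BYTES_PER_DAY * 8 / SECONDS_PER_DAY / 1e6

# How far the quoted peak sits above the daily average.
peak_to_avg_ratio = PEAK_LINES_PER_SEC / avg_lines_per_sec

# Raw payload bandwidth at peak, if peak lines have the average size.
peak_payload_mbps = PEAK_LINES_PER_SEC * avg_bytes_per_line * 8 / 1e6

print(f"average: {avg_lines_per_sec:,.0f} lines/sec, {avg_mbps:.0f} Mbps")
print(f"peak/average ratio: {peak_to_avg_ratio:.2f}")
print(f"estimated peak payload: {peak_payload_mbps:.0f} Mbps")
```

Under these assumptions the daily totals work out to roughly 61,000 lines/sec and about 139 Mbps on average, so the quoted peak is about twice the average line rate. The estimated peak payload is in the same order of magnitude as the 400Mbps quoted on the slide; the slide does not say how that figure was measured.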
- 10. What we want to do
COUNT PV,UU and others (daily)
COUNT Service metrics (daily/hourly)
FIND Unexpected Errors [4xx,5xx] (immediately)
CHECK Response Times (immediately)
SEARCH Logs when trouble occurs (hourly/immediately)
- 11. Batches and Streams
Hadoop is for batches
High performance batch is important
HDFS has good performance
Stream log writing and calculations are also VERY VERY IMPORTANT
Hybrid System:
Stream processing + Batch
- 12. System Overview
[Diagram. STREAM: Web Servers -> Fluentd Cluster -> Archive Storage (scribed) and, via webhdfs, Hadoop Cluster (HDFS, YARN); Fluentd Watchers -> Notifications (IRC) and Graph Tools. BATCH / SCHEDULED BATCH: hive server, Shib, ShibUI, Huahin Manager on the Hadoop Cluster.]
- 13. Hadoop in livedoor 2013
18 nodes (Master 3 + Slave 15)
120core, 180GB RAM, 100TB HDFS
CDH4.1.2
NameNode HA(QJM), WebHDFS
YARN, Hive + HiveServer1
- 14. Fluentd in livedoor 2013
16 nodes (Deliver 4 + Worker 10 + Watcher 2)
Fluentd (latest release / trunk)
Ruby-based message transfer daemon
Many plugins from rubygems.org
- 16. Processes Overview
Log collection / Archiving
Parse / Transform / Add flags
Load into Hive tables
On-demand queries
Scheduled queries
Stream aggregations + Notifications
- 17. Past and present
1st gen: Fully batch (late 2011)
Scribed + Hadoop
2nd gen: Partially stream processing (early 2012)
Fluentd + Hadoop
3rd gen: Fully stream processing (late 2012)
Fluentd + Hadoop + Graph Tools
4th gen: New Cluster with CDH4 (early 2013)
- 19. 1st gen: First impl.
[Diagram. STREAM: Web Servers -> Scribed -> Archive Storage (scribed) and, via libhdfs, Hadoop Cluster (CDH3b2, Hadoop Streaming). BATCH: hive server and Shib on the Hadoop Cluster.]
- 20. Shib: Hive Web Client
https://github.com/tagomoris/shib
- 21. 1st gen: Fully batch
Log collection / Archiving: Scribed (libhdfs)
Parse / Transform / Add flags: Hadoop Streaming
Load into Hive tables: HiveServer + Shib
On-demand queries: HiveServer + Shib
Scheduled queries: -
Stream aggregations + Notifications: -
- 22. 1st gen: Fully batch
Simplicity: easy to implement
Shib: easy to run on-demand query
Latency: hourly rotation + import batch
Performance: import batch needs CPU
Scribed: libhdfs dependency problem
- 23. 2nd gen: +Fluentd
[Diagram. STREAM: Web Servers -> Fluentd Cluster -> Archive Storage (scribed) and, via Cloudera Hoop, Hadoop Cluster (CDH3u2, Hive). BATCH: hive server, Shib, Huahin Manager on the Hadoop Cluster.]
- 24. Fluentd stream processing
out_exec_filter
any filter program that uses STDIN/STDOUT
compatible with Hadoop Streaming!
out_hoop
output plugin to write HDFS over Hoop
Hoop: a.k.a. HttpFs in Hadoop 2.0.x
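To illustrate the STDIN/STDOUT filter style that out_exec_filter and Hadoop Streaming share, here is a minimal sketch of a "parse / transform / add flags" filter. The three-column TSV layout (path, status, response time) and the two added flag columns are hypothetical, not the format used in the talk.

```python
import io
import sys

def classify(status):
    """Return a coarse status class such as '2xx' or '5xx'."""
    return f"{status // 100}xx"

def transform(line):
    """Append derived flag columns to one TSV record.

    Assumed (hypothetical) input columns: path, status, response time.
    Lines that do not have exactly three columns pass through unchanged.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return line.rstrip("\n")
    path, status, reqtime = fields
    is_error = "1" if int(status) >= 400 else "0"
    return "\t".join([path, status, reqtime, classify(int(status)), is_error])

def run(inp=sys.stdin, out=sys.stdout):
    """Read records line by line and emit transformed records immediately."""
    for line in inp:
        if not line.strip():
            continue
        out.write(transform(line) + "\n")
        out.flush()  # flush per record so a streaming parent is not kept waiting

# Small in-memory demonstration; a real filter script would call run()
# with no arguments so that it reads STDIN and writes STDOUT.
demo_out = io.StringIO()
run(io.StringIO("/index\t200\t1500\n/api\t503\t90000\n"), demo_out)
```

Because the function only reads lines and writes lines, the same script shape can be driven by either a streaming daemon or a batch job, which is the compatibility the slide points to.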
- 25. Fluentd stream processing
[Diagram: Web Servers -> Fluentd deliver nodes (3 shown) -> Fluentd worker nodes (6 shown) -> Hoop Server -> HDFS]
- 26. Huahin Manager
REST API for:
JobTracker (MRv1)
ResourceManager (YARN)
HiveServer
http://huahinframework.org/huahin-manager/
- 27. 2nd gen: +Fluentd
Log collection / Archiving: Fluentd
Parse / Transform / Add flags: Fluentd
Load into Hive tables: HiveServer + Shib
On-demand queries: HiveServer + Shib
Scheduled queries: -
Stream aggregations + Notifications: -
- 28. 2nd gen: +Fluentd
Compatibility: RPC-based HDFS/JobTracker access
Performance: import needs no CPU (load only)
Latency: hourly rotation only
Latency: hourly rotation for any queries
Hoop Server: SPOF / traffic bottleneck
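The "load only" import with hourly rotation can be pictured as one Hive LOAD statement per rotated file. The sketch below only builds the statement text; the table name, partition columns (service, yyyymmdd, hh), and HDFS path are hypothetical and not taken from the talk.

```python
from datetime import datetime

def load_statement(table, service, hour, hdfs_path):
    """Build a HiveQL LOAD statement for one hourly log file.

    LOAD DATA INPATH moves an existing HDFS file into the table's
    partition directory, so no parsing job runs at import time.
    The partition layout used here is an assumption for illustration.
    """
    partition = (
        f"service='{service}', "
        f"yyyymmdd='{hour:%Y%m%d}', "
        f"hh='{hour:%H}'"
    )
    return (
        f"LOAD DATA INPATH '{hdfs_path}' "
        f"INTO TABLE {table} PARTITION ({partition})"
    )

statement = load_statement(
    table="access_log",
    service="blog",
    hour=datetime(2013, 1, 20, 9),
    hdfs_path="/logs/blog/2013012009.log",
)
print(statement)
```

With this shape, data becomes queryable only after the hour's file is closed and loaded, which matches the "hourly rotation for any queries" latency noted above.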
- 29. 3rd gen: ++++++
[Diagram. STREAM: Web Servers -> Fluentd Cluster -> Archive Storage (scribed) and, via webhdfs, Hadoop Cluster (CDH3u5, Hive); Fluentd Watchers -> Notifications (IRC) and Graph Tools. BATCH / SCHEDULED BATCH: hive server, Shib, ShibUI, Huahin Manager on the Hadoop Cluster.]
- 30. WebHDFS (CDH3u5 or CDH4)
[Diagram, two panels. HttpFs (Hoop): Client -> (HTTP) -> httpfs server -> (Java native) -> NameNode and DataNodes. WebHDFS: Client -> (HTTP) -> NameNode and DataNodes directly.]
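As a sketch of what a WebHDFS client sends, the helper below builds REST URLs of the form `/webhdfs/v1/<path>?op=...`. The host name, port, and file path are hypothetical; 50070 is used here because it was the usual NameNode HTTP port in Hadoop releases of that period. No request is actually sent.

```python
from urllib.parse import quote, urlencode

def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for an absolute HDFS path."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"

# Writing a file is a two-step exchange:
#   1. PUT to the NameNode with op=CREATE and no body; the NameNode answers
#      with a redirect whose Location header points at a DataNode.
#   2. PUT the file content to that DataNode URL.
# Appending works the same way with POST and op=APPEND.
create_url = webhdfs_url(
    "namenode.example", 50070, "/logs/access.log", "CREATE", overwrite="false"
)
append_url = webhdfs_url("namenode.example", 50070, "/logs/access.log", "APPEND")

print(create_url)
print(append_url)
```

Because the second step goes straight to a DataNode, the data path is spread across the cluster, whereas with HttpFs every byte passes through the single httpfs server shown in the first panel.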
- 31. Fluentd online aggregation
Semi-realtime aggregation to:
count HTTP error responses
calculate avg/percentiles of response time
draw graphs immediately
Many plugins for real time aggregation
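The aggregation described above can be sketched in a few lines. This is not how the Fluentd plugins are implemented; it is a plain illustration of counting status classes and computing the average and nearest-rank percentiles over one time window, with a hypothetical record shape of (status, response_time).

```python
from collections import Counter

def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list (integer pct)."""
    rank = (pct * len(sorted_values) + 99) // 100  # ceil(pct * n / 100)
    return sorted_values[max(rank, 1) - 1]

def aggregate(records, percentiles=(90, 95, 98, 99)):
    """Summarize one window of (status, response_time) records."""
    counts = Counter()
    times = []
    for status, response_time in records:
        counts[f"{status // 100}xx"] += 1
        times.append(response_time)
    times.sort()
    result = {"counts": dict(counts)}
    if times:
        result["avg"] = sum(times) / len(times)
        for pct in percentiles:
            result[f"p{pct}"] = percentile(times, pct)
    return result

# One window of 100 requests with response times 1..100 (arbitrary units).
window = [(200, t) for t in range(1, 98)] + [(404, 98), (404, 99), (500, 100)]
summary = aggregate(window)
print(summary)
```

Running one such summary per short window (for example, every minute) and posting the numbers to a graph tool gives the "draw graphs immediately" behaviour without waiting for a batch job.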
- 32. Graph Tools: GrowthForecast / HRForecast
Graph drawing tools whose values are updated with a very simple HTTP request
GrowthForecast: Real-time values
HRForecast: Summarized (past) values
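A sketch of the "very simple HTTP request": to the best of my knowledge GrowthForecast accepts a POST to `/api/<service>/<section>/<graph>` with a `number` form field, but check the project documentation before relying on it. The base URL and the service, section, and graph names below are hypothetical. The request object is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

def growthforecast_request(base_url, service, section, graph, number, mode="gauge"):
    """Build (but do not send) a POST request that updates one graph value."""
    url = f"{base_url}/api/{service}/{section}/{graph}"
    body = urlencode({"number": int(number), "mode": mode}).encode("ascii")
    return Request(url, data=body, method="POST")

# Hypothetical example: report 12 HTTP 5xx responses for a "blog" service.
req = growthforecast_request(
    "http://localhost:5125", "blog", "httpstatus", "5xx_count", 12
)
print(req.get_method(), req.full_url, req.data)
```

Passing the request to `urllib.request.urlopen(req)` would send it; a graph is identified by the three path segments alone.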
- 33. HTTP Status/Response Time on GrowthForecast
HTTP STATUS: 2XX(BLUE),3XX(GREEN),4XX(ORANGE), 5XX(RED)
HTTP RESPONSE TIMES: AVG, [90, 95, 98, 99]PERCENTILE
http://kazeburo.github.com/GrowthForecast/
- 35. ShibUI
https://github.com/kazeburo/hrforecast
- 36. 3rd gen: +++++++
Log collection / Archiving: Fluentd
Parse / Transform / Add flags: Fluentd
Load into Hive tables: HiveServer + Shib
On-demand queries: HiveServer + Shib
Scheduled queries: ShibUI
Stream aggregations + Notifications: Fluentd
- 37. 3rd gen: +++++++
NO SPOF: for data stream
Real time monitoring
Queries for services:
Scheduled queries, Visualization
Latency: hourly rotation for any queries
SPOF: NameNode (VIP & DRBD is xxxx...)
- 38. 4th gen: NOW
[Diagram. STREAM: Web Servers -> Fluentd Cluster -> Archive Storage (scribed) and, via webhdfs, Hadoop Cluster (CDH4; HDFS, YARN); Fluentd Watchers -> Notifications (IRC) and Graph Tools. BATCH / SCHEDULED BATCH: hive server, Shib, ShibUI, Huahin Manager on the Hadoop Cluster.]
- 39. 4th gen: CDH4.1.2
NO SPOF: QJM based NameNode HA
Performance: YARN (?)
Latency: multiple rotations per hour, with a Hive table schema change
Nothing left that must be improved!
- 40. Good parts for solo engineer:
RPC: Loosely-coupled architecture
High compatibility / Low maintenance cost
Open Source
All components are OSS
Open knowledge
Well covered in blogs and presentations
- 41. OUR DRIVER IS "OPENNESS"
thanks to crouton & @kbysmnr !
- 42. Software list:
https://ccp.cloudera.com/display/SUPPORT/Downloads
http://fluentd.org/
http://fluentd.org/plugin/
https://github.com/tagomoris/fluent-agent-lite
https://github.com/tagomoris/shib
https://github.com/tagomoris/shibui
http://huahinframework.org/huahin-manager/
http://kazeburo.github.com/GrowthForecast/
http://github.com/kazeburo/hrforecast
- 43. See also:
Hadoop and Subsystem in livedoor (2011)
http://www.slideshare.net/tagomoris/hadoop-and-subsystems-in-livedoor-hcj11f
Distributed message stream processing on Fluentd
http://www.slideshare.net/tagomoris/distributed-stream-processing-on-fluentd-fluentd
Hive Tools in NHN Japan
http://www.slideshare.net/tagomoris/hive-tools-in-nhn-japan-hadoopreading
OSS based large scale log aggregation in livedoor
http://www.slideshare.net/tagomoris/oss-nhntech
Fluentd and WebHDFS
http://www.slideshare.net/tagomoris/fluentd-and-webhdfs