An Empirical Study of Flash Crowd Dynamics in a P2P-based Live Video Streaming System
Bo Li, Gabriel Y. Keung, Susu Xie, Fangming Liu, Ye Sun, and Hao Yin
Email: [email_address]
Hong Kong University of Science & Technology
Dec 2, 2008 @ IEEE GLOBECOM, New Orleans
Overview: Internet Video Streaming
Enables video distribution from any place to anywhere in the world, in any format
Cont. Recently, significant deployment of Peer-to-Peer (P2P) technology for Internet live video streaming
Protocol design: Overcast, CoopNet, SplitStream, Bullet, etc.
Real deployment: ESM, CoolStreaming, PPLive, etc.
Key: requires minimum support from the infrastructure
Greater demand also generates more resources: each peer not only downloads the video content but also uploads it to other participants
Easy to deploy, good scalability
Challenges
Real-time constraints: requiring timely and sustained streaming delivery to all participating peers
Performance-demanding: involving bandwidth requirements of hundreds of kilobits per second, and even more for higher-quality video
Large-scale and extreme peer dynamics: tens of thousands of users simultaneously participating in the streaming, joining and leaving at will, especially under flash crowd
Motivation
Flash crowd: a large increase in the number of users joining the streaming in a short period of time (e.g., during the initial few minutes of a live broadcast program)
It is difficult to quickly accommodate new peers within a stringent time constraint without significantly impacting the video streaming quality of existing and newly arrived peers
Different from file sharing
Challenge: large-scale & extreme peer dynamics
Current P2P live streaming systems still suffer from potentially long startup delays & unstable streaming quality, especially under realistic challenging scenarios such as flash crowd
Focus (Cont.)
Little prior study on the detailed dynamics of P2P live streaming systems during flash crowd and its impacts
E.g., Hei et al.'s measurement on PPLive: the dynamics of the user population during the annual Spring Festival Gala on Chinese New Year
How to capture the various effects of flash crowd in P2P live streaming systems?
What are the impacts of flash crowd on user experience & behavior, and on system scale?
What are the rationales behind them?
Outline
System Architecture
Measurement Methodology
Important Results: Short Sessions under Flash Crowd; User Retry Behavior under Flash Crowd; System Scalability under Flash Crowd
Summary
Some Facts of the CoolStreaming System
CoolStreaming = Cooperative Overlay Streaming, first released in 2004
Roxbeam Inc. received a USD 30M investment; the system currently runs through YahooBB, the largest video streaming portal in Japan
400,000 Google entries (keyword: Coolstreaming)
150,000 peak-time online users
20,000 average online users
2,000,000 downloads
CoolStreaming System Architecture
Membership manager: maintains a partial view of the overlay via gossip
Partnership manager: establishes & maintains TCP connections (partnerships) with other nodes; exchanges data availability via Buffer Maps (BM)
Stream manager: provides stream data to the local player; decides where and how to retrieve stream data (hybrid push & pull)
(Diagram: Stream Manager, Partner Manager, Member Manager, exchanging BMs and segments)
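To make the three-manager decomposition concrete, here is a minimal sketch of how such a node could be structured. The class names, fields, and parameters are assumptions made for illustration, not the actual CoolStreaming code.

```python
# Illustrative sketch (not the actual CoolStreaming code) of how the three
# managers could be decomposed on a node. All names, fields, and parameters
# are assumptions made for the example.

import random

class MembershipManager:
    """Keeps a partial view of the overlay, refreshed via gossip."""
    def __init__(self, view_size=30):
        self.view_size = view_size
        self.partial_view = set()          # node addresses learned via gossip

    def merge(self, gossiped_addresses):
        # Merge addresses learned from a gossip message, keeping a bounded sample.
        self.partial_view.update(gossiped_addresses)
        if len(self.partial_view) > self.view_size:
            self.partial_view = set(random.sample(sorted(self.partial_view),
                                                  self.view_size))

class PartnershipManager:
    """Maintains TCP partnerships and the Buffer Maps (BMs) they report."""
    def __init__(self):
        self.partners = {}                 # partner address -> last BM received

    def on_buffer_map(self, partner, buffer_map):
        self.partners[partner] = set(buffer_map)

class StreamManager:
    """Decides where and how to retrieve stream data (hybrid push & pull)."""
    def __init__(self, partnership_manager):
        self.partnership = partnership_manager
        self.local_buffer = {}             # segment id -> payload bytes
```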
Mesh-based (Data-driven) Approaches
No explicit structures are constructed or maintained, e.g., Coolstreaming, PPLive
Data flow is guided by the availability of data
The video stream is divided into segments of uniform length; the availability of segments in a peer's buffer is represented by a buffer map (BM)
Peers periodically exchange data availability information with a set of partners (a partial view of the overlay) and retrieve currently unavailable data from each other
A segment scheduling algorithm determines which segments are to be fetched from which partners
Overhead & delay: peers need to explore content availability with one another, usually via a gossip protocol
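The segment-scheduling step can be illustrated with a simplified pull scheduler that assigns missing segments to partners based on their buffer maps. This is a sketch of the general data-driven idea (rarest-first with simple load spreading), not CoolStreaming's exact algorithm; the data layout is assumed.

```python
# A simplified pull scheduler, assuming each partner's Buffer Map (BM) is a
# set of segment ids it holds. The real scheduler also weighs bandwidth and
# playback deadlines; this sketch only shows the rarest-first idea.

def schedule_segments(window, local_bm, partner_bms):
    """Assign each missing segment in `window` to one partner that has it.

    window      -- iterable of segment ids currently of interest
    local_bm    -- set of segment ids already held locally
    partner_bms -- dict: partner id -> set of segment ids (its BM)
    Returns dict: partner id -> list of segment ids to request.
    """
    requests = {p: [] for p in partner_bms}
    missing = [s for s in window if s not in local_bm]
    # Rarest segments first: fewer holders means higher scheduling priority.
    missing.sort(key=lambda s: sum(s in bm for bm in partner_bms.values()))
    for seg in missing:
        holders = [p for p, bm in partner_bms.items() if seg in bm]
        if not holders:
            continue  # nobody has it yet; retry in the next round
        # Spread load: pick the holder with the fewest pending requests.
        target = min(holders, key=lambda p: len(requests[p]))
        requests[target].append(seg)
    return requests

# Example: segment 7 is rare (only partner 'B' has it), so it gets priority.
print(schedule_segments(range(5, 10), {5}, {"A": {5, 6, 8}, "B": {6, 7, 8, 9}}))
```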
Measurement Methodology
3 types of status report: QoS report (% of video data missing the playback deadline), traffic report, partner report
4 events of each session: join event, start-subscription event, media-player-ready event (the peer has received sufficient data to start playing), leave event
Each user reports its activities & internal status to the log server periodically, using HTTP; the peer log is compacted into the parameter part of the URL string
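As an illustration of the reporting mechanism, a status report can be compacted into the query part of an HTTP URL sent to the log server. The field names and endpoint below are invented for the sketch; the real system's report format is not specified here.

```python
# Hypothetical illustration of the reporting idea: a peer's status report is
# compacted into the query string of an HTTP URL sent to the log server.

from urllib.parse import urlencode
import time

def build_qos_report_url(log_server, peer_id, session_id, miss_ratio):
    params = {
        "type": "qos",                 # one of: qos, traffic, partner
        "peer": peer_id,
        "session": session_id,
        "ts": int(time.time()),        # report timestamp
        "miss": f"{miss_ratio:.3f}",   # fraction of data missing its playback deadline
    }
    return f"http://{log_server}/report?{urlencode(params)}"

print(build_qos_report_url("logs.example.com", "peer42", "s1", 0.017))
```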
Log & Data Collection
Real-world traces obtained from a live event broadcast on Japan Yahoo using the CoolStreaming system
A sports channel on Sept. 27, 2006 (24 hours); live baseball game broadcast at 18:00
Stream bit rate is 768 Kbps
24 dedicated servers with 100 Mbps connections
How to capture flash crowd effects? Two key measures
Short session distribution: counts sessions of users who either fail to start viewing a program or whose service is disrupted during flash crowd; session duration is the time interval between a user joining and leaving the system
User retry behavior: to cope with the service disruption often observed during flash crowd, each peer can re-connect (retry) to the program
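A minimal sketch of the short-session measure, assuming the trace provides (join, leave) timestamps per session; the 120 s and 240 s thresholds are the ones used in the study.

```python
# Sketch of the short-session measure: session duration is the interval
# between a user's join and leave events; sessions shorter than a threshold
# (120 s or 240 s in the study) are counted as short.

def count_short_sessions(sessions, threshold=120):
    """sessions: list of (join_time, leave_time) pairs in seconds."""
    return sum(1 for join, leave in sessions if leave - join <= threshold)

trace = [(0, 90), (10, 400), (20, 110), (30, 3600)]
print(count_short_sessions(trace, 120))   # -> 2
print(count_short_sessions(trace, 240))   # -> 2
```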
Short Sessions under Flash Crowd
Filter out normal sessions (i.e., users who successfully join the program); focus on short sessions with duration <= 120 s and <= 240 s
The number of short sessions increases significantly at around 18:00, when the flash crowd occurs with a large number of peers joining the live broadcast program
Strong Correlation Between the Number of Short Sessions and Peer Joining Rate
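The correlation shown in the figure can be quantified by binning the trace per minute and computing a Pearson coefficient between the short-session count and the join rate. The sketch below uses made-up numbers to show the computation, not the paper's data.

```python
# Bin session events per minute and compute the Pearson coefficient between
# the short-session count and the join rate.

from statistics import mean, stdev

def pearson(xs, ys):
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    n = len(xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)

# Per-minute bins (illustrative): joins and short sessions rise and fall together.
joins_per_min = [12, 15, 80, 200, 180, 60, 20]
short_per_min = [1, 2, 25, 70, 66, 18, 4]
print(f"correlation = {pearson(joins_per_min, short_per_min):.2f}")
```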
What are the rationales behind these observations? Relevant factors:
User client connection faults
Insufficient uploading capacity from at least one of the parents
Poor sustainable bandwidth at the beginning of the stream subscription
Long waiting time (timeout) for accumulating sufficient video content in the playback buffer
Newly arriving peers do not have adequate content to share with others, so initially they can only consume the uploading capacity of existing peers
With only partial knowledge (gossip), the delay in gathering enough upload bandwidth among peers and the heavy resource competition could be the fundamental bottleneck
Approximate User Impatient Time
In the face of poor playback continuity, users either reconnect or opt to leave
Compare the total downloaded bytes of a session with the expected total playback video bytes for the session duration
Extract sessions with insufficient downloaded bytes
The average user impatient time is between 60 s and 120 s
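A sketch of how the impatient-time approximation could be computed, assuming each session record carries its duration and total downloaded bytes; the 50% starvation threshold is an illustrative choice, not a value from the paper.

```python
# A session whose total download is well below what smooth playback would
# require (bit rate x duration) is treated as one where the user gave up.

STREAM_BPS = 768_000  # stream bit rate: 768 Kbps

def is_starved_session(duration_s, downloaded_bytes, fraction=0.5):
    """True if the session downloaded less than `fraction` of the bytes
    needed to play continuously for `duration_s` seconds."""
    expected_bytes = STREAM_BPS / 8 * duration_s
    return downloaded_bytes < fraction * expected_bytes

def approx_impatient_time(sessions, fraction=0.5):
    """Average duration of starved sessions, i.e., how long users wait before leaving.
    sessions: list of (duration_s, downloaded_bytes)."""
    starved = [d for d, b in sessions if is_starved_session(d, b, fraction)]
    return sum(starved) / len(starved) if starved else None

trace = [(90, 1_000_000), (75, 800_000), (1800, 170_000_000)]
print(approx_impatient_time(trace))   # the long, well-fed session is excluded
```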
User Retry Behavior under Flash Crowd
Retry rate: the number of peers per unit time that opt to re-join the overlay with the same IP address and port
Users could have tried many times to successfully start a video session
Again shows that flash crowd has a significant impact on the initial joining phase
User perspective: playback could be restored
System perspective: retries amplify the join rate
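A sketch of the retry-rate measure, assuming join events carry a timestamp and the peer's (IP, port); repeated joins from the same (IP, port) are counted as retries per time bin.

```python
# Join events carrying the same (IP, port) that recur in the trace are
# counted as retries, binned per unit time.

from collections import defaultdict

def retry_rate(join_events, bin_seconds=60):
    """join_events: list of (timestamp, ip, port).
    Returns dict: bin index -> number of repeat joins (retries) in that bin."""
    seen = set()
    retries = defaultdict(int)
    for ts, ip, port in sorted(join_events):
        key = (ip, port)
        if key in seen:
            retries[int(ts // bin_seconds)] += 1
        seen.add(key)
    return dict(retries)

events = [(10, "1.2.3.4", 5000), (70, "1.2.3.4", 5000), (75, "5.6.7.8", 6000),
          (95, "1.2.3.4", 5000)]
print(retry_rate(events))   # -> {1: 2}: two retries by the same peer in bin 1
```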
System Scalability under Flash Crowd
Media player ready: the peer has received sufficient data to start playing, i.e., has successfully joined
The gap between joining and becoming media-player-ready illustrates the "catch-up process"
The media-player-ready rate picks up when the flash crowd occurs and increases steadily; however, the ratio between the ready rate and the join rate stays <= 0.67
This implies that the system can accommodate a sudden surge of user arrivals (flash crowd), but only up to some maximum limit
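A sketch of the scalability indicator: the per-bin ratio of the media-player-ready rate to the join rate. The data and bin size are illustrative; the 0.67 ceiling above comes from the measured trace, not from this example.

```python
# Ratio of the media-player-ready rate to the join rate per time bin. A ratio
# staying below 1 during a flash crowd means the system absorbs new arrivals
# only up to some limit.

from collections import Counter

def rate_ratio(join_times, ready_times, bin_seconds=60):
    joins = Counter(int(t // bin_seconds) for t in join_times)
    ready = Counter(int(t // bin_seconds) for t in ready_times)
    return {b: ready[b] / joins[b] for b in joins if joins[b] > 0}

joins = [5, 12, 20, 33, 41, 50, 65, 70]    # 6 joins in bin 0, 2 in bin 1
readys = [30, 48, 55, 62, 80]              # 3 ready in bin 0, 2 in bin 1
print(rate_ratio(joins, readys))           # -> {0: 0.5, 1: 1.0}
```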
Media Player Ready Time under Different Time Periods
Considerably longer during periods when the peer join rate is higher
Scale-Time Relationship
System perspective: though there could be enough aggregate resources brought by newly arriving peers, they cannot be utilized immediately; it takes time for the system to exploit such resources, i.e., newly arriving peers (with only a partial view of the overlay) need to find & consume existing resources to obtain adequate content for startup before contributing to others
User perspective: this causes long startup delays & disrupted streaming (hence short sessions, retries, impatience)
Future work: how does system scale relate to startup delay, continuity, and the amount of initial buffering?
Summary
Based on real-world measurements, we capture flash crowd effects
The system can scale up to a limit during the flash crowd
Strong correlation between the number of short sessions and the joining rate
User behavior during flash crowd is best captured by the number of short sessions, the number of retries, and the impatient time
Relevant rationales behind these findings
Future work
Modeling to quantify and analyze flash crowd effects
Correlation among initial system capacity, the user joining rate/startup delay, and system scale? Intuitively, a larger initial system size can tolerate a higher joining rate
Challenge: how to formulate the factors and performance gaps relevant to partial knowledge (gossip)?
Based on the above study, and perhaps more importantly for practical systems: how can servers help alleviate the flash crowd problem, i.e., shorten users' startup delays and boost system scaling?
Commercial systems have utilized self-deployed servers or CDNs
Coolstreaming on Japan Yahoo used 24 servers in different regions, allowing users to join a program on the order of seconds
PPLive is utilizing CDN services
On the measurement side, examine what real-world systems do and experience
On the technical side, derive the relationship between the expected number of viewers and the amount of server provisioning, along with their joining behaviors
Further, how servers should be geographically distributed
References
"Inside the New Coolstreaming: Principles, Measurements and Performance Implications," B. Li, S. Xie, Y. Qu, Y. Keung, C. Lin, J. Liu, and X. Zhang, in Proc. of IEEE INFOCOM, Apr. 2008.
"Coolstreaming: Design, Theory and Practice," S. Xie, B. Li, G. Y. Keung, and X. Zhang, in IEEE Transactions on Multimedia, 9(8): 1661-1671, December 2007.
"An Empirical Study of the Coolstreaming+ System," B. Li, S. Xie, G. Y. Keung, J. Liu, I. Stoica, H. Zhang, and X. Zhang, in IEEE Journal on Selected Areas in Communications, 25(9): 1-13, December 2007.
Q&A. Thanks!
Additional Info & Results
Comparison with the first release
The initial system adopted a simple pull-based scheme: content availability information exchanged via buffer maps, with per-block overhead and longer delay in retrieving the video content
The new release implements a hybrid pull and push mechanism: blocks are pushed by a parent node to a child node except for the first block, lowering the overhead associated with each video block transmission, reducing the initial delay, and increasing video playback quality
A multiple sub-stream scheme is implemented, enabling multi-source and multi-path delivery of video streams
The gossip protocol was enhanced to handle the push function
Buffer management and scheduling schemes were re-designed to handle the dissemination of multiple sub-streams
Gossip-based Dissemination
Gossip protocol (similar in spirit to the peer exchange used in BitTorrent)
Iteration: nodes send messages to random sets of nodes; each node does the same in every round; messages gradually flood the whole overlay
Pros: simple, robust to random failures, decentralized
Cons: latency trade-off
Related to Coolstreaming: updated membership content, multiple sub-streams
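A toy gossip-round simulation illustrating why dissemination reaches the whole overlay in roughly O(log N) rounds; the fan-out, overlay size, and push-only model are illustrative simplifications, not the protocol used in CoolStreaming.

```python
# Minimal gossip-round simulation: each round, every informed node pushes to
# a few random peers until the whole overlay has the message.

import random

def gossip_rounds(num_nodes=1000, fanout=3, seed=1):
    random.seed(seed)
    informed = {0}                       # node 0 starts with the message
    rounds = 0
    while len(informed) < num_nodes:
        newly = set()
        for node in informed:
            newly.update(random.sample(range(num_nodes), fanout))
        informed |= newly
        rounds += 1
    return rounds

print(gossip_rounds())   # typically O(log N) rounds to reach everyone
```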
Multiple Sub-streams
The video stream is divided into blocks; each block is assigned a sequence number
(Figure: an example of stream decomposition)
Adopts the gossip concept from P2P file-sharing applications
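A sketch of the decomposition, assuming block i is mapped to sub-stream i mod K (a common round-robin scheme); K = 2 matches the two-sub-stream example used later in the slides.

```python
# Sketch of sub-stream decomposition: block i belongs to sub-stream i mod K.
# K and the block range are illustrative values.

K = 2                      # number of sub-streams (the example slide uses two)

def substream_of(seq_no, k=K):
    return seq_no % k

blocks = range(10)
for s in range(K):
    print(f"sub-stream {s}:", [b for b in blocks if substream_of(b) == s])
# sub-stream 0: [0, 2, 4, 6, 8]
# sub-stream 1: [1, 3, 5, 7, 9]
```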
Buffering
Synchronization buffer: received blocks are first put into the synchronization buffer of the corresponding sub-stream; blocks with continuous sequence numbers are then combined
Cache buffer: combined blocks are stored in the cache buffer
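A minimal sketch of the two-level buffering described above: per-sub-stream synchronization buffers feed a cache buffer whenever the next sequence numbers are available. The data structures are illustrative.

```python
# Blocks land in a per-sub-stream synchronization buffer; runs of consecutive
# sequence numbers are combined into the cache buffer.

class Buffers:
    def __init__(self, num_substreams=2):
        self.sync = [dict() for _ in range(num_substreams)]  # seq -> block
        self.cache = []                                      # ordered blocks
        self.next_seq = 0                                    # next seq to combine

    def receive(self, seq, block):
        self.sync[seq % len(self.sync)][seq] = block
        self._combine()

    def _combine(self):
        # Pull blocks out of the sync buffers as long as the sequence is unbroken.
        while True:
            sub = self.sync[self.next_seq % len(self.sync)]
            if self.next_seq not in sub:
                break
            self.cache.append(sub.pop(self.next_seq))
            self.next_seq += 1

b = Buffers()
for seq in [1, 0, 3, 2]:             # out-of-order arrival across sub-streams
    b.receive(seq, f"blk{seq}")
print(b.cache)                       # ['blk0', 'blk1', 'blk2', 'blk3']
```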
Comparison with the 1st release (II)
Comparison with the 1st release (III)
Parent-children and partnership
Partners are connected via TCP connections
Parents supply video streams to their children over TCP connections
System Dynamics
Peer Join and Adaptation
Stream bit rate normalized to ONE; two sub-streams
The weight of a node is its outgoing bandwidth
Node E is a new arrival
Peer Adaptation
Peer Adaptation in Coolstreaming
Inequality (1) monitors the buffer status of the received sub-streams at node A; if it does not hold, at least one sub-stream is delayed beyond the threshold value T_s
Inequality (2) monitors the buffer status of the parents of node A; if it does not hold, the parent node is lagging considerably behind at least one of the partners (which is currently not a parent of node A) in the number of blocks received
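The two checks can be sketched as follows, under the assumption that sub-stream progress is compared against the most advanced sub-stream and parent progress against the partners' block counts; the exact inequalities and the value of T_s are in the paper, so this only mirrors the prose.

```python
# A sketch mirroring the prose above: check (1) whether any received
# sub-stream at node A lags beyond a threshold T_s, and check (2) whether a
# parent lags behind some non-parent partner by more than T_s in blocks
# received. Values and names are illustrative.

T_S = 20   # threshold in blocks (illustrative)

def substreams_in_sync(latest_seq_per_substream, t_s=T_S):
    """Check (1): no sub-stream may be delayed more than t_s blocks
    behind the most advanced one."""
    return max(latest_seq_per_substream) - min(latest_seq_per_substream) <= t_s

def parent_keeping_up(parent_progress, partner_progress, t_s=T_S):
    """Check (2): the parent should not lag any non-parent partner by more
    than t_s blocks; otherwise node A may switch to a better partner."""
    return all(p - parent_progress <= t_s for p in partner_progress)

print(substreams_in_sync([100, 96]))          # True: sub-streams roughly in sync
print(parent_keeping_up(80, [85, 120, 90]))   # False: some partner is far ahead
```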
User Types Distribution
Contribution Index
Conceptual Overlay Topology Source node  “O” Super-peers {A, B, C, D} Moderate-peers   {a} Casual-peers   {b, c, d}
Event Distributions
Media Player Ready Time under Different Time Periods
Session Distribution

Editor's Notes

1. Peer-assisted live video streaming is another attractive service in the Internet. Take PPLive, one of the most popular peer-assisted streaming systems nowadays, as an example: at the end of 2005, it had 20 million downloads and 1 million independent viewers per day [2]. According to [4], it supported over 200,000 concurrent users at bit rates of 400-800 Kbps during the 2006 Spring Festival Gala on Chinese New Year on January 28, 2006. In 2007, the number of concurrent users for the most popular PPLive session rose to 1.5 million [3]. This corresponds to an aggregate bit rate in the vicinity of 600 Gbps, or 540 TB transferred over the 2-hour event. This type of service faces more challenges than file sharing: First,
2. Different from file sharing: either the peer joining process is stretched over a longer period of time, or the application itself can tolerate much longer delays. The dynamics of the user population during the annual Spring Festival Gala on Chinese New Year.
3. In the face of frequent peer churn, the maintenance of streaming tree(s) in tree-based approaches is still challenging, and the recovery of tree(s) incurs extra cost. Recently, mesh-based approaches (also referred to as data-driven approaches) have been adopted in many large-scale peer-assisted live video streaming systems, such as Coolstreaming [32], PPLive, UUSee, etc. In contrast with tree-based approaches, mesh-based overlay designs do not construct and maintain an explicit structure for delivering data. Generally, a mesh-based streaming system has a tracker to keep track of peers in the video session. A peer may download/upload segments from/to multiple partners simultaneously.
4. Over 20% of the users tried one or two times in order to successfully start a video session. Hence, a flash crowd has a significant impact on the initial joining phase in a P2P streaming system.
5. It takes a longer time for a newly joined peer to obtain the video stream.
6. Besides the above, there are also open issues related to multiple channels and ISPs, which are covered in my report but not shown here due to the time limit.