Scalable architecture




                By Adam Brodziak
                Global Sports Media b.v.
Abstract

   Adam Brodziak


An overview of modern web-based application
architecture: from hardware infrastructure,
through PHP/SQL code, to HTML/CSS markup
distribution. All of it spiced up with caching,
load balancing and a CDN.
Who is this guy?

   Lead developer at Global Sports Media
       GSM collects and processes sports data
       GSM owns soccerway.com portal
   Linux user
   Interested in frameworks, design patterns
   Semantic Web enthusiast
   Football (soccer) fan
Topics

   The Challenge
   Infrastructure
   Code
   Cache
   CDN
Raw numbers

   7 million visits / month
   52 million pageviews / month
   1 billion requests / month
   6 TB of traffic / month
   300k users at peak time
   Quite a few clients using the same hardware
Not so much, but...

                     700 leagues
                     Livescores
                     Game events
                     Match statistics
                     Rankings
                     Editorials
Traffic growth
The Challenge

   Loads of data to process
       Scores
       Events
       Stats
   In real-time (livescores)
   Growing number of visitors
   13K hits/sec at peak time
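To put the monthly figures next to the peak figure, here is a back-of-the-envelope calculation, sketched in Python. It assumes a 30-day month, decimal terabytes, and that "hits" and "requests" count the same thing. None of this is stated on the slides, so treat the results as rough orders of magnitude.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the slides.
# Assumptions (not from the talk): 30-day month, 1 TB = 10**12 bytes,
# and "hits" at peak are the same unit as "requests" per month.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

requests_per_month = 1_000_000_000
pageviews_per_month = 52_000_000
traffic_bytes_per_month = 6 * 10**12
peak_hits_per_sec = 13_000

avg_requests_per_sec = requests_per_month / SECONDS_PER_MONTH
peak_to_average = peak_hits_per_sec / avg_requests_per_sec
requests_per_pageview = requests_per_month / pageviews_per_month
avg_bytes_per_request = traffic_bytes_per_month / requests_per_month

print(f"average requests/sec:  {avg_requests_per_sec:.0f}")
print(f"peak / average ratio:  {peak_to_average:.1f}")
print(f"requests per pageview: {requests_per_pageview:.1f}")
print(f"bytes per request:     {avg_bytes_per_request:.0f}")
```

Under those assumptions the average load is a few hundred requests per second, while the peak is more than thirty times higher. That gap is why capacity has to be planned for match time, not for the monthly average.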
10 servers to run it all
Topics

   The Challenge
   Infrastructure
   Code
   Cache
   CDN
It starts with one
Load balancing
Load balancing caveats

   Don't rely on the local filesystem
       Temporary files, sessions, logs
   Don't assume an exclusive/single cache
       APC, Zend Cache
   Use distributed session storage
       Memcache, database
   Encapsulate the above
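One way to read the last caveat: application code talks to a small storage interface and never touches local files directly, so the backend can be swapped for memcached or a database. The sketch below is Python for brevity (the same shape works in PHP). The names `SessionStore`, `SharedDictStore` and `WebNode` are hypothetical, and a plain dict stands in for the shared backend.

```python
from abc import ABC, abstractmethod


class SessionStore(ABC):
    """Storage interface the application codes against (hypothetical name)."""

    @abstractmethod
    def load(self, session_id):
        """Return the session data as a dict (empty if unknown)."""

    @abstractmethod
    def save(self, session_id, data):
        """Persist the session data."""


class SharedDictStore(SessionStore):
    """Stand-in for a shared backend such as memcached or a database.

    One instance is shared by all web nodes, so a session is visible no
    matter which node the load balancer picks for the next request.
    """

    def __init__(self):
        self._rows = {}

    def load(self, session_id):
        return dict(self._rows.get(session_id, {}))

    def save(self, session_id, data):
        self._rows[session_id] = dict(data)


class WebNode:
    """One application server behind the load balancer."""

    def __init__(self, name, sessions):
        self.name = name
        self.sessions = sessions

    def login(self, session_id, user):
        data = self.sessions.load(session_id)
        data["user"] = user
        self.sessions.save(session_id, data)

    def whoami(self, session_id):
        return self.sessions.load(session_id).get("user")


shared = SharedDictStore()
node_a = WebNode("web1", shared)
node_b = WebNode("web2", shared)

node_a.login("abc123", "adam")     # first request lands on web1
print(node_b.whoami("abc123"))     # next request lands on web2
```

Had each node kept its own private store, the second request would not have found the session. That is the failure mode the caveats warn about.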
Separate database server
DB replication
Replication caveats

   Writes only on master
   Reads from slaves
   Data consistency
   Replication lag
       Don't do this:
// The slave may not have replayed the UPDATE yet, so this SELECT can return stale data.
$master->query('UPDATE session SET logged = 1');
$slave->query('SELECT logged FROM session');
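A common workaround for the read-after-write problem is to send reads to the master for a short window after a write. The Python sketch below shows that routing rule only. `ReplicatedDb`, `FakeServer` and the `max_lag` value are hypothetical, and a fixed window is a heuristic, not a guarantee about real replication lag.

```python
import time


class ReplicatedDb:
    """Routes queries to master or slave (hypothetical wrapper).

    `master` and `slave` are any objects with a query(sql) method.
    `max_lag` is an assumed upper bound on replication lag, in seconds.
    """

    def __init__(self, master, slave, max_lag=2.0, clock=time.monotonic):
        self.master = master
        self.slave = slave
        self.max_lag = max_lag
        self.clock = clock
        self._last_write = None

    def write(self, sql):
        self._last_write = self.clock()
        return self.master.query(sql)

    def read(self, sql):
        recently_wrote = (
            self._last_write is not None
            and self.clock() - self._last_write < self.max_lag
        )
        # Right after a write the slave may be behind, so read from master.
        target = self.master if recently_wrote else self.slave
        return target.query(sql)


class FakeServer:
    """Test double that only reports which server answered."""

    def __init__(self, name):
        self.name = name

    def query(self, sql):
        return self.name


now = [100.0]   # fake clock so the example does not need to sleep
db = ReplicatedDb(FakeServer("master"), FakeServer("slave"),
                  max_lag=2.0, clock=lambda: now[0])

before = db.read("SELECT logged FROM session")        # no writes yet
db.write("UPDATE session SET logged = 1")
right_after = db.read("SELECT logged FROM session")   # inside the lag window
now[0] += 5.0
later = db.read("SELECT logged FROM session")         # window has passed
print(before, right_after, later)
```

In a multi-server setup the "last write" marker would have to travel with the user (for example in the shared session), since the next request may hit a different web node.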
Whole image
Topics

   The Challenge
   Infrastructure
   Code
   Cache
   CDN
PHP is slow!

   Yes, but it does not matter!
   Database access is slower
   Cache over network is slower
   Disk access is slower
   HTTP requests are slower
   Web service calls are slower
   Discover bottlenecks before blaming PHP
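Finding the bottleneck can start with nothing more than timing each stage of a request. The Python sketch below accumulates wall-clock time per named stage. The stage names and sleep durations are invented for illustration, and a real profiler would give far more detail.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(stage, clock=time.perf_counter):
    """Accumulate wall-clock time spent in a named stage."""
    start = clock()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + (clock() - start)


def handle_request():
    # The sleeps stand in for real I/O; the durations are made up.
    with timed("database"):
        time.sleep(0.05)
    with timed("remote cache"):
        time.sleep(0.005)
    with timed("template rendering"):
        sum(range(1000))


handle_request()
slowest = max(timings, key=timings.get)
for stage, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{stage:20s} {seconds * 1000:8.2f} ms")
print("slowest stage:", slowest)
```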
It's about architecture

   Heavy tasks in background
       CRON, Gearman
   Pregenerate stuff
   Move some code to SQL
       Calculations in queries
       Stored procedures
       Triggers
   C/C++ or Java for heavy computation
   Use PHP to glue it together
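As an illustration of "calculations in queries", the Python sketch below lets the database compute a small league table instead of looping over rows in application code. It uses SQLite from the standard library as a stand-in for MySQL. The table layout, team names and scores are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (team TEXT, goals_for INTEGER, goals_against INTEGER)"
)
# Made-up sample rows: one row per team per match.
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("Team A", 2, 1),
        ("Team A", 0, 0),
        ("Team B", 1, 3),
        ("Team B", 2, 0),
        ("Team A", 1, 2),
    ],
)

# 3 points for a win, 1 for a draw, 0 for a loss, all computed in the query.
standings = conn.execute(
    """
    SELECT team,
           SUM(CASE WHEN goals_for > goals_against THEN 3
                    WHEN goals_for = goals_against THEN 1
                    ELSE 0 END) AS points,
           SUM(goals_for) - SUM(goals_against) AS goal_diff
    FROM results
    GROUP BY team
    ORDER BY points DESC, goal_diff DESC
    """
).fetchall()

print(standings)
```

The application receives only the aggregated rows, so less data crosses the network and the glue code stays small.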
PHP Frameworks




   Hundreds of others
   Which one to choose?
Framework? Think again!

   Raw performance matters
   Support for master-slave replication
   Multiple layers of cache
   Working with accelerators (HipHop!)
   Beware of bottlenecks
       e.g. a core part of the framework is slow
   Designed to scale
Topics

   The Challenge
   Infrastructure
   Code
   Cache
   CDN
Cache is everywhere

   CPU: L1, L2
   Disk buffer
   Linux filesystem
   MySQL
   PHP (APC)
   Smarty
   HTTP Proxy
   Browser cache
Where to cache?
Memory is cheap

   Pre-generate stuff
   Store results in memory
       APC, memcached
   App config in memory
       APC with apc.stat=0
   Increase RAM for MySQL
   Disk is the new tape
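"Store results in memory" usually means the cache-aside pattern: look in the cache first, and compute and store the result only on a miss. A minimal Python sketch, with a dict standing in for APC or memcached and a hypothetical `load_ranking_from_db` helper in place of a real query:

```python
cache = {}      # stand-in for APC or memcached
db_calls = 0    # counts how often the "expensive" path runs


def load_ranking_from_db(league_id):
    """Pretend to run an expensive query (hypothetical helper)."""
    global db_calls
    db_calls += 1
    return [f"league {league_id}: team A", f"league {league_id}: team B"]


def get_ranking(league_id):
    key = f"ranking:{league_id}"
    if key not in cache:                  # miss: compute once, then store
        cache[key] = load_ranking_from_db(league_id)
    return cache[key]


get_ranking(7)
get_ranking(7)    # served from memory, no second query
get_ranking(8)
print(db_calls)
```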
Memcached to the rescue!

   Dead simple
   Key-value
   Distributed storage pool
   Automatic expiry after X seconds
       No garbage collection run needed
   Store arrays, objects, simple values
   Easy integration
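Two of the points above, expiry after a set time and a distributed pool, can be sketched in a few lines of Python. This is a toy model, not the memcached protocol or any client library. `TtlStore` and `Pool` are hypothetical names, and the hash-modulo rule is a simplification (real clients often use consistent hashing instead).

```python
import time
import zlib


class TtlStore:
    """In-process stand-in for a single cache node."""

    def __init__(self, clock=time.monotonic):
        self._items = {}
        self._clock = clock

    def set(self, key, value, ttl):
        self._items[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily, when someone asks for them,
            # so no separate garbage-collection pass is needed.
            del self._items[key]
            return None
        return value


class Pool:
    """Spreads keys over several nodes with a simple hash-modulo rule."""

    def __init__(self, nodes):
        self.nodes = nodes

    def node_for(self, key):
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def set(self, key, value, ttl):
        self.node_for(key).set(key, value, ttl)

    def get(self, key):
        return self.node_for(key).get(key)


now = [0.0]   # fake clock so the example does not need to sleep
pool = Pool([TtlStore(clock=lambda: now[0]) for _ in range(3)])
pool.set("livescore:42", {"home": 1, "away": 0}, ttl=30)

fresh = pool.get("livescore:42")
now[0] += 31
expired = pool.get("livescore:42")
print(fresh, expired)
```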
Topics

   The Challenge
   Infrastructure
   Code
   Cache
   CDN
Reverse-proxy

   First line of cache
   Returns content if resource is up-to-date
   Works on HTTP level
       Can be integrated into existing infrastructure
   Can do load balancing
   In-memory cache storage
   Squid, Nginx, Varnish
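The core idea, answering from memory while a stored response is still fresh and asking the backend otherwise, fits in a short Python sketch. `CachingProxy` is a hypothetical toy with a single `max_age` for every resource. Squid, Nginx and Varnish decide freshness from HTTP headers and do much more.

```python
import time


class CachingProxy:
    """Toy reverse proxy: answers from memory while an entry is fresh."""

    def __init__(self, origin, max_age, clock=time.monotonic):
        self.origin = origin      # callable: path -> response body
        self.max_age = max_age    # seconds a stored response stays fresh
        self.clock = clock
        self._store = {}          # path -> (body, stored_at)

    def get(self, path):
        entry = self._store.get(path)
        if entry is not None and self.clock() - entry[1] < self.max_age:
            return entry[0], "HIT"
        body = self.origin(path)  # miss or stale: ask the backend
        self._store[path] = (body, self.clock())
        return body, "MISS"


origin_hits = []


def origin(path):
    """Stand-in for the PHP application behind the proxy."""
    origin_hits.append(path)
    return f"<html>{path}</html>"


now = [0.0]   # fake clock so the example does not need to sleep
proxy = CachingProxy(origin, max_age=10, clock=lambda: now[0])

first = proxy.get("/matches")
second = proxy.get("/matches")    # still fresh, backend is not touched
now[0] += 11
third = proxy.get("/matches")     # stale, fetched again
print(first[1], second[1], third[1], len(origin_hits))
```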
Content Delivery Network

   Network of servers
   Worldwide
   Automatic load balancing
   Fast access (low ping time)
   Data redundancy for free
   Ideal for static resources
       But not only for those
   Must-have for worldwide websites
CDN as reverse-proxy

   HTTP request / response chain
   Embraces REST architecture
   Requests are distributed
   Reduces latency
   Lowers traffic volume
   Increases availability
   e.g. Akamai Edge Suite
CDN at soccerway.com

   All of the content is served via CDN
       Images, CSS, JS
       Generated HTML
       JSON for Ajax
   90% of traffic via CDN
   Origin requests only from Europe
   Site online even if servers are down
   Can't live without it ;)
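"Site online even if servers are down" suggests the edge keeps serving the last good copy when the origin fails. The slides do not say how the CDN is configured, so the Python sketch below only illustrates the general serve-stale-on-error idea. `EdgeCache` and `OriginDown` are hypothetical names, and for simplicity this toy contacts the origin on every request.

```python
class OriginDown(Exception):
    """Raised by the origin callable when the backend is unreachable."""


class EdgeCache:
    """Toy edge node that keeps the last good copy of every resource."""

    def __init__(self, origin):
        self.origin = origin     # callable: path -> body, may raise OriginDown
        self._last_good = {}     # path -> body

    def get(self, path):
        try:
            body = self.origin(path)
        except OriginDown:
            if path in self._last_good:
                return self._last_good[path], "STALE"
            raise                # nothing cached: the error reaches the client
        self._last_good[path] = body
        return body, "FRESH"


origin_up = [True]


def origin(path):
    """Stand-in for the origin servers; fails when origin_up is False."""
    if not origin_up[0]:
        raise OriginDown(path)
    return f"page for {path}"


edge = EdgeCache(origin)
ok = edge.get("/teams")
origin_up[0] = False              # simulate an outage
fallback = edge.get("/teams")
print(ok, fallback)
```

A page that was never fetched before the outage still fails, which is one reason to warm the cache for important pages.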
Thank you for listening




             Questions?
Interested?

   Contact me:
       adam@globalsportsmedia.com
       www.goldenline.pl/adam-brodziak
       www.linkedin.com/in/adambrodziak
   We're hiring!
       Web developers
       Football / sport fans
