Cache Optimization & Origin Infrastructure Reduction Using Akamai Site Delivery
Blake Crosby, Julian Dunn
Media Operations and Technology, CBC/Radio-Canada
About CBC
Canada’s public broadcaster, offering services in English, French, and eight aboriginal languages.
Our services are offered in the following formats:
  • Radio (FM, AM, satellite, and shortwave)
  • Television (digital cable and OTA, in both standard and high definition)
  • Online (cbc.ca and radio-canada.ca)
Agenda
  • Why optimize?
  • Requirements
  • Lessons learned at CBC
  • How to optimize
  • Current performance
Why Use Akamai Site Delivery?
  • Reduce origin footprint and costs
      • Capital expenses and replacement
      • Operating (systems admin, power, maintenance)
  • Scalability
      • Akamai provides CBC with “unlimited” capacity and scaling
  • Speed
      • Last mile acceleration by being as close as possible to end users
Conflicting Requirements
  • Business: “Make the content as fresh as possible.”
  • IT: “Keep the configuration as simple as possible.”
  • Finance: “Keep our costs as low as possible.”
The Optimization Spectrum
Two ends of the spectrum:
  • Tune to the Nth degree
      • High origin offload
      • Content will not appear very fresh due to long TTLs
      • Complicated tuning
      • Low risk to origin from spikes in traffic
  • Don’t tune at all
      • Lower origin offload
      • Content appears very fresh
      • No tuning parameters
      • High risk to origin from spikes in traffic
Before Tuning, Understand Your Business and Content
CBC is primarily a news site.
  • Spiky traffic patterns
  • Content has to be fresh
  • Content almost never changes after the first 24-48 hours
Content is not targeted to individuals.
  • No sessions
  • Personalization is done via JavaScript
Deliberate architectural choices:
  • Site is mostly static by design
  • Server Side Includes technology used heavily
  • Executing application layer code for identical requests is wasteful
The lessons we teach here are domain specific. Understand your content!
Ultimate Priorities
  1. Origin stability
  2. Content freshness
  3. Performance
  4. Cost
Overall Lessons Learned
  • Don’t ignore the architecture of your origin.
  • Keep caching rules simple.
      • Is the default “good enough”?
  • Tune at the origin first rather than at the edge.
      • Reduces propagation time for TTLs during emergencies.
  • Understand and categorize content before tuning.
  • Understand what “TTL” actually means.
      • This led us to rely heavily on If-Modified-Since.
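The point about what “TTL” means can be illustrated with a toy model. The sketch below is not Akamai's implementation; the `Edge` class, the `make_origin` helper, and the integer timestamps are all hypothetical names chosen for illustration. It only shows the idea that an expired TTL triggers a freshness check, not necessarily a re-download.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (status, body, last_modified) as returned by the hypothetical origin stub.
OriginReply = Tuple[int, Optional[bytes], int]


@dataclass
class CachedObject:
    body: bytes
    last_modified: int  # origin's Last-Modified, in epoch seconds
    checked_at: int     # when the edge last confirmed freshness


class Edge:
    """Toy edge cache holding a single object (illustration only)."""

    def __init__(self, ttl: int, origin: Callable[[Optional[int]], OriginReply]):
        self.ttl = ttl
        self.origin = origin
        self.cache: Optional[CachedObject] = None
        self.origin_hits: List[int] = []  # status codes seen from the origin

    def get(self, now: int) -> bytes:
        obj = self.cache
        if obj is not None and now - obj.checked_at < self.ttl:
            return obj.body  # within TTL: the origin is not contacted
        # TTL expired (or nothing cached): ask the origin, sending a validator
        ims = obj.last_modified if obj is not None else None
        status, body, last_modified = self.origin(ims)
        self.origin_hits.append(status)
        if status == 304 and obj is not None:
            obj.checked_at = now  # still fresh: reuse the cached body
            return obj.body
        assert body is not None
        self.cache = CachedObject(body, last_modified, now)
        return body


def make_origin(body: bytes, last_modified: int):
    """Hypothetical origin stub that honours If-Modified-Since."""
    def origin(if_modified_since: Optional[int]) -> OriginReply:
        if if_modified_since is not None and last_modified <= if_modified_since:
            return 304, None, last_modified
        return 200, body, last_modified
    return origin
```

With a 20-second TTL, requests at t=0, 10, 25, 30, and 50 reach the origin only three times, and two of those three are 304 revalidations that transfer no body.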
How We Did It: Almost no tuning
  • Default blanket site TTL of 20s for all objects except HTML; 120s for text/html
  • Most objects have no explicit TTL beyond this
  • Heavy use of If-Modified-Since
      • After TTL expiry, Akamai issues a GET with an If-Modified-Since header
      • Origin returns 304 Not Modified if the object hasn’t changed
      • 304s account for >85% of our origin hits
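The origin side of that exchange is a simple date comparison. The following is a minimal sketch of the decision, not the logic Apache actually runs; the function name `origin_status` is an assumption made for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def origin_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the object is unchanged since the given date, else 200."""
    if if_modified_since is None:
        return 200  # no validator sent: send the full object
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat bare dates as UTC
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200
```

For example, an object last modified at `Mon, 07 Jun 2010 15:59:39 GMT` yields 304 when the edge sends that same date back, and 200 when the edge's copy is one second older.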
Typical Origin Response Codes
How We Did It: Control TTLs at origin
  • Fine-grained control of object TTLs
  • Instant results; no need to wait for an Akamai configuration to deploy
  • Ability to adjust TTLs on the fly during special events or emergencies
Using Apache and mod_expires:
    <Location "/includes">
        ExpiresByType "text/css" "access plus 1 hour"
        ExpiresByType "application/x-javascript" "access plus 1 hour"
        ExpiresByType "image/gif" "access plus 1 hour"
    </Location>
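The TTL values stated in this deck can be summarized as one lookup. This is only a sketch: folding the blanket defaults and the `/includes` rule into a single function is an assumption made for illustration, and `ttl_seconds` is a hypothetical name, not part of any Akamai or Apache API.

```python
# MIME types given a one-hour expiry under /includes in the mod_expires example.
INCLUDES_ONE_HOUR_TYPES = {"text/css", "application/x-javascript", "image/gif"}


def ttl_seconds(path: str, content_type: str) -> int:
    """Return the TTL in seconds that the rules in this deck imply."""
    under_includes = path == "/includes" or path.startswith("/includes/")
    if under_includes and content_type in INCLUDES_ONE_HOUR_TYPES:
        return 3600  # "access plus 1 hour"
    if content_type == "text/html":
        return 120   # blanket TTL for HTML
    return 20        # blanket default for everything else
```

The most specific rule is checked first, so a stylesheet under `/includes` gets one hour while a news story falls through to the 120-second HTML default.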
How We Did It: Categorize your content
We’ve categorized our content into three buckets: frequently changing, moderately changing, and never changing. TTLs are applied site-wide to objects that fall into these categories.
Some examples:
  • Frequently changing
      • XML data files (live feeds)
  • Moderately changing
      • HTML files (news stories)
      • XML data files (weather feeds)
      • Images
  • Never changing
      • Common site elements (navigation)
      • JavaScript/CSS
      • favicon.ico
How We Did It Enable Last Mile Acceleration. Using LMA allows Akamai to deliver content as compressed objects to the end user from the edge. We also use mod_deflate to compress our content and deliver it to Akamai.
How We Did It: Use all available cache headers!
  • Cache-Control
      • The TTL is in the “max-age” parameter
      • “public” should also be specified to ensure that other caches cache the content
  • Expires
      • The date the object expires
  • Last-Modified
      • Sent with all objects except dynamic content and HTML (explained later)
  • ETag (be careful!)
      • ETags are a “hash” of the file’s inode and last-modified time
How We Did It: Example response headers
    Content-Type: text/css
    Last-Modified: Mon, 07 Jun 2010 15:59:39 GMT
    Etag: "8a14b9cd-84a-c29470c0"
    Cache-Control: public, max-age=1557
    Expires: Tue, 06 Jul 2010 21:38:51 GMT
    Date: Tue, 06 Jul 2010 21:12:54 GMT
    X-Cache: TCP_IMS_HIT from a72-246-43-6 (AkamaiGHost/5.9.4-6329478) (-)
    X-Cache-Key: /L/1849/9617/0s/www.cbc.ca/includes/objects/program-switcher/switcher.css
    X-True-Cache-Key: /L/www.cbc.ca/includes/objects/program-switcher/switcher.css
    X-Akamai-Session-Info: name=PARENT_SETTING; value=TD
    X-Serial: 1849
    Connection: keep-alive
    Vary: Accept-Encoding
    X-Check-Cacheable: YES
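In the response above, `max-age` and `Expires` agree: the gap between `Date` and `Expires` is exactly the 1557 seconds given in `Cache-Control`. A short sketch of that check follows; the helper names `remaining_ttl` and `max_age` are hypothetical.

```python
import re
from email.utils import parsedate_to_datetime
from typing import Optional


def remaining_ttl(headers: dict) -> int:
    """Seconds between the response's Date and its Expires header."""
    expires = parsedate_to_datetime(headers["Expires"])
    date = parsedate_to_datetime(headers["Date"])
    return int((expires - date).total_seconds())


def max_age(cache_control: str) -> Optional[int]:
    """Extract max-age from a Cache-Control value, or None if absent."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else None


headers = {
    "Cache-Control": "public, max-age=1557",
    "Expires": "Tue, 06 Jul 2010 21:38:51 GMT",
    "Date": "Tue, 06 Jul 2010 21:12:54 GMT",
}
print(remaining_ttl(headers), max_age(headers["Cache-Control"]))  # 1557 1557
```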
Some Gotchas and Annoyances
Some situations make controlling caching difficult. Here is how we’ve dealt with them.
  • Server Side Includes
      • Apache cannot calculate the last-modified timestamp for files with server side includes (roughly 99% of all HTML pages on cbc.ca).
      • We’ve applied a blanket 2-minute TTL to all files with the “text/html” content type.
  • Geolocation
      • Server-side customization of content will not work, as it will be cached for all users.
      • We’ve mitigated this by having all customization occur on the client side.
      • If a cookie does not exist, the browser makes a POST request (which isn’t cached) to the origin to set a cookie containing the user’s location.
Tools To Help You
  • Use the Akamai Portal (Akamai Edge Control)
      • The “origin OK volume per URL” report is handy for determining which objects are being fetched from the origin.
      • 404s are not cached. Check the “top URLs, by number of errors” report to help you track down broken links.
  • Firefox plugins
      • Firebug allows you to view HTTP request and response headers to troubleshoot caching problems.
      • Akamai Headers is a plugin that lets you view Akamai-specific information such as the cache key and which Ghost server responded to your request. Available from Akamai Edge Control.
  • Other resources
      • Web Caching by Duane Wessels (O’Reilly & Associates)
Examples (Quebec Earthquake: June 23, 2010)
Examples (FIFA World Cup: June 11, 2010)
The Outcome
  • Almost no tunables used
      • We meet the requirements using <25 lines of tunables
      • No edge tunables in our Akamai metadata config
  • Capital expenditure avoidance
      • Ran with six origin webservers from 2003 to 2010
      • Today we run nine origin webservers at about 40% CPU utilization
Questions Thank You!

Editor's Notes

  1. Emphasize that we’re primarily a news site and this has certain implications for the longevity of our content – more later…
  2. Talk about why we optimize too. These are reasons for choosing Akamai DSD but we also optimize to hold the line on CapEx and OpEx growth.
  3. Somewhere in here we want to talk about how much of the CDN’s value is only realized when it serves > 50% of your content.
  4. At the end of this we segue into the next slide by saying you place the needle on the spectrum by understanding your business context…
  5. Content is frequently updated when news breaks. Old news is irrelevant news – access patterns bear this out.
  6. The site – and the origin – must stay up no matter how much traffic hits. After that, the content must be fresh. To the business, stale content is worse than no content at all. Cost is mitigated through non-technical means, i.e. choosing a 95th percentile billing model for DSD traffic so we don’t get hammered on cost by spikes.
  7. We’ve been gradually refining our approach, tools, techniques and architecture over the last ten years and we’ve learned a lot. Executing dynamic code for identical requests is silly (Newsdelivery) Don’t ignore architectural changes that you should make at the origin – CDN can only help so much. If your origin stinks, the edge will too. TTL means the time that Akamai will wait before checking the origin for freshness – doesn’t mean that the object will necessarily be retrieved, if it is already in-cache and fresh.
  8. Need to validate these numbers.
  9. Point out that 83.6% of the requests are 304 Not Modified, which bears out our strategy of heavily using If-Modified-Since.
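The 95th percentile billing model mentioned in note 6 can be sketched as follows. This shows the commonly used method (sort the bandwidth samples, discard the top 5%, bill at the highest remaining sample); it is an assumption that the contract works exactly this way, and the function name `billable_rate` is hypothetical.

```python
from typing import Sequence


def billable_rate(samples: Sequence[float]) -> float:
    """Return the 95th percentile of the samples (top 5% discarded)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Index of the highest sample left after dropping the top 5%,
    # computed in integer arithmetic: ceil(0.95 * n) - 1.
    index = (95 * len(ordered) + 99) // 100 - 1
    return ordered[index]
```

With 19 samples at 10 Mbps and a single spike at 1000 Mbps, the billable rate stays at 10 Mbps, which is why short traffic spikes do not drive up cost under this model.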