HIGH PERFORMANCE
ON DRUPAL 7 - AN
ANATOMY OF A SITE

Kalle Varisvirta
Technology Director
Designing for high
performance
 The process is usually the same for major
  refactoring and building a new site for high
  performance
 It’s always easier to replace an existing site,
  because you have real data
 Creating a high performance site on some
  estimates from a customer might land you
  pretty far from the solution you actually need
Designing for high
performance
 For this session, we’ll imagine a situation where
  we have an existing site with actual data
  available
 The recent case where we were working with
  this kind of a design was exactly that: a well
  matured site (running since 1998!) going to be
  reincarnated for the fourth time
SO, WE HAVE A
PERFORMANCE PROBLEM
First look: identify the
problem
 When a site is not performing well, there can be
  numerous different reasons
 Analyze it
   Profile under load
   Look at the logs
   Look at the server loads under load
First look: identify the
problem
 Make sure you’re not hitting some simple
  bottleneck
    Too many services running on a single machine
    A crazy database query killing the site
    A broken router adding a 3-second delay to every request
     (seen that, for real)
   And many, many others
Problem identified
 When you’ve arrived at the conclusion that you
  actually have too much volume, figure out:
  too much of what?
    Too much content? I’ve seen 12 million nodes plus
     60 million comments on a single installation; that’s a
     lot.
    Too many requests per second? Make sure they are
     page requests. Static files are easily fixed: look at
     cache headers, aggregation, Varnish, Nginx, CDNs.
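To make the "cache headers" point concrete, here is a minimal TypeScript sketch that picks a long-lived Cache-Control value for static files only. The function name, the extension list and the one-year lifetime are assumptions chosen for illustration, not settings from the site discussed in this deck.

```typescript
// Hypothetical helper: pick a Cache-Control value for static files.
// The extension list and the lifetime are illustrative assumptions.
const STATIC_EXTENSIONS: string[] = ["css", "js", "png", "jpg", "gif", "ico", "svg", "woff"];
const ONE_YEAR_SECONDS: number = 365 * 24 * 60 * 60;

function cacheControlFor(path: string): string | null {
  // Ignore the query string when looking at the file extension.
  const queryStart = path.indexOf("?");
  const cleanPath = queryStart >= 0 ? path.slice(0, queryStart) : path;
  const dot = cleanPath.lastIndexOf(".");
  const extension = dot >= 0 ? cleanPath.slice(dot + 1).toLowerCase() : "";

  if (STATIC_EXTENSIONS.indexOf(extension) !== -1) {
    // Static files can be kept by browsers, Varnish and CDNs alike.
    return "public, max-age=" + ONE_YEAR_SECONDS;
  }
  // Not a static file: leave the decision to the page caching layers.
  return null;
}
```

Aggregated CSS and JS files get new file names when their content changes, which is what makes such a long lifetime reasonable for them.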
Problem identified
 Too many Drupal page requests per second?
   Anonymous?
      If anonymous, it’s usually easy to fix, as long as it’s
       cacheable. We’ll go into the whole “cacheable” thing later.
      If it’s cacheable, look at page cache, Boost, Varnish, CDNs.
   Logged in?
      Drupal’s page cache is turned off, and the requests bypass all
       the caches
      This is usually a more difficult problem to solve
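The anonymous versus logged-in split can be sketched as a check on the session cookie. Drupal 7 session cookie names start with SESS (SSESS over HTTPS) followed by a hash; the function names and the simple cookie parsing below are assumptions for illustration, not what Varnish or Drupal actually execute.

```typescript
// Drupal 7 session cookies are named SESS<hash>, or SSESS<hash> over HTTPS.
const SESSION_COOKIE_NAME = /^S?SESS[0-9a-f]+$/;

// Hypothetical helper: does the Cookie header carry a Drupal session?
function hasDrupalSessionCookie(cookieHeader: string): boolean {
  return cookieHeader.split(";").some((pair: string): boolean => {
    const eq = pair.indexOf("=");
    const name = (eq >= 0 ? pair.slice(0, eq) : pair).trim();
    return SESSION_COOKIE_NAME.test(name);
  });
}

// Sketch: only read-only requests without a session can share a cached page.
function isSharedCacheCandidate(method: string, cookieHeader: string): boolean {
  const isReadOnly = method === "GET" || method === "HEAD";
  return isReadOnly && !hasDrupalSessionCookie(cookieHeader);
}
```

Every request that fails this kind of check goes all the way to Drupal, which is why a large share of logged-in traffic is the harder case.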
Problem identified: too
many logged in users
There’s still one case that’s pretty common and still
easy enough to solve:

  logged in users with a small amount of
  personalized content
  (small as a percentage of the CPU cost of building the page in Drupal)
Problem identified: too
many logged in users

[Page layout: a content area with common content for everybody, and three
side blocks: "logged in as user", "highlights: common" and "your friends’
favorites"]
Problem identified: too
many logged in users
 Let’s set a couple of prerequisites
    You’re running in your own environment
    You have Varnish configured in front of the Drupal
     site
    You have some skills in programming with Drupal

  You got all of this? OK, let’s continue.
it’s time for

CACHE
CONTROL
drupal.org/project/cache_control
What’s Cache Control?
 It’s similar to the ESI module, with some benefits
 It’s mainly aimed at caching blocks or block-like
  content on the page
 It usually needs some programming
 For the kind of problem it fits, it’s
  the optimal solution and will make your site
  faster by orders of magnitude
[Diagram: Drupal, Varnish, user browser]

The user first gets the common page for everybody from Varnish.

Then a JavaScript routine checks whether the user is logged in or not.

The JavaScript either makes the hidden for-anonymous content visible or
fetches this user’s content with an AJAX request.
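The browser-side decision described above can be sketched roughly as follows. This is not the Cache Control module's actual API: the type and function names are hypothetical, and how the script learns whether the user is logged in is left out and passed in as a plain boolean.

```typescript
// What the script should do with one personalized placeholder on the cached page.
type BlockAction =
  | { kind: "reveal"; blockId: string }  // show the hidden for-anonymous markup
  | { kind: "fetch"; blockId: string };  // ask the backend for this user's version

// Hypothetical planner: the page itself always comes from Varnish, only
// the personalized placeholders are treated differently per visitor.
function planBlockActions(isLoggedIn: boolean, blockIds: string[]): BlockAction[] {
  return blockIds.map((blockId: string): BlockAction => {
    if (isLoggedIn) {
      return { kind: "fetch", blockId: blockId };
    }
    return { kind: "reveal", blockId: blockId };
  });
}
```

For an anonymous visitor every action is a "reveal", so no extra request reaches Drupal at all.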
Problem identified: too
many logged in users

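[Page layout, same page as before: the "logged in as user" block is overlaid
with "login box" and the "your friends’ favorites" block with "staff picks";
the content area (common content for everybody) and "highlights: common"
are unchanged]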
Benefits of Cache Control
 Burdens the back-end significantly less due to
  only loading the needed parts
 Loads multiple blocks and/or areas with a single
  request
 Gives the user something to look at while
  loading the hard parts of the page – and it does
  make the site feel faster
 Plays well with some other modules, such as
  CAPTCHA
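The "multiple blocks with a single request" benefit amounts to batching. A minimal sketch, assuming a hypothetical endpoint that accepts a comma-separated `blocks` query parameter (the real module's URL scheme may well differ):

```typescript
// Hypothetical: build one URL that asks for several personalized blocks at once,
// so the backend is hit once per page view instead of once per block.
function buildBatchUrl(endpoint: string, blockIds: string[]): string {
  const encoded = blockIds.map((id: string): string => encodeURIComponent(id));
  return endpoint + "?blocks=" + encoded.join(",");
}

// Hypothetical response shape: block id mapped to rendered HTML.
interface BatchResponse {
  [blockId: string]: string;
}

// Pick the HTML for each requested block, skipping any the backend left out.
function htmlForBlocks(response: BatchResponse, blockIds: string[]): string[] {
  return blockIds
    .filter((id: string): boolean => Object.prototype.hasOwnProperty.call(response, id))
    .map((id: string): string => response[id]);
}
```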
What about ESI?
 ESI (Edge Side Includes) is a partial loading
  technique supported by Varnish and some CDNs,
  e.g. Akamai
 It basically makes Varnish do the partial page
  loading
    Varnish first fetches the common version from cache
    Then it looks through the page for any ESI markup
    Then it loads all the ESI-marked parts of the page from
     cache or from Drupal
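The three steps above can be imitated in a few lines. This is only a toy model of what an ESI processor does, written in TypeScript for illustration: the `<esi:include src="..."/>` tag is real ESI markup, while the function, the plain-object fragment cache and the synchronous backend callback are simplifying assumptions.

```typescript
// Matches self-closing ESI include tags and captures the src attribute.
const ESI_INCLUDE = /<esi:include\s+src="([^"]*)"\s*\/>/g;

// Toy ESI assembly: take the cached common page, find the ESI tags, and
// fill each one from the fragment cache or, on a miss, from the backend.
function assembleEsiPage(
  cachedPage: string,
  fragmentCache: { [src: string]: string },
  fetchFromBackend: (src: string) => string
): string {
  return cachedPage.replace(ESI_INCLUDE, (_tag: string, src: string): string => {
    if (Object.prototype.hasOwnProperty.call(fragmentCache, src)) {
      return fragmentCache[src];
    }
    // Each miss is a separate request to the backend.
    const fresh = fetchFromBackend(src);
    fragmentCache[src] = fresh;
    return fresh;
  });
}
```

Note that the assembled page is only returned once every fragment has been resolved.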
How is Cache Control
different from ESI?
 ESI needs to wait until the whole page is loaded
  before giving anything to the user
 ESI loads all the portions of the page (still in D7,
  this might change in D8) in separate HTTP
  requests, thus burdening the server with even
  more bootstraps than without any cache
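A back-of-the-envelope count shows why. The model below is a deliberate simplification and an assumption of this sketch: every request that reaches Drupal costs one full bootstrap, and in the ESI and AJAX cases the common page itself is served from Varnish.

```typescript
// Backend bootstraps needed for one logged-in page view, per approach.
interface BootstrapCounts {
  noCache: number;      // Drupal builds the whole page itself
  esi: number;          // one request per uncached ESI fragment
  batchedAjax: number;  // all uncached blocks fetched in a single request
}

function bootstrapsPerPageView(uncachedFragments: number): BootstrapCounts {
  return {
    noCache: 1,
    esi: uncachedFragments,
    batchedAjax: uncachedFragments > 0 ? 1 : 0
  };
}
```

Under this model, a page with three uncached personalized blocks costs three bootstraps with ESI, but one without any cache and one with a batched request.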
HEY… HOW ABOUT THAT
USER GENERATED
CONTENT THAT MAKES
VARNISH PURGE
EVERYTHING ALL THE
TIME?
Different problem
 As stated, Cache Control works well for specific
  problems, but it too runs into trouble when the
  Varnish cache gets purged all the time
 That usually happens on a heavily UGC (User
  Generated Content) oriented site
Different problem: UGC
When a single page on a site gets new content
every 2-30 seconds
 Caching is of no use; purging multiple pages at
  that rate makes no sense
 That data needs a way of refreshing
  even more frequently
 And we’re still talking about a page that doesn’t
  update after it has loaded (so no Socket.IO stuff
  in this slide deck, sorry)
Different problem: UGC

[Page layout: a content area with common content for everybody that is
getting updates every 30 seconds, and three side blocks: "logged in as
user", "highlights: common" and "your friends’ favorites"]
Solution: A new cache
layer
 We add a new, fast-paced cache layer on the
  page
 We’ll try to purge and reload that cache as fast as
  humanly possible in Drupal
 We’ll minimize the work done on the backend
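The idea of a fast-paced layer that is both short-lived and purgeable can be sketched as a tiny cache with a time-to-live. The class below is a hypothetical TypeScript illustration, not the layer used on the site; the clock is passed in so the behaviour is easy to follow, and the TTL value is up to the caller.

```typescript
// A small cache whose entries expire after ttlMs or when explicitly purged.
class ShortLivedCache {
  private entries: { [key: string]: { value: string; expiresAt: number } } = {};
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value, or rebuild it when missing or expired.
  get(key: string, rebuild: () => string): string {
    const time = this.now();
    const entry = this.entries[key];
    if (entry !== undefined && entry.expiresAt > time) {
      return entry.value;
    }
    const value = rebuild();
    this.entries[key] = { value: value, expiresAt: time + this.ttlMs };
    return value;
  }

  // Drop one entry, e.g. right after new content has been posted.
  purge(key: string): void {
    delete this.entries[key];
  }
}
```

With a TTL of a couple of seconds, the backend rebuilds the fast-changing part at most once per interval no matter how many visitors ask for it, and a purge makes new content visible immediately.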
Solution: A new cache
layer
 Let’s load the whole page from Varnish and then
  refresh the fast-paced part with JavaScript
 To minimize the load on the backend, skip the
  theming layer and just load JSON
 Sound good?
Solution: A new cache
layer
 Until you realize you have to theme everything in
  JavaScript, and that’s not fun
 Even if you use a JavaScript templating engine,
  you still have to keep your themes up to date in
  two places
Let’s pull out

front
themer
drupal.org/project/front_themer
Front themer
 When theming in Javascript, Front themer
  makes your life a bit easier
 It allows you to map your Drupal theme’s theme
  implementations to very simple Javascript
  versions
 It’s designed to help out with simple elements,
  such as boxes and lists
 You might need to tweak your theming
  functions a bit to make them work better with it
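To show what a "very simple JavaScript version" of a theme implementation might look like, here is a hand-written TypeScript sketch that renders a link list from JSON. It does not use Front themer's API; the function names, the item shape and the markup are assumptions for illustration only.

```typescript
// Escape text before placing it into HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical shape of one item in the JSON coming from the backend.
interface ListItem {
  title: string;
  url: string;
}

// Client-side counterpart of a simple server-side list theme function.
function renderItemList(items: ListItem[], cssClass: string): string {
  const rows = items.map((item: ListItem): string =>
    '<li><a href="' + escapeHtml(item.url) + '">' + escapeHtml(item.title) + "</a></li>"
  );
  return '<ul class="' + escapeHtml(cssClass) + '">' + rows.join("") + "</ul>";
}
```

The markup here has to match what the Drupal theme produces for the same list, which is exactly the two-places maintenance burden mentioned earlier.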
Solution: A new cache
layer
 And the back-end?
 Exove has a module coming out to help get
  grouped and cached JSON outputs fast from
  Views
 It’s not something to be used for integrations but
  just for the faster cache layer
 It’s going to be released this fall, along with a site
  using it
 Until then, just use Views and Views datasource
SO, WAIT A MINUTE

THESE ARE ALL HACKS,
RIGHT?
Not quite.
Drupal doing high
performance
 You can’t really use Drupal for high performance
  out of the box
 Hacks, or rather extensions, are needed; if
  done as proper contribs, they are safe and
  convenient to use
 Drupal has been made extensible for this exact
  reason: it can be made better by extending it
What would we like to see
in Drupal 8?
 We’d like to see real JSON output from
  Drupal, preferably piece by piece
 We’d also like to see a thinner bootstrap with
  lazy-loading for pretty much everything
 A REST interface for doing more stuff in the front end,
  e.g. with JS frameworks
You can see a pattern here. This is all
covered by the WSCCI and SCOTCH
initiatives. We’re waiting for Drupal 8 to
be a lot better.

And Cache Control is going to rock on Drupal 8.
and there are always going to be hacks
to get Drupal to do more
THANK YOU FOR YOUR
TIME


PS. We’re hiring. www.exove.fi/careers
