Parsing Strange: URL to SQL to HTML
Hal Stern
snowmanonfire.com
slideshare.net/freeholdhal
Headshot by Richard Stevens, http://dieselsweeties.com

    © 2010 Hal Stern
Some Rights Reserved
Why Do You Care?
•    Database performance = user experience
•    A little database expertise goes a long way
•    Taxonomies for more than sidebar lists
•    Custom post types (!!)
•    WordPress is a powerful CMS
      > Change default behaviors
      > Defy the common wisdom
      > Integrate other content sources/filters
    © 2010 Hal Stern
Some Rights Reserved                WordCamp Boulder
Flow of Control
•  Web server URL manipulation
      > Real file or permalink URL?
•  URL to query variables
      > What to display? Tag? Post? Category?
•  Query variables to SQL generation
      > How exactly to get that content?
•  Template file selection
      > How will content be displayed?
•  Content manipulation

Whose File Is This?
•  User URL request passed to web server
•  Web server checks .htaccess file
      > WP install root
      > Other .htaccess files may interfere

   <IfModule mod_rewrite.c>
   RewriteEngine On
   RewriteBase /whereyouputWordPress/
   RewriteCond %{REQUEST_FILENAME} !-f
   RewriteCond %{REQUEST_FILENAME} !-d
   RewriteRule . /index.php [L]
   </IfModule>
•  Basic rewriting rules:
   If file or directory URL doesn’t exist, start
   WordPress via index.php
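
The two RewriteCond tests amount to a simple decision, sketched here in PHP rather than Apache config. This is only an illustration: mod_rewrite does the real work, and the function name ps_route_request and the paths passed to it are hypothetical.

```php
<?php
// Illustrative only: mimics the two RewriteCond tests (!-f and !-d).
// ps_route_request() is a hypothetical name, not a WordPress function.
function ps_route_request(string $docroot, string $url_path): string
{
    $root      = rtrim($docroot, '/');
    $candidate = $root . '/' . ltrim($url_path, '/');

    // A real file or directory is served directly by the web server.
    if (is_file($candidate) || is_dir($candidate)) {
        return $candidate;
    }

    // Anything else is assumed to be a permalink and is handed to WordPress.
    return $root . '/index.php';
}
```

Any path that does not exist on disk, including a mistyped image URL, ends up at index.php.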
Example Meta Fail: 404 Not Found
myblog/
myblog/wp-content (etc)
myblog/images


•  Access broken image URLs for
   unintended results: no 404 pages!
   myblog/images/not-a-pic.jpg!
•  Web server can’t find file, assumes it’s a
   permalink, hands to WP
•  WP can’t interpret it, so defaults to home
What Happens Before The Loop
•    Parse URL into a query
•    Set conditionals & select templates
•    Execute the query & cache results
•    Run the Loop:
     <?php
     if (have_posts()) :
        while (have_posts()) :
         the_post();
         //loop content
        endwhile;
     endif;
     ?>

Examining the Query String
<?php
   global $wp_query;
   echo "SQL for this page: ";
   echo $wp_query->request;   // SQL string of the main query
   echo "<br>";
?>


•  SQL passed to MySQL in WP_Query
   object’s request element
•  Brute force: edit theme footer.php to
   see main loop’s query for displayed page
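
A less invasive variant is to hook the footer instead of editing footer.php. The sketch below assumes it lives in a theme's functions.php or a small plugin; the ps_ function names are hypothetical, while add_action, current_user_can and the wp_footer hook are standard WordPress.

```php
<?php
// Hypothetical helper: formats the SQL string for display.
function ps_format_sql_debug(string $sql): string
{
    return '<pre>SQL for this page: ' . htmlspecialchars($sql) . '</pre>';
}

// Hypothetical callback: prints the main query's SQL, for administrators only.
function ps_print_main_query(): void
{
    global $wp_query;
    if (current_user_can('manage_options') && isset($wp_query->request)) {
        echo ps_format_sql_debug($wp_query->request);
    }
}

// Register only when WordPress is loaded, so the file also parses standalone.
if (function_exists('add_action')) {
    add_action('wp_footer', 'ps_print_main_query');
}
```

Limiting the output to administrators keeps the SQL out of pages served to visitors.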
“Home Page” Query Deconstruction

SELECT SQL_CALC_FOUND_ROWS wp_posts.* FROM wp_posts WHERE 1=1
AND wp_posts.post_type = 'post' AND
(wp_posts.post_status = 'publish' OR
 wp_posts.post_status = 'private')
ORDER BY wp_posts.post_date DESC LIMIT 0, 10


Get all fields from the posts table; SQL_CALC_FOUND_ROWS also counts the total matches, ignoring LIMIT (useful for paging)

Only get posts, and only those that are published or private to the user

Sort the results by date in descending order

Return up to 10 results, starting with record 0
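
The LIMIT pair follows from the page number and the posts-per-page setting. A minimal sketch, assuming 10 posts per page as in the query above; ps_limit_clause is a hypothetical helper, not a WordPress function.

```php
<?php
// Hypothetical helper: maps a page number to "LIMIT offset, count".
function ps_limit_clause(int $paged, int $posts_per_page): string
{
    $page   = max(1, $paged);                 // page 0 or less is treated as page 1
    $offset = ($page - 1) * $posts_per_page;  // rows to skip
    return "LIMIT $offset, $posts_per_page";
}
```

Page 1 gives LIMIT 0, 10, which matches the home page query; page 3 gives LIMIT 20, 10.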



Query Parsing
•  parse_request() method of the WP class
   extracts query variables from URL
•  Execute rewrite rules
      > Pick off ?p=67 style HTTP GET variables
      > Match permalink structure
      > Match keywords like “author” and “tag”
      > Match custom post type slugs
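
Rewrite matching can be pictured as a table of regular expressions, each mapping captured groups to query variables. The sketch below is a simplified stand-in, not the core implementation: the rule table and the ps_parse_path helper are hypothetical, although tag, author_name, year and name are real WordPress query variables.

```php
<?php
// Simplified, hypothetical rule table: pattern => (query variable => capture group).
// Real rules live in the rewrite_rules option and are far more numerous.
$ps_rules = array(
    'tag/([^/]+)/?$'        => array('tag' => 1),
    'author/([^/]+)/?$'     => array('author_name' => 1),
    '([0-9]{4})/([^/]+)/?$' => array('year' => 1, 'name' => 2),
);

function ps_parse_path(string $path, array $rules): array
{
    $path = trim($path, '/');
    foreach ($rules as $pattern => $map) {
        if (preg_match('#^' . $pattern . '#', $path, $matches)) {
            $vars = array();
            foreach ($map as $var => $group) {
                $vars[$var] = $matches[$group];
            }
            return $vars; // first matching rule wins
        }
    }
    return array(); // no rule matched
}
```

With these rules, /tag/premio yields tag=premio and /2010/premio-sausage yields year=2010 plus name=premio-sausage. Rule order matters because the first match wins.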


Query Variables to SQL
•  Query type: post by title, posts by category
   or tag, posts by date
•  Variables for the query
      > Slug values for category/tags
      > Month/day numbers
      > Explicit variable values
        ?p=67 for post_id
•  post_type variable has been around for
   a while; custom post types (CPTs) fill in new values
Simple Title Slug Parsing
/2010/premio-sausage




     SELECT wp_posts.* FROM wp_posts WHERE 1=1
     AND YEAR(wp_posts.post_date)='2010'
     AND wp_posts.post_name = 'premio-sausage'
     AND wp_posts.post_type = 'post'
     ORDER BY wp_posts.post_date DESC

•  Rewrite matches root of permalink,
   extracts tail of URL as a title slug

Graphs and JOIN Operations
•  WordPress treats tags and categories as
   “terms”, mapped 1:N to posts
•  Relational databases aren’t ideal for this
      > INNER JOIN builds intermediate tables on
        common key values
•  Following link in a social graph is
   equivalent to an INNER JOIN on tables of
   linked items

WordPress Taxonomy Tables

wp_posts         wp_term_relationships    wp_term_taxonomy     wp_terms
ID (post id)     object_id                term_taxonomy_id     term_id
…                term_taxonomy_id         term_id              name
post_date                                 taxonomy             slug
…                                         description
post_content

•  Term relationships table maps
   N:1 terms to each post
•  Term taxonomy maps slugs
   1:N to taxonomies
•  Term table has slugs for URL
   mapping
Taxonomy Lookup
  /tag/premio




SELECT SQL_CALC_FOUND_ROWS wp_posts.* FROM wp_posts
INNER JOIN wp_term_relationships ON
(wp_posts.ID = wp_term_relationships.object_id)
INNER JOIN wp_term_taxonomy ON
  (wp_term_relationships.term_taxonomy_id =
   wp_term_taxonomy.term_taxonomy_id)
INNER JOIN wp_terms ON
   (wp_term_taxonomy.term_id = wp_terms.term_id)
WHERE 1=1 AND wp_term_taxonomy.taxonomy = 'post_tag'
  AND wp_terms.slug IN ('premio')
  AND wp_posts.post_type = 'post'
  AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'private')
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_date DESC LIMIT 0, 10
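
To see what the three INNER JOINs do, the same lookup can be walked by hand over tiny in-memory tables. Everything here is made up for illustration: the rows, the IDs, and the ps_post_ids_for_tag helper. MySQL uses indexes rather than nested loops like these.

```php
<?php
// Hypothetical in-memory rows standing in for the four tables.
$ps_posts = array(
    array('ID' => 1, 'post_name' => 'premio-sausage'),
    array('ID' => 2, 'post_name' => 'grilling-tips'),
);
$ps_rels = array(
    array('object_id' => 1, 'term_taxonomy_id' => 10),
    array('object_id' => 2, 'term_taxonomy_id' => 11),
);
$ps_tax = array(
    array('term_taxonomy_id' => 10, 'term_id' => 100, 'taxonomy' => 'post_tag'),
    array('term_taxonomy_id' => 11, 'term_id' => 101, 'taxonomy' => 'category'),
);
$ps_terms = array(
    array('term_id' => 100, 'slug' => 'premio'),
    array('term_id' => 101, 'slug' => 'recipes'),
);

// Nested loops play the role of the three INNER JOINs; the innermost
// test is the WHERE clause on taxonomy and slug.
function ps_post_ids_for_tag(string $slug, array $posts, array $rels, array $tax, array $terms): array
{
    $ids = array();
    foreach ($posts as $p) {
        foreach ($rels as $r) {
            if ($r['object_id'] !== $p['ID']) { continue; }
            foreach ($tax as $tt) {
                if ($tt['term_taxonomy_id'] !== $r['term_taxonomy_id']) { continue; }
                foreach ($terms as $t) {
                    if ($t['term_id'] === $tt['term_id']
                        && $tt['taxonomy'] === 'post_tag'
                        && $t['slug'] === $slug) {
                        $ids[$p['ID']] = true; // keyed by ID, like GROUP BY wp_posts.ID
                    }
                }
            }
        }
    }
    return array_keys($ids);
}
```

Looking up premio returns post 1 only. Looking up recipes returns nothing, because that slug belongs to the category taxonomy, not post_tag.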

More on Canonical URLs
•  Canonical URLs improve SEO
•  WordPress is really good about generating
   301 Redirects for non-standard URLs
•  Example: when a URL doesn’t match a
   permalink, WordPress tries to guess the intended post
      > Uses “LIKE 'title%'” in the WHERE clause
      > Matches “title” as initial substring with %
        wildcard
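
The prefix match can be mimicked outside the database. This sketch only imitates the effect of LIKE 'title%' on a list of slugs; ps_guess_slug and the slug list are hypothetical, and unlike MySQL's default collations the comparison here is case-sensitive.

```php
<?php
// Hypothetical stand-in for "WHERE post_name LIKE 'title%'":
// returns the first known slug that starts with the requested text.
function ps_guess_slug(string $requested, array $known_slugs): ?string
{
    if ($requested === '') {
        return null;
    }
    foreach ($known_slugs as $slug) {
        if (strncmp($slug, $requested, strlen($requested)) === 0) {
            return $slug; // candidate target for a 301 redirect
        }
    }
    return null; // nothing to guess; the request ends as a 404
}
```

A request for premio would be redirected to premio-sausage if that is the first slug with the matching prefix.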

Modifying the Query
•  Brute force isn’t necessarily good
      > Using query_posts() ignores all previous
        parsing, runs a new SQL query
•  Filter query_vars
      > Change default parsing (convert any day to a
        week’s worth of posts, for example)
•  Actions parse_query & parse_request
      > Access WP_Query object before execution
      > is_xx() conditionals are already set
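
As one way to change parsing without a second query, the sketch below uses the request filter, which receives the parsed query variables as an array. Note that this is a different hook from the ones named above, picked because it keeps the example small. The callback name is hypothetical; category_name, orderby and order are standard query variables.

```php
<?php
// Hypothetical callback: list category archives alphabetically by title.
function ps_alphabetical_categories(array $query_vars): array
{
    if (!empty($query_vars['category_name'])) {
        $query_vars['orderby'] = 'title';
        $query_vars['order']   = 'ASC';
    }
    return $query_vars;
}

// Register only when WordPress is loaded, so the file also parses standalone.
if (function_exists('add_filter')) {
    add_filter('request', 'ps_alphabetical_categories');
}
```

Because the variables are changed before the main query runs, only one SQL query is issued, unlike a query_posts() call in the template.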

SQL Generation Filters
•  posts_where
      > More explicit control over query variable to
        SQL grammar mapping
•  posts_join
      > Add or modify JOINs for other graph-like
        relationships
•  Many other filters
      > Change grouping of results
      > Change ordering of results
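
A posts_where callback receives the WHERE fragment as a string and returns a modified one. A minimal sketch, assuming the default wp_ table prefix used throughout these slides; the function name and the cutoff date are hypothetical.

```php
<?php
// Hypothetical callback: keep only posts published on or after a cutoff date.
// $cutoff is a parameter only for illustration; WordPress passes just $where
// unless more arguments are requested in add_filter().
function ps_where_since(string $where, string $cutoff = '2010-01-01'): string
{
    // Accept only YYYY-MM-DD so nothing unexpected reaches the SQL string.
    if (!preg_match('/^\d{4}-\d{2}-\d{2}$/', $cutoff)) {
        return $where;
    }
    return $where . " AND wp_posts.post_date >= '" . $cutoff . " 00:00:00'";
}

// Register only when WordPress is loaded, so the file also parses standalone.
if (function_exists('add_filter')) {
    add_filter('posts_where', 'ps_where_since');
}
```

As written, the filter applies to every query that runs through WP_Query, so a real plugin would usually check which query it is modifying first.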

Custom Post Types
•  Change SQL WHERE clause on post type
      > wp_posts.post_type='ebay'
•  Add new rewrite rules for URL parsing similar
   to category & tag
      > Set slug in CPT registration array
           'rewrite' => array('slug' => 'ebay'),
•  Watch out for competing, overwritten or
   unflushed rewrite entries
    <?php
    // Dump the stored rewrite rules to check for conflicts
    echo "<pre>";
    print_r(get_option('rewrite_rules'));
    echo "</pre>";
    ?>
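
For context, a registration call for the ebay post type might look like the sketch below. The label and the ps_ function names are assumptions; register_post_type, the init hook and the rewrite slug argument are standard WordPress.

```php
<?php
// Hypothetical helper: builds the registration array for an "ebay" post type.
function ps_ebay_post_type_args(): array
{
    return array(
        'label'   => 'eBay Listings',          // assumed display name
        'public'  => true,
        'rewrite' => array('slug' => 'ebay'),  // URL slug, as on the slide
    );
}

// Hypothetical callback: registers the post type on the init hook.
function ps_register_ebay_post_type(): void
{
    register_post_type('ebay', ps_ebay_post_type_args());
}

// Register only when WordPress is loaded, so the file also parses standalone.
if (function_exists('add_action')) {
    add_action('init', 'ps_register_ebay_post_type');
}
```

Rewrite rules are cached in the rewrite_rules option, so a new slug typically does not take effect until the rules are flushed once, for example by re-saving the permalink settings or calling flush_rewrite_rules() on plugin activation.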
Applications
•  Stylized listings
      > Category sorted alphabetically
      > Use posts as listings of resources (jobs,
        clients, events) – good CPT application
•  Custom URL slugs
      > Add rewrite rules to match slug and set query
        variables
•  Joining other social graphs
      > Suggested/related content

Template File Selection
•  is_x() conditionals set in query parsing
•  Used to drive template selection
      > is_tag() looks for tag-slug, tag-id, then tag
      > Full search hierarchy in Codex
•  template_redirect action
      > Called in the template loader
      > Add actions to override defaults
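
The search order for a tag archive can be sketched as a list of candidate files tried in turn. Both helpers are hypothetical and only imitate the template loader; the archive.php and index.php fallbacks come from the Codex template hierarchy.

```php
<?php
// Hypothetical helper: candidate template files for a tag archive,
// most specific first.
function ps_tag_template_candidates(string $slug, int $id): array
{
    return array("tag-$slug.php", "tag-$id.php", 'tag.php', 'archive.php', 'index.php');
}

// Hypothetical helper: returns the first candidate found in the theme
// directory, or null if none of them exists.
function ps_pick_template(array $candidates, string $theme_dir): ?string
{
    foreach ($candidates as $file) {
        if (is_file(rtrim($theme_dir, '/') . '/' . $file)) {
            return $file;
        }
    }
    return null;
}
```

A theme that ships only tag.php and index.php would render /tag/premio with tag.php.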


HTML Generation
•  Starts with the_post(), which sets up the
   current post for template tags like the_content()
•  Raw content retrieved from MySQL
      > Shortcodes interpreted
      > CSS applied
•  Some caching plugins generate and store
   HTML, so YMMV



Why Do You Care?
•  User experience improvement
      > JOINS are expensive
      > Large table, repetitive SELECTs = slow
      > Running query once keeps cache warm
      > Category, permalink, title slug choices matter
•  More CMS, less “blog”
      > Alphabetical sort
      > Adding taxonomy/social graph elements

Resources

•  Core files where SQL stuff happens
      > query.php
      > post.php
      > canonical.php
      > rewrite.php
•  Template loader search path
      > http://codex.wordpress.org/Template_Hierarchy



Contact

Hal Stern
freeholdhal@gmail.com
@freeholdhal
snowmanonfire.com
facebook.com/hal.stern

slideshare.net/freeholdhal




