- 1. LOG FILE ANALYSIS
The most powerful tool in your SEO toolkit
Tom Bennet
Consultant, Builtvisible
@tomcbennet
- 4. What is a log file?
A record of all hits that a server has received – humans and robots.
http://www.brightonseo.com/about/
1. Protocol
2. Host name
3. File name
Host name -> IP Address via DNS -> Connection to Server ->
HTTP GET Request via Protocol for File -> HTML to Browser
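The three URL components above map directly onto Python's standard `urlparse`, using the slide's own example URL, a quick way to sanity-check how a request URL decomposes:

```python
from urllib.parse import urlparse

# The example URL from the slide.
url = "http://www.brightonseo.com/about/"
parts = urlparse(url)

print(parts.scheme)   # 1. Protocol  -> http
print(parts.netloc)   # 2. Host name -> www.brightonseo.com
print(parts.path)     # 3. File name -> /about/
```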
- 6. …but they’re very powerful.
188.65.114.122 - - [30/Sep/2013:08:07:05 -0400] "GET /resources/whitepapers/retail-whitepaper/ HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Client IP (the requester – here, Googlebot)
Timestamp (date & time)
Method (GET / POST)
Request URI
HTTP status code
User-agent
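A minimal sketch of pulling those fields out programmatically, a regex written to match this sample line's layout (your server's log format may add or reorder fields, e.g. a response-size column):

```python
import re

# The sample log line from the slide.
line = ('188.65.114.122 - - [30/Sep/2013:08:07:05 -0400] '
        '"GET /resources/whitepapers/retail-whitepaper/ HTTP/1.1" 200 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

# Named groups for each field called out above.
pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '            # client IP, identd, userid
    r'\[(?P<timestamp>[^\]]+)\] '      # timestamp (date & time)
    r'"(?P<method>\S+) (?P<uri>\S+) \S+" '  # method, request URI, protocol
    r'(?P<status>\d{3}) "[^"]*" '      # HTTP status code, referrer
    r'"(?P<agent>[^"]*)"'              # user-agent
)
hit = pattern.match(line)
print(hit.group('ip'), hit.group('method'), hit.group('uri'), hit.group('status'))
```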
- 8. What is Crawl Budget?
Crawl Budget = The number of URLs crawled on each visit to your site.
Higher Authority = Higher Crawl Budget
- 9. Crawl Budget Utilisation
http://example.com/thin-product-page-1
http://example.com/category/thin-product-page-1
http://example.com/category/subcategory/thin-product-page-1
http://example.com/category/subcategory/thin-product-page-1?colour=blue
Etc…
Conservation of crawl budget is key.
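The waste above can be spotted in code: group crawled URLs by their final path segment (ignoring query strings) and flag pages reachable at several addresses. The URLs are the hypothetical ones from the slide:

```python
from urllib.parse import urlsplit
from collections import Counter

# The slide's hypothetical duplicates: one thin page, four addresses.
crawled = [
    "http://example.com/thin-product-page-1",
    "http://example.com/category/thin-product-page-1",
    "http://example.com/category/subcategory/thin-product-page-1",
    "http://example.com/category/subcategory/thin-product-page-1?colour=blue",
]

# Key each hit by its final path segment; urlsplit().path drops the query string.
pages = Counter(urlsplit(u).path.rstrip("/").rsplit("/", 1)[-1] for u in crawled)

for page, hits in pages.items():
    if hits > 1:
        print(f"{page}: {hits} URL variants consuming crawl budget")
```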
- 11. Preparing Your Data
Extraction: Varies by server. See accompanying guide.
Filter: Keep only Googlebot user-agent hits, then verify the IPs really belong to Google – user-agent strings can be spoofed. https://support.google.com/webmasters/answer/80553?hl=en
Tools: Gamut and Splunk are great, but you can’t beat Excel.
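A sketch of the IP validation step, using the double-DNS check Google documents at the link above: reverse-resolve the IP, check the hostname suffix, then forward-resolve and confirm the IP matches. (Needs network access to actually run against live IPs.)

```python
import socket

def has_google_host_suffix(host):
    # Verified Google crawlers reverse-resolve to *.googlebot.com or *.google.com.
    return host.endswith((".googlebot.com", ".google.com"))

def is_verified_googlebot(ip):
    """Double-DNS check: reverse lookup, suffix check, forward lookup."""
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    if not has_google_host_suffix(host):
        return False
    try:
        # Forward-resolve the hostname; the original IP must be among its addresses.
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False
```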
- 12. Working in Excel
1. Convert .log to .csv
(cool tip: just change the file extension)
- 13. Working in Excel
2. Sample size
(60-120k Googlebot requests / rows is a good size)
- 14. Working in Excel
3. Text-to-columns
(a space will usually be a suitable delimiter)
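The same split can be previewed in Python: `shlex.split` honours quotes the way Excel's text-to-columns does with a quote text-qualifier, so the request and user-agent stay in one column while the bracketed timestamp splits in two, just as it does in Excel. The log line is a shortened hypothetical:

```python
import shlex

# Shortened hypothetical log line.
line = ('188.65.114.122 - - [30/Sep/2013:08:07:05 -0400] '
        '"GET /about/ HTTP/1.1" 200 "-" "Googlebot/2.1"')

# Space-delimited split that keeps quoted fields intact.
fields = shlex.split(line)
print(fields)
```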
- 17. Most vs Least Crawled
Formula: Use COUNTIF on Request URL.
Tip: Extract top-level category for crawl distribution by site-section.
http://www.brightonseo.com/speakers/person-name/
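The COUNTIF and the top-level-category tip translate to `collections.Counter`, shown here on a few hypothetical request URIs:

```python
from collections import Counter

# Hypothetical request URIs from the filtered Googlebot rows.
uris = [
    "/speakers/person-name/",
    "/speakers/another-person/",
    "/about/",
    "/speakers/person-name/",
]

# COUNTIF equivalent: hits per request URL.
per_url = Counter(uris)

# Crawl distribution by site-section: count by top-level path segment.
per_section = Counter(u.strip("/").split("/")[0] for u in uris)

print(per_url.most_common(1))   # most-crawled URL
print(per_section)              # crawl share per section
```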
- 18. Crawl Frequency Over Time
Formula: Pivot date against count of requests.
Tip: Segment by site section or by user-agent (Googlebot Mobile, Images, Video, etc.).
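The pivot (date against request count, segmented by user-agent) can be sketched the same way, on hypothetical (date, user-agent) pairs pulled from the timestamp and user-agent columns:

```python
from collections import Counter

# Hypothetical (date, user-agent) pairs from the parsed log.
hits = [
    ("30/Sep/2013", "Googlebot"),
    ("30/Sep/2013", "Googlebot-Image"),
    ("01/Oct/2013", "Googlebot"),
    ("01/Oct/2013", "Googlebot"),
]

# Pivot-table equivalent: requests per day, then per (day, user-agent) pair.
per_day = Counter(date for date, _ in hits)
per_day_agent = Counter(hits)

print(per_day)
print(per_day_agent[("01/Oct/2013", "Googlebot")])
```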
- 19. HTTP Response Codes
Formula: Total up HTTP Response Codes.
Tip: Find most common 302s or 404s, filter by code and sort by URL occurrence.
- 21. Level Up
Robots.txt – Crawl all URLs with Screaming Frog to determine if they are blocked in robots.txt. Investigate most frequently crawled.
Faceted Nav Issues – Dedupe requested URLs into a list of unique resources, then sort by times requested.
Sitemap – Add your sitemap URLs into an Excel table, VLOOKUP against your logs. Which mapped URLs are crawl deficient?
CSS / JS – These resources should be crawlable, but are files that aren't needed for rendering absorbing an inordinate amount of crawl budget?
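The sitemap VLOOKUP above is a set difference in Python: mapped URLs that never appear in the logs are crawl deficient. The URL sets here are hypothetical placeholders:

```python
# Hypothetical URL sets: sitemap entries vs. URLs Googlebot actually requested.
sitemap_urls = {"/a/", "/b/", "/c/"}
crawled_urls = {"/a/", "/c/", "/d/"}

# VLOOKUP equivalent: mapped URLs with no log entries are crawl deficient.
crawl_deficient = sitemap_urls - crawled_urls
print(sorted(crawl_deficient))  # ['/b/']
```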
- 22. Top Level Crawl Waste
Formula: Use IF statements to check for every cause of waste.
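The nested-IF approach can be sketched as a small classifier; the rules here are hypothetical examples, extend them with whatever waste patterns your own site exhibits:

```python
def waste_reason(uri, status):
    """Return the first matching cause of crawl waste, or None if the hit is fine.
    Hypothetical example rules mirroring a chain of Excel IF statements."""
    if status in ("404", "410"):
        return "broken URL"
    if status in ("301", "302"):
        return "redirect"
    if "?" in uri:
        return "parameter URL"
    return None

print(waste_reason("/old/", "404"))           # broken URL
print(waste_reason("/p?colour=blue", "200"))  # parameter URL
```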
- 25. THANKS FOR LISTENING
Get in touch
e: tom@builtvisible.com
t: @tomcbennet