Sam Marsden
Cut the Crap: Next Level Content Audits With Crawlers
@sam_marsden
https://www.slideshare.net/DeepCrawl
About me...
SEO & Content Manager at DeepCrawl
Last year I started at DeepCrawl...
Soon after we received Series A funding...
@sam_marsden BrightonSEO
Very
happy
CEO!
This meant we could scale up...
Funds were made
available for a
site redesign
A website redesign is a long process
Source: http://ezsitecms.com/services/website-redesign/
Source: https://juliandontcheff.files.wordpress.com/2014/05/migration.png
...and we needed to move
to a new CMS
...because we were suffering from plugin bloat
And we needed to
manually re-enter existing content
...so we only wanted
to migrate the content
that we needed
A content
audit was in
order!
Source: http://www.williamsf1.com/racing/gallery/video/thecoolestpitstop
How can we do this in a thorough
but time-efficient way?
What is a content
audit?
What do the search results say?
We want
a more
data driven
approach...
What guides are out there?
We need a fresh approach...
Comprehensive
Time-saving
Replicable
Content auditing is
like a spring clean
First you need to find all
the crap you have
hidden away in your
home.
≈
Discovering all of
your URLs
What’s your reasoning behind
what will go?
≈
Creating a set of criteria for
judging content performance
Making the call on what gets binned?
What stays?
What gets a new lease of life?
≈
Deciding what to do with your pages
A Crawl-centred
Approach
The Discovery Phase
Aim: To discover all existing URLs
Other guides
say to export
pages...
Other guides suggest crawling...
Source: https://i0.wp.com/www.obstacleraceworld.com/wp-content/uploads/2014/09/crawl-under-electrified-fence.jpg?ssl=1
Source: http://3.bp.blogspot.com/
Great idea
BUT…
Limited
view of
data
Other guides
say to export
data from 3rd
party tools…
BUT… Joining the data is
laborious and
time-consuming
Here’s where a cloud
crawler can help…
Not limited by scale
Can seamlessly
integrate
multiple data
sources...
Crawl data needs to
be at the
centre of your audit,
not just
a single source.
Running a crawl
Extracting on-page data
Use custom extractions to pull out:
● Author bylines
● Published & last modified date
● Structured and meta data
● Wording and phrasing
● Image alt tags
● Tagging & tracking
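As a rough illustration of what a custom extraction does, the sketch below pulls an author byline, a published date and an image alt attribute out of page HTML with regular expressions. The markup, the class name and the patterns are all hypothetical. Real extraction rules depend on your templates and on your crawler's own syntax.

```python
import re

# Hypothetical page markup; real templates will differ.
html = """
<article>
  <span class="author">Jane Doe</span>
  <meta property="article:published_time" content="2017-06-24T09:00:00+00:00">
  <img src="a.png" alt="Crawl report">
</article>
"""

# Assumed patterns, one per field we want in the inventory.
PATTERNS = {
    "author": r'class="author">([^<]+)<',
    "published": r'article:published_time"\s+content="([^"]+)"',
    "image_alt": r'<img[^>]*\salt="([^"]*)"',
}

def extract(page_html):
    """Return the first match for each pattern, or None if absent."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, page_html)
        result[field] = match.group(1) if match else None
    return result

fields = extract(html)
print(fields)
```

Fields that come back as None are worth flagging in the inventory, since a missing byline or date is itself an audit finding.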
Take your dataset…
Whittle down to
useful data, so
you can start
making
decisions
Cut that
sheet down
to size
You should be left with:
● Page descriptors - URLs and page titles
● Page attributes - word count, published & last modified date, links, duplicates, categories, author
● Performance metrics - backlinks, social shares, traffic, SERPs, impressions, engagement metrics
With your inventory in place,
auditing can be streamlined and efficient.
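Joining sources by URL is the step that turns separate exports into one inventory. Here is a minimal sketch, assuming each source has already been loaded into Python and keyed by URL. The field names and sample rows are invented for illustration, and a cloud crawler with integrations would do this join for you.

```python
# Hypothetical exports, each keyed by URL.
crawl = [
    {"url": "/blog/a", "title": "Post A", "word_count": 1200},
    {"url": "/blog/b", "title": "Post B", "word_count": 300},
]
analytics = {"/blog/a": {"pageviews": 950}}
backlinks = {"/blog/a": {"backlinks": 12}, "/blog/b": {"backlinks": 1}}

def build_inventory(crawl_rows, *sources):
    """Left-join each source onto the crawl rows by URL."""
    inventory = []
    for row in crawl_rows:
        merged = dict(row)  # copy so the crawl export is left untouched
        for source in sources:
            merged.update(source.get(row["url"], {}))
        inventory.append(merged)
    return inventory

inventory = build_inventory(crawl, analytics, backlinks)
for page in inventory:
    print(page)
```

The crawl rows drive the join, so every discovered URL stays in the inventory even when a third-party source has no data for it.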
What are we going to cover?
Four questions you can answer with your
content inventory
Four examples of insights to inform your
content strategy
Ways to automate the content auditing process
Four Key Questions
You’ll Want to Answer
Question 1:
What is and isn’t performing well?
Defining a set of criteria to judge content performance
Performance depends on the nature of the site.
A news site that generates revenue through ad
impressions will define successful content
differently from a B2B site that provides a
niche service.
You may also have different expectations of content
performance depending on the content type.
Mass appeal vs. targeted content.
In the DeepCrawl content audit…
Content performance assessed based on:
● Unique pageviews
● Share count
● Backlink count
● Page value*
*Inclusion relies on correct goal implementation
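One way to turn these four metrics into a single yardstick is to count how many of them sit above the site median for each page. This is only a sketch: the median rule, the field names and the sample figures are assumptions, not the criteria used in the DeepCrawl audit.

```python
from statistics import median

METRICS = ["unique_pageviews", "shares", "backlinks", "page_value"]

# Hypothetical inventory rows.
pages = [
    {"url": "/a", "unique_pageviews": 900, "shares": 40, "backlinks": 12, "page_value": 3.5},
    {"url": "/b", "unique_pageviews": 120, "shares": 2, "backlinks": 0, "page_value": 0.0},
    {"url": "/c", "unique_pageviews": 400, "shares": 15, "backlinks": 3, "page_value": 1.0},
]

def score_pages(rows):
    """Count, per page, how many metrics sit above the site median."""
    medians = {m: median(r[m] for r in rows) for m in METRICS}
    return {r["url"]: sum(r[m] > medians[m] for m in METRICS) for r in rows}

scores = score_pages(pages)
print(scores)
```

A relative rule like this adapts to the size of the site, whereas fixed thresholds need re-tuning for each site and content type.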
https://mylearningsolutions.org/2014/08/13/five-
Question 2:
How can you deal
with content that isn’t
performing well?
Adding an ‘Action’ column
In your spreadsheet you’ll want to create an ‘Action’ column and
add in four options:
● Keep
● Cut
● Combine
● Convert
In DeepCrawl’s case...
We knew there was a lot of outdated content no longer providing value.
We could afford to be cut-throat and only keep content that:
● Had a publish date within the last year.
● Or had a specified volume of traffic from
Analytics or impressions from GSC Search Analytics.
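The keep rule above can be sketched as a function that fills in the ‘Action’ column. The thresholds are assumptions, since the slides only say "a specified volume", and the field names and fixed reference date are invented so the example is reproducible.

```python
from datetime import date, timedelta

# Assumed thresholds; the talk does not give the actual numbers.
MIN_SESSIONS = 100
MIN_IMPRESSIONS = 1000

def choose_action(page, today):
    """Return 'Keep' for recent or well-trafficked pages, else 'Cut'."""
    recent = page["published"] >= today - timedelta(days=365)
    seen = (page["sessions"] >= MIN_SESSIONS
            or page["impressions"] >= MIN_IMPRESSIONS)
    return "Keep" if recent or seen else "Cut"

today = date(2018, 4, 27)  # arbitrary fixed date for the example
pages = [
    {"url": "/new", "published": date(2018, 1, 10), "sessions": 5, "impressions": 50},
    {"url": "/old-popular", "published": date(2015, 3, 1), "sessions": 450, "impressions": 9000},
    {"url": "/old-quiet", "published": date(2015, 3, 1), "sessions": 3, "impressions": 20},
]
actions = {p["url"]: choose_action(p, today) for p in pages}
print(actions)
```

Combine and Convert are left out on purpose: they usually need a human to read the page, so this rule only makes the first pass.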
For pages where
you aren’t sure
about what
action to take...
Is the page being seen in
search and receiving traffic?
Is the page actually bringing
value to the site?
£££ $$$ €€€
Does the page exude
Expertise,
Authoritativeness
& Trustworthiness?
http://theleagueam.com/2017/06/24/coaching/
Question 3:
How can you get the
most out of content that
is performing well?
Source: https://balancedcarend.com/2013/11/21/healthy-holiday/squirrel-nut/
Filter your
spreadsheet by
what you want
to keep...
...and examine ways you can maximise
the value of your top performing content...
Start using a fuller breadth of the data you’ve pulled in.
Key areas to focus on for content optimisation:
● Optimising titles & meta descriptions
● Keyword cannibalisation
● Duplication issues
● Internal & external linking
● Page speed
● Structured data
● Tag pages
Question 4:
How can you use this data to
inform your
content strategy?
Source: https://www.freepik.com/premium-photo/empty-piggy-bank_1568162.htm
We need
data driven
insights…
because content
marketing
resources are
finite...
http://blog.peerform.com/will-banks-survive-competition-from-alternative-financial-markets/
And we need to ensure resources are
invested in more of what works...
Achieving this is all about
finding relationships...
Source: http://hopesrising.com/?p=5677
Useful for larger sites
where page-by-page
assessment isn’t possible
Let’s look at some relationships
which may be of interest...
Tool of choice: the pivot table
1. Performance by channel/category/content type
Do some types of content perform better than others?
Group content into categories and look in terms of
performance (views, shares, backlinks) and volume of
production (no. articles published).
Are you allocating content efforts efficiently?
Is time, money and effort being spent on the right
types of content?
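The same grouping a spreadsheet pivot table performs can be sketched in a few lines. The categories and pageview figures below are made up, and pageviews stand in for whichever performance metric you care about.

```python
from collections import defaultdict

# Hypothetical inventory rows.
pages = [
    {"category": "guides", "pageviews": 1200},
    {"category": "guides", "pageviews": 800},
    {"category": "news", "pageviews": 150},
    {"category": "news", "pageviews": 50},
    {"category": "news", "pageviews": 100},
]

def pivot_by_category(rows):
    """Total articles and pageviews per category, plus views per article."""
    totals = defaultdict(lambda: {"articles": 0, "pageviews": 0})
    for row in rows:
        bucket = totals[row["category"]]
        bucket["articles"] += 1
        bucket["pageviews"] += row["pageviews"]
    return {
        cat: {**t, "views_per_article": t["pageviews"] / t["articles"]}
        for cat, t in totals.items()
    }

summary = pivot_by_category(pages)
print(summary)
```

Putting volume of production next to performance is the point: in this invented data, news takes more articles to produce far fewer views per article.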
2. Content length and engagement
Is content length positively correlated with engagement?
If engagement doesn’t increase linearly with content length,
then resources for content production can be used more
efficiently.
Create guidelines for content length based on insights.
Select topics based on impact rather than length.
Greater awareness of time taken to create content and the
likely impact that can be expected.
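To check the relationship between length and engagement, a Pearson correlation is a reasonable first look. The sketch below computes it by hand on invented numbers, and using average time on page as the engagement metric is an assumption.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: engagement rises with length, then flattens.
word_counts = [300, 600, 900, 1200, 1500]
avg_time_on_page = [40, 75, 110, 120, 125]

r = pearson(word_counts, avg_time_on_page)
print(round(r, 3))
```

A high coefficient only says the two move together overall. A plateau like the one in this sample still yields a strong correlation, so plot the points before setting length guidelines.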
3. Is page speed harming bounce rate and conversions?
Do some pages load more
slowly than others?
Are some resource heavy?
Image optimisation
required?
Important, especially for
eCommerce, as load time
and bounce rate are closely tied
to conversion rate.
Source: https://www.branded3.com/blog/mobile-speed-experience-googles-2-4-second-sweet-spot/
4. Performance and engagement by author
How does content performance vary by author?
● Useful for sites with high turnover of content, like news sites.
● Define ranges by which to rate content performance
○ E.g. Poor, average, good, excellent based on pageviews
● Can be replicated on a weekly, monthly, quarterly basis for ongoing monitoring.
Name                 Poor  Average  Good  Excellent
Barton Haberkorn       26       64    11         60
Jacquelynn Kline       19       79     4         49
Claudette Etheredge    87       79    77         11
Sharell Phinney        73       31     8         20
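A table like this can be produced by banding each article on pageviews and counting the bands per author. The band boundaries, author names and figures below are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed pageview bands: (lower bound, label), highest first.
BANDS = [(1000, "Excellent"), (500, "Good"), (100, "Average"), (0, "Poor")]

def rate(pageviews):
    """Return the label of the first band whose lower bound is met."""
    for lower, label in BANDS:
        if pageviews >= lower:
            return label
    return "Poor"

def ratings_by_author(rows):
    """Count articles per rating band for each author."""
    table = defaultdict(Counter)
    for row in rows:
        table[row["author"]][rate(row["pageviews"])] += 1
    return table

articles = [
    {"author": "Author A", "pageviews": 1500},
    {"author": "Author A", "pageviews": 80},
    {"author": "Author B", "pageviews": 650},
    {"author": "Author B", "pageviews": 700},
]
table = ratings_by_author(articles)
for author, counts in table.items():
    print(author, dict(counts))
```

Re-running the same function on each week's or month's articles gives the ongoing monitoring described above.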
5. Performance fluctuations by publish date and time
Is content better
received on
specific days of
the week, time of
the day or months
of the year?
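Grouping by publish day is a small change to the same pivot idea. This sketch averages pageviews per weekday. The timestamp format, field names and sample posts are assumptions.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical posts with publish timestamps.
posts = [
    {"published": "2018-04-23 09:00", "pageviews": 900},  # a Monday
    {"published": "2018-04-23 17:30", "pageviews": 300},  # a Monday
    {"published": "2018-04-27 10:15", "pageviews": 200},  # a Friday
]

def average_by_weekday(rows):
    """Average pageviews grouped by the weekday of publication."""
    buckets = defaultdict(list)
    for row in rows:
        published = datetime.strptime(row["published"], "%Y-%m-%d %H:%M")
        buckets[DAYS[published.weekday()]].append(row["pageviews"])
    return {day: sum(views) / len(views) for day, views in buckets.items()}

by_day = average_by_weekday(posts)
print(by_day)
```

Swapping `published.weekday()` for `published.hour` or `published.month` gives the time-of-day and month-of-year views.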
But this is just the beginning...
From here
you want to
automate the
auditing process
Aspects
you can
automate
Crawl like clockwork
Scheduled crawling
Automatically
triggered crawls using
DeepCrawl’s Zapier integration
Creating automated rules
Automated alerts: traffic drops, broken pages
Noindex low quality UGC
...pull data into dashboards for continuous monitoring
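As an example of an automated rule, the sketch below flags URLs whose traffic fell sharply between two periods. The 50% threshold and the data shape are assumptions. In practice the figures would come from scheduled crawls and analytics exports.

```python
DROP_THRESHOLD = 0.5  # assumed: alert when traffic falls by 50% or more

def traffic_alerts(previous, current, threshold=DROP_THRESHOLD):
    """Return URLs whose sessions fell by at least `threshold`."""
    alerts = []
    for url, before in previous.items():
        after = current.get(url, 0)  # a missing URL counts as zero traffic
        if before > 0 and (before - after) / before >= threshold:
            alerts.append(url)
    return sorted(alerts)

# Hypothetical sessions per URL for two consecutive weeks.
last_week = {"/a": 200, "/b": 100, "/c": 0}
this_week = {"/a": 190, "/b": 40}

alerts = traffic_alerts(last_week, this_week)
print(alerts)
```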
To wrap up...
Question 1:
What is and isn’t performing well?
Question 2:
How can you deal with content that
isn’t performing well?
Question 3:
How can you get the most out of
content that is performing well?
Question 4:
How can you use this data to inform
your content strategy?
The content auditing process should be centred around a
cloud-based web crawling solution and be:
● Data-driven
● Automated
● Frequent
THANK YOU
Sam Marsden
SEO & Content Manager
@sam_marsden
BrightonSEO
Useful resources:
Crawl-Centred Guide to content auditing – Me!
How to do a Content Audit – Everett Sizemore
Automate or Die – David Iwanow
Branded3 Mobile Speed Experience – Mathew McCorry
Kevin Indig Experts on the Wire Podcast – Dan Shure
Webmaster Hangout Notes - DeepCrawl
