MnSearch Snippets April 2019: Screaming Frog Custom Extraction - Griffin Roer
- 2. “The industry leading website crawler… trusted by
thousands of SEOs and agencies worldwide for
technical SEO audits.”
Screaming Frog
MORE THAN AN SEO
AUDIT TOOL
● Custom website scraping
● Advanced reporting
○ GA & GSC integrations
2Screaming Frog Custom Extraction - MnSearch - April 2019
- 4. DEFAULT
EXTRACTIONS
Screaming Frog extracts a
bunch of data from the HTML of
web pages by default.
Page Title
Meta Description
Meta Keywords
H1
H2
Meta Robots
Meta Refresh
Canonical Link
Pagination
On-page links
Anchor text
Alt text
Hreflang
AMP
- 6. HOW THEY HELP
Use cases:
● Analyze performance by
factors you may normally
not have access to
● Diagnose hidden site issues
● Speed up data-gathering
- 10. RUN YOUR CRAWL
The data you extract is available
in the Custom tab. Set the Filter
dropdown to Extraction.
- 12. SET UP YOUR
EXTRACTION RULES
Screaming Frog requires some
information to know how and
what to extract:
● Extractor Name (optional)
● Extraction Method
● Rule
● Extraction Filter
- 13. XPATH & REGEX
Two syntaxes that you can use
to tell Screaming Frog what you
want to extract from a web
page.
XPATH
● Use to extract any HTML element of a
webpage
○ Anything in a <div>, <p>, <span>, <a>,
<meta>, etc.
REGEX
● Use to extract inline JavaScript
○ Like schema markup in JSON-LD or
an account ID from a tracking pixel
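To make the regex case concrete, here is a minimal Python sketch of pulling an account ID out of inline JavaScript. The tracking snippet, the ID, and the pattern are all made up for illustration. The pattern is written for Python's re module and has not been tested as a Screaming Frog rule.

```python
import re

# Hypothetical inline tracking snippet; the account ID is invented.
SAMPLE_SOURCE = """
<script>
  ga('create', 'UA-1234567-1', 'auto');
  ga('send', 'pageview');
</script>
"""

# Capture an ID shaped like UA-<digits>-<digits> between quotes.
# The capture group keeps the ID and drops the surrounding quotes.
ACCOUNT_ID = re.compile(r"""["'](UA-\d+-\d+)["']""")


def extract_account_ids(page_source):
    """Return every quoted UA-style ID found in the raw page source."""
    return ACCOUNT_ID.findall(page_source)


account_ids = extract_account_ids(SAMPLE_SOURCE)
print(account_ids)
```

Regex works on the raw text of the page, so it can reach values inside a script block that an element-based XPath rule cannot address directly.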
- 15. EXAMPLES
A quick example showing how
to extract the date from articles
on the MnSearch blog.
Chrome: Right-Click > Inspect
Date is in a <span> element with the class
“meta-date date updated”
- 16. EXAMPLES
A quick example showing how
to extract the date from articles
on the MnSearch blog.
Custom Extraction Results
XPath Rule: //span[@class='meta-date date updated']
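To check a rule like this outside the crawler, a rough Python sketch might look like the following. The sample markup is a hypothetical, well-formed stand-in for an article page, not the real MnSearch HTML. It uses the standard library's ElementTree, which supports only a subset of XPath and needs a leading "." on the path.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an article page; only the class name
# "meta-date date updated" comes from the slides.
SAMPLE_PAGE = """
<html>
  <body>
    <article>
      <h1>Example post title</h1>
      <span class="meta-date date updated">April 1, 2019</span>
      <span class="meta-author">Example Author</span>
    </article>
  </body>
</html>
""".strip()


def extract_dates(page_source):
    """Return the text of every <span> whose class attribute matches exactly."""
    root = ET.fromstring(page_source)
    # Same predicate as the rule above, with the "." prefix ElementTree needs.
    matches = root.findall(".//span[@class='meta-date date updated']")
    return [(m.text or "").strip() for m in matches]


dates = extract_dates(SAMPLE_PAGE)
print(dates)
```

Note that @class='...' compares the whole attribute string, so a page that lists the same classes in a different order would not match this rule.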
- 19. EXAMPLES
See what types of schema
markup are being used on each
page.
● This example shows one
rule using XPath and
another using regex
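The slides do not show the two rules themselves, so the Python sketch below is one plausible pairing and an assumption on my part: an XPath-style lookup for microdata itemtype attributes, and a regex for "@type" values in JSON-LD. The sample page and both patterns are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page carrying both JSON-LD and microdata.
SAMPLE_PAGE = """
<html>
  <head>
    <script type="application/ld+json">
      {"@context": "https://schema.org", "@type": "Article", "headline": "Example"}
    </script>
  </head>
  <body>
    <div itemscope="" itemtype="https://schema.org/Organization">
      <span itemprop="name">Example Org</span>
    </div>
  </body>
</html>
""".strip()

# Regex rule (assumed): capture the value that follows "@type" in JSON-LD.
JSON_LD_TYPE = re.compile(r'"@type"\s*:\s*"([^"]+)"')


def jsonld_types(page_source):
    """Return every "@type" value found in the raw page source."""
    return JSON_LD_TYPE.findall(page_source)


def microdata_types(page_source):
    """Return the itemtype of every element that declares one."""
    root = ET.fromstring(page_source)
    # XPath-style rule (assumed): any element with an itemtype attribute.
    return [el.get("itemtype") for el in root.findall(".//*[@itemtype]")]


print(jsonld_types(SAMPLE_PAGE))
print(microdata_types(SAMPLE_PAGE))
```

Splitting the work this way follows the earlier XPath and regex slide: attributes on HTML elements suit XPath, while values inside a script block suit regex.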