You Don't Know SEO
inspired by @getify…
This talk is
inspired by a
series of books
called “You Don’t
Know JavaScript”
by Kyle Simpson
We open to our hero pondering
the ramifications of search.
The industry is so often wrong
about how search works.
We can’t really
listen to webmaster
trends analysts
At this moment, our hero began
to miss Matt Cutts.
We spend most of
our time on this
level of
Us and the
We act more like users than the
search engineers that we are.
Sometimes Google is
quite explicit, just in
terminology that we
don’t know or
Accessibility driven by the
mobile user experience
Positioning content to be
extracted easily
Making sure the
experience is fast as
Hyper-targeted and
optimized for specific
user contexts
Building credibility to your
Making sure the different
parts of the SERP work
You don’t know
information retrieval
Indexing Process per @tcmg
Google adheres very closely to
the response codes.
Do you know about the 304
response code?
Use the 304 response code to
manage your crawl allocation.
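A minimal sketch of the decision behind a 304, assuming the server knows when each page last changed. The function name conditional_status and its parameters are hypothetical, not part of any framework; a real stack would usually handle this in the web server or application framework.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the page is unchanged since the crawler's last fetch, else 200.

    last_modified: timezone-aware datetime of the page's last change.
    if_modified_since: raw If-Modified-Since header value, or None.
    """
    if not if_modified_since:
        return 200  # unconditional request: send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: fall back to a full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    # HTTP dates have one-second resolution, so drop microseconds first
    if last_modified.replace(microsecond=0) <= since:
        return 304  # nothing new: the crawler can skip the download
    return 200
```

Every 304 is a page the crawler did not have to download again, which is where the crawl allocation savings would come from.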
According to Botify’s recent
study, you’re not using 304s
Little known fact: you can
request a crawl increase here
after big changes
Quick Notes on JavaScript and
Chrome 41. It doesn’t support
everything the latest Chrome does.
For auditing JavaScript sites,
look at the hash in Screaming
If you want to get fancy, crawl
how Google crawls.
You can even watch a Googler
teach you to scrape Google with
@justinrbriggs has a good step
by step on how to do this.
Dynamic serving still
can have pitfalls.
enhancement is still
the only way to be
The missing piece
Content must be
processed before
it can be indexed
Feature Extraction
Google Extracts
features of the
page to inform
Google’s Version
Source: Multi-
stage query
system and
method for use
with tokenspace
The “Index” is really
just a list of
keywords, document
ids and their scores…
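A toy version of that idea, assuming whitespace tokenization and a raw term count as the score. The build_index name and the sample documents are made up for illustration; the scoring in a production index is far richer.

```python
from collections import defaultdict


def build_index(docs):
    """Map each keyword to {doc_id: score}; the score here is a raw term count."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return index


# Hypothetical two-document corpus
docs = {1: "fast mobile pages", 2: "mobile search mobile index"}
index = build_index(docs)
```

Looking up index["mobile"] returns the document ids containing the keyword and a score for each, which is all the "index" needs to answer a one-word query.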
Doc Server Structure
The Docserver is
probably what
you’re actually
thinking of
The cache is the latest version of
the page from the doc server.
This is how the “Document Scoring
based on document content update”
capability can work
Search is Powered by 200+ microservices
How ranking basically works
10 Second edit of Paul Haahr &
Jeff Dean videos
Basic Rank Scoring
Summation of
all terms
Basic Rank Scoring
Weight of query term
Basic Rank Scoring
Document Term Weight
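Put together, the three pieces above read as: for every query term, multiply the weight of the query term by the document term weight, then sum. A minimal sketch, assuming the weights are already computed and stored in plain dictionaries (the function and variable names are illustrative):

```python
def score(query_weights, doc_weights):
    """Sum over all query terms of (query term weight * document term weight)."""
    return sum(
        w_query * doc_weights.get(term, 0.0)  # terms missing from the doc add 0
        for term, w_query in query_weights.items()
    )


# Hypothetical weights for one query and one document
query = {"seo": 2.0, "tools": 1.0}
document = {"seo": 0.5, "audit": 0.3}
result = score(query, document)
```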
Ranking Models
There are many Ranking Models
Bing’s rank scoring Model
Just in case
anyone cares…
Bing’s rank scoring Model
As in Why The F**K
are people using
Google’s Scoring Functions
Google Attempts
multiple different
“scoring functions”
Source: Framework for evaluating web search scoring functions
Post-retrieval Adjustment
Google often
adjusts rankings
before presenting
them.
Source: Phrase-based indexing in an information retrieval system
Query processing
Source: Multi-stage
query processing
system and method for
use with tokenspace
Query Adjustment – hummingbird?
Google may also
revise your query
internally to serve
better results.
Source: Framework for evaluating web search scoring functions
Multi-modal search
Google uses implicit
signals as well as
the explicit query to
show you what
User Modeling
Google builds
models of you
based on your
search history.
Your search history is stored
next to additional data that
Google derives.
Queries become Entities
Google breaks your
queries down into
entities first before
determining what the
result set is.
This allows them to use
the context of these
entities to show more
relevant results
32 Word Query Limit
This indicates that the inverted
index does not store results
for anything longer than a 32-gram.
So with all this
complexity, how
can we beat the
By far one of the greatest blog
posts ever written on SEO.
Keyword Usage
As you might imagine,
leveraging text
statistics to inform
rankings indicates
that you might want
to use the keywords
on the page.
This is where having
the opportunity to
rank begins.
Search Engines break
paragraphs into
sentences and
sentences into “tokens”
or individual words.
This better positions
content for statistical
N-grams are phrases of n length
Google made their entire n-gram
dataset available in 2006
The n-gram viewer is this concept
used on books
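A rough sketch of that pipeline: paragraph to sentences, sentences to tokens, tokens to n-grams. The regular expressions are simplifying assumptions; real tokenizers handle abbreviations, punctuation, and other languages far more carefully.

```python
import re


def sentences(text):
    """Split a paragraph into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def ngrams(toks, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
```

For example, ngrams(tokens("You don't know SEO"), 2) yields the bigrams of that sentence, which is the unit the n-gram dataset counts.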
Synonyms & Close Variants
Statistical relevance is
computed using both
synonyms and close
variants of words.
Search Engines break words into their
stems to better determine relationships
and relevance
Similar to stemming,
lemmatization is the
grouping of inflected
forms of the same word
or idea.
For example, “better” and
“good” have the same lemma.
Semantic Distance & Term
Search engines look
for how physically
close words are
together to better
determine their
relationship in
statistical models.
Zipf’s Law
Zipf’s law holds that
a word’s frequency in a
corpus (or document set)
is roughly inversely
proportional to its rank,
so words tend to be
similarly distributed
across documents in
the same corpus.
Zipf’s Law applied
When run on an
actual dataset, Zipf’s
law tends to hold up
in high rankings.
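Zipf’s law is usually stated as: the count of the word at rank r is roughly the count of the top word divided by r. A small sketch for checking that against your own text, assuming the text is already tokenized (function names are illustrative):

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(tokens).most_common()
    return [(rank, word, count) for rank, (word, count) in enumerate(counts, start=1)]


def zipf_expected(top_count, rank):
    """Count that Zipf's law predicts at a rank, given the top word's count."""
    return top_count / rank
```

Comparing each observed count to zipf_expected(top_count, rank) shows how closely a corpus follows the curve.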
Latent Semantic Analysis
Latent semantic analysis represents
content as a matrix to determine
Term frequency, inverse
document frequency
(TF-IDF) identifies the
importance of
keywords with respect
to other keywords in
documents in a corpus.
Ryte has a great guide on TF*IDF
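A bare-bones TF-IDF sketch, assuming each document is already a list of tokens. It uses one common textbook variant (relative term frequency times the log of inverse document frequency); tools such as Ryte may weight things differently.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """corpus: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(corpus)
    # number of documents each term appears in
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document gets a weight of zero, which is the point: it does not distinguish one page from another.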
Google specifically
looks for keywords
that co-occur with a
target keyword in a
document set.
Usage of those
keywords is an
indication of relevance
for subsequent
Phrase-based indexing
Heaps’ Law on Vocab Growth
Heaps’ law indicates that
vocabulary within a corpus
grows at a predictable rate
Entity Salience
Source: Techniques
for automatically
identifying salient
entities in documents
Hidden Markov Model
Hidden Markov Models allow
search engines to extract
implicit entities
Modern Search
engines determine
both prominence of
content and review
the visual hierarchy
to determine how
valuable the content
is to the page.
This all boils down to fulfilling
the expectations of users and
search engines
Unfortunately, this
is a place where
most SEO tools are
very far behind the
You can use KNIME to really dig into
text analytics
They have great examples of how to use
KNIME’s text processing plugin
1. Put your domain into SEMrush
2. See what you don’t rank well
for in Organic Research
3. Pick a keyword
4. Look at the SERP
Really Google?
This is the page that ranks #86?!
5. See what else could rank for
our site.
Enter Ryte’s Content Success
6. Put in the keyword and
compare TF-IDF to ranking pages
You can see the corpus they
pulled for review
7. Put the copy in. See what
co-occurring keywords are missing
and make edits.
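If you want to approximate step 7 without a tool, here is a simplified sketch: find terms that several ranking pages use and your copy does not. The missing_terms function, the min_pages threshold, and the crude tokenizer are all assumptions for illustration; this is not how Ryte computes its report.

```python
import re
from collections import Counter


def terms(text):
    """Lowercase word tokens; deliberately crude."""
    return re.findall(r"[a-z0-9]+", text.lower())


def missing_terms(my_copy, ranking_pages, min_pages=2):
    """Terms used by at least min_pages ranking pages but absent from my_copy."""
    page_counts = Counter(t for page in ranking_pages for t in set(terms(page)))
    mine = set(terms(my_copy))
    return sorted(t for t, n in page_counts.items() if n >= min_pages and t not in mine)
```

In practice you would filter out stopwords and weight the candidates by TF-IDF before deciding which ones to work into the copy.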
content optimization
Searchmetrics’ tool also
accounts for entities and
Does Google Use Dwell Time?
Information retrieval
Evaluation Measures
CTR and Session
Success Rate are best
practices for
“evaluating” the
performance of rank
scoring models.
Google isn’t lying,
just being specific.
Does Google Use CTR?
What’s in a
Query Log
Source: Systems and methods for generating statistics
from search engine query logs
Time Based Ranking Patent Says they do
Machine Learning Response Variable
Integrated Search is the way to
influence this
Compelling meta descriptions
AdWords, Search Console, SEMrush, Analytics, Business Rules, Keyword Portfolio
Integrated Search is all about
combining data from both
If (Organic Impressions + Paid Impressions) < 0.3 × Search Volume and LP Conv% > Avg LP Conv%, then increase bids.
If LP Conv% > Baseline Conversion Rate and Keyword Difficulty < 50, then increase SEO effort.
If LP Conv% < Baseline Conversion Rate and QS < Avg QS, then improve the landing page.
If Organic Ranking > 10 and Paid Ranking < 1 and LP Conv% > Baseline Conversion Rate, then increase SEO effort.
If Organic Landing Page != Paid Landing Page and the Paid Landing Page has a high QS, then optimize your internal linking structure.
If Need State = Action and LP Conv% < Avg LP Conv%, then improve the landing page.
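Rules like these are easy to run over a keyword portfolio once the data is joined. A sketch of the first three rules only, assuming one record per keyword; the Keyword class and its field names are hypothetical stand-ins for whatever your merged AdWords, Search Console, and analytics export looks like.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    organic_impressions: int
    paid_impressions: int
    search_volume: int
    lp_conv: float            # landing page conversion rate
    avg_lp_conv: float        # average landing page conversion rate
    baseline_conv: float      # baseline conversion rate
    keyword_difficulty: int
    quality_score: float
    avg_quality_score: float


def recommend(kw):
    """Return the actions triggered by the first three business rules."""
    actions = []
    impressions = kw.organic_impressions + kw.paid_impressions
    if impressions < 0.3 * kw.search_volume and kw.lp_conv > kw.avg_lp_conv:
        actions.append("Increase Bids")
    if kw.lp_conv > kw.baseline_conv and kw.keyword_difficulty < 50:
        actions.append("Increase SEO Effort")
    if kw.lp_conv < kw.baseline_conv and kw.quality_score < kw.avg_quality_score:
        actions.append("Improve Landing Page")
    return actions
```

The remaining rules follow the same pattern once ranking, landing page, and need state fields are added to the record.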
Structured Data is not just this.
What kinda SERP feature is that?
That’s just a table, but the
relationships are clear.
Embrace all relevant
semantic structural
opportunities even if
they don’t show
something in the
PageRank is still the measure, but
I wonder if we ever truly
understood it.
Ranking search
results based on
Google generates quotes from
the linking page indicating that
they want parity on both sides.
Internal link building is one of
the most valuable things you can
do. Especially on large sites.
More internal links lead to
more crawl activity, too
Compute your internal PageRank in R
Or get you an SEO tool that can
do it.
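The slide points to R; the same idea is sketched here in Python instead, as a plain power iteration over your crawl's internal link graph. The damping factor of 0.85 and the fixed iteration count are conventional assumptions, and the input format (a dict of URL to outlinks) is made up for illustration.

```python
def internal_pagerank(links, damping=0.85, iterations=50):
    """links: {url: [urls it links to]}. Returns {url: score}; scores sum to 1."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # pages with no outlinks spread their rank evenly across the site
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank
```

Feed it the internal link edges from a crawl export and sort by score to see which pages your own linking structure treats as most important.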
Here’s an algorithmic approach
to generating the best internal
linking structure
Perform text
analysis at
publish to
extract features
Pull pages from
the database
with related
Place links in
copy based on
Be sure to limit
the number of
links per page
Be sure there is
only one link to
a given page
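The steps above can be sketched as follows, assuming "features" are simply the non-stopword terms in the copy and relatedness is the count of shared terms. The function names, the tiny stopword list, and the max_links default are illustrative assumptions; a production version would use the richer text analysis described earlier.

```python
import re

STOPWORDS = frozenset({"the", "a", "and", "to", "of", "in", "for", "is"})


def features(text):
    """Extract a crude feature set: lowercase terms minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def suggest_links(new_copy, pages, max_links=3):
    """pages: {url: copy}. Returns up to max_links URLs, most related first.

    Each URL appears at most once because pages is keyed by URL.
    """
    new_features = features(new_copy)
    scored = []
    for url, copy in pages.items():
        overlap = len(new_features & features(copy))
        if overlap:
            scored.append((overlap, url))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # most shared terms first
    return [url for _, url in scored[:max_links]]
```

Run it at publish time against the pages in the database, then place the returned links in the copy where the shared terms occur.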
You Don’t Know
SEO Implementation
You don’t know how implementation happens
Clients can’t always
“just” fix something.
And our expectations
don’t match the
effort required in
some cases.
HTTP Header Changes
HTTP headers can be governed by
the server or the application
Redirects can be governed by the
server or the application
Managing hreflang requires a
data structure of URL
Internal Linking Structure
Often Requires sitewide
XML Sitemaps
Generating sitemaps requires a
data structure of URLs and their
Google has aggressive speed expectations
Speed your Site up!
Of course, Google uses speed in
its scoring function
Botify’s findings indicate big
sites get crawled more when
they are fast
Critical Rendering Path
We look to enforce it via the
critical rendering path
These Recommendations at face
value will break sites.
Actionable & optimal
You don’t know how to be actionable and optimal
1. Segment your crawl – different
page types are governed by
2. Segment your Rankings. Focus
in on key opportunities.
3. Be painfully Specific.
Be like @Richardbaxter
Check out the Code Coverage report to see what code
isn’t being used and delete it from your pages.
coverage-page-speed.htm (h/t @portentint)
Is This Your
“Remove unused JavaScript and CSS
from all pages to enhance the page
That’s not
actionable or
Many of those scripts are hosted
libraries such as Facebook Connect,
Google Analytics or jQuery. Hosting
those libraries locally and removing
items will take forever and won’t
support forward compatibility.
Try this instead
Consider removing or otherwise
refactoring lines 49-56 in the
suchandsuch.js (or page type A)
because they are not currently being
used and no functionality or other
code is dependent upon it.
Is This Your
“Implement HTTP/2 for faster site
That’s not
actionable or
Depending on the server version and
environment, the client may not
currently support HTTP/2. If their
server does support it, they may not
know where to start.
Try this instead
Based on your site’s HTTP headers,
you’re running NGINX vX.XX. We
recommend adjusting your HTTPS
server configuration in your .conf
file to include the following.
Is This Your
“You have broken pages throughout
the site; we recommend updating
those URLs to return 301 redirects and
we have prepared a list of 1:1
relationships for redirection in your
.htaccess file.”
That’s not
actionable or
Why? #1
I see a lot of recommendations that
automatically assume Apache
servers. (break down of server tech
on the right)
NGINX and IIS don’t have .htaccess
Why? #2
1:1 rules are suboptimal. Always
create RegEx-driven rules for
redirects to minimize TTFB of every
page throughout the site.
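To illustrate the difference, here is a sketch of one pattern-driven rule standing in for thousands of 1:1 entries. The URL structure (/blog/<year>/<slug>.html moving to /articles/<slug>/) is entirely hypothetical; in production the same pattern would live in the server configuration rather than in Python.

```python
import re

# Hypothetical rule: every /blog/<year>/<slug>.html URL moves to /articles/<slug>/
RULES = [
    (re.compile(r"^/blog/\d{4}/([a-z0-9-]+)\.html$"), r"/articles/\1/"),
]


def redirect_target(path):
    """Return the new path for a legacy URL, or None if no rule applies."""
    for pattern, replacement in RULES:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return None
```

One rule is evaluated per request instead of a long list of individual mappings, which is the argument for pattern-based redirects.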
Try this instead
Based on your site’s HTTP headers,
you’re running NGINX vX.XX. We
recommend adjusting your HTTPS
server configuration in your .conf
file to include the following code.
Is This Your
“You have links to redirects
throughout the site; we
recommend updating those links
to the final destination URLs.
Here is a list of URLs with links
to redirects.”
That’s not
actionable or
What are they going to do? Run Find
and Replace on their entire site?
Try this instead
Crawl the site and keep track of all
of the final destination URLs.
Prepare a spreadsheet or database
table and instruct the client to
update these links on the database
level. Alternatively, spec out a
simple crawler they can run on a
daily basis to crawl their site and
update their links.
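A sketch of the spreadsheet-building step, assuming the crawl has already produced a map of redirecting URLs to their targets and a list of (page, link) pairs. The function names and the max_hops guard are illustrative assumptions.

```python
def final_destination(url, redirects, max_hops=10):
    """Follow a {source: target} redirect map until a URL that does not redirect."""
    seen = set()
    # stop on loops and on unreasonably long chains
    while url in redirects and url not in seen and len(seen) < max_hops:
        seen.add(url)
        url = redirects[url]
    return url


def link_updates(links, redirects):
    """links: iterable of (page, href). Returns (page, old_href, new_href) rows to fix."""
    rows = []
    for page, href in links:
        target = final_destination(href, redirects)
        if target != href:
            rows.append((page, href, target))
    return rows
```

The rows returned by link_updates map directly onto the spreadsheet or database table the client needs to update links at the database level.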
Is This Your
“Google has recently increased
the meta description length from
155-160 characters to ~320. You
should rewrite your meta descriptions
to take more advantage of the
What are they going to write? Which
pages will they do it on? How are
they going to populate the meta
description? How can they capitalize
on all the space? How will they
That’s not
actionable or
Try this instead
We recommend using structured data from each
page template to generate keyword-relevant meta
descriptions that are click-worthy.
The following are schemas to implement per page
We recommend prioritizing the following 5,000 URLs
for implementation because they have higher crawl
We will measure the impact of this recommendation
using CTR and clicks from Google Search Console as
well as traffic and conversion performance from
Google Analytics.
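One way to spec this for a developer: a per-template pattern filled from the page's structured data and trimmed to a length limit. This is a sketch only; the meta_description function, the example template, and the product fields are hypothetical, and the 320 default simply mirrors the figure quoted above.

```python
def meta_description(data, template, max_length=320):
    """Fill a per-template pattern from structured data and trim at a word boundary."""
    text = " ".join(template.format(**data).split())  # collapse stray whitespace
    if len(text) <= max_length:
        return text
    # leave room for the trailing "..." and avoid cutting a word in half
    trimmed = text[: max_length - 3].rsplit(" ", 1)[0]
    return trimmed.rstrip(",.;:") + "..."


# Hypothetical product data and product-page template
product = {"name": "Trail Runner 2", "brand": "Acme", "price": "89.00", "priceCurrency": "USD"}
template = "Shop the {brand} {name} for {price} {priceCurrency}. Free returns."
description = meta_description(product, template)
```

Each page template gets its own pattern, so the recommendation scales to thousands of URLs without anyone writing descriptions by hand.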
Software Testing
You don’t know the software testing use case for SEO
Software Testing Types
Automated Testing Happens when
Devs Push Code
Software Testing process
Test Case: Presence of Meta Descriptions
Test Type: Unit Test
Test Steps: Check for presence of meta description tag in
Test Data: Page template
Expected: All URLs should have meta descriptions
Actual: Product Detail Page is missing meta description
Test Case: Viable Internal Links
Test Type: Functional Test
Test Steps: 1. Render pages 2. Open all internal links 3. Review
Test Data: Crawled URL data
Expected: All links return 200 response code
Actual: Many links to redirects and 404s
Test Case: Average Page Speed Less than 2 Seconds
Test Type: …on Test
Test Steps: 1. Render pages 2. Capture page speed 3. Determine average page speed per page
Test Data: Render all page types from URL list
Expected: All page types should return an average of 2 seconds load time
Actual: Homepage takes 5 seconds to load
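The first test case could look something like this as a check a developer can drop into their test suite. It is a sketch using only the Python standard library; the class and function names are made up, and a real suite would run it against each rendered page template.

```python
from html.parser import HTMLParser


class MetaDescriptionFinder(HTMLParser):
    """Records the content of the last <meta name="description"> tag seen."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""


def has_meta_description(html):
    """True if the page has a non-empty meta description."""
    finder = MetaDescriptionFinder()
    finder.feed(html)
    return bool(finder.description and finder.description.strip())
```

Wire has_meta_description into an assertion per page template and the build fails whenever a push drops the tag.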
You may need an internal
Consider a serverless crawler
with a seed set of pages that can
be worked into the push queue.
Split testing
You don’t know how to split test for SEO
@Distilled has been pioneering
A/B testing as a service for SEO.
We ran an A/B test before doing
a several million page duplicate
content consolidation.
Mitigate the Risk
Develop a Hypothesis
By changing ___ into ___, I can
get better or more rankings and
increase traffic.
Create two A groups and one variant
group of pages. The number of pages
should be large enough to reach
statistical significance.
Bucket Pages
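A simple way to bucket pages is to hash each URL, so the assignment is effectively random but repeatable. This is a sketch, not the method Distilled or anyone else necessarily uses; the bucket names A1, A2, and variant are placeholders for the two A groups and the variant group.

```python
import hashlib


def bucket(url, buckets=("A1", "A2", "variant")):
    """Deterministically assign a URL to a bucket using a hash of the URL."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return buckets[int(digest, 16) % len(buckets)]
```

Because the hash never changes, a page stays in the same group for the whole experiment, and comparing the two A groups against each other gives a sanity check on natural variance.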
Make sure the pages you run
your experiment on have crawl
Benchmark performance
Benchmark Rankings, traffic, and
Get it started. We let these run
for 30 days or so.
Launch Experiment
Analyze your performance.
Prophet by Facebook for forecasting
the impact of changes.
Run your own tests -
Distilled Free Tool
How to Become a Dramatically
Better SEO
5 things you should do to be ready for what Google throws your way
Commit to doing any three
things you’ve learned from this
Learn more about information
Set up a web server
Build a new website in HTML, CSS,
and JavaScript
Make it rank for 5 competitive
The Credits
I’m #ZorasDad
We Do These Things
SEO Paid Media Machine
& Optimization
Shoutout to @Randfish
‘Nuff said.
Mike King
Founder &
Managing Director
Post-credits scene
Use Buzzsumo to find the most popular
content on a site and use it to inform
your outreach email
Use a text summarizer so you don’t have
to read the entire thing.
Use a chatbot to automatically
overcome objections in your outreach
Create a chatbot with DialogFlow (PKA
Set up your responses based on key
objections that people have
Connect your chatbot to your email via
Zapier to automatically overcome