Fabrice Canel - Advanced Search Summit Napa 2019
- 2. fabricecanel
@facan
About me
22-year search veteran at Microsoft
Principal Program Manager leading the
Bing Web Data platform team. The team
discovers, selects, crawls, and processes
billions of new or updated web pages every
day.
Prior to Bing and MSN Search, Lead
Program Manager for microsoft.com search
in the very early days of Search and SEO.
- 5. fabricecanel
@facan
The Evolution of Search
Keyword-based Search: Traditional IR; AUTB streams; inverted index
Natural Language Search: User engagement; user clicks, metawords; inverted index
Voice, Vision, Context-Based Search: AI; deep learning; vectors
- 9. fabricecanel
@facan
Visual Search
• Image similarity match
• Deep Learning vector encoding
• Approximate nearest
neighbor search
• Search in several hundred
million images in a few
milliseconds
• Significant improvement to
image recall
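To make the idea of approximate nearest neighbor search over vector encodings concrete, here is a minimal sketch using random-hyperplane hashing. This is one well-known technique chosen for illustration, not necessarily what Bing uses; the class name `HyperplaneIndex`, the tiny 3-dimensional vectors, and the image names are all hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class HyperplaneIndex:
    """Toy approximate nearest neighbor index (hypothetical, for illustration)."""

    def __init__(self, dim, n_planes=8, seed=0):
        rng = random.Random(seed)
        # Each random hyperplane contributes one bit of the bucket signature.
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = defaultdict(list)

    def _signature(self, vec):
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in self.planes
        )

    def add(self, item_id, vec):
        self.buckets[self._signature(vec)].append((item_id, vec))

    def query(self, vec, k=3):
        # Only vectors in the same bucket are scored. That is what makes the
        # search approximate: a true neighbor in another bucket is missed.
        candidates = self.buckets.get(self._signature(vec), [])
        ranked = sorted(candidates, key=lambda c: cosine(vec, c[1]), reverse=True)
        return [item_id for item_id, _ in ranked[:k]]


index = HyperplaneIndex(dim=3)
index.add("red-ball", [0.9, 0.1, 0.0])
index.add("blue-ball", [0.1, 0.9, 0.0])
index.add("green-field", [0.0, 0.2, 0.9])
matches = index.query([0.9, 0.1, 0.0], k=1)
print(matches)
```

The trade-off is speed against recall: scoring one bucket instead of every image is fast, and production systems typically query several hash tables or use graph-based indexes to recover neighbors that a single bucket would miss.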
- 14. fabricecanel
@facan
What is the goal of the Search Engines?
Discover and Index Everything
Create “the” Knowledge Graph
Super fast access to information
- 16. fabricecanel
@facan
Bing and Verizon Media
Bing Ads will be the exclusive platform to
manage search ads across Bing
and Verizon Media networks on all devices.
The Microsoft Audience Network now has
access to Verizon Media O&O properties,
available via Search workflow campaigns.
Properties include Aol.com, Yahoo.com, Aol
Mail, Yahoo! Mail, Huffington Post and
more.
- 20. fabricecanel
@facan
Bing content goals and webmasters' SEO goals
Maximize Content Comprehensiveness: Have all my site's relevant URLs indexed in search engines
Maximize Content Freshness: Have all my site's newest URLs and most recently updated content indexed in search engines
Maximize Content Intelligence: Have search engines able to extract the most useful information from each web page, making it easier for them to rank my content at the top of search results
Your SEO goals align with my team's goals
- 21. fabricecanel
@facan
My TOP 3 SEO guidelines table
How to have your URLs DISCOVERED
1. Good links to your content
2. Avoid duplicate and useless URLs
3. XML sitemap, RSS feeds … and new abilities in upcoming slides
How to have your URLs RANKED for crawling/indexing
1. Content authority: high-quality content that stands out from others
2. Content that is useful, sufficiently detailed and well presented
3. Get an audience
How to have your URLs CRAWLED
1. Allow us to access the content (allow in robots.txt, crawl quota)
2. Guide us to the new content: RSS, sitemaps with lastmod
3. Avoid craziness with JavaScript and CSS
How to have your content UNDERSTOOD
1. Standard HTML is highly preferable
2. Tell us more via structured markup (HTML5 tags, JSON-LD, schema.org etc.)
3. Avoid craziness with JavaScript and CSS
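As an illustration of the "sitemaps with lastmod" guideline, here is a minimal sketch that generates an XML sitemap in which every URL carries a `lastmod` date. The function name `build_sitemap` and the example.com URLs are hypothetical; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod uses the W3C date format (YYYY-MM-DD).
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages and dates, for illustration only.
sitemap_xml = build_sitemap([
    ("https://www.example.com/", date(2019, 4, 10)),
    ("https://www.example.com/news/launch", date(2019, 4, 12)),
])
print(sitemap_xml)
```

The value of `lastmod` depends on it being truthful: it should change only when the page content actually changes, so that a crawler can use it to decide what to fetch again.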
- 22. fabricecanel
@facan
The challenge of Crawling
Web pages may appear, change or disappear on each web site at any moment.
Bing needs to crawl and re-crawl to keep the index up to date.
Bing needs an effective, efficient crawl scheduling policy that
1. obeys web hosts' and Bing's own constraints on download bandwidth
2. is efficiently computable over billions of hosts and hundreds of billions of pages
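The two requirements on a crawl scheduling policy can be illustrated with a toy greedy scheduler. This is a sketch under stated assumptions, not Bing's algorithm: the function `schedule_crawl`, the per-page `change_score`, and the budget dictionaries are all hypothetical, and a plain sort would not scale to billions of hosts.

```python
from collections import defaultdict


def schedule_crawl(pages, host_budget, global_budget):
    """Pick URLs to crawl this round.

    pages: list of (url, host, change_score); a higher score means the page
        is considered more likely to have changed (assumed to be given).
    host_budget: dict of host -> max fetches allowed this round (politeness).
    global_budget: max fetches the crawler itself can afford this round.
    """
    used = defaultdict(int)
    selected = []
    # Consider the most promising pages first.
    for url, host, _score in sorted(pages, key=lambda p: p[2], reverse=True):
        if len(selected) >= global_budget:
            break
        # Skip the page if its host has no remaining budget.
        if used[host] < host_budget.get(host, 0):
            used[host] += 1
            selected.append(url)
    return selected


pages = [
    ("a.com/1", "a.com", 0.9),
    ("a.com/2", "a.com", 0.8),
    ("b.com/1", "b.com", 0.5),
    ("b.com/2", "b.com", 0.1),
]
plan = schedule_crawl(pages, host_budget={"a.com": 1, "b.com": 2}, global_budget=2)
print(plan)
```

Here `a.com/2` is skipped even though it scores higher than `b.com/1`, because the host budget for `a.com` is already spent; that is the politeness constraint taking priority over freshness.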
- 23. fabricecanel
@facan
Crawling challenges:
Diversity of solutions
Web sites with great SEO
• Clean URLs
• RSS and sitemaps with lastmod
• Static HTML
Small web sites: EASY. Large web sites: INTERMEDIATE
Web sites with bad SEO
• Duplicates, useless, junk URLs
• No sitemaps, no RSS
• Lots of JavaScript
• Limited Bingbot bandwidth
Small web sites: HARD. Large web sites: VERY HARD
- 25. fabricecanel
@facan
Crawling challenges:
Detect relevant content changes
<p>Video provides a powerful way to help you prove your point. When
you click Online Video, you can paste in the embed code for the video you
want to add. You can also type a keyword to search online for the video
that best fits your document. </p>
<p class="ads"><a href="/r/15vGdebnbf"><img src="soccerball.png"
alt="soccer ball"/></a></p>
<p>To make your document look professionally produced, Word provides
header, footer, cover page, and text box designs that complement each
other. For example, you can add a matching cover page, header, and
sidebar. Click Insert and then choose the elements you want from the
different galleries. </p>
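The slide's example has two content paragraphs around an ad block whose link can change on every page load. One way to detect only relevant changes is to fingerprint the page text while ignoring the ad block. The sketch below assumes ads are marked with `class="ads"` as in the slide's markup; the names `MainTextExtractor` and `content_fingerprint` are hypothetical, and real pages need far more robust boilerplate detection.

```python
import hashlib
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class MainTextExtractor(HTMLParser):
    """Collect page text, skipping any <p class="ads"> block (assumed convention)."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside an ad block
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1
        elif tag == "p" and dict(attrs).get("class") == "ads":
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def content_fingerprint(html):
    """Hash of the whitespace-normalized text outside ad blocks."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


page_v1 = '<p>Video helps you prove your point.</p><p class="ads"><a href="/r/111"><img src="soccerball.png" alt="soccer ball"/></a></p>'
page_v2 = '<p>Video helps you prove your point.</p><p class="ads"><a href="/r/222"><img src="tennisball.png" alt="tennis ball"/></a></p>'
print(content_fingerprint(page_v1) == content_fingerprint(page_v2))
```

With this approach a rotated ad leaves the fingerprint unchanged, while an edit to either content paragraph produces a different hash and would count as a change worth re-indexing.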
- 27. fabricecanel
@facan
Bing main crawl metrics
Metric #1: Crawl effectiveness
North Star: Every page in index is a fresh copy of its web version
Metric #2: Crawl efficiency
North Star: Crawl only updated (fresh on-page content/useful
outbound links) or new URLs
“Metric” #3: Obeying politeness constraints
North Star: Never crawl more often than webmasters *want*
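The first two metrics can be read as simple ratios. The sketch below is one possible interpretation of the north stars, not Bing's actual definitions: effectiveness as the share of indexed pages whose stored copy still matches the live page, and efficiency as the share of fetches that found something new or updated. The function names and input shapes are hypothetical.

```python
def crawl_effectiveness(index_hashes, live_hashes):
    """Share of indexed URLs whose stored copy matches the live page.

    Both arguments map url -> content hash (hypothetical input shape).
    """
    if not index_hashes:
        return 0.0
    fresh = sum(1 for url, h in index_hashes.items() if live_hashes.get(url) == h)
    return fresh / len(index_hashes)


def crawl_efficiency(crawl_log):
    """Share of fetches that found a new or updated page.

    crawl_log: list of outcomes, each 'new', 'updated' or 'unchanged'.
    """
    if not crawl_log:
        return 0.0
    useful = sum(1 for outcome in crawl_log if outcome in ("new", "updated"))
    return useful / len(crawl_log)


effectiveness = crawl_effectiveness(
    {"/a": "h1", "/b": "h2"},  # what the index holds
    {"/a": "h1", "/b": "h9"},  # what the web currently serves
)
efficiency = crawl_efficiency(["new", "unchanged", "unchanged", "updated"])
print(effectiveness, efficiency)
```

Under these definitions both north stars correspond to a ratio of 1.0, and the two pull against each other: crawling more often raises effectiveness but lowers efficiency unless the crawler knows which pages changed.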
- 29. fabricecanel
@facan
Transforming the paradigm
Real-time / near-real-time indexing
LAST CENTURY: Day 1: content update, then Day 1, Day 2, Day 3, Day 4, Day 5 before the search result shows it:
My daughter has won a prize at school.
https://answers.yahoo.com/question/...
Mar 29, 2012 · My daughter…
NEW CENTURY: Day 1: the site says "Hey, I changed.", Bing answers "Thank you!" and the search result shows it:
My daughter has won a prize at school.
https://answers.yahoo.com/question/...
Mar 29, 2012 · My daughter…
- 30. fabricecanel
@facan
How to tell us about new URLs
1. Get your Bing Webmaster Tools API key
https://www.bing.com/webmaster/api/
2. Submit URLs online
Example using wget (single quotes around the xmlns value keep the shell quoting valid)
wget.exe "https://ssl.bing.com/webmaster/api.svc/pox/SubmitUrl?apikey=7737def21c404dcdaf23bea715e61436" --header="Content-Type: application/xml; charset=utf-8" --post-data="<SubmitUrl xmlns='http://schemas.datacontract.org/2004/07/Microsoft.Bing.Webmaster.Api'><siteUrl>http://www.bing.com</siteUrl><url>http://www.bing.com/fun/?fabrice=pubcon</url></SubmitUrl>"
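The same submission can be sketched in Python with only the standard library. The endpoint, XML namespace, and element names are taken from the wget example above; the function name `build_submit_request` and the placeholder key `YOUR_API_KEY` are hypothetical, and whether the service still accepts this exact request is not verified here. The sketch builds the request without sending it.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import escape

ENDPOINT = "https://ssl.bing.com/webmaster/api.svc/pox/SubmitUrl"
XMLNS = "http://schemas.datacontract.org/2004/07/Microsoft.Bing.Webmaster.Api"


def build_submit_request(api_key, site_url, url):
    """Build (but do not send) the POST request for one URL submission."""
    body = (
        f'<SubmitUrl xmlns="{XMLNS}">'
        f"<siteUrl>{escape(site_url)}</siteUrl>"  # escape &, < and > in URLs
        f"<url>{escape(url)}</url>"
        "</SubmitUrl>"
    )
    return urllib.request.Request(
        ENDPOINT + "?" + urllib.parse.urlencode({"apikey": api_key}),
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/xml; charset=utf-8"},
        method="POST",
    )


request = build_submit_request(
    "YOUR_API_KEY",  # placeholder: use the key from Bing Webmaster Tools
    "http://www.bing.com",
    "http://www.bing.com/fun/?fabrice=pubcon",
)
print(request.full_url)

# Uncomment to actually send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Escaping the URLs matters in practice because query strings often contain `&`, which would otherwise make the XML body malformed.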
- 31. fabricecanel
@facan
Getting all content indexed lightning fast
Yoast SEO enabling Bing real-time
indexing
And soon elsewhere…
• Your sites
• Content management systems
• Web hosting
• SEO plugins
• SEO agencies
- 32. fabricecanel
@facan
Key takeaway … and the new magic
Don't forget the basics: the TOP 3 SEO guidelines table
Bing Webmaster Tools
https://www.bing.com/webmaster/
AND Bing Webmaster APIs
https://www.bing.com/webmaster/api/