fabricecanel
@facan
The Evolution of AI in Search
About me
22-year search veteran at Microsoft
Principal Program Manager leading the Bing Web Data platform team. The team discovers, selects, crawls and processes billions of new or updated web pages every day.
Prior to Bing and MSN Search, Lead
Program Manager for microsoft.com search
in the very early days of Search and SEO.
The quest for Intelligent Search
The Evolution of Bing
2009: DESTINATION EXPERIENCE
2012: PLATFORM
2017: AI PLATFORM
The Evolution of Search
Keyword-based Search: Traditional IR (AUTB streams, inverted index)
Natural Language Search: User Engagement (user clicks, metawords, inverted index)
Voice, Vision, Context-Based Search: AI (Deep Learning vectors)
Multiple Perspectives
Intelligent Answers
Visual Search
• Image similarity match
• Deep Learning vector encoding
• Approximate nearest neighbor search
• Search several hundred million images in a few milliseconds
• Significant improvement to image recall
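A minimal sketch of the similarity-match idea above, assuming each image has already been encoded into a vector by some deep learning model. The vectors, file names and the `nearest_images` helper are hypothetical. Note that this version does an exact brute-force scan; an approximate nearest neighbor index trades a little accuracy for the speed needed at hundreds of millions of images.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_images(query_vector, index, k=3):
    """Return the names of the k images whose vectors are most similar to the query.

    index maps an image name to its (hypothetical) deep learning vector.
    This is an exact scan over every entry, not an approximate search.
    """
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]
```

Usage: `nearest_images(query_vector, index, k=2)` returns the two closest image names, best match first.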
Object Detection
Microsoft Graph – Creating a more personal graph
World knowledge
Work knowledge
User knowledge
Microsoft Search and Bing
What is the goal of the Search Engines?
Discover and Index Everything
Create “the” Knowledge Graph
Super fast access to information
No. Not quite:
Provide searchers with
that they can
that provides insights
about the search queries.
Bing and Verizon Media
Bing Ads will be the exclusive platform to manage search ads across Bing and Verizon Media networks on all devices.
The Microsoft Audience Network now has access to Verizon Media owned-and-operated (O&O) properties, available via Search workflow campaigns.
Properties include Aol.com, Yahoo.com, Aol Mail, Yahoo! Mail, Huffington Post and more.
is bigger
than You think
DE: 21%
UK: 19%
US: 36%
FR: 18%
Is AI the solution to all my
Bing team challenges?
My team mission
Build for Microsoft the world's freshest, richest, cleanest, most comprehensive and most intelligent model of the web.
Bing content goals | Webmasters' SEO goals
Maximize content comprehensiveness | Have all my sites' relevant URLs indexed in search engines
Maximize content freshness | Have all my sites' newest URLs and newest updated content indexed in search engines
Maximize content intelligence | Have search engines able to extract the most useful information from each web page, making it easier for them to rank my content at the top of search results
Your SEO goals align with my team goals
My TOP 3 SEO guidelines table
How to have your URLs DISCOVERED
1. Good links to your content
2. Avoid duplicate and useless URLs
3. XML sitemaps, RSS feeds … and new abilities in upcoming slides
How to have your URLs RANKED for crawling/indexing
1. Content authority: high-quality content that differentiates itself from others
2. Content that is useful, sufficiently detailed and well presented
3. Get an audience
How to have your URLs CRAWLED
1. Allow us to access the content (allow in robots.txt, crawl quota)
2. Guide us to the new content: RSS, sitemaps with lastmod
3. Avoid craziness with JavaScript and CSS
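A quick way to check the "allow in robots.txt" point before a crawler does: a small sketch using Python's standard `urllib.robotparser`. The robots.txt text, the example.com URLs and the `is_allowed` helper are hypothetical, and a real crawler's interpretation of robots.txt may differ in details from this parser.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url, per Python's parser."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: bingbot may crawl everything except /private/,
# all other crawlers are blocked.
EXAMPLE_ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /private/

User-agent: *
Disallow: /
"""
```

Usage: `is_allowed(EXAMPLE_ROBOTS_TXT, "bingbot", "https://example.com/page")` tells you whether that URL is open to the named crawler.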
How to have your content UNDERSTOOD
1. Standard HTML is highly preferable
2. Tell us more via schema (HTML5 tags, JSON-LD, schema.org, etc.)
3. Avoid craziness with JavaScript and CSS
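For the "tell us more via schema" point, a small sketch of emitting schema.org data as a JSON-LD script tag. The `json_ld_script` helper and the article values are illustrative assumptions, not taken from the slides; which schema.org type and properties fit depends on your content.

```python
import json


def json_ld_script(data):
    """Serialize a schema.org dictionary as an HTML JSON-LD <script> element."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a string value cannot close the script element early.
    # "\/" is a valid JSON escape for "/", so the data still parses the same.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical article description using schema.org vocabulary.
EXAMPLE_ARTICLE = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "My daughter has won a prize at school",
    "datePublished": "2012-03-29",
}
```

Usage: place the string returned by `json_ld_script(EXAMPLE_ARTICLE)` inside the page's HTML so the markup travels with the content it describes.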
The challenge of Crawling
Web pages may appear, change or disappear on each web site at any moment.
Bing needs to crawl and re-crawl to keep its index up to date.
Bing needs an effective, efficient crawl scheduling policy that
1. obeys web hosts’ and Bing’s own constraints on download bandwidth
2. is efficiently computable over billions of hosts and hundreds of billions of pages
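To make the two constraints concrete, here is a toy sketch of a greedy scheduler: take the highest-priority URLs first, but never exceed a per-host limit or an overall crawler limit. This is not Bing's scheduling policy; the priorities, limits and the `schedule_crawl` helper are all assumptions for illustration.

```python
import heapq
from collections import Counter
from urllib.parse import urlsplit


def schedule_crawl(candidates, host_limits, default_host_limit, total_limit):
    """Pick URLs to crawl in this round.

    candidates: iterable of (priority, url); a higher priority is crawled first.
    host_limits: max URLs per host this round (stand-in for host bandwidth).
    default_host_limit: limit for hosts not listed in host_limits.
    total_limit: max URLs overall (stand-in for the crawler's own bandwidth).
    """
    # heapq is a min-heap, so negate the priority to pop the highest first.
    heap = [(-priority, url) for priority, url in candidates]
    heapq.heapify(heap)

    crawled_per_host = Counter()
    selected = []
    while heap and len(selected) < total_limit:
        _, url = heapq.heappop(heap)
        host = urlsplit(url).hostname
        if crawled_per_host[host] < host_limits.get(host, default_host_limit):
            crawled_per_host[host] += 1
            selected.append(url)
    return selected
```

A heap keeps each pick at logarithmic cost, which hints at why "efficiently computable" matters once the candidate list reaches billions of pages.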
Crawling challenges:
Diversity of solutions
Web sites with great SEO
• Clean URLs
• RSS and sitemaps with lastmod
• Static HTML
Web sites with bad SEO
• Duplicates, useless, junk URLs
• No sitemaps, no RSS
• Lots of JavaScript
• Limited Bingbot bandwidth
Crawl difficulty | Small web sites | Large web sites
Great SEO | EASY | INTERMEDIATE
Bad SEO | HARD | VERY HARD
Crawling challenges:
Diversity of intents
SEO, Ads people
“Bing, please crawl at full throttle!”
Web site operations
“Bing, please crawl a little, at off-peak times”
Crawling challenges:
Detect relevant content changes
<p>Video provides a powerful way to help you prove your point. When
you click Online Video, you can paste in the embed code for the video you
want to add. You can also type a keyword to search online for the video
that best fits your document. </p>
<p class="ads"><a href="/r/15vGdebnbf"><img src="soccerball.png" alt="soccer ball"/></a></p>
<p>To make your document look professionally produced, Word provides
header, footer, cover page, and text box designs that complement each
other. For example, you can add a matching cover page, header, and
sidebar. Click Insert and then choose the elements you want from the
different galleries. </p>
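In the markup above, the ad block can change on every page load while the two content paragraphs stay the same. One way to sketch "relevant change" detection is to fingerprint only the text outside ignored blocks. The `content_fingerprint` helper and the rule "skip elements with class ads" are assumptions drawn from this example; real pages need their own rules.

```python
import hashlib
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ContentText(HTMLParser):
    """Collects page text, skipping everything inside ignored-class elements."""

    def __init__(self, ignored_classes=("ads",)):
        super().__init__()
        self.ignored = set(ignored_classes)
        self.skip_depth = 0  # > 0 while inside an ignored element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.skip_depth or classes & self.ignored:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def content_fingerprint(html):
    """SHA-256 of the whitespace-normalized text outside ignored blocks."""
    parser = ContentText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Two fetches of the same page with different ads then produce the same fingerprint, while an edit to either paragraph produces a different one.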
Crawling challenges:
Webmaster hints are often not reliable
Sitemaps are great for informing search engines about links on your site...
… but a common SEO mistake is to set lastmod (last modified date) to the date and time the sitemap is generated, rather than to when the page content last changed
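A sketch of the fix for the lastmod mistake: take each lastmod from the page's own modification time, which your CMS or database would supply, and never from the clock at generation time. The `build_sitemap` helper and the sample pages are hypothetical; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, content_last_modified) pairs.

    content_last_modified must be a timezone-aware datetime recording when
    the page content actually changed, not when this function runs.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # W3C datetime format in UTC, as accepted by the sitemap protocol.
        stamp = modified.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        ET.SubElement(url, "lastmod").text = stamp
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages with their real content modification times.
EXAMPLE_PAGES = [
    ("https://example.com/", datetime(2019, 3, 29, 10, 0, tzinfo=timezone.utc)),
    ("https://example.com/about", datetime(2018, 11, 2, 8, 30, tzinfo=timezone.utc)),
]
```

Regenerating the sitemap every night is then harmless: unchanged pages keep their old lastmod, so the hint stays trustworthy.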
Bing main crawl metrics
Metric #1: Crawl effectiveness
North Star: Every page in the index is a fresh copy of its web version
Metric #2: Crawl efficiency
North Star: Crawl only updated (fresh on-page content/useful outbound links) or new URLs
“Metric” #3: Obeying politeness constraints
North Star: Never crawl more often than webmasters *want*
What if we make it super simple?
You tell us in real time about your URL changes
Transforming the paradigm
Real-time/near-real-time indexing
Day 1: content update
LAST CENTURY
Day 1, Day 2, Day 3, Day 4, Day 5
My daughter has won a prize at school.
https://answers.yahoo.com/question/...
Mar 29, 2012 · My daughter…
NEW CENTURY
Day 1
“Hey, I changed.”
“Thank you!”
My daughter has won a prize at school.
https://answers.yahoo.com/question/...
Mar 29, 2012 · My daughter…
How to tell us about new URLs
1. Get your Bing Webmaster Tools API key
https://www.bing.com/webmaster/api/
2. Submit URLs online
Example using wget
wget.exe "https://ssl.bing.com/webmaster/api.svc/pox/SubmitUrl?apikey=7737def21c404dcdaf23bea715e61436" --header="Content-Type: application/xml; charset=utf-8" --post-data="<SubmitUrl xmlns='http://schemas.datacontract.org/2004/07/Microsoft.Bing.Webmaster.Api'><siteUrl>http://www.bing.com</siteUrl><url>http://www.bing.com/fun/?fabrice=pubcon</url></SubmitUrl>"
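The same request as the wget example, sketched with Python's standard library. The endpoint, namespace and element names are copied from the slide; the `build_submit_url_request` helper is an illustrative name, and you would pass your own API key. The sketch only builds the request, so nothing is submitted until you send it.

```python
from urllib.parse import urlencode
from urllib.request import Request
from xml.sax.saxutils import escape

# Endpoint and XML namespace as shown in the wget example.
ENDPOINT = "https://ssl.bing.com/webmaster/api.svc/pox/SubmitUrl"
NAMESPACE = "http://schemas.datacontract.org/2004/07/Microsoft.Bing.Webmaster.Api"


def build_submit_url_request(api_key, site_url, url):
    """Build (but do not send) the POST request that submits one URL."""
    # escape() protects characters such as "&" that are common in URLs
    # but not allowed unescaped inside XML text.
    body = (
        f'<SubmitUrl xmlns="{NAMESPACE}">'
        f"<siteUrl>{escape(site_url)}</siteUrl>"
        f"<url>{escape(url)}</url>"
        "</SubmitUrl>"
    )
    query = urlencode({"apikey": api_key})
    return Request(
        f"{ENDPOINT}?{query}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/xml; charset=utf-8"},
        method="POST",
    )
```

To actually submit, pass the result to `urllib.request.urlopen(request)` and check the response status; calling this from your publishing workflow is what turns "content updated" into "search engine notified".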
Getting all content indexed lightning fast
Yoast SEO enabling Bing real-time indexing
And soon elsewhere…
• Your sites
• Content management systems
• Web hosting
• SEO plug-ins
• SEO agencies
Key takeaway … and the new magic
Don’t forget the basics: TOP 3 SEO guidelines table
Bing Webmaster Tools
https://www.bing.com/webmaster/
AND Bing Webmaster APIs
https://www.bing.com/webmaster/api/


Fabrice Canel - Advanced Search Summit Napa 2019
