How Search Engines Leverage
Opinion-based Articles for Ranking
Rethinking Search: Corroboration of Web Answers
Koray Tuğberk GÜBÜR
Components for Re-ranking based on Opinionated Factoids
of Web
Open Information
10 Semantic Role
Truth Ranges
Uncertain Inference
• Uncertain Inference was introduced by C. J. van
Rijsbergen of the University of Glasgow.
• Focuses on “Query Inference” with
“Context Understanding”.
• Query Path, and Query Context (Context-
Sensitive Search Elements) are used.
• Query is processed with Probable
Probabilities for Question Generation.
• It requires a “Knowledge Base” for
understanding Factual Needs for the query.
• “Uncertain facts” have a plausibility
threshold that gives “Opinions” to exist on
• Extract word sequences in News Titles.
How do Search Engines know facts?
Andrew Hogue
The Structured Search Engine
Uncertain Inference
How do Search Engines know facts?
Andrew Hogue – The Structured Search Engine
• Query Processing and Parsing is
another topic.
• But, to reach out to “wrong” and
“true” facts, the high level of
confidence and coverage are
• The Uncertain Inference follows
users’ behaviors in “Adaptive
Search”, or sometimes, it uses
“word-sequences” in a mega corpus.
• Extract Entity-Attribute Pairs and
their synonyms from News Articles.
Knowledge Base
• Different from the Knowledge Graph.
• Stores facts, or factual values for the
same entity-attribute pairs, and
• It is dynamic.
• A fact from today might be
inaccurate information tomorrow.
• Procedural Part of Knowledge Bases
helps to update the connections
between components.
• Understand which facts are
approved by the search engine.
Browsable Fact Repository
Corroboration of Web Answers
• One of the best 10 “Opinion Papers” in
Information Retrieval.
• Directly connected to the concept of
“Helpful Content”, or “Information
• “Even the main web sources have
contradicting information for the same
question; which one is the fact?”
• Corroboration of Web Answers focuses on
“Truth Ranges” and “Answer
Prominence” to choose answers from
certain sources.
• Create your own truth range by
auditing ranking resources.
How do Search Engines know facts?
Corroboration of Web Answers
• Minji Wu and Amélie Marian focus on
numeric values and measurement units to
find real authorities.
• PageRank, Source Authority, First
Answer, Closeness to First Answer and
De-duplication are used to determine a
“Fact Range”, or “Truth Range”.
• The “Truth Range” changes from today
to tomorrow according to ranking
• Use numeric values, metrics, dates, and
measurement units to have higher
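As a rough illustration of the signals listed above, the sketch below scores numeric answers extracted from ranked pages and derives a "truth range" around the best-supported value. The rank decay, the authority scores, the tolerance, and the function name are assumptions made for this example; they are not the formulas from the Wu and Marian paper or from any Google system.

```python
from collections import defaultdict

def corroborate(answers, tolerance=0.05):
    """answers: list of (value, rank, authority, content_fingerprint).

    rank is the 1-based position in the result list; authority is a
    hypothetical 0..1 source score. Returns (best_value, (low, high)).
    """
    seen = set()
    scores = defaultdict(float)
    for value, rank, authority, fingerprint in sorted(answers, key=lambda a: a[1]):
        if fingerprint in seen:            # de-duplication: copied pages add no evidence
            continue
        seen.add(fingerprint)
        scores[value] += authority / rank  # higher-ranked, more authoritative pages count more
    best = max(scores, key=scores.get)
    # Values close to the best-supported answer form the accepted range.
    close = [v for v in scores if abs(v - best) <= tolerance * abs(best)]
    return best, (min(close), max(close))

pages = [
    (8849.0, 1, 0.9, "a"),
    (8848.0, 2, 0.8, "b"),
    (8848.0, 3, 0.7, "c"),
    (8848.0, 4, 0.7, "c"),   # scraped copy of page "c", ignored
    (29029.0, 5, 0.2, "d"),  # a different unit: falls outside the range
]
best, truth_range = corroborate(pages)
```

The last row shows why stating measurement units matters: a value in a different unit cannot corroborate the others unless it is normalized first.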
How do Search Engines know facts?
Corroboration of Web Answers
• Google cited the research paper
“Corroborating Answers from
Multiple Web Sources” more than
40 times in the “Candidate Answer
Passage” patent series.
• It has been used in Featured Snippets
(Web Answers) since 2018.
• This brings us to “Embarrassment
• Use “safe” and “indirect” answers
for conflicted issues.
How do Search Engines know facts?
Embarrassment Factor
• What is Embarrassment Factor?
• Does a search engine feel shame?
• Can you make a search engine feel shame
with your bad answer, or opinion?
• What happens if a featured snippet says
“Barack Obama is a communist”, “Global
Warming is a hoax”, or “Vaccines are for
controlling your brain”?
• Let’s remember, “Truth Ranges”.
• Do not play with the patience of search
engine engineers. Do not take advantage
of fundamental NLP understanding.
How do Search Engines know facts?
Truth Ranges
• Fuzzy Logic is used.
• Not every wrong is equal.
• Some facts are more facts.
• Some opinions are accepted as consensus.
• Upper and lower limits are used to
determine “safe opinions”.
• Google created “Content Advisories” to
support “Information Consensus”.
• Stay in the consensus (reports with
descriptive news), unless it is “satiric”
(critiques with questions).
• Use “question-format” as a shield against
algorithms, if you are outside of truth
Which one is more factual?
Source: Wesley Chai
Truth Ranges
• There are two different approaches
in Linguistics for a “truth”, or “fact”.
• Words like “will”, “can”, “might”, and “may”
decrease the certainty.
• Numeric Ranges, or Sentiment
Magnitude and Direction are used.
• The middle of range is called
• Answers that fall outside of the
range are filtered out.
• Find the balance between
“precision” and “coverage” in news
titles, and intros.
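A minimal sketch of how modal words could lower a certainty score for a news title follows. The word weights and the function name are assumptions for illustration only; the slides do not say how a search engine weighs these words.

```python
import re

# Hypothetical weights: how much each modal weakens a statement.
HEDGE_WEIGHTS = {"will": 0.1, "can": 0.2, "may": 0.4, "might": 0.5}

def certainty(sentence):
    """Return a 0..1 certainty score; 1.0 means no listed modal was found."""
    words = re.findall(r"[a-z']+", sentence.lower())
    penalty = sum(HEDGE_WEIGHTS.get(w, 0.0) for w in words)
    return max(0.0, 1.0 - penalty)

titles = [
    "Central bank raises interest rates",
    "Central bank may raise interest rates",
    "Central bank might raise rates, and it may do so soon",
]
scores = [certainty(t) for t in titles]
```

Each additional modal pushes the title further from a plain declaration, which is the trade-off between "precision" and "coverage" named above.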
How do Search Engines know facts?
Truth Ranges
• According to Fuzzy Logic:
• 1 > 5 and 1 > 10 are not equally wrong.
• One of them is more wrong than the other.
• For “Disagreeing Views”,
“Corroboration” happens with
• “Barack Obama was born in Hawaii”,
• “Barack Obama was born in Kenya”.
• A search engine might see “Barack
Obama is a US Citizen” as a safe answer to
give to avoid embarrassment.
• Use absolute truths to project
a safe answer rather than giving a
possibly wrong factoid.
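The "not equally wrong" idea can be sketched with a graded truth value instead of a true/false check. The logistic membership function and its scale parameter below are one possible choice, assumed for this example only.

```python
import math

def truth_degree_greater(a, b, scale=5.0):
    """Fuzzy truth of the claim "a > b" as a value in (0, 1).

    A logistic curve over the difference is used as the membership
    function; the scale parameter is an assumption for this example.
    """
    return 1.0 / (1.0 + math.exp(-(a - b) / scale))

d_5 = truth_degree_greater(1, 5)    # claim "1 > 5"
d_10 = truth_degree_greater(1, 10)  # claim "1 > 10"
```

Both claims score below 0.5, but the second scores lower, so it is "more wrong" than the first.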
Journalists share organization’s trustworthiness
Source: Indiatimes
Source: Making Better Informed Trust Decisions with Generalized Fact-Finding
Truth Ranges
• Uncertainty is used as a measurement to filter
• Phrases like “I am sure”, or “45% possibility” create
• Intrinsic Ambiguities decrease the trust to the
• “Who claims what” is the key point for fact-finding
• Source Reliability, along with “Variance” and “Mean”
values, is used for “fixpoints”.
• Do not use “I am sure”, “Pretty sure”, “I think…”,
“In my opinion…”, “It might”, or “It may”. Tell whether
the “bomb exploded”, or not. Tell “how many
people died”, do not tell “With 45% possibility,
over 20 people…”
• Compare your numbers, names, dates and places
for an event to your competitors.
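The "who claims what" and "fixpoints" bullets can be sketched as a simple iterative fact-finder: belief in a claim comes from the trust of the sources asserting it, and trust in a source comes from the belief in its claims. This is a plain sums-style iteration written for illustration; it is not the generalized method from the cited paper, and the source names and claims are invented.

```python
def fact_finder(claims_by_source, iterations=20):
    """claims_by_source: {source: set of claims}. Returns (trust, belief)."""
    trust = {s: 1.0 for s in claims_by_source}
    belief = {}
    for _ in range(iterations):
        # Belief in a claim is the summed trust of the sources asserting it.
        belief = {}
        for source, claims in claims_by_source.items():
            for claim in claims:
                belief[claim] = belief.get(claim, 0.0) + trust[source]
        top = max(belief.values())
        belief = {c: b / top for c, b in belief.items()}
        # Trust in a source is the summed belief of the claims it makes.
        trust = {s: sum(belief[c] for c in cs) for s, cs in claims_by_source.items()}
        top = max(trust.values())
        trust = {s: t / top for s, t in trust.items()}
    return trust, belief

reports = {
    "agency_a": {"explosion at 09:10", "12 injured"},
    "agency_b": {"explosion at 09:10", "12 injured"},
    "blog_c": {"explosion at 09:10", "40 injured"},
}
trust, belief = fact_finder(reports)
```

In this toy run the number reported by two sources ends up with higher belief than the outlier, and the outlier's source loses trust, which is why comparing your numbers to competitors matters.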
“Safe Answers” are better.
Source: Making Better Informed Trust Decisions with Generalized Fact-Finding
CIUV: Collaborating Information Against Unreliable Views
Truth Ranges: Why do we need PageRank?
• Speed.
• Google and other search engines do not have time
to process the text of the documents.
• News SEO has to prioritize “indexing”.
• A News Search Engine has to serve everything in
the fastest way.
• Processing the text and checking accuracy is not
possible in seconds, minutes, or even hours and days,
when a source publishes 100,000 words a day.
• Thus, Truth Ranges is a “long-term ranking factor”
for news sources.
• Google gets angry when I give PageRank related
• Understand that, some sources are prioritized,
even if they scrape and use your original news
Groundedness - Unanimity
Source: Towards an axiomatic approach to truth discovery
Truth Ranges: Why do we need PageRank?
We guess that this news is quality…
Source: Corroborating Information from Disagreeing Views
Information Extraction (OIE)
An example of OIE
• Open Information Extraction is found
• WAVII is bought by Google for $30
• It is used to expand Google’s
Knowledge Graph.
• OIE is to extract triples, and recognize
minor entities to structure a semantic
• Extract “predicates” from news
articles. Create tuples from
“predicates, nouns, and subjects”.
• Understand which fact, or factoid is
given first, or later.
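A toy version of triple extraction is sketched below. It uses a fixed, hypothetical predicate list and a regular expression, whereas real Open Information Extraction systems rely on syntactic parsing; the sentences are examples written for this sketch.

```python
import re

# Hypothetical predicate list; a real extractor would find verbs with a parser.
PREDICATES = ["was acquired by", "acquired", "signed", "announced"]
PATTERN = re.compile(
    r"^(?P<arg1>.+?)\s+(?P<pred>" + "|".join(PREDICATES) + r")\s+(?P<arg2>.+?)\.?$"
)

def extract_triple(sentence):
    """Return (arg1, predicate, arg2) or None when no listed predicate matches."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    return match.group("arg1"), match.group("pred"), match.group("arg2")

sentences = [
    "Google acquired Wavii.",
    "The governor signed the budget bill on Friday.",
    "Markets were quiet.",
]
triples = [extract_triple(s) for s in sentences]
```

Keeping the sentence order in `triples` also preserves which fact or factoid was stated first.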
Open Information Extraction Example from the researchers.
Information Extraction (OIE): Rel-grams
Precision / Coverage
• Open Information Extraction is to
extract opinions, and facts about
certain concepts, and named entities.
• It uses “tuples” as “predicate” and
• Aggregates occurrences, standardizing
the masked sections by comparing the
different OIE iterations.
• Match “prepositions” to
“interrogative” terms.
• Use “uncertain inference” to extract
interrogative terms.
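The aggregation step can be sketched as counting which relation tuples tend to follow each other inside the same document. The window size, the generalized argument types, and the sample tuples are assumptions made for this example, not data from the Rel-grams researchers.

```python
from collections import Counter

def rel_grams(documents, window=2):
    """Count ordered pairs of relation tuples that occur within `window`
    positions of each other in the same document."""
    counts = Counter()
    for tuples in documents:
        for i, first in enumerate(tuples):
            for second in tuples[i + 1 : i + 1 + window]:
                counts[(first, second)] += 1
    return counts

# Each document is the sequence of (arg1, predicate, arg2) tuples extracted
# from it, with arguments already generalized to types (assumed input).
arrested = ("[person]", "was arrested in", "[location]")
charged = ("[person]", "was charged with", "[crime]")
pleaded = ("[person]", "pleaded guilty to", "[crime]")
docs = [
    [arrested, charged, pleaded],
    [arrested, charged],
]
counts = rel_grams(docs)
```

Pairs with high counts, such as "arrested" followed by "charged" here, are the kind of recurring pattern that aggregation surfaces.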
Information Extraction (OIE): Rel-grams
Word Connections and Sense Disambiguation
• OIE is used by Google to recognize and
understand micro entities, and knowledge on
the web.
• OIE is helpful for processing the text in the
news sources to understand latest changes in
real-world, and reflect it on the knowledge
• Open Information Extraction is different from
Information Retrieval.
• The opinions and facts of web sources are
compared to each other to determine which has
the higher groundedness.
• Update outdated facts on your website. An “X
lives in P” declaration might be wrong if “X”
is no longer alive. How many entities with a “died in”
fact still “live in” your internal knowledge base?
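The audit suggested above can be sketched as a small consistency check. The (entity, attribute, value) schema and the sample rows are hypothetical; adapt them to however your own knowledge base stores facts.

```python
def stale_residence_facts(facts):
    """facts: list of (entity, attribute, value) rows from an internal
    knowledge base (hypothetical schema). Returns "lives in" rows whose
    entity also has a "died in" row."""
    deceased = {entity for entity, attribute, _ in facts if attribute == "died in"}
    return [row for row in facts if row[1] == "lives in" and row[0] in deceased]

kb = [
    ("Person A", "lives in", "Paris"),
    ("Person A", "died in", "2021"),
    ("Person B", "lives in", "Lima"),
]
outdated = stale_residence_facts(kb)
```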
External Databases (Data Commons)
Structuring the Web
• Data Commons is aggregation of
unified databases for nearly every
topic, industry, geography and
• It is a common fact repository that
is open to all web.
• It is supported by Ramanathan V.
• It focuses on statistical data.
• Query external databases for
“statistics” to create statistic-rich
news articles.
External Databases (Data Commons)
How do Search Engines know facts?
• Google integrated Data
Commons Project to its own
• The announcement was made by
Prabhakar Raghavan.
• It helps to understand accuracy,
and authority of an information
• A trustworthy news article
propagate its trust to next news
External Databases (Data Commons)
“As we may think”
“Google is planned to be third-part of your brain”
- Sergey Brin
“Google is designed as a Star Trek Computer to
answer your needs.
It is not created for websites, it is created for users.”
- Larry Page
“They already hate Google, so what is the down-
- Craig Nevill-Manning
Semantic Role Labeling
Which news source reflected emotions?
• Word order changes, but the sentence’s
meaning stays the same.
• Same opinion can be expressed in
many different ways.
• XYZ corporation bought the stock.
• They sold the stock to XYZ corporation.
• The stock was bought by XYZ corporation.
• The purchase of the stock by XYZ
corporation ...
• The stock purchase by XYZ corporation ...
• OIE provides an aggregation for
tuples, and relational n-grams to
extract factual propositions.
• Semantic Role Labels help for
standardization based on
• Match “emotions” to “causes” with
shorter declarations, stay away from
“nested declarations”.
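The XYZ corporation sentences above can be used to sketch what role labeling standardizes: active and passive phrasings map to the same agent, predicate, and theme. The two regular expressions cover one verb only and are assumptions for illustration; real Semantic Role Labeling relies on a trained parser.

```python
import re

# Hypothetical patterns for one verb; real SRL relies on a trained parser.
ACTIVE = re.compile(r"^(?P<agent>.+?) bought (?P<theme>.+?)\.?$", re.IGNORECASE)
PASSIVE = re.compile(r"^(?P<theme>.+?) was bought by (?P<agent>.+?)\.?$", re.IGNORECASE)

def label_roles(sentence):
    """Map a sentence to {"agent", "predicate", "theme"} or None."""
    for pattern in (PASSIVE, ACTIVE):   # try the more specific pattern first
        match = pattern.match(sentence.strip())
        if match:
            return {
                "agent": match.group("agent").lower(),
                "predicate": "buy",
                "theme": match.group("theme").lower(),
            }
    return None

a = label_roles("XYZ corporation bought the stock.")
b = label_roles("The stock was bought by XYZ corporation.")
```

Both sentences produce the same role dictionary, which is the standardization that lets differently worded reports be compared.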
Semantic Role Labeling as Dependency Parsing: Exploring
Latent Tree Structures Inside Arguments
Semantic Role Labeling
Agent – Predicate - Theme
• Predicates can take multiple
• Semantic role labels are descriptions of
the semantic relation between the
predicate and its arguments.
• Semantic Roles are abstract
representations of the role that an
argument plays in the event described
by the predicate.
• Semantic Role Labeling assigns roles to
the constituents of a sentence.
• Semantic selection restrictions allow
words to impose semantic constraints on
the semantic properties of their arguments.
• Understand “patterns of human
mind”. Reflect these patterns in news
articles, according to “macro-
Semantic Role Labeling
Predicate is context.
• Let’s say, “George Bush” phrase appeared
500,000 times in the News Titles.
• Google has to categorize them according to the
news contexts.
• “Context-based Person Search” is used for this
• But, News Search Engines have to be fast.
• There is no time for processing the text.
• But, “SRL” is a quick process.
• Check Semantic Role Label of Entity, is it agent? Or, is it
• Which instrument is used?
• Which goal is mentioned?
• Which propositional structure is used?
• For the sentence “George Bush signed military
operation”, the “Relational Grams”, “Aggregated
Tuples”, and “Semantic Role Labels” help a
search engine to differentiate entities/context from
each other.
• “Grouping entities” is not enough. Group
“contexts”. “X and Love Life”, “X and Career”
have different contexts. Connections should
follow “identity” and context together. Analyze
“News Context”, more than “Entity” that
Semantic Role Labeling
How do opinions differ in phrases?
• Beyond Classification:
• It helps to see the factual information.
• It is used to differentiate opinions from
each other.
• It measures the possibility of truth.
• It understands the representation of the
web source according to its connection to
• Semantic Role Labeling is used by
semantic search engines to have
better entity associations.
• The suggested associations, or
graphs are accepted or rejected by
semantic network constructors.
• “Names in the News Title”
should match the Faces in the
News Image.
Source: Marina Santini, Brighton University
Source: Grounded Semantic Role Labeling
Question-Answer Pairs
Which evidence is correct?
• Question Generation and Answer
Pairing are NLP tasks for fact
• Question generation involves query
parsing and processing.
• Answer pairing involves dense-context
retrieval and question-answer format
• But, it is not clear which answer is
more accurate.
• Thus, Question-Answer Coverage,
Entity-oriented search and Semantic-Syntactic
Parsing are used.
• Matching entities, attributes,
queries, or phrases is not good
enough if the information is
not responsive.
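One way to decide between competing answers is to aggregate evidence across passages. The sketch below sums the scores of identical answers, a simplified "strength" style re-ranking; the scores, passage ids, and function name are invented for illustration and do not reproduce the cited paper's models.

```python
from collections import defaultdict

def rerank_by_strength(candidates):
    """candidates: list of (answer, passage_id, reader_score).

    Sums the scores of identical answers found in different passages, so an
    answer supported by several passages can outrank a single confident one.
    """
    totals = defaultdict(float)
    for answer, _passage_id, score in candidates:
        totals[answer.strip().lower()] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

candidates = [
    ("1969", "p1", 0.40),
    ("1968", "p2", 0.55),
    ("1969", "p3", 0.35),
]
ranking = rerank_by_strength(candidates)
```

Here the answer backed by two passages wins even though a different answer had the single highest score.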
Source: Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering
Information Literacy - Consensus
Who said it?
• Google has started to provide education on
Information Literacy.
• It involves evaluating the information source
before the information on the source.
• Google ranks News Sources for certain
topics, contexts and entities before ranking
the news.
• The need of “fast indexing and serving” will
always be more important than
understanding the “truth” at the first stage.
• Thus, the quality news sources have higher
accuracy with more historical data, and
• Google has to assume that truth comes
from the strength of repeated evidence from
the most authoritative sources.
• Audit the “About the source” panels of your
competitors, and build a review and third-
party mention gap analysis.
Information Literacy - Consensus
Author Authority?
• Danny Sullivan once asked Google
and Bing whether they use social
signals, or author names to
understand who is the real expert on
a topic.
• Both of the search engines said that
they audit “author quality” and
“author expertise” for different
• Associate authoritative authors
with your web source stronger, if
they are writing for multiple web
Information Literacy - Consensus
How do they use Knowledge Base?
Integrating Knowledge Graph and Natural Text for Language
Model Pre-training
• There are hundreds of different
algorithms to understand the
authenticity and “true facts”.
• For a search engine engineer,
there is no “lie” and “fact”.
• It is only “true facts” and “wrong
• KELM-like algorithms work
together to differentiate them
from each other.
• Query “Google Knowledge Graph
API” to understand what they
state for the same entity.
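A minimal sketch of querying the Google Knowledge Graph Search API follows. The endpoint and the `query`, `key`, and `limit` parameters are part of the public API; the helper names are made up for this example, and the response fields read here (`itemListElement`, `result`, `resultScore`) should be checked against the current API reference before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"

def build_url(query, api_key, limit=3):
    """Build a Knowledge Graph Search API request URL."""
    return ENDPOINT + "?" + urlencode({"query": query, "key": api_key, "limit": limit})

def summarize(response):
    """Pull (name, description, score) out of a parsed API response."""
    rows = []
    for element in response.get("itemListElement", []):
        result = element.get("result", {})
        rows.append((result.get("name"), result.get("description"), element.get("resultScore")))
    return rows

def lookup(query, api_key):
    """Network call; requires a valid API key."""
    with urlopen(build_url(query, api_key)) as handle:
        return summarize(json.load(handle))
```

Calling `lookup("Joe Biden", "YOUR_API_KEY")` with a real key returns the entity names and descriptions the API states, which can then be compared with what your own articles declare.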
Information Literacy - Cues
What makes you trustworthy?
• The research that Google cites
mentioned that there are “6 Cues for
snap judgments about whom to trust”.
• These involve “images”, “brands”,
“headlines – tonality”, “social cues”,
“sponsors”, and “interactivity”.
• Google works with MediaWise to
perform surveys and integrate findings
to their own algorithms.
• Create your own “audit templates” for
news articles for these 6 different
verticals. Mark up “MediaWise”.
Information Literacy – About this source
Why does your opinion matter?
• The story of “Web Answers” is
too long.
• Context-terms, Topical Entries,
Candidate Answer Passages,
Context-scoring for Candidate
Answer Passages, and many
more concepts…
• Google Product Manager calls
these “word callouts”.
• Search Engine Engineers call
them “representative answer”.
• Learn NLP. Scoring Candidate Answer Passages
Some Google Designs
Machine learning to identify opinions in documents
• Identifying opinionated portions in documents
• Relating opinionated portions inside the document
and/or across other documents (e.g., that relate to
the same story)
• To surface opinionated snippets or quotes to users
of a news aggregation.
• To identify portions of a document that convey
• Google might rank a source for “report”, but not
for “opinion”. Understand which vertical has a
higher chance for your web source.
Some Google Designs
System and method for supporting editorial opinion in the
ranking of search results
“Editorial opinion” without “distorting facts” helps you rank,
especially for “first-person” experience stories, or reviews.
Some Google Designs
Embedded communication of link information
“Information in the improved link tags may allow one or more publishers of content and/or
documents to convey opinions about content and/or documents at one or more content
locations and/or one or more document locations. The link tags may also allow one or more
publishers to convey a weighting of the relative importance of one or more content locations
and/or one or more document locations. In some embodiment, at least a portion of the
information in the improved link tags may be encrypted, to allow one or more publishers to
restrict the audience that may view the information in the link tags….. The improved link tags may
allow the publishers to communicate additional information, such as opinions, about the content locations
and/or document locations.”
Categorize boilerplate/main content links according to their context.
“Joe Biden and Congress” might have a different “block-link” than “Joe Biden and Elections”.
Some Google Designs
Aspect-Based Sentiment Summarization
Use “key-points” with “sentiments” to summarize essence of news stories.
Topicality and Context Filters
Long- and Short-Term Solutions for SERP Construction in News Vertical
Short-term Solutions for News Search Engines:
• Classify authoritative sources (PageRank,
Article Count, Unique Sentence Count,
Publication Frequency, Length, Citations,
Search Behaviors).
• Rank authoritative sources for different
• Classify and rank news web pages according
to their context, and topicality.
• Serve the most relevant news articles based
on trust and confidence.
Long-term Solutions for News Search Engines:
• Process text.
• Understand facts.
• Audit accuracy and comprehensiveness.
• Filter the sources, by re-assigning topical
relevance and authority.
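The short-term approach can be sketched as "filter by source authority and topic, then order by authority", with no text processing at all. The authority values, the threshold, and the record layout are hypothetical and chosen only to illustrate the idea.

```python
def rank_articles(articles, source_authority, topic, min_authority=0.5):
    """articles: list of dicts with "source", "topics" and "title" keys.

    Keeps articles on the requested topic from sources above the threshold
    and orders them by source authority, without reading the article text.
    """
    eligible = [
        a for a in articles
        if source_authority.get(a["source"], 0.0) >= min_authority and topic in a["topics"]
    ]
    return sorted(eligible, key=lambda a: source_authority[a["source"]], reverse=True)

authority = {"daily_one": 0.9, "daily_two": 0.6, "new_blog": 0.2}
articles = [
    {"source": "new_blog", "topics": {"elections"}, "title": "Original scoop"},
    {"source": "daily_two", "topics": {"elections"}, "title": "Election update"},
    {"source": "daily_one", "topics": {"elections"}, "title": "Election report"},
    {"source": "daily_one", "topics": {"sports"}, "title": "Cup final"},
]
ranked = [a["title"] for a in rank_articles(articles, authority, "elections")]
```

The low-authority source is dropped even though its article came first, which mirrors the point that some sources are prioritized before any accuracy check happens.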
Samples from News SEO with Factoids
Some Samples
What would you do if you were Google?
Which opinions should rank?

Smx toronto adv-kw-research-final
Smx toronto adv-kw-research-finalSmx toronto adv-kw-research-final
Smx toronto adv-kw-research-final
Enterprise Search and Findability in 2013
Enterprise Search and Findability in 2013Enterprise Search and Findability in 2013
Enterprise Search and Findability in 2013
Evolution of Search
Evolution of SearchEvolution of Search
Evolution of Search
Designing Big Content - Search Exchange 2013
Designing Big Content - Search Exchange 2013Designing Big Content - Search Exchange 2013
Designing Big Content - Search Exchange 2013
Evaluating Webpages
Evaluating WebpagesEvaluating Webpages
Evaluating Webpages
CSI: Clinical Site Intelligence
CSI: Clinical Site IntelligenceCSI: Clinical Site Intelligence
CSI: Clinical Site Intelligence
Class 1-become-an-online-sleuth
Class 1-become-an-online-sleuthClass 1-become-an-online-sleuth
Class 1-become-an-online-sleuth
What IA, UX and SEO Can Learn from Each Other
What IA, UX and SEO Can Learn from Each OtherWhat IA, UX and SEO Can Learn from Each Other
What IA, UX and SEO Can Learn from Each Other
Bearish SEO: Defining the User Experience for Google’s Panda Search Landscape
Bearish SEO: Defining the User Experience for Google’s Panda Search LandscapeBearish SEO: Defining the User Experience for Google’s Panda Search Landscape
Bearish SEO: Defining the User Experience for Google’s Panda Search Landscape

Recently uploaded

10th International Conference on Networks, Mobile Communications and Telema...
10th International Conference on Networks, Mobile Communications and   Telema...10th International Conference on Networks, Mobile Communications and   Telema...
10th International Conference on Networks, Mobile Communications and Telema...
Jarren Duran Fuck EM T shirts Jarren Duran Fuck EM T shirts
Jarren Duran Fuck EM T shirts Jarren Duran Fuck EM T shirtsJarren Duran Fuck EM T shirts Jarren Duran Fuck EM T shirts
Jarren Duran Fuck EM T shirts Jarren Duran Fuck EM T shirts
Book dating , international dating phgra
Book dating , international dating phgraBook dating , international dating phgra
Book dating , international dating phgra
very nice project on internet class 10.pptx
very nice project on internet class 10.pptxvery nice project on internet class 10.pptx
very nice project on internet class 10.pptx
seo proposal | Kiyado Innovations LLP pdf
seo proposal | Kiyado Innovations LLP  pdfseo proposal | Kiyado Innovations LLP  pdf
seo proposal | Kiyado Innovations LLP pdf
Founders Of Digital World Social Media..
Founders Of Digital World Social Media..Founders Of Digital World Social Media..
Founders Of Digital World Social Media..
jom pom
University of Otago degree offer diploma Transcript
University of Otago degree offer diploma TranscriptUniversity of Otago degree offer diploma Transcript
University of Otago degree offer diploma Transcript
Massey University degree offer diploma Transcript
Massey University degree offer diploma TranscriptMassey University degree offer diploma Transcript
Massey University degree offer diploma Transcript
Carrington degree offer diploma Transcript
Carrington degree offer diploma TranscriptCarrington degree offer diploma Transcript
Carrington degree offer diploma Transcript
About Alibaba company and brief general information regarding how to trade on...
About Alibaba company and brief general information regarding how to trade on...About Alibaba company and brief general information regarding how to trade on...
About Alibaba company and brief general information regarding how to trade on...
Erkinjon Erkinov
How to Choose the Right UIUX Design Service for Optimal Customer Experience
How to Choose the Right UIUX Design Service for Optimal Customer ExperienceHow to Choose the Right UIUX Design Service for Optimal Customer Experience
How to Choose the Right UIUX Design Service for Optimal Customer Experience
Serva AppLabs

Recently uploaded (20)

10th International Conference on Networks, Mobile Communications and Telema...
10th International Conference on Networks, Mobile Communications and   Telema...10th International Conference on Networks, Mobile Communications and   Telema...
10th International Conference on Networks, Mobile Communications and Telema...
Jarren Duran Fuck EM T shirts Jarren Duran Fuck EM T shirts
Jarren Duran Fuck EM T shirts Jarren Duran Fuck EM T shirtsJarren Duran Fuck EM T shirts Jarren Duran Fuck EM T shirts
Jarren Duran Fuck EM T shirts Jarren Duran Fuck EM T shirts
Book dating , international dating phgra
Book dating , international dating phgraBook dating , international dating phgra
Book dating , international dating phgra
very nice project on internet class 10.pptx
very nice project on internet class 10.pptxvery nice project on internet class 10.pptx
very nice project on internet class 10.pptx
seo proposal | Kiyado Innovations LLP pdf
seo proposal | Kiyado Innovations LLP  pdfseo proposal | Kiyado Innovations LLP  pdf
seo proposal | Kiyado Innovations LLP pdf
Founders Of Digital World Social Media..
Founders Of Digital World Social Media..Founders Of Digital World Social Media..
Founders Of Digital World Social Media..
University of Otago degree offer diploma Transcript
University of Otago degree offer diploma TranscriptUniversity of Otago degree offer diploma Transcript
University of Otago degree offer diploma Transcript
Massey University degree offer diploma Transcript
Massey University degree offer diploma TranscriptMassey University degree offer diploma Transcript
Massey University degree offer diploma Transcript
Carrington degree offer diploma Transcript
Carrington degree offer diploma TranscriptCarrington degree offer diploma Transcript
Carrington degree offer diploma Transcript
About Alibaba company and brief general information regarding how to trade on...
About Alibaba company and brief general information regarding how to trade on...About Alibaba company and brief general information regarding how to trade on...
About Alibaba company and brief general information regarding how to trade on...
How to Choose the Right UIUX Design Service for Optimal Customer Experience
How to Choose the Right UIUX Design Service for Optimal Customer ExperienceHow to Choose the Right UIUX Design Service for Optimal Customer Experience
How to Choose the Right UIUX Design Service for Optimal Customer Experience

Opinion-based Article Ranking for Information Retrieval Systems: Factoids and Facts

  • 1. How Search Engines Leverage Opinion-based Articles for Ranking. Rethinking Search: Corroboration of Web Answers. Koray Tuğberk GÜBÜR
  • 2. Components for Re-ranking based on Opinionated Factoids • Uncertain Inference • Knowledge Base • Corroboration of Web Answers • Embarrassment Factor • Truth Ranges • Open Information Extraction • External Databases • Semantic Role Labeling • Evidence Aggregation • Information Literacy
  • 3. Uncertain Inference • Uncertain Inference was introduced by C. J. van Rijsbergen of the University of Glasgow. • It focuses on “Query Inference” with “Context Understanding”. • Query Path and Query Context (Context-Sensitive Search Elements) are used. • The query is processed with Probable Probabilities for Question Generation. • It requires a “Knowledge Base” for understanding the factual needs behind the query. • “Uncertain facts” have a plausibility threshold that allows “Opinions” to exist on results. • Extract word sequences in News Titles. How do Search Engines know facts? Andrew Hogue, The Structured Search Engine
  • 4. Uncertain Inference How do Search Engines know facts? Andrew Hogue, The Structured Search Engine • Query Processing and Parsing is another topic. • But to separate “wrong” facts from “true” facts, a high level of confidence and coverage is needed. • Uncertain Inference follows users’ behaviors in “Adaptive Search”, or sometimes it uses “word-sequences” in a mega corpus. • Extract Entity-Attribute Pairs and their synonyms from News Articles.
  • 5. Knowledge Base • Different from a Knowledge Graph. • Stores facts, or factual values, for the same entity-attribute pairs and triples. • It is dynamic. • A fact from today might be inaccurate information tomorrow. • The procedural part of a Knowledge Base helps to update the connections between components. • Understand which facts are approved by the search engine. Browsable Fact Repository
  • 6. Corroboration of Web Answers • One of the 10 best “Opinion Papers” in Information Retrieval. • Directly connected to the concept of “Helpful Content”, or “Information Responsiveness”. • “Even the main web sources have contradicting information for the same question; which one is the fact?” • Corroboration of Web Answers focuses on “Truth Ranges” and “Answer Prominence” to choose answers from certain sources. • Create your own truth range by auditing ranking resources.
  • 7. Corroboration of Web Answers • Minji Wu and Amélie Marian focus on numeric values and measurement units to find real authorities. • PageRank, Source Authority, First Answer, Closeness to First Answer and De-duplication are used to determine a “Fact Range”, or “Truth Range”. • The “Truth Range” changes from today to tomorrow according to the ranking sources. • Use numeric values, metrics, dates, and measurement units to have higher precision.
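The corroboration idea on this slide can be illustrated with a small sketch. It is not the scoring function from the Wu and Marian paper: the `Answer` fields, the 5% tolerance, and the "authority divided by rank" evidence score are all assumptions chosen for illustration. Numeric answers that agree with each other are grouped, each group collects evidence from its sources, and the best group defines a truth range.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    value: float          # numeric answer extracted from a page
    source_weight: float  # hypothetical authority score of the page (PageRank-like)
    rank: int             # position of the page in the result list, 1 = first


def corroborate(answers, tolerance=0.05):
    """Group answers whose value lies within `tolerance` (relative) of the
    first member of a group, then return (low, high, score) of the best group."""
    groups = []
    for ans in sorted(answers, key=lambda a: a.value):
        for group in groups:
            anchor = group[0].value
            if abs(ans.value - anchor) <= tolerance * abs(anchor):
                group.append(ans)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append([ans])

    def score(group):
        # Assumed evidence model: authority counts more for early results.
        return sum(a.source_weight / a.rank for a in group)

    best = max(groups, key=score)
    return min(a.value for a in best), max(a.value for a in best), score(best)


# Made-up candidate answers for a "height in meters" style question.
answers = [
    Answer(8848.0, 0.9, 1),
    Answer(8849.0, 0.8, 2),
    Answer(8850.0, 0.4, 5),
    Answer(29029.0, 0.3, 3),  # same fact in another unit: lands in its own group
    Answer(8000.0, 0.1, 9),
]
low, high, evidence = corroborate(answers)
print(low, high, round(evidence, 2))
```

The unit mismatch in the sample data shows why the slide recommends stating measurement units explicitly: a value in a different unit cannot corroborate the main group.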
  • 8. Corroboration of Web Answers • Google cited the research paper “Corroborating Answers from Multiple Web Sources” more than 40 times in its “Candidate Answer Passage” patent series. • It has been used in Featured Snippets (Web Answers) since 2018. • This brings us to the “Embarrassment Factor”. • Use “safe” and “indirect” answers for conflicted issues.
  • 9. Embarrassment Factor • What is the Embarrassment Factor? • Can a search engine feel shame? • Can you make a search engine feel shame with your bad answer or opinion? • What happens if a featured snippet says “Barack Obama is a communist”, “Global Warming is a hoax”, or “Vaccines are for controlling your brain”? • Let’s remember “Truth Ranges”. • Do not play with the patience of search engine engineers. Do not take advantage of fundamental NLP understanding.
  • 10. Truth Ranges • Fuzzy Logic is used. • Not every wrong is equal. • Some facts are more factual than others. • Some opinions are accepted as consensus. • Upper and lower limits are used to determine “safe opinions”. • Google created “Content Advisories” to support “Information Consensus”. • Stay within the consensus (reports with descriptive news), unless the piece is “satirical” (critiques with questions). • Use the “question format” as a shield against algorithms if you are outside of the truth ranges. Which one is more factual? Source: Wesley Chai
  • 11. Truth Ranges • There are two different approaches in Linguistics for a “truth”, or “fact”. • Words like “will”, “can”, “might”, and “may” decrease the certainty. • Numeric Ranges, or Sentiment Magnitude and Direction, are used. • The middle of the range is called the “Fixpoint”. • Answers that fall outside of the range are filtered out. • Find the balance between “precision” and “coverage” in news titles and intros.
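A toy sketch of the two mechanisms on this slide: modal words lowering certainty, and answers outside a range being filtered around a fixpoint. The hedge list comes from the slide; the `1 / (1 + hedges)` formula and both function names are assumptions for illustration, not a documented search engine method.

```python
import re

# Modal words named on the slide as certainty-reducing.
HEDGES = {"will", "can", "might", "may"}


def certainty(sentence):
    """Return a score in (0, 1]; every hedge word lowers it (assumed formula)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    hedges = sum(token in HEDGES for token in tokens)
    return 1.0 / (1 + hedges)


def filter_by_range(values, low, high):
    """Return the fixpoint (middle of the range) and the values inside the range."""
    fixpoint = (low + high) / 2
    kept = [v for v in values if low <= v <= high]
    return fixpoint, kept


print(certainty("The bomb exploded."))
print(certainty("The bomb may have exploded and might cause damage."))
print(filter_by_range([8000, 8848, 8850, 29029], 8848, 8850))
```

A declarative title keeps the full score, while each modal verb pushes the sentence further from a "safe" factual statement.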
  • 12. Truth Ranges • According to Fuzzy Logic: • 1 > 5 and 1 > 10 are not equally wrong. • One of them is more wrong than the other. • For “Disagreeing Views”, “Corroboration” happens with inference. • “Barack Obama was born in Hawaii”, • “Barack Obama was born in Kenya”. • A search engine might see “Barack Obama is a US Citizen” as a safe answer to give to avoid embarrassment. • Use absolute truths to project a safe answer rather than giving a possibly wrong factoid. Journalists share their organization’s trustworthiness. Source: Indiatimes Source: Making Better Informed Trust Decisions with Generalized Fact-Finding
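The claim that "1 > 5" and "1 > 10" are not equally wrong can be made concrete with a graded truth value. The linear membership function and the `scale` parameter below are assumptions picked for simplicity; fuzzy logic permits many other membership functions.

```python
def truth_degree_greater(a, b, scale=10.0):
    """Fuzzy truth of the claim 'a > b', clamped to [0, 1].

    0.5 means a equals b; the further a falls below b, the closer to 0.
    `scale` (an assumed parameter) sets how fast the degree saturates.
    """
    degree = 0.5 + (a - b) / (2 * scale)
    return max(0.0, min(1.0, degree))


def wrongness(a, b, scale=10.0):
    """Complement of the truth degree: how wrong the claim 'a > b' is."""
    return 1.0 - truth_degree_greater(a, b, scale)


# Both claims are false in classical logic, but one is further from the truth.
print(wrongness(1, 5), wrongness(1, 10))
```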
  • 13. Truth Ranges • Uncertainty is used as a measurement to filter factoids. • Phrases like “I am sure”, or “45% possibility” create uncertainty. • Intrinsic ambiguities decrease the trust in the source. • “Who claims what” is the key point for fact-finding algorithms. • Source reliability, together with “Variance” and “Mean” values, is used for “fixpoints”. • Do not use “I am sure”, “Pretty sure”, “I think…”, “In my opinion…”, “It might”, or “It may”. Tell whether the “bomb exploded” or not. Tell “how many people died”; do not tell “With 45% possibility, over 20 people…” • Compare your numbers, names, dates and places for an event to your competitors. “Safe Answers” are better. Source: Making Better Informed Trust Decisions with Generalized Fact-Finding CIUV: Collaborating Information Against Unreliable Views
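One way to picture "source reliability plus mean and variance" is a reliability-weighted mean as the fixpoint, with the weighted standard deviation as the width of the accepted range. This is an illustrative sketch, not the algorithm of the cited fact-finding papers; the reliability weights, the casualty figures, and the `k` multiplier are made up.

```python
import math


def weighted_fixpoint(claims):
    """claims: list of (value, reliability) pairs, reliability > 0.

    Returns the reliability-weighted mean and variance of the claimed values.
    """
    total = sum(weight for _, weight in claims)
    mean = sum(value * weight for value, weight in claims) / total
    variance = sum(weight * (value - mean) ** 2 for value, weight in claims) / total
    return mean, variance


def within_range(value, mean, variance, k=1.0):
    """True when the value lies within k weighted standard deviations of the mean."""
    return abs(value - mean) <= k * math.sqrt(variance)


# Hypothetical casualty counts reported by four sources of differing reliability.
claims = [(20, 0.9), (22, 0.8), (21, 0.7), (45, 0.1)]
mean, variance = weighted_fixpoint(claims)
for value, _ in claims:
    print(value, within_range(value, mean, variance))
```

The outlier from the least reliable source barely moves the fixpoint and ends up outside the range, which is the behavior the slide describes as filtering factoids.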
  • 14. Truth Ranges: Why do we need PageRank? • Speed. • Google and other search engines do not have time to process the text of the documents. • News SEO has to prioritize “indexing”. • A News Search Engine has to serve everything in the fastest way. • Processing the text and checking its accuracy is not possible in seconds, minutes, or even hours and days, when a source publishes 100,000 words a day. • Thus, Truth Ranges are a “long-term ranking factor” for news sources. • Google gets angry when I give PageRank-related suggestions. • Understand that some sources are prioritized, even if they scrape and use your original news story. Groundedness - Unanimity Source: Towards an axiomatic approach to truth discovery
  • 15. Truth Ranges: Why do we need PageRank? We guess that this news is quality… Source: Corroborating Information from Disagreeing Views
  • 16. Open Information Extraction (OIE) An example of OIE • Open Information Extraction was introduced by WAVII. • WAVII was bought by Google for $30 million. • It is used to expand Google’s Knowledge Graph. • OIE extracts triples and recognizes minor entities to structure a semantic network. • Extract “predicates” from news articles. Create tuples from “predicates, nouns, and subjects”. • Understand which fact, or factoid, is given first or later. Open Information Extraction Example from the researchers.
  • 17. Open Information Extraction (OIE): Rel-grams Precision / Coverage • Open Information Extraction extracts opinions and facts about certain concepts and named entities. • It uses “tuples” of “predicate” and “noun”. • It aggregates occurrences, standardizing the masked sections by comparing the different OIE iterations. • Match “prepositions” to “interrogative” terms. • Use “uncertain inference” to extract interrogative terms.
  • 18. Open Information Extraction (OIE): Rel-grams Word Connections and Sense Disambiguation • OIE is used by Google to recognize and understand micro entities and knowledge on the web. • OIE is helpful for processing the text in news sources to understand the latest changes in the real world and reflect them in the knowledge base. • Open Information Extraction is different from Information Retrieval. • The opinions and facts of web sources are compared to each other to find the ones with higher groundedness. • Update outdated facts on your website. An “X lives in P” declaration might be wrong if “X” is not alive anymore. How many entities with a “died in” fact still “live” in your internal knowledge base?
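The "X lives in P" audit suggested above can be sketched as a consistency check over extracted triples. The triple layout `(subject, predicate, object)` and the predicate strings `"died in"` and `"lives in"` are assumptions about how an internal knowledge base might store OIE output; the entities are placeholders.

```python
def stale_residence_facts(triples):
    """Return 'lives in' triples whose subject also has a 'died in' triple."""
    deceased = {subject for subject, predicate, _ in triples if predicate == "died in"}
    return [
        (subject, predicate, obj)
        for subject, predicate, obj in triples
        if predicate == "lives in" and subject in deceased
    ]


# Placeholder knowledge base built from (hypothetical) extracted triples.
knowledge_base = [
    ("Person X", "lives in", "City P"),
    ("Person X", "died in", "2021"),
    ("Person Y", "lives in", "City Q"),
]
print(stale_residence_facts(knowledge_base))
```

Each flagged triple marks a page or passage that should be updated or rewritten in the past tense.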
  • 19. External Databases (Data Commons) Structuring the Web • Data Commons is an aggregation of unified databases for nearly every topic, industry, geography and entity. • It is a common fact repository that is open to the whole web. • It is supported by Ramanathan V. Guha. • It focuses on statistical data. • Query external databases for “statistics” to create statistic-rich news articles.
  • 20. External Databases (Data Commons) How do Search Engines know facts? • Google integrated the Data Commons project into its own algorithms. • The announcement was made by Prabhakar Raghavan. • It helps to understand the accuracy and authority of an information source. • A trustworthy news article propagates its trust to the next news article.
  • 21. External Databases (Data Commons) “As we may think”
  • 22. External Databases (Data Commons) “As we may think” “Google is planned to be the third part of your brain” - Sergey Brin “Google is designed as a Star Trek Computer to answer your needs. It is not created for websites, it is created for users.” - Larry Page “They already hate Google, so what is the downside?” - Craig Nevill-Manning
  • 23. Semantic Role Labeling Which news source reflected emotions? • Word order changes, but the sentence’s meaning stays the same. • The same opinion can be expressed in many different ways. • XYZ corporation bought the stock. • They sold the stock to XYZ corporation. • The stock was bought by XYZ corporation. • The purchase of the stock by XYZ corporation ... • The stock purchase by XYZ corporation ... • OIE provides an aggregation of tuples and relational n-grams to extract factual propositions. • Semantic Role Labels help with standardization based on “predicates”. • Match “emotions” to “causes” with shorter declarations; stay away from “nested declarations”. Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments
  • 24. Semantic Role Labeling Agent - Predicate - Theme • Predicates can take multiple arguments. • Semantic role labels are descriptions of the semantic relation between the predicate and its arguments. • Semantic Roles are abstract representations of the role that an argument plays in the event described by the predicate. • Semantic Role Labeling assigns roles to the constituents of a sentence. • Semantic selectional restrictions allow words to place semantic constraints on the semantic properties of their arguments. • Understand “patterns of human mind”. Reflect these patterns in news articles, according to the “macro-context”.
  • 25. Semantic Role Labeling Predicate is context. • Let’s say the phrase “George Bush” appeared 500,000 times in News Titles. • Google has to categorize them according to the news contexts. • “Context-based Person Search” is used for this task. • But News Search Engines have to be fast. • There is no time for processing the text. • But “SRL” is a quick process. • Check the Semantic Role Label of the entity: is it the agent, or is it the theme? • Which instrument is used? • Which goal is mentioned? • Which propositional structure is used? • For the sentence “George Bush signed military operation”, the “Relational Grams”, “Aggregated Tuples”, and “Semantic Role Labels” help a search engine to differentiate entities and contexts from each other. • “Grouping entities” is not enough. Group “contexts”. “X and Love Life” and “X and Career” have different contexts. Connections should follow “identity” and context together. Analyze the “News Context” more than the “Entity” that appears.
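A minimal sketch of grouping by context rather than by entity alone. It assumes an upstream semantic role labeler has already produced agent, predicate, and theme for each title; the `Frame` type, the grouping key, and the sample titles are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Frame(NamedTuple):
    """One labeled proposition, assumed to come from an upstream SRL step."""
    title: str
    agent: str
    predicate: str
    theme: str


def group_by_context(frames, entity):
    """Group titles mentioning `entity` by the role it plays and by the predicate."""
    groups = defaultdict(list)
    for frame in frames:
        if frame.agent == entity:
            groups[("agent", frame.predicate)].append(frame.title)
        elif frame.theme == entity:
            groups[("theme", frame.predicate)].append(frame.title)
    return dict(groups)


# Hypothetical titles with hand-assigned roles.
frames = [
    Frame("X signs new contract", "X", "sign", "contract"),
    Frame("X signs sponsorship deal", "X", "sign", "deal"),
    Frame("Club fires X", "Club", "fire", "X"),
]
print(group_by_context(frames, "X"))
```

The same entity ends up in separate buckets depending on whether it acts or is acted upon, which is the distinction the "agent or theme" check on the slide is after.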
  • 26. Semantic Role Labeling How do opinions differ in phrases? • Beyond Classification: • It helps to see the factual information. • It is used to differentiate opinions from each other. • It measures the possibility of truth. • It understands the representation of the web source according to its connection to others. • Semantic Role Labeling is used by semantic search engines to have better entity associations. • The suggested associations, or graphs are accepted or rejected by semantic network constructors. • “Names in the News Title” should match the Faces in the News Image. Source: Marina Santini, Brighton University Source: Grounded Semantic Role Labeling
  • 27. Question-Answer Pairs Which evidence is correct? • Question Generation and Answer Pairing are NLP tasks for fact extraction. • Question generation involves query parsing and processing. • Answer pairing involves dense-context retrieval and question-answer format matching. • But it is not clear which answer is more accurate. • Thus, Question-Answer Coverage, Entity-oriented Search and Semantic-Syntactic Parsing are used. • Matching entities, attributes, queries, or phrases is not good enough as long as the information is not responsive. Source: Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering
  • 28. Information Literacy - Consensus Who said it? • Google has started to provide education on Information Literacy. • It involves recognizing the information source before the information on the source. • Google ranks News Sources for certain topics, contexts and entities before ranking the news. • The need for “fast indexing and serving” will always be more important than understanding the “truth” at the first stage. • Thus, quality news sources have higher accuracy with more historical data and PageRank. • Google has to assume that truth comes from the strength of repeated evidence from the most authoritative sources. • Audit the “About the source” panels of your competitors, and create a review and third-party mention gap analysis.
  • 29. Information Literacy - Consensus Author Authority? bing-really-count-55389 • Danny Sullivan once asked Google and Bing whether they use social signals, or author names, to understand who the real expert on a topic is. • Both search engines said that they audit “author quality” and “author expertise” for different topics. • Associate authoritative authors with your web source more strongly if they are writing for multiple web sources.
  • 30. Information Literacy - Consensus How do they use the Knowledge Base? Integrating Knowledge Graph and Natural Text for Language Model Pre-training • There are hundreds of different algorithms to understand authenticity and “true facts”. • For a search engine engineer, there is no “lie” and “fact”. • There are only “true facts” and “wrong facts”. • And KELM-like algorithms work together to differentiate them from each other. • Query the “Google Knowledge Graph API” to understand what it states for the same entity.
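A minimal sketch of preparing a Google Knowledge Graph Search API request and reading its result list. The sketch only builds the URL and parses an already-fetched JSON body, so no network call is made; `YOUR_API_KEY` is a placeholder, the helper names are hypothetical, and the sample response is hand-written to mimic the `itemListElement` layout rather than copied from real output.

```python
from urllib.parse import urlencode

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_kg_url(query, api_key, limit=3, languages="en"):
    """Build the request URL for an entity search."""
    params = {"query": query, "key": api_key, "limit": limit, "languages": languages}
    return f"{KG_ENDPOINT}?{urlencode(params)}"


def summarize(response):
    """Return (name, description, resultScore) for each entity in a parsed response."""
    rows = []
    for item in response.get("itemListElement", []):
        result = item.get("result", {})
        rows.append((result.get("name"), result.get("description"), item.get("resultScore")))
    return rows


url = build_kg_url("Barack Obama", "YOUR_API_KEY")

# Hand-written stand-in for a parsed JSON response (not real API output).
sample_response = {
    "itemListElement": [
        {"result": {"name": "Example Entity", "description": "Example description"},
         "resultScore": 123.4}
    ]
}
print(url)
print(summarize(sample_response))
```

Comparing the returned name and description with what your own pages state about the same entity is one way to carry out the audit this slide suggests.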
  • 31. Information Literacy - Cues What makes you trustworthy? • The research that Google cites mentions “6 Cues for snap judgments about whom to trust”. • These involve “images”, “brands”, “headlines (tonality)”, “social cues”, “sponsors”, and “interactivity”. • Google works with MediaWise to perform surveys and integrate the findings into its own algorithms. • Create your own “audit templates” for news articles for these 6 different verticals. Mark up “MediaWise”.
  • 32. Information Literacy – About this source Why does your opinion matter? • The story of “Web Answers” is too long. • Context-terms, Topical Entries, Candidate Answer Passages, Context-scoring for Candidate Answer Passages, and many more concepts… • A Google Product Manager calls these “word callouts”. • Search Engine Engineers call them “representative answers”. • Learn NLP. Scoring Candidate Answer Passages
  • 33. Some Google Designs Machine learning to identify opinions in documents • Identifying opinionated portions in documents. • Relating opinionated portions inside the document and/or across other documents (e.g., that relate to the same story). • To surface opinionated snippets or quotes to users of a news aggregation. • To identify portions of a document that convey opinion. • Google might rank a source for “report”, but not for “opinion”. Understand which vertical has a higher chance for your web source.
  • 34. Some Google Designs System and method for supporting editorial opinion in the ranking of search results “Editorial opinion” without “distorting facts” helps you rank, especially for “first-person” experience stories or reviews.
  • 35. Some Google Designs Embedded communication of link information “Information in the improved link tags may allow one or more publishers of content and/or documents to convey opinions about content and/or documents at one or more content locations and/or one or more document locations. The link tags may also allow one or more publishers to convey a weighting of the relative importance of one or more content locations and/or one or more document locations. In some embodiment, at least a portion of the information in the improved link tags may be encrypted, to allow one or more publishers to restrict the audience that may view the information in the link tags….. The improved link tags may allow the publishers to communicate additional information, such as opinions, about the content locations and/or document locations.” Categorize boilerplate/main content links according to their context. “Joe Biden and Congress” might have a different “block-link” than “Joe Biden and Elections”.
  • 36. Some Google Designs Aspect-Based Sentiment Summarization Use “key-points” with “sentiments” to summarize essence of news stories.
  • 37. Topicality and Context Filters Long- and Short-Term Solutions for SERP Construction in the News Vertical Short-term Solutions for News Search Engines: • Classify authoritative sources (PageRank, Article Count, Unique Sentence Count, Publication Frequency, Length, Citations, Search Behaviors). • Rank authoritative sources for different topics. • Classify and rank news web pages according to their context and topicality. • Serve the most relevant news articles based on trust and confidence. Long-term Solutions for News Search Engines: • Process text. • Understand facts. • Audit accuracy and comprehensiveness. • Filter the sources by re-assigning topical relevance and authority.
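The short-term steps above (classify sources by signals, then rank them per topic) could look roughly like the sketch below. The signal names echo the slide, but the weights, the normalization of every signal to a 0 to 1 scale, and the sample sources are invented for illustration and do not describe how any search engine weighs these factors.

```python
# Hypothetical weights over signals that are assumed to be normalized to 0..1.
WEIGHTS = {
    "pagerank": 0.4,
    "citations": 0.3,
    "publication_frequency": 0.2,
    "unique_sentence_ratio": 0.1,
}


def authority(signals):
    """Weighted sum of the available signals; missing signals count as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


def rank_sources(sources_by_topic):
    """Return, per topic, the source names ordered from most to least authoritative."""
    return {
        topic: sorted(sources, key=lambda name: authority(sources[name]), reverse=True)
        for topic, sources in sources_by_topic.items()
    }


# Made-up sources and signal values for a single topic.
sources_by_topic = {
    "elections": {
        "source_a": {"pagerank": 0.9, "citations": 0.7,
                     "publication_frequency": 0.8, "unique_sentence_ratio": 0.6},
        "source_b": {"pagerank": 0.5, "citations": 0.9,
                     "publication_frequency": 0.4, "unique_sentence_ratio": 0.9},
    }
}
print(rank_sources(sources_by_topic))
```

Because the ranking is computed per topic, the same source can lead one topic and trail in another, which matches the slide's point about ranking sources for different topics.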
  • 38. Samples from News SEO with Factoids NaturalNews
  • 39. Samples from News SEO with Factoids NaturalNews
  • 40. Samples from News SEO with Factoids Powerofpositivity
  • 41. Samples from News SEO with Factoids Powerofpositivity
  • 42. Samples from News SEO with Factoids RealClearPolitics
  • 43. Samples from News SEO with Factoids BREITBART
  • 45. What would you do if you were Google? Which opinions should rank?
  • 46. What would you do if you were Google? Which opinions should rank?
  • 47. What would you do if you were Google? Which opinions should rank?
  • 48. What would you do if you were Google? Which opinions should rank?
  • 49. What would you do if you were Google? Which opinions should rank?