FAKE REVIEW DETECTION
What are they saying about you? Are they real…
Guided By : Dr. Animesh Mukherjee
ONLINE REVIEW
• Captures testimonials of “real” people (unlike advertisements).
• Shapes decision making of customers.
• Positive reviews – financial gains and fame for businesses.
• Deceptive opinion spamming – to promote or discredit some target products and services.[1]
• Opinion spammers have admitted to being paid to write fake reviews. (Kost, 2012)
• Yelp.com – “Sting operation”: publicly shames businesses that buy fake reviews.[2]
[1] Jindal and Liu 2008 : http://www.cs.uic.edu/~liub/FBS/opinion-spam-WSDM-08.pdf
[2] Yelp official blog : http://officialblog.yelp.com/2013/05/how-yelp-protects-consumers-from-fake-reviews.html
Consumer Alerts !!
• Amazon Mechanical Turk – crowd-sourced online workers (Turkers) were paid to write fake reviews ($1 per review) portraying 20 Chicago hotels in a positive light.[1]
• 400 fake positive reviews were collected, along with 400 non-fake reviews of the same 20 hotels from TripAdvisor.com.
• Yelp’s filtered and unfiltered reviews were collected to understand how Yelp’s review classification algorithm works.
• Approach: linguistic n-gram features with a supervised learning method (a minimal sketch follows below).[2]
[1] Amazon Mechanical Turk : https://www.mturk.com/
[2] Ott et al. 2011 : https://www.cs.cornell.edu/courses/CS4740/2012sp/lectures/op_spamACL2011.pdf
Dataset Collection and Approach for Analysis
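A minimal sketch of such an n-gram classification setup, assuming a scikit-learn pipeline; the choice of logistic regression and 5-fold cross-validation are illustrative assumptions, not necessarily the exact classifier or evaluation protocol of Ott et al. (2011).

```python
# Sketch: bag-of-bigrams + supervised classifier (illustrative assumptions only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def ngram_spam_accuracy(reviews, labels, ngram_range=(2, 2)):
    """Cross-validated accuracy of an n-gram classifier.
    reviews: list of review texts; labels: 1 = fake, 0 = non-fake."""
    model = make_pipeline(
        CountVectorizer(ngram_range=ngram_range, lowercase=True),
        LogisticRegression(max_iter=1000),
    )
    return cross_val_score(model, reviews, labels, cv=5, scoring="accuracy").mean()
```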
Linguistic Approach : Results
• Using only bigram features: accuracy of 89.6% on the AMT data.[1]
• Using the same n-gram features: accuracy of 67.8% on the Yelp data.[2]
• Table 1: the class distribution of the Yelp data is skewed – imbalanced data produces a poor model. (Chawla et al., 2004)
• A good remedy for imbalanced data is undersampling (Drummond and Holte, 2003); a minimal sketch follows below.
[1] Ott et al. 2011 : https://www.cs.cornell.edu/courses/CS4740/2012sp/lectures/op_spamACL2011.pdf
[2] A Mukherjee - 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
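A minimal sketch of random undersampling of the majority (non-fake) class, in the spirit of Drummond and Holte (2003); balancing down to the size of the smallest class is an illustrative assumption.

```python
# Sketch: random undersampling to balance classes before training.
import random

def undersample(samples, labels, seed=0):
    """Return a class-balanced subset of (samples, labels)."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    n = min(len(v) for v in by_class.values())       # size of the smallest class
    pairs = [(s, y) for y, v in by_class.items() for s in rng.sample(v, n)]
    rng.shuffle(pairs)
    xs, ys = zip(*pairs)
    return list(xs), list(ys)
```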
Linguistic Approach Results : Explained
• For the AMT data, the word distributions of fake and non-fake reviews are very different, which explains the high n-gram accuracy.
• Reasons for the different word distributions: absence of domain knowledge and little incentive for writing the fake reviews ($1 per review).
• n-grams perform poorly on the Yelp data because the spammers caught by the Yelp filter used language in their fake reviews very similar to that of non-fake reviews: the two are linguistically very similar.[1]
• The ineffectiveness of linguistic features in detecting the fake reviews filtered by Yelp motivates a behavioral study of reviewers.
[1] Mukherjee et al , 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
Information Theoretic Analysis
• To explain the huge accuracy difference: analysis of the word distributions of the AMT and Yelp data.
• Good-Turing smoothed unigram language models are built for fake (F) and non-fake (N) reviews.
• The word-distribution difference across fake and non-fake reviews is computed with the Kullback-Leibler (KL) divergence (see the sketch below):

KL(F \| N) = \sum_i F(i) \log_2 \frac{F(i)}{N(i)}

where F(i) and N(i) are the respective probabilities of word i in fake and non-fake reviews.[1]
• Here KL(F||N) gives a quantitative estimate of how much fake reviews linguistically differ from non-fake reviews.
[1] Kullback Leibler Divergence : http://www.cs.buap.mx/~dpinto/research/CICLing07_1/Pinto06c/node2.html
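A minimal sketch of the KL-divergence computation between the fake (F) and non-fake (N) unigram models; add-one (Laplace) smoothing is used here as a simple stand-in for the Good-Turing smoothing used in the cited work.

```python
# Sketch: smoothed unigram models and KL(F||N) (add-one smoothing as a stand-in).
import math
from collections import Counter

def unigram_model(docs, vocab, alpha=1.0):
    """Smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(w for doc in docs for w in doc.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(P, Q):
    """KL(P || Q) = sum_i P(i) * log2(P(i) / Q(i)) over a shared vocabulary."""
    return sum(P[w] * math.log2(P[w] / Q[w]) for w in P)

def fake_vs_nonfake_kl(fake_docs, nonfake_docs):
    vocab = {w for doc in fake_docs + nonfake_docs for w in doc.lower().split()}
    F = unigram_model(fake_docs, vocab)
    N = unigram_model(nonfake_docs, vocab)
    return kl_divergence(F, N), kl_divergence(N, F)
```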
Information Theoretic Analysis (Cont..)
• KL divergence is asymmetric: in general, KL(F \| N) \neq KL(N \| F).
• We define \Delta KL = KL(F \| N) - KL(N \| F).
• For the AMT data: KL(F \| N) \approx KL(N \| F), i.e., \Delta KL \approx 0.
• For the Yelp data: KL(F \| N) \gg KL(N \| F), i.e., \Delta KL \gg 0. [1]
[1] Mukherjee et al , 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
Information Theoretic Analysis (Cont..)
• The definition of KL(F||N) implies that words having high probability in F and very low probability in N contribute most to the KL divergence.
• To study the word-wise contribution to ∆KL, a per-word ∆KL is calculated as (a minimal code sketch follows after the AMT analysis below):

\Delta KL_{word_i} = KL_{word_i}(F \| N) - KL_{word_i}(N \| F)

where

KL_{word_i}(F \| N) = F(i) \log_2 \frac{F(i)}{N(i)}

• The contribution of the top k words to ∆KL is examined for k = 200 and k = 300.
Turkers didn’t do a good job at “Faking”!
Figure: Word-wise difference of KL divergence across the top 200 words. Equally dense: |E| = |G|.
• The symmetric distribution of ∆KL_word_i over the top k words for the AMT data implies the existence of two sets of words:
1. a set of words E appearing more often in fake reviews than in non-fake ones, ∀i ∈ E, F(i) > N(i), resulting in ∆KL_word_i > 0;
2. a set of words G appearing more often in non-fake reviews than in fake ones, ∀i ∈ G, N(i) > F(i), resulting in ∆KL_word_i < 0.
• Additionally, the top k = 200 words contribute only 20% to ∆KL for the AMT data: there are many words in the AMT data with higher probabilities in fake than in non-fake reviews, and vice versa.
• This implies that fake and non-fake reviews in the AMT data consist of words with very different frequencies. Turkers didn’t do a good job at faking.[1]
[1] Mukherjee et al , 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
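A minimal sketch of the per-word ∆KL computation used in the analysis above and in the Yelp analysis that follows, plus a sign-based partition of the top contributing words; it reuses the smoothed unigram models F and N from the earlier KL sketch and is an illustration, not the authors' implementation.

```python
# Sketch: per-word DeltaKL contributions, top-k share, and a sign-based partition.
import math

def word_delta_kl(F, N):
    """{word: DeltaKL_word_i}, where
    DeltaKL_word_i = F(i)*log2(F(i)/N(i)) - N(i)*log2(N(i)/F(i))."""
    return {w: F[w] * math.log2(F[w] / N[w]) - N[w] * math.log2(N[w] / F[w])
            for w in F}

def top_k_words(per_word, k=200):
    """Top-k words by absolute contribution to DeltaKL."""
    return sorted(per_word.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]

def top_k_share(per_word, k=200):
    """(sum of the top-k contributions, total DeltaKL); their ratio is the
    share of DeltaKL explained by the top k words."""
    top = top_k_words(per_word, k)
    return sum(v for _, v in top), sum(per_word.values())

def partition_by_sign(per_word, k=200):
    """Split the top-k words into those overused in fake reviews
    (DeltaKL_word_i > 0, i.e. F(i) > N(i)) and those more frequent in
    non-fake reviews (DeltaKL_word_i < 0)."""
    top = top_k_words(per_word, k)
    return [w for w, v in top if v > 0], [w for w, v in top if v < 0]
```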
Yelp Spammers are Smart but Overdid “Faking”!
• The Yelp fake-review data shows that KL(F||N) is much larger than KL(N||F), with ∆KL > 1.
• The plots (panels b–e) show that, among the top 200 words, which contribute the major share (≈70%) of ∆KL, most words have ∆KL_word_i > 0 and only a few have ∆KL_word_i < 0.
Figure: Word-wise difference of KL divergence across the top 200 words.[1]
[1] Mukherjee et al , 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
Yelp Spammers are Smart but Overdid “Faking”! (Cont..)
• Let A be the set of top words contributing most to ∆KL. We partition A as A = A_F ∪ A_N, with A_F ∩ A_N = ∅, where

A_F = \{\, i \mid \Delta KL_{word_i} > 0,\ \text{i.e., } i \in A \wedge F(i) > N(i) \,\}
A_N = \{\, i \mid \Delta KL_{word_i} < 0,\ \text{i.e., } i \in A \wedge N(i) > F(i) \,\}

• The curve above y = 0 is dense and the one below it is sparse, which implies |A_F| >> |A_N|.
• |A_F| >> |A_N| clearly indicates that there exist specific words which contribute most to ∆KL by appearing in fake reviews with much higher frequencies than in non-fake reviews.
• Spammers made a smart effort to ensure that their fake reviews mostly use words that also appear in non-fake reviews, in order to sound convincing.
Yelp Spammers are Smart but Overdid “Faking”! (Cont..)
• While making their reviews sound convincing, they psychologically happened to OVERUSE some words, resulting in a higher frequency of certain words in fake reviews than in non-fake reviews.
• A quick lookup yields {us, price, stay, feel, deal, comfort} in the hotel domain and {options, went, seat, helpful, overall, serve, amount, etc.} in restaurants.
• Prior personality work has shown that deception/lying usually involves more use of personal pronouns (e.g., us) and associated actions (e.g., went, feel) towards specific targets (e.g., option, price, stay), with the objective of incorrect projection (lying or faking).[1]
• Spammers caught by Yelp left behind linguistic footprints that can be caught by a precise behavioral study.
[1] Newman et al , 2003 : http://www.communicationcache.com/uploads/1/0/8/8/10887248/lying_words-_predicting_deception_from_linguistic_styles.pdf
Spamming Behavior Analysis
1. Maximum Number of Reviews (MNR): Writing too many reviews in a day is abnormal.
2. Percentage of Positive Reviews (PR): The deception words found in fake reviews indicate projection in a positive light. The CDF of positive (4-5 star) reviews among all reviews is plotted to illustrate the analysis.
3. Review Length (RL): When writing up fake experiences, there is probably not much to write, and the spammer does not want to spend too much time on it.
4. Maximum Content Similarity (MCS): To examine whether some posted reviews are similar to previous ones, the cosine similarity between pairs of reviews by the same reviewer is computed. Non-spammers mostly write new content. (A minimal sketch of these four signals follows after the CDF plots below.)
Spamming Behavior Analysis CDFs
Figure: CDFs of Maximum Number of Reviews, Percentage of Positive Reviews, Review Length, and Maximum Content Similarity.
[1] Mukherjee et al , 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
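A minimal sketch of the four behavioral signals per reviewer. The review schema (dicts with 'text', 'rating' on a 1-5 star scale, and 'date' as a datetime.date) is an illustrative assumption, not the authors' data format.

```python
# Sketch: MNR, PR, RL and MCS for one reviewer's list of reviews.
import math
from collections import Counter

def _cosine(a, b):
    """Cosine similarity between two bag-of-words texts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def behavioral_signals(reviews):
    mnr = max(Counter(r['date'] for r in reviews).values())          # Maximum Number of Reviews in a day
    pr = sum(r['rating'] >= 4 for r in reviews) / len(reviews)       # Percentage of Positive (4-5 star) reviews
    rl = sum(len(r['text'].split()) for r in reviews) / len(reviews) # average Review Length in words
    mcs = max((_cosine(a['text'], b['text'])                         # Maximum Content Similarity
               for i, a in enumerate(reviews) for b in reviews[i + 1:]), default=0.0)
    return {'MNR': mnr, 'PR': pr, 'RL': rl, 'MCS': mcs}
```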
Challenges with Supervised Evaluation
• It is very difficult to find gold-standard data of fake and non-fake reviews for model building – it is too hard to manually recognize/label fake vs. non-fake reviews by mere reading.
• Duplicates and near-duplicates are assumed to be fake, which is unreliable.
• Manually labeled datasets have reliability issues, because it has been shown that the accuracy of human labeling of fake reviews is very poor.[1]
• Fake reviews crowdsourced by paying AMT workers are indeed fake, yet they do not reflect the dynamics of fake reviews on commercial websites.[2]
• This lack of labeled data motivates looking at unsupervised methods of classification.[3]
[1] Ott M, Choi, Y, Cardie, C. and Hancock, J.T. 2011. Finding Deceptive Opinion Spam by Any Stretch of the Imagination.
[2] Mukherjee et al , 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
[3] Mukherjee et al : http://delivery.acm.org/10.1145/2490000/2487580/p632-mukherjee.pdf
Unsupervised Evaluation Model
• Since human labeling for supervised learning is difficult, the problem is recast by modeling spamicity (the degree of spamming) as a latent variable alongside the observed behavioral features.
• An unsupervised model – the Author Spamicity Model (ASM) – is proposed.[1]
• It takes a fully Bayesian approach and formulates opinion spam detection as a clustering problem.
• Opinion spammers have different behavioral distributions than non-spammers.
• This causes distributional divergence between the latent population distributions of the two clusters: spammers and non-spammers.[1]
• Model inference learns the population distributions of the two clusters.
[1] Mukherjee et al : http://delivery.acm.org/10.1145/2490000/2487580/p632-mukherjee.pdf
• Formulates spam detection as an unsupervised clustering problem in a Bayesian setting.
• Belongs to the class of generative models for clustering based on a set of observed features.
• Models the spamicity s_a (in the range [0, 1]) of an author a, and the spam label π_r of a review, which is the class variable reflecting cluster membership (two clusters, K = 2: spam and non-spam).[1]
• Each author/reviewer, and respectively each review, has a set of observed features (behavioral clues).
• Certain characteristics of abnormal behavior are defined that are likely to be linked with spamming and can thus be exploited in the model for learning the spam and non-spam clusters (a much-simplified clustering sketch follows below). [1]
Author Spamicity Model
[1] Mukherjee et al : http://delivery.acm.org/10.1145/2490000/2487580/p632-mukherjee.pdf
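The full ASM is a Bayesian generative model with its own posterior inference procedure; the sketch below is only a much-simplified stand-in: a two-cluster Bernoulli mixture over binary review features, fit with EM, to illustrate the idea of unsupervised clustering into spam / non-spam. Everything here is an illustrative assumption, not the ASM inference procedure.

```python
# Sketch: two-cluster Bernoulli mixture fit with EM (simplified stand-in for ASM).
import math
import random

def bernoulli_mixture_em(X, n_iter=100, seed=0):
    """X: list of binary feature vectors (one per review).
    Returns (mixing weights pi, per-cluster Bernoulli parameters theta,
    per-review cluster responsibilities)."""
    rng = random.Random(seed)
    K, D = 2, len(X[0])
    pi = [0.5, 0.5]
    theta = [[rng.uniform(0.25, 0.75) for _ in range(D)] for _ in range(K)]
    resp = [[0.5, 0.5] for _ in X]
    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each review.
        for n, x in enumerate(X):
            logp = []
            for k in range(K):
                lp = math.log(pi[k])
                for d in range(D):
                    p = min(max(theta[k][d], 1e-6), 1 - 1e-6)
                    lp += math.log(p) if x[d] else math.log(1 - p)
                logp.append(lp)
            m = max(logp)
            z = sum(math.exp(l - m) for l in logp)
            resp[n] = [math.exp(l - m) / z for l in logp]
        # M-step: re-estimate mixing weights and Bernoulli parameters.
        for k in range(K):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(X)
            for d in range(D):
                theta[k][d] = sum(r[k] * x[d] for r, x in zip(resp, X)) / max(nk, 1e-9)
    return pi, theta, resp

# The cluster whose theta is larger on the spam-indicative features can be read
# as the "spam" cluster; resp[n] then plays a role loosely analogous to the
# review's spam label probability (pi_r) in ASM.
```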
Author Features
• Each feature takes a value in the range [0, 1]; a value close to 1 indicates spamming (a minimal sketch of these features follows below).
1. Content Similarity : Crafting a new review every time is time consuming, so spammers are likely to copy reviews across similar products. The maximum similarity is chosen to capture the worst spamming behavior.[1]
2. Maximum Number of Reviews : Posting many reviews in a day is also abnormal.
3. Reviewing Burstiness : Spammers are usually not long-time members of a site. This feature is defined over an activity window (between the first and last review posting dates). Reviews posted over a reasonably long timeframe probably indicate normal activity, but reviews all posted within a short burst are likely to be spam.[2]
4. Ratio of First Reviews : People mostly rely on early reviews, and early spamming hugely impacts sales, so spammers try to be among the first reviewers.[1]
[1] Mukherjee et al : http://delivery.acm.org/10.1145/2490000/2487580/p632-mukherjee.pdf
[2] Mukherjee, A., Liu, B. and Glance, N. 2012. Spotting Fake Reviewer Groups in Consumer Reviews. WWW (2012).
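A minimal sketch of the four author features, each mapped to [0, 1]; it reuses the _cosine() helper from the behavioral-signals sketch. The normalization constants (a cap of 5 reviews per day, a 28-day burst window) are illustrative assumptions, not the thresholds used by the authors.

```python
# Sketch: author features CS, MNR, BST, RFR in [0, 1] (assumed constants).
from collections import Counter

def author_features(reviews, product_first_dates, max_reviews_cap=5, burst_days=28):
    """reviews: this author's reviews (dicts with 'text', 'date', 'product_id');
    product_first_dates: {product_id: date of that product's first review}."""
    # 1. Content Similarity: maximum pairwise cosine similarity of the author's reviews.
    cs = max((_cosine(a['text'], b['text'])
              for i, a in enumerate(reviews) for b in reviews[i + 1:]), default=0.0)
    # 2. Maximum Number of Reviews in a day, capped and scaled to [0, 1].
    mnr = min(max(Counter(r['date'] for r in reviews).values()), max_reviews_cap) / max_reviews_cap
    # 3. Reviewing Burstiness: near 1 if all reviews fall in a short activity window.
    span = (max(r['date'] for r in reviews) - min(r['date'] for r in reviews)).days
    bst = 0.0 if span > burst_days else 1 - span / burst_days
    # 4. Ratio of First Reviews: fraction of the author's reviews that were the
    #    first review of their product.
    rfr = sum(r['date'] == product_first_dates[r['product_id']] for r in reviews) / len(reviews)
    return {'CS': cs, 'MNR': mnr, 'BST': bst, 'RFR': rfr}
```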
Review Features
• There are five binary review features; a value of 1 indicates spamming, 0 non-spamming (a minimal sketch follows below).
1. Duplicate/Near-Duplicate Reviews: Spammers often post multiple duplicate or near-duplicate reviews of the same product to boost its ratings.
2. Extreme Rating: Spammers mostly give extreme ratings (1 or 5 stars) in order to demote or promote products.
3. Rating Deviation: Spammers usually engage in wrong projection, either positive or negative, so their ratings deviate from the average rating given by other reviewers.
4. Early Time Frame: Early reviews can greatly impact people’s sentiment about a product.
5. Rating Abuse: Multiple ratings of the same product are unusual. Similar to the duplicate-review feature (DUP) but focused on the rating dimension rather than the content.
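A minimal sketch of the five binary review features, again reusing the _cosine() helper. The thresholds (cosine similarity above 0.7 for near-duplicates, a deviation of more than 2 stars, the first 20% of a product's reviews counting as "early") are illustrative assumptions only.

```python
# Sketch: binary review features DUP, EXT, DEV, ETF, RA (assumed thresholds).
def review_features(review, author_reviews, product_reviews):
    """review: the review being scored; author_reviews: all reviews by the same
    author (including this one); product_reviews: all reviews of the same
    product, ordered oldest first."""
    dup = int(any(_cosine(review['text'], r['text']) > 0.7
                  for r in author_reviews
                  if r is not review and r['product_id'] == review['product_id']))
    ext = int(review['rating'] in (1, 5))
    others = [r['rating'] for r in product_reviews if r is not review]
    avg = sum(others) / len(others) if others else review['rating']
    dev = int(abs(review['rating'] - avg) > 2)
    early_cutoff = max(1, len(product_reviews) // 5)
    etf = int(product_reviews.index(review) < early_cutoff)
    ra = int(sum(r['product_id'] == review['product_id'] for r in author_reviews) > 1)
    return {'DUP': dup, 'EXT': ext, 'DEV': dev, 'ETF': etf, 'RA': ra}
```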
CONCLUSIONS
• We presented an in-depth investigation of the nature of fake reviews in the commercial setting of Yelp.com.
• We reviewed the linguistic methods of Ott et al. (2011) and their high accuracy on AMT data.
• We presented a behavioral study of spammers for real-life fake reviews.
• We presented a brief introduction to the n-gram language model.
• We presented the challenges with supervised evaluation and outlined an unsupervised approach to evaluation.
• We presented a brief introduction to the unsupervised Author Spamicity Model (ASM).
REFERENCES
1. Jindal and Liu 2008 : http://www.cs.uic.edu/~liub/FBS/opinion-spam-WSDM-08.pdf
2. Yelp official blog : http://officialblog.yelp.com/2013/05/how-yelp-protects-consumers-from-fake-reviews.html
3. MIT N-gram Language Model Tutorial : http://web.mit.edu/6.863/www/fall2012/readings/ngrampages.pdf
4. Amazon Mechanical Turk : https://www.mturk.com/
5. Ott et al. 2011 : https://www.cs.cornell.edu/courses/CS4740/2012sp/lectures/op_spamACL2011.pdf
6. Mukherjee et al. 2013 : http://www2.cs.uh.edu/~arjun/papers/ICWSM-Spam_final_camera-submit.pdf
7. Kullback-Leibler Divergence : http://www.cs.buap.mx/~dpinto/research/CICLing07_1/Pinto06c/node2.html
8. Newman et al. 2003 : http://www.communicationcache.com/uploads/1/0/8/8/10887248/lying_words-_predicting_deception_from_linguistic_styles.pdf
9. Mukherjee, A., Liu, B. and Glance, N. 2012. Spotting Fake Reviewer Groups in Consumer Reviews. WWW (2012).
10. Mukherjee et al. : http://delivery.acm.org/10.1145/2490000/2487580/p632-mukherjee.pdf
11. Ott, M., Choi, Y., Cardie, C. and Hancock, J.T. 2011. Finding Deceptive Opinion Spam by Any Stretch of the Imagination.
Any Questions?
Thank You !!