master_thesis.pdf
- 1. MOHAMED V UNIVERSITY – RABAT
NATIONAL ADVANCED SCHOOL OF COMPUTER
SCIENCE AND SYSTEM ANALYSIS
Master Thesis
Data science and Big Data
Designing and Developing a Personalized Country Recommender System
Presented By :
EL MAJJODI Ayoub
October 2019
Supervised By:
Pr. Lamia BENHIBA, ENSIAS
Pr. Nabil EL IONI, UNIBZ
Pr. Mehdi ELAHI, UNIBZ
- 3. State of the Art
Introduction Methodology Results Conclusion
Motivation Research questions
Proposed Solution
Quality of Life, UK
Education quality, USA
Health Care, UAE
- 4. State of the Art
Ranking lists: best place to live
- 5. State of the Art
Recommender systems have found success in many domains, such as:
E-commerce
Education
Entertainment
Travel
- 6. State of the Art
Personalized System With a Recommendation of Countries
- 7. State of the Art
To achieve our objective, we formulated four research questions:
1- Which recommender algorithms can be adopted, based on the preferences of
users, in order to generate a personalized country ranking?
2- What are the most important features that users consider when deciding to move
to another country?
3- Do recommender algorithm preferences depend on personality types?
4- Will the system for generating personalized country ranking be usable
according to the user's assessment?
- 8. State of the Art
Definition Human Behaviour and
Personality
Knowledge Source Approaches Evaluation
Recommender systems :
"People provide recommendations as inputs, which the system
then aggregates and directs to appropriate recipients" (Resnick, 1997)
"Any system that produces individualized recommendations as output or has
the effect of guiding the user in a personalized way to interesting or useful objects
in a large space of possible options" (Burke, 2002)
- 9. State of the Art
Recommender systems formulation :
● U the set of all the users
● I the set of all the possible items
● Let f be the utility function that measures the suitability of
item i to the needs of user u
● A recommender system tries to choose, for each user u, the item i' in I that
maximizes the user's utility function:
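The formula that followed on this slide did not survive extraction. A common way to write the objective is sketched below, assuming f is defined on user-item pairs and returns a real-valued utility (this signature is an assumption, not stated on the slide):

```latex
\forall u \in U, \qquad i'_u = \arg\max_{i \in I} f(u, i)
```

In practice f is only known for the items a user has rated, so the recommender has to estimate it for the remaining items.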
- 10. State of the Art
The data used by recommender systems can be categorized into:
Items: objects that are recommended (goods, movies, books,
courses, ...).
Transactions: recorded interactions between the user and
the system.
Users: the users of the recommender system.
- 11. State of the Art
Recommender system approaches :
Collaborative Filtering
Content Based Filtering
Hybrid Filtering
- 12. State of the Art
Collaborative filtering: predicts
ratings for a user based on the ratings of people with
similar interests.
Example: collaborative filtering
- 13. State of the Art
Content-based filtering: recommends an
item to a user based on the description of the item's
characteristics and a user profile expressed in terms of those
characteristics.
Example: Content based filtering
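The example figure is not available in this transcript. As a stand-in, here is a minimal sketch of content-based filtering, assuming each item is described by a numeric feature vector and the user profile is the rating-weighted average of the vectors of rated items. The item names, feature meanings, and values are hypothetical and not taken from the thesis data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(item_features, ratings):
    """User profile = rating-weighted average of the feature vectors of rated items."""
    dim = len(next(iter(item_features.values())))
    total = sum(ratings.values())
    profile = [0.0] * dim
    for item, rating in ratings.items():
        for k, value in enumerate(item_features[item]):
            profile[k] += rating * value / total
    return profile


def recommend(item_features, ratings, top_n=3):
    """Rank the items the user has not rated by similarity to the user profile."""
    profile = build_profile(item_features, ratings)
    candidates = [i for i in item_features if i not in ratings]
    candidates.sort(key=lambda i: cosine(profile, item_features[i]), reverse=True)
    return candidates[:top_n]


# Hypothetical items; features = (education, health care, work opportunities), scaled 0..1.
ITEM_FEATURES = {
    "A": [0.9, 0.8, 0.2],
    "B": [0.8, 0.9, 0.3],
    "C": [0.1, 0.2, 0.9],
    "D": [0.2, 0.1, 0.8],
}
USER_RATINGS = {"A": 5, "C": 1}  # the user liked A and disliked C

print(recommend(ITEM_FEATURES, USER_RATINGS))
```

Because the profile is dominated by the highly rated item A, the unrated item most similar to A is ranked first.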
- 14. State of the Art
Hybrid filtering: combines two or more recommendation techniques,
mostly collaborative and content-based filtering, to
make recommendations.
Example: Hybrid filtering
- 15. State of the Art
Statistical measures :
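Mean Absolute Error (MAE): measures the average absolute deviation
between the predicted ratings and the users' true ratings.
$$\mathrm{MAE} = \frac{1}{|R|} \sum_{(u,i) \in R} \left| \hat{r}_{ui} - r_{ui} \right|$$
where R is the ratings set, \hat{r}_{ui} is the predicted rating of
item i for user u, and r_{ui} is the true rating.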
- 16. State of the Art
Statistical measures :
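Root Mean Squared Error (RMSE): the square root of the mean squared difference between the predicted and
actual ratings; it penalizes large errors more heavily than MAE.
$$\mathrm{RMSE} = \sqrt{\frac{1}{|R|} \sum_{(u,i) \in R} \left( \hat{r}_{ui} - r_{ui} \right)^2}$$
where R is the ratings set, \hat{r}_{ui} is the predicted rating and r_{ui} is the actual rating.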
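A minimal sketch of both statistical measures, assuming the predicted and true ratings are given as two equal-length lists in the same order (the function names are illustrative):

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error: average of |predicted - actual|."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error: square root of the average squared difference."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Two predictions that are off by 1 and by 2 stars respectively.
print(mae([4, 3], [5, 1]))
print(rmse([4, 3], [5, 1]))
```

On the same errors RMSE is never smaller than MAE, and the gap grows as the errors become more uneven.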
- 17. State of the Art
Usefulness :
★ Novelty
★ Diversity
★ Understand ME
★ Satisfaction
★ Accuracy
- 18. State of the Art
Personality :
"Individual's characteristic pattern of thinking, feeling, and psychological
mechanism, influences how people make their decision." (The Personality Puzzle,
1997)
- 19. State of the Art
Big-5 personality traits :
➢ Openness: reflects a person's tendency toward intellectual curiosity, creativity, and a preference
for novelty and variety of experience.
➢ Conscientiousness: reflects a person's tendency to show self-discipline, aim for
personal achievement, and behave in an organized and dependable way.
➢ Neuroticism: reflects a person's tendency to experience unpleasant emotions.
➢ Extraversion: reflects a person's tendency to show sociability, talkativeness, and
assertiveness.
➢ Agreeableness: reflects a person's tendency to be kind, concerned, truthful, and cooperative
towards others.
- 20. State of the Art
Data description Experiment
Recommender Algorithms Implementation & Design
Form used in training dataset collection
- 21. State of the Art
Training dataset:
- 136 users
- 25 countries
- 3400 rows (136 users × 25 countries)
Ratings Matrix
- 22. State of the Art
Adopted Algorithms :
Cross validation results
SVD
KNN-B
Co-clustering
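The cross-validation results themselves are not recoverable from this transcript. The procedure can be sketched as follows, assuming ratings are (user, item, rating) triples and a model is supplied as a pair of fit/predict callables. The number of folds, the RMSE criterion, and the global-mean baseline are assumptions made for illustration, not the thesis setup.

```python
import random


def k_fold_splits(ratings, k=5, seed=0):
    """Yield (train, test) pairs; every rating lands in exactly one test fold."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test


def cross_validate(fit, predict, ratings, k=5):
    """Return the mean RMSE over k folds (assumes at least k ratings)."""
    scores = []
    for train, test in k_fold_splits(ratings, k):
        model = fit(train)
        squared = [(predict(model, u, i) - r) ** 2 for u, i, r in test]
        scores.append((sum(squared) / len(squared)) ** 0.5)
    return sum(scores) / len(scores)


# Hypothetical baseline model: always predict the global mean of the training ratings.
def fit_mean(train):
    return sum(r for _, _, r in train) / len(train)


def predict_mean(model, user, item):
    return model


ratings = [(u, i, float((u + i) % 5 + 1)) for u in range(6) for i in range(5)]
print(cross_validate(fit_mean, predict_mean, ratings, k=5))
```

Any of the three adopted algorithms could be plugged in through the same fit/predict interface and compared on the resulting score.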
- 23. State of the Art
Adopted Algorithms :
➢ Singular Value Decomposition (SVD): factorizes the original ratings matrix R into two
lower-dimensional matrices, P and Q, which are then used by a prediction function.
R = ratings matrix (m users × n items)
P = user matrix (m users × f features)
Q = item matrix (n items × f features)
A rating r(ui) can be estimated by the dot product of the user vector p(u) and the item vector q(i).
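A minimal sketch of this factorization, assuming the factors are learned by stochastic gradient descent on the known ratings. The learning rate, regularization, number of features, and initialization are illustrative choices, and the sketch omits bias terms that library implementations often add.

```python
import random


def predict(P, Q, u, i):
    """Estimated rating: dot product of user vector P[u] and item vector Q[i]."""
    return sum(pk * qk for pk, qk in zip(P[u], Q[i]))


def train_mf(ratings, n_users, n_items, f=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Learn P (n_users x f) and Q (n_items x f) from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(f)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(f)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for k in range(f):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on the squared error with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


# Hypothetical ratings: (user index, item index, rating on a 1-5 scale).
data = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 3), (2, 2, 5), (2, 0, 1)]
P, Q = train_mf(data, n_users=3, n_items=3)
print(predict(P, Q, 1, 2))  # estimate for a pair that was never rated
```

Once P and Q are learned, any missing entry of R can be filled in by the same dot product, which is what makes ranking unrated countries possible.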
- 24. State of the Art
Adopted Algorithms :
➢ K-Nearest Neighbor Baseline (KNN-B): finds like-minded users or similar items
for a given user, based on:
➔ A similarity measure
➔ A function that selects the neighborhood using the similarity measure
➔ A rating prediction function based on the neighbors' ratings.
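The three components above can be sketched as a simplified user-based neighbor model. This sketch assumes cosine similarity over co-rated items and uses each user's mean rating as a simple stand-in for a baseline estimate; it is not the exact KNN-B formulation, and the data is hypothetical.

```python
import math


def mean_rating(user_ratings):
    return sum(user_ratings.values()) / len(user_ratings)


def cosine_sim(ra, rb):
    """Similarity measure: cosine over the items both users rated."""
    common = set(ra) & set(rb)
    if not common:
        return 0.0
    dot = sum(ra[i] * rb[i] for i in common)
    norm_a = math.sqrt(sum(ra[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(rb[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict_knn(ratings, user, item, k=2):
    """Predict a rating from the k most similar users who rated the item."""
    # Neighborhood: other users who rated the item, ordered by similarity.
    neighbours = sorted(
        ((cosine_sim(ratings[user], r), v) for v, r in ratings.items()
         if v != user and item in r),
        reverse=True,
    )[:k]
    base = mean_rating(ratings[user])
    weight = sum(s for s, _ in neighbours)
    if weight == 0:
        return base  # no usable neighbours: fall back to the user's mean
    # Prediction: user's mean plus the similarity-weighted deviation of neighbours.
    deviation = sum(s * (ratings[v][item] - mean_rating(ratings[v])) for s, v in neighbours)
    return base + deviation / weight


RATINGS = {
    "u1": {"A": 5, "B": 1},
    "u2": {"A": 5, "B": 1, "C": 5},
    "u3": {"A": 1, "B": 5, "C": 1},
}
print(predict_knn(RATINGS, "u1", "C"))
```

Here u1 agrees with u2 and disagrees with u3, so the prediction for item C is pulled above u1's own mean.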
- 25. State of the Art
Adopted Algorithms :
➢ Co-clustering: groups similar users and similar items into clusters
simultaneously.
Example: Co-clustering
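A sketch of how a prediction can be made once users and items have been assigned to clusters. It assumes a widely used co-clustering prediction rule (co-cluster average plus the user's and the item's offsets from their cluster averages); the slides do not state which variant was used, and the cluster assignments below are fixed by hand rather than learned.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def predict_cocluster(ratings, user_cluster, item_cluster, u, i):
    """r_hat = avg(co-cluster) + (mean_u - avg(user cluster)) + (mean_i - avg(item cluster)).

    Assumes the co-cluster of (u, i) contains at least one known rating.
    """
    cu, ci = user_cluster[u], item_cluster[i]
    co_avg = mean(r for (v, j), r in ratings.items()
                  if user_cluster[v] == cu and item_cluster[j] == ci)
    user_cluster_avg = mean(r for (v, _), r in ratings.items() if user_cluster[v] == cu)
    item_cluster_avg = mean(r for (_, j), r in ratings.items() if item_cluster[j] == ci)
    mu_u = mean(r for (v, _), r in ratings.items() if v == u)
    mu_i = mean(r for (_, j), r in ratings.items() if j == i)
    return co_avg + (mu_u - user_cluster_avg) + (mu_i - item_cluster_avg)


# Hypothetical ratings keyed by (user, item), with hand-picked cluster assignments.
RATINGS_CC = {
    ("u1", "A"): 5, ("u1", "B"): 4, ("u1", "C"): 1,
    ("u2", "A"): 4, ("u2", "B"): 5,
    ("u3", "C"): 2,
}
USER_CLUSTER = {"u1": 0, "u2": 0, "u3": 1}
ITEM_CLUSTER = {"A": 0, "B": 0, "C": 1}

print(predict_cocluster(RATINGS_CC, USER_CLUSTER, ITEM_CLUSTER, "u2", "C"))
```

A full implementation would also learn the cluster assignments, typically by alternating between reassigning users and items until the prediction error stops improving.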
- 26. State of the Art
System Architecture
- 27. State of the Art
User flow :
● Registration step
● Username, email, password
● Personality survey: Five factor model
● Openness, conscientiousness,
extraversion, agreeableness,
neuroticism
- 28. State of the Art
User flow :
● Select Features (at least 3 out of 12)
1. Education quality
2. Political insecurity
3. Social conflict
4. Work opportunities
5. Health care
6. Income difference
7. Wars and dictatorship
8. Family member abroad
9. Cultural and linguistic similarities
10. Working atmosphere
11. Shorter distance
12. Crime rate
- 29. State of the Art
User flow :
● Rate countries (at least 5) using a 5-star rating scale
- 30. State of the Art
User flow :
● Results (3 lists)
● List 1: SVD
● List 2: KNN-B
● List 3: Co-clustering
● Evaluation survey: 5 metrics (Accuracy, Diversity, Understand Me,
Satisfaction, Novelty).
- 31. State of the Art
User flow :
● Usability survey (System Usability Scale,
SUS: a 10-item questionnaire scored on a
5-point Likert scale)
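For reference, a sketch of the standard SUS scoring rule, assuming responses are coded 1 to 5 and listed in questionnaire order: odd-numbered items contribute (answer - 1), even-numbered items contribute (5 - answer), and the sum is multiplied by 2.5 to give a 0-100 score. The function name is illustrative.

```python
def sus_score(responses):
    """Compute the SUS score (0-100) from ten answers on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 responses")
    total = 0
    for number, answer in enumerate(responses, start=1):
        if number % 2 == 1:
            total += answer - 1   # odd items are positively worded
        else:
            total += 5 - answer   # even items are negatively worded
    return total * 2.5


print(sus_score([3] * 10))    # neutral answers throughout
print(sus_score([5, 1] * 5))  # most favourable answers throughout
```

Scores therefore move in steps of 2.5, and a per-user score is usually averaged across respondents to obtain the system's overall score.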
- 32. State of the Art
➢ Online evaluation with real users
➢ 281 new users started the experiment; 109 completed all the steps
➢ The collected data was analysed to find possible patterns
Users reaching each step:
Registration: 281 (100%)
Personality: 241 (85%)
Features: 226 (80%)
Ratings: 193 (69%)
Evaluate: 189 (67%)
Usability: 109 (38%)
- 33. State of the Art
Age:
Under 18: 12 (5%)
18-24: 100 (42%)
25-35: 93 (39%)
35-45: 23 (9%)
45-55: 10 (4%)
Over 55: 2 (1%)
Gender:
Females: 65 (27%)
Males: 170 (71%)
25-35: 5 (25%)
Origin Country:
➢ USA (20%)
➢ Morocco (11%)
➢ Egypt (5%)
+ Various other countries (64%)
- 34. State of the Art
Algorithm Comparison System Usability
Personality & Algorithm Preferences
Feature Preferences
Metric | Question | Co-clustering | KNN-B | SVD
Accuracy | 1. Which list has more selections that you find appealing? | 33% | 29% | 38%
Accuracy | 2. Which list has more obviously bad suggestions for you? | 59% | 31% | 10%
Diversity | 3. Which list has more countries that are similar to each other? | 26% | 26% | 48%
Diversity | 4. Which list has a more varied selection of countries? | 32% | 40% | 18%
Diversity | 5. Which list has countries that match a wider variety of preferences? | 24% | 74% | 29%
- 35. State of the Art
Metric | Question | Co-clustering | KNN-B | SVD
Understand Me | 6. Which list better reflects your preferences in countries? | 18% | 26% | 56%
Understand Me | 7. Which list seems more personalized to your country ratings? | 21% | 24% | 55%
Understand Me | 8. Which list represents more mainstream ratings instead of your own? | 15% | 24% | 61%
Satisfaction | 9. Which list would better help you find countries to consider? | 14% | 40% | 46%
Satisfaction | 10. Which list would you be more likely to recommend to your friends? | 19% | 19% | 62%
- 36. State of the Art
Metric | Question | Co-clustering | KNN-B | SVD
Novelty | 11. Which list has more countries you did not expect? | 55% | 33% | 12%
Novelty | 12. Which list has more countries that are familiar to you? | 23% | 29% | 48%
Novelty | 13. Which list has more pleasantly surprising countries? | 25% | 49% | 26%
Novelty | 14. Which list provides fewer new suggestions? | 29% | 17% | 54%
- 37. State of the Art
RQ1: Which recommender algorithms can be adopted, based on the preferences
of users, in order to generate a personalized country ranking?
SVD: better in terms of Accuracy, Understand Me, and Satisfaction
SVD: many mainstream suggestions
KNN-B: better in terms of Diversity and Novelty
Co-clustering: deemed underperforming by the majority of users across
most metric categories
- 38. State of the Art
Overall (226)
Work Opportunities 161 (72%)
Education Quality 105 (42%)
Working Atmosphere 100 (44%)
Health Care 97 (43%)
Income Difference 84 (37%)
Political Insecurity 59 (26%)
Crime Rate 58 (26%)
Social Conflict 49 (22%)
Cultural & Linguistic Similarities 41 (21%)
Wars & Dictatorship 37 (16%)
Family Member Abroad 20 (8%)
Shorter Distance 15 (6%)
Males (170)
Work Opportunities 108 (70%)
Education Quality 81 (48%)
Health Care 72 (42%)
Females (65)
Work Opportunities 40 (61%)
Working Atmosphere 31 (48%)
Health Care 31 (48%)
- 39. State of the Art
RQ2: What are the most important features that users consider when
deciding to move to another country?
Top 4 features :
➢ Work Opportunities
➢ Education Quality
➢ Working Atmosphere
➢ Health Care
- 40. State of the Art
Personality and algorithm preferences:
Trait | Accuracy | Diversity | Understand Me | Satisfaction | Novelty
Openness | SVD | Co-clustering | SVD | SVD | KNN-B
Conscientiousness | SVD | Co-clustering | SVD | SVD | KNN-B
Extraversion | SVD | KNN-B | SVD | SVD | KNN-B
Agreeableness | SVD | KNN-B | SVD | SVD | KNN-B
Neuroticism | SVD | Co-clustering | SVD | SVD | KNN-B
- 41. State of the Art
RQ3: Do recommender algorithm preferences depend on personality
types?
➢ People with different types of personality may tend to choose results generated
by different types of algorithms.
- 42. State of the Art
SUS score interpretation:
Score | Grade | Rating
> 80 | A | Excellent
68 - 80 | B | Good
68 | C | Okay
51 - 68 | D | Poor
< 51 | F | Awful
Final score: 60.82
Lowest: 22.5
Highest: 100
- 43. State of the Art
RQ4: Will the system for generating personalized country ranking be usable
according to the user's assessment?
➢ The system scored 60.82, below the commonly used SUS benchmark of 68
➢ The system did not pass the usability test
- 44. State of the Art
Conclusion
➔ We surveyed people to gather explicit ratings of a set of countries.
➔ The proposed system was evaluated based on the assessments of real
users.
➔ A country recommender system was designed and deployed.
➔ We cross-validated several collaborative filtering algorithms.
Future works
- 45. State of the Art
➔ Investigate whether a recommender system based on deep
learning would improve the quality of recommendations in this
domain.
➔ Investigate the usefulness of making recommendations related
to immigration factors.
➔ Incorporate personality information into the prediction
model.
➔ Extend the experiment to gather more data.