Crowdsourcing has become a popular means to solicit assistance for scientific research. From classifying images or texts to responding to surveys, tapping into the knowledge of crowds to complete complex tasks has become a common strategy in social and information sciences. Although the timeliness and cost-effectiveness of crowdsourcing may provide desirable advantages to researchers, the data it generates may be of lower quality for some scientific purposes. The quality control mechanisms, if any, offered by common crowdsourcing platforms may not provide robust measures of data quality. This study explores whether research task participants engage in motivated misreporting, whereby participants cut corners to reduce their workload while performing various scientific tasks online. We conducted an experiment with three common crowdsourcing tasks: answering surveys, coding images, and classifying online social media content. The experiment recruited workers from three sources: a crowdsourcing platform for crowd workers, a commercial survey panel provider for online panelists, and a research volunteering website for citizen scientists. The analysis addresses two questions: (1) whether online panelists, crowd workers, and volunteers engage in motivated misreporting differently, and (2) whether the patterns of misreporting vary by task type. We further examine the potential correlation between patterns of motivated misreporting and the data quality of complex scientific research tasks. The study closes with suggested quality assurance practices that incorporate collective intelligence to improve systems for large-scale online information analysis in social science research.
Data Quality Concerns when Crowdsourcing Scientific Tasks
1. www.rti.org. RTI International is a registered trademark and a trade name of Research Triangle Institute.
Data Quality Concerns in Scientific Tasks
Y. Patrick Hsieh
Stephanie Eckman
Herschel Sanders
Amanda Smith
2. Use of Crowdsourcing
Crowdsourcing is a popular source of an online workforce for scientific research
– Classifying images
– Transcribing audio files
– Coding texts or social media content
Fast & inexpensive
Amazon Mechanical Turk (MTurk)
These tasks are a lot like surveys
What about data quality?
3. Crowdsourcing vs Panels
MTurk
– Paid per HIT
– Metrics available: # of tasks completed, % of tasks approved
– Strong norm: quality work → fair pay
Online Panel
– Paid per survey
– Few quality metrics available
Do cultures & incentives lead to data quality differences?
• In surveys?
• In scientific tasks?
Motivated misreporting
4. Web Survey Design: Research Question
Experimental design: question format (Grouped vs. Interleafed) crossed with participant source (MTurk vs. Online Panel)
– Grouped format: all filter questions asked first, then all follow-up questions
– Interleafed format: each filter question is immediately followed by its follow-up questions
2 tasks:
• Survey
• Image coding
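To make the two question formats concrete, here is a minimal Python sketch of how the grouped and interleafed orderings differ. It is an illustration only; the section texts and function names are not from the original instrument.

```python
# Illustrative sketch (not the authors' instrument) of the two question formats.
from typing import List, Tuple

Section = Tuple[str, List[str]]  # (filter question, its follow-up questions)

def order_questions(sections: List[Section], fmt: str) -> List[str]:
    """Return the question sequence for a given format.

    interleafed: each filter is immediately followed by its follow-ups.
    grouped:     all filters come first, then all follow-ups.
    """
    if fmt == "interleafed":
        return [q for filt, fups in sections for q in [filt, *fups]]
    if fmt == "grouped":
        filters = [filt for filt, _ in sections]
        followups = [q for _, fups in sections for q in fups]
        return filters + followups
    raise ValueError(f"unknown format: {fmt}")

# Hypothetical sections loosely modeled on the lifestyle survey
sections = [
    ("Purchased pants in the last 3 months?", ["How much did they cost?", "Bought online?"]),
    ("Purchased shoes in the last 3 months?", ["How much did they cost?", "Bought online?"]),
]
print(order_questions(sections, "grouped"))
print(order_questions(sections, "interleafed"))
```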
5. 2 Sources of Participants
MTurk
– 80% prior approval rate
– In US
– Survey: 185/214 completed; 59% female; 39 years old; 48% >= bachelors
– Image coding: 141/342 completed; 62% female; 50% bachelors or higher
Online panel
– Convenience sample in US
– Balanced to Census
– Survey: 204/260 completed; 53% female; 48 years old; 37% >= bachelors
– Image coding: 141/372 completed; 60% female; 45% bachelors or higher
6. Task A: Lifestyle Survey
4 filter sections
– Clothing
– Consumer goods
– Leisure activity
– Credit cards
30 minutes
$4 incentive
Order of sections randomized
Filters in forward or backward order
Example filter and follow-up questions:
– Filter: Has anyone in this household purchased pants in the last 3 months? → Yes
– Follow-ups: How much did those pants cost? Does that price include tax? Did you buy them online? …
– Next filter: Has anyone in this household purchased shoes in the last 3 months?
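The routing behind the misreporting incentive can be sketched in a few lines of Python (a hypothetical illustration, not the actual survey logic): a No answer to a filter skips its follow-ups, so in the interleafed format a respondent can shorten the interview by answering No.

```python
# Hypothetical illustration of filter/follow-up routing: follow-ups are asked
# only after a "Yes" to the filter, which is what creates the incentive to
# answer "No" and cut the interview short.
def administer(filter_q: str, followups: list, answer: str) -> list:
    """Return the questions actually asked, given the filter answer."""
    asked = [filter_q]
    if answer.lower() == "yes":  # follow-ups trigger only on a Yes
        asked.extend(followups)
    return asked

print(administer("Has anyone in this household purchased pants in the last 3 months?",
                 ["How much did those pants cost?", "Did you buy them online?"],
                 answer="No"))  # -> only the filter question is asked
```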
7. Task B: Image Coding
Image coding task
– 40 photos of Haiti buildings
– $6 incentive
– 50 minutes
4 elements
– Beam
– Column
– Slab
– Wall
2 filters
– Can you see element?
– Is it damaged?
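One way to picture the coding structure is as one record per photo and element, where the damage judgment applies only when the element is visible. The layout below is an assumption made for illustration, not the authors' actual codebook.

```python
# Assumed data layout for the image coding task: per photo, each of the four
# structural elements gets a visibility judgment, and a damage judgment only
# when the element is visible.
from dataclasses import dataclass
from typing import Optional

ELEMENTS = ["beam", "column", "slab", "wall"]

@dataclass
class ElementCode:
    element: str
    visible: bool            # filter 1: can you see the element?
    damaged: Optional[bool]  # filter 2: only coded when the element is visible

def code_photo(judgments: dict) -> list:
    """Build one photo's codes from a coder's judgments, keyed by element."""
    codes = []
    for el in ELEMENTS:
        visible, damaged = judgments.get(el, (False, None))
        codes.append(ElementCode(el, visible, damaged if visible else None))
    return codes

# e.g. a coder marks only the wall as visible and damaged
print(code_photo({"wall": (True, True)}))
```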
9. Results: Motivated Misreporting in Survey Questions
DV: YES response
Controlling for:
– Demographics
– Order × section
– Format × source (MTurk / Panel)
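The slide implies a regression of YES responses on format, source, and controls. Below is a hedged sketch of one such specification using statsmodels; the exact model, file name, and variable names are assumptions.

```python
# Hedged sketch of a possible analysis model (the exact specification used in
# the study is not given on the slide): logistic regression of answering YES
# to a filter question, with format x source and order x section interactions.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file: one row per respondent x filter question, with columns
# yes (0/1), format, source, order, section, age, female, bachelors.
df = pd.read_csv("filter_responses.csv")

model = smf.logit(
    "yes ~ C(format) * C(source) + C(order) * C(section) + age + female + bachelors",
    data=df,
).fit()
print(model.summary())
```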
10. Results: Motivated Misreporting in Image Coding
Effect in opposite direction: more YES responses in the Interleafed format
MTurkers answered YES more often
Average # of YES responses, by question format:
  Format        Element visibility   Element damage
  Grouped             68.7                 49.3
  Interleafed         87.1                 53.1

Average # of YES responses, by sample source:
  Source        Element visibility   Element damage
  Panel               65.4                 47.1
  MTurk               88.9                 55.0
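Given coder-level records, averages like those above could be reproduced with a groupby along the following lines; the file and column names are assumptions.

```python
# Sketch of reproducing the slide's averages from coder-level data
# (hypothetical file and column names).
import pandas as pd

# One row per coder x photo x element, with 0/1 columns visible and damaged
codes = pd.read_csv("image_codes.csv")

# Count YES responses per coder, keeping the experimental assignments
per_coder = (
    codes.groupby(["coder_id", "format", "source"])
         .agg(visible_yes=("visible", "sum"), damaged_yes=("damaged", "sum"))
         .reset_index()
)

# Average number of YES responses by question format and by sample source
print(per_coder.groupby("format")[["visible_yes", "damaged_yes"]].mean())
print(per_coder.groupby("source")[["visible_yes", "damaged_yes"]].mean())
```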
11. Take Aways (preliminary)
Results not as expected
– Survey: Format effect only in MTurk
– MTurkers are similar to other survey respondents
– Why no format effect in panel?
No motivated misreporting in Panel?
Or misreporting in both formats?
– Image Coding: Format effect in opposite direction
Some evidence MTurkers work harder than panelists
– Survey: less item nonresponse
– Image coding: more time spent with training materials
12. Discussion
Data scientists are effectively running surveys to create training data
We know a lot about survey data quality!
– Measurement error
– Nonresponse error
– Coverage error
How do these affect
• Training data?
• Model predictions?
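To make the question concrete, the toy simulation below (illustrative only, not from the original study) treats measurement error in crowdsourced labels as random label flips and shows how the test accuracy of a simple classifier degrades as the flip rate grows.

```python
# Toy illustration of measurement error in training labels: flip a share of
# the training labels at random and watch test accuracy fall.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
for flip_rate in [0.0, 0.1, 0.3]:          # share of mislabeled training cases
    noisy = y_tr.copy()
    flips = rng.random(len(noisy)) < flip_rate
    noisy[flips] = 1 - noisy[flips]        # simulate coder misreports
    acc = LogisticRegression(max_iter=1000).fit(X_tr, noisy).score(X_te, y_te)
    print(f"label noise {flip_rate:.0%}: test accuracy {acc:.3f}")
```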