Crowdsourcing has become a popular means to solicit assistance for scientific research. From classifying images or texts to responding to surveys, tapping into the knowledge of crowds to complete complex tasks has become a common strategy in social and information sciences. Although the timeliness and cost-effectiveness of crowdsourcing may provide desirable advantages to researchers, the data it generates may be of lower quality for some scientific purposes. The quality control mechanisms, if any, offered by common crowdsourcing platforms may not provide robust measures of data quality. This study explores whether research task participants engage in motivated misreporting, whereby participants cut corners to reduce their workload while performing various scientific tasks online. We conducted an experiment with three common crowdsourcing tasks: answering surveys, coding images, and classifying online social media content. The experiment recruited workers from three sources: a crowdsourcing platform for crowd workers, a commercial survey panel provider for online panelists, and a research volunteering website for citizen scientists. The analysis addresses two questions: (1) whether online panelists, crowd workers, and volunteers engage in motivated misreporting differently, and (2) whether the patterns of misreporting vary by task type. We further examine the potential correlation between patterns of motivated misreporting and the data quality of complex scientific research tasks. The study closes with suggested quality assurance practices that incorporate collective intelligence to improve systems for large-scale online information analysis in social science research.
Data Quality Concerns when Crowdsourcing Scientific Tasks
1. www.rti.org. RTI International is a registered trademark and a trade name of Research Triangle Institute.
Data Quality Concerns in Scientific Tasks
Y. Patrick Hsieh
Stephanie Eckman
Herschel Sanders
Amanda Smith
2. Use of Crowdsourcing
Crowdsourcing is a popular source of an online workforce for scientific research
– Classifying images
– Transcribing audio files
– Coding texts or social media content
Fast & inexpensive
Amazon Mechanical Turk (MTurk)
These tasks are a lot like surveys
What about data quality?
3. Crowdsourcing vs Panels
MTurk
– Paid per HIT
– Metrics available: # of tasks completed, % of tasks approved
– Strong norm: quality work → fair pay
Online Panel
– Paid per survey
– Few quality metrics available
Do cultures & incentives lead to data quality differences?
• In surveys?
• In scientific tasks?
Motivated misreporting
4. Web Survey Design: Research Question
Experimental design: question format (Grouped vs. Interleafed) crossed with participant source (MTurk vs. Online Panel)
– Grouped format: all filter questions asked first, then all follow-up questions
– Interleafed format: each filter question is immediately followed by its follow-up questions
2 tasks:
• Survey
• Image coding
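To make the two question formats concrete, here is a minimal Python sketch of how the grouped and interleafed orderings differ. It is an illustration only; the section texts and function names are not from the original instrument.

```python
# Illustrative sketch (not the authors' instrument) of the two question formats.
from typing import List, Tuple

Section = Tuple[str, List[str]]  # (filter question, its follow-up questions)

def order_questions(sections: List[Section], fmt: str) -> List[str]:
    """Return the question sequence for a given format.

    interleafed: each filter is immediately followed by its follow-ups.
    grouped:     all filters come first, then all follow-ups.
    """
    if fmt == "interleafed":
        return [q for filt, fups in sections for q in [filt, *fups]]
    if fmt == "grouped":
        filters = [filt for filt, _ in sections]
        followups = [q for _, fups in sections for q in fups]
        return filters + followups
    raise ValueError(f"unknown format: {fmt}")

# Hypothetical sections loosely modeled on the lifestyle survey
sections = [
    ("Purchased pants in the last 3 months?", ["How much did they cost?", "Bought online?"]),
    ("Purchased shoes in the last 3 months?", ["How much did they cost?", "Bought online?"]),
]
print(order_questions(sections, "grouped"))
print(order_questions(sections, "interleafed"))
```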
5. 2 Sources of Participants
MTurk
– 80% prior approval rate
– In US
– Survey: 185/214 completed; 59% female; 39 years old; 48% >= bachelors
– Image coding: 141/342 completed; 62% female; 50% bachelors or higher
Online panel
– Convenience sample in US
– Balanced to Census
– Survey: 204/260 completed; 53% female; 48 years old; 37% >= bachelors
– Image coding: 141/372 completed; 60% female; 45% bachelors or higher
6. Task A: Lifestyle Survey
4 filter sections
– Clothing
– Consumer goods
– Leisure activity
– Credit cards
30 minutes
$4 incentive
Order of sections randomized
Filters in forward or backward order
Example filter and follow-up questions:
– Filter: Has anyone in this household purchased pants in the last 3 months? → Yes
– Follow-ups: How much did those pants cost? Does that price include tax? Did you buy them online? …
– Next filter: Has anyone in this household purchased shoes in the last 3 months?
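The routing behind the misreporting incentive can be sketched in a few lines of Python (a hypothetical illustration, not the actual survey logic): a No answer to a filter skips its follow-ups, so in the interleafed format a respondent can shorten the interview by answering No.

```python
# Hypothetical illustration of filter/follow-up routing: follow-ups are asked
# only after a "Yes" to the filter, which is what creates the incentive to
# answer "No" and cut the interview short.
def administer(filter_q: str, followups: list, answer: str) -> list:
    """Return the questions actually asked, given the filter answer."""
    asked = [filter_q]
    if answer.lower() == "yes":  # follow-ups trigger only on a Yes
        asked.extend(followups)
    return asked

print(administer("Has anyone in this household purchased pants in the last 3 months?",
                 ["How much did those pants cost?", "Did you buy them online?"],
                 answer="No"))  # -> only the filter question is asked
```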
7. Task B: Image Coding
Image coding task
– 40 photos of Haiti buildings
– $6 incentive
– 50 minutes
4 elements
– Beam
– Column
– Slab
– Wall
2 filters
– Can you see element?
– Is it damaged?
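One way to picture the coding structure is as one record per photo and element, where the damage judgment applies only when the element is visible. The layout below is an assumption made for illustration, not the authors' actual codebook.

```python
# Assumed data layout for the image coding task: per photo, each of the four
# structural elements gets a visibility judgment, and a damage judgment only
# when the element is visible.
from dataclasses import dataclass
from typing import Optional

ELEMENTS = ["beam", "column", "slab", "wall"]

@dataclass
class ElementCode:
    element: str
    visible: bool            # filter 1: can you see the element?
    damaged: Optional[bool]  # filter 2: only coded when the element is visible

def code_photo(judgments: dict) -> list:
    """Build one photo's codes from a coder's judgments, keyed by element."""
    codes = []
    for el in ELEMENTS:
        visible, damaged = judgments.get(el, (False, None))
        codes.append(ElementCode(el, visible, damaged if visible else None))
    return codes

# e.g. a coder marks only the wall as visible and damaged
print(code_photo({"wall": (True, True)}))
```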
9. Results: Motivated Misreporting in Survey Questions
DV: YES response
Controlling for:
– Demographics
– Order × section
– Format × source (MTurk / Panel)
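The slide implies a regression of YES responses on format, source, and controls. Below is a hedged sketch of one such specification using statsmodels; the exact model, file name, and variable names are assumptions.

```python
# Hedged sketch of a possible analysis model (the exact specification used in
# the study is not given on the slide): logistic regression of answering YES
# to a filter question, with format x source and order x section interactions.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file: one row per respondent x filter question, with columns
# yes (0/1), format, source, order, section, age, female, bachelors.
df = pd.read_csv("filter_responses.csv")

model = smf.logit(
    "yes ~ C(format) * C(source) + C(order) * C(section) + age + female + bachelors",
    data=df,
).fit()
print(model.summary())
```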
10. Results: Motivated Misreporting in Image Coding
Effect in opposite direction: more YES responses in the Interleafed format
MTurkers answered YES more often
Average # of YES responses, by question format:
  Format        Element visibility   Element damage
  Grouped             68.7                 49.3
  Interleafed         87.1                 53.1

Average # of YES responses, by sample source:
  Source        Element visibility   Element damage
  Panel               65.4                 47.1
  MTurk               88.9                 55.0
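Given coder-level records, averages like those above could be reproduced with a groupby along the following lines; the file and column names are assumptions.

```python
# Sketch of reproducing the slide's averages from coder-level data
# (hypothetical file and column names).
import pandas as pd

# One row per coder x photo x element, with 0/1 columns visible and damaged
codes = pd.read_csv("image_codes.csv")

# Count YES responses per coder, keeping the experimental assignments
per_coder = (
    codes.groupby(["coder_id", "format", "source"])
         .agg(visible_yes=("visible", "sum"), damaged_yes=("damaged", "sum"))
         .reset_index()
)

# Average number of YES responses by question format and by sample source
print(per_coder.groupby("format")[["visible_yes", "damaged_yes"]].mean())
print(per_coder.groupby("source")[["visible_yes", "damaged_yes"]].mean())
```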
11. Take Aways (preliminary)
Results not as expected
– Survey: Format effect only in MTurk
– MTurkers are similar to other survey respondents
– Why no format effect in panel?
No motivated misreporting in Panel?
Or misreporting in both formats?
– Image Coding: Format effect in opposite direction
Some evidence MTurkers work harder than panelists
– Survey: less item nonresponse
– Image coding: more time spent with training materials
12. Discussion
Data scientists are effectively running surveys to create training data
We know a lot about survey data quality!
– Measurement error
– Nonresponse error
– Coverage error
How do these affect
• Training data?
• Model predictions?
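To make the question concrete, the toy simulation below (illustrative only, not from the original study) treats measurement error in crowdsourced labels as random label flips and shows how the test accuracy of a simple classifier degrades as the flip rate grows.

```python
# Toy illustration of measurement error in training labels: flip a share of
# the training labels at random and watch test accuracy fall.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
for flip_rate in [0.0, 0.1, 0.3]:          # share of mislabeled training cases
    noisy = y_tr.copy()
    flips = rng.random(len(noisy)) < flip_rate
    noisy[flips] = 1 - noisy[flips]        # simulate coder misreports
    acc = LogisticRegression(max_iter=1000).fit(X_tr, noisy).score(X_te, y_te)
    print(f"label noise {flip_rate:.0%}: test accuracy {acc:.3f}")
```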