Data Quality Concerns in Scientific Tasks
Y. Patrick Hsieh
Stephanie Eckman
Herschel Sanders
Amanda Smith
Use of Crowdsourcing
• Crowdsourcing is a popular source of an online workforce for scientific research
  – Classifying images
  – Transcribing audio files
  – Coding text or social media content
• Fast & inexpensive
• Amazon Mechanical Turk (MTurk)
These tasks are a lot like surveys. What about data quality?
Crowdsourcing vs Panels

MTurk
• Paid per HIT
• Metrics available
  – # of tasks completed
  – % of tasks approved
• Strong norm: quality work → fair pay

Online Panel
• Paid per survey
• Few quality metrics available

Do cultures & incentives lead to data quality differences?
• In surveys?
• In scientific tasks?
Motivated misreporting

Research Question
• Web survey design
[Design diagram: a 2 x 2 experiment crossing Format (Grouped vs. Interleafed) with Source (MTurk vs. Online Panel). In the Grouped format, all filter questions are asked first and the follow-ups come afterwards; in the Interleafed format, each filter is followed immediately by its own follow-ups. A minimal sketch of the two orderings follows the task list below.]
2 tasks:
• Survey
• Image coding
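A minimal Python sketch of the two question orderings shown in the design diagram, assuming illustrative section names and follow-up counts rather than the study's actual instrument (in practice, follow-ups are asked only for filters answered YES):

```python
# Illustrative sketch of the two questionnaire formats (not the authors' instrument).
# Section names and the number of follow-ups per filter are assumptions.

SECTIONS = ["pants", "shoes", "leisure", "credit cards"]   # hypothetical filter topics
FOLLOW_UPS = 3                                             # follow-ups per YES filter

def grouped_order(sections, n_follow_ups):
    """Grouped format: every filter question first, all follow-ups afterwards."""
    order = [f"filter: {s}" for s in sections]
    order += [f"follow-up {i + 1} for {s}" for s in sections for i in range(n_follow_ups)]
    return order

def interleafed_order(sections, n_follow_ups):
    """Interleafed format: each filter is immediately followed by its own follow-ups."""
    order = []
    for s in sections:
        order.append(f"filter: {s}")
        order += [f"follow-up {i + 1} for {s}" for i in range(n_follow_ups)]
    return order

print(grouped_order(SECTIONS, FOLLOW_UPS))
print(interleafed_order(SECTIONS, FOLLOW_UPS))
```

The expected mechanism (see slide 8) is that interleafed respondents learn a YES answer triggers extra questions, which motivates misreporting NO.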
2 Sources of Participants

MTurk
• 80% prior approval rate
• In the US
• Survey: 185/214 completed; 59% female; 39 years old; 48% bachelor's or higher
• Image coding: 141/342 completed; 62% female; 50% bachelor's or higher

Online panel
• Convenience sample in the US
• Balanced to the Census
• Survey: 204/260 completed; 53% female; 48 years old; 37% bachelor's or higher
• Image coding: 141/372 completed; 60% female; 45% bachelor's or higher
Task A: Lifestyle Survey
• 4 filter sections
  – Clothing
  – Consumer goods
  – Leisure activity
  – Credit cards
• 30 minutes
• $4 incentive
• Order of sections randomized
• Filters in forward or backward order
Example question sequence:
Has anyone in this household purchased pants in the last 3 months? Yes
  How much did those pants cost?
  Does that price include tax?
  Did you buy them online?
  ……………….
Has anyone in this household purchased shoes in the last 3 months? Yes?
Task B: Image Coding
• Image coding task (one coding record per photo; see the sketch below)
  – 40 photos of Haiti buildings
  – $6 incentive
  – 50 minutes
• 4 elements
  – Beam
  – Column
  – Slab
  – Wall
• 2 filters
  – Can you see the element?
  – Is it damaged?
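A hedged sketch of what one coder's record for a single photo could look like under this 4-element x 2-filter scheme; the field names and helper function are illustrative assumptions, not the study's actual coding interface:

```python
# Illustrative sketch of one coding record (field names are assumptions,
# not the study's actual coding interface).

ELEMENTS = ["beam", "column", "slab", "wall"]

def code_photo(photo_id, visible, damaged):
    """Build one record: a visibility filter per element and, if visible, a damage follow-up."""
    record = {"photo_id": photo_id}
    for element in ELEMENTS:
        is_visible = visible.get(element, False)
        record[f"{element}_visible"] = is_visible
        # The damage follow-up is only meaningful when the element is visible.
        record[f"{element}_damaged"] = damaged.get(element, False) if is_visible else None
    return record

print(code_photo("haiti_001",
                 visible={"beam": True, "wall": True},
                 damaged={"wall": True}))
```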
Results: Motivated Misreporting in Survey Questions
• Expected format effect: more YES answers in GROUPED format
Results: Motivated Misreporting in Survey Questions
• DV: YES response
• Controlling for:
  – Demographics
  – Order * section
  – Format * MTurk / Panel
(a sketch of this kind of specification follows)
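A hedged sketch of that kind of model, assuming hypothetical variable names, an input file, and a logit link; the deck does not show the authors' actual specification or data:

```python
# Sketch of a logit of answering YES, with the controls named on the slide.
# Variable names and the input file are assumptions for illustration.
import pandas as pd
import statsmodels.formula.api as smf

# One row per respondent x filter question, with columns such as
# yes (0/1), age, female, bachelors, section_order, section, format, source.
df = pd.read_csv("filter_responses.csv")   # hypothetical file

model = smf.logit(
    "yes ~ age + female + bachelors"
    " + C(section_order) * C(section)"
    " + C(format) * C(source)",
    data=df,
)
print(model.fit().summary())
```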
Results: Motivated Misreporting in Image Coding
• Effect in the opposite direction: more YES answers in the Interleafed format
• MTurkers answered YES more often

Average # of YES responses, by format
              Element visibility   Element damage
Grouped             68.7                 49.3
Interleafed         87.1                 53.1

Average # of YES responses, by source
              Element visibility   Element damage
Panel               65.4                 47.1
MTurk               88.9                 55.0
Takeaways (preliminary)
• Results not as expected
  – Survey: format effect only in MTurk
  – MTurkers are similar to other survey respondents
  – Why no format effect in the panel?
    • No motivated misreporting in the panel?
    • Or misreporting in both formats?
  – Image coding: format effect in the opposite direction
• Some evidence MTurkers work harder than panelists
  – Survey: less item nonresponse
  – Image coding: longer time spent with the training materials
Discussion
• Data scientists are, in effect, running surveys when they create training data
• We know a lot about survey data quality!
  – Measurement error
  – Nonresponse error
  – Coverage error

How do these errors affect
• Training data?
• Model predictions?
(toy illustration below)
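A toy simulation, with synthetic data and arbitrary label-noise rates, of one way measurement error in crowdsourced labels can carry through to model predictions; this is an illustration of the question, not an analysis from the study:

```python
# Toy illustration: flip a share of training labels (simulated measurement error)
# and watch test accuracy degrade. Data and noise rates are synthetic assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
for noise_rate in [0.0, 0.1, 0.3]:                 # share of training labels flipped
    flip = rng.random(len(y_train)) < noise_rate
    y_noisy = np.where(flip, 1 - y_train, y_train)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_noisy)
    print(f"label noise {noise_rate:.0%}: test accuracy {clf.score(X_test, y_test):.3f}")
```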
More Information
Y. Patrick Hsieh
yph@rti.org
@coolpat
Stephanie Eckman
seckman@rti.org
@stephnie