Data Acquisition: A Key
Challenge for Quality
and Reliability
Improvement
Gerald J. Hahn & Necip Doganaksoy
©2013 ASQ & Presentation Hahn & Doganaksoy

http://reliabilitycalendar.org/webinars/
ASQ Reliability Division
English Webinar Series
One of the monthly webinars
on topics of interest to
reliability engineers.
To view recorded webinars (available to ASQ Reliability
Division members only), visit asq.org/reliability
To sign up for the live webinars, which are free and open to
anyone, visit reliabilitycalendar.org and select English
Webinars to find links to register for upcoming events

http://reliabilitycalendar.org/webinars/
DATA ACQUISITION: A KEY CHALLENGE FOR QUALITY
AND RELIABILITY IMPROVEMENT

Gerald J. Hahn
GE Global Research
(Retired)
gerryhahn@yahoo.com

Necip Doganaksoy
GlobalFoundries
necipdoganaksoy@yahoo.com

ASQ RELIABILITY DIVISION WEBINAR
November 14, 2013

THE OBVIOUS, THE EXPECTATION AND THE REALITY
• The Obvious
– Statistical quality and reliability analyses are based
upon sample data (and assumptions about
sampled populations, etc.)
– Such analyses are only as good as the data upon
which they are based
– Bad data lead to more complex, less powerful or
invalid analyses
– David Moore: The most important information
about any statistical study is how the data were
produced
• The Expectation: Much attention is given to the data
acquisition process in training and applications
• The Reality: Little or insufficient attention is generally
given to the data acquisition process
THE CONSEQUENCES AND THE
CHALLENGE
• The Consequences
– Why is it that every database that I have
encountered is filled with data quality problems?
(Theodore Johnson, 2003 QPRC)
– Common wisdom puts the extent of the total
project effort spent in cleaning the data before
doing any analysis as high as 60-95% (DeVeaux
and Hand, Statistical Science 2005)
• The Challenge: Move data acquisition to the front burner
– Understand limitations of available data
– Emphasize data acquisition
– Use disciplined process
WEBINAR TOPICS
• Typical data acquisition situations
• Problems (and opportunities) with observational data
• A disciplined, targeted approach for data acquisition
• Washing machine design reliability example
• Some guidelines for effective data acquisition
• Some practical challenges
• Some relevant further commentaries
• Elevator speech

EMPHASIS ON QUALITY AND RELIABILITY
TYPICAL DATA ACQUISITION SITUATIONS
• Control over data acquisition
– Designed experiments
– Random sampling studies (from specified
population)
– Double-blind medical studies
– Systems development studies, e.g.,
  • Estimate design reliability
  • Evaluate measurement system
  • Assess process capability
  • Signal changes via control charts
  • Anticipate/avoid field failures by automated monitoring
• Observational studies (and data mining) on existing data, often from Big Data

MANY APPLICATIONS INVOLVE COMBINATIONS
PROBLEMS (AND OPPORTUNITIES) WITH
OBSERVATIONAL DATA
• Problems with “available” databases
  – Data obtained for purposes other than statistical analysis
  – Data resides in different databases
• Some limitations of observational data
  – Missing values and events
  – Unrepresentative observations
  – Inconsistent or imprecise measurements
  – Limited variability
  – Key impacting variables unrecorded; recorded proxy variables deemed “significant” (e.g., foot size impacts reading ability)
• Observational studies
  – May be helpful for prediction, e.g., credit performance, top-selling items before an expected hurricane, finding the best time to buy a plane ticket
  – Misleading or useless for gaining “cause and effect” understanding
  – Observation from the trenches (Kati Illouz, GE): Data owners tend to be overly optimistic about their data
• Data inadequacies (and reasons) define future information needs

QUALITY—NOT QUANTITY—OF DATA IS WHAT COUNTS
IN SUMMARY
• Even the most sophisticated statistical analysis cannot
compensate for or rescue inadequate data
• It’s not that there is a lack of data. Instead, it is that the
data are inadequate to answer the questions (NY Times
article on “How Safe is Cycling?”, October 22, 2013)
• Massive data does not guarantee success…Knowing
how the data were collected (the “data pedigree”) is
critical (Snee, Union College Mathematics
Conference, October 2013)
• A good principle to remember is that data are guilty
until proven innocent, not the other way around (Snee
and Hoerl, QP Dec 2012)
• Observational data have an important role in pointing
the way forward, but they should not be a primary
ingredient for making final decisions (Anderson-Cook
and Borror, QP April 2013)
DISCIPLINED, TARGETED PROCESS FOR DATA
ACQUISITION (DEUPM) FOR SYSTEMS
DEVELOPMENT STUDY
• Proposed process:
– Step 1: D: Define the problem
– Step 2: E: Evaluate the existing data
– Step 3: U: Understand data acquisition opportunities
and limitations
– Step 4: P: Plan data acquisition and analysis
– Step 5: M: Monitor, clean data, analyze and validate
• Example: Demonstrate desired ten-year reliability for
new washing machine design in 6 months elapsed time

STEP 1: DEFINE THE PROBLEM

• Define specific questions to be answered
Washing machine design example:
– Stated objective: Show within 6 months and with 95% confidence
that the following can be met (a zero-failure sample-size sketch
appears at the end of this slide):
  • 95% reliability after one year of operation
  • 90% reliability after five years
  • 80% reliability after ten years
  (“reliability” defined as no repair or servicing need)
– Added question: How can reliability be improved further?

• Identify resulting actions
Washing machine design example: Go to full-scale
production if validated and make identified improvements

• State population or process of interest
Washing machine design example: 6 million machines to
be built in next 5 years
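As an aside (not on the original slide), the classical zero-failure “success run” formula gives a quick feel for the scale of testing such demonstration goals imply: the smallest n such that R^n ≤ 1 − C units must complete the test with no failures to demonstrate reliability R at confidence C. A minimal sketch, with the three stated goals plugged in:

```python
import math

def success_run_n(reliability: float, confidence: float = 0.95) -> int:
    """Smallest number of units that must complete the test with zero
    failures to demonstrate `reliability` at `confidence`
    (solves reliability**n <= 1 - confidence)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))

print(success_run_n(0.95))  # 1-year goal  -> 59 units
print(success_run_n(0.90))  # 5-year goal  -> 29 units
print(success_run_n(0.80))  # 10-year goal -> 14 units
```

This attribute-style count ignores the information in actual failure times; the Weibull analysis planned in Step 4 extracts more from each unit, which is in part why the plan can work with the 36 test stands noted in Step 3.
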
STEP 2: EVALUATE THE EXISTING DATA
• Understand the process and its physical basis
Washing machine design example: Study up and participate in
design reviews, FMEAs (Failure Mode and Effects Analyses), etc.
• Determine and analyze existing data
Washing machine design example:
– Previous design
  • Existing data
    – In-house component, sub-assembly and system tests
    – Field failure and servicing data
  • Conclusion: Previous design does not meet current reliability goals

– New design
  • Proposed new design aims to correct key past problems
  • Possible concern: Introduction of new failure modes
  • Existing data: Component and sub-assembly test results
  • Data identified one new failure mode; rapidly addressed and corrected
  • Conclusion: Proposed new design appears to correct past problems without introducing new ones; reliability goals appear to be met
• Identify data inadequacies
Washing machine design example: No information about system
performance in realistic use environment
STEP 3: UNDERSTAND DATA ACQUISITION
OPPORTUNITIES AND LIMITATIONS
• Gain understanding of data that can be acquired and how
Washing machine example: In-house accelerated use-rate systems testing (arithmetic sketch at the end of this slide)
• Simulate 3.5 years of operation per month
• Evaluate weekly for failures
• Sample unfailed units and measure degradation (destructive test)

• Determine practical considerations and limitations in data
acquisition
Washing machine design example:
• 6 months of testing
• 3 prototype lots initially (and one more subsequently)
• 36 available test stands

• Assess relevance of resulting data to meet study goals and
underlying assumptions
Washing machine design example:
• Assume prototype lots representative of 5-year high volume production
• Assume failures are cycle (and not elapsed time) dependent
• Assume realistic simulation of field environment
Conclusion: This is an analytic (not enumerative) study; statistical
confidence bounds capture only statistical uncertainty
STEP 4: PLAN DATA ACQUISITION AND IMPLEMENT

• Specify test conditions or operational environment
Washing machine design example: Run washing machines with a full load of soiled
towels, mixed with sand, wrapped in a plastic bag
• Specify sample size and selection process
Washing machine design example: Select 12 units randomly from each of 3 prototype
lots and put on life test
• Specify protocol and operational details
Washing machine design example:
– Record failures and determine failure mode
– After 3 months and again after 6 months:
  • Remove 4 units from each of 3 lots and measure degradation
  • Replace 3-month withdrawals with 12 units from 4th prototype lot
– Assure high-precision measurements, meaningful failure definition, complete and
consistent data recording procedures, etc.
• Specify data analysis plan and assess expected statistical precision
Washing machine design example:
– Do Weibull distribution analysis on time-to-failure data after 6 months (an
illustrative fitting sketch follows this slide)
– Conduct supplementary analysis using degradation data
– Simulation study demonstrated proposed plan provides desired statistical precision
• Specify pilot study
Washing machine design example: Run three washing machines for one week

[Figure: Weibull probability plot of percent failing vs. years]
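A minimal sketch of the Weibull analysis called for in the plan. This is not the authors’ actual analysis: the failure times below are hypothetical, the interim degradation removals are ignored for simplicity, and the maximum-likelihood fit with right censoring (survivors still running at 21 simulated years) is coded directly with numpy/scipy rather than a dedicated reliability package.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical results for 36 units (12 from each of 3 lots) after the
# 6-month test, in simulated years of use (3.5 years per test month).
# failed = 0 marks a right-censored unit: still running at 21 years.
t      = np.array([3.2, 7.5, 11.8, 16.4, 19.9] + [21.0] * 31)
failed = np.array([1,   1,   1,    1,    1   ] + [0]    * 31)

def neg_loglik(logparams):
    beta, eta = np.exp(logparams)      # Weibull shape and scale, kept positive
    z = (t / eta) ** beta
    # Failures contribute the log-density; censored units the log-survival -z.
    ll = np.sum(failed * (np.log(beta) - np.log(eta)
                          + (beta - 1.0) * np.log(t / eta)) - z)
    return -ll

fit = minimize(neg_loglik, x0=np.log([1.0, 30.0]), method="Nelder-Mead")
beta, eta = np.exp(fit.x)

for years in (1, 5, 10):
    r = np.exp(-(years / eta) ** beta)  # Weibull reliability R(t) = exp(-(t/eta)**beta)
    print(f"Estimated reliability at {years:2d} years: {r:.3f}")
```

A production analysis would add lower confidence bounds on R(t) (e.g., likelihood-ratio or bootstrap) before comparing against the 95%/90%/80% goals, and would fold in the degradation measurements as the slide indicates.
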
STEP 5: MONITOR, CLEAN DATA, ANALYZE AND
VALIDATE

• Monitor implementation to ensure that process is being followed
Washing machine design example: Continue involvement

• Clean data—as gathered
Washing machine design example: Develop proactive checks for missing or
inconsistent data (a minimal sketch follows at the end of this slide)

• Conduct preliminary analyses; act thereon, as appropriate
Washing machine design example: Analyze failure data after 1 week, 1 month
and 3 months; identify failure modes for correction

• Conduct final data analysis and report findings
Washing machine design example: Do final analyses after 6 months (failure
and degradation data)

• Validate: Propose appropriate validation testing
Washing machine design example:
– Continue 6 of 36 units on test beyond 6 months
– Test 100 machines with company employees and 60 machines in
laundromats
– Audit sample 6 production units each week: Test five for 1 week; one for
3 months
– Develop system for capturing and analyzing field reliability data
– Provide current data access to engineers and management
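The proactive checks mentioned above might look like the following minimal pandas sketch; the column names, limits, and rows are hypothetical, invented to match the washing-machine test protocol.

```python
import pandas as pd

# Hypothetical weekly test log; real columns and limits would come from
# the data recording protocol agreed in Step 4.
log = pd.DataFrame({
    "unit_id":   ["A01", "A02", "A03", "A04"],
    "lot":       [1, 1, 2, None],          # missing lot ID -> flag
    "cycles":    [410, 405, 9999, 395],    # implausible usage -> flag
    "failed":    [0, 1, 0, 0],
    "fail_mode": [None, "", None, None],   # failure without a mode -> flag
})

problems = pd.concat([
    log[log["lot"].isna()],
    log[log["cycles"] > 5000],
    log[(log["failed"] == 1) & (log["fail_mode"].fillna("") == "")],
]).drop_duplicates("unit_id")

print(problems)  # review flagged rows while the data are still fresh
```

Running such checks as the data arrive, rather than at analysis time, is what “clean data—as gathered” buys: problems get fixed while the units and operators are still at hand.
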
SOME GUIDELINES FOR EFFECTIVE DATA
ACQUISITION (STEP 4)
RECORD KEY VARIABLES AND EVENTS
Example: Using field data to estimate reliability and to speedily
identify and address root causes of failures calls for the following
(a sketch of a matching record layout follows this slide):
• Field data
– Estimate of product usage
– Product performance measurements over time
– Time to failure
– Failure mode information
• Manufacturing data
– Parts and manufacturing lot identification
– Actual process conditions
– Ambient conditions during manufacture
– Unplanned events
– Other potentially important process variables
– End-of-line performance
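One way to make the field-data bullets concrete is to fix a record layout before any data flow. The sketch below is hypothetical: one field per bullet, with names invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FieldRecord:
    """One field report per unit; fields mirror the bullets above."""
    unit_id: str
    manufacturing_lot: str               # links field behavior to production data
    usage_cycles: int                    # estimate of product usage
    performance_reading: float           # performance measurement over time
    failed: bool
    time_to_failure_cycles: Optional[int] = None
    failure_mode: Optional[str] = None   # should be present whenever failed is True
```

Agreeing on such a layout up front is what forces the key variables and events to actually get recorded, rather than reconstructed after the fact.
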
ENSURE CONSISTENT AND
ACCURATE DATA RECORDING
• Strive for precise measurements
• Combat data recording inconsistencies
– Differences between operators
– Differences in qualitative scaling assessments
– Differences in data recording conventions; e.g., a date recorded as 2/8 (February 8 or August 2?)

• Address missing values
– Understand reason
– Handle appropriately
– Minimize occurrence

• Conduct timely data cleaning: Identify “errors” in
recorded data (e.g., 999 for missing values) and correct
them (a minimal sketch follows this slide)
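A minimal sketch of the cleaning step above, with hypothetical data: the 999 sentinel becomes a true missing value, and date strings are parsed against one declared convention instead of being guessed row by row.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "torque":   [12.1, 999, 11.8, 999],                      # 999 = "not measured"
    "date_str": ["2/8/2013", "8/2/2013", "31/4/2013", "14/2/2013"],
})

clean = raw.copy()
clean["torque"] = clean["torque"].replace(999, np.nan)       # real missing values

# Declare ONE convention (here day-first), so 2/8 means August 2 for
# everyone; impossible dates like 31/4 surface as NaT for follow-up.
clean["date"] = pd.to_datetime(clean["date_str"], dayfirst=True, errors="coerce")
print(clean)
```
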
AVOID SYSTEMATICALLY
UNRECORDED OBSERVATIONS
Some examples (the first is quantified in a sketch after this list):
• Information recorded on failed units only
• Information only during warranty period
• Exclusion of “outlier” information
• Purging of “old” (but still relevant) data
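Why the first example misleads can be shown with a toy simulation (hypothetical numbers, not from the webinar): averaging lifetimes over the units that happened to fail ignores the survivors and badly understates how long the product really lasts.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fleet with Weibull lifetimes, watched for 10 years.
true_life = 15.0 * rng.weibull(1.5, size=10_000)   # scale 15 yr, shape 1.5
window = 10.0
failed = true_life <= window

print(f"True mean lifetime:              {true_life.mean():5.1f} years")
print(f"Mean over recorded failures:     {true_life[failed].mean():5.1f} years")
print(f"Units still running at 10 years: {(~failed).mean():.0%}")
```

Treating the still-running units as right-censored observations, as in the Weibull sketch under Step 4, removes this bias.
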
SOME OTHER HINTS
• Strive to obtain continuous data
• Aim for compatibility and integration of
databases
• Consider sampling

CHALLENGES
• Some practical challenges
– Added cost and possible delays
– Added bureaucracy
– Diversity of data ownership:
Engineering, Manufacturing, etc.
– Need for added work not evident
Result: Lack of motivation by data recorders and their
management

• Strive to overcome by
– Recognizing perspectives of others
– Understanding consequences of our requests
– Making requests as simple and reasonable as possible
– Automating data acquisition process
– Providing convincing justification (e.g., insurance)
SOME RELEVANT FURTHER COMMENTARIES
• Webinar adapted from
– Hahn, G.J. and Doganaksoy, N. (2011), A Career in Statistics: Beyond the Numbers, Wiley (Chapter 11).
– Doganaksoy, N. and Hahn, G.J. (2012), Getting the Right Data Up Front: A Key Challenge, Quality Engineering, Vol. 24, No. 4, 446-459.
• Also note
– Anderson-Cook, C.M. and Borror, C.M. (2013), Paving the Way: Seven Data Collection Strategies to Enhance Your Quality Analyses, Quality Progress, April, 18-29.
– Coleman, D.E. and Montgomery, D.C. (1993), A Systematic Approach for Planning a Designed Industrial Experiment, Technometrics, Vol. 35, No. 1, 1-12.
– DeVeaux, R.D. and Hand, D.J. (2005), How to Lie with Bad Data, Statistical Science, Vol. 20, No. 3, 231-238.
– Hahn, G.J. and Doganaksoy, N. (2003), Data Acquisition: Focusing on the Challenge, presentation at the Joint Statistical Meetings.
– Hahn, G.J. and Doganaksoy, N. (2008), The Role of Statistics in Business and Industry, Wiley.
– Kenett, R.S. and Shmueli, G. (2013), On Information Quality (with discussion and rejoinder), Journal of the Royal Statistical Society, Series A (forthcoming).
– Schield, M. (2006), Beware the Lurking Variable, Stats, 46, 14-18.
– Snee, R.D. and Hoerl, R.W. (2012), Inquiry on Pedigree: Do You Know the Quality and Origin of Your Data?, Quality Progress, December, 66-68.
– Steiner, S.H. and MacKay, R.J. (2005), Statistical Engineering: An Algorithm for Reducing Variation in Manufacturing Processes, ASQ Quality Press, Milwaukee, WI.
ELEVATOR SPEECH
• We need to put the horse (focus on data acquisition)
before the CART (Classification and Regression
Tree) data analysis
• Specific proposals
– Focus on data acquisition in training programs
– Scrutinize available data to assess relevance
and identify gaps
– Use disciplined, targeted process for added data
acquisition
– Remain constantly cognizant of underlying
assumptions
• Thanks for listening
– Gerry Hahn, gerryhahn@yahoo.com
– Necip Doganaksoy, necipdoganaksoy@yahoo.com
