The document discusses challenges with data acquisition for quality and reliability analysis. It presents a 5-step process called DEUPM for targeted data acquisition: 1) Define the problem, 2) Evaluate existing data, 3) Understand data acquisition opportunities and limitations, 4) Plan data acquisition and analysis, 5) Monitor, clean data, analyze and validate. An example of using this process to validate the reliability of a new washing machine design within 6 months is provided to illustrate the steps. The process aims to ensure data acquisition is disciplined and sufficient to answer reliability questions.
Data Acquisition: A Key Challenge for Quality and Reliability Improvement
2. ASQ Reliability Division
English Webinar Series
One of the monthly webinars
on topics of interest to
reliability engineers.
To view recorded webinar (available to ASQ Reliability
Division members only) visit asq.org/reliability
To sign up for the free live webinars, open to anyone, visit reliabilitycalendar.org and select English
Webinars to find links to register for upcoming events:
http://reliabilitycalendar.org/webinars/
3. DATA ACQUISITION: A KEY CHALLENGE FOR QUALITY
AND RELIABILITY IMPROVEMENT
Gerald J. Hahn
GE Global Research
(Retired)
gerryhahn@yahoo.com
Necip Doganaksoy
GlobalFoundries
necipdoganaksoy@yahoo.com
ASQ RELIABILITY DIVISION WEBINAR
November 14, 2013
4. THE OBVIOUS, THE EXPECTATION AND THE REALITY
• The Obvious
– Statistical quality and reliability analyses are based
upon sample data (and assumptions about
sampled populations, etc.)
– Such analyses are only as good as the data upon
which they are based
– Bad data lead to more complex, less powerful or
invalid analyses
– David Moore: The most important information
about any statistical study is how the data were
produced
• The Expectation: Much attention is given to the data
acquisition process in training and applications
• The Reality: Little or insufficient attention is generally
given to the data acquisition process
5. THE CONSEQUENCES AND THE
CHALLENGE
• The Consequences
– Why is it that every database that I have
encountered is filled with data quality problems?
(Theodore Johnson, 2003 QPRC)
– Common wisdom puts the extent of the total
project effort spent in cleaning the data before
doing any analysis as high as 60-95% (DeVeaux
and Hand, Statistical Science 2005)
• The Challenge: Move data-acquisition to front burner
– Understand limitations of available data
– Emphasize data acquisition
– Use disciplined process
6. WEBINAR TOPICS
• Typical data acquisition situations
• Problems (and opportunities) with observational data
• A disciplined, targeted approach for data acquisition
• Washing machine design reliability example
• Some guidelines for effective data acquisition
• Some practical challenges
• Some relevant further commentaries
• Elevator speech
EMPHASIS ON QUALITY AND RELIABILITY
7. TYPICAL DATA ACQUISITION SITUATIONS
• Control over data acquisition
– Designed experiments
– Random sampling studies (from specified
population)
– Double-blind medical studies
– Systems development studies, e.g.,
• Estimate design reliability
• Evaluate measurement system
• Assess process capability
• Signal changes via control charts
• Anticipate/avoid field failures by automated monitoring
• Observational studies (and data mining) on
existing data often from Big Data
MANY APPLICATIONS INVOLVE COMBINATIONS
8. PROBLEMS (AND OPPORTUNITIES) WITH
OBSERVATIONAL DATA
• Problems with “available” databases
– Data obtained for purposes other than statistical analysis
– Data resides in different databases
• Some limitations of observational data
– Missing values and events
– Unrepresentative observations
– Inconsistent or imprecise measurements
– Limited variability
– Key impacting variables unrecorded; recorded proxy variables
deemed “significant” (e.g., foot size impacts reading ability)
• Observational studies
– May be helpful for prediction, e.g., credit performance, top selling
items before expected hurricane, finding best time to buy plane ticket
– Misleading or useless for gaining “cause and effect” understanding
– Observation from the trenches (Kati Illouz, GE): Data owners tend to
be overly optimistic about their data
• Data inadequacies (and reasons) define future information needs
QUALITY—NOT QUANTITY—OF DATA IS WHAT COUNTS
9. IN SUMMARY
• Even the most sophisticated statistical analysis cannot
compensate for or rescue inadequate data
• It’s not that there is a lack of data. Instead, it is that the
data are inadequate to answer the questions (NY Times
article on “How Safe is Cycling?”, October 22, 2013)
• Massive data does not guarantee success…Knowing
how the data were collected (the “data pedigree”) is
critical (Snee, Union College Mathematics
Conference, October 2013)
• A good principle to remember is that data are guilty
until proven innocent, not the other way around (Snee
and Hoerl, QP Dec 2012)
• Observational data have an important role in pointing
the way forward, but they should not be a primary
ingredient for making final decisions (Anderson-Cook
and Borror, QP April 2013)
10. DISCIPLINED, TARGETED PROCESS FOR DATA
ACQUISITION (DEUPM) FOR SYSTEMS
DEVELOPMENT STUDY
• Proposed process:
– Step 1: D: Define the problem
– Step 2: E: Evaluate the existing data
– Step 3: U: Understand data acquisition opportunities
and limitations
– Step 4: P: Plan data acquisition and analysis
– Step 5: M: Monitor, clean data, analyze and validate
• Example: Demonstrate desired ten-year reliability for
new washing machine design in 6 months elapsed time
11. STEP 1: DEFINE THE PROBLEM
• Define specific questions to be answered
Washing machine design example:
– Stated objective: Show within 6 months and with 95% confidence
that following can be met:
• 95% reliability after one year of operation
• 90% reliability after five years
• 80% reliability after ten years
(“reliability” defined as no repair or servicing need)
– Added question: How can reliability be improved further?
• Identify resulting actions
Washing machine design example: Go to full scale
production if validated and make identified improvements
• State population or process of interest
Washing machine design example: 6 million machines to
be built in next 5 years
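The three reliability targets can be checked for mutual consistency under a single life distribution. A minimal sketch, assuming a Weibull life model (an assumption of this sketch, though the plan in Step 4 does use Weibull analysis): fit the shape and scale to the 1-year and 10-year targets and see what that model implies at 5 years.

```python
import math

# Reliability targets from the example: R(1)=0.95, R(5)=0.90, R(10)=0.80.
# Under a Weibull life model, R(t) = exp(-(t/eta)**beta).
# Fit beta and eta to the 1-year and 10-year targets, then check 5 years.
t1, r1 = 1.0, 0.95
t2, r2 = 10.0, 0.80

beta = math.log(math.log(r2) / math.log(r1)) / math.log(t2 / t1)
eta = t1 / (-math.log(r1)) ** (1.0 / beta)

def reliability(t):
    return math.exp(-(t / eta) ** beta)

print(f"beta = {beta:.2f}, eta = {eta:.0f} years")   # beta = 0.64, eta = 105 years
print(f"Implied R(5) = {reliability(5):.3f}")        # 0.866, a bit below the 0.90 target
```

Incidentally, the implied 5-year reliability (about 0.87) falls slightly short of the stated 90% target, so the three targets do not lie exactly on one Weibull curve; the 5-year goal is, in this sense, the binding one.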
12. STEP 2: EVALUATE THE EXISTING DATA
• Understand the process and its physical basis
Washing machine design example: Study up and participate in
design reviews, FMEAs (Failure Mode and Effects Analyses), etc.
• Determine and analyze existing data
Washing machine design example
– Previous design
• Existing data
– In-house component, sub-assembly and system tests
– Field failure and servicing data
• Conclusion: Previous design does not meet current reliability goals
– New design
• Proposed new design aims to correct key past problems
• Possible concern: Introduction of new failure modes
• Existing data: Component and sub-assembly test results
• Data identified one new failure mode; rapidly addressed and corrected
• Conclusion: Proposed new design appears to correct past problems
without introducing new ones; reliability goals appear to be met
• Identify data inadequacies
Washing machine design example: No information about system
performance in realistic use environment
13. STEP 3: UNDERSTAND DATA ACQUISITION
OPPORTUNITIES AND LIMITATIONS
• Gain understanding of data that can be acquired and how
Washing machine example: In-house accelerated use rate systems testing
• Simulate 3.5 years of operation per month
• Evaluate weekly for failures
• Sample unfailed units and measure degradation (destructive test)
• Determine practical considerations and limitations in data
acquisition
Washing machine design example:
• 6 months of testing
• 3 prototype lots initially (and one more subsequently)
• 36 available test stands
• Assess relevance of resulting data to meet study goals and
underlying assumptions
Washing machine design example:
• Assume prototype lots representative of 5-year high volume production
• Assume failures are cycle (and not elapsed time) dependent
• Assume realistic simulation of field environment
Conclusion: This is an analytic (not enumerative) study; statistical
confidence bounds capture only statistical uncertainty
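The use-rate acceleration arithmetic behind the plan is worth making explicit. A quick check, using only figures stated on this slide:

```python
# Use-rate acceleration arithmetic from the test plan
SIM_YEARS_PER_MONTH = 3.5    # each month of testing simulates 3.5 years of use
TEST_MONTHS = 6              # planned test duration
RELIABILITY_HORIZON = 10     # years: the longest reliability target

simulated_years = SIM_YEARS_PER_MONTH * TEST_MONTHS
print(simulated_years)       # 21.0

# The plan covers the 10-year horizon with roughly a 2x margin --
# but only if failures are cycle-dependent (not elapsed-time-dependent),
# as assumed above.
assert simulated_years > RELIABILITY_HORIZON
```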
14. STEP 4: PLAN DATA ACQUISITION AND IMPLEMENT
• Specify test conditions or operational environment
Washing machine design example: Run washing machines with full load of soiled
towels, mixed with sand, wrapped in plastic bag
• Specify sample size and selection process
Washing machine design example: Select 12 units randomly from each of 3 prototype
lots and put on life test
• Specify protocol and operational details
Washing machine design example:
– Record failures and determine failure mode
– After 3 months and again after 6 months
• Remove 4 units from each of 3 lots and measure degradation
• Replace 3-month withdrawals with 12 units from 4th prototype lot
– Assure high-precision measurements, meaningful failure definition, complete and
consistent data recording procedures, etc.
• Specify data analysis plan and assess expected statistical precision
Washing machine design example:
– Do Weibull distribution analysis on time to failure data after 6 months
– Conduct supplementary analysis using degradation data
– Simulation study demonstrated proposed plan provides desired statistical precision
• Specify pilot study
Washing machine design example: Run three washing machines for one week
[Weibull probability plot: Percent Failing vs. Years]
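The simulation study used to assess expected statistical precision might be sketched along the following lines. This is a deliberately simplified version: the "true" Weibull parameters are hypothetical (not from the slides), each of the 36 units is scored pass/fail at the 10-year mark with a one-sided Clopper-Pearson binomial bound rather than the full Weibull time-to-failure analysis, and the staggered withdrawals are ignored.

```python
import numpy as np
from scipy.stats import beta

# Hypothetical "true" Weibull life distribution (shape; scale in years)
SHAPE, SCALE = 1.2, 60.0
N_UNITS = 36           # available test stands
HORIZON = 10.0         # years: reliability target horizon
TARGET, CONF = 0.80, 0.95

def lower_bound(survivors, n, conf=CONF):
    """One-sided Clopper-Pearson lower confidence bound on survival probability."""
    if survivors == 0:
        return 0.0
    return beta.ppf(1.0 - conf, survivors, n - survivors + 1)

rng = np.random.default_rng(1)
n_sims = 2000
demonstrated = 0
for _ in range(n_sims):
    lifetimes = SCALE * rng.weibull(SHAPE, N_UNITS)   # simulated failure times
    survivors = int(np.sum(lifetimes > HORIZON))      # units surviving 10 years
    if lower_bound(survivors, N_UNITS) >= TARGET:
        demonstrated += 1

print(f"Chance the plan demonstrates {TARGET:.0%} 10-year reliability: "
      f"{demonstrated / n_sims:.2f}")
```

Repeating such a simulation over plausible parameter values shows whether 36 units and 6 months give the desired chance of demonstrating the targets; the actual study would gain precision by using the Weibull failure-time and degradation data rather than a pass/fail count.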
15. STEP 5: MONITOR, CLEAN DATA, ANALYZE AND
VALIDATE
• Monitor implementation to ensure that process is being followed
Washing machine design example: Continue involvement
• Clean data—as gathered
Washing machine design example: Develop proactive checks for missing or
inconsistent data
• Conduct preliminary analyses; act thereon, as appropriate
Washing machine design example: Analyze failure data after 1 week, 1 month
and 3 months; identify failure modes for correction
• Conduct final data analysis and report findings
Washing machine design example: Do final analyses after 6 months (failure
and degradation data)
• Validate: Propose appropriate validation testing
Washing machine design example:
– Continue 6 of 36 units on test beyond 6 months
– Test 100 machines with company employees and 60 machines in
laundromats
– Audit sample 6 production units each week: Test five for 1 week; one for
3 months
– Develop system for capturing and analyzing field reliability data
– Provide current data access to engineers and management
16. SOME GUIDELINES FOR EFFECTIVE DATA
ACQUISITION (STEP 4)
RECORD KEY VARIABLES AND EVENTS
Example: Using field data to estimate reliability and to speedily
identify and address root causes of failures calls for:
• Field data
– Estimate of product usage
– Product performance measurements over time
– Time to failure
– Failure mode information
• Manufacturing data
– Parts and manufacturing lot identification
– Actual process conditions
– Ambient conditions during manufacture
– Unplanned events
– Other potentially important process variables
– End-of-line performance
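The key variables listed above can be captured in a simple record schema so that field and manufacturing data stay linked. A minimal sketch; the class and field names here are hypothetical, chosen to mirror the items on the slide.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FieldRecord:
    """One field reliability record (hypothetical schema)."""
    unit_id: str
    parts_lot: str                            # ties the unit back to manufacturing data
    install_date: str                         # unambiguous ISO format, e.g. "2013-11-14"
    estimated_cycles: Optional[int] = None    # estimate of product usage
    hours_to_failure: Optional[float] = None  # None while the unit is unfailed
    failure_mode: Optional[str] = None        # recorded when a failure occurs

record = FieldRecord(unit_id="WM-0001", parts_lot="LOT-7",
                     install_date="2014-01-15", estimated_cycles=420)
print(record.failure_mode is None)   # True: unfailed units carry no failure mode
```

Making lot identification a required field, while usage and failure information may be absent, reflects the point that unfailed units are data too and must not be systematically unrecorded.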
17. ENSURE CONSISTENT AND
ACCURATE DATA RECORDING
• Strive for precise measurements
• Combat data recording inconsistencies
– Differences between operators
– Differences in qualitative scaling assessments
– Differences in data recording conventions; e.g., a date of 2/8 (February 8 or August 2?)
• Address missing values
– Understand reason
– Handle appropriately
– Minimize occurrence
• Conduct timely data cleaning: Identify “errors” in
recorded data (e.g. 999 for missing values) and correct
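The timely, proactive checks described above can be automated as data arrive. A minimal sketch, assuming hypothetical record fields (`hours_to_failure`, `test_date`) and the sentinel codes mentioned on the slide:

```python
from datetime import datetime

SENTINELS = {999, -1}   # codes sometimes (mis)used to mean "missing"

def check_record(record):
    """Return a list of data-quality flags for one test record (a dict)."""
    flags = []
    hours = record.get("hours_to_failure")
    if hours is None:
        flags.append("missing hours_to_failure")
    elif hours in SENTINELS:
        flags.append(f"sentinel value {hours} likely means missing")
    elif hours < 0:
        flags.append("negative time to failure")
    date = record.get("test_date", "")
    # Reject ambiguous day/month orderings such as '2/8'
    try:
        datetime.strptime(date, "%Y-%m-%d")
    except ValueError:
        flags.append(f"date '{date}' not in unambiguous YYYY-MM-DD form")
    return flags

# Flags both the 999 sentinel and the ambiguous date
print(check_record({"hours_to_failure": 999, "test_date": "2/8"}))
```

Running such checks as the data are gathered, rather than at analysis time, is what distinguishes timely cleaning from the 60-95% after-the-fact cleanup effort cited earlier.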
18. AVOID SYSTEMATICALLY
UNRECORDED OBSERVATIONS
Some examples:
• Information recorded on failed units
only
• Information only during warranty
period
• Exclusion of “outlier” information
• Purging of “old” but still-relevant data
19. SOME OTHER HINTS
• Strive to obtain continuous data
• Aim for compatibility and integration of
databases
• Consider sampling
20. CHALLENGES
• Some practical challenges
– Added cost and possible delays
– Added bureaucracy
– Diversity of data ownership:
Engineering, Manufacturing, etc.
– Need for added work not evident
Result: Lack of motivation by data recorders and their
management
• Strive to overcome by
– Recognizing perspectives of others
– Understanding consequences of our requests
– Making requests as simple and reasonable as possible
– Automating data acquisition process
– Providing convincing justification (e.g., insurance)
21. SOME RELEVANT FURTHER COMMENTARIES
• Webinar adapted from
– Hahn, G.J. and Doganaksoy, N. (2011), A Career in Statistics: Beyond the
Numbers, Wiley (Chapter 11).
– Doganaksoy, N. and Hahn, G.J. (2012), Getting the Right Data Up Front: A Key
Challenge, Quality Engineering, Vol. 24, No. 4, 446-459.
• Also note
– Anderson-Cook, C.M. and Borror, C.M. (2013), Paving the Way: Seven Data
Collection Strategies to Enhance Your Quality Analyses, Quality Progress, April, 18-29.
– Coleman, D.E. and Montgomery, D.C. (1993), A Systematic Approach for Planning a
Designed Industrial Experiment, Technometrics, Vol. 35, No. 1, 1-12.
– DeVeaux, R.D. and Hand, D.J. (2005), How to Lie with Bad Data, Statistical Science,
Vol. 20, No. 3, 231-238.
– Hahn, G.J. and Doganaksoy, N. (2003), Data Acquisition: Focusing on the
Challenge, Presentation at Joint Statistical Meetings.
– Hahn, G.J. and Doganaksoy, N. (2008), The Role of Statistics in Business and
Industry, Wiley.
– Kenett, R.S. and Shmueli, G. (2013), On Information Quality (with discussion and
rejoinder), Journal of the Royal Statistical Society, Series A (forthcoming).
– Schield, M. (2006), Beware the Lurking Variable, Stats, 46, 14-18.
– Snee, R.D. and Hoerl, R.W. (2012), Inquiry on Pedigree: Do You Know the Quality and
Origin of Your Data?, Quality Progress, December, 66-68.
– Steiner, S.H. and MacKay, R.J. (2005), Statistical Engineering: An Algorithm for Reducing
Variation in Manufacturing Processes, ASQ Quality Press, Milwaukee, WI.
22. ELEVATOR SPEECH
• We need to put the horse (focus on data acquisition)
before the CART (Classification and Regression
Tree) data analysis
• Specific proposals
– Focus on data acquisition in training programs
– Scrutinize available data to assess relevance
and identify gaps
– Use disciplined, targeted process for added data
acquisition
– Remain constantly cognizant of underlying
assumptions
• Thanks for listening
– Gerry Hahn, gerryhahn@yahoo.com
– Necip Doganaksoy, necipdoganaksoy@yahoo.com