A or B?
Creating a Culture that Provokes
Failure and Boosts Improvement
Ben Dressler
Failing
= not reaching the goal you set yourself
“Anything we design, we’re going to test and iterate, Lean Startup-style. Just
because something looks good, doesn’t mean it’s actually working. This data-driven
approach gives us a more enhanced resolution on how the product is
behaving and succeeding compared to what a typical startup would do.”
Garrett Camp (StumbleUpon, Uber)
Successful companies, start-ups and corporations alike, are leveraging
strategies that are powered by failure as a way of learning and adapting.
Progressive failure means failing inexpensively and rapidly, with clear learnings
and fast recovery.
It’s not about risking catastrophic damage.
Ingredients for progressive failure
1. The Thumb of Caesar
2. A Flight Recorder
3. A Big Blackboard
Augmented reality
UK retailer: an ex-catalogue business with a turnover of 2 billion dollars, adopting a
test-and-fail culture on its journey to become a world-class digital retailer.
Cimagine: an Israeli startup with a markerless augmented reality app for
furniture.
“It’s great”
“I would use this”
The first verbal reactions from Shop Direct customers were very encouraging: without
exception, they were impressed by the technology. But users were not able to use
basic functions of the app successfully.
By our earlier definition this is a failure.
The usual response to failing. We’re not going to do this.
1. The Thumb of Caesar
The first element of successful processes that are based on failure: having a
clear, measurable criterion that tells you whether or not you missed your goal. It
is crucial to know THAT you failed if you want to learn from it.
Think evolution: no matter how or why, genes being passed on means
success, genes not being passed on means failure.
Examples
- Metrics in an A/B test
- Completion rates in user testing
- Any measurable goal (be tough on yourself!)
Ron Kohavi 
(Bing experimentation team)

“We measure 500 metrics. The
shipping decision is based on three.”

= 0.6% of all data influences the decision

Focus is key here. If your decision criterion doesn’t reflect what you’re trying to achieve
as a business overall, it will drive you in the wrong direction in the long term.
Back to the example
Success criterion: 100% of users can use all basic features

Measurement 1: 0% of users could use all basic features
Thumb of Caesar
Know THAT you failed
- Yes/No answer
- Eliminate ideas/prototypes/hypotheses
- Base tests on a rock solid criterion
- Statistics may apply
- You’ll learn only one thing, but you’ll learn it for sure
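Where "statistics may apply", the yes/no answer for an A/B test can be sketched as a significance test on completion rates. The following is a minimal illustration, not the method used in the talk: it assumes a one-sided two-proportion z-test, and the function name `thumb_of_caesar`, its parameters and the sample counts are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def thumb_of_caesar(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Return True (thumb up) if variant B beats variant A at level alpha.

    Assumption: a one-sided two-proportion z-test with a pooled standard error.
    conv_* are counts of users who reached the goal, n_* are sample sizes.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both variants sit at exactly 0% or 100%: no evidence either way.
        return False
    z = (p_b - p_a) / se
    p_value = 1 - NormalDist().cdf(z)
    return p_value < alpha


# Illustrative counts only: 10% vs 15% completion on 1000 users each.
print(thumb_of_caesar(100, 1000, 150, 1000))
```

The function deliberately returns a single boolean: the richer numbers (z-score, p-value) are useful for investigation, but the decision itself stays a thumb up or a thumb down.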

2. A Flight Recorder
Knowing THAT you failed is just the basics. But in order to improve you need more
information. That is why you also need to know HOW you failed.
In the 1950s no one was interested in funding what would later become the
flight recorder, or black box. In spring 2014 an estimated €60m is being spent
on finding a single one of those devices.
Games do it well
Failing in a game usually leaves you with a trace of audiovisual feedback that
gives you a good idea of all the events leading up to the failure.
A multitude of sources
This stage is all about gathering lots of rich, varied data. It’s not about
answering questions (yet), it’s about generating as many as possible!
Ron Kohavi 
(Bing experimentation team)

“We measure 500 metrics. The
shipping decision is based on three.”

= 99.4% of all data is used for investigating

It’s the antithesis of the Thumb of Caesar: we’re not concerned with
measuring or testing. All we want is data of all kinds to investigate.
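The two percentages follow directly from the quote: 3 of 500 metrics decide, the remaining 497 are there to investigate. A small sketch of that split, where the metric names are hypothetical placeholders and only the counts come from the quote:

```python
def split_metrics(all_metrics, decision_metrics):
    """Separate the few decision metrics from the many investigation metrics."""
    decision = [m for m in all_metrics if m in decision_metrics]
    investigation = [m for m in all_metrics if m not in decision_metrics]
    return decision, investigation


# Hypothetical metric names; only the counts (500 and 3) come from the quote.
all_metrics = [f"metric_{i}" for i in range(500)]
decision, investigation = split_metrics(
    all_metrics, {"metric_0", "metric_1", "metric_2"}
)

decision_share = len(decision) / len(all_metrics)            # 3 / 500
investigation_share = len(investigation) / len(all_metrics)  # 497 / 500
print(f"{decision_share:.1%} decide, {investigation_share:.1%} investigate")
# prints "0.6% decide, 99.4% investigate"
```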
Back to the example
User testing
Live on-demand data feed
Observation: Users aim higher than they should and drag in unexpected ways
A Flight Recorder
Know HOW you failed
- It’s about having loads of data
- It’s about generating ideas
- Don’t confuse it with hard evidence
- No need to monitor all the time
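One way to picture a flight recorder in software is a bounded event log that is cheap to keep running and is only read after something went wrong. This is a sketch under assumptions: the class `FlightRecorder`, its methods and the event names are hypothetical and do not describe the tooling used in the example.

```python
import time
from collections import deque


class FlightRecorder:
    """Keeps only the most recent events so a failed session can be replayed."""

    def __init__(self, capacity=1000):
        # A deque with maxlen silently drops the oldest event when full.
        self.events = deque(maxlen=capacity)

    def record(self, kind, **details):
        """Store one event with a timestamp and any free-form details."""
        self.events.append({"t": time.time(), "kind": kind, **details})

    def trace(self, last=10):
        """Return up to `last` most recent events, oldest first."""
        return list(self.events)[-last:]


# Hypothetical event names, loosely modelled on the AR app observation.
recorder = FlightRecorder(capacity=3)
for kind in ["open_app", "aim_device", "drag_item", "abandon"]:
    recorder.record(kind)
print([event["kind"] for event in recorder.trace()])
```

Nobody has to watch the log all the time; it only needs to be there when a failure raises the question of how it happened.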

3. A Big Blackboard
When we know THAT we failed and HOW we failed, it is time to think about the
WHY. Now we throw all the data we have at the blackboard and try to understand
relationships, build theories and come up with clean hypotheses.
Examples
- Why it looks good: Design theory
- Why users do XYZ: Psychology
- Why you’ll have product-market fit: Market models
Theory: a collection of ideas and assumptions that try to explain the causal
relationships of a system (e.g. user behaviour or growth development)
Back to the example
The Big Blackboard
1. Older users are not familiar with 3D technology
2. Users aim too high because they have a mental image of an overlay rather than a 3D environment
3. Many users skip tutorials

Test hypothesis:
Masking the lower half of the camera screen will nudge users to aim
lower with the device.
This variation of the app tests nothing but the specific hypothesis we created. If
completion rates don’t improve, we need to form a new one.
A Big Blackboard
Know WHY you failed
-  Have a theory of the relevant system
-  Let different theories rival each other
-  Build yes/no hypotheses to predict effects
-  Modify theory after failure
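A yes/no hypothesis can be written down as a small record that names its theory, its prediction and the criterion that decides it. The sketch below is illustrative only: the `Hypothesis` class and its field names are hypothetical, while the 100% criterion and the 0% first measurement are taken from the example above.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    theory: str       # the WHY from the blackboard
    prediction: str   # the change we expect to have an effect
    metric: str       # the single metric that decides
    threshold: float  # minimum value of the metric for a thumb up

    def evaluate(self, measurements):
        """Return True (thumb up) or False (thumb down), nothing in between."""
        return measurements[self.metric] >= self.threshold


masking = Hypothesis(
    theory="Users aim too high because they picture an overlay, not a 3D scene",
    prediction="Masking the lower half of the camera screen makes users aim lower",
    metric="completion_rate",
    threshold=1.0,  # success criterion from the example: 100% of users
)

# First measurement from the example: 0% of users completed the basic tasks.
print(masking.evaluate({"completion_rate": 0.0}))
```

If the result is a thumb down, the record makes clear which theory has to be modified before the next hypothesis is formed.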
[Diagram: Flight Recorder (constant data feeds), Big Blackboard (theory and hypotheses), Thumb of Caesar (test: fail/success)]
After a few iterations
Thumb of Caesar: 100% of users could use all basic features
Now we not only have a better product that is fit for launch, we have also learnt
fundamental things about how our users and the product behave (see Garrett Camp’s
quote at the start).
The Spotify Redesign
Big Blackboard
1. Design of Spotify lags behind
2. Design is a factor in attracting users
3. A good design results from... (insert design theory here)
1. Blackboard = theories. These were some of the theories the team at Spotify
had when going into the redesign.
The old design (top left) plus 3 different versions were used to get an initial feel
for user preferences.
Thumb of Caesar
1. Users will prefer one out of four
2. The winning design will increase brand perception
3. The new design will make users more satisfied with the product
4. Any redesign will not hurt the commercial metric
2. Thumb of Caesar = testing yes/no hypotheses. These were some of the hypotheses at the
time of going into the testing stage. Importance of focus: improving commercial metrics in the
short term wasn’t a focus at this point, so the success criterion was only not to lower them!
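A criterion of "do not lower the metric" is a different test from "improve the metric". One common way to phrase it is a non-inferiority check: thumb up if the new design is not worse than the old one by more than a tolerated margin. The sketch below is an assumption about how such a check could look, not a description of Spotify’s analysis; the function name, the margin and all counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def does_not_hurt(conv_a, n_a, conv_b, n_b, margin=0.01, alpha=0.05):
    """Return True if B's rate is not worse than A's by more than `margin`.

    Assumption: a one-sided normal-approximation confidence bound on the
    difference of two proportions (unpooled standard error).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - alpha)
    lower_bound = (p_b - p_a) - z * se
    return lower_bound > -margin


# Illustrative counts only: old design A vs redesign B, 50,000 users each.
print(does_not_hurt(25000, 50000, 24950, 50000))  # tiny dip, within the margin
print(does_not_hurt(25000, 50000, 24000, 50000))  # two-point drop, thumb down
```

Choosing the margin is the business decision here: it states how much of a dip is acceptable while the focus lies elsewhere.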
1. Users will prefer one design: thumb up
2. Brand perception and user satisfaction up: thumb up
A vs. B:
- User activation: no change
- User retention: no change
- # of songs played: no change
Flight Recorder
1. Rave press reviews
2. Great ratings
3. Positive user comments
3. Flight recorder = rich, diverse data collection. In this case the team gathered press
reviews, app store ratings, user comments and all test data that wasn’t used for the primary
hypotheses.
Good things only become a culture if you keep doing them! Look at the data and keep asking questions.
Challenge your ideas, your products, your business and your colleagues’ models, and establish success and fail criteria. And build
razor-sharp hypotheses: “Our business idea will change the world” is too high level!
@BenDressler
