Why do most machine learning projects never make it to production?
- 3. Octavian Technology Group
Machine Learning
Cloud Architecture
Fractional CIO/CTO
Innovation Workshops
- 5. What is your Organizational Role?
- 6. Why did you choose to come to this talk? What do you
hope to gain?
- 7. Have you taken part in a Machine Learning Project?
- 8. PROBLEM
What goes wrong?
We discuss common mistakes in machine learning projects.
Why does it go wrong?
My opinion on why these mistakes are made.
- 9. SOLUTION
How to Avoid Becoming a Statistic
What not to do in your machine learning project.
My Advice
Some tips on how I set up my machine learning projects for success.
- 11. LEADERSHIP THROWS MONEY AT MACHINE LEARNING?
“Money is only a tool. It will take you wherever you wish, but it will not replace you as the driver.”
(Ayn Rand)
- 12. LACK OF LEADERSHIP SUPPORT!
Leadership Misalignment
Is only part of your leadership team on board?
The Experiment
Is your machine learning project an experiment not expected to succeed?
- 13. POORLY DEFINED ROI
Why?
Why is this project critical to your organization?
What?
What will it bring to the company? Increased revenue? Lower cost? New line of business?
When?
When will this investment be realized?
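One way to make the "When?" question concrete is a simple payback calculation. The sketch below is only an illustration: the function name `payback_months` and every dollar figure are hypothetical, and a real business case would likely also account for ramp-up time and uncertainty.

```python
import math

def payback_months(upfront_cost, monthly_benefit, monthly_run_cost):
    """Months until cumulative net benefit covers the upfront investment.

    Returns None when the project never pays back (net benefit <= 0).
    """
    net = monthly_benefit - monthly_run_cost
    if net <= 0:
        return None
    return math.ceil(upfront_cost / net)

# Hypothetical figures: 120k to build, 15k/month of benefit, 5k/month to run.
print(payback_months(120_000, 15_000, 5_000))  # → 12
```

If the function returns None, or a number of months the business sponsor cannot accept, that is a signal to revisit the "Why?" and "What?" before starting.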
- 15. Data Scientists Focused on Experimental Model Creation Only
Data science teams often focus on iterative experimentation and consider their work done once the model works.
- 16. Models are tossed over the wall to an engineering team
Not Scalable
Model is not computationally or financially possible to run at scale.
Too Slow
Model takes too long to execute to provide production level performance.
Input Shape
The input shape required for the model is not possible in a real time or production environment.
Not Refactorable
Model is very difficult to refactor into something production worthy.
- 17. “This is not MY model, the other team owns it.”
When one team creates the model and tosses it over the wall to another team to iterate on, often neither team feels ownership of it. Treat a model like any other software unit in your organization!
- 19. Chasing the Unicorn
Starting with the earth-shattering, company-redefining machine learning project instead of the simple and achievable one.
- 22. No MVP Plan
Every machine learning project
needs an MVP plan, just like
every software project.
- 25. DATA EXPERT
I know the data inside and out, and I know where it
all physically exists. I understand how the different
data relates to each other and I understand the
business domain well enough to provide context for
the rest of the team.
- 26. DATA SCIENTIST [optional]
I live and breathe data. I know how to manipulate it and
apply world-class techniques. I may not be very good at
coding, but I can do enough to get the result I want. I
often have a PhD and the math and statistics skills to
back it up.
- 27. BUSINESS SPONSOR
I’m strongly positioned within the business and well
respected by the company's leadership team. I am
the champion for this project and truly believe in it.
I can easily speak to the business benefit and when
it will be delivered.
- 28. ML ENGINEER
I’m a great engineer who understands design patterns,
development process, and production-quality software
as well as any other engineer developing software within
the company. What differentiates me is a thorough
understanding of how to apply machine learning and the
ability to build custom ML models.
- 29. MLOPS ENGINEER
I understand how to manage the code, data, and
models associated with machine learning. I will
create a place and versioning scheme for each of
these. I also will create and manage a pipeline to
automate the training, testing, and deployment of
the machine learning models for this project. I bring
the rigor and repeatability of a traditional
development project to a machine learning project.
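As a rough illustration of a versioning scheme that covers code, data, and models together, the sketch below fingerprints each artifact and derives a single release id from the three. The names `content_hash` and `release_manifest` are hypothetical, and a real project would more likely rely on dedicated tooling than on hand-rolled hashing; this only shows the idea that a change to any one artifact should produce a new, traceable release.

```python
import hashlib
import json

def content_hash(data: bytes) -> str:
    """Short, stable fingerprint of a blob (code archive, dataset, or model file)."""
    return hashlib.sha256(data).hexdigest()[:12]

def release_manifest(code: bytes, dataset: bytes, model: bytes) -> dict:
    """Tie one version of the code, data, and model together under one release id."""
    parts = {
        "code": content_hash(code),
        "data": content_hash(dataset),
        "model": content_hash(model),
    }
    # The release id changes whenever any one of the three artifacts changes.
    release_id = content_hash(json.dumps(parts, sort_keys=True).encode("utf-8"))
    return {**parts, "release": release_id}

# Placeholder byte strings stand in for real files.
manifest = release_manifest(b"training code", b"labeled rows v1", b"model weights")
print(manifest)
```

Storing a manifest like this next to every deployed model makes it possible to answer which data and which code produced the model that is running.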
- 30. DATA LABELING TEAM
Our team is able to label each row of training data based
on our business domain knowledge. We will be able to
create massive data sets that are used by the rest of the
team to train and validate the models. We are usually
lower-cost labor but are the most important part of the
team: the quality of your machine learning model will
never be any better than the quality of our labeling work.
- 32. Although machine learning work is experimental, you still need process.
SDLC
Test Strategy
How will I validate this model works? How will I prove it works in the real world?
Release Strategy
Will I do side-by-side releases of different models? How will I roll back? Can I turn off the ML completely?
- 36. ACCESS
Data is often siloed to business units. Do not start a
project until full access to all data is secured for the
entire team.
FORMAT
Data is found in different database formats, and different
storage mediums. Often data is hiding in images and
video.
PRIVACY
Security and privacy requirements are set within an enterprise
or enforced by regulatory bodies.
QUANTITY
The amount of data needed for machine learning model
development is almost always greater than available.
LABELING
A plan is needed to bring together the vast amount of
manual labor and the domain knowledge to execute
data labeling.
- 48. RESISTANCE TO OFF THE SHELF SOLUTIONS
ML as a Service
Offerings by many cloud providers and many other companies.
Open-Source Model
Start with an existing model known to solve a similar problem and build from there.
Auto ML
Machine learning to create machine learning models.
- 54. BUSINESS NEEDS
USER NEEDS
The business needs should be directly related to your users' needs; if these are misaligned, you have a problem.
BOTTOM LINE
What is the return on investment? Translate your ML project into an impact on the business bottom line.
CHAMPION
Find your business champion; if you are not able to find one, you likely don’t have a viable project.
EXPECTATIONS
Manage expectations; make sure your promises are realistic in a short timeline.
- 59. POC, MVP, ITERATION
POC
Start with a proof of concept, fail fast, pivot fast, and repeat until you have a feasible project.
MVP
Target a true minimal viable product and get it in front of users fast.
ITERATION
Plan on iterating quickly and moving fast. It is not uncommon to ship multiple models in a week.
- 63. Off the shelf
Evaluate all off-the-shelf solutions that are applicable to this project.
Don’t hand craft
Initial machine learning projects should avoid hand crafting.
Pivot
If something simple and off the shelf won’t work, you have the wrong starting project.
- 73. THE RIGHT TEAM
Identify:
Data Expert
Data Scientist
Business Sponsor
ML Engineer
MLOps Engineer
Data Labeling Team
- 75. PROCESS
MLOps
Initial MLOps pipelines are in place and can deploy to a DEV environment.
SDLC
A team development process is in place and members of the team understand and participate in it.
- 77. Have a Documented Test Plan
Model Testing
Have a plan for validating the model with
labeled data not used during the training
process.
Real World Validation
Have a small test group of real end users
that can put your model through real world
usage.
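The model testing step above can be sketched in a few lines. This is a minimal illustration, not a full test plan: `split_holdout` and `accuracy` are hypothetical helper names, the 20% holdout fraction is an assumption, and the "model" is a stand-in threshold rule rather than a trained model.

```python
import random

def split_holdout(rows, holdout_fraction=0.2, seed=42):
    """Shuffle labeled rows and set aside a holdout set never used for training."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

def accuracy(predict, labeled_rows):
    """Fraction of (features, label) rows where the model's prediction matches the label."""
    if not labeled_rows:
        raise ValueError("need at least one labeled row")
    correct = sum(1 for features, label in labeled_rows if predict(features) == label)
    return correct / len(labeled_rows)

# Toy labeled data: the label is True when the single feature is 50 or more.
rows = [(x, x >= 50) for x in range(100)]
train_rows, holdout_rows = split_holdout(rows)

# Stand-in for a trained model; in practice it would be fit on train_rows only.
model = lambda x: x >= 50
print(accuracy(model, holdout_rows))
```

The point of the fixed seed is repeatability: the same rows land in the holdout set on every run, so scores from different model versions are comparable.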
- 78. Have a RELEASE PLAN
Go / No Go
Establish metrics that will
determine if a release to
production has been successful.
A / B Transitioning
When updating models, move a
small part of your user base at a
time.
Rollback Plan
Have a plan to enable rolling back
to a previous model or disabling
its use.
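The three parts of the release plan above can be sketched together. This is only one possible shape, under stated assumptions: the names `bucket`, `choose_model`, and `go_no_go`, the model labels, and the 0.01 regression tolerance are all hypothetical, and a real system would usually read the rollout percentage and the on/off switch from configuration.

```python
import hashlib

def bucket(user_id: str) -> int:
    """Map a user to a stable bucket in 0..99 so they always see the same model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100

def choose_model(user_id: str, rollout_percent: int, ml_enabled: bool = True) -> str:
    """Pick which model serves this user during an A/B transition."""
    if not ml_enabled:
        return "fallback"    # ML turned off completely
    if bucket(user_id) < rollout_percent:
        return "candidate"   # the small part of the user base on the new model
    return "current"         # everyone else stays on the previous model

def go_no_go(candidate_metric: float, current_metric: float,
             max_regression: float = 0.01) -> bool:
    """Go only if the candidate is not worse than current by more than the tolerance."""
    return candidate_metric >= current_metric - max_regression

# Start with 5% of users on the candidate; rolling back means setting this to 0.
print(choose_model("user-123", rollout_percent=5))
print(go_no_go(candidate_metric=0.90, current_metric=0.92))  # → False
```

Hashing the user id, rather than picking randomly per request, keeps each user on one model for the whole transition, which makes the go / no go metrics easier to interpret.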
Editor's Notes
- Justify your existence!
- Lack of collaboration: often data scientists are working in a bubble
- Model focus vs. customer focus: it’s too easy to chase the perfect model
- What is good enough? Answer this question before you start!
- What does the customer actually want?
- What does the customer need?
- Not invented here (NIH) is the tendency to avoid using or buying products, research, standards, or knowledge from external origins. It is usually adopted by social, corporate, or institutional cultures. Research illustrates a strong bias against ideas from the outside. (Wikipedia definition)