Limits of Machine Learning

aka Limits of Machine Learning
AI/ML for Product Management Bootcamp
2019/10/14
Reality of ML: Myth Busting and Expectation Setting

Expectation Setting
No myth busting, sorry

Expectation Setting Somewhat

● I’m Alexey
● Data Scientist
● Did some Java development
● And then a masters in BI
● Now at OLX - since October 2018
● Attended “PM Foundations” Workshop (so I’m qualified to speak here)

Wanna make a presentation about limits of ML?

Pro tip: mention people with many followers
Bill Gates, Hillary Clinton,

Random walk process
Data science without data is just formulas

Weather forecast with global warming
Deep learning when logistic regression is enough

We should be aware of the biases in our heads

Other biases - e.g. selection bias:
Surveys - only data from people who decided to answer
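
A small simulation can make the survey example concrete. This is only a sketch under made-up assumptions: the 1 to 5 score scale, the response probabilities, and the function name `simulate_survey` are hypothetical and not from the talk.

```python
import random


def simulate_survey(n_users=10_000, seed=0):
    """Compare the true mean score with the mean among survey respondents.

    Assumption (hypothetical): unhappy users (low scores) are more likely
    to answer the survey than happy ones.
    """
    rng = random.Random(seed)
    # True satisfaction scores from 1 (unhappy) to 5 (happy).
    scores = [rng.randint(1, 5) for _ in range(n_users)]

    # Probability of answering drops as satisfaction grows.
    answer_prob = {1: 0.5, 2: 0.3, 3: 0.1, 4: 0.05, 5: 0.05}
    respondents = [s for s in scores if rng.random() < answer_prob[s]]

    true_mean = sum(scores) / len(scores)
    survey_mean = sum(respondents) / len(respondents)
    return true_mean, survey_mean, len(respondents)


true_mean, survey_mean, n_answers = simulate_survey()
print(f"all users:   mean score {true_mean:.2f}")
print(f"respondents: mean score {survey_mean:.2f} ({n_answers} answers)")
```

Under these assumptions the respondents look clearly less satisfied than the user base as a whole, even though nobody's score changed: only who answered did.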

We should prefer the simple to the complex

“Why did the model predict 42?”

Also in our case - we have moderators

Three broad categories
● Current SOTA limitations
● Expectation mismatch
● Technical difficulties in maintaining ML systems

SOTA Limitations & Expectation Mismatch

Current SOTA Limitations
We still haven’t found a good way to build:
● Generalized models for both images and text
● Conversational agents that pass the Turing test
● And many other things

Title generation
● CTO: Build me a system that generates good catchy titles
● Team: Okay
● [some time passes]
● … We eventually made an “ok” system that uses templates
Disclaimer: it was in 2015, so maybe now it’s possible :)

Solution?
● Rule of thumb:
○ Test all cool ideas with data scientists
● Data scientists don’t know?
○ Give them a bit of time to check if it’s possible
● Also, know what’s possible (sometimes hard)

How to know what’s possible with ML?
Check data science competitions
● Kaggle.com
● DrivenData.org
● Conference competitions (NIPS, WSDM, RecSys)
● Many more

https://www.kaggle.com/c/outbrain-click-prediction

Organizers did the hard work for us
Already done!
Check results at the end
https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining

Examples from competitions
● Answering multiple choice questions
● Severity of a network disruption
● Hotel recommendation
● Search relevance
● Duplicate listings detection
● Click prediction
● Intrusion detection using logs
● YouTube video tagging
● Predicting interest in property listings (low/med/high)
● Linking devices belonging to the same user
● And many many more

https://coursera.org/learn/machine-learning

Technical difficulties
We have problems like:
● Concept drift and stale models
● Training-serving skew
● Unexpected data distributions
● Feedback loops
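
One way to notice drift or unexpected input distributions is to compare a feature's distribution at training time with what the model sees in production. The sketch below uses the Population Stability Index (PSI); the choice of PSI, the binning scheme, and the price data are illustrative assumptions, not something the talk prescribes.

```python
import math


def psi(expected, actual, n_bins=10):
    """Population Stability Index between two samples of one numeric feature.

    `expected` is the training-time sample, `actual` the serving-time one.
    Bin edges are taken from the range of the training sample.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the training range fall into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up listing prices, for illustration only.
train_prices = [100 + i % 50 for i in range(1000)]
same_prices = [100 + (i * 7) % 50 for i in range(1000)]
shifted_prices = [130 + i % 50 for i in range(1000)]

print(psi(train_prices, same_prices))     # same distribution
print(psi(train_prices, shifted_prices))  # distribution moved
```

A commonly cited rule of thumb treats PSI values above roughly 0.25 as a significant shift, but the right threshold depends on the feature and on how much noise is normal for it.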

Pose estimation
● False positives in NSFW model
● Idea: let’s enhance it with pose estimation
https://medium.com/tensorflow/real-time-human-pose-estimation-in-the-browser-with-tensorflow-js-7dd0bc881cd5

Selling a dinosaur. Original. Ideal condition

Unexpected data distributions
● Funny, but not for moderators: they need to review many false positives
● Reason: when training the model, we didn’t expect to see dinosaurs
Solution:
● Learn from mistakes and use them as feedback
● More training data
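
Using mistakes as feedback could look roughly like the sketch below: every moderator decision is stored with the human label, so false positives such as the dinosaur can later be added to the training set. The class `FeedbackLog`, its fields, and the image IDs are hypothetical names for illustration, not the actual OLX system.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLog:
    """Collects moderator decisions so they can be added to training data."""
    examples: list = field(default_factory=list)

    def record(self, image_id, model_flagged, moderator_says_nsfw):
        # Store every reviewed item with the human label as ground truth.
        self.examples.append(
            {"image_id": image_id,
             "label": int(moderator_says_nsfw),
             "model_was_wrong": model_flagged != moderator_says_nsfw}
        )

    def false_positives(self):
        # Items the model flagged but the moderator approved.
        return [e for e in self.examples
                if e["model_was_wrong"] and e["label"] == 0]


log = FeedbackLog()
log.record("dinosaur-toy-1", model_flagged=True, moderator_says_nsfw=False)
log.record("listing-2", model_flagged=True, moderator_says_nsfw=True)
print([e["image_id"] for e in log.false_positives()])
```

The collected false positives are exactly the examples the model has not seen before, which makes them more useful for retraining than a random sample of the same size.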

Solution
Learn from others - follow best practices
https://www.deeplearning.ai/machine-learning-yearning/

Solution
Examples of rules:
● Rule #4: Keep the first model simple and get the infrastructure right.
● Rule #14: Starting with an interpretable model makes debugging easier.
https://developers.google.com/machine-learning/guides/rules-of-ml

https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems
Huge time investment!

Image quality
Low image quality => low CTR
https://tech.olx.com/qualifying-image-quality-part-1-cropped-images-27bd7c3ef949
https://tech.olx.com/qualifying-image-quality-part-2-55b2479fb8a8
Let’s build a model for detecting image quality!

Image quality
Ok, now we have a model. Let’s build an infra around it to serve it

Image quality
Ok, now we have the infra. Let’s build a service that uses it

Image quality
Ok, now we have the service. Let’s run an experiment

Solution
Manual validation before building things
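
For the image quality case, manual validation might mean having a person label a small sample of listings by hand and checking whether the "low quality means low CTR" hypothesis holds at all, before any model, infrastructure, or service exists. The sketch below assumes such a hand-labelled sample; the function name `ctr_by_label`, the field names, and all numbers are made up for illustration.

```python
def ctr_by_label(sample):
    """Aggregate clicks and views per manually assigned quality label."""
    totals = {}
    for row in sample:
        clicks, views = totals.get(row["quality"], (0, 0))
        totals[row["quality"]] = (clicks + row["clicks"], views + row["views"])
    return {label: clicks / views for label, (clicks, views) in totals.items()}


# Made-up numbers; the labels come from a person looking at the images.
sample = [
    {"quality": "good", "clicks": 30, "views": 1000},
    {"quality": "good", "clicks": 45, "views": 1500},
    {"quality": "bad", "clicks": 10, "views": 1000},
    {"quality": "bad", "clicks": 12, "views": 1000},
]
rates = ctr_by_label(sample)
print(rates)
```

If the two groups show no meaningful difference in a check like this, that is a cheap signal to stop before the expensive steps. A real check would also need enough listings and a significance test before drawing conclusions.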

AutoML
DS PM
Do I need to worry?

AutoML
DS PM
Why do I actually need them? 🤔
Hopefully now you know the answer
