ML Zoomcamp 1.4 - CRISP-DM
- 3. DataTalks.Club — mlzoomcamp.com — @Al_Grigor
Session #1.4: Plan
● CRISP-DM — methodology for organizing ML projects
● From problem understanding to deployment
Business understanding
● Our users complain about spam
● Analyze to what extent it’s a problem
● Will Machine Learning help?
● If not: propose an alternative solution
Business understanding
Define the goal:
● Reduce the number of spam messages, or
● Reduce the number of complaints about spam
The goal has to be measurable
● Reduce the amount of spam by 50%
Data understanding
Identify the data sources
● We have a report spam button
● Is the data behind this button good enough?
● Is it reliable?
● Do we track it correctly?
● Is the dataset large enough?
● Do we need to get more data?
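The questions above can be turned into a few quick checks on the raw "report spam" records. This is a minimal sketch, not the course's code: the field names (`message_id`, `user_id`), the `min_rows` threshold, and the sample records are all hypothetical.

```python
from collections import Counter


def summarize_reports(reports, min_rows=1000):
    """Basic sanity checks on 'report spam' click records.

    The field names and the `min_rows` threshold are hypothetical.
    """
    total = len(reports)

    # Records with a missing message or user id hint at tracking problems.
    incomplete = sum(
        1
        for r in reports
        if r.get("message_id") is None or r.get("user_id") is None
    )

    # The same user reporting the same message twice is likely a duplicate event.
    pairs = Counter(
        (r["message_id"], r["user_id"])
        for r in reports
        if r.get("message_id") is not None and r.get("user_id") is not None
    )
    duplicates = sum(count - 1 for count in pairs.values())

    return {
        "total": total,
        "incomplete": incomplete,
        "duplicates": duplicates,
        "large_enough": total >= min_rows,
    }


# Made-up records for illustration only.
reports = [
    {"message_id": 1, "user_id": "a"},
    {"message_id": 1, "user_id": "a"},   # duplicate click
    {"message_id": 2, "user_id": None},  # user id was not tracked
    {"message_id": 3, "user_id": "b"},
]
summary = summarize_reports(reports, min_rows=3)
print(summary)
```

A high share of incomplete or duplicate records would suggest fixing the tracking before building a model on this data.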
Data understanding
Identify the data sources
● What we learn about the data may influence the goal
● We may go back to the previous step and adjust the goal
Subject: You won 1 MILLION!
From: winner@moneys.com
Congratulations! You've won $1,000,000!
In order to access the money, deposit $100 to
XXXXXX
Yours sincerely,
Moneyball
[1, 1, 0, 0, 1, 0]
[0, 1, 0, 0, 0, 1]
[0, 0, 0, 1, 1, 0]
[1, 1, 0, 0, 1, 1]
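Turning each email into a row of 0/1 values, as in the vectors above, could be sketched as follows. The six rules are hypothetical stand-ins chosen for illustration; the transcript does not say which features the slide's vectors encode, so the output here is not meant to reproduce them. The second email is also made up.

```python
def extract_features(email):
    """Turn one email (a dict) into a list of 0/1 features.

    The rules below are hypothetical examples, not the features from the slide.
    """
    subject = email["subject"]
    sender = email["sender"]
    body = email["body"]
    rules = [
        len(subject) > 10,               # long subject line
        len(body) > 10,                  # long body
        sender.endswith("@moneys.com"),  # sender domain seen in the example
        "deposit" in body.lower(),       # body mentions a deposit
        "$" in body,                     # body mentions money
        subject.isupper(),               # subject written entirely in capitals
    ]
    return [int(flag) for flag in rules]


emails = [
    {
        "subject": "You won 1 MILLION!",
        "sender": "winner@moneys.com",
        "body": "Congratulations! You've won $1,000,000! "
                "In order to access the money, deposit $100 to XXXXXX",
    },
    {   # made-up non-spam email for contrast
        "subject": "Lunch?",
        "sender": "anna@example.com",
        "body": "See you at noon",
    },
]

# One row per email: this table is what gets passed to an ML model.
X = [extract_features(e) for e in emails]
print(X)
```

Each row has the same length and the same column meaning, which is what lets the emails be stacked into a single table.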
Evaluation
Is the model good enough?
● Have we reached the goal?
● Do our metrics improve?
Goal: Reduce the amount of spam by 50%
● Have we reduced it? By how much?
● (Evaluate on the test group)
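Checking the 50% goal on a test group might look like the sketch below. It assumes a control group without the model to compare against, which the slides do not spell out, and all counts are made up.

```python
def spam_rate(spam_delivered, messages_delivered):
    """Share of delivered messages that were spam."""
    if messages_delivered <= 0:
        raise ValueError("messages_delivered must be positive")
    return spam_delivered / messages_delivered


def relative_reduction(control_rate, test_rate):
    """Fraction by which the test rate is lower than the control rate."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (control_rate - test_rate) / control_rate


# Made-up counts: the control group has no spam model, the test group has it.
control_rate = spam_rate(spam_delivered=400, messages_delivered=10_000)
test_rate = spam_rate(spam_delivered=180, messages_delivered=10_000)

reduction = relative_reduction(control_rate, test_rate)
goal_reached = reduction >= 0.5  # goal: reduce the amount of spam by 50%

print(f"Reduction: {reduction:.0%}, goal reached: {goal_reached}")
```

Comparing rates rather than raw counts keeps the result meaningful when the two groups receive different numbers of messages.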
Evaluation
Do a retrospective:
● Was the goal achievable?
● Did we solve/measure the right thing?
After that, we may decide to:
● Go back and adjust the goal
● Roll the model to more users/all users
● Stop working on the project
Evaluation + Deployment
These two steps often happen together:
● Online evaluation: evaluation on live users
● That is: deploy the model, then evaluate it
Summary
● Business understanding: define a measurable goal. Ask: do we need ML?
● Data understanding: do we have the data? Is it good?
● Data preparation: transform the data into a table that can be fed to an ML model
● Modelling: train several models and use the validation set to select the best one
● Evaluation: validate that the goal is reached
● Deployment: roll out to production for all users
● Iterate: start simple, learn from the feedback, improve
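The modelling step in the summary, picking the best model on the validation set, can be illustrated with a small sketch. The hand-written rules below stand in for trained models, and both the rules and the validation rows are made up; in practice the candidates would be fitted on a separate training set first.

```python
def accuracy(model, rows):
    """Share of rows where the model's prediction matches the label."""
    correct = sum(1 for features, label in rows if model(features) == label)
    return correct / len(rows)


# Hypothetical candidates: each "model" maps a binary feature vector to 0 or 1.
candidates = {
    "flag_if_first_feature": lambda x: x[0],
    "flag_if_any_two": lambda x: int(sum(x) >= 2),
    "never_flag": lambda x: 0,
}

# Made-up validation rows: (feature vector, label), where label 1 means spam.
validation = [
    ([1, 1, 0], 1),
    ([0, 1, 1], 1),
    ([1, 0, 0], 0),
    ([0, 0, 0], 0),
    ([0, 1, 0], 0),
]

# Score every candidate on the same validation rows and keep the best one.
scores = {name: accuracy(model, validation) for name, model in candidates.items()}
best_name = max(scores, key=scores.get)

print(scores)
print("Selected:", best_name)
```

The validation rows are used only for choosing between candidates, so the chosen model should still be checked against the goal in the evaluation step.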