© All rights reserved
to
Agile Data Science
Volodymyr (Vlad) Kazantsev
Head of Data Science at Product Madness
2015
volodymyrk
What are we trying to solve?
What do we want to achieve?
● Happy Data Team
● Transparency over delivery and priorities
● Minimize Waste
● Deliver lots of Value
What do we do?
Heart of Vegas in (public) Numbers
Platform    Top Grossing Games US    Top Grossing Games AU
iPhone      29 (+1)                  1 (+1)
iPad        8 (+2)                   1 (-)
Android     16 (+2)                  1 (-)
Facebook: 5 (+1)
* source: App Annie, 2nd of March
Data Team
● Analytics: ad-hoc analytics and daily fires; dashboards
● Data Science: deep-dive analysis; predictive analytics
● Data Engineering: ETL, Data Viz tools, R&D, DBA
8 people; 4 in London
Technology Stack
● ETL orchestration
● Transformation & Aggregation: SQL
● Data Products, Reports, Dashboards
A few examples:
● A/B Tests (A vs. B)
● Customer Lifetime Value ($ value over days)
● Segmentation (group 1, group 2, group 3, group 4)
Agile Data Science
Data Scientist..
Coding
Maths and Stats
Business and Marketing expert
Lesson 1: Agile Philosophy for Data Science
Agile Manifesto
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
* agilemanifesto.org
Agile Data Science Manifesto
Individuals and interactions over processes and tools
Actionable insights over comprehensive reports
Customer collaboration over project negotiation
Responding to change over following a plan
Individuals and interactions over processes and tools
“If a building doesn’t encourage [collaboration], you’ll lose a lot of innovation and the magic that’s sparked by serendipity” - Steve Jobs
Standing desks + easily available whiteboard
Agile Principles
● Iterative, incremental and evolutionary: iterations, timeboxed estimates
● Efficient and face-to-face communication: no tasks handed over by email without a face-to-face conversation
● Very short feedback loop and adaptation cycle: daily standups, pair analysis
● Quality focus: verifiable, reproducible findings
Scrum-Ban in Data Science @ProductMadness
● Weekly cycle
● Daily standup meeting @10am
● ToDo/WIP/Waiting buckets are kept small
● Disruptions to weekly plan are expected
● On-demand planning
Data Science Board
Lesson 1: Agile methods in Data Science
1. Co-location matters; keep a whiteboard next to your desk
2. Work with the decision maker; share preliminary findings
3. Make a research plan; pivot early
4. Book a “Findings” meeting before the project starts
5. Build an MVP for Data Products
6. Do daily stand-ups!
Lesson 2: Agile Velocity vs. Acceleration
What is Agile Acceleration?
Waterfall vs. Scrum
Velocity = Units of Work / Time Interval
vs.
ΔVelocity = Acceleration × ΔTime
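A rough worked example of the two quantities on this slide, taking velocity as units of work per time interval and acceleration as the change in velocity per elapsed time. The task counts and the one-week cycle length are made-up numbers for illustration only, not figures from the talk.

```python
def velocity(units_of_work, time_interval):
    """Velocity = units of work / time interval."""
    return units_of_work / time_interval


def acceleration(velocity_before, velocity_after, delta_time):
    """Acceleration = change in velocity / elapsed time."""
    return (velocity_after - velocity_before) / delta_time


# Hypothetical numbers: tasks closed in a one-week cycle.
v_week1 = velocity(units_of_work=10, time_interval=1)  # 10 tasks per week
v_week5 = velocity(units_of_work=14, time_interval=1)  # 14 tasks per week

# Velocity grew by 4 tasks/week over 4 weeks.
a = acceleration(v_week1, v_week5, delta_time=4)
print(a)  # 1.0 (one extra task per week, gained every week)
```

A team can have a respectable velocity and zero acceleration; the lesson title suggests tracking whether the team is getting faster, not only how fast it is today.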
a = F / m
“I run SQL, copy-paste data to Excel and send it by email”
“I created a deep neural network to predict high spenders”
Case Study: to Git or not to Git
Scripts (ruby, bash, python)
Python Apps
Python Modules
IPython Notebooks ?
Research Documents (word)
Slides (powerpoint)
Spreadsheets (excel)
Remove unnecessary weight
Friction
Friction: Mini Case Studies
● re.dash for self-service analytics
● cloud-hosted Jupyter notebooks
Lesson 2: find the lightest suitable tool
1. IPython notebooks: Dropbox over Git
2. Google Slides over PowerPoint; Google Slides over email with images
3. Google Spreadsheets over Excel
4. Podio over Jira
5. Data Transformations in DWH in SQL over Hadoop
6. re.dash over SQL Workbench+csv export+excel
7. Hosted Jupyter over local python
Lesson 3: Focus on Closing the Loop
Iterative development: 7-30 days
Scrum for Data Science?
Assumptions:
● Motivated ninjas
● Isolated and co-located team
● Clear direction
● You can estimate work
Reality:
● Unicorns are rare
● Constant interruption; 3 locations
● Lots of unknown-unknowns
● You can estimate very little
Analytics Loop
Spot Opportunity
Ask the Right Question
Make Decision
Improve the Business
Data Science @work
Analytics Spiral
Ideas & Questions
Data
Analysis
Insights
Impact
Limit the number of Open Loops
Many open loops: 90%, 90%, 75%, 80%, 80%, 60%
vs.
Few open loops: 100%, 100%, 100%, 100%, 0%, 0%
Always prefer 90% of tasks 100% complete over 100% of tasks 90% complete.
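A small sketch of why the second board is preferable. The percentages are the ones shown on the slide; reading them as two boards of six tasks each is an assumption, and `closed_loops` is a hypothetical helper name.

```python
def closed_loops(progress):
    """Count tasks that are fully complete (progress of 100%)."""
    return sum(1 for p in progress if p == 100)


# Board 1: every task is started, none is finished.
many_open = [90, 90, 75, 80, 80, 60]
# Board 2: four tasks finished, two not started yet.
few_open = [100, 100, 100, 100, 0, 0]

print(closed_loops(many_open))  # 0
print(closed_loops(few_open))   # 4

# Total effort spent is actually higher on the first board.
print(sum(many_open), sum(few_open))  # 475 400
```

The first board has absorbed more work yet delivered nothing a decision maker can use, which is the argument for limiting work in progress.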
Lesson 3: Focus on Closing the Loop
1. Don’t build predictive models that you can’t act upon. Don’t analyse anything that cannot help make a decision.
2. The best way to deal with the Analytics Spiral is to avoid the spiral. Practise the “Crack a Case” and “what if” methods.
3. Limit the number of “open loops”.
Lesson 4: Reproducibility Matters
To the and back!
Why?
Because:
● Boss: “Great! Can you run this for all monthly cohorts?”
● Boss: “Sam is on holiday. Can you re-run his analysis?”
A Few IPython Tips
● Import all commonly used tools in one line.
● All access and security is abstracted away: focus on SQL, not data access.
● Format and publish a .png in one line of code.
● PyCharm has a great SQL editor.
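A minimal sketch of what "focus on SQL, not data access" could look like in a shared helper. The names `get_connection` and `run_sql` are hypothetical, and an in-memory SQLite database with a made-up `payments` table stands in for the real warehouse; the team's actual library is not shown in the slides.

```python
import sqlite3


def get_connection():
    """Hypothetical stand-in for the warehouse connection.

    A real helper would read host and credentials from shared config,
    so notebooks never contain them.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (user_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO payments VALUES (?, ?)",
        [(1, 4.99), (1, 9.99), (2, 1.99)],
    )
    return conn


def run_sql(query, params=()):
    """Run a query and return the rows as a list of dicts."""
    conn = get_connection()
    try:
        cursor = conn.execute(query, params)
        columns = [description[0] for description in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        conn.close()


# The analyst only writes SQL.
rows = run_sql(
    "SELECT user_id, COUNT(*) AS n_payments "
    "FROM payments GROUP BY user_id ORDER BY user_id"
)
print(rows)  # [{'user_id': 1, 'n_payments': 2}, {'user_id': 2, 'n_payments': 1}]
```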
Lesson 4: Reproducibility
● Get rid of Windows and you get rid of Excel
● Notebooks (.ipynb) are always shared and versioned; prefer simple cloud sharing to VCS
● Streamline data access functions
● Cache long-running code and queries
● Develop a common library
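One possible way to cache long-running code and queries is a small on-disk cache decorator that could live in the common library. This is a sketch under assumptions: the `cached` decorator and `slow_query` function are hypothetical names, and a fresh temporary directory is used here so the example is self-contained, whereas a real setup would point at a stable shared path.

```python
import functools
import hashlib
import os
import pickle
import tempfile

# Assumption: a throwaway directory; a real setup would use a stable path.
CACHE_DIR = tempfile.mkdtemp(prefix="ds_cache_")


def cached(func):
    """Cache results on disk, keyed by function name and arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key_source = repr((func.__name__, args, sorted(kwargs.items())))
        key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".pkl")
        if os.path.exists(path):
            with open(path, "rb") as handle:
                return pickle.load(handle)
        result = func(*args, **kwargs)
        with open(path, "wb") as handle:
            pickle.dump(result, handle)
        return result
    return wrapper


calls = []


@cached
def slow_query(cohort):
    """Stand-in for an expensive query; records each real execution."""
    calls.append(cohort)
    return {"cohort": cohort, "users": 1000}


first = slow_query("2015-01")
second = slow_query("2015-01")  # served from the cache file
print(first == second, len(calls))  # True 1
```

Re-running a notebook then costs seconds rather than another full query, which helps when a colleague has to reproduce an analysis.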
In Summary...
Summary
● Agile approach works well for Data Science
● Find the lightest suitable tool for a task
● Reproducibility is not negotiable
● Focus on closing the loop(s)
Thank You!
volodymyr.kazantsev@productmadness.com
volodymyrk
We Are Hiring!
jobs.productmadness.com
