Copyrights: Proprietary and confidential. Not to be distributed or reproduced without permission.
Data-driven approaches in a technology startup
Prepared by Michal Szczecinski, Ver. 1.0 (Confidential)
Markets:
- Hong Kong: since July 2013
- Singapore: since June 2014
- Taiwan: since November 2014
- China (300+ cities): since February 2015; merged with 58 Suyun in August 2017
- South Korea (2 cities): since October 2015
- India: since March 2016
Established in 2013, GOGOVAN is the first app-based platform for
delivering goods in Asia.
What will I talk about?
Startup context
Goals of Analytics
Why data matters
Use cases
Lessons learnt
Hong Kong · Oxford · London
“Data guy”
• Business Intelligence
• Data engineering
• Data Science (data products)
• Data quality
• Digital Marketing/Growth
• Product analytics
• Financial modelling/forecasting
• Strategy analysis
• Big Data Research
• Data compliance
• …
Corporate vs Startup
Corporate: established multi-team contribution.
Startup: multidisciplinary, tech whiz, “all-knowing”.
Goals of Analytics
The underlying vision is to make GOGOVAN data-driven.
1. Decision support
2. Knowledge discovery
3. Optimization
Data team
“Everything data” in GOGOVAN
- Supporting all teams (product, operations, marketing, customer service, engineering, finance, management, legal and more)
- Supporting all countries
- Everything related to data
- Multiple outputs (in-house built dashboards, ETL jobs, interactive tools, notebooks, ML models, scientific papers, ad hoc queries, alerts, infrastructure and tools)
- Multiple inputs
- Data users across the whole organisation
Why does data matter?
Three examples (there are more)
Read more: https://towardsdatascience.com/what-does-a-data-team-really-do-12484482e683
Service level
Improving key components of our service
1. Price: what the user pays and what the driver earns.
2. Time: response, arrival, completion.
3. Quality: customer experience, effort, reliability.
Completion rate
Growing business activity
1. Frontier: beyond a certain point x, as order volume grows, the completion rate starts to fall exponentially.
2. Wall: there is also a soft limit on the number of orders that can be completed, regardless of order volume.
3. Improvement: in GOGOVAN's whole history, that wall has been overcome only once, very recently. The wall has also been rising steadily.
*Axes and details removed for data confidentiality purposes.
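The frontier and the wall can both be read off aggregated order data. A minimal sketch, assuming per-period counts of placed and completed orders; the `Period` type, the `min_rate` threshold and the sample numbers are hypothetical, since the real figures were removed from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Period:
    orders: int     # orders placed in the period (e.g. one day)
    completed: int  # orders completed in the same period


def completion_rate(p: Period) -> float:
    """Share of placed orders that were completed."""
    return p.completed / p.orders if p.orders else 0.0


def frontier(periods: List[Period], min_rate: float) -> Optional[int]:
    """Lowest order volume at which the completion rate drops below min_rate."""
    for p in sorted(periods, key=lambda p: p.orders):
        if completion_rate(p) < min_rate:
            return p.orders
    return None


def wall(periods: List[Period]) -> int:
    """Highest number of completed orders seen in any period (the soft limit)."""
    return max(p.completed for p in periods)


history = [Period(100, 95), Period(200, 180), Period(300, 210), Period(400, 215)]
print(frontier(history, min_rate=0.8))  # 300
print(wall(history))                     # 215
```

With this toy data, volume keeps rising while completed orders flatten out near 215, which is the shape the slide describes.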
Anomaly detection
Avoiding the unexpected
1. Transactions: is there any unusual activity?
2. Partners: do all partners play fair?
3. Systems: are systems working properly?
4. Community: what are people saying?
5. Safety: are people and goods safe?
6. Competition: what is going on in other camps?
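For the "unusual transaction activity" question, one common starting point is a rolling z-score: compare each new value with the mean and spread of the values just before it. This is a hedged sketch, not the team's detector; the window size, the 3-sigma threshold and the hourly order counts are assumptions.

```python
import statistics


def rolling_zscore_anomalies(values, window=7, threshold=3.0):
    """Indices whose value is more than `threshold` standard deviations
    away from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.stdev(history)
        if stdev == 0:
            continue  # flat history: z-score undefined, skip
        if abs(values[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


hourly_orders = [100, 102, 98, 101, 99, 100, 103, 97, 250, 101]  # hypothetical
print(rolling_zscore_anomalies(hourly_orders))  # [8]
```

The same check can run per partner, per system metric or per city; seasonality (hour of day, day of week) would normally be removed first.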
What are we working on?
Applications: examples of projects and solutions
Real use cases and tools (with transformed, hidden or masked details)
Decision Support
Data Platform
Operating data services
1. All-in-one place for data
2. Multi-use: reports, interactive tools, self-service, dashboards, algorithms, docs, training videos, etc.
3. Search
4. Tagging
5. Collaboration
Main dashboard charts
Goal: Provide decision support on all important areas of the company for the respective team members.
Action: Get important metrics by different breakdowns and time periods. Monitor progress and
Outcome: Lower Costs/More GMV/More Users
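"Metrics by different breakdowns and time periods" boils down to grouping by a dimension and a time bucket. A minimal sketch, assuming orders arrive as dicts with hypothetical `country`, `date` and `gmv` fields and that ISO weeks are the time period:

```python
from collections import defaultdict
from datetime import date


def iso_week(d: date) -> str:
    """Label a date with its ISO year and week, e.g. '2017-W01'."""
    year, week, _ = d.isocalendar()
    return f"{year}-W{week:02d}"


def metric_by(orders, dimension, period=iso_week, metric="gmv"):
    """Sum `metric` over orders grouped by (dimension value, time period)."""
    totals = defaultdict(float)
    for order in orders:
        totals[(order[dimension], period(order["date"]))] += order[metric]
    return dict(totals)


orders = [  # hypothetical records
    {"country": "HK", "date": date(2017, 1, 2), "gmv": 100.0},
    {"country": "HK", "date": date(2017, 1, 3), "gmv": 50.0},
    {"country": "SG", "date": date(2017, 1, 2), "gmv": 80.0},
]
print(metric_by(orders, "country"))
# {('HK', '2017-W01'): 150.0, ('SG', '2017-W01'): 80.0}
```

Swapping `dimension` or `period` gives other breakdowns without changing the aggregation code.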
Next-generation self-service analytics
Goal: Enable end users to analyse and retrieve data effectively.
Action: Build custom reports, share comments and insights, optimised UX.
Outcome: Lower costs/Better Service
Knowledge discovery
Notebooks
Scaling deep knowledge
1. Focused on a particular problem or question
2. Thousands of searchable and reproducible reports
3. Publishing tools
4. Auto-generated reports and alerts
5. Metadata and templates
6. Analytics meetings
Real-time heatmap
1. Interactive monitoring
2. Adopted: used by ops
3. Goal: visualize drivers and orders.
4. Action: identify idle drivers and pending orders; understand and influence the distribution of supply and demand.
5. Outcome: Higher GMV/Better service
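Behind a heatmap like this sits a spatial bucketing step. A hedged sketch, assuming a simple latitude/longitude grid (a hypothetical 0.01-degree cell, roughly 1 km) rather than whatever indexing the real tool uses; it counts points per cell and flags cells that have pending orders but no idle driver.

```python
import math
from collections import Counter

CELL_DEG = 0.01  # hypothetical cell size in degrees (roughly 1 km)


def to_cell(lat, lng, size=CELL_DEG):
    """Map a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / size), math.floor(lng / size))


def heatmap(points, size=CELL_DEG):
    """Count points per grid cell."""
    return Counter(to_cell(lat, lng, size) for lat, lng in points)


def cells_without_idle_drivers(pending_orders, idle_drivers, size=CELL_DEG):
    """Cells with pending orders but no idle driver: candidates for moving supply."""
    orders = heatmap(pending_orders, size)
    drivers = heatmap(idle_drivers, size)  # Counter returns 0 for missing cells
    return {cell: n for cell, n in orders.items() if drivers[cell] == 0}
```

Rendering the two `heatmap` counters as colour layers gives the visual; the flagged cells are where ops might nudge idle drivers.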
Marketplace analysis
Monitoring and stimulating GOGOVAN ecosystem
1. Arrival time
2. Distribution of orders
3. Supply/demand proportion
4. Completion time
5. Utilization rate
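Two of these marketplace metrics can be defined in a few lines. The definitions below are assumptions for illustration (utilization as busy time over online time, supply/demand as available drivers per open order), not necessarily the ones GOGOVAN uses.

```python
from dataclasses import dataclass


@dataclass
class DriverShift:
    online_minutes: float  # time logged in and available
    busy_minutes: float    # time spent on accepted orders


def utilization_rate(shifts):
    """Share of online driver time spent on orders."""
    online = sum(s.online_minutes for s in shifts)
    busy = sum(s.busy_minutes for s in shifts)
    return busy / online if online else 0.0


def supply_demand_ratio(available_drivers, open_orders):
    """Available drivers per open order; below 1.0 means demand exceeds supply."""
    return available_drivers / open_orders if open_orders else float("inf")
```

Aggregating time rather than averaging per-driver ratios keeps short shifts from skewing the utilization figure.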
Optimization
Algorithms
Selected examples:
- Predicting demand
- Predicting unmet demand
- Predicting order status
- Driver matching
- Route optimization
- Churn prediction
Demand prediction
Causal inference
1. Answering questions such as: what was the impact of x?
2. Autoregressive moving-average model with exogenous variables (ARMAX)
3. Inputs: day of week (DOW), weather, holidays
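The slide names an ARMAX model but gives no details. As a simplified, hedged sketch, the block below fits an ARX model (the autoregressive and exogenous parts of ARMAX, without the moving-average term) by ordinary least squares with NumPy. The function names, the single lag and the holiday/rain regressors are assumptions; a full ARMAX fit is available, for example, through statsmodels' `SARIMAX` with an `exog` argument.

```python
import numpy as np


def fit_arx(y, X, p=1):
    """Least-squares fit of y_t = c + a_1*y_{t-1} + ... + a_p*y_{t-p} + b . x_t.

    y: demand series, shape (T,); X: exogenous regressors, shape (T, k).
    Returns [c, a_1..a_p, b_1..b_k].
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    rows = [
        np.concatenate(([1.0], y[t - p:t][::-1], X[t]))  # constant, lags, exog
        for t in range(p, len(y))
    ]
    coef, *_ = np.linalg.lstsq(np.array(rows), y[p:], rcond=None)
    return coef


def predict_next(coef, y_hist, x_next, p=1):
    """One-step-ahead forecast from recent demand and next period's regressors."""
    lags = np.asarray(y_hist[-p:], dtype=float)[::-1]
    return float(coef @ np.concatenate(([1.0], lags, np.asarray(x_next, dtype=float))))


# Synthetic, noise-free demand: y_t = 5 + 0.5*y_{t-1} + 20*holiday_t + 3*rain_t
T = 30
holiday = [1.0 if t % 7 == 0 else 0.0 for t in range(T)]
rain = [1.0 if t % 3 == 0 else 0.0 for t in range(T)]
demand = [10.0]
for t in range(1, T):
    demand.append(5 + 0.5 * demand[-1] + 20 * holiday[t] + 3 * rain[t])
coef = fit_arx(demand, np.column_stack([holiday, rain]))
# coef recovers c=5, a_1=0.5, b=(20, 3) up to floating-point error
```

The fitted coefficient on an exogenous variable is the model's estimate of that variable's effect on demand; reading it as a causal impact assumes there are no unmodelled confounders.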
Unmet demand prediction
Balancing supply and demand
1. Goal: predict unmet demand and balance supply and demand.
2. Action: know how many more drivers are needed in each region at a given time to fulfil expected demand.
3. Outcome: Better Service/Higher GMV
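Once demand is forecast per region and time slot, the "how many more drivers" question is simple arithmetic. A minimal sketch, assuming a hypothetical fixed capacity of orders per driver per slot; the forecast and supply figures are made up.

```python
import math


def unmet_orders(expected_orders, available_drivers, orders_per_driver):
    """Expected orders that current supply cannot serve in the slot."""
    return max(0, expected_orders - available_drivers * orders_per_driver)


def extra_drivers_needed(expected_orders, available_drivers, orders_per_driver):
    """Additional drivers required in one region and time slot."""
    gap = unmet_orders(expected_orders, available_drivers, orders_per_driver)
    return math.ceil(gap / orders_per_driver)


forecast = {("Kowloon", "18:00"): 45, ("Central", "18:00"): 12}  # hypothetical
supply = {("Kowloon", "18:00"): 10, ("Central", "18:00"): 10}
for slot, demand in forecast.items():
    print(slot, extra_drivers_needed(demand, supply[slot], orders_per_driver=2))
# ('Kowloon', '18:00') 13
# ('Central', '18:00') 0
```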
Matching the best driver
Dynamic supply and demand dispatching in a spatially structured region based on big data analytics
1. Goal: optimise supply and demand.
2. Action: match drivers to orders better so that key operational KPIs are optimised.
3. Outcome: Lower Costs/Better Service/Higher GMV
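A common baseline for driver-order matching is greedy assignment by distance: sort all order-driver pairs by great-circle distance and take the closest free pair each time. This is a hedged illustration, not the dispatching method in the slide; the IDs and coordinates are hypothetical, and an optimal one-to-one assignment would instead use the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`).

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) points."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def greedy_match(orders, drivers):
    """orders, drivers: dicts id -> (lat, lng). Returns {order_id: driver_id}."""
    pairs = sorted(
        (haversine_km(o_loc, d_loc), o_id, d_id)
        for o_id, o_loc in orders.items()
        for d_id, d_loc in drivers.items()
    )
    matched, used = {}, set()
    for _, o_id, d_id in pairs:  # closest pairs first
        if o_id not in matched and d_id not in used:
            matched[o_id] = d_id
            used.add(d_id)
    return matched
```

Swapping raw distance for predicted pickup time or a KPI-weighted cost keeps the same structure.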
Route Optimization
Increasing operations efficiency: route optimization, bundling and scheduling
1. Goal: plan routes that make good use of drivers' time and provide cost benefits for the customer.
2. Action: choose the quickest route; avoid obstacles, traffic and hot spots; predict ETA; bundle orders so that trips are more cost-efficient for the driver.
3. Outcome: Higher GMV/Lower costs/Better Service
4. Scalable
5. Cost-efficient
6. High performance
7. Customizable
8. In-house competitive advantage
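For a bundle of drop-offs, the simplest ordering heuristic is nearest neighbour: from the current stop, always drive to the closest remaining one. A hedged sketch, assuming stops are already projected to planar (x, y) coordinates in km; it is not optimal, and production route planners usually add local search such as 2-opt plus traffic-aware travel times.

```python
import math


def nearest_neighbour_route(start, stops):
    """Order stops greedily, always visiting the closest unvisited stop next."""
    route, remaining = [start], list(stops)
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(route[-1], p))
        remaining.remove(nxt)
        route.append(nxt)
    return route


def route_length(route):
    """Total straight-line length of a route."""
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))


route = nearest_neighbour_route((0, 0), [(5, 0), (1, 0), (3, 0)])
print(route)                # [(0, 0), (1, 0), (3, 0), (5, 0)]
print(route_length(route))  # 5.0
```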
Order status prediction
Estimating the attractiveness of an order
1. Interactive real-time tool predicting whether, and how fast, an order will be picked up
2. Response time (percentiles, absolute)
3. Zero rated probability
4. Feature importance
5. Action: identify risky orders; assign orders before users cancel them; add a bonus or subsidy; redirect poorly performing orders to a specified pool of drivers; incentivize drivers; notify the user to add a bonus in the app
6. Outcome: More revenue/More users/Improved experience for user
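Response-time percentiles give a simple, model-free way to flag risky orders before any ML is involved. A minimal sketch, assuming "risky" means an order has already waited longer than the historical 90th percentile; that rule, the function names and the sample data are hypothetical.

```python
import statistics


def response_time_percentiles(history_seconds):
    """p50 and p90 of historical pickup response times (in seconds)."""
    cuts = statistics.quantiles(history_seconds, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89]}


def risky_orders(waiting_seconds, threshold):
    """Pending orders already waiting longer than `threshold` seconds."""
    return sorted(oid for oid, waited in waiting_seconds.items() if waited > threshold)


history = list(range(1, 101))           # hypothetical response times: 1..100 s
p = response_time_percentiles(history)  # p50 = 50.5, p90 = 90.1
print(risky_orders({"a": 30, "b": 120, "c": 95}, p["p90"]))  # ['b', 'c']
```

A learned model would replace the fixed threshold with a per-order pickup probability, but the flagging step stays the same.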
Predicting churn
Engaging clients
1. Predicting churn
2. Identifying what leads to churn
3. Preventing churn
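Any churn model first needs a churn label. A hedged sketch of one common definition, assuming a customer counts as churned after a hypothetical 30 days without an order; the customer IDs and dates are made up.

```python
from datetime import date, timedelta


def churned_customers(last_order_dates, as_of, inactive_days=30):
    """Customers whose most recent order is older than `inactive_days` before `as_of`."""
    cutoff = as_of - timedelta(days=inactive_days)
    return sorted(c for c, last in last_order_dates.items() if last < cutoff)


def churn_rate(last_order_dates, as_of, inactive_days=30):
    """Share of customers labelled as churned."""
    if not last_order_dates:
        return 0.0
    return len(churned_customers(last_order_dates, as_of, inactive_days)) / len(last_order_dates)


last_order = {"c1": date(2017, 9, 1), "c2": date(2017, 7, 1), "c3": date(2017, 8, 20)}
print(churned_customers(last_order, as_of=date(2017, 9, 15)))  # ['c2']
```

These labels become the target for a classifier; its feature importances then point to the drivers of churn.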
How to become a data-driven organisation?
Lessons learnt
Read more: article coming soon, “Principles for becoming data-driven”.
Data Infrastructure (GOGOTRACK)
Real-time analytics source
1. Goals: 1) cross-system data integration, 2) analytics abstraction, 3) data analytics, 4) real-time data services
2. Minimal management cost
3. Scalable
4. Well integrated with data analytics tools
5. Universal: able to support different types of systems and events
6. Facilitates the productivity of the data science team, with minimal maintenance effort and cognitive load
7. Ideally, a unified data science workflow across batch and real time
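One way to make tracking "universal" across systems is a shared event envelope: a few fixed fields every source fills in, plus a free-form payload. This is a hypothetical schema for illustration, not GOGOTRACK's actual format; the field names and example values are assumptions.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class TrackedEvent:
    source: str      # emitting system, e.g. "driver_app" (hypothetical)
    event_type: str  # e.g. "location_ping" (hypothetical)
    payload: dict    # system-specific fields, left schemaless
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialise for a message queue or batch file."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "TrackedEvent":
        return cls(**json.loads(raw))


event = TrackedEvent("driver_app", "location_ping", {"lat": 22.3, "lng": 114.17})
line = event.to_json()  # same record for real-time consumers and batch storage
```

Because batch jobs and streaming consumers parse the same envelope, one workflow can serve both, in line with goal 7 above.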
ML Logistics (GOGOMI)
Operating data services
1. Scaling
2. Traceability
3. Multiple models
4. Flexibility
5. Multiple consumers
6. Reproducibility
7. Performance and availability
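Traceability and reproducibility across multiple models usually start with a registry that versions each trained model together with what produced it. A minimal in-memory sketch, assuming hypothetical names (`ModelRegistry`, `training_data_id`); it is not GOGOMI's design. The fingerprint is a hash of the hyperparameters and training-data ID, so retraining with identical inputs yields the same fingerprint.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class ModelVersion:
    name: str
    version: int
    params: dict           # hyperparameters used for training
    training_data_id: str  # pointer to the exact training snapshot
    fingerprint: str       # hash of params + data id, for traceability


class ModelRegistry:
    """In-memory registry; a real one would persist records and model artefacts."""

    def __init__(self):
        self._versions = {}  # model name -> list of ModelVersion, oldest first

    def register(self, name, params, training_data_id):
        record = json.dumps({"params": params, "data": training_data_id}, sort_keys=True)
        fingerprint = hashlib.sha256(record.encode("utf-8")).hexdigest()[:12]
        versions = self._versions.setdefault(name, [])
        mv = ModelVersion(name, len(versions) + 1, params, training_data_id, fingerprint)
        versions.append(mv)
        return mv

    def latest(self, name):
        return self._versions[name][-1]

    def get(self, name, version):
        return self._versions[name][version - 1]
```

Consumers can pin a specific version with `get` or follow `latest`, which covers the "multiple consumers" requirement in a simple way.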
Data-driven Framework
ML/AI Initiatives
Ops Data Brain
Real-time Heatmap on steroids with ML recommendations for ops
Thank you
Michal Szczecinski
https://www.linkedin.com/in/michalszczecinski/
michal@gogotech.hk
