What are common mistakes
in Data Science projects?
(and how to avoid them?)
Artur Suchwałko, Ph.D., QuantUp
AI & Big Data 2018, March 10, 2018, Lviv, Ukraine
Real-world Data Science projects
• Kaggle competitions and real Data Science projects are two quite
different disciplines
• Once a data frame is prepared, the rest is easy
• What is done incorrectly, and what can be corrected?
• Analysis of a business problem
• Data
• Process
• Methods, models
• Hardware, software
• People
(Everything based on practical experience: 20 years, 100 projects, 3,000
hours of workshops.
For the majority of topics I could add quotes from talks.)
Analysis of a business problem

No. We don’t want to build a model of
production and storage in our factory
Problem:
• We’d like just to optimize cutting a log (a trunk of a dead tree) into
planks
• Let’s do it in the simplest way. Why should we waste time and
money?
• The others can do it. Why do you make it complicated?!?
Solution:
• Build the production and storage model
• Otherwise you will optimize log cutting for a different sawmill
• or for something completely different
Solution of a wrong analytical problem
Problem:
• Stating the wrong problem and solving it can decrease the predictive
ability of a model
• Similarly, removing so-called false predictors (leaks from the future)
• But we never want pure predictive power. Usually the business
wants actionability and real value
Solution:
• Focus on what influences your business
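One way to make the hunt for false predictors concrete is to record, for every candidate feature, when its value actually becomes known, and to compare that with the moment the prediction has to be made. The sketch below is only an illustration: the feature names, the dates, and the `find_leaky_features` helper are hypothetical and do not come from the talk.

```python
from datetime import date


def find_leaky_features(feature_known_on, prediction_date):
    """Return the names of features whose values become known only after the prediction date."""
    return sorted(
        name
        for name, known_on in feature_known_on.items()
        if known_on > prediction_date
    )


# Hypothetical features with the date on which each value is available.
feature_known_on = {
    "account_age_days": date(2018, 1, 1),
    "last_payment_amount": date(2018, 1, 15),
    "collection_call_made": date(2018, 3, 1),  # recorded after the event we predict
}

leaks = find_leaky_features(feature_known_on, prediction_date=date(2018, 2, 1))
print(leaks)  # ['collection_call_made']
```

Removing such a feature will probably lower the score on the development sample, which is the point of the slide: the lower number is the honest one.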
Data
Preparation of a development sample is not
very important
Problem:
• Let’s take a sample and model!
• Preparation of the development sample decides if the model will fit
the reality we model or not
• The data, and thus the sample, is generated (or influenced) by a
process that must be well known and understood
Solution:
• Think it over really carefully.

We have Big Data. We need to implement
Big Data solutions
Problem:
• If you can email your data or fit it on a pendrive, you don’t
have Big Data!
• Many Data Science tasks on millions of records can be completed
on (powerful) laptops
• Decisions are either data-driven or not. It’s not about the volume of data but
about the way decisions are made
Solution:
• Be (more than) sure that we need Big Data technologies for storing
and processing
• During PoC / prototype stage don’t use Big Data tools
• Important: Not valid for some problems
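A quick back-of-the-envelope estimate often settles whether Big Data tooling is needed at the prototype stage. The sketch below assumes a purely numeric table stored as 8-byte values and a laptop with 16 GB of RAM; both figures, the row and column counts, and the helper names are illustrative assumptions, not numbers from the talk.

```python
def estimated_size_gb(n_rows, n_numeric_columns, bytes_per_value=8):
    """Rough in-memory size of a numeric table, in GiB."""
    return n_rows * n_numeric_columns * bytes_per_value / 1024**3


def fits_on_laptop(size_gb, ram_gb=16, headroom=0.5):
    """True if the table uses no more than the given share of RAM (assumed 50%)."""
    return size_gb <= ram_gb * headroom


size = estimated_size_gb(n_rows=5_000_000, n_numeric_columns=50)
print(f"{size:.2f} GB")      # 1.86 GB
print(fits_on_laptop(size))  # True
```

Text columns, intermediate copies, and model training overhead can multiply this estimate several times, so it is a lower bound rather than a guarantee.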
Use social media data
Problem:
• It’s a tremendous effort if you don’t use an off-the-shelf solution
• Usually business value is not big
Solution:
• Be sure that the effort will be rewarded
Process
Let’s build a model in one week
Problem:
• It’s possible (in theory)
• If you don’t analyze the process thoughtfully and don’t detect false
predictors then the model will not work in production
• We will be really happy to see how well it performs on our
development sample
Solution:
• Take enough time
• Be sure that the process is correct

There is too little time to complete the task /
model
Problem:
• Data problems
• Stuck in preprocessing
• The implementation takes too long
• Too little experience
Solution:
• Prepare a complete product as soon as possible, e.g.:
• with most functionality cut out, e.g. a scoring application with a
simple / dummy model
• complete code for building the model, but using simpler methods
• improve it in the next iterations
• Use CRISP-DM / a checklist to support your memory
• Usually you can start the implementation from the first product version
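The idea of a complete product with a dummy model can be sketched as follows. This is a minimal illustration, assuming a binary target: `BaseRateModel` and `score` are hypothetical names, and the dummy model simply predicts the share of positive cases seen in training. The point is that the scoring interface exists from day one and a real model can replace the dummy later without changing the calling code.

```python
class BaseRateModel:
    """Dummy model: always predicts the share of positives seen in training."""

    def fit(self, rows, labels):
        self.rate_ = sum(labels) / len(labels)
        return self

    def predict_proba(self, rows):
        return [self.rate_ for _ in rows]


def score(model, rows):
    """End-to-end scoring step that the rest of the application depends on."""
    probabilities = model.predict_proba(rows)
    return [{"row": row, "score": p} for row, p in zip(rows, probabilities)]


# Hypothetical training data.
train_rows = [{"age": 30}, {"age": 45}, {"age": 52}, {"age": 23}]
train_labels = [0, 1, 1, 0]

model = BaseRateModel().fit(train_rows, train_labels)
results = score(model, [{"age": 40}])
print(results)  # [{'row': {'age': 40}, 'score': 0.5}]
```

Any later model only needs to offer the same `fit` and `predict_proba` methods to be dropped into `score`.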
The way you prepare the result (a model, a data
product) doesn’t matter
Problem:
• I want a model. It must work. I don’t care how you’ll build it. Just
build it!
• The process is crucial
• If it is wrong then the analysis is not fully reproducible
• We take on technical debt
• and sooner or later we will be forced to pay it back
Solution:
• Build models in a fully reproducible way
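Two small habits go a long way towards reproducibility: drawing every random sample from an explicitly seeded generator, and recording a fingerprint of the configuration and data used for a run. The sketch below shows both under simple assumptions (data small enough to serialize as JSON); the function names and the config keys are hypothetical.

```python
import hashlib
import json
import random


def run_fingerprint(config, data_rows):
    """SHA-256 hash of the configuration and data, to be stored with the model."""
    payload = json.dumps({"config": config, "data": data_rows}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def sample_development_set(data_rows, config):
    """Draw the development sample from a local, explicitly seeded generator."""
    rng = random.Random(config["seed"])  # no hidden global state
    return rng.sample(data_rows, config["sample_size"])


config = {"seed": 42, "sample_size": 3}
data_rows = list(range(10))

first = sample_development_set(data_rows, config)
second = sample_development_set(data_rows, config)
print(first == second)  # True: the same seed gives the same sample
print(run_fingerprint(config, data_rows)[:12])
```

If the fingerprint stored with a model no longer matches the one computed from the current config and data, the model cannot be rebuilt exactly from what is on disk.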
Implementation – I’m sure it’ll work out
somehow
Problem:
• Implementations without planned tests usually fail
• What is really painful is that it takes time to realize that they failed (a
model keeps running and generates risk)
Solution:
• Plan both the implementation and the tests
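One test worth planning up front is a comparison between the scores computed in the development environment and the scores produced by the deployed implementation for the same cases. The following is a minimal sketch; the customer identifiers, the scores, and the `check_scores_match` helper are hypothetical.

```python
import math


def check_scores_match(reference_scores, production_scores, tolerance=1e-9):
    """Return the keys whose production score is missing or differs from the reference."""
    mismatches = []
    for key, expected in reference_scores.items():
        actual = production_scores.get(key)
        if actual is None or not math.isclose(actual, expected, abs_tol=tolerance):
            mismatches.append(key)
    return mismatches


# Scores from the development environment vs. the deployed implementation.
reference = {"c1": 0.12, "c2": 0.80, "c3": 0.45}
production = {"c1": 0.12, "c2": 0.08, "c3": 0.45}  # digits swapped for c2

mismatches = check_scores_match(reference, production)
print(mismatches)  # ['c2']
```

A check like this catches the silent kind of failure described above, where the deployed model runs without errors but returns the wrong numbers.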
Methods & models

AI. We desperately need AI!
Problem:
• We don’t need it
• Predictive modeling is not AI!
• Sometimes full control over a model is more important than
predictive power
Solution:
• Let’s think what we’d like to achieve and how to do this
• Data-driven decision making is more important
A model just learns everything it is exposed
to
Problem:
• You need to promise self-learning to sell a service / a software
• But it will not learn automatically if not fed by suitable data
• In many situations you don’t have such data to design a feedback loop
Solution:
• Analyze the process that generates the data for the development sample
• Set aside an untouched sample
• Train the model on a sample and refine it on an ongoing
basis
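An untouched sample stays untouched only if the assignment of records to it never changes. One possible way to achieve that, sketched below, is to derive the assignment from a hash of a stable record identifier instead of a fresh random draw. The 20% share and the `is_holdout` helper are assumptions made for illustration.

```python
import hashlib


def is_holdout(record_id, holdout_percent=20):
    """Deterministically assign a record to the untouched sample based on its ID."""
    digest = hashlib.sha256(str(record_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < holdout_percent


record_ids = range(1000)  # hypothetical stable identifiers
holdout = [i for i in record_ids if is_holdout(i)]
development = [i for i in record_ids if not is_holdout(i)]

print(len(holdout), len(development))
```

Because the assignment depends only on the identifier, a record stays on the same side of the split even when new data arrives and the model is refined.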
Start modeling with Deep Learning!
Problem:
• But everybody uses it…
• No!!!
• Many problems are too simple for DL
• In particular, the problems with data in a data frame
Solution:
• Random Forest, xgboost
If we have 3000 classes then let’s build a
BIG classifier
Problem:
• For example when we’d like to recommend bank products
• A random classifier over 3000 classes has an error rate of 2999/3000 ≈ 99.97% (not 50%)
• Usually the dataset is too small
Solution:
• It’s good to use a simpler method (usually)
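The baseline arithmetic from this slide can be checked in a few lines. Guessing one of k classes uniformly at random is correct with probability 1/k, so the expected error is (k - 1)/k. The dataset size of 300,000 rows in the second part is an illustrative assumption, used only to show how thin the data becomes per class.

```python
def random_guess_error(n_classes):
    """Expected error rate when one of n_classes labels is guessed uniformly at random."""
    return (n_classes - 1) / n_classes


def average_rows_per_class(n_rows, n_classes):
    """Average number of training examples available per class."""
    return n_rows / n_classes


binary_error = random_guess_error(2)
many_class_error = random_guess_error(3000)

print(f"{binary_error:.2%}")      # 50.00%
print(f"{many_class_error:.2%}")  # 99.97%
print(average_rows_per_class(300_000, 3000))  # 100.0
```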

Hardware & software
You can do calculations using a laptop
Problem:
• Sometimes yes, you can
• But usually you cannot
• Usually it doesn’t make any sense – human time is more expensive
than machine time
Solution:
• It is good to invest some money in hardware
• or use AWS from Amazon (or something similar)
Commercial software is excellent
Problem:
• Users often say it is excellent until it is bought
• The problems appear later
Solution:
• Test it under conditions similar to those in which it will be used
• Think seriously about using open source
Free software is excellent (and it’s free!)
Problem:
• It’s free – in terms of purchase cost
• It’s not simply excellent – the cost is the necessity of having qualified people
on board and of developing software
• Inconvenient problems do happen
Solution:
• Use it as it should be used
• i.e. write clear and clean code, use additional tools, e.g. a VCS
• Make sure the team has the skills needed

People
All companies have Data Science teams.
Let’s build one for us!
Problem:
• It’s possible to build a team. It will take a lot of time and lots of
money.
• If the results are wasted, people will leave
• They need to have fun working on projects
• If I need a plank then do I really need to buy a sawmill?
Solution:
• Be sure that:
• we know how to use their results
• it will give value to the business
• PoC can be outsourced. The first data science project can be
outsourced.
A student or a freshman is enough to bring
profits from deep analytics to the business
Problem:
• If someone can cut with a scalpel, will we call him a surgeon?
• Why is someone who can (technically) build a model from a data
frame called a Data Scientist?
• Data Scientist is a profession – experience matters!
• People without experience usually don’t bring any business value to
a company. Even after spending a year working with data (!)
Solution:
• Hire experienced people, especially at the beginning of a DS journey
• let them teach the freshmen
• But what if you don’t have experienced people?
• Invest time, effort, and money in your team. Let a more business-oriented
analyst oversee the team
The team will learn everything from online
courses
Problem:
• I’ll give each of you $20 (OK, even $50) and you’ll learn everything online
• It’s true. The team will learn some things
• But not the most important ones
• A good hands-on training cannot be substituted
Solution:
• Learning by doing (and applying)
• Control and stimulate learning
• Buy knowledge

Summary
• To avoid mistakes it is good to ask ourselves these questions (and
answer them), e.g.:
• What business problem are we solving?
• What business value can we get from the results?
• What could be lost in translation from business into analytics?
• Do we have adequate and representative data?
• What process generates it? What is it influenced by?
• What is the model-building process?
• What analytical tools should be used? Could we apply simpler
approaches?
• How do we control all the risks?
• It is good to do it repeatedly
• It’s best to involve someone experienced
• It’s beneficial to educate the receivers of the results
Contact
• During the conference!
• After the conference: artur [at] quantup [dot] eu

Recommended for you

ACC presentation for QA Club Kiev
ACC presentation for QA Club KievACC presentation for QA Club Kiev
ACC presentation for QA Club Kiev

This document discusses using Google's Attribute-Component-Capability (ACC) model approach to help balance test efforts. The key points are: 1) The ACC model involves listing a product's attributes, breaking it into technical components, and categorizing capabilities. This provides an overview of test needs across the entire product. 2) Complexity, frequency of use, and user impact are assigned scores to capabilities. This determines relative "testing needs". 3) The ACC items, scores, and needs are tracked in a tool like Excel linked to a tool like TFS. This provides instant visibility into where more testing is required based on risk.

accrisk-based testing
CTO Crunch avec Julien Simon, Viadeo
CTO Crunch avec Julien Simon, ViadeoCTO Crunch avec Julien Simon, Viadeo
CTO Crunch avec Julien Simon, Viadeo

En puisant dans ses différentes expériences de direction technique (notamment chez Pixmania, Criteo et Viadeo), Julien Simon parlera des risques qui menacent les équipes et les plates-formes en forte croissance, en donnant au passage des pistes pour les anticiper et les résoudre.

ctocto nightcto crunch
Life in the tech trenches (2015)
Life in the tech trenches (2015)Life in the tech trenches (2015)
Life in the tech trenches (2015)

Presentation done at the "CTO Crunch" event by France Digitale, Paris, 24/02/2015. Based on his experience (VP Eng @ Digiplug, CTO @ Pixmania, VP Eng @ Criteo, CTO @ Aldebaran Robotics and now CTO @ Viadeo), Julien shares some hard-learned, bullshit-free lessons on what it means to be a CTO. Hiring, Tools, Methodology, Technology, Politics: welcome to Hell :)

ctoengineeringlessons

More Related Content

What's hot

Testing in the Wild
Testing in the WildTesting in the Wild
Testing in the Wild
Dawn Code
 
Startup Operating Systems
Startup Operating SystemsStartup Operating Systems
Startup Operating Systems
Dean Haritos
 
Herman- Pieter Nijhof - Where Do Old Testers Go?
Herman- Pieter Nijhof - Where Do Old Testers Go?Herman- Pieter Nijhof - Where Do Old Testers Go?
Herman- Pieter Nijhof - Where Do Old Testers Go?
TEST Huddle
 
A Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software TestingA Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software Testing
TechWell
 
Hiring a developer: step by step debugging
Hiring a developer: step by step debuggingHiring a developer: step by step debugging
Hiring a developer: step by step debugging
Laurent Cerveau
 
hypothesis driven development
hypothesis driven developmenthypothesis driven development
hypothesis driven development
Andrew Pirkola
 
Witness wednesdays informing agile software development with continuous user...
Witness wednesdays  informing agile software development with continuous user...Witness wednesdays  informing agile software development with continuous user...
Witness wednesdays informing agile software development with continuous user...
Rebecca Destello
 
Better products faster: let's bring the user into the userstory // TAPOST_201...
Better products faster: let's bring the user into the userstory // TAPOST_201...Better products faster: let's bring the user into the userstory // TAPOST_201...
Better products faster: let's bring the user into the userstory // TAPOST_201...
Anna Witteman
 
Managing Data Science by David Martínez Rego
Managing Data Science by David Martínez RegoManaging Data Science by David Martínez Rego
Managing Data Science by David Martínez Rego
Big Data Spain
 
Problem solving section 1
Problem solving section 1Problem solving section 1
Problem solving section 1
dwyer1an
 
The 7 step problem solving methodology
The 7 step problem solving methodologyThe 7 step problem solving methodology
The 7 step problem solving methodology
quest_pune
 
Principal as agent of change
Principal as agent of change Principal as agent of change
Principal as agent of change
Kim Crawford
 
The Pragmatic Agilist: estimating, improving quality, and communication with...
The Pragmatic Agilist: estimating, improving quality, and communication  with...The Pragmatic Agilist: estimating, improving quality, and communication  with...
The Pragmatic Agilist: estimating, improving quality, and communication with...
Thiago Colares
 
Binary crosswords
Binary crosswordsBinary crosswords
Binary crosswords
Laurent Cerveau
 
Problem solving skills
Problem solving skillsProblem solving skills
Problem solving skills
Doaa Kotb
 
Write code and find a job
Write code and find a jobWrite code and find a job
Write code and find a job
Yung-Yu Chen
 
eLearning Guild Online Forum - Application of the Thiagi Four-Door Model for ...
eLearning Guild Online Forum - Application of the Thiagi Four-Door Model for ...eLearning Guild Online Forum - Application of the Thiagi Four-Door Model for ...
eLearning Guild Online Forum - Application of the Thiagi Four-Door Model for ...
rpowell285
 
Using Problem Solving Skills To Get A Job
Using Problem Solving Skills To Get A JobUsing Problem Solving Skills To Get A Job
Using Problem Solving Skills To Get A Job
Gary Clement
 
Data scientist enablement dse 400 week 5 roadmap
Data scientist enablement   dse 400   week 5 roadmapData scientist enablement   dse 400   week 5 roadmap
Data scientist enablement dse 400 week 5 roadmap
Dr. Mohan K. Bavirisetty
 
Dancing for a product release
Dancing for a product releaseDancing for a product release
Dancing for a product release
Laurent Cerveau
 

Artur Suchwalko “What are common mistakes in Data Science projects and how to avoid them?”

  • 1. What are common mistakes in Data Science projects? (and how to avoid them?) Artur Suchwałko, Ph.D., QuantUp AI & Big Data 2018, March 10, 2018, Lviv, Ukraine
  • 3. Real-world Data Science projects • Kaggle competitions and real Data Science projects are two quite different disciplines • Once a data frame is prepared, the rest is easy • What is done incorrectly and can be corrected? • Analysis of a business problem • Data • Process • Methods, models • Hardware, software • People (Everything is based on practical experience: 20 years, 100 projects, 3,000 hours of workshops. For the majority of topics I could add quotes from talks.)
  • 4. Analysis of a business problem
  • 5. No. We don’t want to build a model of production and storage in our factory Problem: • We’d just like to optimize cutting a log (a trunk of a dead tree) into planks • Let’s do it in the simplest way. Why should we waste time and money? • The others can do it. Why do you make it complicated?!? Solution: • Build the production and storage model • Otherwise you will optimize log cutting in a different sawmill • or something completely different
  • 6. Solution of a wrong analytical problem Problem: • Stating a wrong problem and solving it can decrease the predictive ability of a model • Similarly, removing so-called false predictors (leaks from the future) • But we never want pure predictive power. Usually the business wants actionability and real value Solution: • Focus on what influences your business
  • 8. Preparation of a development sample is not very important Problem: • Let’s take a sample and model! • Preparation of the development sample decides whether the model will fit the reality we model or not • The data, and thus the sample, are generated (or influenced) by a process that must be well known and understood Solution: • Think it over really carefully.
  • 9. We have Big Data. We need to implement Big Data solutions Problem: • If you can email your data or fit it on a pendrive, you don’t have Big Data! • Many Data Science tasks for millions of records can be completed using (powerful) laptops • Decisions are either data-driven or not. It’s not about the magnitude of the data but about the way decisions are made Solution: • Be (more than) sure that we need Big Data technologies for storing and processing • During the PoC / prototype stage don’t use Big Data tools • Important: Not valid for some problems
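A rough back-of-the-envelope check can support the "does it fit on a laptop?" question from slide 9. The sketch below is only an illustration: the function name is hypothetical, and it assumes a dense table of 8-byte numeric values, so text columns and in-memory overhead would push the real figure higher.

```python
def estimated_size_gb(n_rows: int, n_numeric_columns: int, bytes_per_value: int = 8) -> float:
    """Rough size of a dense numeric table in GiB.

    Assumption: every cell is a fixed-width number (8 bytes by default).
    Strings, indexes and intermediate copies are not counted.
    """
    return n_rows * n_numeric_columns * bytes_per_value / 1024**3


# Hypothetical example: 10 million records with 50 numeric columns.
size = estimated_size_gb(10_000_000, 50)
print(f"{size:.1f} GB")  # 3.7 GB
```

Under these assumptions, a table of that shape fits in the memory of an ordinary laptop, which is the point the slide makes about millions of records.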
  • 10. Use social media data Problem: • It’s a tremendous effort if you don’t use an off-the-shelf solution • Usually the business value is small Solution: • Be sure that the effort will be rewarded
  • 12. Let’s build a model in one week Problem: • It’s possible (in theory) • If you don’t analyze the process thoughtfully and don’t detect false predictors then the model will not work in production • We will be really happy to see how well it performs on our development sample Solution: • Take enough time • Be sure that the process is correct
  • 13. There is too little time to complete the task / model Problem: • Data problems • Stuck in preprocessing • The implementation takes too long • Too little experience Solution: • Prepare a full product as soon as possible, e.g.: • cutting down all the functionalities, e.g. a scoring application with a simple / dummy model • the full code for building the model, but using simpler methods • improve it in the next iterations • Use CRISP-DM / a checklist to support your memory • Usually you can start implementation from the first product version
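The "scoring application with a simple / dummy model" idea from slide 13 could look roughly like the sketch below. All names are hypothetical; the point is only that the scoring entry point is built first and the placeholder model is swapped for a real one in later iterations.

```python
from collections import Counter


class MajorityClassModel:
    """Placeholder model: always predicts the most frequent training label."""

    def fit(self, rows, labels):
        # Remember the single most common label seen during training.
        self.prediction_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows):
        # Ignore the features and return the same label for every row.
        return [self.prediction_ for _ in rows]


def score(model, rows):
    """Scoring entry point; it stays unchanged when the model is replaced."""
    return model.predict(rows)


# Hypothetical toy data: three training rows, two rows to score.
model = MajorityClassModel().fit([[1], [2], [3]], ["good", "good", "bad"])
result = score(model, [[4], [5]])
print(result)  # ['good', 'good']
```

Any later model that offers the same `fit` / `predict` methods can be passed to `score` without touching the rest of the application.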
  • 14. The way you prepare the result (a model, a data product) doesn’t matter Problem: • I want a model. It must work. I don’t care how you’ll build it. Just build it! • The process is crucial • If it is wrong then the analysis is not fully reproducible • We take on technical debt • and sooner or later we will be forced to pay it back Solution: • Build models in a fully reproducible way
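One small ingredient of the reproducibility asked for on slide 14 is to route all randomness through a fixed seed and to record what went into a build. The sketch below is an assumption about how that might be done, not the author's method; the function names and the `sample_size` parameter are hypothetical.

```python
import hashlib
import json
import random


def run_fingerprint(data_rows, params, seed):
    """Describe a build: hash of the input data plus parameters and seed."""
    payload = json.dumps(data_rows, sort_keys=True).encode("utf-8")
    return {
        "data_sha256": hashlib.sha256(payload).hexdigest(),
        "params": params,
        "seed": seed,
    }


def build(data_rows, params, seed):
    """Stand-in for a model build: draw a development sample reproducibly."""
    rng = random.Random(seed)  # all randomness comes from this one generator
    sample = rng.sample(data_rows, k=params["sample_size"])
    return sample, run_fingerprint(data_rows, params, seed)


# Hypothetical toy data; two builds with the same inputs must match exactly.
rows = [[i, i * 2] for i in range(20)]
params = {"sample_size": 5}
first = build(rows, params, seed=7)
second = build(rows, params, seed=7)
print(first == second)  # True
```

Storing the fingerprint next to the model makes it possible to check later which data, parameters, and seed produced it; code versioning (VCS) covers the remaining part.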
  • 15. Implementation – I’m sure it’ll work out somehow Problem: • Implementations without planned tests usually fail • What is really painful is that it takes time to realize that they failed (the model runs and generates risk) Solution: • Plan both implementation and tests
  • 17. AI. We desperately need AI! Problem: • We don’t need it • Predictive modeling is not AI! • Sometimes full control over a model is more important than predictive power Solution: • Let’s think about what we’d like to achieve and how to do it • Data-driven decision making is more important
  • 18. A model just learns everything it is exposed to Problem: • You need to promise self-learning to sell a service / software • But it will not learn automatically unless it is fed suitable data • In many situations you don’t have such data to design a feedback loop Solution: • Analyze the process that generates the data for the development sample • Put aside a “not touched” sample • The model will be trained using a sample and refined in an ongoing way
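Putting aside a "not touched" sample, as slide 18 suggests, can be sketched as follows. This is a minimal illustration with hypothetical names; it assumes the records are exchangeable, so a random split is acceptable. When the data-generating process changes over time, a time-based split may be the better choice.

```python
import random


def split_untouched(records, holdout_fraction=0.2, seed=42):
    """Split records into (development, untouched) with a fixed seed.

    The untouched part is meant to be used once, for the final check only.
    """
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split stable
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical example: 100 record identifiers.
development, untouched = split_untouched(range(100))
print(len(development), len(untouched))  # 80 20
```

All exploration, feature selection, and tuning would then use only `development`.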
  • 19. Start modeling with Deep Learning! Problem: • But everybody uses it… • No!!! • Many problems are too simple for DL • In particular, problems with data in a data frame Solution: • Random Forest, xgboost
  • 20. If we have 3000 classes then let’s build a BIG classifier Problem: • For example when we’d like to recommend bank products • A random classifier for such a problem has an error of 2999/3000 ≈ 99.97% (not 50%) • Usually the dataset is too small Solution: • It’s good to use a simpler method (usually)
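The 2999/3000 figure on slide 20 follows from guessing uniformly among k equally likely classes, where the expected error is 1 − 1/k. A short check (the function name is hypothetical, and equal class frequencies are an assumption):

```python
from fractions import Fraction


def random_guess_error(num_classes: int) -> Fraction:
    """Expected error of uniform random guessing among equally likely classes."""
    if num_classes < 1:
        raise ValueError("num_classes must be positive")
    return 1 - Fraction(1, num_classes)


binary = random_guess_error(2)   # Fraction(1, 2)
many = random_guess_error(3000)  # Fraction(2999, 3000)
print(f"{float(binary):.2%}")    # 50.00%
print(f"{float(many):.2%}")      # 99.97%
```

So a model with, say, 95% error on 3000 classes is already far better than chance, which is why the 50% intuition from binary problems does not transfer.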
  • 22. You can do calculations using a laptop Problem: • Sometimes yes, you can • But usually you cannot • Usually it doesn’t make any sense – human time is more expensive than machine time Solution: • It is good to invest some money in hardware • or use AWS from Amazon (or something similar)
  • 23. Commercial software is excellent Problem: • Users often say that it is excellent until it is bought • The problems appear later Solution: • Test it in conditions similar to those in which it will be used • Think seriously about using open source
  • 24. Free software is excellent (and it’s free!) Problem: • It’s free – in terms of the purchase cost • It’s not just excellent – the cost is the necessity to have qualified people on board and to develop software • Inconvenient problems do happen Solution: • Use it as it should be used • i.e. write clear and clean code, use additional tools, e.g. VCS • Make sure the team has the skills needed
  • 26. All companies have Data Science teams. Let’s build one for us! Problem: • It’s possible to build a team. It will take a lot of time and lots of money. • If the results are wasted then the people will leave • They need to have fun working on projects • If I need a plank then do I really need to buy a sawmill? Solution: • Be sure that: • we know how to use their results • it will give value to the business • PoC can be outsourced. The first data science project can be outsourced.
  • 27. A student or a freshman is enough to deliver profits from deep analytics to the business Problem: • If someone can cut with a scalpel, will we call him a surgeon? • Why is someone who can (technically) build a model from a data frame called a Data Scientist? • Data Scientist is a profession – experience matters! • People without experience usually don’t give any business value to a company. Even after spending a year working with data (!) Solution: • Hire experienced people, especially at the beginning of a DS journey • let them teach the freshmen • But what if you don’t have experienced people? • Invest time, effort, and money in your team. Let a more business-oriented analyst control the team
  • 28. The team will learn everything from online courses Problem: • I’ll give each of you $20 (OK, even $50) and you’ll learn everything online • It’s true. The team will learn some things • But not the most important ones • Good hands-on training cannot be replaced Solution: • Learning by doing (and applying) • Control and stimulate learning • Buy knowledge
  • 30. Summary • To avoid mistakes it is good to ask ourselves these questions (and answer them), e.g.: • What business problem are we solving? • What business value can we get from the results? • What could be lost in translation from business into analytics? • Do we have adequate and representative data? • What process generates them? What are they influenced by? • What is the model-building process? • What analytical tools should be used? Could we apply simpler approaches? • How do we control all the risks? • It is good to do it repeatedly • It’s best to involve someone experienced • It’s beneficial to educate the receivers of the results
  • 32. Contact • During the conference! • After the conference: artur [at] quantup [dot] eu