From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - Mar 12, 2016

From Data Science to Production
deploy, scale, enjoy!
Sergii Khomenko, Data Scientist
sergii.khomenko@stylight.com, @lc0d3r
PyData Amsterdam - March 12, 2016

Sergii Khomenko
Data scientist at Stylight, one of the biggest fashion communities.
Data analysis and visualisation hobbyist who works on problems not only at work but also in his free time, for fun and for personal data visualisations.
Originally from a computer engineering background.
Speaker at Berlin Buzzwords 2014, ApacheCon Europe 2014, Puppet Camp London 2015, Berlin Buzzwords 2015, Tableau Conference on Tour 2015, Budapest BI Forum 2015, Crunchsconf 2015, FOSDEM 2016.

Fellow DevOps
Quentin Nerden, Milos Radovanovic, Patrick Roelke

Stylight – Make Style Happen
• Profitable Leads: Stylight provides its partners with high-quality leads, enabling partner shops to leverage Stylight as an ROI-positive traffic channel.
• Inspiration: Stylight offers shoppable inspiration that makes it easy to know what to buy and how to style it.
• Branding & Reach: Stylight offers a unique opportunity for brands to reach an audience that is actively looking for style online.
• Shopping: Stylight helps users search and shop fashion and lifestyle products smarter across hundreds of shops.
Core Target Group
Stylight helps aspiring women between 18 and 35 to evolve their style through shoppable inspiration.

Stylight – acting on a global scale

Experienced & Ambitious Team
Innovative cross-functional organisation with flat hierarchy builds a unique team spirit.
• +200 employees
• 40 PhDs/Engineers
• 28 years average age
• 63% female
• 23 nationalities
• 0 suits

Data Scientist: Person who is better at statistics than any software engineer and better at software engineering than any statistician.

Agenda
• Early days of startups
• Software engineering
• Immutable infrastructure
• Serverless architecture

Problem definition:
• Many different technologies
• Hard to reproduce data science results
• Issues with backward compatibility
• Dependency hell
• Hard to scale products
• Hard to on-board new people

Software engineering
built circa 2015-16

You are most likely doing it already
• Version control
• Cover code with tests
• nosetests, pytest, unittest2
- start small with doc tests
- try out TDD: rednose, nose-watch
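
"Start small with doc tests" could look like the following minimal sketch. The function `normalize` and its behaviour are hypothetical, chosen only to illustrate the idea; nothing here is taken from Stylight's code base.

```python
def normalize(values):
    """Scale a list of numbers linearly to the range [0, 1].

    The examples below double as tests.

    >>> normalize([2, 4, 6])
    [0.0, 0.5, 1.0]
    >>> normalize([3, 3])
    [0.0, 0.0]
    """
    low, high = min(values), max(values)
    if high == low:
        # Constant input: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]
```

Assuming the function lives in a file such as `stats_utils.py` (a hypothetical name), `python -m doctest -v stats_utils.py` runs the examples, and pytest can collect them with `--doctest-modules`.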

You are most likely doing it already
• Cover code with tests
• Yes, even your R application could have tests
- testthat
- devtools
• Code reviews
• Pair programming

Some of the mentioned problems
• Dependency hell

Image from http://udaypal.com/

• Dependency hell

How it could help:
• Every technology has its own container
- just docker run
• Every package with its version defined in the Dockerfile
- have a base image for more advanced cases
• New people
- just docker run
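
One way to keep "every package with its version" in a single reviewable place is to render the Dockerfile from a pinned mapping. This is a sketch under assumptions: the base image tag, the package names and the version numbers are hypothetical placeholders, and the helper `render_dockerfile` is not part of any tool named in the talk.

```python
# Hypothetical pinned versions; replace with the ones your analysis used.
PINNED_PACKAGES = {
    "numpy": "1.10.4",
    "pandas": "0.17.1",
}


def render_dockerfile(base_image, packages):
    """Return Dockerfile text that installs exactly the pinned versions."""
    pins = " ".join(
        "{}=={}".format(name, version)
        for name, version in sorted(packages.items())
    )
    lines = [
        "FROM {}".format(base_image),
        # --no-cache-dir keeps pip's download cache out of the image.
        "RUN pip install --no-cache-dir {}".format(pins),
    ]
    return "\n".join(lines) + "\n"


dockerfile_text = render_dockerfile("python:3.5", PINNED_PACKAGES)
print(dockerfile_text)
```

Because the mapping lives in version control, a dependency upgrade becomes an ordinary reviewed change, and rebuilding the image from the same text installs the same versions.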

r-base/Dockerfile

lc0/docker-shiny-server

Known issues
• Images can be really huge
• Try to skip anything you do not need
• Alpine Linux as a base image
• 5 MB base image (musl libc and BusyBox)
• Iron.io has pre-built images based on Alpine
• python, scala, java, elixir, etc.

• Hard to roll out
• Hard to maintain production dependencies

AWS ECR

CircleCI deployments

Immutable infrastructure
Infrastructure as Code

Need to upgrade? No problem. Build a new, upgraded system and throw the old one away. New app revision? Same thing. Build a server (or image) with a new revision and throw away the old ones.
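
The "build a new one, throw the old one away" idea can be sketched as a template that is rendered afresh for every image instead of being patched in place. This is a minimal illustration, assuming a CloudFormation-style JSON template; the resource name `AppServer`, the image IDs and the instance type are hypothetical.

```python
import json


def render_template(image_id, instance_type="t2.micro"):
    """Return a CloudFormation-style template for one server built from image_id."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "AppServer": {  # hypothetical logical name
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": image_id,
                    "InstanceType": instance_type,
                },
            }
        },
    }


# Hypothetical image IDs: each app revision is baked into its own image.
old = render_template("ami-11111111")
new = render_template("ami-22222222")
print(json.dumps(new, indent=2, sort_keys=True))
```

A new revision never edits `old`; it produces `new`, which is deployed next to it, and the old servers are discarded once the new ones are healthy.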

Terraform
Kubernetes and Docker {Swarm, Compose}

Possibilities
• all Lambdas in one place with version control
• integration tests with real events
• proper CI/CD setup
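
A test that feeds a Lambda handler an event in its real shape might look like the sketch below. It assumes a Python handler fed by Kinesis, where each record's data arrives base64-encoded; the payload fields and the helper `make_kinesis_event` are hypothetical, and the event here is synthetic rather than one recorded from production.

```python
import base64
import json


def handler(event, context):
    """Decode Kinesis records and return the parsed JSON payloads."""
    payloads = []
    for record in event["Records"]:
        # Kinesis delivers each record's data base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw.decode("utf-8")))
    return {"processed": len(payloads), "payloads": payloads}


def make_kinesis_event(items):
    """Build a minimal event in the shape Lambda receives from Kinesis."""
    return {
        "Records": [
            {
                "eventSource": "aws:kinesis",
                "kinesis": {
                    "data": base64.b64encode(
                        json.dumps(item).encode("utf-8")
                    ).decode("ascii")
                },
            }
            for item in items
        ]
    }


# A recorded production event could be replayed through handler the same way.
sample_event = make_kinesis_event([{"type": "click", "value": 3}])
result = handler(sample_event, None)
print(result)
```

Keeping such events next to the handlers in one repository lets the CI pipeline run them on every commit before anything is deployed.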

Use-case of
outlier detection
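
The slides do not say which outlier detection method was used. As one possible illustration, the sketch below flags values by their modified z-score, based on the median absolute deviation (MAD); the threshold of 3.5 is a commonly used default, and the sample counts are made up.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        # At least half of the values equal the median; nothing to scale by.
        return []
    # 0.6745 makes the MAD comparable to a standard deviation for normal data.
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


# Made-up hourly event counts with one obvious spike.
hourly_events = [102, 98, 101, 97, 103, 99, 100, 250]
print(mad_outliers(hourly_events))
```

The median-based score is less sensitive to the spike itself than a mean and standard deviation would be, which is why it is a common first choice for event-count monitoring.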

[Figure: custom unification pipeline; labels: Departments, Business Intelligence, internal processes, variety of event types and structures]

www.stylight.com
sergii.khomenko@stylight.com
@lc0d3r

Related links
1. Testing Your Code - The Hitchhiker's Guide to Python
2. https://hub.docker.com/_/r-base/
3. http://www.alpinelinux.org/
4. https://github.com/iron-io/dockers
5. Docker Hub: A new stack plus ecosystem partners automate developer workflows
6. Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components
7. https://github.com/cloudtools/troposphere
8. CloudFormation UpdatePolicy Attribute
9. https://www.terraform.io/
10. (Docker Compose + Docker Swarm) or Kubernetes
11. Google Cloud Functions
12. https://github.com/apex/apex
13. Streaming Data Processing with Amazon Kinesis and AWS Lambda
