2018 Women in Analytics Conference https://www.womeninanalytics.org/
Over the last year I’ve become obsessed with learning how to be a better "cloud computing evangelist to data scientists" - specifically to the R community. I’ve learned that this isn’t often an easy undertaking. Most people (data scientists or not) are skeptical of changing the tools and workflows they’ve come to rely on when those systems seem to be working. Resistance to change increases even further with barriers to quick adoption, such as having to teach yourself a completely new technology or framework. I’d like to give a talk about how working in the cloud changes data science and how exploring these tools can lead to a world of new possibilities at the intersection of DevOps and data analytics.
Topics to discuss:
- Working through functionality/engineering challenges with R in a cloud environment
- Opportunities to customize and craft your ideal version of R/RStudio
- Making and embracing a decision on what is “real" about your analysis or daily work (Chapter 6 in R for Data Science)
- Running multiple R instances in the cloud (and why you would want to)
- Becoming an R/Data Science collaboration wizard: building APIs with Plumber in the cloud
In this talk, we’ll describe NoSQL (“not-only SQL”) and document-oriented databases and the value they provide for data science companies like Uptake. We will walk through the unique challenges such datastores pose for data science workflows. To make these challenges and lessons learned concrete, we’ll explore data science workflows through a discussion of the development efforts that led to “uptasticsearch”, an R package released by the Uptake Data Science team to reduce friction in interacting with a document store called Elasticsearch. The talk will conclude with a discussion of recent developments in NoSQL technologies and implications for data scientists.
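The friction described here is easy to see in miniature: documents in a store like Elasticsearch are nested JSON, while most data science workflows expect flat, tabular data. A minimal sketch of that flattening step (in Python rather than R, with hypothetical field names, not the actual uptasticsearch implementation):

```python
# Flatten nested JSON "hits" from a document store into tabular rows.
# Document shape and field names are invented for illustration.

def flatten(doc, prefix=""):
    """Recursively flatten a nested dict into dot-separated columns."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row

hits = [
    {"machine": {"id": "m-1", "site": "A"}, "temp_c": 71.4},
    {"machine": {"id": "m-2", "site": "B"}, "temp_c": 64.9},
]

rows = [flatten(h) for h in hits]
# Each row now has flat columns: machine.id, machine.site, temp_c
```

Handling this reshaping (plus pagination and query construction) once, inside a package, is exactly the kind of friction reduction the talk describes.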
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to rise while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their experience of a successful migration of their data and workloads to the cloud.
What gets measured, gets managed; but what gets governed, generates real value. That's one major reason why data governance has risen to a top priority for most organizations. Another reason is the rapid onboarding of big data, which often comes from beyond the traditional firewall. And then there are the authorities: issues like privacy, security and fiduciary responsibility are combining to make data governance a must-have. Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why governance should be viewed as a positive change agent for the modern enterprise. He'll be briefed by Ron Huizenga of IDERA, who will discuss a practical, model-based approach to enterprise data governance, with a focus on Master Data Management.
Here is an overview of the Bridged framework that CodeData uses to deliver data-driven solutions to our customers. The Bridged framework covers all aspects of such solutions: strategy, leadership, process, technology, education and operations.
In this session, the Databricks product team provides a deeper dive into the machine learning announcements. Join us for a detailed demo that gives you insights into the latest innovations that simplify the ML lifecycle: from preparing data and discovering features to training and managing models in production.
Modern applications are data driven and data rich. The infrastructure your backends run on is a critical aspect of your environment and requires unique monitoring tools and techniques. In this webinar, learn what DataOps is and how critical good DataOps is to the integrity of your application. Intelligent APM for your data is critical to the success of modern applications. In this webinar you will learn:
- The power of APM tailored for data operations
- The importance of visibility into your data infrastructure
- How AIOps makes DataOps actionable
The document discusses moving healthcare data architecture to the cloud. It describes a large health system that implemented an enterprise data warehouse (EDW) on the cloud to provide cost savings and flexibility. This consolidated multiple clinical repositories and reduced infrastructure costs. It also describes an academic health center that integrated patient records across its organizations using a cloud-based EDW. This improved analytics and reduced operating costs by 50% while improving patient care. Both organizations benefited from the scalability, cost savings and innovation the cloud enabled for their clinical analytics and research.
Industry thought leaders Gaurav Dhillon and David Linthicum discuss the future of cloud integration and data management in the API economy. Topics from this webinar and the accompanying slides include: key considerations of today's CIOs, approaching the reality of the multi-cloud world and new solutions for managing cloud and on-premise data. To learn more, visit: http://www.snaplogic.com/.
The document introduces data engineering and provides an overview of the topic. It discusses (1) what data engineering is, how it has evolved with big data, and the required skills, (2) the roles of data engineers, data scientists, and data analysts in working with big data, and (3) the structure and schedule of an upcoming meetup on data engineering that will use an agile approach over monthly sprints.
This document provides an overview of big data concepts and technologies for managers. It discusses problems with relational databases for large, unstructured data and introduces NoSQL databases and Hadoop as solutions. It also summarizes common big data applications, frameworks like MapReduce, Spark, and Flink, and different NoSQL database categories including key-value, column-family, document, and graph stores.
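The four NoSQL categories the overview names can be contrasted by modeling the same record in each style. A sketch using plain Python structures (illustrative only, not tied to any particular database product):

```python
# The same customer record in the four NoSQL styles: key-value,
# document, column-family, and graph. All names are invented examples.

# Key-value: an opaque value looked up by a single key.
kv_store = {"customer:42": '{"name": "Ada", "city": "Columbus"}'}

# Document: a nested, queryable structure.
doc_store = {
    "customers": [
        {"_id": 42, "name": "Ada", "address": {"city": "Columbus"}}
    ]
}

# Column-family: rows whose columns are grouped into families.
column_family = {
    "customers": {
        42: {"profile": {"name": "Ada"}, "location": {"city": "Columbus"}}
    }
}

# Graph: nodes connected by typed edges.
nodes = {42: {"label": "Customer", "name": "Ada"},
         7: {"label": "City", "name": "Columbus"}}
edges = [(42, "LIVES_IN", 7)]
```

The choice among these shapes drives which queries are cheap: key lookups, nested-field filters, wide-row scans, or relationship traversals respectively.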
There are patterns for things such as domain-driven design, enterprise architectures, continuous delivery, microservices, and many others. But where are the data science and data engineering patterns? Sometimes, data engineering reminds me of cowboy coding - many workarounds, immature technologies and lack of market best practices.
This document discusses big data analysis and Hadoop. It begins by describing different stages of data analysis and roles of various personnel. It then discusses challenges of analyzing big data using traditional tools and how Hadoop addresses these challenges through its distributed architecture and MapReduce programming model. Several case studies are presented where companies have used Hadoop to perform large-scale data analysis. Key components of Hadoop like MapReduce, Pig, Hive and Mahout are also introduced.
Modern data processing environments resemble factory lines, transforming raw data into valuable data products. The lean principles that have successfully transformed manufacturing are equally applicable to data processing, and are well aligned with the new trend known as DataOps. In this presentation, we will explain how lean and DataOps principles can be implemented as technical data processing solutions and processes in order to eliminate waste and improve data innovation speed. We will go through how to eliminate the following types of waste in data processing systems:
* Cognitive waste - unclear source of truth, dependency sprawl, duplication, ambiguity.
* Operational waste - overhead for deployment, upgrades, and incident recovery.
* Delivery waste - friction and delay in development, testing, and deployment.
* Product waste - misalignment with business value, detachment from use cases, push-driven development, vanity quality assurance.
We will primarily focus on technical solutions, but some of the waste mentioned requires organisational refactoring to eliminate.
Keynote "#DBInsights" on 7 April. My views on DBAs' fears, doubts and opportunities in the age of DevOps, Cloud, Big Data, Open Source, bi-modal IT, pizza teams, you name it.
Boston Data Mining Meetup introduction slides from Big Data Infrastructure workshop - A hands-on introduction
The document discusses how open source software has disrupted traditional software development models. It describes how companies have had to adapt from closed, proprietary development to more open and community-based development. Specifically, companies have had to learn to release more control, work within open source communities, adopt agile development practices, and deal with the many unknowns of integrating diverse open source projects. The document suggests open source will continue pushing technological innovation and forcing the industry to evolve.
Our pitch at the Data-Driven NYC meetup on September 17th (http://datadrivennyc.com). Speaking about data scientists' pains and how Dataiku Data Science Studio can help them become more than Data Cleaners and Data Leak Fixers!
Lean Analytics is a set of rules to make data science more streamlined and productive. It touches on many aspects of what a data scientist should be and how a data science project should be defined in order to succeed. During this presentation Richard will cover where data science projects go wrong, how you should think about data science projects, what constitutes success in data science, and how you can measure progress. This session will be loaded with terms, stories and descriptions of project successes and failures. If you're wondering whether you're getting value out of data science, how to get more value out of it, or even whether you need it at all, then this talk is for you!
What you will take away from this session:
- Learn how to make your data science projects successful
- Evaluate how to track progress and report on the efficacy of data science solutions
- Understand the roles of engineers and data scientists
- Understand your options for processes and software
Machine learning applications are typically stitched together from hopes and dreams, shell scripts, cron jobs, home-grown schedulers, snippets of configuration clipped from multiple blog posts, thousands of hard-coded business rules, a.k.a. "our SQL corpus," and a few lines of training and testing code. Organizing all the moving parts into something maintainable and supportive of ongoing development is a challenge most teams have on their TODO list, roadmap, or tech debt pile. Getting ahead of the day-to-day demands and settling into a sane architecture often seems like an unattainable goal. The past several years have seen an explosion of tool-building in the data engineering and analytics area, including in Apache projects spanning the areas of search and information retrieval, job orchestration, file and stream formats, and machine learning libraries. In this talk we will cover our product and development teams' choices of architecture and tools, from data ingestion and storage, through transformations and processing, to presentation of results and publishing to web services, reports, and applications.
The Briefing Room with Dr. Robin Bloor and WhereScape Live Webcast on September 30, 2014 Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=bfff40f7c9645fc398770ea11152b148 The fueling of information systems will always require some effort, but a confluence of innovations is fundamentally changing how quickly and accurately it can be done. Gone are long cycle times for development. Today, organizations can embrace a more rapid and collaborative approach for building analytical applications and data warehouses. The key is to have business experts working hand-in-hand with data professionals as the solutions take shape, thus expediting the speed to valuable insights. Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he explains the changing nature of information design. He’ll be briefed by WhereScape President Mark Budzinski, who will discuss his company’s data warehouse automation solutions and how they enable collaborative development. He will share use cases that illustrate how, by aligning business and IT, organizations can enable faster and more agile data warehouse development. Visit InsideAnalysis.com for more information.
This document provides an overview of getting started with data science using Python. It discusses what data science is, why it is in high demand, and the typical skills and backgrounds of data scientists. It then covers popular Python libraries for data science like NumPy, Pandas, Scikit-Learn, TensorFlow, and Keras. Common data science steps are outlined including data gathering, preparation, exploration, model building, validation, and deployment. Example applications and case studies are discussed along with resources for learning including podcasts, websites, communities, books, and TV shows.
The Fourth International Workshop on RESTful Design, WS-REST 2013 REST in Brazil - Industry Keynote On learning REST, and its impact on the design of massive applications in Brazil
The Briefing Room with Dr. Robin Bloor and Cirro Live Webcast on February 11, 2014 Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=0ec1fa381886313cc06d841015c65898 As information ecosystems continue to expand, businesses are searching for ways to combine traditional analytics with a new source of insight: Big Data. But with data flooding in from all kinds of sources, fast access and performance at scale can easily become an issue. One effective approach for solving this challenge is data federation, a method that involves taking the analytical processing to the data, allowing streamlined access to multiple data sources without the expensive ETL overhead or building of semantic layers. Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he explains how the prevalence of distributed data calls for a new approach to Big Data. He will be briefed by Mark Theissen of Cirro, who will tout his company’s Data Hub, a data federation solution that provides a single point of access to all enterprise data assets without excessive data movements, preprocessing or staging. He will discuss how data federation differs from virtualization and ETL approaches, and demonstrate how a Cirro deployment solves the analytics challenge of integrating data silos across the data center – and the cloud – using the BI tools you already have on your desktop for real-time distributed analytics. Visit InsideAnalysis.com for more information.
At my first visit to SciPy in Latin America, I was able to review the history of PyData, SciPy, and NumFOCUS, and to discuss how to grow their communities and cooperate in the future. I also introduced OpenTeams as a way for open-source contributors to grow their reputation and build businesses.
In this talk, WeCloudData introduces the Hadoop/Spark ecosystem and how businesses use big data tools and platforms. For more detail about WeCloudData's big data for data scientist course please visit: https://weclouddata.com/data-science/
This document provides an overview of Drupal, including its history, community, features, and Drupal 8 updates. Some key points:
- Drupal is an open source content management system with a large, active community that has grown significantly since 2005.
- Drupal 8 is a major update that focuses on modernizing the codebase, improving performance, and adding new features like a responsive image module and improved multilingual support. Over 1,600 people contributed to Drupal 8.
- The Drupal community extends beyond code contributions: there are many ways for individuals and organizations of all skill levels to get involved through documentation, support, events, and more. Contributing back helps both Drupal and contributors.
Sentiment analysis uses natural language processing to classify opinions in text as positive, negative, or neutral. Analyzing Twitter data through sentiment analysis can provide insight into public opinion on various topics. The presentation described how sentiment analysis of Twitter data on road traffic could work, using Azure Cognitive Services and Logic Apps for processing without code. A demo then showed these Azure services in action for sentiment analysis.
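The classification step itself can be illustrated with a tiny lexicon-based scorer. This is a deliberate simplification of what a managed service like Azure's performs, and the word lists are invented for the example:

```python
# Minimal lexicon-based sentiment classifier: count positive versus
# negative words and label the text. Tiny lexicons, illustrative only.

POSITIVE = {"clear", "fast", "smooth", "good"}
NEGATIVE = {"jam", "blocked", "slow", "accident"}

def sentiment(text):
    """Label text positive/negative/neutral by lexicon word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("traffic is slow and blocked near the bridge"))  # negative
print(sentiment("roads clear and fast this morning"))            # positive
```

A hosted service replaces the hand-built lexicon with a trained model, which is why the no-code Logic Apps pipeline in the demo only has to pass tweets to the API and route the returned labels.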