Machine Learning at scale with GCP
using ML Engine & Python Dataflow
19/09/2017
Matthias Feys
2 About ML6/Datatonic
We are a team of data scientists, machine learning experts, software engineers and mathematicians. Our mission is to provide tailor-made systems to help your organization get smart, actionable insights from large data volumes.
+ Specialized Machine Learning partner of Google Cloud

Matthias Feys
GDE for ML and GCP
@FsMatt - matthiasfeys@gmail.com
3 Outline
1. Dissecting a Machine Learning Project
2. Mapping the components to GCP tools
3. Building a boilerplate example

4 Caveats
● Opinionated view
● Focus on tooling:
○ not a complete overview
○ no focus on models
○ no focus on business
5 Outline
1. Dissecting a Machine Learning Project
2. Mapping the components to GCP tools
3. Building a boilerplate example
6 Popular image of what ML is
Lots of data → Complex mathematics in multidimensional spaces → Magical results
7 In reality, ML is
Collect data → Organize data → Create model → Train model with organized data → Deploy trained model
...and iterate.
8 Concrete use case: predict stellar masses*
*data and problem of the 1st DSGhent hackathon: https://astrohack.org
Predict stellar mass based on:
● image of galaxy
● distance to galaxy
9 Collect & organize data
Input:
76k labelled galaxies:
● 1 grayscale image per galaxy, each in a separate CSV file
● all distances to galaxies & actual masses in a single CSV file
Output:
Training, validation & test examples:
● Features:
○ normalized image (matrix of predefined shape)
○ distance & derived features
● Label:
○ actual mass
10 Create & train model
Input:
Training, validation & test examples
Process:
● build a model that predicts labels based on features
● fit the model on the training data
● tune hyperparameters
● iterate
Output:
Best model fitted on the training data with optimal hyperparameters
11 Deploy trained model
Input:
Best model fitted on the training data with optimal hyperparameters
Output:
An API that accepts input data for a single galaxy:
● 1 grayscale image
● distance to galaxy
and returns the predicted stellar mass
12 Outline
1. Dissecting a Machine Learning Project
2. Mapping the components to GCP tools
3. Building a boilerplate example
13 Google Cloud Products
● Compute
● Storage
● Data & Analytics
● Machine Learning
14 GCP: Open Cloud Philosophy
● Powerful open source frameworks that run everywhere: TensorFlow, Apache Beam
● Fully managed services to run them more easily: Cloud Machine Learning Engine, Cloud Dataflow
15 Mapping to GCP Products
● Collect data → Cloud Storage
● Organize data → Cloud Dataflow
● Create model → Tensorflow
● Train model with organized data → Cloud Machine Learning Engine
● Deploy trained model → Cloud Machine Learning Engine
16 Datalab/Jupyter notebooks to experiment & iterate
The same workflow as before (collect data → organize data → create model → train model with organized data → deploy trained model), with Datalab/Jupyter notebooks driving the iteration across all steps.
17 Google Cloud Storage (GCS)
● Object storage service
● Your data lives here:
○ raw input data
○ cleaned examples for TF models
○ serialized Tensorflow models
● Single interface/API, multiple offerings:

Name            Access frequency
Multi-Regional  Frequent, cross-regional
Regional        Frequent, single-region
Nearline        Less than once per month
Coldline        Less than once per year
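
Everything on GCS is addressed by a gs:// URI, so the same code path works locally and in the cloud. As a minimal sketch (the bucket and file names here are hypothetical), TensorFlow's file API can read GCS objects directly:

    # Read a CSV header straight from GCS (TF 1.x API, current at the time).
    # tf.gfile handles local paths and gs:// URIs through the same interface.
    import tensorflow as tf

    with tf.gfile.Open('gs://my-bucket/raw/metadata.csv') as f:
        header = f.readline()
        print(header)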
18 Apache Beam running on Cloud Dataflow
● Open source, unified model for defining both batch and streaming data-parallel processing pipelines.
● Using one of the open source Beam SDKs, you build a program that defines the pipeline.
● The pipeline is then executed by one of Beam's supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.
[Figure: the Beam model. Pipelines are constructed with Beam Java, Beam Python or other language SDKs, and executed by Fn runners on Apache Flink, Apache Spark or Cloud Dataflow.]
Source: https://beam.apache.org
19 Apache Beam key concepts
● Pipelines: a data processing job made of a series of computations, including input, processing, and output
● PCollections: bounded (or unbounded) datasets which represent the input, intermediate and output data in pipelines
● PTransforms: a data processing step in a pipeline, in which one or more PCollections are an input and output
● I/O Sources and Sinks: APIs for reading and writing data, which are the roots and endpoints of the pipeline
Source: https://beam.apache.org
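
To make these four concepts concrete, here is a minimal Beam Python pipeline (the file names are hypothetical; with no runner specified it executes locally on the DirectRunner):

    import apache_beam as beam

    with beam.Pipeline() as pipeline:                          # Pipeline
        (pipeline
         | 'Read' >> beam.io.ReadFromText('galaxies.csv')      # I/O source -> PCollection
         | 'ParseDistance' >> beam.Map(                        # PTransform
               lambda line: float(line.split(',')[1]))
         | 'Format' >> beam.Map(str)
         | 'Write' >> beam.io.WriteToText('distances'))        # I/O sink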
20 Apache Beam running on Cloud Dataflow
● Fully managed data processing service to run Apache Beam pipelines:
○ automated and optimized work partitioning, which can dynamically rebalance lagging work
○ horizontal dynamic autoscaling of worker resources
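
Switching the same pipeline from local execution to Cloud Dataflow is only a matter of pipeline options. For example (the project, bucket and region names are hypothetical):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Same pipeline code as before; only the options select the Dataflow runner.
    options = PipelineOptions(
        runner='DataflowRunner',        # run on the managed service instead of locally
        project='my-gcp-project',
        region='europe-west1',
        temp_location='gs://my-bucket/tmp')

    with beam.Pipeline(options=options) as pipeline:
        (pipeline
         | 'Read' >> beam.io.ReadFromText('gs://my-bucket/galaxies.csv')
         | 'Write' >> beam.io.WriteToText('gs://my-bucket/out/galaxies'))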
21 Collect & organize data with Cloud Dataflow
Astrohack use case steps:
1. Read: metadata from 1 CSV file, image data from 76k CSV files
2. Combine the datasets
3. Preprocess and build Tensorflow examples
4. Split into train/validation/test sets
5. Write to TFRecords on GCS
22 Collect & organize data with Cloud Dataflow
[Figure: the pipeline code from the talk, annotated with steps 1-5 above; a hedged reconstruction follows.]
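
The original slide showed the pipeline code with the five steps annotated. A hedged reconstruction of its shape (all function, feature and path names here are hypothetical, not the actual astrohack code):

    import random
    import apache_beam as beam
    import tensorflow as tf

    def to_example(record):
        """Step 3: build a serialized tf.train.Example from one joined record."""
        distance, mass = record  # image features omitted for brevity
        example = tf.train.Example(features=tf.train.Features(feature={
            'distance': tf.train.Feature(float_list=tf.train.FloatList(value=[distance])),
            'label': tf.train.Feature(float_list=tf.train.FloatList(value=[mass])),
        }))
        return example.SerializeToString()

    with beam.Pipeline() as p:
        examples = (p
            # Steps 1-2: read the metadata CSV (reading and joining the 76k
            # image CSV files is omitted here for brevity).
            | 'ReadMetadata' >> beam.io.ReadFromText('gs://my-bucket/metadata.csv')
            | 'Parse' >> beam.Map(lambda line: (float(line.split(',')[1]),
                                                float(line.split(',')[2])))
            | 'ToExample' >> beam.Map(to_example))
        # Step 4: random 80/10/10 split into train/validation/test.
        train, validation, test = (examples
            | 'Split' >> beam.Partition(
                lambda ex, n: random.choices([0, 1, 2], weights=[8, 1, 1])[0], 3))
        # Step 5: write each split as TFRecord files on GCS.
        for name, split in [('train', train), ('validation', validation), ('test', test)]:
            split | 'Write_%s' % name >> beam.io.WriteToTFRecord(
                'gs://my-bucket/examples/%s' % name)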
23 Tensorflow
● Open-source library for machine learning
● Single API for multiple platforms/devices: CPU(s), GPU(s), TPU(s), mobile phones...
● Two-step approach:
○ construct your model as a computational graph
○ train your model by pushing data through the graph
● Big community with lots of SotA model implementations
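
A minimal illustration of the two-step approach with the TF 1.x API that was current at the time of this talk (the toy linear-regression data is made up):

    import numpy as np
    import tensorflow as tf

    # Step 1: construct the model as a computational graph (here y = w*x + b).
    x = tf.placeholder(tf.float32, shape=[None])
    y = tf.placeholder(tf.float32, shape=[None])
    w = tf.Variable(0.0)
    b = tf.Variable(0.0)
    loss = tf.reduce_mean(tf.square(w * x + b - y))
    train_op = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)

    # Step 2: train the model by pushing data through the graph.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(200):
            sess.run(train_op, feed_dict={x: np.arange(4.0), y: 2 * np.arange(4.0)})
        print(sess.run([w, b]))  # w should approach 2, b should approach 0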
24 ML Engine Training
● Tensorflow training as a service
● Data needs to be available online
● No fancy interface (only logging + Tensorboard)
● The same code can run locally to test on small datasets
● Nice features:
○ easy setup of (GPU) clusters for distributed Tensorflow models
○ automatic parallel hyperparameter tuning with Hypertune
25 ML Engine Training with contrib.learn and TFRecords on GCS
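The original slide showed a code screenshot. In the same spirit, a hedged sketch of a contrib.learn estimator reading TFRecords from GCS (the feature names and paths are hypothetical, not the code from the slide):

    import tensorflow as tf

    def input_fn():
        # Queue-based input pipeline (TF 1.x): read serialized tf.train.Examples
        # from TFRecord shards on GCS, then parse and batch them.
        filename_queue = tf.train.string_input_producer(
            tf.gfile.Glob('gs://my-bucket/examples/train*'))
        _, serialized = tf.TFRecordReader().read(filename_queue)
        parsed = tf.parse_single_example(serialized, features={
            'distance': tf.FixedLenFeature([1], tf.float32),
            'label': tf.FixedLenFeature([1], tf.float32),
        })
        batch = tf.train.batch(parsed, batch_size=64)
        label = batch.pop('label')
        return batch, label

    estimator = tf.contrib.learn.DNNRegressor(
        feature_columns=[tf.contrib.layers.real_valued_column('distance')],
        hidden_units=[32, 16],
        model_dir='gs://my-bucket/model')  # checkpoints & Tensorboard logs on GCS

    estimator.fit(input_fn=input_fn, steps=1000)

Because model_dir points at GCS, the same script runs unchanged on a laptop and on ML Engine, which is what makes local testing on small datasets practical.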
26 ML Engine Predictions
● Deploy a trained model as:
○ a model (container)
○ a version (the actual code)
● Predictions:
○ batch
○ online
● Autoscaling
27 ML Engine Predictions
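The original slide showed a screenshot. As a hedged sketch, online predictions can be requested from a deployed model through the Google API Python client (the project, model and feature names are hypothetical):

    from googleapiclient import discovery

    # Build a client for the ML Engine v1 API (uses application default credentials).
    service = discovery.build('ml', 'v1')
    name = 'projects/my-gcp-project/models/stellar_mass'  # append /versions/v1 to pin a version

    response = service.projects().predict(
        name=name,
        body={'instances': [{'distance': 12.3}]}  # one input record per instance
    ).execute()

    print(response['predictions'])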
28 GCP components for ML summarized
Google Cloud Storage
● All your data lives here
● Very fast access from other GCP services
Cloud Dataflow
● Best place to run Apache Beam Pipelines
● Automatically & dynamically scales to 1000’s of cores
Cloud ML Engine
● Best place to train & deploy Tensorflow models
● Easy access to clusters of GPU machines to:
○ Train distributed models on multiple machines
○ Automate hyperparameter tuning (in parallel)
● Autoscales the number of serving nodes in response to request traffic
29 Outline
1. Dissecting a Machine Learning Project
2. Mapping the components to GCP tools
3. Building a boilerplate example
30 Boilerplate example
● Starter kit to deploy your Tensorflow project on Google Cloud Platform
● Structure based on the ML Engine samples, simplified
● Usage:
○ run locally on a sample of the dataset (faster for small datasets)
○ run in the cloud on the complete dataset (faster for large datasets)
● Functionalities:
○ preprocess data with Apache Beam (make TFRecords)
○ train a Tensorflow model
○ deploy the Tensorflow model
○ scalable inference with ML Engine
31 Boilerplate example
Link: bit.ly/mlboilerplate

32 ANNEX
33 What I didn't cover
Ready-to-use Machine Learning models:
● Cloud Vision API
● Cloud Translation API
● Cloud Natural Language API
● Cloud Speech API
● Cloud Jobs API
● Cloud Video Intelligence
Other (big) data services:
● Cloud Datalab
● Cloud Dataproc
● Cloud Dataprep
● BigQuery
34 Further reading
Getting Started:
● GCP-ML boilerplate: https://github.com/Fematich/mlengine-boilerplate
● Apache Beam Programming Guide: https://beam.apache.org/documentation/programming-guide
● Tensorflow Getting Started: https://www.tensorflow.org/get_started
● ML Engine documentation: https://cloud.google.com/ml-engine/docs
● Machine Learning Workflow: https://cloud.google.com/ml-engine/docs/concepts/ml-solutions-overview
● gcloud commands: https://cloud.google.com/sdk/gcloud/reference/ml-engine/
Next Steps:
● Tensorflow Model Zoo: https://github.com/tensorflow/models
● Tensorflow Serving with Kubernetes: https://tensorflow.github.io/serving/serving_inception
● Tensorflow Transform (Preprocessing for ML Engine):
https://research.googleblog.com/2017/02/preprocessing-for-machine-learning-with.html
● Cloudml-magic (Jupyter Notebook magic cmds for ML Engine): https://github.com/hayatoy/cloudml-magic
● Best Practices for ML Engineering: http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf
● How HBO’s Silicon Valley built “Not Hotdog” with mobile TensorFlow, Keras & React Native:
http://bit.ly/2w0YWlh
Meetups (Belgium):
● https://www.meetup.com/GDG-Cloud-Belgium
● https://www.meetup.com/TensorFlow-Belgium
Thank you!
@FsMatt
