Deep Water is H2O's integration with multiple open source deep learning libraries such as TensorFlow, MXNet and Caffe. On top of the performance gains from GPU backends, Deep Water naturally inherits all H2O properties in scalability, ease of use and deployment. In this talk, I will go through the motivation and benefits of Deep Water. After that, I will demonstrate how to build and deploy deep learning models, with or without programming experience, using H2O's R/Python/Flow (Web) interfaces. Jo-fai (or Joe) is a data scientist at H2O.ai. Before joining H2O, he was in the business intelligence team at Virgin Media in the UK, where he developed data products to enable quick and smart business decisions. He also worked remotely for Domino Data Lab in the US as a data science evangelist, promoting products via blogging and giving talks at meetups. Joe has a background in water engineering. Before his data science journey, he was an EngD research engineer at the STREAM Industrial Doctorate Centre, working on machine learning techniques for drainage design optimization. Prior to that, he was an asset management consultant specializing in data mining and constrained optimization for the utilities sector in the UK and abroad. He also holds an MSc in Environmental Management and a BEng in Civil Engineering.
- H2O.ai is a company that provides an open-source machine learning platform called H2O.
- Their new project "Deep Water" integrates popular deep learning frameworks like TensorFlow, MXNet, and Caffe into H2O to enable distributed deep learning on GPUs for improved performance.
- This provides a unified interface for deep learning within H2O and allows users to easily build, stack, and deploy deep learning models from different frameworks.
This document provides an introduction and overview of machine learning with H2O and Python. It begins with background information about the presenter, Joe Chow, including his work experience and side projects. The agenda then outlines topics to be covered, including an introduction to H2O.ai the company and machine learning platform, followed by a Python tutorial and examples. The tutorial will cover importing and manipulating data, basic and advanced regression and classification models, and using H2O in the cloud.
These slides show how to approach a multi-class classification problem using H2O. The data used is an aggregated log of multiple systems that constantly provide information about their status, connections and traffic. In large organizations, these log datasets can be enormous and hard to attribute because of the number of sources, legacy systems, etc. In our example, we use a constructed response label for each source, then use H2O to classify the source of the data. Author Bio: Ashrith Barthur is a Security Scientist at H2O, currently working on algorithms that detect anomalous behaviour in user activities, network traffic, attacks, financial fraud and global money movement. He has a PhD from Purdue University in the field of information security, specializing in anomalous behaviour in the DNS protocol. Don’t forget to download H2O! http://www.h2o.ai/download/
Michal Malohlava talks about the PySparkling Water package for Spark and Python users. - Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai - To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
How Deep Learning Will Make Us More Human Again
While deep learning is taking over the AI space, most of us are struggling to keep up with the pace of innovation. Arno Candel shares success stories and challenges in training and deploying state-of-the-art machine learning models on real-world datasets. He will also share his insights into what the future of machine learning and deep learning might look like, and how to best prepare for it.
ISAX is a time series data compression algorithm that can group similar patterns in billions of time series datasets. It is implemented on H2O's distributed architecture and can be used for clustering, classification, anomaly detection, and predictive analytics on compressed time series data from fields like IoT, finance, bioinformatics, and image/sound processing. Examples of ISAX code in H2O are provided.
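The pattern-grouping idea behind ISAX can be illustrated with a tiny pure-Python SAX sketch (an illustration of the underlying symbolic compression only, not H2O's distributed implementation; the 4-symbol alphabet and normal-distribution breakpoints below are illustrative assumptions):

```python
import statistics

# Breakpoints that split a standard normal distribution into 4
# equi-probable regions (quartile values of N(0, 1)).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
SYMBOLS = "abcd"

def sax_word(series, n_segments):
    """Compress a numeric series into a short symbolic word."""
    # 1. z-normalize so the N(0, 1) breakpoints apply
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series) or 1.0
    z = [(x - mu) / sd for x in series]
    # 2. Piecewise Aggregate Approximation: mean of each segment
    seg_len = len(z) // n_segments
    means = [statistics.fmean(z[i * seg_len:(i + 1) * seg_len])
             for i in range(n_segments)]
    # 3. Discretize each segment mean into a symbol
    def symbol(v):
        for i, bp in enumerate(BREAKPOINTS):
            if v < bp:
                return SYMBOLS[i]
        return SYMBOLS[-1]
    return "".join(symbol(m) for m in means)

# Series with the same shape compress to the same short word, which
# is what makes grouping billions of series tractable.
print(sax_word([1, 2, 3, 4, 5, 6, 7, 8], 4))          # prints "abcd"
print(sax_word([10, 20, 30, 40, 50, 60, 70, 80], 4))  # prints "abcd"
```

Because both series z-normalize to the same shape, they map to the same word, so clustering or anomaly detection can operate on the compact words instead of the raw series.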
This document provides an introduction to H2O, an open source machine learning platform, and discusses potential Internet of Things (IoT) use cases for predictive maintenance and outlier detection. The document outlines Joe Chow's background and experience, provides an overview of H2O's capabilities including algorithms, interfaces, and exporting models for production. It then demonstrates how to use H2O for predictive maintenance on a dataset of sensor readings to predict equipment failures, and for outlier detection on the MNIST handwritten digits dataset to identify anomalous images.
Dmitry will show the audience how to get started with MXNet and build deep learning models to classify images, sound and text.
Erin LeDell's presentation on scalable machine learning in R with H2O from the Portland R User Group Meetup in Portland, 08.17.15
Note: Make sure to download the slides to get the high-resolution version! Also, you can find the webinar recording here (please also download for better quality): https://www.dropbox.com/s/72qi6wjzi61gs3q/H2ODeepLearningArnoCandel052114.mov Come hear how Deep Learning in H2O is unlocking never-before-seen performance for prediction! H2O is a google-scale open source machine learning engine for R and Big Data. Enterprises can now use all of their data without sampling and build intelligent applications. This live webinar introduces Distributed Deep Learning concepts, implementation and results from recent developments. Real-world classification and regression use cases from an eBay text dataset, the MNIST handwritten digits and cancer datasets will demonstrate the power of this game-changing technology.
Skutil brings the best of both worlds to H2O and sklearn, delivering an easy transition into the world of distributed computing that H2O offers, while providing the same, familiar interface that sklearn users have come to know and love.
The document summarizes a presentation given by Joe Chow on H2O at the BelgradeR Meetup. The agenda includes an introduction to H2O.ai the company, why H2O is useful, the H2O machine learning platform, Deep Water for deep learning, the latest H2O developments, and demos. Joe will introduce H2O's approach to machine learning, its distributed algorithms, its interfaces for R, Python and Flow, and Deep Water for distributed deep learning on GPUs with TensorFlow, MXNet or Caffe.
This document summarizes a presentation by Erin LeDell from H2O.ai about machine learning using the H2O software. H2O is an open-source machine learning platform that provides APIs for R, Python, Scala and other languages. It allows distributed machine learning on large datasets across clusters. The presentation covers H2O's architecture, algorithms like random forests and deep learning, and how to use H2O within R including loading data, training models, and running grid searches. It also discusses H2O on Spark via Sparkling Water and real-world use cases with customers.
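The grid-search step in that workflow can be sketched in plain Python (a single-machine illustration only; H2O's grid search distributes model training over a cluster, and the toy `score_fn` below is a hypothetical stand-in for real model validation):

```python
from itertools import product

def grid_search(score_fn, grid):
    """Try every parameter combination in `grid`; return the best."""
    keys = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        s = score_fn(params)          # in H2O this would train a model
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Toy "validation score" that peaks at ntrees=100, max_depth=5
# (hypothetical; a real score would come from cross-validation).
def score_fn(p):
    return -abs(p["ntrees"] - 100) - 10 * abs(p["max_depth"] - 5)

grid = {"ntrees": [50, 100, 200], "max_depth": [3, 5, 10]}
best, score = grid_search(score_fn, grid)
print(best)  # {'ntrees': 100, 'max_depth': 5}
```

The exhaustive loop over `itertools.product` is exactly what makes grid search expensive, and why distributing the per-combination training, as H2O does, pays off.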
This document summarizes a presentation given by Joe Chow on machine learning using H2O.ai's platform. The presentation covered: 1) An introduction to Joe and H2O.ai, including the company's mission to operationalize data science. 2) An overview of the H2O platform for machine learning, including its distributed algorithms, interfaces for R and Python, and model export capabilities. 3) A demonstration of deep learning using H2O's Deep Water integration with TensorFlow, MXNet, and Caffe, allowing users to build and deploy models across different frameworks.
1. The document summarizes steps towards integrating the H2O and Spark frameworks, including allowing data sharing between Spark and H2O. 2. A demonstration is shown of loading airline data from a CSV into a Spark SQL table, querying the table, and transferring the results to an H2O frame to run a GBM algorithm. 3. Next steps discussed include optimizing data transfers between Spark and H2O, developing an H2O backend for MLlib, and addressing open challenges in areas like transferring results and supporting Parquet.
H2O is widely used for machine learning projects. A TechCrunch article, published in January 2017 by John Mannes, reported that around 20% of Fortune 500 companies use H2O. Talk 1: Introduction to Scalable & Automatic Machine Learning with H2O In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts. Although H2O and other tools have made it easier for practitioners to train and deploy machine learning models at scale, there is still a fair bit of knowledge and background in data science that is required to produce high-performing machine learning models. In this presentation, Joe will introduce the AutoML functionality in H2O. H2O's AutoML provides an easy-to-use interface which automates the process of training a large, comprehensive selection of candidate models and a stacked ensemble model which, in most cases, will be the top performing model in the AutoML Leaderboard. Talk 2: Making Multimillion-dollar Baseball Decisions with H2O AutoML and Shiny Joe recently teamed up with IBM and Aginity to create a proof-of-concept "Moneyball" app for the IBM Think conference in Vegas. The original goal was to prove that different tools (e.g. H2O, Aginity AMP, IBM Data Science Experience, R and Shiny) could work together seamlessly for common business use cases. Little did Joe know, the app would be used by Ari Kaplan (the real "Moneyball" guy) to validate the future performance of some baseball players. Ari recommended one player to a Major League Baseball team. The player was signed the next day with a multimillion-dollar contract. This talk is about Joe's journey to a real "Moneyball" application. Bio: Jo-fai (or Joe) Chow is a data scientist at H2O.ai.
This document summarizes a presentation about H2O's machine learning platform and Deep Water distributed deep learning capabilities. The presentation introduces H2O, its open source in-memory machine learning platform, performance advantages, and interfaces for R, Python and Flow. Deep Water is introduced as H2O's integration with TensorFlow, MXNet and Caffe that provides a unified interface for distributed deep learning on GPUs. Examples are shown training convolutional neural networks on image datasets using Deep Water with different backends.
This is my Deep Water talk for the TensorFlow Paris meetup. Deep Water is H2O's integration with multiple open source deep learning libraries such as TensorFlow, MXNet and Caffe. On top of the performance gains from GPU backends, Deep Water naturally inherits all H2O properties in scalability, ease of use and deployment.
Machine Learning for Smarter Apps with Tom Kraljevic
The document provides an agenda and summary of a presentation on H2O.ai's machine learning platform and recent developments. The presentation includes an introduction to H2O.ai the company and why its platform H2O is useful. It demonstrates H2O's machine learning capabilities including deep learning, and discusses the latest features like XGBoost integration and automatic machine learning. Real-world examples and demos are also provided to illustrate how to use H2O with R, Python and via its web interface.
Erin LeDell presents Intro to H2O Machine Learning in Python at Galvanize Seattle, 02.02.16
H2O presentation at Trevor Hastie and Rob Tibshirani's Short Course on Statistical Learning & Data Mining IV: http://web.stanford.edu/~hastie/sldm.html PDF and Keynote version of the presentation available here: https://github.com/h2oai/h2o-meetups/tree/master/2017_04_06_SLDM4_H2O_New_Developments
Navdeep Gill @ Galvanize Seattle - May 2016
This document provides an introduction to big data analytics and Hadoop. It discusses: 1) The characteristics of big data, including scale, complexity, and speed of data generation. Big data requires new techniques and architectures to manage and extract value from large, diverse datasets. 2) An overview of Hadoop, an open-source framework for distributed storage and processing of large datasets across clusters of computers. Hadoop includes the Hadoop Distributed File System (HDFS) and the MapReduce programming model. 3) The course will teach students how to manage large datasets with Hadoop, write jobs in languages like Java and Python, and use tools like Pig, Hive, RHadoop and Mahout to perform advanced analytics on large datasets.
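The MapReduce programming model mentioned above can be sketched as a single-process word count (an illustration of the map/shuffle/reduce phases only; a real Hadoop job distributes each phase across a cluster and reads its input from HDFS):

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one input line.
    return [(w.lower(), 1) for w in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data needs new tools", "big clusters process big data"]
pairs = chain.from_iterable(map_phase(l) for l in lines)
counts = reduce_phase(shuffle(pairs))
print(counts["big"])  # 3
```

The value of the model is that the mapper and reducer are stateless per record and per key, so the framework can run them in parallel on thousands of machines without the programmer writing any distribution logic.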
This document introduces several big data technologies that are less well known than traditional solutions like Hadoop and Spark. It discusses Apache Flink for stream processing, Apache Samza for processing real-time data from Kafka, Google Cloud Dataflow which provides a managed service for batch and stream data processing, and StreamSets Data Collector for collecting and processing data in real-time. It also covers machine learning technologies like TensorFlow for building dataflow graphs, and cognitive computing services from Microsoft. The document aims to think beyond traditional stacks and learn from companies building pipelines at scale.
H2O.ai is a machine learning company founded in 2012 with 35 employees based in Mountain View, CA. It was started by Stanford engineers and is an open source leader in machine and deep learning. H2O's software provides interfaces for R, Python, Spark and Hadoop and expands predictive analytics capabilities to large datasets across many industries. The executive team is led by CEO Sri Satish Ambati and CTO Cliff Click, and the scientific advisory council includes experts from Stanford like Trevor Hastie and Stephen Boyd.
This session was recorded in San Francisco on February 5th, 2019 and can be viewed here: https://youtu.be/CgoxjmdyMiU This session will discuss how to get up and running quickly with containerized H2O environments (H2O Flow, Sparkling Water, and Driverless AI) at scale, in a multi-tenant architecture with a shared pool of resources using CPUs and/or GPUs. See how you can spin up (and tear down) your H2O environments on demand, with just a few mouse clicks. Find out how to enable quota management of GPU resources for greater efficiency, and easily connect your compute to your datasets for large-scale distributed machine learning. Learn how to operationalize your machine learning pipelines and deliver faster time-to-value for your AI initiative, while ensuring enterprise-grade security and high performance. Bio: Nanda Vijaydev is senior director of solutions at BlueData (now HPE), where she leverages technologies like Hadoop, Spark, and TensorFlow to build solutions for enterprise analytics and machine learning use cases. Nanda has 10 years of experience in data management and data science. Previously, she worked on data science and big data projects in multiple industries, including healthcare and media; was a principal solutions architect at Silicon Valley Data Science; and served as director of solutions engineering at Karmasphere. Nanda has an in-depth understanding of the data analytics and data management space, particularly in the areas of data integration, ETL, warehousing, reporting, and machine learning.
This document provides an overview of architecting a first big data implementation. It defines key concepts like Hadoop, NoSQL databases, and real-time processing. It recommends asking questions about data, technology stack, and skills before starting a project. Distributed file systems, batch tools, and streaming systems like Kafka are important technologies for big data architectures. The document emphasizes moving from batch to real-time processing as a major opportunity.
This document summarizes a presentation given by Joe Chow on machine learning using H2O.ai's platform. The presentation introduced H2O, its products like Deep Water for deep learning, and demonstrated examples of building models with R and Python. It showed how H2O provides a unified interface for TensorFlow, MXNet and Caffe, allowing users to easily build and deploy deep learning models with different frameworks. The document provided an overview of the company and platform capabilities like scalable algorithms, model export and multiple language interfaces like R and Python.
This document provides an agenda and overview for a conference session on Big Data and NoSQL for database and BI professionals held from April 10-12 in Chicago, IL. The session will include an overview of big data and NoSQL technologies, then deeper dives into Hadoop, NoSQL databases like HBase, and tools like Hive, Pig, and Sqoop. There will also be demos of technologies like HDInsight, Elastic MapReduce, Impala, and running MapReduce jobs.
This presentation provides an overview of big data open source technologies. It defines big data as large amounts of data from various sources in different formats that traditional databases cannot handle. It discusses that big data technologies are needed to analyze and extract information from extremely large and complex data sets. The top technologies are divided into data storage, analytics, mining and visualization. Several prominent open source technologies are described for each category, including Apache Hadoop, Cassandra, MongoDB, Apache Spark, Presto and ElasticSearch. The presentation provides details on what each technology is used for and its history.
This document discusses Apache Dremio, an open source data virtualization platform that provides self-service SQL access to data sources like Elasticsearch, MongoDB, HDFS, and relational databases. It aims to make data analytics faster by avoiding the need for data staging, warehouses, cubes, and extracts. Dremio uses techniques like reflections, pushdowns, and a universal relational algebra to optimize queries and leverage caches. It is based on projects like Apache Drill, Calcite, Arrow, and Parquet and can be deployed on Hadoop or the cloud. The presentation includes a demo of using Dremio to create datasets, curate/prepare data, accelerate queries with reflections, and manage resources.
This document summarizes a presentation on using SQL Server Integration Services (SSIS) with HDInsight. It introduces Tillmann Eitelberg and Oliver Engels, who are experts on SSIS and HDInsight. The agenda covers traditional ETL processes, challenges of big data, useful Apache Hadoop components for ETL, clarifying statements about Hadoop and ETL, using Hadoop in the ETL process, how SSIS is more than just an ETL tool, tools for working with HDInsight, getting started with Azure HDInsight, and using SSIS to load and transform data on HDInsight clusters.
Erin LeDell's presentation on Intro to H2O Machine Learning in R at SCU
This document provides an overview of H2O.ai, a leading AI platform company. It discusses that H2O.ai was founded in 2012, is funded with $75 million, and has products including its open source H2O machine learning platform and its Driverless AI automated machine learning product. It also describes H2O.ai's leadership in the machine learning platform market according to Gartner, its team of 90 AI experts, and its global presence across several offices. Finally, it outlines H2O.ai's machine learning capabilities and how customers can use its platform and products.
This document provides an agenda and overview of a talk on big data and data science given by Peter Wang. The key points covered include:
- An honest perspective on big data trends and challenges over time.
- Architecting systems for data exploration and analysis using tools like Continuum Analytics' Blaze and Numba libraries.
- Python's role in data science for its ecosystem of libraries and accessibility to domain experts.
“AGI should be open source and in the public domain at the service of humanity and the planet.”
This document provides an overview of H2O.ai, an AI company that offers products and services to democratize AI. It mentions that H2O products are backed by 10% of the world's top data scientists from Kaggle and that H2O has customers in 7 of the top 10 banks, 4 of the top 10 insurance companies, and top manufacturing companies. It also provides details on H2O's founders, funding, customers, products, and vision to make AI accessible to more organizations.
Here are some key points about benchmarking and evaluating generative AI models like large language models:
- Foundation models require large, diverse datasets to be trained on in order to learn broad language skills and knowledge. Fine-tuning can then improve performance on specific tasks.
- Popular benchmarks evaluate models on tasks involving things like commonsense reasoning, mathematics, science questions, generating truthful vs false responses, and more. This helps identify model capabilities and limitations.
- Custom benchmarks can also be designed using tools like Eval Studio to systematically test models on specific applications or scenarios. Both automated and human evaluations are important.
- Leaderboards like HELM aggregate benchmark results to compare how different models perform across a wide range of tests and metrics.
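A custom benchmark of the kind described above boils down to a loop over (prompt, expected) pairs with a scoring rule. A minimal sketch, assuming an exact-match metric and a hypothetical model function (this is not Eval Studio's actual API):

```python
def evaluate(model_fn, benchmark):
    """Score a model on (prompt, expected) pairs; exact-match metric."""
    hits = 0
    for prompt, expected in benchmark:
        answer = model_fn(prompt).strip().lower()
        hits += answer == expected.lower()
    return hits / len(benchmark)

# Tiny hand-written benchmark (illustrative only).
benchmark = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
]

def toy_model(prompt):
    # Placeholder "model" so the harness is runnable end to end;
    # a real run would call an LLM here.
    return {"What is 2 + 2?": "4"}.get(prompt, "unknown")

score = evaluate(toy_model, benchmark)
print(score)  # 0.5
```

Real harnesses add more metrics (e.g. partial credit, LLM-as-judge, human review) on top of this same loop, which is why both automated and human evaluations matter.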
Pritika Mehta, Co-Founder, Butternut.ai H2O Open Source GenAI World SF 2023
The document discusses LLMOps (Large Language Model Operations) compared to traditional MLOps. Some key points:
- LLMOps and MLOps face similar challenges across the development lifecycle, but LLMOps requires more GPU resources and integration is faster due to more models in each application. Evaluation is also less clear.
- The LLMOps field is around the 5th generation of models, with debates around proprietary vs open source models, and balancing privacy, cost and control.
- LLMOps platforms are emerging to provide solutions for tasks like prompting, embedding databases, evaluation, and governance, similar to how MLOps platforms have evolved.
The document discusses optimizing question answering systems called RAG (Retrieve-and-Generate) stacks. It outlines challenges with naive RAG approaches and proposes solutions like improved data representations, advanced retrieval techniques, and fine-tuning large language models. Table stakes optimizations include tuning chunk sizes, prompt engineering, and customizing LLMs. More advanced techniques involve small-to-big retrieval, multi-document agents, embedding fine-tuning, and LLM fine-tuning.