Strata San Jose 2016: Scalable Ensemble Learning with H2O

Scalable Ensemble Learning with H2O
Erin LeDell Ph.D. 
Machine Learning Scientist 
H2O.ai
San Jose, CA March 2016

Introduction
• Statistician & Machine Learning Scientist at H2O.ai in
Mountain View, California, USA
• Ph.D. in Biostatistics with Designated Emphasis in
Computational Science and Engineering from  
UC Berkeley (focus on Machine Learning)
• Worked as a data scientist at several startups

Agenda
• Ensemble Learning
• Super Learner Algorithm / Stacking
• H2O Machine Learning Platform
• H2O Ensemble package
• R Code Demo

Ensemble Learning
In statistics and machine learning,
ensemble methods use multiple
learning algorithms to obtain
better predictive performance
than could be obtained by any of
the constituent algorithms.
 
— Wikipedia (2016)

Common Types of Ensemble Methods
• Also reduces variance and increases accuracy
• Not robust against outliers or noisy data
• Flexible — can be used with any loss function
Bagging
Boosting
Stacking
• Reduces variance and increases accuracy
• Robust against outliers or noisy data
• Often used with Decision Trees (i.e. Random Forest)
• Used to ensemble a diverse group of strong learners
• Involves training a second-level machine learning
algorithm called a “metalearner” to learn the  
optimal combination of the base learners

History of Stacking
• Leo Breiman, “Stacked Regressions” (1996)
• Modified algorithm to use CV to generate level-one data
• Blended Neural Networks and GLMs (separately)
Stacked
Generalization
Stacked
Regressions
Super Learning
• David H. Wolpert, “Stacked Generalization” (1992)
• First formulation of stacking via a metalearner
• Blended Neural Networks
• Mark van der Laan et al., “Super Learner” (2007)
• Provided the theory to prove that the Super Learner is
the asymptotically optimal combination
• First R implementation in 2010

The Super Learner Algorithm
• Start with design matrix, X, and response, y
• Specify L base learners (with model params)
• Specify a metalearner (just another algorithm)
• Perform k-fold CV on each of the L learners
“Level-zero”  
data

The Super Learner Algorithm
• Collect the predicted values from k-fold CV that was
performed on each of the L base learners
• Column-bind these prediction vectors together to
form a new design matrix, Z
• Train the metalearner using Z, y
“Level-one”  
data

Super Learning vs. Parameter Tuning/Search
• A common task in machine learning is to perform model selection by
specifying a number of models with different parameters.
• An example of this is Grid Search or Random Search.
• The first phase of the Super Learner algorithm is computationally
equivalent to performing model selection via cross-validation.
• The latter phase of the Super Learner algorithm (the metalearning step)
is just training another single model (no CV).
• With Super Learner, your computation does not go to waste!

H2O Platform Overview
• Distributed implementations of cutting edge ML algorithms.
• Core algorithms written in high performance Java.
• APIs available in R, Python, Scala, REST/JSON.
• Interactive Web GUI.

H2O Platform Overview
• Write code in high-level language like R (or use the web
GUI) and output production-ready models in Java.
• To scale, just add nodes to your H2O cluster.
• Works with Hadoop, Spark and your laptop.

H2O Ensemble Overview
• H2O Ensemble implements the Super Learner algorithm.
• Super Learner finds the optimal combination of a
combination of a collection of base learning algorithms.
ML Tasks
Super Learner
Why
Ensembles?
• When a single algorithm does not approximate the true
prediction function well.
• Win Kaggle competitions!
• Regression
• Binary Classification
• Roadmap: Support for multi-class classification

H2O Ensemble
Lasso GLM
Ridge GLM
Random 
Forest
GBMRectifier 
DNN
Maxout  
DNN

How to Win Kaggle
https://www.kaggle.com/c/GiveMeSomeCredit/leaderboard/private

How to Win Kaggle
https://www.kaggle.com/c/GiveMeSomeCredit/forums/t/1166/congratulations-to-the-winners/7229#post7229

How to Win Kaggle
https://www.kaggle.com/c/GiveMeSomeCredit/forums/t/1166/congratulations-to-the-winners/7230#post7230

Stacking with Random Grids
New H2O Ensemble function in v0.1.8:
h2o.stack
http://tinyurl.com/h2o-randomgrid-stack-demo
Strata San Jose Exclusive!!

H2O Ensemble Resources
H2O Ensemble training guide:
http://tinyurl.com/learn-h2o-ensemble
H2O Ensemble homepage on Github:
http://tinyurl.com/github-h2o-ensemble
H2O Ensemble R Demos:
http://tinyurl.com/h2o-ensemble-demos

Where to learn more?
• H2O Online Training (free): http://learn.h2o.ai
• H2O Slidedecks: http://www.slideshare.net/0xdata
• H2O Video Presentations: https://www.youtube.com/user/0xdata
• H2O Community Events & Meetups: http://h2o.ai/events
• Machine Learning & Data Science courses: http://coursebuffet.com

Thank you!
@ledell on Github, Twitter
erin@h2o.ai
http://www.stat.berkeley.edu/~ledell

Strata San Jose 2016: Scalable Ensemble Learning with H2O

Related slideshows

Recommended for you

Recommended for you

Recommended for you

Recommended for you

Recommended for you

Recommended for you

Recommended for you

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Strata San Jose 2016: Scalable Ensemble Learning with H2O

Similar to Strata San Jose 2016: Scalable Ensemble Learning with H2O (20)

More from Sri Ambati

More from Sri Ambati (20)

Recently uploaded

Recently uploaded (20)

Strata San Jose 2016: Scalable Ensemble Learning with H2O