SlideShare a Scribd company logo
Stacked Ensembles in H2O
Erin LeDell Ph.D.

Machine Learning Scientist
February 2017
Agenda
• Who/What is H2O?
• Ensemble Learning Overview
• Stacking / Super Learner
• Why Stacking?
• Grid Search & Stacking
• Stacking with Third-party Algos
• AutoML and Stacking
H2O.ai
H2O.ai, the
Company
H2O, the
Platform
• Founded in 2012
• Stanford & Purdue Math & Systems Engineers
• Headquarters: Mountain View, California, USA
• Open Source Software (Apache 2.0 Licensed)
• R, Python, Scala, Java and Web Interfaces
• Distributed algorithms that scale to “Big Data”
Scientific Advisory Council
• John A. Overdeck Professor of Mathematics, Stanford University
• PhD in Statistics, Stanford University
• Co-author, The Elements of Statistical Learning: Prediction, Inference and Data Mining
• Co-author with John Chambers, Statistical Models in S
• Co-author, Generalized Additive Models
Dr. Trevor Hastie
• Professor of Statistics and Health Research and Policy, Stanford University
• PhD in Statistics, Stanford University
• Co-author, The Elements of Statistical Learning: Prediction, Inference and Data Mining
• Author, Regression Shrinkage and Selection via the Lasso
• Co-author, An Introduction to the Bootstrap
Dr. Robert Tibshirani
• Professor of Electrical Engineering and Computer Science, Stanford University
• PhD in Electrical Engineering and Computer Science, UC Berkeley
• Co-author, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
• Co-author, Linear Matrix Inequalities in System and Control Theory
• Co-author, Convex Optimization
Dr. Steven Boyd

Recommended for you

ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217

How Deep Learning Will Make Us More Human Again While deep learning is taking over the AI space, most of us are struggling to keep up with the pace of innovation. Arno Candel shares success stories and challenges in training and deploying state-of-the-art machine learning models on real-world datasets. He will also share his insights into what the future of machine learning and deep learning might look like, and how to best prepare for it. - Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai - To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata

artificial intelligencemachine learningdata science
Analyzing Data With Python
Analyzing Data With PythonAnalyzing Data With Python
Analyzing Data With Python

Sarah Guido gave a presentation on analyzing data with Python. She discussed several Python tools for preprocessing, analysis, and visualization including Pandas for data wrangling, scikit-learn for machine learning, NLTK for natural language processing, MRjob for processing large datasets in parallel, and ggplot for visualization. For each tool, she provided examples and use cases. She emphasized that the best tools depend on the type of data and analysis needs.

mrjobpythonscikit-learn
Introduction to Analytics with Azure Notebooks and Python
Introduction to Analytics with Azure Notebooks and PythonIntroduction to Analytics with Azure Notebooks and Python
Introduction to Analytics with Azure Notebooks and Python

Introduction to Analytics with Azure Notebooks and Python for Data Science and Business Intelligence. This is one part of a full day workshop on moving from BI to Analytics

pythonazure notebooksazure
H2O Distributed Computing
H2O Cluster
H2O Frame
• Multi-node cluster with shared memory model.
• All computations in memory.
• Each node sees only some rows of the data.
• No limit on cluster size.
• Distributed data frames (collection of vectors).
• Columns are distributed (across nodes) arrays.
• Works just like R’s data.frame or Python Pandas
DataFrame
Introduction to Stacking
Ensemble Learning
In statistics and machine learning,
ensemble methods use multiple
learning algorithms to obtain
better predictive performance
than could be obtained by any of
the constituent algorithms.


— Wikipedia
Common Types of Ensemble Methods
• Also reduces variance and increases accuracy
• Not robust against outliers or noisy data
• Flexible — can be used with any loss function
Bagging
Boosting
Stacking
• Reduces variance and increases accuracy
• Robust against outliers or noisy data
• Often used with Decision Trees (i.e. Random Forest)
• Used to ensemble a diverse group of strong learners
• Involves training a second-level machine learning
algorithm called a “metalearner” to learn the 

optimal combination of the base learners

Recommended for you

Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013

Ted Willke, Principal Engineer/GM, Intel Labs: "Avoiding Cluster-Scale Headaches with Better Tools for Data Quality and Feature Engineering"

"feature engineering"
Agile data science with scala
Agile data science with scalaAgile data science with scala
Agile data science with scala

How the data science pipelines have to evolve and how it'll be accessible using the right technologies from Scala and the Spark Notebook.

spark notebookapache cassandraakka
Intro to Python Data Analysis in Wakari
Intro to Python Data Analysis in WakariIntro to Python Data Analysis in Wakari
Intro to Python Data Analysis in Wakari

This document summarizes an introduction to data analysis in Python using Wakari. It discusses why Python is a good language for data analysis, highlighting key Python packages like NumPy, Pandas, Matplotlib and IPython. It also introduces Wakari, a browser-based Python environment for collaborative data analysis and reproducible research. Wakari allows sharing of code, notebooks and data through a web link. The document recommends several talks at the PyData conference on efficient computing, machine learning and interactive plotting.

wakariprogrammingpresentation
Stacking (aka Super Learner Algorithm)
• Start with design matrix, X, and response, y
• Specify L base learners (with model params)
• Specify a metalearner (just another algorithm)
• Perform k-fold CV on each of the L learners
“Level-zero” 

data
Stacking (aka Super Learner Algorithm)
• Collect the predicted values from k-fold CV that was
performed on each of the L base learners
• Column-bind these prediction vectors together to
form a new design matrix, Z
• Train the metalearner using Z, y
“Level-one” 

data
Stacking vs. Parameter Tuning/Search
• A common task in machine learning is to perform model selection by
specifying a number of models with different parameters.
• An example of this is Grid Search or Random Search.
• The first phase of the Super Learner algorithm is computationally
equivalent to performing model selection via cross-validation.
• The latter phase of the Super Learner algorithm (the metalearning step)
is just training another single model (no CV).
• With Stacking, your computation does not go to waste!
Why Stacked Ensembles?

Recommended for you

Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...

We’ve all heard that AI is going to become as ubiquitous in the enterprise as the telephone, but what does that mean exactly? Everyone in IBM has a telephone; and everyone knows how to use her telephone; and yet IBM isn’t a phone company. How do we bring AI to the same standard of ubiquity — where everyone in a company has access to AI and knows how to use AI; and yet the company is not an AI company? In this talk, we’ll break down the challenges a domain expert faces today in applying AI to real-world problems. We’ll talk about the challenges that a domain expert needs to overcome in order to go from “I know a model of this type exists” to “I can tell an application developer how to apply this model to my domain.” We’ll conclude the talk with a live demo that show cases how a domain expert can cut through the five stages of model deployment in minutes instead of days using IBM and other open source tools.

apache sparksparkaisummit
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...

This document discusses pandas, a popular Python library for data analysis, and its limitations. It introduces Badger, a new project from DataPad that aims to address some of pandas' shortcomings like slow performance on large datasets and lack of tight database integration. The creator describes Badger as using compressed columnar storage, immutable data structures, and C kernels to perform analytics queries much faster than pandas or databases on benchmark tests of a multi-million row dataset. He envisions Badger becoming a distributed, multicore analytics platform that can also be used for ETL jobs.

Building Better Analytics Workflows (Strata-Hadoop World 2013)
Building Better Analytics Workflows (Strata-Hadoop World 2013)Building Better Analytics Workflows (Strata-Hadoop World 2013)
Building Better Analytics Workflows (Strata-Hadoop World 2013)

Wes McKinney discusses challenges in building better analytics workflows. He notes the increasing scale of data and need for more advanced analytics has led more people to learn programming. However, current tools have issues with inefficient workflows, lack of collaboration, and friction between different parts of the analytics process. McKinney advocates for more integrated environments that enhance collaboration and make data science more accessible to address these problems.

How to Win Kaggle
https://www.kaggle.com/c/GiveMeSomeCredit/leaderboard/private
How to Win Kaggle
https://www.kaggle.com/c/GiveMeSomeCredit/forums/t/1166/congratulations-to-the-winners/7229#post7229
How to Win Kaggle
https://www.kaggle.com/c/GiveMeSomeCredit/forums/t/1166/congratulations-to-the-winners/7230#post7230
h2oEnsemble R package
& Stacked Ensemble in h2o

Recommended for you

Machine Learning with Spark
Machine Learning with SparkMachine Learning with Spark
Machine Learning with Spark

This slide deck is from a webinar "Everything you always wanted to know about Machine Learning but did not know where to ask"

big datahadoopspark
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2OStrata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2O

This document discusses scalable ensemble learning using the H2O platform. It provides an overview of ensemble methods like bagging, boosting, and stacking. The stacking or Super Learner algorithm trains a "metalearner" to optimally combine the predictions from multiple "base learners". The H2O platform and its Ensemble package implement Super Learner and other ensemble methods for tasks like regression and classification. An R code demo is presented on training ensembles with H2O.

h2o.aidatadata science
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data CitizenEnabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen

These slides are from my talk at the NYC Python Meetup at ODSC Office NYC on February 17, 2016. It discusses Python's architectural challenges to interoperate with the Hadoop ecosystem and how a new project, Apache Arrow, will help.

pandasbig datahadoop
Evolution of H2O Ensemble
• h2oEnsemble R package in 2015
• Ensemble logic ported to Java in late 2016
• Stacked Ensemble method in h2o in early 2017
• R & Python APIs supported
• In progress: Custom metalearners
• In progress: MOJO for production use
Stacking with 

Random Grids
H2O Cartesian Grid Search
H2O Random Grid Search

Recommended for you

Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python

This document provides an introduction and agenda for a tutorial on machine learning with H2O and Python. The introduction discusses the presenter's background and qualifications. The agenda outlines topics to be covered including an overview of H2O.ai as a company and machine learning platform, tutorials on using the H2O Python module to import data, build regression and classification models, and improve model performance through techniques like cross-validation, grid search, and stacking. Case study notebooks and examples will be used to demonstrate key machine learning concepts and the H2O framework in Python.

h2opythonmachine learning
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone

H2O Deep Water is a tool that integrates distributed deep learning with H2O's machine learning platform. It allows users to build, stack, and deploy deep learning models from libraries like TensorFlow, MXNet, and Caffe through a unified interface. Deep Water inherits properties from H2O like scalability, ease of use, and deployment capabilities. It also makes deep learning more accessible by supporting popular network architectures and allowing easy ensemble of deep models with other H2O algorithms.

deep learningcaffèopen source
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and Distributed

This document summarizes machine learning scalability from single machine to distributed systems. It discusses how true scalability is about how long it takes to reach a target accuracy level using any available hardware resources. It introduces GraphLab Create and SFrame/SGraph for scalable machine learning and graph processing. Key points include distributed optimization techniques, graph partitioning strategies, and benchmarks showing GraphLab Create can solve problems faster than other systems by using fewer machines.

data sciencedatasciencesummit2015machine learning
Stacking with Random Grids (h2o R)
Stacking with Random Grids (h2o Python)
H2O Stacking Resources
H2O Stacked Ensembles docs & code demo:
http://tinyurl.com/h2o-stacked-ensembles
h2oEnsemble R package homepage on Github:
http://tinyurl.com/github-h2o-ensemble
Third-Party Integrations

Recommended for you

Spark + H20 = Machine Learning at scale
Spark + H20 = Machine Learning at scaleSpark + H20 = Machine Learning at scale
Spark + H20 = Machine Learning at scale

Slides for a presentation I gave for the Machine Learning with Spark Tokyo meetup. Introduction to Spark, H2O, SparklingWater and live demos of GBM and DL.

big datamachine learningpython
Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015

Spark & GraphX for recommendation algorithms - presented at a Netflix-hosted Spark Meetup, on 05/19/2015

ldagraphxnetflix
Using Machine Learning For Solving Time Series Probelms
Using Machine Learning For Solving Time Series ProbelmsUsing Machine Learning For Solving Time Series Probelms
Using Machine Learning For Solving Time Series Probelms

The document discusses using machine learning for solving time series problems. It outlines typical steps which include converting irregular time series data to regular samples, preprocessing the data through techniques like tapped delay lines and dynamic filters to capture dynamic aspects for static models, and then applying machine learning. Both static models where dynamic preprocessing is used and dynamic models like recurrent neural networks are suitable. Example applications include time series forecasting, classification, and control/decision making problems.

Ensemble H2O with Anything
• XGBoost will be available in the next major
release of H2O, so you can use it with the
Stacked Ensemble method
• https://github.com/h2oai/h2o-3/pull/699
A powerful combo: H2O + XGBoost
Third party stacking with H2O:
• SuperLearner, subsemble, mlr & caret R packages
support stacking with H2O for small/medium data
AutoML
H2O AutoML
Public code coming soon!
• AutoML stands for “Automatic Machine Learning”
• The idea here is to remove most (or all) of the parameters
from the algorithm, as well as automatically generate
derived features that will aid in learning.
• Single algorithms are tuned automatically using a
carefully constructed random grid search.
• Optionally, a Stacked Ensemble can be constructed.
H2O Resources
• H2O Online Training: http://learn.h2o.ai
• H2O Tutorials: https://github.com/h2oai/h2o-tutorials
• H2O Meetup Materials: https://github.com/h2oai/h2o-meetups
• H2O Video Presentations: https://www.youtube.com/user/0xdata
• H2O Community Events & Meetups: https://h2o.ai/events

Recommended for you

Deep Water - GPU Deep Learning for H2O - Arno Candel
Deep Water - GPU Deep Learning for H2O - Arno CandelDeep Water - GPU Deep Learning for H2O - Arno Candel
Deep Water - GPU Deep Learning for H2O - Arno Candel

1) The document discusses the history and development of artificial intelligence, machine learning, and deep learning from the 1950s to present. 2) It introduces H2O.ai's Deep Water platform for deep learning which leverages popular open source tools like TensorFlow, MXNet, and Caffe to enable large-scale deep learning on CPUs and GPUs. 3) Deep Water allows users to easily train, compare, and deploy deep learning models through H2O's APIs in R, Python, and Flow for big data use cases like image, video, text, and time series analysis.

machine learningcaffèdeep water
Deep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoDeep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry Larko

Dmitry will show the audience on how get started with Mxnet and building Deep Learning models to classify images, sound and text. - Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai - To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata

h2o open dallasdeep learningpredictive analytics
H2O AutoML roadmap - Ray Peck
H2O AutoML roadmap - Ray PeckH2O AutoML roadmap - Ray Peck
H2O AutoML roadmap - Ray Peck

Ray Peck from H2O.ai talks about the roadmap for the upcoming AutoML product in H2O. - Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai - To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata

h2o open dallasautomated machine learningmachine learning
Thank you!
@ledell on Github, Twitter
erin@h2o.ai
http://www.stat.berkeley.edu/~ledell

More Related Content

What's hot

Scala: the unpredicted lingua franca for data science
Scala: the unpredicted lingua franca  for data scienceScala: the unpredicted lingua franca  for data science
Scala: the unpredicted lingua franca for data science
Andy Petrella
 
H2O intro at Dallas Meetup
H2O intro at Dallas MeetupH2O intro at Dallas Meetup
H2O intro at Dallas Meetup
Sri Ambati
 
Intro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara UniversityIntro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara University
Sri Ambati
 
ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217
Sri Ambati
 
Analyzing Data With Python
Analyzing Data With PythonAnalyzing Data With Python
Analyzing Data With Python
Sarah Guido
 
Introduction to Analytics with Azure Notebooks and Python
Introduction to Analytics with Azure Notebooks and PythonIntroduction to Analytics with Azure Notebooks and Python
Introduction to Analytics with Azure Notebooks and Python
Jen Stirrup
 
Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013
MLconf
 
Agile data science with scala
Agile data science with scalaAgile data science with scala
Agile data science with scala
Andy Petrella
 
Intro to Python Data Analysis in Wakari
Intro to Python Data Analysis in WakariIntro to Python Data Analysis in Wakari
Intro to Python Data Analysis in Wakari
Karissa Rae McKelvey
 
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Databricks
 
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Wes McKinney
 
Building Better Analytics Workflows (Strata-Hadoop World 2013)
Building Better Analytics Workflows (Strata-Hadoop World 2013)Building Better Analytics Workflows (Strata-Hadoop World 2013)
Building Better Analytics Workflows (Strata-Hadoop World 2013)
Wes McKinney
 
Machine Learning with Spark
Machine Learning with SparkMachine Learning with Spark
Machine Learning with Spark
elephantscale
 
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2OStrata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Sri Ambati
 
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data CitizenEnabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
Wes McKinney
 
Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
Sri Ambati
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
Jo-fai Chow
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and Distributed
Turi, Inc.
 
Spark + H20 = Machine Learning at scale
Spark + H20 = Machine Learning at scaleSpark + H20 = Machine Learning at scale
Spark + H20 = Machine Learning at scale
Mateusz Dymczyk
 
Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015
Yves Raimond
 

What's hot (20)

Scala: the unpredicted lingua franca for data science
Scala: the unpredicted lingua franca  for data scienceScala: the unpredicted lingua franca  for data science
Scala: the unpredicted lingua franca for data science
 
H2O intro at Dallas Meetup
H2O intro at Dallas MeetupH2O intro at Dallas Meetup
H2O intro at Dallas Meetup
 
Intro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara UniversityIntro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara University
 
ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217
 
Analyzing Data With Python
Analyzing Data With PythonAnalyzing Data With Python
Analyzing Data With Python
 
Introduction to Analytics with Azure Notebooks and Python
Introduction to Analytics with Azure Notebooks and PythonIntroduction to Analytics with Azure Notebooks and Python
Introduction to Analytics with Azure Notebooks and Python
 
Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013
 
Agile data science with scala
Agile data science with scalaAgile data science with scala
Agile data science with scala
 
Intro to Python Data Analysis in Wakari
Intro to Python Data Analysis in WakariIntro to Python Data Analysis in Wakari
Intro to Python Data Analysis in Wakari
 
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
 
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
 
Building Better Analytics Workflows (Strata-Hadoop World 2013)
Building Better Analytics Workflows (Strata-Hadoop World 2013)Building Better Analytics Workflows (Strata-Hadoop World 2013)
Building Better Analytics Workflows (Strata-Hadoop World 2013)
 
Machine Learning with Spark
Machine Learning with SparkMachine Learning with Spark
Machine Learning with Spark
 
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2OStrata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2O
 
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data CitizenEnabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
 
Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and Distributed
 
Spark + H20 = Machine Learning at scale
Spark + H20 = Machine Learning at scaleSpark + H20 = Machine Learning at scale
Spark + H20 = Machine Learning at scale
 
Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015
 

Viewers also liked

Using Machine Learning For Solving Time Series Probelms
Using Machine Learning For Solving Time Series ProbelmsUsing Machine Learning For Solving Time Series Probelms
Using Machine Learning For Solving Time Series Probelms
Sri Ambati
 
Deep Water - GPU Deep Learning for H2O - Arno Candel
Deep Water - GPU Deep Learning for H2O - Arno CandelDeep Water - GPU Deep Learning for H2O - Arno Candel
Deep Water - GPU Deep Learning for H2O - Arno Candel
Sri Ambati
 
Deep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoDeep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry Larko
Sri Ambati
 
H2O AutoML roadmap - Ray Peck
H2O AutoML roadmap - Ray PeckH2O AutoML roadmap - Ray Peck
H2O AutoML roadmap - Ray Peck
Sri Ambati
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
Sri Ambati
 
Interpretable machine learning
Interpretable machine learningInterpretable machine learning
Interpretable machine learning
Sri Ambati
 
Cybersecurity with AI - Ashrith Barthur
Cybersecurity with AI - Ashrith BarthurCybersecurity with AI - Ashrith Barthur
Cybersecurity with AI - Ashrith Barthur
Sri Ambati
 
Sparkling Water 2.0 - Michal Malohlava
Sparkling Water 2.0 - Michal MalohlavaSparkling Water 2.0 - Michal Malohlava
Sparkling Water 2.0 - Michal Malohlava
Sri Ambati
 
H2O World - Ensembles with Erin LeDell
H2O World - Ensembles with Erin LeDellH2O World - Ensembles with Erin LeDell
H2O World - Ensembles with Erin LeDell
Sri Ambati
 
sparklyr - Jeff Allen
sparklyr - Jeff Allensparklyr - Jeff Allen
sparklyr - Jeff Allen
Sri Ambati
 
Applying Machine Learning using H2O
Applying Machine Learning using H2OApplying Machine Learning using H2O
Applying Machine Learning using H2O
Sri Ambati
 
The Joys of Clean Data with Matt Dowle
The Joys of Clean Data with  Matt DowleThe Joys of Clean Data with  Matt Dowle
The Joys of Clean Data with Matt Dowle
Sri Ambati
 
H2O Random Grid Search - PyData Amsterdam
H2O Random Grid Search - PyData AmsterdamH2O Random Grid Search - PyData Amsterdam
H2O Random Grid Search - PyData Amsterdam
Sri Ambati
 
H2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
H2O World - Benchmarking Open Source ML Platforms - Szilard PafkaH2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
H2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
Sri Ambati
 
Better Customer Experience with Data Science - Bernard Burg, Comcast
Better Customer Experience with Data Science - Bernard Burg, ComcastBetter Customer Experience with Data Science - Bernard Burg, Comcast
Better Customer Experience with Data Science - Bernard Burg, Comcast
Sri Ambati
 
H2O Advancements - Arno Candel
H2O Advancements - Arno CandelH2O Advancements - Arno Candel
H2O Advancements - Arno Candel
Sri Ambati
 
Comcast Enterprise Network Services
Comcast Enterprise Network ServicesComcast Enterprise Network Services
Comcast Enterprise Network Services
vcardona
 
Predicting Patient Outcomes in Real-Time at HCA
Predicting Patient Outcomes in Real-Time at HCAPredicting Patient Outcomes in Real-Time at HCA
Predicting Patient Outcomes in Real-Time at HCA
Sri Ambati
 
Steffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SFSteffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SF
MLconf
 
Visual Machine Learning - Tony Chu
 Visual Machine Learning - Tony Chu Visual Machine Learning - Tony Chu
Visual Machine Learning - Tony Chu
Sri Ambati
 

Viewers also liked (20)

Using Machine Learning For Solving Time Series Probelms
Using Machine Learning For Solving Time Series ProbelmsUsing Machine Learning For Solving Time Series Probelms
Using Machine Learning For Solving Time Series Probelms
 
Deep Water - GPU Deep Learning for H2O - Arno Candel
Deep Water - GPU Deep Learning for H2O - Arno CandelDeep Water - GPU Deep Learning for H2O - Arno Candel
Deep Water - GPU Deep Learning for H2O - Arno Candel
 
Deep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoDeep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry Larko
 
H2O AutoML roadmap - Ray Peck
H2O AutoML roadmap - Ray PeckH2O AutoML roadmap - Ray Peck
H2O AutoML roadmap - Ray Peck
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
 
Interpretable machine learning
Interpretable machine learningInterpretable machine learning
Interpretable machine learning
 
Cybersecurity with AI - Ashrith Barthur
Cybersecurity with AI - Ashrith BarthurCybersecurity with AI - Ashrith Barthur
Cybersecurity with AI - Ashrith Barthur
 
Sparkling Water 2.0 - Michal Malohlava
Sparkling Water 2.0 - Michal MalohlavaSparkling Water 2.0 - Michal Malohlava
Sparkling Water 2.0 - Michal Malohlava
 
H2O World - Ensembles with Erin LeDell
H2O World - Ensembles with Erin LeDellH2O World - Ensembles with Erin LeDell
H2O World - Ensembles with Erin LeDell
 
sparklyr - Jeff Allen
sparklyr - Jeff Allensparklyr - Jeff Allen
sparklyr - Jeff Allen
 
Applying Machine Learning using H2O
Applying Machine Learning using H2OApplying Machine Learning using H2O
Applying Machine Learning using H2O
 
The Joys of Clean Data with Matt Dowle
The Joys of Clean Data with  Matt DowleThe Joys of Clean Data with  Matt Dowle
The Joys of Clean Data with Matt Dowle
 
H2O Random Grid Search - PyData Amsterdam
H2O Random Grid Search - PyData AmsterdamH2O Random Grid Search - PyData Amsterdam
H2O Random Grid Search - PyData Amsterdam
 
H2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
H2O World - Benchmarking Open Source ML Platforms - Szilard PafkaH2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
H2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
 
Better Customer Experience with Data Science - Bernard Burg, Comcast
Better Customer Experience with Data Science - Bernard Burg, ComcastBetter Customer Experience with Data Science - Bernard Burg, Comcast
Better Customer Experience with Data Science - Bernard Burg, Comcast
 
H2O Advancements - Arno Candel
H2O Advancements - Arno CandelH2O Advancements - Arno Candel
H2O Advancements - Arno Candel
 
Comcast Enterprise Network Services
Comcast Enterprise Network ServicesComcast Enterprise Network Services
Comcast Enterprise Network Services
 
Predicting Patient Outcomes in Real-Time at HCA
Predicting Patient Outcomes in Real-Time at HCAPredicting Patient Outcomes in Real-Time at HCA
Predicting Patient Outcomes in Real-Time at HCA
 
Steffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SFSteffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SF
 
Visual Machine Learning - Tony Chu
 Visual Machine Learning - Tony Chu Visual Machine Learning - Tony Chu
Visual Machine Learning - Tony Chu
 

Similar to Stacked Ensembles in H2O

New Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionNew Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 Edition
Sri Ambati
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
MLconf
 
Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OScalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2O
Sri Ambati
 
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
MLconf
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin Ledell
Sri Ambati
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Databricks
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
Varad Meru
 
Scaling Analytics with Apache Spark
Scaling Analytics with Apache SparkScaling Analytics with Apache Spark
Scaling Analytics with Apache Spark
QuantUniversity
 
Machine Learning for (JVM) Developers
Machine Learning for (JVM) DevelopersMachine Learning for (JVM) Developers
Machine Learning for (JVM) Developers
Mateusz Dymczyk
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
Aly Abdelkareem
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
Sri Ambati
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
inside-BigData.com
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
Ted Xiao
 
2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato Review2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato Review
Hang Li
 
useR 2014 jskim
useR 2014 jskimuseR 2014 jskim
useR 2014 jskim
Jinseob Kim
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
Simplilearn
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Ramiro Aduviri Velasco
 
UNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptxUNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptx
BhagyasriPatel2
 
A view from the ivory tower: Participating in Apache as a member of academia
A view from the ivory tower: Participating in Apache as a member of academiaA view from the ivory tower: Participating in Apache as a member of academia
A view from the ivory tower: Participating in Apache as a member of academia
Michael Mior
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Herman Wu
 

Similar to Stacked Ensembles in H2O (20)

New Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionNew Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 Edition
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
 
Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OScalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2O
 
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin Ledell
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Scaling Analytics with Apache Spark
Scaling Analytics with Apache SparkScaling Analytics with Apache Spark
Scaling Analytics with Apache Spark
 
Machine Learning for (JVM) Developers
Machine Learning for (JVM) DevelopersMachine Learning for (JVM) Developers
Machine Learning for (JVM) Developers
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
 
2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato Review2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato Review
 
useR 2014 jskim
useR 2014 jskimuseR 2014 jskim
useR 2014 jskim
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
UNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptxUNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptx
 
A view from the ivory tower: Participating in Apache as a member of academia
A view from the ivory tower: Participating in Apache as a member of academiaA view from the ivory tower: Participating in Apache as a member of academia
A view from the ivory tower: Participating in Apache as a member of academia
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 

More from Sri Ambati

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
Sri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
Sri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
Sri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
Sri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
Sri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
Sri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
Sri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
Sri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
Sri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
Sri Ambati
 

More from Sri Ambati (20)

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 

Recently uploaded

Password Rotation in 2024 is still Relevant
Password Rotation in 2024 is still RelevantPassword Rotation in 2024 is still Relevant
Password Rotation in 2024 is still Relevant
Bert Blevins
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
ishalveerrandhawa1
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
Adam Dunkels
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
Emerging Tech
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
Enterprise Wired
 
Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
Andrey Yasko
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Erasmo Purificato
 
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
UiPathCommunity
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
SynapseIndia
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
Lidia A.
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
Safe Software
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Mydbops
 
Cookies program to display the information though cookie creation
Cookies program to display the information though cookie creationCookies program to display the information though cookie creation
Cookies program to display the information though cookie creation
shanthidl1
 
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
Matthew Sinclair
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
RaminGhanbari2
 
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
Toru Tamaki
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
Mark Billinghurst
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
Kief Morris
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
SynapseIndia
 
Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers
Awais Yaseen
 

Recently uploaded (20)

Password Rotation in 2024 is still Relevant
Password Rotation in 2024 is still RelevantPassword Rotation in 2024 is still Relevant
Password Rotation in 2024 is still Relevant
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
 
Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
 
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
Cookies program to display the information though cookie creation
Cookies program to display the information though cookie creationCookies program to display the information though cookie creation
Cookies program to display the information though cookie creation
 
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
 
Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers
 

Stacked Ensembles in H2O

  • 1. Stacked Ensembles in H2O Erin LeDell Ph.D.
 Machine Learning Scientist February 2017
  • 2. Agenda • Who/What is H2O? • Ensemble Learning Overview • Stacking / Super Learner • Why Stacking? • Grid Search & Stacking • Stacking with Third-party Algos • AutoML and Stacking
  • 3. H2O.ai H2O.ai, the Company H2O, the Platform • Founded in 2012 • Stanford & Purdue Math & Systems Engineers • Headquarters: Mountain View, California, USA • Open Source Software (Apache 2.0 Licensed) • R, Python, Scala, Java and Web Interfaces • Distributed algorithms that scale to “Big Data”
  • 4. Scientific Advisory Council • John A. Overdeck Professor of Mathematics, Stanford University • PhD in Statistics, Stanford University • Co-author, The Elements of Statistical Learning: Prediction, Inference and Data Mining • Co-author with John Chambers, Statistical Models in S • Co-author, Generalized Additive Models Dr. Trevor Hastie • Professor of Statistics and Health Research and Policy, Stanford University • PhD in Statistics, Stanford University • Co-author, The Elements of Statistical Learning: Prediction, Inference and Data Mining • Author, Regression Shrinkage and Selection via the Lasso • Co-author, An Introduction to the Bootstrap Dr. Robert Tibshirani • Professor of Electrical Engineering and Computer Science, Stanford University • PhD in Electrical Engineering and Computer Science, UC Berkeley • Co-author, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers • Co-author, Linear Matrix Inequalities in System and Control Theory • Co-author, Convex Optimization Dr. Steven Boyd
  • 5. H2O Distributed Computing H2O Cluster H2O Frame • Multi-node cluster with shared memory model. • All computations in memory. • Each node sees only some rows of the data. • No limit on cluster size. • Distributed data frames (collection of vectors). • Columns are distributed (across nodes) arrays. • Works just like R’s data.frame or Python Pandas DataFrame
  • 7. Ensemble Learning In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained by any of the constituent algorithms. 
 — Wikipedia
  • 8. Common Types of Ensemble Methods • Also reduces variance and increases accuracy • Not robust against outliers or noisy data • Flexible — can be used with any loss function Bagging Boosting Stacking • Reduces variance and increases accuracy • Robust against outliers or noisy data • Often used with Decision Trees (i.e. Random Forest) • Used to ensemble a diverse group of strong learners • Involves training a second-level machine learning algorithm called a “metalearner” to learn the 
 optimal combination of the base learners
  • 9. Stacking (aka Super Learner Algorithm) • Start with design matrix, X, and response, y • Specify L base learners (with model params) • Specify a metalearner (just another algorithm) • Perform k-fold CV on each of the L learners “Level-zero” 
 data
  • 10. Stacking (aka Super Learner Algorithm) • Collect the predicted values from k-fold CV that was performed on each of the L base learners • Column-bind these prediction vectors together to form a new design matrix, Z • Train the metalearner using Z, y “Level-one” 
 data
  • 11. Stacking vs. Parameter Tuning/Search • A common task in machine learning is to perform model selection by specifying a number of models with different parameters. • An example of this is Grid Search or Random Search. • The first phase of the Super Learner algorithm is computationally equivalent to performing model selection via cross-validation. • The latter phase of the Super Learner algorithm (the metalearning step) is just training another single model (no CV). • With Stacking, your computation does not go to waste!
  • 13. How to Win Kaggle https://www.kaggle.com/c/GiveMeSomeCredit/leaderboard/private
  • 14. How to Win Kaggle https://www.kaggle.com/c/GiveMeSomeCredit/forums/t/1166/congratulations-to-the-winners/7229#post7229
  • 15. How to Win Kaggle https://www.kaggle.com/c/GiveMeSomeCredit/forums/t/1166/congratulations-to-the-winners/7230#post7230
  • 16. h2oEnsemble R package & Stacked Ensemble in h2o
  • 17. Evolution of H2O Ensemble • h2oEnsemble R package in 2015 • Ensemble logic ported to Java in late 2016 • Stacked Ensemble method in h2o in early 2017 • R & Python APIs supported • In progress: Custom metalearners • In progress: MOJO for production use
  • 20. H2O Random Grid Search
  • 21. Stacking with Random Grids (h2o R)
  • 22. Stacking with Random Grids (h2o Python)
  • 23. H2O Stacking Resources H2O Stacked Ensembles docs & code demo: http://tinyurl.com/h2o-stacked-ensembles h2oEnsemble R package homepage on Github: http://tinyurl.com/github-h2o-ensemble
  • 25. Ensemble H2O with Anything • XGBoost will be available in the next major release of H2O, so you can use it with the Stacked Ensemble method • https://github.com/h2oai/h2o-3/pull/699 A powerful combo: H2O + XGBoost Third party stacking with H2O: • SuperLearner, subsemble, mlr & caret R packages support stacking with H2O for small/medium data
  • 27. H2O AutoML Public code coming soon! • AutoML stands for “Automatic Machine Learning” • The idea here is to remove most (or all) of the parameters from the algorithm, as well as automatically generate derived features that will aid in learning. • Single algorithms are tuned automatically using a carefully constructed random grid search. • Optionally, a Stacked Ensemble can be constructed.
  • 28. H2O Resources • H2O Online Training: http://learn.h2o.ai • H2O Tutorials: https://github.com/h2oai/h2o-tutorials • H2O Meetup Materials: https://github.com/h2oai/h2o-meetups • H2O Video Presentations: https://www.youtube.com/user/0xdata • H2O Community Events & Meetups: https://h2o.ai/events
  • 29. Thank you! @ledell on Github, Twitter erin@h2o.ai http://www.stat.berkeley.edu/~ledell