© 2018 KNIME AG. All rights reserved.
Leveraging H2O Machine Learning
with KNIME Analytics Platform
Christian Dietz
KNIME
H2O Distributed Machine Learning Algorithms
Supervised Learning
Statistical Analysis
• Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie
• Naïve Bayes
Ensembles
• Distributed Random Forest: Classification or regression models
• Gradient Boosting Machine: Produces an ensemble of decision trees with increasingly refined approximations
Deep Neural Networks
• Deep learning: Creates multi-layer feed-forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations
Unsupervised Learning
Clustering
• K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detects optimal k
Dimensionality Reduction
• Principal Component Analysis: Linearly transforms correlated variables to independent components
• Generalized Low Rank Models: Extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data
Anomaly Detection
• Autoencoders: Find outliers via nonlinear dimensionality reduction with deep learning
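To make the Gradient Boosting Machine description more concrete, here is a minimal sketch of boosting for regression with squared loss. It is not H2O's implementation: it assumes a single numeric feature, depth-1 trees (stumps), and hypothetical function names (`fit_stump`, `fit_gbm`, `predict_gbm`, `mse`) chosen only for illustration. Each new stump is fitted to the residuals of the current ensemble, which is one way to read "increasingly refined approximations".

```python
# Minimal gradient boosting for regression with squared loss.
# Illustrative only: one feature, depth-1 trees (stumps); NOT H2O's implementation.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate split points (needs >= 2 distinct x values)
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]

def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv

def fit_gbm(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def predict_gbm(model, x):
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, x) for s in stumps)

def mse(model, xs, ys):
    return sum((predict_gbm(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Hypothetical toy data, invented for this sketch.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 2, 2, 5, 5, 9, 9]
model = fit_gbm(xs, ys)
```

On this toy data the training error shrinks as more stumps are added, which can be checked by comparing `mse` for models trained with different `n_trees` values.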
Platforms with H2O Integration
H2O + KNIME Talk
at KNIME Summit
March 2018
KNIME®
• KNIME AG founded in 2008
• Offices in Zurich (HQ), Konstanz, Berlin, and Austin
• Maintainer of the Open Source KNIME Analytics Platform
– comprehensive data loading, processing, analysis, modeling platform
– visual frontend
– open to all sorts of data, other tools (R, Python, etc.), and various user personas
– 20+ open source releases since 2006
– open source.
• KNIME Server
– 14 commercial product releases since 2008
• KNIME cloud offerings
KNIME® Software
KNIME® Analytics Platform
• Analysis & Mining: Statistics; Data Mining; Machine Learning; Deep Learning; Web Analytics; Text Mining; Network Analysis; Social Media Analysis; R, Weka, Python, H2O; Community / 3rd
• Data Access: MySQL, Oracle, ...; SAS, SPSS, ...; Excel, Flat, ...; Hive, Impala, ...; XML, JSON, PMML; Text, Doc, Image, ...; Web Crawlers; Industry Specific; Community / 3rd
• Transformation: Row, Column, Matrix; Text, Image; Time Series; Java; Python; Community / 3rd
• Visualization: R; Python; JavaScript; Community / 3rd
• Deployment: via BIRT; PMML; XML, JSON; Databases; Excel, Flat, etc.; Text, Doc, Image; Industry Specific; Community / 3rd
Over 2000 Native and Embedded Nodes Included
KNIME H2O Machine Learning Integration
• Offer our users high-performance machine learning algorithms from H2O in KNIME
• Allow users to mix and match with other KNIME functionality
– KNIME Analytics Platform data wrangling functionality
– KNIME Big-Data Connectors
– Text Mining, Image Processing, Cheminformatics, …
– and more!
The Data
Date        Store ID          Visitors
2016-01-01  ba937bf13d40fb24  28
…           …                 …
2017-04-22  324f7c39a8410e7c  216

Date        Store ID          Visitors
2017-04-23  e8ed9335d0c38333  ?
…           …                 …
2017-05-31  8f13ef0f5e8c64dd  ?

Provided data:
• Number of visitors
• Reservations
• Store information
• Calendar date info
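As a rough illustration of how calendar date info can be derived from the Date column, the sketch below enriches visit rows with day-of-week features. The two sample rows reuse the values shown in the first table; the field names (`date`, `store_id`, `visitors`, `day_of_week`, `month`, `is_weekend`) and the helper `add_calendar_features` are assumptions made for this example, not names from the slides or from any KNIME node.

```python
from datetime import date

# Sample rows taken from the first table; the dictionary keys are hypothetical.
visits = [
    {"date": "2016-01-01", "store_id": "ba937bf13d40fb24", "visitors": 28},
    {"date": "2017-04-22", "store_id": "324f7c39a8410e7c", "visitors": 216},
]

def add_calendar_features(row):
    """Return a copy of the row with simple calendar features added."""
    d = date.fromisoformat(row["date"])
    enriched = dict(row)
    enriched["day_of_week"] = d.weekday()      # Monday = 0 ... Sunday = 6
    enriched["month"] = d.month
    enriched["is_weekend"] = d.weekday() >= 5  # Saturday or Sunday
    return enriched

prepared = [add_calendar_features(r) for r in visits]
```

Features like these give a model a chance to pick up weekly and seasonal patterns that the raw date string does not expose.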
Visitor Forecasting
Data preparation → Model training → Model optimization → Model evaluation → Deployment
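The training and evaluation stages of such a pipeline can be sketched with a deliberately simple baseline: predict each store's mean visitor count and measure the error on held-out rows. This is only an assumed stand-in for the H2O models used in the talk; the store names, visitor counts, and the functions `train_mean_model`, `predict`, and `rmse` are all hypothetical.

```python
import math
from collections import defaultdict

def train_mean_model(rows):
    """Training step: learn the mean visitor count per store."""
    totals = defaultdict(lambda: [0.0, 0])  # store_id -> [sum, count]
    for store_id, visitors in rows:
        totals[store_id][0] += visitors
        totals[store_id][1] += 1
    overall = (sum(t[0] for t in totals.values())
               / sum(t[1] for t in totals.values()))
    per_store = {s: t[0] / t[1] for s, t in totals.items()}
    return per_store, overall

def predict(model, store_id):
    """Scoring step: fall back to the overall mean for unseen stores."""
    per_store, overall = model
    return per_store.get(store_id, overall)

def rmse(model, rows):
    """Evaluation step: root mean squared error on (store_id, visitors) rows."""
    errors = [(predict(model, s) - v) ** 2 for s, v in rows]
    return math.sqrt(sum(errors) / len(errors))

# Hypothetical data, invented for this sketch.
train = [("store_a", 20), ("store_a", 30), ("store_b", 200), ("store_b", 220)]
holdout = [("store_a", 25), ("store_b", 210), ("store_c", 100)]

model = train_mean_model(train)
holdout_error = rmse(model, holdout)
```

A baseline like this gives the optimization stage a reference point: a tuned model is only worth deploying if its hold-out error is clearly lower.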
Data Preparation with KNIME Nodes
Modeling with the H2O Nodes
Blend H2O with…Python, Java and R Scripting…
…Image Processing...
…Deep Learning...
...Text Processing...
...Databases...
...a growing Big Data Integration.
H2O Sparkling Water in KNIME
Scoring with H2O MOJOs on Apache Spark
Thank You!
www.knime.com
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by
KNIME AG under license from KNIME GmbH, and are registered in the United States.
KNIME® is also registered in Germany.

How RPA Help in the Transportation and Logistics Industry.pptx
SynapseIndia
 
Quality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of TimeQuality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of Time
Aurora Consulting
 
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
Toru Tamaki
 
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
Matthew Sinclair
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
SynapseIndia
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Erasmo Purificato
 
What's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptxWhat's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptx
Stephanie Beckett
 
Choose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
rajancomputerfbd
 

Recently uploaded (20)

[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
 
20240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 202420240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 2024
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
 
Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
 
Quality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of TimeQuality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of Time
 
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
 
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
 
What's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptxWhat's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptx
 
Choose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
 

H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI World London

  • 1. © 2018 KNIME AG. All rights reserved. Leveraging H2O Machine Learning with KNIME Analytics Platform Christian Dietz KNIME
  • 2. H2O Distributed Machine Learning Algorithms
     Supervised Learning
     Statistical Analysis
     • Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie
     • Naïve Bayes
     Ensembles
     • Distributed Random Forest: Classification or regression models
     • Gradient Boosting Machine: Produces an ensemble of decision trees with increasingly refined approximations
     Deep Neural Networks
     • Deep learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations
     Unsupervised Learning
     Clustering
     • K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detect optimal k
     Dimensionality Reduction
     • Principal Component Analysis: Linearly transforms correlated variables to independent components
     • Generalized Low Rank Models: extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data
     Anomaly Detection
     • Autoencoders: Find outliers using a nonlinear dimensionality reduction using deep learning
  • 3. Platforms with H2O Integration H2O + KNIME Talk at KNIME Summit March 2018
  • 4. KNIME®
     • KNIME AG founded in 2008
     • Offices in Zurich (HQ), Konstanz, Berlin, and Austin
     • Maintainer of the Open Source KNIME Analytics Platform
       – comprehensive data loading, processing, analysis, modeling platform
       – visual frontend
       – open: to all sorts of data, other tools (R and Python, etc.), various user personas
       – 20+ open source releases since 2006
       – open source.
     • KNIME Server
       – 14 commercial product releases since 2008
     • KNIME cloud offerings
  • 5. KNIME® Software
  • 6. KNIME® Analytics Platform
  • 7. Over 2000 Native and Embedded Nodes Included
     • Data Access: MySQL, Oracle, ...; SAS, SPSS, ...; Excel, Flat, ...; Hive, Impala, ...; XML, JSON, PMML; Text, Doc, Image, ...; Web Crawlers; Industry Specific; Community / 3rd
     • Transformation: Row, Column Matrix; Text, Image; Time Series; Java; Python; Community / 3rd
     • Analysis & Mining: Statistics; Data Mining; Machine Learning; Deep Learning; Web Analytics; Text Mining; Network Analysis; Social Media Analysis; R, Weka, Python, H2O; Community / 3rd
     • Visualization: R; Python; JavaScript; Community / 3rd
     • Deployment: via BIRT; PMML; XML, JSON; Databases; Excel, Flat, etc.; Text, Doc, Image; Industry Specific; Community / 3rd
  • 8. KNIME H2O Machine Learning Integration
     • Offer our users high-performance machine learning algorithms from H2O in KNIME
     • Allow to mix & match with other KNIME functionality
       – Data wrangling KNIME Analytics Platform functionality
       – KNIME Big-Data Connectors
       – Text Mining, Image Processing, Cheminformatics, …
       – and more!
  • 9. KNIME H2O Machine Learning Integration
  • 10. The Data
     Training data:
     Date        Store ID          Visitors
     2016-01-01  ba937bf13d40fb24  28
     …           …                 …
     2017-04-22  324f7c39a8410e7c  216
     Test data:
     Date        Store ID          Visitors
     2017-04-23  e8ed9335d0c38333  ?
     …           …                 …
     2017-05-31  8f13ef0f5e8c64dd  ?
     Provided data:
     • Number of visitors
     • Reservations
     • Store information
     • Calendar date info
  • 11. Visitor Forecasting: Data preparation, Model training, Model optimization, Model evaluation, Deployment
  • 12. Visitor Forecasting: Data preparation, Model training, Model optimization, Model evaluation, Deployment
  • 13. Data Preparation with KNIME Nodes
  • 14. Visitor Forecasting: Data preparation, Model training, Model optimization, Model evaluation, Deployment
  • 15. Visitor Forecasting: Data preparation, Model training, Model optimization, Model evaluation, Deployment
  • 16. Modeling with the H2O Nodes
  • 17. Modeling with the H2O Nodes
  • 18. Modeling with the H2O Nodes
  • 19. Modeling with the H2O Nodes
  • 20. Visitor Forecasting: Data preparation, Model training, Model optimization, Model evaluation, Deployment
  • 21. Visitor Forecasting: Data preparation, Model training, Model optimization, Model evaluation, Deployment
  • 22. Blend H2O with… Python, Java and R Scripting…
  • 23. …Image Processing...
  • 24. …Deep Learning...
  • 25. ...Text Processing...
  • 26. ...Databases...
  • 27. ...a growing Big Data Integration.
  • 28. H2O Sparkling Water in KNIME
  • 29. H2O Sparkling Water in KNIME
  • 30. H2O Sparkling Water in KNIME
  • 31. H2O Sparkling Water in KNIME
  • 32. H2O Sparkling Water in KNIME
  • 33. Scoring with H2O MOJOs on Apache Spark
  • 34. Thank You! www.knime.com
  • 35. © 2018 KNIME AG. All rights reserved. The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.

Editor's Notes

  1. This competition was published by a Japanese restaurant chain. They wanted to know the number of future visitors for their different stores. Let's see what kind of data they provided us with to solve this problem.
  2. This is the top-level workflow we used to solve the problem. It will guide us through the major steps, from reading in the data up to making the prediction, and showcases the interaction of the native KNIME nodes and the H2O nodes.
  3. Let's jump right into our data preparation.
  4. This is the data preparation part of the workflow. We'll not discuss it in too much detail.
  5. In the end we get two datasets: the training set, with information about the number of visitors (the target variable), which we will use to build our model in the next steps, and the test set, without the number of visitors. These have to be predicted by our model and submitted to Kaggle later on.
  6. We have just done the data preparation. Before we jump right into the modeling, we have to create a local H2O context and convert our KNIME table into an H2O frame. This frame will be used to build our models.
  7. At the moment there are three H2O models implemented in KNIME that are capable of solving such a regression task: Random Forest, Generalized Linear Model and Gradient Boosting Machine. Let's have a look at one of those to see how we trained, optimized and evaluated our models.
  8. The actual learning of a model happens in one single node: the H2O Random Forest Learner takes the H2O frame with the training set and builds a model. In the configuration dialog you answer the question "What is the target variable you want to predict?" (here it is visitors) and enter some model-specific parameters, e.g. the number of levels of a single tree and the number of tree models in the forest. Next we use the H2O Predictor to apply the newly created model and predict the visitors for our test set. Afterwards the score of the model is computed with the H2O Regression Scorer. As the performance measure we used the root mean squared logarithmic error, as this measure is also used on Kaggle to evaluate the final submissions.
  9. To avoid overfitting we use the H2O cross-validation loop, which partitions the data and trains one model for each partition of the data. !!! Insert table with the mean of the CV !!!
  10. With one machine learning algorithm, here e.g. random forest, you can solve different problems. Using its parameters, for a random forest e.g. the number of trees and the tree depth, one can adapt it to a specific problem with respect to the objective function. Here we are looking for the parameters that minimize the error of our model validations. We did this with a grid search that performs one iteration of the loop for every possible combination of parameters. At the end we have a table with all parameter combinations and their respective scores.
  11. At the end of the loop we have all parameter combinations with their respective scores. We selected the parameters that led to the best result and trained a new model on the complete public dataset. As you can see, we have a nested loop here. Luckily the new H2O nodes are really fast, so this is not going to be a performance issue.
  12. The steps I just showed you happen in all three nodes. Afterwards we select the model that scored best.
  13. We then convert it into an H2O MOJO, which is a model object that is optimized to be embedded in any Java environment. By doing this we are able to use the newly created model outside of an H2O context. We can, for example, make our prediction for the submission dataset from Kaggle, or we can deploy it wherever we want, so we just stored it somewhere for Christian. Let's see what he is doing with it.
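
The scoring, cross-validation and grid-search steps described in notes 8 to 11 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the KNIME workflow or the H2O API: the helper names (`rmsle`, `k_folds`, `fit_constant`, `cv_score`, `grid_search`), the toy "learner" that predicts a single constant, its parameters `scale` and `use_median`, and the visitor counts are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import itertools
import math
import statistics


def rmsle(actual, predicted):
    """Root mean squared logarithmic error between two equal-length lists."""
    squared = [(math.log1p(p) - math.log1p(a)) ** 2
               for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def k_folds(n_rows, k):
    """Yield (train_idx, valid_idx) pairs; row i is validated in fold i % k."""
    for fold in range(k):
        valid = [i for i in range(n_rows) if i % k == fold]
        train = [i for i in range(n_rows) if i % k != fold]
        yield train, valid


def fit_constant(targets, scale, use_median):
    """Hypothetical stand-in for a real learner: predicts one constant."""
    centre = statistics.median(targets) if use_median else statistics.mean(targets)
    return scale * centre


def cv_score(targets, k, **params):
    """Mean RMSLE over k folds for one parameter combination."""
    scores = []
    for train, valid in k_folds(len(targets), k):
        constant = fit_constant([targets[i] for i in train], **params)
        held_out = [targets[i] for i in valid]
        scores.append(rmsle(held_out, [constant] * len(held_out)))
    return statistics.mean(scores)


def grid_search(targets, grid, k=3):
    """Score every parameter combination; return (score, params), best first."""
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((cv_score(targets, k, **params), params))
    return sorted(results, key=lambda r: r[0])


# Made-up visitor counts, for illustration only.
visitors = [28, 35, 12, 216, 40, 31, 22, 19, 55]
grid = {"scale": [0.5, 1.0, 1.5], "use_median": [False, True]}

results = grid_search(visitors, grid)          # table of all combinations
best_score, best_params = results[0]           # lowest cross-validated RMSLE
final_model = fit_constant(visitors, **best_params)  # refit on all the data
```

The nested loop mentioned in note 11 shows up here as the cross-validation loop in `cv_score` running inside the parameter loop in `grid_search`; with a real learner the cost grows with the number of folds times the number of parameter combinations.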