SlideShare a Scribd company logo
Deploying H2O in
Large-Scale Distributed
Environments using
Containers
Nanda Vijaydev
Lead Data Scientist and Senior Director of Solutions
BlueData (now part of HPE)
www.bluedata.ai @NandaVijaydev @BlueData
#H2OWORLD
Hype versus Reality …
Source: https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
AI / ML Stack: Tools + Infrastructure
Multiple ML / DL frameworks, notebooks, and other tools
Data Science
Notebooks
Analytics
& BI Tools
Pipeline Tools
Data Scientists Developers Data Engineers Decision Makers
Role-based
access control
Tools for
distributed
AI / ML / DL
pipeline
RDBMS HDFS Streams Spark
Model
Storage
Workflow
Mgmt
Data Frameworks Access to data
and storage
• Access to valuable data: small, big, or both
• Choices of modeling techniques: each problem is
different
• Ability to build on datasets, validate on other
datasets, iterate, and improve
• Access to GPUs (and CPUs)
• Scale easily on real datasets
• Ability to operationalize in production
Distributed AI / ML / DL – Key Requirements
Source: https://rohitnarurkar.wordpress.com/2013/11/02/cuda-matrix-multiplication

Recommended for you

MLOps with Kubeflow
MLOps with Kubeflow MLOps with Kubeflow
MLOps with Kubeflow

This document discusses MLOps and Kubeflow. It begins with an introduction to the speaker and defines MLOps as addressing the challenges of independently autoscaling machine learning pipeline stages, choosing different tools for each stage, and seamlessly deploying models across environments. It then introduces Kubeflow as an open source project that uses Kubernetes to minimize MLOps efforts by enabling composability, scalability, and portability of machine learning workloads. The document outlines key MLOps capabilities in Kubeflow like Jupyter notebooks, hyperparameter tuning with Katib, and model serving with KFServing and Seldon Core. It describes the typical machine learning process and how Kubeflow supports experimental and production phases.

architectureartificial intelligencecontainerization
Productionizing H2O Models with Apache Spark
Productionizing H2O Models with Apache SparkProductionizing H2O Models with Apache Spark
Productionizing H2O Models with Apache Spark

This meetup was recorded in Mountain View, CA on January 10th, 2019. Video recording from the meetup can be viewed here: https://youtu.be/yN26i7e_BtM Spark pipelines represent a powerful concept to support productionizing machine learning workflows. Their API allows to combine data processing with machine learning algorithms and opens opportunities for integration with various machine learning libraries. However, to benefit from the power of pipelines, their users need to have a freedom to choose and experiment with any machine learning algorithm or library. Therefore, we developed Sparkling Water that embeds H2O machine learning library of advanced algorithms into the Spark ecosystem and exposes them via pipeline API. Furthermore, the algorithms benefit from H2O MOJOs - Model Object Optimized - a powerful concept shared across entire H2O platform to store and exchange models. The MOJOs are designed for effective model deployment with focus on scoring speed, traceability, exchangeability, and backward compatibility. In this talk we will explain the architecture of Sparkling Water with focus on integration into the Spark pipelines and MOJOs. We’ll demonstrate creation of pipelines integrating H2O machine learning models and their deployments using Scala or Python. Furthermore, we will show how to utilize pre-trained model MOJOs with Spark pipelines. Speaker's Bio: Michal is the VP of Engineering at H2O.ai! Michal is a geek, developer, Java, Linux, programming languages enthusiast developing software for over 15 years. He obtained PhD from the Charles University in Prague in 2012 and post-doc at Purdue University. During his studies he was interested in construction of not only distributed but also embedded and real-time component-based systems using model-driven methods and domain-specific languages. He participated in design and development of various systems including SOFA and Fractal component systems or jPapabench control system.

An Introduction to H2O4GPU
An Introduction to H2O4GPUAn Introduction to H2O4GPU
An Introduction to H2O4GPU

This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/rKoBJcnsFpM Speaker's Bio: Mateusz is a software developer who loves all things distributed, machine learning and hates buzzwords. His favourite hobby data juggling. He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at L’ECE Paris in France and worked on distributed flight booking systems. After graduation he move to Tokyo to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects, where he is still currently based. In his spare time he tries to be part of the IT community by organizing, attending and speaking at conferences and meet ups.

• Scalability, repeatability, complexity,
reproducibility across environments
• Sharing data, not duplicating data
• Deploying distributed platforms, libraries,
applications, and versions
• Efficiently sharing expensive resources like GPUs
• Agility to scale up and down compute resources
• Providing a future-proof solution
• Ensuring compatible NVIDIA device kernel
module installation
Distributed AI / ML / DL – Challenges
Laptop On-Prem
Cluster
Off-Prem
Cluster
Distributed
Machine Learning
with H20 on
Containers
Docker is a computer program that
performs operating-system-level virtualization
also known as containerization.
Containerization allows the existence of
multiple isolated user-space instances.
Docker Containers
Source: https://en.wikipedia.org/wiki/docker_(software)
Container-Based Platform for AI / ML
Data Scientists Developers Data Engineers Data Analysts
BI/Analytics Tools Bring-Your-Own
NFS HDFS
Compute
Storage
On-Premises Public Cloud
ML/DL & Big Data Tools Data Science Tools
CPUs GPUs
IOBoost™ – Extreme performance and enterprise-grade scalability
ElasticPlane™ – Self-service, multi-tenant containerized environments
DataTap™ – In-place access to data on-prem or in the cloud
BlueData EPIC™ Software Platform
H2O for AI/ML

Recommended for you

Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One
Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital OneUsing H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One
Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One

Presented at #H2OWorld 2017 in Mountain View, CA. Learn more about H2O.ai: https://www.h2o.ai/. Follow @h2oai: https://twitter.com/h2oai. - - - Effective volume anomaly detection presents unique challenges when monitoring customer transaction volumes across thousands of platforms and systems. We overcome this by using H2O, building on open source tools, and delivering machine learning anomaly detection for enterprise scale. Hear how we model, visualize then automatically alert on anomalous Mobile app volumes in real-time. Donald Gennetten has over 15 years experience supporting digital channels in the Financial Services industry. In his current role as a Data Engineer for Capital One’s Monitoring Intelligence team, he leads a cross-functional group of Data, Business, and Engineering subject matter experts to deliver Advanced Analytics solutions for real-time customer transaction monitoring and issue detection. Rahul Gupta is a Data Engineer in Capital One's Center for Machine Learning, focusing heavily on back-end development and model creation. His primary efforts include building an Algorithmic IT Operations (AIOps) platform that utilizes a combination of batch and streaming data with Machine Learning capabilities to improve the stability of Capital One services and overall customer experience.

data sciencemachine learning
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...

Presented at #H2OWorld 2017 in Mountain View, CA. Enjoy the video: https://youtu.be/r9S3xchrzlY. Learn more about H2O.ai: https://www.h2o.ai/. Follow @h2oai: https://twitter.com/h2oai. - - - Abstract: Venkatesh will explore how driverless AI is helping to keep fraudsters at bay. Share results from experiments conducted on large scale payment transaction data. Venkatesh's Bio: Venkatesh is a senior data scientist at PayPal where he is working on building state-of-the-art tools for payment fraud detection. He has over 20+ years experience in designing, developing and leading teams to build scalable server-side software. In addition to being an expert in big-data technologies, Venkatesh holds a Ph.D. degree in Computer Science with specialization in Machine Learning and Natural Language Processing (NLP) and had worked on various problems in the areas of Anti-Spam, Phishing Detection, and Face Recognition.

machine learningmachine learning big dataartificial intelligence
Georgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft AzureGeorgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft Azure

The document discusses using Microsoft Azure cloud services for game development and operations. It provides examples of how games use Azure for scalable storage, global load balancing for multiplayer games, predictive analytics using big data, and DevOps approaches for deployment, monitoring, and development. Key Azure services highlighted include Storage, SQL Database, Virtual Machines, Mobile Services, HDInsight, and Application Insights.

microsoftgamesgaming
Accelerated AI / ML Deployment
• With H2O + BlueData, customers now have:
• Pre-built Docker H2O images with CUDA and automated cluster
creation for the entire stack
• Appropriate NVIDIA kernel module surfaced automatically to the
containers
• Easy access to resources required (e.g. single node, single GPU, multi-
node, multi-GPU combinations)
• UI, CLI, and API access (notebooks, web, SSH)
• NFS mounts surfaced as local drives for sharing assets
Example of an H2O Pipeline on Containers
H2O Driverless AI
Import Validate
Export
Shared Data Access Layer
… Data Sources …
Docker images for multiple
applications and versions
Ability to create and
add new images, and
save or restore
tested combinations
on demand
Deploy H2O from Pre-Built Images in the
BlueData EPIC App Store
Multi-Tenant, with Quotas for GPU Resources
Support for multi-tenancy
and ability to define quota
per tenant
Define ‘flavor’ types used to
launch Docker containers

Recommended for you

Driverless AI - Intro + Interactive Hands-on Lab
Driverless AI - Intro + Interactive Hands-on LabDriverless AI - Intro + Interactive Hands-on Lab
Driverless AI - Intro + Interactive Hands-on Lab

Enjoy the webinar recording here: https://youtu.be/Lll1qwQJKVw. Driverless AI speeds up data science workflows by automating feature engineering, model tuning, ensembling, and model deployment. In this presentation, Arno Candel (CTO, H2O.ai), gives a quick overview and guide attendees through an interactive hands-on lab using Qwiklabs. Driverless AI turns Kaggle-winning recipes into production-ready code and is specifically designed to avoid common mistakes such as under or overfitting, data leakage or improper model validation. Avoiding these pitfalls alone can save weeks or more for each model, and is necessary to achieve high modeling accuracy. With Driverless AI, everyone can now train and deploy modeling pipelines with just a few clicks from the GUI. Advanced users can use the client/server API through a variety of languages such as Python, Java, C++, go, C# and many more. To speed up training, Driverless AI uses highly optimized C++/CUDA algorithms to take full advantage of the latest compute hardware. For example, Driverless AI runs orders of magnitudes faster on the latest Nvidia GPU supercomputers on Intel and IBM platforms, both in the cloud or on-premise. There are two more product innovations in Driverless AI: statistically rigorous automatic data visualization and interactive model interpretation with reason codes and explanations in plain English. Both help data scientists and analysts to quickly validate the data and models.

machine learningdata scienceai
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Production

The presentation topic for this meet-up was covered in two sections without any breaks in-between Section 1: Business Aspects (20 mins) Speaker: Rasmi Mohapatra, Product Owner, Experian https://www.linkedin.com/in/rasmi-m-428b3a46/ Once your data science application is in the production, there are many typical data science operational challenges experienced today - across business domains - we will cover a few challenges with example scenarios Section 2: Tech Aspects (40 mins, slides & demo, Q&A ) Speaker: Santanu Dey, Solution Architect, Iguazio https://www.linkedin.com/in/santanu/ In this part of the talk, we will cover how these operational challenges can be overcome e.g. automating data collection & preparation, making ML models portable & deploying in production, monitoring and scaling, etc. with relevant demos.

data sciencemachine learningmlops
Forget becoming a Data Scientist, become a Machine Learning Engineer instead
Forget becoming a Data Scientist, become a Machine Learning Engineer insteadForget becoming a Data Scientist, become a Machine Learning Engineer instead
Forget becoming a Data Scientist, become a Machine Learning Engineer instead

Data Con LA 2020 Description Machine learning is an essential skill in today's job market. But when it comes to learning Machine Learning, beginners get lot of conflicting advice. I have been teaching ML for software engineers for years. In this talk *I will dis-spell some of the myths surrounding machine learning *give you solid, tangible plan on how to go about learning ML *and give you good pointers to start from *and steer you away from common mistakes Speaker Sujee Maniyam, Elephant Scale, Founder, Principal instructor

data con ladata con la 2020dcla
Spin Up Multiple Environments
Quick launch templates
for one-click cluster
creation
Run multiple clusters,
with different versions or
combinations of tools,
side by side
Pick from a list of
pre-built and tested images
Assign specific resources (GPUs,
CPUs) to the cluster, depending on
the use case (e.g. for Driverless AI)
Define number of nodes, here for
H2O and Sparkling Water
On-Demand Cluster Creation
• 2 Docker containers running different versions
of Driverless AI with 1 GPU (Tesla P100) each
• NVIDIA device kernel (driver version: 390.46)
• NVIDIA CUDA (9.x) and cuDNN libraries
including
1. libcudnn7-7.4.2.24-1.cuda9.0.x86_64.rpm
2. libcudnn7-devel-7.4.2.24-
1.cuda9.0.x86_64.rpm
Source: https://medium.com/linagora-engineering/making-image-classification-simple-with-spark-deep-learning-f654a8b876b8
H2O Driverless AI with GPUs
• The user authenticates on Driverless AI
• Import datasets from BlueData DataTap
with DataTap connector, optimized
access with BlueData IOBoost
• Analyze the data
• Run experiments
• Build models, save them …
• Validate against other datasets from
DataTap …
• Export model for production
Run Driverless AI on Containers with GPUs
dtap

Recommended for you

H2O Driverless AI Workshop
H2O Driverless AI WorkshopH2O Driverless AI Workshop
H2O Driverless AI Workshop

This in-depth training on H2O Driverless AI was given by Wen Phan on June 28th, 2018. He elaborated on automatic feature engineering, machine learning interpretability, and automatic visualization components of this ground breaking product.

DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience

A talk for SF big analytics meetup. Building, testing, deploying, monitoring and maintaining big data analytics services. http://hydrosphere.io/

big dataapache sparkapache kafka
Building Real Time Targeting Capabilities - Ryan Zotti, Subbu Thiruppathy - C...
Building Real Time Targeting Capabilities - Ryan Zotti, Subbu Thiruppathy - C...Building Real Time Targeting Capabilities - Ryan Zotti, Subbu Thiruppathy - C...
Building Real Time Targeting Capabilities - Ryan Zotti, Subbu Thiruppathy - C...

The document discusses building real-time targeting capabilities at Capital One. It introduces two speakers, Ryan Zotti and Subbu Thiruppathy, and describes challenges around striving for speed in everything. It then covers how to achieve fast model data, training, deployment, and scoring through techniques like using the most up-to-date data, distributed computing in the cloud, automatic model refitting, and response times under 100 milliseconds.

aidata sciencepredictive analytics
• Optionally initialize Sparkling Water against an existing H2O cluster created previously
[external backend]
• Pass to Sparkling Water the appropriate jar to use for the HDFS connectivity
• Work on your dataset using the HDFS connectivity
Work with Sparkling Water Cluster and HDFS
• BlueData EPIC automatically
deploys the environments
• Using persistent containers
• Providing true multi-tenancy
• Access to shared resources (CPU,
RAM, GPUs, storage)
• Pre-built H2O images in the
BlueData EPIC App Store
• Enterprise-grade security
(integration with AD /LDAP / TDE)
Simplify H2O Deployments at Scale in Minutes
BlueData DataTap
BlueData IOBoost
Enable Compute / Storage Separation
Connect the clusters to different datasets without
copying the data, and with performance optimized
From the BlueData EPIC App Store, deploy
more application clusters to connect to H2O
Integrate H2O with Production Environment

Recommended for you

Ai platform at scale
Ai platform at scaleAi platform at scale
Ai platform at scale

The document discusses designing scalable platforms for artificial intelligence (AI) and machine learning (ML). It outlines several challenges in developing AI applications, including technical debts, unpredictability, different data and compute needs compared to traditional software. It then reviews existing commercial AI platforms and common components of AI platforms, including data access, ML workflows, computing infrastructure, model management, and APIs. The rest of the document focuses on eBay's Krylov project as an example AI platform, outlining its architecture, challenges of deploying platforms at scale, and needed skill sets on the platform team.

machine learningaiplatform
Scalable Automatic Machine Learning with H2O
Scalable Automatic Machine Learning with H2OScalable Automatic Machine Learning with H2O
Scalable Automatic Machine Learning with H2O

In this presentation, Parul Pandey, will provide a history and overview of the field of “Automatic Machine Learning” (AutoML), followed by a detailed look inside H2O’s open source AutoML algorithm. H2O AutoML provides an easy-to-use interface which automates data pre-processing, training and tuning a large selection of candidate models (including multiple stacked ensemble models for superior model performance). The result of the AutoML run is a “leaderboard” of H2O models which can be easily exported for use in production. AutoML is available in all H2O interfaces (R, Python, Scala, web GUI) and due to the distributed nature of the H2O platform, can scale to very large datasets. The presentation will end with a demo of H2O AutoML in R and Python, including a handful of code examples to get you started using automatic machine learning on your own projects. Parul's Bio: Parul is a Data Science Evangelist here at H2O.ai. She combines Data Science, evangelism and community in her work. Her emphasis is to spread the information about H2O and Driverless AI to as many people as possible, She is also an active writer and has contributed towards various national and international publications.

Dive into H2O: NYC
Dive into H2O: NYCDive into H2O: NYC
Dive into H2O: NYC

This session took place at New York City on November 4th, 2019. Speaker Bio: Chemere is a Senior Data Science Training Specialist for H2O.ai. Chemere has a Master's in Business Administration with focus in Marketing Analytics from the University of North Carolina at Charlotte. She is an experienced data scientist with a diverse background in transformational decision-making in various industries including Banking, Manufacturing, Logistics, and Medical Devices. Chemere joins us from Venus Concept/2two5, where she was the Lead Data Scientist focused on building predictive models with Internet of Things (IoT) data and for a subscription-based marketing product for B2B customers. Prior to that, Chemere worked as a Senior Data Scientist at Wells Fargo Bank focused on various applied predictive analytic solutions. More details about the event can be had here: https://www.eventbrite.com/e/dive-into-h2o-new-york-tickets-76351721053

• Infrastructure for distributed ML / DL is complex (CPUs, GPUs, data …)
 This complexity can be abstracted from data science teams with self-
service provisioning and automation, using containers
 GPU access can be effectively used by the containerized application,
then released for other applications and users
 For a flexible and scalable solution, data resources
should be decoupled from compute
• H2O, Driverless AI, and Sparkling Water can be
deployed at scale on containers – whether
on-premises, on any public cloud, or hybrid
 BlueData + H2O proven in production with Global 2000 enterprises
Lessons Learned – H2O on Containers
Thank you!
www.bluedata.ai
@BlueData @NandaVijaydev
Nanda Vijaydev
Lead Data Scientist and Senior Director of Solutions
BlueData (now part of HPE)
https://www.linkedin.com/in/nanda-vijaydev-3638693
#H2OWORLD

More Related Content

What's hot

H2O-3: Overview of new features and algorithms
H2O-3: Overview of new features and algorithmsH2O-3: Overview of new features and algorithms
H2O-3: Overview of new features and algorithms
Sri Ambati
 
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DFML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
Sri Ambati
 
Megan Kurka, H2O.ai - AutoDoc with H2O Driverless AI - H2O World 2019 NYC
Megan Kurka, H2O.ai - AutoDoc with H2O Driverless AI - H2O World 2019 NYCMegan Kurka, H2O.ai - AutoDoc with H2O Driverless AI - H2O World 2019 NYC
Megan Kurka, H2O.ai - AutoDoc with H2O Driverless AI - H2O World 2019 NYC
Sri Ambati
 
MLOps with Kubeflow
MLOps with Kubeflow MLOps with Kubeflow
MLOps with Kubeflow
Saurabh Kaushik
 
Productionizing H2O Models with Apache Spark
Productionizing H2O Models with Apache SparkProductionizing H2O Models with Apache Spark
Productionizing H2O Models with Apache Spark
Sri Ambati
 
An Introduction to H2O4GPU
An Introduction to H2O4GPUAn Introduction to H2O4GPU
An Introduction to H2O4GPU
Sri Ambati
 
Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One
Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital OneUsing H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One
Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One
Sri Ambati
 
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Sri Ambati
 
Georgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft AzureGeorgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft Azure
Microsoft
 
Driverless AI - Intro + Interactive Hands-on Lab
Driverless AI - Intro + Interactive Hands-on LabDriverless AI - Intro + Interactive Hands-on Lab
Driverless AI - Intro + Interactive Hands-on Lab
Sri Ambati
 
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Production
iguazio
 
Forget becoming a Data Scientist, become a Machine Learning Engineer instead
Forget becoming a Data Scientist, become a Machine Learning Engineer insteadForget becoming a Data Scientist, become a Machine Learning Engineer instead
Forget becoming a Data Scientist, become a Machine Learning Engineer instead
Data Con LA
 
H2O Driverless AI Workshop
H2O Driverless AI WorkshopH2O Driverless AI Workshop
H2O Driverless AI Workshop
Sri Ambati
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
Stepan Pushkarev
 
Building Real Time Targeting Capabilities - Ryan Zotti, Subbu Thiruppathy - C...
Building Real Time Targeting Capabilities - Ryan Zotti, Subbu Thiruppathy - C...Building Real Time Targeting Capabilities - Ryan Zotti, Subbu Thiruppathy - C...
Building Real Time Targeting Capabilities - Ryan Zotti, Subbu Thiruppathy - C...
Sri Ambati
 
Ai platform at scale
Ai platform at scaleAi platform at scale
Ai platform at scale
Henry Saputra
 
Scalable Automatic Machine Learning with H2O
Scalable Automatic Machine Learning with H2OScalable Automatic Machine Learning with H2O
Scalable Automatic Machine Learning with H2O
Sri Ambati
 
Dive into H2O: NYC
Dive into H2O: NYCDive into H2O: NYC
Dive into H2O: NYC
Sri Ambati
 
Distributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDLDistributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDL
Yulia Tell
 
H2O World 2017 Keynote - Jim McHugh, VP & GM of Data Center, NVIDIA
H2O World 2017 Keynote - Jim McHugh, VP & GM of Data Center, NVIDIAH2O World 2017 Keynote - Jim McHugh, VP & GM of Data Center, NVIDIA
H2O World 2017 Keynote - Jim McHugh, VP & GM of Data Center, NVIDIA
Sri Ambati
 

What's hot (20)

H2O-3: Overview of new features and algorithms
H2O-3: Overview of new features and algorithmsH2O-3: Overview of new features and algorithms
H2O-3: Overview of new features and algorithms
 
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DFML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
 
Megan Kurka, H2O.ai - AutoDoc with H2O Driverless AI - H2O World 2019 NYC
Megan Kurka, H2O.ai - AutoDoc with H2O Driverless AI - H2O World 2019 NYCMegan Kurka, H2O.ai - AutoDoc with H2O Driverless AI - H2O World 2019 NYC
Megan Kurka, H2O.ai - AutoDoc with H2O Driverless AI - H2O World 2019 NYC
 
MLOps with Kubeflow
MLOps with Kubeflow MLOps with Kubeflow
MLOps with Kubeflow
 
Productionizing H2O Models with Apache Spark
Productionizing H2O Models with Apache SparkProductionizing H2O Models with Apache Spark
Productionizing H2O Models with Apache Spark
 
An Introduction to H2O4GPU
An Introduction to H2O4GPUAn Introduction to H2O4GPU
An Introduction to H2O4GPU
 
Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One
Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital OneUsing H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One
Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One
 
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
 
Georgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft AzureGeorgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft Azure
 
Driverless AI - Intro + Interactive Hands-on Lab
Driverless AI - Intro + Interactive Hands-on LabDriverless AI - Intro + Interactive Hands-on Lab
Driverless AI - Intro + Interactive Hands-on Lab
 
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Production
 
Forget becoming a Data Scientist, become a Machine Learning Engineer instead
Forget becoming a Data Scientist, become a Machine Learning Engineer insteadForget becoming a Data Scientist, become a Machine Learning Engineer instead
Forget becoming a Data Scientist, become a Machine Learning Engineer instead
 
H2O Driverless AI Workshop
H2O Driverless AI WorkshopH2O Driverless AI Workshop
H2O Driverless AI Workshop
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
Building Real Time Targeting Capabilities - Ryan Zotti, Subbu Thiruppathy - C...
Building Real Time Targeting Capabilities - Ryan Zotti, Subbu Thiruppathy - C...Building Real Time Targeting Capabilities - Ryan Zotti, Subbu Thiruppathy - C...
Building Real Time Targeting Capabilities - Ryan Zotti, Subbu Thiruppathy - C...
 
Ai platform at scale
Ai platform at scaleAi platform at scale
Ai platform at scale
 
Scalable Automatic Machine Learning with H2O
Scalable Automatic Machine Learning with H2OScalable Automatic Machine Learning with H2O
Scalable Automatic Machine Learning with H2O
 
Dive into H2O: NYC
Dive into H2O: NYCDive into H2O: NYC
Dive into H2O: NYC
 
Distributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDLDistributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDL
 
H2O World 2017 Keynote - Jim McHugh, VP & GM of Data Center, NVIDIA
H2O World 2017 Keynote - Jim McHugh, VP & GM of Data Center, NVIDIAH2O World 2017 Keynote - Jim McHugh, VP & GM of Data Center, NVIDIA
H2O World 2017 Keynote - Jim McHugh, VP & GM of Data Center, NVIDIA
 

Similar to Nanda Vijaydev, BlueData - Deploying H2O in Large Scale Distributed Environments using Containers - H2O World San Francisco

Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the EnterpriseDeploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Big-Data-as-a-Service (BDaaS) Meetup
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
Jim Kaskade
 
Hadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsHadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the Experts
DataWorks Summit/Hadoop Summit
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
DataWorks Summit/Hadoop Summit
 
5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics
Jen Stirrup
 
Hambug R Meetup - Intro to H2O
Hambug R Meetup - Intro to H2OHambug R Meetup - Intro to H2O
Hambug R Meetup - Intro to H2O
Sri Ambati
 
The world of Docker and Kubernetes
The world of Docker and Kubernetes The world of Docker and Kubernetes
The world of Docker and Kubernetes
vty
 
Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the Experts
DataWorks Summit/Hadoop Summit
 
BlueData DataSheet
BlueData DataSheetBlueData DataSheet
BlueData DataSheet
Greg Kirchoff
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsight
Tillmann Eitelberg
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
Gwen (Chen) Shapira
 
Persistent identifiers in DataverseEU project
Persistent identifiers in DataverseEU projectPersistent identifiers in DataverseEU project
Persistent identifiers in DataverseEU project
vty
 
Docker data science pipeline
Docker data science pipelineDocker data science pipeline
Docker data science pipeline
DataWorks Summit
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
markgrover
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
BlueData, Inc.
 
Docker datascience pipeline
Docker datascience pipelineDocker datascience pipeline
Docker datascience pipeline
DataWorks Summit
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APS
Stéphane Fréchette
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Hortonworks
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
Jason Hubbard
 

Similar to Nanda Vijaydev, BlueData - Deploying H2O in Large Scale Distributed Environments using Containers - H2O World San Francisco (20)

Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the EnterpriseDeploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 
Hadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsHadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the Experts
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics
 
Hambug R Meetup - Intro to H2O
Hambug R Meetup - Intro to H2OHambug R Meetup - Intro to H2O
Hambug R Meetup - Intro to H2O
 
The world of Docker and Kubernetes
The world of Docker and Kubernetes The world of Docker and Kubernetes
The world of Docker and Kubernetes
 
Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the Experts
 
BlueData DataSheet
BlueData DataSheetBlueData DataSheet
BlueData DataSheet
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsight
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
Persistent identifiers in DataverseEU project
Persistent identifiers in DataverseEU projectPersistent identifiers in DataverseEU project
Persistent identifiers in DataverseEU project
 
Docker data science pipeline
Docker data science pipelineDocker data science pipeline
Docker data science pipeline
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
 
Docker datascience pipeline
Docker datascience pipelineDocker datascience pipeline
Docker datascience pipeline
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APS
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
 

More from Sri Ambati

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
Sri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
Sri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
Sri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
Sri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
Sri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
Sri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
Sri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
Sri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
Sri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
Sri Ambati
 

More from Sri Ambati (20)

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 

Recently uploaded

What's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptxWhat's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptx
Stephanie Beckett
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
ArgaBisma
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Mydbops
 
Quantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLMQuantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLM
Vijayananda Mohire
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
Adam Dunkels
 
20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf
Sally Laouacheria
 
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Chris Swan
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
SynapseIndia
 
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
RaminGhanbari2
 
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
UiPathCommunity
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
Emerging Tech
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
Lidia A.
 
Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024
BookNet Canada
 
20240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 202420240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 2024
Matthew Sinclair
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
Safe Software
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Bert Blevins
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
Cookies program to display the information though cookie creation
Cookies program to display the information though cookie creationCookies program to display the information though cookie creation
Cookies program to display the information though cookie creation
shanthidl1
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
Enterprise Wired
 

Recently uploaded (20)

What's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptxWhat's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptx
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
Quantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLMQuantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLM
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
 
20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf
 
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
 
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
 
Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024
 
20240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 202420240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 2024
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
Cookies program to display the information though cookie creation
Cookies program to display the information though cookie creationCookies program to display the information though cookie creation
Cookies program to display the information though cookie creation
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
 

Nanda Vijaydev, BlueData - Deploying H2O in Large Scale Distributed Environments using Containers - H2O World San Francisco

  • 1. Deploying H2O in Large-Scale Distributed Environments using Containers Nanda Vijaydev Lead Data Scientist and Senior Director of Solutions BlueData (now part of HPE) www.bluedata.ai @NandaVijaydev @BlueData #H2OWORLD
  • 2. Hype versus Reality … Source: https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
  • 3. AI / ML Stack: Tools + Infrastructure Multiple ML / DL frameworks, notebooks, and other tools Data Science Notebooks Analytics & BI Tools Pipeline Tools Data Scientists Developers Data Engineers Decision Makers Role-based access control Tools for distributed AI / ML / DL pipeline RDBMS HDFS Streams Spark Model Storage Workflow Mgmt Data Frameworks Access to data and storage
  • 4. • Access to valuable data: small, big, or both • Choices of modeling techniques: each problem is different • Ability to build on datasets, validate on other datasets, iterate, and improve • Access to GPUs (and CPUs) • Scale easily on real datasets • Ability to operationalize in production Distributed AI / ML / DL – Key Requirements Source: https://rohitnarurkar.wordpress.com/2013/11/02/cuda-matrix-multiplication
  • 5. • Scalability, repeatability, complexity, reproducibility across environments • Sharing data, not duplicating data • Deploying distributed platforms, libraries, applications, and versions • Efficiently sharing expensive resources like GPUs • Agility to scale up and down compute resources • Providing a future-proof solution • Ensuring compatible NVIDIA device kernel module installation Distributed AI / ML / DL – Challenges Laptop On-Prem Cluster Off-Prem Cluster
  • 7. Docker is a computer program that performs operating-system-level virtualization also known as containerization. Containerization allows the existence of multiple isolated user-space instances. Docker Containers Source: https://en.wikipedia.org/wiki/docker_(software)
  • 8. Container-Based Platform for AI / ML Data Scientists Developers Data Engineers Data Analysts BI/Analytics Tools Bring-Your-Own NFS HDFS Compute Storage On-Premises Public Cloud ML/DL & Big Data Tools Data Science Tools CPUs GPUs IOBoost™ – Extreme performance and enterprise-grade scalability ElasticPlane™ – Self-service, multi-tenant containerized environments DataTap™ – In-place access to data on-prem or in the cloud BlueData EPIC™ Software Platform H2O for AI/ML
  • 9. Accelerated AI / ML Deployment • With H2O + BlueData, customers now have: • Pre-built Docker H2O images with CUDA and automated cluster creation for the entire stack • Appropriate NVIDIA kernel module surfaced automatically to the containers • Easy access to resources required (e.g. single node, single GPU, multi- node, multi-GPU combinations) • UI, CLI, and API access (notebooks, web, SSH) • NFS mounts surfaced as local drives for sharing assets
  • 10. Example of an H2O Pipeline on Containers H2O Driverless AI Import Validate Export Shared Data Access Layer … Data Sources …
  • 11. Docker images for multiple applications and versions Ability to create and add new images, and save or restore tested combinations on demand Deploy H2O from Pre-Built Images in the BlueData EPIC App Store
  • 12. Multi-Tenant, with Quotas for GPU Resources Support for multi-tenancy and ability to define quota per tenant Define ‘flavor’ types used to launch Docker containers
  • 13. Spin Up Multiple Environments Quick launch templates for one-click cluster creation Run multiple clusters, with different versions or combinations of tools, side by side
  • 14. Pick from a list of pre-built and tested images Assign specific resources (GPUs, CPUs) to the cluster, depending on the use case (e.g. for Driverless AI) Define number of nodes, here for H2O and Sparkling Water On-Demand Cluster Creation
  • 15. • 2 Docker containers running different versions of Driverless AI with 1 GPU (Tesla P100) each • NVIDIA device kernel (driver version: 390.46) • NVIDIA CUDA (9.x) and cuDNN libraries including 1. libcudnn7-7.4.2.24-1.cuda9.0.x86_64.rpm 2. libcudnn7-devel-7.4.2.24- 1.cuda9.0.x86_64.rpm Source: https://medium.com/linagora-engineering/making-image-classification-simple-with-spark-deep-learning-f654a8b876b8 H2O Driverless AI with GPUs
  • 16. • The user authenticates on Driverless AI • Import datasets from BlueData DataTap with DataTap connector, optimized access with BlueData IOBoost • Analyze the data • Run experiments • Build models, save them … • Validate against other datasets from DataTap … • Export model for production Run Driverless AI on Containers with GPUs dtap
  • 17. • Optionally initialize Sparkling Water against an existing H2O cluster created previously [external backend] • Pass to Sparkling Water the appropriate jar to use for the HDFS connectivity • Work on your dataset using the HDFS connectivity Work with Sparkling Water Cluster and HDFS
  • 18. • BlueData EPIC automatically deploys the environments • Using persistent containers • Providing true multi-tenancy • Access to shared resources (CPU, RAM, GPUs, storage) • Pre-built H2O images in the BlueData EPIC App Store • Enterprise-grade security (integration with AD /LDAP / TDE) Simplify H2O Deployments at Scale in Minutes
  • 19. BlueData DataTap BlueData IOBoost Enable Compute / Storage Separation Connect the clusters to different datasets without copying the data, and with performance optimized
  • 20. From the BlueData EPIC App Store, deploy more application clusters to connect to H2O Integrate H2O with Production Environment
  • 21. • Infrastructure for distributed ML / DL is complex (CPUs, GPUs, data …)  This complexity can be abstracted from data science teams with self- service provisioning and automation, using containers  GPU access can be effectively used by the containerized application, then released for other applications and users  For a flexible and scalable solution, data resources should be decoupled from compute • H2O, Driverless AI, and Sparkling Water can be deployed at scale on containers – whether on-premises, on any public cloud, or hybrid  BlueData + H2O proven in production with Global 2000 enterprises Lessons Learned – H2O on Containers
  • 22. Thank you! www.bluedata.ai @BlueData @NandaVijaydev Nanda Vijaydev Lead Data Scientist and Senior Director of Solutions BlueData (now part of HPE) https://www.linkedin.com/in/nanda-vijaydev-3638693 #H2OWORLD

Editor's Notes

  1. Deep learning uses general learning algorithms The algorithms need to build the layers of an artificial neural network Training data Processing this training data requires lots of computation Matrix multiplications
  2. The #1 challenge with respect to bringing the DevOps mindset to to Big Data is the scalability, reproducibility and repeatability. It’s easy enough for developers to work on their laptops. Data scientists sometimes prototype the entire pipeline on a powerful laptop with a whatever it takes, “make it work” mentality. You can take a single node VM, install a bunch of libraries and work on smallish data sets. But will that same program successfully deploy and work on a real environment that uses multi-node clusters, potentially different versions and libraries and more importantly significantly larger volumes of data. This last aspect is unique to the Big Data and is one of single biggest reason that data team are unable to iterate rapidly ML / DL local Single node VM Local libraries Limited data (10s of GB) “It works on my laptop” Multi-node environments Different versions Different environment variables Libraries and dependencies must exist on all nodes Big Data (TBs of data)
  3. The BlueData EPIC software platform leverages Docker container technology – together with patented innovations – to deliver self-service, speed, security, and efficiency for Big Data Analytics, Data Science, and AI / ML / DL environments. The key components of the BlueData EPIC (which stands for Elastic Private Instant Clusters) platform are: • ElasticPlane™ enables users to spin up virtual clusters on-demand in a secure, multi-tenant environment. • IOBoost™ ensures performance on par with bare-metal, with the agility and simplicity of Docker containers. • DataTap™ accelerates time-to-value for Big Data by eliminating time-consuming data movement. Our software platform can be deployed on any infrastructure – whether on-premises, in the public cloud (e.g. AWS and now Azure and GCP), or a hybrid architecture