This session was recorded in San Francisco on February 5th, 2019 and can be viewed here: https://youtu.be/CgoxjmdyMiU This session will discuss how to get up and running quickly with containerized H2O environments (H2O Flow, Sparkling Water, and Driverless AI) at scale, in a multi-tenant architecture with a shared pool of resources using CPUs and/or GPUs. See how you can spin up (and tear down) your H2O environments on demand, with just a few mouse clicks. Find out how to enable quota management of GPU resources for greater efficiency, and easily connect your compute to your datasets for large-scale distributed machine learning. Learn how to operationalize your machine learning pipelines and deliver faster time-to-value for your AI initiative, while ensuring enterprise-grade security and high performance. Bio: Nanda Vijaydev is senior director of solutions at BlueData (now HPE), where she leverages technologies like Hadoop, Spark, and TensorFlow to build solutions for enterprise analytics and machine learning use cases. Nanda has 10 years of experience in data management and data science. Previously, she worked on data science and big data projects in multiple industries, including healthcare and media; was a principal solutions architect at Silicon Valley Data Science; and served as director of solutions engineering at Karmasphere. Nanda has an in-depth understanding of the data analytics and data management space, particularly in the areas of data integration, ETL, warehousing, reporting, and machine learning.
This document discusses MLOps and Kubeflow. It begins with an introduction to the speaker and defines MLOps as addressing the challenges of independently autoscaling machine learning pipeline stages, choosing different tools for each stage, and seamlessly deploying models across environments. It then introduces Kubeflow as an open source project that uses Kubernetes to minimize MLOps efforts by enabling composability, scalability, and portability of machine learning workloads. The document outlines key MLOps capabilities in Kubeflow like Jupyter notebooks, hyperparameter tuning with Katib, and model serving with KFServing and Seldon Core. It describes the typical machine learning process and how Kubeflow supports experimental and production phases.
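Model serving with KFServing, mentioned above, is typically declared as a Kubernetes custom resource rather than deployed by hand. The following is a minimal sketch of such a manifest; the service name and storage URI are illustrative placeholders, and field names can vary between KFServing versions:

```yaml
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris            # illustrative name
spec:
  predictor:
    sklearn:
      # hypothetical bucket path; point this at your own trained model
      storageUri: gs://my-models/sklearn/iris
```

Applying a manifest like this with `kubectl apply -f` asks Kubeflow to stand up an autoscaled HTTP scoring endpoint for the model, which is the "seamless deployment across environments" capability the summary refers to.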
This meetup was recorded in Mountain View, CA on January 10th, 2019. Video recording from the meetup can be viewed here: https://youtu.be/yN26i7e_BtM Spark pipelines are a powerful concept for productionizing machine learning workflows. Their API makes it possible to combine data processing with machine learning algorithms and opens opportunities for integration with various machine learning libraries. However, to benefit from the power of pipelines, users need the freedom to choose and experiment with any machine learning algorithm or library. Therefore, we developed Sparkling Water, which embeds the H2O machine learning library of advanced algorithms into the Spark ecosystem and exposes them via the pipeline API. Furthermore, the algorithms benefit from H2O MOJOs (Model Object, Optimized), a powerful concept shared across the entire H2O platform for storing and exchanging models. MOJOs are designed for effective model deployment, with a focus on scoring speed, traceability, exchangeability, and backward compatibility. In this talk we will explain the architecture of Sparkling Water, focusing on its integration with Spark pipelines and MOJOs. We’ll demonstrate the creation of pipelines that integrate H2O machine learning models and their deployment using Scala or Python. Furthermore, we will show how to use pre-trained MOJO models with Spark pipelines. Speaker's Bio: Michal is the VP of Engineering at H2O.ai! Michal is a geek and developer, and a Java, Linux, and programming-languages enthusiast who has been developing software for over 15 years. He obtained his PhD from Charles University in Prague in 2012 and completed a post-doc at Purdue University. During his studies he was interested in the construction of distributed, embedded, and real-time component-based systems using model-driven methods and domain-specific languages. He participated in the design and development of various systems, including the SOFA and Fractal component systems and the jPapabench control system.
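As a rough illustration of the pipeline integration described above, here is a minimal Python sketch using Sparkling Water's `pysparkling` package. The class and parameter names follow recent Sparkling Water releases and may differ by version, and the file paths and column names are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pysparkling import H2OContext
from pysparkling.ml import H2OGBM

spark = SparkSession.builder.appName("sparkling-water-pipeline").getOrCreate()
hc = H2OContext.getOrCreate()  # starts H2O on top of the Spark cluster

train = spark.read.csv("train.csv", header=True, inferSchema=True)

# An H2O algorithm exposed as a standard Spark ML estimator,
# so it composes with any other Spark pipeline stages.
gbm = H2OGBM(labelCol="label")
model = Pipeline(stages=[gbm]).fit(train)

# The fitted stage is backed by a MOJO, so the saved pipeline model
# can later be reloaded for scoring without training infrastructure.
model.write().overwrite().save("gbm_pipeline_model")
```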
This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/rKoBJcnsFpM Speaker's Bio: Mateusz is a software developer who loves all things distributed and machine learning, and hates buzzwords. His favourite hobby is data juggling. He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at L’ECE Paris in France and worked on distributed flight booking systems. After graduation he moved to Tokyo to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects, where he is still currently based. In his spare time he tries to be part of the IT community by organizing, attending, and speaking at conferences and meetups.
Presented at #H2OWorld 2017 in Mountain View, CA. Learn more about H2O.ai: https://www.h2o.ai/. Follow @h2oai: https://twitter.com/h2oai. - - - Effective volume anomaly detection presents unique challenges when monitoring customer transaction volumes across thousands of platforms and systems. We overcome this by using H2O, building on open source tools, and delivering machine learning anomaly detection at enterprise scale. Hear how we model, visualize, and then automatically alert on anomalous mobile app volumes in real time. Donald Gennetten has over 15 years of experience supporting digital channels in the Financial Services industry. In his current role as a Data Engineer for Capital One’s Monitoring Intelligence team, he leads a cross-functional group of Data, Business, and Engineering subject matter experts to deliver Advanced Analytics solutions for real-time customer transaction monitoring and issue detection. Rahul Gupta is a Data Engineer in Capital One's Center for Machine Learning, focusing heavily on back-end development and model creation. His primary efforts include building an Algorithmic IT Operations (AIOps) platform that utilizes a combination of batch and streaming data with Machine Learning capabilities to improve the stability of Capital One services and the overall customer experience.
Presented at #H2OWorld 2017 in Mountain View, CA. Enjoy the video: https://youtu.be/r9S3xchrzlY. Learn more about H2O.ai: https://www.h2o.ai/. Follow @h2oai: https://twitter.com/h2oai. - - - Abstract: Venkatesh will explore how Driverless AI is helping to keep fraudsters at bay, and will share results from experiments conducted on large-scale payment transaction data. Venkatesh's Bio: Venkatesh is a senior data scientist at PayPal, where he is working on building state-of-the-art tools for payment fraud detection. He has over 20 years of experience in designing, developing, and leading teams to build scalable server-side software. In addition to being an expert in big-data technologies, Venkatesh holds a Ph.D. in Computer Science with a specialization in Machine Learning and Natural Language Processing (NLP) and has worked on various problems in the areas of Anti-Spam, Phishing Detection, and Face Recognition.
The document discusses using Microsoft Azure cloud services for game development and operations. It provides examples of how games use Azure for scalable storage, global load balancing for multiplayer games, predictive analytics using big data, and DevOps approaches for deployment, monitoring, and development. Key Azure services highlighted include Storage, SQL Database, Virtual Machines, Mobile Services, HDInsight, and Application Insights.
Enjoy the webinar recording here: https://youtu.be/Lll1qwQJKVw. Driverless AI speeds up data science workflows by automating feature engineering, model tuning, ensembling, and model deployment. In this presentation, Arno Candel (CTO, H2O.ai) gives a quick overview and guides attendees through an interactive hands-on lab using Qwiklabs. Driverless AI turns Kaggle-winning recipes into production-ready code and is specifically designed to avoid common mistakes such as underfitting, overfitting, data leakage, or improper model validation. Avoiding these pitfalls alone can save weeks or more for each model, and is necessary to achieve high modeling accuracy. With Driverless AI, everyone can now train and deploy modeling pipelines with just a few clicks from the GUI. Advanced users can use the client/server API through a variety of languages such as Python, Java, C++, Go, C#, and many more. To speed up training, Driverless AI uses highly optimized C++/CUDA algorithms to take full advantage of the latest compute hardware. For example, Driverless AI runs orders of magnitude faster on the latest NVIDIA GPU supercomputers on Intel and IBM platforms, both in the cloud and on-premises. There are two more product innovations in Driverless AI: statistically rigorous automatic data visualization and interactive model interpretation with reason codes and explanations in plain English. Both help data scientists and analysts quickly validate their data and models.
The presentation topic for this meetup was covered in two sections without any breaks in between. Section 1: Business Aspects (20 mins). Speaker: Rasmi Mohapatra, Product Owner, Experian https://www.linkedin.com/in/rasmi-m-428b3a46/ Once your data science application is in production, there are many typical data science operational challenges experienced today across business domains; we will cover a few of these challenges with example scenarios. Section 2: Tech Aspects (40 mins, slides & demo, Q&A). Speaker: Santanu Dey, Solution Architect, Iguazio https://www.linkedin.com/in/santanu/ In this part of the talk, we will cover how these operational challenges can be overcome, e.g. automating data collection and preparation, making ML models portable and deploying them in production, monitoring and scaling, etc., with relevant demos.
Data Con LA 2020 Description Machine learning is an essential skill in today's job market. But when it comes to learning machine learning, beginners get a lot of conflicting advice. I have been teaching ML to software engineers for years. In this talk I will dispel some of the myths surrounding machine learning, give you a solid, tangible plan on how to go about learning ML, give you good pointers to start from, and steer you away from common mistakes. Speaker: Sujee Maniyam, Elephant Scale, Founder, Principal Instructor
This in-depth training on H2O Driverless AI was given by Wen Phan on June 28th, 2018. He elaborated on the automatic feature engineering, machine learning interpretability, and automatic visualization components of this groundbreaking product.
A talk for SF big analytics meetup. Building, testing, deploying, monitoring and maintaining big data analytics services. http://hydrosphere.io/
The document discusses building real-time targeting capabilities at Capital One. It introduces two speakers, Ryan Zotti and Subbu Thiruppathy, and describes challenges around striving for speed in everything. It then covers how to achieve fast model data, training, deployment, and scoring through techniques like using the most up-to-date data, distributed computing in the cloud, automatic model refitting, and response times under 100 milliseconds.
The document discusses designing scalable platforms for artificial intelligence (AI) and machine learning (ML). It outlines several challenges in developing AI applications, including technical debt, unpredictability, and different data and compute needs compared to traditional software. It then reviews existing commercial AI platforms and common components of AI platforms, including data access, ML workflows, computing infrastructure, model management, and APIs. The rest of the document focuses on eBay's Krylov project as an example AI platform, outlining its architecture, the challenges of deploying platforms at scale, and the skill sets needed on the platform team.
In this presentation, Parul Pandey will provide a history and overview of the field of “Automatic Machine Learning” (AutoML), followed by a detailed look inside H2O’s open source AutoML algorithm. H2O AutoML provides an easy-to-use interface which automates data pre-processing, training, and tuning a large selection of candidate models (including multiple stacked ensemble models for superior model performance). The result of the AutoML run is a “leaderboard” of H2O models which can be easily exported for use in production. AutoML is available in all H2O interfaces (R, Python, Scala, web GUI) and, due to the distributed nature of the H2O platform, can scale to very large datasets. The presentation will end with a demo of H2O AutoML in R and Python, including a handful of code examples to get you started using automatic machine learning on your own projects. Parul's Bio: Parul is a Data Science Evangelist here at H2O.ai. She combines data science, evangelism, and community in her work. Her emphasis is on spreading information about H2O and Driverless AI to as many people as possible. She is also an active writer and has contributed to various national and international publications.
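The Python side of the workflow described above generally follows the pattern below. This is a minimal sketch, assuming a local H2O cluster can be started and a CSV file with a categorical response column is available; the file path and column name are placeholders:

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()  # start (or connect to) a local H2O cluster

# Placeholder dataset with a binary target column named "response".
train = h2o.import_file("train.csv")
train["response"] = train["response"].asfactor()  # mark as classification target

# Train up to 10 base models (plus stacked ensembles) within the time budget.
aml = H2OAutoML(max_models=10, max_runtime_secs=600, seed=1)
aml.train(y="response", training_frame=train)

# The leaderboard ranks all candidate models by cross-validated metric.
print(aml.leaderboard)

# The leading model can be exported as a MOJO for production scoring.
aml.leader.download_mojo(path="./")
```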
This session took place in New York City on November 4th, 2019. Speaker Bio: Chemere is a Senior Data Science Training Specialist for H2O.ai. Chemere has a Master's in Business Administration with a focus in Marketing Analytics from the University of North Carolina at Charlotte. She is an experienced data scientist with a diverse background in transformational decision-making in various industries, including Banking, Manufacturing, Logistics, and Medical Devices. Chemere joins us from Venus Concept/2two5, where she was the Lead Data Scientist focused on building predictive models with Internet of Things (IoT) data and on a subscription-based marketing product for B2B customers. Prior to that, Chemere worked as a Senior Data Scientist at Wells Fargo Bank focused on various applied predictive analytics solutions. More details about the event can be found here: https://www.eventbrite.com/e/dive-into-h2o-new-york-tickets-76351721053