Cloud native orchestrators like AWS Step Functions and Amazon SageMaker Pipelines can be used to construct scalable end-to-end deep learning pipelines in the cloud. These orchestrators provide centralized monitoring, logging, and scaling capabilities. AWS Step Functions is useful for integrating pipelines with production infrastructure, while SageMaker Pipelines is good for research workflows that require validation. Serverless architectures using services like AWS Lambda, Batch, and Fargate can build scalable and flexible pipelines at a low cost.
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the Cloud
1. Building Scalable End-to-End Deep Learning Pipelines in the Cloud
Rustem Feyzkhanov
Machine Learning Engineer @ Instrumental
AWS Machine Learning Hero
2. DataTalks.Club
Takeaways
• Cloud native orchestrators are convenient for constructing scalable end-to-end deep learning pipelines
• There are multiple services at your disposal for constructing deep learning workflows, and the right choice depends on your context
• You can deploy these kinds of workflows fairly easily, even for research projects
3. DataTalks.Club
Data science process
(from https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview)
• Business understanding - define objectives, identify data sources
• Data acquisition - ingest data, explore data, update data
• Modeling - feature selection, create model, train model
• Deployment - operationalize
• Customer acceptance - testing and validation, handoff, re-train and re-score
5. DataTalks.Club
What is serverless
[Stack diagram comparing On premise, IaaS, CaaS, PaaS, FaaS, and SaaS across the layers Functions, Application, Runtime, Container, Operating system, Virtualization, Networking, Storage, and Hardware; each model further right shifts more of the stack from you to the provider.]
6. DataTalks.Club
What is serverless
[Same stack diagram as slide 5.]
7. DataTalks.Club
Serverless
• On-demand cluster/worker/service that scales with your consumption
• No upfront costs, pay-as-you-go pricing, no payment for idle time*
• Low operational overhead
• Defined as Infrastructure as Code (IaC)
• Built-in service integrations
8. DataTalks.Club
Why serverless
• Low operational overhead + no upfront costs => easy to start
• Defined as IaC + built-in service integrations => flexible infrastructure
• Scalable + built-in service integrations => integrates with production infrastructure
10. DataTalks.Club
Data preprocessing
Task
• Getting and transforming data from multiple sources
Challenges
• Combination of multiple frameworks and libraries
• Scaling based on load
• Combination of heavy, long-running, and parallel processing
11. DataTalks.Club
ML/DL training
Tasks
• Training and publishing the model
• Checking multiple sets of hyperparameters
• Handling semi-automatic logic
Challenges
• High cost of GPU instances
• Higher level of uncertainty compared to conventional software engineering
12. DataTalks.Club
ML/DL inference
Task
• Making predictions based on new incoming data
Challenges
• Handling production requirements: latency/load/cost
• Handling multiple frameworks
• Handling model versioning
• Implementing custom logic for choosing the result
13. DataTalks.Club
Container/Function-as-a-Service
[Stack diagram from slide 5 repeated to situate CaaS and FaaS.]
14. DataTalks.Club
'Serverless' cluster
• On-demand cluster/worker that scales with your consumption
• Requires only your code and a launch configuration
• Scaling techniques:
  • scales based on a job queue (AWS Batch)
  • starts a VM per job (AWS Fargate, Amazon SageMaker)
  • starts a worker per job (AWS Lambda)
15. DataTalks.Club
AWS Batch
• On-demand cluster that scales based on a job queue
• Consists of the following components:
  • job definition
  • job queue
  • compute environment
  • scheduler
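As a rough sketch of what driving Batch from code looks like, the call below submits a containerized job to an existing queue; the queue name, job definition, and command are hypothetical placeholders.

```python
import boto3

batch = boto3.client("batch")

# Submit a containerized GPU training job to an existing job queue.
# "dl-training-queue" and "dl-training-jobdef" are hypothetical names;
# the compute environment behind the queue scales up from zero when jobs arrive.
response = batch.submit_job(
    jobName="train-model-run-001",
    jobQueue="dl-training-queue",
    jobDefinition="dl-training-jobdef",
    containerOverrides={
        "command": ["python", "train.py", "--epochs", "10"],
        "resourceRequirements": [{"type": "GPU", "value": "1"}],
    },
)
print(response["jobId"])
```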
16. DataTalks.Club
AWS Fargate
• CaaS service that starts a VM per job
• Can be used both for on-demand processing and as a scalable web server
• Supports only customizable CPU instances (no GPUs)
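A hedged sketch of launching a one-off processing task on Fargate through the ECS API; the cluster, task definition, and subnet ID below are hypothetical placeholders.

```python
import boto3

ecs = boto3.client("ecs")

# Run a single on-demand processing task on Fargate.
# Cluster, task definition, and subnet are hypothetical names.
ecs.run_task(
    cluster="preprocessing-cluster",
    launchType="FARGATE",
    taskDefinition="preprocess-task:1",
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "assignPublicIp": "ENABLED",
        }
    },
)
```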
17. DataTalks.Club
Amazon SageMaker - Processing jobs
• CaaS service that starts VM(s) per job
• Can be used only for on-demand processing
• Can be used only with specific ml.* instance types
• Instances can mount S3 buckets as disks
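A minimal sketch of a Processing job via the SageMaker Python SDK; the container image, role ARN, and S3 paths are hypothetical placeholders.

```python
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

# Hypothetical image URI and role ARN.
processor = ScriptProcessor(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/preprocess:latest",
    command=["python3"],
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# S3 locations are mounted onto the instance as local paths.
processor.run(
    code="preprocess.py",
    inputs=[ProcessingInput(source="s3://my-bucket/raw",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://my-bucket/processed")],
)
```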
18. DataTalks.Club
Amazon SageMaker - Training jobs
• CaaS service that starts VM(s) per job
• Used for on-demand processing, and can also run a cluster of VMs per job for distributed training
• Can be used only with specific ml.* instance types
• Instances can mount S3 buckets as disks
• Handles monitoring, hyperparameters, data import, and model export for you
• Supports spot instances and handles checkpointing
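A minimal sketch of a training job with spot instances and checkpointing enabled; the image, role, and bucket names are hypothetical placeholders.

```python
from sagemaker.estimator import Estimator

# Managed spot training with checkpointing; URIs and role are hypothetical.
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/train:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=2,                    # >1 runs a cluster for distributed training
    instance_type="ml.p3.2xlarge",
    use_spot_instances=True,             # spot support
    max_run=3600,
    max_wait=7200,                       # must be >= max_run for spot
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # survives spot interruptions
    hyperparameters={"epochs": "10", "lr": "0.001"},
)
estimator.fit({"train": "s3://my-bucket/train/"})
```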
20. DataTalks.Club
'Serverless' cluster comparison (limits and pricing updated at re:Invent 2020)
• AWS Lambda - FaaS. Pros: fast startup time (~100ms); priced per 1ms (previously per 100ms); very scalable. Cons: higher price per CPU-second; timeout limit; CPU limit of 6 vCPU (previously 2 vCPU); RAM limit of 10GB (previously 3GB). Use cases: short-term processes.
• Amazon SageMaker - pure container(s) as a service. Pros: most instance types available; built-in dashboard; spot instances available. Cons: medium startup time (~30s-1min); priced per second with a 1-minute minimum. Use cases: long-running GPU processes.
• AWS Fargate - pure container as a service. Pros: customizable instances; medium startup time (~10-20s); spot instances available. Cons: priced per second with a 1-minute minimum; CPU only. Use cases: long-running CPU processes.
• AWS Batch - service that starts a cluster and executes jobs on it. Pros: full control of the VM; spot instances available. Cons: slow startup time (~1-4min); priced per second with a 1-minute minimum. Use cases: medium-running CPU/GPU workloads with multiple tasks.
21. DataTalks.Club
CPU vs GPU for ML
• Speed of single inference/training
• Speed of batch inference
• Cost per inference/training
• Scalability
22. DataTalks.Club
Inference cost - Inception V3

| Service | Type | Inference time (s) | Cost per hour | Cost per prediction | Cost of 1M predictions | Cost per month | Lambda predictions |
|---|---|---|---|---|---|---|---|
| Lambda 3GB RAM | 2 vCPU | 0.338 | $0.18 | $0.0000179 | $17.90 | - | - |
| AWS EC2 c5a.large | on demand | 0.177 | $0.077 | $0.000003786 | $3.79 | $55.44 | 3.1M |
| AWS EC2 c5a.large | spot | 0.177 | $0.032 | $0.000001573 | $1.57 | $23.04 | 1.29M |
| AWS EC2 p2.xlarge | on demand | 0.057 | $0.90 | $0.00001425 | $14.25 | $648.00 | 36.2M |
| AWS EC2 p2.xlarge | spot | 0.057 | $0.27 | $0.000004275 | $4.28 | $194.40 | 10.86M |
| AWS EC2 p3.2xlarge | on demand | 0.027 | $3.06 | $0.00002295 | $22.95 | $2203.20 | 123.1M |
| AWS EC2 p3.2xlarge | spot | 0.027 | $0.918 | $0.000006885 | $6.89 | $660.96 | 36.93M |
| AWS EC2 inf1.large | on demand | 0.0095 | $0.368 | $0.000000971 | $0.97 | $264.96 | 14.8M |
| AWS EC2 inf1.large | spot | 0.0095 | $0.1104 | $0.000000291 | $0.29 | $79.49 | 4.44M |

(Lambda predictions: the number of Lambda predictions that would cost the same as running the instance for a month.)
23. DataTalks.Club
Inference cost - Inception V3
[Same table as slide 22.]
24. DataTalks.Club
Price comparison - CPU
• C5 Large instance - 2 vCPU, 4GB RAM
• AWS Lambda: 3GB RAM x $0.00001667 x 3600 = $0.18 per hour
• AWS Fargate: 4GB RAM x $0.0044 + 2 vCPU x $0.0404 = $0.098 per hour
• AWS Batch: C5 Large on demand = $0.085 per hour; C5 Large spot = $0.033 per hour
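The same arithmetic, reproduced as a few lines of Python so it is easy to re-check against current prices (rates as quoted on the slide):

```python
# Per-hour cost arithmetic from the slide (prices as quoted there).
LAMBDA_GB_SECOND = 0.00001667                  # $ per GB-second
lambda_per_hour = 3 * LAMBDA_GB_SECOND * 3600  # 3GB RAM for one hour
fargate_per_hour = 4 * 0.0044 + 2 * 0.0404     # 4GB RAM + 2 vCPU

print(f"Lambda:  ${lambda_per_hour:.3f}/h")    # ~$0.180
print(f"Fargate: ${fargate_per_hour:.3f}/h")   # ~$0.098
```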
25. DataTalks.Club
Price comparison - GPU
• P2 Xlarge instance - 1 NVIDIA K80 GPU, 4 vCPU
• Amazon SageMaker (recently reduced prices by up to 18%): P2 Xlarge ML instance = $1.12 per hour; P2 Xlarge ML instance spot = $0.33 per hour
• AWS Batch: P2 Xlarge on demand = $0.90 per hour; P2 Xlarge reserved = $0.42 per hour; P2 Xlarge spot = $0.27 per hour
29. DataTalks.Club
Platform-as-a-Service
[Stack diagram from slide 5 repeated to situate PaaS.]
30. DataTalks.Club
Microservice connectors
• REST API - synchronous, short-term processes; simple intermediate logic; doesn't trace the whole process; cheap
• Event queue - asynchronous, long-term processes; simple intermediate logic; doesn't trace the whole process; cheap
• Orchestrator - asynchronous, long-term processes; complex intermediate logic; traces the process; expensive
31. DataTalks.Club
Cloud native orchestrators
• Native support for FaaS and CaaS
• Central monitoring
• Central logging and tracing
• On-demand scaling*
32. DataTalks.Club
AWS Step Functions
• Graph-based workflow
• Processing nodes - support for Fargate, ECS, SageMaker, Lambda, Batch, Glue, EMR
• Logic for custom error handling
• Parallel dynamic execution
• Branching and loops
• Scheduler and waiter
• Pay-as-you-go ($0.025 per 1,000 state transitions)
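To make the pay-per-state-transition model concrete, here is a hedged sketch of a two-state machine in Amazon States Language (a Lambda preprocessing step followed by a Batch training job), deployed with boto3; all ARNs and names are hypothetical.

```python
import json
import boto3

# Minimal two-state workflow: Lambda preprocess, then a Batch training job.
# The .sync integration makes Step Functions wait for the Batch job to finish.
definition = {
    "StartAt": "Preprocess",
    "States": {
        "Preprocess": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:preprocess",
            "Next": "Train",
        },
        "Train": {
            "Type": "Task",
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
                "JobName": "train",
                "JobQueue": "dl-training-queue",
                "JobDefinition": "dl-training-jobdef",
            },
            "End": True,
        },
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="dl-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsRole",
)
```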
33. DataTalks.Club
SageMaker Pipelines
• DAG-based workflow
• Processing nodes - SageMaker Processing, Training, DataBrew
• Logic for custom error handling
• Data lineage tracking
• Human review step
• Free*
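For comparison, a sketch of the same preprocess-then-train shape with the SageMaker Pipelines SDK; it assumes the `processor` and `estimator` objects from the earlier sketches, and the pipeline name and role ARN are hypothetical.

```python
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

# Reuses the ScriptProcessor and Estimator from the earlier sketches.
preprocess_step = ProcessingStep(
    name="Preprocess", processor=processor, code="preprocess.py"
)
train_step = TrainingStep(
    name="Train", estimator=estimator,
    inputs={"train": "s3://my-bucket/processed/"},
)
train_step.add_depends_on([preprocess_step])

pipeline = Pipeline(name="dl-pipeline", steps=[preprocess_step, train_step])
pipeline.upsert(role_arn="arn:aws:iam::123456789012:role/SageMakerRole")
pipeline.start()
```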
34. DataTalks.Club
Managed Apache Airflow (MWAA)
• DAG-based workflow (Airflow)
• Processing nodes - anything Airflow can support (works with plugins)
• High flexibility
• Doesn't scale automatically
• Pay per instance-time
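And the Airflow equivalent, as a minimal DAG file; in MWAA you upload a file like this to the environment's S3 DAGs folder. The task bodies are stubs, so this is a shape sketch rather than a working pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def preprocess():
    ...  # pull and transform data

def train():
    ...  # launch or run training

# Same preprocess -> train shape as the Step Functions sketch.
with DAG(
    "dl_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="preprocess", python_callable=preprocess) >> \
        PythonOperator(task_id="train", python_callable=train)
```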
35. DataTalks.Club
Cloud native orchestrators
• AWS Step Functions - PaaS. Pros: scales automatically; integrations with many AWS services; pay-as-you-go. Cons: can't run in a local environment; pipeline must be handled manually as an artifact. Use cases: integration with production infrastructure.
• SageMaker Pipelines - PaaS. Pros: scales automatically; automatically tracks model lineage; free. Cons: can't run in a local environment; only integrates with SageMaker services. Use cases: research and semi-production workflows that require more validation.
• MWAA (Airflow) - hosted Airflow. Pros: easy to run locally; extremely flexible with plugins. Cons: scales manually; pay per instance time; pipeline must be handled manually as an artifact. Use cases: integration with complex production infrastructure.
36. DataTalks.Club
Serverless approach
• Use scalable processing nodes (AWS Lambda) for short/parallel processing
• Use a scalable container service (AWS Batch) for heavy, parallel processing and GPU training jobs
• Use Amazon SageMaker for GPU training jobs and distributed training
• Use a scalable container service (AWS Fargate) for long-running processing
• Use an orchestrator (AWS Step Functions) to organize workflows
37. DataTalks.Club
Data preprocessing
Task
• Getting and transforming data from multiple sources
Challenges
• Combination of multiple frameworks and libraries
• Scaling based on load
• Combination of heavy, long-running, and parallel processing
38. DataTalks.Club
Amazon SageMaker
• SageMaker Processing jobs for heavy processing
• Modular approach
• Parallel data download and parsing
• FaaS for parallel processing
[Diagram: data download feeding SageMaker, followed by scalable processing.]
39. DataTalks.Club
AWS Batch/Fargate
• Manual handling of input/output data
• FaaS for parallel processing
• CaaS for heavy processing
• Modular approach
• Parallel data download and parsing
[Diagram: data download feeding Batch/Fargate, with input data read from S3 and output data written back to S3, followed by scalable processing.]
40. DataTalks.Club
How do you know if this is for you
• You have peak loads and want to scale automatically
• You have custom logic (scheduler, error handling, etc.) in your business logic
• You want a customizable pipeline with multiple frameworks
41. DataTalks.Club
How do you know if this is NOT for you
• You need to run synchronous data processing workflows => in this case calling AWS Lambda or a cluster is easier
• You have a low CPU/RAM-consuming workflow and want to optimize costs => in this case SQS/Kinesis + AWS Lambda is a cheaper solution
43. DataTalks.Club
ML/DL training
Tasks
• Training and publishing the model
• Checking multiple sets of hyperparameters
• Handling semi-automatic logic
Challenges
• High cost of GPU instances
• Higher level of uncertainty compared to conventional software engineering
44. DataTalks.Club
Amazon SageMaker
• Automatic handling of hyperparameters and metrics
• Automatic handling of model and input data
• Automatic hyperparameter optimization
• Automatic checkpoint handling
• Error handling on each branch
• Distributed training
[Diagram with Preprocessor, SageMaker, Mapper, and Handler nodes.]
45. DataTalks.Club
AWS Batch
• Parallel training on multiple sets of hyperparameters
• Central gathering of the results
• Error handling on each branch
• Capability for a feedback loop
• Test after training
[Diagram: Preprocessor fanning out to multiple ML jobs on Batch, with a Mapper and Publisher gathering results; parameters, checkpoint data, and the model are stored in S3.]
46. DataTalks.Club
External infrastructure
• Integrating the production cloud environment with on-premise infrastructure
• Preparing data and providing access
• Handling publishing of the completed model
[Diagram: Preprocessor and Handler exchanging data, parameters, metrics, and the model with an external GPU through S3 and an async task.]
47. DataTalks.Club
How do you know if this is for you
• You have peak loads and want to scale automatically
• You need to run training jobs occasionally and want to minimize idle time
• You have custom logic (scheduler, error handling, etc.) in your business logic
• You want to integrate external infrastructure or multiple AWS services
48. DataTalks.Club
How do you know if this is NOT for you
• You need to run synchronous model training workflows => in this case using a cluster is easier, as Step Functions doesn't support synchronous workflows
• You need to maximize training speed => in this case using a cluster minimizes startup time
50. DataTalks.Club
ML/DL inference
Task
• Making predictions based on new incoming data
Challenges
• Handling production requirements: latency/load/cost
• Handling multiple frameworks
• Handling model versioning
• Implementing custom logic for choosing the result
55. DataTalks.Club
ML/DL inference pipeline
• A/B testing / multi-armed bandit to roll out new models
• Scalable inference that allows running batches in parallel
• Allows a modular approach (multiple frameworks)
[Diagram: preprocessor/feature extractor feeding Inference A and Inference B, a gather-data step, and a post processor.]
56. DataTalks.Club
How to import models
Import from S3:
• Keras - h5 files
• TensorFlow - pb/ckpt files
• PyTorch - pth files
Models in package:
• TensorFlow - TFLite export
• PyTorch - ONNX export
• OpenVINO export
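A sketch of the import-from-S3 route for a Keras h5 model inside a Lambda function; the bucket and key are hypothetical, and /tmp is Lambda's only writable path.

```python
import boto3
import tensorflow as tf

# Download the model artifact from S3 to Lambda's writable /tmp,
# then load it with Keras. Bucket and key are hypothetical names.
s3 = boto3.client("s3")
s3.download_file("my-model-bucket", "models/v1/model.h5", "/tmp/model.h5")
model = tf.keras.models.load_model("/tmp/model.h5")
```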
59. DataTalks.Club
Lifehacks for serverless inference
• Store the model in memory for warm invocations
• Use AWS EFS for storing the model
• Store part of the model with the libraries
• Download the model in parallel from storage
• Separate layers onto multiple Lambdas and chain them
• Batch the workload
• Balance RAM/timeout to optimize your costs
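A sketch combining two of these lifehacks, under the assumption of a hypothetical EFS mount path: the loaded model is cached at module scope so warm invocations skip the load, and the whole batch is predicted in a single call.

```python
import numpy as np
import tensorflow as tf

# Hypothetical EFS mount path configured on the Lambda function.
EFS_MODEL_PATH = "/mnt/models/inception_v3.h5"
_model = None  # module scope survives between warm invocations

def handler(event, context):
    global _model
    if _model is None:
        # Cold start only: load once from EFS; warm calls reuse the cache.
        _model = tf.keras.models.load_model(EFS_MODEL_PATH)
    # Batch the workload: predict on all items in one call.
    batch = np.array(event["batch"], dtype=np.float32)
    return _model.predict(batch).tolist()
```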
60. DataTalks.Club
How do you know if this is for you
• You want to deploy your model for a pet project
• You want to make a simple MVP for your startup/project
• You have a simple model and this architecture will reduce cost
• You have peak loads and it is hard to manage clusters
61. DataTalks.Club
How do you know if this is NOT for you
• You want a real-time response
• Your model requires a lot of data
• Your model requires a lot of processing power
• You want to handle a large number of requests (>10M per month) => in this case a cluster would be a more suitable approach
62. DataTalks.Club
Repositories to check
https://github.com/ryfeus/lambda-packs
https://github.com/ryfeus/gcf-packs
• Packages for AWS Lambda and Google Cloud Functions including:
  • TensorFlow (including 2.0), PyTorch - deep learning
  • Scikit-learn, LightGBM, H2O - machine learning
  • Scikit-image, SciPy, OpenCV, Tesseract - image processing
  • spaCy - natural language processing
63. DataTalks.Club
Summary
• Cloud native orchestrators are convenient for constructing scalable end-to-end deep learning pipelines
• There are multiple services at your disposal for constructing deep learning workflows, and the right choice depends on your context
• You can deploy these kinds of workflows fairly easily, even for research projects
64. DataTalks.Club
Thank you!
Packages for AWS Lambda and Google Cloud Functions:
https://github.com/ryfeus/lambda-packs
https://github.com/ryfeus/gcf-packs
Infrastructure configuration files for AWS Step Functions, AWS Batch, AWS Fargate, Amazon SageMaker:
https://github.com/ryfeus/stepfunctions2processing
Link to my website: https://ryfeus.io