Serving Deep Learning Models At Scale With RedisAI: Luca Antiga

PRESENTED BY
Serving Deep Learning Models at Scale with RedisAI
Luca Antiga
[tensor]werk, CEO

Agenda:
1. Don’t say AI until you productionize
  Deep learning and challenges for production
2. Introducing RedisAI + Roadmap
  Architecture, API, operation, what’s next
3. Demo
  Live notebook by Chris Fregly, Pipeline.ai

Who am I (@lantiga)
• Co-founder and CEO of [tensor]werk
  Infrastructure for data-defined software
• Co-founder of Orobix
  AI in healthcare/manufacturing/gaming/+
• PyTorch contributor in 2017-2018
• Co-author of Deep Learning with PyTorch, Manning (with Eli Stevens)

Don’t say AI until you productionize

Deep learning frameworks
• TensorFlow
• PyTorch
• MXNet
• CNTK, Chainer, DyNet, DL4J, Flux, …

Production strategies
• Python code behind a web framework, e.g. Flask
• Execution service from a cloud provider
• Runtime
  • TensorFlow Serving
  • Clipper
  • NVIDIA TensorRT Inference Server
  • MXNet Model Server
  • …
• Bespoke solutions (C++, …)
https://medium.com/@vikati/the-rise-of-the-model-servers-9395522b6c58

Production requirements
• Must fit the technology stack
  • Not just about languages, but about semantics, scalability, guarantees
• Run anywhere, any size
• Composable building blocks
• Must try to limit the amount of moving parts
  • And the amount of moving data
• Must make best use of resources

What it is
• A Redis module providing
  • Tensors as a data type
  • and Deep Learning model execution
  • on CPU and GPU
• It turns Redis into a full-fledged deep learning runtime
  • While still being Redis

[Diagram: RedisAI with a new data type, Tensor, running TorchScript functions such as
def addsq(a, b):
    return (a + b)**2
on CPU, GPU0, GPU1, …]

Architecture
• Tensors: framework-agnostic
• Queue + processing thread
  • Backpressure
  • Redis stays responsive
• Models are kept hot in memory
• Client blocks until the result is ready
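The queue-plus-thread design can be illustrated with a small Python analogy. It is a sketch, not RedisAI's C implementation. A bounded queue provides backpressure, and a single background thread keeps the model loaded and runs it. The caller blocks until its result is ready. The names ModelRunner, submit and run are hypothetical.

```python
import queue
import threading


class ModelRunner:
    """Toy analogy: a bounded queue feeds one background thread that runs the
    model, so the thread accepting requests never executes model code."""

    def __init__(self, model, max_pending=8):
        self.model = model                             # loaded once, kept "hot"
        self.jobs = queue.Queue(maxsize=max_pending)   # bounded queue => backpressure
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            inputs, done, result = self.jobs.get()
            try:
                result.append(self.model(inputs))
            except Exception as exc:                   # report errors to the caller
                result.append(exc)
            done.set()                                 # wake up the blocked caller

    def submit(self, inputs):
        """Enqueue without waiting; raises queue.Full when the queue is saturated."""
        done, result = threading.Event(), []
        self.jobs.put_nowait((inputs, done, result))
        return done, result

    def run(self, inputs, timeout=5.0):
        """Blocking call: the caller waits, like a client on a model run."""
        done, result = self.submit(inputs)
        if not done.wait(timeout):
            raise TimeoutError("model run did not finish in time")
        if isinstance(result[0], Exception):
            raise result[0]
        return result[0]


runner = ModelRunner(model=lambda xs: [2 * x for x in xs])
print(runner.run([1, 2, 3]))  # [2, 4, 6]
```

A full queue makes submit fail fast instead of letting requests pile up. This is the backpressure the slide refers to.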

Where to get it
• redisai.io
• github.com/RedisAI/RedisAI
docker run -p 6379:6379 -it --rm redisai/redisai

API: Tensor
• AI.TENSORSET
• AI.TENSORGET
AI.TENSORSET foo FLOAT 2 2 VALUES 1 2 3 4
AI.TENSORSET foo FLOAT 2 2 BLOB < buffer.raw
AI.TENSORGET foo BLOB
AI.TENSORGET foo VALUES
AI.TENSORGET foo META
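A client can also send the BLOB form from Python. The sketch uses only the standard library. It assumes 32-bit floats in the machine's native byte order, matching the server's. It builds the AI.TENSORSET argument list and decodes raw tensor bytes. The helper names are hypothetical.

```python
from array import array


def tensorset_args(key, shape, values):
    """Arguments for AI.TENSORSET <key> FLOAT <dims...> BLOB <bytes>.

    The blob is the raw row-major buffer of 32-bit floats in this machine's
    native byte order (assumed to match the server's)."""
    count = 1
    for dim in shape:
        count *= dim
    if count != len(values):
        raise ValueError(f"shape {shape} needs {count} values, got {len(values)}")
    blob = array("f", values).tobytes()
    return ["AI.TENSORSET", key, "FLOAT", *shape, "BLOB", blob]


def blob_to_floats(blob):
    """Decode the raw bytes of a FLOAT tensor fetched with AI.TENSORGET ... BLOB."""
    return array("f", blob).tolist()


args = tensorset_args("foo", (2, 2), [1, 2, 3, 4])
print(args[:5])                  # ['AI.TENSORSET', 'foo', 'FLOAT', 2, 2]
print(blob_to_floats(args[-1]))  # [1.0, 2.0, 3.0, 4.0]
```

With redis-py, the list can be sent as redis.Redis().execute_command(*args).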

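API: Tensor
• based on dlpack https://github.com/dmlc/dlpack
• framework-independent
typedef struct {
  void* data;            /* opaque pointer to the allocated buffer (CPU or GPU) */
  DLContext ctx;         /* device type and id where the data lives */
  int ndim;              /* number of dimensions */
  DLDataType dtype;      /* element type: type code, bits, lanes */
  int64_t* shape;        /* ndim sizes */
  int64_t* strides;      /* ndim strides in elements (not bytes); NULL = compact row-major */
  uint64_t byte_offset;  /* offset in bytes from data to the first element */
} DLTensor;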
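A Python sketch shows how shape, strides and byte_offset locate an element. It follows the dlpack convention: strides count elements rather than bytes, and a NULL strides pointer (None here) means a compact row-major layout. TensorView is a hypothetical stand-in, not part of dlpack.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TensorView:
    """Python stand-in for the addressing fields of a DLTensor (illustrative)."""
    shape: Tuple[int, ...]
    itemsize: int                              # bytes per element, i.e. dtype.bits // 8
    strides: Optional[Tuple[int, ...]] = None  # in elements; None means compact row-major
    byte_offset: int = 0

    def effective_strides(self):
        if self.strides is not None:
            return self.strides
        strides, step = [], 1
        for dim in reversed(self.shape):       # the last dimension varies fastest
            strides.append(step)
            step *= dim
        return tuple(reversed(strides))

    def byte_address(self, index):
        """Byte position of element `index`, relative to the `data` pointer."""
        offset = sum(i * s for i, s in zip(index, self.effective_strides()))
        return self.byte_offset + offset * self.itemsize


img = TensorView(shape=(1, 3, 224, 224), itemsize=4)  # one FLOAT image
print(img.effective_strides())         # (150528, 50176, 224, 1)
print(img.byte_address((0, 1, 0, 2)))  # 200712
```

Because strides are explicit, a transposed or sliced view can share the same buffer without copying. Only shape, strides and byte_offset change.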

API: Model
• AI.MODELSET
AI.MODELSET resnet18 TORCH GPU < foo.pt
AI.MODELSET resnet18 TF CPU INPUTS in1 OUTPUTS linear4 < foo.pb
https://www.codeproject.com/Articles/1248963/Deep-Learning-using-Python-plus-Keras-Chapter-Re

API: Model
• AI.MODELRUN
AI.MODELRUN resnet18 INPUTS foo OUTPUTS bar
https://www.codeproject.com/Articles/1248963/Deep-Learning-using-Python-plus-Keras-Chapter-Re
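A client round trip is TENSORSET, then MODELRUN, then TENSORGET. The sketch below makes three assumptions:

- the command syntax shown on these slides (later RedisAI releases renamed some commands);
- a FLOAT input of shape 1 3 224 224;
- any client exposing redis-py's execute_command.

RecordingClient is a hypothetical offline stub so the code runs without a server.

```python
from array import array


def classify(client, image, model_key="resnet18", in_key="foo", out_key="bar"):
    """Send one image through a model stored in RedisAI and return the top-scoring class.

    `client` is any object with a redis-py style execute_command(*args) method.
    `image` is a flat list of 1*3*224*224 floats (assumed NCHW layout)."""
    blob = array("f", image).tobytes()
    client.execute_command("AI.TENSORSET", in_key, "FLOAT", 1, 3, 224, 224, "BLOB", blob)
    client.execute_command("AI.MODELRUN", model_key, "INPUTS", in_key, "OUTPUTS", out_key)
    reply = client.execute_command("AI.TENSORGET", out_key, "BLOB")
    # Assumption: depending on the RedisAI version, the reply is either the raw
    # blob or an array ending with the blob, so take the last element if needed.
    blob_out = reply[-1] if isinstance(reply, list) else reply
    scores = array("f", blob_out).tolist()
    return max(range(len(scores)), key=scores.__getitem__)   # argmax over classes


class RecordingClient:
    """Hypothetical offline stand-in for a Redis connection: records commands
    and answers AI.TENSORGET with fixed scores (the list form is illustrative)."""

    def __init__(self, scores, as_list=False):
        self.scores, self.as_list, self.calls = scores, as_list, []

    def execute_command(self, *args):
        self.calls.append(args)
        if args[0] == "AI.TENSORGET":
            blob = array("f", self.scores).tobytes()
            return ["FLOAT", [1, len(self.scores)], blob] if self.as_list else blob
        return b"OK"


fake = RecordingClient(scores=[0.1, 0.7, 0.2])
print(classify(fake, [0.0] * (3 * 224 * 224)))  # 1
```

Against a live server, pass a redis-py connection such as redis.Redis(host="localhost", port=6379) instead of the stub.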

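Exporting models
• TensorFlow (+ Keras): freeze graph
import tensorflow as tf  # TensorFlow 1.x API (tf.Session)

# Assumes the model graph, with an output node named 'output', is already built
var_converter = tf.compat.v1.graph_util.convert_variables_to_constants
with tf.Session() as sess:
    # In practice, restore the trained weights here instead of initializing
    sess.run([tf.global_variables_initializer()])
    # Replace variables with constants so the graph is self-contained
    frozen_graph = var_converter(sess, sess.graph_def, ['output'])
    tf.train.write_graph(frozen_graph, '.', 'resnet50.pb', as_text=False)
https://github.com/RedisAI/redisai-examples/blob/master/models/imagenet/tensorflow/model_saver.py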

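Exporting models
• PyTorch: JIT model
import torch

# `model` is assumed to be an already-trained torch.nn.Module (e.g. ResNet-50)
model.eval()  # inference mode: freeze dropout and batch-norm statistics
batch = torch.randn((1, 3, 224, 224))  # example input: one 3x224x224 image
# Record the ops executed on the example input as a TorchScript graph
traced_model = torch.jit.trace(model, batch)
torch.jit.save(traced_model, 'resnet50.pt')  # file to load with AI.MODELSET ... TORCH
https://github.com/RedisAI/redisai-examples/blob/master/models/imagenet/pytorch/model_saver.py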

API: Script
• AI.SCRIPTSET
• AI.SCRIPTRUN
AI.SCRIPTSET myadd2 GPU < addtwo.txt
AI.SCRIPTRUN myadd2 addtwo INPUTS foo1 foo2 OUTPUTS bar
addtwo.txt:
def addtwo(a, b):
    return a + b
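Because TorchScript uses Python syntax, a script file can be checked locally before SCRIPTSET. This hypothetical helper parses the source with Python's ast module. It checks that the function named in SCRIPTRUN exists and takes as many arguments as INPUTS lists; addtwo, for instance, needs two input tensors. The check is an assumption of this sketch, not a RedisAI feature.

```python
import ast


def scriptrun_args(script_key, script_source, fn_name, inputs, outputs):
    """Check a TorchScript source locally, then build AI.SCRIPTRUN arguments.

    TorchScript uses Python syntax, so Python's own parser can read it. The
    function must exist and take exactly one parameter per INPUTS key."""
    tree = ast.parse(script_source)
    functions = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    if fn_name not in functions:
        raise KeyError(f"{fn_name!r} not in script; found {sorted(functions)}")
    n_params = len(functions[fn_name].args.args)
    if n_params != len(inputs):
        raise ValueError(f"{fn_name} takes {n_params} tensors, INPUTS lists {len(inputs)}")
    return ["AI.SCRIPTRUN", script_key, fn_name, "INPUTS", *inputs, "OUTPUTS", *outputs]


source = "def addtwo(a, b):\n    return a + b\n"   # contents of addtwo.txt
print(scriptrun_args("myadd2", source, "addtwo", ["foo1", "foo2"], ["bar"]))
# ['AI.SCRIPTRUN', 'myadd2', 'addtwo', 'INPUTS', 'foo1', 'foo2', 'OUTPUTS', 'bar']
```

In practice the source would be read from addtwo.txt, and the same text sent with AI.SCRIPTSET.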

Scripts?
• SCRIPT is a TorchScript interpreter
  • Python-like syntax for tensor ops
  • on CPU and GPU
  • Vast library of tensor operations
• Lets you express computations directly (without exporting from a Python environment, etc.)
• Pre-processing, post-processing (but not only)
Ref: https://pytorch.org/docs/stable/jit.html
[Diagram: a pipeline chaining SCRIPT, PT MODEL, TF MODEL and SCRIPT between in_key and out_key]

Persistence
• RDB supported
• AOF almost :-)
• Tensors are serialized (meta + blob)
• Models are serialized back into protobuf
• Scripts are serialized as strings

Replication
• Master-replica supported for all data types
• Right now, run commands are replicated too
  (after the conference: replication of the results of computations, where appropriate)
• Cluster supported; caveat: sharding of models and scripts
  • For the moment, use hash tags: {foo}resnet18 {foo}input1234
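Hash tags work because of how Redis Cluster picks a slot. When a key contains a non-empty {...} section, only the substring inside the first such pair is hashed. CRC16 of that substring, modulo 16384, gives the slot. The Python sketch below reimplements that rule to show that {foo}resnet18 and {foo}input1234 land on the same shard. The function names are illustrative.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM variant): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Redis Cluster hash slot. If the key has a '{' followed later by '}' with
    at least one character in between, only that substring is hashed."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


print(key_slot("{foo}resnet18") == key_slot("{foo}input1234"))  # True
```

Co-locating a model with its input and output tensors this way lets a single node execute AI.MODELRUN without cross-shard data movement.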

RedisAI client libraries
• NOTE: any Redis client works right now
• JRedisAI https://github.com/RedisAI/JRedisAI
• redisai-py https://github.com/RedisAI/redisai-py
• Coming up: NodeJS, Go, … (community?)

RedisAI from NodeJS

Advantages of RedisAI today
• Keep the data local
• Keep the stack short
• Run everywhere Redis runs
• Run multi-backend
• Stay language-independent
• Optimize use of resources
• Keep models hot
• HA with Sentinel and clustering
https://github.com/RedisAI/redisai-examples

Roadmap: DAG
AI.DAGRUN
  SCRIPTRUN preproc normalize img ~in~
  MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~
  SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS label
• DAG = Directed Acyclic Graph
• Atomic operation
• Volatile keys (~x~): command-local, don’t touch the keyspace
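A toy interpreter can mimic volatile keys. Outputs named ~x~ live in a per-call scratch dict. Other outputs are staged and written to the keyspace only if every step succeeds. The step format (function name, inputs, outputs) and the stand-in functions are hypothetical; real DAGRUN syntax and atomicity are defined by RedisAI.

```python
def is_volatile(key):
    """~name~ keys are local to one DAG call and never reach the keyspace."""
    return len(key) > 2 and key.startswith("~") and key.endswith("~")


def run_dag(keyspace, steps, functions):
    """Run `steps` in order against `keyspace`, a dict standing in for Redis.

    Each step is (function_name, input_keys, output_keys). Volatile outputs go
    to a per-call scratch dict; other outputs are staged and written to the
    keyspace only after every step has succeeded (all-or-nothing)."""
    scratch, staged = {}, {}

    def read(key):
        if is_volatile(key):
            return scratch[key]
        return staged[key] if key in staged else keyspace[key]

    for name, inputs, outputs in steps:
        values = functions[name](*[read(k) for k in inputs])
        if len(outputs) == 1:
            values = (values,)
        for key, value in zip(outputs, values):
            (scratch if is_volatile(key) else staged)[key] = value
    keyspace.update(staged)
    return keyspace


functions = {  # hypothetical stand-ins for the preproc script, the model and the postproc script
    "normalize": lambda img: [p / 255 for p in img],
    "resnet18": lambda x: [sum(x), max(x)],
    "probstolabel": lambda scores: scores.index(max(scores)),
}
ks = {"img": [0, 51, 255]}
run_dag(ks, [("normalize", ["img"], ["~in~"]),
             ("resnet18", ["~in~"], ["~out~"]),
             ("probstolabel", ["~out~"], ["label"])], functions)
print(sorted(ks))  # ['img', 'label']
```

Only `label` reaches the keyspace; `~in~` and `~out~` disappear when the call returns.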

Roadmap: DAG
AI.DAGRUNRO
  TENSORSET ~img~ FLOAT 1 3 224 224 BLOB ...
  SCRIPTRUN preproc normalize ~img~ ~in~
  SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS ~label~
  TENSORGET ~label~ VALUES
• AI.DAGRUNRO:
  • since no key is written, replicas can execute it
  • errors if commands try to write to non-volatile keys
[Diagram: one DAGRUN (on the master) and three DAGRUNRO (on replicas)]
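The DAGRUNRO rule can be stated as a validation pass: if no step writes a non-volatile key, the DAG is safe to send to a replica. The (command, output_keys) representation and the route helper are assumptions of this sketch.

```python
def is_volatile(key):
    return len(key) > 2 and key.startswith("~") and key.endswith("~")


def check_readonly(steps):
    """Validate a DAG intended for AI.DAGRUNRO.

    `steps` is a list of (command_text, output_keys) pairs. Raises ValueError
    on the first step that would write a non-volatile key."""
    for command, outputs in steps:
        for key in outputs:
            if not is_volatile(key):
                raise ValueError(f"{command!r} writes non-volatile key {key!r}")
    return True


def route(steps):
    """Send read-only DAGs to a replica and everything else to the master."""
    try:
        check_readonly(steps)
        return "replica"
    except ValueError:
        return "master"


read_only_dag = [
    ("TENSORSET ~img~ ...", ["~img~"]),
    ("SCRIPTRUN preproc normalize ~img~ ~in~", ["~in~"]),
    ("SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS ~label~", ["~label~"]),
    ("TENSORGET ~label~ VALUES", []),
]
print(route(read_only_dag))  # replica
```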

Roadmap: DAG
AI.MODELSET resnet18a TF GPU0 ...
AI.MODELSET resnet18b TORCH GPU1 ...
AI.TENSORSET img ...
AI.DAGRUN
  MODELRUN resnet18a INPUTS ~in~ OUTPUTS ~out1~
  MODELRUN resnet18b INPUTS ~in~ OUTPUTS ~out2~
  SCRIPTRUN postproc ensemble INPUTS ~out1~ ~out2~ OUTPUTS ~out~
• Parallel multi-device execution
• One queue per device
[Diagram: normalize -> resnet18a on GPU0 and resnet18b on GPU1 in parallel -> ensemble -> probstolabels]
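One queue per device is what lets the two MODELRUN steps overlap, as a thread-based Python sketch shows. Device, ensemble_run and averaging as the ensembling rule are hypothetical choices. The toy lambdas stand in for resnet18a and resnet18b.

```python
import queue
import threading


class Device:
    """One job queue and one worker thread per device (hypothetical stand-ins for GPU0/GPU1)."""

    def __init__(self, name):
        self.name = name
        self.jobs = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            model, x, slot, results, done = self.jobs.get()
            try:
                results[slot] = model(x)
            except Exception as exc:      # hand the error back to the caller
                results[slot] = exc
            finally:
                done.release()

    def submit(self, job):
        self.jobs.put(job)


def ensemble_run(placements, x, timeout=5.0):
    """Fan `x` out to every (device, model) pair, wait for all, then average the scores.

    Jobs on different devices run concurrently; averaging is an assumed ensembling rule."""
    results = [None] * len(placements)
    done = threading.Semaphore(0)
    for slot, (device, model) in enumerate(placements):
        device.submit((model, x, slot, results, done))
    for _ in placements:
        if not done.acquire(timeout=timeout):
            raise TimeoutError("a device did not answer in time")
    for r in results:
        if isinstance(r, Exception):
            raise r
    return [sum(col) / len(col) for col in zip(*results)]


gpu0, gpu1 = Device("GPU0"), Device("GPU1")
resnet18a = lambda x: [0.2 * v for v in x]   # toy stand-ins for the two models
resnet18b = lambda x: [0.4 * v for v in x]
print(ensemble_run([(gpu0, resnet18a), (gpu1, resnet18b)], [1.0, 2.0]))
```

The total latency is roughly that of the slowest model rather than the sum of both. That is the point of keeping a separate queue per device.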

Roadmap: DAG
AI.DAGRUNASYNC
=> 12634
AI.DAGRUNINFO 12634
=> RUNNING [... more details …]
AI.DAGRUNINFO 12634
=> DONE
• AI.DAGRUNASYNC:
  • non-blocking, returns an ID
  • the ID can be used later to retrieve the status; a keyspace notification is also available
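The DAGRUNASYNC flow maps naturally onto futures: the client gets an ID immediately and polls its status later. The sketch uses concurrent.futures. The AsyncRunner class, its integer IDs and the RUNNING/DONE/ERROR strings are illustrative, not RedisAI's reply format.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class AsyncRunner:
    """Submit work, get an ID back immediately, poll the ID later."""

    def __init__(self, workers=2):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.jobs = {}                      # ID -> Future
        self.ids = itertools.count(1)

    def run_async(self, fn, *args):
        """Non-blocking: schedule fn(*args) and return its ID."""
        job_id = next(self.ids)
        self.jobs[job_id] = self.pool.submit(fn, *args)
        return job_id

    def info(self, job_id):
        """Status lookup by ID, in the spirit of DAGRUNINFO."""
        future = self.jobs[job_id]
        if not future.done():
            return "RUNNING"
        return "ERROR" if future.exception() is not None else "DONE"

    def result(self, job_id, timeout=None):
        return self.jobs[job_id].result(timeout)


gate = threading.Event()
runner = AsyncRunner()
job = runner.run_async(lambda: gate.wait() and "label")
print(job, runner.info(job))                              # 1 RUNNING
gate.set()                                                # let the job finish
print(runner.result(job, timeout=5), runner.info(job))    # label DONE
```

A push-style alternative to polling would be Future.add_done_callback, which plays a role similar to the keyspace notification.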

Roadmap: streams
• Pervasive use of streams
AI.DAGRUN
  SCRIPTRUN preproc normalize instream ~in~
  SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS outstream
AI.DAGRUNASYNC
  SCRIPTRUN preproc normalize instream ~in~
  SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS outstream

Roadmap: ONNXRuntime backend
• Runtime from Microsoft
• ONNX
  • exchange format for NN
  • export from many frameworks (MXNet, CNTK, …)
• ONNX-ML
  • ONNX for machine learning models (RandomForest, SVM, K-means, etc.)
  • export from scikit-learn

Roadmap: Auto-batching
• Auto-batching
  • Transparent batching of requests
  • The queue is rearranged according to other analogous requests in the queue and to their time-to-response
  • Only across different clients or for async requests
• Time-to-response (TTR)
  • ~Predictable in DL graphs
  • Can reject a request if the estimated time of queue + run exceeds its TTR
[Diagram: requests A and B, each marked "Run if TTR < 800ms"; the queue is analyzed, requests are concatenated along the 0-th dim and run as one batch; outcomes shown: Reject, OK]
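The batching decision can be sketched with a simple cost model: estimated time = queue wait + rows in the batch × per-row run time. This linear model is an assumption; real DL graphs would need measured timings. Requests are accepted greedily while every accepted request still meets its TTR. Their rows are concatenated along the 0-th dimension, and outputs are split back per client. Request, plan_batch and split_outputs are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    client: str
    rows: List[List[float]]   # one or more input rows; dim 0 is the batch dimension
    ttr_ms: float             # time-to-response requested by the client


def plan_batch(pending, queue_wait_ms, run_ms_per_row):
    """Greedily pick requests for one batch.

    Assumed cost model: estimate = queue wait + total rows * per-row run time.
    A request joins only if every request in the batch, itself included, still
    meets its TTR; otherwise it is rejected. Returns (accepted, rejected, batch)."""
    accepted, rejected = [], []
    for req in pending:
        candidate = accepted + [req]
        rows = sum(len(r.rows) for r in candidate)
        estimate = queue_wait_ms + rows * run_ms_per_row
        if estimate <= min(r.ttr_ms for r in candidate):
            accepted.append(req)
        else:
            rejected.append(req)
    batch = [row for req in accepted for row in req.rows]   # concat along the 0-th dim
    return accepted, rejected, batch


def split_outputs(accepted, outputs):
    """Give each accepted request back its own slice of the batched outputs."""
    per_client, start = {}, 0
    for req in accepted:
        per_client[req.client] = outputs[start:start + len(req.rows)]
        start += len(req.rows)
    return per_client


a = Request("A", [[1.0], [2.0]], ttr_ms=800)
b = Request("B", [[3.0]], ttr_ms=800)
c = Request("C", [[4.0]], ttr_ms=300)
accepted, rejected, batch = plan_batch([a, b, c], queue_wait_ms=100, run_ms_per_row=200)
outputs = [[2 * x for x in row] for row in batch]   # toy "model" run once on the whole batch
print([r.client for r in rejected], split_outputs(accepted, outputs))
# ['C'] {'A': [[2.0], [4.0]], 'B': [[6.0]]}
```

The greedy rule is deliberately simple. C would fit on its own (100 + 200 = 300 ms), but not in this batch; a real scheduler might defer such a request to the next batch instead of rejecting it.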

Roadmap: Misc
• Dynamic loading of backends:
  • don’t load what you don’t need (especially on GPU)
• Monitoring with AI.INFO:
  • more granular information on running operations, commands, memory
AI.CONFIG LOADBACKEND TF CPU
AI.CONFIG LOADBACKEND TF GPU
AI.CONFIG LOADBACKEND TORCH CPU
AI.CONFIG LOADBACKEND TORCH GPU
...
AI.INFO
AI.INFO MODEL key
AI.INFO SCRIPT key
...

RedisAI Enterprise
• Advanced monitoring
  • Health
  • Performance
  • Model metrics (reliability)
• A/B testing
• Module integration, e.g.
  • RediSearch (FAISS)
  • Anomaly detection with RedisTS
• Training/fine-tuning in-Redis

Acknowledgements
• [tensor]werk
  • Sherin Thomas, Rick Izzo, Pietro Rota
• RedisLabs
  • Guy Korland, Itamar Haber, Pieter Cailliau, Meir Shpilraien, Mark Nunberg, Ariel Madar
• Orobix
  • Everyone!
  • Manuela Bazzana, Daniele Ciriello, Lisa Lozza, Simone Manini, Alessandro Re

Development tools for the data-defined software era
1. Launching April 2019
2. Looking for investors, users, contributors
Projects:
- RedisAI: enterprise-grade runtime (with Redis Labs)
- Hangar: version control for tensor data (mid April, BSD)
Hit me up!
luca@tensorwerk.com
tensorwerk.com

Chris Fregly, Pipeline.ai
• End-to-End Deep Learning from Research to Production
• Any Cloud, Any Framework, Any Hardware
• Free Community Edition: https://community.pipeline.ai
