PRESENTED BY
Serving Deep Learning Models at Scale with RedisAI
Luca Antiga
[tensor]werk, CEO

Agenda:
1 Don’t say AI until you productionize
  Deep learning and challenges for production
2 Introducing RedisAI + Roadmap
  Architecture, API, operation, what’s next
3 Demo
  Live notebook by Chris Fregly, Pipeline.ai

Who am I (@lantiga)
• Co-founder and CEO of [tensor]werk
  Infrastructure for data-defined software
• Co-founder of Orobix
  AI in healthcare/manufacturing/gaming/+
• PyTorch contributor in 2017-2018
• Co-author of Deep Learning with PyTorch, Manning (with Eli Stevens)

Don’t say AI until you productionize

Deep learning frameworks
• TensorFlow
• PyTorch
• MXNet
• CNTK, Chainer,
• DyNet, DL4J,
• Flux, …

Production strategies
• Python code behind e.g. Flask
• Execution service from cloud provider
• Runtime
  • TensorFlow Serving
  • Clipper
  • NVIDIA TensorRT inference server
  • MXNet Model Server
  • …
• Bespoke solutions (C++, …)
https://medium.com/@vikati/the-rise-of-the-model-servers-9395522b6c58

Production requirements
• Must fit the technology stack
  • Not just about languages, but about semantics, scalability, guarantees
• Run anywhere, any size
• Composable building blocks
• Must try to limit the amount of moving parts
  • And the amount of moving data
• Must make best use of resources

RedisAI

What it is
• A Redis module providing
  • Tensors as a data type
  • and Deep Learning model execution
  • on CPU and GPU
• It turns Redis into a full-fledged deep learning runtime
• While still being Redis

RedisAI
[Diagram labels: New data type: Tensor; TorchScript; devices CPU, GPU0, GPU1, …]
TorchScript example:
def addsq(a, b):
    return (a + b)**2

Architecture
• Tensors: framework-agnostic
• Queue + processing thread
  • Backpressure
  • Redis stays responsive
• Models are kept hot in memory
• Client blocks
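
The queue-plus-processing-thread arrangement on the Architecture slide can be illustrated with a small pure-Python sketch. This is not RedisAI's implementation: the class name ModelWorker, the max_pending parameter and the use of queue.Full to signal backpressure are assumptions made for illustration only.

```python
import queue
import threading
from concurrent.futures import Future


class ModelWorker:
    """Hypothetical stand-in: one bounded queue plus one processing thread."""

    def __init__(self, model, max_pending=2):
        self.model = model  # kept "hot": loaded once, reused for every run
        self.jobs = queue.Queue(maxsize=max_pending)
        self.thread = threading.Thread(target=self._loop, daemon=True)
        self.thread.start()

    def _loop(self):
        # The processing thread runs models; the submitting side never does.
        while True:
            inputs, future = self.jobs.get()
            try:
                future.set_result(self.model(inputs))
            except Exception as exc:  # hand failures back to the caller
                future.set_exception(exc)

    def submit(self, inputs):
        """Enqueue a run; raises queue.Full when the queue is saturated."""
        future = Future()
        self.jobs.put_nowait((inputs, future))  # backpressure: bounded queue
        return future


def run_blocking(worker, inputs, timeout=5.0):
    """Only the calling client waits for the result."""
    return worker.submit(inputs).result(timeout=timeout)


if __name__ == "__main__":
    worker = ModelWorker(lambda xs: [x * 2 for x in xs])
    print(run_blocking(worker, [1, 2, 3]))
```

The point of the design is that submit returns immediately, so the side that accepts commands stays responsive while a single client waits on its own result.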

Where to get it
• redisai.io
• github.com/RedisAI/RedisAI
docker run -p 6379:6379 -it --rm redisai/redisai

API: Tensor
• AI.TENSORSET
• AI.TENSORGET
AI.TENSORSET foo FLOAT 2 2 VALUES 1 2 3 4
AI.TENSORSET foo FLOAT 2 2 BLOB < buffer.raw
AI.TENSORGET foo BLOB
AI.TENSORGET foo VALUES
AI.TENSORGET foo META
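
As a rough illustration of the two AI.TENSORSET forms above, the sketch below builds the argument lists in Python without contacting a server. The helper names are hypothetical, and the blob layout (32-bit little-endian floats in row-major order) is an assumption rather than something the slide states.

```python
import struct


def tensorset_values(key, shape, values):
    """Arguments for the VALUES form, mirroring the first command on the slide."""
    return ["AI.TENSORSET", key, "FLOAT", *shape, "VALUES", *values]


def tensorset_blob(key, shape, values):
    """Same tensor sent as a raw buffer (the BLOB form)."""
    count = 1
    for dim in shape:
        count *= dim
    if count != len(values):
        raise ValueError("shape does not match number of values")
    # Assumption: FLOAT is a 32-bit float, little-endian, row-major.
    blob = struct.pack(f"<{count}f", *values)
    return ["AI.TENSORSET", key, "FLOAT", *shape, "BLOB", blob]


def blob_to_values(blob):
    """Decode a FLOAT blob (under the same layout assumption) into Python floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


if __name__ == "__main__":
    print(tensorset_values("foo", [2, 2], [1, 2, 3, 4]))
    print(len(tensorset_blob("foo", [2, 2], [1, 2, 3, 4])[-1]), "bytes in blob")
```

Since any Redis client works, a list built this way could be passed to a generic command method such as redis-py's execute_command(*args).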

API: Tensor
• based on dlpack https://github.com/dmlc/dlpack
• framework-independent
typedef struct {
  void* data;
  DLContext ctx;
  int ndim;
  DLDataType dtype;
  int64_t* shape;
  int64_t* strides;
  uint64_t byte_offset;
} DLTensor;
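
To make the shape, strides and byte_offset fields of the struct above concrete, here is a hedged Python sketch of how an element's position could be computed from them. TensorMeta is a hypothetical stand-in, not a binding to dlpack; it follows the DLPack convention that strides count elements (not bytes) and that missing strides mean a compact row-major layout.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TensorMeta:
    """Hypothetical Python mirror of a few DLTensor fields."""
    shape: Tuple[int, ...]
    itemsize: int                               # bytes per element, from dtype
    strides: Optional[Tuple[int, ...]] = None   # in elements; None = compact row-major
    byte_offset: int = 0


def row_major_strides(shape):
    """Strides (in elements) of a compact row-major tensor."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def element_offset(meta, index):
    """Byte position of element `index` relative to the data pointer."""
    strides = meta.strides or row_major_strides(meta.shape)
    if len(index) != len(meta.shape):
        raise IndexError("index rank does not match tensor rank")
    for i, dim in zip(index, meta.shape):
        if not 0 <= i < dim:
            raise IndexError("index out of range")
    return meta.byte_offset + meta.itemsize * sum(i * s for i, s in zip(index, strides))


if __name__ == "__main__":
    meta = TensorMeta(shape=(2, 2), itemsize=4)
    print(element_offset(meta, (1, 0)))
```

Because the description is only a pointer plus metadata, the same buffer can be handed to any framework that understands the layout, which is what makes the tensor type framework-independent.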

API: Model
• AI.MODELSET
AI.MODELSET resnet18 TORCH GPU < foo.pt
AI.MODELSET resnet18 TF CPU INPUTS in1 OUTPUTS linear4 < foo.pb
https://www.codeproject.com/Articles/1248963/Deep-Learning-using-Python-plus-Keras-Chapter-Re

API: Model
• AI.MODELRUN
AI.MODELRUN resnet18 INPUTS foo OUTPUTS bar
https://www.codeproject.com/Articles/1248963/Deep-Learning-using-Python-plus-Keras-Chapter-Re
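
Putting the tensor and model commands together, a client-side round trip might look like the sketch below. It is an illustration under assumptions: `execute` is any callable that accepts command words as positional arguments (for example the execute_command method of a redis-py client), the model file contents are assumed to travel as the final argument of AI.MODELSET, and RecordingClient is a hypothetical stand-in so the example runs without a server.

```python
def run_torch_model(execute, model_key, model_blob, shape, values, device="CPU"):
    """Send the slide's command sequence through `execute` and return the last reply."""
    # Assumption: the serialized model is passed as the final argument.
    execute("AI.MODELSET", model_key, "TORCH", device, model_blob)
    execute("AI.TENSORSET", "foo", "FLOAT", *shape, "VALUES", *values)
    execute("AI.MODELRUN", model_key, "INPUTS", "foo", "OUTPUTS", "bar")
    return execute("AI.TENSORGET", "bar", "VALUES")


class RecordingClient:
    """Hypothetical stand-in for a Redis connection; it only records commands."""

    def __init__(self):
        self.sent = []

    def execute_command(self, *args):
        self.sent.append(args)
        return "OK"


if __name__ == "__main__":
    client = RecordingClient()
    run_torch_model(client.execute_command, "resnet18", b"<contents of foo.pt>",
                    [2, 2], [1, 2, 3, 4])
    for command in client.sent:
        print(command[0])
```

In a real deployment the model would be set once and then run many times, since models are kept hot in memory.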

Exporting models
• TensorFlow (+ Keras): freeze graph
import tensorflow as tf
var_converter = tf.compat.v1.graph_util.convert_variables_to_constants
# TF 1.x graph mode; assumes the model graph is already built with an output node named 'output'
with tf.Session() as sess:
    sess.run([tf.global_variables_initializer()])
    # Replace variables with constants so the saved graph is self-contained
    frozen_graph = var_converter(sess, sess.graph_def, ['output'])
    tf.train.write_graph(frozen_graph, '.', 'resnet50.pb', as_text=False)
https://github.com/RedisAI/redisai-examples/blob/master/models/imagenet/tensorflow/model_saver.py
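
Exporting models
• PyTorch: JIT model
import torch
# `model` is assumed to be an already-loaded torch.nn.Module (e.g. a ResNet-50)
model.eval()  # trace in inference mode
batch = torch.randn((1, 3, 224, 224))  # dummy input with the expected shape
traced_model = torch.jit.trace(model, batch)
torch.jit.save(traced_model, 'resnet50.pt')
https://github.com/RedisAI/redisai-examples/blob/master/models/imagenet/pytorch/model_saver.py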

API: Script
• AI.SCRIPTSET
• AI.SCRIPTRUN
AI.SCRIPTSET myadd2 GPU < addtwo.txt
AI.SCRIPTRUN myadd2 addtwo INPUTS foo OUTPUTS bar
addtwo.txt:
def addtwo(a, b):
    return a + b
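
Because a script is stored as plain text, a client can sanity-check an AI.SCRIPTRUN call before sending it. The sketch below is one hypothetical way to do so: it relies on simple scripts such as addtwo.txt also being valid Python, so the standard ast module can list their functions. The helper names are illustrative and not part of any RedisAI client.

```python
import ast


def script_functions(source):
    """Map each top-level function in a script to its number of positional arguments."""
    tree = ast.parse(source)
    return {
        node.name: len(node.args.args)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


def scriptrun_args(script_key, source, function, inputs, outputs):
    """Build AI.SCRIPTRUN arguments after checking them against the script text."""
    functions = script_functions(source)
    if function not in functions:
        raise ValueError(f"script defines no function named {function!r}")
    if functions[function] != len(inputs):
        raise ValueError(
            f"{function} takes {functions[function]} tensors, got {len(inputs)}"
        )
    return ["AI.SCRIPTRUN", script_key, function, "INPUTS", *inputs, "OUTPUTS", *outputs]


ADDTWO = "def addtwo(a, b):\n    return a + b\n"

if __name__ == "__main__":
    # "baz" is a hypothetical second input key for the two-argument function.
    print(scriptrun_args("myadd2", ADDTWO, "addtwo", ["foo", "baz"], ["bar"]))
```

A check like this would flag a call that supplies one input key to a two-argument function before it reaches the server.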

Scripts?
• SCRIPT is a TorchScript interpreter
• Python-like syntax for tensor ops
• on CPU and GPU
• Vast library of tensor operations
• Allows prescribing computations directly (without exporting from a Python env, etc.)
• Pre-proc, post-proc (but not only)
Ref: https://pytorch.org/docs/stable/jit.html
[Diagram: a chain of SCRIPT, PT MODEL, TF MODEL and SCRIPT stages between in_key and out_key]

Persistence
• RDB supported
• AOF almost :-)
• Tensors are serialized (meta + blob)
• Models are serialized back into protobuf
• Scripts are serialized as strings

Replication
• Master-replica supported for all data types
• Right now, run commands are replicated too
  (post-conf: replication of results of computations where appropriate)
• Cluster supported, caveat: sharding models and scripts
• For the moment, use hash tags
  {foo}resnet18 {foo}input1234
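
The hash-tag advice works because Redis Cluster hashes only the text inside the first {...} of a key when choosing a slot. The sketch below reproduces that rule in Python (CRC-16/XMODEM modulo 16384, as described in the Redis Cluster specification); the function names are illustrative.

```python
import binascii

SLOTS = 16384  # number of hash slots in Redis Cluster


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that is hashed, honoring {...} hash tags."""
    start = key.find(b"{")
    if start == -1:
        return key
    end = key.find(b"}", start + 1)
    if end == -1 or end == start + 1:  # no closing brace, or empty tag "{}"
        return key
    return key[start + 1:end]


def key_slot(key: str) -> int:
    # binascii.crc_hqx with initial value 0 computes CRC-16/XMODEM.
    return binascii.crc_hqx(hash_tag(key.encode()), 0) % SLOTS


if __name__ == "__main__":
    for key in ("{foo}resnet18", "{foo}input1234", "resnet18", "input1234"):
        print(key, key_slot(key))
```

Both tagged keys hash the same substring "foo", so the model and its input tensor land on the same shard and a run command can see both.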

RedisAI client libraries
• NOTE: any Redis client works right now
• JRedisAI https://github.com/RedisAI/JRedisAI
• redisai-py https://github.com/RedisAI/redisai-py
• Coming up: NodeJS, Go, … (community?)

RedisAI from NodeJS

Advantages of RedisAI today
• Keep the data local
• Keep stack short
• Run everywhere Redis runs
• Run multi-backend
• Stay language-independent
• Optimize use of resources
• Keep models hot
• HA with Sentinel, clustering
https://github.com/RedisAI/redisai-examples

Roadmap

Roadmap: DAG
AI.DAGRUN
  SCRIPTRUN preproc normalize img ~in~
  MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~
  SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS label
• DAG = Directed Acyclic Graph
• Atomic operation
• Volatile keys (~x~): command-local, don’t touch keyspace
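
The volatile-key idea can be mimicked in a few lines of Python. The sketch below is a simulation under assumptions, not the proposed command: plain functions stand in for scripts and models, a dict stands in for the keyspace, and "atomic" is read here as "keyspace writes are applied only if every step succeeds".

```python
def is_volatile(key):
    """Keys written as ~name~ live only for the duration of one DAG run."""
    return len(key) > 2 and key.startswith("~") and key.endswith("~")


def dag_run(keyspace, steps):
    """Run steps in order; each step is (function, input_keys, output_key)."""
    local = {}   # volatile values, discarded when the run returns
    staged = {}  # keyspace writes, applied only after every step succeeded

    def read(key):
        if is_volatile(key):
            return local[key]
        return staged[key] if key in staged else keyspace[key]

    for func, inputs, output in steps:
        value = func(*[read(k) for k in inputs])
        (local if is_volatile(output) else staged)[output] = value

    keyspace.update(staged)  # a failing step never reaches this line
    return keyspace


if __name__ == "__main__":
    steps = [
        (lambda img: [p / 255 for p in img], ["img"], "~in~"),         # normalize
        (lambda x: [v * 0.5 for v in x], ["~in~"], "~out~"),           # stand-in model
        (lambda probs: probs.index(max(probs)), ["~out~"], "label"),   # probstolabel
    ]
    print(dag_run({"img": [0, 255, 51]}, steps))
```

Only label reaches the keyspace; the intermediate tensors never become keys, which avoids both the extra writes and the cleanup.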

Roadmap: DAG
AI.DAGRUNRO
  TENSORSET ~img~ FLOAT 1 3 224 224 BLOB ...
  SCRIPTRUN preproc normalize ~img~ ~in~
  MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~
  SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS ~label~
  TENSORGET ~label~ VALUES
• AI.DAGRUNRO:
  • if no key is written, replicas can execute
  • errors if commands try to write to non-volatile keys
[Diagram: one node labeled DAGRUN and three nodes labeled DAGRUNRO]
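
The read-only rule amounts to a simple check over the outputs of each step. The sketch below illustrates that check in Python; the step representation (name, input keys, output keys) and the function names are assumptions made for the example.

```python
def is_volatile(key):
    """Keys written as ~name~ are command-local and never reach the keyspace."""
    return len(key) > 2 and key.startswith("~") and key.endswith("~")


def check_read_only(steps):
    """Raise if any step writes a non-volatile key.

    Each step is (command_name, input_keys, output_keys).
    """
    for name, _inputs, outputs in steps:
        for key in outputs:
            if not is_volatile(key):
                raise ValueError(f"{name} writes non-volatile key {key!r}")
    return True


def can_run_on_replica(steps):
    """True when the DAG leaves the keyspace untouched."""
    try:
        return check_read_only(steps)
    except ValueError:
        return False


READ_ONLY_DAG = [
    ("TENSORSET", [], ["~img~"]),
    ("SCRIPTRUN normalize", ["~img~"], ["~in~"]),
    ("MODELRUN resnet18", ["~in~"], ["~out~"]),
    ("SCRIPTRUN probstolabel", ["~out~"], ["~label~"]),
    ("TENSORGET", ["~label~"], []),
]

if __name__ == "__main__":
    print(can_run_on_replica(READ_ONLY_DAG))
```

A DAG that passes this check has no effect to replicate, which is why any replica could serve it.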

Roadmap: DAG
AI.MODELSET resnet18a TF GPU0 ...
AI.MODELSET resnet18b TORCH GPU1 ...
AI.TENSORSET img ...
AI.DAGRUN
  SCRIPTRUN preproc normalize img ~in~
  MODELRUN resnet18a INPUTS ~in~ OUTPUTS ~out1~
  MODELRUN resnet18b INPUTS ~in~ OUTPUTS ~out2~
  SCRIPTRUN postproc ensemble INPUTS ~out1~ ~out2~ OUTPUTS ~out~
  SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS label
• Parallel multi-device execution
• One queue per device
[Diagram: normalize, then GPU0 and GPU1 side by side, then ensemble, then probstolabel]
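
"One queue per device" with parallel branches can be sketched using one single-threaded executor per device. This is a simulation under assumptions: the device names are just dictionary keys, ordinary functions stand in for scripts and models, and the DeviceQueues class is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class DeviceQueues:
    """One single-threaded executor per device, i.e. one FIFO queue per device."""

    def __init__(self, devices):
        self.executors = {d: ThreadPoolExecutor(max_workers=1) for d in devices}

    def submit(self, device, func, *args):
        return self.executors[device].submit(func, *args)

    def shutdown(self):
        for executor in self.executors.values():
            executor.shutdown()


def ensemble_run(queues, img, normalize, model_a, model_b, ensemble, probstolabel):
    """Run the two-model DAG, letting the independent branches overlap."""
    x = queues.submit("CPU", normalize, img).result()
    # Both model runs are queued before either result is awaited,
    # so GPU0 and GPU1 can work at the same time.
    out1 = queues.submit("GPU0", model_a, x)
    out2 = queues.submit("GPU1", model_b, x)
    out = queues.submit("CPU", ensemble, out1.result(), out2.result()).result()
    return queues.submit("CPU", probstolabel, out).result()


if __name__ == "__main__":
    queues = DeviceQueues(["CPU", "GPU0", "GPU1"])
    label = ensemble_run(
        queues,
        [0, 255],
        lambda img: [p / 255 for p in img],
        lambda x: [0.2, 0.8],   # stand-in for resnet18a
        lambda x: [0.4, 0.6],   # stand-in for resnet18b
        lambda a, b: [(u + v) / 2 for u, v in zip(a, b)],
        lambda probs: probs.index(max(probs)),
    )
    print(label)
    queues.shutdown()
```

The graph structure is what exposes the parallelism: the ensemble step is the only point that has to wait for both branches.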

Roadmap: DAG
AI.DAGRUNASYNC
  SCRIPTRUN preproc normalize img ~in~
  MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~
  SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS label
=> 12634
AI.DAGRUNINFO 12634
=> RUNNING [... more details …]
AI.DAGRUNINFO 12634
=> DONE
• AI.DAGRUNASYNC:
  • non-blocking, returns an ID
  • ID can be used to later retrieve status + keyspace notification
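
The submit-then-poll pattern described above can be mimicked with Python futures. The sketch below is only an analogy: AsyncRunner and its method names are hypothetical, and the FAILED status is an assumption that does not appear on the slide.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class AsyncRunner:
    """Hypothetical analogue of an async run command plus a status query."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._ids = itertools.count(1)
        self._runs = {}

    def run_async(self, func, *args):
        """Start the work and return an ID immediately (non-blocking)."""
        run_id = next(self._ids)
        self._runs[run_id] = self._pool.submit(func, *args)
        return run_id

    def info(self, run_id):
        """Report the status of a previously started run."""
        future = self._runs[run_id]
        if not future.done():
            return "RUNNING"
        return "FAILED" if future.exception() else "DONE"  # FAILED is an assumption

    def result(self, run_id, timeout=None):
        return self._runs[run_id].result(timeout)


if __name__ == "__main__":
    gate = threading.Event()
    runner = AsyncRunner()
    run_id = runner.run_async(lambda: gate.wait(5) and "label")
    print(run_id, runner.info(run_id))
    gate.set()
    print(runner.result(run_id, 5), runner.info(run_id))
```

Polling is one way to learn the outcome; the keyspace notification mentioned on the slide would let a client be told instead of asking.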

Roadmap: streams
• Pervasive use of streams
AI.DAGRUN
  SCRIPTRUN preproc normalize instream ~in~
  MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~
  SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS outstream
AI.DAGRUNASYNC
  SCRIPTRUN preproc normalize instream ~in~
  MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~
  SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS outstream

Roadmap: ONNXRuntime backend
• Runtime from Microsoft
• ONNX
  • exchange format for NN
  • export from many frameworks (MXNet, CNTK, …)
• ONNX-ML
  • ONNX for machine learning models (RandomForest, SVM, K-means, etc.)
  • export from scikit-learn

Roadmap: Auto-batching
• Auto-batching
  • Transparent batching of requests
  • Queue gets rearranged according to other analogous requests in the queue, time to response
  • Only if different clients or async
• Time-to-response
  • ~Predictable in DL graphs
  • Can reject request if TTR > estimated time of queue + run
[Diagram: requests A and B, each marked "Run if TTR < 800ms"; the queue is analyzed, a request is either rejected or accepted (OK), accepted inputs are concatenated along the 0-th dimension and run as one batch]
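
A toy version of the batching and rejection logic is sketched below. It rests on one reading of the slide, stated here as an assumption: each request carries a limit (such as 800 ms), the estimated time-to-response is queue time plus run time, and a request runs only if that estimate is below its limit. All names are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    name: str
    rows: List[List[float]]   # tensor whose 0-th dimension is the batch dimension
    ttr_limit_ms: float       # e.g. 800 for "Run if TTR < 800ms"


def plan_batch(requests, est_queue_ms, est_run_ms):
    """Split queued requests into (accepted, rejected) for one batched run."""
    ttr_ms = est_queue_ms + est_run_ms  # estimated time-to-response
    accepted = [r for r in requests if ttr_ms < r.ttr_limit_ms]
    rejected = [r for r in requests if ttr_ms >= r.ttr_limit_ms]
    return accepted, rejected


def run_batch(model, requests):
    """Concatenate along dimension 0, run once, then split the output per request."""
    batch = [row for r in requests for row in r.rows]
    outputs = model(batch)
    results, start = {}, 0
    for r in requests:
        results[r.name] = outputs[start:start + len(r.rows)]
        start += len(r.rows)
    return results


if __name__ == "__main__":
    queue = [
        Request("A", [[1.0, 2.0]], 800),
        Request("B", [[3.0, 4.0], [5.0, 6.0]], 800),
        Request("C", [[7.0, 8.0]], 400),
    ]
    accepted, rejected = plan_batch(queue, est_queue_ms=300, est_run_ms=200)
    print([r.name for r in accepted], [r.name for r in rejected])
    print(run_batch(lambda batch: [[sum(row)] for row in batch], accepted))
```

Rejecting early is only sensible because run time is roughly predictable for a fixed graph, which is the point the slide makes about DL graphs.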

Roadmap: Misc
• Dynamic loading of backends:
  • don’t load what you don’t need (especially on GPU)
• Monitoring with AI.INFO:
  • more granular information on running operations, commands, memory
AI.CONFIG LOADBACKEND TF CPU
AI.CONFIG LOADBACKEND TF GPU
AI.CONFIG LOADBACKEND TORCH CPU
AI.CONFIG LOADBACKEND TORCH GPU
...
AI.INFO
AI.INFO MODEL key
AI.INFO SCRIPT key
...

RedisAI Enterprise
• Advanced monitoring
  • Health
  • Performance
  • Model metrics (reliability)
• A/B testing
• Module integration, e.g.
  • RediSearch (FAISS)
  • Anomaly detection with RedisTS
• Training/fine-tuning in-Redis

Acknowledgements
• [tensor]werk
  • Sherin Thomas, Rick Izzo, Pietro Rota
• RedisLabs
  • Guy Korland, Itamar Haber, Pieter Cailliau, Meir Shpilraien, Mark Nunberg, Ariel Madar
• Orobix
  • Everyone!
  • Manuela Bazzana, Daniele Ciriello, Lisa Lozza, Simone Manini, Alessandro Re

Development tools for the data-defined software era
1. Launching April 2019
2. Looking for investors, users, contributors
Projects:
- RedisAI: enterprise-grade runtime (with Redis Labs)
- Hangar: version control for tensor data (mid April, BSD)
Hit me up!
luca@tensorwerk.com
tensorwerk.com

Thank you!
Chris Fregly, Pipeline.ai
• End-to-End Deep Learning from Research to Production
• Any Cloud, Any Framework, Any Hardware
• Free Community Edition: https://community.pipeline.ai