Build, Train, and Deploy ML Models at Scale
- 1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Julien Simon
Global Evangelist, AI & Machine Learning
@julsimon
Build, train and deploy Machine Learning models at scale
- 3. AWS ML Stack
• Application Services: API-driven services: Vision, Language & Speech Services, Chatbots
• Platform Services: Deploy machine learning models with high-performance machine learning algorithms, broad framework support, and one-click training, tuning, and inference.
• Frameworks & Infrastructure: Develop sophisticated models with any framework, create managed, auto-scaling clusters of GPUs for large scale training, or run prediction on trained models.
https://ml.aws
https://medium.com/@julsimon/a-map-for-machine-learning-on-aws-a285fcd8d932
- 5. The Machine Learning Process
• Business Problem
• ML problem framing
• Data Collection
• Data Integration
• Data Preparation & Cleaning
• Data Visualization & Analysis
• Feature Engineering
• Model Training & Parameter Tuning
• Model Evaluation
• Are Business Goals met?
• Yes: Model Deployment, Monitoring & Debugging, Predictions
• No: Data Augmentation, Feature Augmentation
• Re-training
- 6. Amazon SageMaker
Build
• Pre-built notebooks for common problems
• Built-in, high-performance algorithms
ALGORITHMS
• K-Means Clustering
• Principal Component Analysis
• Neural Topic Modelling
• Factorization Machines
• Linear Learner
• XGBoost
• Latent Dirichlet Allocation
• Image Classification
• Seq2Seq
• And more!
FRAMEWORKS
• Apache MXNet, Chainer, TensorFlow, PyTorch, scikit-learn
Set up and manage environments for training
Train and tune model (trial and error)
Deploy model in production
Scale and manage the production environment
- 9. Amazon SageMaker
Build
• Pre-built notebooks for common problems
• Built-in, high-performance algorithms
Train
• One-click training
• Hyperparameter optimization
Deploy
• One-click deployment
• Fully managed hosting with auto-scaling
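To make "hyperparameter optimization" concrete, here is a minimal sketch of the trial-and-error loop it automates. Random search is used as a simple stand-in; the sketch makes no claim about the search strategy Amazon SageMaker itself uses. The function validation_loss and the two hyperparameter ranges are hypothetical and only stand in for "train a model, then measure it".

```python
import random


def validation_loss(learning_rate, max_depth):
    """Hypothetical stand-in for 'train a model and return its validation loss'."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (max_depth - 6) ** 2


def random_search(n_trials=50, seed=0):
    """Sample hyperparameters at random and keep the best trial seen so far."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            # Log-uniform sample in [0.001, 1]: learning rates span orders of magnitude.
            "learning_rate": 10 ** rng.uniform(-3, 0),
            "max_depth": rng.randint(2, 10),
        }
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


best_loss, best_params = random_search()
print(best_loss, best_params)
```

In a managed tuning job, each call to validation_loss would correspond to one full training job, which is why running trials in parallel on managed infrastructure matters.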
- 10. Amazon SageMaker
Build
• Pre-built notebooks for common problems
• Built-in, high-performance algorithms
Train
• One-click training
• Hyperparameter optimization
Deploy
• One-click deployment
• Fully managed hosting with auto-scaling
• Model compilation
• Elastic inference
• Inference pipelines
Train / Build
• P3DN, C5N
• TensorFlow on 256 GPUs
• Resume HPO tuning job
• New built-in algorithms
• scikit-learn environment
• Model marketplace
• Search
• Git integration
• Elastic inference
- 13. The Amazon SageMaker API
• Python SDK orchestrating all Amazon SageMaker activity
• High-level objects for algorithm selection, training, deploying,
automatic model tuning, etc.
• Spark SDK (Python & Scala)
• AWS CLI: ‘aws sagemaker’
• AWS SDK: boto3, etc.
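As an illustration of the low-level route (AWS SDK: boto3), the sketch below assembles the arguments for a CreateTrainingJob call. The job name, role ARN, bucket names and image URI are hypothetical placeholders, and the instance type and limits are arbitrary choices. The submit function needs boto3 and AWS credentials, so it is defined but not called here.

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri,
                               instance_type="ml.m5.xlarge",
                               instance_count=1, hyperparameters=None):
    """Assemble keyword arguments for the low-level CreateTrainingJob API."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # The API expects every hyperparameter value as a string.
        "HyperParameters": {k: str(v) for k, v in (hyperparameters or {}).items()},
    }


def submit(request, region="us-east-1"):
    """Send the request with boto3 (requires AWS credentials; not called here)."""
    import boto3
    client = boto3.client("sagemaker", region_name=region)
    return client.create_training_job(**request)


request = build_training_job_request(
    job_name="demo-training-job",                                  # hypothetical
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<image>",  # placeholder
    role_arn="arn:aws:iam::123456789012:role/DemoSageMakerRole",   # hypothetical
    train_s3_uri="s3://demo-bucket/train/",                        # hypothetical
    output_s3_uri="s3://demo-bucket/output/",                      # hypothetical
    hyperparameters={"num_round": 100, "max_depth": 5},
)
print(sorted(request))
```

The high-level Python SDK wraps the same information in estimator objects, so the dictionary above is mostly useful for seeing what a training job consists of: code (an image), data channels, an output location, and compute resources.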
- 14. Model Training (on EC2)
• Training data
• Training code
• Helper code
• Model artifacts
Model Hosting (on EC2)
• Inference code
• Helper code
• Inference Endpoint
Client application
• Inference request
• Inference response
Ground Truth
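The request/response path between the client application and the inference endpoint can be sketched as follows. This assumes a model that accepts CSV input, which depends on the algorithm or container actually deployed; the endpoint name passed to predict would be your own. The predict function needs boto3 and AWS credentials, so it is defined but not called here.

```python
def to_csv_payload(rows):
    """Serialize feature rows as CSV text, one sample per line."""
    return "\n".join(",".join(str(value) for value in row) for row in rows)


def predict(endpoint_name, rows, region="us-east-1"):
    """Send an inference request to a hosted endpoint and return the raw response body.

    Requires AWS credentials and a deployed endpoint; not called in this sketch.
    """
    import boto3
    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=to_csv_payload(rows),
    )
    return response["Body"].read().decode("utf-8")


payload = to_csv_payload([[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]])
print(payload)
```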
- 17. Built-in algorithms
(Slide color legend: orange: supervised, yellow: unsupervised)
• Linear Learner: regression, classification
• Image Classification: Deep Learning (ResNet)
• Factorization Machines: regression, classification, recommendation
• Object Detection (SSD): Deep Learning (VGG or ResNet)
• K-Nearest Neighbors: non-parametric regression and classification
• Neural Topic Model: topic modeling
• XGBoost: regression, classification, ranking (https://github.com/dmlc/xgboost)
• Latent Dirichlet Allocation: topic modeling (mostly)
• K-Means: clustering
• Blazing Text: GPU-based Word2Vec, and text classification
• Principal Component Analysis: dimensionality reduction
• Sequence to Sequence: machine translation, speech to text and more
• Random Cut Forest: anomaly detection
• DeepAR: time-series forecasting (RNN)
• Object2Vec: general-purpose embedding
• IP Insights: usage patterns for IP addresses
• Semantic Segmentation: Deep Learning
- 19. Blazing Text
https://dl.acm.org/citation.cfm?id=3146354
- 20. Demo:
Text Classification with BlazingText
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/introduction_to_amazon_algorithms/blazingtext_text_classification_dbpedia
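For text classification, BlazingText reads one training sample per line: a __label__ prefix carrying the class, followed by space-separated tokens. The sketch below prepares such a line. The regex tokenizer and lowercasing are simplifying assumptions made here, not necessarily what the demo notebook does, and the sample sentence is invented.

```python
import re


def to_blazingtext_line(label, text):
    """Format one sample as '__label__<label> token token ...'."""
    # Assumed tokenizer: lowercase, split words from punctuation.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return "__label__{} {}".format(label, " ".join(tokens))


line = to_blazingtext_line("Company", "Amazon Web Services, Inc. is a cloud provider.")
print(line)
```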
- 21. XGBoost
• Open Source project
• Popular tree-based algorithm for regression, classification and ranking
• Builds a collection of trees
• Handles missing values and sparse data
• Supports distributed training
• Can work with data sets larger than RAM
https://github.com/dmlc/xgboost
https://xgboost.readthedocs.io/en/latest/
https://arxiv.org/abs/1603.02754
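The idea of "building a collection of trees" can be illustrated with a toy boosting loop: each new tree is fitted to the errors left by the trees before it, and predictions are the sum over all trees. This is a deliberately simplified sketch (one feature, depth-1 trees, squared error, no regularization), not XGBoost's actual algorithm. All function names and the sample data are made up for the illustration.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error on the residuals."""
    best = None
    # Dropping the largest value keeps both sides of every split non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    """Return the leaf value of a depth-1 tree for input x."""
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted_stumps(xs, ys, n_trees=20, learning_rate=0.3):
    """Start from the mean, then add stumps fitted to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = fit_boosted_stumps(xs, ys)
print(predict_boosted(model, 1.0), predict_boosted(model, 6.0))
```

XGBoost adds much more on top of this loop, including the missing-value handling and distributed training listed above; see the linked paper for the details.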
- 24. Demo:
Keras/TensorFlow CNN on CIFAR-10
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_keras_cifar10/tensorflow_keras_CIFAR10.ipynb
- 25. Demo:
Sentiment analysis with Apache MXNet
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/mxnet_sentiment_analysis_with_gluon.ipynb
- 28. Use your own models with AWS DeepLens
• AWS DeepLens can run TensorFlow, Caffe and Apache MXNet models
• Inception
• MobileNet
• NASNet
• ResNet
• Etc.
• Train or fine-tune your model on Amazon SageMaker
• Deploy to AWS DeepLens with AWS Greengrass
- 29. Generic Architecture
• Train model
• Write inference code
• Set up Greengrass
• Deploy model and Lambda function
• Run inference and local actions on device
• Send insights to the Cloud
- 31. Amazon SageMaker
Build
• Pre-built notebooks for common problems
• Built-in, high-performance algorithms
Train
• One-click training
• Hyperparameter optimization
Deploy
• One-click deployment
• Fully managed hosting with auto-scaling
- 33. Getting started
http://aws.amazon.com/free
https://ml.aws
https://aws.amazon.com/sagemaker
https://github.com/aws/sagemaker-python-sdk
https://github.com/aws/sagemaker-spark
https://github.com/awslabs/amazon-sagemaker-examples
https://gitlab.com/juliensimon/ent321
https://medium.com/@julsimon
https://gitlab.com/juliensimon/dlnotebooks
- 34. Julien Simon
Global Evangelist, AI & Machine Learning
@julsimon
https://medium.com/@julsimon
Thank you!