RSVP Webinar: https://www.eventbrite.com/e/webinarkubeflow-tensorflow-tfx-pytorch-gpu-spark-ml-amazonsagemaker-tickets-45852865154
Talk #0: Introductions and Meetup Announcements By Chris Fregly and Antje Barth
Talk #1: Ray Overview, Ray AI Runtime on AWS using Amazon SageMaker, EC2, EMR, EKS by Chris Fregly, Principal Specialist Solution Architect, AI and Machine Learning @ AWS
Talk #2: Deep-dive Blueprints for Amazon Elastic Kubernetes Service (EKS) including Ray and Spark by Apoorva Kulkarni, Sr. Specialist Solution Architect, Containers and Kubernetes @ AWS
Zoom link: https://us02web.zoom.us/j/82308186562
Related Links
O'Reilly Book: https://www.amazon.com/dp/1492079391/
Website: https://datascienceonaws.com
Meetup: https://meetup.datascienceonaws.com
GitHub Repo: https://github.com/data-science-on-aws/
YouTube: https://youtube.datascienceonaws.com
Slideshare: https://slideshare.datascienceonaws.com
Ray AI Runtime (AIR) on AWS - Data Science On AWS Meetup
1. Recordings: https://youtube.datascienceonaws.com Book: https://www.amazon.com/Data-Science-AWS-End-End/dp/1492079391/ GitHub: https://github.com/data-science-on-aws/
Ray AI Runtime (AIR) on AWS: Distributed ML with Amazon SageMaker, EC2, EMR, and EKS!
GitHub repo:
https://github.com/data-science-on-aws
Recordings:
https://youtube.datascienceonaws.com
Book:
https://www.amazon.com/Data-Science-AWS-End-End/dp/1492079391/

What is Ray?
Frictionless transition from research to production
Encourages iterative development and debugging
Environment management: “conda as a service” that auto-syncs files across the cluster
Makes TensorFlow, PyTorch, scikit-learn, and other Python libraries as easy to scale as Spark

Frictionless transition from research to production
Local development
Remote cluster development
Remote cluster production job

Local development: local laptop and conda
python pytorch-huggingface-clothing.py # train.py
  --num_train_epochs 1 # hyper-parameter
  --max_length 64 # hyper-parameter
  --num_workers 4 # number of workers (ie. CPUs or GPUs)
  --model_name_or_path roberta-base # base BERT model
  --train_file ./data/train/part-algo-1-womens_clothing_ecommerce_reviews.csv
  --validation_file ./data/validation/part-algo-1-womens_clothing_ecommerce_reviews.csv
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/train
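The flags above follow a conventional command-line interface, which is what lets the same script run locally and on a cluster with only the values changed. As a minimal sketch (not the repository's actual script), the argument handling could look like this. The function name `parse_args` and all defaults are assumptions for illustration; only the flag names come from the slide.

```python
import argparse


def parse_args(argv=None):
    """Parse the training flags shown on the slide (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py argument handling")
    parser.add_argument("--num_train_epochs", type=int, default=1)
    parser.add_argument("--max_length", type=int, default=64)
    parser.add_argument("--num_workers", type=int, default=4)
    parser.add_argument("--model_name_or_path", type=str, default="roberta-base")
    parser.add_argument("--train_file", type=str, required=True)
    parser.add_argument("--validation_file", type=str, required=True)
    return parser.parse_args(argv)


# Local run: small values. A cluster run would only change the values,
# for example --num_train_epochs 10 and --num_workers 64.
local_args = parse_args([
    "--train_file", "./data/train/part-algo-1-womens_clothing_ecommerce_reviews.csv",
    "--validation_file", "./data/validation/part-algo-1-womens_clothing_ecommerce_reviews.csv",
])
print(vars(local_args))
```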

Remote development: cluster and cluster-scope conda
ray submit cluster.yaml # run the same python script on Ray cluster!
  pytorch-huggingface-clothing.py # train.py
  --num_train_epochs 10 # hyper-parameter
  --max_length 64 # hyper-parameter
  --num_workers 64 # number of workers
  --model_name_or_path roberta-base # base BERT model
  --train_file ./data/train/part-algo-1-womens_clothing_ecommerce_reviews.csv
  --validation_file ./data/validation/part-algo-1-womens_clothing_ecommerce_reviews.csv
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/train

Remote cluster production jobs: specify conda yaml per job
ray job submit
  --working-dir . # Copy everything from this directory and below
  --runtime-env job-pytorch-huggingface-clothing-runtime.yaml # Conda env yaml
  --address http://127.0.0.1:8265 -- # port forward to cluster
  python pytorch-huggingface-clothing.py # train.py
    --num_train_epochs 1 # hyper-parameter
    --max_length 64 # hyper-parameter
    --num_workers 4 # number of workers (ie. CPUs or GPUs)
    --model_name_or_path roberta-base # base BERT model
    --train_file ./data/train/part-algo-1-womens_clothing_ecommerce_reviews.csv
    --validation_file ./data/validation/part-algo-1-womens_clothing_ecommerce_reviews.csv
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/train

Ray environment management (“conda as a service”)
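A runtime environment is a per-job description of the files and dependencies that Ray sets up on the cluster. The sketch below builds one as a plain Python dictionary and renders an equivalent `ray job submit` command using the `--runtime-env-json` option. The dependency list and the helper name `build_submit_command` are assumptions for illustration; they are not the contents of the YAML files in the repository.

```python
import json
import shlex

# Hypothetical runtime environment: the dependency list is an assumption,
# not the content of job-pytorch-huggingface-clothing-runtime.yaml.
runtime_env = {
    "working_dir": ".",  # directory uploaded to the cluster with the job
    "conda": {
        "dependencies": [
            "pip",
            {"pip": ["torch", "transformers"]},
        ]
    },
}


def build_submit_command(entrypoint, runtime_env, address="http://127.0.0.1:8265"):
    """Return the argument list for a `ray job submit` call."""
    return [
        "ray", "job", "submit",
        "--address", address,
        "--runtime-env-json", json.dumps(runtime_env),
        "--",
        *shlex.split(entrypoint),
    ]


command = build_submit_command(
    "python pytorch-huggingface-clothing.py --num_workers 4", runtime_env
)
print(shlex.join(command))
```

Keeping the environment as data makes it easy to vary dependencies per job without touching the cluster image.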

Ray Data - not (yet) a DataFrame abstraction (ie. no joins)
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/datasets

Modin: Pandas on Ray
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/datasets

RayDP: Spark on Ray
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/datasets

Ray Tune
from ray import tune

# 1. Define an objective function.
def objective(config):
    score = config["a"] ** 2 + config["b"]
    return {"score": score}

# 2. Define a search space.
search_space = {
    "a": tune.grid_search([0.001, 0.01, 0.1, 1.0]),
    "b": tune.choice([1, 2, 3]),
}

# 3. Start a Tune run and print the best result.
analysis = tune.run(objective, config=search_space)
print(analysis.get_best_config(metric="score", mode="min"))
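To make the search concrete, the following sketch evaluates the same toy objective over every combination of the two value lists using only the standard library. It is not how Ray Tune executes trials: Tune distributes trials across the cluster, and `tune.choice` samples a value per trial rather than enumerating all of them. The sketch only illustrates what selecting the best config under `mode="min"` means.

```python
import itertools


def objective(config):
    # Same toy objective as on the slide, returning the bare score.
    return config["a"] ** 2 + config["b"]


a_values = [0.001, 0.01, 0.1, 1.0]
b_values = [1, 2, 3]

# Exhaustive sweep (an assumption for illustration; Tune samples "b" instead).
configs = [{"a": a, "b": b} for a, b in itertools.product(a_values, b_values)]

# mode="min": keep the config with the lowest score.
best_config = min(configs, key=objective)
print(best_config)  # {'a': 0.001, 'b': 1}
```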

Ray RLlib: Initial beachhead for Ray
Ray Reinforcement Learning Ray Data & Ray Train/Tune

Ray Serve
Serving Framework on Ray
Python-native, supports any Python code, ML framework, etc
Compose multiple ML models into a deployment graph

Ray Serve: NLP inference pipeline with HuggingFace
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/serve

Ray Serve: combine 2 NLP models, average the predictions
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/serve
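The combination step itself is small. Here is a minimal sketch, assuming each model returns a score between 0 and 1 for a piece of text. The functions `model_a` and `model_b` and their keyword rules are hypothetical stand-ins for the two HuggingFace models; in Ray Serve each would be its own deployment rather than a local function.

```python
def model_a(text):
    # Hypothetical stand-in: scores 1.0 when the text contains "great".
    return 1.0 if "great" in text.lower() else 0.0


def model_b(text):
    # Hypothetical stand-in: scores 0.0 when the text contains the word "not".
    return 0.0 if "not" in text.lower().split() else 1.0


def combined_prediction(text, models=(model_a, model_b)):
    """Average the scores returned by every model for the same input."""
    scores = [model(text) for model in models]
    return sum(scores) / len(scores)


for text in [
    "Ray Serve is great!",
    "Serving frameworks without DAG support are not great.",
]:
    print(text, combined_prediction(text))
```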

Ray Serve: build DAG with http inputs
InputNode() - http inputs to the DAG
bind() - Graph building API on decorated body
serve.run() - Run deployment graph
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/serve

Ray Serve: submit long-running “serve job” to cluster
ray job submit
  --working-dir .
  --runtime-env job-serve-runtime.yaml
  -- python serve-dag-huggingface.py

Ray Serve: run sample predictions with an http client
import requests

input_text_list = [
    "Ray Serve is great!",
    "Serving frameworks without DAG support are not great.",
]
for input_text in input_text_list:
    prediction = requests.get(
        "http://<cluster_host>:8080/invocations", data=input_text
    ).text
    print("Prediction for '{}' is {}".format(input_text, prediction))

Ray Workflows
High-performance, durable application workflows
Large-scale workflows (ie. ML and data pipelines)
Long-running business workflows (when used with Ray Serve)
read_data() -> preprocessing() -> train() -> validate()

Ray Workflows - Initialize storage, setup and run workflow
Workflow.run() - Start workflow DAG
Setup workflow DAG
Workflow execution is durably logged to storage
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/workflow
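Durability here means that each step's result is persisted, so a restarted workflow can resume instead of recomputing. The sketch below imitates that idea with JSON files on local disk and the four step names from the Ray Workflows slide. It does not use the Ray Workflows API; `run_step`, the step bodies, and the storage layout are all assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def run_step(storage, name, func, *args):
    """Run func once; later calls load the logged result from storage."""
    checkpoint = Path(storage) / f"{name}.json"
    if checkpoint.exists():
        return json.loads(checkpoint.read_text())
    result = func(*args)
    checkpoint.write_text(json.dumps(result))
    return result


def read_data():
    return [1.0, 2.0, 3.0, 4.0]


def preprocessing(rows):
    # Scale values into the range (0, 1].
    return [r / max(rows) for r in rows]


def train(rows):
    # Toy "model": just the mean of the scaled values.
    return {"mean": sum(rows) / len(rows)}


def validate(model):
    return {"ok": 0.0 <= model["mean"] <= 1.0}


def run_workflow(storage):
    rows = run_step(storage, "read_data", read_data)
    features = run_step(storage, "preprocessing", preprocessing, rows)
    model = run_step(storage, "train", train, features)
    return run_step(storage, "validate", validate, model)


with tempfile.TemporaryDirectory() as storage:
    first = run_workflow(storage)
    second = run_workflow(storage)  # every step is loaded from the logged results
    logged_steps = sorted(p.stem for p in Path(storage).iterdir())
    print(first, second, logged_steps)
```

A real deployment would point the storage at a shared location so that a different node can pick up the workflow after a failure.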