Introduction to Polyaxon
Yu Ishikawa
Agenda
- Why do we need Polyaxon?
- What is Polyaxon?
- How does Polyaxon work?
- Demo
- Summary
Objectives of introducing Polyaxon
- Make the lead time of experiments as short as possible.
- Make the financial cost to train models as cheap as possible.
- Make the experiments reproducible.
[ML Workflow diagram: Experiment Phase (Problem Setting → Collecting Data → Experiments → Off-line Evaluation) → Productionize Phase (Productionize ML system) → Operating Phase (Serving Models → On-line Evaluation → Retrain Model → Off-line Evaluation), with Polyaxon's role highlighted in the workflow.]
Why do we need Polyaxon?
- We are not able to manage experiments as a team today.
- Experiments are expensive in terms of both money and time. There is room to improve efficiency and productivity.
- Setting up experiment environments can be tough for ML engineers. Moreover, the environments tend not to be reproducible, so taking over another member's tasks can be expensive, and we cannot manage the training process as a team.
- Hyperparameter search with Python ML libraries like sklearn takes a long time, since Python on its own does not scale well.
Agenda
- Why do we need Polyaxon?
- What is Polyaxon?
- How does Polyaxon work?
- Demo
- Summary
What is Polyaxon?
- An open source platform for reproducible machine learning at scale.
- https://polyaxon.com/
- Features
- Notebook
- Hyperparameter search
- Powerful workspace
- User management
- Dynamic resources allocation
- Dashboard
- Versioning
Notebook environment with Jupyter
---
version: 1
kind: notebook
build:
  image: tensorflow/tensorflow:1.4.1-py3
  build_steps:
    - pip3 install jupyter
$ polyaxon notebook start -f polyaxon_notebook.yml
New version of CLI (0.3.5) is now available. To upgrade run:
pip install -U polyaxon-cli
Notebook is being deployed for project `quick-start`
It may take some time before you can access the notebook.
Your notebook will be available on:
http://35.184.217.84:80/notebook/root/quick-start/
- Polyaxon enables us to launch a Jupyter environment with a single command, and we can define the environment with a Docker image and build commands in a YAML file.
- We can reproduce the notebook experiments easily.
Hyperparameter tuning with Polyaxon
- Polyaxon supports some hyperparameter tuning methods:
- Grid search
- Random search
- Bayesian optimization
- Early stopping
- Hyperband
- We can control the concurrency of hyperparameter tuning with a YAML file.
- We can reproduce hyperparameter tuning jobs as well.
Hyperparameter tuning with high concurrency
---
version: 1
kind: group
hptuning:
  concurrency: 5
  matrix:
    learning_rate:
      linspace: 0.001:0.1:5
    dropout:
      values: [0.25, 0.3]
    activation:
      values: [relu, sigmoid]
declarations:
  batch_size: 128
  num_steps: 500
  num_epochs: 1
build:
  image: tensorflow/tensorflow:1.4.1-py3
  build_steps:
    - pip3 install --no-cache-dir -U polyaxon-helper
run:
  cmd: python3 model.py --batch_size={{ batch_size }} \
    --num_steps={{ num_steps }} \
    --learning_rate={{ learning_rate }} \
    --dropout={{ dropout }} \
    --num_epochs={{ num_epochs }} \
    --activation={{ activation }}
$ polyaxon run -f polyaxon_gridsearch.yml
New version of CLI (0.3.5) is now available. To
upgrade run:
pip install -U polyaxon-cli
Creating an experiment group with the following
definition:
---------------- -----------------
Search algorithm grid
Concurrency 5 concurrent runs
Early stopping deactivated
---------------- -----------------
Experiment group 1 was created
polyaxon_gridsearch.yml
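For reference, a minimal sketch of what the model.py referenced by the run cmd above might look like. The argument names mirror the polyaxonfile; the training body is a placeholder, and the send_metrics call is an assumption based on the polyaxon-helper package installed in build_steps (check the docs of your version for the exact API).

import argparse

# Assumption: polyaxon-helper (installed in build_steps above) exposes send_metrics,
# which reports metrics back to Polyaxon so they show up on the dashboard.
from polyaxon_helper import send_metrics


def train(args):
    # Placeholder for the real TensorFlow training loop.
    # It should return the metrics we want Polyaxon to track.
    return {"loss": 0.1, "accuracy": 0.95}


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=128)
    parser.add_argument("--num_steps", type=int, default=500)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--dropout", type=float, default=0.25)
    parser.add_argument("--num_epochs", type=int, default=1)
    parser.add_argument("--activation", type=str, default="relu")
    args = parser.parse_args()

    metrics = train(args)
    send_metrics(**metrics)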
Dashboard ~ Experiments
Dashboard ~ Metrics Visualization
Agenda
- Why do we need Polyaxon?
- What is Polyaxon?
- How does Polyaxon work?
- Demo
- Summary
Hyperparameter tuning with a single machine
[Diagram: a machine with many CPU cores; the data sits in shared memory, and each core trains with a different parameter set (A, B, C, ..., X).]
For instance, scikit-learn’s GridSearchCV can run experiments in parallel (see the sketch below). However, the number of processes is limited by the number of CPU cores: a machine with 64 CPU cores can run at most 64 concurrent trainings.
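As a rough illustration (not from the slides), here is a minimal scikit-learn sketch of that single-machine ceiling: n_jobs=-1 uses every available core, but the effective concurrency can never exceed the core count.

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_digits(return_X_y=True)

param_grid = {
    "n_estimators": [100, 200, 400],
    "max_depth": [4, 8, 16, None],
}

# n_jobs=-1 parallelizes over all CPU cores of this single machine;
# a 64-core machine therefore runs at most 64 fits at a time.
search = GridSearchCV(RandomForestClassifier(), param_grid, cv=3, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)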
Hyperparameter tuning of Polyaxon
[Diagram: training code is uploaded to and run on Polyaxon on k8s; the Polyaxon core builds the image and schedules one pod per parameter set (A–F) across nodes A, B, and C.]
The more nodes the k8s cluster has, the more training processes can run at once. Parallelism is no longer constrained by a single machine.
Even one experiment can be shorter
By leveraging multiple nodes on k8s, we can shorten the time of a single experiment through high concurrency.
[Timeline diagram: Experiments → Evaluation on a single machine vs. on Polyaxon over time t; the Polyaxon run finishes earlier, reducing training time.]
Auto-scalable & preemptible node pool with polyaxon
[Diagram: Polyaxon on k8s with two node pools: one for the Polyaxon core (three nodes) and one for experiments.]
Auto-scalable & preemptible node pool with polyaxon
[Diagram: training code of experiment X is uploaded and run with concurrency 100 and per-run requests of 1 CPU and 2 GB memory against the experiments node pool.]
Auto-scalable & preemptible node pool with polyaxon
[Diagram: to satisfy experiment X (concurrency 100, 1 CPU / 2 GB memory per run), the experiments node pool automatically launches new preemptible nodes.]
Auto-scalable & preemptible node pool with polyaxon
[Diagram: training code of experiment Y is uploaded and run with concurrency 50 and per-run requests of 1 CPU and 1 GB memory; three preemptible nodes are already running in the experiments node pool.]
Auto-scalable & preemptible node pool with polyaxon
[Diagram: the experiments node pool scales out with additional preemptible nodes to accommodate experiment Y (concurrency 50, 1 CPU / 1 GB memory per run).]
Preemptible instance/GPU/TPU pricing
[Pricing tables: preemptible instance, preemptible GPU, and preemptible TPU pricing.]
Regular instance cost vs Preemptible instance cost
[Chart: running cost over time t for a regular instance vs. a preemptible instance; the gap between them is the reduced cost.]
- We can reduce the cost of training models by leveraging preemptible instances with Polyaxon.
- Polyaxon enables us to use a preemptible node pool for experiments.
- Since Polyaxon automatically scales the node pool on GKE, we don’t need to keep static instances for experiments.
Multiple Experiments with a single machine
It takes a long time to run experiments sequentially on a single machine, because Python ML libraries like sklearn are not inherently scalable beyond it.
[Timeline diagram: three Experiments → Evaluation cycles run one after another over time t.]
Multiple Experiments with Polyaxon
Polyaxon enables us to easily run multiple experiments in parallel on k8s. We don’t need to wait for one experiment to finish before moving on to the next.
[Timeline diagram: three Experiments → Evaluation cycles run in parallel over time t.]
We can shorten the total experiment time through Polyaxon’s parallelism.
Sequential experiments cost vs parallel experiments cost
[Chart: running cost and labor cost over time t for sequential experiments vs. experiments in parallel; shortening the total time is the reduced cost.]
- Essentially, the instance costs should be about the same, since the cost of CPU usage is linear in running time.
- However, we should not overlook labor costs during experiments. Waiting for experiments wastes both time and money. Time is money!!
- We can reduce the total cost by shortening the total experiment time.
Power of multiple preemptible nodes
- The cost of 10 preemptible n1-standard-64 nodes running for 2 hours with 640 concurrent runs is cheap:
- $12.80 = $0.64/hour x 10 nodes x 2 hours
- Even if one parameter set takes about 20 minutes to train, we can run about 3,840 parameter sets in just 2 hours at that cost (see the sketch below).
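A back-of-the-envelope check of those numbers (using the $0.64/hour preemptible n1-standard-64 price quoted above; verify against current GCP pricing):

# Back-of-the-envelope cost/throughput estimate for the numbers above.
nodes = 10                  # preemptible n1-standard-64 nodes
cores_per_node = 64         # vCPUs per n1-standard-64
price_per_node_hour = 0.64  # USD, preemptible price quoted above
hours = 2
minutes_per_run = 20        # one parameter set per vCPU (1 CPU requested per run)

concurrency = nodes * cores_per_node                  # 640 concurrent runs
total_cost = price_per_node_hour * nodes * hours      # $12.80
runs = concurrency * (hours * 60 // minutes_per_run)  # 3,840 parameter sets

print(f"concurrency={concurrency}, cost=${total_cost:.2f}, parameter sets={runs}")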
We can achieve the objectives:
- Make the lead time of experiments as short as possible.
- Make the financial cost to train models as cheap as possible.
Agenda
- Why do we need Polyaxon?
- What is Polyaxon?
- How does Polyaxon work?
- Demo
- Summary
Demo
- Notebook
- Job
- Experiment
- Hyperparameter tuning at scale
Summary
- We can definitely achieve the objectives with Polyaxon on GKE.
- Make the lead time of experiments as short as possible.
- Make the financial cost to train models as cheap as possible.
- Make the experiments reproducible.
- All we ML engineers have to do is:
- Write the training code in Python as usual, and
- Define the YAML files for the experiments.
- What’s next?
- Supporting preemptible GPUs / TPUs.
Appendix A: Links
- Polyaxon
- https://polyaxon.com/
- Documentation
- https://docs.polyaxon.com/
- Examples
- https://github.com/polyaxon/polyaxon-quick-start
- https://github.com/polyaxon/deep-learning-with-python-notebooks-on-polyaxon
- https://github.com/polyaxon/polyaxon-examples
Appendix B: Architecture of Polyaxon