Coding the Continuum
Ian Foster
"When the network is as
fast as the computer’s
internal links, the machine
disintegrates across the
net into a set of special-
purpose appliances.”
-- George Gilder, 2001
Hollow core fiber:
99.7% speed of light
(1.46x faster than fiber)
73.7 terabits per second
“network is as fast as the computer’s internal links”
https://doi.org/10.1007/978-3-319-31903-2_8
Global IP traffic, wired and wireless
Communication technologies continue to evolve
5G is transforming communications
doi:10.1038/nphoton.2013.45
Innovation continues in the lab
We can compute anywhere!
Cheapest
Greenest
Nearest to data
But are we really free?
Time = T_compute + 2 × T_latency
Uphill in all directions
Source: http://bit.ly/2SDGHzT
“a set of special-purpose appliances”
FPGAs
Tesla self-driving chip: 2.5 Gpixel/s, 72 Top/s, 72 W
“a set of special-purpose appliances”
“Cloud computing 5x to 10x improved price point [relative to Enterprise]”
— James Hamilton, http://bit.ly/2E78Wi1
Why?
• Improved utilization
• Economies of scale
in operations
• More power efficient
• Optimized software
LBNL-1005775
Google hyperscale data center, St. Ghislain, Belgium
Modular data center
Zero-carbon cloud: Reduce energy cost and energy carbon footprint to 0
Andrew Chien DOI 10.1109/IPDPS.2016.96
The performance landscape becomes peculiar
A program can run on two computers
C1 takes 0.01 seconds
C2 takes 0.005 seconds
Which is faster?
The answer depends on their location.
Say C1 is adjacent and C2 is 500 km distant
t(C1) = T1 = 0.01 sec
t(C2) = T2 + 2 × 500 km × 5 × 10⁻⁶ sec/km = 0.005 + 0.005 = 0.01 sec
The apparent speed of a computer depends on its location;
the apparent location of a computer depends on its speed
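The two-computer comparison above can be checked with a few lines of Python. This is a minimal sketch, assuming the 5 µs/km one-way latency figure used in the slide arithmetic; the function name `apparent_time` is made up for illustration.

```python
# Assumed constant: about 5 microseconds of one-way latency per km (from the slide arithmetic).
LATENCY_S_PER_KM = 5e-6


def apparent_time(compute_s: float, distance_km: float) -> float:
    """Response time seen by the caller: compute time plus one network round trip."""
    return compute_s + 2 * distance_km * LATENCY_S_PER_KM


t_c1 = apparent_time(0.01, 0)     # C1: slower, adjacent
t_c2 = apparent_time(0.005, 500)  # C2: faster, 500 km away
print(f"t(C1) = {t_c1:.4f} s, t(C2) = {t_c2:.4f} s")
```

Moving C2 closer than 500 km makes it look faster than C1; moving it farther makes it look slower, which is the sense in which speed and location trade off.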
Continuum
A set of elements such that between any two of them there is a third element
[dictionary.com]
For example, the computing continuum:
Size      Example             Memory  Network    Cost
Nano      Adafruit Trinket    0.5K    BLE        $5
Micro     Particle.io Boron   256K    WiFi/LTE   $30
Milli     Array of Things     8GB     WiFi/LTE   $600
Server    Linux Box           32GB    1 GigE     $3K
Fog       Co-located Blades   256GB   10GigE     $50K
Campus    1000-node cluster   32TB    40GigE     $2M
Facility  Datacenter          16PB    N*100GigE  $1000M
IoT/Edge | Fog | HPC/Cloud
Credit: Pete Beckman,
beckman@anl.gov. See PAISE, Friday
The space-time continuum
“space by itself, and time by
itself, are doomed to fade
away into mere shadows,
and only a kind of union of
the two will preserve an
independent reality …”
H. Minkowski, 1908
Space-time diagram
https://en.wikipedia.org/wiki/Spacetime
The spacetime continuum in computational systems
[Figure: space-time diagram for C1 (0 km) and C2 (500 km); time axis marked at 2.5, 5, 7.5, and 10 ms]
Misquoting Minkowski: “Henceforth, location for itself, and speed for itself shall completely
reduce to a mere shadow, and only some sort of union of the two shall preserve independence."
The behaviors of the two
computers are indistinguishable
t(C1) = T1 = 0.01 sec
t(C2) = T2 + 2 x 500 x 5 x 10−6 = 0.01 sec
A real example: High energy physics trigger analysis
[Figure, not to scale: 0 km (Illinois) to 2000 km (Virginia), 10 ms each way; time axis marked at 10, 40, and 50 ms]
T1 = 2 seconds on CPU
T2 = 30 msec on FPGA
Local: 2000 msec
Remote: 30 + 10 + 10 = 50 msec
40x acceleration
Nhan Tran, FermiLab, et al. arXiv:1904.08986
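The trigger-analysis numbers can be reproduced, and extended to a break-even distance, with a short Python sketch. The 5 µs/km latency constant and the helper names are assumptions for illustration, not part of the cited work.

```python
LATENCY_S_PER_KM = 5e-6  # assumed one-way latency per km


def remote_time(compute_s: float, distance_km: float) -> float:
    """Compute time on the remote device plus the round trip to reach it."""
    return compute_s + 2 * distance_km * LATENCY_S_PER_KM


def break_even_km(local_s: float, remote_s: float) -> float:
    """Distance at which the remote device stops being faster than local compute."""
    return (local_s - remote_s) / (2 * LATENCY_S_PER_KM)


local = 2.0                        # T1: 2 seconds on a local CPU
remote = remote_time(0.030, 2000)  # T2: 30 msec on an FPGA 2000 km away
speedup = local / remote
limit = break_even_km(2.0, 0.030)
print(f"remote = {remote * 1000:.0f} msec, speedup = {speedup:.0f}x")
```

Under these assumptions the remote FPGA would stay ahead of the local CPU out to roughly 197,000 km, so for this workload distance is a minor term.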
Reasoning about the computing continuum
(a) Assumptions
A1: N identical consumers, each of which
requests one compute unit per sec,
distributed X secs apart
A2: Infinite bandwidth: i.e., only latency
A3: A computer takes T secs to complete
a compute unit
A4: A compute center containing Z
computers is faster by a factor of √Z
Reasoning about the computing continuum
(b) Without response time bounds
[Figure: consumers spaced X apart; distances labeled $\sqrt{N/2}\,X$]
Max time is:
On N: $\frac{T}{\sqrt{N}} + \sqrt{\frac{N}{2}}\,X$
Local: $T$
E.g., N = 100, T = 0.01, X = 0.0001:
On N: $\frac{0.01}{10} + \sqrt{\frac{100}{2}} \times 0.0001 = 0.001 + 0.00071 = 0.00171$ s
Local: 0.01 sec
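The worked example for case (b) can be sketched in Python as follows. The formula is taken as T/√N (assumption A4) plus a √(N/2)·X latency term, which matches the slide's 0.00171 s figure; the function name and the sweep over N are illustrative additions.

```python
import math


def aggregated_max_time(n: int, t: float, x: float) -> float:
    """Max time when n consumers share one center: T/sqrt(N) + sqrt(N/2) * X."""
    return t / math.sqrt(n) + math.sqrt(n / 2) * x


n, t, x = 100, 0.01, 0.0001
central = aggregated_max_time(n, t, x)
print(f"central: {central:.5f} s, local: {t} s")

# Illustrative sweep: the aggregation size that minimizes max time under this model.
best_n = min(range(1, 1001), key=lambda k: aggregated_max_time(k, t, x))
print(f"best N in 1..1000: {best_n}")
```

Beyond the minimizing N, the growing latency term outweighs the √N speedup, which is one way to read the later remark that optimal solutions will likely involve centers of several sizes.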
Reasoning about the computing continuum
(c) With response time bound, B
From A1, there are $\pi D^2/X^2$ consumers within distance D of a compute center
We want to know D for which:
$\frac{T}{\sqrt{\mathit{size}}} + 2D \le B$
As size is $\pi D^2/X^2$, we want to solve:
$\frac{T}{\sqrt{\pi D^2/X^2}} + 2D = B$
With B = 0.01, T = 0.001, X = 0.0001 sec:
D = 0.004964 sec (~1000 km)
Then:
Size = $\pi D^2/X^2$ = 7854
Max processing time is
$2 \times 0.004964 + 0.001/\sqrt{7854}$ = 0.01 seconds
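Case (c) can be explored numerically. The sketch below assumes the √Z speedup of A4 (exposed as an exponent `alpha` so other scaling laws can be tried) and uses bisection to find the largest D that meets the bound; the function names are hypothetical. Under this reading D comes out just under 0.005 sec, close to but not identical with the 0.004964 quoted on the slide, so treat the code as illustrative only.

```python
import math


def response_time(d: float, t: float, x: float, alpha: float = 0.5) -> float:
    """Max response time for a center serving every consumer within latency d.

    size = pi * d^2 / x^2 consumers (A1); speedup = size ** alpha (alpha = 0.5 is A4).
    """
    size = math.pi * d * d / (x * x)
    return t / size ** alpha + 2 * d


def largest_radius(b: float, t: float, x: float, alpha: float = 0.5) -> float:
    """Bisect for the largest d with response_time(d) <= b."""
    lo, hi = b / 4, b / 2  # at d = b/2 the round trip alone uses the whole budget
    if response_time(lo, t, x, alpha) > b:
        raise ValueError("bound b is not reachable for these parameters")
    for _ in range(100):
        mid = (lo + hi) / 2
        if response_time(mid, t, x, alpha) <= b:
            lo = mid
        else:
            hi = mid
    return lo


B, T, X = 0.01, 0.001, 0.0001
D = largest_radius(B, T, X)
size = math.pi * D * D / (X * X)
print(f"D = {D:.6f} sec, size = {size:.0f} consumers")
```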
Reasoning about the computing continuum
(d) Discussion
The model emphasizes the importance of aggregation
The model can surely be improved:
• Empirical data on scaling of cost and speed with size
• Data transfer costs
• Empirical data on workloads
Optimal solutions will likely involve compute centers of multiple
sizes
Source: LBNL-2001025
Small and midsize data centers: Server intensity
Coding the continuum
Code: verb.
1) to arrange or enter in a code
2) to write code for
Now that the machine has disintegrated across the net, how do we program it?
Continuum-aware programming model: Function fabric, Data fabric, Trust fabric, Cost map
Coding the continuum: Serial crystallography
doi: 10.1038/nature09750
Stage                        Images                       Data      Time
Validate each image          1 image / 20 msec            6 MB      5 msec
Quality control              1K images / 15 sec           6 GB      1 sec
Full analysis                26K images / 7 min           160 GB    60 sec
Determine crystal structure  Multiple chips @ 7 min each  0.2-1 TB  3000 sec
For each sample:
• Image crystals at ~50 Hz:
  • Validate each image
  • After 1000, quality control
  • After 26000, full analysis
• If good:
  • Determine crystal structure
  • Return crystal structure
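The per-sample steps above can be sketched as a staged loop. Every function name here is hypothetical, and how a failed quality-control batch is handled (it is simply dropped) is an assumption; the slide specifies only the stage boundaries.

```python
def process_sample(images, validate, quality_control, full_analysis,
                   solve_structure, qc_every=1000, full_after=26000):
    """Run one sample through the staged pipeline; return a structure or None."""
    batch, accepted = [], []
    for count, image in enumerate(images, start=1):
        if validate(image):           # per-image validation
            batch.append(image)
        if count % qc_every == 0:     # quality control on the latest batch
            if quality_control(batch):
                accepted.extend(batch)  # assumption: failed batches are dropped
            batch = []
        if count == full_after:       # full analysis over everything accepted
            if full_analysis(accepted):
                return solve_structure(accepted)
            return None
    return None


# Toy run with small thresholds and stand-in stage functions.
result = process_sample(
    images=range(30),
    validate=lambda img: img % 2 == 0,
    quality_control=lambda batch: len(batch) > 0,
    full_analysis=lambda imgs: len(imgs) >= 10,
    solve_structure=lambda imgs: len(imgs),
    qc_every=10,
    full_after=30,
)
print(result)
```

Each callable stands for a stage that could run at a different place in the continuum, for example validation near the instrument and structure determination at a large facility.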
Coding the continuum: Serial crystallography
1 msec = 50 km
200 msec = 10 000 km
12 000 msec = 600 000 km [moon = 384 000 km]
600K msec = 30 Mkm [L1 = 1.5M km]
Advanced Photon Source to Argonne Leadership Computing Facility: 1 km, 10 μsec RTT
Similar needs arise across modern (AI-enabled) science
Scientific instruments: Major user facilities, Laboratories, Automated labs, …
Sensors: Environmental, Laboratories, Mobile, …
Simulation codes: Computational results, Function memoization, …
Databases: Reference data, Experimental data, Computed properties, Scientific literature, …
Scientists, engineers: Expert input, Goal setting, …
Industry, academia: New methods, Open source codes, AI accelerators, …
[Diagram labels: Data ingest; Inference; HPO; Data enhancement; Data QA/QC; Feature selection; Model training; UQ; Model reduction; Active/reinforcement learning; AI Methods; Data; Models; Accelerators; Compute; Agile Infrastructure; Surrogates; Agile Services; Data mgmt; Operating system; Portability; Compilers; Runtime system; Workflow; Automation; Prog. envs.; Languages; Model creation; Libraries; Resource mgmt; Authen/Access]
Learned Function Accelerators (LFAs)
Coding the continuum: Closed solution
https://read.acloud.guru/aws-greengrass-the-missing-manual-2ac8df2fbdf4
Coding the continuum:
Elements of an open solution
Thanks to colleagues, especially: Rachana Ananthakrishnan, Yadu Babuji, Ben Blaiszik, Kyle Chard, Ryan Chard, Zhuozhao Li, Tyler Skluzacek, Steve Tuecke, Anna Woodard, Logan Ward
[Diagram labels: Write programs; Function fabric; Data fabric; Trust fabric; Cost map; Model registry; Flows; funcX; Data services; Auth; SCRIMP; DLHub; Automate]
Coding the continuum:
Elements of an open solution
https://arxiv.org/pdf/1905.02158 http://parsl-project.org
Coding the continuum:
Elements of an open solution
Portable code: Python; Docker, Shifter, Singularity
Any access: SSH, Globus, cluster or HPC scheduler
Any computer: Clusters, clouds, HPC, accelerators
funcX: Transform clouds, clusters, and supercomputers
into high-performance function serving systems
Simply deploy funcX endpoint to transform a computer into a function serving system
[Figure: functions f(x), g(x), h(x), k(x), … plus dependencies are registered (repo2docker, Register) and served by endpoints EP(x)]
Registration: f(x), g(x), … + dependencies; EP(x) registry
Execution: f(x), … [1,2,3 … n]
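The register-then-execute pattern on these slides can be illustrated with a toy, in-process sketch. This is not the funcX API: `ToyService` and `ToyEndpoint` are hypothetical stand-ins built on the standard library, and a thread pool plays the role of a remote endpoint.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class ToyEndpoint:
    """Stand-in for an endpoint EP(x): runs functions on a local worker pool."""

    def __init__(self, workers: int = 4):
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, func, arg):
        return self._pool.submit(func, arg)


class ToyService:
    """Hypothetical registry mapping ids to functions and to endpoints."""

    def __init__(self):
        self._functions = {}
        self._endpoints = {}

    def register_function(self, func) -> str:
        function_id = str(uuid.uuid4())
        self._functions[function_id] = func
        return function_id

    def register_endpoint(self, endpoint) -> str:
        endpoint_id = str(uuid.uuid4())
        self._endpoints[endpoint_id] = endpoint
        return endpoint_id

    def run(self, function_id: str, endpoint_id: str, inputs):
        """Execution: map a registered function over inputs on a chosen endpoint."""
        func = self._functions[function_id]
        endpoint = self._endpoints[endpoint_id]
        futures = [endpoint.submit(func, value) for value in inputs]
        return [future.result() for future in futures]


def f(x):
    return x * x


service = ToyService()
ep = service.register_endpoint(ToyEndpoint())
fid = service.register_function(f)
results = service.run(fid, ep, [1, 2, 3])
print(results)
```

Separating registration from execution is what lets the caller pick the endpoint per invocation, which is where a cost map could plug in.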
Latency (s) for functions running on ALCF Cooley cluster, submitted from login node
[Figure panels: Strong scaling; Weak scaling]
Common FaaS systems, compared
Coding the continuum:
Elements of an open solution
Incremental construction of a personalized cost map
• Build black-box performance models from observed
execution times for different codes on different platforms
• Transfer learning across codes, problem sizes, and
hardware platforms
• Experiment design to choose experiments that maximize
reduction in uncertainty
• Evolve models over time as codes and platforms change
• Use models for instance selection and scheduling
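The first and last bullets (fit performance models from observed runs, then use them for instance selection) can be sketched as follows. A straight-line fit stands in for the black-box models mentioned above; the instance names, timings, and prices are made-up illustrative data, and none of this is the SCRIMP implementation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return a, mean_y - a * mean_x


# Made-up observations: instance type -> [(input size in GB, runtime in seconds)].
observed = {
    "compute-optimized": [(1, 12.0), (2, 22.0), (4, 42.0)],
    "memory-optimized": [(1, 20.0), (2, 38.0), (4, 74.0)],
}
# Made-up prices in dollars per hour.
price_per_hour = {"compute-optimized": 0.40, "memory-optimized": 0.30}

# One fitted model per instance type.
models = {name: fit_line(*zip(*runs)) for name, runs in observed.items()}


def predict(name: str, size_gb: float):
    """Predicted (runtime in seconds, cost in dollars) for one run."""
    a, b = models[name]
    runtime_s = a * size_gb + b
    return runtime_s, runtime_s / 3600 * price_per_hour[name]


def cheapest(size_gb: float) -> str:
    """Instance type with the lowest predicted cost for this input size."""
    return min(models, key=lambda name: predict(name, size_gb)[1])


print(cheapest(8), predict("compute-optimized", 8))
```

In this toy data the instance with the higher hourly price is still cheaper per run because it finishes sooner, which is the kind of trade-off a cost map is meant to expose.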
Example: A cost map for bioinformatics applications on different AWS instance types
[Figure axes: Virtual CPUs, RAM (GB)]
IndexBam performs better on compute-optimized instances. Poorly chosen experiments mislead the model.
On average, within 30% of final error after 4 experiments and within 2.3% after 6
Coding the continuum:
Elements of an open solution
Detect and respond to events
• E.g., in HPC file systems: FSMon (Arnab Paul et al.)
Invoke RESTful services, and accept user input
Manage short- and long-lived activities
Flow automation in a neuroanatomy automation
1. Image
2. Acquire
3. Pre-process
4. Preview & center
5. User: Validate & input
6. Reconstruct
7. Publish
8. Visualize
9. Science!
Locations shown: Advanced Photon Source; Lab Server 1; Lab Server 2; ALCF; Compute; Lab; UChicago
Coding the continuum:
Elements of an open solution
Cloud-hosted services support data lifecycle events
• Cloud for high-reliability, modest-latency actions
• Integrated OAuth-based security with delegation
Coding the continuum:
Elements of an open solution
dlhub.org
https://arxiv.org/abs/1811.11213
Paper @ Session 7, 1:30pm today
Coding the Continuum: Thanks for support
US Department of Energy
US National Science Foundation
US National Institutes of Health
US National Institute of
Standards and Technology
Amazon Web Services
Globus subscribers
Coding the [location-speed] continuum
Code: verb:
1) to arrange or enter in a code
2) to write code for
“Henceforth, location for itself, and speed for itself shall completely reduce to a mere shadow, and only some sort of union of the two shall preserve independence.”
labs.globus.org – dlhub.org – globus.org – parsl-project.org
foster@anl.gov
Distribute computational tasks across a heterogeneous computing fabric
“the machine disintegrates across the net into a set of special-purpose appliances”
$\frac{T}{\sqrt{\pi D^2/X^2}} + 2D = B$

Editor's Notes

  1. https://www.kickstarter.com/projects/planetary/planetary-collective-presents-continuum
  2. SDM = Space Division Multiplexing (multi-core fibers) BL: bit rate -- non-repeating distance product http://www.netquestcorp.com/newsroom/2019-networking-trends/
  3. https://www.ibiblio.org/jimmy/folkden-wp/?p=7437 https://medium.com/coursera-engineering/courseras-skills-graph-helps-learners-find-the-right-content-to-reach-their-goals-b10418a05214
  4. https://www.ibiblio.org/jimmy/folkden-wp/?p=7437
  5. Each chip is 72 Top/s
  6. PUE = Ratio of total amount of energy used by a computer facility to the energy delivered to computing equipment -- close to 1 vs. >2 for some in house
  7. https://www.datacenterdynamics.com/news/synergy-number-hyperscale-data-centers-reached-430-2018/
  8. 10 msec = 500 km RTT 10 000 = 500 000
  9. 1 msec = 100 km RTT = 200 km in one direction 5 msec = 500 km RTT 10 000 = 500 000
  10. 10 msec = 500 km RTT 10 000 = 500 000
  11. Hermann Minkowski One spatial dimension, one temporal dimension. 45 degrees = object moving at the speed of light.
  12. 5 µs / km 5000 usec = 5ms for 1000km Time is accelerated: or costs reduced (time = money) Space based on how long it takes to do things … (odd geometry/geography) – valley or highland
  13. 5 µs / km 5000 usec = 5ms for 1000km Time is accelerated: or costs reduced (time = money) Space based on how long it takes to do things … (odd geometry/geography) – valley or highland
  14. “1/100 msec apart = 2 km”
  15. “1/100 msec apart = 2 km”
  16. n/4 + N/4
  17. n/4 + N/4
  18. Logan Ward, Ben Blaiszik, et al.
  19. 1) First we need a framework to handle execution of functions in various places (funcX).
      · Speak to the various resources we need to support (from supercomputers to edge devices)
      · Need for containers for isolation and portability
      · Need to support existing resource types (and different models for execution)
      · App examples here include APS, metadata extraction, QCArchive (all from proposal I guess)
      2) Then we need a way to program it (Parsl)
      · Compose programs by assembling components
        o Components may be written in different languages, wrapped in Python, etc.
        o Easily expressed as functions
      · Orchestrate and plan execution based on dependencies
      · * at the moment Parsl supports more than just funcX and can run apps on raw resources outside of containers
        o Talk about different requirements and different executors
      3) Once we get to a point we can program it... then we want to automate and outsource
      · Event-based systems and function serving go hand in hand
      · Need to first capture and respond to events (Ripple/FSMon)
      · Then we need to define reliable flows that cross many different services/locations
      · Our path towards automation
      · App examples: bobby or suresh
      4) Finally, building on this fabric we can develop higher level services (e.g., DLHub)
      · Story around complexity of being able to share and run models
        o Models are really just functions + some data
      · Using funcX we can run models on demand wherever they might be most useful
      · (Don't want to say too much here as talk is coming later in the conference :))
  20. 45 Max 10:01 – 15 (40)
      IndexBAM: Because tail end variability was high…
      Matt: We did say in the paper that, as IndexBam is highly CPU bound and performs a lot better on the compute-optimized instances, the additional data points effectively mislead the model in increasingly opposite directions, so it cannot move in a single "more accurate" direction.
      High learning rate -> our model is gullible with new data
      For "bound" applications
      Matt summary:
      · Profiled and modelled genomics pipelines
      · Using manual profiling for instance type models
      · Transfer learning for input data model
      · Combined models to predict runtime and cost for tools given resource capabilities and input data
      · Employed experimental design strategies to focus exploratory profiling efforts
      Next steps:
      · Bayesian optimization over the search space
      · Deploy and validate outside of genomics tool pipelines
  21. Apparent speed depends on location; apparent location depends on speed