Near Exascale Computing in the Cloud:
the use of GPU bursts for Multi-Messenger
Astrophysics with IceCube Data
Frank Würthwein
OSG Executive Director
UCSD/SDSC
Jensen Huang keynote @ SC19
2
The Largest Cloud Simulation in History
50k NVIDIA GPUs in the Cloud
350 Petaflops for 2 hours
Distributed across US, Europe & Asia
Saturday morning before SC19 we bought all GPU capacity
that was for sale worldwide across AWS, Azure, and Google
A Story of 3 Cloud Bursts
• Saturday before SC19:
- Buy the entire GPU capacity worldwide that is for sale in AWS, Azure,
and Google for a couple of hours.
- Proof of principle & measurement of global GPU capacity
• February 4th 2020:
- Buy a workday’s worth of GPU capacity of only the most cost effective
GPUs for our application.
- Establish standard operations & cost
• November 4th 2020:
- Repeat without any storage in the cloud. All data input and output via
network. EGRESS via cloud connect to minimize charges.
- Establish on-prem to cloud networking and cloud connect routing
3
We will discuss this story from beginning to end.
The Science Case

IceCube
5
A cubic kilometer of ice at the South Pole is instrumented with 5160 optical sensors.
Astrophysics:
• Discovery of astrophysical neutrinos
• First evidence of neutrino point source (TXS)
• Cosmic rays with surface detector
Particle Physics:
• Atmospheric neutrino oscillation
• Neutrino cross sections at TeV scale
• New physics searches at highest energies
Earth Science:
• Glaciology
• Earth tomography
A facility with very diverse science goals.
This talk is restricted to high energy astrophysics.
High Energy Astrophysics
Science case for IceCube
6
The universe is opaque to light at the highest energies and distances.
Only gravitational waves and neutrinos can pinpoint the most violent events in the universe.
Fortunately, the highest energy neutrinos are of cosmic origin.
They are effectively “background free” as long as the energy is measured correctly.
High energy neutrinos from
outside the solar system
7
First 28 very high energy neutrinos from outside the solar system
Red curve is the photon flux
spectrum measured with the
Fermi satellite.
Black points show the
corresponding high energy
neutrino flux spectrum
measured by IceCube.
This demonstrates both the opaqueness of the universe to high energy photons, and the ability of IceCube to detect neutrinos above the maximum energy at which light can still reach us through this opaqueness.
Science 342 (2013). DOI:
10.1126/science.1242856
Understanding the Origin
8
We now know high energy events happen in the universe. What are they?
$p + \gamma \to \Delta^{+} \to p + \pi^{0} \to p + \gamma\gamma$
$p + \gamma \to \Delta^{+} \to n + \pi^{+} \to n + \mu^{+} + \nu_{\mu}$
(Delta-resonance photoproduction by cosmic-ray protons; figure credit: Aya Ishihara)
The hypothesis:
The same cosmic events produce
neutrinos and photons
We detect the electrons or muons from neutrinos that interact in the ice.
Neutrinos interact very weakly => we need a very large instrumented array of ice to maximize the chance that a cosmic neutrino interacts inside the detector.
Pointing accuracy is needed to point back to the origin of the neutrino.
Telescopes the world over then try to identify the source in the direction IceCube is pointing to for the neutrino.
Multi-messenger Astrophysics


The ν detection challenge
9
(Figure: optical properties of the ice and data/simulation agreement; credit: Aya Ishihara)
Ice properties change with depth and wavelength.
Observed pointing resolution at high energies is systematics limited.
Central value moves for different ice models.
Improved e and τ reconstruction => increased neutrino flux detection => more observations.
Photon propagation through ice runs efficiently on single precision GPUs.
Detailed simulation campaigns to improve pointing resolution by improving the ice model.
Improvement in reconstruction with better ice model near the detectors.
First evidence of an origin
10
(Figure: Event display for neutrino event IceCube-170922A, side and top views, 125 m scale, time axis 0 to 3000 nanoseconds. The time at which a DOM observed a signal is reflected in the color of the hit, with dark blues for the earliest hits and yellow for the latest.)
First location of a source of very high energy neutrinos.
A neutrino produced a high energy muon near IceCube. The muon produced light as it traversed the IceCube volume, and this light was detected by IceCube's array of phototubes.
IceCube alerted the astronomy community to the observation of a single high energy neutrino on September 22, 2017.
A blazar designated by astronomers as TXS 0506+056 was subsequently identified as the most likely source in the direction IceCube was pointing. Multiple telescopes saw light from TXS at the same time IceCube saw the neutrino.
Science 361, 147-151
(2018). DOI:10.1126/science.aat2890
IceCube’s Future Plans
11
(Figure: The IceCube-Gen2 Facility, preliminary timeline for MeV- to EeV-scale physics, showing the surface array, high energy array, radio array, PINGU, and IC86, with R&D, design & approval, construction, and IceCube Upgrade deployment phases spanning 2016 through ~2032. Slide credit: Summer Blot, TeVPA 2018.)
Near term:
add more phototubes to the deep core to increase the granularity of measurements.
Longer term:
• Extend the instrumented volume at smaller granularity.
• Extend the even smaller granularity deep core volume.
• Add a surface array.
Improve the detector for low & high energy neutrinos.
Details on the Cloud Burst(s)

The Idea
• Integrate all GPUs available for sale worldwide into a single HTCondor pool.
- Use 28 regions across 3 cloud providers for a burst of a couple of hours or so.
• IceCube submits their photon propagation workflow to this HTCondor pool.
- We handle the input, the jobs on the GPUs, and the output as a single globally distributed system.
13
Run a GPU burst relevant in scale
for future Exascale HPC systems.
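For readers unfamiliar with HTCondor, here is a minimal sketch of what submitting such a bundle of GPU jobs could look like through the htcondor Python bindings (assuming the HTCondor 9+ submit API). The executable name, resource requests, and job count are illustrative placeholders, not IceCube's actual production workflow.

```python
import htcondor  # HTCondor Python bindings (assumes the HTCondor >= 9 submit API)

# Describe one photon-propagation task. All names and resource numbers here
# are illustrative placeholders, not IceCube's production values.
job = htcondor.Submit({
    "executable": "run_photon_prop.sh",   # hypothetical wrapper around the GPU code
    "arguments": "$(Process)",            # each job works on its own chunk of input
    "request_gpus": "1",                  # one GPU per job
    "request_cpus": "1",
    "request_memory": "4GB",
    "should_transfer_files": "YES",
    "when_to_transfer_output": "ON_EXIT",
    "output": "photon_prop.$(Cluster).$(Process).out",
    "error": "photon_prop.$(Cluster).$(Process).err",
    "log": "photon_prop.$(Cluster).log",
})

# Submit a bundle of independent jobs to one of the dedicated schedds;
# the pool then matches them to GPU slots in whichever cloud region has capacity.
schedd = htcondor.Schedd()
result = schedd.submit(job, count=1000)
print("Submitted cluster", result.cluster())
```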
A global HTCondor pool
• IceCube, like all OSG user communities, relies on
HTCondor for resource orchestration
- This demo used the standard tools
• Dedicated HW setup
- Avoid disruption of OSG production system
- Optimize HTCondor setup for the spiky nature of the demo
§ multiple schedds for IceCube to submit to
§ collecting resources in each cloud region, then collecting from all
regions into global pool
14
HTCondor Distributed CI
15
(Diagram: IceCube submits through 10 schedds; cloud VMs report to per-region collectors, which feed a central collector and negotiator, forming one global resource pool.)
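As a rough sketch of the "one global resource pool" view, the top-level collector can be queried for all GPU slots and tallied per cloud region with the htcondor Python bindings. The collector hostname and the "CloudRegion" machine-ad attribute below are assumptions made for illustration; the attribute naming in the actual demo setup may have differed.

```python
from collections import Counter
import htcondor  # HTCondor Python bindings

# Hypothetical hostname of the top-level (global) collector.
collector = htcondor.Collector("global-collector.example.edu")

# Query machine (startd) ads that advertise at least one GPU.
# "GPUs" and "CloudRegion" are illustrative attribute names.
ads = collector.query(
    htcondor.AdTypes.Startd,
    constraint="GPUs >= 1",
    projection=["Machine", "GPUs", "CloudRegion"],
)

# Tally GPU slots per cloud region to see how the global pool is spread out.
per_region = Counter()
for ad in ads:
    per_region[ad.get("CloudRegion", "unknown")] += int(ad.get("GPUs", 0))

for region, ngpus in per_region.most_common():
    print(f"{region:25s} {ngpus:7d} GPUs")
```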
Using native Cloud storage
• Input data pre-staged into native Cloud storage
- Each file in one-to-few Cloud regions
§ some replication to deal with limited predictability of resources per region
- Local to Compute for large regions for maximum throughput
- Reading from “close” region for smaller ones to minimize ops
• Output staged back to region-local Cloud storage
• Deployed simple wrappers around Cloud native file
transfer tools
- IceCube jobs do not need to customize for different Clouds
- They just need to know where input data is available
(pretty standard OSG operation mode)
16
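A minimal sketch of what such a wrapper could look like, assuming a dispatch on the URL scheme to each provider's own transfer CLI (aws s3 cp, gsutil cp, azcopy copy). The URL forms and the mapping are illustrative; the actual IceCube wrappers may be structured differently.

```python
import subprocess
import sys

def cloud_copy(src: str, dst: str) -> None:
    """Copy a file to or from native cloud storage using the provider's own CLI.

    Illustrative mapping: s3:// -> AWS CLI, gs:// -> gsutil,
    *.blob.core.windows.net -> azcopy. The job only needs to know the URL.
    """
    url = src if "://" in src else dst
    if url.startswith("s3://"):
        cmd = ["aws", "s3", "cp", src, dst]
    elif url.startswith("gs://"):
        cmd = ["gsutil", "cp", src, dst]
    elif ".blob.core.windows.net" in url:
        cmd = ["azcopy", "copy", src, dst]
    else:
        raise ValueError(f"No known transfer tool for {url}")
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Example (hypothetical bucket and file names):
    #   cloud_copy.py s3://icecube-inputs/sim_00042.i3.zst ./input.i3.zst
    cloud_copy(sys.argv[1], sys.argv[2])
```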


Science with 51,000 GPUs
achieved as peak performance
17
(Plot: GPUs in use vs. time in minutes; each color is a different cloud region in the US, EU, or Asia. Total of 28 regions in use.)
Peaked at 51,500 GPUs
~380 Petaflops of fp32
8 generations of NVIDIA GPUs used.
Summary of stats at peak
A Heterogeneous Resource Pool
18
28 cloud Regions across 4 world regions
providing us with 8 GPU generations.
No one region or GPU type dominates!
Science Produced
19
The distributed High-Throughput Computing (dHTC) paradigm, implemented via HTCondor, provides global resource aggregation.
The largest cloud region provided 10.8% of the total.
The dHTC paradigm can aggregate on-prem resources anywhere, HPC at any scale, and multiple clouds.
Performance and Cost


Performance vs GPU type
21
42% of the science was done on V100 in 19% of the wall time.
Second Cloud Burst focused on
maximizing science/$$$
The 2nd burst was an 8-hour work day in the Pacific time zone on a random Tuesday in February.
Do a burst that we could repeat anytime,
with any dHTC application.
A Day of Cloud Use
23
Integrated one EFLOP32 hour, with a plateau of 170 PFLOP32s.
Total bill: ~$60k, including networking and storage.
We did a 2nd run on February 4th 2020 to focus on a cost-effective 8-hour work day.
We picked a “random” Tuesday during peak working hours (Pacific).
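A quick back-of-envelope check on these headline numbers, using only the figures quoted on this slide: sustaining the 170 PFLOP32 plateau for roughly 6 hours integrates one EFLOP32 hour, and a ~$60k bill then corresponds to about $60 per PFLOP32 hour.

```python
# Back-of-envelope check using only the numbers quoted on this slide.
integrated_pflop32_hours = 1000.0  # "one EFLOP32 hour" = 1000 PFLOP32 hours
plateau_pflop32 = 170.0            # sustained fp32 throughput at the plateau
total_bill_usd = 60_000.0          # ~$60k including networking and storage

hours_at_plateau = integrated_pflop32_hours / plateau_pflop32
cost_per_pflop32_hour = total_bill_usd / integrated_pflop32_hours

print(f"~{hours_at_plateau:.1f} hours at the plateau to integrate one EFLOP32 hour")
print(f"~${cost_per_pflop32_hour:.0f} per PFLOP32 hour")
```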
Cost to support cloud as a
“24x7” capability
• February 2020: roughly $60k per ExaFLOP32 hour
• This burst was executed by 2 people
- Igor Sfiligoi (SDSC) to support the infrastructure.
- David Schultz (UW Madison) to create and submit the
IceCube workflows.
§ A “David”-type person is also needed for on-prem science workflows.
• To make this a routine operations capability for any
open science that is dHTC capable would require
another 50% FTE “Cloud Budget Manager”.
- There is substantial effort involved in just dealing with cost &
budgets for a large community of scientists.
24

To provide an aggregate ExaFLOP32-hour-per-day dHTC production capability in the commercial cloud for the sum of many sciences today would require:
1.5 FTE of human effort
$60k of cloud costs per day
This does not include the human effort to train the community, define the workflows, and run the workflows, … i.e. it does not include what the scientists themselves still have to do.
3rd Cloud Burst
Buy enough GPUs to saturate the 100 Gbps network to UW Madison, with overflow to UCSD.
Do EGRESS entirely via Internet2 Cloud Connect for AWS, Azure, and Google.
The scale of the GPU burst peaked at 60% of the second cloud burst.
We used a smaller set of regions, but still all 3 providers.
Egress data intensive in nature
• Cloud burst ~saturated the 100 Gbps link
- To make good use of a large fraction of available Cloud GPUs
• IceCube simulations are relatively heavy in egress data
- 2 GB per job
- Job length ~= 0.5 hour
• And very spiky
- The whole file is transferred after compute completes
• Input sizes small-ish
- 0.25 GB
Peaked at 90.3 Gbps at UW Madison, plus an additional 10-20 Gbps at UCSD.
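A rough sketch of why this saturates the link, using only the per-job numbers above (an estimate, not a measured figure from the run): each running job averages under 10 Mbps of egress, so filling a 100 Gbps link takes on the order of ten thousand concurrent GPU jobs, and because each 2 GB file is pushed only after the compute finishes, the instantaneous load is far spikier than that average.

```python
# Back-of-envelope: how many concurrent GPU jobs does ~100 Gbps of egress imply?
# Uses only the per-job numbers quoted above; an estimate, not a measurement.
output_gb_per_job = 2.0    # GB of egress per job
job_length_hours = 0.5     # roughly half-hour jobs

# Average egress rate of one running job, in Gbps (2 GB over ~1800 s).
per_job_gbps = output_gb_per_job * 8 / (job_length_hours * 3600)

link_gbps = 100.0
jobs_to_saturate = link_gbps / per_job_gbps

print(f"~{per_job_gbps * 1000:.1f} Mbps average egress per running job")
print(f"~{jobs_to_saturate:,.0f} concurrent jobs to fill a {link_gbps:.0f} Gbps link")
```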
Using Internet2 Cloud Connect Service
• Egress costs notoriously high
• Buying dedicated links is
cheaper
- If provisioned on demand
• Internet2 acts as provider for
the research community
- For AWS, Azure and GCP
• No 100Gbps links available
- Had to stitch together 21 links,
at 10Gbps, 5Gbps and 2 Gbps
(Plot: each color band belongs to one network link; 130 TB transferred in 5 hours.)
https://internet2.edu/services/cloud-connect/

Struggled with spiky workload during trial run
• Attempted burst in trial run led to “oscillatory” network use.
• Noticed links to different providers behave differently
- Some capped, some flexible
- Long upload times when congested => waste of money
(Plots: trial-run network use for AWS and Azure, at up to ~50 Gbps.)
Slow & careful during big “burst”
• Ramp up for over 2 hours
• Still not perfect
- But much smoother
(Plots: the 2-hour ramp in GB/sec across the 21 provisioned network links; IO across individual links was quite chaotic.)
Started slow to randomize job end times, and thus network transfers.
And yet, the individual link utilization is still quite spiky.
Screenshot of provisioned links
Bought:
10 links @ 5Gbps
5 links @ 2Gbps
6 links @ 10Gbps
Our ability to use a link depended on the
availability of GPUs in the corresponding region.
A bit of a Tetris problem.
Very different provisioning in the 3 Clouds
• AWS the most complex
- And requires initiation by on-prem network engineer
• Many steps after initial request
- Accept connection request
- Create VPG
- Associate VPG with VPC
- Create DCG
- Create VIF
§ Relay back to on-prem the BGP key
- Establish VPC -> VPG routing
- Associate DCG -> VPG
• And don’t forget the Internet routers, if you use dedicated VPCs
• GCP the simplest
- Create Cloud Router
- Create Interconnect
§ Provide key to on-prem
• Azure not much harder
- Make sure the VPC has a Gateway subnet
- Create ExpressRoute (ER)
§ Provide key to on-prem
- Create VNG
- Create connection between ER and VNG
- But Azure comes with many more options to choose from
This was the hardest of our 3 cloud bursts because it required a lot of coordination, and had too many parts without automation.
(Tetris problem of GPU availability, job end time, link bandwidth)

Additional on-prem
networking setup needed
• Quote from Michael Hare, UW Madison
Network engineer:
In addition to network configuration [at] UW Madison (AS59), we provisioned BGP
based Layer 3 MPLS VPNs (L3VPNs) towards Internet2 via our regional aggregator,
BTAA OmniPop.
This work involved reaching out to the BTAA NOC to coordinate on VLAN numbers
and paths and to [the] Internet2 NOC to make sure the newly provisioned VLANs
were configurable inside OESS.
Due to limitations in programmability or knowledge at the time regarding duplicate
IP address towards the cloud (GCP, Azure, AWS) endpoints, we built several discrete
L3VPNs inside the Internet2 network to accomplish the desired topology.
Not something domain scientists can expect to accomplish.
Applicability beyond IceCube
• All the large instruments we know of
- LHC, LIGO, DUNE, LSST, …
• Any midscale instrument we can think of
- XENON, GlueX, Clas12, Nova, DES, Cryo-EM, …
• A large fraction of Deep Learning
- But not all of it …
• Basically, anything that has bundles of independently schedulable jobs that can be partitioned to adjust workloads to have 0.5- to few-hour runtimes on modern GPUs.
34
IceCube is ready for Exascale
• Humanity has built extraordinary instruments by pooling
human and financial resources globally.
• The computing for these large collaborations fits perfectly into the cloud, or into scheduling holes in Exascale HPC systems, due to its “ingeniously parallel” nature. => dHTC
• The dHTC computing paradigm applies to a wide range of
problems across all of open science.
- We are happy to repeat this with anybody willing to spend $50k in the
clouds.
35
Contact us at: support@opensciencegrid.org
Or me personally at: fkw@ucsd.edu
Demonstrated elastic burst at 51,500 GPUs
IceCube is ready for Exascale
Acknowledgements
• This work was partially supported by NSF grants OAC-1941481, MPS-1148698, OAC-1841530, OAC-1904444, OAC-1826967, and OPP-1600823.
36

 
CONSOLSCI8_Lesson1. presentation for NLC
CONSOLSCI8_Lesson1. presentation for NLCCONSOLSCI8_Lesson1. presentation for NLC
CONSOLSCI8_Lesson1. presentation for NLC
 
Summer program introduction in Yunnan university
Summer program introduction in Yunnan universitySummer program introduction in Yunnan university
Summer program introduction in Yunnan university
 
Properties of virus(Ultrastructure and types of virus)
Properties of virus(Ultrastructure and types of virus)Properties of virus(Ultrastructure and types of virus)
Properties of virus(Ultrastructure and types of virus)
 
MACRAMÉ-ChiPs: Patchwork Project Family & Sibling Projects (24th Meeting of t...
MACRAMÉ-ChiPs: Patchwork Project Family & Sibling Projects (24th Meeting of t...MACRAMÉ-ChiPs: Patchwork Project Family & Sibling Projects (24th Meeting of t...
MACRAMÉ-ChiPs: Patchwork Project Family & Sibling Projects (24th Meeting of t...
 
Collaborative Team Recommendation for Skilled Users: Objectives, Techniques, ...
Collaborative Team Recommendation for Skilled Users: Objectives, Techniques, ...Collaborative Team Recommendation for Skilled Users: Objectives, Techniques, ...
Collaborative Team Recommendation for Skilled Users: Objectives, Techniques, ...
 
1,1 and 1,2 Migratory insertion reactions.pptx
1,1 and 1,2 Migratory insertion reactions.pptx1,1 and 1,2 Migratory insertion reactions.pptx
1,1 and 1,2 Migratory insertion reactions.pptx
 
El Nuevo Cohete Ariane de la Agencia Espacial Europea-6_Media-Kit_english.pdf
El Nuevo Cohete Ariane de la Agencia Espacial Europea-6_Media-Kit_english.pdfEl Nuevo Cohete Ariane de la Agencia Espacial Europea-6_Media-Kit_english.pdf
El Nuevo Cohete Ariane de la Agencia Espacial Europea-6_Media-Kit_english.pdf
 
Electrostatic force class 8 ncert. .pptx
Electrostatic force class 8 ncert. .pptxElectrostatic force class 8 ncert. .pptx
Electrostatic force class 8 ncert. .pptx
 
lipids_233455668899076544553879848657.pptx
lipids_233455668899076544553879848657.pptxlipids_233455668899076544553879848657.pptx
lipids_233455668899076544553879848657.pptx
 
Electrostatic force class 8 physics .pdf
Electrostatic force class 8 physics .pdfElectrostatic force class 8 physics .pdf
Electrostatic force class 8 physics .pdf
 
ScieNCE grade 08 Lesson 1 and 2 NLC.pptx
ScieNCE grade 08 Lesson 1 and 2 NLC.pptxScieNCE grade 08 Lesson 1 and 2 NLC.pptx
ScieNCE grade 08 Lesson 1 and 2 NLC.pptx
 
Dalghren, Thorne and Stebbins System of Classification of Angiosperms
Dalghren, Thorne and Stebbins System of Classification of AngiospermsDalghren, Thorne and Stebbins System of Classification of Angiosperms
Dalghren, Thorne and Stebbins System of Classification of Angiosperms
 
GIT hormones- II_12345677809876543235780963.pptx
GIT hormones- II_12345677809876543235780963.pptxGIT hormones- II_12345677809876543235780963.pptx
GIT hormones- II_12345677809876543235780963.pptx
 

Near Exascale Computing in the Cloud

  • 1. Near Exascale Computing in the Cloud: the use of GPU bursts for Multi-Messenger Astrophysics with IceCube Data Frank Würthwein OSG Executive Director UCSD/SDSC
  • 2. Jensen Huang keynote @ SC19 2 The Largest Cloud Simulation in History 50k NVIDIA GPUs in the Cloud 350 Petaflops for 2 hours Distributed across US, Europe & Asia Saturday morning before SC19 we bought all GPU capacity that was for sale worldwide across AWS, Azure, and Google
  • 3. A Story of 3 Cloud Bursts • Saturday before SC19: - Buy the entire GPU capacity worldwide that is for sale in AWS, Azure, and Google for a couple of hours. - Proof of principle & measurement of global GPU capacity • February 4th 2020: - Buy a workday’s worth of GPU capacity of only the most cost effective GPUs for our application. - Establish standard operations & cost • November 4th 2020: - Repeat without any storage in the cloud. All data input and output via network. EGRESS via cloud connect to minimize charges. - Establish on-prem to cloud networking and cloud connect routing 3 We will discuss this story from beginning to end.
  • 5. IceCube: A cubic kilometer of ice at the South Pole is instrumented with 5,160 optical sensors. Astrophysics: • Discovery of astrophysical neutrinos • First evidence of a neutrino point source (TXS) • Cosmic rays with the surface detector. Particle Physics: • Atmospheric neutrino oscillation • Neutrino cross sections at the TeV scale • New-physics searches at the highest energies. Earth Science: • Glaciology • Earth tomography. A facility with very diverse science goals; this talk is restricted to high-energy astrophysics.
  • 6. High Energy Astrophysics Science Case for IceCube: The universe is opaque to light at the highest energies and distances. Only gravitational waves and neutrinos can pinpoint the most violent events in the universe. Fortunately, the highest-energy neutrinos are of cosmic origin, and are effectively "background free" as long as the energy is measured correctly.
  • 7. High-energy neutrinos from outside the solar system: the first 28 very-high-energy neutrinos from outside the solar system. The red curve is the photon flux spectrum measured with the Fermi satellite; the black points show the corresponding high-energy neutrino flux spectrum measured by IceCube. This demonstrates both the opaqueness of the universe to high-energy photons and the ability of IceCube to detect neutrinos above the maximum energy at which we can see light due to this opaqueness. Science 342 (2013). DOI: 10.1126/science.1242856
  • 8. Understanding the Origin: We now know high-energy events happen in the universe. What are they? The hypothesis: the same cosmic events produce neutrinos and photons, via photo-hadronic interactions such as p + γ → Δ⁺ → p + π⁰ (with π⁰ → γ + γ) and p + γ → Δ⁺ → n + π⁺ (with π⁺ → μ⁺ + ν_μ) [figure credit: Aya Ishihara]. We detect the electrons or muons from neutrinos that interact in the ice. Neutrinos interact very weakly, so a very large volume of instrumented ice is needed to maximize the chance that a cosmic neutrino interacts inside the detector, along with pointing accuracy to point back to the origin of the neutrino. Telescopes the world over then try to identify the source in the direction IceCube is pointing to for that neutrino. Multi-messenger Astrophysics.
  • 9. The ν detection challenge [figure on the optical properties of the ice, credit: Aya Ishihara]: Ice properties change with depth and wavelength. The observed pointing resolution at high energies is systematics limited, and the central value moves for different ice models. Improved e and τ reconstruction ⇒ increased neutrino flux detection ⇒ more observations. Photon propagation through ice runs efficiently on single-precision GPUs. Detailed simulation campaigns improve the pointing resolution by improving the ice model; reconstruction improves with a better model of the ice near the detectors.
  • 10. First evidence of an origin [Figure: event display for neutrino event IceCube-170922A, side and top views; the time at which a DOM observed a signal is reflected in the color of the hit, from dark blue for the earliest hits to yellow for the latest]: First location of a source of very-high-energy neutrinos. The neutrino produced a high-energy muon near IceCube; the muon produced light as it traversed the IceCube volume, and that light was detected by IceCube's array of phototubes. IceCube alerted the astronomy community to the observation of a single high-energy neutrino on September 22, 2017. A blazar designated by astronomers as TXS 0506+056 was subsequently identified as the most likely source in the direction IceCube was pointing, and multiple telescopes saw light from TXS at the same time IceCube saw the neutrino. Science 361, 147-151 (2018). DOI: 10.1126/science.aat2890
  • 11. IceCube's Future Plans [Figure: preliminary timeline of the IceCube-Gen2 facility through ~2032, covering MeV- to EeV-scale physics with the surface array, high-energy array, radio array, PINGU, and the IceCube Upgrade; from "IceCube Upgrade and Gen2", Summer Blot, TeVPA 2018]: Near term: add more phototubes to the deep core to increase the granularity of measurements. Longer term: • Extend the instrumented volume at smaller granularity. • Extend the even smaller granularity deep-core volume. • Add a surface array. Improve the detector for both low- and high-energy neutrinos.
  • 12. Details on the Cloud Burst(s)
  • 13. The Idea • Integrate all GPUs available for sale worldwide into a single HTCondor pool. - Use 28 regions across 3 cloud providers for a burst of a couple of hours or so. • IceCube submits its photon propagation workflow to this HTCondor pool. - We handle the input, the jobs on the GPUs, and the output as a single globally distributed system. Run a GPU burst relevant in scale for future Exascale HPC systems (see the sketch below for how such a job might be described).
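For illustration, a minimal sketch of how one such GPU job could be described and queued with the HTCondor Python bindings (assuming a recent bindings version). The executable name, input file name, and resource requests below are hypothetical placeholders, not IceCube's actual workflow.

import htcondor  # HTCondor Python bindings

# Describe one photon-propagation-style GPU job; all values are placeholders.
job = htcondor.Submit({
    "executable": "propagate_photons.sh",    # hypothetical wrapper around the GPU simulation
    "arguments": "input_block_000.i3.zst",   # hypothetical input file
    "request_gpus": "1",
    "request_cpus": "1",
    "request_memory": "4GB",
    "transfer_input_files": "input_block_000.i3.zst",
    "should_transfer_files": "YES",
    "when_to_transfer_output": "ON_EXIT",
    "output": "job_$(Cluster)_$(Process).out",
    "error": "job_$(Cluster)_$(Process).err",
    "log": "burst.log",
})

schedd = htcondor.Schedd()            # one of the ~10 schedds used during the burst
result = schedd.submit(job, count=1)  # in practice, one queued job per input file
print("submitted cluster", result.cluster())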
  • 14. A global HTCondor pool • IceCube, like all OSG user communities, relies on HTCondor for resource orchestration - This demo used the standard tools • Dedicated HW setup - Avoid disruption of the OSG production system - Optimize the HTCondor setup for the spiky nature of the demo: multiple schedds for IceCube to submit to; collectors gathering resources in each cloud region, then feeding all regions into the global pool
  • 15. HTCondor Distributed CI [Architecture diagram: IceCube submits to 10 schedds; per-region collectors report cloud VMs up to a central collector and negotiator, forming one global resource pool]
  • 16. Using native Cloud storage • Input data pre-staged into native Cloud storage - Each file in one to a few Cloud regions, with some replication to deal with the limited predictability of resources per region - Local to compute for large regions, for maximum throughput - Reading from a "close" region for smaller ones, to minimize operations • Output staged back to region-local Cloud storage • Deployed simple wrappers around Cloud-native file transfer tools, as sketched below - IceCube jobs do not need to be customized for different Clouds - They just need to know where input data is available (a pretty standard OSG operation mode)
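A minimal sketch of what such a transfer wrapper could look like, assuming the provider command-line tools (aws, gsutil, azcopy) are available in the job environment; the function name and URL handling are illustrative, not the actual IceCube tooling.

import subprocess

def fetch(url: str, dest: str) -> None:
    """Copy one object from cloud storage to a local path by dispatching
    on the URL scheme to the corresponding provider's own CLI."""
    if url.startswith("s3://"):
        cmd = ["aws", "s3", "cp", url, dest]      # AWS S3
    elif url.startswith("gs://"):
        cmd = ["gsutil", "cp", url, dest]         # Google Cloud Storage
    elif ".blob.core.windows.net/" in url:
        cmd = ["azcopy", "copy", url, dest]       # Azure Blob Storage
    else:
        raise ValueError("unsupported storage URL: " + url)
    subprocess.run(cmd, check=True)

# The job only needs to know the URL of its input, not which cloud it runs in, e.g.:
# fetch("gs://example-bucket/input_block_000.i3.zst", "input_block_000.i3.zst")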
  • 17. Science with 51,000 GPUs achieved as peak performance [Plot: GPUs in use vs. time in minutes; each color is a different cloud region in the US, EU, or Asia]. Total of 28 regions in use. Peaked at 51,500 GPUs, ~380 petaflops of fp32. 8 generations of NVIDIA GPUs used. Summary of stats at peak.
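As a rough cross-check of those peak numbers (a back-of-envelope, not stated on the slide): 380 PFLOP32 spread over 51,500 GPUs averages to about 7.4 TFLOP32 per GPU, which is plausible for a mix of eight NVIDIA generations whose individual fp32 peaks range from a few TFLOPS for the oldest cards to roughly 15 TFLOPS for a V100.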
  • 18. A Heterogeneous Resource Pool: 28 cloud regions across 4 world regions, providing us with 8 GPU generations. No one region or GPU type dominates!
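One way such a per-region, per-GPU-type breakdown can be pulled from a running pool is to query the central collector for machine ads, for example with the HTCondor Python bindings. The attribute names below ("CloudRegion" and "GPUs_DeviceName") are assumptions for illustration; in practice they depend on how the cloud VM ads were configured.

from collections import Counter
import htcondor

collector = htcondor.Collector()  # top-level collector of the global pool
ads = collector.query(
    htcondor.AdTypes.Startd,
    constraint="TotalGPUs > 0",                     # assumes GPU slots advertise TotalGPUs
    projection=["CloudRegion", "GPUs_DeviceName"],  # assumed attribute names
)

counts = Counter(
    (ad.get("CloudRegion", "unknown"), ad.get("GPUs_DeviceName", "unknown"))
    for ad in ads
)
for (region, gpu), n in counts.most_common():
    print(region, gpu, n)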
  • 19. Science Produced: The distributed High-Throughput Computing (dHTC) paradigm, implemented via HTCondor, provides global resource aggregation; the largest cloud region provided 10.8% of the total. The dHTC paradigm can aggregate on-prem resources anywhere, HPC at any scale, and multiple clouds.
  • 21. Performance vs. GPU type: 42% of the science was done on V100s in 19% of the wall time.
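Put differently (a back-of-envelope, not stated on the slide): per unit of wall time the V100s delivered roughly 0.42 / 0.19 ≈ 2.2 times the science of the pool-wide average, the kind of per-GPU-type comparison that feeds into the cost-effectiveness choices of the second burst.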
  • 22. Second Cloud Burst focused on maximizing science/$$$: The 2nd burst was an 8-hour work day in the Pacific time zone on a random Tuesday in February. The goal: a burst that we could repeat anytime, with any dHTC application.
  • 23. A Day of Cloud Use: Integrated one EFLOP32-hour, with a 170 PFLOP32 plateau. Total bill: ~$60k, including networking and storage. We did this 2nd run on February 4th, 2020 to focus on a cost-effective 8-hour work day, picking a "random" Tuesday during peak working hours (Pacific).
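As a sanity check on those two numbers (a back-of-envelope, not from the slide): a 170 PFLOP32 plateau sustained for about 6 hours integrates to 170 × 6 ≈ 1,000 PFLOP32-hours, i.e. roughly the quoted one EFLOP32-hour, consistent with an 8-hour work day that includes ramp-up and ramp-down.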
  • 24. Cost to support cloud as a "24x7" capability • February 2020: roughly $60k per ExaFLOP32-hour • This burst was executed by 2 people - Igor Sfiligoi (SDSC) to support the infrastructure. - David Schultz (UW Madison) to create and submit the IceCube workflows; a "David"-type person is needed for on-prem science workflows as well. • To make this a routine operations capability for any open science that is dHTC capable would require another 50% FTE "Cloud Budget Manager". - There is substantial effort involved in just dealing with cost & budgets for a large community of scientists.
  • 25. To provide an aggregate ExaFLOP32-hour-per-day dHTC production capability in the commercial cloud for the sum of many sciences today would require: 1.5 FTE of human effort and $60k of cloud costs per day. This does not include the human effort to train the community, define the workflows, run the workflows, … i.e. it does not include what the scientists themselves still have to do.
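Extrapolated naively to a full year (a back-of-envelope, not a figure from the talk), $60k per day corresponds to roughly 365 × $60k ≈ $22M of annual cloud spend, on top of the 1.5 FTE of operations effort.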
  • 26. 3rd Cloud Burst: Buy enough GPUs to saturate the 100 Gbps network link to UW Madison, with overflow to UCSD. Do egress entirely via Internet2 Cloud Connect for AWS, Azure, and Google. The scale of the GPU burst peaked at 60% of the second cloud burst, using a smaller set of regions but still all 3 providers.
  • 27. Egress data intensive in nature • The cloud burst roughly saturated the 100 Gbps link - needed to make good use of a large fraction of the available Cloud GPUs • IceCube simulations are relatively heavy in egress data - 2 GB per job - job length ~0.5 hour • And very spiky - the whole file is transferred only after the compute completes • Input sizes are small-ish - 0.25 GB. Peaked at 90.3 Gbps at UW Madison plus an additional 10-20 Gbps at UCSD.
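A back-of-envelope estimate (not stated on the slide) shows why this burst had to be smaller than the second one: 2 GB of output every ~0.5 hour averages to roughly 9 Mbit/s of egress per running GPU, so a ~100 Gbps link can sustain on the order of 10,000 concurrently running jobs on average, and the spiky end-of-job transfers require additional headroom on top of that.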
  • 28. Using the Internet2 Cloud Connect Service • Egress costs are notoriously high • Buying dedicated links is cheaper - if provisioned on demand • Internet2 acts as the provider for the research community - for AWS, Azure and GCP • No 100 Gbps links were available - had to stitch together 21 links, at 10 Gbps, 5 Gbps and 2 Gbps [Plot: link utilization over time; each color band belongs to one network link; 130 TB transferred in 5 hours]. https://internet2.edu/services/cloud-connect/
  • 29. Struggled with the spiky workload during the trial run • The attempted burst in the trial run led to "oscillatory" network use. • Noticed that links to different providers behave differently - some capped, some flexible - long upload times when congested => a waste of money [Plots: network use for AWS and Azure, on a ~50 Gbps scale].
  • 30. Slow & careful during the big "burst" • Ramped up for over 2 hours • Still not perfect - but much smoother [Plots: aggregate throughput in GB/sec showing the 2-hour ramp, and I/O across the 21 provisioned network links, which remains quite chaotic]. We started slowly to randomize job end times, and thus network transfers (one way to implement such a staggered ramp is sketched below); and yet the individual link utilization is still quite spiky.
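A minimal sketch of such a staggered ramp, under the assumption that it was implemented simply by spacing out submissions (the slides do not say exactly how it was done): submitting the workload in small batches spread over ~2 hours decorrelates job end times and therefore the end-of-job output transfers.

import random
import time

RAMP_SECONDS = 2 * 3600   # spread submissions over ~2 hours
N_BATCHES = 48            # hypothetical number of batches
JOBS_PER_BATCH = 500      # hypothetical batch size

def submit_batch(n_jobs: int) -> None:
    # Placeholder: in practice this would call condor_submit or the Python bindings.
    print("submitting", n_jobs, "jobs")

for _ in range(N_BATCHES):
    submit_batch(JOBS_PER_BATCH)
    # Jitter the spacing slightly so batch boundaries do not line up either.
    time.sleep(RAMP_SECONDS / N_BATCHES * random.uniform(0.8, 1.2))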
  • 31. Screenshot of provisioned links. Bought: 10 links @ 5 Gbps, 5 links @ 2 Gbps, 6 links @ 10 Gbps. Our ability to use a link depended on the availability of GPUs in the corresponding region; a bit of a Tetris problem.
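For context (a back-of-envelope from the numbers above, not stated on the slides): those 21 links add up to 10 × 5 + 5 × 2 + 6 × 10 = 120 Gbps of provisioned capacity, while 130 TB moved in 5 hours corresponds to an average of about 58 Gbps, roughly half of the nominal aggregate; the Tetris problem of matching GPUs to regions and the spiky per-link utilization account for the difference.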
  • 32. Very different provisioning in the 3 Clouds • AWS is the most complex - and requires initiation by an on-prem network engineer • Many steps after the initial request - Accept connection request - Create VPG - Associate VPG with VPC - Create DCG - Create VIF (relay the BGP key back to on-prem) - Establish VPC -> VPG routing - Associate DCG -> VPG • And don't forget the Internet routers, if you use dedicated VPCs • GCP is the simplest - Create Cloud Router - Create Interconnect (provide key to on-prem) • Azure is not much harder - Make sure the VPC has a Gateway subnet - Create ExpressRoute (ER) (provide key to on-prem) - Create VNG - Create connection between ER and VNG - but Azure comes with many more options to choose from. This was the hardest of our 3 cloud bursts because it required a lot of coordination and had too many parts without automation. (Tetris problem of GPU availability, job end time, link bandwidth)
  • 33. Additional on-prem networking setup needed • Quote from Michael Hare, UW Madison Network engineer: In addition to network configuration [at] UW Madison (AS59), we provisioned BGP based Layer 3 MPLS VPNs (L3VPNs) towards Internet2 via our regional aggregator, BTAA OmniPop. This work involved reaching out to the BTAA NOC to coordinate on VLAN numbers and paths and to [the] Internet2 NOC to make sure the newly provisioned VLANs were configurable inside OESS. Due to limitations in programmability or knowledge at the time regarding duplicate IP address towards the cloud (GCP, Azure, AWS) endpoints, we built several discrete L3VPNs inside the Internet2 network to accomplish the desired topology. Not something domain scientists can expect to accomplish.
  • 34. Applicability beyond IceCube • All the large instruments we know of - LHC, LIGO, DUNE, LSST, … • Any midscale instrument we can think of - XENON, GlueX, CLAS12, NOvA, DES, Cryo-EM, … • A large fraction of Deep Learning - but not all of it … • Basically, anything that has bundles of independently schedulable jobs that can be partitioned to adjust workloads to have 0.5- to few-hour runtimes on modern GPUs.
  • 35. IceCube is ready for Exascale • Humanity has built extraordinary instruments by pooling human and financial resources globally. • The computing for these large collaborations fits perfectly into the cloud, or into scheduling holes in Exascale HPC systems, due to its "ingeniously parallel" nature => dHTC. • The dHTC computing paradigm applies to a wide range of problems across all of open science. - We are happy to repeat this with anybody willing to spend $50k in the clouds. Contact us at: support@opensciencegrid.org or me personally at: fkw@ucsd.edu. Demonstrated elastic burst at 51,500 GPUs.
  • 36. Acknowledgements • This work was partially supported by the NSF grants OAC-1941481, MPS-1148698, OAC-1841530, OAC-1904444, OAC-1826967, and OPP-1600823.