- The document describes running a GPU burst for IceCube astrophysics simulations across roughly 50,000 NVIDIA GPUs spanning multiple cloud platforms worldwide, reaching about 350 petaflops for 2 hours.
- IceCube detects high-energy neutrinos to study violent astrophysical events by observing the interactions of neutrinos within a cubic kilometer of Antarctic ice instrumented with sensors.
- The GPU burst simulation campaign helped improve IceCube's ability to reconstruct neutrino direction and energy and identify astrophysical sources through multi-messenger astrophysics.
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all available GPUs in the Cloud
1. Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all available GPUs in the Cloud
Frank Würthwein
OSG Executive Director
UCSD/SDSC
2. Jensen Huang keynote @ SC19
The Largest Cloud Simulation in History
50k NVIDIA GPUs in the Cloud
350 Petaflops for 2 hours
Distributed across US, Europe & Asia
On the Saturday morning before SC19, we bought all the GPU capacity that was for sale in Amazon Web Services, Microsoft Azure, and Google Cloud Platform worldwide.
4. IceCube
A cubic kilometer of ice at the South Pole is instrumented with 5,160 optical sensors.
Astrophysics:
• Discovery of astrophysical neutrinos
• First evidence of a neutrino point source (TXS)
• Cosmic rays with the surface detector
Particle Physics:
• Atmospheric neutrino oscillation
• Neutrino cross sections at TeV scale
• New physics searches at the highest energies
Earth Science:
• Glaciology
• Earth tomography
A facility with very diverse science goals; this talk is restricted to high-energy astrophysics.
5. High Energy Astrophysics Science case for IceCube
The universe is opaque to light at the highest energies and distances.
Only gravitational waves and neutrinos can pinpoint the most violent events in the universe.
Fortunately, the highest-energy neutrinos are of cosmic origin. They are effectively "background free" as long as the energy is measured correctly.
6. High energy neutrinos from outside the solar system
First 28 very high energy neutrinos from outside the solar system.
The red curve is the photon flux spectrum measured with the Fermi satellite.
The black points show the corresponding high energy neutrino flux spectrum measured by IceCube.
This demonstrates both the opaqueness of the universe to high energy photons, and the ability of IceCube to detect neutrinos above the maximum energy at which we can see light due to this opaqueness.
Science 342 (2013). DOI: 10.1126/science.1242856
7. Understanding the Origin
We now know high energy events happen in the universe. What are they?
p + γ → Δ⁺ → p + π⁰ → p + γγ
p + γ → Δ⁺ → n + π⁺ → n + μ⁺ + νμ
(Aya Ishihara)
The hypothesis: the same cosmic events produce neutrinos and photons.
We detect the electrons or muons from neutrinos that interact in the ice.
Neutrinos interact very weakly => we need a very large volume of instrumented ice to maximize the chance that a cosmic neutrino interacts inside the detector.
We need pointing accuracy to point back to the origin of the neutrino.
Telescopes the world over then try to identify the source in the direction IceCube is pointing to for the neutrino. Multi-messenger Astrophysics.
8. The ν detection challenge
(Embedded excerpt from Aya Ishihara's slides on optical properties; text truncated in extraction.)
Ice properties change with depth and wavelength.
The observed pointing resolution at high energies is systematics limited.
The central value moves for different ice models.
Improved e and τ reconstruction => increased neutrino flux detection => more observations.
Photon propagation through ice runs efficiently on single precision GPUs.
Detailed simulation campaigns are run to improve the pointing resolution by improving the ice model.
Improvement in reconstruction with a better ice model near the detectors.
9. First evidence of an origin
First location of a source of very high energy neutrinos.
The neutrino produced a high energy muon near IceCube. The muon produced light as it traversed the IceCube volume. The light was detected by IceCube's array of phototubes.
IceCube alerted the astronomy community of the observation of a single high energy neutrino on September 22, 2017.
A blazar designated by astronomers as TXS 0506+056 was subsequently identified as the most likely source in the direction IceCube was pointing. Multiple telescopes saw light from TXS at the same time IceCube saw the neutrino.
Science 361, 147-151 (2018). DOI: 10.1126/science.aat2890
10. IceCube's Future Plans
(Figure: preliminary timeline of the IceCube-Gen2 Facility, from "IceCube Upgrade and Gen2", Summer Blot, TeVPA 2018; MeV- to EeV-scale physics covered by IC86, the IceCube Upgrade, PINGU, a surface air-shower array, a high energy array, and a radio array, with R&D, design & approval, construction, and deployment spanning 2016 to ~2032.)
Near term: add more phototubes to the deep core to increase the granularity of measurements.
Longer term:
• Extend the instrumented volume at smaller granularity.
• Extend the even smaller granularity deep core volume.
• Add a surface array.
Improve the detector for low & high energy neutrinos.
12. The Idea
• Integrate all GPUs available for sale worldwide into a single HTCondor pool.
Use 28 regions across AWS, Azure, and Google Cloud for a burst of a couple of hours or so.
• IceCube submits their photon propagation workflow to this HTCondor pool (a submit sketch follows below).
We handle the input, the jobs on the GPUs, and the output as a single globally distributed system.
Run a GPU burst relevant in scale for future Exascale HPC systems.
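To make the submission step concrete, here is a minimal sketch of handing a batch of photon propagation jobs to one of the dedicated schedds via a plain submit description and condor_submit. It is an illustration only, not IceCube's actual workflow: the executable name, input-file naming convention, resource requests, and batch size are assumptions.

```python
#!/usr/bin/env python3
"""Minimal sketch: hand a batch of GPU photon-propagation jobs to an HTCondor
schedd via condor_submit. Executable, file names, resource requests, and the
batch size are illustrative assumptions, not IceCube's actual setup."""
import subprocess

N_JOBS = 1000  # hypothetical batch size; the real campaign ran ~175,000 GPU jobs overall

submit_description = f"""
universe              = vanilla
executable            = run_ppc.sh            # hypothetical wrapper around the photon propagator
arguments             = input_$(Process).i3.zst
request_gpus          = 1
request_cpus          = 1
request_memory        = 4GB
should_transfer_files = YES
transfer_input_files  = run_ppc.sh
output                = logs/job_$(Process).out
error                 = logs/job_$(Process).err
log                   = logs/batch.log
queue {N_JOBS}
"""

with open("ppc_burst.sub", "w") as f:
    f.write(submit_description)

# Submit to one of the dedicated schedds set up for the burst.
subprocess.run(["condor_submit", "ppc_burst.sub"], check=True)
```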
13. A global HTCondor pool
• IceCube, like all OSG user communities, relies on HTCondor for resource orchestration.
This demo used the standard tools.
• Dedicated HW setup
Avoid disruption of the OSG production system.
Optimize the HTCondor setup for the spiky nature of the demo:
multiple schedds for IceCube to submit to;
collecting resources in each cloud region, then collecting from all regions into the global pool (a monitoring sketch follows below).
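As a small illustration of the "single global pool" idea, the following sketch queries the top-level collector for the slots contributed by each cloud region. It assumes the htcondor Python bindings and a hypothetical custom machine attribute (CloudRegion) injected when the cloud resources join the pool; the collector hostname is a placeholder.

```python
#!/usr/bin/env python3
"""Minimal sketch: ask the top-level collector how many slots each cloud region
is contributing to the global pool. Assumes a hypothetical custom machine-ad
attribute 'CloudRegion'; the collector hostname is a placeholder."""
from collections import Counter

import htcondor

collector = htcondor.Collector("burst-collector.example.org:9618")  # placeholder host

ads = collector.query(
    htcondor.AdTypes.Startd,
    projection=["Machine", "CloudRegion"],
)

slots_per_region = Counter(ad.get("CloudRegion", "unknown") for ad in ads)
for region, slots in slots_per_region.most_common():
    print(f"{region:25s} {slots:7d} slots")
```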
15. Using native Cloud storage
• Input data pre-staged into native Cloud storage.
Each file in one-to-few Cloud regions, with some replication to deal with the limited predictability of resources per region.
Local to compute for large regions, for maximum throughput.
Reading from a "close" region for smaller ones, to minimize ops.
• Output staged back to region-local Cloud storage.
• Deployed simple wrappers around Cloud-native file transfer tools (a wrapper sketch follows below).
IceCube jobs do not need to be customized for different Clouds; they just need to know where the input data is available.
(pretty standard OSG operation mode)
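Such a wrapper can be as simple as dispatching on the URL scheme to the corresponding vendor-native CLI, so the job itself stays cloud-neutral. The sketch below is illustrative only, not the actual IceCube wrapper; bucket names and tool choices are assumptions.

```python
#!/usr/bin/env python3
"""Minimal sketch of a cloud-agnostic fetch wrapper: pick the vendor-native
transfer tool from the URL scheme. Not the actual IceCube wrapper; the URLs
and tool choices are illustrative."""
import subprocess
import sys

def fetch(url: str, dest: str) -> None:
    if url.startswith("s3://"):                                             # AWS
        cmd = ["aws", "s3", "cp", url, dest]
    elif url.startswith("gs://"):                                           # Google Cloud
        cmd = ["gsutil", "cp", url, dest]
    elif url.startswith("https://") and ".blob.core.windows.net/" in url:   # Azure
        cmd = ["azcopy", "copy", url, dest]
    else:
        raise ValueError(f"don't know how to fetch {url}")
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # e.g. fetch_input.py s3://icecube-burst-inputs/input_00042.i3.zst ./input.i3.zst
    fetch(sys.argv[1], sys.argv[2])
```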
16. The Testing Ahead of Time
~250,000 single-threaded jobs run across 28 cloud regions during 80 minutes.
Peak at 90,000 jobs running; up to 60k jobs started in ~10 min.
Regions across US, EU, and Asia were used in this test.
This demonstrated the burst capability of our infrastructure on CPUs.
We want the scale of the GPU burst to be limited only by the # of GPUs available for sale.
17. Science with 51,000 GPUs achieved as peak performance
(Plot of GPUs in use vs. time in minutes; each color is a different cloud region in the US, EU, or Asia. A total of 28 regions in use.)
Summary of stats at peak:
Peaked at 51,500 GPUs
~380 Petaflops of fp32
8 generations of NVIDIA GPUs used.
18. A Heterogeneous Resource Pool
28 cloud regions across 4 world regions, providing us with 8 GPU generations.
No one region or GPU type dominates!
19. Science Produced
The distributed High-Throughput Computing (dHTC) paradigm, implemented via HTCondor, provides global resource aggregation.
The largest cloud region provided 10.8% of the total.
The dHTC paradigm can aggregate on-prem anywhere, HPC at any scale, and multiple clouds.
24. IceCube Input Segmentable
IceCube prepared two types of input files that differed by x10 in the number of input events per file.
Small files were processed by K80 and K520 GPUs, large files by all other GPU types (see the sketch below).
A total of 10.2 billion events were processed across ~175,000 GPU jobs.
Each job fetched a file from cloud storage to local storage, processed that file, and wrote the output to cloud storage. For ¼ of the regions, cloud storage was not local to the region => we could probably have avoided data replication across regions, given the excellent networking between regions for each provider.
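In the campaign the small/large split was handled when jobs were matched to resources; the sketch below shows the equivalent decision made at job runtime from the detected GPU model. It is illustrative only, and the file-class naming is an assumption.

```python
#!/usr/bin/env python3
"""Minimal sketch: pick the input-file class (small vs. large) based on the GPU
the job landed on. Illustrative only; the real split was done via matchmaking
and the 'small'/'large' naming is an assumption."""
import subprocess

# Older, slower GPU models get the files with 10x fewer events.
SLOW_GPUS = ("K80", "K520")

def detect_gpu_name() -> str:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip().splitlines()[0]

def input_class(gpu_name: str) -> str:
    return "small" if any(model in gpu_name for model in SLOW_GPUS) else "large"

if __name__ == "__main__":
    gpu = detect_gpu_name()
    print(f"GPU: {gpu} -> fetch '{input_class(gpu)}' input file")
```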
25. Annual IceCube GPU use via OSG
Over the last 12 months, usage peaked at ~3000 GPUs for a day.
The routine operations of IceCube are as globally distributed as this cloud burst.
We produced ~3% of the annual photon propagation simulations in this ~2h cloud burst.
26. Applicability beyond IceCube
• All the large instruments we know of: LHC, LIGO, DUNE, LSST, …
• Any midscale instrument we can think of: XENON, GlueX, CLAS12, NOvA, DES, Cryo-EM, …
• A large fraction of Deep Learning (but not all of it …)
• Basically, anything that has bundles of independently schedulable jobs that can be partitioned to adjust workloads to 0.5- to few-hour runtimes on modern GPUs.
27. Cost to support cloud as a "24x7" capability
• Today, roughly $15k per 300-PFLOP32 hour (a back-of-envelope check follows below).
• This burst was executed by 2 people:
Igor Sfiligoi (SDSC) to support the infrastructure.
David Schulz (UW Madison) to create and submit the IceCube workflows.
David is also needed for on-prem science workflows.
• To make this a routine operations capability for any open science that is dHTC capable would require another 50% FTE "Cloud Budget Manager".
There is substantial effort involved in just dealing with cost & budgets for a large community of scientists.
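A back-of-envelope check of that rate against the burst itself, using the peak numbers quoted earlier in this deck. It assumes the burst ran flat at its fp32 peak, so the actual bill (ramp-up, ramp-down, spot pricing) will differ.

```python
# Back-of-envelope cost estimate from the numbers quoted in this deck.
# Illustrative only: assumes the burst ran flat at its peak rate.
rate_usd = 15_000          # $ per 300-PFLOP32 hour (quoted above)
rate_pflop_hours = 300

peak_pflops = 380          # fp32 peak reached during the burst
duration_hours = 2

pflop_hours = peak_pflops * duration_hours            # 760 PFLOP32-hours
cost = pflop_hours / rate_pflop_hours * rate_usd      # about $38,000

print(f"{pflop_hours} PFLOP32-hours -> about ${cost:,.0f}")
```

This lands in the same ballpark as the "$50k in the clouds" figure on the next slide.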
28. IceCube is ready for Exascale
• Humanity has built extraordinary instruments by pooling human and financial resources globally.
• The computing for these large collaborations fits perfectly in the cloud, or in scheduling holes of Exascale HPC systems, due to its "ingeniously parallel" nature. => dHTC
• The dHTC computing paradigm applies to a wide range of problems across all of open science.
We are happy to repeat this with anybody willing to spend $50k in the clouds.
Contact us at: help@opensciencegrid.org
Or me personally at: fkw@ucsd.edu
Demonstrated elastic burst at 51,500 GPUs.
IceCube is ready for Exascale.
29. Acknowledgements
• This work was partially supported by NSF grants OAC-1941481, MPS-1148698, OAC-1841530, and OAC-1826967.