This is the keynote talk fkw gave at CloudNet 2020. It covers all three cloud bursts we did. As of early 2021, slides 26ff are still the most detailed documentation of the 3rd cloud burst. This material will be covered in a future conference paper.
1. Near Exascale Computing in the Cloud: the use of GPU bursts for Multi-Messenger Astrophysics with IceCube Data
Frank Würthwein
OSG Executive Director
UCSD/SDSC
2. Jensen Huang keynote @ SC19
The Largest Cloud Simulation in History
50k NVIDIA GPUs in the Cloud
350 Petaflops for 2 hours
Distributed across US, Europe & Asia
Saturday morning before SC19 we bought all GPU capacity that was for sale worldwide across AWS, Azure, and Google.
3. A Story of 3 Cloud Bursts
• Saturday before SC19:
- Buy the entire GPU capacity worldwide that is for sale in AWS, Azure, and Google for a couple of hours.
- Proof of principle & measurement of global GPU capacity.
• February 4th 2020:
- Buy a workday's worth of GPU capacity of only the most cost-effective GPUs for our application.
- Establish standard operations & cost.
• November 4th 2020:
- Repeat without any storage in the cloud. All data input and output via network. EGRESS via cloud connect to minimize charges.
- Establish on-prem to cloud networking and cloud connect routing.
We will discuss this story from beginning to end.
5. IceCube
A cubic kilometer of ice at the south pole is instrumented with 5160 optical sensors.
A facility with very diverse science goals:
Astrophysics:
• Discovery of astrophysical neutrinos
• First evidence of a neutrino point source (TXS)
• Cosmic rays with surface detector
Particle Physics:
• Atmospheric neutrino oscillation
• Neutrino cross sections at TeV scale
• New physics searches at highest energies
Earth Science:
• Glaciology
• Earth tomography
We restrict this talk to high energy astrophysics.
6. High Energy Astrophysics: Science Case for IceCube
The universe is opaque to light at the highest energies and distances. Only gravitational waves and neutrinos can pinpoint the most violent events in the universe. Fortunately, the highest energy neutrinos are of cosmic origin, and are effectively "background free" as long as the energy is measured correctly.
7. High energy neutrinos from outside the solar system
First 28 very high energy neutrinos from outside the solar system.
The red curve is the photon flux spectrum measured with the Fermi satellite. The black points show the corresponding high energy neutrino flux spectrum measured by IceCube.
This demonstrates both the opaqueness of the universe to high energy photons, and the ability of IceCube to detect neutrinos above the maximum energy at which we can see light due to this opaqueness.
Science 342 (2013). DOI: 10.1126/science.1242856
8. Understanding the Origin
We now know high energy events happen in the universe. What are they?
The hypothesis: the same cosmic events produce neutrinos and photons (diagram credit: Aya Ishihara):
p + γ → Δ⁺ → p + π⁰, with π⁰ → γ + γ
p + γ → Δ⁺ → n + π⁺, with π⁺ → μ⁺ + ν_μ
We detect the electrons or muons from neutrinos that interact in the ice. Neutrinos interact very weakly, so we need a very large array of instrumented ice to maximize the chance that a cosmic neutrino interacts inside the detector. We need pointing accuracy to point back to the origin of the neutrino. Telescopes the world over then try to identify the source in the direction IceCube is pointing to for the neutrino.
Multi-messenger Astrophysics
9. The ν detection challenge
(optical properties figure, credit: Aya Ishihara)
Ice properties change with depth and wavelength. The observed pointing resolution at high energies is systematics limited: the central value moves for different ice models.
Improved e and τ reconstruction => increased neutrino flux detection => more observations.
Photon propagation through ice runs efficiently on single-precision GPUs. Detailed simulation campaigns improve the pointing resolution by improving the ice model, yielding improved reconstruction with a better ice model near the detectors. (A toy sketch of this workload follows below.)
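To make the shape of this workload concrete, here is a toy sketch (not IceCube's production code) of stepping many independent photons through absorbing ice in float32: each photon evolves independently with simple arithmetic, which is exactly what maps well onto single-precision GPUs. The step and absorption lengths are illustrative numbers only, and scattering is omitted.

```python
# Toy sketch of GPU-friendly photon propagation: millions of independent
# photons, simple per-step float32 math. Numbers are illustrative only;
# real ice properties vary with depth and wavelength, and scattering of
# photon directions is omitted here for brevity.
import numpy as np

rng = np.random.default_rng(42)
n_photons = 1_000_000
pos = np.zeros((n_photons, 3), dtype=np.float32)              # meters
dirs = rng.normal(size=(n_photons, 3)).astype(np.float32)
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)           # unit directions
alive = np.ones(n_photons, dtype=bool)

absorption_length = np.float32(100.0)  # illustrative, in meters

for _ in range(50):
    step = rng.exponential(scale=10.0, size=n_photons).astype(np.float32)
    pos[alive] += dirs[alive] * step[alive, None]
    # absorb with probability 1 - exp(-step / absorption_length)
    p_absorb = 1.0 - np.exp(-step / absorption_length)
    alive &= rng.random(n_photons) >= p_absorb

print(alive.sum(), "photons survive")
```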
10. First evidence of an origin
(Figure 1: event display for neutrino event IceCube-170922A, side and top views, 125m scale; the time at which a DOM observed a signal is reflected in the color of the hit, from dark blue for the earliest hits to yellow for the latest, spanning 0 to 3000 nanoseconds.)
First location of a source of very high energy neutrinos. The neutrino produced a high energy muon near IceCube. The muon produced light as it traversed the IceCube volume, and the light was detected by the array of phototubes of IceCube.
IceCube alerted the astronomy community of the observation of a single high energy neutrino on September 22, 2017. A blazar designated by astronomers as TXS 0506+056 was subsequently identified as the most likely source in the direction IceCube was pointing. Multiple telescopes saw light from TXS at the same time IceCube saw the neutrino.
Science 361, 147-151 (2018). DOI: 10.1126/science.aat2890
11. IceCube's Future Plans
(preliminary timeline figure from "IceCube Upgrade and Gen2", Summer Blot, TeVPA 2018: MeV- to EeV-scale physics via a surface array, high energy array, radio array, PINGU, and IC86; R&D, design & approval, construction, and IceCube Upgrade deployment spanning 2016 through ~2032)
Near term: add more phototubes to deep core to increase granularity of measurements.
Longer term:
• Extend instrumented volume at smaller granularity.
• Extend even smaller granularity deep core volume.
• Add surface array.
Improve detector for low & high energy neutrinos.
13. The Idea
• Integrate all GPUs available for sale worldwide into a single HTCondor pool.
- Use 28 regions across 3 cloud providers for a burst of a couple hours, or so.
• IceCube submits their photon propagation workflow to this HTCondor pool.
- We handle the input, the jobs on the GPUs, and the output as a single globally distributed system.
Run a GPU burst relevant in scale for future Exascale HPC systems.
14. A global HTCondor pool
• IceCube, like all OSG user communities, relies on HTCondor for resource orchestration.
- This demo used the standard tools.
• Dedicated HW setup:
- Avoid disruption of the OSG production system.
- Optimize the HTCondor setup for the spiky nature of the demo:
§ multiple schedds for IceCube to submit to
§ collecting resources in each cloud region, then collecting from all regions into the global pool
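The deck does not show the submission itself; as a concrete illustration, here is a minimal sketch of queuing GPU jobs through the HTCondor Python bindings. The executable, file names, and counts are illustrative assumptions, not IceCube's actual workflow.

```python
# Minimal sketch of submitting independent GPU jobs to an HTCondor schedd.
# All names (run_photon_prop.sh, file patterns, counts) are illustrative.
import htcondor

submit = htcondor.Submit({
    "executable": "run_photon_prop.sh",    # hypothetical wrapper script
    "arguments": "input_$(Process).i3",    # one input file per job
    "request_gpus": "1",                   # each job needs a single GPU
    "request_memory": "4GB",
    "output": "job_$(Process).out",
    "error": "job_$(Process).err",
    "log": "burst.log",
})

schedd = htcondor.Schedd()                  # the local schedd
result = schedd.submit(submit, count=1000)  # queue 1000 independent jobs
print("submitted cluster", result.cluster())
```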
16. Using native Cloud storage
• Input data pre-staged into native Cloud storage.
- Each file in one-to-few Cloud regions.
§ some replication to deal with limited predictability of resources per region
- Local to compute for large regions for maximum throughput.
- Reading from a "close" region for smaller ones to minimize ops.
• Output staged back to region-local Cloud storage.
• Deployed simple wrappers around Cloud-native file transfer tools (a sketch follows below).
- IceCube jobs do not need to customize for different Clouds.
- They just need to know where input data is available (pretty standard OSG operation mode).
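A minimal sketch of such a wrapper, dispatching to each provider's native transfer CLI based on the URL scheme: the scheme-to-tool dispatch is the essential pattern, while the function name and exact CLI choices are assumptions, not IceCube's actual wrapper.

```python
# Sketch of a cloud-agnostic transfer wrapper: pick the provider's native
# CLI from the URL scheme so jobs never hard-code a particular cloud.
import subprocess
from urllib.parse import urlparse

def fetch(url: str, dest: str) -> None:
    scheme = urlparse(url).scheme
    if scheme == "s3":                                           # AWS S3
        cmd = ["aws", "s3", "cp", url, dest]
    elif scheme == "gs":                                         # Google Cloud Storage
        cmd = ["gsutil", "cp", url, dest]
    elif scheme == "https" and ".blob.core.windows.net" in url:  # Azure Blob
        cmd = ["azcopy", "copy", url, dest]
    else:
        raise ValueError(f"unsupported storage URL: {url}")
    subprocess.run(cmd, check=True)

# Example: fetch("s3://my-bucket/input_42.i3", "input_42.i3")
```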
17. Science with 51,000 GPUs achieved as peak performance
(plot: GPUs in the pool vs. time in minutes; each color is a different cloud region in the US, EU, or Asia)
Summary of stats at peak: a total of 28 regions in use; peaked at 51,500 GPUs, ~380 Petaflops of fp32; 8 generations of NVIDIA GPUs used.
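As a quick sanity check on these peak numbers (a back-of-envelope sketch using only the figures quoted on the slide):

```python
# Back-of-envelope check of the quoted peak: 380 PFLOP32/s over 51,500 GPUs.
peak_gpus = 51_500
peak_pflops_fp32 = 380

avg_tflops_per_gpu = peak_pflops_fp32 * 1000 / peak_gpus
print(f"average ~{avg_tflops_per_gpu:.1f} TFLOP32/s per GPU")
# ~7.4 TFLOP32/s: plausible for a mix spanning 8 NVIDIA generations,
# from older cards up to V100-class GPUs (~14 TFLOP32/s each).
```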
18. A Heterogeneous Resource Pool
28 cloud regions across 4 world regions provided us with 8 GPU generations. No one region or GPU type dominates!
19. Science Produced
The Distributed High-Throughput Computing (dHTC) paradigm implemented via HTCondor provides global resource aggregation; the largest cloud region provided 10.8% of the total.
The dHTC paradigm can aggregate on-prem anywhere, HPC at any scale, and multiple clouds.
21. Performance vs GPU type
42% of the science was done on V100s in 19% of the wall time, i.e. the V100 delivered roughly 2.2x the pool-average throughput per wall-clock hour.
22. Second Cloud Burst focused on maximizing science/$$$
The 2nd burst was an 8h work day in the pacific time zone on a random Tuesday in February.
Do a burst that we could repeat anytime, with any dHTC application.
23. A Day of Cloud Use
(plot: integrated one EFLOP32 hour, with a 170 PFLOP32s plateau)
Total bill: ~$60k, including networking and storage.
We did a 2nd run on February 4th 2020 to focus on a cost-effective 8h work day. We picked a "random" Tuesday during peak working hours (pacific).
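The headline metric on the next slide, roughly $60k per ExaFLOP32 hour, follows directly from these numbers; a worked sketch:

```python
# Worked cost arithmetic from the slide: ~$60k total bill for an
# integrated one EFLOP32-hour of compute.
total_bill_usd = 60_000
integrated_eflop32_hours = 1.0

print(f"${total_bill_usd / integrated_eflop32_hours:,.0f} per EFLOP32-hour")

# Sanity check against the plateau (assuming the 170 PFLOP32s plateau was
# sustained for roughly 6 of the 8 hours, with ramp up/down around it):
print(170e-3 * 6)  # ~1.02 EFLOP32-hours, consistent with the integral
```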
24. Cost to support cloud as a "24x7" capability
• February 2020: roughly $60k per ExaFLOP32 hour.
• This burst was executed by 2 people:
- Igor Sfiligoi (SDSC) to support the infrastructure.
- David Schultz (UW Madison) to create and submit the IceCube workflows.
§ A "David"-type person is needed also for on-prem science workflows.
• To make this a routine operations capability for any open science that is dHTC capable would require another 50% FTE "Cloud Budget Manager".
- There is substantial effort involved in just dealing with cost & budgets for a large community of scientists.
25. To provide an aggregate ExaFlop32 hour per day of dHTC production capability in the commercial cloud for the sum of many sciences today would require 1.5 FTE of human effort and $60k of cloud costs per day.
This does not include the human effort to train the community, define the workflows, run the workflows, … i.e. it does not include what the scientists themselves still have to do.
26. 3rd Cloud Burst
Buy enough GPUs to saturate the 100Gbps network to UW Madison, with overflow to UCSD. Do EGRESS entirely via Internet2 Cloud Connect for AWS, Azure, and Google.
The scale of the GPU burst peaked at 60% of the second cloud burst. Used a smaller set of regions, but still all 3 providers.
27. Egress data intensive in nature
• Cloud burst ~saturated the 100 Gbps link
- To make good use of a large fraction of available Cloud GPUs.
• IceCube simulations are relatively heavy in egress data
- 2 GB per job
- Job length ~= 0.5 hour
• And very spiky
- The whole file is transferred after compute completed.
• Input sizes small-ish
- 0.25 GB
Peaked at 90.3Gbps at UW Madison, plus an additional 10-20Gbps at UCSD.
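These per-job numbers are enough to see why tens of thousands of concurrent GPU jobs saturate a 100 Gbps link; a back-of-envelope sketch:

```python
# Back-of-envelope: average egress per job, and how many concurrent jobs
# it takes to fill 100 Gbps, using the per-job numbers from the slide.
output_gb_per_job = 2.0        # GB of egress per job
job_length_s = 0.5 * 3600      # ~0.5 hour per job

avg_gbps_per_job = output_gb_per_job * 8 / job_length_s   # ~0.009 Gbps
print(round(100 / avg_gbps_per_job))                      # ~11,250 jobs

# Since each 2 GB file is shipped in one spike after its job finishes,
# the instantaneous load is far burstier than this average suggests.
```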
28. Using Internet2 Cloud Connect Service
• Egress costs are notoriously high.
• Buying dedicated links is cheaper
- If provisioned on demand.
• Internet2 acts as provider for the research community
- For AWS, Azure and GCP.
• No 100Gbps links were available
- Had to stitch together 21 links, at 10Gbps, 5Gbps and 2Gbps.
(plot: 130TB in 5 hours; each color band belongs to one network link)
https://internet2.edu/services/cloud-connect/
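Combined with the link mix listed on slide 31, the stitched capacity and the sustained rate implied by "130TB in 5 hours" can be checked directly:

```python
# Aggregate bandwidth of the 21 stitched links (mix from slide 31), and
# the sustained rate implied by moving 130 TB in 5 hours.
links = {10: 6, 5: 10, 2: 5}   # Gbps -> number of links
print(sum(rate * n for rate, n in links.items()))  # 120 Gbps aggregate

tb_moved, hours = 130, 5
sustained_gbps = tb_moved * 8e12 / (hours * 3600) / 1e9
print(f"{sustained_gbps:.0f} Gbps sustained")      # ~58 Gbps average
```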
29. Struggled with spiky workload during trial run
• The attempted burst in the trial run led to "oscillatory" network use.
• Noticed links to different providers behave differently
- Some capped, some flexible.
- Long upload times when congested => waste of money.
(plots: AWS and Azure link utilization on a 50Gbps scale)
30. Slow & careful during big “burst”
• Ramp up for over 2 hours • Still not perfect
- But much smoother
2 hour ramp
GB/sec GB/sec
21 network links
were provisioned
IO across individual
links quite chaotic
Started slow to randomize job end times, and thus network transfers.
And yet, the individual link utilization is still quite spikey.
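The pacing idea is simply to stagger submissions so that job end times, and hence the output transfers, decorrelate. A minimal sketch, with the batch size and ramp duration as illustrative assumptions:

```python
# Sketch of the ramp-up trick: spread job starts over ~2 hours with some
# jitter, so job end times (and the 2 GB output transfers that follow
# them) decorrelate instead of hitting the links all at once.
# Batch size and ramp duration below are illustrative assumptions.
import random
import time

def submit_batch(n: int) -> None:
    print(f"submitting {n} jobs")   # stand-in for the real schedd submit

total_jobs = 10_000
ramp_seconds = 2 * 3600
batches = 100

for _ in range(batches):
    submit_batch(total_jobs // batches)
    # jittered pacing: uniform noise around the nominal inter-batch gap
    time.sleep(random.uniform(0.5, 1.5) * ramp_seconds / batches)
```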
31. Screenshot of provisioned links
Bought: 10 links @ 5Gbps, 5 links @ 2Gbps, and 6 links @ 10Gbps.
Our ability to use a link depended on the availability of GPUs in the corresponding region. A bit of a Tetris problem.
32. Very different provisioning in the 3 Clouds
• AWS the most complex
- And requires initiation by an on-prem network engineer.
- Many steps after the initial request:
§ Accept connection request
§ Create VPG
§ Associate VPG with VPC
§ Create DCG
§ Create VIF (and relay the BGP key back to on-prem)
§ Establish VPC -> VPG routing
§ Associate DCG -> VPG
- And don't forget the Internet routers, if you use dedicated VPCs.
• GCP the simplest
- Create Cloud Router.
- Create Interconnect (and provide the key to on-prem).
• Azure not much harder
- Make sure the VPC has a Gateway subnet.
- Create ExpressRoute (ER) (and provide the key to on-prem).
- Create VNG.
- Create connection between ER and VNG.
- But Azure comes with many more options to choose from.
This was the hardest of our 3 cloud bursts because it required a lot of coordination, and had too many parts without automation (a Tetris problem of GPU availability, job end time, and link bandwidth). A hedged sketch of scripting the AWS sequence follows below.
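For the AWS column, a sketch of what scripting that step sequence with boto3 might look like; the IDs, names, and ASNs are placeholders, and this illustrates the slide's sequence rather than the scripts actually used for the burst:

```python
# Hedged sketch of the AWS Direct Connect step sequence from the slide,
# expressed with boto3. All IDs/names are placeholders; this illustrates
# the sequence, not the actual automation used for the cloud burst.
import boto3

dx = boto3.client("directconnect")
ec2 = boto3.client("ec2")

# Accept the connection request initiated by the on-prem side
dx.confirm_connection(connectionId="dxcon-EXAMPLE")

# Create a VPG and associate it with the VPC
vpg = ec2.create_vpn_gateway(Type="ipsec.1")["VpnGateway"]
ec2.attach_vpn_gateway(VpnGatewayId=vpg["VpnGatewayId"], VpcId="vpc-EXAMPLE")

# Create a Direct Connect Gateway (DCG)
dcg = dx.create_direct_connect_gateway(
    directConnectGatewayName="icecube-burst", amazonSideAsn=64512
)["directConnectGateway"]

# Create the private VIF (the BGP key gets relayed back to on-prem)
dx.create_private_virtual_interface(
    connectionId="dxcon-EXAMPLE",
    newPrivateVirtualInterface={
        "virtualInterfaceName": "icecube-vif",
        "vlan": 101,
        "asn": 65000,  # on-prem BGP ASN, placeholder
        "directConnectGatewayId": dcg["directConnectGatewayId"],
    },
)

# Establish VPC -> VPG routing, and associate DCG -> VPG
ec2.enable_vgw_route_propagation(
    GatewayId=vpg["VpnGatewayId"], RouteTableId="rtb-EXAMPLE"
)
dx.create_direct_connect_gateway_association(
    directConnectGatewayId=dcg["directConnectGatewayId"],
    virtualGatewayId=vpg["VpnGatewayId"],
)
```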
33. Additional on-prem networking setup needed
Quote from Michael Hare, UW Madison network engineer:
"In addition to network configuration [at] UW Madison (AS59), we provisioned BGP based Layer 3 MPLS VPNs (L3VPNs) towards Internet2 via our regional aggregator, BTAA OmniPop. This work involved reaching out to the BTAA NOC to coordinate on VLAN numbers and paths and to [the] Internet2 NOC to make sure the newly provisioned VLANs were configurable inside OESS. Due to limitations in programmability or knowledge at the time regarding duplicate IP address towards the cloud (GCP, Azure, AWS) endpoints, we built several discrete L3VPNs inside the Internet2 network to accomplish the desired topology."
Not something domain scientists can expect to accomplish.
34. Applicability beyond IceCube
• All the large instruments we know of
- LHC, LIGO, DUNE, LSST, …
• Any midscale instrument we can think of
- XENON, GlueX, Clas12, Nova, DES, Cryo-EM, …
• A large fraction of Deep Learning
- But not all of it …
• Basically, anything that has bundles of independently schedulable jobs that can be partitioned to adjust workloads to have 0.5 to few hour runtimes on modern GPUs. (A sketch of such partitioning follows below.)
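A minimal sketch of that partitioning step, sizing chunks of independent work items to a target runtime; the per-item cost is an assumed, application-specific calibration number:

```python
# Sketch of partitioning independent work items into chunks that target
# ~0.5-hour GPU runtimes. The seconds-per-item figure is an assumed,
# application-specific calibration, not an IceCube number.
def partition(n_items: int, seconds_per_item: float,
              target_seconds: float = 1800) -> list[range]:
    """Split items [0, n_items) into contiguous chunks of ~target runtime."""
    per_chunk = max(1, int(target_seconds / seconds_per_item))
    return [range(i, min(i + per_chunk, n_items))
            for i in range(0, n_items, per_chunk)]

# Example: 1M items at an assumed ~0.36 s/item on a modern GPU
chunks = partition(1_000_000, seconds_per_item=0.36)
print(len(chunks), "jobs of ~0.5 hour each")   # 200 jobs of 5000 items
```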
35. IceCube is ready for Exascale
• Humanity has built extraordinary instruments by pooling human and financial resources globally.
• The computing for these large collaborations fits perfectly to the cloud or scheduling holes in Exascale HPC systems due to its "ingeniously parallel" nature. => dHTC
• The dHTC computing paradigm applies to a wide range of problems across all of open science.
- We are happy to repeat this with anybody willing to spend $50k in the clouds.
Demonstrated elastic burst at 51,500 GPUs. IceCube is ready for Exascale.
Contact us at: support@opensciencegrid.org
Or me personally at: fkw@ucsd.edu
36. Acknowledgements
This work was partially supported by the NSF grants OAC-1941481, MPS-1148698, OAC-1841530, OAC-1904444, OAC-1826967, and OPP-1600823.