The National Research Platform (NRP) will replace the Pacific Research Platform (PRP) and aims to democratize access to national research cyberinfrastructure. The long-term vision is to create an open national cyberinfrastructure by federating resources across research institutions. Key innovations include a new network fabric, application libraries for FPGAs, a "bring your own resource" model, and new approaches to scheduling and data infrastructure. The NSF has funded the Prototype National Research Platform (PNRP) project to support NRP for the next 5 years. NRP aims to grow resources, introduce new capabilities, and be driven by the research community.
Frank Würthwein - NRP and the Path forward
4. NATIONAL RESEARCH
Long Term Vision
• Create an Open National Cyberinfrastructure that allows the federation of CI at all ~4,000 accredited, degree granting higher education institutions, non-profit research institutions, and national laboratories.
§ Open Science
§ Open Data
§ Open Source
§ Open Infrastructure
Openness for an Open Society
Open Compute
Open Storage & CDN
Open devices/instruments/IoT, …?
5. Community vs Funded Projects
Community with Shared Vision
Lots of funded projects that contribute to this shared vision in different ways.
We want you to …
… grow NRP.
… build on NRP.
NRP is “owned” and “built” by the community for the community
6. How is NRP different from OSG?
We’ve been doing federated cyberinfrastructure since 2005.
Why is NRP even needed given that OSG exists?
7. Cyberinfrastructure Stack
Layers shown in the stack diagram: Hardware; IPMI, Firmware, BIOS; Kubernetes; Admiralty; SLURM; HTCondor/OSG
NRP operates at all layers of the stack, from IPMI up
• IPMI reduces TCO and lowers threshold to entry
• Kubernetes allows service deployments
• Also the natural layer for application container deployment
• Admiralty allows K8S federation with folks who want control
• Including cloud integration to access TPUs & other cloud only architectures
• HTCondor allows NRP to show up as a “site” in OSG
The layer you integrate at depends on
• Control you want
• Effort you can afford
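To make the Kubernetes layer more concrete, here is a minimal sketch of an application container deployment expressed as a Pod manifest. It builds the manifest as a plain Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The pod name, namespace, and image are hypothetical placeholders, not NRP values, and the `nvidia.com/gpu` resource name assumes the cluster exposes GPUs through the NVIDIA device plugin.

```python
import json


def gpu_pod_manifest(name, namespace, image, gpus=1):
    """Build a minimal Pod manifest that requests `gpus` GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Run once and stop; do not restart the container on exit.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    "command": ["nvidia-smi"],
                    # Assumed resource name (NVIDIA device plugin).
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # All three arguments are hypothetical placeholders.
    manifest = gpu_pod_manifest(
        "gpu-test", "my-namespace", "example.org/cuda-base:latest"
    )
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and passing it to `kubectl apply -f` would be one way to submit it, assuming the user has access to a namespace on the cluster.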
8. Complementarity in Implementation of “Bring Your Own Resource” model
OSG/PATh focused on campus cluster integration.
NRP focused on individual node integration instead of clusters.
9. Cyberinfrastructure Stack
Layers shown in the stack diagram: Hardware; IPMI, Firmware, BIOS; Kubernetes; Admiralty; SLURM; HTCondor/OSG
NRP operates at all layers of the stack, from IPMI up
• IPMI reduces TCO and lowers threshold to entry
• Kubernetes allows service deployments
• Also the natural layer for application container deployment
• Admiralty allows K8S federation with folks who want control
• Including cloud integration to access TPUs & other cloud only architectures
• HTCondor allows NRP to show up as a “site” in OSG
• Under-resourced institutions
• Network providers and their POPs
• CS & ECE faculty specializing in:
- AI/ML => gaming GPUs
- systems R&D
All of these find it difficult to justify staff to support all layers
11. Cyberinfrastructure Stack
Layers shown in the stack diagram: Hardware; IPMI, Firmware, BIOS; Kubernetes
NRP operates at all layers of the stack, from IPMI up
• IPMI reduces TCO and lowers threshold to entry
• Kubernetes allows service deployments
• Also the natural layer for application container deployment
• Admiralty allows K8S federation with folks who want control
• Including cloud integration to access TPUs & other cloud only architectures
• HTCondor allows NRP to show up as a “site” in OSG
• Open Science Data Federation
- Origins & Caches in US, EU, Asia
• Protein Data Bank
- (Future) Replicas in EU & Asia
NRP is unique in its support of global service deployments
OSDF ops by PATh; PDB ops by PDB ???
12. Supporting Nautilus for the next decade
Nautilus = K8S infrastructure of PRP for the last 5+ years
Nautilus = K8S of NRP for the next 10 years
13. The NSF Cat-II Program
• NSF supports novel systems ideas via the Cat-II program.
- 3 year “testbed” phase
§ The PI owns the resource, and has (some) freedom regarding who uses it.
· No requirements for making it available via any specific allocation mechanism.
§ It is expected that not all features work on day 1.
· 3 years of experimentation & development of features
- 2 year “allocation” phase
§ The resource is made available via an NSF supported allocation mechanism.
- The solicitation mentions the possibility of an additional 5 year renewal without re-competition if the system is successful.
• We decided that this is an ideal program to try and secure NRP core operations funding for the next 10 years
- And thus provide the stability necessary for growth of NRP.
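The "next 10 years" figure follows from adding up the phase lengths on this slide. A small sketch of that arithmetic, using project years counted from 1 (no calendar start year is assumed):

```python
# Phase lengths in years, as stated on the slide.
PHASES = [("testbed", 3), ("allocation", 2), ("possible renewal", 5)]


def timeline(phases):
    """Return (name, first_year, last_year) tuples using 1-based project years."""
    out, start = [], 1
    for name, length in phases:
        out.append((name, start, start + length - 1))
        start += length
    return out


if __name__ == "__main__":
    for name, first, last in timeline(PHASES):
        print(f"{name}: years {first}-{last}")
    print("total years:", sum(length for _, length in PHASES))
```

The first two phases make up the funded 5 year project; the final 5 years depend on the renewal the solicitation mentions as a possibility.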
14. Cat-II: Prototype National Research Platform (PNRP)
Funded as NSF 2112167
Diagram labels: 80GB A100, A10; 64, 288
5 year project with $5M hardware & $6.45M people
Supports Nautilus, and thus the core NRP infrastructure
Promises to build on “PRP” functionality, and go beyond
NSF Acceptance Review of System scheduled for March 8 & 9, 2023
PI = Wuerthwein; Co-PIs: DeFanti, Rosing, Tatineni, Weitzel
15. Innovations
• I1: Innovative network fabric that allows a “rack” of hardware to behave like a single “node” connected via PCIe.
• I2: Innovative application libraries that expose FPGA hardware to science apps through language constructs scientists understand (C, C++ rather than firmware).
• I3: A “Bring Your Own Resource” model that allows campuses nationwide to join their resources to the system.
• I4: Innovative scheduling to support urgent computing, including interactive use via Jupyter.
• I5: Innovative data infrastructure, including a national-scale content delivery system, like YouTube for science.
I3 & I4 & I5 turn “PRP” into “NRP” and sustain it into the future.
I1 & I2 are totally new.
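One way to picture I4 is a queue in which urgent and interactive work is dispatched ahead of batch work. The sketch below is a toy illustration only, not how Nautilus or Kubernetes actually schedules workloads; the class `ToyScheduler`, the three priority classes, and the job names are all invented for this example.

```python
import heapq
import itertools

# Hypothetical priority classes: a lower number is dispatched first.
PRIORITY = {"urgent": 0, "interactive": 1, "batch": 2}


class ToyScheduler:
    """Priority queue that keeps FIFO order within each priority class."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal priorities

    def submit(self, job_name, kind):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._counter), job_name))

    def next_job(self):
        """Pop the highest-priority job, or return None if the queue is empty."""
        if not self._heap:
            return None
        _, _, job_name = heapq.heappop(self._heap)
        return job_name


sched = ToyScheduler()
sched.submit("climate-reanalysis", "batch")
sched.submit("jupyter-session", "interactive")
sched.submit("wildfire-forecast", "urgent")
sched.submit("image-training", "batch")

order = []
while (job := sched.next_job()) is not None:
    order.append(job)
print(order)
```

The urgent job runs first even though it was submitted third, and the two batch jobs keep their submission order. A production scheduler would also need preemption of already running work, which this sketch leaves out.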
16. Data Infrastructure Model of NRP
• Support regional Ceph storage systems across the USA.
- Campuses can join individual storage hosts to the Ceph system in their region.
- All regional storage systems are Origins in the Open Science Data Federation (OSDF)
- Deploy a replication system such that researchers can decide what part of their
namespace should be in which regional storage.
• Deploy caches in the Internet2 backbone such that no campus nationwide is
more than 500 miles from a cache.
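The 500 mile target can be checked for any campus with a great-circle distance calculation. The following is a minimal sketch under stated assumptions: the cache sites in `CACHES` are hypothetical placeholders (not the actual deployment), the coordinates are approximate, and straight-line distance is only a proxy for the network path.

```python
import math

EARTH_RADIUS_MILES = 3958.8
MAX_CACHE_DISTANCE_MILES = 500  # target stated on the slide


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical cache sites (approximate city coordinates), for illustration only.
CACHES = {
    "chicago": (41.88, -87.63),
    "kansas-city": (39.10, -94.58),
    "denver": (39.74, -104.99),
}


def nearest_cache(lat, lon, caches=CACHES):
    """Return (cache_name, distance_in_miles) for the closest cache."""
    distances = (
        (name, haversine_miles(lat, lon, c_lat, c_lon))
        for name, (c_lat, c_lon) in caches.items()
    )
    return min(distances, key=lambda pair: pair[1])


campus = (40.81, -96.70)  # roughly Lincoln, NE; an example campus location
name, miles = nearest_cache(*campus)
within_target = miles <= MAX_CACHE_DISTANCE_MILES
print(f"nearest cache: {name} ({miles:.0f} miles), within target: {within_target}")
```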
NRP data infrastructure model combines best of PRP & OSG
From PRP we take the regional Ceph storage concept
From OSG/PATh we take the data origin & caching concepts
And then we add as a totally new feature:
User controlled replication of partial namespaces across regions.
(We will develop this during 3 year “testbed” phase)
We want others to build higher level data services on top.
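User controlled replication of partial namespaces could be expressed as a policy that maps namespace prefixes to regions, with the most specific prefix winning. The feature is described above as still to be developed, so the sketch below is purely an assumption about what such a policy might look like; `REPLICATION_POLICY`, `regions_for`, the prefixes, and the region names are all invented.

```python
# Hypothetical replication policy: namespace prefix -> regions holding a replica.
REPLICATION_POLICY = {
    "/ligo": {"west", "east"},
    "/ligo/public": {"west", "central", "east"},
    "/icecube": {"central"},
}


def regions_for(path, policy=REPLICATION_POLICY):
    """Return the regions for the longest policy prefix that matches `path`."""
    best_prefix = None
    for prefix in policy:
        # Match whole path components only, so "/ligo" does not match "/ligo2/...".
        if path == prefix or path.startswith(prefix + "/"):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return policy[best_prefix] if best_prefix is not None else set()


print(sorted(regions_for("/ligo/public/frames/file1.gwf")))
print(sorted(regions_for("/ligo/private/run42.h5")))
print(sorted(regions_for("/unknown/data.bin")))
```

Longest-prefix matching lets a researcher replicate a public subtree widely while keeping the rest of the namespace in fewer regions.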
17. Matrix of Science x Innovations
NRP. The networks of researchers we create will use the NRP systems innovations to achieve broader
impact, with the help from our collaborators, users, staff, Co-PIs, and PI.
Table 3.1 Representative Science and Engineering Use Cases
| Application domain | Lead researcher & Institution | Science Driver Themes | NRP Innovations |
|---|---|---|---|
| LIGO | Peter Couvares, LIGO Lab; Erik Katsavounidis, MIT | BGS, UC, AI | I2, I3, I4, I5 |
| IceCube | Benedikt Riedel, UW Madison | BGS, UC, AI | I3, I4 |
| Astronomy (DKIST & Sky Surveys) | Curt Dodds, U. Hawai’i | BGS, AI | I3, I5 |
| Campus Scale Instrument Facilities | Mark Ellisman, NCMIR; Samara Reck-Peterson, Nikon Imaging Center; Johannes Schoeneberg, Adaptive Optics Lightsheet Microscopy; Kristen Jepsen, Institute for Genomic Medicine; Tami Brown-Brandl, Precision Animal Management | SD, UC, H | I1, I2, I3, I4, I5 |
| Molecular Dynamics | Rommie Amaro, UCSD; Andreas Goetz, SDSC; Jonathan Allen, LLNL | MD, AI, H | I1, I2, I3 |
| Human microbiome | Rob Knight, UCSD | G, AI, H | I1, I2, I3 |
| Genomics & Bioinformatics | Alex Feltus, Clemson | G, AI, H | I3, I4, I5 |
| Fluid Dynamics | Rose Yu, UCSD | AI | I1, I2, I3 |
| Experimental Particle Physics, IAIFI | Phil Harris, MIT | AI, BGS, SD | I1, I2 |
| Computer Vision | Nuno Vasconcelos, UCSD | AI, CV | I3 |
| Computer Graphics | Robert Twomey, UNL | CV, AI | I3 |
| Programmable Storage | Carlos Maltzahn, UCSC | SD | I1, I2, I5 |
| AI systems software stack for FPGAs | Hadi Esmaeilzadeh, UCSD | SD | I1, I2 |
| WildFire Analysis & Prediction | Ilkay Altintas, UCSD | UC, AI, CV | I3, I4 |
Key: The NRP innovations column lists those innovations among I1 through I5 listed in Section 2.1 that a
given science driver most benefits from.
NSF MREFCs
Incl. 4 campus scale instrument facilities
Incl. a very diverse set of sciences and engineering
Lots of AI … but so much more …
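The matrix in Table 3.1 can be tallied to see which innovations the listed use cases lean on most. The snippet below is a small sketch that transcribes the innovations column of the table by hand; the dictionary name `USE_CASES` is made up for this example, and the counts say nothing about how heavily each use case depends on a given innovation.

```python
from collections import Counter

# Innovations per application domain, transcribed from Table 3.1.
USE_CASES = {
    "LIGO": ["I2", "I3", "I4", "I5"],
    "IceCube": ["I3", "I4"],
    "Astronomy (DKIST & Sky Surveys)": ["I3", "I5"],
    "Campus Scale Instrument Facilities": ["I1", "I2", "I3", "I4", "I5"],
    "Molecular Dynamics": ["I1", "I2", "I3"],
    "Human microbiome": ["I1", "I2", "I3"],
    "Genomics & Bioinformatics": ["I3", "I4", "I5"],
    "Fluid Dynamics": ["I1", "I2", "I3"],
    "Experimental Particle Physics, IAIFI": ["I1", "I2"],
    "Computer Vision": ["I3"],
    "Computer Graphics": ["I3"],
    "Programmable Storage": ["I1", "I2", "I5"],
    "AI systems software stack for FPGAs": ["I1", "I2"],
    "WildFire Analysis & Prediction": ["I3", "I4"],
}

# Count how many use cases list each innovation.
innovation_counts = Counter(
    innovation for innovations in USE_CASES.values() for innovation in innovations
)

for innovation in sorted(innovation_counts):
    print(f"{innovation}: {innovation_counts[innovation]} of {len(USE_CASES)} use cases")
```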
18. FKW’s Wishlist for the Future
• Growth of NRP infrastructure
- 1,000++ GPUs end of 2022
- 50 PB storage end of 2024
- Growth in diversity of community
§ # and types of campuses and their researchers
• Introduce new capabilities to NRP
- Machine learning at 100TB scale
- Support Domain Specific Architecture R&D on NRP
- Expand NRP into Wireless, Edge, IoT
- Towards “FAIR” on OSDF
• New Directions initiated by the Community
19. “Domain Specific Architectures”
• I1: Innovative network fabric allowing “composable hardware”.
• I2: Innovative application libraries allowing “domain optimized
architectures” on FPGAs
“End of Moore’s law” motivates new architectures
Mark Papermaster, CTO of AMD
John Shalf (2020), https://doi.org/10.1098/rsta.2019.0061
PRISM, a JUMP 2.0 project funded by SRC, is an early user of FPGAs@PNRP (PI: Tajana Rosing)
20. New Data Origins
• The NSF CC* 2022 program awarded 9
campuses storage awards of $500k each.
- We guess this pays for 5PB of storage each.
• Some of these campuses may decide to
integrate their CC* storage into the OSDF.
• Some of these campuses also have storage from
other projects that they may integrate with the
OSDF.
• NSF 23-523 again includes a $500k storage
solicitation, Spring & Fall 2023.
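The numbers above imply a rough upper bound on new storage that could join the OSDF. This is a back-of-the-envelope sketch: the 5PB per award figure is the slide's own guess, the variable names are invented, and it assumes decimal units (1 PB = 1000 TB) and that every campus would integrate all of its storage, which the slide explicitly does not claim.

```python
AWARDS = 9                # campuses awarded, per the slide
AWARD_USD = 500_000       # per-campus award
GUESSED_PB_PER_AWARD = 5  # the slide's own guess, not a measured figure

total_usd = AWARDS * AWARD_USD
total_pb = AWARDS * GUESSED_PB_PER_AWARD
# Implied unit cost, using 1 PB = 1000 TB.
usd_per_tb = AWARD_USD / (GUESSED_PB_PER_AWARD * 1000)

print(f"total awarded: ${total_usd:,}")
print(f"storage if all campuses integrate: up to {total_pb} PB")
print(f"implied cost: ${usd_per_tb:.0f} per TB")
```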
21. Summary & Conclusions
• PRP ended, and was replaced by NRP
- Significant new capabilities via Cat-II system “PNRP”
§ PNRP provides ops effort for Nautilus for the future
- # of GPUs available doubles in 2022.
§ new GPUs (A10, 3080, 3090, A100) much more powerful than older GPUs
- # of FPGAs increases from a few to a few dozen in 2022.
- # of caches grows by 50% in 22/23
=> more consistent coverage across USA
- Data volume served expected to grow substantially in 23/24/25.
§ How much? As yet too hard to predict.
• Hoping to recruit new partners to build FAIR capabilities on
top of OSDF within the next 5 years.
• Hoping to expand NRP into sensor networks using 5G &
6G in the next 10 years.
22. Acknowledgements
• This work was partially supported by NSF grants
OAC-1541349, OAC-1826967, OAC-2030508, OAC-1841530, OAC-2005369,
OAC-21121167, CISE-1713149, CISE-2100237, CISE-2120019, OAC-2112167