The document summarizes Dr. Larry Smarr's presentation on the Pacific Research Platform (PRP) and its role in working toward a National Research Platform. It describes how UCSD has been building toward the PRP for over 15 years and how the PRP now connects research teams and devices across multiple campuses. It also details PRP innovations such as Flash I/O Network Appliances (FIONAs) and the use of Kubernetes to manage distributed resources. Finally, it outlines opportunities to further integrate the PRP with the Open Science Grid and to expand the platform internationally through partnerships.
High Performance Cyberinfrastructure Enabling Data-Driven Science Supporting Stem Cell Research (Larry Smarr)
11.05.13
Invited Presentation
Sanford Consortium for Regenerative Medicine
Salk Institute, La Jolla
Larry Smarr, Calit2 & Phil Papadopoulos, SDSC/Calit2
Title: High Performance Cyberinfrastructure Enabling Data-Driven Science Supporting Stem Cell Research
Distributed Cyberinfrastructure to Support Big Data Machine Learning (Larry Smarr)
Panel on the Future of Machine Learning
California Institute for Telecommunications and Information Technology
University of California, Irvine
May 24, 2018
A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research (Larry Smarr)
11.12.12
Seminar Presentation
Princeton Institute for Computational Science and Engineering (PICSciE)
Princeton University
Title: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research
Princeton, NJ
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal... (GEO Analytics Canada)
The document discusses new technologies and approaches for analyzing satellite earth observation (EO) data in the cloud, including file formats like COG and ZARR that optimize data access, metadata standards like STAC for discovery, and platforms like Kubernetes and data cubes that enable scalable analytics. It argues that traditional approaches are now obsolete, and that Canada should embrace these new cloud native techniques to become a leader in using satellite data to improve society, as the country's space agency president advocates.
A talk at NASA Goddard, February 27, 2013
Large and diverse data result in challenging data management problems that researchers and facilities are often ill-equipped to handle. I propose a new approach to these problems based on the outsourcing of research data management tasks to software-as-a-service providers. I argue that this approach can both achieve significant economies of scale and accelerate discovery by allowing researchers to focus on research rather than mundane information technology tasks. I present early results with the approach in the context of Globus Online
This document discusses a large-scale GPU-based cloud burst simulation run by the IceCube collaboration to calibrate simulations of natural ice. The simulation was data-intensive, producing over 130 TB of data and exceeding 10 Gbps of egress bandwidth. Internet2 Cloud Connect service was used to provision over 20 dedicated network links between collaborators' institutions and cloud providers to enable high-throughput data transfer at a lower cost than commercial routes. Careful planning was required to smoothly ramp up the burst and avoid overloading individual network links.
NRP Engagement webinar - Running a 51k GPU multi-cloud burst for MMA with IceCube (Igor Sfiligoi)
NRP Engagement webinar: description of the 380 PFLOP32s, 51k-GPU multi-cloud burst using HTCondor to run the IceCube photon propagation simulation.
Presented January 27th, 2020.
Burst data retrieval after 50k GPU Cloud run (Igor Sfiligoi)
We ran a 50k GPU multi-cloud simulation to support the IceCube science. This talk provided an overview of what happened to the associated data.
Presented at the Internet2 booth at SC19.
"Building and running the cloud GPU vacuum cleaner"Frank Wuerthwein
This talk, describing the "Largest Cloud Simulation in History" (Jensen Huang at SC19), was given at the MAGIC meeting on Dec. 4th 2019. MAGIC stands for "Middleware and Grid Interagency Cooperation", and is a group within NITRD. Current federal agencies that are members of MAGIC include DOC, DOD, DOE, HHS, NASA, and NSF.
This document discusses using cloud computing for bioinformatics. It begins by defining cloud computing and describing its key characteristics like on-demand access to computing resources and rapid elasticity. It then discusses different cloud delivery models like Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). The document provides examples of public cloud providers for each delivery model. It also introduces tools like CloudBridge that help make applications cloud-independent and CloudLaunch, a portal for deploying cloud-enabled bioinformatics applications. Finally, it briefly discusses how these tools and cloud resources can help improve bioinformatics workflows by providing scalable infrastructure for processing large genomic datasets.
This document introduces SkyhookDM, a system that offloads computation from clients to storage nodes. It does this by embedding Apache Arrow data access libraries inside Ceph object storage devices (OSDs). This allows large Parquet files to be scanned and processed directly on the OSDs without needing to move all the data to clients. Experiments show SkyhookDM reduces latency, CPU usage, and network traffic compared to traditional approaches. It has also been integrated with the Coffea analysis framework. Ongoing work involves optimizing Arrow serialization for network transfers.
The Pacific Research Platform (PRP) aims to achieve transparent and rapid data access among collaborating scientists at multiple institutions through an integrated implementation of data-focused networking that extends the university campus Science DMZ model to a regional, national, and, eventually, a global scale.
PRP researchers are routinely achieving high-performance end-to-end networking from their labs to their collaborators’ labs and data centers, traversing multiple, heterogeneous Science DMZs and wide-area networks connecting multiple campus gateways, enabling researchers across the partnership to transfer data over dedicated optical lightpaths at speeds from 10Gb/s to 100Gb/s.
How HPC and large-scale data analytics are transforming experimental science (inside-BigData.com)
In this deck from DataTech19, Debbie Bard from NERSC presents: Supercomputing and the scientist: How HPC and large-scale data analytics are transforming experimental science.
"Debbie Bard leads the Data Science Engagement Group NERSC. NERSC is the mission supercomputing center for the USA Department of Energy, and supports over 7000 scientists and 700 projects with supercomputing needs. A native of the UK, her career spans research in particle physics, cosmology and computing on both sides of the Atlantic. She obtained her PhD at Edinburgh University, and has worked at Imperial College London as well as the Stanford Linear Accelerator Center (SLAC) in the USA, before joining the Data Department at NERSC, where she focuses on data-intensive computing and research, including supercomputing for experimental science and machine learning at scale."
Watch the video: https://wp.me/p3RLHQ-kLV
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
A California-Wide Cyberinfrastructure for Data-Intensive Research (Larry Smarr)
The document discusses creating a California-wide cyberinfrastructure for data-intensive research. It outlines efforts to connect all UC campuses and other research institutions across California with high-speed optical networks. This would create a "big data plane" to share large datasets. Several campuses have received NSF grants to upgrade their networks and implement Science DMZ architectures with 10-100Gbps connections to CENIC. Connecting these resources would provide researchers access to high-performance computing, large scientific instruments, and datasets. This would support collaborative big data science across disciplines like physics, climate modeling, genomics and microscopy.
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie... (Igor Sfiligoi)
Presented at PEARC20.
This talk presents the expansion of IceCube's production HTCondor pool using cost-effective GPU instances in preemptible mode gathered from the three major cloud providers: Amazon Web Services, Microsoft Azure, and the Google Cloud Platform. Using this setup, we sustained about 15k GPUs for a whole workday, corresponding to around 170 PFLOP32s, integrating over one EFLOP32-hour's worth of science output for a price tag of about $60k. We provide the reasoning behind cloud instance selection, a description of the setup, an analysis of the provisioned resources, and a short description of the actual science output of the exercise.
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ... (Igor Sfiligoi)
- IceCube is a neutrino observatory that detects high-energy neutrinos from astrophysical sources to study violent cosmic events. It uses over 5000 optical sensors buried in Antarctic ice to detect neutrinos.
- A cloud burst was performed using over 50,000 GPUs across multiple cloud providers worldwide to simulate photon propagation through ice for IceCube data analysis. This was the largest cloud simulation to date and demonstrated the ability to burst to exascale-class capacity.
- The simulation helped improve IceCube's neutrino detection and pointing resolution to identify the first known source of high-energy neutrinos, a blazar, demonstrating IceCube's potential for multi-messenger astrophysics.
Overview of what has happened in HNSciCloud over the last five months, Delivered by Helge Meinhard of CERN at the HEPiX Workshop on October 21st, 2016, in Berkeley, California, USA.
Berkeley Cloud Computing Meetup, May 2020 (Larry Smarr)
The Pacific Research Platform (PRP) is a high-bandwidth global private "cloud" connected to commercial clouds that provides researchers with distributed computing resources. It links Science DMZs at universities across California and beyond using a high-performance network. The PRP utilizes Data Transfer Nodes called FIONAs to transfer data at near full network speeds. It has adopted Kubernetes to orchestrate software containers across its resources. The PRP provides petabytes of distributed storage and hundreds of GPUs for machine learning. It allows researchers to perform data-intensive science across multiple universities much faster than possible individually.
The document discusses the Pacific Research Platform (PRP), a distributed cyberinfrastructure that connects researchers and data across multiple campuses in California and beyond using optical fiber networking. Key points:
- The PRP uses high-speed networking infrastructure like the CENIC network to connect data generators and consumers across 15+ campuses, creating an integrated "big data freeway system".
- It deploys specialized data transfer nodes called FIONAs to enable high-speed transfer of large datasets between sites at near the full network speed.
- Recent additions include using Kubernetes to orchestrate containers across the PRP infrastructure and integrating machine learning resources through the CHASE-CI grant to support data-intensive AI applications.
Looking Back, Looking Forward: NSF CI Funding 1985-2025 (Larry Smarr)
This document provides an overview of the development of national research platforms (NRPs) from 1985 to the present, with a focus on the Pacific Research Platform (PRP). It describes the evolution of the PRP from early NSF-funded supercomputing centers to today's distributed cyberinfrastructure utilizing optical networking, containers, Kubernetes, and distributed storage. The PRP now connects over 15 universities across the US and internationally to enable data-intensive science and machine learning applications across multiple domains. Going forward, the document discusses plans to further integrate regional networks and partner with new NSF-funded initiatives to develop the next generation of NRPs through 2025.
Global Research Platforms: Past, Present, Future (Larry Smarr)
The Pacific Research Platform: a Science-Driven Big-Data Freeway System (Larry Smarr)
The Pacific Research Platform will create a regional "Big Data Freeway System" along the West Coast to support science. It will connect major research institutions with high-speed optical networks, allowing them to share vast amounts of data and computational resources. This will enable new forms of collaborative, data-intensive research for fields like particle physics, astronomy, biomedicine, and earth sciences. The first phase aims to establish a basic networked infrastructure, with later phases advancing capabilities to 100Gbps and beyond with security and distributed technologies.
Towards a High-Performance National Research Platform Enabling Digital Research (Larry Smarr)
The document summarizes Dr. Larry Smarr's keynote presentation on enabling a high-performance national research platform. It describes how multi-institutional research increasingly relies on access to large datasets, requiring new cyberinfrastructure. The Pacific Research Platform provides high-bandwidth networking between universities to support research collaborations across disciplines. The next steps involve scaling this model into a national and global platform. The presentation highlights how the PRP enables various scientific applications and drives innovation through improved data transfer capabilities and distributed computing resources.
The Pacific Research Platform (PRP) is a multi-institutional cyberinfrastructure project that connects researchers across California and beyond to share large datasets. It spans the 10 University of California campuses, major private research universities, supercomputer centers, and some out-of-state universities. Fifteen multi-campus research teams in fields like physics, astronomy, earth sciences, biomedicine, and multimedia will drive the technical needs of the PRP over five years. The goal is to create a "big data freeway" to allow high-speed sharing of data between research labs, supercomputers, and repositories across multiple networks without performance loss over long distances.
The Pacific Research Platform: Building a Distributed Big-Data Machine-Learni... (Larry Smarr)
The document summarizes the Pacific Research Platform (PRP) which connects researchers across multiple universities with high-speed networks and computing resources for big data and machine learning applications. Key points:
- PRP connects 15 universities with optical networks, distributed storage devices (FIONAs), and over 350 GPUs for data analysis and AI training.
- It allows researchers to rapidly share and analyze large datasets, with one example reducing a workflow from 19 days to 52 minutes.
- Other projects using PRP resources include climate modeling, astrophysics simulations, and machine learning courses involving thousands of students.
Opening Keynote Lecture
15th Annual ON*VECTOR International Photonics Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
February 29, 2016
The Rise of Supernetwork Data Intensive Computing (Larry Smarr)
Invited Remote Lecture to SC21
The International Conference for High Performance Computing, Networking, Storage, and Analysis
St. Louis, Missouri
November 18, 2021
My Remembrances of Mike Norman Over The Last 45 Years (Larry Smarr)
Mike Norman has been a leader in computational astrophysics for over 45 years. Some of his influential work includes:
- Cosmic jet simulations in the early 1980s which helped explain phenomena from galactic centers.
- Pioneering the use of adaptive mesh refinement in the 1990s to achieve dynamic load balancing on supercomputers.
- Massive cosmology simulations in the late 2000s with over 100 trillion particles using thousands of processors across multiple supercomputing sites, producing petabytes of data.
- Developing end-to-end workflows in the 2000s to couple supercomputers, high-speed networks, and large visualization systems to enable real-time analysis of extremely large astrophysics simulations.
Metagenics: How Do I Quantify My Body and Try to Improve its Health? June 18, 2019 (Larry Smarr)
Larry Smarr discusses quantifying his body and health over time through extensive self-tracking. He measures various biomarkers through regular blood tests and analyzes his gut microbiome by sequencing stool samples. This revealed issues like chronic inflammation and an unhealthy microbiome. Smarr then took steps like a restricted eating window and increasing plant diversity in his diet, which reversed metabolic syndrome issues and correlated with shifts in his microbiome ecology. His goal is to continue precisely measuring factors like toxins, hormones, gut permeability and food/supplement impacts to further optimize his health.
Panel: Reaching More Minority Serving Institutions (Larry Smarr)
This document discusses engaging more minority serving institutions (MSIs) in cyberinfrastructure development through regional networks. It provides data showing the importance of MSIs like historically black colleges and universities (HBCUs) in educating underrepresented minority students in STEM fields. Regional networks can help equalize opportunities by assisting MSIs in overcoming barriers to resources through training, networking infrastructure support, and helping institutions obtain necessary staffing and funding. Strategies mentioned include collaborating with MSIs on grants and addressing issues identified in surveys like lack of vision for data use beyond compliance. The goal is to broaden participation in STEAM fields by leveraging the success MSIs have shown in supporting underrepresented students.
Global Network Advancement Group - Next Generation Network-Integrated Systems (Larry Smarr)
This document summarizes a presentation on global petascale to exascale workflows for data intensive sciences. It discusses a partnership convened by the GNA-G Data Intensive Sciences Working Group with the mission of meeting challenges faced by data-intensive science programs. Cornerstone concepts that will be demonstrated include integrated network and site resource management, model-driven frameworks for resource orchestration, end-to-end monitoring with machine learning-optimized data transfers, and integrating Qualcomm's GradientGraph with network services to optimize applications and science workflows.
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us... (Larry Smarr)
This document discusses opportunities for ESnet to support wireless edge computing through developing a strategy around self-guided field laboratories (SGFL). It outlines several potential science use cases that could benefit from wireless and distributed computing capabilities, both in the short term through technologies like 5G, LoRa and Starlink, and longer term through the vision of automated SGFL. The document proposes some initial ideas for deploying and testing wireless edge computing technologies through existing projects to help enable the SGFL vision and further scientific opportunities. It emphasizes that exploring these emerging areas could help drive new science possibilities if done at a reasonable scale.
The Asia Pacific and Korea Research Platforms: An Overview (Jeonghoon Moon, Larry Smarr)
This document provides an overview of Asia Pacific and Korea research platforms. It discusses the Asia Pacific Research Platform working group in APAN, including its objectives to promote HPC ecosystems and engage members. It describes the Asi@Connect project which provides high-capacity internet connectivity for research across Asia-Pacific. It also discusses the Korea Research Platform and efforts to expand it to 25 national research institutes in Korea. New related projects on smart hospitals, agriculture, and environment are mentioned. The conclusion discusses enhancing APAN and the Korea Research Platform and expanding into new areas like disaster and AI education.
Panel: Reaching More Minority Serving Institutions (Larry Smarr)
This document discusses engaging more minority serving institutions (MSIs) in the National Research Platform (NRP). It provides data showing that MSIs serve a disproportionate number of underrepresented minority students and are important producers of STEM graduates from these groups. The NRP can help broaden participation in STEAM fields by providing MSIs access to advanced cyberinfrastructure resources, new learning modalities, and opportunities for collaborative research between MSIs and other institutions. Regional networks also have a role to play in helping MSIs overcome barriers and attracting them to collaborative grants. The goal is to tear down walls between research and teaching and reinvent the university experience for more inclusive learning and innovation.
How We Added Replication to QuestDB - J On The Beach (Javier Ramirez)
Building a database that can beat industry benchmarks is hard work, and we had to use every trick in the book to keep as close to the hardware as possible. In doing so, we initially decided QuestDB would scale only vertically, on a single instance.
A few years later, data replication —for horizontally scaling reads and for high availability— became one of the most demanded features, especially for enterprise and cloud environments. So, we rolled up our sleeves and made it happen.
Today, QuestDB supports an unbounded number of geographically distributed read-replicas without slowing down reads on the primary node, which can ingest data at over 4 million rows per second.
In this talk, I will tell you about the technical decisions we made and their trade-offs. You'll learn how we had to revamp the whole ingestion layer, and how we actually made the primary faster than before when we added multi-threaded Write Ahead Logs to deal with data replication. I'll also discuss how we are leveraging object storage as a central part of the process. And of course, I'll show you a live demo of high-performance multi-region replication in action.
### Data Description and Analysis Summary for Presentation
#### 1. **Importing Libraries**
Libraries used:
- `pandas`, `numpy`: Data manipulation
- `matplotlib`, `seaborn`: Data visualization
- `scikit-learn`: Machine learning utilities
- `statsmodels`, `pmdarima`: Statistical modeling
- `keras`: Deep learning models
#### 2. **Loading and Exploring the Dataset**
**Dataset Overview:**
- **Source:** CSV file (`mumbai-monthly-rains.csv`)
- **Columns:**
- `Year`: The year of the recorded data.
- `Jan` to `Dec`: Monthly rainfall data.
- `Total`: Total annual rainfall.
**Initial Data Checks:**
- Displayed first few rows.
- Summary statistics (mean, standard deviation, min, max).
- Checked for missing values.
- Verified data types.
**Visualizations:**
- **Annual Rainfall Time Series:** Trends in annual rainfall over the years.
- **Monthly Rainfall Over Years:** Patterns and variations in monthly rainfall.
- **Yearly Total Rainfall Distribution:** Distribution and frequency of annual rainfall.
- **Box Plots for Monthly Data:** Spread and outliers in monthly rainfall.
- **Correlation Matrix of Monthly Rainfall:** Relationships between different months' rainfall.
#### 3. **Data Transformation**
**Steps:**
- Ensured 'Year' column is of integer type.
- Created a datetime index.
- Converted monthly data to a time series format.
- Created lag features to capture past values.
- Generated rolling statistics (mean, standard deviation) for different window sizes.
- Added seasonal indicators (dummy variables for months).
- Dropped rows with NaN values.
**Result:**
- Transformed dataset with additional features ready for time series analysis.
#### 4. **Data Splitting**
**Procedure:**
- Split the data into features (`X`) and target (`y`).
- Further split into training (80%) and testing (20%) sets without shuffling to preserve time series order.
**Result:**
- Training set: `(X_train, y_train)`
- Testing set: `(X_test, y_test)`
#### 5. **Automated Hyperparameter Tuning**
**Tool Used:** `pmdarima`
- Automatically selected the best parameters for the SARIMA model.
- Evaluated using metrics such as AIC and BIC.
**Output:**
- Best SARIMA model parameters and statistical summary.
#### 6. **SARIMA Model**
**Steps:**
- Fit the SARIMA model using the training data.
- Evaluated on both training and testing sets using MAE and RMSE.
**Output:**
- **Train MAE:** Indicates accuracy on training data.
- **Test MAE:** Indicates accuracy on unseen data.
- **Train RMSE:** Measures average error magnitude on training data.
- **Test RMSE:** Measures average error magnitude on testing data.
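A minimal Python sketch of the split, automated SARIMA search, and evaluation steps described in sections 4-6 above, assuming the column layout from the dataset overview (`Year`, `Jan`...`Dec`); the lag features, rolling statistics, and seasonal dummies of section 3 are omitted for brevity.

```python
# Sketch only: chronological split + pmdarima auto-ARIMA + MAE/RMSE, assuming
# the mumbai-monthly-rains.csv layout described above (Year, Jan..Dec, Total).
import pandas as pd
import pmdarima as pm
from sklearn.metrics import mean_absolute_error, mean_squared_error

df = pd.read_csv("mumbai-monthly-rains.csv")
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Convert the wide monthly columns into a single chronological series
long = df.melt(id_vars="Year", value_vars=months, var_name="Month", value_name="Rain")
long["Date"] = pd.to_datetime(long["Year"].astype(str) + long["Month"], format="%Y%b")
y = long.sort_values("Date").set_index("Date")["Rain"]

# 80/20 split without shuffling, preserving time series order
split = int(len(y) * 0.8)
y_train, y_test = y.iloc[:split], y.iloc[split:]

# Automated (AIC-based) hyperparameter search over seasonal ARIMA orders
model = pm.auto_arima(y_train, seasonal=True, m=12, suppress_warnings=True)

pred = model.predict(n_periods=len(y_test))
print("Test MAE :", mean_absolute_error(y_test, pred))
print("Test RMSE:", mean_squared_error(y_test, pred) ** 0.5)
```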
#### 7. **LSTM Model**
**Preparation:**
- Reshaped data for LSTM input.
- Converted data to `float32`.
**Model Building and Training:**
- Built an LSTM model with one LSTM layer and one Dense layer.
- Trained the model on the training data.
**Evaluation:**
- Evaluated on both training and testing sets using MAE and RMSE.
**Output:**
- **Train MAE:** Accuracy on training data.
- **Test MAE:** Accuracy on unseen data.
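For the LSTM step in section 7, a self-contained sketch along the same lines; here only lagged monthly values are used as input (the notebook also adds rolling statistics and seasonal dummies), and the layer size, lag count, and epoch count are illustrative assumptions.

```python
# Sketch only: one LSTM layer + one Dense layer on lagged monthly values,
# cast to float32 and reshaped to (samples, timesteps, features) as in section 7.
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import LSTM, Dense

df = pd.read_csv("mumbai-monthly-rains.csv")
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
y = (df.melt(id_vars="Year", value_vars=months, var_name="Month", value_name="Rain")
       .assign(Date=lambda d: pd.to_datetime(d["Year"].astype(str) + d["Month"], format="%Y%b"))
       .sort_values("Date")["Rain"].to_numpy(dtype="float32"))

lags = 12                                         # previous 12 months as the input window
X = np.stack([y[i:i + lags] for i in range(len(y) - lags)])[:, :, None]
t = y[lags:]
split = int(len(X) * 0.8)                         # chronological 80/20 split

model = Sequential([LSTM(50, input_shape=(lags, 1)), Dense(1)])
model.compile(optimizer="adam", loss="mae")
model.fit(X[:split], t[:split], epochs=50, verbose=0)

pred = model.predict(X[split:]).ravel()
print("Test MAE :", np.mean(np.abs(pred - t[split:])))
print("Test RMSE:", np.sqrt(np.mean((pred - t[split:]) ** 2)))
```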
Airline Satisfaction Project using Azure
This presentation is created as a foundation of understanding and comparing data science/machine learning solutions made in Python notebooks locally and on Azure cloud, as a part of Course DP-100 - Designing and Implementing a Data Science Solution on Azure.
Toward a National Research Platform
1. “Toward a National Research Platform”
Invited Presentation
Open Science Grid All Hands Meeting
Salt Lake City, UT
March 20, 2018
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
2. 30 Years Ago, NSF Brought to University Researchers a DOE HPC Center Model (1985/6)
NCSA Was Modeled on LLNL; SDSC Was Modeled on MFEnet
3. I-WAY: Information Wide Area Year, Supercomputing '95 (UIC)
• The First National Telecom-Interconnected 155 Mbps Research Network
– 65 Science Projects
– Into the San Diego Convention Center
• I-WAY Featured:
– Networked Visualization Applications
– Large-Scale Immersive Displays
– I-Soft Programming Environment
– Led to the Globus Project
http://archive.ncsa.uiuc.edu/General/Training/SC95/GII.HPCC.html
See talk by: Brian Bockelman
4. NSF's PACI Program Was Built on the vBNS to Prototype America's 21st Century Information Infrastructure
The PACI Grid Testbed (National Computational Science, 1997)
vBNS led to Key Role of Miron Livny & Condor
5. UCSD Has Been Working Toward PRP for Over 15 Years: NSF OptIPuter, Quartzite, and Prism Awards
• OptIPuter: PI Smarr, 2002-2009
• Quartzite: PI Papadopoulos, 2004-2007
• Prism: PI Papadopoulos, 2013-2015
Precursors to DOE Defining the Science DMZ in 2010
6. Based on Community Input and on ESnet’s Science DMZ Concept,
NSF Has Funded Over 100 Campuses to Build DMZs
Red 2012 CC-NIE Awardees
Yellow 2013 CC-NIE Awardees
Green 2014 CC*IIE Awardees
Blue 2015 CC*DNI Awardees
Purple Multiple Time Awardees
Source: NSF
NSF Program Officer: Kevin Thompson
7. Logical Next Step: The Pacific Research Platform Networks Campus DMZs to Create a Regional End-to-End Science-Driven "Big Data Superhighway" System
NSF CC*DNI Grant
$5M 10/2015-10/2020
PI: Larry Smarr, UC San Diego Calit2
Co-PIs:
• Camille Crittenden, UC Berkeley CITRIS,
• Tom DeFanti, UC San Diego Calit2/QI,
• Philip Papadopoulos, UCSD SDSC,
• Frank Wuerthwein, UCSD Physics and SDSC
Letters of Commitment from:
• 50 Researchers from 15 Campuses
• 32 IT/Network Organization Leaders
NSF Program Officer: Amy Walton
Source: John Hess, CENIC
8. Note That the OSG Cluster Map
Has Major Overlap with the NSF-Funded DMZ Map
Source: Frank Würthwein, OSG, UCSD/SDSC, PRP
NSF CC* Grants
9. Bringing OSG Software and Services
to a Regional-Scale DMZ
Source: Frank Würthwein, OSG, UCSD/SDSC, PRP
10. Key PRP Innovation: UCSD Designed FIONAs to Solve the Disk-to-Disk Data Transfer Problem at Full Speed on 10/40/100G Networks
Big Data Science Data Transfer Nodes (DTNs): Flash I/O Network Appliances (FIONAs)
• FIONA PCs [a.k.a. ESnet DTNs], 10/40G, ~$8,000 Big Data PC with:
– 1 CPU
– 10/40 Gbps Network Interface Cards
– 3 TB SSDs or 100+ TB Disk Drive
– Extensible for Higher Performance to:
– +NVMe SSDs for 100 Gbps Disk-to-Disk
– +Up to 8 GPUs [4M GPU Core Hours/Week]
– +Up to 160 TB Disks for Data Posting
– +Up to 38 Intel CPUs
– $700 10 Gbps FIONAs Being Tested
• FIONettes are $270 FIONAs (1G; also listed at $250):
– 1 Gbps NIC with USB-3 for Flash Storage or SSD
Phil Papadopoulos, SDSC & Tom DeFanti, Joe Keefe & John Graham, Calit2
11. We Measure Disk-to-Disk Throughput with 10 GB File Transfers Using Globus GridFTP, 4 Times Per Day in Both Directions for All PRP Sites
From the start of monitoring, the mesh grew from 12 DTNs (January 29, 2016) to 24 DTNs connected at 10-40G (July 21, 2017) in 1½ years.
Source: John Graham, Calit2/QI
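As a rough illustration of one probe in such a throughput mesh, the sketch below times a single 10 GB disk-to-disk transfer using the Globus Python SDK; the endpoint UUIDs, access token, and paths are placeholders, and the PRP's actual monitoring harness (driven by Globus GridFTP between every pair of DTNs) is not shown here.

```python
# Hypothetical sketch: time one Globus transfer between two DTNs and report the
# achieved rate. Endpoint UUIDs, token, and paths are placeholders.
import time
import globus_sdk

TOKEN = "..."                        # placeholder Globus transfer access token
SRC = "UUID-OF-SOURCE-DTN"           # placeholder endpoint IDs
DST = "UUID-OF-DEST-DTN"

tc = globus_sdk.TransferClient(authorizer=globus_sdk.AccessTokenAuthorizer(TOKEN))

tdata = globus_sdk.TransferData(tc, SRC, DST, label="PRP 10GB disk-to-disk probe")
tdata.add_item("/data/testfile_10GB", "/data/testfile_10GB")

start = time.time()
task_id = tc.submit_transfer(tdata)["task_id"]
while not tc.task_wait(task_id, timeout=60, polling_interval=10):
    pass                             # poll until the transfer task completes
elapsed = time.time() - start        # note: includes Globus queueing time

task = tc.get_task(task_id)
gbits = task["bytes_transferred"] * 8 / 1e9
print(f"{gbits:.1f} Gb in {elapsed:.0f} s -> {gbits / elapsed:.2f} Gb/s disk-to-disk")
```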
12. PRP's First 2 Years: Connecting Multi-Campus Application Teams and Devices
Earth Sciences
13. PRP Over CENIC Couples UC Santa Cruz Astrophysics Cluster to LBNL NERSC Supercomputer
CENIC 2018 Innovations in Networking Award for Research Applications
14. 100 Gbps FIONA at UCSC Allows Downloads to the UCSC Hyades Cluster from the LBNL NERSC Supercomputer for DESI Science Analysis
• 300 images per night, 100 MB per raw image, 120 GB per night
• 250 images per night, 530 MB per raw image, 800 GB per night
Precursors to LSST and NCSA
NSF-Funded Cyberengineer Shaw Dong @UCSC Receiving FIONA, Feb 7, 2017
Source: Peter Nugent, LBNL, Professor of Astronomy, UC Berkeley
15. Jupyter Has Become the Digital Fabric for Data Sciences
PRP Creates UC-JupyterHub Backbone
Source: John Graham, Calit2
Goal: Jupyter Everywhere
16. LHCONE Traffic Growth Is Large Now But Will Explode in 2026
31 petabytes in January 2018, a +38% change within the last year
The LHC accounts for 47% of total ESnet traffic today
Dramatic data volume growth expected for the HL-LHC in 2026
Source: Frank Würthwein, OSG, UCSD/SDSC, PRP
17. Data Transfer Rates From 40 Gbps DTN in UCSD Physics Building,
Across Campus on PRISM DMZ, Then to Chicago’s Fermilab Over CENIC/ESnet
Based on This Success,
Würthwein Will Upgrade 40G DTN to 100G
For Bandwidth Tests & Kubernetes Integration
With OSG, Caltech, and UCSC
Source: Frank Würthwein, OSG, UCSD/SDSC, PRP
18. LHC Data Analysis
Running on PRP
Source: Frank Würthwein, OSG, UCSD/SDSC, PRP
Two Projects:
• OSG Cluster-in-a-Box for “T3”
• Distributed Xrootd Cache for “T2”
20. PRP Distributed Tier-2 Cache Across Caltech & UCSD
Each site runs cache servers behind a redirector, with a top-level cache redirector spanning UCSD and Caltech; applications can connect to the Global Data Federation of CMS at a local or the top-level cache redirector, and the system can be tested as individual or joint caches.
Provisioned pilot systems:
• PRP UCSD: 9 x 12 SATA disks of 2 TB @ 10 Gbps for each system
• PRP Caltech: 2 x 30 SATA disks of 6 TB @ 40 Gbps for each system
Production use (UCSD only): I/O in production is limited by the number of applications hitting the cache and their I/O patterns.
Source: Frank Würthwein, OSG, UCSD/SDSC, PRP
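To make the data path concrete, here is a hedged sketch of how an analysis job could read a CMS file through such a cache redirector; the redirector hostname, file path, and tree name are invented placeholders, reading root:// URLs requires an XRootD client to be installed, and production CMS jobs use their own tooling rather than this snippet.

```python
# Hypothetical sketch: open a file via a cache redirector so reads are served
# (and cached) by the distributed Tier-2 cache instead of the origin site.
import uproot

url = "root://cache-redirector.example.edu//store/mc/SAMPLE/events.root"  # placeholder

with uproot.open(url) as f:
    tree = f["Events"]               # assumed tree name, for illustration only
    print(tree.num_entries, "events readable through the cache")
```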
21. Game Changer: Using Kubernetes to Manage Containers Across the PRP
"Kubernetes is a way of stitching together a collection of machines into, basically, a big computer." --Craig McLuckie, Google, and now CEO and founder of Heptio
"Everything at Google runs in a container." --Joe Beda, Google
"Kubernetes has emerged as the container orchestration engine of choice for many cloud providers including Google, AWS, Rackspace, and Microsoft, and is now being used in HPC and Science DMZs." --John Graham, Calit2/QI, UC San Diego
See talk by: Rob Gardner
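A small sketch of what "stitching machines into a big computer" looks like from the user side: listing every node the cluster's Kubernetes API server knows about, with its advertised GPU capacity. It assumes the official `kubernetes` Python client and a kubeconfig already pointing at a (here hypothetical) PRP/Nautilus cluster.

```python
# Sketch: enumerate cluster nodes and their GPU capacity via the Kubernetes API.
from kubernetes import client, config

config.load_kube_config()            # assumes credentials in ~/.kube/config
v1 = client.CoreV1Api()

total_gpus = 0
for node in v1.list_node().items:
    gpus = int(node.status.capacity.get("nvidia.com/gpu", 0))
    total_gpus += gpus
    print(f"{node.metadata.name:40s} {gpus:3d} GPUs")

print(f"Cluster-wide GPU capacity: {total_gpus}")
```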
22. Distributed Computation on the PRP Nautilus HyperCluster: Coupling SDSU Cluster and SDSC Comet Using Kubernetes Containers
Simulating the Injection of CO2 in Brine-Saturated Reservoirs: poroelastic and pressure-velocity fields solved in parallel with MPI, using domain decomposition across containers.
• Developed and executed MPI-based execution on the PRP Kubernetes cluster
• [CO2,aq] 100-year simulation, run in 4 days (figure panels at 25, 75, and 100 years)
• Domain: 0.5 km x 0.5 km x 17.5 m; three sandstone layers separated by two shale layers
Source: Chris Paolini and Jose Castillo, SDSU
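The SDSU code itself is not shown in the deck; below is a generic mpi4py sketch of the pattern it describes: one MPI rank per container, each owning a slab of the domain and exchanging halo cells every step, with the actual poroelastic and pressure-velocity update stubbed out. Grid size and step count are illustrative only.

```python
# Generic sketch of MPI domain decomposition across containers (not the SDSU code).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N = 1200                                        # illustrative global cell count
local = np.zeros(N // size + 2)                 # this rank's slab plus two halo cells
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(100):
    # exchange halo cells with neighboring subdomains
    comm.Sendrecv(local[1:2], dest=left, recvbuf=local[-1:], source=right)
    comm.Sendrecv(local[-2:-1], dest=right, recvbuf=local[:1], source=left)
    # ... update interior cells here (poroelastic & pressure-velocity solve) ...

if rank == 0:
    print(f"completed {size}-way domain-decomposed run")
```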
23. Rook is Ceph Cloud-Native Object Storage
‘Inside’ Kubernetes
https://rook.io/
Source: John Graham, Calit2/QI
See talk by:
Shawn McKee
24. FIONA8: Adding GPUs to FIONAs Supports Data Science Machine Learning
Multi-Tenant Containerized GPU JupyterHub Running Kubernetes / CoreOS
Eight Nvidia GTX 1080 Ti GPUs, 32 GB RAM, 3 TB SSD, 40G and dual 10G ports; ~$13K
Source: John Graham, Calit2
25. Nautilus: A Multi-Tenant Containerized PRP HyperCluster for Big Data Applications, Running Kubernetes with Rook/Ceph Cloud-Native Storage and GPUs for Machine Learning
[Cluster diagram: FIONA8 nodes and 40G SSD / 100G NVMe FIONAs at Calit2, SDSC, SDSU, Caltech, UCAR, UCI, UCR, USC, UCLA, Stanford, UCSB, UCSC, and Hawaii, plus an sdx-controller and controller-0.]
Rook/Ceph block/object/file storage, Swift-API compatible with SDSC, AWS, and Rackspace; Kubernetes on CentOS 7.
Source: John Graham, Calit2/QI, March 2018
26. Running Kubernetes/Rook/Ceph on the PRP Allows Us to Deploy a Distributed PB+ of Storage for Posting Science Data
[Cluster diagram: the same Nautilus sites as above, with FIONA8 nodes and 40G 160 TB / 100G NVMe FIONAs at Calit2, SDSC, SDSU, Caltech, UCAR, UCI, UCR, USC, UCLA, Stanford, UCSB, UCSC, and Hawaii.]
Rook/Ceph block/object/file storage, Swift-API compatible with SDSC, AWS, and Rackspace; Kubernetes on CentOS 7.
Source: John Graham, UCSD, March 2018
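Ceph's gateway exposes an S3-compatible API alongside Swift, so posting a dataset to this kind of distributed store can look like the sketch below; the endpoint URL, credentials, bucket, and file names are placeholders rather than real PRP values.

```python
# Hypothetical sketch: post a dataset to a Rook/Ceph object store via its
# S3-compatible gateway. Endpoint, credentials, and names are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.nautilus.example.org",   # placeholder Ceph RGW endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="my-science-data")
s3.upload_file("results/co2_run_100yr.h5", "my-science-data", "co2_run_100yr.h5")

for obj in s3.list_objects_v2(Bucket="my-science-data").get("Contents", []):
    print(obj["Key"], obj["Size"])
```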
27. Collaboration Opportunity with OSG & PRP on Distributed Storage
OSG is operating a distributed caching CI; at present, 4 caches provide significant use, and the total data volume pulled last year is dominated by those 4 caches (1.8 PB, 1.6 PB, 1.2 PB, and 210 TB).
PRP Kubernetes infrastructure could either grow existing caches by adding servers or add additional locations.
StashCache users include LIGO and DES.
See talks by: Alex Feltus, Derek Weitzel, and Marcelle Soares-Santos
Source: Frank Würthwein, OSG, UCSD/SDSC, PRP
28. New NSF CHASE-CI Grant Creates a Community Cyberinfrastructure: Adding a Machine Learning Layer Built on Top of the Pacific Research Platform
Participating campuses: Caltech, UCB, UCI, UCR, UCSD, UCSC, Stanford, MSU, UCM, SDSU
NSF grant for a high-speed "cloud" of 256 GPUs for 30 ML faculty and their students at 10 campuses, for training AI algorithms on big data.
NSF Program Officer: Mimi McClure
29. UCSD Adding >350 Game GPUs to Data Sciences Cyberinfrastructure, Devoted to Data Analytics and Machine Learning
• 48 GPUs for OSG applications
• SunCAVE: 70 GPUs
• WAVE + Vroom: 48 GPUs
• FIONAs with 8 game GPUs each: 88 GPUs for students
• CHASE-CI grant provides 96 GPUs at UCSD for training AI algorithms on big data
30. Next Step: Surrounding the PRP Machine Learning Platform with Clouds of GPUs and Non-Von Neumann Processors
• Microsoft installs Altera FPGAs into Bing servers, and 384 into TACC for academic access
• CHASE-CI 64-TrueNorth cluster
• 64-bit GPUs (4352x NVIDIA Tesla V100 GPUs)
See talk by: Hurtado Anampa
31. PRP is Partnering with NSF Grants Supporting Advanced Cyberinfrastructure Facilitators to Explore PRP Extension Toward the NRP
PRP Connected: ACI-REF and CaRCC
ACI-REF has also spawned the 35-member Campus Research Computing Consortium (CaRCC), funded by the NSF as a Research Coordination Network (RCN). CaRCC is dedicated to sharing best practices, expertise, and resources, enabling the advancement of campus-based research computing activities across the nation.
Jim Bottum, Principal Investigator
Tom Cheatham, ACI-REF Chair of Campus PIs
See talk by: Tom Cheatham
32. Expanding to the Global Research Platform Via CENIC/Pacific Wave, Internet2, and International Links
PRP's current international partners: Netherlands, Guam, Australia, Korea, Japan, and Singapore.
Korea shows distance is not the barrier to above-5 Gb/s disk-to-disk performance.
33. The Second National Research Platform Workshop, Bozeman, MT, August 6-7, 2018
A follow-up FIONA workshop will be held as a lead-in to the 2nd NRP workshop in Bozeman, starting August 2nd. While the workshop will be open to the community, there is a specific focus on EPSCoR-affiliated and minority-serving institutions.
Co-Chairs: Larry Smarr, Calit2; Inder Monga, ESnet; Ana Hunsinger, Internet2
Local Host: Jerry Sheehan, MSU
34. Our Support:
• US National Science Foundation (NSF) awards
CNS 0821155, CNS-1338192, CNS-1456638, CNS-1730158,
ACI-1540112, & ACI-1541349
• University of California Office of the President CIO
• UCSD Chancellor’s Integrated Digital Infrastructure Program
• UCSD Next Generation Networking initiative
• Calit2 and Calit2 Qualcomm Institute
• CENIC, Pacific Wave and StarLight
• DOE ESnet