This document summarizes a lecture given by Dr. Larry Smarr on high performance cyberinfrastructure for data-intensive research. The summary discusses:
1) The need for dedicated high-bandwidth networks separate from the shared internet to enable big data research due to the increasing volume of digital scientific data.
2) Extensions being made to networks like CENIC in California to provide campus "Big Data Freeways" connecting instruments, computing resources, and remote facilities.
3) The use of networks like HPWREN to provide high-performance wireless access in rural areas for data-intensive applications like astronomy, wildfire detection, and more.
From the Shared Internet to Personal Lightwaves: How the OptIPuter is Transfo... (Larry Smarr)
The document summarizes how the OptIPuter project is transforming scientific research through user-controlled high-speed optical network connections. It provides examples of how 1-10Gbps connections through projects like National LambdaRail are enabling new forms of collaborative work and access to scientific instruments and global data repositories. The OptIPuter creates an environment where researchers can access remote resources through local "OptIPortals" connected to these high-speed optical networks.
The Importance of Large-Scale Computer Science Research Efforts (Larry Smarr)
05.10.20
Talk at a Public Seminar on Large-Scale NSF Research Efforts for the Future, Computer History Museum
Title: The Importance of Large-Scale Computer Science Research Efforts
Mountain View, CA
The document discusses the growing carbon footprint of information and communication technologies (ICT) and efforts to make cyberinfrastructure more energy efficient and environmentally sustainable. Specifically, it mentions that (1) ICT energy usage is growing rapidly and accounts for 2% of global greenhouse gas emissions, (2) universities are working on initiatives like the GreenLight project to reduce ICT energy usage through techniques like dynamic power management, and (3) further research is needed to develop more energy-efficient computing technologies, data center designs, and videoconferencing solutions to reduce the need for travel.
Global Telepresence in Support of Global Public Health (Larry Smarr)
The document discusses Calit2's efforts to develop global telepresence technologies to support public health initiatives. It describes Calit2's work in building a multidisciplinary research network across UC campuses, developing telemedicine systems, and applying technologies like optical networks to enable real-time collaboration and data sharing in fields like genomics, metagenomics, and cellular imaging.
High Performance Cyberinfrastructure is Needed to Enable Data-Intensive Scien... (Larry Smarr)
11.03.28
Remote Luncheon Presentation from Calit2@UCSD
National Science Board
Expert Panel Discussion on Data Policies
National Science Foundation
Title: High Performance Cyberinfrastructure is Needed to Enable Data-Intensive Science and Engineering
Arlington, Virginia
Toward a Global Interactive Earth Observing Cyberinfrastructure (Larry Smarr)
The document discusses the need for a new generation of cyberinfrastructure to support interactive global earth observation. It outlines several prototyping projects that are building examples of systems enabling real-time control of remote instruments, remote data access and analysis. These projects are driving the development of an emerging cyber-architecture using web and grid services to link distributed data repositories and simulations.
SC21: Larry Smarr on The Rise of Supernetwork Data Intensive Computing
Larry Smarr, founding director of Calit2 (now Distinguished Professor Emeritus at the University of California San Diego) and the first director of NCSA, is one of the seminal figures in the U.S. supercomputing community. What began as a personal drive, shared by others, to spur the creation of supercomputers in the U.S. for scientific use, later expanded into a drive to link those supercomputers with high-speed optical networks, and blossomed into the notion of building a distributed, high-performance computing infrastructure – replete with compute, storage and management capabilities – available broadly to the science community.
National Federated Compute Platforms: The Pacific Research Platform (Larry Smarr)
The Pacific Research Platform (PRP) is a multi-institution hypercluster that connects science DMZs across 25 partner campuses using FIONA data transfer nodes and 10-100Gbps networks. PRP adopted Kubernetes and Rook to orchestrate petabytes of distributed storage and GPUs for data science applications. A CHASE-CI grant added machine learning capabilities. PRP is working to federate with the Open Science Grid and become a prototype for a future National Research Platform connecting regional networks.
Remote Telepresence for Exploring Virtual Worlds (Larry Smarr)
The document describes the history and development of remote telepresence and virtual reality technologies over several decades. It outlines key projects and innovations including the NSFnet which connected supercomputers in the 1980s, the development of the CAVE virtual reality system in the early 1990s, and more advanced optical network projects like OptIPuter in the 2000s which enabled high-resolution telepresence and collaboration across global research centers.
The Singularity: Toward a Post-Human Reality (Larry Smarr)
06.02.13
Talk to UCSD's Sixth College
Honors Course on Kurzweil's The Singularity Is Near
Title: The Singularity: Toward a Post-Human Reality
La Jolla, CA
Introduction to the UCSD Division of Calit2 (Larry Smarr)
Calit2 is a research institute at UC San Diego that focuses on digital transformation of fields like health, environment, and education through technologies like mobile phones, sensors, virtual/augmented reality, and high-performance computing networks. The director gave a tour of Calit2's facilities, which include laboratories for nanotechnology, digital media, and medical research using technologies like social mobile apps, environmental sensors on phones, human-robot interaction, and optical networks connecting instruments and storage. Calit2 works with affiliated academic units and industry partners to develop innovative applications and testbeds for areas like telemedicine, digital cinema, virtual reality displays, and telepresence.
The document discusses the history and future of telepresence technology. It describes early visions of telepresence from the 1960s, prototypes in the 1980s, and partnerships in the 1990s that helped advance the technology. It outlines current infrastructure like National LambdaRail that enables remote collaboration and explores future possibilities like connecting very large displays and bringing gigabit internet to homes.
A California-Wide Cyberinfrastructure for Data-Intensive Research (Larry Smarr)
The document discusses creating a California-wide cyberinfrastructure for data-intensive research. It outlines efforts to connect all UC campuses and other research institutions across California with high-speed optical networks. This would create a "big data plane" to share large datasets. Several campuses have received NSF grants to upgrade their networks and implement Science DMZ architectures with 10-100Gbps connections to CENIC. Connecting these resources would provide researchers access to high-performance computing, large scientific instruments, and datasets. This would support collaborative big data science across disciplines like physics, climate modeling, genomics and microscopy.
Discovering Yourself with Computational Bioinformatics (Larry Smarr)
This document summarizes a presentation given by Dr. Larry Smarr on his self-experimentation with quantifying biomarkers and 'omics data to gain insights into his health. Smarr has tracked over 100 blood biomarkers, sequenced his microbiome, and analyzed over 1 million SNPs from his genome. Computational analysis of this data helped diagnose Smarr with Crohn's disease and revealed shifts in his microbiome from healthy to diseased states. Smarr advocates integrating multi-omics data to achieve predictive, preventative and participatory medicine.
Quantifying your Superorganism: Your Gut Microbiome and its Interactions with... (Larry Smarr)
This document summarizes a lecture given by Dr. Larry Smarr on quantifying one's gut microbiome and its interactions with the immune system. Dr. Smarr discussed how analyzing his own medical data over many years revealed he had an autoimmune disease like inflammatory bowel disease (IBD). By sequencing his microbiome, he found major shifts between healthy and IBD states, with collapses in some bacterial phyla and explosions in others. Dr. Smarr's therapy reduced two phyla greatly but massive reductions remained, leaving him "trapped" in an unfavorable microbial ecology. However, he is now able to track his microbiome over time using new technologies, giving him data and hope to improve his condition.
Using Dell’s HPC Cloud & Advanced Analytic Software to Discover Radical Chang... (Larry Smarr)
This document summarizes a talk given by Dr. Larry Smarr on how he used Dell's HPC Cloud and advanced analytics software to analyze over 300 human gut microbiome samples. He was able to discover distinct microbial signatures associated with health and different diseases like ulcerative colitis and Crohn's disease. Dell's analytics software effectively separated and classified the samples by health status and disease type using only a few key microbial species. This research could lead to new microbial diagnostics for inflammatory bowel diseases.
Using Data Analytics to Discover the 100 Trillion Bacteria Living Within Each... (Larry Smarr)
The document summarizes Dr. Larry Smarr's talk on using data analytics to analyze the human microbiome. Some key points:
- Next-generation sequencing and supercomputing are used to map the microbiomes of hundreds of people to analyze bacterial species abundance in health and diseases like IBD.
- Analysis with Dell Analytics and Ayasdi reveals major differences in bacterial phyla and protein families between healthy and disease states that can be used to noninvasively diagnose disease. Certain species are found at much higher or lower levels in disease states.
- Continued microbiome profiling and topological data analysis may help discover new diagnostic biomarkers for disease states and track disease progression.
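The idea of separating health states from the abundance of a few key microbial species can be illustrated with a toy nearest-centroid classifier. This is a minimal sketch, not the Dell Analytics or Ayasdi pipeline from the talk, and all taxa names and abundance values below are invented for illustration.

```python
# Toy nearest-centroid classifier over relative-abundance vectors.
# Values and class labels are hypothetical, for illustration only.

def centroid(samples):
    """Element-wise mean of a list of abundance vectors."""
    n = len(samples)
    return [sum(v[i] for v in samples) / n for i in range(len(samples[0]))]

def classify(sample, centroids):
    """Assign a sample to the class whose centroid is nearest (Euclidean)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(centroids, key=lambda label: dist(sample, centroids[label]))

# Hypothetical relative abundances of two key phyla per stool sample.
training = {
    "healthy": [[0.55, 0.35], [0.50, 0.40], [0.60, 0.30]],
    "ibd":     [[0.20, 0.05], [0.15, 0.10], [0.25, 0.08]],
}
centroids = {label: centroid(samples) for label, samples in training.items()}

print(classify([0.52, 0.33], centroids))  # near the healthy centroid
print(classify([0.18, 0.07], centroids))  # near the IBD centroid
```

With well-separated phylum-level signatures like those described in the talk, even this simple rule assigns new samples correctly; real pipelines use far more taxa and more robust models.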
A Systems Approach to Personalized Medicine (Larry Smarr)
This talk discusses how one man used various omics technologies like genomics, metagenomics, metabolomics, and imaging to gain insights into his own health. Over a decade, he tracked over a billion data points about himself including his microbiome, genome, blood variables, and medical images. This led to the discovery that he had an inflammatory bowel disease. He then used multi-omics analyses and computing resources to study his condition and microbiome in detail over time. This is an example of a systems approach to personalized medicine.
The Quantified Self Movement: Technologies Revolutionizing Health and Fitness (Larry Smarr)
2014.01.15
Calit2 Director Larry Smarr talks to the MIT Enterprise Forum San Diego about the self-monitoring revolution and its impact on technologies for health and fitness.
Digital Culture and the Future Internet (Larry Smarr)
The document discusses the history and growth of the Internet from its origins in the 1970s to the present day. It notes that traffic on the Internet has increased by over 1 trillion-fold and summarizes some of the key events and innovations that drove this exponential growth, such as the creation of the World Wide Web and Mosaic browser. It also presents trends shaping the future internet, including increased integration with the physical world through wireless sensor networks and a transition to more sustainable, climate-resilient digital infrastructure.
The Pacific Research Platform (PRP) is a multi-institutional cyberinfrastructure project that connects researchers across California and beyond to share large datasets. It spans the 10 University of California campuses, major private research universities, supercomputer centers, and some out-of-state universities. Fifteen multi-campus research teams in fields like physics, astronomy, earth sciences, biomedicine, and multimedia will drive the technical needs of the PRP over five years. The goal is to create a "big data freeway" to allow high-speed sharing of data between research labs, supercomputers, and repositories across multiple networks without performance loss over long distances.
How Studying Astrophysics and Coral Reefs Enabled Me to Become an Empowered,... (Larry Smarr)
This document summarizes Dr. Larry Smarr's talk on how his background in astrophysics and studying coral reefs enabled him to become an empowered patient by closely monitoring his gut microbiome. Some key findings from analyzing his stool samples over time included discovering oscillations in his immune system, invasions of opportunistic bacteria after disruptions, and evidence of chaos theory at play. Larger studies are now analyzing data from many individuals to better understand the dynamics of the human immune and microbiome systems.
Observing the Dynamics of the Human Immune System Coupled to the Microbiome i... (Larry Smarr)
Calit2 Director Larry Smarr delivered this presentation to the CASIS Workshop on Biomedical Research Aboard the ISS at Columbia University in NY, NY, on May 28, 2014.
Toward Novel Human Microbiome Surveillance Diagnostics to Support Public Health (Larry Smarr)
The document discusses ongoing research into understanding the human microbiome and its role in health and disease. It outlines how sequencing costs have dropped dramatically, enabling analysis of both human and microbial genomes. Several studies are highlighted that use microbiome profiling to differentiate between healthy individuals and those with various forms of inflammatory bowel disease.
Big Data and Superorganism Genomics: Microbial Metagenomics Meets Human Genomics (Larry Smarr)
This presentation on February 27, 2014 to NGS and the Future of Medicine at Illumina Headquarters in La Jolla, CA, was made by Calit2 Director Larry Smarr.
Quantified Self On Being A Personal Genomic Observatory (Larry Smarr)
Larry Smarr's presentation on the "Quantified Self On Being A Personal Genomic Observatory", Keynote in the "Humans as Genomic Observatories" Meeting Session in the Genomics Standards Consortium, GSC 15, April 24, 2013
Commercializing Space: From the Moon to Mars (Larry Smarr)
Panel discussion featuring Calit2 Director Larry Smarr, former FAA associate administrator and aerospace consultant Patti Grace Smith, and nonfiction author Michael Sims at the Future in Review Conference on May 21, 2014 in Laguna Beach, Calif.
Towards a High-Performance National Research Platform Enabling Digital Research (Larry Smarr)
The document summarizes Dr. Larry Smarr's keynote presentation on enabling a high-performance national research platform. It describes how multi-institutional research increasingly relies on access to large datasets, requiring new cyberinfrastructure. The Pacific Research Platform provides high-bandwidth networking between universities to support research collaborations across disciplines. The next steps involve scaling this model into a national and global platform. The presentation highlights how the PRP enables various scientific applications and drives innovation through improved data transfer capabilities and distributed computing resources.
The Pacific Research Platform: a Science-Driven Big-Data Freeway System (Larry Smarr)
The Pacific Research Platform will create a regional "Big Data Freeway System" along the West Coast to support science. It will connect major research institutions with high-speed optical networks, allowing them to share vast amounts of data and computational resources. This will enable new forms of collaborative, data-intensive research for fields like particle physics, astronomy, biomedicine, and earth sciences. The first phase aims to establish a basic networked infrastructure, with later phases advancing capabilities to 100Gbps and beyond with security and distributed technologies.
2014.02.06
Calit2 Director Larry Smarr invited short talk to a workshop on "Enriching Human Life and Society," one of the planned themes for the UCSD Strategic Plan to be adopted in 2014.
Peering The Pacific Research Platform With The Great Plains Network (Larry Smarr)
The Pacific Research Platform (PRP) connects research institutions across the western United States with high-speed networks to enable data-intensive science collaborations. Key points:
- The PRP connects 15 campuses across California and links to the Great Plains Network, allowing researchers to access remote supercomputers, share large datasets, and collaborate on projects like analyzing data from the Large Hadron Collider.
- The PRP utilizes Science DMZ architectures with dedicated data transfer nodes called FIONAs to achieve high-speed transfer of large files. Kubernetes is used to manage distributed storage and computing resources.
- Early applications include distributed climate modeling, wildfire science, plankton imaging, and cancer genomics.
The document discusses the Pacific Research Platform (PRP), a distributed cyberinfrastructure that connects researchers and data across multiple campuses in California and beyond using optical fiber networking. Key points:
- The PRP uses high-speed networking infrastructure like the CENIC network to connect data generators and consumers across 15+ campuses, creating an integrated "big data freeway system".
- It deploys specialized data transfer nodes called FIONAs to enable high-speed transfer of large datasets between sites at near the full network speed.
- Recent additions include using Kubernetes to orchestrate containers across the PRP infrastructure and integrating machine learning resources through the CHASE-CI grant to support data-intensive AI applications.
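The summaries above mention Kubernetes orchestrating containers and GPUs across PRP nodes. As a rough sketch of what that involves, the snippet below renders a minimal pod specification requesting one GPU; all names (namespace, image, pod name) are hypothetical, and only `nvidia.com/gpu` is the standard Kubernetes extended-resource name, not a PRP-specific detail.

```python
# Render a minimal Kubernetes pod spec requesting one GPU.
# Namespace, pod name, and image are invented for illustration.
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "ml-train", "namespace": "chase-ci-demo"},
    "spec": {
        "containers": [{
            "name": "trainer",
            "image": "example.org/lab/trainer:latest",  # hypothetical image
            # Standard extended-resource request for an NVIDIA GPU:
            "resources": {"limits": {"nvidia.com/gpu": 1}},
        }],
        "restartPolicy": "Never",
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

A spec like this (typically written as YAML and submitted with `kubectl apply`) is how a scheduler places GPU work onto whichever cluster node advertises a free GPU, which is the orchestration role Kubernetes plays in the PRP description above.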
A National Big Data Cyberinfrastructure Supporting Computational Biomedical R... (Larry Smarr)
Invited Presentation
Symposium on Computational Biology and Bioinformatics:
Remembering John Wooley
National Institutes of Health
Bethesda, MD
July 29, 2016
The Pacific Research Platform: a Science-Driven Big-Data Freeway System (Larry Smarr)
The Pacific Research Platform (PRP) is a multi-institutional partnership that establishes a high-capacity "big data freeway system" spanning the University of California campuses and other research universities in California to facilitate rapid data access and sharing between researchers and institutions. Fifteen multi-campus application teams in fields like particle physics, astronomy, earth sciences, biomedicine, and visualization drive the technical design of the PRP over five years. The goal of the PRP is to extend campus "Science DMZ" networks to allow high-speed data movement between research labs, supercomputer centers, and data repositories across campus, regional, and national networks.
- The Pacific Research Platform (PRP) interconnects campus DMZs across multiple institutions to provide high-speed connectivity for data-intensive research.
- The PRP utilizes specialized data transfer nodes called FIONAs that provide disk-to-disk transfer speeds of 10-100Gbps.
- Early applications of the PRP include distributing telescope data between UC campuses, connecting particle physics experiments to computing resources, and enabling real-time wildfire sensor data analysis.
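The value of FIONA-class disk-to-disk rates is easiest to see with a back-of-envelope calculation of how long a large dataset takes to move at 10 versus 100 Gbps. The 50 TB dataset size below is an illustrative assumption, not a figure from the talks.

```python
# How long does a large dataset take to move at FIONA-class rates?
def transfer_hours(size_terabytes, rate_gbps):
    """Hours to move size_terabytes at rate_gbps (1 TB = 8e12 bits)."""
    bits = size_terabytes * 8e12
    return bits / (rate_gbps * 1e9) / 3600

for rate_gbps in (10, 100):
    hours = transfer_hours(50, rate_gbps)
    print(f"50 TB at {rate_gbps} Gbps: {hours:.1f} hours")
# 50 TB takes about 11.1 hours at 10 Gbps, about 1.1 hours at 100 Gbps.
```

These are idealized line-rate numbers; in practice, sustaining them end-to-end is exactly why the PRP pairs Science DMZs with tuned data transfer nodes rather than routing through general-purpose campus networks.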
An Integrated Science Cyberinfrastructure for Data-Intensive Research (Larry Smarr)
This document summarizes Dr. Larry Smarr's vision for an integrated science cyberinfrastructure to support data-intensive research. It discusses the exponential growth of digital data and need for dedicated high-bandwidth networks and data repositories. Specific examples are provided of initiatives at UCSD, regional optical networks connecting research institutions, and national projects like the Open Science Grid and Cancer Genomics Hub that are creating cyberinfrastructure to enable data-intensive scientific discovery.
Using the Pacific Research Platform for Earth Sciences Big Data (Larry Smarr)
Grand Challenge Lecture
Big Data and the Earth Sciences: Grand Challenges Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
May 31, 2017
Positioning University of California Information Technology for the Future: S... (Larry Smarr)
05.02.15
Invited Talk
The Vice Chancellor of Research and Chief Information Officer Summit
“Information Technology Enabling Research at the University of California”
Title: Positioning University of California Information Technology for the Future: State, National, and International IT Infrastructure Trends and Directions
Oakland, CA
The Rise of Supernetwork Data Intensive Computing (Larry Smarr)
Invited Remote Lecture to SC21
The International Conference for High Performance Computing, Networking, Storage, and Analysis
St. Louis, Missouri
November 18, 2021
My Remembrances of Mike Norman Over The Last 45 Years (Larry Smarr)
Mike Norman has been a leader in computational astrophysics for over 45 years. Some of his influential work includes:
- Cosmic jet simulations in the early 1980s which helped explain phenomena from galactic centers.
- Pioneering the use of adaptive mesh refinement in the 1990s to achieve dynamic load balancing on supercomputers.
- Massive cosmology simulations in the late 2000s with over 100 trillion particles using thousands of processors across multiple supercomputing sites, producing petabytes of data.
- Developing end-to-end workflows in the 2000s to couple supercomputers, high-speed networks, and large visualization systems to enable real-time analysis of extremely large astrophysics simulations.
Metagenics: How Do I Quantify My Body and Try to Improve its Health? June 18, 2019 (Larry Smarr)
Larry Smarr discusses quantifying his body and health over time through extensive self-tracking. He measures various biomarkers through regular blood tests and analyzes his gut microbiome by sequencing stool samples. This revealed issues like chronic inflammation and an unhealthy microbiome. Smarr then took steps like a restricted eating window and increasing plant diversity in his diet, which reversed metabolic syndrome issues and correlated with shifts in his microbiome ecology. His goal is to continue precisely measuring factors like toxins, hormones, gut permeability and food/supplement impacts to further optimize his health.
Panel: Reaching More Minority Serving Institutions (Larry Smarr)
This document discusses engaging more minority-serving institutions (MSIs) in cyberinfrastructure development through regional networks. It presents data showing the importance of MSIs, such as historically black colleges and universities (HBCUs), in educating underrepresented minority students in STEM fields. Regional networks can help equalize opportunity by assisting MSIs in overcoming barriers to resources: training, networking infrastructure support, and help obtaining necessary staffing and funding. Strategies mentioned include collaborating with MSIs on grants and addressing issues identified in surveys, such as a lack of vision for data use beyond compliance. The goal is to broaden participation in STEM fields by building on the success MSIs have shown in supporting underrepresented students.
Global Network Advancement Group - Next Generation Network-Integrated Systems (Larry Smarr)
This document summarizes a presentation on global petascale to exascale workflows for data intensive sciences. It discusses a partnership convened by the GNA-G Data Intensive Sciences Working Group with the mission of meeting challenges faced by data-intensive science programs. Cornerstone concepts that will be demonstrated include integrated network and site resource management, model-driven frameworks for resource orchestration, end-to-end monitoring with machine learning-optimized data transfers, and integrating Qualcomm's GradientGraph with network services to optimize applications and science workflows.
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us... (Larry Smarr)
This document discusses opportunities for ESnet to support wireless edge computing through developing a strategy around self-guided field laboratories (SGFL). It outlines several potential science use cases that could benefit from wireless and distributed computing capabilities, both in the short term through technologies like 5G, LoRa and Starlink, and longer term through the vision of automated SGFL. The document proposes some initial ideas for deploying and testing wireless edge computing technologies through existing projects to help enable the SGFL vision and further scientific opportunities. It emphasizes that exploring these emerging areas could help drive new science possibilities if done at a reasonable scale.
High Performance Cyberinfrastructure for Data-Intensive Research
1. “High Performance Cyberinfrastructure
for Data-Intensive Research”
Distinguished Lecture
UC Riverside
October 18, 2013
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information
Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
2. Abstract
With the increasing number of digital scientific instruments and sensornets available
to university researchers, a high performance cyberinfrastructure (HPCI),
separate from the shared Internet, is becoming a necessity. The backbone of such an
HPCI is dedicated wavelengths of light on optical fiber, typically with speeds of
10Gbps (10,000 megabits/sec), roughly 1000x the speed of the shared Internet. We
are fortunate in California to have one of the most advanced optical state networks,
the CENIC research and education network. I will describe future extensions of the
CENIC backbone to enable a wide range of disciplinary Big Data research. One
extension involves building optical fiber "Big Data Freeways" on UC campuses,
similar to the NSF-funded PRISM network now being deployed on the UCSD campus,
to feed the coming 100Gbps CENIC campus connections. These Freeways connect
on-campus end users, compute and storage resources, and data-generating devices,
such as scientific instruments, with remote Big Data facilities. I will describe uses of
PRISM ranging from particle physics to biomedical data to climate research. The
second type of extension is high performance wireless networks to cover the rural
regions of our counties, similar to the NSF-funded High Performance Wireless
Research and Education Network (HPWREN) currently deployed in San Diego and
Imperial counties. HPWREN has enabled data-intensive astronomy observations,
wildfire detection, first responder connectivity, Internet access to Native American
reservations, seismic networks, and nature observatories.
4. The Data-Intensive Discovery Era Requires
High Performance Cyberinfrastructure
• Growth of Digital Data is Exponential
– “Data Tsunami”
• Driven by Advances in Digital Detectors, Computing,
Networking, & Storage Technologies
• Shared Internet Optimized for Megabyte-Size Objects
• Need Dedicated Photonic Cyberinfrastructure for
Gigabyte/Terabyte Data Objects
• Finding Patterns in the Data is the New Imperative
– Data-Driven Applications
– Data Mining
– Visual Analytics
– Data Analysis Workflows
Source: SDSC
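The bandwidth gap argued above can be made concrete with a back-of-the-envelope transfer-time calculation. A minimal sketch, assuming a shared-Internet-class path of roughly 10 Mbps for illustration (the slide gives only the "1000x" ratio, not an absolute figure):

```python
def transfer_time_seconds(num_bytes, bits_per_second):
    """Ideal, protocol-overhead-free time to move a data object."""
    return num_bytes * 8 / bits_per_second

TB = 10**12  # one terabyte data object

# Shared-Internet-class path (~10 Mbps, illustrative) vs. dedicated 10 Gbps lightpath
shared = transfer_time_seconds(TB, 10 * 10**6)
lightpath = transfer_time_seconds(TB, 10 * 10**9)

print(f"1 TB at 10 Mbps: {shared / 86400:.1f} days")    # ~9.3 days
print(f"1 TB at 10 Gbps: {lightpath / 60:.1f} minutes")  # ~13.3 minutes
```

The thousand-fold speedup turns a terabyte object from a multi-day transfer into a coffee-break one, which is the practical argument for dedicated photonic cyberinfrastructure.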
5. The White House Announcement
Has Galvanized U.S. Campus CI Innovations
6. Global Innovation Centers are Being Connected
with 10,000 Megabits/sec Clear Channel Lightpaths
100 Gbps Commercially Available;
Research on 1 Tbps
Source: Maxine Brown, UIC and Robert Patterson, NCSA
7. Corporation For Education Network Initiatives
In California (CENIC)
• 3,800+ miles of optical fiber
• Members in all 58 counties connect via fiber-optic cable or leased circuits from telecom carriers
• Nearly 10,000 sites connect to CENIC
• 10,000,000+ Californians use CENIC each day
• Governed by members at the segmental level
9. How Can a Campus Connect Its Researchers,
Instruments, and Clusters at 10-100 Gbps?
• Strategic Recommendation to the NSF #3:
– “NSF should create a new program funding high-speed (currently 10 Gbps) connections from campuses to the nearest landing point for a national network backbone. The design of these connections must include support for dynamic network provisioning services and must be engineered to support rapid movement of large scientific data sets.”
– pg. 6, NSF Advisory Committee for Cyberinfrastructure Task Force on Campus Bridging, Final Report, March 2011
– www.nsf.gov/od/oci/taskforces/TaskForceReport_CampusBridging.pdf
• Led to Office of Cyberinfrastructure RFP March 1, 2012
• NSF’s Campus Cyberinfrastructure –
Network Infrastructure & Engineering (CC-NIE) Program
– 1st Area: Data Driven Networking Infrastructure
for the Campus and Researcher
– 2nd Area: Network Integration and Applied Innovation
10. Examples of CC-NIE Winning Proposals
In California
• UC Davis
– Develop Infrastructure for Managing/Transfer/Analysis of Big Data
– LSST (30TB/day), GENOME, and More Including Social Sciences
– Provide Data to Campus Research Groups that Perform Network-Related Research (Security & Performance)
– Create a Software Defined Network (SDN) – Use OpenFlow
– Upgrade Intra-Campus and CENIC Connections
• San Diego State University
– Implementing a Science DMZ through CENIC
– Balancing Performance and Security Needs
– Operational Network Use: security > performance
– Research Network Use: performance > security
• Stanford University
– Develop SDN-Based Private Cloud
– Connect to Internet2 100G Innovation Platform
– Campus-Wide Sliceable/Virtualized SDN Backbone (10-15 switches)
– SDN Control and Management
• Also USC, Caltech, and UCSD
Source: Louis Fox, CENIC CEO
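The SDN proposals above all rest on the match-action model that OpenFlow standardizes: a controller installs prioritized flow rules, and the switch applies the highest-priority rule that matches each packet. A pure-Python toy sketch of that idea (the field names, subnet prefix, and two-rule policy are all invented for illustration; a real deployment would use an OpenFlow controller, not this):

```python
# Hypothetical Science-DMZ prefix for research traffic
RESEARCH_SUBNET = "10.20."

# A flow table: (priority, match predicate, action), as a controller might install it
flow_table = [
    (200, lambda pkt: pkt["dst_ip"].startswith(RESEARCH_SUBNET), "forward:research_port"),
    (100, lambda pkt: True, "forward:campus_core"),  # catch-all default rule
]

def switch(pkt):
    """Apply the highest-priority matching rule, as an OpenFlow switch would."""
    for _prio, match, action in sorted(flow_table, key=lambda r: r[0], reverse=True):
        if match(pkt):
            return action
    return "drop"  # table-miss behavior when no rule matches

print(switch({"dst_ip": "10.20.5.7"}))   # research traffic -> dedicated path
print(switch({"dst_ip": "192.0.2.9"}))   # everything else -> campus core
```

This is what lets a campus steer big-data flows onto a Science DMZ where performance outranks the security middleboxes of the operational network.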
11. Creating a Big Data Freeway System:
Use Optical Fiber with 1000x Shared Internet Speeds
NSF CC-NIE Has Awarded Prism@UCSD Optical Switch
Phil Papadopoulos, SDSC, Calit2, PI
12. Many Disciplines Beginning to Need
Dedicated High Bandwidth on Campus
How to Utilize a CENIC 100G Campus Connection
• Remote Analysis of Large Data Sets
– Particle Physics
• Connection to Remote Campus Compute & Storage Clusters
– Microscopy and Next Gen Sequencers
• Providing Remote Access to Campus Data Repositories
– Protein Data Bank and Mass Spectrometry
• Enabling Remote Collaborations
– National and International
14. UCSD is a Tier-2 LHC Data Center:
CMS Flow into UCSD Physics Dept. Peaks at 2.4 Gbps
Source: Frank Wuerthwein, Physics UCSD
15. Planning for Climate Change in California:
Substantial Shifts on Top of Already High Climate Variability
UCSD Campus Climate Researchers Need to Download
Results from Remote Supercomputer Simulations
to Make Regional Climate Change Forecasts
Dan Cayan
USGS Water Resources Discipline
Scripps Institution of Oceanography, UC San Diego
With much support from Mary Tyree, Mike Dettinger, Guido Franco, and other colleagues
Sponsors: California Energy Commission; NOAA RISA program; California DWR, DOE, NSF
17. Ultra High Resolution Microscopy Images
Created at the National Center for Microscopy Imaging
18. NIH National Center for Microscopy & Imaging Research
Integrated Infrastructure of Shared Resources
Shared Infrastructure
Scientific
Instruments
Local SOM
Infrastructure
End User
Workstations
Source: Steve Peltier, Mark Ellisman, NCMIR
19. Using Calit2’s VROOM to Explore Confocal Light
Microscope Collages of Rat Brains
20. Protein Data Bank (PDB) Needs
Bandwidth to Connect Resources and Users
• Archive of experimentally
determined 3D structures of
proteins, nucleic acids, complex
assemblies
• One of the largest scientific
resources in life sciences
Virus
Hemoglobin
Source: Phil Bourne and
Andreas Prlić, PDB
21. PDB Usage Is Growing Over Time
• More than 300,000 Unique Visitors per Month
• Up to 300 Concurrent Users
• ~10 Structures Downloaded per Second, 24/7/365
• Increasingly Popular Web Services Traffic
Source: Phil Bourne and Andreas Prlić, PDB
22. 2010 FTP Traffic
• RCSB PDB: 159 million entry downloads
• PDBe: 34 million entry downloads
• PDBj: 16 million entry downloads
Source: Phil Bourne and Andreas Prlić, PDB
23. PDB Plans to Establish Global Load Balancing
• Why is it Important?
– Enables PDB to Better Serve Its Users by Providing
Increased Reliability and Quicker Results
• How Will it be Done?
– By More Evenly Allocating PDB Resources at Rutgers and
UCSD
– By Directing Users to the Closest Site
• Need High Bandwidth Between Rutgers & UCSD Facilities
Source: Phil Bourne and Andreas Prlić, PDB
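The "directing users to the closest site" step above is, at its core, a nearest-site selection. A minimal sketch of the idea using great-circle distance (the coordinates are approximate and mine, not the PDB's; production global load balancing is typically done in DNS, not in application code like this):

```python
import math

# Approximate (lat, lon) for the two PDB sites mentioned on the slide
SITES = {"Rutgers": (40.50, -74.45), "UCSD": (32.88, -117.24)}

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def closest_site(user_latlon):
    """Pick the geographically nearest mirror for a user."""
    return min(SITES, key=lambda s: haversine_km(user_latlon, SITES[s]))

print(closest_site((34.05, -118.24)))  # Los Angeles user -> UCSD
print(closest_site((40.71, -74.01)))   # New York user -> Rutgers
```

The high-bandwidth Rutgers–UCSD link the slide calls for is what keeps the two mirrors consistent enough that either can serve any user.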
24. Tele-Collaboration for Audio Post-Production
Realtime Picture & Sound Editing Synchronized Over IP
Skywalker Sound@Marin
Calit2@San Diego
25. Collaboration Between EVL’s CAVE2
and Calit2’s VROOM Over 10Gb Wavelength
Calit2
EVL
Source: NTT Sponsored ON*VECTOR Workshop at Calit2 March 6, 2013
26. Partnering Opportunities with DOE:
ARRA Stimulus Investment for DOE Esnet 100Gbps
National-Scale 100Gbps Network Backbone
Source: Presentation to ESnet Policy Board
27. 100G Addition CENIC to UCSD--Configurable,
High-speed, Extensible Research Bandwidth (CHERuB)
[Network diagram: new 100G DWDM transponders over existing CENIC fiber link the Equinix/L3/CENIC POP at 818 W. 7th, Los Angeles, CA (reaching PacWave, CENIC, Internet2, NLR, ESnet, StarLight, XSEDE, and other R&E networks, plus the existing ESnet SD router) to the SDSC NAP at 10100 Hopkins Drive, La Jolla, CA; up to 3 additional 100G transponders can be attached at each end. On campus, the 100G feed lands on the UCSD/SDSC gateway Juniper MX960 "MX0" (new 2x100G/8x10G line card + optics) and the SDSC Juniper MX960 "Medusa" (new 100G card/optics, new 40G line card + optics), then fans out over Nx10G, 2x40G, and multiple 40G+ connections to the dual Arista 7508 "Oasis" (256x10G and 128x10G to DataOasis/SDSC Cloud), the GORDON compute cluster, SDSC and UCSD DYNES (4x10G), other SDSC resources, the UCSD primary node Cisco 6509 "Node B" serving UCSD production users, and the PRISM@UCSD Arista 7504 serving many UCSD big-data users. Key: pink/black = existing UCSD infrastructure; green/dashed = new components/equipment in the proposal.]
Source: Mike Norman, SDSC
29. We Used SDSC’s Gordon Data-Intensive Supercomputer
to Analyze a Wide Range of Gut Microbiomes
• ~180,000 Core-Hrs on Gordon
– KEGG function annotation: 90,000 hrs
– Mapping: 36,000 hrs
– Duplicates removal: 18,000 hrs
– Assembly: 18,000 hrs
– Other: 18,000 hrs
– Used 16 Cores/Node and up to 50 Nodes
• Gordon RAM Required
– 64GB RAM for Reference DB
– 192GB RAM for Assembly
• Gordon Disk Required
– Ultra-Fast Disk Holds Ref DB for All Nodes
– 8TB for All Subjects
Enabled by a Grant of Time on Gordon from SDSC Director Mike Norman
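The per-stage figures above sum exactly to the quoted total, and with the stated node pool they also bound the wall-clock time. A quick check (pure arithmetic on the slide's own numbers; no assumptions added):

```python
# Per-stage core-hour figures from the slide
stages = {
    "KEGG function annotation": 90_000,
    "Mapping": 36_000,
    "Duplicates removal": 18_000,
    "Assembly": 18_000,
    "Other": 18_000,
}

total = sum(stages.values())
print(f"Total: {total:,} core-hours")  # 180,000 -- matches the slide

# At 16 cores/node on up to 50 nodes (800 cores), the wall-clock lower bound
# under perfect parallelism:
print(f"Wall-clock lower bound: {total / (16 * 50):,.0f} hours")  # 225
```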
30. SDSC’s Triton Shared Computing Cluster (TSCC)
• High Performance Research Computing Facility Offered to UC Researchers (Including UC Riverside)
– Faculty Use Startup Package Funds to Purchase Computing and Storage Time at SDSC
• Hybrid Business Model:
– “Condo” – PIs Purchase Nodes; RCI Subsidizes Operating Fees
– “Hotel” – Pay-as-You-Go Computing Time
• Launched June 2013 – Seeing Strong Interest and Good/Growing Adoption
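To make the hotel model concrete: the editor's notes at the end of this deck quote a pay-as-you-go recharge rate of 2.5 cents per core-hour. A sketch of the cost arithmetic at that rate (illustrative only; actual TSCC pricing varied by node type and year):

```python
HOTEL_RATE = 0.025  # $/core-hour, from the editor's notes

def hotel_cost(core_hours, rate=HOTEL_RATE):
    """Pay-as-you-go cost for a given number of core-hours."""
    return core_hours * rate

# e.g., a run the size of the ~180,000 core-hour microbiome analysis on Gordon
print(f"${hotel_cost(180_000):,.2f}")  # $4,500.00
```

The condo model trades this recurring charge for an up-front node purchase plus a subsidized annual operating fee, which pays off for sustained, predictable workloads.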
31. Comet is a ~2 PF System Architected
for the “Long Tail of Science”
NSF Track 2 award to SDSC
$12M NSF award to acquire
$3M/yr x 4 yrs to operate
Production early 2015
32. High Performance Wireless Research and Education Network
http://hpwren.ucsd.edu/
National Science Foundation awards 0087344, 0426879 and 0944131
34. HPWREN Topology, 360 Degree Cameras
[Topology map. Link-type legend: 155Mbps FDX 6 GHz FCC licensed; 155Mbps FDX 11 GHz FCC licensed; 45Mbps FDX 6 GHz FCC licensed; 45Mbps FDX 11 GHz FCC licensed; 45Mbps FDX 5.8 GHz unlicensed; 45Mbps-class HDX 4.9 GHz; 45Mbps-class HDX 5.8 GHz unlicensed; ~8Mbps HDX 2.4/5.8 GHz unlicensed; ~3Mbps HDX 2.4 GHz unlicensed; 115kbps HDX 900 MHz unlicensed; 56kbps via RCS network; links via Tribal Digital Village Network; dashed = planned.
Site types: backbone/relay nodes, astronomy science sites, biology science sites, earth science sites, university sites (UCSD, SDSU), researcher locations, Native American sites, and First Responder sites. The map shows dozens of named sites (WIDC, KYVW, KNW, GVDA, Santa Rosa, SMER, PFO, AZRY, MONP, MLO, POTR, and others), with a 70+ mile link to SCI. Red circles: HPWREN-supplied cameras; yellow circles: SD County-supplied cameras. Scale is approximately 50 miles; locations are approximate.]
Source: Hans Werner Braun, HPWREN PI
35. Various Real-Time Network Cameras
for Environmental Observations
Source: Hans Werner Braun,
HPWREN PI
36. Time-Lapse Video of Mt. Laguna Chariot Wildfire
From HPWREN Camera (July 8, 2013)
Source: Hans Werner Braun, HPWREN PI
Similar video of the Mountain Fire in Riverside
38. Sensor Inputs: Relative Humidity, Wind Speed, Wind Direction, Fuel Moisture
Real-time computer-generated alerts are triggered if:
condition “A” AND condition “B” AND condition “C”
OR condition “D”
exists, in which case several San Diego emergency officers are paged or emailed, based on HPWREN data parameterization by a CDF Division Chief. This system has been in operation since 2004.
Example alert:
Date: Wed, 4 Aug 2010 09:31:05 -0700
Subject: URGENT weather sensor alert
LP: RH=26.1 WD=135.2 WS=1.9 FM=6.8 AT=80.7 at 20100804.093100
More details at http://hpwren.ucsd.edu/Sensors/
Source: Hans Werner Braun, HPWREN PI
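The trigger logic above is a small boolean rule over parsed sensor readings. A minimal sketch, assuming invented thresholds (the actual conditions A–D are parameterized by the CDF Division Chief and not given on the slide):

```python
import re

def parse_sensor_line(line):
    """Parse an HPWREN report such as
    'LP: RH=26.1 WD=135.2 WS=1.9 FM=6.8 AT=80.7 at 20100804.093100'
    into a dict of float readings keyed by the two-letter field codes."""
    return {k: float(v) for k, v in re.findall(r"([A-Z]{2})=([\d.]+)", line)}

def should_alert(r, rh_max=10.0, ws_min=15.0, fm_max=5.0):
    """Illustrative trigger: (low humidity AND high wind AND dry fuel)
    OR critically dry fuel alone. All thresholds are made up for this sketch."""
    return (r["RH"] < rh_max and r["WS"] > ws_min and r["FM"] < fm_max) \
        or r["FM"] < 3.0

reading = parse_sensor_line(
    "LP: RH=26.1 WD=135.2 WS=1.9 FM=6.8 AT=80.7 at 20100804.093100")
print(should_alert(reading))  # False: this reading misses the sketch thresholds
```

In the deployed system a positive result would page or email the San Diego emergency officers; here it simply returns a boolean.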
39. San Diego Wildfire First Responders
Meeting at Calit2 Aug 25, 2010
SDSC’s Hans-Werner Braun Explains His
High Performance Wireless Research and Education Network
40. Area Situational Awareness for Public Safety Network
(ASAPnet) Extends HPWREN to Connect Fire Stations
Connecting 60 backcountry fire stations as the region nears the peak of its fire season.
Aug. 14, 2013 www.calit2.net/newsroom/release.php?id=2210
41. Creating a Digital “Mirror World”:
Interactive Virtual Reality of San Diego County
Source: Jessica Block, Calit2
0.5-meter image resolution; 2-meter elevation resolution.
42. All Meteorological Stations Are Represented in Realtime:
Wind Direction, Velocity, and Temperature
Source: Jessica Block, Calit2
43. Using Calit2’s Qualcomm Institute NexCAVE
for CAL FIRE Research and Planning
Source: Jessica Block, Calit2
44. A Scalable Data-Driven Monitoring, Dynamic Prediction and
Resilience Cyberinfrastructure for Wildfires (WiFire)
NSF Has Just Awarded the WiFire Grant – Ilkay Altintas, SDSC, PI
Development of end-to-end cyberinfrastructure for “analysis of large dimensional heterogeneous real-time sensor data”
Photo by Bill Clayton
System integration of:
• real-time sensor networks,
• satellite imagery,
• near-real-time data management tools,
• wildfire simulation tools, and
• connectivity to emergency command centers before, during, and after a firestorm.
Editor's Notes
Foundation for fifth-generation architecture.
Change economies and scaling properties. [no more on this slide]
Nodes cost ~$5,000 each plus $495/node/year operating fee (PI share).
“Hotel” – pay-as-you-go computing time – purchase by recharge at 2.5 cents per core-hour.