“Building the Pacific Research Platform:
Supernetworks for Big Data Science”
Steve Jones Internet Lecture
The 67th Annual Conference of the International Communication Association
San Diego, CA
May 26, 2017
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
In every field we see an exponential rise of Big Data, which in turn is demanding new
technological solutions in visualization, machine learning, and high performance
cyberinfrastructure. The rise of artificial intelligence will both be powered by these
developments and be essential for deriving understanding from the tsunami of data. I will
describe how my NSF-funded Pacific Research Platform, which provides an Internet
platform with 100-1000 times the bandwidth of today's commodity Internet to all the
research universities on the West Coast, is being designed from the application needs of
researchers from particle physics to climate to human health. Even fields like
archaeology, digital libraries, and social media analysis are engaged.
The Defining Issue in IT for the Coming Decades:
Machine Intelligence Coupled to Massive Data
May 5, 2015; August 25, 2015
Traffic Control for Autonomous Drone Air Delivery
is Under Development by NASA, Amazon, & Google

Self-Driving Cars From Multiple Companies
Use Advanced Sensors Coupled to Realtime Computing
I Am Living in the Self-Driving Future
Streaming Data From the Tesla Fleet Trains Self-Driving Algorithms:
The “Hive-Mind”
Note: Google Self-Driving Cars Have Only Driven 1.5 Million Miles
The Planetary-Scale Computer Fed by a Trillion Sensors
Will Drive a Global Industrial Internet
Next Decade: One Trillion
“Within the next 20 years the Industrial Internet will have added to the global economy an additional $15 trillion.”
--General Electric

How Can We Build an Academic Cyberinfrastructure
to Enable Collaborative Teams to Discover Patterns From Big Data?
We Have Been Working Toward the Pacific Research Platform for 15 Years:
OptIPuter, Quartzite, Prism
OptIPuter (2002-2009): PI Smarr, Co-PI DeFanti, Co-PI Papadopoulos
Quartzite (2004-2007): PI Papadopoulos, Co-PI Smarr
Prism (2013-2015): PI Papadopoulos, Co-PI Smarr
Giving Individual Researchers Optical Fibers
To Create an On-Campus Big Data Freeway System
Phil Papadopoulos, SDSC, Calit2, PI
UCSD’s 30,000+ Internet Users Travel Over One 10Gbps Fiber
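A rough illustration of why one shared 10 Gbps link motivates dedicated fibers for researchers: the sketch below splits the link evenly among users. The even split, and the assumption that all 30,000 users are active at once, are simplifications introduced here and are not figures from the talk; the function name is hypothetical.

```python
def per_user_share_mbps(link_gbps: float, users: int) -> float:
    """Even share of a shared link, in megabits per second (decimal units)."""
    return link_gbps * 1000 / users


# Assumption: every one of the 30,000 users is active simultaneously.
share = per_user_share_mbps(10, 30_000)
print(f"{share:.2f} Mb/s per user")
```

Under these assumptions each user gets roughly a third of a megabit per second, far below what a single large scientific data transfer needs.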
PRISM is Connecting CERN’s CMS Experiment
To UCSD Physics Department
80 Gbps PRISM Connection Has Been Made

Big Data Science Data Transfer Nodes -
Flash I/O Network Appliances (FIONAs)
UCSD Designed FIONAs To Solve the Disk-to-Disk Data Transfer Problem
at Full Speed on 10G, 40G and 100G Networks
FIONAs: 10/40G, $8,000
FIONette: 1G, $1,000
Phil Papadopoulos, SDSC &
Tom DeFanti, Joe Keefe & John Graham, Calit2
John Graham, Calit2
How Prism Optical Network Transforms Big Data Microbiome Science:
Preparing for Knight/Smarr 1 Million Core-Hour Analysis
Knight Lab
Data Oasis
Knight 1024 Cluster
Emperor & Other Vis Tools
64Mpixel Data Analysis Wall
NSF Has Funded Over 100 Campuses
to Build On-Campus Big Data Freeways
Red 2012 CC-NIE Awardees
Yellow 2013 CC-NIE Awardees
Green 2014 CC*IIE Awardees
Blue 2015 CC*DNI Awardees
Purple Multiple Time Awardees
Source: NSF
Logical Next Step: The Pacific Research Platform Creates
a Regional End-to-End Science-Driven “Big Data Superhighway” System
NSF Grant
$5M 10/2015-10/2020
PI: Larry Smarr, UC San Diego Calit2
Letters of Commitment from:
• 50 Researchers from 15 Campuses
• 32 IT/Network Organization Leaders

PRP’s First 1.5 Years:
Connecting Campus Application Teams and Devices
Cancer Genomics Hub (UCSC) is Housed in SDSC:
Large Data Flows to End Users at UCSC, UCB, UCSF, …
Data Source: David Haussler,
Brad Smith, UCSC
Jan 2016
30,000 TB
Per Year
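To put the 30,000 TB per year figure in network terms, the sketch below converts it to a sustained average rate. It assumes decimal terabytes (10^12 bytes), a 365-day year, and perfectly even traffic; real flows are bursty, so peak rates would be higher. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds


def average_gbps(terabytes_per_year: float) -> float:
    """Sustained average rate in gigabits per second for a yearly data volume."""
    bits_per_year = terabytes_per_year * 1e12 * 8  # decimal TB to bits
    return bits_per_year / SECONDS_PER_YEAR / 1e9


print(f"{average_gbps(30_000):.2f} Gb/s")  # about 7.61 Gb/s
```

Averaged over the year, this volume alone would occupy most of a 10 Gb/s link.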
20x40G PRP-connected
WAVE@UC San Diego
PRP Now Enables
Distributed Virtual Reality
WAVE @UC Merced
Transferring 5 CAVEcam Images from UCSD to UC Merced:
2 Gigabytes now takes 2 Seconds (8 Gb/sec)
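The 8 Gb/sec figure follows directly from the slide's numbers, as this small check shows. The 1 Gb/s comparison link is a hypothetical baseline added for contrast, not a measurement from the talk.

```python
def throughput_gbps(size_gigabytes: float, seconds: float) -> float:
    """Average throughput in gigabits per second (1 byte = 8 bits)."""
    return size_gigabytes * 8 / seconds


def transfer_seconds(size_gigabytes: float, link_gbps: float) -> float:
    """Ideal transfer time on a link, ignoring protocol and disk overhead."""
    return size_gigabytes * 8 / link_gbps


print(throughput_gbps(2, 2))   # 8.0, matching the slide
print(transfer_seconds(2, 1))  # 16.0 on a hypothetical 1 Gb/s link
```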
The Prototype PRP Has Attracted
New Application Drivers
Scott Sellars, Marty Ralph
Center for Western Weather and Water Extremes
Frank Vernon - Expansion of HPWREN
Tom Levy, Cultural Heritage
Cryo EM

Collaboration on Atmospheric Water in the West
Between UC San Diego and UC Irvine
Director: F. Martin Ralph
Big Data Collaboration with:
Director: Soroosh Sorooshian, UC Irvine
Source: Scott Sellars, CW3E
Calit2’s FIONA (UC Irvine)
Calit2’s FIONA (UC San Diego)
Pacific Research Platform (10-100 Gb/s)
Complete workflow time: 20 days → 20 hrs → 20 minutes!
Improvement of Over 1000x With PRP
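The "over 1000x" claim can be checked from the three workflow times on the slide. This is a minimal sketch that converts each time to minutes; the variable and function names are introduced here for illustration.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR


def speedup(before_minutes: float, after_minutes: float) -> float:
    """Ratio of the old workflow time to the new one."""
    return before_minutes / after_minutes


original = 20 * MINUTES_PER_DAY       # 20 days  = 28,800 minutes
intermediate = 20 * MINUTES_PER_HOUR  # 20 hours = 1,200 minutes
current = 20                          # 20 minutes

print(speedup(original, intermediate))  # 24.0
print(speedup(intermediate, current))   # 60.0
print(speedup(original, current))       # 1440.0
```

The end-to-end ratio of 1440 is consistent with the slide's "over 1000x".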
Linking Cultural Heritage and Archaeology Datasets
at UCB, UCLA, UCM and UCSD with CAVEkiosks
48 Megapixel CAVEkiosk
UCSD Library
48 Megapixel CAVEkiosk
UCB Library
24 Megapixel CAVEkiosk
UCM Library
Expanding to National Research Platform
and Global Research Platform
PRP’s Current

Now that PRP Can Move Big Data Quickly,
Next Step is to Add Machine Learning
What is the Cyberinfrastructure Needed
For The World of Autonomous Machines?
• Supernetworks Connecting Big Data to GPU-Cloud for Training AI Nets
• Trained Neural Nets Downloaded onto Robots
• Robots Use Neural Nets to Navigate with Real-Time Data Streams
• Swarm Input to Update Training on Neural Nets
Plans for ~500 Game GPUs Deployed on the Pacific Research Platform
Devoted to Machine Learning
High Speed “Cloud” of 320 GPUs
for Training AI Algorithms on Big Data
48 GPUs
for Applications
48 GPUs
for Students
FIONA with
8-Game GPUs
For ¾ of a Century, Computing Has Relied
on von Neumann’s Architecture

The Future of Supercomputing Will Blend Traditional HPC and Data Analytics
Integrating Non-von Neumann Architectures
“High Performance Computing Will Evolve
Towards a Hybrid Model,
Integrating Emerging Non-von Neumann Architectures,
with Huge Potential in Pattern Recognition,
Streaming Data Analysis,
and Unpredictable New Applications.”
Horst Simon, Deputy Director,
U.S. Department of Energy’s
Lawrence Berkeley National Laboratory
UC San Diego Creates
Center for Brain Activity Mapping
From left, Nick Spitzer, Ralph Greenspan, and Terry Sejnowski.
Photos by Erik Jepsen/UC San Diego Publications
May 16, 2013
Reverse Engineering of the Brain:
Large Scale Microscopy of Mammal Brains Reveals Complex Connectivity
Source: Rat Cerebellum Image, Mark Ellisman, UCSD
Cell Bodies
Neuronal Dendritic
Overlap Region
Realtime Simulation of Human Brain Possible
Within the Next Ten Years With Exascale Supercomputer
Horst Simon, Deputy Director,
Lawrence Berkeley National Laboratory’s
National Energy Research Scientific Computing Center
Trend Line

The Rise of Brain-Inspired Computers:
Left & Right Brain Computing: Arithmetic vs. Pattern Recognition
Adapted from D-Wave
Brain-Inspired Processors
Are Accelerating the Non-von Neumann Architecture Era
“On the drawing board are collections of 64, 256, 1024, and 4096 chips.
‘It’s only limited by money, not imagination,’ Modha says.”
Source: Dr. Dharmendra Modha
Founding Director, IBM Cognitive Computing Group
August 8, 2014
Google Designed a Non-von Neumann (NvN)
Machine Learning Accelerator
AI is Advancing at an Unprecedented Pace:
Deep Learning Algorithms Working on Massive Datasets
1.5 Years!
Training on 30M Moves,
Then Playing Against Itself
Google Used TPUs to Achieve the Go Victory

Calit2’s Qualcomm Institute Has Established a Pattern Recognition Lab
For Machine Learning on Non-von Neumann Processors
“On the drawing board are collections of 64, 256, 1024, and 4096 chips.
‘It’s only limited by money, not imagination,’ Modha says.”
Source: Dr. Dharmendra Modha
Founding Director, IBM Cognitive Computing Group
August 8, 2014
UCSD ECE Professor Ken Kreutz-Delgado Brings
the IBM TrueNorth Chip
to Start Calit2’s Qualcomm Institute
Pattern Recognition Laboratory
September 16, 2015
Contextual Robots Need Low Energy Neuromorphic Processors That
Can See and Learn Wirelessly Tied Into the Planetary Cloud Computer
Professor Tajana Rosing
Calit2 Has Students Creating 3D Printed Drones
Deploying Trained Neural Nets on Non-von Neumann Processors
DOD: “Perdix drones share one distributed brain for decision-making,
adapting to each other like swarms in nature.”

Should We Give Robots Autonomy?
This Next Decade’s Computing Transition
Will Not Be Just About Technology
"Those disposed to dismiss
an 'AI takeover' as science
fiction may think again after
reading this original and
well-argued book." —Martin
Rees, Past President, Royal
If our own extinction is
a likely, or even possible,
outcome of our
development, shouldn't we
proceed with great
Success in creating AI would be
the biggest event in human
history. Unfortunately, it might
also be the last, unless we learn
how to avoid the risks.
– Stephen Hawking
Our Support:
• US National Science Foundation (NSF) awards CNS-0821155, CNS-1338192,
CNS-1456638, ACI-1540112, and ACI-1541349
• University of California Office of the President CIO
• UCSD Chancellor’s Integrated Digital Infrastructure Program
• UCSD Next Generation Networking initiative
• Calit2 and Calit2 Qualcomm Institute
• CENIC, PacificWave and StarLight
• DOE ESnet

Building the Pacific Research Platform: Supernetworks for Big Data Science

  • 1. “Building the Pacific Research Platform: Supernetworks for Big Data Science” Steve Jones Internet Lecture The 67th Annual Conference of the International Communication Association San Diego, CA May 26, 2017 Dr. Larry Smarr Director, California Institute for Telecommunications and Information Technology Harry E. Gruber Professor, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD 1
  • 2. Abstract In every field we see an exponential rise of Big Data, which in turn is demanding new technological solutions in visualization, machine learning, and high performance cyberinfrastructure. The rise of artificial intelligence will both be powered by these developments and be essential for deriving understanding from the tsunami of data. I will describe how my NSF-funded Pacific Research Platform, which provides an Internet platform with 100-1000 times the bandwidth of today's commodity Internet to all the research universities on the West Coast, is being designed from the application needs of researchers from particle physics to climate to human health. Even fields like archaeology, digital libraries, and social media analysis are engaged.
  • 3. The Defining Issue in IT for the Coming Decades: Machine Intelligence Coupled to Massive Data May 5, 2015; August 25, 2015
  • 4. Traffic Control for Autonomous Drone Air Delivery is Under Development by NASA, Amazon, & Google
  • 5. Self-Driving Cars From Multiple Companies Use Advanced Sensors Coupled to Realtime Computing
  • 6. I Am Living in the Self-Driving Future
  • 7. Streaming Data From the Tesla Fleet Trains Self-Driving Algorithms: The “Hive-Mind” Note: Google Self-Driving Cars Have Only Driven 1.5 Million Miles
  • 8. The Planetary-Scale Computer Fed by a Trillion Sensors Will Drive a Global Industrial Internet Next Decade One Trillion “Within the next 20 years the Industrial Internet will have added to the global economy an additional $15 trillion.” --General Electric
  • 9. How Can We Build an Academic Cyberinfrastructure to Enable Collaborative Teams to Discover Patterns From Big Data?
  • 10. We Have Been Working Toward the Pacific Research Platform for 15 Years: OptIPuter (2002-2009; PI Smarr, Co-PI DeFanti, Co-PI Papadopoulos), Quartzite (2004-2007; PI Papadopoulos, Co-PI Smarr), Prism (2013-2015; PI Papadopoulos, Co-PI Smarr)
  • 11. Giving Individual Researchers Optical Fibers To Create an On-Campus Big Data Freeway System NSF CC-NIE Prism@UCSD Phil Papadopoulos, SDSC, Calit2, PI CHERuB UCSD’s 30,000+ Internet Users Travel Over One 10Gbps Fiber
  • 12. PRISM is Connecting CERN’s CMS Experiment To UCSD Physics Department 80 Gbps PRISM Connection Has Been Made
  • 13. Big Data Science Data Transfer Nodes - Flash I/O Network Appliances (FIONAs): UCSD Designed FIONAs To Solve the Disk-to-Disk Data Transfer Problem at Full Speed on 10G, 40G and 100G Networks. FIONAs: 10/40G, $8,000; FIONette: 1G, $1,000. Phil Papadopoulos, SDSC & Tom DeFanti, Joe Keefe & John Graham, Calit2
  • 14. How Prism Optical Network Transforms Big Data Microbiome Science: Preparing for Knight/Smarr 1 Million Core-Hour Analysis. Diagram labels: Knight Lab; FIONA; 10Gbps; Gordon; Prism@UCSD; Data Oasis 7.5PB, 200GB/s; Knight 1024 Cluster In SDSC Co-Lo; CHERuB 100Gbps; Emperor & Other Vis Tools; 64Mpixel Data Analysis Wall; 120Gbps; 40Gbps; 1.3Tbps
  • 15. NSF Has Funded Over 100 Campuses to Build On-Campus Big Data Freeways Red 2012 CC-NIE Awardees Yellow 2013 CC-NIE Awardees Green 2014 CC*IIE Awardees Blue 2015 CC*DNI Awardees Purple Multiple Time Awardees Source: NSF
  • 16. Logical Next Step: The Pacific Research Platform Creates a Regional End-to-End Science-Driven “Big Data Superhighway” System NSF Grant $5M 10/2015-10/2020 PI: Larry Smarr, UC San Diego Calit2 Letters of Commitment from: • 50 Researchers from 15 Campuses • 32 IT/Network Organization Leaders
  • 17. PRP’s First 1.5 Years: Connecting Campus Application Teams and Devices
  • 18. Cancer Genomics Hub (UCSC) is Housed in SDSC: Large Data Flows to End Users at UCSC, UCB, UCSF, … Chart labels: 1G; 8G; 15G; Jan 2016; 30,000 TB Per Year. Data Source: David Haussler, Brad Smith, UCSC
  • 19. PRP Now Enables Distributed Virtual Reality. Diagram labels: 40G FIONAs; 20x40G PRP-connected; WAVE@UC San Diego; PRP; WAVE@UC Merced. Transferring 5 CAVEcam Images from UCSD to UC Merced: 2 Gigabytes now takes 2 Seconds (8 Gb/sec)
  • 20. The Prototype PRP Has Attracted New Application Drivers Scott Sellars, Marty Ralph Center for Western Weather and Water Extremes Frank Vernon - Expansion of HPWREN Tom Levy, Cultural Heritage Cryo EM
  • 21. Collaboration on Atmospheric Water in the West Between UC San Diego and UC Irvine. Director: F. Martin Ralph. Big Data Collaboration with: Director, Soroosh Sorooshian, UCSD. Source: Scott Sellars, CW3E
  • 22. Improvement of Over 1000x With PRP. Complete workflow time: 20 days to 20 hrs to 20 Minutes! Diagram labels: Calit2’s FIONA; SDSC’s COMET; Calit2’s FIONA; Pacific Research Platform (10-100 Gb/s); GPUs; GPUs; UC, Irvine; UC, San Diego
  • 23. Linking Cultural Heritage and Archaeology Datasets at UCB, UCLA, UCM and UCSD with CAVEkiosks 48 Megapixel CAVEkiosk UCSD Library 48 Megapixel CAVEkiosk UCB Library 24 Megapixel CAVEkiosk UCM Library
  • 24. Expanding to National Research Platform and Global Research Platform PRP’s Current International Partners
  • 25. Now that PRP Can Move Big Data Quickly, Next Step is to Add Machine Learning
  • 26. What is the Cyberinfrastructure Needed For The World of Autonomous Machines? • Supernetworks Connecting Big Data to GPU-Cloud for Training AI Nets • Trained Neural Nets Downloaded onto Robots • Robots Use Neural Nets to Navigate with Real-Time Data Streams • Swarm Input to Update Training on Neural Nets
  • 27. Plans for ~500 Game GPUs Deployed on the Pacific Research Platform Devoted to Machine Learning Caltech UCB UCI UCR UCSD UCSC Stanford MSU UCM SDSU High Speed “Cloud” of 320 GPUs for Training AI Algorithms on Big Data SunCAVE 70 GPUs 48 GPUs for Applications 48 GPUs for Students FIONA with 8-Game GPUs
  • 28. For ¾ of a Century, Computing Has Relied on von Neumann’s Architecture
  • 29. The Future of Supercomputing Will Blend Traditional HPC and Data Analytics Integrating Non-von Neumann Architectures “High Performance Computing Will Evolve Towards a Hybrid Model, Integrating Emerging Non-von Neumann Architectures, with Huge Potential in Pattern Recognition, Streaming Data Analysis, and Unpredictable New Applications.” Horst Simon, Deputy Director, U.S. Department of Energy’s Lawrence Berkeley National Laboratory
  • 30. UC San Diego Creates Center for Brain Activity Mapping From left, Nick Spitzer, Ralph Greenspan, and Terry Sejnowski. Photos by Erik Jepsen/UC San Diego Publications May 16, 2013
  • 31. Reverse Engineering of the Brain: Large Scale Microscopy of Mammal Brains Reveals Complex Connectivity Source: Rat Cerebellum Image, Mark Ellisman, UCSD Neuron Cell Bodies Neuronal Dendritic Overlap Region
  • 32. Realtime Simulation of Human Brain Possible Within the Next Ten Years With Exascale Supercomputer Horst Simon, Deputy Director, Lawrence Berkeley National Laboratory’s National Energy Research Scientific Computing Center Fastest Supercomputer Trend Line Tianhe-2
  • 33. The Rise of Brain-Inspired Computers: Left & Right Brain Computing: Arithmetic vs. Pattern Recognition Adapted from D-Wave
  • 34. Brain-Inspired Processors Are Accelerating the Non-von Neumann Architecture Era “On the drawing board are collections of 64, 256, 1024, and 4096 chips. ‘It’s only limited by money, not imagination,’ Modha says.” Source: Dr. Dharmendra Modha Founding Director, IBM Cognitive Computing Group August 8, 2014
  • 35. Google Designed a NvN Machine Learning Accelerator
  • 36. AI is Advancing at an Unprecedented Pace: Deep Learning Algorithms Working on Massive Datasets 1.5 Years! Training on 30M Moves, Then Playing Against Itself Google Used TPUs to Achieve the Go Victory
  • 37. Calit2’s Qualcomm Institute Has Established a Pattern Recognition Lab For Machine Learning on Non-von Neumann Processors “On the drawing board are collections of 64, 256, 1024, and 4096 chips. ‘It’s only limited by money, not imagination,’ Modha says.” Source: Dr. Dharmendra Modha Founding Director, IBM Cognitive Computing Group August 8, 2014 UCSD ECE Professor Ken Kreutz-Delgado Brings the IBM TrueNorth Chip to Start Calit2’s Qualcomm Institute Pattern Recognition Laboratory September 16, 2015
  • 38. Contextual Robots Need Low Energy Neuromorphic Processors That Can See and Learn Wirelessly Tied Into the Planetary Cloud Computer Professor Tajana Rosing
  • 39. Calit2 Has Students Creating 3D Printed Drones Deploying Trained Neural Nets on Non-von Neumann Processors
  • 40. DOD: “Perdix drones share one distributed brain for decision-making, adapting to each other like swarms in nature.”
  • 41. Should We Give Robots Autonomy?
  • 42. This Next Decade’s Computing Transition Will Not Be Just About Technology "Those disposed to dismiss an 'AI takeover' as science fiction may think again after reading this original and well-argued book." – Martin Rees, Past President, Royal Society. If our own extinction is a likely, or even possible, outcome of our technological development, shouldn't we proceed with great caution? Success in creating AI would be the biggest event in human history. Unfortunately, it might also be the last, unless we learn how to avoid the risks. – Stephen Hawking
  • 43. Our Support: • US National Science Foundation (NSF) awards CNS-0821155, CNS-1338192, CNS-1456638, ACI-1540112, and ACI-1541349 • University of California Office of the President CIO • UCSD Chancellor’s Integrated Digital Infrastructure Program • UCSD Next Generation Networking initiative • Calit2 and Calit2 Qualcomm Institute • CENIC, PacificWave and StarLight • DOE ESnet
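
The bandwidth and speedup figures quoted in slides 18, 19, and 22 can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical and not part of the talk, and it assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and a 365-day year.

```python
# Back-of-the-envelope checks on figures quoted in the slides.
# All function names are hypothetical helpers introduced for illustration.

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def effective_gbps(gigabytes: float, seconds: float) -> float:
    """Throughput in gigabits per second for a transfer of decimal gigabytes."""
    return gigabytes * 8 / seconds


def average_gbps_from_tb_per_year(terabytes: float) -> float:
    """Average sustained rate (Gb/s) implied by a yearly volume in decimal TB."""
    bits = terabytes * 1e12 * 8
    return bits / SECONDS_PER_YEAR / 1e9


def speedup(before_minutes: float, after_minutes: float) -> float:
    """Ratio of the old workflow time to the new one."""
    return before_minutes / after_minutes


# Slide 19: 2 gigabytes in 2 seconds.
cavecam_gbps = effective_gbps(2, 2)

# Slide 18: 30,000 TB per year, expressed as an average rate.
cghub_avg_gbps = average_gbps_from_tb_per_year(30_000)

# Slide 22: 20 days reduced to 20 minutes.
workflow_speedup = speedup(20 * 24 * 60, 20)

print(f"CAVEcam transfer: {cavecam_gbps:.1f} Gb/s")
print(f"CGHub yearly average: {cghub_avg_gbps:.2f} Gb/s")
print(f"Workflow speedup: {workflow_speedup:.0f}x")
```

Under these assumptions the 2 GB transfer works out to the 8 Gb/sec stated on the slide, and going from 20 days to 20 minutes is a factor of 1440, consistent with the "over 1000x" claim. The yearly average for the Cancer Genomics Hub is a derived figure, not one stated in the talk.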