10.10.28
Invited Speaker
Grand Challenges in Data-Intensive Discovery Conference
San Diego Supercomputer Center, UC San Diego
Title: High Performance Cyberinfrastructure Enables Data-Driven Science in the Globally Networked World
La Jolla, CA
21st Century e-Knowledge Requires a High Performance e-Infrastructure
11.12.09
Keynote Presentation
40th Anniversary Celebration of SARA
Title: 21st Century e-Knowledge Requires a High Performance e-Infrastructure
Amsterdam, Netherlands
UC Capabilities Supporting High-Performance Collaboration and Data-Intensive ...
07.10.22
University of California Council of Research
UC Irvine
Title: UC Capabilities Supporting High-Performance Collaboration and Data-Intensive Sciences
Irvine, CA
Experiences in Application Specific Supercomputer Design - Reasons, Challenge...
The document discusses challenges in application-specific supercomputer design, using QPACE, a supercomputer built for quantum chromodynamics (QCD) computations, as its example. A key challenge is data ordering when using InfiniBand networking: if the ordering of writes to memory is not enforced, software can consume data before it is valid, and computations proceed on invalid data.
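The ordering requirement described above can be illustrated in-process (not over InfiniBand): the producer must finish writing the payload before publishing a "data valid" flag, and the consumer must never read the payload before seeing that flag. This is only a sketch of the principle; `threading.Event` stands in for whatever fence or completion guarantee the real hardware provides, and the payload value is made up.

```python
import threading

payload = {}
ready = threading.Event()  # stands in for the hardware "data valid" signal

def producer():
    payload["result"] = 42   # write the data first...
    ready.set()              # ...only then publish the "valid" flag

def consumer(out):
    ready.wait()             # never touch payload before the flag is set
    out.append(payload["result"])

out = []
t_consumer = threading.Thread(target=consumer, args=(out,))
t_producer = threading.Thread(target=producer)
t_consumer.start()
t_producer.start()
t_producer.join()
t_consumer.join()
print(out[0])  # 42 -- the data is guaranteed valid when read
```

If the flag could become visible before the data write (the hazard the abstract describes), the consumer would read garbage; enforcing the write order closes that window.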
High Performance Computing - Challenges on the Road to Exascale Computing
The document discusses challenges in achieving exascale computing capabilities by 2018. It outlines how standard technology scaling will not be enough and compromises will be needed: reduced node performance, lower network bandwidth, and fewer pins. The Blue Gene architecture is presented as an example of a balanced system that achieves high performance through optimized interconnects and packaging density. A thought experiment proposes integrating significant solid-state storage at each node to create an "active storage" machine based on the Blue Gene architecture.
The document summarizes the 19th ACM HPDC conference and VTDC workshop held in 2010. It provides an overview of the accepted papers and talks, including the topics covered, presenters and their affiliations, and high-level discussions. Key areas included distributed storage systems, virtualization technologies, data-intensive computing, workflows, and cloud/grid resources.
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/altera/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Bill Jenkins, Senior Product Specialist for High Level Design Tools at Intel, presents the "Accelerating Deep Learning Using Altera FPGAs" tutorial at the May 2016 Embedded Vision Summit.
While large strides have recently been made in the development of high-performance systems for neural networks based on multi-core technology, significant challenges in power, cost, and performance scaling remain. Field-programmable gate arrays (FPGAs) are a natural choice for implementing neural networks because they can combine computing, logic, and memory resources in a single device. Intel's Programmable Solutions Group has developed a scalable convolutional neural network reference design for deep learning systems, written in OpenCL and built with the Intel SDK for OpenCL. The design's performance is being benchmarked using several popular CNN benchmarks: CIFAR-10, ImageNet, and KITTI.
Building the CNN with OpenCL kernels allows true scaling of the design from smaller to larger devices and from one device generation to the next. New designs can be sized using different numbers of kernels at each layer. Performance scaling from one generation to the next also benefits from architectural advancements, such as floating-point engines and frequency scaling. Thus, you achieve greater than linear performance and performance per watt scaling with each new series of devices.
Petascale Analytics - The World of Big Data Requires Big Analytics
The document discusses big data and analytics technologies. It describes how new technologies like Hadoop and MapReduce enable processing of extremely large datasets. It also discusses future technologies like exascale computing and storage class memory that will be needed to manage increasing data volumes and support real-time analytics.
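The MapReduce model mentioned above can be sketched in a few lines: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. This is a minimal in-process illustration of the pattern (word count, the classic example); a real Hadoop job distributes the same three phases across a cluster, and the function names and sample documents here are illustrative only.

```python
from collections import defaultdict

def map_phase(doc):
    # emit (word, 1) for every word in the document
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # group all values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # aggregate each key's values
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data needs big analytics", "big analytics at scale"]
pairs = [kv for doc in docs for kv in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])  # 3
```

The appeal of the model is that each phase is embarrassingly parallel: map tasks run per input split and reduce tasks run per key group, which is what lets the same program scale from this toy to petabyte datasets.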
Coupling Australia’s Researchers to the Global Innovation Economy
08.10.02
First Lecture in the
Australian American Leadership Dialogue Scholar Tour
University of Adelaide
Title: Coupling Australia’s Researchers to the Global Innovation Economy
Adelaide, Australia
Coupling Australia’s Researchers to the Global Innovation Economy
This document summarizes Dr. Larry Smarr's presentation on linking Australian researchers to the global innovation economy through high-performance networking. Some key points:
- Australia has established a 1 Gbps dedicated connection between the University of Melbourne and UC San Diego to better connect Australian researchers globally.
- Dr. Smarr is visiting Australian universities to launch the next phase of this project - linking major research universities and CSIRO to each other and innovation centers worldwide with AARNet's new 10 Gbps network.
- This unprecedented bandwidth will allow Australian researchers to join emerging global collaborative research efforts on issues critical to Australia's future.
Exploring emerging technologies in the HPC co-design space
This document discusses emerging technologies for high performance computing (HPC), focusing on heterogeneous computing and non-volatile memory. It provides an overview of HPC architectures past and present, highlighting the trend toward more heterogeneous systems using GPUs and other accelerators. The document discusses the challenges applications face in adapting to these changing architectures. It also explores potential future technologies like 3D memory and discusses the Department of Energy's co-design centers, which facilitate collaboration between application developers and hardware designers.
Coupling Australia’s Researchers to the Global Innovation Economy
08.10.17
Ninth Lecture in the
Australian American Leadership Dialogue Scholar Tour
University of Sydney
Title: Coupling Australia’s Researchers to the Global Innovation Economy
Sydney, Australia
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
In this deck from the Swiss HPC Conference, Mark Wilkinson presents: 40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility.
"DiRAC is the integrated supercomputing facility for theoretical modeling and HPC-based research in particle physics, astrophysics, cosmology, and nuclear physics, all areas in which the UK is world-leading. DiRAC provides a variety of compute resources, matching machine architecture to the algorithm design and requirements of the research problems to be solved. As a single federated Facility, DiRAC allows more effective and efficient use of computing resources, supporting the delivery of the science programs across the STFC research communities. It provides a common training and consultation framework and, crucially, provides critical mass and a coordinating structure for both small- and large-scale cross-discipline science projects, the technical support needed to run and develop a distributed HPC service, and a pool of expertise to support knowledge transfer and industrial partnership projects. The ongoing development and sharing of best practice for the delivery of productive, national HPC services with DiRAC enables STFC researchers to produce world-leading science across the entire STFC science theory program."
Watch the video: https://wp.me/p3RLHQ-k94
Learn more: https://dirac.ac.uk/
and
http://hpcadvisorycouncil.com/events/2019/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Early Benchmarking Results for Neuromorphic Computing
This document summarizes early benchmarking results for neuromorphic computing using Intel's Loihi chip. It finds that Loihi provides orders of magnitude gains over CPUs and GPUs for certain workloads that are directly trained on the chip or use novel bio-inspired algorithms. These include online learning, adaptive control, event-based vision and tactile sensing, constraint satisfaction problems, and nearest neighbor search. Larger networks and problems tend to provide greater performance gains with Loihi.
In this deck from the HPC User Forum at Argonne, Andrew Siegel from Argonne presents: ECP Application Development.
"The Exascale Computing Project is accelerating delivery of a capable exascale computing ecosystem for breakthroughs in scientific discovery, energy assurance, economic competitiveness, and national security. ECP is chartered with accelerating delivery of a capable exascale computing ecosystem to provide breakthrough modeling and simulation solutions to address the most critical challenges in scientific discovery, energy assurance, economic competitiveness, and national security. This role goes far beyond the limited scope of a physical computing system. ECP’s work encompasses the development of an entire exascale ecosystem: applications, system software, hardware technologies and architectures, along with critical workforce development."
Watch the video: https://wp.me/p3RLHQ-kSL
Learn more: https://www.exascaleproject.org
and
http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Metacomputer Architecture of the Global LambdaGrid: How Personal Light Paths ...
08.05.15
Departments of Computer Science / Physics and Astronomy
University of Missouri@Columbia
Title: Metacomputer Architecture of the Global LambdaGrid: How Personal Light Paths are Transforming e-Science
Columbia, MO
Qualcomm is researching on-device artificial intelligence to power personal assistants and other applications through its Qualcomm AI Research division, with a focus on developing efficient neural networks, on-device learning techniques, and personalized models that can operate on mobile devices while protecting privacy. Qualcomm has been optimizing AI software and hardware including its Snapdragon processors for years to enable advanced on-device AI capabilities. The document discusses Qualcomm's research in areas like computer vision, natural language processing, reinforcement learning, and distributed learning to advance on-device intelligent assistants and other applications.
Calit2 has formed two divisional councils to provide leadership and strategic direction. It has also developed multiple communication channels like brochures and websites. Two new Calit2 buildings at UC San Diego and UC Irvine will provide major new laboratories linked by dedicated optical networks for over 1000 researchers working across disciplines like nanotechnology, biomedicine, and digital arts. Calit2 is also working on applications of high-speed networks for areas like telemedicine, disaster response, and digital cinema.
Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analys...
06.04.26
Invited Talk
CONNECT Board Meeting
Title: Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA)
La Jolla, CA
Cloud Computing: An Alternative Platform for Scientific Computing
David Ramirez
After an overview of its fundamental technologies, Grid Computing is presented as the platform of choice for scientific High Performance Computing (HPC). The latest offerings in Cloud Computing (CC) could make it a basis for creating easy-to-deploy, on-demand, and widely accessible grids, putting HPC within the reach of most scientific and research communities. A case study framework is proposed for future development.
High Performance Cyberinfrastructure Discovery Tools for Data Intensive Research
Larry Smarr
This document discusses how high-performance cyberinfrastructure and dedicated 10Gbps networks enable new levels of discovery for data-intensive research. It provides examples of how universities are using these resources for projects in fields like cosmology, ocean observing, and microbial analysis. Specifically, it describes how networks like National LambdaRail provide connectivity between campuses and commercial clouds, and how tools like OptIPortals allow researchers to remotely analyze large datasets in real-time using visualization techniques.
The document discusses big data in astronomy and the LineA-DEXL case. It introduces big data in science and hypothesis-driven research, then covers data management techniques such as data partitioning and parallel workflow processing. It describes the Laboratório Nacional de Computação Científica (LNCC) and its role in supporting computational modeling and bioinformatics. It discusses astronomy surveys that generate large data volumes, such as the Dark Energy Survey, and the data challenges posed by the Large Synoptic Survey Telescope. Finally, it discusses the need for data infrastructure, metadata management, and distributed data management to support scientific research involving big data.
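One of the data-management techniques named above, data partitioning, can be sketched as hash partitioning: each catalog object is assigned to one of N partitions by hashing its identifier, so lookups are deterministic and load spreads across partitions. The object IDs and partition count below are made up for illustration.

```python
import hashlib

N_PARTITIONS = 4  # illustrative; real survey catalogs use far more

def partition_of(object_id: str) -> int:
    # Hash the identifier so placement is deterministic and roughly uniform.
    digest = hashlib.sha256(object_id.encode()).hexdigest()
    return int(digest, 16) % N_PARTITIONS

catalog = ["DES-J0102", "DES-J2357", "LSST-0001", "LSST-0002"]
partitions = {p: [] for p in range(N_PARTITIONS)}
for obj in catalog:
    partitions[partition_of(obj)].append(obj)

# The same ID always maps to the same partition, so a query for one
# object needs to touch only one partition's storage.
print(sum(len(objs) for objs in partitions.values()))  # 4 objects placed
```

Survey pipelines often partition on a sky-region key instead of a raw hash so that spatially adjacent objects land together; the mechanism, routing each record by a key function, is the same.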
Coupling Australia’s Researchers to the Global Innovation Economy
Larry Smarr
08.10.08
Fourth Lecture in the
Australian American Leadership Dialogue Scholar Tour
Swinburne University
Title: Coupling Australia’s Researchers to the Global Innovation Economy
Hawthorn, Australia
Calit2: Facilitating the Digital Humanities
Larry Smarr
Calit2 facilitates collaborations between researchers in engineering, arts and humanities through interdisciplinary centers, conferences, exhibitions and an undergraduate research program. It provides advanced visualization technologies for digital humanities projects in areas such as digital archaeology, cultural heritage preservation, and art analysis. Calit2 also supports the development of new software tools and techniques for digital humanities research.
Limiting Global Climatic Disruption by Revolutionary Change in the Global Ene...
Larry Smarr
10.06.08
Keynote Opening Talk
Xconomy Forum: The Rise of Smart Energy
Title: Limiting Global Climatic Disruption by Revolutionary Change in the Global Energy System
La Jolla, CA
The Growing Interdependence of the Internet and Climate Change
Larry Smarr
10.04.30
Distinguished Lecture
Scientific Computing and Imaging (SCI) Institute
University of Utah
Title: The Growing Interdependence of the Internet and Climate Change
Salt Lake City, UT
The Importance of Large-Scale Computer Science Research Efforts
Larry Smarr
05.10.20
Talk at Public Seminar on Large-Scale NSF Research Efforts for the Future Computer Museum
Title: The Importance of Large-Scale Computer Science Research Efforts
Mountain View, CA
Assay Lab Within Your Body: Biometrics and Biomes
Larry Smarr
This document summarizes a lecture about analyzing the human microbiome and its relationship to human health. It discusses how the human body contains 100 trillion microbial cells that contain 100 times as many genes as human DNA. Recent advances now allow sequencing these microbial genomes and analyzing massive datasets to map the dynamics of the immune-microbial system and its connection to disease states. A key focus is generating high-resolution time series data of the gut microbiome and immune variables from large cohorts to understand how they influence conditions like inflammatory bowel disease. There is potential to design gut microbes as sensors of disease states by programming them to detect specific conditions.
Applying Photonics to User Needs: The Application Challenge
Larry Smarr
05.02.28
Invited Talk to the 4th Annual On*VECTOR International Photonics Workshop
Sponsored by NTT Network Innovation Laboratories
Title: Applying Photonics to User Needs: The Application Challenge
University of California, San Diego
Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analys...
Larry Smarr
06.07.31
Invited Talk
CONNECT Investment Community Meeting
Calit2@UCSD
Title: Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA)
La Jolla, CA
Science and Cyberinfrastructure in the Data-Dominated Era
Larry Smarr
10.02.22
Invited talk
Symposium #1610, How Computational Science Is Tackling the Grand Challenges Facing Science and Society
Title: Science and Cyberinfrastructure in the Data-Dominated Era
San Diego, CA
Making Sense of Information Through Planetary Scale Computing
Larry Smarr
Larry Smarr discusses how planetary-scale computing and high-speed networks enable data-intensive research through optical portals. This infrastructure allows remote visualization and analysis of large datasets across multiple sites in real-time. Examples include viewing microbial genomes, cosmological simulations, and remote instrument control. The infrastructure also aims to reduce carbon emissions through more efficient computing.
How to Terminate the GLIF by Building a Campus Big Data Freeway System
Larry Smarr
12.10.11
Keynote Lecture
12th Annual Global LambdaGrid Workshop
Title: How to Terminate the GLIF by Building a Campus Big Data Freeway System
Chicago, IL
End-to-end Optical Fiber Cyberinfrastructure for Data-Intensive Research: Imp...
Larry Smarr
10.10.13
Featured Speaker EDUCAUSE 2010
Anaheim Convention Center
Title: End-to-end Optical Fiber Cyberinfrastructure for Data-Intensive Research: Implications for Your Campus
Anaheim, CA
High Performance Cyberinfrastructure Required for Data Intensive Scientific R...
Larry Smarr
11.06.08
Invited Presentation
National Science Foundation Advisory Committee on Cyberinfrastructure
Title: High Performance Cyberinfrastructure Required for Data Intensive Scientific Research
Arlington, VA
High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom...
Larry Smarr
11.04.06
Joint Presentation
UCSD School of Medicine Research Council
Larry Smarr, Calit2 & Phil Papadopoulos, SDSC/Calit2
Title: High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biomedical Sciences
Using Photonics to Prototype the Research Campus Infrastructure of the Future...
Larry Smarr
08.02.21
Presentation
Philip Papadopoulos, Larry Smarr, Joseph Ford, Shaya Fainman, and Brian Dunne
University of California, San Diego
Title: Using Photonics to Prototype the Research Campus Infrastructure of the Future: The UCSD Quartzite Project
La Jolla, CA
06.07.26
Invited Talk
Cyberinfrastructure for Humanities, Arts, and Social Sciences, A Summer Institute, SDSC
Title: The OptIPuter and Its Applications
La Jolla, CA
Riding the Light: How Dedicated Optical Circuits are Enabling New Science
Larry Smarr
The document discusses how dedicated optical circuits are enabling new science through high-bandwidth networks. It provides examples of several projects using dedicated optical networks, such as the OptIPuter project, to enable interactive analysis of large datasets through terabit network connections between supercomputing centers. The document concludes by discussing future ocean observatory networks that will use undersea fiber optics to enable remote interactive imaging and sensing.
Project StarGate An End-to-End 10Gbps HPC to User Cyberinfrastructure ANL * C...
Larry Smarr
09.11.03
Report to the
Dept. of Energy Advanced Scientific Computing Advisory Committee
Title: Project StarGate An End-to-End 10Gbps HPC to User Cyberinfrastructure ANL * Calit2 * LBNL * NICS * ORNL * SDSC
Oak Ridge, TN
A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Int...
Larry Smarr
11.12.12
Seminar Presentation
Princeton Institute for Computational Science and Engineering (PICSciE)
Princeton University
Title: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research
Princeton, NJ
The document summarizes plans for two new Calit2 buildings at UC San Diego and UC Irvine that will provide laboratories for research in areas like nanotechnology, biomedical engineering, computer chips, and more. The buildings will be linked via high-speed optical networks and will support over 1000 researchers. Key aspects include ultra high-speed networking capabilities up to 10 gigabits per second, advanced visualization resources, and proposals to extend this infrastructure to enable new collaborative research projects.
High Performance Cyberinfrastructure is Needed to Enable Data-Intensive Scien...
Larry Smarr
11.03.28
Remote Luncheon Presentation from Calit2@UCSD
National Science Board
Expert Panel Discussion on Data Policies
National Science Foundation
Title: High Performance Cyberinfrastructure is Needed to Enable Data-Intensive Science and Engineering
Arlington, Virginia
Preparing Your Campus for Data Intensive Researchers (Larry Smarr)
The document discusses preparing university campuses for data-intensive researchers through high-performance cyberinfrastructure like the OptIPuter project. It describes how dedicated lightpaths can provide researchers with local scalable computing and storage through "OptIPortals" connected to global data repositories at speeds far exceeding normal internet. Several universities have deployed this to open new frontiers in research across diverse disciplines from science to humanities.
SDVis and In-Situ Visualization on TACC's Stampede (Intel® Software)
Speaker: Paul Navrátil, Texas Advanced Computing Center (TACC)
The design emphasis for supercomputing systems has moved from raw performance to performance-per-watt, and as a result, supercomputing architectures are converging on processors with wide vector units and many processing cores per chip. Such processors are capable of performant image rendering purely in software. This improved capability is fortuitous, since the prevailing homogeneous system designs lack dedicated, hardware-accelerated rendering subsystems for use in data visualization. Reliance on this “software-defined” rendering capability will grow in importance since, due to growing data sizes, visualizations must be performed on the same machine where the data is produced. Further, as data sizes outgrow disk I/O capacity, visualization will be increasingly incorporated into the simulation code itself (in situ visualization).
This talk presents recent work in high-fidelity visualization using the OSPRay ray tracing framework on TACC's local and remote visualization systems. We present work using OSPRay within the ParaView Catalyst in situ framework from Kitware, including opportunities to reduce the cost of data moving through VTK filters for visualization. We highlight the performance advantages of Intel® Advanced Vector Extensions 512, the memory system improvements possible with Intel® Xeon Phi™ processor multi-channel DRAM (MCDRAM), and the Intel® Omni-Path Architecture interconnect.
Why Researchers are Using Advanced Networks (Larry Smarr)
07.07.03
Remote Talk from Calit2 to:
Building KAREN Communities for Collaboration Forum
KIWI Advanced Research and Education Network
University of Auckland, Auckland City, New Zealand
Title: Why Researchers are Using Advanced Networks
La Jolla, CA
This document provides an overview of recent advances in artificial intelligence and machine learning, including convolutional neural networks, generative adversarial networks, reinforcement learning techniques, and applications in healthcare, autonomous vehicles, robotics, and more. It also highlights Nvidia's work in these areas through their GPUs, deep learning platforms, and research.
Larry Smarr - Making Sense of Information Through Planetary Scale Computing (Diamond Exchange)
"Brave New World" DiamondExchange
February 28 - March 3, 2009
Date: Sunday, March 1, 2009
Presenter: Larry Smarr
Presentation: Making Sense of Information Through Planetary Scale Computing
Introduction to Software Defined Visualization (SDVis), Intel® Software
This document provides an overview of Intel's Software Defined Visualization (SDVis) initiative and updates on its current status. SDVis aims to enable scalable, flexible visualization that can run on a variety of systems from laptops to large clusters. It utilizes several open source libraries developed by Intel including Embree for ray tracing, OSPRay as a rendering engine, and OpenSWR for rasterization. The document discusses how SDVis addresses challenges of large-scale, high performance visualization. It provides examples of scientific visualization projects using SDVis and performance comparisons of Embree and OSPRay to GPU-based solutions. In addition, the document outlines several active integrations of SDVis technologies in visualization software including ParaView and
The Rise of Supernetwork Data Intensive Computing (Larry Smarr)
Invited Remote Lecture to SC21
The International Conference for High Performance Computing, Networking, Storage, and Analysis
St. Louis, Missouri
November 18, 2021
My Remembrances of Mike Norman Over The Last 45 Years (Larry Smarr)
Mike Norman has been a leader in computational astrophysics for over 45 years. Some of his influential work includes:
- Cosmic jet simulations in the early 1980s which helped explain phenomena from galactic centers.
- Pioneering the use of adaptive mesh refinement in the 1990s to achieve dynamic load balancing on supercomputers.
- Massive cosmology simulations in the late 2000s with over 100 trillion particles using thousands of processors across multiple supercomputing sites, producing petabytes of data.
- Developing end-to-end workflows in the 2000s to couple supercomputers, high-speed networks, and large visualization systems to enable real-time analysis of extremely large astrophysics simulations.
Metagenics: How Do I Quantify My Body and Try to Improve its Health? June 18, 2019 (Larry Smarr)
Larry Smarr discusses quantifying his body and health over time through extensive self-tracking. He measures various biomarkers through regular blood tests and analyzes his gut microbiome by sequencing stool samples. This revealed issues like chronic inflammation and an unhealthy microbiome. Smarr then took steps like a restricted eating window and increasing plant diversity in his diet, which reversed metabolic syndrome issues and correlated with shifts in his microbiome ecology. His goal is to continue precisely measuring factors like toxins, hormones, gut permeability and food/supplement impacts to further optimize his health.
Panel: Reaching More Minority Serving Institutions (Larry Smarr)
This document discusses engaging more minority serving institutions (MSIs) in cyberinfrastructure development through regional networks. It provides data showing the importance of MSIs like historically black colleges and universities (HBCUs) in educating underrepresented minority students in STEM fields. Regional networks can help equalize opportunities by assisting MSIs in overcoming barriers to resources through training, networking infrastructure support, and helping institutions obtain necessary staffing and funding. Strategies mentioned include collaborating with MSIs on grants and addressing issues identified in surveys like lack of vision for data use beyond compliance. The goal is to broaden participation in STEAM fields by leveraging the success MSIs have shown in supporting underrepresented students.
Global Network Advancement Group: Next Generation Network-Integrated Systems (Larry Smarr)
This document summarizes a presentation on global petascale to exascale workflows for data intensive sciences. It discusses a partnership convened by the GNA-G Data Intensive Sciences Working Group with the mission of meeting challenges faced by data-intensive science programs. Cornerstone concepts that will be demonstrated include integrated network and site resource management, model-driven frameworks for resource orchestration, end-to-end monitoring with machine learning-optimized data transfers, and integrating Qualcomm's GradientGraph with network services to optimize applications and science workflows.
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us... (Larry Smarr)
This document discusses opportunities for ESnet to support wireless edge computing through developing a strategy around self-guided field laboratories (SGFL). It outlines several potential science use cases that could benefit from wireless and distributed computing capabilities, both in the short term through technologies like 5G, LoRa and Starlink, and longer term through the vision of automated SGFL. The document proposes some initial ideas for deploying and testing wireless edge computing technologies through existing projects to help enable the SGFL vision and further scientific opportunities. It emphasizes that exploring these emerging areas could help drive new science possibilities if done at a reasonable scale.
High Performance Cyberinfrastructure Enables Data-Driven Science in the Globally Networked World
1. "High Performance Cyberinfrastructure Enables Data-Driven Science in the Globally Networked World"
Invited Speaker
Grand Challenges in Data-Intensive Discovery Conference
San Diego Supercomputer Center, UC San Diego
La Jolla, CA
October 28, 2010
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor, Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
Follow me on Twitter: lsmarr
2. Abstract
Today we are living in a data-dominated world where distributed scientific instruments, as well as supercomputers, generate terabytes to petabytes of data. It was in response to this challenge that the NSF funded the OptIPuter project to research how user-controlled 10Gbps dedicated lightpaths (or "lambdas") could provide direct access to global data repositories, scientific instruments, and computational resources from "OptIPortals," PC clusters which provide scalable visualization, computing, and storage in the user's campus laboratory. The use of dedicated lightpaths over fiber optic cables enables individual researchers to experience "clear channel" 10,000 megabits/sec, 100-1000 times faster than over today's shared Internet, a critical capability for data-intensive science. The seven-year OptIPuter computer science research project is now over, but it stimulated a national and global build-out of dedicated fiber optic networks. U.S. universities now have access to high bandwidth lambdas through the National LambdaRail, Internet2's WaveCo, and the Global Lambda Integrated Facility. A few pioneering campuses are now building on-campus lightpaths to connect the data-intensive researchers, data generators, and vast storage systems to each other on campus, as well as to the national network campus gateways. I will give examples of the application use of this emerging high performance cyberinfrastructure in genomics, ocean observatories, radio astronomy, and cosmology.
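The "100-1000 times faster" claim in the abstract is simple arithmetic; a quick sanity check, assuming a typical shared-Internet throughput of 10-100 Mbit/s (the range is my assumption, not a figure from the talk):

```python
# Compare a dedicated 10 Gbps lightpath to an assumed shared-Internet range.
shared_internet_mbps = (10, 100)  # assumed typical throughputs, Mbit/s
lightpath_mbps = 10_000           # a "clear channel" 10 Gbps lambda

speedups = [lightpath_mbps / r for r in shared_internet_mbps]
print(speedups)  # [1000.0, 100.0], matching the 100-1000x range
```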
3. Academic Research "OptIPlatform" Cyberinfrastructure: A 10Gbps "End-to-End" Lightpath Cloud
[Diagram: 10G lightpaths over the National LambdaRail and a campus optical switch link end users (OptIPortal, HPC, HD/4k video cameras, HD/4k telepresence, instruments) to data repositories & clusters and HD/4k video images.]
4. The OptIPuter Project: Creating High Resolution Portals Over Dedicated Optical Channels to Global Science Data
Scalable Adaptive Graphics Environment (SAGE)
Picture Source: Mark Ellisman, David Lee, Jason Leigh
Calit2 (UCSD, UCI), SDSC, and UIC Leads; Larry Smarr PI
Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST
Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent
5. On-Line Resources Help You Build Your Own OptIPortal
www.optiputer.net
http://wiki.optiputer.net/optiportal
www.evl.uic.edu/cavern/sage/
http://vis.ucsd.edu/~cglx/
OptIPortals Are Built From Commodity PC Clusters and LCDs To Create a 10Gbps Scalable Termination Device
6. Nearly Seamless AESOP OptIPortal
46" NEC Ultra-Narrow Bezel 720p LCD Monitors
Source: Tom DeFanti, Calit2@UCSD
7. 3D Stereo Head Tracked OptIPortal:
NexCAVE
Array of JVC HDTV 3D LCD Screens
KAUST NexCAVE = 22.5MPixels
www.calit2.net/newsroom/article.php?id=1584
Source: Tom DeFanti, Calit2@UCSD
8. Project StarGate Goals: Combining Supercomputers and Supernetworks
• Create an "End-to-End" 10Gbps Workflow
• Explore Use of OptIPortals as Petascale Supercomputer "Scalable Workstations"
• Exploit Dynamic 10Gbps Circuits on ESnet
• Connect Hardware Resources at ORNL, ANL, SDSC
• Show that Data Need Not be Trapped by the Network
[Images: OptIPortal@SDSC showing the "Event Horizon" visualization; Rick Wagner, Mike Norman]
Source: Michael Norman, SDSC, UCSD
ANL * Calit2 * LBNL * NICS * ORNL * SDSC
9. Using Supernetworks to Couple End User's OptIPortal to Remote Supercomputers and Visualization Servers
Simulation: NSF TeraGrid Kraken @ NICS/ORNL (Cray XT5; 8,256 Compute Nodes; 99,072 Compute Cores; 129 TB RAM)
Rendering: DOE Eureka @ Argonne NL (100 Dual Quad Core Xeon Servers; 200 NVIDIA Quadro FX GPUs in 50 Quadro Plex S4 1U enclosures; 3.2 TB RAM)
Visualization: Calit2/SDSC OptIPortal1 (20 30" (2560 x 1600 pixel) LCD panels; 10 NVIDIA Quadro FX 4600 graphics cards; > 80 megapixels; 10 Gb/s network throughout)
Linked by ESnet and the SDSC 10 Gb/s fiber optic network
Source: Mike Norman, Rick Wagner, SDSC
ANL * Calit2 * LBNL * NICS * ORNL * SDSC
10. National-Scale Interactive Remote Rendering of Large Datasets
SDSC to ALCF over the ESnet Science Data Network (SDN): > 10 Gb/s Fiber Optic Network, Dynamic VLANs Configured Using OSCARS
Rendering: Eureka (100 Dual Quad Core Xeon Servers; 200 NVIDIA FX GPUs; 3.2 TB RAM)
Visualization: OptIPortal (40 Mpixels of LCDs; 10 NVIDIA FX 4600 Cards; 10 Gb/s Network Throughout)
Interactive Remote Rendering: Real-Time Volume Rendering Streamed from ANL to SDSC
Last Year: High-Resolution (4K+, 15+ FPS), But Command-Line Driven; Fixed Color Maps, Transfer Functions; Slow Exploration of Data
Last Week: Now Driven by a Simple Web GUI; Rotate, Pan, Zoom; GUI Works from Most Browsers; Manipulate Colors and Opacity; Fast Renderer Response Time
Source: Rick Wagner, SDSC
11. NSF OOI is a $400M Program; OOI CI is a $34M Part of This
30-40 Software Engineers Housed at Calit2@UCSD
Source: Matthew Arrott, Calit2 Program Manager for OOI CI
12. OOI CI is Built on NLR/I2 Optical Infrastructure
Physical Network Implementation
Source: John Orcutt, Matthew Arrott, SIO/Calit2
13. California and Washington Universities Are Testing a 10Gbps Connected Commercial Data Cloud
• Amazon Experiment for Big Data
– Only Available Through CENIC & Pacific NW GigaPOP
– Private 10Gbps Peering Paths
– Includes Amazon EC2 Computing & S3 Storage Services
• Early Experiments Underway
– Robert Grossman, Open Cloud Consortium
– Phil Papadopoulos, Calit2/SDSC Rocks
14. Open Cloud OptIPuter Testbed: Manage and Compute Large Datasets Over 10Gbps Lambdas
Networks: CENIC, NLR C-Wave, DRAGON, MREN
• 9 Racks
• 500 Nodes
• 1000+ Cores
• 10+ Gb/s Now; Upgrading Portions to 100 Gb/s in 2010/2011
Open Source SW: Hadoop, Sector/Sphere, Nebula, Thrift, GPB, Eucalyptus, Benchmarks
Source: Robert Grossman, UChicago
15. Ocean Modeling HPC In the Cloud: Tropical Pacific SST (2 Month Ave 2002)
MIT GCM, 1/3 Degree Horizontal Resolution, 51 Levels, Forced by NCEP2.
Grid is 564x168x51; Model State is T, S, U, V, W and Sea Surface Height
Run on EC2 HPC Instance, in Collaboration with OOI CI/Calit2
Source: B. Cornuelle, N. Martinez, C. Papadopoulos, COMPAS, SIO
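The size of the model state quoted above is easy to reconstruct from the slide's numbers; a sketch, assuming double-precision values (the precision is my assumption):

```python
# Reconstruct the MIT GCM model state size from the slide's grid dimensions.
nx, ny, nz = 564, 168, 51          # horizontal grid points and vertical levels
fields_3d = 5                      # T, S, U, V, W are full 3-D fields
values = fields_3d * nx * ny * nz  # 3-D state values
values += nx * ny                  # sea surface height is a 2-D field
bytes_total = values * 8           # assuming 8-byte doubles
print(values, round(bytes_total / 2**20))  # ~24.3M values, ~185 MiB per state
```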
16. Run Timings of Tropical Pacific: Local SIO ATLAS Cluster and Amazon EC2 Cloud

Configuration                       Wall Time   User Time   System Time
ATLAS (Ethernet, NFS)                  4711        3833         798
ATLAS (Myrinet, NFS)                   2986        2953          17
ATLAS (Myrinet, Local Disk)            2983        2933          19
EC2 HPC (Ethernet, 1 Node)            14428        1909        2764
EC2 HPC (Ethernet, Local Disk)         2379        1590         750
(All times in seconds)

Atlas: 128 Node Cluster @ SIO COMPAS. Myrinet 10G, 8GB/node, ~3 yrs old
EC2: HPC Computing Instance, 2.93GHz Nehalem, 24GB/Node, 10GbE
Compilers: Ethernet: GNU FORTRAN with OpenMPI; Myrinet: PGI FORTRAN with MPICH1
Single Node EC2 was Oversubscribed (48 Processes). All Other Parallel Instances Used 6 Physical Nodes, 8 Cores/Node. Model Code has been Ported to Run on ATLAS, Triton (@SDSC), and in EC2.
Source: B. Cornuelle, N. Martinez, C. Papadopoulos, COMPAS, SIO
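The wall times above are easier to compare as ratios against the fastest run; a small reading aid computed from the slide's numbers:

```python
# Wall times (seconds) from the slide's timing comparison.
wall = {
    "ATLAS Ethernet/NFS": 4711,
    "ATLAS Myrinet/NFS": 2986,
    "ATLAS Myrinet/local disk": 2983,
    "EC2 HPC 1 node (oversubscribed)": 14428,
    "EC2 HPC local disk": 2379,
}
best = min(wall, key=wall.get)  # EC2 HPC local disk, at 2379 s
for name, t in wall.items():
    print(f"{name}: {t / wall[best]:.2f}x the best wall time")
```

The oversubscribed single EC2 node is roughly six times slower than the parallel runs, while the EC2 local-disk run edges out even the Myrinet-connected local cluster.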
17. Using Condor and Amazon EC2 on Adaptive Poisson-Boltzmann Solver (APBS)
• APBS Rocks Roll (NBCR) + EC2 Roll + Condor Roll = Amazon VM
• Cluster Extension into Amazon Using Condor
[Diagram: NBCR VMs running in the local cluster alongside NBCR VMs running in the Amazon EC2 cloud; APBS + EC2 + Condor]
Source: Phil Papadopoulos, SDSC/Calit2
18. Moving into the Clouds: Rocks and EC2
• We Can Build Physical Hosting Clusters & Multiple, Isolated Virtual Clusters:
– Can I Use Rocks to Author "Images" Compatible with EC2? (We Use Xen, They Use Xen)
– Can I Automatically Integrate EC2 Virtual Machines into My Local Cluster (Cluster Extension)?
– Submit Locally
– My Own Private + Public Cloud
• What This Will Mean
– All Your Existing Software Runs Seamlessly Among Local and Remote Nodes
– User Home Directories Can Be Mounted
– Queue Systems Work
– Unmodified MPI Works
Source: Phil Papadopoulos, SDSC/Calit2
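The "Submit Locally" point can be made concrete: once the EC2-hosted VMs have joined the local Condor pool via the Rocks cluster extension, an ordinary submit description schedules onto local and remote nodes alike. A minimal sketch (the executable and file names are hypothetical, not from the talk):

```
# Minimal Condor submit description; EC2 nodes in the pool need no special
# handling, which is the point of the cluster-extension model above.
universe   = vanilla
executable = apbs
arguments  = input.in
output     = apbs.out
error      = apbs.err
log        = apbs.log
queue
```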
19. "Blueprint for the Digital University": Report of the UCSD Research Cyberinfrastructure Design Team (April 2009)
• Focus on Data-Intensive Cyberinfrastructure
• No Data Bottlenecks: Design for Gigabit/s Data Flows
http://research.ucsd.edu/documents/rcidt/RCIDTReportFinal2009.pdf
20. Current UCSD Optical Core: Bridging End-Users to CENIC L1, L2, L3 Services
Quartzite Communications Core, Year 3 Endpoints:
• >= 60 endpoints at 10 GigE
• >= 32 Packet switched
• >= 32 Switched wavelengths
• >= 300 Connected endpoints
Approximately 0.5 Tbit/s Arrive at the "Optical" Center of Campus. Switching is a Hybrid of Packet, Lambda, and Circuit: OOO and Packet Switches.
[Diagram: a Quartzite wavelength selective switch (Lucent), an OOO switch (Glimmerglass), and a Force10 packet switch link 10GigE cluster node interfaces and production and research GigE switches with dual 10GigE uplinks to campus cluster nodes, and connect via a Juniper T320 to the CalREN-HPR research cloud and the campus research cloud.]
Source: Phil Papadopoulos, SDSC/Calit2 (Quartzite PI, OptIPuter co-PI)
Quartzite Network MRI #CNS-0421555; OptIPuter #ANI-0225642
21. UCSD Campus Investment in Fiber Enables Consolidation of Energy Efficient Computing & Storage
WAN 10Gb: N x 10Gb to CENIC, NLR, I2
[Diagram: the campus fiber plant links Gordon (HPD System), the Cluster Condo, DataOasis (Central) Storage, Triton (Petascale Data Analysis), Scientific Instruments, Digital Data Collections, Campus Lab Clusters, and OptIPortal Tile Display Walls.]
Source: Philip Papadopoulos, SDSC/Calit2
22. UCSD Planned Optical Networked Biomedical Researchers and Instruments
• Connects at 10 Gbps:
– Microarrays
– Genome Sequencers
– Mass Spectrometry
– Light and Electron Microscopes
– Whole Body Imagers
– Computing
– Storage
[Map: connected sites include the CryoElectron Microscopy Facility, San Diego Supercomputer Center, Calit2@UCSD, Cellular & Molecular Medicine East and West, Bioengineering, the Radiology Imaging Lab, the National Center for Microscopy & Imaging, the Center for Molecular Genetics, the Pharmaceutical Sciences Building, and Biomedical Research.]
23. Moving to a Shared Campus Data Storage and Analysis Resource: Triton Resource @ SDSC
Large Memory PSDAF: 256/512 GB/sys; 9 TB Total; 128 GB/sec; ~9 TF; x28
Shared Resource Cluster: 24 GB/Node; 6 TB Total; 256 GB/sec; ~20 TF; x256
Large Scale Storage: 2 PB; 40-80 GB/sec; 3000-6000 disks; Phase 0: 1/3 TB, 8 GB/s
Connected to UCSD Research Labs via the Campus Research Network
Source: Philip Papadopoulos, SDSC/Calit2
24. Calit2 Microbial Metagenomics Cluster: Next Generation Optically Linked Science Data Server
512 Processors, ~5 Teraflops
~200 Terabytes of Sun X4500 Storage
1GbE and 10GbE Attached to a Switched / Routed 10GbE Core
Source: Phil Papadopoulos, SDSC, Calit2
25. Calit2 CAMERA Automatic Overflows into SDSC Triton
The CAMERA-Managed Job Submit Portal (VM) @ Calit2 Transparently Sends Jobs to the Submit Portal on the Triton Resource @ SDSC
10Gbps Direct Mount: CAMERA == DATA, No Data Staging
26. Prototyping Next Generation User Access and Large Data Analysis Between Calit2 and U Washington
Ginger Armbrust's Diatoms: Micrographs, Chromosomes, Genetic Assembly
iHDTV: 1500 Mbits/sec Calit2 to UW Research Channel Over NLR
Photo Credit: Alan Decker, Feb. 29, 2008
27. Rapid Evolution of 10GbE Port Prices Makes Campus-Scale 10Gbps CI Affordable
• Port Pricing is Falling
• Density is Rising, Dramatically
• Cost of 10GbE Approaching Cluster HPC Interconnects
Price points over 2005-2010: $80K/port (Chiaro, 60 max); $5K (Force 10, 40 max); ~$1000 (300+ max); $500 (Arista, 48 ports); $400 (Arista, 48 ports)
Source: Philip Papadopoulos, SDSC/Calit2
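The scale of the price decline is worth spelling out, using only the endpoint figures from the slide:

```python
# Per-port 10GbE price endpoints from the slide, 2005 -> 2010.
price_2005 = 80_000  # Chiaro, $/port
price_2010 = 400     # Arista, $/port
print(price_2005 / price_2010)  # 200.0: a 200x drop in five years
```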
28. 10G Switched Data Analysis Resource: Data Oasis (RFP Responses Due 10/29/2010)
[Diagram: a 10G switched fabric connects Data Oasis storage (existing storage plus a 1500-2000 TB procurement) to Triton, Trestles, Dash, Gordon, the OptIPuter, the RCN, CalREN, and colocation space.]
Oasis Procurement (RFP):
• Phase0: > 8 GB/s Sustained, Today
• RFP for Phase1: > 40 GB/sec for Lustre
• Nodes Must be Able to Function as Lustre OSS (Linux) or NFS (Solaris)
• Connectivity to Network is 2 x 10GbE/Node
• Likely Reserve Dollars for Inexpensive Replica Servers
Source: Philip Papadopoulos, SDSC/Calit2