Pushing Discovery with
Internet2
Cloud to Supercomputing
in Life Sciences
DAN TAYLOR
Director, Business Development, Internet2
BIO-IT WORLD 2016
BOSTON
APRIL, 2016
Internet2 Overview
• An advanced networking consortium
– Academia
– Corporations
– Government
• Operates a best-in-class national optical network
– 15,000 miles of dedicated fiber
– 100G routers and optical transport systems
– 8.8 Tbps capacity
• For over 20 years, our mission has been to
– Provide cost-effective broadband and collaboration technologies to facilitate frictionless research in Big Science – broad collaboration, extremely large data sets
– Create tomorrow’s networks & a platform for networking research
– Engage stakeholders in
• Bridging the IT/Researcher gap
• Developing new technologies critical to their missions
The 4th Gen Internet2 Network
Internet2 Network by the numbers:
• 17 Juniper MX960 nodes
• 31 Brocade and Juniper switches
• 49 custom colocation facilities
• 250+ amplification racks
• 15,717 miles of newly acquired dark fiber
• 2,400 miles of partnered capacity with Zayo Communications
• 8.8 Tbps of optical capacity
• 100 Gbps of hybrid Layer 2 and Layer 3 capacity
• 300+ Ciena ActiveFlex 6500 network elements
Technology
• A research-grade high-speed network, optimized for “elephant flows”
• Layer 1 – secure point-to-point wavelength networking
• Advanced Layer 2 Services – open virtual network for Life Sciences with connectivity speeds up to 100 Gbps
– SDN network virtualization customer trials now under way
• Advanced Layer 3 Services – high-speed IP connectivity to the world
• Superior economics
• Secure sharing of online research resources – federated identity management system

Internet2 Members and Partners
• 255 Higher Education members
• 67 Affiliate members
• 41 R&E Network members
• 82 Industry members
• 65+ Int’l partners reaching over 100 nations
• 93,000+ Community anchor institutions
Focused on member technology needs since 1996
“The idea of being able to collaborate with anybody, anywhere, without constraint…” —Jim Bottum, CIO, Clemson University
Community
Strong international partnerships
• Agreements with international networking partners offer interoperability and access
• Enable collaboration between U.S. researchers and overseas counterparts in over 100 international R&E networks
Community
Some of our Affiliate Members
• Routers – Stanford
• Computer workstations – Berkeley, Stanford
• Security systems – Univ. of Michigan; Georgia Tech
• Social media – Harvard
• Network caching – MIT
• Search – Stanford

The Route to Innovation
August 30, 2016 © 2016 Internet2
Abundant Bandwidth
• Raw capacity now available on the Internet2 Network is a key imagination enabler
• Incent disruptive use of new, advanced capabilities
Software Defined Networking
• Open up the network layer itself to innovation
• Let innovators communicate with and program the network itself
• Allow developers to optimize the network for specific applications
Science DMZ
• Architect a special solution to allow higher-performance data flows
• Include an end-to-end performance monitoring server and software
• Include an SDN server to support programmability
Life Sciences Research Today
• Sharing Big Data sets (genomic, environmental, imagery) is key to basic and applied research
• Reproducibility – need to capture methods as well as raw data
– High variability in analytic processes and instruments
– Inconsistent formats and standards
– Lack of metadata & standards
• Biological systems are immensely complicated and dynamic (S. Goff, CyVerse/iPlant)
– 21k human genes can make >100k proteins
– >50% of genes are controlled by day-night cycles
– Proteins have an average half-life of 30 hours
– Several thousand metabolites are rapidly changing
– Traits are environmentally and genetically controlled
• Information technology – high-performance computing and networking – can now explore these systems through simulation
• Collaboration
– Cross-domain, cross-discipline
– Distribution of systems and talent is global
– Resources are public, private and academic
BIO-IT Trends in the Trenches 2015
with Chris Dagdigian
Take-aways:
- Science is changing faster than the IT funding cycle for data-intensive computing environments
- Forward-looking 100G multi-site, multi-party collaborations required
- Cloud adoption driven by capability vs. cost
- Centralized data center dead; future is distributed computing/data stores
- Big pharma security challenge has been met
- SDN is real and happening now; part of the infrastructure automation wave
- Blast radius more important than ever: DOE’s Science DMZ architecture is a solution
https://youtu.be/U6i0THTxe4o
http://www.slideshare.net/chrisdag/2015-bioit-trends-from-the-frenches
2015 Bio-IT World Conference & Expo
• Change
• Networking
• Cloud
• Decentralized Collaboration
• Security
• Mission Networks
Change

Data Tsunami
• Physics – Large Hadron Collider (CERN)
• Life Sciences – Next-generation sequencers (Illumina)
Networking
2012: US–China 10 Gbps Link
NCBI/UC-Davis/BGI: first ultra-high-speed transfer of genomic data between China & US, June 2012 – Sample.fa (24 GB)
• FedEx: 2 days
• Internet + FTP: 26 hours
• China–US 10G link: 30 secs
Dr. Lin Fang, Dr. Dawei Lin
“The 10 Gigabit network connection is even faster than transferring data to most local hard drives,” said Dr. Lin [of UC, Davis]. “The use of a 10 Gigabit network connection will be groundbreaking, very much like email replacing hand-delivered mail for communication. It will enable scientists in the genomics-related fields to communicate and transfer data more rapidly and conveniently, and bring the best minds together to better explore the mysteries of life science.” (BGI press release)
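The comparison above can be reproduced with simple line-rate arithmetic. A minimal sketch (our illustration, not from the deck); real transfers add protocol and disk overhead, which is why the observed time on the 10G link (~30 s) exceeds the ideal figure computed here:

```python
# Ideal (line-rate) transfer times for a 24 GB file such as Sample.fa.
# Illustrative arithmetic only; protocol/disk overhead explains why the
# observed China-US transfer took ~30 s rather than the ~19 s ideal.

def transfer_time_seconds(size_gb: float, rate_gbps: float) -> float:
    """Size in gigabytes * 8 bits/byte, divided by line rate in Gbit/s."""
    return size_gb * 8 / rate_gbps

for label, rate_gbps in [
    ("100 Mbps broadband", 0.1),
    ("1 Gbps campus link", 1.0),
    ("10 Gbps R&E link", 10.0),
]:
    print(f"{label:>20}: {transfer_time_seconds(24, rate_gbps):>8.1f} s")
```

At 10 Gbps the ideal time for 24 GB is 19.2 s, consistent with the ~30 s measured end to end.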
Life Sciences Engagement
Community

Forward-Looking 100G Networks & Multi-Site, Multi-Party Collaboration
Accelerating Discovery: USDA ARS Science Network
USDA Agricultural Research Service Science Network
• USDA scope is far beyond human
USDA Agricultural Research Service
Use Cases
• Drought (Soil Moisture) Project – challenging volumes of data
– NASA satellite data storage – 7 TB/mo., 36-month mission
– ARS Hydrology and Remote Sensing Lab analysis – 108 TB
– Data completely re-processed 3 to 5 times
• Microbial Genomics Project – computational bottlenecks
– Individual strains of bacteria and microorganism communities related to:
food safety
animal health
feed efficiency
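The drought project's volumes compound quickly once re-processing is counted. A back-of-the-envelope sketch, applying our own arithmetic to the figures above:

```python
# Rough scale estimate for the drought (soil moisture) project.
# Input figures are from the slide; the multiplications are our own illustration.
raw_tb = 7 * 36                  # NASA satellite data: 7 TB/month over a 36-month mission
analysis_tb = 108                # ARS Hydrology and Remote Sensing Lab analysis set
reprocessed_tb = [analysis_tb * n for n in (3, 5)]  # fully re-processed 3 to 5 times

print(f"raw satellite archive: {raw_tb} TB")
print(f"data touched in re-processing: {reprocessed_tb[0]}-{reprocessed_tb[1]} TB")
```

So even before results are stored, several hundred terabytes must traverse the network between storage and compute, which is the motivation for the 100 Gb links that follow.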
ARS Big Data Initiative
• Big Data Workshop Recommendations (February 2013)
• Three Pillars of the ARS Big Data Implementation Plan – Network, HPC, Virtual Research Support (April 2014)
– Develop a Science DMZ
– Enable high-speed, low-latency transfer of research data to HPC and storage from ARS locations
– Virtual Researcher Support
• Implementation complete (Nov. 2015): Clay Center, NE; Albany, CA; Beltsville Labs/Nat’l Ag. Library, Beltsville, MD; Stoneville, MS; Ft. Collins, CO; Ames/NADC, IA
• ARS Scientific Computing Assessment – Final Report (March 2014)

SCInet Locations and Gateways
USDA Agricultural Research Service
Sites: Albany, CA; Ft. Collins, CO; Clay Center, NE; Ames, IA; Stoneville, MS; Beltsville, MD – connected at 100 Gb and 10 Gb
Cloud & Distributed Research Computing
@Scale
Community
Internet2 Approach:
• Agile scaling of resources and capacity
• Access to multi-domain, multi-discipline expertise in one dynamic global community
• Offer the researcher a bottomless toolbox for innovation
New High Speed Cloud Collaborations
10, x10G, x100G
Syngenta Science Network
Bringing Plant Potential to Life through enhanced computing capacity

Syngenta Science Network
• Syngenta is a leading agriculture company helping to improve global food security by enabling millions of farmers to make better use of available resources.
• Key research challenge: how to grow plants more efficiently?
• Internet2 members, especially land-grant universities, are important research partners.
The Challenge
– Increasing size of scientific data sets
– Growing number of useful external resources and partners
– Complexity of genomic analyses is increasing
– Need for big data collaborations across the globe
– Must innovate
– Higher data throughput
– High-speed connectivity to AWS Direct Connect for surge HPC
– Collaborations with the academic community
• High-speed connections to best-in-class supercomputing resources
– NCSA – University of Illinois: leverage NCSA expertise in building custom R&D workflows and the NCSA Industry Partnership Program
– A*STAR Supercomputing Center in Singapore: supports a global, distributed scientific computing capability
• Global scale: creating a global fabric for computing and collaboration
“I want to be 15 minutes behind NCSA and 6 months ahead of my competition” – Keith Gray, BP
National Center for Supercomputing Applications

Better Designed · More Durable · Available Sooner
Theoretical & Basic Research → Prototyping & Development → Optimization & Robustification → Commercialization
NCSA–Mayo Clinic @Scale Genome-Wide Association Study for Alzheimer’s disease
• NCSA Private Sector Program
– UIUC HPCBio
– Mayo Clinic
• The Blue Waters team and the Swiss Institute of Bioinformatics worked together to identify which genetic variants interact to influence gene expression patterns that may associate with Alzheimer’s disease
Big Data and Big Compute Problem
• 50,011,495,056 pairs of variants
• Each variant pair is tested against 181 subjects and 24,544 genic regions
• Computationally large problem: PLINK ~2 years at Mayo; FastEpistasis ~6 hours on Blue Waters
• Can be a big data problem:
- 500 PB if all results are kept
- 4 TB when using a conservative cutoff
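One way to sanity-check the 500 PB figure is to count the total number of statistical tests implied by the slide's numbers. The arithmetic below is ours, and the bytes-per-result size is an inference, not a number from the deck:

```python
# Scale of the epistasis scan: one test per (variant pair, genic region).
# Counts are from the slide; the per-result storage size is inferred.
pairs = 50_011_495_056        # variant pairs tested
regions = 24_544              # genic regions
tests = pairs * regions       # ~1.2e15 statistical tests in total

all_results_bytes = 500e15    # slide: ~500 PB if every result is kept
per_result = all_results_bytes / tests

print(f"total tests: {tests:.3e}")
print(f"implied storage per result: ~{per_result:.0f} bytes")
```

Roughly 1.2 quadrillion tests at a few hundred bytes each lands in the 500 PB range, which is why the conservative significance cutoff (keeping only ~4 TB) is essential.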
San Diego Supercomputer Center

UCSC Cancer Genomics Hub: Large Data Flows to End Users
Cumulative TBs of CGH files downloaded (chart; annotations: 1G, 8G, 15G; 30 PB)
Data source: David Haussler, Brad Smith, UCSC; Larry Smarr, Calit2
http://blogs.nature.com/news/2012/05/us-cancer-genome-repository-hopes-to-speed-research.html
SDSC Protein Data Bank Archive
• Repository of atomic coordinates and other information describing proteins and other important biological macromolecules. Structural biologists use methods such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy to determine the location of each atom relative to the others in the molecule. Information is annotated and publicly released into the archive by the wwPDB.
SDSC
• Expertise
– Bioinformatics programming and applications support
– Computational chemistry methods
– Compliance requirements, e.g., for dbGaP, FISMA and HIPAA
– Data mining techniques, machine learning and predictive analytics
– HPC and storage system architecture and design
– Scientific workflow systems and informatics pipelines
• Education and Training
– Intensive boot camps for working professionals: Data Mining, Graph Analytics, and Bioinformatics and Scientific Workflows
– Customized, on-site training sessions/programs
– Data Science Certificate program
– “Hackathon” events in data science and other topics
Sherlock Cloud: A HIPAA-Compliant Cloud
Healthcare IT Managed Services – SDSC Center of Excellence
• Expertise in systems, cyber security, data management, analytics, application development, advanced user support and project management
• Operating the first & largest FISMA data warehouse platform for Medicaid fraud, waste and abuse analysis
• Leveraged FISMA experience to offer HIPAA-compliant managed hosting for UC and academia
• Supporting HHS CMS, NIH, UCOP and other UC campuses
• Sherlock services: Data Lab, Analytics, Case Management and Compliant Cloud

Lawrence Livermore National Lab
Lawrence Livermore NL HPC Innovation Center
Cardioid
• Electrophysiology human-heart simulations allowing exploration of
– Arrhythmia
– Sudden cardiac arrest
– Predictive drug interactions
• Depicts activation of each heart muscle cell and the cell-to-cell transfer of voltage for up to 3 billion cells, in near-real time
Metagenomic analysis with Catalyst:
• Comparing short genetic fragments in a query dataset against a large searchable index of genomes (14 million genomes – 3x larger than those currently in use) to determine the threat an organism poses to human health
Community Data Science Resources
RENCI RADII and GWU HIVE
Driving Infrastructure Virtualization
Enabling Reproducibility for FDA Submissions
RADII
Resource Aware Datacentric collaboratIve Infrastructure
Goal: Make data-driven collaborations a ‘turn-key’ experience for domain researchers and a ‘commodity’ for the science community
Approach: A new cyberinfrastructure to manage data-centric collaborations based upon natural models of collaboration that occur among scientists.
RENCI: Claris Castillo, Fan Jiang, Charles Schmidt, Paul Ruth, Anirban Mandal, Shu Huang, Yufeng Xin, Ilya Baldin, Arcot Rajasekar
SDSC: Amit Majumdar
DUKE: Erich Huang
Workflows – especially data-driven workflows and workflow ensembles – are becoming a centerpiece of modern computational science.

RADII Rationale
• Multi-institutional research teams grapple with a multitude of resources
– Policy-restricted large data sets
– Campus compute resources
– National compute resources
– Instruments that produce data
• Interconnected by networks
– Campus, regional, and national providers
• Many options, much complexity
• Data and infrastructure are treated separately

RADII Creates
A cyberinfrastructure that integrates data and resource management from the ground up to support data-centric research. RADII allows scientists to easily map collaborative data-driven activities onto a dynamically configurable cloud infrastructure.
The gap
• Infrastructure management tools have no visibility into data resources.
• Data management solutions have no visibility into the infrastructure.
• The result: disjoint solutions and incompatible resource abstractions.

RADII: Foundational technologies to reduce the data-infrastructure management gap
• Data grids present distributed data under a single abstraction and authorization layer.
• Networked Infrastructure as a Service (NIaaS) enables rapid deployment of programmable virtual network infrastructure (clouds).
RADII System – Virtualizing Data, Compute, and Network for Collaboration
• Novel mechanisms to represent data-centric collaborations using the DFD (data-flow diagram) formalism
• Novel mechanisms to map data processes, computations, storage, and organizational entities onto infrastructure
• Data-centric resource management mechanisms for provisioning and de-provisioning resources dynamically throughout the lifecycle of collaborations
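The mapping step above can be pictured as walking a data-flow diagram and provisioning one resource per node plus a link per flow. A deliberately tiny sketch under that assumption (the `Resource` class, node types, and site names are hypothetical, not RADII's actual API):

```python
from dataclasses import dataclass

@dataclass
class Resource:
    kind: str   # "vm", "storage", or "link"
    site: str   # where the resource is provisioned

def provision(dfd):
    """Provision one resource per DFD node, plus a link per data flow."""
    mapping = {}
    for name, node in dfd["nodes"].items():
        kind = "storage" if node["type"] == "data_store" else "vm"
        mapping[name] = Resource(kind, node["site"])
    for src, dst in dfd["flows"]:
        # A flow between sites implies a provisioned network link
        link_site = f"{dfd['nodes'][src]['site']}<->{dfd['nodes'][dst]['site']}"
        mapping[(src, dst)] = Resource("link", link_site)
    return mapping

# Illustrative two-node collaboration: a data store feeding a compute process
dfd = {
    "nodes": {
        "genomes": {"type": "data_store", "site": "RENCI"},
        "aligner": {"type": "process",    "site": "SDSC"},
    },
    "flows": [("genomes", "aligner")],
}
m = provision(dfd)
print(m["genomes"].kind, m["aligner"].kind, m[("genomes", "aligner")].kind)
```

De-provisioning at the end of the collaboration's lifecycle would simply walk the same mapping in reverse.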
FDA and George Washington University
Big Data Decisions:
Linking Regulatory and Industry Organizations with HIVE Bio-Compute Objects
Presented by: Dan Taylor, Internet2 | Bio-IT World | Boston | 2016

HIVE: High-performance Integrated Virtual Environment
A regulatory NGS data analysis platform
From a Jan 2016 lecture by Vahan Simonyan and Raja Mazumder, NIH Frontiers in Data Science series:
https://videocast.nih.gov/summary.asp?Live=18299&bhcp=1
BIG DATA – from a range of samples and instruments to approval for use

NGS lifecycle: from a biological sample to biomedical research and regulation. Each stage brings its own bottleneck:
• Sequencing run: produced files are massive in size
• File transfer: transfer is slow
• Sample archival: too large to keep forever; not standardized
• Computation pipelines: difficult to validate
• Analysis and review: difficult to visualize and interpret
• Regulation: how do we avoid mistakes?
Software challenges and needs
• Data size: petabyte scale, soon exabytes
• Data transfer: too slow over existing networks
• Data archival: retaining consistent datasets across many years of mandated evidence maintenance is difficult
• Data standards: floating standards, a multiplicity of formats, inadequate communication protocols
• Data complexity: a sophisticated IT framework is needed for complex dataflows
• Data privacy: constrictive legal framework and ownership issues across the board, from the patient bedside to FDA regulation
• Data security: a large number of complicated security rules and data-protection requirements tax IT subsystems and cripple performance
• Computation size: distributed computing, inefficiently parallelized, requires large investments in hardware, software, and human-ware
• Computation standards: non-canonical computation protocols make it difficult to compare, reproduce, and rely on computations
• Computation complexity: significant investment of time and effort is needed to learn the appropriate skills and avoid pitfalls in complex computational pipelines
• Interpretation: large outputs from enormous computations are difficult to visualize and summarize
• Publication: peer review and audit require communicating massive amounts of information
... and how do we avoid mistakes?
HIVE is an end-to-end solution
• Data retrieval from anywhere in the world
• Storage of extra-large-scale data
• Security approved by OIM
• An integrator platform to bring different data and analytics together
• Tailor-made analytics designed around needs
• Visualization built to help in the interpretation of data
• Support for the entire hardware, software, and knowledge infrastructure
• Expertise accumulated in the agency
• A Bio-Compute objects repository to provide reproducibility, interoperability, and long-term referable storage of computations and results

HIVE is not
• an application that performs a few tasks
• yet another database
• a computer cluster, a cloud, or a data center
• an IT subsystem

More:
http://www.fda.gov/ScienceResearch/SpecialTopics/RegulatoryScience/ucm491893.htm
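A bio-compute object is, in essence, a self-describing, verifiable record of a computation: the pipeline, its exact parameters, the inputs, and the results, bundled so a reviewer can check and reproduce the work. A minimal illustrative sketch (the field names and hashing scheme here are hypothetical, not the actual HIVE/BioCompute schema):

```python
import hashlib
import json

def make_biocompute_object(pipeline, params, input_files, results):
    """Bundle a computation's provenance into one verifiable record."""
    obj = {
        "pipeline": pipeline,      # algorithm / pipeline description
        "parameters": params,      # exact settings used
        "inputs": input_files,     # references to the input data
        "results": results,        # output summary
    }
    # A content hash makes the record tamper-evident and long-term referable
    canonical = json.dumps(obj, sort_keys=True).encode()
    obj["digest"] = hashlib.sha256(canonical).hexdigest()
    return obj

# Illustrative example; tool names and numbers are made up
bco = make_biocompute_object(
    pipeline="bowtie2-align v2.2",
    params={"seed_len": 22, "mismatches": 1},
    input_files=["sample_R1.fastq", "sample_R2.fastq"],
    results={"aligned_reads": 941223},
)
print(bco["digest"][:12])
```

Because the digest covers pipeline, parameters, inputs, and results together, both the submitter and the regulator can independently verify that they are discussing the same computation.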

[Diagram: HIVE instantiation. Definitions of metadata types and of computation metadata feed a Data Typing Engine; definitions of algorithms and pipeline descriptions define bio-compute computational protocols. Typed data plus those protocols yield verifiable results within acceptable uncertainty/error, and a scientifically reliable interpretation.]
HIVE data universe

[Diagram: regulatory iterations today. Industry (1) forms data, (2) computes, and (3) submits; the FDA reviews against (4) SOPPs/protocols and issues (5) a regulatory decision for the consumer; when it finds (6) issues, industry resubmits, looping until (7) yes/no. Each iteration costs millions of dollars – roughly $800 million in R&D for a single drug, ~$2.6 billion total cost.]

[Diagram: bio-compute as a way to link regulatory and industry organizations. Industry platforms (public HIVE, Galaxy, CLC, DNA-nexus) and the FDA's HIVE exchange bio-compute objects: industry forms data, computes, and submits with bio-compute integration; the FDA applies HIVE SOPPs/protocols, computes, and reaches its regulatory decision. The shared objects facilitate integration and shorten the costly issue-and-resubmit loop.]
Federated Identity

A community-developed framework of trust enables:
• Secure, streamlined sharing of protected resources
• Consolidated management of user identities and access
• Delivery of an integrated portfolio of community-developed solutions

Trusted Identity in Research
The standard for over 600 higher education institutions—and counting!
Foundation for Trust & Identity
• 425+ academic participants
• 160+ sponsored partners
• 2,000+ registered service providers
• 7.8 million individuals served by federated IdM
Acknowledgements
• Eric Boyd, Internet2
• Stephen Wolff, Internet2
• Stephen Goff, PhD, CyVerse/iPlant, University of Arizona
• Chris Dagdigian, BioTeam
• Daiwei Lin, PhD, NIAID, NIH
• Paul Gibson, USDA ARS
• Paul Travis, Syngenta
• Evan Burness, NCSA
• Sandeep Chandra, SDSC
• Jonathan Allen, PhD, Lawrence Livermore National Lab
• Claris Castillo, PhD, RENCI
• Vahan Simonyan, PhD, FDA
• Raja Mazumder, PhD, George Washington University
• Eli Dart, ESnet, US Department of Energy
• BGI
• Nature
Thank you!
Daniel Taylor, Director, Business Development
Internet2
dbt3@internet2.edu
703-517-2566

Recommended for you

100503 bioinfo instsymp
100503 bioinfo instsymp100503 bioinfo instsymp
100503 bioinfo instsymp

BeSTGRID aims to enhance research capability in New Zealand by providing skills and infrastructure to help researchers engage with new eResearch services and kick start centralized infrastructure. Since 2006, BeSTGRID has delivered services and tools to support research collaboration on shared datasets and computational resources. BeSTGRID coordinates access to compute and data resources, provides discipline-specific services and applications, and builds a sustainable community to develop middleware, applications, and services.

eresearchbestgridbioinformatics
Application of Assent in the safe - Networkshop44
Application of Assent in the safe -  Networkshop44Application of Assent in the safe -  Networkshop44
Application of Assent in the safe - Networkshop44

The document summarizes the Safe Share project, which aims to enable the secure exchange of health data between research sites for medical research. It establishes a higher assurance network using encrypted overlays between network nodes. It also explores implementing an authentication, authorization and accounting infrastructure to allow researchers to access data and systems using their home institution credentials. Several pilot programs are underway to test the network and authentication capabilities. The overall goal is to accelerate medical research while maintaining strict security and privacy of sensitive health data.

 
by Jisc
networkshop44jisc
Utilizing Nautilus and the National Research Platform for Big Data Research a...
Utilizing Nautilus and the National Research Platformfor Big Data Research a...Utilizing Nautilus and the National Research Platformfor Big Data Research a...
Utilizing Nautilus and the National Research Platform for Big Data Research a...

Panel Presentation Larry Smarr and Grant Scott MOREnet 2022 Annual Conference October 19, 2022

big datateachingnational research platform
Back-up slides

Science DMZ

Rising expectations
Network throughput required to move y bytes in x time (US Dept of Energy, http://fasterdata.es.net).

Science DMZ* and perfSONAR
A design pattern to address the most common bottlenecks to moving data.
* fasterdata.es.net
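The fasterdata chart boils down to one formula: required rate = bytes × 8 / seconds. A quick sketch of the arithmetic (the dataset sizes are illustrative):

```python
def required_gbps(data_bytes, hours):
    """Sustained network rate needed to move data_bytes in the given time."""
    bits = data_bytes * 8
    seconds = hours * 3600
    return bits / seconds / 1e9  # gigabits per second

TB = 1e12
PB = 1e15

# Moving 1 TB overnight is modest; moving 1 PB in a day needs ~100 Gbps.
print(f"1 TB in 8 h : {required_gbps(1 * TB, 8):6.2f} Gbps")
print(f"1 PB in 24 h: {required_gbps(1 * PB, 24):6.2f} Gbps")
```

These are line rates before protocol overhead and packet loss; in practice a lossy path delivers far less, which is exactly the bottleneck the Science DMZ design pattern and perfSONAR monitoring are meant to expose and remove.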

Stereo and 3D Displays - Matt Hirsch
 
What is Media in MIT Media Lab, Why 'Camera Culture'
What is Media in MIT Media Lab, Why 'Camera Culture'What is Media in MIT Media Lab, Why 'Camera Culture'
What is Media in MIT Media Lab, Why 'Camera Culture'
 
Multiview Imaging HW Overview
Multiview Imaging HW OverviewMultiview Imaging HW Overview
Multiview Imaging HW Overview
 
Google Glass Breakdown
Google Glass BreakdownGoogle Glass Breakdown
Google Glass Breakdown
 
What is SIGGRAPH NEXT? Intro by Ramesh Raskar
What is SIGGRAPH NEXT? Intro by Ramesh RaskarWhat is SIGGRAPH NEXT? Intro by Ramesh Raskar
What is SIGGRAPH NEXT? Intro by Ramesh Raskar
 

Similar to Internet2 Bio IT 2016 v2

An Integrated West Coast Science DMZ for Data-Intensive Research
An Integrated West Coast Science DMZ for Data-Intensive ResearchAn Integrated West Coast Science DMZ for Data-Intensive Research
An Integrated West Coast Science DMZ for Data-Intensive Research
Larry Smarr
 
Building a Regional 100G Collaboration Infrastructure
Building a Regional 100G Collaboration InfrastructureBuilding a Regional 100G Collaboration Infrastructure
Building a Regional 100G Collaboration Infrastructure
Larry Smarr
 
100503 bioinfo instsymp
100503 bioinfo instsymp100503 bioinfo instsymp
Application of Assent in the safe - Networkshop44
Application of Assent in the safe -  Networkshop44Application of Assent in the safe -  Networkshop44
Application of Assent in the safe - Networkshop44
Jisc
 
Utilizing Nautilus and the National Research Platform for Big Data Research a...
Utilizing Nautilus and the National Research Platformfor Big Data Research a...Utilizing Nautilus and the National Research Platformfor Big Data Research a...
Utilizing Nautilus and the National Research Platform for Big Data Research a...
Larry Smarr
 
NIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWGNIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWG
Geoffrey Fox
 
The Pacific Research Platform: A Science-Driven Big-Data Freeway System
The Pacific Research Platform: A Science-Driven Big-Data Freeway SystemThe Pacific Research Platform: A Science-Driven Big-Data Freeway System
The Pacific Research Platform: A Science-Driven Big-Data Freeway System
Larry Smarr
 
High Performance Cyberinfrastructure for Data-Intensive Research
High Performance Cyberinfrastructure for Data-Intensive ResearchHigh Performance Cyberinfrastructure for Data-Intensive Research
High Performance Cyberinfrastructure for Data-Intensive Research
Larry Smarr
 
SKA NZ R&D BeSTGRID Infrastructure
SKA NZ R&D BeSTGRID InfrastructureSKA NZ R&D BeSTGRID Infrastructure
SKA NZ R&D BeSTGRID Infrastructure
Nick Jones
 
The Pacific Research Platform- a High-Bandwidth Distributed Supercomputer
The Pacific Research Platform-a High-Bandwidth Distributed SupercomputerThe Pacific Research Platform-a High-Bandwidth Distributed Supercomputer
The Pacific Research Platform- a High-Bandwidth Distributed Supercomputer
Larry Smarr
 
The BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchThe BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative research
Blue BRIDGE
 
Accelerating Science, Technology and Innovation Through Open Data and Open Sc...
Accelerating Science, Technology and Innovation Through Open Data and Open Sc...Accelerating Science, Technology and Innovation Through Open Data and Open Sc...
Accelerating Science, Technology and Innovation Through Open Data and Open Sc...
African Open Science Platform
 
The Pacific Research Platform: A Science-Driven Big-Data Freeway System
The Pacific Research Platform: A Science-Driven Big-Data Freeway SystemThe Pacific Research Platform: A Science-Driven Big-Data Freeway System
The Pacific Research Platform: A Science-Driven Big-Data Freeway System
Larry Smarr
 
Big Data
Big Data Big Data
Democratizing Science through Cyberinfrastructure - Manish Parashar
Democratizing Science through Cyberinfrastructure - Manish ParasharDemocratizing Science through Cyberinfrastructure - Manish Parashar
Democratizing Science through Cyberinfrastructure - Manish Parashar
Larry Smarr
 
SomeSlides
SomeSlidesSomeSlides
SomeSlides
guestd60742
 
Shared services - the future of HPC and big data facilities for UK research
Shared services - the future of HPC and big data facilities for UK researchShared services - the future of HPC and big data facilities for UK research
Shared services - the future of HPC and big data facilities for UK research
Martin Hamilton
 
A National Big Data Cyberinfrastructure Supporting Computational Biomedical R...
A National Big Data Cyberinfrastructure Supporting Computational Biomedical R...A National Big Data Cyberinfrastructure Supporting Computational Biomedical R...
A National Big Data Cyberinfrastructure Supporting Computational Biomedical R...
Larry Smarr
 
Ticer summer school_24_aug06
Ticer summer school_24_aug06Ticer summer school_24_aug06
Ticer summer school_24_aug06
SayDotCom.com
 
ACC-2012, Bangalore, India, 28 July, 2012
ACC-2012, Bangalore, India, 28 July, 2012ACC-2012, Bangalore, India, 28 July, 2012
ACC-2012, Bangalore, India, 28 July, 2012
Charith Perera
 

Similar to Internet2 Bio IT 2016 v2 (20)

An Integrated West Coast Science DMZ for Data-Intensive Research
An Integrated West Coast Science DMZ for Data-Intensive ResearchAn Integrated West Coast Science DMZ for Data-Intensive Research
An Integrated West Coast Science DMZ for Data-Intensive Research
 
Building a Regional 100G Collaboration Infrastructure
Building a Regional 100G Collaboration InfrastructureBuilding a Regional 100G Collaboration Infrastructure
Building a Regional 100G Collaboration Infrastructure
 
100503 bioinfo instsymp
100503 bioinfo instsymp100503 bioinfo instsymp
100503 bioinfo instsymp
 
Application of Assent in the safe - Networkshop44
Application of Assent in the safe -  Networkshop44Application of Assent in the safe -  Networkshop44
Application of Assent in the safe - Networkshop44
 
Utilizing Nautilus and the National Research Platform for Big Data Research a...
Utilizing Nautilus and the National Research Platformfor Big Data Research a...Utilizing Nautilus and the National Research Platformfor Big Data Research a...
Utilizing Nautilus and the National Research Platform for Big Data Research a...
 
NIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWGNIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWG
 
The Pacific Research Platform: A Science-Driven Big-Data Freeway System
The Pacific Research Platform: A Science-Driven Big-Data Freeway SystemThe Pacific Research Platform: A Science-Driven Big-Data Freeway System
The Pacific Research Platform: A Science-Driven Big-Data Freeway System
 
High Performance Cyberinfrastructure for Data-Intensive Research
High Performance Cyberinfrastructure for Data-Intensive ResearchHigh Performance Cyberinfrastructure for Data-Intensive Research
High Performance Cyberinfrastructure for Data-Intensive Research
 
SKA NZ R&D BeSTGRID Infrastructure
SKA NZ R&D BeSTGRID InfrastructureSKA NZ R&D BeSTGRID Infrastructure
SKA NZ R&D BeSTGRID Infrastructure
 
The Pacific Research Platform- a High-Bandwidth Distributed Supercomputer
The Pacific Research Platform-a High-Bandwidth Distributed SupercomputerThe Pacific Research Platform-a High-Bandwidth Distributed Supercomputer
The Pacific Research Platform- a High-Bandwidth Distributed Supercomputer
 
The BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchThe BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative research
 
Accelerating Science, Technology and Innovation Through Open Data and Open Sc...
Accelerating Science, Technology and Innovation Through Open Data and Open Sc...Accelerating Science, Technology and Innovation Through Open Data and Open Sc...
Accelerating Science, Technology and Innovation Through Open Data and Open Sc...
 
The Pacific Research Platform: A Science-Driven Big-Data Freeway System
The Pacific Research Platform: A Science-Driven Big-Data Freeway SystemThe Pacific Research Platform: A Science-Driven Big-Data Freeway System
The Pacific Research Platform: A Science-Driven Big-Data Freeway System
 
Big Data
Big Data Big Data
Big Data
 
Democratizing Science through Cyberinfrastructure - Manish Parashar
Democratizing Science through Cyberinfrastructure - Manish ParasharDemocratizing Science through Cyberinfrastructure - Manish Parashar
Democratizing Science through Cyberinfrastructure - Manish Parashar
 
SomeSlides
SomeSlidesSomeSlides
SomeSlides
 
Shared services - the future of HPC and big data facilities for UK research
Shared services - the future of HPC and big data facilities for UK researchShared services - the future of HPC and big data facilities for UK research
Shared services - the future of HPC and big data facilities for UK research
 
A National Big Data Cyberinfrastructure Supporting Computational Biomedical R...
A National Big Data Cyberinfrastructure Supporting Computational Biomedical R...A National Big Data Cyberinfrastructure Supporting Computational Biomedical R...
A National Big Data Cyberinfrastructure Supporting Computational Biomedical R...
 
Ticer summer school_24_aug06
Ticer summer school_24_aug06Ticer summer school_24_aug06
Ticer summer school_24_aug06
 
ACC-2012, Bangalore, India, 28 July, 2012
ACC-2012, Bangalore, India, 28 July, 2012ACC-2012, Bangalore, India, 28 July, 2012
ACC-2012, Bangalore, India, 28 July, 2012
 

Internet2 Bio IT 2016 v2

  • 1. Pushing Discovery with Internet2 Cloud to Supercomputing in Life Sciences DAN TAYLOR Director, Business Development, Internet2 BIO-IT WORLD 2016 BOSTON APRIL, 2016
  • 2. Internet2 Overview • An advanced networking consortium – Academia – Corporations – Government • Operates a best-in-class national optical network – 15,000 miles of dedicated fiber – 100G routers and optical transport systems – 8.8 Tbps capacity • For over 20 years, our mission has been to – Provide cost-effective broadband and collaboration technologies to facilitate frictionless research in Big Science – broad collaboration, extremely large data sets – Create tomorrow’s networks & a platform for networking research – Engage stakeholders in • Bridging the IT/Researcher gap • Developing new technologies critical to their missions
  • 3. [ 3 ] The 4th Gen Internet2 Network Internet2 Network by the numbers 17 Juniper MX960 nodes 31 Brocade and Juniper switches 49 custom colocation facilities 250+ amplification racks 15,717 miles of newly acquired dark fiber 2,400 miles of partnered capacity with Zayo Communications 8.8 Tbps of optical capacity 100 Gbps of hybrid Layer 2 and Layer 3 capacity 300+ Ciena ActiveFlex 6500 network elements
  • 4. Technology • A research-grade high speed network – optimized for “elephant flows” • Layer 1 – secure point-to-point wavelength networking • Advanced Layer 2 Services – open virtual network for Life Sciences with connectivity speeds up to 100 Gbps • SDN network virtualization customer trials now • Advanced Layer 3 Services – high speed IP connectivity to the world • Superior economics • Secure sharing of online research resources via a federated identity management system
  • 5. [ 5 ] Internet2 Members and Partners 255 Higher Education members 67 Affiliate members 41 R&E Network members 82 Industry members 65+ Int’l partners reaching over 100 Nations 93,000+ Community anchor institutions Focused on member technology needs since 1996 "The idea of being able to collaborate with anybody, anywhere, without constraint…" —Jim Bottum, CIO, Clemson University Community
  • 6. Strong international partnerships • Agreements with international networking partners offer interoperability and access • Enable collaboration between U.S. researchers and overseas counterparts in over 100 international R&E networks Community
  • 7. Some of our Affiliate Members 7
  • 8. [ 8 ] Routers – Stanford; Computer Workstations – Berkeley, Stanford; Security Systems – Univ of Michigan, Georgia Tech; Social Media – Harvard; Network Caching – MIT; Search – Stanford
  • 9. [ 9 ] The Route to Innovation • Abundant Bandwidth – Raw capacity now available on the Internet2 Network is a key imagination enabler – Incent disruptive use of new, advanced capabilities • Software Defined Networking – Open up the network layer itself to innovation – Let innovators communicate with and program the network itself – Allow developers to optimize the network for specific applications • Science DMZ – Architect a special solution to allow higher-performance data flows – Include an end-to-end performance monitoring server and software – Include an SDN server to support programmability
  • 10. Life Sciences Research Today • Sharing Big Data sets (genomic, environmental, imagery) key to basic and applied research • Reproducibility - need to capture methods as well as raw data – High variability in analytic processes and instruments – Inconsistent formats and standards • Lack of metadata & standards • Biological systems are immensely complicated and dynamic (S. Goff, CyVERSE/iPlant) • 21k human genes can make >100k proteins • >50% of genes are controlled by day-night cycles • Proteins have an average half-life of 30 hours • Several thousand metabolites are rapidly changing • Traits are environmentally and genetically controlled • Information Technology - High Performance Computing and Networking - now can explore these systems through simulation • Collaboration – Cross Domain, Cross Discipline – Distribution of systems and talent is global – Resources are public, private and academic
  • 11. BIO-IT Trends in the Trenches 2015 with Chris Dagdigian Takeaways - Science is changing faster than the IT funding cycle for data intensive computing environments - Forward looking 100G multi-site, multi-party collaborations required - Cloud adoption driven by capability vs cost - Centralized data center dead; future is distributed computing/data stores - Big pharma security challenge has been met - SDN is real and happening now; part of the infrastructure automation wave - Blast radius more important than ever: DOE’s Science DMZ architecture is a solution https://youtu.be/U6i0THTxe4o http://www.slideshare.net/chrisdag/2015-bioit-trends-from-the-frenches 2015 Bio-IT World Conference & Expo • Change • Networking • Cloud • Decentralized Collaboration • Security • Mission Networks
  • 13. Data Tsunami • Physics – Large Hadron Collider (CERN) • Life Sciences – Next Generation Sequencers (Illumina)
  • 15. 2012: US–China 10 Gbps Link • Sample.fa (24 GB) • FedEx: 2 days • Internet + FTP: 26 hours • China–US 10G link: 30 secs • Dr. Lin Fang, Dr. Dawei Lin
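The arithmetic behind that comparison can be sketched in a few lines of Python. The helper below is a hypothetical illustration, not part of any Internet2 tooling; it computes the ideal wire time for the slide's 24 GB sample.fa.

```python
def transfer_time_seconds(size_bytes: float, link_bps: float) -> float:
    """Ideal wire time to move size_bytes over a link of link_bps."""
    return size_bytes * 8 / link_bps

SAMPLE_FA = 24 * 10**9  # the 24 GB sample.fa from the demo

# 24 GB at a full 10 Gbps is ~19.2 s of pure wire time; the observed
# ~30 s therefore implies roughly 64% end-to-end efficiency, which is
# very good for a trans-Pacific TCP transfer.
ideal = transfer_time_seconds(SAMPLE_FA, 10 * 10**9)
print(round(ideal, 1))  # 19.2
```

The gap between ideal and observed time is exactly what perfSONAR-style end-to-end monitoring is meant to expose.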
  • 16. NCBI/UC Davis/BGI: First ultra high speed transfer of genomic data between China & US, June 2012 “The 10 Gigabit network connection is even faster than transferring data to most local hard drives,” said Dr. Lin [of UC Davis]. “The use of a 10 Gigabit network connection will be groundbreaking, very much like email replacing hand delivered mail for communication. It will enable scientists in the genomics-related fields to communicate and transfer data more rapidly and conveniently, and bring the best minds together to better explore the mysteries of life science.” (BGI press release) Life Sciences Engagement Community
  • 17. Forward Looking 100G Networks & Multi-Site, Multi-Party Collaboration – Accelerating Discovery: USDA ARS Science Network
  • 18. [ 18 ] USDA Agriculture Research Services Science Network • USDA scope is far beyond human
  • 19. USDA Agricultural Research Services Use Cases • Drought (Soil Moisture) Project – Challenging Volumes of Data – NASA satellite data storage: 7 TB/mo over a 36-month mission – ARS Hydrology and Remote Sensing Lab analysis: 108 TB – Data completely re-processed 3 to 5 times • Microbial Genomics Project – Computational Bottlenecks – Individual strains of bacteria and microorganism communities related to Food Safety, Animal Health, Feed Efficiency
  • 20. [ 20 ] ARS Big Data Initiative Big Data Workshop Recommendations, (February 2013) Three Pillars of the ARS Big Data Implementation Plan – Network, HPC, Virtual Research Support (April, 2014) • Develop a Science DMZ • Enable high-speed, low-latency transfer of research data to HPC and storage from ARS locations • Virtual Researcher Support Implementation Complete (Nov. 2015) Clay Center, NE; Albany, CA; Beltsville Labs/Nat’l Ag. Library, Beltsville, MD Stoneville, MS; Ft. Collins, CO Ames/NADC, IA • ARS Scientific Computing Assessment • Final Report March 2014
  • 21. SCInet Locations and Gateways USDA AGRICULTURAL RESEARCH SERVICE Albany, CA Ft. Collins, CO Clay Center, NE Ames, IA Stoneville, MS Beltsville, MD 100 Gb 100 Gb 100 Gb 10 Gb 10 Gb 10 Gb
  • 22. Cloud & Distributed Research Computing @Scale [ 22 ] Community Internet2 Approach: Agile scaling of resources and capacity • Access to multi-domain, multi-discipline expertise in one dynamic global community • Offer the researcher a bottomless toolbox for innovation
  • 23. [ 23 ] New High Speed Cloud Collaborations x10G, x100G
  • 24. Syngenta Science Network Bringing Plant Potential to Life through enhanced computing capacity
  • 25. Syngenta Science Network • Syngenta is a leading agriculture company helping to improve global food security by enabling millions of farmers to make better use of available resources. • Key research challenge: How to grow plants more efficiently? • Internet2 members, especially land grant universities, are important research partners.
  • 26. The Challenge – Increasing size of scientific data sets – Growing number of useful external resources and partners – Complexity of genomic analyses is increasing – Need for big data collaborations across the globe – Must Innovate
  • 27. – Higher data throughput – High speed connectivity to AWS Direct Connect – Surge HPC collaborations with the academic community – High speed connections to best-in-class supercomputing resources • NCSA – University of Illinois: leverage NCSA expertise in building custom R&D workflows; leverage the NCSA Industry Partnership Program • A*STAR Supercomputing Center in Singapore: supports a global, distributed, scientific computing capability – Global scale: creating a global fabric for computing and collaboration
  • 28. “I want to be 15 minutes behind NCSA and 6 months ahead of my competition” - Keith Gray, BP [ 28 ] National Center for Supercomputing Applications
  • 29. [ 29 ] *Better Designed* *More Durable* *Available Sooner* Theoretical & Basic Research Prototyping & Development Optimization & Robustification Commercialization
  • 30. [ 30 ] NCSA Mayo Clinic @Scale Genome-Wide Association Study for Alzheimer’s disease • NCSA Private Sector Program – UIUC HPCBio – Mayo Clinic • The Blue Waters team and the Swiss Institute of Bioinformatics worked together to identify which genetic variants interact to influence gene expression patterns that may associate with Alzheimer’s disease
  • 31. [ 31 ] Big Data and Big Compute Problem • 50,011,495,056 pairs of variants • Each variant pair is tested against 181 subjects and 24,544 genic regions • Computationally large problem: PLINK ~2 years at Mayo; FastEpistasis ~6 hours on Blue Waters • Can be a big data problem: 500 PB if all results are kept; 4 TB when using a conservative cutoff
  • 33. UCSC Cancer Genomics Hub: Large Data Flows to End Users (chart: Cumulative TBs of CGH Files Downloaded; annotations 1G, 8G, 15G; 30 PB) Data Source: David Haussler, Brad Smith, UCSC; Larry Smarr, Calit2 http://blogs.nature.com/news/2012/05/us-cancer-genome-repository-hopes-to-speed-research.html
  • 34. [ 34 ] SDSC Protein Data Bank Archive • Repository of atomic coordinates and other information describing proteins and other important biological macromolecules. Structural biologists use methods such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy to determine the location of each atom relative to the others in the molecule. Information is annotated and publicly released into the archive by the wwPDB.
  • 35. SDSC • Expertise – Bioinformatics programming and applications support. – Computational chemistry methods. – Compliance requirements, e.g., for dbGaP, FISMA and HIPAA. – Data mining techniques, machine learning and predictive analytics. – HPC and storage system architecture and design. – Scientific workflow systems and informatics pipelines. • Education and Training – Intensive boot camps for working professionals - Data Mining, Graph Analytics, and Bioinformatics and Scientific Workflows. – Customized, on-site training sessions/programs. – Data Science Certificate program. – “Hackathon” events in data science and other topics.
  • 36. Sherlock Cloud: A HIPAA-Compliant Cloud • Healthcare IT Managed Services - SDSC Center of Excellence • Expertise in Systems, Cyber Security, Data Management, Analytics, Application Development, Advanced User Support and Project Management • Operating the first & largest FISMA Data Warehouse platform for Medicaid fraud, waste and abuse analysis • Leveraged FISMA experience to offer HIPAA-Compliant managed hosting for UC and academia • Supporting HHS CMS, NIH, UCOP and other UC Campuses • Sherlock services: Data Lab, Analytics, Case Management and Compliant Cloud
  • 38. Lawrence Livermore National Lab HPC Innovation Center • Cardioid Electrophysiology: human heart simulations allowing exploration of the causes of arrhythmia, sudden cardiac arrest, and predictive drug interactions. Depicts activation of each heart muscle cell and the cell-to-cell transfer of the voltage of up to 3 billion cells - in near-real time. • Metagenomic analysis with Catalyst: comparing short genetic fragments in a query dataset against a large searchable index of genomes (14 million genomes - 3x larger than those currently in use) to determine the threat an organism poses to human health
  • 39. Community Data Science Resources • RENCI RADII – Driving Infrastructure Virtualization • GWU HIVE – Enabling Reproducibility for FDA Submissions [ 39 ]
  • 40. RADII – Resource Aware Datacentric collaboratIve Infrastructure • Goal: Make data-driven collaborations a ‘turn-key’ experience for domain researchers and a ‘commodity’ for the science community • Approach: A new cyber-infrastructure to manage data-centric collaborations based upon natural models of the collaborations that occur among scientists. • RENCI: Claris Castillo, Fan Jiang, Charles Schmidt, Paul Ruth, Anirban Mandal, Shu Huang, Yufeng Xin, Ilya Baldin, Arcot Rajasekar • SDSC: Amit Majumdar • DUKE: Erich Huang • Workflows - especially data-driven workflows and workflow ensembles - are becoming a centerpiece of modern computational science.
  • 41. RADII Rationale • Multi-institutional research teams grapple with multitude of resources – Policy-restricted large data sets – Campus compute resources – National compute resources – Instruments that produce data • Interconnected by networks – Campus, regional, national providers • Many options, much complexity • Data and infrastructure are treated separately RADII Creates A cyberinfrastructure that integrates data and resource management from the ground up to support data-centric research. RADII allows scientists to easily map collaborative data-driven activities onto a dynamically configurable cloud infrastructure.
  • 42. RADII: Foundational technologies • Infrastructure management has no visibility into data resources; data management solutions have no visibility into the infrastructure - disjoint solutions, incompatible resource abstractions • Data grids present distributed data under a single abstraction and authorization layer • Networked Infrastructure as a Service (NIaaS) for rapid deployment of programmable virtual network infrastructure (clouds) • Goal: reduce the data-infrastructure management gap
  • 43. RADII System – Virtualizing Data, Compute and Network for Collaboration • Novel mechanisms to represent data-centric collaborations using DFD formalism • Data-centric resource management mechanisms for provisioning and de-provisioning resources dynamically throughout the lifecycle of collaborations • Novel mechanisms to map data processes, computations, storage and organization entities onto infrastructure
  • 44. FDA and George Washington University Big Data Decisions: Linking Regulatory and Industry Organizations with HIVE Bio-Compute Objects [ 44 ] Presented by: Dan Taylor, Internet2 | Bio IT | Boston | 2016
  • 45. HIVE – High-performance Integrated Virtual Environment: a regulatory NGS data analysis platform. From Jan 2016: Vahan Simonyan, Raja Mazumder lecture, NIH Frontiers in Data Science Series https://videocast.nih.gov/summary.asp?Live=18299&bhcp=1
  • 46. BIG DATA – from a range of samples and instruments to approval for use. NGS lifecycle: from a biological sample to biomedical research and regulation • sequencing run – produced files are massive in size • file transfer – transfer is slow • archival – too large to keep forever; not standardized • computation pipelines – difficult to validate • analysis and review – difficult to visualize and interpret • regulation – how do we avoid mistakes?
  • 47. Software challenges and needs • Data Size: petabyte scale, soon exabytes • Data Transfer: too slow over existing networks • Data Archival: retaining consistent datasets across many years of mandated evidence maintenance is difficult • Data Standards: floating standards, multiplicity of formats, inadequate communication protocols • Data Complexity: sophisticated IT framework needed for complex dataflow • Data Privacy: constrictive legal framework and ownership issues across the board from the patient bedside to FDA regulation • Data Security: a large number of complicated security rules and data protection requirements tax IT subsystems and cripple performance • Computation Size: distributed computing, inefficiently parallelized, requires large investment in hardware, software and human-ware • Computation Standards: non-canonical computation protocols; difficult to compare, reproduce, and rely on computations • Computation Complexity: significant investment of time and effort to learn appropriate skills and avoid pitfalls in complex computational pipelines • Interpretation: large outputs from enormous computations are difficult to visualize and summarize • Publication: peer review and audit require communication of massive amounts of information ... and how do we avoid mistakes?
  • 48. HIVE is an End to End Solution • Data retrieval from anywhere in the world • Storage of extra large scale data • Security approved by OIM • Integrator platform to bring different data and analytics together • Tailor-made analytics designed around needs • Visualization made to help in interpretation of data • Support of the entire hardware, software and knowledge infrastructure • Expertise accumulated in the agency • Bio-Compute objects repository to provide reproducibility, interoperability and long term referable storage of computations and results HIVE is not • an application to perform a few tasks • yet another database • a computer cluster or a cloud or a data center • an IT subsystem More: http://www.fda.gov/ScienceResearch/SpecialTopics/RegulatoryScience/ucm491893.htm
  • 49. HIVE data universe • Data Type Definitions – definitions of metadata types • Data Typing Engine • Definitions of computations metadata • Data • Bio-compute – definitions of algorithms and pipeline descriptions • Computational protocols • Verifiable results within acceptable uncertainty/error • Scientifically reliable interpretation • Instantiation
  • 50. Regulatory iterations • industry: 1. data-forming 2. compute 3. submit • FDA regulatory analysis: 4. SOPP/protocols 5. regulatory decision 6. issues resubmits 7. yes / no • consumer • $ millions of dollars per iteration • ~$800 Million R&D dollars for a single drug • ~$2.6 Billion total cost
  • 51. Bio-compute as a way to link regulatory and industry organizations • industry (HIVE, public-HIVE, Galaxy, CLC, DNA-nexus): 1. data-forming 2. compute 3. submit bio-compute • FDA: 2. HIVE SOPP/protocols 3. compute 4. SOPP/protocols 4. submit 5. bio-compute integration 6. issues resubmits 7. yes / no • consumer • Facilitate integration • $ millions of dollars
  • 53. [ 53 ] Community-developed framework of trust enables: • Secure, streamlined sharing of protected resources • Consolidated management of user identities and access • Delivery of an integrated portfolio of community-developed solutions [ 53 ] Trusted Identity in Research The standard for over 600 higher education institutions—and counting!
  • 54. [ 54 ] Foundation for Trust & Identity ® • 425+ Academic Participants • 160+ Sponsored Partners • 2000+ Registered Service Providers • 7.8 million Individuals served by federated IdM
  • 55. Acknowledgements [ 55 ] • Eric Boyd, Internet2 • Stephen Wolff, Internet2 • Stephen Goff, PhD, CyVERSE/iPlant, University of Arizona • Chris Dagdigian, BioTeam • Dawei Lin, PhD, NIAID, NIH • Paul Gibson, USDA ARS • Paul Travis, Syngenta • Evan Burness, NCSA • Sandeep Chandra, SDSC • Jonathan Allen, PhD, Lawrence Livermore National Lab • Claris Castillo, PhD, RENCI • Vahan Simonyan, PhD, FDA • Raja Mazumder, PhD, George Washington University • Eli Dart, ESnet, US Department of Energy • BGI • Nature
  • 56. Thank you! Daniel Taylor, Director, Business Development Internet2 dbt3@internet2.edu 703-517-2566
  • 58. [ 58 ] Rising expectations • Chart: network throughput required to move y bytes in x time (US Dept of Energy - http://fasterdata.es.net); annotations: “should be easy”, “This year”
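The ESnet "throughput required to move y bytes in x time" chart reduces to a single formula. A hypothetical helper in the same spirit (names are illustrative, not from the slide):

```python
def required_gbps(size_bytes: float, hours: float) -> float:
    """Sustained throughput (Gbps) needed to move size_bytes in hours."""
    return size_bytes * 8 / (hours * 3600) / 1e9

# Moving 1 TB in an hour needs ~2.2 Gbps sustained, well beyond a
# shared campus uplink; this gap is the Science DMZ's motivation.
print(round(required_gbps(1e12, 1), 2))  # 2.22

# Cross-check against the US-China demo: 24 GB in 30 seconds is a
# sustained rate of 6.4 Gbps.
print(round(required_gbps(24e9, 30 / 3600), 1))  # 6.4
```

Sustained is the key word: bursts to line rate do not help if loss, latency, or mid-path firewalls keep the long-run average far lower.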
  • 59. Science DMZ* and perfSONAR • Design pattern to address the most common bottlenecks to moving data • * fasterdata.es.net

Editor's Notes

  1. Greetings I’m Dan Taylor from Internet2 – thanks for joining us. I’m going to talk a bit about internet2 and the work we’re doing with clouds and other compute resources in our community. There are a lot of slides and I’ll move quickly so pls stop by our booth or download the slides if you have questions.
  2. Internet2 is the research and education network for the US. We're a membership consortium of academia, government, and corporations: 221 U.S. universities, in cooperation with 45 leading corporations, 66 government agencies, laboratories, and other institutions of higher learning, 35 regional and state research and education networks, and more than 100 national research and education networking organizations representing over 50 countries. Internet2 actively engages our stakeholders in the development of important new technologies (middleware, security, network research, and performance measurement capabilities) that are critical to the mission goals of our members. Throughout our first 15 years, Internet2 has served a unique role among networking organizations, pioneering the use of advanced network applications and technologies and facilitating their development to support the work of the research community. Internet2 operates an advanced national optical network based on 17,500 miles of dedicated fiber, utilizing the latest 100G routers and optical transport systems with 8.8 Tbps of system capacity.
  3. Goal: deepen, extend, advance, and sustain the digital resources ecosystem. Value: a growing portfolio of resources and services, including advanced computing, high-end visualization, data analysis, and other resources and services, with interoperability with other infrastructures.
  4. Membership numbers as of 2014-03-27. Campus Champions: 200 at 175 institutions. 14,000 participants in training workshops (online and in person).
  5. Absolutely key to our success is the global partnerships we have formed. Internet2 partners with over 50 national research and education networks, including our friends in Canada, to enable connectivity to more than 100 international networks. These partnerships provide the basis for understanding how to facilitate collaborations between the US Internet2 community and counterparts in other countries. Our global partnerships have yielded important developments in new technologies. For example, the DICE collaborative is a partnership between GEANT, Internet2, CANARIE, and ESnet which provides a joint forum for North American and European investment in advanced networking leadership. Our collaboration has led to the development of world-leading tools like perfSONAR and dynamic circuit networking, which I will touch on later. Our focus in 2010 is to deliver direct services to our members as a result of our development investments.
  6. Our community has a track record of IT successes; we haven't looked at life sciences yet, but I'm pretty sure the Internet2 community's impact is even greater there.
  7. R&E must keep constructing the conditions that spur innovation. Give innovators an environment where they're free to try new, untested, unpopular, ridiculously challenging things. Innovation requires a big playground. An innovation platform must encourage utilization, not limit it.
  8. Life sciences research shares many of the trends we see elsewhere in big science (data set sizes growing rapidly, increased need for collaboration), but we also see a new ecosystem fueling research. At the same time, however, diminishing R&D dollars are pressuring industry and government.
  9. Chris Dagdigian does a great job detailing how IT deals with the changes in life sciences research. I have a couple of takeaways from his talks; it's useful to see how Internet2 addresses what's going on. Scientific instrument technology, which generates scientific data, is changing faster than the IT refresh cycle. Organizations see the big data wave coming and are now implementing 100G networks to get ahead of the rising tide. Organizations are going to the cloud to be able to do things they can't do on their own, not just to save money. Centralization will not eliminate the need to move data. Security concerns with high-speed transfers and collaboration can be addressed. Virtualized infrastructure is moving to the wide area. Big science flows are more disruptive than ever to enterprise networks; there's a trend toward separating business and research networks.
  10. One of the things we're used to in the R&E community is change; in this case, the growth of scientific data.
  11. The Internet2 community has dealt with the data tsunami for many years now. The LHC shut down for two years to upgrade its power; annual output has jumped from 13 to 30 petabytes a year. This data is distributed throughout the world by the R&E networks. In life sciences the driver is NGS, falling rapidly in price, with a proliferation of devices generating data all over the world. http://www.nature.com/news/large-hadron-collider-the-big-reboot-1.16095
  12. Our network has responded
  13. Back in 2012 we showed how a 10G link from Beijing to UC Davis could change the game. A 24 GB file that would take 26 hours to traverse the commodity internet was transferred in 30 seconds.
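The figures quoted in that demo are easy to sanity-check; a small sketch using only the numbers above:

```python
# Effective throughput implied by the Beijing-to-UC Davis demo figures.

def achieved_mbps(data_bytes, seconds):
    """Effective throughput in Mbps for a transfer of data_bytes in `seconds`."""
    return data_bytes * 8 / seconds / 1e6

size = 24e9  # the 24 GB file

print(round(achieved_mbps(size, 30) / 1000, 1))   # 6.4 (Gbps achieved on the 10G link)
print(round(achieved_mbps(size, 26 * 3600), 2))   # 2.05 (Mbps on the commodity path)
```

A roughly 3000x difference in effective throughput, from the same file and endpoints.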
  14. Researchers likened the difference in collaboration to going from letters to email.
  15. So we're seeing organizations get ahead of the tsunami by acquiring bigger networks. I recently helped the Department of Agriculture's Agricultural Research Service do just this.
  16. I like to show this slide to illustrate how much life there is beyond humans, and USDA ARS has to deal with many of these species and how they impact our world. It shows the genome sizes of various species, with the x axis on a log scale. Humans are there at the top, one of a number of mammals the USDA is interested in. But they are also interested in birds, crustaceans, fish, fungi, algae, bacteria, and protozoans, and of course plants. And some are extremely complex: you can see the wheat genome is several times larger than the human genome.
  17. Beyond genomics, these kinds of projects create huge volumes of data as well as computational bottlenecks.
  18. To attack this problem, they gathered requirements in 2013 and hired BioTeam to do an assessment, and we completed a six-node science network of 10G and 100G links by the end of 2015. That was fast!
  19. R&E collaborations are handled over the 100G links on the coasts, and another 100G feeds the new HPC center in Ames, Iowa.
  20. You can view Internet2 as the medium for all the data and computing resources, forming a problem solving community around these high speed connections
  21. Syngenta, a life sciences company, is a great example of an organization making the most of these connections.
  22. They are an agribusiness with a mission to improve plant productivity. They stay on the leading edge of science through their internal research and their collaboration with the academic community.
  23. Syngenta was challenged by many of the issues USDA saw, but on a global scale and with even more pressure to innovate.
  24. We installed a 10G Layer 2 service that provides high-speed Direct Connect access to AWS, where they can do surge HPC and retrieve sequencing data outsourced to the academic community. They can also connect to NCSA to build and run custom pipelines, and use the connection to work with the A*STAR supercomputer center in Singapore, where they intend to build an Asian genomics center. Finally, we expect to bring up locations in Switzerland and Great Britain, completing a global research network.
  25. I just mentioned NCSA, and this resource deserves a few seconds. NCSA does a lot of work with industry, and a comment from a VP at BP says it all.
  26. Leveraging its talent and one of the fastest computers in the world, NCSA provides companies with a full range of services to help them innovate.
  27. They do a lot of work in the life sciences; the one I'll note here is an Alzheimer's GWAS study with the Mayo Clinic.
  28. In this one they handled an enormous amount of data and strong-armed the computational challenge: what would have taken two years at Mayo was done in six hours on Blue Waters.
  29. Another incredible resource in the community is SDSC
  30. You may know them as the home of CGHub, which holds The Cancer Genome Atlas. Note the growth from 1 Gbps to 15 Gbps over 2012-2015. CGHub is a large-scale data repository and portal for the National Cancer Institute's cancer genome research programs. Current capacity is 5 petabytes, scalable to 20 petabytes. The Cancer Genome Atlas, one data collection of many in CGHub, by itself could produce 10 PB in the next four years. As an illustration of how Internet2 is making network resources accessible, consider the UCSC Cancer Genomics Hub, operated by the University of California at Santa Cruz and located at the San Diego Supercomputer Center co-location facility. Without the "big pipes" provided between SDSC and Internet2, CGHub would not be able to keep pace with demand for its data. As both users and data in the repository grew over a three-year period, the bandwidth needed to support the activity grew by 15x.
  31. SDSC also hosts other important data resources like the Protein Data Bank archive.
  32. They also have consulting services very much focused on supporting life sciences research.
  33. I’d also note the cloud environment they built for HHS CMS – FISMA compliant and HIPAA ready.
  34. The National Labs are also a huge part of the community
  35. Whenever I run into a metagenomics problem, I reference Jonathan Allen's huge metagenomics work on the microbiome.
  36. We also have a number of interesting efforts to facilitate collaboration and reproducibility.
  37. RADII is an exciting project that virtualizes clouds by leveraging iRODS and virtual networks. The idea is to allow researchers, not IT, to spin up and monitor local and cloud resources, compute, and network infrastructure on demand: for example, when I need to complete a collaborative workflow and move data and compute across a number of compute resources.
  38. RADII allows you to represent data-centric collaborations using standard modeling mechanisms; map data processes, computations, storage, and organizational entities onto the physical infrastructure with the click of a button; and provision and de-provision infrastructure dynamically throughout the lifecycle of the collaboration.
  39. RADII builds on the data management of iRODS and the infrastructure virtualization of ORCA and ExoGENI to give researchers control over the infrastructure that's necessary for collaboration.
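To make the idea concrete, here is a minimal sketch of a collaboration-lifecycle model of the kind RADII describes. This is purely illustrative: the class and resource names are hypothetical and do not reflect RADII's actual interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    kind: str              # "storage", "compute", or "network"
    provisioned: bool = False

@dataclass
class Collaboration:
    name: str
    resources: list = field(default_factory=list)

    def add(self, name, kind):
        self.resources.append(Resource(name, kind))

    def provision(self):
        # Stand-in for real orchestration calls (iRODS, ORCA/ExoGENI, etc.)
        for r in self.resources:
            r.provisioned = True

    def deprovision(self):
        for r in self.resources:
            r.provisioned = False

# Model a collaboration, map it to infrastructure, and stand it up.
collab = Collaboration("metagenomics-study")
collab.add("irods-grid", "storage")
collab.add("sdsc-vm", "compute")
collab.add("layer2-circuit", "network")
collab.provision()
print(all(r.provisioned for r in collab.resources))  # True
```

The point is the lifecycle: the researcher describes the collaboration once, and the same description drives both provisioning and teardown.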
  40. Here's an example of this virtualization, with researchers at Duke, UNC, and Scripps sharing data and workflows on SDSC compute resources. The goals are ease of use and improved end-to-end performance as perceived by the scientists. To enable this vision we need two technologies with a high level of programmability and automation.
  41. A collaboration between the FDA and GW is looking to improve reproducibility by using biocompute objects. This should accelerate regulatory approvals and reduce costs.
  42. This represents the process for FDA submissions supported by NGS. There are many opportunities for making mistakes along the way, and these mistakes result in delays and costly resubmissions.
  43. Of the challenges in gaining agreement at the end of this process, many of which are addressed by HIVE, the potential to improve reproducibility is the most exciting.
  44. The HIVE platform is a big data analysis solution used by the FDA and available to industry. The biocompute objects repository is key to reproducibility.
  45. To get to better reproducibility, HIVE relies on a data typing engine to define metadata for the data, the computations, and both algorithms and pipelines, creating a biocompute object related to the submission that's reusable by the FDA. Data typing engine: a facility for registering the structure, syntax, and ontologies of the information fields of objects. Metadata type: descriptive information on the structure of data files or electronic records. Computation metadata: a description of the arguments and parameters (not values) for a computational analysis. Definitions of algorithms and pipeline descriptions: descriptions of the characteristics of executable applications. Data: the collection of actual values observed and accumulated during experimentation by a device or an observer. Computational protocol: a well-parameterized computational pipeline designed to produce scientifically meritable outcomes with appropriate data. Bio-compute: an instance of an actual execution of a computational protocol on a given set of data, with actual parameter values, generating identifiable outcomes/results.
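As a rough illustration of the concept, a biocompute object can be thought of as a self-describing record of the data, parameters, and pipeline behind an analysis, enough for a reviewer to re-run and verify it. The field names, tool names, and structure below are hypothetical, not the actual FDA/GWU schema.

```python
import hashlib
import json

def make_biocompute_object(input_data: bytes, pipeline, parameters):
    """Bundle an analysis description: what data, which tools, which settings."""
    return {
        "input_checksum": hashlib.sha256(input_data).hexdigest(),
        "pipeline": pipeline,       # ordered list of tool names
        "parameters": parameters,   # the exact argument values used
    }

reads = b"ACGTACGT"  # stand-in for real sequencing data
bco = make_biocompute_object(
    reads,
    pipeline=["aligner", "sort", "variant-caller"],
    parameters={"aligner": {"threads": 8}, "variant-caller": {"min-conf": 30}},
)

# Serialized, the object can travel with a submission; the receiver
# recomputes the checksum to confirm it is validating the same data.
serialized = json.dumps(bco, indent=2)
```

The checksum ties the recorded parameters to a specific input, which is what lets the FDA side re-run the protocol and compare results.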
  46. HIVE would help by recording the parameters of the analysis as biocompute objects (or using existing ones from the public repository) and sharing them with the FDA so the agency can verify the analysis. Data forming is done using a public HIVE and integrated with your usual analytic tools. The resulting biocompute objects are submitted to the FDA, where they are used in the FDA HIVE to validate the results of the submission.
  47. Finally, I'll say a few words about federated identity.
  48. Over 10 years ago the R&E community recognized the importance of trust in collaborations and created the InCommon federated identity management solution.
  49. We now have a leading solution with around 8 million users. Please stop by the booth for more information.