The document summarizes Dr. Larry Smarr's presentation on building the Pacific Research Platform (PRP) to enable big data science across research universities on the West Coast. The PRP provides 100-1000 times more bandwidth than today's internet to support research fields from particle physics to climate change. In under 2 years, the prototype PRP has connected researchers and datasets across California through optical networks and is now expanding nationally and globally. The next steps involve adding machine learning capabilities to the PRP through GPU clusters to enable new discoveries from massive datasets.
The document summarizes Dr. Larry Smarr's keynote presentation on enabling a high-performance national research platform. It describes how multi-institutional research increasingly relies on access to large datasets, requiring new cyberinfrastructure. The Pacific Research Platform provides high-bandwidth networking between universities to support research collaborations across disciplines. The next steps involve scaling this model into a national and global platform. The presentation highlights how the PRP enables various scientific applications and drives innovation through improved data transfer capabilities and distributed computing resources.
05.11.03 Physics Department Colloquium UCSD Title: Physics Research in an Era of Global Cyberinfrastructure La Jolla, CA
The document discusses Internet2, an advanced networking consortium that operates a 15,000-mile fiber-optic network for research and education. It provides very high-speed connectivity and collaboration technologies to facilitate large-scale data sharing and frictionless research. Examples are given of life-sciences projects using Internet2's high-speed network for genomic research, and of agricultural applications involving terabytes of satellite and sensor data. The network is expanding to include cloud computing resources and supercomputing centers to enable global-scale distributed scientific computing and collaboration.
Presentation at the AAAI 2013 Fall Symposium on Semantics for Big Data, Arlington, Virginia, November 15-17, 2013 Additional related material at: http://wiki.knoesis.org/index.php/Smart_Data Related paper at: http://www.knoesis.org/library/resource.php?id=1903 Abstract: We discuss the nature of Big Data and address the role of semantics in analyzing and processing Big Data that arises in the context of Physical-Cyber-Social Systems. We organize our research around the five V's of Big Data, where four of the Vs are harnessed to produce the fifth V, value. To handle the challenge of Volume, we advocate semantic perception that can convert low-level observational data to higher-level abstractions more suitable for decision-making. To handle the challenge of Variety, we resort to the use of semantic models and annotations of data so that much of the intelligent processing can be done at a level independent of the heterogeneity of data formats and media. To handle the challenge of Velocity, we seek to use a continuous semantics capability to dynamically create event- or situation-specific models and recognize new concepts, entities, and facts. To handle Veracity, we explore the formalization of trust models and approaches to glean trustworthiness. These four Vs of Big Data are harnessed by semantics-empowered analytics to derive Value in support of practical applications transcending the physical-cyber-social continuum.
Featured Keynote at Worldcomp'14, July 2014: http://www.world-academy-of-science.org/worldcomp14/ws/keynotes/keynote_sheth Video of the talk at: http://youtu.be/2991W7OBLqU Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications to drive value for businesses. Recently, there has been rapid growth in situations where a big data challenge relates to making individually relevant decisions. A key example is human health, fitness, and well-being. Consider, for instance, understanding the reasons for and avoiding an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or the Internet of Things around, on, and within the human body), public health signals (information coming from the healthcare system, such as hospital admissions), and population health signals (such as tweets by people related to asthma occurrences and allergens, Web services providing pollen and smog information, etc.). However, no individual has the ability to process all these data without the help of appropriate technology, and each person has a different set of relevant data! In this talk, I will put forward the concept of Smart Data that is realized by extracting value from Big Data, to benefit not just large companies but each individual. If I am an asthma patient, for all the data relevant to me with the four V-challenges, what I care about is simply, “How is my current health, and what is the risk of having an asthma attack in my personal situation, especially if that risk has changed?” As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP. For harnessing Volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies to support semantic interoperability and integration. For Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships, using them to better understand new cues in the data that capture rapidly evolving events and situations. Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city. I will present examples from a couple of these.
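To make the jump from raw signals to personalized, actionable information more concrete, here is a minimal Python sketch of the Smart Data idea from the abstract above; the signal names, thresholds, and scoring rules are hypothetical illustrations, not the models from the talk.

```python
# Minimal sketch of turning raw Big Data signals into a personalized,
# actionable abstraction ("Smart Data"). All signal names, thresholds,
# and rules below are hypothetical illustrations.

from dataclasses import dataclass

@dataclass
class Observations:
    wheeze_events_per_day: int   # personal signal (e.g., from a wearable sensor)
    pollen_grains_per_m3: float  # population signal (e.g., from a pollen web service)
    local_er_admissions: int     # public-health signal (e.g., hospital reports)

def asthma_risk(obs: Observations, sensitive_to_pollen: bool) -> str:
    """Map low-level observations to a higher-level, personalized abstraction."""
    score = 0
    score += 2 if obs.wheeze_events_per_day > 3 else 0
    # Domain knowledge personalizes the rule: pollen only matters
    # for patients known to be pollen-sensitive.
    score += 2 if sensitive_to_pollen and obs.pollen_grains_per_m3 > 50 else 0
    score += 1 if obs.local_er_admissions > 10 else 0
    return "high" if score >= 3 else "elevated" if score >= 2 else "normal"

print(asthma_risk(Observations(5, 80.0, 12), sensitive_to_pollen=True))  # -> high
```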
This document discusses the convergence of IoT devices, edge computing, fog computing, and cloud computing infrastructures. It notes the exponential growth in connected devices and the data they generate, and the need for distributed computing resources closer to users to address latency, bandwidth, and other constraints. Key research issues discussed include locality-aware resource management, deployment and reconfiguration of edge sites, energy monitoring and optimization, and resilience across distributed infrastructures.
Cloud computing has now crossed the frontiers of research to reach industry. It is used every day, whether to exchange emails or make reservations on websites. However, much research remains to be done to improve the performance and functionality of these platforms of tomorrow. In this talk, I will give an overview of some of the theoretical and applied research done at INRIA, particularly around cloud distribution, energy monitoring and management, massive data processing and exchange, and resource management.
There is a rapid intertwining of sensors and mobile devices into the fabric of our lives. This has resulted in unprecedented growth in the number of observations from the physical and social worlds reported in the cyber world. Sensing and computational components embedded in the physical world are termed Cyber-Physical Systems (CPS). The current science of CPS has yet to effectively integrate citizen observations into CPS analysis. We demonstrate the role of citizen observations in CPS and propose a novel approach to perform a holistic analysis of machine and citizen sensor observations. Specifically, we demonstrate the complementary, corroborative, and timely aspects of citizen sensor observations compared to machine sensor observations in Physical-Cyber-Social (PCS) Systems. Physical processes are inherently complex and embody uncertainties. They manifest as machine and citizen sensor observations in PCS Systems. We propose a generic framework to move from observations to decision-making and actions in PCS systems, consisting of: (a) PCS event extraction, (b) PCS event understanding, and (c) PCS action recommendation. We demonstrate the role of Probabilistic Graphical Models (PGMs) as a unified framework to deal with uncertainty, complexity, and dynamism, helping translate observations into actions. Data-driven approaches alone are not guaranteed to synthesize PGMs that reflect real-world dependencies accurately. To overcome this limitation, we propose to empower PGMs using declarative domain knowledge. Specifically, we propose four techniques: (a) automatic creation of massive training data for Conditional Random Fields (CRFs) using domain knowledge of entities, used in PCS event extraction; (b) Bayesian Network structure refinement using causal knowledge from ConceptNet, used in PCS event understanding; (c) knowledge-driven piecewise linear approximation of nonlinear time series dynamics using Linear Dynamical Systems (LDS), used in PCS event understanding; and (d) transforming knowledge of goals and actions into a Markov Decision Process (MDP) model, used in PCS action recommendation. We evaluate the benefits of the proposed techniques on real-world applications involving traffic analytics and the Internet of Things (IoT).
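As one concrete illustration of technique (d), the sketch below encodes knowledge of goals and actions as a small Markov Decision Process and solves it with value iteration; the traffic-flavored states, actions, transition probabilities, and rewards are hypothetical stand-ins, not the work's actual model.

```python
# Minimal sketch of technique (d): encoding domain knowledge of goals and
# actions as an MDP and solving it with value iteration. States, actions,
# transition probabilities, and rewards are hypothetical illustrations.

# Knowledge-derived model: P[state][action] = [(prob, next_state, reward), ...]
STATES = ["congested", "flowing"]
ACTIONS = ["reroute", "wait"]
P = {
    "congested": {
        "reroute": [(0.7, "flowing", 10.0), (0.3, "congested", -1.0)],
        "wait":    [(0.1, "flowing", 10.0), (0.9, "congested", -1.0)],
    },
    "flowing": {
        "reroute": [(1.0, "flowing", 0.0)],
        "wait":    [(0.9, "flowing", 1.0), (0.1, "congested", -5.0)],
    },
}
GAMMA = 0.9  # discount factor

def value_iteration(tol: float = 1e-6) -> dict:
    """Iterate the Bellman optimality update until values converge."""
    V = {s: 0.0 for s in STATES}
    while True:
        V_new = {
            s: max(sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])
                   for a in ACTIONS)
            for s in STATES
        }
        if max(abs(V_new[s] - V[s]) for s in STATES) < tol:
            return V_new
        V = V_new

V = value_iteration()
# Recommended action in each state: the PCS "action recommendation" step.
policy = {
    s: max(ACTIONS, key=lambda a: sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a]))
    for s in STATES
}
print(policy)  # e.g., {'congested': 'reroute', 'flowing': 'wait'}
```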
Transcript of a BriefingsDirect podcast on how cloud and big data come together to offer researchers a treasure trove of new real-time information.
Invited presentation by Calit2 Director Larry Smarr to the UC IT Leadership Council in Oakland, Calif., on May 19, 2014.
04.11.09 Invited Talk at the Sun Microsystems Booth SC04 Title: Cal-(IT)2 Projects with Sun Microsystems Pittsburgh, PA
Cyberenvironments integrate shared and custom cyberinfrastructure resources into a process-oriented framework to support scientific communities, allowing researchers to focus on their work rather than on managing infrastructure. They enable more complex, multi-disciplinary challenges to be tackled through enhanced knowledge production and application. Key challenges include coordinating distributed resources and users without centralization, and evolving systems rapidly enough to keep pace with advancing science.
Grid computing combines the resources of multiple computers from different organizations to solve large problems. It works by sharing computing power, memory, storage, and other resources across an authorized network. Examples of grid computing include projects that analyze large datasets, like genome sequencing, or simulate complex systems, like climate modeling. Major grid computing projects include those run by scientific organizations such as CERN, as well as SETI@home, which analyzes radio telescope data using volunteers' computers. Grid computing infrastructure allows resources to be accessed over the network as easily as a utility.
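The utility-style sharing described above reduces to farming independent work units out to whatever workers are available. The Python sketch below imitates that pattern on a single machine with a process pool; a real grid adds cross-organization scheduling, authorization, data movement, and fault tolerance on top of this basic idea, and the "analysis" here is a hypothetical stand-in.

```python
# Minimal single-machine imitation of the grid pattern: split a large job
# into independent work units and farm them out to a pool of workers.

from concurrent.futures import ProcessPoolExecutor

def analyze_chunk(chunk: list) -> int:
    """Stand-in for one unit of scientific work (e.g., scanning one data slice)."""
    return sum(x * x for x in chunk)

def main() -> None:
    data = list(range(1_000_000))
    # Partition the job into independent work units.
    chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]
    with ProcessPoolExecutor() as pool:       # "the grid": a pool of workers
        partials = list(pool.map(analyze_chunk, chunks))
    print("combined result:", sum(partials))  # aggregate the partial results

if __name__ == "__main__":
    main()
```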
Keynote at Web Intelligence 2017: http://webintelligence2017.com/program/keynotes/ Video: https://youtu.be/EIbhcqakgvA Paper: http://knoesis.org/node/2698 Abstract: While Bill Gates, Stephen Hawking, Elon Musk, Peter Thiel, and others engage in OpenAI discussions of whether or not AI, robots, and machines will replace humans, proponents of human-centric computing continue to extend work in which humans and machines partner in contextualized and personalized processing of multimodal data to derive actionable information. In this talk, we discuss how maturing towards the emerging paradigms of semantic computing (SC), cognitive computing (CC), and perceptual computing (PC) provides a continuum through which to exploit the ever-increasing and growing diversity of data that could enhance people’s daily lives. SC and CC sift through raw data to personalize it according to context and individual users, creating abstractions that move the data closer to what humans can readily understand and apply in decision-making. PC, which interacts with the surrounding environment to collect data that is relevant and useful in understanding the outside world, is characterized by interpretative and exploratory activities that are supported by the use of prior/background knowledge. Using the examples of personalized digital health and a smart city, we will demonstrate how these three computing paradigms form complementary capabilities that will enable the development of the next generation of intelligent systems. For background: http://bit.ly/PCSComputing
In this deck from the 2014 HPC User Forum in Seattle, Jack Collins from the National Cancer Institute presents: Genomes to Structures to Function: The Role of HPC. Watch the video presentation: http://wp.me/p3RLHQ-d28
Amit Sheth presented on how knowledge can help machines better understand big data. He discussed challenges such as understanding implicit entities, analyzing drug abuse forums, and understanding city traffic using sensors and text, and argued that knowledge graphs and ontologies can help interpret diverse data types and provide the contextual understanding needed to solve real-world problems.
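To illustrate how a knowledge graph can supply the contextual understanding mentioned above, here is a minimal Python sketch; the triples and the drug-forum slang example are hypothetical illustrations of the idea, not the talk's actual system.

```python
# Minimal sketch of using a small knowledge graph to add context to raw
# mentions in text (e.g., slang in a drug abuse forum). The triples and
# the example mention are hypothetical illustrations.

KG = {  # (subject, relation) -> object
    ("oxycodone", "is_a"): "opioid",
    ("oxy", "slang_for"): "oxycodone",
    ("opioid", "risk"): "dependence",
}

def interpret(mention: str) -> list:
    """Resolve an implicit/slang entity, then walk the graph for context."""
    facts = []
    entity = KG.get((mention, "slang_for"), mention)  # normalize slang
    if entity != mention:
        facts.append(f"'{mention}' refers to {entity}")
    cls = KG.get((entity, "is_a"))
    if cls:
        facts.append(f"{entity} is a kind of {cls}")
        risk = KG.get((cls, "risk"))
        if risk:
            facts.append(f"{cls} use carries a risk of {risk}")
    return facts

print(interpret("oxy"))
# ["'oxy' refers to oxycodone", "oxycodone is a kind of opioid",
#  "opioid use carries a risk of dependence"]
```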
This document summarizes Dr. Larry Smarr's vision for an integrated science cyberinfrastructure to support data-intensive research. It discusses the exponential growth of digital data and the need for dedicated high-bandwidth networks and data repositories. Specific examples are provided of initiatives at UCSD, regional optical networks connecting research institutions, and national projects like the Open Science Grid and the Cancer Genomics Hub that are creating cyberinfrastructure to enable data-intensive scientific discovery.
Big Thought Leaders Colloquium Series – Spring 2017 Jackson State University Jackson, MS April 11, 2017
Invited Remote Keynote High Performance Computing Initiative California State University San Bernardino March 18, 2022
Invited Seminar National Center for Supercomputing Applications University of Illinois Urbana-Champaign May 9, 2024
The Pacific Research Platform will create a regional "Big Data Freeway System" along the West Coast to support science. It will connect major research institutions with high-speed optical networks, allowing them to share vast amounts of data and computational resources. This will enable new forms of collaborative, data-intensive research for fields like particle physics, astronomy, biomedicine, and earth sciences. The first phase aims to establish a basic networked infrastructure, with later phases advancing capabilities to 100Gbps and beyond with security and distributed technologies.
SupercomputingAsia (SCA21) Singapore March 2-4, 2021
05.10.24 Invited Talk IEEE Orange County Computer Society Title: The OptIPuter Project: From the Grid to the LambdaGrid Irvine, CA
The document discusses the Pacific Research Platform (PRP), a regional big data cyberinfrastructure connecting researchers across California universities. PRP provides high-speed networks and data transfer nodes to enable sharing of large datasets for projects like medical imaging, cryo-electron microscopy, and machine learning. Recent grants are expanding PRP to add GPUs and non-von Neumann processors to support these computationally intensive applications.
This document provides an overview of the development of national research platforms (NRPs) from 1985 to the present, with a focus on the Pacific Research Platform (PRP). It describes the evolution of the PRP from early NSF-funded supercomputing centers to today's distributed cyberinfrastructure utilizing optical networking, containers, Kubernetes, and distributed storage. The PRP now connects over 15 universities across the US and internationally to enable data-intensive science and machine learning applications across multiple domains. Going forward, the document discusses plans to further integrate regional networks and partner with new NSF-funded initiatives to develop the next generation of NRPs through 2025.
Opening Presentation Pacific Research Platform Workshop Calit2’s Qualcomm Institute University of California, San Diego October 14, 2015
Remote Briefing to the Ad Hoc Big Data Task Force of the NASA Advisory Council Science Committee NASA Goddard Space Flight Center June 28, 2016
The Pacific Research Platform (PRP) connects research institutions across the western United States with high-speed networks to enable data-intensive science collaborations. Key points:
- The PRP connects 15 campuses across California and links to the Great Plains Network, allowing researchers to access remote supercomputers, share large datasets, and collaborate on projects like analyzing data from the Large Hadron Collider.
- The PRP utilizes Science DMZ architectures with dedicated data transfer nodes called FIONAs to achieve high-speed transfer of large files. Kubernetes is used to manage distributed storage and computing resources.
- Early applications include distributed climate modeling, wildfire science, plankton imaging, and cancer genomics.
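As a toy illustration of the parallel-stream idea behind dedicated data transfer nodes, the sketch below splits one large file into byte ranges and fetches them concurrently; PRP DTNs run purpose-built transfer tools rather than anything like this, and the URL is hypothetical.

```python
# Toy illustration of the parallel-stream idea used by high-speed data
# transfer nodes: split a large file into byte ranges and fetch the
# ranges concurrently. The URL is hypothetical, and real DTNs use
# dedicated transfer software, not this sketch.

import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "https://data.example.org/big_dataset.bin"  # hypothetical dataset
STREAMS = 8

def fetch_range(start: int, end: int) -> bytes:
    """Fetch one byte range of the file over HTTP."""
    req = urllib.request.Request(URL, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def parallel_download() -> bytes:
    # Ask the server for the total size, then split it into STREAMS ranges.
    head = urllib.request.urlopen(urllib.request.Request(URL, method="HEAD"))
    size = int(head.headers["Content-Length"])
    step = size // STREAMS
    ranges = [(i * step, size - 1 if i == STREAMS - 1 else (i + 1) * step - 1)
              for i in range(STREAMS)]
    with ThreadPoolExecutor(max_workers=STREAMS) as pool:
        parts = pool.map(lambda r: fetch_range(*r), ranges)
    return b"".join(parts)  # reassemble the ranges in order
```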
Presentation to Society of Automotive Engineers (SAE) Committee on Integrated Vehicle Health Management Stanford University September 12, 2017
05.10.13 Invited Talk Oak Ridge National Laboratory Title: Blowing up the Box--the Emergence of the Planetary Computer Oak Ridge, TN