This document provides an overview of the Pacific Research Platform (PRP) after two years of operation. It describes several science drivers using the PRP, including biomedical research on cancer genomics and microbiomes, earth sciences such as earthquake modeling, and astronomy. It highlights how the PRP connects sites such as UC San Diego, UC Santa Cruz, and UC Berkeley to share and analyze large datasets over high-speed networks. The PRP is expanding to support new areas such as deep learning and cultural heritage projects, and to connect additional UC campuses through network upgrades.
This talk will examine issues of workflow execution on distributed resources, in particular with the Pegasus Workflow Management System, and how those resources can be provisioned ahead of workflow execution. Pegasus was designed, implemented, and supported to provide abstractions that enable scientists to focus on structuring their computations without worrying about the details of the target cyberinfrastructure. To support these workflow abstractions, Pegasus provides automation capabilities that seamlessly map workflows onto target resources, sparing scientists the overhead of managing data flow, job scheduling, fault recovery, and adaptation of their applications. In some cases it is beneficial to provision resources ahead of workflow execution, enabling their reuse across workflow tasks. The talk will examine the benefits of resource provisioning for workflow execution.
1) Scientists at the Advanced Photon Source use the Argonne Leadership Computing Facility (ALCF) for reconstruction and analysis of data from experimental facilities in real time or near real time, providing feedback during experiments. 2) Using the Swift parallel scripting language and ALCF supercomputers such as Mira, scientists can process terabytes of experimental data in minutes rather than hours or days, enabling errors to be detected and addressed while experiments are still running. 3) Key applications discussed include near-field high-energy X-ray diffraction microscopy, X-ray nano/microtomography, and determining crystal structures from diffuse scattering images through simulation and optimization. The workflows developed provide significant time savings and improved experimental outcomes.
The document discusses how new supercomputing applications are increasingly focused on "logistical" issues like executing many communication-intensive tasks over large shared datasets, rather than "heroic" computations of a single task. It argues that new programming models and tools are needed to efficiently manage large numbers of tasks, complex data dependencies, and failures at extreme scales of petascale and exascale computers. Examples of applications that could benefit include parameter studies, ensemble simulations, data analysis, and scientific workflows involving millions of tasks.
11.05.13: Invited Presentation, Sanford Consortium for Regenerative Medicine / Salk Institute, La Jolla. Larry Smarr (Calit2) and Phil Papadopoulos (SDSC/Calit2). Title: High Performance Cyberinfrastructure Enabling Data-Driven Science Supporting Stem Cell Research
We present the results of a high-throughput, first-principles search for topological materials based on identifying materials with band inversion induced by spin-orbit coupling. Out of the 30000 materials currently available in our database, we investigate more than 4507 non-magnetic materials having heavy atoms and low bandgaps. We compute the spillage between the spin-orbit and non-spin-orbit wave functions, resulting in more than 1699 high-spillage candidate materials. We demonstrate that, in addition to Z2 topological insulators, this screening method successfully identifies many semimetals and topological crystalline insulators. Because our approach is not based on symmetry considerations, it is applicable to the investigation of disordered or distorted materials, and it can be extended to magnetic materials. After this first screening step, we use Wannier interpolation to calculate the topological invariants and to search for band crossings in our candidate materials. We discuss some individual example materials, as well as trends throughout our dataset, which is available on the JARVIS-DFT website: http://jarvis.nist.gov
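The screening step in the abstract above rests on a simple quantity: the spillage measures how much the occupied subspace changes when spin-orbit coupling is switched on, η = n_occ − Tr(M M†), where M is the overlap matrix between occupied states with and without spin-orbit coupling. A minimal sketch of that formula at a single k-point, assuming the occupied Bloch states are supplied as column vectors in a common orthonormal basis (the function name and input layout here are illustrative, not the authors' actual code):

```python
import numpy as np

def spillage(psi_nosoc, psi_soc):
    """Band-inversion spillage at one k-point.

    psi_nosoc, psi_soc: (n_basis, n_occ) complex arrays whose columns are
    the occupied states without and with spin-orbit coupling, expressed
    in the same orthonormal basis (hypothetical input format).
    """
    n_occ = psi_nosoc.shape[1]
    # Overlap matrix M_mn = <psi_soc_m | psi_nosoc_n>
    m = psi_soc.conj().T @ psi_nosoc
    # eta = n_occ - Tr(M M^dagger); eta near 0 means the occupied
    # subspaces coincide, large eta flags band inversion
    return n_occ - np.trace(m @ m.conj().T).real

# Identical occupied subspaces give zero spillage; fully orthogonal
# subspaces give the maximum value n_occ.
psi_a = np.eye(6)[:, :2]   # states spanning basis vectors 0,1
psi_b = np.eye(6)[:, 2:4]  # states spanning basis vectors 2,3
print(spillage(psi_a, psi_a))  # 0.0
print(spillage(psi_a, psi_b))  # 2.0
```

In the screening described above, materials whose maximum spillage over the sampled k-points exceeds a threshold are kept as topological candidates for the subsequent Wannier-interpolation step.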
[A talk presented at Oak Ridge National Laboratory on October 15, 2015] We have made much progress over the past decade toward harnessing the collective power of IT resources distributed across the globe. In big-science projects in high-energy physics, astronomy, and climate, thousands work daily within virtual computing systems with global scope. But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that many more--ultimately most?--researchers will soon require capabilities not so different from those used by such big-science teams. How are we to meet these needs? Must every lab be filled with computers and every researcher become an IT specialist? Perhaps the solution is rather to move research IT out of the lab entirely: to develop suites of science services to which researchers can dispatch mundane but time-consuming tasks, and thus to achieve economies of scale and reduce cognitive load. I explore the past, current, and potential future of large-scale outsourcing and automation for science, and suggest opportunities and challenges for today’s researchers. I use examples from Globus and other projects to demonstrate what can be achieved.
In this deck from DataTech19, Debbie Bard from NERSC presents: Supercomputing and the scientist: How HPC and large-scale data analytics are transforming experimental science. "Debbie Bard leads the Data Science Engagement Group at NERSC. NERSC is the mission supercomputing center for the U.S. Department of Energy, and supports over 7000 scientists and 700 projects with supercomputing needs. A native of the UK, her career spans research in particle physics, cosmology and computing on both sides of the Atlantic. She obtained her PhD at Edinburgh University, and has worked at Imperial College London as well as the Stanford Linear Accelerator Center (SLAC) in the USA, before joining the Data Department at NERSC, where she focuses on data-intensive computing and research, including supercomputing for experimental science and machine learning at scale." Watch the video: https://wp.me/p3RLHQ-kLV Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
This document summarizes several projects from Anubhav Jain at Lawrence Berkeley National Laboratory related to using artificial intelligence and data mining for materials science. It discusses (1) developing interpretable descriptors of crystal structure based on local environments, (2) the matminer toolkit for connecting materials data to machine learning algorithms, and (3) the atomate/Rocketsled software for running high-throughput density functional theory calculations on supercomputers. It also briefly outlines a project to develop a text mining database for materials science literature.
The document summarizes Dr. Larry Smarr's presentation on the Pacific Research Platform (PRP) and its role in working toward a national research platform. It describes how PRP has connected research teams and devices across multiple UC campuses for over 15 years. It also details PRP's innovations like Flash I/O Network Appliances (FIONAs) and use of Kubernetes to manage distributed resources. Finally, it outlines opportunities to further integrate PRP with the Open Science Grid and expand the platform internationally through partnerships.
Big Data Tech Forum: Big Data Enabling Technologies and Applications. San Diego Chinese American Science and Engineering Association (SDCASEA), Sanford Consortium, La Jolla, CA, December 2, 2017
Xiaolin (Andy) Li, University of Florida, Presentation at Cognitive Systems Institute Group April 28, 2016
Polar Deep Insights with Domain Discovery and Sparkler (a Spark-based crawler). Presented at the EarthCube All Hands Meeting 2017. #ECAHM2017 #USCDataScience #IRDS