We ran a 50k-GPU multi-cloud simulation in support of IceCube science. This talk, presented at the Internet2 booth at SC19, gives an overview of what happened to the associated data.
This session will describe how members of the US Large Hadron Collider (LHC) community have benchmarked the use of Amazon Elastic Compute Cloud (Amazon EC2) resources to simulate events observed by experiments at the European Organization for Nuclear Research (CERN). Miron Livny of the University of Wisconsin-Madison, who has been collaborating with the US-LHC community for more than a decade, will detail the process for benchmarking high-throughput computing (HTC) applications running across multiple AWS regions using the open source HTCondor distributed computing software. The presentation will also outline the different ways that AWS and HTCondor can help meet the needs of compute-intensive applications from other scientific disciplines.
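As context for how such HTC workloads are described and queued, here is a minimal sketch using the HTCondor Python bindings; the executable name, resource requests, and job count are hypothetical.

```python
# Minimal sketch: queue 100 independent simulation jobs with the HTCondor
# Python bindings. The executable and resource figures are hypothetical.
import htcondor

sub = htcondor.Submit({
    "executable": "simulate_event.sh",   # hypothetical application wrapper
    "arguments": "$(ProcId)",            # each job gets its own index
    "request_cpus": "1",
    "request_memory": "2GB",
    "output": "sim.$(ProcId).out",
    "error": "sim.$(ProcId).err",
    "log": "sim.log",
})

schedd = htcondor.Schedd()               # local scheduler daemon
result = schedd.submit(sub, count=100)   # HTCondor matches each job to any
print("cluster id:", result.cluster())   # available slot, on-prem or EC2
```

HTCondor's matchmaking is what lets the same submit description scale from a local pool to slots provisioned across multiple AWS regions.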
This document outlines a presentation on analyzing large raster data in a Jupyter notebook with GeoPySpark on AWS. The presentation covers introductory material, exercises on working with land cover and Landsat imagery data, combining data layers to detect crop cycles, and combining different data types to create maps. It discusses where the notebooks are running, data sources, and GeoPySpark capabilities like working with space-time raster data. Attendees are encouraged to tweet maps created during the exercises.
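To give a flavor of the layer-combination exercises, here is a minimal sketch of Landsat band math (NDVI) using rasterio and NumPy rather than GeoPySpark's distributed API; the scene path is hypothetical and the band indices assume a Landsat 8 GeoTIFF.

```python
# Minimal sketch of combining two raster layers pixel-wise: NDVI from a
# Landsat 8 scene. The file path is hypothetical.
import numpy as np
import rasterio

with rasterio.open("landsat_scene.tif") as src:
    red = src.read(4).astype("float32")   # Landsat 8 band 4: red
    nir = src.read(5).astype("float32")   # Landsat 8 band 5: near-infrared

# Vegetation shows up as high NDVI; the clip avoids division by zero.
ndvi = (nir - red) / np.clip(nir + red, 1e-6, None)
print("mean NDVI:", float(ndvi.mean()))
```

GeoPySpark performs the same kind of band math, but distributed over tiled layers across a Spark cluster.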
The document discusses the CERN OpenStack cloud, which provides compute resources for the Large Hadron Collider experiments at CERN. It details the scale of the cloud, including over 6,700 hypervisors, 190,000 cores, and 20,000 VMs. It also describes the various use cases served, the wide range of hardware, and the operation of the cloud, including a hardware retirement campaign and the network migration to Neutron.
A talk by Rob Emanuele given at FedGeoDay 2016 about using GeoMesa, GeoWave, and GeoTrellis to work with geospatial data on Apache Spark and Accumulo.
This document summarizes an international collaboration between the National Computational Infrastructure (NCI) in Australia and A*Star in Singapore to accelerate DNA analysis. The collaboration uses trans-Pacific extended InfiniBand networks and supercomputers to: 1) transfer large genetic sequence datasets from NCI in Canberra to A*Star in Singapore for analysis on the A*Star Aurora system and return the results; 2) use NCI's InfiniCloud HPC system to visualize the genetic results produced by Aurora; and 3) demonstrate long-distance, high-speed data transfers between Australia and Singapore over extended InfiniBand networks.
Tim Bell from CERN gave a presentation on "Understanding the Universe through Clouds" at OpenStack UK Days on September 26th, 2017. Some key points:
- CERN operates one of the world's largest private OpenStack clouds to support the Large Hadron Collider, with over 8,000 hypervisors and 33,000 VMs.
- The Worldwide LHC Computing Grid distributes and analyzes LHC data across 600 PB of storage and 750k CPU cores at 170 sites in 42 countries.
- CERN has been an early adopter of OpenStack technologies like Nova, Glance, Horizon, and Neutron since 2011 and contributes code back to the community.
- New services such as Magnum are being introduced.
ISC Cloud'13, Heidelberg (Germany), Sep. 23-24, 2013. A. Gómez, L.M. Carril, R. Valin, J.C. Mouriño, C. Cotelo
The document discusses OpenStack at CERN. It provides details on the following:
- OpenStack has been in production at CERN for 3 years, managing over 190,000 cores and 7,000 hypervisors.
- Major cultural and technology changes were required and have been successfully addressed in the transition to OpenStack.
- Contributing back to the upstream OpenStack community has led to sustainable tools and effective technology transfer.
This document summarizes Tim Bell's presentation on OpenStack at CERN. It discusses how CERN adopted OpenStack in 2011 to manage its growing computing infrastructure needs for processing massive data sets from the Large Hadron Collider. OpenStack has since scaled to over 300,000 CPU cores, running 500,000 physics jobs per day across CERN's private cloud. The document also briefly outlines CERN's use of other open source technologies like Ceph and Kubernetes.
This document discusses OpenStack cloud computing at CERN. It notes that CERN has 4 OpenStack clouds with over 120,000 cores total, and is migrating to the Kilo release of OpenStack. It then describes OpenStack components like Keystone for authentication, Glance for images, Nova for compute, and Cinder for block storage. The document outlines how OpenStack supports federated identity through options like Active Directory, OpenID Connect, and SAML. It provides examples of how federation could allow access to external clouds and shares experiences in deploying federated OpenStack.
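As a concrete illustration of the Keystone authentication flow that federation builds on, here is a minimal sketch using keystoneauth1; the endpoint and credentials are hypothetical, and a federated deployment would swap the password plugin for an OpenID Connect or SAML plugin.

```python
# Minimal sketch: obtain a Keystone-scoped token with keystoneauth1.
# The auth URL and credentials are hypothetical.
from keystoneauth1 import session
from keystoneauth1.identity import v3

auth = v3.Password(
    auth_url="https://keystone.example.org:5000/v3",  # hypothetical endpoint
    username="demo",
    password="secret",
    project_name="demo",
    user_domain_name="Default",
    project_domain_name="Default",
)
sess = session.Session(auth=auth)  # handles token acquisition and renewal
print(sess.get_token())            # scoped token usable by Nova, Glance, Cinder
```

With federation, the token request is brokered through the external identity provider, but the resulting session is used by the other services in exactly the same way.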
European XFEL created the strongest X-ray beam in the world. Its X-ray free-electron laser, housed in a 3.4-km-long underground tunnel, is used by researchers from around the world. Scientists use the facility to map atomic details of viruses, film chemical reactions, and study processes like those in the interiors of planets. Discover how European XFEL uses InfluxDB to monitor its scientific experiments and research. In this webinar, Alessandro Silenzi will dive into:
- European XFEL's approach to empowering the worldwide community to push the boundaries of science
- The evolution of their data management solution, from homegrown to InfluxDB
- How a time series platform is used to analyze and validate experiment data
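To make the monitoring workflow concrete, here is a minimal sketch of writing one experiment metric to InfluxDB with the influxdb-client Python package; the URL, token, org, bucket, and measurement/tag/field names are all hypothetical.

```python
# Minimal sketch: write one time-series point to InfluxDB 2.x.
# All connection details and names below are hypothetical.
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="xfel")
write_api = client.write_api(write_options=SYNCHRONOUS)

point = (
    Point("detector_rate")          # hypothetical measurement name
    .tag("instrument", "SPB")       # hypothetical instrument tag
    .field("pulses_per_train", 352) # one scalar sample
)
write_api.write(bucket="experiments", record=point)
```

Tagging each point by instrument is what later lets dashboards and validation queries slice the data per experiment.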
CERN operates the largest particle physics laboratory in the world. It manages over 8,000 servers to support its research. In 2012, CERN recognized limits with its existing infrastructure management tools and formed a team to define a new "Agile Infrastructure Project." The project goals were to improve resource provisioning time, enable cloud interfaces, improve monitoring and accounting, and boost efficiency. The team adopted open source tools like OpenStack, Puppet, and Ceph to create a new cloud service spanning two data centers. This allowed on-demand provisioning in minutes versus months and helped CERN better support its expanding computing needs for research.
MACPAC is a federal legislative branch agency tasked with reviewing state and federal Medicaid and Children's Health Insurance Program (CHIP) access and payment policies and making recommendations to Congress. By March 15 and again by June 15 each year, the agency produces a comprehensive report for Congress that compiles results from Medicaid and CHIP data sources for the 50 states and territories. The CIO of MACPAC wanted a secure, cost-effective, high performance platform that met their needs to crunch this large amount of health data. In this session, learn how MACPAC and 8KMiles helped set up the agency’s Big Data/HPC analytics platform on AWS using SAS analytics software.
CERN operates the largest machine on Earth, the Large Hadron Collider (LHC), which produces over 1 billion collisions per second and records over 0.5 petabytes of data per day. CERN relies heavily on OpenStack, with over 190,000 CPU cores and 5,000 VMs under OpenStack management, accounting for over 90% of CERN's computing resources. CERN plans to add over 100,000 more CPU cores in the next six months and is exploring public clouds and containers to help process the massive amounts of data generated by the LHC.
The document discusses the evolution of Ceilometer, an OpenStack project that collects measurements from deployed clouds and persists the data for later retrieval and analysis. It describes how Ceilometer has scaled out its data collection capabilities over time by adding agents, partitioning workloads, and integrating with Gnocchi to provide more efficient time-series storage. The document also provides best practices for Ceilometer deployment and configuration to optimize data collection, storage and querying.
This document describes XeMPUPiL, a performance-aware power capping orchestrator for the Xen hypervisor. It aims to maximize performance under a power cap using a hybrid approach. The key challenges addressed are instrumentation-free workload monitoring and balancing hardware and software power management techniques. Experimental results show XeMPUPiL outperforms a pure hardware approach for I/O, memory, and mixed workloads by better balancing efficiency and timeliness. Future work includes integrating the orchestrator logic into the scheduler and exploring new resource assignment policies.
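For a sense of the hardware knob such an orchestrator drives, here is a minimal sketch of reading and setting a package power cap through the Linux powercap (intel-rapl) sysfs interface; this illustrates the general RAPL capping technique, not XeMPUPiL's actual Xen implementation, and the 50 W cap is an arbitrary example.

```python
# Minimal sketch: read and set the long-term package power limit via the
# Linux powercap/RAPL sysfs interface (requires root; Intel platforms).
RAPL = "/sys/class/powercap/intel-rapl:0"

def read_uw(name: str) -> int:
    """Read a powercap attribute, reported in microwatts."""
    with open(f"{RAPL}/{name}") as f:
        return int(f.read())

def set_cap_uw(limit_uw: int) -> None:
    """Set constraint_0, the long-term package power limit, in microwatts."""
    with open(f"{RAPL}/constraint_0_power_limit_uw", "w") as f:
        f.write(str(limit_uw))

print("current cap (W):", read_uw("constraint_0_power_limit_uw") / 1e6)
set_cap_uw(50_000_000)  # example: cap the package at 50 W
```

A hybrid orchestrator combines a hardware cap like this with software techniques such as resource scheduling, trading the cap's fast enforcement against the scheduler's finer-grained performance awareness.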
CERN is the home of the Large Hadron Collider (LHC), a 27-km circular proton accelerator that generates petabytes of physics data every year. To process all this data, CERN runs an OpenStack cloud (>300K cores) that helps scientists all around the world unveil the mysteries of the Universe. The infrastructure is also used to run all the IT services of the Organization. Delivering these services with high performance and reliable service levels has been one of the major challenges for the CERN cloud engineering team. We have been constantly iterating on the architecture and deployment model of the cloud control plane. In this presentation we will describe the different control-plane architecture models we have relied on over the years. Finally, we will describe the work done to move the OpenStack cloud control plane from VMs into a Kubernetes cluster, and report on our experience running this architecture at scale, its advantages, and its challenges.
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB Helps Vera C. Rubin Observatory Make the Deepest, Widest Image of the Universe | InfluxDays Virtual Experience NA 2020
The document discusses the need for a new generation of cyberinfrastructure to support interactive global earth observation. It outlines several prototyping projects that are building examples of systems enabling real-time control of remote instruments, remote data access and analysis. These projects are driving the development of an emerging cyber-architecture using web and grid services to link distributed data repositories and simulations.
Talk delivered at Free and Open Source Software for Geo North America 2019 (FOSS4GNA). Large-scale solar arrays or farms have been installed globally faster than interested stakeholders can reliably track them. We have built a deep learning model on Sentinel-2 satellite imagery that allows us to create accurate, timely global maps of solar farms.
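As a rough illustration of the mapping approach, here is a minimal sketch of a per-pixel segmentation model over a Sentinel-2 tile in PyTorch; the tiny architecture, band count, and threshold are illustrative stand-ins, not the authors' actual network.

```python
# Minimal sketch: per-pixel classification of a multispectral tile into a
# solar-farm mask. Architecture and threshold are illustrative only.
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Toy fully convolutional net: 13 Sentinel-2 bands -> per-pixel logit."""
    def __init__(self, bands: int = 13):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(bands, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),             # 1x1 conv: per-pixel score
        )

    def forward(self, x):
        return self.net(x)

model = TinySegNet()
tile = torch.randn(1, 13, 256, 256)          # one fake 256x256 tile
mask = torch.sigmoid(model(tile)) > 0.5      # boolean solar-farm mask
print(mask.float().mean().item())            # fraction of pixels flagged
```

Running such a model tile by tile over global Sentinel-2 coverage is what turns per-pixel predictions into a timely world map of installations.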
The document summarizes Dr. Larry Smarr's presentation on the Pacific Research Platform (PRP) and its role in working toward a national research platform. It describes how PRP has connected research teams and devices across multiple UC campuses for over 15 years. It also details PRP's innovations like Flash I/O Network Appliances (FIONAs) and use of Kubernetes to manage distributed resources. Finally, it outlines opportunities to further integrate PRP with the Open Science Grid and expand the platform internationally through partnerships.
Our research group is investigating how to leverage Apache Spark (batch, streaming & real-time) to analyse current and future data sets in astronomy. Among the future large experiments, the Large Synoptic Survey Telescope (LSST) will soon start collecting terabytes of data per observation night, and the efficient processing and analysis of both real-time and historical data remains a major challenge. In this talk we will expose the main challenges and explore the latest developments tailored for big data problems in astronomy. On the one hand, we designed a new Data Source API extension to natively manipulate telescope images and astronomical tables within Apache Spark. We then extended the functionalities of the Apache Spark SQL module to ease the manipulation of 3D data sets and perform efficient queries: partitioning, data set joins and cross-matching, nearest-neighbour search, spatial queries, and more. On the other hand, we are using the new possibilities offered by the Structured Streaming APIs in recent Apache Spark versions to enable real-time decisions by rapidly accessing and analysing the alerts sent by telescopes every night. Given the unprecedented precision of the next generation of telescopes, the alert streams will comprise millions of alerts per night, and relying on Structured Streaming is a guarantee of not missing the latest black hole event in a sea of data! We will also share active learning developments used on top to improve real-time event selection and classification for the LSST telescope. You will walk away with an understanding of modern challenges in astronomy, an appreciation of some beautiful night skies, and a sense of how Apache Spark can help push the frontiers of science further!
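To make the streaming side concrete, here is a minimal sketch of filtering a telescope alert stream with Spark Structured Streaming; the Kafka broker, topic name, alert schema, and magnitude cut are all hypothetical.

```python
# Minimal sketch: read telescope alerts from Kafka and keep bright
# candidates for real-time follow-up. Broker, topic, and schema are
# hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("alert-stream").getOrCreate()

schema = StructType([
    StructField("objectId", StringType()),
    StructField("ra", DoubleType()),
    StructField("dec", DoubleType()),
    StructField("magpsf", DoubleType()),   # PSF magnitude of the candidate
])

alerts = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "alerts")                     # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("a"))
    .select("a.*")
)

bright = alerts.where(col("magpsf") < 19.0)   # arbitrary example cut
query = bright.writeStream.format("console").start()
query.awaitTermination()
```

The same pipeline shape scales to millions of alerts per night, since Spark parallelizes the Kafka partitions across executors.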
05.03.05 Invited Talk to the SIO Council Title: The Emerging Cyberinfrastructure for Earth and Ocean Sciences La Jolla, CA
The physicists at CERN are increasingly turning to Spark to process large physics datasets in a distributed fashion, with the aim of reducing time-to-physics with increased interactivity. The physics data itself is stored in CERN's mass storage system, EOS, and CERN's IT department runs an on-premise private cloud based on OpenStack as a way to provide on-demand compute resources to physicists. This presents both an opportunity and a challenge for the Big Data team at CERN to provide an elastic, scalable, reliable spark-as-a-service offering on OpenStack. The talk focuses on the design choices made and challenges faced while developing spark-as-a-service over Kubernetes on OpenStack to simplify provisioning, automate management, and minimize the operating burden of managing Spark clusters. In addition, the service tooling simplifies submitting applications on behalf of users, mounting user-specified ConfigMaps, copying application logs to S3 buckets for troubleshooting, performance analysis and accounting of Spark applications, and support for stateful Spark streaming applications. We will also share results from running large-scale sustained workloads over terabytes of physics data.
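Here is a minimal sketch of the underlying model: pointing a PySpark session at a Kubernetes cluster so executors run as pods; the API server URL, container image, namespace, executor count, and dataset path are all hypothetical.

```python
# Minimal sketch: a Spark session whose executors are scheduled as pods on
# a Kubernetes cluster. All endpoints and names below are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://k8s-api.example.org:6443")  # hypothetical API server
    .appName("physics-analysis")
    .config("spark.kubernetes.container.image",
            "registry.example.org/spark:3.5")          # hypothetical image
    .config("spark.kubernetes.namespace", "spark-jobs")
    .config("spark.executor.instances", "50")
    .getOrCreate()
)

df = spark.read.parquet("s3a://physics/events/")       # hypothetical dataset
print(df.count())
```

Wrapping this provisioning step behind a service is what lets users request a cluster without knowing anything about the Kubernetes or OpenStack layers beneath it.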
In this deck from the 2017 Argonne Training Program on Extreme-Scale Computing, Rupak Biswas from NASA presents: NASA Advanced Computing Environment for Science & Engineering. "High performance computing is now integral to NASA's portfolio of missions to pioneer the future of space exploration, accelerate scientific discovery, and enable aeronautics research. Anchored by the Pleiades supercomputer at NASA Ames Research Center, the High End Computing Capability (HECC) Project provides a fully integrated environment to satisfy NASA's diverse modeling, simulation, and analysis needs. In addition, HECC serves as the agency's expert source for evaluating emerging HPC technologies and maturing the most appropriate ones into the production environment. This includes investigating advanced IT technologies such as accelerators, cloud computing, collaborative environments, big data analytics, and adiabatic quantum computing. The overall goal is to provide a consolidated bleeding-edge environment to support NASA's computational and analysis requirements for science and engineering applications." Dr. Rupak Biswas is currently the Director of Exploration Technology at NASA Ames Research Center, Moffett Field, Calif., and has held this Senior Executive Service (SES) position since January 2016. In this role, he is in charge of planning, directing, and coordinating the technology development and operational activities of the organization, which comprises advanced supercomputing, human systems integration, intelligent systems, and entry systems technology. The directorate consists of approximately 700 employees with an annual budget of $160 million, and includes two of NASA's critical and consolidated infrastructures: the arc jet testing facility and the supercomputing facility. He is also the Manager of the NASA-wide High End Computing Capability Project, which provides a full range of advanced computational resources and services to numerous programs across the agency. In addition, he leads the emerging quantum computing effort for NASA. Watch the video: https://wp.me/p3RLHQ-hua Learn more: https://extremecomputingtraining.anl.gov/ Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from DataTech19, Debbie Bard from NERSC presents: Supercomputing and the scientist: How HPC and large-scale data analytics are transforming experimental science. "Debbie Bard leads the Data Science Engagement Group at NERSC. NERSC is the mission supercomputing center for the US Department of Energy, and supports over 7,000 scientists and 700 projects with supercomputing needs. A native of the UK, her career spans research in particle physics, cosmology, and computing on both sides of the Atlantic. She obtained her PhD at Edinburgh University, and has worked at Imperial College London as well as the Stanford Linear Accelerator Center (SLAC) in the USA, before joining the Data Department at NERSC, where she focuses on data-intensive computing and research, including supercomputing for experimental science and machine learning at scale." Watch the video: https://wp.me/p3RLHQ-kLV Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
10.02.22 Invited talk Symposium #1610, How Computational Science Is Tackling the Grand Challenges Facing Science and Society Title: Science and Cyberinfrastructure in the Data-Dominated Era San Diego, CA