We introduce the various Globus approaches available for automating data flows, including the command line interface (CLI), the Globus Timer service and the Globus Flows service. We use a Jupyter notebook to demonstrate automation of file transfers and permissions management on shared datasets. We also provide a brief introduction to the Globus platform-as-a-service for developers, with emphasis on understanding the security model; and will demonstrate how to access Globus services via APIs for integration with custom research applications.
Presented at a workshop at Oak Ridge National Laboratory on June 23, 2022.
Automating Research Data Workflows (GlobusWorld Tour - Columbia University)
This document discusses various ways to automate research data workflows using Globus. It describes automating regular data transfers through recurring tasks scheduled with sync options. It also discusses staging data automatically as part of compute jobs by adding directives to job scripts. The document outlines how applications can programmatically submit transfers when users complete tasks. It provides an overview of relevant Globus platform capabilities for authentication, authorization, and automation using the Globus CLI and SDK.
Automating Research Data Workflows (GlobusWorld Tour - STFC)
This document discusses automating research data workflows using Globus capabilities. It covers automating data replication and recurring transfers, staging data with compute jobs, and application-driven automation. Relevant Globus platform capabilities for automation include Globus Auth for native apps, refresh tokens, and the Globus command line interface (CLI) for tasks like batch transfers, permission management, and parsing CLI output. Scripting automation with the CLI or portals is also discussed.
Automating Data Flows with the Globus CLI (GlobusWorld Tour - UMich)
This document discusses automating data transfers with the Globus CLI and relevant Globus platform capabilities. It provides examples of using the CLI to search for endpoints, list contents, transfer files and directories in batches, manage notifications, and set permissions. It also discusses using the CLI in job submission scripts and how portals can automate transfers by storing refresh tokens to act on behalf of users. Relevant code examples and support resources are referenced.
We introduce the various Globus approaches available for automating data flows, including the command line interface (CLI), the Globus Timer service and the Globus Flows service.
This tutorial was given at the 2019 GlobusWorld Conference in Chicago, IL by Globus Head of Products Rachana Ananthakrishnan and Director of Customer Engagement Greg Nawrocki.
We describe how developers can use Globus APIs to integrate robust data management capabilities into their research applications. We also demonstrate the new Globus portal framework that can be used in conjunction with the Globus Search service to simplify data search and discovery.
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)
The document discusses how the Globus platform can be leveraged to build science gateways, web portals, and other applications. It provides examples of how the Globus Auth, APIs, and Connect services can be used to enable authentication, file transfer, and data sharing. The Globus Python SDK and helper pages are also described as tools for developing applications that integrate Globus functionality.
Jupyter + Globus: The Foundation for Interactive Data Science
This tutorial from the Gateways 2018 conference in Austin, TX showed participants how Globus may be used in conjunction with the Jupyter platform to open up new avenues—and new data sources--for interactive data science.
We present an overview of Globus services for automating research computing and data management tasks, to accelerate research process throughput. This content is aimed at researchers who wish to automate repetitive data management tasks (such as backup and data distribution to collaborators), as well as those working with instruments (cryoEM, next-gen sequencers, fMRI, etc.) who wish to streamline data egress, downstream analysis, and sharing at scale.
This material was presented at the Research Computing and Data Management Workshop, hosted by Rensselaer Polytechnic Institute on February 27-28, 2024.
This tutorial from the Gateways 2018 conference in Austin, TX explored the capabilities provided by Globus for assembling, describing, publishing, identifying, searching, and discovering datasets.
Instrument Data Orchestration with Globus Search and Flows
This document discusses various Globus services for instrument data orchestration including the Timer service, platform services, authentication, search, transfer, flows, and the upcoming Trigger service. The Timer service allows for scheduled and recurring transfers. Platform services provide comprehensive data and compute orchestration. Authentication is handled by Globus Auth. Search allows for data description and discovery. Transfer shares and moves data. Flows automate distributed research tasks. Triggers will start flows based on events.
We review the Globus Connect Server v5 (GCSv5) architecture and deployment model, and describe the process for creating a Globus endpoint on your HPC cluster, lab server, or other multi-user storage system. You will experiment with installing Globus Connect Server, and configuring basic options on the endpoint.
Globus Compute wth IRI Workflows - GlobusWorld 2024
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Globus Connect Server Deep Dive - GlobusWorld 2024
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
First Steps with Globus Compute Multi-User Endpoints
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Enhancing Research Orchestration Capabilities at ORNL.pdf
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
How to Position Your Globus Data Portal for Success Ten Good Practices
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
Developing Distributed High-performance Computing Capabilities of an Open Sci...
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
The Department of Energy's Integrated Research Infrastructure (IRI)
We will provide an overview of DOE’s IRI initiative as it moves into early implementation, what drives the IRI vision, and the role of DOE in the larger national research ecosystem.
Introduction to the Globus Platform (GlobusWorld Tour - UMich)Globus
1) The Globus platform provides services for fast and reliable data transfer, sharing, and file management directly from storage systems via software-as-a-service using existing identities.
2) Globus can be used as a platform for building science gateways, portals and other web applications in support of research through APIs for user authentication, file transfer, and sharing capabilities.
3) The document provides an introduction to the Globus platform and its capabilities including code samples and walks through using the APIs via a Jupyter notebook to search for endpoints, manage files and tasks, and integrate Globus into other applications.
Introduction to the Globus Platform (APS Workshop)Globus
This document discusses the Globus Platform Services API and SDK. It provides an overview of the Globus Auth API for user authentication and file sharing capabilities. It also summarizes the Globus Transfer API and Python SDK for integrating file transfer and access management into applications. Several methods for tasks like endpoint search, file operations, task submission and management are covered at a high level.
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)Globus
The document discusses automating research data workflows using the Globus Command Line Interface (CLI). Key points include:
1) The CLI can be used to automate recurring transfers, stage data in/out of compute jobs, and allow applications to submit transfers on a user's behalf by using access tokens.
2) Refresh tokens allow applications to get new access tokens without a user logged in, by storing and exchanging refresh tokens.
3) Examples of automation include syncing directories with scripts, staging data in shared directories, and removing directories after transfer with Python scripts.
4) Resources for support include Globus documentation, sample code, and professional services for custom application development and integration.
Automating Research Data Workflows (GlobusWorld Tour - Columbia University)Globus
This document discusses various ways to automate research data workflows using Globus. It describes automating regular data transfers through recurring tasks scheduled with sync options. It also discusses staging data automatically as part of compute jobs by adding directives to job scripts. The document outlines how applications can programmatically submit transfers when users complete tasks. It provides an overview of relevant Globus platform capabilities for authentication, authorization, and automation using the Globus CLI and SDK.
Automating Research Data Workflows (GlobusWorld Tour - STFC)Globus
This document discusses automating research data workflows using Globus capabilities. It covers automating data replication and recurring transfers, staging data with compute jobs, and application-driven automation. Relevant Globus platform capabilities for automation include Globus Auth for native apps, refresh tokens, and the Globus command line interface (CLI) for tasks like batch transfers, permission management, and parsing CLI output. Scripting automation with the CLI or portals is also discussed.
Automating Data Flows with the Globus CLI (GlobusWorld Tour - UMich)Globus
This document discusses automating data transfers with the Globus CLI and relevant Globus platform capabilities. It provides examples of using the CLI to search for endpoints, list contents, transfer files and directories in batches, manage notifications, and set permissions. It also discusses using the CLI in job submission scripts and how portals can automate transfers by storing refresh tokens to act on behalf of users. Relevant code examples and support resources are referenced.
We introduce the various Globus approaches available for automating data flows, including the command line interface (CLI), the Globus Timer service and the Globus Flows service.
Tutorial: Automating Research Data WorkflowsGlobus
This tutorial was given at the 2019 GlobusWorld Conference in Chicago, IL by Globus Head of Products Rachana Ananthakrishnan and Director of Customer Engagement Greg Nawrocki.
Working with Globus Platform Services and PortalsGlobus
We describe how developers can use Globus APIs to integrate robust data management capabilities into their research applications. We also demonstrate the new Globus portal framework that can be used in conjunction with the Globus Search service to simplify data search and discovery.
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)Globus
The document discusses how the Globus platform can be leveraged to build science gateways, web portals, and other applications. It provides examples of how the Globus Auth, APIs, and Connect services can be used to enable authentication, file transfer, and data sharing. The Globus Python SDK and helper pages are also described as tools for developing applications that integrate Globus functionality.
Jupyter + Globus: The Foundation for Interactive Data ScienceGlobus
This tutorial from the Gateways 2018 conference in Austin, TX showed participants how Globus may be used in conjunction with the Jupyter platform to open up new avenues—and new data sources--for interactive data science.
Introduction to Research Automation with GlobusGlobus
We present an overview of Globus services for automating research computing and data management tasks, to accelerate research process throughput. This content is aimed at researchers who wish to automate repetitive data management tasks (such as backup and data distribution to collaborators), as well as those working with instruments (cryoEM, next-gen sequencers, fMRI, etc.) who wish to streamline data egress, downstream analysis, and sharing at scale.
This material was presented at the Research Computing and Data Management Workshop, hosted by Rensselaer Polytechnic Institute on February 27-28, 2024.
This tutorial from the Gateways 2018 conference in Austin, TX explored the capabilities provided by Globus for assembling, describing, publishing, identifying, searching, and discovering datasets.
Instrument Data Orchestration with Globus Search and FlowsGlobus
This document discusses various Globus services for instrument data orchestration including the Timer service, platform services, authentication, search, transfer, flows, and the upcoming Trigger service. The Timer service allows for scheduled and recurring transfers. Platform services provide comprehensive data and compute orchestration. Authentication is handled by Globus Auth. Search allows for data description and discovery. Transfer shares and moves data. Flows automate distributed research tasks. Triggers will start flows based on events.
Introduction to Globus for System AdministratorsGlobus
We review the Globus Connect Server v5 (GCSv5) architecture and deployment model, and describe the process for creating a Globus endpoint on your HPC cluster, lab server, or other multi-user storage system. You will experiment with installing Globus Connect Server, and configuring basic options on the endpoint.
Similar to Automating Research Data Flows and an Introduction to the Globus Platform (20)
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
The Department of Energy's Integrated Research Infrastructure (IRI)Globus
We will provide an overview of DOE’s IRI initiative as it moves into early implementation, what drives the IRI vision, and the role of DOE in the larger national research ecosystem.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Enhancing Performance with Globus and the Science DMZGlobus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Extending Globus into a Site-wide Automated Data Infrastructure.pdfGlobus
The Rosalind Franklin Institute hosts a variety of scientific instruments, which allow us to capture a multifaceted and multilevel view of biological systems, generating around 70 terabytes of data a month. Distributed solutions, such as Globus and Ceph, facilitates storage, access, and transfer of large amount of data. However, we still must deal with the heterogeneity of the file formats and directory structure at acquisition, which is optimised for fast recording, rather than for efficient storage and processing. Our data infrastructure includes local storage at the instruments and workstations, distributed object stores with POSIX and S3 access, remote storage on HPCs, and taped backup. This can pose a challenge in ensuring fast, secure, and efficient data transfer. Globus allows us to handle this heterogeneity, while its Python SDK allows us to automate our data infrastructure using Globus microservices integrated with our data access models. Our data management workflows are becoming increasingly complex and heterogenous, including desktop PCs, virtual machines, and offsite HPCs, as well as several open-source software tools with different computing and data structure requirements. This complexity commands that data is annotated with enough details about the experiments and the analysis to ensure efficient and reproducible workflows. This talk explores how we extend Globus into different parts of our data lifecycle to create a secure, scalable, and high performing automated data infrastructure that can provide FAIR[1,2] data for all our science.
1. https://doi.org/10.1038/sdata.2016.18
2. https://www.go-fair.org/fair-principles
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Globus Compute with Integrated Research Infrastructure (IRI) workflowsGlobus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and I will give a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Reactive Documents and Computational Pipelines - Bridging the GapGlobus
As scientific discovery and experimentation become increasingly reliant on computational methods, the static nature of traditional publications renders them progressively fragmented and unreproducible. How can workflow automation tools, such as Globus, be leveraged to address these issues and potentially create a new, higher-value form of publication? LivePublication leverages Globus’s custom Action Provider integrations and Compute nodes to capture semantic and provenance information during distributed flow executions. This information is then embedded within an RO-crate and interfaced with a programmatic document, creating a seamless pipeline from instruments, to computation, to publication.
Ansys Mechanical enables you to solve complex structural engineering problems and make better, faster design decisions. With the finite element analysis (FEA) solvers available in the suite, you can customize and automate solutions for your structural mechanics problems and parameterize them to analyze multiple design scenarios. Ansys Mechanical is a dynamic tool that has a complete range of analysis tools.
Discover the Power of ONEMONITAR: The Ultimate Mobile Spy App for Android Dev...onemonitarsoftware
Unlock the full potential of mobile monitoring with ONEMONITAR. Our advanced and discreet app offers a comprehensive suite of features, including hidden call recording, real-time GPS tracking, message monitoring, and much more.
Perfect for parents, employers, and anyone needing a reliable solution, ONEMONITAR ensures you stay informed and in control. Explore the key features of ONEMONITAR and see why it’s the trusted choice for Android device monitoring.
Share this infographic to spread the word about the ultimate mobile spy app!
Responsibilities of Fleet Managers and How TrackoBit Can Assist.pdfTrackobit
What do fleet managers do? What are their duties, responsibilities, and challenges? And what makes a fleet manager effective and successful? This blog answers all these questions.
CViewSurvey Digitech Pvt Ltd that works on a proven C.A.A.G. model.bhatinidhi2001
CViewSurvey is a SaaS-based Web & Mobile application that provides digital transformation to traditional paper surveys and feedback for customer & employee experience, field & market research that helps you evaluate your customer's as well as employee's loyalty.
With our unique C.A.A.G. Collect, Analysis, Act & Grow approach; business & industry’s can create customized surveys on web, publish on app to collect unlimited response & review AI backed real-time data analytics on mobile & tablets anytime, anywhere. Data collected when offline is securely stored in the device, which syncs to the cloud server when connected to any network.
React Native vs Flutter - SSTech SystemSSTech System
Your project needs and long-term objectives will ultimately choose which of React Native and Flutter to use. For applications using JavaScript and current web technologies in particular, React Native is a mature and trustworthy choice. For projects that value performance and customizability across many platforms, Flutter, on the other hand, provides outstanding performance and a unified UI development experience.
Cultural Shifts: Embracing DevOps for Organizational TransformationMindfire Solution
Mindfire Solutions specializes in DevOps services, facilitating digital transformation through streamlined software development and operational efficiency. Their expertise enhances collaboration, accelerates delivery cycles, and ensures scalability using cloud-native technologies. Mindfire Solutions empowers businesses to innovate rapidly and maintain competitive advantage in dynamic market landscapes.
Software development... for all? (keynote at ICSOFT'2024)miso_uam
Our world runs on software. It governs all major aspects of our life. It is an enabler for research and innovation, and is critical for business competitivity. Traditional software engineering techniques have achieved high effectiveness, but still may fall short on delivering software at the accelerated pace and with the increasing quality that future scenarios will require.
To attack this issue, some software paradigms raise the automation of software development via higher levels of abstraction through domain-specific languages (e.g., in model-driven engineering) and empowering non-professional developers with the possibility to build their own software (e.g., in low-code development approaches). In a software-demanding world, this is an attractive possibility, and perhaps -- paraphrasing Andy Warhol -- "in the future, everyone will be a developer for 15 minutes". However, to make this possible, methods are required to tweak languages to their context of use (crucial given the diversity of backgrounds and purposes), and the assistance to developers throughout the development process (especially critical for non-professionals).
In this keynote talk at ICSOFT'2024 I presented enabling techniques for this vision, supporting the creation of families of domain-specific languages, their adaptation to the usage context; and the augmentation of low-code environments with assistants and recommender systems to guide developers (professional or not) in the development process.
COMPSAC 2024 D&I Panel: Charting a Course for Equity: Strategies for Overcomi...Hironori Washizaki
Hironori Washizaki, "Charting a Course for Equity: Strategies for Overcoming Challenges and Promoting Inclusion in the Metaverse", IEEE COMPSAC 2024 D&I Panel, 2024.
What is OCR Technology and How to Extract Text from Any Image for FreeTwisterTools
Discover the fascinating world of Optical Character Recognition (OCR) technology with our comprehensive presentation. Learn how OCR converts various types of documents, such as scanned paper documents, PDFs, or images captured by a digital camera, into editable and searchable data. Dive into the history, modern applications, and future trends of OCR technology. Get step-by-step instructions on how to extract text from any image online for free using a simple tool, along with best practices for OCR image preparation. Ideal for professionals, students, and tech enthusiasts looking to harness the power of OCR.
React and Next.js are complementary tools in web development. React, a JavaScript library, specializes in building user interfaces with its component-based architecture and efficient state management. Next.js extends React by providing server-side rendering, routing, and other utilities, making it ideal for building SEO-friendly, high-performance web applications.
Addressing the Top 9 User Pain Points with Visual Design Elements.pptx
Automating Research Data Flows and an Introduction to the Globus Platform
1. Automating Research Data Flows and an
Introduction to the Globus Platform
Greg Nawrocki
greg@globus.org
June 23, 2022
2. Globus Automation Capabilities
Timer Service
Scheduled and recurring transfers
(a.k.a. Globus cron)
Command Line Interface
Ad hoc scripting and integration
Globus Flows service
Comprehensive task (data and
compute) orchestration with human in
the loop interactions
3. Globus Auth: Foundational IAM service
• Brokers authentication and authorization among…
– End-users
– Identity providers: enterprise, external (federated identities)
– Services: resource servers with REST APIs
– Apps: web, mobile, desktop, command line clients
– Services acting as clients to other services
• Support high assurance service for use with
protected data (e.g. HIPAA protected data)
5
4. Securing Apps with Globus Auth
• Native App (with refresh tokens – extend expiration)
– Authentication as user identity
– Authentication URL / come back with a auth code – exchanged for tokens
– Clients can’t keep a secret - tokens in plain text
– Jupyter Notebook examples / Timer Service
• Auth Code Grant – Templated App
– Authentication as user identity
– Browser redirect to Globus Auth, auth code returned (no manual copy)
– Tokens stored securely
– CLI / Jupyter Hub secured with Globus Auth
• Confidential Client:
– Authentication as application
– ClientID and Secret stored securely
– Custom apps
– developers.globus.org - Client ID / Secret / Client Identity Username
6
6. Use case: Data replication
• For backup: initiated by user or system back up
• Automated transfer of data from science instrument
12
Recurring transfers
with sync option
Copy /ingest
Daily @ 3:30am
7. The Globus Timer service
• Scheduled/recurring file transfers
• Supports all Globus transfer and sync options
• Service accessible via web app and CLI
• Example: NIH – hpc.nih.gov/storage/globus_cron.html
13
8. Using the Globus Timer service
14
$ globus–timer session {login, logout, whoami}
$ globus–timer job transfer
--name example–job
--label "Timer Transfer Job"
--interval 28800
--start '2020–01–01T12:34:56'
--source–endpoint ddb59aef–6d04–11e5–ba46–22000b92c6ec
--dest–endpoint ddb59af0–6d04–11e5–ba46–22000b92c6ec
--item ~/file1.txt ~/new_file1.txt false
--item ~/file2.txt ~/new_file2.txt false
11. Globus Command Line Interface
Open source, uses
the Python SDK
Because of this
correspondence the CLI is
an excellent tool for getting
the gist of how he SDK
functions.
Very scriptable!
12. Globus CLI
• It’s a stand alone application distributed by Globus
– https://docs.globus.org/cli/
– https://github.com/globus/globus-cli
• Easy install and updates
• Command “globus login” gets access tokens
• All interactions with the service use the tokens
– The CLI is acting as you (your identity)
• Command “globus logout” deletes those
13. CLI Basics – “globus” is the executable
$ globus endpoint search 'Globus Tutorial'
$ globus task list
$ globus get-identities greg@globus.org --verbose
• Getting help / list of commands
– globus list-commands
– globus –help
– https://docs.globus.org/cli/examples/
• UUIDs for endpoint, task, user identity, groups…
• Can query to discover the UUIDs
– Use search / list / get options
14. Use case: Sharing out data
Researcher initiates
transfer request; or
requested automatically
by script, science
gateway
Instrument
Globus
controls
access to
shared files on
existing
storage; no
need to move
files to cloud
storage!
Researcher
selects files to
share, selects
user or group,
and sets access
permissions
Collaborator logs in to
Globus and accesses
shared files; no local
account required;
download via Globus
Personal Computer
Transfer
Share
Compute Facility
Globus transfers files
reliably, securely
15. Step 1: Transfer
• Using a Batch Transfer
– Transfer tasks have one source/destination, but can have any
number of files
– Provide input source-dest pairs via local file
– File may have embedded comments
$ export ep1=ddb59aef-6d04-11e5-ba46-22000b92c6ec
$ export ep2=af7bda53-6d04-11e5-ba46-22000b92c6ec
$ globus transfer $ep1:/share/godata/ $ep2:/~/automation-example-
data-share/outbound/Vas --label "Files to share" --batch
/Users/gregnawrocki/Greg/Globus/Demos/ornl20220623/files.txt
# this is the contents of files.txt:
# a list of source paths followed by destination paths
file1.txt file1.txt
# file2.txt file2.txt # inline-comments are also allowed
file3.txt file3.txt
16. Step 2: Share - Set permissions
• Set and manage permissions on guest collection
• Requires access manager role
$ globus endpoint search 'automation-example-data-share’
$ export share=bf4445c4-ecfd-11ec-aed7-6f7c2b57b05c
$ globus endpoint permission create --permissions r --identity
vas@uchicago.edu $share:/Vas/
17. Useful submission commands
• Safe resubmissions
– Applies to all tasks (transfer and delete)
– Get a task UUID, use that in submission
– $ globus task generate-submission-id
– --submission-id option in transfer
– Useful for lazy branching or when dealing with unreliable
networks
• Task wait
– useful for scripting conditionals on transfer task status
18. Parsing CLI output
$ globus endpoint search --filter-scope my-endpoints
$ globus endpoint search --filter-scope my-endpoints --format json
$ globus endpoint search --filter-scope my-endpoints --jmespath
'DATA[].[id, display_name]'
• Default output is text; for JSON output use --format
json
• Extract specific attributes using --jmespath
<expression>
19. Managing notifications
• Turn off emails sent for tasks
• Useful when an application manages tasks for a user
• Disable notifications with the --notify option
--notify off (all notifications)
--notify succeeded|failed|inactive (select notifications)
21. Managed automation of tasks
• Flows: A platform service for defining, applying, and
sharing distributed research automation flows
• Flows comprise Actions
• Action Providers: Called by Flows to perform tasks
• Triggers*: Start flows based on events
* In development
22. Automation services ecosystem
GET /provider_url/
POST /provider_url/run
GET /provider_url/action_id/status
GET /provider_url/action_id/cancel
GET /provider_url/action_id/status
Create Action
Providers
Define and
deploy flows
{ “StartAt”: ”ToProject”,
”States” : {
”ToProject” : { … },
”SetPermission” : { …},
“ProcessData” : { … } … }}
Run flows
23. Automation with Globus Flows
• Built on AWS Step Functions
– Simple JSON-based state machine
language
– Conditions, loops, fault tolerance, etc.
– Propagates state through the flow
• Standardized API for integrating
custom event and action services
– Actions: synchronous or asynchronous
– Custom Web forms prompt for user input
• Actions secured with Globus Auth
28. Extending the ecosystem: Action providers
35
• Action Provider is a
service endpoint
– Run
– Status
– Cancel
– Release
– Resume
• Action Provider Toolkit
action-provider-
tools.readthedocs.io/en/latest
Search
Transfer
Notification
ACLs Identifier
Delete
Ingest
User
Form
Describe Xtract
funcX Web
Form
Custom built
Globus Provided
30. 37
Custom portals? Science Gateways? Unique workflows? Our open
REST APIs and Python SDK empower you to create an integrated
ecosystem of research data services and applications.
32. App Access to Collections
• Globus Transfer – Authentication with access tokens
– Individual: Globus login to get tokens
– Application: Apps are people too!
o developers.globus.org - Client ID / Secret / Client Identity Username
• Collection access
– GCSv4 Mapped Collections – user consent
o Activating endpoint means binding a credential to an endpoint for login
– GCSv5 Mapped Collections (no user certificates, OAUTH tokens and consents)
o https://docs.globus.org/globus-connect-server/v5.4/use-client-credentials/
o Request the data_access scope (per collection) to be able to access the collection.
o The storage gateway must permit identities from the 'clients.auth.globus.org' identity domain
o Identity Mapping Policy that maps the ‘UUID@clients.auth.globus.org' identity to a valid local user
– Guest Collections
o Guest Collections auto-activate - need to do this before API calls to endpoints
o Use Guest Collections whenever possible
– Remember to set your ACLs (WebApp)
Automation
34. Globus APIs
• Auth
• Groups
• Transfer
• Search
• Timer
• Flows
• GCS Manager
• Globus Web App consumes public
Transfer API
• Resource named by URL (standard
REST approach)
• Globus APIs use JSON for documents
docs.globus.org/api/transfer
35. Globus Python SDK
• Python client library for the Globus REST APIs
• Largely direct mapping to REST API
• globus_sdk.TransferClient class handles
connection management, security, framing,
marshaling
globus-sdk-python.readthedocs.io/en/stable/
globus.github.io/globus-sdk-python
42
36. TransferClient higher-level calls
• One method for each API resource and HTTP verb
• Largely direct mapping to REST API
endpoint_search(filter_fulltext=None,
filter_scope=None,
num_results=25,
**params)
44
37. Synchronous Tasks
• Endpoint search (with scopes)
• List directory contents (ls)
• Make directory (mkdir)
• Rename
• Note:
– Path encoding & UTF gotchas
– Don’t forget to auto-activate first
46
38. Asynchronous Tasks
• Transfer
– Sync level option
• Delete
• Get submission_id, followed by submit
– Once and only once submission
• Use task id to “follow up”
47
39. The Globus API / SDK with a Jupyter Notebook in a
Jupyter Hub – Auth Code Grant
login
REST APIs
{ “tokens”:…
{“tokens”:…
REST APIs
REST APIs
Bearer a45cd…
40. Walkthrough API with our Jupyter Hub
• https://jupyter.demo.globus.org
– Sign in with Globus
– Verify the consents
– Start My Server (this will take about a minute)
– Open folder: globus-jupyter-notebooks
– Run Platform_Introduction_JupyterHub_Auth.ipynb
• If you mess it up and want to “go back to the beginning”
– Just stop and restart the server
• If you want to use the notebook outside of our hub
– https://github.com/globus/globus-jupyter-notebooks
– Authentication is a manual cut and paste of exchanging the
authorization code for an access token – Native App
49
42. Developer References
• Globus API / SDK Documentation
– Transfer API : docs.globus.org/api/transfer/
– SDK: globus-sdk-python.readthedocs.io/en/stable/
• Globus GitHub: github.com/globus/
– Jupyter Notebooks
o Stand alone notebooks and hub integrations that walk through much of the
functionality of our SDK
o https://github.com/globus/globus-jupyter-notebooks
– Automation Examples
o Shell scripted CLI and Python module examples of common research data
management use cases
o https://github.com/globus/automation-examples