This document introduces the Globus platform for automating research data flows. It describes Globus capabilities for scheduled transfers, command line scripting, and comprehensive task orchestration. It also covers Globus Auth for identity and access management, securing apps with Globus Auth, and the Globus Timer Service. The Globus command line interface and Python SDK allow for programmatic access and automation of data transfers and other tasks.
Report
Share
Report
Share
1 of 41
Download to read offline
More Related Content
Similar to Automating Research Data Flows and Introduction to the Globus Platform
Using Globus to Streamline Research at ScaleGlobus
We provide an overview of the various Globus capabilities that can be used to automate data flows, with particular emphasis on managing data from instruments such as next generation sequencers and cryo electron microscopes. This session introduces the Globus command line interface (CLI) for integrating Globus tasks into scripts, and the Globus Flows service for more robust automation (including workflows that require a human in the loop).
Presented at a workshop at KU Leuven on July 8, 2022.
Introduction to the Globus Platform (APS Workshop)Globus
This document discusses the Globus Platform Services API and SDK. It provides an overview of the Globus Auth API for user authentication and file sharing capabilities. It also summarizes the Globus Transfer API and Python SDK for integrating file transfer and access management into applications. Several methods for tasks like endpoint search, file operations, task submission and management are covered at a high level.
Working with Globus Platform Services and PortalsGlobus
We describe how developers can use Globus APIs to integrate robust data management capabilities into their research applications. We also demonstrate the new Globus portal framework that can be used in conjunction with the Globus Search service to simplify data search and discovery.
We introduce the various Globus approaches available for automating data flows, including the command line interface (CLI), the Globus Timer service and the Globus Flows service.
Automating Research Data Workflows (GlobusWorld Tour - Columbia University)Globus
This document discusses various ways to automate research data workflows using Globus. It describes automating regular data transfers through recurring tasks scheduled with sync options. It also discusses staging data automatically as part of compute jobs by adding directives to job scripts. The document outlines how applications can programmatically submit transfers when users complete tasks. It provides an overview of relevant Globus platform capabilities for authentication, authorization, and automation using the Globus CLI and SDK.
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)Globus
The document discusses automating research data workflows using the Globus Command Line Interface (CLI). Key points include:
1) The CLI can be used to automate recurring transfers, stage data in/out of compute jobs, and allow applications to submit transfers on a user's behalf by using access tokens.
2) Refresh tokens allow applications to get new access tokens without a user logged in, by storing and exchanging refresh tokens.
3) Examples of automation include syncing directories with scripts, staging data in shared directories, and removing directories after transfer with Python scripts.
4) Resources for support include Globus documentation, sample code, and professional services for custom application development and integration.
Automating Research Data Workflows (GlobusWorld Tour - STFC)Globus
This document discusses automating research data workflows using Globus capabilities. It covers automating data replication and recurring transfers, staging data with compute jobs, and application-driven automation. Relevant Globus platform capabilities for automation include Globus Auth for native apps, refresh tokens, and the Globus command line interface (CLI) for tasks like batch transfers, permission management, and parsing CLI output. Scripting automation with the CLI or portals is also discussed.
Automating Data Flows with the Globus CLI (GlobusWorld Tour - UMich)Globus
This document discusses automating data transfers with the Globus CLI and relevant Globus platform capabilities. It provides examples of using the CLI to search for endpoints, list contents, transfer files and directories in batches, manage notifications, and set permissions. It also discusses using the CLI in job submission scripts and how portals can automate transfers by storing refresh tokens to act on behalf of users. Relevant code examples and support resources are referenced.
Tutorial: Automating Research Data WorkflowsGlobus
This tutorial was given at the 2019 GlobusWorld Conference in Chicago, IL by Globus Head of Products Rachana Ananthakrishnan and Director of Customer Engagement Greg Nawrocki.
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)Globus
The document discusses how the Globus platform can be leveraged to build science gateways, web portals, and other applications. It provides examples of how the Globus Auth, APIs, and Connect services can be used to enable authentication, file transfer, and data sharing. The Globus Python SDK and helper pages are also described as tools for developing applications that integrate Globus functionality.
Jupyter + Globus: The Foundation for Interactive Data ScienceGlobus
This tutorial from the Gateways 2018 conference in Austin, TX showed participants how Globus may be used in conjunction with the Jupyter platform to open up new avenues—and new data sources--for interactive data science.
This tutorial from the Gateways 2018 conference in Austin, TX explored the capabilities provided by Globus for assembling, describing, publishing, identifying, searching, and discovering datasets.
Introduction to Research Automation with GlobusGlobus
We present an overview of Globus services for automating research computing and data management tasks, to accelerate research process throughput. This content is aimed at researchers who wish to automate repetitive data management tasks (such as backup and data distribution to collaborators), as well as those working with instruments (cryoEM, next-gen sequencers, fMRI, etc.) who wish to streamline data egress, downstream analysis, and sharing at scale.
This material was presented at the Research Computing and Data Management Workshop, hosted by Rensselaer Polytechnic Institute on February 27-28, 2024.
Introduction to Globus for System AdministratorsGlobus
We review the Globus Connect Server v5 (GCSv5) architecture and deployment model, and describe the process for creating a Globus endpoint on your HPC cluster, lab server, or other multi-user storage system. You will experiment with installing Globus Connect Server, and configuring basic options on the endpoint.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
The Department of Energy's Integrated Research Infrastructure (IRI)Globus
We will provide an overview of DOE’s IRI initiative as it moves into early implementation, what drives the IRI vision, and the role of DOE in the larger national research ecosystem.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Enhancing Performance with Globus and the Science DMZGlobus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Extending Globus into a Site-wide Automated Data Infrastructure.pdfGlobus
The Rosalind Franklin Institute hosts a variety of scientific instruments, which allow us to capture a multifaceted and multilevel view of biological systems, generating around 70 terabytes of data a month. Distributed solutions, such as Globus and Ceph, facilitates storage, access, and transfer of large amount of data. However, we still must deal with the heterogeneity of the file formats and directory structure at acquisition, which is optimised for fast recording, rather than for efficient storage and processing. Our data infrastructure includes local storage at the instruments and workstations, distributed object stores with POSIX and S3 access, remote storage on HPCs, and taped backup. This can pose a challenge in ensuring fast, secure, and efficient data transfer. Globus allows us to handle this heterogeneity, while its Python SDK allows us to automate our data infrastructure using Globus microservices integrated with our data access models. Our data management workflows are becoming increasingly complex and heterogenous, including desktop PCs, virtual machines, and offsite HPCs, as well as several open-source software tools with different computing and data structure requirements. This complexity commands that data is annotated with enough details about the experiments and the analysis to ensure efficient and reproducible workflows. This talk explores how we extend Globus into different parts of our data lifecycle to create a secure, scalable, and high performing automated data infrastructure that can provide FAIR[1,2] data for all our science.
1. https://doi.org/10.1038/sdata.2016.18
2. https://www.go-fair.org/fair-principles
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Globus Compute with Integrated Research Infrastructure (IRI) workflowsGlobus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and I will give a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Reactive Documents and Computational Pipelines - Bridging the GapGlobus
As scientific discovery and experimentation become increasingly reliant on computational methods, the static nature of traditional publications renders them progressively fragmented and unreproducible. How can workflow automation tools, such as Globus, be leveraged to address these issues and potentially create a new, higher-value form of publication? LivePublication leverages Globus’s custom Action Provider integrations and Compute nodes to capture semantic and provenance information during distributed flow executions. This information is then embedded within an RO-crate and interfaced with a programmatic document, creating a seamless pipeline from instruments, to computation, to publication.
Cultural Shifts: Embracing DevOps for Organizational TransformationMindfire Solution
Mindfire Solutions specializes in DevOps services, facilitating digital transformation through streamlined software development and operational efficiency. Their expertise enhances collaboration, accelerates delivery cycles, and ensures scalability using cloud-native technologies. Mindfire Solutions empowers businesses to innovate rapidly and maintain competitive advantage in dynamic market landscapes.
NBFC Software: Optimize Your Non-Banking Financial CompanyNBFC Softwares
NBFC Software: Optimize Your Non-Banking Financial Company
Enhance Your Financial Services with Comprehensive NBFC Software
NBFC software provides a complete solution for non-banking financial companies, streamlining banking and accounting functions to reduce operational costs. Our software is designed to meet the diverse needs of NBFCs, including investment banks, insurance companies, and hedge funds.
Key Features of NBFC Software:
Centralized Database: Facilitates inter-branch collaboration and smooth operations with a unified platform.
Automation: Simplifies loan lifecycle management and account maintenance, ensuring efficient delivery of financial services.
Customization: Highly customizable to fit specific business needs, offering flexibility in managing various loan types such as home loans, mortgage loans, personal loans, and more.
Security: Ensures safe and secure handling of financial transactions and sensitive data.
User-Friendly Interface: Designed to be intuitive and easy to use, reducing the learning curve for employees.
Cost-Effective: Reduces the need for additional manpower by automating tasks, making it a budget-friendly solution. Benefits of NBFC Software:
Go Paperless: Transition to a fully digital operation, eliminating offline work.
Transparency: Enables managers and executives to monitor various points of the banking process easily.
Defaulter Tracking: Helps track loan defaulters, maintaining a healthy loan management system.
Increased Accessibility: Cutting-edge technology increases the accessibility and usability of NBFC operations. Request a Demo Now!
Explore the rapid development journey of TryBoxLang, completed in just 48 hours. This session delves into the innovative process behind creating TryBoxLang, a platform designed to showcase the capabilities of BoxLang by Ortus Solutions. Discover the challenges, strategies, and outcomes of this accelerated development effort, highlighting how TryBoxLang provides a practical introduction to BoxLang's features and benefits.
React Native vs Flutter - SSTech SystemSSTech System
Your project needs and long-term objectives will ultimately choose which of React Native and Flutter to use. For applications using JavaScript and current web technologies in particular, React Native is a mature and trustworthy choice. For projects that value performance and customizability across many platforms, Flutter, on the other hand, provides outstanding performance and a unified UI development experience.
A Comparative Analysis of Functional and Non-Functional Testing.pdfkalichargn70th171
A robust software testing strategy encompassing functional and non-functional testing is fundamental for development teams. These twin pillars are essential for ensuring the success of your applications. But why are they so critical?
Functional testing rigorously examines the application's processes against predefined requirements, ensuring they align seamlessly. Conversely, non-functional testing evaluates performance and reliability under load, enhancing the end-user experience.
Overview of ERP - Mechlin Technologies.pptxMitchell Marsh
This PowerPoint presentation provides a comprehensive overview of Enterprise Resource Planning (ERP) systems. It covers the fundamental concepts, benefits, and key functionalities of ERP software, illustrating how it integrates various business processes into a unified system. From finance and HR to supply chain and customer relationship management, ERP facilitates efficient data management and decision-making across organizations. Whether you're new to ERP or looking to deepen your understanding, this presentation offers valuable insights into leveraging ERP for business success.
IN Dubai [WHATSAPP:Only (+971588192166**)] Abortion Pills For Sale In Dubai** UAE** Mifepristone and Misoprostol Tablets Available In Dubai** UAE
CONTACT DR. SINDY Whatsapp +971588192166* We Have Abortion Pills / Cytotec Tablets /Mifegest Kit Available in Dubai** Sharjah** Abudhabi** Ajman** Alain** Fujairah** Ras Al Khaimah** Umm Al Quwain** UAE** Buy cytotec in Dubai +971588192166* '''Abortion Pills near me DUBAI | ABU DHABI|UAE. Price of Misoprostol** Cytotec” +971588192166* ' Dr.SINDY ''BUY ABORTION PILLS MIFEGEST KIT** MISOPROSTOL** CYTOTEC PILLS IN DUBAI** ABU DHABI**UAE'' Contact me now via What's App… abortion pills in dubai Mtp-Kit Prices
abortion pills available in dubai/abortion pills for sale in dubai/abortion pills in uae/cytotec dubai/abortion pills in abu dhabi/abortion pills available in abu dhabi/abortion tablets in uae
… abortion Pills Cytotec also available Oman Qatar Doha Saudi Arabia Bahrain Above all** Cytotec Abortion Pills are Available In Dubai / UAE** you will be very happy to do abortion in Dubai we are providing cytotec 200mg abortion pills in Dubai** UAE. Medication abortion offers an alternative to Surgical Abortion for women in the early weeks of pregnancy. We only offer abortion pills from 1 week-6 Months. We then advise you to use surgery if it's beyond 6 months. Our Abu Dhabi** Ajman** Al Ain** Dubai** Fujairah** Ras Al Khaimah (RAK)** Sharjah** Umm Al Quwain (UAQ) United Arab Emirates Abortion Clinic provides the safest and most advanced techniques for providing non-surgical** medical and surgical abortion methods for early through late second trimester** including the Abortion By Pill Procedure (RU 486** Mifeprex** Mifepristone** early options French Abortion Pill)** Tamoxifen** Methotrexate and Cytotec (Misoprostol). The Abu Dhabi** United Arab Emirates Abortion Clinic performs Same Day Abortion Procedure using medications that are taken on the first day of the office visit and will cause the abortion to occur generally within 4 to 6 hours (as early as 30 minutes) for patients who are 3 to 12 weeks pregnant. When Mifepristone and Misoprostol are used** 50% of patients complete in 4 to 6 hours; 75% to 80% in 12 hours; and 90% in 24 hours. We use a regimen that allows for completion without the need for surgery 99% of the time. All advanced second trimester and late term pregnancies at our Tampa clinic (17 to 24 weeks or greater) can be completed within 24 hours or less 99% of the time without the need for surgery. The procedure is completed with minimal to no complications. Our Women's Health Center located in Abu Dhabi** United Arab Emirates** uses the latest medications for medical abortions (RU-486** Mifeprex** Mifegyne** Mifepristone** early options French abortion pill)** Methotrexate and Cytotec (Misoprostol). The safety standards of our Abu Dhabi** United Arab Emirates Abortion Doctors remain unparalleled. They consistently maintain the lowest complication rates throughout the nation. Our
Are you wondering how to migrate to the Cloud? At the ITB session, we addressed the challenge of managing multiple ColdFusion licenses and AWS EC2 instances. Discover how you can consolidate with just one EC2 instance capable of running over 50 apps using CommandBox ColdFusion. This solution supports both ColdFusion flavors and includes cb-websites, a GoLang binary for managing CommandBox websites.
Efficient hot work permit software for safe, streamlined work permit management and compliance. Enhance safety today. Contact us on +353 214536034.
https://sheqnetwork.com/work-permit/
In this talk, we will explore strategies to optimize the success rate of storing and retaining new information. We will discuss scientifically proven ideal learning intervals and content structures. Additionally, we will examine how to create an environment that improves our focus while you remain in the “flow”. Lastly we will also address the influence of AI on learning capabilities.
In the dynamic field of software development, this knowledge will empower you to accelerate your learning curve and support others in their learning journeys.
Discover the Power of ONEMONITAR: The Ultimate Mobile Spy App for Android Dev...onemonitarsoftware
Unlock the full potential of mobile monitoring with ONEMONITAR. Our advanced and discreet app offers a comprehensive suite of features, including hidden call recording, real-time GPS tracking, message monitoring, and much more.
Perfect for parents, employers, and anyone needing a reliable solution, ONEMONITAR ensures you stay informed and in control. Explore the key features of ONEMONITAR and see why it’s the trusted choice for Android device monitoring.
Share this infographic to spread the word about the ultimate mobile spy app!
Discover the Power of ONEMONITAR: The Ultimate Mobile Spy App for Android Dev...
Automating Research Data Flows and Introduction to the Globus Platform
1. Automating Research Data Flows and
Introduction to the Globus Platform
Greg Nawrocki
greg@globus.org
nawrocki@uchicago.edu
nawrocki@anl.gov
September 21, 2022
2. Globus Automation Capabilities
Timer Service
Scheduled and recurring transfers
(a.k.a. Globus cron)
Command Line Interface
Ad hoc scripting and integration
Globus Flows service
Comprehensive task (data and
compute) orchestration with human in
the loop interactions
3. Globus Auth: Foundational IAM service
• Brokers authentication and authorization among…
– End-users
– Identity providers: enterprise, external (federated identities)
– Services: resource servers with REST APIs
– Apps: web, mobile, desktop, command line clients
– Services acting as clients to other services
• Support high assurance service for use with
protected data (e.g. HIPAA protected data)
5
4. Securing Apps with Globus Auth
• Native App (with refresh tokens – extend expiration)
– Authentication as user identity
– Authentication URL / come back with a auth code – exchanged for tokens
– Clients can’t keep a secret - tokens in plain text
– Jupyter Notebook examples / CLI Based Timer Service
• Auth Code Grant – Templated App
– Authentication as user identity
– Browser redirect to Globus Auth, auth code returned (no manual copy)
– Tokens stored securely
– CLI / Jupyter Hub secured with Globus Auth
• Confidential Client:
– Authentication as application
– ClientID and Secret stored securely
– Custom apps
– developers.globus.org - Client ID / Secret / Client Identity Username
6
6. Use case: Data replication
• For backup: initiated by user or system back up
• Automated transfer of data from science instrument
12
Recurring transfers
with sync option
Copy /ingest
Daily @ 3:30am
7. The Globus Timer Service
• Scheduled/recurring file transfers
• Supports all Globus transfer and sync options
• Service accessible via web app and CLI
• Example: NIH – hpc.nih.gov/storage/globus_cron.html
13
11. Globus Command Line Interface
Open source, uses
the Python SDK
Because of this
correspondence the CLI is
an excellent tool for getting
the gist of how he SDK
functions.
Very scriptable!
12. Globus CLI
• It’s a stand alone application distributed by Globus
– https://docs.globus.org/cli/
– https://github.com/globus/globus-cli
• Easy install and updates
• Command “globus login” gets access tokens
• All interactions with the service use the tokens
– The CLI is acting as you (your identity)
• Command “globus logout” deletes those
13. CLI Basics – “globus” is the executable
$ globus endpoint search 'Globus Tutorial'
$ globus task list
$ globus get-identities greg@globus.org --verbose
• Getting help / list of commands
– globus list-commands
– globus –help
– https://docs.globus.org/cli/examples/
• UUIDs for endpoint, task, user identity, groups…
• Can query to discover the UUIDs
– Use search / list / get options
14. Use case: Sharing out data
Researcher initiates
transfer request
Instrument
Globus
controls
access to
shared files on
existing
storage; no
need to move
files to cloud
storage!
Researcher
selects files to
share, selects
user or group,
and sets access
permissions
Collaborator logs in to
Globus and accesses
shared files; no local
account required;
download via Globus
Personal Computer
Transfer
Share
Compute Facility
Globus transfers files
reliably, securely
15. Step 1: Transfer
• Using a Batch Transfer
– Transfer tasks have one source/destination, but can have any
number of files
– Provide input source-dest pairs via local file
– File may have embedded comments
$ export ep1=ddb59aef-6d04-11e5-ba46-22000b92c6ec
$ export ep2=af7bda53-6d04-11e5-ba46-22000b92c6ec
$ globus transfer $ep1:/share/godata/ $ep2:/~/automation-example-
data-share/outbound/Vas --label "Files to share" --batch
/Users/gregnawrocki/Greg/Globus/Demos/ornl20220623/files.txt
# this is the contents of files.txt:
# a list of source paths followed by destination paths
file1.txt file1.txt
# file2.txt file2.txt # inline-comments are also allowed
file3.txt file3.txt
16. Step 2: Share - Set permissions
• Set and manage permissions on guest collection
• Requires access manager role
$ globus endpoint search 'automation-example-data-share’
$ export share=bf4445c4-ecfd-11ec-aed7-6f7c2b57b05c
$ globus endpoint permission create --permissions r --identity
vas@uchicago.edu $share:/Vas/
17. Useful submission commands
• Safe resubmissions
– Applies to asynchronous tasks (transfer and delete)
– Get a task UUID, use that in submission
– $ globus task generate-submission-id
– --submission-id option in transfer
– Useful for lazy branching or when dealing with unreliable
networks
• Task wait
– useful for scripting conditionals on transfer task status
18. Parsing CLI output
$ globus endpoint search --filter-scope my-endpoints
$ globus endpoint search --filter-scope my-endpoints --format json
$ globus endpoint search --filter-scope my-endpoints --jmespath
'DATA[].[id, display_name]'
• Default output is text; for JSON output use --format
json
• Extract specific attributes using --jmespath
<expression>
19. Managing notifications
• Turn off emails sent for tasks
• Useful when an application manages tasks for a user
• Disable notifications with the --notify option
--notify off (all notifications)
--notify succeeded|failed|inactive (select notifications)
21. Managed automation of tasks
• Flows: A platform service for defining, applying, and
sharing distributed research automation flows
• Flows comprise Actions
• Action Providers: Called by Flows to perform tasks
• Triggers*: Start flows based on events
* In development
22. Automation services ecosystem
GET /provider_url/
POST /provider_url/run
GET /provider_url/action_id/status
GET /provider_url/action_id/cancel
GET /provider_url/action_id/status
Create Action
Providers
Define and
deploy flows
{ “StartAt”: ”ToProject”,
”States” : {
”ToProject” : { … },
”SetPermission” : { …},
“ProcessData” : { … } … }}
Run flows
23. Automation with Globus Flows
• Built on AWS Step Functions
– Simple JSON-based state machine
language
– Conditions, loops, fault tolerance, etc.
– Propagates state through the flow
• Standardized API for integrating
custom event and action services
– Actions: synchronous or asynchronous
– Custom Web forms prompt for user input
• Actions secured with Globus Auth
28. Extending the ecosystem: Action providers
35
• Action Provider is a
service endpoint
– Run
– Status
– Cancel
– Release
– Resume
• Action Provider Toolkit
action-provider-
tools.readthedocs.io/en/latest
Search
Transfer
Notification
ACLs Identifier
Delete
Ingest
User
Form
Describe Xtract
funcX Web
Form
Custom built
Globus Provided
30. 37
Custom portals? Science Gateways? Unique workflows? Our open
REST APIs and Python SDK empower you to create an integrated
ecosystem of research data services and applications.
32. App Access to Collections
• Globus Transfer – Authentication with access tokens
– Individual: Globus login (consents) to get tokens
– Application: Apps are people too!
o developers.globus.org - Client ID / Secret / Client Identity Username
• Collection access
– GCSv4 Mapped Collections – user consent
o Activating endpoint means binding a credential to an endpoint for login
– GCSv5 Mapped Collections (no user certificates, OAUTH tokens and consents)
o https://docs.globus.org/globus-connect-server/v5.4/use-client-credentials/
o Request the data_access scope (per collection) to be able to access the collection.
o The storage gateway must permit identities from the 'clients.auth.globus.org' identity domain
o Identity Mapping Policy that maps the ‘UUID@clients.auth.globus.org' identity to a valid local user
– Guest Collections
o Guest Collections auto-activate - need to do this before API calls to endpoints
o Use Guest Collections whenever possible
– Remember to set your ACLs (WebApp)
Automation
34. Globus APIs
• Auth
• Groups
• Transfer
• Search
• Timer
• Flows
• GCS Manager
• Globus Web App consumes public
Transfer API
• Resource named by URL (standard
REST approach)
• Globus APIs use JSON for documents
docs.globus.org/api/transfer
35. Globus Python SDK
• Python client library for the Globus REST APIs
• Largely direct mapping to REST API
• globus_sdk.TransferClient class handles
connection management, security, framing,
marshaling
globus-sdk-python.readthedocs.io/en/stable/
globus.github.io/globus-sdk-python
42
36. Synchronous Tasks
• Endpoint search (with scopes)
• List directory contents (ls)
• Make directory (mkdir)
• Rename
• Note:
– Path encoding & UTF gotchas
– Don’t forget to auto-activate first
46
37. Asynchronous Tasks
• Transfer
– Sync level option
• Delete
• Get submission_id, followed by submit
– Once and only once submission
• Use task id to “follow up”
47
38. The Globus API / SDK with a Jupyter Notebook in a
Jupyter Hub – Auth Code Grant
login
REST APIs
{ “tokens”:…
{“tokens”:…
REST APIs
REST APIs
Bearer a45cd…
39. Walkthrough API with our Jupyter Hub
• https://jupyter.demo.globus.org
– Sign in with Globus
– Verify the consents
– Start My Server (this will take about a minute)
– Open folder: globus-jupyter-notebooks
– Run Platform_Introduction_JupyterHub_Auth.ipynb
• If you mess it up and want to “go back to the beginning”
– Just stop and restart the server
• If you want to use the notebook outside of our hub
– https://github.com/globus/globus-jupyter-notebooks
– Authentication is a manual cut and paste of exchanging the
authorization code for an access token – Native App
49
41. Developer References
• Globus API / SDK Documentation
– Transfer API : docs.globus.org/api/transfer/
– SDK: globus-sdk-python.readthedocs.io/en/stable/
• Globus GitHub: github.com/globus/
– Jupyter Notebooks
o Stand alone notebooks and hub integrations that walk through much of the
functionality of our SDK
o https://github.com/globus/globus-jupyter-notebooks
– Automation Examples
o Shell scripted CLI and Python module examples of common research data
management use cases
o https://github.com/globus/automation-examples