www.eudat.eu	
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
Linking EUDAT services with EGI
Three use cases
Michaela Barth caela@kth.se
Hans van Piggelen hans.vanpiggelen@surfsara.nl
This work is licensed under the Creative
Commons CC-BY 4.0 licence
Overall EUDAT WP7 objectives
  Ensure the interoperability of EUDAT with other public and private e-Infrastructures
  Provide European researchers and industries with seamless access to data and computing resources
  Pave the way towards the interoperability of e-Infrastructure tools and services beyond H2020
EUDAT Summer School, 3-7 July 2017, Crete
[Diagram: WP 7 (Interoperability) covers Joint Access to Data and HPC Services; Collaboration with Commercial Stakeholders; and Joint Access to Data, HTC and Cloud Computing Resources (WP 7 Task 7.2)]
Expected Benefits
  Lowered barriers: easier access for users to combine computation and data
  Reduced operational and administrative effort for the e-infrastructures
  Additional pooling effect: augmentation of services from existing communities using EGI/EUDAT
[Diagram: computation on EGI Federated Cloud and HTC, combined with EUDAT services for transfer, syncing, sharing, staging and preservation of data]
WP7 Task 7.2: Joint Access to Data, HTC
and Cloud Computing Resources
  EGI-EUDAT collaboration started in March 2016 and continues at least until March 2018
  Aiming at a production cross-infrastructure service
  Starting with concrete community pilots:
 EPOS
 ICOS
 IS-ENES (very recently)
Harmonization on all levels
  Technical
 Interoperability (e.g. workflow execution, data discoverability and provenance)
 Authentication, Authorization and Identity (AAI) management
 Combination of respective service catalogues
  Policies
 Access policy
 Long-term perspective
 Operational policies
  Operational
 Operational tools, technologies, best practices
 Security
 SLAs
EGI/EUDAT AAI Interoperability
  See the EGI and EUDAT services as offered by a
unique infrastructure once authenticated
  Breaking this down into smaller steps:
 Allowing users to access EGI and EUDAT web
services with the same credentials
 Allowing users to access EGI and EUDAT non-web
services with the same credentials
 Attributes harmonisation
 Enabling EGI services to delegate users' credentials to EUDAT services and vice versa
 Data privacy issues and
policy harmonisation
EGI/EUDAT AAI Interoperability
  Work done so far:
 Get an understanding of each other’s AAI layers
 Breaking the task down into smaller steps
 Started draft documentation
 Enabling accounts for some feasibility tests
  Next step:
 Complete roadmap
User community pilots
  EGI and EUDAT selected a set of relevant user
communities
  Started by gathering requirements from the user communities, together with their prioritization of those requirements:
 Definition of universal use case
 Integration activity has been driven by the end
users from the start!
  Identified user communities that are prominent European research infrastructures in the fields of Earth Science (EPOS and ICOS), Bioinformatics (BBMRI and ELIXIR) and Space Physics (EISCAT-3D)
Definition of the universal use case
  Demo at EGI Community Forum 2015
User community pilots
  EPOS:
 Fostering worldwide interoperability in Earth Sciences and providing services to a broad community of users
  ICOS:
 Creating a web-based service (“Footprint tool”) at the ICOS Carbon Portal, providing on-demand computing facilities
  IS-ENES (just recently):
 Performing on-demand climate data analytics for the climate research and climate change impact communities
European Plate Observing System (EPOS)
  EPOS is the integrated solid Earth Sciences research infrastructure
  Aims to be an effective, coordinated European-scale monitoring facility for solid Earth dynamics
  Aims to establish a long-term plan to facilitate the integrated use of data, models and facilities from existing and new distributed research infrastructures (RIs) for solid Earth science
EPOS data workflow plan
EPOS use case status
  First stage of the use case:
 Defining the best strategy for access to Federated Cloud partners and identifying secure and efficient data transfer protocols towards the iRODS system
  Now in the second stage:
 Integration with third-party data storage services (EUDAT) and cloud computing resources (EGI)
Integrated Carbon Observation System (ICOS)
“A pan-European research infrastructure for quantifying and understanding the greenhouse gas balance of the European continent”
•  Collect high-quality observational data relevant to the greenhouse gas budget of Europe
•  Make the ICOS data freely available to all interested parties
•  Promote the use of the ICOS data for further scientific study
•  Support modelling activities of the greenhouse gas fluxes in time and space
•  Support verification of the effectiveness of policies aiming to reduce greenhouse gas emissions
Slide thanks to Ute Karstens and Margareta Hellström
ICOS data and computational workflow
[Diagram, labels as extracted: atmospheric observations; emissions; meteorological driver fields; STILT Lagrangian transport model on the Federated Cloud; station footprints; GHG concentrations; ICOS Carbon Portal. Data volumes shown: ≈ 1 GB, ≈ 0.5-1 TB, ≈ 2-3 TB, ≈ 1-2 TB per year. Compute: ≈ 300 CPUs per footprint => 750 CPUh/station/year]
Slide thanks to Ute Karstens and Margareta Hellström
Footprint tool workflow
[Diagram, labels as extracted: ICOS CP account; User 1 and User 2, each via AAI; VM with web service and controller; VM with worker and model; VM with NFS holding model output (particle location, footprints, GHG conc.); VM with model input (meteo); datahub.egi.eu (OneData); ICOS data, model input and model output at PDC/KTH]
Slide thanks to Ute Karstens and Margareta Hellström
Footprint tool workflow
[Diagram, labels as extracted: same components as the previous slide, now with four worker VMs (each running the model) and further users (User 3, User 4, …) marked "on demand"]
Slide thanks to Ute Karstens and Margareta Hellström
ICOS Carbon Portal use case status
  Virtual machines with attached block storage instantiated in the EGI Federated Cloud. Docker container for computations with local VM storage
  Data transfer between VM and B2SAFE using B2STAGE instance at
PDC/KTH Stockholm
  Storing of ICOS data tested on the B2SAFE system at KTH
  Robot certificates installed to allow for further automation of the
workflow
Next steps:
  ICOS data replication in B2SAFE and access via B2STAGE
  Access to common storage for several VMs (via the EGI DataHub)
  Load balancing to distribute computations/user requests across several VMs
  Improve documentation, with a clear user perspective!
Slide thanks to Margareta Hellström
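The load-balancing next step listed above could, for instance, be handled by a least-loaded dispatcher in the controller. The following is a minimal Python sketch under stated assumptions: the worker names and the `Dispatcher` class are hypothetical, and the job count per worker VM is tracked in memory only. It is not the ICOS Carbon Portal implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    """Send each request to the worker VM that currently has the fewest jobs."""

    workers: list[str]  # hypothetical worker VM names
    active: dict[str, int] = field(init=False)

    def __post_init__(self) -> None:
        if not self.workers:
            raise ValueError("at least one worker VM is required")
        # Every worker starts with no running jobs.
        self.active = {name: 0 for name in self.workers}

    def assign(self) -> str:
        # min() keeps the first worker among ties, so assignment is deterministic.
        worker = min(self.workers, key=lambda name: self.active[name])
        self.active[worker] += 1
        return worker

    def finish(self, worker: str) -> None:
        # Called when a worker VM reports that a job has completed.
        if self.active[worker] > 0:
            self.active[worker] -= 1


dispatcher = Dispatcher(["worker-1", "worker-2"])
first = dispatcher.assign()
second = dispatcher.assign()
dispatcher.finish(second)
third = dispatcher.assign()
print(first, second, third)
```

Least-loaded assignment is only one possible policy; footprint runs of very different length might call for weighting by expected CPU time instead of job count.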
Infrastructure for the European Network for Earth System Modelling (IS-ENES)
  Spawned from work on EGI-EUDAT interoperability in
WP7/WP8 and the ICOS Carbon Portal use case
developed therein
  Goal: enabling computation on CMIP5/CMIP6 data
stored in the Earth System Grid Federation (ESGF)
infrastructure
  Calculations will be performed using the EUDAT Generic Execution Framework (GEF) workflow API, combined with EUDAT B2 services and the EGI FedCloud
  Results will be sent to the climate4impact.eu platform
IS-ENES use case overview
  Adaptation of the current ESGF environment at CINES, consisting of one ESGF data node and one B2SAFE node; no GEF yet
http://climate4impact.eu
Slide thanks to Christian Pagé and Xavier Pivan
Motivations
  Societal
  Provide climate projections data to climate change impact
researchers, facilitators, practitioners
  Ease access with better intuitive interfaces
  Provide more common data formats
  Generate tailored products from data processing workflows
Climate Research Community
Slide thanks to Christian Pagé and Xavier Pivan
IS-ENES: Current situation
  Data available for scientific analysis: a very large trend
  Limitations in data access mean limitations in data analytics and scientific results
  Download locally then analyze: a workflow that cannot be
sustained
  Climate researchers
  Impact researchers
Practical Example: Climate Community
[Diagram label: Federation Service]
•  Temperature at 850 hPa field (aggregated files, 30 levels)
•  10 climate models
•  1960-1990 & 2040-2070 = 60 years = 21 915 days
•  Daily fields = 1 field per day
•  Global spatial scale 100 km resolution
TOTAL: 6 574 500 fields to download (10 models × 21 915 days × 30 levels)
~100 KB per 2D field = 626 GB
After the analysis post-processing
•  Anomaly of the average of the two periods over a specific
country for each climate model
•  Result: ten 2D fields over a small domain
•  Estimated data size after post-processing: 1 MB
Data reduction...
Slide thanks to Christian Pagé and Xavier Pivan
Current situation
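The download volume in the practical example can be checked with a few lines of Python. This is a sketch under one assumption not stated on the slide: sizes are taken in binary units (1 KiB = 1024 bytes). The product of 10 models, 21 915 days and 30 levels gives 6 574 500 fields, which at about 100 KiB each comes to just under 627 GiB, in line with the slide's figure of 626.

```python
# Figures from the slide; binary units (1 KiB = 1024 bytes) are an assumption.
MODELS = 10
DAYS = 21_915        # two 30-year periods of daily fields
LEVELS = 30          # aggregated files carry all 30 levels, not just 850 hPa
FIELD_KIB = 100      # approximate size of one 2D field
RESULT_MIB = 1       # estimated size after post-processing


def download_volume() -> tuple[int, float]:
    """Return the number of 2D fields to download and their total size in GiB."""
    fields = MODELS * DAYS * LEVELS
    size_gib = fields * FIELD_KIB / 1024**2
    return fields, size_gib


fields, size_gib = download_volume()
# Ratio between what is downloaded and what the analysis actually produces.
reduction = size_gib * 1024 / RESULT_MIB

print(f"{fields:,} fields, {size_gib:.1f} GiB, reduction factor {reduction:,.0f}")
```

The reduction factor of several hundred thousand is the argument for moving the computation to the data instead of downloading first.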
IS-ENES use case plan
  Steps:
1.  Researcher finds data in B2SHARE using B2FIND, or provides PIDs/URLs
2.  Researcher performs data analytics on the selected data using a GEF backend deployed on the EGI FedCloud. Output is stored in an EGI volume.
3.  Results are sent back to B2SHARE/B2DROP/B2SAFE for the researcher to download, or used to run another GEF job for further calculations or to generate a figure
  So far: using the EGI FedCloud IaaS infrastructure (VMs with CPU, RAM and storage) and Docker engine; testing the dockerized jOCCI API for automatic instantiation of VMs
  Next steps:
  Data transfers via Globus GridFTP, getting test access on B2STAGE instances (KTH-PDC, CINECA?, STFC?)
  Follow ICOS example in testing interoperability B2SAFE-B2STAGE/EGI
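The three steps of the plan above (select inputs, run one or more calculations, hand back the results) can be pictured as a small pipeline. The Python sketch below is purely illustrative: `run_plan`, the stage functions, the file paths and the `b2drop:` label are hypothetical stand-ins, and no real GEF, B2SHARE, B2DROP or EGI API is called.

```python
from typing import Callable, Sequence

# A stage takes input locations (PIDs, URLs or volume paths) and returns
# the paths of the outputs it wrote to the shared volume.
Stage = Callable[[Sequence[str]], list[str]]


def run_plan(
    identifiers: Sequence[str],
    stages: Sequence[Stage],
    publish: Callable[[str], str],
) -> list[str]:
    """Chain calculation stages, then publish the final outputs."""
    if not identifiers:
        raise ValueError("at least one PID or URL is required")
    current = list(identifiers)          # step 1: data chosen by the researcher
    for stage in stages:                 # step 2: each stage feeds the next
        current = stage(current)
    return [publish(path) for path in current]  # step 3: results handed back


# Hypothetical stand-ins for two calculations and an upload.
def average(inputs: Sequence[str]) -> list[str]:
    return [f"/volume/avg_{len(inputs)}_inputs.nc"]


def plot(inputs: Sequence[str]) -> list[str]:
    return [path.replace(".nc", ".png") for path in inputs]


def publish(path: str) -> str:
    return "b2drop:" + path.rsplit("/", 1)[-1]


links = run_plan(["pid-a", "pid-b"], [average, plot], publish)
print(links)
```

Passing the stages as a list mirrors the option in step 3 of running a further calculation, or generating a figure, on the output of the previous one.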
[Diagram, labels as extracted: GEF user interface; send request / calculation order; data URL or PID; virtual machine; localhost; EGI Federated Cloud; GEF backend; deploy Docker container; execute calculation command; Docker volume; EGI volume; new data to B2SHARE/B2DROP and B2SAFE; B2STAGE, transfer with Globus. Legend: in progress, output, input, calculation, EUDAT service, data transfer, data]
Slide thanks to Christian Pagé and Xavier Pivan
Prototype overview: Deploying GEF execution
on EGI FedCloud
Challenges encountered so far
  When implementing prototypes of use-cases of EPOS
and ICOS:
  Scaling up
  Managing co-existing support systems and channels
  User-friendly documentation often missing or lacking
  Steep learning curve for the user communities
  3rd party dependencies, e.g. GridFTP
  Large numbers of small files not yet suitable as input
  On the plus side: personal contacts highly appreciated
Future continuation
  Continued implementation of use cases
with result evaluation
  Description of work and dataflow for the
use cases
  Improved documentation
  Final report (aim: end of Jan. 2018)
including recommendations for service
development and access policy
harmonization
Authors
Michaela Barth, KTH
Hans van Piggelen, SURFsara
Contributors
Christian Pagé, CERFACS
Xavier Pivan, CERFACS
Margareta Hellström, LU
Ute Karstens, LU
Thank you!
