High-level presentation of the data modeler's job & available tools
INTRODUCTION
Objectives of Euclid Datamodel 101

Slidecast dedicated to Euclid data modelers & developers
Help you understand what is expected and how to do it

Released as multiple episodes over time
1st episode: high-level overview of tools and process
2nd episode: the TIPS example
Following episodes: zoom on technical points
INTRODUCTION
Objectives and contents of this presentation

Get an overview of the data modeling process
Understand the data model workflow

Know where to find information
Know what tools are available

No complex details or technical information here…
… but high-level information and pointers in the right direction.
SUMMARY
The data modeling process

1 - Understand Euclid DataModel
Why use a Euclid DataModel? Why choose XML? What is XML
Schema? Which Euclid-specific XML rules shall my schema
comply with? How is the DataModel SVN repository structured? How are
XML namespaces structured?

2 - Create your own DataModel
What should my DataModel contain? What software can I use to write
XML? How can I check if my DataModel is correct?

3 - Use the DataModel in your own code
How can I use the data model in my code? How can I use XML data
bindings? Can I get pre-configured tools all at once?

Why use a Euclid DataModel?
The Euclid mission relies heavily on data transfer and manipulation
Data consistency between OUs, workflows, pipelines and storage
is a key point

Your DataModel will be:
- used to structure the EAS db
- manipulated by your pipeline code

Your data products will be:
- stored on EAS
- queryable from EAS
- transmitted to/from pipelines by IAL

[Diagram: DataModel, EAS, IAL, SDC and pipeline code; compliant data products in/out; design time vs. run time]
Why choose XML?
The XML language brings many benefits:
Easy to read and understand by humans and machines
<coord>
<x>12.05</x>
<y>3.1</y>
</coord>

Many tools available to create, control and check XML
Strong type/namespace control and definition
Widely used and supported across the world
Self-contained: expresses both data and data structure

XML was chosen over many other alternatives
Find information
- W3Schools tutorials: http://www.w3schools.com/schema/
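
To illustrate the "easy to read by machines" point, here is a minimal sketch that reads the `<coord>` fragment from this slide with Python's standard `xml.etree.ElementTree` module. The `read_coord` helper and the `COORD_XML` constant are names introduced for this example only.

```python
import xml.etree.ElementTree as ET

# The <coord> fragment from the slide, held in a string for the example.
COORD_XML = "<coord><x>12.05</x><y>3.1</y></coord>"


def read_coord(xml_text):
    """Parse a <coord> document and return its (x, y) values as floats."""
    root = ET.fromstring(xml_text)
    return float(root.findtext("x")), float(root.findtext("y"))


x, y = read_coord(COORD_XML)
print(x, y)
```

The element names carry the meaning of each value, so the same text stays readable for a person and needs only a few lines to be read by a program.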
What is XML Schema?
Two file formats you should be familiar with:

XML: contains the actual data

<coord>
  <x>12.05</x>
  <y>3.1</y>
</coord>

complies with

XSD (XML Schema): describes the data structure

<xs:element name="coord">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="x" type="xs:float" />
      <xs:element name="y" type="xs:float" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

Find information
- W3Schools tutorials: http://www.w3schools.com/schema/
- Highlights on XML/XSD (DM Workshop): http://euclid.roe.ac.uk/attachments/download/2744/Workshop_Nov2013_XSD_XML%20-%202.02.ppt
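
The "complies with" relation between the two files can be checked by program. The sketch below assumes the third-party `lxml` package is installed and wraps the slide's `coord` declaration in the `xs:schema` root element that a complete schema file needs. The `complies` helper and the `GOOD` / `BAD` documents are illustrative names, not part of any Euclid tool.

```python
from lxml import etree

# The slide's <coord> declaration, wrapped in the xs:schema root element
# (with the XML Schema namespace) that a complete .xsd file requires.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="coord">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="x" type="xs:float"/>
        <xs:element name="y" type="xs:float"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))


def complies(xml_bytes):
    """Return True if the XML document is valid against the schema."""
    return schema.validate(etree.fromstring(xml_bytes))


GOOD = b"<coord><x>12.05</x><y>3.1</y></coord>"
BAD = b"<coord><x>twelve</x><y>3.1</y></coord>"  # x is not an xs:float

print(complies(GOOD), complies(BAD))
```

Both documents are well formed, but only the first one respects the `xs:float` type declared for `x`.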
What are the Euclid-specific XML rules my schema shall comply with?
Need for a fully consistent DataModel: everybody should follow the same rules

Among existing rules:
- XML Schema file name
- XML file name
- Single root element
- Element identifier name
- Numeric type restriction
- Recursive definitions
- Target namespaces
- Encoding
- Unqualified namespaces
- …

Rules are still in development: feedback is welcome and changes might be required
Find information
- Official Euclid XML rules: http://euclid.roe.ac.uk/dmsf/eucrma?folder_id=47
- DM Workshop presentation: http://euclid.roe.ac.uk/attachments/download/2762/DM-Rules.pdf

How is the DataModel SVN repo structured?
Classic SVN structure
- trunk: latest stable work
- branches: parallel development of specific features
- tags: official releases

Dictionary and Interfaces for your products
- Dictionary: definition of the complexTypes and
elements of your product's entire DataModel
- Interfaces: definition of the data exchanged between
components. Only one root element per type, which you can
see as a variable to access a product.

EC/SGS/ST/4-2-05-DM/schema

Find information
- DMWorkshop svn presentation: http://euclid.roe.ac.uk/projects/eucrma/wiki/20131411DMWSconf
- Dictionary of types: https://apceucliddev.in2p3.fr/jenkins/job/Dictionary/ws/eXist/dictionary.html
- Configuration management & best practices: http://euclid.roe.ac.uk/projects/eucrma/wiki#Configuration-management
How are XML namespaces structured?
Under Dictionary and Interfaces, 4 top-level namespaces
- bas: common definitions shared by everyone
- ins: instrument-specific definitions
- pro: OU-specific definitions
- sys: system-specific definitions (storage, processing…)

/pro sub-levels
- one directory per OU
- one responsible custodian per directory

EC/SGS/ST/4-2-05-DM/schema

Find information
- DMWorkshop svn presentation: http://euclid.roe.ac.uk/projects/eucrma/wiki/20131411DMWSconf
- Dictionary of types: https://apceucliddev.in2p3.fr/jenkins/job/Dictionary/ws/eXist/dictionary.html
- Configuration management & best practices: http://euclid.roe.ac.uk/projects/eucrma/wiki#Configuration-management
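
As a rough illustration of how namespace prefixes such as `bas` and `pro` show up when reading a document, the sketch below uses the standard `xml.etree.ElementTree` module. The namespace URIs, the `product` element and the nesting are placeholders invented for this example: the slides do not give the real Euclid URIs or element names.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs: the real Euclid namespace URIs are not given in the slides.
NS = {
    "bas": "http://example.org/dm/bas",
    "pro": "http://example.org/dm/pro",
}

# Hypothetical product that reuses a common "bas" definition.
DOC = """<pro:product xmlns:pro="http://example.org/dm/pro"
             xmlns:bas="http://example.org/dm/bas">
  <bas:coord><bas:x>12.05</bas:x><bas:y>3.1</bas:y></bas:coord>
</pro:product>"""

root = ET.fromstring(DOC)

# Prefixes in the path are resolved through the NS mapping, not through
# the prefixes used inside the document itself.
x = root.findtext("bas:coord/bas:x", namespaces=NS)
print(root.tag, x)
```

What identifies an element is the namespace URI plus the local name, so code keeps working even if a document chooses different prefixes for the same URIs.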
What should my DataModel contain?
Your DataModel should contain (must have):
- definitions of pipeline inputs
- definitions of output products
- definitions of intermediate elements used in your code

Your DataModel can use:
- new elements you define
- already existing elements
- dataContainers for files with no specific definition

<sgs:dataContainer>
• ID
• Filename
• StorageNode
• Path

Find information
- Fits DataModel (see dictionary and interfaces): schema/trunk/Dictionary/pro/sim/euc-test-ousim-tips.xsd
- DM Workshop DataContainer presentation: http://euclid.roe.ac.uk/attachments/download/2765
- DM wiki homepage: http://euclid.roe.ac.uk/projects/eucrma/wiki
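
To give a feel for what a dataContainer could look like once written out, the sketch below builds one with the standard `xml.etree.ElementTree` module. Only the names `dataContainer`, `ID`, `Filename`, `StorageNode` and `Path` come from the slide. The `sgs` namespace URI, the choice of child elements rather than attributes, and the sample values are assumptions made for this example, so check the real schema before relying on this layout.

```python
import xml.etree.ElementTree as ET

# Placeholder URI: the real "sgs" namespace URI is not given in the slides.
SGS_NS = "http://example.org/sgs"
ET.register_namespace("sgs", SGS_NS)


def make_data_container(file_id, filename, storage_node, path):
    """Build a dataContainer element holding the four fields from the slide.

    Assumption: each field is a plain child element. The actual Euclid
    schema may model them differently.
    """
    container = ET.Element(f"{{{SGS_NS}}}dataContainer")
    fields = (
        ("ID", file_id),
        ("Filename", filename),
        ("StorageNode", storage_node),
        ("Path", path),
    )
    for tag, value in fields:
        ET.SubElement(container, tag).text = value
    return container


# Sample values are made up for the example.
container = make_data_container("42", "image.fits", "node-a", "/data/raw")
print(ET.tostring(container, encoding="unicode"))
```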
What software can I use to write XML?
Of course, any text editor lets you read and write XML
One of these two powerful XML development environments is recommended:
- Altova XMLSpy (license from 400€)
- Oxygen XML Editor (license from 99$ - 30 days free trial)

Main features:
- Project-oriented browsing that handles dependencies between files
- Content completion for elements, attributes & values
- XML validation and detection of errors
- Schema modeling with graph representation

Find information
- Altova XMLSpy: http://www.altova.com/xmlspy.html
- Oxygen XML Editor: http://www.oxygenxml.com/

How can I check if my DataModel is correct?
Use Oxygen or XMLSpy to validate your XML and XML Schema files
- Well-formed XML: correct language syntax
- Document validation: the XML conforms to its XML Schema definition

Use the Euclid Data Model Checker tools
- Check compliance with Euclid DataModel rules
- Python module & scripts available in Euclid SVN
(EC/SGS/ST/4-2-05-DM/tools/trunk/DataModelChecker)

Find information
- Altova XMLSpy: http://www.altova.com/xmlspy.html
- Oxygen XML Editor: http://www.oxygenxml.com/
- Official Euclid XML rules: http://euclid.roe.ac.uk/dmsf/eucrma?folder_id=47
- DM Workshop presentation: http://euclid.roe.ac.uk/attachments/download/2762/DM-Rules.pdf
- DataModelChecker readme (SVN): EC/SGS/ST/4-2-05-DM/tools/trunk/DataModelChecker/doc
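
The two kinds of check on this slide can be pictured with a small sketch based on the standard `xml.etree.ElementTree` module. It is not the Euclid DataModelChecker and does not reproduce its interface: `is_well_formed` and `count_root_elements` are hypothetical helpers, and counting global `xs:element` declarations is only one plausible reading of a "single root element" rule.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Small schema used for the example: one global element, two nested ones.
ONE_ROOT_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="coord">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="x" type="xs:float"/>
        <xs:element name="y" type="xs:float"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def is_well_formed(xml_text):
    """Syntax check only: True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def count_root_elements(xsd_text):
    """Count global xs:element declarations (direct children of xs:schema)."""
    schema = ET.fromstring(xsd_text)
    return len(schema.findall(f"{XS}element"))


print(is_well_formed("<coord><x>1</x></coord>"))   # tags are balanced
print(is_well_formed("<coord><x>1</coord>"))       # <x> is never closed
print(count_root_elements(ONE_ROOT_XSD))
```

Well-formedness says nothing about the schema. Validation against the XSD and project-specific rule checks are separate steps on top of it.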
How can I use the DataModel in my code?
In your pipeline code, you might want to:
- Read and modify existing XML files
- Produce new XML files
- Manipulate data as specified in the DataModel (no XML file)

Multiple ways to do that:
- Manually parse XML files (must be avoided)
- Use XPath and XML libraries (Python lxml)
- Use bindings generation (preferred way)

Bindings:
- XML Schema elements become class definitions
- An XML product becomes an object instance

[Diagram: pipeline code uses bindings generated from the Data Model; XML files in/out]

Find information
- XML data bindings resources: http://www.rpbourret.com/xml/XMLDataBinding.htm
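
For comparison with bindings, here is a minimal sketch of the XPath route applied to reading and modifying a document. The slide names lxml. This example instead uses the standard `xml.etree.ElementTree` module, which supports only a limited XPath subset but needs no extra installation. The `catalog` document and its `id` attributes are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document with two <coord> entries.
CATALOG = """<catalog>
  <coord id="a"><x>12.05</x><y>3.1</y></coord>
  <coord id="b"><x>1.5</x><y>2.5</y></coord>
</catalog>"""

root = ET.fromstring(CATALOG)

# Read: select one <coord> by attribute value with an XPath predicate.
coord_b = root.find("./coord[@id='b']")

# Modify: overwrite its <x> value in place.
coord_b.find("x").text = "9.75"

# Read again: collect every x value as a float.
xs = [float(element.text) for element in root.findall("./coord/x")]

# Produce: serialize the modified tree back to XML text.
updated = ET.tostring(root, encoding="unicode")
print(xs)
```

Paths are plain strings here, so a renamed element in the schema only shows up as a failure at run time. Generated bindings turn the same mistake into a missing attribute on a class.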
How do I use XML data bindings?
Two XML binding libraries available for Euclid
- For Python, based on PyXB
- For C++, based on CodeSynthesis XSD

First step: generate classes from the DataModel

[Diagram: the DataModel XML Schema (.xsd files) is turned into C++ classes (.hxx & .cxx) and Python classes (.py) by the generateStubs.py and generate_allbindings.sh scripts]

Second step: use generated classes in your own code
- Create and access elements as you would with ordinary classes/objects
Find information
- Python Bindings library: (SVN)/EC/SGS/ST/4-2-05-DM/tools/trunk/PythonBinding
- C++ Bindings library: (SVN)/EC/SGS/ST/4-2-05-DM/tools/trunk/CppBinding
- DMWorkshop Python bindings presentation: http://euclid.roe.ac.uk/attachments/download/2734
- DMWorkshop C++ bindings presentation: http://euclid.roe.ac.uk/attachments/download/2745 & http://euclid.roe.ac.uk/attachments/download/2773
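
The idea behind the second step can be sketched without any generator. The class below is written by hand to stand in for what a binding tool would produce from the `coord` schema shown earlier. It is not PyXB or CodeSynthesis XSD output and does not follow their APIs: `Coord`, `from_xml` and `to_xml` are names chosen for this example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Coord:
    """Hand-written stand-in for a class generated from the coord element."""

    x: float
    y: float

    @classmethod
    def from_xml(cls, xml_text):
        """Build an object instance from a <coord> XML document."""
        root = ET.fromstring(xml_text)
        if root.tag != "coord":
            raise ValueError(f"expected <coord>, got <{root.tag}>")
        return cls(x=float(root.findtext("x")), y=float(root.findtext("y")))

    def to_xml(self):
        """Serialize the object back to a <coord> XML document."""
        root = ET.Element("coord")
        ET.SubElement(root, "x").text = repr(self.x)
        ET.SubElement(root, "y").text = repr(self.y)
        return ET.tostring(root, encoding="unicode")


# The XML product becomes an object; fields are ordinary typed attributes.
point = Coord.from_xml("<coord><x>12.05</x><y>3.1</y></coord>")
point.y = 4.0
print(point.to_xml())
```

Pipeline code then works with `point.x` and `point.y` and never touches tag names directly, which is the main appeal of bindings over manual parsing.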
Can I get pre-configured tools all at once?
We are building a virtual machine you can use on your own computer
- Based on Scientific Linux 6 (OS supported for Euclid)
- Linked to the Euclid CODEEN yum repository for package installation
- Linked to Euclid SVN for source code checkin/checkout
- Containing:
  - Required software libraries
  - Pre-configured development environment
  - C++ & Python bindings generation libraries
  - Data Model Checker tools
  - … and more

Still in development, hopefully available soon

Find information
- CODEEN yum packages list: https://apceuclidrepo.in2p3.fr/nexus/content/groups/el6.euclid/
- Virtualbox virtualization tool: https://www.virtualbox.org/
- VMWare virtualization tool: http://www.vmware.com/fr/products/player/

In the next episode…

The TIPS DataModel, from its creation to the pipeline code

Stay tuned!

7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
 
The Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU CampusesThe Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU Campuses
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
 
Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024
 
The Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive ComputingThe Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive Computing
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
20240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 202420240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 2024
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
 
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
 
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
 

Euclid Data Model 101 - Episode 01: Overview

  • 1. High-level presentation of the data modeler’s job & available tools
  • 2. INTRODUCTION: Objectives of Euclid Datamodel 101. Slidecast dedicated to Euclid data modelers & developers, to help you understand what is expected and how to do it. Released as multiple episodes over time. 1st episode: high-level overview of tools and process. 2nd episode: the TIPS example. Following episodes: zoom on technical points.
  • 3. INTRODUCTION: Objectives and contents of this presentation. Get an overview of the data modeling process; understand the data model workflow; know where to find information; know what tools are available. No complex details or technical information here, but high-level information and pointers in the right direction.
  • 4. SUMMARY: The data modeling process. 1 - Understand the Euclid DataModel: Why use a Euclid DataModel? Why choose XML? What is XML Schema? What are the Euclid-specific XML rules my schema shall comply with? How is the DataModel SVN repository structured? How are XML namespaces structured? 2 - Create your own DataModel: What should my DataModel contain? What software can I use to write XML? How can I check if my DataModel is correct? 3 - Use the DataModel in your own code: How can I use the DataModel in my code? How can I use XML data bindings? Can I get pre-configured tools all at once?
  • 5. Why use a Euclid DataModel? The Euclid mission relies heavily on data transfer and manipulation. Data consistency between OUs, workflows, pipelines and storage is a key point. DESIGN TIME: your DataModel will be used to structure the EAS database and manipulated by your pipeline code. RUN TIME: your data products, compliant with the DataModel, will be stored on EAS, queryable from EAS, and transmitted to/from pipelines by the IAL. [Diagram: DataModel, EAS, IAL, SDC and pipeline code, with compliant data products going in and out.]
  • 6. Why choose XML? The XML language brings many benefits. Easy to read and understand by humans and machines: <coord> <x>12.05</x> <y>3.1</y> </coord>. Many tools available to create, control and check XML. Strong type/namespace control and definition. Widely used and supported across the world. Self-contained: expresses both data and data structure. XML was chosen over many other alternatives. Find information - W3Schools tutorials: http://www.w3schools.com/schema/
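As a small illustration of the "easy to read by machines" point on slide 6, the `<coord>` snippet can be parsed with a few lines of Python. This is only a sketch using the standard library; the function name `read_coord` is made up for the example and is not part of any Euclid tool.

```python
import xml.etree.ElementTree as ET

# The <coord> snippet from slide 6.
COORD_XML = "<coord> <x>12.05</x> <y>3.1</y> </coord>"


def read_coord(xml_text):
    """Parse a <coord> document and return its (x, y) values as floats.

    Hypothetical helper written for this example only.
    """
    root = ET.fromstring(xml_text)
    return float(root.findtext("x")), float(root.findtext("y"))


x, y = read_coord(COORD_XML)
print("x =", x, "y =", y)
```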
  • 7. What is XML Schema? Two file formats you should be familiar with. XSD (XML Schema) describes the data structure: <xs:element name="coord"> <xs:complexType> <xs:sequence> <xs:element name="x" type="xs:float" /> <xs:element name="y" type="xs:float" /> </xs:sequence> </xs:complexType> </xs:element>. XML contains the actual data and complies with the XSD: <coord> <x>12.05</x> <y>3.1</y> </coord>. Find information - W3Schools tutorials: http://www.w3schools.com/schema/ - Highlights on XML/XSD (DM Workshop): http://euclid.roe.ac.uk/attachments/download/2744/Workshop_Nov2013_XSD_XML%20-%202.02.ppt
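To make the "complies with" relation on slide 7 concrete, here is a toy check written with the Python standard library only. It is not a real XSD validator: it only understands the shape used on the slide (a root element with a sequence of `xs:float` children), and Python's `float()` is more permissive than the `xs:float` lexical rules. The `<xs:schema>` wrapper is an assumption added so that the fragment is a complete schema document, and the function names are hypothetical. Real validation should be done with a proper validator such as the XML editors mentioned in this presentation.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Schema fragment from slide 7, wrapped in <xs:schema> (wrapper is an assumption).
COORD_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="coord">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="x" type="xs:float" />
        <xs:element name="y" type="xs:float" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def expected_children(xsd_text, root_name):
    """Return [(child name, child type), ...] declared for a root element."""
    schema = ET.fromstring(xsd_text)
    for element in schema.findall(XS + "element"):
        if element.get("name") == root_name:
            sequence = element.find(XS + "complexType/" + XS + "sequence")
            if sequence is None:
                return []
            return [(c.get("name"), c.get("type"))
                    for c in sequence.findall(XS + "element")]
    raise KeyError(root_name)


def complies(xml_text, xsd_text):
    """Toy check: same child names in the same order, floats parse as floats."""
    document = ET.fromstring(xml_text)
    try:
        expected = expected_children(xsd_text, document.tag)
    except KeyError:
        return False  # root element is not declared in the schema
    children = list(document)
    if [c.tag for c in children] != [name for name, _ in expected]:
        return False
    for child, (_, type_name) in zip(children, expected):
        if type_name == "xs:float":
            try:
                float(child.text)
            except (TypeError, ValueError):
                return False
    return True


print(complies("<coord> <x>12.05</x> <y>3.1</y> </coord>", COORD_XSD))
```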
  • 8. What are the Euclid-specific XML rules my schema shall comply with? A fully consistent DataModel is needed: everybody should follow the same rules. Among existing rules: XML Schema file name; XML file name; single root element; element identifier name; numeric type restriction; recursive definitions; target namespaces; encoding; unqualified namespaces; … Rules are still in development, feedback is welcome and changes might be required. Find information - Official Euclid XML rules: http://euclid.roe.ac.uk/dmsf/eucrma?folder_id=47 - DM Workshop presentation: http://euclid.roe.ac.uk/attachments/download/2762/DM-Rules.pdf
  • 9. How is the DataModel SVN repo structured? Repository path: EC/SGS/ST/4-2-05-DM/schema. Classic SVN structure - trunk: latest stable work - branches: parallel development of specific features - tags: official releases. Dictionary and Interfaces for your products - Dictionary: definition of the complexTypes and elements of your product (entire DataModel) - Interfaces: definition of the data exchanged between components. One root element only per type, which you can see as a variable to access a product. Find information - DMWorkshop svn presentation: http://euclid.roe.ac.uk/projects/eucrma/wiki/20131411DMWSconf - Dictionary of types: https://apceucliddev.in2p3.fr/jenkins/job/Dictionary/ws/eXist/dictionary.html - Configuration management & best practices: http://euclid.roe.ac.uk/projects/eucrma/wiki#Configuration-management
  • 10. How are XML namespaces structured? Repository path: EC/SGS/ST/4-2-05-DM/schema. Under Dictionary and Interfaces, 4 top-level namespaces - bas: common definitions shared by everyone - ins: instrument-specific definitions - pro: OU-specific definitions - sys: system-specific definitions (storage, processing…). /pro sub-levels - one directory per OU - one responsible custodian per directory. Find information - DMWorkshop svn presentation: http://euclid.roe.ac.uk/projects/eucrma/wiki/20131411DMWSconf - Dictionary of types: https://apceucliddev.in2p3.fr/jenkins/job/Dictionary/ws/eXist/dictionary.html - Configuration management & best practices: http://euclid.roe.ac.uk/projects/eucrma/wiki#Configuration-management
  • 11. What should my DataModel contain? Your DataModel should contain (must have): definitions of pipeline inputs; definitions of output products; definitions of intermediate elements used in your code. Your DataModel can use: new elements you define; already existing elements; dataContainers for files with no specific definition. An <sgs:dataContainer> holds: ID, Filename, StorageNode, Path. Find information - Fits DataModel (see dictionary and interfaces): schema/trunk/Dictionary/pro/sim/euc-test-ousim-tips.xsd - DM Workshop DataContainer presentation: http://euclid.roe.ac.uk/attachments/download/2765 - DM wiki homepage: http://euclid.roe.ac.uk/projects/eucrma/wiki
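The dataContainer on slide 11 lists four pieces of information (ID, Filename, StorageNode, Path). The sketch below builds such an element in Python to show the idea. Everything beyond the four names is an assumption: the namespace URI is a placeholder, the slide does not say whether the four items are child elements or attributes (child elements are assumed here), and the sample values are invented. The real definition is in the Euclid Dictionary.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: the real "sgs" namespace is not given on the slide.
SGS_NS = "urn:example:sgs"
ET.register_namespace("sgs", SGS_NS)


def make_data_container(container_id, filename, storage_node, path):
    """Build a dataContainer element with the four fields named on the slide.

    Hypothetical layout: the four fields are modelled as unqualified child
    elements, which may differ from the actual Euclid schema.
    """
    container = ET.Element("{%s}dataContainer" % SGS_NS)
    fields = (
        ("ID", container_id),
        ("Filename", filename),
        ("StorageNode", storage_node),
        ("Path", path),
    )
    for tag, value in fields:
        ET.SubElement(container, tag).text = value
    return container


# Invented sample values, for illustration only.
example = make_data_container("dc-001", "image.fits", "node-a", "/data/products")
print(ET.tostring(example, encoding="unicode"))
```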
  • 12. What software can I use to write XML? Of course, any text editor allows you to simply read and write XML. One of these two powerful XML development environments is recommended - Altova XMLSpy (license from 400€) - Oxygen XML Editor (license from 99$, 30-day free trial). They offer: project-oriented browsing that handles dependencies between files; content completion for elements, attributes & values; XML validation and detection of errors; schema modeling with graph representation. Find information - Altova XMLSpy: http://www.altova.com/xmlspy.html - Oxygen XML Editor: http://www.oxygenxml.com/
  • 13. How can I check if my DataModel is correct? Use Oxygen or XMLSpy to validate your XML and XML Schema files. Well-formed XML: correct language syntax. Document validation: the XML conforms to the XML Schema definition. Use the Euclid Data Model Checker tools to check compliance with the Euclid DataModel rules: Python module & scripts available in Euclid SVN (EC/SGS/ST/4-2-05-DM/tools/trunk/DataModelChecker). Find information - Altova XMLSpy: http://www.altova.com/xmlspy.html - Oxygen XML Editor: http://www.oxygenxml.com/ - Official Euclid XML rules: http://euclid.roe.ac.uk/dmsf/eucrma?folder_id=47 - DM Workshop presentation: http://euclid.roe.ac.uk/attachments/download/2762/DM-Rules.pdf - DataModelChecker readme (SVN): EC/SGS/ST/4-2-05-DM/tools/trunk/DataModelChecker/doc
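Slide 13 distinguishes well-formed XML (correct syntax) from valid XML (conforms to a schema). The first of these can be checked with the Python standard library alone, as sketched below. The helper name `is_well_formed` is made up for this example. Schema validation and the Euclid rule checks need the dedicated tools named on the slide, whose interfaces are not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on a syntax error.

    This says nothing about validity against an XML Schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


good = "<coord> <x>12.05</x> <y>3.1</y> </coord>"
bad = "<coord> <x>12.05</y> </coord>"  # mismatched closing tag

print(is_well_formed(good), is_well_formed(bad))
```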
  • 14. How can I use the DataModel in my code? In your pipeline code, you might want to: read and modify existing XML files; produce new XML files; manipulate data as specified in the DataModel (no XML file). Multiple ways to do that: manually parse XML files (must be avoided); use XPath and XML libraries (Python lxml); use bindings generation (preferred way). Bindings: XML Schema elements become class definitions, and an XML product becomes an object instance. [Diagram: pipeline code uses the DataModel; XML products go in and out.] Find information - XML data bindings resources: http://www.rpbourret.com/xml/XMLDataBinding.htm
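The XPath route mentioned on slide 14 can be sketched as follows: read an existing document, modify one value, and produce a new document. The slide names lxml. To stay dependency-free, this sketch uses the standard library's `xml.etree.ElementTree` instead, whose `find()` supports only a limited XPath subset (lxml offers full XPath through its `xpath()` method). The function `shift_x` is a hypothetical example, not Euclid code.

```python
import xml.etree.ElementTree as ET


def shift_x(xml_text, delta):
    """Read a <coord> document, add delta to <x>, return the new XML text."""
    root = ET.fromstring(xml_text)
    x_element = root.find("./x")  # simple XPath-style location path
    if x_element is None:
        raise ValueError("document has no <x> element")
    x_element.text = str(float(x_element.text) + delta)
    return ET.tostring(root, encoding="unicode")


original = "<coord><x>12.05</x><y>3.1</y></coord>"
print(shift_x(original, 1.0))
```

Note that this code has to know element names as plain strings, so a typo is only caught at run time. Generated bindings avoid this, which is one reason the slide marks them as the preferred way.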
  • 15. How do I use XML data bindings? Two XML binding libraries are available for Euclid: for Python, based on PyXB; for C++, based on CodeSynthesis XSD. First step: generate classes from the DataModel. [Diagram: DataModel XML Schema (.xsd files), generateStubs.py and generate_allbindings.sh, producing C++ classes (.hxx & .cxx) and Python classes (.py).] Second step: use the generated classes in your own code. Create and access elements as you would with usual classes/objects. Find information - Python Bindings library: (SVN)/EC/SGS/ST/4-2-05-DM/tools/trunk/PythonBinding - C++ Bindings library: (SVN)/EC/SGS/ST/4-2-05-DM/tools/trunk/CppBinding - DMWorkshop Python bindings presentation: http://euclid.roe.ac.uk/attachments/download/2734 - DMWorkshop C++ bindings presentation: http://euclid.roe.ac.uk/attachments/download/2745 & http://euclid.roe.ac.uk/attachments/download/2773
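To give a feel for what "use generated classes as usual objects" on slide 15 means, here is a hand-written stand-in for a binding class for the `<coord>` example. It is not output from PyXB or from the Euclid generation scripts, and real generated classes will look different. The class name `Coord` and its methods are assumptions made for illustration.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Coord:
    """Hand-written stand-in for a class that a bindings generator would emit."""

    x: float
    y: float

    @classmethod
    def from_xml(cls, xml_text):
        """Turn an XML product into an object instance."""
        root = ET.fromstring(xml_text)
        if root.tag != "coord":
            raise ValueError("expected a <coord> root element")
        return cls(x=float(root.findtext("x")), y=float(root.findtext("y")))

    def to_xml(self):
        """Serialize the object back to XML text."""
        root = ET.Element("coord")
        ET.SubElement(root, "x").text = repr(self.x)
        ET.SubElement(root, "y").text = repr(self.y)
        return ET.tostring(root, encoding="unicode")


# Elements are accessed as ordinary attributes, with no XML handling in sight.
coord = Coord.from_xml("<coord> <x>12.05</x> <y>3.1</y> </coord>")
coord.x = 1.5
print(coord.to_xml())
```

With this style the pipeline code manipulates typed attributes (`coord.x`) and leaves parsing and serialization to the binding layer.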
  • 16. Can I get pre-configured tools all at once? We are building a virtual machine you can use on your own computer. Based on Scientific Linux 6 (OS supported for Euclid). Linked to the Euclid CODEEN yum repository for package installation. Linked to Euclid SVN for source code checkin/checkout. Containing: required software libraries; pre-configured development environment; C++ & Python bindings generation libraries; Data Model Checker tools; … and more. Still in development, hopefully available soon. Find information - CODEEN yum packages list: https://apceuclidrepo.in2p3.fr/nexus/content/groups/el6.euclid/ - Virtualbox virtualization tool: https://www.virtualbox.org/ - VMWare virtualization tool: http://www.vmware.com/fr/products/player/
  • 17. In the next episode… TIPS DataModel, from its creation to the pipeline code. Stay tuned!