The document discusses extending the Chemical Tagger natural language processing tool to climate science texts by incorporating climate science controlled vocabularies. It describes adapting the Chemical Tagger, originally designed for chemistry texts, to process abstracts from climate science journals by modifying the tagger's dictionaries and grammars. It also discusses web forms and a CIM document viewer, which generate and display output in the CIM XML format. The aim is to highlight climate science terms in texts so that they can be mapped to a controlled vocabulary and used to populate the CIM framework more fully.
Cpascoe pimms or2012_
1. The PIMMS project and Natural Language
Processing for Climate Science
Extending the Chemical Tagger natural language processing tool with
climate science controlled vocabularies
Charlotte Pascoe, Hannah Barjat
Peter Murray-Rust and Gerry Devine
June 9th 2012, Open Repositories 2012
3. Common Information Model
(Diagram of the CIM packages; the labels read as follows.)
• Data: We can talk about DataObjects collected together in any number of ways, stored in a particular medium.
• Software: We can talk about hierarchical ModelComponents with ModelProperties, some of which can be coupled together.
• Activity: We can talk about Simulations run in support of Experiments. Experiments consist of Requirements; Simulations conform to Requirements.
• Grids: We can define a GridSpec or some other geometry.
• Quality: We can record the quality of things.
• Shared: Some concepts are shared.
• ISO: We reuse various ISO classes.
• A particular Activity uses a particular SoftwareComponent.
9. Chemical Tagger
http://chemicaltagger.ch.cam.ac.uk/
ChemicalTagger is an open-source tool that uses OSCAR4 and NLP techniques for tagging and
parsing experimental sections in the chemistry literature.
10. Chemical Tagger
https://bitbucket.org/wwmm/chemicaltagger & https://bitbucket.org/wwmm/acpgeo
• Java project developed by the Peter Murray-Rust group, Cambridge. Online demo: http://chemicaltagger.ch.cam.ac.uk/
• Adapted for use with ACP Abstracts (Lezan Hawizy and Hannah Barjat).
– Modification by use of dictionaries and changes to grammar.
– First use case outside of laboratory chemistry.
– Still with a significant chemistry component.
– Wider physical science.
• Open Source NLP tool for processing chemical text
• Combines Chemical Entity Recognition (OSCAR) with NLP techniques
• Extendible and Reconfigurable Taggers and Parsers generated using ANTLR (ANother Tool for Language Recognition)
11. Chemical Tagger & PIMMS
• To extend Chemical Tagger to be more suited to climate modelling.
– Specifically:
• Palaeoclimate modelling, and how the process of text mining might differ from the development of a controlled vocabulary.
• Highlighting of text for comparison with CIM documents.
• Initially only using XML Abstracts e.g. from EGU’s
Geoscientific Model Development and Climate of the Past.
– Brief look at PDF to Text.
12. Palaeoclimate Language
• Time periods and climatic events
– Includes named Ages, Epochs, Eras etc. [including all those in a mind map produced for the PIMMS project at Bristol].
– Context of proper nouns, e.g. with words such as ‘period’, ‘era’, ‘epoch’.
– Numbers with appropriate units, e.g. Mya, yr BP.
– Likely date numbers, e.g. 1750 AD.
– Acronyms: known ones, e.g. ‘LGM’ [ACRONYMS in context have not been investigated].
– Related adjectives, e.g. seasonal, decadal, glacial, interglacial, stadial, interstadial, maximum, minimum, where used as proper nouns.
• Palaeoclimate Models
– Can guess model names from context
• e.g. proper noun or acronym followed by model
• e.g. reconstruction / simulation with XXX
– Can develop/use glossary of model names.
• Palaeoclimate Acronyms
– Time periods and models.
– Theories, techniques, physical and chemical parameters?
– Can develop/use glossary of acronyms – problem area: often not unique even
within subject.
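The cues listed on this slide (numbers with time units, named periods with a context word, a proper noun or acronym followed by "model") can be approximated with regular expressions. The following is only a minimal sketch and not the Chemical Tagger grammar itself: the period list, the unit list and all function names are hypothetical and far smaller than the vocabularies described in the talk.

```python
import re

# Hypothetical, deliberately small vocabularies (assumptions for illustration).
NAMED_PERIODS = ["Holocene", "Pleistocene", "Eemian", "Younger Dryas"]
CONTEXT_WORDS = ["period", "era", "epoch", "age"]

# A number followed by a time unit, e.g. "8 kyr BP", "3.3 Mya", "1750 AD".
NUMERIC_TIME = re.compile(
    r"\b\d+(?:\.\d+)?\s?(?:Mya|Ma|kyr BP|ka BP|yr BP|AD|BC)\b"
)

# A known period name, optionally followed by a context word.
NAMED_TIME = re.compile(
    r"\b(?:" + "|".join(map(re.escape, NAMED_PERIODS)) + r")"
    r"(?:\s(?:" + "|".join(CONTEXT_WORDS) + r"))?\b"
)

# A capitalised name or acronym directly followed by the word "model".
MODEL_NAME = re.compile(
    r"\b([A-Z][A-Za-z0-9]*(?:-[A-Z0-9][A-Za-z0-9]*)*)\s+model\b"
)


def find_time_phrases(text):
    """Return numeric time expressions first, then named periods."""
    spans = [m.group(0) for m in NUMERIC_TIME.finditer(text)]
    spans += [m.group(0) for m in NAMED_TIME.finditer(text)]
    return spans


def guess_model_names(text):
    """Guess model names from the 'NAME model' context."""
    return [m.group(1) for m in MODEL_NAME.finditer(text)]
```

A pattern this simple will also pick up false positives such as "The model", which is one reason a glossary of known model names and acronyms is still useful.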
13. Natural Language vs CV
• Quick compilation of proper nouns used for time periods
(primarily from Wikipedia) contains 185 words.
– Use of these words together with adjectives / dates / details of
events would produce a very large number of phrases.
• Controlled Vocabulary from Bristol contains around 24 of
these.
• Use of these words together with other proper nouns /
adjectives / dates gives only 44 phrases within the Bristol CV.
• Map natural language to CV?
– Straightforward for most dates?
– Understanding of context important
• Does context refer to main emphasis of paper?
• Is an event/time period described unambiguously? e.g. “Last Glacial
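One way to picture the mapping from natural language to a CV is a lookup from normalised phrase variants to a single CV term, with unmapped phrases reported explicitly. This is a sketch under stated assumptions: the entries below are hypothetical examples, not the Bristol CV, and the sketch ignores the questions of context raised on this slide.

```python
# Hypothetical CV terms and natural-language variants (illustrative only).
CV_SYNONYMS = {
    "Last Glacial Maximum": ["last glacial maximum", "lgm"],
    "Holocene": ["holocene", "holocene epoch"],
}

# Invert to a lookup table from normalised variant to CV term.
LOOKUP = {
    variant: term
    for term, variants in CV_SYNONYMS.items()
    for variant in variants
}


def normalise(phrase):
    """Lower-case the phrase and collapse runs of whitespace."""
    return " ".join(phrase.lower().split())


def map_to_cv(phrase):
    """Return the CV term for a phrase, or None when no mapping is known."""
    return LOOKUP.get(normalise(phrase))
```

Returning None for unmapped phrases keeps the large natural-language vocabulary visible rather than silently dropping everything outside the CV.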
14. Preliminary Results
Preliminary Results (from 68 files)
Tag / Tags | Example | Comment
<timePhrase>, <PALAEOTIME> | (i) Holocene, (ii) 8 kyr BP, (iii) |
<referencePhrase> | (i) (Otto et al. 2009b), (ii) Giraudeau et al. 2000 | Important to distinguish year pattern from dates relevant to the study.
<locationPhrase> | (i) around Lake Kotokel, (ii) over Tibetan Plateau | False positives: e.g. “from Sphagnum”
<LOCATION> | (i) 52°47´ N, 108°07´ E, 458 m a.s.l, (ii) London. | Cannot currently do degrees from pdf-text.
<TempPhrase> | | ‘warm’ and ‘cool’: verbs in synthetic chem unlike env. chem.
15. Tag / Tags | Example | Numbers found
<CAMPAIGN> | (i) PMIP, (ii) PANASH | Less relevant here than to ACP in general
<MODEL> | (i) REVEALS model, (ii) ECBILT-CLIO intermediate complexity climate model |
<acronymPhrase> | (i) Modern Analogues Technique ( MAT ), (ii) REVEALS ( Regional Estimates of VEgetation Abundance from Large Sites ) | May pick up campaigns / models where phrases above have failed.
<QUANTITY> | (i) 10 ppm, (ii) 0.53 mm/day | units dictionary could be more extensive
<MOLECULE> | (i) CO2, (ii) calcium carbonate | Many false positives as what chemical tagger was designed for.
16. Chemical Tagger
Rendering of PALEOTIME
XML rendered with CSS http://www.clim-past.net/2/205/2006/cp-2-205-2006.html
18. CIM Document Viewer
The acronym / name MIROC4 is not explained – so reproduce sentence.
The description is just the first few sentences after appearance of <MODEL>.
19. CIM Document Viewer
http://zonda5.badc.rl.ac.uk/site/public/tools/viewer
Makes use of existing
chemical tagging.
20. CIM Document Viewer
http://zonda5.badc.rl.ac.uk/site/public/repository
Number of spectral intervals was not found! No place for “not found”.
21. Climate Models –
General Constraints
• Unless the paper is specifically about the model we
are unlikely to find much METAFOR-type CV in
the abstract
– Look at experimental / methods sections
• model name
• model resolution
• model schemes
– Problem with PDF -> text.
– Only certain elements easy to extract (e.g.
resolution)
22. Refine ACPgeo Output
• Add a few more phrases e.g. specific phrases to
look for model resolution, using expected
vocabulary (e.g. grid, levels, resolution, directions
etc).
• Refine output of ACPgeo to look for specific CV
terms.
• Try to put CV terms in context:
– Look for proximity of CV terms to other phrases:
• Within phrase; within sentence or within a number of
sentences
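The proximity idea above can be sketched as a search for trigger words within a window of sentences around a known term. This is a minimal sketch and not the ACPgeo implementation: the trigger set is taken from the vocabulary suggested on this slide, while the sentence splitter, the function names and the `window` parameter are assumptions.

```python
import re

# Trigger words suggested on the slide for model resolution.
RESOLUTION_WORDS = {"grid", "levels", "resolution"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurrences(text, term, window=0):
    """Return sentences with a trigger word within `window` sentences of `term`.

    window=0 means the same sentence; window=1 also allows the neighbours.
    """
    sentences = split_sentences(text)
    term_idx = [i for i, s in enumerate(sentences) if term.lower() in s.lower()]
    hits = []
    for i, sentence in enumerate(sentences):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        near_term = any(abs(i - j) <= window for j in term_idx)
        if words & RESOLUTION_WORDS and near_term:
            hits.append(sentence)
    return hits
```

Widening the window trades precision for recall: a resolution statement in the sentence after the model name is found with `window=1` but not with `window=0`.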
23. <MOLECULE>
– Chemical Tagger was designed to be used primarily with
chemistry.
• Unsurprising that there is a tendency to assign acronyms;
hyphenated words; and words with common chemical
endings as molecules.
– It is possible to filter some of these wrongly assigned words by
probability.
– There are still conflicts e.g. C3 and C4 could refer to
hydrocarbons or plants.
• Extensive testing and modifying / machine learning might
reduce these.
– Better to get right first time if important!
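Filtering by probability can be pictured as a threshold on a per-token confidence score, with known ambiguous tokens set aside for review rather than accepted or rejected. This sketch assumes each candidate arrives as a (token, probability) pair; the scores, the threshold and the function name are hypothetical and do not reflect the OSCAR or Chemical Tagger API.

```python
def filter_molecules(candidates, threshold=0.5, ambiguous=()):
    """Split (token, probability) pairs into accepted, rejected and flagged.

    Tokens listed in `ambiguous` (e.g. C3 / C4, which may mean hydrocarbons
    or plant types) are flagged regardless of their score.
    """
    accepted, rejected, flagged = [], [], []
    for token, prob in candidates:
        if token in ambiguous:
            flagged.append(token)
        elif prob >= threshold:
            accepted.append(token)
        else:
            rejected.append(token)
    return accepted, rejected, flagged
```

For example, `filter_molecules([("CO2", 0.95), ("PMIP", 0.2), ("C4", 0.8)], ambiguous={"C3", "C4"})` keeps CO2, rejects PMIP and flags C4 for a decision based on context.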
24. Harvested Metadata vs
Documented Metadata
http://proj.badc.rl.ac.uk/pimms/blog/
CIM was designed to be populated by modellers with the (probably over simplistic) assumption
that if something isn't in the CIM document then it either isn't in the model or isn't relevant. But
CIM documents created by harvesting information from papers will naturally not cover
everything about a model, so missing info doesn't mean that those things weren't
included/aren't relevant.
PIMMS will need to describe different protocols for interpreting CIM documents depending on
how they were created, but we will also want to ensure that CIM accounts for missing data
more intelligently in future releases.
In essence the difference between journal article descriptions and metadata documentation is
Narrative. Journal articles need to tell a story so the information they include is only that which
is relevant to the narrative, whereas metadata documentation is an attempt to include as much
as possible across the board. The general nature of metadata documentation is probably why it
has historically been perceived as such a boring task to complete.
PIMMS will make metadata documentation more fun by bringing back the Narrative. Once
PIMMS is established at an institution, users will be able to create generalised metadata having
only described those things that are relevant to the story of their experiment.
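The different interpretation protocols described in the post can be sketched as a rule that reads a missing value differently depending on how the document was created. This is only an illustration of the idea: the `Source` enum, the `interpret` function and the returned labels are hypothetical and are not part of the CIM schema.

```python
from enum import Enum


class Source(Enum):
    MODELLER = "modeller"    # document filled in by the modelling group
    HARVESTED = "harvested"  # document built by text mining a paper


def interpret(value, source):
    """Interpret a possibly missing property value given the document source."""
    if value is not None:
        return "present"
    if source is Source.MODELLER:
        # Original CIM assumption: missing means not in the model or not relevant.
        return "absent or not relevant"
    # Harvested documents: missing only means the paper did not mention it.
    return "not found"
```

With this rule, `interpret(None, Source.HARVESTED)` gives "not found" while `interpret(None, Source.MODELLER)` gives "absent or not relevant", which is the distinction the viewer currently has no place for.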