Navigating the Marine Geophysical Data Life Cycle
- 1. Navigating the Marine Geophysical
Data Life Cycle:
From Acquisition and Synthesis to
Publication and Open Data Access
Vicki Ferrini
Lamont-Doherty Earth Observatory
Columbia University
- 2. Research Interests
• Mapping seafloor morphology to
understand processes at a variety of scales
– Coastal, deep sea, rivers, lakes
• Techniques for remote seafloor
characterization using multibeam sonar
– Morphology
– Backscatter intensity
• Multibeam sonar data quality
• Data preservation, integration and access
- 3. Increasing Importance of Data Management
• Support science and discovery
• Scientific reproducibility
• Costs of acquisition
• Optimizing operations
• Increasing volumes of data
• Data policies with increasing focus on data
sharing
• Data Syntheses
• Data Publication
- 4. How can we “lessen the burden” of data
management for the science community?
- 5. Integrated Earth Data Applications
A community-based data facility funded by NSF to
support, sustain, and advance the geosciences by
providing data services for observational solid earth
data from the Ocean, Earth, and Polar Sciences.
http://www.iedadata.org/
- 6. Domain-specific Repositories & Services
• Investigator-focused
• Ensure ‘Fitness for Re-use’ through data
stewardship
• Ensure professional data curation services
– Long-term archiving & access
– Persistent, unique identification
– Discoverability (metadata registration)
• Integrate with the ‘scholarly communication
ecosystem’
- 7. Data Curated in IEDA Systems
• Marine Geophysical Data
• Bathymetry, sidescan, subbottom
• Academic Seismic Facility (MCS, SCS)
• Data from AUV, ROV, HOV, Ship
• Complementary datasets
• Navigation, bottom photos
• Sample-based Data
• Sample Registry (SESAR)
• Geochemistry
• Geochronology
• Technical reports
- 9. Data Life Cycle: Plan
• Data Management Plan Tool
• Facilitate assembly
• Inform Investigators
• Inform down-stream repositories
• Promote dialogue
• Data Acquisition Plan
• Metadata & data templates
• Promote & facilitate
contemporaneous
documentation
- 10. Data Life Cycle: Collect & Assure
• Promote Best Practices
• What to document
• How to document
• Tools and workflows
to facilitate digital
documentation
• Metadata & Data
Templates
- 11. Data Life Cycle: Document & Preserve
• Document & capture data &
metadata as soon as it is available
• Simple interfaces & guidelines
• Sample metadata registry
• Link to complementary data
& metadata
- 12. Data Life Cycle: Analyze
• Tools to:
• Support domain specialists
• Make specialist data accessible
to non-specialist users
• Integrate & visualize data
• Quantitative access to Data
Syntheses
• Access to complementary
data & resources
- 13. Data Life Cycle: Integrate & Share
• Advise on what to preserve & how
• Data supporting pubs
• Data of value
• Facilitate data prep.
• Metadata requirements
• Templates
• Format guidelines
- 14. Data Life Cycle: Document & Preserve
Develop simple workflows, interfaces & templates
to capture sufficient information for:
• Long-term curation & access
• Inclusion in syntheses
• Links to scientific publications
• Data Publication
• Data use, discovery & re-use
• Attribution & collaboration
• Data Download Stats
• Data Compliance Reporting
- 15. Data Compliance Reporting Tool
• Tool for demonstrating compliance:
• Award-based
• Informed by DMP
• Report includes:
• Data Inventory
• Data release Status
• Links to data
• Save as PDF
http://www.iedadata.org/compliance/
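The report contents listed on this slide (data inventory, release status, links to data) could be modelled roughly as below. This is only an illustrative sketch: the class and field names are hypothetical and are not taken from the IEDA tool itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetEntry:
    """One row of the data inventory (all field names are hypothetical)."""
    title: str
    released: bool  # data release status
    link: str       # link to the data; empty if not yet available


@dataclass
class ComplianceReport:
    """Award-based report listing data sets and their release status."""
    award_id: str
    inventory: List[DatasetEntry] = field(default_factory=list)

    def summary(self) -> str:
        # Count how many inventoried data sets have been released.
        released = sum(1 for d in self.inventory if d.released)
        return (f"Award {self.award_id}: "
                f"{released} of {len(self.inventory)} data sets released")

    def to_text(self) -> str:
        # Plain-text rendering: summary line followed by one line per data set.
        lines = [self.summary()]
        for d in self.inventory:
            status = "released" if d.released else "on hold"
            lines.append(f"- {d.title} [{status}] {d.link}".rstrip())
        return "\n".join(lines)
```

A plain-text rendering like `to_text()` stands in for the "Save as PDF" step, which is outside the scope of this sketch.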
- 18. MGDS Search & Data Catalog
• Text & Map-based Search
• Rich metadata
• Download data files
• Proprietary Hold
• Password Access
• Attribution
• Links to Refs
• Data DOIs
• Download Stats
• Web Services
- 26. Data files are great - if you know what to do
with them…
How can we make data quantitatively
accessible to non-specialists?
- 27. GeoMapApp
• Free Java Desktop App
• Basic GIS functionality
• Core functionality:
• GMRT Basemap
• Gridded & Tabular Data
• Linked Views
• Access online datasets (grids, shapefiles, tables)
• Attribution & links to source data
• Custom Portals
• Underway Geophysics, MB Sonar, DSDP, PetDB…
• Import & Export
• Table, Image, Grid, KMZ
http://www.geomapapp.org/
- 31. GMRT Synthesis
• Multi-resolution synthesis
• Access provided to images &
gridded compilation
• 9 resolution levels to 100 m
• Dynamically maintained
• Mask highlights hi-res data
• Attribution to data sources &
contributing scientists
• Source Data Includes:
– ASTER, NED, IBCAO, BEDMAP,
Smith and Sandwell
– Contributed grids
– Swath data from > 700 cruises (public domain)
- 32. Global Multi-Resolution Topography
• 1992 – Ridge Multibeam
Synthesis Project
• 2003 – Expanded to
include US-funded data
from Southern Ocean
• 2004 – present:
Expanded to include
public domain data from
throughout global oceans
• ongoing growth by ~80
cruises/yr
• 2009 – G-cubed paper
(Ryan et al., 2009)
GMRTv2.6 ~780 cruises (April 2014)
http://gmrt.marine-geo.org/
- 33. GMRT Components
• LDEO 100-m compilation* (raw & processed swath files in public domain)
• Contributed Grids (< 500 m res.)
• Global & Regional Grids (>= 500 m res.), e.g. GEBCO_08, IBCAO
*LDEO team performs QC of ping files
MB files metadata
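The 500 m split between contributed grids and global/regional grids can be written as a one-line rule. The sketch below assumes only that threshold from the slide; the function name and the returned labels are hypothetical, and the LDEO 100-m swath compilation is a separate component that this rule does not cover.

```python
# Threshold from the slide: contributed grids are finer than 500 m,
# global and regional grids are 500 m or coarser.
GRID_THRESHOLD_M = 500.0


def classify_grid(resolution_m: float) -> str:
    """Return which gridded GMRT component a grid of this resolution falls under."""
    if resolution_m <= 0:
        raise ValueError("resolution must be positive")
    if resolution_m < GRID_THRESHOLD_M:
        return "contributed grid"
    return "global/regional grid"
```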
- 34. GMRT: Access
• Images & gridded data
• Desktop Apps
• GeoMapApp
• Virtual Ocean
• Web App
• GMRT MapTool
• iPad/iPhone App
• Earth Observer
• Web Map Services
• Images & Mask
Export as: NetCDF, Arc ASCII, Binary, Fledermaus, KMZ, PNG, Geotiff…
- 35. GMRT: MB Data Reduction & Synthesis
• Bad navigation
• Noisy outer beams
• Attitude problems
• Bad soundings
• Instrument problems
• Bad weather
• Sound velocity
• Slow speed in turns
• Quality assessment
for grid weighting
and resolution
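One way to picture "quality assessment for grid weighting" is a quality-weighted mean of the soundings that fall in a grid cell. The slide does not say how GMRT actually applies its weights, so the following is a minimal sketch under that assumption; the function name and the weight convention are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def weighted_cell_depth(soundings: Iterable[Tuple[float, float]]) -> Optional[float]:
    """Quality-weighted mean depth for one grid cell.

    Each sounding is (depth_m, weight). The weight is a hypothetical
    quality score >= 0, where 0 means "ignore this sounding"
    (e.g. a noisy outer beam or a ping with bad navigation).
    Returns None when no sounding carries any weight.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for depth_m, weight in soundings:
        if weight < 0:
            raise ValueError("weights must be non-negative")
        total_weight += weight
        weighted_sum += depth_m * weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight
```

With weights of this kind, a cruise affected by the problems listed above contributes less to the grid without being discarded outright.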
- 36. Tracking and Managing MB Content for GMRT
MGDS Relational DB
MB data files
GMRT Access & Web services
- 44. GMRT: Next Steps
• GMRT Version 2.6 April 2014
– MB Data from ~50 more cruises
– More Contributed grids (including LOS)
• Revise GMRT MapTool (web interface)
– more download format options
• Enhanced Web Services
– Gridded Content
– Attribution
• Enhanced Accessibility to Source Data
– DOIs on processed source data files
– Search & download multiple processed MB files
• GEBCO High-Res Effort
- 45. How can we optimize quality of the
data being preserved?
(Good data in = good data out)
- 47. R2R Data Stewardship
• Focus on Raw Underway Data
• Instruments permanently installed on ships
• Fleet-wide solution
~500 cruises/year
• Core Services
• Data documentation & preservation
• Programmatic Quality Assessment
• Navigation Products
• Event Logger
• Real-time MET/TSG
- 48. MB Raw Data Preservation
• 769 file sets
• 291,673 files
• ~ 7.6 TB
• from 671
cruises
as of Apr 15, 2014
- 49. R2R: Quality Assessment
• Programmatic post-cruise review of data
• Identify “suspicious” data
• Feedback to Operators
• Distributable Code
• Leverage existing tools where possible
(for MB data: MB-System)
• Customizable thresholds
• Generate QA Report
• Document QA procedures
• Provide info for downstream data use
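The pattern on this slide (customizable thresholds, flagging "suspicious" data, generating a QA report) can be sketched as follows. The metric names and limits are hypothetical placeholders, not R2R's actual checks; a real assessment would derive its metrics from the data files (for MB data, via MB-System).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-cruise metrics and (low, high) limits.
DEFAULT_THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "percent_flagged_soundings": (0.0, 20.0),
    "max_speed_knots": (0.0, 16.0),
}


def assess(metrics: Dict[str, float],
           thresholds: Optional[Dict[str, Tuple[float, float]]] = None) -> List[str]:
    """Return one finding per metric that is missing or outside its limits."""
    if thresholds is None:
        thresholds = DEFAULT_THRESHOLDS
    findings = []
    for name, (low, high) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            findings.append(f"{name}: missing")
        elif not (low <= value <= high):
            findings.append(f"{name}: {value} outside [{low}, {high}]")
    return findings


def qa_report(cruise_id: str,
              metrics: Dict[str, float],
              thresholds: Optional[Dict[str, Tuple[float, float]]] = None) -> str:
    """Plain-text QA report: a status line followed by any findings."""
    findings = assess(metrics, thresholds)
    status = "suspicious" if findings else "ok"
    return "\n".join([f"QA report for {cruise_id}: {status}"] + findings)
```

Passing a different `thresholds` dictionary is what makes the limits customizable per operator or instrument.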
- 51. R2R: MB Quality Assessment
Lead: S. O’Hara (LDEO)
- 53. Multibeam Advisory Committee
• Community of Stakeholders
• Fleet-wide Approach
– Best Practices
– Technical Resources
• Technical Teams
– Shipboard Acceptance
– Acoustic Noise
– Quality Assurance
• Help Desk
http://mac.unols.org/
P. Johnson (UNH) & J. Beaudoin (UNH)
- 54. MAC Accomplishments
• Test Reports Gathered and Posted
• Tools
– SVP Editor Tool
– SVP Mission Planning Tool
• Best Practice Cookbooks
• Ship visits
– Acoustic Noise Testing
– Quality Assurance
– Sea Acceptance
• Assistance to Operators & Investigators
- 58. Summary:
• How can we “lessen the burden”?
– Simple workflows, interfaces, & guidelines
• How can we engage the science community?
– High-value Content
– Reward (attribution, citation)
• How can we make data accessible to non-specialists?
– Data synthesis
• How can we optimize data quality?
– Best practices at acquisition
- 59. How to Navigate the
Data Life Cycle
• Know what resources are available
• Tools to make process easier
• Access existing Data
• Communicate
• Upstream
• Downstream (Data Managers)
• Plan ahead
• Document contemporaneously
• Treat data as a valuable community resource
• Participate! Input always needed for:
• Metadata & data format standards
• Usability of interfaces
Editor's Notes
- Seafloor bathymetry data are needed for a broad spectrum of studies, but coverage is exceedingly sparse, so all multibeam data are of high value.
- Through GeoMapApp the original data source can be identified. Here, clicking on the Italian data set Bouvet-Ligi highlights the location of the survey in an enclosed box on the map view.
- Copyright claims cannot be made by data contributors for the synthesized products.