Navigating the Marine Geophysical
Data Life Cycle:
From Acquisition and Synthesis to
Publication and Open Data Access
Vicki Ferrini
Lamont-Doherty Earth Observatory
Columbia University
Research Interests
• Mapping seafloor morphology to
understand processes at a variety of scales
– Coastal, deep sea, rivers, lakes
• Techniques for remote seafloor
characterization using multibeam sonar
– Morphology
– Backscatter intensity
• Multibeam sonar data quality
• Data preservation, integration and access
Increasing Importance of Data Management
• Support science and discovery
• Scientific reproducibility
• Costs of acquisition
• Optimizing operations
• Increasing volumes of data
• Data policies with increasing focus on data
sharing
• Data Syntheses
• Data Publication
How can we “lessen the burden” of data
management for the science community?
A community-based data facility funded by NSF to
support, sustain, and advance the geosciences by
providing data services for observational solid earth
data from the Ocean, Earth, and Polar Sciences.
http://www.iedadata.org/
Integrated Earth Data Applications
• Investigator-focused
• Ensure ‘Fitness for Re-use’ through data
stewardship
• Ensure professional data curation services
– Long-term archiving & access
– Persistent, unique identification
– Discoverability (metadata registration)
• Integrate with the ‘scholarly communication
ecosystem’
Domain-specific Repositories & Services
• Marine Geophysical Data
• Bathymetry, sidescan, subbottom
• Academic Seismic Facility (MCS, SCS)
• Data from AUV, ROV, HOV, Ship
• Complementary datasets
• Navigation, bottom photos
• Sample-based Data
• Sample Registry (SESAR)
• Geochemistry
• Geochronology
• Technical reports
Data Curated in IEDA Systems
Data Life Cycle: Plan
• Data Management Plan Tool
• Facilitate assembly
• Inform Investigators
• Inform downstream repositories
• Promote dialogue
• Data Acquisition Plan
• Metadata & data templates
• Promote & facilitate
contemporaneous
documentation
Data Life Cycle: Collect & Assure
• Promote Best Practices
• What to document
• How to document
• Tools and workflows
to facilitate digital
documentation
• Metadata & Data
Templates
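Metadata and data templates are most useful when a missing field is caught while the cruise is still under way. A minimal sketch of such a completeness check follows; the field names in `REQUIRED_FIELDS` and the sample record are hypothetical, not an actual IEDA or MGDS template.

```python
# Hypothetical required fields for a cruise-level metadata template;
# an actual repository template defines its own list.
REQUIRED_FIELDS = ("cruise_id", "vessel", "chief_scientist",
                   "start_date", "end_date", "instrument")


def missing_fields(record):
    """Return the required fields that are absent or blank in `record`."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Hypothetical record filled in at sea: one blank field, one never entered.
record = {"cruise_id": "XX1401", "vessel": "R/V Example", "chief_scientist": "",
          "start_date": "2014-04-01", "instrument": "multibeam sonar"}
print(missing_fields(record))  # ['chief_scientist', 'end_date']
```

Checking at the time of collection, rather than at submission, is what makes "contemporaneous documentation" practical: the person who knows the missing value is still on board.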
Data Life Cycle: Document & Preserve
• Document & capture data &
metadata as soon as it is available
• Simple interfaces & guidelines
• Sample metadata registry
• Link to complementary data
& metadata
Data Life Cycle: Analyze
• Tools to:
• Support domain specialists
• Make specialist data accessible
to non-specialist users
• Integrate & visualize data
• Quantitative access to Data
Syntheses
• Access to complementary
data & resources
Data Life Cycle: Integrate & Share
• Advise on what to preserve & how
• Data supporting pubs
• Data of value
• Facilitate data prep.
• Metadata requirements
• Templates
• Format guidelines
Data Life Cycle: Document & Preserve
Develop simple workflows, interfaces & templates
to capture sufficient information for:
• Long-term curation & access
• Inclusion in syntheses
• Links to scientific publications
• Data Publication
• Data use, discovery & re-use
• Attribution & collaboration
• Data Download Stats
• Data Compliance Reporting
Data Compliance Reporting Tool
• Tool for demonstrating compliance:
• Award-based
• Informed by DMP
• Report includes:
• Data Inventory
• Data Release Status
• Links to data
• Save as PDF
http://www.iedadata.org/compliance/
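The compliance report pairs a data inventory with each item's release status. A sketch of that bookkeeping, assuming a hypothetical inventory format (a list of dicts with `title` and `release_date`) rather than the tool's actual schema:

```python
from datetime import date


def compliance_summary(inventory, as_of):
    """Split an award's data inventory into released and pending items."""
    released, pending = [], []
    for item in inventory:
        release_date = item["release_date"]  # None = no release date set yet
        if release_date is not None and release_date <= as_of:
            released.append(item["title"])
        else:
            pending.append(item["title"])
    fraction = len(released) / len(inventory) if inventory else 0.0
    return {"released": released, "pending": pending, "fraction_released": fraction}


# Hypothetical inventory for a single award.
inventory = [
    {"title": "Multibeam bathymetry", "release_date": date(2014, 6, 1)},
    {"title": "Subbottom profiles", "release_date": date(2015, 1, 1)},
    {"title": "Bottom photos", "release_date": None},
]
summary = compliance_summary(inventory, as_of=date(2014, 12, 31))
print(summary["released"])  # ['Multibeam bathymetry']
print(summary["pending"])   # ['Subbottom profiles', 'Bottom photos']
```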
How do we engage the science community?
http://www.marine-geo.org/
MGDS Search & Data Catalog
• Text & Map-based Search
• Rich metadata
• Download data files
• Proprietary Hold
• Password Access
• Attribution
• Links to Refs
• Data DOIs
• Download Stats
• Web Services
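Proprietary hold, password access and download statistics interact in a simple way: a dataset under hold is listed in the catalog but downloadable only by authorized users until the hold expires. A minimal sketch of that rule, with a hypothetical record layout (`hold_until`, `authorized_users`) that is not MGDS's actual implementation:

```python
from collections import Counter
from datetime import date

download_counts = Counter()  # dataset id -> number of successful downloads


def can_download(dataset, user, today):
    """Public once the proprietary hold (if any) has lapsed; before that,
    only users granted password access may download."""
    hold_until = dataset["hold_until"]
    if hold_until is None or today >= hold_until:
        return True
    return user in dataset["authorized_users"]


def request_download(dataset, user, today):
    """Check access and, if allowed, count the download."""
    if not can_download(dataset, user, today):
        return False
    download_counts[dataset["id"]] += 1
    return True


# Hypothetical dataset held until the start of 2015.
dataset = {"id": "ds-001", "hold_until": date(2015, 1, 1), "authorized_users": {"pi"}}
print(request_download(dataset, "guest", date(2014, 6, 1)))  # False
print(request_download(dataset, "pi", date(2014, 6, 1)))     # True
print(request_download(dataset, "guest", date(2015, 2, 1)))  # True
print(download_counts["ds-001"])                             # 2
```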
MGDS Search & Data Catalog
Data files are great - if you know what to do
with them…
How can we make data quantitatively
accessible to non-specialists?
GeoMapApp
• Free Java Desktop App
• Basic GIS functionality
• Core functionality:
• GMRT Basemap
• Gridded & Tabular Data
• Linked Views
• Access online datasets (grids, shapefiles, tables)
• Attribution & links to source data
• Custom Portals
• Underway Geophysics, MB Sonar, DSDP, PetDB…
• Import & Export
• Table, Image, Grid, KMZ
http://www.geomapapp.org/
GeoMapApp default basemap: GMRT
GMRT Synthesis
-Multi-resolution synthesis
-Access provided to images &
gridded compilation
-9 resolution levels to 100 m
-Dynamically maintained
-Mask highlights hi-res data
-Attribution to data sources &
contributing scientists
-Source Data Includes:
-ASTER, NED, IBCAO, BEDMAP,
Smith and Sandwell
-Contributed grids
-Swath data from > 700 cruises (public domain)
• 1992 – Ridge Multibeam
Synthesis Project
• 2003 – Expanded to
include US-funded data
from Southern Ocean
• 2004 – present --
Expanded to include
public domain data from
throughout global oceans
• ongoing growth by ~80
cruises/yr
• 2009 – G-cubed paper
(Ryan et al., 2009)
GMRTv2.6 ~780 cruises (April 2014)
Global Multi-Resolution Topography
http://gmrt.marine-geo.org
GMRT Components
• LDEO 100-m compilation* (raw & processed swath files in public domain)
• Contributed grids (< 500 m res.)
• Global & regional grids (≥ 500 m res.), e.g. GEBCO_08, IBCAO
*LDEO team performs QC of ping files
MB files metadata
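One way to picture how the three components combine is a priority merge: each cell takes its value from the highest-priority source that has data, and a companion mask records which source supplied it (source-0 cells correspond to the high-resolution coverage the mask highlights). The sketch assumes all layers are already resampled onto one common grid, with `None` for no data; GMRT's actual multi-resolution tiling is more involved than this.

```python
def merge_layers(layers):
    """Merge same-sized 2-D grids given in priority order (highest first).

    Returns (merged, source): merged holds the first non-None value found
    for each cell; source holds the index of the layer that supplied it,
    or None where no layer has data.
    """
    rows, cols = len(layers[0]), len(layers[0][0])
    merged = [[None] * cols for _ in range(rows)]
    source = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for index, layer in enumerate(layers):
                if layer[r][c] is not None:
                    merged[r][c] = layer[r][c]
                    source[r][c] = index
                    break
    return merged, source


# Hypothetical 2x2 example: swath compilation, contributed grid, global grid.
swath = [[-1200.0, None], [None, None]]
contributed = [[-1190.0, -1350.0], [None, None]]
global_grid = [[-1100.0, -1300.0], [-1500.0, -1600.0]]
merged, source = merge_layers([swath, contributed, global_grid])
print(merged)  # [[-1200.0, -1350.0], [-1500.0, -1600.0]]
print(source)  # [[0, 1], [2, 2]]
```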
GMRT: Access
• Images & gridded data
• Desktop Apps
• GeoMapApp
• Virtual Ocean
• Web App
• GMRT MapTool
• iPad/iPhone App
• Earth Observer
• Web Map Services
• Images & Mask
Export as: NetCDF, Arc ASCII, Binary, Fledermaus, KMZ, PNG, GeoTIFF…
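Of the export formats listed, Arc ASCII is simple enough to write by hand: a six-line header followed by rows of values, northernmost row first. A sketch, assuming the grid is a list of rows with `None` for empty cells and that the caller supplies the lower-left corner and cell size:

```python
def to_arc_ascii(grid, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Serialize a grid (grid[0] = northernmost row) as Esri ASCII grid text."""
    lines = [
        f"ncols {len(grid[0])}",
        f"nrows {len(grid)}",
        f"xllcorner {xllcorner}",
        f"yllcorner {yllcorner}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    for row in grid:
        # Empty cells are written with the NODATA value declared in the header.
        lines.append(" ".join(str(nodata if v is None else v) for v in row))
    return "\n".join(lines) + "\n"


print(to_arc_ascii([[-100, None], [-200, -300]], -10.0, 40.0, 0.5), end="")
```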
GMRT: MB Data Reduction & Synthesis
• Bad navigation
• Noisy outer beams
• Attitude problems
• Bad soundings
• Instrument problems
• Bad weather
• Sound velocity
• Slow speed in turns
• Quality assessment
for grid weighting
and resolution
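Two of the problems above, noisy outer beams and bad soundings, are often screened automatically before manual review. The sketch below flags, for a single ping, beams beyond an across-track angle limit and depths far from the ping median (a median absolute deviation test). The thresholds and the rule itself are assumptions for illustration, not the editing procedure the LDEO team applies for GMRT.

```python
from statistics import median


def flag_soundings(depths, angles_deg, max_angle=60.0, k=3.0):
    """Flag soundings in one ping; True means 'suspect'.

    Hypothetical rules for illustration:
    - |beam angle| > max_angle (degrees from nadir) -> noisy outer beam;
    - |depth - ping median| > k * MAD -> isolated bad sounding (spike).
    """
    med = median(depths)
    mad = median(abs(d - med) for d in depths)  # median absolute deviation
    flags = []
    for depth, angle in zip(depths, angles_deg):
        outer_beam = abs(angle) > max_angle
        spike = mad > 0 and abs(depth - med) > k * mad
        flags.append(outer_beam or spike)
    return flags


depths = [1000.0, 1002.0, 998.0, 1001.0, 1500.0]  # metres, hypothetical ping
angles = [-70.0, -30.0, 0.0, 30.0, 45.0]           # degrees from nadir
print(flag_soundings(depths, angles))  # [True, False, False, False, True]
```

A median-based test is used here because a single spike barely shifts the median, whereas it can drag a mean and standard deviation far enough to hide itself. Flagged soundings would normally be reviewed rather than silently deleted.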
Tracking and Managing MB Content for GMRT
[Diagram: MB data files; MGDS relational DB; GMRT access & web services]
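This slide links multibeam files, a relational database and the access services built on it. A toy version of that bookkeeping in SQLite, with hypothetical table and column names (the MGDS schema is not described here), shows how per-file status can drive cruise-level attribution:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cruise (
    cruise_id       TEXT PRIMARY KEY,
    chief_scientist TEXT NOT NULL
);
CREATE TABLE mb_file (
    file_id      INTEGER PRIMARY KEY,
    cruise_id    TEXT NOT NULL REFERENCES cruise(cruise_id),
    filename     TEXT NOT NULL,
    in_synthesis INTEGER NOT NULL DEFAULT 0  -- 1 once gridded into the synthesis
);
""")
conn.executemany("INSERT INTO cruise VALUES (?, ?)",
                 [("XX01", "Scientist A"), ("XX02", "Scientist B")])
conn.executemany(
    "INSERT INTO mb_file (cruise_id, filename, in_synthesis) VALUES (?, ?, ?)",
    [("XX01", "line001.mb", 1), ("XX01", "line002.mb", 1), ("XX02", "line001.mb", 0)])


def contributors(conn):
    """Cruises (with chief scientist) that contribute files to the synthesis."""
    return conn.execute("""
        SELECT c.cruise_id, c.chief_scientist, COUNT(f.file_id)
        FROM cruise AS c JOIN mb_file AS f ON f.cruise_id = c.cruise_id
        WHERE f.in_synthesis = 1
        GROUP BY c.cruise_id, c.chief_scientist
        ORDER BY c.cruise_id
    """).fetchall()


print(contributors(conn))  # [('XX01', 'Scientist A', 2)]
```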
GMRT: Attribution & Access to Source Data
Cruise-level Attribution & Provenance
GMRT: Next Steps
• GMRT Version 2.6 April 2014
– MB Data from ~50 more cruises
– More Contributed grids (including LOS)
• Revise GMRT MapTool (web interface)
– more download format options
• Enhanced Web Services
– Gridded Content
– Attribution
• Enhanced Accessibility to Source Data
– DOIs on processed source data files
– Search & download multiple processed MB files
• GEBCO High-Res Effort
How can we optimize quality of the
data being preserved?
(Good data in = good data out)
Complementary Fleet-Wide Efforts
GMRT (1992), R2R (2009), MAC (2011)
GOAL: Well-documented high-quality publicly available data
• Focus on Raw Underway Data
• Instruments permanently installed on ships
• Fleet-wide solution: ~500 cruises/year
• Core Services
• Data documentation & preservation
• Programmatic Quality Assessment
• Navigation Products
• Event Logger
• Real-time MET/TSG
R2R Data Stewardship
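Preservation services typically record a checksum for every file at ingest so that later copies can be verified. The slide does not say which fixity method R2R uses; the sketch below simply assumes SHA-256 digests stored in a per-cruise manifest.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def changed_files(old, new):
    """Paths added, removed, or modified between two manifests."""
    return sorted(path for path in old.keys() | new.keys()
                  if old.get(path) != new.get(path))
```

Running `build_manifest` at ingest and again on a later copy, then passing both results to `changed_files`, lists any files that need attention; an empty list means every digest matched.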
MB Raw Data Preservation
• 769 file sets
• 291,673 files
• ~ 7.6 TB
• from 671
cruises
as of Apr 15, 2014
R2R: Quality Assessment
• Programmatic post-cruise review of data
• Identify “suspicious” data
• Feedback to Operators
• Distributable Code
• Leverage existing tools where possible
(for MB data: MB-System)
• Customizable thresholds
• Generate QA Report
• Document QA procedures
• Provide info for downstream data use
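As an example of a programmatic check with a customizable threshold, the sketch below flags navigation fixes that imply an implausible ship speed. The 20-knot default and the great-circle (haversine) distance on a spherical Earth are assumptions chosen for illustration; the R2R navigation tests themselves are not described here.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def suspicious_speeds(fixes, max_speed_knots=20.0):
    """fixes: list of (time_s, lat, lon). Return indices of fixes that were
    reached too fast from the previous fix, or with a non-increasing time."""
    max_mps = max_speed_knots * 1852.0 / 3600.0  # knots -> metres per second
    flagged = []
    for i in range(1, len(fixes)):
        t0, lat0, lon0 = fixes[i - 1]
        t1, lat1, lon1 = fixes[i]
        dt = t1 - t0
        if dt <= 0 or haversine_m(lat0, lon0, lat1, lon1) / dt > max_mps:
            flagged.append(i)
    return flagged


# Hypothetical fixes one minute apart; the last one jumps about 55 km.
fixes = [(0, 0.0, 0.0), (60, 0.0, 0.003), (120, 0.0, 0.5)]
print(suspicious_speeds(fixes))  # [2]
```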
R2R: QA Dashboard
• By Cruise
• By Ship
• By Instrument
• By Test
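Each dashboard view is the same set of test results grouped along a different key. A sketch, assuming a hypothetical flat record per test with `cruise`, `ship`, `instrument`, `test` and a boolean `passed` field:

```python
from collections import defaultdict


def pass_rates(results, key):
    """Fraction of passing QA tests per value of `key`
    ('cruise', 'ship', 'instrument' or 'test')."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for result in results:
        group = result[key]
        totals[group] += 1
        passes[group] += int(result["passed"])
    return {group: passes[group] / totals[group] for group in totals}


# Hypothetical QA results from two cruises on one ship.
results = [
    {"cruise": "XX01", "ship": "Ship A", "instrument": "navigation", "test": "nav_speed", "passed": True},
    {"cruise": "XX01", "ship": "Ship A", "instrument": "multibeam", "test": "depth_range", "passed": False},
    {"cruise": "XX02", "ship": "Ship A", "instrument": "navigation", "test": "nav_speed", "passed": True},
    {"cruise": "XX02", "ship": "Ship A", "instrument": "multibeam", "test": "depth_range", "passed": True},
]
print(pass_rates(results, "cruise"))  # {'XX01': 0.5, 'XX02': 1.0}
print(pass_rates(results, "test"))    # {'nav_speed': 1.0, 'depth_range': 0.5}
```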
R2R: MB Quality Assessment
Lead: S. O’Hara (LDEO)
Multibeam Advisory Committee
• Community of Stakeholders
• Fleet-wide Approach
– Best Practices
– Technical Resources
• Technical Teams
– Shipboard Acceptance
– Acoustic Noise
– Quality Assurance
• Help Desk
http://mac.unols.org/
P. Johnson (UNH) & J. Beaudoin (UNH)
MAC Accomplishments
• Test Reports Gathered and Posted
• Tools
– SVP Editor Tool
– SVP Mission Planning Tool
• Best Practice Cookbooks
• Ship visits
– Acoustic Noise Testing
– Quality Assurance
– Sea Acceptance
• Assistance to Operators & Investigators
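The SVP tools deal with sound velocity profiles, which multibeam systems need in order to ray-trace each beam to the seafloor. Where only temperature and salinity are measured, sound speed is often computed from an empirical equation; the sketch below uses the Mackenzie (1981) nine-term equation (stated validity roughly 2 to 30 °C, salinity 25 to 40, depth 0 to 8000 m). It is an illustration, not the algorithm inside the MAC tools.

```python
def mackenzie_sound_speed(t, s, d):
    """Sound speed (m/s) from temperature t (deg C), salinity s and depth d (m),
    using the Mackenzie (1981) nine-term equation."""
    return (1448.96 + 4.591 * t - 5.304e-2 * t**2 + 2.374e-4 * t**3
            + 1.340 * (s - 35) + 1.630e-2 * d + 1.675e-7 * d**2
            - 1.025e-2 * t * (s - 35) - 7.139e-13 * t * d**3)


# Check value for the equation: 25 deg C, salinity 35, 1000 m.
print(round(mackenzie_sound_speed(25.0, 35.0, 1000.0), 3))  # 1550.744

# Hypothetical CTD cast: (depth m, temperature deg C, salinity).
cast = [(0.0, 25.0, 35.0), (500.0, 10.0, 34.8), (1000.0, 5.0, 34.6)]
profile = [(d, mackenzie_sound_speed(t, s, d)) for d, t, s in cast]
```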
Reports from Technical Teams
Technical Resources
GMRT, R2R MBQA & MAC
• How can we “lessen the burden”?
– Simple workflows, interfaces, & guidelines
• How can we engage the science community?
– High-value Content
– Reward (attribution, citation)
• How can we make data accessible to non-specialists?
– Data synthesis
• How can we optimize data quality?
– Best practices at acquisition
Summary:
How to Navigate the
Data Life Cycle
• Know what resources are available
• Tools to make process easier
• Access existing Data
• Communicate
• Upstream
• Downstream (Data Managers)
• Plan ahead
• Document contemporaneously
• Treat data as a valuable community resource
• Participate! Input always needed for:
• Metadata & data format standards
• Usability of interfaces


Editor's Notes

  1. Seafloor bathymetry data are needed for a broad spectrum of studies, but coverage is exceedingly sparse and all multibeam data are of high value.
  2. Through GeoMapApp, the original data source can be identified. Here, clicking on the Italian data set Bouvet-Ligi highlights the location of the survey with an enclosed box in the map view.
  3. Data contributors cannot make copyright claims on the synthesized products.