Publishing/Preserving datasets

From Federation of Earth Science Information Partners

Summary

  • This use case comes from our work on building preservation infrastructure (the SEAD project, http://sead-data.net/, and in particular its preservation component, the Virtual Archive). It overlaps with many other use cases, but preservation activities merit attention separately from the cases of creating and using datasets. The publishing/preserving use case applies both to datasets that are being created and to datasets that already exist. Existing data, especially data that lack metadata or were created without preservation in mind (i.e., legacy datasets), form an important subcase that needs to be considered.

Objective

  • Ensuring that datasets persist over long periods of time and withstand changes in formats, infrastructure, and user bases while remaining readable, discoverable, and usable

Actors

  • Data curators (primarily)
  • The system (many tasks are or will be automated)
  • Data producers (often hard to engage in this use case)

Activities / Concerns

  • Supporting variety
    • Types of datasets (collections)
    • Types of data (e.g., databases, continuous data, time series, etc.)
    • File formats
    • Sizes
  • Long-term sustainability (format conversion, migration, reconstruction)
  • Creating metadata / provenance record
  • Enhancing metadata (manual and automatic)
  • Creating citations (DOIs, multiple levels of individual and organizational attribution)
  • Access rights and licensing (especially when data arrive without them)
  • Verification / Validation
  • Matching user – system – data requirements for ingestion
    • Affiliation / access permissions
    • System resources
    • Data characteristics
    • Long-term preservation commitment
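
The matching of user, system, and data requirements listed above could be sketched as a simple pre-ingestion check. This is a minimal illustration only, not the SEAD Virtual Archive's actual logic: the class names, fields, and messages (`IngestRequest`, `RepositoryPolicy`, `ingestion_problems`) are all hypothetical, and a real repository would define its own criteria.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class IngestRequest:
    """What the depositor and the dataset bring (all fields hypothetical)."""
    user_affiliations: Set[str]   # groups the depositor belongs to
    dataset_size_bytes: int       # total size of the dataset
    file_formats: Set[str]        # formats found in the dataset
    retention_years: int          # requested preservation period


@dataclass
class RepositoryPolicy:
    """What the system can offer (all fields hypothetical)."""
    allowed_affiliations: Set[str]
    free_space_bytes: int
    supported_formats: Set[str]
    max_retention_years: int


def ingestion_problems(req: IngestRequest, policy: RepositoryPolicy) -> List[str]:
    """Return a list of mismatches; an empty list means the request fits."""
    problems = []
    # Affiliation / access permissions
    if not (req.user_affiliations & policy.allowed_affiliations):
        problems.append("affiliation: depositor has no permitted affiliation")
    # System resources
    if req.dataset_size_bytes > policy.free_space_bytes:
        problems.append("resources: dataset exceeds available storage")
    # Data characteristics
    unsupported = sorted(req.file_formats - policy.supported_formats)
    if unsupported:
        problems.append("data: unsupported formats " + ", ".join(unsupported))
    # Long-term preservation commitment
    if req.retention_years > policy.max_retention_years:
        problems.append("commitment: requested retention exceeds policy")
    return problems
```

Returning every mismatch at once, rather than stopping at the first, lets a curator see all the obstacles to ingestion in one pass.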

Sequence of events

  • Data must be made available
  • Data must be verified and validated
  • Data must be identified (file formats, metadata, etc.)
  • Data must be packaged and deposited
  • A citation must be created
  • Data must be indexed for future access and discovery
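
The sequence above could be sketched roughly as follows, using only the Python standard library. This is an illustrative sketch under stated assumptions, not the SEAD implementation: the function names (`build_manifest`, `format_citation`, `index_entry`), the manifest layout, and the citation pattern are all hypothetical, and format identification by file extension is far weaker than what a dedicated characterization tool would do.

```python
import hashlib
import mimetypes
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a SHA-256 checksum by streaming the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir: Path, expected_checksums: dict) -> dict:
    """Verify and identify each file, returning a package manifest.

    expected_checksums maps relative paths to checksums supplied by the
    data producer (hypothetical input; may be empty for legacy data).
    """
    files = []
    for path in sorted(p for p in dataset_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(dataset_dir).as_posix()
        checksum = sha256_of(path)
        # Verify: compare with the producer's checksum when one exists.
        if rel in expected_checksums and expected_checksums[rel] != checksum:
            raise ValueError(f"checksum mismatch for {rel}")
        # Identify: guess the media type from the file extension only.
        media_type, _ = mimetypes.guess_type(path.name)
        files.append({
            "path": rel,
            "sha256": checksum,
            "bytes": path.stat().st_size,
            "media_type": media_type or "application/octet-stream",
        })
    return {"files": files}


def format_citation(creator, year, title, publisher, identifier) -> str:
    """Assemble a citation string (one possible pattern, assumed here)."""
    return f"{creator} ({year}). {title}. {publisher}. {identifier}"


def index_entry(manifest: dict, citation: str) -> dict:
    """Build a minimal record for a discovery index."""
    return {
        "citation": citation,
        "file_count": len(manifest["files"]),
        "media_types": sorted({f["media_type"] for f in manifest["files"]}),
    }
```

In this sketch the manifest stands in for the "packaged" dataset; depositing it would mean serializing it alongside the files and handing both to the archive, a step that depends entirely on the target repository.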

Artifacts TBD (overlap with other use cases)