NetCDF Utilities Metadata Handling

From Earth Science Information Partners (ESIP)

Introduction

There are many netCDF utilities that can generate new data products from existing ones. In the process, it is important that whatever metadata attributes are written to the newly created file accurately describe the contents of that file.

This general principle applies to metadata attributes from any convention. While a given utility can't possibly understand all conventions, it should not write an attribute into a newly generated file unless it knows the attribute remains valid. (Alternative approaches are described in the Alternatives for Maintaining Metadata Accuracy section below.)

Status

It appears that some existing utilities pass metadata attributes from the original file directly into the updated file, where they may be misleading or wrong for the new product.

Other utilities may analyze an existing netCDF file and report information about it. (Here we are considering local applications or libraries distributed to the community, not individual deployed web services.) In this case we want to know whether the utility is reporting information from metadata attributes, information distilled from the file, or both.

This page was created to document the metadata handling status of current netCDF utilities.
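The distinction drawn above for analysis utilities, between reporting declared attributes and reporting values distilled from the data, can be sketched as follows. This is only an illustration: the function name is hypothetical, the attributes are modeled as a plain dict rather than read with a netCDF library, and geospatial_lat_min / geospatial_lat_max are used as example attribute names.

```python
def report_latitude_range(attrs, latitudes):
    """Report declared and derived latitude bounds side by side.

    attrs: dict standing in for a file's global attributes (an assumption;
           a real utility would read these through a netCDF library).
    latitudes: non-empty sequence of latitude values taken from the data.
    """
    # What the metadata attributes claim.
    declared = (attrs.get("geospatial_lat_min"), attrs.get("geospatial_lat_max"))
    # What the data itself shows.
    derived = (min(latitudes), max(latitudes))
    return {
        "declared": declared,
        "derived": derived,
        # Exact comparison keeps the sketch simple; a real tool might allow a tolerance.
        "consistent": declared == derived,
    }
```

A report of this shape makes it visible when attributes were passed through from a global source file into a spatially subsetted product.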

Alternatives for Maintaining Metadata Accuracy

There are several options a developer can choose to ensure the output metadata is correct.

  • Update meaningfully: If the utility understands a particular convention well, it can modify the values in each attribute as needed to reflect the new data.
  • Update by citing: If the utility doesn't understand the convention well, it might still update some attributes like the Summary, e.g., by appending "As modified by XYZ to constrain the time and spatial range."
  • Omit: The utility can omit (meaning, not copy to the new file) the metadata attributes that were in the original file.
  • Retitle: In NetCDF 3, the utility can retitle the attributes from the source file(s) to make clear that the attributes do not apply to the destination file. For example, a file created by merging 3 source files could create three new title attributes, source1_title, source2_title, and source3_title, copying the title attribute from each source into the corresponding new attribute. Any attribute can be redefined as an element of provenance in this way, and the results can cascade through multiple processes (for example, 'source2_source3_title'). In NetCDF 4, groups could be used to manage this information.
  • Mix: A utility which partially understands a specification might mix these approaches, for example updating some attributes while omitting or retitling those it didn't understand or wanted to preserve for provenance reasons.
  • Disclaim: A utility which doesn't understand a convention (and particularly, doesn't recognize it) should not generate a file that it claims is following that convention. It can delete the convention from the list in the Conventions attribute.
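As a minimal sketch of the Omit, Update by citing, and Disclaim options listed above, the following models global attributes as a plain dict instead of using a netCDF library. The function name, its parameters, the lowercase "summary" attribute name, and the treatment of Conventions as a comma- or space-separated list are assumptions made for illustration.

```python
def subset_attributes(source_attrs, understood_conventions, note, drop=()):
    """Derive global attributes for a subsetted copy of one source file.

    source_attrs: dict standing in for the source file's global attributes.
    understood_conventions: set of convention names this utility can vouch for.
    note: citation text appended to the summary.
    drop: attribute names the utility cannot keep accurate.
    """
    # Omit: do not copy attributes the utility cannot keep valid.
    out = {k: v for k, v in source_attrs.items() if k not in drop}

    # Update by citing: append a note rather than reinterpret the summary.
    if "summary" in out:
        out["summary"] = f"{out['summary']} {note}"

    # Disclaim: keep only the conventions this utility recognizes.
    if "Conventions" in out:
        names = out["Conventions"].replace(",", " ").split()
        kept = [n for n in names if n in understood_conventions]
        if kept:
            out["Conventions"] = ", ".join(kept)
        else:
            del out["Conventions"]
    return out
```

The source dict is left unmodified, so the same attributes can still be retitled or inspected afterwards.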

The best approach depends on the purpose of the utility, but a utility that creates new metadata attributes for the data it is creating, while retitling attributes from the original sources, maintains the maximum amount of descriptive and provenance information for users of the resulting data products.
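The combination described here, new attributes for the new product plus retitled attributes from each source, might look like the sketch below. It again uses plain dicts in place of a netCDF library, and the function name and the sourceN_ prefix scheme are assumptions that follow the source1_title example given earlier.

```python
def merge_attributes(sources, new_attrs):
    """Build global attributes for a file merged from several sources.

    sources: list of dicts, one per source file, in merge order.
    new_attrs: attributes the utility writes itself to describe the merged data.
    """
    merged = {}
    for index, attrs in enumerate(sources, start=1):
        for name, value in attrs.items():
            # Retitle: every source attribute becomes provenance, never a claim
            # about the merged file. Already-retitled names cascade naturally.
            merged[f"source{index}_{name}"] = value
    # Attributes describing the new product are written last.
    merged.update(new_attrs)
    return merged


merged = merge_attributes(
    [
        {"title": "Buoy A"},
        {"title": "Buoy B", "source3_title": "Earlier input"},
        {"title": "Buoy C"},
    ],
    {"title": "Merged buoy record"},
)
```

In this example the second source was itself a merged product, so its source3_title attribute cascades to source2_source3_title in the result.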

Tables of Utilities

Key

  • Contact: The person or other contact point to ask questions/propose changes about the software.
  • Metadata update mode: What the software does with existing metadata attributes when writing a new file; one of 'update', 'retitle', 'omit', 'mix', or 'copy'.
  • Writes history info: Does the software update the history attribute as recommended?
  • Proposal(s): Text details of changes proposed for this software
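For the "Writes history info" column, one possible reading of updating the history attribute is appending a timestamped line for each program invocation. The sketch below assumes that reading; the function name, the timestamp format, and the use of a plain dict for the attributes are illustrative choices, not a prescribed format.

```python
from datetime import datetime, timezone


def append_history(attrs, command, when=None):
    """Return a copy of attrs with one new line added to 'history'.

    attrs: dict standing in for a file's global attributes.
    command: text describing the invocation, e.g. program name and arguments.
    when: timezone-aware datetime; defaults to the current UTC time.
    """
    when = when or datetime.now(timezone.utc)
    entry = f"{when.strftime('%Y-%m-%dT%H:%M:%SZ')}: {command}"
    previous = attrs.get("history", "")
    updated = dict(attrs)
    # Keep earlier entries so the audit trail survives each processing step.
    updated["history"] = f"{previous}\n{entry}" if previous else entry
    return updated
```

Some tools place the newest entry first instead of last; either order preserves the trail as long as earlier lines are kept.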

Table of Data Product Utilities

Utility Name | Version | Contact | Metadata update mode | Writes history info | Proposal(s) | Comments
LAS | {version} | {contact} | copy | {history} | {proposal} | per Ed Armstrong; adds global attribute :FERRET_comment = "File written via LAS. Attributes are inherited from originating dataset";
THREDDS | {version} | {contact} | {mode} | {history} | {proposal} | {comments}
HYRAX | {version} | {contact} | {mode} | {history} | {proposal} | {comments}
ERDDAP | {version} | {contact} | {mode} | {history} | {proposal} | {comments}
{name} | {version} | {contact} | {mode} | {history} | {proposal} | {comments}

Table of Analysis Utilities

Utility Name | Version | Contact | Metadata analysis mode | Proposal(s)
ncISO | {version} | {contact} | {mode} | {proposal}
ncdump | {version} | {contact} | {mode} | {proposal}
{name} | {version} | {contact} | {mode} | {proposal}
{name} | {version} | {contact} | {mode} | {proposal}
{name} | {version} | {contact} | {mode} | {proposal}