Talk:Attribute Convention for Data Discovery 1-2 Working

From Earth Science Information Partners (ESIP)

-- Graybeal (talk) 16:44, 3 May 2013 (MDT)

Nan 4/22/2013:

It might be a good idea to cross check against the definitions that NODC has added - as part of their NetCDF template project they wrote some better descriptions. They're at http://www.nodc.noaa.gov/data/formats/netcdf/

There are a few categories of terms that need better definitions, IMHO.

1. people: creator_name (recommended), publisher_name (suggested)

In a 'normal' research/observing/modeling situation, who are these people?

I think there are 2 necessary points of contact: the person who 'owns' the research and gives you the go-ahead to use/publish the data, and the person who put the data into the file and/or online. You don't really need to know how to contact the other contributors, even if they had equally or more important roles.

I believe that NODC recommends naming the principal investigator as the 'creator' - although in some circumstances there is no single PI, so maybe we should say this is the person who grants the use of the data.

I'm using the publisher as the person who wrote the actual file that contains the terms, and I'm listing co-PIs and data processors as contributors.
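
A minimal sketch of the usage described above, with the creator as the person who grants use of the data, the publisher as the person who wrote the file, and everyone else listed as a contributor. All names and addresses are placeholders, and storing contributors as parallel comma-separated lists is an assumption rather than something the convention defines.

```python
# Placeholder names and addresses throughout; none of these come from a real file.
global_attrs = {
    "creator_name": "Pat Investigator",      # grants use of the data
    "creator_email": "pi@example.org",
    "publisher_name": "Sam Filewriter",      # wrote the file that holds these terms
    "publisher_email": "data@example.org",
    "contributor_name": "Co PI One, Data Processor Two",
    "contributor_role": "co-PI, data processor",
}


def points_of_contact(attrs):
    """Return the two contacts a data user needs: who grants use, who wrote the file."""
    return {
        "grants_use": (attrs["creator_name"], attrs["creator_email"]),
        "wrote_file": (attrs["publisher_name"], attrs["publisher_email"]),
    }


def contributors(attrs):
    """Pair each contributor with a role, assuming parallel comma-separated lists."""
    names = [n.strip() for n in attrs["contributor_name"].split(",")]
    roles = [r.strip() for r in attrs["contributor_role"].split(",")]
    return list(zip(names, roles))
```

Only the two points of contact carry contact details here; contributors are named with a role but no address, which matches the point that users do not need to reach them.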

Other comments are moved below. jbg

Re: -- Graybeal (talk) 16:48, 3 May 2013 (MDT)

Ted 4/22/2013: Most of these concerns are discussed at http://wiki.esipfed.org/index.php/NetCDF,_HDF,_and_ISO_Metadata along with more general solutions.

Re: -- Graybeal (talk) 16:48, 3 May 2013 (MDT)

Nan 4/26/2013: One other item that I think we might need to have - beyond better definitions for some of the existing terms - is a CV for contributor roles. I think one exists, somewhere, but I'm not sure where. BODC, maybe? MMI? Or should this really be free text?

Re: Re: -- Graybeal (talk) 16:49, 3 May 2013 (MDT)

John 4/26/2013: Should be from a controlled vocabulary IMHO. BODC has (for SeaDataNet) an extension of ISO role terms, if I recall correctly. I think it isn't just for contributor roles, it's for all roles that this is needed—ISO wasn't very thorough in the first place, but there will always be new ways for people to be connected to a data set.
I don't think we have to be restrictive (in what roles are allowed) but I think we should try to be explicit (about what a role means).
Re: Re: Re: -- Graybeal (talk) 16:50, 3 May 2013 (MDT)
Ted 4/26/2013: I agree completely that a shared vocabulary with definition is critical. The old ISO vocab is at https://geo-ide.noaa.gov/wiki/index.php?title=ISO_19115_and_19115-2_CodeList_Dictionaries#CI_RoleCode. Many new roles were added in the most recent revision. There is also a brief discussion at http://wiki.esipfed.org/index.php/ISO_People (I will update that list to include revisions)...
What is really important is that the representation allow specification of the source of the code along with the code itself. This is possible in THREDDS, but not ACDD. The job of the standard is to say we use a codelist for this item and that codelist has a location. It is the community's job to say: this is the codelist that our community uses.
Re: Re: Re: Re: -- Graybeal (talk) 16:53, 3 May 2013 (MDT)
Derrick S: Codelists can be seen as antithetical to the CF goal of creating self-describing files. Can we figure out a way to encode ISO objects with the need for references to other objects while still staying true to our goal of remaining aligned with CF? The last thing I'd want us to recommend is to open a door down a pathway back to GRIB and BUFR.
Re: Re: Re: Re: Re: -- Graybeal (talk) 17:00, 3 May 2013 (MDT)
Edward A: Regarding CF, in some ways they already use "codelists", e.g., standard names, so it is not entirely new. It's just that their standard names are very human readable at the same time.
Re: Re: Re: Re: Re: -- Graybeal (talk) 16:58, 3 May 2013 (MDT)
Nan 4/26/2013: I think we can use terms from a CV, but they should be meaningful, not URLs or those lovely 5 character codes that hark back to languages we've forgotten we ever knew.
We can select one CV, or we can add a term 'rolecode_vocabulary' (that would be fairly reasonable, since we're already using 'keyword_vocabulary').
The SDN roles below are new, but the ISO roles are from a slightly outdated page at NODC. I just find this format easier to look at than the full XML and CSV formats that are available online.
Personally, neither of these is very appealing - I hope the new ISO codes will be better.

SeaDataNet roles

  • metadata collator: Responsible for the compilation of metadata for one or more datasets and submission of that metadata to the appropriate SeaDataNet metadata repository.
  • programme operation responsibility: Responsible for the operation of a data collecting programme.
  • programme archive responsibility: Responsible for the archive centre handling distribution of delayed mode data from a collecting programme and the long term stewardship of its data.
  • programme realtime responsibility: Responsible for the centre handling distribution of true and near real time data from a collecting programme.
  • contact point: Person responsible for the provision of information in response to queries concerning the metadata or underlying data.
  • principal funder: Person or organisation that funds the majority of an activity.
  • contributing funder: Person or organisation that contributes to the funding of an activity.
  • principal investigator: Scientific lead of data collection within a programme


ISO roles

  • resourceProvider: party that supplies the resource
  • custodian: party that accepts accountability and responsibility for the data and ensures appropriate care and maintenance of the resource
  • owner: party that owns the resource
  • sponsor: party that sponsors the resource
  • user: party who uses the resource
  • distributor: party who distributes the resource
  • originator: party who created the resource
  • pointOfContact: party who can be contacted for acquiring knowledge about or acquisition of the resource
  • principalInvestigator: key party responsible for gathering information and conducting research
  • processor: party who has processed the data in a manner such that the resource has been modified
  • publisher: party who published the resource
  • author: party who authored the resource
  • collaborator: party who conducted or contributed to the research
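
As a rough illustration of Nan's 'rolecode_vocabulary' suggestion, the sketch below checks a file's contributor roles against whichever of the two lists above the file names. The attribute 'rolecode_vocabulary' is only a proposal in this thread, not part of ACDD, and the vocabulary labels ("ISO roles", "SeaDataNet roles") and the comma-separated role string are assumptions made for the example.

```python
# Role terms copied from the two lists above.
ISO_ROLES = {
    "resourceProvider", "custodian", "owner", "sponsor", "user",
    "distributor", "originator", "pointOfContact", "principalInvestigator",
    "processor", "publisher", "author", "collaborator",
}
SDN_ROLES = {
    "metadata collator", "programme operation responsibility",
    "programme archive responsibility", "programme realtime responsibility",
    "contact point", "principal funder", "contributing funder",
    "principal investigator",
}

# Hypothetical labels a file could use to say which list its roles come from.
VOCABULARIES = {"ISO roles": ISO_ROLES, "SeaDataNet roles": SDN_ROLES}


def unknown_roles(attrs):
    """Return the contributor roles that are not in the vocabulary the file names."""
    vocabulary = VOCABULARIES[attrs["rolecode_vocabulary"]]
    roles = [r.strip() for r in attrs["contributor_role"].split(",")]
    return [r for r in roles if r not in vocabulary]
```

The role terms stay human readable in the file; the vocabulary name only tells a reader where to look up what each term means.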
Re: Re: Re: Re: Re: Re: -- Graybeal (talk) 17:10, 3 May 2013 (MDT)
Ted H 4/27/2013: The suggestion to add an attribute called rolecode_vocabulary demonstrates very well the problem with this approach - a community has a documentation need and, in order to address that need, we need to add a new concept into the convention. Do we end up with a *_vocabulary attribute for every attribute that can benefit from a shared vocabulary? I think this would be difficult to maintain.
As an alternative, we create a responsibleParty type group that includes a role from a shared vocabulary and information that describes people or organizations. The role has a value and a source which is the shared vocabulary that it comes from.
Are we a community of convention users or convention developers? When we say we need a mechanism for describing responsibleParties that includes a role from a shared vocabulary and descriptive information, we are convention developers. When we say we need a vocabulary to describe roles like principalInvestigator or instrumentDeveloper, we are acting as a community using a convention.
What I am trying to do is separate these two roles so that when a community says "we need a shared vocabulary for x", we do not have to add a new attribute called x_vocabulary to the convention.
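
One way to picture the responsibleParty alternative is as a small structure in which the role carries both its value and its source. This is only a sketch: the class and field names are hypothetical, the people are placeholders, and nothing here says how such a group would actually be encoded in a netCDF file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    value: str   # the role term itself, e.g. "principalInvestigator"
    source: str  # the shared vocabulary the term comes from


@dataclass(frozen=True)
class ResponsibleParty:
    name: str
    role: Role
    email: str = ""


# Two communities using the same structure with different vocabularies.
# Names are placeholders; source labels are illustrative.
pi = ResponsibleParty(
    name="Pat Investigator",
    role=Role(value="principalInvestigator", source="ISO CI_RoleCode"),
)
collator = ResponsibleParty(
    name="Sam Filewriter",
    role=Role(value="metadata collator", source="SeaDataNet roles"),
)
```

The convention defines the structure once; choosing a different vocabulary changes only the source value, so no new x_vocabulary attribute is needed.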

Re: -- Graybeal (talk) 17:09, 3 May 2013 (MDT)

Ken C 4/27/2013: All we say at NODC in our netCDF templates for the creator_ attributes is copied below… we discussed attributes like this a lot when documenting our templates and finally "settled" on the idea of creator being associated with "collector" of the data. Of course even that is not perfect. We don't say anything about PIs, since as Nan points out there is often no single PI. I would add that there is often no PI at all… many, many, datasets come to us now as a result of sustained and operational observing programs and systems, where the idea of a "PI" itself doesn't even apply.
* creator_email: Email address of the person or institution that collected the data. -- The email of the person or institution may be found in the NODC tables for persons (http://www.nodc.noaa.gov/cgi-bin/OAS/prd/person) and institutions (http://www.nodc.noaa.gov/cgi-bin/OAS/prd/institution). Use the short name of the institution if available.
* creator_name: Name of the person who collected the data. -- Use the name from the NODC persons (http://www.nodc.noaa.gov/cgi-bin/OAS/prd/person) table when applicable.
* creator_url: The URL of the institution that collected the data. -- The URL of the institution can be found in the NODC institutions (http://www.nodc.noaa.gov/cgi-bin/OAS/prd/institution) table.

-- Graybeal (talk) 16:44, 3 May 2013 (MDT)

Nan 4/22/2013: There are a few categories of terms that need better definitions, IMHO. (continued)

2. file times

  • date_created (recommended)
  • date_modified (suggested)
  • date_issued (suggested)

These could well have different meanings for model data. For my in situ data, I have 2 (or, for real time data, possibly 3) useful file times: the time the last edit or processing occurred, which is the version information and could be useful if the underlying data has been changed; and the time the file was written, which could provide information about translation errors being corrected. (We don't update files, we overwrite them; some people might need to describe both the time the original file was written and the time of last update?) For real time data it could also be interesting to know the last time new data arrived, which could be asynchronous.

NODC doesn't seem to use date_issued, but they have defs for created and modified.

  • date_created: "The date or date and time when the file was created.... This time stamp will never change, even when modifying the file."
  • date_modified: This time stamp will change any time information is changed in this file.
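
Read literally, the two NODC definitions behave as in the sketch below: date_created is set once and never touched again, while date_modified moves on every change. The function names and the UTC timestamp format are assumptions made for the example, not something NODC or ACDD prescribes.

```python
from datetime import datetime, timezone


def stamp(moment):
    """Format a timezone-aware datetime as a UTC ISO 8601 string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def new_file_attrs(now):
    """Attributes for a freshly written file: both times start out equal."""
    return {"date_created": stamp(now), "date_modified": stamp(now)}


def record_change(attrs, now):
    """Return a copy with date_modified advanced; date_created stays as it was."""
    updated = dict(attrs)
    updated["date_modified"] = stamp(now)
    return updated


written = new_file_attrs(datetime(2013, 4, 22, 12, 0, tzinfo=timezone.utc))
edited = record_change(written, datetime(2013, 5, 3, 16, 44, tzinfo=timezone.utc))
```

For files that are overwritten rather than updated in place, this only works if date_created is carried over from the previous version of the file, which is the situation raised above.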

3. Keywords

Since ISO uses keyword type codes instead of cramming all the possible keywords (theme, place, etc) into one structure, I don't see why we don't do something similar. We could use our pseudo-groups syntax; keywords_theme, keywords_dataCenter ...etc.
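
A small sketch of the pseudo-group idea, assuming each keyword arrives tagged with its type and that each resulting attribute holds a comma-separated string. The keywords_<type> attribute names follow the suggestion above and are not part of ACDD; the sample keywords are placeholders.

```python
def keyword_attrs(tagged_keywords):
    """Group (type, keyword) pairs into one keywords_<type> attribute per type."""
    grouped = {}
    for keyword_type, keyword in tagged_keywords:
        grouped.setdefault(f"keywords_{keyword_type}", []).append(keyword)
    # Join each group into a single comma-separated attribute value.
    return {name: ", ".join(values) for name, values in grouped.items()}


attrs = keyword_attrs([
    ("theme", "sea_water_temperature"),
    ("place", "North Atlantic"),
    ("theme", "sea_water_salinity"),
])
```

Here attrs holds one keywords_theme attribute with both theme terms and one keywords_place attribute, so a reader can tell the keyword types apart without parsing a single mixed list.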

4. coordinate 'resolution' terms

The word resolution is a poor choice, and if it's going to be kept, it needs to be defined as meaning 'spacing' or 'shape' and not an indication of the precision of the coordinate. For measurements that are irregularly spaced along a mooring line, it's fairly useless - unless we come up with a vocabulary describing this and other possible values.

For my data, the term might be more useful with the other definition; our depths are approximate 'target depths', and, while we may know the lat/long of an anchor and of a buoy (the latter being a time series, the former being a single point) we don't actually know the lat/long of any given instrument on a mooring line. The watch circle of the buoy is really the 'resolution' we need to supply here.