Difference between revisions of "Talk:Attribute Convention for Data Discovery 1-2 Working"

From Earth Science Information Partners (ESIP)

Revision as of 17:00, May 3, 2013

-- Graybeal (talk) 16:44, 3 May 2013 (MDT)

Nan 4/22/2013:

It might be a good idea to cross check against the definitions that NODC has added - as part of their NetCDF template project they wrote some better descriptions. They're at nodc.noaa.gov/data/formats/netcdf/

There are a few categories of terms that need better definitions, IMHO.

1. people: creator_name (recommended) publisher_name (suggested)

In a 'normal' research/observing/modeling situation, who are these people?

I think there are 2 necessary points of contact: the person who 'owns' the research and gives you the go-ahead to use/publish the data, and the person who put the data into the file and/or online. You don't really need to know how to contact the other contributors, even if they had equally or more important roles.

I believe that NODC recommends naming the principal investigator as the 'creator' - although in some circumstances there is no single PI, so maybe we should say this is the person who grants the use of the data.

I'm using the publisher as the person who wrote the actual file that contains the terms, and I'm listing co-PIs and data processors as contributors.

2. file times: date_created (recommended) date_modified (suggested) date_issued (suggested)

These could well have different meanings for model data. For my in situ data, I have 2 (or, for real time data, possibly 3) useful file times: the time the last edit or processing occurred, which is the version information and could be useful if the underlying data has been changed, and the time the file was written, which could provide information about translation errors being corrected. (We don't update files, we overwrite them; some people might need to describe both the time the original file was written and the time of the last update.) For real time data it could also be interesting to know the last time new data arrived, which could be asynchronous.

NODC doesn't seem to use date_issued, but they have defs for created and modified.

date_created: "The date or date and time when the file was created. ... This time stamp will never change, even when modifying the file."

date_modified: This time stamp will change any time information is changed in this file.

3. Keywords - since ISO uses keyword type codes instead of cramming all the possible keywords (theme, place, etc.) into one structure, I don't see why we don't do something similar. We could use our pseudo-groups syntax: keywords_theme, keywords_dataCenter, etc.

4. coordinate 'resolution' terms - the word resolution is a poor choice, and if it's going to be kept, it needs to be defined as meaning 'spacing' or 'shape' and not an indication of the precision of the coordinate. For measurements that are irregularly spaced along a mooring line, it's fairly useless - unless we come up with a vocabulary describing this and other possible values.

For my data, the term might be more useful with the other definition; our depths are approximate 'target depths', and, while we may know the lat/long of an anchor and of a buoy (the latter being a time series, the former being a single point) we don't actually know the lat/long of any given instrument on a mooring line. The watch circle of the buoy is really the 'resolution' we need to supply here.
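A rough sketch of how items 2 and 3 above might look in practice, offered only as an illustration. It assumes ISO 8601 UTC strings for the file times and follows the NODC wording quoted above (date_created never changes, date_modified changes on every edit). The helper names (`iso_utc`, `stamp_file_times`, `typed_keyword_attributes`) and all sample values are hypothetical, and the `keywords_<type>` names are the pseudo-group idea from item 3, not existing ACDD attributes.

```python
from datetime import datetime, timezone


def iso_utc(moment):
    """Format a timezone-aware datetime as an ISO 8601 UTC string (assumed format)."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def stamp_file_times(attrs, now):
    """Return a copy of attrs with date_created kept and date_modified refreshed."""
    stamped = dict(attrs)
    # date_created is written once and carried over on later rewrites.
    stamped.setdefault("date_created", iso_utc(now))
    # date_modified changes whenever information in the file changes.
    stamped["date_modified"] = iso_utc(now)
    return stamped


def typed_keyword_attributes(keywords_by_type):
    """Flatten {type: [terms]} into 'keywords_<type>' attributes, skipping empty types."""
    return {
        f"keywords_{kw_type}": ", ".join(terms)
        for kw_type, terms in keywords_by_type.items()
        if terms
    }


# Illustrative values only.
first = stamp_file_times({}, datetime(2013, 4, 22, 12, 0, tzinfo=timezone.utc))
later = stamp_file_times(first, datetime(2013, 5, 3, 17, 0, tzinfo=timezone.utc))
keyword_attrs = typed_keyword_attributes(
    {"theme": ["sea water temperature", "salinity"], "place": ["North Atlantic"], "dataCenter": []}
)
```

For anyone who overwrites files rather than updating them, the previous attributes would have to be read and passed in as `attrs` so that date_created survives the rewrite.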

Re: -- Graybeal (talk) 16:48, 3 May 2013 (MDT)

Ted 4/22/2013: Most of these concerns are discussed at http://wiki.esipfed.org/index.php/NetCDF,_HDF,_and_ISO_Metadata along with more general solutions.

Re: -- Graybeal (talk) 16:48, 3 May 2013 (MDT)

Nan 4/26/2013: One other item that I think we might need to have - beyond better definitions for some of the existing terms - is a CV for contributor roles. I think one exists, somewhere, but I'm not sure where. BODC, maybe? MMI? Or should this really be free text?

Re: Re: -- Graybeal (talk) 16:49, 3 May 2013 (MDT)

John 4/26/2013: Should be from a controlled vocabulary IMHO. BODC has (for SeaDataNet) an extension of ISO role terms, if I recall correctly. I think this is needed not just for contributor roles but for all roles. ISO wasn't very thorough in the first place, but there will always be new ways for people to be connected to a data set.

I don't think we have to be restrictive (in what roles are allowed) but I think we should try to be explicit (about what a role means).

Re: Re: Re: -- Graybeal (talk) 16:50, 3 May 2013 (MDT)

Ted 4/26/2013: I agree completely that a shared vocabulary with definitions is critical. The old ISO vocab is at https://geo-ide.noaa.gov/wiki/index.php?title=ISO_19115_and_19115-2_CodeList_Dictionaries#CI_RoleCode. Many new roles were added in the most recent revision. There is also a brief discussion at http://wiki.esipfed.org/index.php/ISO_People (I will update that list to include revisions)...

What is really important is that the representation allow specification of the source of the code along with the code itself. This is possible in THREDDS, but not ACDD. The job of the standard is to say we use a codelist for this item and that codelist has a location. It is the community's job to say: this is the codelist that our community uses.

Re: Re: Re: Re: -- Graybeal (talk) 16:53, 3 May 2013 (MDT)

Derrick S: Codelists can be seen as antithetical to the CF goal of creating self-describing files. Can we figure out a way to encode ISO objects with the need for references to other objects while still staying true to our goal of remaining aligned with CF? The last thing I'd want us to recommend is to open a door down a pathway back to GRIB and BUFR.

Re: Re: Re: Re: Re: -- Graybeal (talk) 17:00, 3 May 2013 (MDT)

Edward A: Regarding CF, in some ways they already use "codelists", e.g., standard names, so it is not entirely new. It's just that their standard names are very human readable at the same time.

Re: Re: Re: Re: Re: -- Graybeal (talk) 16:58, 3 May 2013 (MDT)

I think we can use terms from a CV, but they should be meaningful, not URLs or those lovely 5 character codes that hark back to languages we've forgotten we ever knew.

We can select one CV, or we can add a term 'rolecode_vocabulary' (that would be fairly reasonable, since we're already using 'keyword_vocabulary').

The SDN roles below are new, but the ISO roles are from a slightly outdated page at NODC. I just find this format easier to look at than the full XML and CSV formats that are available online.

Personally, neither of these is very appealing - I hope the new ISO codes will be better.
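To make the 'rolecode_vocabulary' suggestion concrete, here is a minimal sketch of carrying the source of the role codes next to the codes themselves. It assumes roles are stored as human-readable terms in a comma-separated `contributor_role` attribute parallel to `contributor_name`; the function name `role_attributes`, the people, and the vocabulary label are all made up for illustration, and 'rolecode_vocabulary' is only the term proposed in this thread.

```python
def role_attributes(contributors, vocabulary):
    """Build parallel name/role attributes plus the vocabulary the roles come from.

    contributors: list of (name, role) pairs, kept in the same order in both strings.
    vocabulary: free-text label or location of the codelist (community's choice).
    """
    return {
        "contributor_name": ", ".join(name for name, _ in contributors),
        "contributor_role": ", ".join(role for _, role in contributors),
        # Proposed in this thread, by analogy with the existing keywords vocabulary term.
        "rolecode_vocabulary": vocabulary,
    }


# Hypothetical people and vocabulary label.
attrs = role_attributes(
    [("A. Example", "principalInvestigator"), ("B. Example", "processor")],
    "ISO 19115 CI_RoleCode",
)
```

The role terms stay readable in the file, which speaks to the self-describing concern, while the vocabulary attribute records where their definitions live.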

SeaDataNet roles

  • metadata collator: Responsible for the compilation of metadata for one or more datasets and submission of that metadata to the appropriate SeaDataNet metadata repository.
  • programme operation responsibility: Responsible for the operation of a data collecting programme.
  • programme archive responsibility: Responsible for the archive centre handling distribution of delayed mode data from a collecting programme and the long term stewardship of its data.
  • programme realtime responsibility: Responsible for the centre handling distribution of true and near real time data from a collecting programme.
  • contact point: Person responsible for the provision of information in response to queries concerning the metadata or underlying data.
  • principal funder: Person or organisation that funds the majority of an activity.
  • contributing funder: Person or organisation that contributes to the funding of an activity.
  • principal investigator: Scientific lead of data collection within a programme.


ISO roles

  • resourceProvider: party that supplies the resource
  • custodian: party that accepts accountability and responsibility for the data and ensures appropriate care and maintenance of the resource
  • owner: party that owns the resource
  • sponsor: party that sponsors the resource
  • user: party who uses the resource
  • distributor: party who distributes the resource
  • originator: party who created the resource
  • pointOfContact: party who can be contacted for acquiring knowledge about or acquisition of the resource
  • principalInvestigator: key party responsible for gathering information and conducting research
  • processor: party who has processed the data in a manner such that the resource has been modified
  • publisher: party who published the resource
  • author: party who authored the resource
  • collaborator: party who conducted or contributed to the research
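Being explicit without being restrictive could be as simple as flagging roles that are not in the chosen codelist rather than rejecting them. The sketch below uses the ISO role list above as the vocabulary; the function name `unknown_roles` and the sample input are hypothetical, and the list reflects the older ISO roles shown here, not the revised set.

```python
# Role codes copied from the ISO roles list above (the older, pre-revision set).
ISO_ROLES = frozenset({
    "resourceProvider", "custodian", "owner", "sponsor", "user",
    "distributor", "originator", "pointOfContact", "principalInvestigator",
    "processor", "publisher", "author", "collaborator",
})


def unknown_roles(roles, vocabulary=ISO_ROLES):
    """Return the roles, in input order, that the vocabulary does not define."""
    return [role for role in roles if role not in vocabulary]


# 'data wrangler' is a made-up role used to show the check.
flagged = unknown_roles(["principalInvestigator", "processor", "data wrangler"])
```

A community that prefers the SeaDataNet terms would pass its own set as `vocabulary` instead.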