Difference between revisions of "Interagency Data Stewardship/LifeCycle/Jan2011Meeting"

From Earth Science Information Partners (ESIP)

Revision as of 17:17, January 13, 2011

ESIP 2011 Winter Meeting Plans

Tuesday, January 4, 2011

2:00-3:30 Citation guidelines and identifiers

    • Presentation on citations - Mark P.
    • Short presentations on ID's paper and testbed - Ruth and Nancy
    • Walk through of FOO related examples - Curt
      • File: ESIP-Identifiers.pdf
    • Discussion and develop plan
Notes from Session
  • Encourage publishers to enforce citation requirements
    • Papers need a unique name and location (URL + URN)
    • Unique identifier: location independent; copies everywhere have the same ID. Can be generated without internet access or a naming authority
    • Unique locator: the location is invariant and the data can always be found in at least one place
    • Citable locator: same as a unique locator but also accepted by publishers; reduces clutter from granule-level citation
    • Scientifically unique identifier: makes it possible to verify that contents are unchanged after a format change or rearrangement, ensuring that the data has not been tampered with
    • Different ID schemes were assessed based on technical value, user value, archive value, and existing usage in data centers.
      • UUID most promising for Unique identifier
      • Most are fine for Unique locators
      • DOI most suitable for Citable locator
      • No existing models are optimized for Scientifically unique identifier
      • Different schemes solve different problems; plan on continuously supporting many identifiers as they go in and out of service. Best recommendation: a UUID and a DOI at minimum.
    • Also suggested: use a UUID for collection identification only and relegate details to metadata.
    • Follow-up plan: work to have UUIDs for granules/files and DOIs for data sets adopted as NASA standards
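
The "UUID per granule, DOI per data set" recommendation above can be illustrated with a minimal Python sketch. It is not an ESIP or NASA implementation: the class names, the file names, and the DOI string are hypothetical placeholders chosen for illustration. It only shows that a random (version 4) UUID can be generated locally, without internet access or a naming authority, while the DOI is attached once at the collection level.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical placeholder DOI for illustration; not a registered identifier.
COLLECTION_DOI = "10.5555/example-collection"


@dataclass
class Granule:
    filename: str
    # uuid4 is random, so it needs no network access and no naming authority.
    granule_id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class Collection:
    doi: str  # citable identifier for the data set as a whole
    granules: list = field(default_factory=list)

    def add(self, filename):
        """Register a granule under this collection and return it."""
        granule = Granule(filename)
        self.granules.append(granule)
        return granule


collection = Collection(COLLECTION_DOI)
for name in ["granule_001.dat", "granule_002.dat"]:  # hypothetical file names
    collection.add(name)

for granule in collection.granules:
    print(collection.doi, granule.granule_id, granule.filename)
```

A paper would then cite the collection DOI, while the granule UUIDs stay in the metadata for anyone who needs to know exactly which files were used.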


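4:00-5:30 Towards an Earth Science provenance/context content standard - Part I

    • Review Earth science provenance/context requirements - Rama/John Moses
      • File: Provenance Context Content Standard 20101230.ppt
    • Begin to develop a plan for creating the standard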

Notes from Session
  • Review Earth science provenance/context requirements
    • Data extremes are often broken down into archival units
    • Controlled vocabulary needed for distinguishing data types (defines format, granularity, etc.)
    • Data versioning is more complicated than software versioning: the same data from the same system but with different calibrations could have different version names
  • How to distinguish all individual granules?
    • Example: using FOO satellite data, tag each granule with a UUID, then assign a DOI to the whole collection of granules
    • Problem: corruption errors in archives result in deletion and replacement of the data, so the experiment can no longer be replicated
    • Provenance information is retained for the original data, even though the data itself is deleted
    • If corrupted data is remade, it gets a new UUID. How does the reproduced experiment get cited?
    • Very messy problem: who made it? Does anyone deserve credit for reformatting it?
    • Is it possible to make it reproducible or to make it citable?
  • DOI and UUID have limitations
    • So consider a "process on demand" Dataset and an ephemeral "data transformation" web service
    • Can you look at data citations and determine whether two researchers are using the same data granules?
  • Begin to develop a plan for creating the standard
    • Should the federation develop citation guidelines and best practices for the use of identifiers?
    • Other organizations are already doing it; does ESIP need to as well?
    • ESIP should explicitly clarify the roles and functions of identifiers for the organizations creating standards.
    • Establish principles on which the identity of data is assigned
  • Proposals for citation can be measured against these criteria
  • Be ready now to tell the scientific community how to cite data
  • Enable citation to lead to a reproducibility standard
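
Two questions raised in this session, whether remade data can be shown to match the original and whether two citations refer to the same granules, can be sketched in Python. This is only an illustration under stated assumptions: the notes record that no existing scheme is optimized for a scientifically unique identifier, and the canonical form used here (values formatted to six decimal places, rows sorted) is an arbitrary choice made for this example, as are the function names.

```python
import hashlib


def content_fingerprint(records):
    """Hash a canonical form of the data so that row order and
    int-versus-float formatting do not change the result."""
    canonical = sorted(
        ",".join(f"{float(value):.6f}" for value in row) for row in records
    )
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def shared_granules(citation_a, citation_b):
    """Return the granule IDs that appear in both citations."""
    return set(citation_a) & set(citation_b)


original = [(1, 2.5), (3, 4.0)]
rearranged = [(3.0, 4), (1.0, 2.5)]  # same values, different order and types
altered = [(1, 2.5), (3, 4.1)]       # one value changed

print(content_fingerprint(original) == content_fingerprint(rearranged))
print(content_fingerprint(original) == content_fingerprint(altered))
```

A fingerprint of this kind could travel with the provenance record, so that a remade granule with a new UUID can still be checked against the deleted original. Comparing granule sets only works if citations list, or link to, the granule IDs that were actually used.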

CONCLUSIONS

  • Identify roles and functions of identity
  • Recommend which identifiers are appropriate
  • Provide guidelines on how to cite ESIP data
  • Develop guidelines with the recognition that this is an ongoing process that will continuously improve
  • Who to work with for this?
    • DataCite, among others

Wednesday, January 5, 2011

  • 1:45-3:15 Towards an Earth Science provenance/context content standard - Part II
    • Complete plan for standards development
  • 3:45-5:15 Towards an Earth Science provenance/context ontology - Part I

Thursday, January 6, 2011

  • 10:30-12:00 Towards an Earth Science provenance/context ontology - Part II
    • Refine use cases
    • Complete plan to develop ES Provenance/Context Ontology
  • 1:30-3:00 Cluster business meeting
    • Chair/co-chair election - 15 min
    • Summarize results and plans from sessions ~ 30 min
    • Moving testbed activities forward ~ 30 min