Interagency Data Stewardship/LifeCycle/Jan2011Meeting
From Earth Science Information Partners (ESIP)
ESIP 2011 Winter Meeting Plans
Tuesday, January 4, 2011
2:00-3:30 Citation guidelines and identifiers
- Presentation on citations - Mark P.
- Short presentations on IDs paper and testbed - Ruth and Nancy
- Walk-through of FOO-related examples - Curt
- Discussion and develop plan
Notes from Session
- Encourage publishers to enforce citation requirements
- Papers need a unique name and location (URL + URN)
- Location independent - copies everywhere have the same ID. IDs can be created without internet access or a naming authority
- Unique locator - the location is invariant, and the item can always be found in at least one place
- Citable locator - same as unique locator but also accepted by publishers; reduces clutter from granule-level citation
- Scientifically unique identifier - possible to verify that contents are unchanged after a format change or rearrangement; ensures that data have not been tampered with
- Different ID schemes were assessed based on technical value, user value, archive value, and existing usage in data centers.
- UUID most promising for Unique identifier
- Most are fine for Unique locators
- DOI most suitable for Citable locator
- No existing models are optimized for Scientifically unique identifier
- Different schemes solve different problems; plan on supporting many identifiers as they go in and out of service. Best recommendation: a UUID and a DOI at minimum.
- Also suggested: use UUID for collection identification only and relegate details to metadata.
- Follow-up plan - work to have UUIDs for granules/files and DOIs for data sets adopted as NASA standards
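The two identifier roles with the most technical content in the notes above can be sketched in a few lines of Python. This is a minimal illustration, not a method presented at the session: the function names and sample values are hypothetical, and hashing a sorted list of values is only one assumed way to approximate a "scientifically unique identifier". A real scheme would need a canonical form that respects array shape, precision, and metadata.

```python
import hashlib
import struct
import uuid


def new_granule_id() -> str:
    """Mint a random (version 4) UUID locally; no network or naming authority is needed."""
    return str(uuid.uuid4())


def content_fingerprint(values) -> str:
    """Hash the data values in a canonical form (assumption: sorted big-endian doubles).

    Because the hash is computed from the values rather than the file bytes,
    it does not change when the data is reformatted or reordered.
    """
    digest = hashlib.sha256()
    for v in sorted(float(x) for x in values):
        digest.update(struct.pack(">d", v))
    return digest.hexdigest()


granule_id = new_granule_id()

original = [271.3, 268.9, 275.0]       # hypothetical granule values
rearranged = [275.0, 271.3, 268.9]     # same values, different order
as_text = ["271.3", "268.9", "275.0"]  # same values, read back from a text format
tampered = [271.3, 268.9, 275.1]       # one value altered

print(granule_id)
print(content_fingerprint(original) == content_fingerprint(rearranged))
print(content_fingerprint(original) == content_fingerprint(as_text))
print(content_fingerprint(original) == content_fingerprint(tampered))
```

The UUID identifies a particular granule or file, while the fingerprint depends only on the contents, so two copies in different formats would share a fingerprint but could carry different UUIDs.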
4:00-5:30 Towards an Earth Science provenance/context content standard - Part I
- Review Earth science provenance/context requirements - Rama/John Moses
- Begin to develop a plan for creating the standard
Notes from Session
- Review Earth science provenance/context requirements
- Data streams are often broken down into archival units
- Controlled vocabulary needed for distinguishing data types (defines format, granularity, etc.)
- Data versioning is more complicated than software versioning - the same data from the same system but with different calibrations could have different version names
- How to distinguish all individual granules?
- Example: Using FOO satellite data, tag each granule with a UUID, then assign a DOI to the whole collection of granules
- Problem - corruption errors in archives result in deletion and replacement of the data, so the experiment can no longer be replicated
- Provenance information is retained for the original data, even though the data itself is deleted
- If corrupted data is remade, it gets a new UUID... how does the reproduced experiment get cited?
- Very messy problem - who made it? Does anyone deserve credit for reformatting it?
- Is it possible to make it reproducible or to make it citable?
- DOI and UUID have limitations
- So consider a "process on demand" Dataset and an ephemeral "data transformation" web service
- Can you look at data citations and determine if two researchers are using the same data granules?
- Begin to develop a plan for creating the standard
- Should the federation develop citation guidelines and best practices for the use of identifiers?
- Other organizations are already doing it; does ESIP need to as well?
- ESIP should explicitly clarify roles and functions of identifiers for the organizations creating standards.
- Establish principles on which the identity of data is assigned
- Proposals for citation can be measured against these criteria
- Be ready now to tell the scientific community how to cite data
- Enable citation to lead to a reproducibility standard
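The question raised above, whether two citations refer to the same data granules, can only be answered mechanically if each citation records granule identifiers alongside the collection identifier. The sketch below assumes exactly that: a hypothetical `DataCitation` record holding a collection DOI plus a set of granule UUIDs. The DOI string is a placeholder, and none of these names come from the session.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical citation record: one collection DOI plus the granule UUIDs used."""
    collection_doi: str
    granule_ids: frozenset


def shared_granules(a: DataCitation, b: DataCitation) -> frozenset:
    """Return the granule IDs cited by both records (empty if the collections differ)."""
    if a.collection_doi != b.collection_doi:
        return frozenset()
    return a.granule_ids & b.granule_ids


# Four granules in one collection, each tagged with its own UUID.
granules = [str(uuid.uuid4()) for _ in range(4)]
doi = "10.0000/hypothetical-collection"  # placeholder, not a registered DOI

first = DataCitation(doi, frozenset(granules[:3]))   # researcher A used granules 0-2
second = DataCitation(doi, frozenset(granules[2:]))  # researcher B used granules 2-3

overlap = shared_granules(first, second)
print(len(overlap))
```

A citation that carries only the collection DOI would pass the first check but could not show which granules overlap, which is the trade-off between citation clutter and reproducibility noted in the earlier session.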
CONCLUSIONS
- Identify roles and functions of identity
- Recommend which identifiers are appropriate
- Provide guidelines on how to cite ESIP data
- Develop guidelines with the recognition that this is an ongoing process that will continuously improve
- Who to work with for this?
- DataCite, among others
Wednesday, January 5, 2011
- 1:45-3:15 Towards an Earth Science provenance/context content standard - Part II
- Complete plan for standards development
- 3:45-5:15 Towards an Earth Science provenance/context ontology - Part I
- Review of various provenance-related standards and efforts
- OPM/OPMv - Hook - 15 min
- W3C incubator group Provenance Vocabulary Mappings - Curt
Thursday, January 6, 2011
- 10:30-12:00 Towards an Earth Science provenance/context ontology - Part II
- Refine use cases
- Complete plan to develop ES Provenance/Context Ontology
- 1:30-3:00 Cluster business meeting
- Chair/co-chair election - 15 min
- Summarize results and plans from sessions ~ 30 min
- Moving testbed activities forward ~ 30 min