Interagency Data Stewardship/LifeCycle/Preservation Forum/TeleconNotes/20100908

From Earth Science Information Partners (ESIP)

Data Preservation and Stewardship Telecon – September 8, 2010

Present

Ruth Duerr, Brian Rogan, Bruce Barkstrom, Gao Chen, Yuechen Chi, Curt Tilmes, Al Fleig, Mark Parsons, Rob Raskin, Bruce Vollmer, Bob Downs, Nancy Hoebelheinrich, John Moses


Agenda

Project status reports

Ruth mentioned that there was interest in forming a Provenance group and raised the question of how it might work in conjunction with this group. She has spoken with Rahul Ramachandran, who does not want to interfere with the work of this group but had thought there was a potential need for a separate cluster. For now, he is not pursuing the formation of the group.

IDs paper

Ruth is very close to finishing the IDs paper. It is nearly in draft form and should be ready to go out to the authors by next Friday. Interest in the paper is substantial, so it needs to get out soon.


IDs testbed activities

Nancy has figured out what needs to be done to query the SQL database and the component XML files, so we can determine how many components need images. It should be a straightforward process to finish the DOI work. There may be reasons for the DOIs to be more queryable; the descriptive information is very sparse at this point. Once the DOIs are inserted with the component, Nancy will write up the process of what was needed to complete it.

Nancy hopes to get to the other identifiers, but it has been difficult to get together with others. The other identifier schemes should be easier. UUIDs are next on the list. She is optimistic that she can get through the top four identifiers on the list. She doesn’t plan on taking this into other collections at this point, since this was what was agreed on.

Bruce asked how important PURLs are on the list; they are currently number two. ARKs are favored since they seem to be gaining market share. ARK fits into the URI interface very easily and solves many of the problems that PURLs have.

Ideally it would be good to do the entire set, but there may be time constraints. Nancy will press ahead with the list that was created, report on her progress, and see what direction the group wants to go.

Yuechen asked about the data. It turned out that Ruth had not given him the spreadsheet; she will follow up and send it to him.


ESIP stewardship principles and best practices draft

Ruth reminded everyone that the best practices drafts are up on the wiki and that they will be presented to the larger group at the winter meeting. Now is a good time to review the wiki and suggest any changes or updates.


Provenance papers

Curt will try to get a draft out by next Friday now that his schedule has been freed up. Mark gave a review of the IT&I telecon and the presentation by Beth Plale. She discussed the work her group was doing on Sea Ice and its relation to provenance. The focal point was scientific reproducibility. Several NASA grants related to Provenance have been awarded. It had been hoped that Rahul would be able to call in to the group.

The upcoming January meeting would be a good place to have more time to discuss Provenance, since there usually isn’t enough time at the ESDWG meetings to discuss this topic. The IT&I group should be invited to work together with us, since their work closely parallels ours.

Mark mentioned that there is a list of new PIs who were awarded ACCESS grants and who should be invited to work together at the upcoming winter meeting.

A new provenance cluster will not be formed. Instead, IT&I will work together with the preservation group, since some of its work parallels the work being done here.

There was a broad discussion of what Provenance is all about. If Rahul and Hook were interested in a new group, would there be enough interest left in Preservation and Stewardship if some of it were diverted? It was noted that there are clearly a large number of topics that could be covered in this group even if Provenance were spun off in a new direction.

Ruth mentioned that her term is up as chair of the group and encouraged anyone who was interested to take up the task of chairing the group.


AGU preparations

There will be a data management group led by Carol, related to data management planning. It will happen either Sunday or Wednesday. We anticipate a larger draw. There will be a series of telecons to plan for the session. Anyone who is interested may call in.


January meeting preparations

There will be a session devoted to the future of this group and other related clusters. This could be an hour or so determining how this cluster should continue.

There should be another provenance discussion with the goal of determining how the field has advanced over the last year and a half. Rob suggested that provenance might be the theme of the meeting. There was a consensus that this was a good idea, but it was thought that Provenance might be too soon as a theme for this meeting; perhaps the meeting after? A review of each of the provenance technologies out there could be a useful session.

Ruth noted that Rama had asked her about the potential for development of a provenance and context standard for the Earth Sciences (e.g., building off the Hunolt paper).

Another suggestion was to review the existing provenance ontologies for the semantic web, differences, similarities, gaps, etc.

One thought was to start off with Curt giving an overview of the state of the art in provenance followed by each of the funded NASA ACCESS projects talking about their progress and issues.

Specific questions that might be tackled in these sessions include:

  • Which standards/models should be used?
  • What testbed activities might help?
  • What to do about domain-specific needs (e.g., vocabularies)?

Another session would be to continue with identifiers and determine the next section to work on. For example, in the first go around we've only considered data sets as a whole and individual items (granules) within a data set. What other things need identifiers?

Mark mentioned two groups in particular who are working on data:

  • DataCite, a consortium of universities and libraries, has put out a metadata schema for citation and is looking for comments. Mark will send out the URL to the group. They are coming from
  • GEOS has been following our work and has a task on its agenda specifically on data citation and peer review. They wanted someone from ESIP to go, and Mark will be attending the meeting. It is mostly focused on data quality.

There was significant discussion of the definition of Provenance. It was agreed that although the discussion was timely, it was one for a later date, most probably the upcoming winter meeting.

The meeting finished at 4:10 PM EST.