Data Management Workshop

From Earth Science Information Partners (ESIP)

Background

The ESIP Federation, in cooperation with NOAA, seeks to share the community's knowledge with scientists who increasingly need to be better data managers. Over the next several years, the ESIP Federation expects to develop training courses that improve the understanding of scientific data management among scientists and emerging scientists. Initially, a 1.5-hour workshop will be held at the 2010 Fall Meeting of the American Geophysical Union (AGU). The workshop may form the basis for an online course. Short courses, certificate programs, and university courses may be developed in the future. The AGU workshop is scheduled for Tuesday, Dec. 14, from 1200h to 1330h in Moscone South Rooms 228-230.

Advisory Team

  • Dave Anderson, NOAA/NCDC
  • Ken Casey, NOAA/NODC
  • Bob Cook, ORNL
  • Ruth Duerr, NSIDC/Chair, ESIP Data Preservation & Stewardship Cluster
  • Peter Fox, Rensselaer Polytechnic Institute, AGU Geoinformatics
  • Ted Habermann, NOAA/NGDC
  • Patricia Huff, NOAA/NESDIS
  • Carol Meyer, ESIP staff
  • Nancy Ritchey, NOAA/NCDC
  • Ron Weaver, NSIDC

AGU Workshop Description

Writing Your Data Management Plan

Whether you need to include a data management plan in your NSF proposal, want to make data exchange in your field as transparent as possible, or simply aim to maximize the visibility of your science on the Internet, this workshop is for you. Earth scientists face increasing pressure to share their results not just in journals but in many other settings. Data produced, sometimes long ago, for one purpose are now being successfully applied to emerging problems in entirely different disciplines. A concrete data management plan developed early in your research project can make you and your data more visible and more successful, and can increase the impact of your science.

In this workshop, sponsored by Earth Science Information Partners (ESIP), representatives from NOAA, NASA, and other data archive centers will provide an overview of successful data stewardship, examine emerging standards and trends, and offer concrete steps for managing your Earth science data. We will present our roadmap for meeting the recently distributed NSF data management requirement. We will conclude with a question and answer session. Workshop duration is 1.5 hours.

Outline

1. Introduction (20 min total)

  • Welcome/Goals (Anderson)
  • Data preservation and climate change (video) (Tom Karl, Climate Service)
  • Return on your investment (video) (Peter Fox)
  • What not to do (video) (Anderson)
  • Remarks from NSF (video) (Cliff Jacobs, Geosciences Directorate)
  • Outline (what this is, and is not) (Anderson)

2a. Elements of a data management plan (30 min). Presenter: Ruth Duerr. The material is adapted from the DCC and other sources and written from the scientist's perspective rather than the archive's, aiming to capture the essential elements common to any plan without being prescriptive. Ruth leads development of this material, with Ron and Bob, and possible help from Nancy Ritchey. Resource: Digital Curation Centre (UK) checklist. DataONE and the Data Conservancy are creating similar checklists.

  • Identify the materials that will be created
  • Standards and organization
  • Access, sharing, and re-use
  • Backups, archiving, and preservation

2b. Questions (10 min). The panel includes Ron Weaver (NSIDC), Ruth Duerr (NSIDC and Data Conservancy), Ken Casey (NODC), Bob Cook (ORNL DAAC and DataONE), and Cliff Jacobs (Geosciences Directorate, NSF).

3a. Long-term archive topics (20 min)

  • What data goes to a long-term archive, and what does not? (Weaver)
  • What do long-term archives do with my data? (Casey)
  • Role of metadata (descriptions) in discovery and future use (Cook)
  • Big payoff (video) (Fox)

3b. Second question period (10 min)

HANDOUT: A one-page summary with URLs to more information.

Links

Questions for discussion session

-My data is already on my web site. Why do I have to use a long-term archive?

-I intend to state in my NSF Data Management Plan that, if funded, I will send my data to the National Data Center. But how do I actually do that?

-How do I know/determine the necessary metadata for my dataset?

-How can archives promise to make data available forever?

-I hate metadata. What is the minimum information I need to provide with my data?

-What is a submission agreement?

-When do I have to make my data available to the public?

-What is the cloud and how does it relate to data sharing and long term preservation?

-What is a DOI?