Talk:Interagency Data Stewardship/LifeCycle/Preservation Forum

From Earth Science Information Partners (ESIP)
Revision as of 08:55, January 25, 2009 by Bruce R. Barkstrom (Brb) (talk | contribs)

Some Additional Notes on Digital Preservation -- Bruce R. Barkstrom (Brb) 09:53, 25 January 2009 (EST)

In addition to the normal ESIP agency roster, the Library of Congress (LoC) and the National Archives and Records Administration (NARA) have been quite active in trying to deal with long-term information preservation. A couple of years ago, a fourteen-agency working group produced a report on the issues that preservation raises, including the issue of data accession policy.

Such a policy is of interest since both NOAA and NASA have been developing agency-specific policies regarding which data deserve long-term preservation.

I'll see if I can dig out a URL for the LoC and NARA recommendations.

Another interesting source of information on preservation is the InterPARES Project, whose discussion has more of an archivist flavor. I'll get references in a few days (they're also on the web; I just don't have them immediately handy).

As background, there are probably three or four major groups that have had long-running interests in digital preservation:
1. Librarians, who tend to bring a background strongly flavored by experience with text and, sometimes, with music or image collections;
2. Archivists, who tend to bring a background flavored by experience with preserving physical documents (a field that has developed "diplomatics" as a specialty dealing with how to know that a document actually comes from its purported source);
3. Semantic Web e-science groups, which tend to bring a background flavored by experience with biological databases.

It will be important to identify some of the distinctive characteristics of Earth science data that probably fall a bit outside these backgrounds, including:
a. Collective authorship
b. Huge collections (meaning millions of files and petabytes of data volume)
c. Special concerns with the ties between provenance (meaning the chain of custody and history of production) and data uncertainty (particularly regarding the detectability of trends and changes in extreme-value statistics)
d. Evolution of collections and metadata standards