Interagency Data Stewardship/Citations

From Earth Science Information Partners (ESIP)

This is a historical document from the first round of development of the ESIP Data Citation Guidelines.

Current guidelines are at

Current activity to revise the guidelines is now in a Google Doc.

Citation Guidelines

Data Citation Guidelines for Data Providers and Archives

Data Citation Guidelines for Data Users

Data Citation Guidelines for Journal Editors and Reviewers

Citation Notes

DCC How to Cite Datasets and Link to Publications

Another DCC Guide to Citing Datasets

GeoData 2011 Talk on Citations (some comments from RPI blog)

Updated Presentation based on feedback at GeoData 2011 and ESIP Meetings.

IPY Data Citation Guidelines

A community hub for progress notes on tool support for tracking citations to nontraditional products

Why the term 'Data Publication'?

Data Identifiers and Citations Enable Reproducible Science Curt Tilmes AGU 2011 slides

DataCite CrossRef DOI Citation Formatter

Citation Scenarios

Glacier Photo Collection

The physical items that might be cited are the developed photographs taken by many different photographers. Such physical objects are one kind of AIU. The "atomic" digital artifacts are the electronic images, mostly TIFF files, although one might also include JPEGs and a small number of images in other formats. In OAIS Reference Model (RM) terms, the electronic images are the AIUs.

As a scenario, suppose I wanted to use the photos in the collection to survey the change in glacier coverage or glacier size over the last century. If my understanding of the spreadsheet that Ruth sent me a year or so ago is correct, there are 200,000 digital images in the total collection. The earliest entry in the 10,000-item spreadsheet I received is a digitized image whose original photo was taken in 1898. About half of the images are connected with named glaciers, and the sampling of any particular named glacier is temporally sporadic. So a survey might include up to something like 100,000 images. If I wanted to do comparative geographic studies, I might select one or two smaller areas for a pilot study of how accurately I could determine changes in glacial area. To report on the pilot study in a way that would allow another investigator to verify or replicate my work, I think I'd need to cite which scanned images I had used in these two smaller samples. That's probably something like 10,000 references.

For understanding the pilot study, if I recall my earlier publications on radiative transfer in snow, there are a couple of important facts I'd need to keep in mind:

a. Snow reflection is not isotropic (in other words, more light is reflected in some directions than in others) and depends on both the zenith angle and azimuth of the Sun, on the cloud coverage, and on many other optical factors. Thus, if I were going to try to be quantitative, I'd need the position of the Sun, which could be derived if the date and UTC time of the photo were recorded.

b. I'd need to know the direction in which the optical axis of the camera was looking, as well as the location of the camera so I could deal with the geometry of reflection.

c. I'd need to know how well the camera was calibrated. (I spent about four years in graduate school taking telescope images on glass plates and developing them in the observatory dark room. I have a fairly pessimistic view of absolute calibration of photographic images.) Such a calibration would usually involve exposing the film (or glass plate) to a calibration source. I don't know if this was done on the images in the photo collection. I also don't know how the scanning of the original images was calibrated.

d. If the pilot investigation used relative brightness and area, then it would be necessary to say something about the algorithm being used.
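As an illustration of item (a), here is a minimal sketch of how the Sun's zenith angle and azimuth could be derived from a photo's recorded date, UTC time, and camera location. It is not part of the original notes: the function name `solar_zenith_azimuth` and its parameters are hypothetical, and the formulas are a common low-precision series approximation for the equation of time and solar declination, good to a fraction of a degree at best. It ignores atmospheric refraction, terrain, and cloud cover, so it only addresses the geometric part of the problem.

```python
import math
from datetime import datetime, timezone
from typing import Tuple


def solar_zenith_azimuth(when: datetime, lat_deg: float, lon_deg: float) -> Tuple[float, float]:
    """Approximate solar zenith and azimuth (degrees) for a time and place.

    lon_deg is positive east; azimuth is measured clockwise from north.
    Low-precision approximation (an assumption of this sketch).
    """
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime so UTC is unambiguous")
    utc = when.astimezone(timezone.utc)
    hours = utc.hour + utc.minute / 60 + utc.second / 3600
    day_of_year = utc.timetuple().tm_yday

    # Fractional year in radians.
    g = 2 * math.pi / 365 * (day_of_year - 1 + (hours - 12) / 24)

    # Equation of time (minutes) and solar declination (radians).
    eqtime = 229.18 * (0.000075 + 0.001868 * math.cos(g) - 0.032077 * math.sin(g)
                       - 0.014615 * math.cos(2 * g) - 0.040849 * math.sin(2 * g))
    decl = (0.006918 - 0.399912 * math.cos(g) + 0.070257 * math.sin(g)
            - 0.006758 * math.cos(2 * g) + 0.000907 * math.sin(2 * g)
            - 0.002697 * math.cos(3 * g) + 0.00148 * math.sin(3 * g))

    # True solar time (minutes) and hour angle (negative before solar noon).
    solar_minutes = hours * 60 + eqtime + 4 * lon_deg
    hour_angle = math.radians(solar_minutes / 4 - 180)

    lat = math.radians(lat_deg)
    cos_zenith = (math.sin(lat) * math.sin(decl)
                  + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    zenith = math.degrees(math.acos(max(-1.0, min(1.0, cos_zenith))))

    # atan2 gives azimuth measured from south; shift so 0 = north, 90 = east.
    azimuth = (math.degrees(math.atan2(
        math.sin(hour_angle),
        math.cos(hour_angle) * math.sin(lat) - math.tan(decl) * math.cos(lat),
    )) + 180) % 360
    return zenith, azimuth


# Hypothetical photo metadata: 21 June 2001, 12:00 UTC, at 65 N, 0 E.
zenith, azimuth = solar_zenith_azimuth(
    datetime(2001, 6, 21, 12, 0, tzinfo=timezone.utc), 65.0, 0.0)
print(f"zenith {zenith:.1f} deg, azimuth {azimuth:.1f} deg")
```

The sketch rejects datetimes without a time zone because a local time with no recorded offset cannot be converted to UTC reliably, which is exactly the metadata gap item (a) is concerned about.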

Note that the number of references is important to the practicality of citation, while the context material is needed for enumerating items that need preservation. Thus, this kind of material is needed for justifying preservation of particular items.

Citation Scenario Issues

  1. What kind of "atomic" item is being cited (choosing from a small list that should probably include at least the following: a data file, a data element within a file, a relational (or other) database, a job "residue")? [Note that what is sometimes called a "dataset" is a collection of these "atomic" items and should not be included in this list. If there's a need to get technical, I'll note that the OAIS Reference Model states that an archive consists of Archival Information Packages that are either Archival Information Collections (AICs) or Archival Information Units (AIUs). When I say "atomic" items, I mean AIUs. In this language, I'd interpret "granules" as being Dissemination Information Packages (DIPs) that have already been pre-packaged by an archive for managerial convenience. The DIPs would be made of AIUs.]
  2. How many "atomic" items are in a typical citation for the scenario being considered?
  3. In order to have the cited items be usable or useful, what other digital or physical objects need to be available?
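
The three questions above could be recorded per scenario in a small structured form. The following is only a sketch: the class and field names (`AtomicItemKind`, `CitationScenario`, and so on) are hypothetical and not part of any ESIP guideline, and the example values are taken from the Glacier Photo Collection scenario described earlier on this page.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class AtomicItemKind(Enum):
    """Kinds of "atomic" item (AIU) listed in issue 1."""
    DATA_FILE = "data file"
    DATA_ELEMENT = "data element within a file"
    DATABASE = "relational (or other) database"
    JOB_RESIDUE = "job residue"


@dataclass
class CitationScenario:
    name: str
    atomic_item_kind: AtomicItemKind   # issue 1
    typical_items_per_citation: int    # issue 2
    supporting_objects: List[str] = field(default_factory=list)  # issue 3


# Values below restate the glacier photo pilot study from this page.
glacier_pilot = CitationScenario(
    name="Glacier Photo Collection pilot study",
    atomic_item_kind=AtomicItemKind.DATA_FILE,
    typical_items_per_citation=10_000,
    supporting_objects=[
        "date and UTC time of each photo",
        "camera location and direction of the optical axis",
        "camera and scanning calibration",
        "description of the brightness/area algorithm",
    ],
)

print(glacier_pilot.name, glacier_pilot.typical_items_per_citation)
```

Keeping the item count next to the list of supporting objects reflects the point made in the scenario: the count bears on how practical citation is, while the supporting objects bear on what needs to be preserved.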