Interagency Data Stewardship/Citations/ipy guidelines

From Earth Science Information Partners (ESIP)

How to Cite a Data Set

Updated: 19 June 2008

"To recognize the valuable role of data providers (and scientists who collect or prepare data) and to facilitate repeatability of IPY experiments in keeping with the scientific method, users of IPY data must formally acknowledge data authors (contributors) and sources. Where possible, this acknowledgment should take the form of a formal citation, such as when citing a book or journal article. Journals should require the formal citation of data used in articles they publish..."

"IPY Data Policy"(PDF)

Proper citation of data sets gives data providers and publishers appropriate credit for their efforts, improves the perception of data management as a discipline, and makes it easier to track the use and impact of the data. In scientific publication, merely acknowledging the data set in the text or in the acknowledgments section is insufficient. These guidelines can help data users develop appropriate citations for data used in their publications, and can help data managers recommend appropriate citation of their holdings. These guidelines were adapted from internal guidelines used by the National Snow and Ice Data Center, which has encouraged formal data citation for more than a decade.

In general, data sets should be cited like books. The examples here use the author-date system described in "Chicago Manual of Style, 15th Edition". Users must follow the citation style dictated by their publishers, but by providing an example, data publishers can show users all the important elements to include when citing a data set.

An example of a citation in the author-date system is:

Algire, G. H., and F. T. Legallais. 1948. Biology of Melanomas. ed. R. W. Miner. New York: New York Academy of Sciences.

As seen in this example, the elements of the citation in order are: Author(s). Date. Title. Editor. Place of Publication. Publisher. All these elements are common in data set citations, but other elements, as described below, are commonly used as well. Data publishers (e.g. data centers) have a responsibility to work with data providers and science teams to develop the actual content of the citation.
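The element order above can be sketched in a few lines of Python. This is only an illustration, not part of the guidelines: the function name `format_citation` and its parameters are hypothetical, and the sketch assumes each element is already written in the form it should appear in print.

```python
from typing import Optional


def format_citation(
    authors: str,
    date: str,
    title: str,
    place: str,
    publisher: str,
    editor: Optional[str] = None,
) -> str:
    """Join citation elements in author-date order; skip the editor if absent."""
    parts = [authors, date, title]
    if editor:
        parts.append(editor)
    # Place and publisher form a single "Place: Publisher" element.
    parts.append(f"{place}: {publisher}")
    # End every element with exactly one period, even if it already has one.
    return " ".join(part.rstrip(".") + "." for part in parts)


print(format_citation(
    authors="Algire, G. H., and F. T. Legallais",
    date="1948",
    title="Biology of Melanomas",
    editor="ed. R. W. Miner",
    place="New York",
    publisher="New York Academy of Sciences",
))
```

The call reproduces the book example above. A data set citation would typically append further elements, such as the distribution medium or access information, after the publisher.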

Citation Content

The citation should include the following elements as appropriate. Although this is shown as a literary citation, most of the elements are captured in standard metadata. A mapping to the "Citation Information" section of the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) ("FGDC-STD-001-1998") is indicated:

Author or investigator: idinfo > citation > citeinfo > "origin"
Publication date: idinfo > citation > citeinfo > "pubdate" and sometimes "othercit"
Title: idinfo > citation > citeinfo > "title" and possibly "edition"
Dates used: not applicable
Editor or compiler: idinfo > citation > citeinfo > "origin"
Publication place: idinfo > citation > citeinfo > "pubplace"
Publisher: idinfo > citation > citeinfo > "publish"
Distributor or associate publisher: idinfo > citation > citeinfo > "othercit"
Distribution medium or location: idinfo > citation > citeinfo > "othercit" or "onlink"
Access date: not applicable
Data within a larger work: idinfo > citation > citeinfo > "othercit" or "lworkcit"
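As a rough illustration of how this mapping might be used, the following Python sketch pulls the mapped elements out of a CSDGM-style XML record with the standard library. The `SAMPLE` record is hypothetical (built from the Oberbauer example below), and `citation_elements` is an illustrative name. The sketch also assumes that "pubplace" and "publish" may sit inside a "pubinfo" wrapper element, so it checks both locations.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal CSDGM-style record, made up for illustration only.
SAMPLE = """<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <origin>Oberbauer, S.</origin>
        <pubdate>2000</pubdate>
        <title>Ecosystem carbon fluxes, Toolik Lake, Alaska 1995</title>
        <pubinfo>
          <pubplace>Boulder, Colorado USA</pubplace>
          <publish>National Snow and Ice Data Center</publish>
        </pubinfo>
        <onlink>http://nsidc.org/data/arcss006.html</onlink>
      </citeinfo>
    </citation>
  </idinfo>
</metadata>"""

# Tags named in the mapping above ("lworkcit" is left out of this sketch).
TAGS = ["origin", "pubdate", "title", "edition",
        "pubplace", "publish", "othercit", "onlink"]


def citation_elements(xml_text: str) -> dict:
    """Return {tag: [values]} for the citation tags present in the record."""
    root = ET.fromstring(xml_text)
    citeinfo = root.find("idinfo/citation/citeinfo")
    if citeinfo is None:
        return {}
    found = {}
    for tag in TAGS:
        # Assumption: a tag may be a direct child or nested under "pubinfo".
        nodes = citeinfo.findall(tag) + citeinfo.findall(f"pubinfo/{tag}")
        values = [n.text.strip() for n in nodes if n.text and n.text.strip()]
        if values:
            found[tag] = values
    return found


for tag, values in citation_elements(SAMPLE).items():
    print(tag, "->", values)
```

Dates used and access date have no metadata equivalent, so they would still have to be supplied by the person writing the citation.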

Author or Investigator

This is the individual(s) whose intellectual work, such as a particular field experiment or algorithm, led to the creation of the data set.

Oberbauer, S. 2000. Ecosystem carbon fluxes, Toolik Lake, Alaska 1995. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at "http://nsidc.org/data/arcss006.html".

A particular group or organization may sometimes be the author.

Arctic Climatology Project. 2000. Environmental Working Group Arctic meteorology and climate atlas. Edited by F. Fetterer and V. Radionov. Boulder, Colorado USA: National Snow and Ice Data Center. CD-ROM.

If the data set is a collection of several smaller, independent data sets, the individual data sets would have their own specific citations with author, but the whole collection would not have an author. The collection would likely have an editor or compiler, though.

Cross, M. compiler. 1997. Greenland summit ice cores. Boulder, Colorado USA: National Snow and Ice Data Center in association with the World Data Center A for Paleoclimatology at NOAA-NGDC, and the Institute of Arctic and Alpine Research. CD-ROM.

Publication Date

For a completed data set, the publication date is simply the year of release.

Helmig, D. 2004. Vertical Boundary Layer Profiles for Ozone and Meteorological Parameters at Summit, Greenland, 2000. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at "http://nsidc.org/data/arcss100.html".

For a data set that is updated infrequently or on an irregular basis, list the first year of publication followed by "updated" with the current update information. This is appropriate when the title or version of the data set does not change and the data are simply updated.

Osterkamp, T. 1999, updated 2001. Daily air and active layer temperatures from permafrost observatories in Alaska, 1986-2001. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at "http://nsidc.org/data/arcss106.html".

For an ongoing data set that is updated on a regular or continual basis, list the first year of publication followed by the update frequency (for example, "updated quarterly" or "updated daily"). Updates could occur annually or more frequently.

Maslanik, J. and J. Stroeve. 1999, updated quarterly. DMSP SSM/I daily polar gridded brightness temperatures, Jan. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at "http://nsidc.org/data/nsidc-0001.html".


Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007, updated daily. MODIS/Aqua Sea Ice Extent 5-Min L2 swath 1km V005, Oct. 2007–Apr. 2008. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at "http://nsidc.org/data/myd29v5.html".

A note on updates vs. new versions:

Ongoing updates to a time series do change the content of the data set, but they do not typically constitute a new version or edition of a data set. New versions typically reflect changes in sampling protocols, algorithms, quality control processes, etc. Both a new version and an update may be reflected in the publication date. The title should indicate the new version.

Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2006, updated daily. MODIS/Terra snow cover Extent 5-Min L2 swath 1km V005, Oct. 2007–Apr. 2008. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at "http://nsidc.org/data/myd29v5.html".

Hall, D. K., G. A. Riggs, and V. V. Salomonson. 2000, updated daily. MODIS/Terra snow cover 5-Min L2 swath 500m V004, Oct. 2007–Apr. 2008. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at "http://nsidc.org/data/myd29v5.html".

If a particular version of a time series is discontinued, it is appropriate to indicate when the final update occurred.

Hall, D. K., G. A. Riggs, and V. V. Salomonson. 2000, updated 2002. MODIS/Terra snow cover 5-Min L2 swath 500m V003, Jan. 2001–Apr. 2001. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at "http://nsidc.org/data/myd29v5.html".
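The publication date patterns used in this section can be summarized in a small Python sketch. The function name `publication_date` and its `update` parameter are hypothetical; the sketch simply treats the update information as free text, which is how it appears in the examples.

```python
from typing import Optional


def publication_date(first_year: int, update: Optional[str] = None) -> str:
    """Return the date element: the release year, plus update information if any.

    `update` is free text such as "2001", "quarterly", or "daily".
    """
    if update is None:
        # Completed data set: the year of release is enough.
        return str(first_year)
    # Updated data set: first year of publication, then the update information.
    return f"{first_year}, updated {update}"


print(publication_date(2004))               # completed data set
print(publication_date(1999, "2001"))       # infrequent or irregular update
print(publication_date(1999, "quarterly"))  # ongoing, regularly updated
print(publication_date(2000, "2002"))       # discontinued version, final update
```

A new version is not handled here because, as noted above, it belongs in the title rather than in the date.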

Title

This is the formal title of the data set. It may also include version or edition information.

Liu, H., K. Jezek, B. Li, and Z. Zhao. 2001. Radarsat Antarctic Mapping Project digital elevation model version 2. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at "http://nsidc.org/data/nsidc-0082.html".

Dates Used

For time series, especially continually updated time series, indicate which dates of data were used. Note this is distinct from the publication date.

Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2006, updated daily. MODIS/Terra snow cover Extent 5-Min L2 swath 1km V005, Oct. 2007–Apr. 2008. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at "http://nsidc.org/data/myd29v5.html".

Editor or Compiler

An editor is the person or team who is responsible for creating a value-added and possibly quality-controlled product from the data. In cases where there is minimal scientific or technical input, yet still substantial effort in compiling the product, the person may be more correctly cited as a compiler. Editors and compilers may often be responsible for a larger work that includes an individual author's data set. Occasionally, there may be both a compiler and editor. Some products will have neither.

Armstrong, R., J. Francis, J. Key, J. Maslanik, T. Scambos, and A. Schweiger. 1998. Polar Pathfinder sampler: Combined AVHRR, SMMR-SSM/I, and TOVS time series and full-resolution samples. Compiled by S. Khalsa. Boulder, CO, USA: National Snow and Ice Data Center. CD-ROM.

Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, updated July 2004. CLPX-Ground: ISA snow pit measurements. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at "http://nsidc.org/data/nsidc-0176.html".

Bockheim, J. 2003. "University of Wisconsin Antarctic Soils Database". In International Permafrost Association Standing Committee on Data Information and Communication (comp.). 2003. Circumpolar Active-Layer Permafrost System, Version 2.0. Edited by M. Parsons and T. Zhang. Boulder, CO: National Snow and Ice Data Center/World Data Center for Glaciology. CD-ROM.

When there is an editor or compiler but no author, the editor or compiler is listed first.

Publication Place

This is the city, state (when necessary), and country of the publisher.

Cavalieri, D., C. Parkinson, P. Gloersen, and H. J. Zwally. 1996, updated 2006. Sea ice concentrations from Nimbus-7 SMMR and DMSP SSM/I passive microwave data, March 2002–Sept. 2003. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at "http://nsidc.org/data/nsidc-0051.html".

Publisher

The publisher is whoever published the data set. A publisher often has an implied responsibility for stewardship of the data set. This is usually a data center and is written immediately after the place.

Cavalieri, D., C. Parkinson, P. Gloersen, and H. J. Zwally. 1996, updated 2006. Sea ice concentrations from Nimbus-7 SMMR and DMSP SSM/I passive microwave data, March 2002–Sept. 2003. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at "http://nsidc.org/data/nsidc-0051.html".

Distributor or Associate Publisher

Include this element only when the distributor or associate publisher differs from the publisher, which is rare. Write it in the same manner as the publisher. Sometimes NSIDC acts as a simple distributor; sometimes it is an associate publisher; sometimes others are associate publishers.

Environmental Working Group. 2000. Environmental Working Group: Joint U.S.-Russian Arctic sea ice atlas. Ann Arbor, MI: Environmental Research Institute of Michigan; distributed by the National Snow and Ice Data Center. CD-ROM.

Cross, M. compiler. 1997. Greenland summit ice cores. Boulder, CO: National Snow and Ice Data Center in association with the World Data Center A for Paleoclimatology at NOAA-NGDC, and the Institute of Arctic and Alpine Research. CD-ROM.

Distribution Medium and Location

If there is one fixed medium, such as a CD-ROM or DVD, list it.

International Permafrost Association Standing Committee on Data Information and Communication (comp.). 2003. Circumpolar Active-Layer Permafrost System, Version 2.0. Edited by M. Parsons and T. Zhang. Boulder, CO: National Snow and Ice Data Center/World Data Center for Glaciology. CD-ROM.

If data are available over the internet or through multiple digital media options, it is best to include a reference to the location of the data. Often this is a standard URL.

Cavalieri, D., C. Parkinson, P. Gloersen, and H. J. Zwally. 1996, updated 2006. Sea ice concentrations from Nimbus-7 SMMR and DMSP SSM/I passive microwave data, March 2002–Sept. 2003. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at "http://nsidc.org/data/nsidc-0051.html".

Ideally, a persistent identifier such as a Digital Object Identifier should be used.

König-Langlo, Gert and Hartwig Gernandt. 2006. Compilation of radiosonde data from the Antarctic Georg-Forster station of the German Democratic Republic from 1985 to 1992. Bremerhaven, Germany: Alfred Wegener Institute for Polar and Marine Research. Data set accessed 2008-05-22. doi:10.1594/PANGAEA.547983

Access Date

Because data can be dynamic and changeable in ways that are not always reflected in publication dates and versions, it is important to indicate when on-line data were accessed. It is not necessary to indicate an access date for a fixed medium like a DVD.

Cavalieri, D., C. Parkinson, P. Gloersen, and H. J. Zwally. 1996, updated 2006. Sea ice concentrations from Nimbus-7 SMMR and DMSP SSM/I passive microwave data, March 2002–Sept. 2003. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at "http://nsidc.org/data/nsidc-0051.html".
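The rules for distribution medium, location, and access date can be combined in one short Python sketch. The function name `location_clause` and its parameters are hypothetical, and the output wording simply copies the examples in these guidelines. The sketch assumes a fixed medium needs no access date and that a DOI, when available, is preferred over a URL.

```python
from datetime import date
from typing import Optional


def location_clause(
    medium: Optional[str] = None,
    url: Optional[str] = None,
    doi: Optional[str] = None,
    accessed: Optional[date] = None,
) -> str:
    """Build the trailing medium/location/access part of a data citation."""
    if medium:
        # Fixed medium (CD-ROM, DVD): no access date is needed.
        return f"{medium}."
    if not (doi or url):
        raise ValueError("give a medium, a URL, or a DOI")
    if accessed is None:
        raise ValueError("online data need an access date")
    stamp = accessed.isoformat()  # YYYY-MM-DD, as in the examples
    if doi:
        # Persistent identifier preferred when one exists.
        return f"Data set accessed {stamp}. doi:{doi}"
    return f'Data set accessed {stamp} at "{url}".'


print(location_clause(medium="CD-ROM"))
print(location_clause(url="http://nsidc.org/data/nsidc-0051.html",
                      accessed=date(2008, 5, 14)))
print(location_clause(doi="10.1594/PANGAEA.547983",
                      accessed=date(2008, 5, 22)))
```

Raising an error when online data lack an access date is a design choice of this sketch; it mirrors the guidance that the access date is important for on-line data.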

Data Within a Larger Work

A particular data set may be part of a compilation, in which case it is appropriate to cite the data set somewhat like a chapter in an edited volume.

Bockheim, J. 2003. "University of Wisconsin Antarctic Soils Database". In International Permafrost Association Standing Committee on Data Information and Communication (comp.). 2003. Circumpolar Active-Layer Permafrost System, Version 2.0. Edited by M. Parsons and T. Zhang. Boulder, CO: National Snow and Ice Data Center/World Data Center for Glaciology. CD-ROM.
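The chapter-like pattern in this example can be sketched as follows. The function name `cite_within_larger_work` is hypothetical, and the sketch assumes the citation of the larger work has already been written out in full.

```python
def cite_within_larger_work(
    author: str, year: str, title: str, parent_citation: str
) -> str:
    """Cite a data set as a part of a larger work, like a chapter in a book."""
    # The data set title is quoted; the parent citation follows "In".
    return f'{author.rstrip(".")}. {year}. "{title}". In {parent_citation}'


parent = (
    "International Permafrost Association Standing Committee on Data "
    "Information and Communication (comp.). 2003. Circumpolar Active-Layer "
    "Permafrost System, Version 2.0. Edited by M. Parsons and T. Zhang. "
    "Boulder, CO: National Snow and Ice Data Center/World Data Center for "
    "Glaciology. CD-ROM."
)

print(cite_within_larger_work(
    "Bockheim, J.", "2003",
    "University of Wisconsin Antarctic Soils Database", parent,
))
```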

Increasingly, publishers are allowing data supplements to be published along with peer-reviewed research papers. When using a data supplement, one need only cite the parent reference. For example, when using the data at "doi:10.1594/PANGAEA.476007", the following reference is appropriate.

Stein, Ruediger, Bettina Boucsein, and Hanno Meyer. 2006. "Anoxia and high primary production in the Paleogene central Arctic Ocean: first detailed records from Lomonosov Ridge." Geophysical Research Letters, 33: L18606. doi:10.1029/2006GL026776.