Difference between revisions of "Main Page/Start here"

From Earth Science Information Partners (ESIP)
m (1 revision imported)
 
(48 intermediate revisions by 4 users not shown)
Line 1: Line 1:
<big><big>'''Test page for the Editor's Roundtable - an Initiative for Best Practices for Data Publication
+
<big><big>'''A Wiki for the Best Practices in Data Publication in the Earth and Environmental Sciences
 
'''</big></big>
 
'''</big></big>
  
== What is the Editor's Roundtable? ==  
+
== Goals and Objectives ==  
The [http://www.earthchem.org/editors/ '''Editor's Roundtable'''] is a community-based initiative, conceived by editors, publishers, and operators of data facilities at a series of workshops organized by [http://www.iedadata.org/ IEDA] (Integrated Earth Data Applications) and [http://www.earthchem.org/ EarthChem] at major scientific conferences. It is an effort to foster and facilitate communication and knowledge exchange among editors and publishers of Earth Science journals as well as data facilities. Our goal is to develop and promote best practices for scholarly publishing, with an emphasis on data publication in support of open access policies.
+
The best practices for data publication initiative builds on a successful effort started in 2007 by [http://www.earthchem.org/ EarthChem] to develop and promote best practices for the reporting of geochemical data in scholarly articles and data systems. It is a community-based initiative, conceived by editors, publishers, and operators of data facilities at a series of workshops organized by [http://www.iedadata.org/ IEDA] (Integrated Earth Data Applications) and [http://www.earthchem.org/ EarthChem] at major scientific conferences. It is an effort to foster and facilitate communication and knowledge exchange among editors and publishers of Earth Science journals as well as data facilities. Our goal is to develop and promote best practices for scholarly publishing, with an emphasis on data publication in support of open access policies.
  
The [http://www.earthchem.org/editors/ '''Editors Roundtable'''] builds on a successful initiative started in 2007 by [http://www.earthchem.org/ EarthChem], to develop and promote best practices for the reporting of geochemical data in scholarly articles and data systems, and that resulted in a Policy Recommendation [http://www.earthchem.org/library/browse/view?id=735/ '''Requirements for the Publication of Geochemical Data'''] (Goldstein et al. 2014, doi:10.1594/IEDA/100426), which was endorsed by all major scientific journals that publish geochemical data and has guided policies for the disclosure and documentation of geochemical data.
+
== Types of Data/Discipline in the Earth Sciences ==
  
== Goals and Objectives ==
+
Geoscience/Earth Science:
# Archiving and curating data sets;
+
 
# Setting references and identifiers;
+
Atmospheric and space weather science
# Linking datasets to publications;
+
*  Aeronomy
# Integrating with emerging data citation practices and bibliometrics for data;
+
*  Astronomy
# Complying with interoperability standards.
+
*  Atmospheric chemistry
 +
*  Air quality
 +
*  Atmospheric modeling
 +
*  Atmospheric physics
 +
*  Atmospheric science
 +
*  Heliophysics
 +
*  Meteorology
 +
*  Radio astronomy
 +
*  Space physics
 +
*  Space weather
 +
 
 +
Biology and ecology
 +
*  Biodiversity studies
 +
*  Ecosystem studies
 +
*  Geobiology
 +
*  Marine microbiology
 +
*  Microbial studies
 +
*  Population studies
 +
 
 +
Climate science
 +
 
 +
*  Climate simulation
 +
*  Critical zone science
 +
*  Agronomy
 +
 
 +
 
 +
Geology
 +
*  Coastal processes
 +
*  Geochronology
 +
*  Geomorphology
 +
*  Glaciology
 +
*  Paleobiology
 +
*  Paleoclimate
 +
*  Paleomagnetism
 +
*  Paleontology
 +
*  Sedimentology
 +
*  Stratigraphy
 +
*  Structural geology
 +
*  Tectonics
 +
 
 +
Geochemistry
 +
*  Biogeochemistry
 +
*  Petrology
 +
*  Volcanology
 +
 
 +
Geophysics
 +
*  Geodesy
 +
*  Geodynamics
 +
*  Geophysics
 +
*  Marine geophysics
 +
*  Seismology
 +
 
 +
Hydrology
 +
*  Hydrogeosciences
 +
*  Hydrology
 +
*  Limnology
 +
*  Watershed dynamics
 +
*  Water resources
 +
*  Water cycles
 +
 
 +
Oceanography
 +
*  Biological oceanography
 +
*  Chemical oceanography
 +
*  Coastal oceanography 
 +
*  Ocean biogeochemistry
 +
*  Oceanography
 +
*  Oceanography modeling
 +
*  Paleoceanography
 +
*  Physical oceanography
 +
 
 +
Physical geography
 +
*  Geospatial Polar
 +
*  Cryosphere
 +
*  Polar glaciology
 +
*  Polar oceanography
 +
*  Polar atmospheric science
 +
*  Polar space physics
 +
*  Polar geology
 +
*  Polar geophysics
 +
*  Polar geochemistry
 +
*  Polar biology and ecology
 +
*  Antarctic astronomy
 +
*  Polar engineering
 +
 
 +
Social science
 +
*  Anthropology
 +
*  Economics
 +
*  Human geography
 +
*  Institutional/organizational science
 +
*  Psychology
 +
*  Sociology
 +
 
 +
Other science
 +
*  Interdisciplinary geoscience
 +
*  Earth system modeling
 +
 
 +
Information/Cyber Science & Technology:
 +
Computer Science
 +
*  Algorithms
 +
*  Computer science
 +
*  Computational science
 +
*  Geographic information science
 +
*  Modeling
 +
*  Numerical modeling
 +
*  Risk modeling
 +
*  Visualization sciences
 +
 
 +
Cyberinfrastructure
 +
 
 +
*  Cyberinfrastructure and hardware engineer
 +
*  Cyberinfrastructure software engineer and programmer
 +
*  High performance computing
 +
*  Informatics/information systems
 +
*  System design
 +
 
 +
 
 +
Data manager/data services
 +
*  Data management
 +
*  Data science
 +
*  Data services/assimilation
 +
*  Disaster assessment
 +
*  Remote sensing
 +
*  Satellite processing
 +
*  System integration
 +
 
 +
== Recommended Practices for Data Publication ==
 +
Using the discipline of geochemistry as an example:
  
== Recommended practices for data submission ==
 
 
* '''Data Accessibility and Format'''
 
* '''Data Accessibility and Format'''
   Access to the complete data is a fundamental requirement for the reproducibility of scientific results.  
+
   '''Access to the complete data is a fundamental requirement for the reproducibility of scientific results.'''
 
All NEW geochemical data used in a publication must be made available for future use by:  
 
All NEW geochemical data used in a publication must be made available for future use by:  
 
# submission to an accessible, persistent source such as a public database or data archive (for example, personal web sites are not persistent data archives), if it exists for the specific data type, or by  
 
# submission to an accessible, persistent source such as a public database or data archive (for example, personal web sites are not persistent data archives), if it exists for the specific data type, or by  
 
# listing the data explicitly in a data table associated with the publication.  
 
# listing the data explicitly in a data table associated with the publication.  
   The data must be available in downloadable format.  
+
   The data must be available in '''downloadable format'''.  
 
For chemical abundance data of samples, elemental or oxide abundance data must be given unless a compelling reason can be provided; elemental abundance ratios are acceptable only if the compositional data do not exist. Isotope ratios are, of course, acceptable.  
 
For chemical abundance data of samples, elemental or oxide abundance data must be given unless a compelling reason can be provided; elemental abundance ratios are acceptable only if the compositional data do not exist. Isotope ratios are, of course, acceptable.  
   Data should be reported in a tabular format.  
+
   Data should be reported in a '''tabular format'''.  
 
Data must always be available as a downloadable file in a format that can be easily converted into spreadsheet format (for example, .csv, .txt). The file should include units for the listed measured values. This means that if a publication contains a data table in the main text or a pdf or image version of the data table as an electronic supplement, the data in the table(s) must also be available in a downloadable form that can be easily converted to a spreadsheet.
 
Data must always be available as a downloadable file in a format that can be easily converted into spreadsheet format (for example, .csv, .txt). The file should include units for the listed measured values. This means that if a publication contains a data table in the main text or a pdf or image version of the data table as an electronic supplement, the data in the table(s) must also be available in a downloadable form that can be easily converted to a spreadsheet.
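
The requirement above (a downloadable file that converts easily to a spreadsheet and lists units for the measured values) can be illustrated with a minimal Python sketch. The column names, units, and sample values below are hypothetical placeholders, not real measurements, and the "name (unit)" header layout is only one possible convention, not one prescribed by this policy.

```python
import csv
import io

# Hypothetical columns as (name, unit) pairs; an empty unit means "no unit".
COLUMNS = [("sample_name", ""), ("SiO2", "wt%"), ("MgO", "wt%"), ("Sr", "ppm")]

# Placeholder rows for illustration only, not real measurements.
ROWS = [
    {"sample_name": "EXAMPLE-001", "SiO2": 49.8, "MgO": 7.6, "Sr": 120},
    {"sample_name": "EXAMPLE-002", "SiO2": 51.2, "MgO": 6.9, "Sr": 135},
]


def header_with_units(columns):
    """Build header labels such as 'SiO2 (wt%)' so units travel with the data."""
    return [f"{name} ({unit})" if unit else name for name, unit in columns]


def to_csv(columns, rows):
    """Serialize rows to CSV text with a single header line that carries the units."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header_with_units(columns))
    for row in rows:
        writer.writerow([row[name] for name, _ in columns])
    return buffer.getvalue()


csv_text = to_csv(COLUMNS, ROWS)
print(csv_text)
```

Keeping the unit in the header cell means the file still opens as a plain table in any spreadsheet program while the units stay attached to each column.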
 +
  
 
* '''Data Quality Information'''
 
* '''Data Quality Information'''
   Proper documentation of data quality is essential.
+
   '''Proper documentation of data quality is essential.'''
Proper documentation of data quality is fundamental for comparison of research results and estimation of uncertainty. Authors must provide sufficient information (metadata) about the analytical process and reproducibility of measurements
+
Proper documentation of data quality is fundamental for comparison of research results and estimation of uncertainty. Authors must provide sufficient information (metadata) about the analytical process and reproducibility of measurement in order that the data quality can be properly evaluated. Correction procedures need to be clearly presented. This
in order that the data quality can be properly evaluated. Correction procedures need to be clearly presented. This
+
information is necessary to allow for scholarly reproduction of the results. Basic metadata such as analytical technique, lab, and values measured on reference materials need to accompany the data. If possible, metadata should be provided in standardized tabular format to facilitate access to this information for editors, reviewers, readers, and data managers.
information is necessary to allow for scholarly reproduction of the results. Basic metadata such as analytical
+
   '''Analytical metadata must be provided for each measured parameter.'''
technique, lab, and values measured on reference materials need to accompany the data. If possible, metadata should be
 
provided in standardized tabular format to facilitate access to this information for editors, reviewers, readers, and
 
data managers.
 
   Information about the analytical procedure must be provided for each measured parameter.  
 
 
If a parameter has been analyzed by more than one method, each method must be documented separately. If possible, this information should be provided in a tabular format.
 
If a parameter has been analyzed by more than one method, each method must be documented separately. If possible, this information should be provided in a tabular format.
  
   General analytical '''metadata''' must include:
+
   General '''analytical metadata''' include:
 
# Analytical technique (e.g. ICP, XRF, EMP)
 
# Analytical technique (e.g. ICP, XRF, EMP)
 
# Laboratory (name of department/lab & institution)
 
# Laboratory (name of department/lab & institution)
Line 43: Line 165:
 
:b. Estimated uncertainty of reference standard measurement, and, if applicable, number of measurements
 
:b. Estimated uncertainty of reference standard measurement, and, if applicable, number of measurements
  
   Method specific '''metadata''' must include, as appropriate to the method:
+
   '''Method specific metadata''' must include, as appropriate to the method:
 
# Fractionation correction
 
# Fractionation correction
 
# Standardization (Normalization)
 
# Standardization (Normalization)
Line 69: Line 191:
 
# Normalization
 
# Normalization
 
# Fractionation correction
 
# Fractionation correction
 
  
 
* '''Sample Information'''
 
* '''Sample Information'''
The geochemical data addressed in this policy are tied to samples. Essential information
+
The geochemical data addressed in this policy are tied to samples. Essential information about the samples must be provided in order to allow for proper identification of their origin and type, and to trace their analytical history.
about the samples must be provided in order to allow for proper identification of their
 
origin and type, and to trace their analytical history.
 
  
   Sample specific '''metadata''' should include, if available:
+
   '''Sample specific metadata''' should include, if available:
 
# Sample name or globally unique identifier: globally unique identifiers such as the [http://www.geosamples.org/ International Geo Sample Number (IGSN)] can be unambiguously referenced to a sample. The '''IGSN''' is a globally unique 9-character alphanumeric identifier provided and administered by [http://www.geosamples.org/ SESAR (System for Earth Sample Registration)]. It is used together with a person’s or institution’s sample name to ensure unambiguous identification of a sample. IGSNs can be obtained from [http://www.geosamples.org/ SESAR] by submitting sample metadata. This allows a complete analytical profile of a sample to be established that includes data generated at different times or in different labs, and reported in different publications.  
 
# Sample name or globally unique identifier: globally unique identifiers such as the [http://www.geosamples.org/ International Geo Sample Number (IGSN)] can be unambiguously referenced to a sample. The '''IGSN''' is a globally unique 9-character alphanumeric identifier provided and administered by [http://www.geosamples.org/ SESAR (System for Earth Sample Registration)]. It is used together with a person’s or institution’s sample name to ensure unambiguous identification of a sample. IGSNs can be obtained from [http://www.geosamples.org/ SESAR] by submitting sample metadata. This allows a complete analytical profile of a sample to be established that includes data generated at different times or in different labs, and reported in different publications.  
 
# Sample location: all natural samples for which data are reported require, if possible, information about the sample location, including latitude and longitude (if these are unknown, approximate coordinates obtained by using Google Earth would suffice). Marine samples require a depth below sea level. If applicable, the position of a sample within a stratigraphic section or within a core should be reported.  
 
# Sample location: all natural samples for which data are reported require, if possible, information about the sample location, including latitude and longitude (if these are unknown, approximate coordinates obtained by using Google Earth would suffice). Marine samples require a depth below sea level. If applicable, the position of a sample within a stratigraphic section or within a core should be reported.  
 
# Sample classification: samples should be classified (e.g. lithology for rocks and sediments; species for minerals and fossils; age).  
 
# Sample classification: samples should be classified (e.g. lithology for rocks and sediments; species for minerals and fossils; age).  
 
# Sampling information such as the cruise or field program (if applicable)
 
# Sampling information such as the cruise or field program (if applicable)
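
As a rough illustration of the sample metadata listed above, the following Python sketch checks a sample record before submission. The field names (igsn, latitude, longitude, marine, depth_m) and the function check_sample are hypothetical, and the identifier pattern is only an assumption based on the "9-character alphanumeric" description above; SESAR's actual rules should be consulted for real validation.

```python
import re

# Assumption: an IGSN is exactly 9 alphanumeric characters, as described in the text.
IGSN_PATTERN = re.compile(r"[A-Za-z0-9]{9}")


def check_sample(sample):
    """Return a list of human-readable problems found in a sample record (a dict)."""
    problems = []

    # The identifier is optional ("if available"), so only check it when present.
    igsn = sample.get("igsn")
    if igsn is not None and not IGSN_PATTERN.fullmatch(igsn):
        problems.append("igsn is not 9 alphanumeric characters")

    # Location: latitude and longitude should be present and within valid ranges.
    lat, lon = sample.get("latitude"), sample.get("longitude")
    if lat is None or lon is None:
        problems.append("latitude/longitude missing")
    else:
        if not -90 <= lat <= 90:
            problems.append("latitude out of range")
        if not -180 <= lon <= 180:
            problems.append("longitude out of range")

    # Marine samples additionally require a depth below sea level.
    if sample.get("marine") and sample.get("depth_m") is None:
        problems.append("marine sample lacks depth below sea level")

    return problems


# Hypothetical record for illustration only.
example = {"igsn": "ABC123XYZ", "latitude": -12.5, "longitude": 140.0,
           "marine": True, "depth_m": 2500}
print(check_sample(example))
```

A check of this kind only catches missing or malformed fields; it does not confirm that an identifier is actually registered.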
 +
 +
== Resources for Data Publication ==
 +
# [http://earth-system-science-data.net/ Earth System Science Data Journal data policy]
 +
# [http://www.nerc.ac.uk/research/sites/data/policy/ NERC data policy]
 +
# [http://www.jamstec.go.jp/e/database/data_policy.html/ GANSEKI data policy]
 +
# [http://www.usgs.gov/publishing/policies.html/ USGS data publishing policy]
 +
# [http://www.codata.org/ Policy from ICSU's Committee on Data for Science and Technology]
 +
# [http://centerforopenscience.org/ Center for Open Science (COS) data policy]
 +
# [http://www.plos.org/data-access-for-the-open-access-literature-ploss-data-policy/ PLoS One data policy]
 +
# [http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/ ICPSR data deposit guide]
 +
# [http://thedata.org/ The Data Verse Network Project data policy]
  
  
== Recommended practices for citations ==  
+
== Recommended Practices for Data Citations ==  
 +
Data citation is an evolving but increasingly important scientific practice. We see several important purposes of data citation:
 +
 
 +
# To aid scientific reproducibility through direct, unambiguous reference to the precise data used in a particular study.
 +
# To provide fair credit for data creators or authors, data stewards, and other critical people in the data production and curation process.
 +
# To ensure scientific transparency and reasonable accountability for authors and stewards.
 +
# To aid in tracking the impact of the data set and the associated data center through reference in scientific literature.
 +
# To help data authors verify how their data are being used.
 +
# To help future data users identify how others have used the data.
 +
 
 +
The core required elements of a citation from ESIP:
 +
# '''Author(s)'''--the people or organizations responsible for the intellectual work to develop the data set. The data creators.
 +
# '''Release Date'''--when the particular version of the data set was first made available for use (and potential citation) by others.
 +
# '''Version'''-- the precise version of the data used. Careful version tracking is critical to accurate citation.
 +
# '''Title'''-- the formal title of the data set
 +
# '''Archive and/or Distributor'''-- the organization distributing or caring for the data, ideally over the long term.
 +
# '''Locator/Identifier'''-- this could be a URL but ideally it should be a persistent service, such as a DOI, Handle or ARK, that resolves to the current location of the data in question.
 +
# '''Access Date and Time'''-- because data can be dynamic and changeable in ways that are not always reflected in release dates and versions, it is important to indicate when on-line data were accessed.
 +
 
 +
# Additional fields can be added as necessary to credit other people and institutions, etc. Additionally, it is important to provide a scheme for users to indicate the precise subset of data that were used. This could be the temporal and spatial range of the data, the types of files used, a specific query id, or other ways of describing how the data were subsetted.
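
The core elements listed above can be assembled into a citation string programmatically. The sketch below is a minimal Python illustration, not an official ESIP format: the class DataCitation, its field names, the element order, and the punctuation are assumptions, and all values shown are placeholders.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Hypothetical container for the core citation elements listed above."""
    authors: str
    release_date: str
    title: str
    version: str
    archive: str
    locator: str
    access_date: str

    def render(self) -> str:
        """Join the elements into one citation string (ordering is an assumption)."""
        return (
            f"{self.authors}. {self.release_date}. {self.title}, "
            f"version {self.version}. {self.archive}. "
            f"{self.locator}. Accessed {self.access_date}."
        )


def missing_elements(citation):
    """List the names of core elements that were left empty."""
    return [name for name, value in vars(citation).items() if not value.strip()]


# Placeholder values for illustration; the locator is not a real identifier.
citation = DataCitation(
    authors="A. Author and B. Author",
    release_date="2014",
    title="Example data set title",
    version="1.0",
    archive="Example Data Archive",
    locator="doi:10.0000/placeholder",
    access_date="2015-01-01",
)
print(citation.render())
print(missing_elements(citation))
```

Checking for empty elements before rendering helps catch citations that omit the version or access date, which the list above describes as important for dynamic data.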
 +
 
 
* [https://www.force11.org/datacitation/ '''The Data Citation Principles''' from Force 11]
 
* [https://www.force11.org/datacitation/ '''The Data Citation Principles''' from Force 11]
The Data Citation Principles cover the purpose, function, and attributes of citations. These principles recognize the dual necessity of creating citation practices that are both human-understandable and machine-actionable.
 
 
# '''Importance''': Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
 
# '''Importance''': Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
 
# '''Credit and attribution''': Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data. For example: data citations should provide sufficient information to identify cited data reference within included reference list;
 
# '''Credit and attribution''': Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data. For example: data citations should provide sufficient information to identify cited data reference within included reference list;
Line 91: Line 240:
 
# '''Unique Identification''': A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community
 
# '''Unique Identification''': A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community
 
# '''Access''': Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.
 
# '''Access''': Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.
# '''Persistance''': Unique identifiers, and metadata describing the data, and its disposition, should persist -- even beyond the lifespan of the data they describe;  
+
# '''Persistent''': Unique identifiers, and metadata describing the data, and its disposition, should persist -- even beyond the lifespan of the data they describe;  
 
# '''Specificity and verification''': Data citation should facilitate access to the data themselves and to such associated metadata, documentation, code, and other material; thus, citation metadata should include additional information that can help identify the specific portion of the data supporting a claim. For example, version or timeslice information should be supplied with any updated or dynamic dataset;
 
# '''Specificity and verification''': Data citation should facilitate access to the data themselves and to such associated metadata, documentation, code, and other material; thus, citation metadata should include additional information that can help identify the specific portion of the data supporting a claim. For example, version or timeslice information should be supplied with any updated or dynamic dataset;
# '''Flexibilityand interoperability''': Data citation should be sufficienty flexible to accomodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
+
# '''Flexibility and interoperability''': Data citation should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
 
 
 
 
'''Examples''':
 
The plots shown in Figure X show the distribution of selected measures from the main data [author(s), year, portion of subset used]
 
'''References section''':
 
  
 +
'''Example reference citations''':
 
:Author, year, article title, journal, publisher, DOI
 
:Author, year, article title, journal, publisher, DOI
:Author, year, dataset title, data repository or archive, version, global persistent identifier?
 
 
:Author, year, book title, publisher, ISBN
 
:Author, year, book title, publisher, ISBN
  
 +
'''Example data citations''':
 +
:Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements ver. 2.0. Edited by M. Parsons and M. J. Brodzik. National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://dx.doi.org/10.5060/D4MW2F23z
 +
:Author, year, dataset title, data repository or archive, version, global persistent identifier
 +
:The plots shown in Figure X show the distribution of selected measures from the main data [author(s), year, portion of subset used]
 +
 +
== Resources for Data Citation ==
 +
# [http://commons.esipfed.org/node/308/ ESIP Data Citation Guidelines]
 +
# [https://www.force11.org/datacitation/ Force 11 Data Citation Principles]
 +
# [http://www.datacite.org/ DataCite]
 +
# [http://www.codata.org/task-groups/data-citation-standards-and-practices/ CODATA]
 +
# [http://www.dcc.ac.uk/resources/how-guides/cite-datasets/ Digital Curation Centre]
  
The core required elements of a citation are:
+
== Index of Data Facilities ==
# '''Author(s)'''--the people or organizations responsible for the intellectual work to develop the data set. The data creators.
+
Lists of available data repositories
# '''Release Date'''--when the particular version of the data set was first made available for use (and potential citation) by others.
+
# [http://www.datasealofapproval.org/en/ Data Seal of Approval]
# '''Version'''--the precise version of the data used. Careful version tracking is critical to accurate citation.
+
# [http://databib.org/index_subjects.php Databib]
# '''Title'''--the formal title of the data set
+
# [http://www.re3data.org re3data - the Registry of Research Data Repositories]
# '''Archive and/or Distributor'''--the organization distributing or caring for the data, ideally over the long term.
 
# '''Locator/Identifier'''--this could be a URL but ideally it should be a persistent service, such as a DOI, Handle or ARK, that resolves to the current location of the data in question.
 
# '''Access Date and Time'''--because data can be dynamic and changeable in ways that are not always reflected in release dates and versions, it is important to indicate when on-line data were accessed.
 
# Additional fields can be added as necessary to credit other people and institutions, etc. Additionally, it is important to provide a scheme for users to indicate the precise subset of data that were used. This could be the temporal and spatial range of the data, the types of files used, a specific query id, or other ways of describing how the data were subsetted.
 
# '''An example citation''': Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements ver. 2.0. Edited by M. Parsons and M. J. Brodzik. National Snow and Ice Data Center. Data set accessed 2008-05-14 athttp://dx.doi.org/10.5060/D4MW2F23z
 
  
== Existing data facilities ==
+
Geochemical data
 
# [http://www.earthchem.org/data/contribute/ EarthChem]
 
# [http://www.earthchem.org/data/contribute/ EarthChem]
 
# [http://georoc.mpch-mainz.gwdg.de/georoc/ GEOROC]
 
# [http://georoc.mpch-mainz.gwdg.de/georoc/ GEOROC]
# [http://www.pangaea.de/submit/ Pangea]   
+
# [http://www.pangaea.de/submit/ PANGAEA]   
# [http://www.datacite.org/ Data cite] 
 
# [http://workspace.earthcube.org/council-data-facilities-0/ EarthCube: Council of Data Facilities]
 
 
# [http://www.navdat.org/NavdatHome/DataSubmissionFileFormat.cfm/ NAVDAT]
 
# [http://www.navdat.org/NavdatHome/DataSubmissionFileFormat.cfm/ NAVDAT]
 
# [http://www.godac.jamstec.go.jp/ganseki/e/ GANSEKI]
 
# [http://www.godac.jamstec.go.jp/ganseki/e/ GANSEKI]
# [http://pubs.er.usgs.gov/browse/usgs-publications/ USGS publication warehouse]
+
# [http://gulfresearchinitiative.org/ Gulf of Mexico Research Initiative]
 +
 
 +
Astronomical data
 
# [http://www.nasa.gov/open/data.html/ NASA]
 
# [http://www.nasa.gov/open/data.html/ NASA]
 
# [https://data.noaa.gov/dataset/ NOAA]
 
# [https://data.noaa.gov/dataset/ NOAA]
 +
 +
Seismological data
 +
# [http://www.iris.edu/hq/ IRIS seismological data repository]
 +
# [http://minsocam.org/ MinDATA]
 +
 +
Meteorological and hydrological data
 +
# [http://www.unidata.ucar.edu/ Unidata - NCAR]
 +
# [http://his.cuahsi.org/ CUAHSI hydrological database]
 +
# [http://www.earthchem.org/data/contribute/ EarthChem]
 
# [http://www.si.edu/ Smithsonian]
 
# [http://www.si.edu/ Smithsonian]
# [http://www.nerc.ac.uk/research/sites/data/ NERC data centers] includes the following:
+
# [http://gulfresearchinitiative.org/ Gulf of Mexico Research Initiative]
:[http://www.ceda.ac.uk/ Centre for Environmental Data Archival]
+
# [http://www.nerc.ac.uk/research/sites/data/ NERC data centers]
:[http://www.bgs.ac.uk/services/ngdc/ National Geoscience Data Centre]  
+
# [http://www.nasa.gov/open/data.html/ NASA]
 +
# [https://data.noaa.gov/dataset/ NOAA]
 +
 
 +
Map data
 +
# [http://pubs.er.usgs.gov/browse/usgs-publications/ USGS]
 +
 
 +
Comprehensive data repository
 +
# [http://www.datadryad.org/ Dryad digital repository]
 +
# [http://thedata.org/book/export/html/17201/ DataVerse Network Project]
  
== Existing Resources and Guidelines ==
+
Biological data
# [http://commons.esipfed.org/node/308/ ESIP Data Citation Guidelines]
+
# [http://www.bco-dmo.org/ The Biological and Chemical Oceanography Data Management Office  (BCO-DMO)]
# [https://www.force11.org/datacitation/ Force 11 Data Citation Principles]
 
# [http://www.jamstec.go.jp/e/database/data_policy.html/ GENSEKI data policy]
 
# [http://www.nerc.ac.uk/research/sites/data/policy/ NERC data policy]
 
# [http://www.usgs.gov/publishing/policies.html/ USGS]
 
  
== Main Publishers for Earth and Environmental Sciences ==
+
== Publishers in the Earth and Environmental Sciences ==
 +
Data Journals
 +
# [http://www.icsu-wds.org/organization Geoscience data journal - Wiley]
 +
# [http://www.earth-system-science-data.net/index.html Earth System Science Data]
 +
# [http://www.icsu-wds.org/organization GeoRes - Elsevier]
 +
# [http://www.icsu-wds.org/organization Earth and Space Science - AGU]
 +
# [http://www.icsu-wds.org/organization Scientific Data - Nature]
  
 +
Publishers for scholarly articles
 
# [http://www.elsevier.com/ Elsevier]
 
# [http://www.elsevier.com/ Elsevier]
 
# [http://www.wiley.com/WileyCDA/ Wiley]
 
# [http://www.wiley.com/WileyCDA/ Wiley]
Line 148: Line 318:
 
# [http://www.gsapubs.org/ GSA publications]
 
# [http://www.gsapubs.org/ GSA publications]
 
# [http://www.springer.com/earth+sciences+and+geography/journals?SGWID=0-1729013-0-0-0/ Springer]
 
# [http://www.springer.com/earth+sciences+and+geography/journals?SGWID=0-1729013-0-0-0/ Springer]
 +
# [http://www.proquest.com/ ProQuest]
 
# [http://www.electronic-earth.net/ eEarth]
 
# [http://www.electronic-earth.net/ eEarth]
 
# [http://www.egu.eu/publications/open-access-journals/ EGU publications]
 
# [http://www.egu.eu/publications/open-access-journals/ EGU publications]
 
# [http://www.oxfordjournals.org/en/our-journals/science-and-mathematics.html/ Oxford Journals]
 
# [http://www.oxfordjournals.org/en/our-journals/science-and-mathematics.html/ Oxford Journals]
 
# [http://www-icdp.icdp-online.org/front_content.php?idcat=344/ ICDP]
 
# [http://www-icdp.icdp-online.org/front_content.php?idcat=344/ ICDP]
 +
# [http://geoscienceworld.org/ Geoscience World]
 +
# [http://minsocam.org/ Mineralogical Society of America]

Latest revision as of 08:53, October 8, 2021

A Wiki for the Best Practices in Data Publication in the Earth and Environmental Sciences

Goals and Objectives

The best practices for data publication initiative builds on a successful effort started in 2007 by EarthChem to develop and promote best practices for the reporting of geochemical data in scholarly articles and data systems. It is a community-based initiative, conceived by editors, publishers, and operators of data facilities at a series of workshops organized by IEDA (Integrated Earth Data Applications) and EarthChem at major scientific conferences. It is an effort to foster and facilitate communication and knowledge exchange among editors and publishers of Earth Science journals as well as data facilities. Our goal is to develop and promote best practices for scholarly publishing, with an emphasis on data publication in support of open access policies.

Types of Data/Discipline in the Earth Sciences

Geoscience/Earth Science:

Atmospheric and space weather science

  • Aeronomy
  • Astronomy
  • Atmospheric chemistry
  • Air quality
  • Atmospheric modeling
  • Atmospheric physics
  • Atmospheric science
  • Heliophysics
  • Meteorology
  • Radio astronomy
  • Space physics
  • Space weather

Biology and ecology

  • Biodiversity studies
  • Ecosystem studies
  • Geobiology
  • Marine microbiology
  • Microbial studies
  • Population studies

Climate science

  • Climate simulation
  • Critical zone science
  • Agronomy


Geology

  • Coastal processes
  • Geochronology
  • Geomorphology
  • Glaciology
  • Paleobiology
  • Paleoclimate
  • Paleomagnetism
  • Paleontology
  • Sedimentology
  • Stratigraphy
  • Structural geology
  • Tectonics

Geochemistry

  • Biogeochemistry
  • Petrology
  • Volcanology

Geophysics

  • Geodesy
  • Geodynamics
  • Geophysics
  • Marine geophysics
  • Seismology

Hydrology

  • Hydrogeosciences
  • Hydrology
  • Limnology
  • Watershed dynamics
  • Water resources
  • Water cycles

Oceanography

  • Biological oceanography
  • Chemical oceanography
  • Coastal oceanography
  • Ocean biogeochemistry
  • Oceanography
  • Oceanography modeling
  • Paleoceanography
  • Physical oceanography

Physical geography

  • Geospatial

Polar

  • Cryosphere
  • Polar glaciology
  • Polar oceanography
  • Polar atmospheric science
  • Polar space physics
  • Polar geology
  • Polar geophysics
  • Polar geochemistry
  • Polar biology and ecology
  • Antarctic astronomy
  • Polar engineering

Social science

  • Anthropology
  • Economics
  • Human geography
  • Institutional/organizational science
  • Psychology
  • Sociology

Other science

  • Interdisciplinary geoscience
  • Earth system modeling

Information/Cyber Science & Technology:

Computer Science

  • Algorithms
  • Computer science
  • Computational science
  • Geographic information science
  • Modeling
  • Numerical modeling
  • Risk modeling
  • Visualization sciences

Cyberinfrastructure

  • Cyberinfrastructure and hardware engineer
  • Cyberinfrastructure software engineer and programmer
  • High performance computing
  • Informatics/information systems
  • System design


Data manager/data services

  • Data management
  • Data science
  • Data services/assimilation
  • Disaster assessment
  • Remote sensing
  • Satellite processing
  • System integration

Recommended Practices for Data Publication

Using the discipline of geochemistry as an example:

  • Data Accessibility and Format
 Access to the complete data is a fundamental requirement for the reproducibility of scientific results. 

All NEW geochemical data used in a publication must be made available for future use by:

  1. submission to an accessible, persistent source such as a public database or data archive, if one exists for the specific data type (personal web sites, for example, are not persistent data archives), or by
  2. listing the data explicitly in a data table associated with the publication.
 The data must be available in downloadable format. 

For chemical abundance data of samples, elemental or oxide abundance data must be given unless a compelling reason can be provided; elemental abundance ratios are acceptable only if the compositional data do not exist. Isotope ratios are, of course, acceptable.

 Data should be reported in a tabular format. 

Data must always be available as a downloadable file in a format that can be easily converted into spreadsheet format (for example, .csv, .txt). The file should include units for the listed measured values. This means that if a publication contains a data table in the main text, or a PDF or image version of the data table as an electronic supplement, the data in the table(s) must also be available in a downloadable form that can be easily converted to a spreadsheet.
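
As a minimal sketch of the requirement above, the following Python snippet writes a small data table to CSV text with the units carried in the column headers, and checks that no measured-value column lacks a unit. The sample names, column names, values, and the "(unit)" header convention are hypothetical placeholders chosen for illustration, not a format prescribed by this page.

```python
import csv
import io

# Hypothetical example table; sample names and values are placeholders.
COLUMNS = ["sample_name", "SiO2 (wt%)", "MgO (wt%)", "Sr (ppm)"]
ROWS = [
    ["SAMPLE-001", 49.8, 7.6, 120],
    ["SAMPLE-002", 51.2, 6.9, 135],
]


def table_to_csv(columns, rows):
    """Return the table as CSV text; units travel with each column header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def columns_missing_units(columns, id_columns=("sample_name",)):
    """List measured-value columns whose header lacks a '(unit)' suffix."""
    return [
        name for name in columns
        if name not in id_columns and not ("(" in name and name.endswith(")"))
    ]


csv_text = table_to_csv(COLUMNS, ROWS)
```

Keeping the unit in the header means the file stays self-describing after it is opened in a spreadsheet; a separate units row would work equally well if a journal or repository prefers it.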


  • Data Quality Information
 Proper documentation of data quality is essential.

Proper documentation of data quality is fundamental for comparison of research results and estimation of uncertainty. Authors must provide sufficient information (metadata) about the analytical process and reproducibility of measurement in order that the data quality can be properly evaluated. Correction procedures need to be clearly presented. This information is necessary to allow for scholarly reproduction of the results. Basic metadata such as analytical technique, lab, and values measured on reference materials need to accompany the data. If possible, metadata should be provided in standardized tabular format to facilitate access to this information for editors, reviewers, readers, and data managers.

 Analytical metadata must be provided for each measured parameter. 

If a parameter has been analyzed by more than one method, each method must be documented separately. If possible, this information should be provided in a tabular format.

 General analytical metadata include:
  1. Analytical technique (e.g. ICP, XRF, EMP)
  2. Laboratory (name of department/lab & institution)
  3. Analytical accuracy & reproducibility
     a. Name(s) and measured value(s) of (internationally recognized) reference standard(s) measured as unknown samples
     b. Estimated uncertainty of reference standard measurement and, if applicable, number of measurements
 Method specific metadata must include, as appropriate to the method:
  1. Fractionation correction
  2. Standardization (Normalization)
  3. Total procedural blank
  4. Detection limit
  5. Calibration
 The list below identifies some of the metadata sets that are relevant for geochemical data:

I. Bulk Elemental Analyses (e.g. AAS, HPLC, ICPAES, ICPMS, INAA, XRF)

  1. Standardization (Normalization)
  2. Total procedural blank
  3. Detection limit

II. In-situ Elemental Analyses (e.g. EMP, SIMS, LA-ICPMS)

  1. Standardization (Normalization)
  2. Detection limit
  3. Calibration

III. Bulk Isotopic Analyses (e.g. TIMS, MC-ICPMS)

  1. Standardization (Normalization)
  2. Fractionation correction
  3. Total procedural blank
  4. Detection limit

IV. In-situ Isotope Analyses (e.g. SIMS, LA-MC-ICPMS, LA-ICPMS)

  1. Standardization (Normalization)
  2. Detection limit
  3. Normalization
  4. Fractionation correction
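
The four lists above can be read as a lookup table from class of analysis to required method-specific metadata. The sketch below encodes them that way and reports which fields a submission is still missing. The dictionary keys and field names are hypothetical identifiers invented for this illustration; only the groupings come from the lists above.

```python
# Required method-specific metadata per class of analysis, transcribed from
# lists I-IV above. Key and field names are hypothetical identifiers.
REQUIRED_METADATA = {
    "bulk_elemental": {
        "standardization", "total_procedural_blank", "detection_limit",
    },
    "in_situ_elemental": {
        "standardization", "detection_limit", "calibration",
    },
    "bulk_isotopic": {
        "standardization", "fractionation_correction",
        "total_procedural_blank", "detection_limit",
    },
    # List IV names normalization twice; it is folded into "standardization".
    "in_situ_isotopic": {
        "standardization", "detection_limit", "fractionation_correction",
    },
}


def missing_metadata(analysis_class, provided):
    """Return sorted names of required fields absent or empty in `provided`."""
    required = REQUIRED_METADATA[analysis_class]
    present = {key for key, value in provided.items() if value not in (None, "")}
    return sorted(required - present)


# Hypothetical submission for a bulk elemental analysis.
record = {
    "standardization": "normalized to a reference standard",
    "detection_limit": "0.1 ppm",
}
print(missing_metadata("bulk_elemental", record))  # ['total_procedural_blank']
```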

  • Sample Information

The geochemical data addressed in this policy are tied to samples. Essential information about the samples must be provided in order to allow for proper identification of their origin and type, and to trace their analytical history.

 Sample specific metadata should include, if available:
  1. Sample name or globally unique identifier: a globally unique identifier such as the International Geo Sample Number (IGSN) refers unambiguously to a single sample. The IGSN is a globally unique 9-digit alphanumeric identifier provided and administered by SESAR (System for Earth Sample Registration). It is used together with a person’s or institution’s sample name to ensure unambiguous identification of a sample. IGSNs can be obtained from SESAR by submitting sample metadata. This allows a complete analytical profile of a sample to be established that includes data generated at different times or in different labs, and reported in different publications.
  2. Sample location: all natural samples for which data are reported require, if possible, information about the sample location, including latitude and longitude (if these are unknown, approximate coordinates obtained by using Google Earth would suffice). Marine samples require a depth below sea level. If applicable, the position of a sample within a stratigraphic section or within a core should be reported.
  3. Sample classification: samples should be classified (e.g. lithology for rocks and sediments, species for minerals and fossils, and age).
  4. Sampling information such as the cruise or field program (if applicable)
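
One way to picture the sample metadata listed above is as a small record with basic plausibility checks. The sketch below is an illustration only: the class name, field names, and checks are assumptions made for this example, and it deliberately does not validate the IGSN format, which SESAR defines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleRecord:
    """Hypothetical container for the sample metadata listed above."""
    sample_name: str
    igsn: Optional[str] = None                 # assigned by SESAR, not checked here
    latitude: Optional[float] = None           # decimal degrees, north positive
    longitude: Optional[float] = None          # decimal degrees, east positive
    depth_below_sea_level_m: Optional[float] = None  # marine samples only
    classification: Optional[str] = None       # e.g. lithology or species
    field_program: Optional[str] = None        # cruise or field program

    def problems(self):
        """Return a list of human-readable issues; empty if none are found."""
        issues = []
        if not self.sample_name.strip():
            issues.append("sample name is empty")
        if self.latitude is None or self.longitude is None:
            issues.append("location is missing")
        else:
            if not -90.0 <= self.latitude <= 90.0:
                issues.append("latitude outside [-90, 90]")
            if not -180.0 <= self.longitude <= 180.0:
                issues.append("longitude outside [-180, 180]")
        if self.classification is None:
            issues.append("classification is missing")
        return issues


# Hypothetical records; names and coordinates are placeholders.
complete = SampleRecord("SAMPLE-001", latitude=-12.5, longitude=45.0,
                        classification="basalt")
incomplete = SampleRecord("SAMPLE-002", latitude=95.0, longitude=10.0)
```

Here `complete.problems()` returns an empty list, while `incomplete.problems()` flags the out-of-range latitude and the missing classification.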

Resources for Data Publication

  1. Earth System Science Data Journal data policy
  2. NERC data policy
  3. GANSEKI data policy
  4. USGS data publishing policy
  5. Policy from ICSU's Committee on Data for Science and Technology
  6. Center for Open Science (COS) data policy
  7. PLoS One data policy
  8. ICPSR data deposit guide
  9. The Data Verse Network Project data policy


Recommended Practices for Data Citations

Data citation is an evolving but increasingly important scientific practice. We see several important purposes of data citation:

  1. To aid scientific reproducibility through direct, unambiguous reference to the precise data used in a particular study.
  2. To provide fair credit for data creators or authors, data stewards, and other critical people in the data production and curation process.
  3. To ensure scientific transparency and reasonable accountability for authors and stewards.
  4. To aid in tracking the impact of a data set and its associated data center through references in the scientific literature.
  5. To help data authors verify how their data are being used.
  6. To help future data users identify how others have used the data.

The core required elements of a data citation, according to ESIP:

  1. Author(s): the people or organizations responsible for the intellectual work to develop the data set, that is, the data creators.
  2. Release Date: when the particular version of the data set was first made available for use (and potential citation) by others.
  3. Version: the precise version of the data used. Careful version tracking is critical to accurate citation.
  4. Title: the formal title of the data set.
  5. Archive and/or Distributor: the organization distributing or caring for the data, ideally over the long term.
  6. Locator/Identifier: this could be a URL, but ideally it should be a persistent service, such as a DOI, Handle or ARK, that resolves to the current location of the data in question.
  7. Access Date and Time: because data can be dynamic and changeable in ways that are not always reflected in release dates and versions, it is important to indicate when on-line data were accessed.

Additional fields can be added as necessary to credit other people and institutions. It is also important to provide a scheme for users to indicate the precise subset of data that were used. This could be the temporal and spatial range of the data, the types of files used, a specific query id, or other ways of describing how the data were subsetted.

The data citation principles from Force 11:

  1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
  2. Credit and attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data. For example, a data citation should provide sufficient information to identify the cited data in the accompanying reference list.
  3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited; e.g., citations should be in close proximity to the claims relying on the data.
  4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
  5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.
  6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
  7. Specificity and verification: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other material. Citation metadata should therefore include additional information that helps identify the specific portion of the data supporting a claim. For example, version or timeslice information should be supplied with any updated or dynamic dataset.
  8. Flexibility and interoperability: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Example reference citations:

Author, year, article title, journal, publisher, DOI
Author, year, book title, publisher, ISBN

Example data citations:

Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements ver. 2.0. Edited by M. Parsons and M. J. Brodzik. National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://dx.doi.org/10.5060/D4MW2F23z
Author, year, dataset title, data repository or archive, version, global persistent identifier
The plots shown in Figure X show the distribution of selected measures from the main data [author(s), year, portion of subset used]
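
The core citation elements listed earlier can be assembled into a citation string mechanically. The sketch below shows one possible assembly; the class name, field names, element order, and punctuation are assumptions made for illustration rather than a style prescribed by ESIP, and the example values are invented placeholders, not a real data set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical holder for the core elements of a data citation."""
    authors: str
    release_date: str
    title: str
    version: str
    archive: str
    locator: str                    # ideally a persistent identifier such as a DOI
    accessed: Optional[str] = None  # access date for dynamic on-line data

    def format(self):
        """Join the elements in a fixed order; the order is an assumption."""
        parts = [
            self.authors,
            self.release_date,
            f"{self.title}, version {self.version}",
            self.archive,
        ]
        text = ". ".join(parts) + ". " + self.locator
        if self.accessed:
            text += f" (accessed {self.accessed})"
        return text


# Placeholder values only; this is not a real data set or identifier.
citation = DataCitation(
    authors="Doe, J., and R. Roe",
    release_date="2020",
    title="Example snow depth transects",
    version="1.0",
    archive="Example Data Center",
    locator="https://doi.org/10.0000/example",
    accessed="2021-10-08",
)
print(citation.format())
```

Storing the elements separately and formatting them at the last step makes it straightforward to follow whichever citation style a given journal or repository asks for.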

Resources for Data Citation

  1. ESIP Data Citation Guidelines
  2. Force 11 Data Citation Principles
  3. DataCite
  4. CODATA
  5. Digital Curation Centre

Index of Data Facilities

Lists of available data repositories

  1. Data Seal of Approval
  2. Databib
  3. re3data - the Registry of Research Data Repositories

Geochemical data

  1. EarthChem
  2. GEOROC
  3. PANGAEA
  4. NAVDAT
  5. GANSEKI
  6. Gulf of Mexico Research Initiative

Astronomical data

  1. NASA
  2. NOAA

Seismological data

  1. IRIS seismological data repository
  2. MinDATA

Meteorological and hydrological data

  1. Unidata - NCAR
  2. CUAHSI hydrological database
  3. EarthChem
  4. Smithsonian
  5. Gulf of Mexico Research Initiative
  6. NERC data centers
  7. NASA
  8. NOAA

Map data

  1. USGS

Comprehensive data repository

  1. Dryad digital repository
  2. DataVerse Network Project

Biological data

  1. The Biological and Chemical Oceanography Data Management Office (BCO-DMO)

Publishers in the Earth and Environmental Sciences

Data Journals

  1. Geoscience Data Journal - Wiley
  2. Earth System Science Data
  3. GeoRes - Elsevier
  4. Earth and Space Science - AGU
  5. Scientific Data - Nature

Publishers for scholarly articles

  1. Elsevier
  2. Wiley
  3. AGU publications
  4. Nature
  5. Science
  6. GSA publications
  7. Springer
  8. ProQuest
  9. eEarth
  10. EGU publications
  11. Oxford Journals
  12. ICDP
  13. Geoscience World
  14. Mineralogical Society of America