Difference between revisions of "Main Page/Start here"

From Earth Science Information Partners (ESIP)

Revision as of 12:14, September 4, 2014

This is the test page for the Editor's Roundtable - an Initiative for Best Practices for Data Publication

What is the Editor's Roundtable?

The Editor's Roundtable is a community-based initiative, conceived by editors, publishers, and operators of data facilities at a series of IEDA-facilitated meetings held in conjunction with major scientific conferences. It aims to foster communication and knowledge exchange among editors and publishers of Earth Science journals and Earth Science data facilities, with the goal of developing, implementing, and promoting guidelines and best practices for scholarly publishing, with particular emphasis on the publication of data in support of open access policies.

The Editor's Roundtable builds on a successful initiative started in 2007 by EarthChem, which has been part of the IEDA data facility since 2010, to develop and promote best practices for the reporting of geochemical data in scholarly articles and data systems. That initiative resulted in the policy recommendation "Requirements for the Publication of Geochemical Data" (Goldstein et al. 2014, doi:10.1594/IEDA/100426), which was endorsed by all major scientific journals that publish geochemical data and has guided policies for the disclosure and documentation of geochemical data.



Objectives

  1. Archiving and curating data sets;
  2. Setting references and identifiers;
  3. Linking datasets to publications;
  4. Integrating with emerging data citation practices and bibliometrics for data;
  5. Complying with interoperability standards.

Recommended practices for data submission

  • Data Accessibility and Format

Access to the complete data, upon which new scientific discovery and knowledge is based, is a fundamental requirement for the reproducibility of scientific results. All NEW geochemical data used in a publication must be made available for future use by (1) submission to an accessible, persistent source such as a public database or data archive (for example, personal web sites are not persistent data archives), if it exists for the specific data type, or by (2) listing the data explicitly in a data table associated with the publication. The data must in any case be available in downloadable format.

For chemical abundance data of samples, elemental or oxide abundance data must be given unless a compelling reason can be provided; elemental abundance ratios are acceptable only if the compositional data do not exist. Isotope ratios are, of course, acceptable.

Data should be reported in tabular format. Data must always be available as a downloadable file in a format that can be easily converted into spreadsheet format (for example, .csv, .txt). The file should include units for the listed measured values. This means that if a publication contains a data table in the main text or a pdf or image version of the data table as an electronic supplement, the data in the table(s) must also be available in a downloadable form that can be easily converted to spreadsheet format.
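As an illustration of the format requirement above, the following minimal Python sketch writes a small data table to CSV with the units carried in the column headers. The column names, sample names, and values are hypothetical placeholders, not data from any publication, and the "name (unit)" header convention is one possible choice rather than something this policy prescribes.

```python
import csv
import io

# Hypothetical columns: (name, unit). A unit of None means the column is unitless.
COLUMNS = [("sample_name", None), ("SiO2", "wt%"), ("MgO", "wt%"), ("Sr", "ppm")]


def header_with_units(columns):
    """Build header labels such as 'SiO2 (wt%)' so units travel with the data."""
    return [name if unit is None else f"{name} ({unit})" for name, unit in columns]


def write_data_table(rows, columns, handle):
    """Write rows (dicts keyed by column name) as CSV with a units-bearing header."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header_with_units(columns))
    for row in rows:
        writer.writerow([row[name] for name, _ in columns])


# Illustrative rows only; these are made-up values.
rows = [
    {"sample_name": "EX-001", "SiO2": 49.8, "MgO": 7.6, "Sr": 120},
    {"sample_name": "EX-002", "SiO2": 51.2, "MgO": 6.9, "Sr": 135},
]

buffer = io.StringIO()
write_data_table(rows, COLUMNS, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to a real file instead of the in-memory buffer only requires passing an open file handle (opened with `newline=""`) to `write_data_table`.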

  • Data Quality Information

Proper documentation of data quality is fundamental for comparison of research results and estimation of uncertainty. Authors must provide sufficient information (metadata) about the analytical process and reproducibility of measurements in order that the data quality can be properly evaluated. Correction procedures need to be clearly presented. This information is necessary to allow for scholarly reproduction of the results. Basic metadata such as analytical technique, lab, and values measured on reference materials need to accompany the data. If possible, metadata should be provided in standardized tabular format to facilitate access to this information for editors, reviewers, readers, and data managers. Information about the analytical procedure must be provided for each measured parameter. If a parameter has been analyzed by more than one method, each method must be documented separately. If possible, this information should be provided in tabular format.

  1. GENERAL ANALYTICAL METADATA must include:
     a. Analytical technique (e.g. ICP, XRF, EMP)
     b. Laboratory (name of department/lab & institution)
     c. Analytical accuracy & reproducibility
        i. Name(s) and measured value(s) of (internationally recognized) reference standard(s) measured as unknown sample
        ii. Estimated uncertainty of reference standard measurement, and, if applicable, number of measurements
  2. METHOD SPECIFIC METADATA must include, as appropriate to the method (see Appendix):
     a. Fractionation correction
     b. Standardization (Normalization)
     c. Total procedural blank
     d. Detection limit
     e. Calibration

Method Specific Analytical Metadata

The list below identifies metadata sets that are relevant for different types of analytical techniques, which may vary depending upon the specific analysis:

A. Bulk Elemental Analyses (e.g. AAS, HPLC, ICPAES, ICPMS, INAA, XRF)
   a. Standardization (Normalization)
   b. Total procedural blank
   c. Detection limit
B. In-situ Elemental Analyses (e.g. EMP, SIMS, LA-ICPMS)
   a. Standardization (Normalization)
   b. Detection limit
   c. Calibration
C. Bulk Isotopic Analyses (e.g. TIMS, MC-ICPMS)
   a. Standardization (Normalization)
   b. Fractionation correction
   c. Total procedural blank
   d. Detection limit
D. In-situ Isotope Analyses (e.g. SIMS, LA-MC-ICPMS, LA-ICPMS)
   a. Standardization (Normalization)
   b. Detection limit
   c. Normalization
   d. Fractionation correction
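The metadata requirements above could be checked mechanically before submission. The sketch below is one possible way to do so in Python; the field names (for example `total_procedural_blank`) and the four method-class keys are hypothetical labels chosen for this illustration and are not a schema defined by this policy.

```python
# General analytical metadata required for every measured parameter.
GENERAL_FIELDS = {"technique", "laboratory", "reference_standards"}

# Method-specific metadata per class of analysis, following lists A-D above.
# In list D, "Standardization (Normalization)" and "Normalization" are treated
# as the same field here; that merge is an assumption of this sketch.
METHOD_FIELDS = {
    "bulk_elemental": {"standardization", "total_procedural_blank", "detection_limit"},
    "in_situ_elemental": {"standardization", "detection_limit", "calibration"},
    "bulk_isotopic": {
        "standardization",
        "fractionation_correction",
        "total_procedural_blank",
        "detection_limit",
    },
    "in_situ_isotopic": {"standardization", "detection_limit", "fractionation_correction"},
}


def missing_metadata(record, method_class):
    """Return the sorted names of required fields that are absent or empty."""
    required = GENERAL_FIELDS | METHOD_FIELDS[method_class]
    return sorted(field for field in required if not record.get(field))


# Illustrative record with made-up values; the reference standard name is a placeholder.
record = {
    "technique": "XRF",
    "laboratory": "Example Department, Example University",
    "reference_standards": [{"name": "REF-1", "measured": 10.2, "uncertainty": 0.3, "n": 5}],
    "standardization": "described in methods section",
    "detection_limit": "listed per element",
}

print(missing_metadata(record, "bulk_elemental"))
```

A parameter analyzed by more than one method would get one such record per method, in line with the requirement that each method be documented separately.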

  • Sample Information

The geochemical data addressed in this policy are tied to samples. Essential information about the samples must be provided in order to allow for proper identification of their origin and type, and to trace their analytical history. All natural samples for which data are reported require, if applicable, information about the sample location. In addition and if applicable, samples should be classified (e.g. lithology for rocks and sediments, species for minerals and fossils). Samples should have global unique identifiers so that data can be unambiguously referenced to a sample. This allows a complete analytical profile of a sample to be established that includes data generated at different times or in different labs, and reported in different publications.

a. Metadata: All samples for which data are reported require, if applicable, information about sample location including, if possible, latitude and longitude (if these are unknown, approximate coordinates obtained by using Google Earth would suffice). Marine samples require a depth below sea level. If applicable, the position of a sample within a stratigraphic section or within a core should be reported. Other critical sample metadata include lithological classification and age.

b. Unique Identification: The problem of non-unique sample names needs to be addressed by the global earth science community. Currently, the only available system where individuals can obtain globally unique identifiers for their samples is the sample registry SESAR (System for Earth Sample Registration, www.geosamples.org). The unique identifier provided and administered by SESAR is the 9-digit alphanumeric International Geo Sample Number (IGSN), which is used together with a person’s or institution’s sample name to ensure unambiguous identification of a sample. IGSNs can be obtained from SESAR by submitting the information about a sample that is required for publication through this policy.


Sample Metadata

The list below identifies which metadata should be provided for samples (if applicable).

  • a. Sample name
  • b. Geospatial coordinates (latitude, longitude, possibly even utilizing Google Earth for an approximate value if otherwise unknown)
  • c. Unique Identifier (e.g. International GeoSample Number, IGSN)
  • d. Classification (e.g. lithology)
  • e. Age
  • f. Depth in core or position within a stratigraphic section (if applicable)
  • g. Cruise or field program (if applicable)
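The sample metadata listed above can be captured in a small record type with basic sanity checks. The following Python sketch is illustrative only: the class and field names are hypothetical, and the IGSN check simply encodes the "9-digit alphanumeric" description given in this policy rather than any official SESAR validation rule.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumption taken from the text above: an IGSN is nine alphanumeric characters.
IGSN_PATTERN = re.compile(r"^[A-Za-z0-9]{9}$")


@dataclass
class SampleRecord:
    """One sample, with fields mirroring items a-g of the sample metadata list."""
    sample_name: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    igsn: Optional[str] = None
    classification: Optional[str] = None
    age: Optional[str] = None
    depth_or_position: Optional[str] = None
    cruise_or_field_program: Optional[str] = None


def check_sample(sample: SampleRecord) -> List[str]:
    """Return a list of human-readable issues; an empty list means no issue found."""
    issues = []
    if not sample.sample_name.strip():
        issues.append("sample name is empty")
    if sample.latitude is None or sample.longitude is None:
        issues.append("coordinates are missing")
    else:
        if not -90.0 <= sample.latitude <= 90.0:
            issues.append("latitude outside [-90, 90]")
        if not -180.0 <= sample.longitude <= 180.0:
            issues.append("longitude outside [-180, 180]")
    if sample.igsn is not None and not IGSN_PATTERN.match(sample.igsn):
        issues.append("IGSN is not nine alphanumeric characters")
    return issues


# Made-up samples for illustration; the IGSN values are not registered identifiers.
good = SampleRecord("EX-001", latitude=41.0, longitude=-73.9, igsn="ABC123456",
                    classification="basalt")
bad = SampleRecord("EX-002", latitude=95.0, longitude=10.0, igsn="ABC12")

print(check_sample(good))
print(check_sample(bad))
```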

Recommended practices for citations

  • The Data Citation Principles

The Data Citation Principles cover the purpose, function, and attributes of citations. These principles recognize the dual necessity of creating citation practices that are both human-understandable and machine-actionable.

  1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
  2. Credit and attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data. For example, data citations should provide sufficient information to identify the cited data within the included reference list.
  3. Evidence: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited; e.g., citations should be in close proximity to the claims relying on the data.
  4. Unique Identification: A data citation should include a persistent method for identification that is machine-actionable, globally unique, and widely used by a community.
  5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials as are necessary for both humans and machines to make informed use of the referenced data.
  6. Persistence: Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe.
  7. Specificity and verification: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other material; thus, citation metadata should include additional information that can help identify the specific portion of the data supporting a claim. For example, version or timeslice information should be supplied with any updated or dynamic dataset.
  8. Flexibility and interoperability: Data citation should be sufficiently flexible to accommodate the variant practices among communities, but practices should not differ so much that they compromise the interoperability of data citation across communities.


Examples:

The plots shown in Figure X show the distribution of selected measures from the main data [author(s), year, portion or subset used].

References section:

Author, year, article title, journal, publisher, DOI
Author, year, dataset title?, data repository or archive, version, global persistent identifier?
Author, year, book title, publisher, ISBN


The core required elements of a citation are:

  1. Author(s)--the people or organizations responsible for the intellectual work to develop the data set. The data creators.
  2. Release Date--when the particular version of the data set was first made available for use (and potential citation) by others.
  3. Title--the formal title of the data set.
  4. Version--the precise version of the data used. Careful version tracking is critical to accurate citation.
  5. Archive and/or Distributor--the organization distributing or caring for the data, ideally over the long term.
  6. Locator/Identifier--this could be a URL but ideally it should be a persistent service, such as a DOI, Handle or ARK, that resolves to the current location of the data in question.
  7. Access Date and Time--because data can be dynamic and changeable in ways that are not always reflected in release dates and versions, it is important to indicate when on-line data were accessed.

Additional fields can be added as necessary to credit other people and institutions, etc. Additionally, it is important to provide a scheme for users to indicate the precise subset of data that were used. This could be the temporal and spatial range of the data, the types of files used, a specific query id, or other ways of describing how the data were subsetted.

An example citation:

Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements ver. 2.0. Edited by M. Parsons and M. J. Brodzik. National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://dx.doi.org/10.5060/D4MW2F23z
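To show how the core citation elements fit together, here is a minimal Python sketch that assembles them into a citation string laid out roughly like the example above. The class name, field names, and all values (including the DOI, which is a non-resolving placeholder) are hypothetical, and the exact punctuation is an assumption of this sketch rather than a prescribed style.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Core citation elements, plus an optional description of the subset used."""
    authors: str
    release_date: str
    title: str
    version: str
    archive: str
    locator: str
    access_date: Optional[str] = None
    subset: Optional[str] = None


def format_citation(c: DataCitation) -> str:
    """Join the elements in the order: authors, date, title and version, archive, access, locator."""
    parts = [
        f"{c.authors.rstrip('.')}.",
        f"{c.release_date}.",
        f"{c.title} ver. {c.version}.",
        f"{c.archive}.",
    ]
    if c.subset:
        parts.append(f"Subset used: {c.subset}.")
    if c.access_date:
        parts.append(f"Data set accessed {c.access_date} at {c.locator}")
    else:
        parts.append(c.locator)
    return " ".join(parts)


# Placeholder values for illustration only; this is not a real data set or DOI.
citation = DataCitation(
    authors="Doe, J., and R. Roe",
    release_date="2014",
    title="Example geochemical data set",
    version="1.0",
    archive="Example Data Archive",
    locator="https://doi.org/10.0000/EXAMPLE",
    access_date="2014-09-04",
)
print(format_citation(citation))
```

Keeping the elements as separate fields, rather than storing only the formatted string, leaves the citation machine-actionable as well as human-readable.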

Existing data facilities

  1. EarthChem
  2. GEOROC
  3. PANGAEA
  4. DataCite
  5. EarthCube: Council of Data Facilities
  6. NAVDAT
  7. GANSEKI
  8. USGS Publications Warehouse
  9. NASA
  10. NOAA
  11. Smithsonian
  12. NERC data centers, including:
      • Centre for Environmental Data Archival
      • National Geoscience Data Centre

Resources for data publication

  1. ESIP Data Citation Guidelines
  2. Force 11 Data Citation Principles
  3. GANSEKI data policy
  4. NERC data policy
  5. USGS

Publishers

  1. Elsevier
  2. Wiley
  3. AGU publications
  4. Nature
  5. Science
  6. GSA publications
  7. Springer
  8. eEarth
  9. EGU publications
  10. Oxford Journals
  11. ICDP