Difference between revisions of "FAIR Dataset Quality Information"

From Earth Science Information Partners (ESIP)
 
(22 intermediate revisions by 2 users not shown)
Line 1: Line 1:
= Document =
+
=Document=
  
 
This document presents community guidelines for consistently curating and representing dataset quality information, in line with the FAIR principles.
 
This document presents community guidelines for consistently curating and representing dataset quality information, in line with the FAIR principles.
  
= Overview =
+
=Overview=
 
This document provides resources for developing community guidelines for consistently curating and representing dataset quality information and captures the outcomes. The guidelines aim to help curate dataset quality information that is findable and accessible, machine- and human-readable, interoperable, and reusable. <br>
 
This document provides resources for developing community guidelines for consistently curating and representing dataset quality information and captures the outcomes. The guidelines aim to help curate dataset quality information that is findable and accessible, machine- and human-readable, interoperable, and reusable. <br>
  
 
The target community is any entity that produces, publishes, manages, or uses digital Earth Science datasets or products. However, the guidelines will be general enough to be applicable to digital datasets of other disciplines.
 
The target community is any entity that produces, publishes, manages, or uses digital Earth Science datasets or products. However, the guidelines will be general enough to be applicable to digital datasets of other disciplines.
  
= Resources =
+
=Guidelines Document=
<big>'''Case Statement''' </big><br>
+
The first baseline of the guidelines document has been released after all review comments and suggestions were considered and addressed within the scope of the document. The latest version of the guidelines document is maintained at https://doi.org/10.31219/osf.io/xsu4p. See the peer-reviewed paper ([http://doi.org/10.5334/dsj-2022-008 Peng et al. 2022]) on the guidelines development process.
* Case statement for developing community guidelines for consistently curating and representing dataset quality information [https://drive.google.com/drive/u/1/folders/1Alhku_Nt2cFe_2ne141fAclXQjA7cYU8 (Peng et al. 2020)]<br>
 
  
<big>'''Multi-dimensions of Data and Information Quality:''' </big><br>
+
'''History'''  
* Quality Attributes for Data Consumers [https://www.tandfonline.com/doi/abs/10.1080/07421222.1996.11518099 (Wang and Strong 1996)] <br>
 
* Multi-dimensions of Earth Science Data and Information Quality [http://www.dlib.org/dlib/july17/ramapriyan/07ramapriyan.html (Ramapriyan et al. 2017)] <br>
 
* Overview of Data Quality Perspectives and Maturity Models [https://datascience.codata.org/articles/10.5334/dsj-2018-007/ (Peng 2018]; [https://www.youtube.com/watch?v=4mmPMYXQg48&list=PLG25fMbdLRa6Y2GLFUKhuuovSTTC2zHAE&index=3&t=4s Peng et al. 2019a: Recording)]<br><br>
 
  
<big>'''Existing Fitness for Purpose Assessment approaches Through the Full Life Cycle of Earth Science Datasets:''' </big><br>
+
A complete draft of the guidelines document (v00r05-20210417) is out for community review. The current document can be accessed at https://doi.org/10.31219/osf.io/xsu4p. A Google Form for anonymous comment collection can be accessed [https://docs.google.com/forms/d/e/1FAIpQLSe93X3Imc4vnpQ3tntjEUd_ce6o-ePUrMWBlf2PjxeBHW5XYw/viewform here] and will be available until Friday, June 4, 2021. Alternatively, you can use [https://drive.google.com/file/d/1Xqu0Rj-rkWQnDZ7zyBEi_ys5IrOTboe4/view?usp=sharing this template] to capture all your comments and suggestions and send it to Ge Peng at [mailto:ge.peng@uah.edu ge.peng@uah.edu], Carlo Lacagnina at [mailto:carlo.lacagnina@bsc.es carlo.lacagnina@bsc.es], or Ivana Ivánová at [mailto:ivana.ivanova@curtin.edu.au ivana.ivanova@curtin.edu.au].
* Measurement systems: <br>
 
** GAIA-CLIM Measurement Maturity Matrix [http://www.gaia-clim.eu/system/files/workpkg_files/640276_Report%20on%20system%20of%20systems%20approach%20adopted%20and%20rationale.pdf (Thorne et al 2015)]<br>
 
  
* Production systems: <br>
+
Community feedback is important in helping us improve the quality of the document. Please contact us if you have any questions.
** CORE-CLIMAX Production System Maturity Matrix [https://www.eumetsat.int/website/home/Data/ClimateService/index.html (EUMETSAT 2013;] [https://www.ecmwf.int/sites/default/files/elibrary/2015/13474-system-maturity-assesment.pdf Schulz et al. 2015)] <br>
 
** DKRZ Quality Maturity Matrix [https://www.dkrz.de/pdfs/poster/Hoeck_et_al_EGU2015_maturitymatrices_15apr.pdf?lang=de (Hock et al. 2015)]<br>
 
** QA4ECV [https://www.mdpi.com/2072-4292/10/8/1254 (Nightingale et al. 2018)] <br>
 
  
* Scientific quality: <br>
+
=Resources=
** NASA Technical Readiness Levels for Operations [http://www.onethesis.com/wp-content/uploads/2016/11/1-s2.0-S0094576509002008-main.pdf (Mankins 2009)]<br>
+
<big>'''Case Statement''' </big>  
** NOAA STAR Data Product Algorithm Maturity Matrix [https://www.mdpi.com/2072-4292/8/2/139 (Zhou, Divakarla & Liu 2016)] <br>
 
** Perspectives of Data Uncertainty [https://esip.figshare.com/articles/Understanding_the_Various_Perspectives_of_Earth_Science_Observational_Data_Uncertainty/10271450 (Moroni et al. 2019)] <br>
 
** OGC UncertML (Williams et al. 2009) <br>
 
** Operational Readiness Levels For Disaster Operations (ESIP Disasters Cluster 2018) <br>  
 
  
* Product quality: <br>
+
*Case statement for developing community guidelines for consistently curating and representing dataset quality information [https://figshare.com/articles/Case_Statement_Community_Guidelines_for_FAIR_Dataset_Quality_Information/12605438 (Peng et al. 2020a)]
** NOAA CDR Product Maturity Matrix [https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2012EO440006 (Bates and Privette 2012)]<br>
+
*Call to Action for Global Access to and Harmonization of Quality Information of Individual Earth Science Datasets ([https://osf.io/nwe5p/ Peng et al. 2020b])
  
* Stewardship quality: <br>
+
<big>'''Summary Report of the Pre-ESIP Workshop'''</big>
** NCEI/CICS-NC Scientific Data Stewardship Maturity Matrix [https://datascience.codata.org/articles/abstract/10.2481/dsj.14-049/ (Peng et al. 2015)]<br>
 
** CEOS WGISS Data Management and Stewardship Maturity Matrix [http://ceos.org/document_management/Working_Groups/WGISS/Interest_Groups/Data_Stewardship/White_Papers/WGISS%20Data%20Management%20and%20Stewardship%20Maturity%20Matrix.pdf (WGISS DSIG 2017)]<br>
 
** WMO Stewardship Maturity Matrix for Climate Data [https://figshare.com/articles/The_WMO-Wide_Stewardship_Maturity_Matrix_for_Climate_Data/7006028 (SMM-CD Working Group 2019)]<br>
 
** GEOSS Data Management Principles and Data Sharing Principles [https://www.earthobservations.org/documents/dswg/201504_data_management_principles_long_final.pdf (GEO DMP TF 2015;] [https://www.earthobservations.org/documents/dswg/10_GEOSS%20Data%20Sharing%20Principles%20post%202015.pdf GEO DSWG 2014)] <br>
 
  
* Service quality:<br>
+
This workshop summary report ([https://osf.io/75b92 Peng et al. 2020c]) provides background for, and summarizes the main takeaways of, a workshop held virtually to kick off the development of community guidelines for consistently curating and representing dataset quality information in line with the FAIR principles.
** Level of Services Models: NSIDC (Duerr et al. 2009) and [https://earthdata.nasa.gov/collaborate/new-missions/level-of-service NASA Earth Science Data System] <br>
 
** NCEI Tiered Scientific Data Stewardship Services [http://www.dlib.org/dlib/may16/peng/05peng.html (Peng et al. 2016)] <br>
 
** [https://www.ncdc.noaa.gov/gosic/gcos-essential-climate-variable-ecv-data-access-matrix GCOS ECV Data and Information Access Matrix] <br>
 
** [https://www.goosocean.org/index.php?option=com_content&view=article&id=125&Itemid=113 Global Ocean Observing System (GOOS) framework] <br>
 
** NCEI/ESIP-DSC Data Use and Services Maturity Matrix [https://figshare.com/articles/MM-Serv_ESIP_2018sum_v2r1_20180709_pdf/6855020 (Serv-MM Working Group 2018)] <br>
 
** Data Use and Impact [https://esip.figshare.com/articles/Assessing_the_Science_Impact_of_Gridded_Population_Data_A_Pilot_Study/10028369 (Downs 2019)] <br><br>
 
  
<big>'''Dataset-level metadata quality: ''' </big><br>
+
<big>'''Multi-dimensions of Data and Information Quality:''' </big>  
* Completeness: NCEI Collection-Level Metadata Rubric Tool <br>
 
* FAIR metadata checklist – NCEAS MetaDIG [https://doi.org/10.5438/2chg-b074 (Habermann 2019 )] <br>
 
* Metadata checklist of LTER network data management system [https://www.sciencedirect.com/science/article/abs/pii/S157495411630108X?via%3Dihub (O’Brien et al. 2016)]<br>
 
* Data set provenance for science [https://www.researchgate.net/profile/Mark_Parsons4/publication/285600042_The_Importance_of_Data_Set_Provenance_for_Science/links/566b2c5a08ae62b05f04c570.pdf (Hills et al. 2015)] <br>
 
* Stewardship quality metadata [https://datascience.codata.org/articles/10.5334/dsj-2019-041/ (Peng et al 2019b)] <br> <br>
 
  
<big>'''Portfolio Management and Repository Certifications ''' </big><br>
+
*Quality Attributes for Data Consumers [https://www.tandfonline.com/doi/abs/10.1080/07421222.1996.11518099 (Wang and Strong 1996)]
* NGDA Data Lifecycle Maturity Model (LMM) [https://communities.geoplatform.gov/ngda-portfolio/2015-lifecycle-maturity-assessment/ (NGDA 2015]; [http://commons.esipfed.org/sites/default/files/2014_FGDC_BaselineAssessment_AGUPoster_PeltzLewisBlakeColemanJohnstonDeLoatch.pdf Peltz-Lewis et al. 2014)] <br>
+
*Multi-dimensions of Earth Science Data and Information Quality [http://www.dlib.org/dlib/july17/ramapriyan/07ramapriyan.html (Ramapriyan et al. 2017)]
* WDS-DSA-RDA Core Trustworthy Data Repository Requirements [https://doi.org/10.5281/zenodo.168411 (Edmunds et al. 2016;] [https://zenodo.org/record/3638211#.XtuwW55Kg0o 2019)]<br>
+
*Overview of Data Quality Perspectives and Maturity Models [https://datascience.codata.org/articles/10.5334/dsj-2018-007/ (Peng 2018]; [https://www.youtube.com/watch?v=4mmPMYXQg48&list=PLG25fMbdLRa6Y2GLFUKhuuovSTTC2zHAE&index=3&t=4s Peng et al. 2019a: Recording)]
* USGS Trusted Data Repository Checklist [http://doi.org/10.5334/dsj-2017-022 (Faundeen 2017)]<br>
 
* The TRUST Principles for Digital Repositories [https://doi.org/10.1038/s41597-020-0486-7 (Lin et al. 2020)] <br><br>
 
  
<big>'''FAIR Data Principles ''' </big><br>
+
<big>'''Existing Fitness for Purpose Assessment approaches Through the Full Life Cycle of Earth Science Datasets:''' </big>  
* FAIR Data Principles [https://www.nature.com/articles/sdata201618 (Wilkinson et al. 2016)]<br>
 
* RDA FAIR Data Maturity Model [https://www.rd-alliance.org/group/fair-data-maturity-model-wg/outcomes/fair-data-maturity-model-specification-and-guidelines (RDA FAIR Data Maturity Model WG 2020)] <br> <br>
 
  
<big>'''Organizational Challenges and Approaches ''' </big><br>
+
*Measurement systems:  
* NASA’s ESDSWG Data Quality Working Group Recommendations [https://www.youtube.com/watch?v=X8B5iHDH1xI&list=PLG25fMbdLRa6Y2GLFUKhuuovSTTC2zHAE&index=2&t=0s (Wei et al. 2019 - recording)]<br>
+
**GAIA-CLIM Measurement Maturity Matrix [http://www.gaia-clim.eu/system/files/workpkg_files/640276_Report%20on%20system%20of%20systems%20approach%20adopted%20and%20rationale.pdf (Thorne et al. 2015)]
* Gaps in Essential Climate Variables Assessment [https://www.mdpi.com/2072-4292/11/8/986 (Nightingale et al. 2019)]<br>
 
* FAIR and Data Management for a Multidisciplinary Research Center [https://ir.uiowa.edu/cgi/viewcontent.cgi?article=1340&context=lib_pubs (Westra and Zhang 2019)]<br> <br>
 
  
<big>'''Data Quality Management Framework ''' </big><br>
+
*Production systems: 
* High Quality Global Data Management Framework for Climate Data (HQ-GDMFC) [https://library.wmo.int/doc_num.php?explnum_id=10197 (WMO 2019)]<br>
+
**CORE-CLIMAX Production System Maturity Matrix [https://www.eumetsat.int/website/home/Data/ClimateService/index.html (EUMETSAT 2013;] [https://www.ecmwf.int/sites/default/files/elibrary/2015/13474-system-maturity-assesment.pdf Schulz et al. 2015)]
* Implementation of a Data Management Quality Management Framework at the Marine Institute, Ireland [https://link.springer.com/article/10.1007/s12145-019-00432-w (Leadbetter et al. 2019)] <br>
+
**DKRZ Quality Maturity Matrix [https://www.dkrz.de/pdfs/poster/Hoeck_et_al_EGU2015_maturitymatrices_15apr.pdf?lang=de (Höck et al. 2015)]
* The Data Quality Challenge. Recommendations for Sustainable Research in the Digital Turn [http://www.rfii.de/download/the-data-quality-challenge-february-2020/ (Rat für Informationsinfrastrukturen 2020)]<br>
+
**QA4ECV [https://www.mdpi.com/2072-4292/10/8/1254 (Nightingale et al. 2018)]
* Conceptual Enterprise Framework for Managing Scientific Data Stewardship [https://datascience.codata.org/articles/10.5334/dsj-2018-015/ (Peng et al. 2018);] [https://doi.org/10.6084/m9.figshare.9171830 (Peng, Privette, & Maycock 2019)]<br> <br>
 
  
= Intended Users =
+
*Scientific quality: 
* Data producers, publishers, providers, and service providers for improved data sharing and reuse; <br>
+
**NASA Technical Readiness Levels for Operations [http://www.onethesis.com/wp-content/uploads/2016/11/1-s2.0-S0094576509002008-main.pdf (Mankins 2009)]
* Data quality management professionals for improved data quality and usability; <br>
+
**NOAA STAR Data Product Algorithm Maturity Matrix [https://www.mdpi.com/2072-4292/8/2/139 (Zhou, Divakarla & Liu 2016; Zhou et al. 2019)]
* Entities or organizations that manage and steward Earth science datasets during any stage of their full life cycle for improved enterprise data management and stewardship; <br>
+
**Perspectives of Data Uncertainty [https://esip.figshare.com/articles/Understanding_the_Various_Perspectives_of_Earth_Science_Observational_Data_Uncertainty/10271450 (Moroni et al. 2019)]
* End users who integrate various datasets and associated quality information for improved interoperability and reusability. <br> <br>
+
**OGC UncertML (Williams et al. 2009)
 +
**Operational Readiness Levels For Disaster Operations ([https://www.esipfed.org/orl ESIP Disasters Cluster])
  
= Definitions =
+
*Product quality: 
* '''''Data''''' can refer to anything that is collected, observed, or derived and used as a basis for reasoning, discussion, or calculation. Data can be either structured or unstructured, and can be represented in quantitative, qualitative, or physical forms. <br>
+
**NOAA CDR Product Maturity Matrix [https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2012EO440006 (Bates and Privette 2012)]
* '''''Scientific or research data''''' is defined as: the recorded factual material commonly accepted in the scientific community as necessary to validate research findings. <br>
+
 
* '''''Digital data''''', distinguished from physical records, such as paper weather reports, are represented in discrete numerical form that can be used by a computer or electronic device. <br>
+
*Stewardship quality:
* '''''Data product''''' refers to “a product that facilitates an end goal through the use of data,” usually with a well-thought out algorithm or approach [http://radar.oreilly.com/2012/07/data-jujitsu.html (Patil 2012)]. Data products tend to be structured and can be raw measurements or scientific products derived from raw measurements or other products. Products can also be statistical or numerical model outputs, including analyses, reanalyses, predictions, or projections. Earth Science data products may be further categorized based on their processing levels. <br>
+
**NCEI/CICS-NC Scientific Data Stewardship Maturity Matrix [https://datascience.codata.org/articles/abstract/10.2481/dsj.14-049/ (Peng et al. 2015)]
* '''''Dataset''''' is an identifiable collection of physical records, a digital rendition of factual materials, or a product of a given version of an algorithm/model. A dataset may contain one or many physical samples or data files in an identical format, having the same geophysical variable(s) and product specification(s), such as the geospatial location or spatial grid. Dataset and data product may be used interchangeably. <br>
+
**CEOS WGISS Data Management and Stewardship Maturity Matrix [http://ceos.org/document_management/Working_Groups/WGISS/Interest_Groups/Data_Stewardship/White_Papers/WGISS%20Data%20Management%20and%20Stewardship%20Maturity%20Matrix.pdf (WGISS DSIG 2017)]
* '''''Knowledge''''' is an abstract concept, defined as a familiarity, awareness, or understanding of someone or something, gained through education, experience, or association. It can refer to a theoretical or practical understanding of a subject. <br>
+
**WMO Stewardship Maturity Matrix for Climate Data [https://figshare.com/articles/The_WMO-Wide_Stewardship_Maturity_Matrix_for_Climate_Data/7006028 (SMM-CD Working Group 2019)]
* '''''Information''''' is considered as data being processed, organized, structured, or presented in a given context, while '''''knowledge''''' is gained from an understanding of the significance of information (Mosely et al. 2009, available at: https://technicspub.com/dmbok). Data and information may overlap and may be used interchangeably.<br>
+
**GEOSS Data Management Principles and Data Sharing Principles [https://www.earthobservations.org/documents/dswg/201504_data_management_principles_long_final.pdf (GEO DMP TF 2015;] [https://www.earthobservations.org/documents/dswg/10_GEOSS%20Data%20Sharing%20Principles%20post%202015.pdf GEO DSWG 2014)]
* '''''Dataset quality''''' includes quality of both data and associated information. <br>
+
 
* '''''Maturity model'''''  refers to a maturity reference or assessment model with desired evolution in discrete stages from a certain aspect or perspective of dataset quality. <br><br>
+
*Service quality:
 +
**Level of Services Models: NSIDC (Duerr et al. 2009) and [https://earthdata.nasa.gov/collaborate/new-missions/level-of-service NASA Earth Science Data System]
 +
**NCEI Tiered Scientific Data Stewardship Services [http://www.dlib.org/dlib/may16/peng/05peng.html (Peng et al. 2016)]
 +
**[https://www.ncdc.noaa.gov/gosic/gcos-essential-climate-variable-ecv-data-access-matrix GCOS ECV Data and Information Access Matrix]
 +
**[https://www.goosocean.org/index.php?option=com_content&view=article&id=125&Itemid=113 Global Ocean Observing System (GOOS) framework]
 +
**NCEI/ESIP-DSC Data Use and Services Maturity Matrix [https://figshare.com/articles/MM-Serv_ESIP_2018sum_v2r1_20180709_pdf/6855020 (Serv-MM Working Group 2018)]
 +
**Data Use and Impact [https://esip.figshare.com/articles/Assessing_the_Science_Impact_of_Gridded_Population_Data_A_Pilot_Study/10028369 (Downs 2019)]
 +
 
 +
<big>'''Dataset-level metadata quality: ''' </big>
 +
 
 +
*Completeness: NCEI Collection-Level Metadata Rubric Tool [https://data.noaa.gov//metaview/page?xml=NOAA/NESDIS/NGDC/Collection/iso/xml/ngdc_dems.xml&view=rubricv2/recordHTML (An Assessment Example)]
 +
*FAIR metadata checklist – NCEAS MetaDIG [https://doi.org/10.5438/2chg-b074 (Habermann 2019)]
 +
*Metadata checklist of LTER network data management system [https://www.sciencedirect.com/science/article/abs/pii/S157495411630108X?via%3Dihub (O’Brien et al. 2016)]
 +
*Data set provenance for science [https://www.researchgate.net/profile/Mark_Parsons4/publication/285600042_The_Importance_of_Data_Set_Provenance_for_Science/links/566b2c5a08ae62b05f04c570.pdf (Hills et al. 2015)]
 +
*Stewardship quality metadata [https://datascience.codata.org/articles/10.5334/dsj-2019-041/ (Peng et al. 2019b)]
 +
*AtMoDat (ATmospheric MOdel DATa) Maturity Indicator Metadata [https://www.atmodat.de/dissemination/2020_presentation_egu_neumann?lang=en (Neumann et al. 2020)]
 +
 
 +
<big>'''Portfolio Management and Repository Certifications ''' </big>
 +
 
 +
*NGDA Data Lifecycle Maturity Model (LMM) [https://communities.geoplatform.gov/ngda-portfolio/2015-lifecycle-maturity-assessment/ (NGDA 2015]; [http://commons.esipfed.org/sites/default/files/2014_FGDC_BaselineAssessment_AGUPoster_PeltzLewisBlakeColemanJohnstonDeLoatch.pdf Peltz-Lewis et al. 2014)]
 +
*CoreTrustSeal Trustworthy Data Repository Requirements [https://doi.org/10.5281/zenodo.168411 (Edmunds et al. 2016;] [https://zenodo.org/record/3638211#.XtuwW55Kg0o 2019)]
 +
*USGS Trusted Data Repository Checklist [http://doi.org/10.5334/dsj-2017-022 (Faundeen 2017)]
 +
*The TRUST Principles for Digital Repositories [https://doi.org/10.1038/s41597-020-0486-7 (Lin et al. 2020)]
 +
 
 +
<big>'''FAIR Data Principles ''' </big>
 +
 
 +
*FAIR Data Principles [https://www.nature.com/articles/sdata201618 (Wilkinson et al. 2016)]
 +
*RDA FAIR Data Maturity Model [https://www.rd-alliance.org/group/fair-data-maturity-model-wg/outcomes/fair-data-maturity-model-specification-and-guidelines (RDA FAIR Data Maturity Model WG 2020)]
 +
*EOSC FAIR Metrics [https://zenodo.org/record/4106116#.X5AxgXhKhm8 (Genova et al. 2020)]
 +
*A self-assessment tool to measure the FAIR-ness of an organization [https://zenodo.org/record/4080867#.X5AvlXhKhm8 (Bruin et al. 2020)]
 +
 
 +
<big>'''Organizational Challenges and Approaches ''' </big>
 +
 
 +
*NASA’s ESDSWG Data Quality Working Group Recommendations [https://www.youtube.com/watch?v=X8B5iHDH1xI&list=PLG25fMbdLRa6Y2GLFUKhuuovSTTC2zHAE&index=2&t=0s (Wei et al. 2019 - recording)]
 +
*Gaps in Essential Climate Variables Assessment [https://www.mdpi.com/2072-4292/11/8/986 (Nightingale et al. 2019)]
 +
*FAIR and Data Management for a Multidisciplinary Research Center [https://ir.uiowa.edu/cgi/viewcontent.cgi?article=1340&context=lib_pubs (Westra and Zhang 2019)]
 +
 
 +
<big>'''Data Quality Management Framework ''' </big>
 +
 
 +
*High Quality Global Data Management Framework for Climate Data (HQ-GDMFC) [https://library.wmo.int/doc_num.php?explnum_id=10197 (WMO 2019)]
 +
*Implementation of a Data Management Quality Management Framework at the Marine Institute, Ireland [https://link.springer.com/article/10.1007/s12145-019-00432-w (Leadbetter et al. 2019)]
 +
*The Data Quality Challenge. Recommendations for Sustainable Research in the Digital Turn [http://www.rfii.de/download/the-data-quality-challenge-february-2020/ (Rat für Informationsinfrastrukturen 2020)]
 +
*Conceptual Enterprise Framework for Managing Scientific Data Stewardship [https://datascience.codata.org/articles/10.5334/dsj-2018-015/ (Peng et al. 2018);] [https://doi.org/10.6084/m9.figshare.9171830 (Peng, Privette, & Maycock 2019)]
 +
 
 +
=Intended Users=
 +
 
 +
*Data producers, publishers, providers, and service providers for improved data sharing and reuse;
 +
*Data quality management professionals for improved data quality and usability;
 +
*Entities or organizations that manage and steward Earth science datasets during any stage of their full life cycle for improved enterprise data management and stewardship;
 +
*End users who integrate various datasets and associated quality information for improved interoperability and reusability.
 +
 
 +
=Definitions=
 +
 
 +
*'''''Data''''' are representations of observations, objects, or other entities and can refer to anything that is collected, observed, generated or derived, and used as a basis for reasoning, discussion, or calculation. Data can be either structured or unstructured, and can be represented in quantitative, qualitative, or physical forms.
 +
*'''''Scientific or research data''''' is defined as: the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.
 +
*'''''Digital data''''', distinguished from physical records, such as paper weather reports, are represented in discrete numerical form that can be used by a computer or electronic device.
 +
*'''''Data product''''' refers to “a product that facilitates an end goal through the use of data,” usually with a well-thought-out algorithm or approach [http://radar.oreilly.com/2012/07/data-jujitsu.html (Patil 2012)]. Data products tend to be structured and can be raw measurements or scientific products derived from raw measurements or other products. Products can also be statistical or numerical model outputs, including analyses, reanalyses, predictions, or projections. Earth Science data products may be further categorized based on their processing levels.
 +
*'''''Dataset''''' is an identifiable collection of physical records, and it can be processed, curated, or published by a single agent. It may refer to a digital rendition of factual materials, or a product of a given version of an algorithm/model. A dataset may contain one or many physical samples or data files in an identical format, having the same geophysical variable(s) and product specification(s), such as the geospatial location or spatial grid. The general notion of datasets currently found in the literature is characterized by an interrelated family of more specific concepts: grouping, content, relatedness, and purpose ([https://doi.org/10.1002/meet.14504701240 Renear et al. 2010]). Dataset and data product may be used interchangeably.
 +
*'''''Dataset quality''''' includes the quality of both data and associated information, examples of which are metadata, software, algorithms, and practices or procedures applied to the dataset throughout its entire life cycle. Dataset quality is a multi-dimensional construct: a perception and/or a judgment of the data's fitness or trustworthiness to serve intended research uses in a given context.
 +
*'''''Dataset quality information''''' includes both descriptive quality information, such as that captured in documents (e.g., papers or reports), and quality metadata captured in a metadata record, throughout the entire life cycle of a dataset.
 +
*'''''Information''''' is considered as data being processed, organized, structured, communicated or presented so as to be meaningful to the recipient in a given context.
 +
*'''''Knowledge''''' is an abstract concept, defined as a familiarity, awareness, or understanding of someone or something, gained through education, experience, or association. It can refer to a theoretical or practical understanding of a subject.
 +
*'''''Maturity model'''''  refers to a maturity reference or assessment model with desired evolution in discrete stages from a certain aspect or perspective of dataset quality.
  
 
---

Latest revision as of 07:21, July 22, 2022

Document

This document presents community guidelines for consistently curating and representing dataset quality information, in line with the FAIR principles.

Overview

This document provides resources for developing community guidelines for consistently curating and representing dataset quality information and captures the outcomes. The guidelines aim to help curate dataset quality information that is findable and accessible, machine- and human-readable, interoperable, and reusable.

The target community is any entity that produces, publishes, manages, or uses digital Earth Science datasets or products. However, the guidelines will be general enough to be applicable to digital datasets of other disciplines.

Guidelines Document

The first baseline of the guidelines document has been released after all review comments and suggestions were considered and addressed within the scope of the document. The latest version of the guidelines document is maintained at https://doi.org/10.31219/osf.io/xsu4p. See the peer-reviewed paper (Peng et al. 2022) on the guidelines development process.

History

A complete draft of the guidelines document (v00r05-20210417) is out for community review. The current document can be accessed at https://doi.org/10.31219/osf.io/xsu4p. A Google Form for anonymous comment collection can be accessed here and will be available until Friday, June 4, 2021. Alternatively, you can use this template to capture all your comments and suggestions and send it to Ge Peng at ge.peng@uah.edu, Carlo Lacagnina at carlo.lacagnina@bsc.es, or Ivana Ivánová at ivana.ivanova@curtin.edu.au.

Community feedback is important in helping us improve the quality of the document. Please contact us if you have any questions.

Resources

Case Statement

  • Case statement for developing community guidelines for consistently curating and representing dataset quality information (Peng et al. 2020a)
  • Call to Action for Global Access to and Harmonization of Quality Information of Individual Earth Science Datasets (Peng et al. 2020b)

Summary Report of the Pre-ESIP Workshop

This workshop summary report (Peng et al. 2020c) provides background for, and summarizes the main takeaways of, a workshop held virtually to kick off the development of community guidelines for consistently curating and representing dataset quality information in line with the FAIR principles.

Multi-dimensions of Data and Information Quality:

Existing Fitness for Purpose Assessment approaches Through the Full Life Cycle of Earth Science Datasets:

Dataset-level metadata quality:

Portfolio Management and Repository Certifications

FAIR Data Principles

Organizational Challenges and Approaches

Data Quality Management Framework

Intended Users

  • Data producers, publishers, providers, and service providers for improved data sharing and reuse;
  • Data quality management professionals for improved data quality and usability;
  • Entities or organizations that manage and steward Earth science datasets during any stage of their full life cycle for improved enterprise data management and stewardship;
  • End users who integrate various datasets and associated quality information for improved interoperability and reusability.

Definitions

  • Data are representations of observations, objects, or other entities and can refer to anything that is collected, observed, generated or derived, and used as a basis for reasoning, discussion, or calculation. Data can be either structured or unstructured, and can be represented in quantitative, qualitative, or physical forms.
  • Scientific or research data is defined as: the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.
  • Digital data, distinguished from physical records, such as paper weather reports, are represented in discrete numerical form that can be used by a computer or electronic device.
  • Data product refers to “a product that facilitates an end goal through the use of data,” usually with a well-thought-out algorithm or approach (Patil 2012). Data products tend to be structured and can be raw measurements or scientific products derived from raw measurements or other products. Products can also be statistical or numerical model outputs, including analyses, reanalyses, predictions, or projections. Earth Science data products may be further categorized based on their processing levels.
  • Dataset is an identifiable collection of physical records, and it can be processed, curated, or published by a single agent. It may refer to a digital rendition of factual materials, or a product of a given version of an algorithm/model. A dataset may contain one or many physical samples or data files in an identical format, having the same geophysical variable(s) and product specification(s), such as the geospatial location or spatial grid. The general notion of datasets currently found in the literature is characterized by an interrelated family of more specific concepts: grouping, content, relatedness, and purpose (Renear et al. 2010). Dataset and data product may be used interchangeably.
  • Dataset quality includes the quality of both data and associated information, examples of which are metadata, software, algorithms, and practices or procedures applied to the dataset throughout its entire life cycle. Dataset quality is a multi-dimensional construct: a perception and/or a judgment of the data's fitness or trustworthiness to serve intended research uses in a given context.
  • Dataset quality information includes both descriptive quality information, such as that captured in documents (e.g., papers or reports), and quality metadata captured in a metadata record, throughout the entire life cycle of a dataset.
  • Information is considered as data being processed, organized, structured, communicated or presented so as to be meaningful to the recipient in a given context.
  • Knowledge is an abstract concept, defined as a familiarity, awareness, or understanding of someone or something, gained through education, experience, or association. It can refer to a theoretical or practical understanding of a subject.
  • Maturity model refers to a maturity reference or assessment model with desired evolution in discrete stages from a certain aspect or perspective of dataset quality.
