Difference between revisions of "Metadata Dialects"

From Earth Science Information Partners (ESIP)
 
(17 intermediate revisions by 2 users not shown)
Line 1: Line 1:
Many scientific datasets and products are documented using approaches and tools developed by scientists and data collectors to support their own analysis and understanding. This documentation exists in notebooks, scientific papers, web pages, user guides, word processing documents, spreadsheets, data dictionaries, PDFs, databases, custom binary and ASCII formats, and almost any other conceivable form, each with associated storage and preservation strategies. This custom, often unstructured, approach may work well for independent investigators or in the confines of a particular laboratory or community, but it makes it difficult for users outside of these small groups to discover, access, use, and independently understand the data without consulting its creators.
+
Metadata content can be approached in a variety of “dialects,” depending on the needs of specific user communities.  Though different, these languages also significantly overlap – as the “who, where, when, why, and how” must always be addressed, regardless of the community approach. Thus, in reality, these differences in approach are more akin to dialects of a universal documentation language than multiple, disparate languages.  As such, for the purposes of this work, the term “metadata dialect” will refer to standardized metadata documentation approaches, in order to promote emphasis on universal documentation concepts as opposed to implementation of individual standards.
 +
The following are some of the most common dialects used throughout the ESIP community.
  
Metadata, in contrast to documentation, helps address discovery, access, use, and understanding by providing well-defined content in structured representations. This makes it possible for users to access and quickly understand many aspects of datasets that they have not collected. It also makes it possible to integrate metadata into discovery and analysis tools, and to provide consistent references from the metadata to external documentation.
+
''Note:  While they are discussed independently, a dialect can use aspects of other dialects within its own — if the two dialects have the same/similar structure or the same file format.''
  
Different disciplines and communities have developed their own approaches to addressing the challenges of documenting data (or other resources) and the steps used in collection, processing, and analysis of those data. Communities frequently refer to the results of these efforts as “metadata standards”. In addition, national and international standards bodies or agencies produce formal metadata standards, usually with the intent of supporting some set of requirements (e.g., data discovery) across discipline and national boundaries.
+
* [[ADIwg (Alaska Data Integration Working Group)]]
 +
* [[CSDGM  (FGDC Content Standard for Digital Geospatial Metadata)]]
 +
* [[DCAT (Data Catalog Vocabulary)]]
 +
* [[Dcite (DataCite 3.1)]]
 +
* [[DIF (Directory Interchange Format)]]
 +
* [[Dryad]]
 +
* [[ECHO (EARTH OBSERVING SYSTEM (EOS) CLEARINGHOUSE)]]
 +
* [[ECS (EOSDIS Core System)]]
 +
* [[EML (Ecological Metadata Language)]]
 +
* [[HCLS (Dataset Descriptions: HCLS Community Profile)]]
 +
* [[HDF EOS5 (Hierarchical Data Format Earth Observing System 5)]]
 +
* [[ISO]]
 +
* [[ISO -1]]
 +
* [[netCDF (Network Common Data Format) Conventions]]
 +
* [[SERF (Service Entry Resource Format)]]
 +
* [[SOS (Sensor Observation Service)]]
 +
* [[THREDDS (Thematic Realtime Environmental Distributed Data Services)]]
 +
* [[WSDL (Web Service Description Language)]]
  
Unfortunately, this approach often emphasizes differences between these communities. The ubiquitous “who, where, when, why, and how” questions must be answered in any discipline, so there is significant overlap between many of the concepts included in these “standards”. In fact, these standards are more like dialects of a universal documentation language than they are like separate languages. The term “metadata dialect” is introduced here as a substitute for “metadata standard” to indicate a focus on universal documentation concepts rather than implementations in a particular “standard”.
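
The overlap between dialects can be pictured as a crosswalk: one universal concept, several dialect-specific locations. The following is a minimal sketch, not an authoritative mapping. The concept labels (<code>title</code>, <code>abstract</code>) and the helper <code>dialect_path</code> are hypothetical names chosen for illustration, and the element paths are simplified and may differ between versions of each dialect.

```python
# Illustrative crosswalk: a universal documentation concept mapped to the
# place where each dialect typically records it. Paths are simplified and
# version-dependent; treat them as assumptions, not a normative mapping.
CROSSWALK = {
    "title": {
        "CSDGM": "idinfo/citation/citeinfo/title",
        "ISO": "gmd:identificationInfo/gmd:MD_DataIdentification/"
               "gmd:citation/gmd:CI_Citation/gmd:title",
        "DIF": "Entry_Title",
        "EML": "dataset/title",
    },
    "abstract": {
        "CSDGM": "idinfo/descript/abstract",
        "ISO": "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:abstract",
        "DIF": "Summary",
        "EML": "dataset/abstract",
    },
}


def dialect_path(concept, dialect):
    """Return where a dialect stores a concept, or None if it is unmapped."""
    return CROSSWALK.get(concept, {}).get(dialect)


if __name__ == "__main__":
    for dialect in ("CSDGM", "ISO", "DIF", "EML"):
        print(dialect, "->", dialect_path("title", dialect))
```

Keeping the concept as the key, rather than any one dialect's element name, reflects the emphasis on universal documentation concepts over individual standards.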
 
  
==Content Standard for Digital Geospatial Metadata (CSDGM)==
 
The Federal Geographic Data Committee ([http://fgdc.gov FGDC]) is a U.S. Federal interagency committee that promotes the coordinated development, use, sharing, and dissemination of geospatial data. This nationwide data publishing effort is known as the National Spatial Data Infrastructure (NSDI). The NSDI is a physical, organizational, and virtual network designed to enable the development and sharing of this nation's digital geographic information resources. FGDC activities are administered through the FGDC Secretariat, hosted by the U.S. Geological Survey.
 
  
The Content Standard for Digital Geospatial Metadata  ([http://www.fgdc.gov/metadata/geospatial-metadata-standards#csdgm CSDGM]) is commonly referred to as FGDC Metadata. It is an important metadata dialect for data distributed by Federal and State governments in the United States and by data providers around the world.
+
[[Let's Start at the Beginning]]
 
 
==ISO==
 
ISO International Standards ensure that products and services are safe, reliable and of good quality. For business, they are strategic tools that reduce costs by minimizing waste and errors and increasing productivity. They help companies to access new markets, level the playing field for developing countries and facilitate free and fair global trade.
 
 
 
ISO Standards for geographic data and related resources are managed by ISO Technical Committee 211 ([http://www.isotc211.org/ TC211]). ISO 19115:2003 defines the schema required for describing geographic information and services. It provides information about the identification, the extent, the quality, the spatial and temporal schema, spatial reference, and distribution of digital geographic data.
 
 
 
'''ISO 19115:2003 is applicable to:'''
 
*the cataloguing of datasets, clearinghouse activities, and the full description of datasets;
 
*geographic datasets, dataset series, and individual geographic features and feature properties.
 
 
 
'''ISO 19115:2003 defines:'''
 
*mandatory and conditional metadata sections, metadata entities, and metadata elements;
 
*the minimum set of metadata required to serve the full range of metadata applications (data discovery, determining data fitness for use, data access, data transfer, and use of digital data);
 
*optional metadata elements - to allow for a more extensive standard description of geographic data, if required;
 
*a method for extending metadata to fit specialized needs.
 
 
 
Though ISO 19115:2003 is applicable to digital data, its principles can be extended to many other forms of geographic data such as maps, charts, and textual documents as well as non-geographic data.
 
 
 
ISO 19115-2:2009 extends the existing geographic metadata standard by defining the schema required for describing imagery and gridded data. It provides information about the properties of the measuring equipment used to acquire the data, the geometry of the measuring process employed by the equipment, and the production process used to digitize the raw data. This extension deals with metadata needed to describe the derivation of geographic information from raw data, including the properties of the measuring system, and the numerical methods and computational procedures used in the derivation. The metadata required to address coverage data in general is addressed sufficiently in the general part of ISO 19115.
 
[http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml Codelist]
 
 
 
==DIF (Directory Interchange Format)==
 
The Directory Interchange Format ([http://gcmd.nasa.gov/User/difguide/whatisadif.html DIF]) supports data discovery in NASA's Global Change Master Directory. The DIF structure has been flexible enough to evolve with growing metadata requirements, especially for the geospatial disciplines. DIF is a "container" for the metadata elements that are maintained in the International Directory Network (IDN) database, where validation for mandatory fields, keywords, personnel, etc. takes place. The DIF is used to create directory entries, each of which describes a group of data. A DIF consists of a collection of fields which detail specific information about the data. Eight fields are required in the DIF; the others expand upon and clarify the information. Some of the fields are text fields; others require the use of controlled keywords (sometimes known as "valids").
 
An [http://gcmd.nasa.gov/Aboutus/xml/dif/dif.xsd XML schema] is available.
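
As a rough illustration of the required-field idea, the sketch below reports which required fields are absent or empty in a DIF-like XML record. It is not a substitute for validating against the XML schema. The function name <code>missing_fields</code> is hypothetical, and the list of required field names is passed in by the caller; the three names in the usage example are placeholders, so consult the DIF guide for the actual eight.

```python
import xml.etree.ElementTree as ET


def missing_fields(xml_text, required):
    """Return the required top-level fields that are absent or empty.

    `required` is supplied by the caller; this sketch does not hard-code
    the DIF's own list of mandatory fields.
    """
    root = ET.fromstring(xml_text)
    present = set()
    for child in root:
        # Drop any "{namespace}" prefix so only the local element name is compared.
        local_name = child.tag.rsplit("}", 1)[-1]
        has_text = bool(child.text and child.text.strip())
        has_children = len(child) > 0
        if has_text or has_children:
            present.add(local_name)
    return [name for name in required if name not in present]


if __name__ == "__main__":
    record = """<DIF>
      <Entry_ID>EXAMPLE_ID</Entry_ID>
      <Entry_Title>Example title</Entry_Title>
      <Summary></Summary>
    </DIF>"""
    # Placeholder field names for illustration only.
    print(missing_fields(record, ["Entry_ID", "Entry_Title", "Summary"]))
```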
 
 
 
==ECHO==
 
The NASA-developed Earth Observing System (EOS) Clearinghouse ([http://earthdata.nasa.gov/sites/default/files/esdswg/spg/rfc/esds-rfc-020/ECHO-10-Data-Partner-User-Guide-v-10.6.pdf ECHO]) is a spatial and temporal metadata registry that enables the science community to more easily use and exchange NASA's data and services. ECHO's main objective is to enable broader use of NASA's EOS data. It allows users to more efficiently search and access data and services and increases the potential for interoperability with new tools and services. The value of these resources increases as the potential to exchange and interoperate increases. ECHO has been working with other organizations to provide their Earth science metadata alongside NASA's for users to search and access. ECHO stores metadata from a variety of science disciplines and domains, including Climate Variability and Change, Carbon Cycle and Ecosystems, Earth Surface and Interior, Atmospheric Composition, Weather, and Water and Energy Cycle.
 
 
 
==ECS (U.S. Extended Continental Shelf)==
 
The U.S. Extended Continental Shelf ([http://www.ngdc.noaa.gov/mgg/ecs/ecs.html ECS]) Project is a multi-agency collaboration whose goal is to determine and define the extent of the U.S. continental shelf beyond 200 nautical miles (nm). Under international law as reflected in the 1982 United Nations Convention on the Law of the Sea (UNCLOS), every country is entitled to a continental shelf extending 200 nm from the coastline. The extended continental shelf (ECS) is the area that lies beyond this 200 nm limit where the U.S. has sovereign rights to the resources of the seafloor and sub-seafloor. The ECS Data Management Team will use common templates for all metadata, with controlled vocabularies for science parameters, measurement units, platforms, instruments, programs, formats, domain-specific content, etc.
 
 
 
==netCDF (Network Common Data Form) Conventions==
 
Network Common Data Form ([http://www.unidata.ucar.edu/software/netcdf/ NetCDF]) is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. The University Corporation for Atmospheric Research (UCAR) is the chief source of netCDF software, standards development, updates, etc. The format is an open standard. NetCDF Classic and 64-bit Offset Format are an international standard of the [http://www.opengeospatial.org/standards/netcdf Open Geospatial Consortium (OGC)].
 
The project is actively supported by UCAR.  Version 4.0 (2008) greatly enhanced the data model by allowing the use of the HDF5 data file format. Version 4.1 (2010) added support for C and Fortran client access to specified subsets of remote data via [http://www.opendap.org/ OPeNDAP]. The format was originally based on the conceptual model of the NASA CDF but has since diverged and is no longer compatible with it.
 
 
 
Sharing data written in netCDF is facilitated by using the OGC [http://www.opengeospatial.org/standards/netcdf CF-netCDF (Climate-Forecast)] standard, the [https://geo-ide.noaa.gov/wiki/index.php?title=NetCDF_Attribute_Convention_for_Dataset_Discovery NetCDF Attribute Convention for Dataset Discovery] and the [https://geo-ide.noaa.gov/wiki/index.php?title=NODC_NetCDF_Templates NODC NetCDF templates].
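
Conventions such as the Attribute Convention for Dataset Discovery work by asking for specific global attributes in the file. The sketch below checks a plain dictionary of global attributes for three discovery attributes (<code>title</code>, <code>summary</code>, <code>keywords</code>). It assumes the attributes have already been read from the file by whatever netCDF library is in use; the function name and the short attribute list are illustrative, and the convention itself defines many more attributes.

```python
# A small, illustrative subset of discovery attributes; the convention
# itself lists many more, grouped by how strongly they are recommended.
DISCOVERY_ATTRIBUTES = ("title", "summary", "keywords")


def missing_discovery_attributes(global_attrs):
    """Return discovery attributes that are absent or blank.

    `global_attrs` is a plain dict of global attribute names to values,
    assumed to have been read from a netCDF file beforehand.
    """
    return [
        name
        for name in DISCOVERY_ATTRIBUTES
        if not str(global_attrs.get(name, "")).strip()
    ]


if __name__ == "__main__":
    attrs = {"title": "Example sea surface temperature grid", "summary": " "}
    print(missing_discovery_attributes(attrs))
```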
 
 
 
Sharing radial data is facilitated by using the [http://www.ral.ucar.edu/projects/titan/docs/radial_formats/cfradial.html Cf/Radial] convention.  Cf/Radial is a CF-compliant convention for radial data from radar and lidar platforms that supports both airborne and ground-based sensors. The [http://www.eol.ucar.edu/  National Center for Atmospheric Research (NCAR) Earth Observing Laboratory] provides a [http://www.ral.ucar.edu/projects/titan/docs/radial_formats/radx.html#radx_lib  C++ library] and [http://www.ral.ucar.edu/projects/titan/docs/radial_formats/radx.html#radx_apps applications] to read and write Cf/Radial, [http://www.ral.ucar.edu/projects/titan/docs/radial_formats/index.html#dorade DORADE], [http://www.ral.ucar.edu/projects/titan/docs/radial_formats/index.html#uf UF], [http://www.eol.ucar.edu/Members/dennisf/foray FORAY-1], [http://www.roc.noaa.gov/WSR88D/PublicDocs/Publications/120214.pdf NEXRAD], and [ftp://ftp.sigmet.com/outgoing/manuals/program/3data.pdf SIGMET] formats.  NCAR, [http://www.unidata.ucar.edu/ Unidata], [http://www.arm.gov/ DOE/ARM], [http://www.nssl.noaa.gov/ NOAA/NSSL], various universities, and commercial radar vendors ([http://www.eecradar.com/ EEC] and [http://www.prosensing.com/ Pro Sensing]) are using Cf/Radial.
 
 
 
==THREDDS==
 
The [http://www.unidata.ucar.edu/projects/THREDDS/ THREDDS] (Thematic Realtime Environmental Distributed Data Services) project is developing middleware to bridge the gap between data providers and data users. The goal is to simplify the discovery and use of scientific data and to allow scientific publications and educational materials to reference scientific data. The [http://www.unidata.ucar.edu/software/tds/ THREDDS Data Server] includes an XML representation for documenting catalogs and datasets included in them. This is the dialect described here.
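
To give a feel for the catalog XML, the sketch below lists the datasets found in a small, made-up catalog document. The sample catalog, the dataset names, and the function <code>list_datasets</code> are assumptions for illustration; the namespace URI is the one commonly seen in THREDDS catalogs (InvCatalog v1.0), and real catalogs carry considerably more structure, such as services and nested metadata.

```python
import xml.etree.ElementTree as ET

# Namespace assumed here is the InvCatalog v1.0 namespace commonly used
# by THREDDS catalogs.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# A made-up catalog, reduced to the two attributes this sketch reads.
SAMPLE_CATALOG = """<catalog
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    name="Example catalog">
  <dataset name="Example collection">
    <dataset name="Example grid" urlPath="example/grid.nc"/>
  </dataset>
</catalog>"""


def list_datasets(xml_text):
    """Return (name, urlPath) for every dataset element, in document order.

    Container datasets without a urlPath attribute yield None for it.
    """
    root = ET.fromstring(xml_text)
    return [
        (dataset.get("name"), dataset.get("urlPath"))
        for dataset in root.iterfind(".//t:dataset", NS)
    ]


if __name__ == "__main__":
    for name, url_path in list_datasets(SAMPLE_CATALOG):
        print(name, url_path)
```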
 
 
 
==SOS==
 
The Open Geospatial Consortium [http://www.opengeospatial.org/standards/sos Sensor Observation Service] provides standard tools for managing sensor networks and data in an interoperable way. This standard defines a Web service interface which allows querying observations, sensor metadata, as well as representations of observed features. The initial request to an SOS server is for capabilities of the server. The GetCapabilities response is the dialect considered here.
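
The initial capabilities request is commonly issued as a key-value-pair HTTP GET. The sketch below only builds such a URL and does not contact a server. The endpoint <code>http://example.org/sos</code> and the function name are placeholders, and individual servers may expect additional parameters (for example, accepted versions).

```python
from urllib.parse import urlencode


def get_capabilities_url(endpoint):
    """Build a key-value-pair GetCapabilities URL for an SOS endpoint.

    `endpoint` is the service base URL; it may already contain a query string.
    """
    query = urlencode({"service": "SOS", "request": "GetCapabilities"})
    separator = "&" if "?" in endpoint else "?"
    return f"{endpoint}{separator}{query}"


if __name__ == "__main__":
    # Placeholder endpoint, not a real service.
    print(get_capabilities_url("http://example.org/sos"))
```

The XML document returned for this request is the GetCapabilities response treated as the dialect here.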
 
 
 
==EML==
 
Ecological Metadata Language ([http://knb.ecoinformatics.org/software/eml/ EML]) is a metadata specification developed by and for the ecology discipline. It is based on prior work done by the Ecological Society of America and associated efforts (Michener et al., 1997, Ecological Applications). EML is implemented as a series of XML document types that can be used in a modular and extensible manner to document ecological data. Each EML module is designed to describe one logical part of the total metadata that should be included with any ecological dataset.
 
 
 
==Dryad==
 
[http://datadryad.org Dryad] is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences. Dryad enables scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies. Dryad is governed by a consortium of journals that collaboratively promote data archiving and ensure the sustainability of the repository.
 
 
 
[[Category:Documentation Cluster]]
 
[[Category:Documentation Connections]]
 

Latest revision as of 11:40, July 29, 2015

Metadata content can be approached in a variety of “dialects,” depending on the needs of specific user communities. Though different, these languages also significantly overlap – as the “who, where, when, why, and how” must always be addressed, regardless of the community approach. Thus, in reality, these differences in approach are more akin to dialects of a universal documentation language than multiple, disparate languages. As such, for the purposes of this work, the term “metadata dialect” will refer to standardized metadata documentation approaches, in order to promote emphasis on universal documentation concepts as opposed to implementation of individual standards. The following are some of the most common dialects used throughout the ESIP community.

Note: While they are discussed independently, a dialect can use aspects of other dialects within its own — if the two dialects have the same/similar structure or the same file format.


Let's Start at the Beginning