Metadata Dialects

From Earth Science Information Partners (ESIP)
 
Traditionally, the term metadata is used in the world of libraries for a set of descriptors describing the documents, such as books and dissertations, that libraries hold. Each document is associated with a catalogue card containing these descriptions, which is used for discovery and management purposes. In the world of digital libraries, the term “metadata” has seen a revival. On the Internet it is becoming ever more difficult to discover a resource that may be useful for the work at hand, and in ever-growing repositories of highly interrelated digital resources, management is also an increasingly difficult task.

Therefore, the term “metadata” now usually refers to machine-readable, structured data of the keyword/value type describing Internet resources as a whole. These descriptors can be used to discover and manage resources that may be distributed all over the Internet. The set of descriptors and their structural arrangement are specified by DTDs or XML schemas, and their semantics should be defined carefully according to ISO standards. Of course, this type of metadata has to be openly accessible.[http://www.mpi.nl/echo/tec-rep/wp2-tr08-2003v1.pdf]

Metadata content can be approached in a variety of “dialects,” depending on the needs of specific user communities. Though different, these languages overlap significantly, because the “who, where, when, why, and how” must always be addressed, regardless of the community's approach. In reality, then, these differences in approach are more akin to dialects of a universal documentation language than to multiple, disparate languages. For the purposes of this work, the term “metadata dialect” refers to a standardized metadata documentation approach; the term is meant to emphasize universal documentation concepts rather than the implementation of individual standards.

The following are some of the most common dialects used throughout the ESIP community.
  
* [[ADIwg (Alaska Data Integration Working Group)]]
* [[CSDGM (FGDC Content Standard for Digital Geospatial Metadata)]]
* [[DCAT (Data Catalog Vocabulary)]]
* [[Dcite (DataCite 3.1)]]
* [[DIF (Directory Interchange Format)]]
* [[Dryad]]
* [[ECHO (EARTH OBSERVING SYSTEM (EOS) CLEARINGHOUSE)]]
* [[ECS (EOSDIS Core System)]]
* [[EML (Ecological Metadata Language)]]
* [[HCLS (Dataset Descriptions: HCLS Community Profile)]]
* [[HDF EOS5 (Hierarchical Data Format Earth Observing System 5)]]
* [[ISO]]
* [[ISO -1]]
* [[netCDF (Network Common Data Format) Conventions]]
* [[SERF (Service Entry Resource Format)]]
* [[SOS (Sensor Observation Service)]]
* [[THREDDS (Thematic Realtime Environmental Distributed Data Services)]]
* [[WSDL (Web Service Description Language)]]

''Note: While the dialects are discussed independently, a dialect can incorporate aspects of another dialect when the two share the same or a similar structure, or the same file format.''

==FGDC Content Standard for Digital Geospatial Metadata (CSDGM)==

The Federal Geographic Data Committee (FGDC) is an interagency committee that promotes the coordinated development, use, sharing, and dissemination of geospatial data on a national basis. This nationwide data publishing effort is known as the National Spatial Data Infrastructure (NSDI). The NSDI is a physical, organizational, and virtual network designed to enable the development and sharing of this nation's digital geographic information resources. FGDC activities are administered through the FGDC Secretariat, hosted by the U.S. Geological Survey.

The Content Standard for Digital Geospatial Metadata ([http://www.fgdc.gov/metadata/geospatial-metadata-standards#csdgm CSDGM]) is commonly referred to as FGDC metadata. It is the dialect considered here.
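To give a feel for the CSDGM encoding, the sketch below uses Python's standard <code>xml.etree.ElementTree</code> module to read the title and bounding coordinates from a CSDGM record in its XML form. The element names (<code>idinfo</code>, <code>citeinfo</code>, <code>spdom</code>, <code>bounding</code>, <code>westbc</code>, ...) follow the CSDGM XML tag names as we understand them; the record itself is hypothetical and omits most of the sections a complete record requires, so treat this as an illustration rather than a validator.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical CSDGM record: only a few of the many
# elements a complete record would contain are shown here.
SAMPLE_CSDGM = """<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <origin>Example Agency</origin>
        <pubdate>2015</pubdate>
        <title>Example Sea Surface Temperature Grid</title>
      </citeinfo>
    </citation>
    <spdom>
      <bounding>
        <westbc>-180.0</westbc>
        <eastbc>180.0</eastbc>
        <northbc>90.0</northbc>
        <southbc>-90.0</southbc>
      </bounding>
    </spdom>
  </idinfo>
</metadata>"""


def summarize_csdgm(xml_text):
    """Return the title and bounding box (west, east, north, south) of a CSDGM record."""
    root = ET.fromstring(xml_text)
    # Paths are relative to the <metadata> root element.
    title = root.findtext("idinfo/citation/citeinfo/title")
    bounding = root.find("idinfo/spdom/bounding")
    bbox = tuple(
        float(bounding.findtext(tag))
        for tag in ("westbc", "eastbc", "northbc", "southbc")
    )
    return title, bbox


print(summarize_csdgm(SAMPLE_CSDGM))
```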
  
==ISO==
 
ISO International Standards ensure that products and services are safe, reliable and of good quality. For business, they are strategic tools that reduce costs by minimizing waste and errors and increasing productivity. They help companies to access new markets, level the playing field for developing countries and facilitate free and fair global trade.
 
  
ISO Standards for geographic data and related resources are managed by ISO Technical Committee 211 ([http://www.isotc211.org/ TC211]). ISO 19115:2003 defines the schema required for describing geographic information and services. It provides information about the identification, the extent, the quality, the spatial and temporal schema, spatial reference, and distribution of digital geographic data.
 
  
'''ISO 19115:2003 is applicable to:'''
*the cataloguing of datasets, clearinghouse activities, and the full description of datasets;
 
*geographic datasets, dataset series, and individual geographic features and feature properties.
 
 
 
'''ISO 19115:2003 defines:'''
 
*mandatory and conditional metadata sections, metadata entities, and metadata elements;
 
*the minimum set of metadata required to serve the full range of metadata applications (data discovery, determining data fitness for use, data access, data transfer, and use of digital data);
 
*optional metadata elements - to allow for a more extensive standard description of geographic data, if required;
 
*a method for extending metadata to fit specialized needs.
 
 
 
Though ISO 19115:2003 is applicable to digital data, its principles can be extended to many other forms of geographic data such as maps, charts, and textual documents as well as non-geographic data.
 
 
 
ISO 19115-2:2009 extends the existing geographic metadata standard by defining the schema required for describing imagery and gridded data. It provides information about the properties of the measuring equipment used to acquire the data, the geometry of the measuring process employed by the equipment, and the production process used to digitize the raw data. This extension deals with the metadata needed to describe the derivation of geographic information from raw data, including the properties of the measuring system and the numerical methods and computational procedures used in the derivation. Metadata for coverage data in general is already addressed sufficiently by the general part of ISO 19115.
 
The code lists used in ISO metadata records are published by TC211: [http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml Codelist].
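In the XML encoding of ISO 19115 metadata (ISO/TS 19139), a code list value is written as an element that points into a code list file such as the TC211 one. The sketch below builds one such element, the role of a responsible party, with Python's standard library. The <code>gmd</code> namespace URI, <code>CI_RoleCode</code> and the value <code>pointOfContact</code> are standard ISO 19139 names; the helper function <code>code_list_element</code> is our own, hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the ISO 19139 XML encoding of ISO 19115 metadata.
GMD = "http://www.isotc211.org/2005/gmd"
# The TC211 code list file.
CODELIST_URL = "http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml"

# Serialize the namespace with the conventional "gmd" prefix.
ET.register_namespace("gmd", GMD)


def code_list_element(wrapper, code_list, value):
    """Hypothetical helper: build <gmd:wrapper><gmd:code_list ...>value</...></...>."""
    outer = ET.Element(f"{{{GMD}}}{wrapper}")
    code = ET.SubElement(
        outer,
        f"{{{GMD}}}{code_list}",
        {
            # codeList points at the code list definition (file URL + list name).
            "codeList": f"{CODELIST_URL}#{code_list}",
            # codeListValue carries the machine-readable value.
            "codeListValue": value,
        },
    )
    code.text = value
    return outer


role = code_list_element("role", "CI_RoleCode", "pointOfContact")
print(ET.tostring(role, encoding="unicode"))
```

Serialized, this yields a <code>gmd:role</code> element wrapping a <code>gmd:CI_RoleCode</code> element whose <code>codeList</code> attribute points into the code list file and whose <code>codeListValue</code> attribute (and text) is <code>pointOfContact</code>.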
 
 
 
==DIF (Directory Interchange Format)==
 
The Directory Interchange Format ([http://gcmd.nasa.gov/User/difguide/whatisadif.html DIF]) supports data discovery in NASA's Global Change Master Directory. The DIF structure has proven flexible enough to evolve with growing metadata requirements, especially for the geospatial disciplines. DIF is a "container" for the metadata elements that are maintained in the IDN database, where validation of mandatory fields, keywords, personnel, etc. takes place. The DIF is used to create directory entries that describe a group of data. A DIF consists of a collection of fields that detail specific information about the data. Eight fields are required in the DIF; the others expand upon and clarify the information. Some of the fields are free-text fields; others require the use of controlled keywords (sometimes known as "valids").
 
An [http://gcmd.nasa.gov/Aboutus/xml/dif/dif.xsd XML schema] is available.
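The "eight required fields" rule lends itself to a simple presence check before a record is submitted for full validation. The sketch below is illustrative only: the field names in <code>REQUIRED_DIF_FIELDS</code> are our assumption of the eight required DIF fields and should be checked against the DIF guide and the XML schema before relying on them. The check only tests whether the elements are present (not whether their keyword values are valid), and the sample record is hypothetical and would not validate against the schema as-is.

```python
import xml.etree.ElementTree as ET

# Assumed list of the eight required DIF fields; verify against the DIF guide.
REQUIRED_DIF_FIELDS = (
    "Entry_ID", "Entry_Title", "Parameters", "ISO_Topic_Category",
    "Data_Center", "Summary", "Metadata_Name", "Metadata_Version",
)


def missing_required_fields(dif_xml):
    """Return the required top-level fields absent from a DIF-like record."""
    root = ET.fromstring(dif_xml)
    # Compare local names so that a namespaced <DIF> root also works.
    present = {child.tag.split("}")[-1] for child in root}
    return [name for name in REQUIRED_DIF_FIELDS if name not in present]


# Hypothetical, deliberately incomplete record.
SAMPLE_DIF = """<DIF>
  <Entry_ID>EXAMPLE_SST_001</Entry_ID>
  <Entry_Title>Example Sea Surface Temperature Grid</Entry_Title>
  <Summary>Hypothetical entry used only for illustration.</Summary>
</DIF>"""

print(missing_required_fields(SAMPLE_DIF))
```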
 
 
 
==ECHO==
 
The NASA-developed Earth Observing System (EOS) Clearinghouse ([http://earthdata.nasa.gov/sites/default/files/esdswg/spg/rfc/esds-rfc-020/ECHO-10-Data-Partner-User-Guide-v-10.6.pdf ECHO]) is a spatial and temporal metadata registry that enables the science community to more easily use and exchange NASA's data and services. ECHO's main objective is to enable broader use of NASA's EOS data. It allows users to more efficiently search and access data and services and increases the potential for interoperability with new tools and services. The value of these resources increases as the potential to exchange and interoperate increases. ECHO has been working with other organizations to provide their Earth science metadata alongside NASA's for users to search and access. ECHO stores metadata from a variety of science disciplines and domains, including Climate Variability and Change, Carbon Cycle and Ecosystems, Earth Surface and Interior, Atmospheric Composition, Weather, and Water and Energy Cycle.
 
 
 
==ECS (U.S. Extended Continental Shelf)==
 
The U.S. Extended Continental Shelf ([http://www.ngdc.noaa.gov/mgg/ecs/ecs.html ECS]) Project is a multi-agency collaboration whose goals are to determine and define the extent of the U.S. continental shelf beyond 200 nautical miles (nm). Under international law as reflected in the 1982 United Nations Convention on the Law of the Sea (UNCLOS), every country is entitled to a continental shelf extending 200 nm from the coastline. The extended continental shelf (ECS) is the area that lies beyond this 200 nm limit where the U.S. has sovereign rights to the resources of the seafloor and sub-seafloor. The ECS Data Management Team will utilize and integrate common templates for all metadata using controlled vocabulary for science parameters, measurement units, platforms, instruments, programs, formats, domain specific content, etc.
 
 
 
==netCDF (Network Common Data Form)==
 
Network Common Data Form ([http://www.unidata.ucar.edu/software/netcdf/ NetCDF]) is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. The University Corporation for Atmospheric Research (UCAR) is the chief source of netCDF software, standards development, updates, etc. The format is an open standard. NetCDF Classic and 64-bit Offset Format are an international standard of the [http://www.opengeospatial.org/standards/netcdf Open Geospatial Consortium].
 
The project is actively supported by UCAR. Version 4.0, released in 2008, greatly enhances the data model by allowing the use of the HDF5 data file format. Version 4.1 (2010) adds support for C and Fortran client access to specified subsets of remote data via [http://www.opendap.org/ OPeNDAP]. The format was originally based on the conceptual model of the NASA CDF but has since diverged and is no longer compatible with it.
 
 
 
Sharing data written in netCDF is facilitated by using the [http://cf-pcmdi.llnl.gov/ Climate and Forecast (CF)] and [https://geo-ide.noaa.gov/wiki/index.php?title=NetCDF_Attribute_Convention_for_Dataset_Discovery Attribute Convention for Dataset Discovery (ACDD)] conventions.
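Conventions such as ACDD work through global attributes stored in the netCDF file header. The sketch below checks a set of global attributes, held in a plain Python dictionary for simplicity (in practice they would be read from the file with a netCDF library), for a few attributes that ACDD 1.3 lists as highly recommended, and checks which conventions the <code>Conventions</code> attribute declares. The attribute list reflects our reading of ACDD 1.3 and may differ from the version of the convention linked above; the helper names and attribute values are hypothetical.

```python
# Attributes treated as "highly recommended" here; this follows our reading
# of ACDD 1.3 and may not match the version of the convention linked above.
ACDD_HIGHLY_RECOMMENDED = ("title", "summary", "keywords", "Conventions")


def discovery_gaps(global_attrs):
    """Return highly recommended discovery attributes that are missing or blank."""
    return [
        name for name in ACDD_HIGHLY_RECOMMENDED
        if not str(global_attrs.get(name, "")).strip()
    ]


def declares_conventions(global_attrs, *prefixes):
    """Check whether the Conventions attribute names each given convention."""
    # Conventions is a comma- or space-separated list such as "CF-1.6, ACDD-1.3".
    declared = global_attrs.get("Conventions", "").replace(",", " ").split()
    return all(any(d.startswith(p) for d in declared) for p in prefixes)


# Hypothetical global attributes, e.g. as read from a netCDF file header.
attrs = {
    "title": "Example Sea Surface Temperature Grid",
    "summary": "",
    "Conventions": "CF-1.6, ACDD-1.3",
}

print(discovery_gaps(attrs))  # summary is blank and keywords is absent
print(declares_conventions(attrs, "CF-", "ACDD-"))
```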
 
 
 
==THREDDS==
 
The [http://www.unidata.ucar.edu/projects/THREDDS/ THREDDS] (Thematic Realtime Environmental Distributed Data Services) project is developing middleware to bridge the gap between data providers and data users. The goal is to simplify the discovery and use of scientific data and to allow scientific publications and educational materials to reference scientific data. The [http://www.unidata.ucar.edu/software/tds/ THREDDS Data Server] includes an XML representation for documenting catalogs and datasets included in them. This is the dialect described here.
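A THREDDS catalog lists services and datasets in XML; a client can build an access URL by joining the server address, a service's base path and a dataset's <code>urlPath</code>. The sketch below shows this with Python's <code>xml.etree.ElementTree</code> on a hypothetical catalog that uses the InvCatalog 1.0 namespace. It assumes a single service and a hypothetical server address; real catalogs often declare compound services, inherit metadata in more elaborate ways, and link to nested catalogs, none of which is handled here.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hypothetical catalog: one OPeNDAP service and a collection of two datasets.
SAMPLE_CATALOG = f"""<catalog xmlns="{THREDDS_NS}" name="Example catalog">
  <service name="odap" serviceType="OPeNDAP" base="/thredds/dodsC/"/>
  <dataset name="Example SST collection" ID="example/sst">
    <metadata inherited="true">
      <serviceName>odap</serviceName>
    </metadata>
    <dataset name="SST 2015-07-01" ID="example/sst/20150701" urlPath="sst/20150701.nc"/>
    <dataset name="SST 2015-07-02" ID="example/sst/20150702" urlPath="sst/20150702.nc"/>
  </dataset>
</catalog>"""


def list_access_urls(catalog_xml, server):
    """Map each dataset that has a urlPath to its full access URL."""
    ns = {"t": THREDDS_NS}
    root = ET.fromstring(catalog_xml)
    # Simplification: take the base path of the first (here, only) service.
    base = root.find("t:service", ns).get("base")
    return {
        ds.get("name"): server.rstrip("/") + base + ds.get("urlPath")
        # ".//t:dataset" matches datasets at any nesting depth.
        for ds in root.iterfind(".//t:dataset", ns)
        if ds.get("urlPath") is not None
    }


for name, url in list_access_urls(SAMPLE_CATALOG, "https://thredds.example.org").items():
    print(name, url)
```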
 
 
 
==SOS==
 
The Open Geospatial Consortium [http://www.opengeospatial.org/standards/sos Sensor Observation Service] provides standard tools for managing sensor networks and data in an interoperable way. This standard defines a Web service interface that allows querying observations, sensor metadata, and representations of observed features. A client's first request to an SOS server is usually GetCapabilities, which asks the server to describe its capabilities. The GetCapabilities response is the dialect considered here.
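SOS servers can be queried with OGC key-value-pair (KVP) requests over HTTP GET. The sketch below only builds such a GetCapabilities request URL; the endpoint is hypothetical, and the <code>AcceptVersions</code> value assumes an SOS 2.0 server. Fetching and parsing the XML response (the dialect discussed here) is left out.

```python
from urllib.parse import urlencode


def get_capabilities_url(endpoint, version="2.0.0"):
    """Build an OGC KVP GetCapabilities request URL for an SOS endpoint."""
    params = {
        "service": "SOS",
        "request": "GetCapabilities",
        # AcceptVersions states which SOS version(s) the client understands.
        "AcceptVersions": version,
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint.
url = get_capabilities_url("https://sensors.example.org/sos")
print(url)
```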
 
 
 
==EML==
 
Ecological Metadata Language ([http://knb.ecoinformatics.org/software/eml/ EML]) is a metadata specification developed by and for the ecology discipline. It is based on prior work done by the Ecological Society of America and associated efforts (Michener et al., 1997, Ecological Applications). EML is implemented as a series of XML document types that can be used in a modular and extensible manner to document ecological data. Each EML module is designed to describe one logical part of the total metadata that should be included with any ecological dataset.
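As an illustration of how an EML document is assembled, the sketch below builds a bare-bones skeleton with Python's standard library: a namespaced root element, a dataset, a title, and a creator and contact party. It assumes EML 2.1.1 (other versions use a different namespace URI); the <code>system</code> value, package identifier and names are hypothetical placeholders, and a document this sparse is not guaranteed to validate against the EML schemas.

```python
import xml.etree.ElementTree as ET

# Assumption: EML 2.1.1; other EML versions use a different namespace URI.
EML_NS = "eml://ecoinformatics.org/eml-2.1.1"
ET.register_namespace("eml", EML_NS)


def eml_skeleton(package_id, title, surname):
    """Build a bare-bones EML document with a dataset, a creator and a contact."""
    # Only the root element is namespace-qualified; its children are unqualified.
    # "knb" is a placeholder system identifier.
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": "knb"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    # creator and contact are both party structures; here they name the same person.
    for role in ("creator", "contact"):
        party = ET.SubElement(dataset, role)
        name = ET.SubElement(party, "individualName")
        ET.SubElement(name, "surName").text = surname
    return root


doc = eml_skeleton("example.1.1", "Example vegetation survey", "Doe")
print(ET.tostring(doc, encoding="unicode"))
```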
 
 
 
==Dryad==
 
[http://datadryad.org Dryad] is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences. Dryad enables scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies. Dryad is governed by a consortium of journals that collaboratively promote data archiving and ensure the sustainability of the repository.
 
 
 
[[Category:Documentation Cluster]]
 
[[Category:Documentation Connections]]
 

Latest revision as of 11:40, July 29, 2015

[[Let's Start at the Beginning]]