Documentation Terminology

From Earth Science Information Partners (ESIP)
Revision as of 11:30, November 20, 2014 by Ted.Habermann

The terminology used in this work has been developed over many years of metadata-related work in NOAA, NASA, and other U.S. Federal Agencies. It is described, along with application examples, in Habermann, 2014. Basic terminology is described here to ease understanding.

Documentation Concepts

Documentation Concepts are the fundamental items that require documentation in metadata. They are independent of dialect and can exist at a number of levels of detail. For example, spatial extent is a high-level documentation concept that can be described at a general level, but it can also include more detailed concepts such as a bounding latitude/longitude box or geographic identifiers.

Conceptual metadata models, for example ISO 19115-1, are generally described using the Unified Modeling Language (UML). The concepts in these models can then be implemented in a variety of representations, e.g. XML or RDF.

Metadata Dialects

Many scientific datasets and products are documented using approaches and tools developed by scientists and data collectors to support their own analysis and understanding. This documentation exists in notebooks, scientific papers, web pages, user guides, word processing documents, spreadsheets, data dictionaries, PDFs, databases, custom binary and ASCII formats, and almost any other conceivable form, each with associated storage and preservation strategies. This custom, often unstructured, approach may work well for independent investigators or in the confines of a particular laboratory or community, but it makes it difficult for users outside of these small groups to discover, access, use, and independently understand the data without consulting its creators. Metadata, in contrast to documentation, helps address discovery, access, use, and understanding by providing well-defined content in structured representations. This makes it possible for users to access and quickly understand many aspects of datasets that they have not collected. It also makes it possible to integrate metadata into discovery and analysis tools, and to provide consistent references from the metadata to external documentation.

Different disciplines and communities have developed their own approaches to addressing the challenges of documenting data (or other resources) and the steps used in collection, processing, and analysis of those data. Communities frequently refer to the results of these efforts as “metadata standards”. In addition, national and international standards bodies or agencies produce formal metadata standards, usually with the intent of supporting some set of requirements (e.g. data discovery) across discipline and national boundaries.

Unfortunately, this approach often emphasizes differences between these communities. The ubiquitous “who, where, when, why, and how” questions must be answered in any discipline, so there is significant overlap between many of the concepts included in these “standards”. In fact, these standards are more like dialects of a universal documentation language than they are like separate languages. The term “metadata dialect” is introduced here as a substitute for “metadata standard” to indicate a focus on universal documentation concepts rather than on implementations in a particular “standard”.

Spirals

Data providers and managers can be overwhelmed as metadata dialects evolve and add complexity in response to increased use and understanding requirements. Dividing the metadata improvement process into a series of steps that can be explained and accomplished in a clear and orderly way can help overcome this barrier. This situation is similar to those encountered in large software development projects (Spiral Model) and, borrowing from that genre, these steps are termed “spirals”.

Spirals are collections of concepts needed to address specific use cases or requirements. They are independent of metadata dialects or implementations. As an example, consider a use case that requires a description of the spatial and temporal extent of a dataset. Conceptually this use case requires minima and maxima for horizontal and vertical coordinates and for time. These concepts could be implemented as eight metadata elements, a list of geographic or temporal keywords, a list of named geographic and temporal features, or some combination of these. In all cases, the concepts are the same.
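As a rough illustration of the extent use case above, the eight concepts can be written down independently of any dialect. This is only a sketch: the class name, field names, and sample values are assumptions made for the example and are not taken from any metadata standard.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ExtentConcepts:
    """Eight dialect-independent concepts for a spatial/temporal extent spiral.

    All names here are hypothetical; a real dialect would map each concept
    to its own element, keyword, or named feature.
    """
    west_longitude: Optional[float] = None
    east_longitude: Optional[float] = None
    south_latitude: Optional[float] = None
    north_latitude: Optional[float] = None
    min_vertical: Optional[float] = None
    max_vertical: Optional[float] = None
    start_time: Optional[str] = None
    end_time: Optional[str] = None

    def missing(self) -> List[str]:
        """Return the names of concepts that have not been documented yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Sample values are invented for illustration.
extent = ExtentConcepts(
    west_longitude=-125.0, east_longitude=-66.5,
    south_latitude=24.0, north_latitude=49.5,
    start_time="2010-01-01", end_time="2014-11-20",
)
print(extent.missing())  # the two vertical concepts are still undocumented
```

Whether the values come from eight elements, keywords, or named features, the list of concepts to check stays the same.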

Metadata records are commonly shared in XML because it provides structure and is easily processed using a variety of standard tools. Implementing a spiral for a particular metadata dialect requires a mapping between the concepts in the spiral and the XML implementation of the dialect. This mapping is described using XPath, the standard mechanism for addressing elements in an XML document.
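The mapping from concepts to a dialect can be sketched as a lookup table of XPath expressions. The two "dialects" below, their element names, and the sample records are invented for illustration; they do not correspond to any real standard. The sketch uses Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath, so a full XPath engine may be needed for real dialects.

```python
import xml.etree.ElementTree as ET

# Two records describing the same bounding box in two made-up dialects.
RECORD_A = """<metadata>
  <extent><west>-125.0</west><east>-66.5</east><south>24.0</south><north>49.5</north></extent>
</metadata>"""

RECORD_B = """<record>
  <coverage><box><w>-125.0</w><e>-66.5</e><s>24.0</s><n>49.5</n></box></coverage>
</record>"""

# Concept -> XPath, one table per (hypothetical) dialect.
XPATHS = {
    "dialect_a": {
        "west": "./extent/west",
        "east": "./extent/east",
        "south": "./extent/south",
        "north": "./extent/north",
    },
    "dialect_b": {
        "west": "./coverage/box/w",
        "east": "./coverage/box/e",
        "south": "./coverage/box/s",
        "north": "./coverage/box/n",
    },
}


def evaluate(record_xml, concept_xpaths):
    """Return {concept: text or None} by applying each XPath to the record."""
    root = ET.fromstring(record_xml)
    return {concept: root.findtext(path) for concept, path in concept_xpaths.items()}


print(evaluate(RECORD_A, XPATHS["dialect_a"]))
print(evaluate(RECORD_B, XPATHS["dialect_b"]))
```

Both calls yield the same concept-to-value dictionary, which is the point of the mapping: the concepts stay fixed while only the XPath table changes per dialect.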

Recommendations

Data and observations are often collected within a small research group or organization in order to address specific scientific questions. Preparing data for re-use or sharing with other groups or non-experts brings a different set of documentation requirements, and many research groups look for guidance about how to satisfy those new requirements. Many groups within the U.S. and the global environmental data community have addressed this need for guidance, generally in the form of lists of metadata elements required, recommended, or suggested for a particular documentation need. We term these lists recommendations. Table 1 lists the recommendations we have considered so far in this work.

It is important to understand that a recommendation is always a subset of the capabilities of a given dialect: it contains the elements that a particular group considers important for a particular metadata need.
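One way to picture a recommendation is as a table that assigns a level (required, recommended, or suggested) to some of the concepts a dialect can express. The concept names, levels, and dialect contents below are hypothetical and chosen only to make the sketch runnable.

```python
# Concepts a made-up dialect is able to express.
DIALECT = {
    "title", "abstract", "keywords", "spatial extent",
    "temporal extent", "lineage", "contact",
}

# A made-up recommendation for data discovery: concept -> level.
RECOMMENDATION = {
    "title": "required",
    "abstract": "required",
    "spatial extent": "recommended",
    "keywords": "suggested",
}

# A recommendation can only name concepts the dialect supports.
assert set(RECOMMENDATION) <= DIALECT


def unmet(documented, recommendation):
    """Group the concepts missing from a record by recommendation level."""
    gaps = {}
    for concept, level in recommendation.items():
        if concept not in documented:
            gaps.setdefault(level, []).append(concept)
    return gaps


# A record that documents only a title and keywords.
print(unmet({"title", "keywords"}, RECOMMENDATION))
```

The subset check at the top mirrors the statement above: a recommendation never asks for something its dialect cannot represent.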

Rubrics

The metadata improvement process is planned as progress through a series of spirals. Quantitatively and consistently determining the state of a metadata record in the improvement process is important for characterizing the task to be done and for measuring progress towards improvement goals. Once a set of spirals and XPaths is defined for a particular dialect, the state of a metadata record can be characterized by counting the number of fields that exist for each spiral and showing the results using rubrics, a tool borrowed from the educational community (Rubric). The rubric provides a visual description of the state of a record by arranging the spirals as rows in a table with degree of completeness shown as columns. As the record becomes more complete, the rubric score increases.
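The counting step can be sketched as follows. The spiral names, XPaths, sample record, and the three completeness columns are assumptions made for this example; the actual spirals and rubric layout used in this work may differ.

```python
import xml.etree.ElementTree as ET

# A record in a made-up dialect.
RECORD = """<metadata>
  <title>Sea surface temperature</title>
  <abstract>Daily gridded fields.</abstract>
  <keywords>ocean</keywords>
  <extent><west>-125.0</west><east>-66.5</east></extent>
</metadata>"""

# Hypothetical spirals, each a list of XPaths for this dialect.
SPIRALS = {
    "Identification": ["./title", "./abstract", "./keywords"],
    "Extent": ["./extent/west", "./extent/east", "./extent/south", "./extent/north"],
    "Contact": ["./contact/name", "./contact/email"],
}

COLUMNS = ("none", "partial", "complete")


def score(record_xml, spirals):
    """Count, per spiral, how many of its XPaths match an element in the record."""
    root = ET.fromstring(record_xml)
    return {
        name: (sum(root.find(path) is not None for path in paths), len(paths))
        for name, paths in spirals.items()
    }


def level(found, total):
    """Map a count to one of the rubric columns."""
    if found == 0:
        return "none"
    return "complete" if found == total else "partial"


def render(scores):
    """Lay out spirals as rows and degree of completeness as columns."""
    lines = ["{:<16}".format("spiral") + "".join("{:<10}".format(c) for c in COLUMNS)]
    for name, (found, total) in scores.items():
        cells = "".join(
            "{:<10}".format("X" if level(found, total) == c else "") for c in COLUMNS
        )
        lines.append("{:<16}".format(name) + cells)
    return "\n".join(lines)


scores = score(RECORD, SPIRALS)
print(render(scores))
print("total score:", sum(found for found, _ in scores.values()))
```

Adding the missing south and north elements to the record would raise the Extent count from 2 to 4 and move its mark to the "complete" column, so the total score rises as the record improves.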