Documentation Terminology

From Earth Science Information Partners (ESIP)
The terminology used in this work has been developed over many years of metadata-related work in NOAA, NASA and other U.S. Federal Agencies. It is described along with application examples in [http://figshare.com/articles/Metadata_Evaluation_and_Improvement/1133879 Habermann, 2014]. Basic terminology is described here to ease understanding.
Established metadata terminology is the result of a multi-decade, cooperative effort between metadata experts in NOAA, NASA and other U.S. Federal Agencies. The selection below is intended to provide a framework of basic terminology in order to facilitate understanding of fundamental concepts.  More in-depth information and application examples are available in [http://figshare.com/articles/Metadata_Evaluation_and_Improvement/1133879 Habermann, 2014].  
== Documentation Concepts ==
Documentation Concepts are the fundamental items that require documentation in metadata. They are independent of dialect and can exist at several levels of detail. For example, spatial extent is a high-level documentation concept that can be described generally, but it can also include more detailed concepts such as a bounding latitude/longitude box or geographic identifiers.
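To illustrate how one concept can exist at several levels of detail, here is a minimal Python sketch that represents concepts as a small tree. The class name `Concept` and the child concept names are hypothetical. They only mirror the spatial-extent example above and are not taken from any particular dialect.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A dialect-independent documentation concept (hypothetical model)."""
    name: str
    children: list[Concept] = field(default_factory=list)

    def detailed_concepts(self) -> list[str]:
        """Return the names of the most detailed (leaf) concepts below this one."""
        if not self.children:
            return [self.name]
        leaves: list[str] = []
        for child in self.children:
            leaves.extend(child.detailed_concepts())
        return leaves


# "Spatial Extent" can be documented generally or through more detailed concepts.
spatial_extent = Concept(
    "Spatial Extent",
    [
        Concept("Bounding Box", [
            Concept("West Longitude"), Concept("East Longitude"),
            Concept("South Latitude"), Concept("North Latitude"),
        ]),
        Concept("Geographic Identifier"),
    ],
)

print(spatial_extent.detailed_concepts())
```

A record could document only the top-level concept or any of the leaves. The tree makes explicit that both describe the same thing.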
<table border="1" cellpadding="3">
    <tr>
        <td valign="top"><b>Collection</b></td>
        <td valign="top">A group of metadata records commonly organized by a data facility, organization or project and often stored in a database or web-accessible folder.</td>
    </tr>
    <tr>
        <td valign="top"><b>Concept</b></td>
        <td valign="top">General term for describing a documentation entity. Concepts are independent of dialect, so a single concept can occur in many dialects. They are typically represented (in XML) by an element or a collection of elements.</td>
    </tr>
    <tr>
        <td valign="top"><b>Dialect</b></td>
        <td valign="top">A particular representation of metadata that is specific to a community. Examples include the Content Standard for Digital Geospatial Metadata (CSDGM), the Directory Interchange Format (DIF), and ISO 19115-3 (the XML representation of the ISO 19115-1 conceptual model).</td>
    </tr>
    <tr>
        <td valign="top"><b>Dialect Maximum</b></td>
        <td valign="top">The number of concepts from a particular recommendation that are included in a particular dialect and, therefore, the maximum number of concepts from that recommendation that can be represented in the dialect. Note: the dialect maximum is always less than or equal to the number of concepts included in the recommendation (recommendation maximum).</td>
    </tr>
    <tr>
        <td valign="top"><b>Documentation</b></td>
        <td valign="top">The complete collection of unstructured written, drawn, presented or recorded materials necessary for discovering, accessing, understanding, and reproducing scientific data and results.</td>
    </tr>
    <tr>
        <td valign="top"><b>Element</b></td>
        <td valign="top">An item providing a value for a concept, typically in an XML representation. Elements depend on dialects: they are the instantiation of a concept in a dialect.</td>
    </tr>
    <tr>
        <td valign="top"><b>Level</b></td>
        <td valign="top">Recommendations may assign different degrees of necessity to a concept's occurrence in a record, e.g., mandatory, recommended, and suggested. These subsets of concepts within a recommendation are called levels.</td>
    </tr>
    <tr>
        <td valign="top"><b>Metadata</b></td>
        <td valign="top">Structured and standardized elements of scientific documentation.</td>
    </tr>
    <tr>
        <td valign="top"><b>Recommendation</b></td>
        <td valign="top">A set of concepts that an organization identifies for achieving a documentation goal.</td>
    </tr>
    <tr>
        <td valign="top"><b>Recommendation Maximum</b></td>
        <td valign="top">The number of concepts included in a recommendation. Note that the recommendation maximum is the maximum completeness score available for a metadata record being evaluated with respect to that recommendation. The recommendation maximum is always greater than or equal to every dialect maximum for that recommendation.</td>
    </tr>
    <tr>
        <td valign="top"><b>Signature</b></td>
        <td valign="top">A series of numbers giving the number of concepts/elements missing from a metadata record (or a group of metadata records) in each of a series of spirals. Lower numbers indicate fewer missing elements, and a signature made up entirely of 0s indicates a record or group of records that is complete with respect to a particular recommendation/dialect combination. For example, a signature of 2 3 indicates that 2 elements are missing from the first spiral and 3 from the second. The sum of the numbers in a signature is the total number of elements missing from a record or group of records.</td>
    </tr>
    <tr>
        <td valign="top"><b>Spiral</b></td>
        <td valign="top">A set of concepts required to support a particular documentation need or use case.</td>
    </tr>
</table>

Conceptual metadata models, for example ISO 19115-1, are generally expressed using the Unified Modeling Language ([http://en.wikipedia.org/wiki/Unified_Modeling_Language UML]). These concepts can then be implemented in a variety of representations, e.g., XML or RDF.
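The relationships between recommendation maximum, dialect maximum, and signature can be made concrete with a short Python sketch. A recommendation is modeled as an ordered list of spirals (lists of concept names), a dialect as the set of concepts it can represent, and a record as the set of concepts it contains. All concept names and the dialect's coverage are hypothetical. The sketch also assumes that the signature counts only concepts the dialect can represent, so that an all-zero signature means "complete for this recommendation/dialect combination".

```python
from __future__ import annotations

# Hypothetical example: concept names, spirals, and dialect support are illustrative only.
recommendation = [
    ["Title", "Abstract", "Keywords"],                       # spiral 1
    ["Bounding Box", "Temporal Extent", "Vertical Extent"],  # spiral 2
]
dialect = {"Title", "Abstract", "Keywords", "Bounding Box", "Temporal Extent"}  # no vertical extent
record = {"Title", "Bounding Box"}  # concepts present in one metadata record


def recommendation_maximum(spirals: list[list[str]]) -> int:
    """Number of concepts included in the recommendation."""
    return sum(len(spiral) for spiral in spirals)


def dialect_maximum(spirals: list[list[str]], supported: set[str]) -> int:
    """Number of the recommendation's concepts that the dialect can represent."""
    return sum(1 for spiral in spirals for concept in spiral if concept in supported)


def signature(spirals: list[list[str]], supported: set[str], present: set[str]) -> list[int]:
    """Per-spiral count of concepts the dialect supports but the record lacks."""
    return [
        sum(1 for concept in spiral if concept in supported and concept not in present)
        for spiral in spirals
    ]


sig = signature(recommendation, dialect, record)
completeness = dialect_maximum(recommendation, dialect) - sum(sig)
print(recommendation_maximum(recommendation), dialect_maximum(recommendation, dialect), sig, completeness)
# 6 5 [2, 1] 2
```

Because the dialect cannot represent "Vertical Extent", its dialect maximum (5) is below the recommendation maximum (6). This matches the inequality stated in the definitions.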
== Metadata Dialects ==
Many scientific datasets and products are documented using approaches and tools developed by scientists and data collectors to support their own analysis and understanding. This documentation exists in notebooks, scientific papers, web pages, user guides, word processing documents, spreadsheets, data dictionaries, PDFs, databases, custom binary and ASCII formats, and almost any other conceivable form, each with associated storage and preservation strategies. This custom, often unstructured approach may work well for independent investigators or within a particular laboratory or community, but it makes it difficult for users outside these small groups to discover, access, use, and independently understand the data without consulting its creators.
== Concepts ==
Metadata, in contrast to documentation, helps address discovery, access, use, and understanding by providing well-defined content in structured representations. This makes it possible for users to access and quickly understand many aspects of datasets that they have not collected. It also makes it possible to integrate metadata into discovery and analysis tools, and to provide consistent references from the metadata to external documentation.
Concepts can be described at a general level or include more detailed information. For example, “Spatial Extent” is a high-level metadata concept that can be addressed generally, or it can include more detailed concepts such as a bounding latitude/longitude box or geographic identifiers.
*''Examples of unstructured documentation include notebooks, scientific papers, web pages, user guides, word processing documents, spreadsheets, data dictionaries, PDFs, and custom binary and ASCII formats, each with associated storage and preservation strategies.''
Different disciplines and communities have developed their own approaches to documenting data (or other resources) and the steps used to collect, process, and analyze those data. Communities frequently refer to the results of these efforts as “metadata standards”. In addition, national and international standards bodies and agencies produce formal metadata standards, usually intended to support some set of requirements (e.g., data discovery) across disciplinary and national boundaries.
More often than not, the scientific process is documented, stored, and circulated using different tools and approaches, depending on the needs of a specific group within the scientific community. This customized, often unstructured approach may work well for independent investigators or within a particular community, but for users outside these small groups it creates significant complications in discovering, accessing, using, and understanding the data (see Metadata Recommendations – Background for an explanation of these four processes).
Unfortunately, this approach often emphasizes differences between these communities. The ubiquitous “who, where, when, why, and how” questions must be answered in any discipline, so there is significant overlap between many of the concepts included in these “standards”. In fact, these standards are more like dialects of a universal documentation language than separate languages. The term “metadata dialect” is introduced here as a substitute for “metadata standard” to signal a focus on universal documentation concepts rather than on implementations in a particular “standard”.
== Spirals ==
[[File:Spiral.png|thumb|Visual Depiction of the Spiral Model]]
Data providers and managers can be overwhelmed as metadata dialects evolve and add complexity in response to growing use and understanding requirements. Dividing the metadata improvement process into a series of steps that can be explained and accomplished in a clear and orderly way can help overcome this barrier. This situation is similar to those encountered in large software development projects ([http://en.wikipedia.org/wiki/Spiral_model Spiral Model]), and, borrowing from that field, these steps are termed “spirals”.
Spirals are collections of concepts needed to address specific use cases or requirements. They are independent of metadata dialects or implementations. As an example, consider a use case that requires a description of the spatial and temporal extent of a dataset. Conceptually, this use case requires minima and maxima for the horizontal coordinates (latitude and longitude), the vertical coordinate, and time. These concepts could be implemented as eight metadata elements, a list of geographic or temporal keywords, a list of named geographic and temporal features, or some combination of these. In all cases, the concepts are the same.
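The point that different implementations can carry the same concepts can be sketched in Python. Two hypothetical records document the eight extent concepts differently: one uses explicit numeric bounds, the other uses place and period keywords plus vertical bounds. The dictionary layout and the rule that keywords count as covering the horizontal or temporal bounds are illustrative assumptions, not part of any dialect.

```python
from __future__ import annotations

# The eight extent concepts from the use case above (names are illustrative).
EXTENT_CONCEPTS = [
    "west", "east", "south", "north",  # horizontal minima and maxima
    "min_vertical", "max_vertical",    # vertical minimum and maximum
    "start_time", "end_time",          # temporal minimum and maximum
]
HORIZONTAL = {"west", "east", "south", "north"}
TEMPORAL = {"start_time", "end_time"}


def extent_concepts_present(record: dict) -> set[str]:
    """Return the extent concepts a record documents, whichever implementation it uses.

    The record layout (plain keys plus optional keyword lists) is hypothetical.
    """
    present = {c for c in EXTENT_CONCEPTS if record.get(c) is not None}
    # Assumption: named places or periods are accepted as implementing the same concepts.
    if record.get("place_keywords"):
        present |= HORIZONTAL
    if record.get("temporal_keywords"):
        present |= TEMPORAL
    return present


# Two records implement the same concepts in different ways.
record_a = {"west": -71.0, "east": -66.0, "south": 41.0, "north": 45.0,
            "min_vertical": 0.0, "max_vertical": 200.0,
            "start_time": "2010-01-01", "end_time": "2015-12-31"}
record_b = {"place_keywords": ["Gulf of Maine"], "temporal_keywords": ["2010-2015"],
            "min_vertical": 0.0, "max_vertical": 200.0}
print(extent_concepts_present(record_a) == extent_concepts_present(record_b) == set(EXTENT_CONCEPTS))
# True
```

The check uses `is not None` so that a legitimate bound of `0.0` still counts as present.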
'''Spiral Model:'''  Like any language, metadata dialects are living entities that must evolve and expand in response to new requirements within user communities. This continual expansion inevitably increases complexity. To promote progress and metadata improvement, a spiral model built on small, actionable iterations is employed. Following this model, communities can improve their metadata over time.
Metadata records are commonly shared in XML because it provides structure and is easily processed using a variety of standard tools. Implementing a spiral for a particular metadata dialect requires a mapping between the concepts in the spiral and the XML implementation of the dialect. This mapping is described using the standard language for addressing XML elements, XPath ([http://en.wikipedia.org/wiki/XPath XPath]).
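A concept-to-XPath mapping can be sketched with Python's standard `xml.etree.ElementTree`. Here one concept ("Title") is mapped to its element in two dialects. The CSDGM path (`idinfo/citation/citeinfo/title`) and the DIF element (`Entry_Title`) follow those dialects' element names, but the mapping table and helper function are hypothetical. Namespaces are omitted for brevity: real DIF and ISO records declare XML namespaces, which `find()` accepts through its `namespaces` argument. Note also that ElementTree supports only a subset of XPath, so full XPath 1.0 expressions need a library such as lxml.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# One dialect-independent concept mapped to an element path in each dialect.
# Paths are relative to each record's root element; namespaces omitted for brevity.
CONCEPT_PATHS = {
    "Title": {
        "CSDGM": "idinfo/citation/citeinfo/title",
        "DIF": "Entry_Title",
    },
}


def concept_value(xml_text: str, dialect: str, concept: str) -> str | None:
    """Return the text implementing `concept` in a record of `dialect`, or None if absent."""
    root = ET.fromstring(xml_text)
    element = root.find(CONCEPT_PATHS[concept][dialect])
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()


csdgm = "<metadata><idinfo><citation><citeinfo><title>SST Analysis</title></citeinfo></citation></idinfo></metadata>"
dif = "<DIF><Entry_Title>SST Analysis</Entry_Title></DIF>"

print(concept_value(csdgm, "CSDGM", "Title"))  # SST Analysis
print(concept_value(dif, "DIF", "Title"))      # SST Analysis
```

The concept stays the same, and only the path changes between dialects. This is the separation between concepts and elements described above.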
The metadata improvement process is divided into a series of steps that can be defined and accomplished in a clear and orderly manner.
== Recommendations ==
Data and observations are often collected within a small research group or organization to address specific scientific questions. Preparing data for reuse or for sharing with other groups or non-experts brings a different set of documentation requirements, and many research groups look for guidance on how to satisfy those new requirements. Many groups within the U.S. and global environmental data communities have addressed this need for guidance, generally in the form of lists of metadata elements required, recommended, or suggested for a particular documentation need. We term these lists recommendations. Table 1 lists the recommendations we have considered so far in this work.
Each loop in the spiral represents a concept and is divided into four quadrants:
:*Determine Objectives
:*Identify and Resolve Risks
:*Development and Testing
:*Plan Next Iteration
There is no limit to the number of loops that can be added.
Using a rubric is one method for assessing the completeness of a spiral.
It is important to understand that recommendations are subsets of a given dialect's capabilities that a particular group considers important for a particular metadata need.
As discussed above, the metadata improvement process is orchestrated through a series of spirals. To measure progress toward improvement goals and characterize the task to be done, the state of a metadata record in the improvement process must be evaluated quantitatively and consistently. Once a set of spirals for a particular dialect is defined, this can be accomplished using a rubric. The rubric describes the state of a record by arranging the spirals as rows in a table, with degree of completeness shown as columns. As the record becomes more complete, the rubric score increases.
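A rubric of this kind can be sketched in Python as one row per spiral, with the row's completeness mapped to a column. The column labels ("None", "Some", "Most", "All"), the 50% threshold, and the scoring rule (sum of column indices) are hypothetical, since the text does not prescribe them. The spiral names and counts in the usage line are made-up example values.

```python
from __future__ import annotations

# Hypothetical completeness columns; the text does not prescribe labels or thresholds.
COLUMNS = ["None", "Some", "Most", "All"]


def rubric_column(present: int, total: int) -> int:
    """Map a spiral's completeness (present out of total concepts) to a column index."""
    if present == 0:
        return 0
    if present == total:
        return 3
    return 2 if present / total >= 0.5 else 1


def rubric(rows: dict[str, tuple[int, int]]) -> tuple[list[str], int]:
    """Build one text row per spiral and a total score (sum of column indices)."""
    lines = []
    score = 0
    for spiral, (present, total) in rows.items():
        column = rubric_column(present, total)
        score += column
        lines.append(f"{spiral:<12} {COLUMNS[column]:<5} ({present}/{total})")
    return lines, score


lines, score = rubric({"Discovery": (4, 4), "Access": (1, 3), "Use": (0, 5)})
print("\n".join(lines))
print("score:", score)
```

Because every column index grows with completeness, adding a missing concept to a record can only keep the score level or raise it. This matches the monotonic behavior described above.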
== Rubrics ==
== Recommendations ==
The metadata improvement process is planned as progress through a series of spirals. Determining the state of a metadata record in the improvement process quantitatively and consistently is important for characterizing the task to be done and for measuring progress toward improvement goals. Once a set of spirals and XPaths for a particular dialect is defined, the state of a metadata record can be characterized by counting the number of fields that exist for each spiral and showing the results using rubrics, a tool borrowed from the educational community. The rubric provides a visual description of the state of a record by arranging the spirals as rows in a table, with degree of completeness shown as columns. As the record becomes more complete, the rubric score increases.
Historically, metadata content has been approached in a variety of ways depending on the needs of specific user communities. This has resulted in multiple metadata “dialects” that must evolve and improve as metadata needs change. The metadata improvement process is defined and depicted using spirals, and the success of the spirals is evaluated quantitatively and consistently using rubrics.
Recommendations are the conclusion of this evaluation process. They are the metadata elements that are required, recommended, or suggested for a particular community need. Publishing this information as recommendation lists eliminates the need to “reinvent the wheel” and therefore yields maximum output for minimum effort. Note also that recommendations are the subset of a given dialect’s capabilities necessary to satisfy the specific needs of a particular user community. Employing only what is necessary for the task at hand significantly conserves time, effort, and energy.
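Counting the fields that exist for each spiral can be sketched with Python's `xml.etree.ElementTree`. Each hypothetical spiral is a list of paths into a CSDGM-style record; the element names follow CSDGM, but the grouping into "Discovery" and "Extent" spirals is an illustrative assumption. An element counts as present only if it exists and has non-blank text. The same counts also give a signature (missing concepts per spiral).

```python
import xml.etree.ElementTree as ET

# Hypothetical spirals expressed as paths into a CSDGM-style record (relative to <metadata>).
SPIRALS = {
    "Discovery": [
        "idinfo/citation/citeinfo/title",
        "idinfo/descript/abstract",
        "idinfo/keywords/theme/themekey",
    ],
    "Extent": [
        "idinfo/spdom/bounding/westbc",
        "idinfo/spdom/bounding/eastbc",
        "idinfo/spdom/bounding/northbc",
        "idinfo/spdom/bounding/southbc",
        "idinfo/timeperd/timeinfo/rngdates/begdate",
        "idinfo/timeperd/timeinfo/rngdates/enddate",
    ],
}

RECORD = """<metadata><idinfo>
  <citation><citeinfo><title>Example dataset</title></citeinfo></citation>
  <descript><abstract>Daily observations.</abstract></descript>
  <spdom><bounding><westbc>-71.0</westbc><eastbc>-66.0</eastbc></bounding></spdom>
</idinfo></metadata>"""


def count_fields(xml_text, spirals):
    """Return {spiral: (present, total)} by checking each path for non-blank text."""
    root = ET.fromstring(xml_text)
    counts = {}
    for spiral, paths in spirals.items():
        present = 0
        for path in paths:
            element = root.find(path)
            if element is not None and (element.text or "").strip():
                present += 1
        counts[spiral] = (present, len(paths))
    return counts


counts = count_fields(RECORD, SPIRALS)
missing = [total - present for present, total in counts.values()]
print(counts)   # {'Discovery': (2, 3), 'Extent': (2, 6)}
print(missing)  # [1, 4]
```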
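A recommendation organized by level (mandatory, recommended, suggested) can be represented as a simple mapping. The following Python sketch lists what a record still lacks at each level. The recommendation contents and the record are hypothetical and not taken from any published recommendation.

```python
from __future__ import annotations

# A hypothetical recommendation organized by level; concept names are illustrative.
RECOMMENDATION = {
    "mandatory": ["Title", "Abstract", "Bounding Box"],
    "recommended": ["Keywords", "Temporal Extent", "Contact"],
    "suggested": ["Vertical Extent", "Lineage"],
}


def missing_by_level(recommendation: dict[str, list[str]], record: set[str]) -> dict[str, list[str]]:
    """List the recommendation's concepts that are absent from a record, grouped by level."""
    return {
        level: [concept for concept in concepts if concept not in record]
        for level, concepts in recommendation.items()
    }


record = {"Title", "Bounding Box", "Keywords"}
missing = missing_by_level(RECOMMENDATION, record)
print(missing)
```

Grouping the gaps by level lets a data provider address mandatory concepts first, in keeping with the small, actionable iterations of the spiral model.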
[[Category:Documentation Connections]]

Latest revision as of 16:07, September 16, 2017
