Recommendations for Semantic Web Markup of Existing XML

From Earth Science Information Partners (ESIP)

Overview

The Air Quality Cluster would benefit from advice from the Semantic Web Cluster on how to add semantic markup to the XML-based OGC WMS Capabilities document to identify dataset name, type of data, domain, etc., in order to support a faceted search.

Use Case: Semantic Markup of WMS Capabilities Documents

The Air Quality Cluster is experimenting with structured markup (tagging) of OGC WMS and WCS Capabilities documents, placed inside <Keyword> elements, so that we can run structured searches on the documents. An example query might be, "give me the layers where Dataset = 'OMI_AI_G'". See WMS_GetCapabilities#WMS_GetCapabilities_Layer_Description.

However, if we are going to implement this kind of markup with a quasi-controlled vocabulary, we should do it in a way that is compatible with, or even leverages, the semantic web. A machine tags approach has been considered, e.g.

<Keyword>esip:dataset=OMI_AI_G</Keyword>
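
To make the machine tag idea concrete, here is a minimal Python sketch of how such keywords could be read back out and used for the example query above. It assumes a namespace:predicate=value layout, as in the keyword shown, and a namespace-free Capabilities excerpt; the sample XML, the second layer, and the function names are hypothetical and are not taken from the actual service.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free excerpt of a WMS Capabilities document.
# A real WMS 1.3.0 document uses an XML namespace and would need it in the paths.
SAMPLE_CAPABILITIES = """
<Capability>
  <Layer>
    <Name>OMI_AI_G</Name>
    <KeywordList>
      <Keyword>esip:dataset=OMI_AI_G</Keyword>
      <Keyword>aerosol</Keyword>
    </KeywordList>
  </Layer>
  <Layer>
    <Name>other_layer</Name>
    <KeywordList>
      <Keyword>esip:dataset=OTHER</Keyword>
    </KeywordList>
  </Layer>
</Capability>
""".strip()


def parse_machine_tag(text):
    """Split 'namespace:predicate=value' into a 3-tuple; return None for plain keywords."""
    if text is None or "=" not in text:
        return None
    left, value = text.split("=", 1)
    if ":" not in left:
        return None
    namespace, predicate = left.split(":", 1)
    return namespace.strip(), predicate.strip(), value.strip()


def layers_matching(xml_text, namespace, predicate, value):
    """Return the names of layers carrying the given machine tag."""
    root = ET.fromstring(xml_text)
    wanted = (namespace, predicate, value)
    matches = []
    for layer in root.iter("Layer"):
        for keyword in layer.findall("KeywordList/Keyword"):
            if parse_machine_tag(keyword.text) == wanted:
                matches.append(layer.findtext("Name"))
                break  # one matching keyword is enough for this layer
    return matches


# "Give me the layers where Dataset = 'OMI_AI_G'"
print(layers_matching(SAMPLE_CAPABILITIES, "esip", "dataset", "OMI_AI_G"))
```

Plain keywords such as "aerosol" are simply skipped, so tagged and untagged keywords can coexist in the same <KeywordList>.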

An initial attempt at a WMS that includes the current keyword encoding is available at [1]. In the actual use case, this encoding is used by a faceted search engine. Clearly, if the keywords started out as a form of RDF, they would already be amenable to faceted search.
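
As a rough illustration of how close a machine tag already is to RDF, the following Python sketch rewrites one keyword as a single N-Triples statement. The namespace URI and the layer URI are placeholders invented for the example; the real ESIP ontology may use different URIs and property names, and the value is emitted as a plain literal only for simplicity.

```python
# Placeholder namespace; NOT the real ESIP ontology URI.
ESIP_NS = "http://example.org/esip#"


def machine_tag_to_ntriple(layer_uri, tag, ns_uri=ESIP_NS):
    """Turn 'prefix:predicate=value' into one N-Triples line about layer_uri."""
    left, value = tag.split("=", 1)
    _prefix, predicate = left.split(":", 1)
    # Escape backslashes and double quotes so the literal stays valid N-Triples.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{layer_uri}> <{ns_uri}{predicate}> "{escaped}" .'


# The layer URI below is hypothetical.
print(machine_tag_to_ntriple(
    "http://example.org/wms/layers/OMI_AI_G",
    "esip:dataset=OMI_AI_G",
))
```

The subject is the layer, the predicate comes from the tag's namespace and predicate, and the value becomes the object, which is the shape a faceted search over RDF would expect.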

An alternative using XLink has also been proposed (XlinkMarkupExample).

Alternatively, RDFa was considered, but it is mostly defined in the context of XHTML.

Can the ESIP Semantic Web Cluster provide a recommendation or suggestion on how to move forward that would be:

  1. flexible and extensible,
  2. compatible with the evolving ESIP datatype and services ontology, and
  3. lightweight and easy to use?

For any proposed solution, it would be extremely helpful to provide:

  1. an example of implementation, based on the current case at [2]
  2. an assessment of the scheme's usability by semantic web newbies
  3. pointers to existing tools that can work with the proposed solution, if they exist
  4. for bonus points, can the scheme be chained? That is, if I have
    <Keyword>Platform:Satellite</Keyword>
    can I also have something like
    <Keyword>Satellite:Aura</Keyword>?
  5. for even more bonus points, can the scheme be extended so that, say, the OpenSearch Atom response returns structured metadata?
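
One possible reading of the chaining question in item 4 can be sketched in Python as follows. The sketch assumes that a value which also appears as a facet name (here "Satellite") links the two keywords into one path; that interpretation, and the function names, are assumptions made for illustration and not part of any proposed scheme.

```python
def build_facet_index(keywords):
    """Map each 'Facet:Value' keyword to facet -> list of values."""
    index = {}
    for keyword in keywords:
        facet, sep, value = keyword.partition(":")
        if sep:  # ignore keywords without a ':' separator
            index.setdefault(facet.strip(), []).append(value.strip())
    return index


def expand_chains(index, facet, seen=()):
    """Return every path starting at `facet`, following values that are themselves facets."""
    visited = seen + (facet,)
    paths = []
    for value in index.get(facet, []):
        if value in index and value not in visited:
            # The value is also a facet: continue the chain through it.
            for tail in expand_chains(index, value, visited):
                paths.append([facet] + tail)
        else:
            # Leaf value (or a cycle back to a visited facet): stop here.
            paths.append([facet, value])
    return paths


keywords = ["Platform:Satellite", "Satellite:Aura"]
print(expand_chains(build_facet_index(keywords), "Platform"))
```

Under this reading, the two keywords from item 4 resolve to the single path Platform, Satellite, Aura, which a faceted search could present as nested facets.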

Resources

http://gcmd.nasa.gov/Resources/valids/archives/keyword_list.html

http://www.w3.org/2005/Incubator/ssn/wiki/Semantic_Mark_up -- Work in W3C SSN addressing this topic