Applications of Semantic Web for Earth Science

From Earth Science Information Partners (ESIP)
Revision as of 17:09, February 28, 2012 by JamesGallagher

Introduction

Semantic web technology is becoming ever more important in Earth Science applications, where it plays a number of diverse roles. It is likely to become an even more important enabler as ambitious data science efforts, such as the EarthCube initiative and ESIP's own Earth Science Collaboratory, move forward. These enterprises seek to make it easier to bring together disparate datasets, disciplines, and even communities, in an effort to leverage our burgeoning data in the service of understanding the Earth as a system. As these resources and the communities leveraging them diversify, the need for semantic technology to help users navigate the sea of resources becomes more apparent. Indeed, this role in discovery is acknowledged in the key capabilities determined through the first EarthCube Charrette.

However, we should not neglect the important role semantic technology can and does play in other aspects of Earth Science data. For instance, semantic technology plays a key role in several other areas noted in the EarthCube Charrette capabilities:

  • Automated Quality Assurance and Quality Control
  • Provenance capture and interpretation
  • Workflow construction
  • Data fusion

Many such applications are underpinned by semantic technology, with the result that its value is not always readily apparent. In this short white paper, we discuss several ongoing or completed projects and applications that are built on semantic web technology, in order to raise awareness of this critical technology.

Data Quality Screening Service

by Christopher Lynnes, NASA/GSFC

The Data Quality Screening Service (DQSS) is designed to help automate the filtering of remote sensing data on behalf of science users. Today this process involves extensive study of quality documentation, followed by laborious coding. The DQSS instead acts as a Web Service that provides users with data pre-filtered to their particular criteria, while guiding them with the filtering recommendations of the cognizant data experts. Data that do not pass the criteria are replaced with fill values, resulting in a file that has the same structure and is usable in the same ways as the original (Fig. 1).

Fig.1. Data Quality Screening Service showing a data array before screening, the quality criteria mask used for screening and the data array after screening. The scene is for Total Precipitable Water over Hurricane Ike on 9 September 2008. The figure on the left shows anomalously dry areas on the east side of the hurricane; however, these turn out to be low quality retrievals (center) and thus are removed from the data array by the screening process.
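
The screening step described above can be sketched in a few lines of Python. This is a minimal illustration and not the DQSS implementation: the function name `screen`, the fill value of -9999.0, the convention that lower quality flags mean better retrievals, and the sample arrays are all assumptions made for the example.

```python
FILL_VALUE = -9999.0  # assumed fill value; real data products define their own


def screen(data, quality, max_flag, fill=FILL_VALUE):
    """Return a copy of `data` with the same shape, in which every cell whose
    quality flag exceeds `max_flag` is replaced by `fill`.

    Assumes lower flag values mean better quality (0 = best)."""
    return [
        [value if flag <= max_flag else fill
         for value, flag in zip(data_row, quality_row)]
        for data_row, quality_row in zip(data, quality)
    ]


# Hypothetical 2x3 scene: data values and per-pixel quality flags.
water_vapor = [[41.2, 12.5, 43.0],
               [39.8, 11.9, 40.1]]
quality_flags = [[0, 2, 0],
                 [1, 2, 0]]

# Keep only pixels flagged 0 or 1; the anomalously low values are masked out.
screened = screen(water_vapor, quality_flags, max_flag=1)
```

Because rejected pixels are overwritten instead of dropped, the output keeps the shape of the input, which is what lets downstream tools read the screened file unchanged.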

At the core of DQSS is an ontology that describes data fields, the quality fields for applying quality control and the interpretations of quality criteria. This allows a generalized code base that can nonetheless handle both a variety of datasets and a variety of quality control schemes. Indeed, a data collection can be added to the DQSS simply by registering instances in the ontology if it follows a quality scheme that is already modeled in the ontology. This will allow DQSS to scale to more data products with minimal cost.
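
The idea of adding a collection by registration alone can be illustrated with a small sketch. Here plain Python dictionaries stand in for the ontology, which is a deliberate simplification; the scheme names, collection names, and field names are hypothetical and are not taken from DQSS.

```python
# Quality schemes are modeled once. Each one interprets a raw quality flag
# against a user-chosen threshold. Names and semantics are hypothetical.
QUALITY_SCHEMES = {
    "lower_is_better": lambda flag, threshold: flag <= threshold,
    "higher_is_better": lambda flag, threshold: flag >= threshold,
}

# Registered collections: plain data stating which quality field governs
# which data field, and which scheme interprets that quality field.
COLLECTIONS = {}


def register(collection, data_field, quality_field, scheme):
    """Register a collection against a quality scheme that is already modeled."""
    if scheme not in QUALITY_SCHEMES:
        raise ValueError(f"scheme {scheme!r} is not modeled yet")
    COLLECTIONS[collection] = {
        "data_field": data_field,
        "quality_field": quality_field,
        "scheme": scheme,
    }


def passes(collection, flag, threshold):
    """Generic check: look up the collection's scheme and apply it."""
    scheme = COLLECTIONS[collection]["scheme"]
    return QUALITY_SCHEMES[scheme](flag, threshold)


# Adding a product is a registration, not new screening code.
register("PRODUCT_A", "TotalWater", "Qual_Water", "lower_is_better")
register("PRODUCT_B", "Cloud_Fraction", "Confidence", "higher_is_better")
```

The screening code (`passes`) never mentions a specific product; only a product with a genuinely new quality scheme would require new logic.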

For more on DQSS, see http://disc.sci.gsfc.nasa.gov/services/data-quality-screening-service.

Earth and Space Science Informatics Linked Open Data

by Tom Narock (University of Maryland, Baltimore County) and Eric Rozell (Rensselaer Polytechnic Institute)


Linked Open Data (LOD) is a data publishing methodology built on four simple principles:

  • Use unique identifiers (URIs) as names
  • Make those identifiers dereferenceable via HTTP
  • Dereferencing an identifier should return RDF (the data representation language of the semantic web)
  • Include links to other data sets
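
As a rough illustration of the principles listed above, the following Python sketch names a person with an HTTP URI, describes that person as RDF in N-Triples syntax, and links out to another data set. The `example.org` namespaces and the person are hypothetical, and the use of the FOAF `name` and OWL `sameAs` properties is an assumption made for the example. Making the URI dereferenceable requires a web server and is not shown.

```python
BASE = "http://example.org/id/"  # hypothetical namespace for minted URIs
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def uri(local_name):
    """Name a thing with an HTTP URI under the hypothetical namespace."""
    return BASE + local_name


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples as N-Triples.

    Simplification for this sketch: objects starting with 'http' are treated
    as URIs, anything else as a plain string literal."""
    lines = []
    for s, p, o in triples:
        if o.startswith("http"):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


person = uri("person/jane-doe")
triples = [
    (person, FOAF_NAME, "Jane Doe"),
    # Link to the identifier another data set uses for the same person.
    (person, OWL_SAME_AS, "http://other.example.org/people/42"),
]
print(to_ntriples(triples))
```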

Following these principles results in structured data with explicit semantics. As a result, data from different sources can be connected and queried. Earth and Space Science Informatics Linked Open Data (ESSI-LOD) is a project aimed at creating Linked Open Data within the Earth and Space Sciences. Initial data sets in this project include historical conference data from the American Geophysical Union (AGU) as well as membership and meeting data from the Federation of Earth Science Information Partners (ESIP). Many members of the Earth science community participate in both AGU and ESIP, and there are many implicit relationships between the groups. However, answering questions across the two organizations has been difficult because the data are stored in proprietary and non-interoperable formats.

ESSI-LOD created Linked Open Data to alleviate these limitations. We converted 7 years of AGU conference data (2005-2010) and 5 years of ESIP membership/meeting data (2007-2011) into RDF and exposed it as Linked Open Data. Links between the data sets were made at the person level, where we identified AGU authors and ESIP members who are the same person. Exposing LOD has opened the data to cross-organizational question answering, collaboration discovery, analysis of trends, and insight into the underlying social network. Moreover, LOD is scalable and makes it easy for other organizations to link their own LOD to ESSI-LOD, further extending data fusion and querying capabilities.
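
To make the person-level linking concrete, here is a minimal sketch that pairs AGU author URIs with ESIP member URIs. It assumes, purely for illustration, that two records refer to the same person when their names match after normalization; the matching method actually used by ESSI-LOD is not described here, and real name disambiguation is considerably harder. All URIs and names below are hypothetical.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def link_people(agu_authors, esip_members):
    """Return sorted (agu_uri, esip_uri) pairs whose normalized names match.

    Both arguments map a person URI to a display name."""
    esip_by_name = {normalize(name): uri for uri, name in esip_members.items()}
    links = []
    for agu_uri, name in agu_authors.items():
        esip_uri = esip_by_name.get(normalize(name))
        if esip_uri is not None:
            links.append((agu_uri, esip_uri))
    return sorted(links)


# Hypothetical records from the two data sets.
agu_authors = {
    "http://example.org/agu/person/1": "Jane Doe",
    "http://example.org/agu/person/2": "John  Smith",
    "http://example.org/agu/person/3": "Ana Lopez",
}
esip_members = {
    "http://example.org/esip/member/a": "jane doe",
    "http://example.org/esip/member/b": "John Smith",
}

links = link_people(agu_authors, esip_members)
```

Each resulting pair could then be published as a link between the two data sets, which is what allows a single query to span both organizations.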

The benefits of ESSI-LOD have been enabled by simple semantic web principles and reusable vocabularies. These semantic web technologies have facilitated rapid browsing (Fig. 1), querying, visualization (Fig. 2), and extensibility not easily obtainable with the original data sets.

For more information on ESSI-LOD visit: http://essi-lod.org/

Fig.1. A web interface to ESSI-LOD. This tool is based on open source software and allows browsing of the linked data. Session, Authors, and Keywords are all links and allow users to interactively explore the linked data graph.
Fig.2. A visualization of ESSI-LOD data showing the co-authorship network of ESIP members who have attended AGU conferences. Each node represents an author and edges indicate co-authorship on an AGU conference presentation. The visualization illustrates cross-organizational querying by limiting the AGU authors to only those who are also ESIP members.
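
The kind of network shown in Fig. 2 can be derived from linked data with very little code. The sketch below is an illustration under stated assumptions, not the tool used to produce the figure: it takes a list of author lists (one per presentation) and a set of ESIP members, both hypothetical, and counts co-authorship edges among members only.

```python
from itertools import combinations


def coauthor_edges(presentations, esip_members):
    """Build undirected co-authorship edges among ESIP members.

    `presentations` is an iterable of author lists, one list per presentation.
    Returns a dict mapping an alphabetically sorted (author, author) pair to
    the number of presentations the two share."""
    edges = {}
    for authors in presentations:
        # Keep only authors who are also ESIP members, without duplicates.
        members = sorted(set(authors) & esip_members)
        for pair in combinations(members, 2):
            edges[pair] = edges.get(pair, 0) + 1
    return edges


# Hypothetical data: three presentations and the set of ESIP members.
presentations = [
    ["alice", "bob", "carol"],
    ["alice", "bob"],
    ["carol", "dave"],
]
esip_members = {"alice", "bob", "dave"}

edges = coauthor_edges(presentations, esip_members)
```

Restricting each author list to ESIP members before pairing is the cross-organizational step: "carol" appears in the AGU data but contributes no edges because she is not in the ESIP set.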

OPeNDAP's Hyrax Data Server provides RDF

by James Gallagher and Nathan Potter, OPeNDAP, Inc.