Difference between revisions of "Strategic Vision"

From Earth Science Information Partners (ESIP)

Revision as of 14:28, August 25, 2015

Introduction

The ESIP Semantic Web cluster is approaching 10 years old and is one of the oldest clusters within ESIP. During this decade of existence the tide has shifted to federal agencies such as NASA, DOE, and NSF CISE fully embracing semantic technologies. The amount of Linked Data is increasing at a staggering rate. There are new improvements for dataset descriptions (e.g. time and space) within commercial efforts such as schema.org. Semantic Web Technologies are now a mainstay in the geoscience information and data management community.

The time has come to move beyond prototypes and proofs of concept. To this end, the Semantic Web Cluster is developing a Strategic Vision and Road Map for the next 3 to 5 years. We aim to create a living document that will synthesize existing semantic efforts and guide future research and development. We want to coalesce our broad knowledge base and work towards a common long-term cyberinfrastructure. We would like to continue the tradition of the geosciences being an early adopter of, and feedback loop for, the broader Semantic Web community.

In organizations such as the Research Data Alliance there is no semantics cluster or semantics working group. This is because Semantic Technologies permeate all aspects of data and information management. Within ESIP, we would also like to evolve to a position where these technologies and methodologies are similarly commonplace. To achieve these goals, the Cluster will focus its efforts around the following areas.

Current Projects

Test Bed Projects

Funding Friday Projects

External (to ESIP) Projects to which Cluster members are contributing

  • Global Change Master Directory (GCMD) Ontology - A GCMD Platform-Instrument-Sensor ontology that utilizes existing GCMD keyword hierarchies and SKOS concepts. As of 8/2015, the ontology was under review by NASA and not publicly available.
  • [ https://data.globalchange.gov/ Global Change Information System ] - from the GCIS website "The GCIS is an open-source, web-based resource for traceable, sound global change data, information, and products. Designed for use by scientists, decision makers, and the public, the GCIS provides coordinated links to a select group of information products produced, maintained, and disseminated by government agencies and organizations. As well as guiding users to global change research products selected by the 13 member agencies, the GCIS serves as a key access point to assessments, reports, and tools produced by the USGCRP. The GCIS is managed, integrated, and curated by USGCRP."
  • [ http://toolmatch.esipfed.org/index ToolMatch ] - a semantic-based system for matching data to software tools and answering use cases such as “I have data and need to know which tools I can use”, with an example being “I just downloaded an AIRS Level 2 Standard retrieval file. How can I look at it?”. In addition to the project homepage there is also the [ http://github.com/ESIPFed/Toolmatch ToolMatch GitHub Repository ].
  • [ http://www.geolink.org GeoLink ] - The GeoLink project brings together experts from the geosciences, computer science, and library science in an effort to develop Semantic Web components that support discovery and reuse of data and knowledge. GeoLink's participating repositories include content from field expeditions, laboratory analyses, journal publications, conference presentations, theses/reports, and funding awards that span scientific studies from marine geology to marine ecosystems and biogeochemistry to paleoclimatology.
  • BCube - The BCube project is focused on demonstrating that brokering technologies can facilitate interoperability. As a part of this project, a web-scale crawler has been developed that finds and characterizes data and web service descriptions found on the web. Having scanned more than a million documents so far, a triple store containing these contents is being populated and will be available publicly.
  • ClearEarth - The ClearEarth project is using NLP/ML processes to auto-generate semantic resources.

In creating this inventory, we also aim to identify dependencies, both internal and external. That is, which projects are affected by or leverage other projects? Can we make enhancements in multiple areas through advances on certain topics? For instance, one of the TestBed projects (Marshall Ma) could be enhanced if more ontologies were available via the ESIP Ontology Portal.

Action Items

  1. Identify internal and external dependencies
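
One way to record the dependency inventory is as a small list of labelled edges, which also makes it easy to see which resources would benefit the most projects if improved. The sketch below is illustrative only: the function names and the relationship label are assumptions, and the single entry is the TestBed example mentioned above.

```python
from collections import defaultdict

# Each entry: (dependent project, resource it relies on, relationship label).
# Only the example from the text is listed; the label wording is an assumption.
dependencies = [
    ("TestBed project (Marshall Ma)", "ESIP Ontology Portal", "needs more ontologies from"),
]


def dependents_by_resource(edges):
    """Group dependent projects under the resource they rely on."""
    index = defaultdict(list)
    for dependent, resource, _label in edges:
        index[resource].append(dependent)
    return dict(index)


def rank_resources(edges):
    """List resources, those with the most dependent projects first."""
    index = dependents_by_resource(edges)
    return sorted(index, key=lambda r: (-len(index[r]), r))


for resource in rank_resources(dependencies):
    print(resource, "<-", dependents_by_resource(dependencies)[resource])
```

As more projects are added to the list, the ranking gives a rough answer to the question of where an advance on one topic would enhance several areas at once.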

The Linked Science Cloud

Linked Science Cloud or Linked Earth Science Cloud?

The amount of Linked Data is increasing at a staggering rate. Yet, while only a handful of science datasets exist in the Linked Data Cloud, the potential for continuing to publish and interlink geoscience datasets is immense. To this end, the Cluster will coordinate the creation of the Linked Science Cloud.

Action Items

  1. Identify existing geoscience Linked Data sets which could be part of the Linked Science Cloud
  2. Identify potential links between these datasets
  3. Identify available GeoSPARQL endpoints for demonstration and evaluation of spatial and spatio-temporal queries
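
As a rough illustration of the kind of spatial query that could be used to evaluate a GeoSPARQL endpoint, the sketch below builds a query for features whose geometry falls within a bounding box. The function names and the example coordinates are hypothetical, and it assumes the endpoint exposes geometries through the standard `geo:hasGeometry` and `geo:asWKT` properties; no particular endpoint is implied.

```python
def bbox_to_wkt(west, south, east, north):
    """Return a closed WKT polygon (longitude latitude order) for a bounding box."""
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON(({coords}))"


def build_within_query(west, south, east, north, limit=10):
    """Build a GeoSPARQL query for features whose geometry lies within the box."""
    wkt = bbox_to_wkt(west, south, east, north)
    return f"""PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?feature ?wkt WHERE {{
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt, "{wkt}"^^geo:wktLiteral))
}}
LIMIT {limit}"""


# Hypothetical bounding box, chosen only for illustration.
query = build_within_query(-75.0, 35.0, -70.0, 40.0)
print(query)
```

The resulting string could be submitted to any candidate endpoint identified in the action items above; the query text itself does not depend on which triple store is behind the endpoint.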

Provenance

The recent W3C PROV-O recommendation has provided an accessible way to include provenance in semantic information systems, and the Cluster recognizes that provenance is becoming essential to many geoscience applications. Semantic provenance should therefore be an essential component of our semantic systems, and the Cluster will treat it as a high priority when evaluating and planning future projects.
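
To make this concrete, here is a minimal sketch of what a PROV-O description of a derived data product might look like, generated as Turtle from Python. The `ex:` namespace, the function name, and the resource names (`rawSwath`, `griddedProduct`, and so on) are all hypothetical; only the `prov:` classes and properties come from the W3C recommendation.

```python
PROV_PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix ex: <http://example.org/> .\n"
)


def provenance_turtle(derived, source, activity, agent):
    """Describe how `derived` was produced from `source` as PROV-O Turtle."""
    lines = [
        PROV_PREFIXES,
        f"ex:{source} a prov:Entity .",
        f"ex:{agent} a prov:Agent .",
        f"ex:{activity} a prov:Activity ;",
        f"    prov:used ex:{source} ;",
        f"    prov:wasAssociatedWith ex:{agent} .",
        f"ex:{derived} a prov:Entity ;",
        f"    prov:wasGeneratedBy ex:{activity} ;",
        f"    prov:wasDerivedFrom ex:{source} ;",
        f"    prov:wasAttributedTo ex:{agent} .",
    ]
    return "\n".join(lines)


# Hypothetical example: a gridded product derived from a raw swath file.
ttl = provenance_turtle("griddedProduct", "rawSwath", "regridding", "processingTeam")
print(ttl)
```

Even this small pattern (entity, activity, agent) is enough to answer questions such as which source a product was derived from and who was responsible for the processing step.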

Ontology Development and Ontology Portal

The recommendation is not to go straight to encoding an ontology, but rather to start with conceptual and information models so that there is a diversity of choices for how to encode. We need to think about syntax, semantics, and pragmatics (pragmatics meaning use), keep the human in the loop, and consider the cognitive science aspect. It is also time to return to process ontologies (not just things, but also processes). (From Peter Fox's comments; NEED TO EXPAND ON THIS.)

We also need to keep in mind the multiple perspectives on ontology engineering. There are domain ontologies, and there is a relatively new methodology around so-called Ontology Design Patterns. The Cluster should follow the broader Semantic Web community and watch how current research in these areas develops.

Open Questions

  1. How do we measure uptake and usage of ontologies in the portal?
  2. What are reasonable metrics and goals in this area?

Ontology Governance

The Cluster will adopt some form of the [ http://www.obofoundry.org/crit.shtml The Open Biological and Biomedical Ontologies] principles for ontology governance. Exactly how this is to be accomplished is still open for debate.

Long Term Cyberinfrastructure

At present, there is a known challenge in transitioning from prototype applications to production-quality cyberinfrastructure. It is often difficult to identify long-term hosting and maintenance for our Semantic Web projects. To be fair, this is a common issue among most grant-funded information technology projects. The Semantic Web Cluster, and ESIP in general, are committed to identifying new and promising solutions. We are continually working with [ Products_and_Services | ESIP Products and Services ] to explore possibilities.

We have also identified a need to encourage real-time sharing of progress from Testbed projects with the broader ESIP community. The Semantic Web Cluster is exploring the use of Semantic Technologies in this area and will serve as a willing test bed for further development.

Open Questions

  1. Are we following all of the OBO Foundry principles? If not, then which subset will the Cluster abide by?

Ideas, Thoughts, Comments

  • Being a visual person, I think it would be a good idea to have a diagram of the different areas the cluster is exploring. Perhaps a network diagram with areas as nodes and relationships between the areas as named edges. It would also be useful to add the names of people working in that area to each node.
  • How do we evaluate/benchmark our progress along this road map?