ToolMatch

From Earth Science Information Partners (ESIP)
Revision as of 09:57, January 10, 2014 by Pwest (talk | contribs)

What's New?

Problem Statement and Use Case

For a given dataset, it is often difficult to find the tools that can work with it. In many cases, the information that Tool A works with Dataset B is somewhere on the Web, but not in a readily identifiable or discoverable form. In other cases, particularly for more generalized tools, the information does not exist at all until somebody tries the tool on the dataset.

Thus, the simplest, most prevalent use case is for a user to search for the tools that can be used with a given dataset. A further refinement would be to specify what the tool can do with the dataset, e.g., read, visualize, map, analyze, reformat.

Note that the Energy Cluster is actually looking for this kind of tool. (See Rahul.)

Proposed Solution

Often, whether a tool is likely to work with a dataset can be inferred through simple rules. For example, knowing that a dataset is available in netCDF with CF-1 conventions and is on a lat/lon grid is typically sufficient to infer that the data can be viewed in Panoply. In addition, the problem lends itself to crowdsourcing: once one user has found a given tool to be usable with a given dataset, this holds true for all users, and so the information should be promulgated.

We propose the construction of RDF triples that record the fact that a tool works with a particular dataset. The triples would be based on a simple ontology, with minimal information about the dataset (enough to uniquely identify it and present it as an option in a user interface) and slightly more information about the tool. A simple user interface would allow a user to select a dataset or paste in a unique dataset identifier.
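As a rough illustration of the idea, the sketch below records "tool works with dataset" statements as plain triples and looks up tools by dataset identifier, optionally narrowed to one action such as read or visualize. It is only a sketch under assumptions: the namespace `http://example.org/toolmatch#`, the predicate names, the `urn:example:` dataset identifier, and the helper functions are all hypothetical, since this page does not define the ontology. It uses only the Python standard library rather than an RDF toolkit.

```python
from typing import NamedTuple

# Hypothetical namespace: the ToolMatch ontology is not defined on this page.
TM = "http://example.org/toolmatch#"


class Triple(NamedTuple):
    subject: str    # tool URI
    predicate: str  # what the tool can do with the dataset
    obj: str        # dataset identifier


def works_with(tool_uri, dataset_id, action="worksWith"):
    """Record that a tool can be used with a dataset, optionally for one action."""
    return Triple(tool_uri, TM + action, dataset_id)


def to_ntriples(t):
    """Serialize one triple as an N-Triples line (all terms treated as IRIs)."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


def tools_for(store, dataset_id, action=None):
    """Return the sorted tool URIs recorded against a dataset identifier."""
    wanted = None if action is None else TM + action
    return sorted({
        t.subject
        for t in store
        if t.obj == dataset_id and (wanted is None or t.predicate == wanted)
    })


# Placeholder statements; the dataset identifier is invented for illustration.
store = [
    works_with(TM + "Panoply", "urn:example:dataset-B", "visualize"),
    works_with(TM + "Panoply", "urn:example:dataset-B", "read"),
    works_with(TM + "ToolA", "urn:example:dataset-B", "read"),
]

for triple in store:
    print(to_ntriples(triple))
print(tools_for(store, "urn:example:dataset-B", action="visualize"))
```

In a real deployment the dataset identifier would be a GCMD DIF ID or a DOI, and the store would more likely be a triple store queried with SPARQL than a Python list.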

Requirements

  • Tools can be either downloadable tools or online services.
  • Datasets should be identifiable by either a GCMD DIF ID or a DOI.
  • Reformatted data and reformatting services (WCS, OPeNDAP) should be considered in compatibility.
  • A simple user interface should provide the ability to search for tools compatible with a certain dataset.
  • Users should be able to see a brief description of the tool.
  • Users should be presented with a website for the tool in the search results.
  • A simple mechanism should be provided for authoring the RDF links. (This does not mean that contributors will author RDF directly in, say, TTL or XML. They might do so through simple hashtags.)
  • The "system" should be able to do some simple inferencing.
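One way the hashtag-based authoring mentioned in the requirements might work is sketched below: a short free-text note carrying a few tags is turned into (tool, action, dataset) statements that could later be written out as RDF. The tag syntax (`#tool:`, `#dataset:`, `#can:`), the fallback action `worksWith`, and the DOI in the usage example are all invented for this sketch and are not defined anywhere on this page.

```python
import re

# Hypothetical hashtag syntax, invented for this sketch:
#   #tool:<name>  #dataset:<identifier>  #can:<action>
TAG = re.compile(r"#(tool|dataset|can):(\S+)")


def parse_note(note):
    """Turn one tagged note into a list of (tool, action, dataset) statements."""
    found = {"tool": [], "dataset": [], "can": []}
    for key, value in TAG.findall(note):
        found[key].append(value)
    # With no #can: tag, fall back to a generic "worksWith" statement.
    actions = found["can"] or ["worksWith"]
    return [
        (tool, action, dataset)
        for tool in found["tool"]
        for dataset in found["dataset"]
        for action in actions
    ]


# The DOI below is a placeholder, not a real dataset identifier.
note = "Opened fine! #tool:Panoply #dataset:doi:10.1234/example #can:visualize"
print(parse_note(note))
```

A note that lacks either a tool tag or a dataset tag yields no statements, which keeps incomplete annotations out of the store.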

Artifacts

Random Notes

Relationship to Servicecasting

This could be an underlying infrastructure for generating service casts...

Inferencing

There are three kinds of inferences that can greatly cut down the cost of authoring the RDF.

  1. For tools based on netcdf-java, the likelihood of usability can often be inferred from availability in netCDF or OPeNDAP and the presence of CF-1 coordinates. Note, however, that this is not an ironclad guarantee.
  2. Many data collections have "homeomorphic" siblings: datasets with the same or similar variables and data format. For example, AIRH2RET, AIRS2RET and AIRX2RET are AIRS Level 2 datasets that all have the same variables (roughly) and data structure and format. Usability for one in a given tool strongly implies usability for a sibling dataset.
  3. Some tools by their nature are NOT suited to an entire class of datasets, such as GrADS for mapping Level 2 data.
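The three kinds of inference above could be sketched as simple rule functions. This is an illustration only: the function names, the family label `AIRS-L2-RET`, and the lookup tables are hypothetical, and a real system would more likely express such rules in an RDF rule or reasoning layer than in Python. As noted in item 1, a positive result means "likely usable", not a guarantee.

```python
def likely_usable_with_netcdf_java(formats, has_cf1_coordinates):
    """Rule 1: netCDF or OPeNDAP availability plus CF-1 coordinates suggests usability."""
    return has_cf1_coordinates and bool(set(formats) & {"netCDF", "OPeNDAP"})


# Rule 2 input: groups of sibling datasets. The family label is hypothetical.
SIBLINGS = {"AIRS-L2-RET": {"AIRH2RET", "AIRS2RET", "AIRX2RET"}}


def inferred_from_siblings(dataset, known_usable, families=SIBLINGS):
    """Rule 2: usability for one dataset strongly implies usability for its siblings."""
    for members in families.values():
        if dataset in members and (members - {dataset}) & set(known_usable):
            return True
    return False


# Rule 3 input: (tool, action) pairs that are unsuited to a whole class of datasets.
UNSUITED = {("GrADS", "map"): {"Level 2"}}


def ruled_out(tool, action, dataset_class, unsuited=UNSUITED):
    """Rule 3: some tools are not suited to an entire class of datasets."""
    return dataset_class in unsuited.get((tool, action), set())


print(likely_usable_with_netcdf_java(["netCDF"], True))
print(inferred_from_siblings("AIRX2RET", {"AIRS2RET"}))
print(ruled_out("GrADS", "map", "Level 2"))
```

Keeping the exclusion rule separate from the two positive rules lets a user interface hide a tool outright when it is ruled out, while showing inferred matches with a lower confidence than matches recorded directly by users.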

Tool Scripts

  1. Some tools are actually scripts written to work within other tools. An example is the OpenGrADS Cookbook, which contains scripts for reading and working with certain data collections in the GrADS tool.

Implementation Plan

  1. Have a telecon to work on the model
  2. Have a telecon to work on the implementation plan
  3. Implement pieces offline
  4. Pull it all together at the Summer Meeting

Presentations

  • File:WinterMtg2013 ToolMatch.pptx
  • ESIP Winter Meeting 2014 Presentation
  • ESIP Winter Meeting 2014 Poster

SIGNUP SHEET

Leave your "~~~~" here if you want to contribute to this collaboration:

  • Clynnes 12:25, 9 January 2012 (MST)
  • Hook 14:20, 9 January 2012 (MST)
  • JamesGallagher 13:58, 24 January 2012 (MST)
  • udadi 13:58, 24 January 2012 (MST)
  • Bdwilson 13:59, 25 January 2012 (MST)
  • Rozele 10:24, 27 February 2012 (MST)
  • Thuang 23:20, 28 March 2012 (MST)