Problem Statement and Use Case

For a given dataset, it is often difficult to find the tools that can work with it. In many cases, the information that Tool A works with Dataset B exists somewhere on the Web, but not in a readily identifiable or discoverable form. In other cases, particularly for more generalized tools, the information does not exist at all until somebody tries the tool on a given dataset.

Thus, the simplest, most prevalent use case is for a user to search for the tools that can be used with a given dataset. A further refinement would be to specify what the tool can do with the dataset, e.g., read, visualize, map, analyze, reformat.

Proposed Solution

Often, whether a tool is likely to work with a dataset can be inferred through simple rules. For example, knowing that a dataset is available in netCDF/CF-1 and is on a lat/long grid is typically sufficient to infer that the data can be viewed in Panoply. In addition, the problem lends itself to crowdsourcing: once one user has found a given tool to be usable with a given dataset, this holds true for all users, and so the information should be promulgated.
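As a rough illustration of the rule-based inference described above, the following Python sketch encodes the Panoply example as a single rule. The DatasetInfo fields, the rule function, and the RULES table are hypothetical names chosen for this example and are not part of any existing ToolMatch code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetInfo:
    """Minimal, hypothetical description of a dataset."""
    identifier: str
    data_format: str  # e.g. "netCDF"
    convention: str   # e.g. "CF-1.6"
    grid: str         # e.g. "lat/long"


def panoply_rule(ds: DatasetInfo) -> bool:
    """Rule from the text: netCDF/CF-1 data on a lat/long grid."""
    return (
        ds.data_format.lower() == "netcdf"
        and ds.convention.upper().startswith("CF-1")
        and ds.grid.lower() == "lat/long"
    )


# Hypothetical rule table: tool name -> predicate over a dataset description.
RULES = {"Panoply": panoply_rule}


def infer_tools(ds: DatasetInfo) -> list:
    """Return the names of tools whose rule accepts the dataset."""
    return sorted(name for name, rule in RULES.items() if rule(ds))


if __name__ == "__main__":
    example = DatasetInfo("example-dataset", "netCDF", "CF-1.6", "lat/long")
    print(infer_tools(example))
```

A rule only says that a tool is likely to work; a match confirmed by a user would still be the stronger evidence.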

We propose the construction of RDF triples that record the fact that a tool works with a particular dataset. The triples would be based on a simple ontology, with minimal information about the dataset (enough to uniquely identify it and present it as an option in a user interface) and slightly more information captured for the tool. A simple user interface would allow a user to select a dataset or paste in a unique dataset identifier, and then see the tools known to work with it.
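One possible shape for such triples is sketched below in Python, using only the standard library and serializing to N-Triples by hand. The namespace http://example.org/toolmatch#, the predicates such as canRead and canVisualize, and the dataset identifiers are all placeholders assumed for illustration; the actual ontology is not defined here.

```python
# Hypothetical namespace; the real ToolMatch ontology may differ.
TM = "http://example.org/toolmatch#"


def triple(tool_id, capability, dataset_id):
    """Build a (subject, predicate, object) triple: tool -> capability -> dataset."""
    return (f"{TM}tool/{tool_id}", f"{TM}{capability}", f"{TM}dataset/{dataset_id}")


def to_ntriples(t):
    """Serialize one triple of URIs as a single N-Triples line."""
    subject, predicate, obj = t
    return f"<{subject}> <{predicate}> <{obj}> ."


def tools_for(triples, dataset_id, capability=None):
    """Return tool URIs matched to a dataset, optionally filtered by capability."""
    wanted_object = f"{TM}dataset/{dataset_id}"
    wanted_predicate = None if capability is None else f"{TM}{capability}"
    found = {
        subject
        for subject, predicate, obj in triples
        if obj == wanted_object
        and (wanted_predicate is None or predicate == wanted_predicate)
    }
    return sorted(found)


if __name__ == "__main__":
    store = [
        triple("Panoply", "canVisualize", "ds1"),
        triple("Panoply", "canRead", "ds1"),
        triple("OtherTool", "canRead", "ds2"),
    ]
    for t in store:
        print(to_ntriples(t))
    print(tools_for(store, "ds1", "canVisualize"))
```

Using one predicate per capability keeps the refined search (what the tool can do with the dataset) a plain filter on the predicate.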

Requirements

Datasets should be identifiable through either a GCMD DIF ID or a DOI
A simple interface
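To illustrate how an interface might handle a pasted identifier, the Python sketch below separates DOIs from other identifiers. It assumes that a DOI can be recognized by the common "10.<registrant>/<suffix>" shape and that anything else is treated as a GCMD DIF ID; that fallback is a simplification made for this example, and the function name is hypothetical.

```python
import re

# Common DOI shape: "10." + 4-9 digit registrant code + "/" + non-empty suffix.
_DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
_DOI_PREFIXES = ("https://doi.org/", "http://dx.doi.org/", "doi:")


def classify_identifier(raw):
    """Return ("doi", value) or ("dif_id", value) for a pasted identifier."""
    text = raw.strip()
    lowered = text.lower()
    # Drop a leading resolver URL or "doi:" scheme if present.
    for prefix in _DOI_PREFIXES:
        if lowered.startswith(prefix):
            text = text[len(prefix):]
            break
    if _DOI_RE.match(text):
        return ("doi", text)
    if text:
        # Assumption: any other non-empty string is treated as a GCMD DIF ID.
        return ("dif_id", text)
    raise ValueError("empty dataset identifier")


if __name__ == "__main__":
    print(classify_identifier("doi:10.5555/example"))
    print(classify_identifier("EXAMPLE_DIF_ID"))
```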

ToolMatch