Difference between revisions of "ToolMatch"

From Earth Science Information Partners (ESIP)
 
(76 intermediate revisions by 11 users not shown)
= What's New? =
* [https://docs.google.com/spreadsheets/d/1XiCun7yIaGj3v5eXzG8yXAjAha7tAcZZT4tTnpguPEU/edit#gid=403447662 Glossary of Terms] (gdoc spreadsheet)
* [https://docs.google.com/spreadsheets/d/1tHZPZ4xtkpzEWR_l5T1t7muOqmtLJ4EBZnID3lPKtIA/edit#gid=72976985 Data Collections and Tools Characteristics] (gdoc spreadsheet)
* [https://toolmatch.hackpad.com/Glossary-Ontology-Questions-TNQPPvUmcWg Glossary/Ontology questions] (hackpad)
* [https://toolmatch.hackpad.com/Issues-with-Data-Collection-Tool-Identification-oiELoplmv2s Issues with data collection and tool identification] (discussions pieced together on hackpad)
* [[ToolMatch To Do]]
* Working meeting recordings are hosted on the [https://github.com/ESIPFed/toolmatch-ontology ToolMatch Ontology GitHub repository].
:: To download, go to the link and click the '''Raw''' button.
:: The [https://github.com/ESIPFed/toolmatch-ontology/blob/master/meetings/Toolmatch_20160323.mp4 March 23, 2016] recording includes the most recent discussion, focused primarily on term definitions.
:: The [https://github.com/ESIPFed/toolmatch-ontology/blob/master/meetings/Toolmatch_20160328.mp4 March 28, 2016] recording covers term definitions and possible ontology alignment.
<br />
'''previous'''
* March 13, 2014 - [https://docs.google.com/spreadsheet/ccc?key=0AjMhAsKzul-tdFNuaVlnMWZrb1ZVZmFZT2g4LTBfSHc&usp=drive_web#gid=0 Tool Spreadsheet]
* [[ToolMatch To Do]]
* [[ToolMatch_Talkoot|Crowdsourcing ToolMatch population at the ESIP Meeting]]
 
= Problem Statement and Use Case =
For a given dataset, it is difficult to find the tools that can be used to work with it. In many cases, the information that Tool A works with Dataset B is somewhere on the Web, but not in a readily identifiable or discoverable form. In other cases, particularly for more generalized tools, the information does not exist at all until somebody tries to use the tool on a given dataset.

Thus, the simplest, most prevalent use case is for a user to search for the tools that can be used with a given dataset. A further refinement would be to specify what the tool can do with the dataset, e.g., read, visualize, map, analyze, reformat.

* Note that the Energy Cluster is actually looking for this kind of tool. (See Rahul.)

* Example Collection with tools that can be used with the collection
** [[ Media:ToolsTable.pptx | C. Lynnes' Tools Table based on Model version 0.2 ]]
  
 
= Proposed Solution =
* A simple mechanism should be provided for authoring the RDF links. (This does not mean users will author RDF directly in, say, TTL or XML. They might do so through simple hashtags.)
* The "system" should be able to do some simple inferencing.

= Artifacts =
* [[ToolMatch Model]]
* [[ToolMatch Proposal 0.2]]
* [[ToolMatch Proposal 0.3]]
* [[ToolMatch Identifiers]]
* [[ToolMatch Publishing]]
 
= Random Notes =
== Relationship to Servicecasting ==
This could be an underlying infrastructure for generating service casts...

== Inferencing ==
There are three kinds of inferences that can greatly cut down the cost of authoring the RDF.
# For tools based on netcdf-java, the likelihood of usability can often be inferred from availability in netCDF or OPeNDAP and presence of CF-1 coordinates. Note, however, that this is not an ironclad guarantee.
# Many data collections have "homeomorphic" siblings: datasets with the same or similar variables and data format. For example, AIRH2RET, AIRS2RET and AIRX2RET are AIRS Level 2 datasets that all have the same variables (roughly) and data structure and format. Usability for one in a given tool strongly implies usability for a sibling dataset.
# Some tools by their nature are NOT suited to an entire class of datasets, such as GrADS for mapping Level 2 data.

== Tool Scripts ==
# Some tools are actually scripts written to work ''within other tools''. An example of this is the opengrads cookbook, which contains scripts for reading and working with certain data collections in the GrADS tool.

=Implementation Plan=
'''2016'''
* TBD<br />
<br />
'''2013'''<br />
Per ESIP Winter 2013 Meeting:
# Continue identifying tools that could be included and describing pertinent characteristics
# Identify collections that could be used as test cases
# Adjust SADL and CMAP representations of ToolMatch models as they evolve
# Coordinate with ESIP Energy Cluster as they are working on similar efforts
# Identify next steps in joint telecons with the Energy Cluster & other interested parties
# Create a plan for realistic implementation to be demonstrated at the ESIP Summer Meeting 2014

=Presentations=
* ESIP Summer 2015 Presentation(s)
* [http://tw.rpi.edu/web/doc/ESIPWinter2014_ToolMatch_Presentation ESIP Winter 2014 Presentation]
* [http://tw.rpi.edu/web/doc/ESIPWinter2014_ToolMatch_PatrickWest ESIP Winter 2014 Poster]
* [http://tw.rpi.edu/web/doc/TWeD/2014/Spring/OPeNDAP_And_ToolMatch Tetherless World Constellation TWeD Talk]
* [http://wiki.esipfed.org/index.php/File:WinterMtg2013_ToolMatch.pptx ESIP Winter 2013 Presentation]

=Meetings=
Meetings from June 2015 to present are documented on the [https://toolmatch.hackpad.com/ ToolMatch Hackpad]<br />
'''Call in information'''<br />
We are now using [https://www.gotomeeting.com/join/156154397 GoToMeeting]
* You may also dial in using your phone: +1 (872) 240-3212
* If prompted, the access code is: 156-154-397<br />
<br />
'''2015'''
* [https://toolmatch.hackpad.com/4NWBTUR5BHg June 01, 2015] - Status Meeting
* [https://toolmatch.hackpad.com/pFf4sNNfrJv May 18, 2015] - Status Meeting
* [https://toolmatch.hackpad.com/tGwcciVEcIv May 18, 2015] - Status and Task List
* [https://toolmatch.hackpad.com/onqmA8Sc29s May 03, 2015] - Status Meeting
* [https://toolmatch.hackpad.com/9u05gFYCElB April 13, 2015] - Status Meeting
* [https://toolmatch.hackpad.com/str5PRKihaI April 06, 2015] - Status Meeting
* [https://toolmatch.hackpad.com/KxVO9DsRtKo March 30, 2015] - Status Meeting
* [https://toolmatch.hackpad.com/NTrgk9C2lre March 09, 2015] - Status Meeting
* [https://toolmatch.hackpad.com/XfT1YP7MPIz January 26, 2015] - Status Meeting
* [https://toolmatch.hackpad.com/4Gk1MIt4mQy AGU 2014 and ESIP 2015 Meeting Notes]
'''2014'''
* [https://toolmatch.hackpad.com/k7adpkxNaYV December 08, 2014]
* [https://toolmatch.hackpad.com/oYxkiA7BlZi December 01, 2014]
* [https://toolmatch.hackpad.com/o90w9A49eiH November 17, 2014]
* [https://toolmatch.hackpad.com/va0T0t0M6uS November 10, 2014]
* [https://toolmatch.hackpad.com/dyh0NrnkAKb October 27, 2014]
* [https://toolmatch.hackpad.com/FKE6IW4biDr October 20, 2014]
* [https://toolmatch.hackpad.com/9OUlJ9edV8i September 20, 2014]
* [https://toolmatch.hackpad.com/DYPWiII8YGp August 13, 2014]
* [https://toolmatch.hackpad.com/WXbD690c7i0 August 06, 2014]
* [https://toolmatch.hackpad.com/MTmq46C0vcC July 30, 2014]
* [https://toolmatch.hackpad.com/VkKhJWP8cK9 July 11, 2014 - ESIP Summer Meeting ToolMatch session]
* [http://commons.esipfed.org/node/2360 July 11, 2014 - ESIP Summer Meeting ToolMatch session - convener notes]
* [https://toolmatch.hackpad.com/Ou0xkZO6igi June 25, 2014]
* [https://toolmatch.hackpad.com/Bh5Ob0gOeeI June 04, 2014]
* [https://toolmatch.hackpad.com/kygetSzCDsz May 28, 2014]
* [https://toolmatch.hackpad.com/KX2hGMBe3mV May 20, 2014]
* [https://toolmatch.hackpad.com/EAga66mLIhS April 01, 2014]
* [https://toolmatch.hackpad.com/IYSPC176sC4 March 13, 2014]
* [https://toolmatch.hackpad.com/DmUelyvX34K March 05, 2014]
* [https://toolmatch.hackpad.com/0nMrDudY0h0 January 31, 2014]
'''2013'''
* [https://toolmatch.hackpad.com/BQlpfYFKCdc December 03, 2013]
 
= SIGNUP SHEET =
Leave your <nowiki>"~~~~"</nowiki> here if you want to contribute to this collaboration:<br>
* [[User:Clynnes|Clynnes]] 12:25, 9 January 2012 (MST)
* [[User:Hook|Hook]] 14:20, 9 January 2012 (MST)
* [[User:JamesGallagher|JamesGallagher]] 13:58, 24 January 2012 (MST)
* [[User:udadi|udadi]] 13:58, 24 January 2012 (MST)
* [[User:Bdwilson|Bdwilson]] 13:59, 25 January 2012 (MST)
* [[User:Rozele|Rozele]] 10:24, 27 February 2012 (MST)
* [[User:Thuang|Thuang]] 23:20, 28 March 2012 (MST)
* [[User:Nhoebelheinrich|Nancy Hoebelheinrich]] 11:51, 26 July 2013 (PST)
* [[User:Brandonnodnarb|Brandon Whitehead]] 16:27, 6 June 2015 (UTC)

Latest revision as of 15:49, March 28, 2016

What's New?

To download, go to the link and click the Raw button.
The March 23, 2016 recording includes the most recent discussion, focused primarily on term definitions.
The March 28, 2016 recording covers term definitions and possible ontology alignment.


previous

Problem Statement and Use Case

For a given dataset, it is difficult to find the tools that can be used to work with it. In many cases, the information that Tool A works with Dataset B is somewhere on the Web, but not in a readily identifiable or discoverable form. In other cases, particularly for more generalized tools, the information does not exist at all until somebody tries to use the tool on a given dataset.

Thus, the simplest, most prevalent use case is for a user to search for the tools that can be used with a given dataset. A further refinement would be to specify what the tool can do with the dataset, e.g., read, visualize, map, analyze, reformat.

  • Note that the Energy Cluster is actually looking for this kind of tool. (See Rahul.)

Proposed Solution

Often, whether a tool is likely to work with a dataset can be inferred through simple rules. For example, knowing that a dataset is available in netCDF/CF-1 and is on a lat/long grid is typically sufficient to infer that the data can be viewed through Panoply. Secondly, the problem lends itself to crowdsourcing: once one user has found a given tool to be usable with a given dataset, this holds true for all users, and so the information should be promulgated.

We propose the construction of RDF triples that record the fact that a tool works with a particular dataset. The triples would be based on a simple ontology, with minimal information about the dataset (enough to uniquely identify it and present it as an option in a user interface). Slightly more information would be captured for the tool. A simple user interface would allow a user to select a dataset or paste in a unique dataset identifier.
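
As a rough illustration of the triples proposed above, the sketch below holds "tool works with dataset" facts as plain Python tuples and looks up the tools recorded for a dataset identifier. The `tm:` predicate names, the dataset identifiers, the example.org URL, and the helper functions are all hypothetical and are not taken from the ToolMatch ontology; a real system would presumably sit on an RDF store.

```python
# Minimal sketch: RDF-style triples held as (subject, predicate, object) tuples.
# All identifiers, predicate names, and values below are illustrative assumptions.
TRIPLES = [
    ("tool:Panoply", "tm:worksWith", "dataset:example-gridded"),
    ("tool:Panoply", "tm:homepage", "https://example.org/panoply"),
    ("tool:Panoply", "tm:description", "Viewer for gridded netCDF data"),
    ("tool:GrADS", "tm:worksWith", "dataset:example-gridded"),
    ("tool:GrADS", "tm:worksWith", "dataset:example-other"),
]


def tools_for(dataset_id, triples=TRIPLES):
    """Return the sorted tool identifiers recorded as working with dataset_id."""
    return sorted(
        s for (s, p, o) in triples if p == "tm:worksWith" and o == dataset_id
    )


def describe(tool_id, triples=TRIPLES):
    """Collect the non-compatibility facts about a tool, e.g. for a results page."""
    return {p: o for (s, p, o) in triples if s == tool_id and p != "tm:worksWith"}
```

Calling `tools_for("dataset:example-gridded")` would give the tools to list in search results, and `describe` would supply the brief description and website shown next to each one.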

Requirements

  • Tools can be either downloadable tools or online services.
  • Datasets should be identifiable either through GCMD DIF ID or DOI.
  • Reformatted data and reformatting services (WCS, OPeNDAP) should be considered in compatibility.
  • A simple user interface should provide the ability to search for tools compatible with a certain dataset.
  • Users should be able to see a brief description of the tool.
  • Users should be presented with a website for the tool in the search results.
  • A simple mechanism should be provided for authoring the RDF links. (This does not mean users will author RDF directly in, say, TTL or XML. They might do so through simple hashtags.)
  • The "system" should be able to do some simple inferencing.

Artifacts

Random Notes

Relationship to Servicecasting

This could be an underlying infrastructure for generating service casts...

Inferencing

There are three kinds of inferences that can greatly cut down the cost of authoring the RDF.

  1. For tools based on netcdf-java, the likelihood of usability can often be inferred from availability in netCDF or OPeNDAP and presence of CF-1 coordinates. Note, however, that this is not an ironclad guarantee.
  2. Many data collections have "homeomorphic" siblings: datasets with the same or similar variables and data format. For example, AIRH2RET, AIRS2RET and AIRX2RET are AIRS Level 2 datasets that all have the same variables (roughly) and data structure and format. Usability for one in a given tool strongly implies usability for a sibling dataset.
  3. Some tools by their nature are NOT suited to an entire class of datasets, such as GrADS for mapping Level 2 data.
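
The three kinds of inference listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the rule tables, the dataset dictionary layout, and the `likely_usable` function are hypothetical, and treating Panoply as a netcdf-java-based tool is an assumption of the example rather than something this page states.

```python
# Sketch of the three inference kinds; every name and data value is illustrative.
NETCDF_JAVA_TOOLS = {"Panoply"}  # assumed to be built on netcdf-java
SIBLINGS = [{"AIRH2RET", "AIRS2RET", "AIRX2RET"}]  # sibling group from the text
EXCLUDED = {("GrADS", "map", "Level 2")}  # (tool, task, dataset class)


def likely_usable(tool, task, dataset, known_pairs):
    """Guess whether tool can perform task on dataset; a heuristic, not a guarantee.

    dataset is a dict with keys: id, level, formats (a set), cf1 (a bool).
    known_pairs is a set of (tool, dataset_id) pairs already confirmed by users.
    """
    # Kind 3: a tool that is unsuited to a whole class of datasets is ruled out.
    if (tool, task, dataset["level"]) in EXCLUDED:
        return False
    # Direct, user-reported knowledge about this exact dataset.
    if (tool, dataset["id"]) in known_pairs:
        return True
    # Kind 2: a confirmed sibling dataset strongly implies usability.
    for group in SIBLINGS:
        if dataset["id"] in group and any((tool, sib) in known_pairs for sib in group):
            return True
    # Kind 1: netcdf-java tools need netCDF or OPeNDAP access plus CF-1 coordinates.
    if tool in NETCDF_JAVA_TOOLS:
        return bool(dataset["formats"] & {"netCDF", "OPeNDAP"}) and dataset["cf1"]
    return False
```

In this sketch the exclusion rule is checked first, so it overrides user-reported pairs; whether that ordering is desirable is a design choice the notes above leave open.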

Tool Scripts

  1. Some tools are actually scripts written to work within other tools. An example of this is the opengrads cookbook, which contains scripts for reading and working with certain data collections in the GrADS tool.

Implementation Plan

2016

  • TBD


2013
Per ESIP Winter 2013 Meeting:

  1. Continue identifying tools that could be included and describing pertinent characteristics
  2. Identify collections that could be used as test cases
  3. Adjust SADL and CMAP representations of ToolMatch models as they evolve
  4. Coordinate with ESIP Energy Cluster as they are working on similar efforts
  5. Identify next steps in joint telecons with the Energy Cluster & other interested parties
  6. Create a plan for realistic implementation to be demonstrated at the ESIP Summer Meeting 2014

Presentations

Meetings

Meetings from June 2015 to present are documented on the ToolMatch Hackpad
Call in information
We are now using GoToMeeting

  • You may also dial in using your phone: +1 (872) 240-3212
  • If prompted, the access code is: 156-154-397


2015

2014

2013

SIGNUP SHEET

Leave your "~~~~" here if you want to contribute to this collaboration: