Difference between revisions of "Telecon (2019-08-07)"

From Earth Science Information Partners (ESIP)
 
(60 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
Return to [[Archive]]
 
  
===Attendees===
+
==Attendees==
 +
* Stevan Earl
 +
* Colin Smith
  
===Agenda & Notes===
+
==Agenda & Notes==
 
This agenda is largely informed by our 2019 ESIP Session ([https://docs.google.com/document/d/1Xg_1QV0CzAp_cVuVaWTQeealeFU5hyDCgD6nGidHUgk/edit?usp=sharing notes are here]).
 
  
 
''Agenda notes are italicized.''
 
  
'''Refine IMCR scope''' - Uncertainty about project scope largely stems from an inability to prioritize use cases.
+
===Refine IMCR scope===
 +
Uncertainty about project scope largely stems from an inability to prioritize use cases.
 
* Who are the primary constituents we are trying to serve and what are their use cases? - ''Information managers (or anyone playing an information management role) in the environmental and ecological sciences. We are focusing on this domain first since it is where this cluster's contributors have their expertise. Once methods are formalized and robust, we will expand to other domains.''
 
* Who are ancillary constituents and what do they need? - ''Ancillary constituents are machines that need detailed metadata for each software library function with annotation to ontologies.''
 
Line 17: Line 20:
 
* Explicitly communicate this in a project scope. - ''Will do.''
 
  
'''Improve software metadata so IMCR is useful to platforms like the [http://datadiscoverystudio.org/geoportal/#searchPanel Data Discovery Studio]'''
+
===Improve software metadata ===
 +
... so IMCR is useful to platforms like the [http://datadiscoverystudio.org/geoportal/#searchPanel Data Discovery Studio]
 
* What level of detail needs to be added? - ''Function level inputs and outputs.''
 
 
* How can we perform this task? - ''This may differ by implementation language and community of practice. This is nearly impossible to access programmatically for the R language. Perhaps this is simpler for others?''
 
Line 23: Line 27:
 
* What priority does this task have? - ''Low. See prioritized list of action items below.''
 
  
'''Reconsider controlled vocabulary structure and terms''' - The current controlled vocabulary is a customized construct designed to serve the non-expert data manager to the detriment of excluding others. Aligning the IMCR vocabulary with established vocabularies (e.g. SWEET, GCMD) would enable machine actionability and inclusion in platforms like the Data Discovery Studio.
+
===Reconsider controlled vocabulary structure and terms===
 +
The current controlled vocabulary is a customized construct designed to serve the non-expert data manager, at the cost of excluding others. Aligning the IMCR vocabulary with established vocabularies (e.g. SWEET, GCMD) would enable machine actionability and inclusion in platforms like the Data Discovery Studio.
 
* Do we want to continue with the custom vocabulary or align with established vocabularies (can we do both)? - ''We will keep the current scheme as it is one commonly held in the minds of our primary constituents. We'll map the IMCR terms to other vocabs when possible and consider integration at a later time.''
 
  
'''Scraping and sorting the web for IM software''' - This is being done by others, whom we can collaborate with. - ''We'll check in with Ilya when this prioritized item comes up. ''  
+
===Searching and sorting the web for IM software===
 +
This is being done by others, whom we can collaborate with. - ''We'll check in with Ilya when this prioritized item comes up. ''  
 
* Where does this fit into project priorities? - ''Low. Such an effort will require substantial work from current cluster participants, for whom such an activity is outside their current professional responsibilities. Additionally, this would dramatically change the complexion and direction of this cluster's current scope and priorities. However, such a resource would be very useful.''
 
* How much effort will this require? - This will likely require a fact-finding expedition. Who will lead it? - ''A lot. We'll delegate a fact finder when the priority arises.''
  
===Action items===
+
==Action Items & Notes (listed in order of priority)==
  
Listed in descending priority:
+
===Refine scope===
* Refine scope
+
Done. Should we link from the home page (for primary user group) to technical information (for secondary user group)?
* Wrap IMCR in a simple but attractive website in preparation for production release.
+
 
* Update the IMCR vocabulary, and consider aligning or mapping to the Software Ontology, SWEET, and GCMD.
+
===Wrap IMCR in a simple but attractive website in preparation for production release===
* Identify domain vocabularies for use with IMCR software
+
Done, '''[https://imcr-hackathon.github.io/portal/ website is here].'''
* Finish curation of Python libraries
+
 
* Revisit tagging of R libraries
+
===Update the IMCR vocabulary===
* Focus curation on code snippets
+
Done.
* Develop best practices for curating code snippets a software libraries
+
 
* Back up IMCR to GitHub
+
Consider aligning or mapping to the Software Ontology, SWEET, and GCMD
* Create monthly digests (news)
+
* SWEET - The SWEET Ontology may be useful for finer concept granularity, which is currently of low priority. Will revisit this later.
* Production release and advertising
+
* SWO - The Software Ontology has terms useful to both immediate and low priorities. Immediate focus is on aligning "Operation", "Process", and "Data management" classes of the IMCR with SWO. '''[https://docs.google.com/spreadsheets/d/1dl3ozXyzCcrNckUS6vMs-KzlfHI3z2Avn8X71DiMq_M/edit?usp=sharing Alignment notes are being kept in this document.]''' SWO alignment is complete.
* Automate metadata maintenance
+
* GCMD - Several concepts within '''[https://gcmdservices.gsfc.nasa.gov/kms/concepts/concept_scheme/sciencekeywords/ NASA's Global Change Master Directory "Science Keywords" class]''' are relevant to IMCR. Alignment with GCMD is complete.
* How to expand scope.
+
* What vocabulary is DataONE using for their '''[https://www.dataone.org/software_tools_catalog Software Tools Catalog]'''? Can we align with it? A request for this information has been placed.
 +
* '''[https://kepler-project.org/ The Kepler Project]''' was contacted for information on the ontology used to categorize resources within that software workflow system, but the Kepler Project doesn't appear to have a maintainer, and inquiries sent via the project's contact interface fail.
 +
 
 +
Other SWO classes to be used in IMCR:
 +
* Use "programming language" for OntoSoft "Implementation language" fields. This has been added to IMCR best practices.
 +
* Use "Data/Format" for OntoSoft "File format" fields. This has been added to IMCR best practices.
 +
 
 +
'''[https://docs.google.com/spreadsheets/d/1PklZRyOEelkK-65gr2-JnW7GqaYH9ogbZ7mje_rx2Ik/edit?usp=sharing Maintenance of the vocabulary structure and terms is conducted in this Google Sheet.]''' You are welcome to suggest new terms and comment on existing ones.
 +
 
 +
===Identify domain vocabularies for use with IMCR software===
 +
Done. '''[https://vocab.lternet.edu/vocab/vocab/index.php The LTER Controlled Vocabulary]''' "disciplines" category has been added to IMCR best practices to keyword science domains.
 +
 
 +
===Toolkit for metadata maintenance===
 +
In progress. See the '''[https://github.com/IMCR-Hackathon/toolkit toolkit]'''.
 +
 
 +
===Back-up IMCR to GitHub===
 +
Done. '''[https://github.com/IMCR-Hackathon/metadata A GitHub repository]''' has been created for backing up IMCR software metadata. The '''[https://github.com/IMCR-Hackathon/toolkit toolkit]''' function "backup()" facilitates this task.
 +
 
 +
===Revisit tagging of existing libraries===
 +
Done.
 +
 
 +
===Finish curation of Python libraries===
 +
Done.
 +
 
 +
===Create monthly digests (news)===
 +
In progress. Monthly news relevant to IMCR users will be posted on the '''[https://imcr-hackathon.github.io/website/ website]''' on the first Monday of each month and linked in the telecon listserv email sent out the same day. Maintainer news and ongoing activities will continue to be recorded in the ESIP Wiki. The first news posting will go out with the production release of the IMCR.
 +
 
 +
===Production release and advertising===
 +
In progress. Places to advertise upon production release:
 +
* EDI Newsletter
 +
* EDI Website
 +
* EDI Twitter
 +
* LTER Network Office Newsletter
 +
* LTER Network Office Twitter
 +
* DataONE resources
 +
* DataONE Twitter
 +
* ESIP Newsletter
 +
* ESIP Webinar
 +
 
 +
===Focus curation on code snippets===
 +
Done. Code snippets can be registered as a repository (e.g. GitHub).
 +
 
 +
===Develop best practices for curating code snippets===
 +
Done. Code snippets can be registered as a repository (e.g. GitHub).
 +
 
 +
===When & how to expand IMCR scope===
 +
Done. The scope (i.e. science domain and level of metadata detail) may be expanded once the current implementation has matured and as technologies allow. Additional science domains may be integrated within the current IMCR Portal or as separate IMCR Science Domain Portals. Requests to register software at the function level, with all the corresponding metadata needed to make it machine actionable, would currently require substantial manual effort, but this will not be an issue once the relevant parsers and algorithms are available.

Latest revision as of 12:12, April 22, 2020

Return to Archive

Attendees

  • Stevan Earl
  • Colin Smith

Agenda & Notes

This agenda is largely informed by our 2019 ESIP Session (notes are here).

Agenda notes are italicized.

Refine IMCR scope

Uncertainty about project scope largely stems from an inability to prioritize use cases.

  • Who are the primary constituents we are trying to serve and what are their use cases? - Information managers (or anyone playing an information management role) in the environmental and ecological sciences. We are focusing on this domain first since it is where this cluster's contributors have their expertise. Once methods are formalized and robust, we will expand to other domains.
  • Who are ancillary constituents and what do they need? - Ancillary constituents are machines that need detailed metadata for each software library function with annotation to ontologies.
  • Prioritize this list, constrain scope to the current priority, and add future scope to a project road map. - Humans first, then machines.
  • What science domain(s) are we serving? - Environmental and ecological first, then others.
  • What level of software maturity does the IMCR contain? - Mostly production-ready software packages, but pre-production packages and code snippets are welcome. Our curation activities are focusing on production and pre-production software packages first, then on code snippets.
  • What do the registered software items represent (e.g. single scripts, packages, both)? - Both.
  • Explicitly communicate this in a project scope. - Will do.

Improve software metadata

... so IMCR is useful to platforms like the Data Discovery Studio

  • What level of detail needs to be added? - Function level inputs and outputs.
  • How can we perform this task? - This may differ by implementation language and community of practice. This is nearly impossible to access programmatically for the R language. Perhaps this is simpler for others?
  • How much effort will it require? - A lot.
  • What priority does this task have? - Low. See prioritized list of action items below.
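
For Python libraries, function-level inputs and outputs can be read with the standard library's inspect module. The following is a minimal sketch under stated assumptions, not an IMCR tool: it assumes the functions carry type annotations, and the names describe_function and read_table are hypothetical, introduced here only for illustration.

```python
import inspect
from typing import Any, Callable, Dict


def describe_function(func: Callable[..., Any]) -> Dict[str, Any]:
    """Collect parameter names, annotated types, and the annotated return type."""
    sig = inspect.signature(func)
    inputs = []
    for name, param in sig.parameters.items():
        ann = param.annotation
        inputs.append({
            "name": name,
            # None when the parameter has no annotation
            "type": None if ann is inspect.Parameter.empty
                    else getattr(ann, "__name__", str(ann)),
            # Treated as required when no default value is declared
            "required": param.default is inspect.Parameter.empty,
        })
    ret = sig.return_annotation
    return {
        "function": func.__name__,
        "inputs": inputs,
        "output": None if ret is inspect.Signature.empty
                  else getattr(ret, "__name__", str(ret)),
    }


def read_table(path: str, header: bool = True) -> list:
    """Hypothetical library function used only to demonstrate the sketch."""
    return []


if __name__ == "__main__":
    print(describe_function(read_table))
```

A record like this covers only what the signature declares. Unannotated functions yield None for types, so manual curation would still be needed for them.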

Reconsider controlled vocabulary structure and terms

The current controlled vocabulary is a customized construct designed to serve the non-expert data manager, at the cost of excluding others. Aligning the IMCR vocabulary with established vocabularies (e.g. SWEET, GCMD) would enable machine actionability and inclusion in platforms like the Data Discovery Studio.

  • Do we want to continue with the custom vocabulary or align with established vocabularies (can we do both)? - We will keep the current scheme as it is one commonly held in the minds of our primary constituents. We'll map the IMCR terms to other vocabs when possible and consider integration at a later time.
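
Keeping the IMCR scheme while mapping its terms to other vocabularies amounts to maintaining a crosswalk. The sketch below shows one possible shape for such a crosswalk. It is an assumption for illustration only: the term labels are placeholders rather than actual IMCR, SWO, or GCMD terms, and the names CROSSWALK, external_terms, and unmapped_terms are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Mapping(NamedTuple):
    vocabulary: str  # external vocabulary name, e.g. "SWO" or "GCMD"
    term: str        # label in that vocabulary (placeholder values below)


# Hypothetical crosswalk: IMCR term -> mapped terms in other vocabularies.
# An empty list means no mapping has been found yet.
CROSSWALK: Dict[str, List[Mapping]] = {
    "example IMCR term A": [Mapping("SWO", "placeholder SWO label 1")],
    "example IMCR term B": [
        Mapping("SWO", "placeholder SWO label 2"),
        Mapping("GCMD", "placeholder GCMD label 1"),
    ],
    "example IMCR term C": [],
}


def external_terms(imcr_term: str, vocabulary: str) -> List[str]:
    """Return the labels an IMCR term maps to in one external vocabulary."""
    return [m.term for m in CROSSWALK.get(imcr_term, [])
            if m.vocabulary == vocabulary]


def unmapped_terms() -> List[str]:
    """List IMCR terms that have no external mapping yet."""
    return [term for term, maps in CROSSWALK.items() if not maps]


if __name__ == "__main__":
    print(external_terms("example IMCR term B", "GCMD"))
    print(unmapped_terms())
```

Keeping the IMCR term as the key leaves the current scheme unchanged for human users, while the mapped terms give machines a route into the established vocabularies.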

Searching and sorting the web for IM software

This is being done by others, whom we can collaborate with. - We'll check in with Ilya when this prioritized item comes up.

  • Where does this fit into project priorities? - Low. Such an effort will require substantial work from current cluster participants, for whom such an activity is outside their current professional responsibilities. Additionally, this would dramatically change the complexion and direction of this cluster's current scope and priorities. However, such a resource would be very useful.
  • How much effort will this require? - This will likely require a fact-finding expedition. Who will lead it? - A lot. We'll delegate a fact finder when the priority arises.

Action Items & Notes (listed in order of priority)

Refine scope

Done. Should we link from the home page (for primary user group) to technical information (for secondary user group)?

Wrap IMCR in a simple but attractive website in preparation for production release

Done, website is here.

Update the IMCR vocabulary

Done.

Consider aligning or mapping to the Software Ontology, SWEET, and GCMD

  • SWEET - The SWEET Ontology may be useful for finer concept granularity, which is currently of low priority. Will revisit this later.
  • SWO - The Software Ontology has terms useful to both immediate and low priorities. Immediate focus is on aligning "Operation", "Process", and "Data management" classes of the IMCR with SWO. Alignment notes are being kept in this document. SWO alignment is complete.
  • GCMD - Several concepts within NASA's Global Change Master Directory "Science Keywords" class are relevant to IMCR. Alignment with GCMD is complete.
  • What vocabulary is DataONE using for their Software Tools Catalog? Can we align with it? A request for this information has been placed.
  • The Kepler Project was contacted for information on the ontology used to categorize resources within that software workflow system, but the Kepler Project doesn't appear to have a maintainer, and inquiries sent via the project's contact interface fail.

Other SWO classes to be used in IMCR:

  • Use "programming language" for OntoSoft "Implementation language" fields. This has been added to IMCR best practices.
  • Use "Data/Format" for OntoSoft "File format" fields. This has been added to IMCR best practices.

Maintenance of the vocabulary structure and terms is conducted in this Google Sheet. You are welcome to suggest new terms and comment on existing ones.

Identify domain vocabularies for use with IMCR software

Done. The LTER Controlled Vocabulary "disciplines" category has been added to IMCR best practices to keyword science domains.

Toolkit for metadata maintenance

In progress. See the toolkit.

Back-up IMCR to GitHub

Done. A GitHub repository has been created for backing up IMCR software metadata. The toolkit function "backup()" facilitates this task.

Revisit tagging of existing libraries

Done.

Finish curation of Python libraries

Done.

Create monthly digests (news)

In progress. Monthly news relevant to IMCR users will be posted on the website on the first Monday of each month and linked in the telecon listserv email sent out the same day. Maintainer news and ongoing activities will continue to be recorded in the ESIP Wiki. The first news posting will go out with the production release of the IMCR.
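
If the posting schedule is ever automated, the first Monday of a month can be computed with the standard library alone. This is a minimal sketch. The function name first_monday is hypothetical and is not part of the IMCR toolkit.

```python
import datetime


def first_monday(year: int, month: int) -> datetime.date:
    """Return the date of the first Monday in the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday() numbers Monday as 0 and Sunday as 6, so this is the
    # number of days from the 1st forward to the nearest Monday (0 if the
    # 1st is itself a Monday).
    offset = (7 - first.weekday()) % 7
    return first + datetime.timedelta(days=offset)


if __name__ == "__main__":
    print(first_monday(2019, 8))
```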

Production release and advertising

In progress. Places to advertise upon production release:

  • EDI Newsletter
  • EDI Website
  • EDI Twitter
  • LTER Network Office Newsletter
  • LTER Network Office Twitter
  • DataONE resources
  • DataONE Twitter
  • ESIP Newsletter
  • ESIP Webinar

Focus curation on code snippets

Done. Code snippets can be registered as a repository (e.g. GitHub).

Develop best practices for curating code snippets

Done. Code snippets can be registered as a repository (e.g. GitHub).

When & how to expand IMCR scope

Done. The scope (i.e. science domain and level of metadata detail) may be expanded once the current implementation has matured and as technologies allow. Additional science domains may be integrated within the current IMCR Portal or as separate IMCR Science Domain Portals. Requests to register software at the function level, with all the corresponding metadata needed to make it machine actionable, would currently require substantial manual effort, but this will not be an issue once the relevant parsers and algorithms are available.