Difference between revisions of "Linked Open Research Data for Earth and Space Science Informatics"

From Earth Science Information Partners (ESIP)

Revision as of 09:55, July 25, 2011

Linked Open Research Data for Earth and Space Science Informatics

Tom Narock and Eric Rozell

Abstract: Earth and Space Science Informatics (ESSI) is inherently multidisciplinary and requires close collaboration between scientists and information technologists. Identifying potential collaborators can be difficult, especially given the rapidly changing landscape of technologies and informatics projects. Being able to discover the technical competencies of other researchers in the community can make such collaborations easier to find. Beyond collaboration discovery, social network information can be used to analyze trends in the field, helping project managers distinguish irrelevant, well-established, and emerging technologies and specifications. This information helps keep projects focused on the technologies and standards that are actually in use, which makes them more useful to the ESSI community.
We address this problem with a two-part solution: a pipeline that generates structured data from AGU-ESSI abstracts and ESIP member information, and an API and Web application for accessing the generated data. We use a Natural Language Processing technique, Named Entity Disambiguation, to extract information about researchers, their affiliations, and the technologies they have applied in their research. We encode the extracted data in the Resource Description Framework (RDF), using Linked Data vocabularies including the Semantic Web for Research Communities (SWRC) ontology and the Friend-of-a-Friend (FOAF) ontology. Lastly, we expose this data in three ways: through a SPARQL endpoint, through Java and PHP APIs, and through a Web application. Our implementations are open source, and we expect the pipeline and APIs to evolve with the community.
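
As a rough illustration of the RDF encoding step described in the abstract, the sketch below writes one researcher as N-Triples using FOAF and SWRC terms. It is not the project's actual pipeline code. The example.org IRIs, the function names, and the use of foaf:topic_interest to link a person to a technology are all assumptions made for illustration.

```python
# Namespaces for the two vocabularies named in the abstract.
FOAF = "http://xmlns.com/foaf/0.1/"
SWRC = "http://swrc.ontoware.org/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def researcher_triples(person_iri, name, affiliation_iri, technology_iris):
    """Return N-Triples lines describing one researcher (hypothetical helper)."""
    rows = [
        (iri(person_iri), iri(RDF_TYPE), iri(FOAF + "Person")),
        (iri(person_iri), iri(FOAF + "name"), literal(name)),
        (iri(person_iri), iri(SWRC + "affiliation"), iri(affiliation_iri)),
    ]
    for tech in technology_iris:
        # Assumption: foaf:topic_interest stands in for "has applied this
        # technology"; the real data may use a different property.
        rows.append((iri(person_iri), iri(FOAF + "topic_interest"), iri(tech)))
    return [" ".join(row) + " ." for row in rows]


# Made-up example data; none of these IRIs come from the project.
triples = researcher_triples(
    "http://example.org/person/jane-doe",
    "Jane Doe",
    "http://example.org/org/example-university",
    ["http://dbpedia.org/resource/SPARQL"],
)
print("\n".join(triples))
```

Output in this form can be loaded into most triple stores, which is one plausible way the data could end up behind a SPARQL endpoint.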

Useful Tools

DBpedia Spotlight request parameters:

  • text - the text you want annotated
  • confidence - the minimum confidence score a candidate term must reach to be annotated
  • support - the minimum number of inlinks a Wikipedia page must have before it is used for annotation

DBpedia Spotlight Example: http://goo.gl/2q7Nl
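
The three parameters above can be combined into a single annotation request. The sketch below only builds the request URL and makes no network call. The endpoint address, the default values, and the 0-to-1 range check on confidence are assumptions; check the current DBpedia Spotlight documentation before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: illustrative endpoint; the service address may differ or change.
SPOTLIGHT_ANNOTATE_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_annotate_url(text, confidence=0.5, support=20,
                       base_url=SPOTLIGHT_ANNOTATE_URL):
    """Build an annotation request URL from text, confidence, and support."""
    # Assumption: confidence is a score between 0 and 1.
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if support < 0:
        raise ValueError("support must be non-negative")
    # urlencode percent-encodes the text so it is safe inside a query string.
    query = urlencode({"text": text, "confidence": confidence, "support": support})
    return base_url + "?" + query


url = build_annotate_url(
    "We expose the data through a SPARQL endpoint.",
    confidence=0.4,
    support=10,
)
print(url)
# Parse the query string back to confirm which parameters would be sent.
print(parse_qs(urlsplit(url).query))
```

Raising confidence or support makes annotation stricter, so fewer terms are linked; lowering them links more terms at the cost of more false matches.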

OpenCalais: http://www.opencalais.com/GetStarted

Ideas

  • ESSI keywords were introduced in Fall 2009. We could set up a web form that lets authors go back and annotate older abstracts with ESSI keywords.
  • Chris Lynnes suggested measuring ESIP's impact; we still need to work out how to do this with our data.