Difference between revisions of "Linked Open Research Data for Earth and Space Science Informatics"

From Earth Science Information Partners (ESIP)
Line 6: Line 6:
  
 
== Useful Tools ==
 
== Useful Tools ==
DBPedia Spotlight:  
+
[http://spotlight.dbpedia.org DBPedia Spotlight]:  
*text - the text you want annotated
+
* text - the text you want annotated
*confidence - a threshold for terms that are annotated
+
* confidence - a threshold for terms that are annotated
*support - the minimum number of inlinks a Wikipedia page must have for annotation
+
* support - the minimum number of inlinks a Wikipedia page must have for annotation
 
[http://goo.gl/2q7Nl DBPedia Spotlight Example]<br><br>
 
[http://goo.gl/2q7Nl DBPedia Spotlight Example]<br><br>
 
[http://www.opencalais.com/GetStarted OpenCalais]
 
[http://www.opencalais.com/GetStarted OpenCalais]
Line 18: Line 18:
 
*ESIP members list is available from Erin/Carol - FOAF?
 
*ESIP members list is available from Erin/Carol - FOAF?
 
*Peter mentioned uncovering hidden/non-explicit network
 
*Peter mentioned uncovering hidden/non-explicit network
 +
 +
== Evaluation ==
 +
* Option 1: Abstract clustering from disambiguation data
 +
** Summary: Use the disambiguations as sparse feature sets to cluster AGU abstracts.  Perform a purity test based on occurrence of tags within clusters.
 +
** Related Work: [http://www.springerlink.com/content/p278617582u5x3x1/ Document clustering of scientific texts using citation contexts]
 +
* Option 2: Precision and recall using human-annotated abstracts
 +
** Summary: Request annotations from past AGU presenters.  Perform disambiguation and evaluate precision and recall on entities.
 +
** Related Work: [http://i-semantics.tugraz.at/scientific-track/accepted-papers DBpedia Spotlight: Shedding Light on the Web of Documents]
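
The two evaluation options above could be scored roughly as follows. This is a minimal sketch, not the project's evaluation code: it assumes each abstract carries a single tag (for example its ESSI keyword) for the purity test, and that entities are compared as exact identifier matches for precision and recall. The function names `purity` and `precision_recall` are hypothetical.

```python
from collections import Counter


def purity(clusters):
    """Option 1: cluster purity.

    `clusters` is a list of clusters; each cluster is a list holding the
    tag of every abstract assigned to it (assumption: one tag per abstract).
    Purity is the fraction of abstracts that carry the majority tag of
    their own cluster.
    """
    total = sum(len(cluster) for cluster in clusters)
    if total == 0:
        raise ValueError("purity is undefined for an empty clustering")
    majority = sum(
        Counter(cluster).most_common(1)[0][1]  # size of the largest tag group
        for cluster in clusters
        if cluster
    )
    return majority / total


def precision_recall(predicted, gold):
    """Option 2: precision and recall for one abstract.

    `predicted` holds the entities returned by disambiguation and `gold`
    holds the entities supplied by a human annotator (assumption: both are
    collections of comparable identifiers such as DBpedia URIs).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

For instance, `purity([["a", "a", "b"], ["b", "b"]])` counts two majority abstracts in each cluster out of five abstracts overall. With multi-tag abstracts, the purity definition would need to be adapted, for example by counting an abstract as correct when any of its tags matches the cluster's majority tag.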

Revision as of 00:40, July 30, 2011

Linked Open Research Data for Earth and Space Science Informatics

Tom Narock and Eric Rozell

Abstract: Earth and Space Science Informatics (ESSI) is inherently multi-disciplinary, requiring close collaboration between scientists and information technologists. Identifying potential collaborations can be difficult, especially given the rapidly changing landscape of technologies and informatics projects. The ability to discover the technical competencies of other researchers in the community makes potential collaborators easier to find. Beyond collaboration discovery, social network information can be used to analyze trends in the field, helping project managers distinguish irrelevant, well-established, and emerging technologies and specifications. This information will help keep projects focused on the technologies and standards that are actually being used, making them more useful to the ESSI community.

Our solution has two components: a pipeline for generating structured data from AGU-ESSI abstracts and ESIP member information, and an API and Web application for accessing the generated data. We use a Natural Language Processing technique, Named Entity Disambiguation, to extract information about researchers, their affiliations, and the technologies they have applied in their research. We encode the extracted data in the Resource Description Framework (RDF), using Linked Data vocabularies including the Semantic Web for Research Communities (SWRC) ontology and the Friend of a Friend (FOAF) ontology. Lastly, we expose this data in three ways: through a SPARQL endpoint, through Java and PHP APIs, and through a Web application. Our implementations are open source, and we expect that the pipeline and APIs can evolve with the community.

Useful Tools

DBpedia Spotlight request parameters:

  • text - the text you want annotated
  • confidence - the minimum confidence score a term must reach to be annotated
  • support - the minimum number of inlinks a Wikipedia page must have for its term to be annotated
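
As an illustration of how the three parameters fit together, here is a minimal sketch of a Spotlight annotation request using only the Python standard library. The endpoint URL, the default values, the assumed 0 to 1 range for confidence, and the helper names `build_annotate_url` and `annotate` are assumptions made for this example; they are not taken from the notes above and may differ from the service this project used.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: address of a Spotlight "annotate" service. Replace it with
# the deployment you actually use.
DEFAULT_ENDPOINT = "https://api.dbpedia-spotlight.org/en/annotate"


def build_annotate_url(text, confidence=0.4, support=20,
                       endpoint=DEFAULT_ENDPOINT):
    """Build the request URL from the three parameters listed above."""
    # Assumption: confidence is a score between 0 and 1.
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence is expected to lie between 0 and 1")
    if support < 0:
        raise ValueError("support must be a non-negative inlink count")
    query = urlencode({
        "text": text,              # the text you want annotated
        "confidence": confidence,  # threshold for annotated terms
        "support": support,        # minimum number of Wikipedia inlinks
    })
    return f"{endpoint}?{query}"


def annotate(text, confidence=0.4, support=20,
             endpoint=DEFAULT_ENDPOINT, timeout=30):
    """Send the request and return the decoded JSON response.

    Requires network access; the response layout depends on the service.
    """
    request = Request(
        build_annotate_url(text, confidence, support, endpoint),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Raising confidence or support should yield fewer annotations; lowering them yields more annotations, probably with more noise. An AGU abstract could be passed as `text` and the two thresholds tuned against a few hand-checked results.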

DBpedia Spotlight Example

OpenCalais

Ideas

  • ESSI keywords were introduced in Fall 2009. Maybe we should set up a web form to allow authors to go back and annotate older abstracts with ESSI keywords.
  • Chris Lynnes suggested measuring ESIP's impact - need to think of how to do this using our data.
  • ESIP members list is available from Erin/Carol - could it be published as FOAF?
  • Peter mentioned uncovering hidden/non-explicit networks

Evaluation