Difference between revisions of "Documentation Cluster Minutes 2016-06-27"

From Earth Science Information Partners (ESIP)

Revision as of 12:54, June 27, 2016

Attendees

Anna Milan, John Kozimor, Lindsay Powers, Sean Gordon, Tyler Stevens, Annie Burgess

Agenda

Presentation: Materials Science Data Management Initiatives at NIST by Bob Hanisch

  • Office of Data and Informatics
    • Standard reference data - undertaking modernization of apps and interfaces; all the metadata goes to data.gov; some of the reference data is available for a fee
    • Research data - NIST data portal
    • Data Science - informatics and analytics
    • Community - Research Data Alliance (RDA), work with Network of National Metrology Institutes and BIPM
  • Key ODI activities
    • 2 years ago the practice was haphazard; now trying to improve the data infrastructure
    • Materials Genome Initiative is a major stakeholder
  • Goals
    • FAIR principles: Findable, Accessible, Interoperable, Reusable
  • Find
  • acceleratornetwork.org/mse-challenge
  • would like to go to a service to find "who has data on X or Y"
  • Example from Astronomy: VAO
  • Materials Resource Registry (MRR)
    • "not going to take over the world" with federated system
    • using OAI-PMH
    • challenge is defining metadata fields and terminology
    • new international WG with RDA to define new metadata schema that is appropriate
    • demo of keyword search with facets
    • draft of metadata terms
  • Federated Architecture
  • will be at the next RDA plenary in Denver
  • MGI Code Catalog
    • will integrate 50-60 entries into the MRR
    • Metadata Schema that is used to describe the software - coding language, documentation,
  • Standard Reference Data (SRD)
    • 1968 act
    • copyright
    • cost recovery
    • "UI Anarchy"
    • Socrata is a nice platform for displaying tabular data - APIs will help streamline
  • materialsdata.nist.gov
    • DSpace - communities within DSpace can be public or private (most are private)
    • 20 communities currently
  • Work closely with National Data Service (NDS)
    • NDS Labs environment allows sharing --- Docker Containers...
  • NDS Materials Data Facility
  • provides capability to link data to analysis
  • NIST is strict about what can be deployed by NIST and shared
  • basic metadata capability - trying to ensure that it's interoperable with their other metadata
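
The minutes note that the Materials Resource Registry federates using OAI-PMH. As a rough illustration of what harvesting over that protocol looks like, the sketch below builds a ListRecords request and parses a response. The base URL, the sample XML, and the function names are assumptions made for this example; they are not taken from the talk or from the MRR itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords URL.

    When a resumption token is supplied it is sent on its own, because the
    protocol treats resumptionToken as an exclusive argument.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Hypothetical response body, trimmed to the parts the parser reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical alloy dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        ident = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = rec.findtext(f".//{{{DC_NS}}}title")
        records.append((ident, title))
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    return records, (token or None)


if __name__ == "__main__":
    print(list_records_url("https://registry.example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A harvester in a federated setup would call list_records_url repeatedly, passing back each resumption token until none is returned; the network fetch is left out here so the sketch runs offline.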

Interoperate

  • Materials Data Curation System (MDCS)
    • Python, MongoDB, SPARQL, XML Schema
    • documenting actual DATA - not just collections
    • HUGE challenges: stored in 140 different formats, no common schemas, proprietary in nature (e.g. vendor-specific)
    • curator is breaking down these barriers
    • 3 steps to curate
    • Can create own templates, but try to encourage re-use of existing
    • Nothing in it is specific to materials; it can be used to describe any research domain
    • 3 steps to export
    • REST API - supports automated capture
  • some things to think about
    • Quality metadata is KEY
      • metadata curation is non-trivial, can be costly
    • "whatever you do, you can always do more"
    • Important to address Interoperability at the proper scale
      • too wide vs too narrow - important to cast the net at the appropriate scale
  • will often start with Dublin Core (DC) or DataCite and then add just enough to support domain-specific needs. If the schema gets too detailed, no one takes the time to develop content.
    • Standards require community participation to assure take-up -- national, international..
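
The advice above about starting from a general schema and adding only a small domain-specific layer can be sketched roughly as follows. The extension namespace, the field names, and the set of required core fields are hypothetical choices for illustration, not the schema discussed in the presentation; only the Dublin Core element namespace is a real, published identifier.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core elements namespace
EXT_NS = "https://example.org/materials-ext"  # hypothetical domain extension

# Assumed minimal core for this sketch; a real profile would choose its own.
REQUIRED_CORE = ("title", "creator", "identifier")


def missing_core_fields(core):
    """List required core fields that are absent or empty."""
    return [name for name in REQUIRED_CORE if not core.get(name)]


def build_record(core, extension):
    """Combine general (Dublin Core) fields with a thin domain-specific layer."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("mat", EXT_NS)
    root = ET.Element("record")
    for name, value in core.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    for name, value in extension.items():
        ET.SubElement(root, f"{{{EXT_NS}}}{name}").text = value
    return root


if __name__ == "__main__":
    core = {"title": "Example dataset", "creator": "A. Researcher", "identifier": "id-1"}
    record = build_record(core, {"crystalStructure": "fcc"})
    print(ET.tostring(record, encoding="unicode"))
    print(missing_core_fields({"title": "Example dataset"}))
```

Keeping the domain layer in its own namespace means generic harvesters can still read the core fields and ignore the rest, which is one way to keep the schema from becoming too detailed for contributors to fill in.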

Q&A

  • LP: HDF has a collaborative forum with RDA on data formats. Finally trying to get a handle on who the HDF community is. Do any of these communities use HDF?
    • BH: not aware of any HDF use. Trying to get requirements for standardization, transparent, open-based format of instrument...?
  • LP: are you hosting the code in the code catalog, or just encouraging publication and pointing to wherever it is hosted?
    • BH: the latter, unless we have developed the code. No code validation.
  • AM: what is the software metadata schema?
    • BH: was kind of ad hoc and started before he came, developed in house, but aware of Force 11 efforts.