Documentation Cluster Minutes 2016-06-27

From Earth Science Information Partners (ESIP)


Anna Milan, John Kozimor, Lindsay Powers, Sean Gordon, Tyler Stevens, Annie Burgess, Aaron Sweeney, Paul Lemieux


Presentation: Materials Science Data Management Initiatives at NIST by Bob Hanisch

  • Office of Data and Informatics
    • Standard reference data - undertaking modernization of apps and interfaces; all the metadata goes to (destination not captured); some of the reference data is available for a fee
    • Research data - NIST data portal
    • Data Science - informatics and analytics
    • Community - research data alliance, work with Network of National Metrology Institutes and BIPM
  • Key ODI activities
    • 2 years ago - the practice was haphazard - trying to improve the data infrastructure
    • Materials Genome Initiative is a major stakeholder
  • Goals
    • FAIR principles: Findable, Accessible, Interoperable, Reusable
  • Find
  • would like to go to a service to find "who has data on X or Y"
  • Example from Astronomy: VAO
  • Materials Resource Registry (MRR)
    • "not going to take over the world" with federated system
    • using OAI-PMH
    • challenge is defining metadata fields and terminology
    • new international WG with RDA to define new metadata schema that is appropriate
    • demo of keyword search with facet
    • draft of metadata terms
  • Federated Architecture
  • will be at the next RDA plenary in Denver
  • MGI Code Catalog
    • will integrate 50-60 entries into the MRR
    • Metadata Schema that is used to describe the software - coding language, documentation,
  • Standard Reference Data (SRD)
    • 1968 act
    • copyright
    • cost recovery
    • "UI Anarchy"
    • Socrata is a nice platform for displaying tabular data - APIs will help streamline
    • DSpace - communities within DSpace can be public or private (most are private)
    • 20 communities currently
  • Work closely with National Data Service (NDS)
    • NDS Labs environment allows sharing --- Docker Containers...
  • NDS Materials Data Facility
  • provides capability to link data to analysis
  • NIST is strict about what can be deployed by NIST and shared
  • basic metadata capability - trying to ensure that it's interoperable with their other metadata
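
The registry notes above mention a federated system that uses OAI-PMH. As a rough illustration of what harvesting over that protocol looks like, here is a minimal sketch. The `ListRecords` verb, the `oai_dc` metadata prefix, and `resumptionToken` come from the OAI-PMH 2.0 specification; the endpoint URL and the function names are assumptions made for this example and were not part of the presentation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint, used for illustration only (not a real registry address).
BASE_URL = "https://registry.example.org/oai"

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    When a resumption token is supplied it replaces the other arguments,
    as the protocol requires for follow-up pages.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_titles(xml_text):
    """Return (titles, next_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iter(DC_NS + "title")]
    token_el = next(root.iter(OAI_NS + "resumptionToken"), None)
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token
```

A harvester would fetch `list_records_url(BASE_URL)`, pass the response body to `parse_titles`, and repeat with the returned token until it is `None`.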


  • Materials Data Curation System (MDCS)
    • python, MongoDB, SPARQL, XML schema
    • documenting actual DATA - not just collections
    • HUGE challenges: stored in 140 different formats, no common schemas, proprietary in nature (e.g. vendor-specific)
    • curator is breaking down these barriers
    • 3 steps to curate
    • Can create own templates, but try to encourage re-use of existing
    • Nothing in it is specific to materials; it can be used to describe any research domain
    • 3 steps to export
    • REST API - supports automated capture
  • some things to think about
    • Quality metadata is KEY
      • metadata curation is non-trivial, can be costly
    • "whatever you do, you can always do more"
    • Important to address Interoperability at the proper scale
      • too wide vs too narrow - important to cast the net at the appropriate scale
      • will often start with DC (Dublin Core) or DataCite and then add just enough to support domain-specific needs. If the schema gets too detailed, no one takes the time to develop content.
    • Standards require community participation to assure take-up -- national, international..
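
The point above about starting with DC or DataCite and adding only a little domain detail can be sketched as a small record builder. This is a sketch under stated assumptions, not the schema discussed in the presentation: the core names are taken from the Dublin Core element set, while the choice of which ones are required, the `make_record` helper, and the domain field names in the usage note are hypothetical.

```python
# A subset of the Dublin Core elements, treated here as the required core.
# Which elements are required is an assumption made for this sketch.
CORE_FIELDS = ("title", "creator", "identifier", "date", "description")


def make_record(core, domain=None):
    """Combine required core metadata with an optional domain extension.

    Raises ValueError when a core field is missing or empty, so the core
    stays complete while the domain part is free to stay small.
    """
    missing = [name for name in CORE_FIELDS if not core.get(name)]
    if missing:
        raise ValueError("missing core fields: " + ", ".join(missing))
    return {"core": dict(core), "domain": dict(domain or {})}
```

For example, `make_record(core, {"material_class": "alloy"})` keeps the general-purpose fields separate from a single hypothetical materials field, so the domain part can grow slowly as the community agrees on terms.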


  • LP: HDF has a collaborative forum with RDA on data formats. Finally trying to get a handle on who the HDF community is. Do any of these communities use HDF?
    • BH: not aware of any HDF use. Trying to get requirements for standardization, transparent, open-based format of instrument...?
  • LP: are you hosting the code in the code catalog? Or just encouraging publication and pointing to wherever it is hosted?
    • BH: the latter, unless we have developed the code. No code validation.
  • AM: what is software metadata schema?
    • BH: was kind of ad hoc and started before he came, developed in house, but aware of FORCE11 efforts.
  • BH: there are a lot of technical solutions, but it is challenging to get community buy-in. NIST is a PI-oriented organization.
  • AM: What has and has not worked in getting community buy-in?
    • BH: Use pilot teams to work through the process and document it; they see their team as a solutions broker. Don't try to solve all problems at the same time: "one success story at a time". Top-down mandates create pushback. Six months after he got there, he was asked to work with IT and got $7 million; he is now working with their IT to build out a modern data management infrastructure that supports the data management lifecycle. This has been a huge cultural challenge, since IT hasn't had the concept of real data management.
  • LP: What is the difference between the Materials Data Facility and the materials data repository?
    • BH: (didn't capture answer)