Documentation Cluster Minutes 2016-06-27
From Earth Science Information Partners (ESIP)
Attendees
Anna Milan, John Kozimor, Lindsay Powers, Sean Gordon, Tyler Stevens, Annie Burgess, Aaron Sweeney, Paul Lemieux
Agenda
Presentation: Materials Science Data Management Initiatives at NIST by Bob Hanisch
- Office of Data and Informatics (ODI)
- Standard Reference Data: modernizing apps and interfaces; all metadata goes to data.gov; some of the reference data is provided for a fee
- Research data - NIST data portal
- Data Science - informatics and analytics
- Community: Research Data Alliance (RDA); works with the Network of National Metrology Institutes and BIPM
- Key ODI activities
- Two years ago, data management practice was haphazard; now trying to improve the data infrastructure
- Materials Genome Initiative is a major stakeholder
- Goals
- FAIR principles: Findable, Accessible, Interoperable, Reusable
- Find
- acceleratornetwork.org/mse-challenge
- would like to be able to go to a service to find "who has data on X or Y"
- Example from astronomy: the Virtual Astronomical Observatory (VAO)
- Materials Resource Registry (MRR)
- federated system; "not going to take over the world"
- using OAI-PMH
- challenge is defining metadata fields and terminology
- new international working group (WG) with RDA to define an appropriate metadata schema
- demo of keyword search with faceted filtering
- draft of metadata terms
- Federated Architecture
- will be at the next RDA plenary in Denver
- MGI Code Catalog
- will integrate 50-60 entries into the MRR
- metadata schema used to describe the software: coding language, documentation, ...
- Standard Reference Data (SRD)
- Standard Reference Data Act of 1968
- copyright
- cost recovery
- "UI Anarchy"
- Socrata is a nice platform for displaying tabular data; APIs will help streamline
- materialsdata.nist.gov
- DSpace: communities within DSpace can be public or private (most are private)
- 20 communities currently
- Works closely with the National Data Service (NDS)
- NDS Labs environment allows sharing (Docker containers, ...)
- NDS Materials Data Facility
- provides capability to link data to analysis
- NIST is strict about what can be deployed by NIST and shared
- basic metadata capability - trying to ensure that it's interoperable with their other metadata
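The MRR notes above mention that the registry uses OAI-PMH. As a rough illustration of how a harvester pages through such a registry, here is a minimal Python sketch, assuming a generic OAI-PMH 2.0 endpoint that exposes Dublin Core (`oai_dc`) records. The base URL, identifiers, and sample response are hypothetical and are not taken from the MRR; only the `ListRecords` verb, the `metadataPrefix`/`resumptionToken` arguments, and the namespaces come from the OAI-PMH 2.0 and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: a follow-up
    request carries only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], next resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext("oai:metadata/oai_dc:dc/dc:title", default="", namespaces=NS)
        records.append((ident, title))
    # An empty or absent resumptionToken means this was the last page.
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# Hypothetical one-record response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:rec-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://registry.example.org/oai"))
    print(parse_list_records(SAMPLE))
```

A real harvester would fetch each URL over HTTP and loop, passing the returned token back into `list_records_url` until it comes back as `None`; the sketch keeps the network call out so the request and parsing logic can be read on their own.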
Interoperate
- Materials Data Curation System (MDCS)
- Python, MongoDB, SPARQL, XML Schema
- documenting actual DATA - not just collections
- HUGE challenges: data stored in 140 different formats, no common schemas, proprietary in nature (e.g., vendor-specific)
- curator is breaking down these barriers
- 3 steps to curate
- Users can create their own templates, but reuse of existing templates is encouraged
- Nothing in it is specific to materials; it can be used to describe any research domain
- 3 steps to export
- REST API - supports automated capture
- some things to think about
- Quality metadata is KEY
- metadata curation is non-trivial, can be costly
- "whatever you do, you can always do more"
- Important to address interoperability at the proper scale
- too wide vs too narrow - important to cast the net at the appropriate scale
- Quality metadata is KEY
- will often start with Dublin Core (DC) or DataCite and then add just enough to support domain-specific needs. If the schema gets too detailed, no one takes the time to develop content.
- Standards require community participation (national and international) to ensure uptake
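The advice to start from DC or DataCite and add only enough domain-specific detail can be pictured as a layered record: a generic core that every repository understands, plus a short domain extension. A minimal Python sketch, assuming the six mandatory properties of the DataCite Metadata Schema 4.x as the core; the materials extension field names and sample values are hypothetical and are not the MDCS or MRR schema.

```python
# DataCite Metadata Schema 4.x mandatory properties (the generic core).
DATACITE_MANDATORY = (
    "identifier", "creator", "title", "publisher", "publicationYear", "resourceType",
)

# Hypothetical materials-domain extension, kept deliberately short:
# an overly detailed schema discourages people from filling it in.
MATERIALS_EXTENSION = ("material", "synthesisMethod", "measurementTechnique")


def build_record(core, extension=None):
    """Combine a generic core record with optional domain-specific fields.

    Raises ValueError if a mandatory core property is missing or empty,
    or if an extension field is outside the agreed extension vocabulary.
    """
    missing = [k for k in DATACITE_MANDATORY if not core.get(k)]
    if missing:
        raise ValueError(f"missing core properties: {missing}")
    extension = extension or {}
    unknown = [k for k in extension if k not in MATERIALS_EXTENSION]
    if unknown:
        raise ValueError(f"unknown extension fields: {unknown}")
    # Keep the layers separate so generic tools can read just the core.
    return {"core": dict(core), "domain": dict(extension)}


if __name__ == "__main__":
    # Illustrative values only.
    record = build_record(
        {
            "identifier": "10.0000/example",
            "creator": "Example Lab",
            "title": "Example tensile test data",
            "publisher": "Example Repository",
            "publicationYear": 2016,
            "resourceType": "Dataset",
        },
        {"material": "Ti-6Al-4V"},
    )
    print(record)
```

Keeping the domain fields in a separate layer means a generic harvester can use the core alone, while the extension vocabulary stays small enough that curators will actually fill it in.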
Q&A
- LP: HDF has a collaborative forum with RDA on data formats. Finally trying to get a handle on who the HDF community is. Do any of these communities use HDF?
- BH: not aware of any HDF use. Trying to gather requirements for a standardized, transparent, open format for instrument ...?
- LP: are you hosting the code in the code catalog, or do you just encourage publication and point to wherever it is hosted?
- BH: the latter, unless we have developed the code. No code validation.
- AM: what is the software metadata schema?
- BH: it is somewhat ad hoc; it was started before he arrived and developed in-house, but he is aware of FORCE11 efforts.
- BH: there are a lot of technical solutions, but it is challenging to get community buy-in. NIST is a PI-oriented organization.
- AM: what has and hasn't worked for community buy-in?
- BH: uses pilot teams to work through the process and document it; his team sees itself as a solutions broker. Don't try to solve all problems at the same time: "one success story at a time". Top-down mandates create pushback. Six months after he arrived, he was asked to work with IT and got $7 million; ODI is now working with NIST IT to build out a modern data management infrastructure that supports the Data Management Lifecycle. This has been a huge cultural challenge, since IT hasn't had the concept of real data management.
- LP: what is the difference between the Materials Data Facility and the materials data repository?
- BH: (didn't capture answer)