Releasing a data set
Data Release
Summary - Datasets that have been generated but not yet interpreted may still have value when released as publicly available data
Objective/context - Publish data via the web, with review. The goal is to capture the process: which automated and human steps are necessary?
Actors - Vary by step in the sequence. Generalized to:
- Author/Producers
- Organizational Approval (e.g., supervisor, different layers of management, org policies)
- Peer review? (some way to have multiple eyes on data before release)
- Release, archival, curation
Sequence of Events (Note: this does not capture the entire Data Life Cycle, only the portion related to Data Release)
- Create data product
- Register product
- Process review
- Metadata review
- Data review
- Revision of metadata/reconciliation of peer reviews/data reviews
- Submit product
- Process approval
- Approved at Science Center level
- Assign URL
- Metadata registration with clearinghouse
- DOI process
- Data product and metadata are preserved
- Disseminate product
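The sequence above can be sketched as a simple ordered checklist. This is only an illustrative sketch: the step names are copied from the list, but treating them as strictly linear (with the review sub-steps flattened) is an assumption, and the `DataRelease` class, its methods, and the `actor` labels are hypothetical names, not part of any USGS system.

```python
from dataclasses import dataclass, field

# Step names copied from the sequence above. Treating them as one strictly
# linear checklist is an assumption made for this sketch.
RELEASE_STEPS = [
    "Create data product",
    "Register product",
    "Process review",
    "Metadata review",
    "Data review",
    "Revision of metadata/reconciliation of peer reviews/data reviews",
    "Submit product",
    "Process approval",
    "Approved at Science Center level",
    "Assign URL",
    "Metadata registration with clearinghouse",
    "DOI process",
    "Data product and metadata are preserved",
    "Disseminate product",
]


@dataclass
class DataRelease:
    """Hypothetical tracker for one data product moving through release."""
    title: str
    completed: list = field(default_factory=list)  # (step, actor) pairs

    def next_step(self):
        """Return the next pending step, or None when all steps are done."""
        if len(self.completed) < len(RELEASE_STEPS):
            return RELEASE_STEPS[len(self.completed)]
        return None

    def complete(self, step, actor):
        """Record a step as done by an actor; steps must be taken in order."""
        expected = self.next_step()
        if expected is None:
            raise ValueError("release is already finished")
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append((step, actor))

    @property
    def released(self):
        """True once every step, through dissemination, has been recorded."""
        return len(self.completed) == len(RELEASE_STEPS)
```

Recording who completed each step keeps the human actors (author, reviewer, approver) visible alongside any automated steps, which is the distinction this use case is trying to capture.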
PCCS artifacts
Notes
USGS has an open community of practice called the Community for Data Integration (CDI), which holds regular WebEx meetings open to all: https://my.usgs.gov/confluence/display/cdi/Data+Release+Use+Case+Team
The focus has been on policies, as USGS is updating its policy manual. Data release policies are very outdated, and data management is becoming more prominent.
DRUC group (Data Release Use Case) - Two scenarios:
- Formal release through publication, https://my.usgs.gov/confluence/download/attachments/294650012/PrimaryUseCase10232012.pdf?version=1&modificationDate=1354733538313&api=v2
- Less formal release, published via the web: data only, no interpretation, approved by the science director (lower-level review with fewer actors involved, but still very robust) https://my.usgs.gov/confluence/download/attachments/369688679/WebReleaseUseCaseDiagram-Master.pdf?version=1&modificationDate=1370277101669&api=v2