Data Management Course Outline

From Earth Science Information Partners (ESIP)

Revision as of 11:10, August 4, 2011

NOTE: We agreed that the target audience would initially be scientists.

For Scientists

The case for data stewardship

  • Agency requirements
    • NSF data management plan
    • NASA science data policy
    • NOAA Administrative Order 212-15, Management of Environmental and Geospatial Data and Information
  • Return on Investment
    • Return on your investment
    • Expanding the audience for your data
    • Return on public investments
  • Verifiable science
    • Tying your data to standards, metrics, and benchmarks
  • Facilitating science through interoperable discovery and access
  • Enhancing your reputation
  • Preserving the scientific record
    • Establishing relationships with archives
    • Preserving a record of environmental change
    • Other case studies?
  • What not to do when archiving data!

Data Management plans

  • Why do a data management plan?
  • Elements of a plan
    • Identify materials to be created
    • Identify your audience(s)
    • Data organization
    • Roles and responsibilities
    • Describing and documenting your data, including metadata
    • Standards used
    • Data access, sharing, and re-use policies
    • Backups, archives, and preservation strategy
    • QUESTION: Should the plan define one or more objective metrics so that implementation and compliance can be measured?
  • Estimating effort and resources required
    • Hardware and software capabilities required
    • Personnel resources and skills needed
  • Some available resources to help with developing your plan

Local Data Management

  • Managing your data
    • Data identifiers and locators - Jeff Arnfield/NCDC
    • File naming conventions (Cook)
    • Backing up your data (Cook)
    • Developing a citation for your data (Cook)
    • Recording provenance and context - Jeff Arnfield/NCDC
    • Tracking and describing changes to the data
    • QUESTIONS
      • Citation, provenance and context are also documentation/metadata activities. Should they be grouped there instead?
  • Data Formats
    • Building understandable spreadsheets - Jeff Arnfield/NCDC
    • Using self-describing data formats
    • Choosing and adopting community-accepted standards
    • Avoiding proprietary formats
  • Creating metadata
    • For your collections as a whole
    • Creating item-level metadata
    • Metadata for discovery - Tyler Stevens/GCMD
    • Metadata for access and use - Jeff Arnfield/NCDC
    • Metadata for archiving - Jeff Arnfield/NCDC
    • Metadata for tracking data processing
    • Publishing metadata to GCMD - Tyler Stevens/GCMD
    • Publishing metadata to ECHO
    • QUESTIONS
      • This section seems to overlap with "Preservation Strategies.Metadata"... should these sections be combined, or clearly differentiated?
      • Is "documentation" a friendlier, and more inclusive, term?
      • The "publishing" items are most closely related to advertising and accessing data; should they be moved there?
  • Working with your archive organization
    • Broadening your user community
    • Planning for longer term preservation - Jeff Arnfield/NCDC
  • Providing access to your data
    • Evaluating who your audience is
    • Who gets to access your data
      • Agency best practices & policies
    • Access mechanisms
    • Advertising your data (i.e., data casting)
    • Tracking data usage
    • Handling sensitive data
    • Rights
    • QUESTIONS
      • Should "advertising your data" and "providing access" be separate sections or subsections?
      • Need to address portals and registries beyond GCMD & ECHO. Some agencies have specific requirements for publishing metadata.
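The outline lists file naming conventions as a topic without prescribing one. As an illustration only, here is a minimal Python sketch of one possible convention, <project>_<site>_<variable>_<YYYYMMDD>_v<NN>.<ext>. The pattern and the helper names build_filename and parse_filename are hypothetical and do not reflect an ESIP, NCDC, or agency standard. The sketch shows how a convention can be enforced when names are created and checked again when they are read back.

```python
import re
from datetime import date, datetime

# Hypothetical convention: <project>_<site>_<variable>_<YYYYMMDD>_v<NN>.<ext>
# Fields use only lowercase letters, digits and hyphens, so "_" can serve as
# an unambiguous separator and names sort chronologically within a series.
_FIELD = re.compile(r"[a-z0-9-]+")
_EXT = re.compile(r"[a-z0-9]+")
_NAME = re.compile(
    r"(?P<project>[a-z0-9-]+)_(?P<site>[a-z0-9-]+)_(?P<variable>[a-z0-9-]+)"
    r"_(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)"
)


def build_filename(project, site, variable, obs_date, version, ext):
    """Assemble a file name, rejecting fields that would break the convention."""
    for label, value in (("project", project), ("site", site), ("variable", variable)):
        if not _FIELD.fullmatch(value):
            raise ValueError(f"{label} {value!r} may contain only a-z, 0-9 and '-'")
    if not _EXT.fullmatch(ext):
        raise ValueError(f"extension {ext!r} may contain only a-z and 0-9")
    if not 1 <= version <= 99:
        raise ValueError("version must be between 1 and 99")
    return f"{project}_{site}_{variable}_{obs_date:%Y%m%d}_v{version:02d}.{ext}"


def parse_filename(filename):
    """Split a conforming file name back into its fields."""
    match = _NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    fields = match.groupdict()
    fields["date"] = datetime.strptime(fields["date"], "%Y%m%d").date()
    fields["version"] = int(fields["version"])
    return fields
```

For example, build_filename("snowpack", "site-04", "swe", date(2011, 8, 4), 1, "csv") returns "snowpack_site-04_swe_20110804_v01.csv". Putting the date in YYYYMMDD order and the version in a fixed-width field keeps related files in order when they are listed alphabetically.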

Preservation strategies

  • Sponsor (e.g., Agency) or institution requirements
  • Options for archiving your data
    • What archives are out there?
      • Discipline or institutional archives
      • Finding an archive
    • What to do if there is no archive out there
  • What data goes into a long-term archive?
  • What do long-term archives do with my data? - Jeff Arnfield/NCDC
  • Data transfer & submission agreements
    • See "Submission Agreements" section under "For Data Managers"
    • Agency- or archive-specific requirements may vary
  • Intro to the OAIS Reference Model
  • Emerging standards for preservation
  • Metadata

Responsible Data Use

  • Citation and credit
  • Data restrictions
  • Fair use
  • Feedback and metrics
  • Collaboration
  • Community participation

For Data Managers

  • Data Management plan support
  • Collection or acquisition policies
  • Intro to OAIS reference model
  • Initial Assessment and appraisal
    • Identify information to be preserved
      • main features and properties
      • dependencies on information here or elsewhere
    • Identify objects to be received
    • Establish complementary information needs (e.g., format, data descriptions, provenance, reference information, context, fixity information)
      • What complementary information is needed for data useful for climate studies (USGCRP list)
    • Assessing potential designated communities
    • Assessing probable curation duration
    • Assessing data transfer options
    • Defining access paths
    • Assessing costs and feasibility
    • Metadata, metadata standards, and levels of metadata
  • Submission agreements
    • Data integrity
    • Contacts
    • Schedule
    • Operational procedures
    • Error reconciliation
    • Constraints
    • Other aspects necessary for understanding how to support the data
  • Preparing for ingest
  • Ingesting data
    • Validation checks
    • Identifiers
    • Citations
    • Levels of service
  • Periodic re-assessment
  • Curation activities
    • Media migration
    • Format migration
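The outline names fixity information and validation checks without describing them. One common approach, shown here only as an illustration, is to record a SHA-256 checksum for every file when a submission is prepared, then recompute the checksums at ingest and after media or format migration. The manifest layout (a dict mapping each relative path to its hex digest) and the function names are hypothetical. A real submission agreement may specify a different algorithm or manifest format.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative, '/'-separated) to its SHA-256 digest."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(directory, manifest):
    """Return a list of problems; an empty list means every file matched."""
    current = build_manifest(directory)
    problems = []
    for name, expected in manifest.items():
        actual = current.get(name)
        if actual is None:
            problems.append(f"missing: {name}")
        elif actual != expected:
            problems.append(f"checksum mismatch: {name}")
    # Files present now but absent from the submitted manifest.
    for name in sorted(current.keys() - manifest.keys()):
        problems.append(f"unexpected: {name}")
    return problems
```

A data provider would typically run build_manifest before transfer and send the result with the data. The archive would then run verify_manifest on the received copy and repeat the check during periodic re-assessment.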