Difference between revisions of "Preservation Use Case Choosing a dataset"
Revision as of 07:52, July 9, 2013
Choosing a data set from multiple similar choices.
Summary
A research user needs to pick the data set from multiple similar data sets that best meets the user’s requirements for their intended application. An example could be a polar bear ecologist choosing a data set on sea ice conditions in a region of Hudson Bay from the multiple data sets listed at NSIDC. Another example could be a user choosing which sea surface temperature data set from PO.DAAC to use in forcing a model of an algal bloom. Many other examples exist. Traditionally, the user made this choice by consulting a relevant expert. Ideally, one could conceive of an expert system that guides the user through their query, provided the system had access to sufficient information.
Relevant experts may include
- science domain experts that know the science applications.
- instrument experts that know the subtleties of the observation mechanism.
- algorithm experts that know the variations in retrievals.
- process experts that know the subtleties of the processing implementation.
- data format experts that know how to handle, for example, HDF4 vs. HDF5.
How best to capture the "gotchas" potentially introduced at each step along the way?
Suitability of data usage
- Mapping observations (e.g. variables) to appropriate science focus areas.
Actors
- Research user
- Data expert(s)/Expert system
- Archive
Sequence of Events
- User poses initial request to expert
- Expert queries user on specifics
- Iteration between user and expert to understand vocabularies and actual needs
- Initial possible data sets are identified by basic criteria like whether the data set covers the right time and location
- The list is further refined by more qualitative criteria specific to the actual query
- A recommended data set or ranked list of data sets is returned to the user
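The filtering and ranking steps above can be sketched in code. The following Python example is only an illustration under stated assumptions: the `Dataset` fields, the `covers` and `recommend` helpers, the catalog entries, and the single `quality_score` standing in for the qualitative criteria are all hypothetical and are not part of this use case. A real expert system would draw on much richer information.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Dataset:
    # All fields are hypothetical; a real archive would supply richer metadata.
    name: str
    start: date
    end: date
    bbox: tuple           # (west, south, east, north) in degrees
    quality_score: float  # stand-in for qualitative criteria, 0.0 to 1.0


def covers(ds, start, end, point):
    """Basic criteria: does the data set span the period and contain the location?"""
    lon, lat = point
    west, south, east, north = ds.bbox
    in_time = ds.start <= start and ds.end >= end
    in_space = west <= lon <= east and south <= lat <= north
    return in_time and in_space


def recommend(datasets, start, end, point):
    """Keep data sets that pass the basic criteria, then rank them best first."""
    candidates = [ds for ds in datasets if covers(ds, start, end, point)]
    return sorted(candidates, key=lambda ds: ds.quality_score, reverse=True)


# Made-up catalog entries, for illustration only.
catalog = [
    Dataset("ice_a", date(1979, 1, 1), date(2012, 12, 31), (-180, 30, 180, 90), 0.6),
    Dataset("ice_b", date(2002, 1, 1), date(2012, 12, 31), (-180, 30, 180, 90), 0.9),
    Dataset("ice_c", date(1979, 1, 1), date(2012, 12, 31), (0, 30, 90, 90), 0.8),
]

# A point roughly within Hudson Bay, given as (longitude, latitude).
ranked = recommend(catalog, date(2005, 1, 1), date(2010, 12, 31), (-85.0, 60.0))
for ds in ranked:
    print(ds.name, ds.quality_score)
```

In this sketch the coverage test plays the role of the basic criteria and the sort plays the role of the qualitative refinement. In practice the qualitative step would likely depend on the specific query rather than on a single fixed score.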
PCCS Artifacts
- Data usage information
- Informal feedback from users (e.g., Amazon-style comments)
- Publications about the data
- Publications that use the data
- Data “peer review” information. This is ill defined, but could include
- Audit information about the practices and processes used to produce and maintain the data
- Advice from scientific advisory groups, etc.
- ….
- Authority or certification information
- Who is the authority (if there is one) that is asserting that the data meet certain quality criteria (e.g., National Weather Service)?
- Criteria used in the certification