PS Data Quality 2006 04 25
Four Main Quality Dimensions
- instrument accuracy
- platform
- environmental effects
- data processing
Subdimensions of each dimension
- What terms should be used to break each Main Dimension down into a second-tier ten-point scale? (See the sketch after this list.)
- How much will this depend on the context in which the data are collected and used?
- Ease of use by decision-support systems
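To make the second-tier idea concrete, here is a minimal sketch of a two-level breakdown. The subdimension names are placeholders only, drawn loosely from the parentheticals in the March 28 dimension list below; nothing here was agreed at the meeting:

    # Hypothetical two-tier breakdown: each Main Dimension maps to candidate
    # subdimensions, each of which could carry its own ten-point score.
    quality_tiers = {
        "instrument accuracy": ["calibration", "stability", "cross-instrument checks"],
        "platform": ["locational accuracy", "communication accuracy"],
        "environmental effects": ["cloud contamination", "surface contamination"],
        "data processing": ["interpolation accuracy", "algorithm maturity"],
    }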
One simple set of criteria
- science-ready="A"
- commercial-ready="C"
- education-ready="B"
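As one way to make these labels machine-readable, a minimal sketch follows, assuming a hypothetical ReadinessGrade name; this is an illustration, not a decided design:

    from enum import Enum

    class ReadinessGrade(Enum):
        """Hypothetical encoding of the simple readiness criteria above."""
        SCIENCE_READY = "A"
        EDUCATION_READY = "B"
        COMMERCIAL_READY = "C"

    # Example: tag a product as ready for educational use.
    grade = ReadinessGrade.EDUCATION_READY
    print(grade.value)  # -> B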
Use of the Quality Assessments
Is it risk or degree of confidence?
- Related to use in making decisions: the decision maker needs to determine the appropriate level of risk aversion, which relates to the cost to a business of making a "bad" decision
Maybe we can provide usage guides for these assessments
Even a simple five-star system might be useful. That might be all one category of user needs, whereas other research areas might want much more detail.
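As an illustration only, a finer 1-10 score (see the March 28 scale below) could collapse into a five-star display for that simpler category of user. The mapping rule here is an assumption, not something decided at the meeting:

    def to_five_stars(score: int) -> int:
        """Collapse a 1-10 quality score into a 1-5 star display rating."""
        if not 1 <= score <= 10:
            raise ValueError("score must be between 1 and 10")
        # Pair adjacent scores: 1-2 -> 1 star, 3-4 -> 2 stars, ..., 9-10 -> 5 stars.
        return (score + 1) // 2

    print(to_five_stars(7))  # -> 4 stars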
Testing
Under what conditions was it tested? In the lab, over time, at multiple sites and in multiple situations?
Strategy for getting the assessments entered
Simple assessments can come from the providers themselves
- How do we come up with incentives and tools to ensure that providers will (and can) do this?
- Could we use some measure of data quality to determine how aggressively to promote certain products?
Assessments might also come from the users
- Might be related to the number of responses
- We could simply report the numbers of positive and negative comments
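A minimal sketch of that simplest reporting scheme, counting positive and negative comments per dataset; all names here are hypothetical:

    from collections import Counter

    def tally_feedback(comments):
        """Report raw counts of positive and negative user comments.

        `comments` is an iterable of (user, sentiment) pairs, where
        sentiment is the string "positive" or "negative".
        """
        counts = Counter(sentiment for _, sentiment in comments)
        return {"positive": counts["positive"], "negative": counts["negative"]}

    # Example: three user comments on one dataset.
    print(tally_feedback([("u1", "positive"), ("u2", "negative"), ("u3", "positive")]))
    # -> {'positive': 2, 'negative': 1}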
Survey of ESIPs
- We might even pay people to review
Data reliability vs. quality issues
- The Federation might talk only about reliability and leave quality to the provider
Edits to Discussion from March 28, 2006
Objective
Create a common set of data quality metrics across all Federation data products. Data providers can supply measures for their own products, and third parties can provide their own ratings. Quality can refer to accuracy, completeness, and consistency; it is not yet clear how to measure consistency. It is also desirable to provide quality assurance.
We would like to create a 1-10 data quality scale, where:
1 = no accuracy claimed
10 = fully reliable data that has withstood the test of time
This measure can be applied to any of the quality dimensions:
Quality Dimensions
- Sensor/Instrument (well calibrated, stable, checked across instruments, verification/validation)
- Spacecraft (locational and communication accuracy)
- Environment Issues (contamination from clouds, rainfall, ground, sea, dirt, etc.)
- Data Processing (accuracy of interpolation, algorithms, ancillary source data)
Our Task
Create a 1-10 scale for each dimension. We will work with Federation members to associate a quality description with each value.
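A minimal sketch of what a per-dimension record for this scale might look like, using the four dimension names above; the class, field, and method names are hypothetical, not an agreed Federation design:

    from dataclasses import dataclass, field

    DIMENSIONS = ("sensor/instrument", "spacecraft", "environment", "data processing")

    @dataclass
    class QualityRecord:
        """One 1-10 quality score per dimension for a single data product."""
        product_id: str
        scores: dict = field(default_factory=dict)  # dimension -> score (1..10)

        def rate(self, dimension: str, score: int) -> None:
            if dimension not in DIMENSIONS:
                raise ValueError(f"unknown dimension: {dimension}")
            if not 1 <= score <= 10:
                raise ValueError("score must be between 1 and 10")
            self.scores[dimension] = score

    # Example: a provider rates the data-processing dimension of one product.
    rec = QualityRecord("example-product-001")
    rec.rate("data processing", 7)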
Other topics
- Quality assurance (someone tags it as valid)
- Useful metadata provided?
- Instrument Verification and Validation
- Data processing
  - Re-processing tag and notification
  - Input errors and forcings
  - Re-gridding
  - Missing data
- Usage issues
  - High enough resolution?
  - Valid inference about what is measured
  - Chain of Custody (for legal use)
Completeness
Can we come up with categories of data completeness?
3rd party ratings
- NCDC
  - NCDC Certified data (certification only states that the data is in the archive; it designates the data as official and is not a quality statement)
  - Dataset documentation uses the FGDC quality section, with varying levels of detail
- GCMD
  - DIF records must contain a minimum set of required fields to be accepted
  - A free-text field then describes quality
- ECHO
  - "Measured parameters" from the ECS model
  - QA percent cloud cover; missing pixels
- CLASS/Climate Data Record
  - Maturity Model approach for data (John Bates's application of software-maturity concepts)
  - Level of maturity (five levels of progressively improved treatment)
  - See the CDR Maturity paper
- FGDC
  - A whole section on quality, text only
- Testimonials
  - Peer review
  - NCDC
Discussion
Completeness
- Is this a measure of quality?
- It depends on the provider's stated offering; there is a problem if they claim the data is complete and it is not
Assertions about datasets
We may want a standard for making claims about datasets and for measuring how valid a claim may be
Additional Questions
- What common data quality standards can the Federation offer within the Earth Information Exchange?
- How can we enforce these standards within the Earth Information Exchange?
- Are there similar ratings for "data services"?
Action
Rob will send an advertisement to the whole group for next month's meeting.