File:You 2007 - Relationship of flagging frequency to confidence intervals.pdf
You_2007_-_Relationship_of_flagging_frequency_to_confidence_intervals.pdf (file size: 267 KB, MIME type: application/pdf)
With the widespread use of electronic interfaces in data collection, many networks have increased, or will increase, their sampling rates and number of sensors. The associated increase in data volume will naturally lead to an increased reliance on automatic quality assurance (QA) procedures. The number of data entries flagged for further manual validation can be affected by the choice of confidence intervals in statistically based QA procedures, which in turn affects the number of bad entries classified as good measurements. At any given station, a number of confidence intervals for the Spatial Regression Test (SRT) were specified and tested in this study, using historical data for both the daily minimum (Tmin) and maximum (Tmax) temperature, to determine how the frequency of flagging is related to the choice of confidence interval. The general relationship between the number of data entries flagged and the specified confidence interval was assessed over a set of widely dispersed stations in the High Plains to determine whether a single confidence factor would suffice, at all stations, to identify a moderate number of flags. This study suggests that using a confidence factor ‘f’ larger than 2.5 to specify the confidence interval will flag a reasonable number of measurements (<1%) for further manual validation, and that a single confidence factor can be applied for a state. This paper initially compares two formulations of the SRT method. This comparison is followed by an analysis of the percentage of observations flagged as a function of confidence interval.
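The relationship between the confidence factor and the flagging frequency can be illustrated with a minimal sketch. It assumes that an observation is flagged when it falls outside the interval estimate ± f × standard error; the abstract does not give the SRT formulas, so the estimates and standard errors are taken here as given inputs rather than computed from neighboring stations. The function name `flag_fraction` and the sample values are hypothetical and are not taken from the paper.

```python
def flag_fraction(observed, estimated, std_error, f):
    """Return the fraction of observations outside estimated +/- f * std_error.

    Assumption: the estimates and standard errors come from some
    statistically based QA procedure (for example an SRT-style regression
    against neighboring stations), which is not implemented here.
    """
    if not observed:
        return 0.0
    flagged = sum(
        1
        for x, x_hat, s in zip(observed, estimated, std_error)
        if abs(x - x_hat) > f * s
    )
    return flagged / len(observed)


# Hypothetical daily Tmax values (degrees C) with one suspicious entry.
observed = [10.0, 12.5, 9.0, 20.0, 11.0]
estimated = [10.5, 12.0, 9.2, 12.0, 11.1]
std_error = [1.0, 1.0, 1.0, 1.0, 1.0]

# A wider interval (larger f) flags fewer entries for manual validation.
for f in (0.4, 2.5, 10.0):
    print(f, flag_fraction(observed, estimated, std_error, f))
```

Sweeping f over a range of values at each station and recording the flagged fraction gives the kind of flagging-frequency curve the abstract describes.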
File history: current version, 18:22, January 26, 2015 (267 KB), DataRonin.