Difference between revisions of "Sandbox MM"
Revision as of 15:16, April 7, 2014
Sensor data quality
Overview
A new generation of environmental sensors and recent major technological advancements in the acquisition and real-time transmission of continuously monitored environmental data pose a major challenge for quality assurance (QA) and quality control (QC) of high-throughput data streams. Deployments of sensor networks are becoming increasingly common at environmental research locations, and there is a growing need to access these large volumes of data in near real-time. However, the direct release of streaming sensor data raises the likelihood that incorrect or misleading data will be made available. Additionally, as research applications begin to rely on real-time data streams, the continual and consistent delivery of this information will be essential. This increasing access to and use of environmental sensor data demands the development of strategies to assure data quality, the immediate application of quality control methods, and a description of any QA/QC procedures applied to the data.
Traditional QC systems tend to operate on file-based collections of environmental data from field sheets, field recorders or computers, or downloaded datalogger files. Manually applied tools and techniques such as graphical comparisons are used to provide data validation. Documentation is typically not well-organized and not directly associated with data values. These systems must balance the need to release data without months or years of delay against the delivery of well-documented, high quality data. However, with increasing deployment of sensor networks, these older systems fail to scale or keep pace with user needs associated with high volumes of streaming data. Comprehensive and responsive QC systems are needed that are designed to reduce potential problems and can more quickly produce high quality data and metadata. Methods described here for building a QC system will include identification of:
- preventative measures to be taken in the field
- quality checks that can be performed in near real-time
- necessary data management practices
Introduction
A team approach is necessary to build a QC system, and multiple skills and personnel are needed. The QC system will begin with system design and preventative measures taken in the field and continue through data quality checking and data publishing. A lead scientist will propose research questions and describe the types of data and necessary quality. Expertise in field logistics, sensor systems, and wireless communications will play a role in site design and construction. A sensor system expert will provide knowledge of specific sensors and programming skills to establish quality control checking. Field technicians with strong knowledge of the overall scientific goals and communication skills can help to articulate issues and discover solutions. A data manager will be needed to guide delivery and archival of documented data products. Communication among all parties is necessary for the most timely delivery of well-documented and high quality data.
All team members will be needed to define a QC workflow that describes procedures and personnel responsibilities as the data flows from field sensors to published data streams. (need an example). A QC system must allow for an iterative quality management cycle to accommodate feedback to policies, procedures, and system design as data collections continue over time. The system will depend on communication among team members to assure that problems noted in sensor data collection and transport are addressed quickly and documented in the data stream. An active, well-documented QC system will help to establish user confidence in data products.
Automated or semi-automated QC systems are needed that can adequately review and screen source data and still provide for its timely release. Automated quality control processes such as range checking can be performed in near real-time, and a system can assign data qualifier codes, or flags, to any sensor value when problems or uncertainty occur in the data stream. However, these processes can often only indicate potential problems in the data stream that still require manual review. A comprehensive QC system is only achievable as a hybrid system demanding both automated QC checks and manual intervention to assure the highest data quality.
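As an illustration of the automated range checking and flag assignment described above, the following is a minimal sketch rather than a prescribed implementation. The flag codes (`""`, `"M"`, `"R"`), the `FlaggedValue` type, and the temperature limits are all assumptions made for this example; a project would define its own flag vocabulary and take limits from sensor specifications or site climatology.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical flag codes; a real project would define its own vocabulary.
FLAG_OK = ""
FLAG_MISSING = "M"
FLAG_OUT_OF_RANGE = "R"


@dataclass
class FlaggedValue:
    value: Optional[float]
    flag: str


def range_check(value: Optional[float], lower: float, upper: float) -> FlaggedValue:
    """Pair a sensor value with a qualifier flag without altering the value."""
    if value is None or value != value:  # None or NaN counts as missing
        return FlaggedValue(None, FLAG_MISSING)
    if value < lower or value > upper:
        # Keep the suspect value so a person can review it later.
        return FlaggedValue(value, FLAG_OUT_OF_RANGE)
    return FlaggedValue(value, FLAG_OK)


# Illustrative air temperature limits in degrees Celsius (assumed values).
readings = [12.4, None, 85.0, -3.1]
flagged = [range_check(r, lower=-40.0, upper=60.0) for r in readings]
flags = [f.flag for f in flagged]
```

The check only flags values; it does not delete or correct them, which leaves the decision about suspect data to the manual review step of the hybrid system.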
For this chapter we will define quality assurance (QA) as those preventative processes or steps taken to reduce problems and inaccuracies in the streaming data. These will include sensor network design, protocol development for routine maintenance and sensor calibration, and best practice procedures for field activities and data management. Quality control (QC) primarily refers to the tests provided to check data quality and the assignment of data flags and other notations to qualify issues and describe problems. QC system refers to this complete set of QA/QC preventative and product-oriented processes.
Methods
Sensor Quality Assurance (QA):
Quality assurance (QA) refers to preventative measures and activities used to minimize inaccuracies in the data. For example, scheduling regular site visits and maintenance procedures, or continuously monitoring and evaluating site sensor behavior can prevent sensor failures or lead to early detection of problems. Designing networks with redundant sensor measurements provides an additional means to quality check sensor data and assure continuity of measurement. Of course, the time and expense to conduct high-level maintenance procedures or implement efficient and redundant designs may be limited by project budgets, but may be warranted by the importance of the data. Here we describe QA measures categorized by design, maintenance, and practices:
- Design:
- Design for replicate sensors. Co-located sensors independent of the datalogger and included in the data flow can be useful checks. For example, check temperature measurements might be made alongside a Campbell thermistor with a HOBO pendant, SDI-12 temperature sensor, or analog thermocouple. Ideally, three replicate sensors are used so that sensor drift can be detected (with two sensors it may not be obvious which sensor is drifting).
- Assure an adequate power supply. Power considerations might include adding a low-voltage disconnect (LVD) to prevent logger “brown-out”, or adding power accessories with a switched power supply (e.g. CSI logger, IP relay) to programmatically control optional devices (radios, power-cycle loggers).
- Protect all instrumentation and wiring from UV light, animals, human disturbance, etc., for example with flex conduit or enclosures.
- Implement an automated alert system to warn about potential sensor network issues or certain events, e.g., extreme storms. For example, automated alerts might signal low battery power, indicate sensor calibration is needed, or indicate high winds or precipitation.
- Add on-site cameras or webcams. Webcams can be used to record weather or site conditions, animal disturbance or human access.
- Maintenance:
- Schedule routine sensor maintenance. Routine site visits following standard protocols can assure proper maintenance activities.
- Standardize field notebooks, check sheets or field computer applications to lead field technicians through a standard set of procedures and assure that all necessary tasks are conducted. These notebooks or applications can serve as an entry point for technical observations regarding potential problems or sensor failures.
- Schedule routine calibration of instruments and sensors based on manufacturer specifications. Maintaining additional calibrated sensors of the same make/model can allow immediate replacement of sensors removed for calibration to avoid data loss. Otherwise, sensor calibrations can be scheduled at non-critical times or staggered such that a nearby sensor can be used as a proxy to fill gaps.
- Anticipate common repairs and maintain an inventory of replacement parts. Sensors can be replaced before failure where sensor lifetimes are known or can be estimated.
- Assure proper installation of sensors (correct orientation, clean wiring, solid connections and mounting, etc.). Protocols for installing new sensors will also assure that key information is logged regarding a sensor’s establishment (See Management section).
- Practices:
- Maintain an appropriate level of human inspection. Develop the capability to easily view real-time data and examine regularly (daily/weekly). Regular inspection can help identify sensor problems quickly and might allow for fewer site visitations. Certain problems such as visible extreme spikes, intermittent values, or repetitive values can be easily viewed in raw data plots.
- Routinely spot check some measurements, e.g. temperature or snow depth, against a reference sensor to verify the performance of in situ sensors.
- A portable instrument package that runs alongside installed sensors over a fixed period (daily or longer cycle) can be useful in identifying problems. This type of co-location might be done to audit sensor performance on an annual or periodic basis.
- Record the date and time of known events that may impact measurements (see Management section). Ideally, these notes can be entered or captured for automated access. For example, sensors are known to behave differently during site visits or maintenance activities, and light or trip sensors might be used to record sensor access.
- Routinely synchronize the clock on dataloggers with a public Network Time Protocol (NTP) server (http://www.ntp.org/).
- Provide a reference time zone and avoid changing data logger timestamps for daylight saving time. Many would argue that the best practice is to output data in Coordinated Universal Time (UTC), which is particularly useful when data spans multiple time zones. However, most local users of the data prefer output in local standard time because it corresponds to local ecological conditions, e.g., ocean tides or solar noon, and may ease troubleshooting or field-based checking. Another strategy is to provide the local offset from UTC within the data stream to allow simple conversion to UTC, or to let users query the data and choose the time zone in which they receive it. ISO 8601 (http://www.iso.org/iso/home/standards/iso8601.htm) is an international standard covering the exchange of date and time-related data, and it provides time zone support. For example, 2013-09-17T07:56:32-0500 carries the offset of Eastern Standard Time (EST, five hours behind UTC). However, lack of support in many instruments and software packages is a drawback to its use. More recently, REST services have been constructed to return datetime values with an implicit timezone offset, enabling convenient sharing of data with timestamp flexibility.
- Ensure that files stored on the logger are transmitted error-free to the data center for import. Use error-corrected protocols such as FTP, Ymodem, and HTTP; if non-error-corrected protocols are used in the interim, schedule manual file downloads and post-import checks.
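The time zone practice above (keep logger clocks on local standard time, publish the UTC offset, and let users convert) can be sketched with Python's standard `datetime` module. This is only an illustration under stated assumptions: the fixed UTC-5 offset stands in for a hypothetical site, and the helper names `to_iso_with_offset` and `to_utc` are invented for this example. The offset is written in the extended form `-05:00`, since `datetime.fromisoformat` does not accept the colon-less form `-0500` in Python versions before 3.11.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical site configuration: a fixed UTC-5 offset (local standard time),
# deliberately not adjusted for daylight saving time.
SITE_OFFSET = timezone(timedelta(hours=-5))


def to_iso_with_offset(naive_local: datetime, offset: timezone = SITE_OFFSET) -> str:
    """Attach the site's fixed offset to a naive logger timestamp."""
    return naive_local.replace(tzinfo=offset).isoformat()


def to_utc(iso_string: str) -> datetime:
    """Parse an ISO 8601 timestamp that carries an offset and convert it to UTC."""
    return datetime.fromisoformat(iso_string).astimezone(timezone.utc)


logger_time = datetime(2013, 9, 17, 7, 56, 32)  # as recorded by the logger
stamped = to_iso_with_offset(logger_time)       # '2013-09-17T07:56:32-05:00'
utc_time = to_utc(stamped)                      # 2013-09-17 12:56:32+00:00
```

Storing the offset with each value means that local users can read timestamps in local standard time while downstream systems can still normalize everything to UTC without guessing the site's time zone.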