EnviroSensing Monthly telecons
Telecons are held on the fourth Tuesday of every month at 4:00 pm ET. Click on 'join'
The telecon scheduled for December 23, 2014, 4:00 pm ET, is canceled due to Christmas.
Notes from past telecons
Jordan Read presented the SensorQC R package
Wade Sheldon presented the GCE Data Toolbox – a short summary follows:
- Community-oriented environmental software package
- Lightweight, portable, file-based data management system implemented in MATLAB
- Generalized technical analysis framework, useful for automated processing; a good compromise that supports either programmed or file-based operations
- Generalized tabular data model
- Metadata, data, robust API, GUI library, support files, MATLAB databases
- Benefits and costs: platform independent, shares both code and data seamlessly across systems, independent of the MATLAB version, and now "free and open source" software. There is a growing community of users in LTER.
Toolbox data model
- The data model is meant to be a self-describing environmental data set: the metadata is associated with the data, and the create date, edit date, and lineage are maintained.
- Quality control criteria: can apply a custom function or one already in the toolbox
- Data arrays with corresponding arrays of qualifier flags; similar to a relational database table but with more associated metadata
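As an illustration only, the idea of data arrays paired with qualifier-flag arrays, plus dataset-level metadata and lineage, can be sketched in Python. This is not the GCE Data Toolbox's actual MATLAB structure; the class names, field names, and the flag code 'Q' are hypothetical and chosen just for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional


@dataclass
class Column:
    """One variable: a data array plus a parallel array of qualifier flags."""
    name: str
    units: str
    values: List[Optional[float]]
    flags: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start with one empty flag per value if none were supplied.
        if not self.flags:
            self.flags = [""] * len(self.values)


@dataclass
class DataSet:
    """Self-describing data set: metadata, dates, and lineage travel with the data."""
    title: str
    columns: Dict[str, Column] = field(default_factory=dict)
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    edited: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    lineage: List[str] = field(default_factory=list)

    def apply_qc(self, column: str, flag: str,
                 rule: Callable[[float], bool], note: str) -> int:
        """Flag every non-missing value for which rule(value) is True."""
        col = self.columns[column]
        count = 0
        for i, value in enumerate(col.values):
            if value is not None and rule(value):
                col.flags[i] = flag
                count += 1
        # Record the processing step so it is documented with the data.
        self.edited = datetime.now(timezone.utc)
        self.lineage.append(
            f"{note}: flagged {count} value(s) in '{column}' as '{flag}'"
        )
        return count


# Hypothetical usage: a custom range check on air temperature.
ds = DataSet(title="Example met station")
ds.columns["air_temp"] = Column("air_temp", "degC", [12.1, 12.4, 99.9, None, 11.8])
n_flagged = ds.apply_qc("air_temp", "Q", lambda v: v < -40 or v > 50, "range check")
```

Keeping the flags in a separate array of the same length leaves the original values untouched, so a flagged value can still be reviewed or reinstated later.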
Toolbox function library
- The software library is referred to as a "toolbox"
- A growing set of analytical functions, transformations, and aggregation tools
- GUI functions to simplify usage
- Indexing and search support tools, and data harvest management tools
- Command-line API, plus a large and growing set of graphical form interfaces; you can start the toolbox without even using the command line
Data management framework
- Data management cycle - designed to help an LTER site do all of its data management tasks
- Data and metadata can be imported into the framework, and a very mature set of predefined import filters exists: CSV, space- and tab-delimited, and generic parsers. Specialized parsers are also available for Sea-Bird CTD, sondes, Campbell, Hobo, Schlumberger, OSIL, etc.
- Live connections, e.g. Data Turbine, ClimDB, SQL databases, access to the MATLAB data toolbox
- Can import data from NWIS, NOAA, NCDC, etc.
- Can set evaluation rules, conditions, evaluations, etc.
- Automated QC on import but can do interactive analysis and revision
- All steps are automatically documented, so you can generate an anomalies report by variable and date range, which lets you communicate more to the users of the data
- Fox Peterson (Andrews LTER) reported on the QA/QC methods they are applying to historic climate records (~13 million data points for each of 6 sites).
The challenge was that most automated approaches still flagged too many data points that then needed to be checked manually. Multiple statistical methods were tested against long-term historical data. The method they selected uses a moving window of data from the same hour of day over 30 days and tests against four standard deviations within that window. For example, take all data for 1 pm on days 30-60 of the year, compute four standard deviations, and use that as the accepted range for the 1 pm hour on the midpoint day (45).
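The moving-window check described above can be sketched roughly as follows. This is a minimal illustration, not the Andrews LTER scripts: the function names and record layout are hypothetical, the accepted range is assumed to be centered on the window mean, and windows that wrap around the start or end of the year are not handled.

```python
from statistics import mean, stdev
from typing import List, Tuple

# Hypothetical record layout: (year, day_of_year, hour, value)
Record = Tuple[int, int, int, float]


def window_limits(history: List[Record], day: int, hour: int,
                  half_width: int = 15, n_sigma: float = 4.0) -> Tuple[float, float]:
    """Accepted range for one (day, hour), from same-hour historical data.

    Uses all historical values at `hour` whose day of year lies within
    `half_width` days of `day` (e.g. days 30-60 for day 45).
    Assumption: the range is the window mean +/- n_sigma standard deviations.
    """
    sample = [v for (_, d, h, v) in history
              if h == hour and abs(d - day) <= half_width]
    if len(sample) < 2:
        raise ValueError("not enough historical data in the window")
    center = mean(sample)
    spread = n_sigma * stdev(sample)
    return center - spread, center + spread


def is_suspect(value: float, limits: Tuple[float, float]) -> bool:
    """True if the value falls outside the accepted range."""
    low, high = limits
    return not (low <= value <= high)


# Synthetic history for illustration: three years of 1 pm readings near 10-12,
# plus 1 am readings that the hour filter should ignore.
history: List[Record] = []
for year in (2000, 2001, 2002):
    for doy in range(1, 91):
        history.append((year, doy, 13, 10.0 + (doy % 3)))
        history.append((year, doy, 1, 100.0))

limits = window_limits(history, day=45, hour=13)
ordinary_flagged = is_suspect(11.0, limits)
spike_flagged = is_suspect(25.0, limits)
```

Because the limits are recomputed for each hour and day, the accepted range follows the daily and seasonal cycle instead of applying one fixed range to the whole record.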
- Josh Cole reported on his system, which is in development; he will be able to share scripts with the group.
- Brief discussion of displaying results using web tools.
- The Great Basin site discussed the variability in their data, which "has no normal": how could we perform QA/QC based on statistics and ranges in this case?
- Discussion of inviting Wade Sheldon to the next call, and of the usefulness of the toolbox for data managers
- Discussion of using the pandas package: does anyone have experience with it, and can we get them on a call?
- Discussion of the trade-off between large data stores, computational strength, and power. Good solutions?
- ESIP email had some student opportunities which may be of interest
- Overall, it was considered helpful if people were willing to share scripts. Discussion of a Git repository for the group, or possibly just using the wiki.
Suggestions for future discussion topics
- Citizen Science contributions to environmental monitoring
- 'open' sensors - non-commercial sensors made in-house, technology, use, best practices
- Latest sensor technologies
- Efficient data processing approaches
- Online data visualizations
- New collaborations to develop new algorithms for better data processing
- Sensor system management tools (communicating field events and associating them with data)