EnviroSensing Monthly telecons

From Federation of Earth Science Information Partners
Revision as of 18:50, October 30, 2014 by Donhenshaw (talk | contribs)

Telecons are held on the fourth Tuesday of every month at 4:00 PM ET. Click on 'join'

Next telecon: November 25, 2014, 4:00 PM ET. Jordan Read will introduce the R package SensorQC.

Notes from past telecons


Wade Sheldon presented the GCE Data Toolbox – a short summary follows:

  • Community-oriented environmental software package
  • Lightweight, portable, file-based data management system implemented in MATLAB
  • Generalized technical analysis framework, useful for automated processing; a good compromise that supports either programmed-in or file-based operations
  • Generalized tabular data model
  • Metadata, data, robust API, GUI library, support files, MATLAB databases
  • Benefits and costs: platform independent; code and data can be shared seamlessly across systems; independent of MATLAB version; and now "free and open source" software. There is a growing community of users in LTER.

Toolbox data model

  • The data model is meant to be a self-describing environmental data set: the metadata is associated with the data, and the creation date, edit date, and lineage are maintained.
  • Quality control criteria: can apply a custom function or one already in the toolbox
  • Data arrays with corresponding arrays of qualifier flags; similar to a relational database table but with more associated metadata
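
The data model described above can be illustrated with a minimal Python sketch. This is not the toolbox's actual MATLAB structure; the class names `Column` and `DataSet`, the method `flag_where`, the flag code "Q", and the sample values are all hypothetical and chosen only to show values paired with qualifier flags, attached metadata, and a recorded lineage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Column:
    """One variable: values plus a parallel array of qualifier flags."""
    name: str
    units: str
    values: list
    flags: list = field(default_factory=list)  # "" means unflagged

    def __post_init__(self):
        # Keep flags aligned one-to-one with values.
        if not self.flags:
            self.flags = [""] * len(self.values)
        if len(self.flags) != len(self.values):
            raise ValueError("flags must align with values")


@dataclass
class DataSet:
    """Hypothetical self-describing data set (illustration only)."""
    title: str
    columns: list
    metadata: dict = field(default_factory=dict)
    lineage: list = field(default_factory=list)
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    edited: Optional[datetime] = None

    def log(self, step):
        """Record a processing step and update the edit date."""
        self.edited = datetime.now(timezone.utc)
        self.lineage.append((self.edited.isoformat(), step))

    def flag_where(self, column_name, test, flag):
        """Apply a QC criterion (any callable) and flag the values it matches."""
        col = next(c for c in self.columns if c.name == column_name)
        for i, value in enumerate(col.values):
            if test(value):
                col.flags[i] = flag
        self.log(f"flagged {column_name} with '{flag}'")


# Made-up example values, not real observations.
air_temp = Column("air_temp", "degC", [12.1, 12.4, 99.9, 12.0])
ds = DataSet("Example met station", [air_temp], metadata={"site": "hypothetical"})
ds.flag_where("air_temp", lambda v: v > 60, "Q")
print(air_temp.flags)
```

Passing the criterion as a callable mirrors the note that a custom function or a built-in one can be applied; every call also appends to the lineage, so the processing history travels with the data.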

Toolbox function library

  • The software library is referred to as a "toolbox"
  • A growing set of analytical functions, transformations, and aggregation tools
  • GUI functions to simplify use
  • Indexing and search support tools, and data harvest management tools
  • Command line API, plus a large and growing set of graphical form interfaces; the toolbox can be started without using the command line at all

Data management framework

  • Data management cycle - designed to help an LTER site do all of its data management tasks
  • Data and metadata can be imported into the framework, and a very mature set of predefined import filters exists: CSV, space- and tab-delimited, and generic parsers. Specialized parsers are also available for Sea-Bird CTD, sondes, Campbell, Hobo, Schlumberger, OSIL, etc.
  • Live connections, e.g. Data Turbine, ClimDB, SQL databases, access to the MATLAB data toolbox
  • Can import data from NWIS, NOAA, NCDC, etc.
  • Can set evaluation rules, conditions, etc.
  • QC is automated on import, but interactive analysis and revision are also possible
  • All steps are automatically documented, so you can generate an anomalies report by variable and date range, which tells users of the data more about its quality
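
As a rough illustration of rule-based QC on import followed by an anomalies report grouped by variable and date range, here is a minimal Python sketch. It does not use the toolbox's API; the rule format, the flag codes "Q" and "I", the thresholds, the function names, and the sample records are all assumptions made up for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (date, variable, value). Not real data.
records = [
    (date(2014, 1, 1), "air_temp", 3.2),
    (date(2014, 1, 2), "air_temp", 85.0),
    (date(2014, 1, 3), "air_temp", 91.0),
    (date(2014, 1, 4), "air_temp", 2.9),
    (date(2014, 1, 2), "rel_hum", 104.0),
]

# Evaluation rules: variable -> list of (flag, condition marking a bad value).
rules = {
    "air_temp": [("Q", lambda v: v < -40 or v > 60)],
    "rel_hum": [("I", lambda v: v < 0 or v > 100)],
}


def apply_rules(records, rules):
    """Return (date, variable, value, flag) per record; flag is '' if it passes."""
    out = []
    for day, var, value in records:
        flag = ""
        for code, is_bad in rules.get(var, []):
            if is_bad(value):
                flag = code
                break  # first matching rule wins
        out.append((day, var, value, flag))
    return out


def anomalies_report(flagged):
    """Summarise flagged values per variable and flag: count and date range."""
    groups = defaultdict(list)
    for day, var, _value, flag in flagged:
        if flag:
            groups[(var, flag)].append(day)
    return [
        f"{var}: flag {flag} on {len(days)} value(s), {min(days)} to {max(days)}"
        for (var, flag), days in sorted(groups.items())
    ]


report = anomalies_report(apply_rules(records, rules))
for line in report:
    print(line)
```

Because the flags are stored alongside each record, the report can be regenerated at any time for whatever variables and dates a data user asks about.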


  • Fox Peterson (Andrews LTER) reported on QA/QC methods they are applying to historic climate records (~13 million data points for each of 6 sites).

The challenge was that most automated approaches still flagged too many values that then had to be checked manually. Multiple statistical methods were tested against long-term historical data. The method they selected uses a moving window of data from the same hour of the day over 30 days and tests against 4 standard deviations within that window. For example, take all 1 pm data for days 30 - 60 of the year, compute four standard deviations, and use the result as the acceptable range for the 1 pm hour of the midpoint day (day 45).
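
The moving-window check can be sketched in Python as follows. This is an illustration under stated assumptions, not the Andrews LTER scripts: the function names, the `history` layout, and the sample values are hypothetical; the range is assumed to be centred on the window mean, which the notes do not state; and the year is treated as 365 days. A half-window of 15 days reproduces the days 30 - 60 example (31 days inclusive).

```python
from statistics import mean, stdev


def hourly_limits(history, day_of_year, hour, half_window=15, n_sigma=4.0):
    """Acceptable (low, high) range for one hour of one day of year.

    history maps (day_of_year, hour) -> list of historical values.
    The window covers day_of_year +/- half_window, wrapping at year end
    (assumes a 365-day year for simplicity).
    """
    window = []
    for offset in range(-half_window, half_window + 1):
        d = (day_of_year - 1 + offset) % 365 + 1
        window.extend(history.get((d, hour), []))
    if len(window) < 2:
        return None  # not enough data to estimate a spread
    centre = mean(window)  # assumption: range is centred on the window mean
    spread = n_sigma * stdev(window)
    return centre - spread, centre + spread


def is_suspect(value, limits):
    """True if the value falls outside the limits; False if no limits exist."""
    if limits is None:
        return False
    low, high = limits
    return value < low or value > high


# Made-up history: three past readings at 1 pm for each of days 30 to 60.
history = {(d, 13): [10.0, 11.0, 12.0] for d in range(30, 61)}
limits = hourly_limits(history, day_of_year=45, hour=13)
print(is_suspect(25.0, limits))
print(is_suspect(11.5, limits))
```

Computing limits per hour and per day of year lets the acceptable range follow both the daily and the seasonal cycle, which is one way to reduce the number of values that need manual review.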

  • Josh Cole reported on his system, which is in development; he will be able to share scripts with the group.
  • Brief discussion of displaying results using web tools.
  • The Great Basin site discussed the variability in their data, which "has no normal". How could QA/QC based on statistics and ranges be performed in this case?
  • Discussion of inviting Wade Sheldon to the next call, and of the usefulness of the toolbox for data managers
  • Discussion of using the Pandas package: does anyone have experience with it, and can we get them on a call?
  • Discussion of the trade-off between large data stores, computational strength, and power. Are there good solutions?
  • ESIP email had some student opportunities which may be of interest
  • Overall, it was considered helpful if people were willing to share scripts. Discussion of setting up a Git repository for the group, or possibly just using the wiki.


Suggestions for future discussion topics

  • Citizen Science contributions to environmental monitoring
  • 'Open' sensors: non-commercial sensors made in-house (technology, use, best practices)
  • Latest sensor technologies
  • Efficient data processing approaches
  • Online data visualizations
  • New collaborations to develop new algorithms for better data processing
  • Sensor system management tools (communicating field events and associating them with data)