EnviroSensing Monthly telecons

back to EnviroSensing Cluster main page

Telecons are held on the fourth Tuesday of every month at 4:00 PM ET.

Next telecon: October 28, 2014, 4:00 PM EDT. Wade Sheldon: GCE Data Toolbox for MATLAB

November 25, 2014, 4:00 PM EST. Jordan Read will introduce the R package SensorQC

Notes from past telecons

10/28/2014

Wade Sheldon presented the GCE Data Toolbox – a short summary follows:

  • Community-oriented environmental software package
  • Lightweight, portable, file-based data management system implemented in MATLAB
  • Generalized technical analysis framework, useful for automated processing; a good compromise in that operations can be either programmed in or file-based
  • Generalized tabular data model
  • Metadata, data, robust API, GUI library, support files, MATLAB databases
  • Benefits and costs: platform independent, code and data can be shared seamlessly across systems, largely independent of MATLAB version, and now "free and open source" software. There is a growing community of users in LTER.

Toolbox data model

  • The data model is a self-describing environmental data set: metadata is associated with the data, and create date, edit date, and processing lineage are maintained
  • Quality control criteria: a custom function can be applied, or one already provided in the toolbox
  • Data arrays with corresponding arrays of qualifier flags, similar to a relational database table but with more associated metadata (see the conceptual sketch after this list)
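
To make this concrete, here is a minimal conceptual sketch in Python. It is illustrative only and not the GCE Toolbox API (which is MATLAB-based): the class and field names are hypothetical, but the structure mirrors the bullets above, with value arrays, parallel qualifier-flag arrays, self-describing metadata, and maintained dates and lineage.

  from dataclasses import dataclass, field
  from datetime import datetime

  @dataclass
  class EnvDataSet:
      """Self-describing tabular data set: values plus parallel qualifier flags.

      Hypothetical illustration of the data model concept, not the GCE API.
      """
      title: str
      columns: dict                                 # variable name -> list of values
      flags: dict                                   # variable name -> list of flag strings ("" = OK)
      metadata: dict = field(default_factory=dict)  # descriptive metadata kept with the data
      create_date: datetime = field(default_factory=datetime.now)
      edit_date: datetime = field(default_factory=datetime.now)
      lineage: list = field(default_factory=list)   # processing history entries

      def apply_range_check(self, var, lo, hi, flag="Q"):
          """Flag values outside [lo, hi] and record the step in the lineage."""
          self.flags[var] = [flag if not (lo <= v <= hi) else f
                             for v, f in zip(self.columns[var], self.flags[var])]
          self.edit_date = datetime.now()
          self.lineage.append(f"range check on {var}: [{lo}, {hi}]")

For example, calling apply_range_check("airtemp", -40, 50) on a data set holding an air temperature column would flag any reading outside that range and append a lineage entry documenting the check.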

Toolbox function library

  • The software library is referred to as a "toolbox"
  • A growing set of analytical functions, transformations, and aggregation tools
  • GUI functions to simplify usage
  • Indexing and search support tools, and data harvest management tools
  • A command-line API, plus a large and growing set of graphical form interfaces; the toolbox can be started and used without ever touching the command line

Data management framework

  • Covers the full data management cycle: designed to help an LTER site perform all of its data management tasks
  • Data and metadata can be imported into the framework; a mature set of predefined import filters exists (CSV, space- and tab-delimited, and generic parsers), and specialized parsers are available for Sea-Bird CTD, sondes, Campbell, Hobo, Schlumberger, OSIL, etc.
  • Live connections, e.g., Data Turbine, ClimDB, and SQL databases (via the MATLAB Database Toolbox)
  • Can import data from NWIS, NOAA, NCDC, etc.
  • Can define evaluation rules and conditions for quality control
  • Automated QC runs on import, with interactive analysis and revision also available
  • All steps are automatically documented, so an anomalies report can be generated by variable and date range, which helps communicate data quality issues to data users (a sketch of such a report follows this list)
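
As a hedged sketch of what generating such a report could look like, assume the qualifier flags are held in a pandas DataFrame indexed by timestamp, with an empty string meaning "unflagged"; the function name and flag conventions here are assumptions for illustration, not the toolbox's actual interface.

  import pandas as pd

  def anomalies_report(flags: pd.DataFrame) -> pd.DataFrame:
      """Summarize flagged values per variable as a date range plus a count.

      flags: DataFrame of flag strings ("" = unflagged), DatetimeIndex rows,
      one column per variable. Hypothetical illustration only.
      """
      rows = []
      for var in flags.columns:
          flagged = flags.loc[flags[var] != "", var]  # keep only flagged records
          if flagged.empty:
              continue
          rows.append({
              "variable": var,
              "first_flagged": flagged.index.min(),
              "last_flagged": flagged.index.max(),
              "n_flagged": len(flagged),
              "flags_used": sorted(flagged.unique()),
          })
      return pd.DataFrame(rows)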

9/23/2014

  • Fox Peterson (Andrews LTER) reported on QA/QC methods they are applying to historic climate records (~13 million data points for each of 6 sites).

The challenge was that most automated approaches still flagged too much data that then had to be checked manually. Multiple statistical methods were tested against the long-term historical record. The method they selected uses a moving window of data from the same hour of day over 30 days and flags values that fall more than 4 standard deviations from the window mean. For example: take all 1 PM observations for days 30 to 60 of the year, compute the mean and standard deviation, and set the acceptable range for the midpoint day (day 45) at the 1 PM hour to four standard deviations around that mean.
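
Below is a minimal sketch of this test, assuming the observations sit in a pandas Series with a DatetimeIndex; the function and parameter names are hypothetical, and a production run over ~13 million points per site would call for a vectorized or grouped implementation rather than this per-point loop.

  import pandas as pd

  def sigma_window_flags(series: pd.Series, window_days=30, n_sigma=4.0) -> pd.Series:
      """Flag values more than n_sigma standard deviations from the mean of a
      centered window of observations taken at the same hour of day."""
      flags = pd.Series(False, index=series.index)
      half = pd.Timedelta(days=window_days / 2)
      for ts, value in series.items():
          # same-hour observations within +/- half a window of this timestamp
          in_window = ((series.index >= ts - half) &
                       (series.index <= ts + half) &
                       (series.index.hour == ts.hour))
          window = series[in_window]
          mu, sd = window.mean(), window.std()
          if pd.notna(sd) and sd > 0 and abs(value - mu) > n_sigma * sd:
              flags[ts] = True
      return flags

With window_days=30, the window for day 45 of the year at 1 PM spans days 30 to 60 at that hour, matching the example above.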

  • Josh Cole reported on his system, which is in development; he will be able to share scripts with the group.
  • Brief discussion of displaying results using web tools.
  • The Great Basin site discussed the variability in their data, which "has no normal": how could QA/QC based on statistics and ranges be performed in such a case?
  • Discussion of bringing Wade Sheldon onto the next call, and of the usefulness of the toolbox for data managers.
  • Discussion of using the Pandas package: does anyone have experience with it, and can we get them on a call?
  • Discussion of the trade-off between large data stores, computational capacity, and power. Are there good solutions?
  • The ESIP email included some student opportunities that may be of interest.
  • Overall, it was considered helpful for people to share scripts. Discussion of a Git repository for the group, or possibly just using the wiki.


8/26/2014

Suggestions for future discussion topics

  • Citizen Science contributions to environmental monitoring
  • 'Open' sensors: non-commercial sensors made in-house; technology, use, and best practices
  • Latest sensor technologies
  • Efficient data processing approaches
  • Online data visualizations
  • New collaborations to develop new algorithms for better data processing
  • Sensor system management tools (communicating field events and associating them with data)