EnviroSensing Monthly telecons
From Earth Science Information Partners (ESIP)
Revision as of 17:48, October 30, 2014
back to EnviroSensing Cluster main page
Telecons are held on the fourth Tuesday of every month at 4:00 PM ET.
Next telecon: October 28, 2014, 4:00 PM EDT. Wade Sheldon: GCE MATLAB Data Toolbox
November 25, 2014, 4:00 PM EST: Jordan Read will introduce the R package SensorQC
Notes from past telecons
10/28/2014
Wade Sheldon presented the GCE Data Toolbox – a short summary follows:
- Community-oriented environmental software package
- Lightweight, portable, file-based data management system implemented in MATLAB
- Generalized technical analysis framework, useful for automated processing; a good compromise in that it supports both programmed-in and file-based operations
- Generalized tabular data model
- Metadata, data, robust API, GUI library, support files, MATLAB databases
- Benefits and costs: platform independent, code and data can be shared seamlessly across systems, largely independent of MATLAB version, and now released as free and open-source software. There is a growing community of users in LTER.
Toolbox data model
- The data model is meant to be a self-describing environmental data set: the metadata is stored with the data, and creation date, edit date, and lineage are maintained.
- Quality control criteria: custom functions or functions already in the toolbox can be applied.
- Data arrays with corresponding arrays of qualifier flags, similar to a relational database table but with more associated metadata.
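The GCE Data Toolbox implements this model as a MATLAB structure; purely as an illustration of the idea (a self-describing table where each data column carries a matching qualifier-flag column and metadata travels with the data), a minimal Python sketch might look like this. All class and field names here are hypothetical, not the Toolbox's actual data structure:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Column:
    """One variable in the data set: values plus per-value qualifier flags."""
    name: str
    units: str
    values: list
    flags: list = field(default_factory=list)  # one flag per value; '' = unflagged

    def __post_init__(self):
        if not self.flags:
            self.flags = [""] * len(self.values)

@dataclass
class DataSet:
    """Self-describing data set: metadata and lineage travel with the data."""
    title: str
    columns: list
    created: datetime = field(default_factory=datetime.now)
    history: list = field(default_factory=list)  # lineage: processing steps

    def log(self, step: str):
        """Record a processing step with a timestamp."""
        self.history.append((datetime.now(), step))

# Example: a temperature column with one value flagged 'Q' (questionable)
temp = Column("air_temp", "degC", [21.3, 21.5, 85.0], ["", "", "Q"])
ds = DataSet("Met station example", [temp])
ds.log("imported from CSV")
```

The point of the pairing is that flags never detach from their values, and every operation on the data set can be appended to its lineage.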
Toolbox function library
- The software library is referred to as a "toolbox".
- A growing set of analytical functions, transformations, and aggregation tools.
- GUI functions to simplify usage.
- Indexing and search support tools, plus data-harvest management tools.
- A command-line API is available, but there is also a large and growing set of graphical form interfaces; the toolbox can be started without using the command line at all.
Data management framework
- Data management cycle: designed to help an LTER site carry out all of its data management tasks.
- Data and metadata can be imported into the framework, and a very mature set of predefined import filters exists: CSV, space- and tab-delimited, and generic parsers. Specialized parsers are also available for Sea-Bird CTD, sondes, Campbell, Hobo, Schlumberger, OSIL, etc.
- Live connections, e.g., Data Turbine, ClimDB, and SQL databases, plus access to the MATLAB database toolbox.
- Can import data from NWIS, NOAA, NCDC, etc.
- Evaluation rules and conditions can be set.
- QC is automated on import, but interactive analysis and revision are also possible.
- All steps are automatically documented, so an anomalies report can be generated by variable and date range, which improves communication with users of the data.
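As a rough sketch of that import-then-QC pattern (not the Toolbox's actual API; the rule and function names below are hypothetical), a rule applied on import can produce per-value flags, and the flags in turn drive an anomaly report by variable and date:

```python
# Hypothetical sketch: rule-based QC on import plus an anomaly report.

def flag_out_of_range(values, lo, hi):
    """Return flag 'R' for each value outside [lo, hi], '' otherwise."""
    return ["R" if not (lo <= v <= hi) else "" for v in values]

def anomaly_report(dates, values, flags, variable):
    """List flagged values by variable and date, mirroring the per-variable,
    per-date-range anomaly reports described above."""
    return [
        {"variable": variable, "date": d, "value": v, "flag": f}
        for d, v, f in zip(dates, values, flags)
        if f
    ]

dates = ["2014-10-01", "2014-10-02", "2014-10-03"]
air_temp = [18.2, 19.1, 95.0]  # 95.0 degC is clearly out of range
flags = flag_out_of_range(air_temp, -40.0, 50.0)
report = anomaly_report(dates, air_temp, flags, "air_temp")
```

Because the flags are recorded rather than the bad values silently dropped, the report can tell data users exactly what was flagged, where, and why.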
9/23/2014
- Fox Peterson (Andrews LTER) reported on QA/QC methods they are applying to historic climate records (~13 million data points for each of 6 sites).
The challenge was that most automated approaches still produced too many flagged data points that needed manual checking. Multiple statistical methods were tested against the long-term historical record. The method they selected uses a moving window of data from the same hour of day across 30 days and flags values more than 4 standard deviations from that window; e.g., to set the acceptable range for 1 PM on day 45 of the year, take all 1 PM observations from days 30-60, compute the standard deviation, and set the range for the midpoint day to four standard deviations.
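The selected method can be sketched as follows. This is an illustrative Python implementation of the windowed 4-standard-deviation test described above, not the Andrews LTER code; the function name and data layout are assumptions for the example:

```python
from statistics import mean, stdev

def same_hour_window_range(obs, center_day, half_width=15, n_sigma=4):
    """Acceptable range for one hour-of-day at `center_day`, computed from a
    moving window of that hour's values on days center_day - half_width
    through center_day + half_width (a ~30-day window, as described above).

    obs: dict mapping day-of-year -> observed value at that hour.
    Returns (low, high); values outside this range get flagged."""
    window = [v for d, v in obs.items()
              if center_day - half_width <= d <= center_day + half_width]
    m, s = mean(window), stdev(window)
    return (m - n_sigma * s, m + n_sigma * s)

# 1 PM temperatures for days 30-60 (synthetic data); test the midpoint day 45
obs_1pm = {d: 10.0 + 0.1 * (d - 30) for d in range(30, 61)}
low, high = same_hour_window_range(obs_1pm, center_day=45)
flagged = not (low <= obs_1pm[45] <= high)
```

Restricting the window to the same hour of day keeps the diurnal cycle out of the standard deviation, so the 4-sigma band stays tight enough to catch real anomalies without flagging normal daily swings.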
- Josh Cole reported on his system, which is in development; he will be able to share scripts with the group.
- Brief discussion of displaying results using web tools.
- The Great Basin site discussed the variability in their data, which "has no normal": how could QA/QC based on statistics and ranges be performed in this case?
- Discussion of inviting Wade Sheldon to the next call, and of the usefulness of the toolbox for data managers.
- Discussion of using the Pandas package: does anyone have experience, and can we get them on a call?
- Discussion of the trade-off between large data stores, computational strength, and power. Good solutions?
- The ESIP email listed some student opportunities which may be of interest.
- Overall, it was considered helpful if people were willing to share scripts. Discussion of a Git repository for the group, or possibly just using the wiki.
8/26/2014
Suggestions for future discussion topics
- Citizen Science contributions to environmental monitoring
- 'open' sensors - non-commercial sensors made in-house, technology, use, best practices
- Latest sensor technologies
- Efficient data processing approaches
- Online data visualizations
- Collaborations to develop new algorithms for better data processing
- Sensor system management tools (communicating field events and associating them with data)