EnviroSensing Monthly telecons
back to EnviroSensing Cluster main page
Telecons are on the first Monday of every month. Click on 'join'
- Next telecon: June 6, 2017 at 5:00pm EST
Notes and recordings from past telecons
To listen to recordings from past telecons, click here. To log in, use the username firstname.lastname@example.org, and the password Earth111.
Renee F. Brown
Envirosensing Summer 2018 Meeting Planning
- ideas on what they would like to see us do
- Additional standards
- two IOT protocols
- LwM2M and CoAP
- CoAP: a very compact data transmission protocol
- Standards for manual user corrections of data
- corrections and or annotations
- without necessarily changing the data itself
- markup language for manual user corrections of data
- ways to annotate at the data point level
- richer way of annotating individual data points
- could be done with SQL or in NOSQL database
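A point-level annotation scheme like the one described above can keep corrections and notes in a side table, so the raw values are never changed. A minimal SQLite sketch (table and column names are invented for illustration):

```python
import sqlite3

# Hypothetical sketch: annotations live in a separate table keyed to the
# observation, so the raw data values themselves are never modified.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE observation (
    obs_id INTEGER PRIMARY KEY,
    station TEXT, ts TEXT, value REAL
);
CREATE TABLE annotation (
    ann_id INTEGER PRIMARY KEY,
    obs_id INTEGER REFERENCES observation(obs_id),
    flag TEXT,            -- e.g. 'suspect', 'sensor_drift'
    note TEXT,            -- free-text explanation by the data manager
    author TEXT, created TEXT
);
""")
con.execute("INSERT INTO observation VALUES (1, 'B1', '2017-06-01T00:00', 21.4)")
con.execute(
    "INSERT INTO annotation (obs_id, flag, note, author, created) "
    "VALUES (1, 'suspect', 'tipping bucket clogged', 'jporter', '2017-06-02')"
)

# A join returns the original value alongside its annotations.
row = con.execute(
    "SELECT o.value, a.flag, a.note FROM observation o "
    "JOIN annotation a ON a.obs_id = o.obs_id"
).fetchone()
print(row)  # (21.4, 'suspect', 'tipping bucket clogged')
```

The same pattern maps directly onto a NoSQL document store: the annotation becomes a sub-document attached by observation ID rather than a joined row.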
- how does this relate to theme?
- Automated quality control
- broader category of data quality
- CSSI elements to frameworks
- standardized containers for QA and metadata capture
- e.g., for iButton network
- pass on to level 1 automated QA tests
- simple, not as complex as GCE toolbox
- help prepare more reliable datasets
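A "level 1" automated QA pass like the one sketched above can be as simple as a range check plus a step (spike) check. A minimal sketch with assumed thresholds, deliberately far simpler than the GCE Toolbox:

```python
# Minimal sketch of level-1 automated QA (thresholds are assumed examples,
# not from any specific toolbox): flag values outside a plausible range,
# and unrealistic jumps between consecutive readings.
def qa_level1(values, lo=-40.0, hi=60.0, max_step=10.0):
    """Return one flag per value: 'pass', 'range', or 'step'."""
    flags = []
    prev = None
    for v in values:
        if not (lo <= v <= hi):
            flags.append("range")
            continue  # don't let a bad value seed the step test
        if prev is not None and abs(v - prev) > max_step:
            flags.append("step")
        else:
            flags.append("pass")
        prev = v
    return flags

print(qa_level1([5.0, 6.1, 85.0, 6.3]))  # ['pass', 'pass', 'range', 'pass']
```

A standardized container would pair a function like this with metadata recording which tests ran and with what thresholds, so downstream users can interpret the flags.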
- Scotty working on proposal
- would SensorML be helpful for ingestion?
- Mike also working on JSON-LD extensions for real-time data sources
- outline region - get summary of operational sensors, sampling rates etc.
- How does JSON-LD compare with SensorML?
- schema.org used by google for finding related items
- want to extend to data searches
- the LD in JSON-LD stands for Linked Data
- schema.org relatively small set of ways to link data
- would need extensions for handling datasets
- work in EarthCube focuses on data centers
- working in new proposals on going to parameter level
- Janet - sensorML is more content-rich
- very broad!
- want to extend work to come up with profiles manufacturers can populate
- archiving real time data
- how often to update in portals etc. - best practices
- connecting real time streams to archives
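The schema.org/JSON-LD approach discussed above can be sketched for a single dataset record. The station name, URL, and property choices below are illustrative placeholders, not an EarthCube or Google-endorsed profile:

```python
import json

# Hedged sketch of a schema.org Dataset described in JSON-LD; a real
# deployment would extend this down to the parameter level as discussed.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example station air temperature, 2017",
    "description": "Five-minute air temperature from a hypothetical station.",
    "temporalCoverage": "2017-01-01/2017-12-31",
    "variableMeasured": {
        "@type": "PropertyValue",
        "name": "air temperature",
        "unitText": "degC",
    },
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data/airtemp_2017.csv",
    },
}

doc = json.dumps(dataset, indent=2)
print(doc)
```

Embedding a document like this in a dataset landing page is what lets crawlers find related items; SensorML would carry far richer instrument detail, while this stays a small, flat description.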
- data descriptors for quality assurance and control
- could we have a session on that topic?
- what standard does (or doesn't) do for sensors
- what are best practices, recommendations for new users
- who would like to volunteer to talk about standards - metadata interoperability
- would be good to hear about CoAP and IPSO
- CoAP: Constrained Application Protocol
- IPSO: protocol for describing connected things, including actuators
- may not be mature, but has large user base...
- very limited vocabulary for sensors
- metadata capture, quality control, reporting quality control process
- existing standards or new needed?
- Smart Cities community is not yet at this point in terms of data interoperability
- interested in monetizing
- we have something to offer them
- SWEET ontology - link to sensors not clear
- Socioeconomic Value theme?
- often collect data for specific purposes
- but additional value may come from use by others
- or could lead them wrong if metadata not complete enough to explain what has been done
- getting better handle on collection of metadata
- cautionary tales - what happens if we DON'T have the right tools for metadata
- SNOTEL temperature data is a major cautionary tale in this way - Scotty
- documentation is all manual
- Smart Cities has huge socioeconomic impact
- tie to events
- floods in Boulder - models underestimated water content of clouds
- Feedback on robust data corrections might be an interesting project (ties in above)
- don't have good ways to track transformations
- could have one session more on annotation standards and another more on QA/QC
- could be two complementary sessions
- workflows first, then annotations and standards
- if you are correcting data by adding offset, that is QAQC thing....
- some drone folks think existing metadata standards are too onerous
- perhaps better to have a big "bag of words"
- more flexible standards
- how signal processing techniques can uncover structure from unstructured data
- e.g., Latent semantic indexing with frequency composition
- similar idea, but for metadata
- document, key values
- could do a good presentation on that - but wear "mad scientist" hat....
- hard to fit things in in field
- there is ESIP funding for speakers in breakout sessions - not sure how to get that!
- would be good to get Kirk to one of these sessions
- have until the end of the month to do this
- being part of ESIP governance helps
- cluster can propose serial sessions
- QA workflows
- metadata standards (and unstandards)
- Scotty will put together and send to email list
- would not mind 10-15 min per talk sequences with extended discussion
- short presentations are good to start with - ideally WITHOUT conclusions
- focused talks on specific standards
- Can put directly into web site - long time to edit
- Virginia Coast Reserve LTER
- Data collected by individual researchers; metadata ingested; automated sensors
- Weather station: combination temperature, humidity, wind, rain.
- Several tide stations using radar. Shared in real-time with NOAA.
- Ground water monitoring station
- 900 MHz IP, 2.4 GHz WiFi
- Major stations: Broadwater tower, Machipongo station
- Archiving and publishing data:
- Level 0: raw data
- Level 1: processed data
- Level 2: value added data
- Data collection: PC with LoggerNet to Unix Computer running SAS/R
- Something missing from automated QAQC: user corrections/flagging (e.g. tipping bucket clogged).
- Right now, hand-coded php with forms for user input.
- User input entered into a database that can be queried.
- Code to implement user corrections automatically generated (e.g. using R code)
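The auto-generated correction code described above (offsets and exclusions driven by user input stored in a database) might look roughly like this; the record and correction fields here are hypothetical, and the VCR implementation generates R rather than Python:

```python
# Hypothetical sketch: each user-entered correction names a variable, a time
# window, and an action (exclude, or add an offset). The raw records are
# left intact; a corrected copy is produced, mirroring the generated-code
# approach described above.
corrections = [
    {"var": "rain_mm", "start": "2017-05-01", "end": "2017-05-03",
     "action": "exclude", "note": "tipping bucket clogged"},
    {"var": "temp_c", "start": "2017-05-01", "end": "2017-05-31",
     "action": "offset", "value": -0.5, "note": "calibration drift"},
]

def apply_corrections(records, corrections):
    out = []
    for rec in records:
        rec = dict(rec)  # never mutate the raw record
        for c in corrections:
            if rec["var"] == c["var"] and c["start"] <= rec["date"] <= c["end"]:
                if c["action"] == "exclude":
                    rec["value"] = None
                elif c["action"] == "offset":
                    rec["value"] += c["value"]
                rec["note"] = c["note"]
        out.append(rec)
    return out

raw = [
    {"var": "rain_mm", "date": "2017-05-02", "value": 3.2},
    {"var": "temp_c", "date": "2017-05-10", "value": 21.0},
]
print(apply_corrections(raw, corrections))
```

Because the corrections live in a queryable table, the level-0 data stays untouched and the correction step can be re-run (or audited) at any time.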
- Felimon: any documentation on how you do QC?
- John: Documentation is the actual code template used to do the post-processing.
- Felimon: Look into QARTOD for how specific variables should be QAQC'd: https://ioos.noaa.gov/project/qartod/
- Don: Campbell, John L., Rustad, Lindsey E., Porter, John H., Taylor, Jeffrey R., Dereszynski, Ethan W., Shanley, James B., Gries, Corinna, Henshaw, Donald L., Martin, Mary E., Sheldon, Wade M., Boose, Emery R (2013) Quantity is Nothing without Quality: Automated QA/QC for Streaming Environmental Sensor Data (Pub. No: 4825, Journal Article)
- Mike: Does user QAQC require people to be constantly checking the data?
- John: User input usually a result of checking on the equipment.
- Mike: Do you use controlled vocabularies, etc.?
- John: Generally not. Many stations go back to the 1980's. Some stations have these vocabularies, and measurements from other stations are tied to these.
- Corinna: @Felimon: Are you working with NCEI?
- Felimon: Yes, we do upload to NCEI.
ESIP Summer Meeting Call for Sessions:
Recap of winter meeting
- About 20 participants, from organizations including CUAHSI, EPA, etc.
- Talked about sensor data workflow.
- Gave survey on sensor data workflow.
- Is this survey a useful product?
Comments on survey
- Renee: In favor of distributing to a broader audience (LTERs, LBFS).
- Renee: Few people using trusted repositories.
- Mike: CHORDS trying to make use of trusted repositories more accessible.
- Renee: Everybody is using different solutions. What's use in developing own solutions, as opposed to standards?
- Corinna: DataONE is not a trusted repository; it is an aggregator.
- Corinna: Coming up with a data model that works for everybody is too ambitious. Sometimes prevents people from properly using them.
- John: Why so many solutions?: scale and resources, similarity of objectives, things change very fast.
- John: As for vocabulary, scaling can be an issue. Process can become very difficult if number of terms is too large.
- Don: Are we seeing any commonality of approaches?
- Don: Were we supposed to be identifying bottlenecks in the survey?
- Scotty: Some bottlenecks include: QC flagging process. Import configuration process. Overall development of modified data model.
- John: Possibly only use a subset of the survey as a planning tool.
- Scotty: Could be useful for a team of, say, civil engineers who are trying to write a proposal and figure out what tools they need. Most will not be thinking of trusted repositories, e.g.
- Mike: In NSF community, more emphasis on taking more data and more measurements, as opposed to what is done with those measurements (are they well-documented, etc.).
- Corinna: Don't give as many choices on the survey; it may be too constraining. PIs won't be planning at this level of detail, e.g.
- Aaron: The more generic a tool is, the harder it is to learn.
- Conclusions: trim down diagram to get rid of specifics. Focus on general technologies used.
- ODM2 seems to be taking ODM in a lot of new directions---goal of end-to-end sensor data management?
- Corinna: could be too broad. Hard to get data in, hard to get data out.
- John: Standardized response blocks. Smaller number than in current survey.
- Headers instead of diagrams.
Renee F. Brown
Discussion of plans for the winter meeting
- Scheduled a working session for the cluster with John Porter
- Use this session to sit down, talk about workflows for end-to-end sensor data management
- Everyone will hopefully come away with a cross-section of where performance is high and where the gaps are
- Don: See what software people are using in the field, what loggers are used, how data is sent back to base station?
- Scotty: Have a system diagram available for everyone to show their workflows and see where they can use particular tools.
- Paul: Is the working session going to be interactive?
- Scotty: Depends on how many people show up. If there are 20 people, e.g., have a brief synopsis, have an exercise in which people explore different architectures and see how their workflow fits in.
- Else if there are very few people, go around and describe for each user.
- Don: Have you described that in the abstract?
- Scotty: we kept it general.
- Don: We should say "attendees should come prepared for participation"
- Scotty: Concurrent with OGC, metadata carpentry, earth data science analytics, valuables consortium session (and one more)
What we should do next year:
- Alternating monthly phone calls:
- (i) cluster work, potential publications
- (ii) Guest speakers present 20-30 minutes on an aspect of their research
- Mature workflows
- Completely different workspaces. Preferably earth sciences, but different.
- Renee: Would like to see NEON again.
- Scotty: Celeste had sensor network destroyed by wildfires (had insurance); would be interesting to see how they rebuild it.
- Renee: What topics would you all be interested in?
- Don: Touch base with the group and see if they have any updates.
- Jane: How does the winter session fit into the FAIR data concept?
- Scotty: Look at how well workflow fits into the FAIR concept. Large strategic impact is the FAIR concept; tactical focus is how do you manage increasing volumes of data with decreasing person-hours.
- Jane: OGC will be at this winter meeting.
- Scotty: Many groups are using a modified ODM CUAHSI data model. Similar to what Jane is using.
- Jane: Can likely find speakers in RDA groups.
- Don: Sustainable data management group has an ESIP wiki with notes on FAIR data.
Skip January telecon; reconvene in February.
Renee F Brown
- Preliminary remarks
- senior project on CHORDS at the University of Nevada, Reno
- Continuation of discussion on cluster goals: summary
- Time to start summarizing key software tools, seeing how they can fit together and where the gaps are
- If someone has to spin up a new sensor network from scratch, how could they use a set of tools to streamline the process
- How to have information be interoperable?
- Submitted a working session to the ESIP Winter Meeting
- To plan for taking first step forward over the next couple years
- Co-organizers are Scotty and John Porter
- Renee stepping up to be a new co-chair for the Envirosensing Group
- What would you suggest to a brand-new network manager on what they should do, what tools should they use?
- Literature review for each of the main boxes
- Corinna: most interesting part to talk about are the arrows.
- Define what the arrows represent
- How many arrows could sensorML satisfy?
- More MLs than sensorML:
- WaterML, EML
- Data Turbine as a vehicle?
- Where would it fit in, does it fit in?
- Hasn't been used, because Campbell provides all the services in the management interface.
- Has anyone done a study on how long it takes to set up and configure a sensor data management system?
- There are some aspects of QC that can slow down transfer of streaming sensor data to the database
- OGC SOS and OpenSensorHub handles many of the things on the diagram
- Ethan McMahon: Is there a general diagram?
- Scotty: the Cluster needs to define a topic objective with specific goals for the next 1-2 years. We have been spending most of our effort looking at software tools and workflows for sensor network scientists and managers. We could detail some of this, enhancing the resources already available in the Best Practices. Our goal should be to reduce IM time spent on managing data, to increase scalability and focus more on quality. Streamlining the setup/config/monitoring processes is key, while including community standards and external connectivity.
- Updating a workflow model graphic would give helpful guidance, modify Scotty's?
- Matt: a literature review of each sector of the workflow (tools, practices) would be very helpful, and we could add those to new Guide documents.
- Corinna: the challenge is in defining what the flows between software elements consist of (formats/contents/sizes/standards). The group could spend worthwhile time on that.
- Matt: SensorML as a primary communications medium between softwares - how much does it already cover, and could it use improvement?
- We could take a "use case approach", where as an exercise we have IMs fill in the blanks on a model graphic with their solutions, highlight the bottlenecks, and make suggestions or wish lists.
- Felimon: we should have a presentation by 52north; they may have solved many of these problems or have good component solutions?
- Other resources to explore as we set up this topic: OpenSensorHub, SensorML, EML, WaterML, OGC-SOS (standards)
(Incomplete list) Don Henshaw
- Intro - Scotty
- software framework for data management
- working on proposal
- but could have community projects
- diagram shared by Scotty
- where does FRAMEWORK fit in - dashed line
- into structured database a la CUAHSI
- GCE toolbox example of stand alone software
- description of diagram
- interfaces to sensor systems
- database structure built along community standards
- metadata and data proceed in parallel
- harkens to FAIR principles for data management
- definitely worthy of group's consideration
- interested in relation with OGC and sensorWeb
- Corinna input?
- I was hoping Jeff would jump in, he's done most of this from the CUAHSI perspective
- EDI can't do all of what is in diagram
- more interested in matchmaking
- want as modular as possible so we can insert expertise into particular areas
- Jeff Horsburgh
- knowing what we've worked on
- many pieces worked on as open source
- utility is variable
- community to work on them would make it better
- still a gap; hard to spin up CUAHSI software
- database structure & standards - should follow what has already been done
- worthy goal to pursue?
- is this thinking too big for the cluster attention?
- Two questions
- some providers take the lion's share of the IoT platform market - Azure, Google IoT, etc. - role in providing interfaces for embedded devices
- what external connections would you like to integrate with
- open question
- could use a wider array of definitions for inputs
- bringing in metadata from a community should only require setting up once
- include built-in test cases - automated continuous development process
- might be best to make the end-to-end system work with the most popular systems for real-time applications
- probably not - the interest in external connections is near real time
- lessen work for IMs etc.
- or focus on practices
- GCE toolbox actually covers a LOT of this ground
- but built on top of Matlab
- specialized metadata
- can use some "lessons learned" from it
- as Corinna said, would like to see modular with well-documented interfaces, perhaps as web services
- could go "box by box" for discussions of what is needed and what is already working
- develop some use cases
- develop cluster goals
- need to focus on interface/APIs between boxes
- that allows others to build code on either end independently
- Like modular approach - talking about process boxes
- AND uses diverse software as processing
- Loggernet to GCE Toolbox to CSV to DB
- could fill out - broaden input
- identify and move forward on generic pieces
- XDOMES at Winter Meeting - will this topic be discussed there
- want outreach to make a larger group available
- would have no problem proposing session at winter meeting
- we know how to do this as individuals - but tools for new person don't exist yet
- would be good to leave some blank boxes in framework and let others fill it out
- modularity sounds good - importance of pieces varies between groups
- would like a plug-and-play framework
- overlaying pieces on framework would identify holes and what plays well together
- Would like both higher and lower level documents
- need to communicate with managers and programmers
- Next call
- run through ideas on how data flows from place to place
- identify places where other boxes are needed
Recap of summer meeting
- Janet's session
- Wade's session
- Funding Friday Prize
Cove Sturtevant (NEON) - Mobile Applications for Maintenance and Field Data
- Instrumented Systems
- IS Science Data Quality Monitoring - automated flagging of data
- Adding separate system for sensor health monitoring
- Quality monitoring application
- Science review
- Rolling analyses
- Maintenance records
- Q: Where do you put in biases and offsets for individual sensors?
- A: This is done in calibration lab.
- Q: Are standard QAQC measures used? (ioos.noaa.gov/project/qartod)
- A: Not yet.
- Q: Are technicians viewing this in field on their phones? Do they need access to cell/wifi?
- A: Generally use ruggedized tablets. Can pre-download data if there is no connectivity.
- Q: Where do you get the barcodes from?
- A: We make them.
- Q: How long does it take to develop apps?
- A: Single applications can be developed in less than a day. Some applications are drag and drop and can be made in a couple hours.
- Q: What is the cost of Fulcrum?
- A: Three tiers: Standard, Business, Professional. Standard is $18 per device/user per month.
Renee - McMurdo LTER Data Manager
Discussion of Summer Meeting Sessions
- No response from DataONE yet
- Four speakers right now: Matt Bartos, Mike Daniels, Mike Botts, Wade Sheldon
- Cluster emphasis
- End to end sensor data management
- Best practices for deployment are fairly mature
- Priorities 2 & 3 for the EnviroSensing cluster
- Raw vs. curated / snapshot vs. linking / strategies for metadata / sensorML vs. tabular / provenance -- how to link back to earlier versions, which data to keep
- Skip July meeting; skip August; reconvene in September
Discussion of Envirosensing Panel
- DataONE
- ARS (agricultural research stations)
- National Snow and Ice Data Center
Questions for panel
- Data submission protocols
- What are the proper protocols and standards for submitting data to repositories?
- e.g. should data be sent as snapshots or streams?
- Integration with real-time streaming sensor networks
- What APIs are available for automatically pushing data to a repository?
- How can repositories encourage participation from small research labs that maintain their own sensor networks?
- Data Quality
- Should repositories have a role in assuring data quality?
- What type of quality control should be performed before submission to repositories?
- Should repositories provide checks for data quality?
- How should storage of metadata associated with data quality be handled?
- Data curation
- Who should be responsible for data curation: submitter or publisher?
- Would it be helpful to have an external rating system for data quality/usefulness?
- Data duplication
- What is the proper way to deal with syncing and duplication of datasets across repositories?
Jason P. Downing
Renee F Brown
CHORDS: Cloud-hosted real-time data services for the geosciences
- Real-time data is of critical and growing importance in the geosciences
- Necessary for hazards like floods, earthquakes, etc.; but also field experiments
- Enhanced rates of data transfer from the field will improve data quality and research outcomes
- Organizations like NCAR have great real-time visualization tools, but the data are not easily accessible
- Small research teams are taking valuable measurements that could also be of broad benefit
- However, these data often aren't accessible to the broader community
- Case studies:
- Studying evaporation in the Great Lakes
- Using infrasound to detect severe weather
- Volcano monitoring in Tanzania
- Crowdsourced real-time data helping to measure and predict earthquakes
- Web of sensor data can be challenging to manage
- Varying spatial scales, flags, metadata
- Most scientists don't want to spend time reading standards
- Enter CHORDS
- Chords emphasizes simple ingest and access to real-time data
- Meant for scientists who want to spend time doing science rather than managing data.
- Can be set up using Amazon Web Services by a lay user.
- Data is pushed using simple HTTP GET requests.
- Live demo of portal
- Implementation details
- SensorML used to register each site.
- Data fetch via geojson, csv, etc.
- Data stored in InfluxDB; MySQL used for metadata
- Connects to grafana for visualization.
- Version 1.0 scheduled for October 2017
- Automatic DOIs for data
- Implementing OGC standards
- Event triggers
- CHORDS architecture
- Portals operated by individuals feed into processing, translation, mapping services.
- Workflows to integrate with archiving services.
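As a rough illustration of the GET-based ingest described above, a measurement push URL can be assembled with nothing but the standard library. The portal host, path, and parameter names below are assumptions modeled on the talk, not the exact CHORDS API:

```python
from urllib.parse import urlencode

# Sketch of pushing one measurement to a CHORDS-style portal with a plain
# HTTP GET. Host, path, and parameter names are illustrative placeholders.
def build_push_url(portal, instrument_id, **measurements):
    query = {"instrument_id": instrument_id, **measurements}
    return f"{portal}/measurements/url_create?{urlencode(query)}"

url = build_push_url("http://portal.example.org", 25, temp=22.1, rh=48)
print(url)
# -> http://portal.example.org/measurements/url_create?instrument_id=25&temp=22.1&rh=48
# A data logger script (or a cron job on a base-station PC) would then
# simply issue this GET, e.g. with urllib.request.urlopen(url).
```

Keeping ingest down to a single GET is what makes the system approachable for scientists who don't want to read standards first.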
- Discussion on plans for the summer
- Decided on a breakout session, along with a panel.
- Breakout group will focus on end-to-end sensor systems.
Paul Celicourt - An end to end automated environmental data collection system
- Objective: develop an integrated data acquisition system.
- Incorporates sensing, data management, publication and analysis into the same package.
- Secondary objectives:
- Self-organized sensor network.
- Platform-independent and protocol-agnostic.
- Software application to encode and decode sensors and sensor platform descriptions.
- Hardware supports most popular data interfaces.
- Data publication in different formats and unit systems.
- Hardware costs less than $200.
- System operation and network organization
- TEDS are used as a mechanism to provide metadata to each station prior to deployment.
- Use CUAHSI ODM data format and Django Web Framework for automated data management.
- Field deployment in Brooklyn.
- Uses Zigbee protocol.
- Software tools: PyTED. Sensor description using IEEE 1451 standards.
- Software tools: HydroUnits. Dimensional analysis in Hydrologic computing systems using sensor-based standards.
- System is capable of handling data collection, transmission, management and publication.
- Effective in reducing field data acquisition workload and reducing human errors.
- Currently developing an online configuration and programming tool.
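The dimensional-analysis idea behind a tool like HydroUnits can be illustrated with a toy converter. This is not the HydroUnits API; the unit table below is a minimal hand-picked set for illustration:

```python
# Toy sketch of unit conversion guarded by dimensional analysis: each unit
# maps to (dimension, factor to an SI base unit), and conversion is only
# allowed between units of the same dimension.
UNITS = {
    "m3/s": ("discharge", 1.0),
    "cfs":  ("discharge", 0.0283168),   # cubic feet per second
    "mm":   ("length", 0.001),
    "in":   ("length", 0.0254),
}

def convert(value, src, dst):
    sdim, sfac = UNITS[src]
    ddim, dfac = UNITS[dst]
    if sdim != ddim:
        raise ValueError(f"cannot convert {src} ({sdim}) to {dst} ({ddim})")
    return value * sfac / dfac

print(round(convert(1.0, "in", "mm"), 3))  # 25.4
```

Refusing cross-dimension conversions at this layer is what lets downstream publication code offer multiple unit systems without silently producing nonsense.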
Janet Fredericks - Update on XDOMES project
- SensorML editor working, but need to develop more manufacturer-friendly vocabularies.
- Want to encourage sensor manufacturers to create content.
- Sensor manufacturer suggested not only creating document, but also an ID for each sensor that can be used by data managers.
- Link to sensor registry spreadsheet added to wiki page.
- Schedule time for showing sensorML editor at upcoming telecon.
Matt Bartos - Wireless sensor networks for smart water infrastructure
- Overview of research efforts underway at the University of Michigan Real-Time Water Systems Lab.
- Description of wireless sensor node hardware and data backend.
- Two ongoing applications:
- Using wireless sensor networks for real-time flash flood monitoring in the Dallas--Fort Worth Metroplex.
- Using wireless sensors to optimize stormwater quality via automated control infrastructure in Ann Arbor, MI.
Janet Fredericks - Update on XDOMES
- Current efforts
- Overhaul so users can create re-usable models of sensors with rulesets.
- Release of sensorML editor ongoing.
- Goals for cluster
- Create vocabularies that manufacturers can reference (e.g. sensor types, observable properties).
- Sensor vocabularies should be domain-driven rather than manufacturer driven.
- Remove ambiguities in vocabularies (e.g. beam strength vs. sensor strength).
- Identify some example cross-domain sensors.
- Immediate goals
- By March meeting: Have sensor types and observable properties.
- Make spreadsheet (SensorType and ObservableProperties) a google doc that is linked to from envirosensing page to invite more community participation.
Plans for next telecon
- Doodle poll to set time for upcoming meetings (effective April).
Andrew Rettig (started work with ES cluster several years ago, stopped about a year ago; teaching ES networking at U of Dayton)
Eric Fritzinger (works with Scotty)
Discussion of what is needed from community to enable semantic and syntactic interoperability
- Sensor manufacturer responsible for making OEM model description
- creates a unique identifier for each sensor
- SensorML file is created and belongs to sensor owner
- Data managers describe processing that's done
- Documents reference terms that have meaning --> reference ontologies
- Hoping to get community to help with development of ontologies, esp in communities Janet isn't a part of (i.e. not in oceanography comm'ty) -- then Janet can work with sensor manufacturers to reference those terms
- Recorded presentation includes example ontology form; Wiki reference links should include date accessed
- MMI ontology registry has a big vocabulary database
- Can create new vocabularies and map to other known ontologies
- Watch the video of the telecon for more information about how this works
Scheduling an informal lunch/coffee meeting at AGU in San Francisco (all)
- Scotty added Janet's presentation schedule to his schedule for the meeting -- contact either of them if you want to meet up with them
- Janet will send out this message to everyone
Next meeting (2/6/17)
- Vasu and Janet talk about engaging sensor manufacturers
Change in leadership
- Scotty Strachan to begin co-leading EnviroSensing group with Janet Fredericks
- good time to change leadership, Don is retiring soon, time to think about new directions (XDOMES? other projects?)
- Matt Bartos (U Michigan) will take over as student fellow (Alison is done at the end of the year)
Discussion of new directions
- Don - short recap of history of EnviroSensing cluster and original mission
- this is review from previous telecons--see past notes
- Janet - update on past year of work on XDOMES
- ES group mission and goals is well-aligned with her work, but she needs people to participate rather than just be an audience for the work they're doing
- e.g. looking for vocabularies... need this before can go to sensor manufacturers
- having difficulty getting people involved -- need folks to participate!
- no ES explicit presence (i.e. no specific sessions) at winter meeting, but intend to have a session at summer meeting
- Don suggested more explicitly crafting this as a clear opportunity/desire to get people involved
- good to tie to specific time frame, task, or event
- Send out info prior to telecon, have people look at it and prep something specific, then discuss on telecon
- maybe try to target specific people - perhaps folks can help ID who these people would be
- follow-up meeting next month to try to get volunteers to describe sensors (spreadsheet?)
- plan to have an informal ESIP/EnviroSensing get-together at AGU?
- would be great to have a few volunteers, work up to people doing talks at ESIP Summer Meeting?
- spend some time together talking about what this is, catching up on the cluster, etc.
- Scotty - some thoughts on future directions
- excited to step in and help out with the group, since it's been very helpful to expand his data management world
- completed PhD over the summer, things are currently a little hectic--transitions, etc.
- focus on field designs, esp. on mountain environments
- could do multiple projects at once--would be good to stay focused and help the XDOMES project
Janet Fredericks on Q2O web service
Using SWE to bind metadata to observational data - enabling dynamic data quality assessment
Background: NOAA-sponsored project to address data quality in sensor web enablement frameworks
- Janet walked participants through the Q2O (QARTOD-to-OGC) web service implementation to demonstrate how web services are used to describe processing, select and describe data and offer it to a user as a service.
- open sensor access so that people can discover sensor data, etc.
- geolocatable/geospatially aware, fully described data, sensors, and processing
- free open source software on github
SmartCity Air Challenge
- Ethan: EPA is putting out a challenge called the SmartCityAir challenge: ask communities to tell them how they would deploy a large team of air sensors
- provide seed money, etc.
- groups have to describe how they would monitor and manage the data, deal with sensors throughout their lifecycle, do this sustainably…
- gov’t learning what practices work best
- will be announced formally next week; Ethan will send a less formal paragraph out to this group
- interested in ideas for how to get word out about this, how to encourage people to use best data management practices that exist
- focus is really on the data management side
With XDOMES and OpenSensorHub → trying to get all of that kind of knowledge (how it was used, description of data, etc.) so that you could have people throwing sensors out and have access to that data
- Janet will be sending out an email about preparing for the workshop at the summer meeting → people can let Janet know that they’re coming
- At Thursday session hopefully Janet, Mike, and Vasu will all be presenting
- Maybe talk about whether we want to reach out to a broader audience? Janet would like to know how many people plan to attend the workshop
- Poster? Don says we did one last year and could just substitute some of the stuff on there with summary points of what we’ve done this year on our calls -- Don will produce the poster if Janet sends summary slides from some of the talks to him
NO call in July -- we’ll just see each other at ESIP! Next call will be in August.
Don Henshaw (Forest Service in Oregon)
Alison Adams (EnviroSensing Student Fellow)
Annie Burgess (ESIP)
Lindsay Barbieri (Student Fellow)
Carlos Rueda (software engineer at Monterey, part of XDOMES team)
Corinna Gries (LTER)
Felimon Gayanilo (Texas A&M, XDOMES)
Janet Fredericks (WHOI, XDOMES)
Wade Sheldon (LTER)
Mark Bushnell (oceanographer, XDOMES)
Jane Wyngaard (JPL)
Mark Bushnell – Quality Assurance and Quality Control of Real-Time Ocean Data (QARTOD):
- QARTOD manuals: focus on real time, usually coastal (http://www.ioos.noaa.gov/qartod)
- Includes quality control tests and quality assurance of sensors (in an appendix)
- Discussion of operational vs. scientific quality control and different needs/contexts for each
- Board meets quarterly to review progress and identify next variables (if you have ideas for variables, let Mark know!)
- Each manual takes 6-8 months and each one is a living document that is updated
- 26 core variables
- next up: phytoplankton species!
- Discussed an example test from the waves manual
- Five “states” for data qc flags (pass, not evaluated, suspect or of high interest, fail, missing data)
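The five flag states map onto the standard IOOS/QARTOD numeric codes (1 = pass, 2 = not evaluated, 3 = suspect/high interest, 4 = fail, 9 = missing data). A minimal Python sketch of how a data consumer might filter on them; the `filter_by_quality` helper is hypothetical, not part of any QARTOD library:

```python
from enum import IntEnum

class QartodFlag(IntEnum):
    """Standard IOOS/QARTOD numeric flag codes for the five states."""
    GOOD = 1            # pass
    NOT_EVALUATED = 2
    SUSPECT = 3         # suspect or of high interest
    FAIL = 4
    MISSING = 9         # missing data

def filter_by_quality(values, flags, worst_accepted=QartodFlag.SUSPECT):
    """Keep only values whose flag is at or below the chosen quality level,
    letting the data user decide how strict to be."""
    return [v for v, f in zip(values, flags) if f <= worst_accepted]

readings = [12.1, 12.3, 99.9, 12.2]
flags = [QartodFlag.GOOD, QartodFlag.GOOD, QartodFlag.FAIL, QartodFlag.SUSPECT]
print(filter_by_quality(readings, flags))                   # [12.1, 12.3, 12.2]
print(filter_by_quality(readings, flags, QartodFlag.GOOD))  # [12.1, 12.3]
```

The point is that raw data and flags travel together, so an operator checking instrument health can select the failed points while a scientist selects only passing ones.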
Discussion & Questions for Mark:
- How to handle flags that represent a mix of semantic notions? Hard for data consumers to understand (what’s the actual problem?)
- What about showing (or not showing) data that doesn’t meet a certain standard?
- If you’re looking for extreme events, for example, you might want to see all the data…
- Helpful to have the option to see all the data (if you’re an operator, say, you might look at failed tests for an instrument)
- Best to let the data user select what level of quality they’re interested in
- For EnviroSensing, is there a place to save/share code for QC tests?
- Not at the moment
- Would be good to start tracking/storing code for tests that people do somewhere and have a DOI that describes the processing
- Python code implementing the QARTOD recommendations is at https://github.com/ioos/qartod (also see https://github.com/asascience-open/QARTOD)
- Core link in email from Janet (body copied below) -- many other pages and additional information can be found from that link
- Background: The U.S. Integrated Ocean Observing System® (IOOS) Quality Assurance/Quality Control of Real-Time Oceanographic Data (QARTOD) project has published nine data quality-control manuals since 2012. The manuals are based on U.S. IOOS-selected parameters (or core variables) that are of high importance to ocean observations. The purpose of the manuals is to establish real-time data quality control (QC) procedures for data collection for core variables, such as water levels, currents, waves, and dissolved nutrients. The QC procedures provide guidance to eleven U.S. IOOS Regional Associations and other ocean observing entities, helping data providers and operators to ensure the most accurate real-time data possible. It began as a grass-roots organization over a decade ago - the background can be found on the IOOS QARTOD Project website: http://www.ioos.noaa.gov/qartod/welcome.html
- Links from Mark
- The link for the flags crosswalk note is http://odv.awi.de/fileadmin/user_upload/odv/misc/ODV4_QualityFlagSets.pdf.
- There's similar work at http://www.iode.org/index.php?option=com_oe&task=viewDocumentRecord&docID=10762
- Tad Slawecki (email@example.com) heads the informal QARTOD Implementation Working Group and posts minutes at https://docs.google.com/document/d/128xPGjTMBP9FC-SEg9vGe8bD145LGdapH21WV5smzn4/edit
- In those minutes, he has a USGS github link at https://github.com/USGS-R/sensorQC
Attendees (small group):
Best practices work:
- Post a citation suggestion on the introduction page of best practices? Maybe also a snapshot PDF that folks could download with a citation?
- People DO go to this page, says Scotty
- Mountain Research Initiative (MRI) is doing some work trying to write best practices; would be good to try to get them to use ours rather than create something entirely new…
Future of EnviroSensing cluster:
- Just need to have one vision at a time--for now we can stay focused on the XDOMES work
- Scotty wants to continue to stay involved, promote cluster and its work a bit more to folks after finishing up his PhD (soon!); would be interested in taking lead in cluster after that, too
- Mark will present material on real-time quality control that was planned for this month on the next call instead, due to low attendance on this call
- Would be great to reiterate that we didn’t start as ONLY LTER--the best practices doc had input from other folks too--and that isn’t where we have to stay, either--we can continue to incorporate other things/groups
- Janet will talk about summer meeting workshop plans on June call: registering vocabulary
- will be a good workshop for beginners/people who need to be introduced to the concept
- Summer meeting (July): have a 1.5-hour session for EnviroSensing--should email asking for folks to present
1. Future meeting ideas (Don)
- Revisit best practices--have folks present the chapters they did and rekindle interest; Scotty said he’d be willing to do this for his chapter
- Email Alison at firstname.lastname@example.org if you have ideas for future telecons!
2. Summer meeting -- session/workshop ideas? (proposals due at beginning of April)
- XDOMES workshop: connected to EnviroSensing cluster?
- sensor provenance EnviroSensing breakout session?
- Janet to lead more hands-on workshop on Semantic Web, etc.?
- Let Alison, Don, or Janet know if you have additional ideas
3. Update on work plan draft (Don & Janet)
- Conversation with Erin last week
- If you might like to be the next leader, let Don know--thinking of having an “on deck” leader position
- Interest in AGU Geospace blog? Lots on data use, etc. This could be a place to put out info about our cluster
4. Rotating chair position for Products & Services (Alison)
- Right now, Products & Services does three things: (1) FUNding Friday, (2) the P&S Testbed, and (3) tech evaluation process with NASA AIST to evaluate AIST-funded projects. Also working on an evaluation process for the projects funded through the testbed, and would hopefully provide this to the Earth science community at large eventually.
- P&S wants to have a rotating co-chair position; it would last for three months and would be a rep from a different committee. The first co-chair (starting in April) wouldn’t be involved in proposal evaluation/selection, but would be involved in ideas for student/PI matchmaking and mentoring for FUNding Friday and the evaluation and testbed activities. It would be a great opportunity to learn more about P&S and have them learn more about us. It wouldn’t prevent you from submitting a proposal to the testbed.
- If you’re interested, email Soren at email@example.com; you can sign up for the rotating co-chair position here
5. XDOMES team on Semantic Web (Krzysztof Janowicz)
- Full recording with slides available in recorded meeting
- Semantic technologies can improve semantic interoperability
- More intelligent metadata, more efficient discovery, use, reproducibility of data; reduce misinterpretation and meet data requirements of journals, etc.
- Large community in science, gov’t, industry; many open source and commercial tools and infrastructures; there are existing ontologies and huge amounts of interlinked data
- Move the logic OUT of software and INTO the data
- Semantic technologies support horizontal as well as vertical workflows
- Semantic interoperability can only be measured after the fact because meaning is an emergent property of interaction; come up with technologies that prevent data that shouldn’t be combined from being combined
- How much of this is already in use or is this just setting up the platform at this point?
- With XDOMES we are at an early stage, but in terms of the infrastructure, a lot of it is already productively used
Bar (Lindsay Barbieri) - Data Analytics Student Fellow
Corinna Gries (North Temperate Lakes LTER site)
Janet Fredericks (Woods Hole, managing coastal observatory and leading XDOMES project)
Vasu Kilaru (EPA Office of Research and Development)
Josh Cole (UMBC Data Manager -- interested in hearing about XDOMES)
Scotty Strachan (Geography Dept at University Nevada Reno, current PhD student)
Pete Murdoch (Science Advisor for NE Region of USGS, also working with DOI)
Ethan McMahon (EPA Office of Environmental Information)
Renee Brown (Univ of New Mexico, Sevilleta LTER and Field Station)
Jane Wyngaard (Post-doc at JPL)
Don Henshaw (Forest Service PNW Research Station)
Notes drafted by Alison Adams (EnviroSensing Student Fellow) according to meeting recording
On the agenda:
Janet on XDOMES
What should EnviroSensing cluster do?
XDOMES (Cross-Domain Observational Metadata for Environmental Sensing Network) is an NSF project funded as part of EarthCube
- cross-domain observational metadata
- focusing on: sensor metadata creation, data quality assessment, sensor interoperability, automated metadata generation, content management
- 4 types of work (funded for two years)
- software development to create SensorML generators and registries for semantic technologies and SensorML documents
- Trying to engage people like this community and sensor manufacturers to create a network of people interested in promoting adoption of SensorML technologies
- Useability assessment
- Integration of semantic interoperability
- this project is about capturing metadata at the time of CREATION of the data
- GOAL: put out metadata automatically in interoperable ways (community-adopted, standards-based framework)
- use of registered terms to enable interoperability
- can also help manage operations by providing standardized information about how sensors were configured, etc.
- want to develop community within ESIP who will follow and promote this approach
- vet, look at usability assessment, what do you want us to do and is it useful to you?
- Vasu says this is related to things EPA has been working on--excited to continue the conversation
- Jane asked about communication with sensor-creators so this can be implemented in new sensors; Janet said that isn’t part of the project at this point
- Potential application for folks with Alliance for Coastal Technologies--they work on sensor validation and I think it’s a slightly different take, but this is definitely something they could benefit from (like if we start describing our sensors in a way that the information can be harvested)
- If people are interested you can see what Janet is putting in the file and think about whether this would be helpful or useful or whether she’s missing something; would you be willing to test this in your own environment?
- Delivering data is kind of beyond the scope of the XDOMES project at this point
- Vision: info is out on the web, and you have an app to go and harvest the information that you need
- Do we want to work this pilot project into public presence in this cluster?
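As a rough illustration of the kind of machine-readable, registered-term sensor description the project is after, here is a hedged sketch using the W3C SOSA vocabulary. The identifiers, vocabulary URL, and field names are invented for the example and are not actual XDOMES output:

```python
import json

# Hypothetical sensor description; term URIs and IDs are illustrative only.
sensor_record = {
    "@context": {"sosa": "http://www.w3.org/ns/sosa/"},
    "@id": "urn:example:site1:airtemp-01",
    "@type": "sosa:Sensor",
    "sosa:observes": "http://vocab.example.org/air_temperature",
    "configuration": {"sampling_interval_s": 600, "height_m": 2.0},
}
# Emitting something like this at data-creation time means harvesters can
# discover a sensor's configuration without asking the operator.
print(json.dumps(sensor_record, indent=2))
```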
Future WebEx ideas
- Could do a WebEx on how WaterML, SensorML, etc. fit together
- Janet is approaching Krzysztof Janowicz to discuss sensors and semantics with us on the March call
Today we had Dave Johnson from LI-COR Environmental speak with us about eddy covariance. We had a large group of attendees.
- The problem they are solving is that the data coming in with eddy covariance is high resolution and large, more than 300 million records per year.
- They were able to put much of the processing and QC logic into the SMART system, which is on the instruments in the field.
- An eddy covariance system can "see" about 100 x its height above the canopy.
- We were interested in the possibilities for real-time data qc, and also how the information can be transferred between the instruments in the field (i.e. cell modem, line, etc.)
Today we had William Jeffries from Heat Seek NYC talk to us about his platform for civilian monitoring of home temperatures in apartments using low-cost sensors and wireless mesh networks taking hourly readings.
Attending the call were Fox and Don from Andrews, Josh at UM, Jason in Fairbanks, Becky from Onset, Ryan from UT, and Scotty from UNR, as well as some listen-in callers.
- Heat Seek NYC has a network of up to 1,000 low-power, low-cost thermometers mounted on custom printed circuit boards. The software stack is Ruby on Rails with a PostgreSQL backend.
- HSNYC focuses on a sensor network's ability to affect and effect policy, and so far has seen that the government views sensor networks as a cool solution to a lot of difficult regulatory challenges. They provide reliable information that officials and citizens can use, with printable, court-friendly PDFs.
- Funding is via Kickstarter; the project got started through NYC BigApps, a giant civic-hacking competition.
- They have been incorporated as a nonprofit in NY since 2014
- First sensors were off-the-shelf Twine temperature sensors; not so good
- Now they make their own temperature sensors, using a push system instead of a pull system (with a pull system, if there's a problem you simply don't get data out of it)
- We asked about an LCD display; the reason for not implementing one is that many tenants aren't actually as interested in the data as in the response from policy makers
- Some people will abuse the system, so they use tamper-evident tape and photos of installation to protect against it; landlords have financial incentives and there can be intentional lack of repairs
- Local caching occurs at the sensor; sensors cache at the hub, and a relay server sends data back
- There are several levels of caching, as well as flash memory on the XP radios
- A key question: is there long-term storage that serves as a local cache? We don't have a solution either
- We had interest in a company called H2O Degree
- For QC they do indoor and outdoor temperature comparison, comparing the sensors to one another and to store bought sensors before and after installation...
- Frequency for radios/caching: they wake hourly to transmit point data
- We are all interested in a low-or-no-power source/transmitter, maybe based on Raspberry Pi
- need a high peak-power envelope: nodes are fairly far apart, so the implementation needs to come online, transmit a strong signal for a short time, and go back to sleep
Note on Webex recording: I am going to check into the recording- my system claims to have been recording but it is nowhere to be found (Fox)
Rick Susfalk from the Desert Research Institute presented about Acuity Data Portal. Notes from meeting taken by Fox Peterson, please edit as you see fit.
We had in attendance 8 persons.
Acuity Portal System
- Started in 2006
- originally VDV data solution
- improvements to web-interface; sits on top of VDV as Acuity server
- Acuity provides continuous monitoring of key client-driven data
- it includes sensor and data-logger deployment and maintenance, telemetry, data storage and analysis, automated alerting, and a web portal for data access
- individualized web presence tailored to client needs
- not a single tool, but instead integrates commercial, open source, and proprietary hardware and tools
- customizable project specific descriptions
- common tools used to provide rapid, cost-effective deployment of individualized portals
- physical infrastructure is shared amongst smaller clients for cost-saving or it can be segregated for larger clients.
- access is controlled down to the variable level -- "we can define who gets to see what" -- for example, the public cannot see some features
- one view could be "pre-defined graphs" without logging in, but if you want to download the data you must log in at your permissions level
- priority, very important
- DOD certification and accreditation (possible link below)
- have hired a security professional
- https:// everywhere for good protocol
- customized thresholding and data-freshness
- trending alerts, for example, know if battery will go bad
- stochastic and numerical modeling
- scoring incoming data for QA/QC processing
- web-based GUI
- users and managers can create, edit, and modify alerts online
- groups can be created so that you can schedule management and alerts
- also offers localized redundant alerting
- two-way communication with the Campbell data loggers (CR1000)
- more about getting the data to the data managers for more in-depth QA/QC than about providing that part of the tooling
Data graphing features
- Pre-determined graphs for basic users
- Data selector for more advanced users
- "we don't know what the users want to see so we give them the tools to do it" (good idea!)
- anything you can change in Excel you can change in their graphs on the website
- vista data vision (http://www.vistadatavision.com/)
- another vdv link: (http://www.vistadatavision.com/features/responsive/)
- relates your parameters to the network and what other sensors are doing
- current system is getting more flexible
- metadata is still largely user responsibility
Flight plan - safety tool
- field personnel are treated as data in the system
- users put in their travel time for safety
- buddy system: someone is alerted right before your expected return, then the system calls your boss, etc.; many levels of hierarchy
- Portals that are monitoring things
- ability for data refreshment
- colors for indication, e.g., data would not be gray if there were lots of new data
- users can change the settings on the data logger
- scrolling, scaling, plotting, etc. via interaction with the user
- can save your own graphs
graphs and alerts
- many parameters
- you can save!
- email, sms, phone
- default settings for users
- lots of personnel management tools in this in general
- cross-station "truly alarm or not" rules: e.g., if station1 has a value but station2 has a different one, don't alarm
- lists/user groups appear to be very important with this tool
- sensor and triggers: customize one or more parameters that you are bringing into your database
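The cross-station "truly alarm or not" idea above can be sketched as follows. This is a hypothetical helper, not Acuity code: suppress an out-of-range alarm unless a neighboring station roughly corroborates the value.

```python
def should_alarm(value, neighbor_value, low, high, tolerance=2.0):
    """Hypothetical cross-station rule: only alarm on an out-of-range value
    if a nearby station roughly agrees (otherwise assume a sensor fault)."""
    out_of_range = not (low <= value <= high)
    corroborated = abs(value - neighbor_value) <= tolerance
    return out_of_range and corroborated

# station1 reads -40 C but station2 reads 5 C: likely a bad sensor, no alarm
print(should_alarm(-40.0, 5.0, low=-30, high=50))    # False
# both stations read about -35 C: plausibly a real event, alarm
print(should_alarm(-35.0, -34.0, low=-30, high=50))  # True
```

The tolerance and thresholds would come from the per-parameter configuration the users manage through the web GUI.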
real-time updates on loggers
- e.g., 10-minute data: a user comes in and makes a change; the information is saved to the database and then presented to all other users
- the person will request a change and say what that change is
- when there are different levels of connectivity (i.e., analog phone modems), a lot of validation is done before the data has a chance to work its way back into the system
- example: Everglades heat-pulse flow meters
- uses google maps
- extends beyond VDV: more than one .dat file
- integrates multiple .dat files into many tables
- managed by the data managers at DRI
- workflow :
LoggerNet --> VDV --> Acuity; give the data manager access to all these variables (click, and now the manager can see them) --> generate an Excel file with tables for all this metadata --> enter the data into the Excel files --> send back to Acuity --> it ingests, runs queries, writes back to the DB --> metadata in bulk, quickly
- we asked if the system ends before the QA/QC process begins; answer: QA/QC is done at DRI, though near-real-time QA happens in the system
- direct the managers to the future data problems
- manual decision making
Scotty asked about project duration (long- and short-term) and how it affects funding. Most is funded by long-term projects; this is why they plan to add the statistical and numerical methods in the future.
Amber asked about pricing: pricing is hourly to get set up, then a price for maintaining the system for the duration of the project. For 5-10 .dat files, only about 8 hours of person time at DRI to make a portal.
Jordan Read presented the SensorQC R package
Wade Sheldon presented the GCE Data Toolbox – a short summary follows:
- Community-oriented environmental software package
- Lightweight, portable, file-based data management system implemented in MATLAB
- generalized technical analysis framework, useful for automatic processing; a good compromise that supports either programmed-in or file-based operations
- Generalized tabular data model
- Metadata, data, robust API, GUI library, support files, MATLAB databases
- Benefits and costs: platform independent, sharing both code and data seamlessly across the systems, version independent as far as MATLAB goes, and is now "free and open source" software. There is a growing community of users in LTER.
Toolbox data model
- The data model is meant to be a self-describing environmental data set: the metadata is associated with the data, and create date, edit date, and lineage are maintained.
- Quality control criteria- can apply custom function or one already in the toolbox
- Data arrays, corresponding arrays of qualifier flags -- similar to a relational database table but with more associated metadata
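The data-plus-qualifier-flag model can be sketched as follows (in Python for illustration; the toolbox itself is MATLAB, and the column and flag names here are made up):

```python
# One data column with its parallel qualifier-flag array plus metadata,
# mimicking the toolbox's self-describing structure.
column = {
    "name": "air_temp",
    "units": "degC",
    "values": [21.4, 21.6, 35.0, 21.5],
    "flags":  ["",   "",   "",   ""],      # one flag string per value
    "lineage": ["imported 2015-06-01"],
}

def apply_range_rule(col, limit, flag_char="Q"):
    """Append a flag character where a value exceeds the limit; the data
    themselves are never altered, and the action is logged to the lineage."""
    for i, v in enumerate(col["values"]):
        if v > limit:
            col["flags"][i] += flag_char
    col["lineage"].append(f"range rule: > {limit} flagged '{flag_char}'")
    return col

apply_range_rule(column, limit=30)
print(column["flags"])  # ['', '', 'Q', '']
```

Keeping flags parallel to values, rather than deleting bad points, is what makes the later anomalies-by-variable-and-date-range reporting possible.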
Toolbox function library
- The software library is referred to as a "toolbox"
- a growing library of analytical functions, transformations, and aggregation tools
- GUI functions to simplify the usage
- indexing and search support tools, and data harvest management tools
- Command line API but there is also a large and growing set of graphical form interfaces and you can start the toolbox without even using the command line
Data management framework
- Data management cycle - designed to help an LTER site do all of its data management tasks
- Data and metadata can be imported into the framework and a very mature set of predefined import filters exist: csv, space- and tab-delimited and generic parsers. Also, specialized parsers are available for Sea-Bird CTD, sondes, Campbell, Hobo, Schlumberger, OSIL, etc.
- Live connections i.e. Data Turbine, ClimDB, SQL DB's, access to the MATLAB data toolbox
- Can import data from NWIS, NOAA, NCDC, etc.
- Can set evaluation rules, conditions, evaluations, etc.
- Automated QC on import but can do interactive analysis and revision
- All steps are automatically documented, so you can generate an anomalies report by variable and date range which lets you communicate more to the users of the data
- Fox Peterson (Andrews LTER) reported on QA/QC methods they are applying to historic climate records (~13 million data points for each of 6 sites).
The challenge was that most automated approaches still flagged too many data points that needed to be manually checked. Multiple statistical methods were tested against the long-term historical record. The method they selected uses a moving window of data from the same hour of day over 30 days and tests against four standard deviations within that window; e.g., use all 1 pm data for days 30-60 of the year, compute the mean plus or minus four standard deviations, and set the accepted range for the midpoint day (45) at 1 pm to that range.
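A rough Python sketch of that windowed test (the synthetic series and function names are illustrative, not the Andrews code):

```python
import statistics

def window_range(series, day, window=30, n_sd=4):
    """For one hour-of-day series (one value per day), compute the accepted
    range for `day` as mean +/- n_sd standard deviations over a centered
    ~30-day window."""
    half = window // 2
    win = series[max(0, day - half): day + half]
    mu = statistics.mean(win)
    sd = statistics.stdev(win)
    return mu - n_sd * sd, mu + n_sd * sd

def flag_point(value, series, day):
    lo, hi = window_range(series, day)
    return "ok" if lo <= value <= hi else "suspect"

one_pm = [10.0, 10.5] * 20               # synthetic 1 pm series, one value/day
print(flag_point(15.0, one_pm, day=20))  # suspect
print(flag_point(10.2, one_pm, day=20))  # ok
```

In practice each of the 24 hours of day gets its own series and window, which is what keeps the test sensitive to diurnal structure.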
- Josh Cole reported on his system, which is in development and he will be able to share scripts with the group.
- Brief discussion of displaying results using web tools.
- Great Basin site discussed the variability in their data, which "has no normal"-- how could we perform qa/qc based on statistics and ranges in this case?
- Discussion of bringing Wade Sheldon to call next time / usefulness of the toolbox for data managers
- Discussion of using Pandas package- does anyone have experience, can we get them on?
- Discussion of the trade off between large data stores, computational strength, and power. Good solutions?
- ESIP email had some student opportunities which may be of interest
- Overall, it was considered helpful if people were willing to share scripts. Discussion of a Git repository for the group, or possibly just using the Wiki.
Suggestions for future discussion topics
- Citizen Science contributions to environmental monitoring
- 'open' sensors - non-commercial sensors made in-house, technology, use, best practices
- Latest sensor technologies
- Efficient data processing approaches
- Online data visualizations
- New collaborations to develop new algorithms for better data processing
- Sensor system management tools (communicating field events and associating them with data)