Difference between revisions of "Sensor Data Management Middleware"

From Earth Science Information Partners (ESIP)

Revision as of 19:26, March 24, 2014


Fig. 1 The position of middleware in a generic sensor data management system.

Overview

Middleware are software packages and procedures that reside virtually between data collectors, such as automated sensors, and data ‘consumers’, such as data repositories, websites, or other software applications. Middleware can be used to perform tasks such as streaming data from data loggers to servers, archiving data, analyzing data, or generating visualizations.

Many middleware packages are available for developing a comprehensive, reliable, and cost-effective environmental information management system. Each middleware option can have a unique set of requirements or capabilities, and costs can vary widely. A single middleware package may be used if it meets all of the user requirements, or multiple packages may be bundled into a data management system if they are compatible or interoperable with each other and with the rest of the data collection and management system.

This section describes multiple middleware packages that are currently available, and provides examples of how different software and procedures are being used to collect, analyze, visualize, and disseminate sensor-supplied environmental data.


Introduction

There are multiple factors that may affect the choice, use, and performance of middleware. These factors may be classified according to a group’s research agenda, technological requirements, and personnel skill sets.

Research Agenda: The research agenda of a group is a major determinant of the type of middleware system needed. A group pursuing only one or a few narrowly focused research questions may need fewer types of sensors, and consequently fewer software modules may be adequate to streamline data processing from collection to the end goal. A team that investigates multiple questions spanning multiple research domains is likely to use more diverse and/or larger sets of sensors. There may not be a single middleware package that can meet all of the needs of a research group. In this case, multiple packages will need to be linked into a workflow.
Technological Requirements: The technological requirements of a research program may vary from simple to complex. If the research can be done with sensors from a single, well-managed company, the proprietary software packaged with the purchased sensor network may be adequate for at least a major portion of the information management system. For example, Campbell Scientific's “LoggerNet” software for its dataloggers integrates communication, data download, display, and graphics functions. However, some dataloggers and sensors (particularly innovative or custom-built ones) may need custom-written software. It is important to plan time and budgets for required software upgrades, licensing, additional packages, support, and maintenance. Systems that cost less at the outset may not be cheaper over the long run. It is also important to consider how best to meet infrastructure and bandwidth requirements when deploying middleware on a variety of servers or laptop computers in the field or lab setting. Depending on the data and hardware infrastructure characteristics, each middleware option can introduce benefits or drawbacks to the overall system functionality.
Personnel Skills: Another key factor to consider is the skill set of the personnel. A complex data management system may require multiple people, each with a unique skill set such as database design, system architecture, or web programming. It is important to correctly identify each person’s skill set and role in data management tasks. It may also be necessary to plan for additional hires or job training to address various scenarios and solutions, to identify appropriate salaries, and to budget enough time for software development and system administration. More details about personnel roles and skills can be found in the “Roles and required skill sets” section.


Middleware Functionality

Middleware can be classified with respect to the functionality they provide, such as:

Controlling instrumentation and data collection: Modules may be used to control sampling intervals, manage event-triggered (burst) or continuous sampling regimes, and communicate and transfer data between the instrumentation and other system components.
Data monitoring, processing, and analysis: Modules may provide alarm management, perform automated QA/QC on data streams, or run derivative calculations, including averages, aggregation and accumulation, data shifting and transformation, and filtering of time series records by date, value range, location, station/variable type, or other criteria.
Export and publishing of data: Modules may export sensor data to different formats (e.g., ASCII, binary, or XML) or to different archives, make data discoverable through geospatial catalogues, or publish the data through web services.
Data visualization: Modules may provide visualization (e.g., tables, graphs, sonograms) of geospatial and/or time series data from sensor arrays or workflow structures.
Documentation: Modules may be used to document field events through paperless collection of field data, integrate sensor data and documentation (see sensor tracking & documentation section), or handle sensor calibration records.
Other supported functionality: Modules may be used to provide access to external data (e.g., ODBC, JDBC, OLE DB), to connect or chain other middleware components, or to implement mobile applications.
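
As an illustration of the "data monitoring, processing, and analysis" functions listed above, the following minimal Python sketch flags out-of-range values and then computes hourly averages from the values that pass. It is not taken from any of the packages discussed in this guide; the function names, the flag labels, and the temperature limits are assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical plausibility limits for air temperature in degrees Celsius.
VALID_RANGE = (-50.0, 50.0)


def flag_range(records, valid_range=VALID_RANGE):
    """Attach a QA/QC flag to each (timestamp, value) record.

    'ok' means the value lies inside the valid range, 'suspect' means it
    lies outside, and 'missing' marks a value of None.
    """
    low, high = valid_range
    flagged = []
    for timestamp, value in records:
        if value is None:
            flag = "missing"
        elif low <= value <= high:
            flag = "ok"
        else:
            flag = "suspect"
        flagged.append((timestamp, value, flag))
    return flagged


def hourly_means(flagged):
    """Average the values flagged 'ok' within each clock hour."""
    buckets = defaultdict(list)
    for timestamp, value, flag in flagged:
        if flag == "ok":
            hour = timestamp.replace(minute=0, second=0, microsecond=0)
            buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Made-up readings, used only to exercise the two functions.
readings = [
    (datetime(2014, 3, 24, 10, 0), 2.0),
    (datetime(2014, 3, 24, 10, 30), 4.0),
    (datetime(2014, 3, 24, 11, 0), 999.0),  # out-of-range spike
    (datetime(2014, 3, 24, 11, 30), None),  # dropped sample
]
flagged = flag_range(readings)
print(hourly_means(flagged))
```

Keeping the flags alongside the raw values, rather than deleting suspect records, leaves the original data stream intact for later review.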


Middleware can also be classified by software proprietary rights and whether they are considered applications or platforms. Accordingly, we can identify different groups of middleware:

  • Proprietary data management applications and platforms
  • Proprietary research applications
  • Limited open source applications (free packages that can be used with proprietary solutions)
  • Open source data management applications and platforms
  • Open source research applications and programming languages


Some of the applications and platforms in these groups are the software of choice for many organizations. More details about each of these components are provided in the next section of this document.


Best Practices

Choosing the middleware components that will best fit the tasks and work environment can be challenging. In addition to the personnel roles and skills, budget, and infrastructure considerations already discussed in the Introduction and other chapters of this best practices guide, it is important to be aware of the whole sensor management process in order to identify suitable middleware components. In some cases, proprietary middleware will be required as part of the information management system if the instrumentation only outputs data in a proprietary format. In other cases, multiple open source software packages may be suitable for chaining into a comprehensive system that manages data from collection to final archiving and sharing.

Some steps in selecting middleware are:

Fig 2. Sensor management workflow. Simple sensor management configuration is presented in blue; optional system components are shown in grey.
  1. Identify your objectives. What do you want the middleware to do?
  2. Assemble a list of candidate software.
  3. Rate the candidates based on capabilities, cost (keeping in mind that a simple-to-use but expensive package may cut costs in the long term), stability, and ease of use with respect to the personnel skills available on your team.
  4. If no single software product can meet all the objectives, test how well the different candidate packages integrate with one another to perform the needed functions.
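
The rating in step 3 can be made explicit with a simple weighted score. The Python sketch below is one possible way to do this, not a prescribed method: the weights, the 1 to 5 rating scale, and the candidate names "Package A" and "Package B" are all hypothetical, and a higher "cost" rating is assumed to mean a more affordable package.

```python
# Hypothetical weights for the criteria named in step 3; they sum to 1.0.
WEIGHTS = {
    "capabilities": 0.4,
    "cost": 0.2,
    "stability": 0.2,
    "ease_of_use": 0.2,
}


def weighted_score(ratings, weights=WEIGHTS):
    """Combine per-criterion ratings (1 = poor, 5 = excellent) into one score."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[criterion] * ratings[criterion] for criterion in weights)


def rank_candidates(candidates, weights=WEIGHTS):
    """Return (name, score) pairs sorted from best to worst score."""
    scored = [
        (name, weighted_score(ratings, weights))
        for name, ratings in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up ratings for two placeholder packages.
candidates = {
    "Package A": {"capabilities": 5, "cost": 2, "stability": 4, "ease_of_use": 3},
    "Package B": {"capabilities": 3, "cost": 5, "stability": 3, "ease_of_use": 4},
}

ranking = rank_candidates(candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The weights should reflect the objectives identified in step 1; a team with little programming expertise might, for instance, give ease of use a larger share.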


During this planning stage, consider the following recommendations:

  • Identify workflow components and describe their functional requirements from the instrumentation to the archive level of organization (see Figure x). Some components can be optional or part of the more complex solutions.
  • Plan for robust execution and choose software and hardware components that can handle the loss of connectivity, power, or other failures related to harsh environmental or operational conditions.
  • Choose reusable/sharable components.
  • Keep field deployment of middleware as simple as possible (keep it out of the field if possible).
  • Use as few middleware components as possible based on research group requirements.
  • Document and diagram the entire workflow and update as needed.
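
One common way to plan for loss of connectivity, as recommended above, is a store-and-forward buffer: records are queued locally and are only removed once they have been transmitted successfully. The Python sketch below illustrates the idea under stated assumptions. The class names, the `send` callback, and the use of `ConnectionError` to signal a failed transfer are hypothetical and do not describe the behaviour of any specific middleware package.

```python
from collections import deque


class StoreAndForwardBuffer:
    """Queue records locally and forward them whenever the link is available."""

    def __init__(self, send, max_records=10000):
        # 'send' is any callable that raises ConnectionError on failure.
        self._send = send
        # When the buffer is full, the oldest record is discarded first.
        self._pending = deque(maxlen=max_records)

    def submit(self, record):
        """Queue a new record, then try to forward everything that is pending."""
        self._pending.append(record)
        return self.flush()

    def flush(self):
        """Send pending records in order; stop at the first failure."""
        sent = 0
        while self._pending:
            record = self._pending[0]
            try:
                self._send(record)
            except ConnectionError:
                break  # keep the record and retry on the next call
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self):
        return len(self._pending)


class FlakyLink:
    """Stand-in for a telemetry link that can be switched on and off."""

    def __init__(self):
        self.online = False
        self.received = []

    def send(self, record):
        if not self.online:
            raise ConnectionError("link down")
        self.received.append(record)


link = FlakyLink()
buffer = StoreAndForwardBuffer(link.send)
buffer.submit({"t": 1, "value": 2.5})  # link down: record stays queued
buffer.submit({"t": 2, "value": 2.7})  # still down
link.online = True
buffer.submit({"t": 3, "value": 2.6})  # link restored: all records forwarded
print(len(link.received), buffer.pending)
```

In a real deployment the queue would normally be kept on persistent storage so that it also survives a power failure; the in-memory deque here only keeps the example short.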


Case Studies
We present several real-world case studies that vary widely in the types of ecosystems in which sensors are deployed and in the complexity of the information management system. Some case studies include proprietary software only, some include free or open-source software, and some include both.

Marmot Creek Research Site, Rocky Mountains, Canada
Fig. 3. Marmot Creek research site's PakBus network with mixed data loggers and Raven to RF401 base
Introduction
The Marmot Creek research site is located on the eastern slopes of the Rocky Mountains in Alberta, Canada. The site is dominated by needleleaf vegetation and poorly developed mountain soils. Precipitation, snow depth, soil moisture, soil temperature, short and longwave radiation, air temperature, humidity, wind speed, and turbulent fluxes of heat and water vapour are measured, and the resulting data sets are used for the hydrological modelling of the Marmot Creek Basin. Time series records are obtained at the Hay Meadow, Upper Clearing, Vista View, Fisera Ridge, and Centennial Ridge hydro-meteorological stations, which are equipped with different sensor configurations and Campbell Scientific data loggers.
Communication equipment and methods
The telemetry network consists of one Raven CDMA cellular modem and one RF401 spread spectrum radio modem located at the Upper Clearing base station, four additional RF401 modems, one at each of the meteorological stations serviced by telemetry, and a desktop computer located at the University of Saskatchewan. The radios connected to the data loggers at each of the meteorological stations communicate with the base station on an ongoing basis. All of the data loggers and RF401 radios have PakBus addresses and operate as PakBus nodes. In addition, the data loggers are set to operate as routers, which enables routing inside the network through various paths. The telemetry network configuration is presented in Figure 3.



Table 1. Basic features

Program | Licensing | Cost | Input data format | Export data format | Needed programming expertise
Antelope Orb | Proprietary | Pay | ASCII, Binary | ASCII, Binary | Advanced
Aquarius | Proprietary | Pay | | | Advanced
ArcGIS | Proprietary | Pay | ASCII, shapefiles | ASCII, shapefiles | Advanced
B3 | Open source | Free | ASCII | ASCII | None to Basic
BigSense and LtSense | Open source | Free | Binary | CSV, JSON, TXT, XML | Advanced
Cosm | | | | |
CUAHSI HIS | Open source | Free | ASCII | XML, WaterML | Intermediate
DataTurbine | Open source | Free | ASCII, Binary | ASCII, Binary | Advanced
EddyPro | Proprietary | Pay | Binary | ASCII, Binary | Intermediate
GCE Toolbox | Matlab is proprietary, Toolbox is open source | Matlab is pay, Toolbox is free | ASCII, Binary(?), database | ASCII, Binary(?), MAT, database | Intermediate to Advanced
Hobolink (Onset) | Proprietary | Free | Proprietary | ASCII, Proprietary | None
Hoboware (Onset) | Proprietary | Pay | Proprietary | ASCII, Proprietary | None
Kepler | Open source | Free | ASCII, Binary | ASCII, Binary | Basic to Advanced
Lake Analyzer | Proprietary/Open source | Free | ASCII | ASCII | Basic
LoggerNet (Campbell) | Proprietary | Pay | Proprietary | ASCII, database | Intermediate
Nexsen's Technology | Proprietary | Pay | | |
Pegasus | | | | |
R | Open source | Free | ASCII, Binary, database | ASCII, Binary, database | Intermediate to Advanced
SAS | Proprietary | Pay | ASCII, Binary, database | ASCII, Binary, database | Intermediate to Advanced
Taverna | Open source | Free | | | Intermediate to Advanced
Vista Data Vision | Proprietary | Pay | ASCII | ASCII | Basic?
VizTrails | Open source | Free | ASCII | ASCII | Basic to Advanced
WaterML support | | | | |
WISKI | Proprietary | Pay | ASCII | ASCII | Advanced
YSI EcoNet | Proprietary | Pay | | |


Program | Hardware communication | QA/QC capacity | Capacity to stream to archive | Data transformation and analysis | Data visualization | Custom SQL queries/Scripting
Antelope Orb | Yes, customizable | Yes, customizable | Yes, customizable | Yes, customizable | Yes, customizable | Yes
Aquarius | Yes, not customizable | Example | Example | Example | |
ArcGIS | Example | Example | Example | Example | Example |
B3 | Example | Example | Example | Example | Example |
BigSense and LtSense | Example | Example | Example | Example | Example |
Cosm | Example | Example | Example | Example | Example |
CUAHSI HIS | Example | Example | Example | Example | Example |
DataTurbine | Example | Example | Example | Example | Example |
EddyPro | Example | Example | Example | Example | Example |
GCE Matlab | Example | Example | Example | Example | Example |
Onset Hobolink | Example | Example | Example | Example | Example |
Onset Hoboware | Example | Example | Example | Example | Example |
Kepler | Example | Example | Example | Example | Example |
Lake Analyzer | Example | Example | Example | Example | Example |
Campbell LoggerNet | Example | Example | Example | Example | Example |
Nexsen's Technology | Example | Example | Example | Example | Example |
Pegasus | Example | Example | Example | Example | Example |
R | Example | Example | Example | Example | Example |
SAS | Example | Example | Example | Example | Example |
Taverna | Example | Example | Example | Example | Example |
Vista Data Vision | Example | Example | Example | Example | Example |
VizTrails | Example | Example | Example | Example | Example |
WaterML support | Example | Example | Example | Example | Example |
WISKI | Example | Example | Example | Example | Example |
YSI EcoNet | Example | Example | Example | Example | Example |



Program | Task automation | Multi-tier architecture | Website publishing | Streaming through web service | Support for modeling
Antelope Orb | Example | Example | Example | Example | Example
Aquarius | Example | Example | Example | Example | Example
ArcGIS | Example | Example | Example | Example | Example
B3 | Example | Example | Example | Example | Example
BigSense and LtSense | Example | Example | Example | Example | Example
Cosm | Example | Example | Example | Example | Example
CUAHSI HIS | Example | Example | Example | Example | Example
DataTurbine | Example | Example | Example | Example | Example
EddyPro | Example | Example | Example | Example | Example
GCE Matlab | Example | Example | Example | Example | Example
Onset Hobolink | Example | Example | Example | Example | Example
Onset Hoboware | Example | Example | Example | Example | Example
Kepler | Example | Example | Example | Example | Example
Lake Analyzer | Example | Example | Example | Example | Example
Campbell LoggerNet | Example | Example | Example | Example | Example
Nexsen's Technology | Example | Example | Example | Example | Example
Pegasus | Example | Example | Example | Example | Example
R | Example | Example | Example | Example | Example
SAS | Example | Example | Example | Example | Example
Taverna | Example | Example | Example | Example | Example
Vista Data Vision | Example | Example | Example | Example | Example
VizTrails | Example | Example | Example | Example | Example
WaterML support | Example | Example | Example | Example | Example
WISKI | Example | Example | Example | Example | Example
YSI EcoNet | Example | Example | Example | Example | Example