Group B2 - Data Processing and Integration

From Federation of Earth Science Information Partners

System | Data Processing | Data Consolidation/Integration | End-to-End Integration | Decision Support
3D-AQS | Not Given | Not Given | Not Given | Not Given
AIRNow | Processing is done at the AIRNow Data Management Center (DMC), using Oracle and other custom code. QA/QC and data integration are the primary processing steps at the DMC. | AIRNow consolidates real-time data and forecasts from agencies across the country. The AIRNowTech systems also perform some integration by pulling in meteorological data, fire and smoke information, trajectory modeling, etc. | Data processing, QA/QC, integration, and visualization are all performed within the AIRNow system of systems. | Public visualization is via animated GIF maps, offered on the AIRNow.Gov website. Dynamic visualization may be accomplished via the Navigator tool on AIRNowTech. In addition, AQI current conditions and forecasts are available as KML and can be used for Google Earth mashups.
CASTNET | Data is managed in an Oracle 10g database. | Not Given | Not Given | Not Given
CMAQ | Not Given | Not Given | Not Given | Not Given
DataFed | The processing of raw data is performed by reusable web-service components, which include filtering, aggregation, and data fusion services. Data processing applications are created by chaining services using workflow software. | Data consolidation from heterogeneous to homogeneous structure is performed on the fly for most datasets. Many historical datasets are cached at DataFed for fast data access and browsing.
EMF | Uses the Sparse Matrix Operator Kernel Emissions (SMOKE) modeling system to process emission inventories to the resolutions needed by the air quality models and other data summaries. Uses the Spatial Surrogate Tool (based on the MIMS Spatial Allocator) to create spatial surrogates that are an input to SMOKE. Uses the Speciation Tool to create chemical speciation factors for VOC, needed as an input to SMOKE. | Data sharing with the Emission Inventory System (EIS) is planned, to help reduce or prevent duplicate data. | Not Given | Not Given
EPA AIRQuest Data Warehouse | Not Given | Not Given | Not Given | Not Given
EPA AQS | Quality assurance and temporal aggregation. | Data from multiple networks is included (criteria, toxics, visibility, etc.). | Not Given | Not Given
ESIP | Not Given | Metadata aggregator. Provides domain-specific context for data: metadata and other contextual information about Earth science data are made available to help users understand how data are created and used. | Promotes the integration and flow of Earth science data from collection and analysis to end use. | Not Given
GIOVANNI | S4PM, Giovanni workflow | CALIPSO, CloudSat, MODIS | Complete end-to-end chains, starting from the original data source via various protocols, are integrated together into Giovanni instances. | Not Given
GeoWeb | Not Given | Not Given | Not Given | Not Given
HEI | See dataset information. | See dataset information. | From the selection of relevant/interesting sites to data download (one query form for several types of relevant information). | Queries on metadata allow users to become familiar with the dataset and the availability of data.
NARSTO | Data in the Data Exchange Standard format are checked for various aspects of formatting and some content QA, and issues are resolved prior to archiving. Consistent data and metadata reporting requirements. | This is being developed. | Not Given | Not Given
NASA Atmospheric Science Data Center | Full high-level processing for CERES, MISR and CALIPSO data sets.
  • Data format primarily in HDF-EOS with integrated metadata.
  • Metadata managed in the open-source PostgreSQL/PostGIS database.
Full science data life cycle from processing to archival and distribution
  • Air quality decision support
  • Renewable energy/energy management decision support
NEISGEI | Web services for data aggregation and analysis. | Not Given | Not Given | Not Given
RSIG | Yes. RSIG does subsetting, reformatting, conversion, and augmenting of data to facilitate visualization and subsequent analysis with other datasets. Examples include: aggregating across multiple files covering the user-specified date/time range, subsetting to a lon-lat box, striding over dense data points for fast visualization rasterization, decoding integer data to reals, lossless compression/decompression to speed up transfer over the network, units conversion to SI and model-common units, data reordering to match model-common conventions such as ground-up, augmentation with longitude, latitude and elevation for each data value, and projecting and aggregating into model grid cells. | Yes. RSIG visualizes data together over a map. RSIG can regrid all surface data points onto the CMAQ grid (layer 1). RSIG does not regrid data (e.g., CALIPSO LIDAR) onto the CMAQ vertical grid layers due to unresolved difficulties with the vertical grid scheme used by CMAQ. | Not presently. There are plans to launch HB/MCMC and visualize its results. | Not presently. There are plans to enable RSIG to launch other applications such as HB/MCMC and visualize their results. Also, external applications such as AirQuest can utilize RSIG's WCS servers.
Unidata IDD Data System | Many Unidata sites have elaborate processing systems, but the central Unidata dissemination facilities deliver the data in its original form. | Data from many different sources are available via the Unidata datastreams in real time and can be analyzed in an integrated fashion via tools such as the Unidata Integrated Data Viewer. | Unidata has been involved in large integration projects such as GEOSS and the NSF ITR project called LEAD (Linked Environments for Atmospheric Discovery) that incorporate the Unidata suite of data systems and tools into end-to-end systems. | Unidata also supports middleware such as netCDF and the THREDDS Data Server that enable users to access data from remote servers via standard protocols for analysis and display using tools other than those provided by Unidata. Many of these are tools, such as ArcGIS and IDL, that are used in the GIS and decision support realm.
VIEWS | The VIEWS data import system accepts data in a variety of formats from various providers and can also automatically download and extract data from known online sources. The import system extracts the data and metadata from its source, examines it for basic integrity issues like duplicate records and mismatched data types, performs various data conversions, transforms the data into an integrated schema (described above), calculates checksums for numeric fields, and validates the resultant data against its source before transferring it to the production data warehouse. Many other processing operations (such as filtering, aggregation, averaging, formatting, renaming, recoding, and outputting) can be done on the fly by using the various data access and analysis tools on the website. | VIEWS employs an advanced data acquisition and import system to integrate data from several air quality data centers into a single, highly optimized data warehouse. Ground-based measurements from dozens of monitoring networks, air quality modeling results, and detailed emissions inventories are imported and updated on a regular basis using a generalized, uniform data model and carefully standardized metadata. Names, codes, units, and quality flags from the source datasets are carefully mapped to a unified standard, and native formats and organizations are transformed into a common, normalized database schema. This design enables users to explore, merge, and analyze datasets of widely varying origin in a consistent, unified manner with a common set of tools and web services. This degree of interoperability allows decision-makers to analyze diverse datasets side by side and focus on high-level planning strategies without having to contend with the details of data management and manipulation. | The acquisition, import, verification, transformation, integration, presentation, analysis, and dissemination of data are all handled by the various subsystems of VIEWS. In addition, the regular solicitation and incorporation of user feedback ensures that this end-to-end integration is further supplemented by the input of the decision-makers who comprise the VIEWS user base. | VIEWS/TSS users are typically asking “What pollutants are impacting a given area?” and “Where are these pollutants coming from?” The answers depend upon accurate assessments of aerosol loading and source attribution, and Federal Land Managers (FLMs) and states are occupied on an ongoing basis with these goals. States are further mandated to answer the question “What can be done to reduce these impacts?”, because the Regional Haze Rule requires states and tribes to develop implementation plans for reducing emissions and demonstrating reasonable progress towards doing so. These plans must provide for an improvement during the 20% worst visibility days while also ensuring no degradation during the 20% best visibility days. To accomplish this, users must identify the pollutants, quantify their amounts, and determine the sources of anthropogenic emissions that contribute to this pollution on both the “best” and the “worst” visibility days in a given area. They must then determine available control measures for each source and evaluate these measures on the basis of costs, time, energy and environmental impacts, and the remaining life of the source. Planners then employ these analyses to make decisions about what controls to implement, to estimate projected improvements, and to track their progress in reaching these goals. The resulting decisions have obvious ecological impacts, but can also have important political and economic impacts: deciding which sources to control is a politically significant issue, and the process of controlling emissions and tracking progress costs money and takes time. VIEWS and the TSS have been designed with these decision needs in mind from the outset, and input and feedback from primary decision-makers is solicited on a routine basis through workshops and training sessions. Such input is carefully factored back into the design process to improve existing tools and design new ones in an ongoing process of progressive evolution.
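
The import steps listed for VIEWS under Data Processing (checking for duplicate records and mismatched data types, transforming into a common schema, computing checksums on numeric fields, and validating the result against its source) can be illustrated with a small sketch. This is not VIEWS code: the record layout, the field names, the parameter-code mapping, and every function name below are hypothetical, and the "checksum" is simply a count and a sum of the numeric values.

```python
import math

# Hypothetical source layout and code mapping, for illustration only.
SCHEMA = {"site": str, "date": str, "param": str, "value": float}
KEY_FIELDS = ("site", "date", "param")
PARAM_NAMES = {"SO4f": "sulfate_fine", "NO3f": "nitrate_fine"}


def check_integrity(records):
    """Reject records with mismatched types or duplicate keys."""
    seen = set()
    clean = []
    problems = []
    for index, rec in enumerate(records):
        bad = [name for name, kind in SCHEMA.items()
               if not isinstance(rec.get(name), kind)]
        if bad:
            problems.append(f"record {index}: mismatched type in {bad}")
            continue
        key = tuple(rec[name] for name in KEY_FIELDS)
        if key in seen:
            problems.append(f"record {index}: duplicate key {key}")
            continue
        seen.add(key)
        clean.append(rec)
    return clean, problems


def transform(rec):
    """Map source field names and parameter codes onto one common schema."""
    return {
        "site_code": rec["site"],
        "obs_date": rec["date"],
        "parameter": PARAM_NAMES.get(rec["param"], rec["param"]),
        "value": rec["value"],
    }


def numeric_checksum(values):
    """A simple checksum for a numeric field: (row count, exact float sum)."""
    values = list(values)
    return len(values), math.fsum(values)


def import_records(source):
    """Check, transform, and validate staged rows against their source."""
    clean, problems = check_integrity(source)
    staged = [transform(rec) for rec in clean]
    before = numeric_checksum(rec["value"] for rec in clean)
    after = numeric_checksum(rec["value"] for rec in staged)
    if before != after:
        raise ValueError("staged data does not match its source")
    return staged, problems
```

Calling `import_records` on a list of source dictionaries returns the rows that passed the checks, already renamed to the common schema, together with a list of human-readable problems for the rows that were rejected. A real import system would also convert units and quality flags; the sketch leaves values untouched so that the checksum comparison stays trivial.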