DataFed

From Federation of Earth Science Information Partners


General

Contact

Data System Name: DataFed
Data System URL: http://datafedwiki.wustl.edu
Contact Person: Rudy Husar
Contact e-mail: rhusar@me.wustl.edu

Background

About the Data System (Purposes, Audience):

DataFed is Web services-based software that non-intrusively mediates between autonomous, distributed data providers and users. DataFed is designed in accordance with the GEOSS architecture: it provides standard interfaces to heterogeneous distributed data, fosters data integration and use through processing web services and tools, and collects metadata and user feedback on datasets. DataFed also provides standards-based data feeds to the NASA Giovanni System.

Presentation:

Not Given

History:

The federated data system, DataFed, has been in development since 2001 at CAPITA, Washington University, with grants from NSF, NASA, EPA and the RPOs. Since 2004, DataFed has provided both regulatory and policy support to EPA. Within CAPITA, DataFed has become a scientific data analysis tool.

Agencies:

Washington University

List of Publications, Papers, Presentations:

FASTNET; EPA Exceptional Event Project; Interoperability of Web Service-Based Data Access and Processing... (ESTO 2006); Combined Aerosol Trajectory Tool, CATT; DataFed: Mediated Web Services for Distributed AQ Data Access and Processing (IGARSS 2007); Interoperable Info System of Systems for HTAP (HTAP 2007)

Data System Scope

Data Content


Datasets Served:

See Dataset Catalog

Parameters:

DataFed provides access to more than 100 distributed, air-quality-relevant datasets (surface, satellite, and model), which can be explored and analyzed with tools for processing and visualization.

Spatial - Temporal Coverage:

About half of the datasets are global in scale, a third are US-scale, and the rest cover other regions. Most datasets are multi-year in extent. About ten datasets are near-real-time.

Applications/Potential


Health:

DataFed has not yet been applied to health studies. However, the datasets mediated through DataFed are suitable for health studies, particularly in conjunction with the 1km-resolution US population data.

Forecasting and Reanalysis:

A current NASA project with BAMS uses DataFed to assimilate surface observations into a forecast model. We are not aware of any formal Air Quality Reanalysis effort; hopefully, the community will

Model/Emissions Evaluation:

The EPA NEISGEI Project uses DataFed to integrate and to evaluate multiple emission databases. DataFed was used to prepare an evaluation of the CMAQ Aerosol Model with IMPROVE and FRM data. In the NASA project with BAMS, DataFed provides surface observations for assimilation into a forecast model. Add HTAP data integration...

Characterization, Trends, Accountability:

DataFed was used to perform aerosol characterization for the RPO project, FASTNET. DataFed is the main data source supporting the development of EPA's Exceptional Event Rule. It is now used in the implementation of the EE Rule.

Other:

Since 2004, a major role of DataFed has been to participate in interoperability experiments for GEOSS.

Data System IT

Primary/Official Store for Some data:

DataFed is a mediator of data flow between providers and users. It does not store primary/official data.

Data Consolidation/integration:

Data consolidation from heterogeneous to homogeneous structure is performed on the fly for most datasets. Many historical datasets are cached at DataFed for fast data access and browsing.
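On-the-fly consolidation with caching can be illustrated with a small sketch. Everything in it is hypothetical: the two provider layouts, the field names, and the common `Observation` record are invented for illustration and do not reflect DataFed's actual schema or code. Each conversion function turns one provider's native records into the shared structure at request time, and an in-memory cache stands in for the cache of historical datasets.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Observation:
    """Hypothetical common record shared by all consolidated datasets."""
    station: str
    time: str       # ISO 8601 date
    parameter: str
    value: float


# Two invented providers with different native layouts.
PROVIDER_A = [{"site": "STL01", "date": "2007-07-04", "pm25_ugm3": "18.5"}]
PROVIDER_B = [("CHI02", "20070704", "PM25", 22.0)]


def wrap_a(row):
    """Convert provider A rows: dicts with a string concentration."""
    return Observation(row["site"], row["date"], "PM25", float(row["pm25_ugm3"]))


def wrap_b(row):
    """Convert provider B rows: tuples with a compact YYYYMMDD date."""
    station, ymd, parameter, value = row
    iso_date = f"{ymd[:4]}-{ymd[4:6]}-{ymd[6:]}"
    return Observation(station, iso_date, parameter, float(value))


WRAPPERS = {"A": (PROVIDER_A, wrap_a), "B": (PROVIDER_B, wrap_b)}


@lru_cache(maxsize=None)
def fetch(dataset):
    """Homogenize one dataset on request; repeat requests hit the cache."""
    rows, wrapper = WRAPPERS[dataset]
    return tuple(wrapper(row) for row in rows)


for name in ("A", "B"):
    print(fetch(name))
```

After conversion, both providers yield records with identical fields, so downstream tools need to understand only one structure.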

Providing Data Access to users/externals:

DataFed homogenizes distributed, heterogeneous datasets through data 'wrappers'. As a result, all the data mediated by DataFed are accessible through the international standard data access services OGC WCS and WMS. At this time, all data access services are free and offered through an open interface.
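Because access goes through OGC WMS, a map request reduces to a URL with standard query parameters. The following is a minimal sketch, not DataFed's documented interface: the endpoint URL and the layer name are hypothetical placeholders, and only the GetMap parameter names come from the WMS 1.1.1 standard.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and layer name, used for illustration only.
WMS_ENDPOINT = "http://example.org/wms"
LAYER = "EXAMPLE_DATASET.pm25"


def build_getmap_url(endpoint, layer, bbox, time, width=600, height=300):
    """Build an OGC WMS 1.1.1 GetMap URL for one layer and one time step."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",
        # WMS 1.1.1 bounding box order: minx, miny, maxx, maxy
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TIME": time,
    }
    return endpoint + "?" + urlencode(params)


url = build_getmap_url(WMS_ENDPOINT, LAYER, (-130, 20, -60, 55), "2007-07-04")
print(url)
```

A WCS GetCoverage request follows the same pattern with a different parameter set, so the same helper style would apply to retrieving the underlying data rather than a rendered map.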

Data Processing:

The processing of raw data is performed by reusable web-service components, which include filtering, aggregation, and data fusion services. Data processing applications are created by chaining services using workflow software.
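Service chaining can be pictured as function composition. The sketch below is only an analogy under stated assumptions: `filter_service`, `aggregate_service`, and the sample values are hypothetical stand-ins for remote web services and their data, and the `chain` helper stands in for the workflow software.

```python
from functools import reduce
from statistics import mean

# Invented sample: (station, daily PM2.5 value); None marks a missing value.
RAW = [("STL01", 18.5), ("STL01", None), ("STL01", 21.5), ("CHI02", 22.0)]


def filter_service(records):
    """Drop records with missing values."""
    return [(s, v) for s, v in records if v is not None]


def aggregate_service(records):
    """Average the values per station."""
    by_station = {}
    for station, value in records:
        by_station.setdefault(station, []).append(value)
    return {station: mean(values) for station, values in by_station.items()}


def chain(*services):
    """Compose services left to right into a single workflow callable."""
    return lambda data: reduce(lambda acc, service: service(acc), services, data)


workflow = chain(filter_service, aggregate_service)
result = workflow(RAW)
print(result)
```

Each service consumes the previous service's output, so a new application is built by reordering or swapping components rather than by writing new processing code.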

Visualization/Analysis:

The visualization tools for parameter-spatial-temporal browsing are applicable to every dataset in the federated data system. The output data from the processing services are also available for mashups with other popular tools, e.g., Google Earth and GIS software.

Decision Support (e.g. some integration into user business process):

DataFed has served the RPOs through the FASTNET and Combined Aerosol Trajectory Tool (CATT) projects. More recently, DataFed has supported decisions for the Exceptional Event Rule for PM2.5 and ozone.

End-to-End Integration:

Data access, processing and visualization are all performed within DataFed. Specific workflow configurations are created from the loosely coupled web services for different scientific analysis or decision-support applications. An example custom workflow is the Combined Aerosol Trajectory Tool (CATT).

Other DS Values:

Not Given

Data Access and/or Output Interoperability:

Both the raw input data and the processed outputs are accessible through international standard interfaces. This allows the creation of loosely coupled network applications (Paper-PDF).

Reusable Tools and Methods:

The data access, processing and visualization services in DataFed are all composed of reusable Web services, exposed through both SOAP and REST protocols (Paper-PDF).

Security Barriers and Solutions:

The data access and processing services are accessible through the SOAP-WSDL protocol, which is designed to pass through firewalls. At this time there are no access restrictions to these services.

User Feedback Approach:

For each dataset registered in DataFed there is a "DataSpace" wiki page for collecting dataset-relevant information, including user feedback (e.g. AirNOW).

Other Architecture:

The DataFed architecture has been used as a model for demonstrating the "System of Systems" aspect of GEOSS.

User Provided Content

