Questions and Comments about CF-1.6 Station Data Convention

Back to WCS Wrapper

The objective of this page is to promote discussion about CF-netCDF formats for station point data.

CSV, Comma Separated Values: Its Uses and Limitations

CSV format is easy, compact and practical for many uses. It can be fed to any spreadsheet and consumed easily with custom software.

Example of a WCS GetCoverage query returning an envelope with a URI to the result CSV file:

   loc_code,lat,lon,datetime,pm25nr, etc...
   350610008,34.81,-106.74,2007-05-12T00:00:00,3.8, etc...
   350439004,35.62,-106.72,2007-05-12T00:00:00,20.9, etc...
   350610008,34.81,-106.74,2007-05-12T01:00:00,6.9, etc...

Unfortunately, the format makes embedding metadata in a CF-like convention difficult.

  • Location dimension:
    • Incompleteness: there is no indication of what else is known about the stations beyond loc_code, lat and lon.
    • Stations with no data may appear with a NULL value or may be missing entirely; there is no indication which is the case.
    • The same latitude and longitude values are repeated on every row.
  • Time dimension:
    • What is the periodicity? The software has to guess that the sample query coverage AQS_H is actually hourly data.
    • Do all locations share the same periodicity, or does each location have its own recording times? Again, a guess.
    • What was the requested time minimum and maximum, and what is the actually returned time range?

There is a field named pm25nr, but there is more to know about it: it is PM 2.5 measured by a non-reference method, the units are ug/m3, and the source is the EPA Air Quality Network. None of that metadata can travel in the CSV itself.
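To make the point concrete, here is a minimal sketch of consuming such a CSV with Python's standard csv module. The file name is hypothetical, and the units, method description and hourly periodicity have to be hard-coded, because the format has no way to carry them.

   import csv
   from datetime import datetime

   # Everything below must be known out of band: the CSV cannot state it.
   UNITS = "ug/m3"                        # from the provider's documentation, not the file
   METHOD = "PM 2.5 Non-Reference Method"
   PERIOD_HOURS = 1                       # AQS_H is hourly data -- a guess the format cannot confirm

   records = []
   with open("aqs_h_pm25nr.csv", newline="") as f:   # hypothetical file name
       for row in csv.DictReader(f):
           records.append({
               "loc_code": row["loc_code"],
               "lat": float(row["lat"]),
               "lon": float(row["lon"]),
               "datetime": datetime.fromisoformat(row["datetime"]),
               "pm25nr": float(row["pm25nr"]),       # fails if missing values are blank or NULL
           })

   print(len(records), "records from", len({r["loc_code"] for r in records}), "stations")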

CF-netCDF to the Rescue

At datafed, we supported station data before CF-1.5 and WCS 1.1.0 were official. This was implemented against the older WCS 1.0.0 standard and has not yet been incorporated into the WCS 1.1 service.

Sample Query to AQS_H returning CF-netCDF station time series

Same as above, but returning only the CDL header file without data: http://webapps.datafed.net/cov_345061.ogc?SERVICE=WCS&REQUEST=GetCoverage&VERSION=1.0.0&CRS=EPSG:4326&COVERAGE=pm25nr&TIME=2007-05-11T20:00:00/2007-05-12T14:00:00&BBOX=-108,34,-104,36,0,0&WIDTH=-1&HEIGHT=-1&DEPTH=-1&FORMAT=sts-header-dump
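For anyone who wants to try the header-only query above, here is a small Python sketch that fetches it and prints the response. It assumes the datafed service is still reachable at that URL and that the sts-header-dump format returns the CDL header as plain text.

   import urllib.request

   # Header-only GetCoverage query from the text above.
   url = (
       "http://webapps.datafed.net/cov_345061.ogc?SERVICE=WCS&REQUEST=GetCoverage"
       "&VERSION=1.0.0&CRS=EPSG:4326&COVERAGE=pm25nr"
       "&TIME=2007-05-11T20:00:00/2007-05-12T14:00:00"
       "&BBOX=-108,34,-104,36,0,0&WIDTH=-1&HEIGHT=-1&DEPTH=-1&FORMAT=sts-header-dump"
   )
   with urllib.request.urlopen(url) as response:
       # Assumed to be a text CDL dump; decode defensively.
       print(response.read().decode("utf-8", errors="replace"))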

The above implementation still lacks what we really need, so let's use it just as a proof of concept: it is possible to pack station data into CF-netCDF, and it is more expressive than CSV. It was also created from an obsolete CF-1.5 draft, so it is incompatible with the final CF-1.5 document.

CF 1.5 and Station Data

The section 5.4. Timeseries of Station Data (http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.5/cf-conventions.html#id2867470) shows a more recently accepted coding of station data.

That section shows almost exactly what datafed is producing. Simplifying the example by removing the pressure dimension:

   dimensions:
       station = 10 ;  // measurement locations
       time = UNLIMITED ;
   variables:
       float humidity(time,station) ;
           humidity:long_name = "specific humidity" ;
           humidity:coordinates = "lat lon" ;
       double time(time) ;
           time:long_name = "time of measurement" ;
           time:units = "days since 1970-01-01 00:00:00" ;
       float lon(station) ;
           lon:long_name = "station longitude";
           lon:units = "degrees_east";
       float lat(station) ;
           lat:long_name = "station latitude" ;
           lat:units = "degrees_north" ;

Properties of the data:

  • Locations:
    • Locations are indexed along the station dimension.
    • Each location has a lat and a lon value.
  • Time:
    • Measurement times are enumerated in the time variable and are shared by all locations.
  • Data:
    • Only the data values are stored, which is a very efficient transport mode for densely packed data.
    • Times and locations are not repeated per data value, since they can be looked up from the indices of the data point (see the sketch after this list).
    • Missing data could be indicated by adding a missing_value attribute.
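To make the last two bullets concrete, here is a small sketch that reads the file written in the previous example (the station_humidity.nc name is again just an assumption) and reconstructs CSV-like rows purely from the array indices. netCDF4-python returns a masked array when a missing_value or _FillValue attribute applies, so missing points can simply be skipped.

   import numpy as np
   from netCDF4 import Dataset, num2date

   nc = Dataset("station_humidity.nc")         # hypothetical file from the previous sketch
   humidity = nc.variables["humidity"][:, :]   # masked where missing_value / _FillValue applies
   times = num2date(nc.variables["time"][:], nc.variables["time"].units)
   lats = nc.variables["lat"][:]
   lons = nc.variables["lon"][:]

   # Times and locations are recovered from the (time, station) indices,
   # so nothing per-value is stored besides the data itself.
   for t_idx, s_idx in np.ndindex(humidity.shape):
       value = humidity[t_idx, s_idx]
       if np.ma.is_masked(value):
           continue                            # skip missing data
       print(str(times[t_idx]), lats[s_idx], lons[s_idx], float(value))

   nc.close()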