Difference between revisions of "CF Coordinate Conventions"

Revision as of 14:36, September 8, 2010

CF_Coordinate_Conventions Description: [[TermDesc::The purpose of the CF Conventions is to create self-describing netCDF files: each variable in the file has an associated description of what it represents, and each value can be located in space and time. An important benefit of a convention is that it enables software tools to display data and perform operations on specified subsets of the data with no user intervention. It is equally important that the coordinate data is easy for human users to write and to understand. These conventions enable programs like Datafed Browser to work without configuration.]]

Glossary Domain: WCS, HTAP

Contributors

Brian Eaton, Jonathan Gregory, Bob Drach, Karl Taylor, Steve Hankin, John Caron, Rich Signell, Phil Bentley, Greg Rappa.

History

[[History::The convention was designed by Brian Eaton, Jonathan Gregory, Bob Drach, Karl Taylor and Steve Hankin. Version 1.0, 28 October 2003; Version 1.1, 17 January 2008; Version 1.2, 4 May 2008; Version 1.3, 7 November 2008; Version 1.4, 27 February 2009.]]

Term Details

Introduction

This is a short overview of how the netCDF-CF convention is written.

Direct Benefits of CF Convention

The obvious benefit is the ease with which software tools can read the files and recognize the meaning of each item. The Datafed WCS service framework is such a tool: it relies heavily on CF information in Data Configuration for Cubes, which makes it possible to create a WCS 1.1 service from the files automatically.

The provider folder in the system becomes the service URL path.
Each netCDF file becomes a coverage, so a WCS service may contain several unrelated and different-looking coverages.
Each data variable in the netCDF file becomes a field in the coverage. Variables share the dimensions, so the fields of a coverage have the same latitude, longitude and time dimensions.
- A coverage is not a single piece of data; a coverage is a record of fields.
- Coverages do not have units.
- A coverage must have at least one field.
- A field is a single measure.
- Fields do have a scalar unit of measure.
- A field can be a data field or metadata field: a quality flag or measurement error range.
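The mapping described above can be pictured with a minimal sketch. The class and function names below (Field, Coverage, coverage_from_variables) are hypothetical and are not part of the Datafed framework; they only illustrate that a coverage is a record of fields, that each field carries its own unit, and that all fields share the coverage dimensions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Field:
    # A field is a single measure with a scalar unit of measure.
    name: str
    units: str


@dataclass
class Coverage:
    # A coverage has no units of its own; it is a record of fields
    # that all share the same dimensions.
    name: str
    dimensions: Tuple[str, ...]
    fields: List[Field]


def coverage_from_variables(name, dimensions, variable_units):
    """Build a coverage from a {variable name: units} mapping (hypothetical helper)."""
    if not variable_units:
        raise ValueError("a coverage must have at least one field")
    fields = [Field(var, units) for var, units in variable_units.items()]
    return Coverage(name, tuple(dimensions), fields)


weather = coverage_from_variables(
    "weather",
    ("time", "lat", "lon"),
    {"temperature": "K", "humidity": "percent"},
)
print([f.name for f in weather.fields])
```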

Some design advice:

- If fields relate to each other, they should be in the same coverage. For example, temperature, humidity, visibility and weather code may be individual fields in a weather coverage. Depending on the weather code (clear, rain, snow), the weather measurements can be filtered to get rid of misleading data.

- If the measurements are related, but each data variable has a set of quality flags, it may be better to create a separate coverage for each measurement. For example:

- - temperature coverage
    - temperature field
    - number of observations

- - wind coverage
    - speed field
    - direction field
    - gust maximum field
    - number of observations

CF Metadata in the netCDF files

The convention does not standardize any variable or dimension names. All the CF metadata is written in attributes. If variable names had to be standard names, like nitrogen_monoxide, a file could contain only one variable for each standard name. Therefore an attribute, not the variable name, associates a variable with its standard name.

Some important attributes:

standard_name is the standardized name of the variable, from CF Naming Conventions.
long_name is the human readable, non-standard name.
units is the unit of measure, written in a form that udunits recognizes.
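As a rough illustration of why the standard name lives in an attribute, the sketch below models the attributes of three variables as plain Python dictionaries. The variable names (t_day, t_night, rh) and the helper find_by_standard_name are made up for this example; only the attribute names come from the convention.

```python
# Hypothetical attribute tables for three variables in one file.
# Two variables share the same standard_name, which would be impossible
# if the variable name itself had to be the standard name.
variables = {
    "t_day": {
        "standard_name": "air_temperature",
        "long_name": "Daytime air temperature",
        "units": "K",
    },
    "t_night": {
        "standard_name": "air_temperature",
        "long_name": "Night-time air temperature",
        "units": "K",
    },
    "rh": {
        "standard_name": "relative_humidity",
        "long_name": "Relative humidity",
        "units": "percent",
    },
}


def find_by_standard_name(variables, standard_name):
    """Return the names of all variables carrying the given standard_name."""
    return [
        name
        for name, attrs in variables.items()
        if attrs.get("standard_name") == standard_name
    ]


print(find_by_standard_name(variables, "air_temperature"))
```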

Coordinate Types

A coordinate variable is a variable that has only one dimension, and the name of the variable is the name of the dimension. The values in the variable are the dimension coordinates.
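The definition above translates into a one-line test. This is a sketch only: the helper name is_coordinate_variable and the sample variable table are assumptions made for illustration, with each variable described by the tuple of its dimension names.

```python
def is_coordinate_variable(name, dimensions):
    """A coordinate variable has exactly one dimension, named like the variable."""
    return len(dimensions) == 1 and dimensions[0] == name


# Hypothetical file layout: variable name -> tuple of dimension names.
variable_dimensions = {
    "time": ("time",),
    "lat": ("lat",),
    "lon": ("lon",),
    "temperature": ("time", "lat", "lon"),
    "station_id": ("station",),  # one dimension, but a different name
}

coordinate_variables = [
    name
    for name, dims in variable_dimensions.items()
    if is_coordinate_variable(name, dims)
]
print(coordinate_variables)
```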

Four types of coordinates get special treatment in the CF conventions: the three physical directions and time.

Latitude and Longitude Coordinates

The latitude coordinate variable must have the same name as the latitude dimension.

The attribute standard_name=latitude, optionally together with axis=Y, marks the latitude variable. This can be used in projections that have an orthogonal latitude axis.

Similarly, the longitude coordinate variable must have the same name as the longitude dimension and may have axis=X.

Coordinates on the longitude axis typically run from -180 to 180 or from 0 to 360, but the range is not part of the convention.
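Because the range is not fixed, a reader may need to convert between the two common ranges. The following is a minimal sketch, assuming longitudes in degrees east; the function names are hypothetical.

```python
def to_signed_longitude(lon):
    """Map a longitude in degrees to the half-open range [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


def to_unsigned_longitude(lon):
    """Map a longitude in degrees to the half-open range [0, 360)."""
    return lon % 360.0


print(to_signed_longitude(190.0))
print(to_unsigned_longitude(-170.0))
```

Note that with this choice of half-open ranges, 180 maps to -180 and 360 maps to 0.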

If the latitude-longitude coordinates are not Cartesian, two-dimensional coordinate variables can be used.

Vertical (Height or Depth) Coordinate

The axis=Z attribute marks a dimension as vertical, either height or depth.

The direction of positive (i.e., the direction in which the coordinate values are increasing), whether up or down, cannot in all cases be inferred from the units. For example, if an oceanographic netCDF file encodes the depth of the surface as 0 and the depth of 1000 meters as 1000, then the axis would use the attribute positive=down. If, on the other hand, the depth of 1000 meters were represented as -1000, then the attribute would be positive=up. For a pressure coordinate, positive=down.
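The depth example can be checked with a small sketch. The helper to_height is hypothetical and assumes the coordinate has length units such as meters; a sign flip does not turn a pressure value into a height.

```python
def to_height(value, positive):
    """Convert a vertical coordinate value to a height that increases upward.

    Assumes length units (e.g. meters); not meaningful for pressure axes.
    """
    if positive == "up":
        return value
    if positive == "down":
        return -value
    raise ValueError("positive must be 'up' or 'down', got %r" % (positive,))


# Both encodings from the text describe the same point, 1000 m below the surface.
print(to_height(1000, "down"))
print(to_height(-1000, "up"))
```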

A vertical coordinate can also be dimensionless, like sigma_level.

Time Coordinate

The time variable is marked with standard_name=time and/or axis=T, and its units attribute gives a time unit and a reference date, for example units="hours since 1990-01-01".
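A units string of this form can be decoded with the Python standard library alone. The sketch below is deliberately simplified: decode_time is a hypothetical helper that handles only the unit names listed in its table and an ISO-formatted reference date, and it ignores calendars, time zones and the other spellings that udunits accepts.

```python
from datetime import datetime, timedelta

# Only these unit spellings are handled in this sketch.
SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def decode_time(value, units):
    """Turn a numeric time value plus '<unit> since <date>' into a datetime."""
    unit, separator, reference = units.partition(" since ")
    if not separator:
        raise ValueError("units must look like '<unit> since <date>'")
    origin = datetime.fromisoformat(reference.strip())
    return origin + timedelta(seconds=value * SECONDS_PER_UNIT[unit.strip()])


print(decode_time(24, "hours since 1990-01-01"))
```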

Timeseries of Station Data

This is being revised in the unofficial Point Observation Conventions. The conventions describe how to code point and trajectory data into a netCDF file.

Labels and Alternative Coordinates

Coordinates that are not in the physical XYZ-T domain can be used. Simple numeric dimensions, like wavelength, can be just variables. Enumerated dimensions may have a text coordinate, like red, green and blue. These dimensions are called labels.

Data Representative of Cells

When gridded data does not represent the point values of a field but instead represents some characteristic of the field within cells of finite "volume," a complete description of the variable should include metadata that describes the domain or extent of each cell, and the characteristic of the field that the cell values represent.

Cell Boundaries

The attribute bounds=lat_bnds states that the variable lat_bnds is a boundary variable. It has two dimensions: latitude and a bounds dimension of length 2. The bounds dimension gives the minimum and maximum value of each cell of the coordinate variable.

The bounds attribute applies to the time dimension as well.
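A boundary variable can be pictured as a list of (minimum, maximum) pairs, one per coordinate value. The values below and the helper find_cell are invented for illustration and are not taken from any particular dataset.

```python
# Hypothetical latitude axis with three cells, 45 degrees wide.
lat = [-45.0, 0.0, 45.0]
# lat_bnds has shape (len(lat), 2): one (minimum, maximum) pair per cell.
lat_bnds = [[-67.5, -22.5], [-22.5, 22.5], [22.5, 67.5]]


def find_cell(bounds, value):
    """Return the index of the first cell whose bounds contain value, or None."""
    for index, (first, second) in enumerate(bounds):
        if min(first, second) <= value <= max(first, second):
            return index
    return None


print(find_cell(lat_bnds, 10.0))
print(find_cell(lat_bnds, 80.0))
```

A value that lies exactly on a shared edge belongs to two cells; this sketch returns the first one.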

Cell Measures

For some calculations, information is needed about the size, shape or location of the cells that cannot be deduced from the coordinates and bounds without special knowledge that a generic application cannot be expected to have. In many cases the areas can be calculated from the cell bounds, but there are exceptions.

The attribute cell_measures=area: cell_area states that the area of each cell is stored in the variable cell_area.

Cell Methods

To describe the characteristic of a field that is represented by cell values, we define the cell_methods attribute of the variable. This is a string attribute comprising a list of blank-separated words of the form name: method.

For example: cell_methods=time: mean
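A string of this form can be split into (name, method) pairs as sketched below. The function parse_cell_methods is hypothetical and covers only the simple "name: method" form described above; it does not handle the richer forms the convention allows, such as several names sharing one method or comments in parentheses.

```python
def parse_cell_methods(text):
    """Split 'name: method name: method ...' into a list of (name, method) pairs."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected blank-separated 'name: method' pairs")
    pairs = []
    for name, method in zip(tokens[0::2], tokens[1::2]):
        if not name.endswith(":"):
            raise ValueError("expected a name ending in ':', got %r" % (name,))
        pairs.append((name[:-1], method))
    return pairs


print(parse_cell_methods("time: mean"))
print(parse_cell_methods("time: maximum lat: mean"))
```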

Reduction of Dataset Size

This part of the standard tells how to run-length encode netCDF files so that they become more compact. This kind of coding makes reading more difficult; if possible, simple zipping of the file is preferred.

It is possible to pack floating point data into 2 or 1 bytes. The attributes scale_factor and add_offset define a linear formula for recovering the real value: real_value = scale_factor * stored_data + add_offset.
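The packing formula and its inverse can be sketched in a few lines. The scale_factor and add_offset values below are arbitrary examples, and the pack helper assumes a signed 2-byte integer as the stored type; both are illustrative choices, not requirements of the convention.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def unpack(stored_data, scale_factor, add_offset):
    """Apply the formula from the text: real_value = scale_factor * stored + add_offset."""
    return scale_factor * stored_data + add_offset


def pack(real_value, scale_factor, add_offset):
    """Invert the formula and round to the nearest integer that fits in 2 bytes."""
    stored = round((real_value - add_offset) / scale_factor)
    if not INT16_MIN <= stored <= INT16_MAX:
        raise OverflowError("value does not fit in a signed 2-byte integer")
    return stored


# Example attributes: temperatures in kelvin stored with 0.01 K resolution.
scale_factor, add_offset = 0.01, 273.15

stored = pack(293.15, scale_factor, add_offset)
print(stored, unpack(stored, scale_factor, add_offset))
```

Packing is lossy: values are recovered only to within half of scale_factor.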
