CF Coordinate Conventions

From Earth Science Information Partners (ESIP)


CF_Coordinate_Conventions Description: [[TermDesc::The purpose of the CF Conventions is to create self-describing netCDF files: each variable in the file has an associated description of what it represents, and each value can be located in space and time. An important benefit of a convention is that it enables software tools to display data and perform operations on specified subsets of the data with no user intervention. It is equally important that the coordinate data is easy for human users to write and to understand. These conventions enable programs like Datafed Browser to work without configuration.]]

Glossary Domain: WCS, HTAP

Related Links

Links to this page
[[Links::CF Conventions, Standardized Geographic Region Names]]

Contributors

Brian Eaton, Jonathan Gregory, Bob Drach, Karl Taylor, Steve Hankin, John Caron, Rich Signell, Phil Bentley, Greg Rappa.

History

[[History::The convention was designed by Brian Eaton, Jonathan Gregory, Bob Drach, Karl Taylor and Steve Hankin. Version 1.0, 28 October 2003; Version 1.1, 17 January 2008; Version 1.2, 4 May 2008; Version 1.3, 7 November 2008; Version 1.4, 27 February 2009]]

Term Details


Introduction

This is a short overview of how the netCDF-CF convention is written.

The close match between netCDF-CF and WCS 1.1

The obvious benefit is the ease with which software tools can read the files and recognize the meaning of each item. The Datafed WCS service framework is such a tool: it relies heavily on CF information in Data Configuration for Cubes, making it possible to automatically create a WCS 1.1 service from the files.

  • Each netCDF file becomes a coverage, so a WCS service may contain several unrelated, different-looking coverages.
  • Each data variable in the netCDF file becomes a field in the coverage. The variables share dimensions, so the coverage fields have identical latitude, longitude, and time dimensions (see the sketch after this list).
    • A coverage is not a single piece of data; it is a record of fields, just as a netCDF file is a collection of variables.
    • Coverages do not have units. The global attributes in netCDF files do not contain units either.
    • A coverage must have at least one field; a netCDF file must have at least one variable to contain data.
    • A field is a single measure; a netCDF variable has a scalar type: int, float, char, etc.
    • Fields have a scalar unit of measure; in netCDF files, units are associated with individual variables.
    • A field can be a data field or a metadata field, such as a quality flag or a measurement error range.
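
As a rough sketch of this mapping (assuming the Python netCDF4 library and a hypothetical file weather.nc), the snippet below treats the file as one coverage and lists every non-coordinate variable as a coverage field:

  from netCDF4 import Dataset

  # Hypothetical file name; any CF-compliant netCDF file would do.
  ds = Dataset("weather.nc", "r")

  # A coordinate variable has exactly one dimension whose name matches
  # the variable's own name; every other variable is treated as a field.
  def is_coordinate_variable(var):
      return len(var.dimensions) == 1 and var.dimensions[0] == var.name

  # The file is the coverage; its data variables are the coverage fields.
  for name, var in ds.variables.items():
      if not is_coordinate_variable(var):
          print("field:", name, "units:", getattr(var, "units", "n/a"))
  ds.close()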

Some design advice:

  • If the measurements differ substantially, they should be in different coverages.
  • If fields relate to each other, they should be in the same coverage:
    • weather coverage: file "weather.nc"
      • temperature field
      • humidity field
      • visibility field
      • weather code field: relates to all the measurements; its value (clear | rain | snow | tornado) can be used to filter out misleading data.
  • If the measurements are related, but each data variable has an independent set of quality flags, it may be better to create a separate coverage for each measurement. For example:
    • temperature coverage: file "temperature.nc"
      • temperature field
      • number of observations field
      • quality flag field
    • wind coverage: file "wind.nc"
      • speed field
      • direction field
      • gust maximum field
      • number of observations field

In this case, the fields that belong together are grouped into separate coverages. This is advisable to avoid incorrect associations between fields: the two "number of observations" fields are entirely different, and "quality flag" relates only to temperature, not to wind. It would be possible to put all of these into one coverage, since wind and temperature may have identical dimensions; the design decision is to weigh the pros and cons of each arrangement.
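
As an illustration of the separate-coverage layout above, the sketch below writes the hypothetical temperature.nc file using the Python netCDF4 library; dimension sizes, attribute values and flag codes are made up for the example:

  import numpy as np
  from netCDF4 import Dataset

  # Illustrative file; dimension sizes and flag values are made up.
  ds = Dataset("temperature.nc", "w")
  ds.createDimension("time", None)
  ds.createDimension("lat", 2)
  ds.createDimension("lon", 3)

  # Three fields of one coverage, all sharing the same dimensions.
  temp = ds.createVariable("temperature", "f4", ("time", "lat", "lon"))
  temp.standard_name = "air_temperature"
  temp.units = "K"

  nobs = ds.createVariable("number_of_observations", "i4", ("time", "lat", "lon"))
  nobs.long_name = "number of observations"

  flag = ds.createVariable("quality_flag", "i1", ("time", "lat", "lon"))
  flag.long_name = "temperature quality flag"
  flag.flag_values = np.array([0, 1, 2], dtype="i1")
  flag.flag_meanings = "good suspect bad"
  ds.close()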

CF Metadata in the netCDF files

The convention does not standardize any variable or dimension names; all CF metadata is written in attributes. If variable names had to be standard names, such as nitrogen_monoxide, only one such variable could exist per file. Therefore, a standard name is assigned to a variable through an attribute.

Some important attributes:

  • standard_name is the standardized name of the variable, from the CF Naming Conventions.
  • long_name is the human readable, non-standard name.
  • units is the unit of measure, expressed as a udunits string (see the sketch after this list).
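
A minimal sketch of these attributes in practice, assuming the Python netCDF4 library; the file name, variable name, and attribute values are illustrative:

  from netCDF4 import Dataset

  # Hypothetical file and variable names; only the attributes matter here.
  ds = Dataset("no_surface.nc", "w")
  ds.createDimension("time", None)
  no = ds.createVariable("no_conc", "f4", ("time",))

  no.standard_name = "mass_concentration_of_nitrogen_monoxide_in_air"  # from the CF standard name table
  no.long_name = "surface NO concentration"                            # free-text, human readable
  no.units = "ug m-3"                                                   # a udunits-parsable string
  ds.close()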

Coordinate Types

A coordinate variable is a variable that has only one dimension, and the name of the variable is the name of the dimension. The values in the variable are the dimension coordinates.

Four types of coordinates receive special treatment in the CF conventions: the three physical directions and time.

Latitude and Longitude Coordinates

The latitude coordinate variable must have the same name as the latitude dimension.

The attribute standard_name=latitude, and optionally axis=Y, marks the latitude variable. The axis attribute can be used in projections that have an orthogonal latitude axis.

Similarly, the longitude coordinate variable must have the same name as the longitude dimension and may carry axis=X.

Longitude coordinates typically range from -180 to 180 or from 0 to 360, but that is not mandated by the convention.

If the latitude-longitude coordinates do not form a Cartesian grid, two-dimensional coordinate variables can be used.
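
A minimal sketch of one-dimensional latitude and longitude coordinate variables, assuming the Python netCDF4 library and an illustrative one-degree global grid:

  import numpy as np
  from netCDF4 import Dataset

  ds = Dataset("grid.nc", "w")
  ds.createDimension("lat", 180)
  ds.createDimension("lon", 360)

  # Coordinate variables: one dimension each, named after that dimension.
  lat = ds.createVariable("lat", "f4", ("lat",))
  lat.standard_name = "latitude"
  lat.units = "degrees_north"
  lat.axis = "Y"
  lat[:] = np.arange(-89.5, 90.0, 1.0)

  lon = ds.createVariable("lon", "f4", ("lon",))
  lon.standard_name = "longitude"
  lon.units = "degrees_east"
  lon.axis = "X"
  lon[:] = np.arange(-179.5, 180.0, 1.0)
  ds.close()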

Vertical (Height or Depth) Coordinate

The axis=Z attribute marks the dimension as either height or depth.

The direction of positive (i.e., the direction in which the coordinate values are increasing), whether up or down, cannot in all cases be inferred from the units. For example, if an oceanographic netCDF file encodes the depth of the surface as 0 and the depth of 1000 meters as 1000, then the axis would use the attribute positive=down. If, on the other hand, the depth of 1000 meters were represented as -1000, then the value would be positive=up. For a pressure coordinate, positive=down.

The vertical coordinate can also be dimensionless, such as sigma_level.
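
A small sketch of a depth coordinate with positive=down, assuming the Python netCDF4 library; the depth values are illustrative:

  import numpy as np
  from netCDF4 import Dataset

  # Illustrative ocean depth axis: 0 at the surface, increasing downward.
  ds = Dataset("ocean.nc", "w")
  ds.createDimension("depth", 4)
  depth = ds.createVariable("depth", "f4", ("depth",))
  depth.standard_name = "depth"
  depth.units = "m"
  depth.axis = "Z"
  depth.positive = "down"        # values grow toward the sea floor
  depth[:] = np.array([0.0, 10.0, 100.0, 1000.0])
  ds.close()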

Time Coordinate

The time variable is marked with standard_name=time and/or axis=T, and a units attribute such as units="hours since 1990-01-01".
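
A minimal sketch of a time coordinate using that units string, assuming the Python netCDF4 library; the three time values are illustrative:

  import numpy as np
  from netCDF4 import Dataset, num2date

  # Illustrative daily time axis encoded as hours since a reference date.
  ds = Dataset("times.nc", "w")
  ds.createDimension("time", None)
  time = ds.createVariable("time", "f8", ("time",))
  time.standard_name = "time"
  time.axis = "T"
  time.units = "hours since 1990-01-01"
  time.calendar = "gregorian"
  time[:] = np.array([0.0, 24.0, 48.0])   # 1990-01-01, -02 and -03

  # num2date turns the numeric offsets back into datetime objects.
  print(num2date(time[:], units=time.units, calendar=time.calendar))
  ds.close()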

Timeseries of Station Data

This is begin revised in unofficial Point Observation Conventions. The conventions describe how to code point and and trajectory data into a netCDF file.

Labels and Alternative Coordinates

Coordinates outside the physical XYZ-T domain can also be used. Simple numeric dimensions, such as wavelength, can be ordinary coordinate variables. Enumerated dimensions may have a text-valued coordinate, such as red, green, and blue; these dimensions are called labels.
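
A minimal sketch of a label coordinate, assuming the Python netCDF4 library; the band names and dimension sizes are illustrative:

  import numpy as np
  from netCDF4 import Dataset, stringtochar

  # Illustrative label dimension: three bands named by text labels.
  ds = Dataset("bands.nc", "w")
  ds.createDimension("band", 3)
  ds.createDimension("band_strlen", 5)      # long enough for the longest label

  band = ds.createVariable("band_name", "S1", ("band", "band_strlen"))
  band.long_name = "band name"
  band[:] = stringtochar(np.array(["red", "green", "blue"], dtype="S5"))

  # A data variable uses the band dimension and points at the label variable.
  refl = ds.createVariable("reflectance", "f4", ("band",))
  refl.coordinates = "band_name"
  refl[:] = np.array([0.1, 0.2, 0.3], dtype="f4")
  ds.close()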

Data Representative of Cells

When gridded data does not represent the point values of a field but instead represents some characteristic of the field within cells of finite "volume," a complete description of the variable should include metadata that describes the domain or extent of each cell, and the characteristic of the field that the cell values represent.

Cell Boundaries

The attribute bounds=lat_bnds indicates that the variable lat_bnds is a boundary variable. It has two dimensions: the latitude dimension and a bounds dimension of length 2. The bounds dimension gives the minimum and maximum value of each coordinate cell.

The bounds attribute applies to the time dimension as well.
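
A minimal sketch of a boundary variable, assuming the Python netCDF4 library; the three latitude cells are illustrative:

  import numpy as np
  from netCDF4 import Dataset

  ds = Dataset("bounds.nc", "w")
  ds.createDimension("lat", 3)
  ds.createDimension("nv", 2)              # the length-2 bounds dimension

  lat = ds.createVariable("lat", "f4", ("lat",))
  lat.units = "degrees_north"
  lat.bounds = "lat_bnds"                  # points to the boundary variable
  lat[:] = np.array([-60.0, 0.0, 60.0])

  # One (min, max) pair per latitude cell.
  lat_bnds = ds.createVariable("lat_bnds", "f4", ("lat", "nv"))
  lat_bnds[:] = np.array([[-90.0, -30.0], [-30.0, 30.0], [30.0, 90.0]])
  ds.close()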

Cell Measures

For some calculations, information is needed about the size, shape or location of the cells that cannot be deduced from the coordinates and bounds without special knowledge that a generic application cannot be expected to have. In many cases the areas can be calculated from the cell bounds, but there are exceptions.

The attribute cell_measures=area: cell_area indicates that the area of each cell is stored in the variable cell_area.
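
A minimal sketch of the cell_measures attribute, assuming the Python netCDF4 library; the grid and area values are illustrative:

  import numpy as np
  from netCDF4 import Dataset

  # Illustrative grid where each cell's area is supplied explicitly.
  ds = Dataset("areas.nc", "w")
  ds.createDimension("lat", 2)
  ds.createDimension("lon", 2)

  area = ds.createVariable("cell_area", "f4", ("lat", "lon"))
  area.units = "m2"
  area[:] = np.array([[1.0e10, 1.1e10], [1.2e10, 1.3e10]])

  precip = ds.createVariable("precipitation_amount", "f4", ("lat", "lon"))
  precip.units = "kg m-2"
  precip.cell_measures = "area: cell_area"   # area of each cell lives in cell_area
  ds.close()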

Cell Methods

To describe the characteristic of a field that is represented by cell values, we define the cell_methods attribute of the variable. This is a string attribute comprising a list of blank-separated words of the form name: method.

For example: cell_methods=time: mean
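
A minimal sketch of the cell_methods attribute on a variable, assuming the Python netCDF4 library; the variable name tas and its layout are illustrative:

  from netCDF4 import Dataset

  # Illustrative daily-mean temperature variable.
  ds = Dataset("daily_mean.nc", "w")
  ds.createDimension("time", None)
  tas = ds.createVariable("tas", "f4", ("time",))
  tas.standard_name = "air_temperature"
  tas.units = "K"
  tas.cell_methods = "time: mean"   # each value is a mean over its time cell
  ds.close()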

Reduction of Dataset Size

This part of the standard describes how to run-length encode netCDF files so that they become more compact. This kind of coding makes reading more difficult; if possible, simply compressing (zipping) the file is preferred.

It is also possible to pack floating-point data into 1 or 2 bytes. The attributes scale_factor and add_offset give a linear formula for recovering the original value: real_value = scale_factor * stored_data + add_offset.
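
A small sketch of that packing formula, assuming the Python netCDF4 library; the sample values and the choice of 16-bit packing are illustrative:

  import numpy as np
  from netCDF4 import Dataset

  # Illustrative data: pack three float temperatures (K) into 16-bit integers.
  data = np.array([250.0, 275.5, 301.2])
  scale_factor = (data.max() - data.min()) / (2**16 - 1)
  add_offset = data.min() + scale_factor * 2**15
  packed = np.round((data - add_offset) / scale_factor).astype("i2")

  ds = Dataset("packed.nc", "w")
  ds.createDimension("time", data.size)
  t2m = ds.createVariable("t2m", "i2", ("time",))
  t2m.set_auto_scale(False)        # write the raw packed integers as-is
  t2m.scale_factor = scale_factor
  t2m.add_offset = add_offset
  t2m.units = "K"
  t2m[:] = packed
  ds.close()

  # Readers reverse the packing: real_value = scale_factor * stored_data + add_offset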