Difference between revisions of "Creating NetCDF CF Files"

From Earth Science Information Partners (ESIP)

Revision as of 10:25, July 22, 2009

UNDER CONSTRUCTION


NetCDF-CF Convention

The reason to use the CF convention is to enable plug-and-play connectivity.

Well-made NetCDF files are human readable. After all, what could a dimension named longitude mean besides longitude? If you get data in NetCDF format, it's usually fairly easy to see what is really there.

It's also easy to write a generic browser that can display every variable for you.

But since a lot of data in NetCDF files has geographical meaning, a graphical viewer should be able to draw the data on a map on its own. This involves, at minimum:

  • Finding the three geographical dimensions
  • Finding the time dimension, if any
  • Understanding the geographical projection

From a generic NetCDF file, this requires human intelligence. After all, n-dimensional data variables, dimension variables and other metadata variables look precisely the same to program code. There are many ways to encode projection information, and decoding it reliably is very difficult.

Conventions come to the rescue. For example, in the CF convention an X-axis dimension variable is marked with the attribute axis = "X", and its values are points on the X axis. No guesswork.
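
As a minimal sketch of why the axis attribute removes the guesswork, the following Python separates axis variables from data variables. The variables dictionary is a hypothetical in-memory stand-in for metadata that a real reader would load from the file, and classify is an illustrative helper, not part of any library.

```python
# Hypothetical in-memory description of the variables in a file:
# variable name -> attribute dictionary. A real reader would fill this
# from the NetCDF file itself.
variables = {
    "time": {"axis": "T", "units": "days since 1979-01-01"},
    "lat": {"axis": "Y", "units": "degrees_north"},
    "lon": {"axis": "X", "units": "degrees_east"},
    "AI": {"long_name": "Aerosol Index"},
}


def classify(variables):
    """Split variables into axis variables (keyed by axis letter) and data variables."""
    axes = {}
    data = []
    for name, attrs in variables.items():
        axis = attrs.get("axis")
        if axis in ("X", "Y", "Z", "T"):
            axes[axis] = name
        else:
            data.append(name)
    return axes, data


axes, data = classify(variables)
print(axes)  # {'T': 'time', 'Y': 'lat', 'X': 'lon'}
print(data)  # ['AI']
```

The viewer never has to interpret variable names; it only reads one attribute per variable.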

The best documentation is on the CF Metadata page.

Example

In short, a simple ncdump CDL output for a NetCDF-CF file may look like this:

netcdf TOMS_AI_58 {
dimensions:
	time = 1 ;
	lat = 3 ;
	lon = 4 ;
variables:
	double time(time) ;
		time:standard_name = "time" ;
		time:long_name = "time" ;
		time:units = "days since 1979-01-01" ;
		time:axis = "T" ;
	double lat(lat) ;
		lat:standard_name = "latitude" ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:axis = "Y" ;
	double lon(lon) ;
		lon:standard_name = "longitude" ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:axis = "X" ;
	byte AI(time, lat, lon) ;
		AI:long_name = "Aerosol Index" ;
		AI:units = "fraction" ;
		AI:_FillValue = -1b ;
		AI:missing_value = -1b ;

// global attributes:
		:title = "NASA TOMS Project" ;
		:comment = "NASA Total Ozone Mapping Spectrometer Project" ;
		:Conventions = "CF-1.0" ;
data:

 time = 9952 ;

 lat = 32.5, 33.5, 34.5 ;

 lon = -89.375, -88.125, -86.875, -85.625 ;

 AI =
  0, 2, 1, 2,
  _, 2, 3, 2,
  1, 4, 4, 2 ;
}

Let's go over it section by section:

Dimensions

dimensions:
	time = 1 ;
	lat = 3 ;
	lon = 4 ;

These names can be anything, because sometimes you may want to store two grids with different dimensions in the same file. In that case you could name the dimensions lat1, lat2, and so on.

Time Dimension Variable

	double time(time) ;
		time:standard_name = "time" ;
		time:long_name = "time" ;
		time:units = "days since 1979-01-01" ;
		time:axis = "T" ;

It is the attribute axis = "T" that marks this variable as the time dimension.
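
The units attribute gives the stored number its meaning. A minimal sketch, assuming units of exactly the form "days since YYYY-MM-DD" as in the example (decode_days_since is a hypothetical helper and does not handle other unit strings):

```python
import datetime


def decode_days_since(value, units):
    """Convert an offset in days to a date, given units like 'days since 1979-01-01'."""
    prefix = "days since "
    if not units.startswith(prefix):
        raise ValueError("this sketch only handles 'days since YYYY-MM-DD'")
    origin = datetime.date.fromisoformat(units[len(prefix):])
    return origin + datetime.timedelta(days=value)


# The example file stores time = 9952.
print(decode_days_since(9952, "days since 1979-01-01"))  # 2006-04-01
```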

Latitude and Longitude Dimension Variables

	double lat(lat) ;
		lat:standard_name = "latitude" ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:axis = "Y" ;
	double lon(lon) ;
		lon:standard_name = "longitude" ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:axis = "X" ;

Again, it's the attributes axis = "Y" and axis = "X" that mark the geographical dimension variables. The standard names latitude and longitude indicate a plain, unprojected latitude-longitude grid.

Data Variable

	byte AI(time, lat, lon) ;
		AI:long_name = "Aerosol Index" ;
		AI:units = "fraction" ;
		AI:_FillValue = -1b ;
		AI:missing_value = -1b ;

The dimension variables are marked with axis attributes; AI has none, so it is a data variable. The units = "fraction" is not a standard unit, and therefore the compliance checker reports it as an error.
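
The _FillValue attribute marks cells that hold no measurement; ncdump prints them as "_". A minimal sketch of how a reader might mask them, using the AI values from the example as plain Python lists (the names raw, masked and valid are illustrative only):

```python
FILL_VALUE = -1  # matches AI:_FillValue = -1b in the example

# Raw values as stored in the file; the "_" in the ncdump output is the fill value.
raw = [
    [0, 2, 1, 2],
    [-1, 2, 3, 2],
    [1, 4, 4, 2],
]

# Replace fill values with None so they cannot be mistaken for measurements.
masked = [[None if v == FILL_VALUE else v for v in row] for row in raw]

valid = [v for row in masked for v in row if v is not None]
print(len(valid))               # 11
print(sum(valid) / len(valid))  # mean over valid cells only
```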

Global Attributes

		:title = "NASA TOMS Project" ;
		:comment = "NASA Total Ozone Mapping Spectrometer Project" ;
		:Conventions = "CF-1.0" ;

Only the Conventions attribute, which names the convention and its version (CF-1.0), is required.

Verifying NetCDF-CF Files

Since the CF-1.0 conventions contain a lot of definitions, verifying files by machine is necessary.

Online Compliance Checker

There is a fairly complete compliance checker online. Its site also hosts some NetCDF documentation and CF convention documentation. The latest compliance checker lets you upload a NetCDF file and runs a wide range of checks on it.

Using the Compliance Checker from Python as a Service

TODO: this is just a standard form submitted with HTTP POST, so it should be easy to use as a service from Python. The owsadmin tool should clone a subset of a NetCDF cube and submit it.

Using Offline Compliance Checker

TODO: Michael Decker has developed a model_checker that can currently do the following:

Find netCDF files that

  • do not have axis=X|Y|Z|T attributes
  • have unknown axis attributes
  • have non-ASCII characters in string attributes
  • define the same axis name multiple times
  • have no units in T-axis
  • have incompatible time format in T-axis

Try to repair all of the above problems, except for those concerning the T-axis, which can't be fixed automatically.

Making this module work with nc3 just requires adding some methods to the cf1 module.
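
To make the checks above concrete, here is a rough sketch of most of them over plain dictionaries. It is not the model_checker itself: check_variables and the {name: attributes} layout are assumptions for illustration, "no axis attributes" is interpreted at file level, "same axis multiple times" is read as two variables claiming the same axis letter, and the time format check is left out.

```python
VALID_AXES = {"X", "Y", "Z", "T"}


def check_variables(variables):
    """Return a list of human-readable problems for {name: attributes} metadata."""
    problems = []
    seen = {}  # axis letter -> first variable that claimed it
    for name, attrs in variables.items():
        axis = attrs.get("axis")
        if axis is not None:
            if axis not in VALID_AXES:
                problems.append(f"{name}: unknown axis {axis!r}")
            elif axis in seen:
                problems.append(f"{name}: axis {axis} already defined by {seen[axis]}")
            else:
                seen[axis] = name
            if axis == "T" and "units" not in attrs:
                problems.append(f"{name}: T axis has no units")
        for key, value in attrs.items():
            if isinstance(value, str) and not value.isascii():
                problems.append(f"{name}: attribute {key} has non-ASCII characters")
    if not seen:
        problems.append("file: no variable has an axis=X|Y|Z|T attribute")
    return problems


example = {
    "time": {"axis": "T"},
    "lat": {"axis": "Y", "units": "degrees_north"},
    "lat2": {"axis": "Y", "units": "degrees_north"},
    "lon": {"axis": "Q", "long_name": "longit\u00fcde"},
}
for problem in check_variables(example):
    print(problem)
```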

Creating NetCDF-CF files

There are a few ways to create a NetCDF file. In general, it's easiest to create the empty file with a descriptive method, and use a programming language to fill in the data.

Use CDL and ncgen

The CDL text you see above is fairly readable, and ncgen can turn it into a NetCDF file.
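
A minimal sketch of driving ncgen from Python, assuming the CDL above has been saved as TOMS_AI_58.cdl (a hypothetical file name) and that ncgen is on the PATH. The -o option names the output file; ncgen_command and run_ncgen are illustrative helpers.

```python
import shutil
import subprocess


def ncgen_command(cdl_path, nc_path):
    """Build the ncgen command line that compiles a CDL file into a NetCDF file."""
    return ["ncgen", "-o", nc_path, cdl_path]


def run_ncgen(cdl_path, nc_path):
    """Run ncgen if it is installed; raise if the tool is missing or fails."""
    if shutil.which("ncgen") is None:
        raise RuntimeError("ncgen not found on PATH")
    subprocess.run(ncgen_command(cdl_path, nc_path), check=True)


print(" ".join(ncgen_command("TOMS_AI_58.cdl", "TOMS_AI_58.nc")))
# ncgen -o TOMS_AI_58.nc TOMS_AI_58.cdl
```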

Use NCML

NCML, the NetCDF Markup Language, is an XML language for manipulating NetCDF. Its designers provide a Java implementation, and Datafed supports creating and verifying files.

Creating and Filling NetCDF Files using NCML and Python

After installing the Windows or Linux package, you have the datafed.nc3 and datafed.cf1 modules.

Here's a test NCML file: CMAQ_Baron_20.ncml

Let's look at it one section at a time:

This first section describes the global attributes:


<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
	<explicit />
	<attribute name="title" type="string" value="CMAQ_Baron" />
	<attribute name="comment" type="string" value="Concentration file output From CMAQ model dyn alloc version CTM" />
	<attribute name="Conventions" type="string" value="CF-1.0" />

Next come the dimensions:


	<dimension name="time" length="0" isUnlimited="true" />
	<dimension name="lat" length="220" />
	<dimension name="lon" length="288" />


Next, the time dimension variable:


	<variable name="time" type="int" shape="time">
		<attribute name="standard_name" type="string" value="time" />
		<attribute name="long_name" type="string" value="time" />
		<attribute name="units" type="string" value="hours since 2009-01-01" />
		<attribute name="axis" type="string" value="T" />
	</variable>

Then the latitude and longitude dimension variables:


	<variable name="lat" type="double" shape="lat">
		<attribute name="standard_name" type="string" value="latitude" />
		<attribute name="long_name" type="string" value="latitude" />
		<attribute name="units" type="string" value="degrees_north" />
		<attribute name="axis" type="string" value="Y" />
		<values start="6.2" increment="0.05" />
	</variable>
	<variable name="lon" type="double" shape="lon">
		<attribute name="standard_name" type="string" value="longitude" />
		<attribute name="long_name" type="string" value="longitude" />
		<attribute name="units" type="string" value="degrees_east" />
		<attribute name="axis" type="string" value="X" />
		<values start="-91.6" increment="0.05" />
	</variable>
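
The values elements above give only a start and an increment, so the coordinates are evenly spaced and their count comes from the dimension length. A minimal sketch of that expansion, where expand_values is a hypothetical helper and not part of datafed:

```python
def expand_values(start, increment, length):
    """Expand a start/increment pair into the full list of coordinate values."""
    return [start + increment * i for i in range(length)]


lat = expand_values(6.2, 0.05, 220)    # lat dimension has length 220
lon = expand_values(-91.6, 0.05, 288)  # lon dimension has length 288

print(len(lat), round(lat[0], 2), round(lat[-1], 2))  # 220 6.2 17.15
print(len(lon), round(lon[0], 2), round(lon[-1], 2))  # 288 -91.6 -77.25
```

So the grid covers roughly 6.2 to 17.15 degrees north and 91.6 to 77.25 degrees west.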

Finally, the data variables:


	<variable name="PM2_5" type="float" shape="time lat lon">
		<attribute name="long_name" type="string" value="PM2.5 concentration" />
		<attribute name="units" type="string" value="micrograms/m^3" />
		<attribute name="_FillValue" type="float" value="255" />
		<attribute name="missing_value" type="float" value="255" />
	</variable>
	<variable name="O3" type="float" shape="time lat lon">
		<attribute name="long_name" type="string" value="Ozone" />
		<attribute name="units" type="string" value="ppmV" />
		<attribute name="_FillValue" type="float" value="255" />
		<attribute name="missing_value" type="float" value="255" />
	</variable>
</netcdf>
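
Because NCML is ordinary XML, it can be inspected with standard tools before any NetCDF file is created. A minimal sketch using Python's xml.etree.ElementTree on a shortened, inlined copy of the NCML above; it is independent of datafed, and the names dimensions and shapes are illustrative only.

```python
import xml.etree.ElementTree as ET

# NcML elements live in this namespace, so lookups must be qualified.
NS = {"nc": "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"}

# A shortened copy of the NcML above, inlined so the sketch is self-contained.
NCML = """
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <dimension name="time" length="0" isUnlimited="true" />
  <dimension name="lat" length="220" />
  <dimension name="lon" length="288" />
  <variable name="time" type="int" shape="time">
    <attribute name="axis" type="string" value="T" />
  </variable>
  <variable name="PM2_5" type="float" shape="time lat lon">
    <attribute name="units" type="string" value="micrograms/m^3" />
  </variable>
</netcdf>
"""

root = ET.fromstring(NCML.strip())

dimensions = {d.get("name"): int(d.get("length")) for d in root.findall("nc:dimension", NS)}
shapes = {v.get("name"): v.get("shape").split() for v in root.findall("nc:variable", NS)}

print(dimensions)  # {'time': 0, 'lat': 220, 'lon': 288}
print(shapes)      # {'time': ['time'], 'PM2_5': ['time', 'lat', 'lon']}
```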

You can create the NetCDF file with a few lines of Python:

import datafed
from datafed import cf1

cf1.create_ncml22("CMAQ_Baron_20.nc", "CMAQ_Baron_20.ncml")

This creates the dimensions and dimension variables, and leaves the data empty.

Now we can add a spatial slice for a given time:

import datafed
from datafed import cf1
import datetime

nc = cf1.open("CMAQ_Baron_20.nc", "w")

dt = datetime.datetime(2009, 1, 2)

# create array for one time slice
# 220 latitudes, 288 longitudes, NaN as initial value
data = cf1.create_array([220, 288], float("NaN"))

# data.txt must have lines of the form
# x-index, y-index, value
#
# 0, 0, 5.4
# 0, 1, 7.6
# etc...
with open("data.txt") as f:
    for line in f:
        x, y, v = line.split(",")
        # rows are latitudes (y), columns are longitudes (x)
        data[int(y)][int(x)] = float(v)

# write the filled array as the slice for time dt
nc.put_time_slice("PM2_5", data, dt)

This will write the data array into the PM2_5 variable, and also write "24" into the time variable, since the given time is 24 hours after the time dimension start, 2009-01-01.

Note that put_time_slice only supports updating or appending; inserting into the middle is not possible.
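
The two rules above can be illustrated in plain Python. This is only a sketch of the described behaviour, not the datafed implementation: hours_since_origin and slice_index are hypothetical helpers, and the time origin is taken from the units "hours since 2009-01-01".

```python
import datetime

ORIGIN = datetime.datetime(2009, 1, 1)  # from units "hours since 2009-01-01"


def hours_since_origin(dt):
    """Integer hour offset that would be stored in the time variable."""
    return int((dt - ORIGIN).total_seconds() // 3600)


def slice_index(existing_times, new_time):
    """Index for a new slice: update an existing time or append after the last one."""
    if new_time in existing_times:
        return existing_times.index(new_time)  # update in place
    if not existing_times or new_time > existing_times[-1]:
        return len(existing_times)  # append at the end
    raise ValueError("inserting into the middle is not supported")


t = hours_since_origin(datetime.datetime(2009, 1, 2))
print(t)                          # 24
print(slice_index([0, 24], t))    # 1
print(slice_index([0, 24], 48))   # 2
```

Calling slice_index([0, 24], 12) raises ValueError, which mirrors the restriction on inserting into the middle.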