Creating NetCDF CF Files

From Federation of Earth Science Information Partners
(Difference between revisions)
Jump to: navigation, search
(Example showing a NetCDF file as a CDL text)
(Creating and Filling NetCDF Files using NCML and python)
Line 217: Line 217:
 
Here's a test ncml: [http://128.252.167.125:8080/static/CMAQ_Baron/CMAQ_Baron_20.ncml CMAQ_Baron_20.ncml]  
 
Here's a test ncml: [http://128.252.167.125:8080/static/CMAQ_Baron/CMAQ_Baron_20.ncml CMAQ_Baron_20.ncml]  
  
Lets look at it one section at a time:
+
Lets look at it one section at a time, You'll see it's just as simple as CDL
  
 
=== Global Attributes ===
 
=== Global Attributes ===

Revision as of 10:36, 22 July 2009

UNDER CONSTRUCTION


Back to WCS Access to netCDF Files

Back to WCS NetCDF Development

Contents

NetCDF-CF Convention

The reason to use CF convention: Enable plug and play connectivity.

Well done NetCDF files are human readable. After all: what could dimension longitude mean besides longitude. If you get data in NetCDF format, it's usually fairly easy to see what really is there.

It's also easy to write a generic browser, that can display every variable for you.

But since a lot of data in NetCDF files have geographical meaning, a graphical viewer should be able to draw the data on the map, on it's own. This involves, at minimum:

  • finding the three geographical dimensions
  • Finding time dimension, if any
  • Understanding the geographical projection

From any generic NetCDF, this requires human intelligence. After all, the n-dimensional data variables, dimensional variables and other metadata variables look precisely the same for the program code. There are legion of ways to code projection information, and decoding it reliably is very difficult.

Conventions come to rescue. For example, in CF convention X-axis dimension variable must have an attribute axis='X', and the values are points on X-axis. No guesswork.

The best documentation is at CF Metadata page.

Example showing a NetCDF file as a CDL text

CDL = network Common data form Description Language

Notice, that CDL is convention agnostic, it does not know about CF conventions.

Using ncdump to generate CDL output for a NetCDF-CF file may look like this:

netcdf TOMS_AI_58 {
dimensions:
	time = 1 ;
	lat = 3 ;
	lon = 4 ;
variables:
	double time(time) ;
		time:standard_name = "time" ;
		time:long_name = "time" ;
		time:units = "days since 1979-01-01" ;
		time:axis = "T" ;
	double lat(lat) ;
		lat:standard_name = "latitude" ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:axis = "Y" ;
	double lon(lon) ;
		lon:standard_name = "longitude" ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:axis = "X" ;
	byte AI(time, lat, lon) ;
		AI:long_name = "Aerosol Index" ;
		AI:units = "fraction" ;
		AI:_FillValue = -1b ;
		AI:missing_value = -1b ;

// global attributes:
		:title = "NASA TOMS Project" ;
		:comment = "NASA Total Ozone Mapping Spectrometer Project" ;
		:Conventions = "CF-1.0" ;
data:

 time = 9952 ;

 lat = 32.5, 33.5, 34.5 ;

 lon = -89.375, -88.125, -86.875, -85.625 ;

 AI =
  0, 2, 1, 2,
  _, 2, 3, 2,
  1, 4, 4, 2 ;
}

Let's go over section by section:

Dimensions

dimensions:
	time = 1 ;
	lat = 3 ;
	lon = 4 ;
}

These names can be anything. The reason is, that sometimes you may want to store two grids, that have different dimensions, into the same file. In that case you could name dimensions lat1, lat2 etc...

Time Dimension Variable

	double time(time) ;
		time:standard_name = "time" ;
		time:long_name = "time" ;
		time:units = "days since 1979-01-01" ;
		time:axis = "T" ;

It's the attribute axis = "T" marks this variable as time dimension.

Latitude and Longitude Dimension Variables

	double lat(lat) ;
		lat:standard_name = "latitude" ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:axis = "Y" ;
	double lon(lon) ;
		lon:standard_name = "longitude" ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:axis = "X" ;

Again, it's the attributes axis = "Y" and axis = "X" that mark geographical dimension variables. The linear projection is acknowledged with standard names.

Data Variable

	byte AI(time, lat, lon) ;
		AI:long_name = "Aerosol Index" ;
		AI:units = "fraction" ;
		AI:_FillValue = -1b ;
		AI:missing_value = -1b ;

The dimensions are marked with axis attributes, so this is a data variable. The units = "fraction" is not standard, and therefore the compliance checker reports it as an error.

Global Attributes

		:title = "NASA TOMS Project" ;
		:comment = "NASA Total Ozone Mapping Spectrometer Project" ;
		:Conventions = "CF-1.0" ;

Only convention and version number Conventions = CF-1.0 is required.

Verifying NetCDF-CF Files

Since CF-1.0 conventions contain a lot of definitions, verifying them by machine is necessary.

Online Compliance Checker

There is a fairly complete compliance checker online. They have some NetCDF documentation and CF convention documentation online too. The latest compliance checker is here, it lets you upload a NetCDF file and does a wide range of checks.

Using Compliance Checker from python as a service

TODO: this is just a standard form with http-post, and therefore it should be easy to use it as a service from python. The owsadmin tool should clone a subset of a netcdf cube and submit it.

Using Offline Compliance Checker

TODO Michael Decker has developed a model_checker, that can currently to the following:

find netCDF files that

  • do not have axis=X|Y|Z|T attributes
  • have unknown axis attributes
  • have non-ASCII characters in string attributes
  • define the same axis name multiple times
  • have no units in T-axis
  • have incompatible time format in T-axis

try to repair all of the above problems except for problems concerning the T-axis which can't be fixed automatically

Making this module to work with nc3 just requires some methods into the cf1 module.

Creating NetCDF-CF files

There are a few ways to create a NetCDF file. In general, it's easiest to create the empty file with a descriptive method, and use a programming language to fill in the data.

Use CDL and ncgen

The CDL text you see above is fairly readable, and ncgen can turn it into a NetCDF file.

Use NCML

NCML, NetCDF Markup Language is an XML language for manipulating NetCDF. The designers provide Java implementation, and Datafed supports creating and verifying files.

Creating and Filling NetCDF Files using NCML and python

After installing Windows or Linux you have the datafed.nc3 and datafed.cf1 modules.

Here's a test ncml: CMAQ_Baron_20.ncml

Lets look at it one section at a time, You'll see it's just as simple as CDL

Global Attributes

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
	<explicit />
	<attribute name="title" type="string" value="CMAQ_Baron" />
	<attribute name="comment" type="string" value="Concentration file output From CMAQ model dyn alloc version CTM" />
	<attribute name="Conventions" type="string" value="CF-1.0" />

Dimensions

	<dimension name="time" length="0" isUnlimited="true" />
	<dimension name="lat" length="220" />
	<dimension name="lon" length="288" />


Time Dimension Variable

	<variable name="time" type="int" shape="time">
		<attribute name="standard_name" type="string" value="time" />
		<attribute name="long_name" type="string" value="time" />
		<attribute name="units" type="string" value="hours since 2009-01-01" />
		<attribute name="axis" type="string" value="T" />
	</variable>

Latitude and Longitude Dimension Variables

	<variable name="lat" type="double" shape="lat">
		<attribute name="standard_name" type="string" value="latitude" />
		<attribute name="long_name" type="string" value="latitude" />
		<attribute name="units" type="string" value="degrees_north" />
		<attribute name="axis" type="string" value="Y" />
		<values start="6.2" increment="0.05" />
	</variable>
	<variable name="lon" type="double" shape="lon">
		<attribute name="standard_name" type="string" value="longitude" />
		<attribute name="long_name" type="string" value="longitude" />
		<attribute name="units" type="string" value="degrees_east" />
		<attribute name="axis" type="string" value="X" />
		<values start="-91.6" increment="0.05" />
	</variable>

Data Variables

	<variable name="PM2_5" type="float" shape="time lat lon">
		<attribute name="long_name" type="string" value="PM2.5 concentration" />
		<attribute name="units" type="string" value="micrograms/m^3" />
		<attribute name="_FillValue" type="float" value="255" />
		<attribute name="missing_value" type="float" value="255" />
	</variable>
	<variable name="O3" type="float" shape="time lat lon">
		<attribute name="long_name" type="string" value="Ozone" />
		<attribute name="units" type="string" value="ppmV" />
		<attribute name="_FillValue" type="float" value="255" />
		<attribute name="missing_value" type="float" value="255" />
	</variable>
</netcdf>

Creating NetCDF Files using Python

Using the NCML file above, you can create the NetCDF file with a few lines of python:

import datafed
from datafed import cf1

cf1.create_ncml22("CMAQ_Baron_20.nc", "CMAQ_Baron_20.ncml")

This creates the dimensions and dimension variables, and leaves the data empty.

Now we can add a spacial slice of a given time:

import datafed
from datafed import cf1
import datetime

nc = cf1.open("CMAQ_Baron_20.nc", "w")

dt = datetime.datetime(2009, 1, 2)

# create array for one time slice
# 220 latitudes, 288 longitudes, NaN as initial value
data = cf1.create_array([220, 288], float("NaN")) 
for line in open("data.txt"): 
    #data.txt must have lines
    #x-index, y-index, value
    #
    #0, 0, 5.4
    #0, 1, 7.6
    #etc...

    x, y, v = line.split(",")
    data[int(y)][int(x)] = float(v)

nc.put_time_slice("PM2_5", slice, dt)

This will write the data array into PM2_5 variable, and also write "24" into the datetime variable, since the time is 24 hours from the time dimension start 2009-01-01.

Notice, that put_time_slice only supports updating or appending, inserting into the middle is not possible.

Personal tools
Namespaces

Variants
Actions
Navigation
Toolbox