Creating NetCDF CF Files

From Earth Science Information Partners (ESIP)

Revision as of 17:53, July 16, 2009

UNDER CONSTRUCTION



NetCDF-CF Convention

The reason to use the CF convention is to enable plug-and-play connectivity.

Well-made NetCDF files are human-readable. After all, what could a dimension named longitude mean besides longitude? If you get data in NetCDF format, it's usually fairly easy to see what is really there.

It's also easy to write a generic browser that can display every variable for you.

But since much of the data in NetCDF files has geographical meaning, a graphical viewer should be able to draw the data on a map on its own. This involves, at minimum:

  • Finding the three geographical dimensions
  • Finding the time dimension, if any
  • Understanding the geographical projection

For a generic NetCDF file, this requires human intelligence: to program code, the n-dimensional data variables, dimension variables and other metadata variables all look exactly the same. There are countless ways to encode projection information, and decoding them reliably is very difficult.

Conventions come to the rescue. For example, if a variable has the attribute axis='X', there is only one interpretation for its values: the variable must have exactly one dimension, and its values are points on the X-axis along that dimension. No more guesswork, and no more wrong guesses, for programmers.
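To make this concrete, here is a minimal sketch in plain Python, with no NetCDF library involved. The file header is modeled as a hypothetical dictionary mapping variable names to their dimensions and attributes (roughly what a NetCDF reader would give you), and only the axis attribute decides which variable plays which role:

```python
# Hypothetical header model: {variable name: {"dims": (...), "attrs": {...}}}.
HEADER = {
    "time": {"dims": ("time",), "attrs": {"axis": "T", "units": "days since 1979-01-01"}},
    "lat": {"dims": ("lat",), "attrs": {"axis": "Y", "units": "degrees_north"}},
    "lon": {"dims": ("lon",), "attrs": {"axis": "X", "units": "degrees_east"}},
    "AI": {"dims": ("time", "lat", "lon"), "attrs": {"long_name": "Aerosol Index"}},
}


def find_axes(header):
    """Return {axis letter: variable name} for every variable carrying an axis attribute."""
    axes = {}
    for name, var in header.items():
        axis = var["attrs"].get("axis")
        if axis is None:
            continue  # not an axis variable, e.g. a data variable like AI
        # An axis variable must be one-dimensional; anything else is a broken file.
        if len(var["dims"]) != 1:
            raise ValueError(f"{name} has axis={axis!r} but {len(var['dims'])} dimensions")
        axes[axis] = name
    return axes


print(find_axes(HEADER))  # {'T': 'time', 'Y': 'lat', 'X': 'lon'}
```

No name matching is needed: the same function works whatever the variables are called.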


The best documentation is on the CF Metadata page.

Example

In short, the ncdump output for a simple NetCDF-CF file may look like this:

netcdf TOMS_AI_58 {
dimensions:
	time = 1 ;
	lat = 3 ;
	lon = 4 ;
variables:
	double time(time) ;
		time:standard_name = "time" ;
		time:long_name = "time" ;
		time:units = "days since 1979-01-01" ;
		time:axis = "T" ;
	double lat(lat) ;
		lat:standard_name = "latitude" ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:axis = "Y" ;
	double lon(lon) ;
		lon:standard_name = "longitude" ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:axis = "X" ;
	byte AI(time, lat, lon) ;
		AI:long_name = "Aerosol Index" ;
		AI:units = "fraction" ;
		AI:_FillValue = -1b ;
		AI:missing_value = -1b ;

// global attributes:
		:title = "NASA TOMS Project" ;
		:comment = "NASA Total Ozone Mapping Spectrometer Project" ;
		:Conventions = "CF-1.0" ;
data:

 time = 9952 ;

 lat = 32.5, 33.5, 34.5 ;

 lon = -89.375, -88.125, -86.875, -85.625 ;

 AI =
  0, 2, 1, 2,
  _, 2, 3, 2,
  1, 4, 4, 2 ;
}

Let's go over it section by section:

Dimensions

dimensions:
	time = 1 ;
	lat = 3 ;
	lon = 4 ;

These names can be anything. This matters because you may sometimes want to store two grids with different dimensions in the same file; in that case you could name the dimensions lat1, lat2, etc.
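Because the names are free, a program should never look for a dimension literally called lat. In NetCDF, a one-dimensional variable with the same name as its dimension is that dimension's coordinate variable, so the match is made by name equality plus the axis attribute. The sketch below, again on a hypothetical dictionary model of the header, resolves the dimensions of each data variable in a file holding two grids:

```python
# Hypothetical header of one file holding two grids with different dimensions.
HEADER = {
    "lat1": {"dims": ("lat1",), "attrs": {"axis": "Y"}},
    "lon1": {"dims": ("lon1",), "attrs": {"axis": "X"}},
    "lat2": {"dims": ("lat2",), "attrs": {"axis": "Y"}},
    "lon2": {"dims": ("lon2",), "attrs": {"axis": "X"}},
    "coarse": {"dims": ("lat1", "lon1"), "attrs": {}},
    "fine": {"dims": ("lat2", "lon2"), "attrs": {}},
}


def dimension_axes(header, var_name):
    """Map each dimension of var_name to the axis of its coordinate variable (or None)."""
    result = {}
    for dim in header[var_name]["dims"]:
        coord = header.get(dim)
        # A coordinate variable has the dimension's name and only that one dimension.
        if coord is None or coord["dims"] != (dim,):
            result[dim] = None
        else:
            result[dim] = coord["attrs"].get("axis")
    return result


print(dimension_axes(HEADER, "fine"))  # {'lat2': 'Y', 'lon2': 'X'}
```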

Time Dimension Variable

	double time(time) ;
		time:standard_name = "time" ;
		time:long_name = "time" ;
		time:units = "days since 1979-01-01" ;
		time:axis = "T" ;

It is the attribute axis = "T" that marks this variable as the time dimension variable.
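The units string is what turns the stored number into a date. Below is a minimal sketch in plain Python. It assumes the simple "<unit> since YYYY-MM-DD" form used in the example; real files may add a time of day, a time zone or a non-standard calendar, which this sketch does not handle. The function name decode_time is hypothetical:

```python
import datetime
import re

# Seconds per unit; only a few common CF time units are handled in this sketch.
_UNIT_SECONDS = {"days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}


def decode_time(value, units):
    """Convert a numeric time value, e.g. 9952 with units 'days since 1979-01-01'."""
    m = re.fullmatch(r"\s*(\w+)\s+since\s+(\d{4})-(\d{1,2})-(\d{1,2})\s*", units)
    if m is None or m.group(1) not in _UNIT_SECONDS:
        raise ValueError(f"unsupported time units: {units!r}")
    origin = datetime.datetime(int(m.group(2)), int(m.group(3)), int(m.group(4)))
    return origin + datetime.timedelta(seconds=value * _UNIT_SECONDS[m.group(1)])


print(decode_time(9952, "days since 1979-01-01"))  # 2006-04-01 00:00:00
```

So the single time step in the TOMS example, time = 9952, is 1 April 2006.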

Latitude and Longitude Dimension Variables

	double lat(lat) ;
		lat:standard_name = "latitude" ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:axis = "Y" ;
	double lon(lon) ;
		lon:standard_name = "longitude" ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:axis = "X" ;

Again, it is the attributes axis = "Y" and axis = "X" that mark the geographical dimension variables. The standard names latitude and longitude then identify the grid as a plain (linear) latitude-longitude grid.
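The axis attribute is the most direct marker, but CF also lets a program recognize latitude and longitude from standard_name or from units. Here is a small sketch of that fallback, working on a plain dictionary of attributes (a hypothetical representation); the unit spellings are the ones the CF conventions list as acceptable for latitude and longitude, and horizontal_role is a hypothetical name:

```python
# Unit strings the CF conventions accept for latitude and longitude.
LAT_UNITS = {"degrees_north", "degree_north", "degree_N", "degrees_N", "degreeN", "degreesN"}
LON_UNITS = {"degrees_east", "degree_east", "degree_E", "degrees_E", "degreeE", "degreesE"}


def horizontal_role(attrs):
    """Return 'Y', 'X' or None for a coordinate variable, given its attributes."""
    axis = attrs.get("axis")
    if axis in ("X", "Y"):
        return axis  # an explicit axis attribute wins
    # Fall back on standard_name, then on units.
    if attrs.get("standard_name") == "latitude" or attrs.get("units") in LAT_UNITS:
        return "Y"
    if attrs.get("standard_name") == "longitude" or attrs.get("units") in LON_UNITS:
        return "X"
    return None
```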

Data Variable

	byte AI(time, lat, lon) ;
		AI:long_name = "Aerosol Index" ;
		AI:units = "fraction" ;
		AI:_FillValue = -1b ;
		AI:missing_value = -1b ;

Its dimensions all have dimension variables marked with axis attributes, so this is a data variable. The units = "fraction" attribute is not a standard unit, and therefore the compliance checker reports it as an error.
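ncdump prints cells that equal _FillValue as _, while in the file that cell simply holds the byte -1. A reader has to mask such cells before using the data. A minimal sketch on plain Python lists, using the AI values from the example (mask_fill is a hypothetical helper):

```python
FILL_VALUE = -1  # AI:_FillValue = -1b in the example

# The AI values from the ncdump example, with the "_" cell stored as the fill value.
AI = [
    [0, 2, 1, 2],
    [-1, 2, 3, 2],
    [1, 4, 4, 2],
]


def mask_fill(grid, fill_value):
    """Replace fill values by None so they cannot be mistaken for real data."""
    return [[None if v == fill_value else v for v in row] for row in grid]


masked = mask_fill(AI, FILL_VALUE)
print(masked[1])  # [None, 2, 3, 2]
```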

Global Attributes

		:title = "NASA TOMS Project" ;
		:comment = "NASA Total Ozone Mapping Spectrometer Project" ;
		:Conventions = "CF-1.0" ;

Only Conventions = CF-1.x is required.
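A program consuming files can use this attribute to decide whether to apply the CF rules at all. A small sketch follows; it tolerates several conventions listed in one attribute, separated by blanks or commas, which is an assumption about the files you may meet rather than something the example shows. cf_version is a hypothetical name:

```python
import re


def cf_version(conventions):
    """Return the CF version string (e.g. '1.0') from a Conventions attribute, or None."""
    # Tolerate a list of several conventions separated by blanks or commas.
    for token in re.split(r"[\s,]+", conventions.strip()):
        if token.startswith("CF-1."):
            return token[len("CF-"):]
    return None
```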

Verifying NetCDF-CF Files

Since the CF-1.0 conventions contain a lot of definitions, verifying files by machine is necessary. There is a fairly complete compliance checker online, and the same site has some NetCDF and CF convention documentation too. The latest compliance checker is here; it lets you upload a NetCDF file and runs a wide range of checks.

TODO: the checker is just a standard HTML form submitted with HTTP POST, so it should be easy to use it as a service from Python. The owsadmin tool should clone a subset of a NetCDF cube and submit it.
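As a starting point for that TODO, the upload can be packaged with the Python standard library alone. This is a sketch under stated assumptions: the checker's address is not given on this page, so CHECKER_URL is a hypothetical placeholder, and the form field name "file" is a guess that must be checked against the real form. The request is only built here, not sent:

```python
import urllib.request
import uuid

# Hypothetical placeholder: this page links to the checker only as "here",
# so the real address and form field name must be taken from the form itself.
CHECKER_URL = "http://example.invalid/cf-checker"


def build_checker_request(payload, filename, field_name="file", url=CHECKER_URL):
    """Wrap NetCDF bytes in a multipart/form-data POST, like a browser file upload."""
    boundary = uuid.uuid4().hex
    disposition = f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"'
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        (disposition + "\r\n").encode("utf-8"),
        b"Content-Type: application/octet-stream\r\n\r\n",
        payload,
        f"\r\n--{boundary}--\r\n".encode("ascii"),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Usage would be to read the file in binary mode, pass its bytes and name to build_checker_request, and hand the result to urllib.request.urlopen once CHECKER_URL points at the real form. Parsing the HTML report that comes back is a separate step.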

Creating NetCDF-CF files

There are a few ways to create a NetCDF file. In general, it's much easier to create the empty file from a declarative description and use a programming language to fill in the data.

Use CDL and ncgen

The CDL text you see above is fairly readable, and ncgen can turn it into a NetCDF file.
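If you want to script this step, one option is to generate the CDL text from Python and hand it to ncgen. The sketch below writes a minimal CF latitude-longitude header with coordinate values. The function names are hypothetical, and run_ncgen assumes the ncgen command-line tool from the NetCDF distribution is on your PATH:

```python
import subprocess


def grid_cdl(name, lats, lons):
    """Return CDL text for a CF latitude-longitude grid with the given cell centers."""
    def fmt(values):
        return ", ".join(repr(float(v)) for v in values)

    lines = [
        f"netcdf {name} {{",
        "dimensions:",
        f"\tlat = {len(lats)} ;",
        f"\tlon = {len(lons)} ;",
        "variables:",
        "\tdouble lat(lat) ;",
        '\t\tlat:standard_name = "latitude" ;',
        '\t\tlat:units = "degrees_north" ;',
        '\t\tlat:axis = "Y" ;',
        "\tdouble lon(lon) ;",
        '\t\tlon:standard_name = "longitude" ;',
        '\t\tlon:units = "degrees_east" ;',
        '\t\tlon:axis = "X" ;',
        "",
        "// global attributes:",
        '\t\t:Conventions = "CF-1.0" ;',
        "data:",
        f" lat = {fmt(lats)} ;",
        f" lon = {fmt(lons)} ;",
        "}",
    ]
    return "\n".join(lines) + "\n"


def run_ncgen(cdl_text, cdl_path, nc_path):
    """Write the CDL to cdl_path and let ncgen build the binary NetCDF file nc_path."""
    with open(cdl_path, "w") as f:
        f.write(cdl_text)
    subprocess.run(["ncgen", "-o", nc_path, cdl_path], check=True)
```

For example, run_ncgen(grid_cdl("test", [32.5, 33.5, 34.5], [-89.375, -88.125]), "test.cdl", "test.nc") would produce an empty-of-data grid file whose coordinates match the TOMS example's first cells.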

Use NCML

NCML, the NetCDF Markup Language, is an XML language for describing and manipulating NetCDF datasets. Its designers provide a Java implementation, and Datafed supports creating and verifying files with it.
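An NCML description is ordinary XML, so it can also be generated with the Python standard library. The sketch below builds a small grid description using NcML 2.2 elements (netcdf, dimension, variable, attribute, values) in the NcML 2.2 namespace. Whether datafed's tools accept this exact file is not confirmed here; the test ncml linked below is the reference for what they expect, and ncml_grid is a hypothetical name:

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def ncml_grid(lats, lons):
    """Return NCML text describing a CF latitude-longitude grid with coordinate values."""
    # Declaring xmlns on the root puts every child element in the NcML namespace.
    root = ET.Element("netcdf", xmlns=NCML_NS)
    coords = (("lat", lats, "latitude", "degrees_north", "Y"),
              ("lon", lons, "longitude", "degrees_east", "X"))
    for name, values, _, _, _ in coords:
        ET.SubElement(root, "dimension", name=name, length=str(len(values)))
    for name, values, std_name, units, axis in coords:
        var = ET.SubElement(root, "variable", name=name, shape=name, type="double")
        ET.SubElement(var, "attribute", name="standard_name", value=std_name)
        ET.SubElement(var, "attribute", name="units", value=units)
        ET.SubElement(var, "attribute", name="axis", value=axis)
        ET.SubElement(var, "values").text = " ".join(str(v) for v in values)
    # Global attribute: a direct child of the netcdf element.
    ET.SubElement(root, "attribute", name="Conventions", value="CF-1.0")
    return ET.tostring(root, encoding="unicode")


print(ncml_grid([32.5, 33.5, 34.5], [-89.375, -88.125]))
```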

Creating and Filling NetCDF Files using NCML and python

After installing the WCS wrapper on Windows or Linux, you have the datafed.nc3 and datafed.cf1 modules.

Here's a test ncml: CMAQ_Baron_20.ncml

You can create the NetCDF file with a few lines of python:

from datafed import cf1

cf1.create_ncml22("CMAQ_Baron_20.nc", "CMAQ_Baron_20.ncml")

This creates the dimensions and dimension variables, and leaves the data empty.

Now we can add a daily slice:

import datetime
from datafed import cf1

nc = cf1.open("CMAQ_Baron_20.nc")

dt = datetime.datetime(2009, 1, 2)

data = cf1.create_array([1, 220, 288])  # one time, 220 latitudes, 288 longitudes

# data.txt holds one grid cell per line: x-index, y-index, value
#   0, 0, 5.4
#   0, 1, 7.6
#   etc...
with open("data.txt") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines, which would break the unpacking below
        x, y, v = line.split(",")
        data[0][int(y)][int(x)] = float(v)  # index order is [time][lat][lon]

nc.write_time_slice("PM2_5", dt, data)

This writes the data array into the PM2_5 variable and also writes "24" into the datetime variable, since 2009-01-02 is 24 hours after the time dimension start, 2009-01-01.
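The stored number is just the offset from the start of the time dimension, in hours. The arithmetic, as a plain-Python sketch (hours_since is a hypothetical helper, not part of datafed):

```python
import datetime


def hours_since(start, when):
    """Offset of `when` from the time dimension start, in hours."""
    return (when - start).total_seconds() / 3600


start = datetime.datetime(2009, 1, 1)  # time dimension start in the example
print(hours_since(start, datetime.datetime(2009, 1, 2)))  # 24.0
```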

Note that write_time_slice only supports updating an existing slice or appending a new one; inserting into the middle is not possible.
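The rule can be modeled in a few lines. This is not datafed's implementation (which is not shown on this page); slice_index is a hypothetical helper that, given the sorted offsets already stored in the time variable, decides whether a new slice is an update, an append, or an impossible insertion:

```python
import bisect


def slice_index(existing, offset):
    """Decide where a time slice at `offset` goes, given the sorted existing offsets.

    Returns (index, action) where action is 'update' or 'append'; raises ValueError
    for an offset that would have to be inserted between existing slices.
    """
    i = bisect.bisect_left(existing, offset)
    if i < len(existing) and existing[i] == offset:
        return i, "update"  # same time as an existing slice: overwrite it
    if i == len(existing):
        return i, "append"  # later than every existing slice: grow the time dimension
    raise ValueError(f"offset {offset} would have to be inserted at position {i}")
```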
