Example CF-NetCDF for Satellite

Back to WCS Access to netCDF Files

A real-life example of how to download and store satellite data.

Total Ozone Mapping Spectrometer (TOMS) satellite

This data has been collected by three different satellites from 1979 to the present. It is downloadable as text files on a regular rectangular grid.

Nimbus satellite:

Grid size 288 * 180

EPTOMS satellite:

Grid size 288 * 180

Note that there is a gap in the data between 1993-05-06 (end of the Nimbus record) and 1996-07-22 (start of the EPTOMS record).

OMI satellite:

Grid size 360 * 180

The Python module AI_data.py contains the templates for the FTP data URLs, for example:

   template_path_omi = '/pub/omi/data/aerosol/Y%Y/L3_aersl_omi_%Y%m%d.txt'
   first_omi_datetime = datetime.datetime(2004, 9, 6)

%Y, %m and %d are Python strftime format codes: 4-digit year, 2-digit month and 2-digit day.

A programmer can now get the URL for a given day. The following call returns '/pub/omi/data/aerosol/Y2010/L3_aersl_omi_20100324.txt':

    AI_data.determine_ftp_path(datetime.datetime(2010, 3, 24))
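
The implementation of determine_ftp_path is not listed on this page. A minimal sketch of how such a lookup could work with strftime, reusing the template and start-date constants shown above (the selection between satellites is simplified here), might be:

    import datetime
    
    # Constants copied from the AI_data.py excerpt above
    template_path_omi = '/pub/omi/data/aerosol/Y%Y/L3_aersl_omi_%Y%m%d.txt'
    first_omi_datetime = datetime.datetime(2004, 9, 6)
    
    def determine_ftp_path(day):
        """Fill the strftime codes %Y, %m and %d in the template with the given date."""
        # The real AI_data.py also chooses between the Nimbus, EPTOMS and OMI
        # templates based on the date; this sketch handles only the OMI case.
        if day < first_omi_datetime:
            raise ValueError('date is before the first OMI file')
        return day.strftime(template_path_omi)
    
    print(determine_ftp_path(datetime.datetime(2010, 3, 24)))
    # -> /pub/omi/data/aerosol/Y2010/L3_aersl_omi_20100324.txt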

Creating an Empty CF-NetCDF File

We describe the AerosolIndex.nc CF-NetCDF file in the XML file AerosolIndex.ncml, because it is much easier to write a declarative description of the file than to create the netCDF file programmatically in script code. Once the NCML file is done, creating the netCDF file is a one-liner.

NetCDF Markup Language (NCML)

NCML documentation

The first line is the root element and namespace

   <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">

The explicit element means that all the metadata is declared here. The alternative, readMetadata, would require an existing source netCDF file.

       <explicit />

Global attributes. Notice that NCML does not require the CF conventions, so you have to declare the convention yourself.

       <attribute name="title" type="string" value="NASA TOMS Project" />
       <attribute name="comment" type="string" value="NASA Total Ozone Mapping Spectrometer Project" />
       <attribute name="Conventions" type="string" value="CF-1.0" />

Declare the dimensions. This is a 3-dimensional grid with time as the unlimited dimension. Since there are two grid sizes, 288 longitudes with 1.25-degree steps and 360 longitudes with 1.0-degree steps, we choose 360 steps for the longitude and adjust the Nimbus and EPTOMS data accordingly.

       <dimension name="time" length="0" isUnlimited="true" />
       <dimension name="lat" length="180" />
       <dimension name="lon" length="360" />

Time dimension. It is advisable to use an integer data type. If you have hourly data, don't use "days since 2000-01-01" with a float data type, since days = hours/24 does not have an exact decimal representation.

       <variable name="time" type="int" shape="time">
           <attribute name="standard_name" type="string" value="time" />
           <attribute name="long_name" type="string" value="time" />
           <attribute name="units" type="string" value="days since 1978-11-01" />
           <attribute name="axis" type="string" value="T" />
       </variable>
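
With these units each stored time value is just the whole number of days since 1978-11-01. For a given file date it can be computed like this (a small illustration, not code from the project):

    import datetime
    
    time_origin = datetime.datetime(1978, 11, 1)   # matches "days since 1978-11-01"
    
    def days_since_origin(day):
        """Integer time coordinate for the given date."""
        return (day - time_origin).days
    
    print(days_since_origin(datetime.datetime(2010, 3, 24)))   # 11466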

Geographical dimensions. The naming and attributes are important.

       <variable name="lat" type="double" shape="lat">
           <attribute name="standard_name" type="string" value="latitude" />
           <attribute name="long_name" type="string" value="latitude" />
           <attribute name="units" type="string" value="degrees_north" />
           <attribute name="axis" type="string" value="Y" />
           <values start="-89.5" increment="1" />
       </variable>
       <variable name="lon" type="double" shape="lon">
           <attribute name="standard_name" type="string" value="longitude" />
           <attribute name="long_name" type="string" value="longitude" />
           <attribute name="units" type="string" value="degrees_east" />
           <attribute name="axis" type="string" value="X" />
           <values start="-179.5" increment="1" />
       </variable>

The data variable. The _FillValue and missing_value should be the same, and NaN is the recommended value. Numeric sentinels like -999 are dangerous, since they can accidentally mess up averages and other statistics.

       <variable name="AI" type="float" shape="time lat lon">
           <attribute name="long_name" type="string" value="Aerosol Index" />
           <attribute name="units" type="string" value="fraction" />
           <attribute name="_FillValue" type="float" value="NaN" />
           <attribute name="missing_value" type="float" value="NaN" />
       </variable>
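
To see why a numeric sentinel is risky, compare how numpy averages a row that contains a missing cell marked with -999 versus one marked with NaN (hypothetical values):

    import numpy as np
    
    row = np.array([0.8, 1.2, -999.0])   # the sentinel silently pollutes the mean
    print(row.mean())                    # -332.33...
    
    row = np.array([0.8, 1.2, np.nan])   # NaN forces you to handle the gap explicitly
    print(row.mean())                    # nan
    print(np.nanmean(row))               # 1.0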

Closing tag

   </netcdf>

Create Script AI_create.py

Running the following Python program, AI_create.py, creates the empty netCDF cube.

   from datafed import cf1
   
   def create():
       cf1.create_ncml22('AerosolIndex.nc', 'AerosolIndex.ncml', '64bitoffset')
   
   if __name__ == "__main__":
       create()

Downloading the Data Files

It's possible to download a data file and append it directly into the netCDF cube without storing any temporary files. This approach has the drawback that if anything goes wrong, you have to download everything again. It doesn't even have to be an error: maybe you want to redo some data processing, and repeating the whole download is inconvenient. With today's disk capacities, it's better to first download the files and store them locally.

The module AI_ftp.py does just that: it stores the files locally and retrieves only files that are not already stored.
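
The source of AI_ftp.py is not included on this page. A rough sketch of the idea with ftplib from the standard library, assuming a hypothetical FTP host and local directory (the real module configures these itself), could look like this:

    import datetime
    import ftplib
    import os
    
    import AI_data   # provides determine_ftp_path()
    
    FTP_HOST = 'toms.gsfc.nasa.gov'   # assumption: the real host is configured in AI_ftp.py
    LOCAL_DIR = 'data'                # assumption: local storage directory
    
    def retrieve_new_files(first_day, last_day):
        """Download every daily text file in the date range that is not yet stored locally."""
        ftp = ftplib.FTP(FTP_HOST)
        ftp.login()                                    # anonymous login
        day = first_day
        while day <= last_day:
            remote_path = AI_data.determine_ftp_path(day)
            local_path = os.path.join(LOCAL_DIR, os.path.basename(remote_path))
            if not os.path.exists(local_path):         # retrieve only new files
                with open(local_path, 'wb') as f:      # error handling for missing days omitted
                    ftp.retrbinary('RETR ' + remote_path, f.write)
            day += datetime.timedelta(days=1)
        ftp.quit()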

Make URL from the Template

Download the Text File

Compile Data Files into CF-NetCDF

Compile the Text File into a rectangular array
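
No code is given for this section in this revision. Purely as a placeholder, here is a sketch that assumes a hypothetical simplified layout: a few header lines followed by the grid values as whitespace-separated integers, stored as aerosol index times 10 with 999 marking a missing cell. The real TOMS/OMI text files use a packed fixed-width layout, so the actual parser differs in how it slices the lines:

    import numpy as np
    
    N_HEADER_LINES = 3   # hypothetical: header lines to skip
    MISSING = 999        # hypothetical: marker for a missing cell
    SCALE = 0.1          # hypothetical: stored value = aerosol index * 10
    
    def compile_grid(filename, n_lat=180, n_lon=360):
        """Read one daily text file into an (n_lat, n_lon) float array, NaN for missing cells."""
        with open(filename) as f:
            lines = f.readlines()[N_HEADER_LINES:]
        values = np.array([int(v) for line in lines for v in line.split()],
                          dtype=np.float32)
        values[values == MISSING] = np.nan
        return (values * SCALE).reshape(n_lat, n_lon)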

Append the time slice
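
This section is also still empty in this revision. A minimal sketch of appending one daily grid along the unlimited time dimension, using the netCDF4 module for illustration (the project itself uses the datafed cf1 helpers), could be:

    import datetime
    
    import netCDF4
    
    TIME_ORIGIN = datetime.datetime(1978, 11, 1)   # matches the "days since 1978-11-01" units
    
    def append_time_slice(day, grid):
        """Append one (180, 360) Aerosol Index grid for the given day to AerosolIndex.nc."""
        nc = netCDF4.Dataset('AerosolIndex.nc', 'a')
        i = len(nc.dimensions['time'])              # next free index along the unlimited dimension
        nc.variables['time'][i] = (day - TIME_ORIGIN).days
        nc.variables['AI'][i, :, :] = grid          # grid shape must be (lat, lon) = (180, 360)
        nc.close()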