Talk:WCS NetCDF Development

From Earth Science Information Partners (ESIP)
Revision as of 05:10, July 16, 2009 by Michael Decker (MDecker) (talk | contribs)

Time filtering implementation -- Michael Decker (MDecker) 07:10, 16 July 2009 (EDT)

I have trouble understanding how the current time filtering code works; I basically just copied it. It appears to do roughly what it should, but I find it very hard to predict what it would filter out under given conditions.

I think the biggest problem is the support for lists of individual times. Just limiting it to a single time range would make it much easier. According to the WCS specs we would have to support an arbitrary number of time positions and intervals. But I think the current code only supports at most one time range or two individual time positions.

My impression is that we have to come up with a better and more understandable way of handling this. And either we go back to supporting only one time range or we try to support it fully, which will be much more work I guess.
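To make the "support it fully" option concrete, here is a minimal sketch and not the current code. It assumes the request value is a comma-separated list in which each item is either a single time position or a `start/end` interval, and it keeps a time step if any item matches. The function names are hypothetical, and an optional third `/resolution` part is ignored.

```python
from datetime import datetime

def parse_time(text):
    # Accepts ISO 8601 dates or date-times without a time zone,
    # e.g. 2009-07-16 or 2009-07-16T12:00:00.
    return datetime.fromisoformat(text)

def parse_time_sequence(value):
    """Split 'a,b/c,d' into (start, end) tuples; a single position gets start == end."""
    items = []
    for part in value.split(","):
        bounds = part.strip().split("/")
        start = parse_time(bounds[0])
        # A third element (resolution) is ignored in this sketch.
        end = parse_time(bounds[1]) if len(bounds) > 1 else start
        if end < start:
            raise ValueError("interval ends before it starts: %s" % part)
        items.append((start, end))
    return items

def filter_times(available, value):
    """Return the available time steps matched by any position or interval."""
    items = parse_time_sequence(value)
    return [t for t in available
            if any(start <= t <= end for start, end in items)]
```

With every item reduced to a closed interval, the filter rule is a single comparison, so it should be easy to predict what a given request keeps.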

Doing Version Upgrade -- Hoijarvi 10:40, 10 July 2009 (EDT)

I'm doing a full upgrade of the system now. This involves:

- upgrade 3rd party components:

  • python to 2.6.2
  • NetCDF3
  • lxml to 2.2.2
  • webpy to 0.32

DONE with above

sources are available with darcs from http://webapps.datafed.net/nest/OWS

I'll check the licenses to see if I can make one bundle for download.

- Merge Michael's improvements and fixes to development tree

- Converting the source repository to darcs v2

Re: Doing Version Upgrade -- Michael Decker (MDecker) 04:48, 14 July 2009 (EDT)

I took the chance to upgrade to lxml 2.2.2 and webpy 0.32 as well. We are also running the latest numpy 1.3.0 now. The OWS server was up and running again in minutes thanks to your webpy patches.

-- Hoijarvi 16:14, 8 July 2009 (EDT)

Starting a NetCDF-CF-WCS brain dump. Feel free to send feedback.

-- Michael Decker (MDecker) 06:49, 9 July 2009 (EDT)

This page seems to be a good idea to discuss our efforts instead of mailing all the time.

python nc3-clone (pync3)

I'm currently working on a python version of nc3 that offers the same interface to slice subcubes as nc3 does. I did some first tests yesterday and so far all the output files produced by nc3 as well as pync3 come out with identical contents.

On top of that I also added some handling of variables that are not processed by nc3 right now because they do not depend on X and Y or on other dimensions (like bnds).

Performance will probably be something that takes some more time to look at. Yesterday I benchmarked with the biggest file I could find that was compatible with nc3 (and pync3), and right now the Python implementation takes around double the time of the C++ version. I have to do some more tests, but I'm pretty sure most of it is due to inefficient slicing operations and big temporary numpy arrays. There is a selection mechanism built into nio itself, so I will try to figure out if we can use that. Hopefully it is implemented a bit more efficiently, or at least with fewer temporary copies of data.

I'm not expecting to get the Python implementation up to the same speed, but maybe it's still possible to get it a little closer (a slowdown of only 30-50% would be nice). A big advantage of the Python implementation is certainly its far easier maintainability. It's a shame that pynio does not exist for Windows.

Currently I'm wondering if I should try to build wrapper functions for all the native NetCDF calls that are supported by nc3. The goal would be to have exactly the same interface for both modules so that software built on it can be the same. However, I'm not sure how easy it is to build these wrapper functions based on nio. Especially as there is quite some abstraction and information like variableIDs and such are not really accessible. An alternative would be to pipe all the low-level calls through to the real nc3 module. But my hope would be that we can easily get the WCS server to run on all unix platforms without having to compile the nc3 module and instead just use nio and the python module.

Re: python nc3-clone (pync3) -- Hoijarvi 10:09, 10 July 2009 (EDT)

I'll wait with this until I have other patches applied.

request format for RangeSubset

Yesterday I was also trying to request more than one variable using the RangeSubset parameter. The first problem was that ';' is usually used as a delimiter between key-value-pairs. So the Python cgi parser module does not handle this part correctly and instead sees every variable name as a parameter with no value. Only the first variable name is kept as a value of the RangeSubset-parameter. To fix this without having to change the parser and because ';' is not a delimiter mentioned in the WCS specs, I chose to replace it by something else to get it through the parser.
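A minimal sketch of the idea, under assumptions: the post does not say what ';' is replaced with, so percent-encoding it as `%3B` is only one possible choice. The sketch uses `urllib.parse` from current Python instead of the cgi module, and the function names are hypothetical.

```python
from urllib.parse import parse_qs

def parse_query(query_string):
    # Percent-encode ';' so that no parser version treats it as a
    # separator between key-value pairs (assumed replacement).
    safe = query_string.replace(";", "%3B")
    # parse_qs decodes %3B back to ';' inside the value.
    parsed = parse_qs(safe)
    return dict((key, values[0]) for key, values in parsed.items())

def range_subset_fields(query_string):
    """Return the variable names listed in the RangeSubset parameter."""
    value = parse_query(query_string).get("RangeSubset", "")
    return [name for name in value.split(";") if name]
```

For example, `range_subset_fields("service=WCS&RangeSubset=temp;pressure")` yields both variable names instead of only the first one.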

Another thing I found was that parsing of the RangeSubset values probably does not agree with the specs right now: section 10.2.2.2 of the WCS 1.1.0 specs does not mention putting "fieldSubset:" in front of every variable name. Instead just the variable name should be used.

Both changes are available via our darcs repository if you want to have a look.

Re: request format for RangeSubset -- Hoijarvi 10:19, 10 July 2009 (EDT)

You're right. I misinterpreted the "fieldSubset:"; there's no such prefix here. I have to fix this in our registration on a server-by-server basis, since places like Northrop are running the old WCS, and until all those are dead, I have to work around it.
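A server-by-server workaround could look roughly like the sketch below. The `legacy` flag and the function name are hypothetical, and the exact old syntax (prefix on every name, ';' between names) is inferred from this discussion, not checked against the old server code.

```python
LEGACY_PREFIX = "fieldSubset:"

def format_range_subset(fields, legacy=False):
    """Build the RangeSubset value for one server.

    legacy=True is for servers still running the old WCS code, which
    expects the prefix in front of every variable name (assumed syntax).
    """
    if legacy:
        fields = [LEGACY_PREFIX + name for name in fields]
    return ";".join(fields)
```

The registration entry for each server would then only need to record whether it is a legacy server.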

PyNIO problems

It turns out that nio has a bug that is pretty mean when you try to access NetCDF files created by nio with the low-level C functions: When creating string attributes, nio allocates one byte too much for the string and saves the wrong length in the file. So when you create the axis attribute with a one character string, nio saves a length of 2 in the file while the string is really only 1 character long. Reading this back with nio seems to be no problem, but when using the C-functions in nc3, the wrong length is reported and the module refuses to read the attribute. A workaround seems to be to write the same attribute a 2nd time after it has been created. This fixes the length information and the data can be read with nc3 as well.

I will bring this up on the pyngl mailing list and hope they will fix it for the next release.

Re: PyNIO problems -- Hoijarvi 10:21, 10 July 2009 (EDT)

What is the extra character? If it's 0x00, I could just read the string and it will compare correctly to 'X'.

Re: Re: PyNIO problems -- Michael Decker (MDecker) 03:52, 13 July 2009 (EDT)

It's 0x00 as far as I could see with a hex editor. It would be great to have a quick fix for that in the nc3 code because it seems some of the data files we deliver have been made with a faulty nio version.

It might still take some time for the next official nio release that fixes the problem. With the nc3 code fixed it should not bother us that much any more though.
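The reader-side fix discussed above amounts to ignoring trailing 0x00 bytes before comparing. nc3 itself is C++, so the following is only a Python sketch of the same idea, with hypothetical function names.

```python
def clean_attribute(value):
    """Strip trailing NUL padding from a string attribute read from a file."""
    if isinstance(value, bytes):
        value = value.decode("ascii", "replace")
    return value.rstrip("\x00")

def is_x_axis(axis_attribute):
    # Compares correctly whether or not the file carries the extra 0x00.
    return clean_attribute(axis_attribute) == "X"
```

This keeps files written by a faulty nio version readable without rewriting their attributes.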

CF-1.0 checker

There is an online checker linked at http://cf-pcmdi.llnl.gov/conformance/compliance-checker.

This checker can validate CF-1.0 to 1.4 and is apparently written in Python. It would be perfect if we could simply use that instead of having to do our own development. Unfortunately I was not able to find a downloadable version so far. I have no idea if they are willing to release it at all, but if we can get it, it would save us quite some work. I have tried it with one of our files yesterday and I only got a stack trace back, so I guess it's not all that robust just yet. So it seems they could use some help with bugfixing anyway. Does anyone know more about this? Maybe we can arrange something to get the code?

location and format of logfiles

As stated before, I would like to move the logs directory out of the static directory. On Linux I would like to put it in /var/log/OWS or something like that; I have no idea where it should go on Windows. The main idea is to prevent access to the logs via the web server.

I'm also wondering about the log format. It would be much handier for me if the logs came in Apache format. Then I could use the existing logwatch scripts to do some analysis and summaries easily.
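For reference, one entry in Apache Common Log Format can be produced as in this sketch. The function name and its parameters are hypothetical; the ident and user fields are left as "-", and a zero-byte response is also written as "-".

```python
from datetime import datetime, timezone

def common_log_line(host, when, request_line, status, size):
    """Format one entry in Apache Common Log Format.

    `when` must be a time-zone-aware datetime so that %z yields an offset.
    """
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    size_field = str(size) if size else "-"
    return '%s - - [%s] "%s" %d %s' % (
        host, stamp, request_line, status, size_field)
```

Note that `%b` follows the process locale, so the month abbreviation is only guaranteed to be English in the default C locale.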

Re: location and format of logfiles -- Hoijarvi 10:24, 10 July 2009 (EDT)

Do whatever you please with logs. They have not been a priority for me, and I don't depend on any formats. Using a standard format is an obvious choice.

Re: Re: location and format of logfiles -- Michael Decker (MDecker) 12:37, 14 July 2009 (EDT)

I have played with the logging, but it seems impossible to make the webpy standalone server output another log format without changing its code.

So what I'm doing for now is simply writing the webpy log output to a file instead of the console. Unfortunately, another package is required to log to a file: WsgiLog. So I have made logging to a file optional, and it is still possible to run the server as before without the package. Logging to a file should also work on Windows when WsgiLog is installed.

When logging to a file it is also possible to zero the last 8 bits of the IP address to "anonymize" the logs a little. I put this in because of privacy concerns that some people in Germany raise when it comes to logging people's IPs.
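Zeroing the last 8 bits can be sketched as follows, assuming IPv4 addresses only. The function name is hypothetical and this is not the code in the repository.

```python
import ipaddress

def anonymize_ipv4(address):
    """Zero the last 8 bits of an IPv4 address, e.g. before logging it."""
    ip = ipaddress.IPv4Address(address)
    masked = int(ip) & 0xFFFFFF00  # keep the first three octets
    return str(ipaddress.IPv4Address(masked))
```

With this, `anonymize_ipv4("192.0.2.123")` gives `"192.0.2.0"`, so all clients in the same /24 network look alike in the log.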

I also changed the log path to "/var/log/ows/" for posix OSes and left it unchanged for the rest for now.

For a production system it might be best to go for a real HTTP server in the long run. According to the webpy documentation it should be easy to set up with FastCGI. I'm sure it will be much easier to take care of proper logging there.

misc

- "Allow queries across longitude, from 179..-179 or 359..1 as a two-degree query, as the spec says." I did not find that in the specs so far. Does it mean you only have to support a two-degree query or should every query across the longitude be supported? I will try to add this to pync3 today.

- "Improve the admin tool to give better error messages and progress messages." With the added check tool it already gives a little more info about what could be wrong, but of course that is far from complete.

- "2009-06-04: After being used for some datasets at Datafed, we get the first external developer, M. Decker, to contribute to the source code. He adds Unicode support." I did not do real unicode support so far. We have talked about it and came to the conclusion that it was a bad idea before NetCDF4. The only thing related to charsets is in the check tool. It converts non-ASCII characters in string attributes.

Re: misc -- Hoijarvi 10:31, 9 July 2009 (EDT)

In document 06-083r8 WCS 1.1

7.7.2 Spanning the antimeridian of a geographic CRS

In a geographic CRS, longitude values are often defined with a finite extent (e.g., from -180 degrees to +180 degrees). The minimum and maximum longitude limits define a single line (on the ellipsoid, sphere, or cylinder), known as the antimeridian, across which longitude values are discontinuous: as this line is crossed, longitude changes abruptly (e.g., going West from a little more than -180 degrees to a little less than +180 degrees). This necessitates an “extended” interpretation of the Bounding Box definition:

A Bounding Box defined in a geographic CRS (or a WGS84BoundingBox) whose LowerCorner longitude is greater than that of its UpperCorner shall describe a region that crosses the longitude discontinuity.

I've also changed the "unicode" to "non-ascii characters".

Re: Re: misc -- Michael Decker (MDecker) 04:50, 14 July 2009 (EDT)

pync3 supports this now, and for the latitude as well, just in case.
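The rule quoted from section 7.7.2 can be sketched for the longitude axis as below. This is an illustration with a hypothetical function name, not the pync3 code; it assumes the request bounds use the same longitude convention as the axis (-180..180 or 0..360).

```python
def select_lon_indices(lons, lower, upper):
    """Return axis indices inside the longitude range.

    lower > upper means the box crosses the longitude discontinuity,
    so the result is the part at or above `lower` followed by the
    part at or below `upper`.
    """
    if lower <= upper:
        return [i for i, lon in enumerate(lons) if lower <= lon <= upper]
    first_part = [i for i, lon in enumerate(lons) if lon >= lower]
    second_part = [i for i, lon in enumerate(lons) if lon <= upper]
    return first_part + second_part
```

On a 0..359 axis, a request from 359 to 1 selects the cells at 359, 0 and 1 in that order, which matches the two-degree query mentioned above.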