Difference between revisions of "Talk:WCS NetCDF Development"
Revision as of 07:24, July 10, 2009
-- Hoijarvi 16:14, 8 July 2009 (EDT)
Starting a NetCDF-CF-WCS brain dump. Feel free to send feedback.
-- Michael Decker (MDecker) 06:49, 9 July 2009 (EDT)
This page seems to be a good idea to discuss our efforts instead of mailing all the time.
python nc3-clone (pync3)
I'm currently working on a python version of nc3 that offers the same interface to slice subcubes as nc3 does. I did some first tests yesterday and so far all the output files produced by nc3 as well as pync3 come out with identical contents.
On top of that I also added some handling of variables that are not processed by nc3 right now because they do not depend on X and Y or on other dimensions (like bnds).
Performance will probably be something that takes some more time to look at. Yesterday I benchmarked with the biggest file I could find that was compatible with nc3 (and pync3), and right now the python implementation takes around double the time of the C++ version. I have to do some more tests, but I'm pretty sure most of it is due to inefficient slicing operations and big temporary numpy arrays. There is a selection mechanism built into nio itself, so I will try to figure out if we can use that. Hopefully it will be implemented a bit more efficiently and at least with fewer temporary copies of data.
I'm not expecting to get the python implementation up to the same speed, but maybe it's still possible to get it a little closer (only a 30-50% slowdown would be nice). A big advantage of the python implementation is certainly its far easier maintainability, and it's a shame that pynio does not exist for Windows.
Currently I'm wondering if I should try to build wrapper functions for all the native NetCDF calls that are supported by nc3. The goal would be to have exactly the same interface for both modules so that software built on it can be the same. However, I'm not sure how easy it is to build these wrapper functions based on nio, especially as there is quite some abstraction, and information like variable IDs is not really accessible. An alternative would be to pipe all the low-level calls through to the real nc3 module. But my hope would be that we can easily get the WCS server to run on all unix platforms without having to compile the nc3 module and instead just use nio and the python module.
Re: python nc3-clone (pync3) -- Hoijarvi 10:09, 10 July 2009 (EDT)
I'll wait with this until I have other patches applied.
request format for RangeSubset
Yesterday I was also trying to request more than one variable using the RangeSubset parameter. The first problem was that ';' is usually used as a delimiter between key-value-pairs. So the Python cgi parser module does not handle this part correctly and instead sees every variable name as a parameter with no value. Only the first variable name is kept as a value of the RangeSubset-parameter. To fix this without having to change the parser and because ';' is not a delimiter mentioned in the WCS specs, I chose to replace it by something else to get it through the parser.
Another thing I found was that parsing of the RangeSubset values probably does not agree with the specs right now: section 10.2.2.2 of the WCS 1.1.0 specs does not mention putting "fieldSubset:" in front of every variable name. Instead just the variable name should be used.
Both changes are available via our darcs repository if you want to have a look.
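To illustrate the ';' problem described above, here is a minimal sketch of one way to get a multi-variable RangeSubset value through a query-string parser that treats ';' as a pair separator. It is not the change from the darcs repository; the function names and the variable names in the usage note are made up for illustration, and percent-encoding is just one possible replacement for the ';'.

```python
from urllib.parse import parse_qs


def parse_wcs_query(query_string):
    """Parse a KVP query string while keeping ';' inside parameter values.

    Assumption: percent-encoding ';' before parsing is an acceptable
    stand-in for "replace it by something else"; the parser decodes it
    back to ';' in the value.
    """
    protected = query_string.replace(";", "%3B")
    params = parse_qs(protected, keep_blank_values=True)
    # Keep only the first value of each key, as a plain dict.
    return {key: values[0] for key, values in params.items()}


def split_range_subset(value):
    """Split a RangeSubset value into plain variable names."""
    return [name for name in value.split(";") if name]
```

For example, `parse_wcs_query("service=WCS&RangeSubset=temp;pressure")` keeps `temp;pressure` together as the RangeSubset value, and `split_range_subset` then yields the two names.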
Re: request format for RangeSubset -- Hoijarvi 10:19, 10 July 2009 (EDT)
You're right. I misinterpreted the "fieldSubset:"; there's no such prefix here. I have to fix this in our registration on a server-by-server basis, since places like Northrop are running the old WCS, and until all those are dead, I have to work around it.
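A per-server workaround of this kind could look roughly like the sketch below. The function name and the `legacy_prefix` flag are hypothetical; how the registration actually records which servers still run the old WCS is not covered here.

```python
def build_range_subset(field_names, legacy_prefix=False):
    """Build a RangeSubset value from a list of variable names.

    legacy_prefix is a hypothetical per-server switch: when True, every
    name gets the old "fieldSubset:" prefix that older servers expect.
    """
    prefix = "fieldSubset:" if legacy_prefix else ""
    return ";".join(prefix + name for name in field_names)
```

With the flag off, the result is just the variable names joined by ';'; with it on, each name carries the old prefix.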
PyNIO problems
It turns out that nio has a bug that is pretty mean when you try to access NetCDF files created by nio with the low-level C functions: when creating string attributes, nio allocates one byte too many for the string and saves the wrong length in the file. So when you create the axis attribute with a one-character string, nio saves a length of 2 in the file while the string is really only 1 character long. Reading this back with nio seems to be no problem, but when using the C functions in nc3, the wrong length is reported and the module refuses to read the attribute. A workaround seems to be to write the same attribute a second time after it has been created. This fixes the length information and the data can be read with nc3 as well.
I will bring this up on the pyngl mailing list and hope they will fix it for the next release.
Re: PyNIO problems -- Hoijarvi 10:21, 10 July 2009 (EDT)
What is the extra character? If it's 0x00, I could just read the string and it will compare correctly to 'X'.
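If the extra byte does turn out to be 0x00 (not confirmed in this thread), a tolerant read could be sketched like this in Python. The function name is made up, and the raw bytes stand in for whatever the low-level read returns.

```python
def normalize_text_attribute(raw):
    """Decode attribute bytes, dropping trailing NUL padding.

    Assumption: the surplus byte is 0x00. Any other surplus byte is
    left in place and would still make the comparison fail.
    """
    return raw.rstrip(b"\x00").decode("ascii")
```

An axis attribute stored as `b"X\x00"` would then compare equal to `"X"`.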
CF-1.0 checker
There is an online checker linked at http://cf-pcmdi.llnl.gov/conformance/compliance-checker.
This checker can validate CF-1.0 to 1.4 and is apparently written in Python. It would be perfect if we could simply use that instead of having to do our own development. Unfortunately I have not been able to find a downloadable version so far. I have no idea if they are willing to release it at all, but if we can get it, it would save us quite some work. I tried it with one of our files yesterday and only got a stack trace back, so I guess it's not all that robust just yet, and it seems they could use some help with bugfixing anyway. Does anyone know more about this? Maybe we can arrange something to get the code?
location and format of logfiles
As stated before, I would like to move the logs directory out of the static directory. On Linux I would like to put it in /var/log/OWS or something like that; I have no idea where it should go on Windows. The main idea is to prevent access to the logs via the web server.
I'm also wondering about the log format. For me it would be much more handy if the logs came in Apache format. Then I could easily use the existing logwatch scripts to do some analysis and summaries.
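As a rough sketch of what "Apache format" could mean here, the function below writes one line in the Apache Common Log Format (host, identity, user, timestamp, request line, status, size). The function name, the example host and the request path are assumptions for illustration; nothing here reflects how the server currently logs.

```python
from datetime import datetime, timezone


def format_common_log(host, request_line, status, size, when, user="-"):
    """Format one access-log line in Apache Common Log Format.

    `when` must be a timezone-aware datetime so that %z yields an offset.
    A size of 0 is written as "-", as Apache does for %b.
    """
    timestamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    size_field = str(size) if size else "-"
    return f'{host} - {user} [{timestamp}] "{request_line}" {status} {size_field}'
```

Lines in this shape should be readable by tools that already understand Apache access logs, which is the point of the suggestion above.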
Re: location and format of logfiles -- Hoijarvi 10:24, 10 July 2009 (EDT)
Do whatever you please with logs. They have not been a priority for me, and I don't depend on any formats. Using a standard format is an obvious choice.
misc
- "Allow queries across longitude, from 179..-179 or 359..1 as a two-degree query, as the spec says." I did not find that in the specs so far. Does it mean you only have to support a two-degree query or should every query across the longitude be supported? I will try to add this to pync3 today.
- "Improve the admin tool to give better error messages and progress messages." With the added check tool it already gives a little more info about what could be wrong, but of course that is far from complete.
- "2009-06-04: After being used for some datasets at Datafed, we get the first external developer, M. Decker, to contribute to the source code. He adds Unicode support." I did not do real unicode support so far. We have talked about it and came to the conclusion that it was a bad idea before NetCDF4. The only thing related to charsets is in the check tool. It converts non-ASCII characters in string attributes.
Re: misc -- Hoijarvi 10:31, 9 July 2009 (EDT)
In document 06-083r8 WCS 1.1
7.7.2 Spanning the antimeridian of a geographic CRS
In a geographic CRS, longitude values are often defined with a finite extent (e.g., from -180 degrees to +180 degrees). The minimum and maximum longitude limits define a single line (on the ellipsoid, sphere, or cylinder), known as the antimeridian, across which longitude values are discontinuous: as this line is crossed, longitude changes abruptly (e.g., going West from a little more than -180 degrees to a little less than +180 degrees). This necessitates an “extended” interpretation of the Bounding Box definition:
A Bounding Box defined in a geographic CRS (or a WGS84BoundingBox) whose LowerCorner longitude is greater than that of its UpperCorner shall describe a region that crosses the longitude discontinuity.
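The rule quoted above can be sketched as a small helper that turns a requested longitude range into one or two ordinary intervals. This is only an illustration of the bounding box rule, not code from nc3 or pync3; the function name and the default limits are assumptions.

```python
def split_longitude_range(lower_lon, upper_lon, lon_min=-180.0, lon_max=180.0):
    """Return a list of (west, east) intervals covering the request.

    If the lower corner longitude is greater than the upper corner
    longitude, the box crosses the discontinuity and is split in two.
    lon_min and lon_max describe the longitude limits of the dataset.
    """
    if lower_lon <= upper_lon:
        return [(lower_lon, upper_lon)]
    return [(lower_lon, lon_max), (lon_min, upper_lon)]
```

So a request from 179 to -179 becomes the two intervals 179..180 and -180..-179, and with `lon_min=0, lon_max=360` a request from 359 to 1 becomes 359..360 and 0..1, both two degrees wide in total.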
I've also changed "Unicode" to "non-ASCII characters".