Talk:Streaming and or netCDF File

From Earth Science Information Partners (ESIP)

Revision as of 06:51, July 14, 2011

Streaming netCDF -- Michael Decker (MDecker) 06:40, 14 July 2011 (MDT)

Is it possible to stream netCDF output files to the client on the fly, without first saving them to local disk? This would eliminate a disk I/O bottleneck and would also ease restrictions on the maximum size of requested datasets (at least for the store=false mode of operation). In this context we should also explore whether using OPeNDAP would bring any benefits. Another idea we had was to find a way to reliably estimate (or predict) the total dataset size before delivery, and possibly before the result is assembled on the server. On the server side, this would let us reject queries that are too big. On the user side, it would give an idea of likely download times.
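For uncompressed output, the size estimate only needs the requested dimension lengths and the data type of each variable. Those are known before any data is read. The sketch below makes a few assumptions. The function names and the request layout are hypothetical. The type sizes are those of the netCDF classic external types. Header bytes, padding and any netCDF-4 compression are ignored, so the result is only an approximation.

```python
from math import prod

# Bytes per value for the netCDF classic external data types.
TYPE_SIZES = {"byte": 1, "char": 1, "short": 2, "int": 4, "float": 4, "double": 8}


def estimate_payload_bytes(dims, variables):
    """Estimate the uncompressed data size of a requested subset (hypothetical helper).

    dims: dimension name -> length after subsetting
    variables: variable name -> (type name, tuple of dimension names)
    Header and padding bytes are ignored, so this slightly underestimates.
    """
    total = 0
    for vtype, vdims in variables.values():
        # prod() of an empty tuple is 1, which covers scalar variables.
        total += TYPE_SIZES[vtype] * prod(dims[d] for d in vdims)
    return total


def estimate_download_seconds(nbytes, bandwidth_bits_per_s):
    """Rough transfer time at a sustained bandwidth, ignoring latency."""
    return nbytes * 8 / bandwidth_bits_per_s


if __name__ == "__main__":
    dims = {"time": 24, "lat": 100, "lon": 200}
    variables = {
        "ozone": ("float", ("time", "lat", "lon")),
        "time": ("double", ("time",)),
    }
    size = estimate_payload_bytes(dims, variables)
    print(size, "bytes")  # 1920192 bytes
    print(round(estimate_download_seconds(size, 10_000_000), 2), "s at 10 Mbit/s")  # 1.54
```

A server could compare the estimate against a configured limit before assembling the result. The same number, divided by a typical client bandwidth, gives the user a rough download duration.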

Re: Streaming netCDF -- Rhusar 06:42, 14 July 2011 (MDT)

Streaming huge files is beneficial, because you can start writing the output before reading the data has finished. I've heard that Todd Plessel at EPA uses streaming. The Unidata netCDF library does not support it directly. However, if you restrict the writing order (attributes first, then the variables in their defined order), it might be possible to add streaming support to the netCDF library. We should ask the Unidata netCDF developers. About the I/O bottleneck: modern file caching is quite fast, especially on Windows when you open the file with the temporary file attribute (FILE_ATTRIBUTE_TEMPORARY) set. We definitely need to benchmark the I/O speed first.

Re: Re: Streaming netCDF -- Rhusar 06:51, 14 July 2011 (MDT)

That sounds interesting; I would love to hear more about it. Streaming on the fly could improve the server's response times a lot.
"The Unidata netCDF library does not support it directly, but if you restrict the writing order: attributes first, then variables in the defined order, it might be possible to add streaming support to the netCDF library. We should ask the Unidata netCDF developers."
Yes, I have been considering something like that. However, without explicit support from the low-level netCDF libraries, you can never be quite sure it will work in all cases. If there is no reliable official way to do it, the sync/flush call in the netcdf4-python module (Dataset.sync()) would probably help with such a hack.
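The restricted write order discussed in this thread can be illustrated for the netCDF classic (CDF-1) format, independently of any library. The classic header records each variable's size (`vsize`) and byte offset (`begin`). These depend only on the dimension lengths and types, so the complete header can be sent before any data is read, and the variable data can then follow in definition order. The sketch below rests on several assumptions. There is no record (unlimited) dimension. Global attributes are text-only, and variables have no attributes of their own. The output stays under 2 GiB, because CDF-1 offsets are 32-bit. The caller supplies each variable's values in row-major order. `stream_classic_netcdf` and its argument layout are hypothetical, and the function is not part of the Unidata library or netcdf4-python.

```python
import struct
from math import prod

# struct code -> (nc_type code, bytes per value) for the classic format types.
NC_TYPES = {"b": (1, 1), "h": (3, 2), "i": (4, 4), "f": (5, 4), "d": (6, 8)}
NC_CHAR, NC_DIMENSION, NC_VARIABLE, NC_ATTRIBUTE = 2, 10, 11, 12
ABSENT = b"\x00" * 8  # marks an empty dimension/attribute/variable list


def _pad(data):
    """Zero-pad bytes to a 4-byte boundary, as the format requires."""
    return data + b"\x00" * (-len(data) % 4)


def _name(text):
    raw = text.encode("utf-8")
    return struct.pack(">i", len(raw)) + _pad(raw)


def _text_attrs(attrs):
    if not attrs:
        return ABSENT
    out = struct.pack(">ii", NC_ATTRIBUTE, len(attrs))
    for key, value in attrs.items():
        raw = value.encode("utf-8")
        out += _name(key) + struct.pack(">ii", NC_CHAR, len(raw)) + _pad(raw)
    return out


def stream_classic_netcdf(dims, global_attrs, variables):
    """Yield a netCDF classic (CDF-1) file as byte chunks: header, then data.

    dims: dimension name -> fixed length (no record dimension)
    global_attrs: attribute name -> text value
    variables: list of (name, struct code, dimension names, iterable of chunks),
        each chunk being a sequence of values in row-major order.
    """
    dim_ids = {name: i for i, name in enumerate(dims)}
    nbytes = [NC_TYPES[code][1] * prod(dims[d] for d in vdims)
              for _, code, vdims, _ in variables]
    vsizes = [n + (-n % 4) for n in nbytes]  # each variable padded to 4 bytes

    def header(begins):
        out = b"CDF\x01" + struct.pack(">i", 0)  # magic, numrecs = 0
        out += struct.pack(">ii", NC_DIMENSION, len(dims)) if dims else ABSENT
        for name, length in dims.items():
            out += _name(name) + struct.pack(">i", length)
        out += _text_attrs(global_attrs)
        out += struct.pack(">ii", NC_VARIABLE, len(variables)) if variables else ABSENT
        for (name, code, vdims, _), vsize, begin in zip(variables, vsizes, begins):
            out += _name(name) + struct.pack(">i", len(vdims))
            out += b"".join(struct.pack(">i", dim_ids[d]) for d in vdims)
            out += ABSENT  # no per-variable attributes
            out += struct.pack(">iii", NC_TYPES[code][0], vsize, begin)
        return out

    # Offsets depend only on sizes, so the header can be sent before any data.
    offset = len(header([0] * len(variables)))
    begins = []
    for vsize in vsizes:
        begins.append(offset)
        offset += vsize
    yield header(begins)

    # Data section: each variable in definition order, big-endian values.
    for (name, code, _, chunks), expected, vsize in zip(variables, nbytes, vsizes):
        written = 0
        for chunk in chunks:
            data = struct.pack(f">{len(chunk)}{code}", *chunk)
            written += len(data)
            yield data
        if written != expected:
            raise ValueError(f"{name!r}: expected {expected} bytes, got {written}")
        if vsize > written:
            yield b"\x00" * (vsize - written)
```

Each yielded chunk could go straight to a socket or HTTP response, so no temporary file is needed. The price is that a data error (for example, too few values for a variable) can only be reported after the header and part of the data have already been sent. Writing the chunks to a file and opening it with an existing classic-format reader would be a sensible check before relying on this approach.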