Talk:Streaming and or netCDF File

From Earth Science Information Partners (ESIP)

Revision as of 06:42, July 14, 2011

Streaming netCDF -- Michael Decker (MDecker) 06:40, 14 July 2011 (MDT)

Is it possible to stream netCDF output files to the client on the fly, without saving them to local disk first? This would eliminate a disk I/O bottleneck and also ease possible restrictions on the maximum size of requested datasets (at least in the store=false mode of operation). In this context we should also explore whether using OPeNDAP, at least in part, would bring any benefits. Another idea we had was to find a reliable way to estimate (or predict) the total dataset size before delivery, and possibly before the result is assembled on the server. On the server side this would let us reject queries that are too big; on the user side it would give an idea of likely download durations.
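For the size estimate, at least for netCDF classic files whose dimensions are all fixed, the data section size follows directly from the requested dimension lengths and variable types, so it can be computed before any data is read. The Python sketch below assumes a request is described as a dict of dimension lengths plus (name, dimension names, type) tuples; `estimate_data_bytes`, `check_request`, the 100 MB limit and the 10 MB/s transfer rate are hypothetical. It leaves out the file header, which is usually small compared with the data, and record (unlimited) variables.

```python
from math import prod
from typing import Dict, Sequence, Tuple

# Bytes per value for the netCDF classic external types.
TYPE_BYTES = {"byte": 1, "char": 1, "short": 2, "int": 4, "float": 4, "double": 8}

# Hypothetical request layout: (variable name, dimension names, type name).
Variable = Tuple[str, Sequence[str], str]


def estimate_data_bytes(dims: Dict[str, int], variables: Sequence[Variable]) -> int:
    """Data-section size of a classic netCDF file with fixed-size dimensions only.

    Each variable is padded to a 4-byte boundary, as in the classic format.
    The header is not included.
    """
    total = 0
    for _name, var_dims, type_name in variables:
        nbytes = TYPE_BYTES[type_name] * prod(dims[d] for d in var_dims)
        total += nbytes + (-nbytes % 4)  # round up to a multiple of 4
    return total


def check_request(dims: Dict[str, int], variables: Sequence[Variable],
                  max_bytes: int, bytes_per_second: float) -> Tuple[bool, int, float]:
    """Return (allowed, estimated size in bytes, estimated download time in seconds)."""
    size = estimate_data_bytes(dims, variables)
    return size <= max_bytes, size, size / bytes_per_second


if __name__ == "__main__":
    # Hypothetical request: one year of hourly PM2.5 values at 5000 stations.
    dims = {"time": 8760, "station": 5000}
    variables = [("time", ["time"], "double"), ("pm25", ["time", "station"], "float")]
    allowed, size, seconds = check_request(dims, variables, 100_000_000, 10_000_000)
    print(allowed, size, round(seconds, 1))  # False 175270080 17.5
```

The server could refuse the query when `allowed` is false, and the estimated time could be shown to the user as a rough download hint before delivery starts.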

Re: Streaming netCDF -- Rhusar 06:42, 14 July 2011 (MDT)

Streaming huge files is beneficial because you can start writing the output before reading the data has finished. I've heard that Todd Plessel at EPA uses streaming.

The Unidata netCDF library does not support it directly, but if the writing order is restricted (attributes first, then variables in the defined order), it might be possible to add streaming support to the library. We should ask the Unidata netCDF developers. As for the I/O bottleneck: modern file caching is quite fast, especially on Windows when the file is opened with the temporary attribute set. We definitely need to benchmark the I/O speed first.
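On the write order: in the netCDF classic (CDF-1) format, the header (dimensions, global attributes, variable definitions) comes first and stores each variable's byte offset, followed by the data. If all dimension lengths are known up front, those offsets can be computed before any data is read, and the whole file can then be written front to back to a socket or HTTP response without seeking and without a temporary file. The following is a minimal Python sketch written against the classic format specification, not an existing library API; `stream_netcdf`, `ForwardOnly` and the argument layout are hypothetical, and it assumes fixed-size dimensions, text-only global attributes, no variable attributes, and int/float/double data. CDF-1 offsets are 32-bit, so files over 2 GiB would need the 64-bit offset variant.

```python
import struct
from math import prod
from typing import BinaryIO, Dict, Iterable, List, Sequence, Tuple

# Tags and type codes from the netCDF classic (CDF-1) format specification.
NC_DIMENSION, NC_VARIABLE, NC_ATTRIBUTE = 0x0A, 0x0B, 0x0C
NC_CHAR = 2
ABSENT = b"\x00" * 8  # marks an empty dimension, attribute or variable list
# struct code -> (nc_type, bytes per value); all multiples of 4, so no data padding.
TYPES = {"i": (4, 4), "f": (5, 4), "d": (6, 8)}

# Hypothetical layout: (name, dimension names, struct code, chunks of row-major values).
Variable = Tuple[str, Sequence[str], str, Iterable[Sequence[float]]]


def _int(n: int) -> bytes:
    return struct.pack(">i", n)  # big-endian 32-bit integer, as the format requires


def _pad4(raw: bytes) -> bytes:
    return raw + b"\x00" * (-len(raw) % 4)  # null-pad to a 4-byte boundary


def _name(s: str) -> bytes:
    raw = s.encode("utf-8")
    return _int(len(raw)) + _pad4(raw)


def _header(dims, gatts, variables, sizes, begins) -> bytes:
    out = [b"CDF\x01", _int(0)]  # magic, numrecs = 0 (no record dimension)
    out += [_int(NC_DIMENSION), _int(len(dims))] if dims else [ABSENT]
    for name, length in dims:
        out += [_name(name), _int(length)]
    out += [_int(NC_ATTRIBUTE), _int(len(gatts))] if gatts else [ABSENT]
    for name, text in gatts.items():
        raw = text.encode("utf-8")
        out += [_name(name), _int(NC_CHAR), _int(len(raw)), _pad4(raw)]
    out += [_int(NC_VARIABLE), _int(len(variables))] if variables else [ABSENT]
    dim_ids = {name: i for i, (name, _) in enumerate(dims)}
    for (name, var_dims, code, _), size, begin in zip(variables, sizes, begins):
        out += [_name(name), _int(len(var_dims))]
        out += [_int(dim_ids[d]) for d in var_dims]
        # No variable attributes, then nc_type, vsize and begin (the data offset).
        out += [ABSENT, _int(TYPES[code][0]), _int(size), _int(begin)]
    return b"".join(out)


def stream_netcdf(sink: BinaryIO, dims: List[Tuple[str, int]], gatts: Dict[str, str],
                  variables: List[Variable]) -> int:
    """Write a CDF-1 file front to back to a forward-only sink; return bytes written."""
    lengths = dict(dims)
    sizes = [TYPES[code][1] * prod(lengths[d] for d in vd) for _, vd, code, _ in variables]
    # Offsets are fixed-width fields, so the header length does not depend on their values.
    offset = len(_header(dims, gatts, variables, sizes, [0] * len(variables)))
    begins = []
    for size in sizes:
        begins.append(offset)
        offset += size
    sink.write(_header(dims, gatts, variables, sizes, begins))
    # Data section: each variable in the order it was defined, chunk by chunk.
    for (name, _, code, chunks), size in zip(variables, sizes):
        written = 0
        for chunk in chunks:
            data = struct.pack(f">{len(chunk)}{code}", *chunk)
            sink.write(data)
            written += len(data)
        if written != size:
            raise ValueError(f"{name}: wrote {written} bytes, expected {size}")
    return offset  # header length plus all variable sizes


class ForwardOnly:
    """Stand-in for a socket or HTTP response: it supports write() but not seek()."""

    def __init__(self) -> None:
        self.buf = bytearray()

    def write(self, b: bytes) -> None:
        self.buf += b


if __name__ == "__main__":
    # Hypothetical request: 3 hourly values at 2 stations; pm25 comes from a generator,
    # standing in for data that is still being read while the output is written.
    dims = [("time", 3), ("station", 2)]
    variables = [
        ("time", ["time"], "d", [[0.0, 1.0, 2.0]]),
        ("pm25", ["time", "station"], "f", ([1.0 + t, 2.0 + t] for t in range(3))),
    ]
    print(stream_netcdf(ForwardOnly(), dims, {"title": "demo"}, variables))  # 208
```

Because the offsets must be known before the header is written, this restricted case also knows the exact file size before any data has been produced, which would address the size-estimate question above. Variable data can come from generators, so only one chunk needs to be held in memory at a time.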