
Re: Ingrid and netCDF



> Organization: Lamont-Doherty Earth Observatory of Columbia University
> Keywords: 199408181641.AA02037

Benno,

> I read the entry; it looks fine.  You might want to change
> 
> Given the proper commands in its command file, it can read data
> from the data catalog, ...
> 
> to
> 
> Given the proper commands in its command file, it can read data
> from its data catalog, ... 

Done.

> I have a technical netCDF question: is it possible to generate a
> netCDF file to standard output (i.e. sequentially)?  As data servers
> become more common, there is a need for a format to communicate
> between clients and servers, and netCDF could fit that bill.  The
> problem that I have from the server side is that I have to generate a
> netCDF file, then send it to the client.  I would prefer to generate
> and send it at the same time.
>
> Note that I do not need the full flexibility of the netCDF interface:
> I am perfectly happy declaring all of my attributes and file structure
> at the beginning and have that sent in one block.  All I really need
> to be sequential is for the data as it is written slice by slice to be
> sent slice by slice.  For the client, the same would be true: all I
> need to be sequential is that the data can be read slice by slice.  At
> the moment the server side is what I really care about, though.

It is not possible to directly generate a netCDF file to standard output
with the current library implementation.  One problem is with the record
dimension.  The current number of records (the size of the unlimited
dimension) is stored in the file header and is always kept consistent (by
the library) with the number of records written so far.  Hence, if the header
is written first, it will show 0 records.  As each new record is written, the
record count in the header is incremented in place, but those in-place updates
can never reach the reader if output is strictly sequential.

We have netCDF operators that read from standard input and write to standard
output so they can be used in processing pipelines without explicit naming
of temporary files, but the implementation is brute force.  A program that
writes a netCDF file to standard output actually creates and writes a
private temporary netCDF file; when that temporary file is properly closed,
it is copied to standard output before the program exits.  Similarly, a
program that reads a netCDF file from standard input first copies standard
input into a private temporary file and then uses ncopen() on that temporary
file to access the data.  This can be costly for large files, but seems to
work OK for small ones.

Another possible approach is to have a client request data by using a remote
procedure call interface to a netCDF server.  This may save I/O because the
client may make multiple requests for different small subsets of the data,
and the extraction of these small subsets from a large file is best
performed on the server side.  A programmer in the oceanography community
(James Gallagher) has actually implemented a full-blown RPC interface for
the netCDF library to try out this idea, but I'm not sure how well it works.

--Russ