[netCDF #KAE-542730]: netCDF question - realtime replication of netCDF files



Hi Joe,

> We have a lidar (HSRL) that records data in netCDF.  We need to get the same 
> data copied (near real-time) to a server for archiving, and to allow 
> real-time access to the data files stored on the server.
> 
> Currently, we use a Python script which watches the current netCDF file and 
> writes new records over a socket.   As far as I know, this script parses the 
> netCDF file binary format.  The files are recorded in netCDF3 classic format. 
>  This script was written by a programmer at University of Wisconsin, where 
> the instrument was developed.
> 
> Can you suggest a cleaner way of achieving this functionality?

Depending on how much latency you can tolerate, you might use this 
special version of rsync for netCDF files developed by Joe Sirott
and just invoke it every few seconds to mirror your data efficiently:

  http://www.epic.noaa.gov/epic/software/cdfsync/

The problem with that (and any other approach I know of that doesn't
parse the netCDF file to be mirrored) is that it may open the data
file for reading at an inopportune time, for example while the writer
is in the middle of writing a large record.  In that case, the mirrored
data might be inconsistent, because the reader might hit end-of-file
before the whole of the last record has been written.
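
As a rough illustration of what "parsing the netCDF file" can mean at
its simplest, here is a minimal Python sketch that reads the record
count from the header of a classic-format file.  In the classic format
the file starts with the bytes "CDF", a one-byte version, and then a
4-byte big-endian record count.  The function name read_numrecs is
made up for this example, and whether the count in the header only
covers fully written records depends on how the writer flushes its
data, so treat that as an assumption to check rather than a guarantee.

```python
import struct

# Header value meaning "record count not known" (streaming writer).
STREAMING = 0xFFFFFFFF


def read_numrecs(path):
    """Return the record count stored in a netCDF classic header.

    Hypothetical helper for illustration only. Returns None if the
    header marks the count as indeterminate.
    """
    with open(path, "rb") as f:
        head = f.read(8)  # magic (4 bytes) + numrecs (4 bytes)
    if len(head) < 8 or head[:3] != b"CDF":
        raise ValueError("not a netCDF classic file")
    if head[3] != 1:
        # Other versions lay out the header differently; not handled here.
        raise ValueError("only the classic (version 1) layout is handled")
    (numrecs,) = struct.unpack(">I", head[4:8])
    return None if numrecs == STREAMING else numrecs
```

A mirroring script could compare this count with the number of records
it has already copied and only transfer records up to that count,
instead of reading to the current end of the file.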

To guarantee that the input file is in a consistent state, I think
the writing and copying programs both need to create/open the file 
with the NC_SHARE flag, which is designed to let one writer and
multiple readers access data concurrently.  I'm not sure how else
to do what you want.

--Russ



Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: KAE-542730
Department: Support netCDF
Priority: Normal
Status: Closed