Concurrent access by writer and readers

NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.

To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: Concurrent access by writer and readers
From: Russ Rew <russ@xxxxxxxxxxxxxxxx>
Date: Thu, 22 Apr 2004 15:37:40 -0600

Hi,

As light reading last night, I went through the HDF5 FAQs (I
know, I've got to get out more :-), and came across a possible
show-stopper:

  http://hdf.ncsa.uiuc.edu/hdf5-quest.html#grdwt

As background, users of netCDF sometimes have one writer process and
one or more reader processes opening and accessing the same file
concurrently, using nc_sync() or NC_SHARE to make sure the readers and
writer see a consistent version of the file.  The way concurrent
access is handled is explained here in about seven paragraphs:

  http://www.unidata.ucar.edu/packages/netcdf/guidec/guidec-10.html#HEADING10-322

under the nc_sync() description.  

Note that there are two different levels of concern for
synchronization:

  1. data, that is values of variables that are changed and new data
     added, including new records as the result of the unlimited
     dimension being increased by the writer process

  2. schema changes, such as adding new dimensions, variables, or
     attributes, changing the names of things, or even changing the
     value of an attribute.

NetCDF provides good support for multiple readers and one writer for
changes of the first type, to the data, by either using nc_sync() or
(preferred) by using the NC_SHARE flag on open.

NetCDF provides almost no support for concurrent changes of the second
type, which involve a writer changing the schema (header) information
for a file, implying that the cached in-memory header information
would all have to be reread.

So for the fairly uncommon second kind of change (to the schema), we
recommend that some external form of communication be used to inform
the readers of a need to close and reopen the file to see the changes
made by the writer.  However, the more common first kind of change is
handled without needing any communication between writer and readers
and without requiring closing and reopening the file.
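
To make the "external form of communication" for schema changes
concrete, here is a minimal sketch in C.  It is only one possible
arrangement and is not part of netCDF: the sidecar file, its one-line
format, and the function names publish_generation(),
read_generation(), and reader_must_reopen() are all hypothetical.  The
writer bumps a generation number after each schema change, and each
reader compares it with the last value it saw to decide whether to
close and reopen the data file.

```c
/* Hypothetical sidecar protocol for announcing schema changes.
 * Nothing here is a netCDF API; all names are illustrative. */
#include <assert.h>
#include <stdio.h>

/* Writer side: record a new schema generation after changing the header.
 * Returns 0 on success, -1 on failure.  A production version would write
 * to a temporary file and rename() it so readers never see a partial file. */
int publish_generation(const char *sidecar, unsigned long gen)
{
    assert(sidecar != NULL);
    FILE *f = fopen(sidecar, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%lu\n", gen) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the generation currently published.  Returns 0 on success. */
int read_generation(const char *sidecar, unsigned long *gen)
{
    assert(sidecar != NULL && gen != NULL);
    FILE *f = fopen(sidecar, "r");
    if (f == NULL)
        return -1;
    int n = fscanf(f, "%lu", gen);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/* Reader side: returns 1 if the schema generation differs from *seen
 * (and updates *seen), 0 if it is unchanged, -1 if it cannot be read.
 * On 1 the caller would close and reopen the data file. */
int reader_must_reopen(const char *sidecar, unsigned long *seen)
{
    unsigned long now;
    if (read_generation(sidecar, &now) != 0)
        return -1;
    if (now == *seen)
        return 0;
    *seen = now;
    return 1;
}
```

A reader would call reader_must_reopen() before relying on cached
header information.  Ordinary data changes need no such check, since
those are the case NC_SHARE and nc_sync() already cover.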

If my reading of the HDF5 FAQ answer is right, this common kind of
data concurrency is not supported in HDF5, so systems that make data
changes with a concurrent writer and one or more readers won't work
unless we provide some new communication among the processes doing I/O
to make sure readers close and then reopen the file after *any* write.
Is this right, or am I taking the HDF5 FAQ answer too literally?

Our netCDF-4 prototype currently does none of this closing and
reopening when a file is opened with the NC_SHARE flag or when
nc_sync() is called.  If we
have to add code on reads to close and then reopen the file if it's
been modified, this will require some rework and have performance
implications.
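
For a sense of what "reopen if it's been modified" could cost, here is
a rough sketch in C of one way a reader might detect modification,
assuming a POSIX system.  It is not what the netCDF-4 prototype does:
the file_stamp type and the take_stamp() and changed_since() functions
are hypothetical, and comparing size and modification time is an
assumption that can miss an in-place overwrite landing within the same
timestamp tick.

```c
/* Hypothetical modification check based on POSIX stat().
 * Not a netCDF or HDF5 API; all names are illustrative. */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* What the reader remembers about the file between reads. */
typedef struct {
    off_t  size;   /* file size in bytes */
    time_t mtime;  /* last modification time */
} file_stamp;

/* Capture the current size and modification time of path.
 * Returns 0 on success, -1 if stat() fails. */
int take_stamp(const char *path, file_stamp *out)
{
    assert(path != NULL && out != NULL);
    struct stat st;
    if (stat(path, &st) != 0) {
        perror("stat");
        return -1;
    }
    out->size = st.st_size;
    out->mtime = st.st_mtime;
    return 0;
}

/* Returns 1 if the file differs from *last (and updates *last),
 * 0 if it looks unchanged, -1 on error.  On 1 the caller would
 * close and reopen the file before reading. */
int changed_since(const char *path, file_stamp *last)
{
    assert(last != NULL);
    file_stamp now;
    if (take_stamp(path, &now) != 0)
        return -1;
    if (now.size == last->size && now.mtime == last->mtime)
        return 0;
    *last = now;
    return 1;
}
```

Even in this simple form, every read pays for an extra stat() call,
and every detected change pays for a full close and reopen, which is
the kind of overhead the paragraph above is concerned about.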

On the other hand, maybe everything is OK and the above is not really
necessary to assure that the reader gets a consistent, if not
absolutely up-to-date, view of the file (which is all that the netCDF
implementation needs).

Comments?

--Russ

Follow-Ups:
- Re: Concurrent access by writer and readers
  - From: Quincey Koziol
