
[netCDF #VZU-531672]: NetCDF error during model runtime



Hi Akshay,

> I am running the CMAQ model v4.7, which uses the netCDF file format for data 
> storage. I have netCDF version 3.6.3, and each computational job is running 
> on an 8-core parallel processor configuration (with 8 such jobs running in 
> parallel, reading from and writing to 26TB xfs RAID 6 arrays). Recently, 
> there have been several netCDF errors which cause the CMAQ program to quit: 
> error -43 (error processing attribute FTYPE), error -51 (unknown file format) 
> and sometimes error -37 (disk synch error, I think). These errors occur when 
> opening a file for the first time (i.e., the CMAQ program checks for the 
> existence of the file, and writes to a new file if the file is not found). 
> Also, the errors seem to happen to any of the 8 parallel jobs at different 
> run times, but always when the file is being opened as new for the first 
> computational timestep.
> 
> I was wondering if you had any suggestions as to how to tackle this problem. 
> The netCDF setup has worked fine for previous runs, and the only thing that 
> has changed is the filesystem (we migrated to the above-mentioned new xfs 
> filesystem recently). On this note, are there any specific filesystem 
> settings that need to be configured in order for netCDF to perform correctly?

Are you trying to write in the same file from multiple processes or threads
concurrently?  NetCDF 3.6.3 is only designed to permit one writer and multiple
readers, not multiple writers.  There is no filesystem setting that will make
multiple concurrent writes safe or reliable with netCDF-3.
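As a hedged illustration that is not part of the original reply: the "check whether the file exists, then create it" sequence described above is a race when several jobs may reach the same path at once, because both can see "not found" and both then create and write the file. A common way to guarantee a single creator is an atomic exclusive create. The sketch below uses POSIX open() with O_CREAT | O_EXCL; the helper name claim_new_file is hypothetical, and it assumes the jobs share a local filesystem such as XFS (exclusive create has historically been unreliable on some NFS setups). In the netCDF C API the analogous step is calling nc_create with the NC_NOCLOBBER flag, which fails with NC_EEXIST instead of overwriting an existing file.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): try to become the single
 * creator of `path`. The existence check and the creation happen in one
 * atomic open() call, so at most one process can ever get 1 back.
 * Returns 1 if this process created the file, 0 if it already existed,
 * and -1 on any other error (errno is left set by open()). */
int claim_new_file(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd >= 0) {
        close(fd);
        return 1;   /* we won: this process is the designated writer */
    }
    return (errno == EEXIST) ? 0 : -1;  /* someone else created it first */
}
```

Only the process that receives 1 should go on to write the file; the others would open it read-only, which matches the one-writer, multiple-reader model described above.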

Perhaps you should consider using netCDF-4 or parallel netCDF, either of which
supports multiple concurrent writes on an underlying parallel file system.

If you are not attempting multiple concurrent writes, then the problem you are
reporting sounds like a new problem we haven't seen before.  Is it practical to
isolate the problem to a small program we could use to reproduce it here?

--Russ


Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: VZU-531672
Department: Support netCDF
Priority: High
Status: Closed