
Re: 980507: bad netCDF 3.4 problems on winterpark



> >From: Charlie Zender <address@hidden>
> >Keywords: 199805072123.PAA08776
>
> First, you should know that all the netCDF access is being done with
> netCDF version 2 library calls, if that makes a difference. The
> routines are linked to the 3.4 library, however. As you say, the
> netCDF library might not be an issue if the cause is IRIX64 filesystem
> bugs. We do so much of our data manipulation in netCDF form that a
> subtle OS bug is somewhat more likely to be noticed in the context of
> netCDF.
>
> >In general terms, what is your program doing?
> >Does it create a new output file or perform a redefinition?
>
> The program is opening a 20 MB input file, reading a 0.5 MB hyperslab
> of data (the last four records), creating an output file, and writing
> the hyperslab into the new output file. The second ncks command is
> merely printing to screen a small portion of the hyperslab that got
> written to the output file. The bad data is definitely in the file,
> and not a bug in the print command, because we have graphed the data
> in the file using other tools and the (erroneous) zeros show up.
>
> >As an experiment which may further isolate the problem,
> >add a command line option which adds the NC_SHARE flag
> >to the open or create. Whether the problem goes away or persists
> >with this flag on will focus our attention.
>
> Done. The problems persisted.
>
> Thanks,
> Charlie
> --
> Charlie Zender      Voice: (303) 497-1612, FAX: 497-1324
> NCAR ASP & CGD     E-mail: address@hidden
> P.O. Box 3000         URL: http://www.cgd.ucar.edu/cms/zender
> Boulder CO 80307-3000 PGP: finger -l address@hidden

Sounds like the problem is with the first program.

Here are some things to check.

Prior to netcdf-3, the 'ncid' netcdf descriptors were always
0, 1, 2, ... for the first, second, third netcdf opened.
In netcdf-3, the ncid is the underlying file descriptor, so in
most cases, the first ncid returned is 3. We've had cases where
programmers assumed the old behavior, but you are not one of those
people, right?

Be absolutely sure that the output file is being closed
before program exit.

Verify that the subset/copy operation in your program is
doing what you want. Maybe add a little code to read the data back
from the output file after it is written and compare what you get
with what was expected. (I know you are doing this in the second program.
I'm hoping that doing this in the context of the first program will
help isolate the problem.)
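
To make the close-then-read-back idea concrete, here is a rough sketch
in C. It is only an illustration under stated assumptions: it uses plain
stdio as a stand-in for the netCDF calls, so fwrite/fclose/fread stand
where your program would use the put, close (ncclose or nc_close) and
get calls. The function names write_slab and verify_slab are made up
for this example and are not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write n floats to path and close the file.
 * Returns 0 on success, -1 on any error, including a failed close. */
int write_slab(const char *path, const float *data, size_t n)
{
    assert(n > 0);
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(data, sizeof *data, n, fp);
    /* Check the close as well: buffered data may only be flushed here. */
    int close_status = fclose(fp);
    return (written == n && close_status == 0) ? 0 : -1;
}

/* Hypothetical helper: reopen path, read n floats back, and compare
 * them bit for bit with what was expected.
 * Returns 0 if identical, 1 on a mismatch (*first_bad is set to the
 * index of the first differing element), -1 on an I/O or memory error. */
int verify_slab(const char *path, const float *expected, size_t n,
                size_t *first_bad)
{
    assert(n > 0);
    float *got = malloc(n * sizeof *got);
    if (got == NULL)
        return -1;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        free(got);
        return -1;
    }
    size_t nread = fread(got, sizeof *got, n, fp);
    fclose(fp);
    if (nread != n) {
        free(got);
        return -1;
    }

    int status = 0;
    for (size_t i = 0; i < n; i++) {
        /* Bitwise compare, so a copy must match exactly. */
        if (memcmp(&got[i], &expected[i], sizeof got[i]) != 0) {
            *first_bad = i;
            status = 1;
            break;
        }
    }
    free(got);
    return status;
}
```

Calling verify_slab right after write_slab, inside the first program,
would report the index of the first value that differs from the
hyperslab held in memory. If the spurious zeros already show up there,
the second program is off the hook.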

Just as a check, running with NC_SHARE should have slowed down the
program noticeably if significant amounts of data are involved. Did it?

Hope this helps.

-glenn