
Recovering data from a corrupted netCDF file



> Organization: Geophysical Institute, University of Alaska, Fairbanks
> Keywords: 199405262136.AA11923

Hi Mark,

> Hi, my name is Mark Conde and I have a (probably nasty) question 
> regarding several netCDF files of mine that have been corrupted in some 
> manner.  My problem is that something has happened to these files such 
> that any netCDF utilities that I use to attempt to access them 
> immediately return a message saying "not a netCDF file".   I believe the 
> data are mostly intact but have been modified in some places.  My hope 
> is to be able to recover at least some of the data from these files.  
> Unfortunately, I do not have any idea of the physical format in which 
> netCDF stores its data, so I have no means of writing code which can 
> "scrape" the intact data from these files.  Is there such description 
> circulated?  Better yet, are there any utilities to help recover damaged 
> netCDF files?  It would be a major disadvantage if netCDF files become 
> unusable after small corruptions - by comparison intact data is easily 
> recovered from text files.
> 
> Hoping for some help...

The test for whether a file is a netCDF file is just comparison of the first
four bytes of the file with the netCDF "magic number", which is the bytes
'C', 'D', 'F', SOH (Start Of Header or ASCII control-A, used for the version
number of the netCDF format which is still version 1).  If you use the Unix
"od" command to look at the beginning of a netCDF file as characters with
"od -c", you should always see the following as the first four bytes:

    % od -c foo.nc
    0000000   C   D   F 001  ...

If you see some other characters first followed by these four, perhaps some
extra bytes were added to the front of your files that you can easily "scrape
off" to restore the files.  If you see the bytes have been swapped and appear
as 

    D C 001 F

instead, for example, you can use a utility such as dd (with its conv=swab
option) to swap each pair of bytes and restore the file.  If you see nothing
resembling these four bytes near the front of the file, there may be no
practical way to recover the data, because you won't be able to locate the
start of the netCDF "header", which contains the names and sizes of the
dimensions and the names, types, shapes, and file offsets of the netCDF
variables.
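
If you would rather script these checks than read od output by eye, here is
a rough sketch in Python.  It is only an illustration, not a netCDF utility:
the function names (diagnose, repair, swap_pairs) and the 512-byte search
window are my own inventions, and it assumes the only damage is either extra
leading bytes or pairwise byte swapping.

```python
MAGIC = b"CDF\x01"     # 'C', 'D', 'F', SOH: the netCDF magic number
SWAPPED = b"DC\x01F"   # the same four bytes with each pair swapped


def swap_pairs(data: bytes) -> bytes:
    """Swap every pair of bytes (what dd conv=swab does).

    A trailing odd byte, if any, is left where it is.
    """
    out = bytearray(data)
    even = len(out) - (len(out) % 2)
    out[0:even:2], out[1:even:2] = out[1:even:2], out[0:even:2]
    return bytes(out)


def diagnose(data: bytes, window: int = 512):
    """Look for the magic number near the start of the data.

    Returns ("plain", offset), ("swapped", offset) or ("missing", -1).
    The window size is an arbitrary assumption.
    """
    head = data[:window]
    plain = head.find(MAGIC)
    swapped = head.find(SWAPPED)
    if plain != -1 and (swapped == -1 or plain <= swapped):
        return ("plain", plain)
    if swapped != -1:
        return ("swapped", swapped)
    return ("missing", -1)


def repair(data: bytes) -> bytes:
    """Strip leading junk, or undo pairwise swapping, if that is the damage."""
    kind, offset = diagnose(data)
    if kind == "plain":
        return data[offset:]
    if kind == "swapped":
        return swap_pairs(data[offset:])
    raise ValueError("no netCDF magic number found near the start")
```

You would read the damaged file in binary mode, pass its contents to
repair(), and write the result to a new file, so that the damaged original
is never modified.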

I know of no utilities available to help with netCDF data recovery, and
yours is the first request we've seen for this capability.

There is a chapter in the netCDF User's Guide that describes the structure
of a netCDF file.  In addition, the netcdf/libsrc/local_nc.h file in the
netCDF sources specifies the structure of netCDF arrays, which are encoded
by the XDR functions into bytes that appear in the netCDF file.  If you have
other netCDF files with exactly the same structure as your corrupted netCDF
files, or if you have a CDL file that exactly matches the files, it is
possible to independently compute the byte offset from the beginning of the
file for each variable within each netCDF record.  The data could then be
read by positioning to the computed byte offset and using the appropriate
XDR read for the data type (byte, short, long, float, or double) to decode
the data array.  If you happen to be using floating-point and have a
computer that also uses IEEE floating-point in big-endian byte order, then
the floating-point arrays in the file don't even need to be "decoded", since
the XDR representation for floating-point numbers is then the same as the
native representation.
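
To make the "position and decode" step concrete, here is a minimal sketch in
Python.  It is not part of the netCDF library.  The names record_offset and
read_values are made up for illustration, and the begin offset and record
size must come from your own calculation based on a matching CDL file or an
intact file with the same structure.  The sketch covers only the long,
float, and double types, which XDR encodes as big-endian 4-, 4-, and 8-byte
values.

```python
import struct

# struct format codes for XDR-encoded values (always big-endian).
XDR_FORMATS = {
    "long": "i",    # 32-bit signed integer
    "float": "f",   # 32-bit IEEE floating-point
    "double": "d",  # 64-bit IEEE floating-point
}


def record_offset(begin: int, record_size: int, record: int) -> int:
    """Byte offset of one variable's data within a given record.

    begin       -- offset of this variable's data in record 0 (assumed known)
    record_size -- total size in bytes of one complete record (assumed known)
    record      -- zero-based record number
    """
    return begin + record * record_size


def read_values(data: bytes, offset: int, count: int, xdr_type: str):
    """Decode `count` values of the given type starting at `offset`."""
    fmt = ">" + str(count) + XDR_FORMATS[xdr_type]
    return list(struct.unpack_from(fmt, data, offset))
```

For example, if you worked out that a float variable with three values per
record begins at byte 32 of a file whose records are 12 bytes long, then
read_values(data, record_offset(32, 12, 1), 3, "float") would decode that
variable's values from the second record.  Those numbers are invented for
the example.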

As for the existence of a document describing the exact file structure,
here's an excerpt from a recent answer to another user about this:

    We don't have such a document for several reasons.  First, there is a
    chapter in the netCDF User's Guide on "NetCDF File Structure and
    Performance" that explains the physical structure of netCDF data at a
    high enough level to make clear the performance implications of
    different data organizations.

    Second, we don't want netCDF users to write programs that depend on the
    physical representation of netCDF data.  If they did that, we would not
    be free in the future to change the physical representation.  If users
    only go through the documented interfaces to access the data, any
    changes we make in the future physical representation will be
    transparent to current users.

    Finally, the file structure is completely specified by the source code,
    and by the description that it is the XDR-encoding of the NC structures
    defined in netcdf/libsrc/local_nc.h.  Since XDR is specified elsewhere
    in a separate document, we didn't want to copy that specification but
    instead just refer to it.  That specification is available via a WWW
    browser such as Mosaic or via gopher at

            gopher://ds.internic.net/00/rfc/rfc1014.txt

    ....

    Anyway, I hope this helps explain why I can't point at a single specific
    document.

__________________________________________________________________________
                      
Russ Rew                                              UCAR Unidata Program
address@hidden                                        P.O. Box 3000
(303)497-8645                                 Boulder, Colorado 80307-3000