Re: [netcdfgroup] retrieving missing data from netcdf3_64BIT_OFFSET formatted netcdf files

  • To: Ramakrishnan N <ram.n.krishnan@xxxxxxxxx>
  • Subject: Re: [netcdfgroup] retrieving missing data from netcdf3_64BIT_OFFSET formatted netcdf files
  • From: Gus Correa <gus@xxxxxxxxxxxxxxxxx>
  • Date: Fri, 4 Feb 2022 12:11:49 -0500
Hi Ramakrishnan

I presume the latest screenshot you sent is of another (correct) prod.nc
file, not the damaged one.

That seems to be a hard nut to crack.
I would guess there is a bug in the Amber IO routines, or maybe it is using
the file system inconsistently.
Have you run the job to completion, or are you copying the files out before
the simulation ends?
I ask because Amber may close the netCDF files only after it finishes
the simulation (that would be a bad design, but it is possible).

Unfortunately, the fact that the file size increases with the number of
frames doesn't guarantee that the data was actually written to the file.
The space may have been allocated and filled with _FillValues (missing
values) or with zeros.
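One way to tell real values apart from allocated-but-unwritten space is to count how many 4-byte words in a stretch of the file are zero or equal to the default float fill value. Below is a minimal stdlib-only sketch. It assumes big-endian float32 data (the netCDF classic formats store data big-endian) and the library default fill value `NC_FILL_FLOAT` (9.9692099683868690e+36), since the ncdump -h output quoted in this thread shows no `_FillValue` attribute. The function names are hypothetical, and the offset and length to pass in (e.g. the start of the record data) are yours to determine.

```python
import struct

# Library default fill value for NC_FLOAT (NC_FILL_FLOAT = 9.9692099683868690e+36).
# Its IEEE-754 single-precision bit pattern is 0x7CF00000.
NC_FILL_FLOAT_BITS = 0x7CF00000


def classify_floats(raw: bytes) -> dict:
    """Count big-endian 4-byte words that are zero, the default fill, or anything else."""
    usable = len(raw) - len(raw) % 4  # ignore a trailing partial word
    counts = {"zero": 0, "fill": 0, "other": 0}
    # Compare raw bit patterns instead of floats (so -0.0 and NaNs count as "other").
    for (bits,) in struct.iter_unpack(">I", raw[:usable]):
        if bits == 0:
            counts["zero"] += 1
        elif bits == NC_FILL_FLOAT_BITS:
            counts["fill"] += 1
        else:
            counts["other"] += 1
    return counts


def classify_region(path: str, offset: int, length: int) -> dict:
    """Read `length` bytes starting at byte `offset` of `path` and classify them."""
    with open(path, "rb") as f:
        f.seek(offset)
        return classify_floats(f.read(length))
```

For example, `classify_region("prod.nc", record_start, 1 << 20)`, with a hypothetical `record_start` read from the header. Mostly "other" words would suggest real coordinates; mostly "zero" or "fill" would suggest space that was only allocated.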

A more basic tool to inspect the file contents is "od" ("octal dump").
It has many options to dump the file contents in hexadecimal, octal,
floating-point, integer, character, and other formats, to limit the amount
of input read, to specify an offset to start the dump from, etc.
It is a bit painful, but it may give you a hint of what is inside the files,
and of whether it is worth the effort to try to recover them.
https://man7.org/linux/man-pages/man1/od.1.html
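For readers who prefer a script to `od`, here is a minimal stdlib-only Python sketch that decodes the header of a classic or 64-bit offset file, following the layout in the netCDF file format specification. It compares the record count stored in the header (`numrecs`, the 4 bytes at byte offset 4, roughly what `od -A d -t x1 -N 8 prod.nc` would show) with the number of whole records that fit in the file. The function names are hypothetical. The sketch assumes the header fits in the first 1 MiB and ignores the specification's special case of a single byte/char/short record variable.

```python
import os
import struct
import sys

# Layout details from the netCDF classic / 64-bit offset file format spec.
TYPE_SIZE = {1: 1, 2: 1, 3: 2, 4: 4, 5: 4, 6: 8}  # NC_BYTE, CHAR, SHORT, INT, FLOAT, DOUBLE


def _pad4(n: int) -> int:
    """Round up to the next multiple of 4 (header fields are 4-byte aligned)."""
    return (n + 3) & ~3


class _Reader:
    """Sequential reader of big-endian header fields."""

    def __init__(self, buf: bytes, pos: int = 0):
        self.buf, self.pos = buf, pos

    def u32(self) -> int:
        (v,) = struct.unpack_from(">I", self.buf, self.pos)
        self.pos += 4
        return v

    def u64(self) -> int:
        (v,) = struct.unpack_from(">Q", self.buf, self.pos)
        self.pos += 8
        return v

    def name(self) -> str:
        n = self.u32()
        s = self.buf[self.pos:self.pos + n].decode("utf-8")
        self.pos += _pad4(n)
        return s

    def skip_atts(self) -> None:
        self.u32()  # NC_ATTRIBUTE tag, or 0 if the list is absent
        for _ in range(self.u32()):
            self.name()
            nc_type, nelems = self.u32(), self.u32()
            self.pos += _pad4(nelems * TYPE_SIZE[nc_type])


def parse_header(buf: bytes):
    """Return (numrecs, dims, variables) from a CDF1/CDF2 header."""
    if buf[:3] != b"CDF" or buf[3] not in (1, 2):
        raise ValueError("not a classic or 64-bit offset netCDF file")
    r = _Reader(buf, pos=4)
    numrecs = r.u32()
    r.u32()  # NC_DIMENSION tag, or 0 if there are no dimensions
    dims = []
    for _ in range(r.u32()):
        dims.append((r.name(), r.u32()))  # length 0 marks the UNLIMITED dimension
    r.skip_atts()  # global attributes
    r.u32()  # NC_VARIABLE tag, or 0 if there are no variables
    variables = []
    for _ in range(r.u32()):
        name = r.name()
        rank = r.u32()
        dimids = [r.u32() for _ in range(rank)]
        r.skip_atts()  # variable attributes
        r.u32()  # nc_type (not needed here)
        vsize = r.u32()
        begin = r.u64() if buf[3] == 2 else r.u32()  # 8-byte offsets in CDF2
        is_record = bool(dimids) and dims[dimids[0]][1] == 0
        variables.append({"name": name, "vsize": vsize, "begin": begin, "record": is_record})
    return numrecs, dims, variables


def implied_frames(variables, file_size: int) -> int:
    """Whole records that fit between the first record variable and EOF."""
    recs = [v for v in variables if v["record"]]
    if not recs:
        return 0
    rec_begin = min(v["begin"] for v in recs)
    rec_size = sum(v["vsize"] for v in recs)  # ignores the single byte/char/short special case
    return (file_size - rec_begin) // rec_size


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    with open(path, "rb") as f:
        header = f.read(1 << 20)  # assumes the header fits in the first 1 MiB
    numrecs, dims, variables = parse_header(header)
    print("numrecs stored in header:", numrecs)
    print("whole records implied by file size:", implied_frames(variables, os.path.getsize(path)))
```

If the header says 0 but the file size implies many records, the record space exists. Whether it holds real values or fill is a separate check.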


Gus

On Fri, Feb 4, 2022 at 11:54 AM Ramakrishnan N <ram.n.krishnan@xxxxxxxxx> wrote:

> Hi Gus
>
> Thank you for looking into this. The time variable is supposed to contain
> floating-point numbers as in the attached screenshot.
>
> I did try converting the file to CDL format, following some earlier
> threads on the mailing list. The resulting CDL file does not contain any
> data.
>
> I am trying to figure out where the data is getting stored. As I pointed
> out in my initial post, the size of the netcdf file increases with the
> number of frames stored during the simulation.
>
> Best
> Ram
>
>
> On Fri, Feb 4, 2022 at 11:35 AM Gus Correa <gus@xxxxxxxxxxxxxxxxx> wrote:
>
>> Hi Ramakrishnan
>>
>> Do you know what is stored in the "time" variable, if anything?
>>
>> ncdump -c prod.nc
>>
>> should tell (prints the header and the coordinate variables, including
>> time).
>>
>> A brute-force method that I have used in the past to fix a broken netCDF
>> file (if the file is not gigantic) is this:
>>
>> First, use ncdump to dump the whole file to a text file, say:
>> ncdump prod.nc > prod.cdl
>> CDL is the "common data language", a text representation of a netCDF
>> file.
>>
>> Second, edit/doctor the prod.cdl text file (with vi, emacs, etc.).
>> Replace the 0 in the time dimension with the correct value.
>> If the time coordinate variable is wrong, replace each of its entries with
>> the correct value.
>> Be very careful not to mess up the CDL syntax (commas separate values,
>> other symbols separate variables, etc.).
>> Save the edited prod.cdl file, possibly under a different name, say
>> prod_new.cdl.
>>
>> Finally, use ncgen to regenerate the netCDF file from the edited CDL:
>> ncgen -b -o prod_new.nc prod_new.cdl
>> This will create prod_new.nc, presumably with everything fixed (assuming
>> the original file had the correct data, except perhaps for the time
>> dimension and the time coordinate variable).
>> I am citing from memory here, so please double-check the ncgen man page
>> before you try.
>> For details on the ncgen syntax, see:
>> https://linux.die.net/man/1/ncgen
>>
>> I used this primitive method to fix a few damaged netCDF files in the
>> past, in desperate situations like yours.
>> I am not proud of its elegance, but sometimes elegance is the least of
>> your concerns, and it worked for me.
>>
>> Also, you may need to run "ncdump -k prod.nc" beforehand
>> to check which netCDF format your file has.
>> Then use the same format in the ncgen command above.
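The format check and the dump, edit, regenerate steps can be scripted. Below is a minimal Python sketch. It assumes `ncdump` and `ncgen` are on the PATH, and that your `ncgen` accepts for its `-k` option the same kind names that `ncdump -k` prints (recent netCDF releases document these, but check `man ncgen` for your version). It guesses the format from the file's leading magic bytes instead of calling `ncdump -k`. The editing step stays manual, and the function and script names are hypothetical.

```python
import subprocess
import sys

# Leading magic bytes of each format, mapped to the names `ncdump -k` prints.
MAGIC_TO_KIND = {
    b"CDF\x01": "classic",
    b"CDF\x02": "64-bit offset",
    b"CDF\x05": "cdf5",
}
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # netCDF-4 files are HDF5 files


def detect_kind(path: str) -> str:
    """Stdlib-only stand-in for `ncdump -k`, based on the first 8 bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head[:4] in MAGIC_TO_KIND:
        return MAGIC_TO_KIND[head[:4]]
    if head == HDF5_MAGIC:
        # The magic alone cannot distinguish netCDF-4 from netCDF-4 classic model.
        return "netCDF-4"
    raise ValueError(f"{path}: not a recognized netCDF file")


def repair_commands(nc_path: str, edited_cdl: str, nc_out: str) -> list:
    """argv lists for step 1 (dump) and step 3 (regenerate in the same format)."""
    kind = detect_kind(nc_path)
    return [
        ["ncdump", nc_path],  # stdout goes to the .cdl file
        ["ncgen", "-k", kind, "-b", "-o", nc_out, edited_cdl],
    ]


if __name__ == "__main__" and len(sys.argv) == 5:
    # Hypothetical usage: python cdl_repair.py prod.nc prod.cdl prod_new.cdl prod_new.nc
    src, cdl, edited, out = sys.argv[1:]
    dump, regen = repair_commands(src, edited, out)
    with open(cdl, "w") as fh:
        subprocess.run(dump, stdout=fh, check=True)  # step 1: ncdump prod.nc > prod.cdl
    input(f"Edit {cdl}, save it as {edited}, then press Enter... ")  # step 2 is manual
    subprocess.run(regen, check=True)  # step 3: regenerate with ncgen
```

Keeping the original file untouched and writing to a new name, as suggested above, means a botched edit costs only a rerun.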
>>
>> I hope it helps.
>> Gus Correa
>>
>> On Fri, Feb 4, 2022 at 9:52 AM Ramakrishnan N <ram.n.krishnan@xxxxxxxxx> wrote:
>>
>>> I have a netcdf file (prod.nc) that contains time series from a
>>> molecular dynamics simulation (Amber force field, OpenMM engine, parmed
>>> netCDFReporter). The netCDFReporter had some problems, and as a result the
>>> number of frames in the netCDF file is zero. Below is the ncdump -h output
>>> for the file:
>>>
>>> $ ncdump -h prod.nc
>>> netcdf prod {
>>> dimensions:
>>>         frame = UNLIMITED ; // (0 currently)
>>>         spatial = 3 ;
>>>         atom = 20504 ;
>>> variables:
>>>         char spatial(spatial) ;
>>>         float time(frame) ;
>>>                 time:units = "picosecond" ;
>>>         float coordinates(frame, atom, spatial) ;
>>>                 coordinates:units = "angstrom" ;
>>>
>>> // global attributes:
>>>                 :Conventions = "AMBER" ;
>>>                 :ConventionVersion = "1.0" ;
>>>                 :application = "AmberTools" ;
>>>                 :program = "ParmEd" ;
>>>                 :programVersion = "3.4.0+11.g1be8ca0f" ;
>>>                 :title = "ParmEd-created trajectory" ;
>>> }
>>>
>>> However, the netCDF file has a non-zero size (which increases linearly
>>> with the number of frames stored), which implies that it certainly has
>>> the data written into it. I tried a number of tools (NCO tools, netCDF4,
>>> scipy netcdf reader, xarray) to access the missing data but have not
>>> succeeded.
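The "increases linearly" observation can be checked with a little arithmetic. With the header shown above, each frame adds one float for `time` plus 20504 × 3 floats for `coordinates`, i.e. 4 + 246048 = 246052 bytes (about 240 KiB). No padding is needed because both sizes are multiples of 4. Below is a minimal sketch. `record_start` (the byte where the record data begins) is an input you would read from the header, not something given in this thread, and the function names are hypothetical.

```python
FLOAT_BYTES = 4  # size of a netCDF "float" (NC_FLOAT)


def bytes_per_frame(n_atoms: int, n_spatial: int = 3) -> int:
    """Record size for time(frame) + coordinates(frame, atom, spatial), both float."""
    time_bytes = FLOAT_BYTES                         # one time value per frame
    coord_bytes = n_atoms * n_spatial * FLOAT_BYTES  # a multiple of 4, so no padding
    return time_bytes + coord_bytes


def frames_in_file(file_size: int, record_start: int, n_atoms: int) -> int:
    """Whole frames between the start of the record data and the end of the file."""
    return (file_size - record_start) // bytes_per_frame(n_atoms)


print(bytes_per_frame(20504))  # 246052
```

If the file grows by about 246052 bytes per frame, the space for each frame is there. As the first reply above points out, that alone does not prove the values were written.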
>>>
>>> I have two questions:
>>> 1. Does the file contain real data?
>>> 2. If so, is there a way to retrieve the data and create a new netCDF file?
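On question 2: in the classic and 64-bit offset formats, the number of records is a single 4-byte big-endian field (`numrecs`) right after the 4-byte magic `CDF\x01` or `CDF\x02`. If inspection shows that the record data really is present, one repair sometimes applied to files whose header was never updated is to write the correct frame count into that field, on a copy. Whether that is what happened to prod.nc is an assumption, not something established in this thread. Below is a minimal sketch under that assumption. `frames` must be a count of complete frames you have determined yourself (e.g. (file size - start of record data) / bytes per frame), and the function names are hypothetical. A wrong value will make readers return garbage or fail, and no header edit can bring back frames that were never written.

```python
import shutil
import struct

CLASSIC_MAGICS = (b"CDF\x01", b"CDF\x02")  # classic and 64-bit offset formats


def read_numrecs(path: str) -> int:
    """Return the record count stored in the header (bytes 4-7, big-endian)."""
    with open(path, "rb") as f:
        if f.read(4) not in CLASSIC_MAGICS:
            raise ValueError(f"{path}: not a classic or 64-bit offset netCDF file")
        return struct.unpack(">I", f.read(4))[0]


def patch_numrecs(src: str, dst: str, frames: int) -> None:
    """Copy src to dst, then overwrite numrecs in the copy with `frames`."""
    read_numrecs(src)  # validates the magic before copying
    shutil.copyfile(src, dst)  # never touch the only copy of the data
    with open(dst, "r+b") as f:
        f.seek(4)
        f.write(struct.pack(">I", frames))
```

Running `ncdump -h` on the patched copy should then report the new count in the `frame` dimension comment, and the data can be checked with the usual readers.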
>>>
>>> I am desperately looking to salvage nearly 3 microseconds of simulation
>>> data, which would take more than 2 months to generate. I would greatly
>>> appreciate it if anyone could provide some insight into this problem.
>>>
>>> The attached netcdf file has 14 frames that can be used to examine the
>>> issue.
>>>
>>> Thanks in advance
>>>
>>> Best
>>> Ram
>>>
>>>
>>> _______________________________________________
>>> NOTE: All exchanges posted to Unidata maintained email lists are
>>> recorded in the Unidata inquiry tracking system and made publicly
>>> available through the web.  Users who post to any of the lists we
>>> maintain are reminded to remove any personal information that they
>>> do not want to be made public.
>>>
>>>
>>> netcdfgroup mailing list
>>> netcdfgroup@xxxxxxxxxxxxxxxx
>>> For list information or to unsubscribe,  visit:
>>> https://www.unidata.ucar.edu/mailing_lists/
>>>
>>