Re: [netcdfgroup] retrieving missing data from netcdf3_64BIT_OFFSET formatted netcdf files

  • To: Gus Correa <gus@xxxxxxxxxxxxxxxxx>
  • Subject: Re: [netcdfgroup] retrieving missing data from netcdf3_64BIT_OFFSET formatted netcdf files
  • From: Ramakrishnan N <ram.n.krishnan@xxxxxxxxx>
  • Date: Fri, 4 Feb 2022 12:54:07 -0500
Hi Gus

You are right, the screenshot was from another file that was written
correctly. Both files were written by OpenMM. The correct one used the
NetCDFReporter from the mdtraj package, while the damaged one used the
NetCDFReporter from the parmed package.

Thank you for the suggestion to use od (octal dump). The attached screenshot
shows the od output, which appears to be the coordinates of my system.
This gives me hope that the file indeed contains the data and not
_FillValue values. I will work on this and try to extract the data in the
appropriate format.

Thanks again for your time.

Best
Ram
[image: Screenshot 2022-02-04 125140.png]




On Fri, Feb 4, 2022 at 12:12 PM Gus Correa <gus@xxxxxxxxxxxxxxxxx> wrote:

> Hi Ramakrishnan
>
> I presume the latest screenshot you sent is of another (correct) prod.nc
> file, not the damaged one.
>
> That seems to be a hard nut to crack.
> I would guess there is a bug in the Amber IO routines, or maybe it is
> using the file system inconsistently.
> Have you run the job to completion, or are you copying the files out
> before the simulation ends?
> I ask because Amber may be closing the netCDF files only after it
> finishes the simulation
> (that would be a bad design, but it is possible).
>
> Unfortunately the fact that the file size increases with the number of
> frames doesn't guarantee that
> the data was actually written to the file.
> The space may have been allocated and filled with _FillValues (missing
> values) or with zeros.
>
> A more basic tool to inspect the file contents is "octal dump", "od".
> It has many options: it can dump the file contents in hexadecimal, octal,
> floating-point, integer, character, and other formats, limit the size of
> the input, start the dump at a given offset, etc.
> It is a bit painful, but it may give you a hint of what is inside the files,
> and whether it is worth the effort to
> try to recover them.
> https://man7.org/linux/man-pages/man1/od.1.html
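The same kind of inspection can be scripted. Below is a minimal Python sketch (standard library only) that reads big-endian 32-bit floats starting at a byte offset, roughly what GNU `od -A d -t f4 --endian=big -j OFFSET -N BYTES prod.nc` prints; netCDF classic and 64-bit offset files store all values big-endian. The helper names are hypothetical, and `begin_coords` and `recsize` are assumptions that would have to be read from the file header. It also assumes the record layout of the trajectory in this thread, where each record holds `time` followed by `coordinates`.

```python
import struct


def read_be_floats(path, offset, count):
    """Read `count` big-endian float32 values starting at byte `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.read(4 * count)
    if len(raw) != 4 * count:
        raise EOFError(f"wanted {4 * count} bytes at offset {offset}, got {len(raw)}")
    return list(struct.unpack(f">{count}f", raw))


def frame_coordinates(path, begin_coords, recsize, n_atoms, k):
    """Return frame k of a (frame, atom, spatial=3) float variable as (x, y, z) tuples.

    begin_coords: byte offset of the variable's data in record 0 (assumed known
                  from the header).
    recsize:      bytes per record, i.e. the summed sizes of all record variables.
    """
    flat = read_be_floats(path, begin_coords + k * recsize, 3 * n_atoms)
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
```

Reading a whole frame at once rather than value by value keeps the number of seeks small, which matters for a file with 20504 atoms per frame.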
>
>
> Gus
>
> On Fri, Feb 4, 2022 at 11:54 AM Ramakrishnan N <ram.n.krishnan@xxxxxxxxx>
> wrote:
>
>> Hi Gus
>>
>> Thank you for looking into this. The time variable is supposed to contain
>> floating-point numbers as in the attached screenshot.
>>
>> I did try converting the file to CDL format following some earlier
>> threads in the mailing list. The resulting CDL file does not contain any
>> data.
>>
>> I am trying to figure out where the data is getting stored. As I pointed
>> out in my initial post, the size of the netCDF file increases with the
>> number of frames stored during the simulation.
>>
>> Best
>> Ram
>>
>>
>> On Fri, Feb 4, 2022 at 11:35 AM Gus Correa <gus@xxxxxxxxxxxxxxxxx> wrote:
>>
>>> Hi Ramakrishnan
>>>
>>> Do you know what is stored in the "time" variable, if anything?
>>>
>>> ncdump -c prod.nc
>>>
>>> should tell (prints the header and the coordinate variables, including
>>> time).
>>>
>>> A brute-force method to fix a broken netCDF file (if the file size is not
>>> gigantic)
>>> that I used in the past is this:
>>>
>>> First, use ncdump to dump the whole file to a text file,
>>> say ncdump prod.nc > prod.cdl
>>> CDL is the "network Common Data form Language", a text representation
>>> of a netCDF file.
>>>
>>> Second, edit/doctor the prod.cdl text file (with vi, emacs, etc.).
>>> Replace the time dimension length 0 with the correct value.
>>> If the time coordinate variable is wrong, replace each of its entries with
>>> the correct value.
>>> Be very careful not to mess up the CDL syntax (there are commas
>>> separating values,
>>> and other symbols separating variables, etc.).
>>> Save the edited prod.cdl file, possibly with a different name, say
>>> prod_new.cdl
>>>
>>> Finally, use ncgen to regenerate the prod.nc file from the edited CDL:
>>> ncgen -b -o prod_new.nc prod_new.cdl
>>> This will create the prod_new.nc file, presumably with everything fixed
>>> (assuming the original file had
>>> the correct data, except perhaps for the time dimension and time coordinate
>>> variable).
>>> I am citing from memory here, so please double-check the ncgen man page
>>> before you try.
>>> For details on the ncgen syntax, see:
>>> https://linux.die.net/man/1/ncgen
>>>
>>> I used this primitive method to fix a few damaged netCDF files in the
>>> past,
>>> in desperate situations like yours.
>>> I am not proud of its elegance,
>>> but sometimes elegance is the least of your concerns,
>>> and it worked for me.
>>>
>>> Also, you may need to run "ncdump -k prod.nc" beforehand
>>> to check which netCDF format your file has.
>>> Then use the same format in the ncgen command above.
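One caveat, consistent with Ram's report that his CDL file contains no data: when the frame count is 0, ncdump has no records to print, so the CDL route cannot bring the record data back. A related idea, not part of the procedure above, is to make the same "replace 0 by the correct value" fix directly in the binary header. In the classic (CDF-1) and 64-bit offset (CDF-2) formats, the record count `numrecs` is a 4-byte big-endian integer at byte offset 4, right after the 'C' 'D' 'F' magic and the version byte; the "(0 currently)" that ncdump shows for the UNLIMITED dimension comes from this field. The minimal sketch below parses the header just far enough to find the record variables, estimates the number of complete records from the file size, and writes that count into a copy of the file. It assumes the rest of the header (in particular the `begin` offsets) is intact, that the file has more than one record variable (so the padded `vsize` values add up to the record size), and that the header fits in the first 1 MiB. It does not check whether the records hold real values or fill values. All function names are hypothetical.

```python
import os
import shutil
import struct

# Sizes in bytes of the classic nc_type codes 1..6 (NC_BYTE .. NC_DOUBLE).
TYPE_SIZE = {1: 1, 2: 1, 3: 2, 4: 4, 5: 4, 6: 8}
NC_DIMENSION, NC_VARIABLE, NC_ATTRIBUTE = 0x0A, 0x0B, 0x0C


class _Reader:
    """Sequential big-endian reader over raw header bytes."""

    def __init__(self, data, pos=0):
        self.data, self.pos = data, pos

    def i32(self):
        (value,) = struct.unpack_from(">i", self.data, self.pos)
        self.pos += 4
        return value

    def i64(self):
        (value,) = struct.unpack_from(">q", self.data, self.pos)
        self.pos += 8
        return value

    def padded(self, nbytes):
        # Return nbytes, then skip the zero padding up to a 4-byte boundary.
        chunk = self.data[self.pos:self.pos + nbytes]
        self.pos += nbytes + (-nbytes % 4)
        return chunk

    def name(self):
        return self.padded(self.i32()).decode("utf-8")

    def list_count(self, tag):
        # A list is either ABSENT (two zero words) or tag + element count.
        found, count = self.i32(), self.i32()
        if found not in (0, tag):
            raise ValueError(f"unexpected tag {found:#x} near byte {self.pos - 8}")
        return count

    def skip_attributes(self):
        for _ in range(self.list_count(NC_ATTRIBUTE)):
            self.name()
            nc_type, nelems = self.i32(), self.i32()
            self.padded(nelems * TYPE_SIZE[nc_type])


def read_header(path, max_header=1 << 20):
    """Return (numrecs, variables) for a CDF-1 or CDF-2 file."""
    with open(path, "rb") as f:
        data = f.read(max_header)  # assumes the header fits in max_header bytes
    if data[:3] != b"CDF" or data[3] not in (1, 2):
        raise ValueError("not a classic (CDF-1) or 64-bit offset (CDF-2) file")
    r = _Reader(data, pos=4)
    numrecs = r.i32()
    dims = [(r.name(), r.i32()) for _ in range(r.list_count(NC_DIMENSION))]
    r.skip_attributes()  # global attributes
    variables = []
    for _ in range(r.list_count(NC_VARIABLE)):
        name = r.name()
        dimids = [r.i32() for _ in range(r.i32())]
        r.skip_attributes()  # per-variable attributes
        nc_type, vsize = r.i32(), r.i32()
        begin = r.i64() if data[3] == 2 else r.i32()  # 8-byte offsets in CDF-2
        # A record variable has the unlimited dimension (stored length 0) first.
        is_record = bool(dimids) and dims[dimids[0]][1] == 0
        variables.append({"name": name, "type": nc_type, "vsize": vsize,
                          "begin": begin, "record": is_record})
    return numrecs, variables


def count_frames(path):
    """Return (numrecs_in_header, complete_records_by_size, leftover_bytes)."""
    numrecs, variables = read_header(path)
    records = [v for v in variables if v["record"]]
    if not records:
        raise ValueError("no record variables in this file")
    begin_rec = min(v["begin"] for v in records)
    recsize = sum(v["vsize"] for v in records)  # bytes per record (padded vsize)
    complete, leftover = divmod(os.path.getsize(path) - begin_rec, recsize)
    return numrecs, complete, leftover


def patch_numrecs(src, dst, numrecs):
    """Copy src to dst and overwrite the 4-byte numrecs field in the copy."""
    shutil.copyfile(src, dst)  # never modify the only copy of the data
    with open(dst, "r+b") as f:
        f.seek(4)  # numrecs follows the 4-byte magic 'C' 'D' 'F' <version>
        f.write(struct.pack(">i", numrecs))
```

Usage, assuming the damaged file is prod.nc: call `count_frames("prod.nc")`; if the leftover is 0 and od shows plausible values, call `patch_numrecs("prod.nc", "prod_fixed.nc", complete)` and inspect the copy with `ncdump -v time prod_fixed.nc`. Writing to a copy keeps the original intact if the estimate turns out to be wrong.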
>>>
>>> I hope it helps.
>>> Gus Correa
>>>
>>> On Fri, Feb 4, 2022 at 9:52 AM Ramakrishnan N <ram.n.krishnan@xxxxxxxxx>
>>> wrote:
>>>
>>>> I have a netCDF file (prod.nc) that contains a time series from a
>>>> molecular dynamics simulation (Amber force field, OpenMM engine, ParmEd
>>>> NetCDFReporter). The NetCDFReporter had some problems, and as a result the
>>>> number of frames in the netCDF file is zero. Given below is the ncdump
>>>> output for the file:
>>>>
>>>> $ ncdump -h prod.nc
>>>> netcdf prod {
>>>> dimensions:
>>>>         frame = UNLIMITED ; // (0 currently)
>>>>         spatial = 3 ;
>>>>         atom = 20504 ;
>>>> variables:
>>>>         char spatial(spatial) ;
>>>>         float time(frame) ;
>>>>                 time:units = "picosecond" ;
>>>>         float coordinates(frame, atom, spatial) ;
>>>>                 coordinates:units = "angstrom" ;
>>>>
>>>> // global attributes:
>>>>                 :Conventions = "AMBER" ;
>>>>                 :ConventionVersion = "1.0" ;
>>>>                 :application = "AmberTools" ;
>>>>                 :program = "ParmEd" ;
>>>>                 :programVersion = "3.4.0+11.g1be8ca0f" ;
>>>>                 :title = "ParmEd-created trajectory" ;
>>>> }
>>>>
>>>> However, the netCDF file has a non-zero size (which increases linearly
>>>> with the number of frames stored), which implies that it certainly has the
>>>> data written into it. I tried a number of tools (NCO tools, netCDF4, SciPy's
>>>> netcdf reader, xarray) to access the missing data but have not succeeded.
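The file-size argument can be made quantitative with the header above. Assuming `time` and `coordinates` are the only record variables (both 4-byte floats, so no padding is needed), each frame occupies 4 + 20504 * 3 * 4 = 246052 bytes, and the number of complete frames is (file size - begin_rec) / 246052, where begin_rec is the byte offset at which the record section starts. That offset is stored in the binary header but is not part of the `ncdump -h` output, so it is a parameter here and any value used for it below is hypothetical. As Gus points out, a matching size only shows that the space was allocated, not that real values were written.

```python
def bytes_per_frame(n_atoms, spatial=3, float_size=4):
    """Bytes in one record: a float `time` plus n_atoms*spatial float coordinates."""
    return float_size + n_atoms * spatial * float_size


def frames_in_file(file_size, begin_rec, n_atoms):
    """Return (complete_frames, leftover_bytes) counted from the record section start."""
    return divmod(file_size - begin_rec, bytes_per_frame(n_atoms))
```

For example, `frames_in_file(os.path.getsize("prod.nc"), BEGIN_REC, 20504)` with the real BEGIN_REC should give the expected frame count and zero leftover bytes (14 frames for the attached test file) if the records were written completely.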
>>>>
>>>> I have two questions:
>>>> 1. Does the file contain real data?
>>>> 2. If so, is there a way to retrieve the data and create a new netCDF file?
>>>>
>>>> I am desperately trying to salvage nearly 3 microseconds of simulation
>>>> data, which would take more than 2 months to regenerate. I would greatly
>>>> appreciate it if anyone could provide some insight into this problem.
>>>>
>>>> The attached netCDF file has 14 frames that can be used to examine the
>>>> issue.
>>>>
>>>> Thanks in advance
>>>>
>>>> Best
>>>> Ram
>>>>
>>>>
>>>> _______________________________________________
>>>> NOTE: All exchanges posted to Unidata maintained email lists are
>>>> recorded in the Unidata inquiry tracking system and made publicly
>>>> available through the web.  Users who post to any of the lists we
>>>> maintain are reminded to remove any personal information that they
>>>> do not want to be made public.
>>>>
>>>>
>>>> netcdfgroup mailing list
>>>> netcdfgroup@xxxxxxxxxxxxxxxx
>>>> For list information or to unsubscribe,  visit:
>>>> https://www.unidata.ucar.edu/mailing_lists/
>>>>
>>>
