Re: [netcdfgroup] nf90_char size

For an easy workaround, you might try writing the original file in 64-bit
offset format, or in CDF5 format with a newer version of the netcdf
library.  Either one would bypass any mysterious netcdf-4 behavior.  There
is nothing in your current data scheme that needs netcdf-4 format.
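
Something like this shows the idea (an untested sketch: the dimension and
variable names are copied from your ncdump output below, error checks are
omitted, and NF90_64BIT_DATA assumes a netcdf-fortran build with CDF5
support; substitute NF90_64BIT_OFFSET for the 64-bit offset format):

    program write_cdf5_sketch
       use netcdf
       implicit none
       integer :: ierr, ncid, dim1, dim2, var1

       ! CDF5 instead of netcdf-4; nf90_char is one byte per element.
       ierr = nf90_create ('ndb.BS_COMPRESS0.005000_Q1', &
                           ior (NF90_CLOBBER, NF90_64BIT_DATA), ncid)
       ierr = nf90_def_dim (ncid, 'BS_K_linearized1', 2025000000, dim1)
       ierr = nf90_def_dim (ncid, 'BS_K_linearized2',  781887360, dim2)
       ierr = nf90_def_var (ncid, 'BSE_RESONANT_COMPRESSED1_DONE', &
                            nf90_char, (/ dim1 /), var1)
       ierr = nf90_enddef (ncid)
       ierr = nf90_close (ncid)
    end program write_cdf5_sketch

The resulting file should then be very close to 1 byte per stored
character, plus a small header.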


On Fri, May 1, 2020 at 6:30 PM Dave Allured - NOAA Affiliate <
dave.allured@xxxxxxxx> wrote:

> Everything looks good in ncdump -hs.  The ncvalidator error is expected
> because the format is not in the netcdf-3 family.
>
> I am puzzled.  This looks like the hdf5 layer lost a whole lot of file
> space, but I don't see how.  One straightforward thing to try is upgrading
> to more recent versions of the netcdf and HDF5 libraries.
>
> If that doesn't help, then to get more information, try replicating the
> file with nccopy, h5copy, or h5repack.
>
> https://portal.hdfgroup.org/display/HDF5/HDF5+Command-line+Tools
>
> Use contiguous or chunked, but for testing purposes, do not enable any
> compression.  The idea is that the writers in those tools should be
> correctly optimized to rewrite those large char arrays without wasted
> space, in case your own writer did something strange.
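>
> For example, with no options so that no compression is added (the output
> names here are just placeholders):
>
>     nccopy ndb.BS_COMPRESS0.005000_Q1 ndb_copy.nc
>     h5repack ndb.BS_COMPRESS0.005000_Q1 ndb_repacked.nc
>
> Then compare the size of each copy against the original file.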
>
> I suppose there could be a storage bug in the hdf5 or netcdf support
> libraries.  Your char arrays are uncommonly large, so they might have
> triggered some sort of edge case.
>
> I am refraining from suggesting low level debugging because I do not want
> to inflict pain.  Otherwise, see if other readers have some ideas, or post
> the question to the HDF5 users forum.
>
>
> On Fri, May 1, 2020 at 5:40 PM Davide Sangalli <davide.sangalli@xxxxxx>
> wrote:
>
>> I also add
>>
>> ncvalidator ndb.BS_COMPRESS0.005000_Q1
>> Error: Unknow file signature
>>     Expecting "CDF1", "CDF2", or "CDF5", but got "�HDF"
>> File "ndb.BS_COMPRESS0.005000_Q1" fails to conform with CDF file format
>> specifications
>>
>> Best,
>> D.
>>
>> On 02/05/20 01:26, Davide Sangalli wrote:
>>
>> Output of ncdump -hs
>>
>> D.
>>
>> ncdump -hs BSK_2-5B_X59RL-50B_SP_bse-io/ndb.BS_COMPRESS0.005000_Q1
>>
>> netcdf ndb.BS_COMPRESS0 {
>> dimensions:
>>         BS_K_linearized1 = 2025000000 ;
>>         BS_K_linearized2 = 781887360 ;
>>         complex = 2 ;
>>         BS_K_compressed1 = 24776792 ;
>> variables:
>>         char BSE_RESONANT_COMPRESSED1_DONE(BS_K_linearized1) ;
>>                 BSE_RESONANT_COMPRESSED1_DONE:_Storage = "contiguous" ;
>>         char BSE_RESONANT_COMPRESSED2_DONE(BS_K_linearized1) ;
>>                 BSE_RESONANT_COMPRESSED2_DONE:_Storage = "contiguous" ;
>>         char BSE_RESONANT_COMPRESSED3_DONE(BS_K_linearized2) ;
>>                 BSE_RESONANT_COMPRESSED3_DONE:_Storage = "contiguous" ;
>>         float BSE_RESONANT_COMPRESSED1(BS_K_compressed1, complex) ;
>>                 BSE_RESONANT_COMPRESSED1:_Storage = "contiguous" ;
>>                 BSE_RESONANT_COMPRESSED1:_Endianness = "little" ;
>> // global attributes:
>>                 :_NCProperties =
>> "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18" ;
>>                 :_SuperblockVersion = 0 ;
>>                 :_IsNetcdf4 = 1 ;
>>                 :_Format = "netCDF-4" ;
>>
>>
>>
>> On Sat, May 2, 2020 at 12:24 AM +0200, "Dave Allured - NOAA Affiliate" <
>> dave.allured@xxxxxxxx> wrote:
>>
>>> I agree that you should expect the file size to be about 1 byte per
>>> stored character.  IMO the most likely explanation is that you have a
>>> netcdf-4 file with inappropriately small chunk size.  Another possibility
>>> is a 64-bit offset file with crazy huge padding between file sections.
>>> This is very unlikely, but I do not know what is inside your writer code.
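>>>
>>> If it does turn out to be chunking, the fix on the writer side is to
>>> force contiguous storage (or a large explicit chunk size) when the
>>> variable is defined.  A minimal sketch, untested, with names from your
>>> file and error checks omitted ('chunk_test.nc' is just a scratch name):
>>>
>>>     program contiguous_sketch
>>>        use netcdf
>>>        implicit none
>>>        integer :: ierr, ncid, dim1, var1
>>>
>>>        ierr = nf90_create ('chunk_test.nc', &
>>>                            ior (NF90_CLOBBER, NF90_NETCDF4), ncid)
>>>        ierr = nf90_def_dim (ncid, 'BS_K_linearized1', 2025000000, dim1)
>>>        ierr = nf90_def_var (ncid, 'BSE_RESONANT_COMPRESSED1_DONE', &
>>>                             nf90_char, (/ dim1 /), var1)
>>>        ! Contiguous layout stores exactly one byte per element; the
>>>        ! chunk size argument is ignored when NF90_CONTIGUOUS is used.
>>>        ierr = nf90_def_var_chunking (ncid, var1, NF90_CONTIGUOUS, (/ 0 /))
>>>        ierr = nf90_enddef (ncid)
>>>        ierr = nf90_close (ncid)
>>>     end program contiguous_sketch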
>>>
>>> Diagnose, please.  Ncdump -hs.  If it is 64-bit offset, I think
>>> ncvalidator can display the hidden pad sizes.
>>>
>>>
>>> On Fri, May 1, 2020 at 3:37 PM Davide Sangalli <davide.sangalli@xxxxxx>
>>> wrote:
>>>
>>>> Dear all,
>>>> I'm a developer of a Fortran code which uses netcdf for I/O.
>>>>
>>>> In one of my runs I created a file with some huge arrays of characters.
>>>> The header of the file is the following:
>>>> netcdf ndb.BS_COMPRESS0 {
>>>> dimensions:
>>>>     BS_K_linearized1 = 2025000000 ;
>>>>     BS_K_linearized2 = 781887360 ;
>>>> variables:
>>>>     char BSE_RESONANT_COMPRESSED1_DONE(BS_K_linearized1) ;
>>>>     char BSE_RESONANT_COMPRESSED2_DONE(BS_K_linearized1) ;
>>>>     char BSE_RESONANT_COMPRESSED3_DONE(BS_K_linearized2) ;
>>>> }
>>>>
>>>> The variables are declared as nf90_char which, according to the
>>>> documentation, should be 1 byte per element.
>>>> Thus I would expect the total size of the file to be about
>>>> 1 byte*(2*2025000000+781887360) ~ 4.5 GB.
>>>> Instead the file size is 16059445323 bytes ~ 14.96 GB, i.e. 10.46 GB
>>>> more, a factor of 3.33 bigger.
>>>>
>>>> This happens consistently if I consider the file
>>>> netcdf ndb {
>>>> dimensions:
>>>>     complex = 2 ;
>>>>     BS_K_linearized1 = 2025000000 ;
>>>>     BS_K_linearized2 = 781887360 ;
>>>> variables:
>>>>     float BSE_RESONANT_LINEARIZED1(BS_K_linearized1, complex) ;
>>>>     char BSE_RESONANT_LINEARIZED1_DONE(BS_K_linearized1) ;
>>>>     float BSE_RESONANT_LINEARIZED2(BS_K_linearized1, complex) ;
>>>>     char BSE_RESONANT_LINEARIZED2_DONE(BS_K_linearized1) ;
>>>>     float BSE_RESONANT_LINEARIZED3(BS_K_linearized2, complex) ;
>>>>     char BSE_RESONANT_LINEARIZED3_DONE(BS_K_linearized2) ;
>>>> }
>>>> The float component should weigh ~36 GB while the char component
>>>> should be identical to before, i.e. 4.5 GB, for a total of 40.5 GB.
>>>> The file is instead ~50.96 GB, i.e. again 10.46 GB bigger than
>>>> expected.
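>>>>
>>>> For reference, the expected sizes work out as follows (counting
>>>> 1 GB = 2^30 bytes):
>>>>
>>>>   floats: 2 x (2025000000 x 2 x 4 B) + 781887360 x 2 x 4 B
>>>>           = 38655098880 B ~ 36.0 GB
>>>>   chars:  2 x 2025000000 B + 781887360 B = 4831887360 B ~ 4.5 GB
>>>>   total:  43486986240 B ~ 40.5 GB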
>>>>
>>>> *Why?*
>>>>
>>>> My character variables are something like
>>>> "tnnnntnnnntnnnnnnnntnnnnnttnnnnnnnnnnnnnnnnt..."
>>>> but the file already has this size just after creation, i.e. before
>>>> the variables are filled.
>>>>
>>>> A few details about the library, which was compiled against HDF5
>>>> (hdf5-1.8.18) with parallel I/O support:
>>>> Name: netcdf
>>>> Description: NetCDF Client Library for C
>>>> URL: http://www.unidata.ucar.edu/netcdf
>>>> Version: 4.4.1.1
>>>> Libs: -L${libdir}  -lnetcdf -ldl -lm
>>>> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5hl_fortran.a
>>>> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5_fortran.a
>>>> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5_hl.a
>>>> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5.a
>>>> -lz -lm -ldl -lcurl
>>>> Cflags: -I${includedir}
>>>>
>>>> Name: netcdf-fortran
>>>> Description: NetCDF Client Library for Fortran
>>>> URL: http://www.unidata.ucar.edu/netcdf
>>>> Version: 4.4.4
>>>> Requires.private: netcdf > 4.1.1
>>>> Libs: -L${libdir} -lnetcdff
>>>> Libs.private: -L${libdir} -lnetcdff -lnetcdf
>>>> Cflags: -I${includedir}
>>>>
>>>> Best,
>>>> D.
>>>> --
>>>> Davide Sangalli, PhD
>>>> CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX
>>>> Centre
>>>> Area della Ricerca di Roma 1, 00016 Monterotondo Scalo, Italy
>>>> http://www.ism.cnr.it/en/davide-sangalli-cv/
>>>> http://www.max-centre.eu/
>>>>
>>>