Re: [netcdfgroup] nf90_char size

  • To: Davide Sangalli <davide.sangalli@xxxxxx>
  • Subject: Re: [netcdfgroup] nf90_char size
  • From: Wei-Keng Liao <wkliao@xxxxxxxxxxxxxxxx>
  • Date: Sat, 2 May 2020 15:55:06 +0000
For HDF5 files, command “h5dump -Hp ndb.BS_COMPRESS0.005000_Q1” shows
the data chunk settings used by all datasets in the file.

Command “h5stat -Ss ndb.BS_COMPRESS0.005000_Q1” shows information about
free space, metadata, raw data, etc.

They may reveal why your file is abnormally big.
Most likely it is the chunk settings you used.
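The size figures quoted downthread also check out arithmetically. A quick sketch (the dimension sizes and observed byte count are taken from the messages below; NC_CHAR stores 1 byte per element):

```python
# Sanity-check the sizes discussed in this thread. Dimension sizes and
# the observed file size come from the ncdump output and byte counts
# quoted below.
BS_K_linearized1 = 2_025_000_000
BS_K_linearized2 = 781_887_360

# Two char variables over BS_K_linearized1, one over BS_K_linearized2.
expected_bytes = 2 * BS_K_linearized1 + BS_K_linearized2
observed_bytes = 16_059_445_323  # reported size of ndb.BS_COMPRESS0.005000_Q1
overhead = observed_bytes - expected_bytes

print(f"expected ~{expected_bytes / 2**30:.2f} GiB")    # ~4.50
print(f"observed ~{observed_bytes / 2**30:.2f} GiB")    # ~14.96
print(f"overhead ~{overhead / 2**30:.2f} GiB, "
      f"factor {observed_bytes / expected_bytes:.2f}x") # ~10.46 GiB, 3.32x
```

The ~10.46 GiB of overhead is what the diagnostics above should help attribute (chunk metadata, free space, or padding).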

Wei-keng

> On May 1, 2020, at 6:40 PM, Davide Sangalli <davide.sangalli@xxxxxx> wrote:
> 
> I also add
> 
> ncvalidator ndb.BS_COMPRESS0.005000_Q1 
> Error: Unknow file signature
>     Expecting "CDF1", "CDF2", or "CDF5", but got "�HDF"
> File "ndb.BS_COMPRESS0.005000_Q1" fails to conform with CDF file format 
> specifications
> 
> Best,
> D.
> 
> On 02/05/20 01:26, Davide Sangalli wrote:
>> Output of ncdump -hs
>> 
>> D.
>> 
>> ncdump -hs BSK_2-5B_X59RL-50B_SP_bse-io/ndb.BS_COMPRESS0.005000_Q1
>> 
>> netcdf ndb.BS_COMPRESS0 {
>> dimensions:
>>         BS_K_linearized1 = 2025000000 ;
>>         BS_K_linearized2 = 781887360 ;
>>         complex = 2 ;
>>         BS_K_compressed1 = 24776792 ;
>> variables:
>>         char BSE_RESONANT_COMPRESSED1_DONE(BS_K_linearized1) ;
>>                 BSE_RESONANT_COMPRESSED1_DONE:_Storage = "contiguous" ;
>>         char BSE_RESONANT_COMPRESSED2_DONE(BS_K_linearized1) ;
>>                 BSE_RESONANT_COMPRESSED2_DONE:_Storage = "contiguous" ;
>>         char BSE_RESONANT_COMPRESSED3_DONE(BS_K_linearized2) ;
>>                 BSE_RESONANT_COMPRESSED3_DONE:_Storage = "contiguous" ;
>>         float BSE_RESONANT_COMPRESSED1(BS_K_compressed1, complex) ;
>>                 BSE_RESONANT_COMPRESSED1:_Storage = "contiguous" ;
>>                 BSE_RESONANT_COMPRESSED1:_Endianness = "little" ;
>> // global attributes:
>>                 :_NCProperties = 
>> "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18" ;
>>                 :_SuperblockVersion = 0 ;
>>                 :_IsNetcdf4 = 1 ;
>>                 :_Format = "netCDF-4" ;
>> 
>> 
>> 
>> On Sat, May 2, 2020 at 12:24 AM +0200, "Dave Allured - NOAA Affiliate" 
>> <dave.allured@xxxxxxxx> wrote:
>> 
>> I agree that you should expect the file size to be about 1 byte per stored 
>> character.  IMO the most likely explanation is that you have a netcdf-4 file 
>> with an inappropriately small chunk size.  Another possibility is a 64-bit 
>> offset file with crazy huge padding between file sections.  This is very 
>> unlikely, but I do not know what is inside your writer code.
>> 
>> Diagnose, please.  Run ncdump -hs.  If it is 64-bit offset, I think ncvalidator 
>> can display the hidden pad sizes.
>> 
>> 
>> On Fri, May 1, 2020 at 3:37 PM Davide Sangalli <davide.sangalli@xxxxxx> 
>> wrote:
>> Dear all,
>> I'm a developer of a fortran code which uses netcdf for I/O
>> 
>> In one of my runs I created a file with some huge array of characters.
>> The header of the file is the following:
>> netcdf ndb.BS_COMPRESS0 {
>> dimensions:
>>     BS_K_linearized1 = 2025000000 ;
>>     BS_K_linearized2 = 781887360 ;
>> variables:
>>     char BSE_RESONANT_COMPRESSED1_DONE(BS_K_linearized1) ;
>>     char BSE_RESONANT_COMPRESSED2_DONE(BS_K_linearized1) ;
>>     char BSE_RESONANT_COMPRESSED3_DONE(BS_K_linearized2) ;
>> }
>> 
>> The variables are declared as nf90_char which, according to the 
>> documentation, should be 1 byte per element.
>> Thus I would expect the total size of the file to be about 
>> 1 byte*(2*2025000000+781887360) ~ 4.5 GB.
>> Instead the file size is 16059445323 bytes ~ 14.96 GB, i.e. 10.46 GB more 
>> and a factor of 3.33 bigger.
>> 
>> This happens consistently if I consider the file
>> netcdf ndb {
>> dimensions:
>>     complex = 2 ;
>>     BS_K_linearized1 = 2025000000 ;
>>     BS_K_linearized2 = 781887360 ;
>> variables:
>>     float BSE_RESONANT_LINEARIZED1(BS_K_linearized1, complex) ;
>>     char BSE_RESONANT_LINEARIZED1_DONE(BS_K_linearized1) ;
>>     float BSE_RESONANT_LINEARIZED2(BS_K_linearized1, complex) ;
>>     char BSE_RESONANT_LINEARIZED2_DONE(BS_K_linearized1) ;
>>     float BSE_RESONANT_LINEARIZED3(BS_K_linearized2, complex) ;
>>     char BSE_RESONANT_LINEARIZED3_DONE(BS_K_linearized2) ;
>> }
>> The float component should weigh ~36 GB while the char component should be 
>> identical to before, i.e. 4.5 GB, for a total of 40.5 GB.
>> The file is instead ~50.96 GB, i.e. again 10.46 GB bigger than 
>> expected.
>> 
>> Why?
>> 
>> My character variables are something like
>> "tnnnntnnnntnnnnnnnntnnnnnttnnnnnnnnnnnnnnnnt..."
>> but the file size is already like that just after the file creation, i.e. 
>> before filling it.
>> 
>> Some info about the library, compiled linking to HDF5 (hdf5-1.8.18), with 
>> parallel I/O support:
>> Name: netcdf
>> Description: NetCDF Client Library for C
>> URL: http://www.unidata.ucar.edu/netcdf
>> Version: 4.4.1.1
>> Libs: -L${libdir}  -lnetcdf -ldl -lm 
>> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5hl_fortran.a
>>  
>> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5_fortran.a
>>  
>> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5_hl.a
>>  
>> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5.a
>>  -lz -lm -ldl -lcurl
>> Cflags: -I${includedir}
>> 
>> Name: netcdf-fortran
>> Description: NetCDF Client Library for Fortran
>> URL: http://www.unidata.ucar.edu/netcdf
>> Version: 4.4.4
>> Requires.private: netcdf > 4.1.1
>> Libs: -L${libdir} -lnetcdff
>> Libs.private: -L${libdir} -lnetcdff -lnetcdf
>> Cflags: -I${includedir}
>> 
>> Best,
>> D.
>> -- 
>> Davide Sangalli, PhD
>> CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX 
>> Centre
>> Area della Ricerca di Roma 1, 00016 Monterotondo Scalo, Italy
>> http://www.ism.cnr.it/en/davide-sangalli-cv/
>> http://www.max-centre.eu/
> 
> _______________________________________________
> NOTE: All exchanges posted to Unidata maintained email lists are
> recorded in the Unidata inquiry tracking system and made publicly
> available through the web.  Users who post to any of the lists we
> maintain are reminded to remove any personal information that they
> do not want to be made public.
> 
> 
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe,  visit: 
> https://urldefense.com/v3/__https://www.unidata.ucar.edu/mailing_lists/__;!!Dq0X2DkFhyF93HkjWTBQKhk!GlMUXr2ZUUJOLFkvEP_YqN7UDZILtBBWb_Z5DVa2Mwi9UIg_yB2Hb7tJibyV8bgan4ku$
>   
