Re: [netcdfgroup] nc_open takes a long time to open a big file

Hello Jennifer,

    If your chunk parameters and dataset dimensions remain as you
originally stated (1,1,160,320, etc.), I suspect these could be playing
an indirect role in the long file open times you report. In HDF5 (upon
which NetCDF-4 is built) chunks are typically set to a more 'square' I/O
geometry, and are rarely assigned to be an even multiple of the
dataset's dimensions.
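
To put a number on that chunk geometry, here is a minimal sketch of the
arithmetic in C. The helper names (chunk_count, chunk_bytes) are my own
for illustration and are not part of the NetCDF or HDF5 APIs; the only
inputs are the dimensions and _ChunkSizes quoted below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of chunks needed to tile one variable: the product over all
 * dimensions of ceil(dim / chunk). Hypothetical helper, not a library call. */
uint64_t chunk_count(const uint64_t *dims, const uint64_t *chunks, size_t ndims)
{
    uint64_t n = 1;
    for (size_t i = 0; i < ndims; i++) {
        assert(chunks[i] > 0);
        n *= (dims[i] + chunks[i] - 1) / chunks[i];  /* ceiling division */
    }
    return n;
}

/* Uncompressed size of a single chunk, in bytes. */
uint64_t chunk_bytes(const uint64_t *chunks, size_t ndims, uint64_t elem_size)
{
    uint64_t b = elem_size;
    for (size_t i = 0; i < ndims; i++)
        b *= chunks[i];
    return b;
}
```

For temp(time=1581, lev=11, lat=160, lon=320) with chunks of
(1,1,160,320) this gives 17391 chunks of 204800 bytes each (before
deflate); sfp adds 1581 more. With 7 variables shaped like temp, that is
7 * 17391 + 1581 = 123318 chunks for the library to index. Whether that
index is what nc_open() spends its time on is only a guess on my part.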
      This is a fairly large file (18404502496 bytes), and we don't know
how much memory (or swap) you have available, or anything about I/O bus
contention; any of these things can play a role in responsiveness.
       Opening any file triggers a lot of downstream activity by the
OS, including inode traversals, queries on the current state of the
cache, lock checking, wait-states for retries on corrupted
sectors, etc.  If there are a very large number of files
in the directory where this file resides, or there is a lot of
contention from concurrent processes on the filesystem's inode
table, these factors can also translate to longer file open
times.
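
A quick way to rule the directory-size factor in or out is to count the
entries next to the file. This is only a sketch using the POSIX
opendir()/readdir() calls; count_dir_entries is a name I made up for
illustration.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Count the entries in `dir`, excluding "." and "..".
 * Returns -1 if the directory cannot be opened. */
long count_dir_entries(const char *dir)
{
    assert(dir != NULL);
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    long n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
            n++;
    }
    closedir(d);
    return n;
}
```

If the count is modest, the directory itself is probably not where the
time is going.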
       Closing a file can be even more involved, since buffer flushes
have to be performed through several layers of cache and VMM tables.
How long does it take to close the same file (e.g. does this also take
longer than expected)?
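
To measure open and close separately, something like the sketch below
might help. It times plain fopen()/fclose() with the POSIX
clock_gettime() call so that it compiles without NetCDF; the function
names are hypothetical. In your program you would put the same timer
calls around nc_open(path, NC_NOWRITE, &ncid) and nc_close(ncid)
instead. Comparing the two sets of numbers should show how much of the
20 seconds belongs to the OS and how much to the library.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime() */
#endif
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Seconds between two CLOCK_MONOTONIC readings, a taken before b. */
double elapsed_seconds(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec)
         + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Time fopen() and fclose() on `path` separately (stand-ins for
 * nc_open()/nc_close()). Returns 0 on success, -1 if the file could
 * not be opened or closed; outputs are only set on a successful open. */
int time_open_close(const char *path, double *open_s, double *close_s)
{
    struct timespec t0, t1, t2;
    assert(path != NULL && open_s != NULL && close_s != NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    FILE *fp = fopen(path, "rb");
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (fp == NULL)
        return -1;

    int rc = fclose(fp);
    clock_gettime(CLOCK_MONOTONIC, &t2);

    *open_s = elapsed_seconds(t0, t1);
    *close_s = elapsed_seconds(t1, t2);
    return rc == 0 ? 0 : -1;
}
```

On older glibc versions (before 2.17) clock_gettime() lives in librt, so
you may need to link with -lrt.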
       Last, I haven't checked the internals of nc_open() lately, but
sometimes multi-tier, wrapped APIs (like NetCDF) bundle other
'look ahead' initialization actions along with the 'opening' of a
file, such as building or traversing large in-memory indices of
internal node trees, or skipping over gaps in the internal structures.
If nc_open() triggers this sort of extra work, the wall
clock time to perform the open could increase somewhat linearly with
the size of the file.
    Not sure any of this helps, other than to help look for some
deterministic causes...

cheers,
jmg

On Thu, Sep 9, 2010 at 6:06 PM, Denis Nadeau <denis.nadeau@xxxxxxxx> wrote:
> Hi Jennifer,
>
>
>
> Did you call nc_set_chunk_cache?  If not, do you know what the default cache
> size was set up to?  (Look in your config.log created after you called
> configure)
>
> I must admit that 20 seconds is quite long.
>
>
>
> Denis
>
>
>
>
>
> From: netcdfgroup-bounces@xxxxxxxxxxxxxxxx
> [mailto:netcdfgroup-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Jennifer Adams
> Sent: Thursday, September 09, 2010 7:34 PM
> To: netCDF Mail List
> Subject: [netcdfgroup] nc_open takes a long time to open a big file
>
>
>
> Dear Experts,
>
> I'm using netcdf-4.1.1-rc1 and hdf5-1.8.4-patch1 on a 64-bit linux server
> running CentOS-5.5.
>
> I have a netcdf-4 file that is 18404502496 bytes large.
>
>
>
> My file's dimensions look like this:
>
>         lon = 320 ;
>         lat = 160 ;
>         lev = 11 ;
>         time = 1581 ;
>
> It has 7 variables that look like this:
>
>    float temp(time, lev, lat, lon) ;
>                 temp:_Storage = "chunked" ;
>                 temp:_ChunkSizes = 1, 1, 160, 320 ;
>                 temp:_DeflateLevel = 1 ;
>                 temp:_Shuffle = "true" ;
>
> and 1 variable that looks like this:
>
>     float sfp(time, lat, lon) ;
>                 sfp:_Storage = "chunked" ;
>                 sfp:_ChunkSizes = 1, 160, 320 ;
>                 sfp:_DeflateLevel = 1 ;
>                 sfp:_Shuffle = "true" ;
>
>
>
> Is it normal for nc_open to take 20 seconds to open this file before
> returning control to my C program?
>
>
>
> --Jennifer
>
>
>
>
>
> --
>
> Jennifer M. Adams
>
> IGES/COLA
>
> 4041 Powder Mill Road, Suite 302
>
> Calverton, MD 20705
>
> jma@xxxxxxxxxxxxx
>



-- 
------------------------------------------------------------
Joseph Glassy
Lead Software Engineer (contractor)
NASA Measures (Freeze/Thaw),Rm CFC 424
College of Forestry and Conservation
Univ. Montana, Missoula, MT 59812
Lupine Logic Inc.
www.lupinelogic.com
Scientific and Technical Programming


