Re: performance degrades with filesize

I've followed today's discussion on this subject, though I've not looked at
this issue in eons.  I recall that this certainly was a problem with early
netCDF releases, apparently caused by long seeks in big files (assuming
I remember correctly).  At that time, I worked around the problem by
keeping relatively few time steps, mapped to the unlimited dimension, in
each file and then creating multiple files.  Each file would have multiple
(static) dimensions and variables, but would still have fast access.  The
applications then took care of the bookkeeping needed to treat the set of
files as a single data set.  I still use that approach today, having not
revisited how to do it better with more recent versions of netCDF.
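
For illustration only, the bookkeeping could look something like the
sketch below.  It is not the code I actually use: the class name
SplitDataset, the file names, and the step counts are all made up, and it
only maps a global time index to a file and a local index without touching
any netCDF calls.

```python
from bisect import bisect_right
from itertools import accumulate


class SplitDataset:
    """Treat several files, each holding a few time steps, as one time axis."""

    def __init__(self, files_and_counts):
        # files_and_counts: list of (filename, number_of_time_steps) pairs.
        self.names = [name for name, _ in files_and_counts]
        counts = [count for _, count in files_and_counts]
        # starts[i] is the global index of the first time step in file i.
        self.starts = [0] + list(accumulate(counts))[:-1]
        self.total = sum(counts)

    def locate(self, t):
        """Map a global time index to (filename, index within that file)."""
        if not 0 <= t < self.total:
            raise IndexError(t)
        i = bisect_right(self.starts, t) - 1
        return self.names[i], t - self.starts[i]


# Hypothetical file names and step counts, purely for demonstration.
ds = SplitDataset([("run_000.nc", 10), ("run_001.nc", 10), ("run_002.nc", 4)])
print(ds.locate(0))
print(ds.locate(13))
print(ds.locate(23))
```

The application would then open only the file returned by locate() and
read the record at the local index, so no single file grows large.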

--------------------------
Lloyd A. Treinish
Deep Computing Institute
IBM Thomas J. Watson Research Center
P. O. Box 218
Yorktown Heights, NY 10598
914-945-2770 (voice)
914-945-3434 (facsimile)
lloydt@xxxxxxxxxx
http://www.research.ibm.com/people/l/lloydt/
http://www.research.ibm.com/weather


John Galbraith <john@xxxxxxxxxxxxxxx>@unidata.ucar.edu on 09/10/2001
03:26:23 PM

Please respond to John Galbraith <john@xxxxxxxxxxxxxxx>

Sent by:  owner-netcdfgroup@xxxxxxxxxxxxxxxx


cc:   Ethan Alpert <ethan@xxxxxxxxxxxx>, john@xxxxxxxxxxxxxxx,
      netcdfgroup@xxxxxxxxxxxxxxxx



>>>>> "Steve" == Steve Emmerson <steve@xxxxxxxxxxxxxxxx> writes:

    >> ... I can't be certain but it seems like the entire file is
    >> rewritten when the unlimited dimension increases.

    Steve> The C implementation from the Unidata Program Center of the
    Steve> netCDF API *does not* rewrite the entire netCDF file when the
    Steve> unlimited dimension is increased -- effectively, the file is
    Steve> simply appended to.

That is why I say "seems": the write time "seems" to be proportional to
the size of the file.  I have no evidence that the file is actually being
copied.  In fact, I would be surprised if it were, and I suspect that I am
calling the netCDF library incorrectly.  Given the trouble I am having, I
probably am calling it incorrectly, but I don't know what my problem is
yet.

    Steve> I don't know about the Python interface.

The Python interface basically just converts the Python array slices to
netCDF arguments and calls ncvarputg().  (Python arrays are contiguous
values.)  The Python module never actually touches the netCDF file except
through that call to ncvarputg().  Even if the Python wrapper were deathly
slow, it would be the same deathly slow interval on every write, and it
wouldn't depend on the file size.
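
One way to check whether the write time really tracks the file size is to
record both for every append.  The following is only a rough sketch: the
names time_appends and append_raw are made up for illustration, and
append_raw is a stand-in that appends raw bytes to a plain file, not a
netCDF record.  To test the real thing, the actual record-writing call
would be passed in its place.

```python
import os
import tempfile
import time


def time_appends(path, append_record, n_records):
    """Return a (file_size_before_write, seconds) pair for each append."""
    samples = []
    for _ in range(n_records):
        size = os.path.getsize(path) if os.path.exists(path) else 0
        start = time.perf_counter()
        append_record(path)
        samples.append((size, time.perf_counter() - start))
    return samples


def append_raw(path, nbytes=4096):
    # Stand-in writer (assumption): appends raw bytes, not a netCDF record.
    with open(path, "ab") as f:
        f.write(b"\0" * nbytes)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "growth.bin")
    samples = time_appends(path, append_raw, 5)
    for size, seconds in samples:
        print(f"{size:>8d} bytes  {seconds:.6f} s")
```

If the seconds column grows along with the size column, the cost is in
whatever does the writing; if it stays flat, the slowdown is coming from
somewhere else.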

Maybe there is some issue with calling the old netcdf API?

Thanks,
     John


--
John Galbraith                  email: john@xxxxxxxxxxxxxxx
Los Alamos National Laboratory,   home phone: (505) 662-3849
                                  work phone: (505) 665-6301




 
 