Re: 20040901: netCDF C - What writes faster netcdf files: netcdf for perl, c, or fortran?



>To: address@hidden
>From: "Kate Edwards" <address@hidden>
>Subject: netCDF C - What writes faster netcdf files: netcdf for perl, c, or 
>fortran?
>Organization: Univ. of Washington, Applied Physics Lab
>Keywords: 200409011954.i81JsPqs012300 netCDF write

Hi Kate,

> I am finding it is taking a very long time to write out a large
> netcdf file in netcdf for Matlab, which is my main programming
> language.  Would switching languages speed things up, for example
> using netcdf for c, perl or fortran instead?  Of these three, which
> would be fastest at writing very large netcdf files?

I don't know how the Matlab interface is implemented, so it's hard to
tell exactly why your writes are slow.  In particular, if you are
using the "WetCDF Toolbox for Matlab"

  http://woodshole.er.usgs.gov/staffpages/cdenham/public_html/MexCDF/wetcdf.html

it has some significant efficiency problems for some kinds of access.

Using the C interface would be fastest currently, although there are a
few cases in which the Java interface actually beats the C interface
by up to a factor of two.  The Fortran interface is only a thin veneer
over the C interface, and that veneer layer shouldn't add much
overhead unless you are accessing the data one value at a time instead
of using array accesses.  Similarly, the Perl interface calls the C
interface underneath, but may add more overhead since the Perl
interpreter has to be invoked and values copied into Perl data
structures.

Another possibility is that you are merely writing the data in an
inefficient way that could be improved with any language interface by
writing data values in a different order.  For example, if ncdump
shows the structure and shape of a variable is:

  float temperature(time, level, lon, lat);

then, because netCDF stores the last dimension (lat here) varying
fastest, the most efficient way to write the data will be in
contiguous blocks, for example:

  - writing a 1D array of values for all lat indices for each 
    (time, level, lon) combination, or
  - writing a 2D slab of values for all (lon,lat) indices for each
    (time, level) combination, or
  - writing a 3D slab of values for all (level,lon,lat) indices for
    each time, etc.

The least efficient way to write the data would be to write along the
most slowly varying dimension, time: for example, writing all the times
for some (level,lon,lat) combination.  That would require a seek, a
read, and a write of one disk block for each value written.
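
To make the difference concrete, here is a rough sketch in plain C (no
netCDF calls) that models the variable as a simple row-major array and
counts how many separate contiguous runs a given write touches.  The
function names, the fixed NDIMS of 4, and the dimension sizes used in
the note below are made up for illustration; the model also ignores the
file header and any interleaving of record variables, so treat it as an
approximation of the layout rather than a description of the library's
internals.

```c
#include <assert.h>
#include <stddef.h>

#define NDIMS 4  /* (time, level, lon, lat); hypothetical fixed rank */

/* Linear element offset of index[] in a row-major array of the given
   shape: the last dimension (lat) varies fastest. */
size_t element_offset(const size_t shape[NDIMS], const size_t index[NDIMS])
{
    size_t offset = 0;
    for (int d = 0; d < NDIMS; d++) {
        assert(index[d] < shape[d]);
        offset = offset * shape[d] + index[d];
    }
    return offset;
}

/* Number of separate contiguous runs covered by a hyperslab with the
   given edge lengths count[].  Fewer runs means fewer seeks. */
size_t contiguous_runs(const size_t shape[NDIMS], const size_t count[NDIMS])
{
    for (int i = 0; i < NDIMS; i++)
        assert(count[i] >= 1 && count[i] <= shape[i]);

    /* Trailing dimensions that are fully covered merge into one run. */
    int d = NDIMS - 1;
    while (d >= 0 && count[d] == shape[d])
        d--;

    /* Dimension d (if any) extends the same run; every dimension
       before it multiplies the number of separate runs. */
    size_t runs = 1;
    for (int i = 0; i < d; i++)
        runs *= count[i];
    return runs;
}
```

With an assumed shape of {10, 5, 4, 3}, contiguous_runs returns 1 for a
(lon,lat) slab with count {1, 1, 4, 3}, but 10 for a write of all times
at one point with count {10, 1, 1, 1}, and element_offset shows that
successive times are 5*4*3 = 60 elements apart.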

For more information on performance, see the appropriate section of
the User's Guide, "NetCDF File Structure and Performance":

  http://www.unidata.ucar.edu/packages/netcdf/guidec/guidec-14.html#HEADING14-0

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden          http://www.unidata.ucar.edu/staff/russ

NOTE: All email exchanges with Unidata User Support are recorded in the
Unidata inquiry tracking system and then made publicly available
through the web.  If you do not want to have your interactions made
available in this way, you must let us know in each email you send to us.