
Re: Some interesting performance data for netCDF4.0.1 (fwd)



"V. Balaji" <address@hidden> writes:

> Hi Ed, one of our developers has noticed interesting (disturbing)
> behaviour in one of our homegrown netCDF tools.
>
> I don't want to drag you into the melee by ccing you into this
> group, but I wonder if the sudden performance cliff at 2GB rings any
> bells for you or colleagues at Unidata, either in terms of changes
> to libnetcdf or changes to the way we're invoking it.
>
> Thanks,

Howdy all!

I have read Jeff's description of the problem, and a few facts may help
clarify the situation...

* Certainly you should be testing with the netCDF snapshot release, it
  has some performance improvements:
  ftp://ftp.unidata.ucar.edu/pub/snapshot/netcdf-4-daily.tar.gz

* When creating netCDF-4 (and netCDF-4 classic model) files, the output
  should be in HDF5 format. One way to tell is to look at the first 4
  bytes of the file: a netCDF-4/HDF5 file starts with "HDF" (after one
  non-printing signature byte). If you see "CDF" instead, then you are
  not creating a netCDF-4 file. (Check your nc_create call - by default
  netCDF-4 still produces classic format files, not netCDF-4/HDF5
  files.) From what Jeff says, it seems that you are not actually
  producing netCDF-4/HDF5 files:

      "My very last experiment showed that the output of the 4.0.1
      mppnccombine produces by default a file that does not seem to be
      in the hdf5 format (or at least has a format different from the
      *.000?  files). How did I deduce that? When you do "od -a -N 10"
      on the output files, you will see "CDF ...", which is the format
      for the netcdf classic or the netcdf 64-bit offset format, but is
      different from the format if you do "od" on the .000* files, which
      show up as "hdf5..."."
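The "od -a -N 10" check Jeff describes can also be done in a few lines of
C. This is a standalone sketch, not part of the netCDF API: it only reads
the magic number (classic and 64-bit offset files begin with "CDF";
netCDF-4/HDF5 files begin with the HDF5 signature byte 0x89 followed by
"HDF"):

```c
#include <stdio.h>
#include <string.h>

/* Classify a file by its magic number: netCDF classic and 64-bit offset
   files start with "CDF" (followed by a version byte); netCDF-4/HDF5
   files start with the HDF5 signature: 0x89 'H' 'D' 'F'. */
const char *nc_magic(const char *path)
{
    unsigned char buf[4] = {0};
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return "unreadable";
    size_t n = fread(buf, 1, 4, fp);
    fclose(fp);
    if (n >= 3 && memcmp(buf, "CDF", 3) == 0)
        return "classic";
    if (n == 4 && buf[0] == 0x89 && memcmp(buf + 1, "HDF", 3) == 0)
        return "netcdf4/hdf5";
    return "unknown";
}
```

To actually get HDF5 output, the mode flags passed to nc_create must
include NC_NETCDF4 (optionally OR'd with NC_CLASSIC_MODEL); with a mode
of 0 you get a classic format file.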

* Performance will be the same for netCDF-4 files with or without the
  classic model turned on. That affects what you can add to the file,
  but not how (and how fast) data are read or written. 

* NetCDF-4 can easily handle files and variables larger than 2GB. The
  NC_CLASSIC_MODEL flag doesn't matter for this.

* Chunk sizes are an important consideration. Chunk sizes are chosen
  automatically if you don't specify them, and the defaults can perform
  quite poorly for larger variables. (But it seems that you are not
  producing netCDF-4 files anyway. See above.)
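To make the chunking point concrete, here is a minimal sketch of the kind
of heuristic involved. The chunk_pick function below is a hypothetical
helper, not part of the netCDF API; in real code you would pass the
resulting sizes to nc_def_var_chunking after defining the variable:

```c
#include <stddef.h>

/* Hypothetical helper: pick chunk lengths for a 3-D variable so that one
   chunk holds at most target_bytes of data. The slowest-varying
   dimension is halved first, round-robin, until the chunk fits. A chunk
   shape badly mismatched to the variable size or access pattern forces
   many small HDF5 I/O operations, which is one way default chunking can
   hurt performance on large variables. */
static void chunk_pick(const size_t dimlen[3], size_t elem_size,
                       size_t target_bytes, size_t chunk[3])
{
    for (int i = 0; i < 3; i++)
        chunk[i] = dimlen[i];
    int d = 0;
    while (chunk[0] * chunk[1] * chunk[2] * elem_size > target_bytes) {
        if (chunk[0] == 1 && chunk[1] == 1 && chunk[2] == 1)
            break;              /* can't shrink any further */
        if (chunk[d] > 1)
            chunk[d] = (chunk[d] + 1) / 2;
        d = (d + 1) % 3;        /* rotate through the dimensions */
    }
}
```

The real call would then look something like
nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunk) - but again, this
only matters once you are actually writing netCDF-4/HDF5 files.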

Please let me know if this doesn't help. You should not see any serious
performance problems with netCDF-4 if we are doing everything right...

Thanks,

Ed

-- 
Ed Hartnett  -- address@hidden