Re: More performance data for netCDF4.0.1/hdf-1.8.2



Tushar Mohan <address@hidden> writes:

> Hi Ed,
>
> I've been probing the performance issue in mppnccombine. The program
> combines records across multiple netcdf input files into a single
> output file.
>

Interesting!

I am forwarding this to the HDF5 programmers for some attention.

I note that the latest HDF5 is 1.8.4, not 1.8.2, so you might want to
give that a try...


> My experiments seem to show that the performance drops off due to
> excessive TLB misses (thrashing of some sort) on the
> Montecito/Montvale chips (Itanium-64) in certain HDF5 functions,
> notably H5I_register. This happens while processing certain input
> files in the hdf5 format -- v3-classic input files do not show this
> problem. The output file format used (v3-classic or hdf5) makes no
> difference in performance. The problem shows up, however, at the point
> when the output file hits the 2GB mark, and remains from then on. We
> are using a netcdf chunk size of 64K. Worth noting is that if certain
> input files are left out from the combine, the problem doesn't show up
> at all. I've tried with the latest hdf5 release - 1.8.2, and
> netcdf-4.0.1. Snapshot development versions of netcdf, while slightly
> better in performance than the 4.0.1 used for this experiment, exhibit
> a similar performance profile: the performance-challenged runs are
> overwhelmingly dominated by time spent in a few HDF5 functions.
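
[For reference, a minimal sketch of how a per-variable chunk size like the
64K figure above is typically set through the netCDF-4 API. The dimension
and variable names and sizes here are invented for illustration;
mppnccombine may choose its chunking differently:]

    #include <netcdf.h>

    /* Define one record variable with explicit 64 KB chunks
     * (1 x 128 x 128 floats = 65536 bytes).  Names and sizes are made up. */
    static int define_chunked_var(int ncid)
    {
        int dimids[3], varid, stat;
        size_t chunks[3] = {1, 128, 128};

        if ((stat = nc_def_dim(ncid, "time", NC_UNLIMITED, &dimids[0]))) return stat;
        if ((stat = nc_def_dim(ncid, "lat", 1024, &dimids[1]))) return stat;
        if ((stat = nc_def_dim(ncid, "lon", 1024, &dimids[2]))) return stat;
        if ((stat = nc_def_var(ncid, "temp", NC_FLOAT, 3, dimids, &varid)))
            return stat;
        return nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);
    }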
>
> Using a performance tool - hpcrun (part of HPCToolkit) - I was able to
> obtain a statistical profile for total cycles and TLB misses for a
> partial run. The total cycles (or time spent) closely tracked the TLB
> misses, and jumped sharply once the output size hit the 2 GB
> mark. I'm attaching the pruned profile. The full one is a few MB, and
> is at:
>
> http://www.samaratechnologygroup.com/pub/mppnccombine-hdf5-profile.txt.gz
>
> It seems that a region of code performing a pointer traversal is
> exhibiting poor locality and is causing thrashing in the TLB. If you
> study the H5I_register code in the attached profile, you'll see that
> when an "excessive" number of objects are registered using
> H5I_register,  then an extremely expensive operation to guard against
> duplicate IDs being parceled out, is performed. This test, a dozen
> lines of source code, consumes most of the application runtime.
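
[Purely for illustration -- this is NOT the HDF5 source, just the shape of
the pattern described above. If each new registration has to walk a list of
all previously registered IDs, the cost is O(n) per call and O(n^2) overall,
and the pointer chasing across scattered heap pages would be consistent with
the D-TLB thrashing in the profile:]

    /* Hypothetical duplicate-ID guard, not HDF5's actual code. */
    struct id_node {
        unsigned long   id;
        struct id_node *next;
    };

    static int id_already_used(const struct id_node *head, unsigned long candidate)
    {
        const struct id_node *p;
        for (p = head; p != NULL; p = p->next)   /* one dereference per node */
            if (p->id == candidate)
                return 1;
        return 0;
    }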
>
> In fact, it's conceivable that the 2GB point is a red herring: it may
> simply be that the number of objects being registered triggers the
> "duplicate ID check", and this happened to coincide with the 2 GB point
> in the output file. My observation that certain input files cause the
> problem may be explained by their having enough objects to register
> that the "duplicate ID check" code is triggered, although I don't
> understand enough of the file contents to confirm this hypothesis.
>
> This problem seems to be in the province of the hdf5 developers;
> however, I thought I'd check with the netcdf team in case you have a clue
> on whether certain parameters used for hdf5 by the netcdf library can
> affect the performance. While the code did not seem to imply so, it's
> possible that increasing the hdf5 type hash size may help, for
> example.
>
> In the attached profile, the first column showing percentages is the
> CPU cycles column, and it's showing as a percentage of total
> application CPU time. The second column shows the % of D-TLB misses.
>
> If you know of anybody on the hdf5 developer list who might be able
> to help, I'd appreciate it if you could include them in the mail thread.
>
> Please contact me if you have any questions, and thanks in advance for
> your help.
>
> Regards,
> Tushar
>
> On Wed, Dec 9, 2009 at 12:10 AM, Ed Hartnett <address@hidden> wrote:
>> "V. Balaji" <address@hidden> writes:
>>
>>> Hi Ed, one of our developers has noticed interesting (disturbing)
>>> behaviour in one of our homegrown netCDF tools.
>>>
>>> I don't want to drag you into the melee by ccing you into this
>>> group, but I wonder if the sudden performance cliff at 2GB rings any
>>> bells for you or colleagues at Unidata, either in terms of changes
>>> to libnetcdf or changes to the way we're invoking it.
>>>
>>> Thanks,
>>
>> Howdy all!
>>
>> I have read Jeff's description of the problem, and a few facts may help
>> clarify the situation...
>>
>> * Certainly you should be testing with the netCDF snapshot release; it
>>  has some performance improvements:
>>  ftp://ftp.unidata.ucar.edu/pub/snapshot/netcdf-4-daily.tar.gz
>>
>> * When creating netCDF-4 (and netCDF-4 classic model) files, the output
>>  should be in HDF5 format. One way to tell is to look at the first 4
>>  bytes of the file. In emacs the file will start with "HDF". If you see
>>  "CDF" instead, then you are not creating a netCDF-4 file. (Check your
>>  nc_create call - by default netCDF-4 still produces classic format
>>  files, not netCDF-4/HDF5 files; see the sketch after this list.) From
>>  what Jeff says, it seems that you are not actually using
>>  netCDF-4/HDF5 files:
>>
>>      "My very last experiment showed that the output of the 4.0.1
>>      mppnccombine produces by default a file that does not seem to be
>>      in the hdf5 format (or at least has a format different from the
>>      *.000?  files). How did I deduce that? When you do "od -a -N 10"
>>      on the output files, you will see "CDF ...", which is the format
>>      for the netcdf classic or the netcdf 64-bit offset format, but is
>>      different from the format if you do "od" on the .000* files, which
>>      show up as "hdf5..."."
>>
>> * Performance will be the same for netCDF-4 files with or without the
>>  classic model turned on. That affects what you can add to the file,
>>  but not how (and how fast) data are read or written.
>>
>> * NetCDF-4 can easily handle files and variables larger than 2GB. The
>>  NC_CLASSIC_MODEL flag doesn't matter for this.
>>
>> * Chunk sizes are an important consideration. Chunk sizes are chosen by
>>  default if you don't specify them, and that works pretty poorly for
>>  larger variables. (But it seems that you are not producing netCDF-4
>>  files anyway. See above.)
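
[To make the nc_create point above concrete, here is a minimal sketch of
creating a genuine netCDF-4/HDF5 file and then checking the format from C
instead of with od. Error handling is abbreviated and the file name is a
placeholder:]

    #include <netcdf.h>
    #include <stdio.h>

    int main(void)
    {
        int ncid, fmt;

        /* Without NC_NETCDF4 the default is still classic ("CDF") format. */
        if (nc_create("combined.nc", NC_CLOBBER | NC_NETCDF4 | NC_CLASSIC_MODEL,
                      &ncid))
            return 1;
        nc_close(ncid);

        /* Verify what was actually written. */
        if (nc_open("combined.nc", NC_NOWRITE, &ncid))
            return 1;
        nc_inq_format(ncid, &fmt);
        printf("%s\n",
               (fmt == NC_FORMAT_NETCDF4 || fmt == NC_FORMAT_NETCDF4_CLASSIC)
                   ? "netCDF-4 (HDF5-based)"
                   : "classic or 64-bit offset (CDF)");
        nc_close(ncid);
        return 0;
    }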
>>
>> Please let me know if this doesn't help. You should not see any serious
>> performance problems with netCDF-4 if we are doing everything right...
>>
>> Thanks,
>>
>> Ed
>>
>> --
>> Ed Hartnett  -- address@hidden
>>
>

-- 
Ed Hartnett  -- address@hidden