[netcdfgroup] Fwd: status of thread safety

Forwarding my reply to the list.

Begin forwarded message:

From: Elena Pourmal <epourmal@xxxxxxxxxxxx>
Subject: Re: [netcdfgroup] status of thread safety
Date: July 21, 2016 at 11:35:45 AM EDT
To: Burlen Loring <bloring@xxxxxxx>
Cc: Prabhat <prabhat@xxxxxxx>, Jeffrey Johnson <jnjohnson@xxxxxxx>,
Harinarayan Krishnan <hkrishnan@xxxxxxx>, Quincey Koziol <koziol@xxxxxxx>

Hi Burlen and All,

On Jul 21, 2016, at 10:16 AM, Quincey Koziol <koziol@xxxxxxx> wrote:

Hi Burlen,
Yes, the HDF5 steering committee (CC’ing Elena at the HDF Group) has been 
considering several things that would help your situation with netCDF-4.  
First, the effort to include the HDF5 HL API routines in the global library 
lock has been discussed a lot over the last year and I believe is under 
consideration by a funded project (Elena - Yes?).

Unfortunately, this effort has not been funded, and at this point it is not of
the highest priority for THG. [We need to finish some tasks, including
performance issues, for the HDF5 1.10.* releases.] That said, the thread-safety
(TS) issue for the HL interfaces is very important and we would like to address
it ASAP.

We accept patches. If anyone has extra time and is willing to work with us on 
the implementation, this may help!

Thank you!

Elena

After that, we are also working to get funding through the DOE ECP program to 
enable both asynchronous I/O in HDF5 and multi-threaded concurrent access to 
the HDF5 library.  Those are probably 1-2 years out though (assuming our 
proposal gets funded :-)

Quincey


On Jul 19, 2016, at 11:26 AM, Prabhat <prabhat@xxxxxxx> wrote:

Quincey will know the specifics.

Prabhat

On Jul 19, 2016, at 11:25 AM, Burlen Loring <bloring@xxxxxxx> wrote:

Hi Prabhat & Quincey,

You guys have some connection to HDF5, right? I'm facing an issue with the
NetCDF 4 HDF5 format, namely that NetCDF uses the HDF5 HL API but HDF5 doesn't
make the HL API thread safe. TECA needs the thread safe configuration. More
details are found below in a conversation on the NetCDF user list. Do you guys
know if there's any plan for a thread safe option with the HL API?

Burlen

-------- Forwarded Message --------
Subject:        Re: [netcdfgroup] status of thread safety
Date:   Mon, 18 Jul 2016 16:07:24 -0700
From:   Burlen Loring <bloring@xxxxxxx>
To:     dmh@xxxxxxxx, netcdfgroup@xxxxxxxxxxxxxxxx



Dennis,

Great! Glad to know you are working on it! NetCDF C API is the only one
I care about.

Your scenario 1 is the one I'm using to hide the latency of the Lustre file
system when we need to extract metadata from large datasets. We have some
that contain 10k files. A pool of threads picks away at the list of
NetCDF files, and only a single thread accesses any given file. We hold a
lock while we open and close the file. What happens in between open and
close, the reading and parsing of the metadata, occurs without a lock.
This is the scenario where I need thread safety baked into HDF5, or I
end up intermittently crashing.
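
The pattern described above might be sketched as follows. This is only an
illustration of where the lock sits, not the actual TECA code: open_file,
close_file and read_metadata are hypothetical stand-ins, not NetCDF API calls,
and Python is used purely to keep the sketch short.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One lock shared by all workers, held only around open and close.
open_close_lock = threading.Lock()

def open_file(path):
    # Hypothetical stand-in for opening a file through the I/O library.
    return {"path": path, "open": True}

def close_file(handle):
    # Hypothetical stand-in for closing the file.
    handle["open"] = False

def read_metadata(handle):
    # Hypothetical stand-in for reading and parsing the metadata.
    return {"path": handle["path"], "n_chars": len(handle["path"])}

def extract_metadata(path):
    # Open under the lock.
    with open_close_lock:
        handle = open_file(path)
    try:
        # No lock here: only this thread touches this file.
        metadata = read_metadata(handle)
    finally:
        # Close under the lock.
        with open_close_lock:
            close_file(handle)
    return metadata

def scan_files(paths, n_threads=4):
    # A pool of threads works through the list; each file goes to one thread.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(extract_metadata, paths))
```

The unlocked read in the middle is the part that depends on the underlying
library being thread safe when different threads work on different files.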

To get more info about the complication of configuring HDF5 thread safe
with the HDF5 HL API, try to configure HDF5 with both thread safety and
the HL API. Here is the output from HDF5 1.8.17:

./configure  --enable-threadsafe --enable-hl

checking for thread safe support... configure: error: The thread-safe
library is incompatible with the high-level library. --disable-hl can be
used to prevent building the high-level library (recommended).
Alternatively, --enable-unsupported will allow building the high-level
library, though this configuration is not supported by The HDF Group.

Building in the unsupported mode is what we are currently doing to work
around the issue. I don't know what the plan for the HDF5 HL API and threads
is, but perhaps NetCDF should make use of the "low level" HDF5 API and
avoid the issue altogether?

Your scenario 2 is also of interest. In that scenario I currently
protect all nc calls with a lock. This serializes the I/O but still lets our
calculations run in parallel. The calcs take much longer than the I/O, so
even though the I/O is serialized this still works out well.
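
A minimal sketch of this second pattern, under the same caveats: read_block
and compute are hypothetical stand-ins rather than NetCDF calls, and the
in-memory list stands in for a shared file.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A single lock guards every call into the I/O library.
io_lock = threading.Lock()

def read_block(store, index):
    # Hypothetical stand-in for a read from the shared file.
    return store[index]

def compute(values):
    # Stand-in for the calculation, which runs outside the lock.
    return sum(v * v for v in values)

def process_block(store, index):
    # All I/O is serialized through io_lock.
    with io_lock:
        values = read_block(store, index)
    # The calculation is not locked, so threads can overlap here.
    return compute(values)

def process_all(store, n_threads=4):
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lambda i: process_block(store, i), range(len(store))))
```

In CPython the global interpreter lock limits how much pure-Python computation
actually overlaps, so the sketch shows only the locking structure, not the
speedup a C or C++ application would see.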

Burlen

On 07/18/2016 03:23 PM, dmh@xxxxxxxx wrote:
> >  Unfortunately HDF5's thread safe config is officially mutually
> > exclusive with the HDF5 HL API used by NetCDF.
>
> I was unaware of this. My understanding was that the thread-safe
> HDF5 operated by providing a global lock to serialize all
> accesses.  Assuming I am correct, you seem to be saying that this
> lock is not used at the HL API level.
>
> To date WRT netcdf, there have been two notions of thread-safe:
> 1. Allow multiple threads to operate as long as they are operating
>    on different files.
> 2. Allow multiple threads to operate on the same file.
>
> #1 is doable -- just time consuming to implement.
> In fact I have a netcdf-c branch that should allow this for
> netcdf 3 (classic) files. The approach is to isolate all mutable
> global state used by the library and surround operations on that
> state (both read and write) with a mutex lock. Since none of the
> state accesses are all that long, this should not affect
> performance very much. Note that an implicit assumption is that
> all c-library calls (esp. malloc) are or can be made thread-safe.
>
> This approach might also work for netcdf-4 files
> except that we are limited by what the HDF5 library does.
> If their API is globally serialized, then our locking regime
> will not help.
>
> There is no obvious reason AFAIK why the HDF5 library could not
> be modified to do a similar isolation of global state. Note this
> issue crops up for the pnetcdf library also.
>
> #2 is much harder and would require significant refactoring of
> any library that attempted it. The reason is that access to EVERY
> piece of state (global or not) must be made thread-safe.
>
> Finally, note that this issue is largely independent of parallel IO
> using e.g. MPIO.
>
> I look forward to further discussion of this issue; especially
> any complication I might be overlooking.
>
> =Dennis Heimbigner
>  Unidata
>
>
> On 7/18/2016 12:24 PM, Burlen Loring wrote:
>> Hi All,
>>
>> Just wanted to voice concern about the status of thread safety in NetCDF
>> 4 HDF5. The locking strategy we've successfully used with NetCDF classic
>> is not sufficient for NetCDF 4 with HDF5. In addition to our locking
>> strategy HDF5 needs to be compiled with a thread safe option.
>> Unfortunately HDF5's thread safe config is officially mutually exclusive
>> with the HDF5 HL API used by NetCDF. When HDF5 is forced to compile with
>> thread safety and HDF5 HL API, our threaded code runs without issue. It
>> also performs well, which is important. My concern is the fact that we
>> now rely upon a build configuration that is officially unsupported by
>> HDF5.
>>
>> Given the continual evolution to many core architectures, the horrendous
>> latency on modern parallel file systems on supercomputing platforms,
>> and that we have to deal with datasets structured such that latency is a
>> major issue, threading is ever more critical. It's really important that
>> we have a viable path to thread safety that is officially supported by
>> HDF5 and performant. We don't want to be facing problems down the road
>> due to use of the unsupported HDF5 config. Using the unsupported config
>> creates a deployment issue as we'd like to rely on HDF5 installed at HPC
>> centers or in official Linux distros, neither of which will likely be
>> compiling HDF5 in an unsupported configuration. I also believe that for
>> the best performance locking is better done at the lowest level where it
>> can be fine grained, hence locking all NetCDF I/O in our application is
>> undesirable.
>>
>> I'm hoping that this conversation can be a data point that people are
>> using threads to speed processing of large datasets on parallel file
>> systems. It's important for us to have an officially supported thread
>> safe option for NetCDF 4 HDF5 format.
>>
>> Burlen
>>
>>
>>
>> _______________________________________________
>> NOTE: All exchanges posted to Unidata maintained email lists are
>> recorded in the Unidata inquiry tracking system and made publicly
>> available through the web.  Users who post to any of the lists we
>> maintain are reminded to remove any personal information that they
>> do not want to be made public.
>>
>>
>> netcdfgroup mailing list
>> netcdfgroup@xxxxxxxxxxxxxxxx
>> For list information or to unsubscribe,  visit:
>> http://www.unidata.ucar.edu/mailing_lists/





