Re: [netcdfgroup] Concurrent writes to netcdf3, what goes wrong?

To: Wei-keng Liao <wkliao@xxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [netcdfgroup] Concurrent writes to netcdf3, what goes wrong?
From: Heiko Klein <Heiko.Klein@xxxxxx>
Date: Tue, 22 Sep 2015 16:43:52 +0200

Hi Wei-keng,

thanks for the information. I now have the parallel version working with
NC_INDEPENDENT, and with 2 processors I see some benefit, e.g. a real-time
reduction from 40s to 30s. But when I add more processors, performance
gets worse.

I'm using parallel HDF5 with an uncompressed netcdf4 file. Is
NC_INDEPENDENT faster with netcdf3 through PnetCDF?


What I'm basically doing now is:

for (size_t vi = 0; vi < vars.size(); ++vi) {
   if ((vi % mifi_mpi_size) != mifi_mpi_rank) {
       continue; // this variable is handled by another rank
   }
   // independent (non-collective) parallel access for this variable
   check(nc_var_par_access(ncId, varId[vi], NC_INDEPENDENT));
   // ... read data for the variable from grib
   check(nc_put_vara_float(ncId, varId[vi], start, count, data));
}

Maybe there are better ways to do that?
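
To make the distribution in the loop above concrete, here is a minimal
sketch of the round-robin rule on its own, without any MPI or netCDF
calls. The helper name variablesForRank and its parameters are
hypothetical and only serve the illustration; they are not part of any
library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the indices of the variables that one rank
// would process under the round-robin rule (vi % mpiSize == mpiRank).
std::vector<std::size_t> variablesForRank(std::size_t numVars,
                                          std::size_t mpiSize,
                                          std::size_t mpiRank)
{
    assert(mpiSize > 0 && mpiRank < mpiSize); // precondition of the sketch
    std::vector<std::size_t> mine;
    for (std::size_t vi = 0; vi < numVars; ++vi) {
        if ((vi % mpiSize) != mpiRank) {
            continue; // this variable is handled by another rank
        }
        mine.push_back(vi);
    }
    return mine;
}
```

With 10 variables and 4 ranks, for example, rank 0 would get variables
0, 4 and 8, and rank 3 would get variables 3 and 7. The split is by
variable count only, so the work is evenly shared only if the variables
are of similar size.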


Best regards,

Heiko


On 2015-09-22 09:07, Wei-keng Liao wrote:
> Hi, Heiko
> 
> In that case, you can use independent mode.
> I.e. nc_var_par_access(ncid, varid, NC_INDEPENDENT)
> 
> It still allows you to write to a shared file from multiple
> MPI processes independently, at different times.
> 
> However, the performance will not be as good as the collective mode.
> 
> Wei-keng
> 
> On Sep 22, 2015, at 1:45 AM, Heiko Klein wrote:
> 
>> Hi Wei-keng,
>>
>> thanks for your tip about using pnetcdf. I've worked with MPI, but only
>> for modeling, i.e. when all processes do approximately the same thing at
>> the same time.
>>
>> The problem here is that the 10 input-files don't appear on my machines
>> at the same time. They are ensemble members and downloaded from
>> different machines with different processors, so the first file might
>> appear 30s before the last file (within a total time-step time of 2
>> minutes). I would like to start as soon as the first file appears, but
>> this sounds very difficult with MPI, doesn't it? (I'm more familiar with
>> OpenMP, which offers both task-based parallelization (what I would use
>> here) and loop-based parallelization (which is more like MPI?))
>>
>> Best regards,
>>
>> Heiko
>>
>> On 2015-09-22 03:24, Wei-keng Liao wrote:
>>> Hi, Heiko
>>>
>>> Parallel I/O to the classic netCDF format is supported by netCDF through
>>> PnetCDF underneath.
>>> It allows you to write concurrently to a single shared file from multiple
>>> MPI processes.
>>> Of course, you will have to build PnetCDF first and then build netCDF with
>>> the --enable-pnetcdf configure option.
>>>
>>> Your netCDF program does not need many changes to make use of this feature.
>>> All you have to do is the following:
>>> 1. call nc_create_par() instead of nc_create()
>>> 2. add NC_PNETCDF to the create mode argument of nc_create_par
>>> 3. call nc_var_par_access(ncid, varid, NC_COLLECTIVE) after nc_enddef to 
>>> enable collective I/O mode
>>>
>>> There are a couple of example codes available at this URL:
>>> http://cucis.ece.northwestern.edu/projects/PnetCDF/#InteroperabilityWithNetCDF4
>>>
>>> There are instructions in each example file for building netCDF with 
>>> PnetCDF.
>>> For downloading PnetCDF, please see 
>>> http://cucis.ece.northwestern.edu/projects/PnetCDF/download.html
>>>
>>> Wei-keng
>>>
>>> On Sep 21, 2015, at 9:14 AM, Heiko Klein wrote:
>>>
>>>> Hi Nick,
>>>>
>>>> yes, they are all writing to the same file - we want to have one file at
>>>> the end.
>>>>
>>>> I've been scanning through the source code of netcdf3. I guess the
>>>> problem of the partly written sections is caused by the translation of
>>>> the nc_put_vara calls to internal pages, and then from the internal pages
>>>> to disk. Possibly the internal pages are not aligned with my
>>>> nc_put_vara calls, so even when the regions of the nc_put_vara calls don't
>>>> overlap between concurrent calls, the internal pages do? Is there a way
>>>> to enforce proper alignment? I see nc__enddef has several align parameters.
>>>>
>>>>
>>>> I'm aware that concurrent writes are not officially supported by the
>>>> netcdf-library. But IT-infrastructure has changed a lot since the start
>>>> of the netcdf-library and systems are nowadays highly parallelized, both
>>>> on CPU and also in IO/filesystems. I'm trying to find a way to allow for
>>>> simple parallelization. Having many output-files from a model is risky
>>>> for data-consistency - so I would like to avoid it without sacrificing
>>>> too much speed.
>>>>
>>>> Best regards,
>>>>
>>>> Heiko
>>>>
>>>>
>>>> On 2015-09-21 15:18, Nick Papior wrote:
>>>>> So, are they writing to the same files?
>>>>>
>>>>> I.e. job1 writes a(:,1) to test.nc and job2 writes
>>>>> a(:,2) to test.nc?
>>>>> Because that is not allowed.
>>>>>
>>>>> 2015-09-21 15:13 GMT+02:00 Heiko Klein <Heiko.Klein@xxxxxx>:
>>>>>
>>>>>   Hi,
>>>>>
>>>>>   I'm trying to convert about 90GB of NWP data 4 times daily from grib to
>>>>>   netcdf. The grib-files arrive as fast as the data can be downloaded from
>>>>>   the HPC machines. They come in sets of 10 files per forecast timestep.
>>>>>
>>>>>   Currently, I manage to convert 1 file/forecast timestep and I would like
>>>>>   to parallelize the conversion into independent jobs (i.e. neither MPI nor
>>>>>   OpenMP), with a theoretical speedup by a factor of 10. The underlying
>>>>>   IO system is fast enough to handle 10 jobs, and I have enough CPUs, but
>>>>>   the concurrently written netcdf-files show data which is only half
>>>>>   written to disk, or mixed with other slices.
>>>>>
>>>>>   What I do is create a _FILL_VALUE 'template' file, containing all
>>>>>   definitions, before the NWP job runs. When a new set of files arrives,
>>>>>   the data is put into the respective data-slices, which don't have any
>>>>>   overlap. There is never a redefine, only calls to functions like
>>>>>   nc_put_vara_* with different slices.
>>>>>
>>>>>   Since the nc_put_vara_* calls are non-overlapping, I hoped that this
>>>>>   type of concurrent write would work - but it doesn't. Is my idea of
>>>>>   writing data in parallel really so bad (e.g. are there internal buffers
>>>>>   which are rewritten)? Any ideas how to improve the conversion process?
>>>>>
>>>>>   Best regards,
>>>>>
>>>>>   Heiko
>>>>>
>>>>>   _______________________________________________
>>>>>   netcdfgroup mailing list
>>>>>   netcdfgroup@xxxxxxxxxxxxxxxx
>>>>>   For list information or to unsubscribe,  visit:
>>>>>   http://www.unidata.ucar.edu/mailing_lists/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> Kind regards Nick
>>>>
>>>> -- 
>>>> Dr. Heiko Klein                   Norwegian Meteorological Institute
>>>> Tel. + 47 22 96 32 58             P.O. Box 43 Blindern
>>>> http://www.met.no                 0313 Oslo NORWAY
>>>
>>
> 

-- 
Dr. Heiko Klein                   Norwegian Meteorological Institute
Tel. + 47 22 96 32 58             P.O. Box 43 Blindern
http://www.met.no                 0313 Oslo NORWAY

Follow-Ups:
- Re: [netcdfgroup] Concurrent writes to netcdf3, what goes wrong?
  - From: Wei-keng Liao

References:
- [netcdfgroup] Concurrent writes to netcdf3, what goes wrong?
  - From: Heiko Klein
- Re: [netcdfgroup] Concurrent writes to netcdf3, what goes wrong?
  - From: Nick Papior
- Re: [netcdfgroup] Concurrent writes to netcdf3, what goes wrong?
  - From: Heiko Klein
- Re: [netcdfgroup] Concurrent writes to netcdf3, what goes wrong?
  - From: Wei-keng Liao
- Re: [netcdfgroup] Concurrent writes to netcdf3, what goes wrong?
  - From: Heiko Klein
- Re: [netcdfgroup] Concurrent writes to netcdf3, what goes wrong?
  - From: Wei-keng Liao
