
[netCDF #IFD-703495]: problem with parallel I/O when time coordinate is unlimited



> Dear netcdf support team,
> 
> I wish to report a problem in writing data on a netcdf4 file using
> parallel I/O.
> 
> All processors create a file with
> 
> nc_stat = NF90_CREATE_PAR( filename, OR( NF90_NOCLOBBER, &
> NF90_CLASSIC_MODEL ), comm2d, &
> MPI_INFO_NULL, id_set_mask(mid,av) )
> 
> and then define some coordinates. The time coordinate is unlimited:
> ...
> nc_stat = NF90_DEF_DIM( id_set_mask(mid,av), 'time', NF90_UNLIMITED,
> id_dim_time_mask(mid,av) )
> 
> ...
> 
> The timestep is written by each processor with:
> 
> nc_stat = NF90_PUT_VAR( id_set_mask(mid,av), id_var_time_mask(mid,av), &
> (/ simulated_time /),                          &
> start = (/ domask_time_count(mid,av) /),       &
> count = (/ 1 /) )
> 
> Then the data of a variable (local_pf) is added to the file:
> 
> nc_stat = NF90_PUT_VAR( id_set_mask(mid,av),  &
> id_var_domask(mid,av,if),  &
> local_pf,  &
> start = (/ mask_start_l(mid,1), mask_start_l(mid,2),  &
> mask_start_l(mid,3), domask_time_count(mid,av)
> /),  &
> count = (/ mask_size_l(mid,1), mask_size_l(mid,2),  &
> mask_size_l(mid,3), 1 /),  &
> stride =(/1,1,1,1/) )
> 
> Not every processor has data to write. If not, their local_pf has size 0.
> If processor 0 does not have data to write my output file contains no
> values for the variable 'w' (in this example):
> 
> >ncdump example_25.nc
> netcdf example_25 {
> dimensions:
> time = UNLIMITED ; // (1 currently)
> zu_3d = 1 ;
> zw_3d = 1 ;
> x = 4 ;
> xu = 4 ;
> y = 2 ;
> yv = 2 ;
> variables:
> double time(time) ;
> time:units = "seconds" ;
> double zu_3d(zu_3d) ;
> zu_3d:units = "meters" ;
> double zw_3d(zw_3d) ;
> zw_3d:units = "meters" ;
> double x(x) ;
> x:units = "meters" ;
> double xu(xu) ;
> xu:units = "meters" ;
> double y(y) ;
> y:units = "meters" ;
> double yv(yv) ;
> yv:units = "meters" ;
> float w(time, zw_3d, y, x) ;
> w:long_name = "w" ;
> w:units = "m/s" ;
> 
> // global attributes:
> :Conventions = "COARDS" ;
> :title = "PALM 3.7  Rev: 404  run: example.00  host:
> lcsgih  24-11-09 11:07:53" ;
> :VAR_LIST = ";w;" ;
> data:
> 
> time = 60 ;
> 
> zu_3d = 175 ;
> 
> zw_3d = 210 ;
> 
> x = 525, 1325, 1425, 1625 ;
> 
> xu = 500, 1300, 1400, 1600 ;
> 
> y = 1325, 1425 ;
> 
> yv = 1300, 1400 ;
> 
> w =
> _, _, _, _,
> _, _, _, _ ;
> }
> 

This is expected behavior. Since you have written one timestep of data to one 
of the variables that share the time dimension, you will get one timestep of 
data in every variable that shares that dimension. The data you will get will 
be the fill value for that variable, which is shown in ncdump as the "_" 
character. For more info on fill values see 
http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-f90/Fill-Values.html.

If you turn off fill mode for the variable with nf90_def_var_fill 
(http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-f90/NF90_005fDEF_005fVAR_005fFILL.html#NF90_005fDEF_005fVAR_005fFILL),
you will get random data instead of the fill value.
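
As a rough illustration of turning off fill mode, here is a minimal sketch. It assumes a serial nf90_create rather than NF90_CREATE_PAR to keep it short, and the file name, the single 'x' dimension, and the check helper are made up for the example; they are not taken from the reported code.

```fortran
program nofill_sketch
   use netcdf
   implicit none

   integer :: ncid, id_dim_x, id_dim_time, id_var_w

   ! Assumption: serial creation, only to keep the sketch self-contained.
   call check( nf90_create( 'nofill_sketch.nc', NF90_NETCDF4, ncid ) )

   call check( nf90_def_dim( ncid, 'x', 4, id_dim_x ) )
   call check( nf90_def_dim( ncid, 'time', NF90_UNLIMITED, id_dim_time ) )
   call check( nf90_def_var( ncid, 'w', NF90_FLOAT, &
                             (/ id_dim_x, id_dim_time /), id_var_w ) )

   ! A non-zero no_fill argument turns fill mode off for 'w'.
   ! The last argument (the fill value) is ignored in that case.
   ! This call has to come after nf90_def_var and before nf90_enddef.
   call check( nf90_def_var_fill( ncid, id_var_w, 1, 0.0 ) )

   call check( nf90_enddef( ncid ) )
   call check( nf90_close( ncid ) )

contains

   ! Hypothetical helper: stop with the library's message on any error.
   subroutine check( nc_stat )
      integer, intent(in) :: nc_stat
      if ( nc_stat /= NF90_NOERR ) then
         print *, trim( nf90_strerror( nc_stat ) )
         stop 1
      end if
   end subroutine check

end program nofill_sketch
```

With fill mode off, records of 'w' that are never written are left uninitialized, so this only makes sense if every element is eventually written by some processor.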

If you really don't want this to happen, perhaps you should define a different 
unlimited dimension for the w variable.

> If I don't set the time coordinate to UNLIMITED but to an integer, I get
> values for 'w' even if processor 0 does not have data to write:
> 
> ncdump example_32.nc
> netcdf example_32 {
> dimensions:
> time = 1 ;
> zu_3d = 1 ;
> zw_3d = 1 ;
> x = 4 ;
> xu = 4 ;
> y = 2 ;
> yv = 2 ;
> variables:
> double time(time) ;
> time:units = "seconds" ;
> double zu_3d(zu_3d) ;
> zu_3d:units = "meters" ;
> double zw_3d(zw_3d) ;
> zw_3d:units = "meters" ;
> double x(x) ;
> x:units = "meters" ;
> double xu(xu) ;
> xu:units = "meters" ;
> double y(y) ;
> y:units = "meters" ;
> double yv(yv) ;
> yv:units = "meters" ;
> float w(time, zw_3d, y, x) ;
> w:long_name = "w" ;
> w:units = "m/s" ;
> 
> // global attributes:
> :Conventions = "COARDS" ;
> :title = "PALM 3.7  Rev: 404  run: example.00  host:
> lcsgih  24-11-09 13:55:46" ;
> :VAR_LIST = ";w;" ;
> data:
> 
> time = 60 ;
> 
> zu_3d = 175 ;
> 
> zw_3d = 210 ;
> 
> x = 525, 1325, 1425, 1625 ;
> 
> xu = 500, 1300, 1400, 1600 ;
> 
> y = 1325, 1425 ;
> 
> yv = 1300, 1400 ;
> 
> w =
> 0.01312302, 0.03813415, 0.01370101, 0.000190291,
> -0.02151492, 0.03767766, 0.04295725, -0.05340138 ;
> }
> 

This surprises me, because I would expect to see the fill value here, not these 
random floats. I am going to add a test to confirm this is working.

> 
> It is essential for us to have an unlimited time coordinate and the
> possibility that not every processor has to write.
> 

This is possible, but when one variable is extended along an unlimited 
dimension, all variables that share that unlimited dimension are also extended. 
The new records hold the fill value, or random data if fill mode has been 
turned off for performance reasons.

But you can use multiple unlimited dimensions to achieve what you want.
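
A minimal sketch of the multiple-unlimited-dimension idea follows. It rests on several assumptions: the dimension name 'time_w', the file name, and the check helper are hypothetical, and a serial nf90_create is used for brevity. Note that the file is created without NF90_CLASSIC_MODEL, since the classic model permits only one unlimited dimension per file.

```fortran
program two_unlimited_sketch
   use netcdf
   implicit none

   integer :: ncid, id_dim_x, id_dim_time, id_dim_time_w
   integer :: id_var_time, id_var_w, n_time, n_time_w

   ! NF90_NETCDF4 without NF90_CLASSIC_MODEL, so that more than one
   ! unlimited dimension can be defined (serial create is an assumption).
   call check( nf90_create( 'two_unlimited_sketch.nc', NF90_NETCDF4, ncid ) )

   call check( nf90_def_dim( ncid, 'x', 4, id_dim_x ) )
   call check( nf90_def_dim( ncid, 'time', NF90_UNLIMITED, id_dim_time ) )
   ! Hypothetical second unlimited dimension used only by 'w'.
   call check( nf90_def_dim( ncid, 'time_w', NF90_UNLIMITED, id_dim_time_w ) )

   call check( nf90_def_var( ncid, 'time', NF90_DOUBLE, &
                             (/ id_dim_time /), id_var_time ) )
   call check( nf90_def_var( ncid, 'w', NF90_FLOAT, &
                             (/ id_dim_x, id_dim_time_w /), id_var_w ) )
   call check( nf90_enddef( ncid ) )

   ! Writing one time value extends 'time' but not 'time_w'.
   call check( nf90_put_var( ncid, id_var_time, (/ 60.0d0 /), &
                             start = (/ 1 /), count = (/ 1 /) ) )

   ! Report the current length of each unlimited dimension.
   call check( nf90_inquire_dimension( ncid, id_dim_time, len = n_time ) )
   call check( nf90_inquire_dimension( ncid, id_dim_time_w, len = n_time_w ) )
   print *, 'time length:', n_time, ' time_w length:', n_time_w

   call check( nf90_close( ncid ) )

contains

   ! Hypothetical helper: stop with the library's message on any error.
   subroutine check( nc_stat )
      integer, intent(in) :: nc_stat
      if ( nc_stat /= NF90_NOERR ) then
         print *, trim( nf90_strerror( nc_stat ) )
         stop 1
      end if
   end subroutine check

end program two_unlimited_sketch
```

The trade-off is that 'w' no longer shares the 'time' coordinate, so readers of the file would need some other way (for example a separate coordinate variable on 'time_w') to relate its records to simulated time.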

Thanks,

Ed


Ticket Details
===================
Ticket ID: IFD-703495
Department: Support netCDF
Priority: Urgent
Status: Closed