
[netCDF #UNH-363541]: F90 netcdf 3.6.3 performance question



Tim,

I note you're already using the NO_FILL option, so I don't think this
is related to fill values.

I think it's slow because of the order in which you are writing the
data: just a single int (4 bytes) within each record, then just a
single double (8 bytes) in each record, etc.  Writing a small amount
of data in each record requires reading the block that holds it into
memory, putting the data into the block in memory, then writing the
block back out to disk.
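
To make the cost concrete, here is a rough, self-contained model (no
netCDF calls) that counts how often a block has to be loaded under the
two write orders.  The sizes and the one-block cache are made-up
assumptions for illustration; the real library's buffering is more
elaborate.

```fortran
program block_io_model
  implicit none
  ! All sizes are illustrative assumptions, not values from the ticket.
  integer, parameter :: nvars       = 8      ! variables per record
  integer, parameter :: var_bytes   = 8      ! bytes per variable per record
  integer, parameter :: nrecs       = 10000  ! records to write
  integer, parameter :: block_bytes = 4096   ! size of one I/O block
  integer, parameter :: rec_bytes   = nvars * var_bytes
  integer :: irec, ivar, cached, loads_by_var, loads_by_rec

  ! Order 1: write one variable across all records, then the next one.
  cached = -1
  loads_by_var = 0
  do ivar = 0, nvars - 1
    do irec = 0, nrecs - 1
      call touch(irec*rec_bytes + ivar*var_bytes, cached, loads_by_var)
    end do
  end do

  ! Order 2: write every variable of a record before moving on.
  cached = -1
  loads_by_rec = 0
  do irec = 0, nrecs - 1
    do ivar = 0, nvars - 1
      call touch(irec*rec_bytes + ivar*var_bytes, cached, loads_by_rec)
    end do
  end do

  print '(a,i0)', 'block loads, one variable at a time: ', loads_by_var
  print '(a,i0)', 'block loads, one record at a time:   ', loads_by_rec

contains

  ! Model a cache that holds a single block: touching a byte offset
  ! outside the cached block counts as one load (and one flush of the
  ! previously cached block).
  subroutine touch(offset, cur, nloads)
    integer, intent(in)    :: offset
    integer, intent(inout) :: cur, nloads
    integer :: blk
    blk = offset / block_bytes
    if (blk /= cur) then
      cur = blk
      nloads = nloads + 1
    end if
  end subroutine touch

end program block_io_model
```

With these made-up sizes the model reports 1256 loads for the
variable-at-a-time order and 157 for the record-at-a-time order, i.e.
the gap grows with the number of variables per record.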

I think your I/O would be an order of magnitude faster if instead
you wrote all of each record sequentially, in a loop over the
number of records.  So the code in the WriteNetCDF subroutine would
be something like:

obsindex  = dimlen + 1
icount(1) = 1
do iobs = obsindex, obsindex + ngood - 1
  istart(1) = iobs
  ! Single values are passed as 1-element arrays because the scalar
  ! form of nf90_put_var accepts start but not count.
  call nc_check(nf90_inq_varid(ncid, 'ObsIndex', VarID), &
           'WriteNetCDF', 'inq_varid:ObsIndex '//trim(fname))
  call nc_check(nf90_put_var(ncid, VarID, (/ iobs /), &
             start=istart, count=icount), 'WriteNetCDF', 'put_var:ObsIndex')

  call nc_check(nf90_inq_varid(ncid, 'time', VarID), &
             'WriteNetCDF', 'inq_varid:time '//trim(fname))
  call nc_check(nf90_put_var(ncid, VarID, obs_times(iobs:iobs), &
                start=istart, count=icount), 'WriteNetCDF', 'put_var:time')

  call nc_check(nf90_inq_varid(ncid, 'obs_type', VarID), &
             'WriteNetCDF', 'inq_varid:obs_type '//trim(fname))
  call nc_check(nf90_put_var(ncid, VarID, obs_types(iobs:iobs), &
                start=istart, count=icount), 'WriteNetCDF', 'put_var:obs_type')

  call nc_check(nf90_inq_varid(ncid, 'which_vert', VarID), &
             'WriteNetCDF', 'inq_varid:which_vert '//trim(fname))
  call nc_check(nf90_put_var(ncid, VarID, which_vert(iobs:iobs), &
                start=istart, count=icount), &
             'WriteNetCDF', 'put_var:which_vert')

  dim1length = size(locations,1)
  call nc_check(nf90_inq_varid(ncid, 'location', VarID), &
             'WriteNetCDF', 'inq_varid:location '//trim(fname))
  ! 'location' is 2-D (record dimension assumed last), so start and
  ! count need one entry per dimension
  call nc_check(nf90_put_var(ncid, VarID, locations(:,iobs), &
           start=(/ 1, iobs /), count=(/ dim1length, 1 /)), &
             'WriteNetCDF', 'put_var:location')

! and so on
!  ...


  call nc_check(nf90_inq_varid(ncid, 'qc', VarID), &
             'WriteNetCDF', 'inq_varid:qc '//trim(fname))
  ! 'qc' is 2-D as well (record dimension assumed last)
  call nc_check(nf90_put_var(ncid, VarID, qc_copies(:,iobs), &
           start=(/ 1, iobs /), count=(/ size(qc_copies,1), 1 /)), &
             'WriteNetCDF', 'put_var:qc')
enddo

This way, each record stays cached in an in-memory buffer after it is
read in and while it is being filled, so it only gets read once and
flushed to disk once.  The nf90_inq_varid calls in the loop above
don't take much time (they read in-memory info that was loaded from
disk when you opened the file), but if you want to take them out of
the loop, you can, by using a different VarID for each variable, e.g.:

call nc_check(nf90_inq_varid(ncid, 'ObsIndex', ObsIndexID), &
         'WriteNetCDF', 'inq_varid:ObsIndex '//trim(fname))
call nc_check(nf90_inq_varid(ncid, 'time', timeID), &
    'WriteNetCDF', 'inq_varid:time '//trim(fname))
! ... and so on

and then use these variable IDs in the loop, e.g.:

  ! 1-element arrays, since the scalar form of nf90_put_var has no count
  call nc_check(nf90_put_var(ncid, ObsIndexID, (/ iobs /), &
             start=istart, count=icount), 'WriteNetCDF', 'put_var:ObsIndex')

  call nc_check(nf90_put_var(ncid, timeID, obs_times(iobs:iobs), &
                start=istart, count=icount), 'WriteNetCDF', 'put_var:time')

and so on.

I haven't checked whether any of the above compiles, but if you
organize it this way, it will end up doing a lot less I/O.
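
For reference, the pattern above (look up the variable IDs once, then
write one whole record per loop iteration) might look like this as a
standalone program.  It is only a sketch: the file name, the dimension
name 'ObsNum', the data values, and the check routine are made up for
illustration, it handles just two of the variables, and it needs to be
built against the netCDF Fortran 90 library.

```fortran
program write_by_record
  use netcdf
  implicit none

  ! Hypothetical sizes, names, and data; only the variable names
  ! 'ObsIndex' and 'time' are taken from the discussion above.
  integer, parameter :: ngood = 1000
  character(len=*), parameter :: fname = 'obs_sketch.nc'
  integer :: ncid, recdimid, ObsIndexID, timeID, oldmode
  integer :: i, irec, dimlen
  integer :: istart(1), icount(1)
  double precision :: obs_times(ngood)

  do i = 1, ngood
    obs_times(i) = 0.5d0 * i
  end do

  ! Define a file with one unlimited (record) dimension.
  call check(nf90_create(fname, NF90_CLOBBER, ncid), 'create')
  call check(nf90_set_fill(ncid, NF90_NOFILL, oldmode), 'set_fill')
  call check(nf90_def_dim(ncid, 'ObsNum', NF90_UNLIMITED, recdimid), &
             'def_dim')
  ! Variable IDs are obtained once, outside the write loop.
  call check(nf90_def_var(ncid, 'ObsIndex', NF90_INT, (/ recdimid /), &
             ObsIndexID), 'def_var:ObsIndex')
  call check(nf90_def_var(ncid, 'time', NF90_DOUBLE, (/ recdimid /), &
             timeID), 'def_var:time')
  call check(nf90_enddef(ncid), 'enddef')

  ! Number of records already in the file (0 for a new file).
  call check(nf90_inquire_dimension(ncid, recdimid, len=dimlen), &
             'inquire_dimension')

  ! Write every variable of record irec before moving to the next one.
  ! i indexes the local arrays; irec is the record number in the file.
  icount(1) = 1
  do i = 1, ngood
    irec = dimlen + i
    istart(1) = irec
    call check(nf90_put_var(ncid, ObsIndexID, (/ irec /), &
               start=istart, count=icount), 'put_var:ObsIndex')
    call check(nf90_put_var(ncid, timeID, obs_times(i:i), &
               start=istart, count=icount), 'put_var:time')
  end do

  call check(nf90_close(ncid), 'close')

contains

  ! Minimal stand-in for nc_check: report the netCDF error and stop.
  subroutine check(status, context)
    integer, intent(in) :: status
    character(len=*), intent(in) :: context
    if (status /= nf90_noerr) then
      write(*,*) trim(context)//': '//trim(nf90_strerror(status))
      stop 1
    end if
  end subroutine check

end program write_by_record
```

The sketch keeps a separate local index (i) and file record number
(irec); that split is a choice made here, not something taken from
your code.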

--Russ

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: UNH-363541
Department: Support netCDF
Priority: Normal
Status: Closed