
[netCDF #UNH-363541]: F90 netcdf 3.6.3 performance question



Tim,

> I really don't get what you're saying.
> 
> When I write the data, I already write big contiguous blocks of data ...
> 
> WriteNetcdf() has calls like the following:
> 
> call nc_check(nf90_put_var(ncid, VarId, obs_times(1:ngood), &
> start=istart, count=icount), &
> 'WriteNetCDF', 'put_var:time')
> 
> where 'ngood' is generally about 50,000 ... so I'm writing 50,000 *
> 64bit reals
> in one 'go' ...

But they are *not* contiguous on disk.  There is one value in each record,
so this write generates 50,000 I/O calls to write perhaps 50,000 disk blocks
(maybe 25,000 disk blocks if two records fit in a disk block).
In the netCDF-3 format, each record is contiguous, but values from the
same record variable are not; they are interlaced on the disk with the
values of the other record variables.  Within a record, the values for all
record variables are stored together.  Here is a picture of netCDF classic
format record storage:

  
http://www.unidata.ucar.edu/software/netcdf/workshops/2008/performance/FileFormat.html
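
To make the interleaving concrete, here is a small stand-alone Fortran
sketch of the offset arithmetic.  It makes no netCDF calls.  The 1024-byte
header, the choice of only three variables, and the 4-byte size for
obs_type are assumptions for illustration, not values read from your file:

```fortran
program record_offsets
  implicit none
  ! Hypothetical layout: three record variables with assumed sizes.
  integer, parameter :: nvars = 3
  character(len=8), parameter :: names(nvars) = &
       (/ 'ObsIndex', 'time    ', 'obs_type' /)
  integer, parameter :: vsize(nvars) = (/ 4, 8, 4 /)  ! bytes per record
  integer, parameter :: header = 1024                 ! assumed header size
  integer :: recsize, begin(nvars), v, r

  ! One record holds one slice of every record variable, back to back.
  recsize = sum(vsize)

  ! Offset of each variable's slice within the first record.
  begin(1) = header
  do v = 2, nvars
     begin(v) = begin(v-1) + vsize(v-1)
  end do

  ! Value of variable v in record r (0-based) starts at begin(v) + r*recsize.
  do r = 0, 2
     do v = 1, nvars
        print '(a,i2,2x,a,a,i6)', 'record', r, names(v), ' at byte', &
             begin(v) + r*recsize
     end do
  end do
end program record_offsets
```

With these made-up sizes the record length is 16 bytes, so consecutive
values of 'time' sit 16 bytes apart rather than next to each other.  In
your file the stride is the full record length, which is larger.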
  
> To my knowledge, I _never_ write a single int within each record ...
> The record/unlimited dimension for me is NOT 'time', it is 'ObsIndex',
> which has a dimid of ObsNumDimID

Right, I understood that.  The key to writing lots of record data efficiently
is not to write a little bit of data into a large number of records, as you
are now doing, but instead to write all the data you can into each record,
and to write the records in order.  With the seven record variables you have
defined, you will be doing only 1/7th of the I/O calls that way, because
you will be taking advantage of disk buffering.  It may seem counterintuitive
that writing a small amount of data in each write takes less time, but you
are really only moving the data into in-memory disk buffers, and the buffer
gets flushed to the disk only once for each record, not once for each record
value.
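
A toy model of that buffering argument, as a hedged sketch: it assumes each
record occupies exactly one disk block and that only one block is buffered
at a time, which is much simpler than what the netCDF library and the
operating system actually do.  The function name count_order is made up
for this example.  It counts how often the buffer would be flushed for the
two loop orders, using your seven record variables and 50,000 records:

```fortran
program count_flushes
  implicit none
  integer, parameter :: nvars = 7, nrecs = 50000

  print *, 'variable-at-a-time flushes:', count_order(.true.)
  print *, 'record-at-a-time flushes:  ', count_order(.false.)

contains

  ! Count flushes of a one-block buffer, assuming one record per block.
  ! var_outer = .true.  : loop over variables, then records (current code)
  ! var_outer = .false. : loop over records, then variables (suggested)
  integer function count_order(var_outer) result(nflush)
    logical, intent(in) :: var_outer
    integer :: i, j, rec, current, nouter, ninner

    nflush  = 0
    current = -1                    ! no block buffered yet
    if (var_outer) then
       nouter = nvars; ninner = nrecs
    else
       nouter = nrecs; ninner = nvars
    end if

    do i = 1, nouter
       do j = 1, ninner
          ! Which record (and so which block) this write lands in.
          if (var_outer) then
             rec = j
          else
             rec = i
          end if
          ! Touching a different block forces the buffered one out.
          if (rec /= current) then
             if (current /= -1) nflush = nflush + 1
             current = rec
          end if
       end do
    end do
    if (current /= -1) nflush = nflush + 1   ! flush the last block
  end function count_order
end program count_flushes
```

Under those assumptions the variable-at-a-time order flushes 350,000 times
and the record-at-a-time order 50,000 times, which is where the factor of
seven comes from.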

> If what you were saying was true ... it would still not explain why
> the second time 50,000 items were being written would take 3 seconds
> and the first time takes more than 600 seconds.

I didn't realize you were writing 50,000 records with *each* call of the
WriteNetcdf routine.  You're right, merely reorganizing your writes
to be record-at-a-time won't make the second and subsequent sets of
writes faster than the first time in that case, unless there's some disk
optimization going on that I don't know about.

> So, I really don't get it ...

I still think reorganizing your writes will speed things up significantly, 
but if the first write still takes much longer than the second, I'd be 
interested in seeing instrumentation on a run to see what's happening with
the I/O.

> Tim Hoar, Associate Scientist
> National Center for Atmospheric Research
> address@hidden
> 303 497 1708
> 
> 
> On Apr 27, 2009, at 2:36 PM, Unidata netCDF Support wrote:
> 
> > Tim,
> >
> > I note you're already calling the NO_FILL option, so I don't think
> > this is related to fill values.
> >
> > I think it's slow because of the order in which you are writing the
> > data: just a single int (4 bytes) within each record, then just
> > a single double (8 bytes) in each record, etc.  To write a small
> > amount of data in each record requires reading the block into memory,
> > putting the data in the block in memory, then writing out the block to
> > disk.
> >
> > I think your I/O would be an order of magnitude faster if instead
> > you just wrote all of each record sequentially, in a loop on the
> > number of records.  So the code in the WriteNetCDF subroutine would
> > be something like:
> >
> > obsindex  = dimlen + 1
> > do iobs = obsindex, obsindex + ngood - 1
> >   i = iobs - dimlen      ! index into the in-memory arrays (1..ngood)
> >   istart(1) = iobs       ! record number in the file
> >
> >   ! for a scalar value, nf90_put_var accepts start= but not count=
> >   call nc_check(nf90_inq_varid(ncid, 'ObsIndex', VarID), &
> >          'WriteNetCDF', 'inq_varid:ObsIndex '//trim(fname))
> >   call nc_check(nf90_put_var(ncid, VarId, iobs, start=istart), &
> >          'WriteNetCDF', 'put_var:ObsIndex')
> >
> >   call nc_check(nf90_inq_varid(ncid, 'time', VarID), &
> >          'WriteNetCDF', 'inq_varid:time '//trim(fname))
> >   call nc_check(nf90_put_var(ncid, VarId, obs_times(i), start=istart), &
> >          'WriteNetCDF', 'put_var:time')
> >
> >   call nc_check(nf90_inq_varid(ncid, 'obs_type', VarID), &
> >          'WriteNetCDF', 'inq_varid:obs_type '//trim(fname))
> >   call nc_check(nf90_put_var(ncid, VarId, obs_types(i), start=istart), &
> >          'WriteNetCDF', 'put_var:obs_type')
> >
> >   call nc_check(nf90_inq_varid(ncid, 'which_vert', VarID), &
> >          'WriteNetCDF', 'inq_varid:which_vert '//trim(fname))
> >   call nc_check(nf90_put_var(ncid, VarId, which_vert(i), start=istart), &
> >          'WriteNetCDF', 'put_var:which_vert')
> >
> >   ! 2-D variables: one column per record, record dimension last
> >   dim1length = size(locations,1)
> >   call nc_check(nf90_inq_varid(ncid, 'location', VarID), &
> >          'WriteNetCDF', 'inq_varid:location '//trim(fname))
> >   call nc_check(nf90_put_var(ncid, VarId, locations(:,i), &
> >          start=(/ 1, iobs /), count=(/ dim1length, 1 /)), &
> >          'WriteNetCDF', 'put_var:location')
> >
> > ! and so on
> > !  ...
> >
> >
> >   dim1length = size(qc_copies,1)
> >   call nc_check(nf90_inq_varid(ncid, 'qc', VarID), &
> >          'WriteNetCDF', 'inq_varid:qc '//trim(fname))
> >   call nc_check(nf90_put_var(ncid, VarId, qc_copies(:,i), &
> >          start=(/ 1, iobs /), count=(/ dim1length, 1 /)), &
> >          'WriteNetCDF', 'put_var:qc')
> > enddo
> >
> > This way, each record stays cached in an in-memory buffer after it is
> > read in and while it is being filled, so it only gets read once and
> > flushed to disk once.  The nf90_inq_varid calls in the loop above
> > don't take much time (they are reading in-memory info that was read
> > from disk when you opened the file), but if you want to take them out
> > of the loop, you can, by using a different VarID for each variable,
> > e.g.:
> >
> > call nc_check(nf90_inq_varid(ncid, 'ObsIndex', ObsIndexID), &
> >          'WriteNetCDF', 'inq_varid:ObsIndex '//trim(fname))
> > call nc_check(nf90_inq_varid(ncid, 'time', timeID), &
> >     'WriteNetCDF', 'inq_varid:time '//trim(fname))
> > ! ... and so on
> >
> > and then use these variable IDs in the loop, e.g.:
> >
> >   call nc_check(nf90_put_var(ncid, ObsIndexId, iobs, start=istart), &
> >          'WriteNetCDF', 'put_var:ObsIndex')
> >
> >   ! iobs - dimlen is the index into the in-memory arrays (1..ngood)
> >   call nc_check(nf90_put_var(ncid, timeId, obs_times(iobs - dimlen), &
> >          start=istart), 'WriteNetCDF', 'put_var:time')
> >
> > and so on.
> >
> > I haven't checked whether any of the above compiles, but if you
> > organize it this way, it will end up doing a lot less I/O.
> >
> > --Russ
> >
> > Russ Rew                                         UCAR Unidata Program
> > address@hidden                     http://www.unidata.ucar.edu
> >
> >
> >
> > Ticket Details
> > ===================
> > Ticket ID: UNH-363541
> > Department: Support netCDF
> > Priority: Normal
> > Status: Closed
> >
> 
> 

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu


