
[netCDF #BVD-982935]: Problems with the function nc_open



Hi Marcone,

Sorry to take so long to respond to your question ...
 
> My name is Marcone Magnus. I'm a graduating student of computer science
> at the Federal University of Santa Catarina (http://www.ufsc.br/). I'm doing
> research with Lapesd (http://www.lapesd.inf.ufsc.br/) and the Cyclops Group (
> http://www.cyclops.ufsc.br/) on high-performance computing for a health
> care system that we are developing, and I'm using netCDF to store medical
> data.
> 
> My problem is that I need to open and close my netCDF file several times, and
> as the file grows it becomes slower to open.
> In fact, opening the file has become a critical point in my system. I'm using
> the C interface and netCDF-4 with the hierarchical format. Is there a way to
> make nc_open take a fixed amount of time?

When a netCDF-4 file is opened, the library reads all the metadata into memory
once, so that later references to the metadata can be accessed quickly.  By
"metadata", I mean

  - names and sizes of dimensions
  - names, shapes, and types of variables
  - names and types of attributes
  - names of groups and their subgroups
  - definitions of all user-defined types

This could be slow if you add a large amount of metadata to the file before
closing and re-opening it.  However, if you are just adding more data to the
file (values for variables already defined), it should not slow down the
nc_open calls significantly.  Most of the use cases for netCDF-4 that we have
seen benefit from reading in all of the metadata when the file is first opened,
to speed up access to the data and metadata on subsequent calls while the file
is still open.

The underlying HDF5 library works differently, only reading in metadata as
needed, so it is faster for cases such as a large number of nested groups where
the common case is to only read data from a small subset of those groups before
closing the file.  That makes the open much faster, but makes each read that
has to access metadata slower.

We have considered implementing an optional "fast open" by following the HDF5
model, but so far there has not been enough demand for that feature to make it
a high priority for development.

The only suggestions I have are to

  - consider making more use of data and less use of metadata for representing
    your data structures.  For example, instead of using thousands of separate
    small variables, use a smaller number of variables with indexing, or use
    large multidimensional variables instead of many small variables.

  - similarly, if you have thousands of deeply nested groups, consider a design
    that uses indexing in a few groups instead of relying on recursion in
    deeply nested groups.

  - consider using HDF5 directly instead of netCDF-4, to see if its model of
    lazy evaluation of metadata is better suited for your data representations

  - try to keep the file open while it is used, to amortize the cost of opening
    and reading in all the metadata

--Russ





Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu



Ticket Details
===================
Ticket ID: BVD-982935
Department: Support netCDF
Priority: Critical
Status: Closed