[netCDF #BVD-982935]: Problems with the function nc_open

Hi Marcone,

Sorry to take so long to respond to your question ...
> my name is Marcone Magnus, I'm a graduating  student of computer science
> in  Federal University of Santa Catarina (http://www.ufsc.br/), I'm doing a
> research Lapesd (http://www.lapesd.inf.ufsc.br/) and Cyclops Group (
> http://www.cyclops.ufsc.br/) about high performance computer on a health
> care system that we devoloping and I'm using netcdf to storage medichal
> data.
> my problem is I need to open and close my netcdf file several times. And as
> the file increases size it became slower to open.
> In fact open the file turn to be a critical point in my system. I using c
> interface and netcdf4 with hierarchical format. There is no way to use a
> nc_open function with time fixed ?

When a netCDF-4 file is opened, the library reads all of the metadata into
memory once, so that later references to the metadata can be served quickly.
By "metadata", I mean

  - names and sizes of dimensions
  - names, shapes, and types of variables
  - names and types of attributes
  - names of groups and their subgroups
  - definitions of all user-defined types

This could be slow if you add a large amount of metadata to the file before
closing and re-opening it.  However, if you are just adding more data to the
file (for variables already defined), it should not slow down the nc_open calls
significantly.  Most of the use cases for netCDF-4 that we have seen benefit
from reading in all of the metadata when the file is first opened, to speed up
access to the data and metadata on subsequent calls while the file is still
open.

The underlying HDF5 library works differently, only reading in metadata as
needed, so it is faster for cases such as a large number of nested groups where
the common use is to only read data from a small subset of those groups before
closing the file.  That makes the open much faster, but makes each read that
has to access metadata slower.

We have considered implementing an optional "fast open" by following the HDF5
approach, but so far there has not been enough demand for that feature to make
it a high priority for development.

The only suggestions I have are to

  - consider making more use of data and less use of metadata for representing 
    data structures.  For example, instead of using thousands of separate small
    variables, use a smaller number of variables with indexing, or use large
    multidimensional variables instead of many small variables.

  - similarly, if you have thousands of deeply nested groups, consider a design 
    that uses indexing in a few groups instead of relying on recursion in deeply 
    nested groups

  - consider using HDF5 directly instead of netCDF-4, to see if its model of 
    on-demand evaluation of metadata is better suited for your data
    representations

  - try to keep the file open while it is used, to amortize the cost of opening 
    and reading in all the metadata


Russ Rew                                         UCAR Unidata Program
address@hidden                      http://www.unidata.ucar.edu

Ticket Details
Ticket ID: BVD-982935
Department: Support netCDF
Priority: Critical
Status: Closed