Re: [netcdfgroup] File with large number of variables

Dani <pressec@xxxxxxxxx> writes:

> Hi,
> I have to write and read data to/from a netcdf file that has 750
> variables, all of them using unlimited dimensions (only one per
> variable, some dimensions shared) and 10 fixed dimensions.
>
> I have to use netcdf-4 (because of the multiple unlimited dimensions
> requirement) and the C API.
>
> I'm doing some prototyping on my development machine (Linux, 2 GB
> RAM) and have found several performance issues that I hope someone
> can help me fix/understand:
>
> (1) When I create a file and try to define 1000 variables (all int)
> and a single shared unlimited dimension, the process takes all
> available RAM (swap included) and fails with "Error (data:def closed)
> -- HDF error" after a (long) while.
>
> If I do the same but close and reopen the file every 10 or 100 new
> definitions, it works fine.  I can bypass this by creating the file
> once (with ncgen) and using a copy of it for every new file, but I
> would prefer not to. Why does creating the variables take that much
> memory?

When you create a netCDF-4 variable, HDF5 allocates a chunk cache
buffer for that variable. The default size of this buffer is 1 MB.
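With 750 to 1000 such variables, that works out to roughly 1000 x
1 MB = 1 GB of cache before a single value is written, which goes a
long way toward explaining why a 2 GB machine runs out of memory and
swap.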

I have reproduced your problem, but it can be solved by explicitly
setting the chunk cache size for each variable to a lower value. I
have checked in my tests in libsrc4/tst_vars3.c; here's the part with
the cache setting:

      for (v = 0; v < NUM_VARS; v++)
      {
         sprintf(var_name, "var_%d", v);
         if (nc_def_var(ncid, var_name, NC_INT, 1, &dimid, &varid)) ERR_RET;
         /* Shrink this variable's chunk cache from the 1 MB default
            to zero bytes; 0.75 is the cache preemption policy. */
         if (nc_set_var_chunk_cache(ncid, varid, 0, 0, 0.75)) ERR_RET;
      }

Note the call to nc_set_var_chunk_cache(), right after the call to
nc_def_var().
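
If calling it once per variable is inconvenient, here's a sketch of an
alternative, assuming your netCDF build provides nc_set_chunk_cache(),
which sets the library-wide default cache used by files opened or
created afterwards (FILE_NAME here is just a placeholder):

      /* Shrink the default per-variable chunk cache before creating
         the file; variables defined afterwards start with this small
         cache instead of the 1 MB default. */
      if (nc_set_chunk_cache(0, 0, 0.75)) ERR_RET;
      if (nc_create(FILE_NAME, NC_NETCDF4, &ncid)) ERR_RET;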

When I take the nc_set_var_chunk_cache() call out, I get a serious
slowdown around 4000 variables. (I have more memory available than you
do.)

But when I add the call to nc_set_var_chunk_cache(), setting the chunk
cache size to zero, there is no slowdown, even for 10,000 variables.
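
For reference, here is a minimal, self-contained sketch of the whole
approach as a stand-alone program (the file name and variable count
are arbitrary; error handling just returns the netCDF status code):

   #include <stdio.h>
   #include <netcdf.h>

   #define FILE_NAME "tst_many_vars.nc"
   #define NUM_VARS 10000

   int main(void)
   {
      int ncid, dimid, varid, v, ret;
      char var_name[NC_MAX_NAME + 1];

      /* Create a netCDF-4 (HDF5-based) file. */
      if ((ret = nc_create(FILE_NAME, NC_NETCDF4, &ncid)))
         return ret;

      /* One unlimited dimension shared by all variables. */
      if ((ret = nc_def_dim(ncid, "time", NC_UNLIMITED, &dimid)))
         return ret;

      for (v = 0; v < NUM_VARS; v++)
      {
         sprintf(var_name, "var_%d", v);
         if ((ret = nc_def_var(ncid, var_name, NC_INT, 1, &dimid, &varid)))
            return ret;
         /* Zero this variable's chunk cache right after defining it,
            so no 1 MB buffer is held for it. */
         if ((ret = nc_set_var_chunk_cache(ncid, varid, 0, 0, 0.75)))
            return ret;
      }

      if ((ret = nc_close(ncid)))
         return ret;
      printf("Defined %d variables.\n", NUM_VARS);
      return 0;
   }

The 0.75 preemption value matches the usual default, and with a
zero-size cache it has no practical effect anyway.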

Thanks,

Ed
-- 
Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx


