
Re: 970820: NetCDF file sizes



>To: address@hidden
>From: "Genevieve Fox" <address@hidden>
>Subject: Re: 970820: NetCDF file sizes
>Organization: LANL
>Keywords: 199708202132.PAA07302

Genevieve,

> Here is the ncdump info:
> -------------------------
> 
> > ncdump -h /scratch/dixiebutte/glf/Tdixiebutte01209.nc
> netcdf Tdixiebutte01209 {
> dimensions:
>         testx = 131072 ;
> 
> variables:
>         double Test Data(testx) ;
> 
> ----------------------------------------------------------
> 
> Does that help?

The size of the above file should be 1048668 bytes, as I've just
verified by storing the ncdump output in T.cdl, and then running "ncgen
-o test.nc -b T.cdl" to create a new netCDF file:

 -rw-rw-r--   1 russ     usystem  1048668 Aug 21 10:46 test.nc

This is exactly what you'd expect for storing 131072 doubles, each of
which requires 8 bytes: 131072 * 8 is 1048576 bytes of data, plus 92
bytes of netCDF header overhead describing the dimension and variable.
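
Spelled out:

    131072 doubles * 8 bytes each = 1048576 bytes of data
    dimension + variable header   =      92 bytes
                                    ---------------
    total                         = 1048668 bytes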

> #> I need some insight to a file size issue with NetCDF.  I am writing out
> #> a 1 MB array 100 times - thus 100 MB of data  - or 104857600 bytes of data.
> #> When I do an ls on the file, I have a file of size 209719296 bytes.
> #>
> #> Any idea why this is double the size it should be?
> #>
> #> ----------------------------------------------------------------------------
> #> Here are the netcdf calls I use:
> #>
> #> io_id         = nccreate(io_outfile, NC_CLOBBER);
> #> data_dims[0]  = ncdimdef(io_id, "testx", (long)(block_items * blocks_out));
> #> nc_varid      = ncvardef(io_id, data_name, NC_DOUBLE, 1, data_dims);
> #> ncendef(io_id);
> #>
> #> for (j=0; j < io_blocks_out ; j++)
> #>    status     = ncvarput(io_id, nc_varid, data_start, data_count, array);
> #>
> #> ncclose(io_outfile_id);

I'm not sure why you are creating a file using "io_id" as the netCDF ID,
but closing it using "io_outfile_id" instead.  I'll assume these have
the same value.

I assume io_blocks_out is 100.

I can't tell from the above what the values of block_items or
blocks_out are, or how you are changing the data_start and data_count
arrays between calls.  If block_items is 131072, blocks_out is 100,
data_count[0] is 131072, and you are merely incrementing data_start[0]
by 131072 each time through the loop, then the result will be a netCDF
file of size 104857692 bytes, as I have verified by compiling and
running the appended program (using netCDF version 3.3.1).  I've
changed the variable name from "Test Data" to "Test_Data" (blanks in
variable names are discouraged because CDL doesn't allow them), but
that shouldn't have any effect on the file size.
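
That is, 131072 values/block * 100 blocks * 8 bytes/value is 104857600
bytes of data, plus the same 92 bytes of header overhead, for 104857692
bytes in all.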

Thus, I can't reproduce the problem here; the file size is as expected.
I can only suggest that you check the values of block_items, blocks_out,
io_blocks_out, data_start, and data_count, to see if they are what you
expect them to be.
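
For instance, a couple of diagnostic prints would show whether each
write covers the hyperslab you intend.  This is just a sketch using the
variable names from your excerpt (it assumes <stdio.h> and that your
loop advances data_start one block per write), so adjust as needed:

    /* before ncendef: is the dimension as long as you think? */
    fprintf(stderr, "testx length = %ld\n",
            (long)(block_items * blocks_out));

    for (j = 0; j < io_blocks_out; j++) {
        data_start[0] = (long) j * block_items;  /* advance one block */
        fprintf(stderr, "write %d: start = %ld, count = %ld\n",
                j, data_start[0], data_count[0]);
        status = ncvarput(io_id, nc_varid, data_start, data_count, array);
    }

Since the nominal size of a netCDF file with only fixed-size dimensions
is determined by what's defined before the call to ncendef, a "testx"
length twice what you intend would by itself account for a file twice
as large.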

--Russ

#include <netcdf.h>

int main(void) {
    int block_items   = 131072;     /* values per 1 MB block */
    int blocks_out    = 100;        /* blocks the dimension must hold */
    int io_blocks_out = 100;        /* blocks actually written */
    int io_id         = nccreate("ts.nc", NC_CLOBBER);
    int data_dims[1];
    int nc_varid;
    int j;
    long data_start[] = {0};
    long data_count[] = {131072};   /* one block per ncvarput call */
    static double array[131072];    /* static: 1 MB may overflow the stack */
    int status;

    /* one fixed dimension of 13107200 values, one double variable over it */
    data_dims[0]  = ncdimdef(io_id, "testx", (long)(block_items * blocks_out));
    nc_varid      = ncvardef(io_id, "Test_Data", NC_DOUBLE, 1, data_dims);

    ncendef(io_id);

    /* write 100 contiguous blocks, advancing the start one block each time */
    for (j = 0; j < io_blocks_out; j++) {
        data_start[0] = (long) j * block_items;
        status = ncvarput(io_id, nc_varid, data_start, data_count, array);
    }
    ncclose(io_id);
    return 0;
}