Re: [netcdf-hdf] a question about HDF5 and large file - why so long to write one value?

Hi Ed,

I don't think HDF5 will write only the last value, since you are asking it to
create a dataset of that size. It will write 17179869152 bytes plus overhead to
disk, so depending on your system it may take minutes.
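
As a quick check of that figure, here is a small sketch of the arithmetic in C.
The two macros are copied from Ed's program; the helper raw_data_bytes() is a
made-up name for illustration only, and 8 bytes per double is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the quoted test program. */
#define MAX_CLASSIC_BYTES 2147483644ULL
#define QTR_CLASSIC_MAX (MAX_CLASSIC_BYTES / 4)

/* Hypothetical helper (not part of netCDF or HDF5): bytes of raw data
   held by nvars one-dimensional variables of len elements each. */
static uint64_t raw_data_bytes(uint64_t nvars, uint64_t len, uint64_t elem_size)
{
    assert(elem_size > 0);
    return nvars * len * elem_size;
}
```

Calling raw_data_bytes(4, QTR_CLASSIC_MAX, 8) gives 17179869152, and
subtracting that from the reported file size of 17179883735 leaves the 14583
bytes of overhead Ed mentions.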

Quincey may be able to give you a more technical explanation. I don't know
whether chunking will help you much. However, I think this is a good case for
applying a compression filter, since the data will compress very well and that
should cut the I/O time.
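
To make the chunking question a bit more concrete, here is a rough sketch of
which chunk the value Ed writes would land in. HDF5 applies filters such as
compression per chunk, so a compressed dataset uses chunked layout anyway. The
chunk length of 1048576 elements used in the note below is an arbitrary choice
of mine, the helper names are hypothetical, and the sketch does not model
whether unwritten chunks get allocated or filled (that depends on library
settings not covered in this thread).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the chunk that holds element `index`
   in a 1-D dataset split into chunks of chunk_len elements. */
static uint64_t chunk_of(uint64_t index, uint64_t chunk_len)
{
    assert(chunk_len > 0);
    return index / chunk_len;
}

/* Hypothetical helper: number of chunks needed to cover len elements
   (the last chunk may be only partly used). */
static uint64_t chunk_count(uint64_t len, uint64_t chunk_len)
{
    assert(chunk_len > 0);
    return (len + chunk_len - 1) / chunk_len;
}

/* Hypothetical helper: uncompressed size in bytes of one full chunk. */
static uint64_t chunk_bytes(uint64_t chunk_len, uint64_t elem_size)
{
    return chunk_len * elem_size;
}
```

With len = 536870911 and chunk_len = 1048576, each variable has 512 chunks and
the last element (index 536870910) falls in chunk 511. One full chunk of
doubles is 8388608 bytes, so if only the written chunk were stored, each
variable would need roughly 8 MiB rather than about 4 GiB. In netCDF-4 the
corresponding settings are made with nc_def_var_chunking() and
nc_def_var_deflate() before nc_enddef().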

Kent

Quoting Ed Hartnett <ed@xxxxxxxxxxxxxxxx>:

> Howdy all!
> 
> I am writing a test program which writes large files (well over 2
> GB). I have some questions about HDF5 and very large files. I need to
> check out whether netCDF-4 has been correctly implemented for best
> performance.
> 
> In the program below, I create 4 datasets, of type double. They are
> one-dimensional, with length 2147483644/4. (That is 17179869152 bytes
> of data.)
> 
> Then I write the last value only in each dataset.
> 
> This took a really long time - minutes. Is this expected? What is HDF5
> doing in the background here? Is there something I can do with
> chunking here to improve the speed of this program?
> 
> I am not setting a fill value, so what is being written here? I
> naively expected that HDF5 would not write all the data I am skipping,
> but would find a way to write data only around the value that I am
> actually writing...
> 
> The file that this program creates is 17179883735 bytes, which is
> 14583 bytes of HDF5 overhead. Is that about what is expected?
> 
> Any comments welcome...
> 
> Thanks,
> 
> Ed
> 
> /*
>  Copyright 2007, UCAR/Unidata
>  See COPYRIGHT file for copying and redistribution conditions.
> 
>  This program (quickly, but not thoroughly) tests the large file
>  features of netCDF-4.
> 
>  $Id: tst_large.c,v 1.3 2007/08/18 12:26:38 ed Exp $
> */
> #include <config.h>
> #include <nc_tests.h>
> #include <netcdf.h>
> #include <stdio.h>
> #include <string.h>
> 
> /* This is the magic number for classic format limits: 2 GiB - 4
>    bytes. */
> #define MAX_CLASSIC_BYTES 2147483644
> 
> /* This is the magic number for 64-bit offset format limits: 4 GiB - 4
>    bytes. */
> #define MAX_64OFFSET_BYTES 4294967292
> 
> /* Handy for constructing tests. */
> #define QTR_CLASSIC_MAX (MAX_CLASSIC_BYTES/4)
> 
> /* We will create this file. */
> #define FILE_NAME "tst_large.nc"
> 
> int
> main(int argc, char **argv)
> {
> 
>     printf("\n*** Testing really large files in netCDF-4/HDF5 format, quickly.\n");
> 
>     printf("\n*** Testing create of simple, but large, file...");
>     {
> #define DIM_NAME "Time_in_nanoseconds"
> #define NUMDIMS 1
> #define NUMVARS 4
> 
>        int ncid, dimids[NUMDIMS], varid[NUMVARS];
>        char var_name[NUMVARS][NC_MAX_NAME + 1] = {"England", "Scotland",
>                                                   "Ireland", "Wales"};
>        size_t index[2] = {QTR_CLASSIC_MAX-1, 0};
>        int ndims, nvars, natts, unlimdimid;
>        nc_type xtype;
>        char name_in[NC_MAX_NAME + 1];
>        size_t len;
>        double pi = 3.1459, pi_in;
>        int i; 
> 
>        /* Create a netCDF-4/HDF5 format file, with 4 vars. */
>        if (nc_create(FILE_NAME, NC_NETCDF4, &ncid)) ERR;
>        if (nc_set_fill(ncid, NC_NOFILL, NULL)) ERR;
>        if (nc_def_dim(ncid, DIM_NAME, QTR_CLASSIC_MAX, dimids)) ERR;
>        for (i = 0; i < NUMVARS; i++)
>        {
>         if (nc_def_var(ncid, var_name[i], NC_DOUBLE, NUMDIMS, 
>                        dimids, &varid[i])) ERR;
>        }
>        if (nc_enddef(ncid)) ERR;
>        for (i = 0; i < NUMVARS; i++)
>         if (nc_put_var1_double(ncid, varid[i], index, &pi)) ERR;
>        if (nc_close(ncid)) ERR;
>        
>        /* Reopen and check the file. */
>        if (nc_open(FILE_NAME, 0, &ncid)) ERR;
>        if (nc_inq(ncid, &ndims, &nvars, &natts, &unlimdimid)) ERR;
>        if (ndims != NUMDIMS || nvars != NUMVARS || natts != 0 ||
>            unlimdimid != -1) ERR;
>        if (nc_inq_dimids(ncid, &ndims, dimids, 1)) ERR;
>        if (ndims != 1 || dimids[0] != 0) ERR;
>        if (nc_inq_dim(ncid, 0, name_in, &len)) ERR;
>        if (strcmp(name_in, DIM_NAME) || len != QTR_CLASSIC_MAX) ERR;
>        for (i = 0; i < NUMVARS; i++)
>        {
>         if (nc_inq_var(ncid, i, name_in, &xtype, &ndims, dimids, &natts)) ERR;
>         if (strcmp(name_in, var_name[i]) || xtype != NC_DOUBLE || ndims != 1 ||
>             dimids[0] != 0 || natts != 0) ERR;
>         if (nc_get_var1_double(ncid, i, index, &pi_in)) ERR;
>         if (pi_in != pi) ERR;
>        }
>        if (nc_close(ncid)) ERR;
>     }
> 
>     SUMMARIZE_ERR;
>     FINAL_RESULTS;
> }
> 
> 
> -- 
> Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx