Re: [netcdfgroup] Unexpectedly large netCDF4 files from python

On Tue, Apr 5, 2016 at 12:13 PM, Ted Mansell <ted.mansell@xxxxxxxx> wrote:

> You might check the ChunkSizes attribute with 'ncdump -hs'. The newer
> netcdf sets larger default chunks than it used to. I had this issue with
> 1-d variables that used an unlimited dimension. Even if the dimension only
> had a small number of entries, the default chunk size made the file much bigger.


I had the same issue -- a 1-d variable got a chunksize of 1, which was
really bad for performance!
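
(These days I set the chunk size explicitly when I create the variable, and
check it with Variable.chunking() -- a minimal, untested sketch along these
lines, with a made-up file name:)

from netCDF4 import Dataset
import numpy as np

# ask for an explicit chunk size on a 1-d unlimited variable,
# rather than taking whatever default the library picks
nc = Dataset('chunk_demo.nc', 'w')
nc.createDimension('time', None)                   # unlimited dimension
t = nc.createVariable('time', 'f8', ('time',), chunksizes=(1024,))
print(t.chunking())                                # expect [1024]
t[0:10] = np.arange(10.0)                          # grows the unlimited dim to 10
nc.close()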

But that doesn't seem to be the issue here -- I ran the same code, got the
same results, and here is the dump:

netcdf text3 {
types:
  ubyte(*) variable_data_t ;
dimensions:
    timestamp_dim = UNLIMITED ; // (1 currently)
    data_dim = UNLIMITED ; // (1 currently)
    item_len = 100 ;
variables:
    double timestamp(timestamp_dim) ;
        timestamp:_Storage = "chunked" ;
        timestamp:_ChunkSizes = 524288 ;
    variable_data_t data(data_dim) ;
        data:_Storage = "chunked" ;
        data:_ChunkSizes = 4194304 ;
        data:_NoFill = "true" ;

// global attributes:
        :_Format = "netCDF-4" ;
}

If I read that right, those are nice big chunks.
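
(If I'm doing the arithmetic right, those chunks would roughly account for the
file size -- a back-of-the-envelope guess that assumes HDF5 keeps a ~16-byte
length+pointer descriptor per vlen element inside the chunk, which I have not
verified:)

# rough guess, not verified: vlen elements stored as 16-byte descriptors in the chunk
timestamp_chunk = 524288 * 8      # doubles          -> 4 MiB
data_chunk = 4194304 * 16         # vlen descriptors -> 64 MiB
print((timestamp_chunk + data_chunk) / 2.0**20)   # ~68 MiB, in the ballpark of the reported ~73 MB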

Note that if I don't use a VLType variable, I still get a 4 MB file --
though that could just be netCDF-4 overhead:

netcdf text3 {
types:
  ubyte(*) variable_data_t ;
dimensions:
    timestamp_dim = UNLIMITED ; // (1 currently)
    data_dim = UNLIMITED ; // (1 currently)
    item_len = 100 ;
variables:
    double timestamp(timestamp_dim) ;
        timestamp:_Storage = "chunked" ;
        timestamp:_ChunkSizes = 524288 ;
    ubyte data(data_dim, item_len) ;
        data:_Storage = "chunked" ;
        data:_ChunkSizes = 1, 100 ;

// global attributes:
        :_Format = "netCDF-4" ;
}

Something is up with the VLen type...
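
(If nothing else, a possible workaround might be to pass explicit chunksizes
when creating the vlen variable -- an untested sketch, assuming the chunksizes
keyword is honored for VLType variables:)

from netCDF4 import Dataset

# untested: request small explicit chunks instead of the 4194304 default
# (file name is just for illustration)
f = Dataset('vlen_chunks.nc', 'w')
f.createDimension('data_dim', None)
data_t = f.createVLType('u1', 'variable_data_t')
data = f.createVariable('data', data_t, 'data_dim', chunksizes=(1024,))
print(data.chunking())            # hoping for [1024] rather than 4194304
f.close()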

-CHB

> (Assuming the variable is not compressed.)
>
> -- Ted
>
> __________________________________________________________
> | Edward Mansell <ted.mansell@xxxxxxxx>
> | National Severe Storms Laboratory
> |--------------------------------------------------------------
> | "The contents of this message are mine personally and
> | do not reflect any position of the U.S. Government or NOAA."
> |--------------------------------------------------------------
>
> On Apr 5, 2016, at 1:44 PM, Val Schmidt <vschmidt@xxxxxxxxxxxx> wrote:
>
> > Hello netcdf folks,
> >
> > I’m testing some python code for writing sets of timestamps and variable
> length binary blobs to a netcdf file and the resulting file size is
> perplexing to me.
> >
> > The following segment of python code creates a file with just two
> variables, “timestamp” and “data”, populates the first entry of the
> timestamp variable with a float and the corresponding first entry of the
> data variable with an array of 100 unsigned 8-bit integers. The total
> amount of data is 108 bytes.
> >
> > But the resulting file is over 73 MB in size. Does anyone know why this
> might be so large and what I might be doing to cause it?
> >
> > Thanks,
> >
> > Val
> >
> >
> > from netCDF4 import Dataset
> > import numpy
> > import time
> >
> > f = Dataset('scratch/text3.nc','w')
> >
> > dim = f.createDimension('timestamp_dim',None)
> > data_dim = f.createDimension('data_dim',None)
> >
> > data_t = f.createVLType('u1','variable_data_t')
> >
> > timestamp = f.createVariable('timestamp','d','timestamp_dim')
> > data = f.createVariable('data',data_t,'data_dim')
> >
> > timestamp[0] = time.time()
> > data[0] = numpy.ones(100, dtype='u1')
> >
> > f.close()
> >
> > ------------------------------------------------------
> > Val Schmidt
> > CCOM/JHC
> > University of New Hampshire
> > Chase Ocean Engineering Lab
> > 24 Colovos Road
> > Durham, NH 03824
> > e: vschmidt [AT] ccom.unh.edu
> > m: 614.286.3726
> >
> >
>
> _______________________________________________
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe,  visit:
> http://www.unidata.ucar.edu/mailing_lists/
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@xxxxxxxx