
[netCDF #AIQ-275071]: [netcdf-hdf] Unexpected overall file size jump



James,

> I am advised that you should be able to get the following via
> anonymous ftp:
> 
> 
> ftp://ftp.exa.com/outgoing/netcdf/Fluid_Meas.fnc
> ftp://ftp.exa.com/outgoing/netcdf/Fluid_Meas.fnc-nccopy-k3
> ftp://ftp.exa.com/outgoing/netcdf/Fluid_Meas.snc
> ftp://ftp.exa.com/outgoing/netcdf/Fluid_Meas.snc-nccopy-k3

Thanks, I see what you mean!  We'll have to investigate why the
netCDF-4 copies of these netCDF classic format files are so much
larger than expected (e.g. a 42 MB classic file becomes a 96 MB
netCDF-4 file, even though ncdump shows little metadata).  I don't
currently have an explanation, but it could be a bug.
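
As a rough check on why fixed metadata overhead does not seem to explain
this, here is a minimal Python sketch. It uses only the byte counts from
the test.nc* listings quoted further down and the rounded 42 MB / 96 MB
figures above. The function and variable names are illustrative
assumptions, not part of any netCDF tool. In both uncompressed test cases
the netCDF-4 file costs the same 4984 extra bytes, which is nowhere near
the roughly 54 MB of growth seen here.

```python
# Byte sizes copied from the test.nc* listings quoted later in this thread.
# Each entry is (classic size, netCDF-4 size), both uncompressed.
UNCOMPRESSED = {
    "dimension size 2": (88, 5072),
    "dimension size 10000": (40080, 45064),
}

# Rounded sizes mentioned above for one of the Fluid_Meas files, in MB.
FLUID_CLASSIC_MB, FLUID_NETCDF4_MB = 42, 96


def fixed_overhead(classic_bytes, netcdf4_bytes):
    """Extra bytes the netCDF-4 file needs beyond the classic file."""
    return netcdf4_bytes - classic_bytes


def overhead_report(cases):
    """Return {label: (extra_bytes, size_ratio)} for each case."""
    return {
        label: (fixed_overhead(classic, nc4), nc4 / classic)
        for label, (classic, nc4) in cases.items()
    }


if __name__ == "__main__":
    for label, (extra, ratio) in overhead_report(UNCOMPRESSED).items():
        print(f"{label}: +{extra} bytes, ratio {ratio:.2f}")
    extra_mb = FLUID_NETCDF4_MB - FLUID_CLASSIC_MB
    ratio_mb = FLUID_NETCDF4_MB / FLUID_CLASSIC_MB
    print(f"Fluid_Meas: +{extra_mb} MB, ratio {ratio_mb:.2f}")
```

If the overhead were fixed, the size ratio should approach 1 as the data
grows, as it does in the test listings. A ratio above 2 for a 42 MB file
points to something other than metadata.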

--Russ

> >> Thanks for the reply.  If the difference were metadata, wouldn't we
> >> expect to see the greatest difference between the netcdf-3 format
> >> and HDF with smaller data files?  In fact, we're finding the
> >> opposite.
> >
> > Yes, if you only have a moderate amount of metadata, HDF5 files
> > would be much larger than classic files with a small amount of data,
> > but similar in size with a large amount of data.
> >
> > If, however, you had lots of metadata (for example 5000 variables and
> > 5000 dimensions), then the HDF5 files might appear significantly larger
> > even with lots of data.
> >
> >> We would like to share some larger data files with you guys in
> >> order to better understand the situation.  Would you be willing to
> >> pick some data up from our ftp site?
> >
> > Yes, that would be useful.
> >
> > --Russ
> >
> >> > Hi James,
> >> >
> >> >> We recently began working on a transition from netcdf 3.6.2 to 4.1.1.
> >> >>
> >> >> The process was trouble free and things seem to be working, but we
> >> >> have been surprised to find the HDF variant producing extremely large
> >> >> files relative to the old netcdf native form.  Our measurement files
> >> >> are already enormous, and further growth would be deadly.
> >> >>
> >> >> Has anyone else encountered this?
> >> >
> >> > There is a larger fixed-size overhead for metadata (names and
> >> > properties of variables, dimensions, and attributes) in the HDF5-based
> >> > netCDF-4 format, but in our experience, it's not significant for files
> >> > with lots of data and only a moderate amount of metadata.  And use of
> >> > compression can make equivalent netCDF-4 files significantly smaller
> >> > than netCDF-3 classic format files.
> >> >
> >> > As an example we use in our netCDF training workshop, a small netCDF
> >> > classic format file with only one dimension of size 2 and one variable
> >> > that uses that dimension is very small using netCDF classic or 64-bit
> >> > offset formats:
> >> >
> >> >     88  test.nc1   # classic format
> >> >     92  test.nc2   # 64-bit offset format
> >> >   5072  test.nc3   # netCDF-4 format
> >> >   5108  test.nc4   # netCDF-4 classic model format
> >> >
> >> > However, if you change the dimension size to 10000, the sizes are much
> >> > closer:
> >> >
> >> >  40080  test.nc1   # classic format
> >> >  40084  test.nc2   # 64-bit offset format
> >> >  45064  test.nc3   # netCDF-4 format
> >> >  45101  test.nc4   # netCDF-4 classic model format
> >> >
> >> > And if you apply level-1 compression to the variable in the netCDF-4
> >> > format, the netCDF-4 file is significantly smaller for this
> >> > (artificial) data:
> >> >
> >> >  40080  test.nc1   # classic format
> >> >  40084  test.nc2   # 64-bit offset format
> >> >  21055  test.nc3   # netCDF-4 format
> >> >  21092  test.nc4   # netCDF-4 classic model format
> >> >
> >> > Finally, if you apply the shuffle filter along with compression for
> >> > this test file, the result is significantly better compression:
> >> >
> >> >  40080  test.nc1   # classic format
> >> >  40084  test.nc2   # 64-bit offset format
> >> >   7777  test.nc3   # netCDF-4 format
> >> >   7814  test.nc4   # netCDF-4 classic model format
> >> >
> >> > It's easy to run little experiments like this with the "nccopy"
> >> > utility in the latest netCDF snapshot release (soon to be in version
> >> > 4.1.2), as you can specify conversions and compression on the command
> >> > line:
> >> >
> >> >   
> >> > http://www.unidata.ucar.edu/netcdf/workshops/2010/utilities/NccopyExamples.html
> >> >
> >> > This is a very artificial example and it's unlikely you'll get
> >> > results as good with your real data, but experimenting with nccopy's
> >> > compression options on some real data could determine what you can
> >> > expect in using netCDF 4 for your data.
> >> >
> >> > --Russ
> >> >
> >> > Russ Rew                                         UCAR Unidata Program
> >> > address@hidden                      http://www.unidata.ucar.edu
> >> >
> >> >
> >> >
> >> > Ticket Details
> >> > ===================
> >> > Ticket ID: AIQ-275071
> >> > Department: Support netCDF
> >> > Priority: Normal
> >> > Status: Closed
> >> >



NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.