
RE: FW: Netcdf4-Java reading problem



P.S. I haven't found a definitive answer on the dimension length limit for 
netCDF-C, but I think this says it is just under 2^32:

http://www.unidata.ucar.edu/software/netcdf/faq-lfs.html#Large%20File%20Support12
>>> If you get the netCDF library error "Invalid dimension size", you are 
>>> exceeding the size limit of netCDF dimensions, which must be less than 
>>> 2,147,483,644 for classic files with no large file support and otherwise 
>>> less than 4,294,967,292. 

That would likely hold us for the life of the mission we're currently working on.

The following made me hope it was 2^63:

http://www.unidata.ucar.edu/software/netcdf/docs/group__dimensions.html
>>> Dimension lengths in the C interface are type size_t rather than type int 
>>> to make it possible to access all the data in a netCDF dataset on a 
>>> platform that only supports a 16-bit int data type, for example MSDOS.

The fact that the Python wrapper seems to work with >2^31 rows makes me 
confident the C library is OK with a 3G dimension length.

Kim
________________________________________
From: Kim Kokkonen
Sent: Wednesday, August 20, 2014 2:43 PM
To: John Caron; Edward Hartnett
Cc: support-netcdf-java; Benjamin Busby
Subject: RE: FW: Netcdf4-Java reading problem

Hi John. The particular compound variable causing the trouble has about 3e9 
rows and growing. We need two variables of that size to represent 
time-dimensioned int16 data collected at 10 samples/sec over an 11+ year 
mission. The variable seems fine when we build the netCDF4/HDF5 file using a 
Python writer. Reading is also fine via Python, but Java can't handle it.

The current file is 45GB (uncompressed), including one of these giant variables 
plus some smaller ones. We have more to add, but we're trying the bigger pieces 
first.

Do you have ideas for workarounds for Java access to the file? We've thought 
about using a vlen field within the compound type to hold 10 or more of the 
rows, reducing the overall variable size to 300M instead of 3G, but that leads 
to other issues we've encountered with vlens. Another option would be to slice 
the data into year-sized netCDF files and open the right one based on the time 
we need.
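
For the year-slicing idea, the rough sketch we have in mind is something like 
the following (the file naming, the variable chosen, and the record offsets are 
placeholders, and it assumes each yearly file keeps its record dimension under 
2^31-1 so netCDF-Java can handle it):

import java.io.IOException;
import ucar.ma2.Array;
import ucar.ma2.InvalidRangeException;
import ucar.nc2.NetcdfFile;
import ucar.nc2.Variable;

public class YearSlicedReader {

    // Hypothetical naming convention for the per-year files.
    private static String fileForYear(int year) {
        return "/data/AllTimData-" + year + ".nc";
    }

    // Open the file covering the requested year and read a slice of its
    // record dimension; with yearly files the counts fit in an int.
    public static Array readScienceDataA(int year, int firstRecord, int numRecords)
            throws IOException, InvalidRangeException {
        NetcdfFile ncfile = NetcdfFile.open(fileForYear(year));
        try {
            Variable v = ncfile.findVariable("TimScienceDataA");
            if (v == null)
                throw new IOException("TimScienceDataA not found in " + fileForYear(year));
            return v.read(new int[] {firstRecord}, new int[] {numRecords});
        } finally {
            ncfile.close();
        }
    }
}

The obvious downside is the extra bookkeeping for reads that straddle a year 
boundary.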

We have a large codebase written in Java that needs to access the data 
(currently stored in a database), and we can't afford to reimplement it all. I 
guess another option is to use JNI or JNA with the underlying C library for the 
netCDF access. Have you done that before?
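
In case we go that route, the kind of JNA binding we'd sketch is below. The 
function names are the standard netCDF-C API, but the size_t-to-long mapping 
assumes a 64-bit LP64-style platform, and the path and dimension name are just 
placeholders:

import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.ptr.IntByReference;
import com.sun.jna.ptr.LongByReference;

public class NetcdfCDimLength {

    // Minimal JNA mapping of the netCDF-C calls needed to ask for a
    // dimension length; size_t* is mapped as a pointer to a 64-bit long,
    // which is only valid on LP64-style platforms.
    public interface NetcdfLibrary extends Library {
        NetcdfLibrary INSTANCE =
                (NetcdfLibrary) Native.loadLibrary("netcdf", NetcdfLibrary.class);

        int NC_NOWRITE = 0;  // read-only open mode from netcdf.h

        int nc_open(String path, int mode, IntByReference ncidp);
        int nc_inq_dimid(int ncid, String name, IntByReference dimidp);
        int nc_inq_dimlen(int ncid, int dimid, LongByReference lenp);
        int nc_close(int ncid);
        String nc_strerror(int errcode);
    }

    private static void check(int status) {
        if (status != 0)
            throw new RuntimeException(NetcdfLibrary.INSTANCE.nc_strerror(status));
    }

    // Report the full 64-bit length of the big record dimension via the C library.
    public static void main(String[] args) {
        NetcdfLibrary nc = NetcdfLibrary.INSTANCE;
        IntByReference ncid = new IntByReference();
        check(nc.nc_open("/data/AllTimData.nc", NetcdfLibrary.NC_NOWRITE, ncid));

        IntByReference dimid = new IntByReference();
        check(nc.nc_inq_dimid(ncid.getValue(), "TSDA-dataVtcw", dimid));

        LongByReference len = new LongByReference();
        check(nc.nc_inq_dimlen(ncid.getValue(), dimid.getValue(), len));
        System.out.println("TSDA-dataVtcw length = " + len.getValue());

        check(nc.nc_close(ncid.getValue()));
    }
}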

Any plans to extend dimension lengths in netCDF-Java? 2^31 isn't all that big 
these days.

Thanks,
Kim
________________________________________
From: John Caron [address@hidden]
Sent: Wednesday, August 20, 2014 1:03 PM
To: Edward Hartnett
Cc: address@hidden; Kim Kokkonen; support-netcdf-java
Subject: Re: FW: Netcdf4-Java reading problem

Yes, netcdf-java will fail if any dimension length is > 2^31 - 1, since we use 
signed ints for dimension lengths. Can the netCDF-C library handle this?

That's quite a massive file you have there. How big is it?



On Mon, Aug 18, 2014 at 3:53 PM, Edward Hartnett 
<address@hidden> wrote:
Howdy John!

I hope all is going well at Unidata! ;-)

Can netCDF-Java handle netCDF-4 files with unlimited dimensions > MAX_INT?

Thanks!
Ed
________________________________________
From: Benjamin Busby
Sent: Monday, August 18, 2014 2:36 PM
To: Edward Hartnett
Cc: Kim Kokkonen
Subject: Netcdf4-Java reading problem

Hi Ed,

I've come across a problem reading a netCDF file with Java: a dimension whose 
length is greater than the maximum int value causes the open to fail. Here's 
the header for the current netCDF file that I'm working with:

netcdf AllTimData {
types:
  compound AllTimScienceSamples {
    int64 packetVtcw ;
    int64 sampleVtcw ;
    byte adcLatchup3 ;
    byte adcLatchup2 ;
    byte adcLatchup1 ;
    byte coneActiveStateD ;
    byte coneActiveStateC ;
    byte coneActiveStateB ;
    byte coneActiveStateA ;
    byte relayCone ;
    byte shutterPositionD ;
    byte shutterPositionC ;
    byte shutterPositionB ;
    byte shutterPositionA ;
    byte shutterOvercurrentD ;
    byte shutterOvercurrentC ;
    byte shutterOvercurrentB ;
    byte shutterOvercurrentA ;
    byte dspGainMode ;
    byte dspEclipseMode ;
    byte sequenceCounter ;
    int heaterControl ;
    byte loopDataLengthA ;
    byte loopDataLengthB ;
    int64 MUTimeStamp ;
  }; // AllTimScienceSamples
  compound AllTimScienceDataA {
    int64 sampleVtcw ;
    int64 dataVtcw ;
    int loopData ;
  }; // AllTimScienceDataA
dimensions:
        TSS-sampleVtcw = UNLIMITED ; // (328872712 currently)
        TSS-index = UNLIMITED ; // (3289 currently)
        TSDA-dataVtcw = UNLIMITED ; // (3282309813 currently)
        TSDA-index = UNLIMITED ; // (32824 currently)
variables:
        int64 TSS-sampleVtcw(TSS-sampleVtcw) ;
        AllTimScienceSamples TimScienceSamples(TSS-sampleVtcw) ;
        int64 TSS-index(TSS-index) ;
        int64 TSDA-dataVtcw(TSDA-dataVtcw) ;
        AllTimScienceDataA TimScienceDataA(TSDA-dataVtcw) ;
        int64 TSDA-index(TSDA-index) ;
}

Opening and reading this file works fine in Python, but even attempting to open 
the file in Java produces this error:

java.io.IOException: java.lang.IllegalArgumentException: Unlimited Dimension 
length =-1012657483 must >= 0
        at ucar.nc2.NetcdfFile.open(NetcdfFile.java:430)
        at ucar.nc2.NetcdfFile.open(NetcdfFile.java:397)
        at ucar.nc2.NetcdfFile.open(NetcdfFile.java:384)
        at ucar.nc2.NetcdfFile.open(NetcdfFile.java:372)
        at IndexedNC4Reader.main(IndexedNC4Reader.java:62)
Caused by: java.lang.IllegalArgumentException: Unlimited Dimension length 
=-1012657483 must >= 0
        at ucar.nc2.Dimension.setLength(Dimension.java:433)
        at ucar.nc2.Dimension.<init>(Dimension.java:363)
        at ucar.nc2.iosp.hdf5.H5header.addDimension(H5header.java:801)
        at ucar.nc2.iosp.hdf5.H5header.findDimensionScales(H5header.java:613)
        at ucar.nc2.iosp.hdf5.H5header.makeNetcdfGroup(H5header.java:416)
        at ucar.nc2.iosp.hdf5.H5header.read(H5header.java:216)
        at ucar.nc2.iosp.hdf5.H5iosp.open(H5iosp.java:130)
        at ucar.nc2.NetcdfFile.<init>(NetcdfFile.java:1528)
        at ucar.nc2.NetcdfFile.open(NetcdfFile.java:820)
        at ucar.nc2.NetcdfFile.open(NetcdfFile.java:427)
        ... 4 more

I traced this back to ucar.nc2.Dimension.setLength, which takes an int argument 
and so can't represent the 3,282,309,813 length of the dimension used by 
TimScienceDataA. Do you know of a workaround for this problem?
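
For what it's worth, the negative length in the stack trace appears to be just 
that dimension size wrapping around a signed 32-bit int; a tiny check of my own 
(not from the library) reproduces the number:

public class DimLengthOverflow {
    public static void main(String[] args) {
        long dimLength = 3282309813L;   // TSDA-dataVtcw length from the CDL header
        int narrowed = (int) dimLength; // what a signed 32-bit int ends up holding
        System.out.println(narrowed);   // prints -1012657483, the value in the exception
    }
}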

Also, possibly related: examining the variables independently with "ncdump -v" 
produces blank output for variables/dimensions with a size greater than 2^31-1. 
Is this expected?

Thank you,
Ben