
Re: large netCDF data sets



Hi Dave,

> I am using a beta version of netcdf on the NCAR IBM
> machines that is supposed to allow data sets larger 
> than 2 GB.  I know that the library is partially
> working since I can indeed generate large files:
> 
> 2599136 -rw-r--r--   1 gill     ncar     2661494168 Aug 02 09:44 wrfinput_d01
> 
> Previously, with the original version of the netCDF
> libraries, the file would essentially stop being built
> just short of 2 GB.
> 
> Now the problem is that around where the 2 GB mark of 
> data would be, the subsequent data is not readable.
> The header looks like: 
> 
> netcdf wrfinput_d01 {
> dimensions:
>         Time = UNLIMITED ; // (1 currently)
>         DateStrLen = 19 ;
>         west_east = 1599 ;
>         south_north = 1199 ;
>         west_east_stag = 1600 ;
>         bottom_top = 34 ;
>         south_north_stag = 1200 ;
>         bottom_top_stag = 35 ;
>         ext_scalar = 1 ;
>         DIM0009 = 5 ;
>         land_cat_stag = 24 ;
>         soil_cat_stag = 16 ;
>         soil_layers_stag = 5 ;
> variables:
>         char Times(Time, DateStrLen) ;
>         float LU_INDEX(Time, south_north, west_east) ;
>                 LU_INDEX:FieldType = 104 ;
>                 LU_INDEX:MemoryOrder = "XY " ;
>                 LU_INDEX:description = "LAND USE CATEGORY" ;
>                 LU_INDEX:units = "" ;
>                 LU_INDEX:stagger = "" ;
>         float U(Time, bottom_top, south_north, west_east_stag) ;
>                 U:FieldType = 104 ;
>                 U:MemoryOrder = "XYZ" ;
>                 U:description = "x-wind component" ;
>                 U:units = "m s-1" ;
>                 U:stagger = "X" ;
>         float V(Time, bottom_top, south_north_stag, west_east) ;
>                 V:FieldType = 104 ;
>                 V:MemoryOrder = "XYZ" ;
>                 V:description = "y-wind component" ;
>                 V:units = "m s-1" ;
>                 V:stagger = "Y" ;
> 
> blah, blah, blah
> 
> 
>         float TMN(Time, south_north, west_east) ;
>                 TMN:FieldType = 104 ;
>                 TMN:MemoryOrder = "XY " ;
>                 TMN:description = "SOIL TEMPERATURE AT LOWER BOUNDARY" ;
>                 TMN:units = "K" ;
>                 TMN:stagger = "" ;
>         float XLAND(Time, south_north, west_east) ;
>                 XLAND:FieldType = 104 ;
>                 XLAND:MemoryOrder = "XY " ;
>                 XLAND:description = "LAND MASK (1 FOR LAND, 2 FOR WATER)" ;
>                 XLAND:units = "" ;
>                 XLAND:stagger = "" ;
>         float SNOWC(Time, south_north, west_east) ;
>                 SNOWC:FieldType = 104 ;
>                 SNOWC:MemoryOrder = "XY " ;
>                 SNOWC:description = "FLAG INDICATING SNOW COVERAGE (1 FOR SNOW COVER)" ;
>                 SNOWC:units = "" ;
>                 SNOWC:stagger = "" ;
> 
> // global attributes:
> 
> 
> blah, blah, blah
> 
> All of the global attributes are fine and the 
> first several arrays are readable:
> 
>  V =
>   0.9275115, 0.9107732, 0.8933921, 0.8753608, 0.8566817, 0.8373512, 
>     0.8173745, 0.7967464, 0.7754745, 0.7535835, 0.7310421, 0.7078766, 
>     0.6840948, 0.6596546, 0.6345721, 0.6088864, 0.582568, 0.5556307, 
>     0.528056, 0.4998606, 0.4710823, 0.4416702, 0.4127204, 0.3856255, 
>     0.3594684, 0.3329445, 0.3060701, 0.2788765, 0.2513194, 0.2234103, 
>     0.1951479, 0.166538, 0.1375814, 0.1082985, 0.07867109, 0.04868151, 
>     0.01834685, -0.0123029, -0.04327539, -0.07462986, -0.1063238, -0.1383224, 
>     -0.1706348, -0.20329, -0.2363084, -0.269637, -0.3033047, -0.3373152, 
>     -0.371634, -0.4062615, -0.4412068, -0.4764935, -0.5120971, -0.5480246, 
>  
> But once you get towards the end of the data, you get:
> 
> /contrib/bin/ncdump -v TMN /ptmp/gill/SPEC/WRFV2/test/em_real/Extra_Large/wrfinput_d01
> 
>  TMN =
> 
> 
> - - - - - -
> 
> So, based on our header, have we violated any of the 
> assumptions required for a large data set using netCDF?
> Any aid or suggestions are welcome.

I'm *very* interested in this problem report, but I'm currently out of
town on vacation and don't have much time to work on it.

Your header doesn't violate any of the constraints on the structure of a
large dataset under 3.6.0, so it should work OK.  This may indicate a bug
in the large-file-support (LFS) library code that our test cases haven't
uncovered.  But it may also indicate a bug in ncdump, if that's all you
are using to check whether the values are stored correctly.
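
As a point of reference, large files under 3.6.0 need to be created with
the 64-bit offset variant of the format.  Here's a minimal sketch in C,
assuming your build's netcdf.h defines the NC_64BIT_OFFSET flag (the file
name is just illustrative):

    #include <netcdf.h>
    #include <stdio.h>

    int main(void)
    {
        int ncid, status;

        /* Request the 64-bit offset (large file) variant of the format;
           the plain classic format limits offsets to just under 2 GB. */
        status = nc_create("big.nc", NC_CLOBBER | NC_64BIT_OFFSET, &ncid);
        if (status != NC_NOERR) {
            fprintf(stderr, "nc_create: %s\n", nc_strerror(status));
            return 1;
        }

        /* ... define dimensions and variables, write data as usual ... */

        nc_close(ncid);
        return 0;
    }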

One thing you could do to help isolate the problem is to use just the
netCDF library interface to access the bad values rather than depending
on ncdump.  If you get the same bad values through the library interface
that ncdump displays, that would indicate the problem is in the library.
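
If it's easier, here is a rough sketch in C of reading the TMN variable
directly through the library; the dimension sizes are just taken from the
header you sent, so adjust as needed:

    #include <netcdf.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int ncid, varid, status;
        /* From the header: TMN(Time = 1, south_north = 1199, west_east = 1599) */
        size_t start[3] = {0, 0, 0};
        size_t count[3] = {1, 1199, 1599};
        float *tmn = malloc(1199 * 1599 * sizeof(float));

        status = nc_open("wrfinput_d01", NC_NOWRITE, &ncid);
        if (status != NC_NOERR) { fprintf(stderr, "nc_open: %s\n", nc_strerror(status)); return 1; }

        status = nc_inq_varid(ncid, "TMN", &varid);
        if (status != NC_NOERR) { fprintf(stderr, "nc_inq_varid: %s\n", nc_strerror(status)); return 1; }

        /* Read the whole TMN array and print a few values to compare
           against what ncdump shows. */
        status = nc_get_vara_float(ncid, varid, start, count, tmn);
        if (status != NC_NOERR) { fprintf(stderr, "nc_get_vara_float: %s\n", nc_strerror(status)); return 1; }

        printf("TMN[0..3] = %g %g %g %g\n", tmn[0], tmn[1], tmn[2], tmn[3]);

        nc_close(ncid);
        free(tmn);
        return 0;
    }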

Also, if you could provide us with a complete CDL file so we could
duplicate the problem, that would help.  The CDL file need not
include any values for variables, or maybe just the coordinate values
you get with ncdump -c.
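
For example, something like this should capture the header, or the header
plus coordinate values, without dumping the full data:

    ncdump -h wrfinput_d01 > wrfinput_d01.cdl
    ncdump -c wrfinput_d01 > wrfinput_d01_coords.cdl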

Thanks for reporting the problem.

--Russ