Re: [thredds] Thredds WMS support for large source grids

Hi Ben,

I've now implemented this in the ncWMS trunk.  (Actually I took the same idea 
as your implementation but did it in another way, which hopefully saves some 
memory.)  You might be able to build a new ncWMS.jar file or wait for the next 
codebase sync.

I haven't implemented Xiangtan's fix to force the SCANLINE data-reading 
strategy as sometimes this isn't the right thing to do (although it certainly 
is when datasets get very large).  I think this needs to be made configurable, 
or perhaps the switch to SCANLINE could happen when grids get above a certain size.
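
As a rough illustration of the size-based switch mentioned above, here is a minimal Java sketch. The Strategy enum, the choose() method and the threshold parameter are all hypothetical names made up for this example; they are not the ncWMS DataReadingStrategy API, and the threshold value would presumably come from configuration.

```java
// Sketch only: picks a data-reading strategy from the source grid size.
// Strategy, choose() and the threshold are hypothetical, not ncWMS code.
final class StrategyChooser {

    /** Stand-in enum; only SCANLINE is named in this thread. */
    enum Strategy { SCANLINE, DEFAULT }

    /**
     * Returns SCANLINE once the grid has more points than the configured
     * threshold, otherwise leaves the default strategy in place.
     */
    static Strategy choose(long iSize, long jSize, long maxPointsForDefault) {
        long points = iSize * jSize; // 64-bit product, safe for the grids discussed here
        return points > maxPointsForDefault ? Strategy.SCANLINE : Strategy.DEFAULT;
    }

    public static void main(String[] args) {
        // Integer.MAX_VALUE is an arbitrary example threshold, not a recommendation.
        long threshold = Integer.MAX_VALUE;
        System.out.println(choose(1000, 1000, threshold));
        System.out.println(choose(92255, 301081, threshold));
    }
}
```

With a threshold like this, small datasets keep their current behaviour and only very large grids are forced onto SCANLINE.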

I also haven't gotten around to implementing your unit test, but thanks very 
much indeed for this, I'll try to do that soon.

Many thanks for your contribution!  And you definitely hold the current record 
for "largest file served through ncWMS" (at least to my knowledge).  I'm 
pleased the performance holds up OK.

Cheers, Jon

-----Original Message-----
From: Ben Caradoc-Davies [mailto:Ben.Caradoc-Davies@xxxxxxxx] 
Sent: 05 September 2011 10:13
To: Jon Blower
Cc: thredds mailing list
Subject: Thredds WMS support for large source grids


I tested WMS in thredds 4.2.6 with large NetCDF source grids and encountered an 
integer overflow in ncwms PixelMap. (You foretold this in the comments!) The 
attached patch fixes this defect at the cost of a small increase in memory use.

You might remember writing (in PixelMap):

// Calculate a single integer representing this grid point in the source grid 
// TODO: watch out for overflows (would only happen with a very large grid!) 
int sourceGridIndex = j * this.sourceGridISize + i;

The integer overflow appears when the source grid has more than 2**31-1 points. 
For example, this limit is exceeded with a 26 GB NetCDF file with a single 
ubyte variable on a 92255x301081 grid.
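
To make the overflow concrete, here is a small standalone Java sketch (not part of the patch). It computes the flat index of the last point of the 92255x301081 grid in both 32-bit and 64-bit arithmetic. Treating 301081 as the i-axis size is an assumption for illustration, and the class and method names are made up.

```java
// Sketch: why a flat int index breaks for very large source grids.
// Which axis is "i" is an assumption; names here are illustrative only.
final class GridIndexOverflow {

    /** Flat index in 32-bit arithmetic, as in the quoted PixelMap line. */
    static int intIndex(int i, int j, int sourceGridISize) {
        return j * sourceGridISize + i; // wraps silently past Integer.MAX_VALUE
    }

    /** Same index in 64-bit arithmetic; the cast must come before the multiply. */
    static long longIndex(int i, int j, int sourceGridISize) {
        return (long) j * sourceGridISize + i;
    }

    public static void main(String[] args) {
        int iSize = 301081; // assumed i-axis size
        int jSize = 92255;  // assumed j-axis size
        int i = iSize - 1;
        int j = jSize - 1;
        System.out.println("grid points: " + (long) iSize * jSize);
        System.out.println("int index:   " + intIndex(i, j, iSize));
        System.out.println("long index:  " + longIndex(i, j, iSize));
    }
}
```

Note that the wrapped int value is not necessarily negative, so a simple sign check on the index would not reliably detect the problem.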

The attached patch includes Xiangtan Lin's CdmUtils fix to force 
DataReadingStrategy.SCANLINE for HDF5:

The PixelMap change replaces the single array, in which the source and target 
grid offsets were integers packed into a single long, with two long arrays, one 
for source and one for target. This costs extra memory but may, in addition to 
supporting large grids, improve performance by avoiding packing and unpacking.
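
The two storage layouts can be sketched roughly as follows. This is an illustration only, not the patch itself: the class, method and field names are hypothetical, and the bit layout of the packed form (source in the high 32 bits, target in the low 32 bits) is an assumption.

```java
import java.util.Arrays;

// Sketch of the two layouts discussed above; all names are hypothetical.
final class PixelMapStorageSketch {

    // Packed layout: two 32-bit offsets share one long, so each offset
    // is limited to the int range. The bit order is an assumption.
    static long pack(int sourceIndex, int targetIndex) {
        return ((long) sourceIndex << 32) | (targetIndex & 0xFFFFFFFFL);
    }

    static int unpackSource(long packed) {
        return (int) (packed >>> 32);
    }

    static int unpackTarget(long packed) {
        return (int) packed;
    }

    // Parallel layout: one long per offset, twice the memory per entry,
    // but no packing step and no 32-bit limit on either offset.
    static final class ParallelEntries {
        private long[] source = new long[16];
        private long[] target = new long[16];
        private int size = 0;

        void add(long sourceIndex, long targetIndex) {
            if (size == source.length) {
                // Grow both arrays together so they stay aligned.
                source = Arrays.copyOf(source, size * 2);
                target = Arrays.copyOf(target, size * 2);
            }
            source[size] = sourceIndex;
            target[size] = targetIndex;
            size++;
        }

        long sourceAt(int n) { return source[n]; }
        long targetAt(int n) { return target[n]; }
        int size() { return size; }
    }
}
```

In the packed form a source offset such as 27776227654 simply cannot be represented, whereas the parallel form stores it directly.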

It also includes:
- a minor CdmUtils static initialiser change to appease ecj (the Eclipse compiler)
- access changes in HorizontalCoordSys to support unit testing
- a fix for axis sizes needed when LatLonCoordSys is explicitly instantiated in 
the unit test (otherwise they can never be set)
- a unit test in which only the small() test method passes before the patch is 
applied (to ensure existing behaviour is preserved for small grids); all test 
methods ensure the expected source grid offset monotonicity
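
A monotonicity check of the kind described in the last item might look like the sketch below. It is not the attached unit test: the helper name is made up, and reading "monotonicity" as "non-decreasing source offsets" is an assumption.

```java
// Sketch: a helper a test could use to check that source grid offsets
// never decrease. Hypothetical; not taken from the attached unit test.
final class MonotonicityCheck {

    /** Returns true if every offset is >= the one before it. */
    static boolean isNonDecreasing(long[] offsets) {
        for (int n = 1; n < offsets.length; n++) {
            if (offsets[n] < offsets[n - 1]) {
                return false; // a backwards jump, e.g. after an index wrapped
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] good = {0L, 10L, 10L, 27776227654L};
        long[] wrapped = {0L, 2147483647L, -2147483648L};
        System.out.println(isNonDecreasing(good));
        System.out.println(isNonDecreasing(wrapped));
    }
}
```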

The patch is against the ncwms-src.jar distributed with thredds 4.2.6 (I'm 
guessing the ncwms tds4.2-20101102 branch).

With this patch applied and the replacement ncwms.jar installed in WEB-INF/lib, 
thredds 4.2.6 can serve a test 647 GB NetCDF4/HDF5 file via WMS.

The test file has a single ubyte variable on a 461276x1505407 grid.

Performance is better than I expected; the aligned source and target grids plus 
the nearest-point mapping from target to source seem to do the trick.

Kind regards,

Ben Caradoc-Davies <Ben.Caradoc-Davies@xxxxxxxx>
Software Engineering Team Leader
CSIRO Earth Science and Resource Engineering
Australian Resources Research Centre