Re: [netcdf-java] OutOfMemoryError when opening an ensemble forecast dataset (many netcdf resources) via HTTP

  • To: John Caron <jcaron1129@xxxxxxxxx>
  • Subject: Re: [netcdf-java] OutOfMemoryError when opening an ensemble forecast dataset (many netcdf resources) via HTTP
  • From: Jesse Bickel - NOAA Affiliate <jesse.bickel@xxxxxxxx>
  • Date: Wed, 23 Oct 2019 17:23:25 -0500
Hi John,

Yes, it is technically feasible to open a portion of a forecast,
collect its data, and close it, iterating one by one over hundreds of
timesteps. However, one forecast spans many netCDF resources, and many
forecasts are included in the same netCDF resources. I agree that
try-with-resources and Closeable/AutoCloseable are preferable in
general. The abstraction I am building implements Closeable and
manages these several hundred netCDF resources. All it does is read
one of those forecasts when asked to, and it could do so in serial or
in parallel. The same abstraction works for both HTTP resources and
filesystem resources, thanks to the good abstraction in the Java CDM
library.
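
Concretely, the shape of that abstraction is something like the
following minimal sketch. The class and method names are hypothetical,
not the actual code; the relevant point is that NetcdfFile.open
accepts both filesystem paths and http URLs, which is what lets one
abstraction cover both cases:

    import java.io.Closeable;
    import java.io.IOException;
    import java.util.List;

    import ucar.ma2.Array;
    import ucar.nc2.NetcdfFile;
    import ucar.nc2.Variable;

    // Hypothetical sketch of the wrapper described above, not the
    // actual implementation.
    public class ForecastSource implements Closeable {
        private final List<String> locations; // hundreds of paths or URLs

        public ForecastSource(List<String> locations) {
            this.locations = locations;
        }

        // Open one resource, read one variable, close the resource.
        public Array readForecast(String location, String varName)
                throws IOException {
            try (NetcdfFile ncf = NetcdfFile.open(location)) {
                Variable v = ncf.findVariable(varName);
                if (v == null) {
                    throw new IOException("no such variable: " + varName);
                }
                return v.read();
            }
        }

        @Override
        public void close() {
            // nothing is held open across calls in this sketch
        }
    }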

The question to me is this: why does the OutOfMemoryError occur only
with HTTP resources and not with filesystem resources? Why doesn't the
filesystem version allocate a ten-million-byte buffer on opening the
file, while the HTTP version does? I have observed that reducing the
buffer size removes the OutOfMemoryError but involves some tradeoffs,
so I speculate that some past tuning for a particular use case led to
the ten-million-byte buffer. How or why is neither here nor there; I
was just curious. One can observe this by passing a 13 MiB local
netCDF file to the same program: no OutOfMemoryError. Pass the same
13 MiB netCDF resource via HTTP, however, and one gets an
OutOfMemoryError.
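
For what it is worth, the comparison is easy to reproduce with a
small program; both locations below are hypothetical stand-ins for
the 13 MiB resource:

    import java.io.IOException;
    import ucar.nc2.NetcdfFile;

    public class OpenComparison {
        public static void main(String[] args) throws IOException {
            // Local open: no large buffer is allocated.
            try (NetcdfFile local = NetcdfFile.open("/data/forecast.nc")) {
                System.out.println("local: "
                        + local.getVariables().size() + " variables");
            }
            // HTTP open of the same resource: HTTPRandomAccessFile was
            // observed to allocate a ten-million-byte buffer, which,
            // across hundreds of open resources, exhausts the heap.
            try (NetcdfFile remote =
                    NetcdfFile.open("https://example.invalid/forecast.nc")) {
                System.out.println("remote: "
                        + remote.getVariables().size() + " variables");
            }
        }
    }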

As for the Aggregation capability, from glancing over the source it
looks like it might automatically read all the bytes of at least
several netCDF resources and then write them to temporary storage on
the filesystem. If true, that is contrary to the goal: I am trying to
avoid transferring multiple GiB over a network when the software is
only interested in KiB or MiB. HTTPRandomAccessFile seems to do the
right thing by using HTTP Range requests. The only strange part is the
allocation of so much memory just to open a netCDF resource. If I am
wrong about the Aggregation abstraction (i.e. it does not
auto-download to the filesystem), then I would be interested in trying
it instead.
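
To spell out what "the right thing" means here (this is a sketch of
the HTTP mechanism, not the library's internals; the URL is
hypothetical): a Range header asks the server for a byte window
instead of the whole file, and a server that honors it replies 206
Partial Content with only those bytes.

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class RangeDemo {
        public static void main(String[] args) throws IOException {
            URL url = new URL("https://example.invalid/forecast.nc");
            HttpURLConnection conn =
                    (HttpURLConnection) url.openConnection();
            // Request only the first 8 KiB of the resource.
            conn.setRequestProperty("Range", "bytes=0-8191");
            try (InputStream in = conn.getInputStream()) {
                byte[] chunk = in.readAllBytes();
                System.out.println(conn.getResponseCode()
                        + " -> " + chunk.length + " bytes");
            }
        }
    }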

In any case, it appears the issue is resolved by the merge of
https://github.com/Unidata/netcdf-java/pull/138, which makes the
buffer-size property settable by whatever launches the JVM.
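
For anyone finding this thread later: the property key below is a
placeholder, not the real name (see the pull request above for the
actual key); it just illustrates that a system property set before
any open, or passed with -D at JVM launch, is how the buffer size can
now be controlled:

    public class LaunchConfig {
        public static void main(String[] args) {
            // Placeholder key, NOT the real property name; see PR #138.
            // Equivalent at launch time:
            //   java -Dplaceholder.http.buffer.size=262144 ...
            System.setProperty("placeholder.http.buffer.size", "262144");
        }
    }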

Thank you so much for taking the time to reply and suggest
alternatives. Thanks for the netCDF Java library!

Jesse

--
Contractor, ERT, Inc.
Federal Affiliation: NWC/OWP/NOAA/DOC

