Hi all,

I am trying to understand caching as it applies to both the file handles and the actual data. The application I am working on will serve data from 133 NetCDF files ranging in size from 50 MB to 400 MB. These are weather forecast files containing about 22 variables that we are interested in, and each variable has between 1 and roughly 55 time steps along its time dimension. This is a Spring web application running in an embedded Tomcat instance. All of the files on disk amount to about 22 GB of data.

When I receive a request I:

1. Re-project the lat/lon point to the dataset's projection (Lambert Conformal)
2. Look up the grid indices of the point from the coordinate variables
3. Loop through every variable
4. Perform Array a = var.read()
5. Loop through every time step and retrieve the value at the specified point
6. Return it all in a JSON document

(Rough sketches of steps 1 and 2, and of the caching ideas in questions 4 and 5 below, are in the P.S. after my signature.)

This application needs to be extremely fast: we will be serving thousands of requests per second (in production, on a scaled system), depending on weather conditions. I have been told that hardware is not an obstacle and that I can use as much memory as I need. During coding and debugging I have been able to achieve an average response time of about 200-400 ms (not including any network time). Adding timers to every part of the application shows that most of the time is spent in the Variable.read() call.

Here is a summary of the configuration of the app:

    NetcdfDataset.initNetcdfFileCache(100, 200, 0);

    NetcdfDataset nc = NetcdfDataset.acquireDataset(filename, null);
    for (each coverage) {
        Variable v = nc.findVariable(name);
        Array d = v.read();
        for (each time step) {
            value = d.getDouble(d.getIndex().set(time, y, x));
        }
    }
    nc.close();

I have several questions:

1. I noticed that when NetcdfDataset.close() is called it detects that I am using caching and performs a release. That causes AbstractIOServiceProvider.release() to be called, which closes and nulls out the RandomAccessFile. The next time NetcdfDataset.acquireDataset() is called, FileCache.acquireCacheOnly() returns null because the cached NetcdfDataset's RandomAccessFile is null, which makes lastModified = 0. Am I missing something, or is there no way to reuse the NetcdfDataset after you call close()?

2. What does NetcdfDataset.acquireDataset() actually cache? Is it just the metadata, or does it also read in the data for all of the variables?

3. Can I avoid doing a Variable.read() for every request? Shouldn't this data be cached inside the NetCDF file object?

4. I see that there are caching functions on the Variable object. Should I be using those caching options and storing the Variable objects in a cache of my own instead?

5. Would it be a better option to use NetcdfFile.openInMemory()?

I know this is a bit long-winded, but I want to make sure I explore all of my options. I have spent a lot of time stepping through the ucar library and have already learned a lot; I just need a little guidance on some of the more abstract caching functionality.

Thanks for your help.

--
Kevin Off
Internet Dissemination Group, Kansas City
Shared Infrastructure Services Branch
National Weather Service
Software Engineer / Ace Info Solutions, Inc. <http://www.aceinfosolutions.com>
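P.S. To make parts of this concrete, a few sketches. First, steps 1 and 2: I let the dataset's grid coordinate system do the Lambert Conformal re-projection and index lookup in one call. This is simplified from my real code, and the grid name here is just an example:

    import ucar.nc2.dt.GridCoordSystem;
    import ucar.nc2.dt.GridDatatype;
    import ucar.nc2.dt.grid.GridDataset;

    // Steps 1 and 2: turn a lat/lon into grid (x, y) indices; the
    // GridCoordSystem handles the Lambert Conformal projection math.
    GridDataset gds = GridDataset.open(filename);
    GridDatatype grid = gds.findGridDatatype("Temperature_height_above_ground"); // example name
    GridCoordSystem gcs = grid.getCoordinateSystem();
    int[] xy = gcs.findXYindexFromLatLon(lat, lon, null); // xy[0] = x index, xy[1] = y index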
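For question 4, what I have in mind is keeping the fully-read Array per (file, variable) in a map of my own, so that Variable.read() runs once per file and variable rather than once per request. The ArrayCache class below is my own invention, not part of the library:

    import java.io.IOException;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import ucar.ma2.Array;
    import ucar.nc2.Variable;
    import ucar.nc2.dataset.NetcdfDataset;

    public class ArrayCache {
        private final Map<String, Array> cache = new ConcurrentHashMap<String, Array>();

        /** Read the whole variable once, then serve later requests from memory. */
        public Array get(NetcdfDataset ds, String filename, String varName) throws IOException {
            String key = filename + "#" + varName;
            Array data = cache.get(key);
            if (data == null) {
                Variable v = ds.findVariable(varName);
                data = v.read();       // the expensive full read, done once
                cache.put(key, data);  // benign race: at worst two threads both read
            }
            return data;
        }
    }

The per-request point lookup would then hit only the cached Array:

    Array d = arrayCache.get(nc, filename, name);
    double value = d.getDouble(d.getIndex().set(time, y, x));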
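And for question 5, the NetcdfFile.openInMemory() variant would look roughly like this. Since all of the files together are only about 22 GB and memory is not a constraint, each file would be pulled into a byte array up front and every later read would avoid disk I/O entirely (I have not measured this yet):

    import ucar.ma2.Array;
    import ucar.nc2.NetcdfFile;
    import ucar.nc2.Variable;

    NetcdfFile ncfile = NetcdfFile.openInMemory(filename); // loads the whole file into memory
    Variable v = ncfile.findVariable(name);
    Array d = v.read(); // served from the in-memory copy, no disk access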