Re: [netcdf-java] NetCDF File and Variable Data Caching

  • To: Kevin Off - NOAA Affiliate <kevin.off@xxxxxxxx>
  • Subject: Re: [netcdf-java] NetCDF File and Variable Data Caching
  • From: Christian Ward-Garrison <cwardgar@xxxxxxxx>
  • Date: Wed, 15 Jun 2016 16:31:03 -0600
Hi Kevin,

Sorry for the delay in responding–I was busy with the release of 4.6.6–but
I have some time to work on this issue now. A couple questions:

1. What does your webapp do? It sounds like it takes a user-defined subset
of the data in a NetCDF file and returns it in JSON format. How similar is
it to our NetCDF Subset Service (example
<http://thredds.ucar.edu/thredds/ncss/grib/NCEP/NAM/Alaska_11km/Best/dataset.html>
)?
2. What version of NetCDF-Java are you using? I suspect that much of the
slowness you're encountering was already fixed
<https://github.com/cwardgar/thredds/commit/075e9a819ee10714d53b355481a7cccac88b1fb9#diff-99981060deed76f1a9ddedc4362acd7fL155>
in v4.6.5.

Cheers,
Christian

On Wed, Jun 8, 2016 at 4:17 PM, Kevin Off - NOAA Affiliate <
kevin.off@xxxxxxxx> wrote:

> Hi all,
>
> I am trying to understand caching when it comes to the file and the actual
> data. The application that I am working on will provide data from 133
> NetCDF files that range in size from 50 MB to 400 MB. These are weather
> forecast files that contain about 22 variables that we are interested in.
> Each variable has between 1 and 55 or so time steps as dimensions.
>
> This is a Spring web application running in an embedded Tomcat instance.
> All of the files on disk amount to about 22 GB of data.
>
> When I receive a request I:
>
>    1. Re-project the lat/lon to the dataset's projection (Lambert
>    Conformal)
>    2. Look up the index of the data from the coordinate variables
>    3. Loop through every variable
>    4. Perform the Array a = var.read()
>    5. Loop through every time step and retrieve the value at the
>    specified point
>    6. Return it all in a JSON document.
>
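Steps 3 to 5 above use only one value per time step, so a full read of each variable does far more work than a single request needs. A minimal sketch of the point lookup in plain Java, assuming a variable stored as a flat, row-major (time, y, x) array; `PointSeries` and `extract` are hypothetical names and are not part of NetCDF-Java:

```java
final class PointSeries {
    /**
     * Returns the value at grid cell (y, x) for every time step of a flat,
     * row-major (time, y, x) array. Assumed layout: x varies fastest.
     */
    static double[] extract(double[] data, int nt, int ny, int nx, int y, int x) {
        if (data.length != nt * ny * nx) {
            throw new IllegalArgumentException("data length does not match nt*ny*nx");
        }
        if (y < 0 || y >= ny || x < 0 || x >= nx) {
            throw new IndexOutOfBoundsException("grid cell outside the array");
        }
        double[] series = new double[nt];
        for (int t = 0; t < nt; t++) {
            // Row-major offset of element (t, y, x).
            series[t] = data[(t * ny + y) * nx + x];
        }
        return series;
    }
}
```

Only `nt` elements of the array are touched per request. NetCDF-Java also offers subset reads through `Variable.read(int[] origin, int[] shape)`; whether that is faster than a full read here would depend on how the files are chunked and compressed.
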
> This application needs to be extremely fast. We will be serving thousands
> of requests per second (in production on a scaled system) depending on
> weather conditions.
>
> I have been told that hardware is not an obstacle and that I can use as
> much memory as I need.
> During my coding and debugging I have been able to achieve a response time
> of about 200ms - 400ms on average (this does not include any network time).
> As I add timers to every part of the application I find that most of the
> time is spent in the Variable.read() function.
>
> Here is a summary of the configuration of the app.
>
> NetcdfDataset.initNetcdfFileCache(100, 200, 0);
> NetcdfDataset nc = NetcdfDataset.acquireDataset(filename, null);
> for each coverage {
>   Variable v = nc.findVariable(name);
>   Array d = v.read();  // reads the whole variable
>   for each time step {
>     value = d.getDouble(d.getIndex().set(time, y, x));
>   }
> }
> nc.close();
>
> I have several questions.
>
>    1. I noticed that when the NetcdfDataset.close() function is called it
>    detects that I am using caching and performs releases. This causes the
>    IOServiceProvider (AbstractIOServiceProvider).release() to be called which
>    closes and nulls the RandomAccessFile. Then, next time that
>    NetcdfDataset.acquireDataset() is called it causes the
>    FileCache.acquireCacheOnly() to return null because the cached
>    NetcdfDataset.raf (RandomAccessFile) is null so it makes the lastModified =
>    0. Am I missing something or is there no way to reuse the NetcdfDataset
>    after you call close()?
>    2. What does NetcdfDataset.acquireDataset() actually cache? Is it just
>    the metadata or does it actually read in the data for all of the variables?
>    3. Can I avoid having to do a Variable.read() for every request?
>    Shouldn't this data be cached inside of the NetCDF file?
>    4. I see that there are caching functions on the Variable object.
>    Should I be using those caching options and just storing those Variable
>    objects in memory in my own cache instead?
>    5. Would it be a better option to use NetcdfFile.openInMemory()?
>
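Question 4 amounts to keeping already-read arrays in an application-level cache. A minimal sketch of that idea in plain Java, assuming the data fits in memory and the files do not change while cached. `ArrayCache` and the key format are hypothetical, and the loader function stands in for whatever performs the actual `Variable.read()`:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class ArrayCache {
    private final ConcurrentMap<String, double[]> cache = new ConcurrentHashMap<>();
    private final Function<String, double[]> loader;

    /** The loader is a placeholder for the expensive read of one variable. */
    ArrayCache(Function<String, double[]> loader) {
        this.loader = loader;
    }

    /** Returns the cached array for the key, loading it on first use. */
    double[] get(String key) {
        // computeIfAbsent is atomic per key, so concurrent requests for the
        // same variable trigger a single load.
        return cache.computeIfAbsent(key, loader);
    }

    /** Drops an entry, for example when a newer forecast file replaces it. */
    void invalidate(String key) {
        cache.remove(key);
    }

    int size() {
        return cache.size();
    }
}
```

The cached arrays are shared between requests, so callers would have to treat them as read-only.
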
> I know this is a bit long-winded but I just want to make sure to explore
> all of my options. I have spent a lot of time stepping through the ucar
> library and have already learned a lot. I just need a little guidance
> regarding some of the more abstract caching functionality. Thanks for your
> help.
>
> --
> Kevin Off
> Internet Dissemination Group, Kansas City
> Shared Infrastructure Services Branch
> National Weather Service
> Software Engineer / Ace Info Solutions, Inc.
> <http://www.aceinfosolutions.com>