Caching


Disk Caching

Writing temporary files using DiskCache

There are a number of places where the library needs to write temporary files to disk. If you use a file more than once, it is useful to keep these files. Before writing a temporary file, the library checks whether it already exists.

  1. If a filename ends with ".Z", ".zip", ".gzip", ".gz", or ".bz2", NetcdfFile.open will write an uncompressed file of the same name, but without the suffix.
  2. The GRIB IOSP writes an index file with the same name with suffix .gbx. Other IOSPs may do similar things.
  3. Compressed Nexrad2 and Cinrad2 files are uncompressed to a file with an .uncompress prefix.
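
The renaming rule in item 1 can be sketched as follows. This is a minimal illustration, not the library's code: the class UncompressedName and its strip method are hypothetical names, and the suffix comparison is assumed to be case-sensitive.

```java
// Hypothetical helper for illustration; not part of the netCDF-Java API.
class UncompressedName {
    // Compression suffixes taken from item 1 of the list above.
    static final String[] SUFFIXES = {".Z", ".zip", ".gzip", ".gz", ".bz2"};

    /** Returns the filename without its compression suffix, or null if it has none. */
    static String strip(String filename) {
        for (String suffix : SUFFIXES) {
            if (filename.endsWith(suffix)) {
                return filename.substring(0, filename.length() - suffix.length());
            }
        }
        return null; // not a recognized compressed file name
    }
}
```

For example, strip("data.nc.gz") yields "data.nc", which is the name of the uncompressed temporary file.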

By default, DiskCache prefers to place the temporary file in the same directory as the original file. If it does not have write permission in that directory, by default it will use the directory ${user_home}/.unidata/cache/. You can change the directory by calling ucar.nc2.util.DiskCache.setRootDirectory().

You might want to always write temporary files to the cache directory, in order to manage them in a central place. To do so, call ucar.nc2.util.DiskCache.setCachePolicy( boolean alwaysInCache) with parameter alwaysInCache = true.
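
The placement policy described above can be sketched with plain JDK classes. This is an assumption-laden illustration rather than the DiskCache implementation: the class TempFilePlacement, its constructor arguments, and the locate method are hypothetical, and the sketch ignores name collisions between files from different directories.

```java
import java.io.File;

// Hypothetical sketch of the placement policy; not the DiskCache implementation.
class TempFilePlacement {
    private final File cacheRoot;        // plays the role of the root directory
    private final boolean alwaysInCache; // plays the role of the cache policy flag

    TempFilePlacement(File cacheRoot, boolean alwaysInCache) {
        this.cacheRoot = cacheRoot;
        this.alwaysInCache = alwaysInCache;
    }

    /** Chooses where the temporary file derived from 'original' should be written. */
    File locate(File original, String tempName) {
        File dir = original.getAbsoluteFile().getParentFile();
        // Prefer the original's directory when allowed and writeable.
        if (!alwaysInCache && dir != null && dir.isDirectory() && dir.canWrite()) {
            return new File(dir, tempName);
        }
        // Otherwise fall back to the central cache directory.
        return new File(cacheRoot, tempName);
    }
}
```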

You may want to limit the amount of space the disk cache uses (unless your data is always in writeable directories, so that the disk cache is never used). To scour the cache, call DiskCache.cleanCache(), which has several variations.

For a long-running application, you might want to do this periodically in a background timer thread, as in the following example.

   // (1) current time plus 30 minutes
   Calendar c = Calendar.getInstance();
   c.add( Calendar.MINUTE, 30);

   // (2) run the task every 60 minutes, starting 30 minutes from now
   java.util.Timer timer = new java.util.Timer();
   timer.scheduleAtFixedRate( new CacheScourTask(), c.getTime(), (long) 1000 * 60 * 60 );

   // (3) the task run by the timer
   private class CacheScourTask extends java.util.TimerTask {
     public void run() {
       StringBuffer sbuff = new StringBuffer();
       DiskCache.cleanCache(100 * 1000 * 1000, sbuff); // (4) 100 Mbytes
       sbuff.append("----------------------\n");
       log.info(sbuff.toString()); // (5)
     }
   }
   ...
   // upon exiting
   timer.cancel(); // (6)

  1. Get the current time and add 30 minutes to it
  2. Start up a timer that executes every 60 minutes, starting in 30 minutes
  3. Your class must extend TimerTask; its run method is called by the Timer
  4. Scour the cache, allowing 100 Mbytes of space to be retained
  5. Optionally log a message with the results of the scour.
  6. Make sure you cancel the timer before your application exits, or else the process will not terminate.
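
To make the effect of a size-limited scour concrete, here is a minimal sketch of how files might be selected for deletion. It assumes that the least recently modified files are removed first; the class CacheScourPlan, its Entry type, and the toDelete method are hypothetical and do not reproduce the DiskCache code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of size-limited scouring; not the DiskCache implementation.
class CacheScourPlan {
    /** Minimal description of one cached file. */
    static final class Entry {
        final String name;
        final long size;         // bytes
        final long lastModified; // milliseconds since the epoch
        Entry(String name, long size, long lastModified) {
            this.name = name;
            this.size = size;
            this.lastModified = lastModified;
        }
    }

    /** Returns the names to delete so that the newest files, totalling at most maxBytes, are kept. */
    static List<String> toDelete(List<Entry> entries, long maxBytes) {
        List<Entry> newestFirst = new ArrayList<>(entries);
        newestFirst.sort(Comparator.comparingLong((Entry e) -> e.lastModified).reversed());
        List<String> doomed = new ArrayList<>();
        long retained = 0;
        boolean full = false;
        for (Entry e : newestFirst) {
            if (!full && retained + e.size <= maxBytes) {
                retained += e.size; // still within budget: keep this file
            } else {
                full = true;        // budget exceeded: this and all older files go
                doomed.add(e.name);
            }
        }
        return doomed;
    }
}
```

A real scour would build the entries from File.length() and File.lastModified() for each file in the cache directory and then call File.delete() on each returned name.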

GRIB indexing

In 4.0, the cache policy for GRIB indexes is set separately from the generic DiskCache, in order to give you separate control:

   GribServiceProvider.setIndexAlwaysInCache( true); // always use the cache for grib index

Note that you still control whether to always use a cache directory, and where that directory is located, with DiskCache methods:

   ucar.nc2.util.DiskCache.setCachePolicy( boolean alwaysInCache);
   ucar.nc2.util.DiskCache.setRootDirectory( String cacheDir);

In multi-threaded situations such as a server, you need to make sure that GRIB indexing is thread-safe. One way to do this is to generate the indexes ahead of time, then tell the library never to write an index and to use only files that already have one:

   GribServiceProvider.setIndexExtendMode( IndexExtendMode.none); // never write an index
   GribServiceProvider.setIndexSyncMode( IndexExtendMode.none); // never sync the index

See GRIB decoder for details on generating a GRIB index externally. See GribServiceProvider javadoc for more details.


Object Caching

NetcdfFileCache

NetcdfFile objects are cached in memory for performance. When acquired, the object is locked so that another thread cannot use it. When closed, the lock is removed. When the cache is full, older objects are removed from the cache and all their resources are released.

Note that typically a java.io.RandomAccessFile object, holding an OS file handle, stays open while the NetcdfFile is in the cache. Make sure that your cache size is not so large that you run out of file handles because of NetcdfFile object caching. Most aggregations do not hold more than one file handle open, no matter how many files are in the aggregation. The exception is a Union aggregation, which holds each of the files in the union open for the duration of the NetcdfFile object.

Holding a file handle open also creates a read lock on some operating systems, which will prevent the file from being opened in write mode.

To enable caching, you must first call

  NetcdfDataset.initNetcdfFileCache(int minElementsInMemory, int maxElementsInMemory, int period);

where minElementsInMemory is the number of objects to keep in the cache when cleaning up, maxElementsInMemory triggers a cleanup if the cache size goes over it, and period specifies the interval in seconds between periodic cleanups.

After enabling, you can disable with:

NetcdfDataset.disableNetcdfFileCache();

However, you cannot re-enable the cache after disabling it.

Setting minElementsInMemory to zero will remove all files not currently in use every period seconds.

Normally the cleanup is done in a background thread so that it does not interfere with your application, and the maximum number of elements is approximate. When resources such as file handles must be carefully managed, you can set a hard limit with this call:

   NetcdfDataset.initNetcdfFileCache(int minElementsInMemory, int maxElementsInMemory, int hardLimit, int period);

so that as soon as the number of NetcdfFile objects exceeds hardLimit, a cleanup is done immediately in the calling thread.
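
The locking and hard-limit behavior described in this section can be illustrated with a small, self-contained cache. This is only a sketch under stated assumptions, not the NetcdfFileCache implementation: the class SketchCache and its methods are hypothetical, the periodic background cleanup and maxElementsInMemory are omitted, and an object that is already locked is simply opened again without being cached.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a locking object cache; not the NetcdfFileCache implementation.
class SketchCache<V> {
    private static final class Slot<V> {
        final V value;
        boolean locked = true; // a new entry is handed out immediately, so it starts locked
        Slot(V value) { this.value = value; }
    }

    private final Map<String, Slot<V>> slots = new LinkedHashMap<>(); // oldest entry first
    private final int minElements;
    private final int hardLimit;

    SketchCache(int minElements, int hardLimit) {
        this.minElements = minElements;
        this.hardLimit = hardLimit;
    }

    /** Returns the cached object for key and locks it, or opens a new one. */
    synchronized V acquire(String key, Supplier<V> opener) {
        Slot<V> slot = slots.get(key);
        if (slot != null && !slot.locked) {
            slot.locked = true;
            return slot.value;
        }
        V value = opener.get();
        if (slot != null) return value; // cached copy is in use: hand out an uncached object
        slots.put(key, new Slot<>(value));
        if (slots.size() > hardLimit) cleanup(); // immediate cleanup in the calling thread
        return value;
    }

    /** Removes the lock, corresponding to closing the object. */
    synchronized void release(String key, V value) {
        Slot<V> slot = slots.get(key);
        if (slot != null && slot.value == value) slot.locked = false;
    }

    /** Evicts unlocked entries, oldest first, until at most minElements remain. */
    synchronized int cleanup() {
        int removed = 0;
        Iterator<Map.Entry<String, Slot<V>>> it = slots.entrySet().iterator();
        while (it.hasNext() && slots.size() > minElements) {
            if (!it.next().getValue().locked) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    synchronized int size() { return slots.size(); }
}
```

Locked entries are never evicted in this sketch, so the cache can stay above the hard limit if every object is still in use.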

  

This document is maintained by John Caron and was last updated on Sep 28, 2009

 

 
 
 