
Re: 20010606: question about Java netcdf reading performance



>To: address@hidden
>From: "Matt Pearce" <address@hidden>
>Subject: question about Java netcdf reading performance
>Organization: UCAR/Unidata
>Keywords: 200106062013.f56KDEp06300

Hi Matt,

> I have a question about reading Variable data in netCDF using the
> Java version(s) of netCDF library.  I noticed in the older version
> of the code that you were calling functions like raf.readFloat(),
> raf.readInt(), etc.  In other words, reading value by value from the
> file (which is very inefficient and *slow*).  You mention that reads
> are much faster in the new version of the Java netCDF library.

Actually, the functions raf.readFloat(), raf.readInt(), etc. in version
1 do not read value by value from the file.  They go through our own
implementation of a RandomAccessFile class, which is an efficient,
buffered replacement for java.io.RandomAccessFile.  See
RandomAccessFile.java in the sources, and especially the comments at
the beginning for more about this.  What we did was similar to what's
in the new java.nio package for buffering random access files, with
similar performance gains.  In one benchmark, the use of our
RandomAccessFile implementation made netCDF access to blocks of values
20 to 30 times faster than using java.io.RandomAccessFile.
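
To give a rough idea of the technique, here is a minimal sketch of a
buffered random access reader.  It is not the code from our
RandomAccessFile.java; the class name BufferedRandomReader, its methods,
and the buffer-size parameter are all made up for this illustration.  It
assumes read-only access and big-endian values (the byte order that
java.io.DataInput uses), and refills its buffer with one bulk read
whenever the requested position falls outside the buffered range:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch of a read-only, buffered random access reader. */
class BufferedRandomReader implements AutoCloseable {
    private final RandomAccessFile file;
    private final byte[] buffer;
    private long bufferStart = 0;  // file offset of buffer[0]
    private int bufferLength = 0;  // number of valid bytes in buffer
    private long position = 0;     // logical read position in the file

    BufferedRandomReader(String path, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.file = new RandomAccessFile(path, "r");
        this.buffer = new byte[bufferSize];
    }

    /** Moves the logical position; no I/O happens until the next read. */
    void seek(long pos) {
        position = pos;
    }

    /** Returns the next byte (0..255), or -1 at end of file. */
    int read() throws IOException {
        if (position < bufferStart || position >= bufferStart + bufferLength) {
            // Position is outside the buffer: refill with one bulk read.
            file.seek(position);
            int n = file.read(buffer, 0, buffer.length);
            bufferStart = position;
            bufferLength = Math.max(n, 0);
            if (bufferLength == 0) {
                return -1;
            }
        }
        return buffer[(int) (position++ - bufferStart)] & 0xFF;
    }

    /** Assembles a big-endian int from four buffered bytes. */
    int readInt() throws IOException {
        int b0 = read(), b1 = read(), b2 = read(), b3 = read();
        if ((b0 | b1 | b2 | b3) < 0) {
            throw new EOFException();
        }
        return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
    }

    float readFloat() throws IOException {
        return Float.intBitsToFloat(readInt());
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

With a scheme like this, a call such as readFloat() usually costs only a
few array accesses, and the underlying file is touched once per buffer
refill rather than once per value.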

> Two questions for you:
> 
> 1. In the new version of the library, what is the proper way to read
> Variable data?  In the previous version, I got a Variable object and
> called toArray() to grab the data for a variable.  Is this still the
> same, or is there a better way to grab large datasets in a high
> performance way?

Here's an example fragment of how to read variable data in the version
2 interface:

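    try {
      NetcdfFile nc = new NetcdfFile(fileName); // open it readonly
      Variable v = nc.findVariable(varName);    // get variable by name
      Array varMa = v.read();                   // read all of the variable's data
        ...
    } catch (java.io.IOException e) {
      e.printStackTrace();                      // opening or reading failed
    }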

See the documentation for more information ...

> 2. Did you use some type of buffered read technique (i.e., into an
> array) instead of reading value by value from the netCDF file?  I am
> looking for a high performance way to read data from Variables and I
> am not sure how to do this.
> Do you have any suggestions?

The version 2 interface uses the version 1 buffered RandomAccessFile,
so it should have similar performance for reading all the values of a
variable into an array.  Where it achieves much better performance is
in the subsetting and reordering operations, and in the use of
unsynchronized access to the new MultiArray objects.

I've CC:ed John Caron, the developer of the version 2 interface, in
case he has more to add ...

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu