
Re: Memory usage in JAVA API



Visweswara Rao Kottapalli wrote:

Hello Dr. John,

I have a question regarding memory usage of Array class.

I have a huge CDF file of 1 gigabyte with mass spectral data. It has
four huge 1D arrays. Two of the arrays are of size 151,200,000 each.
These two arrays are of short and int data types respectively. The other
two 1D arrays are of size 604,800 each, with double and int data types
respectively.

I am using the following piece of code to read these four arrays.

massValues = massValuesVariable.read().copyTo1DJavaArray();
intensityValues = intensityValuesVariable.read().copyTo1DJavaArray();
totalValues = (double[])totalIntensityVariable.read().copyTo1DJavaArray();
indexLocator = (int[])indexLocatorIndexVariable.read().copyTo1DJavaArray();

Though the total memory occupied in RAM by these four arrays together
is less than 900 MB, the first statement alone, reading massValues, takes
over 1.55 GB of RAM. With the second statement, usage climbs to almost
1.75 GB and the system reports low virtual memory. It takes over
20 minutes to complete these four operations.

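You are making a copy: read() returns the data in an Array, and copyTo1DJavaArray() then copies it into a second Java array, so it will need at least twice the amount of memory. Avoid the copy if you can.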
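
As a rough check of that arithmetic, here is a small sketch that adds up the raw sizes of the four arrays and doubles the total to account for the copy. It assumes "1,51,200,000" means 151,200,000 elements and counts only element data, ignoring JVM object headers and any other overhead. The class and method names are made up for this illustration.

```java
public class MemoryEstimate {
    /** Bytes of element data in a primitive array of the given length. */
    public static long arrayBytes(long length, int bytesPerElement) {
        return length * bytesPerElement;
    }

    /** Raw size of the four arrays described in the question. */
    public static long totalBytes() {
        long big = 151_200_000L;   // assumed reading of "1,51,200,000"
        long small = 604_800L;
        return arrayBytes(big, Short.BYTES)       // short array
             + arrayBytes(big, Integer.BYTES)     // int array
             + arrayBytes(small, Double.BYTES)    // double array
             + arrayBytes(small, Integer.BYTES);  // int array
    }

    public static void main(String[] args) {
        long raw = totalBytes();
        long withCopy = 2 * raw;  // the Array from read() plus the copied Java array
        double mib = 1024.0 * 1024.0;
        System.out.printf("raw data:  %d bytes (%.0f MiB)%n", raw, raw / mib);
        System.out.printf("with copy: %d bytes (%.0f MiB)%n", withCopy, withCopy / mib);
    }
}
```

If both the Array and its copy stay reachable at the same time, the doubled figure is the minimum the JVM has to hold, before any other overhead.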

You are likely thrashing the virtual memory of your system, which would explain why it takes so long.

What kind of machine / OS are you running?


Can you suggest any better way of doing this? Why is it taking so much
memory?

Also instead of loading the whole array into the memory, can we read value
by value from the file using the existing netCDF java API?

Yes, the right way to handle large arrays is to bring them into memory piece by piece. You have to structure your program so that it can get some work done on just one part of the array before moving on to the next. For example, if the data has a time dimension, you can often read in just one time "slice", process that, then go on to the next time.

To read just a part of the array at a time, use:

 /**
  * Read data from the netcdf file and return a memory resident Array.
  * This Array has the same element type as the IOArray, and the requested shape.
  * Note that this does not do rank reduction, so the returned Array has the same rank
  * as the Variable. Use Array.reduce() for rank reduction.
  * <p>
  * <code>assert(origin[ii] + shape[ii] <= Variable.shape[ii]); </code>
  * <p>
  * @param origin int array specifying the starting index.
  * @param shape  int array specifying the extents in each
  *     dimension. This becomes the shape of the returned Array.
  * @return the requested data in a memory-resident Array
  */
 public Array read(int [] origin, int [] shape) throws IOException, InvalidRangeException;
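
Here is a minimal sketch of the piece-by-piece loop for a 1D variable. To keep it runnable without the netCDF library, it uses a hypothetical SectionReader interface as a stand-in for read(origin, shape) and an in-memory fake in place of a real Variable. With the real API you would call the method above and pull the values out of the returned Array instead. The names SectionReader, sumInChunks and fakeReader are invented for this example.

```java
import java.util.Arrays;

public class ChunkedRead {
    /** Hypothetical stand-in for read(int[] origin, int[] shape) on a 1D int variable. */
    public interface SectionReader {
        int[] read(int[] origin, int[] shape);
    }

    /**
     * Sums a 1D variable of the given length while holding at most
     * chunkSize values in memory at any one time.
     */
    public static long sumInChunks(SectionReader reader, int length, int chunkSize) {
        long sum = 0;
        for (int start = 0; start < length; start += chunkSize) {
            // The last chunk may be shorter, so that origin + shape <= length.
            int count = Math.min(chunkSize, length - start);
            int[] chunk = reader.read(new int[] {start}, new int[] {count});
            for (int v : chunk) {
                sum += v;
            }
            // chunk becomes garbage here, before the next piece is read
        }
        return sum;
    }

    /** In-memory fake used only so that this sketch runs on its own. */
    public static SectionReader fakeReader(int[] data) {
        return (origin, shape) -> Arrays.copyOfRange(data, origin[0], origin[0] + shape[0]);
    }

    public static void main(String[] args) {
        int[] data = new int[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sumInChunks(fakeReader(data), data.length, 3));
    }
}
```

The same loop shape works for any per-chunk processing. Only the chunk size, not the full array length, determines how much memory is needed at once.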


Let me know if that helps.