Re: VisAD memory and speed performance with large data set

  • To: m huang <qomo@xxxxxxxxx>
  • Subject: Re: VisAD memory and speed performance with large data set
  • From: Bill Hibbard <billh@xxxxxxxxxxxxx>
  • Date: Sat, 14 Dec 2002 05:59:52 -0600 (CST)
On Fri, 13 Dec 2002, m huang wrote:

> I am trying to find the optimum ways to use VisAD in two scenarios,
> both of which are related to the change in existing arrays that
> feed data into VisAD.
> The first scenario is after I have fed VisAD with some data stored
> in Java primitive arrays, and have made the data to be rendered on
> a display, the data in these arrays changes. So I need to update the
> display to reflect the data change. I find that if I re-run
> flatField.setSamples() with the data arrays the corresponding changes
> in the display will be made.
> I would like to know if this is supposed to be the way to change the
> display?

It is one way to do it. You could also create a new FlatField
and pass it to the setData() method of your DataReference.

> The second scenario is that I have fixed-size huge arrays. To
> simplify the case, suppose the data arrays are 1D double type for x
> and y, represented by (x -> y) or (index -> (x, y)). If I want to
> look at part of the data set, I can set the range in the RangeMap of
> x (or set the range of index in (index -> (x, y))). My problem is
> that the parts of the x and y arrays not being looked at are still
> processed by VisAD (e.g. when looking for Max and Min when the data
> is set), which results in extra memory allocation inside VisAD.

If you explicitly call setRange() for all appropriate
ScalarMaps, then VisAD will skip the search for min and
max values for auto-scaling.

> In a special case, if I only want to look at the first n points of
> (x -> y), where n is smaller than the length of the x and y arrays,
> I thought I could use this to get a subset of x into x_set:
> ...
> x_set = new Gridded1DDoubleSet(type, new double[][]{x_array}, n);
> ...
> But I get an exception saying ``visad.SetException:
> visad.SetException: Gridded1DDoubleSet.init_doubles: samples
> [0] length [x array length] doesn't match expected length [n]''.
> What is the point of n if n is not allowed to be smaller than
> the length of x? And even if I do set "copy" to true, whereby
> a new array is supposed to be created and values copied from x, this
> exception still happens (from reading the source code). Can anyone
> explain the idea behind the last parameter in the constructor
> Gridded1DDoubleSet(MathType type, double[][] samples, int lengthX)?

The lengthX argument is there for consistency with all the
other GriddedSet constructors. For example, a Gridded2DSet
with manifold dimension = 2 needs lengthX and lengthY to know
how to factor the length of the samples array, and a
Gridded2DSet with manifold dimension = 1 needs lengthX just
to indicate that the manifold dimension is 1 rather than 2.
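Since lengthX must match the sample length, a practical workaround (assuming you can afford one copy) is to build length-n subset arrays yourself before constructing the set. Here is a minimal plain-Java sketch; the VisAD constructor call itself is shown only as a comment, since it needs a live MathType:

```java
import java.util.Arrays;

public class SubsetCopy {
    // Copy the first n points of x and y into fresh arrays so that
    // a Gridded1DDoubleSet of length n can be built from them.
    static double[][] firstN(double[] x, double[] y, int n) {
        return new double[][] {
            Arrays.copyOf(x, n),  // new array, first n values copied
            Arrays.copyOf(y, n)
        };
    }

    public static void main(String[] args) {
        double[] x = {0.0, 1.0, 2.0, 3.0, 4.0};
        double[] y = {10.0, 11.0, 12.0, 13.0, 14.0};
        double[][] sub = firstN(x, y, 3);
        System.out.println(sub[0].length);  // 3
        // Now the sample length matches the expected length, e.g.:
        // x_set = new Gridded1DDoubleSet(type,
        //     new double[][] {sub[0]}, 3);
    }
}
```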

> So for the second scenario, isn't there a generic way to tell
> VisAD only to look part of the data and not to process or
> allocate memory for the part that is not being looked at?

The default DataRenderers will apply some processing to all
points even with a ScalarMap to SelectRange. This is partly
to gain the efficiency of either not copying data at all or
using arraycopy(), where possible.

You are free to create a custom DataRenderer (see tutorial)
that handles your special case more efficiently.

> Now for both scenarios, I find that the setSamples() in FlatField
> actually ignores the "copy" argument. Why is the copy argument
> "meaningless" (per JavaDoc) in FlatField? If really meaningless, why
> is the argument there?

This is not true. If you pass a double[][] array to a
FlatField with the default FloatSet range Set, indicating
internal storage as a float[][] array, then it must copy to
convert the doubles to floats. But if copy = false and it
can avoid copying, it will.
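To illustrate why the copy cannot be avoided in the narrowing case: converting double samples to float storage necessarily allocates a new array, regardless of the copy flag. A stand-alone sketch (plain Java, not VisAD internals):

```java
public class NarrowingCopy {
    // A narrowing conversion from double[] to float[] cannot alias
    // the caller's array; a new float[] must be allocated and filled.
    static float[] toFloats(double[] d) {
        float[] f = new float[d.length];
        for (int i = 0; i < d.length; i++) {
            f[i] = (float) d[i];  // narrowing conversion, value copied
        }
        return f;
    }

    public static void main(String[] args) {
        double[] d = {1.5, 2.5, 3.5};
        float[] f = toFloats(d);
        System.out.println(f.length);  // 3
    }
}
```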

> Since FlatField is such a fundamental
> data structure in VisAD, does this mean that if anything in the
> original primitive data array changes, be it a data value or the
> subset of interest, VisAD MUST throw away the old FlatField and
> whatever internal data structures were created to plot the old data,
> and make new ones?

The VisAD display logic does not throw away any FlatFields.
If any value changes, the default DataRenderers do re-display
the entire Field, because they don't know which values changed,
and there may also be changes to Controls or ScalarMap ranges
that require everything to be re-displayed. There are some
non-default DataRenderers (e.g., ImageRendererJ3D) that do not
re-display everything. You are free to create more.

> I also notice that assignment loops are used
> to copy arrays in, even when copying the same type
> of array. Why isn't System.arraycopy() used ?

Not true. You'll find calls to arraycopy() all over VisAD.
If you find a place where a loop can be replaced by a call
to arraycopy, let us know.
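For reference, here is a same-type copy done both ways, as stand-alone Java (not VisAD source): an assignment loop and the equivalent System.arraycopy() call, which is typically faster for large arrays because it is a bulk intrinsic.

```java
import java.util.Arrays;

public class CopyDemo {
    // Element-by-element assignment loop.
    static float[] copyLoop(float[] src) {
        float[] dst = new float[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    // Equivalent bulk copy using System.arraycopy().
    static float[] copyFast(float[] src) {
        float[] dst = new float[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        float[] src = {1f, 2f, 3f, 4f};
        System.out.println(Arrays.equals(copyLoop(src), copyFast(src)));
    }
}
```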

> If the original data structure is simple enough, is it possible
> to only pass the reference of the original data all the way through
> until perhaps the screen coordinates are needed to be calculated?

The VisAD display logic avoids copying data until it has
to change data values. This can happen because of unit
conversions, coordinate transforms and ScalarMap scalings.

> To estimate memory use, in the above (x -> y) case, how many
> times more memory does VisAD use in order to plot it? Are
> there internal buffers that I haven't noticed in this post?

I'm not sure. Once upon a time I did a detailed analysis
of memory use for iso-surfaces and contour lines, which
are major memory hogs. It is hard to analyze in simple
cases because Java3D itself uses quite a bit of memory.
The best guide is to run with OptimizeIt and measure memory
use empirically.

> Unless I missed something big in VisAD (which happens often :-)
> the performance hit can be a show-stopper in semi-realtime
> situations such as plotting time series, where new data points
> become available and are appended to the existing data set
> at even a modest rate.

We are applying VisAD to applications with large data sets,
using a variety of techniques such as variable resolution
rendering, custom DataRenderers and custom file readers. The
other thing to keep in mind is that current performance
problems will be long forgotten by 2007. Designs aimed for
5 years in the future should focus on adequate generality
and flexibility to adapt to new needs, rather than current
performance problems.