
Re: Has anyone tried netCDF for Java?



> I've already done a little bit of work with the package. My main
> interest is in trying to build some of the Netcdf/MultiArray objects
> into JavaBeans. It's fairly preliminary at the moment, although I've at
> least got to the stage where I can string a few beans together in the
> Beanbox to do colour plots of Variables from a NetcdfFile object. The
> aim is to get a decent set of beans together, so that I can quickly
> prototype GUI front-ends for use in developing numerical ocean models,
> using a drag-and-drop visual builder tool.
>
> On the whole, I've found the netcdf & multiarray packages to be very
> well thought out and robust. A few minor points:
>
> * Occasional method name inconsistencies
>
>   eg. in class Variable:
>       public DimensionIterator getDimensionIterator()
>   but in class Netcdf:
>       public VariableIterator iterator()

There is a reason for this "inconsistency".
The "Set" interfaces DimensionSet, AttributeSet, Schema ~= ProtoVariableSet,
and Netcdf ~= VariableSet all have the same method signatures, in particular,
an iterator() method which returns the type specific Iterator.
In C++, these would be derived from a "Set<>" template class.

Variable is _not_ a Set of Dimensions, so it is not consistent with this
pattern.
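
A rough sketch of the shared "Set" shape, written with modern Java generics
in place of a C++ template. NamedSet, DimensionNameSet and SetDemo are
made-up names for illustration only, not the actual netCDF interfaces.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the shape shared by the "Set" interfaces.
// This is not the netCDF API; it only illustrates the pattern.
interface NamedSet<T> extends Iterable<T> {
    int size();
    boolean contains(String name);
}

// A set of dimension names: one concrete member of the family.
final class DimensionNameSet implements NamedSet<String> {
    private final List<String> names = new ArrayList<>();

    DimensionNameSet(String... initial) {
        for (String n : initial) {
            if (!names.contains(n)) {   // set semantics: ignore duplicates
                names.add(n);
            }
        }
    }

    @Override public int size() { return names.size(); }
    @Override public boolean contains(String name) { return names.contains(name); }
    @Override public Iterator<String> iterator() { return names.iterator(); }
}

final class SetDemo {
    // Works for any NamedSet, because every member exposes iterator().
    static <T> int count(NamedSet<T> set) {
        int n = 0;
        for (T ignored : set) {
            n++;
        }
        return n;
    }
}
```

Code written against the common shape (like count() here) works on every
member of the family, which is the point of keeping the signatures identical.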

> * The relationship between Variable & ProtoVariable is not completely
> clear to me.

It is difficult. We went around and around on this.
One could imagine defining an interface which includes all the
methods common to ProtoVariable and Variable, or having Variable inherit
from ProtoVariable. In the final analysis, however, Variable is not
a "kind of" ProtoVariable. At most, one can think that a Variable
"has a" associated ProtoVariable. Even this last statement isn't exactly
true, since the fields common to Variable and ProtoVariable are immutable
in Variable.

One way to think of the relationship is that a ProtoVariable is a prototype
for Variables.

> For example, suppose I have a Variable and I would like
> to pass information about it to another object, but without providing
> all the data access methods. What I'd like to do is pass a
> ProtoVariable, but given a Variable there doesn't seem to be a way to
> extract its corresponding ProtoVariable (the reverse is OK, ie. I can
> use a ProtoVariable in constructing a Variable, so there is a lack of
> symmetry there). The same arguments apply to Schema & Netcdf too.

There is a public constructor for ProtoVariable which takes a
Variable as its only arg. We have to construct (copy some fields)
because the ProtoVariable can have Attributes added and deleted,
whereas the Variable cannot.

Similarly, there is a public constructor for Schema that takes a Netcdf.

Don't worry too much about the cost of using these constructors.
They are as shallow as possible. The immutable contents of the various
containers are shared.
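
To illustrate what "as shallow as possible" can mean, here is a minimal
sketch. FixedVariable and Prototype are hypothetical stand-ins, not the real
Variable and ProtoVariable; the assumption is simply that immutable parts are
shared by reference and only the editable attribute list is copied.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins; the real Variable and ProtoVariable classes differ.
final class FixedVariable {
    private final String name;
    private final List<String> dimensionNames; // unmodifiable
    private final List<String> attributes;     // unmodifiable

    FixedVariable(String name, List<String> dimensionNames, List<String> attributes) {
        this.name = name;
        this.dimensionNames = Collections.unmodifiableList(new ArrayList<>(dimensionNames));
        this.attributes = Collections.unmodifiableList(new ArrayList<>(attributes));
    }

    String name() { return name; }
    List<String> dimensionNames() { return dimensionNames; }
    List<String> attributes() { return attributes; }
}

final class Prototype {
    final String name;
    final List<String> dimensionNames; // shared with the source, not copied
    final List<String> attributes;     // private copy, so it can be edited

    Prototype(FixedVariable v) {
        this.name = v.name();
        this.dimensionNames = v.dimensionNames();
        this.attributes = new ArrayList<>(v.attributes());
    }
}
```

Adding an attribute to the Prototype leaves the FixedVariable untouched,
while the dimension list is the very same object in both.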

> * Given an instance of Variable, it would be nice to be able to
> extract all the Dimension names with a single call such as 'String[]
> getNames()', in the same way that I can extract all the dimension
> lengths in one go using 'int[] getLengths()'. [These are both
> essentially shortcut alternatives to setting up a DimensionIterator
> and doing a call on each contained item].
> Come to think of it, I'm not
> 100% clear why a DimensionIterator is needed anyway, given that
> Dimension objects are pretty simple things, and there is always going
> to be a limited number of them in any one Variable.

getLengths() is required by the (more abstract) MultiArray interface.
I could see a use for a String[] getDimensionNames() method. The use of
DimensionIterator is really a matter of taste. It encourages us
to think about a Variable as having an associated ordered list of Dimensions.
Exposing getLengths() and getDimensionNames() (and not getDimensionIterator())
would encourage us to think of a Variable as having an associated ordered list
of ints (lengths) and a parallel associated ordered list of names for those
lengths. It comes down to the same thing, just OO vs FORTRAN thinking.
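
The two views can be put side by side in a small sketch. Dim and
DimensionViews are hypothetical names, and a plain java.util.Iterator stands
in for DimensionIterator; nothing here is the actual netCDF API.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal Dimension: just a name and a length.
final class Dim {
    final String name;
    final int length;

    Dim(String name, int length) {
        this.name = name;
        this.length = length;
    }
}

final class DimensionViews {
    // "OO" view: walk the ordered Dimensions and collect their names.
    static String[] names(Iterator<Dim> it, int rank) {
        String[] out = new String[rank];
        for (int i = 0; i < rank && it.hasNext(); i++) {
            out[i] = it.next().name;
        }
        return out;
    }

    // "FORTRAN" view: the parallel list of lengths.
    static int[] lengths(List<Dim> dims) {
        int[] out = new int[dims.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = dims.get(i).length;
        }
        return out;
    }
}
```

Either helper is a few lines on top of the iterator, which is why the choice
is mostly one of taste.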



> * Efficiency of MultiArray? *As-far-as-I-can-tell*, the only way to
> actually extract primitive data from a MultiArray object is by using
> getDouble(int[]) (or getFloat, or whatever...) which extract a single
> scalar entry in a MultiArray. My worry is that this could be very
> costly, as compared to accessing an entry in an array of primitives,
> especially in the middle of a series of long nested loops. It would be
> nice to be able to extract data at a higher granularity. I haven't
> thought the implementation aspects of this through in any detail, but
> I suspect it could make the signature of MultiArray quite messy. Maybe
> something like an IndexIterator, but which can actually return values,
> plus iterate over a block of indices and return an array of
> values. This seems quite crucial from a performance point of view....

MultiArray.copyin() and MultiArray.copyout() do aggregate access.

Another aggregate copy is the MultiArrayImpl(MultiArray) constructor.
MultiArrayImpl exposes its internal storage, so java.lang.System.arraycopy()
can be used for aggregate copy into and out of a MultiArrayImpl.
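
As a sketch of the arraycopy idea: the example below assumes the storage is a
flat, row-major double[] for a 2-d array. That layout and the FlatCopy helper
are assumptions made for illustration; check the MultiArrayImpl source for
the real storage type and ordering.

```java
// Hypothetical helper; assumes a flat row-major double[] backing a
// rows x cols array. Not part of the multiarray package.
final class FlatCopy {
    // Copies one whole row out of the flat storage in a single call.
    static double[] copyRow(double[] storage, int cols, int row) {
        double[] out = new double[cols];
        System.arraycopy(storage, row * cols, out, 0, cols);
        return out;
    }

    // Copies a whole row back into the flat storage in a single call.
    static void pasteRow(double[] storage, int cols, int row, double[] values) {
        System.arraycopy(values, 0, storage, row * cols, cols);
    }
}
```

Only contiguous runs can be moved this way, so with row-major storage a row
is one call while a column still needs a loop.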

The MultiArray framework actually allows you to avoid intermediate array copies
altogether. The pattern of use is as follows. There is a MultiArray
that contains some numbers of interest that you are going to do something with.
You isolate the numbers of interest (clipping, subsampling, and slicing)
using a MultiArrayProxy. Use the proxy's lengths to construct an IndexIterator
to visit the numbers of interest, using the primitive MultiArray
get() or set() operations.
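
The pattern can be sketched without the library. Odometer below is a
hypothetical counter loosely modeled on the idea of an IndexIterator (its
method names are my assumption), and ClippedSum plays the role of a clipping
proxy over a flat row-major double[]. No intermediate array is allocated.

```java
// Hypothetical multi-dimensional counter; the last index varies fastest.
final class Odometer {
    private final int[] limits;
    private final int[] counter;
    private boolean done;

    Odometer(int[] limits) {
        this.limits = limits.clone();
        this.counter = new int[limits.length];
        for (int limit : limits) {
            if (limit <= 0) {
                done = true;   // an empty extent means nothing to visit
            }
        }
    }

    boolean notDone() { return !done; }

    // Returns the live counter; callers must treat it as read-only.
    int[] value() { return counter; }

    void incr() {
        for (int d = counter.length - 1; d >= 0; d--) {
            if (++counter[d] < limits[d]) {
                return;
            }
            counter[d] = 0;    // roll over and carry into the next dimension
        }
        done = true;
    }
}

final class ClippedSum {
    // Sums a shape[0] x shape[1] window starting at origin, reading each
    // element in place from the flat row-major storage.
    static double sum(double[] storage, int cols, int[] origin, int[] shape) {
        double total = 0.0;
        for (Odometer it = new Odometer(shape); it.notDone(); it.incr()) {
            int[] ix = it.value();
            int r = origin[0] + ix[0];
            int c = origin[1] + ix[1];
            total += storage[r * cols + c];
        }
        return total;
    }
}
```

The index translation inside sum() is the part a proxy would normally hide;
the loop itself only ever sees the proxy's lengths.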

> * More documentation & examples would be useful. A few simple examples
> which read in a real-world (but small) 3d dataset, extract & print
> information about the dataset, and extract and plot a couple of 2d
> slices in different orientations of a specified Variable, would help
> enormously.

If you look at the source, especially multiarray, grep for main().
The little tests are the only examples we have at present.

-glenn