Unidata - To provide the data services, tools, and cyberinfrastructure leadership that advance Earth system science, enhance educational opportunities, and broaden participation. Unidata
         
  advanced  
 

Unidata’s Design for Multidimensional Arrays in Java

John Caron, Unidata/UCAR

ACM Java Grande 2000 Conference

 June 3, 2000

Introduction

               

The numeric community has found fatal performance weaknesses in using  "arrays of arrays" to represent multidimensional arrays in Java [1], [2]. The possibility of non-rectangular (ragged) arrays and aliasing prevents compiler optimizations that are essential to fast numeric performance. Modifications to the Java language have been considered to make Java more suitable for numeric computation [3], [4].  Recent work has focused on defining standard classes for multidimensional arrays, such as IBM's Array package [5] and the Colt package from CERN [6].

 

Unidata’s perspective differs somewhat from the pure performance-oriented concerns of JavaGrande’s Numeric Working Group. Unidata is the developer of netCDF (network Common Data Form), an API with multiple language bindings (Java, C, Fortran77, Fortran90, C++, Perl, Python) for multidimensional arrays and their metadata, and a hardware independent file format to make these persistent [7], [8]. Our main users are scientific programmers, especially in earth sciences.  We have recently redesigned our Java API for improved performance and functionality. By necessity we support arbitrary-rank arrays of primitives, and we also find that the ability to manipulate arrays in type-independent ways can lead to powerful and elegant code in some circumstances.

 

In the next section I present Unidata’s design and implementation of the ucar.ma2 package for multidimensional arrays of primitives stored in memory. In section two I compare it to similar packages from other groups. In section three I present timing results using the standard Java environment, and I summarize in the last section.

 

1. Unidata’s Multidimensional Array Package           

Array Interface

 

                We use an interface to define Array, our abstract data type (ADT) for multidimensional arrays.  This interface definition is not always identical to the public methods of the implementing classes, and can simplify the task of the first-time user in learning how to use the ADT.  It is important for general data handling that all functionality is available through this interface.

 

Arrays can have arbitrary rank, and we provide concrete implementations for arrays of rank 0-7 for efficiency. While the underlying storage can use any of the Java primitive types (double, float, long, int, short, byte, char, boolean), the data can be accessed in a type independent way, with the implementing class casting the data to the requested type (and throwing a runtime ForbiddenConversionException if the cast is illegal), or using a direct assign when the requested type is the same as the data type.

 

The data type, rank, and shape of an array are immutable, while the data values themselves are mutable. Generally this makes Arrays thread-safe, and no synchronization is done in the Array package. There is the possibility of non-atomic read/writes on 64 bit primitives (long, double). In this case the user should add their own synchronization if needed. Presumably 64-bit CPUs will make those operations atomic also.

 

Logical “views” of the array can be created, including flip, section, transpose, slice and permute. These logically reorder or subset the data without copying. Views can be composed without loss of efficiency of data access. Operations like copy, sectionCopy, and reshape make a copy of the data, and also physically reorder the data to be in logical order.

 

The following summarizes the Array interface:

 

public interface Array {

 

    // array shape and type

  public long   getSize();              // total # elements

  public int    getRank();              // array rank

  public int[]  getShape();             // array dimension sizes    

  public Class  getElementType();       // data type of backing array

 

    // accessor helpers

  public Index         getIndex();          // random access 

  public IndexIterator getIndexIterator();  // sequential access

  public IndexIterator getRangeIterator(Range[] ranges); // access subset

  public IndexIterator getIteratorFast();  // arbitrary order

 

    // accessors: for each data type (double, float, long, int, short,     

    // byte, char, boolean) there are methods of the form eg:

  public double getDouble(Index ima);

  public void   setDouble(Index ima, double value);

  ...

 

    // create new Array, no data copy

  public Array flip( int dim);           // invert dimension

  public Array permute( int[] dims);     // permute dimensions

  public Array reduce();          // rank reduction for any dims of length 1

  public Array reduce(int dim);   // rank reduction for named dimension

  public Array section( Range[] ranges); // create logical subset

  public Array slice(int dim, int val);         // rank-1 subset

  public Array transpose( int dim1, int dim2);  // transpose dimensions

 

    // create new Array, with data copy

  public Array copy();                  

  public Array sectionCopy( Range[] ranges);    // subset

  public Array reshape( int [] shape);   // total # elements must be the same

}

 

Accesses to specific array elements are made using an Index object, for example:

 

double sum = 0.0;

   Index index = A.getIndex();

   int [] shape = A.getShape();

    for (i=0; i<shape[0]; i++)

     for (j=0; j< shape[1]; j++)

      for (k=0; k< shape[2]; k++)

    sum += A.getDouble(index.set(i,j,k));

 

Note that in this example, A can be of any type convertible to a double.  Index has various methods for setting the element index:

 

public interface Index {

    // general

  public Index set(int [] index);

  public void setDim(int dim, int value);

 

    // convenience methods for rank 0-7

  public Index set(int v0);        // set index 0

  public Index set(int v0, int v1);      // set index 0 and 1

  public Index set(int v0, int v1, int v2); // set index 0,1,2

  ...

 

  public Index set0(int v);      // set index 0

  public Index set1(int v);      // set index 1

  public Index set2(int v);      // set index 2

  ...

}

Because an Index object stores state, threads that share Arrays must obtain their own Index from the Array.

 

An IndexIterator is used to sequentially traverse all data in an Array in logical (row-major) order. For example, logical order for A(i,j,k) has k varying fastest, then j, then i. Note that because of the possibility that A is a flipped or permuted view, logical order may not be the same as physical order. Example:

double sum = 0.0;

IndexIterator iter = A.getIndexIterator();

while (iter.hasNext())

        sum += iter.getDoubleNext();

 

Note that in this example A can be of arbitrary rank. The interface for an IndexIterator is summarized here:

 

public interface IndexIterator  {

  public boolean hasNext();

 

    // for each data type

  public double getDoubleNext();

  public double getDoubleCurrent();

  public void setDoubleNext(double val);

  public void setDoubleCurrent(double val);

  ...

}

There are two special kinds of iterators: Array.getIteratorFast() returns an iterator that iterates over the array in an arbitrary order. It can be used to make iteration as fast as possible when the order of the returned elements is immaterial, for example in the summing example above. Array.getRangeIterator() returns a “range” iterator that iterates over a subset of an array, in logical order. In the following example, the sum is made only over the first 10 rows, and all columns, of the array:

 

  int sum = 0;

  IndexIterator iter =A.getRangeIterator(new Ranges[2]{new Range(0,9), null});

  while (iter.hasNext())

    sum += iter.getIntNext();

 

Array Implementation

 

ArrayAbstract is the superclass for all implementations of Array, and there is a concrete class that extends it for each data type, for example, ArrayDouble is the concrete class for double arrays.  In addition, for each data type, there is a concrete subclass for each rank 0-7, for example ArrayDouble.D3 is a concrete class specialized for double arrays of rank 3. These rank-specific classes are static inner classes of their superclass. This design allows handling arrays completely generally (through the Array interface), in a rank-independent way (though the Array<Type> classes), or in a rank and type specific way for ranks 0-7 (through the Array<Type>.D<rank> classes). See Figure 1.

 

The most general way to create an Array is to use the static factory method in ArrayAbstract:

 

  public abstract class ArrayAbstract implements Array, Cloneable {

    static public ArrayAbstract factory( Class type, int [] shape);

    ...

  }

The type-specific subclasses can be instantiated directly with an arbitrary rank. These also add type-specific get/set accessors, for example:

 

  public class ArrayDouble extends ArrayAbstract {

      // constructor

    public ArrayDouble(int [] dimensions);

 

       // type-specific accessors

    public double get(Index i);

    public void set(Index i, double value);

       ...

  }

 

Users will likely instantiate rank and type specific classes directly. These also add rank-specific get/set routines, for example:

 

  public static class D3 extends ArrayDouble {

      // constructor

    public D3 (int len0, int len1, int len2);

 

      // type and rank specific accessors

    public double get(int i, int j, int k);

    public void set(int i, int j, int k, double value);

}

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

             Figure 1. Array class heirarchy

 

                          

 

These implementations use dense one-dimensional Java arrays of primitives for the actual data storage. A sparse storage format should also be possible, but we have not implemented this.

 

IndexImpl is a concrete, general rank implementation of Index, and is extended by rank specific subclasses for efficiency (we have rank 0-7 implementations).  ArrayAbstract and its subclasses have an IndexImpl of the appropriate rank that is delegated all rank-specific functions.  This orthogonal design keeps the number of classes small, and makes adding new ranks or data types quite simple.

 

The “logical view” operations (flip, section, transpose, slice and permute) are implemented by manipulating the index calculation within the IndexImpl object.  These operations are affine, as is the operation that transforms the n-dim index into the 1-dim element index, and therefore any composition is an affine transformation.  The resulting transformation is immutable, and can be computed during the IndexImpl object construction. Therefore there is no extra cost associated with the index calculation for these operations (or any composition of them) during element access. These operations do logical data reordering; physical reordering can be done by making an array copy.

 

                An IndexIterator traverses array elements in logical order, which we have defined as row-major (as in C).  An iterator can in principle be more efficient than other element accesses because 1) the index values cannot be out of range, and therefore do not need to be bounds checked, and 2) the element calculation usually changes by a fixed stride each time.  We take advantage of these facts in our implementation, as well as package-private accessor methods, to reduce the number of method calls per data access from 3 to 2.

2. Related Work

IBM Ninja Array package

 

The goals of the IBM design [5] are:

·         provide high-level array operations (e.g. addition, section, transpose) like Fortran 90

·         speed comparable to optimized C and Fortran code

·         preserve Java safety and security

·         actual data layout is not exposed

·         no changes to Java language semantics

 

This design creates multidimensional arrays for all the primitive types (boolean, byte, char, short, int, long, float, double), for Objects, and also for complex numbers. The current implementation allows ranks 0-3, but general rank arrays are not possible.  Like Unidata, dense 1D Java arrays are used for data storage.

 

At the top of the class inheritance graph is the abstract class Array, which provides query, reshaping, and get/set Object wrapper methods. (The following are simplified summaries, with exceptions not shown):

 

public abstract Array {

  public int rank();

  public int[] shape();

  public int size(int dim); // size of this dimension

  public int size();              // product of dimensions

  public int last(int dim); // index of last element

 

    // general accessors

  public Object getAsObject(int[] index);      

  public void setToObject(int[] index, Object obj);

 

  public abstract Array permuteAxes(int permarray[]); // no copy

  public abstract Array reshape(int size[]);            // copy

}

 

All array classes have the abstract class <type>Array  as superclass. For example doubleArray is an abstract class that adds single element get/set methods, specifying the index using an int array:

 

public abstract class doubleArray extends Array {

   ...

   public abstract double get(int[] index) ;

   public abstract void set(int [] index, double d) ;

}

The concrete class doubleArray3D adds methods specific to rank 3 arrays:

 

 public final class doubleArray3D extends doubleArray {

   public doubleArray3D(int n0, int n1, int n2);

   public doubleArray3D(doubleArray3D arrayin);

   public doubleArray3D(double datain[][][]);

   ...

 

The following methods create subarrays of the original array, with the same data backing store. Range is a class that specifies a range of indices and an optional step (we also use Range in ucar.ma2). These variations of the section() method show the possible ways that a section can be created, either by fixing an index at a certain value (which reduces the rank of the resulting Array) or by limiting to a subset specified by the Range object.

 

   public doubleArray0D section(int i, int j, int k);

   public doubleArray1D section(int i, int j, Range K);

   public doubleArray1D section(int i, Range J, int k);

   public doubleArray1D section(Range I, int j, int k);

   public doubleArray2D section(int i, Range J, Range K);

   public doubleArray2D section(Range I, int j, Range K);

   public doubleArray2D section(Range I, Range J, int k;

   public doubleArray3D section(Range I, Range J, Range K);

 

The element-wise relational operations equals, greater, greaterEquals, less, lessEquals, unequals  are implemented using methods like the following example:

 

   public booleanArray3D less(double scalar);

   public void less(double scalar, booleanArray3D result);

   public booleanArray3D less(doubleArray3D arrayin);

   public void less(doubleArray3D arrayin, booleanArray3D result);

 

The element-wise arithmetic operators div, minus, plus, and times are implemented using methods like the following example:

 

   public doubleArray3D plus(double scalar);

   public void plus (double scalar, doubleArray3D result);

   public doubleArray3D plus (doubleArray3D arrayin);

   public void plus (doubleArray3D arrayin, doubleArray3D result);

   public void plusAssign (double scalar);

   public void plusAssign (doubleArray3D arrayin);

 

Data values are set using the assign methods (and the set methods described next):

 

   public void assign(double value);

   public void assign(doubleArray3D arrayin);

 

There are a large number of accessor methods with the form:

 

   public doubleArraynD get(I idx0, I idx1, I idx2);

   public void get(I idx0, I idx1, I idx2, doubleArraynD result);

   public void set(I idx0, I idx1, I idx2, double source);

   public void set(I idx0, I idx1, I idx2, doubleArraynD source);

 

where I = single index, list of indices (scatter/gather) or range of indices and doubleArraynD is an array of the appropriate rank.  The get() methods copy the data, either to a user-supplied array or to a new array. The set() methods store a scalar or array value into the array.

 

Finally, there are several additional reshape methods which also make a copy of the data (the product of the size parameters must be equal to the array size):

   public doubleArray1D reshape(int size0);

   public doubleArray2D reshape(int size0, int size1);

   public doubleArray3D reshape(int size0, int size1, int size2);

 

In summary, the IBM Array implementation has array classes specialized for each data type and rank, with large numbers of operators that replicate Fortran 90 functionality. Arrays of arbitrary rank are not possible, and there is little support for data-type independent array handling. There are no array iterators. Due to the non-orthogonal design, there are a large number of methods in each class, and the number increases with rank.

 

These classes have been designed in the context of IBM’s experimental static compiler (the High Performance Compiler for Java). Along with the use of the fused multiply-add (fma) hardware instruction, the Ninja group has used these classes and compiler to achieve numeric benchmarks for Java that ran within 50-90% of optimized Fortran [9].  These results indicate the potential for Java to be a viable numeric platform in the future.

 

CERNS’ Colt package

 

The Colt package was developed for scientific computing, particularly for the domain of high-energy physics [6].  It implements 1,2 and 3 dimensional arrays of doubles or Objects, using either a dense 1D Java array or a sparse, hashed table for the data storage. Like the IBM design, all of the functionality is in the concrete, rank-specific classes, and arbitrary rank arrays are not possible.

 

Like our design and IBM’s, Colt supports the creation of array “views” implemented using logical index mapping rather than data copying. Colt implements all of our operations (flip, subset, transpose, slice, permute) as well as scatter/gather, sort, and partition. Colt also provides a number of linear algebra operations (e.g. BLAS, matrix decomposition) hand tuned to run efficiently with their array classes.

 

3. Timing Tests

 

The following tests are on a 333 MHz Pentium II NT machine, 256M memory, using
Java 2 SDK v 1.2.2-001 Windows 95/98/NT Production Release, using the currently shipping 1.01 HotJava JVM.  Numbers can vary by 20% or so between runs. The units are reported in CPU cycles/operation (smaller is better). The first three timings are discarded, and then 10 calls are averaged.  I originally also examined results from the Symantec JIT compiler.  It appears that the JIT compiler does a better job on simpler code, and HotSpot does better on more complicated code. Generally, HotSpot does better on these codes. Since HotSpot will be the standard environment in the future, I report only those results.

 

Read/Write Test. This is a simple write and read back on a 3D double array of 100 * 200 * 200 (4M) elements. The write test fills the array with unique values, and the read test then reads and tests that all values are correct. Index bounds checking are done on each dimension. The large size array (32 MB) should defeat any memory cache effects.

 

This test should not be taken as very indicative of the performance for the IBM or Colt packages, as these have both been optimized for numeric algorithms, not simple read/write access. The IBM package in particular is very high-performing in the context of their experimental compiler. Here we are restricting ourselves to performance using the standard version of Java.  The test can be used to judge the success of Unidata’s package, which is intended for efficient access to arrays. Numeric algorithms are intended to be built using our Array package in an orthogonal design that does not expose the internal data storage. 

 

The memory transfer rate obtained from the C code results in Figure 1 translates to a bandwidth of 133MB/sec for writing and 80MB/sec reading, which is likely close to the limit for CPU-to-memory transfer. Comparing Java to C using 1D arrays, Java has a 15% increase for writing and a 60% increase for reading arrays. 

 

The "fatal performance weaknesses" of Java 3D arrays do not appear in this test. The problem with 3D arrays appears only when trying to create very high performance compilers that can do aggressive optimizations [5].  In fact, Java 3D arrays do as well or better than Java 1D arrays. The compiler may be successful interleaving bounds checking with memory wait states to achieve this performance.