This paper describes new software written entirely in the Java programming language (Arnold, 1996) that encompasses the functionality required by the netCDF data model, but is more abstract than previous netCDF implementations. Concrete classes are available to read and write netCDF files. The software may be extended to support data models and file formats other than netCDF. We include support for remote access to data via Java Remote Method Invocation (RMI) (Sun 1997). The work also includes an object-oriented approach to multidimensional array access which augments Java language facilities. Continuing the netCDF tradition, the Java version lowers barriers to exchange and use of scientific data.
The software is delivered as two Java packages, ucar.multiarray and ucar.netcdf. We discuss each in turn and devote a separate section to the RMI work.
Like C and C++, the Java programming language provides an array primitive which is actually a (fixed-length) vector. Also like C and C++, this primitive is used in the language to build up multidimensional arrays as vectors of vectors. For example, a two-dimensional M by N array is constructed as a vector of M references to M vectors, each of length N. This strategy wastes some storage, and the waste increases for higher-dimensional arrays. In random access patterns, the added level of indirection for each dimension incurs a performance penalty as well. Numerical programs in C or C++ often avoid direct use of the language construct for multidimensional arrays, using pointers and pointer arithmetic to reduce these costs, but those features are not available in Java. This suggests a need for Java library classes that abstract the notion of a multidimensional array and implement commonly used array operations. These are included in the ucar.multiarray package.
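To make the storage argument concrete, the following minimal sketch keeps a two-dimensional array in a single contiguous Java vector and computes a row-major offset in place of the per-dimension indirection. The class name FlatArray2D and its methods are hypothetical and are not part of ucar.multiarray; the sketch only illustrates the general idea.

```java
// Hypothetical illustration; FlatArray2D is not a ucar.multiarray class.
final class FlatArray2D {
    private final int rows;
    private final int cols;
    private final double[] storage; // one contiguous vector of rows * cols values

    FlatArray2D(int rows, int cols) {
        if (rows < 0 || cols < 0) {
            throw new IllegalArgumentException("negative extent");
        }
        this.rows = rows;
        this.cols = cols;
        this.storage = new double[rows * cols];
    }

    // Row-major offset: the last index varies fastest.
    private int offset(int i, int j) {
        if (i < 0 || i >= rows || j < 0 || j >= cols) {
            throw new ArrayIndexOutOfBoundsException("(" + i + ", " + j + ")");
        }
        return i * cols + j;
    }

    double get(int i, int j) {
        return storage[offset(i, j)];
    }

    void set(int i, int j, double value) {
        storage[offset(i, j)] = value;
    }

    public static void main(String[] args) {
        FlatArray2D a = new FlatArray2D(3, 4);
        a.set(2, 1, 42.0);
        System.out.println(a.get(2, 1));
    }
}
```

A single allocation replaces the M + 1 allocations of the vector-of-vectors layout, and each access costs one multiply-add and one bounds-checked load.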
The primary interface in this package is MultiArray. Its methods include reflection (introspection) functions to discover the rank, shape and component type of the array. To access the data, it provides methods that set and get single values and methods that copy aggregate data in and out of the array. A special form of the copy-out operation is useful for converting the MultiArray data into a Java language array. The accessors take a simple array of integers (an index vector) as the index argument.
A helper class, IndexIterator, is provided. It is used for stepping through the values of an index vector.
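Stepping through the values of an index vector can be pictured as an odometer: the last dimension advances fastest and carries into the dimension to its left. The sketch below shows one way this might be done. The class OdometerIndex and its method names are assumptions made for illustration and do not reproduce the IndexIterator API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not the ucar.multiarray.IndexIterator API.
final class OdometerIndex {
    private final int[] shape; // extent of each dimension
    private final int[] index; // current index vector
    private boolean done;

    OdometerIndex(int[] shape) {
        this.shape = shape.clone();
        this.index = new int[shape.length];
        // An array with any zero-length dimension has no elements to visit.
        for (int extent : shape) {
            if (extent <= 0) {
                done = true;
            }
        }
    }

    boolean hasMore() {
        return !done;
    }

    int[] current() {
        return index.clone();
    }

    // Increment the last dimension; on overflow reset it and carry leftward.
    void advance() {
        for (int d = shape.length - 1; d >= 0; d--) {
            if (++index[d] < shape[d]) {
                return;
            }
            index[d] = 0;
        }
        done = true; // carried past the first dimension
    }

    // Collect every index vector of the given shape, in row-major order.
    static List<String> walk(int[] shape) {
        List<String> visited = new ArrayList<>();
        for (OdometerIndex it = new OdometerIndex(shape); it.hasMore(); it.advance()) {
            visited.add(Arrays.toString(it.current()));
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(walk(new int[] {2, 3}));
    }
}
```

A rank-zero shape yields exactly one (empty) index vector, which matches the treatment of a scalar as an array with a single element.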
Three concrete implementations of the MultiArray interface are provided. Class ArrayMultiArray is an adapter for Java language arrays that presents the MultiArray interface. Similarly, class ScalarMultiArray wraps single objects, such as instances of java.lang.Number. Class MultiArrayImpl provides a default implementation that attempts to overcome the problems with Java language arrays outlined in the first paragraph of this section.
Scientists often wish to perform operations on multidimensional arrays that produce a new multiarray. For example, one might wish to extract a two-dimensional slice out of a three-dimensional array, or sample a grid by taking every other value. The ucar.multiarray package provides a novel way to accomplish this, the MultiArrayProxy / IndexMap framework, which is an application of the "delayed evaluation" technique. MultiArrayProxy uses an IndexMap to provide a different view of some particular MultiArray. The data is not copied. The introspection and data access functions refer to the backing array via the IndexMap to present the desired result. Concrete IndexMap implementations provide primitive mappings to slice, clip, sample, flip, transpose or flatten a MultiArray. These may be composed using nested constructors to arbitrary complexity.
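The delayed-evaluation idea can be sketched in a few lines: a view holds a reference to a backing array and an index mapping, and translates each index vector on access instead of copying data. All type names below (Grid, DenseGrid, IndexMapping, GridView, Transpose2D, EveryOther) are hypothetical stand-ins, not the ucar.multiarray classes, and this sketch composes mappings by nesting views, which is an assumption about one possible design.

```java
// Hypothetical types for illustration; not the MultiArrayProxy / IndexMap API.
interface Grid {
    int[] shape();
    double get(int[] index);
}

final class DenseGrid implements Grid {
    private final int[] shape;
    private final double[] data; // row-major storage

    DenseGrid(int[] shape, double[] data) {
        this.shape = shape.clone();
        this.data = data.clone();
    }

    public int[] shape() {
        return shape.clone();
    }

    public double get(int[] index) {
        int offset = 0;
        for (int d = 0; d < shape.length; d++) {
            offset = offset * shape[d] + index[d];
        }
        return data[offset];
    }
}

interface IndexMapping {
    int[] viewShape(int[] backingShape);
    int[] toBacking(int[] viewIndex);
}

// A view translates indices on each access; no data is copied.
final class GridView implements Grid {
    private final Grid backing;
    private final IndexMapping mapping;

    GridView(Grid backing, IndexMapping mapping) {
        this.backing = backing;
        this.mapping = mapping;
    }

    public int[] shape() {
        return mapping.viewShape(backing.shape());
    }

    public double get(int[] index) {
        return backing.get(mapping.toBacking(index));
    }
}

final class Transpose2D implements IndexMapping {
    public int[] viewShape(int[] s) {
        return new int[] {s[1], s[0]};
    }

    public int[] toBacking(int[] v) {
        return new int[] {v[1], v[0]};
    }
}

// Samples every other value along one dimension.
final class EveryOther implements IndexMapping {
    private final int dim;

    EveryOther(int dim) {
        this.dim = dim;
    }

    public int[] viewShape(int[] s) {
        int[] r = s.clone();
        r[dim] = (s[dim] + 1) / 2;
        return r;
    }

    public int[] toBacking(int[] v) {
        int[] r = v.clone();
        r[dim] = 2 * v[dim];
        return r;
    }
}
```

Under these assumptions, `new GridView(new GridView(dense, new Transpose2D()), new EveryOther(0))` presents a transposed, subsampled view of `dense` whose element (i, j) is read from backing element (j, 2i) at access time.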
The netCDF data model provides an abstraction for sampled functions between multidimensional spaces. Samplings of the domain and range are represented as named multidimensional arrays called variables. The functional relationship between the elements of the domain and the range is maintained through the use of shared dimensions.
For example, suppose we wish to model a curve in the plane with N samples. We might create a dimension named "samples" whose length is N. We would then create two variables, say "x" and "y", which each have a single dimension, "samples". The relationship between elements of x and those of y is made explicit through the shared dimension.
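The curve example can be written down as a small sketch in which two variables hold a reference to the same dimension object, so that element i of "x" and element i of "y" are known to describe the same sample. The classes Dim, Var and CurveDemo are hypothetical and much simpler than the ucar.netcdf types; they only illustrate the shared-dimension relationship.

```java
// Hypothetical classes for illustration; not the ucar.netcdf API.
final class Dim {
    final String name;
    final int length;

    Dim(String name, int length) {
        this.name = name;
        this.length = length;
    }
}

final class Var {
    final String name;
    final Dim dim; // the dimension this variable is defined over
    private final double[] values;

    Var(String name, Dim dim, double[] values) {
        if (values.length != dim.length) {
            throw new IllegalArgumentException(
                name + " has " + values.length + " values, but dimension "
                + dim.name + " has length " + dim.length);
        }
        this.name = name;
        this.dim = dim;
        this.values = values.clone();
    }

    double get(int i) {
        return values[i];
    }
}

final class CurveDemo {
    public static void main(String[] args) {
        Dim samples = new Dim("samples", 4); // N = 4
        Var x = new Var("x", samples, new double[] {0.0, 1.0, 2.0, 3.0});
        Var y = new Var("y", samples, new double[] {0.0, 1.0, 4.0, 9.0});
        // Because x and y share "samples", index i pairs x[i] with y[i].
        for (int i = 0; i < samples.length; i++) {
            System.out.println("(" + x.get(i) + ", " + y.get(i) + ")");
        }
    }
}
```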
An important feature of netCDF is that the individual variables may have descriptive metadata, called attributes, associated with them. These might include the unit of measurement (as a string), the identification of values that are to have special interpretation, or the precision of the data. The data set as a whole may also have attributes, such as a string describing the pedigree of the data.
The ucar.netcdf package includes the following basic elements.
An important difference between the Java netCDF implementation and those in other languages has to do with modifying metadata and the definition of a data set (schema). Java supports multiple threads of execution in the language, and we support distributed processing via RMI. In a multiprocessing environment, access to mutable shared state information must be synchronized. To avoid as much of this cost as practicable, we chose to make attribute values, and most of the netCDF dataset schema, immutable.
For example, consider changing the value of the "units" attribute of some variable from "feet" to "meters". To be consistent, this change would be accompanied by a change in all the data values of that variable, multiplying each by 0.3048. A programmer could use the C interface to perform this sequence of operations "in situ". The data set is inconsistent until the sequence is completed. The Java design forces the programmer to copy the data set, and the new data set is consistent upon completion of its construction.
In supporting remote access, this design decision has several advantages. It is possible to safely copy and cache the schema information into a client. With this information local to the client, the client can perform many common operations and sanity checks without incurring the round-trip remote procedure cost. If it were possible to modify attributes, there would have to be a protocol for notifying clients of the change, or every attribute lookup would have to go to the server.
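The copy-on-change approach in the units example might look like the sketch below, in which an immutable object holds the units string and the values together, and the conversion returns a new object instead of modifying the old one. The class Measurement and its methods are hypothetical and are not part of ucar.netcdf.

```java
import java.util.Arrays;

// Hypothetical immutable holder for illustration; not a ucar.netcdf class.
final class Measurement {
    private final String units;
    private final double[] values;

    Measurement(String units, double[] values) {
        this.units = units;
        this.values = values.clone(); // defensive copy keeps the object immutable
    }

    String units() {
        return units;
    }

    double[] values() {
        return values.clone();
    }

    // Returns a new, fully consistent object; this one is left unchanged.
    Measurement feetToMeters() {
        if (!"feet".equals(units)) {
            throw new IllegalStateException("expected feet, found " + units);
        }
        double[] converted = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            converted[i] = values[i] * 0.3048;
        }
        return new Measurement("meters", converted);
    }

    public static void main(String[] args) {
        Measurement inFeet = new Measurement("feet", new double[] {10.0, 100.0});
        Measurement inMeters = inFeet.feetToMeters();
        System.out.println(inFeet.units() + " " + Arrays.toString(inFeet.values()));
        System.out.println(inMeters.units() + " " + Arrays.toString(inMeters.values()));
    }
}
```

Because no object is ever observed with new units and old values (or the reverse), readers on other threads need no synchronization to see a consistent state.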
From the early days of netCDF, we have received requests for versions of netCDF that could "run across a network". We have resisted this, since it seemed that to do it right, we would have to implement the equivalent of a secure network file system. This did not prevent others from doing important work in this area, including the Distributed Oceanographic Data System (Cornillon 1993, DODS 1998).
With the availability of Java RMI, important parts of the infrastructure are in place. The ucar.netcdf package includes classes that should make construction of systems like DODS simpler.
There are two aspects to the problem. The first is to provide a directory service indicating what data sets are available, and a mechanism for users to open or connect to a given data set. The second is to provide the remote methods to access that data set. In the RMI context, we view these as separate services. We provide interface definitions and RMI implementations for both services. However, the directory service specification we provide is very simple and would probably be extended for production systems.
The minimalist directory service is defined by the interface NetcdfService. Consistent with the Java Naming and Directory Interface (JNDI) conventions, this has a lookup() method which returns an Object by name. In this case, the object returned implements interface NetcdfRemoteProxy. This interface wraps a single instance of Netcdf to provide the remote services required in the construction of an instance of RemoteNetcdf.
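The separation between a directory service and a per-data-set remote interface can be sketched with the standard java.rmi types. The interfaces DatasetDirectory and DatasetHandle and the class InMemoryDirectory are hypothetical names chosen for this illustration; they are not the NetcdfService or NetcdfRemoteProxy definitions, and the sketch runs in a single process without exporting any object.

```java
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-data-set remote interface (an assumption, not NetcdfRemoteProxy).
interface DatasetHandle extends Remote {
    String describe() throws RemoteException;
}

// Hypothetical directory interface (an assumption, not NetcdfService).
interface DatasetDirectory extends Remote {
    DatasetHandle lookup(String name) throws RemoteException, NotBoundException;
}

// In-process implementation used only to show the lookup-by-name contract.
final class InMemoryDirectory implements DatasetDirectory {
    private final Map<String, DatasetHandle> entries = new ConcurrentHashMap<>();

    void bind(String name, DatasetHandle handle) {
        entries.put(name, handle);
    }

    @Override
    public DatasetHandle lookup(String name) throws NotBoundException {
        DatasetHandle handle = entries.get(name);
        if (handle == null) {
            throw new NotBoundException(name);
        }
        return handle;
    }
}

final class DirectoryDemo {
    public static void main(String[] args) throws Exception {
        InMemoryDirectory directory = new InMemoryDirectory();
        directory.bind("curve", () -> "curve: dimension samples, variables x and y");
        DatasetDirectory service = directory; // clients see only the remote interface
        System.out.println(service.lookup("curve").describe());
    }
}
```

In a deployed system, the directory implementation would be exported and bound in an RMI registry so that clients obtain a stub for it; that step is omitted here.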
Class RemoteNetcdf is the user-friendly concrete Netcdf implementation that hides most of the above details from the user.
The delayed-evaluation technique used by the MultiArrayProxy / IndexMap framework should provide significant performance advantages when applied to variables obtained from a RemoteNetcdf.
The software, documentation and a number of coding examples may be accessed on the web starting at:
http://www.unidata.ucar.edu/software/netcdf/java/
Arnold, K. and J. Gosling, 1996: The Java Programming Language, Addison Wesley.
Cornillon, P., Flierl, G., Gallagher, J., Milkowski, G., 1993: Report on the first workshop for the distributed oceanographic data system, The University of Rhode Island, Graduate School of Oceanography.
DODS 1998: <URL:http://www.unidata.ucar.edu/packages/dods/>
Rew, R. K., G. P. Davis, S. Emmerson and H. Davies, 1997: NetCDF User's Guide <URL:http://www.unidata.ucar.edu/software/netcdf/docs.html>
Rew, R. K. and G. P. Davis, 1997: Unidata's netCDF Interface for Data Access: Status and Plans. Proceedings, 13th International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography and Hydrology, February, Long Beach, California.
Sun Microsystems 1997: Java Remote Method Invocation Specification. <URL:http://www.javasoft.com/products/jdk/rmi/index.html>