Georeferencing with Java: An Example of Executable Metadata

Russell K. Rew
Unidata Program Center
University Corporation for Atmospheric Research
Boulder, Colorado

1. INTRODUCTION

Metadata is data about data; it describes data and facilitates the use of data by others, especially when it is integrated with the data it describes. Examples of metadata include the units in which the data values are represented and information that provides a way to associate a time or geospatial location with each data value (georeferencing metadata). This paper describes the implementation of a prototype for executable metadata, one possible approach to making scientific data more useful for visualization and analysis applications.

The georeferencing prototype described here provides information about the location of data in a portable executable form, as an array of bytes that can be reconstituted into a Java object that implements the georeferencing functionality needed by applications. The georeferencing object provides a way to package transformations between geospatial coordinates and data indices. It supports writing applications that use a simple, general, and abstract interface for obtaining location information about data. This approach is not practical with languages such as C++, C, or Fortran, because these languages provide no portable representation for executable content. What is proposed falls short of a complete object-oriented architecture, as described, for example, in Vckovski (1996), in which all data is provided as objects rather than bits; instead, this prototype packages only some useful metadata into object form.

One problem with more traditional georeferencing standards is lack of complete support for the wide variety of ways for representing geospatial information in the geosciences.
The ingenuity and creativity of data providers in inventing new ways to represent geospatial information compactly tend to confound efforts to anticipate all needs in a single standard. For example, the "quasi-regular thinned grids" used by NCEP in packaging AVN model output data in GRIB form, as described in Dey (1996), present a challenge to other georeferencing models.

Georeferencing objects provide a more comprehensive and flexible representation. If a georeferencing scheme can be implemented in software, it can be encapsulated in executable metadata representing a georeferencing object.

"Smart data" interfaces such as the one described here may improve support for object-oriented applications. Unidata hopes ultimately to help develop some of the infrastructure and software to make practical an effective division of responsibilities between data and applications, and to improve access to scientific data used for research and education in the atmospheric sciences. Experiences with a Java prototype for platform-independent data georeferencing make this approach appear practical for Java applications.

2. THE PROBLEM

The general problem addressed is how to deal with the complexity of data from diverse sources in visualization and analysis applications by moving much of the data-specific complexity from applications to the data itself.

The specific problem dealt with by the current prototype is packaging only one aspect of the data-specific complexity, geospatial information, into a form that is portable, general, accurate, secure, compact, efficient, and extensible. Client applications should be able to deal with new datasets that use new forms of executable georeferencing without requiring changes to the application.

In this proof-of-concept prototype, we have limited the problem to a simple representation of transformations between planar grid coordinates (two-dimensional indices) and surface latitude-longitude coordinates.
Hence, the problem is simplified to:

· determining the (lat, lon) location corresponding to any (i, j) point of a planar grid
· determining the real (i, j) grid indices corresponding to any (lat, lon) location

The first capability makes it possible to plot data defined on the grid on a map display. The second capability supports determining grid indices corresponding to a mouse click on a map display. Together, these transformations provide a simple and general interface for use in applications. A few other convenience methods are also needed to determine the shape of the associated georeferenced grid.

A more complete interface might include vertical and time coordinates, handle one-dimensional domains (e.g. station data or trajectories), and deal with higher-dimensional grids with data-dependent coordinate transformations. The issues encountered in implementing and evaluating the simpler two-dimensional case, however, are thought to be representative of the more general case.

3. IMPLEMENTATION ISSUES

The prototype Java application we implemented, MapGeoGrid, reads the data and metadata from a specified dataset and plots the georeferenced grids associated with variables in the dataset on a world image. For this purpose, we adapted the MapApplet package described in Callahan (1997), writing a new MapTool subclass and providing a custom class loader. Source for the MapGeoGrid prototype application is available from .

Security is an important issue, even in implementing simple class methods that only perform arithmetic computations to implement coordinate transformations. In Java, classes loaded with a custom class loader are in a separate name space from local classes, to prevent some potential security problems. Running the georeferencing methods in an applet security context would be sufficient to protect the client environment, but applets cannot use the custom class loader that this technique requires.
Thus an applet must download the georeferencing class from its remote server; an application may use other Java security mechanisms, for example digital signatures to authenticate the origin or association of executable bytes with source code.

Because loaded classes and application classes cannot share names, the application uses a common predefined interface that is a superclass of the loaded class. When the custom class is loaded, an object of that class is constructed and cast into an instance of the common superclass it implements. Then methods of this instance are invoked to perform georeferencing operations on the data.

Performance was completely adequate in the prototype implementation, because expensive operations in setting up the transformations are performed only once, when the constructor for the georeferencing class is invoked. Performance could be further enhanced by including methods in the abstract interface for transforming one- and two-dimensional arrays of grid locations and earth locations, instead of providing only point-at-a-time methods. A default implementation using loops that invoke the point-at-a-time methods would minimize the burden on data providers who did not choose to take advantage of these optimizations.

4. USAGE

In a typical usage scenario, a data provider would package georeferenced data by providing with the data the portable byte codes for a Java class that implements a GeoGrid interface to the data.

Data users would access the data and the executable metadata together. Mechanisms for sharing georeferencing metadata among multiple variables or datasets, as well as for multiple metadata variables per dataset, would be the same as for conventional metadata.
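As an illustration only, the GeoGrid interface that a provider's class implements might be sketched as follows. The method names (indexToLatLon, latLonToIndex, getNumI, getNumJ, indexToLatLonAll) and the RegularLatLonGrid example class are assumptions made for this sketch and are not taken from the prototype's source. The array method shows the kind of loop-based default implementation suggested in Section 3, written here with a Java interface default method.

```java
public class GeoGridSketch {

    /** Hypothetical abstract interface; every name here is an illustrative assumption. */
    public interface GeoGrid {
        /** Number of grid points in the i direction. */
        int getNumI();

        /** Number of grid points in the j direction. */
        int getNumJ();

        /** Returns {lat, lon} in degrees for a (possibly fractional) grid index. */
        double[] indexToLatLon(double i, double j);

        /** Returns real-valued {i, j} grid indices for a location in degrees. */
        double[] latLonToIndex(double lat, double lon);

        /** Default bulk transform: loops over the point-at-a-time method. */
        default double[][] indexToLatLonAll(double[] i, double[] j) {
            if (i.length != j.length) {
                throw new IllegalArgumentException("index arrays differ in length");
            }
            double[][] latLon = new double[i.length][];
            for (int k = 0; k < i.length; k++) {
                latLon[k] = indexToLatLon(i[k], j[k]);
            }
            return latLon;
        }
    }

    /** Simplest possible provider class: a regular latitude-longitude grid. */
    public static final class RegularLatLonGrid implements GeoGrid {
        private final int numI;
        private final int numJ;
        private final double lat0;
        private final double lon0;
        private final double dLat;
        private final double dLon;

        public RegularLatLonGrid(int numI, int numJ, double lat0, double lon0,
                                 double dLat, double dLon) {
            this.numI = numI;
            this.numJ = numJ;
            this.lat0 = lat0;
            this.lon0 = lon0;
            this.dLat = dLat;
            this.dLon = dLon;
        }

        @Override public int getNumI() { return numI; }

        @Override public int getNumJ() { return numJ; }

        @Override public double[] indexToLatLon(double i, double j) {
            return new double[] { lat0 + j * dLat, lon0 + i * dLon };
        }

        @Override public double[] latLonToIndex(double lat, double lon) {
            return new double[] { (lon - lon0) / dLon, (lat - lat0) / dLat };
        }
    }

    public static void main(String[] args) {
        // Made-up grid parameters, chosen only to exercise the interface.
        GeoGrid grid = new RegularLatLonGrid(93, 65, 20.0, -130.0, 0.5, 0.5);
        double[] latLon = grid.indexToLatLon(10, 4);
        System.out.println(latLon[0] + ", " + latLon[1]); // prints "22.0, -125.0"
    }
}
```

A provider with a more complex projection would supply a different implementing class and could override indexToLatLonAll with a vectorized version; applications written against the interface would not change.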
The users' applications, written in terms of the abstract GeoGrid interface, would read the byte codes for the particular subclass of GeoGrid associated with the dataset using the custom class loader developed for this prototype, instantiate an object of that class, and use the methods of the resulting object for georeferencing. The same uniform GeoGrid class methods would be used for data from multiple sources, but different data-specific georeferencing methods would actually be invoked for different GeoGrid objects.

To evaluate the prototype, we chose as a first example of geodata some model output data from NCEP, distributed in GRIB form on the National Weather Service High Resolution Data Service. The NCEP "211 grid," a 93 by 65 regional Lambert Conformal grid over the continental U.S., has non-trivial transformations between index space and latitude-longitude. We converted a collection of GRIB products that use this grid into a single netCDF file in which the georeferencing information was stored in a byte array variable, referenced by name by data variables defined on the grid.

5. EVALUATION

The prototype was designed only to test the practicality and usefulness of the idea of executable metadata and to uncover any unanticipated problems or issues that implementation of the idea would reveal. The results are mixed: we end up with "half-objects" that can be used as ordinary data files by conventional applications using traditional data access interfaces, but that also behave like simple georeferencing objects for Java applications that activate the executable content. But for the data to be useful to conventional applications, the georeferencing data must also be represented conventionally.
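The loading step in the usage scenario of Section 4 might look roughly like the following sketch. It is not the prototype's class loader: the names GeoGridLoaderSketch, ByteArrayClassLoader, and instantiate are hypothetical, the GeoGrid interface is reduced to two assumed methods, and reading the byte array out of the netCDF variable is omitted. The sketch assumes the loaded class is public, has a public no-argument constructor, and implements the interface that the application already knows.

```java
public class GeoGridLoaderSketch {

    /** Hypothetical common interface shared by the application and the loaded class. */
    public interface GeoGrid {
        double[] indexToLatLon(double i, double j);
        double[] latLonToIndex(double lat, double lon);
    }

    /** Minimal custom class loader that defines a class from bytes held in memory. */
    static final class ByteArrayClassLoader extends ClassLoader {
        ByteArrayClassLoader(ClassLoader parent) {
            // The parent resolves GeoGrid, so both sides see the same interface type.
            super(parent);
        }

        Class<?> define(String className, byte[] classBytes) {
            // Throws ClassFormatError if the bytes are not a valid class file.
            return defineClass(className, classBytes, 0, classBytes.length);
        }
    }

    /**
     * Reconstitutes a georeferencing object from class bytes stored with the data.
     * className must be the binary name of the class contained in classBytes.
     */
    public static GeoGrid instantiate(String className, byte[] classBytes)
            throws ReflectiveOperationException {
        ByteArrayClassLoader loader =
                new ByteArrayClassLoader(GeoGrid.class.getClassLoader());
        Class<?> loaded = loader.define(className, classBytes);
        // Fails with ClassCastException if the class does not implement GeoGrid.
        Class<? extends GeoGrid> gridClass = loaded.asSubclass(GeoGrid.class);
        // The application only ever sees the object through the common interface.
        return gridClass.getDeclaredConstructor().newInstance();
    }
}
```

An application would pass the bytes read from the dataset's georeferencing variable to instantiate and then call only GeoGrid methods on the result. Because the loaded code runs with the application's privileges, the bytes should come from a trusted or authenticated source, as noted in Section 3.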
5.1 Benefits

Benefits of using this technique for implementing executable metadata, and georeferencing metadata in particular, may include:

· simpler application interfaces to georeferenced data;
· portable executable metadata for Java-equipped platforms in distributed environments;
· applications immune to changes in georeferencing;
· reduced possibility for misinterpreting location of data, since georeferencing is implemented once by the data supplier rather than many times by the data users; and
· compact representation for complex georeferencing, because size does not increase with spatial resolution.

As a concrete example of the compactness achievable with Java byte codes, the class data needed for executable metadata for the "211 grid" required about 2000 bytes. In contrast, representing the grid as a two-dimensional array of single-precision latitudes and longitudes requires over 48,000 bytes (93 × 65 points × 2 coordinates × 4 bytes = 48,360 bytes), and efficiently supports only one of the two transformations between index space and latitude-longitude space.

Another potential advantage of this approach in a transition to object-oriented data access is that the interfaces and specific class implementations of the interfaces required can later be incorporated into full-blown data objects.

5.2 Limitations and Remaining Problems

The primary limitation of this technique is its requirement that applications that make use of the metadata must be written in Java. (Something like the CORBA infrastructure might provide interfaces for applications written in other languages, but access to a Java Virtual Machine would still be required to execute the georeferencing methods.)

If only Java applications can use the executable content of such data, why not provide the data as actual objects of a class that implements the abstract georeferencing interface, rather than as bits containing a representation of such a class? A pure object-oriented approach would package all appropriate methods with the data.
The proposed approach may be more practical in the short run, because data providers may be reluctant to also become code providers (and maintainers) to the extent necessary to provide the classes that wrap their data as actual objects. The more gradual approach of refining a standard applications interface for one aspect of metadata at a time may be a more realistic way to eventually achieve the benefits of object-oriented data and applications.

Another limitation is the lack of any simple way to browse or search metadata represented only in executable form. Information about data coverage and resolution is available only by invoking the methods of the metadata.

Traditional ways to represent georeferencing information have included:

· simple but rigid standards (e.g. all data must be on regular latitude-longitude grids);
· conventions specified in standard documents separate from the data;
· specialized data formats that anticipate and provide compact representations for some common ways to represent geodata.

None of these traditional approaches provides the extensibility and power of executable metadata, but they have other advantages. Language-independence is an important characteristic in environments where other languages are widely used for scientific applications. Language-independence is probably also necessary for long-term archives, since useful data may outlast Java or any other particular language. It is prudent to insist that the Java sources for executable metadata be stored with any archives that include such data.

6. CONCLUSIONS

Storing georeferencing functions with the data for execution when and where the data is accessed avoids the need for applications to support elaborate conventions for parameterizing many different kinds of georeferencing. Instead, applications may assume a simple interface for accessing georeferenced data, and depend on the data to realize specific implementations of that interface.
Whether these benefits are enough to outweigh the limitation that only Java applications may make use of such metadata remains to be determined.

If this approach is useful, it may also be applied to other kinds of complex metadata, such as calibration, interpolation algorithms, derivative calculations, error estimates, and the representation of irregular domains.

Object-oriented approaches to data access enhance the practicality of developing interoperable applications that can better deal with the complexity of multiple forms of scientific data. Executable metadata may be useful as an incremental approach to achieving some of the benefits of object-oriented architectures for scientific data access, visualization, and analysis.

7. REFERENCES

Callahan, J., S. Hankin, J. Davison, 1997. "Improving Web Access To Gridded Data: Java Tools for Climate Data Servers," 13th International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, Anaheim, California, Am. Meteor. Soc., 189-191.

Dey, C. H., 1996. "Office Note 388, GRIB (edition 1): The WMO Format for the Storage of Weather Product Information and the Exchange of Weather Product Messages," NCEP.

Vckovski, A., F. Bucher, 1996. "Virtual Data Sets - Smart Data for Environmental Applications," The Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Fe, NM, January 1996.