Writing an IOSP: Overview
A client uses the NetcdfFile, NetcdfDataset, or one of the Scientific Dataset APIs to read data from a CDM file. These provide a rich and sometimes complicated API to the client. Behind the scenes, however, when any of these APIs actually reads from a dataset, it uses a much simpler interface. The NetCDF-Java library has many implementations of this interface, one for each file format that it knows how to read. This design is called the Service Provider pattern; since the implementations provide input/output, we call them I/O Service Providers, or IOSPs for short.
IOSPs are managed by the NetcdfFile class. When a client requests a dataset (by calling NetcdfFile.open), the file is opened as a ucar.unidata.io.RandomAccessFile (an improved version of java.io.RandomAccessFile). Each registered IOSP is then asked "is this your file?" by calling isValidFile( ucar.unidata.io.RandomAccessFile). The first one that returns true claims it.
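The lookup described above can be sketched as a first-match search over registered providers. This is a minimal, self-contained illustration and not the library's actual code: the FormatProvider interface, the ProviderRegistry class, and the magicProvider helper are hypothetical stand-ins for IOServiceProvider and for the registration logic inside NetcdfFile, and the check works on an in-memory header instead of a ucar.unidata.io.RandomAccessFile.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for IOServiceProvider: only the "is this your file?" check.
interface FormatProvider {
    String name();
    boolean isValidFile(byte[] header);
}

// Hypothetical registry: asks each provider in registration order; the first "true" wins.
class ProviderRegistry {
    private final List<FormatProvider> providers = new ArrayList<>();

    void register(FormatProvider p) {
        providers.add(p);
    }

    // Returns the first provider that claims the header, or null if none does.
    FormatProvider findProvider(byte[] header) {
        for (FormatProvider p : providers) {
            if (p.isValidFile(header)) return p;
        }
        return null;
    }

    // Illustrative helper: a provider that claims files starting with an ASCII magic string.
    static FormatProvider magicProvider(String providerName, String magic) {
        final byte[] m = magic.getBytes(StandardCharsets.US_ASCII);
        return new FormatProvider() {
            public String name() {
                return providerName;
            }
            public boolean isValidFile(byte[] header) {
                return header.length >= m.length
                        && Arrays.equals(Arrays.copyOf(header, m.length), m);
            }
        };
    }
}
```

Because the first provider to answer true claims the file, registration order matters in a scheme like this: a provider with a loose check that is registered early can shadow a more specific one registered later.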
The ucar.nc2.IOServiceProvider interface
public interface ucar.nc2.IOServiceProvider {
// Check if this is a valid file for this IOServiceProvider.
public boolean isValidFile( ucar.unidata.io.RandomAccessFile raf) throws IOException;
// Open existing file, and populate ncfile with it.
public void open(ucar.unidata.io.RandomAccessFile raf, NetcdfFile ncfile, CancelTask cancelTask) throws IOException;
// Read data from a top level Variable and return a memory resident Array.
public ucar.ma2.Array readData(ucar.nc2.Variable v2, java.util.List section) throws IOException, ucar.ma2.InvalidRangeException;
// Read data from a Variable that is nested in one or more Structures.
// If there are no Structures in the file, this will never be called.
public ucar.ma2.Array readNestedData(ucar.nc2.Variable v2, java.util.List section) throws IOException, ucar.ma2.InvalidRangeException;
// Close the file.
public void close() throws IOException;
// Extend the file if needed in a way that is compatible with the current metadata.
public boolean syncExtend() throws IOException;
// Check if file has changed, and reread metadata if needed.
public boolean sync() throws IOException;
// A way to communicate arbitrary information to an iosp.
public void setSpecial( Object special);
// print Debug info for this object.
public String toStringDebug(Object o);
// Show debug / underlying implementation details
public String getDetailInfo();
}
- isValidFile(): You must examine the file that is passed to you, and quickly and accurately determine whether it can be opened by this IOSP.
- open(): You will then be called again with the same file and an empty NetcdfFile object, which you will fill. If you need to do a lot of I/O, you should periodically check cancelTask.isCancel(), and if it returns true, return immediately. This allows users to bail out of opening a file if it is taking too long.
- readData(): Data will be read from a Variable through this call. The section list is a list of ucar.ma2.Range objects that define the requested data subset.
- readNestedData(): If you use Structures, data for Variables that are members of Structures is read through this method.
- close(): Release all resources, for example by calling RandomAccessFile.close().
- syncExtend(): If the file may change after it has been opened, you may optionally implement this routine. The changes must not affect any of the structural metadata. For example, in the NetCDF-3 IOSP, we check whether the record dimension has grown.
- sync(): If the file may change after it has been opened, you may optionally implement this routine. The structural metadata is allowed to change. For example, in the GRIB IOSP, we check whether new GRIB records were added, and if so, we may add or modify coordinate variables.
- setSpecial(): This allows applications to pass an arbitrary object to the IOSP through the NetcdfFile.open( location, buffer_size, cancelTask, spiObject) method. As a rule, you should not count on getting any such special information.
- toStringDebug(): A little-used debugging aid; you may return null or an empty String.
- getDetailInfo(): Here you can return any information that is useful for debugging. It can be viewed through ToolsUI.
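The first, second, and fifth points above (a cheap validity check, a cancellable open, and releasing resources on close) can be sketched as follows. This is an illustration under stated assumptions, not a real IOSP: the "DEMO" file format is invented, CancelFlag is a hypothetical stand-in for the CancelTask passed to open(), and java.io.RandomAccessFile is used in place of ucar.unidata.io.RandomAccessFile so that the example has no library dependency.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for CancelTask; only isCancel() is modeled.
interface CancelFlag {
    boolean isCancel();
}

// Sketch of an IOSP-like reader for an invented format:
// 4 magic bytes "DEMO", followed by 4-byte big-endian int records.
class DemoFormatReader {
    private static final byte[] MAGIC = "DEMO".getBytes(StandardCharsets.US_ASCII);
    private RandomAccessFile raf;
    final List<Integer> records = new ArrayList<>();

    // Cheap check: read only the first few bytes, never scan the whole file.
    boolean isValidFile(RandomAccessFile candidate) throws IOException {
        if (candidate.length() < MAGIC.length) return false;
        candidate.seek(0);
        byte[] head = new byte[MAGIC.length];
        candidate.readFully(head);
        return Arrays.equals(head, MAGIC);
    }

    // Reads all records, checking the cancel flag before each one so that
    // a caller can abandon a slow open.
    void open(RandomAccessFile file, CancelFlag cancel) throws IOException {
        this.raf = file;
        file.seek(MAGIC.length);
        while (file.getFilePointer() + 4 <= file.length()) {
            if (cancel != null && cancel.isCancel()) return;
            records.add(file.readInt());
        }
    }

    // Releases the underlying file handle.
    void close() throws IOException {
        if (raf != null) raf.close();
        raf = null;
    }
}
```

In this sketch the cancel flag is polled once per record; for a real format, where each step may be much cheaper or much more expensive, a different polling granularity may be appropriate.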
Design goals for IOSP implementations
- Allow access to the dataset through the netCDF/CDM API
- Allow user access to every interesting bit of information in the dataset
- Hide details related to file format (e.g. links, file structure details)
- Try to mimic data access efficiency of netCDF-3
- Create good use metadata: accurate coordinate systems, enable classification by scientific data type
- Create good discovery metadata in the global attributes: title, creator, version, date created, etc.
- Follow standards and good practices
Design issues for IOSP implementors
- What are the netCDF objects to expose? Should I use netCDF-3 or full netCDF4/CDM data model? Attributes vs Variables?
- How do I make data access efficient? What are the common use cases?
- How much work should I do in the open() method? Can/should I defer some processing?
- Should I cache data arrays? Can I provide efficient strided access?
- What to do if dataset is not self-contained: external tables, hardcoding?
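To make the strided-access question concrete, the following sketch copies a strided subset out of a two-dimensional variable stored flat in row-major order. It is only an illustration: Span is a hypothetical stand-in for ucar.ma2.Range (inclusive first and last index plus a stride), StridedReader is an invented name, and the data lives in memory, whereas a real IOSP would typically seek in the file for each row or contiguous run.

```java
// Hypothetical stand-in for ucar.ma2.Range: inclusive first/last index with a stride.
class Span {
    final int first;
    final int last;
    final int stride;

    Span(int first, int last, int stride) {
        if (first < 0 || last < first || stride < 1) {
            throw new IllegalArgumentException("bad span");
        }
        this.first = first;
        this.last = last;
        this.stride = stride;
    }

    // Number of indices visited: first, first+stride, ... up to last.
    int length() {
        return (last - first) / stride + 1;
    }
}

class StridedReader {
    // Copies the requested subset out of a row-major nrows x ncols array stored flat.
    static double[] readSection(double[] data, int nrows, int ncols, Span rows, Span cols) {
        if (data.length != nrows * ncols) {
            throw new IllegalArgumentException("data does not match shape");
        }
        if (rows.last >= nrows || cols.last >= ncols) {
            throw new IllegalArgumentException("section outside variable shape");
        }
        double[] out = new double[rows.length() * cols.length()];
        int k = 0;
        for (int r = rows.first; r <= rows.last; r += rows.stride) {
            int rowStart = r * ncols; // offset of the first element of row r
            for (int c = cols.first; c <= cols.last; c += cols.stride) {
                out[k++] = data[rowStart + c];
            }
        }
        return out;
    }
}
```

With a column stride of 1, each selected row is one contiguous run and could be fetched with a single read; larger column strides force a choice between many small reads and reading a whole row and discarding most of it, which is one of the trade-offs behind the questions above.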
This document is maintained by John Caron and was last updated on Aug 3, 2007