Tutorial: Writing a Datatype Implementation

Overview

To use a dataset at the scientific datatype layer, its datatype must first be determined. This is the job of an implementation of ucar.nc2.dt.TypedDataset.



The TypedDataset contains a number of methods to access basic dataset discovery metadata:

  public String getTitle();
  public String getDescription();
  public String getLocationURI();
  public Date getStartDate();
  public Date getEndDate();
  public ucar.unidata.geoloc.LatLonRect getBoundingBox();

and others that describe the data variables (VariableSimpleIF exposes a variable's size, shape, and type, but provides no data access methods):

  public List getDataVariables();
  public VariableSimpleIF getDataVariable( String shortName );

It also provides access to underlying netCDF objects:

  /** Return underlying NetcdfFile, or null if none. */
  public ucar.nc2.NetcdfFile getNetcdfFile();

  public List getGlobalAttributes();
  public ucar.nc2.Attribute findGlobalAttributeIgnoreCase( String name );

Finally, there are some housekeeping methods:

  /** Close all resources associated with this dataset. */
  public void close() throws java.io.IOException;

  /** Show debug / underlying implementation details */
  public String getDetailInfo();

Various specializations are available, each providing query and access capabilities specific to its datatype. For instance, PointObsDataset, StationObsDataset, TrajectoryObsDataset, GridDataset, and RadialDatasetSweep (all part of the ucar.nc2.dt package) all extend TypedDataset.
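
As a rough illustration of how client code might consume the discovery metadata described above, the sketch below builds a one-line summary of a dataset. To keep it compilable without the library, it uses a hypothetical stand-in interface, DiscoveryMetadata, that copies four of the TypedDataset method signatures; the MetadataSummary class and its summarize method are likewise invented for this example.

```java
import java.util.Date;

// Hypothetical stand-in for the discovery-metadata methods of TypedDataset,
// declared here only so the sketch compiles without the library.
interface DiscoveryMetadata {
  String getTitle();
  String getLocationURI();
  Date getStartDate();
  Date getEndDate();
}

class MetadataSummary {
  // Builds a one-line summary; missing values are reported as "unknown".
  static String summarize(DiscoveryMetadata ds) {
    String title = ds.getTitle() != null ? ds.getTitle() : "unknown";
    String location = ds.getLocationURI() != null ? ds.getLocationURI() : "unknown";
    Date start = ds.getStartDate();
    Date end = ds.getEndDate();
    // The time span is reported in whole seconds between start and end.
    String span = (start == null || end == null)
        ? "unknown"
        : ((end.getTime() - start.getTime()) / 1000) + " s";
    return title + " [" + location + "] spans " + span;
  }
}
```

With a real TypedDataset the same calls could be made directly on the dataset object; the stand-in exists only to keep the example self-contained.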

In general, a separate TypedDataset class must be written for each convention used to encode a dataset of a given type. This is clearly burdensome, so data providers are encouraged to use existing conventions when writing their datasets. We are also developing a new convention for observation data that would be appropriate for point, station, and trajectory datasets. Radar data convention (???).

A number of existing conventions are supported for each of the subtypes mentioned above (point, station, trajectory, grid, and radial).
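
Because each convention needs its own implementation, something has to decide which implementation applies to a given file. One possible arrangement is sketched below: every implementation exposes a test that inspects the global attributes, and a registry returns the first one that accepts the file. ConventionHandler, ConventionRegistry, accepts, and byConventionsValue are hypothetical names, not part of the ucar.nc2.dt API, and the global attributes are modeled as a plain map rather than as Attribute objects.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical interface: one handler per supported convention.
interface ConventionHandler {
  boolean accepts(Map<String, String> globalAttributes);
  String name();
}

class ConventionRegistry {
  private final List<ConventionHandler> handlers;

  ConventionRegistry(List<ConventionHandler> handlers) {
    this.handlers = handlers;
  }

  // Returns the first handler that recognizes the file, or empty if none does.
  Optional<ConventionHandler> find(Map<String, String> globalAttributes) {
    for (ConventionHandler handler : handlers) {
      if (handler.accepts(globalAttributes)) return Optional.of(handler);
    }
    return Optional.empty();
  }

  // Convenience factory: a handler that matches on the "Conventions" attribute value.
  static ConventionHandler byConventionsValue(String handlerName, String expected) {
    return new ConventionHandler() {
      public boolean accepts(Map<String, String> atts) {
        return expected.equals(atts.get("Conventions"));
      }
      public String name() {
        return handlerName;
      }
    };
  }
}
```

Passing a TreeMap built with String.CASE_INSENSITIVE_ORDER would mimic the case-insensitive lookup of findGlobalAttributeIgnoreCase. Handlers are tried in list order, so more specific conventions should come first.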

Details on TrajectoryObsDataset

The TrajectoryObsDataset interface provides access to a collection of trajectories. Each trajectory in the collection is then accessed through an instance of the TrajectoryObsDatatype interface. As with TypedDataset, some basic dataset discovery metadata is available for each trajectory:

  public String getId();
  public String getDescription();
  public Date getStartDate();
  public Date getEndDate();
  public ucar.unidata.geoloc.LatLonRect getBoundingBox();

as is the size of the trajectory and a description of the variables available:

  public int getNumberPoints();

  public List getDataVariables();
  public VariableSimpleIF getDataVariable( String name );

Several ways of accessing the data are also provided. These include access to each individual point in the trajectory:

  public PointObsDatatype getPointObsData(int point) throws IOException;

a method for iterating through each point in the trajectory:

  public DataIterator getDataIterator( int bufferSize ) throws IOException;

and several methods for accessing a range of data points from a trajectory:

  public ucar.ma2.Range getFullRange();
  public ucar.ma2.Range getPointRange( int point ) throws InvalidRangeException;
  public ucar.ma2.Range getRange( int start, int end, int stride ) throws InvalidRangeException;

  public ucar.ma2.Array getTime( ucar.ma2.Range range ) throws IOException, InvalidRangeException;
  public ucar.ma2.Array getLatitude( ucar.ma2.Range range ) throws IOException, InvalidRangeException;
  public ucar.ma2.Array getLongitude( ucar.ma2.Range range ) throws IOException, InvalidRangeException;
  public ucar.ma2.Array getElevation( ucar.ma2.Range range ) throws IOException, InvalidRangeException;
  public ucar.ma2.Array getData( ucar.ma2.Range range, String parameterName ) throws IOException, InvalidRangeException;
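
To make the start/end/stride arithmetic behind these range methods concrete, here is a minimal sketch that selects points from an in-memory array. IndexRange is a hypothetical stand-in for ucar.ma2.Range, and it assumes that both the start and end indices are inclusive; the Range Javadoc should be consulted to confirm the behavior of the real class. Invalid ranges are reported with IllegalArgumentException here, in place of InvalidRangeException.

```java
// Hypothetical stand-in for ucar.ma2.Range; both ends are assumed inclusive.
final class IndexRange {
  final int first;
  final int last;
  final int stride;

  IndexRange(int first, int last, int stride) {
    if (first < 0 || last < first || stride < 1) {
      throw new IllegalArgumentException(
          "invalid range " + first + ":" + last + ":" + stride);
    }
    this.first = first;
    this.last = last;
    this.stride = stride;
  }

  // Number of points selected: first, first+stride, ... up to and including last.
  int length() {
    return (last - first) / stride + 1;
  }

  // Copies the selected points out of a full-length coordinate or data array.
  double[] subset(double[] values) {
    if (last >= values.length) {
      throw new IllegalArgumentException("range exceeds " + values.length + " points");
    }
    double[] out = new double[length()];
    for (int i = 0; i < out.length; i++) {
      out[i] = values[first + i * stride];
    }
    return out;
  }
}
```

Under these assumptions, new IndexRange(1, 8, 3) selects the points at indices 1, 4, and 7, and a single-point range such as the one returned by getPointRange corresponds to first equal to last.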

Implementation Issues

  1. Identifying conventions: There is currently no broadly recognized convention for trajectory data. As a result, various conventions are in use, some well defined and others not, and determining which convention a particular dataset follows is not a trivial matter.
  2. Determining time, latitude, longitude, elevation coordinates: How time and location are encoded in a particular dataset depends on the convention followed by that dataset. A convention does not always define clearly which variables hold the time and location coordinates, so many of the current implementations rely on heuristics to identify them.
  3. Implementing the various data access methods: The ease with which the various data access methods can be implemented depends on the structure of the dataset. Access patterns differ greatly between point-by-point access and access over a range. (??? For instance, a netCDF file with time as the unlimited dimension will ... ???)
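
To give a feel for the kind of heuristic mentioned in item 2, the sketch below guesses the latitude and longitude variables, first by their units and then by their names. It is an illustration only and does not reproduce the heuristics used by the existing implementations. The CoordinateGuess class and its methods are hypothetical, variables are modeled as a map from variable name to units string, and the unit strings are the ones the CF conventions use for latitude and longitude.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class CoordinateGuess {
  // Returns the first variable (in the map's iteration order) whose units match,
  // otherwise the first whose name matches, otherwise empty.
  static Optional<String> find(Map<String, String> unitsByVariable,
                               Set<String> unitHints, Set<String> nameHints) {
    for (Map.Entry<String, String> e : unitsByVariable.entrySet()) {
      String units = e.getValue();
      if (units != null && unitHints.contains(units.toLowerCase(Locale.ROOT))) {
        return Optional.of(e.getKey());
      }
    }
    for (String name : unitsByVariable.keySet()) {
      if (nameHints.contains(name.toLowerCase(Locale.ROOT))) {
        return Optional.of(name);
      }
    }
    return Optional.empty();
  }

  static Optional<String> findLatitude(Map<String, String> unitsByVariable) {
    return find(unitsByVariable,
        Set.of("degrees_north", "degree_north"), Set.of("lat", "latitude"));
  }

  static Optional<String> findLongitude(Map<String, String> unitsByVariable) {
    return find(unitsByVariable,
        Set.of("degrees_east", "degree_east"), Set.of("lon", "longitude"));
  }
}
```

Units are checked before names because a units match is the stronger signal; a name such as LAT is only a hint. A heuristic like this can still pick the wrong variable, which is one reason a well-defined convention is preferable.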

Implementation Details

Two abstract classes do most of the work in the current trajectory implementations, especially for the data access methods: ucar.nc2.dt.trajectory.SingleTrajectoryObsDataset and ucar.nc2.dt.trajectory.MultiTrajectoryObsDataset. Both extend ucar.nc2.dt.TypedDatasetImpl and implement ucar.nc2.dt.TrajectoryObsDataset.
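
The division of labor between an abstract base class and its concrete subclasses can be pictured with the simplified sketch below. It is not the actual code of SingleTrajectoryObsDataset: SingleTrajectorySketch, InMemoryTrajectory, and readVariable are hypothetical names, data is held in plain double arrays, and ranges are passed as inclusive start/end/stride integers. The base class owns the range-based access logic, while the subclass only names the coordinate variables and says how to read a variable.

```java
import java.util.Map;

// Base class: implements range access once, for every convention.
abstract class SingleTrajectorySketch {
  private final String latVarName;
  private final String lonVarName;

  protected SingleTrajectorySketch(String latVarName, String lonVarName) {
    this.latVarName = latVarName;
    this.lonVarName = lonVarName;
  }

  // Hook for subclasses: read all values of the named variable.
  protected abstract double[] readVariable(String varName);

  double[] getLatitude(int start, int end, int stride) {
    return slice(readVariable(latVarName), start, end, stride);
  }

  double[] getLongitude(int start, int end, int stride) {
    return slice(readVariable(lonVarName), start, end, stride);
  }

  private static double[] slice(double[] values, int start, int end, int stride) {
    if (start < 0 || end < start || end >= values.length || stride < 1) {
      throw new IllegalArgumentException("invalid range");
    }
    double[] out = new double[(end - start) / stride + 1];
    for (int i = 0; i < out.length; i++) {
      out[i] = values[start + i * stride];
    }
    return out;
  }
}

// Concrete subclass: supplies the convention-specific variable names and storage.
class InMemoryTrajectory extends SingleTrajectorySketch {
  private final Map<String, double[]> variables;

  InMemoryTrajectory(Map<String, double[]> variables) {
    super("LAT", "LON");
    this.variables = variables;
  }

  @Override
  protected double[] readVariable(String varName) {
    double[] values = variables.get(varName);
    if (values == null) throw new IllegalArgumentException("no variable " + varName);
    return values;
  }
}
```

Supporting another convention in this arrangement would mean writing another small subclass with different variable names, without touching the access logic.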

The main jobs of the concrete subclasses are to check whether the dataset follows an appropriate convention and then to use that convention to determine (or calculate) the coordinate variables. For instance, ucar.nc2.dt.trajectory.RafTrajectoryObsDataset checks the "Conventions" global attribute for an appropriate value:

    Attribute conventionsAtt = ncf.findGlobalAttributeIgnoreCase( "Conventions" );
    // Reject a missing attribute as well as a mismatched value (avoids a NullPointerException).
    if ( conventionsAtt == null || ! "NCAR-RAF/nimbus".equals( conventionsAtt.getStringValue() ) )
      throw new IllegalArgumentException( "File <" + ncf.getId() + "> not a \"NCAR-RAF/nimbus\" convention file." );

It then determines which variables are the time, latitude, longitude, and elevation coordinate variables:

    Attribute versionAtt = ncf.findGlobalAttributeIgnoreCase( "Version" );
    // The null check guards against files that have no "Version" attribute.
    if ( versionAtt != null && "1.2".equals( versionAtt.getStringValue() ) )
    {
      timeDimName = "Time";
      timeVarName = "time_offset";

      latVarName = "LAT";
      lonVarName = "LON";
      elevVarName = "ALT";
      ...
    }
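
One way to keep such version-dependent names together is to bundle them into a small value object chosen by version, as in the sketch below. CoordinateNames and forVersion are hypothetical names rather than part of RafTrajectoryObsDataset. Only version "1.2" is filled in, using the five names shown above; anything else the elided code sets is not reproduced, and other versions are simply rejected.

```java
// Hypothetical value object holding the coordinate names for one file version.
final class CoordinateNames {
  final String timeDimName;
  final String timeVarName;
  final String latVarName;
  final String lonVarName;
  final String elevVarName;

  CoordinateNames(String timeDimName, String timeVarName,
                  String latVarName, String lonVarName, String elevVarName) {
    this.timeDimName = timeDimName;
    this.timeVarName = timeVarName;
    this.latVarName = latVarName;
    this.lonVarName = lonVarName;
    this.elevVarName = elevVarName;
  }

  // Only version "1.2" is known to this sketch; a null or unknown version is rejected.
  static CoordinateNames forVersion(String version) {
    if ("1.2".equals(version)) {
      return new CoordinateNames("Time", "time_offset", "LAT", "LON", "ALT");
    }
    throw new IllegalArgumentException("Unsupported Version attribute: " + version);
  }
}
```

Failing loudly on an unknown version makes it obvious when a file falls outside what the implementation understands, instead of leaving the coordinate names unset.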