Re: Meeting about improving the GRD API.

Hello all, comments are in-line:

Ian Barrodale wrote:
Hi Ted, John, Russ, and John:

Thank you all for taking the time yesterday to both listen to our story and to further enlighten us about your work. It was much appreciated.

The note below provides a possible implementation route, and some questions. Please feel free to point out any shortcomings in our proposed approach, and please provide any answers that come to mind regarding our questions.

Thanks again,
Ian
======================
Goal
-------

Based on feedback from BCS Grid DataBlade customers and, in particular, Ted Habermann, we feel that there may be some value in providing alternate ways of accessing data from a Grid DataBlade (GRD)-powered database through existing widely-used protocols and methods. Note that by "accessing", we really mean just the reading part, as we already provide, through the BCS Gridded Data Loader client, a means of conveniently ingesting data from many forms into a GRD-powered database. One method of accessing the data would be to cast it in the form of the Common Data Model (CDM) supported by the Java netCDF API from UCAR. The advantages of this are that:

    * users would be able to write software using the Java netCDF API
      (which is fairly straightforward to use and well documented) for
      accessing GRD data, and
    * data providers can use a GRD-powered database and provide access
      to it through OPeNDAP, WCS, netCDF files, etc. using the Java
      netCDF API (see page 53 attachment, modified from the slide on
      page 53 of
http://www.unidata.ucar.edu/staff/caron/presentations/CDM.ppt).
Our understanding of a possible implementation
---------------------------------------------------------------------

To handle GRD data from the Java netCDF API, we would have to:

(i) Create a GRD I/O service provider for the Java netCDF API (see page 38 attachment) that can communicate with the GRD database using a combination of JDBC and the existing Java GRD API. The Java netCDF API uses a service provider architecture to handle reading multiple different file formats and casting them in the form of the CDM.

(ii) Create a GRD content manager to handle the georeferencing information in the GRD.

One possible method for allowing users to access GRD data without a full THREDDS catalog is to supply some type of unique URL to the database:

  grd://user:pass@server/database

and the service provider would construct a CDM instance that contains a main group of all the grids in the database and allow the user to access those grids through the API.
For example:

  grd://peter:test123@xxxxxxxxxxxxxxxxxx/coastwatch

might be a reference to a GRD database running at Barrodale that contains gridded NOAA CoastWatch satellite-derived data for some number of geographic areas and time periods. The resulting netCDF dataset would be one that contains a list of grids under a root group like a directory structure:

  /
  /sst/
  /sst/northeast/
  /sst/northeast/jan01_2007    <---- a grid
  /sst/northeast/jan02_2007    <---- another grid
  ...
  /chlorophyll/northeast/jan01_2007   <---- a third grid
  /chlorophyll/northeast/jan02_2007   <---- and so on
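
A locator of this form can be parsed with standard Java URI handling before being handed to the service provider. A minimal sketch, assuming the pieces are user, password, host, and database name (the GrdLocator class and the example host name are hypothetical; the real server name is elided above):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class GrdLocator {
    // java.net.URI parses custom schemes like grd:// without complaint,
    // so the service provider can pull the pieces apart directly.
    public static String[] parse(String location) {
        try {
            URI uri = new URI(location);
            if (!"grd".equals(uri.getScheme()))
                throw new IllegalArgumentException("not a grd:// locator: " + location);
            String[] userPass = uri.getUserInfo().split(":", 2); // "user:pass"
            String database = uri.getPath().substring(1);        // strip leading "/"
            return new String[] { userPass[0], userPass[1], uri.getHost(), database };
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical host name, standing in for the elided server above.
        String[] parts = parse("grd://peter:test123@grd.example.com/coastwatch");
        System.out.println(String.join(" | ", parts));
        // prints: peter | test123 | grd.example.com | coastwatch
    }
}
```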

Whether users would require a more sophisticated catalog with querying ability, such as THREDDS could supply, depends on the desired complexity of the grids in the database.

See the last answer below.

BTW, the TDS will soon have the ability to do proper HTTP-based authentication, 
and we are hoping to make that a standard in OPeNDAP clients, which can act 
like browsers and pop up a username/password dialog window, instead of 
embedding the user:pass@ in the URL.


Questions
---------------

We have the following questions:

1) Where in the netCDF API would the content manager that handles GRD georeferencing information sit?

2) How does the I/O SP architecture determine the I/O SP for a given file:// style URL? How would it know to handle a grd:// URL differently?

Very perceptive questions; let me answer these two together:

The IOSP architecture is, in fact, file based (on RandomAccessFile). Since you 
will be URL based, we have to fit you in at a higher level, namely 
NetcdfDataset.openFile(). If you look there you will see that we look for 
opendap (http: or dods:) and thredds: URLs. It might make sense to generalize 
this to allow plugging in external handlers for your protocol, similar to how 
java.net.ContentHandler works. Otherwise we might put your code in the core, 
which is also a possibility.

Anyway, NetcdfDataset.openFile() would detect your URL scheme and call 
NetcdfFile with your IOSP. We will have to add a new constructor for that. (You 
could alternatively just subclass NetcdfFile, which is what DODSNetcdfFile does).
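
The scheme detection described above can be sketched in plain Java. This is a simplified, self-contained illustration of the idea, not the actual NetcdfDataset code; the Opener interface and handler registry are invented for this sketch:

```java
import java.util.HashMap;
import java.util.Map;

public class SchemeDispatch {
    // Hypothetical stand-in for an IOSP or opener tied to a URL scheme.
    public interface Opener { String open(String location); }

    private static final Map<String, Opener> handlers = new HashMap<>();

    public static void register(String scheme, Opener o) { handlers.put(scheme, o); }

    // Mimics NetcdfDataset.openFile() looking for known prefixes
    // (http:, dods:, thredds:, and here grd:) before falling back
    // to the RandomAccessFile-based IOSP machinery.
    public static String openFile(String location) {
        for (Map.Entry<String, Opener> e : handlers.entrySet())
            if (location.startsWith(e.getKey() + ":"))
                return e.getValue().open(location);
        return "default RandomAccessFile path for " + location;
    }

    public static void main(String[] args) {
        register("grd", loc -> "GRD IOSP handles " + loc);
        System.out.println(openFile("grd://peter:test123@server/coastwatch"));
        // prints: GRD IOSP handles grd://peter:test123@server/coastwatch
    }
}
```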

As for the "content manager that handles GRD georeferencing information": it could be a 
CoordSysBuilder subclass. However, this is actually unnecessary if you use an existing Convention, and we 
would highly recommend using the CF Convention for gridded data. Since you are creating the "file", 
you can add the attributes and variables needed by that Convention. This makes your data "CF 
compliant" automatically, which is a real win.


3) Have we interpreted the slide on page 53 correctly -- is there a server that can serve out data using the CDM (via the Java netCDF API) as an intermediate step?

Yes, the THREDDS Data Server.


4) Does a group structure to represent GRD contents map to an OPeNDAP connection, WCS, or netCDF file or do those types of data representations only have netCDF variables and no groups?

In principle you could use Groups, but they really won't be fully supported 
until we get the netCDF-4 file format finished and tested. I would advise 
starting with the simpler case of no groups.
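
Following that advice, the group-style paths in the listing above could simply be flattened into plain variable names (or used to name separate datasets). A trivial sketch; the underscore-joining convention is our own invention, not an existing standard:

```java
public class FlattenPath {
    // Turn a group-style path like /sst/northeast/jan01_2007 into a flat
    // variable name usable in a group-less (classic netCDF) dataset.
    public static String flatten(String path) {
        return path.replaceAll("^/", "").replace('/', '_');
    }

    public static void main(String[] args) {
        System.out.println(flatten("/sst/northeast/jan01_2007"));
        // prints: sst_northeast_jan01_2007
    }
}
```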


5) Our understanding of the netCDF Java library is that it has, in particular, the following two entry points:

    * NetcdfFile : this is the bare netCDF access to files of various
      types. It doesn't understand anything about coordinate systems.
      You can add an I/O service provider to handle your favorite file
      format via a class method. The variables it returns are instances
      of Variable (which of course don't know anything about coordinate
      systems).
    * NetcdfDataset : this is a layer built above the NetcdfFile layer
      and is the usual interface for applications (e.g., a WCS). It
      handles converting various attributes into a coordinate system. It
      has a number of methods relating to adding or getting coordinate
      systems. These methods seem to be applied to the entire file,
rather than to individual variables (or groups).

Coordinate systems are really variable-specific. However, the common case is 
that each dataset has a single coordinate system (or a set of closely related 
ones).



    CoordinateSystem findCoordinateSystem(java.lang.String name)
        // Retrieve the CoordinateSystem with the specified name.
<http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/NetcdfDataset.html#findCoordinateSystem%28java.lang.String%29>

    java.util.List getCoordinateAxes()
        // Get the list of all CoordinateAxis objects used by this dataset.
<http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/NetcdfDataset.html#getCoordinateAxes%28%29>

    java.util.List getCoordinateTransforms()
        // Get the list of all CoordinateTransform objects used by this dataset.
<http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/NetcdfDataset.html#getCoordinateTransforms%28%29>

    boolean getCoordSysWereAdded()
        // Has Coordinate System metadata been added?
<http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/NetcdfDataset.html#getCoordSysWereAdded%28%29>

The NetcdfDataset object contains instances of VariableDS. They are like a wrapper for the Variable objects found in the NetcdfFile object. There is a method to ask a VariableDS for the list of coordinate systems associated with it.

Exactly.


If we interpret things correctly, when a NetcdfDataset object is built from a NetcdfFile object, the NetcdfDataset object is responsible for figuring out the coordinate system information from attributes in the NetcdfFile, and composing a VariableDS from the coordinate system information and each Variable. In theory, by implementing our own CoordSysBuilder class and registering it, we should be able to add coordinate system information to each VariableDS individually.

Yes, or as I mentioned, use an existing Convention and CoordSysBuilder.


A question then is: do applications like the web coverage server and OPeNDAP server get their coordinate information from VariableDS objects or from the NetcdfDataset object?


OPeNDAP is (more or less) at the same level as NetcdfFile, and so just 
faithfully transmits Variables, Attributes, and Dimensions across the wire. The 
coordinate systems are then added by clients (like the CDM) that understand the 
convention. We are expecting that DAP4, the future OPeNDAP protocol, will add 
Groups.

WCS, OTOH, works at the coordinate system level, and so uses the GridDatatype, which is specialized for "coverage" data, and gets its coordinate systems from NetcdfDataset. The client makes requests in coordinate space, and we know how to translate that into index space. Currently we can send back either GeoTIFF or netCDF/CF files. There are some limitations: the grid spacing must be uniform in WCS 1.0. We expect to move to WCS 1.1 later this year, which removes that limitation. We haven't implemented reprojection/resampling, and I'm not sure that we will.

If it is from the NetcdfDataset object, then the strategy of grouping all the grids in a database into a single NetcdfDataset, as outlined above, won't work, and we'd be obliged to use a THREDDS server. Is this correct?

It would likely be a mistake to put a lot of disparate data into the same 
NetcdfDataset. Better to find the right granularity, which is typically 
homogeneous data that shares the same discovery metadata. So I would not use 
the Group mechanism to break the data into granules; better to make separate 
datasets. It's possible that such an idiom will develop with netCDF-4, but 
better to get something working that stays within existing practice, and then 
decide if you want to forge ahead. Let me emphasize that it's really important 
to find the right dataset granularity.

This means you want to use THREDDS catalogs to publish the dataset URLs and 
associated metadata, and possibly use the TDS to serve your data. Once you have 
an IOSP or equivalent for your data, the main work is to develop the catalogs. 
These can be pretty minimal, but automatically populating catalogs with 
high-quality metadata is a huge win in the long run.

I think that would be a powerful value-added product, but of course I don't know 
what your customers really want. As Ted mentioned, it's a good time to help 
influence TDS strategy, and it appears to me that your small company with 
extensive scientific experience would be a good fit with Unidata.

John