
Re: Meeting about improving the GRD API.



Thanks very much for this detailed response John.

Currently at BCS we're having a very interesting discussion about the way forward, and I've placed a call to Ted (no return call yet) to get his opinion on a satellite application that would become feasible by combining the strengths of the various technologies involved.  More later today....

Regards,
Ian
===================
 
At 08:13 AM 2/9/2007, John Caron wrote:
Hello all, comments are in-line:

Ian Barrodale wrote:
Hi Ted, John, Russ, and John:
Thank you all for taking the time yesterday to both listen to our story and to further enlighten us about your work.  It was much appreciated.
The note below provides a possible implementation route, and some questions.  Please feel free to point out any shortcomings in our proposed approach, and please provide any answers that come to mind regarding our questions.
Thanks again,
Ian
=======================
 
Goal
-------
Based on feedback from BCS Grid DataBlade customers and, in particular, Ted Habermann, we feel that there may be some value in providing alternate ways of accessing data from a Grid DataBlade (GRD)-powered database through existing widely used protocols and methods. Note that by "accessing" we really mean just the reading part, as we already provide, through the BCS Gridded Data Loader client, a means of conveniently ingesting data in many forms into a GRD-powered database. One method of accessing the data would be to cast it in the form of the Common Data Model (CDM) supported by the Java netCDF API from UCAR. The advantages of this are that:
    * users would be able to write software using the Java netCDF API
      (which is fairly straightforward to use and well documented) for
      accessing GRD data, and
    * data providers can use a GRD-powered database and provide access
      to it through OPeNDAP, WCS, netCDF files, etc. using the Java
      netCDF API (see page 53 attachment, modified from the slide on
      page 53 of
      http://www.unidata.ucar.edu/staff/caron/presentations/CDM.ppt).
Our understanding of a possible implementation
---------------------------------------------------------------------
To handle GRD data from the Java netCDF API, we would have to:
(i) Create a GRD I/O service provider for the Java netCDF API (see page 38 attachment) that can communicate with the GRD database using  a combination of JDBC and the existing Java GRD API.  The Java netCDF API uses a service provider architecture to handle reading multiple different file formats and casting them in the form of the CDM.
(ii) Create a GRD content manager to handle the georeferencing 
information in the GRD.
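As a rough illustration of step (i), the service-provider pattern can be sketched in plain Java. These are simplified stand-ins of our own devising, not the actual ucar.nc2 interfaces, and the JDBC/GRD-API layer is faked with an in-memory map:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for an I/O service provider: the real netCDF-Java
// architecture hands a provider a location and asks it to populate a
// NetcdfFile object; here we just fetch named grids from a fake store.
interface GridIosp {
    boolean isValidLocation(String location);            // can this provider handle it?
    double[] readGrid(String location, String gridPath); // fetch one grid's values
}

// A real GRD provider would issue JDBC and GRD-API calls here; this stub
// simulates the database with an in-memory map keyed by grid path.
class GrdIospStub implements GridIosp {
    private final Map<String, double[]> fakeDb = new HashMap<>();

    GrdIospStub() {
        fakeDb.put("/sst/northeast/jan01_2007", new double[] {4.1, 4.3, 4.2});
    }

    public boolean isValidLocation(String location) {
        return location.startsWith("grd://");
    }

    public double[] readGrid(String location, String gridPath) {
        return fakeDb.get(gridPath);
    }
}
```

The point of the pattern is that the rest of the library talks only to the interface, so a database-backed provider can slot in beside file-based ones.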
One possible method for allowing users to access GRD data without a 
full THREDDS catalog is to supply some type of unique URL to the database:
  grd://user:pass@server/database
and the service provider would construct a CDM instance that contains a main group of all the grids in the database and allow the user to access those grids through the API.
For example:
  grd://peter:address@hidden/coastwatch
might be a reference to a GRD database running at Barrodale that contains gridded NOAA CoastWatch satellite-derived data for some number of geographic areas and time periods.  The resulting netCDF dataset would be one that contains a list of grids under a root group like a directory structure:
  /
  /sst/
  /sst/northeast/
  /sst/northeast/jan01_2007    <---- a grid
  /sst/northeast/jan02_2007    <---- another grid
  ...
  /chlorophyll/northeast/jan01_2007   <---- a third grid
  /chlorophyll/northeast/jan02_2007   <---- and so on
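The grd:// form proposed above could be pulled apart with a few lines of Java. This is only a sketch of the proposed scheme; the class and field names are ours, not part of any existing API:

```java
// Minimal parser for the proposed (hypothetical) grd:// URL scheme:
//   grd://user:pass@server/database
class GrdUrl {
    final String user, pass, server, database;

    private GrdUrl(String user, String pass, String server, String database) {
        this.user = user; this.pass = pass;
        this.server = server; this.database = database;
    }

    static GrdUrl parse(String url) {
        String prefix = "grd://";
        if (!url.startsWith(prefix))
            throw new IllegalArgumentException("not a grd:// URL: " + url);
        String rest = url.substring(prefix.length());
        int colon = rest.indexOf(':');        // separates user from password
        int at    = rest.indexOf('@');        // separates credentials from host
        int slash = rest.indexOf('/', at);    // separates host from database
        if (at < 0 || colon < 0 || colon > at || slash < 0)
            throw new IllegalArgumentException("malformed grd:// URL: " + url);
        return new GrdUrl(rest.substring(0, colon),
                          rest.substring(colon + 1, at),
                          rest.substring(at + 1, slash),
                          rest.substring(slash + 1));
    }
}
```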
Whether the user would require a more sophisticated catalog with querying ability, such as THREDDS could supply, depends on the desired complexity of the grids in the database.

See the last answer below.

BTW, the TDS will soon have the ability to do proper HTTP-based authentication, and we are hoping to make that a standard in OPeNDAP clients, which can act like browsers and pop up a username/password dialog window, instead of embedding the user:pass@ in the URL.

Questions
---------------
We have the following questions:
1) Where in the netCDF API would the content manager that handles GRD georeferencing information sit?
2) How does the I/O SP architecture determine the I/O SP for a given file:// style URL? How would it know to handle a grd:// URL differently?

Very perceptive questions; let me start here to address both of them:

The IOSP architecture is, in fact, file based (it works through RandomAccessFile). Since you will be URL based, we have to fit you in at a higher level, namely NetcdfDataset.openFile(). If you look there you will see that we look for opendap (http: or dods:) and thredds: URLs. It might make sense to generalize this to allow plugging in external handlers for your protocol, similar to how java.net.ContentHandler works. Otherwise we might put your code in the core, which is also a possibility.

Anyway, NetcdfDataset.openFile() would detect your URL scheme and call NetcdfFile with your IOSP. We will have to add a new constructor for that. (You could alternately just subclass NetcdfFile, which is what DODSNetcdfFile does).
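The dispatch described here might look roughly like the following. This is a simplified sketch of the idea, not the actual NetcdfDataset code, and the returned labels are illustrative only:

```java
// Simplified sketch of URL-scheme dispatch: the real NetcdfDataset.openFile()
// checks for known schemes and hands everything else to the file-based path.
class SchemeDispatch {
    static String chooseHandler(String location) {
        if (location.startsWith("http:") || location.startsWith("dods:"))
            return "opendap";
        if (location.startsWith("thredds:"))
            return "thredds-catalog";
        if (location.startsWith("grd:"))
            return "grd-iosp";          // the proposed new handler
        return "random-access-file";    // default: file-based IOSP scan
    }
}
```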

As for the "content manager that handles GRD georeferencing information": it could be a CoordSysBuilder subclass. However, this is actually unnecessary if you use an existing Convention, and we would highly recommend using the CF Convention for gridded data. Since you are creating the "file", you can add the attributes and variables needed by that Convention. This makes your data "CF compliant" automatically, which is a real win.
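For example, a single CF-compliant SST grid, written out as CDL, might look like this (dimension sizes, names, and attribute values are illustrative only):

```cdl
netcdf sst_northeast_jan01_2007 {
dimensions:
    lat = 180 ;
    lon = 240 ;
variables:
    float lat(lat) ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
    float lon(lon) ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
    float sst(lat, lon) ;
        sst:units = "Celsius" ;
        sst:standard_name = "sea_surface_temperature" ;

// global attributes:
        :Conventions = "CF-1.0" ;
}
```

Because the coordinate variables share the names of the dimensions they describe, CF-aware software can find the coordinate systems without any extra builder code.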

3) Have we interpreted the slide on page 53 correctly -- is there a server that can serve out data using the CDM (via the Java netCDF API) as an intermediate step?

Yes, the THREDDS Data Server.

4) Does a group structure to represent GRD contents map to an OPeNDAP connection, WCS, or netCDF file or do those types of data representations only have netCDF variables and no groups?

In principle you could use Groups, but they really won't be fully supported until we get the netCDF-4 file format finished and tested. I would advise starting with the simpler case of no groups.

5) Our understanding of the netCDF Java library is that it has, in particular, the following two entry points:
    * NetcdfFile : this is the bare netCDF access to files of various
      types. It doesn't understand anything about coordinate systems.
      You can add an I/O service provider to handle your favorite file
      format via a class method. The variables it returns are instances
      of Variable (which of course don't know anything about coordinate
      systems).
    * NetcdfDataset : this is a layer built above the NetcdfFile layer
      and is the usual interface for applications (e.g., a WCS). It
      handles converting various attributes into a coordinate system. It
      has a number of methods relating to adding or getting coordinate
      systems. These methods seem to be applied to the entire file,
      rather than to individual variables (or groups).

Coordinate systems are really variable-specific. However, the common case is that each dataset has a single coordinate system (or a set of closely related ones).


    Among its methods (from the javadoc at
    http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/NetcdfDataset.html):

      CoordinateSystem findCoordinateSystem(java.lang.String name)
          // Retrieve the CoordinateSystem with the specified name.
      java.util.List getCoordinateAxes()
          // Get the list of all CoordinateAxis objects used by this dataset.
      java.util.List getCoordinateTransforms()
          // Get the list of all CoordinateTransform objects used by this dataset.
      boolean getCoordSysWereAdded()
          // Has Coordinate System metadata been added?
The NetcdfDataset object contains instances of VariableDS. They are like a wrapper for the Variable objects found in the NetcdfFile object. There is a method to ask a VariableDS for the list of coordinate systems associated with it.

Exactly.

If we interpret things correctly, when a NetcdfDataset object is built from a NetcdfFile object, the NetcdfDataset object is responsible for figuring out the coordinate system information from attributes in the NetcdfFile, and for composing a VariableDS from each Variable and the coordinate system information. In theory, by implementing our own CoordSysBuilder class and registering it, we should be able to add coordinate system information to each VariableDS individually.

Yes, or as I mentioned, use an existing Convention and CoordSysBuilder.
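The wrapping relationship described above can be sketched in plain Java. These are simplified stand-ins for ucar.nc2's Variable and VariableDS, not the real classes:

```java
import java.util.ArrayList;
import java.util.List;

// Bare variable, as NetcdfFile would return it: no coordinate knowledge.
class PlainVariable {
    final String name;
    PlainVariable(String name) { this.name = name; }
}

// Dataset-level wrapper, as NetcdfDataset would return it: the same
// variable plus the coordinate systems a CoordSysBuilder attached to it.
class WrappedVariable {
    final PlainVariable inner;
    final List<String> coordinateSystems = new ArrayList<>();
    WrappedVariable(PlainVariable inner) { this.inner = inner; }
}
```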


A question, then, is: do applications like the web coverage server and OPeNDAP server get their coordinate information from VariableDS objects or from the NetcdfDataset object?


OPeNDAP is (more or less) at the same level as NetcdfFile, and so just faithfully transmits Variables, Attributes, and Dimensions across the wire. The coordinate systems are then added by clients (like the CDM) that understand the convention. We are expecting that DAP4, the future OPeNDAP protocol, will add Groups.

WCS, OTOH, works at the coordinate system level, and so uses the GridDatatype, which is specialized for "coverage" data, and gets its coordinate systems from NetcdfDataset. The client makes requests in coordinate space, and we know how to translate that into index space. Currently we can send back either GeoTIFF or netCDF/CF files. There are some limitations: the grid spacing must be uniform in WCS 1.0. We expect to move to WCS 1.1 later this year, which removes that limitation. We haven't implemented reprojection/resampling, and I'm not sure that we will.

If it is from the NetcdfDataset object, then the strategy of grouping all the grids in a database into a single NetcdfDataset, as outlined above, won't work, and we'd be obliged to use a THREDDS server. Is this correct?

It would likely be a mistake to put a lot of disparate data into the same NetcdfDataset. Better to find the right granularity, which is typically homogeneous data that shares the same discovery metadata. So I would not use the Group mechanism to break the data into granules; better to make separate datasets. It's possible that such an idiom will develop with netCDF-4, but better to get something working that stays within existing practice, and then decide if you want to forge ahead. Let me emphasize that it's really important to find the right dataset granularity.

This means you want to use THREDDS catalogs to publish the dataset URLs and associated metadata, and possibly use TDS to serve your data. Once you have an IOSP or equivalent for your data, the main work is to develop the catalogs. These can be pretty minimal, but automatically populating catalogs with high-quality metadata is a huge win in the long run.
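A minimal catalog along those lines might look like the following. The dataset names, IDs, paths, and service base are placeholders, and the exact element layout should be checked against the InvCatalog specification:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog name="BCS GRD grids"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <service name="odap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="SST Northeast, 2007-01-01"
           ID="sst/northeast/jan01_2007"
           urlPath="sst/northeast/jan01_2007">
    <metadata inherited="true">
      <serviceName>odap</serviceName>
    </metadata>
  </dataset>
</catalog>
```

Each grid in the database would map to one such dataset element, so the catalog could be generated automatically from the database's table of grids.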

I think that would be a powerful value-added product, but of course I don't know what your customers really want. As Ted mentioned, it's a good time to help influence TDS strategy, and it appears to me that your small company with extensive scientific experience would be a good fit with Unidata.

John

**********************************************
Ian Barrodale, Ph.D.
President
Barrodale Computing Services Ltd.
Tel: (250) 472-4372 Fax: (250) 472-4373
Web: http://www.barrodale.com
Email: address@hidden
**********************************************
Mailing Address:
P.O. Box 3075 STN CSC
Victoria BC Canada V8W 3W2

Shipping Address:
Hut R, McKenzie Avenue
University of Victoria
Victoria BC Canada V8W 3W2
**********************************************