Meeting about improving the GRD API.

Ian Barrodale ian at barrodale.com
Fri Feb 9 11:47:25 MST 2007


Thanks very much for this detailed response John.

Currently at BCS we're having a very interesting discussion about the 
way forward, and I've placed a call to Ted (no return call yet) to 
get his opinion on a satellite application that would become feasible 
by combining the strengths of the various technologies 
involved.  More later today....

Regards,
Ian
==================

At 08:13 AM 2/9/2007, John Caron wrote:
>Hello all, comments are in-line:
>
>Ian Barrodale wrote:
>>Hi Ted, John, Russ, and John:
>>Thank you all for taking the time yesterday to both listen to our 
>>story and to further enlighten us about your work.  It was much appreciated.
>>The note below provides a possible implementation route, and some 
>>questions.  Please feel free to point out any shortcomings in our 
>>proposed approach, and please provide any answers that come to mind 
>>regarding our questions.
>>Thanks again,
>>Ian
>>======================
>>
>>Goal
>>-------
>>Based on feedback from BCS Grid DataBlade customers and, in 
>>particular, Ted Habermann,  we feel that there may be some value in 
>>providing alternate ways of  accessing data from a Grid DataBlade 
>>(GRD) - powered database through existing widely-used protocols and 
>>methods.  Note that by "accessing", we really mean  just the 
>>reading part, as we already provide, through the BCS Gridded Data 
>>Loader client, a means of conveniently ingesting data from many 
>>forms into a GRD-powered database.  One method of accessing the 
>>data  would be to cast it in the form of the Common Data Model 
>>(CDM)  supported by the Java netCDF API from UCAR.  The advantages 
>>of this are that:
>>     * users would be able to write software using the Java netCDF API
>>       (which is fairly straightforward to use and well documented) for
>>       accessing GRD data, and
>>     * data providers can use a GRD-powered database and provide access
>>       to it through OPeNDAP, WCS, netCDF files, etc. using the Java
>>       netCDF API (see page 53 attachment, modified from the slide on
>>       page 53 of
>>       http://www.unidata.ucar.edu/staff/caron/presentations/CDM.ppt).
>>Our understanding of a possible implementation
>>---------------------------------------------------------------------
>>To handle GRD data from the Java netCDF API, we would have to:
>>(i) Create a GRD I/O service provider for the Java netCDF API (see 
>>page 38 attachment) that can communicate with the GRD database 
>>using  a combination of JDBC and the existing Java GRD API.  The 
>>Java netCDF API uses a service provider architecture to handle 
>>reading multiple different file formats and casting them in the 
>>form of the CDM.
>>(ii) Create a GRD content manager to handle the georeferencing
>>information in the GRD.
>>One possible method for allowing users to access GRD data without a
>>full THREDDS catalog is to supply some type of unique URL to the database:
>>   grd://user:pass@server/database
>>and the service provider would construct a CDM instance that 
>>contains a main group of all the grids in the database and allow 
>>the user to access those grids through the API.
>>For example:
>>   grd://peter:test123@omni.barrodale.com/coastwatch
>>might be a reference to a GRD database running at Barrodale that 
>>contains gridded NOAA CoastWatch satellite-derived data for some 
>>number of geographic areas and time periods.  The resulting netCDF 
>>dataset would be one that contains a list of grids under a root 
>>group like a directory structure:
>>   /
>>   /sst/
>>   /sst/northeast/
>>   /sst/northeast/jan01_2007    <---- a grid
>>   /sst/northeast/jan02_2007    <---- another grid
>>   ...
>>   /chlorophyll/northeast/jan01_2007   <---- a third grid
>>   /chlorophyll/northeast/jan02_2007   <---- and so on
>>Whether the user would require a more sophisticated catalog with 
>>querying ability, such as THREDDS could supply, depends on the 
>>desired complexity of the grids in the database.
>
>see the last answer below.
>
>BTW, the TDS will soon have the ability to do proper HTTP-based 
>authentication, and we are hoping to make that a standard in OPeNDAP 
>clients, which can act like browsers and pop up a username/password 
>dialog window, instead of embedding the user:pass@ in the URL.
>
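A minimal sketch of how a grd:// URL of the form proposed above could be
split into its parts, using only java.net.URI from the standard library.
The class name GrdUrl and its fields are hypothetical; they are not part
of the Java netCDF API or the Java GRD API. Credentials are optional
here, so the same parser would still work if user:pass@ is dropped from
the URL in favour of HTTP-based authentication.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of any existing API.
final class GrdUrl {
    final String user;      // null when the URL carries no credentials
    final String password;  // null when the URL carries no password
    final String host;
    final String database;

    private GrdUrl(String user, String password, String host, String database) {
        this.user = user;
        this.password = password;
        this.host = host;
        this.database = database;
    }

    // Parses grd://[user[:pass]@]server/database and rejects anything else.
    static GrdUrl parse(String text) {
        URI uri = URI.create(text);
        if (!"grd".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not a grd:// URL: " + text);
        }
        String host = uri.getHost();
        String path = uri.getPath();
        if (host == null || path == null || path.length() < 2) {
            throw new IllegalArgumentException("missing server or database: " + text);
        }
        String user = null;
        String password = null;
        String userInfo = uri.getUserInfo();
        if (userInfo != null) {
            int colon = userInfo.indexOf(':');
            user = colon < 0 ? userInfo : userInfo.substring(0, colon);
            password = colon < 0 ? null : userInfo.substring(colon + 1);
        }
        // Drop the leading '/' of the path to get the database name.
        return new GrdUrl(user, password, host, path.substring(1));
    }

    public static void main(String[] args) {
        GrdUrl u = GrdUrl.parse("grd://peter:test123@omni.barrodale.com/coastwatch");
        System.out.println(u.user + " @ " + u.host + " / " + u.database);
    }
}
```

java.net.URI is used rather than java.net.URL because URL rejects
schemes that have no registered protocol handler, whereas URI parses any
syntactically valid scheme.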
>>Questions
>>---------------
>>We have the following questions:
>>1) Where in the netCDF API would the content manager that handles 
>>GRD georeferencing information sit?
>>2) How does the I/O SP architecture determine the I/O SP for a given
>>file:// style URL?  How would it know to handle a grd:// URL
>>differently?
>
>Very perceptive question; let me start here and answer questions 1 and 2 together:
>
>The IOSP architecture is, in fact, file based (RandomAccessFile). 
>Since you will be URL based, we have to fit you in at a higher 
>level, namely NetcdfDataset.openFile(). If you look there you will 
>see that we look for opendap (http: or dods:) and thredds: URLs. It 
>might make sense to generalize this to allow plugging in external 
>handlers for your protocol, similar to how java.net.ContentHandler 
>works. Otherwise we might put your code in the core, which is also a 
>possibility.
>
>Anyway, NetcdfDataset.openFile() would detect your URL scheme and 
>call NetcdfFile with your IOSP. We will have to add a new 
>constructor for that. (You could alternatively just subclass 
>NetcdfFile, which is what DODSNetcdfFile does.)
>
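A minimal sketch of the "pluggable handler" idea described above: a
registry keyed by URL scheme that is consulted before falling back to
the file-based IOSP lookup. Everything here (the class SchemeHandlers,
its methods, and the strings it returns) is hypothetical and stands in
for whatever NetcdfDataset.openFile() would actually do; a real opener
would return a dataset object rather than a description.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry for illustration; not part of the Java netCDF API.
final class SchemeHandlers {
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    // Registers an opener for one URL scheme, e.g. "grd".
    void register(String scheme, Function<String, String> opener) {
        handlers.put(scheme.toLowerCase(Locale.ROOT), opener);
    }

    // Dispatches on the scheme; anything unregistered is treated as a file.
    String open(String location) {
        int colon = location.indexOf(':');
        String scheme = colon > 0
                ? location.substring(0, colon).toLowerCase(Locale.ROOT)
                : "";
        Function<String, String> opener = handlers.get(scheme);
        if (opener == null) {
            return "file-based IOSP lookup for " + location;
        }
        return opener.apply(location);
    }

    // Pre-registers the schemes the message says are already recognised.
    static SchemeHandlers withDefaults() {
        SchemeHandlers h = new SchemeHandlers();
        h.register("http", loc -> "OPeNDAP access to " + loc);
        h.register("dods", loc -> "OPeNDAP access to " + loc);
        h.register("thredds", loc -> "THREDDS access to " + loc);
        return h;
    }

    public static void main(String[] args) {
        SchemeHandlers h = SchemeHandlers.withDefaults();
        // An external package would add its own scheme without touching the core.
        h.register("grd", loc -> "GRD handler for " + loc);
        System.out.println(h.open("grd://omni.barrodale.com/coastwatch"));
        System.out.println(h.open("/data/sst.nc"));
    }
}
```

With a registry like this, supporting grd:// needs only a register()
call from the external code, which is the alternative to putting the GRD
code in the core.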
>As for the "content manager that handles GRD georeferencing 
>information": it could be a CoordSysBuilder subclass. However, this 
>is actually unnecessary if you use an existing Convention, and we 
>would highly recommend using the CF Convention for gridded data. 
>Since you are creating the "file", you can add the attributes and 
>variables needed by that Convention. This makes your data "CF 
>compliant" automatically, which is a real win.
>
>>3) Have we interpreted the slide on page 53 correctly -- is there a 
>>server that can serve out data using the CDM (via the Java netCDF 
>>API) as an intermediate step?
>
>yes, the THREDDS Data Server
>
>>4) Does a group structure to represent GRD contents map to an 
>>OPeNDAP connection, WCS, or netCDF file or do those types of data 
>>representations only have netCDF variables and no groups?
>
>In principle you could use Groups, but they really won't be fully 
>supported until we get the netCDF-4 file format finished and tested. 
>I would advise starting with the simpler case of no groups.
>
>>5) Our understanding of the netCDF Java library is that it has, in 
>>particular, the following two entry points:
>>     * NetcdfFile : this is the bare netCDF access to files of various
>>       types. It doesn't understand anything about coordinate systems.
>>       You can add an I/O service provider to handle your favorite file
>>       format via a class method. The variables it returns are instances
>>       of Variable (which of course don't know anything about coordinate
>>       systems).
>>     * NetcdfDataset : this is a layer built above the NetcdfFile layer
>>       and is the usual interface for applications (e.g., a WCS). It
>>       handles converting various attributes into a coordinate system. It
>>       has a number of methods relating to adding or getting coordinate
>>       systems. These methods seem to be applied to the entire file,
>>       rather than to individual variables (or groups).
>
>coordinate systems are really variable-specific. however the common 
>case is that each dataset has a single coordinate system (or a set 
>of closely related ones).
>
>
>>     CoordinateSystem findCoordinateSystem(java.lang.String name)
>>           // Retrieve the CoordinateSystem with the specified name.
>>     java.util.List getCoordinateAxes()
>>           // Get the list of all CoordinateAxis objects used by this dataset.
>>     java.util.List getCoordinateTransforms()
>>           // Get the list of all CoordinateTransform objects used by this dataset.
>>     boolean getCoordSysWereAdded()
>>           // Has Coordinate System metadata been added.
>>     (Javadoc: http://www.unidata.ucar.edu/software/netcdf-java/v2.2.18/javadoc/ucar/nc2/dataset/NetcdfDataset.html)
>>The NetcdfDataset object contains instances of VariableDS. They are 
>>like a wrapper for the Variable objects found in the NetcdfFile 
>>object. There is a method to ask a VariableDS for the list of 
>>coordinate systems associated with it.
>
>exactly
>
>>If we interpret things correctly, when a NetcdfDataset object is 
>>built from a NetcdfFile object, the NetcdfDataset object is 
>>responsible for figuring out the coordinate system information from 
>>attributes in the NetcdfFile, and composing a VariableDS from the 
>>coordinate system information and each Variable. In theory, by 
>>implementing our own CoordSysBuilder class and registering it, we 
>>should be able to add coordinate system information to each 
>>VariableDS individually.
>
>yes, or as i mentioned use an existing Convention and CoordSysBuilder.
>
>
>>A question, then, is: do applications like the web coverage server 
>>and OPeNDAP server get their coordinate information from VariableDS 
>>objects or from the NetcdfDataset object?
>
>
>OPeNDAP is (more or less) at the same level as NetcdfFile, and so 
>just faithfully transmits Variables, Attributes, and Dimensions 
>across the wire. The coordinate systems are then added by clients 
>(like CDM) that understand the convention. We are expecting that 
>DAP4, the future OPeNDAP protocol, will add Groups.
>
>WCS, OTOH, works at the coordinate system level, and so uses the 
>GridDatatype, which is specialized for "coverage" data, and gets its 
>coordinate systems from NetcdfDataset. The client makes requests in 
>coordinate space, and we know how to translate that into index 
>space. Currently we can send back either GeoTIFF or netCDF/CF files. 
>There are some limitations: the grid spacing must be uniform in WCS 
>1.0. We expect to move to WCS 1.1 later this year, which removes 
>that limitation. We haven't implemented reprojection/resampling, and 
>I'm not sure that we will.
>
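A minimal sketch of translating a request in coordinate space into index
space for one axis with uniform spacing, which is the case WCS 1.0
requires. The class UniformAxis and its methods are hypothetical and are
not the GridDatatype API; the sketch assumes each coordinate value marks
a cell centre and ignores reprojection and resampling.

```java
// Hypothetical one-dimensional axis for illustration; not the GridDatatype API.
final class UniformAxis {
    final double start;    // coordinate of the centre of cell 0
    final double spacing;  // constant step between cell centres; may be negative
    final int size;        // number of cells along the axis

    UniformAxis(double start, double spacing, int size) {
        if (spacing == 0.0 || size <= 0) {
            throw new IllegalArgumentException("need non-zero spacing and positive size");
        }
        this.start = start;
        this.spacing = spacing;
        this.size = size;
    }

    // Index of the cell centre nearest to coord, or -1 if it is off the axis.
    int indexOf(double coord) {
        long idx = Math.round((coord - start) / spacing);
        return (idx < 0 || idx >= size) ? -1 : (int) idx;
    }

    // Inclusive index range of the cell centres lying within [lo, hi],
    // clamped to the axis; null when the request misses the axis entirely.
    int[] indexRange(double lo, double hi) {
        double a = (lo - start) / spacing;
        double b = (hi - start) / spacing;
        long first = Math.max((long) Math.ceil(Math.min(a, b)), 0L);
        long last = Math.min((long) Math.floor(Math.max(a, b)), size - 1L);
        return first > last ? null : new int[] {(int) first, (int) last};
    }

    public static void main(String[] args) {
        // Latitudes 40.0, 40.5, ..., 50.0
        UniformAxis lat = new UniformAxis(40.0, 0.5, 21);
        int[] r = lat.indexRange(41.2, 43.0);
        System.out.println(lat.indexOf(42.5) + " " + r[0] + ".." + r[1]);
    }
}
```

The division by a constant spacing is what makes this a closed-form
lookup; a non-uniform axis would need a search over the coordinate
values instead.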
>>If it is from the NetcdfDataset object, then the strategy of 
>>grouping all the grids in a database into a single NetcdfDataset, 
>>as outlined above, won't work, and we'd be obliged to use a THREDDS 
>>server. Is this correct?
>
>It would likely be a mistake to put a lot of disparate data into the 
>same NetcdfDataset. Better to find the right granularity, which is 
>typically homogeneous data that shares the same discovery 
>metadata.  So I would not use the Group mechanism to break the data 
>into granules; better to make separate datasets. It's possible that 
>such an idiom will develop with netCDF-4, but better to get 
>something working that stays within existing practice, then decide 
>if you want to forge ahead. Let me emphasize that it's really 
>important to find the right dataset granularity.
>
>This means you want to use THREDDS catalogs to publish the dataset 
>URLs and associated metadata, and possibly use TDS to serve your 
>data. Once you have an IOSP or equivalent for your data, the main 
>work is to develop the catalogs. These can be pretty minimal, but 
>automatically populating catalogs with high-quality metadata is a 
>huge win in the long run.
>
>I think that would be a powerful value-added product, but of course 
>I don't know what your customers really want. As Ted mentioned, it's a 
>good time to help influence TDS strategy, and it appears to me that 
>your small company with extensive scientific experience would be a 
>good fit with Unidata.
>
>John

**********************************************
Ian Barrodale, Ph.D.
President
Barrodale Computing Services Ltd.
Tel: (250) 472-4372 Fax: (250) 472-4373
Web: http://www.barrodale.com
Email: ian at barrodale.com
**********************************************
Mailing Address:
P.O. Box 3075 STN CSC
Victoria BC Canada V8W 3W2

Shipping Address:
Hut R, McKenzie Avenue
University of Victoria
Victoria BC Canada V8W 3W2
**********************************************



More information about the Netcdf-java mailing list