
Re: Uniform data indexing and querying



Hi Sveta:

Most of these questions are probably better addressed to the opendap 
principals, Peter Cornillon or James Gallagher. However, I have added my own 
brief opinions below.

Sveta Shasharina wrote:
Dear John,

Your name came up in our conversation with Mike Folk (NCSA).  We are
discussing a joint project to add a "remoting" layer to the HDF5 indexing
API (Mike and Rishi Sinha).  The idea is to take this API, which supports
bitmap indexing and efficient querying of and access to data, extract a
"format-agnostic" interface from it (ideally working for both HDF5 and
NetCDF), and use that interface in a service that allows remote operation.
The client of this service will be in some kind of 4GL (for visualization).
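A binned bitmap index, the kind of index mentioned above, can be sketched in a few lines. This is a generic illustration of the technique, assuming in-memory numeric data and Python integers used as bit sets; it is not the HDF5 indexing API itself, and `BinnedBitmapIndex` and `range_query` are hypothetical names.

```python
from bisect import bisect_right


class BinnedBitmapIndex:
    """Binned, equality-encoded bitmap index over a 1-D sequence of numbers.

    Bin k covers [edges[k], edges[k+1]); its bitmap is a Python int whose
    bit i is set when values[i] falls in that bin. Values outside
    [edges[0], edges[-1]) are not indexed (hypothetical simplification).
    """

    def __init__(self, values, edges):
        self.values = list(values)
        self.edges = sorted(edges)
        self.bitmaps = [0] * (len(self.edges) - 1)
        for i, v in enumerate(self.values):
            k = bisect_right(self.edges, v) - 1  # bin whose range contains v
            if 0 <= k < len(self.bitmaps):
                self.bitmaps[k] |= 1 << i

    def range_query(self, lo, hi):
        """Return the sorted row indices i with lo <= values[i] < hi."""
        candidates = 0
        for k, bitmap in enumerate(self.bitmaps):
            # OR together the bitmaps of every bin overlapping [lo, hi).
            if self.edges[k] < hi and self.edges[k + 1] > lo:
                candidates |= bitmap
        hits = []
        i = 0
        while candidates:
            # Edge bins may overlap the range only partly, so recheck the raw value.
            if candidates & 1 and lo <= self.values[i] < hi:
                hits.append(i)
            candidates >>= 1
            i += 1
        return hits


if __name__ == "__main__":
    idx = BinnedBitmapIndex([3.2, 7.5, 1.1, 9.9, 5.0, 7.0], edges=[0, 2, 4, 6, 8, 10])
    print(idx.range_query(5, 8))  # [1, 4, 5]
```

Keeping one bitmap per bin means a range query ORs a few bitmaps together and rechecks raw values only for rows in partially covered edge bins; practical bitmap indexes usually also compress the bitmaps.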

This is all in regard to a possible NSF/SBIR proposal (due June 13)
addressing the topic "Visualization of large data."

I see overlaps with what you and your colleagues are doing (I googled you
:-) and think we could collaborate or, at least, exchange ideas for
future collaboration.   I work at Tech-X Corporation in Boulder (see
http://www.txcorp.com and http://grid.txcorp.com) and we have many
scientific computing and data management projects (mostly for DOE, some for
DOD and NASA).

So if you find time, could you please answer a couple of questions?

1.  Is opendap used outside of earth systems?

It is also used in Space Physics (HAO/NCAR).
I'm not sure where else.

2.  What benefits does it have as a transfer mechanism (compared, say, to
gridftp, soap, corba etc)?

opendap is a subsetting data access protocol, and the ability to subset large 
datasets is crucial.

compared to:
gridftp: does no subsetting, only bulk transfer.
soap: not a data access protocol. opendap 4 will use SOAP.
corba: opendap is not a distributed object system, but rather client/server. 
The client does not have to worry about the lifecycles of the server's objects.
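To make the subsetting point concrete, a DAP2-style request names a variable and an index range for each dimension in the URL's constraint expression, written `[start:stride:stop]` with an inclusive stop. The sketch below builds such a URL and counts how many values it selects. The server URL, the variable `sst`, and the helper names `dap2_subset_url` and `hyperslab_size` are hypothetical, and percent-encoding of the brackets is left to the caller.

```python
def dap2_subset_url(dataset_url, var, slices, suffix="ascii"):
    """Build a DAP2 URL asking for one hyperslab of `var`.

    `slices` holds one (start, stride, stop) tuple per dimension; DAP2
    hyperslab bounds include `stop`. `suffix` picks the response type,
    e.g. "ascii" for text or "dods" for the binary response.
    """
    for start, stride, stop in slices:
        if stride < 1 or stop < start:
            raise ValueError(f"bad hyperslab ({start}, {stride}, {stop})")
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)
    return f"{dataset_url}.{suffix}?{var}{hyperslab}"


def hyperslab_size(slices):
    """Number of values the hyperslab selects, i.e. the size of the transfer."""
    n = 1
    for start, stride, stop in slices:
        n *= (stop - start) // stride + 1
    return n


if __name__ == "__main__":
    # Hypothetical dataset: sst[time][lat][lon] on a server at example.com.
    slices = [(0, 1, 0), (100, 2, 198), (200, 2, 298)]
    print(dap2_subset_url("http://example.com/opendap/sst.nc", "sst", slices))
    # http://example.com/opendap/sst.nc.ascii?sst[0:1:0][100:2:198][200:2:298]
    print(hyperslab_size(slices))  # 2500
```

Only the 2500 selected values cross the network, whereas a bulk-transfer tool would move the whole file.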



3.  What is the status of unifying NetCDF and HDF5? And is NetCDF4 widely
supported?  I heard that parallel NetCDF is superior to HDF5, so why build
NetCDF4 on HDF5?

We don't really unify HDF5 and NetCDF. Rather, NetCDF4 is a profile (subset) of HDF5. NetCDF4 is brand new and won't be complete until HDF5 1.8 is released (this fall, we hope). HDF5 has a richer data model than NetCDF, so we are taking advantage of various new features, not just parallel I/O.
See: http://www.unidata.ucar.edu/software/netcdf/netcdf-4/

We are also looking at indexing in order to provide remote access to large collections of data. I would be interested in hearing what approaches you might take.
Regards,

John Caron