Questions about Linnea Cook's meeting summary

Hi,

First, I'm sorry that I couldn't attend the meeting.  As I explained to Adam
Gaither, it occurred during a Unidata workshop that had been previously
scheduled, and I was giving many of the workshop presentations.  I have a
few questions about the meeting summary.  If you know any of the answers,
please respond to me directly (russ@xxxxxxxxxxxxxxxx) if the answer is of
little interest to other members of the isdf mailing list.

Linnea writes:

> ...
> The Earth Observing Systems project (EOS) is expected to receive $3 billion
> in funding over the next decade.  The Earth Observing System Data
> Information System (EOSDIS) part of the EOS project has selected NCSA's
> netCDF/HDF merge product as its scientific data I/O library.

I hadn't heard this yet, although I have heard that the EOSDIS project has
made several decisions on data interfaces that were later revised.  How
could such a merge product be selected for EOSDIS when the characteristics
of the netCDF/HDF merge are not yet completely specified and the
implementation is still in the design and prototype stage?  What conditions
are there on whether EOSDIS will use the netCDF/HDF merge and what
constraints do the EOSDIS requirements place on the interface?  Is there
someone else I can contact to learn about this?

> ...
> Russ Rew (who leads the netCDF project at Unidata) and Jeff Long (the
> author of SILO) are currently corresponding to refine the SILO extensions
> to netCDF.  They hope to agree upon these extensions and cooperate with
> NCSA and Unidata so that the same extensions are put into both the
> netCDF/HDF merge and into netCDF.

This overstates our objectives a bit.  We have not yet agreed to put the
SILO extensions into netCDF, but only to study the SILO extensions.  I'm
still corresponding with Jeff Long to try to understand the extensions.  As
a result, we are still undecided about whether the atmospheric science
community that supports Unidata has enough need for the benefits these
extensions provide to offset the costs they add to the implementation.  We
have not yet identified the resources to implement additional netCDF
extensions beyond what we had already planned before we learned of the
netCDF/HDF merger.

> ...
> Two other topics were mentioned but not resolved at the NCSA / LLNL
> meeting.  These topics were the `standard' definition of some objects and
> the use of a socket library interface for reading data across a network.

I don't understand this at all.  The BSD socket interfaces are too low-level
for convenient data access across a network.  Using a network file system
makes socket-level interfaces completely unnecessary.  What is this about?

> ...
> The PDBLib (Portable Database Library) scientific database library is of
> considerable interest to the National Grid Project because of its speed and
> flexibility.  PDBLib is similar to HDF in that both the library and the
> file it produces are portable.  One major difference between PDBLib and HDF
> is that PDBLib allows the user to define C-like structures, then read and
> write these structures in one operation.  The structures can contain
> primitive data, pointers to primitive data, other structures, and pointers
> to other structures.  PDBLib also has a more general conversion model - it
> can write in native format, then read that on any other machine.  Or, it
> can create a file on one machine in any other machine's format.  HDF can
> read/write data in a machine's native format but can not move this file to
> any other machine which uses a different format.  HDF also can read/write
> IEEE format on any machine - this IEEE format file is portable to any
> computer.  PDBLib was developed at LLNL by Stewart Brown.  The SILO
> interface is currently implemented on top of PDBLib.

Is PDBLib proprietary?  My (possibly incorrect) understanding was that the
rights to make PDBLib into a commercial product were reserved by its author.
If I've got this wrong, please correct me.  If PDBLib is proprietary, what
constraints does this put on use of PDBLib for standards?

> ...
> Another desire was to be able to write a code's internal data structures
> directly to disk.  Some subsequent discussion indicated that being able to
> write data quickly and with little overhead (little extra information
> written to disk) was the basic requirement.  Another part of this
> requirement seems to be the ability to write any data to disk without first
> getting an 'approved' tag or data type implemented.  This was for use
> during the development stage.  All agreed that eventually the tags would be
> officially requested, granted and documented.  Since the issue of writing a
> code's internal data structures directly to disk received considerable
> comment, we should specifically address this in our feedback to Mike Folk.
> Related to this is the question of whether it is important to be able to
> use other tools (such as graphics codes) to read (and display or do other
> operations on) this data?

For internal data-structure writes, what's wrong with write(2) or fwrite(3)?
The idea of standardizing tags for code's internal data structures sounds
like a case of overzealous standardization.  The number and variety of
internal data structures for programs is comparable to the number and
variety of internal functions and subroutines used in implementing those
programs, and does not seem to me to be a viable candidate for
standardization.
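
To illustrate the point, here is a minimal sketch of writing an internal
data structure directly with fwrite(3) and reading it back with fread(3).
The struct sim_state and the helpers save_state and load_state are
hypothetical names invented for this example. The file holds the
machine's native layout, so it is only expected to be readable by the
same program on a machine with the same data representation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical internal state of a simulation code (illustrative only). */
struct sim_state {
    int    step;
    double time;
    double temperature[4];
};

/* Write one struct in native layout. Returns 0 on success, -1 on failure. */
static int save_state(const char *path, const struct sim_state *s)
{
    FILE *fp;
    size_t n;

    assert(path != NULL && s != NULL);
    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    n = fwrite(s, sizeof *s, 1, fp);   /* one call, no tags, no conversion */
    if (fclose(fp) != 0 || n != 1)
        return -1;
    return 0;
}

/* Read the struct back. Returns 0 on success, -1 on failure. */
static int load_state(const char *path, struct sim_state *s)
{
    FILE *fp;
    size_t n;

    assert(path != NULL && s != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    n = fread(s, sizeof *s, 1, fp);
    fclose(fp);
    return n == 1 ? 0 : -1;
}
```

Nothing is written beyond the bytes of the struct itself, which is the
"little overhead" case. The cost is that no other tool can interpret the
file without knowing the struct definition.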

----
Russ Rew                  University Corporation for Atmospheric Research
Unidata Program Center    P.O. Box 3000
russ@xxxxxxxxxxxxxxxx     Boulder, Colorado 80307-3000

Message-ID: <wrxsmjjgyyz.fsf@xxxxxxxxxxxxxxxxxxxxxxx>
Date: 17 Dec 2003 10:01:40 -0700
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: requirements for handling chunking...
Organization: UCAR/Unidata

Howdy all!

Based on recent emails and discussions, here's an attempt at
requirements relating to the setting of chunksizes. This is from the
requirements at
http://my.unidata.ucar.edu/content/software/netcdf/netcdf-4/reqs.html. 

    * Chunking is required in HDF5 for any dataset with one or more
      unlimited dimensions. NetCDF-4 supports setting chunk parameters
      at variable creation with the following new function:

int nc_def_var_x(int ncid, const char *name, nc_type xtype, int ndims, 
        const int *dimidsp, int *varidp, int chunkalg, int *chunksizes);

      Where chunksizes is a pointer to an array of size ndims, with the
      chunksize in each dimension. If chunksizes is NULL, the user can
      select a chunking algorithm by setting chunkalg to NC_CHUNK_SEQ
      (to optimize for sequential access) or NC_CHUNK_SUB (for chunk
      sizes set to favor subsetting equally in any dimension).

      When the (netcdf-3) function nc_def_var is used, a sequential
      chunking algorithm will be used. (Just as if the var had been
      created with NC_CHUNK_SEQ).

      The sequential chunking algorithm sets a chunksize of 1 for all
      unlimited dimensions, and all other chunksizes to the size of
      that dimension, unless the resulting chunksize is greater than
      250 KB, in which case subsequent dimensions will be set to 1
      until the chunksize is less than 250 KB (one quarter of the
      default chunk cache size).

      The subsetting chunking algorithm sets the chunksize in each
      dimension to the nth root of (desired chunksize/product of n
      dimsizes).
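
As a rough illustration of the sequential algorithm described above, here
is a minimal sketch in C. It is not netCDF-4 code. The helper name
seq_chunksizes, the elem_size parameter, and the convention that a
dimension length of 0 marks an unlimited dimension are all assumptions
made for the example. It also assumes that "250 KB" means 250 * 1024
bytes, and that "subsequent dimensions" means collapsing dimensions to 1
in declaration order (slowest-varying first) until the chunk fits.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed byte value for the "250 KB" limit mentioned in the text. */
#define CHUNK_LIMIT_BYTES ((size_t)250 * 1024)

/*
 * Hypothetical helper: fill chunksizes[] following the sequential rule.
 *   dimlens[i] == 0 marks an unlimited dimension (sketch convention).
 *   elem_size is the size in bytes of one value of the variable's type.
 * Returns the resulting chunk size in bytes.
 * Overflow of the byte count for very large dimensions is not handled.
 */
static size_t seq_chunksizes(int ndims, const size_t *dimlens,
                             size_t elem_size, size_t *chunksizes)
{
    size_t bytes = elem_size;
    int i;

    assert(ndims > 0 && dimlens != NULL && chunksizes != NULL);

    /* 1 along unlimited dimensions, full length along all others. */
    for (i = 0; i < ndims; i++) {
        chunksizes[i] = (dimlens[i] == 0) ? 1 : dimlens[i];
        bytes *= chunksizes[i];
    }

    /* Too big: set dimensions to 1 in order until the chunk fits. */
    for (i = 0; i < ndims && bytes > CHUNK_LIMIT_BYTES; i++) {
        bytes /= chunksizes[i];
        chunksizes[i] = 1;
    }
    return bytes;
}
```

For example, a variable of 4-byte values with dimensions (unlimited, 100,
200, 300) would start at 1 x 100 x 200 x 300, which is 24,000,000 bytes.
Under this reading the second dimension is collapsed, giving chunks of
1 x 1 x 200 x 300, or 240,000 bytes.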