Hi,

First, I'm sorry that I couldn't attend the meeting. As I explained to Adam Gaither, it occurred during a Unidata workshop that had been previously scheduled, and I was giving many of the workshop presentations.

I have a few questions about the meeting summary. If you know any of the answers, please respond to me directly (russ@xxxxxxxxxxxxxxxx) if the answer is of little interest to other members of the isdf mailing list.

Linnea writes:

> ...
> The Earth Observing Systems project (EOS) is expected to receive $3 billion
> in funding over the next decade. The Earth Observing System Data
> Information System (EOSDIS) part of the EOS project has selected NCSA's
> netCDF/HDF merge product as its scientific data I/O library.

I hadn't heard this yet, although I have heard that the EOSDIS project has made several decisions on data interfaces that were later revised. How could such a merge product be selected for EOSDIS when the characteristics of the netCDF/HDF merge are not yet completely specified and the implementation is still in the design and prototype stage? What conditions are there on whether EOSDIS will use the netCDF/HDF merge and what constraints do the EOSDIS requirements place on the interface? Is there someone else I can contact to learn about this?

> ...
> Russ Rew (who leads the netCDF project at Unidata) and Jeff Long (the
> author of SILO) are currently corresponding to refine the SILO extensions
> to netCDF. They hope to agree upon these extensions and cooperate with
> NCSA and Unidata so that the same extensions are put into both the
> netCDF/HDF merge and into netCDF.

This overstates our objectives a bit. We have not yet agreed to put the SILO extensions into netCDF, but only to study the SILO extensions. I'm still corresponding with Jeff Long to try to understand the extensions.
As a result, we are still undecided about whether the atmospheric science community that supports Unidata has enough need for the benefits these extensions provide to offset the costs they add to the implementation. We have not yet identified the resources to implement additional netCDF extensions beyond what we had already planned before we learned of the netCDF/HDF merger.

> ...
> Two other topics were mentioned but not resolved at the NCSA / LLNL
> meeting. These topics were the `standard' definition of some objects and
> the use of a socket library interface for reading data across a network.

I don't understand this at all. The BSD socket interfaces are too low-level for convenient data access across a network. Using a network file system makes socket-level interfaces completely unnecessary. What is this about?

> ...
> The PDBLib (Portable Database Library) scientific database library is of
> considerable interest to the National Grid Project because of its speed and
> flexibility. PDBLib is similar to HDF in that both the library and the
> file it produces are portable. One major difference between PDBLib and HDF
> is that PDBLib allows the user to define C-like structures, then read and
> write these structures in one operation. The structures can contain
> primitive data, pointers to primitive data, other structures, and pointers
> to other structures. PDBLib also has a more general conversion model - it
> can write in native format, then read that on any other machine. Or, it
> can create a file on one machine in any other machine's format. HDF can
> read/write data in a machine's native format but can not move this file to
> any other machine which uses a different format. HDF also can read/write
> IEEE format on any machine - this IEEE format file is portable to any
> computer. PDBLib was developed at LLNL by Stewart Brown. The SILO
> interface is currently implemented on top of PDBLib.

Is PDBLib proprietary?
My (possibly incorrect) understanding was that the rights to make PDBLib into a commercial product were reserved by its author. If I've got this wrong, please correct me. If PDBLib is proprietary, what constraints does this put on use of PDBLib for standards?

> ...
> Another desire was to be able to write a code's internal data structures
> directly to disk. Some subsequent discussion indicated that being able to
> write data quickly and with little overhead (little extra information
> written to disk) was the basic requirement. Another part of this
> requirement seems to be the ability to write any data to disk without first
> getting an 'approved' tag or data type implemented. This was for use
> during the development stage. All agreed that eventually the tags would be
> officially requested, granted and documented. Since the issue of writing a
> code's internal data structures directly to disk received considerable
> comment, we should specifically address this in our feedback to Mike Folk.
> Related to this is the question of whether it is important to be able to
> use other tools (such as graphics codes) to read (and display or do other
> operations on) this data?

For internal data-structure writes, what's wrong with write(2) or fwrite(3)? The idea of standardizing tags for a code's internal data structures sounds like a case of overzealous standardization. The number and variety of internal data structures for programs is comparable to the number and variety of internal functions and subroutines used in implementing those programs, and does not seem to me to be a viable candidate for standardization.

----
Russ Rew                        University Corporation for Atmospheric Research
Unidata Program Center          P.O. Box 3000
russ@xxxxxxxxxxxxxxxx           Boulder, Colorado 80307-3000

Date: 17 Dec 2003 10:01:40 -0700
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: requirements for handling chunking...
Organization: UCAR/Unidata

Howdy all!

Based on recent emails and discussions, here's an attempt at requirements relating to the setting of chunksizes. This is from the requirements at http://my.unidata.ucar.edu/content/software/netcdf/netcdf-4/reqs.html.

* Chunking is required in any dataset with one or more unlimited dimensions in HDF5. NetCDF-4 supports setting chunk parameters at variable creation with the following new function:

    int nc_def_var_x(int ncid, const char *name, nc_type xtype,
                     int ndims, const int *dimidsp, int *varidp,
                     int chunkalg, int *chunksizes);

Here chunksizes is a pointer to an array of size ndims, holding the chunksize in each dimension. If chunksizes is NULL, the user can select a chunking algorithm by setting chunkalg to NC_CHUNK_SEQ (to optimize for sequential access) or NC_CHUNK_SUB (for chunk sizes set to favor subsetting equally in any dimension).

When the (netcdf-3) function nc_def_var is used, a sequential chunking algorithm will be used, just as if the variable had been created with NC_CHUNK_SEQ.
The sequential chunking algorithm sets a chunksize of 1 for all unlimited dimensions, and sets every other chunksize to the size of its dimension. If the resulting chunk is greater than 250 KB, subsequent dimensions will be set to 1 until the chunk is less than 250 KB (one quarter of the default chunk cache size).

The subsetting chunking algorithm sets the chunksize in each dimension to the nth root of (desired chunksize/product of n dimsizes).
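
As a rough illustration of the sequential rule described above, here is a minimal C sketch. It is not netCDF code: seq_chunksizes, chunk_bytes, the is_unlimited flag array, and MAX_CHUNK_BYTES are all names made up for this example. It also rests on two assumptions that the text leaves open: that "250 KB" means 250 * 1024 bytes, and that dimensions are collapsed to 1 in order starting from the first one.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: "250 KB" is read as 250 * 1024 bytes. */
#define MAX_CHUNK_BYTES ((size_t)250 * 1024)

/* Size in bytes of one chunk: element size times every chunk length. */
static size_t chunk_bytes(int ndims, const size_t *chunksizes, size_t elem_size)
{
    size_t total = elem_size;
    for (int d = 0; d < ndims; d++)
        total *= chunksizes[d];
    return total;
}

/*
 * Hypothetical helper (not a netCDF function): fill chunksizes[] by the
 * sequential rule. dimlens[d] is the length of dimension d; is_unlimited[d]
 * is nonzero when dimension d is unlimited (its dimlens entry is ignored).
 */
static void seq_chunksizes(int ndims, const size_t *dimlens,
                           const int *is_unlimited, size_t elem_size,
                           size_t *chunksizes)
{
    assert(ndims > 0 && elem_size > 0);

    /* Step 1: 1 along unlimited dimensions, full length along the others. */
    for (int d = 0; d < ndims; d++)
        chunksizes[d] = is_unlimited[d] ? 1 : dimlens[d];

    /* Step 2: while the chunk exceeds the limit, collapse dimensions to 1,
     * starting from the first. The collapse order is an assumption. */
    for (int d = 0;
         d < ndims && chunk_bytes(ndims, chunksizes, elem_size) > MAX_CHUNK_BYTES;
         d++)
        chunksizes[d] = 1;
}
```

For a 4-byte variable with dimensions (unlimited, 180, 360), the first step gives 1 x 180 x 360 elements, or 259,200 bytes, which is over the assumed limit, so the sketch collapses the second dimension and ends with chunks of 1 x 1 x 360. With dimensions (unlimited, 90, 180) the first step already fits (64,800 bytes) and nothing is collapsed.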