
Re: NPOESS Sample Files



Ken,

> ... Why netCDF-4?

Well, one (biased) answer was recently provided by one of our users in
a support question:

> On the other hand, every time I am forced to use HDF or, even worse,
> HDF-EOS, I gnash my teeth and rend my garments at the experience of
> using a data storage API designed by a combination of database wonks
> and [aerospace contractor] engineers.  And I mean that in the worst
> sense possible.  Coming back to the land of milk, honey, and netCDF
> after those diasporas is a huge relief.

I think the user overstated the differences, and HDF5 is very well
designed in some ways, especially for HPC users who have had to deal
with lots of APIs and programming interface philosophies, but comments
like the above point out a goal we strive to attain: make the
interface as simple as practical.  Keep the "surface area" of the
interface minimal so that the whole data model is comprehensible
without too much study.  Some of this requires carefully written
documentation and a better set of examples than we currently make
available, but it also means leaving some complexities out of the data
model.

As an example, in HDF5 no data object has a unique or distinguished
name.  If you ask what is the name of a Group or Dataset, the answer
can be "here's a list of aliases (links), any one of which refers to
the object", but no member of the list is primary.  As a result,
developers sometimes have to check whether two names refer to the
same data object.
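
To make the aliasing point concrete, here is a toy model in plain
Python.  It is not the HDF5 API: the names ToyObject, ToyFile, link,
names_of, and same_object are invented for illustration, as are the
paths and values.  It treats a file as a table of links from paths to
objects, so an object has as many names as it has links and none of
them is primary:

```python
# Toy model of hard links; this is NOT the HDF5 API.
class ToyObject:
    """Stands in for a Group or Dataset; it stores no name of its own."""
    def __init__(self, payload):
        self.payload = payload


class ToyFile:
    def __init__(self):
        self.links = {}  # path -> object; several paths may share one object

    def link(self, path, obj):
        self.links[path] = obj

    def names_of(self, obj):
        # Every path that reaches obj; none of them is distinguished.
        return sorted(p for p, o in self.links.items() if o is obj)

    def same_object(self, path_a, path_b):
        # Compare object identity, not the path strings.
        return self.links[path_a] is self.links[path_b]


f = ToyFile()
temp = ToyObject([280.1, 281.4])
f.link("/model/temperature", temp)
f.link("/latest/temp", temp)  # second link to the same object
f.link("/copy/temperature", ToyObject([280.1, 281.4]))  # equal data, new object

print(f.names_of(temp))
print(f.same_object("/model/temperature", "/latest/temp"))
print(f.same_object("/model/temperature", "/copy/temperature"))
```

The last two lines show the check a developer ends up writing: two
different paths can reach one object, and two objects with identical
contents are still distinct.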

As another example, you need to close every HDF5 object when you are
done with it, whereas in netCDF-4, a single close of the file takes
care of freeing resources and flushing the buffers to disk.
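
The difference in bookkeeping can be sketched with another toy model
in plain Python.  Again, this is neither library's API: Handle,
PerObjectFile, and SingleCloseFile are hypothetical names, and the
"leak count" is only an illustration of what the caller has to track,
not a description of what either library does internally:

```python
# Toy model of handle lifetimes; this is NOT the HDF5 or netCDF API.
class Handle:
    def __init__(self, owner, label):
        self.owner, self.label = owner, label
        owner.open_handles.add(self)

    def close(self):
        self.owner.open_handles.discard(self)


class PerObjectFile:
    """Caller must close each handle (the style described for HDF5)."""
    def __init__(self):
        self.open_handles = set()

    def open(self, label):
        return Handle(self, label)

    def close(self):
        # Report how many handles the caller forgot to close.
        return len(self.open_handles)


class SingleCloseFile(PerObjectFile):
    """One close releases everything (the style described for netCDF-4)."""
    def close(self):
        self.open_handles.clear()
        return 0


h = PerObjectFile()
group = h.open("group")
dataset = h.open("dataset")
dataset.close()          # the group handle is forgotten
leaked = h.close()

n = SingleCloseFile()
n.open("group")
n.open("variable")
released_leftovers = n.close()

print(leaked, released_leftovers)
```

In the first style the caller carries the burden of pairing every
open with a close; in the second, the file close is the only call
that has to be remembered.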

It's an open question whether we have preserved enough simplicity in
the netCDF-4 interface and data model to make it as attractive to
developers and data providers as netCDF-3.  Adding Groups to netCDF-4
by only adding a handful of interfaces is one example where we have
succeeded in providing a lot more power with only a small increment in
complexity.  Our decision not to support HDF5 References, and the
complexities they introduce, may prove to have been wrong, but it's
too early to tell whether the power they add is worth the added
complexity.

As we wrote in our AMS 2006 paper:

  ... Ultimately, data and applications will only be adapted to
  netCDF-4 if a critical mass of data and useful applications
  exist. Some data providers may decide that netCDF-4 is not enough
  simpler than HDF5 to justify using netCDF-4 rather than HDF5.
  Similarly, application developers may decide that if they need to
  modify their applications to fully support netCDF-4, they might
  consider expending the extra effort required to provide full support
  for HDF5 as well.

The whole idea, which entailed some considerable risk, was to see if
we could preserve the desirable common characteristics of netCDF and
HDF5 while taking advantage of their separate strengths: the
widespread use and simplicity of netCDF and the generality and
performance of HDF5.

Whether we achieved this objective will ultimately be decided by
users, developers, and data providers.  Even if HDF5 becomes more
popular than netCDF, the effort made in developing netCDF-4 has still
improved both netCDF and HDF.

Jim Gray, who was the 1998 Turing Award winner in computer science for
his work in relational databases and transaction processing, recently
published an article: 

  http://research.microsoft.com/research/pubs/view.aspx?tr_id=860

where he wrote:

  While the commercial world has standardized on the relational data
  model and SQL, no single standard or tool has critical mass in the
  scientific community.  There are many parallel and competing efforts
  to build these tool suites - at least one per discipline.  Data
  interchange outside each group is problematic.  In the next decade,
  as data interchange among scientific disciplines becomes
  increasingly important, a common HDF-like format and package for all
  the sciences will likely emerge.

We think netCDF-4, together with the follow-on developments we are
planning, is a candidate for not just the "HDF-like format" but also
the data model that may fill this niche.

--Russ