
Re: NPOESS Sample Files



Ken,

> ... That sounds good to me, but your web site also puts a disclaimer
> that  "they probably should not be" read with the HDF5 tools.  Can
> you explain that disclaimer?

In addition to what Ed wrote, data providers can improve
interoperability by restricting the HDF5 features they use to those
identified in the Common Data Model that netCDF-4 implements, because
then either HDF5 tools or netCDF-4 tools can be used on the same file.
If a data provider decides to use HDF5 References, the implication
should be understood: the data may not map easily into the netCDF-4
(or OPeNDAP) abstractions that support access through another
interface.

> Also, would you perhaps provide a comment on my leaning one way or
> the other between netCDF-4 and HDF-5?  The data sets I typically
> deal with are typically either regularly gridded or at least
> geo-referenced with lat/lon coords for each satellite observation.
> HDF4's tiling/chunking is very important to me, especially for the
> big global grids (serving subsets via OPeNDAP is much faster than
> when the data are in netCDF-3 and the whole file needs to be
> decompressed, even if the user wants only one pixel), so I am glad
> to see that feature is part of netCDF-4/HDF5. Also, the new parallel
> I/O features I feel will become ever more important especially when
> we move into the NPOESS era.

I'd say there are tradeoffs, and you should try to preserve some
flexibility for the people who will access the data in the future.  If
most of the users of the data or developers of applications that will
access the data are already familiar and happy with HDF5, that's
important.  At this point HDF5 is more mature than netCDF-4 in its
support for things like parallel I/O and chunking.  We aren't yet even
sure that the chunking parameters we use by default are optimal for
any particular use.  Some other considerations are:

 - your judgment of the importance of simplicity versus power among
   users and developers
 - likely future size and funding for the development/support groups:
   The HDF Group, Inc. and UCAR's Unidata Program
 - size of external developer community providing additional tools and
   language interfaces
 - likely future user communities: NPP/NPOESS operational and research
   community, HPC communities, climate and geoscience users in
   research and education 
 - stance toward backward compatibility versus new features
 - importance of optimal performance in comparison with other
   characteristics

and so on.  It's a difficult decision that involves some risks either
way.  If you decide to use HDF5, I would advise you to be conservative
in your use of features not supported by other data models.
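
On the chunking point, here is a rough sketch of why chunk shape
matters for subsetting, and why a default may not suit every access
pattern.  It only counts how many chunks overlap a request; it says
nothing about real I/O or decompression cost.  The grid and chunk
sizes are made-up values for illustration, not netCDF-4 or HDF5
defaults, and the function names are hypothetical.

```python
from math import ceil


def chunks_touched(start, count, chunk_shape):
    """Count the chunks that overlap a hyperslab request.

    start, count and chunk_shape each have one entry per dimension.
    """
    total = 1
    for s, c, k in zip(start, count, chunk_shape):
        first = s // k              # index of first chunk touched
        last = (s + c - 1) // k     # index of last chunk touched
        total *= last - first + 1
    return total


def total_chunks(grid_shape, chunk_shape):
    """Count all chunks in the grid (edge chunks may be partial)."""
    total = 1
    for n, k in zip(grid_shape, chunk_shape):
        total *= ceil(n / k)
    return total


# Assumed values: a 0.05-degree global grid split into 360 x 720 chunks.
grid = (3600, 7200)
chunk = (360, 720)

one_pixel = chunks_touched((1800, 3600), (1, 1), chunk)
region = chunks_touched((1700, 3500), (200, 200), chunk)
all_chunks = total_chunks(grid, chunk)

print(one_pixel, region, all_chunks)
```

With these assumed sizes, a one-pixel request overlaps a single chunk
out of 100, while a 200 x 200 region that straddles chunk boundaries
overlaps four.  Trying other chunk shapes against your typical
requests gives a quick feel for the tradeoff.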

--Russ