Hi Joe, comments in line:
Joe Sirott wrote:
> Thanks for taking the time to come up with this specification. It
> looks like a good start. I do have some concerns about the complexity
> of the spec, though, and would like to suggest a few changes that
> might make it easier to use.
> I believe that this spec is too complicated for most potential
> users. For instance, it appears that any software that is able to read
> these collections will have to parse a SQL-like expression in
> order to interpret a collection.
Well, it's a simple syntax: "XXXX <dim_name> XXXX <dim_name> XXXX <variable_name>"
But I only threw it in to have something concrete. One could use 3
> Another source of complexity is the
> varying dimensionality of the dimensions and observations (either 1D
> or 2D depending on the type of data).
Yes, actually I think you could probably have any number of dimensions.
> Still another example is the use
> of character variables for storing attribute data for collections
> (should software assume that any character variable is an attribute)?
I don't understand this, do you have an example?
> It's also difficult to edit data with this convention. How would I
> edit an individual profile from a collection? Or, worse, what if points needed
> to be added or removed from an individual profile? I'd have to
> regenerate the entire netCDF file in the latter case. That makes this
> convention only practical as an archive format.
Some variants are optimal for archival, others for dynamic modification.
The backwards linked list is optimal for adding arbitrary amounts of data
efficiently, but it's pretty bad when you read it. My intention is to give
standard options that the user can choose depending on need.
If you want to throw me a use case, I'll try to give you a concrete solution.
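To make the backwards-linked-list variant concrete, here is a minimal CDL sketch; the dimension and variable names (obs, prev, etc.) are illustrative, not part of the proposal:

```cdl
netcdf profileCollection {
dimensions:
  obs = UNLIMITED ;            // one record per observation
variables:
  double time(obs) ;
  double lon(obs) ;
  double lat(obs) ;
  double z(obs) ;
  double temperature(obs) ;
    temperature:coordinates = "lon lat z time" ;
  int prev(obs) ;              // index of the previous observation in the
                               // same profile; -1 marks the first one
}
```

Appending is cheap (write one record along the unlimited dimension), but reading back a whole profile means chasing the prev chain, which is why this variant writes well and reads poorly.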
> An alternative would be to store each individual
> profile/trajectory/time series in a separate netCDF file. Collections
> would consist of a set of netCDF files stored in a zip or jar
> file. The zip file could also contain some sort of (XML?) manifest
> file that could contain metadata about the collection as a whole. Any
> metadata associated with an individual profile would be stored as a
> global attribute in the appropriate netCDF file. Editing a profile
> would be as simple as extracting the netCDF file from the archive,
> rewriting it, and then storing it back in the jar file.
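For concreteness, the archive layout you describe might look something like this (all file names made up):

```
collection.zip
  manifest.xml         <- (XML?) metadata about the collection as a whole
  profile_0001.nc      <- one profile per netCDF file; per-profile
  profile_0002.nc         metadata stored as global attributes
  ...
```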
This is a good solution sometimes, but not generally. Many small files are not
optimal for large archives. We are having trouble on motherlode right now with
excessive inode consumption. Unzipping is too costly if the data is accessed often.
> To make it even easier for consumers of this data, I would also
> restrict the data type of all variables to double. Also, all four
> coordinates would be required.
I also lean toward requiring x,y,z,t coordinates, but others aren't so sure. Note
this is not the same as having x,y,z,t dimensions. In fact, this is a very
important part of the proposal that deserves to be highlighted.
I'm claiming that the general way to do coordinate systems for this kind of
data looks something like
  dataVar:coordinates = "lon lat z time";
rather than following gridded data conventions like COARDS, which make the
coordinates shared dimensions.
I think this is what you are saying below.
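To spell out the contrast, here is a rough CDL sketch of the two styles (variable and dimension names are illustrative, not prescribed):

```cdl
// gridded style (COARDS): the coordinates are shared dimensions
dimensions:
  time = 4 ; z = 10 ; lat = 90 ; lon = 180 ;
variables:
  float data(time, z, lat, lon) ;

// point-data style: a single obs dimension, with the coordinates held
// in auxiliary variables named by the coordinates attribute
dimensions:
  obs = UNLIMITED ;
variables:
  double lon(obs) ; double lat(obs) ; double z(obs) ; double time(obs) ;
  float data(obs) ;
    data:coordinates = "lon lat z time" ;
```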
> Some examples (from your CDL examples):
> Collection of point data
>   Unchanged (just one file in archive)
> Collection of profile data
>   For each netCDF file:
> Collection of trajectories
>   For each netCDF file:
> Station time series
I think this looks fine, except I want to also cover the case where someone
needs to put more than one thing in a file.
Thanks for your input.