
Re: problem with []() in 3.4 version



> To: Russ Rew <address@hidden>
> From: "R. Bauer" <address@hidden>
> Subject: Re: problem with []() in 3.4 version
> Organization: Institut fuer Stratosphaerische Chemie (ICG-1)

Hi Reimar,

You wrote:

> > ...
> > So if you have existing netCDF files with other characters in names, for
> > example "()[]", the 3.x library should be able to read them, but it will
> > not permit creating new names with such characters in them.  If you need
> > to circumvent the restriction it is possible to just take the name test
> > out of the library source code and recompile it, but if you do this,
> > realize that ncgen users may have trouble reading your files or
> > understanding the output from ncdump on your files.  The change requires
> > a simple modification to the NC_check_name() function in libsrc/string.c
> > to permit the additional characters you want and then rebuilding.
> >
> > With version 3.4, we relaxed the name restriction slightly to permit
> > periods, ".", in the names of netCDF (and CDL) components.
>
> We are not interested in storing our data in a patched netCDF format. I agree
> with the structure of the netCDF data format as you have defined it. Only the
> naming conventions will mean a great deal more work for us if there is no way
> to add other characters, like the dot ".", to the CDL format.

First, I'm sorry this change has caused such problems and that we were
not aware of the problems earlier.  Part of the reason we make test
releases available well in advance of the final releases is to discover
such problems and find solutions for them.  Before version 3.3 was
released in May of 1997, we announced and released several beta test
versions, and included information about the name restrictions in the
prerelease documentation

  http://www.unidata.ucar.edu/packages/netcdf/prerelease.html

    In the previous library, there was no checking that the characters
    used in the name of a netCDF object were compatible with CDL
    restrictions. The ncdump and ncgen utilities that use CDL permit
    only alphanumeric characters, "_" and "-" in names. Now this
    restriction is also enforced by the library for creation of new
    dimensions, variables, and attributes. Previously existing
    components with names like "@*#.^&* !" will still work OK.

but we heard no feedback that this change would cause any problems.

Second, I was not suggesting that you store your data in a "patched
netCDF format".  The format would still be the same, it's just that the
library restriction on new names would be removed.  Any application
linked with an unmodified netCDF 2.x or 3.x library would be able to
read or modify your files, just as they can now read or modify files
created with netCDF 2.x versions, even if variable names contain unusual
non-alphanumeric characters.  I was only suggesting that you apply a
patch to remove the naming restriction where netCDF files must be
written with names that include characters such as "()[]@#:".  This
seems to me as if it would be less trouble than changing the variable
names to not use these characters, and to make this option easier, I've
described and included the necessary patch in the "Known Problems with
the netCDF 3.4 Distribution" page at

  http://www.unidata.ucar.edu/packages/netcdf/known_problems.html

I discovered in testing this that I needed to also remove some tests for
"bad names" in the C and Fortran interface tests, so these changes are
also included with the above patch.
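
As a rough illustration of the kind of check involved, here is a minimal
sketch in Python. It is not the library's C code and not the patch itself:
the names CDL_NAME_CHARS, check_name and extra_chars are hypothetical, and
the sketch only tests the character set described in this message
(alphanumerics, "_", "-", and "." since 3.4), not any other rule the
library may apply.

```python
import string

# Characters this message says are accepted in new names:
# alphanumerics, "_", "-", and (since version 3.4) ".".
CDL_NAME_CHARS = set(string.ascii_letters + string.digits + "_-.")


def check_name(name, extra_chars=""):
    """Return True if every character of name is in the allowed set.

    extra_chars is a hypothetical knob standing in for a locally
    relaxed restriction, e.g. "()[]@#:".
    """
    allowed = CDL_NAME_CHARS | set(extra_chars)
    return bool(name) and all(ch in allowed for ch in name)
```

With the default set, a name such as "BrONO2" passes and "O(1D)" is
rejected; passing extra_chars="()[]@#:" accepts "O(1D)" and "name@STDEV".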

> I have written a complex reading and writing function for our datasets. I
> will explain what the main problems are.
> 
> - We have a lot of chemical experimental datasets. For the names of the
> variables we have decided to use the ASAD name convention. This is an
> international standard too.
> It is defined by Paul D.Brown & Oliver Wild, Cambridge Centre for Atmospheric
> Science, Cambridge.
> They have defined that O singlet 1D is written as O(1D).
> 
> Some of the other working examples are Br, Br2,  BrCl,  BrO,  BrONO, BrONO2,
> BrSH, C,  C2Cl4, C2H2
> 
> At this point I have the following problem.
> If I agree to your definition of only alphanumeric characters in variable
> names, without '()', I will disagree with the ASAD community. If I agree
> with ASAD I have to disagree with the netCDF naming conventions.

With the patch mentioned above, you can be consistent with the ASAD name
conventions.  We only made this change as a way to try to satisfy users
who were having problems with the inconsistency between CDL and netCDF
names, but you must not have encountered any such problems, since you
apparently have been using names inconsistent with CDL requirements in
order to be consistent with ASAD conventions.  Below, I try to describe
why some restrictions on the characters permitted in CDL names are necessary.

> - In the netCDF definitions there is no way to define another variable array
> of data as an attribute of a given variable. For our experimental data we
> could have, for each value, a standard deviation or a count of the points
> used for averaging. For the time I have a begin_t and an end_t, used as the
> time window for the averaging or interpolation. For this kind of attribute
> I am using the "@" sign.
> The variables are defined in netCDF as e.g. name@STDEV
> My reading routine is able to identify that this STDEV belongs to name and
> will add it automatically to the structure holding all data of name.
> 
> - The sign ":" I have used in the past to mark manipulated data.
> I was thinking about using "#" for another feature too.
> 
> 
> I would like to ask you to add some more characters to the CDL format, like
> "@ () : #"

The CDL name restrictions are not arbitrary, but rather are dictated by
the ability to write a grammar for a small language (CDL) that
unambiguously represents the structure of netCDF datasets.  For example,
currently the CDL declaration

    float x(m);

means "x" is a floating-point variable of rank 1 with dimension "m".  If
the characters "(" and ")" are permitted in variable names, then this
statement becomes ambiguous, since it could also mean that "x(m)" is a
scalar floating-point variable (of rank 0).  Similarly, the CDL
declaration

    temp:units         = "celsius";

means the "units" attribute of a variable named "temp" has the string
value "celsius".  If the character ":" were allowed in variable names,
this statement would become ambiguous, since it could also refer to a
variable named "temp:units".  Such ambiguities are a real problem if a simple
parser based on yacc is to be implemented, and ncgen is such a parser.
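
To make the first ambiguity concrete, here is a small Python sketch that
lists every way a declaration could be read. It is only an illustration:
parse_declaration and parens_in_names are hypothetical names, and the
regular expressions are a simplification, not the ncgen grammar.

```python
import re


def parse_declaration(decl, parens_in_names=False):
    """Return every (name, dims) reading of a 'type name(dims);' line."""
    body = decl.strip().rstrip(";").strip()
    _type_name, rest = body.split(None, 1)
    rest = rest.strip()
    readings = []

    # Reading 1: a name followed by a parenthesized dimension list.
    m = re.fullmatch(r"([A-Za-z_][A-Za-z0-9_.-]*)\(([^()]*)\)", rest)
    if m:
        dims = [d.strip() for d in m.group(2).split(",") if d.strip()]
        readings.append((m.group(1), dims))

    # Reading 2: the whole text is a scalar variable name (rank 0).
    # This only matches "x(m)" when parentheses are legal in names.
    name_chars = r"[A-Za-z0-9_.()-]" if parens_in_names else r"[A-Za-z0-9_.-]"
    if re.fullmatch(r"[A-Za-z_]" + name_chars + "*", rest):
        readings.append((rest, []))

    return readings
```

With the current rules, parse_declaration("float x(m);") gives the single
reading ("x", ["m"]).  With parens_in_names=True it gives two readings,
("x", ["m"]) and ("x(m)", []), and a parser has no basis for choosing
between them.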

Permitting "." in names was actually difficult, because it required
changing the grammar for CDL so that there were no ambiguities in the
representation of floating-point constants containing "." as a decimal
point and similar variable names containing "." merely as a character.

I can't be sure whether permitting any of "@#[]" would cause a problem
without modifying the grammar to try them out, and I'm also not sure
whether these would be of any use to you in a CDL that does not permit
"(" and ")" in names.  There may also be additional problems with such
characters in other existing language interfaces, such as Perl,
Matlab, Python, IDL, etc.  So I'm sorry, but I can't commit to such a
change without more research ...

> --
> R.Bauer
> 
> Institut fuer Stratosphaerische Chemie (ICG-1)
> Forschungszentrum Juelich
> email: address@hidden

--Russ
_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu