
Re: netCDF3.6.0-beta3



Reimar,

> > How difficult would it be for you to require escaping some of these
> > special characters in variable names, for example instead of
> > permitting the variable name 'a.b(c)', requiring 'a\.b\(c\)'?
> > 
> > If we required escaping special characters in variable names, we could
> > allow all special characters, including blanks.  We're considering
> > this for netCDF-4, as well as permitting Unicode for names.  I realize
> > there are backward compatibility problems, I'm just wondering how
> > serious the backward compatibility issue is at this point, since it
> > will only get worse and eventually make such a change impossible.
>
> Escaping would be fine, but we have some questions about how this could
> or would be done.
> 
> Would users have to change their programs, or is a global netCDF
> variable planned that makes this happen automatically in the relevant
> routines, e.g. nf90_def_var?
> 
> For example, if a global variable such as nc_use_escape is true, then
> nf90_def_var knows to write escape sequences for the non-alphanumeric
> characters; otherwise it reports an error about the invalid characters.
> 
> When reading, the routine will know that a ( should be interpreted as \(,
> so the user could use the same input name as now.
> 
> In our programs we could then use, for example, O3(1), but internally it
> would be stored as O3\(1\).
> 
> If it were implemented this way, only a header variable would need to
> change, and everything would work as before.
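A minimal sketch of the flag-controlled escaping proposed above, in Python. Everything in it is hypothetical: NC_USE_ESCAPE, escape_name, unescape_name and def_var are illustrative stand-ins, not part of the netCDF C or Fortran 90 APIs, and "special character" is assumed to mean anything other than an ASCII letter, digit or underscore.

```python
import re
import string

# Hypothetical global switch modeled on the proposed nc_use_escape flag;
# it is not part of any netCDF API.
NC_USE_ESCAPE = True

# Assumption: only ASCII letters, digits and '_' are stored unescaped.
_PLAIN = set(string.ascii_letters + string.digits + "_")


def escape_name(name):
    """Put a backslash in front of every character outside _PLAIN."""
    return "".join(c if c in _PLAIN else "\\" + c for c in name)


def unescape_name(stored):
    """Remove the backslash in front of each escaped character."""
    return re.sub(r"\\(.)", r"\1", stored, flags=re.DOTALL)


def def_var(name, use_escape=None):
    """Hypothetical stand-in for a define routine such as nf90_def_var.

    Returns the name as it would be written to the file.
    """
    if use_escape is None:
        use_escape = NC_USE_ESCAPE
    if use_escape:
        return escape_name(name)
    if any(c not in _PLAIN for c in name):
        raise ValueError("invalid characters in variable name %r" % name)
    return name


stored = def_var("O3(1)")
print(stored)                 # O3\(1\)
print(unescape_name(stored))  # O3(1)
```

With the flag off, the same call rejects O3(1) with an error, matching the behavior suggested in the message.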

You're right, we could provide automatic escaping if a global variable
is set appropriately.  That may be the best way to do it, but we need
to consider how to distinguish between escaped characters that are
part of the variable name and the same character used as syntax for
something else.  For example, a "." character may be used to indicate
a member of a structure variable, which will be permitted when HDF5 is
used as the storage layer.  We haven't decided on the best way to do
this yet.
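One possible convention for that distinction, sketched in Python purely for illustration, since no decision has been made: a backslash-escaped "\." is a literal dot inside a name, while an unescaped "." separates a structure variable from one of its members. split_member_path is a hypothetical helper, not a netCDF or HDF5 function.

```python
from typing import List


def split_member_path(path: str) -> List[str]:
    """Split a name on unescaped '.' characters, unescaping each part.

    Hypothetical convention: '\\.' is a dot inside a name, while a bare
    '.' separates a structure variable from one of its members.
    """
    parts: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(path):
        c = path[i]
        if c == "\\" and i + 1 < len(path):
            # Escaped character: keep it literally and drop the backslash.
            current.append(path[i + 1])
            i += 2
            continue
        if c == ".":
            # Unescaped dot: end of one path component.
            parts.append("".join(current))
            current = []
        else:
            current.append(c)
        i += 1
    parts.append("".join(current))
    return parts


print(split_member_path(r"a\.b.c"))   # ['a.b', 'c']
print(split_member_path(r"O3\(1\)"))  # ['O3(1)']
```

A trailing lone backslash is kept as a literal character here; a real design would have to decide whether that should be an error.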

> Now let me ask some questions about the use of Unicode.
> It's probably the best way to support characters from very different
> languages, but what happens if a user who does not have the right fonts
> installed looks into a data file?

There will be a way to represent Unicode symbols in an encoding that
distinguishes them without requiring Unicode fonts, as is done in
Python.
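For comparison, Python's own "unicode_escape" codec renders any string in pure ASCII, writing non-ASCII characters as backslash escapes (\xNN, \uNNNN or \UNNNNNNNN) and doubling existing backslashes, and decoding with the same codec restores the original. The helper names below are hypothetical, and whether netCDF would adopt this exact notation is only an assumption.

```python
def display_name(name: str) -> str:
    """Render a name in pure ASCII, escaping non-ASCII code points."""
    return name.encode("unicode_escape").decode("ascii")


def parse_display_name(text: str) -> str:
    """Turn an escaped ASCII rendering back into the original name."""
    return text.encode("ascii").decode("unicode_escape")


print(display_name("Temp_\u00b0C"))  # Temp_\xb0C
print(display_name("\u03a9_ratio"))  # \u03a9_ratio
```

Because backslashes are escaped too, the round trip is exact even for names that already contain a backslash.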

> Have you thought about using UTF-8, as described in section 3.9 of
> the Unicode 4.0 standard or in http://www.ietf.org/rfc/rfc3629.txt?

Yes, I think UTF-8 would be a very good way to represent names for
netCDF objects.  It would ensure that all current names that use only
US-ASCII characters are already valid UTF-8 strings, byte for byte.
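A quick Python check of the ASCII-compatibility point: the bytes of an existing US-ASCII name already form valid UTF-8, while a non-ASCII character such as "é" (U+00E9) takes two bytes. The example names and the is_valid_utf8 helper are made up for illustration.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


existing_names = [b"temperature", b"O3_1", b"lat"]  # made-up ASCII names
for raw in existing_names:
    # Every ASCII name is already valid UTF-8 with identical bytes.
    assert is_valid_utf8(raw)
    assert raw.decode("utf-8").encode("utf-8") == raw

print("\u00e9".encode("utf-8"))  # b'\xc3\xa9'
print(is_valid_utf8(b"\xe9"))    # False: a lone Latin-1 'é' byte is not UTF-8
```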

However, I'm not sure UTF-8 would be the best way to represent
character data on disk, since it's a variable-length encoding and thus
not necessarily suitable for direct access to the nth character in a
long string.  HDF5 has not dealt with Unicode encoding issues yet, so
we will have to determine how to do it for netCDF-4.  We may support
a default encoding and other encodings specified by a distinguished
attribute.
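To illustrate the direct-access concern: in UTF-8, finding the byte offset of the nth character means scanning from the start, because a character occupies 1 to 4 bytes, whereas a fixed-width encoding such as UTF-32 lets the offset be computed as 4n. This Python sketch counts code points, which is assumed here to be what "character" means; the function names are hypothetical.

```python
def utf8_offset(data: bytes, n: int) -> int:
    """Byte offset of the n-th (0-based) code point in UTF-8 data.

    Requires a linear scan: bytes of the form 10xxxxxx are continuation
    bytes, and every other byte starts a new code point.
    """
    count = 0
    for offset, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:
            if count == n:
                return offset
            count += 1
    raise IndexError("fewer than n + 1 code points")


def utf32_offset(n: int) -> int:
    """In UTF-32 every code point is 4 bytes, so no scan is needed."""
    return 4 * n


s = "a\u03a9\u20acb"  # 'aΩ€b': 1 + 2 + 3 + 1 bytes in UTF-8
u8 = s.encode("utf-8")
u32 = s.encode("utf-32-le")

i = utf8_offset(u8, 3)
print(i, u8[i:].decode("utf-8"))                                 # 6 b
print(u32[utf32_offset(3):utf32_offset(4)].decode("utf-32-le"))  # b
```

The trade-off is size: UTF-32 spends four bytes on every character, including plain ASCII.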

--Russ