A CDM Object name refers to the name of a Group, Dimension, Variable, Attribute, or Enum. An object name is a String, that is, a variable-length array of Unicode characters. We record here the set of allowable characters for different formats.
NetCDF-4 C library Object names refer to the name of a Group, Dimension, Variable, Attribute, user-defined Type, compound type Member, or enumeration type Symbol.
The CDM has not added user-defined types, and a "compound type Member" is considered the same as a Variable. Otherwise the two models are the same.
A netCDF identifier is stored in a netCDF file as UTF-8 Unicode characters, NFC normalized. There are some restrictions on the valid characters used in a netCDF identifier:
ID = ([a-zA-Z0-9_]|{UTF8})([^\x00-\x1F\x7F/]|{UTF8})*
UTF8 = multibyte UTF8 encoded char
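which says: the first character must be an ASCII letter, digit, or underscore, or a multibyte UTF-8 character; each subsequent character may be anything except an ASCII control character (0x00-0x1F), DEL (0x7F), or the forward slash '/'.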
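The ID rule can also be checked mechanically. The following Python sketch is one possible reading of the pattern above, not the netCDF library's own validator. It assumes that {UTF8} can be approximated by any code point at or above U+0080, and the function name is_valid_netcdf_name is hypothetical.

```python
import re
import unicodedata

# First char: ASCII alphanumeric, underscore, or any non-ASCII character
# (assumption: "multibyte UTF8 encoded char" means code point >= U+0080).
# Remaining chars: anything except 0x00-0x1F, 0x7F (DEL), and '/'.
_NETCDF_ID = re.compile(r"[A-Za-z0-9_\u0080-\U0010FFFF][^\x00-\x1f\x7f/]*")


def is_valid_netcdf_name(name: str) -> bool:
    """Return True if name matches the ID pattern after NFC normalization."""
    normalized = unicodedata.normalize("NFC", name)
    return _NETCDF_ID.fullmatch(normalized) is not None


if __name__ == "__main__":
    for candidate in ["temperature", "2m_temp", "sea surface", "a/b", "-x", ""]:
        print(repr(candidate), is_valid_netcdf_name(candidate))
```

The sketch normalizes to NFC before matching because the identifier is stored NFC normalized; it does not attempt to enforce any rule that is not visible in the pattern itself.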
Also See:
Which characters in an identifier must be escaped in CDL?
[^\x00-\x1F\x7F/_.@+a-zA-Z0-9-]
A CDL document is encoded in UTF-8, and the following characters (given as decimal ASCII codes) need to be preceded by a '\' (92) in an identifier:
32-42, 44, 58-63, 91-94, 96, 123-126
Alternatively, we can enumerate the escaped characters (using the regular expression syntax accepted by lex or flex):
idescaped = \\[ !"#$%&'()*,:;<=>?\[\\\]^`{|}~]
Then a CDL representation of an ID can be defined as a combination of regular and escaped chars:
ID = ([a-zA-Z_]|{UTF8})([a-zA-Z0-9_.@+-]|{UTF8}|{idescaped})*
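As an illustration of the escaping rule, here is a minimal Python sketch that prefixes each character from the idescaped set with a backslash. The names CDL_ESCAPED and escape_cdl_name are hypothetical and this is not the code used by ncgen or ncdump. The sketch escapes character by character only; it does not validate the first character or reject characters that are not allowed in a netCDF name at all.

```python
# Characters taken from the idescaped pattern above (the part inside the brackets).
CDL_ESCAPED = set(" !\"#$%&'()*,:;<=>?[\\]^`{|}~")


def escape_cdl_name(name: str) -> str:
    """Prefix every character in CDL_ESCAPED with a backslash."""
    return "".join("\\" + ch if ch in CDL_ESCAPED else ch for ch in name)


if __name__ == "__main__":
    print(escape_cdl_name("wind speed"))   # space is escaped
    print(escape_cdl_name("a_b.c@d+e-f"))  # regular chars pass through
```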
Must vs should ???
Uses standard XML encoding and escaping.
The chars '&', '<', '>' must be replaced by these entity references: "&amp;", "&lt;", "&gt;". In some places the single and double quote must be replaced by "&apos;" and "&quot;" respectively.
Typically an XML parser/library will handle this transparently.
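For cases where a name is written into XML by hand rather than through a full XML library, the following Python sketch shows the same escaping with the standard-library xml.sax.saxutils.escape function. The two wrapper names are hypothetical, and the split between "text" and "attribute" escaping is an assumption about where quotes need replacing.

```python
from xml.sax.saxutils import escape


def escape_xml_text(value: str) -> str:
    """Escape '&', '<' and '>' for use in element content."""
    return escape(value)


def escape_xml_attribute(value: str) -> str:
    """Escape '&', '<', '>' plus both quote characters for attribute values."""
    return escape(value, {'"': "&quot;", "'": "&apos;"})


if __name__ == "__main__":
    print(escape_xml_text("T<0 & T>-5"))
    print(escape_xml_attribute('the "best" name'))
```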
It appears that OPeNDAP allows the '/' char in an identifier? The first char can also be one of these:
[-+/%.\\*]
From the OPeNDAP lexers:
1. from dds.lex and ce_expr.lex
[-+a-zA-Z0-9_/%.\\*][-+a-zA-Z0-9_/%.\\#*]*
2. from das.lex
[-+a-zA-Z0-9_/%.\\*:()][-+a-zA-Z0-9_/%.\\#*:()]*
(same as dds plus ':','(', and ')' are added)
3. from gse.lex
[-+a-zA-Z0-9_/%.\\][-+a-zA-Z0-9_/%.\\#]*
(same as dds except that '*' is removed)
Their note:
"...Note that the DAS allows Identifiers to have parens and colons while the DDS and expr scanners don't. It's too hard to disambiguate functions when IDs have parens in them and adding colons makes parsing the array projections hard..."
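To see how the three scanner patterns differ in practice, the sketch below transcribes them into Python regular expressions and reports which ones accept a given name. The transcription is an assumption that the lex character classes carry over unchanged to Python's re syntax; the names LEXER_PATTERNS and accepting_lexers are hypothetical and are not part of any OPeNDAP library.

```python
import re

# Patterns copied from the three lexer rules quoted above.
LEXER_PATTERNS = {
    "dds": re.compile(r"[-+a-zA-Z0-9_/%.\\*][-+a-zA-Z0-9_/%.\\#*]*"),
    "das": re.compile(r"[-+a-zA-Z0-9_/%.\\*:()][-+a-zA-Z0-9_/%.\\#*:()]*"),
    "gse": re.compile(r"[-+a-zA-Z0-9_/%.\\][-+a-zA-Z0-9_/%.\\#]*"),
}


def accepting_lexers(name: str) -> list:
    """Return the labels of the patterns that match the whole name."""
    return [label for label, pattern in LEXER_PATTERNS.items()
            if pattern.fullmatch(name)]


if __name__ == "__main__":
    for candidate in ["a/b", "a:b", "*x", "#x", "x#1"]:
        print(repr(candidate), accepting_lexers(candidate))
```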
In a URL, OPeNDAP uses percent encoding (e.g. %20 for a space character).
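Percent encoding itself can be illustrated with Python's standard urllib.parse module. This is a generic sketch: which characters an OPeNDAP server expects to be encoded is not established here, and the helper names are hypothetical. By default, urllib.parse.quote leaves '/' unencoded.

```python
from urllib.parse import quote, unquote


def encode_name_for_url(name: str) -> str:
    """Percent-encode a name, leaving '/' as is (the default of quote)."""
    return quote(name)


def decode_name_from_url(encoded: str) -> str:
    """Reverse the percent encoding."""
    return unquote(encoded)


if __name__ == "__main__":
    encoded = encode_name_for_url("sea surface temp")
    print(encoded)
    print(decode_name_from_url(encoded))
```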
A direct translation of their grammar would appear to be this:
PathName={AbsolutePathName}|{RelativePathName}
Separator=[/]+
AbsolutePathName={Separator}{RelativePathName}?
RelativePathName={Component}({Separator}|{RelativePathName})*
Component=[.]|{Name}
Name=[.]|({Charx}{Character}*)|{Character}+
/* Ascii set - '/' */
Character={Charx}|[.]
/* Ascii set - '.' and '/' */
Charx=[ !"#$%&'()*+,-0123456789:;<=>?@\[\\\]^`{|}~\x00-\x1e,\x7f]
One version of the manual, apparently out of date (http://www.unidata.ucar.edu/software/netcdf/guidec/guidec-7.html#HEADING7-4)
The names of dimensions, variables and attributes consist of arbitrary sequences of alphanumeric characters (as well as underscore '_' and hyphen '-'), beginning with a letter or underscore. (However names commencing with underscore are reserved for system use.) Case is significant in netCDF names.
That would be:
[a-zA-Z_][a-zA-Z0-9_-]*
A more up-to-date version (http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html#The-NetCDF-Data-Model) documents the addition of the '.' character in names for netCDF version 3.4 in March 1998:
The names of dimensions, variables and attributes consist of arbitrary sequences of alphanumeric characters (as well as underscore '_', period '.' and hyphen '-'), beginning with a letter or underscore. (However names commencing with underscore are reserved for system use.) Case is significant in netCDF names. A zero-length name is not allowed.
That would be:
[a-zA-Z_][a-zA-Z0-9_.-]*
CF Convention 1.0:
Variable, dimension and attribute names should begin with a letter and be composed of letters, digits, and underscores:
[a-zA-Z][a-zA-Z0-9_]*
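The three documented rules can be compared side by side. The sketch below uses the regular expressions exactly as given above; the labels in DOCUMENTED_RULES and the function name accepted_by are hypothetical and chosen only for illustration.

```python
import re

# Regular expressions as stated for each documented rule.
DOCUMENTED_RULES = {
    "older_guide": r"[a-zA-Z_][a-zA-Z0-9_-]*",
    "netcdf_3.4": r"[a-zA-Z_][a-zA-Z0-9_.-]*",
    "cf_1.0": r"[a-zA-Z][a-zA-Z0-9_]*",
}


def accepted_by(name: str) -> list:
    """Return the labels of the rules whose pattern matches the whole name."""
    return [label for label, pattern in DOCUMENTED_RULES.items()
            if re.fullmatch(pattern, name)]


if __name__ == "__main__":
    for candidate in ["sea_surface_temp", "wind.speed", "_FillValue", "air-temp"]:
        print(candidate, accepted_by(candidate))
```

A name such as sea_surface_temp satisfies all three rules, while a name containing '.' satisfies only the netCDF 3.4 rule, which is why the CF form is the safest common subset of the three.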
From the code:
1. netcdf.texi: [a-zA-Z_][a-zA-Z0-9_-.]*
2. libsrc/string.c: [a-zA-Z_][a-zA-Z0-9_-.+@:()]*
3. ncgen.l: [a-zA-Z_][a-zA-Z0-9_-.+@#\[\]]*
The "extra" chars are:
The characters '@', ':', '(', and ')' were the ones added for the
chemists at the German Institute for Stratospheric Chemistry. The
inconsistency between these and what's documented in the User Guide is
intentional. We told them we would not disallow these characters but
would also not document them or support them in ncgen.
The '+' char is also in this undocumented set for the chemists, added later (2004/10/04).
The '#', '[', and ']' are a mystery. They should probably be removed.
This document is maintained by John Caron and was last updated on Jun 18, 2008