
CDM Object Names

A CDM Object name refers to the name of a Group, Dimension, Variable, Attribute, or Enum. An object name is a String, that is, a variable-length array of Unicode characters. We record here the set of allowable characters for different formats.

NetCDF-4 C library Object names refer to the name of a Group, Dimension, Variable, Attribute, user-defined Type, compound type Member, or enumeration type Symbol.

The CDM has not added user-defined types, and compound type Members are treated the same as Variables. Otherwise the two models are the same.


Proposed for NetCDF-3 and NetCDF-4 Identifiers:

A netCDF identifier is stored in a netCDF file as UTF-8 Unicode characters, NFC normalized. There are some restrictions on the valid characters used in a netCDF identifier:

  ID = ([a-zA-Z0-9_]|{UTF8})([^\x00-\x1F\x7F/]|{UTF8})*
UTF8 = multibyte UTF8 encoded char

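which says: the first character must be an ASCII letter, a digit, an underscore, or a multibyte UTF-8 encoded character, and each subsequent character may be any character except an ASCII control character (0x00-0x1F), DEL (0x7F), or the forward slash '/'.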
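
As a rough illustration of the proposed identifier pattern, the following Python sketch normalizes a name to NFC and checks it against an equivalent regular expression. The names `NETCDF_ID` and `normalize_netcdf_id` are hypothetical, and the sketch assumes that `{UTF8}` can be modelled as "any non-ASCII character", since every such character is multibyte in UTF-8. It is not taken from any netCDF library.

```python
import re
import unicodedata

# First character: ASCII letter, digit, underscore, or any non-ASCII character
# (assumption: this stands in for {UTF8}, a multibyte UTF-8 encoded char).
# Remaining characters: anything except controls (0x00-0x1F), DEL (0x7F), and '/'.
NETCDF_ID = re.compile(r"[A-Za-z0-9_\u0080-\U0010FFFF][^\x00-\x1F\x7F/]*")


def normalize_netcdf_id(name: str):
    """Return the NFC-normalized name if it matches the pattern, else None."""
    normalized = unicodedata.normalize("NFC", name)
    return normalized if NETCDF_ID.fullmatch(normalized) else None
```

Under this pattern a name such as "air temperature" is accepted (spaces are not excluded), while "group/var" and the empty string are rejected.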

Also See:

CDL

Which characters in an identifier must be escaped in CDL?

[^\x00-\x1F\x7F/_.@+a-zA-Z0-9-]

A CDL document is encoded in UTF-8, and the following characters (given as decimal ASCII codes) need to be preceded by a '\' (92) in an identifier:

 32-42,44,58-63,91-94,96,123-126

Alternatively, we can enumerate the escaped characters (using the regular expression syntax accepted by lex or flex):

idescaped = \\[ !"#$%&'()*,:;<=>?\[\\\]^`{|}~]

Then a CDL representation of an ID can be defined as a combination of regular and escaped chars:

ID = ([a-zA-Z_]|{UTF8})([a-zA-Z0-9_.@+-]|{UTF8}|{idescaped})*
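
The escaping rule can be sketched in Python as follows. This is only an illustration of the `idescaped` enumeration above. The names `CDL_ESCAPED` and `cdl_escape` are hypothetical, and the sketch does not deal with the restriction on the first character of a CDL ID.

```python
# The characters enumerated in idescaped: each must be preceded by a backslash.
CDL_ESCAPED = set(' !"#$%&\'()*,:;<=>?[\\]^`{|}~')


def cdl_escape(name: str) -> str:
    """Prefix every character from the idescaped set with a backslash."""
    return "".join("\\" + ch if ch in CDL_ESCAPED else ch for ch in name)
```

For example, `cdl_escape("wind speed")` inserts a backslash before the space, while a name such as "temp_2m.max" is returned unchanged.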

Must vs should ???

 

NcML

Uses standard XML encoding and escaping.

The characters '&', '<', and '>' must be replaced by the entity references "&amp;", "&lt;", and "&gt;". In some places (for example, inside attribute values) the single and double quote must be replaced by "&apos;" and "&quot;" respectively.

Typically an XML parser/library will handle this transparently.
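
As a small illustration using only the Python standard library, the sketch below shows both explicit escaping and the transparent handling done by an XML library. The helper names are hypothetical, and the `variable` element with a `name` attribute is used purely as an example; no NcML schema or namespace handling is implied.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def escape_xml_text(value: str) -> str:
    """Escape '&', '<', and '>' for use in XML character data."""
    return escape(value)


def escape_xml_attribute(value: str) -> str:
    """Additionally escape single and double quotes for attribute values."""
    return escape(value, {'"': "&quot;", "'": "&apos;"})


# An XML library escapes on output and unescapes on input without any help.
element = ET.Element("variable", {"name": 'T<0 & "cold"'})
serialized = ET.tostring(element, encoding="unicode")
round_tripped = ET.fromstring(serialized).get("name")
```

After the round trip, `round_tripped` holds the original, unescaped name.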

 

CDM Section Specification

 

OPeNDAP

It appears that OPeNDAP allows the '/' char in an identifier? The first char can also be one of these:

[-+/%.\\*]

From the OPeNDAP lexers:

1. from dds.lex and ce_expr.lex

       [-+a-zA-Z0-9_/%.\\*][-+a-zA-Z0-9_/%.\\#*]*

2. from das.lex

       [-+a-zA-Z0-9_/%.\\*:()][-+a-zA-Z0-9_/%.\\#*:()]*
  (same as dds, plus ':', '(', and ')')

3. from gse.lex

       [-+a-zA-Z0-9_/%.\\][-+a-zA-Z0-9_/%.\\#]*
  (same as dds, except that '*' is removed)

Their note:

"...Note that the DAS allows Identifiers to have parens and colons while the DDS and expr scanners don't. It's too hard to disambiguate functions when IDs have parens in them and adding colons makes parsing the array projections hard..."
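
To see how the three scanner patterns differ in practice, they can be transcribed into Python regular expressions and applied to sample names. This is a sketch for comparison only. The dictionary `OPENDAP_ID` and the function `accepted_by` are hypothetical names, and the patterns are copied from the list above rather than from the OPeNDAP sources.

```python
import re

# Identifier patterns transcribed from the lexer excerpts quoted above.
OPENDAP_ID = {
    "dds": re.compile(r"[-+a-zA-Z0-9_/%.\\*][-+a-zA-Z0-9_/%.\\#*]*"),
    "das": re.compile(r"[-+a-zA-Z0-9_/%.\\*:()][-+a-zA-Z0-9_/%.\\#*:()]*"),
    "gse": re.compile(r"[-+a-zA-Z0-9_/%.\\][-+a-zA-Z0-9_/%.\\#]*"),
}


def accepted_by(name: str) -> list:
    """Return the labels of the patterns that match the whole name."""
    return [label for label, pattern in OPENDAP_ID.items() if pattern.fullmatch(name)]
```

With these patterns, "group/var" is accepted by all three, "f(x)" only by the das pattern, and a name containing a space by none of them.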

In a URL, OPeNDAP uses percent encoding (e.g. %20 for a space character).
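
Percent encoding can be illustrated with the Python standard library. The helper name `encode_for_url` is hypothetical, and leaving '/' unencoded by default is an assumption made here (it is Python's own default), not something the OPeNDAP documentation is quoted as requiring.

```python
from urllib.parse import quote, unquote


def encode_for_url(name: str, safe: str = "/") -> str:
    """Percent-encode a name; characters listed in `safe` are left as-is."""
    return quote(name, safe=safe)


encoded = encode_for_url("wind speed")   # the space becomes %20
decoded = unquote(encoded)               # back to the original name
```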

HDF5

A direct translation of their grammar would appear to be this:

PathName={AbsolutePathName}|{RelativePathName}


Separator=[/]+

AbsolutePathName={Separator}{RelativePathName}?
RelativePathName={Component}({Separator}|{RelativePathName})*
Component=[.]|{Name}
Name=[.]|({Charx}{Character}*)|{Character}+
/* Ascii set - '/' */
Character={Charx}|[.]
/* Ascii set - '.' and '/' */
Charx=[ !"#$%&'()*+,-0123456789:;<=>?@\[\\\]^`{|}~\x00-\x1e,\x7f]
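
The path-level part of this grammar (a path is absolute if it starts with a separator, and a separator is one or more '/' characters) can be sketched in Python as below. The function `split_hdf5_path` is a hypothetical helper. It does not validate the characters of each component and does not interpret a "." component, so it should be read as an illustration of the grammar above, not of the HDF5 library's behavior.

```python
def split_hdf5_path(path: str):
    """Split a path into (is_absolute, components).

    Runs of '/' are treated as a single separator, following Separator=[/]+.
    """
    is_absolute = path.startswith("/")
    components = [part for part in path.split("/") if part]
    return is_absolute, components
```

For instance, "/a//b/c" yields an absolute path with components "a", "b", and "c".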

Notes

One version of the manual, apparently out of date (http://www.unidata.ucar.edu/software/netcdf/guidec/guidec-7.html#HEADING7-4), says:

The names of dimensions, variables and attributes consist of arbitrary sequences of alphanumeric characters (as well as underscore '_' and hyphen '-'), beginning with a letter or underscore. (However names commencing with underscore are reserved for system use.) Case is significant in netCDF names.

That would be:

   [a-zA-Z_][a-zA-Z0-9_-]*

A more up-to-date version (http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html#The-NetCDF-Data-Model) documents the addition of the '.' character in names for netCDF version 3.4 in March 1998:

The names of dimensions, variables and attributes consist of arbitrary sequences of alphanumeric characters (as well as underscore '_', period '.' and hyphen '-'), beginning with a letter or underscore. (However names commencing with underscore are reserved for system use.) Case is significant in netCDF names. A zero-length name is not allowed.

That would be:

   [a-zA-Z_][a-zA-Z0-9_.-]*

CF Convention 1.0:

Variable, dimension and attribute names should begin with a letter and be composed of letters, digits, and underscores:

   [a-zA-Z][a-zA-Z0-9_]*
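
The three documented rule sets above can be compared side by side with a short Python sketch. The labels in `DOCUMENTED_RULES` and the function `conforming` are hypothetical names chosen for this illustration. The patterns are the ones quoted in these notes.

```python
import re

# Name patterns as documented in the older guide, the netCDF 3.4 era
# documentation, and CF Convention 1.0 (labels are illustrative only).
DOCUMENTED_RULES = {
    "old_guide": re.compile(r"[a-zA-Z_][a-zA-Z0-9_-]*"),
    "netcdf_3_4": re.compile(r"[a-zA-Z_][a-zA-Z0-9_.-]*"),
    "cf_1_0": re.compile(r"[a-zA-Z][a-zA-Z0-9_]*"),
}


def conforming(name: str) -> list:
    """Return the labels of the rule sets that the whole name satisfies."""
    return [label for label, rule in DOCUMENTED_RULES.items() if rule.fullmatch(name)]
```

A name such as "air_temp" satisfies all three, "_FillValue" fails only the CF rule (it starts with an underscore), and "temp.max" is valid only under the rule that added the period.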

From the code:

 1. netcdf.texi:     [a-zA-Z_][a-zA-Z0-9_.-]*
 2. libsrc/string.c: [a-zA-Z_][a-zA-Z0-9_.+@:()-]*
 3. ncgen.l:         [a-zA-Z_][a-zA-Z0-9_.+@#\[\]-]*

The "extra" chars are:

  1. The characters '@', ':', '(', and ')' were the ones added for the
    chemists at the German Institute for Stratospheric Chemistry.  The
    inconsistency between these and what's documented in the User Guide is
    intentional.  We told them we would not disallow these characters but
    would also not document them or support them in ncgen.
  2. The '+' char is also in this undocumented set for the chemists, added later (2004/10/04).
  3. The '#', '[', and ']' are a mystery. They should probably be removed.

 


This document is maintained by John Caron and was last updated on Jun 18, 2008
 
 