
Re: netCDF questions



Hi John and Wendy,

> We have a couple of questions regarding netCDF, assuming you
> are still the one to contact with such questions. 

It's better to send such questions to "address@hidden".  That way
you'll get an answer even if I happen to be on vacation or out of town.

> We have been using netCDF for 3 years now, previously creating
> the binary files via calls to the netCDF C library. We have recently 
> tried to use ncgen to create the binary files, only to discover that 
> variable names that were acceptable to the C library (e.g. "Foo.bar"
> or "Foo bar") are now flagged as syntax errors by ncgen (it wants 
> something more like "Foo_bar" ). 
> 
> One question is what might be the consequences of modifying
> ncgen's lexical analyzer (ncgen.l) to allow characters such as
> periods and spaces in variable names (as is already allowed
> by the netCDF C library)?

This issue has come up several times on the netcdfgroup mailing list, so I'll
first include relevant excerpts of a posting I wrote to that mailing list
that attempted to explain why the netCDF library is more lenient about
acceptable netCDF names than the ncgen utility is about CDL names:

  ... it is possible to create netCDF files with netCDF library calls that
  ncdump and ncgen cannot handle correctly ...  First, here is what the
  netCDF User's Guide says about CDL names:

      CDL names for variables, attributes, and dimensions may be any
      combination of alphabetic or numeric characters as well as `_' and `-'
      characters, but names beginning with `_' are reserved for use by the
      library.  Case is significant in CDL names.  The netCDF library does not
      enforce any restrictions on netCDF names, so it is possible (though
      unwise) to define variables with names that are not valid CDL names.


  Since the netCDF library puts no restrictions on names (except that they
  must be shorter than MAX_NC_NAME characters) you can even create netCDF
  files that use names containing punctuation, control characters, and
  non-ASCII bytes.  The CDL data description language, however, requires more
  restrictive names to make it possible to parse CDL statements.  As an
  example of the potential parsing difficulties, if you named a variable
  `p(time)', then it would be ambiguous whether the following was a CDL
  declaration of the scalar variable `p(time)' or a 1-dimensional variable `p'
  that used the `time' dimension:

      float p(time) ;

  Similarly, names that begin with digits are parsed in CDL as numeric
  constants.

  A perverse programmer could use new lines and semicolons in netCDF variable
  names to create a netCDF file that, when dumped with ncdump, would look like
  CDL statements that had nothing to do with the contents of the file.

  To get around such possibilities, we could add a check to the library,
  applied whenever a name is defined, that the name matches the same regular
  expression used for names in CDL parsing (in ncgen/ncgen.l):

      [A-Za-z_][A-Za-z_0-9-]*

  but someone may want to write a new data description language for netCDF
  someday that permits a larger subset of names, or there may be users who
  don't use ncdump or ncgen and are already using more general names, e.g.
  names with `.' in them.  Thus adding a new restriction on names at the
  library level might break existing applications.

> Another is, looking towards the future, might the use of spaces 
> and periods within variable names someday be rejected by the 
> C library calls? How is your crystal ball?

No, as indicated above, we have no intention of changing the library in a
way that might break existing applications, so we will continue to permit
any characters to be used in netCDF variable, dimension, and attribute names.
The only problem with using names that contain punctuation is the inability
to use the ncgen utility on the output of ncdump for such files, so if
you don't need to use ncgen, there is no reason to change your existing
netCDF files.

At one point, I tried to change the grammar of CDL to permit the "."
character in CDL names because another user had asked about it, but at the
time I was unable to create a grammar acceptable to yacc that permitted it.
I'm not completely convinced this is impossible, either with yacc or with a
different parser generator, but I haven't looked at the problem again
recently.  I don't recall the details, but I believe the changes to ncgen.l
were straightforward; it was ncgen.y that I couldn't modify to make things
work.

> Another possibility (and probably the cleanest one) would be
> to use variable attributes to store our "Foo.bar" strings. This
> would, however, require us to rework a substantial amount
> of existing code.

Yes, but that's not necessary if you don't need to use the ncgen utility.

The next release of netCDF (release 2.4) will include two additional
utilities developed by Harvey Davies of CSIRO, nc2text and text2nc, that
will provide an alternative to ncdump and ncgen for displaying and
manipulating netCDF data from the command line.  I'm not sure what
restrictions these utilities place on netCDF names, but it's possible they
are less restrictive than ncgen.  I'll try to check on this next week.

--Russ

______________________________________________________________________________

Russ Rew                                           UCAR Unidata Program
address@hidden                              http://www.unidata.ucar.edu