
Re: 950215: ncdump - ncgen



> Keywords: 199502160145.AA18781

Hi Dag,

> I recently installed some of the netcdf software available from
> your anonymous FTP data base (netcdf.tar and ncopers.tar).
> I used this to create a CDL file of a netCDF datafile in order to 
> extract a subset of the file.
> The resulting CDL file contained a global attribute of the form:
> 
>    :Start Day UT = "Day 25, 1995";
> 
> When I had extracted the subset and used ncgen to generate a new
> netCDF file, I got a syntax error on this line. When I then changed this
> line to:
> 
>    :Start_Day_UT = "Day 25, 1995";
> 
> everything went as it was supposed to. ncgen obviously didn't like 
> 'Start Day UT' but preferred the one word 'Start_Day_UT'.
> 
> Of course I was unable to read the resulting netCDF file with our current
> data analysis program, since I had to change the global attribute name. 
> My question is therefore: why does ncdump make a CDL file that ncgen
> can't read? Is this a bug in the program, or am I missing something here?
> 
> I tried to do a ncdump of the file and then use ncgen to generate the netCDF 
> file again without doing anything to the CDL file...
> but got the same error message.

That's a good question, and since it has now been asked more than once, I
should add it and its answer to our FAQ list.  Here's my reply from the
last time it was asked:

> I generated the CDL file appended at the end of this message using ncdump 
> (let's call it sample.cdl). When I try ncgen sample.cdl I get:
> 
>       sample.cdl line 7: syntax error
> 
> It appears it's because it doesn't like spaces in names, since I can fix
> the CDL file by changing:
> 
>       Universal Time to Universal_Time

Yes, this is an opportunity to point out that it is possible to create
netCDF files with netCDF library calls that ncdump and ncgen cannot handle
correctly, and to explain why.  First, here is what the netCDF User's Guide
says about CDL names:

    CDL names for variables, attributes, and dimensions may be any
    combination of alphabetic or numeric characters as well as `_' and `-'
    characters, but names beginning with `_' are reserved for use by the
    library.  Case is significant in CDL names.  The netCDF library does not
    enforce any restrictions on netCDF names, so it is possible (though
    unwise) to define variables with names that are not valid CDL names.
    The names for the primitive data types are reserved words in CDL, so the
    names of variables, dimensions, and attributes must not be type names.

Since the netCDF library puts no restrictions on names (except that they
must be shorter than MAX_NC_NAME characters) you can even create netCDF
files that use names containing punctuation, control characters, and
non-ASCII bytes.  The CDL data description language, however, requires more
restrictive names to make it possible to parse CDL statements.  As an example
of the potential parsing difficulties, if you named a variable `p(time)',
then it would be ambiguous whether the following was a CDL declaration of
the scalar variable `p(time)' or a 1-dimensional variable `p' that used the
`time' dimension:

    float p(time) ;

Similarly, names that begin with digits are parsed in CDL as numeric
constants.

A perverse programmer could use newlines and semicolons in netCDF variable
names to create a netCDF file that, when dumped with ncdump, would look like
CDL statements that had nothing to do with the contents of the file.
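
To make the last few points concrete, here is a minimal Python sketch.  It
is not ncdump's code: naive_declaration is a hypothetical helper that pastes
a name into a declaration without any escaping, which is only an assumption
about how a simple dumper might behave.

```python
def naive_declaration(cdl_type, name, dims=()):
    """Format a CDL-style variable declaration without escaping the name.

    Hypothetical helper for illustration only; not taken from ncdump.
    """
    shape = "(" + ", ".join(dims) + ")" if dims else ""
    return f"    {cdl_type} {name}{shape} ;"


# A scalar variable literally named "p(time)" and a 1-dimensional variable
# "p" on the "time" dimension produce exactly the same text.
scalar_text = naive_declaration("float", "p(time)")
vector_text = naive_declaration("float", "p", ("time",))
print(scalar_text == vector_text)

# A hypothetical hostile name containing a semicolon and a newline shows up
# in the dump as two declarations, one of which is not in the file at all.
hostile = "x ;\n    double y(time)"
print(naive_declaration("float", hostile))
```

Once the text is written out, nothing in it distinguishes either case from
ordinary declarations, so no parser could recover the original names.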

To guard against such possibilities, we could make the library check,
whenever a name is defined, that it conforms to the regular expression used
for names in CDL parsing (in ncgen/ncgen.l)

    [A-Za-z_][A-Za-z_0-9-]*

but someone may want to write a new data description language for netCDF
someday that permits a larger set of names, or there may be users who
don't use ncdump or ncgen and are already using more general names, e.g.
with `.' in them.  Thus adding a new restriction on names at the library
level might break existing applications.
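
Until then, an application can apply the check itself before it defines a
name.  The following is a minimal Python sketch that uses the regular
expression quoted above; is_cdl_name and to_cdl_name are hypothetical
helpers, not part of the netCDF library, and the replacement rule (an
underscore for each invalid character, an `n' prefix for a bad first
character) is just one assumed convention.  It does not check for the
reserved type names.

```python
import re

# The name pattern quoted above from ncgen/ncgen.l.
CDL_NAME = re.compile(r"[A-Za-z_][A-Za-z_0-9-]*")


def is_cdl_name(name):
    """Return True if the whole name matches the CDL name pattern."""
    return CDL_NAME.fullmatch(name) is not None


def to_cdl_name(name):
    """Hypothetical helper: map an arbitrary name to a CDL-safe one."""
    # Replace every character the pattern does not allow with '_'.
    cleaned = re.sub(r"[^A-Za-z_0-9-]", "_", name)
    # The first character must be a letter or '_'; prefix a letter if not.
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "n" + cleaned
    return cleaned


for candidate in ["Start_Day_UT", "Start Day UT", "p(time)", "2day"]:
    print(repr(candidate), is_cdl_name(candidate), to_cdl_name(candidate))
```

Note that renaming is only a workaround: as in the case above, a program
that looks up an attribute by its original name will not find the renamed
one.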

______________________________________________________________________________

Russ Rew                                                UCAR Unidata Program
address@hidden                                          P.O. Box 3000
http://www.unidata.ucar.edu/                          Boulder, CO 80307-3000
______________________________________________________________________________