
Re: [Fwd: Documentation for backslash encoding of String metadata]



Bob,

> >> I'm working on switching to netCDF-java 4.0. Sorry for the delay.
> >>
> >> With netCDF-java 4.0, there seems to have been a change in the way
> >> special characters are encoded in attribute values and when they appear
> >> in ncdump.
> >>
> >> I couldn't find a mention of this at
> >> http://www.unidata.ucar.edu/software/netcdf-java/v4.0/CHANGES
> >> or documentation at
> >> http://www.unidata.ucar.edu/software/netcdf-java/v4.0/javadoc/index.html
> >> or
> >> http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html
> >>
> >> Are the details documented somewhere?
> >
> > As far as I know, we haven't changed the way special characters are
> > encoded in attribute values in the C-based interfaces.  I may have
> > overlooked something, but could you give me an example of an attribute
> > value that's displayed differently by ncdump in netCDF-4 than it was by
> > a previous version of ncdump?
>
> Yes, if I use NCdump.print in netcdfJava 2.2.22 (and previous),
> characters like ' " and newline exist as single characters.
> (Actually, the " was troublesome because the entire String attribute was
> displayed with " at the beginning and end -- it should have been encoded.)
>
> But if I use NCdump.print in netcdfJava 4.0, characters like ' and
> newline (and probably ", but I haven't tested it yet) exist as two
> characters, backslash plus the character (or n for newline) as in a Java
> or JSON-encoded String.
> ' looks odd because possessive words now have an internal backslash:
> e.g., Bob\'s
> newline looks odd because it takes away the visual formatting that
> occurs in the attribute (e.g., a history attribute with a separate line
> for each processing step).
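A minimal sketch of the escaping Bob describes for netCDF-Java 4.0 output. This is a hypothetical illustration, not the library's actual code. The apostrophe and newline cases follow his observations. The double-quote case is an assumption, since he had not yet tested it. Escaping the backslash itself is also an assumption, added so the output stays unambiguous. The class name `CdlEscapeSketch` is made up for this example.

```java
/**
 * Hypothetical sketch of backslash-escaping a string attribute value for display,
 * as described for NCdump.print in netCDF-Java 4.0. Not the library's real code.
 */
public class CdlEscapeSketch {

    /** Returns the value with special characters replaced by backslash escapes. */
    public static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break; // assumed: escape the escape character itself
                case '\'': out.append("\\'");  break; // observed: Bob's -> Bob\'s
                case '"':  out.append("\\\""); break; // assumed: not yet tested in the thread
                case '\n': out.append("\\n");  break; // observed: newline -> backslash + n
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String history = "Bob's first step\nsecond step";
        // Prints: "Bob\'s first step\nsecond step" on a single line
        System.out.println("\"" + escape(history) + "\"");
    }
}
```

The single-line output shows why a multi-line history attribute loses its visual formatting once newlines are escaped.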

You're right that escaping the apostrophe doesn't seem necessary in CDL
for attribute string values, but I just verified that that particular
escape has been generated by ncdump since at least version 2.3.2, first
released in 1993, and perhaps by earlier versions as well.  It should have
been documented in the User's Guide section on CDL Notation for Data
Constants:

  http://www.unidata.ucar.edu/netcdf/docs/netcdf.html#CDL-Constants

but I see that it's not mentioned there.  Although it might be
considered a bug, I think you're the first to point it out.  The
original reason for the escape was probably for single character
constants, which use the CDL notation

  ownership = 'B', 'o', 'b', '\'', 's';

and that was carried over to the string notation unnecessarily.  The
ncgen utility parses it correctly, so it's only the CDL representation
that's wrong.  At this point, I'm reluctant to change it, because it
would also require changes in ncgen, several of the tests, and any other
user or commercial software that depends on CDL.
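To see why the extra escape is harmless when the CDL is read back in, here is a minimal Java sketch of decoding backslash escapes in the body of a CDL string constant. It is only an illustration: ncgen itself is written in C and its real lexer handles more cases. This sketch covers only `\\`, `\'`, `\"` and `\n`. It assumes that any other escape is kept verbatim. The class name `CdlUnescapeSketch` is hypothetical.

```java
/**
 * Hypothetical sketch of decoding a subset of CDL string escapes.
 * Not ncgen's actual parser; unknown escapes are kept verbatim (an assumption).
 */
public class CdlUnescapeSketch {

    /** Decodes backslash escapes in a CDL string body (surrounding quotes already removed). */
    public static String unescape(String body) {
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\' || i + 1 == body.length()) {
                out.append(c); // ordinary character, or a trailing lone backslash
                continue;
            }
            char next = body.charAt(++i); // consume the escaped character
            switch (next) {
                case 'n': out.append('\n'); break;
                case '\\': case '\'': case '"': out.append(next); break;
                default: out.append('\\').append(next); // unknown escape: keep as-is (assumption)
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The unnecessary escape decodes to the same value as the unescaped form.
        System.out.println(unescape("Bob\\'s").equals(unescape("Bob's"))); // true
    }
}
```

Because `Bob\'s` and `Bob's` decode to the same value, the escape affects only how the CDL looks, not the data that ends up in the file.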

--Russ