
Re: [Fwd: Documentation for backslash encoding of String metadata]



Bob, what you're seeing is probably sloppiness in the netcdf-java library. It
will take me a few days to figure out what's going on.

Russ Rew wrote:
> Bob,
>
>>>> I'm working on switching to netCDF-java 4.0. Sorry for the delay.
>>>>
>>>> With netCDF-java 4.0, there seems to have been a change in the way
>>>> special characters are encoded in attribute values and when they appear
>>>> in ncdump.
>>>>
>>>> I couldn't find a mention of this at
>>>> http://www.unidata.ucar.edu/software/netcdf-java/v4.0/CHANGES
>>>> or documentation at
>>>> http://www.unidata.ucar.edu/software/netcdf-java/v4.0/javadoc/index.html
>>>> or
>>>> http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html
>>>>
>>>> Are the details documented somewhere?
>>> As far as I know, we haven't changed the way special characters are
>>> encoded in attribute values in the C-based interfaces.  I may have
>>> overlooked something, but could you give me an example of an attribute
>>> value that's displayed differently by ncdump in netCDF-4 than it was by
>>> a previous version of ncdump?
>> Yes, if I use NCdump.print in netcdfJava 2.2.22 (and previous),
>> characters like ' " and newline exist as single characters.
>> (Actually, the " was troublesome because the entire String attribute was
>> displayed with " at the beginning and end -- it should have been encoded.)
>>
>> But if I use NCdump.print in netcdfJava 4.0, characters like ' and
>> newline (and probably ", but I haven't tested it yet) exist as two
>> characters, backslash plus the character (or n for newline) as in a Java
>> or JSON-encoded String.
>> ' looks odd because possessive words now have an internal backslash:
>> e.g., Bob\'s
>> newline looks odd because it takes away the visual formatting that
>> occurs in the attribute (e.g., a history attribute with a separate line
>> for each processing step).
>
> You're right that escaping the apostrophe doesn't seem necessary in CDL
> for attribute string values, but I just verified that that particular
> escape has been generated by ncdump since at least version 2.3.2, first
> released in 1993, and perhaps versions previous to that.  It should have
> been documented in the User's Guide section on CDL Notation for Data
> Constants:
>
>   http://www.unidata.ucar.edu/netcdf/docs/netcdf.html#CDL-Constants
>
> but I see that it's not mentioned there.  Although it might be
> considered a bug, I think you're the first to point it out.  The
> original reason for the escape was probably for single character
> constants, which use the CDL notation
>
>   ownership = 'B', 'o', 'b', '\'', 's';
>
> and that was carried over to the string notation unnecessarily.  The
> ncgen utility parses it correctly, so it's only the CDL representation
> that's wrong.  At this point, I'm reluctant to change it, because it
> would also require changes in ncgen, several of the tests, and any other
> user or commercial software that depends on CDL.
>
> --Russ
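
For reference, the backslash encoding discussed above can be sketched in a few
lines of Java. This is only an illustration of the behavior described in this
thread (apostrophe, double quote and newline each become backslash plus one
character), not the actual code in netcdf-java's NCdump or in the C ncdump.
The class name CdlEscapeSketch and its methods are hypothetical, and escaping
the backslash itself is an assumption added so that the encoding can be
reversed.

```java
// Hypothetical helper for illustration only; not part of netcdf-java or ncdump.
final class CdlEscapeSketch {

    // Backslash-escape the characters mentioned in this thread.
    // Escaping '\\' itself is an assumption, added so unescape() can round-trip.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break; // backslash -> two backslashes
                case '\'': sb.append("\\'");  break; // '  -> \'
                case '"':  sb.append("\\\""); break; // "  -> \"
                case '\n': sb.append("\\n");  break; // newline -> \n (two characters)
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Reverse of escape(). Only \n is mapped to a control character; any other
    // escaped character is copied through literally, so sequences such as \t
    // are NOT interpreted by this sketch.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                sb.append(next == 'n' ? '\n' : next);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "Bob's file\nstep 2";
        String escaped = escape(original);
        System.out.println(escaped);                            // Bob\'s file\nstep 2
        System.out.println(unescape(escaped).equals(original)); // true
    }
}
```

A client that wants the old visual formatting back (for example, one line per
processing step in a history attribute) could run something like unescape() on
the dumped attribute text before displaying it.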