netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.
On 2005.05.10 10:22 Russ Rew wrote:
[...]all current ASCII-encoded names are already UTF-8.
Unfortunately, this is not quite true. People have been putting anything they want in path names, including extended ASCII. Bytes > 127 are not necessarily legal UTF-8, so we can't just say that all existing files are UTF-8. (This doesn't harm the file or the library, but tools will have problems if we tell them a name is UTF-8 and it isn't.)
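To make the point about bytes > 127 concrete, here is a minimal Python sketch (not part of the library or of the proposal). The helper name is_valid_utf8 and the sample names are made up for illustration. It shows that a name written in a single-byte "extended ASCII" encoding such as Latin-1 can fail to decode as UTF-8, while pure ASCII names always decode.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw name bytes form a legal UTF-8 sequence.

    Hypothetical helper for illustration only.
    """
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Sample names (invented for this example).
ascii_name = b"temperature"                # pure 7-bit ASCII
latin1_name = "caf\u00e9".encode("latin-1")  # b'caf\xe9', one byte > 127
utf8_name = "caf\u00e9".encode("utf-8")      # b'caf\xc3\xa9'

for raw in (ascii_name, latin1_name, utf8_name):
    print(raw, is_valid_utf8(raw))
```

The Latin-1 name is rejected because the byte 0xE9 starts a multi-byte UTF-8 sequence that is never completed, which is the kind of trouble a tool would hit if such a file were simply declared to be UTF-8.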
I don't know why you would want to support more than one encoding for names,
We have many requests for non-English character sets, so it would be nice to support them in the future. Between the above gotcha and the desire to someday support other char sets, the idea is to make ASCII and UTF-8 the first of possibly many.
Since existing files may well have non-UTF-8 names in them, ASCII must be the default for backward compatibility.
At least one library change is needed to support UTF-8 encoded names, specifically for iterating through dataset names in a Group in "alphabetical order". For names with non-ASCII characters, this order should follow the Unicode collation algorithm.
My understanding is that the current proposal will sort the objects by the numeric value of the bytes in their names in all cases. I don't know whether UTF-8 has a different collating order than this; if so, it won't be implemented at this time.
I'm trying to determine if the proposed changes address your requirements well enough to be worth doing.
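As a rough illustration of sorting by the numeric value of the bytes, here is a small Python sketch. The function sort_by_bytes and the sample names are assumptions made up for this example, not the library's implementation. It shows that byte order on UTF-8 names matches Unicode code point order, which is not the same thing as a language-aware collation.

```python
def sort_by_bytes(names):
    """Sort names by the numeric value of their UTF-8 bytes.

    Hypothetical helper; it only mimics the ordering described above.
    """
    return sorted(names, key=lambda n: n.encode("utf-8"))


# Sample dataset names (invented for this example).
names = ["zebra", "apple", "Apple", "\u00c9cole", "eagle"]

byte_order = sort_by_bytes(names)
print(byte_order)

# Python compares str values by code point, so this is the same order.
print(byte_order == sorted(names))
```

Under this ordering all uppercase ASCII letters sort before lowercase ones, and a name starting with an accented letter such as "É" sorts after "zebra", because its first byte is greater than 127. A collation following the Unicode collation algorithm would typically place that name next to the other names starting with "e", so the two orders do differ for non-ASCII names.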