Re: when will HDF5 support Unicode?

NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.

To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: Re: when will HDF5 support Unicode?
From: "Robert E. McGrath" <mcgrath@xxxxxxxxxxxxx>
Date: Wed, 11 May 2005 11:06:34 -0500


On 2005.05.10 10:22 Russ Rew wrote:

[...]all current ASCII-encoded names are
already UTF-8.


Unfortunately this is not quite true.  People have been putting anything
they want in path names, including extended ASCII.  Bytes > 127 are not
necessarily legal UTF-8, so we can't just say all existing files are
UTF-8. (This doesn't harm the file or library, but tools will have
problems if we tell them it's UTF-8 and it isn't.)
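
To illustrate the problem, here is a minimal Python sketch. The helper
name is_valid_utf8 and the sample path names are made up for this
example and are not part of the HDF5 library. It assumes a name is
available as raw bytes and simply tries to decode it as UTF-8:

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw bytes of a name decode as strict UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical path names, as they might be stored in an existing file.
ascii_name = b"/group/dataset1"                      # pure 7-bit ASCII
latin1_name = "/donn\u00e9es".encode("latin-1")      # "extended ASCII": e-acute is the single byte 0xE9
utf8_name = "/donn\u00e9es".encode("utf-8")          # same text, e-acute is the bytes 0xC3 0xA9

for raw in (ascii_name, latin1_name, utf8_name):
    print(raw, is_valid_utf8(raw))
```

Pure ASCII always passes, but the Latin-1 name fails: 0xE9 looks like
the start of a three-byte UTF-8 sequence and the bytes after it are not
continuation bytes. A tool that was told the name is UTF-8 would trip
over exactly this case.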


I don't know why you would want to support more than one encoding for
names,


We have many requests for non-English character sets, so it would
be nice to support them in the future. Between the above gotcha and
the desire to someday support other char sets, the idea is to make
ASCII and UTF-8 be the first of possibly many.

Since existing files may well have non-UTF-8 bytes in their names, ASCII
must be the default for backward compatibility.


At least one library change is needed to support UTF-8 encoded names,
specifically for iterating through dataset names in a Group in
"alphabetical order".  For names with non-ASCII characters, this order
should follow the Unicode collation algorithm.

My understanding is that the current proposal will sort the objects by
numeric value of the bytes in the names for all cases. I don't know if
UTF-8 has a different collating order than this, if so, it won't be
implemented at this time.
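
For reference, sorting UTF-8 encoded names by byte value gives the same
order as sorting by Unicode code point, which is a property of the
UTF-8 encoding. That is not the same thing as the Unicode collation
algorithm, which orders text by language-aware rules. A small Python
sketch of the byte-value ordering, using made-up dataset names:

```python
# Hypothetical dataset names; \u00c5 is A-ring, \u00f6 is o-umlaut, \u00e9 is e-acute.
names = ["zebra", "apple", "Zulu", "\u00e9clair", "\u00c5ngstr\u00f6m"]

# Order by the numeric value of the bytes in the UTF-8 encoded name.
by_bytes = sorted(names, key=lambda s: s.encode("utf-8"))

# Python compares str values by code point, so this is code point order.
by_code_point = sorted(names)

print(by_bytes)
print(by_bytes == by_code_point)
```

Here the byte order puts "Zulu" ahead of "apple" (uppercase letters have
smaller byte values) and puts both accented names after "zebra". A
collation-based sort would typically place the accented names next to
their unaccented neighbours instead, so the two orders can differ for
non-ASCII names.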

I'm trying to determine if the proposed changes address your
requirements well enough to be worth doing.

References:
- Re: when will HDF5 support Unicode?
  - From: Russ Rew
