netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.
> We've had several discussions of UTF-8 support. The current ideas are
> incorporated in a RFC at:
>
> http://hdf.ncsa.uiuc.edu/RFC/Unicode/Unicode.html
>
> Close reading of this RFC will indicate that we know how to support
> UTF-8 for user data, but support for UTF-8 for names is still TBD.

I would consider supporting only UTF-8 for names but permit users to
specify other encodings as well for user data, for two reasons:

 - fixed-width encodings (like UCS2) permit quick access to the nth
   character in a string

 - other encodings may permit more compact representation than UTF-8
   for strings that contain a lot of non-ASCII characters

Joel Spolsky's column is a good introduction to some Unicode issues,
but I recommend this article for developers:

  http://www.w3.org/TR/charmod/

For example, the above gives examples of some of the complications in
sorting datasets alphabetically in a Group if you support Unicode
names. You might need to use the "Unicode Collation Algorithm" in that
case. Fortunately, there are open source implementations for such
things in ICU (International Components For Unicode):

  http://icu.sourceforge.net/

--Russ
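
To make the two encoding trade-offs and the sorting complication concrete, here is a minimal Python sketch. It is only an illustration, not anything from the RFC or from netCDF/HDF5: the helper names (encoded_sizes, nth_char_fixed_width) and the sample strings are hypothetical, and UTF-16-LE is used as a stand-in for UCS-2, which it matches only for characters in the Basic Multilingual Plane.

```python
# Illustrative sketch only; helper names and sample strings are hypothetical.

def encoded_sizes(text):
    """Return the byte length of text in UTF-8 and in 2-byte code units."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # Assumption: UTF-16-LE stands in for UCS-2 (identical for BMP text).
        "ucs-2": len(text.encode("utf-16-le")),
    }

def nth_char_fixed_width(data, n):
    """Fetch character n from 2-byte-per-character data by direct offset.

    Valid only when every character fits in one 2-byte unit (BMP text);
    no scan from the start of the string is needed.
    """
    return data[2 * n : 2 * n + 2].decode("utf-16-le")

ascii_name = "temperature"
cjk_name = "\u6e29\u5ea6\u30c7\u30fc\u30bf"  # five BMP characters

# ASCII is more compact in UTF-8; this CJK sample is more compact in 2-byte units.
print(encoded_sizes(ascii_name))
print(encoded_sizes(cjk_name))

# Constant-time access to the third character (index 2).
print(nth_char_fixed_width(cjk_name.encode("utf-16-le"), 2))

# Plain code point ordering is not alphabetical ordering: uppercase sorts
# before lowercase, and the accented e (U+00E9) sorts after "z".
names = ["zebra", "Zebra", "apple", "\u00e9lan"]
by_code_point = sorted(names)
print(by_code_point)
```

The last part shows why a naive sort of dataset names is not enough once Unicode names are allowed; a locale-aware ordering would need something like the Unicode Collation Algorithm as implemented in ICU, which this sketch does not attempt.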