Re: when will HDF5 support Unicode?

NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.

Hi Russ,

> >     I'll write some tests that check for proper insertion of non-ASCII
> > strings as object & attribute names and let you know what I find out.
> > 
> >     Note that Unicode strings as elements of a dataset is harder and
> > probably won't work correctly currently.
> 
> Right.  For data, multiple encodings would have to be supported.  What
> we're considering is an "_Encoding" attribute that would identify the
> character encoding for a string, e.g.
> 
>   String Address;
>      Address:_Encoding = "UTF-8";
> 
> For backward compatibility, we would have to assume no encoding when
> this attribute is not specified.  With this implementation of Unicode
> strings and the ability to store arbitrary arrays of bytes, there
> might not be any implications for the HDF5 library.

    This is OK, but perhaps we should add a new character set type,
H5T_CSET_UTF8, instead, so that the information about the string's encoding is
included directly in the file format?
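
    For reference, the "_Encoding" attribute convention quoted above could be
sketched roughly as follows. This is only an illustration in plain Python, not
HDF5 or netCDF library code: the decode_string helper and the attribute
dictionary are hypothetical stand-ins for however a reader would look up a
variable's attributes.

```python
from typing import Dict, Union


def decode_string(raw: bytes, attributes: Dict[str, str]) -> Union[str, bytes]:
    """Decode raw bytes using the variable's "_Encoding" attribute, if any.

    Hypothetical helper: 'attributes' stands in for a variable's attributes.
    """
    encoding = attributes.get("_Encoding")
    if encoding is None:
        # Backward compatibility: without the attribute no encoding is
        # assumed, so the bytes are returned untouched.
        return raw
    return raw.decode(encoding)


# A variable tagged as in the quoted example: Address:_Encoding = "UTF-8"
address_attrs = {"_Encoding": "UTF-8"}
address_bytes = "Zürich".encode("utf-8")
decoded = decode_string(address_bytes, address_attrs)

# An older variable with no "_Encoding" attribute stays as raw bytes.
legacy = decode_string(b"Boulder", {})
```

    With a character set recorded in the datatype, the same tag would live in
the file format itself rather than in an attribute that readers have to know to
look for.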

    Quincey

P.S. - This reminds me that I will need to add an "encoding" attribute to the
    object names in groups so that UTF-8 names can be distinguished from ASCII
    names. :-)

