netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.
Hi Quincey,

I'd like to reconsider the Unicode issue, and specifically ask about the feasibility of what we hope is a small addition to HDF5. It would allow netCDF to support UTF-8 encoded names for variables, dimensions, and attributes without HDF5 itself having to support such encoded names.

We would like to just declare in the netCDF documentation that the names of netCDF variables, dimensions, and attributes are UTF-8 encoded when provided to or returned from netCDF interfaces. This is backwards compatible, because we currently only support ASCII strings (with some restrictions). What we're proposing would just remove those restrictions and allow non-ASCII bytes (with the upper bit set), so that other Unicode characters can be represented in UTF-8.

What we would need from HDF5 is a way to request that names for Datasets and Attributes allow an arbitrary byte array, so we can use UTF-8 encoding for non-ASCII characters. Is this feasible? Beyond that, we would need no library changes in netCDF to support UTF-8 encoding for Unicode names. Some applications, such as ncdump and ncgen, will have to know how to handle encoded names, but we are willing to deal with that.

Note that we're not requesting that you drop restrictions on all names, just that you provide a way for netCDF-4 to use names with non-ASCII bytes. For example, this could be a call to a function that says checking of new names will subsequently be lenient (you could still disallow empty names, names with embedded null characters, or names that are too long). Existing code that didn't invoke this call would still have to abide by the current name restrictions.

Also, I notice that the documentation for H5Acreate and H5Dcreate at

http://hdf.ncsa.uiuc.edu/HDF5/doc/RM_H5A.html#Annot-Create
http://hdf.ncsa.uiuc.edu/HDF5/doc/RM_H5D.html#Dataset-Create

currently lists no restriction of names to ASCII characters, but the Introduction to HDF5 says

"A dataset name is a sequence of alphanumeric ASCII characters."

--Russ
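To make the difference between the current and the requested checking concrete, here is a minimal sketch in C. It is not HDF5 or netCDF code: the function names and the length limit are hypothetical, chosen only for illustration, and the strict check is just one reading of the "alphanumeric ASCII" wording in the Introduction to HDF5.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit, for illustration only; not an HDF5 constant. */
#define SKETCH_MAX_NAME_LEN 256

/*
 * Lenient check (hypothetical): accept any byte sequence, including
 * bytes with the upper bit set as UTF-8 requires, provided the name is
 * non-empty, contains no embedded null byte, and is not too long.
 */
bool sketch_name_ok_lenient(const char *name, size_t len)
{
    if (name == NULL || len == 0)
        return false;                 /* empty names stay disallowed */
    if (len > SKETCH_MAX_NAME_LEN)
        return false;                 /* overly long names stay disallowed */
    for (size_t i = 0; i < len; i++) {
        if (name[i] == '\0')
            return false;             /* embedded nulls stay disallowed */
    }
    return true;
}

/*
 * Strict check (hypothetical): the same rules as above, plus every
 * byte must be an alphanumeric ASCII character.
 */
bool sketch_name_ok_strict(const char *name, size_t len)
{
    if (!sketch_name_ok_lenient(name, len))
        return false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        bool is_digit = (c >= '0' && c <= '9');
        bool is_upper = (c >= 'A' && c <= 'Z');
        bool is_lower = (c >= 'a' && c <= 'z');
        if (!(is_digit || is_upper || is_lower))
            return false;             /* rejects all non-ASCII bytes */
    }
    return true;
}
```

Under these assumptions, a UTF-8 name such as "temp\xC3\xA9rature" (12 bytes) passes the lenient check but fails the strict one, while a plain ASCII name such as "temp2" passes both. Code that never asked for lenient checking would keep the strict behavior.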