
[UDUNITS #DRL-868187]: udunits-2.0.0: locale character set determination bug


> There's a problem in the regex code used in the udunits2 command-line
> utility for detecting the correct default character set to use. The regular
> expressions are compiled without the REG_EXTENDED flag, and the
> regular expressions themselves don't properly match the appropriate
> targets.

Thanks for sending this in.

Because the code is intended to match a character-set specification
regardless of where it occurs in a string, I modified your suggested
regular expressions somewhat.  The code is now

            {"^c$",                     UT_ASCII},
            {"^posix$",                 UT_ASCII},
            {"ascii",                   UT_ASCII},
            {"latin.?1([^0-9]|$)",      UT_LATIN1},
            {"8859.?1([^0-9]|$)",       UT_LATIN1},
            {"utf.?8([^0-9]|$)",        UT_UTF8},

I did add the REG_EXTENDED option.  I forgot that the "?" metacharacter
only exists in extended regular expressions.
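
For anyone reading this later, here is a minimal sketch of how a table like the one above might be compiled and searched with the POSIX regex API. It is not the actual udunits2 source: the `encoding_t` enum, the `guess_encoding()` function, and the use of REG_ICASE and REG_NOSUB are assumptions made for illustration. Only the patterns and the REG_EXTENDED flag come from this message.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for the UT_ASCII / UT_LATIN1 / UT_UTF8 values. */
typedef enum { ENC_ASCII, ENC_LATIN1, ENC_UTF8, ENC_UNKNOWN } encoding_t;

/* Patterns copied from the message; the first match wins. */
static const struct {
    const char *pattern;
    encoding_t  encoding;
} charset_table[] = {
    {"^c$",                ENC_ASCII},
    {"^posix$",            ENC_ASCII},
    {"ascii",              ENC_ASCII},
    {"latin.?1([^0-9]|$)", ENC_LATIN1},
    {"8859.?1([^0-9]|$)",  ENC_LATIN1},
    {"utf.?8([^0-9]|$)",   ENC_UTF8},
};

/*
 * Returns the encoding implied by a locale or codeset name such as
 * "en_US.UTF-8", or ENC_UNKNOWN if no pattern matches.
 */
encoding_t guess_encoding(const char *name)
{
    assert(name != NULL);

    for (size_t i = 0; i < sizeof charset_table / sizeof charset_table[0]; i++) {
        regex_t re;

        /* REG_EXTENDED is required for "?" and "|" to act as operators.
         * REG_ICASE and REG_NOSUB are assumptions of this sketch. */
        if (regcomp(&re, charset_table[i].pattern,
                    REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
            continue;   /* skip a pattern that fails to compile */

        int status = regexec(&re, name, 0, NULL, 0);
        regfree(&re);

        if (status == 0)
            return charset_table[i].encoding;
    }
    return ENC_UNKNOWN;
}
```

Under these assumptions, `guess_encoding("ISO-8859-1")` selects Latin-1 while `guess_encoding("ISO-8859-15")` does not, because the trailing `([^0-9]|$)` refuses a digit after the "1".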

> I've attached a patch which addresses the problem.
> Would using something like libcharset
> <URL: http://www.haible.de/bruno/packages-libcharset.html> be an
> option? nl_langinfo(CODESET) is the semi-portable way of detecting
> locale character encoding, but it is notoriously arbitrary.

I did it the way I did because that heuristic is not reliable,
especially for only a few short strings.  Determining the character
set is outside the scope of a units package and is most appropriately
the responsibility of the client.

Again, thanks for sending this in.

> Best regards,
> Sam Yates

Steve Emmerson

Ticket Details
Ticket ID: DRL-868187
Department: Support UDUNITS
Priority: Normal
Status: Closed
