[UDUNITS #DRL-868187]: udunits-2.0.0: locale character set determination bug


> There's a problem in the regex code used in the udunits2 command-line
> utility for detecting the correct default character set to use. The regular
> expressions are compiled without the REG_EXTENDED flag, and the
> regular expressions themselves don't properly match the appropriate
> targets.

Thanks for sending this in.

Because the code is intended to match a character-set specification
regardless of where it occurs in a string, I modified your suggested
regular expressions somewhat.  The code is now:

            {"^c$",                     UT_ASCII},
            {"^posix$",                 UT_ASCII},
            {"ascii",                   UT_ASCII},
            {"latin.?1([^0-9]|$)",      UT_LATIN1},
            {"8859.?1([^0-9]|$)",       UT_LATIN1},
            {"utf.?8([^0-9]|$)",        UT_UTF8},

I also added the REG_EXTENDED flag.  I had forgotten that the "?"
metacharacter exists only in extended regular expressions.
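
For illustration, here is a minimal sketch of how a table like the one
above might be applied to a locale name. It is not the actual udunits2
source: the encoding_t enum stands in for the library's UT_ASCII, UT_LATIN1,
and UT_UTF8 constants, guess_encoding() is a hypothetical helper, and the
use of REG_ICASE and REG_NOSUB is an assumption. Without REG_EXTENDED, the
"?", "(", "|", and ")" characters would be treated as literals and none of
the last three patterns would match a typical locale name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's encoding constants. */
typedef enum { ENC_ASCII, ENC_LATIN1, ENC_UTF8, ENC_UNKNOWN } encoding_t;

/* Pattern table, in the same order as in the message above. */
static const struct {
    const char *pattern;
    encoding_t  encoding;
} table[] = {
    {"^c$",                ENC_ASCII},
    {"^posix$",            ENC_ASCII},
    {"ascii",              ENC_ASCII},
    {"latin.?1([^0-9]|$)", ENC_LATIN1},
    {"8859.?1([^0-9]|$)",  ENC_LATIN1},
    {"utf.?8([^0-9]|$)",   ENC_UTF8},
};

/* Return the encoding of the first pattern that matches locale_name,
 * or ENC_UNKNOWN if none does. */
encoding_t guess_encoding(const char *locale_name)
{
    assert(locale_name != NULL);

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        regex_t re;

        /* REG_EXTENDED enables "?", "|", and grouping with "(...)".
         * REG_ICASE and REG_NOSUB are assumptions of this sketch. */
        if (regcomp(&re, table[i].pattern,
                    REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
            continue;               /* skip a pattern that fails to compile */

        int rc = regexec(&re, locale_name, 0, NULL, 0);
        regfree(&re);

        if (rc == 0)
            return table[i].encoding;
    }
    return ENC_UNKNOWN;
}
```

With this sketch, a name such as "en_US.UTF-8" maps to the UTF-8 entry,
while "en_US.ISO-8859-15" matches nothing because the trailing
"([^0-9]|$)" rejects the digit that follows "8859-1".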

> I've attached a patch which addresses the problem.
> Would using something like libcharset
> <URL: http://www.haible.de/bruno/packages-libcharset.html> be an
> option? nl_langinfo(CODESET) is the semi-portable way of detecting
> locale character encoding, but it is notoriously arbitrary.

I did it the way I did because that heuristic is not reliable,
especially when applied to only a few short strings.  Determining the
character set is outside the scope of a units package and is most
appropriately the responsibility of the client.

Again, thanks for sending this in.

> Best regards,
> Sam Yates

Steve Emmerson

