digitalmars.D.bugs - [Issue 1235] New: std.string.tolower() fails on certain utf8 characters
- d-bugmail puremagic.com (33/33) May 15 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1235
- d-bugmail puremagic.com (5/5) Jun 28 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1235
- d-bugmail puremagic.com (9/9) Jul 01 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1235
http://d.puremagic.com/issues/show_bug.cgi?id=1235
Summary: std.string.tolower() fails on certain utf8 characters
Product: D
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: minor
Priority: P2
Component: Phobos
AssignedTo: bugzilla digitalmars.com
ReportedBy: d chqrlie.org
import std.string;
int main(char[][] args)
{
    // U+0130 (capital I with dot above) takes two bytes in UTF-8
    printf("tolower(\"\\u0130e\") -> \"%.*s\"\n", tolower("\u0130e"));
    return 0;
}
produces incorrect output:
tolower("\u0130e") -> "i e"
The bug comes from erroneous code in phobos/std/string.d, line 843:
    if (r.length != i + j)
        r = r[0 .. i + j];
The Turkish dotted capital I (U+0130) is correctly converted to ASCII i (U+0069),
but the converted character does not use the same number of bytes as the
original: U+0130 takes two bytes in UTF-8 and U+0069 takes one. The code above
is therefore incorrect. As far as I understand the implementation, it could be
removed completely.
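The length mismatch can be avoided by never assuming that input and output byte counts agree. Below is a minimal sketch of that idea in current D2, not the Phobos code or the actual fix: `lowerOne` and `lowerByCodePoint` are hypothetical names, and `lowerOne` only knows ASCII letters and U+0130.

```d
import std.utf : encode;

// Hypothetical one-code-point mapping, for this sketch only:
// handles ASCII letters and U+0130, leaves everything else unchanged.
dchar lowerOne(dchar c)
{
    if (c >= 'A' && c <= 'Z')
        return cast(dchar)(c + ('a' - 'A'));
    if (c == '\u0130')
        return 'i';
    return c;
}

// Decode each code point, map it, and re-encode it into the result.
// The result grows by however many bytes the encoder produced, so it
// never depends on the byte length of the input.
string lowerByCodePoint(string s)
{
    string result;
    foreach (dchar c; s)            // iterating by dchar decodes UTF-8
    {
        char[4] buf;
        immutable n = encode(buf, lowerOne(c));
        result ~= buf[0 .. n];
    }
    return result;
}

unittest
{
    assert("\u0130e".length == 3);                   // 2 bytes + 1 byte
    assert(lowerByCodePoint("\u0130e") == "ie");     // 1 byte + 1 byte
    assert(lowerByCodePoint("\u0130e").length == 2);
}
```

The checks can be run with `dmd -unittest -main -run` on a file containing the block.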
A similar issue is present in toupper(), with the additional twist that
conversion to uppercase should not be special-cased for the ASCII subset in the
Turkish locale.
Additionally, the non-ASCII code path is triggered by if (c >= 0x7F) where it
should be if (c > 0x7F), since 0x7F (DEL) is itself an ASCII character.
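The boundary in question can be shown with a small predicate. This is only an illustration; `needsUnicodePath` is a hypothetical name, not a function in Phobos.

```d
// True when c must take the non-ASCII (multi-byte) path.
bool needsUnicodePath(dchar c)
{
    return c > 0x7F;    // 0x7F (DEL) is still a one-byte ASCII character
}

unittest
{
    assert(!needsUnicodePath('z'));
    assert(!needsUnicodePath('\x7F'));    // last ASCII code point
    assert(needsUnicodePath('\u0080'));   // first code point needing two bytes
}
```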
--
May 15 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1235
I agree, with the exception that for UTF characters there is no such thing as
a locale, so toupper("i") cannot return "\u0130".
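As an illustration of locale-independent mapping, the check below uses `std.uni.toUpper` from current D2 Phobos, which is not the 2007 `std.string.toupper` discussed here. It only shows that a plain "i" upper-cases to ASCII "I" regardless of any Turkish convention.

```d
import std.uni : toUpper;

unittest
{
    // No locale is consulted: 'i' maps to 'I', never to U+0130.
    assert(toUpper('i') == 'I');
    assert(toUpper('i') != '\u0130');
}
```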
--
Jun 28 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1235
bugzilla digitalmars.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
Fixed in DMD 1.018 and DMD 2.002
--
Jul 01 2007