digitalmars.D.bugs - [Issue 10472] New: lastIndexOf(string, string) does not find single character string at beginning of string
http://d.puremagic.com/issues/show_bug.cgi?id=10472

Summary: lastIndexOf(string, string) does not find single character string at beginning of string
Product: D
Version: D2
Platform: x86
OS/Version: Linux
Status: NEW
Severity: critical
Priority: P2
Component: Phobos
AssignedTo: nobody puremagic.com
ReportedBy: rburners gmail.com

While working on lastIndexOf with a slice index, I noticed that
----
assert(lastIndexOf(to!S("öabcdefcdef"), to!T("ö")) == 0);
----
fails on both x64 and x86. It also fails for the single dchar version:
----
assert(lastIndexOf(to!S("öabcdefcdef"), to!dchar("ö")) == 0);
----
I hope to find the time to fix this this week.

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 25 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10472

The problem is this condition:
----
if (cast(dchar)(cast(Char)c) == c)
----
This is basically saying "if the code_point_ representation fits in a single code_unit_, then we look at the code_units_".

This is wrong for UTF-8: code points in the 0x80 to 0xFF range "fit" in a single code_unit_ by value, but are actually encoded as two code_units_. In particular, ö is code point U+00F6, which fits in a single code unit, yet takes up two when encoded into UTF-8: "0xC3 0xB6".

The correct question is:
----
if (codeLength!Char(c) == 1)
----
Or, if you want to tweak a little, since you don't need the *actual* codeLength:
----
static if (Char.sizeof == 1)
    immutable fits = c <= 0x7F;
else static if (Char.sizeof == 2)
    immutable fits = c <= 0xFFFF;
else
    immutable fits = true;
if (fits)
{
    ...
----
------------------
BTW, implementation-wise, I do believe a simple foreach_reverse is more efficient, because it pops *as* it decodes. The for loop needs to stride backwards (again) after a call to "back" ("back" already strides backwards).
----
foreach_reverse (i, dchar c2 ; s)
{
    if (c2 == c)
        return i;
}
----
and
----
immutable c1 = std.uni.toLower(c);
foreach_reverse (i, dchar c2 ; s)
{
    if (std.uni.toLower(c2) == c1)
        return i;
}
----
In any case, it is much simpler.
Jun 25 2013
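The diagnosis in the message above can be checked with a small standalone program. This is only a sketch: `castRoundTrips` and `lastIndexOfSketch` are hypothetical helper names made up for illustration, not the Phobos implementation; `std.utf.codeLength` is the only library call used.

```d
import std.utf : codeLength;

// Hypothetical helper: the round-trip cast check quoted in the message.
bool castRoundTrips(Char)(dchar c)
{
    return cast(dchar)(cast(Char) c) == c;
}

// Hypothetical helper: reverse search by code point, in the spirit of the
// foreach_reverse suggestion. Returns the index of the first code unit of
// the last match, or -1 if there is no match.
ptrdiff_t lastIndexOfSketch(const(char)[] s, dchar c)
{
    foreach_reverse (i, dchar c2; s)
    {
        if (c2 == c)
            return cast(ptrdiff_t) i;
    }
    return -1;
}

void main()
{
    immutable dchar o = '\u00F6'; // ö

    // The value 0xF6 survives a cast to char and back, so the old check passes...
    assert(castRoundTrips!char(o));
    // ...even though ö needs two UTF-8 code units.
    assert(codeLength!char(o) == 2);
    assert("\u00F6".length == 2);

    // Decoding while iterating backwards finds ö at index 0.
    assert(lastIndexOfSketch("\u00F6abcdefcdef", o) == 0);
    assert(lastIndexOfSketch("abc", o) == -1);
}
```

The sketch only covers `char` strings and the case-sensitive search; the case-insensitive variant would compare `std.uni.toLower` of both sides, as in the second snippet above.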
http://d.puremagic.com/issues/show_bug.cgi?id=10472

Awesome, thanks for taking a look. This gives me somewhere to start, or just copy-paste ;-) As it looks right now, I will probably find some time tonight.
Jun 25 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10472

----
static if (Char.sizeof == 1)
    immutable fits = c <= 0x7F;
else static if (Char.sizeof == 2)
    immutable fits = c <= 0xFFFF;
else
    immutable fits = true;
if (fits)
{
    ...
----
Edit: The correct condition would actually be:
----
static if (Char.sizeof == 1)
    immutable fits = c <= 0x7F;
else static if (Char.sizeof == 2)
    immutable fits = c <= 0xD7FF || (0xE000 <= c && c <= 0xFFFF);
else
    immutable fits = true;
if (fits)
{
    ...
----
This would be the "most correct" condition. As stated in the pull, both:
`if (std.ascii.isAscii(c))`
`if (codeLength!Char(c) == 1)`
would also be correct.
Jun 26 2013
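For reference, the corrected condition from the last message can be wrapped in a function and compared against `std.utf.codeLength`. This is a minimal sketch under stated assumptions: `fitsInOneCodeUnit` is a hypothetical name chosen for illustration and is not part of Phobos.

```d
import std.utf : codeLength;

// Hypothetical helper: true if code point c is encoded as exactly one
// code unit of type Char.
bool fitsInOneCodeUnit(Char)(dchar c)
{
    static if (Char.sizeof == 1)
        return c <= 0x7F;                                   // UTF-8: ASCII only
    else static if (Char.sizeof == 2)
        return c <= 0xD7FF || (0xE000 <= c && c <= 0xFFFF); // UTF-16: BMP minus surrogates
    else
        return true;                                        // UTF-32: always one unit
}

void main()
{
    assert(fitsInOneCodeUnit!char('a'));
    assert(!fitsInOneCodeUnit!char('\u00F6'));  // ö takes two UTF-8 code units
    assert(fitsInOneCodeUnit!wchar('\u00F6'));  // but only one UTF-16 code unit
    assert(!fitsInOneCodeUnit!wchar(cast(dchar) 0xD800)); // surrogate value
    assert(!fitsInOneCodeUnit!wchar('\U0001F600'));       // needs a surrogate pair
    assert(fitsInOneCodeUnit!dchar('\U0001F600'));

    // For valid code points the result agrees with codeLength.
    dchar[] samples = ['a', '\u00F6', '\uFFFD', '\U0001F600'];
    foreach (dchar c; samples)
    {
        assert(fitsInOneCodeUnit!char(c) == (codeLength!char(c) == 1));
        assert(fitsInOneCodeUnit!wchar(c) == (codeLength!wchar(c) == 1));
        assert(fitsInOneCodeUnit!dchar(c) == (codeLength!dchar(c) == 1));
    }
}
```

The comparison deliberately uses only valid code points; the surrogate range is where the hand-written UTF-16 branch differs from the earlier `c <= 0xFFFF` version.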