digitalmars.D.bugs - [Issue 10472] New: lastIndexOf(string, string) does not find single character string at beginning of string


d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=10472

           Summary: lastIndexOf(string, string) does not find single
                    character string at beginning of string
           Product: D
           Version: D2
          Platform: x86
        OS/Version: Linux
            Status: NEW
          Severity: critical
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: rburners gmail.com



---
While working on lastIndexOf with a slice index, I noticed that

assert(lastIndexOf(to!S("öabcdefcdef"), to!T("ö")) == 0);

fails on both x64 and x86.


It also fails for the single dchar version:

assert(lastIndexOf(to!S("öabcdefcdef"), to!dchar("ö")) == 0);
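
For readers who want to try the two assertions outside the Phobos test template, here is a minimal sketch with the S and T parameters spelled out as concrete string types. It assumes a Phobos release that already contains the fix for this issue; on an affected release the asserts are expected to fail, as described above. The \u00F6 escape is used instead of a literal "ö" so the result does not depend on the source file encoding.

```d
import std.string : lastIndexOf;

// Run with: dmd -unittest -main -run repro.d   (file name is arbitrary)
unittest
{
    // Substring overload, one case per string width.
    assert(lastIndexOf("\u00F6abcdefcdef",  "\u00F6")  == 0);
    assert(lastIndexOf("\u00F6abcdefcdef"w, "\u00F6"w) == 0);
    assert(lastIndexOf("\u00F6abcdefcdef"d, "\u00F6"d) == 0);

    // Single dchar overload.
    assert(lastIndexOf("\u00F6abcdefcdef", dchar(0xF6)) == 0);
}
```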


I hope I will find the time to fix this this week.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Jun 25 2013

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=10472




The problem is this condition:

----
if (cast(dchar)(cast(Char)c) == c)
----

This is basically saying "if the _code point_ fits in a single _code unit_,
then we can search the _code units_ directly". This is wrong for UTF-8: code
points in the 0x80 to 0xFF range survive the cast to a single code unit and
back, but are actually encoded as two code units. In particular:

ö is code point U+00F6, which fits in a single code unit, yet takes up two
when encoded as UTF-8: 0xC3 0xB6.

The correct question is:
if (codeLength!Char(c) == 1)
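
To make the difference between the two checks concrete, here is a small sketch. The helper names roundTripFits and encodesAsOneUnit are made up for this illustration and are not Phobos functions; only std.utf.codeLength is a real library call.

```d
import std.utf : codeLength;

/// Hypothetical helper: the check quoted from the current implementation.
bool roundTripFits(Char)(dchar c)
{
    return cast(dchar)(cast(Char) c) == c;
}

/// Hypothetical helper: the check suggested as a replacement.
bool encodesAsOneUnit(Char)(dchar c)
{
    return codeLength!Char(c) == 1;
}

// Run with: dmd -unittest -main -run check.d   (file name is arbitrary)
unittest
{
    immutable dchar c = '\u00F6';          // ö

    // The round-trip cast keeps 0xF6 intact, so the old check passes ...
    assert(roundTripFits!char(c));
    // ... although UTF-8 needs two code units for this code point.
    assert(!encodesAsOneUnit!char(c));

    immutable string s = "\u00F6";
    assert(s.length == 2);
    assert(s[0] == 0xC3 && s[1] == 0xB6);

    // For plain ASCII both checks agree.
    assert(roundTripFits!char('a') && encodesAsOneUnit!char('a'));
}
```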

Or, if you want to tweak a little, since you don't need the *actual*
codeLength:

----
        static if      (Char.sizeof == 1) immutable fits = c <=   0x7F;
        else static if (Char.sizeof == 2) immutable fits = c <= 0xFFFF;
        else immutable fits = true;

        if (fits)
        {
            ...
----

------------------
BTW, implementation-wise, I believe a simple foreach_reverse is more
efficient, because it pops *as* it decodes. The for loop needs to stride
backwards (again) after a call to "back" ("back" already strides backwards).

            foreach_reverse (i, dchar c2 ; s)
            {
                if ( c2 == c)
                    return i;
            }

and

            immutable c1 = std.uni.toLower(c);
            foreach_reverse (i, dchar c2 ; s)
            {
                if ( std.uni.toLower(c2) == c1)
                    return i;
            }

In any case, it is much simpler.
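
Put together, the two loops could be wrapped roughly as follows. This is only a sketch under assumptions: lastIndexOfSketch is a hypothetical stand-in restricted to UTF-8 input, the bool parameter replaces whatever case-sensitivity flag the real function takes, and it is not the Phobos implementation.

```d
import std.uni : toLower;

/// Hypothetical stand-in for the dchar overload (UTF-8 input only).
/// Returns the code unit index of the last occurrence of c, or -1.
ptrdiff_t lastIndexOfSketch(const(char)[] s, dchar c, bool caseSensitive = true)
{
    if (caseSensitive)
    {
        // foreach_reverse decodes backwards; i is the index of the
        // first code unit of the decoded code point.
        foreach_reverse (i, dchar c2; s)
        {
            if (c2 == c)
                return i;
        }
    }
    else
    {
        immutable c1 = toLower(c);
        foreach_reverse (i, dchar c2; s)
        {
            if (toLower(c2) == c1)
                return i;
        }
    }
    return -1;
}

// Run with: dmd -unittest -main -run sketch.d   (file name is arbitrary)
unittest
{
    assert(lastIndexOfSketch("\u00F6abcdefcdef", '\u00F6') == 0);
    assert(lastIndexOfSketch("\u00F6abcdefcdef", 'c') == 8); // ö takes indices 0 and 1
    assert(lastIndexOfSketch("\u00F6abcdefcdef", 'z') == -1);
    assert(lastIndexOfSketch("\u00D6abc", '\u00F6', false) == 0);
}
```

Because every code point is decoded, this version needs no "fits in one code unit" test at all; the test only matters for the fast path that compares raw code units.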


Jun 25 2013

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=10472




---
Awesome, thanks for taking a look. This gives me somewhere to start, or to just
copy and paste ;-) I will probably find some time tonight, as it looks right now.


Jun 25 2013

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=10472





> ----
>         static if      (Char.sizeof == 1) immutable fits = c <=   0x7F;
>         else static if (Char.sizeof == 2) immutable fits = c <= 0xFFFF;
>         else immutable fits = true;
>
>         if (fits)
>         {
>             ...
> ----

Edit: The correct condition would actually be:

----
        static if      (Char.sizeof == 1) immutable fits = c <=   0x7F;
        else static if (Char.sizeof == 2) immutable fits = c <= 0xD7FF ||
                                          (0xE000 <= c && c <= 0xFFFF);
        else immutable fits = true;

        if (fits)
        {
            ...
----

This would be the "most correct" condition, since it also excludes the UTF-16
surrogate range. As stated in the pull request, both:
`if (std.ascii.isASCII(c))`
`if (codeLength!Char(c) == 1)`

would also be correct.
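
As a rough cross-check, the hand-written condition can be wrapped in a function and compared against codeLength for a few sample code points. The name fitsInOneUnit is hypothetical; the body is the condition from this thread with the closing parenthesis in place.

```d
import std.utf : codeLength;

/// Hypothetical helper: true if c is encoded as exactly one code unit of Char.
bool fitsInOneUnit(Char)(dchar c)
{
    static if (Char.sizeof == 1)
        return c <= 0x7F;                                    // UTF-8: ASCII only
    else static if (Char.sizeof == 2)
        return c <= 0xD7FF || (0xE000 <= c && c <= 0xFFFF);  // UTF-16: BMP minus surrogates
    else
        return true;                                         // UTF-32: always one unit
}

// Run with: dmd -unittest -main -run fits.d   (file name is arbitrary)
unittest
{
    assert(fitsInOneUnit!char('a'));
    assert(!fitsInOneUnit!char('\u00F6'));
    assert(fitsInOneUnit!wchar('\u00F6'));
    assert(!fitsInOneUnit!wchar('\U0001F600'));
    assert(fitsInOneUnit!dchar('\U0001F600'));

    // For valid code points the result matches codeLength!Char(c) == 1.
    dchar[] samples = ['a', '\u00F6', '\u20AC', '\U0001F600'];
    foreach (dchar c; samples)
    {
        assert(fitsInOneUnit!char(c)  == (codeLength!char(c)  == 1));
        assert(fitsInOneUnit!wchar(c) == (codeLength!wchar(c) == 1));
    }
}
```

Note that the isASCII variant is stricter than necessary for wchar and dchar strings: it stays correct, but sends every non-ASCII code point down the decoding path.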


Jun 26 2013
