www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 4389] New: CTFE: Appending dchar to empty string creates invalid string.

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4389

           Summary: CTFE: Appending dchar to empty string creates invalid
                    string.
           Product: D
           Version: D2
          Platform: All
        OS/Version: Windows
            Status: NEW
          Severity: critical
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: yebblies gmail.com



With dmd2.047 the following code will create a string which is actually invalid
utf-32.
The problem goes away if you start with a non-empty string.
This can occur easily when appending is used in conjunction with the new string
front() and back() functions that return dchar.

test case:


import std.stdio;

string get()
{
    string s;
    dchar c = 'D';

    s ~= c;
    s ~= c;

    return s;
}

void main()
{
    enum s = get(); // Call 'get' with ctfe
    writeln('"', s, '"'); // outputs "D "

    auto p = (s.ptr); // take pointer to defeat bounds checking
    auto s2 = p[0..8];
    writeln('"', s2, '"'); // outputs "D   D   "

    auto s3 = get(); // At compile time, this works as expected
    writeln('"', s3, '"'); // outputs "DD"
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 25 2010
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4389


Don <clugdbug yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ice-on-valid-code
            Summary|CTFE: Appending dchar to    |ICE(constfold.c,
                   |empty string creates        |expression.c), or wrong
                   |invalid string.             |code: string~=dchar in CTFE



Actually constant folding of string ~= dchar is ALWAYS wrong, even if the
string is non-empty. All these tests fail. It can also cause an
ICE(constfold.c) if you attempt a further concatenation, or ICE(expression.c)
if it is mixed in (bug 3490).

It happens because string ~= dchar is treated specially in the backend, but not
in constfold.c.

---------

int bug4389()
{    
    string s;
    dchar c = '\u2348';
    s ~= c;
    assert(s.length==3);
    dchar d = 'D';
    s ~= d;
    assert(s.length==4);
    s = "";
    s ~= c;
    assert(s.length==3);
    s ~= d;
    assert(s.length==4);
    string z;
    wchar w = '\u0300';
    z ~= w;
    assert(z.length==2);
    z = "";
    z ~= w;
    assert(z.length==2);
    return 1;
}

static assert(bug4389());

// ICE(constfold.c)
int ice4389()
{
    string s;
    dchar c = '\u2348';
    s ~= c;
    s = s ~ "xxx";
   return 1; 
}

static assert(ice4389());

// ICE(expression.c)
string ice4390()
{
    string s;
    dchar c = '`';
    s ~= c;
    s ~= c;
   return s; 
}

static assert(mixin(ice4390()) == ``);

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 23 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4389




*** Issue 4390 has been marked as a duplicate of this issue. ***

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 23 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4389




Created an attachment (id=827)
Patch against  D2 svn 767

Although the patch is relatively long, it is very mundane.
The patch adds these functions to utf.c (they are copied from druntime). These
convert single chars from UTF32 to UTF8 or UTF16.

void utf_encodeChar(unsigned char *s, dchar_t c);
void utf_encodeWchar(unsigned short *s, dchar_t c);
int utf_codeLengthChar(dchar_t c);
int utf_codeLengthWchar(dchar_t c);

Two other wrapper functions provide the interface that is actually used:
int utf_codeLength(int sz, dchar_t c);
void utf_encode(int sz, void *s, dchar_t c);

Then, these functions are used in Cat() in constfold.c
Instead of assuming a len of 1, it uses utf_codeLength; and instead of a
memcpy, it uses utf_encode.

The cases which are patched in Cat() are: TOKnull ~ dchar, dchar ~ TOKnull,
TOKstring ~ dchar.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 23 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4389


Walter Bright <bugzilla digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla digitalmars.com



18:54:58 PST ---
The patch should work in D1, but does not.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Dec 27 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4389


Walter Bright <bugzilla digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|D2                          |D1 & D2



19:27:54 PST ---
http://www.dsource.org/projects/dmd/changeset/825

Leaving bug open until D1 is fixed.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Dec 27 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4389




Fails on D1 because dchar, wchar implicitly convert to char, even if they don't
fit.

PATCH: expression.c, CatAssignExp::semantic(), line 8574. The two 'else'
clauses need to be swapped,
so that we check for an array~char before we check for implicit conversion.


    {   // Append array
        e2 = e2->castTo(sc, e1->type);
        type = e1->type;
        e = this;
    }
----------- SWAP THIS SECTION WITH THE NEXT ONE
    else if ((tb1->ty == Tarray) &&
        e2->implicitConvTo(tb1next)
       )
    {   // Append element
        e2 = e2->castTo(sc, tb1next);
        type = e1->type;
        e = this;
    }
-------
    else if (tb1->ty == Tarray &&
        (tb1next->ty == Tchar || tb1next->ty == Twchar) &&
        e2->implicitConvTo(Type::tdchar)
       )
    {   // Append dchar to char[] or wchar[]
        e2 = e2->castTo(sc, Type::tdchar);
        type = e1->type;
        e = this;

        /* Do not allow appending wchar to char[] because if wchar happens
         * to be a surrogate pair, nothing good can result.
         */
    }
-----------
    else
    {
        if (tb1 != Type::terror && tb2 != Type::terror)
            error("cannot append type %s to type %s", tb2->toChars(),
tb1->toChars());
        e = new ErrorExp();
    }

ALTERNATE PATCH:
Incidentally, if dchar->char is disallowed, all DMD tests still pass. But this
would probably cause existing valid code to break, so probably the first
solution is better.

mtype.c,  line 1570, MATCH TypeBasic::implicitConvTo(Type *to)
+       if ((ty == Tdchar || ty == Twchar) && to->ty == Tchar)
+            return MATCHnomatch;
+       if (ty == Tdchar && to->ty == Twchar)
+           return MATCHnomatch;
    }
    else if (flags & TFLAGSfloating)
    {
        // Disallow implicit conversion of floating point to integer
        if (tob->flags & TFLAGSintegral)
            return MATCHnomatch;

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jan 20 2011
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4389


Don <clugdbug yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED



D1 fixed as well:
https://github.com/D-Programming-Language/dmd/commit/2b51411e9e8b7ca3e62747ffb6f96c94089cebcc

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 06 2011