www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 9634] New: std.regex.ctRegex chokes on (?:a+)

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634

           Summary: std.regex.ctRegex chokes on (?:a+)
           Product: D
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: nilsbossung googlemail.com


--- Comment #0 from Nils <nilsbossung googlemail.com> 2013-03-02 10:50:40 PST
---
Maybe a duplicate of issue 7440. But this one gives a different error message.

---
cat > test.d <<code
import std.regex;
auto re = ctRegex!"(?:a+)";
code
dmd -c -o- test.d
---
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(2739): Error:
['c', 'a', 's', 't', '(', 'I', 'R', ')', '0'][0u..9u]
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(2099):       
called from here: (Kickstart!(char) __ctmp1227 = 0;
 , __ctmp1227).this(this, new uint[](256u))
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(2153):       
called from here: this.lightPostprocess()
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(1966):       
called from here: (Regex!(char) __ctmp1948 = 0;
 , __ctmp1948).this(this)
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(6489):       
called from here: parser.program()
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(6472):       
called from here: regexImpl(pattern, flags)
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(6497):       
called from here: regex("(?:a+)", [])
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(2739): Error:
['c', 'a', 's', 't', '(', 'I', 'R', ')', '0'][0u..9u]
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(2099):       
called from here: (Kickstart!(char) __ctmp1227 = 0;
 , __ctmp1227).this(this, new uint[](256u))
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(2153):       
called from here: this.lightPostprocess()
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(1966):       
called from here: (Regex!(char) __ctmp1948 = 0;
 , __ctmp1948).this(this)
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(6489):       
called from here: parser.program()
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(6472):       
called from here: regexImpl(pattern, flags)
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(6499):       
called from here: regex("(?:a+)", [])
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(6499):       
called from here: ctGenRegExCode(regex("(?:a+)", []))
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(2739): Error:
['c', 'a', 's', 't', '(', 'I', 'R', ')', '0'][0u..9u]
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(2099):       
called from here: (Kickstart!(char) __ctmp1227 = 0;
 , __ctmp1227).this(this, new uint[](256u))
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(2153):       
called from here: this.lightPostprocess()
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(1966):       
called from here: (Regex!(char) __ctmp1948 = 0;
 , __ctmp1948).this(this)
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(6489):       
called from here: parser.program()
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(6472):       
called from here: regexImpl(pattern, flags)
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(6506):       
called from here: regex("(?:a+)", [])
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(6506):       
called from here: StaticRegex(null, Regex(null, null, null, 0u, 0u, 0u, 0u, 0u,
null, null, ShiftOr(null, 0u, 0u))).this(regex("(?:a+)", []), & func)
/home/nils/d/dmd/dmd2/linux/bin32/../../src/phobos/std/regex.d(6523): Error:
template instance std.regex.ctRegexImpl!("(?:a+)", []) error instantiating
test.d(2):        instantiated from here: ctRegex!("(?:a+)")
test.d(2): Error: template instance std.regex.ctRegex!("(?:a+)") error
instantiating
---

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 02 2013
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634



--- Comment #1 from Dmitry Olshansky <dmitry.olsh gmail.com> 2013-03-12
07:56:06 PDT ---
Created an attachment (id=1200)
Stripped down regex parser that shows the bug

Don I belive I've found something that of interest.

In short - I've got cut down sample, there is no _ctfe branching here, ~500
lines and most of imports are there to print stuff.

Up to last step in bytecode generation all is OK and the result is the same,
then at the last step somehow even simple fill with 1 to n loop fails.
(see below)

See lines marked as CRITICAL POINT to see this bisection of the execution.

The output is:

RT version
0       Char   (0x0)
1       Char ☺ (0x1)
2       Char ☻ (0x2)
3       Char ♥ (0x3)
4       Char ♦ (0x4)
5       cast(IR)0
6       End

CT version
0       Char   (0x0)
1       Char ♥ (0x3)
2       Char ☻ (0x2)
3       Char ♥ (0x3)
4       Char ♦ (0x4)
5       cast(IR)0
6       End

Instead it should be the same in both versions.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 12 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634



--- Comment #2 from Dmitry Olshansky <dmitry.olsh gmail.com> 2013-04-17
13:18:11 PDT ---
(In reply to comment #1)
 Created an attachment (id=1200) [details]
 Stripped down regex parser that shows the bug
 

I think I've pinpointed the issue in an even smaller test case (~90 LOCs). It directly relates to dealing with arrays of structs at CTFE. Interesting point is that the assertion in main passes in the code below if you switch 'Bytecode' struct that is nothing more then one int to simply int. Hence my thought about structs being the trigger. struct Bytecode { int raw; } struct Parser { dchar _current; bool empty; string pat; Bytecode[] ir; this(string pattern) { pat = pattern; next(); uint fix;//fixup pointer for(;;) { switch(current) { case '(': next(); fix = cast(uint)ir.length; assert(current == '?'); next(); assert(current == ':'); ir ~= Bytecode(-1); next(); break; case ')': //CRITICAL POINT: the last closing paren //return; // up to this point generated bytecode is the same next(); //return; //still OK { //CRITICAL POINT size_t cnt = ir.length-fix-1; //even simple write loop is failing with awful results for(size_t i = 0; i < cnt; i++) { auto bc = Bytecode(i+10); ir[fix+i] = bc; } } return; // and here it differs default: uint start = cast(uint)ir.length; ir ~= Bytecode(10*current); next(); uint len = cast(uint)ir.length - start; next(); ir ~= Bytecode(-4); ir ~= ir[start .. start+len]; ir ~= Bytecode(-1); } } } property dchar current(){ return _current; } bool next() { if(pat.length == 0) { empty = true; return false; } _current = pat[0]; pat = pat[1..$]; return true; } } public auto getIr(string pattern) { auto ir = Parser(pattern).ir; return ir; } void main() { auto re = getIr("(?:a+)"); static re2 = getIr("(?:a+)"); //uncomment to see that it's a 3rd element of 2 arrays that differs /* import std.stdio; writeln("RT version"); writeln(re); writeln("\n\nCT version"); writeln(re2); */ assert(re == re2); } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Apr 17 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|std.regex.ctRegex chokes on |[CTFE] std.regex.ctRegex
                   |(?:a+)                      |chokes on (?:a+)
           Severity|normal                      |critical


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Apr 17 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634



--- Comment #3 from Nils <nilsbossung googlemail.com> 2013-04-17 14:20:35 PDT
---
(In reply to comment #2)
 //uncomment to see that it's a 3rd element of 2 arrays that differs
 /*
     import std.stdio;
     writeln("RT version");
     writeln(re);
     writeln("\n\nCT version");
     writeln(re2);
 */

For me, it's the second element that differs (DMD32 D Compiler v2.063-devel-0630526): --- RT version [Bytecode(10), Bytecode(11), Bytecode(12), Bytecode(13), Bytecode(-1)] CT version [Bytecode(10), Bytecode(13), Bytecode(12), Bytecode(13), Bytecode(-1)] --- -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Apr 17 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634



--- Comment #4 from Dmitry Olshansky <dmitry.olsh gmail.com> 2013-04-17
14:29:31 PDT ---
(In reply to comment #3)
 (In reply to comment #2)
 //uncomment to see that it's a 3rd element of 2 arrays that differs
 /*
     import std.stdio;
     writeln("RT version");
     writeln(re);
     writeln("\n\nCT version");
     writeln(re2);
 */

For me, it's the second element that differs (DMD32 D Compiler v2.063-devel-0630526): --- RT version [Bytecode(10), Bytecode(11), Bytecode(12), Bytecode(13), Bytecode(-1)] CT version [Bytecode(10), Bytecode(13), Bytecode(12), Bytecode(13), Bytecode(-1)] ---

You're right that must a mistake on my part. It's 2nd one that is different as I'm seeing this exact output. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Apr 17 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634



--- Comment #5 from Nils <nilsbossung googlemail.com> 2013-04-17 15:12:50 PDT
---
Reduced the code from comment #2 further:

struct Bytecode
{
    int raw;
}

Bytecode[] getIr()
{
    Bytecode[] ir;

    ir ~= Bytecode(42);
    ir ~= ir[0 .. 1]; // add .dup and CTFE gets it
    assert(&ir[0] != &ir[1]); // fails in CTFE

    ir[0].raw = 13; // overwrites both ir[0] and ir[1]
    assert(ir[0].raw == 13);
    assert(ir[1].raw == 42); // fails in CTFE

    return ir;
}

void main()
{
    enum expected = [Bytecode(13), Bytecode(42)];
    assert(getIr() == expected); // passes
    static assert(getIr() == expected); // fails
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Apr 17 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634



--- Comment #6 from Dmitry Olshansky <dmitry.olsh gmail.com> 2013-05-12
13:42:34 PDT ---
(In reply to comment #5)
 Reduced the code from comment #2 further:
 
 struct Bytecode
 {
     int raw;
 }
 
 Bytecode[] getIr()
 {
     Bytecode[] ir;
 
     ir ~= Bytecode(42);
     ir ~= ir[0 .. 1]; // add .dup and CTFE gets it

A-ha! That indicates that the bug is in appending slices of that same array... That might get us workaround (without dup) and indeed if I change the above line to this: foreach(v; ir[0 .. 1]) ir ~= v; it passes even for bigger slices.
     assert(&ir[0] != &ir[1]); // fails in CTFE
 
     ir[0].raw = 13; // overwrites both ir[0] and ir[1]
     assert(ir[0].raw == 13);
     assert(ir[1].raw == 42); // fails in CTFE
 
     return ir;
 }
 
 void main()
 {
     enum expected = [Bytecode(13), Bytecode(42)];
     assert(getIr() == expected); // passes
     static assert(getIr() == expected); // fails
 }

(In reply to comment #5)
 Reduced the code from comment #2 further:
 
 struct Bytecode
 {
     int raw;
 }
 
 Bytecode[] getIr()
 {
     Bytecode[] ir;
 
     ir ~= Bytecode(42);
     ir ~= ir[0 .. 1]; // add .dup and CTFE gets it
     assert(&ir[0] != &ir[1]); // fails in CTFE
 
     ir[0].raw = 13; // overwrites both ir[0] and ir[1]
     assert(ir[0].raw == 13);
     assert(ir[1].raw == 42); // fails in CTFE
 
     return ir;
 }
 
 void main()
 {
     enum expected = [Bytecode(13), Bytecode(42)];
     assert(getIr() == expected); // passes
     static assert(getIr() == expected); // fails
 }

-- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
May 12 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |pull


--- Comment #7 from Dmitry Olshansky <dmitry.olsh gmail.com> 2013-05-12
13:59:59 PDT ---
https://github.com/D-Programming-Language/phobos/pull/1290

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 12 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kennytm gmail.com


--- Comment #8 from Dmitry Olshansky <dmitry.olsh gmail.com> 2013-05-12
14:03:14 PDT ---
*** Issue 7440 has been marked as a duplicate of this issue. ***

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 12 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634


Martin Nowak <code dawg.eu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |code dawg.eu


--- Comment #9 from Martin Nowak <code dawg.eu> 2013-05-18 08:30:26 PDT ---
This is related to bug 6052.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 18 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634



--- Comment #10 from Martin Nowak <code dawg.eu> 2013-05-21 07:10:00 PDT ---
This necessitates a workaround in Phobos.
https://github.com/D-Programming-Language/phobos/pull/1290

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 21 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634



--- Comment #11 from github-bugzilla puremagic.com 2013-06-07 10:53:57 PDT ---
Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/ed349e1a519e2245c9adc0789deb09350d8511c0
workaround issue 9634

https://github.com/D-Programming-Language/phobos/commit/db7e325f6d4a26294976b8fb9ebc69124d5056bc
Merge pull request #1290 from blackwhale/regex-9634

Workaround issue 9634

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 07 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634


Don <clugdbug yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
            Summary|[CTFE] appending a slice of |[CTFE] wrong code
                   |an array of structs to      |concatenating arrays of
                   |itself                      |structs


--- Comment #12 from Don <clugdbug yahoo.com.au> 2013-06-17 23:48:12 PDT ---
Reduced test case shows it doesn't even require ~=.
---------
struct Bug9634 {
    int raw;
}

bool bug9634()
{
    Bug9634[] jr = [Bug9634(42)];
    Bug9634[] ir = jr ~ jr;
    assert(&ir[0] != &ir[1]);
    return true;
}

static assert(bug9634());

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 17 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634



--- Comment #13 from github-bugzilla puremagic.com 2013-06-19 02:12:22 PDT ---
Commit pushed to master at https://github.com/D-Programming-Language/dmd

https://github.com/D-Programming-Language/dmd/commit/baf2d607b6b723b498753b730b62b9b258a5659e
Fix bug 9634 [CTFE] wrong code concatenating arrays of structs

When concatenating, the array literals need to be duplicated, if
they are CTFE values.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 19 2013
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9634


yebblies <yebblies gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |yebblies gmail.com
         Resolution|                            |FIXED


--- Comment #14 from yebblies <yebblies gmail.com> 2013-06-19 19:13:10 EST ---
(hopefully)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 19 2013