
digitalmars.D.bugs - [Bug 93] New: Template regex example fails without -release switch

d-bugmail puremagic.com writes:
http://d.puremagic.com/bugzilla/show_bug.cgi?id=93

           Summary: Template regex example fails without -release switch
           Product: D
           Version: 0.152
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: blocker
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: godaves yahoo.com


Without the -release switch, the template regex example from the 2006 SDWest
presentation fails to link on both Linux and Windows.

http://www.digitalmars.com/d/templates-revisited.html

The linker error on Windows is:
 Error 42: Symbol Undefined _array_5regex
--- errorlevel 1

The linker error on Linux is:

test_regex.o(.gnu.linkonce.t_D5regex49__T10regexMatchVG12aa12_5b612d7a5d2a5c732a5c772aZ10regexMatchFAaZAAa+0x3a):
In function `_D5regex49__T10regexMatchVG12aa12_5b612d7a5d2a5c732a5c772aZ10regexMatchFAaZAAa':
: undefined reference to `_array_5regex'
test_regex.o(.gnu.linkonce.t_D5regex30__T9testRangeVAaa1_61VAaa1_7aZ9testRangeFAaZi+0x16):
In function `_D5regex30__T9testRangeVAaa1_61VAaa1_7aZ9testRangeFAaZi':
: undefined reference to `_array_5regex'
test_regex.o(.gnu.linkonce.t_D5regex30__T9testRangeVAaa1_61VAaa1_7aZ9testRangeFAaZi+0x33):
In function `_D5regex30__T9testRangeVAaa1_61VAaa1_7aZ9testRangeFAaZi':
: undefined reference to `_array_5regex'
test_regex.o(.gnu.linkonce.t_D5regex78__T14testZeroOrMoreS55_D5regex30__T9testRangeVAaa1_61VAaa1_7aZ9testRangeFAaZiZ14testZeroOrMoreFAaZi+0x3d):
In function `_D5regex78__T14testZeroOrMoreS55_D5regex30__T9testRangeVAaa1_61VAaa1_7aZ9testRangeFAaZiZ14testZeroOrMoreFAaZi':
: undefined reference to `_array_5regex'
test_regex.o(.gnu.linkonce.t_D5regex32__T9testRangeVG1aa1_00VG1aa1_20Z9testRangeFAaZi+0x15):
In function `_D5regex32__T9testRangeVG1aa1_00VG1aa1_20Z9testRangeFAaZi':
: undefined reference to `_array_5regex'
test_regex.o(.gnu.linkonce.t_D5regex32__T9testRangeVG1aa1_00VG1aa1_20Z9testRangeFAaZi+0x29):
more undefined references to `_array_5regex' follow
collect2: ld returned 1 exit status
--- errorlevel 1

Source Code
-----------
test_regex.d:
-------------
import std.stdio;
import temp_regex;

void main()
{
    auto exp = &regexMatch!(r"[a-z]*\s*\w*");
    writefln("matches: %s", exp("hello    world"));
}

temp_regex.d:
------------
module temp_regex;

const int testFail = -1;

/**
 * Compile pattern[] and expand to a custom generated
 * function that will take a string str[] and apply the
 * regular expression to it, returning an array of matches.
 */

template regexMatch(char[] pattern)
{
  char[][] regexMatch(char[] str)
  {
    char[][] results;
    int n = regexCompile!(pattern).fn(str);
    if (n != testFail && n > 0)
      results ~= str[0..n];
    return results;
  }
}

/******************************
 * The testXxxx() functions are custom generated by templates
 * to match each predicate of the regular expression.
 *
 * Params:
 *      char[] str      the input string to match against
 *
 * Returns:
 *      testFail        failed to have a match
 *      n >= 0          matched n characters
 */

/// Always match
template testEmpty()
{
  int testEmpty(char[] str) { return 0; }
}

/// Match if testFirst matches the front of str[] and testSecond matches the rest
template testUnion(alias testFirst, alias testSecond)
{
  int testUnion(char[] str)
  {
    int n1 = testFirst(str);
    if (n1 != testFail)
    {
      int n2 = testSecond(str[n1 .. $]);
      if (n2 != testFail)
        return n1 + n2;
    }
    return testFail;
  }
}

/// Match if first part of str[] matches text[]
template testText(char[] text)
{
  int testText(char[] str)
  {
    if (str.length &&
        text.length <= str.length &&
        str[0..text.length] == text
       )
      return text.length;
    return testFail;
  }
}

/// Match if testPredicate(str) matches 0 or more times
template testZeroOrMore(alias testPredicate)
{
  int testZeroOrMore(char[] str)
  {
    if (str.length == 0)
      return 0;
    int n = testPredicate(str);
    if (n != testFail)
    {
      int n2 = testZeroOrMore!(testPredicate)(str[n .. $]);
      if (n2 != testFail)
        return n + n2;
      return n;
    }
    return 0;
  }
}

/// Match if term1[0] <= str[0] <= term2[0]
template testRange(char[] term1, char[] term2)
{
  int testRange(char[] str)
  {
    if (str.length && str[0] >= term1[0]
                   && str[0] <= term2[0])
      return 1;
    return testFail;
  }
}

/// Match if ch[0]==str[0]
template testChar(char[] ch)
{
  int testChar(char[] str)
  {
    if (str.length && str[0] == ch[0])
      return 1;
    return testFail;
  }
}

/// Match if str[0] is a word character
template testWordChar()
{
  int testWordChar(char[] str)
  {
    if (str.length &&
        (
         (str[0] >= 'a' && str[0] <= 'z') ||
         (str[0] >= 'A' && str[0] <= 'Z') ||
         (str[0] >= '0' && str[0] <= '9') ||
         str[0] == '_'
        )
       )
    {
      return 1;
    }
    return testFail;
  }
}

/*****************************************************/

/**
 * Returns the front of pattern[] up until
 * the end or a special character.
 */

template parseTextToken(char[] pattern)
{
  static if (pattern.length > 0)
  {
    static if (isSpecial!(pattern))
      const char[] parseTextToken = "";
    else
      const char[] parseTextToken =
           pattern[0..1] ~ parseTextToken!(pattern[1..$]);
  }     
  else
    const char[] parseTextToken="";
}

/**
 * Parses pattern[] up to and including terminator.
 * Returns:
 *      token[]         everything up to terminator.
 *      consumed        number of characters in pattern[] parsed
 */
template parseUntil(char[] pattern,char terminator,bool fuzzy=false)
{
  static if (pattern.length > 0)
  {
    static if (pattern[0] == '\\')
    {
      static if (pattern.length > 1)
      {
        const char[] nextSlice = pattern[2 .. $];
        alias parseUntil!(nextSlice,terminator,fuzzy) next;
        const char[] token = pattern[0 .. 2] ~ next.token;
        const uint consumed = next.consumed+2;
      }
      else
      {
        pragma(msg,"Error: expected character to follow \\");
        static assert(false);
      }
    }
    else static if (pattern[0] == terminator)
    {
      const char[] token="";
      const uint consumed = 1;
    }
    else
    {
      const char[] nextSlice = pattern[1 .. $];
      alias parseUntil!(nextSlice,terminator,fuzzy) next;
      const char[] token = pattern[0..1] ~ next.token;
      const uint consumed = next.consumed+1;
    }
  }
  else static if (fuzzy)
  {
    const char[] token = "";
    const uint consumed = 0;
  }
  else
  {
    pragma(msg,"Error: expected " ~
               terminator ~
               " to terminate group expression");
    static assert(false);
  }                     
}

/**
 * Parse contents of character class.
 * Params:
 *   pattern[] = rest of pattern to compile
 * Output:
 *   fn       = generated function
 *   consumed = number of characters in pattern[] parsed
 */

template regexCompileCharClass2(char[] pattern)
{
  static if (pattern.length > 0)
  {
    static if (pattern.length > 1)
    {
      static if (pattern[1] == '-')
      {
        static if (pattern.length > 2)
        {
          alias testRange!(pattern[0..1], pattern[2..3]) termFn;
          const uint thisConsumed = 3;
          const char[] remaining = pattern[3 .. $];
        }
        else // length is 2
        {
          pragma(msg,
            "Error: expected char following '-' in char class");
          static assert(false); 
        }
      }
      else // not '-'
      {
        alias testChar!(pattern[0..1]) termFn;
        const uint thisConsumed = 1;
        const char[] remaining = pattern[1 .. $];
      }
    }
    else
    {
      alias testChar!(pattern[0..1]) termFn;
      const uint thisConsumed = 1;
      const char[] remaining = pattern[1 .. $];
    }
    alias regexCompileCharClassRecurse!(termFn,remaining) recurse;
    alias recurse.fn fn;
    const uint consumed = recurse.consumed + thisConsumed;
  }
  else
  {
    alias testEmpty!() fn;
    const uint consumed = 0;
  }
}

/**
 * Used to recursively parse character class.
 * Params:
 *  termFn = generated function up to this point
 *  pattern[] = rest of pattern to compile
 * Output:
 *  fn = generated function including termFn and
 *       parsed character class
 *  consumed = number of characters in pattern[] parsed
 */

template regexCompileCharClassRecurse(alias termFn,char[] pattern)
{
  static if (pattern.length > 0 && pattern[0] != ']')
  {
    alias regexCompileCharClass2!(pattern) next;
    alias testOr!(termFn,next.fn,pattern) fn;
    const uint consumed = next.consumed;
  }
  else
  {
    alias termFn fn;
    const uint consumed = 0;
  }
}

/**
 * At start of character class. Compile it.
 * Params:
 *  pattern[] = rest of pattern to compile
 * Output:
 *  fn = generated function
 *  consumed = number of characters in pattern[] parsed
 */

template regexCompileCharClass(char[] pattern)
{       
  static if (pattern.length > 0)
  {
    static if (pattern[0] == ']')
    {
      alias testEmpty!() fn;
      const uint consumed = 0;
    }
    else
    {
      alias regexCompileCharClass2!(pattern) charClass;
      alias charClass.fn fn;
      const uint consumed = charClass.consumed;
    }
  }
  else
  {
    pragma(msg,"Error: expected closing ']' for character class");
    static assert(false);       
  }
}

/**
 * Look for and parse '*' postfix.
 * Params:
 *  test = function compiling regex up to this point
 *  pattern[] = rest of pattern to compile
 * Output:
 *  fn = generated function
 *  consumed = number of characters in pattern[] parsed
 */

template regexCompilePredicate(alias test, char[] pattern)
{
  static if (pattern.length > 0 && pattern[0] == '*')
  {
    alias testZeroOrMore!(test) fn;
    const uint consumed = 1;
  }
  else
  {
    alias test fn;
    const uint consumed = 0;
  }
}

/**
 * Parse escape sequence.
 * Params:
 *  pattern[] = rest of pattern to compile
 * Output:
 *  fn = generated function
 *  consumed = number of characters in pattern[] parsed
 */

template regexCompileEscape(char[] pattern)
{
  static if (pattern.length > 0)
  {
    static if (pattern[0] == 's')
    {
      // whitespace char
      alias testRange!("\x00","\x20") fn;
    }
    else static if (pattern[0] == 'w')
    {
      //word char
      alias testWordChar!() fn;
    }
    else
    {
      alias testChar!(pattern[0 .. 1]) fn;
    }
    const uint consumed = 1;
  }
  else
  {
    pragma(msg,"Error: expected char following '\\'");
    static assert(false);
  }
}

/**
 * Parse and compile regex represented by pattern[].
 * Params:
 *  pattern[] = rest of pattern to compile
 * Output:
 *  fn = generated function
 */

template regexCompile(char[] pattern)
{
  static if (pattern.length > 0)
  {
    static if (pattern[0] == '[')
    {
      const char[] charClassToken =
          parseUntil!(pattern[1 .. $],']').token;
      alias regexCompileCharClass!(charClassToken) charClass;
      const char[] token = pattern[0 .. charClass.consumed+2];
      const char[] next = pattern[charClass.consumed+2 .. $];
      alias charClass.fn test;
    }
    else static if (pattern[0] == '\\')
    {
      alias regexCompileEscape!(pattern[1..pattern.length]) escapeSequence;
      const char[] token = pattern[0 .. escapeSequence.consumed+1];
      const char[] next =
          pattern[escapeSequence.consumed+1 .. $];
      alias escapeSequence.fn test;
    }
    else
    {
      const char[] token = parseTextToken!(pattern);
      static assert(token.length > 0);
      const char[] next = pattern[token.length .. $];
      alias testText!(token) test;
    }

    alias regexCompilePredicate!(test, next) term;
    const char[] remaining = next[term.consumed .. next.length];

    alias regexCompileRecurse!(term,remaining).fn fn;
  }
  else
    alias testEmpty!() fn;
}

template regexCompileRecurse(alias term,char[] pattern)
{
  static if (pattern.length > 0)
  {
    alias regexCompile!(pattern) next;
    alias testUnion!(term.fn, next.fn) fn;
  }
  else
    alias term.fn fn;
}

/// Utility function for parsing
template isSpecial(char[] pattern)
{
  static if (
    pattern[0] == '*' ||
    pattern[0] == '+' ||
    pattern[0] == '?' ||
    pattern[0] == '.' ||
    pattern[0] == '[' ||
    pattern[0] == '{' ||
    pattern[0] == '(' ||
    pattern[0] == ')' ||
    pattern[0] == '$' ||
    pattern[0] == '^' ||
    pattern[0] == '\\'
  )
    const isSpecial = true;
  else
    const isSpecial = false;
}


-- 
Apr 08 2006
d-bugmail puremagic.com writes:
http://d.puremagic.com/bugzilla/show_bug.cgi?id=93


clugdbug yahoo.com.au changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|blocker                     |major




------- Comment #1 from clugdbug yahoo.com.au  2006-04-11 02:13 -------
This isn't a blocker.


-- 
Apr 11 2006
d-bugmail puremagic.com writes:
http://d.puremagic.com/bugzilla/show_bug.cgi?id=93


godaves yahoo.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|major                       |blocker
            Version|0.152                       |0.153




------- Comment #2 from godaves yahoo.com  2006-04-11 09:08 -------
"Blocker: Blocks development and/or testing work." It's a blocker if you run
into that bug and want to use Contract Programming during the course of
development and testing. After all, that's a major part of the language. Let
Walter make the call.


-- 
Apr 11 2006
Don Clugston <dac nospam.com.au> writes:
That category list really should be changed; it is completely inappropriate for a compiler. Almost every bug affects development and testing work in that sense! (And compiler segfaults are not as bad as incorrect code generation.)

The fact that a particular example does not compile without -release is not a blocker. I can assure you that contract programming works in general.

Blockers are very rare; one example occurred in an early DMD release where almost any program would fail to compile. I doubt that any blockers will be discovered that aren't regressions. (An example of a blocker would be: "dmd can no longer be used with build".)

To have any chance of this being fixed, you need to have a go at cutting down the test case. Walter generally ignores bug reports which are longer than 20 lines. I suspect he'll completely ignore the severity.
Apr 11 2006
Dave <Dave_member pathlink.com> writes:
I appreciate your concerns and, believe it or not, I put some thought into the original report severity, etc. If Walter wants to ignore it, that is his prerogative. If Walter wants to 'downgrade' it, that is fine w/ me. Believe me, I'm not doing this stuff to make Walter's job harder.

I did not try to reduce the test case any further because the summary of the example says: "What follows is a cut-down version of Eric Anderton's regex compiler. It is just enough to compile the regular expression above, serving to illustrate how it is done."

In fact I went to the extra 'trouble' of copying and pasting the code to put it all in one spot, and tested it on both Windows and Linux.

I agree it's probably a recent regression - all the more reason IMHO to get it taken care of right away, because Walter knows what he's changed recently in that area.

I also agree that perhaps some better bug report descriptions could be developed, but I hesitate to say that because I don't have the time right now to come up with suggestions and/or make the changes myself.

- Dave
Apr 11 2006
Don Clugston <dac nospam.com.au> writes:
Dave wrote:
 I appreciate your concerns and, believe it or not, I put some thought into
 the original report severity, etc. If Walter wants to ignore it, that is
 his prerogative. If Walter wants to 'downgrade' it, that is fine w/ me.
 Believe me, I'm not doing this stuff to make Walter's job harder.

 I did not try to reduce the test case any further because the summary of
 the example says: "What follows is a cut-down version of Eric Anderton's
 regex compiler. It is just enough to compile the regular expression
 above, serving to illustrate how it is done."

It's just a bit of proof-of-concept code showing what's possible with D templates. No-one should be using the code for any other purpose. Minimal for a regexp does not mean minimal for a bug report. The whole regexp thing is completely irrelevant to this bug.
 In fact I went to the extra 'trouble' of copying and pasting the code to 
 put it all in one spot, and tested it both on Windows and Linux.
 
 I agree it's probably a recent regression - all the more reason IMHO to
 get it taken care of right away because Walter knows what he's changed 
 recently in that area.

Actually, the template part of the compiler has changed a lot since Eric wrote that code. I'm a little surprised that it compiles at all. (My compile-time regex, which greatly improves upon that one and was written against a much more recent compiler, is currently broken due to improvements in the template syntax.)
 I also agree that perhaps some better bug report descriptions could be 
 developed, but I hesitate to say that because I don't have the time 
 right now to come up with suggestions and/or make the changes myself.

When bugzilla was set up, Walter proposed some definitions which made a lot of sense. I don't understand why the inappropriate defaults were retained. A compiler is so different to a normal app.
Apr 11 2006
d-bugmail puremagic.com writes:
http://d.puremagic.com/bugzilla/show_bug.cgi?id=93





------- Comment #3 from clugdbug yahoo.com.au  2006-04-12 09:55 -------
I've tried to reproduce this on Windows with DMD 0.153. It always compiles for
me.
I also don't understand the reference to Contract Programming in message #3
(there's no contract programming in this code).


-- 
Apr 12 2006
d-bugmail puremagic.com writes:
http://d.puremagic.com/bugzilla/show_bug.cgi?id=93


godaves yahoo.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|blocker                     |trivial
           Priority|P2                          |P5




------- Comment #4 from godaves yahoo.com  2006-04-12 21:10 -------
(In reply to comment #3)
 I've tried to reproduce this on Windows with DMD 0.153. It always compiles for
 me.
 I also don't understand the reference to Contract Programming in message #3
 (there's no contract programming in this code).

The linker error happens because of array bounds checking code that is omitted with -release. I recreated it, but it is arguably my mistake (read on). I copied the code into two files, test_regex.d and temp_regex.d. Then I compiled:

C:\Zz\temp>dmd test_regex.d
C:\dmd\bin\..\..\dm\bin\link.exe test_regex,,,user32+kernel32/noi;
OPTLINK (R) for Win32 Release 7.50B1
Copyright (C) Digital Mars 1989 - 2001 All Rights Reserved

test_regex.obj(test_regex)
 Error 42: Symbol Undefined _array_10temp_regex
--- errorlevel 1

Then I recompiled with -release and ran it:

C:\Zz\temp>dmd test_regex.d -release
C:\dmd\bin\..\..\dm\bin\link.exe test_regex,,,user32+kernel32/noi;

C:\Zz\temp>test_regex
matches: [hello]

That recreates the problem, and I should have specified the exact steps better. But if I recompile w/o -release like so:

C:\Zz\temp>dmd test_regex.d temp_regex.d
C:\dmd\bin\..\..\dm\bin\link.exe test_regex+temp_regex,,,user32+kernel32/noi;

C:\Zz\temp>test_regex
matches: [hello]

then it works.

The reason I didn't compile in temp_regex.d (or link in a separately compiled .obj) is that the code in temp_regex.d is all either const or template code. Being used to C/C++ #include <header>, I just compiled the main() module. Under normal circumstances (e.g. the regex code is compiled into a lib and that lib is linked with the app) this 'bug' would probably not have happened, so, along with the other things you pointed out, I lowered the severity to 'trivial' and the priority to 'informational'.

This is a potentially frustrating inconsistency between the compiler switches: because templates are always instantiated in their declaration scope, the compiler-generated support code (here, the _array_10temp_regex symbol) is (correctly) generated for that same scope, i.e. the temp_regex module, which was never compiled or linked in. I say potentially frustrating because compiler-generated stuff is sometimes "out of sight, out of mind", at least for me.

Walter probably spotted this right away from the linker error and either ignored it or sat back and chuckled as the e-mails went back and forth <g>.

(The reference to contract programming is because the -release switch omits pre- and post-contracts, along with asserts, invariants, etc. What I was referring to is that if you ran into this bug, the only way to get it to build was -release, which would also remove your contract programming code; hence "blocker".)

Thanks,
- Dave

-- 
Apr 12 2006
d-bugmail puremagic.com writes:
http://d.puremagic.com/bugzilla/show_bug.cgi?id=93


godaves yahoo.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID




-- 
Apr 28 2006