digitalmars.D.bugs - [Issue 8203] New: Use of std.regex.match() generates "not enough preallocated memory" error

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8203

           Summary: Use of std.regex.match() generates "not enough
                    preallocated memory" error
           Product: D
           Version: D2
          Platform: x86_64
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: phshaffer gmail.com



Created an attachment (id=1110)
File to Compare

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 06 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8203




Created an attachment (id=1111)
File to Compare

Jun 06 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8203




Created an attachment (id=1112)
Source File

Jun 06 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8203




Created an attachment (id=1113)
Console Screenshot with Error Showing

Jun 06 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8203




Dmitry Olshansky recommended I submit this as a bug.

The program is executed as : icomp2 fold.txt fnew.txt

It should search fold.txt for certain text patterns and then see if all "found"
text also appears in fnew.txt.  Fold.txt and Fnew.txt are identical, so all
"found" text should appear in Fnew.txt as well.

I added some diagnostic loop counters for troubleshooting:
writeln(cntOld,"  ",cntNew,"  ",matchOld.hit,"  ",matchNew.hit);

As the screenshot shows, after several iterations it crashes with:
core.exception.AssertError C:\D\dmd2\windows\bin\..\..\src\phobos\std\regex.d(6050): not enough preallocated memory

Jun 06 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8203


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh gmail.com



06:13:01 PDT ---

> Dmitry Olshansky recommended I submit this as a bug.

Yup, 'cause I'm the only one to fix it, at least in the near future ;)

> The program is executed as : icomp2 fold.txt fnew.txt
>
> It should search fold.txt for certain text patterns and then see if all "found"
> text also appears in fnew.txt.  Fold.txt and Fnew.txt are identical, so all
> "found" text should appear in Fnew.txt as well.
>
> I added some diagnostic loop counters for troubleshooting:
> writeln(cntOld,"  ",cntNew,"  ",matchOld.hit,"  ",matchNew.hit);
>
> As the screenshot shows, after several iterations it crashes with:
> core.exception.AssertError C:\D\dmd2\windows\bin\..\..\src\phobos\std\regex.d(6050): not enough preallocated memory

Thanks, I'm on it. We'd better get it fixed in 2.060.
Jun 06 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8203




04:35:32 PDT ---
I've studied it a bit, and here are the details:
it only happens when re-running the same match object many times:

foreach(v; match(...)) // no bug
vs
auto m = match(....);
foreach(v; m) // does run out of memory

In your case I see from the comments that you try hard to do eager evaluation:
first find all matches, then work through two arrays of them. Yet that's not
what the program does. It still performs N*M regex searches, because
auto uniCapturesNew = match(uniFileOld, regex(...));

just starts the engine and finds the 1st match. Then you copy the engine state
on each iteration of the nested loop (this copy operation is apparently bogus)
and run the engine until all matches are found. Next iteration of the loop:
another copy.

So in your case I strongly suggest this magic recipe, which works for all
lazy ranges:
auto allMatches = array(match(....));

and work with arrays from now on.
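
A minimal sketch of that recipe applied to the task from the report, assuming a
Phobos release where std.regex, std.array and std.algorithm behave as
documented. The helper names collectHits and missingFrom are made up for
illustration, and the inline sample text stands in for the two input files;
only the regex itself comes from this thread.

```d
import std.algorithm : canFind, filter, map;
import std.array : array;
import std.regex : match, regex;
import std.stdio : writeln;

// Hypothetical helper: run the regex once over `text` and return the matched
// text of every hit as a plain array, so no lazy match object is kept around.
string[] collectHits(string text)
{
    auto re = regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm");
    return match(text, re).map!(m => m.hit).array;
}

// Hypothetical helper: hits from the old text that do not occur in the new one.
string[] missingFrom(string[] oldHits, string[] newHits)
{
    return oldHits.filter!(h => !newHits.canFind(h)).array;
}

void main()
{
    // Sample input standing in for fold.txt and fnew.txt (identical, as in the report).
    string oldText = "\nNAME   = XPAW01_STA:STATION\nNAME   = XPAW01_STA\n";
    string newText = oldText;

    // Each regex search runs exactly once; the nested comparison below
    // works on arrays only and does not touch the regex engine.
    auto oldHits = collectHits(oldText);
    auto newHits = collectHits(newText);

    writeln(oldHits.length, " hits, ",
            missingFrom(oldHits, newHits).length, " missing");
}
```

Storing only m.hit (a slice of the input) instead of the match objects keeps
the later loops independent of any engine state; if the named groups are
needed, the same approach should work with an array of the captures themselves.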


Anyway, the root cause is now clear and I've reduced it to:

import std.regex;

string data = "
NAME   = XPAW01_STA:STATION
NAME   = XPAW01_STA
";

// Main function
void main()
{
    auto uniFileOld = data;
    auto uniCapturesNew = match(uniFileOld,
        regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm"));

    // Iterating the same match object repeatedly triggers the assertion.
    for (int i = 0; i < 20; i++)
    {
        foreach (matchNew; uniCapturesNew) {}
    }
}

Jun 07 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8203


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |pull
           Platform|x86_64                      |All
         OS/Version|Windows                     |All



14:38:19 PDT ---
https://github.com/D-Programming-Language/phobos/pull/623

Jun 07 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8203




Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/0c35fcd694481753cebae9803906f6d857fe954f
fix Issue 8203

Change RegexMatch objects to follow proper COW semantics

https://github.com/D-Programming-Language/phobos/commit/245782bb6393b4a415c0e1e93b8a05f448e1457f
unittest for bug 8203

https://github.com/D-Programming-Language/phobos/commit/f1757b88fa2fda9f5db74493be762c058d3e0111
fix Issue 8203 std.regex.match() generates "not enough preallocated memory"

Jun 08 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8203




Commit pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/065e7a1f78f176b988820b0a54e22d8eb9d59819
Updated changelog for fix to issue 8203.

Jun 08 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8203


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |jmdavisProg gmx.com
         Resolution|                            |FIXED


Jun 08 2012