digitalmars.D.learn - regex.d(6050): not enough preallocated memory

"Paul" <phshaffer gmail.com> writes:
I am trying to see if all regex matches in one file are present
in another file.
The code works, but partway through the nested foreach loops I get
the error listed in the subject line. I would expect this error
to come up when the regex expressions are executed, not while
I'm iterating through the resulting matches.

Is there a better way to do this, or can I just allocate more
memory?
Thanks.

// Execute Regex expressions
auto uniCapturesOld = match(uniFileOld,
    regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm"));
auto uniCapturesNew = match(uniFileNew,
    regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm"));

// Iterate through match collections to see if both files contain the same matches.
foreach (matchOld; uniCapturesOld) {
    cntOld++;
    found = false;
    foreach (matchNew; uniCapturesNew) {
        cntNew++;
        // Following line is for troubleshooting.
        writeln(cntOld, "  ", cntNew, "  ", matchOld.hit, "  ", matchNew.hit);
        if (matchOld.hit == matchNew.hit) { found = true; break; }
    }
    if (!found) writeln(cntNF++, " ", matchOld.hit, " not found");
}
Jun 05 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 06.06.2012 0:25, Paul wrote:
 I am trying to see if all regex matches in one file are present in
 another file.
 The code works, but partway through the nested foreach loops I get the
 error listed in the subject line. I would expect this error to come up
 when the regex expressions are executed, not while I'm iterating through
 the resulting matches.

To get the next match, the engine is run again, then again for the match after that, and so on. It's lazy evaluation at its finest (who knows, maybe you'll break out of the loop halfway through). Evidently it either loses some memory between calls or it just bugs out when it reaches some specific text.
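
As a rough illustration of what lazy evaluation means for the range returned by match, here is a minimal sketch. The helper name firstHits, the simplified pattern, and the sample text are made up for this example and are not taken from the original program.

```d
import std.regex : match, regex;
import std.stdio : writeln;

// Hypothetical helper: collect at most `limit` hits from `text`.
string[] firstHits(string text, size_t limit)
{
    auto matches = match(text, regex(r"^NAME   = ([a-zA-Z0-9_]+)", "gm"));
    string[] hits;
    // The result is a range: each popFront() resumes the engine to look
    // for the next match, so matches are produced on demand.
    while (!matches.empty && hits.length < limit)
    {
        hits ~= matches.front.hit;
        matches.popFront();
    }
    return hits;
}

void main()
{
    // Made-up sample input standing in for a file's contents.
    auto text = "NAME   = a\nNAME   = b\nNAME   = c\n";
    writeln(firstHits(text, 2));
}
```

Because the work happens during iteration, a failure inside the engine would be expected to surface in the middle of a foreach loop rather than at the call to match.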
 Is there a better way to do this or can I just allocate more memory?
 Thanks.

Looks like you found a bug: I probably either miscalculated the required amount of memory or lose some free-list nodes between calls. Please file a bug report, and keep in mind that I need the data to reproduce it.

Until I figure it out, I recommend falling back on the bmatch function, which is slower and in general unbounded in memory use, but should work. Another idea: try modifying one of the regexes insignificantly so that they don't reuse data structures internally (just in case it has to do with that).
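
A minimal sketch of the bmatch fallback, assuming the two file contents are already loaded as strings. The function name missingHits and the sample data are hypothetical. Collecting the new file's hits into an associative array, instead of re-scanning it in an inner loop, is an extra assumption of this sketch and not something suggested in the thread.

```d
import std.regex : bmatch, regex;
import std.stdio : writeln;

// Hypothetical helper: every hit in oldText with no identical hit in newText.
string[] missingHits(string oldText, string newText)
{
    auto re = regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm");

    // Walk the new file once with the backtracking matcher and remember its hits.
    bool[string] seenNew;
    foreach (m; bmatch(newText, re))
        seenNew[m.hit] = true;

    // Walk the old file once; a lookup replaces the inner foreach.
    string[] missing;
    foreach (m; bmatch(oldText, re))
        if (m.hit !in seenNew)
            missing ~= m.hit;
    return missing;
}

void main()
{
    // Made-up sample data standing in for the two files.
    auto oldText = "NAME   = alpha:one\nNAME   = beta\nNAME   = gamma:three\n";
    auto newText = "NAME   = gamma:three\nNAME   = alpha:one\n";
    foreach (hit; missingHits(oldText, newText))
        writeln(hit, " not found");
}
```

With this structure each file is scanned only once, so the engine is restarted far fewer times than in the nested loops, whichever matcher is used.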
 // Execute Regex expressions
 auto uniCapturesOld = match(uniFileOld,
     regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm"));
 auto uniCapturesNew = match(uniFileNew,
     regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm"));

 // Iterate through match collections to see if both files contain the same matches.
 foreach (matchOld; uniCapturesOld) {
     cntOld++;
     found = false;
     foreach (matchNew; uniCapturesNew) {
         cntNew++;
         // Following line is for troubleshooting.
         writeln(cntOld, "  ", cntNew, "  ", matchOld.hit, "  ", matchNew.hit);
         if (matchOld.hit == matchNew.hit) { found = true; break; }
     }
     if (!found) writeln(cntNF++, " ", matchOld.hit, " not found");
 }

-- Dmitry Olshansky
Jun 05 2012