digitalmars.D.learn - regex.d(6050): not enough preallocated memory

"Paul" <phshaffer gmail.com> writes:
I am trying to see if all regex matches in one file are present 
in another file.
The code works, but partway through the nested foreach loops I get 
the error listed in the subject line.  I would have expected this 
error to come up when the regex expressions were executed, not 
while I'm iterating through the resulting matches.

Is there a better way to do this, or can I just allocate more 
memory?
Thanks.

// Execute Regex expressions
auto uniCapturesOld = match(uniFileOld,
    regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm"));
auto uniCapturesNew = match(uniFileNew,
    regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm"));

// Iterate through match collections to see if both files contain
// the same matches.
foreach (matchOld; uniCapturesOld) {
    cntOld++;
    found = false;
    foreach (matchNew; uniCapturesNew) {
        cntNew++;
        // Following line is for troubleshooting.
        writeln(cntOld, "  ", cntNew, "  ", matchOld.hit, "  ", matchNew.hit);
        if (matchOld.hit == matchNew.hit) { found = true; break; }
    }
    if (!found) writeln(cntNF++, " ", matchOld.hit, " not found");
}
Jun 05 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 06.06.2012 0:25, Paul wrote:
 I am trying to see if all regex matches in one file are present in
 another file.
 The code works; but, part way through the nested foreach(s) I get the
 error listed in the subject line. I would think this error would come up
 when the Regex expressions were executed not when I'm iterating through
 the resultant matches.

so on - it's lazy evaluation at its finest (who knows, maybe you'll break out of the loop halfway through). Obviously it either loses some RAM between calls or it just bugs out when it reaches some specific text.
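
A minimal sketch of that laziness, for illustration only. The helper name matchesConsumedUntil, the pattern, and the sample text are made up; only match, regex, and .hit come from the code in this thread. As far as I understand std.regex, a "g" match returns a range that finds each further match only when the loop advances, so breaking early means the remaining matches are never computed.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: counts how many matches are pulled from the lazy
// range before a hit equal to `target` is seen (or the input runs out).
size_t matchesConsumedUntil(string text, string target)
{
    size_t consumed = 0;
    // With the "g" flag, match yields matches one at a time as the
    // foreach loop advances, not all at once.
    foreach (m; match(text, regex(r"[a-z]+", "g")))
    {
        ++consumed;
        if (m.hit == target)
            break; // matches after this point are never searched for
    }
    return consumed;
}

void main()
{
    // Stops after "alpha" and "beta"; "gamma" and "delta" are not visited.
    writeln(matchesConsumedUntil("alpha beta gamma delta", "beta"));
}
```

This is also why an error inside the matcher can show up in the middle of a foreach rather than at the call to match.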
 Is there a better way to do this or can I just allocate more memory?
 Thanks.

Looks like you found a bug, meaning that I probably miscalculated the required amount of RAM or lose some free-list nodes between calls. Please file a bug report, and keep in mind that I need the data to reproduce it. Until I figure it out, I recommend falling back on the bmatch function, which is slower and in general unbounded in memory use but should work. Another idea: try modifying one of the regexes insignificantly, so that they don't reuse data structures internally (just in case it has to do with that).
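
One possible shape for the bmatch fallback, as a sketch rather than a tested fix for this bug. The function name missingHits and the sample data are assumptions; bmatch is the std.regex function mentioned above. It also avoids the nested loop by collecting the hits of the new file once in an associative array used as a set, so each file is scanned a single time.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: returns every hit of `pattern` in `oldText`
// that does not occur anywhere in `newText`.
string[] missingHits(string oldText, string newText, string pattern)
{
    auto re = regex(pattern, "gm");

    // Collect the hits of the new text once; the bool values are unused,
    // the associative array only serves as a set of strings.
    bool[string] seen;
    foreach (m; bmatch(newText, re))
        seen[m.hit] = true;

    // A single pass over the old text, with a set lookup per hit.
    string[] missing;
    foreach (m; bmatch(oldText, re))
        if (m.hit !in seen)
            missing ~= m.hit;
    return missing;
}

void main()
{
    // Same pattern as in the original post; the file contents are made up.
    enum pattern = r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)";
    auto oldText = "NAME   = cpu:alu\nNAME   = cpu:fpu\nNAME   = mem\n";
    auto newText = "NAME   = cpu:alu\nNAME   = mem\n";
    foreach (hit; missingHits(oldText, newText, pattern))
        writeln(hit, " not found");
}
```

Because the two match ranges are consumed one after the other instead of interleaved, this layout might also sidestep whatever state is shared between them, though that is a guess until the bug is tracked down.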
 // Execute Regex expressions
 auto uniCapturesOld = match(uniFileOld,
     regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm"));
 auto uniCapturesNew = match(uniFileNew,
     regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm"));

 // Iterate through match collections to see if both files contain the
 // same matches.
 foreach (matchOld; uniCapturesOld) {
     cntOld++;
     found = false;
     foreach (matchNew; uniCapturesNew) {
         cntNew++;
         // Following line is for troubleshooting.
         writeln(cntOld, "  ", cntNew, "  ", matchOld.hit, "  ", matchNew.hit);
         if (matchOld.hit == matchNew.hit) { found = true; break; }
     }
     if (!found) writeln(cntNF++, " ", matchOld.hit, " not found");
 }

-- Dmitry Olshansky
Jun 05 2012