digitalmars.D.learn - regex.d(6050): not enough preallocated memory

"Paul" <phshaffer gmail.com> writes:

I am trying to check whether all regex matches in one file are also
present in another file.
The code works, but partway through the nested foreach loops I get
the error listed in the subject line. I would have expected this error
to come up when the regex expressions were executed, not while
I'm iterating through the resulting matches.

Is there a better way to do this, or can I just allocate more
memory?
Thanks.

// Execute the regex expressions
auto uniCapturesOld = match(uniFileOld,
    regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm"));
auto uniCapturesNew = match(uniFileNew,
    regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm"));

// Iterate through the match collections to see if both files contain
// the same matches.
foreach (matchOld; uniCapturesOld) {
    cntOld++;
    found = false;
    foreach (matchNew; uniCapturesNew) {
        cntNew++;
        // The following line is for troubleshooting.
        writeln(cntOld, "  ", cntNew, "  ", matchOld.hit, "  ", matchNew.hit);
        if (matchOld.hit == matchNew.hit) { found = true; break; }
    }
    if (!found) writeln(cntNF++, " ", matchOld.hit, " not found");
}
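
On the "better way" question, one option is to collect the hits from the new file into a set once and then look up each hit from the old file, instead of re-running the inner match range for every outer match. The following is only a minimal sketch: `collectHits` and `missingHits` are hypothetical helper names, the sample input in `main` is made up, and nothing here is known to avoid the error itself, since it still iterates the lazy match range once per file.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: collect every full match (hit) into a set,
// modelled as an associative array keyed by the matched text.
bool[string] collectHits(string text)
{
    // Same pattern as above; "g" = all matches, "m" = ^ matches at line starts.
    auto re = regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm");
    bool[string] hits;
    foreach (m; match(text, re))
        hits[m.hit] = true;
    return hits;
}

// Hypothetical helper: hits present in oldText but absent from newText.
string[] missingHits(string oldText, string newText)
{
    auto newHits = collectHits(newText);
    string[] missing;
    foreach (hit; collectHits(oldText).byKey)
        if (hit !in newHits)
            missing ~= hit;
    return missing;
}

void main()
{
    // Made-up sample data, for illustration only.
    auto oldText = "NAME   = a:b\nNAME   = c\n";
    auto newText = "NAME   = a:b\n";
    foreach (hit; missingHits(oldText, newText))
        writeln(hit, " not found");
}
```

Each file is scanned once, so the work grows with the total number of matches rather than with the product of the two match counts.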

Jun 05 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 06.06.2012 0:25, Paul wrote:
> I am trying to see if all regex matches in one file are present in
> another file.
> The code works; but, part way through the nested foreach(s) I get the
> error listed in the subject line. I would think this error would come up
> when the Regex expressions were executed not when I'm iterating through
> the resultant matches.

To get the next match, the engine is run again, then again for the match
after that, and so on: it's lazy evaluation at its finest (who knows,
maybe you'll break out of the loop halfway through). Evidently it either
loses some RAM between calls or it just bugs out when it reaches some
specific text.
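
To illustrate the lazy behaviour described above, here is a small sketch that walks the match range by hand. `firstHits` is a hypothetical helper and the digit pattern is just an example, not the pattern from the original post.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: take at most `limit` hits and stop; anything
// beyond the point where the loop stops is never searched for.
string[] firstHits(string text, size_t limit)
{
    auto matches = match(text, regex(r"[0-9]+", "g"));
    string[] hits;
    while (!matches.empty && hits.length < limit)
    {
        hits ~= matches.front.hit;
        matches.popFront(); // the engine searches for the next match here
    }
    return hits;
}

void main()
{
    writeln(firstHits("10 20 30 40", 2));
}
```

Because each `popFront` call runs the engine again, an engine failure can surface in the middle of a foreach loop rather than at the `match` call.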

> Is there a better way to do this or can I just allocate more memory?
> Thanks.

Looks like you found a bug, meaning that I probably miscalculated the
required amount of RAM or I lose some free list nodes between calls.

File a bug report, and keep in mind that I need the data to reproduce it.

Until I figure it out, I recommend falling back on the bmatch function,
which is slower and in general unbounded in memory use, but should work.
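
As a rough sketch of that fallback, swapping `match` for `bmatch` leaves the rest of the loop unchanged. `hitsWithBmatch` is a hypothetical helper name and the sample input is made up; the pattern is the one from the original post.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: collect all hits using the backtracking engine.
string[] hitsWithBmatch(string text)
{
    auto re = regex(r"^NAME   = (?P<comp>[a-zA-Z0-9_]+):*(?P<blk>[a-zA-Z0-9_]*)", "gm");
    string[] hits;
    foreach (m; bmatch(text, re)) // bmatch instead of match
        hits ~= m.hit;
    return hits;
}

void main()
{
    // Made-up sample data, for illustration only.
    foreach (hit; hitsWithBmatch("NAME   = a:b\nother line\nNAME   = c\n"))
        writeln(hit);
}
```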

Another idea: try modifying one of the regexes in some insignificant
way, so that the two don't reuse data structures internally (just in
case it has to do with that).

-- 
Dmitry Olshansky

Jun 05 2012
