
digitalmars.D.learn - D1: Out of memory problems

reply "jicman" <jicman cinops.xerox.com> writes:
Greetings.

I am using,

15:32:35.63>dmd
Digital Mars D Compiler v1.046
Copyright (c) 1999-2009 by Digital Mars written by Walter Bright
Documentation: http://www.digitalmars.com/d/1.0/index.html

And I have a program that reads a file into UTF-8, does a series of 
string-handling operations, and builds reports using an associative 
array of arrays.  It then reads the next file and does the same for 
each file, finally creating a report based on word usage, etc.  The 
problem is that the program is not releasing memory.  Imagine this 
program:

//start
import std.file;

class TUCount
{
   int[char[]] File;
   char[][char[]] Target;
   int Count;
}

void ConsistencyCheck(char[] dir)
{
   TUCount[char[]] aTUs;
   char[][] allfiles = std.file.listdir(dir, "*.txt");
   aTUs = GrabUnits(allfiles);
   PrepareReport(aTUs);
}

TUCount[char[]] GrabUnits(char[][] allfiles)
{
   TUCount[char[]] aTUs;
   foreach (char[] f; allfiles)
   {
      char[] wText = "";
      // ReadFileData2UTF8 comes from another library and is not in
      // this file.  <-- The out of memory is happening in here...
      wText = ReadFileData2UTF8(f, bom);
      while (wText.length > 0)
      {
         // lots of text handling; update aTUs based on the text
      }
   }
   return aTUs;
}

void main()
{
   char[] dir = r"C:\temp\LotsOfTextFiles";
   ConsistencyCheck(dir);
}
//end

The out-of-memory error is happening in the ReadFileData2UTF8 
function.  All that function does is read the BOM, read the whole 
file into a variable, and return the UTF-8 encoded string.
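
For illustration, a minimal sketch of what such a function might look 
like (a hypothetical readFileAsUTF8 stand-in; the real 
ReadFileData2UTF8 lives in another library and is not shown):

import std.file;
import std.utf;

// Hypothetical stand-in: read the whole file, check for a BOM, and
// return the contents transcoded to UTF-8.  Note that the entire
// file ends up in one big GC allocation.
char[] readFileAsUTF8(char[] fn)
{
   ubyte[] b = cast(ubyte[]) std.file.read(fn);

   if (b.length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
      return cast(char[]) b[3 .. $];                   // UTF-8 BOM
   if (b.length >= 2 && b[0] == 0xFF && b[1] == 0xFE)
      return std.utf.toUTF8(cast(wchar[]) b[2 .. $]);  // UTF-16LE BOM
   return cast(char[]) b;                              // assume UTF-8
}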
The problem is that, apparently, the data read from each file is kept 
around and never released: the memory usage shown in Task Manager 
just keeps growing and growing.  I know that the aTUs content, which 
is used to keep track of words, etc., is quite small in memory usage; 
it is not the cause of the huge amount of memory shown by Task 
Manager.  I have 4 GB on Win7 x32.  Any help would be appreciated.  
Thanks.

josé
Apr 06 2015
next sibling parent reply "Kagamin" <spam here.lot> writes:
Depends on how you fill aTUs.
Apr 07 2015
parent "jicman" <jicman cinops.xerox.com> writes:
On Tuesday, 7 April 2015 at 08:58:31 UTC, Kagamin wrote:
 Depends on how you fill aTUs.
Ok, I will bite... ;-)

I have the wText string, which could be 20 MB or so, and I start 
finding pieces of data like this:

wText = wText[std.string.find(wText, "</ut>") + 5 .. $];

so everything before </ut>, including it, will be thrown out, correct? 
I continue like this until I find the piece of the string that I want, 
and then I fill aTUs like this:

aTUs = AddToTrackerRepeat(aTUs, source, fn, 1, target);

where:

source is the part of the string wanted
fn is the file name where the string was found
1 is a count
target is the other piece of the string wanted

And these are the other pieces missing:

TUCount[char[]] AddToTrackerRepeat(TUCount[char[]] T, char[] tu, char[] f, int add, char[] target)
{
   // tu = translation unit
   // f = file name
   // add = amount to be added
   // target = target text
   if ((tu in T) == null)
   {
      T[tu] = new TUCount();
      T[tu].Count = 0;
      T[tu].File[f] = 0;
   }
   T[tu].Count += add;
   T[tu].File[f] += add;
   T[tu].Target[f ~ "\t" ~ std.string.toString(T[tu].File[f])] = target;
   return T;
}

class TUCount
{
   int[char[]] File;
   char[][char[]] Target;
   int Count;
}
Apr 07 2015
prev sibling parent reply "Kagamin" <spam here.lot> writes:
For example, if you slice the original string, the whole original is 
preserved in memory. That's why parsers keep parsed substrings by 
duplicating them - this can result in a smaller memory footprint.
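
A minimal sketch of the difference, assuming a large hypothetical 
input file (D1-style char[] strings):

import std.file;

void main()
{
   // One large GC allocation holding the whole file.
   char[] wText = cast(char[]) std.file.read("big.txt");

   // A slice is just a pointer and length into that buffer, so as
   // long as the slice is reachable the GC must keep the entire
   // buffer alive.
   char[] sliced = wText[0 .. 20];

   // .dup copies those 20 bytes into a fresh allocation, so the big
   // buffer can be collected once wText itself is no longer used.
   char[] copied = wText[0 .. 20].dup;
}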
Apr 07 2015
parent reply "jicman" <jicman cinops.xerox.com> writes:
On Tuesday, 7 April 2015 at 09:03:19 UTC, Kagamin wrote:
 For example if you slice the original string, it will be 
 preserved in memory. That's why parsers keep parsed substrings 
 by duplicating them - this can result in smaller memory 
 footprint.
Hmmmm... Would you be able to give me an example of what is bad, and then how to fix it? This may be my problem...
Apr 07 2015
parent reply "Kagamin" <spam here.lot> writes:
On Tuesday, 7 April 2015 at 15:28:21 UTC, jicman wrote:
 Hmmmm... Would you be able to give me an example of what is bad, 
 and then how to fix it? This may be my problem...
maybe

aTUs = AddToTrackerRepeat(aTUs, source.dup, fn, 1, target.dup);
Apr 10 2015
parent reply "jicman" <jicman cinops.xerox.com> writes:
On Friday, 10 April 2015 at 13:47:52 UTC, Kagamin wrote:
 On Tuesday, 7 April 2015 at 15:28:21 UTC, jicman wrote:
 Hmmmm... Would you be able to give me an example of what is bad, 
 and then how to fix it? This may be my problem...
maybe

aTUs = AddToTrackerRepeat(aTUs, source.dup, fn, 1, target.dup);
This change causes an out-of-memory error almost instantly. Without it, the program takes longer to run out of memory.
Apr 11 2015
parent reply "Kagamin" <spam here.lot> writes:
Parsers deduplicate strings via a name table:
string udup(string s, ref string[string] nameTable)
{
   // Already interned?  Return the canonical copy.
   if (s in nameTable) return nameTable[s];
   // First occurrence: copy it out of the big buffer and remember it.
   string s1 = s.dup;
   nameTable[s1] = s1;
   return s1;
}

This way you avoid extra duplicates. You can also try to free the 
file content manually once each file has been processed.
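
For illustration, a rough sketch combining both suggestions - 
interning kept substrings and freeing each file buffer by hand. Here 
std.file.read stands in for the OP's ReadFileData2UTF8, and 
AddToTrackerRepeat is elided:

import std.file;
import std.gc;
import std.string;

// udup as above: each distinct substring is stored exactly once,
// detached from the big file buffers.
string udup(string s, ref string[string] nameTable)
{
   if (s in nameTable) return nameTable[s];
   string s1 = s.dup;
   nameTable[s1] = s1;
   return s1;
}

void main()
{
   string[string] names;
   char[][] allfiles = std.file.listdir(r"C:\temp\LotsOfTextFiles", "*.txt");
   foreach (char[] f; allfiles)
   {
      // Stand-in for the OP's ReadFileData2UTF8.
      char[] buf = cast(char[]) std.file.read(f);
      char[] wText = buf;

      while (wText.length > 0)
      {
         int i = std.string.find(wText, "</ut>");
         if (i < 0) break;
         // Intern the substring we keep so it no longer pins buf.
         char[] source = udup(wText[0 .. i], names);
         wText = wText[i + 5 .. $];
         // ... AddToTrackerRepeat(aTUs, source, f, 1, ...) ...
      }

      // Nothing we kept references buf any more, so free it now
      // rather than waiting for the collector.
      delete buf;
      std.gc.fullCollect();
   }
}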
Apr 11 2015
next sibling parent "jicman" <jicman cinops.xerox.com> writes:
On Saturday, 11 April 2015 at 20:45:25 UTC, Kagamin wrote:
 Parsers deduplicate strings via a name table:
 string udup(string s, ref string[string] nameTable)
 {
   if(s in nameTable)return nameTable[s];
   string s1=s.dup;
   nameTable[s1]=s1;
   return s1;
 }

 This way you avoid extra duplicates. You can also try to free 
 file content manually when it's processed.
Hmmm... Yes, definitely, that happens... I will have to sit down, jump into the out-of-memory abyss, and figure out how to handle it. Thanks.

josé
Apr 11 2015
prev sibling parent "jicman" <jicman cinops.xerox.com> writes:
On Saturday, 11 April 2015 at 20:45:25 UTC, Kagamin wrote:
 Parsers deduplicate strings via a name table:
 string udup(string s, ref string[string] nameTable)
 {
   if(s in nameTable)return nameTable[s];
   string s1=s.dup;
   nameTable[s1]=s1;
   return s1;
 }

 This way you avoid extra duplicates. You can also try to free 
 file content manually when it's processed.
This example helped so much. Thanks.
Apr 11 2015