digitalmars.D.learn - How to free memory allocated via double[][] using dmd-2.0.12?

Markus Dittrich <markusle gmail.com> writes:

Hi,

For a data processing application I need to read a large number
of data sets from disk. Because of their size, they have to be read and
processed one at a time, i.e., in pseudocode:

int main()
{
    while (some condition)
    {
        double[][] myLargeDataset = read_data();
        process_data(myLargeDataset);
        // free all memory here otherwise next cycle will
        // run out of memory
    }

    return 0;
}

The problem is that each individual data set saturates
system memory, so I need to make sure that all memory
is freed after each process_data step is complete. Unfortunately,
using dmd-2.012 I have not been able to achieve this. Whatever
I do (including nothing, i.e., letting the GC do its job), the resulting
binary keeps accumulating memory and crashes shortly afterwards.
I've tried deleting the array, setting the array lengths to 0, and manually
forcing the GC to collect, to no avail. So, am I doing something
terribly wrong, or is this a bug in dmd?

Thanks much,
Markus

Apr 08 2008

Regan Heath <regan netmail.co.nz> writes:

Did you try just setting the array reference to null? This should make
the contents of the array unreachable, so they should be
collected when you next allocate (and run short on memory).

i.e.

int main()
{
    double[][] myLargeDataset;
    while (some condition)
    {
        myLargeDataset = read_data();
        process_data(myLargeDataset);
        myLargeDataset = null;
    }

    return 0;
}

Regan
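The thread's subject asks how to free such an array explicitly rather than waiting for a collection. A minimal sketch in present-day D (core.memory, which has replaced the std.gc module of dmd 2.012) that releases a jagged double[][]: each row of `new double[][](rows, cols)` is its own GC block, so every row is freed first and then the outer array. `makeDataset` and `releaseDataset` are hypothetical names, and freeing is only safe if no other slice still refers to the data.

```d
import core.memory : GC;

// Hypothetical stand-in for read_data(): a rows x cols data set.
double[][] makeDataset(size_t rows, size_t cols)
{
    auto data = new double[][](rows, cols); // outer array plus one GC block per row
    foreach (row; data)
        row[] = 1.0;
    return data;
}

// Explicitly release a jagged array instead of waiting for a collection.
// Only safe if nothing else still refers to any of these rows.
void releaseDataset(ref double[][] data)
{
    foreach (row; data)
        GC.free(GC.addrOf(row.ptr)); // addrOf: start of the GC block holding the row
    GC.free(GC.addrOf(data.ptr));    // then the block holding the row slices
    data = null;                     // leave no dangling slice behind
}
```

`GC.addrOf` is used because a slice's `.ptr` need not be the start of its GC block, while `GC.free` expects the block's base address.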

Apr 08 2008

Markus Dittrich <markusle gmail.com> writes:

Thanks for the hint. I just tried this again to make sure, and also
tried plopping an std.gc.fullCollect() right after it to force the GC to
collect. In both cases I can watch memory consumption grow continuously
until the system eventually runs out of memory. Maybe it's a GC bug?

Markus

Apr 08 2008

BCS <BCS pathlink.com> writes:

One "hack" would be to have read_data() allocate a big buffer and then
slice the parts of the double[][] out of it. This has the advantage that
you only need to keep track of that one buffer: on the next pass you just
reuse it in its entirety, and you never have to delete it.

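double[][] read_data()
{
	// one big buffer, allocated on the first call and reused afterwards;
	// `huge` stands for the size in bytes of the largest expected data set
	static byte[] buff;
	if (buff.ptr is null) buff = new byte[huge];

	byte[] left = buff;	// the part of the buffer still free in this pass

	// hand out the next i elements of type T from the front of `left`
	T[] Alloca(T)(size_t i)
	{
		T[] ret = (cast(T*)left.ptr)[0..i];
		left = left[i*T.sizeof..$];
		return ret;
	}

	/// code uses Alloca!(double) and Alloca!(double[]) for
	/// allocations. Don't use .length or ~=

	// ... fill the data set and return the outer array
}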
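The snippet above is an outline (`huge` and the filling code are left open). A self-contained sketch of the same reuse-one-buffer idea in present-day D, simplified so that it only hands out rows of doubles from one preallocated double[] (no casts, no alignment concerns). `DoubleArena` and `fill` are hypothetical names, and in practice the row lengths would come from the file being parsed.

```d
// A bump allocator over one reusable block of doubles (hypothetical helper).
struct DoubleArena
{
    private double[] buf; // allocated once, reused on every pass
    private size_t used;  // doubles handed out in the current pass

    this(size_t capacity) { buf = new double[](capacity); }

    // Start a new pass: everything handed out earlier may be overwritten.
    void reset() { used = 0; }

    // Hand out the next n doubles of the buffer as a slice.
    double[] alloc(size_t n)
    {
        assert(used + n <= buf.length, "arena exhausted");
        auto slice = buf[used .. used + n];
        used += n;
        return slice;
    }
}

// Build a jagged data set whose rows all live inside the arena.
double[][] fill(ref DoubleArena arena, const size_t[] rowLengths)
{
    arena.reset();
    auto rows = new double[][](rowLengths.length); // small outer index, left to the GC
    foreach (i, len; rowLengths)
    {
        rows[i] = arena.alloc(len);
        rows[i][] = cast(double) i; // placeholder values
    }
    return rows;
}
```

Each call to `fill` invalidates the rows returned by the previous call, which is the trade-off of this approach: only one data set exists at a time, and the GC only ever sees the small outer index.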

Apr 08 2008

Markus Dittrich <markusle gmail.com> writes:

Thanks much for your response! I could certainly roll my own buffer
management. Unfortunately, the "real" app is more complicated
than the "proof of concept" code I posted, and doing so would require a bit
more work. After all, the main reason for using D for this type of thing
was that I didn't want to deal with manual memory management ;)

From the posts I gather that I am not doing anything fundamentally
wrong, and I'll probably file a bug for this later.

Apr 08 2008

BCS <BCS pathlink.com> writes:

Markus Dittrich wrote:
 Unfortunately, the "real" app is more complicated 
 than the "proof of concept" code I posted and doing so would require a bit
 more work.
 

life would be so much nicer if real life didn't get in the way :b

Apr 08 2008

Bill Baxter <dnewsgroup billbaxter.com> writes:

Markus, you do not show us what either read_data or process_data does.
It is possible that one of them is somehow holding on to references to
the data. This would prevent the GC from collecting the memory.

Another problem: if you allocate the memory initially as void[], then
the GC will scan it for pointers, and in a big float buffer you'll get a
lot of false hits (bit patterns that happen to look like pointers and keep
other memory alive). To prevent that, allocate the buffer initially as
byte[] (or double[], just not void[]).

Anyway, if you want a speedy fix, you'll need to distill this down into 
something that is actually reproducible by Walter.

--bb
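Whether the GC scans a block for pointers is recorded as the NO_SCAN attribute on that block. A small sketch in present-day D (core.memory rather than the std.gc of dmd 2.012) that inspects it: arrays of pointer-free element types such as double[] and byte[] are allocated as NO_SCAN, while a raw GC.malloc block without attributes is scanned, which is the situation described above for void[]. `isScanned` is a hypothetical helper name.

```d
import core.memory : GC;

// True if the GC will scan the block containing p for pointers.
// GC.query accepts interior pointers, such as the .ptr of a large array.
bool isScanned(const(void)* p)
{
    return (GC.query(p).attr & GC.BlkAttr.NO_SCAN) == 0;
}
```

For raw allocations the flag can be requested explicitly, e.g. `GC.malloc(size, GC.BlkAttr.NO_SCAN)`.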

Apr 08 2008

Markus Dittirich <markusle gmail.com> writes:

Hi Bill,

You're of course absolutely correct! Below is proof-of-concept code
that still exhibits the issue I was describing. The parse code needs
to handle row-centric ASCII data with a variable number of columns.
The file "data_random.dat" contains a single row of random integers.
After a few iterations the code runs out of memory on my machine,
and no amount of deleting seems to help.

import std.stream;
import std.stdio;
import std.string;   // split, atof
import std.contracts;
import std.gc;


public double[][] parse(BufferedFile inputFile)
{

  double[][] array;
  foreach(char[] line; inputFile)
  {
    double[] temp;

    foreach(string item; std.string.split(assumeUnique(line)))
    {
       temp ~= std.string.atof(item);
    }

    array ~= temp;
  }

  /* rewind for next round */
  inputFile.seekSet(0);

  return array;
}



int main()
{
  BufferedFile inputFile = new BufferedFile("data_random.dat");

  while(1)
  {
    double[][] foo = parse(inputFile);
  }

  return 1;
}

Thanks much,
Markus
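std.stream and std.string.atof are no longer part of Phobos. For anyone re-running this test case today, a rough present-day equivalent of `parse` might look like the sketch below. It takes any range of lines (for example `File("data_random.dat").byLine`) and converts each field with `to!double` as soon as it is read; that matters because `byLine` reuses its line buffer between iterations, so nothing should keep slices of `line`.

```d
import std.algorithm.iteration : map, splitter;
import std.array : array;
import std.conv : to;

// Parse rows of whitespace-separated numbers; rows may differ in length.
// `lines` can be any input range of strings, e.g. File(name).byLine.
double[][] parse(R)(R lines)
{
    double[][] rows;
    foreach (line; lines)
    {
        // to!double copies each value out of `line` immediately
        rows ~= line.splitter.map!(to!double).array;
    }
    return rows;
}
```

Calling this in a loop, as in the `main` above, gives a simple way to check whether memory use stays flat on a current compiler.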

Apr 08 2008

Bill Baxter <dnewsgroup billbaxter.com> writes:

Ok.  You should add that to the bug report.

However, that test program works fine for me on Windows.
I tried it with
   DMD/Phobos 1.028,
   DMD/Tango/Tangobos 1.028, and
   DMD/Phobos 2.012.

--bb

Apr 08 2008

BCS <BCS pathlink.com> writes:

Does it change things if you drop the ~= in favor of extending the
array? What about preallocating the array with the correct size to
begin with? (I know this might not be doable in the general case.)
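For a single row, both suggestions can be combined: count the fields first, allocate the row once at its final size, then fill it by index instead of growing it with ~=. A minimal sketch in present-day D; `parseRowPreallocated` is a hypothetical name, and it walks each line twice, trading a little CPU time for a single allocation per row.

```d
import std.algorithm.iteration : splitter;
import std.conv : to;
import std.range.primitives : walkLength;

// Parse one line of whitespace-separated numbers without using ~=:
// count the fields, allocate the row once, then fill it by index.
double[] parseRowPreallocated(const(char)[] line)
{
    auto row = new double[](line.splitter.walkLength);
    size_t i = 0;
    foreach (field; line.splitter)
        row[i++] = field.to!double;
    return row;
}
```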

Apr 08 2008

Markus Dittirich <markusle gmail.com> writes:

BCS Wrote:
 
 does it change things if you drop the ~= in favor of extending the 
 array? What about if you preallocate the array with the correct size to 
 begin with? (I know this might not be doable in the general case)

I'll play around with this some more. I've tried preallocating, which didn't
help. It looks like this is a dmd + Linux issue, since the same code compiled
with the latest gdc works just fine, and Bill seems to have no issues on Windows
either.

Thanks for all the help!

cheers,
Markus

Apr 08 2008
