www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - The problem with the D GC

reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
After having fought a while with D programs with runaway memory leaks, 
I've unfortunately had to come to the conclusion that the D GC is not 
ready for production use. The problem is what I'd call "spurious 
pointers". That is random data (strings, numbers, image data, audio or 
whatever) appearing to the GC to be full of pointers to all over the 
memory space.

Consider this simple program. It is designed to have a memory footprint 
of about 20 mb and then continuously process data.

import std.random;

void main() {
         // The real memory use, ~20 mb
         uint[] data;
         data.length = 5_000_000;
         foreach(inout x; data)
                 x = rand();
         while(1) {
		// simulate reading a few kb of data
                 uint[] incoming;
                 incoming.length = 1000 + rand() % 5000;
                 foreach(inout x; incoming)
                         x = rand();
                 // do something with the data...
         }
}

The result may not be as expected. The program will use up all available 
memory (for me crashing at about 2.7 gb of memory usage) and at the same 
time run extremely slow due to the panicked GC scanning all memory over 
and over.

The reason is the 20 mb of random data and the small 32-bit memory 
address range of 4 GB. To understand how bad this is, 20 mb of random 
data will result in _each_ 4k memory page on average having 5 random 
pointers to it. Those spurious pointers are laying a dense mine-field 
effectively disabling the GC.

This means that each time you rely on the GC (array appending/resizing, 
Phobos function calls etc), you have a potential memory leak. (That is 
unless all the program data is nothing but valid pointers/references or 
all non-pointer data is hidden from the GC.)

The above program is of course just a toy illustrating the phenomena. In 
a text processing program of mine the bulk of the data is short char[] 
strings. The program still has runaway memory leaks leading to an 
inevitable crash. I have absolutely no idea how to handle text 
processing using the D recommended char[] and CoW idiom without getting 
severe memory leaks.

The definite solution has to be a GC that only scans memory containing 
pointers. Sean's patches to make the GC skip scanning memory known to 
contain elements smaller than sizeof(void*) will probably help 
tremendously. (I'd just have to make sure I'm not using dchar[] strings, 
float or double data, or the DMD associative array implementation)

-- 
/Oskar
Jan 08 2007
next sibling parent Johan Granberg <lijat.meREM OVE.gmail.com> writes:
Oskar Linde wrote:

 After having fought a while with D programs with runaway memory leaks,
 I've unfortunately had to come to the conclusion that the D GC is not
 ready for production use. The problem is what I'd call "spurious
 pointers". That is random data (strings, numbers, image data, audio or
 whatever) appearing to the GC to be full of pointers to all over the
 memory space.
 
 Consider this simple program. It is designed to have a memory footprint
 of about 20 mb and then continuously process data.
 
 import std.random;
 
 void main() {
          // The real memory use, ~20 mb
          uint[] data;
          data.length = 5_000_000;
          foreach(inout x; data)
                  x = rand();
          while(1) {
 // simulate reading a few kb of data
                  uint[] incoming;
                  incoming.length = 1000 + rand() % 5000;
                  foreach(inout x; incoming)
                          x = rand();
                  // do something with the data...
          }
 }
 
 The result may not be as expected. The program will use up all available
 memory (for me crashing at about 2.7 gb of memory usage) and at the same
 time run extremely slow due to the panicked GC scanning all memory over
 and over.
 
 The reason is the 20 mb of random data and the small 32-bit memory
 address range of 4 GB. To understand how bad this is, 20 mb of random
 data will result in _each_ 4k memory page on average having 5 random
 pointers to it. Those spurious pointers are laying a dense mine-field
 effectively disabling the GC.
 
 This means that each time you rely on the GC (array appending/resizing,
 Phobos function calls etc), you have a potential memory leak. (That is
 unless all the program data is nothing but valid pointers/references or
 all non-pointer data is hidden from the GC.)
 
 The above program is of course just a toy illustrating the phenomena. In
 a text processing program of mine the bulk of the data is short char[]
 strings. The program still has runaway memory leaks leading to an
 inevitable crash. I have absolutely no idea how to handle text
 processing using the D recommended char[] and CoW idiom without getting
 severe memory leaks.
 
 The definite solution has to be a GC that only scans memory containing
 pointers. Sean's patches to make the GC skip scanning memory known to
 contain elements smaller than sizeof(void*) will probably help
 tremendously. (I'd just have to make sure I'm not using dchar[] strings,
 float or double data, or the DMD associative array implementation)
 

I have observed the same behavior but did not realize why it happened (thought it was a gc bug on osx or something). Something that helped the problem for me was to call fullCollect very often (40 times a second) this reduced the leak from 1mb a second to almost nothing.
Jan 08 2007
prev sibling next sibling parent Lutger <lutger.blijdestijn gmail.com> writes:
Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks, 
 I've unfortunately had to come to the conclusion that the D GC is not 
 ready for production use. The problem is what I'd call "spurious 
 pointers". That is random data (strings, numbers, image data, audio or 
 whatever) appearing to the GC to be full of pointers to all over the 
 memory space.

Wow that is so bad. I thought it has been mentioned this will not be a problem in practice but it apparently is.
Jan 08 2007
prev sibling next sibling parent Lionello Lunesu <lio lunesu.remove.com> writes:
Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks, 
 I've unfortunately had to come to the conclusion that the D GC is not 
 ready for production use. The problem is what I'd call "spurious 
 pointers". That is random data (strings, numbers, image data, audio or 
 whatever) appearing to the GC to be full of pointers to all over the 
 memory space.
 
 Consider this simple program. It is designed to have a memory footprint 
 of about 20 mb and then continuously process data.
 
 import std.random;
 
 void main() {
         // The real memory use, ~20 mb
         uint[] data;
         data.length = 5_000_000;
         foreach(inout x; data)
                 x = rand();
         while(1) {
         // simulate reading a few kb of data
                 uint[] incoming;
                 incoming.length = 1000 + rand() % 5000;
                 foreach(inout x; incoming)
                         x = rand();
                 // do something with the data...
         }
 }
 
 The result may not be as expected. The program will use up all available 
 memory (for me crashing at about 2.7 gb of memory usage) and at the same 
 time run extremely slow due to the panicked GC scanning all memory over 
 and over.
 
 The reason is the 20 mb of random data and the small 32-bit memory 
 address range of 4 GB. To understand how bad this is, 20 mb of random 
 data will result in _each_ 4k memory page on average having 5 random 
 pointers to it. Those spurious pointers are laying a dense mine-field 
 effectively disabling the GC.
 
 This means that each time you rely on the GC (array appending/resizing, 
 Phobos function calls etc), you have a potential memory leak. (That is 
 unless all the program data is nothing but valid pointers/references or 
 all non-pointer data is hidden from the GC.)
 
 The above program is of course just a toy illustrating the phenomena. In 
 a text processing program of mine the bulk of the data is short char[] 
 strings. The program still has runaway memory leaks leading to an 
 inevitable crash. I have absolutely no idea how to handle text 
 processing using the D recommended char[] and CoW idiom without getting 
 severe memory leaks.
 
 The definite solution has to be a GC that only scans memory containing 
 pointers. Sean's patches to make the GC skip scanning memory known to 
 contain elements smaller than sizeof(void*) will probably help 
 tremendously. (I'd just have to make sure I'm not using dchar[] strings, 
 float or double data, or the DMD associative array implementation)

I've run into similar problems back when I was messing around with the Universal Machine for that programing contest. It would run slower and slower. Skipping GC checks on arrays without pointers is a must, if you ask me. L.
Jan 08 2007
prev sibling next sibling parent reply Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Oskar Linde wrote:
 (...) gc hell (...)

I've experienced pretty much the same while doing memory-intensive computations. Since then I've been using lots of malloc/realloc/free and both my memory footprint and execution speed have improved. The GC needs a fix. Badly. -- Tomasz Stachowiak
Jan 08 2007
parent reply kenny <funisher gmail.com> writes:
I also have experienced bad GC performance. I found it to be because of 
the ~ operator on strings. The program is a daemon, and after it had 
been running for a while, memory usage gets truly horrific, and 
performance degrades very bad.

This was back on 0.140, so things may have changed since then...

I solved two ways. First, I wrote a function which accepts variadic 
arguments, and separated everything by a comma instead of appending the 
strings and the performance difference was stunning. it was also nice to 
be able to write:

prt("string", my_int, " ", my_float, "string2");

instead of

"string"~std.string.toString(my_int)~" 
"~std.string.toString(my_float)~"string2"

Second, like someone else in this thread I also called fullCollect every 
second.

I used to use gdc-0.08 with boehm-gc too. I can't honestly remember if 
that had the same problem.

Tom S wrote:
 Oskar Linde wrote:
 (...) gc hell (...)

I've experienced pretty much the same while doing memory-intensive computations. Since then I've been using lots of malloc/realloc/free and both my memory footprint and execution speed have improved. The GC needs a fix. Badly. -- Tomasz Stachowiak

Jan 08 2007
next sibling parent reply Sean Kelly <sean f4.ca> writes:
kenny wrote:
 I also have experienced bad GC performance. I found it to be because of 
 the ~ operator on strings. The program is a daemon, and after it had 
 been running for a while, memory usage gets truly horrific, and 
 performance degrades very bad.
 
 This was back on 0.140, so things may have changed since then...

Probably not. Assuming this is a multithreaded app, you have to pay for two mutex locks for every concat operation. So an expression like: a = b ~ c ~ d; would result in six locks of the mutex protecting the GC to perform allocations (I think anyway--I don't believe chained concats have special handling). I think this would also result in two discarded temporary buffers for the GC to clean up later. And since the Phobos GC scans all such blocks by default... My patch will address the scanning issue, but it won't address the mutex issue--there's really no easy way to avoid that one. Sean
Jan 08 2007
next sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Sean Kelly wrote:
 Assuming this is a multithreaded app, you have to pay for 
 two mutex locks for every concat operation.  So an expression like:
 
 a = b ~ c ~ d;
 
 would result in six locks of the mutex protecting the GC to perform 
 allocations (I think anyway--I don't believe chained concats have 
 special handling).  I think this would also result in two discarded 
 temporary buffers for the GC to clean up later.  And since the Phobos GC 
 scans all such blocks by default...

Chained concats *do* have special handling. See phobos/internal/arraycat.d, _d_arraycatn() (first function in the file, can't miss it). This function takes element size and number of arrays, followed by C-style vararg arrays. And yes, it only allocates once :). By the way, how do you figure six locks? At two locks per concat, that code only has two concats so even without above special handling wouldn't it still only lock four times? Also: Why two locks per concat at all? A concat only requires one allocation, so does a single alloc require two locks? (I haven't looked much into the GC code, so this is just curiosity)
 My patch will address the scanning issue, but it won't address the mutex 
 issue--there's really no easy way to avoid that one.

What exactly do you mean by "the mutex issue"? Is using mutexes at all the problem, or is it locking them too often per operation (i.e. more than once per alloc)? If you don't want to use them at all, I think the closest you'd get would involve implementing thread-local heaps. But that would still require a mutex if you want to allow deleting an object from a different thread than it was allocated in...
Jan 08 2007
next sibling parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Frits van Bommel wrote:
 If you don't want to use them at all, I think the closest you'd get 
 would involve implementing thread-local heaps.
 But that would still require a mutex if you want to allow deleting an 
 object from a different thread than it was allocated in...

Now that I think about it, maybe it would be possible to eliminate that mutex as well. Hmm...
Jan 08 2007
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Frits van Bommel wrote:
 Sean Kelly wrote:
 Assuming this is a multithreaded app, you have to pay for two mutex 
 locks for every concat operation.  So an expression like:

 a = b ~ c ~ d;

 would result in six locks of the mutex protecting the GC to perform 
 allocations (I think anyway--I don't believe chained concats have 
 special handling).  I think this would also result in two discarded 
 temporary buffers for the GC to clean up later.  And since the Phobos 
 GC scans all such blocks by default...

Chained concats *do* have special handling. See phobos/internal/arraycat.d, _d_arraycatn() (first function in the file, can't miss it). This function takes element size and number of arrays, followed by C-style vararg arrays. And yes, it only allocates once :).

Oh good :) I knew about the varargs but hadn't given the code a close enough look to be sure.
 By the way, how do you figure six locks? At two locks per concat, that 
 code only has two concats so even without above special handling 
 wouldn't it still only lock four times?

I miscounted :p It would have been four without the special handling.
 Also: Why two locks per concat at all? A concat only requires one 
 allocation, so does a single alloc require two locks?
 (I haven't looked much into the GC code, so this is just curiosity)

The code in internal/gc/gc.d first checks the size of the array to see if a realloc is necessary, then it performs the realloc. So worst case you're stuck with two locks per operation. However, this may not be the case in the arraycat routines. It's been a while since I've looked at them.
 My patch will address the scanning issue, but it won't address the 
 mutex issue--there's really no easy way to avoid that one.

What exactly do you mean by "the mutex issue"? Is using mutexes at all the problem, or is it locking them too often per operation (i.e. more than once per alloc)?

Too often per operation. So given the vararg stuff you mentioned above, this isn't an issue IMO. As it is, locks should only be acquired if the app is multithreaded, so this cost is only incurred if it's necessary. I think Phobos might actually lock in a few instances where it isn't strictly necessary, but I can't recall for certain.
 If you don't want to use them at all, I think the closest you'd get 
 would involve implementing thread-local heaps.
 But that would still require a mutex if you want to allow deleting an 
 object from a different thread than it was allocated in...

No it wouldn't--give each per-thread heap a lock-free free list. But per-thread heaps for a GC are somewhat complicated. I think you'd have to implement something like a read/write lock where the "writer" is actually the thread that wants to collect. Sean
Jan 08 2007
parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Sean Kelly wrote:
 Frits van Bommel wrote:
 Also: Why two locks per concat at all? A concat only requires one 
 allocation, so does a single alloc require two locks?
 (I haven't looked much into the GC code, so this is just curiosity)

The code in internal/gc/gc.d first checks the size of the array to see if a realloc is necessary, then it performs the realloc. So worst case you're stuck with two locks per operation. However, this may not be the case in the arraycat routines. It's been a while since I've looked at them.

Normal concats always allocate, so there's no need to check the size (and so they don't). You're probably thinking of appends.
 If you don't want to use them at all, I think the closest you'd get 
 would involve implementing thread-local heaps.
 But that would still require a mutex if you want to allow deleting an 
 object from a different thread than it was allocated in...

No it wouldn't--give each per-thread heap a lock-free free list. But per-thread heaps for a GC are somewhat complicated. I think you'd have to implement something like a read/write lock where the "writer" is actually the thread that wants to collect.

I replied to myself about 10 minutes later when I realized it was probably possible :).
Jan 08 2007
prev sibling parent kenny <funisher gmail.com> writes:
Honestly, it was a long time ago, so remembering is difficult. I didn't 
have SVN installed back then :) I'm pretty sure it wasn't 
multi-threaded. I think that we launched multiple processes. (eg. not 
mutexes either) I know I used a lot of assoc arrays, and I also used a 
lot of concatenation. The size of the code was quite huge too.. at least 
8000 lines.

It could be a combination of those factors, and it may have been as 
early of a version of 0.73. I dunno dude, I just remember it sucked.

I have not really been having bad experiences with the GC lately, accept 
that, it's annoying at times with the peaks. Every request will give a 
response time of 2ms, accept when the GC runs, and then it'll give a 
50ms response. I suppose that's not that horrible, but it is a 25x 
difference.

What I've wanted to do for a while now is to be able to set the size of 
the pool of memory, and also want the variable to be set telling me some 
form of information of how close the next collect is.. whether it's 
largest memory block avail, or % fragmentation, or something, so I can 
take the processes and tell them to stop accepting requests (and let one 
of the other processes handle it) and run the GC so the GC doesn't run 
while accepting requests.

I guess I can just estimate based off of the amount of requests it has 
processed... but the other way sounds way cooler.


Sean Kelly wrote:
 kenny wrote:
 I also have experienced bad GC performance. I found it to be because 
 of the ~ operator on strings. The program is a daemon, and after it 
 had been running for a while, memory usage gets truly horrific, and 
 performance degrades very bad.

 This was back on 0.140, so things may have changed since then...

Probably not. Assuming this is a multithreaded app, you have to pay for two mutex locks for every concat operation. So an expression like: a = b ~ c ~ d; would result in six locks of the mutex protecting the GC to perform allocations (I think anyway--I don't believe chained concats have special handling). I think this would also result in two discarded temporary buffers for the GC to clean up later. And since the Phobos GC scans all such blocks by default... My patch will address the scanning issue, but it won't address the mutex issue--there's really no easy way to avoid that one. Sean

Jan 08 2007
prev sibling parent Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
kenny wrote:
 Second, like someone else in this thread I also called fullCollect every 
 second.

This is not always an option. With a few hundred megs of memory allocated, collections can last seconds. Resorting to malloc was the only rescue for me... Anyway, thanks for sharing your experiences guys. Now I know I'm not the only one around having difficult relationships with the GC. -- Tomasz Stachowiak
Jan 08 2007
prev sibling next sibling parent Sean Kelly <sean f4.ca> writes:
Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks, 
 I've unfortunately had to come to the conclusion that the D GC is not 
 ready for production use. The problem is what I'd call "spurious 
 pointers". That is random data (strings, numbers, image data, audio or 
 whatever) appearing to the GC to be full of pointers to all over the 
 memory space.
 
 Consider this simple program. It is designed to have a memory footprint 
 of about 20 mb and then continuously process data.
 
 import std.random;
 
 void main() {
         // The real memory use, ~20 mb
         uint[] data;
         data.length = 5_000_000;
         foreach(inout x; data)
                 x = rand();
         while(1) {
         // simulate reading a few kb of data
                 uint[] incoming;
                 incoming.length = 1000 + rand() % 5000;
                 foreach(inout x; incoming)
                         x = rand();
                 // do something with the data...
         }
 }
 
 The result may not be as expected. The program will use up all available 
 memory (for me crashing at about 2.7 gb of memory usage) and at the same 
 time run extremely slow due to the panicked GC scanning all memory over 
 and over.
 
 The reason is the 20 mb of random data and the small 32-bit memory 
 address range of 4 GB. To understand how bad this is, 20 mb of random 
 data will result in _each_ 4k memory page on average having 5 random 
 pointers to it. Those spurious pointers are laying a dense mine-field 
 effectively disabling the GC.
 
 This means that each time you rely on the GC (array appending/resizing, 
 Phobos function calls etc), you have a potential memory leak. (That is 
 unless all the program data is nothing but valid pointers/references or 
 all non-pointer data is hidden from the GC.)
 
 The above program is of course just a toy illustrating the phenomena. In 
 a text processing program of mine the bulk of the data is short char[] 
 strings. The program still has runaway memory leaks leading to an 
 inevitable crash. I have absolutely no idea how to handle text 
 processing using the D recommended char[] and CoW idiom without getting 
 severe memory leaks.
 
 The definite solution has to be a GC that only scans memory containing 
 pointers. Sean's patches to make the GC skip scanning memory known to 
 contain elements smaller than sizeof(void*) will probably help 
 tremendously. (I'd just have to make sure I'm not using dchar[] strings, 
 float or double data, or the DMD associative array implementation)

Since the patch keys on element size, the above code would still leak horribly by default. However, the user can set/clear this "no scan" flag explicitly, so if there are any memory blocks that are still scanned by default, you can indicate that the GC should not scan them. I think between the two, we should be in pretty good shape. Sean
Jan 08 2007
prev sibling next sibling parent "Andrey Khropov" <andkhropov_nosp m_mtu-net.ru> writes:
Oskar Linde wrote:

 After having fought a while with D programs with runaway memory leaks, I've
 unfortunately had to come to the conclusion that the D GC is not ready for
 production use. The problem is what I'd call "spurious pointers". That is
 random data (strings, numbers, image data, audio or whatever) appearing to
 the GC to be full of pointers to all over the memory space.
 
 The definite solution has to be a GC that only scans memory containing
 pointers. Sean's patches to make the GC skip scanning memory known to contain
 elements smaller than sizeof(void*) will probably help tremendously. (I'd
 just have to make sure I'm not using dchar[] strings, float or double data,
 or the DMD associative array implementation)

That's what precise GC is all about. I think it's the single biggest problem of the present D implementations. Some of my measurements have shown that on memory-intensive applications current Phobos GC could be 10x slower than MS.NET 2.0's GC: http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digi talmars.D&artnum=43991 -- AKhropov
Jan 08 2007
prev sibling next sibling parent %u <u digitaldaemon.com> writes:
== Quote from Oskar Linde (oskar.lindeREM OVEgmail.com)'s article
 This means that each time you rely on the GC (array
 appending/resizing, Phobos function calls etc), you have a
 potential memory leak.

Wrong. You have an intentional memory leak, from which the GC cannot recover you. The GC was designed to eventually free coders from memory leak accidents. To enable this all variables are initialized by default-- -because this way no random data, including pointers, can survive. In total your toy program establishes exactly that environment for which the GC is known to fail. There is an easy solution for such environments by increasing the amount of memory needed for pointers by the factor two---or halving the available memory if memory bounds are tight and one takes a look from the other side. But even this solution will not prevent the GC from sucking in all data swapped out, if the GC is not coupled with the virtual memory manager.
Jan 08 2007
prev sibling next sibling parent reply Miles <_______ _______.____> writes:
Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks,
 I've unfortunately had to come to the conclusion that the D GC is not
 ready for production use. The problem is what I'd call "spurious
 pointers". That is random data (strings, numbers, image data, audio or
 whatever) appearing to the GC to be full of pointers to all over the
 memory space.

Some time ago I had the exact same problem, and I tried to convince Walter of how bad this problem is. Others have said the same thing before you and me. To Walter, "in practice, all integers vary from 0 to 100" or something like that, implying that the problem does not exist in fact or is not relevant. I should also mention that this is considered a serious security issue, on the same class of the QuickSort algorithm. For those not aware of secure programming practices: using QuickSort to sort data input by a remote user is considered a bad programming practice, because although QuickSort is O(n*log(n)) on average, there is a set of input data that makes the algorithm O(n^2). Knowing the implementation, an attacker may feed the application with carefully crafted data that exploits this weakness. In the D case, knowing how data are laid on memory, it is easy for a remote attacker to feed data that, read as pointers, would point to large (and otherwise temporary) blocks of data, making them leak. Stack base randomization and other similar techniques helps preventing this, but IMO, just hides the problem. Another problem in current GC implementation, IMO, is that memory space once allocated is never given back to the OS when disposed. What makes this worse is that it is common for programs to use a big amount of memory during initialization (like parsing XML data files), and never use it again. Again, Walter already knows of this and doesn't think it is a problem.
Jan 08 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Miles wrote:
 I should also mention that this is considered a serious security issue,
 on the same class of the QuickSort algorithm.
 
 For those not aware of secure programming practices: using QuickSort to
 sort data input by a remote user is considered a bad programming
 practice, because although QuickSort is O(n*log(n)) on average, there is
 a set of input data that makes the algorithm O(n^2). Knowing the
 implementation, an attacker may feed the application with carefully
 crafted data that exploits this weakness.

Of course, if you're only concerned about malicious users sending worst-case data (and not about worst-case data appearing randomly) there's an easy fix: randomize the data in O(n) :). Don't laugh, they taught me that in school. And it works, Quicksort will remain O(n*log(n)) on average with that simple step performed beforehand, no matter the input data. You'll need to trust your randomizer though, if the attacker can influence *that* you're just screwed and should use introsort or merge sort or something else with guaranteed O(n*log(n)) behavior if that's really what you want...
Jan 08 2007
next sibling parent Miles <_______ _______.____> writes:
Frits van Bommel wrote:
 Of course, if you're only concerned about malicious users sending
 worst-case data (and not about worst-case data appearing randomly)
 there's an easy fix: randomize the data in O(n) :).

Yeah, sure. You can also just pick a random element per iteration (instead of the first one or the middle one).
Jan 09 2007
prev sibling parent reply Kevin Bealer <kevinbealer gmail.com> writes:
I would think you are find if you pick your pivot at random(), and seed the
random
number generator at the top of your program with microseconds-since-1970 or
whatever the convenient chaotic local number is.

Or you could probably just use D's [].sort method -- I think Walter said it uses
quick-sort but checks for poor performance and switches to a heap sort
automatically.

Kevin
Jan 09 2007
parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Kevin Bealer wrote:
 I would think you are find if you pick your pivot at random(), and seed the
random
 number generator at the top of your program with microseconds-since-1970 or
 whatever the convenient chaotic local number is.

IIRC randomizing the input is a rather general solution to this kind of problem (not just with Quicksort).
 Or you could probably just use D's [].sort method -- I think Walter said it
uses
 quick-sort but checks for poor performance and switches to a heap sort
automatically.

That would be a variant of introsort. I believe I mentioned it.
Jan 09 2007
prev sibling next sibling parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks,
 I've unfortunately had to come to the conclusion that the D GC is not
 ready for production use. The problem is what I'd call "spurious
 pointers". That is random data (strings, numbers, image data, audio or
 whatever) appearing to the GC to be full of pointers to all over the
 memory space.

Wow. That may be the problem I'm having too. I have some radial basis function interpolation code that I put together just before the holiday break, which I observed to be leaking like mad. First thing I need to do today after catching up on mail is to figure out why, but maybe you've just answered the question for me. --bb
Jan 08 2007
prev sibling next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Here's a slightly less contrived version of Oskar's gc test.

import std.math;
import std.random;
import std.stdio;

void main() {
     // The real memory use, ~40 mb
     double[] data;
     data.length = 5_000_000;
     foreach(i, inout x; data) {
         x = sin(cast(double)i/data.length);
         //x = 1;
     }
     int count = 0;
     int gcount = 0;
     while(1) {
         // simulate reading a few kb of data
         double[] incoming;
         incoming.length = 1000 + rand() % 5000;
         foreach(i, inout x; incoming) {
             x = sin(cast(double)i/incoming.length);
             //x = 5;
         }
         // do something with the data...

         // print status message every so often
         count += incoming.length;
         if (count > 1_000_000) {
             count = 0;
             gcount++;
             writefln("%s processed", gcount);
         }
     }
}



This one uses doubles instead of uints and the data is the sin of some 
number.  These are _very_ realistic values for numeric data to have. 
The same effect can be seen.  Instead of hovering around 40MB, the 
memory use grows and grows and performance slows and slows.

This seems to be a very big issue.  The GC seems to be pretty much 
useless right now if you're going to have a lot of floating point data 
in your app.

--bb

Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks, 
 I've unfortunately had to come to the conclusion that the D GC is not 
 ready for production use. The problem is what I'd call "spurious 
 pointers". That is random data (strings, numbers, image data, audio or 
 whatever) appearing to the GC to be full of pointers to all over the 
 memory space.
 
 Consider this simple program. It is designed to have a memory footprint 
 of about 20 mb and then continuously process data.
 

Jan 08 2007
next sibling parent reply %u <dusr yahoo.com> writes:
== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Here's a slightly less contrived version of Oskar's gc test.
 import std.math;
 import std.random;
 import std.stdio;
 void main() {
      // The real memory use, ~40 mb
      double[] data;
      data.length = 5_000_000;
      foreach(i, inout x; data) {
          x = sin(cast(double)i/data.length);
          //x = 1;
      }
      int count = 0;
      int gcount = 0;
      while(1) {
          // simulate reading a few kb of data
          double[] incoming;
          incoming.length = 1000 + rand() % 5000;
          foreach(i, inout x; incoming) {
              x = sin(cast(double)i/incoming.length);
              //x = 5;
          }
          // do something with the data...
          // print status message every so often
          count += incoming.length;
          if (count > 1_000_000) {
              count = 0;
              gcount++;
              writefln("%s processed", gcount);
          }
      }
 }
 This one uses doubles instead of uints and the data is the sin of some
 number.  These are _very_ realistic values for numeric data to have.
 The same effect can be seen.  Instead of hovering around 40MB, the
 memory use grows and grows and performance slows and slows.
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating point data
 in your app.
 --bb
 Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks,
 I've unfortunately had to come to the conclusion that the D GC is not
 ready for production use. The problem is what I'd call "spurious
 pointers". That is random data (strings, numbers, image data, audio or
 whatever) appearing to the GC to be full of pointers to all over the
 memory space.

 Consider this simple program. It is designed to have a memory footprint
 of about 20 mb and then continuously process data.


Agreed. This needs to be changed. Is the GC in that tango library any better?
Jan 08 2007
parent reply Sean Kelly <sean f4.ca> writes:
%u wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Here's a slightly less contrived version of Oskar's gc test.
 import std.math;
 import std.random;
 import std.stdio;
 void main() {
      // The real memory use, ~40 mb
      double[] data;
      data.length = 5_000_000;
      foreach(i, inout x; data) {
          x = sin(cast(double)i/data.length);
          //x = 1;
      }
      int count = 0;
      int gcount = 0;
      while(1) {
          // simulate reading a few kb of data
          double[] incoming;
          incoming.length = 1000 + rand() % 5000;
          foreach(i, inout x; incoming) {
              x = sin(cast(double)i/incoming.length);
              //x = 5;
          }
          // do something with the data...
          // print status message every so often
          count += incoming.length;
          if (count > 1_000_000) {
              count = 0;
              gcount++;
              writefln("%s processed", gcount);
          }
      }
 }
 This one uses doubles instead of uints and the data is the sin of some
 number.  These are _very_ realistic values for numeric data to have.
 The same effect can be seen.  Instead of hovering around 40MB, the
 memory use grows and grows and performance slows and slows.
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating point data
 in your app.
 --bb
 Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks,
 I've unfortunately had to come to the conclusion that the D GC is not
 ready for production use. The problem is what I'd call "spurious
 pointers". That is random data (strings, numbers, image data, audio or
 whatever) appearing to the GC to be full of pointers to all over the
 memory space.

 Consider this simple program. It is designed to have a memory footprint
 of about 20 mb and then continuously process data.


Agreed. This needs to be changed. Is the GC in that tango library any better?

It's a modified version of the DMD GC. The "don't scan blocks containing elements smaller than pointer size" feature is built-in, and there is user-level control of that behavior on a per-block basis, among other things. But it's still the same old mark/sweep GC at heart. Sean
Jan 08 2007
parent reply =?ISO-8859-1?Q?Lu=EDs_Marques?= <luismarques gmail.com> writes:
Sean Kelly wrote:
 It's a modified version of the DMD GC.  The "don't scan blocks 
 containing elements smaller than pointer size" feature is built-in, and 
 there is user-level control of that behavior on a per-block basis, among 
 other things.  But it's still the same old mark/sweep GC at heart.

Does the new GC allow setting a hook to be informed of when a given object was collected? (I need that) Luís
Jan 09 2007
parent reply Lutger <lutger.blijdestijn gmail.com> writes:
Luís Marques wrote:
 Sean Kelly wrote:
 It's a modified version of the DMD GC.  The "don't scan blocks 
 containing elements smaller than pointer size" feature is built-in, 
 and there is user-level control of that behavior on a per-block basis, 
 among other things.  But it's still the same old mark/sweep GC at heart.

Does the new GC allow setting a hook to be informed of when a given object was collected? (I need that) Luís

This is possible with these functions in Object: final void notifyRegister(void delegate(Object) dg); final void notifyUnRegister(void delegate(Object) dg);
Jan 09 2007
parent Sean Kelly <sean f4.ca> writes:
Lutger wrote:
 Luís Marques wrote:
 Sean Kelly wrote:
 It's a modified version of the DMD GC.  The "don't scan blocks 
 containing elements smaller than pointer size" feature is built-in, 
 and there is user-level control of that behavior on a per-block 
 basis, among other things.  But it's still the same old mark/sweep GC 
 at heart.

Does the new GC allow setting a hook to be informed of when a given object was collected? (I need that)

This is possible with these functions in Object: final void notifyRegister(void delegate(Object) dg); final void notifyUnRegister(void delegate(Object) dg);

This option is available. There is also a global hook that can be called when an object is collected by the GC (as opposed to destroyed deterministically via delete): bool myCollectHandler( Object o ) { if( o.classinfo.name == "MyObject" ) { (cast(MyObject) o).dispose(); return false; } return true; } setCollectHandler( &myCollectHandler ); Returning false from the hook tells the GC not to finalize the object--ie. destroy its monitor but don't call its dtor. The purpose of this method is twofold: to detect when memory is "leaked" and to allow objects to be cleaned up differently if disposed via delete than via a GC collection. Sean
Jan 09 2007
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Bill Baxter wrote:
 Here's a slightly less contrived version of Oskar's gc test.
 
 import std.math;
 import std.random;
 import std.stdio;
 
 void main() {
     // The real memory use, ~40 mb
     double[] data;
     data.length = 5_000_000;
     foreach(i, inout x; data) {
         x = sin(cast(double)i/data.length);
         //x = 1;
     }
     int count = 0;
     int gcount = 0;
     while(1) {
         // simulate reading a few kb of data
         double[] incoming;
         incoming.length = 1000 + rand() % 5000;
         foreach(i, inout x; incoming) {
             x = sin(cast(double)i/incoming.length);
             //x = 5;
         }
         // do something with the data...
 
         // print status message every so often
         count += incoming.length;
         if (count > 1_000_000) {
             count = 0;
             gcount++;
             writefln("%s processed", gcount);
         }
     }
 }
 
 
 
 This one uses doubles instead of uints and the data is the sin of some 
 number.  These are _very_ realistic values for numeric data to have. The 
 same effect can be seen.  Instead of hovering around 40MB, the memory 
 use grows and grows and performance slows and slows.
 
 This seems to be a very big issue.  The GC seems to be pretty much 
 useless right now if you're going to have a lot of floating point data 
 in your app.

For what it's worth, I ran the test above with the modified GC in Tango, for 10000 iterations of the "while(1)" loop. The default behavior roughly matched Phobos, with an 89 second run time and over 340MB of memory consumed and growing steadily. Then I told the GC to not scan the arrays using the following calls: gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN ); gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN ); A test with these changes in place dropped the run time to 7 seconds with 43MB of memory consumed and not growing. I grant that this isn't quite as nice as if D just figured out whether to scan the block using TypeInfo, but at least it grants the programmer a way to customize GC behavior somewhat to tune application performance. The only stipulation with the current implementation is that block attributes will not be preserved if an array is resized. This isn't terribly difficult to fix, but I haven't done so yet. Sean
Jan 09 2007
next sibling parent reply "Ralf Schneider" <ralfs72_at_ gmx.net> writes:
 For what it's worth, I ran the test above with the modified GC in Tango, 
 for 10000 iterations of the "while(1)" loop.  The default behavior roughly 
 matched Phobos, with an 89 second run time and over 340MB of memory 
 consumed and growing steadily.  Then I told the GC to not scan the arrays 
 using the following calls:

     gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
     gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );

 A test with these changes in place dropped the run time to 7 seconds with 
 43MB of memory consumed and not growing.

 I grant that this isn't quite as nice as if D just figured out whether to 
 scan the block using TypeInfo, but at least it grants the programmer a way 
 to customize GC behavior somewhat to tune application performance. The 
 only stipulation with the current implementation is that block attributes 
 will not be preserved if an array is resized.  This isn't terribly 
 difficult to fix, but I haven't done so yet.

It dosen't seem so hard for me to let the compiler set such an attribute on arrays without pointers... - Ralf
Jan 09 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Ralf Schneider wrote:
 It dosen't seem so hard for me to let the compiler set such an attribute on 
 arrays without pointers...

It's pretty hard if you can't modify the compiler ;). Sean can't modify what DMD does (at least, not directly). He _can_ (directly) modify what the runtime library does by replacing it, which is what he's done. He (or anyone else, for that matter) might be able to implement this in GDC though... And Walter might be convinced to implement it in DMD (or, if it's front-end only code, accept a patch that implements it). Of course, that still leaves arrays of structs, which may contain both pointers and non-pointers. What we really need is a way for the GC to know what the type of the memory is, or at least where the pointers are. This may be possible by adding this info to TypeInfo & subclasses[1]. But then every memory block would need a pointer to the relevant TypeInfo (or some condensed form of this information, like flags for "only pointers" and "no pointers", with TypeInfo pointer only if both are false). This would definitely require some sort of compiler support though; it would need to generate appropriate type information for structs, objects (the actual memory, not the references) and unions. It would then need to supply this information to the GC in the runtime, requiring extra code generation. [1]: This could be as simple as supplying a member function that performs a callback for every offset containing a pointer.
Jan 09 2007
next sibling parent Kyle Furlong <kylefurlong gmail.com> writes:
Frits van Bommel wrote:
 Ralf Schneider wrote:
 It dosen't seem so hard for me to let the compiler set such an 
 attribute on arrays without pointers...

It's pretty hard if you can't modify the compiler ;). Sean can't modify what DMD does (at least, not directly). He _can_ (directly) modify what the runtime library does by replacing it, which is what he's done. He (or anyone else, for that matter) might be able to implement this in GDC though... And Walter might be convinced to implement it in DMD (or, if it's front-end only code, accept a patch that implements it). Of course, that still leaves arrays of structs, which may contain both pointers and non-pointers. What we really need is a way for the GC to know what the type of the memory is, or at least where the pointers are. This may be possible by adding this info to TypeInfo & subclasses[1]. But then every memory block would need a pointer to the relevant TypeInfo (or some condensed form of this information, like flags for "only pointers" and "no pointers", with TypeInfo pointer only if both are false). This would definitely require some sort of compiler support though; it would need to generate appropriate type information for structs, objects (the actual memory, not the references) and unions. It would then need to supply this information to the GC in the runtime, requiring extra code generation. [1]: This could be as simple as supplying a member function that performs a callback for every offset containing a pointer.

According to a conversation I had with Walter a month or two ago, this is the direction we are going. With this sort of type information, other, more modern GC implementations will become possible, one of which I hope to write. :D
Jan 09 2007
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Frits van Bommel wrote:
 Ralf Schneider wrote:
 It dosen't seem so hard for me to let the compiler set such an 
 attribute on arrays without pointers...

It's pretty hard if you can't modify the compiler ;). Sean can't modify what DMD does (at least, not directly). He _can_ (directly) modify what the runtime library does by replacing it, which is what he's done. He (or anyone else, for that matter) might be able to implement this in GDC though... And Walter might be convinced to implement it in DMD (or, if it's front-end only code, accept a patch that implements it). Of course, that still leaves arrays of structs, which may contain both pointers and non-pointers. What we really need is a way for the GC to know what the type of the memory is, or at least where the pointers are. This may be possible by adding this info to TypeInfo & subclasses[1]. But then every memory block would need a pointer to the relevant TypeInfo (or some condensed form of this information, like flags for "only pointers" and "no pointers", with TypeInfo pointer only if both are false). This would definitely require some sort of compiler support though; it would need to generate appropriate type information for structs, objects (the actual memory, not the references) and unions. It would then need to supply this information to the GC in the runtime, requiring extra code generation.

An interim alternative that came up in conversation would be to provide similar functionality via template functions. With the .tupleof property, it's possible to obtain a very accurate picture of what data should be scanned. This would mean using a custom function instead of 'new', but it would certainly work: MyClass c = create!(MyClass)( a, b, c ); The create function above could be written using TypeTuples and other magic to allow direct parameter passing to the class ctor, making the routine just as efficient as an in-language solution. Similar functions could be written for arrays, exploiting some tricks in the language: T[] resize(T)( T[] val, size_t len ) { ... } int[] buf; buf.resize( 1024 ); Again, perhaps not as clean as an in-language solution, but it could be done today. I'll look into doing something like this for Tango prior to release. It shouldn't be too difficult, assuming the .tupleof property works as described (some experimentation I made this morning suggests that it may not). Sean
Jan 09 2007
next sibling parent Sean Kelly <sean f4.ca> writes:
Sean Kelly wrote:
 Similar functions 
 could be written for arrays, exploiting some tricks in the language:
 
     T[] resize(T)( T[] val, size_t len ) { ... }

Err, this should be: T[] resize(T)( inout T[] val, size_t len ) { ... } Sean
Jan 09 2007
prev sibling next sibling parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Sean Kelly wrote:
 An interim alternative that came up in conversation would be to provide 
 similar functionality via template functions.  With the .tupleof 
 property, it's possible to obtain a very accurate picture of what data 
 should be scanned.  This would mean using a custom function instead of 
 'new', but it would certainly work:

I actually tried something like that a while back. Unfortunately DMD kept segfaulting on me... Just tried to compile that code again, still segfaults. I should probably try cutting it down to something bugzilla-worthy.
Jan 09 2007
prev sibling parent zz <zz zz.com> writes:
Sean Kelly wrote:
 I'll look into doing something like this for Tango prior to release.  It 
 shouldn't be too difficult, assuming the .tupleof property works as 
 described (some experimentation I made this morning suggests that it may 
 not).
 
 
 Sean

Some years ago we had some comparision between two PostScript (Not GS) Interpreters without rendering to disk but generating loads of data Harlequin RIP beat the other by large factors which we found out were due to their garbage collector besides other things, this group also built the GC for LispWorks and is now available open source, I don't know how it's licenced but I know it was also used in Dylan from Halequin (for those of you who remember the language). Maybe you should look at some of their ideas, the link is bellow. http://www.ravenbrook.com/project/mps/ Zz
Jan 09 2007
prev sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Sean Kelly wrote:

 This seems to be a very big issue.  The GC seems to be pretty much 
 useless right now if you're going to have a lot of floating point data 
 in your app.

For what it's worth, I ran the test above with the modified GC in Tango, for 10000 iterations of the "while(1)" loop. The default behavior roughly matched Phobos, with an 89 second run time and over 340MB of memory consumed and growing steadily. Then I told the GC to not scan the arrays using the following calls: gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN ); gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN ); A test with these changes in place dropped the run time to 7 seconds with 43MB of memory consumed and not growing. I grant that this isn't quite as nice as if D just figured out whether to scan the block using TypeInfo, but at least it grants the programmer a way to customize GC behavior somewhat to tune application performance. The only stipulation with the current implementation is that block attributes will not be preserved if an array is resized. This isn't terribly difficult to fix, but I haven't done so yet.

I'd certainly be willing to do that sort of thing in my code to make the problem go away in the short term. In my case all of those setAttr calls would be hidden away in library code, so it wouldn't make life any harder for everyday use. But it's not much use as a workaround until Tango is actually downloadable somewhere. :-) Once Tango is available, though, that would certainly be better than the alternative of rewriting my library to use malloc/free. If that's going to be necessary then I think I'll just as soon go back to C++ where at least I can use smart pointers. The integrated GC is the #1 reason I'm using D in the first place, and if I can't realistically use that then I might as well be using C++. --bb
Jan 09 2007
parent reply %u <duser hi.com> writes:
== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating point data
 in your app.

For what it's worth, I ran the test above with the modified GC in Tango, for 10000 iterations of the "while(1)" loop. The default behavior roughly matched Phobos, with an 89 second run time and over 340MB of memory consumed and growing steadily. Then I told the GC to not scan the arrays using the following calls: gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN ); gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN ); A test with these changes in place dropped the run time to 7 seconds with 43MB of memory consumed and not growing. I grant that this isn't quite as nice as if D just figured out whether to scan the block using TypeInfo, but at least it grants the programmer a way to customize GC behavior somewhat to tune application performance. The only stipulation with the current implementation is that block attributes will not be preserved if an array is resized. This isn't terribly difficult to fix, but I haven't done so yet.

problem go away in the short term. In my case all of those setAttr calls would be hidden away in library code, so it wouldn't make life any harder for everyday use. But it's not much use as a workaround until Tango is actually downloadable somewhere. :-) Once Tango is available, though, that would certainly be better than the alternative of rewriting my library to use malloc/free. If that's going to be necessary then I think I'll just as soon go back to C++ where at least I can use smart pointers. The integrated GC is the #1 reason I'm using D in the first place, and if I can't realistically use that then I might as well be using C++. --bb

I feel the same way. Without GC, D is just C++ with a few more features.
Jan 09 2007
parent reply Kyle Furlong <kylefurlong gmail.com> writes:
%u wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating point data
 in your app.

for 10000 iterations of the "while(1)" loop. The default behavior roughly matched Phobos, with an 89 second run time and over 340MB of memory consumed and growing steadily. Then I told the GC to not scan the arrays using the following calls: gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN ); gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN ); A test with these changes in place dropped the run time to 7 seconds with 43MB of memory consumed and not growing. I grant that this isn't quite as nice as if D just figured out whether to scan the block using TypeInfo, but at least it grants the programmer a way to customize GC behavior somewhat to tune application performance. The only stipulation with the current implementation is that block attributes will not be preserved if an array is resized. This isn't terribly difficult to fix, but I haven't done so yet.

problem go away in the short term. In my case all of those setAttr calls would be hidden away in library code, so it wouldn't make life any harder for everyday use. But it's not much use as a workaround until Tango is actually downloadable somewhere. :-) Once Tango is available, though, that would certainly be better than the alternative of rewriting my library to use malloc/free. If that's going to be necessary then I think I'll just as soon go back to C++ where at least I can use smart pointers. The integrated GC is the #1 reason I'm using D in the first place, and if I can't realistically use that then I might as well be using C++. --bb

I feel the same way. Without GC, D is just C++ with a few more features.

I do not think that this is even remotely valid. Think of everything you arent doing in D. You arent hacking macros with the preprocessor, you arent declaring methods outside of classes, globally overloading operators... To say that D is C++ + GC + some other features is just plain ignorant.
Jan 10 2007
parent reply %u <duser hotmail.com> writes:
== Quote from Kyle Furlong (kylefurlong gmail.com)'s article
 %u wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating point data
 in your app.

for 10000 iterations of the "while(1)" loop. The default behavior roughly matched Phobos, with an 89 second run time and over 340MB of memory consumed and growing steadily. Then I told the GC to not scan the arrays using the following calls: gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN ); gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN ); A test with these changes in place dropped the run time to 7 seconds with 43MB of memory consumed and not growing. I grant that this isn't quite as nice as if D just figured out whether to scan the block using TypeInfo, but at least it grants the programmer a way to customize GC behavior somewhat to tune application performance. The only stipulation with the current implementation is that block attributes will not be preserved if an array is resized. This isn't terribly difficult to fix, but I haven't done so yet.

problem go away in the short term. In my case all of those setAttr calls would be hidden away in library code, so it wouldn't make life any harder for everyday use. But it's not much use as a workaround until Tango is actually downloadable somewhere. :-) Once Tango is available, though, that would certainly be better than the alternative of rewriting my library to use malloc/free. If that's going to be necessary then I think I'll just as soon go back to C++ where at least I can use smart pointers. The integrated GC is the #1 reason I'm using D in the first place, and if I can't realistically use that then I might as well be using C++. --bb

I feel the same way. Without GC, D is just C++ with a few more features.

arent doing in D. You arent hacking macros with the preprocessor, you arent declaring methods outside of classes, globally overloading operators... To say that D is C++ + GC + some other features is just plain ignorant.

It's not ignorant from my viewpoint. You're assuming I care about the features you described.
Jan 10 2007
next sibling parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"%u" <duser hotmail.com> wrote in message 
news:eo324t$2ui5$1 digitaldaemon.com...
 It's not ignorant from my viewpoint. You're assuming I care about the 
 features you
 described.

A bit OT, but would you please get a newsreader so you can stop being called "%u"?
Jan 10 2007
prev sibling parent reply Kyle Furlong <kylefurlong gmail.com> writes:
%u wrote:
 == Quote from Kyle Furlong (kylefurlong gmail.com)'s article
 %u wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating point data
 in your app.

for 10000 iterations of the "while(1)" loop. The default behavior roughly matched Phobos, with an 89 second run time and over 340MB of memory consumed and growing steadily. Then I told the GC to not scan the arrays using the following calls: gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN ); gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN ); A test with these changes in place dropped the run time to 7 seconds with 43MB of memory consumed and not growing. I grant that this isn't quite as nice as if D just figured out whether to scan the block using TypeInfo, but at least it grants the programmer a way to customize GC behavior somewhat to tune application performance. The only stipulation with the current implementation is that block attributes will not be preserved if an array is resized. This isn't terribly difficult to fix, but I haven't done so yet.

problem go away in the short term. In my case all of those setAttr calls would be hidden away in library code, so it wouldn't make life any harder for everyday use. But it's not much use as a workaround until Tango is actually downloadable somewhere. :-) Once Tango is available, though, that would certainly be better than the alternative of rewriting my library to use malloc/free. If that's going to be necessary then I think I'll just as soon go back to C++ where at least I can use smart pointers. The integrated GC is the #1 reason I'm using D in the first place, and if I can't realistically use that then I might as well be using C++. --bb


arent doing in D. You arent hacking macros with the preprocessor, you arent declaring methods outside of classes, globally overloading operators... To say that D is C++ + GC + some other features is just plain ignorant.

It's not ignorant from my viewpoint. You're assuming I care about the features you described.

What I'm trying to say is, that to say a language is just a sum of its features is incorrect. There are behaviors that arise from the interaction of features that can be good or bad. In D, I've found that these interactions are on the whole good. In C++, I've found that these interactions are on the whole bad. I'm not trying to say your experiences with both are invalid, maybe you have been using them in a way in which the opposite is true. But for me, C++ as a whole doesn't feel right, and D as a whole does.
Jan 10 2007
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Kyle Furlong wrote:
 %u wrote:
 == Quote from Kyle Furlong (kylefurlong gmail.com)'s article
 %u wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating point 
 data
 in your app.

Tango, for 10000 iterations of the "while(1)" loop. The default behavior roughly matched Phobos, with an 89 second run time and over 340MB of memory consumed and growing steadily. Then I told the GC to not scan the arrays using the following calls: gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN ); gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN ); A test with these changes in place dropped the run time to 7 seconds with 43MB of memory consumed and not growing. I grant that this isn't quite as nice as if D just figured out whether to scan the block using TypeInfo, but at least it grants the programmer a way to customize GC behavior somewhat to tune application performance. The only stipulation with the current implementation is that block attributes will not be preserved if an array is resized. This isn't terribly difficult to fix, but I haven't done so yet.

make the problem go away in the short term. In my case all of those setAttr calls would be hidden away in library code, so it wouldn't make life any harder for everyday use. But it's not much use as a workaround until Tango is actually downloadable somewhere. :-) Once Tango is available, though, that would certainly be better than the alternative of rewriting my library to use malloc/free. If that's going to be necessary then I think I'll just as soon go back to C++ where at least I can use smart pointers. The integrated GC is the #1 reason I'm using D in the first place, and if I can't realistically use that then I might as well be using C++. --bb

features.

arent doing in D. You arent hacking macros with the preprocessor, you arent declaring methods outside of classes, globally overloading operators... To say that D is C++ + GC + some other features is just plain ignorant.

It's not ignorant from my viewpoint. You're assuming I care about the features you described.

What I'm trying to say is, that to say a language is just a sum of its features is incorrect. There are behaviors that arise from the interaction of features that can be good or bad. In D, I've found that these interactions are on the whole good. In C++, I've found that these interactions are on the whole bad. I'm not trying to say your experiences with both are invalid, maybe you have been using them in a way in which the opposite is true. But for me, C++ as a whole doesn't feel right, and D as a whole does.

Yes, that's certainly a valid point. But I think maybe you're taking Mr. Percent a little too literally. All I was trying to say originally was that, to me, D without garbage collection -- despite all the other nice features -- is not compelling enough to abandon C++. Also I think Mr. Percent and I would both agree that whether or not the language "feels right" is less important than being able to get our work done in a timely manner. I'm interested in D primarily because I believe it will help me get things done faster. Right now, I think with GC and all the other nice D features, for the kinds of things I want to do, D and C++ are just about neck-and-neck. What C++ lacks vs D, it makes up for with great debuggers and tools, and gobs of free libraries. Take away the GC from D, and what's left is simply not enough to overcome the advantages of C++'s debuggers, tools, and libraries *for the kinds of things I want to do*. Your mileage may vary. --bb
Jan 10 2007
parent Kyle Furlong <kylefurlong gmail.com> writes:
Bill Baxter wrote:
 Kyle Furlong wrote:
 %u wrote:
 == Quote from Kyle Furlong (kylefurlong gmail.com)'s article
 %u wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating 
 point data
 in your app.

Tango, for 10000 iterations of the "while(1)" loop. The default behavior roughly matched Phobos, with an 89 second run time and over 340MB of memory consumed and growing steadily. Then I told the GC to not scan the arrays using the following calls: gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN ); gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN ); A test with these changes in place dropped the run time to 7 seconds with 43MB of memory consumed and not growing. I grant that this isn't quite as nice as if D just figured out whether to scan the block using TypeInfo, but at least it grants the programmer a way to customize GC behavior somewhat to tune application performance. The only stipulation with the current implementation is that block attributes will not be preserved if an array is resized. This isn't terribly difficult to fix, but I haven't done so yet.

make the problem go away in the short term. In my case all of those setAttr calls would be hidden away in library code, so it wouldn't make life any harder for everyday use. But it's not much use as a workaround until Tango is actually downloadable somewhere. :-) Once Tango is available, though, that would certainly be better than the alternative of rewriting my library to use malloc/free. If that's going to be necessary then I think I'll just as soon go back to C++ where at least I can use smart pointers. The integrated GC is the #1 reason I'm using D in the first place, and if I can't realistically use that then I might as well be using C++. --bb

features.

you arent doing in D. You arent hacking macros with the preprocessor, you arent declaring methods outside of classes, globally overloading operators... To say that D is C++ + GC + some other features is just plain ignorant.

It's not ignorant from my viewpoint. You're assuming I care about the features you described.

What I'm trying to say is, that to say a language is just a sum of its features is incorrect. There are behaviors that arise from the interaction of features that can be good or bad. In D, I've found that these interactions are on the whole good. In C++, I've found that these interactions are on the whole bad. I'm not trying to say your experiences with both are invalid, maybe you have been using them in a way in which the opposite is true. But for me, C++ as a whole doesn't feel right, and D as a whole does.

Yes, that's certainly a valid point. But I think maybe you're taking Mr. Percent a little too literally. All I was trying to say originally was that, to me, D without garbage collection -- despite all the other nice features -- is not compelling enough to abandon C++. Also I think Mr. Percent and I would both agree that whether or not the language "feels right" is less important than being able to get our work done in a timely manner. I'm interested in D primarily because I believe it will help me get things done faster. Right now, I think with GC and all the other nice D features, for the kinds of things I want to do, D and C++ are just about neck-and-neck. What C++ lacks vs D, it makes up for with great debuggers and tools, and gobs of free libraries. Take away the GC from D, and what's left is simply not enough to overcome the advantages of C++'s debuggers, tools, and libraries *for the kinds of things I want to do*. Your mileage may vary. --bb

I agree. This is only good news for D, however, because libraries and tools come with time. D is still young.
Jan 10 2007
prev sibling next sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks, 
 I've unfortunately had to come to the conclusion that the D GC is not 
 ready for production use. The problem is what I'd call "spurious 
 pointers". That is random data (strings, numbers, image data, audio or 
 whatever) appearing to the GC to be full of pointers to all over the 
 memory space.

I feel kind of bad for making it sound like this is a problem related specifically to the D garbage collector. It is rather a general and apparently well known problem of all conservative garbage collectors. The D garbage collector is still excellent for a large domain of problems. Lots of people seem to be having similar problems though, so a better understanding of under what conditions a conservative garbage collector will and will not work seems to be called for. I'd guess that a lot of problems we have with the conservative GC is because we are at the end of the life time of the 32-bit address space. As time has passed, our programs have used a larger and larger portion of the available memory space. As one anonymous %u said, one solution would be "increasing the amount of memory needed for pointers by the factor two", which I interpret as moving to a 64-bit memory space. I did some calculations to better understand what is going on. I've probably done several errors though. <theoretical rambling> The probability of one single uniformly random spurious pointer preventing the collection of an object of size s is p = s/B, with B being the size of the address space (2^b for b = 32 or 64 bits). On a 64 bit architecture, the probability is obviously drastically reduced. if g is the amount (bytes) of independent random data an application needs, the risk of an object of size s not being collected is P(g) = 1 - (1 - s/B) ^ (g/(b/8)) As one can see, the risk increases considerably as the objects gets bigger. Increasing the amount of random data the application contains, reduces the size of objects the GC will be able to handle satisfactory. My example was kind of nasty in that it not only accumulated additional random garbage in each iteration, but also caused heap fragmentation. A kinder example would always create objects of the same size. The simpler example would be: * start with a static set of g bytes of random data * for each iteration create n objects of size s (for simplicity, let each object contain s bytes of random data) * after each iteration, none of the n objects are used anymore. This is a good time to call gc.fullCollect(); After each such iteration, i, P_i*n_i objects will remain uncollected, and will be added to the pool of random spurious pointers. I disregard the small portion of objects that are uncollected because of spurious pointers appearing in other objects left uncollected in the same iteration. If this portion would be significant, you'd really have a problem anyway and this will show up in later iterations. g_{i+1} ~ g_i + P_i*n_i*s. I generously assume that the safe memory area of the collected objects are reused by the allocator in the next iteration. A few of those will become unsafe by the recently added spurious pointers from uncollected objects, but for the most part, those memory areas are now safe, and the objects allocated/collected from them can be disregarded. The number of objects that need to be allocated in unsafe memory areas for the next iteration becomes: n_{i+1} = P_i*n_i + P(P_i*n_i*s)*(n_0 - n_i) This will of course not perfectly predict the real behavior, as it relaxes the problem away from discrete units, but should still adequately model the sample application. Using this, I tried to find the breaking point, i.e. at what size objects need to be explicitly freed instead of left to the GC. "Verification": Rewriting my original example to to a std.gc.fullCollect() every 10th iteration, giving the following parameters: g = 20 MB n = 10 the model suggests allocating objects of size: s = 2000 would result in almost no space overhead, stable at 20 mb s = 3000 would result in a slight but stable space overhead, s = 4000 would cause a run-away memory usage running the program results in: s = 2000 memory usage is stable at 20 mb s = 3000 results in a small apparently unbounded memory leak s = 4000 results in a unbounded memory leak The model appears to be at least in the correct ballpark for this sample. Theoretical results: Those tables show the maximum object size the GC will be able to handle with different static amounts of "random" data. By "being able to handle", I mean that the application doesn't use more than 2x the required amount of memory. The exact breaking point is very unstable and going above it rapidly results in uncontrolled memory consumption. 32 bit arch, 100 objects 1 MB data 8000 bytes 5 MB data 4500 bytes 10 MB data 3000 bytes 20 MB data 2000 bytes 100 MB data 700 bytes 500 MB data 200 bytes 1000 MB data 100 bytes 32 bit arch, 1000 objects 1 MB data 3400 bytes 5 MB data 2200 bytes 10 MB data 1700 bytes 20 MB data 1200 bytes 100 MB data 500 bytes 500 MB data 150 bytes 1000 MB data 100 bytes 32 bit arch, 10000 objects 1 MB data 1300 bytes 5 MB data 1000 bytes 10 MB data 800 bytes 20 MB data 600 bytes 100 MB data 300 bytes 500 MB data 100 bytes 1000 MB data 75 bytes Those figures need to be taken with a couple of grains of salt, but should give a indication of at what object size one needs to manually handle object lifetimes. As a comparison -- the 64 bit haven: 64 bit arch, 100 objects 2 GB data 1.5 GB 100 GB data 600 MB 1 TB data 200 MB 64 bit arch, 1000 objects 2 GB data 350 MB 100 GB data 250 MB 1 TB data 150 MB 64 bit arch, 10000 objects 2 GB data 100 MB 100 GB data 75 MB 1 TB data 50 MB </theoretical rambling> Summary: As far as I can see, what I have to do to avoid memory leaks with a conservative GC, is one of the following: 1. move to a 64 bit architecture 2. manually handle all objects larger than a few hundred bytes (see above) 3. hide all non pointer data from the GC It is a shame #2 is such a pain and that D doesn't offer any help such as automatic ref-counting. Not having automatic ref-counting also prevents neat solutions such as transparent CoW, and automatic handling of scarce resources. If I wouldn't have a strong belief that automatic ref-counting would be addressed soon, I'd definitely consider going back to C++. Luckily, before I'd give up waiting, 64 bit architectures will probably be in great majority. ;) /Oskar
Jan 09 2007
next sibling parent reply Kyle Furlong <kylefurlong gmail.com> writes:
Oskar Linde wrote:
 Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks, 
 I've unfortunately had to come to the conclusion that the D GC is not 
 ready for production use. The problem is what I'd call "spurious 
 pointers". That is random data (strings, numbers, image data, audio or 
 whatever) appearing to the GC to be full of pointers to all over the 
 memory space.

I feel kind of bad for making it sound like this is a problem related specifically to the D garbage collector. It is rather a general and apparently well known problem of all conservative garbage collectors. The D garbage collector is still excellent for a large domain of problems. Lots of people seem to be having similar problems though, so a better understanding of under what conditions a conservative garbage collector will and will not work seems to be called for. I'd guess that a lot of problems we have with the conservative GC is because we are at the end of the life time of the 32-bit address space. As time has passed, our programs have used a larger and larger portion of the available memory space. As one anonymous %u said, one solution would be "increasing the amount of memory needed for pointers by the factor two", which I interpret as moving to a 64-bit memory space. I did some calculations to better understand what is going on. I've probably done several errors though. <theoretical rambling> The probability of one single uniformly random spurious pointer preventing the collection of an object of size s is p = s/B, with B being the size of the address space (2^b for b = 32 or 64 bits). On a 64 bit architecture, the probability is obviously drastically reduced. if g is the amount (bytes) of independent random data an application needs, the risk of an object of size s not being collected is P(g) = 1 - (1 - s/B) ^ (g/(b/8)) As one can see, the risk increases considerably as the objects gets bigger. Increasing the amount of random data the application contains, reduces the size of objects the GC will be able to handle satisfactory. My example was kind of nasty in that it not only accumulated additional random garbage in each iteration, but also caused heap fragmentation. A kinder example would always create objects of the same size. The simpler example would be: * start with a static set of g bytes of random data * for each iteration create n objects of size s (for simplicity, let each object contain s bytes of random data) * after each iteration, none of the n objects are used anymore. This is a good time to call gc.fullCollect(); After each such iteration, i, P_i*n_i objects will remain uncollected, and will be added to the pool of random spurious pointers. I disregard the small portion of objects that are uncollected because of spurious pointers appearing in other objects left uncollected in the same iteration. If this portion would be significant, you'd really have a problem anyway and this will show up in later iterations. g_{i+1} ~ g_i + P_i*n_i*s. I generously assume that the safe memory area of the collected objects are reused by the allocator in the next iteration. A few of those will become unsafe by the recently added spurious pointers from uncollected objects, but for the most part, those memory areas are now safe, and the objects allocated/collected from them can be disregarded. The number of objects that need to be allocated in unsafe memory areas for the next iteration becomes: n_{i+1} = P_i*n_i + P(P_i*n_i*s)*(n_0 - n_i) This will of course not perfectly predict the real behavior, as it relaxes the problem away from discrete units, but should still adequately model the sample application. Using this, I tried to find the breaking point, i.e. at what size objects need to be explicitly freed instead of left to the GC. "Verification": Rewriting my original example to to a std.gc.fullCollect() every 10th iteration, giving the following parameters: g = 20 MB n = 10 the model suggests allocating objects of size: s = 2000 would result in almost no space overhead, stable at 20 mb s = 3000 would result in a slight but stable space overhead, s = 4000 would cause a run-away memory usage running the program results in: s = 2000 memory usage is stable at 20 mb s = 3000 results in a small apparently unbounded memory leak s = 4000 results in a unbounded memory leak The model appears to be at least in the correct ballpark for this sample. Theoretical results: Those tables show the maximum object size the GC will be able to handle with different static amounts of "random" data. By "being able to handle", I mean that the application doesn't use more than 2x the required amount of memory. The exact breaking point is very unstable and going above it rapidly results in uncontrolled memory consumption. 32 bit arch, 100 objects 1 MB data 8000 bytes 5 MB data 4500 bytes 10 MB data 3000 bytes 20 MB data 2000 bytes 100 MB data 700 bytes 500 MB data 200 bytes 1000 MB data 100 bytes 32 bit arch, 1000 objects 1 MB data 3400 bytes 5 MB data 2200 bytes 10 MB data 1700 bytes 20 MB data 1200 bytes 100 MB data 500 bytes 500 MB data 150 bytes 1000 MB data 100 bytes 32 bit arch, 10000 objects 1 MB data 1300 bytes 5 MB data 1000 bytes 10 MB data 800 bytes 20 MB data 600 bytes 100 MB data 300 bytes 500 MB data 100 bytes 1000 MB data 75 bytes Those figures need to be taken with a couple of grains of salt, but should give a indication of at what object size one needs to manually handle object lifetimes. As a comparison -- the 64 bit haven: 64 bit arch, 100 objects 2 GB data 1.5 GB 100 GB data 600 MB 1 TB data 200 MB 64 bit arch, 1000 objects 2 GB data 350 MB 100 GB data 250 MB 1 TB data 150 MB 64 bit arch, 10000 objects 2 GB data 100 MB 100 GB data 75 MB 1 TB data 50 MB </theoretical rambling> Summary: As far as I can see, what I have to do to avoid memory leaks with a conservative GC, is one of the following: 1. move to a 64 bit architecture 2. manually handle all objects larger than a few hundred bytes (see above) 3. hide all non pointer data from the GC It is a shame #2 is such a pain and that D doesn't offer any help such as automatic ref-counting. Not having automatic ref-counting also prevents neat solutions such as transparent CoW, and automatic handling of scarce resources. If I wouldn't have a strong belief that automatic ref-counting would be addressed soon, I'd definitely consider going back to C++. Luckily, before I'd give up waiting, 64 bit architectures will probably be in great majority. ;) /Oskar

Dont hold your breath for a reference counting implementation from DMD. Walter doesnt want to add any overhead to assignments.
Jan 09 2007
next sibling parent Johan Granberg <lijat.meREM OVE.gmail.com> writes:
Kyle Furlong wrote:

 Oskar Linde wrote:
 Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks,
 I've unfortunately had to come to the conclusion that the D GC is not
 ready for production use. The problem is what I'd call "spurious
 pointers". That is random data (strings, numbers, image data, audio or
 whatever) appearing to the GC to be full of pointers to all over the
 memory space.

I feel kind of bad for making it sound like this is a problem related specifically to the D garbage collector. It is rather a general and apparently well known problem of all conservative garbage collectors. The D garbage collector is still excellent for a large domain of problems. Lots of people seem to be having similar problems though, so a better understanding of under what conditions a conservative garbage collector will and will not work seems to be called for. I'd guess that a lot of problems we have with the conservative GC is because we are at the end of the life time of the 32-bit address space. As time has passed, our programs have used a larger and larger portion of the available memory space. As one anonymous %u said, one solution would be "increasing the amount of memory needed for pointers by the factor two", which I interpret as moving to a 64-bit memory space. I did some calculations to better understand what is going on. I've probably done several errors though. <theoretical rambling> The probability of one single uniformly random spurious pointer preventing the collection of an object of size s is p = s/B, with B being the size of the address space (2^b for b = 32 or 64 bits). On a 64 bit architecture, the probability is obviously drastically reduced. if g is the amount (bytes) of independent random data an application needs, the risk of an object of size s not being collected is P(g) = 1 - (1 - s/B) ^ (g/(b/8)) As one can see, the risk increases considerably as the objects gets bigger. Increasing the amount of random data the application contains, reduces the size of objects the GC will be able to handle satisfactory. My example was kind of nasty in that it not only accumulated additional random garbage in each iteration, but also caused heap fragmentation. A kinder example would always create objects of the same size. The simpler example would be: * start with a static set of g bytes of random data * for each iteration create n objects of size s (for simplicity, let each object contain s bytes of random data) * after each iteration, none of the n objects are used anymore. This is a good time to call gc.fullCollect(); After each such iteration, i, P_i*n_i objects will remain uncollected, and will be added to the pool of random spurious pointers. I disregard the small portion of objects that are uncollected because of spurious pointers appearing in other objects left uncollected in the same iteration. If this portion would be significant, you'd really have a problem anyway and this will show up in later iterations. g_{i+1} ~ g_i + P_i*n_i*s. I generously assume that the safe memory area of the collected objects are reused by the allocator in the next iteration. A few of those will become unsafe by the recently added spurious pointers from uncollected objects, but for the most part, those memory areas are now safe, and the objects allocated/collected from them can be disregarded. The number of objects that need to be allocated in unsafe memory areas for the next iteration becomes: n_{i+1} = P_i*n_i + P(P_i*n_i*s)*(n_0 - n_i) This will of course not perfectly predict the real behavior, as it relaxes the problem away from discrete units, but should still adequately model the sample application. Using this, I tried to find the breaking point, i.e. at what size objects need to be explicitly freed instead of left to the GC. "Verification": Rewriting my original example to to a std.gc.fullCollect() every 10th iteration, giving the following parameters: g = 20 MB n = 10 the model suggests allocating objects of size: s = 2000 would result in almost no space overhead, stable at 20 mb s = 3000 would result in a slight but stable space overhead, s = 4000 would cause a run-away memory usage running the program results in: s = 2000 memory usage is stable at 20 mb s = 3000 results in a small apparently unbounded memory leak s = 4000 results in a unbounded memory leak The model appears to be at least in the correct ballpark for this sample. Theoretical results: Those tables show the maximum object size the GC will be able to handle with different static amounts of "random" data. By "being able to handle", I mean that the application doesn't use more than 2x the required amount of memory. The exact breaking point is very unstable and going above it rapidly results in uncontrolled memory consumption. 32 bit arch, 100 objects 1 MB data 8000 bytes 5 MB data 4500 bytes 10 MB data 3000 bytes 20 MB data 2000 bytes 100 MB data 700 bytes 500 MB data 200 bytes 1000 MB data 100 bytes 32 bit arch, 1000 objects 1 MB data 3400 bytes 5 MB data 2200 bytes 10 MB data 1700 bytes 20 MB data 1200 bytes 100 MB data 500 bytes 500 MB data 150 bytes 1000 MB data 100 bytes 32 bit arch, 10000 objects 1 MB data 1300 bytes 5 MB data 1000 bytes 10 MB data 800 bytes 20 MB data 600 bytes 100 MB data 300 bytes 500 MB data 100 bytes 1000 MB data 75 bytes Those figures need to be taken with a couple of grains of salt, but should give a indication of at what object size one needs to manually handle object lifetimes. As a comparison -- the 64 bit haven: 64 bit arch, 100 objects 2 GB data 1.5 GB 100 GB data 600 MB 1 TB data 200 MB 64 bit arch, 1000 objects 2 GB data 350 MB 100 GB data 250 MB 1 TB data 150 MB 64 bit arch, 10000 objects 2 GB data 100 MB 100 GB data 75 MB 1 TB data 50 MB </theoretical rambling> Summary: As far as I can see, what I have to do to avoid memory leaks with a conservative GC, is one of the following: 1. move to a 64 bit architecture 2. manually handle all objects larger than a few hundred bytes (see above) 3. hide all non pointer data from the GC It is a shame #2 is such a pain and that D doesn't offer any help such as automatic ref-counting. Not having automatic ref-counting also prevents neat solutions such as transparent CoW, and automatic handling of scarce resources. If I wouldn't have a strong belief that automatic ref-counting would be addressed soon, I'd definitely consider going back to C++. Luckily, before I'd give up waiting, 64 bit architectures will probably be in great majority. ;) /Oskar

Dont hold your breath for a reference counting implementation from DMD. Walter doesnt want to add any overhead to assignments.

Wouldn't it be possible to have classes declared "refcounted" in the same way as scope? Then the extra cost would only apply to members of thouse classes, possibly by disallowing uppcasts to non refcounted superclasses. example: refcounted class foo{ } ... void bar(){ foo f=new foo();//refcount is one auto k=f;//refcount is two k=null;//back to one //here the refcount reaches zero and the object is deleted }
Jan 09 2007
prev sibling parent Thomas Kuehne <thomas-dloop kuehne.cn> writes:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kyle Furlong schrieb am 2007-01-09:
 Oskar Linde wrote:
 Oskar Linde wrote:


<snip>
 If I wouldn't have a strong belief that automatic ref-counting would be 
 addressed soon, I'd definitely consider going back to C++. Luckily, 
 before I'd give up waiting, 64 bit architectures will probably be in 
 great majority. ;)


 Dont hold your breath for a reference counting implementation from DMD. 
 Walter doesnt want to add any overhead to assignments.

Reference counting implementations would cause a lot of pain and overhead with array slices. A sweeping GC that has different pools for non-pointer and may-contain-pointer memory areas is easier to implement and causes less overhead. In addition reference counting can cause interesting problems in multi-core systems. Yes, using advance information all sweeping GC's can be exploited, though user data should'nt usually be a problem as most of it ends up in the no-pointer pool. The x86-stack however is a potential attack vector. Thomas -----BEGIN PGP SIGNATURE----- iD8DBQFFpCw3LK5blCcjpWoRAuFjAKCpXiyi61g916iUQO9IUbb8GIv4gwCfWmUG edc07TIMP5n4l0+bzWXeFr4= =4xvh -----END PGP SIGNATURE-----
Jan 09 2007
prev sibling parent Sean Kelly <sean f4.ca> writes:
Oskar Linde wrote:
 Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks, 
 I've unfortunately had to come to the conclusion that the D GC is not 
 ready for production use. The problem is what I'd call "spurious 
 pointers". That is random data (strings, numbers, image data, audio or 
 whatever) appearing to the GC to be full of pointers to all over the 
 memory space.

I feel kind of bad for making it sound like this is a problem related specifically to the D garbage collector. It is rather a general and apparently well known problem of all conservative garbage collectors. The D garbage collector is still excellent for a large domain of problems. Lots of people seem to be having similar problems though, so a better understanding of under what conditions a conservative garbage collector will and will not work seems to be called for.

This link may be relevant: http://www.hpl.hp.com/techreports/2001/HPL-2001-251.html "Bounding Space Usage of Conservative Garbage Collectors" - Hans J. Boehm Sean
Jan 10 2007
prev sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
The only comment I'd like to add is this:

At the same time as we're thinking about how to use a precise collector 
instead of a conservative collector, we should also think about how to 
implement a generational copying collector, since it might be necessary 
to add new constructs to the core language (pointer pinning, for 
example) to facilitate the new collector semantics.

--benji
Jan 10 2007
parent Sean Kelly <sean f4.ca> writes:
Benji Smith wrote:
 The only comment I'd like to add is this:
 
 At the same time as we're thinking about how to use a precise collector 
 instead of a conservative collector, we should also think about how to 
 implement a generational copying collector, since it might be necessary 
 to add new constructs to the core language (pointer pinning, for 
 example) to facilitate the new collector semantics.

Personally, I'm not convinced that a traditional moving GC will be the way to go in D. Unless we are given some way to obtain type info for data on the stack and in the static data segment, anything directly referenced by these regions would have to be implicitly pinned. And typically, long-lived data is referenced directly from such links. A programmer could work around this limitation by using proxy classes, but this seems like a lot of trouble just to allow for generational garbage collection. However, there are variants on the idea that sound like they have potential, one of which I believe is being discussed. As for pointer pinning--it's fairly easy to add to the GC, so I don't see that as an obstacle to future design. The bigger problem is sorting out efficient implementations of Object.opHash and such for objects whose only persistent uniquely identifying characteristic is their address. Sean
Jan 10 2007