digitalmars.D - The problem with the D GC

Oskar Linde (50/50) Jan 08 2007 After having fought a while with D programs with runaway memory leaks,

Johan Granberg (5/62) Jan 08 2007 I have observed the same behavior but did not realize why it happened
Lutger (4/10) Jan 08 2007 ....
Lionello Lunesu (6/62) Jan 08 2007 I've run into similar problems back when I was messing around with the
Tom S (7/8) Jan 08 2007 I've experienced pretty much the same while doing memory-intensive

kenny (18/29) Jan 08 2007 I also have experienced bad GC performance. I found it to be because of

Sean Kelly (12/18) Jan 08 2007 Probably not. Assuming this is a multithreaded app, you have to pay for...

Frits van Bommel (20/32) Jan 08 2007 Chained concats *do* have special handling. See

Frits van Bommel (3/7) Jan 08 2007 Now that I think about it, maybe it would be possible to eliminate that
Sean Kelly (18/52) Jan 08 2007 Oh good :) I knew about the varargs but hadn't given the code a close

Frits van Bommel (6/25) Jan 08 2007 Normal concats always allocate, so there's no need to check the size

kenny (23/47) Jan 08 2007 Honestly, it was a long time ago, so remembering is difficult. I didn't

Tom S (8/10) Jan 08 2007 This is not always an option. With a few hundred megs of memory

Sean Kelly (7/63) Jan 08 2007 Since the patch keys on element size, the above code would still leak
Andrey Khropov (9/20) Jan 08 2007 That's what precise GC is all about.
%u (15/18) Jan 08 2007 Wrong. You have an intentional memory leak, from which the GC
Miles (25/31) Jan 08 2007 Some time ago I had the exact same problem, and I tried to convince

Frits van Bommel (11/20) Jan 08 2007 Of course, if you're only concerned about malicious users sending

Miles (3/6) Jan 09 2007 Yeah, sure. You can also just pick a random element per iteration
Kevin Bealer (6/6) Jan 09 2007 I would think you are find if you pick your pivot at random(), and seed ...

Frits van Bommel (4/9) Jan 09 2007 IIRC randomizing the input is a rather general solution to this kind of

Bill Baxter (7/13) Jan 08 2007 Wow. That may be the problem I'm having too. I have some radial basis
Bill Baxter (41/51) Jan 08 2007 Here's a slightly less contrived version of Oskar's gc test.

%u (3/54) Jan 08 2007 Agreed. This needs to be changed. Is the GC in that tango

Sean Kelly (6/61) Jan 08 2007 It's a modified version of the DMD GC. The "don't scan blocks

=?ISO-8859-1?Q?Lu=EDs_Marques?= (4/8) Jan 09 2007 Does the new GC allow setting a hook to be informed of when a given

Lutger (4/14) Jan 09 2007 This is possible with these functions in Object:

Sean Kelly (20/34) Jan 09 2007 This option is available. There is also a global hook that can be

Sean Kelly (17/63) Jan 09 2007 For what it's worth, I ran the test above with the modified GC in Tango,...

Ralf Schneider (3/18) Jan 09 2007 It dosen't seem so hard for me to let the compiler set such an attribute...

Frits van Bommel (24/26) Jan 09 2007 It's pretty hard if you can't modify the compiler ;).

Kyle Furlong (5/35) Jan 09 2007 According to a conversation I had with Walter a month or two ago, this
Sean Kelly (21/47) Jan 09 2007 An interim alternative that came up in conversation would be to provide

Sean Kelly (4/8) Jan 09 2007 Err, this should be:
Frits van Bommel (5/10) Jan 09 2007 I actually tried something like that a while back. Unfortunately DMD
zz (11/18) Jan 09 2007 Some years ago we had some comparision between two PostScript (Not GS)

Bill Baxter (14/36) Jan 09 2007 I'd certainly be willing to do that sort of thing in my code to make the...

%u (2/38) Jan 09 2007 I feel the same way. Without GC, D is just C++ with a few more features.

Kyle Furlong (6/44) Jan 10 2007 I do not think that this is even remotely valid. Think of everything you...

%u (3/47) Jan 10 2007 It's not ignorant from my viewpoint. You're assuming I care about the fe...

Jarrett Billingsley (4/7) Jan 10 2007 A bit OT, but would you please get a newsreader so you can stop being ca...
Kyle Furlong (9/56) Jan 10 2007 What I'm trying to say is, that to say a language is just a sum of its

Bill Baxter (17/88) Jan 10 2007 Yes, that's certainly a valid point. But I think maybe you're taking

Kyle Furlong (3/98) Jan 10 2007 I agree. This is only good news for D, however, because libraries and

Oskar Linde (133/139) Jan 09 2007 I feel kind of bad for making it sound like this is a problem related

Kyle Furlong (3/180) Jan 09 2007 Dont hold your breath for a reference counting implementation from DMD.

Johan Granberg (14/197) Jan 09 2007 Wouldn't it be possible to have classes declared "refcounted" in the sam...
Thomas Kuehne (18/26) Jan 09 2007 -----BEGIN PGP SIGNED MESSAGE-----

Sean Kelly (5/21) Jan 10 2007 This link may be relevant:

Benji Smith (7/7) Jan 10 2007 The only comment I'd like to add is this:

Sean Kelly (15/22) Jan 10 2007 Personally, I'm not convinced that a traditional moving GC will be the

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

After having fought a while with D programs with runaway memory leaks, 
I've unfortunately had to come to the conclusion that the D GC is not 
ready for production use. The problem is what I'd call "spurious 
pointers". That is random data (strings, numbers, image data, audio or 
whatever) appearing to the GC to be full of pointers to all over the 
memory space.

Consider this simple program. It is designed to have a memory footprint 
of about 20 mb and then continuously process data.

import std.random;

void main() {
         // The real memory use, ~20 mb
         uint[] data;
         data.length = 5_000_000;
         foreach(inout x; data)
                 x = rand();
         while(1) {
		// simulate reading a few kb of data
                 uint[] incoming;
                 incoming.length = 1000 + rand() % 5000;
                 foreach(inout x; incoming)
                         x = rand();
                 // do something with the data...
         }
}

The result may not be as expected. The program will use up all available 
memory (for me crashing at about 2.7 gb of memory usage) and at the same 
time run extremely slow due to the panicked GC scanning all memory over 
and over.

The reason is the 20 mb of random data and the small 32-bit memory 
address range of 4 GB. To understand how bad this is, 20 mb of random 
data will result in _each_ 4k memory page on average having 5 random 
pointers to it. Those spurious pointers are laying a dense mine-field 
effectively disabling the GC.

This means that each time you rely on the GC (array appending/resizing, 
Phobos function calls etc), you have a potential memory leak. (That is 
unless all the program data is nothing but valid pointers/references or 
all non-pointer data is hidden from the GC.)

The above program is of course just a toy illustrating the phenomena. In 
a text processing program of mine the bulk of the data is short char[] 
strings. The program still has runaway memory leaks leading to an 
inevitable crash. I have absolutely no idea how to handle text 
processing using the D recommended char[] and CoW idiom without getting 
severe memory leaks.

The definite solution has to be a GC that only scans memory containing 
pointers. Sean's patches to make the GC skip scanning memory known to 
contain elements smaller than sizeof(void*) will probably help 
tremendously. (I'd just have to make sure I'm not using dchar[] strings, 
float or double data, or the DMD associative array implementation)

-- 
/Oskar

Jan 08 2007

Johan Granberg <lijat.meREM OVE.gmail.com> writes:

Oskar Linde wrote:

 After having fought a while with D programs with runaway memory leaks,
 I've unfortunately had to come to the conclusion that the D GC is not
 ready for production use. The problem is what I'd call "spurious
 pointers". That is random data (strings, numbers, image data, audio or
 whatever) appearing to the GC to be full of pointers to all over the
 memory space.
 
 Consider this simple program. It is designed to have a memory footprint
 of about 20 mb and then continuously process data.
 
 import std.random;
 
 void main() {
          // The real memory use, ~20 mb
          uint[] data;
          data.length = 5_000_000;
          foreach(inout x; data)
                  x = rand();
          while(1) {
 // simulate reading a few kb of data
                  uint[] incoming;
                  incoming.length = 1000 + rand() % 5000;
                  foreach(inout x; incoming)
                          x = rand();
                  // do something with the data...
          }
 }
 
 The result may not be as expected. The program will use up all available
 memory (for me crashing at about 2.7 gb of memory usage) and at the same
 time run extremely slow due to the panicked GC scanning all memory over
 and over.
 
 The reason is the 20 mb of random data and the small 32-bit memory
 address range of 4 GB. To understand how bad this is, 20 mb of random
 data will result in _each_ 4k memory page on average having 5 random
 pointers to it. Those spurious pointers are laying a dense mine-field
 effectively disabling the GC.
 
 This means that each time you rely on the GC (array appending/resizing,
 Phobos function calls etc), you have a potential memory leak. (That is
 unless all the program data is nothing but valid pointers/references or
 all non-pointer data is hidden from the GC.)
 
 The above program is of course just a toy illustrating the phenomena. In
 a text processing program of mine the bulk of the data is short char[]
 strings. The program still has runaway memory leaks leading to an
 inevitable crash. I have absolutely no idea how to handle text
 processing using the D recommended char[] and CoW idiom without getting
 severe memory leaks.
 
 The definite solution has to be a GC that only scans memory containing
 pointers. Sean's patches to make the GC skip scanning memory known to
 contain elements smaller than sizeof(void*) will probably help
 tremendously. (I'd just have to make sure I'm not using dchar[] strings,
 float or double data, or the DMD associative array implementation)
 

I have observed the same behavior but did not realize why it happened
(thought it was a gc bug on osx or something). Something that helped the
problem for me was to call fullCollect very often (40 times a second) this
reduced the leak from 1mb a second to almost nothing.

Jan 08 2007

Lutger <lutger.blijdestijn gmail.com> writes:

Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks, 
 I've unfortunately had to come to the conclusion that the D GC is not 
 ready for production use. The problem is what I'd call "spurious 
 pointers". That is random data (strings, numbers, image data, audio or 
 whatever) appearing to the GC to be full of pointers to all over the 
 memory space.

....

Wow that is so bad. I thought it has been mentioned this will not be a 
problem in practice but it apparently is.

Jan 08 2007

Lionello Lunesu <lio lunesu.remove.com> writes:

Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks, 
 I've unfortunately had to come to the conclusion that the D GC is not 
 ready for production use. The problem is what I'd call "spurious 
 pointers". That is random data (strings, numbers, image data, audio or 
 whatever) appearing to the GC to be full of pointers to all over the 
 memory space.
 
 Consider this simple program. It is designed to have a memory footprint 
 of about 20 mb and then continuously process data.
 
 import std.random;
 
 void main() {
         // The real memory use, ~20 mb
         uint[] data;
         data.length = 5_000_000;
         foreach(inout x; data)
                 x = rand();
         while(1) {
         // simulate reading a few kb of data
                 uint[] incoming;
                 incoming.length = 1000 + rand() % 5000;
                 foreach(inout x; incoming)
                         x = rand();
                 // do something with the data...
         }
 }
 
 The result may not be as expected. The program will use up all available 
 memory (for me crashing at about 2.7 gb of memory usage) and at the same 
 time run extremely slow due to the panicked GC scanning all memory over 
 and over.
 
 The reason is the 20 mb of random data and the small 32-bit memory 
 address range of 4 GB. To understand how bad this is, 20 mb of random 
 data will result in _each_ 4k memory page on average having 5 random 
 pointers to it. Those spurious pointers are laying a dense mine-field 
 effectively disabling the GC.
 
 This means that each time you rely on the GC (array appending/resizing, 
 Phobos function calls etc), you have a potential memory leak. (That is 
 unless all the program data is nothing but valid pointers/references or 
 all non-pointer data is hidden from the GC.)
 
 The above program is of course just a toy illustrating the phenomena. In 
 a text processing program of mine the bulk of the data is short char[] 
 strings. The program still has runaway memory leaks leading to an 
 inevitable crash. I have absolutely no idea how to handle text 
 processing using the D recommended char[] and CoW idiom without getting 
 severe memory leaks.
 
 The definite solution has to be a GC that only scans memory containing 
 pointers. Sean's patches to make the GC skip scanning memory known to 
 contain elements smaller than sizeof(void*) will probably help 
 tremendously. (I'd just have to make sure I'm not using dchar[] strings, 
 float or double data, or the DMD associative array implementation)

I've run into similar problems back when I was messing around with the 
Universal Machine for that programing contest. It would run slower and 
slower. Skipping GC checks on arrays without pointers is a must, if you 
ask me.

L.

Jan 08 2007

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Oskar Linde wrote:
 (...) gc hell (...)

I've experienced pretty much the same while doing memory-intensive 
computations. Since then I've been using lots of malloc/realloc/free and 
both my memory footprint and execution speed have improved. The GC needs 
a fix. Badly.


--
Tomasz Stachowiak

Jan 08 2007

kenny <funisher gmail.com> writes:

I also have experienced bad GC performance. I found it to be because of 
the ~ operator on strings. The program is a daemon, and after it had 
been running for a while, memory usage gets truly horrific, and 
performance degrades very bad.

This was back on 0.140, so things may have changed since then...

I solved two ways. First, I wrote a function which accepts variadic 
arguments, and separated everything by a comma instead of appending the 
strings and the performance difference was stunning. it was also nice to 
be able to write:

prt("string", my_int, " ", my_float, "string2");

instead of

"string"~std.string.toString(my_int)~" 
"~std.string.toString(my_float)~"string2"

Second, like someone else in this thread I also called fullCollect every 
second.

I used to use gdc-0.08 with boehm-gc too. I can't honestly remember if 
that had the same problem.

Tom S wrote:
 Oskar Linde wrote:
 (...) gc hell (...)

 
 I've experienced pretty much the same while doing memory-intensive 
 computations. Since then I've been using lots of malloc/realloc/free and 
 both my memory footprint and execution speed have improved. The GC needs 
 a fix. Badly.
 
 
 -- 
 Tomasz Stachowiak

Jan 08 2007

Sean Kelly <sean f4.ca> writes:

kenny wrote:
 I also have experienced bad GC performance. I found it to be because of 
 the ~ operator on strings. The program is a daemon, and after it had 
 been running for a while, memory usage gets truly horrific, and 
 performance degrades very bad.
 
 This was back on 0.140, so things may have changed since then...

Probably not.  Assuming this is a multithreaded app, you have to pay for 
two mutex locks for every concat operation.  So an expression like:

a = b ~ c ~ d;

would result in six locks of the mutex protecting the GC to perform 
allocations (I think anyway--I don't believe chained concats have 
special handling).  I think this would also result in two discarded 
temporary buffers for the GC to clean up later.  And since the Phobos GC 
scans all such blocks by default...

My patch will address the scanning issue, but it won't address the mutex 
issue--there's really no easy way to avoid that one.


Sean

Jan 08 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Sean Kelly wrote:
 Assuming this is a multithreaded app, you have to pay for 
 two mutex locks for every concat operation.  So an expression like:
 
 a = b ~ c ~ d;
 
 would result in six locks of the mutex protecting the GC to perform 
 allocations (I think anyway--I don't believe chained concats have 
 special handling).  I think this would also result in two discarded 
 temporary buffers for the GC to clean up later.  And since the Phobos GC 
 scans all such blocks by default...

Chained concats *do* have special handling. See 
phobos/internal/arraycat.d, _d_arraycatn() (first function in the file, 
can't miss it).
This function takes element size and number of arrays, followed by 
C-style vararg arrays.
And yes, it only allocates once :).

By the way, how do you figure six locks? At two locks per concat, that 
code only has two concats so even without above special handling 
wouldn't it still only lock four times?

Also: Why two locks per concat at all? A concat only requires one 
allocation, so does a single alloc require two locks?
(I haven't looked much into the GC code, so this is just curiosity)

 My patch will address the scanning issue, but it won't address the mutex 
 issue--there's really no easy way to avoid that one.

What exactly do you mean by "the mutex issue"? Is using mutexes at all 
the problem, or is it locking them too often per operation (i.e. more 
than once per alloc)?

If you don't want to use them at all, I think the closest you'd get 
would involve implementing thread-local heaps.
But that would still require a mutex if you want to allow deleting an 
object from a different thread than it was allocated in...

Jan 08 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Frits van Bommel wrote:
 If you don't want to use them at all, I think the closest you'd get 
 would involve implementing thread-local heaps.
 But that would still require a mutex if you want to allow deleting an 
 object from a different thread than it was allocated in...

Now that I think about it, maybe it would be possible to eliminate that 
mutex as well. Hmm...

Jan 08 2007

Sean Kelly <sean f4.ca> writes:

Frits van Bommel wrote:
 Sean Kelly wrote:
 Assuming this is a multithreaded app, you have to pay for two mutex 
 locks for every concat operation.  So an expression like:

 a = b ~ c ~ d;

 would result in six locks of the mutex protecting the GC to perform 
 allocations (I think anyway--I don't believe chained concats have 
 special handling).  I think this would also result in two discarded 
 temporary buffers for the GC to clean up later.  And since the Phobos 
 GC scans all such blocks by default...

 
 Chained concats *do* have special handling. See 
 phobos/internal/arraycat.d, _d_arraycatn() (first function in the file, 
 can't miss it).
 This function takes element size and number of arrays, followed by 
 C-style vararg arrays.
 And yes, it only allocates once :).

Oh good :)  I knew about the varargs but hadn't given the code a close 
enough look to be sure.

 By the way, how do you figure six locks? At two locks per concat, that 
 code only has two concats so even without above special handling 
 wouldn't it still only lock four times?

I miscounted :p  It would have been four without the special handling.

 Also: Why two locks per concat at all? A concat only requires one 
 allocation, so does a single alloc require two locks?
 (I haven't looked much into the GC code, so this is just curiosity)

The code in internal/gc/gc.d first checks the size of the array to see 
if a realloc is necessary, then it performs the realloc.  So worst case 
you're stuck with two locks per operation.  However, this may not be the 
case in the arraycat routines.  It's been a while since I've looked at them.

 My patch will address the scanning issue, but it won't address the 
 mutex issue--there's really no easy way to avoid that one.

 
 What exactly do you mean by "the mutex issue"? Is using mutexes at all 
 the problem, or is it locking them too often per operation (i.e. more 
 than once per alloc)?

Too often per operation.  So given the vararg stuff you mentioned above, 
this isn't an issue IMO.  As it is, locks should only be acquired if the 
app is multithreaded, so this cost is only incurred if it's necessary. 
I think Phobos might actually lock in a few instances where it isn't 
strictly necessary, but I can't recall for certain.

 If you don't want to use them at all, I think the closest you'd get 
 would involve implementing thread-local heaps.
 But that would still require a mutex if you want to allow deleting an 
 object from a different thread than it was allocated in...

No it wouldn't--give each per-thread heap a lock-free free list.  But 
per-thread heaps for a GC are somewhat complicated.  I think you'd have 
to implement something like a read/write lock where the "writer" is 
actually the thread that wants to collect.


Sean

Jan 08 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Sean Kelly wrote:
 Frits van Bommel wrote:
 Also: Why two locks per concat at all? A concat only requires one 
 allocation, so does a single alloc require two locks?
 (I haven't looked much into the GC code, so this is just curiosity)

 
 The code in internal/gc/gc.d first checks the size of the array to see 
 if a realloc is necessary, then it performs the realloc.  So worst case 
 you're stuck with two locks per operation.  However, this may not be the 
 case in the arraycat routines.  It's been a while since I've looked at 
 them.

Normal concats always allocate, so there's no need to check the size 
(and so they don't).
You're probably thinking of appends.

 If you don't want to use them at all, I think the closest you'd get 
 would involve implementing thread-local heaps.
 But that would still require a mutex if you want to allow deleting an 
 object from a different thread than it was allocated in...

 
 No it wouldn't--give each per-thread heap a lock-free free list.  But 
 per-thread heaps for a GC are somewhat complicated.  I think you'd have 
 to implement something like a read/write lock where the "writer" is 
 actually the thread that wants to collect.

I replied to myself about 10 minutes later when I realized it was 
probably possible :).

Jan 08 2007

kenny <funisher gmail.com> writes:

Honestly, it was a long time ago, so remembering is difficult. I didn't 
have SVN installed back then :) I'm pretty sure it wasn't 
multi-threaded. I think that we launched multiple processes. (eg. not 
mutexes either) I know I used a lot of assoc arrays, and I also used a 
lot of concatenation. The size of the code was quite huge too.. at least 
8000 lines.

It could be a combination of those factors, and it may have been as 
early of a version of 0.73. I dunno dude, I just remember it sucked.

I have not really been having bad experiences with the GC lately, accept 
that, it's annoying at times with the peaks. Every request will give a 
response time of 2ms, accept when the GC runs, and then it'll give a 
50ms response. I suppose that's not that horrible, but it is a 25x 
difference.

What I've wanted to do for a while now is to be able to set the size of 
the pool of memory, and also want the variable to be set telling me some 
form of information of how close the next collect is.. whether it's 
largest memory block avail, or % fragmentation, or something, so I can 
take the processes and tell them to stop accepting requests (and let one 
of the other processes handle it) and run the GC so the GC doesn't run 
while accepting requests.

I guess I can just estimate based off of the amount of requests it has 
processed... but the other way sounds way cooler.


Sean Kelly wrote:
 kenny wrote:
 I also have experienced bad GC performance. I found it to be because 
 of the ~ operator on strings. The program is a daemon, and after it 
 had been running for a while, memory usage gets truly horrific, and 
 performance degrades very bad.

 This was back on 0.140, so things may have changed since then...

 
 Probably not.  Assuming this is a multithreaded app, you have to pay for 
 two mutex locks for every concat operation.  So an expression like:
 
 a = b ~ c ~ d;
 
 would result in six locks of the mutex protecting the GC to perform 
 allocations (I think anyway--I don't believe chained concats have 
 special handling).  I think this would also result in two discarded 
 temporary buffers for the GC to clean up later.  And since the Phobos GC 
 scans all such blocks by default...
 
 My patch will address the scanning issue, but it won't address the mutex 
 issue--there's really no easy way to avoid that one.
 
 
 Sean

Jan 08 2007

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

kenny wrote:
 Second, like someone else in this thread I also called fullCollect every 
 second.

This is not always an option. With a few hundred megs of memory 
allocated, collections can last seconds. Resorting to malloc was the 
only rescue for me...


Anyway, thanks for sharing your experiences guys. Now I know I'm not the 
only one around having difficult relationships with the GC.


--
Tomasz Stachowiak

Jan 08 2007

Sean Kelly <sean f4.ca> writes:

Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks, 
 I've unfortunately had to come to the conclusion that the D GC is not 
 ready for production use. The problem is what I'd call "spurious 
 pointers". That is random data (strings, numbers, image data, audio or 
 whatever) appearing to the GC to be full of pointers to all over the 
 memory space.
 
 Consider this simple program. It is designed to have a memory footprint 
 of about 20 mb and then continuously process data.
 
 import std.random;
 
 void main() {
         // The real memory use, ~20 mb
         uint[] data;
         data.length = 5_000_000;
         foreach(inout x; data)
                 x = rand();
         while(1) {
         // simulate reading a few kb of data
                 uint[] incoming;
                 incoming.length = 1000 + rand() % 5000;
                 foreach(inout x; incoming)
                         x = rand();
                 // do something with the data...
         }
 }
 
 The result may not be as expected. The program will use up all available 
 memory (for me crashing at about 2.7 gb of memory usage) and at the same 
 time run extremely slow due to the panicked GC scanning all memory over 
 and over.
 
 The reason is the 20 mb of random data and the small 32-bit memory 
 address range of 4 GB. To understand how bad this is, 20 mb of random 
 data will result in _each_ 4k memory page on average having 5 random 
 pointers to it. Those spurious pointers are laying a dense mine-field 
 effectively disabling the GC.
 
 This means that each time you rely on the GC (array appending/resizing, 
 Phobos function calls etc), you have a potential memory leak. (That is 
 unless all the program data is nothing but valid pointers/references or 
 all non-pointer data is hidden from the GC.)
 
 The above program is of course just a toy illustrating the phenomena. In 
 a text processing program of mine the bulk of the data is short char[] 
 strings. The program still has runaway memory leaks leading to an 
 inevitable crash. I have absolutely no idea how to handle text 
 processing using the D recommended char[] and CoW idiom without getting 
 severe memory leaks.
 
 The definite solution has to be a GC that only scans memory containing 
 pointers. Sean's patches to make the GC skip scanning memory known to 
 contain elements smaller than sizeof(void*) will probably help 
 tremendously. (I'd just have to make sure I'm not using dchar[] strings, 
 float or double data, or the DMD associative array implementation)

Since the patch keys on element size, the above code would still leak 
horribly by default.  However, the user can set/clear this "no scan" 
flag explicitly, so if there are any memory blocks that are still 
scanned by default, you can indicate that the GC should not scan them. 
I think between the two, we should be in pretty good shape.


Sean

Jan 08 2007

"Andrey Khropov" <andkhropov_nosp m_mtu-net.ru> writes:

Oskar Linde wrote:

 After having fought a while with D programs with runaway memory leaks, I've
 unfortunately had to come to the conclusion that the D GC is not ready for
 production use. The problem is what I'd call "spurious pointers". That is
 random data (strings, numbers, image data, audio or whatever) appearing to
 the GC to be full of pointers to all over the memory space.
 
 The definite solution has to be a GC that only scans memory containing
 pointers. Sean's patches to make the GC skip scanning memory known to contain
 elements smaller than sizeof(void*) will probably help tremendously. (I'd
 just have to make sure I'm not using dchar[] strings, float or double data,
 or the DMD associative array implementation)

That's what precise GC is all about. 

I think it's the single biggest problem of the present D implementations. Some
of my measurements have shown that on memory-intensive applications current
Phobos GC could be 10x slower than MS.NET 2.0's GC:
http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digi
talmars.D&artnum=43991

-- 
AKhropov

Jan 08 2007

%u <u digitaldaemon.com> writes:

== Quote from Oskar Linde (oskar.lindeREM OVEgmail.com)'s article
 This means that each time you rely on the GC (array
 appending/resizing, Phobos function calls etc), you have a
 potential memory leak.

Wrong. You have an intentional memory leak, from which the GC
cannot recover you.

The GC was designed to eventually free coders from memory leak
accidents. To enable this all variables are initialized by default--
-because this way no random data, including pointers, can survive.

In total your toy program establishes exactly that environment for
which the GC is known to fail.

There is an easy solution for such environments by increasing the
amount of memory needed for pointers by the factor two---or halving
the available memory if memory bounds are tight and one takes a
look from the other side.

But even this solution will not prevent the GC from sucking in all
data swapped out, if the GC is not coupled with the virtual memory
manager.

Jan 08 2007

Miles <_______ _______.____> writes:

Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks,
 I've unfortunately had to come to the conclusion that the D GC is not
 ready for production use. The problem is what I'd call "spurious
 pointers". That is random data (strings, numbers, image data, audio or
 whatever) appearing to the GC to be full of pointers to all over the
 memory space.

Some time ago I had the exact same problem, and I tried to convince
Walter of how bad this problem is. Others have said the same thing
before you and me. To Walter, "in practice, all integers vary from 0 to
100" or something like that, implying that the problem does not exist in
fact or is not relevant.

I should also mention that this is considered a serious security issue,
on the same class of the QuickSort algorithm.

For those not aware of secure programming practices: using QuickSort to
sort data input by a remote user is considered a bad programming
practice, because although QuickSort is O(n*log(n)) on average, there is
a set of input data that makes the algorithm O(n^2). Knowing the
implementation, an attacker may feed the application with carefully
crafted data that exploits this weakness.

In the D case, knowing how data are laid on memory, it is easy for a
remote attacker to feed data that, read as pointers, would point to
large (and otherwise temporary) blocks of data, making them leak. Stack
base randomization and other similar techniques helps preventing this,
but IMO, just hides the problem.

Another problem in current GC implementation, IMO, is that memory space
once allocated is never given back to the OS when disposed. What makes
this worse is that it is common for programs to use a big amount of
memory during initialization (like parsing XML data files), and never
use it again. Again, Walter already knows of this and doesn't think it
is a problem.

Jan 08 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Miles wrote:
 I should also mention that this is considered a serious security issue,
 on the same class of the QuickSort algorithm.
 
 For those not aware of secure programming practices: using QuickSort to
 sort data input by a remote user is considered a bad programming
 practice, because although QuickSort is O(n*log(n)) on average, there is
 a set of input data that makes the algorithm O(n^2). Knowing the
 implementation, an attacker may feed the application with carefully
 crafted data that exploits this weakness.

Of course, if you're only concerned about malicious users sending 
worst-case data (and not about worst-case data appearing randomly) 
there's an easy fix: randomize the data in O(n) :).
Don't laugh, they taught me that in school. And it works, Quicksort will 
remain O(n*log(n)) on average with that simple step performed 
beforehand, no matter the input data.
You'll need to trust your randomizer though, if the attacker can 
influence *that* you're just screwed and should use introsort or merge 
sort or something else with guaranteed O(n*log(n)) behavior if that's 
really what you want...

Jan 08 2007

Miles <_______ _______.____> writes:

Frits van Bommel wrote:
 Of course, if you're only concerned about malicious users sending
 worst-case data (and not about worst-case data appearing randomly)
 there's an easy fix: randomize the data in O(n) :).

Yeah, sure. You can also just pick a random element per iteration
(instead of the first one or the middle one).

Jan 09 2007

Kevin Bealer <kevinbealer gmail.com> writes:

I would think you are find if you pick your pivot at random(), and seed the
random
number generator at the top of your program with microseconds-since-1970 or
whatever the convenient chaotic local number is.

Or you could probably just use D's [].sort method -- I think Walter said it uses
quick-sort but checks for poor performance and switches to a heap sort
automatically.

Kevin

Jan 09 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Kevin Bealer wrote:
 I would think you are find if you pick your pivot at random(), and seed the
random
 number generator at the top of your program with microseconds-since-1970 or
 whatever the convenient chaotic local number is.

IIRC randomizing the input is a rather general solution to this kind of 
problem (not just with Quicksort).

 Or you could probably just use D's [].sort method -- I think Walter said it
uses
 quick-sort but checks for poor performance and switches to a heap sort
automatically.

That would be a variant of introsort. I believe I mentioned it.

Jan 09 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks,
 I've unfortunately had to come to the conclusion that the D GC is not
 ready for production use. The problem is what I'd call "spurious
 pointers". That is random data (strings, numbers, image data, audio or
 whatever) appearing to the GC to be full of pointers to all over the
 memory space.


Wow.  That may be the problem I'm having too.  I have some radial basis 
function interpolation code that I put together just before the holiday 
break, which I observed to be leaking like mad.  First thing I need to 
do today after catching up on mail is to figure out why, but maybe 
you've just answered the question for me.

--bb

Jan 08 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Here's a slightly less contrived version of Oskar's gc test.

import std.math;
import std.random;
import std.stdio;

void main() {
     // The real memory use, ~40 mb
     double[] data;
     data.length = 5_000_000;
     foreach(i, inout x; data) {
         x = sin(cast(double)i/data.length);
         //x = 1;
     }
     int count = 0;
     int gcount = 0;
     while(1) {
         // simulate reading a few kb of data
         double[] incoming;
         incoming.length = 1000 + rand() % 5000;
         foreach(i, inout x; incoming) {
             x = sin(cast(double)i/incoming.length);
             //x = 5;
         }
         // do something with the data...

         // print status message every so often
         count += incoming.length;
         if (count > 1_000_000) {
             count = 0;
             gcount++;
             writefln("%s processed", gcount);
         }
     }
}



This one uses doubles instead of uints and the data is the sin of some 
number.  These are _very_ realistic values for numeric data to have. 
The same effect can be seen.  Instead of hovering around 40MB, the 
memory use grows and grows and performance slows and slows.

This seems to be a very big issue.  The GC seems to be pretty much 
useless right now if you're going to have a lot of floating point data 
in your app.

--bb

Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks, 
 I've unfortunately had to come to the conclusion that the D GC is not 
 ready for production use. The problem is what I'd call "spurious 
 pointers". That is random data (strings, numbers, image data, audio or 
 whatever) appearing to the GC to be full of pointers to all over the 
 memory space.
 
 Consider this simple program. It is designed to have a memory footprint 
 of about 20 mb and then continuously process data.

Jan 08 2007

%u <dusr yahoo.com> writes:

== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Here's a slightly less contrived version of Oskar's gc test.
 import std.math;
 import std.random;
 import std.stdio;
 void main() {
      // The real memory use, ~40 mb
      double[] data;
      data.length = 5_000_000;
      foreach(i, inout x; data) {
          x = sin(cast(double)i/data.length);
          //x = 1;
      }
      int count = 0;
      int gcount = 0;
      while(1) {
          // simulate reading a few kb of data
          double[] incoming;
          incoming.length = 1000 + rand() % 5000;
          foreach(i, inout x; incoming) {
              x = sin(cast(double)i/incoming.length);
              //x = 5;
          }
          // do something with the data...
          // print status message every so often
          count += incoming.length;
          if (count > 1_000_000) {
              count = 0;
              gcount++;
              writefln("%s processed", gcount);
          }
      }
 }
 This one uses doubles instead of uints and the data is the sin of some
 number.  These are _very_ realistic values for numeric data to have.
 The same effect can be seen.  Instead of hovering around 40MB, the
 memory use grows and grows and performance slows and slows.
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating point data
 in your app.
 --bb
 Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks,
 I've unfortunately had to come to the conclusion that the D GC is not
 ready for production use. The problem is what I'd call "spurious
 pointers". That is random data (strings, numbers, image data, audio or
 whatever) appearing to the GC to be full of pointers to all over the
 memory space.

 Consider this simple program. It is designed to have a memory footprint
 of about 20 mb and then continuously process data.


Agreed. This needs to be changed. Is the GC in that tango
library any better?

Jan 08 2007

Sean Kelly <sean f4.ca> writes:

%u wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Here's a slightly less contrived version of Oskar's gc test.
 import std.math;
 import std.random;
 import std.stdio;
 void main() {
      // The real memory use, ~40 mb
      double[] data;
      data.length = 5_000_000;
      foreach(i, inout x; data) {
          x = sin(cast(double)i/data.length);
          //x = 1;
      }
      int count = 0;
      int gcount = 0;
      while(1) {
          // simulate reading a few kb of data
          double[] incoming;
          incoming.length = 1000 + rand() % 5000;
          foreach(i, inout x; incoming) {
              x = sin(cast(double)i/incoming.length);
              //x = 5;
          }
          // do something with the data...
          // print status message every so often
          count += incoming.length;
          if (count > 1_000_000) {
              count = 0;
              gcount++;
              writefln("%s processed", gcount);
          }
      }
 }
 This one uses doubles instead of uints and the data is the sin of some
 number.  These are _very_ realistic values for numeric data to have.
 The same effect can be seen.  Instead of hovering around 40MB, the
 memory use grows and grows and performance slows and slows.
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating point data
 in your app.
 --bb
 Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks,
 I've unfortunately had to come to the conclusion that the D GC is not
 ready for production use. The problem is what I'd call "spurious
 pointers". That is random data (strings, numbers, image data, audio or
 whatever) appearing to the GC to be full of pointers to all over the
 memory space.

 Consider this simple program. It is designed to have a memory footprint
 of about 20 mb and then continuously process data.


 
 Agreed. This needs to be changed. Is the GC in that tango
 library any better?

It's a modified version of the DMD GC.  The "don't scan blocks 
containing elements smaller than pointer size" feature is built-in, and 
there is user-level control of that behavior on a per-block basis, among 
other things.  But it's still the same old mark/sweep GC at heart.


Sean

Jan 08 2007

=?ISO-8859-1?Q?Lu=EDs_Marques?= <luismarques gmail.com> writes:

Sean Kelly wrote:
 It's a modified version of the DMD GC.  The "don't scan blocks 
 containing elements smaller than pointer size" feature is built-in, and 
 there is user-level control of that behavior on a per-block basis, among 
 other things.  But it's still the same old mark/sweep GC at heart.

Does the new GC allow setting a hook to be informed of when a given 
object was collected? (I need that)

Lu�s

Jan 09 2007

Lutger <lutger.blijdestijn gmail.com> writes:

Lu�s Marques wrote:
 Sean Kelly wrote:
 It's a modified version of the DMD GC.  The "don't scan blocks 
 containing elements smaller than pointer size" feature is built-in, 
 and there is user-level control of that behavior on a per-block basis, 
 among other things.  But it's still the same old mark/sweep GC at heart.

 
 Does the new GC allow setting a hook to be informed of when a given 
 object was collected? (I need that)
 
 Lu�s

This is possible with these functions in Object:
final void notifyRegister(void delegate(Object) dg);
final void notifyUnRegister(void delegate(Object) dg);

Jan 09 2007

Sean Kelly <sean f4.ca> writes:

Lutger wrote:
 Lu�s Marques wrote:
 Sean Kelly wrote:
 It's a modified version of the DMD GC.  The "don't scan blocks 
 containing elements smaller than pointer size" feature is built-in, 
 and there is user-level control of that behavior on a per-block 
 basis, among other things.  But it's still the same old mark/sweep GC 
 at heart.

 Does the new GC allow setting a hook to be informed of when a given 
 object was collected? (I need that)

 
 This is possible with these functions in Object:
 final void notifyRegister(void delegate(Object) dg);
 final void notifyUnRegister(void delegate(Object) dg);

This option is available.  There is also a global hook that can be 
called when an object is collected by the GC (as opposed to destroyed 
deterministically via delete):

     bool myCollectHandler( Object o )
     {
         if( o.classinfo.name == "MyObject" )
         {
             (cast(MyObject) o).dispose();
             return false;
         }
         return true;
     }

     setCollectHandler( &myCollectHandler );

Returning false from the hook tells the GC not to finalize the 
object--ie. destroy its monitor but don't call its dtor.  The purpose of 
this method is twofold: to detect when memory is "leaked" and to allow 
objects to be cleaned up differently if disposed via delete than via a 
GC collection.


Sean

Jan 09 2007

Sean Kelly <sean f4.ca> writes:

Bill Baxter wrote:
 Here's a slightly less contrived version of Oskar's gc test.
 
 import std.math;
 import std.random;
 import std.stdio;
 
 void main() {
     // The real memory use, ~40 mb
     double[] data;
     data.length = 5_000_000;
     foreach(i, inout x; data) {
         x = sin(cast(double)i/data.length);
         //x = 1;
     }
     int count = 0;
     int gcount = 0;
     while(1) {
         // simulate reading a few kb of data
         double[] incoming;
         incoming.length = 1000 + rand() % 5000;
         foreach(i, inout x; incoming) {
             x = sin(cast(double)i/incoming.length);
             //x = 5;
         }
         // do something with the data...
 
         // print status message every so often
         count += incoming.length;
         if (count > 1_000_000) {
             count = 0;
             gcount++;
             writefln("%s processed", gcount);
         }
     }
 }
 
 
 
 This one uses doubles instead of uints and the data is the sin of some 
 number.  These are _very_ realistic values for numeric data to have. The 
 same effect can be seen.  Instead of hovering around 40MB, the memory 
 use grows and grows and performance slows and slows.
 
 This seems to be a very big issue.  The GC seems to be pretty much 
 useless right now if you're going to have a lot of floating point data 
 in your app.

For what it's worth, I ran the test above with the modified GC in Tango, 
for 10000 iterations of the "while(1)" loop.  The default behavior 
roughly matched Phobos, with an 89 second run time and over 340MB of 
memory consumed and growing steadily.  Then I told the GC to not scan 
the arrays using the following calls:

     gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
     gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );

A test with these changes in place dropped the run time to 7 seconds 
with 43MB of memory consumed and not growing.

I grant that this isn't quite as nice as if D just figured out whether 
to scan the block using TypeInfo, but at least it grants the programmer 
a way to customize GC behavior somewhat to tune application performance. 
  The only stipulation with the current implementation is that block 
attributes will not be preserved if an array is resized.  This isn't 
terribly difficult to fix, but I haven't done so yet.


Sean

Jan 09 2007

"Ralf Schneider" <ralfs72_at_ gmx.net> writes:

 For what it's worth, I ran the test above with the modified GC in Tango, 
 for 10000 iterations of the "while(1)" loop.  The default behavior roughly 
 matched Phobos, with an 89 second run time and over 340MB of memory 
 consumed and growing steadily.  Then I told the GC to not scan the arrays 
 using the following calls:

     gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
     gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );

 A test with these changes in place dropped the run time to 7 seconds with 
 43MB of memory consumed and not growing.

 I grant that this isn't quite as nice as if D just figured out whether to 
 scan the block using TypeInfo, but at least it grants the programmer a way 
 to customize GC behavior somewhat to tune application performance. The 
 only stipulation with the current implementation is that block attributes 
 will not be preserved if an array is resized.  This isn't terribly 
 difficult to fix, but I haven't done so yet.

It dosen't seem so hard for me to let the compiler set such an attribute on 
arrays without pointers...

- Ralf

Jan 09 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Ralf Schneider wrote:
 It dosen't seem so hard for me to let the compiler set such an attribute on 
 arrays without pointers...

It's pretty hard if you can't modify the compiler ;).
Sean can't modify what DMD does (at least, not directly). He _can_ 
(directly) modify what the runtime library does by replacing it, which 
is what he's done.
He (or anyone else, for that matter) might be able to implement this in 
GDC though...
And Walter might be convinced to implement it in DMD (or, if it's 
front-end only code, accept a patch that implements it).

Of course, that still leaves arrays of structs, which may contain both 
pointers and non-pointers.
What we really need is a way for the GC to know what the type of the 
memory is, or at least where the pointers are. This may be possible by 
adding this info to TypeInfo & subclasses[1]. But then every memory 
block would need a pointer to the relevant TypeInfo (or some condensed 
form of this information, like flags for "only pointers" and "no 
pointers", with TypeInfo pointer only if both are false).
This would definitely require some sort of compiler support though; it 
would need to generate appropriate type information for structs, objects 
(the actual memory, not the references) and unions.
It would then need to supply this information to the GC in the runtime, 
requiring extra code generation.


[1]: This could be as simple as supplying a member function that 
performs a callback for every offset containing a pointer.

Jan 09 2007

Kyle Furlong <kylefurlong gmail.com> writes:

Frits van Bommel wrote:
 Ralf Schneider wrote:
 It dosen't seem so hard for me to let the compiler set such an 
 attribute on arrays without pointers...

 
 It's pretty hard if you can't modify the compiler ;).
 Sean can't modify what DMD does (at least, not directly). He _can_ 
 (directly) modify what the runtime library does by replacing it, which 
 is what he's done.
 He (or anyone else, for that matter) might be able to implement this in 
 GDC though...
 And Walter might be convinced to implement it in DMD (or, if it's 
 front-end only code, accept a patch that implements it).
 
 Of course, that still leaves arrays of structs, which may contain both 
 pointers and non-pointers.
 What we really need is a way for the GC to know what the type of the 
 memory is, or at least where the pointers are. This may be possible by 
 adding this info to TypeInfo & subclasses[1]. But then every memory 
 block would need a pointer to the relevant TypeInfo (or some condensed 
 form of this information, like flags for "only pointers" and "no 
 pointers", with TypeInfo pointer only if both are false).
 This would definitely require some sort of compiler support though; it 
 would need to generate appropriate type information for structs, objects 
 (the actual memory, not the references) and unions.
 It would then need to supply this information to the GC in the runtime, 
 requiring extra code generation.
 
 
 [1]: This could be as simple as supplying a member function that 
 performs a callback for every offset containing a pointer.

According to a conversation I had with Walter a month or two ago, this 
is the direction we are going. With this sort of type information, 
other, more modern GC implementations will become possible, one of which 
I hope to write. :D

Jan 09 2007

Sean Kelly <sean f4.ca> writes:

Frits van Bommel wrote:
 Ralf Schneider wrote:
 It dosen't seem so hard for me to let the compiler set such an 
 attribute on arrays without pointers...

 
 It's pretty hard if you can't modify the compiler ;).
 Sean can't modify what DMD does (at least, not directly). He _can_ 
 (directly) modify what the runtime library does by replacing it, which 
 is what he's done.
 He (or anyone else, for that matter) might be able to implement this in 
 GDC though...
 And Walter might be convinced to implement it in DMD (or, if it's 
 front-end only code, accept a patch that implements it).
 
 Of course, that still leaves arrays of structs, which may contain both 
 pointers and non-pointers.
 What we really need is a way for the GC to know what the type of the 
 memory is, or at least where the pointers are. This may be possible by 
 adding this info to TypeInfo & subclasses[1]. But then every memory 
 block would need a pointer to the relevant TypeInfo (or some condensed 
 form of this information, like flags for "only pointers" and "no 
 pointers", with TypeInfo pointer only if both are false).
 This would definitely require some sort of compiler support though; it 
 would need to generate appropriate type information for structs, objects 
 (the actual memory, not the references) and unions.
 It would then need to supply this information to the GC in the runtime, 
 requiring extra code generation.

An interim alternative that came up in conversation would be to provide 
similar functionality via template functions.  With the .tupleof 
property, it's possible to obtain a very accurate picture of what data 
should be scanned.  This would mean using a custom function instead of 
'new', but it would certainly work:

     MyClass c = create!(MyClass)( a, b, c );

The create function above could be written using TypeTuples and other 
magic to allow direct parameter passing to the class ctor, making the 
routine just as efficient as an in-language solution.  Similar functions 
could be written for arrays, exploiting some tricks in the language:

     T[] resize(T)( T[] val, size_t len ) { ... }
     int[] buf;
     buf.resize( 1024 );

Again, perhaps not as clean as an in-language solution, but it could be 
done today.

I'll look into doing something like this for Tango prior to release.  It 
shouldn't be too difficult, assuming the .tupleof property works as 
described (some experimentation I made this morning suggests that it may 
not).


Sean

Jan 09 2007

Sean Kelly <sean f4.ca> writes:

Sean Kelly wrote:
 Similar functions 
 could be written for arrays, exploiting some tricks in the language:
 
     T[] resize(T)( T[] val, size_t len ) { ... }

Err, this should be:

     T[] resize(T)( inout T[] val, size_t len ) { ... }

Sean

Jan 09 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Sean Kelly wrote:
 An interim alternative that came up in conversation would be to provide 
 similar functionality via template functions.  With the .tupleof 
 property, it's possible to obtain a very accurate picture of what data 
 should be scanned.  This would mean using a custom function instead of 
 'new', but it would certainly work:

I actually tried something like that a while back. Unfortunately DMD 
kept segfaulting on me...
Just tried to compile that code again, still segfaults. I should 
probably try cutting it down to something bugzilla-worthy.

Jan 09 2007

zz <zz zz.com> writes:

Sean Kelly wrote:
 I'll look into doing something like this for Tango prior to release.  It 
 shouldn't be too difficult, assuming the .tupleof property works as 
 described (some experimentation I made this morning suggests that it may 
 not).
 
 
 Sean

Some years ago we had some comparision between two PostScript (Not GS) 
Interpreters without rendering to disk but generating loads of data 
Harlequin RIP beat the other by large factors which we found out were 
due to their garbage collector besides other things, this group also 
built the GC for LispWorks and is now available open source, I don't 
know how it's licenced but I know it was also used in Dylan from 
Halequin (for those of you who remember the language).

Maybe you should look at some of their ideas, the link is bellow.

http://www.ravenbrook.com/project/mps/

Zz

Jan 09 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Sean Kelly wrote:

 This seems to be a very big issue.  The GC seems to be pretty much 
 useless right now if you're going to have a lot of floating point data 
 in your app.

 
 For what it's worth, I ran the test above with the modified GC in Tango, 
 for 10000 iterations of the "while(1)" loop.  The default behavior 
 roughly matched Phobos, with an 89 second run time and over 340MB of 
 memory consumed and growing steadily.  Then I told the GC to not scan 
 the arrays using the following calls:
 
     gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
     gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );
 
 A test with these changes in place dropped the run time to 7 seconds 
 with 43MB of memory consumed and not growing.
 
 I grant that this isn't quite as nice as if D just figured out whether 
 to scan the block using TypeInfo, but at least it grants the programmer 
 a way to customize GC behavior somewhat to tune application performance. 
  The only stipulation with the current implementation is that block 
 attributes will not be preserved if an array is resized.  This isn't 
 terribly difficult to fix, but I haven't done so yet.

I'd certainly be willing to do that sort of thing in my code to make the 
problem go away in the short term.  In my case all of those setAttr 
calls would be hidden away in library code, so it wouldn't make life any 
harder for everyday use.

But it's not much use as a workaround until Tango is actually 
downloadable somewhere.  :-)

Once Tango is available, though, that would certainly be better than the 
alternative of rewriting my library to use malloc/free.  If that's going 
to be necessary then I think I'll just as soon go back to C++ where at 

using D in the first place, and if I can't realistically use that then I 
might as well be using C++.

--bb

Jan 09 2007

%u <duser hi.com> writes:

== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating point data
 in your app.

 For what it's worth, I ran the test above with the modified GC in Tango,
 for 10000 iterations of the "while(1)" loop.  The default behavior
 roughly matched Phobos, with an 89 second run time and over 340MB of
 memory consumed and growing steadily.  Then I told the GC to not scan
 the arrays using the following calls:

     gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
     gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );

 A test with these changes in place dropped the run time to 7 seconds
 with 43MB of memory consumed and not growing.

 I grant that this isn't quite as nice as if D just figured out whether
 to scan the block using TypeInfo, but at least it grants the programmer
 a way to customize GC behavior somewhat to tune application performance.
  The only stipulation with the current implementation is that block
 attributes will not be preserved if an array is resized.  This isn't
 terribly difficult to fix, but I haven't done so yet.

 I'd certainly be willing to do that sort of thing in my code to make the
 problem go away in the short term.  In my case all of those setAttr
 calls would be hidden away in library code, so it wouldn't make life any
 harder for everyday use.
 But it's not much use as a workaround until Tango is actually
 downloadable somewhere.  :-)
 Once Tango is available, though, that would certainly be better than the
 alternative of rewriting my library to use malloc/free.  If that's going
 to be necessary then I think I'll just as soon go back to C++ where at

 using D in the first place, and if I can't realistically use that then I
 might as well be using C++.
 --bb

I feel the same way. Without GC, D is just C++ with a few more features.

Jan 09 2007

Kyle Furlong <kylefurlong gmail.com> writes:

%u wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating point data
 in your app.

 For what it's worth, I ran the test above with the modified GC in Tango,
 for 10000 iterations of the "while(1)" loop.  The default behavior
 roughly matched Phobos, with an 89 second run time and over 340MB of
 memory consumed and growing steadily.  Then I told the GC to not scan
 the arrays using the following calls:

     gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
     gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );

 A test with these changes in place dropped the run time to 7 seconds
 with 43MB of memory consumed and not growing.

 I grant that this isn't quite as nice as if D just figured out whether
 to scan the block using TypeInfo, but at least it grants the programmer
 a way to customize GC behavior somewhat to tune application performance.
  The only stipulation with the current implementation is that block
 attributes will not be preserved if an array is resized.  This isn't
 terribly difficult to fix, but I haven't done so yet.

 I'd certainly be willing to do that sort of thing in my code to make the
 problem go away in the short term.  In my case all of those setAttr
 calls would be hidden away in library code, so it wouldn't make life any
 harder for everyday use.
 But it's not much use as a workaround until Tango is actually
 downloadable somewhere.  :-)
 Once Tango is available, though, that would certainly be better than the
 alternative of rewriting my library to use malloc/free.  If that's going
 to be necessary then I think I'll just as soon go back to C++ where at

 using D in the first place, and if I can't realistically use that then I
 might as well be using C++.
 --bb

 
 I feel the same way. Without GC, D is just C++ with a few more features.

I do not think that this is even remotely valid. Think of everything you 
arent doing in D. You arent hacking macros with the preprocessor, you 
arent declaring methods outside of classes, globally overloading 
operators... To say that D is C++ + GC + some other features is just 
plain ignorant.

Jan 10 2007

%u <duser hotmail.com> writes:

== Quote from Kyle Furlong (kylefurlong gmail.com)'s article
 %u wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating point data
 in your app.

 For what it's worth, I ran the test above with the modified GC in Tango,
 for 10000 iterations of the "while(1)" loop.  The default behavior
 roughly matched Phobos, with an 89 second run time and over 340MB of
 memory consumed and growing steadily.  Then I told the GC to not scan
 the arrays using the following calls:

     gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
     gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );

 A test with these changes in place dropped the run time to 7 seconds
 with 43MB of memory consumed and not growing.

 I grant that this isn't quite as nice as if D just figured out whether
 to scan the block using TypeInfo, but at least it grants the programmer
 a way to customize GC behavior somewhat to tune application performance.
  The only stipulation with the current implementation is that block
 attributes will not be preserved if an array is resized.  This isn't
 terribly difficult to fix, but I haven't done so yet.

 I'd certainly be willing to do that sort of thing in my code to make the
 problem go away in the short term.  In my case all of those setAttr
 calls would be hidden away in library code, so it wouldn't make life any
 harder for everyday use.
 But it's not much use as a workaround until Tango is actually
 downloadable somewhere.  :-)
 Once Tango is available, though, that would certainly be better than the
 alternative of rewriting my library to use malloc/free.  If that's going
 to be necessary then I think I'll just as soon go back to C++ where at

 using D in the first place, and if I can't realistically use that then I
 might as well be using C++.
 --bb

 I feel the same way. Without GC, D is just C++ with a few more features.

 I do not think that this is even remotely valid. Think of everything you
 arent doing in D. You arent hacking macros with the preprocessor, you
 arent declaring methods outside of classes, globally overloading
 operators... To say that D is C++ + GC + some other features is just
 plain ignorant.

It's not ignorant from my viewpoint. You're assuming I care about the features
you
described.

Jan 10 2007

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"%u" <duser hotmail.com> wrote in message 
news:eo324t$2ui5$1 digitaldaemon.com...
 It's not ignorant from my viewpoint. You're assuming I care about the 
 features you
 described.

A bit OT, but would you please get a newsreader so you can stop being called 
"%u"?

Jan 10 2007

Kyle Furlong <kylefurlong gmail.com> writes:

%u wrote:
 == Quote from Kyle Furlong (kylefurlong gmail.com)'s article
 %u wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating point data
 in your app.

 For what it's worth, I ran the test above with the modified GC in Tango,
 for 10000 iterations of the "while(1)" loop.  The default behavior
 roughly matched Phobos, with an 89 second run time and over 340MB of
 memory consumed and growing steadily.  Then I told the GC to not scan
 the arrays using the following calls:

     gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
     gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );

 A test with these changes in place dropped the run time to 7 seconds
 with 43MB of memory consumed and not growing.

 I grant that this isn't quite as nice as if D just figured out whether
 to scan the block using TypeInfo, but at least it grants the programmer
 a way to customize GC behavior somewhat to tune application performance.
  The only stipulation with the current implementation is that block
 attributes will not be preserved if an array is resized.  This isn't
 terribly difficult to fix, but I haven't done so yet.

 I'd certainly be willing to do that sort of thing in my code to make the
 problem go away in the short term.  In my case all of those setAttr
 calls would be hidden away in library code, so it wouldn't make life any
 harder for everyday use.
 But it's not much use as a workaround until Tango is actually
 downloadable somewhere.  :-)
 Once Tango is available, though, that would certainly be better than the
 alternative of rewriting my library to use malloc/free.  If that's going
 to be necessary then I think I'll just as soon go back to C++ where at

 using D in the first place, and if I can't realistically use that then I
 might as well be using C++.
 --bb

 I feel the same way. Without GC, D is just C++ with a few more features.

 I do not think that this is even remotely valid. Think of everything you
 arent doing in D. You arent hacking macros with the preprocessor, you
 arent declaring methods outside of classes, globally overloading
 operators... To say that D is C++ + GC + some other features is just
 plain ignorant.

 
 It's not ignorant from my viewpoint. You're assuming I care about the features
you
 described.

What I'm trying to say is, that to say a language is just a sum of its 
features is incorrect. There are behaviors that arise from the 
interaction of features that can be good or bad. In D, I've found that 
these interactions are on the whole good. In C++, I've found that these 
interactions are on the whole bad.

I'm not trying to say your experiences with both are invalid, maybe you 
have been using them in a way in which the opposite is true. But for me, 
C++ as a whole doesn't feel right, and D as a whole does.

Jan 10 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Kyle Furlong wrote:
 %u wrote:
 == Quote from Kyle Furlong (kylefurlong gmail.com)'s article
 %u wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating point 
 data
 in your app.

 For what it's worth, I ran the test above with the modified GC in 
 Tango,
 for 10000 iterations of the "while(1)" loop.  The default behavior
 roughly matched Phobos, with an 89 second run time and over 340MB of
 memory consumed and growing steadily.  Then I told the GC to not scan
 the arrays using the following calls:

     gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
     gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );

 A test with these changes in place dropped the run time to 7 seconds
 with 43MB of memory consumed and not growing.

 I grant that this isn't quite as nice as if D just figured out 
 whether
 to scan the block using TypeInfo, but at least it grants the 
 programmer
 a way to customize GC behavior somewhat to tune application 
 performance.
  The only stipulation with the current implementation is that block
 attributes will not be preserved if an array is resized.  This isn't
 terribly difficult to fix, but I haven't done so yet.

 I'd certainly be willing to do that sort of thing in my code to 
 make the
 problem go away in the short term.  In my case all of those setAttr
 calls would be hidden away in library code, so it wouldn't make 
 life any
 harder for everyday use.
 But it's not much use as a workaround until Tango is actually
 downloadable somewhere.  :-)
 Once Tango is available, though, that would certainly be better 
 than the
 alternative of rewriting my library to use malloc/free.  If that's 
 going
 to be necessary then I think I'll just as soon go back to C++ where at

 I'm
 using D in the first place, and if I can't realistically use that 
 then I
 might as well be using C++.
 --bb

 I feel the same way. Without GC, D is just C++ with a few more 
 features.

 I do not think that this is even remotely valid. Think of everything you
 arent doing in D. You arent hacking macros with the preprocessor, you
 arent declaring methods outside of classes, globally overloading
 operators... To say that D is C++ + GC + some other features is just
 plain ignorant.

 It's not ignorant from my viewpoint. You're assuming I care about the 
 features you
 described.

 
 What I'm trying to say is, that to say a language is just a sum of its 
 features is incorrect. There are behaviors that arise from the 
 interaction of features that can be good or bad. In D, I've found that 
 these interactions are on the whole good. In C++, I've found that these 
 interactions are on the whole bad.
 
 I'm not trying to say your experiences with both are invalid, maybe you 
 have been using them in a way in which the opposite is true. But for me, 
 C++ as a whole doesn't feel right, and D as a whole does.

Yes, that's certainly a valid point.  But I think maybe you're taking 
Mr. Percent a little too literally.

All I was trying to say originally was that, to me, D without garbage 
collection -- despite all the other nice features -- is not compelling 
enough to abandon C++.

Also I think Mr. Percent and I would both agree that whether or not the 
language "feels right" is less important than being able to get our work 
done in a timely manner.  I'm interested in D primarily because I 
believe it will help me get things done faster.  Right now, I think with 
GC and all the other nice D features, for the kinds of things I want to 
do, D and C++ are just about neck-and-neck.  What C++ lacks vs D, it 
makes up for with great debuggers and tools, and gobs of free libraries. 
  Take away the GC from D, and what's left is simply not enough to 
overcome the advantages of C++'s debuggers, tools, and libraries *for 
the kinds of things I want to do*.  Your mileage may vary.

--bb

Jan 10 2007

Kyle Furlong <kylefurlong gmail.com> writes:

Bill Baxter wrote:
 Kyle Furlong wrote:
 %u wrote:
 == Quote from Kyle Furlong (kylefurlong gmail.com)'s article
 %u wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 This seems to be a very big issue.  The GC seems to be pretty much
 useless right now if you're going to have a lot of floating 
 point data
 in your app.

 For what it's worth, I ran the test above with the modified GC in 
 Tango,
 for 10000 iterations of the "while(1)" loop.  The default behavior
 roughly matched Phobos, with an 89 second run time and over 340MB of
 memory consumed and growing steadily.  Then I told the GC to not 
 scan
 the arrays using the following calls:

     gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
     gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );

 A test with these changes in place dropped the run time to 7 seconds
 with 43MB of memory consumed and not growing.

 I grant that this isn't quite as nice as if D just figured out 
 whether
 to scan the block using TypeInfo, but at least it grants the 
 programmer
 a way to customize GC behavior somewhat to tune application 
 performance.
  The only stipulation with the current implementation is that block
 attributes will not be preserved if an array is resized.  This isn't
 terribly difficult to fix, but I haven't done so yet.

 I'd certainly be willing to do that sort of thing in my code to 
 make the
 problem go away in the short term.  In my case all of those setAttr
 calls would be hidden away in library code, so it wouldn't make 
 life any
 harder for everyday use.
 But it's not much use as a workaround until Tango is actually
 downloadable somewhere.  :-)
 Once Tango is available, though, that would certainly be better 
 than the
 alternative of rewriting my library to use malloc/free.  If that's 
 going
 to be necessary then I think I'll just as soon go back to C++ 
 where at

 reason I'm
 using D in the first place, and if I can't realistically use that 
 then I
 might as well be using C++.
 --bb

 I feel the same way. Without GC, D is just C++ with a few more 
 features.

 I do not think that this is even remotely valid. Think of everything 
 you
 arent doing in D. You arent hacking macros with the preprocessor, you
 arent declaring methods outside of classes, globally overloading
 operators... To say that D is C++ + GC + some other features is just
 plain ignorant.

 It's not ignorant from my viewpoint. You're assuming I care about the 
 features you
 described.

 What I'm trying to say is, that to say a language is just a sum of its 
 features is incorrect. There are behaviors that arise from the 
 interaction of features that can be good or bad. In D, I've found that 
 these interactions are on the whole good. In C++, I've found that 
 these interactions are on the whole bad.

 I'm not trying to say your experiences with both are invalid, maybe 
 you have been using them in a way in which the opposite is true. But 
 for me, C++ as a whole doesn't feel right, and D as a whole does.

 
 Yes, that's certainly a valid point.  But I think maybe you're taking 
 Mr. Percent a little too literally.
 
 All I was trying to say originally was that, to me, D without garbage 
 collection -- despite all the other nice features -- is not compelling 
 enough to abandon C++.
 
 Also I think Mr. Percent and I would both agree that whether or not the 
 language "feels right" is less important than being able to get our work 
 done in a timely manner.  I'm interested in D primarily because I 
 believe it will help me get things done faster.  Right now, I think with 
 GC and all the other nice D features, for the kinds of things I want to 
 do, D and C++ are just about neck-and-neck.  What C++ lacks vs D, it 
 makes up for with great debuggers and tools, and gobs of free libraries. 
  Take away the GC from D, and what's left is simply not enough to 
 overcome the advantages of C++'s debuggers, tools, and libraries *for 
 the kinds of things I want to do*.  Your mileage may vary.
 
 --bb

I agree. This is only good news for D, however, because libraries and 
tools come with time. D is still young.

Jan 10 2007

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks, 
 I've unfortunately had to come to the conclusion that the D GC is not 
 ready for production use. The problem is what I'd call "spurious 
 pointers". That is random data (strings, numbers, image data, audio or 
 whatever) appearing to the GC to be full of pointers to all over the 
 memory space.

I feel kind of bad for making it sound like this is a problem related 
specifically to the D garbage collector. It is rather a general and 
apparently well known problem of all conservative garbage collectors. 
The D garbage collector is still excellent for a large domain of problems.

Lots of people seem to be having similar problems though, so a better 
understanding of under what conditions a conservative garbage collector 
will and will not work seems to be called for.

I'd guess that a lot of problems we have with the conservative GC is 
because we are at the end of the life time of the 32-bit address space. 
As time has passed, our programs have used a larger and larger portion 
of the available memory space. As one anonymous %u said, one solution 
would be "increasing the amount of memory needed for pointers by the 
factor two", which I interpret as moving to a 64-bit memory space.

I did some calculations to better understand what is going on. I've 
probably done several errors though.

<theoretical rambling>
The probability of one single uniformly random spurious pointer 
preventing the collection of an object of size s is

p = s/B,

with B being the size of the address space (2^b for b = 32 or 64 bits). 
On a 64 bit architecture, the probability is obviously drastically reduced.

if g is the amount (bytes) of independent random data an application 
needs, the risk of an object of size s not being collected is

P(g) = 1 - (1 - s/B) ^ (g/(b/8))

As one can see, the risk increases considerably as the objects gets 
bigger. Increasing the amount of random data the application contains, 
reduces the size of objects the GC will be able to handle satisfactory.

My example was kind of nasty in that it not only accumulated additional 
random garbage in each iteration, but also caused heap fragmentation. A 
kinder example would always create objects of the same size. The simpler 
example would be:

* start with a static set of g bytes of random data
* for each iteration create n objects of size s (for simplicity, let 
each object contain s bytes of random data)
* after each iteration, none of the n objects are used anymore. This is 
a good time to call gc.fullCollect();

After each such iteration, i, P_i*n_i objects will remain uncollected, 
and will be added to the pool of random spurious pointers. I disregard 
the small portion of objects that are uncollected because of spurious 
pointers appearing in other objects left uncollected in the same 
iteration. If this portion would be significant, you'd really have a 
problem anyway and this will show up in later iterations.

g_{i+1} ~ g_i + P_i*n_i*s.

I generously assume that the safe memory area of the collected objects 
are reused by the allocator in the next iteration. A few of those will 
become unsafe by the recently added spurious pointers from uncollected 
objects, but for the most part, those memory areas are now safe, and the 
objects allocated/collected from them can be disregarded. The number of 
objects that need to be allocated in unsafe memory areas for the next 
iteration becomes:

n_{i+1} = P_i*n_i + P(P_i*n_i*s)*(n_0 - n_i)

This will of course not perfectly predict the real behavior, as it 
relaxes the problem away from discrete units, but should still 
adequately model the sample application. Using this, I tried to find the 
breaking point, i.e. at what size objects need to be explicitly freed 
instead of left to the GC.

"Verification":
Rewriting my original example to to a std.gc.fullCollect() every 10th 
iteration, giving the following parameters:
g = 20 MB
n = 10

the model suggests allocating objects of size:
s = 2000 would result in almost no space overhead, stable at 20 mb
s = 3000 would result in a slight but stable space overhead,
s = 4000 would cause a run-away memory usage

running the program results in:
s = 2000 memory usage is stable at 20 mb
s = 3000 results in a small apparently unbounded memory leak
s = 4000 results in a unbounded memory leak

The model appears to be at least in the correct ballpark for this sample.

Theoretical results:
Those tables show the maximum object size the GC will be able to handle 
with different static amounts of "random" data. By "being able to 
handle", I mean that the application doesn't use more than 2x the 
required amount of memory. The exact breaking point is very unstable and 
going above it rapidly results in uncontrolled memory consumption.

32 bit arch, 100 objects

1 MB data	8000 bytes
5 MB data	4500 bytes
10 MB data	3000 bytes
20 MB data	2000 bytes
100 MB data	 700 bytes
500 MB data	 200 bytes
1000 MB data	 100 bytes

32 bit arch, 1000 objects

1 MB data	3400 bytes
5 MB data	2200 bytes
10 MB data	1700 bytes
20 MB data	1200 bytes
100 MB data	 500 bytes
500 MB data	 150 bytes
1000 MB data	 100 bytes

32 bit arch, 10000 objects
1 MB data	1300 bytes
5 MB data	1000 bytes
10 MB data	 800 bytes
20 MB data	 600 bytes
100 MB data	 300 bytes
500 MB data	 100 bytes
1000 MB data	  75 bytes

Those figures need to be taken with a couple of grains of salt, but 
should give a indication of at what object size one needs to manually 
handle object lifetimes.

As a comparison -- the 64 bit haven:

64 bit arch, 100 objects
2 GB data	1.5 GB
100 GB data	600 MB
1 TB data	200 MB

64 bit arch, 1000 objects
2 GB data	350 MB
100 GB data	250 MB
1 TB data	150 MB

64 bit arch, 10000 objects
2 GB data	100 MB
100 GB data	 75 MB
1 TB data	 50 MB
</theoretical rambling>

Summary:

As far as I can see, what I have to do to avoid memory leaks with a 
conservative GC, is one of the following:

1. move to a 64 bit architecture
2. manually handle all objects larger than a few hundred bytes (see above)
3. hide all non pointer data from the GC


as automatic ref-counting. Not having automatic ref-counting also 
prevents neat solutions such as transparent CoW, and automatic handling 
of scarce resources.

If I wouldn't have a strong belief that automatic ref-counting would be 
addressed soon, I'd definitely consider going back to C++. Luckily, 
before I'd give up waiting, 64 bit architectures will probably be in 
great majority. ;)

/Oskar

Jan 09 2007

Kyle Furlong <kylefurlong gmail.com> writes:

Oskar Linde wrote:
 Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks, 
 I've unfortunately had to come to the conclusion that the D GC is not 
 ready for production use. The problem is what I'd call "spurious 
 pointers". That is random data (strings, numbers, image data, audio or 
 whatever) appearing to the GC to be full of pointers to all over the 
 memory space.

 
 I feel kind of bad for making it sound like this is a problem related 
 specifically to the D garbage collector. It is rather a general and 
 apparently well known problem of all conservative garbage collectors. 
 The D garbage collector is still excellent for a large domain of problems.
 
 Lots of people seem to be having similar problems though, so a better 
 understanding of under what conditions a conservative garbage collector 
 will and will not work seems to be called for.
 
 I'd guess that a lot of problems we have with the conservative GC is 
 because we are at the end of the life time of the 32-bit address space. 
 As time has passed, our programs have used a larger and larger portion 
 of the available memory space. As one anonymous %u said, one solution 
 would be "increasing the amount of memory needed for pointers by the 
 factor two", which I interpret as moving to a 64-bit memory space.
 
 I did some calculations to better understand what is going on. I've 
 probably done several errors though.
 
 <theoretical rambling>
 The probability of one single uniformly random spurious pointer 
 preventing the collection of an object of size s is
 
 p = s/B,
 
 with B being the size of the address space (2^b for b = 32 or 64 bits). 
 On a 64 bit architecture, the probability is obviously drastically reduced.
 
 if g is the amount (bytes) of independent random data an application 
 needs, the risk of an object of size s not being collected is
 
 P(g) = 1 - (1 - s/B) ^ (g/(b/8))
 
 As one can see, the risk increases considerably as the objects gets 
 bigger. Increasing the amount of random data the application contains, 
 reduces the size of objects the GC will be able to handle satisfactory.
 
 My example was kind of nasty in that it not only accumulated additional 
 random garbage in each iteration, but also caused heap fragmentation. A 
 kinder example would always create objects of the same size. The simpler 
 example would be:
 
 * start with a static set of g bytes of random data
 * for each iteration create n objects of size s (for simplicity, let 
 each object contain s bytes of random data)
 * after each iteration, none of the n objects are used anymore. This is 
 a good time to call gc.fullCollect();
 
 After each such iteration, i, P_i*n_i objects will remain uncollected, 
 and will be added to the pool of random spurious pointers. I disregard 
 the small portion of objects that are uncollected because of spurious 
 pointers appearing in other objects left uncollected in the same 
 iteration. If this portion would be significant, you'd really have a 
 problem anyway and this will show up in later iterations.
 
 g_{i+1} ~ g_i + P_i*n_i*s.
 
 I generously assume that the safe memory area of the collected objects 
 are reused by the allocator in the next iteration. A few of those will 
 become unsafe by the recently added spurious pointers from uncollected 
 objects, but for the most part, those memory areas are now safe, and the 
 objects allocated/collected from them can be disregarded. The number of 
 objects that need to be allocated in unsafe memory areas for the next 
 iteration becomes:
 
 n_{i+1} = P_i*n_i + P(P_i*n_i*s)*(n_0 - n_i)
 
 This will of course not perfectly predict the real behavior, as it 
 relaxes the problem away from discrete units, but should still 
 adequately model the sample application. Using this, I tried to find the 
 breaking point, i.e. at what size objects need to be explicitly freed 
 instead of left to the GC.
 
 "Verification":
 Rewriting my original example to to a std.gc.fullCollect() every 10th 
 iteration, giving the following parameters:
 g = 20 MB
 n = 10
 
 the model suggests allocating objects of size:
 s = 2000 would result in almost no space overhead, stable at 20 mb
 s = 3000 would result in a slight but stable space overhead,
 s = 4000 would cause a run-away memory usage
 
 running the program results in:
 s = 2000 memory usage is stable at 20 mb
 s = 3000 results in a small apparently unbounded memory leak
 s = 4000 results in a unbounded memory leak
 
 The model appears to be at least in the correct ballpark for this sample.
 
 Theoretical results:
 Those tables show the maximum object size the GC will be able to handle 
 with different static amounts of "random" data. By "being able to 
 handle", I mean that the application doesn't use more than 2x the 
 required amount of memory. The exact breaking point is very unstable and 
 going above it rapidly results in uncontrolled memory consumption.
 
 32 bit arch, 100 objects
 
 1 MB data    8000 bytes
 5 MB data    4500 bytes
 10 MB data    3000 bytes
 20 MB data    2000 bytes
 100 MB data     700 bytes
 500 MB data     200 bytes
 1000 MB data     100 bytes
 
 32 bit arch, 1000 objects
 
 1 MB data    3400 bytes
 5 MB data    2200 bytes
 10 MB data    1700 bytes
 20 MB data    1200 bytes
 100 MB data     500 bytes
 500 MB data     150 bytes
 1000 MB data     100 bytes
 
 32 bit arch, 10000 objects
 1 MB data    1300 bytes
 5 MB data    1000 bytes
 10 MB data     800 bytes
 20 MB data     600 bytes
 100 MB data     300 bytes
 500 MB data     100 bytes
 1000 MB data      75 bytes
 
 Those figures need to be taken with a couple of grains of salt, but 
 should give a indication of at what object size one needs to manually 
 handle object lifetimes.
 
 As a comparison -- the 64 bit haven:
 
 64 bit arch, 100 objects
 2 GB data    1.5 GB
 100 GB data    600 MB
 1 TB data    200 MB
 
 64 bit arch, 1000 objects
 2 GB data    350 MB
 100 GB data    250 MB
 1 TB data    150 MB
 
 64 bit arch, 10000 objects
 2 GB data    100 MB
 100 GB data     75 MB
 1 TB data     50 MB
 </theoretical rambling>
 
 Summary:
 
 As far as I can see, what I have to do to avoid memory leaks with a 
 conservative GC, is one of the following:
 
 1. move to a 64 bit architecture
 2. manually handle all objects larger than a few hundred bytes (see above)
 3. hide all non pointer data from the GC
 

 as automatic ref-counting. Not having automatic ref-counting also 
 prevents neat solutions such as transparent CoW, and automatic handling 
 of scarce resources.
 
 If I wouldn't have a strong belief that automatic ref-counting would be 
 addressed soon, I'd definitely consider going back to C++. Luckily, 
 before I'd give up waiting, 64 bit architectures will probably be in 
 great majority. ;)
 
 /Oskar

Dont hold your breath for a reference counting implementation from DMD. 
Walter doesnt want to add any overhead to assignments.

Jan 09 2007

Johan Granberg <lijat.meREM OVE.gmail.com> writes:

Kyle Furlong wrote:

 Oskar Linde wrote:
 Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks,
 I've unfortunately had to come to the conclusion that the D GC is not
 ready for production use. The problem is what I'd call "spurious
 pointers". That is random data (strings, numbers, image data, audio or
 whatever) appearing to the GC to be full of pointers to all over the
 memory space.

 
 I feel kind of bad for making it sound like this is a problem related
 specifically to the D garbage collector. It is rather a general and
 apparently well known problem of all conservative garbage collectors.
 The D garbage collector is still excellent for a large domain of
 problems.
 
 Lots of people seem to be having similar problems though, so a better
 understanding of under what conditions a conservative garbage collector
 will and will not work seems to be called for.
 
 I'd guess that a lot of problems we have with the conservative GC is
 because we are at the end of the life time of the 32-bit address space.
 As time has passed, our programs have used a larger and larger portion
 of the available memory space. As one anonymous %u said, one solution
 would be "increasing the amount of memory needed for pointers by the
 factor two", which I interpret as moving to a 64-bit memory space.
 
 I did some calculations to better understand what is going on. I've
 probably done several errors though.
 
 <theoretical rambling>
 The probability of one single uniformly random spurious pointer
 preventing the collection of an object of size s is
 
 p = s/B,
 
 with B being the size of the address space (2^b for b = 32 or 64 bits).
 On a 64 bit architecture, the probability is obviously drastically
 reduced.
 
 if g is the amount (bytes) of independent random data an application
 needs, the risk of an object of size s not being collected is
 
 P(g) = 1 - (1 - s/B) ^ (g/(b/8))
 
 As one can see, the risk increases considerably as the objects gets
 bigger. Increasing the amount of random data the application contains,
 reduces the size of objects the GC will be able to handle satisfactory.
 
 My example was kind of nasty in that it not only accumulated additional
 random garbage in each iteration, but also caused heap fragmentation. A
 kinder example would always create objects of the same size. The simpler
 example would be:
 
 * start with a static set of g bytes of random data
 * for each iteration create n objects of size s (for simplicity, let
 each object contain s bytes of random data)
 * after each iteration, none of the n objects are used anymore. This is
 a good time to call gc.fullCollect();
 
 After each such iteration, i, P_i*n_i objects will remain uncollected,
 and will be added to the pool of random spurious pointers. I disregard
 the small portion of objects that are uncollected because of spurious
 pointers appearing in other objects left uncollected in the same
 iteration. If this portion would be significant, you'd really have a
 problem anyway and this will show up in later iterations.
 
 g_{i+1} ~ g_i + P_i*n_i*s.
 
 I generously assume that the safe memory area of the collected objects
 are reused by the allocator in the next iteration. A few of those will
 become unsafe by the recently added spurious pointers from uncollected
 objects, but for the most part, those memory areas are now safe, and the
 objects allocated/collected from them can be disregarded. The number of
 objects that need to be allocated in unsafe memory areas for the next
 iteration becomes:
 
 n_{i+1} = P_i*n_i + P(P_i*n_i*s)*(n_0 - n_i)
 
 This will of course not perfectly predict the real behavior, as it
 relaxes the problem away from discrete units, but should still
 adequately model the sample application. Using this, I tried to find the
 breaking point, i.e. at what size objects need to be explicitly freed
 instead of left to the GC.
 
 "Verification":
 Rewriting my original example to to a std.gc.fullCollect() every 10th
 iteration, giving the following parameters:
 g = 20 MB
 n = 10
 
 the model suggests allocating objects of size:
 s = 2000 would result in almost no space overhead, stable at 20 mb
 s = 3000 would result in a slight but stable space overhead,
 s = 4000 would cause a run-away memory usage
 
 running the program results in:
 s = 2000 memory usage is stable at 20 mb
 s = 3000 results in a small apparently unbounded memory leak
 s = 4000 results in a unbounded memory leak
 
 The model appears to be at least in the correct ballpark for this sample.
 
 Theoretical results:
 Those tables show the maximum object size the GC will be able to handle
 with different static amounts of "random" data. By "being able to
 handle", I mean that the application doesn't use more than 2x the
 required amount of memory. The exact breaking point is very unstable and
 going above it rapidly results in uncontrolled memory consumption.
 
 32 bit arch, 100 objects
 
 1 MB data    8000 bytes
 5 MB data    4500 bytes
 10 MB data    3000 bytes
 20 MB data    2000 bytes
 100 MB data     700 bytes
 500 MB data     200 bytes
 1000 MB data     100 bytes
 
 32 bit arch, 1000 objects
 
 1 MB data    3400 bytes
 5 MB data    2200 bytes
 10 MB data    1700 bytes
 20 MB data    1200 bytes
 100 MB data     500 bytes
 500 MB data     150 bytes
 1000 MB data     100 bytes
 
 32 bit arch, 10000 objects
 1 MB data    1300 bytes
 5 MB data    1000 bytes
 10 MB data     800 bytes
 20 MB data     600 bytes
 100 MB data     300 bytes
 500 MB data     100 bytes
 1000 MB data      75 bytes
 
 Those figures need to be taken with a couple of grains of salt, but
 should give a indication of at what object size one needs to manually
 handle object lifetimes.
 
 As a comparison -- the 64 bit haven:
 
 64 bit arch, 100 objects
 2 GB data    1.5 GB
 100 GB data    600 MB
 1 TB data    200 MB
 
 64 bit arch, 1000 objects
 2 GB data    350 MB
 100 GB data    250 MB
 1 TB data    150 MB
 
 64 bit arch, 10000 objects
 2 GB data    100 MB
 100 GB data     75 MB
 1 TB data     50 MB
 </theoretical rambling>
 
 Summary:
 
 As far as I can see, what I have to do to avoid memory leaks with a
 conservative GC, is one of the following:
 
 1. move to a 64 bit architecture
 2. manually handle all objects larger than a few hundred bytes (see
 above) 3. hide all non pointer data from the GC
 

 as automatic ref-counting. Not having automatic ref-counting also
 prevents neat solutions such as transparent CoW, and automatic handling
 of scarce resources.
 
 If I wouldn't have a strong belief that automatic ref-counting would be
 addressed soon, I'd definitely consider going back to C++. Luckily,
 before I'd give up waiting, 64 bit architectures will probably be in
 great majority. ;)
 
 /Oskar

 
 Dont hold your breath for a reference counting implementation from DMD.
 Walter doesnt want to add any overhead to assignments.

Wouldn't it be possible to have classes declared "refcounted" in the same
way as scope? Then the extra cost would only apply to members of thouse
classes, possibly by disallowing uppcasts to non refcounted superclasses. 

example:

refcounted class foo{
}

...

void bar(){
        foo f=new foo();//refcount is one
        auto k=f;//refcount is two
        k=null;//back to one
        //here the refcount reaches zero and the object is deleted
}

Jan 09 2007

Thomas Kuehne <thomas-dloop kuehne.cn> writes:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kyle Furlong schrieb am 2007-01-09:
 Oskar Linde wrote:
 Oskar Linde wrote:


<snip>

 If I wouldn't have a strong belief that automatic ref-counting would be 
 addressed soon, I'd definitely consider going back to C++. Luckily, 
 before I'd give up waiting, 64 bit architectures will probably be in 
 great majority. ;)


 Dont hold your breath for a reference counting implementation from DMD. 
 Walter doesnt want to add any overhead to assignments.

Reference counting implementations would cause a lot of pain and
overhead with array slices. A sweeping GC that has different
pools for non-pointer and may-contain-pointer memory areas is easier to
implement and causes less overhead. In addition reference counting can
cause interesting problems in multi-core systems.
Yes, using advance information all sweeping GC's can be exploited,
though user data should'nt usually be a problem as most of it ends up
in the no-pointer pool. The x86-stack however is a potential attack vector.

Thomas


-----BEGIN PGP SIGNATURE-----

iD8DBQFFpCw3LK5blCcjpWoRAuFjAKCpXiyi61g916iUQO9IUbb8GIv4gwCfWmUG
edc07TIMP5n4l0+bzWXeFr4=
=4xvh
-----END PGP SIGNATURE-----

Jan 09 2007

Sean Kelly <sean f4.ca> writes:

Oskar Linde wrote:
 Oskar Linde wrote:
 After having fought a while with D programs with runaway memory leaks, 
 I've unfortunately had to come to the conclusion that the D GC is not 
 ready for production use. The problem is what I'd call "spurious 
 pointers". That is random data (strings, numbers, image data, audio or 
 whatever) appearing to the GC to be full of pointers to all over the 
 memory space.

 
 I feel kind of bad for making it sound like this is a problem related 
 specifically to the D garbage collector. It is rather a general and 
 apparently well known problem of all conservative garbage collectors. 
 The D garbage collector is still excellent for a large domain of problems.
 
 Lots of people seem to be having similar problems though, so a better 
 understanding of under what conditions a conservative garbage collector 
 will and will not work seems to be called for.

This link may be relevant:

http://www.hpl.hp.com/techreports/2001/HPL-2001-251.html

"Bounding Space Usage of Conservative Garbage Collectors" - Hans J. Boehm


Sean

Jan 10 2007

Benji Smith <dlanguage benjismith.net> writes:

The only comment I'd like to add is this:

At the same time as we're thinking about how to use a precise collector 
instead of a conservative collector, we should also think about how to 
implement a generational copying collector, since it might be necessary 
to add new constructs to the core language (pointer pinning, for 
example) to facilitate the new collector semantics.

--benji

Jan 10 2007

Sean Kelly <sean f4.ca> writes:

Benji Smith wrote:
 The only comment I'd like to add is this:
 
 At the same time as we're thinking about how to use a precise collector 
 instead of a conservative collector, we should also think about how to 
 implement a generational copying collector, since it might be necessary 
 to add new constructs to the core language (pointer pinning, for 
 example) to facilitate the new collector semantics.

Personally, I'm not convinced that a traditional moving GC will be the 
way to go in D.  Unless we are given some way to obtain type info for 
data on the stack and in the static data segment, anything directly 
referenced by these regions would have to be implicitly pinned.  And 
typically, long-lived data is referenced directly from such links.  A 
programmer could work around this limitation by using proxy classes, but 
this seems like a lot of trouble just to allow for generational garbage 
collection.  However, there are variants on the idea that sound like 
they have potential, one of which I believe is being discussed.  As for 
pointer pinning--it's fairly easy to add to the GC, so I don't see that 
as an obstacle to future design.  The bigger problem is sorting out 
efficient implementations of Object.opHash and such for objects whose 
only persistent uniquely identifying characteristic is their address.


Sean

Jan 10 2007

D Programming

C/C++ Programming

Other

digitalmars.D - The problem with the D GC