www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - GC, the simple solution

reply Frank Benoit <keinfarbton nospam.xyz> writes:
Another garbage collector thread :)

The best garbage collector I can think about, is a reference counting
one. To implement this:

* Every object needs a reference counter field.
* Code using references, need to have code for incrementing/decrementing
the counters.
* Destructors are called immediatly if the counter goes zero.

What you get is a precise, incremental, realtime garbage collector.
* all logic behaviour is deterministic
* Member objects are still valid, while a destructor is called. => no
more dispose/close patterns
* implement games and realtime apps without worry about lags
* implement secure application without worry about GC attack by valid
refence data values.
* consume only the needed memory
* no more need for the auto object
Jun 04 2006
next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
"GC, the simple solution"

No such thing.

Summary:

* Ref counting by default: good grief no!
* Ref counting as an option: good grief yes!
* Is there a perfect solution for memory management: don't be ridiculous.

Frank Benoit wrote:
 Another garbage collector thread :)
 
 The best garbage collector I can think about, is a reference counting
 one. To implement this:
 
 * Every object needs a reference counter field.
 * Code using references, need to have code for incrementing/decrementing
 the counters.
 * Destructors are called immediatly if the counter goes zero.
 
 What you get is a precise, incremental, realtime garbage collector.
 * all logic behaviour is deterministic
 * Member objects are still valid, while a destructor is called. => no
 more dispose/close patterns
 * implement games and realtime apps without worry about lags
 * implement secure application without worry about GC attack by valid
 refence data values.
 * consume only the needed memory
 * no more need for the auto object

What you also get that you might not want: * because references are on the object, you can't have array slices * it's also incompatible with pointers: if the pointer is to a member of an object, how on earth do you update the refcount then? * slower in the immediate sense (incref & decref calls, plus it'll mean less code in L1 and L2 CPU cache, which can kill performance in some architectures) * requires all objects to have destructors to decref child objects (implicit or otherwise) * without auto, makes it impossible to ensure an object's destruction when you leave scope: you can leak references * inability to free cycles * inability to even *run* destructors where cycles are involved, because there is no order in which they can logically be executed and still be valid Those last two are the main problem with reference counting, along with the others and slightly added complexity in the compiler. Python, however, does not suffer the cycle problem. Know why? Because they have a *mark & sweep* garbage collector to clean up the cycles. Seriously, I don't want to see the current GC replaced by a reference counting scheme. The current system is just fine for the majority of applications. Where it falls down are real-time systems which can't allow for the non-deterministic GC pauses, and where you want to keep your memory profile to a minimum. Anything that has pauses of activity can just run a background GC thread to keep the memory footprint down. To this end, I think implementing an incremental or concurrent GC should be the next port of call for optimising D's memory management. As it stands, you can implement reference counting yourself: you just need to write an allocator, and call the incref and decref functions manually. However, I do think that having built-in *support* for Reference counting would be fantastic. The less that stuff like this is left in the hands of the programmer, the better. What I think we all need to keep in mind is that NO memory management scheme is good for all cases. There will always be some weird corner case which doesn't work well with our scheme of choice, and that we don't give a rat's about, but is VERY important to someone else. So far D does automatic GC (the best default, IMHO [1]), manual management, and RAII. Throw optional ref counting in to the mix, and I think we'll have pretty much 99% of cases covered. -- Daniel [1] The best default since it's unobtrusive, and *safe*: unless you're doing strange things, it's very hard to leave dangling references with a GC. This allows people to get on with writing the damn program, and worry about managing memory later (if ever). -- Unlike Knuth, I have neither proven or tried the above; it may not even make sense. v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Jun 04 2006
next sibling parent reply Frank Benoit <keinfarbton nospam.xyz> writes:
In nearly every other GC implementation you will need also a read or
write barrier. With reference counting you have a write barriere.

Yes, the unprecise, non-incremental, "Stop-the-world", non-concurrent,
non-generational, conservative Phobos GC implementation is an exception.

But a write barrier alone doesn't make ref counting ridiculous. Well you
are right, it is not the solution for all and everything, but I think a
big step in the right direction. And mark&sweep GC should be the option.

 * because references are on the object, you can't have array slices

Don't understand that point.
 * it's also incompatible with pointers: if the pointer is to a member of
 an object, how on earth do you update the refcount then?

In the actual implememtation of the gc, every object's memory range is registered by the gc. With that you can get the objects refcounter address.
 * slower in the immediate sense (incref & decref calls, plus it'll mean
 less code in L1 and L2 CPU cache, which can kill performance in some
 architectures)

Performance has two sides. Throughput is one thing, worst latency the other. We can think about techniques to optimize unecessary increments/decrements. BTW all incremental GCs will also need read and/or write barriers, which will have the same perfomance issues.
 * requires all objects to have destructors to decref child objects
 (implicit or otherwise)

Yes, all objects having child objects. But if you think, that destructor calls are a performance issue, you will get problems with the mark&sweep also.
 * without auto, makes it impossible to ensure an object's destruction
 when you leave scope: you can leak references

How can that happen, if we have them native? Well ok, e.g. if you store the reference to a static variable, then it is not freed. But this is really a programmers bug. But I think that is something an advanced look at it will find solutions. First we have to decide to go for ref counting :)
 * inability to free cycles
 * inability to even *run* destructors where cycles are involved, because
 there is no order in which they can logically be executed and still be valid

Right, here a manual ref=null is needed. And perhaps a mark&sweep algorithm to detect such cycles in the debug version, called on request. So if you know you program, you can avoid cycles, or you can break them before releasing the last reference. Doing a realtime GC seems to be not possible. The existing "realtime GC" seems to me, incremental ones, which stop the world for a short limited time. And that time goes into your worst latency. So if I want to do things with a latency less 100us, there is no chance to implement a GC. Ref counting whould be a solution, but this can only work if the whole program does not depend on "Stop-the-world".
 As it stands, you can implement reference counting yourself: you just
 need to write an allocator, and call the incref and decref functions
 manually.  However, I do think that having built-in *support* for
 Reference counting would be fantastic.  The less that stuff like this is
 left in the hands of the programmer, the better.

* The dereference operator cannot be overwritten. * Existing classes, native arrays etc. does not work like this. I do not need a few classes managed with ref counting. I want to avoid the "Stop the world". And phobos needs it. Where is the proof that ref counting is so slow? In mark&sweep you have to run over the whole memory. If the memory is low, you have to do that often. This is really slow than.
Jun 04 2006
parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Frank Benoit wrote:
 In nearly every other GC implementation you will need also a read or
 write barrier. With reference counting you have a write barriere.
 
 Yes, the unprecise, non-incremental, "Stop-the-world", non-concurrent,
 non-generational, conservative Phobos GC implementation is an exception.
 
 But a write barrier alone doesn't make ref counting ridiculous. Well you
 are right, it is not the solution for all and everything, but I think a
 big step in the right direction. And mark&sweep GC should be the option.

I guess we'll just have to disagree, then :) Like I said before, I think the current GC should be the default since, even if it's not the best in terms of latency, it's the *safest*. There's no need for someone coding under it to worry about things like cycles, etc. By having refcounting an option, we can say to programmers: "turn this on if you like, but watch your back for cycles!" <joking>Plus, the current system gives us bitchin' benchmarks with wc! ;)</joking>
 * because references are on the object, you can't have array slices

Don't understand that point.

An array slice is a pointer and a length. The problem is the "pointer" part:
 * it's also incompatible with pointers: if the pointer is to a member of
 an object, how on earth do you update the refcount then?

In the actual implememtation of the gc, every object's memory range is registered by the gc. With that you can get the objects refcounter address.

Aaah, ok: you're putting the refcount somewhere in the allocated chunk. Sorry, I thought you meant hide it as a member on objects. However, I imagine that this is still going to do horrible things to pointer operations in terms of speed. Every time you move or copy a pointer, you'd have to ask the GC "ok, find which pool this pointer is located in, then inc/dec the refcount."
 * slower in the immediate sense (incref & decref calls, plus it'll mean
 less code in L1 and L2 CPU cache, which can kill performance in some
 architectures)

Performance has two sides. Throughput is one thing, worst latency the other. We can think about techniques to optimize unecessary increments/decrements. BTW all incremental GCs will also need read and/or write barriers, which will have the same perfomance issues.

I concede that, as well as that the current GC has (probably) the worst all-at-once latency you can get.
 * requires all objects to have destructors to decref child objects
 (implicit or otherwise)

Yes, all objects having child objects. But if you think, that destructor calls are a performance issue, you will get problems with the mark&sweep also.

Well, I don't *need* destructors in most general objects under mark&sweep. It's only when they're holding some external resource, and that's when I usually manage them with auto or scoped destruction. ... which would admittedly be easier with refcounting, if a little bit slower :)
 * without auto, makes it impossible to ensure an object's destruction
 when you leave scope: you can leak references

How can that happen, if we have them native? Well ok, e.g. if you store the reference to a static variable, then it is not freed. But this is really a programmers bug. But I think that is something an advanced look at it will find solutions. First we have to decide to go for ref counting :)

What I mean is that if I say "auto Foo x = ..." then I know that come hell or high water, D is going to destroy that object at the end of scope. Plus, because of auto rules, I *cannot* leak a reference to that object: D simply won't let me. As for the static variable case, it's no different to the current implementation: you save a reference to something in a global or static variable, and it's never going to die :)
 * inability to free cycles
 * inability to even *run* destructors where cycles are involved, because
 there is no order in which they can logically be executed and still be valid

Right, here a manual ref=null is needed. And perhaps a mark&sweep algorithm to detect such cycles in the debug version, called on request. So if you know you program, you can avoid cycles, or you can break them before releasing the last reference.

The problem I have is that even if I *do* "know" my program, I don't want to have to worry about this. I just want to write my damn code! This is one of the great advantages of D: if you don't give a rat's about memory management, you don't have to. That said, D *does* have one advantage: at compile time, the compiler can determine if it's possible for an object to indirectly refer to itself. Using this, it can insert something like this into the decref: # static if( typeof(ptr).canPointToItself ) # registerForCycleChecking(ptr); That way, you know that cycles will get checked for *eventually*.
 Doing a realtime GC seems to be not possible.

Oh, you'll get no argument from me THERE. :)
 The existing "realtime GC"
 seems to me, incremental ones, which stop the world for a short limited
 time. And that time goes into your worst latency. So if I want to do
 things with a latency less 100us, there is no chance to implement a GC.
 Ref counting whould be a solution, but this can only work if the whole
 program does not depend on "Stop-the-world".

I saw an interesting paper once that was talking about using a tri-colour mark&sweep collector that was fully concurrent: sounded heinously complicated to write, but could run a full collect in the background of your app. Buggered if I can find the damn thing again, though...
 As it stands, you can implement reference counting yourself: you just
 need to write an allocator, and call the incref and decref functions
 manually.  However, I do think that having built-in *support* for
 Reference counting would be fantastic.  The less that stuff like this is
 left in the hands of the programmer, the better.

* The dereference operator cannot be overwritten.

MyRef.value Yeah, it's not that great, but at least it's *possible* :)
 * Existing classes, native arrays etc. does not work like this.

Fair enough.
 I do not need a few classes managed with ref counting. I want to avoid
 the "Stop the world". And phobos needs it.

I don't think phobos "needs" it. What I think would help immensely was if phobos was easier to rebuild with different compiler flags, collector, etc. Would you be happy if it was trivial to recompile phobos using refcounting instead of the current GC, and then use that? # build -phobos -collector=refcount -version=SomethingElse # build MyProggy -collector=refcount -- uses phobos_refcount.lib
 Where is the proof that ref counting is so slow? In mark&sweep you have
 to run over the whole memory. If the memory is low, you have to do that
 often. This is really slow than.

I don't have any proof on me, although knowing the internet I'm sure I could find some "proof" without too much trouble :P And yes; if memory is low, m&s will probably take a long time. Actually, I should find out how long that is. Would be interesting to know exactly how long it takes to collect 512 meg of garbage :P Personally, I don't mind how D's default collector is implemented, so long as it is efficient in the general case, and I don't have to know anything about the way it works in order to write code. That said, I'd still like to see D support collectors other than the base GC. Ideally, I think D should support at least: # -collector=simple -- current style: no special support needed # -collector=writebarrier -- calls std.gc.writeBarrier(ptr) on writes # -collector=refcount -- calls std.gc.writeBarrier(ptr), as well as # std.gc.incref and std.gc.decref as # appropriate. That way, we can all have our cake, be it chocolate cake, fruit cake or cheesecake. Plus, we even get to *eat* it :) -- Daniel -- Unlike Knuth, I have neither proven or tried the above; it may not even make sense. v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Jun 04 2006
parent Lars Ivar Igesund <larsivar igesund.net> writes:
Daniel Keep wrote:

 
 
 Frank Benoit wrote:
 In nearly every other GC implementation you will need also a read or
 write barrier. With reference counting you have a write barriere.
 
 Yes, the unprecise, non-incremental, "Stop-the-world", non-concurrent,
 non-generational, conservative Phobos GC implementation is an exception.
 
 But a write barrier alone doesn't make ref counting ridiculous. Well you
 are right, it is not the solution for all and everything, but I think a
 big step in the right direction. And mark&sweep GC should be the option.

I guess we'll just have to disagree, then :) Like I said before, I think the current GC should be the default since, even if it's not the best in terms of latency, it's the *safest*. There's no need for someone coding under it to worry about things like cycles, etc.

Well, actually, the current GC is kinda bad with cycles. Try to make a circular linked list where you keep a single additional reference to one of the elements. Then remove the reference and collect. In theory this memory should now "just" be lost/leaked, but you might also experience crashes. I think Chris Miller had some interesting revelations on the subject. -- Lars Ivar Igesund blog at http://larsivi.net DSource & #D: larsivi
Jun 05 2006
prev sibling parent reply Larry Evans <cppljevans cos-internet.com> writes:
On 06/04/2006 05:40 AM, Daniel Keep wrote:

 "GC, the simple solution"

[snip]
 * inability to free cycles
 * inability to even *run* destructors where cycles are involved, because
 there is no order in which they can logically be executed and still 

I think someone else in the thread mentioned this as a programmer error. However, assuming the programmer intended this, then this programmer must expect that the MS GC would break the cycle somewhere before calling the destructors and he must not care where the cycle was broken, because otherwise he would have used a weak pointer to close the cycle where he wanted the break to occur. Anyway, code for doing this GC breaking of cycles in RC is at: http://tinyurl.com/reuzl See code after comment: * Breaks all arcs in cycle_arcs. As mentioned in the code comments, this is Christopher's method for collecting cycles and suffers a delay since it keeps all smart ptrs in a list and processes the entire list. Other methods don't keep this list but do a local mark-sweep at each rc decrement with obvious time penalties. A compromise just uses a queue which, when it reaches a certain size, plays the same role as Christopher's list of all smart pointers.
 Those last two are the main problem with reference counting, along with
 the others and slightly added complexity in the compiler.

 Python, however, does not suffer the cycle problem.  Know why?

 Because they have a *mark & sweep* garbage collector to clean up the 

Then it must not automatically call destructors since then it would suffer the 2nd problem above.

 As it stands, you can implement reference counting yourself: you just
 need to write an allocator, and call the incref and decref functions
 manually.  However, I do think that having built-in *support* for
 Reference counting would be fantastic.  The less that stuff like this is
 left in the hands of the programmer, the better.

Smart pointers would be a great help here, but since D doesn't allow user definition of the assignment operator: http://www.digitalmars.com/d/archives/digitalmars/D/learn/984.html that's not easy. I'm a newbie, so what's the best way to implement something like a smart pointer without having to manually: increment rc1; decrement rc2; assign the pointer to its target location;
Jun 05 2006
parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Sorry for the late (and short) reply: exams breathing down my neck and
all :)

Larry Evans wrote:
 On 06/04/2006 05:40 AM, Daniel Keep wrote:
 
 "GC, the simple solution"

[snip]
 * inability to free cycles
 * inability to even *run* destructors where cycles are involved, because
 there is no order in which they can logically be executed and still be

I think someone else in the thread mentioned this as a programmer error. However, assuming the programmer intended this, then this programmer must expect that the MS GC would break the cycle somewhere before calling the destructors and he must not care where the cycle was broken, because otherwise he would have used a weak pointer to close the cycle where he wanted the break to occur. Anyway, code for doing this GC breaking of cycles in RC is at: http://tinyurl.com/reuzl See code after comment: * Breaks all arcs in cycle_arcs. As mentioned in the code comments, this is Christopher's method for collecting cycles and suffers a delay since it keeps all smart ptrs in a list and processes the entire list. Other methods don't keep this list but do a local mark-sweep at each rc decrement with obvious time penalties. A compromise just uses a queue which, when it reaches a certain size, plays the same role as Christopher's list of all smart pointers.
 Those last two are the main problem with reference counting, along with
 the others and slightly added complexity in the compiler.

 Python, however, does not suffer the cycle problem.  Know why?

 Because they have a *mark & sweep* garbage collector to clean up the

Then it must not automatically call destructors since then it would suffer the 2nd problem above.

I haven't read the above link since I really don't need to be distracted any more than I already am at the moment. I'll read that link later on once I'm not quite so swamped :) However, I will say this: if you have a cycle in Python, it will clean it up normally PROVIDED destructors are not involved. If they are, then it just moves the whole cycle to a dead object list, since there is no safe way to break the cycle. At that point, it's up to the programmer to deal with it.
 

 As it stands, you can implement reference counting yourself: you just
 need to write an allocator, and call the incref and decref functions
 manually.  However, I do think that having built-in *support* for
 Reference counting would be fantastic.  The less that stuff like this is
 left in the hands of the programmer, the better.

Smart pointers would be a great help here, but since D doesn't allow user definition of the assignment operator: http://www.digitalmars.com/d/archives/digitalmars/D/learn/984.html that's not easy. I'm a newbie, so what's the best way to implement something like a smart pointer without having to manually: increment rc1; decrement rc2; assign the pointer to its target location;

Not really, sorry. Your best bet might be with a preprocessor. I've had plans to write an AST-based preprocessor for a while, but that's off in the far, far future. Maybe give Build 3.0 a look? In that case, you'd want to write a macro like: rc2 #= rc1; ==> (rc1).incref(); (rc2).decref(); rc2 = (rc1); Again, sorry for the terse reply. -- Daniel -- Unlike Knuth, I have neither proven or tried the above; it may not even make sense. v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Jun 08 2006
prev sibling next sibling parent reply "Lionello Lunesu" <lionello lunesu.remove.com> writes:
 * Destructors are called immediatly if the counter goes zero.

This is not such a good thing! You don't really know who releases the last reference, so you don't really know when the object is deleted. If the object is being referenced from multiple threads, you don't know in what thread the destructor will run.
 * all logic behaviour is deterministic

See above.
 * Member objects are still valid, while a destructor is called. => no
 more dispose/close patterns

When an object is being destructed, the references it has to other objects are being released. So in the destructor of object B, object A might have released about half its references.
 * consume only the needed memory

What about circular references? They'll never get released, no clean-up. L.
Jun 04 2006
parent reply Frank Benoit <keinfarbton nospam.xyz> writes:
Lionello Lunesu schrieb:
 * Destructors are called immediatly if the counter goes zero.

This is not such a good thing! You don't really know who releases the last reference, so you don't really know when the object is deleted. If the object is being referenced from multiple threads, you don't know in what thread the destructor will run.

Where is the disadvantage? If you release concurrent in different threads you need some kind of syncronization. Than you know it. But this is no issue of ref counting. If you release a reference in the phobos GC, the object is collected "perhaps" the next time. If there are no valid integer values, "pointing"to your object. And after collecting, the sequence of destructor calls is not defined.
 * all logic behaviour is deterministic

See above.

 
 * Member objects are still valid, while a destructor is called. => no
 more dispose/close patterns

When an object is being destructed, the references it has to other objects are being released. So in the destructor of object B, object A might have released about half its references.

the object A. The B.~this() can do something with A. After the Dtor is finished, the reference to A is nulled, and A is perhaps also deleted. Seams perfect to me.
 * consume only the needed memory

What about circular references? They'll never get released, no clean-up.

Yes, they need a solution. Perhaps a fall back mark&sweep collector, to check for cicles.
 
 L. 
 
 

Jun 04 2006
parent reply "Lionello Lunesu" <lionello lunesu.remove.com> writes:
"Frank Benoit" <keinfarbton nospam.xyz> wrote in message 
news:e5uojm$18ji$1 digitaldaemon.com...
 Lionello Lunesu schrieb:
 * Destructors are called immediatly if the counter goes zero.

This is not such a good thing! You don't really know who releases the last reference, so you don't really know when the object is deleted. If the object is being referenced from multiple threads, you don't know in what thread the destructor will run.

Where is the disadvantage? If you release concurrent in different threads you need some kind of syncronization. Than you know it. But this is no issue of ref counting. If you release a reference in the phobos GC, the object is collected "perhaps" the next time. If there are no valid integer values, "pointing"to your object. And after collecting, the sequence of destructor calls is not defined.

There are many resources that must be released by the thread by which they were acquired. There's not much synchronisation can do. If the last reference just happens to be released in a thread other than the one that created the object, then the resource will be released in the wrong thread. As I've understood from Boehm's paper, one advantages of a GC is to fix this issue: a GC collect will call finalizers for the objects that were created in that thread. It's like keeping an object from being destructed until the thread that created the object calls malloc (which is were the GC does its thing). Anyway, Boehm's paper is a good read. He (she?) also addresses the problems with reference counting. http://www.hpl.hp.com/personal/Hans_Boehm/gc/
 * Member objects are still valid, while a destructor is called. => no
 more dispose/close patterns

When an object is being destructed, the references it has to other objects are being released. So in the destructor of object B, object A might have released about half its references.

If object B Dtor is called, its member variables are valid, and valid is the object A. The B.~this() can do something with A. After the Dtor is finished, the reference to A is nulled, and A is perhaps also deleted. Seams perfect to me.

I meant my example like this: object A has a reference to object B. During the destruction of A, the references it has to other objects are released, causing some objects being destructed, like B. Now B might (indirectly) use data from A, but A's in the middle of destruction. What I mean to tell with all this is that you still have to be carefull with your destructors. Especially when using a strong-pointer template. You can't always tell what state other objects are in in a destructor. They might be half way during destruction, having released some, but not all, of their references. L.
Jun 04 2006
parent Frank Benoit <keinfarbton nospam.xyz> writes:
 As I've understood from Boehm's paper, one advantages of a GC is to fix this 
 issue: a GC collect will call finalizers for the objects that were created 
 in that thread. It's like keeping an object from being destructed until the 
 thread that created the object calls malloc 

This is not true for the phobos GC. The GC runs in the thread which calls new or gc.fullCollect. But the Dtors are not called while other threads are running, because the GC pauses all other threads. This is called the "Stop-the-world".
 I meant my example like this: object A has a reference to object B. During 
 the destruction of A, the references it has to other objects are released, 
 causing some objects being destructed, like B. Now B might (indirectly) use 
 data from A, but A's in the middle of destruction. What I mean to tell with 
 all this is that you still have to be carefull with your destructors. 
 Especially when using a strong-pointer template. You can't always tell what 
 state other objects are in in a destructor. They might be half way during 
 destruction, having released some, but not all, of their references.
 

Object A cannot be destroyed, if it is reachable directly or indirectly. That is, the ref counting is for. So if A.~this() is called, nobody else has a reference to A.
Jun 04 2006
prev sibling next sibling parent reply Lars Ivar Igesund <larsivar igesund.net> writes:
Frank Benoit wrote:

 Another garbage collector thread :)
 
 The best garbage collector I can think about, is a reference counting
 one. To implement this:
 
 * Every object needs a reference counter field.
 * Code using references, need to have code for incrementing/decrementing
 the counters.
 * Destructors are called immediatly if the counter goes zero.
 
 What you get is a precise, incremental, realtime garbage collector.
 * all logic behaviour is deterministic
 * Member objects are still valid, while a destructor is called. => no
 more dispose/close patterns
 * implement games and realtime apps without worry about lags
 * implement secure application without worry about GC attack by valid
 refence data values.
 * consume only the needed memory
 * no more need for the auto object

Not entirely true in the general case, Frank ;) There are always techniques to get around problems with an algorithm, depending on what you want to achieve, but in general, a reference counted GC is said to have these _negative_ properties: (paraphrased from Garbage Collection: Algorithms for Automatic Dynamic Memory Management by Jones and Lins) * The cost of removing the last pointer to an object is unbounded * Even if the general latency is good, the total overhead of adjusting references is significantly greater than that of a tracing GC * It has a substantial space overhead, making it less useful for smaller heaps * Cyclic references (already mentioned) I think these makes RC less suited for the general case, whereas it might fit very well in a tightly controlled system where low latency is required (which I'm well aware that you need ;). IMO, the best general GC (the default) would be one which can handle as many cases as possible, and which is manipulable in those (preferably known) cases which it don't handle out of the box. I think that one need to combine features from several GC techniques (possibly also RC) to achieve this. -- Lars Ivar Igesund blog at http://larsivi.net DSource & #D: larsivi
Jun 04 2006
parent reply Frank Benoit <keinfarbton nospam.xyz> writes:
 * The cost of removing the last pointer to an object is unbounded

 * Even if the general latency is good, the total overhead of adjusting
 references is significantly greater than that of a tracing GC

 * It has a substantial space overhead, making it less useful for smaller
 heaps

I though the tracing GC does need much more heap because it should be called very seldom.
 * Cyclic references (already mentioned)
 
 I think these makes RC less suited for the general case, whereas it might
 fit very well in a tightly controlled system where low latency is required
 (which I'm well aware that you need ;).
 
 IMO, the best general GC (the default) would be one which can handle as many
 cases as possible, and which is manipulable in those (preferably known)
 cases which it don't handle out of the box. I think that one need to
 combine features from several GC techniques (possibly also RC) to achieve
 this. 
 

agreed.
Jun 05 2006
parent reply Lars Ivar Igesund <larsivar igesund.net> writes:
Frank Benoit wrote:

 
 * The cost of removing the last pointer to an object is unbounded


The reason is that a delete might trigger an unknown additional amount of decrefs leading to more deletes.
 * It has a substantial space overhead, making it less useful for smaller
 heaps

I though the tracing GC does need much more heap because it should be called very seldom.

Hmm, might have paraphrased somewhat wildly there, but the space overhead _is_ considered a drawback, and reducing it will most likely lead to more computation overhead instead. -- Lars Ivar Igesund blog at http://larsivi.net DSource & #D: larsivi
Jun 05 2006
parent Sean Kelly <sean f4.ca> writes:
Lars Ivar Igesund wrote:
 Frank Benoit wrote:
 
 * The cost of removing the last pointer to an object is unbounded


The reason is that a delete might trigger an unknown additional amount of decrefs leading to more deletes.

And some allocators colaesce adjacent free blocks aggressively (ie. during delete). Not to mention the fact that the allocator data is probably protected by a mutex, etc. In general, if the allocator documentation makes no guarantees about progress then you may at least be stuck on a mutex while other (potentially low-priority) threads are making an (unbounded) allocation. Sean
Jun 05 2006
prev sibling next sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Frank Benoit wrote:
 The best garbage collector I can think about, is a reference counting
 one.

Reference counting has its advantages, but some severe disadvantages: 1) cyclical data structures won't get free'd 2) every pointer copy requires an increment and a corresponding decrement - including when simply passing a reference to a function 3) in a multithreaded app, the incs and decs must be synchronized 4) exception handlers (finally's) must be inserted to handle all the decs so there are no leaks. Contrary to assertions otherwise, there is no such thing as "zero overhead exceptions." 5) in order to support slicing and interior pointers, as well as supporting reference counting on arbitrary allocations of non-object data, a separate "wrapper" object must be allocated for each allocation to be ref counted. This essentially doubles the number of allocations needed. 6) The wrapper object will mean that all pointers will need to be double-dereferenced to access the data. 7) Fixing the compiler to hide all this stuff from the programmer will make it difficult to interface cleanly with C. 8) Ref counting can fragment the heap thereby consuming more memory just like the gc can, though the gc typically will consume more memory overall. 9) Ref counting does not eliminated latency problems, it just reduces them. The proposed C++ shared_ptr<>, which implements ref counting, suffers from all these faults. I haven't seen a heads up benchmark of shared_ptr<> vs mark/sweep, but I wouldn't be surprised if shared_ptr<> turned out to be a significant loser in terms of both performance and memory consumption. That said, I'd like for D to optionally support some form of ref counting, as rc is better for managing scarce resources like file handles.
Jun 04 2006
parent reply Frank Benoit <keinfarbton nospam.xyz> writes:
Ok, i see now that my "best and simplest" solution is neiter of both :)
But I don't want to throw the idea away so fast. Please lets discuss it
a bit further.

Let's think about the scenario of having reference counting (RC) and
mark&sweep (MS) in parallel. Like Daniel Keep said, it is used in
Python. If Python uses this, the idea cannot be so crazy.

 1) cyclical data structures won't get free'd

Combine RC and MS. So cycles are detected. It should be possible to avoid cycles at all, if you don't want to have any MS pause.
 2) every pointer copy requires an increment and a corresponding
 decrement - including when simply passing a reference to a function

This is true for a smart pointer class. But is it really true for a compiler supported RC? If I call a func/method, I hold a valid reference to the object. While this reference is copied down the stack, these are only temporary reference. And all of them are release before the call returns to the caller. You can ignore them all for RC. Only if the reference is stored, the ref counter has to be incremented. This is only possible if this is done by the compiler. No smart pointer can do this :)
 3) in a multithreaded app, the incs and decs must be synchronized

Isn't a atomic inc/dec enough? (Don't know anything about the performance of the asm "lock" instruction)
 4) exception handlers (finally's) must be inserted to handle all the
 decs so there are no leaks. Contrary to assertions otherwise, there is
 no such thing as "zero overhead exceptions."

Yes, this is additional overhead in code size and runtime. And there are additional post destructors needed, to set all member refs to null, which wasn't nulled by the dtor.
 5) in order to support slicing and interior pointers, as well as
 supporting reference counting on arbitrary allocations of non-object
 data, a separate "wrapper" object must be allocated for each allocation
 to be ref counted. This essentially doubles the number of allocations
 needed.

The MS GC allready has a memory registry. Isn't there stored, where an object begins? Let the ref counter be in this registry. This should work with all types and slicing and object member references.
 6) The wrapper object will mean that all pointers will need to be
 double-dereferenced to access the data.

=> 5.)
 7) Fixing the compiler to hide all this stuff from the programmer will
 make it difficult to interface cleanly with C.

hm, ok. Here is some more manual work necessary. If you give the last reference to a C lib, isn't this a problem for the MS GC also?
 8) Ref counting can fragment the heap thereby consuming more memory just
 like the gc can, though the gc typically will consume more memory overall.

Isn't the actual GC implementation also fragmenting the heap? But I don't have any idea to this issue.
 9) Ref counting does not eliminated latency problems, it just reduces them.

Where do you mean is the latency with RC? The collecting work is done in the smallest pieces all the time. Only the allocation of new memory needs an unbound time to complete. But it does not need to stop all other threads. e.g. hardware interrupts don't need to be disabled.
 
 The proposed C++ shared_ptr<>, which implements ref counting, suffers
 from all these faults. I haven't seen a heads up benchmark of
 shared_ptr<> vs mark/sweep, but I wouldn't be surprised if shared_ptr<>
 turned out to be a significant loser in terms of both performance and
 memory consumption.
 
 That said, I'd like for D to optionally support some form of ref
 counting, as rc is better for managing scarce resources like file handles.

As I said, I would be happy to be able to switch the whole runtime system to RC. If I would loose 50% performance this is acceptable for me if I get latency < 100us. Frank Benoit
Jun 04 2006
next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Just a few comments, since you invoked my name :P

Frank Benoit wrote:
 Let's think about the scenario of having reference counting (RC) and
 mark&sweep (MS) in parallel. Like Daniel Keep said, it is used in
 Python. If Python uses this, the idea cannot be so crazy.

Just wanted to point out that if I remember correctly (and I *could* be wrong), the cycle checking is only done on decrefs, since that's the only time that cycles become a problem.
 3) in a multithreaded app, the incs and decs must be synchronized

Isn't a atomic inc/dec enough? (Don't know anything about the performance of the asm "lock" instruction)

They definitely have to be synched: because in decref you've got to: 1. ref--; 2. if( ref == 0 ) free(ptr); 3. if( obj.canHaveCycles ) registerForCycleCheck(obj); If not more. You then also need the incs synched with that, otherwise you could end up "ref++"ing in the middle of "free(ptr)".
 7) Fixing the compiler to hide all this stuff from the programmer will
 make it difficult to interface cleanly with C.

hm, ok. Here is some more manual work necessary. If you give the last reference to a C lib, isn't this a problem for the MS GC also?

MS' GC doesn't use refcounting (I assume you're talking about .NET, btw). In fact, to solve this problem, it uses pinning: """ When the runtime marshaler sees that your code is passing to native code a reference to a managed reference object, it automatically pins the object. What this means is that the object is put in a special state where the garbage collector will neither move the object in memory nor remove the object from memory. ... When the native function returns, the marshaled object is automatically unpinned. Automatic pinning is very convenient, but it raises another question. What happens if the native function caches the pointer for use later on? When the function returns, won't the collector be free to move the object? The answer is yes, and the solution for your code in such a situation is to manually pin the object using the System.Runtime.InteropServices.GCHandle type. """ -- http://msdn.microsoft.com/msdnmag/issues/04/10/NET/
 8) Ref counting can fragment the heap thereby consuming more memory just
 like the gc can, though the gc typically will consume more memory overall.

Isn't the actual GC implementation also fragmenting the heap? But I don't have any idea to this issue.

Not as badly. The current GC is pretty good at keeping fragmentation to a minimum. Fragmentation happens when you allocate lots of little objects, then free some but not all, and end up with teeny-tiny "holes" in memory that are hard to fill. I think Walter said this because of his comment on using "wrapper" objects to implement the references: you'd end up with lots of "holes" in memory.
 9) Ref counting does not eliminated latency problems, it just reduces them.

Where do you mean is the latency with RC? The collecting work is done in the smallest pieces all the time. Only the allocation of new memory needs an unbound time to complete. But it does not need to stop all other threads. e.g. hardware interrupts don't need to be disabled.

I think he means that it doesn't make latency just disappear; it spreads it over time. It's the old "drop a frog into hot water and it'll jump out, drop it in cold water and then boil it and it won't notice" anecdote. -- Daniel -- Unlike Knuth, I have neither proven or tried the above; it may not even make sense. v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Jun 04 2006
next sibling parent reply "Lionello Lunesu" <lionello lunesu.remove.com> writes:
 They definitely have to be synched: because in decref you've got to:

 1. ref--;
 2. if( ref == 0 ) free(ptr);
 3. if( obj.canHaveCycles ) registerForCycleCheck(obj);

 If not more.  You then also need the incs synched with that, otherwise
 you could end up "ref++"ing in the middle of "free(ptr)".

The thread doing the ++ apparently has a pointer to the object without a reference. How could that ever happen? L.
Jun 04 2006
parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Lionello Lunesu wrote:
 They definitely have to be synched: because in decref you've got to:

 1. ref--;
 2. if( ref == 0 ) free(ptr);
 3. if( obj.canHaveCycles ) registerForCycleCheck(obj);

 If not more.  You then also need the incs synched with that, otherwise
 you could end up "ref++"ing in the middle of "free(ptr)".

The thread doing the ++ apparently has a pointer to the object without a reference. How could that ever happen? L.

*slaps forehead* Of course; you're right. I was thinking about race conditions, but obviously got confused. Please refer to my signature. Specifically the "it may not even make sense." part :P -- Daniel -- Unlike Knuth, I have neither proven or tried the above; it may not even make sense. v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Jun 04 2006
prev sibling parent Frank Benoit <keinfarbton nospam.xyz> writes:
 If not more.  You then also need the incs synched with that, otherwise
 you could end up "ref++"ing in the middle of "free(ptr)".
 

no other manipulator.
 7) Fixing the compiler to hide all this stuff from the programmer will
 make it difficult to interface cleanly with C.

If you give the last reference to a C lib, isn't this a problem for the MS GC also?


MS=mark&sweep :)
Jun 05 2006
prev sibling next sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Frank Benoit wrote:
 Ok, i see now that my "best and simplest" solution is neiter of both :)
 But I don't want to throw the idea away so fast. Please lets discuss it
 a bit further.

I'm not throwing it away so fast, I've thought about it for several years <g>.
 Let's think about the scenario of having reference counting (RC) and
 mark&sweep (MS) in parallel. Like Daniel Keep said, it is used in
 Python. If Python uses this, the idea cannot be so crazy.

Maybe, maybe not. Python isn't known for its speed. Java uses gc, not rc.
 1) cyclical data structures won't get free'd

Combine RC and MS. So cycles are detected. It should be possible to avoid cycles at all, if you don't want to have any MS pause.

There are ways to avoid the RC cycle problem, but I don't know how expensive they are.
 2) every pointer copy requires an increment and a corresponding
 decrement - including when simply passing a reference to a function

compiler supported RC?

Yes.
 If I call a func/method, I hold a valid reference to the object. While
 this reference is copied down the stack, these are only temporary
 reference. And all of them are release before the call returns to the
 caller. You can ignore them all for RC.

This fails if the passed reference 'escapes'. It can escape by being assigned to a global, being assigned to a class member, being inserted into some data structure that lives longer than the function, and being returned by the function.
 Only if the reference is stored,
 the ref counter has to be incremented. This is only possible if this is
 done by the compiler. No smart pointer can do this :)
 
 3) in a multithreaded app, the incs and decs must be synchronized

Isn't a atomic inc/dec enough?

That requires synchronizing.
 (Don't know anything about the
 performance of the asm "lock" instruction)

It's slow enough to be noticable.
 4) exception handlers (finally's) must be inserted to handle all the
 decs so there are no leaks. Contrary to assertions otherwise, there is
 no such thing as "zero overhead exceptions."

Yes, this is additional overhead in code size and runtime. And there are additional post destructors needed, to set all member refs to null, which wasn't nulled by the dtor.
 5) in order to support slicing and interior pointers, as well as
 supporting reference counting on arbitrary allocations of non-object
 data, a separate "wrapper" object must be allocated for each allocation
 to be ref counted. This essentially doubles the number of allocations
 needed.

The MS GC allready has a memory registry. Isn't there stored, where an object begins? Let the ref counter be in this registry. This should work with all types and slicing and object member references.

Yes, you can find out the beginning of an arbitrary pointer's memory block. But that isn't a cheap operation.
 6) The wrapper object will mean that all pointers will need to be
 double-dereferenced to access the data.

=> 5.)

Finding the start of the memory chunk will be several times more expensive than the double indirect.
 7) Fixing the compiler to hide all this stuff from the programmer will
 make it difficult to interface cleanly with C.

hm, ok. Here is some more manual work necessary. If you give the last reference to a C lib, isn't this a problem for the MS GC also?

I don't know anything about the MS GC.
 8) Ref counting can fragment the heap thereby consuming more memory just
 like the gc can, though the gc typically will consume more memory overall.

Isn't the actual GC implementation also fragmenting the heap?

Yes. malloc will fragment the heap, too. But only GC has the hope of doing compaction.
 But I don't have any idea to this issue.
 
 9) Ref counting does not eliminated latency problems, it just reduces them.

Where do you mean is the latency with RC? The collecting work is done in the smallest pieces all the time. Only the allocation of new memory needs an unbound time to complete.

That's the same with gc. Only upon an allocation is the time unbound.
 But it does not need to stop all
 other threads. e.g. hardware interrupts don't need to be disabled.

GC doesn't require disabling hardware interrupts. It does not need to stop any threads that do not hold references to GC allocated data.
 The proposed C++ shared_ptr<>, which implements ref counting, suffers
 from all these faults. I haven't seen a heads up benchmark of
 shared_ptr<> vs mark/sweep, but I wouldn't be surprised if shared_ptr<>
 turned out to be a significant loser in terms of both performance and
 memory consumption.

 That said, I'd like for D to optionally support some form of ref
 counting, as rc is better for managing scarce resources like file handles.

As I said, I would be happy to be able to switch the whole runtime system to RC. If I would loose 50% performance this is acceptable for me if I get latency < 100us.

Latency cannot be guaranteed even with malloc(). I don't know what constraints you have on the system you're developing. But when I've written real time interrupt code, the ISRs were written to not make any OS calls or any allocations. All they'd do is just gather data and post it to a FIFO buffer. A non-realtime thread would then pull the data out of the buffer and process it. The system would work without dropping data as long as the average (not maximum) processing time per datum was less than the data acquisition rate.
Jun 04 2006
next sibling parent reply Frank Benoit <keinfarbton nospam.xyz> writes:
 I'm not throwing it away so fast, I've thought about it for several
 years <g>.

 If I call a func/method, I hold a valid reference to the object. While
 this reference is copied down the stack, these are only temporary
 reference. And all of them are release before the call returns to the
 caller. You can ignore them all for RC.

This fails if the passed reference 'escapes'. It can escape by being assigned to a global, being assigned to a class member, being inserted into some data structure that lives longer than the function, and being returned by the function.

Yes, and the compiler can catch these cases. With incr/decr the parameter reference you win nothing. So it should be possible to optimize them.
 3) in a multithreaded app, the incs and decs must be synchronized

Isn't a atomic inc/dec enough?

That requires synchronizing.
 (Don't know anything about the
 performance of the asm "lock" instruction)

It's slow enough to be noticable. Yes, you can find out the beginning of an arbitrary pointer's memory block. But that isn't a cheap operation.

yikes, you are absolutely right. So here comes the more ugly solution: Every pointer is allways a struct with the pointer to the data and a pointer to the refcounter.
 7) Fixing the compiler to hide all this stuff from the programmer will
 make it difficult to interface cleanly with C.

hm, ok. Here is some more manual work necessary. If you give the last reference to a C lib, isn't this a problem for the MS GC also?

I don't know anything about the MS GC.

you do :) MS=mark&sweep
 9) Ref counting does not eliminated latency problems, it just reduces
 them.

Where do you mean is the latency with RC? The collecting work is done in the smallest pieces all the time. Only the allocation of new memory needs an unbound time to complete.

That's the same with gc. Only upon an allocation is the time unbound.

And additional the time is unbound for the GC fullcollect...
 
 But it does not need to stop all
 other threads. e.g. hardware interrupts don't need to be disabled.

GC doesn't require disabling hardware interrupts. It does not need to stop any threads that do not hold references to GC allocated data.

If you want to have you ISR written in D, you have two choices: a) be sure that you do not move a reference (refb=refa; refa=null;). This can end up in GC deleting a living object. If you cannot guarantee this, than b) disable the interrupting code also while the GC is running. But this is not acceptable, so go back to a) or have another GC.
 Latency cannot be guaranteed even with malloc().
 
 I don't know what constraints you have on the system you're developing.
 But when I've written real time interrupt code, the ISRs were written to
 not make any OS calls or any allocations. All they'd do is just gather
 data and post it to a FIFO buffer. A non-realtime thread would then pull
 the data out of the buffer and process it. The system would work without
 dropping data as long as the average (not maximum) processing time per
 datum was less than the data acquisition rate.

Yes, I will also not do any OS call, neither I would do a "new". And I can easily check the code for not doing this. I even can do a check into the GC allocator, that it is not called in a ISR. But a moving reference can corrupt the working GC. And there is no way to check for that.
Jun 05 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Frank Benoit wrote:
 This fails if the passed reference 'escapes'. It can escape by being
 assigned to a global, being assigned to a class member, being inserted
 into some data structure that lives longer than the function, and being
 returned by the function.

parameter reference you win nothing. So it should be possible to optimize them.

It can only do it if it has full source available. It also won't work for virtual functions, as it is not known at compile time what code will be called.
 So here comes the more ugly solution:
 Every pointer is allways a struct with the pointer to the data and a
 pointer to the refcounter.

It's the wrapper approach, with two allocations per allocation.
 7) Fixing the compiler to hide all this stuff from the programmer will
 make it difficult to interface cleanly with C.

If you give the last reference to a C lib, isn't this a problem for the MS GC also?


you do :) MS=mark&sweep

I thought you meant MS=Microsoft <g>. No, it isn't a problem, as it scans the C data segment and stack, too.
 9) Ref counting does not eliminated latency problems, it just reduces
 them.

the smallest pieces all the time. Only the allocation of new memory needs an unbound time to complete.



The collector only runs during an allocation request - which means you can completely control when a collection can happen.
 But it does not need to stop all
 other threads. e.g. hardware interrupts don't need to be disabled.

stop any threads that do not hold references to GC allocated data.

a) be sure that you do not move a reference (refb=refa; refa=null;). This can end up in GC deleting a living object. If you cannot guarantee this, than b) disable the interrupting code also while the GC is running. But this is not acceptable, so go back to a) or have another GC.

c) code the ISR so it does not reference any gc allocated data. For example, use malloc() for dynamic data the ISR needs. d) make sure any gc references used by the ISR have another reference to them that the gc will scan (such as in the global data segment).
Jun 05 2006
parent Kevin Bealer <Kevin_member pathlink.com> writes:
In article <e60vmu$1b6t$1 digitaldaemon.com>, Walter Bright says...

 9) Ref counting does not eliminated latency problems, it just reduces
 them.

the smallest pieces all the time. Only the allocation of new memory needs an unbound time to complete.



The collector only runs during an allocation request - which means you can completely control when a collection can happen.

Correct me if I'm wrong, but while this is true for M + S, doesn't RC /kind of/ have the opposite problem? In refcounting systems, allocation would be mostly predictable, a la malloc, but dereferences can trigger a long chain of other dereferences. I say 'kind of' because it is more deterministic than GC - you can always dig deeper to see what is going to be freed by a given dereference. For instance, if you are the last person to let go of a hash table, you have to wait for the whole thing to unravel, along with any objects stored inside. I guess a real-time RC implementation could be made by keeping a 'recycling bin' and only taking apart N objects at a time, i.e. incremental freeing. This would un-guarantee the prompt running of destructors though (which seems to be one of the big rationales for RC in D, right, at least for non-memory objects?) Kevin
Jun 07 2006
prev sibling parent reply Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Walter Bright wrote:
 Only if the reference is stored,
 the ref counter has to be incremented. This is only possible if this is
 done by the compiler. No smart pointer can do this :)

 3) in a multithreaded app, the incs and decs must be synchronized

Isn't a atomic inc/dec enough?

That requires synchronizing.

Are you sure? It says in http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that: "In a multi-threaded system, if threads are preemptively scheduled, reference counting requires additional care to ensure that the updates to reference counts are atomic. This is straightforward using hardware primitives such as fetch-and-add, compare-and-swap, or load-linked/store-conditional, but it is definitely necessary to pay attention to this detail." ? -- Bruno Medeiros - CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jun 07 2006
parent reply Sean Kelly <sean f4.ca> writes:
Bruno Medeiros wrote:
 Walter Bright wrote:
 Only if the reference is stored,
 the ref counter has to be incremented. This is only possible if this is
 done by the compiler. No smart pointer can do this :)

 3) in a multithreaded app, the incs and decs must be synchronized

Isn't a atomic inc/dec enough?

That requires synchronizing.

Are you sure? It says in http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that: "In a multi-threaded system, if threads are preemptively scheduled, reference counting requires additional care to ensure that the updates to reference counts are atomic. This is straightforward using hardware primitives such as fetch-and-add, compare-and-swap, or load-linked/store-conditional, but it is definitely necessary to pay attention to this detail." ?

These are synchronizing operations. Sean
Jun 07 2006
parent reply Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Sean Kelly wrote:
 Bruno Medeiros wrote:
 Walter Bright wrote:
 Only if the reference is stored,
 the ref counter has to be incremented. This is only possible if this is
 done by the compiler. No smart pointer can do this :)

 3) in a multithreaded app, the incs and decs must be synchronized

Isn't a atomic inc/dec enough?

That requires synchronizing.

Are you sure? It says in http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that: "In a multi-threaded system, if threads are preemptively scheduled, reference counting requires additional care to ensure that the updates to reference counts are atomic. This is straightforward using hardware primitives such as fetch-and-add, compare-and-swap, or load-linked/store-conditional, but it is definitely necessary to pay attention to this detail." ?

These are synchronizing operations. Sean

By "That requires synchronizing." I thought Walter meant stuff with mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware synchronizing operations be called "atomic inc/dec" ? -- Bruno Medeiros - CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jun 10 2006
parent reply Sean Kelly <sean f4.ca> writes:
Bruno Medeiros wrote:
 Sean Kelly wrote:
 Bruno Medeiros wrote:
 Walter Bright wrote:
 Only if the reference is stored,
 the ref counter has to be incremented. This is only possible if 
 this is
 done by the compiler. No smart pointer can do this :)

 3) in a multithreaded app, the incs and decs must be synchronized

Isn't a atomic inc/dec enough?

That requires synchronizing.

Are you sure? It says in http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that: "In a multi-threaded system, if threads are preemptively scheduled, reference counting requires additional care to ensure that the updates to reference counts are atomic. This is straightforward using hardware primitives such as fetch-and-add, compare-and-swap, or load-linked/store-conditional, but it is definitely necessary to pay attention to this detail." ?

These are synchronizing operations.

By "That requires synchronizing." I thought Walter meant stuff with mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware synchronizing operations be called "atomic inc/dec" ?

For the most part. There are really two issues here. First, the operation must be atomic with respect to other threads. The x86 actually guarantees this for all load/store operations on aligned data <= the bus size. Second, the operations must be observed in the proper order by other threads--ie. the compiler and all involved CPUs (reader and writer) are not allowed to reorder the operations or do any other fancy tricks that may result in a load/store order that isn't the same as program order. For the compiler, this is where volatile comes in, and at the hardware level, a synchronizing operation may be required depending on the situation and on the hardware involved. Again, with x86 you can get away without a LOCK a lot of the time, but not always. So you really must assume that any competing operation has a worst case cost of... well, a lot. Sean
Jun 10 2006
parent reply Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Sean Kelly wrote:
 Bruno Medeiros wrote:
 Sean Kelly wrote:
 Bruno Medeiros wrote:
 Walter Bright wrote:
 Only if the reference is stored,
 the ref counter has to be incremented. This is only possible if 
 this is
 done by the compiler. No smart pointer can do this :)

 3) in a multithreaded app, the incs and decs must be synchronized

Isn't a atomic inc/dec enough?

That requires synchronizing.

Are you sure? It says in http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that: "In a multi-threaded system, if threads are preemptively scheduled, reference counting requires additional care to ensure that the updates to reference counts are atomic. This is straightforward using hardware primitives such as fetch-and-add, compare-and-swap, or load-linked/store-conditional, but it is definitely necessary to pay attention to this detail." ?

These are synchronizing operations.

By "That requires synchronizing." I thought Walter meant stuff with mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware synchronizing operations be called "atomic inc/dec" ?

For the most part. There are really two issues here. First, the operation must be atomic with respect to other threads. The x86 actually guarantees this for all load/store operations on aligned data <= the bus size. Second, the operations must be observed in the proper order by other threads--ie. the compiler and all involved CPUs (reader and writer) are not allowed to reorder the operations or do any other fancy tricks that may result in a load/store order that isn't the same as program order. For the compiler, this is where volatile comes in, and at the hardware level, a synchronizing operation may be required depending on the situation and on the hardware involved. Again, with x86 you can get away without a LOCK a lot of the time, but not always. So you really must assume that any competing operation has a worst case cost of... well, a lot. Sean

Ah, I see. I had recently read about relative out-of-order execution problems in the Memory Barrier wikipedia entry, and I got the impression (out of nowhere) that the hardware took care of that transparently (i.e., without program intervention), but then, that is not the case, not allways at least? -- Bruno Medeiros - CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jun 12 2006
parent reply Sean Kelly <sean f4.ca> writes:
Bruno Medeiros wrote:
 
 Ah, I see. I had recently read about relative out-of-order execution 
 problems in the Memory Barrier wikipedia entry, and I got the impression 
 (out of nowhere) that the hardware took care of that transparently 
 (i.e., without program intervention), but then, that is not the case, 
 not allways at least?

A single CPU is allowed to do whatever it wants so long as it can fool the user into thinking it's executing instructions in a purely sequential manner. However, the only information other CPUs in the system have is what they observe on the system bus. Obviously, so long as CPUs aren't sharing data there aren't any problems. But things get sticky when this isn't the case. The memory model defines observable behavior allowed for a given architecture, as well as any methods offered for affecting that behavior. Say a specific architecture can operate much more quickly if it is allowed to perform writes to memory (from cache) in order of ascending address rather than in the order they were issued in code. There's no way to lie to other CPUs about the order in which writes occur and still have the optimization have any effect, so the designers state in the memory model spec that this architecture is allowed to reorder writes and then typically provide some means for overriding this behavior should it prove necessary. Now let's suppose you have two threads doing something like this: thread/CPU 1: A = 1; B = 2; thread/CPU 2: if( A == 1 ) { assert( B == 2 ); } Given the order in which a and b were declared, and therefore the order in which the writes occur, this assert may or may not fail. Enter memory barriers. Memory barriers are simply a way for the programmer to tell the CPU "I don't care if your way is faster, A simply must be written to memory before B or thread 2 will explode." So the CPU behind thread 1 does as it's told at great expense to performance and thread 2 doesn't melt down. The sticky part is that hardware designers don't agree with one another on how things should work and they never take the advice of the software people, so all architectures have different sets of observable behavior and different methods for working around it when necessary. However, the common concept is that memory barriers all constrain the order in which memory accesses may occur with respect to each other. Think of it as an assertion that "X may not occur before Y" or "X may not occur after Y" at the instruction level. The x86 is actually a bit weird in this regard as it has no formal memory barriers for normal operations (though it has the FENCE instructions for SSE use). I think this is largely for historical reasons--x86 PCs couldn't do SMP at all until fairly recently so none of this mattered, and the memory model has always been fairly strict (it was actually sequential until not terribly long ago). Also, the LOCK instruction acts as a heavy-handed sort of memory barrier as well, so there has been little motivation to add new instructions for finer-grained control. Sean
Jun 13 2006
next sibling parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Sean Kelly wrote:
 Bruno Medeiros wrote:
 Ah, I see. I had recently read about relative out-of-order execution
 problems in the Memory Barrier wikipedia entry, and I got the
 impression (out of nowhere) that the hardware took care of that
 transparently (i.e., without program intervention), but then, that is
 not the case, not allways at least?

A single CPU is allowed to do whatever it wants so long as it can fool the user into thinking it's executing instructions in a purely sequential manner. However, the only information other CPUs in the system have is what they observe on the system bus. Obviously, so long as CPUs aren't sharing data there aren't any problems. But things get sticky when this isn't the case. The memory model defines observable behavior allowed for a given architecture, as well as any methods offered for affecting that behavior. Say a specific architecture can operate much more quickly if it is allowed to perform writes to memory (from cache) in order of ascending address rather than in the order they were issued in code. There's no way to lie to other CPUs about the order in which writes occur and still have the optimization have any effect, so the designers state in the memory model spec that this architecture is allowed to reorder writes and then typically provide some means for overriding this behavior should it prove necessary. Now let's suppose you have two threads doing something like this: thread/CPU 1: A = 1; B = 2; thread/CPU 2: if( A == 1 ) { assert( B == 2 ); } Given the order in which a and b were declared, and therefore the order in which the writes occur, this assert may or may not fail. Enter memory barriers. Memory barriers are simply a way for the programmer to tell the CPU "I don't care if your way is faster, A simply must be written to memory before B or thread 2 will explode." So the CPU behind thread 1 does as it's told at great expense to performance and thread 2 doesn't melt down. The sticky part is that hardware designers don't agree with one another on how things should work and they never take the advice of the software people, so all architectures have different sets of observable behavior and different methods for working around it when necessary. However, the common concept is that memory barriers all constrain the order in which memory accesses may occur with respect to each other. Think of it as an assertion that "X may not occur before Y" or "X may not occur after Y" at the instruction level. The x86 is actually a bit weird in this regard as it has no formal memory barriers for normal operations (though it has the FENCE instructions for SSE use). I think this is largely for historical reasons--x86 PCs couldn't do SMP at all until fairly recently so none of this mattered, and the memory model has always been fairly strict (it was actually sequential until not terribly long ago). Also, the LOCK instruction acts as a heavy-handed sort of memory barrier as well, so there has been little motivation to add new instructions for finer-grained control. Sean

Cool; learn something new every day. Thanks for the informative post. -- Daniel -- Unlike Knuth, I have neither proven or tried the above; it may not even make sense. v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Jun 14 2006
prev sibling parent reply Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Sean Kelly wrote:
 Bruno Medeiros wrote:
 Ah, I see. I had recently read about relative out-of-order execution 
 problems in the Memory Barrier wikipedia entry, and I got the 
 impression (out of nowhere) that the hardware took care of that 
 transparently (i.e., without program intervention), but then, that is 
 not the case, not allways at least?

A single CPU is allowed to do whatever it wants so long as it can fool the user into thinking it's executing instructions in a purely sequential manner. However, the only information other CPUs in the system have is what they observe on the system bus. Obviously, so long as CPUs aren't sharing data there aren't any problems. But things get sticky when this isn't the case. The memory model defines observable behavior allowed for a given architecture, as well as any methods offered for affecting that behavior. Say a specific architecture can operate much more quickly if it is allowed to perform writes to memory (from cache) in order of ascending address rather than in the order they were issued in code. There's no way to lie to other CPUs about the order in which writes occur and still have the optimization have any effect, so the designers state in the memory model spec that this architecture is allowed to reorder writes and then typically provide some means for overriding this behavior should it prove necessary. Now let's suppose you have two threads doing something like this: thread/CPU 1: A = 1; B = 2; thread/CPU 2: if( A == 1 ) { assert( B == 2 ); } Given the order in which a and b were declared, and therefore the order in which the writes occur, this assert may or may not fail. Enter memory barriers. Memory barriers are simply a way for the programmer to tell the CPU "I don't care if your way is faster, A simply must be written to memory before B or thread 2 will explode." So the CPU behind thread 1 does as it's told at great expense to performance and thread 2 doesn't melt down. The sticky part is that hardware designers don't agree with one another on how things should work and they never take the advice of the software people, so all architectures have different sets of observable behavior and different methods for working around it when necessary. However, the common concept is that memory barriers all constrain the order in which memory accesses may occur with respect to each other. Think of it as an assertion that "X may not occur before Y" or "X may not occur after Y" at the instruction level. The x86 is actually a bit weird in this regard as it has no formal memory barriers for normal operations (though it has the FENCE instructions for SSE use). I think this is largely for historical reasons--x86 PCs couldn't do SMP at all until fairly recently so none of this mattered, and the memory model has always been fairly strict (it was actually sequential until not terribly long ago). Also, the LOCK instruction acts as a heavy-handed sort of memory barrier as well, so there has been little motivation to add new instructions for finer-grained control. Sean

Nice post! Makes me think, how does one keep up with this? I mean, one who isn't (nor wishes to be) a hardware expert, but wants to keep up with the general developments in this area, thus maintaining an overview of it. -- Bruno Medeiros - CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jun 14 2006
parent reply Sean Kelly <sean f4.ca> writes:
Bruno Medeiros wrote:
 Makes me think, how does one keep up with this? I mean, one who isn't 
 (nor wishes to be) a hardware expert, but wants to keep up with the 
 general developments in this area, thus maintaining an overview of it.

comp.programming.threads is worth keeping an eye on, though the jargon can get a bit thick there at times. The C++ committee is also working on a new memory model for C++, so any discussion there may be useful. The easiest way to keep an eye on this is follow the links from Hans Boehm's website (http://www.hpl.hp.com/personal/Hans_Boehm/) though you could keep an eye on comp.std.c++ as well if you're having trouble staying awake at night. Finally, any paper with Maurice Herlihy's name on it is a useful resource if you want a deeper understanding of some of these ideas and don't mind some research. He's got a website with links to many of his papers, though IIRC some of them are hard to track down without an ACM membership. Sean
Jun 14 2006
next sibling parent reply "Lionello Lunesu" <lionello lunesu.remove.com> writes:
Interesting indeed!

I'm currently reading Boehm's article "Threads Cannot be Implemented as a 
Library". I wonder if this means that threads should become primitives of 
some sort [in D] ?

L. 
Jun 14 2006
parent reply Sean Kelly <sean f4.ca> writes:
Lionello Lunesu wrote:
 Interesting indeed!
 
 I'm currently reading Boehm's article "Threads Cannot be Implemented as a 
 Library". I wonder if this means that threads should become primitives of 
 some sort [in D] ?

It's been a while since I read that article, but I think D already has enough in-language recognition of threading issues to escape most/all of the problems Boehm mentions: Thread is a standard library component rather than a third-party component, synchronized is available for mutual exclusion, and volatile (plus perhaps inline asm code) is enough to manage lock-free work. A more robust multithread-aware memory model would be good to have, but it's a sticky enough issue that I think Walter is right for waiting on that for now. Sean
Jun 14 2006
parent reply Lionello Lunesu <lio lunesu.remove.com> writes:
Sean Kelly wrote:
 Lionello Lunesu wrote:
 Interesting indeed!

 I'm currently reading Boehm's article "Threads Cannot be Implemented 
 as a Library". I wonder if this means that threads should become 
 primitives of some sort [in D] ?

It's been a while since I read that article, but I think D already has enough in-language recognition of threading issues to escape most/all of the problems Boehm mentions: Thread is a standard library component rather than a third-party component, synchronized is available for mutual exclusion, and volatile (plus perhaps inline asm code) is enough to manage lock-free work. A more robust multithread-aware memory model would be good to have, but it's a sticky enough issue that I think Walter is right for waiting on that for now.

...that leaves only atomic operations? Or would "volatile" or "synchronized" take care of memory fencing and such? Would be nice if there were an built-in type with overloaded ++x, x++, x=y, all atomic. (I've seen some C++ templates that do this.) L.
Jun 15 2006
parent reply Sean Kelly <sean f4.ca> writes:
Lionello Lunesu wrote:
 Sean Kelly wrote:
 Lionello Lunesu wrote:
 Interesting indeed!

 I'm currently reading Boehm's article "Threads Cannot be Implemented 
 as a Library". I wonder if this means that threads should become 
 primitives of some sort [in D] ?

It's been a while since I read that article, but I think D already has enough in-language recognition of threading issues to escape most/all of the problems Boehm mentions: Thread is a standard library component rather than a third-party component, synchronized is available for mutual exclusion, and volatile (plus perhaps inline asm code) is enough to manage lock-free work. A more robust multithread-aware memory model would be good to have, but it's a sticky enough issue that I think Walter is right for waiting on that for now.

...that leaves only atomic operations? Or would "volatile" or "synchronized" take care of memory fencing and such?

"volatile" is actually intended to address compiler optimizations in a similar way to how memory barriers address CPU optimizations. It is a necessary part of any lock-free operation in D. "synchronized" is used for mutual exclusion and typically involves a mutex, so while it should have the proper effect, it's not lock-free.
 Would be nice if there were an built-in type with overloaded ++x, x++, 
 x=y, all atomic. (I've seen some C++ templates that do this.)

It would be nice, but I've yet to see a proposal that I like. The problem is that in addition to pure atomic operations (which is how the x86 typically works by default) lock-free programmers typically want to make an assertion about instruction ordering as well, and built-in semantics don't expose this very cleanly. One could simply have volatile types similar to Java where no optimization is allowed on them at all, but this may be too heavy-handed to satisfy some folks. For now, I think a library solution is a reasonable alternative. Ares has had one for quite a while and it's almost standalone so it could be used with Phobos with very little effort. The documentation is here (DDoc seems to want to generate docs for private template functions, so I apologize for the clutter): http://svn.dsource.org/projects/ares/trunk/doc/ares/std/atomic.html And the code itself is here: http://svn.dsource.org/projects/ares/trunk/src/ares/std/atomic.d It currently only works on the x86 and is really intended for API developers, but it does the trick and I'll expand it as needed. For typical use, msync.acq (acquire), msync.rel (release), and msync.none are the flags you're likely to care about. The other more fine-grained flags just alias acq/rel on x86 anyway. In case the docs are confusing, the API calls supported are: load store storeIf (ie. CAS) increment decrement Sean
Jun 15 2006
next sibling parent reply Lionello Lunesu <lio lunesu.remove.com> writes:
Sean Kelly wrote:
 Lionello Lunesu wrote:
 ...that leaves only atomic operations? Or would "volatile" or 
 "synchronized" take care of memory fencing and such?

"volatile" is actually intended to address compiler optimizations in a similar way to how memory barriers address CPU optimizations. It is a necessary part of any lock-free operation in D. "synchronized" is used for mutual exclusion and typically involves a mutex, so while it should have the proper effect, it's not lock-free.

So I suppose "volatile" could be extended to include memory locking as well! // instructions inside this block are 'lock'ed volatile(atomic) { ++a; a = x; } (or perhaps even "synchronized(memory) {...}", but I think extending volatile makes more sense). Come to think of it, doesn't "synchronized" also imply "volatile"? Or, another possibility would be to add the volatile declaration as in C, but then actually locking all access to the variable: volatile int i; i++;// atomic Seems to me it should be possibly to extend D with a built-in and portable construct for these atomic operations.
 It would be nice, but I've yet to see a proposal that I like.  The 
 problem is that in addition to pure atomic operations (which is how the 
 x86 typically works by default) lock-free programmers typically want to 
 make an assertion about instruction ordering as well, and built-in 
 semantics don't expose this very cleanly.  One could simply have 
 volatile types similar to Java where no optimization is allowed on them 
 at all, but this may be too heavy-handed to satisfy some folks.  For 
 now, I think a library solution is a reasonable alternative.  Ares has 
 had one for quite a while and it's almost standalone so it could be used 
 with Phobos with very little effort.  The documentation is here (DDoc 
 seems to want to generate docs for private template functions, so I 
 apologize for the clutter):
 
 http://svn.dsource.org/projects/ares/trunk/doc/ares/std/atomic.html
 
 And the code itself is here:
 
 http://svn.dsource.org/projects/ares/trunk/src/ares/std/atomic.d

Thanks, I'll have a look at them. L.
Jun 16 2006
parent Sean Kelly <sean f4.ca> writes:
Lionello Lunesu wrote:
 Sean Kelly wrote:
 Lionello Lunesu wrote:
 ...that leaves only atomic operations? Or would "volatile" or 
 "synchronized" take care of memory fencing and such?

"volatile" is actually intended to address compiler optimizations in a similar way to how memory barriers address CPU optimizations. It is a necessary part of any lock-free operation in D. "synchronized" is used for mutual exclusion and typically involves a mutex, so while it should have the proper effect, it's not lock-free.

So I suppose "volatile" could be extended to include memory locking as well!

Yes it could, though for now I prefer it the way it is as it offers more control over exactly what's going on. It helps tremendously that D has such good inline asm support.
 // instructions inside this block are 'lock'ed
 volatile(atomic) {
     ++a;
     a = x;
 }
 
 (or perhaps even "synchronized(memory) {...}", but I think extending 
 volatile makes more sense).
 
 Come to think of it, doesn't "synchronized" also imply "volatile"?

Yes. Synchronization mechanisms rely on memory barriers to work properly. In fact, it's been suggested in the past on c.p.t that an empty mutex block: synchronzed {} could be used as a high-level sort of bidirectional memory barrier, except for the risk of the compiler optimizing the call away entirely.
 Or, another possibility would be to add the volatile declaration as in 
 C, but then actually locking all access to the variable:
 
 volatile int i;
 i++;// atomic

This is sort of how Java implements volatile and it wouldn't surprise me if C++ did something similar. Right now, the "volatile" qualifier offers basically no language guarantees for concurrent programming, as it was actually intended for accessing interrupt data IIRC.
 Seems to me it should be possibly to extend D with a built-in and 
 portable construct for these atomic operations.

Definately. The trouble is coming up with a good design :-) I think the C++ team is definately qualified to do so, but they may have to make some sacrifices in the interest of time. Sean
Jun 16 2006
prev sibling parent reply Don Clugston <dac nospam.com.au> writes:
Sean Kelly wrote:
 Lionello Lunesu wrote:
 Sean Kelly wrote:
 Lionello Lunesu wrote:
 Interesting indeed!

 I'm currently reading Boehm's article "Threads Cannot be Implemented 
 as a Library". I wonder if this means that threads should become 
 primitives of some sort [in D] ?

It's been a while since I read that article, but I think D already has enough in-language recognition of threading issues to escape most/all of the problems Boehm mentions: Thread is a standard library component rather than a third-party component, synchronized is available for mutual exclusion, and volatile (plus perhaps inline asm code) is enough to manage lock-free work. A more robust multithread-aware memory model would be good to have, but it's a sticky enough issue that I think Walter is right for waiting on that for now.

...that leaves only atomic operations? Or would "volatile" or "synchronized" take care of memory fencing and such?

"volatile" is actually intended to address compiler optimizations in a similar way to how memory barriers address CPU optimizations. It is a necessary part of any lock-free operation in D. "synchronized" is used for mutual exclusion and typically involves a mutex, so while it should have the proper effect, it's not lock-free.
 Would be nice if there were an built-in type with overloaded ++x, x++, 
 x=y, all atomic. (I've seen some C++ templates that do this.)

It would be nice, but I've yet to see a proposal that I like. The problem is that in addition to pure atomic operations (which is how the x86 typically works by default) lock-free programmers typically want to make an assertion about instruction ordering as well, and built-in semantics don't expose this very cleanly. One could simply have volatile types similar to Java where no optimization is allowed on them at all, but this may be too heavy-handed to satisfy some folks. For now, I think a library solution is a reasonable alternative. Ares has had one for quite a while and it's almost standalone so it could be used with Phobos with very little effort.

I'd really like to see this included in the standard distribution. Has anyone made a general-purpose lock-free linked list? Just a low-level list of void * pointers would be fantastic. The documentation is here (DDoc
 seems to want to generate docs for private template functions, so I 
 apologize for the clutter):
 
 http://svn.dsource.org/projects/ares/trunk/doc/ares/std/atomic.html
 
 And the code itself is here:
 
 http://svn.dsource.org/projects/ares/trunk/src/ares/std/atomic.d
 
 It currently only works on the x86 and is really intended for API 
 developers, but it does the trick and I'll expand it as needed.  For 
 typical use, msync.acq (acquire), msync.rel (release), and msync.none 
 are the flags you're likely to care about.  The other more fine-grained 
 flags just alias acq/rel on x86 anyway.  In case the docs are confusing, 
 the API calls supported are:
 
 load
 store
 storeIf (ie. CAS)
 increment
 decrement
 
 
 Sean

Jun 16 2006
parent Sean Kelly <sean f4.ca> writes:
Don Clugston wrote:
 
 I'd really like to see this included in the standard distribution.
 Has anyone made a general-purpose lock-free linked list? Just a 
 low-level list of void * pointers would be fantastic.

Mango contains some of Doug Lea's containers and there may be one there. If not, it shouldn't be tremendously difficult to implement or port one. I'm not sure I have the time right now, but if someone wants to the the atomic package in Ares they're more than welcome to. Sean
Jun 16 2006
prev sibling parent reply Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Sean Kelly wrote:
 Bruno Medeiros wrote:
 Makes me think, how does one keep up with this? I mean, one who isn't 
 (nor wishes to be) a hardware expert, but wants to keep up with the 
 general developments in this area, thus maintaining an overview of it.

comp.programming.threads is worth keeping an eye on, though the jargon can get a bit thick there at times. The C++ committee is also working on a new memory model for C++, so any discussion there may be useful. The easiest way to keep an eye on this is follow the links from Hans Boehm's website (http://www.hpl.hp.com/personal/Hans_Boehm/) though you could keep an eye on comp.std.c++ as well if you're having trouble staying awake at night. Finally, any paper with Maurice Herlihy's name on it is a useful resource if you want a deeper understanding of some of these ideas and don't mind some research. He's got a website with links to many of his papers, though IIRC some of them are hard to track down without an ACM membership. Sean

Well, that's the thing, I just want to have some general knowledge about this area of hardware and concurrency/distributed-systems, not become an expert on it. My time to learn new things is limited (and I definitely have no trouble going to sleep :( ), so my priority goes for learning things that I have immediate need to use/become-involved. If I ever have the need to learn more in-depth I will. You see, concurrency is very important for people interested in server-side programming. But for multimedia programming, it's not that important. For now that is, as it seems that with the coming of multicore CPUs, concurrency is becoming more and more of a general software development topic, relevant even for non-intrinsically concurrent apps. -- Bruno Medeiros - CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jun 17 2006
parent Sean Kelly <sean f4.ca> writes:
Bruno Medeiros wrote:
 Sean Kelly wrote:
 Bruno Medeiros wrote:
 Makes me think, how does one keep up with this? I mean, one who isn't 
 (nor wishes to be) a hardware expert, but wants to keep up with the 
 general developments in this area, thus maintaining an overview of it.

comp.programming.threads is worth keeping an eye on, though the jargon can get a bit thick there at times. The C++ committee is also working on a new memory model for C++, so any discussion there may be useful. The easiest way to keep an eye on this is follow the links from Hans Boehm's website (http://www.hpl.hp.com/personal/Hans_Boehm/) though you could keep an eye on comp.std.c++ as well if you're having trouble staying awake at night. Finally, any paper with Maurice Herlihy's name on it is a useful resource if you want a deeper understanding of some of these ideas and don't mind some research. He's got a website with links to many of his papers, though IIRC some of them are hard to track down without an ACM membership.

Well, that's the thing, I just want to have some general knowledge about this area of hardware and concurrency/distributed-systems, not become an expert on it. My time to learn new things is limited (and I definitely have no trouble going to sleep :( ), so my priority goes for learning things that I have immediate need to use/become-involved. If I ever have the need to learn more in-depth I will. You see, concurrency is very important for people interested in server-side programming. But for multimedia programming, it's not that important. For now that is, as it seems that with the coming of multicore CPUs, concurrency is becoming more and more of a general software development topic, relevant even for non-intrinsically concurrent apps.

Sadly, I don't know of any more basic sources of information on this stuff. David Butenhof's "Programming with POSIX Threads" is quite good for the basic concepts, but beyond that it's mostly research papers, wiki topics, etc. But don't hesitate to ask on comp.programming.threads or here if you have a question. Sean
Jun 17 2006
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
For what it's worth, incremental GC is similar to refcounting in that 
the cost is distributed across pointer use to avoid the need for a 
costly mark/sweep phase.  I've even seen hard-realtime incremental GC 
implementations, so it's a long-term solution worth considering.  That 
said, I would like to be able to create some sort of smart pointer in D 
for tracking valuable resources, so it's good to hear that Walter is 
considering this as well.

Frank Benoit wrote:
 3) in a multithreaded app, the incs and decs must be synchronized

Isn't a atomic inc/dec enough? (Don't know anything about the performance of the asm "lock" instruction)

Depending on the platform and the synch. requirement, a "lock" type instruction may be necessary. It typically is for non-x86 platforms, but I think you could get away without using one in some cases on x86. The best place to check for optimizations would be the Boost shared_ptr code. IIRC they don't use the Interlocked calls but there are spin locks in some places. The cost of a "lock" instruction is variable based on the hardware, number of CPUs, whether the cache line is shared, etc, but the most reliable estimate I've seen averages locked ops at around 70ns.
 9) Ref counting does not eliminated latency problems, it just reduces them.

Where do you mean is the latency with RC? The collecting work is done in the smallest pieces all the time. Only the allocation of new memory needs an unbound time to complete. But it does not need to stop all other threads. e.g. hardware interrupts don't need to be disabled.

I've seen analyses that suggested reference counting is actually slower than mark/sweep when measured over the run of the program. The advantage is the avoidance of the unbounded mark/sweep phase, with far more expensive pointer modifications instead. As above, I'd almost rather see incremental GC support in D (it typically requires additional ode generation on pointer modifications, and may be tricky to sort out for C routines like memset). Sean
Jun 05 2006
parent reply Jeff Parsons <jeffrparsons optusnet.com.au> writes:
Sean Kelly wrote:
 For what it's worth, incremental GC is similar to refcounting in that 
 the cost is distributed across pointer use to avoid the need for a 
 costly mark/sweep phase.  I've even seen hard-realtime incremental GC 
 implementations, so it's a long-term solution worth considering.

My _only_ significant beef with D so far as been the unbound time for a garbage collection; therefore, if an incremental garbage collector built into D could guarantee me a set maximum time would be taken for each garbage collection (which I would manually induce as often as possible) then unless the efficiency hit is TERRIBLE I would be all over it. What I'm really curious about, however, is how important this is to _other_ people - especially Walter. Would it be reasonable to say that it seems D has been designed largely without long-term real-time performance in mind? Sure, there are workarounds, but I haven't seen anything elegant yet that will let me use D for such applications without making my memory management more complicated than it would have been in C++. Is anything in the works?
Jun 07 2006
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Jeff Parsons wrote:
 
 What I'm really curious about, however, is how important this is to 
 _other_ people - especially Walter. Would it be reasonable to say that 
 it seems D has been designed largely without long-term real-time 
 performance in mind? Sure, there are workarounds, but I haven't seen 
 anything elegant yet that will let me use D for such applications 
 without making my memory management more complicated than it would have 
 been in C++. Is anything in the works?

The spec doesn't contain any language forbidding such an approach, but I don't think the popular compilers will support it by default. Certainly not any time soon, at any rate. The "problem" with incremental GC is that it requires extra code generation for all pointer modifications to signal the garbage collector that some re-scanning may be necessary. This has obvious performance implications for small apps that don't ever actually need to collect, but tends to level out as run time increases. And, of course, the potential for progress guarantees (possible with incremental GC, but not at all common) is a huge bonus for realtime programming. That aside, there may be other approaches to garbage collection that would make everyone happy and that don't require additional code generation. If you know of one, please say so. My own knowledge of the topic is limited to what I've read in research papers. Sean
Jun 08 2006
parent Larry Evans <cppljevans cos-internet.com> writes:
On 06/08/2006 06:04 PM, Sean Kelly wrote:
 Jeff Parsons wrote:
 

 performance in mind? Sure, there are workarounds, but I haven't seen 
 anything elegant yet that will let me use D for such applications 
 without making my memory management more complicated than it would 
 have been in C++. Is anything in the works?

The spec doesn't contain any language forbidding such an approach, but I don't think the popular compilers will support it by default. Certainly

 programming.  That aside, there may be other approaches to garbage 
 collection that would make everyone happy and that don't require 
 additional code generation.  If you know of one, please say so.  My own 
 knowledge of the topic is limited to what I've read in research papers.

current Boehm collector. IOW, any ptr<BW> p(T) on the stack would be a root pointer and each data structure would have a map of ptr<BW> locations. Similarly, there could be ptr<Copy> or ptr<IncBw> or even ptr<deterministic> and ptr<collected> instead of the class [collected] SomeClass{...}; class [deterministic] OtherClass{...} proposed by Adrei here: http://tinyurl.com/o5zpm This, of course, would require cooperation from the compiler, but the same it would be true of: class [collected] SomeClass{...}; it's just that some specializations, ptr<collected>, would be predetermined and others, for example, user-defined ones, would not.
Jun 08 2006
prev sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Jeff Parsons wrote:
 My _only_ significant beef with D so far as been the unbound time for a 
 garbage collection; therefore, if an incremental garbage collector built 
 into D could guarantee me a set maximum time would be taken for each 
 garbage collection (which I would manually induce as often as possible) 
 then unless the efficiency hit is TERRIBLE I would be all over it.

If you're looking for a *guarantee*, then you cannot even use malloc(). Malloc has unbounded time, and is not predictable. The approach most real time programmers use to deal with this is to preallocate all needed data before entering the real time section, and the real time section does no malloc's or free's. Also, you can use malloc and free in D, in exactly the same way you would in C.
Jun 08 2006
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 Jeff Parsons wrote:
 My _only_ significant beef with D so far as been the unbound time for 
 a garbage collection; therefore, if an incremental garbage collector 
 built into D could guarantee me a set maximum time would be taken for 
 each garbage collection (which I would manually induce as often as 
 possible) then unless the efficiency hit is TERRIBLE I would be all 
 over it.

If you're looking for a *guarantee*, then you cannot even use malloc(). Malloc has unbounded time, and is not predictable.

For what it's worth, I have read papers describing hard real-time incremental garbage collectors, but I haven't yet seen one available outside research circles. I'd like to give one a shot at some point, but I've far too many other distractions to know when that will be. If anyone is interested however, some of the papers are available here: http://www.cs.utexas.edu/users/oops/papers.html See papers 11 and 14.
 The approach most real time programmers use to deal with this is to 
 preallocate all needed data before entering the real time section, and 
 the real time section does no malloc's or free's.

This is certainly the easiest and probably the safest method. Sean
Jun 08 2006
parent Walter Bright <newshound digitalmars.com> writes:
Sean Kelly wrote:
 Walter Bright wrote:
 The approach most real time programmers use to deal with this is to 
 preallocate all needed data before entering the real time section, and 
 the real time section does no malloc's or free's.

This is certainly the easiest and probably the safest method.

Yup. It gets the job done. Real time threads should be considered a very valuable resource, and as such all possible computation should be removed from them and put into non-realtime threads.
Jun 09 2006
prev sibling next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Thu, 08 Jun 2006 22:18:24 -0700, Walter Bright wrote:


 The approach most real time programmers use to deal with this is to 
 preallocate all needed data before entering the real time section, and 
 the real time section does no malloc's or free's.
 
 Also, you can use malloc and free in D, in exactly the same way you 
 would in C.

It seems obvious that if one doesn't want the Garbage Collector to collect garbage, one doesn't create any garbage for it to collect. ;-) -- Derek (skype: derek.j.parnell) Melbourne, Australia "Down with mediocrity!" 9/06/2006 4:44:56 PM
Jun 08 2006
parent reply Sean Kelly <sean f4.ca> writes:
Derek Parnell wrote:
 On Thu, 08 Jun 2006 22:18:24 -0700, Walter Bright wrote:
 
 
 The approach most real time programmers use to deal with this is to 
 preallocate all needed data before entering the real time section, and 
 the real time section does no malloc's or free's.

 Also, you can use malloc and free in D, in exactly the same way you 
 would in C.

It seems obvious that if one doesn't want the Garbage Collector to collect garbage, one doesn't create any garbage for it to collect. ;-)

This gets sticky if one wants to use strings in D though, as there's no real way to make them work outside of the garbage collector mechanism. Sure we could use C-style pointers, but it's not a terribly attractive option. I suppose the alternative would be a class with a custom allocator method. Sean
Jun 08 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Sean Kelly wrote:
 Derek Parnell wrote:
 On Thu, 08 Jun 2006 22:18:24 -0700, Walter Bright wrote:


 The approach most real time programmers use to deal with this is to 
 preallocate all needed data before entering the real time section, 
 and the real time section does no malloc's or free's.

 Also, you can use malloc and free in D, in exactly the same way you 
 would in C.

It seems obvious that if one doesn't want the Garbage Collector to collect garbage, one doesn't create any garbage for it to collect. ;-)

This gets sticky if one wants to use strings in D though, as there's no real way to make them work outside of the garbage collector mechanism. Sure we could use C-style pointers, but it's not a terribly attractive option.

You can use D arrays. Just allocate them using malloc (or statically allocate them). Don't concatenate them. Though I would ask why the real time code was allocating strings? C is very good for writing real time code, and you can write "C"-style code in D, and it will behave exactly like the corresponding C code. You can call printf, strcpy, malloc, etc., which will behave just like C's printf, strcpy, and malloc because they *are* C's functions. It will not do a gc collect unless the code allocates gc memory, and it will ONLY do a gc collect during such an allocation. Heck, my old Empire game was written in C. I renamed the files from .c to .d, did a few minor syntactic edits, and voila! it was now in D. It runs exactly the same. But the bottom line is that a real time thread should not be doing storage allocation. It shouldn't be calling new, or malloc. It shouldn't do storage allocation regardless of whether it is written in D, C, C++, or assembler.
Jun 09 2006
parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 Sean Kelly wrote:
 Derek Parnell wrote:
 On Thu, 08 Jun 2006 22:18:24 -0700, Walter Bright wrote:


 The approach most real time programmers use to deal with this is to 
 preallocate all needed data before entering the real time section, 
 and the real time section does no malloc's or free's.

 Also, you can use malloc and free in D, in exactly the same way you 
 would in C.

It seems obvious that if one doesn't want the Garbage Collector to collect garbage, one doesn't create any garbage for it to collect. ;-)

This gets sticky if one wants to use strings in D though, as there's no real way to make them work outside of the garbage collector mechanism. Sure we could use C-style pointers, but it's not a terribly attractive option.

You can use D arrays. Just allocate them using malloc (or statically allocate them). Don't concatenate them.

So then I'd just manually set the ptr and len members? Sounds reasonable, provided everyone behaves and uses the copy on write idiom you suggest.
 Though I would ask why the real time code was allocating strings?

I was mostly thinking about this in the context of just how much allocation pretty much must be performed by the GC for applications with special memory requirements (gigabytes of data, for example). It just seemed to have an overlap insofar as this sort of programming is concerned.
 C is very good for writing real time code, and you can write "C"-style 
 code in D, and it will behave exactly like the corresponding C code. You 
 can call printf, strcpy, malloc, etc., which will behave just like C's 
 printf, strcpy, and malloc because they *are* C's functions. It will not 
 do a gc collect unless the code allocates gc memory, and it will ONLY do 
 a gc collect during such an allocation.

At times I've considered extending thread support to allow for the creation of detached threads, but it simply seems to error-prone to be worthwhile. I think you're right in that if someone is doing real-time programming then they shouldn't expect much hand-holding, because the assumptions involved offer too great a risk of unexpected behavior.
 Heck, my old Empire game was written in C. I renamed the files from .c 
 to .d, did a few minor syntactic edits, and voila! it was now in D. It 
 runs exactly the same.

Yup. Though it's worth noting that D has been heavily influenced by your particular programming style. Not at all a bad thing, but it does imply that you are less likely than anyone else to experience code translation problems :-) On a related note, I sometimes wonder if Sid Meier credits you anywhere for effectively creating the genre that's made him so successful ;-)
 But the bottom line is that a real time thread should not be doing 
 storage allocation. It shouldn't be calling new, or malloc. It shouldn't 
 do storage allocation regardless of whether it is written in D, C, C++, 
 or assembler.

True enough. Sean Sean
Jun 09 2006
parent Walter Bright <newshound digitalmars.com> writes:
Sean Kelly wrote:
 Walter Bright wrote:
 You can use D arrays. Just allocate them using malloc (or statically 
 allocate them). Don't concatenate them.

reasonable, provided everyone behaves and uses the copy on write idiom you suggest.

int *p = cast(int *)calloc(dim, sizeof(int)); int[] a = p[0 .. dim];
 C is very good for writing real time code, and you can write "C"-style 
 code in D, and it will behave exactly like the corresponding C code. 
 You can call printf, strcpy, malloc, etc., which will behave just like 
 C's printf, strcpy, and malloc because they *are* C's functions. It 
 will not do a gc collect unless the code allocates gc memory, and it 
 will ONLY do a gc collect during such an allocation.

creation of detached threads, but it simply seems to error-prone to be worthwhile. I think you're right in that if someone is doing real-time programming then they shouldn't expect much hand-holding, because the assumptions involved offer too great a risk of unexpected behavior.

I agree that realtime programming is a special type of programming, and that someone doing it should be a more advanced programmer. You can't just take any random code (in any language), make it realtime, and expect it to work properly. It's got to be engineered for the peculiarities of realtime requirements.
 Heck, my old Empire game was written in C. I renamed the files from .c 
 to .d, did a few minor syntactic edits, and voila! it was now in D. It 
 runs exactly the same.

Yup. Though it's worth noting that D has been heavily influenced by your particular programming style. Not at all a bad thing, but it does imply that you are less likely than anyone else to experience code translation problems :-) On a related note, I sometimes wonder if Sid Meier credits you anywhere for effectively creating the genre that's made him so successful ;-)

Nope :-)
Jun 09 2006
prev sibling parent reply David Gileadi <foo bar.com> writes:
Walter Bright wrote:
 Jeff Parsons wrote:
 My _only_ significant beef with D so far as been the unbound time for 
 a garbage collection; therefore, if an incremental garbage collector 
 built into D could guarantee me a set maximum time would be taken for 
 each garbage collection (which I would manually induce as often as 
 possible) then unless the efficiency hit is TERRIBLE I would be all 
 over it.

If you're looking for a *guarantee*, then you cannot even use malloc(). Malloc has unbounded time, and is not predictable.

It seems that a fairly common desire is to use D for writing games. Games don't typically have hard real-time requirements, but it also doesn't look good if your screen freezes for a second or two in the middle of play. This is an area where an incremental collector or some kind of reference counting solution would be handy, since (especially in large programs) it would be difficult to avoid using garbage collected code. I realize that the programmer has the responsibility to manage memory intelligently, but it would be nice to not avoid using GC code entirely. As others have stated, it would even be worth giving up some overall performance to gain a smooth playing experience.
Jun 09 2006
next sibling parent reply John Reimer <terminal.node gmail.com> writes:
David Gileadi wrote:
 Walter Bright wrote:
 Jeff Parsons wrote:
 My _only_ significant beef with D so far as been the unbound time for 
 a garbage collection; therefore, if an incremental garbage collector 
 built into D could guarantee me a set maximum time would be taken for 
 each garbage collection (which I would manually induce as often as 
 possible) then unless the efficiency hit is TERRIBLE I would be all 
 over it.

If you're looking for a *guarantee*, then you cannot even use malloc(). Malloc has unbounded time, and is not predictable.

It seems that a fairly common desire is to use D for writing games. Games don't typically have hard real-time requirements, but it also doesn't look good if your screen freezes for a second or two in the middle of play. This is an area where an incremental collector or some kind of reference counting solution would be handy, since (especially in large programs) it would be difficult to avoid using garbage collected code. I realize that the programmer has the responsibility to manage memory intelligently, but it would be nice to not avoid using GC code entirely. As others have stated, it would even be worth giving up some overall performance to gain a smooth playing experience.

Didn't Walter point out that that, in the current gc implementation, you just have to be conscientious about when you call 'new' (which appears to initiate a collect) if you want to avoid the gc initiated pauses. So in game programming, I assume it's just a matter of good organization and planning. You make sure you allocate sufficient memory ahead of time and avoid using 'new' in any location that would cause an ugly pause in the game. Furthermore, you might choose appropriate times for when you want to call a full collect manually. As I see it, the obligation for careful design doesn't disappear when a gc is in place. It certainly makes life easier in many ways. But one still has to be aware of the gc shortcomings and adapt to the situation. I still see it as a tremendous improvement over having to clean up after every memory allocation you make (C/C++). In short, using a gc doesn't mean you don't have to do your homework. :) -JJR
Jun 09 2006
parent Dave <Dave_member pathlink.com> writes:
John Reimer wrote:
 David Gileadi wrote:
 Walter Bright wrote:
 Jeff Parsons wrote:
 My _only_ significant beef with D so far as been the unbound time 
 for a garbage collection; therefore, if an incremental garbage 
 collector built into D could guarantee me a set maximum time would 
 be taken for each garbage collection (which I would manually induce 
 as often as possible) then unless the efficiency hit is TERRIBLE I 
 would be all over it.

If you're looking for a *guarantee*, then you cannot even use malloc(). Malloc has unbounded time, and is not predictable.

It seems that a fairly common desire is to use D for writing games. Games don't typically have hard real-time requirements, but it also doesn't look good if your screen freezes for a second or two in the middle of play. This is an area where an incremental collector or some kind of reference counting solution would be handy, since (especially in large programs) it would be difficult to avoid using garbage collected code. I realize that the programmer has the responsibility to manage memory intelligently, but it would be nice to not avoid using GC code entirely. As others have stated, it would even be worth giving up some overall performance to gain a smooth playing experience.

Didn't Walter point out that that, in the current gc implementation, you just have to be conscientious about when you call 'new' (which appears to initiate a collect) if you want to avoid the gc initiated pauses.

Yea, in the current GC, allocation is the only thing that may implicitly initiate a collection.
 So in game programming, I assume it's just a matter of good organization 
 and planning.  You make sure you allocate sufficient memory ahead of 
 time and avoid using 'new' in any location that would cause an ugly 
 pause in the game.  Furthermore, you might choose appropriate times for 
 when you want to call a full collect manually.
 

And D's easy array slicing along with array operations (like re-initializing an array with 'arr[] = 0;') makes the code to use pre-allocated buffers a lot easier to write too.
 As I see it, the obligation for careful design doesn't disappear when a 
 gc is in place.  It certainly makes life easier in many ways.  But one 
 still has to be aware of the gc shortcomings and adapt to the situation. 
  I still see it as a tremendous improvement over having to clean up 
 after every memory allocation you make (C/C++).
 
 In short, using a gc doesn't mean you don't have to do your homework. :)
 
 -JJR

Jun 09 2006
prev sibling parent Chad J <gamerChad _spamIsBad_gmail.com> writes:
David Gileadi wrote:
 
 It seems that a fairly common desire is to use D for writing games. 
 Games don't typically have hard real-time requirements, but it also 
 doesn't look good if your screen freezes for a second or two in the 
 middle of play.
 
 This is an area where an incremental collector or some kind of reference 
 counting solution would be handy, since (especially in large programs) 
 it would be difficult to avoid using garbage collected code.  I realize 
 that the programmer has the responsibility to manage memory 
 intelligently, but it would be nice to not avoid using GC code entirely. 
  As others have stated, it would even be worth giving up some overall 
 performance to gain a smooth playing experience.

This is what I'm screaming. It is becoming popular nowadays to allow people to modify games in various ways, one of which is modding the game's code. I don't think you will see much of this happening inside a 3d engine, but the game logic that sits next to the 3d engine and dictates gameplay may be open. This allows people to change things like how the AI behaves, how much damage you take when hit with a certain weapon, how that certain weapon behaves, how an economy behaves, etc. Modders may or may not be experienced programmers, so it's probably best to assume they don't have significant experience. This has caused plenty of games to use scripting languages for modding, since C plus plus is tough to learn and not good for this stuff. I think D could do it though. It would kick butt if D could do this, because it would make the mods run much faster than the interpreted code that is probably going to be used nowadays, even if you have like 50% reference counting overhead. Automatic memory management is a huge boon to that sort of ease of code writing that is expected for inexperienced coders to get in on the action and do constructive stuff. Even without the modding argument, being able to use automatic memory management in some parts of the game would be very advantageous in development time. It seems to me though, that if you want to do non-pausing games in D, you just use C style memory management. Bummer. As David Gileadi said, games don't have strict latency requirements. For games, garbage collection doesn't need to be deterministic, and it doesn't need to finish in under 100 microseconds, it just needs to not be noticable. For a few years I've messed with RunUO which is, afaik, written entirely in C#, and I never noticed the pauses (though the worldsaves were annoying!). This is doable. Also, think of all the new D programmers there would be if someone were to release a popular game that exposed D as a scripting language :)
Jun 12 2006
prev sibling parent renox <renosky free.fr> writes:
Frank Benoit wrote:
 Another garbage collector thread :)
 
 The best garbage collector I can think about, is a reference counting
 one.

Uh, what's your definition of "best"? The simplest? Then maybe. If you take a look at Java, it doesn't use a reference counting GC and there is a good reason for this: AFAIK modern GC are far better performance wise than reference counting GC. One GC that I remember (no I'm not an expert) is Mark Copy GC, see "MarkCopy:Fast copying GC with less space overhead". The only thing that reference counting GCs have for them is that this scheme is simple, more or less robust but "best", best for what? I doubt very much that the best GC used for say batch computations is the best for GUI clients.. Regards, RenoX
Jun 04 2006