digitalmars.D - GC, the simple solution

Frank Benoit (16/16) Jun 04 2006 Another garbage collector thread :)

Daniel Keep (55/74) Jun 04 2006 "GC, the simple solution"

Frank Benoit (40/58) Jun 04 2006 In nearly every other GC implementation you will need also a read or

Daniel Keep (80/147) Jun 04 2006 I guess we'll just have to disagree, then :)

Lars Ivar Igesund (10/29) Jun 05 2006 Well, actually, the current GC is kinda bad with cycles. Try to make a

Larry Evans (33/46) Jun 05 2006 be valid

Daniel Keep (25/94) Jun 08 2006 Sorry for the late (and short) reply: exams breathing down my neck and

Lionello Lunesu (10/15) Jun 04 2006 This is not such a good thing! You don't really know who releases the la...

Frank Benoit (15/40) Jun 04 2006 Where is the disadvantage? If you release concurrent in different

Lionello Lunesu (23/50) Jun 04 2006 There are many resources that must be released by the thread by which th...

Frank Benoit (7/20) Jun 04 2006 This is not true for the phobos GC. The GC runs in the thread which

Lars Ivar Igesund (25/44) Jun 04 2006 Not entirely true in the general case, Frank ;)

Frank Benoit (5/22) Jun 05 2006 In codesize yes, but in heap size?

Lars Ivar Igesund (10/18) Jun 05 2006 The reason is that a delete might trigger an unknown additional amount o...

Sean Kelly (8/15) Jun 05 2006 And some allocators colaesce adjacent free blocks aggressively (ie.

Walter Bright (28/30) Jun 04 2006 Reference counting has its advantages, but some severe disadvantages:

Frank Benoit (38/66) Jun 04 2006 Ok, i see now that my "best and simplest" solution is neiter of both :)

Daniel Keep (44/68) Jun 04 2006 Just wanted to point out that if I remember correctly (and I *could* be

Lionello Lunesu (3/9) Jun 04 2006 The thread doing the ++ apparently has a pointer to the object without a...

Daniel Keep (12/26) Jun 04 2006 *slaps forehead*

Frank Benoit (3/11) Jun 05 2006 MS=mark&sweep :)

Walter Bright (31/110) Jun 04 2006 I'm not throwing it away so fast, I've thought about it for several

Frank Benoit (21/76) Jun 05 2006 Yes, and the compiler can catch these cases. With incr/decr the

Walter Bright (13/48) Jun 05 2006 It can only do it if it has full source available. It also won't work

Kevin Bealer (15/24) Jun 07 2006 Correct me if I'm wrong, but while this is true for M + S, doesn't RC /k...

Bruno Medeiros (12/22) Jun 07 2006 Are you sure? It says in

Sean Kelly (3/23) Jun 07 2006 These are synchronizing operations.

Bruno Medeiros (7/33) Jun 10 2006 By "That requires synchronizing." I thought Walter meant stuff with

Sean Kelly (15/44) Jun 10 2006 For the most part. There are really two issues here. First, the

Bruno Medeiros (9/57) Jun 12 2006 Ah, I see. I had recently read about relative out-of-order execution

Sean Kelly (51/57) Jun 13 2006 A single CPU is allowed to do whatever it wants so long as it can fool

Daniel Keep (8/78) Jun 14 2006 Cool; learn something new every day. Thanks for the informative post.
Bruno Medeiros (8/78) Jun 14 2006 Nice post!

Sean Kelly (13/16) Jun 14 2006 comp.programming.threads is worth keeping an eye on, though the jargon

Lionello Lunesu (5/5) Jun 14 2006 Interesting indeed!

Sean Kelly (10/15) Jun 14 2006 It's been a while since I read that article, but I think D already has

Lionello Lunesu (6/21) Jun 15 2006 ...that leaves only atomic operations? Or would "volatile" or

Sean Kelly (33/54) Jun 15 2006 "volatile" is actually intended to address compiler optimizations in a

Lionello Lunesu (18/45) Jun 16 2006 So I suppose "volatile" could be extended to include memory locking as w...

Sean Kelly (18/48) Jun 16 2006 Yes it could, though for now I prefer it the way it is as it offers more...

Don Clugston (5/70) Jun 16 2006 I'd really like to see this included in the standard distribution.

Sean Kelly (6/10) Jun 16 2006 Mango contains some of Doug Lea's containers and there may be one there....

Bruno Medeiros (16/36) Jun 17 2006 Well, that's the thing, I just want to have some general knowledge about...

Sean Kelly (7/38) Jun 17 2006 Sadly, I don't know of any more basic sources of information on this

Sean Kelly (25/35) Jun 05 2006 For what it's worth, incremental GC is similar to refcounting in that

Jeff Parsons (13/17) Jun 07 2006 My _only_ significant beef with D so far as been the unbound time for a

Sean Kelly (15/23) Jun 08 2006 The spec doesn't contain any language forbidding such an approach, but I...

Larry Evans (18/32) Jun 08 2006 [snip]

Walter Bright (8/13) Jun 08 2006 If you're looking for a *guarantee*, then you cannot even use malloc().

Sean Kelly (10/23) Jun 08 2006 For what it's worth, I have read papers describing hard real-time

Walter Bright (5/11) Jun 09 2006 Yup. It gets the job done.

Derek Parnell (9/15) Jun 08 2006 It seems obvious that if one doesn't want the Garbage Collector to colle...

Sean Kelly (7/19) Jun 08 2006 This gets sticky if one wants to use strings in D though, as there's no

Walter Bright (17/36) Jun 09 2006 You can use D arrays. Just allocate them using malloc (or statically

Sean Kelly (22/59) Jun 09 2006 So then I'd just manually set the ptr and len members? Sounds

Walter Bright (9/36) Jun 09 2006 int *p = cast(int *)calloc(dim, sizeof(int));

David Gileadi (12/22) Jun 09 2006 It seems that a fairly common desire is to use D for writing games.

John Reimer (16/40) Jun 09 2006 Didn't Walter point out that that, in the current gc implementation, you...

Dave (6/52) Jun 09 2006 Yea, in the current GC, allocation is the only thing that may implicitly...

Chad J (32/45) Jun 12 2006 This is what I'm screaming.

renox (14/18) Jun 04 2006 Uh, what's your definition of "best"?

Frank Benoit <keinfarbton nospam.xyz> writes:

Another garbage collector thread :)

The best garbage collector I can think about, is a reference counting
one. To implement this:

* Every object needs a reference counter field.
* Code using references, need to have code for incrementing/decrementing
the counters.
* Destructors are called immediatly if the counter goes zero.

What you get is a precise, incremental, realtime garbage collector.
* all logic behaviour is deterministic
* Member objects are still valid, while a destructor is called. => no
more dispose/close patterns
* implement games and realtime apps without worry about lags
* implement secure application without worry about GC attack by valid
refence data values.
* consume only the needed memory
* no more need for the auto object

Jun 04 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

"GC, the simple solution"

No such thing.

Summary:

* Ref counting by default: good grief no!
* Ref counting as an option: good grief yes!
* Is there a perfect solution for memory management: don't be ridiculous.

Frank Benoit wrote:
 Another garbage collector thread :)
 
 The best garbage collector I can think about, is a reference counting
 one. To implement this:
 
 * Every object needs a reference counter field.
 * Code using references, need to have code for incrementing/decrementing
 the counters.
 * Destructors are called immediatly if the counter goes zero.
 
 What you get is a precise, incremental, realtime garbage collector.
 * all logic behaviour is deterministic
 * Member objects are still valid, while a destructor is called. => no
 more dispose/close patterns
 * implement games and realtime apps without worry about lags
 * implement secure application without worry about GC attack by valid
 refence data values.
 * consume only the needed memory
 * no more need for the auto object

What you also get that you might not want:

* because references are on the object, you can't have array slices
* it's also incompatible with pointers: if the pointer is to a member of
an object, how on earth do you update the refcount then?
* slower in the immediate sense (incref & decref calls, plus it'll mean
less code in L1 and L2 CPU cache, which can kill performance in some
architectures)
* requires all objects to have destructors to decref child objects
(implicit or otherwise)
* without auto, makes it impossible to ensure an object's destruction
when you leave scope: you can leak references
* inability to free cycles
* inability to even *run* destructors where cycles are involved, because
there is no order in which they can logically be executed and still be valid

Those last two are the main problem with reference counting, along with
the others and slightly added complexity in the compiler.

Python, however, does not suffer the cycle problem.  Know why?

Because they have a *mark & sweep* garbage collector to clean up the cycles.

Seriously, I don't want to see the current GC replaced by a reference
counting scheme.  The current system is just fine for the majority of
applications.  Where it falls down are real-time systems which can't
allow for the non-deterministic GC pauses, and where you want to keep
your memory profile to a minimum.  Anything that has pauses of activity
can just run a background GC thread to keep the memory footprint down.

To this end, I think implementing an incremental or concurrent GC should
be the next port of call for optimising D's memory management.

As it stands, you can implement reference counting yourself: you just
need to write an allocator, and call the incref and decref functions
manually.  However, I do think that having built-in *support* for
Reference counting would be fantastic.  The less that stuff like this is
left in the hands of the programmer, the better.

What I think we all need to keep in mind is that NO memory management
scheme is good for all cases.  There will always be some weird corner
case which doesn't work well with our scheme of choice, and that we
don't give a rat's about, but is VERY important to someone else.

So far D does automatic GC (the best default, IMHO [1]), manual
management, and RAII.  Throw optional ref counting in to the mix, and I
think we'll have pretty much 99% of cases covered.

	-- Daniel

[1] The best default since it's unobtrusive, and *safe*: unless you're
doing strange things, it's very hard to leave dangling references with a
GC.  This allows people to get on with writing the damn program, and
worry about managing memory later (if ever).

-- 
Unlike Knuth, I have neither proven or tried the above; it may not even
make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

Jun 04 2006

Frank Benoit <keinfarbton nospam.xyz> writes:

In nearly every other GC implementation you will need also a read or
write barrier. With reference counting you have a write barriere.

Yes, the unprecise, non-incremental, "Stop-the-world", non-concurrent,
non-generational, conservative Phobos GC implementation is an exception.

But a write barrier alone doesn't make ref counting ridiculous. Well you
are right, it is not the solution for all and everything, but I think a
big step in the right direction. And mark&sweep GC should be the option.

 * because references are on the object, you can't have array slices

Don't understand that point.

 * it's also incompatible with pointers: if the pointer is to a member of
 an object, how on earth do you update the refcount then?

In the actual implememtation of the gc, every object's memory range is
registered by the gc. With that you can get the objects refcounter address.

 * slower in the immediate sense (incref & decref calls, plus it'll mean
 less code in L1 and L2 CPU cache, which can kill performance in some
 architectures)

Performance has two sides. Throughput is one thing, worst latency the
other. We can think about techniques to optimize unecessary
increments/decrements.
BTW all incremental GCs will also need read and/or write barriers, which
will have the same perfomance issues.

 * requires all objects to have destructors to decref child objects
 (implicit or otherwise)

Yes, all objects having child objects. But if you think, that destructor
calls are a performance issue, you will get problems with the mark&sweep
also.

 * without auto, makes it impossible to ensure an object's destruction
 when you leave scope: you can leak references

How can that happen, if we have them native? Well ok, e.g. if you store
the reference to a static variable, then it is not freed. But this is
really a programmers bug. But I think that is something an advanced look
at it will find solutions. First we have to decide to go for ref counting :)

 * inability to free cycles
 * inability to even *run* destructors where cycles are involved, because
 there is no order in which they can logically be executed and still be valid

Right, here a manual ref=null is needed. And perhaps a mark&sweep
algorithm to detect such cycles in the debug version, called on request.
So if you know you program, you can avoid cycles, or you can break them
before releasing the last reference.

Doing a realtime GC seems to be not possible. The existing "realtime GC"
seems to me, incremental ones, which stop the world for a short limited
time. And that time goes into your worst latency. So if I want to do
things with a latency less 100us, there is no chance to implement a GC.
Ref counting whould be a solution, but this can only work if the whole
program does not depend on "Stop-the-world".


 As it stands, you can implement reference counting yourself: you just
 need to write an allocator, and call the incref and decref functions
 manually.  However, I do think that having built-in *support* for
 Reference counting would be fantastic.  The less that stuff like this is
 left in the hands of the programmer, the better.

No, i can't.
* The dereference operator cannot be overwritten.
* Existing classes, native arrays etc. does not work like this.
I do not need a few classes managed with ref counting. I want to avoid
the "Stop the world". And phobos needs it.

Where is the proof that ref counting is so slow? In mark&sweep you have
to run over the whole memory. If the memory is low, you have to do that
often. This is really slow than.

Jun 04 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Frank Benoit wrote:
 In nearly every other GC implementation you will need also a read or
 write barrier. With reference counting you have a write barriere.
 
 Yes, the unprecise, non-incremental, "Stop-the-world", non-concurrent,
 non-generational, conservative Phobos GC implementation is an exception.
 
 But a write barrier alone doesn't make ref counting ridiculous. Well you
 are right, it is not the solution for all and everything, but I think a
 big step in the right direction. And mark&sweep GC should be the option.

I guess we'll just have to disagree, then :)

Like I said before, I think the current GC should be the default since,
even if it's not the best in terms of latency, it's the *safest*.
There's no need for someone coding under it to worry about things like
cycles, etc.  By having refcounting an option, we can say to
programmers: "turn this on if you like, but watch your back for cycles!"

<joking>Plus, the current system gives us bitchin' benchmarks with wc!
;)</joking>

 * because references are on the object, you can't have array slices

 
 Don't understand that point.

An array slice is a pointer and a length.  The problem is the "pointer"
part:

 * it's also incompatible with pointers: if the pointer is to a member of
 an object, how on earth do you update the refcount then?

 
 In the actual implememtation of the gc, every object's memory range is
 registered by the gc. With that you can get the objects refcounter address.

Aaah, ok: you're putting the refcount somewhere in the allocated chunk.
   Sorry, I thought you meant hide it as a member on objects.

However, I imagine that this is still going to do horrible things to
pointer operations in terms of speed.  Every time you move or copy a
pointer, you'd have to ask the GC "ok, find which pool this pointer is
located in, then inc/dec the refcount."

 * slower in the immediate sense (incref & decref calls, plus it'll mean
 less code in L1 and L2 CPU cache, which can kill performance in some
 architectures)

 
 Performance has two sides. Throughput is one thing, worst latency the
 other. We can think about techniques to optimize unecessary
 increments/decrements.
 BTW all incremental GCs will also need read and/or write barriers, which
 will have the same perfomance issues.

I concede that, as well as that the current GC has (probably) the worst
all-at-once latency you can get.

 * requires all objects to have destructors to decref child objects
 (implicit or otherwise)

 
 Yes, all objects having child objects. But if you think, that destructor
 calls are a performance issue, you will get problems with the mark&sweep
 also.

Well, I don't *need* destructors in most general objects under
mark&sweep.  It's only when they're holding some external resource, and
that's when I usually manage them with auto or scoped destruction.

... which would admittedly be easier with refcounting, if a little bit
slower :)

 * without auto, makes it impossible to ensure an object's destruction
 when you leave scope: you can leak references

 
 How can that happen, if we have them native? Well ok, e.g. if you store
 the reference to a static variable, then it is not freed. But this is
 really a programmers bug. But I think that is something an advanced look
 at it will find solutions. First we have to decide to go for ref counting :)

What I mean is that if I say "auto Foo x = ..." then I know that come
hell or high water, D is going to destroy that object at the end of
scope.  Plus, because of auto rules, I *cannot* leak a reference to that
object: D simply won't let me.

As for the static variable case, it's no different to the current
implementation: you save a reference to something in a global or static
variable, and it's never going to die :)

 * inability to free cycles
 * inability to even *run* destructors where cycles are involved, because
 there is no order in which they can logically be executed and still be valid

 
 Right, here a manual ref=null is needed. And perhaps a mark&sweep
 algorithm to detect such cycles in the debug version, called on request.
 So if you know you program, you can avoid cycles, or you can break them
 before releasing the last reference.

The problem I have is that even if I *do* "know" my program, I don't
want to have to worry about this.  I just want to write my damn code!
This is one of the great advantages of D: if you don't give a rat's
about memory management, you don't have to.

That said, D *does* have one advantage: at compile time, the compiler
can determine if it's possible for an object to indirectly refer to
itself.  Using this, it can insert something like this into the decref:




That way, you know that cycles will get checked for *eventually*.

 Doing a realtime GC seems to be not possible.

Oh, you'll get no argument from me THERE. :)

 The existing "realtime GC"
 seems to me, incremental ones, which stop the world for a short limited
 time. And that time goes into your worst latency. So if I want to do
 things with a latency less 100us, there is no chance to implement a GC.
 Ref counting whould be a solution, but this can only work if the whole
 program does not depend on "Stop-the-world".

I saw an interesting paper once that was talking about using a
tri-colour mark&sweep collector that was fully concurrent: sounded
heinously complicated to write, but could run a full collect in the
background of your app.

Buggered if I can find the damn thing again, though...

 As it stands, you can implement reference counting yourself: you just
 need to write an allocator, and call the incref and decref functions
 manually.  However, I do think that having built-in *support* for
 Reference counting would be fantastic.  The less that stuff like this is
 left in the hands of the programmer, the better.

 No, i can't.
 * The dereference operator cannot be overwritten.

MyRef.value

Yeah, it's not that great, but at least it's *possible* :)

 * Existing classes, native arrays etc. does not work like this.

Fair enough.

 I do not need a few classes managed with ref counting. I want to avoid
 the "Stop the world". And phobos needs it.

I don't think phobos "needs" it.  What I think would help immensely was
if phobos was easier to rebuild with different compiler flags,
collector, etc.

Would you be happy if it was trivial to recompile phobos using
refcounting instead of the current GC, and then use that?




 Where is the proof that ref counting is so slow? In mark&sweep you have
 to run over the whole memory. If the memory is low, you have to do that
 often. This is really slow than.

I don't have any proof on me, although knowing the internet I'm sure I
could find some "proof" without too much trouble :P

And yes; if memory is low, m&s will probably take a long time.
Actually, I should find out how long that is.  Would be interesting to
know exactly how long it takes to collect 512 meg of garbage :P

Personally, I don't mind how D's default collector is implemented, so
long as it is efficient in the general case, and I don't have to know
anything about the way it works in order to write code.

That said, I'd still like to see D support collectors other than the
base GC.  Ideally, I think D should support at least:







That way, we can all have our cake, be it chocolate cake, fruit cake or
cheesecake.  Plus, we even get to *eat* it :)

	-- Daniel

-- 
Unlike Knuth, I have neither proven or tried the above; it may not even
make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

Jun 04 2006

Lars Ivar Igesund <larsivar igesund.net> writes:

Daniel Keep wrote:

 
 
 Frank Benoit wrote:
 In nearly every other GC implementation you will need also a read or
 write barrier. With reference counting you have a write barriere.
 
 Yes, the unprecise, non-incremental, "Stop-the-world", non-concurrent,
 non-generational, conservative Phobos GC implementation is an exception.
 
 But a write barrier alone doesn't make ref counting ridiculous. Well you
 are right, it is not the solution for all and everything, but I think a
 big step in the right direction. And mark&sweep GC should be the option.

 
 I guess we'll just have to disagree, then :)
 
 Like I said before, I think the current GC should be the default since,
 even if it's not the best in terms of latency, it's the *safest*.
 There's no need for someone coding under it to worry about things like
 cycles, etc.  

Well, actually, the current GC is kinda bad with cycles. Try to make a
circular linked list where you keep a single additional reference to one of
the elements. Then remove the reference and collect. In theory this memory
should now "just" be lost/leaked, but you might also experience crashes. I
think Chris Miller had some interesting revelations on the subject.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi

Jun 05 2006

Larry Evans <cppljevans cos-internet.com> writes:

On 06/04/2006 05:40 AM, Daniel Keep wrote:

 "GC, the simple solution"

[snip]

 * inability to free cycles
 * inability to even *run* destructors where cycles are involved, because
 there is no order in which they can logically be executed and still 

be valid


I think someone else in the thread mentioned this as a programmer error.
However, assuming the programmer intended this, then this programmer
must expect that the MS GC would break the cycle somewhere before
calling the destructors and he must not care where the cycle was
broken, because otherwise he would have used a weak pointer
to close the cycle where he wanted the break to occur.

Anyway, code for doing this GC breaking of cycles in RC
is at:

   http://tinyurl.com/reuzl

See code after comment:

  *  Breaks all arcs in cycle_arcs.

As mentioned in the code comments, this is Christopher's
method for collecting cycles and suffers a delay since it
keeps all smart ptrs in a list and processes the entire list.
Other methods don't keep this list but do a local mark-sweep
at each rc decrement with obvious time penalties.  A compromise
just uses a queue which, when it reaches a certain size,
plays the same role as Christopher's list of all smart pointers.

 Those last two are the main problem with reference counting, along with
 the others and slightly added complexity in the compiler.

 Python, however, does not suffer the cycle problem.  Know why?

 Because they have a *mark & sweep* garbage collector to clean up the 

cycles.

Then it must not automatically call destructors since then it would
suffer the 2nd problem above.


[snip]

 As it stands, you can implement reference counting yourself: you just
 need to write an allocator, and call the incref and decref functions
 manually.  However, I do think that having built-in *support* for
 Reference counting would be fantastic.  The less that stuff like this is
 left in the hands of the programmer, the better.


Smart pointers would be a great help here, but since
D doesn't allow user definition of the assignment operator:

   http://www.digitalmars.com/d/archives/digitalmars/D/learn/984.html

that's not easy.  I'm a newbie, so what's the best way to implement
something like a smart pointer without having to manually:

   increment rc1;
   decrement rc2;
   assign the pointer to its target location;

Jun 05 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Sorry for the late (and short) reply: exams breathing down my neck and
all :)

Larry Evans wrote:
 On 06/04/2006 05:40 AM, Daniel Keep wrote:
 
 "GC, the simple solution"

 
 [snip]
 
 * inability to free cycles
 * inability to even *run* destructors where cycles are involved, because
 there is no order in which they can logically be executed and still be

 valid
 
 
 I think someone else in the thread mentioned this as a programmer error.
 However, assuming the programmer intended this, then this programmer
 must expect that the MS GC would break the cycle somewhere before
 calling the destructors and he must not care where the cycle was
 broken, because otherwise he would have used a weak pointer
 to close the cycle where he wanted the break to occur.
 
 Anyway, code for doing this GC breaking of cycles in RC
 is at:
 
   http://tinyurl.com/reuzl
 
 See code after comment:
 
  *  Breaks all arcs in cycle_arcs.
 
 As mentioned in the code comments, this is Christopher's
 method for collecting cycles and suffers a delay since it
 keeps all smart ptrs in a list and processes the entire list.
 Other methods don't keep this list but do a local mark-sweep
 at each rc decrement with obvious time penalties.  A compromise
 just uses a queue which, when it reaches a certain size,
 plays the same role as Christopher's list of all smart pointers.
 
 Those last two are the main problem with reference counting, along with
 the others and slightly added complexity in the compiler.

 Python, however, does not suffer the cycle problem.  Know why?

 Because they have a *mark & sweep* garbage collector to clean up the

 cycles.
 
 Then it must not automatically call destructors since then it would
 suffer the 2nd problem above.

I haven't read the above link since I really don't need to be distracted
any more than I already am at the moment.  I'll read that link later on
once I'm not quite so swamped :)

However, I will say this: if you have a cycle in Python, it will clean
it up normally PROVIDED destructors are not involved.  If they are, then
it just moves the whole cycle to a dead object list, since there is no
safe way to break the cycle.  At that point, it's up to the programmer
to deal with it.

 

 [snip]
 
 As it stands, you can implement reference counting yourself: you just
 need to write an allocator, and call the incref and decref functions
 manually.  However, I do think that having built-in *support* for
 Reference counting would be fantastic.  The less that stuff like this is
 left in the hands of the programmer, the better.

 
 
 Smart pointers would be a great help here, but since
 D doesn't allow user definition of the assignment operator:
 
   http://www.digitalmars.com/d/archives/digitalmars/D/learn/984.html
 
 that's not easy.  I'm a newbie, so what's the best way to implement
 something like a smart pointer without having to manually:
 
   increment rc1;
   decrement rc2;
   assign the pointer to its target location;

Not really, sorry.  Your best bet might be with a preprocessor.  I've
had plans to write an AST-based preprocessor for a while, but that's off
in the far, far future.  Maybe give Build 3.0 a look?  In that case,
you'd want to write a macro like:


==>
    (rc1).incref(); (rc2).decref(); rc2 = (rc1);

Again, sorry for the terse reply.

	-- Daniel

-- 
Unlike Knuth, I have neither proven or tried the above; it may not even
make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

Jun 08 2006

"Lionello Lunesu" <lionello lunesu.remove.com> writes:

 * Destructors are called immediatly if the counter goes zero.

This is not such a good thing! You don't really know who releases the last 
reference, so you don't really know when the object is deleted. If the 
object is being referenced from multiple threads, you don't know in what 
thread the destructor will run.

 * all logic behaviour is deterministic

See above.

 * Member objects are still valid, while a destructor is called. => no
 more dispose/close patterns

When an object is being destructed, the references it has to other objects 
are being released. So in the destructor of object B, object A might have 
released about half its references.

 * consume only the needed memory

What about circular references? They'll never get released, no clean-up.

L.

Jun 04 2006

Frank Benoit <keinfarbton nospam.xyz> writes:

Lionello Lunesu schrieb:
 * Destructors are called immediatly if the counter goes zero.

 
 This is not such a good thing! You don't really know who releases the last 
 reference, so you don't really know when the object is deleted. If the 
 object is being referenced from multiple threads, you don't know in what 
 thread the destructor will run.
 

Where is the disadvantage? If you release concurrent in different
threads you need some kind of syncronization. Than you know it. But this
is no issue of ref counting.
If you release a reference in the phobos GC, the object is collected
"perhaps" the next time. If there are no valid integer values,
"pointing"to your object. And after collecting, the sequence of
destructor calls is not defined.

 * all logic behaviour is deterministic

 
 See above.

See above.

 
 * Member objects are still valid, while a destructor is called. => no
 more dispose/close patterns

 
 When an object is being destructed, the references it has to other objects 
 are being released. So in the destructor of object B, object A might have 
 released about half its references.
 

If object B Dtor is called, its member variables are valid, and valid is
the object A. The B.~this() can do something with A. After the Dtor is
finished, the reference to A is nulled, and A is perhaps also deleted.
Seams perfect to me.



 * consume only the needed memory

 
 What about circular references? They'll never get released, no clean-up.

Yes, they need a solution. Perhaps a fall back mark&sweep collector, to
check for cicles.

 
 L.

Jun 04 2006

"Lionello Lunesu" <lionello lunesu.remove.com> writes:

"Frank Benoit" <keinfarbton nospam.xyz> wrote in message 
news:e5uojm$18ji$1 digitaldaemon.com...
 Lionello Lunesu schrieb:
 * Destructors are called immediatly if the counter goes zero.

 This is not such a good thing! You don't really know who releases the 
 last
 reference, so you don't really know when the object is deleted. If the
 object is being referenced from multiple threads, you don't know in what
 thread the destructor will run.

 Where is the disadvantage? If you release concurrent in different
 threads you need some kind of syncronization. Than you know it. But this
 is no issue of ref counting.
 If you release a reference in the phobos GC, the object is collected
 "perhaps" the next time. If there are no valid integer values,
 "pointing"to your object. And after collecting, the sequence of
 destructor calls is not defined.

There are many resources that must be released by the thread by which they 
were acquired. There's not much synchronisation can do. If the last 
reference just happens to be released in a thread other than the one that 
created the object, then the resource will be released in the wrong thread.

As I've understood from Boehm's paper, one advantages of a GC is to fix this 
issue: a GC collect will call finalizers for the objects that were created 
in that thread. It's like keeping an object from being destructed until the 
thread that created the object calls malloc (which is were the GC does its 
thing).

Anyway, Boehm's paper is a good read. He (she?) also addresses the problems 
with reference counting.

http://www.hpl.hp.com/personal/Hans_Boehm/gc/

 * Member objects are still valid, while a destructor is called. => no
 more dispose/close patterns

 When an object is being destructed, the references it has to other 
 objects
 are being released. So in the destructor of object B, object A might have
 released about half its references.

 If object B Dtor is called, its member variables are valid, and valid is
 the object A. The B.~this() can do something with A. After the Dtor is
 finished, the reference to A is nulled, and A is perhaps also deleted.
 Seams perfect to me.

I meant my example like this: object A has a reference to object B. During 
the destruction of A, the references it has to other objects are released, 
causing some objects being destructed, like B. Now B might (indirectly) use 
data from A, but A's in the middle of destruction. What I mean to tell with 
all this is that you still have to be carefull with your destructors. 
Especially when using a strong-pointer template. You can't always tell what 
state other objects are in in a destructor. They might be half way during 
destruction, having released some, but not all, of their references.

L.

Jun 04 2006

Frank Benoit <keinfarbton nospam.xyz> writes:

 As I've understood from Boehm's paper, one advantages of a GC is to fix this 
 issue: a GC collect will call finalizers for the objects that were created 
 in that thread. It's like keeping an object from being destructed until the 
 thread that created the object calls malloc 

This is not true for the phobos GC. The GC runs in the thread which
calls new or gc.fullCollect.
But the Dtors are not called while other threads are running, because
the GC pauses all other threads. This is called the "Stop-the-world".


 I meant my example like this: object A has a reference to object B. During 
 the destruction of A, the references it has to other objects are released, 
 causing some objects being destructed, like B. Now B might (indirectly) use 
 data from A, but A's in the middle of destruction. What I mean to tell with 
 all this is that you still have to be carefull with your destructors. 
 Especially when using a strong-pointer template. You can't always tell what 
 state other objects are in in a destructor. They might be half way during 
 destruction, having released some, but not all, of their references.
 

Object A cannot be destroyed, if it is reachable directly or indirectly.
That is, the ref counting is for. So if A.~this() is called, nobody else
has a reference to A.

Jun 04 2006

Lars Ivar Igesund <larsivar igesund.net> writes:

Frank Benoit wrote:

 Another garbage collector thread :)
 
 The best garbage collector I can think about, is a reference counting
 one. To implement this:
 
 * Every object needs a reference counter field.
 * Code using references, need to have code for incrementing/decrementing
 the counters.
 * Destructors are called immediatly if the counter goes zero.
 
 What you get is a precise, incremental, realtime garbage collector.
 * all logic behaviour is deterministic
 * Member objects are still valid, while a destructor is called. => no
 more dispose/close patterns
 * implement games and realtime apps without worry about lags
 * implement secure application without worry about GC attack by valid
 refence data values.
 * consume only the needed memory
 * no more need for the auto object

Not entirely true in the general case, Frank ;)

There are always techniques to get around problems with an algorithm,
depending on what you want to achieve, but in general, a reference counted
GC is said to have these _negative_ properties: (paraphrased from Garbage
Collection: Algorithms for Automatic Dynamic Memory Management by Jones and
Lins)

* The cost of removing the last pointer to an object is unbounded
* Even if the general latency is good, the total overhead of adjusting
references is significantly greater than that of a tracing GC
* It has a substantial space overhead, making it less useful for smaller
heaps
* Cyclic references (already mentioned)

I think these makes RC less suited for the general case, whereas it might
fit very well in a tightly controlled system where low latency is required
(which I'm well aware that you need ;).

IMO, the best general GC (the default) would be one which can handle as many
cases as possible, and which is manipulable in those (preferably known)
cases which it don't handle out of the box. I think that one need to
combine features from several GC techniques (possibly also RC) to achieve
this. 

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi

Jun 04 2006

Frank Benoit <keinfarbton nospam.xyz> writes:

 * The cost of removing the last pointer to an object is unbounded

delete can be unbound? I didn't know that.

 * Even if the general latency is good, the total overhead of adjusting
 references is significantly greater than that of a tracing GC


 * It has a substantial space overhead, making it less useful for smaller
 heaps

In codesize yes, but in heap size?
I though the tracing GC does need much more heap because it should be
called very seldom.

 * Cyclic references (already mentioned)
 
 I think these makes RC less suited for the general case, whereas it might
 fit very well in a tightly controlled system where low latency is required
 (which I'm well aware that you need ;).
 
 IMO, the best general GC (the default) would be one which can handle as many
 cases as possible, and which is manipulable in those (preferably known)
 cases which it don't handle out of the box. I think that one need to
 combine features from several GC techniques (possibly also RC) to achieve
 this. 
 

agreed.

Jun 05 2006

Lars Ivar Igesund <larsivar igesund.net> writes:

Frank Benoit wrote:

 
 * The cost of removing the last pointer to an object is unbounded

 delete can be unbound? I didn't know that.

The reason is that a delete might trigger an unknown additional amount of
decrefs leading to more deletes.

 * It has a substantial space overhead, making it less useful for smaller
 heaps

 In codesize yes, but in heap size?
 I though the tracing GC does need much more heap because it should be
 called very seldom.

Hmm, might have paraphrased somewhat wildly there, but the space overhead
_is_ considered a drawback, and reducing it will most likely lead to more
computation overhead instead.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi

Jun 05 2006

Sean Kelly <sean f4.ca> writes:

Lars Ivar Igesund wrote:
 Frank Benoit wrote:
 
 * The cost of removing the last pointer to an object is unbounded

 delete can be unbound? I didn't know that.

 
 The reason is that a delete might trigger an unknown additional amount of
 decrefs leading to more deletes.

And some allocators colaesce adjacent free blocks aggressively (ie. 
during delete).  Not to mention the fact that the allocator data is 
probably protected by a mutex, etc.  In general, if the allocator 
documentation makes no guarantees about progress then you may at least 
be stuck on a mutex while other (potentially low-priority) threads are 
making an (unbounded) allocation.


Sean

Jun 05 2006

Walter Bright <newshound digitalmars.com> writes:

Frank Benoit wrote:
 The best garbage collector I can think about, is a reference counting
 one.

Reference counting has its advantages, but some severe disadvantages:

1) cyclical data structures won't get free'd

2) every pointer copy requires an increment and a corresponding 
decrement - including when simply passing a reference to a function

3) in a multithreaded app, the incs and decs must be synchronized

4) exception handlers (finally's) must be inserted to handle all the 
decs so there are no leaks. Contrary to assertions otherwise, there is 
no such thing as "zero overhead exceptions."

5) in order to support slicing and interior pointers, as well as 
supporting reference counting on arbitrary allocations of non-object 
data, a separate "wrapper" object must be allocated for each allocation 
to be ref counted. This essentially doubles the number of allocations 
needed.

6) The wrapper object will mean that all pointers will need to be 
double-dereferenced to access the data.

7) Fixing the compiler to hide all this stuff from the programmer will 
make it difficult to interface cleanly with C.

8) Ref counting can fragment the heap thereby consuming more memory just 
like the gc can, though the gc typically will consume more memory overall.

9) Ref counting does not eliminated latency problems, it just reduces them.

The proposed C++ shared_ptr<>, which implements ref counting, suffers 
from all these faults. I haven't seen a heads up benchmark of 
shared_ptr<> vs mark/sweep, but I wouldn't be surprised if shared_ptr<> 
turned out to be a significant loser in terms of both performance and 
memory consumption.

That said, I'd like for D to optionally support some form of ref 
counting, as rc is better for managing scarce resources like file handles.

Jun 04 2006

Frank Benoit <keinfarbton nospam.xyz> writes:

Ok, i see now that my "best and simplest" solution is neiter of both :)
But I don't want to throw the idea away so fast. Please lets discuss it
a bit further.

Let's think about the scenario of having reference counting (RC) and
mark&sweep (MS) in parallel. Like Daniel Keep said, it is used in
Python. If Python uses this, the idea cannot be so crazy.

 1) cyclical data structures won't get free'd

Combine RC and MS. So cycles are detected. It should be possible to
avoid cycles at all, if you don't want to have any MS pause.

 2) every pointer copy requires an increment and a corresponding
 decrement - including when simply passing a reference to a function

This is true for a smart pointer class. But is it really true for a
compiler supported RC?
If I call a func/method, I hold a valid reference to the object. While
this reference is copied down the stack, these are only temporary
reference. And all of them are release before the call returns to the
caller. You can ignore them all for RC. Only if the reference is stored,
the ref counter has to be incremented. This is only possible if this is
done by the compiler. No smart pointer can do this :)

 3) in a multithreaded app, the incs and decs must be synchronized

Isn't a atomic inc/dec enough? (Don't know anything about the
performance of the asm "lock" instruction)

 4) exception handlers (finally's) must be inserted to handle all the
 decs so there are no leaks. Contrary to assertions otherwise, there is
 no such thing as "zero overhead exceptions."

Yes, this is additional overhead in code size and runtime. And there are
additional post destructors needed, to set all member refs to null,
which wasn't nulled by the dtor.

 5) in order to support slicing and interior pointers, as well as
 supporting reference counting on arbitrary allocations of non-object
 data, a separate "wrapper" object must be allocated for each allocation
 to be ref counted. This essentially doubles the number of allocations
 needed.

The MS GC allready has a memory registry. Isn't there stored, where an
object begins? Let the ref counter be in this registry. This should work
with all types and slicing and object member references.

 6) The wrapper object will mean that all pointers will need to be
 double-dereferenced to access the data.

=> 5.)

 7) Fixing the compiler to hide all this stuff from the programmer will
 make it difficult to interface cleanly with C.

hm, ok. Here is some more manual work necessary.
If you give the last reference to a C lib, isn't this a problem for the
MS GC also?

 8) Ref counting can fragment the heap thereby consuming more memory just
 like the gc can, though the gc typically will consume more memory overall.

Isn't the actual GC implementation also fragmenting the heap?
But I don't have any idea to this issue.

 9) Ref counting does not eliminated latency problems, it just reduces them.

Where do you mean is the latency with RC? The collecting work is done in
the smallest pieces all the time. Only the allocation of new memory
needs an unbound time to complete. But it does not need to stop all
other threads. e.g. hardware interrupts don't need to be disabled.

 
 The proposed C++ shared_ptr<>, which implements ref counting, suffers
 from all these faults. I haven't seen a heads up benchmark of
 shared_ptr<> vs mark/sweep, but I wouldn't be surprised if shared_ptr<>
 turned out to be a significant loser in terms of both performance and
 memory consumption.
 
 That said, I'd like for D to optionally support some form of ref
 counting, as rc is better for managing scarce resources like file handles.

As I said, I would be happy to be able to switch the whole runtime
system to RC. If I would loose 50% performance this is acceptable for me
if I get latency < 100us.

Frank Benoit

Jun 04 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Just a few comments, since you invoked my name :P

Frank Benoit wrote:
 Let's think about the scenario of having reference counting (RC) and
 mark&sweep (MS) in parallel. Like Daniel Keep said, it is used in
 Python. If Python uses this, the idea cannot be so crazy.

Just wanted to point out that if I remember correctly (and I *could* be
wrong), the cycle checking is only done on decrefs, since that's the
only time that cycles become a problem.

 3) in a multithreaded app, the incs and decs must be synchronized

 
 Isn't a atomic inc/dec enough? (Don't know anything about the
 performance of the asm "lock" instruction)

They definitely have to be synched: because in decref you've got to:

1. ref--;
2. if( ref == 0 ) free(ptr);
3. if( obj.canHaveCycles ) registerForCycleCheck(obj);

If not more.  You then also need the incs synched with that, otherwise
you could end up "ref++"ing in the middle of "free(ptr)".

 7) Fixing the compiler to hide all this stuff from the programmer will
 make it difficult to interface cleanly with C.

 
 hm, ok. Here is some more manual work necessary.
 If you give the last reference to a C lib, isn't this a problem for the
 MS GC also?

MS' GC doesn't use refcounting (I assume you're talking about .NET,
btw).  In fact, to solve this problem, it uses pinning:

"""
When the runtime marshaler sees that your code is passing to native code
a reference to a managed reference object, it automatically pins the
object. What this means is that the object is put in a special state
where the garbage collector will neither move the object in memory nor
remove the object from memory. ...

When the native function returns, the marshaled object is automatically
unpinned. Automatic pinning is very convenient, but it raises another
question. What happens if the native function caches the pointer for use
later on? When the function returns, won't the collector be free to move
the object? The answer is yes, and the solution for your code in such a
situation is to manually pin the object using the
System.Runtime.InteropServices.GCHandle type.
"""

 -- http://msdn.microsoft.com/msdnmag/issues/04/10/NET/

 8) Ref counting can fragment the heap thereby consuming more memory just
 like the gc can, though the gc typically will consume more memory overall.

 
 Isn't the actual GC implementation also fragmenting the heap?
 But I don't have any idea to this issue.

Not as badly.  The current GC is pretty good at keeping fragmentation to
a minimum.  Fragmentation happens when you allocate lots of little
objects, then free some but not all, and end up with teeny-tiny "holes"
in memory that are hard to fill.

I think Walter said this because of his comment on using "wrapper"
objects to implement the references: you'd end up with lots of "holes"
in memory.

 9) Ref counting does not eliminated latency problems, it just reduces them.

 
 Where do you mean is the latency with RC? The collecting work is done in
 the smallest pieces all the time. Only the allocation of new memory
 needs an unbound time to complete. But it does not need to stop all
 other threads. e.g. hardware interrupts don't need to be disabled.

I think he means that it doesn't make latency just disappear; it spreads
it over time.  It's the old "drop a frog into hot water and it'll jump
out, drop it in cold water and then boil it and it won't notice" anecdote.

	-- Daniel

-- 
Unlike Knuth, I have neither proven or tried the above; it may not even
make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

Jun 04 2006

"Lionello Lunesu" <lionello lunesu.remove.com> writes:

 They definitely have to be synched: because in decref you've got to:

 1. ref--;
 2. if( ref == 0 ) free(ptr);
 3. if( obj.canHaveCycles ) registerForCycleCheck(obj);

 If not more.  You then also need the incs synched with that, otherwise
 you could end up "ref++"ing in the middle of "free(ptr)".

The thread doing the ++ apparently has a pointer to the object without a 
reference. How could that ever happen?

L.

Jun 04 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Lionello Lunesu wrote:
 They definitely have to be synched: because in decref you've got to:

 1. ref--;
 2. if( ref == 0 ) free(ptr);
 3. if( obj.canHaveCycles ) registerForCycleCheck(obj);

 If not more.  You then also need the incs synched with that, otherwise
 you could end up "ref++"ing in the middle of "free(ptr)".

 
 The thread doing the ++ apparently has a pointer to the object without a 
 reference. How could that ever happen?
 
 L. 
 

*slaps forehead*

Of course; you're right.  I was thinking about race conditions, but
obviously got confused.

Please refer to my signature.  Specifically the "it may not even make
sense." part :P

	-- Daniel

-- 
Unlike Knuth, I have neither proven or tried the above; it may not even
make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

Jun 04 2006

Frank Benoit <keinfarbton nospam.xyz> writes:

 If not more.  You then also need the incs synched with that, otherwise
 you could end up "ref++"ing in the middle of "free(ptr)".
 

As allready pointed out by Lionello, if the counter goes zero there is
no other manipulator.

 7) Fixing the compiler to hide all this stuff from the programmer will
 make it difficult to interface cleanly with C.

 hm, ok. Here is some more manual work necessary.
 If you give the last reference to a C lib, isn't this a problem for the
 MS GC also?


MS=mark&sweep :)

Jun 05 2006

Walter Bright <newshound digitalmars.com> writes:

Frank Benoit wrote:
 Ok, i see now that my "best and simplest" solution is neiter of both :)
 But I don't want to throw the idea away so fast. Please lets discuss it
 a bit further.

I'm not throwing it away so fast, I've thought about it for several 
years <g>.


 Let's think about the scenario of having reference counting (RC) and
 mark&sweep (MS) in parallel. Like Daniel Keep said, it is used in
 Python. If Python uses this, the idea cannot be so crazy.

Maybe, maybe not. Python isn't known for its speed. Java uses gc, not rc.


 1) cyclical data structures won't get free'd

 
 Combine RC and MS. So cycles are detected. It should be possible to
 avoid cycles at all, if you don't want to have any MS pause.

There are ways to avoid the RC cycle problem, but I don't know how 
expensive they are.


 2) every pointer copy requires an increment and a corresponding
 decrement - including when simply passing a reference to a function

 This is true for a smart pointer class. But is it really true for a
 compiler supported RC?

Yes.


 If I call a func/method, I hold a valid reference to the object. While
 this reference is copied down the stack, these are only temporary
 reference. And all of them are release before the call returns to the
 caller. You can ignore them all for RC.

This fails if the passed reference 'escapes'. It can escape by being 
assigned to a global, being assigned to a class member, being inserted 
into some data structure that lives longer than the function, and being 
returned by the function.

 Only if the reference is stored,
 the ref counter has to be incremented. This is only possible if this is
 done by the compiler. No smart pointer can do this :)
 
 3) in a multithreaded app, the incs and decs must be synchronized

 
 Isn't a atomic inc/dec enough?

That requires synchronizing.

 (Don't know anything about the
 performance of the asm "lock" instruction)

It's slow enough to be noticable.

 4) exception handlers (finally's) must be inserted to handle all the
 decs so there are no leaks. Contrary to assertions otherwise, there is
 no such thing as "zero overhead exceptions."

 
 Yes, this is additional overhead in code size and runtime. And there are
 additional post destructors needed, to set all member refs to null,
 which wasn't nulled by the dtor.
 
 5) in order to support slicing and interior pointers, as well as
 supporting reference counting on arbitrary allocations of non-object
 data, a separate "wrapper" object must be allocated for each allocation
 to be ref counted. This essentially doubles the number of allocations
 needed.

 
 The MS GC allready has a memory registry. Isn't there stored, where an
 object begins? Let the ref counter be in this registry. This should work
 with all types and slicing and object member references.

Yes, you can find out the beginning of an arbitrary pointer's memory 
block. But that isn't a cheap operation.

 6) The wrapper object will mean that all pointers will need to be
 double-dereferenced to access the data.

 
 => 5.)

Finding the start of the memory chunk will be several times more 
expensive than the double indirect.

 7) Fixing the compiler to hide all this stuff from the programmer will
 make it difficult to interface cleanly with C.

 
 hm, ok. Here is some more manual work necessary.
 If you give the last reference to a C lib, isn't this a problem for the
 MS GC also?

I don't know anything about the MS GC.

 8) Ref counting can fragment the heap thereby consuming more memory just
 like the gc can, though the gc typically will consume more memory overall.

 
 Isn't the actual GC implementation also fragmenting the heap?

Yes. malloc will fragment the heap, too. But only GC has the hope of 
doing compaction.

 But I don't have any idea to this issue.
 
 9) Ref counting does not eliminated latency problems, it just reduces them.

 
 Where do you mean is the latency with RC? The collecting work is done in
 the smallest pieces all the time. Only the allocation of new memory
 needs an unbound time to complete.

That's the same with gc. Only upon an allocation is the time unbound.

 But it does not need to stop all
 other threads. e.g. hardware interrupts don't need to be disabled.

GC doesn't require disabling hardware interrupts. It does not need to 
stop any threads that do not hold references to GC allocated data.


 The proposed C++ shared_ptr<>, which implements ref counting, suffers
 from all these faults. I haven't seen a heads up benchmark of
 shared_ptr<> vs mark/sweep, but I wouldn't be surprised if shared_ptr<>
 turned out to be a significant loser in terms of both performance and
 memory consumption.

 That said, I'd like for D to optionally support some form of ref
 counting, as rc is better for managing scarce resources like file handles.

 
 As I said, I would be happy to be able to switch the whole runtime
 system to RC. If I would loose 50% performance this is acceptable for me
 if I get latency < 100us.

Latency cannot be guaranteed even with malloc().

I don't know what constraints you have on the system you're developing. 
But when I've written real time interrupt code, the ISRs were written to 
not make any OS calls or any allocations. All they'd do is just gather 
data and post it to a FIFO buffer. A non-realtime thread would then pull 
the data out of the buffer and process it. The system would work without 
dropping data as long as the average (not maximum) processing time per 
datum was less than the data acquisition rate.

Jun 04 2006

Frank Benoit <keinfarbton nospam.xyz> writes:

 I'm not throwing it away so fast, I've thought about it for several
 years <g>.

I meant myself, hehe.

 If I call a func/method, I hold a valid reference to the object. While
 this reference is copied down the stack, these are only temporary
 reference. And all of them are release before the call returns to the
 caller. You can ignore them all for RC.

 
 This fails if the passed reference 'escapes'. It can escape by being
 assigned to a global, being assigned to a class member, being inserted
 into some data structure that lives longer than the function, and being
 returned by the function.

Yes, and the compiler can catch these cases. With incr/decr the
parameter reference you win nothing. So it should be possible to
optimize them.

 3) in a multithreaded app, the incs and decs must be synchronized

 Isn't a atomic inc/dec enough?

 
 That requires synchronizing.
 
 (Don't know anything about the
 performance of the asm "lock" instruction)

 
 It's slow enough to be noticable.
 
 Yes, you can find out the beginning of an arbitrary pointer's memory
 block. But that isn't a cheap operation.

yikes, you are absolutely right.
So here comes the more ugly solution:
Every pointer is allways a struct with the pointer to the data and a
pointer to the refcounter.

 7) Fixing the compiler to hide all this stuff from the programmer will
 make it difficult to interface cleanly with C.

 hm, ok. Here is some more manual work necessary.
 If you give the last reference to a C lib, isn't this a problem for the
 MS GC also?

 
 I don't know anything about the MS GC.

you do :) MS=mark&sweep


 9) Ref counting does not eliminated latency problems, it just reduces
 them.

 Where do you mean is the latency with RC? The collecting work is done in
 the smallest pieces all the time. Only the allocation of new memory
 needs an unbound time to complete.

 
 That's the same with gc. Only upon an allocation is the time unbound.

And additional the time is unbound for the GC fullcollect...

 
 But it does not need to stop all
 other threads. e.g. hardware interrupts don't need to be disabled.

 
 GC doesn't require disabling hardware interrupts. It does not need to
 stop any threads that do not hold references to GC allocated data.

If you want to have you ISR written in D, you have two choices:
a) be sure that you do not move a reference (refb=refa; refa=null;).
This can end up in GC deleting a living object. If you cannot guarantee
this, than
b) disable the interrupting code also while the GC is running. But this
is not acceptable, so go back to a) or have another GC.

 Latency cannot be guaranteed even with malloc().
 
 I don't know what constraints you have on the system you're developing.
 But when I've written real time interrupt code, the ISRs were written to
 not make any OS calls or any allocations. All they'd do is just gather
 data and post it to a FIFO buffer. A non-realtime thread would then pull
 the data out of the buffer and process it. The system would work without
 dropping data as long as the average (not maximum) processing time per
 datum was less than the data acquisition rate.

Yes, I will also not do any OS call, neither I would do a "new". And I
can easily check the code for not doing this. I even can do a check into
the GC allocator, that it is not called in a ISR.
But a moving reference can corrupt the working GC. And there is no way
to check for that.

Jun 05 2006

Walter Bright <newshound digitalmars.com> writes:

Frank Benoit wrote:
 This fails if the passed reference 'escapes'. It can escape by being
 assigned to a global, being assigned to a class member, being inserted
 into some data structure that lives longer than the function, and being
 returned by the function.

 Yes, and the compiler can catch these cases. With incr/decr the
 parameter reference you win nothing. So it should be possible to
 optimize them.

It can only do it if it has full source available. It also won't work 
for virtual functions, as it is not known at compile time what code will 
be called.

 So here comes the more ugly solution:
 Every pointer is allways a struct with the pointer to the data and a
 pointer to the refcounter.

It's the wrapper approach, with two allocations per allocation.


 7) Fixing the compiler to hide all this stuff from the programmer will
 make it difficult to interface cleanly with C.

 hm, ok. Here is some more manual work necessary.
 If you give the last reference to a C lib, isn't this a problem for the
 MS GC also?

 I don't know anything about the MS GC.

 
 you do :) MS=mark&sweep

I thought you meant MS=Microsoft <g>. No, it isn't a problem, as it 
scans the C data segment and stack, too.


 9) Ref counting does not eliminated latency problems, it just reduces
 them.

 Where do you mean is the latency with RC? The collecting work is done in
 the smallest pieces all the time. Only the allocation of new memory
 needs an unbound time to complete.

 That's the same with gc. Only upon an allocation is the time unbound.

 And additional the time is unbound for the GC fullcollect...

The collector only runs during an allocation request - which means you 
can completely control when a collection can happen.


 But it does not need to stop all
 other threads. e.g. hardware interrupts don't need to be disabled.

 GC doesn't require disabling hardware interrupts. It does not need to
 stop any threads that do not hold references to GC allocated data.

 If you want to have you ISR written in D, you have two choices:
 a) be sure that you do not move a reference (refb=refa; refa=null;).
 This can end up in GC deleting a living object. If you cannot guarantee
 this, than
 b) disable the interrupting code also while the GC is running. But this
 is not acceptable, so go back to a) or have another GC.

c) code the ISR so it does not reference any gc allocated data. For 
example, use malloc() for dynamic data the ISR needs.

d) make sure any gc references used by the ISR have another reference to 
them that the gc will scan (such as in the global data segment).

Jun 05 2006

Kevin Bealer <Kevin_member pathlink.com> writes:

In article <e60vmu$1b6t$1 digitaldaemon.com>, Walter Bright says...

..
 9) Ref counting does not eliminated latency problems, it just reduces
 them.

 Where do you mean is the latency with RC? The collecting work is done in
 the smallest pieces all the time. Only the allocation of new memory
 needs an unbound time to complete.

 That's the same with gc. Only upon an allocation is the time unbound.

 And additional the time is unbound for the GC fullcollect...

The collector only runs during an allocation request - which means you 
can completely control when a collection can happen.

Correct me if I'm wrong, but while this is true for M + S, doesn't RC /kind of/
have the opposite problem?  In refcounting systems, allocation would be mostly
predictable, a la malloc, but dereferences can trigger a long chain of other
dereferences.

I say 'kind of' because it is more deterministic than GC - you can always dig
deeper to see what is going to be freed by a given dereference.

For instance, if you are the last person to let go of a hash table, you have to
wait for the whole thing to unravel, along with any objects stored inside.

I guess a real-time RC implementation could be made by keeping a 'recycling bin'
and only taking apart N objects at a time, i.e. incremental freeing.  This would
un-guarantee the prompt running of destructors though (which seems to be one of
the big rationales for RC in D, right, at least for non-memory objects?)

Kevin

Jun 07 2006

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Walter Bright wrote:
 Only if the reference is stored,
 the ref counter has to be incremented. This is only possible if this is
 done by the compiler. No smart pointer can do this :)

 3) in a multithreaded app, the incs and decs must be synchronized

 Isn't a atomic inc/dec enough?

 
 That requires synchronizing.
 

Are you sure? It says in 
http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that:
"In a multi-threaded system, if threads are preemptively scheduled, 
reference counting requires additional care to ensure that the updates 
to reference counts are atomic. This is straightforward using hardware 
primitives such as fetch-and-add, compare-and-swap, or 
load-linked/store-conditional, but it is definitely necessary to pay 
attention to this detail." ?




-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jun 07 2006

Sean Kelly <sean f4.ca> writes:

Bruno Medeiros wrote:
 Walter Bright wrote:
 Only if the reference is stored,
 the ref counter has to be incremented. This is only possible if this is
 done by the compiler. No smart pointer can do this :)

 3) in a multithreaded app, the incs and decs must be synchronized

 Isn't a atomic inc/dec enough?

 That requires synchronizing.

 
 Are you sure? It says in 
 http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that:
 "In a multi-threaded system, if threads are preemptively scheduled, 
 reference counting requires additional care to ensure that the updates 
 to reference counts are atomic. This is straightforward using hardware 
 primitives such as fetch-and-add, compare-and-swap, or 
 load-linked/store-conditional, but it is definitely necessary to pay 
 attention to this detail." ?

These are synchronizing operations.


Sean

Jun 07 2006

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Sean Kelly wrote:
 Bruno Medeiros wrote:
 Walter Bright wrote:
 Only if the reference is stored,
 the ref counter has to be incremented. This is only possible if this is
 done by the compiler. No smart pointer can do this :)

 3) in a multithreaded app, the incs and decs must be synchronized

 Isn't a atomic inc/dec enough?

 That requires synchronizing.

 Are you sure? It says in 
 http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that:
 "In a multi-threaded system, if threads are preemptively scheduled, 
 reference counting requires additional care to ensure that the updates 
 to reference counts are atomic. This is straightforward using hardware 
 primitives such as fetch-and-add, compare-and-swap, or 
 load-linked/store-conditional, but it is definitely necessary to pay 
 attention to this detail." ?

 
 These are synchronizing operations.
 
 
 Sean

By "That requires synchronizing." I thought Walter meant stuff with 
mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware 
synchronizing operations be called "atomic inc/dec" ?


-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jun 10 2006

Sean Kelly <sean f4.ca> writes:

Bruno Medeiros wrote:
 Sean Kelly wrote:
 Bruno Medeiros wrote:
 Walter Bright wrote:
 Only if the reference is stored,
 the ref counter has to be incremented. This is only possible if 
 this is
 done by the compiler. No smart pointer can do this :)

 3) in a multithreaded app, the incs and decs must be synchronized

 Isn't a atomic inc/dec enough?

 That requires synchronizing.

 Are you sure? It says in 
 http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting that:
 "In a multi-threaded system, if threads are preemptively scheduled, 
 reference counting requires additional care to ensure that the 
 updates to reference counts are atomic. This is straightforward using 
 hardware primitives such as fetch-and-add, compare-and-swap, or 
 load-linked/store-conditional, but it is definitely necessary to pay 
 attention to this detail." ?

 These are synchronizing operations.

 
 By "That requires synchronizing." I thought Walter meant stuff with 
 mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware 
 synchronizing operations be called "atomic inc/dec" ?

For the most part.  There are really two issues here.  First, the 
operation must be atomic with respect to other threads.  The x86 
actually guarantees this for all load/store operations on aligned data 
<= the bus size.  Second, the operations must be observed in the proper 
order by other threads--ie. the compiler and all involved CPUs (reader 
and writer) are not allowed to reorder the operations or do any other 
fancy tricks that may result in a load/store order that isn't the same 
as program order.  For the compiler, this is where volatile comes in, 
and at the hardware level, a synchronizing operation may be required 
depending on the situation and on the hardware involved.  Again, with 
x86 you can get away without a LOCK a lot of the time, but not always. 
So you really must assume that any competing operation has a worst case 
cost of... well, a lot.


Sean

Jun 10 2006

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Sean Kelly wrote:
 Bruno Medeiros wrote:
 Sean Kelly wrote:
 Bruno Medeiros wrote:
 Walter Bright wrote:
 Only if the reference is stored,
 the ref counter has to be incremented. This is only possible if 
 this is
 done by the compiler. No smart pointer can do this :)

 3) in a multithreaded app, the incs and decs must be synchronized

 Isn't a atomic inc/dec enough?

 That requires synchronizing.

 Are you sure? It says in 
 http://www.iecc.com/gclist/GC-algorithms.html#Reference%20counting 
 that:
 "In a multi-threaded system, if threads are preemptively scheduled, 
 reference counting requires additional care to ensure that the 
 updates to reference counts are atomic. This is straightforward 
 using hardware primitives such as fetch-and-add, compare-and-swap, 
 or load-linked/store-conditional, but it is definitely necessary to 
 pay attention to this detail." ?

 These are synchronizing operations.

 By "That requires synchronizing." I thought Walter meant stuff with 
 mutexes, etc. (i.e., non-trivial stuff), because, can't those hardware 
 synchronizing operations be called "atomic inc/dec" ?

 
 For the most part.  There are really two issues here.  First, the 
 operation must be atomic with respect to other threads.  The x86 
 actually guarantees this for all load/store operations on aligned data 
 <= the bus size.  Second, the operations must be observed in the proper 
 order by other threads--ie. the compiler and all involved CPUs (reader 
 and writer) are not allowed to reorder the operations or do any other 
 fancy tricks that may result in a load/store order that isn't the same 
 as program order.  For the compiler, this is where volatile comes in, 
 and at the hardware level, a synchronizing operation may be required 
 depending on the situation and on the hardware involved.  Again, with 
 x86 you can get away without a LOCK a lot of the time, but not always. 
 So you really must assume that any competing operation has a worst case 
 cost of... well, a lot.
 
 
 Sean

Ah, I see. I had recently read about relative out-of-order execution 
problems in the Memory Barrier wikipedia entry, and I got the impression 
(out of nowhere) that the hardware took care of that transparently 
(i.e., without program intervention), but then, that is not the case, 
not allways at least?

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jun 12 2006

Sean Kelly <sean f4.ca> writes:

Bruno Medeiros wrote:
 
 Ah, I see. I had recently read about relative out-of-order execution 
 problems in the Memory Barrier wikipedia entry, and I got the impression 
 (out of nowhere) that the hardware took care of that transparently 
 (i.e., without program intervention), but then, that is not the case, 
 not allways at least?

A single CPU is allowed to do whatever it wants so long as it can fool 
the user into thinking it's executing instructions in a purely 
sequential manner.  However, the only information other CPUs in the 
system have is what they observe on the system bus.  Obviously, so long 
as CPUs aren't sharing data there aren't any problems.  But things get 
sticky when this isn't the case.  The memory model defines observable 
behavior allowed for a given architecture, as well as any methods 
offered for affecting that behavior.

Say a specific architecture can operate much more quickly if it is 
allowed to perform writes to memory (from cache) in order of ascending 
address rather than in the order they were issued in code.  There's no 
way to lie to other CPUs about the order in which writes occur and still 
have the optimization have any effect, so the designers state in the 
memory model spec that this architecture is allowed to reorder writes 
and then typically provide some means for overriding this behavior 
should it prove necessary.

Now let's suppose you have two threads doing something like this:

   thread/CPU 1:

     A = 1;
     B = 2;

   thread/CPU 2:

     if( A == 1 )
     {
       assert( B == 2 );
     }

Given the order in which a and b were declared, and therefore the order 
in which the writes occur, this assert may or may not fail.

Enter memory barriers.  Memory barriers are simply a way for the 
programmer to tell the CPU "I don't care if your way is faster, A simply 
must be written to memory before B or thread 2 will explode."  So the 
CPU behind thread 1 does as it's told at great expense to performance 
and thread 2 doesn't melt down.

The sticky part is that hardware designers don't agree with one another 
on how things should work and they never take the advice of the software 
people, so all architectures have different sets of observable behavior 
and different methods for working around it when necessary.  However, 
the common concept is that memory barriers all constrain the order in 
which memory accesses may occur with respect to each other.  Think of it 
as an assertion that "X may not occur before Y" or "X may not occur 
after Y" at the instruction level.

The x86 is actually a bit weird in this regard as it has no formal 
memory barriers for normal operations (though it has the FENCE 
instructions for SSE use).  I think this is largely for historical 
reasons--x86 PCs couldn't do SMP at all until fairly recently so none of 
this mattered, and the memory model has always been fairly strict (it 
was actually sequential until not terribly long ago).  Also, the LOCK 
instruction acts as a heavy-handed sort of memory barrier as well, so 
there has been little motivation to add new instructions for 
finer-grained control.


Sean

Jun 13 2006

Daniel Keep <daniel.keep.lists gmail.com> writes:

Sean Kelly wrote:
 Bruno Medeiros wrote:
 Ah, I see. I had recently read about relative out-of-order execution
 problems in the Memory Barrier wikipedia entry, and I got the
 impression (out of nowhere) that the hardware took care of that
 transparently (i.e., without program intervention), but then, that is
 not the case, not allways at least?

 
 A single CPU is allowed to do whatever it wants so long as it can fool
 the user into thinking it's executing instructions in a purely
 sequential manner.  However, the only information other CPUs in the
 system have is what they observe on the system bus.  Obviously, so long
 as CPUs aren't sharing data there aren't any problems.  But things get
 sticky when this isn't the case.  The memory model defines observable
 behavior allowed for a given architecture, as well as any methods
 offered for affecting that behavior.
 
 Say a specific architecture can operate much more quickly if it is
 allowed to perform writes to memory (from cache) in order of ascending
 address rather than in the order they were issued in code.  There's no
 way to lie to other CPUs about the order in which writes occur and still
 have the optimization have any effect, so the designers state in the
 memory model spec that this architecture is allowed to reorder writes
 and then typically provide some means for overriding this behavior
 should it prove necessary.
 
 Now let's suppose you have two threads doing something like this:
 
   thread/CPU 1:
 
     A = 1;
     B = 2;
 
   thread/CPU 2:
 
     if( A == 1 )
     {
       assert( B == 2 );
     }
 
 Given the order in which a and b were declared, and therefore the order
 in which the writes occur, this assert may or may not fail.
 
 Enter memory barriers.  Memory barriers are simply a way for the
 programmer to tell the CPU "I don't care if your way is faster, A simply
 must be written to memory before B or thread 2 will explode."  So the
 CPU behind thread 1 does as it's told at great expense to performance
 and thread 2 doesn't melt down.
 
 The sticky part is that hardware designers don't agree with one another
 on how things should work and they never take the advice of the software
 people, so all architectures have different sets of observable behavior
 and different methods for working around it when necessary.  However,
 the common concept is that memory barriers all constrain the order in
 which memory accesses may occur with respect to each other.  Think of it
 as an assertion that "X may not occur before Y" or "X may not occur
 after Y" at the instruction level.
 
 The x86 is actually a bit weird in this regard as it has no formal
 memory barriers for normal operations (though it has the FENCE
 instructions for SSE use).  I think this is largely for historical
 reasons--x86 PCs couldn't do SMP at all until fairly recently so none of
 this mattered, and the memory model has always been fairly strict (it
 was actually sequential until not terribly long ago).  Also, the LOCK
 instruction acts as a heavy-handed sort of memory barrier as well, so
 there has been little motivation to add new instructions for
 finer-grained control.
 
 
 Sean

Cool; learn something new every day.  Thanks for the informative post.

	-- Daniel

-- 
Unlike Knuth, I have neither proven or tried the above; it may not even
make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

Jun 14 2006

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Sean Kelly wrote:
 Bruno Medeiros wrote:
 Ah, I see. I had recently read about relative out-of-order execution 
 problems in the Memory Barrier wikipedia entry, and I got the 
 impression (out of nowhere) that the hardware took care of that 
 transparently (i.e., without program intervention), but then, that is 
 not the case, not allways at least?

 
 A single CPU is allowed to do whatever it wants so long as it can fool 
 the user into thinking it's executing instructions in a purely 
 sequential manner.  However, the only information other CPUs in the 
 system have is what they observe on the system bus.  Obviously, so long 
 as CPUs aren't sharing data there aren't any problems.  But things get 
 sticky when this isn't the case.  The memory model defines observable 
 behavior allowed for a given architecture, as well as any methods 
 offered for affecting that behavior.
 
 Say a specific architecture can operate much more quickly if it is 
 allowed to perform writes to memory (from cache) in order of ascending 
 address rather than in the order they were issued in code.  There's no 
 way to lie to other CPUs about the order in which writes occur and still 
 have the optimization have any effect, so the designers state in the 
 memory model spec that this architecture is allowed to reorder writes 
 and then typically provide some means for overriding this behavior 
 should it prove necessary.
 
 Now let's suppose you have two threads doing something like this:
 
   thread/CPU 1:
 
     A = 1;
     B = 2;
 
   thread/CPU 2:
 
     if( A == 1 )
     {
       assert( B == 2 );
     }
 
 Given the order in which a and b were declared, and therefore the order 
 in which the writes occur, this assert may or may not fail.
 
 Enter memory barriers.  Memory barriers are simply a way for the 
 programmer to tell the CPU "I don't care if your way is faster, A simply 
 must be written to memory before B or thread 2 will explode."  So the 
 CPU behind thread 1 does as it's told at great expense to performance 
 and thread 2 doesn't melt down.
 
 The sticky part is that hardware designers don't agree with one another 
 on how things should work and they never take the advice of the software 
 people, so all architectures have different sets of observable behavior 
 and different methods for working around it when necessary.  However, 
 the common concept is that memory barriers all constrain the order in 
 which memory accesses may occur with respect to each other.  Think of it 
 as an assertion that "X may not occur before Y" or "X may not occur 
 after Y" at the instruction level.
 
 The x86 is actually a bit weird in this regard as it has no formal 
 memory barriers for normal operations (though it has the FENCE 
 instructions for SSE use).  I think this is largely for historical 
 reasons--x86 PCs couldn't do SMP at all until fairly recently so none of 
 this mattered, and the memory model has always been fairly strict (it 
 was actually sequential until not terribly long ago).  Also, the LOCK 
 instruction acts as a heavy-handed sort of memory barrier as well, so 
 there has been little motivation to add new instructions for 
 finer-grained control.
 
 
 Sean

Nice post!
Makes me think, how does one keep up with this? I mean, one who isn't 
(nor wishes to be) a hardware expert, but wants to keep up with the 
general developments in this area, thus maintaining an overview of it.

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jun 14 2006

Sean Kelly <sean f4.ca> writes:

Bruno Medeiros wrote:
 Makes me think, how does one keep up with this? I mean, one who isn't 
 (nor wishes to be) a hardware expert, but wants to keep up with the 
 general developments in this area, thus maintaining an overview of it.

comp.programming.threads is worth keeping an eye on, though the jargon 
can get a bit thick there at times.  The C++ committee is also working 
on a new memory model for C++, so any discussion there may be useful. 
The easiest way to keep an eye on this is follow the links from Hans 
Boehm's website  (http://www.hpl.hp.com/personal/Hans_Boehm/) though you 
could keep an eye on comp.std.c++ as well if you're having trouble 
staying awake at night.  Finally, any paper with Maurice Herlihy's name 
on it is a useful resource if you want a deeper understanding of some of 
these ideas and don't mind some research.  He's got a website with links 
to many of his papers, though IIRC some of them are hard to track down 
without an ACM membership.


Sean

Jun 14 2006

"Lionello Lunesu" <lionello lunesu.remove.com> writes:

Interesting indeed!

I'm currently reading Boehm's article "Threads Cannot be Implemented as a 
Library". I wonder if this means that threads should become primitives of 
some sort [in D] ?

L.

Jun 14 2006

Sean Kelly <sean f4.ca> writes:

Lionello Lunesu wrote:
 Interesting indeed!
 
 I'm currently reading Boehm's article "Threads Cannot be Implemented as a 
 Library". I wonder if this means that threads should become primitives of 
 some sort [in D] ?

It's been a while since I read that article, but I think D already has 
enough in-language recognition of threading issues to escape most/all of 
the problems Boehm mentions: Thread is a standard library component 
rather than a third-party component, synchronized is available for 
mutual exclusion, and volatile (plus perhaps inline asm code) is enough 
to manage lock-free work.  A more robust multithread-aware memory model 
would be good to have, but it's a sticky enough issue that I think 
Walter is right for waiting on that for now.


Sean

Jun 14 2006

Lionello Lunesu <lio lunesu.remove.com> writes:

Sean Kelly wrote:
 Lionello Lunesu wrote:
 Interesting indeed!

 I'm currently reading Boehm's article "Threads Cannot be Implemented 
 as a Library". I wonder if this means that threads should become 
 primitives of some sort [in D] ?

 
 It's been a while since I read that article, but I think D already has 
 enough in-language recognition of threading issues to escape most/all of 
 the problems Boehm mentions: Thread is a standard library component 
 rather than a third-party component, synchronized is available for 
 mutual exclusion, and volatile (plus perhaps inline asm code) is enough 
 to manage lock-free work.  A more robust multithread-aware memory model 
 would be good to have, but it's a sticky enough issue that I think 
 Walter is right for waiting on that for now.

...that leaves only atomic operations? Or would "volatile" or 
"synchronized" take care of memory fencing and such?

Would be nice if there were an built-in type with overloaded ++x, x++, 
x=y, all atomic. (I've seen some C++ templates that do this.)

L.

Jun 15 2006

Sean Kelly <sean f4.ca> writes:

Lionello Lunesu wrote:
 Sean Kelly wrote:
 Lionello Lunesu wrote:
 Interesting indeed!

 I'm currently reading Boehm's article "Threads Cannot be Implemented 
 as a Library". I wonder if this means that threads should become 
 primitives of some sort [in D] ?

 It's been a while since I read that article, but I think D already has 
 enough in-language recognition of threading issues to escape most/all 
 of the problems Boehm mentions: Thread is a standard library component 
 rather than a third-party component, synchronized is available for 
 mutual exclusion, and volatile (plus perhaps inline asm code) is 
 enough to manage lock-free work.  A more robust multithread-aware 
 memory model would be good to have, but it's a sticky enough issue 
 that I think Walter is right for waiting on that for now.

 
 ...that leaves only atomic operations? Or would "volatile" or 
 "synchronized" take care of memory fencing and such?

"volatile" is actually intended to address compiler optimizations in a 
similar way to how memory barriers address CPU optimizations.  It is a 
necessary part of any lock-free operation in D.  "synchronized" is used 
for mutual exclusion and typically involves a mutex, so while it should 
have the proper effect, it's not lock-free.

 Would be nice if there were an built-in type with overloaded ++x, x++, 
 x=y, all atomic. (I've seen some C++ templates that do this.)

It would be nice, but I've yet to see a proposal that I like.  The 
problem is that in addition to pure atomic operations (which is how the 
x86 typically works by default) lock-free programmers typically want to 
make an assertion about instruction ordering as well, and built-in 
semantics don't expose this very cleanly.  One could simply have 
volatile types similar to Java where no optimization is allowed on them 
at all, but this may be too heavy-handed to satisfy some folks.  For 
now, I think a library solution is a reasonable alternative.  Ares has 
had one for quite a while and it's almost standalone so it could be used 
with Phobos with very little effort.  The documentation is here (DDoc 
seems to want to generate docs for private template functions, so I 
apologize for the clutter):

http://svn.dsource.org/projects/ares/trunk/doc/ares/std/atomic.html

And the code itself is here:

http://svn.dsource.org/projects/ares/trunk/src/ares/std/atomic.d

It currently only works on the x86 and is really intended for API 
developers, but it does the trick and I'll expand it as needed.  For 
typical use, msync.acq (acquire), msync.rel (release), and msync.none 
are the flags you're likely to care about.  The other more fine-grained 
flags just alias acq/rel on x86 anyway.  In case the docs are confusing, 
the API calls supported are:

load
store
storeIf (ie. CAS)
increment
decrement


Sean

Jun 15 2006

Lionello Lunesu <lio lunesu.remove.com> writes:

Sean Kelly wrote:
 Lionello Lunesu wrote:
 ...that leaves only atomic operations? Or would "volatile" or 
 "synchronized" take care of memory fencing and such?

 
 "volatile" is actually intended to address compiler optimizations in a 
 similar way to how memory barriers address CPU optimizations.  It is a 
 necessary part of any lock-free operation in D.  "synchronized" is used 
 for mutual exclusion and typically involves a mutex, so while it should 
 have the proper effect, it's not lock-free.

So I suppose "volatile" could be extended to include memory locking as well!

// instructions inside this block are 'lock'ed
volatile(atomic) {
	++a;
	a = x;
}

(or perhaps even "synchronized(memory) {...}", but I think extending 
volatile makes more sense).

Come to think of it, doesn't "synchronized" also imply "volatile"?

Or, another possibility would be to add the volatile declaration as in 
C, but then actually locking all access to the variable:

volatile int i;
i++;// atomic

Seems to me it should be possibly to extend D with a built-in and 
portable construct for these atomic operations.

 It would be nice, but I've yet to see a proposal that I like.  The 
 problem is that in addition to pure atomic operations (which is how the 
 x86 typically works by default) lock-free programmers typically want to 
 make an assertion about instruction ordering as well, and built-in 
 semantics don't expose this very cleanly.  One could simply have 
 volatile types similar to Java where no optimization is allowed on them 
 at all, but this may be too heavy-handed to satisfy some folks.  For 
 now, I think a library solution is a reasonable alternative.  Ares has 
 had one for quite a while and it's almost standalone so it could be used 
 with Phobos with very little effort.  The documentation is here (DDoc 
 seems to want to generate docs for private template functions, so I 
 apologize for the clutter):
 
 http://svn.dsource.org/projects/ares/trunk/doc/ares/std/atomic.html
 
 And the code itself is here:
 
 http://svn.dsource.org/projects/ares/trunk/src/ares/std/atomic.d


Thanks, I'll have a look at them.

L.

Jun 16 2006

Sean Kelly <sean f4.ca> writes:

Lionello Lunesu wrote:
 Sean Kelly wrote:
 Lionello Lunesu wrote:
 ...that leaves only atomic operations? Or would "volatile" or 
 "synchronized" take care of memory fencing and such?

 "volatile" is actually intended to address compiler optimizations in a 
 similar way to how memory barriers address CPU optimizations.  It is a 
 necessary part of any lock-free operation in D.  "synchronized" is 
 used for mutual exclusion and typically involves a mutex, so while it 
 should have the proper effect, it's not lock-free.

 
 So I suppose "volatile" could be extended to include memory locking as 
 well!

Yes it could, though for now I prefer it the way it is as it offers more 
control over exactly what's going on.  It helps tremendously that D has 
such good inline asm support.

 // instructions inside this block are 'lock'ed
 volatile(atomic) {
     ++a;
     a = x;
 }
 
 (or perhaps even "synchronized(memory) {...}", but I think extending 
 volatile makes more sense).
 
 Come to think of it, doesn't "synchronized" also imply "volatile"?

Yes.  Synchronization mechanisms rely on memory barriers to work 
properly.  In fact, it's been suggested in the past on c.p.t that an 
empty mutex block:

     synchronzed {}

could be used as a high-level sort of bidirectional memory barrier, 
except for the risk of the compiler optimizing the call away entirely.

 Or, another possibility would be to add the volatile declaration as in 
 C, but then actually locking all access to the variable:
 
 volatile int i;
 i++;// atomic

This is sort of how Java implements volatile and it wouldn't surprise me 
if C++ did something similar.  Right now, the "volatile" qualifier 
offers basically no language guarantees for concurrent programming, as 
it was actually intended for accessing interrupt data IIRC.

 Seems to me it should be possibly to extend D with a built-in and 
 portable construct for these atomic operations.

Definately.  The trouble is coming up with a good design :-)  I think 
the C++ team is definately qualified to do so, but they may have to make 
some sacrifices in the interest of time.


Sean

Jun 16 2006

Don Clugston <dac nospam.com.au> writes:

Sean Kelly wrote:
 Lionello Lunesu wrote:
 Sean Kelly wrote:
 Lionello Lunesu wrote:
 Interesting indeed!

 I'm currently reading Boehm's article "Threads Cannot be Implemented 
 as a Library". I wonder if this means that threads should become 
 primitives of some sort [in D] ?

 It's been a while since I read that article, but I think D already 
 has enough in-language recognition of threading issues to escape 
 most/all of the problems Boehm mentions: Thread is a standard library 
 component rather than a third-party component, synchronized is 
 available for mutual exclusion, and volatile (plus perhaps inline asm 
 code) is enough to manage lock-free work.  A more robust 
 multithread-aware memory model would be good to have, but it's a 
 sticky enough issue that I think Walter is right for waiting on that 
 for now.

 ...that leaves only atomic operations? Or would "volatile" or 
 "synchronized" take care of memory fencing and such?

 
 "volatile" is actually intended to address compiler optimizations in a 
 similar way to how memory barriers address CPU optimizations.  It is a 
 necessary part of any lock-free operation in D.  "synchronized" is used 
 for mutual exclusion and typically involves a mutex, so while it should 
 have the proper effect, it's not lock-free.
 
 Would be nice if there were an built-in type with overloaded ++x, x++, 
 x=y, all atomic. (I've seen some C++ templates that do this.)

 
 It would be nice, but I've yet to see a proposal that I like.  The 
 problem is that in addition to pure atomic operations (which is how the 
 x86 typically works by default) lock-free programmers typically want to 
 make an assertion about instruction ordering as well, and built-in 
 semantics don't expose this very cleanly.  One could simply have 
 volatile types similar to Java where no optimization is allowed on them 
 at all, but this may be too heavy-handed to satisfy some folks.  For 
 now, I think a library solution is a reasonable alternative.  Ares has 
 had one for quite a while and it's almost standalone so it could be used 
 with Phobos with very little effort. 

I'd really like to see this included in the standard distribution.
Has anyone made a general-purpose lock-free linked list? Just a 
low-level list of void * pointers would be fantastic.

  The documentation is here (DDoc
 seems to want to generate docs for private template functions, so I 
 apologize for the clutter):
 
 http://svn.dsource.org/projects/ares/trunk/doc/ares/std/atomic.html
 
 And the code itself is here:
 
 http://svn.dsource.org/projects/ares/trunk/src/ares/std/atomic.d
 
 It currently only works on the x86 and is really intended for API 
 developers, but it does the trick and I'll expand it as needed.  For 
 typical use, msync.acq (acquire), msync.rel (release), and msync.none 
 are the flags you're likely to care about.  The other more fine-grained 
 flags just alias acq/rel on x86 anyway.  In case the docs are confusing, 
 the API calls supported are:
 
 load
 store
 storeIf (ie. CAS)
 increment
 decrement
 
 
 Sean

Jun 16 2006

Sean Kelly <sean f4.ca> writes:

Don Clugston wrote:
 
 I'd really like to see this included in the standard distribution.
 Has anyone made a general-purpose lock-free linked list? Just a 
 low-level list of void * pointers would be fantastic.

Mango contains some of Doug Lea's containers and there may be one there. 
  If not, it shouldn't be tremendously difficult to implement or port 
one.  I'm not sure I have the time right now, but if someone wants to 
the the atomic package in Ares they're more than welcome to.


Sean

Jun 16 2006

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Sean Kelly wrote:
 Bruno Medeiros wrote:
 Makes me think, how does one keep up with this? I mean, one who isn't 
 (nor wishes to be) a hardware expert, but wants to keep up with the 
 general developments in this area, thus maintaining an overview of it.

 
 comp.programming.threads is worth keeping an eye on, though the jargon 
 can get a bit thick there at times.  The C++ committee is also working 
 on a new memory model for C++, so any discussion there may be useful. 
 The easiest way to keep an eye on this is follow the links from Hans 
 Boehm's website  (http://www.hpl.hp.com/personal/Hans_Boehm/) though you 
 could keep an eye on comp.std.c++ as well if you're having trouble 
 staying awake at night.  Finally, any paper with Maurice Herlihy's name 
 on it is a useful resource if you want a deeper understanding of some of 
 these ideas and don't mind some research.  He's got a website with links 
 to many of his papers, though IIRC some of them are hard to track down 
 without an ACM membership.
 
 
 Sean

Well, that's the thing, I just want to have some general knowledge about 
this area of hardware and concurrency/distributed-systems, not become an 
expert on it. My time to learn new things is limited (and I definitely 
have no trouble going to sleep :( ), so my priority goes for learning 
things that I have immediate need to use/become-involved. If I ever have 
the need to learn more in-depth I will.
You see, concurrency is very important for people interested in 
server-side programming. But for multimedia programming, it's not that 
important. For now that is, as it seems that with the coming of 
multicore CPUs, concurrency is becoming more and more of a general 
software development topic, relevant even for non-intrinsically 
concurrent apps.

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jun 17 2006

Sean Kelly <sean f4.ca> writes:

Bruno Medeiros wrote:
 Sean Kelly wrote:
 Bruno Medeiros wrote:
 Makes me think, how does one keep up with this? I mean, one who isn't 
 (nor wishes to be) a hardware expert, but wants to keep up with the 
 general developments in this area, thus maintaining an overview of it.

 comp.programming.threads is worth keeping an eye on, though the jargon 
 can get a bit thick there at times.  The C++ committee is also working 
 on a new memory model for C++, so any discussion there may be useful. 
 The easiest way to keep an eye on this is follow the links from Hans 
 Boehm's website  (http://www.hpl.hp.com/personal/Hans_Boehm/) though 
 you could keep an eye on comp.std.c++ as well if you're having trouble 
 staying awake at night.  Finally, any paper with Maurice Herlihy's 
 name on it is a useful resource if you want a deeper understanding of 
 some of these ideas and don't mind some research.  He's got a website 
 with links to many of his papers, though IIRC some of them are hard to 
 track down without an ACM membership.

 
 Well, that's the thing, I just want to have some general knowledge about 
 this area of hardware and concurrency/distributed-systems, not become an 
 expert on it. My time to learn new things is limited (and I definitely 
 have no trouble going to sleep :( ), so my priority goes for learning 
 things that I have immediate need to use/become-involved. If I ever have 
 the need to learn more in-depth I will.
 You see, concurrency is very important for people interested in 
 server-side programming. But for multimedia programming, it's not that 
 important. For now that is, as it seems that with the coming of 
 multicore CPUs, concurrency is becoming more and more of a general 
 software development topic, relevant even for non-intrinsically 
 concurrent apps.

Sadly, I don't know of any more basic sources of information on this 
stuff.  David Butenhof's "Programming with POSIX Threads" is quite good 
for the basic concepts, but beyond that it's mostly research papers, 
wiki topics, etc.  But don't hesitate to ask on comp.programming.threads 
or here if you have a question.

Sean

Jun 17 2006

Sean Kelly <sean f4.ca> writes:

For what it's worth, incremental GC is similar to refcounting in that 
the cost is distributed across pointer use to avoid the need for a 
costly mark/sweep phase.  I've even seen hard-realtime incremental GC 
implementations, so it's a long-term solution worth considering.  That 
said, I would like to be able to create some sort of smart pointer in D 
for tracking valuable resources, so it's good to hear that Walter is 
considering this as well.

Frank Benoit wrote:
 3) in a multithreaded app, the incs and decs must be synchronized

 
 Isn't a atomic inc/dec enough? (Don't know anything about the
 performance of the asm "lock" instruction)

Depending on the platform and the synch. requirement, a "lock" type 
instruction may be necessary.  It typically is for non-x86 platforms, 
but I think you could get away without using one in some cases on x86. 
The best place to check for optimizations would be the Boost shared_ptr 
code.  IIRC they don't use the Interlocked calls but there are spin 
locks in some places.

The cost of a "lock" instruction is variable based on the hardware, 
number of CPUs, whether the cache line is shared, etc, but the most 
reliable estimate I've seen averages locked ops at around 70ns.

 9) Ref counting does not eliminated latency problems, it just reduces them.

 
 Where do you mean is the latency with RC? The collecting work is done in
 the smallest pieces all the time. Only the allocation of new memory
 needs an unbound time to complete. But it does not need to stop all
 other threads. e.g. hardware interrupts don't need to be disabled.

I've seen analyses that suggested reference counting is actually slower 
than mark/sweep when measured over the run of the program.  The 
advantage is the avoidance of the unbounded mark/sweep phase, with far 
more expensive pointer modifications instead.  As above, I'd almost 
rather see incremental GC support in D (it typically requires additional 
ode generation on pointer modifications, and may be tricky to sort out 
for C routines like memset).


Sean

Jun 05 2006

Jeff Parsons <jeffrparsons optusnet.com.au> writes:

Sean Kelly wrote:
 For what it's worth, incremental GC is similar to refcounting in that 
 the cost is distributed across pointer use to avoid the need for a 
 costly mark/sweep phase.  I've even seen hard-realtime incremental GC 
 implementations, so it's a long-term solution worth considering.

My _only_ significant beef with D so far as been the unbound time for a 
garbage collection; therefore, if an incremental garbage collector built 
into D could guarantee me a set maximum time would be taken for each 
garbage collection (which I would manually induce as often as possible) 
then unless the efficiency hit is TERRIBLE I would be all over it.

What I'm really curious about, however, is how important this is to 
_other_ people - especially Walter. Would it be reasonable to say that 
it seems D has been designed largely without long-term real-time 
performance in mind? Sure, there are workarounds, but I haven't seen 
anything elegant yet that will let me use D for such applications 
without making my memory management more complicated than it would have 
been in C++. Is anything in the works?

Jun 07 2006

Sean Kelly <sean f4.ca> writes:

Jeff Parsons wrote:
 
 What I'm really curious about, however, is how important this is to 
 _other_ people - especially Walter. Would it be reasonable to say that 
 it seems D has been designed largely without long-term real-time 
 performance in mind? Sure, there are workarounds, but I haven't seen 
 anything elegant yet that will let me use D for such applications 
 without making my memory management more complicated than it would have 
 been in C++. Is anything in the works?

The spec doesn't contain any language forbidding such an approach, but I 
don't think the popular compilers will support it by default.  Certainly 
not any time soon, at any rate.  The "problem" with incremental GC is 
that it requires extra code generation for all pointer modifications to 
signal the garbage collector that some re-scanning may be necessary. 
This has obvious performance implications for small apps that don't ever 
actually need to collect, but tends to level out as run time increases. 
   And, of course, the potential for progress guarantees (possible with 
incremental GC, but not at all common) is a huge bonus for realtime 
programming.  That aside, there may be other approaches to garbage 
collection that would make everyone happy and that don't require 
additional code generation.  If you know of one, please say so.  My own 
knowledge of the topic is limited to what I've read in research papers.


Sean

Jun 08 2006

Larry Evans <cppljevans cos-internet.com> writes:

On 06/08/2006 06:04 PM, Sean Kelly wrote:
 Jeff Parsons wrote:
 

[snip]
 performance in mind? Sure, there are workarounds, but I haven't seen 
 anything elegant yet that will let me use D for such applications 
 without making my memory management more complicated than it would 
 have been in C++. Is anything in the works?

 
 
 The spec doesn't contain any language forbidding such an approach, but I 
 don't think the popular compilers will support it by default.  Certainly 

[snip]
 programming.  That aside, there may be other approaches to garbage 
 collection that would make everyone happy and that don't require 
 additional code generation.  If you know of one, please say so.  My own 
 knowledge of the topic is limited to what I've read in research papers.

You could define a policy pointer which could have as one policy, the
current Boehm collector.  IOW, any ptr<BW> p(T) on the stack would be a
root pointer and each data structure would have a map of ptr<BW>
locations.  Similarly, there could be ptr<Copy> or ptr<IncBw> or even
ptr<deterministic> and ptr<collected> instead of the

   class [collected] SomeClass{...};
   class [deterministic] OtherClass{...}

proposed by Adrei here:

   http://tinyurl.com/o5zpm

This, of course, would require cooperation from the compiler, but
the same it would be true of:

   class [collected] SomeClass{...};

it's just that some specializations, ptr<collected>, would be
predetermined and others, for example, user-defined ones,
would not.

Jun 08 2006

Walter Bright <newshound digitalmars.com> writes:

Jeff Parsons wrote:
 My _only_ significant beef with D so far as been the unbound time for a 
 garbage collection; therefore, if an incremental garbage collector built 
 into D could guarantee me a set maximum time would be taken for each 
 garbage collection (which I would manually induce as often as possible) 
 then unless the efficiency hit is TERRIBLE I would be all over it.

If you're looking for a *guarantee*, then you cannot even use malloc(). 
Malloc has unbounded time, and is not predictable.

The approach most real time programmers use to deal with this is to 
preallocate all needed data before entering the real time section, and 
the real time section does no malloc's or free's.

Also, you can use malloc and free in D, in exactly the same way you 
would in C.

Jun 08 2006

Sean Kelly <sean f4.ca> writes:

Walter Bright wrote:
 Jeff Parsons wrote:
 My _only_ significant beef with D so far as been the unbound time for 
 a garbage collection; therefore, if an incremental garbage collector 
 built into D could guarantee me a set maximum time would be taken for 
 each garbage collection (which I would manually induce as often as 
 possible) then unless the efficiency hit is TERRIBLE I would be all 
 over it.

 
 If you're looking for a *guarantee*, then you cannot even use malloc(). 
 Malloc has unbounded time, and is not predictable.

For what it's worth, I have read papers describing hard real-time 
incremental garbage collectors, but I haven't yet seen one available 
outside research circles.  I'd like to give one a shot at some point, 
but I've far too many other distractions to know when that will be.

If anyone is interested however, some of the papers are available here:

http://www.cs.utexas.edu/users/oops/papers.html

See papers 11 and 14.

 The approach most real time programmers use to deal with this is to 
 preallocate all needed data before entering the real time section, and 
 the real time section does no malloc's or free's.

This is certainly the easiest and probably the safest method.


Sean

Jun 08 2006

Walter Bright <newshound digitalmars.com> writes:

Sean Kelly wrote:
 Walter Bright wrote:
 The approach most real time programmers use to deal with this is to 
 preallocate all needed data before entering the real time section, and 
 the real time section does no malloc's or free's.

 
 This is certainly the easiest and probably the safest method.

Yup. It gets the job done.

Real time threads should be considered a very valuable resource, and as 
such all possible computation should be removed from them and put into 
non-realtime threads.

Jun 09 2006

Derek Parnell <derek psych.ward> writes:

On Thu, 08 Jun 2006 22:18:24 -0700, Walter Bright wrote:


 The approach most real time programmers use to deal with this is to 
 preallocate all needed data before entering the real time section, and 
 the real time section does no malloc's or free's.
 
 Also, you can use malloc and free in D, in exactly the same way you 
 would in C.

It seems obvious that if one doesn't want the Garbage Collector to collect
garbage, one doesn't create any garbage for it to collect. ;-)

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
9/06/2006 4:44:56 PM

Jun 08 2006

Sean Kelly <sean f4.ca> writes:

Derek Parnell wrote:
 On Thu, 08 Jun 2006 22:18:24 -0700, Walter Bright wrote:
 
 
 The approach most real time programmers use to deal with this is to 
 preallocate all needed data before entering the real time section, and 
 the real time section does no malloc's or free's.

 Also, you can use malloc and free in D, in exactly the same way you 
 would in C.

 
 It seems obvious that if one doesn't want the Garbage Collector to collect
 garbage, one doesn't create any garbage for it to collect. ;-)

This gets sticky if one wants to use strings in D though, as there's no 
real way to make them work outside of the garbage collector mechanism. 
Sure we could use C-style pointers, but it's not a terribly attractive 
option.  I suppose the alternative would be a class with a custom 
allocator method.


Sean

Jun 08 2006

Walter Bright <newshound digitalmars.com> writes:

Sean Kelly wrote:
 Derek Parnell wrote:
 On Thu, 08 Jun 2006 22:18:24 -0700, Walter Bright wrote:


 The approach most real time programmers use to deal with this is to 
 preallocate all needed data before entering the real time section, 
 and the real time section does no malloc's or free's.

 Also, you can use malloc and free in D, in exactly the same way you 
 would in C.

 It seems obvious that if one doesn't want the Garbage Collector to 
 collect
 garbage, one doesn't create any garbage for it to collect. ;-)

 
 This gets sticky if one wants to use strings in D though, as there's no 
 real way to make them work outside of the garbage collector mechanism.
 Sure we could use C-style pointers, but it's not a terribly attractive 
 option.

You can use D arrays. Just allocate them using malloc (or statically 
allocate them). Don't concatenate them.

Though I would ask why the real time code was allocating strings?

C is very good for writing real time code, and you can write "C"-style 
code in D, and it will behave exactly like the corresponding C code. You 
can call printf, strcpy, malloc, etc., which will behave just like C's 
printf, strcpy, and malloc because they *are* C's functions. It will not 
do a gc collect unless the code allocates gc memory, and it will ONLY do 
a gc collect during such an allocation.

Heck, my old Empire game was written in C. I renamed the files from .c 
to .d, did a few minor syntactic edits, and voila! it was now in D. It 
runs exactly the same.

But the bottom line is that a real time thread should not be doing 
storage allocation. It shouldn't be calling new, or malloc. It shouldn't 
do storage allocation regardless of whether it is written in D, C, C++, 
or assembler.

Jun 09 2006

Sean Kelly <sean f4.ca> writes:

Walter Bright wrote:
 Sean Kelly wrote:
 Derek Parnell wrote:
 On Thu, 08 Jun 2006 22:18:24 -0700, Walter Bright wrote:


 The approach most real time programmers use to deal with this is to 
 preallocate all needed data before entering the real time section, 
 and the real time section does no malloc's or free's.

 Also, you can use malloc and free in D, in exactly the same way you 
 would in C.

 It seems obvious that if one doesn't want the Garbage Collector to 
 collect
 garbage, one doesn't create any garbage for it to collect. ;-)

 This gets sticky if one wants to use strings in D though, as there's 
 no real way to make them work outside of the garbage collector mechanism.
 Sure we could use C-style pointers, but it's not a terribly attractive 
 option.

 
 You can use D arrays. Just allocate them using malloc (or statically 
 allocate them). Don't concatenate them.

So then I'd just manually set the ptr and len members?  Sounds 
reasonable, provided everyone behaves and uses the copy on write idiom 
you suggest.

 Though I would ask why the real time code was allocating strings?

I was mostly thinking about this in the context of just how much 
allocation pretty much must be performed by the GC for applications with 
special memory requirements (gigabytes of data, for example).  It just 
seemed to have an overlap insofar as this sort of programming is concerned.

 C is very good for writing real time code, and you can write "C"-style 
 code in D, and it will behave exactly like the corresponding C code. You 
 can call printf, strcpy, malloc, etc., which will behave just like C's 
 printf, strcpy, and malloc because they *are* C's functions. It will not 
 do a gc collect unless the code allocates gc memory, and it will ONLY do 
 a gc collect during such an allocation.

At times I've considered extending thread support to allow for the 
creation of detached threads, but it simply seems to error-prone to be 
worthwhile.  I think you're right in that if someone is doing real-time 
programming then they shouldn't expect much hand-holding, because the 
assumptions involved offer too great a risk of unexpected behavior.

 Heck, my old Empire game was written in C. I renamed the files from .c 
 to .d, did a few minor syntactic edits, and voila! it was now in D. It 
 runs exactly the same.

Yup.  Though it's worth noting that D has been heavily influenced by 
your particular programming style.  Not at all a bad thing, but it does 
imply that you are less likely than anyone else to experience code 
translation problems :-)  On a related note, I sometimes wonder if Sid 
Meier credits you anywhere for effectively creating the genre that's 
made him so successful ;-)

 But the bottom line is that a real time thread should not be doing 
 storage allocation. It shouldn't be calling new, or malloc. It shouldn't 
 do storage allocation regardless of whether it is written in D, C, C++, 
 or assembler.

True enough.


Sean
Sean

Jun 09 2006

Walter Bright <newshound digitalmars.com> writes:

Sean Kelly wrote:
 Walter Bright wrote:
 You can use D arrays. Just allocate them using malloc (or statically 
 allocate them). Don't concatenate them.

 So then I'd just manually set the ptr and len members?  Sounds 
 reasonable, provided everyone behaves and uses the copy on write idiom 
 you suggest.

int *p = cast(int *)calloc(dim, sizeof(int));
int[] a = p[0 .. dim];

 C is very good for writing real time code, and you can write "C"-style 
 code in D, and it will behave exactly like the corresponding C code. 
 You can call printf, strcpy, malloc, etc., which will behave just like 
 C's printf, strcpy, and malloc because they *are* C's functions. It 
 will not do a gc collect unless the code allocates gc memory, and it 
 will ONLY do a gc collect during such an allocation.

 At times I've considered extending thread support to allow for the 
 creation of detached threads, but it simply seems to error-prone to be 
 worthwhile.  I think you're right in that if someone is doing real-time 
 programming then they shouldn't expect much hand-holding, because the 
 assumptions involved offer too great a risk of unexpected behavior.

I agree that realtime programming is a special type of programming, and 
that someone doing it should be a more advanced programmer. You can't 
just take any random code (in any language), make it realtime, and 
expect it to work properly. It's got to be engineered for the 
peculiarities of realtime requirements.


 Heck, my old Empire game was written in C. I renamed the files from .c 
 to .d, did a few minor syntactic edits, and voila! it was now in D. It 
 runs exactly the same.

 
 Yup.  Though it's worth noting that D has been heavily influenced by 
 your particular programming style.  Not at all a bad thing, but it does 
 imply that you are less likely than anyone else to experience code 
 translation problems :-)  On a related note, I sometimes wonder if Sid 
 Meier credits you anywhere for effectively creating the genre that's 
 made him so successful ;-)

Nope :-)

Jun 09 2006

David Gileadi <foo bar.com> writes:

Walter Bright wrote:
 Jeff Parsons wrote:
 My _only_ significant beef with D so far as been the unbound time for 
 a garbage collection; therefore, if an incremental garbage collector 
 built into D could guarantee me a set maximum time would be taken for 
 each garbage collection (which I would manually induce as often as 
 possible) then unless the efficiency hit is TERRIBLE I would be all 
 over it.

 
 If you're looking for a *guarantee*, then you cannot even use malloc(). 
 Malloc has unbounded time, and is not predictable.

It seems that a fairly common desire is to use D for writing games. 
Games don't typically have hard real-time requirements, but it also 
doesn't look good if your screen freezes for a second or two in the 
middle of play.

This is an area where an incremental collector or some kind of reference 
counting solution would be handy, since (especially in large programs) 
it would be difficult to avoid using garbage collected code.  I realize 
that the programmer has the responsibility to manage memory 
intelligently, but it would be nice to not avoid using GC code entirely. 
  As others have stated, it would even be worth giving up some overall 
performance to gain a smooth playing experience.

Jun 09 2006

John Reimer <terminal.node gmail.com> writes:

David Gileadi wrote:
 Walter Bright wrote:
 Jeff Parsons wrote:
 My _only_ significant beef with D so far as been the unbound time for 
 a garbage collection; therefore, if an incremental garbage collector 
 built into D could guarantee me a set maximum time would be taken for 
 each garbage collection (which I would manually induce as often as 
 possible) then unless the efficiency hit is TERRIBLE I would be all 
 over it.

 If you're looking for a *guarantee*, then you cannot even use 
 malloc(). Malloc has unbounded time, and is not predictable.

 
 It seems that a fairly common desire is to use D for writing games. 
 Games don't typically have hard real-time requirements, but it also 
 doesn't look good if your screen freezes for a second or two in the 
 middle of play.
 
 This is an area where an incremental collector or some kind of reference 
 counting solution would be handy, since (especially in large programs) 
 it would be difficult to avoid using garbage collected code.  I realize 
 that the programmer has the responsibility to manage memory 
 intelligently, but it would be nice to not avoid using GC code entirely. 
  As others have stated, it would even be worth giving up some overall 
 performance to gain a smooth playing experience.


Didn't Walter point out that that, in the current gc implementation, you 
just have to be conscientious about when you call 'new' (which appears 
to initiate a collect) if you want to avoid the gc initiated pauses.

So in game programming, I assume it's just a matter of good organization 
and planning.  You make sure you allocate sufficient memory ahead of 
time and avoid using 'new' in any location that would cause an ugly 
pause in the game.  Furthermore, you might choose appropriate times for 
when you want to call a full collect manually.

As I see it, the obligation for careful design doesn't disappear when a 
gc is in place.  It certainly makes life easier in many ways.  But one 
still has to be aware of the gc shortcomings and adapt to the situation. 
  I still see it as a tremendous improvement over having to clean up 
after every memory allocation you make (C/C++).

In short, using a gc doesn't mean you don't have to do your homework. :)

-JJR

Jun 09 2006

Dave <Dave_member pathlink.com> writes:

John Reimer wrote:
 David Gileadi wrote:
 Walter Bright wrote:
 Jeff Parsons wrote:
 My _only_ significant beef with D so far as been the unbound time 
 for a garbage collection; therefore, if an incremental garbage 
 collector built into D could guarantee me a set maximum time would 
 be taken for each garbage collection (which I would manually induce 
 as often as possible) then unless the efficiency hit is TERRIBLE I 
 would be all over it.

 If you're looking for a *guarantee*, then you cannot even use 
 malloc(). Malloc has unbounded time, and is not predictable.

 It seems that a fairly common desire is to use D for writing games. 
 Games don't typically have hard real-time requirements, but it also 
 doesn't look good if your screen freezes for a second or two in the 
 middle of play.

 This is an area where an incremental collector or some kind of 
 reference counting solution would be handy, since (especially in large 
 programs) it would be difficult to avoid using garbage collected 
 code.  I realize that the programmer has the responsibility to manage 
 memory intelligently, but it would be nice to not avoid using GC code 
 entirely.  As others have stated, it would even be worth giving up 
 some overall performance to gain a smooth playing experience.

 
 
 Didn't Walter point out that that, in the current gc implementation, you 
 just have to be conscientious about when you call 'new' (which appears 
 to initiate a collect) if you want to avoid the gc initiated pauses.
 

Yea, in the current GC, allocation is the only thing that may implicitly 
initiate a collection.

 So in game programming, I assume it's just a matter of good organization 
 and planning.  You make sure you allocate sufficient memory ahead of 
 time and avoid using 'new' in any location that would cause an ugly 
 pause in the game.  Furthermore, you might choose appropriate times for 
 when you want to call a full collect manually.
 

And D's easy array slicing along with array operations (like 
re-initializing an array with 'arr[] = 0;') makes the code to use 
pre-allocated buffers a lot easier to write too.

 As I see it, the obligation for careful design doesn't disappear when a 
 gc is in place.  It certainly makes life easier in many ways.  But one 
 still has to be aware of the gc shortcomings and adapt to the situation. 
  I still see it as a tremendous improvement over having to clean up 
 after every memory allocation you make (C/C++).
 
 In short, using a gc doesn't mean you don't have to do your homework. :)
 
 -JJR

Jun 09 2006

Chad J <gamerChad _spamIsBad_gmail.com> writes:

David Gileadi wrote:
 
 It seems that a fairly common desire is to use D for writing games. 
 Games don't typically have hard real-time requirements, but it also 
 doesn't look good if your screen freezes for a second or two in the 
 middle of play.
 
 This is an area where an incremental collector or some kind of reference 
 counting solution would be handy, since (especially in large programs) 
 it would be difficult to avoid using garbage collected code.  I realize 
 that the programmer has the responsibility to manage memory 
 intelligently, but it would be nice to not avoid using GC code entirely. 
  As others have stated, it would even be worth giving up some overall 
 performance to gain a smooth playing experience.

This is what I'm screaming.

It is becoming popular nowadays to allow people to modify games in 
various ways, one of which is modding the game's code.  I don't think 
you will see much of this happening inside a 3d engine, but the game 
logic that sits next to the 3d engine and dictates gameplay may be open. 
  This allows people to change things like how the AI behaves, how much 
damage you take when hit with a certain weapon, how that certain weapon 
behaves, how an economy behaves, etc.

Modders may or may not be experienced programmers, so it's probably best 
to assume they don't have significant experience.  This has caused 
plenty of games to use scripting languages for modding, since C plus 
plus is tough to learn and not good for this stuff.  I think D could do 
it though.  It would kick butt if D could do this, because it would make 
the mods run much faster than the interpreted code that is probably 
going to be used nowadays, even if you have like 50% reference counting 
overhead.

Automatic memory management is a huge boon to that sort of ease of code 
writing that is expected for inexperienced coders to get in on the 
action and do constructive stuff.  Even without the modding argument, 
being able to use automatic memory management in some parts of the game 
would be very advantageous in development time.  It seems to me though, 
that if you want to do non-pausing games in D, you just use C style 
memory management.  Bummer.

As David Gileadi said, games don't have strict latency requirements. 
For games, garbage collection doesn't need to be deterministic, and it 
doesn't need to finish in under 100 microseconds, it just needs to not 
be noticable.  For a few years I've messed with RunUO which is, afaik, 

worldsaves were annoying!).  This is doable.

Also, think of all the new D programmers there would be if someone were 
to release a popular game that exposed D as a scripting language :)

Jun 12 2006

renox <renosky free.fr> writes:

Frank Benoit wrote:
 Another garbage collector thread :)
 
 The best garbage collector I can think about, is a reference counting
 one.

Uh, what's your definition of "best"?
The simplest? Then maybe.

If you take a look at Java, it doesn't use a reference counting GC and 
there is a good reason for this: AFAIK modern GC are far better 
performance wise than reference counting GC.
One GC that I remember (no I'm not an expert) is Mark Copy GC, see 
"MarkCopy:Fast copying GC with less space overhead".

The only thing that reference counting GCs have for them is that this 
scheme is simple, more or less robust but "best", best for what?

I doubt very much that the best GC used for say batch computations is 
the best for GUI clients..

Regards,
RenoX

Jun 04 2006

D Programming

C/C++ Programming

Other

digitalmars.D - GC, the simple solution