
digitalmars.D - draft proposal for ref counting in D

reply Walter Bright <newshound2 digitalmars.com> writes:
This is an email conversation we had last summer. It's of general interest, so 
reposted here with permission. We didn't reach any conclusions, but there's a 
lot of good stuff in here, and it's particularly relevant to other recent
threads.
Oct 09 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
This is based on n.g. discussions and ideas from you guys. I'll redo it as a DIP 
if it passes the smoke test from y'all.

----------------------------------------------------------------------
     Adding Reference Counting to D

D currently supports manual memory management and generalized GC. Unfortunately, 
the pausing and memory consumption inherent in GC is not acceptable for many 
programs, and manual memory management is error-prone, tedious, and unsafe. A 
third style, reference counting (RC), addresses this. Common implementations are 
COM's AddRef/Release, Objective-C's ARC, and C++'s shared_ptr<>.

None of these three schemes is guaranteed memory-safe; they all require the 
programmer to conform to a protocol (even O-C's ARC). Stepping outside of the 
protocol results in memory corruption. A D implementation must make it possible 
to use ref counted objects in code marked as @safe, although it will be 
necessary for the implementation of those objects to be unsafe.

Some aspects of a D implementation are inevitable:

1. Avoid any requirement for more pointer types. This would cause drastic 
increases in complexity for both the compiler and the user. It may make generic 
code much more difficult to write.

2. Decay of a ref-counted pointer to a non-ref-counted pointer is unsafe, and 
can only be allowed (in @safe code) in circumstances where it can be statically 
proven to be safe.

3. A ref counted object is inherently a reference type, not a value type.

4. The compiler needs to know about ref counted types.


==Proposal==

If a class contains the following methods, in either itself or a base class, it 
is an RC class:


     T AddRef();
     T Release();
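
For illustration, here is a minimal sketch of such a class (the name Widget and 
the intrusive count are just one possible choice; where the count lives and 
whether it is thread-safe is entirely up to the implementor):

     class Widget
     {
         private int count;    // one possible place for the count; it need not be intrusive

         Widget AddRef()
         {
             ++count;          // a shared implementation would use atomics instead
             return this;
         }

         Widget Release()
         {
             if (--count == 0)
             {
                 // free owned resources; the memory itself still lives on the GC heap
             }
             return this;
         }
     }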

An RC class is like a regular D class with these additional semantics:

1. In @safe code, casting (implicit or explicit) to a base class that does not 
have both AddRef() and Release() is an error.

2. Initialization of a class reference causes a call to AddRef().

3. Assignment to a class reference causes a call to Release() on its original 
value and AddRef() on the new value.

4. Null checks are done before calling any AddRef() or Release().

5. Upon scope exit of all RC variables or temporaries, a call to Release() is 
performed, analogously to the destruction of struct variables and temporaries.

6. If a class or struct contains RC fields, calls to Release() for those fields 
will be added to the destructor, and a destructor will be created if one doesn't 
exist already.

7. If a closure is created that contains RC fields, either a compile time error 
will be generated or a destructor will be created for it.

8. Explicit calls to AddRef/Release will not be allowed in @safe code.

9. A call to AddRef() will be added to any argument passed as a parameter.

10. Function returns have an AddRef() already done to the return value.

11. The compiler can elide any AddRef()/Release() calls it can prove are
redundant.

12. AddRef() is not called when passed as the implicit 'this' reference.

13. Taking the address of, or passing by reference, any fields of an RC object 
is not allowed in @safe code. Passing by reference an RC field is, however, allowed.

14. RC objects will still be allocated on the GC heap - this means that a normal 
GC run will reap RC objects that are in a cycle, and RC objects will get 
automatically scanned for heap references with no additional action required by 
the user.


==Existing Code==

D COM objects already have AddRef() and Release(). This proposal should not 
break that code; it'll just mean that there will be extra AddRef()/Release() 
calls made. Calling AddRef()/Release() should never have been allowed in @safe 
code anyway.

Any other existing uses of AddRef()/Release() will break.

==Arrays==

Built-in arrays have no place to put a reference count. Ref counted arrays would 
hence become a library type, based on a ref counted class with overloaded 
operators for the array operations.

==Results==

This is a very flexible approach, allowing for support of general RC objects, as 
well as specific support for COM objects and Objective-C ARC. 
AddRef()/Release()'s implementation is entirely up to the user or library writer.

@safe code can be guaranteed to be memory safe, as long as AddRef()/Release() 
are correctly implemented.
Oct 09 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:

Looks like a good start.

A few things:

1. Proposal point 3, you need to AddRef first, THEN Release the original. Reason 
being, you could be assigning a reference the same value as before. In this 
case, if you decrement *first*, you will decrement the reference, which might 
reduce it to 0, and free the object before you increment. In most cases, the 
AddRef and Release are elided, so it's not bad if you do this.

I wonder if it's not a good idea to have a RefAssign function that takes two RC 
objects and does a check to see if they are the same before doing AddRef and 
Release to help with this issue. Calls where the compiler can prove they are the 
same value can be elided.
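
A rough sketch of that idea, assuming the hypothetical name refAssign and the 
AddRef()/Release() protocol from the proposal - the point being that the new 
value is retained before the old one is released, so assigning a reference to 
itself can never free the object:

     void refAssign(T)(ref T lhs, T rhs)
         if (is(T == class))          // assume T is an RC class
     {
         if (lhs is rhs)
             return;                  // same object: both calls can be skipped entirely
         if (rhs !is null)
             rhs.AddRef();            // retain the new value first...
         if (lhs !is null)
             lhs.Release();           // ...then release the old one
         lhs = rhs;
     }

Something like refAssign(a, b) would then stand in for a = b wherever the 
compiler cannot prove the calls redundant.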

2. AddRef and Release should be treated not as function calls, but as callables. 
That is, if for some reason AddRef and Release should be aliases, this does not 
detract from the solution. Only requirement should be that they cannot be UFCS, 
as that doesn't make any sense (one cannot add reference counting after the fact 
to an object).

I'm thinking of the Objective-C objects, whose functions are "release" and 
"retain". It would be good to use the names Objective-C coders are used to.

3. Objective-C ARC uses a mechanism called auto-release pools that help cut
down 
on the release/retain calls. It works like this:

@autoreleasepool {  // create a new pool
      NSString *str = [NSString stringWithFormat: "an int: %d", 1];
      @autoreleasepool { // create a new pool on the "pool stack"
            NSDate *date = [NSDate date];
            {
                NSDate *date2 = [NSDate date];
            }
      } // auto-release date and date2
} // auto-release str

In this way, it's not the pointer going out of scope that releases the object, 
it's the release pool going out of scope that releases the object. These release 
pools themselves are simply RC objects that call release on all their objects 
(in fact, prior to the @autoreleasepool directive, you had to manually create 
and destroy these pools).

The benefit of this model is that basically, inside a pool, you can move around 
auto-released objects at will, pass them into functions, return them from 
functions, etc, without having to retain or release them for each assignment. 
It's kind of like a mini-GC.
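
In D terms - and this is only a sketch of the concept, not something the 
proposal defines - such a pool is little more than a container that defers the 
Release() calls until it is drained:

     struct AutoReleasePool
     {
         private void delegate()[] pending;

         // schedule obj for a deferred Release() when the pool drains
         T add(T)(T obj)
         {
             if (obj !is null)
                 pending ~= delegate() { obj.Release(); };
             return obj;
         }

         void drain()
         {
             foreach (release; pending)
                 release();        // release everything the pool collected, all at once
             pending = null;
         }
     }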

It works by convention, that any function that returns an RC object:

if it's called 'new...' or 'init...' or 'alloc...' or 'copy...', then the object 
is assumed to be returned with its retain count incremented on behalf of the 
calling scope. This means, if you assign it to a member variable, for instance, 
you do not have to retain the object again, and if it goes out of scope, you 
must call release on it.

All other functions return 'auto-released' objects, or objects which have been 
queued in the latest auto release pool. The compiler knows this and can elide 
more of the releases and retains.

Would this be a mechanism that's worth putting in? I think it goes really well 
with something like TempAlloc.  I don't think we should use convention,
though...

-Steve
Oct 09 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/25/2013 1:19 PM, Steven Schveighoffer wrote:
 Looks like a good start.

 A few things:

 1. Proposal point 3, you need to AddRef first, THEN Release the original. 
 Reason being, you could be assigning a reference the same value as before. In 
 this case, if you decrement *first*, you will decrement the reference, which 
 might reduce it to 0, and free the object before you increment. In most cases, 
 the AddRef and Release are elided, so it's not bad if you do this.

Yeah, I got that backwards, and I should know better.
 I wonder if it's not a good idea to have a RefAssign function that takes two 
 RC objects and does a check to see if they are the same before doing AddRef and 
 Release to help with this issue. Calls where the compiler can prove they are 
 the same value can be elided.

I'd like to see how far we get with just AddRef/Release first, and getting the 
semantics of them right first.
 2. AddRef and Release should be treated not as function calls, but as 
 callables. That is, if for some reason AddRef and Release should be aliases, 
 this does not detract from the solution.

Yes, just like the names for Ranges are used.
 Only requirement should be that they cannot be UFCS, as that doesn't make 
 any sense (one cannot add reference counting after the fact to an object).

That would be covered by disallowing explicit calls to AddRef/Release in @safe 
code.
 I'm thinking of the Objective-C objects, whose functions are "release" and 
 "retain". It would be good to use the names Objective-C coders are used to.

AddRef/Release are the COM names. It's trivial to have one wrap the other. I 
picked AddRef/Release because I'm familiar with their semantics, and am not with 
O-C.
 3. Objective-C ARC uses a mechanism called auto-release pools that help cut 
down on the release/retain calls. It works like this:
 @autoreleasepool {  // create a new pool
       NSString *str = [NSString stringWithFormat: "an int: %d", 1];
       @autoreleasepool { // create a new pool on the "pool stack"
             NSDate *date = [NSDate date];
             {
                 NSDate *date2 = [NSDate date];
             }
       } // auto-release date and date2
 } // auto-release str

 In this way, it's not the pointer going out of scope that releases the 
object, it's the release pool going out of scope that releases the object. These release pools themselves are simply RC objects that call release on all their objects (in fact, prior to the autoreleasepool directive, you had to manually create and destroy these pools).
 The benefit of this model is that basically, inside a pool, you can move 
around auto-released objects at will, pass them into functions, return them from functions, etc, without having to retain or release them for each assignment. It's kind of like a mini-GC.
 It works by convention, that any function that returns a RC object:

 if it's called 'new...' or 'init...' or 'alloc...' or 'copy...', then the 
object is assumed returned with it's retain count incremented on behalf of the calling scope. This means, if you assign it to a member variable, for instance, you do not have to retain the object again, and if it goes out of scope, you must call release on it.
 All other functions return 'auto-released' objects, or objects which have 
queued in the latest auto release pool. The compiler knows this and can elide more of the releases and retains.
 Would this be a mechanism that's worth putting in? I think it goes really 
 well with something like TempAlloc. I don't think we should use convention, 
 though...

I agree with not relying on convention. But also reserving the new*, init*, 
alloc* and copy* namespaces seems excessive for D.

As for autoreleasepool, it is relying on a convention that its fields are not 
leaking. I don't see how we can enforce this.
Oct 09 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
I have overlooked addressing what happens when you pass an RC ref to a pure 
function. Is the pure function allowed to call AddRef()/Release()? Not sure.
Oct 09 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:

That's a resounding yes.

Consider that this is allowed:

class X {}

struct S
{
    X foo;
    void setFoo(X newfoo) pure {foo = newfoo;}
}

If X is ref-counted, you HAVE to increment the ref count.

The only issue here is, ref counting may have to access global data.  But we 
already have exceptions for memory management, even for strong-pure functions.

-Steve

On Jun 25, 2013, at 4:48 PM, Walter Bright wrote:

 I have overlooked addressing what happens when you pass an RC ref to a pure 
function. Is the pure function allowed to call AddRef()/Release()? Not sure.
Oct 09 2013
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:


On Jun 25, 2013, at 4:44 PM, Walter Bright wrote:

 On 6/25/2013 1:19 PM, Steven Schveighoffer wrote:
 Would this be a mechanism that's worth putting in? I think it goes really 
well with something like TempAlloc. I don't think we should use convention, though...
 I agree with not relying on convention. But also reserving the new*, init*, 
alloc* and copy* namespaces seems excessive for D.
 As for autoreleasepool, it is relying on a convention that its fields are not 
 leaking. I don't see how we can enforce this.

I don't think the autoreleasepool is relying on convention, it's simply giving 
the compiler a way to elide careful tracking of temporaries' reference counts.

It was definitely of more use when manual reference counting was done, because 
one only had to worry about retaining non-temporary data in that case. But the 
compiler can make the same optimizations (and does in the ARC version of 
Objective-C).

Consider the following code:

NSString *str = [NSString stringWithFormat: "%d", 5];

// translating to D, that would be something like:

NSString str = NSString.stringWithFormat("%d", 5);

stringWithFormat is a class method that gives you back a temporary string. You 
are not asserting ownership, you are just assigning to a variable.

Now, if you wanted to do some fancy stuff with str, we could do:

{
   NSString str2;
   {
      NSString str = NSString.stringWithFormat("%d", 5);
      if(condition) str2 = str;
      if(otherCondition)
      {
         NSString str3 = str;
         str = NSString.stringWithFormat("%d", 6);
      }
   }
   str2 = str;
}

Now, in all this mess, how is the compiler to sort out the AddRefs and Releases? 
Most likely, it will end up adding more than it needs to. But with autorelease 
pool, it's like you did this:

AutoReleasePool arp;
{
   NSString str2;
   {
      NSString str = NSString.stringWithFormat("%d", 5);
      arp.add(str);
      if(condition) str2 = str;
      if(otherCondition)
      {
         NSString str3 = str;
         str = NSString.stringWithFormat("%d", 6);
         arp.add(str);
      }
   }
   str2 = str;
}
arp.drain(); // releases both strings used, don't care what now-out-of-scope variables

Essentially, they are only retained when created, and because they go out of 
scope, they are no longer needed. The compiler can surmise that because the 
fields aren't leaving the scope, it doesn't have to retain them. If it does see 
that, it adds a retain. Then, it can release them all at once.

In fact, this could be done automatically, but you have to allocate a place to 
put these 'scheduled for release' things. In Cocoa, the main event loop has an 
auto release pool, and you can add them manually wherever you wish for more 
fine-grained memory management (that is, if you wanted to free objects before 
you left the event loop).

Note that in Objective-C, they use those naming conventions to determine whether 
an object is auto-released or not. But we could make sure it's *always* 
auto-released, as we don't have the historical requirements that Objective-C has.

The question is, does it make sense to use this technique to "lump together" 
deallocations instead of conservatively calling retain/release wherever you 
assign variables (like C++ shared_ptr)? And a further question is whether the 
compiler should pick those points, or whether they should be picked manually.

-Steve
Oct 09 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
If autoreleasepool is just a handy way to lump together Release() calls, then 
that is quite unnecessary if the compiler inserts calls to Release() 
automatically. If it is, instead, a promise that members of autoreleasepool do 
not leak references outside of that object, then this is very problematic for D 
to guarantee such - and guarantee it it must. I.e. it's "escape analysis" in 
another disguise.

I think the compiler should pick where to put the Release() calls, that is the 
whole point of ARC. If the compiler can do sufficient escape analysis to 
determine that the calls can be elided, so much the better.
Oct 09 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On Jun 25, 2013, at 5:28 PM, Walter Bright wrote:

 If autoreleasepool is just a handy way to lump together Release() calls, then 
 that is quite unnecessary if the compiler inserts calls to Release() 
 automatically. If it is, instead, a promise that members of autoreleasepool do 
 not leak references outside of that object, then this is very problematic for D 
 to guarantee such - and guarantee it it must. I.e. it's "escape analysis" in 
 another disguise.
 I think the compiler should pick where to put the Release() calls, that is 
 the whole point of ARC. If the compiler can do sufficient escape analysis to 
 determine that the calls can be elided, so much the better.

I'm not sure exactly what is required for ARC to guarantee proper memory 
management (whether it requires flow-analysis or not), but it seems to work 
quite well for Objective-C. I think it helps minimize the expensive 
release/retain calls when you can just say "oh, someone else will clean that up 
later", just like you can with a GC.

It might be good for someone who knows the ARC eliding techniques that clang 
uses to explain how they work. We certainly shouldn't ignore those techniques.

-Steve
Oct 09 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/25/2013 2:47 PM, Steven Schveighoffer wrote:
 I'm not sure exactly what is required for ARC to guarantee proper memory 
management (whether it requires flow-analysis or not), but it seems to work quite well for Objective-C. I think it helps minimize the expensive release/retain calls when you can just say "oh, someone else will clean that up later", just like you can with a GC.
 It might be good for someone who knows the ARC eliding techniques that clang 
uses to explain how they work. We certainly shouldn't ignore those techniques.

Also remember that O-C doesn't guarantee memory safety, so they are freed from 
some of the constraints we operate under. They can say "don't do that", we
can't.

C++ shared_ptr<> is memory safe as long as you don't escape a pointer - and no 
C++ compiler checks for that.

COM is also memory safe as long as you carefully follow the conventions - and 
again, no C++ compiler checks it.
Oct 09 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
 On 6/25/2013 2:47 PM, Steven Schveighoffer wrote:
 I'm not sure exactly what is required for ARC to guarantee proper memory 
management (whether it requires flow-analysis or not), but it seems to work quite well for Objective-C. I think it helps minimize the expensive release/retain calls when you can just say "oh, someone else will clean that up later", just like you can with a GC.
 It might be good for someone who knows the ARC eliding techniques that clang 
uses to explain how they work. We certainly shouldn't ignore those techniques.

 Also remember that O-C doesn't guarantee memory safety, so they are freed 
 from some of the constraints we operate under. They can say "don't do that", we 
 can't.

I'm not sure that it doesn't. At least when we are talking about object 
references. The only thing clang complains about is when you try to call any 
memory management manually, or if you disobey the naming conventions.

-Steve
Oct 09 2013
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:

Le 25-juin-2013 à 17:20, Steven Schveighoffer a écrit :

 On Jun 25, 2013, at 4:44 PM, Walter Bright wrote:

 On 6/25/2013 1:19 PM, Steven Schveighoffer wrote:
 Would this be a mechanism that's worth putting in? I think it goes really 
well with something like TempAlloc. I don't think we should use convention, though...
 I agree with not relying on convention. But also reserving the new*, init*, 
alloc* and copy* namespaces seems excessive for D.
 As for autoreleasepool, it is relying on a convention that its fields are 
not leaking. I don't see how we can enforce this.
 I don't think the autoreleasepool is relying on convention, it's simply 
 giving the compiler a way to elide careful tracking of temporaries' reference 
 counts.

Not at all. Autorelease pools were useful at a time before ARC so you wouldn't 
have to think of manually releasing every object that called functions returned 
to you. Instead, most functions would return autoreleased objects and you'd only 
have to retain those objects you were storing elsewhere.

Nowadays, with ARC, we still have them, but that's mostly for interoperability 
with already existing code. Most functions still return autoreleased objects 
because that's the convention, and breaking that convention would cause objects 
to be retained or released too many times. So we still need autorelease pools. 
But ARC is hard at work[^1] behind the scenes to reduce the number of those 
autoreleased objects.

So no, we shouldn't introduce autorelease pools to D... well, except maybe for 
the part where we want interoperability with Objective-C (because we have no 
choice).

And finally, there's nothing unsafe with autorelease pools as long as you don't 
keep an unretained reference to an autoreleased object when the pool drains. 
Making sure you have no unretained reference is ARC's job, so with ARC it should 
not be no problem.

[^1]: One clever trick ARC does: inside the implicit call to 
objc_returnAutoreleased it adds at the end of an autoreleasing function, the 
runtime checks to see if the return address points to an instruction that'll 
call objc_retain on that same pointer. If that's the case, it skips the 
autorelease and also skips objc_retain and goes to the next instruction 
directly. Of course if the convention was always to return objects retained, 
none of this would be needed. I saw that explained on a WWDC video a couple of 
years back.
Oct 09 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:

On Jun 25, 2013, at 9:31 PM, Michel Fortin wrote:

 Le 25-juin-2013 à 17:20, Steven Schveighoffer a écrit :

 I don't think the autoreleasepool is relying on convention, it's simply 
giving the compiler a way to elide careful tracking of temporaries' reference counts.
 Not at all. Autorelease pools were useful at a time before ARC so you 
 wouldn't have to think of manually releasing every object that called functions 
 returned to you. Instead, most functions would return autoreleased objects and 
 you'd only have to retain those objects you were storing elsewhere.

Having used MRC, I appreciate what autoreleasepool did, but I thought of it 
being also as a kind of blanket way to allow the compiler to remove extra 
retains/releases in ARC.

Is it not advantageous to release a whole pool of objects vs. releasing them 
individually during execution? All releases and retains are atomic, so I figured 
one could do some optimization when it's all lumped together.

I find the autorelease pools very GC-like -- you don't have to worry who uses or 
forgets the reference, it's kept in memory until you don't need it.

Anyway, everything I know about Obj-C ARC I learned from my iOS 5 book :) So 
don't take me as an expert.

-Steve
Oct 09 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:

Le 25-juin-2013 à 21:40, Steven Schveighoffer a écrit :

 On Jun 25, 2013, at 9:31 PM, Michel Fortin wrote:

 Not at all. Autorelease pools were useful at a time before ARC so you 
wouldn't have to think of releasing manually every object called functions were returned to you. Instead, most functions would return autoreleased object and you'd only have to retain those objects you were storing elsewhere.
 Having used MRC, I appreciate what autoreleasepool did, but I thought of it 
being also as a kind of blanket way to allow the compiler to remove extra retains/releases in ARC.
 Is it not advantageous to release a whole pool of objects vs. releasing them 
 individually during execution? All releases and retains are atomic, so I 
 figured one could do some optimization when it's all lumped together.

I haven't done any benchmarking, but I'd have to assume it is more advantageous 
to just return objects retained since Apple went to great lengths to make sure 
this can happen even when the convention is to return autoreleased. There's no 
question it also simplifies the compiler. It's much easier to reason about pairs 
of retain/release than retain/autorelease.
 I find the autorelease pools very GC-like -- you don't have to worry who uses 
 or forgets the reference, it's kept in memory until you don't need it.

The concept was truly great, no doubt about that.
Oct 09 2013
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/25/2013 6:31 PM, Michel Fortin wrote:
 And finally, there's nothing unsafe with autorelease pools as long as you 
 don't keep an unretained reference to an autoreleased object when the pool 
 drains.

Well, that's exactly the issue - an escaping reference.
Oct 09 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:

On Jun 25, 2013, at 11:04 PM, Walter Bright wrote:

 On 6/25/2013 6:31 PM, Michel Fortin wrote:
 And finally, there's nothing unsafe with autorelease pools as long as you 
don't keep an unretained reference to an autoreleased object when the pool drains.
 Well, that's exactly the issue - an escaping reference.
Read the next sentence after your quote though: "Making sure you have no unretained reference is ARC's job, so with ARC it should not be no problem." So with ARC, it's not unsafe. I think that was the ultimate point. -Steve
Oct 09 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/25/2013 8:08 PM, Steven Schveighoffer wrote:
 On Jun 25, 2013, at 11:04 PM, Walter Bright wrote:

 On 6/25/2013 6:31 PM, Michel Fortin wrote:
 And finally, there's nothing unsafe with autorelease pools as long as you 
don't keep an unretained reference to an autoreleased object when the pool drains.
 Well, that's exactly the issue - an escaping reference.
Read the next sentence after your quote though: "Making sure you have no unretained reference is ARC's job, so with ARC it
should not be no problem."
 So with ARC, it's not unsafe.  I think that was the ultimate point.
Yes, I read the next sentence. What it means to me is that autorelease pools add nothing when ARC is implemented - no optimizations, either. Either that or I don't understand how ARC figures out that there are no escaping references without doing things like runtime checks.
Oct 09 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:

On Jun 26, 2013, at 12:10 AM, Walter Bright wrote:

 On 6/25/2013 8:08 PM, Steven Schveighoffer wrote:
 On Jun 25, 2013, at 11:04 PM, Walter Bright wrote:

 On 6/25/2013 6:31 PM, Michel Fortin wrote:
 And finally, there's nothing unsafe with autorelease pools as long as you 
don't keep an unretained reference to an autoreleased object when the pool drains.
 Well, that's exactly the issue - an escaping reference.
Read the next sentence after your quote though: "Making sure you have no unretained reference is ARC's job, so with ARC it
should not be no problem."
 So with ARC, it's not unsafe.  I think that was the ultimate point.
Yes, I read the next sentence. What it means to me is that autorelease pools
 add nothing when ARC is implemented - no optimizations, either. Either that or 
 I don't understand how ARC figures out that there are no escaping references 
 without doing things like runtime checks.

OK, it sounded like you were saying still that Objective-C with ARC didn't have 
memory safety. I'm no longer arguing about autorelease, I defer to Michel on 
that. I clearly didn't understand that autorelease doesn't provide any benefits 
for ARC, it's just there for compatibility.

-Steve
Oct 09 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/25/2013 9:13 PM, Steven Schveighoffer wrote:
 OK, it sounded like you were saying still that Objective C with ARC didn't 
 have memory safety.

My initial first pass read of:

http://clang.llvm.org/docs/AutomaticReferenceCounting.html

indicates that ARC does not guarantee memory safety. There are a number of 
"undefined" behaviors when the O-C programmer does not follow the rules.
Oct 09 2013
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Walter Bright:

Where are the benchmarks that show that this is a good idea in 
some real situations? This is the essential first step.


 If a class contains the following methods, in either itself or 
 a base class, it is
 an RC class:


     T AddRef();
     T Release();
What if a programmer adds only one of those two? Currently if you add only part 
of the hashing protocol (or you bork a function signature) the compiler often 
gives no errors.

What are the plans for coalescing and optimizing away some reference counts 
updates?

Bye,
bearophile
Oct 09 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
 What are the plans for coalescing and optimizing away some 
 reference counts updates?
There is now a discussion and paper on optimizing a reference counter:

http://lambda-the-ultimate.org/node/4825

Bye,
bearophile
Oct 11 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/11/13 10:28 AM, bearophile wrote:
 What are the plans for coalescing and optimizing away some reference
 counts updates?
 There is now a discussion and paper on optimizing a reference counter:
 http://lambda-the-ultimate.org/node/4825

 Bye,
 bearophile

Yes, I think that's great work.

Andrei
Oct 11 2013
prev sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 9 October 2013 at 22:31:34 UTC, Walter Bright wrote:
 ==Arrays==

 Built-in arrays have no place to put a reference count. Ref 
 counted arrays would hence
 become a library type, based on a ref counted class with 
 overloaded operators for
 the array operations.
You'll soon see the roadblock here with templates and type qualifiers. 
const(A!T) and A!const(T) have nothing to do with each other as far as the 
compiler is concerned, and no, we have no way around this ATM.
Oct 09 2013
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Updated incorporating Steven's suggestion, and some comments about 
shared/const/mutable/purity.

-------------------------------------------------------------

     Adding Reference Counting to D

D currently supports manual memory management and generalized GC. Unfortunately, 
the pausing and memory consumption inherent in GC is not acceptable for many 
programs, and manual memory management is error-prone, tedious, and unsafe. A 
third style, reference counting (RC), addresses this. Common implementations are 
COM's AddRef/Release, Objective-C's ARC, and C++'s shared_ptr<>.

None of these three schemes is guaranteed memory-safe; they all require the 
programmer to conform to a protocol (even O-C's ARC). Stepping outside of the 
protocol results in memory corruption. A D implementation must make it possible 
to use ref counted objects in code marked as @safe, although it will be 
necessary for the implementation of those objects to be unsafe.

Some aspects of a D implementation are inevitable:

1. Avoid any requirement for more pointer types. This would cause drastic 
increases in complexity for both the compiler and the user. It may make generic 
code much more difficult to write.

2. Decay of a ref-counted pointer to a non-ref-counted pointer is unsafe, and 
can only be allowed (in @safe code) in circumstances where it can be statically 
proven to be safe.

3. A ref counted object is inherently a reference type, not a value type.

4. The compiler needs to know about ref counted types.


==Proposal==

If a class contains the following methods, in either itself or a base class, it 
is an RC class:


     T AddRef();
     T Release();

An RC class is like a regular D class with these additional semantics:

1. In @safe code, casting (implicit or explicit) to a base class that does not 
have both AddRef() and Release() is an error.

2. Initialization of a class reference causes a call to AddRef().

3. Assignment to a class reference causes a call to AddRef() on the new value
followed by a call to Release() on its original value.

4. Null checks are done before calling any AddRef() or Release().

5. Upon scope exit of all RC variables or temporaries, a call to Release() is 
performed, analogously to the destruction of struct variables and temporaries.

6. If a class or struct contains RC fields, calls to Release() for those fields 
will be added to the destructor, and a destructor will be created if one doesn't 
exist already.

7. If a closure is created that contains RC fields, either a compile time error 
will be generated or a destructor will be created for it.

8. Explicit calls to AddRef/Release will not be allowed in @safe code.

9. A call to AddRef() will be added to any argument passed as a parameter.

10. Function returns have an AddRef() already done to the return value.

11. The compiler can elide any AddRef()/Release() calls it can prove are
redundant.

12. AddRef() is not called when passed as the implicit 'this' reference.

13. Taking the address of, or passing by reference, any fields of an RC object 
is not allowed in @safe code. Passing by reference an RC field is, however, allowed.

14. RC objects will still be allocated on the GC heap - this means that a normal 
GC run will reap RC objects that are in a cycle, and RC objects will get 
automatically scanned for heap references with no additional action required by 
the user.

15. The class implementor will be responsible for deciding whether or not to 
support sharing. Casting to shared is already disallowed in @safe code, so this 
is only viable in system code.

16. RC objects cannot be const or immutable.

17. Can RC objects be arguments to pure functions?
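
To make rules 2, 3, 5, 9 and 10 concrete, here is roughly what the inserted 
calls would amount to for a hypothetical RC class C and a function C f() (in 
practice many of these calls would be elided under rule 11):

     void example(C r)    // rule 9: the caller has already done AddRef() on the argument
     {
         C s = r;         // rule 2: initialization inserts s.AddRef() (null-checked, rule 4)
         C t = f();       // rule 10: the return value arrives with an AddRef() already done
         s = t;           // rule 3: AddRef() on the new value, then Release() on the old one
     }                    // rule 5: scope exit inserts Release() for r, s, t and temporaries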

==Existing Code==

D COM objects already have AddRef() and Release(). This proposal should not 
break that code; it'll just mean that there will be extra AddRef()/Release() 
calls made. Calling AddRef()/Release() should never have been allowed in @safe 
code anyway.

Any other existing uses of AddRef()/Release() will break.

==Arrays==

Built-in arrays have no place to put a reference count. Ref counted arrays would 
hence become a library type, based on a ref counted class with overloaded 
operators for the array operations.

==Results==

This is a very flexible approach, allowing for support of general RC objects, as 
well as specific support for COM objects and Objective-C ARC. 
AddRef()/Release()'s implementation is entirely up to the user or library writer.

@safe code can be guaranteed to be memory safe, as long as AddRef()/Release() 
are correctly implemented.
Oct 09 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:



While it's a start, this is hardly enough for Objective-C. Mostly for legacy 
reasons, most Objective-C methods return autoreleased objects (deferred release 
using an autorelease pool) based on a naming convention. Also, Objective-C 
objects can't be allocated from the D heap, so to avoid cycles we need weak 
pointers. More on Objective-C later.

While it's good that a direct call to AddRef/Release is forbidden in @safe code, 
I think it should be forbidden in @system code too. The reason is that if the 
compiler is inserting calls to these automatically and you're also adding your 
own explicitly in the same function, it becomes practically impossible to reason 
about the reference counts, short of looking at the assembly. Instead, I think 
you should create a @noarc attribute for functions: it'll prevent the compiler 
from inserting any of those calls, so it becomes the responsibility of the 
author to make those calls (which are then allowed). @noarc would be 
incompatible with @safe, obviously.

Finally, that's a nitpick but I wish you'd use function names that fit D better, 
such as opRetain and opRelease. Then you can add a "final void opRetain() { 
AddRef(); }" function to the IUnknown COM interface and we could do the same for 
Objective-C.
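
A sketch of that wrapper, assuming opRetain/opRelease were the names the 
compiler looked for (this is not the real core.sys.windows declaration, just the 
shape of the idea):

     interface IUnknown
     {
         uint AddRef();                          // existing COM protocol
         uint Release();

         final void opRetain()  { AddRef(); }    // D-flavoured names forwarding to COM's
         final void opRelease() { Release(); }
     }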



Objective-C is a special case. In Objective-C we need to know whether the 
returned object of a function is already retained or if it is deferred released 
(autoreleased). This is easily deduced from the naming convention. Occasionally, 
we might need to create autorelease pools too, but that can probably stay 
@system.

(Note: all this idea of autoreleased objects might sound silly, but it was a 
great help before ARC, and Objective-C ARC has to be compatible with legacy
code 
so it conforms to those conventions.)

You can easily implement ARC for COM using an implementation of ARC for 
Objective-C; the reverse is not true, because COM does not have this (old but 
still needed) concept of autorelease pools and deferred release, where you need 
to know at each function boundary whether returned values (including those 
returned by pointer arguments) are expected to be retained or not.

If I were you Walter, I would just not care about Objective-C idioms while 
implementing this feature at first. It'll have to be special cased anyway. 
Here's how I expect that'll be done:

What will need to be done later when adding Objective-C support is to add an 
internal "autoreleasedReturn" flag to a function that'll make codegen call 
"autorelease" in the callee when returning an object and "retain" in the caller 
where it receives an object from a function with that flag. Also, the same flag 
and behaviour is needed for out parameters (to mimick those cases where an 
object is returned by pointer). That flag will then be set automatically 
internally depending on the function name (only for Objective-C member 
functions), and it should be possible to override it explicitly with an 
attribute or a pragma of some sort. This is what Clang is doing, and we must 
match that to allow things to work.

Checking for null is redundant in the Objective-C case: that check is done by 
the runtime. That's of minor importance, but it might impact performance and 
should probably be special-cased in this case.



With Apple's implementation of reference counting (using global hash tables 
protected by spin locks), it is more efficient to update many counters in one 
operation. The codegen for Objective-C ARC upon assignement to a variable calls 
"objc_storeStrong(id *object, id value)", incrementing and decrementing the two 
counters presumably in one operation (as well as replacing the content of the 
variable pointed by the first argument with the new value).

Ideally, the codegen for Objective-C ARC in D would call the same functions so 
we have the same performance. This means that codegen should make a call 
"objc_retain" when first initializing a variable, "objc_storeStrong" when doing 
an assignment, and "objc_release" when destructing a variable.

As for returning autoreleased objects, there are two functions to choose from 
depending on whether the object needs to be retained at the same time or not. 
(In general, the object needs to be retained prior to autoreleasing if it comes 
from a variable not part of the function's stack frame.)

Here's Clang's documentation for how it implements ARC:
http://clang.llvm.org/docs/AutomaticReferenceCounting.html



Weak pointers are essential in order to break retain cycles in Objective-C
where 
there is no GC. They are implemented with the same kind of function calls as 
strong pointers. Unfortunately, Apple's Objective-C implementation won't sit 
well with D the way it works now.

Weak pointers are implemented in Objective-C by registering the address of the 
pointer with the runtime. This means that when a pointer is moved from one 
location to another, the runtime needs to be notified of that through a call to 
objc_moveWeak. This breaks one assumption of D: that you can move memory at will 
without calling anything.

While we could still implement a working weak pointer with a template struct, 
that struct would have to allocate a pointer on the heap (where it is guaranteed 
to not move) so it can store the true weak pointer recognized by the runtime. 
I'm not sure that would be acceptable, but at least it would work.
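
A sketch of that workaround (the names and the runtime bindings are hypothetical; 
the actual Objective-C runtime calls would have to be bound separately):

     struct ObjcWeak(T)
         if (is(T == class))
     {
         private T* slot;             // heap-allocated, so its address never moves

         this(T obj)
         {
             slot = new T[1].ptr;     // one stable slot on the GC heap
             // hypothetical binding: objc_initWeak(cast(void**) slot, cast(void*) obj);
             *slot = obj;             // placeholder; the runtime would manage the slot
         }

         T get()
         {
             // hypothetical binding: return cast(T) objc_loadWeakRetained(cast(void**) slot);
             return *slot;
         }
     }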



I feel like I should share some of my thoughts here about a broader use of 
reference counting in D.

First, we don't have to assume the reference counter has to be part of the 
object. Apple implements reference counting using global hash tables where the 
key is the address. It works very well.

If we added a hash table like this for all memory allocated from the GC, we'd 
just have to find the base address of any memory block to get to its reference 
counter. I know you were designing with only classes in mind, but I want to 
point out that it is possible to reference-count everything the GC allocates if 
we want to.

The downside is that every assignment to a pointer anywhere has to call a 
function. While this is some overhead, it is more predictable than overhead from 
a GC scan and would be preferred in some situations (games I guess). Another 
downside is that if you have an object retained by being present on the stack 
frame of a C function, it'd have to be explicitly retained from elsewhere.

As for pointers not pointing to GC memory, the generic addRef/release functions 
can ignore those pointers just like the GC ignores them today when it does its
scan.

Finally, cycles can still be reclaimed by having the GC scan for them. Those 
scans should be less frequent however since most of the memory can be reclaimed 
through reference counting.
Oct 09 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/25/2013 6:09 PM, Michel Fortin wrote:


 While its a start, this is hardly enough for Objective-C. Mostly for legacy 
reasons, most Objective-C methods return autoreleased objects (deferred release using an autorelease pool) based on a naming convention. Also, Objective-C objects can't be allocated from the D heap, so to avoid cycles we need weak pointers. More on Objective-C later.
 While it's good that a direct call to AddRef/Release is forbidden in @safe 
 code, I think it should be forbidden in @system code too. The reason is that if 
 the compiler is inserting calls to these automatically and you're also adding 
 your own explicitly in the same function, it becomes practically impossible to 
 reason about the reference counts, short of looking at the assembly. Instead, I 
 think you should create a @noarc attribute for functions: it'll prevent the 
 compiler from inserting any of those calls, so it becomes the responsibility of 
 the author to make those calls (which are then allowed). @noarc would be 
 incompatible with @safe, obviously.

It's a good point, but adding such an attribute to the function may be too 
coarse, for one, and may cause composition problems, for another. Maybe just 
disallowing it altogether is the best solution.
 Finally, that's a nitpick but I wish you'd use function names that fit D 
 better, such as opRetain and opRelease. Then you can add a "final void 
 opRetain() { AddRef(); }" function to the IUnknown COM interface and we could 
 do the same for Objective-C.

Makes sense.


 Objective-C is a special case. In Objective-C we need to know whether the 
 returned object of a function is already retained or if it is deferred released 
 (autoreleased). This is easily deduced from the naming convention. 
 Occasionally, we might need to create autorelease pools too, but that can 
 probably stay @system.
 (Note: all this idea of autoreleased objects might sound silly, but it was a 
great help before ARC, and Objective-C ARC has to be compatible with legacy code so it conforms to those conventions.)
 You can easily implement ARC for COM using an implementation of ARC for 
Objective-C, the reverse is not true because COM does not have this (old but still needed) concept of autorelease pools and deferred release where you need to know at each function boundary whether returned values (including those returned by pointer arguments) whether the object is expected to be retained or not.
 If I were you Walter, I would just not care about Objective-C idioms while 
 implementing this feature at first. It'll have to be special cased anyway. 
 Here's how I expect that'll be done:

From reading over that clang document, O-C ARC is far more complex than I'd 
anticipated. I think it is way beyond what we'd want in regular D. It also comes 
with all kinds of pointer and function annotations - something I strongly want 
to avoid.
 What will need to be done later when adding Objective-C support is to add an 
 internal "autoreleasedReturn" flag to a function that'll make codegen call 
 "autorelease" in the callee when returning an object and "retain" in the caller 
 where it receives an object from a function with that flag. Also, the same flag 
 and behaviour is needed for out parameters (to mimick those cases where an 
 object is returned by pointer). That flag will then be set automatically 
 internally depending on the function name (only for Objective-C member 
 functions), and it should be possible to override it explicitly with an 
 attribute or a pragma of some sort. This is what Clang is doing, and we must 
 match that to allow things to work.

I agree that this complexity should only be in O-C code.
 Checking for null is redundant in the Objective-C case: that check is done by 
the runtime. That's of minor importance, but it might impact performance and should probably special-cased in this case.


 With Apple's implementation of reference counting (using global hash tables 
protected by spin locks), it is more efficient to update many counters in one operation. The codegen for Objective-C ARC upon assignement to a variable calls "objc_storeStrong(id *object, id value)", incrementing and decrementing the two counters presumably in one operation (as well as replacing the content of the variable pointed by the first argument with the new value).
 Ideally, the codegen for Objective-C ARC in D would call the same functions 
so we have the same performance. This means that codegen should make a call "objc_retain" when first initializing a variable, "objc_storeStrong" when doing an assignment, and "objc_release" when destructing a variable.
 As for returning autoreleased objects, there are two functions to choose from 
depending on whether the object needs to be retained at the same time nor not. (In general, the object needs to be retained prior autoreleasing if it comes from a variable not part of the function's stack frame.)
 Here's Clang's documentation for how it implements ARC:
 http://clang.llvm.org/docs/AutomaticReferenceCounting.html



 Weak pointers are essential in order to break retain cycles in Objective-C 
where there is no GC. They are implemented with the same kind of function calls as strong pointers. Unfortunately, Apple's Objective-C implementation won't sit well with D the way it works now.
 Weak pointers are implemented in Objective-C by registering the address of 
the pointer with the runtime. This means that when a pointer is moved from one location to another, the need to be notified of that through a call to objc_moveWeak. This breaks one assumption of D that you can move memory at will without calling anything.
 While we could still implement a working weak pointer with a template struct, 
 that struct would have to allocate a pointer on the heap (where it is 
 guaranteed to not move) so it can store the true weak pointer recognized by the 
 runtime. I'm not sure that would be acceptable, but at least it would work.


 I feel like I should share some of my thoughts here about a broader use of 
reference counting in D.
 First, we don't have to assume the reference counter has to be part of the 
object. Apple implements reference counting using global hash tables where the key is the address. It works very well.
 If we added a hash table like this for all memory allocated from the GC, we'd 
 just have to find the base address of any memory block to get to its reference 
 counter. I know you were designing with only classes in mind, but I want to 
 point out that it is possible to reference-count everything the GC allocates if 
 we want to.

D would need manual, RC and GC to coexist peacefully.
 The downside is that every assignment to a pointer anywhere has to call a 
 function. While this is some overhead, it is more predictable than overhead 
 from a GC scan and would be preferred in some situations (games I guess). 
 Another downside is that if you have an object retained by being present on the 
 stack frame of a C function, it'd have to be explicitly retained from elsewhere.

Doesn't this make it impractical to mix vanilla C with D code? An important 
feature of D is this capability, without worrying about a "JNI" style interface.

As for D switching to a full refcounted GC for everything, I'm very hesitant for 
such a step. For one thing, reading the clang spec on all the various pointer 
and function annotations necessary is very off-putting.
 As for pointers not pointing to GC memory, the generic addRef/release 
functions can ignore those pointers just like the GC ignores them today when it does its scan.
 Finally, cycles can still be reclaimed by having the GC scan for them. Those 
scans should be less frequent however since most of the memory can be reclaimed through reference counting.

Oct 09 2013
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:

On 25.06.2013 23:00, Walter Bright wrote:
 Updated incorporating Steven's suggestion, and some comments about
 shared/const/mutable/purity.

 -------------------------------------------------------------

      Adding Reference Counting to D
Cool. I didn't expect this to be tackled so soon.

[...]
 3. A ref counted object is inherently a reference type, not a value type.
As Michel also said, the reference count does not have to be inside the object 
itself, so we might want to allow reference counting on other types as well.
 4. The compiler needs to know about ref counted types.
I imagine a few (constrained) templated functions for the different operations defined in the library could also do the job, though it might drown compilation speed. Also getting help from the optimizer to remove redundant calls will need some back doors.
 ==Proposal==

 If a class contains the following methods, in either itself or a base
 class, it is
 an RC class:


      T AddRef();
      T Release();
Is T typeof(this) here?

I don't think we should force linking this functionality with COM; the 
programmer can do this with a simple wrapper.
 An RC class is like a regular D class with these additional semantics:

 1. In  safe code, casting (implicit or explicit) to a base class that
 does not
 have both AddRef() and Release() is an error.

 2. Initialization of a class reference causes a call to AddRef().

 3. Assignment to a class reference causes a call to AddRef() on the new
 value
 followed by a call to Release() on its original value.
It might be common knowledge, but I want to point out that the usual COM 
implementation (atomic increment/decrement and free when the refcount goes down 
to 0) is not thread-safe for shared pointers. That means you either have to 
guard all reads and writes with a lock to make the full assignment atomic, or 
have to implement reference counting very differently (e.g. deferred reference 
counting).
 4. Null checks are done before calling any AddRef() or Release().

 5. Upon scope exit of all RC variables or temporaries, a call to
 Release() is performed,
 analogously to the destruction of struct variables and temporaries.

 6. If a class or struct contains RC fields, calls to Release() for those
 fields will
 be added to the destructor, and a destructor will be created if one
 doesn't exist already.

 7. If a closure is created that contains RC fields, either a compile
 time error will be
 generated or a destructor will be created for it.

 8. Explicit calls to AddRef/Release will not be allowed in  safe code.

 9. A call to AddRef() will be added to any argument passed as a parameter.

 10. Function returns have an AddRef() already done to the return value.

 11. The compiler can elide any AddRef()/Release() calls it can prove are
 redundant.

 12. AddRef() is not called when passed as the implicit 'this' reference.
Isn't this unsafe if a member function is called through the last existing reference and this reference is then cleared during execution of this member function or from another thread?
 13. Taking the address of, or passing by reference, any fields of an RC
 object
 is not allowed in  safe code. Passing by reference an RC field is allowed.
Please note that this includes slices to fixed size arrays.
 14. RC objects will still be allocated on the GC heap - this means that
 a normal
 GC run will reap RC objects that are in a cycle, and RC objects will get
 automatically
 scanned for heap references with no additional action required by the user.

 15. The class implementor will be responsible for deciding whether or
 not to support
 sharing. Casting to shared is already disallowed in  safe code, so this
 is only
 viable in system code.

 16. RC objects cannot be const or immutable.
This is a bit of a downer. If the reference count is not within the object, this can be implemented.
 17. Can RC objects be arguments to pure functions?

 ==Existing Code==

 D COM objects already have AddRef() and Release(). This proposal should
 not break
 that code, it'll just mean that there will be extra AddRef()/Release
 calls made.
 Calling AddRef()/Release() should never have been allowed in  safe code
 anyway.

 Any other existing uses of AddRef()/Release() will break.

 ==Arrays==

 Built-in arrays have no place to put a reference count. Ref counted
 arrays would hence
 become a library type, based on a ref counted class with overloaded
 operators for
 the array operations.

 ==Results==

 This is a very flexible approach, allowing for support of general RC
 objects, as well
 as specific support for COM objects and Objective-C ARC.
 AddRef()/Release()'s implementation
 is entirely up to the user or library writer.

  safe code can be guaranteed to be memory safe, as long as
 AddRef()/Release() are correctly
 implemented.
I feel I'm hijacking this proposal, but the step to library defined read/write 
barriers seems pretty small. Make AddRef, Release and assignment free template 
functions, e.g.

void ptrConstruct(T, bool stackOrHeap)(T* adr, T p);
void ptrAssign(T, bool stackOrHeap)(T* adr, T p);
void ptrRelease(T, bool stackOrHeap)(T* adr);

and we are able to experiment with all kinds of sophisticated GC algorithms 
including RC. Eliding redundant addref/release pairs would need some extra 
support though, I read that LLVM does something like this, but I don't know how.
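
As a sketch of how an RC experiment might fill in those hooks (rcIncrement and 
rcDecrement are hypothetical helpers that locate and adjust a count for the GC 
block a pointer refers to):

     void ptrConstruct(T, bool stackOrHeap)(T* adr, T p)
     {
         rcIncrement(cast(void*) p);      // retain the newly stored value
         *adr = p;
     }

     void ptrAssign(T, bool stackOrHeap)(T* adr, T p)
     {
         rcIncrement(cast(void*) p);      // retain the new value first...
         rcDecrement(cast(void*) *adr);   // ...then release the old one
         *adr = p;
     }

     void ptrRelease(T, bool stackOrHeap)(T* adr)
     {
         rcDecrement(cast(void*) *adr);   // e.g. on scope exit
     }
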
Oct 09 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/26/2013 12:19 AM, Rainer Schuetze wrote:
 As Michel also said, the reference count does not have to be inside the 
 object itself, so we might want to allow reference counting on other types as 
 well.

That opens the question of what is the point of other RC types? For example, C++ 
can throw any type - but it turns out that throwing anything but class types is 
largely pointless. My proposal does not specify where the count actually is - 
the two functions can be arbitrarily implemented.
 4. The compiler needs to know about ref counted types.
I imagine a few (constrained) templated functions for the different operations 
defined in the library could also do the job, though it might drown compilation 
speed. Also getting help from the optimizer to remove redundant calls will need 
some back doors.

I don't see how this can be done without specific compiler knowledge in a memory 
safe way.
      T AddRef();
      T Release();
Is T typeof(this) here?
T is not relevant to the proposal - it's up to the specific implementation of those functions.
 I don't think we should force linking this functionality with COM, the 
 programmer can do this with a simple wrapper.

Yeah, that's Michel's suggestion, and it's a good one.
 An RC class is like a regular D class with these additional semantics:

 1. In  safe code, casting (implicit or explicit) to a base class that
 does not
 have both AddRef() and Release() is an error.

 2. Initialization of a class reference causes a call to AddRef().

 3. Assignment to a class reference causes a call to AddRef() on the new
 value
 followed by a call to Release() on its original value.
It might be common knowledge, but I want to point out that the usual COM
implementation (atomic increment/decrement and free when refcount goes down to 0) is not thread-safe for shared pointers. That means you either have to guard all reads and writes with a lock to make the full assignment atomic or have to implement reference counting very differently (e.g. deferred reference counting).

Since the implementation of AddRef()/Release() is up to the user, whether it uses locks or not and whether it supports shared or not is up to the user.
 12. AddRef() is not called when passed as the implicit 'this' reference.
Isn't this unsafe if a member function is called through the last existing
reference and this reference is then cleared during execution of this member function or from another thread? No. The caller of the function still retains a reference in that thread.
 13. Taking the address of, or passing by reference, any fields of an RC
 object
 is not allowed in  safe code. Passing by reference an RC field is allowed.
Please note that this includes slices to fixed size arrays.
As I suggested, arrays would not be supported with this proposal - but the user can create ref counted array-like objects.
 16. RC objects cannot be const or immutable.
This is a bit of a downer. If the reference count is not within the object,
this can be implemented. Also, an exception could be made for the AddRef()/Release() functions.
 I feel I'm hijacking this proposal, but the step to library defined 
read/write barriers seems pretty small. Make AddRef, Release and assignment free template functions, e.g.
 void ptrConstruct(T,bool stackOrHeap)(T*adr, T p);
 void ptrAssign(T,bool stackOrHeap)(T*adr, T p);
 void ptrRelease(T,bool stackOrHeap)(T*adr);

 and we are able to experiment with all kinds of sophisticated GC algorithms 
including RC. Eliding redundant addref/release pairs would need some extra support though, I read that LLVM does something like this, but I don't know how.

It's pretty invasive into the code generation and performance, and could 
completely disrupt the C compatibility of D.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Jacob Carlborg wrote:

On Jun 25, 2013, at 11:00 PM, Walter Bright wrote:

 Updated incorporating Steven's suggestion, and some comments about
 shared/const/mutable/purity.
It's great that you're bringing this up. I think it's a good idea to add reference counting to D. Although I don't think I can add much to the discussion, and Michel has a lot better knowledge about Objective-C than I have. All I can say is that I really want it to be compatible with Objective-C.
Oct 09 2013
prev sibling parent zoujiaqing <zoujiaqing gmail.com> writes:
On Thursday, 10 October 2013 at 00:02:09 UTC, Walter Bright wrote:
 Updated incorporating Steven's suggestion, and some comments 
 about shared/const/mutable/purity.

 -------------------------------------------------------------

     Adding Reference Counting to D
Automatic Reference Counting (ARC) as an alternative to D's Garbage Collector? What is its state? I think ARC is a very good direction. Thanks. Ref link: https://wiki.dlang.org/Language_design_discussions#Automatic_Reference_Counting_.28ARC.29_as_an_alternative_to_D.27s_Garbage_Collector
Sep 18 2019
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:

Le 26-juin-2013 à 5:38, Walter Bright  a écrit :

 On 6/26/2013 12:19 AM, Rainer Schuetze wrote:
 As Michel also said, the reference count does not have to be inside the
object itself, so we might want to allow reference counting on other types as well.
 That opens the question of what is the point of other RC types? For example, 
C++ can throw any type - but it turns out that throwing anything but class types is largely pointless.

RC is just another garbage collection scheme. You might favor it for its performance characteristics, its determinism, or the lower memory footprint. Or you might need it to interact with foreign code that relies on it (COM, Objective-C, etc.), in which case it needs to be customizable (use the foreign implementation) or be manually managed. That's two different use cases. And in the latter case you can't use the GC to release cycles because foreign code is using memory invisible to the GC. It is important to note that when foreign code calls AddRef you don't want the GC to collect that object, at least not until Release is called.
 I don't think we should force linking this functionality with COM, the 
programmer can do this with a simple wrapper.
 Yeah, that's Michel's suggestion, and it's a good one.
It could also be done with user attributes:

    void x_init(ref X x);
    void x_destroy(ref X x);
    void x_assign(ref X a, X b); // retains a, releases b

    arc!(x_init, x_destroy, x_assign) class X {}

This way you don't have to check for null in the generated code: the standalone function is in charge of that. Also, this "assign" function here provides good opportunity for optimization in the RC implementation because you can update the two reference counts in a single operation (in the case where they're stored at the same place and are protected by the same lock). It's an improvement over making two separate function calls.
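
A rough sketch of that combined update, assuming the counts live in one side table guarded by a single lock (all names here are hypothetical, and thread start-up and error handling are elided):

    import core.sync.mutex : Mutex;

    class X { }   // stand-in for an arc-managed class

    // Hypothetical side table: object address -> reference count.
    // One lock protects all entries, so the retain and the release
    // can share a single critical section.
    __gshared size_t[void*] rcCounts;
    __gshared Mutex rcLock;

    shared static this() { rcLock = new Mutex; }

    void x_init(ref X x)
    {
        rcLock.lock();
        scope (exit) rcLock.unlock();
        rcCounts[cast(void*) x] = 1;
    }

    void x_assign(ref X a, X b)   // retains a, releases b
    {
        rcLock.lock();
        scope (exit) rcLock.unlock();

        if (a !is null)
            ++rcCounts[cast(void*) a];           // entry was created by x_init
        if (b !is null && --rcCounts[cast(void*) b] == 0)
        {
            rcCounts.remove(cast(void*) b);
            destroy(b);                          // last reference: run the destructor
        }
    }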
 No. The caller of the function still retains a reference in that thread.
This should also apply to all function arguments. The caller is better placed than the callee to elide redundant AddRef/Release pairs, so it should be the one in charge of retaining the object for the callee.
 16. RC objects cannot be const or immutable.
This is a bit of a downer. If the reference count is not within the object,
this can be implemented.
 Also, an exception could be made for the AddRef()/Release() functions.
I'm not too fond of that idea.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:


Le 26-juin-2013 à 1:36, Walter Bright  a écrit :

 On 6/25/2013 6:09 PM, Michel Fortin wrote:
 Instead, I think you should create a  noarc attribute for functions: it'll 
prevent the compiler from inserting any of those calls so it becomes the responsibility of the author to make those calls (which are then allowed). noarc would be incompatible with safe, obviously.
 It's a good point, but adding such an attribute to the function may be too 
coarse, for one, and may cause composition problems, for another. Maybe just disallowing it altogether is the best solution. Well, it's mostly required to write runtime support functions. The attribute could be more obscure so people are less tempted to use it, but if you're going to implement the ref-counting code you'll need that.


 I feel like I should share some of my thoughts here about a broader use of 
reference counting in D.
 First, we don't have to assume the reference counter has to be part of the 
object. Apple implements reference counting using global hash tables where the key is the address. It works very well.
 If we added a hash table like this for all memory allocated from the GC, 
we'd just have to find the base address of any memory block to get to its reference counter. I know you were designing with only classes in mind, but I want to point out that it is possible to reference-count everything the GC allocates if we want to.
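
A rough sketch of that idea, assuming the count sits in a global table keyed by the base address of the owning GC block (GC.addrOf returns that base address; the table itself and the free-on-zero policy are made up, and nothing here is synchronized):

    import core.memory : GC;

    __gshared size_t[void*] blockRefs;   // hypothetical: block base address -> count

    void retain(void* p)
    {
        if (auto base = GC.addrOf(p))            // base of the block containing p
            ++blockRefs.require(base, 0);        // insert at 0 if missing, then bump
    }

    void release(void* p)
    {
        auto base = GC.addrOf(p);
        if (base is null) return;
        if (auto count = base in blockRefs)
        {
            if (--(*count) == 0)
            {
                blockRefs.remove(base);
                GC.free(base);                   // or leave it to a normal collection
            }
        }
    }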
 D would need manual, RC and GC to coexist peacefully.
The problem is how to make the three of those use the same codegen?

- Druntime could have a flag to disable/enable refcounting. It'd make the retain/release functions no-ops, but it'd not prevent the GC from reclaiming memory as it does today.

- Druntime could have a flag to disable/enable garbage collection (it already has). That'd prevent cycles from being collected, but you could use weak pointers to work around that or request a collection manually at the appropriate time.

- A noarc (or similar) attribute at the function level could be used to prevent the compiler from generating function calls on pointer assignments. You could make a whole module noarc if you want by adding " noarc:" at the top.

Here's the annoying thing: noarc is totally safe if reference counting is disabled and we rely entirely on the GC. noarc is unsafe when reference counting is enabled.
 The downside is that every assignment to a pointer anywhere has to call a 
function. While this is some overhead, it is more predictable than overhead from a GC scan and would be preferred in some situations (games, I guess). Another downside is that if an object is retained only by being present on the stack frame of a C function, it'd have to be explicitly retained from elsewhere.
 Doesn't this make it impractical to mix vanilla C with D code? An important 
feature of D is this capability, without worrying about a "JNI" style interface.

It's not very different than with the GC today. If you call a C function by giving it a ref-counted pointer argument, that memory block is guaranteed to live at least for that call's lifetime (because it is retained by the caller). So simple calls to C functions are not a problem. If the C function puts that pointer elsewhere you'll need to retain it some other way, but you have to do this with the GC too. If you're implementing a callback called from C you need to care about what you return because the caller's C code won't retain it, while with the GC you could manage if C code did not store that pointer outside of the stack. I think that's all you have to worry about.
 As for D switching to a full refcounted GC for everything, I'm very hesitant 
for such a step. For one thing, reading the clang spec on all the various pointer and function annotations necessary is very off-putting.

Don't let Clang intimidate you. The Clang spec is about four to five times more complicated than needed because of autoreleased objects and because it supports weak pointers. Weak pointers can be implemented as struct templates (as long as we have noarc). And all those annotations are for special cases, when you need to break the rules. You don't use them when doing normal programming, well except for __weak.
Oct 09 2013
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:

On 26.06.2013 11:38, Walter Bright wrote:
 On 6/26/2013 12:19 AM, Rainer Schuetze wrote:
 I imagine a few (constrained) templated functions for the different
 operations defined in the library could also do the job, though it
 might drown compilation speed. Also getting help from the optimizer to
 remove redundant calls will need some back doors.
I don't see how this can be done without specific compiler knowledge in a memory safe way.
I currently don't see how it can be memory safe with this proposal.
 3. Assignment to a class reference causes a call to AddRef() on the new
 value
 followed by a call to Release() on its original value.
It might be common knowledge, but I want to point out that the usual COM implementation (atomic increment/decrement and free when refcount goes down to 0) is not thread-safe for shared pointers. That means you either have to guard all reads and writes with a lock to make the full assignment atomic or have to implement reference counting very differently (e.g. deferred reference counting).
Since the implementation of AddRef()/Release() is up to the user, whether it uses locks or not and whether it supports shared or not is up to the user.
You have to put the lock around the pair of AddRef and Release, but if the compiler already splits this into two function calls, this cannot be done in the implementation.
 12. AddRef() is not called when passed as the implicit 'this' reference.
Isn't this unsafe if a member function is called through the last existing reference and this reference is then cleared during execution of this member function or from another thread?
No. The caller of the function still retains a reference in that thread.
Hmmm, I guess I misunderstand the proposal. Assume for example a refcounted class R and this code:

    class R : RefCounted
    {
        int _x;
        int readx() { return _x; }
    }

    int main()
    {
        R r = new R;
        return r.readx();
    }

According to 12. there is no refcounting going on when calling or executing readx. Ok, now what happens here:

    class R : RefCounted
    {
        int _x;
        int readx(C c)
        {
            c.r = null;   // "standard" rc deletes r here
            return _x;    // reads garbage
        }
    }

    class C
    {
        R r;
    }

    int main()
    {
        C c = new C;
        c.r = new R;
        return c.r.readx(c);
    }

This reads garbage or crashes if there is no reference counting going on when calling readx.
 13. Taking the address of, or passing by reference, any fields of an RC
 object
 is not allowed in  safe code. Passing by reference an RC field is
 allowed.
Please note that this includes slices to fixed size arrays.
As I suggested, arrays would not be supported with this proposal - but the user can create ref counted array-like objects.
Just to clarify, I meant taking a slice of a static array that is a field of a refcounted class. Is it forbidden to have a field like this in a refcounted class or is taking the address through slicing forbidden?
 I feel I'm hijacking this proposal, but the step to library defined
 read/write barriers seems pretty small. Make AddRef, Release and
 assignment free template functions, e.g.

 void ptrConstruct(T,bool stackOrHeap)(T*adr, T p);
 void ptrAssign(T,bool stackOrHeap)(T*adr, T p);
 void ptrRelease(T,bool stackOrHeap)(T*adr);

 and we are able to experiment with all kinds of sophisticated GC
 algorithms including RC. Eliding redundant addref/release pairs would
 need some extra support though, I read that LLVM does something like
 this, but I don't know how.
It's pretty invasive into the code generation and performance, and could completely disrupt the C compatibility of D.
I don't see a big difference between a free function and a member function call, though the template character of it might hurt compilation performance.

Two more notes:

- I'm not sure it is mentioned, but I think you have to describe what happens when copying a struct. Pre- and post-blit actions have to be taken if the struct contains pointers to refcounted objects.
 10. Function returns have an AddRef() already done to the return value.
- A refcounted reference returned from a function (including new) would have to be Released if the return value is ignored or only used as part of an expression.
Oct 09 2013
parent reply Robert Schadek <realburner gmx.de> writes:
On 10/10/2013 03:45 AM, Walter Bright wrote:
 Rainer Schuetze wrote:

 You have to put the lock around the pair of AddRef and Release, but if
 the compiler already splits this into two function calls, this cannot
 be done in the implementation.
I would imagine the counter to be manipulated with atomic_add_and_fetch operations, so no locks are required.
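
For example, the counter side of that could be a pair of atomic read-modify-write operations. This is a sketch of the AddRef/Release bodies only; as the follow-ups below point out, it says nothing about the safety of assigning to a shared reference:

    import core.atomic : atomicOp;

    class RCObject
    {
        private shared int refs = 1;

        typeof(this) AddRef()
        {
            atomicOp!"+="(refs, 1);              // lock-free increment
            return this;
        }

        typeof(this) Release()
        {
            if (atomicOp!"-="(refs, 1) == 0)     // atomicOp returns the new count
            {
                destroy(this);                   // last reference: finalize
                return null;
            }
            return this;
        }
    }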
Oct 10 2013
parent reply "inout" <inout gmail.com> writes:
On Thursday, 10 October 2013 at 08:55:00 UTC, Robert Schadek
wrote:
 On 10/10/2013 03:45 AM, Walter Bright wrote:
 Rainer Schuetze wrote:

 You have to put the lock around the pair of AddRef and 
 Release, but if
 the compiler already splits this into two function calls, this 
 cannot
 be done in the implementation.
I would imagine the counter to be manipulated with atomic_add_and_fetch operations, so no locks are required.
On shared objects, yes. Local objects need no atomics at all.
Oct 11 2013
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 12.10.2013 04:16, inout wrote:
 On Thursday, 10 October 2013 at 08:55:00 UTC, Robert Schadek
 wrote:
 On 10/10/2013 03:45 AM, Walter Bright wrote:
 Rainer Schuetze wrote:

 You have to put the lock around the pair of AddRef and Release, but if
 the compiler already splits this into two function calls, this cannot
 be done in the implementation.
I would imagine the counter to be manipulated with atomic_add_and_fetch operations, so no locks are required.
On shared objects, yes. Local objects need no atomics at all.
Atomic increments/decrements are not good enough for shared references. See this example from later in the discussion:

Consider a global shared reference R that holds the last reference to an object O. One thread exchanges the reference with another reference P while another thread reads the reference into S.

    shared(C) R = O;       ; refcnt of O is 1

in pseudo-assembly missing null-checks:

    Thread1 (R = P)        Thread2 (S = R)

                           mov ecx,[R]
                           ; thread suspended
    mov eax,[P]
    inc [eax].refcnt
    mov ebx,[R]
    mov [R],eax
    dec [ebx].refcnt       ; refcnt of O now 0
    jnz done
    call delete_ebx
                           ; thread resumed
                           inc [ecx].refcnt
    done:

The increment on [ecx].refcnt modifies garbage.
Oct 11 2013
next sibling parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-10-12 06:16:17 +0000, Rainer Schuetze <r.sagitario gmx.de> said:

 On 12.10.2013 04:16, inout wrote:
 On Thursday, 10 October 2013 at 08:55:00 UTC, Robert Schadek
 wrote:
 I would imagine the counter to be manipulated with atomic_add_and_fetch
 operations, so no locks are required.
On shared objects, yes. Local objects need no atomics at all.
Atomic increments/decrements are not good enough for shared references. See this example from later in the discussion:

Consider a global shared reference R that holds the last reference to an object O. One thread exchanges the reference with another reference P while another thread reads the reference into S.

    shared(C) R = O;       ; refcnt of O is 1

in pseudo-assembly missing null-checks:

    Thread1 (R = P)        Thread2 (S = R)

                           mov ecx,[R]
                           ; thread suspended
    mov eax,[P]
    inc [eax].refcnt
    mov ebx,[R]
    mov [R],eax
    dec [ebx].refcnt       ; refcnt of O now 0
    jnz done
    call delete_ebx
                           ; thread resumed
                           inc [ecx].refcnt
    done:

The increment on [ecx].refcnt modifies garbage.
I think you are mixing shared references with shared objects. Atomic increment/decrement will work fine for shared objects with unshared pointers to it. But if the pointer to the object is itself shared and available from two threads (as in your example) then you need to properly serialize reads and writes to that pointer (with a mutex or through other means). Of course, then you fall into the problem that in D you are not able to set different attributes to an object and to its reference. (Hint: const(Object)ref) -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Oct 12 2013
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 12.10.2013 14:16, Michel Fortin wrote:
 On 2013-10-12 06:16:17 +0000, Rainer Schuetze <r.sagitario gmx.de> said:

 On 12.10.2013 04:16, inout wrote:
 On Thursday, 10 October 2013 at 08:55:00 UTC, Robert Schadek
 wrote:
 I would imagine the counter to be manipulated with atomic_add_and_fetch
 operations, so no locks are required.
On shared objects, yes. Local objects need no atomics at all.
Atomic increments/decrements are not good enough for shared references. See this example from later in the discussion:

Consider a global shared reference R that holds the last reference to an object O. One thread exchanges the reference with another reference P while another thread reads the reference into S.

    shared(C) R = O;       ; refcnt of O is 1

in pseudo-assembly missing null-checks:

    Thread1 (R = P)        Thread2 (S = R)

                           mov ecx,[R]
                           ; thread suspended
    mov eax,[P]
    inc [eax].refcnt
    mov ebx,[R]
    mov [R],eax
    dec [ebx].refcnt       ; refcnt of O now 0
    jnz done
    call delete_ebx
                           ; thread resumed
                           inc [ecx].refcnt
    done:

The increment on [ecx].refcnt modifies garbage.
I think you are mixing shared references with shared objects. Atomic increment/decrement will work fine for shared objects with unshared pointers to it. But if the pointer to the object is itself shared and available from two threads (as in your example) then you need to properly serialize reads and writes to that pointer (with a mutex or through other means).
I agree, that's why I used the term "shared reference", too ;-)

If you are using only shared objects, but not shared references, you'll have to use message passing, which comes with its own set of synchronization operations that are not easily made lock-free.
 Of course, then you fall into the problem that in D you are not able to
 set different attributes to an object and to its reference. (Hint:
 const(Object)ref)
I think being able to distinguish these types is missing from the type system. It's even worse for the TypeInfo class describing the object, as it is used for both the reference and the object.
Oct 12 2013
parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-10-13 06:54:04 +0000, Rainer Schuetze <r.sagitario gmx.de> said:

 I agree, that's why I used the term "shared reference", too ;-)
 
 If you are using only shared objects, but not shared references, you'll 
 have to use message passing coming with its own set of synchronization 
 operations that are not easily made lock-free.
For one of my projects I implemented a shared pointer like this. It uses the pointer value itself as a spin lock with the assumption that -1 is an invalid pointer value:

1. read pointer value
2. if read value is -1 go to line 1 (spin)
3. compare and swap (previously read value <-> -1)
4. if failure go to line 1 (spin)
// now pointer is "locked", its value is -1 but we have a copy of the original
5. copy pointer locally or assign to it (and update counter)
6. write back pointer value atomically to replace the -1

No mutex, but there's a spin lock so it's not good if there's contention.

That said, I find it extremely rare to want a shared pointer that isn't already protected by a mutex alongside other variables, or that isn't propagated using some form of message passing. -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
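
A sketch of those steps in D, using core.atomic.cas and storing the pointer as raw bits so the all-ones pattern can play the role of -1. The counter updates are elided, and note that keeping the only pointer to GC memory in an integer would hide it from the GC, so a real version has to keep the target reachable some other way:

    import core.atomic : atomicLoad, atomicStore, cas, MemoryOrder;

    struct SpinSharedPtr
    {
        private shared size_t bits;               // the pointer value as an integer
        private enum size_t LOCKED = size_t.max;  // the "-1" sentinel

        // Steps 1-4: spin until we swap the current value for the sentinel.
        private size_t acquire()
        {
            for (;;)
            {
                size_t cur = atomicLoad!(MemoryOrder.acq)(bits);
                if (cur == LOCKED)
                    continue;                     // someone else holds it, spin
                if (cas(&bits, cur, LOCKED))
                    return cur;                   // slot is now ours
            }
        }

        // Step 5 (read side): copy the pointer, bump its counter, then step 6.
        void* load()
        {
            size_t cur = acquire();
            // ... increment the reference count of the pointed-to object here ...
            atomicStore!(MemoryOrder.rel)(bits, cur);   // write back, unlocking
            return cast(void*) cur;
        }

        // Step 5 (write side): install a new pointer, fix both counters, then step 6.
        void store(void* newPtr)
        {
            size_t old = acquire();
            // ... retain newPtr, release the object behind old here ...
            atomicStore!(MemoryOrder.rel)(bits, cast(size_t) newPtr);
        }
    }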
Oct 13 2013
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 13.10.2013 13:48, Michel Fortin wrote:
 On 2013-10-13 06:54:04 +0000, Rainer Schuetze <r.sagitario gmx.de> said:

 I agree, that's why I used the term "shared reference", too ;-)

 If you are using only shared objects, but not shared references,
 you'll have to use message passing coming with its own set of
 synchronization operations that are not easily made lock-free.
For one of my projects I implemented a shared pointer like this. It uses the pointer value itself as a spin lock with the assumption that -1 is an invalid pointer value:

1. read pointer value
2. if read value is -1 go to line 1 (spin)
3. compare and swap (previously read value <-> -1)
4. if failure go to line 1 (spin)
// now pointer is "locked", its value is -1 but we have a copy of the original
5. copy pointer locally or assign to it (and update counter)
6. write back pointer value atomically to replace the -1

No mutex, but there's a spin lock so it's not good if there's contention.

That said, I find it extremely rare to want a shared pointer that isn't already protected by a mutex alongside other variables, or that isn't propagated using some form of message passing.
Locking is very bad if you have threads at different priorities as it might introduce priority inversion. Spinning is probably even worse in that scenario.

At work, I use shared pointers all the time to pass information to a real-time audio thread. The scheme uses triple-buffering of pointers for a lock-free, safe transport from/to the real-time thread.

Not having to worry about this low-level locking stuff is one of the good aspects of garbage collecting.
Oct 14 2013
parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-10-14 17:57:18 +0000, Rainer Schuetze <r.sagitario gmx.de> said:

 On 13.10.2013 13:48, Michel Fortin wrote:
 
 For one of my projects I implemented a shared pointer like this. It uses
 the pointer value itself as a spin lock with the assumption that -1 is
 an invalid pointer value:
 
 1. read pointer value
 2. if read value is -1 go to line 1 (spin)
 3. compare and swap (previously read value <-> -1)
 4. if failure go to line 1 (spin)
 // now pointer is "locked", its value is -1 but we have a copy of the
 original
 5. copy pointer locally or assign to it (and update counter)
 6. write back pointer value atomically to replace the -1
 
 No mutex, but there's a spin lock so it's not good if there's contention.
 
 That said, I find it extremely rare to want a shared pointer that isn't
 already protected by a mutex alongside other variables, or that isn't
 propagated using some form of message passing.
Locking is very bad if you have threads at different priorities as it might introduce priority inversion. Spinning is probably even worse in that scenario.
Spinning is good only when you very rarely expect contention, which is the case for me. The above code is used once per object the first time someone requests a weak pointer for it. Having contention for that just doesn't make sense in most usage patterns. But still, being curious I added a log message any time it actually has to spin, and I have yet to see that message once in my logs (which probably means aggressive enough unit tests are missing).

If you have a lot of read accesses and rarely write to the pointer you could instead try a read-write mutex with concurrent read access. In any case, there's no solution that will be ideal in all cases. Different situations ask for different trade-offs.
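
druntime's core.sync.rwmutex provides that kind of lock. A small sketch of guarding a rarely-replaced reference with it (shared/ownership qualifiers glossed over, class and field names made up):

    import core.sync.rwmutex : ReadWriteMutex;

    class Config { /* ... */ }

    // A slot holding a reference that is read often and replaced rarely.
    class ConfigSlot
    {
        private ReadWriteMutex guard;
        private Config current;

        this() { guard = new ReadWriteMutex; }

        Config get()
        {
            synchronized (guard.reader)   // any number of readers at once
                return current;
        }

        void set(Config c)
        {
            synchronized (guard.writer)   // writers get exclusive access
                current = c;              // a retain/release pair would go here under ARC
        }
    }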
 At work, I use shared pointers all the time to pass information to a 
 real time audio thread. The scheme uses triple-buffering of pointers 
 for a lock free safe transport from/to the real time thread.
 
 Not having to worry about these low-level locking stuff is one of the 
 good aspects about garbage collecting.
Indeed. The current garbage collector makes it easy to have shared pointers to shared objects. But the GC can also interrupt real-time threads for an unpredictable duration, how do you cope with that in a real-time thread? I know ARC isn't the ideal solution for all use cases. But neither is the GC, especially for real-time applications. So, which one would you recommend for a project having a real-time audio thread? -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Oct 14 2013
next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Monday, 14 October 2013 at 19:42:36 UTC, Michel Fortin wrote:
 Indeed. The current garbage collector makes it easy to have 
 shared pointers to shared objects. But the GC can also 
 interrupt real-time threads for an unpredictable duration, how 
 do you cope with that in a real-time thread?

 I know ARC isn't the ideal solution for all use cases. But 
 neither is the GC, especially for real-time applications. So, 
 which one would you recommend for a project having a real-time 
 audio thread?
If you don't want any pause, a concurrent GC is the way to go. This type of GC comes at the cost of increased memory usage (everything is a tradeoff) but it exists.
Oct 14 2013
parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-10-14 20:45:35 +0000, "deadalnix" <deadalnix gmail.com> said:

 On Monday, 14 October 2013 at 19:42:36 UTC, Michel Fortin wrote:
 Indeed. The current garbage collector makes it easy to have shared 
 pointers to shared objects. But the GC can also interrupt real-time 
 threads for an unpredictable duration, how do you cope with that in a 
 real-time thread?
 
 I know ARC isn't the ideal solution for all use cases. But neither is 
 the GC, especially for real-time applications. So, which one would you 
 recommend for a project having a real-time audio thread?
 If you don't want any pause, a concurrent GC is the way to go. This type of GC comes at the cost of increased memory usage (everything is a tradeoff) but it exists.
I'm not an expert in GCs, but as far as I know a concurrent GC also requires some bookkeeping to be done when assigning to pointers, similar to ARC, and also when moving pointers, unlike ARC. So it requires hooks in the codegen that will perform atomic operations, just like ARC. Unless of course you use "fork" and scan inside the child process… but that has its own set of problems on some platforms.

The only consensus we'll reach is that different projects have different needs. In theory being able to swap the GC for something else could bring everyone together. But to be able to replace the GC for another with a strategy different enough to matter (concurrent GC or ARC) you need the codegen to be different. So we can either:

1. make the codegen configurable -- which brings its own set of compatibility problems for compiled code but is good for experimentation, or
2. settle on something middle-ground performance-wise that can accommodate a couple of strategies, or
3. choose one GC strategy that ought to satisfy everybody and adapt the codegen to fit, or
4. keep everything as is.

We're stuck with 4 until something else is decided. -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Oct 14 2013
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Monday, 14 October 2013 at 21:25:29 UTC, Michel Fortin wrote:
 I'm not an expert in GCs, but as far as I know a concurrent GC 
 also requires some bookkeeping to be done when assigning to 
 pointers, similar to ARC, and also when moving pointers, unlike 
 ARC. So it requires hooks in the codegen that will perform 
 atomic operations, just like ARC.
Usual strategies include:

- When you JIT, change the function itself to write pointers through a function that marks the old value as live.

- When AOT, always go through that function, which makes a test and marks the old value as live if this is done during a collection. This basically adds a read of a global and a test for each pointer write.

- Use the page protection mechanism and do regular writes. This can be done via fork, but also via remapping the GCed memory as COW. The tax is then more expensive, but you only pay it once per page you actually write and only when actually collecting.

The good news is that this tax is only required for objects that contain shared mutable pointers. In D, most data is thread local or immutable. D's type system is really friendly to concurrent GC, and we definitely should go in that direction.
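
As an illustration of the second (ahead-of-time) strategy, every pointer store could be routed through something like the following; the flag and the mark hook are hypothetical stand-ins for the collector's real interface:

    __gshared bool collectionInProgress;   // set/cleared by the collector

    // Hypothetical hook: put p on the collector's mark queue so it stays live.
    void gcMarkAlive(void* p) { /* ... */ }

    // The barrier the compiler would emit for every pointer write:
    // one read of a global and one test outside of collections.
    void writePointer(void** slot, void* newValue)
    {
        if (collectionInProgress)
        {
            void* old = *slot;
            if (old !is null)
                gcMarkAlive(old);          // keep the overwritten target scannable
        }
        *slot = newValue;
    }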
 The only consensus we'll reach is that different projects have 
 different needs. In theory being able to swap the GC for 
 something else could bring everyone together. But to be able to 
 replace the GC for another with a strategy different enough to 
 matter (concurrent GC or ARC) you need the codegen to be 
 different. So we can either:
ARC-like systems need a different codegen, but you can do this with regular codegen if you use page protection to detect writes.
 1. make the codegen configurable -- which brings its own set of 
 compatibility problems for compiled code but is good for 
 experimentation, or
Bad, we will end up having different incompatible binaries.
Oct 14 2013
parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-10-14 23:45:42 +0000, "deadalnix" <deadalnix gmail.com> said:

 On Monday, 14 October 2013 at 21:25:29 UTC, Michel Fortin wrote:
 I'm not an expert in GCs, but as far as I know a concurrent GC also 
 requires some bookkeeping to be done when assigning to pointers, 
 similar to ARC, and also when moving pointers, unlike ARC. So it 
 requires hooks in the codegen that will perform atomic operations, just 
 like ARC.
Usual strategies include:

- When you JIT, change the function itself to write pointers through a function that marks the old value as live.

- When AOT, always go through that function, which makes a test and marks the old value as live if this is done during a collection. This basically adds a read of a global and a test for each pointer write.

- Use the page protection mechanism and do regular writes. This can be done via fork, but also via remapping the GCed memory as COW. The tax is then more expensive, but you only pay it once per page you actually write and only when actually collecting.

The good news is that this tax is only required for objects that contain shared mutable pointers. In D, most data is thread local or immutable. D's type system is really friendly to concurrent GC, and we definitely should go in that direction.
 The only consensus we'll reach is that different projects have 
 different needs. In theory being able to swap the GC for something else 
 could bring everyone together. But to be able to replace the GC for 
 another with a strategy different enough to matter (concurrent GC or 
 ARC) you need the codegen to be different. So we can either:
 ARC-like systems need a different codegen, but you can do this with regular codegen if you use page protection to detect writes.
Very insightful. Thank you.
 1. make the codegen configurable -- which brings its own set of 
 compatibility problems for compiled code but is good for 
 experimentation, or
Bad, we will end up having different incompatible binaries.
So as I understand it, your plan would then be:

- use a concurrent GC with the page protection mechanism and COW to catch writes during collection
- add thread-local awareness to the GC so only shared and mutable memory needs COW

Makes sense. But adding thread-local awareness requires a couple of language changes. It won't work as long as people keep casting things around, so you need to fix a lot of cases where casts are needed. But otherwise, it seems like a good plan.

I'm a little wary of the cost of doing COW during collection. Obviously the GC isn't pausing the program per se, but it'll be slowing it down. By how much probably depends on what you're doing. At the very least the GC should allocate types with no pointers on separate pages from those with pointers.

Also, what are the calls required to implement page protection and COW on POSIX? I'd like to check whether those are allowed within the OS X and iOS sandbox. For instance fork() isn't allowed for sandboxed apps. -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Oct 14 2013
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 15 October 2013 at 00:38:20 UTC, Michel Fortin wrote:
 So as I understand it, your plan would then be:

 - use concurrent GC using the page protection mechanism and COW 
 to catch writes during collection
 - add thread-local awareness to the GC so only shared and 
 mutable memory needs COW
Yes, I think this is what makes the most sense for D. If the GC allows explicit free, reference counting and other manual memory management techniques can be built on top of it.
 Makes sense. But adding thread-local awareness requires a 
 couple of language changes. It won't work as long as people 
 keep casting things around so you need to fix a lot of cases 
 where casts are needed.
Type qualifiers provide the necessary information. However, some practices (that are already mentioned as being undefined behavior) will become really unsafe. It also requires enriching the GC API, but users aren't supposed to use it directly :D
 I'm a little wary of the cost of doing COW during collection. 
 Obviously the GC isn't pausing the program per-see, but it'll 
 be slowing it down. By how much is probably dependent on what 
 you're doing. At the very least the GC should allocate types 
 with no pointers on separate pages from those with pointers.
Nothing is free, and indeed, trapping the write via memory protection is quite expensive. Hopefully, we can segregate objects that contain mutable pointers from others and only memory-protect these.

It will indeed cause trouble for code that mutates a large amount of shared pointers. I'd say that such code is probably asking for trouble in the first place, but as always, no silver bullet. I still think this solution is the one that fits D best.
 Also, what are the calls required to implement page protection 
 and COW on posix? I'd like to check whether those are allowed 
 within the OS X and iOS sandbox. For instance fork() isn't 
 allowed for sandboxed apps.
You need mmap, mprotect and all the signal handling machinery.
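
A minimal sketch of that machinery on POSIX, in D: protect a page, catch the fault, record the write, and unprotect so execution continues. The page size, flags and signal choice vary by platform (macOS may deliver SIGBUS rather than SIGSEGV), and doing real work inside a signal handler is glossed over here:

    import core.stdc.signal : SIGSEGV;
    import core.sys.posix.signal : SA_SIGINFO, sigaction, sigaction_t, siginfo_t;
    import core.sys.posix.sys.mman : mmap, mprotect, MAP_ANON, MAP_PRIVATE,
                                     PROT_READ, PROT_WRITE;

    enum pageSize = 4096;                  // real code would query the page size
    __gshared void* trackedPage;
    __gshared bool pageWasWritten;         // the "dirty" bit a collector would read

    // A write to the protected page lands here: remember it and make the
    // page writable again so the faulting store can be re-executed.
    extern (C) void onFault(int sig, siginfo_t* info, void* ctx)
    {
        pageWasWritten = true;
        mprotect(trackedPage, pageSize, PROT_READ | PROT_WRITE);
    }

    void main()
    {
        // One anonymous page standing in for a chunk of the GC heap.
        trackedPage = mmap(null, pageSize, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANON, -1, 0);

        sigaction_t sa;
        sa.sa_sigaction = &onFault;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, null);

        // "Collection starts": make the page read-only to trap the next write.
        mprotect(trackedPage, pageSize, PROT_READ);

        (cast(int*) trackedPage)[0] = 42;  // faults once, handler re-enables writes

        assert(pageWasWritten);            // the collector now knows to rescan the page
    }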
Oct 14 2013
next sibling parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-10-15 02:20:49 +0000, "deadalnix" <deadalnix gmail.com> said:

 Also, what are the calls required to implement page protection and COW 
 on posix? I'd like to check whether those are allowed within the OS X 
 and iOS sandbox. For instance fork() isn't allowed for sandboxed apps.
You need mmap, mprotect and all the signal handling machinery.
mprotect is the one I'm worried about, as it lets you set the executable bit (among other things) which could be exploited to run arbitrary code. So I tested it and it seems to work fine on OS X inside the sandbox (including for setting the executable bit). I guess an executable with a reference to mprotect would probably also pass Apple's Mac App Store validation, but I haven't tested. mprotect isn't available at all with the iOS SDK. So making this collector work on iOS (and the iOS Simulator) would require a different codegen. -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Oct 14 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-10-15 05:11, Michel Fortin wrote:

 mprotect isn't available at all with the iOS SDK. So making this
 collector work on iOS (and the iOS Simulator) would require a different
 codegen.
I haven't tried compiling anything and I don't know if I'm looking in the correct file but this file: /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS7.0.sdk/usr/include/sys/mman.h Does contain "mprotect". -- /Jacob Carlborg
Oct 15 2013
parent Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-10-15 07:28:16 +0000, Jacob Carlborg <doob me.com> said:

 On 2013-10-15 05:11, Michel Fortin wrote:
 
 mprotect isn't available at all with the iOS SDK. So making this
 collector work on iOS (and the iOS Simulator) would require a different
 codegen.
I haven't tried compiling anything and I don't know if I'm looking in the correct file but this file: /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS7.0.sdk/usr/include/sys/mman.h
 Does contain "mprotect".
You're right. Yes, it does exist. I was confused.

Not only does it exist, but it lets you set the executable bit. I find that depressing, since I'm pretty sure App Store apps are prevented from setting the executable bit, and I'd tend to think now that they're blocking it by checking for references to mprotect in the executable when submitting to the App Store. And by doing it this way they probably wouldn't be able to distinguish between setting the executable bit and making a page read-only.

Also, someone would need to check that Windows Phone apps and Windows 8-style (Metro) apps can access mprotect (or equivalent) too. They're sandboxed just as heavily and statically checked upon submission the same way. Could some game consoles out there block it too? -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Oct 15 2013
prev sibling parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-10-15 02:20:49 +0000, "deadalnix" <deadalnix gmail.com> said:

 It will indeed cause trouble for code that mutate a large amount of 
 shared pointers. I'd say that such code is probably asking for trouble 
 in the first place, but as always, no silver bullet. I still think 
 solution is the one that fit D the best.
I think there's a small mistake in your phrasing, but it makes a difference. When the collector is running, it needs to know about any mutation of pointers to its shared memory pool, including pointers that are themselves thread-local but point to shared memory.

So COW will be trouble for code that mutates a large amount of **pages containing pointers to shared memory**. And this includes **pointers to immutable data** because immutable is implicitly shared. And this includes **pointers to const data** since those pointers might point to immutable (thus shared) memory.

So any memory page susceptible of containing pointers to shared memory would need to use COW during collection. Which means all the thread's stacks, and also all objects with a pointer to shared, immutable, and const data. At this point I think it is fair to approximate this to almost all memory that could contain pointers. -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Oct 15 2013
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 15 October 2013 at 11:03:01 UTC, Michel Fortin wrote:
 On 2013-10-15 02:20:49 +0000, "deadalnix" <deadalnix gmail.com> 
 said:

 It will indeed cause trouble for code that mutate a large 
 amount of shared pointers. I'd say that such code is probably 
 asking for trouble in the first place, but as always, no 
 silver bullet. I still think solution is the one that fit D 
 the best.
I think there's a small mistake in your phrasing, but it makes a difference. When the collector is running, it needs to know about any mutation for pointers to its shared memory pool, including pointers that are themselves thread-local but point to shared memory. So COW will be trouble for code that mutate a large amount of **pages containing pointers to shared memory**. And this which includes **pointers to immutable data** because immutable is implicitly shared. And this includes **pointers to const data** since those pointers might point to immutable (thus shared) memory.
No, that is the beauty of it :D

Consider you have pointers from TL -> shared -> immutable and TL -> immutable.

I'm not covering TL collection here (it seems obvious that it doesn't require stopping the world). So the starting point is that we have the roots in all TL heaps/stacks, and we want to collect shared/immutable without blocking the world.

The TL heap may get new pointers to the shared heap, but they can only come from the shared heap itself or new allocations. At this point, you consider every new allocation as live.

Reading a pointer from the shared heap and copying it to the TL heap isn't problematic in itself, but then we have a problem if this pointer is now updated in the shared heap, as the GC may never scan this pointer.

This is why you need to track pointer writes to the shared heap. The written value itself isn't important: it comes from either a new allocation that is live, or from somewhere else in the shared heap (so it will be scanned as we track writes).
 So any memory page susceptible of containing pointers to shared 
 memory would need to use COW during collection. Which means all 
 the thread's stacks, and also all objects with a pointer to 
 shared, immutable, and const data. At this point I think it is 
 fair to approximate this to almost all memory that could 
 contain pointers.
No, only the shared one, that is the beauty of the technique. Note that I'm not making this up myself; it is how GCs in the Caml family have worked for a while, and it has proven really efficient (in the Caml family, most data is either immutable or thread local, and the shared heap is typically small).
Oct 15 2013
parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-10-15 18:32:01 +0000, "deadalnix" <deadalnix gmail.com> said:

 No, that is the beauty of it :D
 
 Consider you have pointer from Tl -> shared -> immutable and TL -> immutable.
 
 I'm not covering TL collection here (It seem to be obvious that it 
 doesn't require to stop the world). So the starting point is that we 
 have the roots in all TL heaps/stacks, and we want to collect 
 shared/immutable without blocking the worlds.
 
 TL heap may get new pointers to the shared heap, but they can only come 
 from the shared heap itself or new allocations. At this point, you 
 consider every new allocations as live.
 
 Reading a pointer from the shared heap and copy it to the TL heap isn't 
 problematic in itself, but then we have a problem if this pointer is 
 now updated in the shared heap, as the GC may never scan this pointer.
 
 This is why you need to track pointer writes to the shared heap. The 
 write value itself isn't important : it come from either new alloc that 
 are live, or from somewhere else in the shared heap (so it will be 
 scanned as we track writes).
But you still have to scan the thread-local heaps and stacks to find pointers to shared memory. If you don't stop those threads, how do you know one of these threads isn't moving a pointer value during the scan from one place the GC still has to scan to another that the GC has just scanned, making it so the GC never sees the pointer? For instance:

    class A
    {
        // string is immutable and points to shared heap
        immutable(char)[] s;
    }

    A global;

    void func()
    {
        // GC scans the stack of this thread in the background
        // no reference to our string on the stack

        // moving string pointer to the stack
        // while the GC is running
        auto tmp = global.s;
        global.s = null;

        // GC scans the heap of this thread in the background
        // no reference to our string on the heap
        // (missed reference to our string now on the stack)
    }

The thread can move the pointer around while the GC is looking away and you'll end up with a pointer to a freed string. So you have to use COW for the thread's stack, or get notified of a pointer assignment somehow (or of a pointer move), or stop the thread while you're scanning its heap. -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Oct 15 2013
parent "deadalnix" <deadalnix gmail.com> writes:
Yes and no.

You obviously need to scan TL heaps at some point. When doing so
you'll have a set of roots that allows you to scan the shared/immutable
heap. What is going on in the TL heap becomes irrelevant once you
have the roots. And getting the roots from the TL heap is another
problem altogether (any kind of GC can be used for that: stop the
world, where the world is thread local, or another concurrent GC;
the important point being that it is an independent problem).
Oct 16 2013
prev sibling next sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
Am 14.10.2013 21:42, schrieb Michel Fortin:
 On 2013-10-14 17:57:18 +0000, Rainer Schuetze <r.sagitario gmx.de> said:

 On 13.10.2013 13:48, Michel Fortin wrote:
 For one of my projects I implemented a shared pointer like this. It uses
 the pointer value itself as a spin lock with the assumption that -1 is
 an invalid pointer value:

 1. read pointer value
 2. if read value is -1 go to line 1 (spin)
 3. compare and swap (previously read value <-> -1)
 4. if failure go to line 1 (spin)
 // now pointer is "locked", its value is -1 but we have a copy of the
 original
 5. copy pointer locally or assign to it (and update counter)
 6. write back pointer value atomically to replace the -1

 No mutex, but there's a spin lock so it's not good if there's
 contention.

 That said, I find it extremely rare to want a shared pointer that isn't
 already protected by a mutex alongside other variables, or that isn't
 propagated using some form of message passing.
Locking is very bad if you have threads at different priorities as it might introduce priority inversion. Spinning is probably even worse in that scenario.
Spinning is good only when you very rarely expect contention, which is the case for me. The above code is used once per object the first time someone requests a weak pointer for it. Having contention for that just doesn't make sense in most usage patterns. But still, being curious I added a log message any time it actually has to spin, and I have yet to see that message once in my logs (which probably means aggressive enough unit tests are missing).

If you have a lot of read accesses and rarely write to the pointer you could instead try a read-write mutex with concurrent read access. In any case, there's no solution that will be ideal in all cases. Different situations ask for different trade-offs.
 At work, I use shared pointers all the time to pass information to a
 real time audio thread. The scheme uses triple-buffering of pointers
 for a lock free safe transport from/to the real time thread.

 Not having to worry about these low-level locking stuff is one of the
 good aspects about garbage collecting.
Indeed. The current garbage collector makes it easy to have shared pointers to shared objects. But the GC can also interrupt real-time threads for an unpredictable duration, how do you cope with that in a real-time thread? I know ARC isn't the ideal solution for all use cases. But neither is the GC, especially for real-time applications. So, which one would you recommend for a project having a real-time audio thread?
Well, if real time concurrent GC for Java systems is good enough for systems that control military missile systems, maybe it is good enough for real-time audio as well. -- Paulo
Oct 14 2013
parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
Am 14.10.2013 22:59, schrieb Paulo Pinto:
 Well, if real time concurrent GC for Java systems is good enough for
 systems that control military missile systems, maybe it is good enough
 for real-time audio as well.

 --
 Paulo
The problem is not that there are no GCs around in other languages which satisfy certain requirements. The problem is actually implementing them in D. I suggest that you read "The Garbage Collection Handbook" which explains this in deep detail. I'm currently reading it, and I might write an article about the entire D GC issue once I'm done with it.
Oct 16 2013
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
On Oct 16, 2013, at 11:54 AM, Benjamin Thaut <code benjamin-thaut.de> wrote:

 The problem is not that there are no GCs around in other languages which satisfy certain requirements. The problem is actually implementing them in D. I suggest that you read "The Garbage Collection Handbook" which explains this in deep detail. I'm currently reading it, and I might write an article about the entire D GC issue once I'm done with it.

I think the short version is that D being able to directly call C code is a huge problem here. Incremental GCs all rely on the GC being notified when pointers are changed. We might be able to manage it for SafeD, but then SafeD would basically be its own language.
Oct 16 2013
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-10-16 21:05, Sean Kelly wrote:

 I think the short version is that D being able to directly call C code is a
huge problem here.  Incremental GCs all rely on the GC being notified when
pointers are changed.  We might be able to manage it for SafeD, but then SafeD
would basically be its own language.
One need to be very careful with the memory when interfacing with C code today. What about having a function that notifies the GC that a pointer has been updated? -- /Jacob Carlborg
Oct 16 2013
prev sibling parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
Am 16.10.2013 21:05, schrieb Sean Kelly:
 On Oct 16, 2013, at 11:54 AM, Benjamin Thaut <code benjamin-thaut.de> wrote:
 The problem is not that there are no GCs around in other languages which
satisfy certain requirements. The problem is actually implementing them in D. I
suggest that you read "The Garbage Collection Handbook" which explains this in
deep detail. I'm currently reading it, and I might write an article about the
entire D GC issue once I'm done with it.
I think the short version is that D being able to directly call C code is a huge problem here. Incremental GCs all rely on the GC being notified when pointers are changed. We might be able to manage it for SafeD, but then SafeD would basically be its own language.
I think an even bigger problem is structs. Because if you need write barriers for pointers on the heap, you are going to have a problem with structs: you will never know whether they are located on the heap or the stack. Additionally, making the stack precisely scannable and adding GC points will require a lot of compiler support. And even if this is doable with respect to DMD, it's going to be a big problem for GDC or LDC to change the codegen.
Oct 16 2013
parent reply Sean Kelly <sean invisibleduck.org> writes:
On Oct 16, 2013, at 1:05 PM, Benjamin Thaut <code benjamin-thaut.de> wrote:

 Am 16.10.2013 21:05, schrieb Sean Kelly:
 On Oct 16, 2013, at 11:54 AM, Benjamin Thaut <code benjamin-thaut.de> wrote:

 The problem is not that there are no GCs around in other languages which satisfy certain requirements. The problem is actually implementing them in D. I suggest that you read "The Garbage Collection Handbook" which explains this in deep detail. I'm currently reading it, and I might write an article about the entire D GC issue once I'm done with it.

 I think the short version is that D being able to directly call C code is a huge problem here. Incremental GCs all rely on the GC being notified when pointers are changed. We might be able to manage it for SafeD, but then SafeD would basically be its own language.

 I think an even bigger problem is structs. Because if you need write barriers for pointers on the heap, you are going to have a problem with structs: you will never know whether they are located on the heap or the stack. Additionally, making the stack precisely scannable and adding GC points will require a lot of compiler support. And even if this is doable with respect to DMD, it's going to be a big problem for GDC or LDC to change the codegen.

Yes, any pointer anywhere. I recall someone posting a doc about a compromise solution a few years back, but I'd have to do some digging to figure out what the approach was.
Oct 17 2013
parent reply "David Nadlinger" <code klickverbot.at> writes:
On Thursday, 17 October 2013 at 17:11:06 UTC, Sean Kelly wrote:
 And even if this is doable in respect to DMD its going to be a 
 big problem for GDC or LDC to change the codegen.
Yes, any pointer anywhere. I recall someone posting a doc about a compromise solution a few years back, but I'd have to do some digging to figure out what the approach was.
LLVM actually comes with a quite extensive GC support infrastructure: http://llvm.org/docs/GarbageCollection.html. As far as I'm aware, it is not widely used in terms of the "top-tier" LLVM projects, so there might be quite a bit of work involved in getting that to run.

David
Oct 17 2013
parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
Am 17.10.2013 19:16, schrieb David Nadlinger:
 On Thursday, 17 October 2013 at 17:11:06 UTC, Sean Kelly wrote:
 And even if this is doable in respect to DMD its going to be a big
 problem for GDC or LDC to change the codegen.
Yes, any pointer anywhere. I recall someone posting a doc about a compromise solution a few years back, but I'd have to do some digging to figure out what the approach was.
LLVM actually comes with a quite expensive GC support infrastructure: http://llvm.org/docs/GarbageCollection.html. As far as I'm aware, it is not widely used in terms of the "top-tier" LLVM projects, so there might be quite a bit of work involved in getting that to run. David
Uhhh, this sounds really good. They in fact have everything to implement a generational garbage collector. This would improve the D GC situation a lot.

But reading the part about the shadow stack really lowers my expectations. That's really something you don't want. The performance impact is going to be so big that it doesn't make sense to use the better GC in the first place. Kind Regards Benjamin Thaut
Oct 17 2013
parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Thursday, 17 October 2013 at 17:28:17 UTC, Benjamin Thaut 
wrote:
 Am 17.10.2013 19:16, schrieb David Nadlinger:

 But reading the part about the shadow stack really lowers my 
 expectations. That's really something you don't want. The 
 performance impact is going to be so big that it doesn't 
 make sense to use the better GC in the first place.
There's always a tradeoff. If an app is very delay-sensitive, then making the app run slower in general might be worthwhile if the delay could be eliminated.
Oct 17 2013
parent Benjamin Thaut <code benjamin-thaut.de> writes:
Am 17.10.2013 19:47, schrieb Sean Kelly:
 On Thursday, 17 October 2013 at 17:28:17 UTC, Benjamin Thaut wrote:
 Am 17.10.2013 19:16, schrieb David Nadlinger:

 But reading the part about the shadow stack really lowers my
 expectations. That's really something you don't want. The performance
 impact is going to be so big that it doesn't make sense to use the
 better GC in the first place.
There's always a tradeoff. If an app is very delay-sensitive, then making the app run slower in general might be worthwhile if the delay could be eliminated.
Well, I just read it again, and it appears to me that the shadow stack is something they already have implemented and that can be used for "gc prototyping", but if you want you can write your own code generator plugin and generate your own stack maps.

It actually sounds more feasible to implement a generational GC with LLVM than with what we have in dmd. Kind Regards Benjamin Thaut
Oct 17 2013
prev sibling parent Paulo Pinto <pjmlp progtools.org> writes:
On 16.10.2013 20:54, Benjamin Thaut wrote:
 On 14.10.2013 22:59, Paulo Pinto wrote:
 Well, if real time concurrent GC for Java systems is good enough for
 systems that control military missile systems, maybe it is good enough
 for real-time audio as well.

 --
 Paulo
The problem is not that there are no GCs around in other languages which satisfy certain requirements. The problem is actually implementing them in D. I suggest that you read "The Garbage Collection Handbook" which explains this in great detail. I'm currently reading it, and I might write an article about the entire D GC issue once I'm done with it.
I read it when it came out in the late 90s. My main focus areas at university were compiler design, distributed systems and graphics programming. Although, like many, I tend to do plain boring enterprise applications nowadays. -- Paulo
Oct 16 2013
prev sibling parent Rainer Schuetze <r.sagitario gmx.de> writes:
On 14.10.2013 21:42, Michel Fortin wrote:
 Indeed. The current garbage collector makes it easy to have shared
 pointers to shared objects. But the GC can also interrupt real-time
 threads for an unpredictable duration; how do you cope with that in a
 real-time thread?
The work I was talking about uses C++, not D, so there is no GC involved. The options I see for real-time threads in D are either a concurrent GC (which means read/write barriers for pointer accesses) or just excluding the real time thread from suspension by the GC. This forces the programmer to ensure that references in the real time thread are also found elsewhere. I'm not sure if this eliminates the benefits regarding locking, though.
 I know ARC isn't the ideal solution for all use cases. But neither is
 the GC, especially for real-time applications. So, which one would you
 recommend for a project having a real-time audio thread?
ARC doesn't work for real time threads anyway, because you are not allowed to deallocate if it can cause locks. It can only work if you defer reference counting into another thread through some buffering. Realistically I would currently recommend the approach above: exclude the thread from suspension, and keep references to used objects elsewhere. This is probably about as difficult as avoiding allocations/deallocations in C++, but harder to debug.
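
To make the "defer reference counting into another thread through some buffering" option concrete, here is a rough D sketch under stated assumptions: the real-time thread never frees anything, it only records releases that a low-priority thread later processes. The names and the mutex are mine; a production version would use a preallocated wait-free ring buffer so the real-time path neither locks nor allocates.

    import core.sync.mutex;

    __gshared Object[] deferred;   // releases recorded but not yet processed
    __gshared Mutex    deferredLock;

    shared static this() { deferredLock = new Mutex; }

    // Real-time thread: record the release instead of performing it.
    // (Appending may allocate; a real implementation would preallocate.)
    void deferRelease(Object o)
    {
        deferredLock.lock();
        scope(exit) deferredLock.unlock();
        deferred ~= o;
    }

    // Reclamation thread: take the batch and do the actual count decrements
    // and destruction outside the real-time thread.
    void drainDeferred()
    {
        Object[] batch;
        {
            deferredLock.lock();
            scope(exit) deferredLock.unlock();
            batch = deferred;
            deferred = null;
        }
        foreach (o; batch)
        {
            // decrement o's count here; destroy it when the count hits zero
        }
    }
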
Oct 15 2013
prev sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Saturday, 12 October 2013 at 06:16:24 UTC, Rainer Schuetze 
wrote:
 in pseudo-assembly missing null-checks:

 Thread1 (R = P)        Thread2 (S = R)

                        mov ecx,[R]
                        ; thread suspended
You need a sequentially consistent write. You also need to increment the refcount BEFORE! This codegen is incorrect.
 mov eax,[P]
 inc [eax].refcnt
Same here.
 mov ebx,[R]
 mov [R],eax
 dec [ebx].refcnt      ; refcnt of O now 0
 jnz done
 call delete_ebx
                        ; thread resumed
                        inc [ecx].refcnt
 done:

 The increment on [ecx].refcnt modifies garbage.
This can be done atomically (even with an eventually consistent increment; you don't need full sequential consistency here, though you still need a fully sequentially consistent decrement).
Oct 12 2013
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 12.10.2013 20:31, deadalnix wrote:
 On Saturday, 12 October 2013 at 06:16:24 UTC, Rainer Schuetze wrote:
 in pseudo-assembly missing null-checks:

 Thread1 (R = P)        Thread2 (S = R)

                        mov ecx,[R]
                        ; thread suspended
You need a sequentially consistent write. You also need to increment the refcount BEFORE! This codegen is incorrect.
How do you increment the counter without reading its address?
 mov eax,[P]
 inc [eax].refcnt
Same here.
Same here ;-)
 mov ebx,[R]
 mov [R],eax
 dec [ebx].refcnt      ; refcnt of O now 0
 jnz done
 call delete_ebx
                        ; thread resumed
                        inc [ecx].refcnt
 done:

 The increment on [ecx].refcnt modifies garbage.
This can be done atomically (even with an eventually consistent increment; you don't need full sequential consistency here, though you still need a fully sequentially consistent decrement).
According to the "Handbook of Garbage Collection" by Richard Jones eager lock-free reference counting can only be done with a cas2 operation modifying two seperate locations atomically (algorithm 18.2 "Eager reference counting with CompareAndSwap is broken"). This might be the quoted paper: http://scholr.ly/paper/2199608/lock-free-reference-counting Unfortunately the CAS2 operation does not exist in most processors.
Oct 13 2013
next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Sunday, 13 October 2013 at 07:03:39 UTC, Rainer Schuetze wrote:
 How do you increment the counter without reading its address?
I assumed that the reference count was in a struct with the data, and refcounted pointers point to it. In this case, if you remove the pointer via a sequentially consistent write (while keeping a local copy internally) and THEN decrement the counter, the other thread will access another object (or skip on a null check). Granted the read is sequentially consistent.
Oct 13 2013
parent Artur Skawina <art.08.09 gmail.com> writes:
On 10/13/13 11:19, deadalnix wrote:
 On Sunday, 13 October 2013 at 07:03:39 UTC, Rainer Schuetze wrote:
 How do you increment the counter without reading its address?
I assumed that the reference count was in a struct with the data, and refcounted pointers point to it. In this case, if you remove the pointer via a sequentially consistent write (while keeping a local copy internally) and THEN decrement the counter, the other thread will access another object (or skip on a null check). Granted the read is sequentially consistent.
No, if you have two (or more) threads concurrently accessing the object, it is possible that one thread reads the pointer, then sleeps before incrementing the count. Then another thread comes and /destroys/ the object, the memory is reused for something else, or even unmapped. Then the first thread wakes up and increments the counter, which is no longer there, causing a crash or data corruption. But this is only a problem for shared objects, which are accessed without any locking -- it's not a common case at all, and can be dealt with by simply taking a lock *before* reading the reference. (there are many much more complex solutions such as CAS2 or RCU based ones). artur
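
A small D sketch of the "take a lock *before* reading the reference" approach described here, assuming AddRef/Release as in the proposal. The lock belongs to the pointer slot, so no thread can release the object between reading the pointer and bumping its count. All names are illustrative.

    import core.atomic;
    import core.sync.mutex;

    class RC
    {
        shared int refcnt = 1;
        RC AddRef() { atomicOp!"+="(refcnt, 1); return this; }
        void Release() { if (atomicOp!"-="(refcnt, 1) == 0) { /* destroy */ } }
    }

    struct SharedSlot
    {
        private RC    value;
        private Mutex lock;   // protects the slot, not the object it points to

        this(RC v) { value = v; lock = new Mutex; }

        // Read and retain atomically with respect to writers of this slot.
        RC read()
        {
            lock.lock();
            scope(exit) lock.unlock();
            return value is null ? null : value.AddRef();
        }

        // Replace the stored reference; the old one is released under the lock.
        void write(RC newVal)
        {
            lock.lock();
            scope(exit) lock.unlock();
            auto old = value;
            value = newVal is null ? null : newVal.AddRef();
            if (old !is null) old.Release();
        }
    }
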
Oct 13 2013
prev sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Sunday, 13 October 2013 at 07:03:39 UTC, Rainer Schuetze wrote:
 According to the "Handbook of Garbage Collection" by Richard 
 Jones eager lock-free reference counting can only be done with 
 a cas2 operation modifying two separate locations atomically 
 (algorithm 18.2 "Eager reference counting with CompareAndSwap 
 is broken"). This might be the quoted paper: 
 http://scholr.ly/paper/2199608/lock-free-reference-counting

 Unfortunately the CAS2 operation does not exist in most 
 processors.
I suppose it's worth noting that Boost (and now standard C++) has a shared_ptr that works across threads and the implementation I've seen doesn't use a mutex. In fact, I think the Boost one doesn't even use CAS on x86, though it's been quite a few years so my memory could be wrong on that last detail.
Oct 13 2013
parent reply Sönke Ludwig <sludwig outerproduct.org> writes:
On 13.10.2013 17:15, Sean Kelly wrote:
 On Sunday, 13 October 2013 at 07:03:39 UTC, Rainer Schuetze wrote:
 According to the "Handbook of Garbage Collection" by Richard Jones
 eager lock-free reference counting can only be done with a cas2
 operation modifying two separate locations atomically (algorithm 18.2
 "Eager reference counting with CompareAndSwap is broken"). This might
 be the quoted paper:
 http://scholr.ly/paper/2199608/lock-free-reference-counting

 Unfortunately the CAS2 operation does not exist in most processors.
I suppose it's worth noting that Boost (and now standard C++) has a shared_ptr that works across threads and the implementation I've seen doesn't use a mutex. In fact, I think the Boost one doesn't even use CAS on x86, though it's been quite a few years so my memory could be wrong on that last detail.
I didn't read the paper, but I'd suspect that the paper refers to the case where both the reference count _and_ the reference are thread-safe, since the boost/c++ shared_ptr only has a thread-safe reference count after all.
Oct 13 2013
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 13.10.2013 19:05, Sönke Ludwig wrote:
 On 13.10.2013 17:15, Sean Kelly wrote:
 On Sunday, 13 October 2013 at 07:03:39 UTC, Rainer Schuetze wrote:
 According to the "Handbook of Garbage Collection" by Richard Jones
 eager lock-free reference counting can only be done with a cas2
 operation modifying two separate locations atomically (algorithm 18.2
 "Eager reference counting with CompareAndSwap is broken"). This might
 be the quoted paper:
 http://scholr.ly/paper/2199608/lock-free-reference-counting

 Unfortunately the CAS2 operation does not exist in most processors.
I suppose it's worth noting that Boost (and now standard C++) has a shared_ptr that works across threads and the implementation I've seen doesn't use a mutex. In fact, I think the Boost one doesn't even use CAS on x86, though it's been quite a few years so my memory could be wrong on that last detail.
I didn't read the paper, but I'd suspect that the paper refers to the case where both the reference count _and_ the reference are thread-safe, since the boost/c++ shared_ptr only has a thread-safe reference count after all.
I haven't read it either, but AFAICT the cas2 operation is used to modify the pointer and the reference count at the same time atomically. I just checked boost::shared_ptr, it uses cas operations on the reference counts. It has the same problem as described in my example, see the read/write example 3 here: http://www.boost.org/doc/libs/1_54_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety boost::shared_ptr is also unsafe with respect to calling member functions through "->" as it doesn't increment the reference count.
Oct 14 2013
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Monday, 14 October 2013 at 17:43:33 UTC, Rainer Schuetze wrote:
 On 13.10.2013 19:05, Sönke Ludwig wrote:
 On 13.10.2013 17:15, Sean Kelly wrote:
 On Sunday, 13 October 2013 at 07:03:39 UTC, Rainer Schuetze 
 wrote:
 According to the "Handbook of Garbage Collection" by Richard 
 Jones
 eager lock-free reference counting can only be done with a 
 cas2
 operation modifying two separate locations atomically 
 (algorithm 18.2
 "Eager reference counting with CompareAndSwap is broken"). 
 This might
 be the quoted paper:
 http://scholr.ly/paper/2199608/lock-free-reference-counting

 Unfortunately the CAS2 operation does not exist in most 
 processors.
I suppose it's worth noting that Boost (and now standard C++) has a shared_ptr that works across threads and the implementation I've seen doesn't use a mutex. In fact, I think the Boost one doesn't even use CAS on x86, though it's been quite a few years so my memory could be wrong on that last detail.
I didn't read the paper, but I'd suspect that the paper refers to the case where both the reference count _and_ the reference are thread-safe, since the boost/c++ shared_ptr only has a thread-safe reference count after all.
I haven't read it either, but AFAICT the cas2 operation is used to modify the pointer and the reference count at the same time atomically. I just checked boost::shared_ptr, it uses cas operations on the reference counts. It has the same problem as described in my example, see the read/write example 3 here: http://www.boost.org/doc/libs/1_54_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety boost::shared_ptr is also unsafe with respect to calling member functions through "->" as it doesn't increment the reference count.
I'm totally out of my depth here but can't you store the reference count adjacent to the pointer and use CMPXCHG16B
Oct 14 2013
next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Monday, 14 October 2013 at 17:50:01 UTC, John Colvin wrote:
 I'm totally out of my depth here but can't you store the 
 reference count adjacent to the pointer and use CMPXCHG16B
I think that can work if the refcount is stored with the pointer (ie. if a fat pointer is used) and not in the object. But that would defeat the whole point of the refcount :( It seems that Rainer Schuetze is right and that the implementation that boost::shared_ptr uses is not safe.
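
For illustration, here is what the CMPXCHG16B idea amounts to: pointer and count live side by side and are compared-and-swapped together. The double-width CAS is emulated with a lock here so the sketch runs anywhere; a real implementation would use the hardware instruction, and that part is an assumption about the target. As noted above, this only helps when the count travels with the pointer (a fat pointer), not when it lives in the object.

    import core.sync.mutex;

    struct CountedRef
    {
        Object obj;    // the pointer
        size_t count;  // its reference count, stored adjacent to it
    }

    __gshared Mutex cas2Lock;
    shared static this() { cas2Lock = new Mutex; }

    // Stand-in for a hardware double-width CAS (CMPXCHG16B on x86_64):
    // compares and swaps pointer and count as one unit.
    bool cas2(ref CountedRef here, CountedRef ifThis, CountedRef writeThis)
    {
        cas2Lock.lock();
        scope(exit) cas2Lock.unlock();
        if (here.obj is ifThis.obj && here.count == ifThis.count)
        {
            here = writeThis;
            return true;
        }
        return false;
    }

    // Retain whatever the slot currently holds, without the read/increment gap.
    bool tryRetain(ref CountedRef slot)
    {
        for (;;)
        {
            auto old = slot;                 // racy snapshot; cas2 validates it
            if (old.obj is null) return false;
            auto upd = CountedRef(old.obj, old.count + 1);
            if (cas2(slot, old, upd)) return true;
        }
    }
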
Oct 14 2013
prev sibling parent Rainer Schuetze <r.sagitario gmx.de> writes:
On 14.10.2013 19:50, John Colvin wrote:
 On Monday, 14 October 2013 at 17:43:33 UTC, Rainer Schuetze wrote:
 On 13.10.2013 19:05, Sönke Ludwig wrote:
 On 13.10.2013 17:15, Sean Kelly wrote:
 On Sunday, 13 October 2013 at 07:03:39 UTC, Rainer Schuetze wrote:
 According to the "Handbook of Garbage Collection" by Richard Jones
 eager lock-free reference counting can only be done with a cas2
 operation modifying two separate locations atomically (algorithm 18.2
 "Eager reference counting with CompareAndSwap is broken"). This might
 be the quoted paper:
 http://scholr.ly/paper/2199608/lock-free-reference-counting

 Unfortunately the CAS2 operation does not exist in most processors.
I suppose it's worth noting that Boost (and now standard C++) has a shared_ptr that works across threads and the implementation I've seen doesn't use a mutex. In fact, I think the Boost one doesn't even use CAS on x86, though it's been quite a few years so my memory could be wrong on that last detail.
I didn't read the paper, but I'd suspect that the paper refers to the case where both the reference count _and_ the reference are thread-safe, since the boost/c++ shared_ptr only has a thread-safe reference count after all.
I haven't read it either, but AFAICT the cas2 operation is used to modify the pointer and the reference count at the same time atomically. I just checked boost::shared_ptr, it uses cas operations on the reference counts. It has the same problem as described in my example, see the read/write example 3 here: http://www.boost.org/doc/libs/1_54_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety boost::shared_ptr is also unsafe with respect to calling member functions through "->" as it doesn't increment the reference count.
I'm totally out of my depth here but can't you store the reference count adjacent to the pointer and use CMPXCHG16B
This might work for a single pointer, but not if you have multiple pointers to the same object.
Oct 14 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:

On 26.06.2013 13:12, Michel Fortin wrote:
 On 26 June 2013 at 5:38, Walter Bright wrote:

 On 6/26/2013 12:19 AM, Rainer Schuetze wrote:
 As Michel also said, the reference count does not have to be in
 inside the object itself, so we might want to allow reference
 counting on other types aswell.
That opens the question of what is the point of other RC types? For example, C++ can throw any type - but it turns out that throwing anything but class types is largely pointless.
RC is just another garbage collection scheme. You might favor it for its performance characteristics, its determinism, or the lower memory footprint. Or you might need it to interact with foreign code that relies on it (COM, Objective-C, etc.), in which case it needs to be customizable (use the foreign implementation) or be manually managed. That's two different use cases. And in the latter case you can't use the GC to release cycles because foreign code is using memory invisible to the GC. It is important to note that when foreign code calls AddRef you don't want the GC to collect that object, at least not until Release is called.
That means you have to maintain two reference counts if you have both ARC and COM used in the same class. ARC can only release the object if both counters are 0, while the COM implementation has to add/remove roots to the GC when counting from/to 0 to prevent the object from being collected while external references exist.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:

On 2013-06-26 at 17:42, Rainer Schuetze <r.sagitario gmx.de> wrote:

 On 26.06.2013 13:12, Michel Fortin wrote:
 On 26 June 2013 at 5:38, Walter Bright wrote:

 On 6/26/2013 12:19 AM, Rainer Schuetze wrote:
 As Michel also said, the reference count does not have to be in
 inside the object itself, so we might want to allow reference
 counting on other types aswell.
That opens the question of what is the point of other RC types? For example, C++ can throw any type - but it turns out that throwing anything but class types is largely pointless.
RC is just another garbage collection scheme. You might favor it for its performance characteristics, its determinism, or the lower memory footprint. Or you might need it to interact with foreign code that relies on it (COM, Objective-C, etc.), in which case it needs to be customizable (use the foreign implementation) or be manually managed. That's two different use cases. And in the latter case you can't use the GC to release cycles because foreign code is using memory invisible to the GC. It is important to note that when foreign code calls AddRef you don't want the GC to collect that object, at least not until Release is called.
That means you have to maintain two reference counts if you have both ARC and
COM used in the same class. ARC can only release the object if both counters are 0, while the COM implementation has to add/remove roots to the GC when counting from/to 0 to prevent the object from being collected while external references exist. Yes. The "external" reference count prevents the gc from releasing memory and the "internal" reference count keeps track of the number of references from gc-scanned memory. When both fall to zero, the memory is released immediately; when the external count falls to zero, the gc can collect memory if it isn't connected to any of its roots. Note that you could implement the external count as a separate hash table that'll include a gc-scanned pointer to the object. That pointer would keep the internal count above zero as long as it is present in the table. The external count doesn't need to be formally part of the gc data structure.
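
A D sketch of the "separate hash table with a gc-scanned pointer" variant mentioned here: the table's keys are ordinary GC-visible references, so any object with a non-zero external count stays reachable. Thread safety and the object's own internal count are omitted; the names are mine.

    // Keys are object references, so the GC sees them and keeps the objects alive.
    __gshared size_t[Object] externalCount;

    size_t externalAddRef(Object o)
    {
        if (auto p = o in externalCount)
            return ++(*p);
        externalCount[o] = 1;   // 0 -> 1: object becomes pinned via the table
        return 1;
    }

    size_t externalRelease(Object o)
    {
        auto p = o in externalCount;
        if (p is null) return 0;          // was not externally retained
        auto n = --(*p);
        if (n == 0)
            externalCount.remove(o);      // 1 -> 0: unpin; the GC may now collect it
        return n;
    }
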
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/26/2013 4:15 AM, Michel Fortin wrote:
 Well, it's mostly required to write runtime support functions. The attribute 
could be more obscure so people are less tempted to use it, but if you're going to implement the ref-counting code you'll need that. I know the temptation is strong to create more attributes as an easy solution, but I'd really like to try hard to find other ways.
 D would need manual, RC and GC to coexist peacefully.
 The problem is how to make the three of those use the same codegen?

 - Druntime could have a flag to disable/enable refcounting. It'd make the 
retain/release functions no-ops, but it'd not prevent the GC from reclaiming memory as it does today.
 - Druntime could have a flag to disable/enable garbage collection (it already 
has). That'd prevent cycles from being collected, but you could use weak pointers to work around that or request a collection manually at the appropriate time.
 - A  noarc (or similar) attribute at the function level could be used to 
prevent the compiler from generating function calls on pointer assignments. You could make a whole module noarc if you want by adding " noarc:" at the top.
 Here's the annoying thing:  noarc is totally safe if reference counting is 
disabled and we rely entirely on the GC. noarc is unsafe when reference counting is enabled. I don't really understand your point. My proposal enables manual, RC and GC to coexist peacefully. The GC wouldn't even know about RC.
 The downside is that every assignment to a pointer anywhere has to call a 
function. While this is some overhead, it is more predictable than overhead from a GC scan and would be preferred in some situations (games I guess). Another downside is that if an object is retained only by being present on the stack frame of a C function, it'd have to be explicitly retained from elsewhere.
 Doesn't this make it impractical to mix vanilla C with D code? An important 
feature of D is this capability, without worrying about a "JNI" style interface.
 It's not very different than with the GC today.

 If you call a C function by giving it a ref-counted pointer argument, that 
memory block is guarantied to live at least for that call's lifetime (because it is retained by the caller). So simple calls to C functions are not a problem.
 If the C function puts that pointer elsewhere you'll need to retain it some 
other way, but you have to do this with the GC too. If you're implementing a callback called from C you need to care about what you return because the caller's C code won't retain it, while with the GC you could manage if C code did not store that pointer outside of the stack.
 I think that's all you have to worry about.
D (like C and C++) loves to manipulate pointers. Having to call a function every time this is done would be a disaster. It means that people would be motivated to drop down to C to do the fast code, and we might as well throw in the towel.
 As for D switching to a full refcounted GC for everything, I'm very hesitant 
for such a step. For one thing, reading the clang spec on all the various pointer and function annotations necessary is very off-putting.
 Don't let Clang intimidate you. The Clang spec is about four to five times 
more complicated than needed because of autoreleased objects and because it supports weak pointers. Weak pointers can be implemented as a struct templates (as long as we have noarc). And all those annotations are for special cases, when you need to break the rules. You don't use them when doing normal programming, well except for __weak.

Ok.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/26/2013 2:33 PM, Rainer Schuetze wrote:
 On 26.06.2013 11:38, Walter Bright wrote:
 On 6/26/2013 12:19 AM, Rainer Schuetze wrote:
 I imagine a few (constrained) templated functions for the different
 operations defined in the library could also do the job, though it
 might drown compilation speed. Also getting help from the optimizer to
 remove redundant calls will need some back doors.
I don't see how this can be done without specific compiler knowledge in a memory safe way.
I currently don't see how it can be memory safe with this proposal.
I'm a little confused about what you are referring to here.
 3. Assignment to a class reference causes a call to AddRef() on the new
 value
 followed by a call to Release() on its original value.
It might be common knowledge, but I want to point out that the usual COM implementation (atomic increment/decrement and free when refcount goes down to 0) is not thread-safe for shared pointers. That means you either have to guard all reads and writes with a lock to make the full assignment atomic or have to implement reference counting very differently (e.g. deferred reference counting).
Since the implementation of AddRef()/Release() is up to the user, whether it uses locks or not and whether it supports shared or not is up to the user.
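
For reference, a D sketch of the "usual COM implementation" under discussion: atomic increment/decrement and free at zero. This is the baseline whose safety for shared *pointers* (as opposed to a shared count) is being questioned in this subthread; the class and field names are illustrative.

    import core.atomic;

    class RCObject
    {
        private shared int refcnt = 1;

        RCObject AddRef()
        {
            atomicOp!"+="(refcnt, 1);
            return this;
        }

        void Release()
        {
            if (atomicOp!"-="(refcnt, 1) == 0)
            {
                // free here; only safe if no other thread can still be between
                // "read the pointer" and "AddRef" -- the race discussed in this thread
            }
        }
    }
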
You have to put the lock around the pair of AddRef and Release, but if the
compiler already splits this into two function calls, this cannot be done in the implementation. Why is it necessary to put a lock around the pair?
 12. AddRef() is not called when passed as the implicit 'this' reference.
Isn't this unsafe if a member function is called through the last existing reference and this reference is then cleared during execution of this member function or from another thread?
No. The caller of the function still retains a reference in that thread.
Hmmm, I guess I misunderstand the proposal. Assume for example a refcounted
class R and this code
 class R : RefCounted
 {
     int _x;
     int readx() { return _x; }
 }
 int main()
 {
     R r = new R;
     return r.readx();
 }

 According to 12. there is no refcounting going on when calling or executing 
readx. Ok, now what happens here:
 class R : RefCounted
 {
     int _x;
     int readx(C c)
     {
         c.r = null; // "standard" rc deletes r here
         return _x;  // reads garbage
     }
 }
 class C
 {
     R r;
 }
 int main()
 {
     C c = new C;
     c.r = new R;
     return c.r.readx(c);
 }

 This reads garbage or crashes if there is no reference counting going on when 
calling readx. I think you're right. Grrrr!
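
To make the failure and the fix concrete, here is a hand-written D version of the guard the compiler would have to emit around such a call (my sketch, not the proposal's wording): the caller retains the receiver for the duration of the call, so clearing c.r inside readx can no longer free it.

    class R
    {
        int _x;
        int refs = 1;
        R AddRef() { ++refs; return this; }
        void Release() { if (--refs == 0) { /* destroy/free here */ } }

        int readx(C c)
        {
            auto old = c.r;
            c.r = null;                    // models "standard rc deletes r here"
            if (old !is null) old.Release();
            return _x;
        }
    }

    class C { R r; }

    int callReadx(C c)
    {
        auto tmp = c.r;              // take our own reference to the receiver
        tmp.AddRef();                // retain it across the call
        scope(exit) tmp.Release();   // matching release when the statement ends
        return tmp.readx(c);         // refs goes 2 -> 1 inside, never reaches 0
    }
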
 13. Taking the address of, or passing by reference, any fields of an RC
 object
 is not allowed in  safe code. Passing by reference an RC field is
 allowed.
Please note that this includes slices to fixed size arrays.
As I suggested, arrays would not be supported with this proposal - but the user can create ref counted array-like objects.
Just to clarify, I meant taking a slice of a static array that is a field of
a refcounted class. Is it forbidden to have a field like this in a refcounted class or is taking the address through slicing forbidden? It would be forbidden to obtain a slice of a ref counted object in this way - or even to simply refer to a static array embedded in a ref counted object (in safe code).
 I feel I'm hijacking this proposal, but the step to library defined
 read/write barriers seems pretty small. Make AddRef, Release and
 assignment free template functions, e.g.

 void ptrConstruct(T,bool stackOrHeap)(T*adr, T p);
 void ptrAssign(T,bool stackOrHeap)(T*adr, T p);
 void ptrRelease(T,bool stackOrHeap)(T*adr);

 and we are able to experiment with all kinds of sophisticated GC
 algorithms including RC. Eliding redundant addref/release pairs would
 need some extra support though, I read that LLVM does something like
 this, but I don't know how.
It's pretty invasive into the code generation and performance, and could completely disrupt the C compatibility of D.
I don't see a big difference between a free function and a member function
call, though the template character of it might hurt compilation performance.
 Two more notes:

 - I'm not sure it is mentioned, but I think you have to describe what happens 
when copying a struct. pre- and post-blit actions have to be taken if the struct contains pointers to refcounted objects. Yes. It's analogous to copying a struct that has fields which contain constructors and destructors.
 10. Function returns have an AddRef() already done to the return value.
- A refcounted reference returned from a function (including new) would have
to be Released if the return value is ignored or if only used as part of an expression.

That's right. Just as if a function returned a struct with a destructor.

The model I am basing this on is C++'s shared_ptr<>, which makes use of all the 
various rules around construction, assignment, and destruction. The wrinkles we 
have are:

1. memory safety
2. working with the GC
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:

On 26 June 2013 at 19:48, Walter Bright wrote:

 D (like C and C++) loves to manipulate pointers. Having to call a function 
every time this is done would be a disaster. It means that people would be motivated to drop down to C to do the fast code, and we might as well throw in the towel. But at the same time, having a GC that stops the world at irregular intervals is worse for certain kinds of things, games notably. It's been stated often on the forum that game developers prefer increasing the overhead if it prevents those hiccups. Making everything in D reference-counted would undoubtedly increase the general overhead, but it'd allow game developers to leverage the whole language and its libraries instead of restricting themselves to a custom subset of the class hierarchy derived from a reference counted class. And about pointer manipulation: for cases where you know for certain that the pointer still points to the same memory block before and after the assignment (when you call popFront on an array for instance), you have no reference count to update and can elide the call. The downside of optional support for language-wide reference counting is that it requires two incompatible codegen (or rather one incompatible with RC). We could have only one if it's the one that calls the retain/release functions on pointer assignment, with those functions replaced with empty stubs in druntime when reference counting is disabled, but some overhead would remain for the function call. I'm not claiming this is the right solution. It's just an idea I wanted to mention as an aside because it has some common points. It is however a mostly separate matter from your initial goal of supporting custom reference counting schemes for some object hierarchies. I decided to write about it mostly because you talked about reimplementing arrays using classes and that got me thinking. But perhaps I shouldn't have mentioned it because it seems to be side-tracking the discussion.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/26/2013 6:33 PM, Michel Fortin wrote:
 On 26 June 2013 at 19:48, Walter Bright wrote:

 D (like C and C++) loves to manipulate pointers. Having to call a function 
every time this is done would be a disaster. It means that people would be motivated to drop down to C to do the fast code, and we might as well throw in the towel.
 But at the same time, having a GC that stops the world at irregular intervals 
is worse for certain kinds of things, games notably. It's been stated often on the forum that game developers prefer increasing the overhead if it prevents those hiccups. Making everything in D reference-counted would undoubtedly increase the general overhead, but it'd allow game developers to leverage the whole language and its libraries instead of restricting themselves to a custom subset of the class hierarchy derived from a reference counted class.
 And about pointer manipulation: for cases where you know for certain that the 
pointer still points to the same memory block before and after the assignment (when you call popFront on an array for instance), you have no reference count to update and can elide the call. I've never seen a scheme for "knows for certain" that did not involve extensive and intrusive pointer annotations, something we very much want to avoid. Pointer annotations work great in theory, but in practice no successful language I know of uses them (we'll see if Rust will become an exception).
 The downside of optional support for language-wide reference counting is that 
it requires two incompatible codegen (or rather one incompatible with RC). We could have only one if it's the one that calls the retain/release functions on pointer assignment, with those functions replaced with empty stubs in druntime when reference counting is disabled, but some overhead would remain for the function call.
 I'm not claiming this is the right solution. It's just an idea I wanted to 
mention as an aside because it has some common points. It is however a mostly separate matter from your initial goal of supporting custom reference counting schemes for some object hierarchies. I decided to write about it mostly because you talked about reimplementing arrays using classes and that got me thinking. But perhaps I shouldn't have mentioned it because it seems to be side-tracking the discussion.

This proposal is modeled after C++'s shared_ptr<T> in that it should have 
equivalent performance and capabilities. Since it has been well accepted into 
the C++ "best practices", I think we're on solid ground with it.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:

On 27.06.2013 02:11, Walter Bright wrote:
 On 6/26/2013 2:33 PM, Rainer Schuetze wrote:
 On 26.06.2013 11:38, Walter Bright wrote:
 On 6/26/2013 12:19 AM, Rainer Schuetze wrote:
 I imagine a few (constrained) templated functions for the
 different operations defined in the library could also do the
 job, though it might drown compilation speed. Also getting help
 from the optimizer to remove redundant calls will need some
 back doors.
I don't see how this can be done without specific compiler knowledge in a memory safe way.
I currently don't see how it can be memory safe with this proposal.
I'm a little confused about what you are referring to here.
When preparing for dconf I read the "Garbage Collection Handbook" by Richard Jones, and it very much supported my suspicion that the usual reference counting cannot be both memory safe and high-performance.
 You have to put the lock around the pair of AddRef and Release, but
 if the compiler already splits this into two function calls, this
 cannot be done in the implementation.
Why is it necessary to put a lock around the pair?
To be more accurate, it is the assignment and the Release that have to be atomic, in addition to a concurrent read with AddRef. Imagine the reading thread is suspended while just having read the pointer, but not incremented the reference count yet. If an assignment with release and deletion is performed before the thread resumes, AddRef is called on garbage. IIRC you also have the GC handbook on your shelf. Check the chapters on RC, especially algorithm 18.2 "Eager reference counting with CompareAndSwap is broken".
 Just to clarify, I meant taking a slice of a static array that is a
  field of a refcounted class. Is it forbidden to have a field like
 this in a refcounted class or is taking the address through slicing
 forbidden?
It would be forbidden to obtain a slice of a ref counted object in this way - or even to simply refer to a static array embedded in a ref counted object (in safe code).
Ok.
 Two more notes:

 - I'm not sure it is mentioned, but I think you have to describe
 what happens when copying a struct. pre- and post-blit actions have
 to be taken if the struct contains pointers to refcounted objects.
Yes. It's analogous to copying a struct that has fields which contain constructors and destructors.
Ok, I tend to forget about the swapping to a temporary when assigning structs.
 The model I am basing this on is C++'s shared_ptr<>, which makes use
 of all the various rules around construction, assignment, and
 destruction. The wrinkles we have are:

 1. memory safety
This is the hard part.
 2. working with the GC
I don't think that the GC is getting in the way as long as it is mark-and-sweep. A fully reference counting GC is a different story and can be made concurrent, but it is usually not eager and needs to defer actual reference counting to avoid locks. Instead it logs accesses to some thread local buffer which needs to be processed eventually.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:


 Why is it necessary to put a lock around the pair?
 To be more accurate, it is the assignment and the Release that have to
 be atomic, in addition to a concurrent read with AddRef. Imagine the
 reading thread is suspended while just having read the pointer, but not
 incremented the reference count yet. If an assignment with release and
 deletion is performed before the thread resumes, AddRef is called on
 garbage.
On my way to work today, I figured that a safe but slow implementation can work, if the interface is not AddRef/Release, but

  class C
  {
      C readThis();
      void writeThis(ref C c);
  }

where the function can include the necessary locks, e.g.

  class C
  {
      int refcnt;

      C readThis()
      {
          synchronized(this)
          {
              refcnt++;
              return this;
          }
      }
      void writeThis(ref C c)
      {
          synchronized(c)
          {
              C x = c;
              c = this;
              if (--c.refcnt == 0)
                  delete c;
          }
      }
  }

Reading/Writing null (e.g. when constructing or destructing a reference) would have to be special cased; that would not be necessary with free functions.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:


On 27 June 2013 at 8:03, "Rainer Schuetze" <r.sagitario gmx.de> wrote:

 On my way to work today, I figured that a safe but slow implementation can 
work, if the interface is not AddRef/Release, but
 class C
 {
  C readThis();
  void writeThis(ref C c);
 }

 where the function can include the necessary locks, e.g.

 class C
 {
  int refcnt;

  C readThis()
  {
    synchronized(this)
    {
      refcnt++;
      return this;
    }
  }
  void writeThis(ref C c)
  {
    synchronized(c)
    {
       C x = c;
       c = this;
       if (--c.refcnt == 0)
         delete c;
    }
  }
 }
There's an error in this code. You must synchronize on the lock protecting the pointer, not on the lock at the other end of the pointer's value. Also, you only need to do this if the pointer pointing to the object is shared. If the pointer is thread-local, assignment does not need to be atomic. And if the object itself is thread-local, not even the reference counter needs to be atomic.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/26/2013 11:28 PM, Rainer Schuetze wrote:
 On 27.06.2013 02:11, Walter Bright wrote:
 On 6/26/2013 2:33 PM, Rainer Schuetze wrote:
 On 26.06.2013 11:38, Walter Bright wrote:
 On 6/26/2013 12:19 AM, Rainer Schuetze wrote:
 I imagine a few (constrained) templated functions for the
 different operations defined in the library could also do the
 job, though it might drown compilation speed. Also getting help
 from the optimizer to remove redundant calls will need some
 back doors.
I don't see how this can be done without specific compiler knowledge in a memory safe way.
I currently don't see how it can be memory safe with this proposal.
I'm a little confused about what you are referring to here.
When preparing for dconf I read the "Garbage Collection Handbook" by Richard Jones, and it very much supported my suspicion that the usual reference counting cannot be both memory safe and high-performance.
I think that with the rules in the proposal we have, we can support it.
 You have to put the lock around the pair of AddRef and Release, but
 if the compiler already splits this into two function calls, this
 cannot be done in the implementation.
Why is it necessary to put a lock around the pair?
To be more accurate, it is the assignment and the Release that have to be atomic, in addition to a concurrent read with AddRef. Imagine the reading thread is suspended while just having read the pointer, but not incremented the reference count yet. If an assignment with release and
deletion is performed before the thread resumes, AddRef is called on garbage. I see. I'll have to think about that some more.
 IIRC you also have the GC handbook on your shelf. Check the
 chapters on RC, especially algorithm 18.2 "Eager reference counting with
 CompareAndSwap is broken".
I have the book, but it is the first edition and there's no chapter 18 in it :-(
 Just to clarify, I meant taking a slice of a static array that is a
  field of a refcounted class. Is it forbidden to have a field like
 this in a refcounted class or is taking the address through slicing
 forbidden?
It would be forbidden to obtain a slice of a ref counted object in this way - or even to simply refer to a static array embedded in a ref counted object (in safe code).
Ok.
 Two more notes:

 - I'm not sure it is mentioned, but I think you have to describe
 what happens when copying a struct. pre- and post-blit actions have
 to be taken if the struct contains pointers to refcounted objects.
Yes. It's analogous to copying a struct that has fields which contain constructors and destructors.
Ok, I tend to forget about the swapping to a temporary when assigning structs.
 The model I am basing this on is C++'s shared_ptr<>, which makes use
 of all the various rules around construction, assignment, and
 destruction. The wrinkles we have are:

 1. memory safety
This is the hard part.
 2. working with the GC
I don't think that the GC is getting in the way as long as it is mark-and-sweep. A fully reference counting GC is a different story and can be
made concurrent, but it is usually not eager and needs to defer actual reference counting to avoid locks. Instead it logs accesses to some thread local buffer which needs to be processed eventually.

I don't think we should do a fully ref counted GC anyway.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:

On 27.06.2013 15:50, Michel Fortin wrote:
 On 27 June 2013 at 8:03, "Rainer Schuetze" <r.sagitario gmx.de> wrote:

 class C
 {
   C readThis();
   void writeThis(ref C c);
 }

 where the function can include the necessary locks, e.g.

 class C
 {
   int refcnt;

   C readThis()
   {
     synchronized(this)
     {
       refcnt++;
       return this;
     }
   }
   void writeThis(ref C c)
   {
     synchronized(c)
     {
        C x = c;
        c = this;
        if (--c.refcnt == 0)
          delete c;
     }
   }
 }
There's an error in this code. You must synchronize on the lock protecting the pointer, not on the lock at the other end of the pointer's value.
You're right (I was about to run to a meeting when writing this). Then, readThis will also need a reference to the pointer. Another more obvious bug is that it should read if (--x.refcnt == 0) delete x;
 Also, you only need to do this if the pointer pointing to the object
 is shared. If the pointer is thread-local, assignment does not need
 to be atomic. And if the object itself is thread-local, not even the
 reference counter need to be atomic.
True, these issues only happen with shared pointers. But remember that fields in shared objects are also shared. I also have a hard time imagining how the code above works with reading pointers that live in registers or writing to "references to registers". These are never shared, so they could have simpler implementations.
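
Pulling the corrections from this exchange together, here is one way the sketch could look with the fixes applied: the lock guards the pointer slot rather than the pointee, readThis operates on that slot, and the decrement is done on the old value (the "--x.refcnt" fix). This consolidation is mine, not code from the thread, and this variant retains the new value and releases the old rather than transferring the caller's count; register-resident references still need separate treatment, as noted.

    import core.sync.mutex;

    class C
    {
        int refcnt = 1;
        // payload ...
    }

    struct Slot   // a shared pointer location plus the lock that protects it
    {
        private C     value;
        private Mutex lock;

        this(C v) { value = v; lock = new Mutex; }

        C readThis()
        {
            lock.lock();
            scope(exit) lock.unlock();
            if (value !is null) value.refcnt++;
            return value;
        }

        void writeThis(C newVal)
        {
            lock.lock();
            scope(exit) lock.unlock();
            C x = value;
            value = newVal;
            if (newVal !is null) newVal.refcnt++;   // retain new, release old
            if (x !is null && --x.refcnt == 0)
            {
                // destroy/free x here
            }
        }
    }
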
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:

On 27 June 2013 at 13:04, Walter Bright wrote:

 I don't think we should do a fully ref counted GC anyway.
Speaking of the GC, you should probably rethink this point:
 14. RC objects will still be allocated on the GC heap - this means that a
normal
 GC run will reap RC objects that are in a cycle, and RC objects will get 
automatically
 scanned for heap references with no additional action required by the user.
If you allocate the object from the GC heap, the GC will collect it regardless of its reference count. That's fine as long as all the retaining pointers are visible to the GC. But if you're defining a COM object, likely that's because you'll pass a pointer to an external API, and this API might store the pointer somewhere not scanned by the GC. This API will call AddRef to make sure the object is retained, but if the GC doesn't see that pointer on its heap it'll deallocate and next time external code uses the object everything goes boom! So that doesn't work. If instead you allocate the object outside of the GC heap and your object contains pointers to the GC heap, you'll need to add roots to the GC for any pointer variable in the object. (This is what DMD/Objective-C currently does.) There's no way to detect cycles with that scheme, but it is simple. We could use a hybrid scheme with two reference counts: one for internal references that the GC can see and one for external references that the GC cannot see. The GC cannot collect an object if the external reference count is non-zero. If the external count is zero, it can collect the object if the internal reference count reaches zero or if it becomes unreachable from any root. This allows detection of cycles, as long as this cycle is only made of internal references. Care must be taken about incrementing/decrementing the right reference count depending on the context, which sounds tricky. Or we could use a somewhat less hybrid scheme where we have one reference count and the only thing it does is prevent objects from being deallocated. This can be implemented as one global hash table and you put all objects that have a non-zero reference count in that table. This hash table being scanned by the GC anything in it will never be collected. This will also detect internal cycles like the previous two-counter scheme, but it doesn't allow immediate deallocation as it waits for the GC to deallocate. (This is similar to how it worked in my defunct D/Objective-C bridge that did not rely on tweaking the compiler.)
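
A D sketch of the two-count hybrid described here, using core.memory's addRoot/removeRoot to pin the object while foreign code holds it. The field and function names are mine, and the bookkeeping is deliberately simplified (no thread safety, no cycle handling).

    import core.memory : GC;

    class Hybrid
    {
        int internalRefs;  // references from GC-scanned memory
        int externalRefs;  // references held by code the GC cannot see (e.g. COM)

        void addExternalRef()
        {
            if (externalRefs++ == 0)
                GC.addRoot(cast(void*) this);    // 0 -> 1: GC must not collect us
        }

        void releaseExternal()
        {
            if (--externalRefs == 0)
            {
                GC.removeRoot(cast(void*) this); // 1 -> 0: GC may collect again
                if (internalRefs == 0)
                {
                    // both counts are zero: free eagerly, or leave it to the GC
                }
            }
        }
    }
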
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/27/2013 11:38 AM, Michel Fortin wrote:
 On 27 June 2013 at 13:04, Walter Bright wrote:

 I don't think we should do a fully ref counted GC anyway.
Speaking of the GC, you should probably rethink this point:
 14. RC objects will still be allocated on the GC heap - this means that a
normal
 GC run will reap RC objects that are in a cycle, and RC objects will get 
automatically
 scanned for heap references with no additional action required by the user.
If you allocate the object from the GC heap, the GC will collect it
regardless of its reference count. That's fine as long as all the retaining pointers are visible to the GC. But if you're defining a COM object, likely that's because you'll pass a pointer to an external API, and this API might store the pointer somewhere not scanned by the GC. This API will call AddRef to make sure the object is retained, but if the GC doesn't see that pointer on its heap it'll deallocate and next time external code uses the object everything goes boom! So that doesn't work. We already require that if you're going to pass a pointer to any GC allocated data to external code, that you retain a pointer. I see no additional issue with requiring this for COM objects created on the GC heap.
 If instead you allocate the object outside of the GC heap and your object 
contains pointers to the GC heap, you'll need to add roots to the GC for any pointer variable in the object. (This is what DMD/Objective-C currently does.) There's no way to detect cycles with that scheme, but it is simple. Yes, but that's a lot harder (and more error-prone) than simply requiring the programmer to retain a pointer as I outlined above.
 We could use a hybrid scheme with two reference counts: one for internal 
references that the GC can see and one for external references that the GC cannot see. The GC cannot collect an object if the external reference count is non-zero. If the external count is zero, it can collect the object if the internal reference count reaches zero or if it becomes unreachable from any root. This allows detection of cycles, as long as this cycle is only made of internal references. Care must be taken about incrementing/decrementing the right reference count depending on the context, which sounds tricky. That also seems far more complex than what I proposed.
 Or we could use a somewhat less hybrid scheme where we have one reference 
count and the only thing it does is prevent objects from being deallocated. This can be implemented as one global hash table and you put all objects that have a non-zero reference count in that table. This hash table being scanned by the GC anything in it will never be collected. This will also detect internal cycles like the previous two-counter scheme, but it doesn't allow immediate deallocation as it waits for the GC to deallocate. (This is similar to how it worked in my defunct D/Objective-C bridge that did not rely on tweaking the compiler.)

I'd really like to stick to the shared_ptr<T> model. (A global hash table also 
is not so simple when factoring in loading and unloading DLLs.) Of course, for 
the O-C bridge, you can implement it as required to be compatible with O-C.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:

On 27 June 2013 at 15:32, Walter Bright wrote:

 On 6/27/2013 11:38 AM, Michel Fortin wrote:
 14. RC objects will still be allocated on the GC heap - this means that a 
normal
 GC run will reap RC objects that are in a cycle, and RC objects will get 
automatically
 scanned for heap references with no additional action required by the user.
If you allocate the object from the GC heap, the GC will collect it
regardless of its reference count. That's fine as long as all the retaining pointers are visible to the GC. But if you're defining a COM object, likely that's because you'll pass a pointer to an external API, and this API might store the pointer somewhere not scanned by the GC. This API will call AddRef to make sure the object is retained, but if the GC doesn't see that pointer on its heap it'll deallocate and next time external code uses the object everything goes boom! So that doesn't work.
 We already require that if you're going to pass a pointer to any GC allocated 
data to external code, that you retain a pointer. I see no additional issue with requiring this for COM objects created on the GC heap. Perhaps it's just me, but I'd say if you need to anticipate the duration for which you need to keep the object alive when you pass it to some external code it completely defeats the purpose of said external code calling AddRef and Release. With the scheme you propose, reference counting would be useful inside D code as a way to deallocate some classes of objects early without waiting a GC scan. The GC can collect cycles for those objects. People passing COM objects to external code however should allocate those objects outside of the GC if they intend to pass the object to external code. They should also add member pointers as GC roots. Also, no cycle detection for those objects. If done right it could be made memory safe, but cycles will leak. Maybe that could work.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/27/2013 1:15 PM, Michel Fortin wrote:
 On 27 June 2013 at 15:32, Walter Bright wrote:

 On 6/27/2013 11:38 AM, Michel Fortin wrote:
 14. RC objects will still be allocated on the GC heap - this means that a 
normal
 GC run will reap RC objects that are in a cycle, and RC objects will get 
automatically
 scanned for heap references with no additional action required by the user.
If you allocate the object from the GC heap, the GC will collect it
regardless of its reference count. That's fine as long as all the retaining pointers are visible to the GC. But if you're defining a COM object, likely that's because you'll pass a pointer to an external API, and this API might store the pointer somewhere not scanned by the GC. This API will call AddRef to make sure the object is retained, but if the GC doesn't see that pointer on its heap it'll deallocate and next time external code uses the object everything goes boom! So that doesn't work.
 We already require that if you're going to pass a pointer to any GC 
allocated data to external code, that you retain a pointer. I see no additional issue with requiring this for COM objects created on the GC heap.
 Perhaps it's just me, but I'd say if you need to anticipate the duration for 
which you need to keep the object alive when you pass it to some external code it completely defeats the purpose of said external code calling AddRef and Release.
 With the scheme you propose, reference counting would be useful inside D code 
as a way to deallocate some classes of objects early without waiting a GC scan. The GC can collect cycles for those objects.
 People passing COM objects to external code however should allocate those 
objects outside of the GC if they intend to pass the object to external code. They should also add member pointers as GC roots. Also, no cycle detection for those objects. If done right it could be made memory safe, but cycles will leak.
 Maybe that could work.
Nothing about the proposal acts to prevent one from constructing COM objects any way they wish, including using malloc/free and managing it all themselves. All COM objects require is an implementation of the COM interface, which says nothing at all beyond having a pointer to an AddRef() and Release(). If you are building a COM object that is to be fired and forgotten into the void of unknown external code, I don't think there's any automated replacement for thinking carefully about it and constructing it accordingly. D's memory safety guarantees cannot, of course, cover unknown and unknowable external code.

What I'm trying to accomplish with this proposal is:

1. A way to do ref-counted memory allocation for specific objects
2. Do it in a guaranteed memory safe manner (at least for the user of those objects)
3. Do it in a way that does not interfere with people who want to use the GC or do manual memory management
4. Not impose penalties on non-refcounted code
5. Do it in a way that offers a similar performance and overhead profile to C++'s shared_ptr<T>
6. Do it in a way that makes it usable to construct COM objects, and work with NSObject's
7. Not require pointer annotations
8. Address the most common "why I can't use D" complaint

What I'm not trying to accomplish is:

1. Replacing all memory allocation in D with ref counting
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:

On 27 June 2013 at 16:56, Walter Bright wrote:

 What I'm trying to accomplish with this proposal is:

 1. A way to do ref-counted memory allocation for specific objects
 2. Do it in a guaranteed memory safe manner (at least for the user of those 
objects)
 3. Do it in a way that does not interfere with people who want to use the GC 
or do manual memory management
 4. Not impose penalties on non-refcounted code
 5. Do it in a way that offers a similar performance and overhead profile to 
C++'s shared_ptr<T>
 6. Do it in a way that makes it usable to construct COM objects, and work 
with NSObject's
 7. Not require pointer annotations
 8. Address the most common "why I can't use D" complaint

 What I'm not trying to accomplish is:

 1. Replacing all memory allocation in D with ref counting
That list is great for limiting the scope of your DIP. Make sure you include it in the DIP. So if we return to the core of it, here are the problems that still need solving:

1. Depending on the reference counting scheme implemented, it might be more efficient to have a single operation for an assignment (retain a/release b). I think that should be allowed.
2. If the pointer variable is shared, assignment must be atomic (done under a lock, and it must always be the same lock for a given pointer, obviously).
3. If the pointer variable is shared, reading its value must be done atomically with a retain too.

Here's a suggestion for problem number 1 above:

	class MyObject
	{
		// user-implemented
		static void opRetain(MyObject var);  // must accept null
		static void opRelease(MyObject var); // must accept null

		// optional (showing default implementation below)
		// this can be made faster for some implementations of ref-counting
		// only call it for an assignment, not for constructing/destructing the pointer
		// (notably for Objective-C)
		static void opPtrAssign(ref MyObject var, MyObject newVal) {
			opRetain(newVal);
			opRelease(var);
			var = newVal;
		}
	}

This maps 1 on 1 to the underlying functions for Objective-C ARC. I don't have a solution for the shared case. We do in fact have a tail-shared problem here. If I have a shared(MyObject), the pointer is shared along with the object. When the pointer itself is shared, we need a lock to access it reliably and that can only be provided by the outer context. If we had a way to express tail-shared, then we could repeat the above three functions for tail-shared object pointers and it'd work reliably for that.
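
As a usage note, here is the lowering I read into that suggestion, with MyObject stubbed out just so the snippet stands alone:

    class MyObject
    {
        static void opRetain(MyObject var)  {}  // stubs; real ones as above
        static void opRelease(MyObject var) {}
        static void opPtrAssign(ref MyObject var, MyObject newVal)
        {
            opRetain(newVal);
            opRelease(var);
            var = newVal;
        }
    }

    void example(ref MyObject a, MyObject b)
    {
        // written:  a = b;
        // emitted:  MyObject.opPtrAssign(a, b);
        MyObject.opPtrAssign(a, b);
    }
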
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin Wrote:

On 27 June 2013 at 18:35, Michel Fortin wrote:

 	class MyObject
 	{
 		// user-implemented
 		static void opRetain(MyObject var);  // must accept null
 		static void opRelease(MyObject var); // must accept null

 		// optional (showing default implementation below)
 		// this can be made faster with for some implementations of ref-counting
 		// only call it for an assignment, not for constructing/destructing the
pointer
 		// (notably for Objective-C)
 		static void opPtrAssign(ref MyObject var, MyObject newVal) {
 			opRetain(newVal);
 			opRelease(var);
 			var = newVal;
 		}
 	}

 This maps 1 on 1 to the underlying functions for Objective-C ARC.
Actually, I made a small error in opRetain. To match Objective-C ARC it should 
return the retained object:

    static MyObject opRetain(MyObject var); // must accept null

and the default implementation for opPtrAssign would then become:

    static void opPtrAssign(ref MyObject var, MyObject newVal) {
        newVal = opRetain(newVal);
        opRelease(var);
        var = newVal;
    }

One reason is that Objective-C blocks (the equivalent of delegate literals) are 
stack allocated. If you call retain on a block, it'll make a copy on the heap and 
return that copy.

Another reason for opRetain to return the object is to enable tail-call 
optimization for cases like this one:

    NSObject x;

    NSObject getX()
    {
        return x; // D ARC should insert an implicit opRetain here
    }

Of course it doesn't necessarily need to work that way, but it'd certainly make 
it easier to integrate with Objective-C if it worked that way.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:

On 25.06.2013 23:00, Walter Bright wrote:
 4. Null checks are done before calling any AddRef() or Release().
Here is another nitpick that needs to be addressed: as in the implementation of 
ComObject, invariants (and out contracts) must not be called when returning from 
Release if it is ok to actually delete the object.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:

On 28.06.2013 00:35, Michel Fortin wrote:
 So if we return to the core of it, here's the problems that still
 need solving:

 1. Depending on the reference counting scheme implemented, it might
 be more efficient to have a single operation for an assignment
 (retain a/release b) operation. I think that should be allowed.
 2. If the pointer variable is shared assignment must be atomic (done
 under a lock, and it must always be the same lock for a given
 pointer, obviously).
 3. If the pointer variable is shared, reading its value must be done
 atomically with a retain too.
I just had an idea, maybe it is obvious and just distracts, but I thought it 
might be worth sharing:

Instead of defining methods on the class type, we could also redefine the 
reference type. The compiler detects a type declaration "reference_type" in the 
class declaration and replaces all references to that class with that type.

    class C
    {
        alias shared_ptr!C reference_type;
    }

    C c = new C;

is lowered to

    shared_ptr!C c = new C;

"new C" returns a shared_ptr!C as well.

It is then up to the implementation of shared_ptr to define what member functions 
to call for reference counting and to deal with proper shared semantics in 
assignments. It can also define whether opCall should increment the reference 
count or not. For most of the needed functionality, struct semantics work 
out-of-the-box.

Three immediate gotchas:

- In a class hierarchy, you would want to define the reference_type in the base 
class only, so maybe it has to be a template. I'm not sure implicit casting to 
the base class reference type and to interfaces can be implemented.

- The implementation of the shared_ptr template will have to be able to deal with 
the "raw" reference, so that might need some type modifier/annotation. I think 
this might also be true for the addRef/release version, if the implementation is 
not just working on the refcount, but is also calling other functions.

- To elide redundant reference counting, the compiler will need annotations here, 
too. Move semantics of structs might reduce the number of reference count 
operations already, though.
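A minimal sketch of a hand-written wrapper in the spirit of this reference_type 
idea (RefPtr and Widget are illustrative names, not part of D or druntime; a real 
lowering would be done by the compiler rather than written out like this):

    struct RefPtr(C)
    {
        private C obj;

        this(C o) { obj = o; retain(); }
        this(this) { retain(); }                  // copying the wrapper adds a reference
        ~this() { release(); }

        // copy-and-swap assignment: rhs arrives retained by the postblit and
        // its destructor releases whatever we swap into it
        void opAssign(RefPtr rhs)
        {
            auto tmp = obj;
            obj = rhs.obj;
            rhs.obj = tmp;
        }

        inout(C) get() inout { return obj; }      // "raw" access; the unsafe part

        private void retain()  { if (obj !is null) obj.AddRef(); }
        private void release() { if (obj !is null) obj.Release(); }
    }

    class Widget
    {
        int refs;                                 // starts at 0; the wrapper retains it
        Widget AddRef()  { ++refs; return this; }
        Widget Release() { --refs; return this; } // a real implementation frees at zero
    }

    void main()
    {
        auto p = RefPtr!Widget(new Widget);       // refs == 1
        {
            auto q = p;                           // postblit: refs == 2
            assert(p.get.refs == 2);
        }                                         // q destroyed: refs == 1
        assert(p.get.refs == 1);
    }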
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:

On 28.06.2013 09:07, Walter Bright wrote:
 Will add to proposal.

 On 6/27/2013 11:27 PM, Rainer Schuetze wrote:
 On 25.06.2013 23:00, Walter Bright wrote:
 4. Null checks are done before calling any AddRef() or Release().
Here is another nitpick that needs to be addressed: As mentioned in the implementation of ComObject invariants (and out contracts) must not be called when returning from Release, if it is ok to actually delete the object.
Sorry to produce these drop by drop, but while writing the last mail, I noticed 
another issue to think about:

What happens if the class also implements interfaces? A reference of the 
interface type must do reference counting as well. So the interface must also 
define AddRef and Release. This is currently true for COM interfaces derived from 
IUnknown, but not for other interfaces.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:

On 28 June 2013 at 4:55, Rainer Schuetze <r.sagitario gmx.de> wrote:

 What happens if the class also implements interfaces? A reference of the 
 interface type must do reference counting as well. So the interface must also 
 define AddRef and Release. This is currently true for COM-interfaces derived 
 from IUnknown, but not for other interfaces.

I would assume that if an object of a reference-counted class cannot be cast to 
its base non-reference-counted class, it'd be the same for casting to 
non-reference-counted interfaces.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:

On 28.06.2013 13:37, Michel Fortin wrote:
 On 28 June 2013 at 4:55, Rainer Schuetze <r.sagitario gmx.de> wrote:

 What happens if the class also implements interfaces? A reference
 of the interface type must do reference counting as well. So the
 interface must also define AddRef and Release. This is currently
 true for COM-interfaces derived from IUnknown, but not for other
 interfaces.
I would assume if an object of a reference-counted class cannot be cast to its base non-reference-counted class that it'd be the same for casting to non-reference-counted interfaces.
Yes, that is probably the way to go. But that makes using protocols like COM 
difficult in safe mode.

We have already talked Walter into not linking reference counting to AddRef and 
Release, but if it is implemented with other methods, these cannot be added to 
the already existing COM interfaces. Passing or getting interface pointers 
to/from external code being unsafe sounds ok, but passing around these interface 
references in D code would be unsafe as well.

Adding aliases or non-virtual wrappers to the interface declaration to forward 
reference counting to AddRef/Release might help, but could also introduce 
ambiguities in a class that derives both from a reference-counted base class and 
an interface like this.
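A minimal sketch of the non-virtual forwarding idea (IMyUnknown and MyComObject 
are stand-ins for illustration, and opAddRef/opRelease are the hypothetical hook 
names discussed in this thread, not existing D or the real IUnknown):

    interface IMyUnknown
    {
        uint AddRef();
        uint Release();

        // final interface methods are non-virtual, so they add no vtable slots;
        // they just forward the hypothetical hook names to the COM slots
        final uint opAddRef()  { return AddRef(); }
        final uint opRelease() { return Release(); }
    }

    class MyComObject : IMyUnknown
    {
        private uint refs = 1;

        uint AddRef()  { return ++refs; }
        uint Release() { return --refs; }  // a real COM object frees itself at zero
    }

    void main()
    {
        IMyUnknown i = new MyComObject;
        assert(i.opAddRef() == 2);
        assert(i.opRelease() == 1);
    }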
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:

On 25 June 2013 at 17:00, Walter Bright wrote:

 If a class contains the following methods, in either itself or a base class, 
it is
 an RC class:


    T AddRef();
    T Release();

 An RC class is like a regular D class with these additional semantics:

 1. In  safe code, casting (implicit or explicit) to a base class that does not
 have both AddRef() and Release() is an error.
I'm just realizing that this means safe code cannot call any member function of 
the non-reference-counted base class. This would require an implicit conversion 
of "this" to the base class.

 system code could, but I'd be extremely uneasy doing such calls unless I am the 
one in charge of that code and can make sure the base function will never store 
the (unretained) pointer somewhere it shouldn't, now and in the future. A misstep 
here and you get memory corruption. Seriously, I don't think  system code should 
allow implicit conversions to the base class; it should be explicit.

I am starting to doubt there is any value in inheriting the base ref-counted 
class from another class.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/28/2013 1:47 AM, Rainer Schuetze wrote:
 On 28.06.2013 00:35, Michel Fortin wrote:
 So if we return to the core of it, here's the problems that still
 need solving:

 1. Depending on the reference counting scheme implemented, it might
 be more efficient to have a single operation for an assignment
 (retain a/release b) operation. I think that should be allowed.
 2. If the pointer variable is shared assignment must be atomic (done
 under a lock, and it must always be the same lock for a given
 pointer, obviously).
 3. If the pointer variable is shared, reading its value must be done
 atomically with a retain too.
I just had an idea, maybe it is obvious and just distracts, but I thought it
might be worth sharing:
 Instead of defining methods on the class type, we could also redefine the 
reference type. The compiler detects a type declaration "reference_type" in the class declaration and replaces all references to that class with that type.
 class C
 {
     alias shared_ptr!C reference_type;
 }

 C c = new C;

 is lowered to

 shared_ptr!C c = new C;

 "new C" returns a shared_ptr!C aswell.

 It is then up to the implementation of shared_ptr to define what member 
functions to call for reference counting and to deal with proper shared semantics in assignments. It can also define whether opCall should increment the reference count or not. For most of the needed functionality, struct semantics work out-of-the-box.
 2 immediate gotchas

 - In a class hierarchy, you would want to define the reference_type in the 
base class only, so maybe it has to be a template. I'm not sure implicite casting to base class reference type and interfaces can be implemented.
 - the implementation of the shared_ptr template will have to be able to deal 
with the "raw" reference, so that might need some type modifier/annotation. I think this might also be true for the addRef/release version, if the implementation is not just working on the refcount, but is also calling other functions.
 - To elide redundant reference counting, the compiler will need annotations 
here, too. Move semantics of structs might reduce the number of reference count operations already, though.

The main problem with this is the decay of a shared_ptr!C to a C. Once that 
happens, all the memory safety goes out the window.
Oct 09 2013
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/28/2013 7:14 AM, Michel Fortin wrote:
 On 25 June 2013 at 17:00, Walter Bright wrote:

 If a class contains the following methods, in either itself or a base class, 
it is
 an RC class:


     T AddRef();
     T Release();

 An RC class is like a regular D class with these additional semantics:

 1. In  safe code, casting (implicit or explicit) to a base class that does not
 have both AddRef() and Release() is an error.
 I'm just realizing that this means safe code cannot call any member function
 of the non-reference-counted base class. This would require an implicit 
 conversion of "this" to the base class.

That's right.
  system code could, but it'd be extremely uneasy doing such calls unless I am 
 the one in charge of that code and can make sure the base function will never 
 store the (unretained) pointer somewhere it shouldn't now and in the future. A 
 misstep here and you get memory corruption. Seriously, I don't think  system 
 code should allow implicit conversions to the base class, it should be explicit.

It's a worthy point.
 I am starting to doubt there is any value in inheriting the base 
ref-counted-class from another class.

Oct 09 2013
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Thursday, 10 October 2013 at 02:19:02 UTC, Walter Bright wrote:
  system code could, but it'd be extremely uneasy doing such
calls unless I am the one in charge of that code and can make sure the base function will never store the (unretained) pointer somewhere it shouldn't now and in the future. An misstep here and you get memory corruption. Seriously, I don't think system code should allow implicit conversions to the base class, it should be explicit. It's a worthy point.
It means OOP is completely broken with that design.
Oct 09 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/9/2013 7:21 PM, deadalnix wrote:
 It means OOP is completely broken with that design.
I know. The thread kind of petered out as we began to realize the obstacles with this approach. But it's important to have the conversation in the record so we don't have to go through it again.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:

On 28.06.2013 21:50, Walter Bright wrote:
 The main problem with this is the decay of a shared_ptr!C to a C. Once
 that happens, all the memory safety goes out the window.
By "decay", do mean the lowering or something else? There is no stray C reference in user code, it always gets lowered to shared_ptr!C. Only trusted code in shared_ptr will have to deal with "raw" references. It is shared_ptr's responsibilty to maintain memory safety, just the same as for AddRef and Release.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/28/2013 1:11 PM, Rainer Schuetze wrote:
 On 28.06.2013 21:50, Walter Bright wrote:
 The main problem with this is the decay of a shared_ptr!C to a C. Once
 that happens, all the memory safety goes out the window.
By "decay", do mean the lowering or something else? There is no stray C reference in user code, it always gets lowered to
shared_ptr!C. Only trusted code in shared_ptr will have to deal with "raw" references. It is shared_ptr's responsibilty to maintain memory safety, just the same as for AddRef and Release.

"Decay" means it is converted to type C in order to call functions that take C 
as the 'this' pointer or C as a parameter. The problem is both type C and type 
shared_ptr!C will exist.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:

On 28.06.2013 22:29, Walter Bright wrote:
 On 6/28/2013 1:11 PM, Rainer Schuetze wrote:
 On 28.06.2013 21:50, Walter Bright wrote:
 The main problem with this is the decay of a shared_ptr!C to a C. Once
 that happens, all the memory safety goes out the window.
By "decay", do mean the lowering or something else? There is no stray C reference in user code, it always gets lowered to shared_ptr!C. Only trusted code in shared_ptr will have to deal with "raw" references. It is shared_ptr's responsibilty to maintain memory safety, just the same as for AddRef and Release.
"Decay" means it is converted to type C in order to call functions that take C as the 'this' pointer or C as a parameter. The problem is both type C and type shared_ptr!C will exist.
Any parameter of type C is also lowered to shared_ptr!C.

Calling a member function would go through opDot, which could also do reference 
counting for safety. Treating every explicit or implicit usage of "this" as a 
temporary shared_ptr!C might be overkill, so it could be restricted to assigning 
"this" to another reference (this includes passing it as an argument to another 
function or returning it from a function). My current ad-hoc rule: if "this" is 
not followed by a '.', it has to be lowered to construct shared_ptr!C(this).

Assuming the reference count is updated by shared_ptr!C.opDot, there will always 
be a thread-local reference while inside a member function (it must have been 
called through an external reference at least once). Other member functions of 
the same object can always be called without ref-counting, assuming that the 
object never gets destroyed through changing other references.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/28/2013 2:03 PM, Rainer Schuetze wrote:
 On 28.06.2013 22:29, Walter Bright wrote:
 On 6/28/2013 1:11 PM, Rainer Schuetze wrote:
 On 28.06.2013 21:50, Walter Bright wrote:
 The main problem with this is the decay of a shared_ptr!C to a C. Once
 that happens, all the memory safety goes out the window.
By "decay", do mean the lowering or something else? There is no stray C reference in user code, it always gets lowered to shared_ptr!C. Only trusted code in shared_ptr will have to deal with "raw" references. It is shared_ptr's responsibilty to maintain memory safety, just the same as for AddRef and Release.
"Decay" means it is converted to type C in order to call functions that take C as the 'this' pointer or C as a parameter. The problem is both type C and type shared_ptr!C will exist.
Any parameter of type C is also lowered to shared_ptr!C.
I don't see how lowering C to shared_ptr!C and lowering shared_ptr!C to C can work?
 Calling a member function would go through opDot, which could also do 
reference counting for safety. Treating every explicite or implicite usage of "this" as a temporary shared_ptr!C might be overkill, so it could be restricted to assigning "this" to another reference (this includes passing it as an argument to another function or returning it from a function). My current adhoc rule: if "this" is not followed by a '.', it has to be lowered to construct shared_ptr!C(this).
 Assuming the reference count is updated by shared_ptr!C.opDot, there will 
always be a thread local reference while inside a member function (it must have been called through an external reference at least once). Other member functions of the same object can always be called without ref-counting assuming that the object never gets destroyed through changing other references.

Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:

On 28 June 2013 at 17:03, Rainer Schuetze <r.sagitario gmx.de> wrote:

 Any parameter of type C is also lowered to shared_ptr!C.
    class C {}

People still constantly forget that C used as a type represents a *reference* to 
an object of class C, not the object itself. If you replace type C with 
shared_ptr!C, you must then replace it with shared_ptr!(shared_ptr!C) and so on; 
there's no end to it.

Also, I strongly doubt the compiler will be able to elide redundant calls to 
retain/release made within shared_ptr!C while still respecting normal struct 
semantics.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/28/2013 6:42 PM, Michel Fortin wrote:
 On 28 June 2013 at 17:03, Rainer Schuetze <r.sagitario gmx.de> wrote:

 Any parameter of type C is also lowered to shared_ptr!C.
class C {} People still constantly forget that C used as a type represents a *reference*
to an object of class C, not the object itself. If you replace type C with shared_ptr!C, you must then replace it with shared_ptr!(shared_ptr!C) and so on; there's no end to it.
 Also, I strongly doubt the compiler will be able to elide redundant calls to 
retain/release made within shared_ptr!C while still respecting normal struct semantics.

Using some sort of shared_ptr!T was the original idea, but I could not figure out 
a reasonable way to make it memory safe without the compiler knowing about it. 
The easiest way to have the compiler know about it is to make it some sort of 
class type, not a struct type.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/27/2013 11:33 AM, Rainer Schuetze wrote:
 On 27.06.2013 19:04, Walter Bright wrote:
 IIRC you also have the GC handbook book on your shelf. Check the
 chapters on RC, especially algorithm 18.2 "Eager reference counting with
 CompareAndSwap is broken".
I have the book, but it is the first edition and there's no chapter 18 in it :-(
I can remove the dust from my scanner to copy the 3 mostly relevant pages and
send them to you.

I understand the issue (I think), but I can't think of a case where the ref 
count would be 1 when this happens.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/28/2013 1:55 AM, Rainer Schuetze wrote:
 What happens if the class also implements interfaces? A reference of the 
interface type must do reference counting as well. So the interface must also define AddRef and Release. This is currently true for COM-interfaces derived from IUnknown, but not for other interfaces.

Even implementing IUnknown is a problem, if we adopt the suggestion that opAddRef 
and opRelease be used as wrappers around AddRef and Release.

I think the simplest thing is to not allow ref counted classes to implement 
interfaces other than ones derived from IUnknown.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:

On 29.06.2013 00:22, Walter Bright wrote:
 Any parameter of type C is also lowered to shared_ptr!C.
I don't see how lowering C to shared_ptr!C and lowering share_ptr!C to C can work?
I don't see why you would want to lower from shared_ptr!C to C. It's only inside 
shared_ptr where access to the non-lowered C is needed, e.g. by disabling the 
lowering inside shared_ptr. I was referring to "raw" references before, so the 
lowering would better be shared_ptr!(__raw(C)). But I agree, having the lowering 
include the original seems bad.

I realized a worse flaw with my proposal: it doesn't solve the assignment problem 
it was meant to. shared_ptr implemented as a struct does not have full control of 
the assignment, but is only called for the postblit and the destruction of the 
previous value. It has no way to put a lock around the full assignment. Still 
thinking too much in C++...

Sorry for the noise.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/29/2013 12:16 AM, Rainer Schuetze wrote:
 Sorry for the noise.
No problem. It's a complicated subject, and none of us can think of all the ramifications. That's why this thread exists.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:

On 29.06.2013 06:38, Walter Bright wrote:
 On 6/27/2013 11:33 AM, Rainer Schuetze wrote:
 On 27.06.2013 19:04, Walter Bright wrote:
 IIRC you also have the GC handbook book on your shelf. Check the
 chapters on RC, especially algorithm 18.2 "Eager reference counting
 with
 CompareAndSwap is broken".
I have the book, but it is the first edition and there's no chapter 18 in it :-(
I can remove the dust from my scanner to copy the 3 mostly relevant pages and send them to you.
I tried to scan it yesterday, but got a large black bar at the fold (don't know 
if this is the correct term) that erased the first inch of text. I would have to 
rip the book apart to get better results.
 I understand the issue (I think), but I can't think of a case where the
 ref count would be 1 when this happens.
Consider a global shared reference R that holds the last reference to an object 
O. One thread exchanges the reference with another reference P while another 
thread reads the reference into S.

    shared(C) R = O;      ; refcnt of O is 1

in pseudo-assembly missing null-checks:

    Thread1 (R = P)        Thread2 (S = R)

                           mov ecx,[R]
                           ; thread suspended
    mov eax,[P]
    inc [eax].refcnt
    mov ebx,[R]
    mov [R],eax
    dec [ebx].refcnt      ; refcnt of O now 0
    jnz done
    call delete_ebx
                           ; thread resumed
                           inc [ecx].refcnt
    done:

The increment on [ecx].refcnt modifies garbage.
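For comparison, a minimal D-level sketch (my own illustration, not from the 
thread) of the "read atomically with a retain" rule applied to this example: the 
load of the shared reference and the retain happen inside one critical section, 
so the swap in the other thread cannot free the object between the two steps. 
Node, R, rLock and readRetained are illustrative names.

    import core.atomic : atomicLoad, atomicOp;
    import core.sync.mutex : Mutex;

    class Node
    {
        shared size_t refcnt = 1;
    }

    __gshared Node  R;      // the global shared reference from the example
    __gshared Mutex rLock;  // "always the same lock for a given pointer"

    shared static this()
    {
        R = new Node;
        rLock = new Mutex;
    }

    // S = R done safely: read and retain form one critical section
    Node readRetained()
    {
        rLock.lock();
        scope (exit) rLock.unlock();
        auto s = R;
        if (s !is null)
            atomicOp!"+="(s.refcnt, 1);
        return s;
    }

    void main()
    {
        auto s = readRetained();
        assert(atomicLoad(s.refcnt) == 2);
    }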
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:
On 29.06.2013 09:36, Rainer Schuetze wrote:
                         inc [ecx].refcnt
 done:
    mov [S], ecx

Just wanted to add that the book states that lock-free reference counting can be 
implemented with a cas2 operation modifying two separate locations atomically. 
Unfortunately this operation does not exist in most processors.

This might be the quoted paper:
http://scholr.ly/paper/2199608/lock-free-reference-counting
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/29/2013 12:36 AM, Rainer Schuetze wrote:
 On 29.06.2013 06:38, Walter Bright wrote:
 On 6/27/2013 11:33 AM, Rainer Schuetze wrote:
 On 27.06.2013 19:04, Walter Bright wrote:
 IIRC you also have the GC handbook book on your shelf. Check the
 chapters on RC, especially algorithm 18.2 "Eager reference counting
 with
 CompareAndSwap is broken".
I have the book, but it is the first edition and there's no chapter 18 in it :-(
I can remove the dust from my scanner to copy the 3 mostly relevant pages and send them to you.
 I tried to scan it yesterday, but got a large black bar at the fold (don't know 
 if this is the correct term) that erased the first inch of text. I would have to 
 rip the book apart to get better results.

Ah, don't rip up your book! (I cut the back off of mine and scanned it, but I no 
longer care to store the thousands of pounds of books I have anymore, and I like 
that my whole library fits on my laptop now!)
 I understand the issue (I think), but I can't think of a case where the
 ref count would be 1 when this happens.
Consider a global shared reference R that holds the last reference to an
object O. One thread exchanges the reference with another reference P while another thread reads the reference into S.
 shared(C) R = O;      ; refcnt of O is 1

 in pseudo-assembly missing null-checks:

 Thread1 (R = P)        Thread2 (S = R)

                        mov ecx,[R]
                        ; thread suspended
 mov eax,[P]
 inc [eax].refcnt
 mov ebx,[R]
 mov [R],eax
 dec [ebx].refcnt      ; refcnt of O now 0
 jnz done
 call delete_ebx
                        ; thread resumed
                        inc [ecx].refcnt
 done:

 The increment on [ecx].refcnt modifies garbage.
Ok, I see. Let me think about it some more.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Jacob Carlborg:

On 29 jun 2013, at 06:42, Walter Bright wrote:

 I think the simplest thing is to not allow ref counted classes to implement 
 interfaces other than ones derived from IUnknown.

What about Objective-C interfaces?
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:
On 29 June 2013 at 6:08, Jacob Carlborg wrote:

 On 29 jun 2013, at 06:42, Walter Bright wrote:

 I think the simplest thing is to not allow ref counted classes to implement 
interfaces other than ones derived from IUnknown.
 What about Objective-C interfaces?
Implementing ARC for Objective-C is going to require some more compiler support 
anyway (for autoreleased objects notably). Tweaking the compiler so it accepts 
Objective-C interfaces should just be a matter of tweaking the boolean expression 
that makes this check. "if (classdecl->objc && interfacedecl->objc) return true;" 
or something like that.

As for which function the compiler should call, it'll probably need to be 
special-cased for Objective-C in the compiler too. Here's the list of changes 
that'd be needed for Objective-C ARC:

== Retain ==
COM:  if (obj) obj->AddRef();
ObjC: obj = objc_retain(obj);
ObjC blocks: obj = objc_retainBlock(obj); // objc_retainBlock might do a copy

== Release ==
COM:  if (obj) obj->Release();
ObjC: objc_release(obj);

== Assignment ==
COM:  if (obj) obj->AddRef(); if (var) var->Release(); var = obj;
ObjC: objc_storeStrong(&var, obj);
ObjC blocks: obj = objc_retainBlock(obj); objc_release(var); var = obj;

As long as Walter implements D ARC in a way we can make the above substitutions, 
it shouldn't be too hard.

Then, support for autorelease is a matter of calling objc_autorelease on returned 
objects from autoreleasing functions, followed by objc_retain after the function 
call in the caller. We'll also have to check what ObjC ARC does for pointer 
write-backs to autoreleased variables and mimic that.

There's an optimized path for autoreleased return values that we should use, but 
I'd defer that to later. It involves a special no-op instruction to insert at the 
right place as a flag. Also, autoreleased returns should be eliminated when 
inlining.
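A minimal sketch of how those Objective-C runtime entry points could be declared 
and used from D (my own illustration; the ObjCId alias is a stand-in for 
Objective-C's id, and arcAssign is a hypothetical helper showing the Assignment 
row, not compiler output):

    // only meaningful when linking against the Objective-C runtime
    version (OSX)
    {
        alias ObjCId = void*;  // stand-in for Objective-C `id`

        extern (C) ObjCId objc_retain(ObjCId obj);
        extern (C) void   objc_release(ObjCId obj);
        extern (C) void   objc_storeStrong(ObjCId* location, ObjCId obj);

        // what `var = obj;` would lower to for an Objective-C object
        void arcAssign(ref ObjCId var, ObjCId obj)
        {
            objc_storeStrong(&var, obj);
        }
    }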
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Looks like this should go into the O-C DIP.


On 6/29/2013 5:38 AM, Michel Fortin wrote:
 Implementing ARC for Objective-C is going to require some more compiler 
support anyway (for autoreleased objects notably). Tweaking the compiler it so accepts Objective-C interfaces should just be a matter of tweaking the boolean expression that makes this check. "if (classdelc->objc && interfacedecl->objc) return true;" or something like that.
 As for which function the compiler should call, it'll probably need to be 
special-cased for Objective-C in the compiler too. Here's the list of changes that'd be needed for Objective-C ARC:
 == Retain ==
 COM:  if (obj) obj->AddRef();
 ObjC: obj = objc_retain(obj);
 ObjC blocks: obj = objc_retainBlock(obj); // objc_retainBlock might do a copy

 == Release ==
 COM:  if (obj) obj->Release();
 ObjC: objc_release(obj);

 == Assignment ==
 COM:  if (obj) obj->AddRef(); if (var) var->Release(); var = obj;
 ObjC: objc_storeStrong(&var, obj);
 ObjC blocks: obj = objc_retainBlock(obj); objc_release(var); var = obj;

 As long as Walter implements D ARC in a way we can make the above 
substitutions it shouldn't be too hard.
 Then, support for autorelease is a matter of calling objc_autorelease on 
returned objects from autoreleasing functions, followed by objc_retain after the function call in the caller. We'll also have to check what ObjC ARC does for pointer write-backs to autoreleased variables and mimick that.
 There's an optimized path for autoreleased return values that we should use, 
but I'd defer that to later. It involves a special no-op instruction to insert at the right place as a flag. Also, autoreleased returns should be eliminated when inlining.

Oct 09 2013
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Jacob Carlborg:

On 29 jun 2013, at 22:24, Walter Bright wrote:

 Looks like this should go into the O-C DIP.
Shouldn't all this reference counting be in its own DIP?
Oct 09 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/30/2013 1:50 AM, Jacob Carlborg wrote:
 On 29 jun 2013, at 22:24, Walter Bright wrote:

 Looks like this should go into the O-C DIP.
Shouldn't all this reference counting be in its own DIP?
Yes. It's not ready yet, though.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:
On 25 June 2013 at 17:00, Walter Bright wrote:

 6. If a class or struct contains RC fields, calls to Release() for those 
fields will
 be added to the destructor, and a destructor will be created if one doesn't 
 exist already.

Another thing to note is that the above is dangerous if the destructor is called 
from the GC and RC objects are allocated from GC memory. Referenced objects might 
already have been destroyed and you'll be calling Release() on them. This will 
happen when the GC releases a cycle.
Oct 09 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/30/2013 12:35 PM, Michel Fortin wrote:
 On 25 June 2013 at 17:00, Walter Bright wrote:

 6. If a class or struct contains RC fields, calls to Release() for those 
fields will
 be added to the destructor, and a destructor will be created if one doesn't 
exist already.
 Another thing to note that the above is dangerous if the destructor is called 
from the GC and RC objects are allocated from GC memory. Referenced objects might already have been destroyed and you'll be calling Release() on them. This will happen when the GC releases a cycle.

Amended as:

6. If a class or struct contains RC fields, calls to Release() for those fields 
will be added to the destructor, and a destructor will be created if one doesn't 
exist already. Release() implementations should take care to not destroy objects 
that are already destroyed, which can happen if the objects are allocated on the 
GC heap and the GC removes a cycle of refcounted objects.
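A minimal sketch of one way a Release() implementation could follow that advice 
(the zero-check convention here is an assumption of mine, not something the 
proposal prescribes):

    class Counted
    {
        private size_t refs = 1;

        Counted AddRef()
        {
            ++refs;
            return this;
        }

        Counted Release()
        {
            if (refs == 0)
                return this;    // the GC is already tearing this object down
            if (--refs == 0)
            {
                // run the object's own cleanup exactly once;
                // the memory itself is reclaimed by whoever allocated it
            }
            return this;
        }
    }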
Oct 09 2013
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:
On 2013-06-30 at 16:32, Walter Bright wrote:

 Amended as:

 6. If a class or struct contains RC fields, calls to Release() for those 
fields will
 be added to the destructor, and a destructor will be created if one doesn't 
exist already.
 Release() implementations should take care to not destroy objects that are 
already destroyed,
 which can happen if the objects are allocated on the GC heap and the GC 
removes a cycle of
 refcounted objects.
Good advice. But... how do you implement that? For one thing, I doubt there's an API in the GC you can query for deleted objects, and if there was it'd be inefficient to call it for every call to Release. And also, will a virtual call to a function of a destroyed object work in the first place? It all seems quite fragile to me.
Oct 09 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/30/2013 3:05 PM, Michel Fortin wrote:
 On 2013-06-30 at 16:32, Walter Bright wrote:

 Amended as:

 6. If a class or struct contains RC fields, calls to Release() for those 
fields will
 be added to the destructor, and a destructor will be created if one doesn't 
exist already.
 Release() implementations should take care to not destroy objects that are 
already destroyed,
 which can happen if the objects are allocated on the GC heap and the GC 
removes a cycle of
 refcounted objects.
Good advice. But... how do you implement that? For one thing, I doubt there's
an API in the GC you can query for deleted objects, and if there was it'd be inefficient to call it for every call to Release. And also, will a virtual call to a function of a destroyed object work in the first place? It all seems quite fragile to me.

The GC doesn't actually delete anything while it is doing a collection cycle.
So 
the refcount could simply be checked.
Oct 09 2013
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:

On Jun 30, 2013, at 6:11 PM, Walter Bright wrote:

 On 6/30/2013 3:05 PM, Michel Fortin wrote:
 On 2013-06-30 at 16:32, Walter Bright wrote:

 Amended as:

 6. If a class or struct contains RC fields, calls to Release() for those 
fields will
 be added to the destructor, and a destructor will be created if one doesn't 
exist already.
 Release() implementations should take care to not destroy objects that are 
already destroyed,
 which can happen if the objects are allocated on the GC heap and the GC 
removes a cycle of
 refcounted objects.
Good advice. But... how do you implement that? For one thing, I doubt
there's an API in the GC you can query for deleted objects, and if there was it'd be inefficient to call it for every call to Release. And also, will a virtual call to a function of a destroyed object work in the first place? It all seems quite fragile to me.

 The GC doesn't actually delete anything while it is doing a collection cycle. 
 So the refcount could simply be checked.

AFAIK, this isn't a requirement of the GC. May want to add it. I have bad 
experiences with trying to second-guess the GC and when it actually kills the 
object.

Note, if this is the case, then inc/dec refcount cannot depend on the vtable, 
since that is zeroed. I'm wondering if the GC shouldn't set the RC to size_t.max 
when destructing, or even just +1 it, to ensure the ref count destructor doesn't 
accidentally free it before the reaper does.

-Steve
Oct 09 2013
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:
On 30 June 2013 at 18:11, Walter Bright wrote:

 On 6/30/2013 3:05 PM, Michel Fortin wrote:
 On 2013-06-30 at 16:32, Walter Bright wrote:

 Amended as:

 6. If a class or struct contains RC fields, calls to Release() for those 
fields will
 be added to the destructor, and a destructor will be created if one doesn't 
exist already.
 Release() implementations should take care to not destroy objects that are 
already destroyed,
 which can happen if the objects are allocated on the GC heap and the GC 
removes a cycle of
 refcounted objects.
Good advice. But... how do you implement that? For one thing, I doubt
there's an API in the GC you can query for deleted objects, and if there was it'd be inefficient to call it for every call to Release. And also, will a virtual call to a function of a destroyed object work in the first place? It all seems quite fragile to me.
 The GC doesn't actually delete anything while it is doing a collection cycle. 
 So the refcount could simply be checked.

... checked and decremented, and if it reaches zero in the thread the GC is 
currently running, then it doesn't have to delete the object as, in theory, it 
should be destructed as part of the same run. Ok, I get it now.

You should add a requirement that the reference counter be atomic, because the GC 
can run in any thread and you still need to decrement the counters of referenced 
objects in the destructor.

Honestly, I think it'd be much easier if the runtime provided its own base object 
you could use for reference counting with the GC to collect cycles. The provided 
implementation could rely on internal details of the GC since both would be part 
of druntime. There isn't much room for alternate implementations when the GC is 
involved anyway.
Oct 09 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/30/2013 4:35 PM, Michel Fortin wrote:
 On 30 June 2013 at 18:11, Walter Bright wrote:

 On 6/30/2013 3:05 PM, Michel Fortin wrote:
 On 2013-06-30 at 16:32, Walter Bright wrote:

 Amended as:

 6. If a class or struct contains RC fields, calls to Release() for those 
fields will
 be added to the destructor, and a destructor will be created if one 
doesn't exist already.
 Release() implementations should take care to not destroy objects that are 
already destroyed,
 which can happen if the objects are allocated on the GC heap and the GC 
removes a cycle of
 refcounted objects.
Good advice. But... how do you implement that? For one thing, I doubt
there's an API in the GC you can query for deleted objects, and if there was it'd be inefficient to call it for every call to Release. And also, will a virtual call to a function of a destroyed object work in the first place? It all seems quite fragile to me.
 The GC doesn't actually delete anything while it is doing a collection 
cycle. So the refcount could simply be checked.
 ... checked and decremented, and if it reaches zero in the thread the GC is 
currently running then it doesn't have to delete the object as, in theory, it should be destructed as part of the same run. Ok, I get it now.
 You should add a requirement that the reference counter be atomic because the 
 GC can run in any thread and you still need to decrement counters of referenced 
 objects in destructor.

I very much want to avoid requiring atomic counts - it's a major performance 
penalty. Note that if the GC is reaping a cycle, nobody else is referencing the 
object, so this should not be an issue.
 Honestly, I think it'd be much easier if the runtime provided its own base 
object you could use for reference counting with the GC to collect cycles. The provided implementation could rely on internal details of the GC since both would be part of druntime. There isn't much room for alternate implementations when the GC is involved anyway.

Oct 09 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:

On Jun 30, 2013, at 8:18 PM, Walter Bright wrote:

 I very much want to avoid requiring atomic counts - it's a major performance 
 penalty. Note that if the GC is reaping a cycle, nobody else is referencing the 
 object, so this should not be an issue.

I think you didn't understand what Michel was saying.

Take for example:

    A->B->C->A

This is a cycle. Imagine that nobody else is pointing at A, B or C. Fine. The GC 
starts to collect this cycle. But let's say that D is not being collected *AND* B 
has a reference to D.

B could be getting destroyed in one thread, and decrementing D's reference count, 
while someone else in another thread is incrementing/decrementing D's reference 
count.

I agree that RC optimally is thread-local. But if you involve the GC, then ref 
incs and decs have to be atomic. I don't think this is that bad. iOS on ARM, 
which has terrible atomic primitives, uses atomic reference counts.

If you do NOT involve the GC and are careful about cycles, then you could 
potentially have an RC solution that does not require atomics. But that would 
have to be a special case, with the danger of having cycles.

-Steve
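For reference, a minimal sketch of what atomic increment/decrement would look 
like with core.atomic, as opposed to the plain ++/-- the proposal hopes to keep 
(SharedCounted is an illustrative name, not part of the proposal):

    import core.atomic : atomicOp;

    class SharedCounted
    {
        private shared size_t refs = 1;

        void addRef()
        {
            atomicOp!"+="(refs, 1);            // atomic: safe from any thread
        }

        void release()
        {
            if (atomicOp!"-="(refs, 1) == 0)   // atomicOp returns the new value
            {
                // last reference gone; safe to run teardown exactly once
            }
        }
    }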
Oct 09 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:
On 30 June 2013 at 20:25, Steven Schveighoffer wrote:

 A->B->C->A

 this is a cycle.  Imagine that nobody else is pointing at A, B or C.  Fine. 
The GC starts to collect this cycle.
 But let's say that D is not being collected *AND* B has a reference to D.

 B could be getting destroyed in one thread, and decrementing D's reference 
count, while someone else in another thread is incrementing/decrementing D's reference count.
 I agree that RC optimally is thread-local.  But if you involve the GC, then 
 ref incs and decs have to be atomic.

Exactly what I was trying to explain. Thanks.
 I don't think this is that bad.  iOS on ARM which has terrible atomic 
 primitives uses atomic reference counts.

Moreover, iOS uses a single spinlock to protect a global hash table containing 
all reference counts.
 If you do NOT involve the GC and are careful about cycles, then you could 
 potentially have an RC solution that does not require atomics. But that would 
 have to be a special case, with the danger of having cycles.

Not involving the GC is quite difficult: you need to be absolutely sure you have 
no pointer pointing to that thread-local ref-counted object anywhere in the GC 
heap. Unfortunately, there's no way to guarantee statically what is part of the 
GC heap and what is not, so any non-atomic reference counter is not safe.
Oct 09 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:
On Jun 30, 2013, at 10:08 PM, Michel Fortin wrote:

 On 30 June 2013 at 20:25, Steven Schveighoffer wrote:
 I don't think this is that bad.  iOS on ARM which has terrible atomic 
primitives uses atomic reference counts.
 Moreover iOS uses a single spinlock to protect a global hash table containing 
 all reference counts.

Hearing this, I actually find it amazing how well it works :)
 If you do NOT involve the GC and are careful about cycles, then you could 
potentially have a RC solution that does not require atomics. But that would have to be a special case, with the danger of having cycles.
 Not involving the GC is quite difficult: you need to be absolutely sure you 
 have no pointer pointing to that thread-local ref-counted object anywhere in 
 the GC-heap. Unfortunately, there's no way to guaranty statically what is part 
 of the GC heap and what is not, so any non-atomic reference counter is not safe.

This is true. I was thinking of a garbage-collected RC object referring to an RC 
object, I wasn't thinking of a fully GC object referring to an RC object.

In terms of pure functions and possibly a nogc attribute, this might be a 
possibility. Maybe at some point we have a nogcref attribute we attach to 
specific *types* so the compiler prevents you from storing any references to that 
type in the GC.

I think it is important to reserve the possibility for having cases where RC 
inc/dec is not atomic. Especially where we have D's type system identifying what 
is shared and what is not. Especially when there is the possibility for 
thread-local GCs.

-Steve
Oct 09 2013
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/30/2013 5:25 PM, Steven Schveighoffer wrote:
 On Jun 30, 2013, at 8:18 PM, Walter Bright wrote:

 I very much want to avoid requiring atomic counts - it's a major performance 
penalty. Note that if the GC is reaping a cycle, nobody else is referencing the object, so this should not be an issue.
 I think you didn't understand what Michel was saying.

 Take for example:

 A->B->C->A

 this is a cycle.  Imagine that nobody else is pointing at A, B or C.  Fine. 
The GC starts to collect this cycle.
 But let's say that D is not being collected *AND* B has a reference to D.

 B could be getting destroyed in one thread, and decrementing D's reference 
count, while someone else in another thread is incrementing/decrementing D's reference count.
 I agree that RC optimally is thread-local.  But if you involve the GC, then 
 ref incs and decs have to be atomic.

This is actually a problem right now with the GC, as destructors may be run in 
another thread than the one they belong in. The situation you describe is not 
worse or better than that, it's the same thing. The solution is to run the 
destructors in the same thread the objects belong in.
 I don't think this is that bad.  iOS on ARM which has terrible atomic 
 primitives uses atomic reference counts.

It's bad. ARM is not the only processor out there.
Oct 09 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:

On Jun 30, 2013, at 10:26 PM, Walter Bright wrote:

 On 6/30/2013 5:25 PM, Steven Schveighoffer wrote:
 On Jun 30, 2013, at 8:18 PM, Walter Bright wrote:

 I very much want to avoid requiring atomic counts - it's a major 
performance penalty. Note that if the GC is reaping a cycle, nobody else is referencing the object, so this should not be an issue.
 I think you didn't understand what Michel was saying.

 Take for example:

 A->B->C->A

 this is a cycle.  Imagine that nobody else is pointing at A, B or C.  Fine. 
The GC starts to collect this cycle.
 But let's say that D is not being collected *AND* B has a reference to D.

 B could be getting destroyed in one thread, and decrementing D's reference 
count, while someone else in another thread is incrementing/decrementing D's reference count.
 I agree that RC optimally is thread-local.  But if you involve the GC, then 
ref incs and decs have to be atomic.
 This is actually a problem right now with the GC, as destructors may be run 
 in another thread than they belong in. The situation you describe is not worse 
 or better than that, it's the same thing. The solution is to run the destructors 
 in the same thread the objects belong in.

I think that's a tall order presently. For instance, on Linux, the threads are 
all stopped using a signal. It's a very bad idea to run destructors in a signal 
handler.

What it seems like you are saying is that a prerequisite for ref counting is to 
have thread-local GC working. If that is the case, we need to start a 
thread-local GC "thread" before this goes any further.
 I don't think this is that bad.  iOS on ARM which has terrible atomic 
primitives uses atomic reference counts.
 It's bad. ARM is not the only processor out there.
Pragmatically, I think if D targets x86 variants and ARM, it is well-situated in 
the mainstream of existing devices. Yes, it would be nice if it could target 
other, more obscure platforms, but if we are talking about ref counting working 
poorly on those, I don't think we are any worse off than today.

Note that we can keep the options open, and implement atomic RC now without many 
headaches.

-Steve
Oct 09 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/30/2013 7:36 PM, Steven Schveighoffer wrote:
 On Jun 30, 2013, at 10:26 PM, Walter Bright wrote:

 On 6/30/2013 5:25 PM, Steven Schveighoffer wrote:
 On Jun 30, 2013, at 8:18 PM, Walter Bright wrote:

 I very much want to avoid requiring atomic counts - it's a major 
performance penalty. Note that if the GC is reaping a cycle, nobody else is referencing the object, so this should not be an issue.
 I think you didn't understand what Michel was saying.

 Take for example:

 A->B->C->A

 this is a cycle.  Imagine that nobody else is pointing at A, B or C.  Fine. 
The GC starts to collect this cycle.
 But let's say that D is not being collected *AND* B has a reference to D.

 B could be getting destroyed in one thread, and decrementing D's reference 
count, while someone else in another thread is incrementing/decrementing D's reference count.
 I agree that RC optimally is thread-local.  But if you involve the GC, then 
ref incs and decs have to be atomic.
 This is actually a problem right now with the GC, as destructors may be run 
in another thread than they belong in. The situation you describe is not worse or better than that, it's the same thing. The solution is to run the destructors in the same thread the objects belong in.
 I think that's a tall order presently.  For instance, on linux, the threads 
are all stopped using a signal. It's a very bad idea to run destructors in a signal handler.
 What it seems like you are saying is that a prerequisite for ref counting is 
 to have thread-local GC working. If that is the case, we need to start a 
 thread-local GC "thread" before this goes any further.

Not really. This doesn't make anything worse. Also, the proposed solution to this 
issue is to post the "destruct" list to the appropriate thread, and that thread 
runs it the next time it calls the GC.
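A minimal sketch of that "post the destruct list to the owning thread" idea (my 
own illustration; DestructQueue and its methods are hypothetical, not a druntime 
API):

    import core.sync.mutex : Mutex;

    final class DestructQueue
    {
        private Object[] pending;
        private Mutex lock;

        this() { lock = new Mutex; }

        // called from whichever thread happened to run the collection
        void post(Object o)
        {
            lock.lock();
            scope (exit) lock.unlock();
            pending ~= o;
        }

        // called by the owning thread the next time it touches the GC
        void drain()
        {
            Object[] work;
            {
                lock.lock();
                scope (exit) lock.unlock();
                work = pending;
                pending = null;
            }
            foreach (o; work)
                destroy(o);   // run the destructor in the owner's own context
        }
    }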
 I don't think this is that bad.  iOS on ARM which has terrible atomic 
primitives uses atomic reference counts.
 It's bad. ARM is not the only processor out there.
Pragmatically, I think if D targets x86 variants and ARM, it is well-situated
in the mainstream of existing devices. Yes, it would be nice if it could target other obscure platforms, but if we are talking ref counting works poorly on those, I don't think we are any worse off than today. Note that we can keep the options open, and implement atomic RC now without many headaches.

We don't need to require atomic RC for these.
Oct 09 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:

On Jul 1, 2013, at 3:11 AM, Walter Bright wrote:

 On 6/30/2013 7:36 PM, Steven Schveighoffer wrote:
 I think that's a tall order presently.  For instance, on linux, the threads 
are all stopped using a signal. It's a very bad idea to run destructors in a signal handler.
 What it seems like you are saying is that a prerequisite for ref counting is 
to have thread-local GC working. If that is the case, we need to start a thread-local GC "thread" before this goes any further.
 Not really. This doesn't make anything worse. Also, the proposed solution to 
 this issue is to post the "destruct" list to the appropriate thread, and that 
 thread runs it next time it calls the GC.

I really urge you to make this a separate project. It's not trivial. Logically, 
it's sound, but the implementation will be very difficult. I also think Sean (and 
probably others) should be involved in that discussion.
 Pragmatically, I think if D targets x86 variants and ARM, it is 
well-situated in the mainstream of existing devices. Yes, it would be nice if it could target other obscure platforms, but if we are talking ref counting works poorly on those, I don't think we are any worse off than today. Note that we can keep the options open, and implement atomic RC now without many headaches.

 We don't need to require atomic RC for these.
I didn't say that. I said we could implement atomic RC without any changes to the 
GC, and worry about optimizing with non-atomic RC later. As long as we make it 
*possible*.

-Steve
Oct 09 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/1/2013 6:08 AM, Steven Schveighoffer wrote:
 On Jul 1, 2013, at 3:11 AM, Walter Bright wrote:

 On 6/30/2013 7:36 PM, Steven Schveighoffer wrote:
 I think that's a tall order presently.  For instance, on linux, the threads 
are all stopped using a signal. It's a very bad idea to run destructors in a signal handler.
 What it seems like you are saying is that a prerequisite for ref counting 
is to have thread-local GC working. If that is the case, we need to start a thread-local GC "thread" before this goes any further.
 Not really. This doesn't make anything worse. Also, the proposed solution to 
this issue is to post the "destruct" list to the appropriate thread, and that thread runs it next time it calls the GC.
 I really urge you to make this a separate project.  It's not trivial. 
 Logically, it's sound, but the implementation will be very difficult. I also 
 think Sean (and probably others) should be involved for that discussion.

Make what a separate project? The destruction of objects by the GC in local 
threads? It already is not part of the ref counting proposal.
Oct 09 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:

On Jul 1, 2013, at 12:17 PM, Walter Bright wrote:

 I really urge you to make this a separate project.  It's not trivial. 
Logically, it's sound, but the implementation will be very difficult. I also think Sean (and probably others) should be involved for that discussion.
 Make what a separate project? The destruction of objects by the GC in local 
threads? It already is not part of the ref counting proposal.


As far as I can tell, the ref counting proposal is not viable without it, as 
long as you insist on non-atomic RC increments and decrements.  How can it 
possibly not be a prerequisite to this, and therefore part of the proposal?

Unless you are saying now that atomic ref counting is OK?

I'm going by your previous statement:

 I very much want to avoid requiring atomic counts - it's a major performance 
 penalty.

-Steve
Oct 09 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-10-10 04:35, Walter Bright wrote:
 Steven Schveighoffer wrote:

 On Jul 1, 2013, at 12:17 PM, Walter Bright wrote:

  >> I really urge you to make this a separate project.  It's not
 trivial. Logically, it's sound, but the implementation will be very
 difficult.  I also think Sean (and probably others) should be involved
 for that discussion.
  >
  > Make what a separate project? The destruction of objects by the GC in
 local threads? It already is not part of the ref counting proposal.
  >


 As far as I can tell, the ref counting proposal is not viable without
 it, as long as you insist on non-atomic RC increments and decrements.
 How can it possibly not be a prerequisite to this, and therefore part of
 the proposal?

 Unless you are saying now that atomic ref counting is OK?

 I'm going by your previous statement:

  > I very much want to avoid requiring atomic counts - it's a major
 performance penalty.


 -Steve
Is this the last email in the conversation? In that case I think you should 
clearly mark that with a post.

-- 
/Jacob Carlborg
Oct 09 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/9/2013 11:52 PM, Jacob Carlborg wrote:
 Is this the last email in the conversation?
Yes, I posted them in chronological order.
Oct 10 2013
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:
On 30 June 2013 at 22:26, Walter Bright wrote:

 This is actually a problem right now with the GC, as destructors may be run 
 in another thread than they belong in. The situation you describe is not worse 
 or better than that, it's the same thing. The solution is to run the destructors 
 in the same thread the objects belong in.

Indeed. Maybe that could work. How ironic that we can't implement RC efficiently 
because of the GC.

That said, it strongly favors having a base RC object implementation in druntime, 
where it can be kept in sync with the GC.
Oct 09 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/30/2013 7:47 PM, Michel Fortin wrote:
 On 30 June 2013 at 22:26, Walter Bright wrote:

 This is actually a problem right now with the GC, as destructors may be run 
in another thread than they belong in. The situation you describe is not worse or better than that, it's the same thing. The solution is to run the destructors in the same thread the objects belong in.
 Indeed. Maybe that could work. How ironic that we can't implement RC 
efficiently because of the GC.
 That said, it strongly favors having a base RC object implementation in 
druntime, where it can be kept in sync with the GC.

The GC doesn't need to know about it.
Oct 09 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
And that's the last email in the original thread!
Oct 10 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/10/2013 12:31 AM, Walter Bright wrote:
 And that's the last email in the original thread!
Durn, no it isn't.
Oct 10 2013
prev sibling parent "Sean Kelly" <sean invisibleduck.org> writes:
On Thursday, 10 October 2013 at 02:28:13 UTC, Walter Bright wrote:
 Steven Schveighoffer wrote:

 On Jun 30, 2013, at 8:18 PM, Walter Bright wrote:

  > I very much want to avoid requiring atomic counts - it's a major
  > performance penalty. Note that if the GC is reaping a cycle, nobody else is
  > referencing the object, so this should not be an issue.

 I think you didn't understand what Michel was saying. Take for example:

 A->B->C->A

 This is a cycle. Imagine that nobody else is pointing at A, B or C. Fine. The
 GC starts to collect this cycle. But let's say that D is not being collected
 *AND* B has a reference to D. B could be getting destroyed in one thread, and
 decrementing D's reference count, while someone else in another thread is
 incrementing/decrementing D's reference count.

 I agree that RC optimally is thread-local. But if you involve the GC, then ref
 incs and decs have to be atomic.
I think this ties into the requirement that after the GC collects thread-local objects, they must be finalized by the thread that owns them (assuming it's still alive). What's missing is some way to track what thread owns an object. This isn't super difficult to add in the simple case, but if we allow thread-local objects to be transferred between threads, then the transferral of ownership has to be communicated to the GC. Assuming for the moment that's not a problem though, I think RC updates could be non-atomic for thread-local data.
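
To make the tradeoff in that exchange concrete, here is a rough D sketch of 
the two kinds of count update (the helper names and layout are assumptions for 
illustration only; they are not the calls the compiler would actually generate 
under the proposal):

import core.atomic;

// Hand-written stand-in for AddRef/Release, showing atomic vs. non-atomic
// count updates. All names here are hypothetical.
class Counted
{
    // Shared, atomic count: needed as soon as a finalizer running on the
    // GC's thread (B's destructor releasing D in the example above) can
    // touch the count while user code on another thread also does.
    private shared size_t refs = 1;

    void addRef()  { atomicOp!"+="(refs, 1); }
    void release() { if (atomicOp!"-="(refs, 1) == 0) { /* destroy/free */ } }

    // Thread-local alternative: a plain increment/decrement is cheaper,
    // but only valid if the object, all of its owners, and whichever
    // thread finalizes it are provably the same thread.
    private size_t localRefs = 1;
    void addRefLocal()  { ++localRefs; }
    void releaseLocal() { if (--localRefs == 0) { /* destroy/free */ } }
}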
Oct 11 2013
prev sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig outerproduct.org> writes:
I've made a short roundup of the different features/requirements that 
have been mentioned (I may have forgotten some and added some), as this 
thread has a complexity that presumably makes it quite hard to follow 
for most readers. I have also attached an estimated priority for each 
requirement, based on the discussion and my own experiences.

  - Memory safety [very important but also very difficult/limiting]
      Disallow escaping uncounted references to reference counted
      memory. Keywords: pure, scope, isolated/owned
  - COM compatible [important]
      Needs to support internal reference counting using
      AddRef/Release, while obeying the calling convention
  - Objective-C compatible [important]
      Weak references, manual memory management and autorelease pools are
      some keywords here
  - Handle reference cycles [nice to have]
      Requires GC memory for storing the instances
  - Weak pointers [critical]
      Only briefly mentioned, but critical for many data structures
      (e.g. graphs or caches); requires external reference counting
  - Not require two separate counts for COM [mildly important]
      Using GC memory would require a second reference count for the
      D side of things, which is not desirable for multiple reasons
  - Support type qualifiers [critical]
      All usual type qualifiers and conversions should work as expected.
      This is not possible in a pure template based solution.
  - Support non-class types [nice to have]
      Although slightly limiting, classes provide a convenient
      abstraction and will arguably capture >90% of use cases just fine
  - Support referencing fields/slices [nice to have]
      Letting references to members escape in a safe way would greatly
      increase the flexibility, but ties it tightly to the GC
  - Allow manual memory management [critical]
      For COM/Obj-C and any situation where external code is to take
      responsibility for the ref counting, this needs to be an option
  - Not require new annotations [important]
      Getting this to work without introducing new keywords/syntax is
      strongly preferred by Walter
  - Safe shared(RefCountedType) variables [important]
      There needs to be a way to make shared references thread-safe
  - Support OOP reasonably well [important]
      The usual upcasts and downcasts should work, and calling
      functions of base classes needs to be safe

Please mention any points that I have forgotten so that we have some 
kind of unit test against which proposed designs can be checked.
Oct 12 2013
next sibling parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig outerproduct.org> writes:
To support most of the requirements we need to offer some control over 
the reference type. Forcing the reference to be a pointer to the class 
instance precludes proper weak references and makes thread-safety difficult.

Rainer Schütze's proposal [1] looked promising, but didn't quite work 
out. However, by going a bit further, I think this approach can be fixed 
and will provide all the flexibility needed to implement solutions that 
can satisfy any of those requirements.

The basic idea is the same: Any reference to a class that is recognized 
as being reference counted is replaced by a struct that performs the 
reference counting using RAII (e.g. std.typecons.RefCounted). This 
allows any reference counting scheme to be implemented 
(internal/external, support weak refs or not, global counter table, GC 
memory or malloc, etc.).
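
For concreteness, the kind of RAII wrapper meant here might look roughly like 
the following hand-written sketch (an external, malloc'd count, no weak 
references, no thread safety; this is only an illustration of the idea, not 
std.typecons.RefCounted and not the generated code discussed below):

import core.stdc.stdlib : free, malloc;

// Minimal external-count RAII wrapper; all names are illustrative only.
struct RC(T) if (is(T == class))
{
    private T obj;
    private size_t* count;

    this(T obj)
    {
        this.obj = obj;
        count = cast(size_t*) malloc(size_t.sizeof);
        *count = 1;
    }

    this(this)          // postblit: every copy bumps the count
    {
        if (count) ++*count;
    }

    ~this()             // scope exit: drop the count, destroy at zero
    {
        if (count && --*count == 0)
        {
            destroy(obj);   // run the class destructor deterministically
            free(count);
        }
    }

    // Forward member access to the wrapped object.
    alias get this;
    @property inout(T) get() inout { return obj; }
}

Copy construction, assignment and scope exit of such a wrapper give roughly 
the AddRef/Release behaviour the proposal wants the compiler to insert 
automatically; the point of the scheme below is to let the class choose which 
wrapper is used as its reference type.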

TL;DR Let some code speak instead of a full blown spec first:

struct RefCounted(T) { /* ... */ }

// @referenceType!RefCounted
class C {
	// the presence of a C.ReferenceType template makes the class a
	// reference counted class
	// Note: A @referenceType UDA as above, defined in object.d,
	//       could be a cleaner/more explicit alternative
	alias ReferenceType = RefCounted;

	// C is now internally renamed to __ref_C to avoid ambiguities
	// and "C" itself is an alias for ReferenceType!__ref_C
	pragma(msg, C.stringof); // "RefCounted!__ref_C"

	void method()
	{
		// The "this" pointer, too, is of the reference counted
		// type. Caveats: how to handle private fields? COM call
		// convention?
		pragma(msg, typeof(this)); // "RefCounted!__ref_C"

		// Alternative: allow only pure functions to avoid
		// escaping references

		// Another alternative is to make 'scope' powerful
		// enough and use that:
		pragma(msg, typeof(this)); // "scope __ref_C"
	}
}

// The reference type itself is never const/immutable
// to enable reference counting for qualified types
C c; // -> RefCounted!__ref_C
const(C) d; // -> RefCounted!(const(__ref_C))

// To support the usual implicit conversions, some substitution is
// needed, since we have no implicit cast support for UDTs in the
// language
d = c; // d = typeof(d)(c)
       // or
       // d = typeof(d).implicitCastFrom(c)
       // or
       // d = typeof(c).implicitCastTo!(typeof(d))

// shared, however, is applied to the reference count itself (and
// transitively to the object) to force thread-safety - or rather, to
// avoid accidental use of unsafe implementations for shared references
shared(const(C)) e; // -> shared(RefCounted!(const(__ref_C)))

---

Caveat: Changing the "this" pointer from '__ref_C' to 
'RefCounted!__ref_C' has implications for the calling convention, which 
need to be taken into account when COM objects are involved. A simple 
COMPtr-like struct that only contains the target pointer may be enough 
here, though.
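
A COMPtr-like wrapper of that kind might look roughly like this (sketched over 
a hand-declared AddRef/Release interface to stay self-contained; a real 
implementation would use IUnknown from core.sys.windows and its 
extern(Windows) calling convention):

// Stand-in interface; hypothetical, for illustration only.
interface RefCountedObject
{
    uint AddRef();
    uint Release();
}

// Thin wrapper: it holds only the target pointer, matching the layout
// of a plain reference.
struct COMPtr(T) if (is(T : RefCountedObject))
{
    private T ptr;

    this(T p)  { ptr = p; if (ptr) ptr.AddRef(); }
    this(this) { if (ptr) ptr.AddRef(); }
    ~this()    { if (ptr) ptr.Release(); }

    void opAssign(COMPtr rhs)
    {
        // copy-and-swap: rhs already holds an extra reference (postblit),
        // and it releases our old pointer when it goes out of scope.
        auto tmp = ptr;
        ptr = rhs.ptr;
        rhs.ptr = tmp;
    }

    alias get this;
    @property inout(T) get() inout { return ptr; }
}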

Also, to guarantee memory safety, some additional measures need to be 
taken to avoid escaping plain references to refcounted memory. One 
solution is the use of isolated/owned types; another is to make 'scope' 
not only check for shallow reference escaping, but also check for 
escaping of references to fields (a similar thing, but it behaves 
differently). Both combined would of course be ideal. I think this is an 
issue that is mostly orthogonal to the refcount topic. See also the 
corresponding thread [2].

[1]: 
http://forum.dlang.org/thread/l34lei$255v$1 digitalmars.com?page=5#post-l352nk:242g3b:246:40digitalmars.com
[2]: http://forum.dlang.org/thread/kluaojijixhwigoujeip forum.dlang.org
Oct 12 2013
prev sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Saturday, 12 October 2013 at 14:03:26 UTC, Sönke Ludwig wrote:
 I've made short roundup of the different features/requirements 
 that have been mentioned (may have forgotten some and added 
 some) as this thread has a complexity that presumably makes it 
 quite hard to follow for most readers. I have also attached an 
 estimated priority for each requirement based on the discussion 
 and my own experiences.

  - Memory safety [very important but also very 
 difficult/limiting]
      Disallow escaping uncounted references to reference counted
      memory. Keywords: pure, scope, isolated/owned
Here we have the first missing piece :D I do think this is mandatory.
  - Objective-C compatible [important]
      Weak references, manual memory management and autorelease 
 pools are
      some keywords here
Can someone explain to me what an autorelease pool is?
  - Handle reference cycles [nice to have]
      Requires GC memory for storing the instances
The good news is that we can (and IMO should) layer ref counting on top of the GC. This is the only way to make both work nicely together and have a safety net against leaks.
  - Weak pointers [critical]
      Only briefly mentioned, but critical for many data 
 structures
      (e.g. graphs or caches) requires external reference 
 counting
Easy for RefCounted, but it becomes tricky for GC stuff.
  - Not require two separate counts for COM [mildly important]
      Using GC memory would require a second reference count for 
 the
      D side of things, which is not desirable for multiple 
 reasons
Can you explain that one a bit more? Especially how it requires two counts.
  - Support type qualifiers [critical]
      All usual type qualifiers and conversions should work as 
 expected.
      This is not possible in a pure template based solution.
Here we have the second missing piece. We need a way to tail-qualify templates.
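
What "tail-qualifying a template" would buy us, roughly (a sketch; RefCounted 
here is just an illustrative wrapper, not a concrete proposal):

struct RefCounted(T) { T payload; size_t* count; /* RAII machinery elided */ }

class C { int x; }

void example()
{
    // What the scheme needs ("tail const"): the wrapped object is const,
    // but the wrapper itself - and its count - stay mutable.
    RefCounted!(const C) tail;

    // What applying const from the outside gives today: the whole struct
    // is const, so the reference can never be rebound and the count can
    // never be updated.
    const(RefCounted!C) head;

    // And there is currently no implicit conversion from RefCounted!C to
    // RefCounted!(const C); expressing that conversion is the missing
    // language piece.
}
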
  - Not require new annotations [important]
      Getting this to work without introducing new 
 keywords/syntax is
      strongly preferred by Walter
We have two missing pieces to make that work (from a language perspective; a lot of runtime/compiler magic is also required).
  - Support OOP reasonably well [important]
      The usual up and down casts should work and calling 
 functions of
      base classes needs to be safe
I see no way to make that work without increasing the scope of scope (huhuhu :P)
 Please mention any points that I have forgotten so that we have 
 some kind of unit test against which proposed designs can be 
 checked.
You did an excellent job here. I guess we have two missing pieces (and an incomplete one in the name of scope) to get sorted out, and then we can get something really nice here!
Oct 12 2013
parent Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-10-13 01:15:49 +0000, "deadalnix" <deadalnix gmail.com> said:

 Can someone explain me what an autorelease pool is ?
A basic concept in Objective-C to make manual reference counting bearable. As 
things move to *automatic* reference counting, autorelease pools are becoming 
less important, but they remain there for backward compatibility, and 
autoreleased objects must be handled correctly by ARC following the existing 
conventions.

The concept is to have functions return autoreleased objects, i.e. objects 
pending release. Each time you autorelease an object, instead of the counter 
being decremented immediately, the object gets added to the autorelease pool, 
and the pool decrements the counter later when it gets drained. So when the 
caller gets an autoreleased object, it doesn't have to decrement the counter 
once it stops using the object as a temporary; the object will be cleaned up 
automatically later, generally at the next iteration of the event loop. You 
only need to retain the object if you're storing it somewhere other than a 
local variable.

So this is what made manual reference counting bearable in Objective-C. 
Autorelease pool support is only useful and needed for correctly implementing 
ARC for Objective-C object types.

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca
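
To put the concept in D terms, a very loose sketch (this is not how 
Objective-C actually implements it; 'Release' stands in for whatever decrement 
the RC scheme uses):

// Hypothetical illustration of an autorelease pool: a container of
// deferred Release() calls, drained at a well-defined point such as the
// end of an event-loop iteration.
interface Releasable { void Release(); }

struct AutoreleasePool
{
    private Releasable[] pending;

    // Instead of decrementing the count immediately, park the object here.
    void autorelease(Releasable obj) { pending ~= obj; }

    // Every deferred decrement happens now.
    void drain()
    {
        foreach (obj; pending)
            obj.Release();
        pending.length = 0;
    }
}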
Oct 12 2013