digitalmars.D - Improving DIP74: functions borrow by default, retain only if needed
- Andrei Alexandrescu (18/18) Feb 27 2015 DIP74's function call protocol for RCOs has the caller insert opAddRef
- H. S. Teoh via Digitalmars-d (11/32) Feb 27 2015 So if the callee assigns the RCO to something else, that's when
- Zach the Mystic (19/38) Feb 27 2015 I think it's fine. I couldn't even figure out the original motive
- Steven Schveighoffer (21/38) Feb 27 2015 I recall saying something like this, and someone came up with a reason
- Steven Schveighoffer (12/19) Feb 27 2015 Bleh, that was dumb.
- Michel Fortin (22/44) Feb 27 2015 You have to retain 'c' for the duration of the call unless you can
- Andrei Alexandrescu (4/44) Feb 27 2015 Thanks. So it seems we continue as we were with DIP74 and leave the rest...
- Zach the Mystic (6/67) Feb 27 2015 Still seems like a very significant performance penalty for such
- Zach the Mystic (7/31) Feb 28 2015 Hey, I don't think so. I think I figured it out. Keep track "in
- Andrei Alexandrescu (41/59) Feb 27 2015 Thanks! In ARC, there are autorelease pools that keep at least one
- Steven Schveighoffer (15/40) Feb 27 2015 No, that's not what they are for. They are for returning data without
- Michel Fortin (6/9) Feb 27 2015 Exactly.
- Andrei Alexandrescu (3/8) Feb 27 2015 OK, so at least in theory autorelease pools are not necessary for
- deadalnix (4/16) Feb 27 2015 ARC need them, this is part of the spec. You can have good RC
- Michel Fortin (9/14) Feb 27 2015 Apple's ARC needs autorelease pools to interact with Objective-C code.
- John McCall (100/112) Mar 06 2015 Hi, I'm the core language designer for ObjC ARC. I was pointed at
- Zach the Mystic (4/23) Feb 27 2015 Split-passing nested ref-counted classes with null loads! How
DIP74's function call protocol for RCOs has the caller insert opAddRef for each RCO passed by value. Then the callee has the responsibility to call opRelease (or defer that to another entity). This choice of protocol mimics the constructor/destructor protocol and probably shows our C++ bias.

However, ARC does not do that. Instead, it implicitly assumes the callee is a borrower of the reference. Only if the callee wants to copy the parameter to a member or a global (i.e. save it beyond the duration of the call), a new call to retain() (= opAddRef) is inserted. That way, functions that only need to look at the object but not store it incur no reference call overhead.

So I was thinking of changing DIP74 as follows:

* Caller does NOT insert an opAddRef for byval RCOs
* Callee does NOT insert an opRelease for its byval RCO parameters

It seems everything will just work with this change (including all move scenarios), but it is simple enough to make me worry I'm missing something.

Thoughts?

Andrei
Feb 27 2015
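To make the two protocols in the opening post concrete, here is a minimal sketch. It assumes a hand-written class `Widget` (a hypothetical name) and writes out by hand the opAddRef/opRelease calls that a compiler would insert; the helper functions and the `rcCalls` counter exist only for this illustration and are not part of DIP74.

```d
import std.stdio;

// Hypothetical stand-in for a DIP74 reference-counted object (RCO).
// The opAddRef/opRelease calls below are written by hand to show where
// a compiler would insert them under each protocol.
class Widget
{
    int refs = 1;        // the creator holds one reference
    static int rcCalls;  // counts reference-count calls, for the demo only

    void opAddRef()  { ++refs; ++rcCalls; }
    void opRelease() { --refs; ++rcCalls; }
    int value() { return 42; }
}

// Current DIP74 protocol: the callee receives its own reference and releases it.
int inspectOwned(Widget w)
{
    scope (exit) w.opRelease();
    return w.value();
}

// Proposed protocol: the callee only borrows, so it touches no counts.
int inspectBorrowed(Widget w)
{
    return w.value();
}

// Number of reference-count calls for one call under the current protocol.
int callsUnderCurrentProtocol()
{
    auto w = new Widget;
    Widget.rcCalls = 0;
    w.opAddRef();        // caller-side retain before passing by value
    inspectOwned(w);
    return Widget.rcCalls;
}

// Number of reference-count calls for one call under the proposed protocol.
int callsUnderProposedProtocol()
{
    auto w = new Widget;
    Widget.rcCalls = 0;
    inspectBorrowed(w);
    return Widget.rcCalls;
}

void main()
{
    writeln("current protocol:  ", callsUnderCurrentProtocol(), " count calls");
    writeln("proposed protocol: ", callsUnderProposedProtocol(), " count calls");
}
```

In this model a function that only looks at its argument costs two count calls under the current protocol and none under the proposed one, which is the overhead the proposal aims to remove.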
On Fri, Feb 27, 2015 at 10:24:26AM -0800, Andrei Alexandrescu via Digitalmars-d wrote:
> [...]
> So I was thinking of changing DIP74 as follows:
>
> * Caller does NOT insert an opAddRef for byval RCOs
> * Callee does NOT insert an opRelease for its byval RCO parameters

So if the callee assigns the RCO to something else, that's when opAddRef/retain will get called, but if the callee doesn't do that, then no call is inserted? Sounds reasonable.

> It seems everything will just work with this change (including all move scenarios), but it is simple enough to make me worry I'm missing something. Thoughts?
[...]

As long as there is no sharing of the RCO between threads, this looks like it should work. (But I'm no ARC expert, so don't take my word for it.) But if there's sharing, the story becomes drastically more complex.

T

--
Lottery: tax on the stupid. -- Slashdotter
Feb 27 2015
On Friday, 27 February 2015 at 18:24:27 UTC, Andrei Alexandrescu wrote:
> [...]
> So I was thinking of changing DIP74 as follows:
>
> * Caller does NOT insert an opAddRef for byval RCOs
> * Callee does NOT insert an opRelease for its byval RCO parameters
>
> It seems everything will just work with this change (including all move scenarios), but it is simple enough to make me worry I'm missing something. Thoughts?

I think it's fine. I couldn't even figure out the original motive for wanting to add those calls -- I thought it must have something to do with threads or exceptions or something, but even then I couldn't figure it out. Any reference argument will, by definition, outlive its function -- it can't possibly die within the function itself, since the caller still thinks it's a valid reference.

Another thing is that local references in general need not participate in reference counting. They will retain and release the reference automatically when they go in and out of scope. I'm really no expert (except that I like to study and think and by thinking become somewhat expert it appears), but if all ARC could be confined to global/heap <=> global/heap copies, you'd get the most efficient code.

And I'm not trying to advertise a reference tracking system :-), but the real hiccup is that a global reference can go *through* the stack and land back at a global... and you would need to keep track of that.
Feb 27 2015
On 2/27/15 1:24 PM, Andrei Alexandrescu wrote:
> [...]
> It seems everything will just work with this change (including all move scenarios), but it is simple enough to make me worry I'm missing something. Thoughts?

I recall saying something like this, and someone came up with a reason why you still have to add the calls. I'll see if I can dig it up.

OK, I found the offending issue. The problem arises when you pass a parameter and the only reference holding onto it is passed as well. Something like:

void foo(C c, C2 c2)
{
    c2.c = null; // this destroys 'c' unless you opAddRef it before passing
    c.someFunc(); // crash
}

void main()
{
    C c = new C; // ref counted class
    C2 c2 = new C2; // another ref counted class
    c2.c = c;
    foo(c, c2);
}

How does the compiler know in this case that it *does* have to opAddRef c before calling? Maybe your ARC expert can explain how that works.

BTW, Michel Fortin is who pointed this out.

-Steve
Feb 27 2015
On 2/27/15 3:30 PM, Steven Schveighoffer wrote:
> void main()
> {
>     C c = new C; // ref counted class
>     C2 c2 = new C2; // another ref counted class
>     c2.c = c;
>     foo(c, c2);
> }

Bleh, that was dumb.

void main()
{
    C2 c2 = new C2;
    c2.c = new C;
    foo(c2.c, c2);
}

Still same question. The issue here is how do you know that the reference that you are sure is keeping the thing alive is not going to release it through some back door.

-Steve
Feb 27 2015
On 2015-02-27 20:34:08 +0000, Steven Schveighoffer said:
> void main()
> {
>     C2 c2 = new C2;
>     c2.c = new C;
>     foo(c2.c, c2);
> }
>
> Still same question. The issue here is how do you know that the reference that you are sure is keeping the thing alive is not going to release it through some back door.

You have to retain 'c' for the duration of the call unless you can prove somehow that calling the function will not cause it to be released. You can prove it in certain situations:

- you are passing a local variable as a parameter and nobody has taken a mutable reference (or pointer) to that variable, or to the stack frame (be wary of nested functions accessing the stack frame)
- you are passing a global variable as a parameter to a pure function and aren't giving to that pure function a mutable reference to that variable.
- you are passing a member variable as a parameter to a pure function and aren't giving to that pure function a mutable reference to that variable or its class.

There are surely other cases, but you get the idea. These three situations are probably the most common, especially the first one. For instance, inside a member function, 'this' is a local variable and you will never pass it to another function by ref, so it's safe to call 'this.otherFunction()' without retaining 'this' first.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Feb 27 2015
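The hazard from the example above, and the caller-side retain that avoids it, can be sketched in runnable form. This is an illustration under stated assumptions, not how DIP74 would be implemented: the counts are maintained by hand, `setC`, `destroyed`, `callBorrowed` and `callRetained` are hypothetical names, and a flag stands in for actually freeing the object so that the demo can observe the problem instead of crashing.

```d
import std.stdio;

// Hand-written model of the hazard; all names are hypothetical.
class C
{
    int refs;
    bool destroyed;
    void opAddRef()  { ++refs; }
    void opRelease() { if (--refs == 0) destroyed = true; } // real code would free here
}

class C2
{
    C c;

    // Models the count updates a compiler would emit for 'this.c = next'.
    void setC(C next)
    {
        if (next !is null) next.opAddRef();
        if (c !is null) c.opRelease();
        c = next;
    }
}

// Returns whether 'c' was still alive at the point where the callee uses it.
bool foo(C c, C2 c2)
{
    c2.setC(null);        // drops the only owning reference to 'c'
    return !c.destroyed;  // stands in for c.someFunc()
}

// Borrow-only call: nothing keeps 'c' alive except the field that foo clears.
bool callBorrowed()
{
    auto c2 = new C2;
    c2.setC(new C);
    return foo(c2.c, c2);
}

// Caller retains for the duration of the call through a local it owns.
bool callRetained()
{
    auto c2 = new C2;
    c2.setC(new C);
    C tmp = c2.c;
    tmp.opAddRef();
    scope (exit) tmp.opRelease();
    return foo(tmp, c2);
}

void main()
{
    writeln("borrowed: alive during call = ", callBorrowed());
    writeln("retained: alive during call = ", callRetained());
}
```

In this model `callBorrowed()` reports the object as already destroyed when the callee uses it, while `callRetained()` reports it alive. The local `tmp` holds its own reference that the callee cannot reach, which is the situation described by the first of the provable cases listed above.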
On 2/27/15 1:02 PM, Michel Fortin wrote:
> You have to retain 'c' for the duration of the call unless you can prove somehow that calling the function will not cause it to be released.
> [...]

Thanks. So it seems we continue as we were with DIP74 and leave the rest to the implementation.

Andrei
Feb 27 2015
On Friday, 27 February 2015 at 21:21:08 UTC, Andrei Alexandrescu wrote:
> Thanks. So it seems we continue as we were with DIP74 and leave the rest to the implementation.

Still seems like a very significant performance penalty for such a strange case. It probably won't surprise you that I would suggest another parameter attribute to the rescue, e.g. `rcRelease`! Inter-function communication for the win!
Feb 27 2015
On Friday, 27 February 2015 at 21:21:08 UTC, Andrei Alexandrescu wrote:
> Thanks. So it seems we continue as we were with DIP74 and leave the rest to the implementation.

Hey, I don't think so. I think I figured it out. Keep track "in house" of which parameters get opReleased, and have the compiler insert opAddRef and opRelease at entry and exit to the function itself. No performance penalty, no parameter attribute, no nothin'. Just an in-house tracking mechanism. Eh???
Feb 28 2015
On 2/27/15 12:34 PM, Steven Schveighoffer wrote:
> void main()
> {
>     C2 c2 = new C2;
>     c2.c = new C;
>     foo(c2.c, c2);
> }
>
> Still same question. The issue here is how do you know that the reference that you are sure is keeping the thing alive is not going to release it through some back door.

Thanks! In ARC, there are autorelease pools that keep at least one reference to the objects they own. I think that's what they are for. So let me add a complete example:

class C
{
    void someFunc();
    void opAddRef();
    void opRelease();
}

class C2
{
    C c;
    void opAddRef();
    void opRelease();
}

void foo(C c, C2 c2)
{
    c2.c = null;
    c.someFunc(); // crash
}

void main()
{
    C2 c2 = new C2;
    c2.c = new C;
    foo(c2.c, c2);
}

Distinguishing these is an interesting problem. In fact we can reduce the matter to one class only:

class C
{
    C c;
    void someFunc();
    void opAddRef();
    void opRelease();
}

void foo(C c1, C c2)
{
    c2.c = null;
    c1.someFunc(); // crash
}

void main()
{
    C obj = new C;
    obj.c = new C;
    foo(obj.c, obj);
}

Andrei
Feb 27 2015
On 2/27/15 4:15 PM, Andrei Alexandrescu wrote:
> Thanks! In ARC, there are autorelease pools that keep at least one reference to the objects they own. I think that's what they are for.

No, that's not what they are for. They are for returning data without having to worry about the receiving function needing to release. This was really important for manual reference counting. Otherwise, you would always have to return a value with its count at 1, and put the onus on the receiving function to store the result and release it after using. One-liners would turn into huge nests of temporary variables.

I believe autorelease pools are not needed for ARC, but are maintained because much Objective-C code contains MRC, and that protocol needs to be supported.

I used them a lot in my Objective-C code. With ARC you can create autorelease pools, but you never do any autoRetain manually; the ARC system does it. Creating the pool just allows you to scope where data should be released. Otherwise, you are adding it to the event loop pool, which is only released after an event is done processing.

-Steve
Feb 27 2015
On 2015-02-27 21:33:51 +0000, Steven Schveighoffer said:
> I believe autorelease pools are not needed for ARC, but are maintained because much Objective-C code contains MRC, and that protocol needs to be supported.

Exactly.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Feb 27 2015
On 2/27/15 2:46 PM, Michel Fortin wrote:
> On 2015-02-27 21:33:51 +0000, Steven Schveighoffer said:
>> I believe autorelease pools are not needed for ARC, but are maintained because much Objective-C code contains MRC, and that protocol needs to be supported.
>
> Exactly.

OK, so at least in theory autorelease pools are not necessary for getting ARC to work? -- Andrei
Feb 27 2015
On Friday, 27 February 2015 at 23:06:26 UTC, Andrei Alexandrescu wrote:
> OK, so at least in theory autorelease pools are not necessary for getting ARC to work? -- Andrei

ARC needs them; this is part of the spec. You can have good RC without them IMO.
Feb 27 2015
On 2015-02-27 23:11:55 +0000, deadalnix said:
> On Friday, 27 February 2015 at 23:06:26 UTC, Andrei Alexandrescu wrote:
>> OK, so at least in theory autorelease pools are not necessary for getting ARC to work? -- Andrei
>
> ARC need them, this is part of the spec. You can have good RC without them IMO.

Apple's ARC needs autorelease pools to interact with Objective-C code. But if by ARC you just mean what the acronym stands for -- automatic reference counting -- there's no need for autorelease pools to implement ARC.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Feb 27 2015
On Saturday, 28 February 2015 at 02:55:14 UTC, Michel Fortin wrote:
> Apple's ARC needs autorelease pools to interact with Objective-C code. But if by ARC you just mean what the acronym stands for -- automatic reference counting -- there's no need for autorelease pools to implement ARC.

Hi, I'm the core language designer for ObjC ARC. I was pointed at this thread by a coworker.

ObjC ARC uses +0 conventions for parameters and results by default only because those are the conventions used by manual reference counting (MRC) in Objective-C. Those conventions developed that way because programmers were manually implementing them. ARC can only deviate from those conventions when it's certain that it knows about all callers and implementations; since individual files in a project can be selectively compiled in either ARC or MRC, and since Objective-C methods can be dynamically overridden and reflectively called, that would only be possible for static functions, and even then we might need thunks when taking their address.

A +0 result convention does have some benefits when implemented by programmers. Assuming you don't care about safety in the presence of data races, certain functions (getters, chiefly) can simply return "borrowed" references which the caller can use without retaining if they're very careful. This is almost completely impossible for a compiler to take effective advantage of, because programmers can make much more aggressive/unsafe assumptions about the behavior of code ("It's obvious that none of these calls can invalidate my borrowed reference."). And it creates the problem of what to do when you have to follow a +0 convention but have a naturally +1 result.

There is absolutely no reason to emulate what ObjC ARC does with autoreleased results. It's bad for performance in about half-a-dozen different ways, and the trick we use to avoid actual autoreleases is extremely brittle.

If you really care about the borrowed result optimization, you can use a dynamic convention, where you also return a flag saying whether the result is borrowed; it then becomes a neat optimization problem to actually take advantage of that. That was never an option for ARC because it's not MRC compatible. I would be very concerned about the code-size impact of doing this for arbitrary calls, but you could consider selectively using it for getters.

Parameters are a different story, and you can make a case either way.

On the caller side, the function got the argument reference from somewhere, probably by constructing it or calling a function that returned it. Even if the reference was loaded from memory, it may need to be retained for safety's sake if the memory is mutable. So the caller generally owns a retain of the argument. A +1 convention allows that reference to simply be forwarded without extra work in the common case that it's used in exactly one place. The disadvantage is that, if the reference is used multiple times, it may need to be retained multiple times just to balance the convention.

On the callee side, the language needs to guarantee that the object stays valid as long as it's being used within the function. In a +0 convention, you can have the caller make that guarantee; of course, that means the caller will always have to retain unless it's able to forward a similar guarantee from somewhere else. Without this guarantee, in a +0 convention the callee's probably going to need to retain anyway. (Unfortunately, in Objective-C the caller does not make this guarantee.)

You can imagine situations where any one of these three conventions (+1, +0 guaranteed, +0 non-guaranteed) is the most profitable. I tend to prefer a +1 convention because of its impact on common, straight-line code. A +0 non-guaranteed convention is very nice for higher-order algorithms on arrays because you can briefly borrow the reference from the array and let the callee decide whether it needs to retain. If the callee is some lightweight function like a sort comparator, it probably doesn't need to. But the convention is awful for more complex code because it frequently forces both sides to own a retain. A +0 guaranteed convention avoids creating redundant work for values used multiple times, but it does prevent a reference from being "forwarded": if you allocate it in the caller, and then store it somewhere in the callee, you're going to need a redundant retain. Consider using this for select arguments like the "this" argument of a method.

All of this analysis assumes that you have some built-in optimization of retain and release operations.

I probably won't watch this thread, but feel free to email me if you have further questions.
Mar 06 2015
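The trade-off between the +1 and +0 guaranteed parameter conventions described above can be made concrete by counting retain/release calls in two scenarios: forwarding a freshly allocated object into storage, and using one reference in two calls. The sketch below is a rough model under assumptions, not a description of ObjC ARC or of any D implementation: `Node`, `stored`, `traffic` and all helper functions are hypothetical names, the counting calls are placed by hand, and no optimization of retain/release pairs is modeled.

```d
import std.stdio;

// Hypothetical model; retain/release calls are written by hand where a
// compiler would insert them under each convention.
class Node
{
    int refs = 1;        // the creator owns one reference (+1)
    static int traffic;  // number of retain/release calls, for the demo only

    void opAddRef()  { ++refs; ++traffic; }
    void opRelease() { --refs; ++traffic; }
}

Node stored;  // stands in for a member or global the callee saves into
              // (releasing a previously stored value is omitted for brevity)

// +1 convention: the callee receives ownership of one reference.
void storeOwned(Node n)   { stored = n; }     // ownership moves into 'stored'
void inspectOwned(Node n) { n.opRelease(); }  // must drop what it was given

// +0 guaranteed convention: the caller keeps the object alive during the call.
void storeBorrowed(Node n)   { n.opAddRef(); stored = n; } // retain for the stored copy
void inspectBorrowed(Node n) { }                            // no count traffic

// Scenario 1: allocate in the caller, store in the callee.
int forwardOwned()
{
    Node.traffic = 0;
    auto n = new Node;
    storeOwned(n);       // the caller's reference is forwarded as-is
    return Node.traffic;
}

int forwardBorrowed()
{
    Node.traffic = 0;
    auto n = new Node;
    storeBorrowed(n);
    n.opRelease();       // the caller drops its own reference afterwards
    return Node.traffic;
}

// Scenario 2: use the same reference in two calls.
int reuseOwned()
{
    Node.traffic = 0;
    auto n = new Node;
    n.opAddRef();        // extra retain to balance the first call
    inspectOwned(n);
    inspectOwned(n);     // the last use forwards the caller's reference
    return Node.traffic;
}

int reuseBorrowed()
{
    Node.traffic = 0;
    auto n = new Node;
    inspectBorrowed(n);
    inspectBorrowed(n);
    n.opRelease();       // one release when the caller is done
    return Node.traffic;
}

void main()
{
    writeln("store once: +1 = ", forwardOwned(), ", +0 guaranteed = ", forwardBorrowed());
    writeln("use twice:  +1 = ", reuseOwned(), ", +0 guaranteed = ", reuseBorrowed());
}
```

In this model the +1 convention needs no count calls when a reference is forwarded and stored, against two for +0 guaranteed, while using the reference twice costs three calls under +1 and one under +0 guaranteed. The numbers only show the direction of the trade-off; a real compiler would also be expected to remove some of these pairs.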
On Friday, 27 February 2015 at 20:30:20 UTC, Steven Schveighoffer wrote:
> OK, I found the offending issue. It's when you pass a parameter, the only reference holding onto it may be also passed as well. Something like:
>
> void foo(C c, C2 c2)
> {
>     c2.c = null; // this destroys 'c' unless you opAddRef it before passing
>     c.someFunc(); // crash
> }
>
> void main()
> {
>     C c = new C; // ref counted class
>     C2 c2 = new C2; // another ref counted class
>     c2.c = c;
>     foo(c, c2);
> }
>
> How does the compiler know in this case that it *does* have to opAddRef c before calling? Maybe your ARC expert can explain how that works.

Split-passing nested ref-counted classes with null loads! How insidious!
Feb 27 2015