
digitalmars.D.learn - opApply and that int

Bill Baxter <dnewsgroup billbaxter.com> writes:
I know we've been through this before but I don't recall the conclusion.

Why do we have to pass an int through our opApply functions?

Given an object.opApply that takes a delegate that takes a ref T,
and code like this:

foreach(T x; object) {
      if (x) break;
      if (condition) return Something;
      do_something;
}

the compiler transforms that into something like this:

RType _fn_ret; // (RType is return type of enclosing function)
int _loop_body(ref T x)
{
    if (x) return BREAK;
    if (condition) { _fn_ret = Something; return RETURN; }
    do_something;
    return 0;
}
int _ret = object.opApply(&_loop_body);
if (_ret==RETURN) return _fn_ret;
else if (_ret==GOTO) goto ??;
// maybe some other cases...
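
For comparison, here is a minimal runnable sketch of the convention as it stands, where the user-written opApply has to check the delegate's int and hand it back unchanged. The `Numbers` struct and `firstAbove` function are hypothetical names invented for this illustration; the actual BREAK/RETURN code values are chosen by the compiler and are not shown.

```d
import std.stdio;

// Hypothetical container used only for this illustration.
struct Numbers
{
    int[] items;

    // Conventional signature: the delegate's int result must be checked
    // and passed back to the caller unchanged.
    int opApply(scope int delegate(ref int) loopBody)
    {
        foreach (ref item; items)
        {
            if (auto result = loopBody(item))
                return result; // nonzero: the body did a break, return, ...
        }
        return 0; // the loop ran to completion
    }
}

// Returns the first element greater than limit, or -1 if there is none.
int firstAbove(Numbers numbers, int limit)
{
    foreach (ref int x; numbers)
    {
        if (x > limit)
            return x; // lowered to "store the value, return nonzero"
    }
    return -1;
}

void main()
{
    auto numbers = Numbers([1, 5, 9, 12]);
    writeln(firstAbove(numbers, 6));   // 9
    writeln(firstAbove(numbers, 100)); // -1
}
```

Forgetting the `return result;` line is the usual way to get this wrong: the early `return x` in `firstAbove` would then no longer stop the iteration.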


My question is this: _loop_body and the caller of opApply share the same 
enclosing scope, so why not stick the return code in a local variable 
both can see?  It already seems to do that for return values (as 
far as I can tell from reading dmd/src/dmd/statement.c).  So why not do 
it for the main return code too and generate code like this:

RType _fn_ret;
int _ret = 0;
void _loop_body(ref T x)
{
    _ret = 0;
    if (x) { _ret = BREAK; return; }
    if (condition) { _fn_ret = Something; _ret = RETURN; return; }
    do_something;
}
object.opApply(&_loop_body);
if (_ret==RETURN) return _fn_ret;
else if (_ret==GOTO) goto ??;
// maybe some other cases...


Why oh why does that int have to go traipsing through *my* opApply?

--bb
Jan 04 2008
Bill Baxter <dnewsgroup billbaxter.com> writes:
Ok, Jason poked me into realizing that I completely forgot that the
user's opApply has to know to return when the loop body does a break
or something.  So with what I just proposed it would still have to
check for a non-zero return code, *BUT* it wouldn't have to return it
to the caller.  So opApplys could become:

void opApply(int delegate(ref T) loop_body)
{
    for(/*x in elements*/) {
        if (loop_body(x)) return;
    }
}

At least then users don't have to handle radioactive materials.

Still I'd love to get rid of that int in front of the delegate too and
just have something like:

void opApply(void delegate(ref T) loop_body)
{
    for(/*x in elements*/) {
        loop_body(x);
        yield();
    }
}

The trouble is figuring out how to make yield do its magic.  Macros I
guess will make it possible to have yield actually return from the
function.  But I don't see a good way to communicate the current loop
state to yield().

Yield could maybe know about the stack layouts and the code that calls
opApply could be careful to put the "int _ret" variable in a place on
the stack that yield() could always reach up to find it.  Yield would be
doing tricky non-portable stuff, but the idea is it would be included as
part of something low-level like object.d, so non-portable would be ok.

Unfortunately if you call yield in a non-opApply callback situation it
could just do bogus stuff and probably couldn't even warn you that what
you were doing was bogus.

--bb
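
The first of these two ideas can be tried by hand. In the sketch below the status code lives in a local that both the loop body and the caller can see, and the iterating function only checks for non-zero without returning anything. This is an assumption-laden illustration: the method is called `each` rather than opApply because the compiler's foreach lowering requires opApply to return int, and `Numbers`, `BREAK`, `RETURN`, `firstAbove` and `sumBeforeNegative` are hypothetical names.

```d
import std.stdio;

enum BREAK = 1;  // hypothetical status codes for this sketch
enum RETURN = 2;

struct Numbers
{
    int[] items;

    // Checks the delegate's result but has nothing to hand back.
    void each(scope int delegate(ref int) loopBody)
    {
        foreach (ref item; items)
        {
            if (loopBody(item))
                return; // stop iterating
        }
    }
}

// Hand-lowered: foreach (x; numbers) if (x > limit) return x;
int firstAbove(Numbers numbers, int limit)
{
    int fnRet;   // plays the role of _fn_ret
    int ret = 0; // plays the role of _ret

    int loopBody(ref int x)
    {
        if (x > limit) { fnRet = x; ret = RETURN; return ret; }
        return 0;
    }

    numbers.each(&loopBody);
    if (ret == RETURN)
        return fnRet;
    return -1;
}

// Hand-lowered: foreach (x; numbers) { if (x < 0) break; total += x; }
int sumBeforeNegative(Numbers numbers)
{
    int total = 0;
    int ret = 0;

    int loopBody(ref int x)
    {
        if (x < 0) { ret = BREAK; return ret; }
        total += x;
        return 0;
    }

    numbers.each(&loopBody);
    return total;
}

void main()
{
    auto numbers = Numbers([3, 4, -1, 10]);
    writeln(firstAbove(numbers, 3));     // 4
    writeln(sumBeforeNegative(numbers)); // 7
}
```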
Jan 04 2008
BCS <ao pathlink.com> writes:
Reply to Bill,


 Ok, Jason poked me into realizing that I completely forgot that the
 user's opApply has to know to return when the loop body does a break
 or something.  So with what I just proposed it would still have to
 check for a non-zero return code, *BUT* it wouldn't have to return it
 to the caller.  So opApplys could become:
 
how about this:

void opApply( /**/ bool /**/ delegate(ref T) loop_body)
{
    for(/*x in elements*/) {
        if (loop_body(x)) return;
    }
}

 Unfortunately if you call yield in a
 non-opApply callback situation it could just do bogus stuff and
 probably
 couldn't even warn you that what you were doing was bogus.
yield(loop_body(x)); // can check stuff about loop_body

but that still has the issue of:

MyObject mo;
mo.opApply((ref T x){something(x);}); // call to opApply directly
 --bb
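
A small sketch of the bool variant together with the direct-call case raised above. It assumes a hypothetical `each` method (not opApply, since foreach lowering expects an int-returning opApply) and a hypothetical `Numbers` struct. With a direct call, the delegate is written by the user, so the stop/continue protocol has to be followed by hand.

```d
import std.stdio;

struct Numbers
{
    int[] items;

    // Hypothetical iteration method: true from the loop body means "stop".
    void each(scope bool delegate(ref int) loopBody)
    {
        foreach (ref item; items)
        {
            if (loopBody(item))
                return;
        }
    }
}

// Adds up elements until the running total reaches cap.
int sumUpTo(Numbers numbers, int cap)
{
    int total = 0;
    // Direct call with a hand-written delegate literal.
    numbers.each((ref int x) {
        total += x;
        return total >= cap; // true stops the iteration
    });
    return total;
}

void main()
{
    auto numbers = Numbers([3, 4, 5, 6]);
    writeln(sumUpTo(numbers, 10));  // 12
    writeln(sumUpTo(numbers, 100)); // 18
}
```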
 
Jan 04 2008
Bill Baxter <dnewsgroup billbaxter.com> writes:
Warning, long post, but in the end I think I actually came up with a 
pretty decent way to make opApply code cleaner without requiring any 
funky special casing or hacks, and without breaking legacy code.

So please read!

BCS wrote:
 Reply to Bill,
 
 
 Ok, Jason poked me into realizing that I completely forgot that the
 user's opApply has to know to return when the loop body does a break
 or something.  So with what I just proposed it would still have to
 check for a non-zero return code, *BUT* it wouldn't have to return it
 to the caller.  So opApplys could become:
 how about this:

 void opApply( /**/ bool /**/ delegate(ref T) loop_body)
 {
     for(/*x in elements*/) {
         if (loop_body(x)) return;
     }
 }
Oh, right :-) A bool would be the way to go.
 Unfortunately if you call yield in a
 non-opApply callback situation it could just do bogus stuff and
 probably
 couldn't even warn you that what you were doing was bogus.
 yield(loop_body(x)); // can check stuff about loop_body

 but that still has the issue of:

 MyObject mo;
 mo.opApply((ref T x){something(x);}); // call to opApply directly
Hmm, maybe this is what you were getting at when you said "can check
stuff", but it just occurred to me that loop_body is a delegate whose
context pointer points to the stack frame where _ret lives.  So we have
access to the appropriate stack frame, we just don't know (A) the right
offset for _ret or (B) if there even *is* a _ret in that context (as
there wouldn't be for a direct call to the opApply).

So we could make that work if we could somehow pass opApply an int* that
points to _ret.  But then users would have access to that radioactive
int* which isn't any better than what we started with.

Ok, let's face it, though.  Currently the type of delegate that you pass
to a foreach (be it opApply or some other method) really is not
particularly useful for anything other than being called by foreach.  If
you don't call it via a foreach, you have to carefully construct a
loop_body that handles the int return code properly.  And this is a far
far *far* less common thing than writing an opApply.  So I think it's
acceptable to make calling opApply and writing an opApply delegate
parameter more complex, in order to make writing the opApply itself
simpler and safer.

So, a new template and a new macro in object.d are the answer.  The
template just bundles an int* (pointer to _ret) together with the loop
body delegate:

struct Apply(Args...)
{
    alias void delegate(Args) LoopBody;
    LoopBody _loop_body;
    int* _ret = null;
    void _call(Args a) {
        _loop_body(a); // may set *_ret!
    }
}

the macro is this:

macro yield(dg, args...) {
    dg._call(args);
    if (dg._ret && *dg._ret) {
        return;
    }
}

and then opApply-like functions can become:

void opApply( Apply!(ref T) dg )
{
    for( /*T x in elements*/ ) {
        yield(dg,x);
    }
}

Now the trickiness is *all* shifted to how you call such a beast
properly.  For a foreach in a void function, the compiler will have to
generate code like so:

int _ret = 0;
void _loop_body(ref T x)
{
    _ret = 0;
    if (x) { _ret = BREAK; return; }
    if (condition) { _ret = RETURN; return; }
    do_something;
}
object.opApply( Apply!(ref T)(&_loop_body, &_ret) );
if (_ret==RETURN) return;

The language can ALMOST do this today except for three small things:
1) No macros - but they're on the way!
2) Inability to preserve ref-ness of template arguments -- this really
   needs to be solved one way or another regardless.
3) The necessary changes to the foreach code gen -- this is
   straightforward.

Attached is a proof of concept demo.  I've manually inlined the yield()
code to work around 1), and made the loop body use a non-ref type to
work around 2).

So what do you think?

The biggest problems I see are
1) the code breakage, but D2.0 is all about breaking code to make things
   better!  Furthermore the signatures of the opApplys are different so
   the compiler could very well continue to generate code the old way
   for any opApply written in the old style.  So actually very little
   code has to break, if any.
2) (maybe the bigger of the two) Walter has never acknowledged that he
   sees anything wrong with making users pass around a magic int in
   their opApplys.

--bb
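
The attached demo is not included in this text, so the following is an independent sketch of the Apply idea rather than the original proof of concept. Assumptions: the template takes a single element type `T` instead of `Args...`, the yield step is written out by hand (D as it exists has no `macro` construct), the iterating method is called `each` instead of opApply, and `Numbers`, `BREAK` and `doubleUntil` are hypothetical names.

```d
import std.stdio;

enum BREAK = 1; // hypothetical status code for this sketch

// Bundles the loop body with a pointer to the caller's status variable.
struct Apply(T)
{
    void delegate(ref T) loopBody;
    int* ret; // may be null when there is no lowered foreach behind it

    void call(ref T arg) { loopBody(arg); } // may set *ret
}

struct Numbers
{
    int[] items;

    void each(Apply!int dg)
    {
        foreach (ref item; items)
        {
            // What yield(dg, item) would expand to:
            dg.call(item);
            if (dg.ret && *dg.ret)
                return;
        }
    }
}

// Hand-written version of the code a compiler might generate for
//   foreach (ref x; numbers) { if (x == stopAt) break; x *= 2; }
void doubleUntil(ref Numbers numbers, int stopAt)
{
    int ret = 0;

    void loopBody(ref int x)
    {
        ret = 0;
        if (x == stopAt) { ret = BREAK; return; }
        x *= 2;
    }

    numbers.each(Apply!int(&loopBody, &ret));
}

void main()
{
    auto numbers = Numbers([1, 2, 3, 4]);
    doubleUntil(numbers, 3);
    writeln(numbers.items); // [2, 4, 3, 4]
}
```

Note that `each` itself never names a status code; it only tests `*dg.ret` for non-zero, which is the point of the proposal.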
Jan 04 2008
Bill Baxter <dnewsgroup billbaxter.com> writes:
A slightly more streamlined version of the demo.

* The Apply._call method was unnecessary baggage. yield() can just call 
dg._loop_body directly.

* Resetting *_ret to 0 on every iteration was unnecessary.

* Allowing for _ret to be a null pointer was unnecessary.  That was just 
intended to make it easier for users to call opApply directly, but 
realistically there's no reason for users to ever do that.  If they 
really, really want to, they still can; they just have to supply that int 
pointer.

(Note: This code is free for anyone to use for whatever purpose they like)
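
The streamlined demo itself does not appear in this text. As a rough sketch of what the three simplifications above amount to (not the original attachment), assuming a single-type `Apply(T)`, a free function `each` standing in for opApply, and hypothetical names `RETURN` and `firstAbove`:

```d
import std.stdio;

enum RETURN = 2; // hypothetical status code for this sketch

// Streamlined bundle: no call wrapper, and ret must not be null.
struct Apply(T)
{
    void delegate(ref T) loopBody;
    int* ret;
}

// Stand-in for an opApply-like function; yield is written out by hand.
void each(int[] items, Apply!int dg)
{
    foreach (ref item; items)
    {
        dg.loopBody(item); // called directly, no wrapper method
        if (*dg.ret)       // no null check
            return;
    }
}

// Hand-lowered form of: foreach (x; items) if (x > limit) return x;
int firstAbove(int[] items, int limit)
{
    int fnRet;
    int ret = 0; // set once here, never reset per iteration

    void loopBody(ref int x)
    {
        if (x > limit) { fnRet = x; ret = RETURN; }
    }

    each(items, Apply!int(&loopBody, &ret));
    return ret == RETURN ? fnRet : -1;
}

void main()
{
    writeln(firstAbove([1, 5, 9, 12], 6));   // 9
    writeln(firstAbove([1, 5, 9, 12], 100)); // -1
}
```

Dropping the per-iteration reset is safe here because iteration stops the first time `ret` becomes non-zero, so it is still 0 whenever the body runs.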
Jan 04 2008