digitalmars.D.learn - opApply and that int

Bill Baxter <dnewsgroup billbaxter.com> writes:

I know we've been through this before but I don't recall the conclusion.

Why do we have to pass an int through our opApply functions?

Given an object.opApply that takes a delegate that takes a ref T,
and code like this:

foreach(T x; object) {
      if (x) break;
      if (condition) return Something;
      do_something;
}

the compiler transforms that into something like this:

RType _fn_ret; // (RType is return type of enclosing function)
int _loop_body(ref T x)
{
    if (x) return BREAK;
    if (condition) { _fn_ret = Something; return RETURN; }
    do_something;
    return 0;
}
int _ret = object.opApply(&_loop_body);
if (_ret==RETURN) return _fn_ret;
else if (_ret==GOTO) goto ??;
// maybe some other cases...


My question is this: _loop_body and the caller of opApply share the same 
enclosing scope, so why not stick the return code in a local variable 
both can see?  It already seems to do that for return values (as 
far as I can tell from reading dmd/src/dmd/statement.c).  So why not do 
it for the main return code too and generate code like this:

RType _fn_ret;
int _ret = 0;
void _loop_body(ref T x)
{
    _ret = 0;
    if (x) { _ret = BREAK; return; }
    if (condition) { _fn_ret = Something; _ret = RETURN; return; }
    do_something;
}
object.opApply(&_loop_body);
if (_ret==RETURN) return _fn_ret;
else if (_ret==GOTO) goto ??;
// maybe some other cases...


Why oh why does that int have to go traipsing through *my* opApply?
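
For reference, a minimal sketch of the protocol as it stands in current D. The `Numbers` container and the `firstAbove` function are made up for illustration and are not from this thread. The delegate's int has to be checked after every call and handed back unchanged, otherwise a `break` or `return` inside the foreach body would not stop the iteration.

```d
// Hypothetical container, used only to illustrate the protocol.
struct Numbers
{
    int[] items;

    // The int returned by the loop body must be checked and passed back as-is.
    int opApply(scope int delegate(ref int) loopBody)
    {
        foreach (ref item; items)
        {
            if (auto result = loopBody(item))
                return result; // non-zero: the body did a break, return or goto
        }
        return 0; // the loop ran to completion
    }
}

// Returns the first element greater than limit, or -1 if there is none.
int firstAbove(Numbers numbers, int limit)
{
    foreach (ref int x; numbers)
    {
        if (x > limit)
            return x; // leaves opApply through the non-zero status code
    }
    return -1;
}

void main()
{
    auto numbers = Numbers([1, 5, 9]);
    assert(firstAbove(numbers, 3) == 5);
    assert(firstAbove(numbers, 10) == -1);
}
```

Nothing in the body of `firstAbove` mentions the int; it only shows up in the hand-written `opApply`, which is the part being complained about here.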

--bb

Jan 04 2008

Bill Baxter <dnewsgroup billbaxter.com> writes:

Ok, Jason poked me into realizing that I completely forgot that the 
user's opApply has to know to return when the loop body does a break or 
something.  So with what I just proposed it would still have to check 
for a non-zero return code, *BUT* it wouldn't have to return it to the 
caller.  So opApplys could become:

      void opApply(int delegate(ref T) loop_body) {
            for(/*x in elements*/) {
                 if (loop_body(x)) return;
            }
      }

At least then users don't have to handle radioactive materials.

Still I'd love to get rid of that int in front of the delegate too and 
just have something like:

      void opApply(void delegate(ref T) loop_body) {
            for(/*x in elements*/) {
                 loop_body(x);
                 yield();
            }
      }

The trouble is figuring out how to make yield do its magic.
Macros I guess will make it possible to have yield actually return from 
the function.  But I don't see a good way to communicate the current 
loop state to yield().  Yield could maybe know about the stack layouts 
and the code that calls opApply could be careful to put the "int _ret" 
variable in a place on the stack that yield() could always reach up to 
find it.  Yield would be doing tricky non-portable stuff, but the idea 
is it would be included as part of something low-level like object.d, so 
non-portable would be ok.  Unfortunately if you call yield in a 
non-opApply callback situation it could just do bogus stuff and probably 
couldn't even warn you that what you were doing was bogus.

--bb

Jan 04 2008

BCS <ao pathlink.com> writes:

Reply to Bill,


> Ok, Jason poked me into realizing that I completely forgot that the
> user's opApply has to know to return when the loop body does a break
> or something.  So with what I just proposed it would still have to
> check for a non-zero return code, *BUT* it wouldn't have to return it
> to the caller.  So opApplys could become:
>

how about this:

void opApply( /**/ bool /**/  delegate(ref T) loop_body) {
  for(/*x in elements*/)
  {
    if (loop_body(x)) return;
  }
}


> Unfortunately if you call yield in a
> non-opApply callback situation it could just do bogus stuff and
> probably
> couldn't even warn you that what you were doing was bogus.

yield(loop_body(x));  // can check stuff about loop_body

but that still has the issue of:

MyObject mo;
mo.opApply((ref T x){something(x);});  // call to opApply directly


> --bb

Jan 04 2008

Bill Baxter <dnewsgroup billbaxter.com> writes:

Warning, long post, but in the end I think I actually came up with a 
pretty decent way to make opApply code cleaner without requiring any 
funky special casing or hacks, and without breaking legacy code.

So please read!

BCS wrote:
> Reply to Bill,
>
>> Ok, Jason poked me into realizing that I completely forgot that the
>> user's opApply has to know to return when the loop body does a break
>> or something.  So with what I just proposed it would still have to
>> check for a non-zero return code, *BUT* it wouldn't have to return it
>> to the caller.  So opApplys could become:
>
> how about this:
>
> void opApply( /**/ bool /**/  delegate(ref T) loop_body) {
>   for(/*x in elements*/)
>   {
>     if (loop_body(x)) return;
>   }
> }

Oh, right :-)  A bool would be the way to go.

>> Unfortunately if you call yield in a
>> non-opApply callback situation it could just do bogus stuff and
>> probably
>> couldn't even warn you that what you were doing was bogus.
>
> yield(loop_body(x));  // can check stuff about loop_body
>
> but that still has the issue of:
>
> MyObject mo;
> mo.opApply((ref T x){something(x);});  // call to opApply directly

Hmm, maybe this is what you were getting at when you said "can check 
stuff", but it just occurred to me that loop_body is a delegate whose 
context pointer points to the stack frame where _ret lives.  So we have 
access to the appropriate stack frame, we just don't know
(A) the right offset for _ret or
(B) if there even *is* a _ret in that context (as there wouldn't be for 
a direct call to the opApply)

So we could make that work if we could somehow pass opApply an int* that 
points to _ret.  But then users would have access to that radioactive 
int* which isn't any better than what we started with.

Ok, let's face it, though.  Currently the type of delegate that you pass 
to a foreach (be it opApply or some other method) really is not 
particularly useful for anything other than being called by foreach.  If 
you don't call it via a foreach, you have to carefully construct a 
loop_body that handles the int return code properly.  And this is a far 
far *far* less common thing than writing an opApply.  So I think it's 
acceptable to make calling opApply and writing an opApply delegate 
parameter more complex, in order to make writing the opApply itself 
simpler and safer.

So a new template and a new macro in object.d are the answer.

The template just bundles an int* (pointer to _ret) together with the 
loop body delegate:

struct Apply(Args...) {
     alias void delegate(Args) LoopBody;
     LoopBody _loop_body;
     int* _ret = null;

     void _call(Args a) {
         _loop_body(a); // may set *_ret!
     }
}

the macro is this:

macro yield(dg, args...) {
     dg._call(args);
     if (dg._ret && *dg._ret) { return; }
}

and then opApply-like functions can become:

void opApply( Apply!(ref T) dg ) {
    for( /*T x in elements*/ ) {
        yield(dg,x);
    }
}

Now the trickiness is *all* shifted to how you call such a beast 
properly.  For a foreach in a void function, the compiler will have to 
generate code like so:

int _ret = 0;
void _loop_body(ref T x)
{
    _ret = 0;
    if (x) { _ret = BREAK; return; }
    if (condition) { _ret = RETURN; return; }
    do_something;
}
object.opApply( Apply!(ref T)(&_loop_body, &_ret) );
if (_ret==RETURN) return;



The language can ALMOST do this today except for three small things:
1) No macros - but they're on the way!
2) Inability to preserve ref-ness of template arguments -- this really 
needs to be solved one way or another regardless.
3) The necessary changes to the foreach code gen -- this is 
straightforward.


Attached is a proof of concept demo.  I've manually inlined the yield() 
code to work around 1), and made the loop body use a non-ref type to 
work around 2).

So what do you think?  The biggest problems I see are
1) the code breakage, but D2.0 is all about breaking code to make things 
better!  Furthermore the signatures of the opApplys are different so the 
compiler could very well continue to generate code the old way for any 
opApply written in the old style.  So actually very little code has to 
break, if any.
2) (maybe the bigger of the two) Walter has never acknowledged that he 
sees anything wrong with making users pass around a magic int in their 
opApplys.


--bb

Jan 04 2008

Bill Baxter <dnewsgroup billbaxter.com> writes:

A slightly more streamlined version of the demo.

* The Apply._call method was unnecessary baggage. yield() can just call 
dg._loop_body directly.

* Resetting *_ret to 0 on every iteration was unnecessary.

* Allowing for _ret to be a null pointer was unnecessary.  That was just 
intended to make it easier for users to call opApply directly.  But 
realistically, there's no reason for users to ever do that.  But if they 
really really want to they still can; they just have to supply that int 
pointer.

(Note: This code is free for anyone to use for whatever purpose they like)
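
A rough sketch of what such a streamlined demo could look like in present-day D, under several assumptions that are not from the thread. `yield` is inlined by hand because there are no macros. The element is passed by value because `ref` cannot be carried through a template argument. The method is named `apply` rather than `opApply` so that the compiler's own foreach lowering does not get involved. `Numbers`, `firstAbove`, `sumUntilNegative` and the `BREAK`/`RETURN` values are invented for illustration.

```d
// Bundles the loop body with a pointer to the caller's status variable.
struct Apply(Args...)
{
    alias LoopBody = void delegate(Args);
    LoopBody loopBody;
    int* ret; // assumed non-null, as in the streamlined version
}

enum BREAK = 1;  // hypothetical status codes
enum RETURN = 2;

// Hypothetical container, used only for illustration.
struct Numbers
{
    int[] items;

    // No int is returned or forwarded by the author of the iteration.
    void apply(Apply!int dg)
    {
        foreach (item; items)
        {
            // Hand-inlined yield(dg, item):
            dg.loopBody(item);
            if (*dg.ret) return;
        }
    }
}

// Hand-written stand-in for what the compiler would generate for
//   foreach (int x; numbers) { if (x > limit) return x; }
int firstAbove(Numbers numbers, int limit)
{
    int fnRet;
    int ret = 0;
    void loopBody(int x)
    {
        if (x > limit) { fnRet = x; ret = RETURN; return; }
    }
    numbers.apply(Apply!int(&loopBody, &ret));
    if (ret == RETURN) return fnRet;
    return -1;
}

// Hand-written stand-in for
//   foreach (int x; numbers) { if (x < 0) break; total += x; }
int sumUntilNegative(Numbers numbers)
{
    int total;
    int ret = 0;
    void loopBody(int x)
    {
        if (x < 0) { ret = BREAK; return; }
        total += x;
    }
    numbers.apply(Apply!int(&loopBody, &ret));
    return total;
}

void main()
{
    assert(firstAbove(Numbers([1, 5, 9]), 3) == 5);
    assert(firstAbove(Numbers([1, 5, 9]), 10) == -1);
    assert(sumUntilNegative(Numbers([1, 2, -1, 4])) == 3);
}
```

The status code is written and read only by the two hand-written callers, which stand in for compiler-generated code. `apply` tests `*dg.ret` to know when to stop but never has to return it.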

Jan 04 2008
