digitalmars.D - Proposal: Hide the int in opApply from the user

reply Bill Baxter <dnewsgroup billbaxter.com> writes:

I proposed this initially over in D.learn, but I'm cleaning it up and 
reposting here in hopes of getting some response from Walter who was 
probably too busy finishing const and eating holiday Turkey at the time 
to notice.  And rightly so.

I don't believe it's appropriate in a high-level supposedly clean 
language like D that one of the main facilities for iterating over user 
types (foreach) requires writing code that passes around magic values 
generated by the compiler (opApply).

It seems wrong to me that these magic values
- come from code generated by the compiler,
- must be handled exactly the proper way by the user's opApply
    (or else you get undefined behavior, but no compiler errors)
- and then are handed back to code also generated by the compiler.

Furthermore, the compiler-generated code in the first and last steps 
share the same scope!  So the solution seems obvious -- they should pass 
the information back and forth using a local variable in their shared 
local scope.

In this proposal we need to add two things:

1) a new template struct and
2) a new macro [yes, this proposal relies on macros which don't exist yet!]


The template just bundles an int* (pointer to _ret) together with the 
loop body delegate:

   struct Apply(Args...)
   {
     alias void delegate(Args) LoopBody;
     LoopBody _loop_body;
     int* _ret = null;
   }

the macro is this (just guessing what syntax will be, and hoping macros 
will support tuple-like varargs):

   macro yield(dg, args...) {
     dg._loop_body(args);
     if (dg._ret && *dg._ret) { return; }
   }
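
Until macros exist, a string mixin can approximate what yield is meant to expand to. The following is only a sketch under assumptions: it targets a current D compiler rather than the 2008 one, the helper `yield` that returns source text, the free function `applyTo` and the driver `countVisited` are hypothetical names, and the loop body takes its argument by value.

```d
// Same shape as the Apply struct in the post.
struct Apply(Args...)
{
    void delegate(Args) _loop_body;
    int* _ret = null;
}

// Hypothetical stand-in for the macro: returns the statements that
// yield(dg, args) is supposed to expand to, as source text.
string yield(string dg, string args)
{
    return dg ~ "._loop_body(" ~ args ~ ");"
         ~ " if (" ~ dg ~ "._ret && *" ~ dg ~ "._ret) { return; }";
}

// Plays the role of a new-style opApply over an int array.
void applyTo(int[] elements, Apply!(int) dg)
{
    foreach (x; elements)
    {
        mixin(yield("dg", "x")); // call the body, return if it asked to stop
    }
}

// Hand-written caller: counts how many elements the body saw.
int countVisited(int[] elements, int stopAt)
{
    int visited = 0;
    int _ret = 0;
    void _loop_body(int x)
    {
        ++visited;
        if (x == stopAt) _ret = 1; // nonzero means "stop iterating"
    }
    applyTo(elements, Apply!(int)(&_loop_body, &_ret));
    return visited;
}
```

The mixin is noisier at the call site than a macro would be, since the arguments have to be passed as strings.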

With these two library additions, opApply functions can become this:

void opApply( Apply!(ref T) dg ) {
    for( /*T x in elements*/ ) {
        yield(dg,x);
    }
}

Now the trickiness is *all* shifted to how you call such a beast 
properly, which is all handled by the compiler.  For a foreach in a void 
function, the compiler will have to generate code like so:

     int _ret = 0;
     void _loop_body(/*ref*/ T x)
     {
         writefln("x is ", x);
         if (x=="two") { _ret = BREAK; return; }
         if (x=="three") { _ret = RETURN; return; }
         do_something;
     }
     obj.opApply( Apply!(T)(&_loop_body, &_ret) );
     if (_ret==RETURN) return;


The language can ALMOST do this today except for three small things:
1) No macros - but they're on the way!
2) Inability to preserve ref-ness of template arguments -- but I think 
this really needs to be solved one way or another regardless.
3) The necessary changes to the foreach code gen -- this is 
straightforward.


Attached is a proof of concept demo.  I've manually inlined the yield() 
code to work around 1), and made the loop body use a non-ref type to 
work around 2).  I manually generated the foreach code too to deal with 3).
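
For readers without the attachment, here is a rough sketch of what such a demo could look like. It is not the attached file: it assumes a current D compiler, the container `Words`, the function `visitUntilTwo` and the value of `BREAK` are made up for illustration, and it applies the same three workarounds (yield inlined by hand, non-ref loop body, hand-written foreach expansion).

```d
enum BREAK = 1; // assumed status value; any nonzero int would do

struct Apply(Args...)
{
    alias LoopBody = void delegate(Args);
    LoopBody _loop_body;
    int* _ret = null;
}

// Hypothetical container with a new-style, void-returning opApply.
struct Words
{
    string[] elements;

    void opApply(Apply!(string) dg)
    {
        foreach (x; elements)
        {
            // yield(dg, x), inlined by hand:
            dg._loop_body(x);
            if (dg._ret && *dg._ret) return;
        }
    }
}

// Hand-written version of what the compiler would generate for
//   foreach (x; obj) { visited ~= x; if (x == "two") break; }
string[] visitUntilTwo(Words obj)
{
    string[] visited;
    int _ret = 0;
    void _loop_body(string x)
    {
        visited ~= x;
        if (x == "two") { _ret = BREAK; return; }
    }
    obj.opApply(Apply!(string)(&_loop_body, &_ret));
    return visited;
}
```

A `return` inside the loop body would be handled the same way, with a second status value checked after the call to opApply.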

The great thing about this proposal is that it is backwards compatible. 
   foreach already generates different code depending on what the 
argument is, this can just be another case detected by the use of the 
Apply argument.  Code using old-style opApplys can continue to work.

The main thing that's fuzzy in my mind is the status of yield and Apply. 
They don't need to be keywords per se, but the compiler at least needs 
to know about Apply so that it can recognize the signature of this 
"new-style" opApply.   I think it can maybe satisfy all that by going 
into object.d?  If there were anonymous struct literals it wouldn't even 
need to be a real struct, just an alias like we have for 'string' now.


--bb
Jan 07 2008
next sibling parent reply Jason House <jason.james.house gmail.com> writes:
I think this proposal has one fatal flaw...

There is no way for the opApply function to do something after iteration stops
prematurely.  Some data structures could change internal state as they iterate.
 Those state changes may require clean-up.  I have no good examples at the
moment, but know they exist.
Jan 07 2008
parent reply Sean Kelly <sean f4.ca> writes:
Jason House wrote:
 I think this proposal has one fatal flaw...
 
 There is no way for the opApply function to do something after iteration stops
prematurely.  Some data structures could change internal state as they iterate.
 Those state changes may require clean-up.  I have no good examples at the
moment, but know they exist.

scope(exit) would work, but it's not ideal. Sean
Jan 07 2008
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Sean Kelly wrote:
 Jason House wrote:
 I think this proposal has one fatal flaw...

 There is no way for the opApply function to do something after 
 iteration stops prematurely.  Some data structures could change 
 internal state as they iterate.  Those state changes may require 
 clean-up.  I have no good examples at the moment, but know they exist.


Hmm, you're right. That's an issue that had not occurred to me because 
I've never seen it in code. But I don't think it's a fatal flaw. The 
yield thing is a pretty trivial macro. I see several possible ways to 
handle such rare cases.

1) Add a public opCall and a public 'finished' method to Apply to allow 
users to do yield's work on their own:

   struct Apply(Args...)
   {
     // .. same as before plus:
     void opCall(Args args) { _loop_body(args); }
     bool finished() { return _ret && *_ret != 0; }
   }

Then this is possible:

   void opApply( Apply!(ref T) dg ) {
       for( /*T x in elements*/ ) {
           dg(x);
           if (dg.finished) {
               // do clean up;
               return;
           }
       }
   }

2) Provide an alternate macro that takes a cleanup parameter:

   void opApply( Apply!(ref T) dg ) {
       for( /*T x in elements*/ ) {
           yield_with_cleanup(dg, { /*do cleanup*/ }, x);
       }
   }

3)
 scope(exit) would work, but it's not ideal.

But it's probably the solution that I would use, and one of the other 
solutions can be used for what I expect are the very rare situations in 
which you both have to do clean up in your opApply and for some reason 
can't use scope(exit).

--bb
Jan 07 2008
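
To make the scope(exit) option from the post above concrete, here is a small sketch. It assumes a current D compiler, and the `Cursor` type, its `busy` and `cleanups` fields and the `takeUntil` driver are hypothetical. The yield step is inlined by hand, so the early `return` is visible, and the scope(exit) block runs whether the loop finishes or stops early.

```d
struct Apply(Args...)
{
    void delegate(Args) _loop_body;
    int* _ret = null;
}

// Hypothetical container whose iteration changes internal state.
struct Cursor
{
    int[] elements;
    bool busy;    // set while an iteration is in progress
    int cleanups; // counts how often the cleanup ran

    void opApply(Apply!(int) dg)
    {
        busy = true;
        // Runs on every exit path, including the early return below.
        scope(exit) { busy = false; ++cleanups; }

        foreach (x; elements)
        {
            // yield(dg, x), inlined by hand:
            dg._loop_body(x);
            if (dg._ret && *dg._ret) return;
        }
    }
}

// Hand-written caller: collects elements until `stop` is seen.
int[] takeUntil(ref Cursor c, int stop)
{
    int[] seen;
    int _ret = 0;
    void _loop_body(int x)
    {
        if (x == stop) { _ret = 1; return; } // ask opApply to stop
        seen ~= x;
    }
    c.opApply(Apply!(int)(&_loop_body, &_ret));
    return seen;
}
```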
prev sibling parent reply "Bruce Adams" <tortoise_74 yeah.who.co.uk> writes:
On Mon, 07 Jan 2008 09:06:52 -0000, Bill Baxter  
<dnewsgroup billbaxter.com> wrote:

 I proposed this iniitally over in D.learn, but I'm cleaning it up and
 reposting here in hopes of getting some response from Walter who was
 probably too busy finishing const and eating holiday Turkey at the time
 to notice.  And rightly so.

 I don't believe it's appropriate in a high-level supposedly clean
 language like D that one of the main facilities for iterating over user
 types (foreach) requires writing code that passes around magic values
 generated by the compiler (opApply).

 It seems wrong to me that these magic values
 - come from code generated by the compiler,
 - must be handled exactly the proper way by the user's opApply
     (or else you get undefined behavior, but no compiler errors)
 - and then are handed back to code also generated by the compiler.

you are describing an iterator without iterators being properly part of the D world (yet).
Jan 07 2008
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Bruce Adams wrote:
 On Mon, 07 Jan 2008 09:06:52 -0000, Bill Baxter 
 <dnewsgroup billbaxter.com> wrote:
 
 I proposed this iniitally over in D.learn, but I'm cleaning it up and
 reposting here in hopes of getting some response from Walter who was
 probably too busy finishing const and eating holiday Turkey at the time
 to notice.  And rightly so.

 I don't believe it's appropriate in a high-level supposedly clean
 language like D that one of the main facilities for iterating over user
 types (foreach) requires writing code that passes around magic values
 generated by the compiler (opApply).

 It seems wrong to me that these magic values
 - come from code generated by the compiler,
 - must be handled exactly the proper way by the user's opApply
     (or else you get undefined behavior, but no compiler errors)
 - and then are handed back to code also generated by the compiler.

like you are describing an iterator without iterators being properly part of the D world (yet).

I mean this:

   alias int MagicValue;
   MagicValue opApply(MagicValue delegate(ref T) dg) {
       MagicValue magic_value=0;
       foreach(x; stuff) {
           magic_value = dg(x);
           if (magic_value != 0) return magic_value;
       }
       return magic_value;
   }

"Magic value" is that int that we're forced to have scattered all over 
our opApply functions.

--bb
Jan 07 2008
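
For comparison, this is the existing int-returning opApply convention that the thread is about, written as a self-contained sketch for a current D compiler. The `Stuff` type and the two driver functions are illustrative names only. The compiler turns each foreach body into the delegate, and a nonzero return from that delegate has to be passed straight back out of opApply for `break` and `return` to work.

```d
struct Stuff
{
    int[] items;

    // Conventional opApply: the int is the "magic value" from the loop body.
    int opApply(scope int delegate(ref int) dg)
    {
        foreach (ref x; items)
        {
            int result = dg(x);
            if (result != 0) return result; // body asked to leave the loop
        }
        return 0; // iteration ran to completion
    }
}

// `break` inside the body stops the iteration early.
int sumUntil(Stuff s, int stop)
{
    int total = 0;
    foreach (x; s)
    {
        if (x == stop) break;
        total += x;
    }
    return total;
}

// `return` inside the body leaves the enclosing function.
int firstEven(Stuff s)
{
    foreach (x; s)
    {
        if (x % 2 == 0) return x;
    }
    return -1;
}
```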