digitalmars.D - PROPOSAL: opSeq()

Russell Lewis <webmaster villagersonline.com> writes:
PROPOSAL: A way to handle sequences of expressions which otherwise would 
have been syntax errors


EXAMPLE CODE:
	my_for(i=0, i<10, i++) { <code> }


PARSER DETAILS:

Add a grammar rule that works as follows:
	expression:
		expression expression+

(I'm not sure exactly where in the precedence hierarchy it should go. 
  Maybe at the assign-expression level?)


GRAMMAR DETAILS:

Any time that we parse the above rule, the left-hand expression must be 
a "sequence handler."  A sequence handler is either a delegate, or a 
struct which implements the function "opSeq()".

The number and type of arguments of the handler determine how many, and 
what type, of expressions can follow the handler.

The return value from the handler can be void, or a value.

If the handler has fewer arguments than we have expressions in the 
sequence, then the return value from the first handler may be a second 
handler, and thus we can chain handlers.

If the types of the expressions don't match, or the sequence of 
expressions has too few elements, then we have a syntax error.

Handlers are always right-associative.  This means that if we have a 
series of expressions:
	handler1 expressionA handler2 expressionB

then this first becomes:
	handler1 expressionA handler2(expressionB)

Then, if handler1 has 2 arguments, it becomes
	handler1(expressionA, handler2(expressionB))

However, if handler1 has only 1 argument, then it must return a handler 
for the second expression:
	handler1(expressionA)(handler2(expressionB))
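
The desugaring above can be imitated in present-day D by writing the calls explicitly, with opCall standing in for the proposed opSeq. What follows is only a minimal sketch under that assumption: opSeq itself does not exist, and the bodies of Handler1 and Handler2 as well as the helper chained() are made up purely for illustration.

```d
import std.stdio;

// Hypothetical second handler: consumes one expression.
struct Handler2
{
    int opCall(int b) { return b * 2; }
}

// Hypothetical first handler: it takes only ONE argument, so it must
// return another handler (here a delegate) to consume the rest.
struct Handler1
{
    int delegate(int) opCall(int a)
    {
        return (int rest) => a + rest;
    }
}

// Proposed syntax:  handler1 a handler2 b
// Explicit form:    handler1(a)(handler2(b))
int chained(int a, int b)
{
    Handler1 handler1;
    Handler2 handler2;
    return handler1(a)(handler2(b));
}

void main()
{
    writeln(chained(1, 20));
}
```

The right-associativity shows up in the evaluation order: handler2 consumes its expression first, and its result becomes the last operand of handler1's chain.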


IMPLEMENTATION EXAMPLE:

// my_for().
//
// Note that the "lazy void" overload of opSeq handles single-line
// bodies with no {} while the "void delegate()" overload handles
// bodies with {}.


MyFor my_for(lazy void init, lazy bool test, lazy void inc)
{
   MyFor ret;
   ret.init = init;
   ret.test = test;
   ret.inc  = inc;
   return ret;
}


struct MyFor
{
   void delegate() init;
   bool delegate() test;
   void delegate() inc;

   // The parameter is named "block" because "body" is a D keyword.
   void opSeq(lazy void block)
   {
     opSeq({ block(); });
   }

   void opSeq(void delegate() block)
   {
     init();

     while(test())
     {
       block();
       inc();
     }
   }
}
Apr 07 2008
Bill Baxter <dnewsgroup billbaxter.com> writes:
Russell Lewis wrote:
 PROPOSAL: A way to handle sequences of expressions which otherwise would 
 have been syntax errors
 
 
 EXAMPLE CODE:
     my_for(i=0, i<10, i++) { <code> }

Are you familiar with the "trailing delegates" proposal?  Basically the 
idea there is that any {<code>} block following a function call would be 
treated as an extra argument to the function.  So if you write the function:

	void my_for(lazy void init, lazy bool test, lazy void inc, void delegate()) { ... }

then your EXAMPLE CODE above would call that function.

Your proposal would have one benefit over that in that you could make 
"my_for" a varargs function if you wanted to.  Though, the trailing 
delegates idea could probably be fixed to handle that too.  Like by making 
the trailing delegate the first argument instead of the last (kinda like 
what opIndexAssign does).

Overall I think trailing delegates sounds like a simpler, more elegant 
approach.  Can you point out any other benefits of your proposal that 
trailing delegate args would not have?

I believe Walter's response previously has been that we should just get 
used to looking at things like:

	my_for(i=0,i<10,i++,{<code>});

instead of adding complications to the grammar to support such things.

--bb
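
The form with the block passed as an ordinary last argument can be sketched in D as it stands, without any grammar change. This is only an illustration of that calling style, not code from the thread: the parameter names start, step and block and the helper sumBelow() are assumptions chosen for the example.

```d
import std.stdio;

// The loop pieces are lazy, so each one is re-evaluated on every use;
// the body arrives as a plain delegate in the last argument slot.
void my_for(lazy void start, lazy bool test, lazy void step,
            scope void delegate() block)
{
    start();
    while (test())
    {
        block();
        step();
    }
}

// Hypothetical helper: sums 0 .. limit-1 using my_for.
int sumBelow(int limit)
{
    int i, sum;
    my_for(i = 0, i < limit, i++, { sum += i; });
    return sum;
}

void main()
{
    writeln(sumBelow(10));
}
```

The only visible difference from the EXAMPLE CODE is that the braces sit inside the argument list and the statement ends with ");".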
Apr 07 2008
downs <default_357-line yahoo.de> writes:
Bill Baxter wrote:
 I believe Walter's response previously has been that we should just get
 used to looking at things like:
 
     my_for(i=0,i<10,i++,{<code>});
 
 instead of adding complications to the grammar to support such things.
 
 --bb

FWIW and just FYI, the fewest closing brackets can be achieved with

	my_for(i=0, i<10, i++) = {<code>};

using an overloaded opAssign.

To make it flexible, template opAssign and make it lazy to allow chaining; 
i.e.

	my_for(...) = your_for(...) = {<code>};

For example, I use this in dglut:

	const string LazyCall="
	  static if (is(T==void)) t();
	  else static if (is(T==void delegate())) t()();
	  else static assert(false, T.stringof);
	";

Of course, I'd still rather have trailing DGs or full infix support. ^^

 --downs
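
A minimal sketch of the opAssign idea, under two simplifying assumptions that differ from the post: the loop pieces are passed as explicit delegates rather than lazy arguments, and the loop object is stored in a named variable before the block is assigned to it. The field names and the helper sumBelowAssign() are hypothetical, and the templated, chainable variant is not attempted here.

```d
import std.stdio;

struct MyFor
{
    void delegate() start;
    bool delegate() test;
    void delegate() step;

    // Assigning a block to the loop object runs the loop.
    void opAssign(scope void delegate() block)
    {
        start();
        while (test())
        {
            block();
            step();
        }
    }
}

// Hypothetical helper: sums 0 .. limit-1 by assigning the loop body.
int sumBelowAssign(int limit)
{
    int i, sum;
    auto loop = MyFor({ i = 0; }, () => i < limit, { i++; });
    loop = { sum += i; };   // calls loop.opAssign(...)
    return sum;
}

void main()
{
    writeln(sumBelowAssign(10));
}
```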
Apr 07 2008
Russell Lewis <webmaster villagersonline.com> writes:
Bill Baxter wrote:
 Russell Lewis wrote:
 PROPOSAL: A way to handle sequences of expressions which otherwise 
 would have been syntax errors


 EXAMPLE CODE:
     my_for(i=0, i<10, i++) { <code> }

 Are you familiar with the "trailing delegates" proposal?  Basically the 
 idea there is that any {<code>} block following a function call would be 
 treated as an extra argument to the function.  So if you write the function:
 
     void my_for(lazy void init, lazy bool test, lazy void inc, void delegate()) { ... }
 
 then your EXAMPLE CODE above would call that function.

Yes, I am familiar with the concept.  My proposal is a generalization of 
that which is able to handle any type of expression, and also to handle 
multiple expressions.

OPEN QUESTION: What happens if an opSeq-type struct is *not* followed by 
anything?  Do we need syntax to indicate whether that is legal or not?

You asked how opSeq is better than trailing delegates, so here are some 
more examples of things that opSeq can do:

1) Bare statements.  Take a look at my implementation of the MyFor struct 
from the original post.  One of the overloads of opSeq takes "lazy void 
block", which means that this syntax is also legal:
	my_for(i=0, i<10, i++)
		a = a+1;

2) Suffixes.  People have suggested that the expression
	3 + 2i
be something that can be implemented entirely as a library.  If i was a 
variable and we supported "opSeqRev", then it would be easy!

3) Multiple arguments.  Trailing delegates can't implement complex 
syntaxes, such as do...while.  opSeq can.  At the bottom of this post, 
I'll post code that will handle all of the following:
	MyWhile(a != b) <bare statement>;
	MyWhile(a != b) { <block> }
	MyDo <bare statement> MyWhile(a != b);
	MyDo { <block> } MyWhile(a != b);

4) Generalized syntax.  The examples above indicate to me that a lot of 
D's syntax could be implemented in a library using opSeq.  Would that 
allow many of D's constructs to be first class entities?  Might that 
allow us to implement more functional-language type features?

Here's the example code I promised:

BEGIN CODE

struct While
{
   bool delegate() cond;

   void opSeq(lazy void bareStatement)
   {
     opSeq({ bareStatement(); });
   }

   void opSeq(void delegate() block)
   {
     if(cond())
     {
BEGIN_LOOP:   // so I don't have to use D's while!
       block();
       if(cond())
         goto BEGIN_LOOP;
     }
   }
}

While MyWhile(lazy bool cond)
{
   While ret;
   ret.cond = cond;
   return ret;
}

struct Do
{
   void opSeq(lazy void bareStatement, While the_while)
   {
     opSeq({ bareStatement(); }, the_while);
   }

   void opSeq(void delegate() block, While the_while)
   {
     block();
     the_while block;
   }
}

// this isn't a function, it's a variable.  that's because
// the use of MyDo doesn't use parens.
Do MyDo;

END CODE
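
For comparison, the same While/Do pair can be approximated in D as it exists today by letting opCall play the role of opSeq and writing the parentheses explicitly. This is a sketch under assumptions rather than an implementation of the proposal: the condition is an explicit delegate instead of a lazy bool, the bare-statement overloads are left out, and the helper countUp() is hypothetical.

```d
import std.stdio;

struct While
{
    bool delegate() cond;

    // Stand-in for opSeq(void delegate()): run block while cond holds.
    void opCall(scope void delegate() block)
    {
        while (cond())
            block();
    }
}

While MyWhile(bool delegate() cond)
{
    return While(cond);
}

struct Do
{
    // Stand-in for opSeq(void delegate(), While): run the block once
    // unconditionally, then keep looping through the While.
    void opCall(scope void delegate() block, While theWhile)
    {
        block();
        theWhile(block);
    }
}

// Hypothetical helper showing do...while semantics.
int countUp(int start, int limit)
{
    Do MyDo;    // a variable, as in the post
    int n = start;
    // Proposed:  MyDo { n++; } MyWhile(n < limit);
    MyDo({ n++; }, MyWhile(() => n < limit));
    return n;
}

void main()
{
    writeln(countUp(0, 3));
    writeln(countUp(5, 3));
}
```

The second call shows the do...while behaviour: the block runs once even though the condition is false from the start.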
Apr 08 2008
Bill Baxter <dnewsgroup billbaxter.com> writes:
Ok.  Good examples.

Here's another that I suppose would be possible:

5) Cast-like syntaxes.  For instance the to! template in Phobos 2.x and 
Tango acts like a cast more or less, but you have to parenthesize the 
argument.  Currently:

	int x = 5;
	string y = to!(string)(x);  // ok!
	string z = to!(string) x;  // error!

But with your opSeq, I think the latter could be made legal, too.  IIUC.

I mention this because I keep forgetting to put those parentheses around 
to!'s argument because it just feels so darn much like a cast.

It's an interesting idea.  Are you sure it doesn't kill 
the-ease-of-parsing requirement for the grammar?

--bb
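
For reference, a small sketch of how the conversion is spelled in current D with Phobos' std.conv.to. The member-style spelling shown second was not part of this discussion; it is included only as an assumption about what a reader on a present-day compiler might reach for, and the function names viaCall and viaMember are hypothetical.

```d
import std.conv : to;
import std.stdio;

// The form from the post: template instantiation plus a call.
string viaCall(int x)
{
    return to!(string)(x);
}

// Member-style call syntax: no second pair of parentheses is needed.
string viaMember(int x)
{
    return x.to!string;
}

void main()
{
    writeln(viaCall(5));
    writeln(viaMember(5));
}
```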
Apr 08 2008
Russell Lewis <webmaster villagersonline.com> writes:
Bill Baxter wrote:
 It's an interesting idea.  Are you sure it doesn't kill 
 the-ease-of-parsing requirement for the grammar?

That is something that I have worried about, as well, and I haven't done 
a rock-solid analysis of it.  However, my hand-waving argument is that we 
parse the code without any knowledge of the types (we don't know which 
are opSeq handlers and which are not).  If our parsing shows us that we 
have a sequence of expressions without any sort of operator between them, 
then we interpret that using the opSeq parse rule:

	expression:
		expression expression ...

Then, in semantic analysis, we would decide whether that syntax is valid 
or not.  Since opSeq is right-associative, we start at the far-right of 
any chain of expressions, and see if the next-to-last expression is an 
opSeq handler; if so, it must take 1 argument, and the type must match 
the rightmost expression.  If not, then we work left, and so on.

Mechanically, I think I can argue that this doesn't make the parser any 
more complex.  What I don't know for sure, yet, is whether it introduces 
ambiguities into the grammar.  Those often require a tool to find. :(

Russ
Apr 09 2008
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Well, one ambiguity is stuff like: "foo 1 -2".  Is this foo.opSeq(1 - 2) 
(i.e. foo.opSeq(-1)) or foo.opSeq(1, -2)?
Ditto for '~' (concatenation versus bitwise negation), '&' (bitwise-and 
versus address-of), '!' (template instantiation versus logical 
negation), '.' ("member of" versus "look up in the global scope"), '+' 
(addition versus numeric identity function), '*' (multiplication versus 
dereferencing).

If you only allow _unexpected_ expressions, as you suggest, that would 
mean always choosing the first alternative above.  That would mean you'd 
have to disambiguate the unary versions of those operators by placing 
them in parentheses: "foo 1 (-2)" instead of the initial example.
But that leaves another ambiguity: what about "foo x (-2)"?  That would 
translate to foo.opSeq(x(-2)).  I don't think this one can be resolved, 
even placing parentheses around x doesn't work.  For example, if x is a 
delegate, the expression would mean the same thing with or without 
parentheses around it, so there would be no way to call Foo.opSeq(void 
delegate(int), int) except explicitly.

Besides, if you're going to place parentheses around all the operands 
you might as well overload opCall and be done with it, without any 
syntax extensions or added ambiguity at all.
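
Both readings that the existing grammar already claims can be checked on a current compiler. The snippet below is a small illustration, not part of the proposal; the names callWithSpace, scale and x are hypothetical.

```d
import std.stdio;

// With today's grammar the tokens `1 -2` can only be a binary
// subtraction, so a two-operand reading would have to displace it.
static assert(1 -2 == -1);

int callWithSpace()
{
    int scale = 10;
    auto x = (int v) => v * scale;  // captures `scale`, so x is a delegate
    // Whitespace before the parenthesis is irrelevant: this CALLS x with
    // -2; it is not the two separate operands `x` and `(-2)`.
    return x (-2);
}

void main()
{
    writeln(callWithSpace());
}
```

In other words, "foo 1 -2" and "foo x (-2)" each contain a sub-expression that already parses on its own, which is why only otherwise-invalid sequences would ever reach an opSeq rule.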
Apr 09 2008
Russell Lewis <webmaster villagersonline.com> writes:
Good points. I'll ponder 'em.
Apr 09 2008