
digitalmars.D - undocumented opMatch induced semantic changes to while-loops

Oskar Linde <olREM OVEnada.kth.se> writes:
Hello,

Among the language changes introduced by the new ~~ operator is an
undocumented(?) semantic change to while loops, with quite surprising
behavior:

import std.stdio;
import std.string;

char[] getLine()
{
        static int x = 0;
        if (x > 32)
                return "";
        return format("%s:%s:%s:%s:",x++,x++,x++,x++);
}

int main() {
        while("([0-9]*):" ~~ getLine()) {
                writefln("Match: ",_match.match(1));
        }
        return 0;
}

Prints:
Match: 3
Match: 2
Match: 1
Match: 0

(To get this to compile on Linux, I had to manually include
~/dmd/src/phobos/internal/match.d for the _d_match() function. I guess this
is just missing from the precompiled Phobos library on Linux.)

I would expect getLine() to be evaluated on each iteration of the while
loop, but apparently it is not.

This changes the behavior of the while loop in C-like languages, behavior
that people have been expecting and relying on for more than 30 years!

What happens here? Looking at the DMD front-end sources, statement.c,
line 513:

Statement *WhileStatement::semantic(Scope *sc)
{
    if (condition->op == TOKmatch)
    {
        /* Rewrite while (condition) body as:
         *   if (condition)
         *     do
         *       body
         *     while ((_match = _match.opNext), _match);
         */

        ... [snip code that does the rewrite and injects a _match identifier]
    }
   ...
}

So we now have an opNext() that is called on the match object for every
subsequent iteration of the while loop.
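A rough sketch of what that rewrite amounts to, lowered by hand in present-day D (the ~~ operator itself no longer exists). FakeMatch, its current() and opNext() members, condition() and runLowered() are hypothetical stand-ins for the real match object, not Phobos API; the counter shows that the condition is evaluated only once, which is exactly the surprise described above.

```d
import std.stdio;

// Hypothetical stand-in for the object a match expression yields.
// The name FakeMatch and its members are illustrative, not Phobos API.
struct FakeMatch
{
    int[] items;  // values still to be visited
    size_t pos;   // index of the current "match"

    // Lets the struct be used directly as an if/while condition.
    bool opCast(T : bool)() const { return pos < items.length; }

    // Advance to the following match, mirroring the opNext() hook.
    FakeMatch opNext() { return FakeMatch(items, pos + 1); }

    int current() const { return items[pos]; }
}

int conditionCalls;  // how many times the loop condition was evaluated

// Hypothetical loop condition; counts its own evaluations.
FakeMatch condition()
{
    ++conditionCalls;
    return FakeMatch([3, 2, 1, 0], 0);
}

// Shape of the rewrite quoted from statement.c:
//   if (condition) do body while ((_match = _match.opNext), _match);
// The advance step is written as its own statement here rather than
// as a comma expression.
int[] runLowered()
{
    conditionCalls = 0;
    int[] seen;
    auto _match = condition();   // evaluated exactly once
    if (_match)
    {
        do
        {
            seen ~= _match.current();  // loop body
            _match = _match.opNext();
        } while (_match);
    }
    return seen;
}

void main()
{
    foreach (v; runLowered())
        writeln("Match: ", v);
    writeln("condition evaluated ", conditionCalls, " time(s)");
}
```

Running it prints Match: 3 down to Match: 0, followed by a line reporting a single evaluation of the condition, which mirrors the output of the original example.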

While if/do-while loops are generally very good for branch prediction on
modern CPUs, it feels very odd to silently change the semantics of the
while loop like this.

This whole opMatch thing feels like a hack. In this context, I would define
a hack as something that adds many new special cases rather than general
rules. Examples:
1) Injecting a _match variable, holding the result of the condition, into
the scope of if() and while() statements, but only if the condition is an
opMatch() expression.
2) Changing the semantics of the while loop, but only if the condition is an
opMatch() expression.

We now have two custom ways of iterating over collections:

the classic opApply(), used by foreach
the new opMatch()/opNext() pair, used automatically by while loops

Suffice it to say that this feels like a mistake. To iterate over the
matches in a string, one should use foreach(), not change the semantics of
the while loop.
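For comparison, a minimal sketch of the foreach style argued for here, written against the std.regex module from later versions of Phobos (regex() and matchAll(), which did not exist when this thread was written); firstGroups is a hypothetical helper name. The input string is evaluated once, up front, and foreach walks its matches, which is the contract a reader expects from foreach.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: collect the first capture group of every match
// of `pattern` in `line`. The input is evaluated once; foreach then
// iterates over the lazy range of matches returned by matchAll.
string[] firstGroups(string line, string pattern)
{
    string[] result;
    foreach (m; matchAll(line, regex(pattern)))
        result ~= m[1];  // m[1] is the text of the first capture group
    return result;
}

void main()
{
    foreach (g; firstGroups("3:2:1:0:", `([0-9]*):`))
        writeln("Match: ", g);
}
```

Run on the string produced by the first call to getLine() in the original example, this prints the same four Match lines, but here it is explicit that one string is being scanned rather than getLine() being called repeatedly.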

IMHO of course. 

Disclaimer: This post was written before breakfast :)

Regards,

Oskar
Feb 16 2006
Oskar Linde <olREM OVEnada.kth.se> writes:
Oskar Linde wrote:

> Hello,
>
> Among the language changes introduced by the new ~~ operator is a
> undocumented(?) semantic change to while-loops. This is a quite surprising
> behavior:

I apologize for calling this undocumented; I overlooked the documentation on
while statements. My other points still stand, though. foreach would be much
more natural to overload with this behavior than while: foreach is expected
to evaluate its argument once, while is expected to evaluate it on each
iteration.

/Oskar
Feb 16 2006
"Derek Parnell" <derek psych.ward> writes:
On Thu, 16 Feb 2006 22:12:42 +1100, Oskar Linde <olREM OVEnada.kth.se>
wrote:

> Oskar Linde wrote:
>
>> Hello,
>>
>> Among the language changes introduced by the new ~~ operator is a
>> undocumented(?) semantic change to while-loops. This is a quite
>> surprising behavior:
>
> I apologize calling this undocumented. I overlooked the documentation on
> while statements. My other points still remain. foreach would be much more
> natural to overload with this behavior than while. Foreach is expected to
> evaluate its argument once, while is expected to evaluate it each iteration.

You make a whole lot of sense, Oskar. I agree that 'while' is expected to
re-evaluate the expression on each iteration and 'foreach' is expected to
evaluate it just the once. I'm surprised that Walter has changed this.

--
Derek Parnell
Melbourne, Australia
Feb 16 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Oskar Linde" <olREM OVEnada.kth.se> wrote in message
news:dt1j71$196$1 digitaldaemon.com...
> My other points still remain. foreach would be much more
> natural to overload with this behavior than while. Foreach is expected to
> evaluate its argument once, while is expected to evaluate it each
> iteration.

It is a very good point. Perhaps it should instead be:
foreach (MatchExpression) { } ??
Feb 16 2006
Dave <Dave_member pathlink.com> writes:
In article <dt2aig$qn3$4 digitaldaemon.com>, Walter Bright says...
> "Oskar Linde" <olREM OVEnada.kth.se> wrote in message
> news:dt1j71$196$1 digitaldaemon.com...
>> My other points still remain. foreach would be much more
>> natural to overload with this behavior than while. Foreach is expected to
>> evaluate its argument once, while is expected to evaluate it each
>> iteration.
>
> It is a very good point. Perhaps it should be instead: foreach (MatchExpression) { } ??

I think it should. It just seems a better match overall, especially since
both are D built-ins rather than constructs inherited from C and C++. As an
aside, most Perl users at least will be pretty comfortable with D's built-in
'foreach' (the syntax is different, of course, but the idea is firmly
implanted).

- Dave
Feb 16 2006
Dave <Dave_member pathlink.com> writes:
In article <dt2cp1$t7g$1 digitaldaemon.com>, Dave says...
> In article <dt2aig$qn3$4 digitaldaemon.com>, Walter Bright says...
>> "Oskar Linde" <olREM OVEnada.kth.se> wrote in message
>> news:dt1j71$196$1 digitaldaemon.com...
>>> My other points still remain. foreach would be much more
>>> natural to overload with this behavior than while. Foreach is expected to
>>> evaluate its argument once, while is expected to evaluate it each
>>> iteration.
>>
>> It is a very good point. Perhaps it should be instead: foreach (MatchExpression) { } ??
>
> I think it should - just seems a better match overall, especially considering they are both D built-in's as compared to C and C++. As an aside, most Perl users at least will be pretty comfortable with D's built-in 'foreach' (the syntax is different of course, but the idea is firmly implanted). - Dave

What if:

    if(MatchExpression)      // ME not compiled
    foreach(MatchExpression) // ME compiled

I ask because I can see things like this being used a lot:

    char[][] recs = split(cast(char[])read("path/to/file"), "\n");
    foreach(char[] rec; recs)
    {
        if("<regex>" ~~ rec)
        {
            ..
        }
    }

Would this make sense?

Thanks,

- Dave
Feb 16 2006
pragma <pragma_member pathlink.com> writes:
In article <dt2aig$qn3$4 digitaldaemon.com>, Walter Bright says...
> "Oskar Linde" <olREM OVEnada.kth.se> wrote in message
> news:dt1j71$196$1 digitaldaemon.com...
>> My other points still remain. foreach would be much more
>> natural to overload with this behavior than while. Foreach is expected to
>> evaluate its argument once, while is expected to evaluate it each
>> iteration.
>
> It is a very good point. Perhaps it should be instead: foreach (MatchExpression) { }

Please? It just makes more sense that way. Of course, this opens the door to
an implicit iterator syntax, à la the '$' suggestion (semantic scope
operator?) made earlier.
 foreach(<Expression>){
    // compiler inserts: foreach(auto _match; <something>){
    // where _match is aliased to '$', and matches are harvested via opApply()
    writefln("match: %s",$);
 }
When you look at the problem from a more generalized standpoint, <Expression> doesn't necessarily have to be a <RegularExpression> at all. Likewise, the implicit '_match' token could be more general too, such as '_loop', '_iter', or '_value'. So if we want to use the shorthand we can, but if we're using nested loops or want an explicit variable, we can do it the old-fashioned way:
 foreach(auto myMatch; <Expression>)
- Eric Anderton at yahoo
Feb 16 2006
"Walter Bright" <newshound digitalmars.com> writes:
"Oskar Linde" <olREM OVEnada.kth.se> wrote in message
news:dt1g6c$309n$1 digitaldaemon.com...
> (To get this to compile on linux, I had to manually include
> ~/dmd/src/phobos/internal/match.d for the _d_match() function. I guess
> this is just missing from the precompiled phobos library on linux)
That's right, I goofed that.
Feb 16 2006