www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - News and problems about foreach loops

reply "bearophile" <bearophileHUGS lycos.com> writes:
(This post is not crystal clear because unfortunately now I don't 
have a lot of time to make it better, but I think it's 
acceptable.)

In DMD 2.061 there will be some interesting changes, one of them 
regards the foreach loops. Foreach loops are everywhere in D 
code, it's one of the most used D constructs, so this is an 
important topic; and avoiding problems and corner cases, and 
making it not bug-prone is quite important.

Hara, Walter and others have started to implement a change I have 
suggested time ago (from Python loops):


import std.stdio;
void main() {
     foreach (i; 0 .. 10)
         i++; // line 4
     auto array = [10, 20, 30, 40, 50];
     foreach (i, item; array) {
         writeln(item);
         i++; // line 8
     }
}



With warnings this now gives:

test.d(4): Warning: variable modified in foreach body requires 
ref storage class
test.d(8): Warning: variable modified in foreach body requires 
ref storage class
10
30
50



Anoter change is visible here, in past this code required:

struct Foo { int x; }
void main() {
     auto data = [Foo(10), Foo(20), Foo(30)];
     foreach (const Foo f; data) {}
}



Now you can omit the type, this is very handy:

struct Foo { int x; }
void main() {
     auto data = [Foo(10), Foo(20), Foo(30)];
     foreach (const f; data) {}
}




Unfortunately currently this compiles and runs, but there is 
already a patch to forbid it:
https://github.com/D-Programming-Language/dmd/pull/1249


void main() {
     foreach (const i; 0 .. 10)
         i++;
}


See also:
http://d.puremagic.com/issues/show_bug.cgi?id=6214
http://d.puremagic.com/issues/show_bug.cgi?id=4090
http://d.puremagic.com/issues/show_bug.cgi?id=6652



Given how foreach now works on integer ranges, there is one more 
situation worth discussing. In D this is a recognized common 
source of bugs:


struct F { int x; }
void main() {
     auto fs = [F(10), F(20), F(30)];
     foreach (f; fs)
         f.x += 1;
     assert(fs[0].x == 10);
}



I think there are 3 main use cases for foreach loops on 
collections of structs:

Case1: The struct iterated on is constant or it's not modified in 
the loop body:

import std.stdio;
struct F { int x; }
void main() {
     auto data = [F(10), F(20), F(30)];
     foreach (f; data)
         writeln(f);
     foreach (const f; data)
         writeln(f);
     foreach (const ref f; data)
         writeln(f);
     foreach (immutable f; data)
         writeln(f);
     immutable data2 = [F(10), F(20), F(30)];
     foreach (immutable ref f; data2)
         writeln(f);
}



Case2: you want to modify the structs of the array:

import std.stdio;
struct F { int x; }
void main() {
     auto data = [F(10), F(20), F(30)];
     foreach (ref f; data)
         f.x++;
     writeln(data);
}



Case3: you want to modify just the copy of the struct, but you 
don't want to modify the structs inside the array:

import std.stdio, std.range;
void main() {
     auto iotas = [iota(0,3), iota(3,7), iota(7,10)];
     foreach (it; iotas) {
         it.popFront();
         it.popFront();
         writeln(it);
     }
     writeln(iotas);
}




[Note: in foreach ranged loops you only have Case1 and Case2 (and 
in practice Case2 is not common):

import std.stdio;
void main() {
     foreach (const i; 0 .. 10)
         writeln(i);
}

]


Currently in D Case1 requires nothing, but today it's also more 
handy than before to mark the struct loop variable with const or 
immutable or even "const ref", to make sure you will not change 
it inside the loop.

Case2 requires ref. If you see "ref" (but not "const ref") you 
know inside the loop the struct will probably be modified.

Currently Case3 is denoted as Case1, but in my code this is a 
much less common case. It's easy to write by mistake the Case3 
pattern when you want to write the Case2 pattern.


So if there is a warning that warns against code like (later the 
warning will become an error, I think. It's a very small breaking 
change):

import std.stdio;
void main() {
     foreach (i; 0 .. 4)
         i++;
}



then maybe the fake case3 too should become a warning and later 
an error (this is another breaking change):

// Solution1:
void main() {
     auto data = [0, 1, 2, 3];
     foreach (x; data)
         x++; // warning
}

So the compiler always require "const" or "ref" in the loops, so 
you can't write true Case3 any more. You have to copy the struct 
manually:

import std.stdio, std.range;
void main() {
     auto iotas = [iota(0,3), iota(3,7), iota(7,10)];
     foreach (it; iotas) {
         auto it2 = it;
         it2.popFront();
         it2.popFront();
         writeln(it2);
     }
     writeln(iotas);
}


What are the alternative solutions?

A second solution is to require some kind of tagging to tell the 
D compiler that the programmer really wants the Case3:

// Solution2:
void main() {
     auto data = [0, 1, 2, 3];
     foreach ( mutable x; data)
         x++;
     foreach (/*mut*/ x; data)
         x++;
}


A third solution is use idioms, and do not change D. It means 
that on default the programmer puts always a "const" in foreach. 
This avoids most bugs caused by fake Case3, and you don't use it 
in the uncommon true Cases3.

There are few more alternative ways to face this problem. 
Including doing nothing :-)

Bye,
bearophile
Nov 03 2012
next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 3 November 2012 at 14:04:45 UTC, bearophile wrote:
 A third solution is use idioms, and do not change D. It means 
 that on default the programmer puts always a "const" in 
 foreach. This avoids most bugs caused by fake Case3, and you 
 don't use it in the uncommon true Cases3.

 There are few more alternative ways to face this problem. 
 Including doing nothing :-)

I never use Case3, except accidentally. It's possibly the most common language-caused bug in my D code. That said, I'm not a fan of introducing new keywords or introducing breaking changes in D code (even though I don't use Case3). I think this might just be something we have to live with, although making use of "const" idiomatic may be a good step forward (unfortunately, it's more unnecessary typing, so it's unlikely to catch on).
Nov 03 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 11/3/12, bearophile <bearophileHUGS lycos.com> wrote:
 Now you can omit the type, this is very handy:

 struct Foo { int x; }
 void main() {
      auto data = [Foo(10), Foo(20), Foo(30)];
      foreach (const f; data) {}
 }

That's really cool that we have this now. But this error message needs to improve: foreach (const f; data) { f.x++; } test.d(9): Error: can only initialize const member x inside constructor It should be something like "cannot modify const member x".
Nov 03 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 11/3/12, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 It should be something like "cannot modify const member x".

In fact the error is already read but the else statement is not taken in Expression::checkModifiable: if (var && var->storage_class & STCctorinit) // goes here { const char *p = var->isStatic() ? "static " : ""; error("can only initialize %sconst member %s inside %sconstructor", p, var->toChars(), p); } else { OutBuffer buf; MODtoBuffer(&buf, type->mod); error("cannot modify %s expression %s", buf.toChars(), toChars()); }
Nov 03 2012
prev sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Peter Alexander:

 I'm not a fan of introducing new keywords or introducing 
 breaking changes in D code (even though I don't use Case3). I 
 think this might just be something we have to live with,

Using a custom property is maybe acceptable (this copy doesn't need to be a built-in, probably it's not too much hard to define it with user-defined attributes): void main() { auto data = [0, 1, 2, 3]; foreach ( copy x; data) x++; } But there are other solutions, like asking D tool (like D IDEs or future D lints) to show a warning here. Another solution is to require an already present keyword ("new" means it's a copy, it's not related to heap allocations) for the Case3: void main() { auto data = [0, 1, 2, 3]; foreach (new x; data) x++; } Bye, bearophile
Nov 03 2012