www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Iterating over multiple collections in parallel

reply "Koroskin Denis" <2korden gmail.com> writes:
If found myself solving the same problem again and again:
how to implement simultaneous iteration over two (or more collections)?

suppose, we have the arrays:
int[] a = [1, 2, 3];
int[] b = [1, 4, 9];

and would like to multiply them per-component, like this:

int[] c = new int[a.length];
for (int i = 0; i < a.length; ++j) {
     c[i] = a[i] * b[i];
}

Since foreach is the prefered (and sometimes the only) way to iterate over  
collection, there should be a way to iterate over multiple collections  
simultaneously, like (just an example) this:

int[] c = new int[a.length];
foreach (i : a; j : b; ref k : c) {
     k = a + b;
}

But this isn't supported (yet?)
More complicated example would be an iterating over two (or more)  
collection with *no* random access iterators available, opApply only.

I spend a lot of time on this and have found no solution how to do it with  
existing D feature set, but this is surely doable with a few  
inter-function gotos and an exception if collections sizes don't match.  
It's just a small layer written in asm, nothing more.

How about an enhancement proposal? :)
Jul 03 2008
next sibling parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Koroskin Denis" <2korden gmail.com> wrote in message 
news:op.udp4pcjxenyajd proton.creatstudio.intranet...
 If found myself solving the same problem again and again:
 how to implement simultaneous iteration over two (or more collections)?

 suppose, we have the arrays:
 int[] a = [1, 2, 3];
 int[] b = [1, 4, 9];

 and would like to multiply them per-component, like this:

 int[] c = new int[a.length];
 for (int i = 0; i < a.length; ++j) {
     c[i] = a[i] * b[i];
 }

 Since foreach is the prefered (and sometimes the only) way to iterate over 
 collection, there should be a way to iterate over multiple collections 
 simultaneously, like (just an example) this:

 int[] c = new int[a.length];
 foreach (i : a; j : b; ref k : c) {
     k = a + b;
 }

 But this isn't supported (yet?)
 More complicated example would be an iterating over two (or more) 
 collection with *no* random access iterators available, opApply only.

 I spend a lot of time on this and have found no solution how to do it with 
 existing D feature set, but this is surely doable with a few 
 inter-function gotos and an exception if collections sizes don't match. 
 It's just a small layer written in asm, nothing more.

 How about an enhancement proposal? :)
It's something Walter mentioned before, but without any kind of public "plan" of these ideas he has, there's no way to know if it's even on his plate. I think the syntax he used was something like this: foreach(a, b; c)(d, e; f) { blah } In any case, it's difficult in the general case to do parallel iteration with D's inside-out iterators. To do it generally, you'd have to use coroutines. But you can currently write a templated zip to iterate over arrays simultaneously, only because they're so easy to iterate over. I'm sure downs will post something horrid-looking that you can look over.
Jul 03 2008
prev sibling next sibling parent reply BCS <ao pathlink.com> writes:
Reply to Koroskin,

 If found myself solving the same problem again and again: how to
 implement simultaneous iteration over two (or more collections)?
 
 suppose, we have the arrays:
 int[] a = [1, 2, 3];
 int[] b = [1, 4, 9];
 and would like to multiply them per-component, like this:
 
 int[] c = new int[a.length];
 for (int i = 0; i < a.length; ++j) {
 c[i] = a[i] * b[i];
 }
 Since foreach is the prefered (and sometimes the only) way to iterate
 over  collection, there should be a way to iterate over multiple
 collections  simultaneously, like (just an example) this:
 
 int[] c = new int[a.length];
 foreach (i : a; j : b; ref k : c) {
 k = a + b;
 }
 But this isn't supported (yet?)
 More complicated example would be an iterating over two (or more)
 collection with *no* random access iterators available, opApply only.
 I spend a lot of time on this and have found no solution how to do it
 with  existing D feature set, but this is surely doable with a few
 inter-function gotos and an exception if collections sizes don't
 match.  It's just a small layer written in asm, nothing more.
 
 How about an enhancement proposal? :)
 
Can a singe thread have more than one stack? If someone can figure out how to do that, then this wouldn't be "to hard" to do in a lib. call dg1 on first stack, body jumps to stack 2 and calls dg 2, ..., last dg is called on the threads normal stack and calls the real body.
Jul 03 2008
parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"BCS" <ao pathlink.com> wrote in message 
news:55391cb32ecb08caab0811331728 news.digitalmars.com...
 Can a singe thread have more than one stack? If someone can figure out how 
 to do that, then this wouldn't be "to hard" to do in a lib.

 call dg1 on first stack, body jumps to stack 2 and calls dg 2, ..., last 
 dg is called on the threads normal stack and calls the real body.
Of course, they're called stackthreads or Fibers. Downs has implemented them in his tools package (in an unstable state) for Phobos, and they've been in Tango for more than two years.
Jul 03 2008
parent reply downs <default_357-line yahoo.de> writes:
Jarrett Billingsley wrote:
 "BCS" <ao pathlink.com> wrote in message 
 news:55391cb32ecb08caab0811331728 news.digitalmars.com...
 Can a singe thread have more than one stack? If someone can figure out how 
 to do that, then this wouldn't be "to hard" to do in a lib.

 call dg1 on first stack, body jumps to stack 2 and calls dg 2, ..., last 
 dg is called on the threads normal stack and calls the real body.
Of course, they're called stackthreads or Fibers. Downs has implemented them in his tools package (in an unstable state) for Phobos, and they've been in Tango for more than two years.
They're pretty stable now, but they depend on a Phobos patch to run acceptably fast (500 cycles per context switch on my Pentium M). And I don't know if they run on DMD; there was some problem with the GDC assembler statements. If somebody finds a problem with it, tell me in a reply and I'll try to fix it. Anyway, here's parallel iteration over two foreachables: gentoo-pc ~/d $ cat test33.d && echo "----" && rebuild test33.d && ./test33 module test33; import std.stdio, tools.stackthreads; void main() { auto it1 = [1, 2, 3]; auto it2 = [1, 4, 9]; auto iter2 = generator((void delegate(int) yield) { foreach (entry; it2) yield(entry); }); foreach (entry1; it1) { auto entry2 = iter2(); writefln(entry1, " - ", entry2); } } ---- 1 - 1 2 - 4 3 - 9 gentoo-pc ~/d $
Jul 04 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
downs:
   auto iter2 = generator((void delegate(int) yield) {
     foreach (entry; it2) yield(entry);
   });
 
   foreach (entry1; it1) {
     auto entry2 = iter2();
     writefln(entry1, " - ", entry2);
Nice :-) Bye, bearophile
Jul 04 2008
parent reply Fawzi Mohamed <fmohamed mac.com> writes:
Iterating over many collection in parallel is very nice to have, what 
would be nice to have with a cleaner syntax for it.

Aldor had it, and since then I always missed it.
It worked like this:
Iterator/loops can be declared like this:

for a in [1,2,3]
for b in [1,0,7,3]
for i in 1..
while (b<7)
repeat {
	...
}

which mean follow *all* iterators in parallel, and execute ..., when 
any iterator stop, stop everything

very clear, nice and flexible.

Other syntaxes look like a kludge with respect to it.

Python for example has generators/iterators, and builds derived 
iterators one on the top of the other

for a in [1,2,3]
for i in 0..
repeat { ... }

becomes
for i,a in enumerate([1,2,3].iterator()):
  ...

C++ iterators can be used in parallel, because all the difficulty of 
keeping the current position is moved to the iterator writer, but the 
syntax is just plain ugly and writing efficient iterators for complex 
structures can be painful.

Anyway leaving away that syntax, in D without changes one can already 
have a reasonable syntax (even if not as nice).
Using fibers, as in the nice example of downs it should also be 
possible to convert automatically an opApply in a generator, so 
technically it should be feasible, even if 500 cycles per conetext 
switch, might be too much for some applications (tight loops).
So in D one could have a something workable
foreach(a,b,i;collectLoop([1,2,3],[2,3,7],Range(1,infinity))){
...
}

where collectLoop is a variadic template function that converts opApply 
to a generator and returns an object having a opApply that loops in 
parallel.
Implementation of it is left as exercice to the reader ;-)

Fawzi
Jul 05 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Fawzi Mohamed:

 for i,a in enumerate([1,2,3].iterator()):
I may have lost you a bit here, anyway in real Python that code is: for idx, el in enumerate([1, 2, 3]): print idx, el
 Using fibers, as in the nice example of downs it should also be 
 possible to convert automatically an opApply in a generator, so 
 technically it should be feasible, even if 500 cycles per conetext 
 switch, might be too much for some applications (tight loops).
 So in D one could have a something workable
 foreach(a,b,i;collectLoop([1,2,3],[2,3,7],Range(1,infinity))){
 ...
 }
 Implementation of it is left as exercice to the reader ;-)
In my libs there are zip() and azip() for something like that, but they don't use fibers yet... Once the fibers are added, the code can be made smart at compile-time, so it can use the current fast code when possible, and fibers when it must. Bye, bearophile
Jul 05 2008
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Koroskin Denis" wrote
 If found myself solving the same problem again and again:
 how to implement simultaneous iteration over two (or more collections)?

 suppose, we have the arrays:
 int[] a = [1, 2, 3];
 int[] b = [1, 4, 9];

 and would like to multiply them per-component, like this:

 int[] c = new int[a.length];
 for (int i = 0; i < a.length; ++j) {
     c[i] = a[i] * b[i];
 }

 Since foreach is the prefered (and sometimes the only) way to iterate over 
 collection, there should be a way to iterate over multiple collections 
 simultaneously, like (just an example) this:

 int[] c = new int[a.length];
 foreach (i : a; j : b; ref k : c) {
     k = a + b;
 }

 But this isn't supported (yet?)
 More complicated example would be an iterating over two (or more) 
 collection with *no* random access iterators available, opApply only.

 I spend a lot of time on this and have found no solution how to do it with 
 existing D feature set, but this is surely doable with a few 
 inter-function gotos and an exception if collections sizes don't match. 
 It's just a small layer written in asm, nothing more.

 How about an enhancement proposal? :)
The issue here is how to formulate each iteration. foreach works because it is well defined "iterate over each value in the container". But iterating over multiple containers, you could want different orders of iteration, iterating for each unique set of indexes, etc. Or you could want what you had originally stated, in which cases the lengths must be correct. We already have foreach_reverse, which is like foreach, but iterates in the opposite direction. I don't want foreach_X for everyones preferred iteration method of choice. I think the best solution here is a fruct (foreach struct) in a library: struct myfruct { static myfruct opCall(int[] a, int[] b, int[] c) { myfruct result; result.a = a; result.b = b; result.c = c; assert(a.length == b.length && a.length == c.length); } int[] a; int[] b; int[] c; int opApply(int delegate(ref int i, ref int j, ref int k) dg) { int result = 0; for(int i = 0; i < a.length; i++) { if((result = dg(a[i], b[i], c[i])) != 0) break; } return result; } } usage: foreach(i, j, ref k; myfruct(a, b, c)) { k = i * j; } Now, the issue here is, how can this be made generic? The answer is, we need iterators. Otherwise you cannot iterate over multiple containers that are not index-based (such as AA's or maps). With iterators, that have a well defined interface (like opApply), that allows one to increment individual indexes, and not have to know what indexes are made of how how they work. They just define opInc and opStar. Oh, and we would need reference return types too (so opStar can return an actual reference). -Steve
Jul 03 2008
prev sibling next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Koroskin Denis wrote:
 I spend a lot of time on this and have found no solution how to do it 
 with existing D feature set, but this is surely doable with a few 
 inter-function gotos and an exception if collections sizes don't match. 
It's a well-known problem, and not so easy to solve in an intuitive, non-klunky way. Some of the issues are what do you do when the collections are of different sizes? Error? Iterate to the minimum? Iterate to the maximum and use default values for the shorter collections? The best I can offer at the moment is: foreach (i; 0 .. a.length) a[i] = b[i] + c[i];
Jul 05 2008
prev sibling parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
Koroskin Denis wrote:

 int[] c = new int[a.length];
 foreach (i : a; j : b; ref k : c) {
      k = a + b;
 }
What's wrong with auto c= a .* b; except, that it is neither suggested nor supported? Or auto c= a (*) b; which was suggested some long time ago, but has not got through? Is it horrible then to define: void opDotMul(T)(out T c, T x, T y){ assert( x.length == y.length); c.length= x.length; foreach( i,v; x){ c[i]= x[i] * y[i]; } } -manfred
Jul 05 2008
next sibling parent reply Yigal Chripun <yigal100 gmail.com> writes:
Manfred_Nowak wrote:
 Koroskin Denis wrote:
 
 int[] c = new int[a.length];
 foreach (i : a; j : b; ref k : c) {
      k = a + b;
 }
What's wrong with auto c= a .* b; except, that it is neither suggested nor supported? Or auto c= a (*) b; which was suggested some long time ago, but has not got through? Is it horrible then to define: void opDotMul(T)(out T c, T x, T y){ assert( x.length == y.length); c.length= x.length; foreach( i,v; x){ c[i]= x[i] * y[i]; } } -manfred
what about: int[100] a, b, c; for (int i = 0; i < 100; ++i) { a[i] = myCrazyFunkyFunction(b[i], c[i]); } D does not and should not have an opMyCrazyFunkyFunction.
Jul 05 2008
parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
Yigal Chripun wrote:

 D does not and should not have an opMyCrazyFunkyFunction.
- Maybe some knowledge of some types of disagreeing and their relation can turn out to be useful: http://blog.createdebate.com/2008/04/07/writing-strong-arguments/ - Maybe a DigitalMars_funky_extension.opMyCrazyFunkyFunction is allowed to exist? At least it's in the language specification: http://www.digitalmars.com/d/1.0/pragma.html -manfred
Jul 05 2008
parent Yigal Chripun <yigal100 gmail.com> writes:
Manfred_Nowak wrote:
 Yigal Chripun wrote:
 
 D does not and should not have an opMyCrazyFunkyFunction.
- Maybe some knowledge of some types of disagreeing and their relation can turn out to be useful: http://blog.createdebate.com/2008/04/07/writing-strong-arguments/ - Maybe a DigitalMars_funky_extension.opMyCrazyFunkyFunction is allowed to exist? At least it's in the language specification: http://www.digitalmars.com/d/1.0/pragma.html -manfred
What's so strong about the word "should" ?! If I wanted to make a "strong argument" as you say, I would've used the word "must". Also, I didn't talk about pragmas and compiler specific extensions but rather meant the core language. maybe you want to suggest a way to define infix functions? Last thing: I find it ridiculous you try to teach me the proper way to post because you didn't like my phrasing while other people use insults and other bad language on the NG and you're fine with that. That's just smells of hypocrisy. If you really want to educate someone here, start with those that curse, swear and troll. disappointed, Yigal
Jul 06 2008
prev sibling parent reply "Koroskin Denis" <2korden gmail.com> writes:
On Sun, 06 Jul 2008 01:45:08 +0400, Manfred_Nowak <svv1999 hotmail.com> =
 =

wrote:

 Koroskin Denis wrote:

 int[] c =3D new int[a.length];
 foreach (i : a; j : b; ref k : c) {
      k =3D a + b;
 }
What's wrong with auto c=3D a .* b; except, that it is neither suggested nor supported? Or auto c=3D a (*) b; which was suggested some long time ago, but has not got through? Is it horrible then to define: void opDotMul(T)(out T c, T x, T y){ assert( x.length =3D=3D y.length); c.length=3D x.length; foreach( i,v; x){ c[i]=3D x[i] * y[i]; =
     }
 }

 -manfred
Nothing wrong. It was just brought as an example. Replace int[] with a List!(int) that exposes opApply() but not iterators= = and try again :) The only way (apart from using stackthreads) you can do it now is = something like this: List!(int) a, b, c; int[] acopy, bcopy; foreach (i; a) { acopy ~=3D i; } foreach (j; b) { bcopy ~=3D j; } int i =3D 0; foreach (ref k; c) { k =3D acopy[i] + bcopy[i]; } compare with the following: List!(int) a, b, c; foreach (i; a) (j; b)(ref k; c) { c =3D a + b; } which one would you prefer?
Jul 05 2008
parent "Manfred_Nowak" <svv1999 hotmail.com> writes:
Koroskin Denis wrote:

 List!(int) a, b, c;
[...]
 which one would you prefer?
Still c= a .+ b; To do componentwise operations imposes some requirements for the involved types. These are at least: - an order on the elements, and - definition(s) for the operation(s) on the elements If an opApply cannot be interpreted as representing an ordering, than your List!-example will fail badly. -manfred
Jul 05 2008