digitalmars.D - Iterating over multiple collections in parallel

Koroskin Denis (25/25) Jul 03 2008 If found myself solving the same problem again and again:

Jarrett Billingsley (12/37) Jul 03 2008 It's something Walter mentioned before, but without any kind of public
BCS (5/35) Jul 03 2008 Can a singe thread have more than one stack? If someone can figure out h...

Jarrett Billingsley (5/9) Jul 03 2008 Of course, they're called stackthreads or Fibers. Downs has implemented...

downs (23/36) Jul 04 2008 They're pretty stable now, but they depend on a Phobos patch to run acce...

bearophile (4/11) Jul 04 2008 Nice :-)

Fawzi Mohamed (43/43) Jul 05 2008 Iterating over many collection in parallel is very nice to have, what

bearophile (7/17) Jul 05 2008 I may have lost you a bit here, anyway in real Python that code is:

Steven Schveighoffer (47/72) Jul 03 2008 The issue here is how to formulate each iteration. foreach works becaus...
Walter Bright (8/11) Jul 05 2008 It's a well-known problem, and not so easy to solve in an intuitive,
Manfred_Nowak (16/20) Jul 05 2008 What's wrong with

Yigal Chripun (7/39) Jul 05 2008 what about:

Manfred_Nowak (8/9) Jul 05 2008 - Maybe some knowledge of some types of disagreeing and their relation

Yigal Chripun (14/27) Jul 06 2008 What's so strong about the word "should" ?!

Koroskin Denis (27/47) Jul 05 2008 On Sun, 06 Jul 2008 01:45:08 +0400, Manfred_Nowak ...

Manfred_Nowak (11/13) Jul 05 2008 Still

"Koroskin Denis" <2korden gmail.com> writes:

If found myself solving the same problem again and again:
how to implement simultaneous iteration over two (or more collections)?

suppose, we have the arrays:
int[] a = [1, 2, 3];
int[] b = [1, 4, 9];

and would like to multiply them per-component, like this:

int[] c = new int[a.length];
for (int i = 0; i < a.length; ++j) {
     c[i] = a[i] * b[i];
}

Since foreach is the prefered (and sometimes the only) way to iterate over  
collection, there should be a way to iterate over multiple collections  
simultaneously, like (just an example) this:

int[] c = new int[a.length];
foreach (i : a; j : b; ref k : c) {
     k = a + b;
}

But this isn't supported (yet?)
More complicated example would be an iterating over two (or more)  
collection with *no* random access iterators available, opApply only.

I spend a lot of time on this and have found no solution how to do it with  
existing D feature set, but this is surely doable with a few  
inter-function gotos and an exception if collections sizes don't match.  
It's just a small layer written in asm, nothing more.

How about an enhancement proposal? :)

Jul 03 2008

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Koroskin Denis" <2korden gmail.com> wrote in message 
news:op.udp4pcjxenyajd proton.creatstudio.intranet...
 If found myself solving the same problem again and again:
 how to implement simultaneous iteration over two (or more collections)?

 suppose, we have the arrays:
 int[] a = [1, 2, 3];
 int[] b = [1, 4, 9];

 and would like to multiply them per-component, like this:

 int[] c = new int[a.length];
 for (int i = 0; i < a.length; ++j) {
     c[i] = a[i] * b[i];
 }

 Since foreach is the prefered (and sometimes the only) way to iterate over 
 collection, there should be a way to iterate over multiple collections 
 simultaneously, like (just an example) this:

 int[] c = new int[a.length];
 foreach (i : a; j : b; ref k : c) {
     k = a + b;
 }

 But this isn't supported (yet?)
 More complicated example would be an iterating over two (or more) 
 collection with *no* random access iterators available, opApply only.

 I spend a lot of time on this and have found no solution how to do it with 
 existing D feature set, but this is surely doable with a few 
 inter-function gotos and an exception if collections sizes don't match. 
 It's just a small layer written in asm, nothing more.

 How about an enhancement proposal? :)

It's something Walter mentioned before, but without any kind of public 
"plan" of these ideas he has, there's no way to know if it's even on his 
plate.

I think the syntax he used was something like this:

foreach(a, b; c)(d, e; f) { blah }

In any case, it's difficult in the general case to do parallel iteration 
with D's inside-out iterators.  To do it generally, you'd have to use 
coroutines.  But you can currently write a templated zip to iterate over 
arrays simultaneously, only because they're so easy to iterate over.  I'm 
sure downs will post something horrid-looking that you can look over.

Jul 03 2008

BCS <ao pathlink.com> writes:

Reply to Koroskin,

 If found myself solving the same problem again and again: how to
 implement simultaneous iteration over two (or more collections)?
 
 suppose, we have the arrays:
 int[] a = [1, 2, 3];
 int[] b = [1, 4, 9];
 and would like to multiply them per-component, like this:
 
 int[] c = new int[a.length];
 for (int i = 0; i < a.length; ++j) {
 c[i] = a[i] * b[i];
 }
 Since foreach is the prefered (and sometimes the only) way to iterate
 over  collection, there should be a way to iterate over multiple
 collections  simultaneously, like (just an example) this:
 
 int[] c = new int[a.length];
 foreach (i : a; j : b; ref k : c) {
 k = a + b;
 }
 But this isn't supported (yet?)
 More complicated example would be an iterating over two (or more)
 collection with *no* random access iterators available, opApply only.
 I spend a lot of time on this and have found no solution how to do it
 with  existing D feature set, but this is surely doable with a few
 inter-function gotos and an exception if collections sizes don't
 match.  It's just a small layer written in asm, nothing more.
 
 How about an enhancement proposal? :)
 

Can a singe thread have more than one stack? If someone can figure out how 
to do that, then this wouldn't be "to hard" to do in a lib.

call dg1 on first stack, body jumps to stack 2 and calls dg 2, ..., last 
dg is called on the threads normal stack and calls the real body.

Jul 03 2008

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"BCS" <ao pathlink.com> wrote in message 
news:55391cb32ecb08caab0811331728 news.digitalmars.com...
 Can a singe thread have more than one stack? If someone can figure out how 
 to do that, then this wouldn't be "to hard" to do in a lib.

 call dg1 on first stack, body jumps to stack 2 and calls dg 2, ..., last 
 dg is called on the threads normal stack and calls the real body.

Of course, they're called stackthreads or Fibers.  Downs has implemented 
them in his tools package (in an unstable state) for Phobos, and they've 
been in Tango for more than two years.

Jul 03 2008

downs <default_357-line yahoo.de> writes:

Jarrett Billingsley wrote:
 "BCS" <ao pathlink.com> wrote in message 
 news:55391cb32ecb08caab0811331728 news.digitalmars.com...
 Can a singe thread have more than one stack? If someone can figure out how 
 to do that, then this wouldn't be "to hard" to do in a lib.

 call dg1 on first stack, body jumps to stack 2 and calls dg 2, ..., last 
 dg is called on the threads normal stack and calls the real body.

 
 Of course, they're called stackthreads or Fibers.  Downs has implemented 
 them in his tools package (in an unstable state) for Phobos, and they've 
 been in Tango for more than two years. 
 
 

They're pretty stable now, but they depend on a Phobos patch to run acceptably
fast (500 cycles per context switch on my Pentium M).

And I don't know if they run on DMD; there was some problem with the GDC
assembler statements. If somebody finds a problem with it, tell me in a reply
and I'll try to fix it.

Anyway, here's parallel iteration over two foreachables:

gentoo-pc ~/d $ cat test33.d && echo "----" && rebuild test33.d && ./test33
module test33;
import std.stdio, tools.stackthreads;

void main() {
  auto it1 = [1, 2, 3];
  auto it2 = [1, 4, 9];
  auto iter2 = generator((void delegate(int) yield) {
    foreach (entry; it2) yield(entry);
  });

  foreach (entry1; it1) {
    auto entry2 = iter2();
    writefln(entry1, " - ", entry2);
  }
}
----
1 - 1
2 - 4
3 - 9
gentoo-pc ~/d $

Jul 04 2008

bearophile <bearophileHUGS lycos.com> writes:

downs:
   auto iter2 = generator((void delegate(int) yield) {
     foreach (entry; it2) yield(entry);
   });
 
   foreach (entry1; it1) {
     auto entry2 = iter2();
     writefln(entry1, " - ", entry2);

Nice :-)

Bye,
bearophile

Jul 04 2008

Fawzi Mohamed <fmohamed mac.com> writes:

Iterating over many collection in parallel is very nice to have, what 
would be nice to have with a cleaner syntax for it.

Aldor had it, and since then I always missed it.
It worked like this:
Iterator/loops can be declared like this:

for a in [1,2,3]
for b in [1,0,7,3]
for i in 1..
while (b<7)
repeat {
	...
}

which mean follow *all* iterators in parallel, and execute ..., when 
any iterator stop, stop everything

very clear, nice and flexible.

Other syntaxes look like a kludge with respect to it.

Python for example has generators/iterators, and builds derived 
iterators one on the top of the other

for a in [1,2,3]
for i in 0..
repeat { ... }

becomes
for i,a in enumerate([1,2,3].iterator()):
  ...

C++ iterators can be used in parallel, because all the difficulty of 
keeping the current position is moved to the iterator writer, but the 
syntax is just plain ugly and writing efficient iterators for complex 
structures can be painful.

Anyway leaving away that syntax, in D without changes one can already 
have a reasonable syntax (even if not as nice).
Using fibers, as in the nice example of downs it should also be 
possible to convert automatically an opApply in a generator, so 
technically it should be feasible, even if 500 cycles per conetext 
switch, might be too much for some applications (tight loops).
So in D one could have a something workable
foreach(a,b,i;collectLoop([1,2,3],[2,3,7],Range(1,infinity))){
...
}

where collectLoop is a variadic template function that converts opApply 
to a generator and returns an object having a opApply that loops in 
parallel.
Implementation of it is left as exercice to the reader ;-)

Fawzi

Jul 05 2008

bearophile <bearophileHUGS lycos.com> writes:

Fawzi Mohamed:

 for i,a in enumerate([1,2,3].iterator()):

I may have lost you a bit here, anyway in real Python that code is:

for idx, el in enumerate([1, 2, 3]):
    print idx, el

 Using fibers, as in the nice example of downs it should also be 
 possible to convert automatically an opApply in a generator, so 
 technically it should be feasible, even if 500 cycles per conetext 
 switch, might be too much for some applications (tight loops).
 So in D one could have a something workable
 foreach(a,b,i;collectLoop([1,2,3],[2,3,7],Range(1,infinity))){
 ...
 }
 Implementation of it is left as exercice to the reader ;-)

In my libs there are zip() and azip() for something like that, but they don't
use fibers yet... Once the fibers are added, the code can be made smart at
compile-time, so it can use the current fast code when possible, and fibers
when it must.

Bye,
bearophile

Jul 05 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Koroskin Denis" wrote
 If found myself solving the same problem again and again:
 how to implement simultaneous iteration over two (or more collections)?

 suppose, we have the arrays:
 int[] a = [1, 2, 3];
 int[] b = [1, 4, 9];

 and would like to multiply them per-component, like this:

 int[] c = new int[a.length];
 for (int i = 0; i < a.length; ++j) {
     c[i] = a[i] * b[i];
 }

 Since foreach is the prefered (and sometimes the only) way to iterate over 
 collection, there should be a way to iterate over multiple collections 
 simultaneously, like (just an example) this:

 int[] c = new int[a.length];
 foreach (i : a; j : b; ref k : c) {
     k = a + b;
 }

 But this isn't supported (yet?)
 More complicated example would be an iterating over two (or more) 
 collection with *no* random access iterators available, opApply only.

 I spend a lot of time on this and have found no solution how to do it with 
 existing D feature set, but this is surely doable with a few 
 inter-function gotos and an exception if collections sizes don't match. 
 It's just a small layer written in asm, nothing more.

 How about an enhancement proposal? :)

The issue here is how to formulate each iteration.  foreach works because it 
is well defined "iterate over each value in the container".  But iterating 
over multiple containers, you could want different orders of iteration, 
iterating for each unique set of indexes, etc.

Or you could want what you had originally stated, in which cases the lengths 
must be correct.  We already have foreach_reverse, which is like foreach, 
but iterates in the opposite direction.  I don't want foreach_X for 
everyones preferred iteration method of choice.

I think the best solution here is a fruct (foreach struct) in a library:

struct myfruct
{
   static myfruct opCall(int[] a, int[] b, int[] c)
   {
      myfruct result;
      result.a = a;
      result.b = b;
      result.c = c;
      assert(a.length == b.length && a.length == c.length);
   }

   int[] a;
   int[] b;
   int[] c;
   int opApply(int delegate(ref int i, ref int j, ref int k) dg)
   {
      int result = 0;
      for(int i = 0; i < a.length; i++)
      {
         if((result = dg(a[i], b[i], c[i])) != 0)
             break;
      }
      return result;
   }
}

usage:

foreach(i, j, ref k; myfruct(a, b, c))
{
    k = i * j;
}

Now, the issue here is, how can this be made generic?  The answer is, we 
need iterators.  Otherwise you cannot iterate over multiple containers that 
are not index-based (such as AA's or maps).  With iterators, that have a 
well defined interface (like opApply), that allows one to increment 
individual indexes, and not have to know what indexes are made of how how 
they work.  They just define opInc and opStar.  Oh, and we would need 
reference return types too (so opStar can return an actual reference).

-Steve

Jul 03 2008

Walter Bright <newshound1 digitalmars.com> writes:

Koroskin Denis wrote:
 I spend a lot of time on this and have found no solution how to do it 
 with existing D feature set, but this is surely doable with a few 
 inter-function gotos and an exception if collections sizes don't match. 

It's a well-known problem, and not so easy to solve in an intuitive, 
non-klunky way. Some of the issues are what do you do when the 
collections are of different sizes? Error? Iterate to the minimum? 
Iterate to the maximum and use default values for the shorter collections?

The best I can offer at the moment is:

     foreach (i; 0 .. a.length)
	a[i] = b[i] + c[i];

Jul 05 2008

"Manfred_Nowak" <svv1999 hotmail.com> writes:

Koroskin Denis wrote:

 int[] c = new int[a.length];
 foreach (i : a; j : b; ref k : c) {
      k = a + b;
 }

What's wrong with

  auto c= a .* b;

except, that it is neither suggested nor supported?


Or

  auto c= a (*) b;

which was suggested some long time ago, but has not got through?


Is it horrible then to define:

void opDotMul(T)(out T c, T x, T y){
    assert( x.length ==  y.length);
    c.length= x.length;
    foreach( i,v; x){ 
        c[i]= x[i] * y[i];	  
    }
}

-manfred

Jul 05 2008

Yigal Chripun <yigal100 gmail.com> writes:

Manfred_Nowak wrote:
 Koroskin Denis wrote:
 
 int[] c = new int[a.length];
 foreach (i : a; j : b; ref k : c) {
      k = a + b;
 }

 
 What's wrong with
 
   auto c= a .* b;
 
 except, that it is neither suggested nor supported?
 
 
 Or
 
   auto c= a (*) b;
 
 which was suggested some long time ago, but has not got through?
 
 
 Is it horrible then to define:
 
 void opDotMul(T)(out T c, T x, T y){
     assert( x.length ==  y.length);
     c.length= x.length;
     foreach( i,v; x){ 
         c[i]= x[i] * y[i];	  
     }
 }
 
 -manfred

what about:

int[100] a, b, c;
for (int i = 0; i < 100; ++i) {
  a[i] = myCrazyFunkyFunction(b[i], c[i]);
}

D does not and should not have an opMyCrazyFunkyFunction.

Jul 05 2008

"Manfred_Nowak" <svv1999 hotmail.com> writes:

Yigal Chripun wrote:

 D does not and should not have an opMyCrazyFunkyFunction.

- Maybe some knowledge of some types of disagreeing and their relation 
can turn out to be useful:
http://blog.createdebate.com/2008/04/07/writing-strong-arguments/

- Maybe a DigitalMars_funky_extension.opMyCrazyFunkyFunction is allowed 
to exist? At least it's in the language specification:
http://www.digitalmars.com/d/1.0/pragma.html

-manfred

Jul 05 2008

Yigal Chripun <yigal100 gmail.com> writes:

Manfred_Nowak wrote:
 Yigal Chripun wrote:
 
 D does not and should not have an opMyCrazyFunkyFunction.

 
 - Maybe some knowledge of some types of disagreeing and their relation 
 can turn out to be useful:
 http://blog.createdebate.com/2008/04/07/writing-strong-arguments/
 
 - Maybe a DigitalMars_funky_extension.opMyCrazyFunkyFunction is allowed 
 to exist? At least it's in the language specification:
 http://www.digitalmars.com/d/1.0/pragma.html
 
 -manfred

What's so strong about the word "should" ?!
If I wanted to make a "strong argument" as you say, I would've used the
word "must".

Also, I didn't talk about pragmas and compiler specific extensions but
rather meant the core language.

maybe you want to suggest a way to define infix functions?

Last thing: I find it ridiculous you try to teach me the proper way to
post because you didn't like my phrasing while other people use insults
and other bad language on the NG and you're fine with that. That's just
smells of hypocrisy. If you really want to educate someone here, start
with those that curse, swear and troll.

disappointed,
Yigal

Jul 06 2008

"Koroskin Denis" <2korden gmail.com> writes:

On Sun, 06 Jul 2008 01:45:08 +0400, Manfred_Nowak <svv1999 hotmail.com> =
 =

wrote:

 Koroskin Denis wrote:

 int[] c =3D new int[a.length];
 foreach (i : a; j : b; ref k : c) {
      k =3D a + b;
 }

 What's wrong with

   auto c=3D a .* b;

 except, that it is neither suggested nor supported?


 Or

   auto c=3D a (*) b;

 which was suggested some long time ago, but has not got through?


 Is it horrible then to define:

 void opDotMul(T)(out T c, T x, T y){
     assert( x.length =3D=3D  y.length);
     c.length=3D x.length;
     foreach( i,v; x){
         c[i]=3D x[i] * y[i];	=

     }
 }

 -manfred

Nothing wrong. It was just brought as an example.
Replace int[] with a List!(int) that exposes opApply() but not iterators=
  =

and try again :)

The only way (apart from using stackthreads) you can do it now is  =

something like this:

List!(int) a, b, c;

int[] acopy, bcopy;
foreach (i; a) {
     acopy ~=3D i;
}

foreach (j; b) {
     bcopy ~=3D j;
}

int i =3D 0;
foreach (ref k; c) {
     k =3D acopy[i] + bcopy[i];
}

compare with the following:

List!(int) a, b, c;
foreach (i; a) (j; b)(ref k; c) {
     c =3D a + b;
}

which one would you prefer?

Jul 05 2008

"Manfred_Nowak" <svv1999 hotmail.com> writes:

Koroskin Denis wrote:

 List!(int) a, b, c;

[...]
 which one would you prefer?

Still

   c= a .+ b;

To do componentwise operations imposes some requirements for the 
involved types. These are at least:
- an order on the elements, and
- definition(s) for the operation(s) on the elements

If an opApply cannot be interpreted as representing an ordering, than 
your List!-example will fail badly.

-manfred

Jul 05 2008

D Programming

C/C++ Programming

Other

digitalmars.D - Iterating over multiple collections in parallel