digitalmars.D.learn - Parallel For

seany (62/62) Jun 14 2021 I know that c# has parallel for [like

seany (6/9) Jun 14 2021 PS :
jfondren (30/31) Jun 15 2021 add a `writeln(c.length);` in your inner loop and consider

seany (5/12) Jun 15 2021 My attempt is to make such huge O(N^4) or O(N^5) algorithms

=?UTF-8?Q?Ali_=c3=87ehreli?= (10/16) Jun 15 2021 I gave an example of it in my DConf Online 2020 presentation as well:

seany (2/17) Jun 15 2021 Ali Chehreli is an angel, Thank you.

z (11/12) Jun 15 2021 This is the best I could do: https://run.dlang.io/is/dm8LBP

jfondren (19/25) Jun 16 2021 The same trick as before is useful here: insert a write(i+ii);

seany <seany uni-bonn.de> writes:


this](https://dotnettutorials.net/lesson/parallel-for-method-csharp/) .

I know that D has parallel foreach [like 
this](http://ddili.org/ders/d.en/parallelism.html).

I want to do the following :

Given 4 sets , A = {a_1, a_2, ... }; B = {b_1, b_2, ... } ; C = 
{c_1 , ... } ; D = {d_1, ... } - I need to make all Cartesian 
products, such as {a_1, b_1, c_1, d_1 }; {a_1, b_1, c_1, d_2}; 
... {a_n, b_n, c_n, d_n} ; then run an operation on the elements .

Then, I need to extract the maximum and minimum values of the 
results - either by storing them in an array, or by some other 
suitable means.

My attempt to solve can be seen in this minimal example :

         import std.stdio;
         import std.math;
         import std.stdio;
         import std.conv;
         import std.format;
         import std.math;
         import std.algorithm;
         import std.net.curl;
         import std.json;
         //import dlib.image;
         import std.path;
         import std.array;
         import std.net.curl;
         import core.stdc.stdlib;
         import std.datetime;
         import std.file;
         import opmix.dup;
         import std.parallelism;


         void main() {

                 int[] a = [1,2,3,4,5,6,7,8,9];
                 int[] b = [11,12,13,14,15,16,17,18];

                 int[] c ;
                 foreach(aa; parallel (a)) {
                         foreach (bb; parallel(b)) {

                                 c ~= aa+bb;

                         }

                 }
                 writeln(c);

         }


I expect the output : `[ 12, 13, 14 ....27 ]` ( not necessarily 
in this order - but that is okey).

I am getting the output : `[12, 13, 14, 15, 16, 17, 18, 19, 14, 
15, 16, 17, 18, 19, 20, 21, 15, 16, 17, 18, 19, 20, 21, 22, 0, 0, 
0, 0, 0, 0, 0, 17, 18, 19, 20, 21, 22, 23, 24, 19, 20, 21, 22, 
18, 0, 0, 0, 16, 21, 18, 22, 23, 24, 25, 20, 21, 22, 19]`.

In the next run : `[12, 13, 14, 15, 16, 17, 18, 19, 14, 15, 16, 
17, 18, 19, 20, 21, 15, 16, 17, 18, 19, 20, 21, 22, 16, 17, 18, 
19, 20, 21, 22, 23, 18, 19, 20, 21, 22, 23, 24, 25, 17, 18, 19, 
19, 21, 20, 23, 23, 24, 24, 20, 21, 22, 23, 24, 25, 26, 27]`

In the next run : `[12, 13, 14, 15, 16, 17, 18, 19, 14, 15, 16, 
17, 0, 0, 16, 18, 19, 20, 21, 17, 18, 19, 20, 15, 16, 17, 18, 19, 
20, 21, 22, 18, 19, 20, 21, 22, 23, 24, 25, 19, 20, 21, 22, 16, 
17, 20, 21, 22, 23, 24, 21, 22, 27, 23]`

In the 4th run : `[12, 13, 14, 15, 16, 17, 18, 19, 15, 16, 17, 
18, 19, 20, 21, 22, 14, 13, 14, 15, 16, 17, 18, 19, 20, 20, 21, 
22, 23, 24, 25, 26, 27, 18, 20, 22, 23, 24, 25, 20, 21, 22, 23, 
24]`

What am I doing wrong?

Thank you .

Jun 14 2021

seany <seany uni-bonn.de> writes:

On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:

 this](https://dotnettutorials.net/lesson/parallel-for-method-csharp/) .

 [...]

PS :

I need the entire include list - while they are not necessary for 
this minimal example - they are needed for the ful project. That 
is why I kept them here.

DMD DMD64 D Compiler v2.096.1

Jun 14 2021

jfondren <julian.fondren gmail.com> writes:

On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:
 What am I doing wrong?

add a `writeln(c.length);` in your inner loop and consider
the output. If you were always pushing to the end of c, then
only unique numbers should be output. But I see e.g. six
occurrences of 0, four of 8 ...

Here's a non-solution:

```d
import std;
import core.sync.mutex;

void main() {
     int[] a = [1, 2, 3, 4, 5, 6, 7, 8, 9];
     int[] b = [11, 12, 13, 14, 15, 16, 17, 18];
     int[] c;
     shared Mutex mtx = new shared Mutex();
     foreach (aa; parallel(a)) {
         foreach (bb; parallel(b)) {
             mtx.lock_nothrow();
             c ~= aa + bb;
             mtx.unlock_nothrow();
         }
     }
     writeln(c);
}
```

That solves the inconsistent access to c, but the parallelism
is probably pointless.

Real solutions might include

1. `c[i] = aa + bb;`, with a calculated unique i per assignment.

2. std.concurrency and message passing to build c?

3. using core.atomic for i?

Jun 15 2021

seany <seany uni-bonn.de> writes:

On Tuesday, 15 June 2021 at 07:41:06 UTC, jfondren wrote:
 On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:
 [...]

 add a `writeln(c.length);` in your inner loop and consider
 the output. If you were always pushing to the end of c, then
 only unique numbers should be output. But I see e.g. six
 occurrences of 0, four of 8 ...

 [...]

My attempt is to make such huge O(N^4) or O(N^5) algorithms 
faster.

Wouldn't all these calculation of unique `i` make the program 
slow?  I would like to know how to do this properly.

Jun 15 2021

=?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:

On 6/14/21 11:39 PM, seany wrote:

 I know that D has parallel foreach [like
 this](http://ddili.org/ders/d.en/parallelism.html).

I gave an example of it in my DConf Online 2020 presentation as well:

   https://www.youtube.com/watch?v=dRORNQIB2wA&t=1324s

                  int[] c ;
                  foreach(aa; parallel (a)) {
                          foreach (bb; parallel(b)) {

                                  c ~= aa+bb;

That is violating a parallelism requirement that loop bodies must be 
independent. (I use a similar example during my presentation above.) You 
need to either pre-allocate the array (as jfondren said) or not store 
the elements at all but use them independently in the loop body.

Yes, std.concurrency is another option. I show a "recipe" of usage here:

   https://www.youtube.com/watch?v=dRORNQIB2wA&t=1737s

Ali

Jun 15 2021

seany <seany uni-bonn.de> writes:

On Tuesday, 15 June 2021 at 09:09:29 UTC, Ali Çehreli wrote:
 On 6/14/21 11:39 PM, seany wrote:

 [...]

 I gave an example of it in my DConf Online 2020 presentation as 
 well:

   https://www.youtube.com/watch?v=dRORNQIB2wA&t=1324s

                                  [...]

 That is violating a parallelism requirement that loop bodies 
 must be independent. (I use a similar example during my 
 presentation above.) You need to either pre-allocate the array 
 (as jfondren said) or not store the elements at all but use 
 them independently in the loop body.

 Yes, std.concurrency is another option. I show a "recipe" of 
 usage here:

   https://www.youtube.com/watch?v=dRORNQIB2wA&t=1737s

 Ali

Ali Chehreli is an angel, Thank you.

Jun 15 2021

z <z z.com> writes:

On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:
...

This is the best I could do: https://run.dlang.io/is/dm8LBP
For some reason, LDC refuses to vectorize or even just unroll the 
nonparallel version, and more than one `parallel` corrupts the 
results.
But judging by the results you expected and what you described, 
you could maybe replace it by a ton of `c[] = a[] *operand* b[]` 
operations?
Unless you use conditionals after or do something else that 
confuses the compiler, it will maybe use SSE/AVX instructions, 
and at worst use basic loop unrolling.

Jun 15 2021

jfondren <julian.fondren gmail.com> writes:

On Wednesday, 16 June 2021 at 06:29:21 UTC, z wrote:
 On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:
...

 This is the best I could do: https://run.dlang.io/is/dm8LBP
 For some reason, LDC refuses to vectorize or even just unroll 
 the nonparallel version, and more than one `parallel` corrupts 
 the results.

The same trick as before is useful here: insert a write(i+ii);
before the assignment and see if the output looks reasonable.

Instead of only unique indices, there are many repeats, as of
course (i=0)+(ii=1) == (i=1)+(ii=0)

```d
import std;

void main() {
     int[] a = [1, 2, 3, 4, 5, 6, 7, 8, 9];
     int[] b = [11, 12, 13, 14, 15, 16, 17, 18];
     int[] c = new int[a.length * b.length];
     foreach (i, aa; parallel(a)) {
         foreach (ii, bb; parallel(b)) {
             c[i * b.length + ii] = aa + bb;
         }
     }
     writeln(c);
}
```

Jun 16 2021

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Parallel For