www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Parallel For

reply seany <seany uni-bonn.de> writes:

this](https://dotnettutorials.net/lesson/parallel-for-method-csharp/) .

I know that D has parallel foreach [like 
this](http://ddili.org/ders/d.en/parallelism.html).

I want to do the following :

Given 4 sets , A = {a_1, a_2, ... }; B = {b_1, b_2, ... } ; C = 
{c_1 , ... } ; D = {d_1, ... } - I need to make all Cartesian 
products, such as {a_1, b_1, c_1, d_1 }; {a_1, b_1, c_1, d_2}; 
... {a_n, b_n, c_n, d_n} ; then run an operation on the elements .

Then, I need to extract the maximum and minimum values of the 
results - either by storing them in an array, or by some other 
suitable means.

My attempt to solve can be seen in this minimal example :

         import std.stdio;
         import std.math;
         import std.stdio;
         import std.conv;
         import std.format;
         import std.math;
         import std.algorithm;
         import std.net.curl;
         import std.json;
         //import dlib.image;
         import std.path;
         import std.array;
         import std.net.curl;
         import core.stdc.stdlib;
         import std.datetime;
         import std.file;
         import opmix.dup;
         import std.parallelism;


         void main() {

                 int[] a = [1,2,3,4,5,6,7,8,9];
                 int[] b = [11,12,13,14,15,16,17,18];

                 int[] c ;
                 foreach(aa; parallel (a)) {
                         foreach (bb; parallel(b)) {

                                 c ~= aa+bb;

                         }

                 }
                 writeln(c);

         }


I expect the output : `[ 12, 13, 14 ....27 ]` ( not necessarily 
in this order - but that is okey).

I am getting the output : `[12, 13, 14, 15, 16, 17, 18, 19, 14, 
15, 16, 17, 18, 19, 20, 21, 15, 16, 17, 18, 19, 20, 21, 22, 0, 0, 
0, 0, 0, 0, 0, 17, 18, 19, 20, 21, 22, 23, 24, 19, 20, 21, 22, 
18, 0, 0, 0, 16, 21, 18, 22, 23, 24, 25, 20, 21, 22, 19]`.

In the next run : `[12, 13, 14, 15, 16, 17, 18, 19, 14, 15, 16, 
17, 18, 19, 20, 21, 15, 16, 17, 18, 19, 20, 21, 22, 16, 17, 18, 
19, 20, 21, 22, 23, 18, 19, 20, 21, 22, 23, 24, 25, 17, 18, 19, 
19, 21, 20, 23, 23, 24, 24, 20, 21, 22, 23, 24, 25, 26, 27]`

In the next run : `[12, 13, 14, 15, 16, 17, 18, 19, 14, 15, 16, 
17, 0, 0, 16, 18, 19, 20, 21, 17, 18, 19, 20, 15, 16, 17, 18, 19, 
20, 21, 22, 18, 19, 20, 21, 22, 23, 24, 25, 19, 20, 21, 22, 16, 
17, 20, 21, 22, 23, 24, 21, 22, 27, 23]`

In the 4th run : `[12, 13, 14, 15, 16, 17, 18, 19, 15, 16, 17, 
18, 19, 20, 21, 22, 14, 13, 14, 15, 16, 17, 18, 19, 20, 20, 21, 
22, 23, 24, 25, 26, 27, 18, 20, 22, 23, 24, 25, 20, 21, 22, 23, 
24]`

What am I doing wrong?

Thank you .
Jun 14 2021
next sibling parent seany <seany uni-bonn.de> writes:
On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:

 this](https://dotnettutorials.net/lesson/parallel-for-method-csharp/) .

 [...]
PS : I need the entire include list - while they are not necessary for this minimal example - they are needed for the ful project. That is why I kept them here. DMD DMD64 D Compiler v2.096.1
Jun 14 2021
prev sibling next sibling parent reply jfondren <julian.fondren gmail.com> writes:
On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:
 What am I doing wrong?
add a `writeln(c.length);` in your inner loop and consider the output. If you were always pushing to the end of c, then only unique numbers should be output. But I see e.g. six occurrences of 0, four of 8 ... Here's a non-solution: ```d import std; import core.sync.mutex; void main() { int[] a = [1, 2, 3, 4, 5, 6, 7, 8, 9]; int[] b = [11, 12, 13, 14, 15, 16, 17, 18]; int[] c; shared Mutex mtx = new shared Mutex(); foreach (aa; parallel(a)) { foreach (bb; parallel(b)) { mtx.lock_nothrow(); c ~= aa + bb; mtx.unlock_nothrow(); } } writeln(c); } ``` That solves the inconsistent access to c, but the parallelism is probably pointless. Real solutions might include 1. `c[i] = aa + bb;`, with a calculated unique i per assignment. 2. std.concurrency and message passing to build c? 3. using core.atomic for i?
Jun 15 2021
parent seany <seany uni-bonn.de> writes:
On Tuesday, 15 June 2021 at 07:41:06 UTC, jfondren wrote:
 On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:
 [...]
add a `writeln(c.length);` in your inner loop and consider the output. If you were always pushing to the end of c, then only unique numbers should be output. But I see e.g. six occurrences of 0, four of 8 ... [...]
My attempt is to make such huge O(N^4) or O(N^5) algorithms faster. Wouldn't all these calculation of unique `i` make the program slow? I would like to know how to do this properly.
Jun 15 2021
prev sibling next sibling parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 6/14/21 11:39 PM, seany wrote:

 I know that D has parallel foreach [like
 this](http://ddili.org/ders/d.en/parallelism.html).
I gave an example of it in my DConf Online 2020 presentation as well: https://www.youtube.com/watch?v=dRORNQIB2wA&t=1324s
                  int[] c ;
                  foreach(aa; parallel (a)) {
                          foreach (bb; parallel(b)) {

                                  c ~= aa+bb;
That is violating a parallelism requirement that loop bodies must be independent. (I use a similar example during my presentation above.) You need to either pre-allocate the array (as jfondren said) or not store the elements at all but use them independently in the loop body. Yes, std.concurrency is another option. I show a "recipe" of usage here: https://www.youtube.com/watch?v=dRORNQIB2wA&t=1737s Ali
Jun 15 2021
parent seany <seany uni-bonn.de> writes:
On Tuesday, 15 June 2021 at 09:09:29 UTC, Ali Çehreli wrote:
 On 6/14/21 11:39 PM, seany wrote:

 [...]
I gave an example of it in my DConf Online 2020 presentation as well: https://www.youtube.com/watch?v=dRORNQIB2wA&t=1324s
                                  [...]
That is violating a parallelism requirement that loop bodies must be independent. (I use a similar example during my presentation above.) You need to either pre-allocate the array (as jfondren said) or not store the elements at all but use them independently in the loop body. Yes, std.concurrency is another option. I show a "recipe" of usage here: https://www.youtube.com/watch?v=dRORNQIB2wA&t=1737s Ali
Ali Chehreli is an angel, Thank you.
Jun 15 2021
prev sibling parent reply z <z z.com> writes:
On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:
...
This is the best I could do: https://run.dlang.io/is/dm8LBP For some reason, LDC refuses to vectorize or even just unroll the nonparallel version, and more than one `parallel` corrupts the results. But judging by the results you expected and what you described, you could maybe replace it by a ton of `c[] = a[] *operand* b[]` operations? Unless you use conditionals after or do something else that confuses the compiler, it will maybe use SSE/AVX instructions, and at worst use basic loop unrolling.
Jun 15 2021
parent jfondren <julian.fondren gmail.com> writes:
On Wednesday, 16 June 2021 at 06:29:21 UTC, z wrote:
 On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:
...
This is the best I could do: https://run.dlang.io/is/dm8LBP For some reason, LDC refuses to vectorize or even just unroll the nonparallel version, and more than one `parallel` corrupts the results.
The same trick as before is useful here: insert a write(i+ii); before the assignment and see if the output looks reasonable. Instead of only unique indices, there are many repeats, as of course (i=0)+(ii=1) == (i=1)+(ii=0) ```d import std; void main() { int[] a = [1, 2, 3, 4, 5, 6, 7, 8, 9]; int[] b = [11, 12, 13, 14, 15, 16, 17, 18]; int[] c = new int[a.length * b.length]; foreach (i, aa; parallel(a)) { foreach (ii, bb; parallel(b)) { c[i * b.length + ii] = aa + bb; } } writeln(c); } ```
Jun 16 2021