www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Recommendation for parallelism with nested for loops?

reply Shriramana Sharma writes:
Hello. I want to parallelize a computation which has two for 
loops, one nested within another. All 
inner-loop-param+outer-loop-param combinations can be computed 
independent of one another.

As I suspected, 
[https://forum.dlang.org/post/xysyidbkjdinclmrxzgt forum.dlang.org](this forum
post) says that only one loop can be parallelized. Will it be an error or
inefficient or useless if I try to do both?

Also, what is the best way to do parallelism in such a situation?
Aug 18 2022
next sibling parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Friday, 19 August 2022 at 01:49:43 UTC, Shriramana Sharma 
wrote:
 Also, what is the best way to do parallelism in such a 
 situation?
If the inner loops are about the same size for each pass of the outer loop, you can just simply parallel on the outer loop and get the same benefit. Even if they aren't equal, you'll get decent benefit from parallel on the outer one alone, but not as good since the work won't be balanced. Still, that's what I'd try.
Aug 18 2022
parent bachmeier <no spam.net> writes:
On Friday, 19 August 2022 at 02:02:57 UTC, Adam D Ruppe wrote:

 Even if they aren't equal, you'll get decent benefit from 
 parallel on the outer one alone, but not as good since the work 
 won't be balanced.
Unless there's some kind of blocking going on in D's implementation, if the number of passes on the outer loop is large enough relative to the number of cores, applying parallel to the outer loop is the best you can do - uneven amounts of work on the inner loop will get spread out across cores. There are always counterexamples, but the ratio of passes to cores needs to be pretty low for further optimizations to have any chance to help.
Aug 19 2022
prev sibling next sibling parent =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 8/18/22 18:49, Shriramana Sharma wrote:
 Hello. I want to parallelize a computation which has two for loops
An option is to add tasks individually but I am not sure how wise doing this and I don't know how to determine whether all tasks are completed. In any case, Roy Margalit's DConf 2022 presentation is very much on topic. :) http://dconf.org/2022/index.html#roym And the following program cannot show any benefit because the tasks are so short. import std.stdio; import std.parallelism; import std.conv; enum I = 1_000; enum J = 1_000; void main() { auto results = new int[I * J]; // In case you want a new TaskPool: // auto tp = new TaskPool(totalCPUs); // (And use tp. below instead of taskPool.) foreach (i; 0 .. I) { foreach (j; 0 .. J) { taskPool.put(task!foo(i, j, results)); } } // WARNING: I am not sure whether one can trust the results are // ready yet. (?) // // parallel() does call yieldForce() on each task but we don't seem // to have that option for tasks that are .put() into the pool. (?) enum toPrint = 10; writeln(results[0..toPrint]); writeln("[...]"); writeln(results[$-toPrint..$]); } void foo(size_t i, size_t j, int[] results) { results[i * J + j] = to!int(i * J + j); } Ali
Aug 18 2022
prev sibling parent reply =?UTF-8?Q?Christian_K=c3=b6stlin?= <christian.koestlin gmail.com> writes:
On 19.08.22 03:49, Shriramana Sharma wrote:
 Hello. I want to parallelize a computation which has two for loops, one 
 nested within another. All inner-loop-param+outer-loop-param 
 combinations can be computed independent of one another.
 
 As I suspected, 
 [https://forum.dlang.org/post/xysyidbkjdinclmrxzgt forum.dlang.org](this 
 forum post) says that only one loop can be parallelized. Will it be an 
 error or inefficient or useless if I try to do both?
 
 Also, what is the best way to do parallelism in such a situation?
You could also do a custom range that makes a one-dimensional range (aka iota out of your nested loops) and then process this with parallel. Another way (more similar to Ali's solution) would be to write the nested loops as you have them, but collect the parameters for the work in an array or something and then process this array in parallel. Kind regards, Christian
Aug 20 2022
parent =?UTF-8?Q?Christian_K=c3=b6stlin?= <christian.koestlin gmail.com> writes:
On 20.08.22 12:28, Christian Köstlin wrote:
 On 19.08.22 03:49, Shriramana Sharma wrote:
 Hello. I want to parallelize a computation which has two for loops, 
 one nested within another. All inner-loop-param+outer-loop-param 
 combinations can be computed independent of one another.

 As I suspected, 
 [https://forum.dlang.org/post/xysyidbkjdinclmrxzgt forum.dlang.org](this forum
post) says that only one loop can be parallelized. Will it be an error or
inefficient or useless if I try to do both?

 Also, what is the best way to do parallelism in such a situation?
You could also do a custom range that makes a one-dimensional range (aka iota out of your nested loops) and then process this with parallel. Another way (more similar to Ali's solution) would be to write the nested loops as you have them, but collect the parameters for the work in an array or something and then process this array in parallel. Kind regards, Christian
Actually phobos brings already all that is required (especially the cartesian product): ```d import std; void doSomething(int i, string s) { writeln("%s-%s".format(i, s)); } void main() { foreach (t; cartesianProduct(iota(1, 4), ["abc", "def", "ghi"]).parallel) { doSomething(t.expand); } } ``` kind regards, Christian
Aug 20 2022