
digitalmars.D.learn - Yet another parallel foreach + continue question

seany <seany uni-bonn.de> writes:
Consider :

     for (int i = 0; i < max_value_of_i; i++) {
         foreach (j, dummyVar; myTaskPool.parallel(array_to_get_j_from, my_workunitSize)) {

             if (boolean_function(i, j)) continue;
             double d = expensiveFunction(i, j);
             // ... stuff ...
         }
     }

I understand that the parallel iterator will lazily pick values 
of `j` (up to `my_workunitSize` at a time) and execute the loop 
body for those values in its own thread.

Say the values of `j` from `10` to `20` are picked, with 
`my_workunitSize` = 11, and at `j = 13` `boolean_function` 
returns true.

Will the loop then just jump to the next value, `j = 14`, like a 
normal for loop? I am having some difficulty understanding this. 
Thank you.
Jul 19
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 20, 2021 at 12:07:10AM +0000, seany via Digitalmars-d-learn wrote:
 Consider :
 
     for (int i = 0; i < max_value_of_i; i++) {
         foreach ( j, dummyVar; myTaskPool.parallel(array_to_get_j_from,
 my_workunitSize) {
 
             if ( boolean_function(i,j) ) continue;
             double d = expensiveFunction(i,j);
             // ... stuff ...
         }
     }
 
 I understand, that the parallel iterator will pick lazily values of
 `j` (up to `my_workunitsize`), and execute the for loop for those
 values in its own thread.
 
 Say, values of `j` from `10`to `20` is filled where `my_workunitsize`
 = 11.  Say, at `j = 13` the `boolean_function` returns true.
 
 Will then the for loop just jump to the next value of `j = 14`  like a
 normal for loop? I am having a bit of difficulty to understand this.
[...]

I didn't test this, but I'm pretty sure `continue` inside a parallel foreach loop simply terminates that iteration early; I don't think it will skip to the next iteration.

Basically, what .parallel does under the hood is to create N jobs (where N is the number of items to iterate over), representing N instances of the loop body, and assign them to M worker threads to execute. Then it waits until all N jobs have been completed before it returns. Which order the worker threads will pick up the loop body instances is not specified, and generally is not predictable from user code.

The loop body in this case is translated into a delegate that gets passed to the task pool's .opApply method; each worker thread that picks up a job simply invokes the delegate with the right value of the loop variable. A `continue` translates to returning a specific magic value from the delegate that tells .opApply that the loop body finished early. AFAIK, the task pool does not act on this return value, i.e., the other instances of the loop body will execute regardless.

T

-- 
Time flies like an arrow. Fruit flies like a banana.
Jul 19
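The delegate lowering described above can be seen without any threads. The following is a minimal sequential sketch, not the std.parallelism implementation: `Indices` and `visitedIndices` are hypothetical names made up for illustration. It relies only on the documented `opApply` protocol, in which the compiler turns the `foreach` body into a delegate, `continue` makes that delegate return 0, and `break` makes it return a nonzero value.

```d
import std.stdio : writeln;

/// Hypothetical aggregate whose opApply calls the loop body once per index.
struct Indices
{
    int lo, hi;

    int opApply(scope int delegate(int) dg)
    {
        foreach (j; lo .. hi)
        {
            // `dg` is the foreach body. It returns 0 after a normal pass
            // or a `continue`, and nonzero after a `break`.
            if (auto rc = dg(j))
                return rc;
        }
        return 0;
    }
}

/// Runs a loop over 10 .. 20 that executes `continue` at j == 13.
int[] visitedIndices()
{
    int[] seen;
    foreach (j; Indices(10, 21))
    {
        if (j == 13) continue; // ends only this one call of the body
        seen ~= j;
    }
    return seen;
}

void main()
{
    // 13 is missing, but 14 and later are still visited.
    writeln(visitedIndices());
}
```

Because `continue` is just an early return from one call of the delegate, whoever drives the loop (here `Indices.opApply`, in the thread's case the task pool) simply goes on to call the delegate for the remaining indices.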
seany <seany uni-bonn.de> writes:
On Tuesday, 20 July 2021 at 00:37:56 UTC, H. S. Teoh wrote:
 On Tue, Jul 20, 2021 at 12:07:10AM +0000, seany via 
 Digitalmars-d-learn wrote:
 [...]
[...] I didn't test this, but I'm pretty sure `continue` inside a parallel foreach loop simply terminates that iteration early; I don't think it will skip to the next iteration. [...]
OK, so does that mean that if I use a `continue` at `j = 13`, the thread that had `10` ... `20` as its values of `j` will only execute for `j = 10, 11, 12` and will not reach `14` or later?
Jul 19
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 20, 2021 at 01:07:22AM +0000, seany via Digitalmars-d-learn wrote:
 On Tuesday, 20 July 2021 at 00:37:56 UTC, H. S. Teoh wrote:
 On Tue, Jul 20, 2021 at 12:07:10AM +0000, seany via Digitalmars-d-learn
 wrote:
 [...]
[...] I didn't test this, but I'm pretty sure `continue` inside a parallel foreach loop simply terminates that iteration early; I don't think it will skip to the next iteration. [...]
Ok, therefore it means that, if at `j = 13 `i use a continue, then the thread where I had `10`... `20` as values of `j`, will only execute for `j = 10, 11, 12 ` and will not reach `14`or later ?
No, it will. Since each iteration is running in parallel, the fact that one of them terminated early should not affect the others.

T

-- 
Skill without imagination is craftsmanship and gives us many useful objects such as wickerwork picnic baskets. Imagination without skill gives us modern art. -- Tom Stoppard
Jul 19
seany <seany uni-bonn.de> writes:
On Tuesday, 20 July 2021 at 02:31:14 UTC, H. S. Teoh wrote:
 On Tue, Jul 20, 2021 at 01:07:22AM +0000, seany via 
 Digitalmars-d-learn wrote:
 On Tuesday, 20 July 2021 at 00:37:56 UTC, H. S. Teoh wrote:
 [...]
Ok, therefore it means that, if at `j = 13 `i use a continue, then the thread where I had `10`... `20` as values of `j`, will only execute for `j = 10, 11, 12 ` and will not reach `14`or later ?
No, it will. Since each iteration is running in parallel, the fact that one of them terminated early should not affect the others. T
Even though the work unit assigns 11 values to a single thread?
Jul 19
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 20, 2021 at 02:39:58AM +0000, seany via Digitalmars-d-learn wrote:
 On Tuesday, 20 July 2021 at 02:31:14 UTC, H. S. Teoh wrote:
 On Tue, Jul 20, 2021 at 01:07:22AM +0000, seany via Digitalmars-d-learn
 wrote:
 On Tuesday, 20 July 2021 at 00:37:56 UTC, H. S. Teoh wrote:
 [...]
Ok, therefore it means that, if at `j = 13 `i use a continue, then the thread where I had `10`... `20` as values of `j`, will only execute for `j = 10, 11, 12 ` and will not reach `14`or later ?
No, it will. Since each iteration is running in parallel, the fact that one of them terminated early should not affect the others.
[...]
 Even tho, the workunit specified 11 values to a single thread?
Logically speaking, the size of the work unit should not change the semantics of the loop. That's just an implementation detail that should not affect the semantics of the overall computation. In order to maintain consistency, loop iterations should not affect each other (unless they deliberately do so, e.g., read/write from a shared variable -- but parallel foreach itself should not introduce such a dependency).

I didn't check the implementation to verify this, but I'm pretty sure `break`, `continue`, etc., in the parallel foreach body does not change which iteration gets run or not.

Think of it this way: when you use a parallel foreach, what you're essentially asking for is that, logically speaking, *all* loop iterations start in parallel (even though in actual implementation that doesn't actually happen unless you have as many CPUs as you have iterations). Meaning that by the time a thread gets to the `continue` in a particular iteration, *all* of the other iterations may already have started executing. So it doesn't make sense for any of them to get interrupted just because this particular iteration executes a `continue`. Doing otherwise would introduce all sorts of weird inconsistent semantics that are hard (if not impossible) to reason about.

While I'm not 100% sure this is what the current parallel foreach implementation actually does, I'm pretty sure that's the case. It doesn't make sense to do it any other way.

T

-- 
Ph.D. = Permanent head Damage
Jul 19
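The claim that a `continue` in one iteration leaves the others alone, even when they share a work unit, is easy to check empirically. The sketch below is an illustration under assumptions, not a statement about the implementation: `runWithContinue` is a hypothetical helper, the pool size of 3 is arbitrary, and the numbers (40 indices, work unit size 11, skip at 13) only mirror the example in the question. Each iteration writes to its own array element, so no synchronization is needed.

```d
import std.algorithm : count;
import std.parallelism : TaskPool;
import std.range : iota;
import std.stdio : writeln;

/// Marks every index in [0, n) that ran past the `continue`.
bool[] runWithContinue(size_t n, size_t skipped, size_t workUnitSize)
{
    auto pool = new TaskPool(3);     // 3 worker threads, an arbitrary choice
    scope (exit) pool.finish();      // let the workers terminate afterwards

    auto visited = new bool[](n);
    foreach (j; pool.parallel(iota(n), workUnitSize))
    {
        if (j == skipped) continue;  // only this iteration ends early
        visited[j] = true;           // each iteration touches its own slot
    }
    return visited;
}

void main()
{
    auto visited = runWithContinue(40, 13, 11);
    assert(!visited[13]);               // the skipped iteration did nothing
    assert(visited[14]);                // its neighbour in the work unit ran
    assert(visited.count(true) == 39);  // so did every other iteration
    writeln("only index 13 was skipped");
}
```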
seany <seany uni-bonn.de> writes:
On Tuesday, 20 July 2021 at 02:58:50 UTC, H. S. Teoh wrote:
 On Tue, Jul 20, 2021 at 02:39:58AM +0000, seany via 
 Digitalmars-d-learn wrote:
 [...]
[...]
 [...]
Logically speaking, the size of the work unit should not change the semantics of the loop. That's just an implementation detail that should not affect the semantics of the overall computation. In order to maintain consistency, loop iterations should not affect each other (unless they deliberately do so, e.g., read/write from a shared variable -- but parallel foreach itself should not introduce such a dependency). [...]
Okay, thank you. If you later have some time to look into the exact implementation and can help me understand it, I would be most grateful. I have looked at [this link](https://github.com/dlang/phobos/blob/master/std/parallelism.d), but did not understand it completely.
Jul 19
Steven Schveighoffer <schveiguy gmail.com> writes:
On 7/19/21 10:58 PM, H. S. Teoh wrote:

 I didn't check the implementation to verify this, but I'm pretty sure
 `break`, `continue`, etc., in the parallel foreach body does not change
 which iteration gets run or not.
`break` should be undefined behavior (it is impossible to know which loops have already executed by that point). `continue` should be fine.

Noted in the [docs](https://dlang.org/phobos/std_parallelism.html#.TaskPool.parallel):

 Breaking from a parallel foreach loop via a break, labeled break, labeled continue, return or goto statement throws a ParallelForeachError.

I would say `continue` is ok (probably just implemented as an early return), but all those others are going to throw an error (unrecoverable).

-Steve
Jul 21
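Since `break` is off the table, a loop that wants to stop doing work once some condition is met has to cooperate through shared state and plain `continue`. The following is one possible sketch of that idea, not something proposed in the thread: `anyMatch` is a hypothetical helper, the work unit size of 16 is arbitrary, and the flag check is best-effort, so iterations that have already started still run to completion.

```d
import core.atomic : atomicLoad, atomicStore;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

/// Returns true if `pred` holds for any index in [0, n).
bool anyMatch(int n, bool function(int) pred)
{
    shared bool found = false;

    foreach (j; parallel(iota(n), 16))
    {
        // Every iteration still gets invoked; once the flag is set,
        // the remaining ones return immediately instead of working.
        if (atomicLoad(found)) continue;

        if (pred(j))
            atomicStore(found, true);
    }
    return atomicLoad(found);
}

void main()
{
    assert(anyMatch(100, (int x) => x == 37));
    assert(!anyMatch(100, (int x) => x > 1000));
    writeln("done");
}
```

The loop never leaves the `foreach` early, so none of the statements listed in the docs quote above are involved.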
Ali Çehreli <acehreli yahoo.com> writes:
On 7/19/21 5:07 PM, seany wrote:

 Consider :

      for (int i = 0; i < max_value_of_i; i++) {
          foreach ( j, dummyVar; myTaskPool.parallel(array_to_get_j_from,
 my_workunitSize) {

              if ( boolean_function(i,j) ) continue;
              double d = expensiveFunction(i,j);
              // ... stuff ...
          }
      }
Arranging the code to its equivalent may reveal the answer:

      if (!boolean_function(i, j)) {
          double d = expensiveFunction(i, j);
          // ... stuff ...
      }

We removed 'continue' and nothing changed and your question disappeared. :)
 I understand, that the parallel iterator will pick lazily values of `j`
 (up to `my_workunitsize`), and execute the for loop for those values in
 its own thread.
Yes.
 Say, values of `j` from `10`to `20` is filled where `my_workunitsize` =
 11. Say, at `j = 13` the `boolean_function` returns true.

 Will then the for loop just jump to the next value of `j = 14` like a
 normal for loop?
Yes.
 I am having a bit of difficulty to understand this.
 Thank you.
parallel is only for performance gain. The 2 knobs that it provides are also for performance reasons:

1) "Use just this many cores, not all"

2) "Process this many elements, not 100 (the default)" because otherwise context switches are too expensive

Other than that, it shouldn't be any different from running the loop regularly.

Ali
Jul 19
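The two knobs can be tried side by side. This is a small sketch under assumptions: `squares` and `squaresSerial` are hypothetical helpers, and the worker counts and work unit sizes are arbitrary values picked for illustration. The first knob is the argument to the `TaskPool` constructor, the second is the work unit size passed to `parallel`. The loop computes the same array whichever values are chosen, which is the point: the knobs affect speed, not results.

```d
import std.parallelism : TaskPool;
import std.stdio : writeln;

/// Fills an array with j * j using a plain foreach.
size_t[] squaresSerial(size_t n)
{
    auto result = new size_t[](n);
    foreach (j, ref r; result)
        r = j * j;
    return result;
}

/// Same computation with both tuning knobs exposed.
size_t[] squares(size_t n, size_t nWorkers, size_t workUnitSize)
{
    auto pool = new TaskPool(nWorkers);  // knob 1: number of worker threads
    scope (exit) pool.finish();

    auto result = new size_t[](n);
    // knob 2: how many elements a thread takes at a time
    foreach (j, ref r; pool.parallel(result, workUnitSize))
    {
        r = j * j;
    }
    return result;
}

void main()
{
    auto expected = squaresSerial(1000);
    assert(squares(1000, 1, 1) == expected);
    assert(squares(1000, 4, 100) == expected);
    assert(squares(1000, 3, 7) == expected);
    writeln("all configurations agree");
}
```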