digitalmars.D.learn - How to call stop from parallel foreach
- seany (17/17) Jun 24 2021 I have seen
- Jerry (7/24) Jun 24 2021 Maybe I'm wrong here, but I don't think there is any way to do
- seany (3/11) Jun 24 2021 The parallel() function is running a taskpool. I should be able
- seany (18/30) Jun 24 2021 PS :
- Bastiaan Veelo (20/26) Jun 24 2021 Yes there is, but it won’t break the `foreach`:
- seany (15/20) Jun 24 2021 Okey. So consider :
- Bastiaan Veelo (14/35) Jun 24 2021 You can nest multiple `foreach`, but only parallelise one like so:
- Bastiaan Veelo (4/16) Jun 24 2021 Actually, I think this would be suboptimal as well, as the three
- Ali Çehreli (17/19) Jun 24 2021 Yes. You have to create a task pool explicitly:
- seany (17/37) Jun 25 2021 I tried this .
- seany (3/21) Jun 25 2021 PS. line
- Ali Çehreli (15/31) Jun 25 2021 Performance is not guaranteed depending on many factors. For example,
- seany (5/28) Jun 25 2021 The code without the parallel foreach works fine. No segfault.
- Ali Çehreli (7/8) Jun 25 2021 That's very common.
- seany (2/10) Jun 25 2021 I have added MWP. did you have a chance to look at it?
- jfondren (13/25) Jun 25 2021 A self-contained and complete example would help a lot, but the
- seany (4/21) Jun 25 2021 This particular location does not cause segfault.
- seany (4/28) Jun 25 2021 [Here is MWP](https://github.com/naturalmechanics/mwp).
- jfondren (8/16) Jun 25 2021 With ldc2 this segfaults for me even if std.parallelism is removed
- Ali Çehreli (3/4) Jun 24 2021 Last time I checked, the current thread would run tasks as well.
- Bastiaan Veelo (3/7) Jun 24 2021 Indeed, thanks.
I have seen [this](https://forum.dlang.org/thread/akhbvvjgeaspmjntznyk@forum.dlang.org). I can't call break from a parallel foreach. Okay. Is there a way to easily call .stop() in such a case? Here is a case to consider:

```d
outer:
foreach (i, a; parallel(array_of_a)) {
    foreach (j, b; parallel(array_of_b)) {
        auto c = myFunction0(i, j);
        auto d = myFunction1(a, b);
        auto f = myFunction2(i, b);
        auto g = myFunction3(a, j);

        if (someConditionCheck(c, d, f, g)) {
            // stop the outer foreach loop here
        }
    }
}
```

Thank you
Jun 24 2021
On Thursday, 24 June 2021 at 18:23:01 UTC, seany wrote:
> I can't call break from a parallel foreach. Okay. Is there a way to easily call .stop() in such a case?

Maybe I'm wrong here, but I don't think there is any way to do that with parallel. What I would do is negate someConditionCheck and instead only do work when there is work to be done. Obviously that may or may not be suitable. But with parallel I don't see any way to make it happen.
Jun 24 2021
On Thursday, 24 June 2021 at 19:46:52 UTC, Jerry wrote:
> Maybe I'm wrong here, but I don't think there is any way to do that with parallel.

The parallel() function is running a taskpool. I should be able to stop that in any case...
Jun 24 2021
On Thursday, 24 June 2021 at 20:08:06 UTC, seany wrote:
> The parallel() function is running a taskpool. I should be able to stop that in any case...

PS: I have done this:

```d
parallelContainer:
while (true) {
    outer:
    foreach (i, a; parallel(array_of_a)) {
        foreach (j, b; parallel(array_of_b)) {
            auto c = myFunction0(i, j);
            auto d = myFunction1(a, b);
            auto f = myFunction2(i, b);
            auto g = myFunction3(a, j);

            if (someConditionCheck(c, d, f, g)) {
                break parallelContainer;
            }
        }
    }
    break;
}
```

Is this safe? Will this cause any problem that I can't foresee now? Thank you
Jun 24 2021
On Thursday, 24 June 2021 at 18:23:01 UTC, seany wrote:
> I can't call break from a parallel foreach. Okay. Is there a way to easily call .stop() in such a case?

Yes there is, but it won't break the `foreach`:

```d
auto tp = taskPool;
foreach (i, ref e; tp.parallel(a))
{
    // …
    tp.stop;
}
```

The reason this does not work is because `stop` terminates the worker threads as soon as they are finished with their current `Task`, but no sooner. `parallel` creates the `Task`s before it presents a range to `foreach`, so no new `Task`s are created during iteration. Therefore all elements are iterated.

> outer: foreach(i, a; parallel(array_of_a)) {
>     foreach(j, b; parallel(array_of_b)) {

By the way, nesting parallel `foreach` does not make much sense, as one level already distributes the load across all cores (but one). Additional parallelisation will likely just add overhead, and have a net negative effect.

— Bastiaan.
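Since `stop` cannot cancel `Task`s that were created before iteration began, a common workaround is a shared flag that the remaining iterations check and skip on. This is a sketch, not code from this thread; the `a == 5` test is a stand-in for the original `someConditionCheck(c, d, f, g)`:

```d
import std.parallelism;
import core.atomic;

void main()
{
    auto array_of_a = [1, 2, 3, 4, 5, 6, 7, 8];
    shared bool stopRequested = false;

    foreach (i, a; parallel(array_of_a))
    {
        // Every iteration still runs, but the flag turns the
        // remaining bodies into cheap no-ops.
        if (atomicLoad(stopRequested))
            continue;

        // ... real work here ...
        if (a == 5) // stand-in for someConditionCheck(c, d, f, g)
            atomicStore(stopRequested, true);
    }
}
```

The loop still visits every element (so this is not a true `break`), but once the flag is set each remaining body returns immediately, which is usually close enough to an early exit. Note that `continue` is allowed inside a parallel `foreach` body, unlike `break`.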
Jun 24 2021
On Thursday, 24 June 2021 at 20:33:00 UTC, Bastiaan Veelo wrote:
> By the way, nesting parallel `foreach` does not make much sense, as one level already distributes the load across all cores (but one).

Okay. So consider:

```d
foreach (array_elem; parallel(an_array)) {
    dothing(array_elem);
}
```

and then in `dothing()`:

```d
foreach (subelem; array_elem) {
    dootherthing(subelem);
}
```

Will this ALSO cause the same overhead? Is there any way to control the number of CPU cores used in parallelization? E.g.: take 3 cores for the first parallel foreach, and then for the second one, take 3 cores each -> so 3 + 3 * 3 = 12 cores out of a 16 core system? Thank you.
Jun 24 2021
On Thursday, 24 June 2021 at 20:41:40 UTC, seany wrote:
> Will this ALSO cause the same overhead?

You can nest multiple `foreach`, but only parallelise one like so:

```d
foreach (array_elem; parallel(an_array))
    foreach (subelem; array_elem)
        dootherthing(subelem);
```

So there is no need to hide one of them in a function.

> Is there any way to control the number of CPU cores used in parallelization?

There might be, by using various `TaskPool`s with a smaller number of worker threads: https://dlang.org/phobos/std_parallelism.html#.TaskPool.this.2. But I cannot see the benefit of doing this. It will just distribute the same amount of work in a different way.

— Bastiaan.
Jun 24 2021
On Thursday, 24 June 2021 at 21:05:28 UTC, Bastiaan Veelo wrote:
> There might be, by using various `TaskPool`s with a smaller number of worker threads.

Actually, I think this would be suboptimal as well, as the three outer threads seem to do no real work.

— Bastiaan.
Jun 24 2021
On 6/24/21 1:41 PM, seany wrote:
> Is there any way to control the number of CPU cores used in parallelization?

Yes. You have to create a task pool explicitly:

```d
import std.parallelism;

void main() {
    enum threadCount = 2;
    auto myTaskPool = new TaskPool(threadCount);
    scope (exit) {
        myTaskPool.finish();
    }

    enum workUnitSize = 1; // Or 42 or something else. :)
    foreach (e; myTaskPool.parallel([ 1, 2, 3 ], workUnitSize)) {
        // ...
    }
}
```

I've touched on a few parallelism concepts at this point in a presentation: https://www.youtube.com/watch?v=dRORNQIB2wA&t=1332s

Ali
Jun 24 2021
On Thursday, 24 June 2021 at 21:19:19 UTC, Ali Çehreli wrote:
> Yes. You have to create a task pool explicitly:

I tried this:

```d
int[][] pnts;
pnts.length = fld.length;

enum threadCount = 2;
auto prTaskPool = new TaskPool(threadCount);
scope (exit) {
    prTaskPool.finish();
}

enum workUnitSize = 1;
foreach (i, fLine; prTaskPool.parallel(fld, workUnitSize)) {
    // ....
}
```

This is throwing random segfaults. CPU has 2 cores, but usage is not going above 37%. Even much deeper down in the program, much further down the line... And the location of the segfault is random.
Jun 25 2021
On Friday, 25 June 2021 at 13:53:17 UTC, seany wrote:
> This is throwing random segfaults.

PS. As in [this thread](https://forum.dlang.org/thread/gomhpxzolddnodaeyvlh@forum.dlang.org), I am running into bus errors too, sometimes way down the line after these foreach calls are completed...
Jun 25 2021
On 6/25/21 6:53 AM, seany wrote:
> This is throwing random segfaults. CPU has 2 cores, but usage is not going above 37%

Performance is not guaranteed depending on many factors. For example, inserting a writeln() call in the loop would make all threads compete with each other for stdout. There can be many contention points, some of which depend on your program logic. (And "Amdahl's Law" applies.)

Another reason: 1 can be a horrible value for workUnitSize. Try 100, 1000, etc. and see whether it helps with performance.

> Even much deeper down in the program, much further down the line... And the location of the segfault is random.

Do you still have two parallel loops? Are both with explicit TaskPool objects? If not, I wonder whether multiple threads are using the convenient 'parallel' function, stepping over each others' toes. (I am not sure about this because perhaps it's safe to do this; never tested.)

It is possible that the segfaults are caused by your code. The code you showed in your original post (myFunction0() and others), they all work on independent data structures, right?

Ali
Jun 25 2021
On Friday, 25 June 2021 at 14:10:52 UTC, Ali Çehreli wrote:
> It is possible that the segfaults are caused by your code.

The code without the parallel foreach works fine. No segfault. In several instances, I do have multiple nested loops, but in every case only the outer one is a parallel foreach. All of them are with an explicit taskpool definition.
Jun 25 2021
On 6/25/21 7:21 AM, seany wrote:
> The code without the parallel foreach works fine. No segfault.

That's very common. What I meant is, is the code written in a way to work safely in a parallel foreach loop? (i.e. Is the code "independent"?) (But I assume it is because it's been the common theme in this thread; so there must be something stranger going on.)

Ali
Jun 25 2021
On Friday, 25 June 2021 at 15:08:38 UTC, Ali Çehreli wrote:
> What I meant is, is the code written in a way to work safely in a parallel foreach loop?

I have added the MWP. Did you have a chance to look at it?
Jun 25 2021
On Friday, 25 June 2021 at 13:53:17 UTC, seany wrote:
> int[][] pnts;
> pnts.length = fld.length;
> ...
> foreach (i, fLine; prTaskPool.parallel(fld, workUnitSize)) {

A self-contained and complete example would help a lot, but the likely problem with this code is that you're accessing pnts[y][x] in the loop, which makes the loop bodies no longer independent, because some of them need to first allocate an int[] to replace the zero-length pnts[y] that you're starting with. Consider:

```
$ rdmd --eval 'int[][] p; p.length = 5; p.map!"a.length".writeln'
[0, 0, 0, 0, 0]
```
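One way to make the loop bodies independent again, sketched here with made-up input data and assuming each outer iteration writes only into its own row, is to size every inner array before entering the parallel loop, so no iteration triggers an allocation that another thread can observe:

```d
import std.parallelism;

void main()
{
    // Stand-in for the real fld input.
    auto fld = [[1, 2, 3], [4, 5, 6], [7, 8, 9]];

    int[][] pnts;
    pnts.length = fld.length;

    // Pre-size every inner array up front, on a single thread.
    foreach (i, ref row; pnts)
        row.length = fld[i].length;

    foreach (i, fLine; parallel(fld, 1))
    {
        // Each body now writes only into its own, fully allocated
        // row: no shared mutation between iterations.
        foreach (j, v; fLine)
            pnts[i][j] = v * 2;
    }
}
```

The key point is that all allocation happens sequentially before the parallel loop; the parallel bodies then touch disjoint memory.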
Jun 25 2021
On Friday, 25 June 2021 at 14:13:14 UTC, jfondren wrote:
> A self-contained and complete example would help a lot, but the likely problem with this code is that you're accessing pnts[y][x] in the loop.

This particular location does not cause the segfault. It is segfaulting down the line in a completely unrelated location... Wait, I will try to make a MWP.
Jun 25 2021
On Friday, 25 June 2021 at 14:22:25 UTC, seany wrote:
> Wait, I will try to make a MWP.

[Here is the MWP](https://github.com/naturalmechanics/mwp). Please compile with `dub build -b release --compiler=ldc2`. Then to run, please use: `./tracker_ai --filename 21010014-86.ptl`
Jun 25 2021
On Friday, 25 June 2021 at 14:44:13 UTC, seany wrote:
> [Here is the MWP](https://github.com/naturalmechanics/mwp).

With ldc2 this segfaults for me even if std.parallelism is removed entirely. With DMD and std.parallelism removed it runs to completion. With DMD and no changes it never seems to finish. I reckon that there's some other memory error and that the parallelism is unrelated.
Jun 25 2021
On Friday, 25 June 2021 at 15:16:30 UTC, jfondren wrote:
> I reckon that there's some other memory error and that the parallelism is unrelated.

With @safe:

```
source/AI.d(83,23): Error: cannot take address of local `rData` in `@safe` function `main`
source/analysisEngine.d(560,20): Error: cannot take address of local `rd_flattened` in `@safe` function `add_missingPoints`
source/analysisEngine.d(344,5): Error: can only catch class objects derived from `Exception` in `@safe` code, not `core.exception.RangeError`
```

And then about half are complaints about @system dlib calls, and complaints about `void*` <-> `rawData[]*` casting. The RangeError catch likely does nothing with your compile flags. Even if you don't intend to use @safe in the end, I'd bet that the segfaults are due to what it's complaining about.
Jun 25 2021
On Friday, 25 June 2021 at 15:16:30 UTC, jfondren wrote:
> With ldc2 this segfaults for me even if std.parallelism is removed entirely.

Try [this version](https://github.com/naturalmechanics/mwp/tree/nested-loops). The goal is to parallelize `calculate_avgSweepDist_pairwise` at line `3836`. Notice there we have 6 nested loops. Thank you.
Jun 25 2021
On Friday, 25 June 2021 at 15:50:37 UTC, seany wrote:
> Try [this version](https://github.com/naturalmechanics/mwp/tree/nested-loops).

Ok, I stopped the bus error and the segfault. It was indeed an index that was written wrong in the flattened version. No, I don't have the segfault any more. But I get "error creating thread" from time to time. Not always. But even with the taskpool, it is not spreading to multiple cores.
Jun 25 2021
On Friday, 25 June 2021 at 16:37:06 UTC, seany wrote:
> But I get "error creating thread" from time to time.

PS: this is the error message: "core.thread.threadbase.ThreadError@src/core/thread/threadbase.d(1219): Error creating thread"
Jun 25 2021
On Friday, 25 June 2021 at 16:37:44 UTC, seany wrote:
> PS: this is the error message: "core.thread.threadbase.ThreadError@src/core/thread/threadbase.d(1219): Error creating thread"

If I use `parallel(...)` it runs. If I use `prTaskPool.parallel(...)`, then in the line `auto prTaskPool = new TaskPool(threadCount);` it hits the error. Please help.
Jun 25 2021
On Friday, 25 June 2021 at 19:17:38 UTC, seany wrote:
> If I use `parallel(...)` it runs. If I use `prTaskPool.parallel(...)`, then in the line `auto prTaskPool = new TaskPool(threadCount);` it hits the error. Please help.

parallel() reuses a single taskPool that's only established once. Your code creates two TaskPools per function invocation, and you call that function in a loop. stracing your program might again reveal the error you're hitting.
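The fix implied above can be sketched like this: create the `TaskPool` once, outside the loop, and pass it into the repeatedly called function. `processOnce` and the per-element work are hypothetical stand-ins for the thread's real code:

```d
import std.parallelism;

// Reuses the caller's pool instead of constructing a new pool
// (and new OS threads) on every invocation.
void processOnce(TaskPool pool, int[] data)
{
    foreach (i, e; pool.parallel(data, 1))
    {
        // ... per-element work ...
    }
}

void main()
{
    enum threadCount = 2;
    auto pool = new TaskPool(threadCount); // created exactly once
    scope (exit) pool.finish();

    foreach (iteration; 0 .. 1_000)
        processOnce(pool, [1, 2, 3]); // same pool every time
}
```

Constructing a `TaskPool` spawns real threads, so building one per call in a hot loop can exhaust the OS thread limit, which would explain an intermittent "Error creating thread".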
Jun 25 2021
On Friday, 25 June 2021 at 19:30:16 UTC, jfondren wrote:
> Your code creates two TaskPools per function invocation, and you call that function in a loop.

I have removed one; same problem. Yes, I do call it in a loop. How can I create a taskpool in a function that itself will be called in a loop?
Jun 25 2021
On Friday, 25 June 2021 at 19:52:23 UTC, seany wrote:
> How can I create a taskpool in a function that itself will be called in a loop?

One option is to do as parallel does: https://github.com/dlang/phobos/blob/master/std/parallelism.d#L3508
Jun 25 2021
On 6/24/21 1:33 PM, Bastiaan Veelo wrote:
> distributes the load across all cores (but one).

Last time I checked, the current thread would run tasks as well.

Ali
Jun 24 2021
On Thursday, 24 June 2021 at 20:56:26 UTC, Ali Çehreli wrote:
> Last time I checked, the current thread would run tasks as well.

Indeed, thanks.

— Bastiaan.
Jun 24 2021