www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - std.parallelism: How to wait all tasks finished?

reply "Cooler" <kulkin hotbox.ru> writes:
I have several tasks. Each task may or may not create another 
task. What is the best way to wait until all tasks finished?

The code:

void procData(){
   if(...)
     taskPool.put(task(&procData));
}

void main(){
   taskPool.put(task(&procData));
   taskPool.put(task(&procData));
   ...
   taskPool.put(task(&procData));

   // Next line will block execution until all tasks already in 
queue finished.
   // Almost all what I need, but new tasks will not be started.
   taskPool.finish(true);
}
Feb 02 2014
next sibling parent reply "Dan Killebrew" <dankillebrew gmail.com> writes:
   // Next line will block execution until all tasks already in 
 queue finished.
   // Almost all what I need, but new tasks will not be started.
   taskPool.finish(true);
 }
Are you sure TaskPool.finish isn't what you're looking for? "Signals worker threads to terminate when the queue becomes empty." It seems to me that worker threads will continue as long as the queue isn't empty. So if a task adds another task to the pool, some worker will process the newly enqueued task.
Feb 02 2014
parent reply "Cooler" <kulkin hotbox.ru> writes:
On Monday, 3 February 2014 at 06:56:35 UTC, Dan Killebrew wrote:
  // Next line will block execution until all tasks already in 
 queue finished.
  // Almost all what I need, but new tasks will not be started.
  taskPool.finish(true);
 }
Are you sure TaskPool.finish isn't what you're looking for? "Signals worker threads to terminate when the queue becomes empty." It seems to me that worker threads will continue as long as the queue isn't empty. So if a task adds another task to the pool, some worker will process the newly enqueued task.
No. After taskPool.finish() no way to add new tasks to the queue. taskPool.put will not add new tasks.
Feb 02 2014
parent reply "Dan Killebrew" <dank gmail.com> writes:
 It seems to me that worker threads will continue as long as 
 the queue isn't empty. So if a task adds another task to the 
 pool, some worker will process the newly enqueued task.
No. After taskPool.finish() no way to add new tasks to the queue. taskPool.put will not add new tasks.
Then perhaps you need to create a new TaskPool (and make sure that workers add their tasks to the correct task pool), so that you can wait on the first task pool, then wait on the second task pool, etc. auto phase1 = new TaskPool(); //make sure all new tasks are added to phase1 phase1.finish(true); auto phase2 = new TaskPool(); //make sure all new tasks are added to phase2 phase2.finish(true);
Feb 03 2014
parent reply "Cooler" <kulkin hotbox.ru> writes:
On Tuesday, 4 February 2014 at 03:26:04 UTC, Dan Killebrew wrote:
 It seems to me that worker threads will continue as long as 
 the queue isn't empty. So if a task adds another task to the 
 pool, some worker will process the newly enqueued task.
No. After taskPool.finish() no way to add new tasks to the queue. taskPool.put will not add new tasks.
Then perhaps you need to create a new TaskPool (and make sure that workers add their tasks to the correct task pool), so that you can wait on the first task pool, then wait on the second task pool, etc. auto phase1 = new TaskPool(); //make sure all new tasks are added to phase1 phase1.finish(true); auto phase2 = new TaskPool(); //make sure all new tasks are added to phase2 phase2.finish(true);
Will not help. I don't know beforehand what tasks will be created. procData is recursive and it decides create new task or not.
Feb 05 2014
next sibling parent "Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:
On Wednesday, 5 February 2014 at 15:38:14 UTC, Cooler wrote:
 Will not help. I don't know beforehand what tasks will be
 created. procData is recursive and it decides create new task or
 not.
You seem to be saying that you want to be able to wait for all tasks to complete an indefinite number of times, adding more tasks after each one. Why would you want to do that? The queue for the pool is infinitely long, so just keep adding tasks till you have no more tasks to add. Or if you have a progression of types, like all tasks of type A have to be complete before you can start running the tasks of type B, then you should be able to have a separate thread pool for each type.
Feb 05 2014
prev sibling parent reply "Andrea Fontana" <nospam example.com> writes:
On Wednesday, 5 February 2014 at 15:38:14 UTC, Cooler wrote:
 On Tuesday, 4 February 2014 at 03:26:04 UTC, Dan Killebrew 
 wrote:
 It seems to me that worker threads will continue as long as 
 the queue isn't empty. So if a task adds another task to the 
 pool, some worker will process the newly enqueued task.
No. After taskPool.finish() no way to add new tasks to the queue. taskPool.put will not add new tasks.
Then perhaps you need to create a new TaskPool (and make sure that workers add their tasks to the correct task pool), so that you can wait on the first task pool, then wait on the second task pool, etc. auto phase1 = new TaskPool(); //make sure all new tasks are added to phase1 phase1.finish(true); auto phase2 = new TaskPool(); //make sure all new tasks are added to phase2 phase2.finish(true);
Will not help. I don't know beforehand what tasks will be created. procData is recursive and it decides create new task or not.
Something like this? (not tested...) shared bool more = true; ... ... ... void procData(){ if(...) { taskPool.put(task(&procData)); more = true; } } while(true) { taskPool.finish(true); if (!more) break; else more = false; }
Feb 06 2014
parent reply "Cooler" <kulkin hotbox.ru> writes:
On Thursday, 6 February 2014 at 11:30:17 UTC, Andrea Fontana 
wrote:
 On Wednesday, 5 February 2014 at 15:38:14 UTC, Cooler wrote:
 On Tuesday, 4 February 2014 at 03:26:04 UTC, Dan Killebrew 
 wrote:
 It seems to me that worker threads will continue as long as 
 the queue isn't empty. So if a task adds another task to 
 the pool, some worker will process the newly enqueued task.
No. After taskPool.finish() no way to add new tasks to the queue. taskPool.put will not add new tasks.
Then perhaps you need to create a new TaskPool (and make sure that workers add their tasks to the correct task pool), so that you can wait on the first task pool, then wait on the second task pool, etc. auto phase1 = new TaskPool(); //make sure all new tasks are added to phase1 phase1.finish(true); auto phase2 = new TaskPool(); //make sure all new tasks are added to phase2 phase2.finish(true);
Will not help. I don't know beforehand what tasks will be created. procData is recursive and it decides create new task or not.
Something like this? (not tested...) shared bool more = true; ... ... ... void procData(){ if(...) { taskPool.put(task(&procData)); more = true; } } while(true) { taskPool.finish(true); if (!more) break; else more = false; }
It is closer, but after taskPool.finish() all tries to taskPool.put() will be rejected. Let's me clear example. import std.stdio, std.parallelism, core.thread; shared int i; void procData(){ synchronized ++i; if(i >= 100) return; foreach(i; 0 .. 100) taskPool.put(task(&procData)); // New tasks will be rejected after // taskPool.finish() } void main(){ taskPool.put(task(&procData)); Thread.sleep(1.msecs); // The final output of "i" depends on duration here taskPool.finish(true); writefln("i = %s", i); } In the example above the total number of tasks executed depends on sleep duration.
Feb 06 2014
parent reply "Cooler" <kulkin hotbox.ru> writes:
On Thursday, 6 February 2014 at 14:42:57 UTC, Cooler wrote:
 On Thursday, 6 February 2014 at 11:30:17 UTC, Andrea Fontana 
 wrote:
 On Wednesday, 5 February 2014 at 15:38:14 UTC, Cooler wrote:
 On Tuesday, 4 February 2014 at 03:26:04 UTC, Dan Killebrew 
 wrote:
 It seems to me that worker threads will continue as long 
 as the queue isn't empty. So if a task adds another task 
 to the pool, some worker will process the newly enqueued 
 task.
No. After taskPool.finish() no way to add new tasks to the queue. taskPool.put will not add new tasks.
Then perhaps you need to create a new TaskPool (and make sure that workers add their tasks to the correct task pool), so that you can wait on the first task pool, then wait on the second task pool, etc. auto phase1 = new TaskPool(); //make sure all new tasks are added to phase1 phase1.finish(true); auto phase2 = new TaskPool(); //make sure all new tasks are added to phase2 phase2.finish(true);
Will not help. I don't know beforehand what tasks will be created. procData is recursive and it decides create new task or not.
Something like this? (not tested...) shared bool more = true; ... ... ... void procData(){ if(...) { taskPool.put(task(&procData)); more = true; } } while(true) { taskPool.finish(true); if (!more) break; else more = false; }
It is closer, but after taskPool.finish() all tries to taskPool.put() will be rejected. Let's me clear example. import std.stdio, std.parallelism, core.thread; shared int i; void procData(){ synchronized ++i; if(i >= 100) return; foreach(i; 0 .. 100) taskPool.put(task(&procData)); // New tasks will be rejected after // taskPool.finish() } void main(){ taskPool.put(task(&procData)); Thread.sleep(1.msecs); // The final output of "i" depends on duration here taskPool.finish(true); writefln("i = %s", i); } In the example above the total number of tasks executed depends on sleep duration.
Forgot to say - I know how to solve the topic problem. My question is "What is the BEST way?". One of my idea - may be introduce new function, named for example "wait", that will block until there are working tasks?
Feb 06 2014
parent reply "Andrea Fontana" <nospam example.com> writes:
On Thursday, 6 February 2014 at 14:52:36 UTC, Cooler wrote:
 On Thursday, 6 February 2014 at 14:42:57 UTC, Cooler wrote:
 On Thursday, 6 February 2014 at 11:30:17 UTC, Andrea Fontana 
 wrote:
 On Wednesday, 5 February 2014 at 15:38:14 UTC, Cooler wrote:
 On Tuesday, 4 February 2014 at 03:26:04 UTC, Dan Killebrew 
 wrote:
 It seems to me that worker threads will continue as long 
 as the queue isn't empty. So if a task adds another task 
 to the pool, some worker will process the newly enqueued 
 task.
No. After taskPool.finish() no way to add new tasks to the queue. taskPool.put will not add new tasks.
Then perhaps you need to create a new TaskPool (and make sure that workers add their tasks to the correct task pool), so that you can wait on the first task pool, then wait on the second task pool, etc. auto phase1 = new TaskPool(); //make sure all new tasks are added to phase1 phase1.finish(true); auto phase2 = new TaskPool(); //make sure all new tasks are added to phase2 phase2.finish(true);
Will not help. I don't know beforehand what tasks will be created. procData is recursive and it decides create new task or not.
Something like this? (not tested...) shared bool more = true; ... ... ... void procData(){ if(...) { taskPool.put(task(&procData)); more = true; } } while(true) { taskPool.finish(true); if (!more) break; else more = false; }
It is closer, but after taskPool.finish() all tries to taskPool.put() will be rejected. Let's me clear example. import std.stdio, std.parallelism, core.thread; shared int i; void procData(){ synchronized ++i; if(i >= 100) return; foreach(i; 0 .. 100) taskPool.put(task(&procData)); // New tasks will be rejected after // taskPool.finish() } void main(){ taskPool.put(task(&procData)); Thread.sleep(1.msecs); // The final output of "i" depends on duration here taskPool.finish(true); writefln("i = %s", i); } In the example above the total number of tasks executed depends on sleep duration.
Forgot to say - I know how to solve the topic problem. My question is "What is the BEST way?". One of my idea - may be introduce new function, named for example "wait", that will block until there are working tasks?
What about sync ++taskCount when you put() something and --taskCount when task is done? And on main while(i > 0) Thread.yield(); ?
Feb 06 2014
parent reply "Andrea Fontana" <nospam example.com> writes:
On Thursday, 6 February 2014 at 16:07:51 UTC, Andrea Fontana 
wrote:
 On Thursday, 6 February 2014 at 14:52:36 UTC, Cooler wrote:
 On Thursday, 6 February 2014 at 14:42:57 UTC, Cooler wrote:
 On Thursday, 6 February 2014 at 11:30:17 UTC, Andrea Fontana 
 wrote:
 On Wednesday, 5 February 2014 at 15:38:14 UTC, Cooler wrote:
 On Tuesday, 4 February 2014 at 03:26:04 UTC, Dan Killebrew 
 wrote:
 It seems to me that worker threads will continue as long 
 as the queue isn't empty. So if a task adds another task 
 to the pool, some worker will process the newly enqueued 
 task.
No. After taskPool.finish() no way to add new tasks to the queue. taskPool.put will not add new tasks.
Then perhaps you need to create a new TaskPool (and make sure that workers add their tasks to the correct task pool), so that you can wait on the first task pool, then wait on the second task pool, etc. auto phase1 = new TaskPool(); //make sure all new tasks are added to phase1 phase1.finish(true); auto phase2 = new TaskPool(); //make sure all new tasks are added to phase2 phase2.finish(true);
Will not help. I don't know beforehand what tasks will be created. procData is recursive and it decides create new task or not.
Something like this? (not tested...) shared bool more = true; ... ... ... void procData(){ if(...) { taskPool.put(task(&procData)); more = true; } } while(true) { taskPool.finish(true); if (!more) break; else more = false; }
It is closer, but after taskPool.finish() all tries to taskPool.put() will be rejected. Let's me clear example. import std.stdio, std.parallelism, core.thread; shared int i; void procData(){ synchronized ++i; if(i >= 100) return; foreach(i; 0 .. 100) taskPool.put(task(&procData)); // New tasks will be rejected after // taskPool.finish() } void main(){ taskPool.put(task(&procData)); Thread.sleep(1.msecs); // The final output of "i" depends on duration here taskPool.finish(true); writefln("i = %s", i); } In the example above the total number of tasks executed depends on sleep duration.
Forgot to say - I know how to solve the topic problem. My question is "What is the BEST way?". One of my idea - may be introduce new function, named for example "wait", that will block until there are working tasks?
What about sync ++taskCount when you put() something and --taskCount when task is done? And on main while(i > 0) Thread.yield(); ?
Something like this: import std.stdio, std.parallelism, core.thread; import std.random; shared size_t taskCount; shared size_t i; void procData() in { synchronized ++i; } out { synchronized --taskCount; } body { if (i > 100) return; foreach(i; 0 .. 100) { taskPool.put(task(&procData)); synchronized ++taskCount; } } void main(){ taskCount = 2; taskPool.put(task(&procData)); taskPool.put(task(&procData)); while(taskCount > 0) Thread.yield(); }
Feb 06 2014
parent "Cooler" <kulkin hotbox.ru> writes:
On Thursday, 6 February 2014 at 16:19:38 UTC, Andrea Fontana
wrote:
 On Thursday, 6 February 2014 at 16:07:51 UTC, Andrea Fontana 
 wrote:
 On Thursday, 6 February 2014 at 14:52:36 UTC, Cooler wrote:
 On Thursday, 6 February 2014 at 14:42:57 UTC, Cooler wrote:
 On Thursday, 6 February 2014 at 11:30:17 UTC, Andrea Fontana 
 wrote:
 On Wednesday, 5 February 2014 at 15:38:14 UTC, Cooler wrote:
 On Tuesday, 4 February 2014 at 03:26:04 UTC, Dan Killebrew 
 wrote:
 It seems to me that worker threads will continue as 
 long as the queue isn't empty. So if a task adds 
 another task to the pool, some worker will process the 
 newly enqueued task.
No. After taskPool.finish() no way to add new tasks to the queue. taskPool.put will not add new tasks.
Then perhaps you need to create a new TaskPool (and make sure that workers add their tasks to the correct task pool), so that you can wait on the first task pool, then wait on the second task pool, etc. auto phase1 = new TaskPool(); //make sure all new tasks are added to phase1 phase1.finish(true); auto phase2 = new TaskPool(); //make sure all new tasks are added to phase2 phase2.finish(true);
Will not help. I don't know beforehand what tasks will be created. procData is recursive and it decides create new task or not.
Something like this? (not tested...) shared bool more = true; ... ... ... void procData(){ if(...) { taskPool.put(task(&procData)); more = true; } } while(true) { taskPool.finish(true); if (!more) break; else more = false; }
It is closer, but after taskPool.finish() all tries to taskPool.put() will be rejected. Let's me clear example. import std.stdio, std.parallelism, core.thread; shared int i; void procData(){ synchronized ++i; if(i >= 100) return; foreach(i; 0 .. 100) taskPool.put(task(&procData)); // New tasks will be rejected after // taskPool.finish() } void main(){ taskPool.put(task(&procData)); Thread.sleep(1.msecs); // The final output of "i" depends on duration here taskPool.finish(true); writefln("i = %s", i); } In the example above the total number of tasks executed depends on sleep duration.
Forgot to say - I know how to solve the topic problem. My question is "What is the BEST way?". One of my idea - may be introduce new function, named for example "wait", that will block until there are working tasks?
What about sync ++taskCount when you put() something and --taskCount when task is done? And on main while(i > 0) Thread.yield(); ?
Something like this: import std.stdio, std.parallelism, core.thread; import std.random; shared size_t taskCount; shared size_t i; void procData() in { synchronized ++i; } out { synchronized --taskCount; } body { if (i > 100) return; foreach(i; 0 .. 100) { taskPool.put(task(&procData)); synchronized ++taskCount; } } void main(){ taskCount = 2; taskPool.put(task(&procData)); taskPool.put(task(&procData)); while(taskCount > 0) Thread.yield(); }
Now I do almost the same in my program, but instead of while(...) Thread.yield() I wait on semaphore to be notified. And in threads when --taskCount reaches 0, i do Semaphore.notify(). But all of this don't look beautiful enough. As I mention above - may be introduce new function, named for example "wait", that will block TaskPool until there are working tasks? If such situation encounters frequently, may be it is worth to add Phobos some functionality?
Feb 06 2014
prev sibling parent reply Russel Winder <russel winder.org.uk> writes:
On Mon, 2014-02-03 at 00:00 +0000, Cooler wrote:
 I have several tasks. Each task may or may not create another 
 task. What is the best way to wait until all tasks finished?
What you are describing here is a classic fork/join architecture. The tasks are structured as a tree with synchronization handled by the sub-nodes. As far as I am aware std.parallelism focuses on data parallelism which is a scatter/gather (aka map/reduce) model of just a single layer. All the code fragments in the thread have, I believe, been predicated on working with a thread pool as an explicit global entity. I think the problems have stemmed from taking this viewpoint. I would suggest following the way the Java fork/join framework (based on Doug Lea's original) works. There is an underlying global thread pool, but the user code uses the fork/join abstraction layer in order to create the tree of synchronization dependencies. In this case instead of working with tasks directly there needs to be a type whose job it is to be a non-leaf node in the tree that handles synchronization whilst nonetheless creating tasks and submitting them to the pool. This is clearly something that could turn into an addition to std.parallelism or be std.forkjoin. Sorry I have no actual code to offer, but the overall design of what is needed is well understood, at least in the Java context. C++ has a long way to go to catch up, as does D. The other thing that then sits on this is lazy stream parallelism, which is what Java 8 is adding to the mix. -- Russel. ============================================================================= Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.net 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Feb 06 2014
parent "Cooler" <kulkin hotbox.ru> writes:
On Thursday, 6 February 2014 at 17:58:47 UTC, Russel Winder wrote:
 On Mon, 2014-02-03 at 00:00 +0000, Cooler wrote:
 I have several tasks. Each task may or may not create another 
 task. What is the best way to wait until all tasks finished?
What you are describing here is a classic fork/join architecture. The tasks are structured as a tree with synchronization handled by the sub-nodes. As far as I am aware std.parallelism focuses on data parallelism which is a scatter/gather (aka map/reduce) model of just a single layer. All the code fragments in the thread have, I believe, been predicated on working with a thread pool as an explicit global entity. I think the problems have stemmed from taking this viewpoint. I would suggest following the way the Java fork/join framework (based on Doug Lea's original) works. There is an underlying global thread pool, but the user code uses the fork/join abstraction layer in order to create the tree of synchronization dependencies. In this case instead of working with tasks directly there needs to be a type whose job it is to be a non-leaf node in the tree that handles synchronization whilst nonetheless creating tasks and submitting them to the pool. This is clearly something that could turn into an addition to std.parallelism or be std.forkjoin. Sorry I have no actual code to offer, but the overall design of what is needed is well understood, at least in the Java context. C++ has a long way to go to catch up, as does D. The other thing that then sits on this is lazy stream parallelism, which is what Java 8 is adding to the mix.
Thank you for your verbose answer. I think the solution could be shorter and simpler :)
Feb 06 2014