
digitalmars.D.announce - parallelFuture

reply dsimcha <dsimcha yahoo.com> writes:
I've created an alpha release of parallelFuture, a high-level parallelization
library for D2.  Right now, it has a task pool, futures, parallel foreach, and
parallel map.

Here's the (IMHO) coolest example:

auto pool = new ThreadPool();

// Assuming we have a function isPrime(), print all
// prime numbers from 0 to uint.max, testing for primeness
// in parallel.
auto myRange = iota(0, uint.max);
foreach(num; pool.parallel(myRange)) {
    if(isPrime(num)) {
        synchronized writeln(num);
    }
}

The interface is there and it seems to work, although it has not been
extensively stress tested yet.  Some of the implementation details could
admittedly use some cleaning up, and I would appreciate help from some
threading gurus on improving my queue (right now it's a naive synchronized
singly-linked list) and getting condition mutexes to work properly.  (Right
now, I'm using atomic polling followed by sleeping for 1 millisecond in a lot
of places.  It's a kludge, but it seems to work reasonably well in practice.)
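
For reference, a minimal sketch of what a condition-variable-based queue could
look like using druntime's core.sync primitives. This is illustrative only,
not parallelFuture's actual implementation, and uses an array as a stand-in
for the synchronized linked list:

import core.sync.mutex : Mutex;
import core.sync.condition : Condition;

// Illustrative blocking queue: workers sleep on a condition variable
// instead of polling, and are woken when a job arrives.
class JobQueue(T) {
    private Mutex mtx;
    private Condition cond;
    private T[] jobs;  // stand-in for the synchronized linked list

    this() {
        mtx = new Mutex;
        cond = new Condition(mtx);
    }

    void push(T job) {
        synchronized (mtx) {
            jobs ~= job;
            cond.notify();  // wake one waiting worker
        }
    }

    T pop() {
        synchronized (mtx) {
            while (jobs.length == 0)
                cond.wait();  // releases mtx while blocked
            auto job = jobs[0];
            jobs = jobs[1 .. $];
            return job;
        }
    }
}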

The code is at:

http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d

The docs are at:

http://cis.jhu.edu/~dsimcha/parallelFuture.html
Oct 21 2009
next sibling parent reply Tim Matthews <tim.matthews7 gmail.com> writes:
dsimcha wrote:
 I've created an alpha release of parallelFuture, a high-level parallelization
 library for D2.  Right now, it has a task pool, futures, parallel foreach, and
 parallel map.
 
 Here's the (IMHO) coolest example:
 
 auto pool = new ThreadPool();
 
 // Assuming we have a function isPrime(), print all
 // prime numbers from 0 to uint.max, testing for primeness
 // in parallel.
 auto myRange = iota(0, uint.max);
 foreach(num; pool.parallel(myRange)) {
     if(isPrime(num)) {
         synchronized writeln(num);
     }
 }
 
 The interface is there and it seems to work, although it has not been
 extensively stress tested yet.  Some of the implementation details could
 admittedly use some cleaning up, and I would appreciate help from some
 threading gurus on improving my queue (right now it's a naive synchronized
 singly-linked list) and getting condition mutexes to work properly.  (Right
 now, I'm using atomic polling followed by sleeping for 1 millisecond in a lot
 of places.  It's a kludge, but it seems to work reasonably well in practice.)
 
 The code is at:
 
 http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d
 
 The docs are at:
 
 http://cis.jhu.edu/~dsimcha/parallelFuture.html
Nice. About the tasks: in .NET, a worker thread is created for each CPU core. Each worker thread has its own local queue of tasks, which it initially retrieves from the global queue, but if a task creates new tasks it adds them to its local queue directly, for less contention. When it finishes a task, it takes the next one from the back of its local queue to take advantage of the cache (like a stack). The tasks at the front of a queue can be stolen by another worker thread whose local queue and the global queue are both empty, to maximize CPU usage. Also, a task's wait-for-completion only counts as complete once all of the tasks it created are complete too (kinda like recursion). What are the implementation or plans for this?
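
A minimal sketch of the per-worker deque behavior described above (this
mirrors the .NET scheduler Tim describes, not parallelFuture's current queue;
the names are illustrative):

// Owner thread pushes and pops at the back (stack order, cache-friendly);
// idle threads steal the oldest task from the front.
class WorkerDeque {
    private void delegate()[] tasks;

    void pushLocal(void delegate() t) {
        synchronized (this) tasks ~= t;
    }

    // Called by the owning worker: newest task first.
    void delegate() popLocal() {
        synchronized (this) {
            if (tasks.length == 0) return null;
            auto t = tasks[$ - 1];
            tasks = tasks[0 .. $ - 1];
            return t;
        }
    }

    // Called by a thief whose own deque and the global queue are empty.
    void delegate() steal() {
        synchronized (this) {
            if (tasks.length == 0) return null;
            auto t = tasks[0];
            tasks = tasks[1 .. $];
            return t;
        }
    }
}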
Oct 21 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Tim Matthews (tim.matthews7 gmail.com)'s article
 dsimcha wrote:
 I've created an alpha release of parallelFuture, a high-level parallelization
 library for D2.  Right now, it has a task pool, futures, parallel foreach, and
 parallel map.

 Here's the (IMHO) coolest example:

 auto pool = new ThreadPool();

 // Assuming we have a function isPrime(), print all
 // prime numbers from 0 to uint.max, testing for primeness
 // in parallel.
 auto myRange = iota(0, uint.max);
 foreach(num; pool.parallel(myRange)) {
     if(isPrime(num)) {
         synchronized writeln(num);
     }
 }

 The interface is there and it seems to work, although it has not been
 extensively stress tested yet.  Some of the implementation details could
 admittedly use some cleaning up, and I would appreciate help from some
 threading gurus on improving my queue (right now it's a naive synchronized
 singly-linked list) and getting condition mutexes to work properly.  (Right
 now, I'm using atomic polling followed by sleeping for 1 millisecond in a lot
 of places.  It's a kludge, but it seems to work reasonably well in practice.)

 The code is at:

 http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d

 The docs are at:

 http://cis.jhu.edu/~dsimcha/parallelFuture.html
Nice. About the tasks: in .NET, a worker thread is created for each CPU core. Each worker thread has its own local queue of tasks, which it initially retrieves from the global queue, but if a task creates new tasks it adds them to its local queue directly, for less contention. When it finishes a task, it takes the next one from the back of its local queue to take advantage of the cache (like a stack). The tasks at the front of a queue can be stolen by another worker thread whose local queue and the global queue are both empty, to maximize CPU usage. Also, a task's wait-for-completion only counts as complete once all of the tasks it created are complete too (kinda like recursion). What are the implementation or plans for this?
For now, parallelFuture was designed with a single producer, multiple worker model. Absolutely no attempt was made to allow for tasks running in the task pool to themselves submit jobs to the same task pool, because it would have made things more complicated and I couldn't think of any use cases. I designed the lib with the types of use cases I encounter in my work in mind (mathy, pure throughput-oriented computing on large, embarrassingly parallel problems). If someone comes up with a compelling use case, though, I'd certainly consider adding such abilities, provided they don't interfere with performance or API simplicity for the more common cases.

To make this discussion simple, let's define F1 as a future/task submitted by the main producer thread, and F2 as a task/future submitted by F1. The queue is (for now) strictly FIFO, except that if you have a pointer to the Task/Future object you can steal a job. When F1 submits F2 to the queue, F2 goes to the back of the queue like anything else. This means that when F1 waits on F2, it is possible to have a cyclical dependency (F1 waiting on F2, F2 waiting for a worker thread populated by F1). This is mitigated by work stealing (F1 may just steal F2 and do it in its own thread).

In parallel map and foreach, I should probably document this, but for now it's undefined behavior for the mapping function or parallel foreach loop body to submit jobs to the task pool and wait on them; in practice it will likely result in deadlocks.
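
A rough sketch of that waiting-steals idea (all names here are hypothetical,
not parallelFuture's actual API):

enum TaskStatus { notStarted, inProgress, done }

// Hypothetical wait that avoids the F1-waits-on-F2 deadlock: if the
// awaited task hasn't started yet, the waiter removes it from the
// queue and runs it in its own thread instead of blocking.
void stealingWait(Task)(Task f2) {
    if (f2.status == TaskStatus.notStarted && f2.removeFromQueue()) {
        f2.run();    // F1 executes F2 itself; no worker thread needed
        return;
    }
    while (f2.status != TaskStatus.done) {
        f2.block();  // otherwise wait for a worker to finish it
    }
}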
Oct 22 2009
parent reply Tim Matthews <tim.matthews7 gmail.com> writes:
dsimcha wrote:

 For now, parallelFuture was designed with a single producer, multiple worker
 model.  Absolutely no attempt was made to allow for tasks running in the task
 pool to themselves submit jobs to the same task pool, because it would have
 made things more complicated and I couldn't think of any use cases.
Like recursion, where a function needs to call another function: a thread needs to be able to spawn threads, and a task should be able to create new tasks. Making newly spawned tasks stay on the same thread is a good optimization. This shouldn't need a specific use case.
 I designed the lib with the types of use cases I encounter in my work in
 mind.  (mathy, pure throughput-oriented computing on large, embarrassingly
 parallel problems.)  If someone comes up with a compelling use case, though,
 I'd certainly consider adding such abilities provided they don't interfere
 with performance or API simplicity for the more common cases.
 
 To make this discussion simple, let's define F1 as a future/task submitted
 by the main producer thread, and F2 as a task/future submitted by F1.  The
 queue is (for now) strictly FIFO, except that if you have a pointer to the
 Task/Future object you can steal a job.  When F1 submits F2 to the queue,
 F2 goes to the back of the queue like anything else.  This means when F1
 waits on F2, it is possible to have a cyclical dependency (F1 waiting on
 F2, F2 waiting for a worker thread populated by F1).  This is mitigated by
 work stealing (F1 may just steal F2 and do it in its own thread).
I don't like that idea of keeping the discussion simple with F1 and F2 scattered all over the place, though. I hope this video can help visualize some ideas: http://channel9.msdn.com/pdc2008/TL26/
 
 In parallel map and foreach, I should probably document this, but for now
 it's undefined behavior for the mapping function or parallel foreach loop
 body to submit jobs to the task pool and wait on them, and in practice will
 likely result in deadlocks.
You want to document that as undefined behavior? It can be made to work.
Oct 22 2009
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Tim Matthews (tim.matthews7 gmail.com)'s article
 dsimcha wrote:
 For now, parallelFuture was designed with a single producer, multiple worker
 model.  Absolutely no attempt was made to allow for tasks running in the task
 pool to themselves submit jobs to the same task pool, because it would have
 made things more complicated and I couldn't think of any use cases.
Like recursion, where a function needs to call another function: a thread needs to be able to spawn threads, and a task should be able to create new tasks. Making newly spawned tasks stay on the same thread is a good optimization. This shouldn't need a specific use case.
 I designed the lib with the types of use cases I encounter in my work in
 mind.  (mathy, pure throughput-oriented computing on large, embarrassingly
 parallel problems.)  If someone comes up with a compelling use case, though,
 I'd certainly consider adding such abilities provided they don't interfere
 with performance or API simplicity for the more common cases.

 To make this discussion simple, let's define F1 as a future/task submitted
 by the main producer thread, and F2 as a task/future submitted by F1.  The
 queue is (for now) strictly FIFO, except that if you have a pointer to the
 Task/Future object you can steal a job.  When F1 submits F2 to the queue,
 F2 goes to the back of the queue like anything else.  This means when F1
 waits on F2, it is possible to have a cyclical dependency (F1 waiting on
 F2, F2 waiting for a worker thread populated by F1).  This is mitigated by
 work stealing (F1 may just steal F2 and do it in its own thread).
I don't like that idea of keeping the discussion simple with F1 and F2 scattered all over the place, though. I hope this video can help visualize some ideas: http://channel9.msdn.com/pdc2008/TL26/
 In parallel map and foreach, I should probably document this, but for now
 it's undefined behavior for the mapping function or parallel foreach loop
 body to submit jobs to the task pool and wait on them, and in practice will
 likely result in deadlocks.
You want to document that as undefined behavior? It can be made to work.
Ok, I thought of my use case and an easy fix. It will probably be fixed soon.
Oct 22 2009
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Tim Matthews (tim.matthews7 gmail.com)'s article
 dsimcha wrote:
 For now, parallelFuture was designed with a single producer, multiple worker
 model.  Absolutely no attempt was made to allow for tasks running in the task
 pool to themselves submit jobs to the same task pool, because it would have
 made things more complicated and I couldn't think of any use cases.
Like recursion, where a function needs to call another function: a thread needs to be able to spawn threads, and a task should be able to create new tasks. Making newly spawned tasks stay on the same thread is a good optimization. This shouldn't need a specific use case.
 I designed the lib with the types of use cases I encounter in my work in
 mind.  (mathy, pure throughput-oriented computing on large, embarrassingly
 parallel problems.)  If someone comes up with a compelling use case, though,
 I'd certainly consider adding such abilities provided they don't interfere
 with performance or API simplicity for the more common cases.

 To make this discussion simple, let's define F1 as a future/task submitted
 by the main producer thread, and F2 as a task/future submitted by F1.  The
 queue is (for now) strictly FIFO, except that if you have a pointer to the
 Task/Future object you can steal a job.  When F1 submits F2 to the queue,
 F2 goes to the back of the queue like anything else.  This means when F1
 waits on F2, it is possible to have a cyclical dependency (F1 waiting on
 F2, F2 waiting for a worker thread populated by F1).  This is mitigated by
 work stealing (F1 may just steal F2 and do it in its own thread).
I don't like that idea of keeping the discussion simple with F1 and F2 scattered all over the place, though. I hope this video can help visualize some ideas: http://channel9.msdn.com/pdc2008/TL26/
 In parallel map and foreach, I should probably document this, but for now
 it's undefined behavior for the mapping function or parallel foreach loop
 body to submit jobs to the task pool and wait on them, and in practice will
 likely result in deadlocks.
You want to document that as undefined behavior? It can be made to work.
Ok, I added the ability for tasks to submit more tasks to the pool by implementing what I'd call a "selfish thread" model (not sure if there's a more formal name for it). Basically, a thread will steal any job it's waiting on if the job hasn't been started yet, no matter where the job was in the queue.

This was made somewhat difficult to implement because I was using lazy submission for parallel foreach and map, and recycling objects. This enables parallel foreach over random-access ranges without generating any heap activity. It also enables parallel foreach over lazy forward ranges that might not even fit in memory if they were converted to arrays up front, without having to synchronize on every call to front(). Before each submission, a small portion of the range is converted to an array, and the memory used for this is recycled when the object is recycled.

However, the effort was worth it, as I thought of a killer use case: nested parallel foreach over matrices.

Again, code:

http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d

Docs:

http://cis.jhu.edu/~dsimcha/parallelFuture.html
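
For illustration, a hedged sketch of that nested use case, reusing the
ThreadPool/parallel names from the example at the top of the thread (the
exact library semantics may differ):

import std.range : iota;

// Nested parallel foreach over a matrix: the outer loop body itself
// submits work to the same pool, which the "selfish thread" model
// now makes safe.
void scale(double[][] matrix, double k, ThreadPool pool) {
    foreach (i; pool.parallel(iota(0, matrix.length))) {
        foreach (j; pool.parallel(iota(0, matrix[i].length))) {
            matrix[i][j] *= k;  // every element is independent
        }
    }
}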
Oct 23 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 Again, code:
 
 http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d
 
 Docs:
 
 http://cis.jhu.edu/~dsimcha/parallelFuture.html
What license is the library under? Andrei
Oct 23 2009
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 Again, code:

 http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d

 Docs:

 http://cis.jhu.edu/~dsimcha/parallelFuture.html
What license is the library under? Andrei
Boost. I borrowed a few lines of assembler for atomic ops from Tango, though. These snippets are "standard" atomic ops code and are probably not copyrightable. Nevertheless, this section of code is clearly marked, and D2/Phobos should eventually get a full-fledged atomics lib anyhow. At that point, these ad-hoc functions can be replaced with more generic functions and the issue will be completely resolved.
Oct 23 2009
prev sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 dsimcha wrote:
 Again, code:

 http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d

 Docs:

 http://cis.jhu.edu/~dsimcha/parallelFuture.html
What license is the library under? Andrei
Boost. I suppose you didn't want to look at the source in case the license wasn't sufficiently liberal, but it's easy enough to scan through the module's doc comments for a license block, without reading the source code.
Oct 23 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Christopher Wright wrote:
 Andrei Alexandrescu wrote:
 dsimcha wrote:
 Again, code:

 http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d

 Docs:

 http://cis.jhu.edu/~dsimcha/parallelFuture.html
What license is the library under? Andrei
Boost. I suppose you didn't want to look at the source in case the license wasn't sufficiently liberal, but it's easy enough to scan through the module's doc comments for a license block, without reading the source code.
I actually did take a look but couldn't visually find the license block at the beginning or end of the code, sorry. Now I see it's a bit below the beginning. Great - it would be cool if we could adapt this for Phobos. I'm hoping Sean (who is fleshing out his messaging API) and David could find ways to integrate their work. Andrei
Oct 23 2009
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Christopher Wright wrote:
 Andrei Alexandrescu wrote:
 dsimcha wrote:
 Again, code:
http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d
 Docs:

 http://cis.jhu.edu/~dsimcha/parallelFuture.html
What license is the library under? Andrei
Boost. I suppose you didn't want to look at the source in case the license wasn't sufficiently liberal, but it's easy enough to scan through the module's doc comments for a license block, without reading the source code.
I actually did take a look but couldn't visually find the license block at the beginning or end of the code, sorry. Now I see it's a bit below the beginning. Great - it would be cool if we could adapt this for Phobos. I'm hoping Sean (who is fleshing out his messaging API) and David could find ways to integrate their work. Andrei
If that's the case, someone please point me to some use cases for message passing so I can understand it better. parallelFuture was designed mostly with pure throughput-oriented multithreading of embarrassingly parallel tasks in mind. I hadn't given any thought to message passing because I don't encounter many use cases for it in the type of work I do.
Oct 23 2009
prev sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Andrei Alexandrescu wrote:
 Christopher Wright wrote:
 Andrei Alexandrescu wrote:
 dsimcha wrote:
 Again, code:

 http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d

 Docs:

 http://cis.jhu.edu/~dsimcha/parallelFuture.html
What license is the library under? Andrei
Boost. I suppose you didn't want to look at the source in case the license wasn't sufficiently liberal, but it's easy enough to scan through the module's doc comments for a license block, without reading the source code.
I actually did take a look but couldn't visually find the license block at the beginning or end of the code, sorry. Now I see it's a bit below the beginning. Great - it would be cool if we could adapt this for Phobos. I'm hoping Sean (who is fleshing out his messaging API) and David could find ways to integrate their work.
I was actually going to suggest just that. :) I definitely think this deserves a place in Phobos. The parallel foreach alone covers a lot of use cases for multithreaded code in the simplest way. -Lars
Oct 24 2009
prev sibling next sibling parent zsxxsz <zhengshuxin hexun.com> writes:
== Quote from dsimcha (dsimcha yahoo.com)'s article
 I've created an alpha release of parallelFuture, a high-level parallelization
 library for D2.  Right now, it has a task pool, futures, parallel foreach, and
 parallel map.
 Here's the (IMHO) coolest example:
 auto pool = new ThreadPool();
 // Assuming we have a function isPrime(), print all
 // prime numbers from 0 to uint.max, testing for primeness
 // in parallel.
 auto myRange = iota(0, uint.max);
 foreach(num; pool.parallel(myRange)) {
     if(isPrime(num)) {
         synchronized writeln(num);
     }
 }
 The interface is there and it seems to work, although it has not been
 extensively stress tested yet.  Some of the implementation details could
 admittedly use some cleaning up, and I would appreciate help from some
 threading gurus on improving my queue (right now it's a naive synchronized
 singly-linked list) and getting condition mutexes to work properly.  (Right
 now, I'm using atomic polling followed by sleeping for 1 millisecond in a lot
 of places.  It's a kludge, but it seems to work reasonably well in practice.)
 The code is at:
 http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d
 The docs are at:
 http://cis.jhu.edu/~dsimcha/parallelFuture.html
Very good!
Oct 22 2009
prev sibling next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
dsimcha wrote:
 I've created an alpha release of parallelFuture, a high-level parallelization
 library for D2.  Right now, it has a task pool, futures, parallel foreach, and
 parallel map.
 
 Here's the (IMHO) coolest example:
 
 auto pool = new ThreadPool();
 
 // Assuming we have a function isPrime(), print all
 // prime numbers from 0 to uint.max, testing for primeness
 // in parallel.
 auto myRange = iota(0, uint.max);
 foreach(num; pool.parallel(myRange)) {
     if(isPrime(num)) {
         synchronized writeln(num);
     }
 }
 
 The interface is there and it seems to work, although it has not been
 extensively stress tested yet.  Some of the implementation details could
 admittedly use some cleaning up, and I would appreciate help from some
 threading gurus on improving my queue (right now it's a naive synchronized
 singly-linked list) and getting condition mutexes to work properly.  (Right
 now, I'm using atomic polling followed by sleeping for 1 millisecond in a lot
 of places.  It's a kludge, but it seems to work reasonably well in practice.)
 
 The code is at:
 
 http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d
 
 The docs are at:
 
 http://cis.jhu.edu/~dsimcha/parallelFuture.html
Very nice! Parallelisation for us ordinary folks. :)

I tried your isPrime example, which was very cool. I can't wait to try the library in a real application. I often have a situation where I have a set of grid points, and I do (mostly) separate calculations on each grid point. Your library should allow me to do calculations on several grid points in parallel, with minimal changes to my code.

Is there some particular reason why you have capitalised the F in the file name, but not in the module name?

-Lars
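
A hedged sketch of the grid-point pattern Lars describes (GridPoint and the
compute delegate are hypothetical placeholders; only ThreadPool.parallel is
from the library):

// Each point's calculation is independent, so the serial per-point
// code can be reused unchanged inside pool.parallel().
void processGrid(GridPoint)(GridPoint[] points, ThreadPool pool,
                            void delegate(GridPoint) compute) {
    foreach (point; pool.parallel(points)) {
        compute(point);  // independent per-point work
    }
}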
Oct 22 2009
parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Lars T. Kyllingstad (public kyllingen.NOSPAMnet)'s article
 Is there some particular reason why you have capitalised the F in the
 file name, but not in the module name?
 -Lars
That's the effect of being in hack mode late at night. I guess the convention is all lower case.
Oct 22 2009
prev sibling parent reply Charles Hixson <charleshixsn earthlink.net> writes:
dsimcha wrote:
 I've created an alpha release of parallelFuture, a high-level parallelization
 library for D2.  Right now, it has a task pool, futures, parallel foreach, and
 parallel map.
 
 Here's the (IMHO) coolest example:
 
 auto pool = new ThreadPool();
 
 // Assuming we have a function isPrime(), print all
 // prime numbers from 0 to uint.max, testing for primeness
 // in parallel.
 auto myRange = iota(0, uint.max);
 foreach(num; pool.parallel(myRange)) {
     if(isPrime(num)) {
         synchronized writeln(num);
     }
 }
 
 The interface is there and it seems to work, although it has not been
 extensively stress tested yet.  Some of the implementation details could
 admittedly use some cleaning up, and I would appreciate help from some
 threading gurus on improving my queue (right now it's a naive synchronized
 singly-linked list) and getting condition mutexes to work properly.  (Right
 now, I'm using atomic polling followed by sleeping for 1 millisecond in a lot
 of places.  It's a kludge, but it seems to work reasonably well in practice.)
 
 The code is at:
 
 http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d
 
 The docs are at:
 
 http://cis.jhu.edu/~dsimcha/parallelFuture.html
If you can easily do it, it would be nice to have the threads able to communicate with each other, something along the lines of both broadcast messages and 1:1 exchanges. A very rough sketch of a possible use:

Task[] tasks = Task.who_is_waiting;
foreach (task; tasks) {
    if (task.process(something)) {
        task.markDone(true);
    }
}

I see "something" as being an Object that would need to implement an interface to carry some identifying information, so when a task received it, it could determine what to do with it (with "reject processing" being an option). I think the rest is pretty clear. OTOH, this comment was just to demonstrate the kind of communication between tasks that I'm talking about: static methods for broadcast communication and instance methods for 1:1 communication, with safeties, so that when a task completes before it gets your message, you aren't left talking to a null pointer.

Cleanup would need to be managed by the original thread that created the tasks. It would need to be able to send a broadcast "now closing task xxx" signal to all threads. Threads maintaining a list of tasks would need to scan through them and remove any references to that task.

Maybe this is getting too complicated, but it would certainly be useful. In the past, interthread communication is the main problem that's kept me from using threads.
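
A rough, hypothetical sketch of what the broadcast/1:1 messaging Charles
proposes might look like (nothing like this exists in parallelFuture; all
names are illustrative):

import core.sync.mutex : Mutex;
import std.algorithm : filter;
import std.array : array;

interface Message {
    string kind();  // identifying information a receiver can inspect
}

class MsgTask {
    private Message[] inbox;
    private Mutex inboxLock;

    private __gshared MsgTask[] registry;  // all live tasks
    private __gshared Mutex registryLock;

    shared static this() { registryLock = new Mutex; }

    this() {
        inboxLock = new Mutex;
        synchronized (registryLock) registry ~= this;
    }

    // 1:1 exchange: deliver a message directly to one task.
    void send(Message m) {
        synchronized (inboxLock) inbox ~= m;
    }

    // Broadcast: deliver to every registered task.
    static void broadcast(Message m) {
        synchronized (registryLock)
            foreach (t; registry) t.send(m);
    }

    // Cleanup safety: drop a finished task so no one messages it.
    static void unregister(MsgTask done) {
        synchronized (registryLock)
            registry = registry.filter!(t => t !is done).array;
    }
}

Broadcast goes through static state and 1:1 through an instance, matching the static/instance split suggested above.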
Oct 22 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Charles Hixson (charleshixsn earthlink.net)'s article
 dsimcha wrote:
 I've created an alpha release of parallelFuture, a high-level parallelization
 library for D2.  Right now, it has a task pool, futures, parallel foreach, and
 parallel map.

 Here's the (IMHO) coolest example:

 auto pool = new ThreadPool();

 // Assuming we have a function isPrime(), print all
 // prime numbers from 0 to uint.max, testing for primeness
 // in parallel.
 auto myRange = iota(0, uint.max);
 foreach(num; pool.parallel(myRange)) {
     if(isPrime(num)) {
         synchronized writeln(num);
     }
 }

 The interface is there and it seems to work, although it has not been
 extensively stress tested yet.  Some of the implementation details could
 admittedly use some cleaning up, and I would appreciate help from some
 threading gurus on improving my queue (right now it's a naive synchronized
 singly-linked list) and getting condition mutexes to work properly.  (Right
 now, I'm using atomic polling followed by sleeping for 1 millisecond in a lot
 of places.  It's a kludge, but it seems to work reasonably well in practice.)

 The code is at:

 http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d

 The docs are at:

 http://cis.jhu.edu/~dsimcha/parallelFuture.html
If you can easily do it, it would be nice to be able to have the threads able to communicate with each other. Something along the lines of both broadcast messages and 1:1 exchanges. A very rough sketch of a possible use: Task[] tasks = Task.who_is_waiting; foreach (auto task; tasks) { if (task.process(something) ) { task.markDone(true); } } I see "something" as being an Object that would need to implement an interface to carry some identifying information, so when a task received it it could determine what to do with it (with "reject processing" being an option). I think the rest is pretty clear. OTOH, this comment was just to demonstrate the kind of communication between tasks that I'm talking about. Static methods for broadcast communication and instance methods for 1:1 communication. With safeties, so when a task completes before it gets your message, you aren't left talking to a null pointer. Cleanup would need to be managed by the original thread that created the tasks. It would need to be able to send a broadcast "now closing task xxx signal" to all threads. Threads maintaining a list of tasks would need to scan through them and remove any references to that task. Maybe this is getting to complicated, but it would certainly be useful. In the past interthread communication is the main problem that's kept me from using them.
I'll think about this and see if I can work something like this in, but I need real-world use cases. I do bioinformatics work, which basically means large-scale data mining: embarrassingly parallel problems and very CPU-bound work. The use cases I had in mind were mostly the "use every core I have to do something embarrassingly parallel" kind, and the goal was to make those as dead simple as possible.

Whatever use cases you have in mind, I'm apparently not familiar with them. If I'm to improve this lib to handle use cases other than the pure throughput-oriented parallelization of embarrassingly parallel tasks that I had in mind, I need to understand use cases from other fields.
Oct 22 2009
parent Charles Hixson <charleshixsn earthlink.net> writes:
dsimcha wrote:
 == Quote from Charles Hixson (charleshixsn earthlink.net)'s article
 dsimcha wrote:
 I've created an alpha release of parallelFuture, a high-level parallelization
 library for D2.  Right now, it has a task pool, futures, parallel foreach, and
 parallel map.

 Here's the (IMHO) coolest example:

 auto pool = new ThreadPool();

 // Assuming we have a function isPrime(), print all
 // prime numbers from 0 to uint.max, testing for primeness
 // in parallel.
 auto myRange = iota(0, uint.max);
 foreach(num; pool.parallel(myRange)) {
     if(isPrime(num)) {
         synchronized writeln(num);
     }
 }

 The interface is there and it seems to work, although it has not been
 extensively stress tested yet.  Some of the implementation details could
 admittedly use some cleaning up, and I would appreciate help from some
 threading gurus on improving my queue (right now it's a naive synchronized
 singly-linked list) and getting condition mutexes to work properly.  (Right
 now, I'm using atomic polling followed by sleeping for 1 millisecond in a lot
 of places.  It's a kludge, but it seems to work reasonably well in practice.)

 The code is at:

 http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d

 The docs are at:

 http://cis.jhu.edu/~dsimcha/parallelFuture.html
If you can easily do it, it would be nice to be able to have the threads able to communicate with each other. Something along the lines of both broadcast messages and 1:1 exchanges. A very rough sketch of a possible use: Task[] tasks = Task.who_is_waiting; foreach (auto task; tasks) { if (task.process(something) ) { task.markDone(true); } } I see "something" as being an Object that would need to implement an interface to carry some identifying information, so when a task received it it could determine what to do with it (with "reject processing" being an option). I think the rest is pretty clear. OTOH, this comment was just to demonstrate the kind of communication between tasks that I'm talking about. Static methods for broadcast communication and instance methods for 1:1 communication. With safeties, so when a task completes before it gets your message, you aren't left talking to a null pointer. Cleanup would need to be managed by the original thread that created the tasks. It would need to be able to send a broadcast "now closing task xxx signal" to all threads. Threads maintaining a list of tasks would need to scan through them and remove any references to that task. Maybe this is getting to complicated, but it would certainly be useful. In the past interthread communication is the main problem that's kept me from using them.
I'll think about this and see if I can work something like this in, but I need real-world use cases. I do bioinformatics work, which basically means large-scale data mining, embarrassingly parallel problems and very CPU-bound work. The use cases I had in mind were mostly the "use every core I have to do something embarrassingly parallel" kind. The goal was to make these use cases as dead simple as possible. Whatever use cases you have in mind, I'm apparently not familiar with them. If I'm to improve this lib to handle use cases other than the pure throughput-oriented parallelization of embarrassingly parallel tasks that I had in mind, I need to understand use cases from other fields.
It's basically a simplified form of message passing, and useful for general processing in the context of multiple cores. If you wanted to implement, say, Smalltalk or Objective-C, you could use this to allow their calls to proceed in parallel.

My particular interest is based on AI. I'm not talking about neural nets, as those, I think, would incur too much overhead if implemented with even this kind of simplified message passing, but a system where several "dumb" processes are continually running in the background and sending messages whenever they detect something interesting. (So it would also be useful if the tasks could be assigned an adjustable priority.) The kind of system I'm envisioning should be able to adjust to a variable number of processors... even variable while the program is running. (Yeah, that's not what we're talking about here. I'm talking about the higher-level design.)

Erlang can probably do what I want, but it's incredibly slow. In tests I've run, it's come out even slower than Ruby, which is slower than Python. Currently my choice is between D and Java, with a preference for D. Java has the libraries, but D is a better design fit with what I want to do.

(I can't give detailed specifics, because I haven't yet done any programming of that part of the process. But the rough design calls for lots of small modules with a coordinator that manages them. I don't like having the coordinator be so central, but every threading model I've seen has a central controller. I'd really rather do something more like Linux/Unix daemons; see the Pandemonium paper, which may have originally inspired Unix.)
Oct 22 2009