
digitalmars.D - Threads don't scale, processes do

reply Bartosz Milewski <bartosz relisoft.com> writes:
My new blog entry is about the scalability of the thread model for 
concurrency. I discuss some lessons from the Erlang approach. You can 
vote on it on reddit:

http://www.reddit.com/comments/6xt9h/threads_dont_scaleproceses_do/
Aug 24 2008
parent reply Oliver Dathe <o.dathe gmx.de> writes:
 Fibers (as well as Java’s green threads) are not an alternative to
 heavyweight threads. Fibers can’t run in parallel, so they have no
 way to use multiple processors. They are more of a flow-of-control
 construct, often used interchangeably with coroutines.
Fibers can be passed between Threads as long as they are not currently 
executing. This may be implementation-dependent but generally does not 
seem to be a big issue. See tango's Fibers.

I have frequently thought about a model in which an application-dependent 
Scheduler/Proxy/Supervisor S interchanges a set of Fibers between a set 
of Threads - controlled on each yield(). It is based on the following 
scenario:

- Create Fibers for all kinds of independent tasks (game scenario: 
  graphics, sound, logic, network, peripherals)
- These Fibers don't call() each other but frequently yield()
- These Fibers only get call()ed by S, and so only yield() to S
- N Fibers run on M Threads that are known to S. Let's assume N > M.
- S may interchange Fibers between the Threads on each yield() 
  according to a policy, and may suspend Threads if there is too little 
  work to be done.

Some extended demands which could be well accomplished by this pattern:

1.) Let S try to keep up to C = (number of CPUs) Threads busy, such 
that they do not reach a blocked state from the OS's point of view (see 
3 and 4). Fibers can get switched between Threads at a higher frequency 
than the OS scheduler's; that frequency is determined by how often 
yield() is used. The OS sees nothing of that internal switching.

2.) Tie priorities and timing constraints to Fibers. When distributed 
Fibers yield() at an appropriate frequency, S may handle the 
constraints according to the given policy. This may also incorporate 
Thread affinity to reduce the impact of Thread/CPU switching.

3.) Fibers could announce blocking calls to S, so it may transfer them 
to a specific set of Threads which are created only for the purpose of 
getting blocked. This is to avoid impact on the fibers in 1.) and 2.)

  yieldAndAnnounceBlockingCall();
  // S transfers the Fiber to some thread dedicated only to blocking,
  // such that 1.) and 2.) can still be accomplished.
  blocking_call();   // OS suspends execution
  yieldAndAnnounceBlockingFinished();
  // S may transfer the Fiber back to the set of Threads for 1.),
  // or may add the current thread to the set of Threads for 1.)

4.) Synchronization of internal resources by S

  yieldAndLock( someSharedResource );
  // the Scheduler handles this, possibly without OS support
  use( someSharedResource );
  yieldAndUnlock( someSharedResource );
  // such that another request can get served

The latter could make locking a significantly less expensive operation, 
since it circumvents the need to synchronize through the OS. If the 
Fibers respect the protocol, then S knows exactly when a resource R is 
(un)used and when there are waiters, and it does not have to fear 
asynchronous concurrency (yes, S needs to be synchronized itself). 
Blocked waiting for R does not become an OS-supported blocking call; it 
merely manifests in the fact that S does not transfer the requesting 
Fiber to a Thread as long as it is in an internal blocking state (the 
OS sees nothing of it).

I created this model while thinking about a flexible architecture for 
games, where very different kinds of competing constraints/QoS have to 
be satisfied, like network and peripheral IO (delay constraints) and 
graphics/physics (overall-amount-of-time constraints). The traditional 
way (making as much work as possible available to the OS's scheduler, 
using blocking calls, synchronizing through locks, and hoping the OS 
will do appropriate scheduling) did not seem satisfying to me.
Letting an internal Scheduler/Proxy/Supervisor S interchange a set of Fibers between a set of Threads according to an application-dependent policy, while also making the Fibers announce blocking calls to S and synchronize explicitly through S, indeed seemed very promising to me (a rough sketch of such a worker loop follows below). If someone knows of similar attempts, please let me know.
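For illustration, here is a minimal sketch of such a scheduler S, 
written against the Fiber API in tango.core.Thread. The class Scheduler 
and its methods post()/makeReady()/run() are invented names, and the 
idle spin stands in for a real wait; this only shows the shape of the 
worker loop, not a finished implementation:

  import tango.core.Thread;

  class Scheduler
  {
      private Fiber[] ready;   // fibers waiting for a worker thread
      Object gate;             // S itself must be synchronized

      this() { gate = new Object; }

      // announce a new independent task to S
      void post(void delegate() task)
      {
          synchronized (gate) ready ~= new Fiber(task);
      }

      // hand a parked fiber back to the ready queue (used for 4.)
      void makeReady(Fiber f)
      {
          synchronized (gate) ready ~= f;
      }

      // body of each of the M worker threads
      private void worker()
      {
          while (true)
          {
              Fiber f;
              synchronized (gate)
              {
                  if (ready.length == 0) continue; // real code would wait
                  f = ready[0];
                  ready = ready[1 .. $];
              }
              f.call();   // runs the fiber until its next yield()
              // NOT requeueing a fiber here is exactly how S "blocks" it
              synchronized (gate)
              {
                  if (f.state != Fiber.State.TERM)
                      ready ~= f;   // any worker may pick it up next time
              }
          }
      }

      // start M workers; fibers then migrate freely between them
      void run(int m)
      {
          for (int i = 0; i < m; ++i)
              (new Thread(&worker)).start();
      }
  }

A policy as in 2.) would replace the plain FIFO ready queue with one 
ordered by priority or deadline.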
Aug 27 2008
next sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Oliver Dathe wrote:
  > Fibers (as well as Java’s green threads) are not an alternative to
  > heavyweight threads. Fibers can’t run in parallel, so they have no
  > way to use multiple processors. They are more of a flow-of-control
  > construct, often used interchangeably with coroutines.
 
 Fibers can be passed between Threads as long as they are not currently 
 executing. This may be implementation-dependent but generally does not 
 seem to be a big issue. See tango's Fibers.
 
 I have frequently thought about a model in which an application-dependent 
 Scheduler/Proxy/Supervisor S interchanges a set of Fibers between 
 a set of Threads - controlled on each yield(). It is based on the 
 following scenario:
I find the concept of userland scheduling and green threads interesting and would work on this with you if you wish. Interpreted languages such as Erlang can pause their green threads / green processes easily; not so with natively compiled code. This may be one of the largest issues with scheduling native fibers. You might be able to get around it with enough hacks, though some would have to involve your OS's kernel, I have no doubt.
Aug 27 2008
parent reply Oliver Dathe <o.dathe gmx.de> writes:
 I find the concept of userland scheduling and green threads interesting 
 and would work on this with you if you wish.
I'd really like to follow up on this, in collaboration even better! Unfortunately I'm very busy until some time in October. However, feel free to do what you like and I'd enjoy discussing it. Have you had any similar plans before?
 Interpreted languages such as Erlang can pause their green threads / 
 green processes easily; not so with natively compiled code. This may be 
 one of the largest issues with scheduling native fibers. You might be 
 able to get around it with enough hacks, though some would have to 
 involve your OS's kernel, I have no doubt.
Yes, preemptive green threading would fit into this even better! But it's no prerequisite, so I chose the cooperative way with Fibers for this. I'm not sure whether the preemptive way could be accomplished safely with the help of signal handling.

Since the proposal is for Fibers, the responsibility to yield lies with the application developer. You're aiming for more genericity or even language support, right? A compiler could also place the yields based on heuristics, or even guided by profiler runs and a chosen scheduling policy. But yes, preemption would be cooler. If blocking functions were marked as such, then a compiler could also place the appropriate yields together with the announcement that I proposed for them.

What makes me a bit suspicious about the model is that it tries to manage resources that are already managed for it. Review point 1, where it should try to keep up to C = (number of CPUs) Threads in a schedulable state from the OS's point of view. That's just a stupid heuristic. So I see two main operation areas for this concept: greedy applications like games, and applications with weak realtime demands. However, it does not impose any restrictions on other types of applications, and indeed they could profit from interactivity, reduced OS responsibility/activity/overhead, and the multicore story. Some work that was previously only manageable by the OS could be offloaded into the application. E.g. fiber-specific semaphores can be implemented with a Queue!(Fiber) for the waiters plus a counter, in collaboration with the userspace scheduler (sketched below). That is what I was aiming for in point 4.
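A minimal sketch of such a fiber-level semaphore, written against an 
invented scheduler interface (IScheduler, makeReady() and gate() are 
names made up for this example; Fiber.getThis() and Fiber.yield() are 
the real tango.core.Thread calls). Note the deliberate gap: real code 
must prevent a notify() from racing the yield() of the fiber it wakes.

  import tango.core.Thread;

  // the minimal surface of the scheduler S that the semaphore needs
  interface IScheduler
  {
      void   makeReady(Fiber f); // put a parked fiber back on the ready queue
      Object gate();             // lock protecting S's own state
  }

  class FiberSemaphore
  {
      private int        count;
      private Fiber[]    waiters;   // the Queue!(Fiber) from the post
      private IScheduler sched;

      this(int initial, IScheduler s) { count = initial; sched = s; }

      void wait()
      {
          bool mustPark = false;
          synchronized (sched.gate)
          {
              if (count > 0)
                  --count;
              else
              {
                  waiters ~= Fiber.getThis();
                  mustPark = true;
              }
          }
          if (mustPark)
              Fiber.yield();   // S simply stops running this fiber;
                               // the OS never sees a blocked thread
      }

      void notify()
      {
          synchronized (sched.gate)
          {
              if (waiters.length > 0)
              {
                  auto f = waiters[0];
                  waiters = waiters[1 .. $];
                  sched.makeReady(f);   // unblock without any OS call
              }
              else
                  ++count;
          }
      }
  }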
Aug 30 2008
parent Sean Kelly <sean invisibleduck.org> writes:
Oliver Dathe wrote:
 I find the concept of userland scheduling and green threads 
 interesting and would work on this with you if you wish.
I'd really like to follow up on this, in collaboration even better! Unfortunately I'm very busy until some time in October. However, feel free to do what you like and I'd enjoy discussing it. Have you had any similar plans before?
 Interpreted languages such as Erlang can pause their green threads / 
 green processes easily; not so with natively compiled code. This may 
 be one of the largest issues with scheduling native fibers. You might 
 be able to get around it with enough hacks, though some would have to 
 involve your OS's kernel, I have no doubt.
Yes, preemptive green threading would fit into this even better! But it's no prerequisite, so I chose the cooperative way with Fibers for this. I'm not sure whether the preemptive way could be accomplished safely with the help of signal handling.
Solaris does preemptive green threading automatically, so you get it for free if you use pthreads on that system. It's a pretty cool setup. Sean
Aug 30 2008
prev sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Wed, 27 Aug 2008 12:49:50 -0400, Oliver Dathe <o.dathe gmx.de> wrote:

 3.) Fibers could announce blocking calls to S, so it may transfer them  
 to a specific set of Threads which are created only for the purpose of  
 getting blocked. This is to avoid impact on the fibers in 1.) and 2.)
This sounds synergistic with thin-locks, although the locking code would have to be moved into the thread class to allow polymorphism on how the lock is acquired, i.e. Thread.getThis.lock(heavy_lock).
 If someone knows of similar attempts please let me know.
occam / occam-pi (see: http://www.cs.kent.ac.uk/projects/ofa/kroc/ ).
Aug 27 2008
parent reply Oliver Dathe <o.dathe gmx.de> writes:
 This sounds synergistic with thin-locks, although the locking code would 
 have to be moved into the thread class to allow polymorphism on how the 
 lock is acquired, i.e. Thread.getThis.lock(heavy_lock).
Polymorphism is already used for that. You could modify __monitor (i.e. (cast(void**)this)[1]) in an object to point to a struct holding an interface reference with lock() and unlock(). You can then synchronized(...) on that object, so I don't think you have to touch Thread for it. That is what I've seen in tango.core.sync.Monitor.
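As a compressed illustration of that trick (the exact layout lives in 
the runtime; Monitor, ILock, setMonitor and FiberLock below are 
simplified, hypothetical names, not the real tango.core.sync.Monitor 
API):

  // The word at (cast(void**)obj)[1] is the object's hidden __monitor
  // slot. If it points to a record whose first field is a non-null
  // lock/unlock interface, the runtime dispatches synchronized(obj)
  // through it instead of through an OS mutex.
  interface ILock
  {
      void lock();
      void unlock();
  }

  struct Monitor
  {
      ILock impl;   // runtime-internal fields follow in the real layout
  }

  void setMonitor(Object obj, ILock l)
  {
      // real code must cope with an already-installed monitor and
      // keep m visible to the GC; this only shows the shape of it
      Monitor* m = new Monitor;
      m.impl = l;
      (cast(void**)cast(void*)obj)[1] = cast(void*)m;
  }

  // usage: install a fiber-aware lock, then use plain synchronized
  //   setMonitor(sharedThing, new FiberLock(scheduler));
  //   synchronized (sharedThing) { ... }   // goes through FiberLock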

 occam / occam-pi (see: http://www.cs.kent.ac.uk/projects/ofa/kroc/ ).
Thanks, but CSP seems to have very little connection to this.
Aug 30 2008
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Oliver Dathe wrote:
 This sounds synergistic with thin-locks, although the locking code would 
 have to be moved into the thread class to allow polymorphism on how 
 the lock is acquired, i.e. Thread.getThis.lock(heavy_lock).
Polymorphism is already used for that. You could modify __monitor (i.e. (cast(void**)this)[1]) in an object to point to a struct holding an interface reference with lock() and unlock(). You can then synchronized(...) on that object, so I don't think you have to touch Thread for it. That is what I've seen in tango.core.sync.Monitor.
Yup. I'd even be happy to add a setMonitor routine somewhere, but as it's one line of code anyway I didn't see the need. Sean
Aug 30 2008
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 30 Aug 2008 10:35:10 -0400, Oliver Dathe <o.dathe gmx.de> wrote:

 This sounds synergistic with thin-locks, although the locking code would 
 have to be moved into the thread class to allow polymorphism on how the 
 lock is acquired, i.e. Thread.getThis.lock(heavy_lock).
Polymorphism is already used for that. You could modify __monitor (i.e. (cast(void**)this)[1]) in an object to point to a struct holding an interface reference with lock() and unlock(). You can then synchronized(...) on that object, so I don't think you have to touch Thread for it. That is what I've seen in tango.core.sync.Monitor.
I think we're misunderstanding each other. Your post suggests to me that before using any shared object in a fiber, you first convert the monitor to the fiber lock-yield type, and then later convert the object back when you return it to normal code. Polymorphism of the object should not be used, since you don't know a priori whether an object is executing on a thread, on a fiber, or on both; hence my suggestion of using thread polymorphism and moving the thick-lock code there. The synergy with thin-locks is that both threads and fibers should be able to take a thin-lock using the same code (see the sketch below). Then again, since the thin-lock looks up the thread id anyway, maybe it's a moot point.
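To make the thin-lock idea concrete, here is a rough sketch (ThinLock 
and its methods are invented names; the cas() stand-in uses a global 
lock purely so the example is self-contained, where a real thin lock 
would use a hardware compare-and-swap, e.g. via tango.core.Atomic). The 
point is that the fast path only needs some unique owner id, which 
could name a fiber just as well as a thread:

  private Object casGate;
  static this() { casGate = new Object; }

  // stand-in for an atomic compare-and-swap primitive
  bool cas(size_t* slot, size_t expect, size_t update)
  {
      synchronized (casGate)
      {
          if (*slot != expect) return false;
          *slot = update;
          return true;
      }
  }

  struct ThinLock
  {
      size_t word;   // 0 = free, otherwise the owner's id

      // 'me' may identify a thread or a fiber; the fast path is the same
      void lock(size_t me, void delegate() inflate)
      {
          if (cas(&word, 0, me))
              return;     // uncontended: one CAS, no OS involvement
          inflate();      // contended: fall back to a heavyweight lock
      }

      void unlock(size_t me)
      {
          cas(&word, me, 0);  // real code must also wake any waiters
      }
  }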

 occam / occam-pi (see: http://www.cs.kent.ac.uk/projects/ofa/kroc/ ).
Thanks, but CSP seems to have very little connection to this.
I re-read your post and it looks like you're really interested in the scheduling rather than the threading. I'd still recommend CSP, simply for its implementations of user-land thread scheduling (besides, alternatives/selects are cool). I'll also mention stackthreads, if you're looking for just a pure fiber library. And nanothreads got a 40% performance increase over Windows fibers, if I remember correctly, so it might be worth taking a look at. In D, there's also DCSP.

Also, futures are very efficient implementation-wise, and would be easy to extend to take scheduling parameters like a due date and/or profiling data (e.g. generated from play-testing data, or on the fly, to determine which routines are typically slow and which are fast, and to schedule accordingly); a toy version follows below.
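A toy version of such a future, just to show where the scheduling hint 
would live (Future, deadline and complete() are invented for this 
example; a real scheduler would order its ready queue by the hint):

  import tango.core.Thread;

  class Future(T)
  {
      private T    value;
      private bool done;
      long deadline;   // scheduling hint, e.g. a due tick or frame number

      this(long deadline) { this.deadline = deadline; }

      // wait cooperatively (must run inside a fiber); which fiber
      // runs next is the scheduler's call
      T get()
      {
          while (!done)
              Fiber.yield();
          return value;
      }

      void complete(T v) { value = v; done = true; }
  }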
Aug 30 2008