digitalmars.D - Lock-Free Actor-Based Flow Programming in D2 for GSOC2011?
- eris (10/10) Jul 07 2011 Walter et al...
- David Nadlinger (4/5) Jul 07 2011 Unfortunately, yes, by about three months:
- eris (3/4) Jul 07 2011 Damn their eyes.
- Andrei Alexandrescu (3/8) Jul 07 2011 Or early by nine!
- eris (2/3) Jul 07 2011 Ah, yes. Ahem. That's what I meant. I typo'd it. That should read 'for...
- dsimcha (3/13) Jul 07 2011 Can you please explain what this brings to the table beyond what's curre...
- David Nadlinger (5/18) Jul 07 2011 Yes, it would be great if you could elaborate a bit on the (high-level)
- eris (38/38) Jul 07 2011 Sorry for the hasty overview, I was trying to squeeze my request in unde...
- Piotr Szturmaj (5/7) Jul 09 2011 I'm working on something similar, e.g. event-driven programming using
- nonexistent (2/11) Jul 09 2011 Why not implement cross thread messaging and then use task stealing. The...
- eris (10/10) Jul 11 2011 My library uses a straight-forward reactor approach to handle incoming e...
- Nonexistent (2/15) Jul 11 2011 I will admit that I don't understand lock-free that well, but I have imp...
- Nonexistent (2/15) Jul 11 2011 I will admit that I don't understand lock-free that well, but I have imp...
- Kagamin (2/5) Jul 12 2011 Huh? What does it change? IO is done pretty much the same on all systems...
- Piotr Szturmaj (6/11) Jul 12 2011 You mean synchronous blocking IO. Proactor in Windows means asynchronous...
- Kagamin (2/16) Jul 12 2011 From what I understand, reactor is meant to be synchronous?
- Piotr Szturmaj (14/30) Jul 12 2011 Reactor is event handling pattern, when client specify event handlers
- Kagamin (2/11) Jul 12 2011 epoll seems like windows synchronization api except that in windows one ...
Walter et al... I'm interested in finishing my proof-of-concept multi-core actor/flow-based programming system and I'd like to do it for D2. I've already implemented a .9 version in D1 and would like to leverage the additional features and concurrency system in D2. I'm currently tweaking it for lock-free, multi-core, reactor-based environment. The lock-free intercore message FIFOs are currently running around 50 MT (mega transfers) a second. Is it too late? eris
Jul 07 2011
On 7/7/11 10:50 PM, eris wrote:Is it too late?Unfortunately, yes, by about three months: http://www.google-melange.com/gsoc/events/google/gsoc2011 David
Jul 07 2011
Unfortunately, yes, by about three months:Damn their eyes. I'll have to do it anyway just for the fun of it. :-)
Jul 07 2011
On 7/7/11 1:47 PM, David Nadlinger wrote:On 7/7/11 10:50 PM, eris wrote:Or early by nine! AndreiIs it too late?Unfortunately, yes, by about three months: http://www.google-melange.com/gsoc/events/google/gsoc2011 David
Jul 07 2011
Andrei wrote:Or early by nine!Ah, yes. Ahem. That's what I meant. I typo'd it. That should read 'for GSOC2012'.
Jul 07 2011
== Quote from eris (jvburnes gmail.com)'s articleWalter et al... I'm interested in finishing my proof-of-concept multi-core actor/flow-based programming system and I'd like to do it for D2. I've already implemented a .9 version in D1 and would like to leverage the additional features and concurrency system in D2. I'm currently tweaking it for lock-free, multi-core, reactor-based environment. The lock-free intercore message FIFOs are currently running around 50 MT (mega transfers) a second. Is it too late? erisCan you please explain what this brings to the table beyond what's currently available in std.concurrency and std.parallelism? I'm very curious.
Jul 07 2011
On 7/7/11 11:47 PM, dsimcha wrote:== Quote from eris (jvburnes gmail.com)'s articleYes, it would be great if you could elaborate a bit on the (high-level) design of your library – actor-based programing has become one my top interests lately. DavidWalter et al... I'm interested in finishing my proof-of-concept multi-core actor/flow-based programming system and I'd like to do it for D2. I've already implemented a .9 version in D1 and would like to leverage the additional features and concurrency system in D2. I'm currently tweaking it for lock-free, multi-core, reactor-based environment. The lock-free intercore message FIFOs are currently running around 50 MT (mega transfers) a second. Is it too late? erisCan you please explain what this brings to the table beyond what's currently available in std.concurrency and std.parallelism? I'm very curious.
Jul 07 2011
Sorry for the hasty overview, I was trying to squeeze my request in under the wire (and failed). My previous system (Dendrite) was a D1/Tango implementation of flow-based programming environment similar to the BBC Kamaelia/Axon system implemented in Python. It was moderately challenging and fun, but really hard to avoid the Turing Tarpit of a universal programming system. I used the Tango Fibers implementation (thanks Sean Kelly I believe) and various reactor libraries to implement the actor engine. After I finished 0.9, one of the chat members (mwarning maybe?) asked me if it was multicore. (It wasn't). This seemed like such an obvious and straight forward addition that (like a fool) I set about implementing it. (Tarpit 2) Last time I finished something on that branch it was a high-speed lock-free message FIFO so the Fiber Cores could communicate easily. I achieved about 50-60 MT/s and figured I needn't go any further on that. I also integrated the portable hwloc library so I could introspect the runtime multicore topology and set the Thread affinity. In any case, the system basically allows you to define pipe-like (and simple mailbox msg transfer) systems that process data by having concurrent actors reading from their inputs, processing them and writing to their outputs. Fun for the whole family. So this is nothing tremendously new, flow-based actor systems are implemented in a number of languages and libraries. Higher level concurrency systems like LINDA can be built from actor systems etc. After I came up for air after implementing the multicore flow-based components D2 had really come into it's own. I read TDPL (yea!) with special attention to the concurrency system and I'm really excited to see how I can improve on my Dendrite system using D2-supported constructs (I know Fiber is in Phobos2 so that's good). Elegant mailbox support and lock-free patterns are built right into the concurrency system so that makes my life a lot easier if I can use them. (Thanks Andrei et al) Right now I'm trying to port to ldc2/phobos2. As soon as I get a working build things will go a lot better. That's basically the idea. There's a lot more that I haven't mentioned, like the fact that the reactor, scheduler and other engines are starting to take on aspects of an operating system or virtual machine. Am I re-inventing Mach? Poorly? That's the "overview". eris So, I started working back up my list of outstanding multicore patches
Jul 07 2011
eris wrote:I used the Tango Fibers implementation (thanks Sean Kelly I believe) and various reactor libraries to implement the actor engine.I'm working on something similar, e.g. event-driven programming using fibers. I need it for my upcoming network library. But I don't use queue for each fiber, but instead each fiber "poll" for events so there's no possibility of queue overflow.
Jul 09 2011
Piotr Szturmaj Wrote:eris wrote:Why not implement cross thread messaging and then use task stealing. Then you could set up a couple of default threads and add maybe up to 10 more on top as decidedly needed and each thread accesses a list (it's own list) of fibers to be executed. When one thread runs out of fibers to execute, instead of sitting and waiting for something else to get mapped to it, it just goes ahead and steals a fiber from the thread in front of it (which if in turn is empty, steals 2 from the one in front so it can have one and the other as well). You could of course change the granularity of steals so as to keep each thread with enough tasks. The only thing is that each of these threads would be a "StealThread" but something would have to keep a list of them and map events dynamically to each "StealThread" in the list. Instead of you being able to map to each thread individually, the list holder would do that for you in a way that tries to keep them full. Now, since each thing in the program could actually just be a fiber to be ran by the "StealThread", you could just map one function to a "StealThread" that would recursively map itself and other functions back into the Threads so as to continue execution of the program, you'd have a program that is multithreaded and pretty much runs on it's own without making actuall calls besides map this or that to a thread. This could turn out to be very flexible (especially if you could find a way to map as many "StealThread" functions into fibers to be passed to a "StealThread"). This is a very good method for servers because you don't want any thread to be still just because no tasks are mapped to it. The effeciency comes from stealing tasks, keeping all the little threads going.I used the Tango Fibers implementation (thanks Sean Kelly I believe) and various reactor libraries to implement the actor engine.I'm working on something similar, e.g. event-driven programming using fibers. I need it for my upcoming network library. But I don't use queue for each fiber, but instead each fiber "poll" for events so there's no possibility of queue overflow.
Jul 09 2011
My library uses a straight-forward reactor approach to handle incoming events (IO, timer etc). The library is structured as one-thread-per-core and as many co-routines (fibers) per thread as memory will allow. The threads communicate with each other via lock-free single-writer, single-reader FIFOs. Each threadCore has it's own reactor with the only limitation that only the threadCore0 reactor (the default thread) can wait for signals / child processes. Windows uses a "proactor" model instead of reactor, so it schedules I/O first and then waits for an IO completion flag. I've modified my reactor so that it presents a reactor facade even on Windows systems. Performance suffers a bit, but you pick your platform and you takes your chances.
Jul 11 2011
eris Wrote:My library uses a straight-forward reactor approach to handle incoming events (IO, timer etc). The library is structured as one-thread-per-core and as many co-routines (fibers) per thread as memory will allow. The threads communicate with each other via lock-free single-writer, single-reader FIFOs. Each threadCore has it's own reactor with the only limitation that only the threadCore0 reactor (the default thread) can wait for signals / child processes. Windows uses a "proactor" model instead of reactor, so it schedules I/O first and then waits for an IO completion flag. I've modified my reactor so that it presents a reactor facade even on Windows systems. Performance suffers a bit, but you pick your platform and you takes your chances.I will admit that I don't understand lock-free that well, but I have implemented thread safe lists before. In my case, it allowed multiple readers (they just increased one locked variable that held the count of how many readers, and decreased on finish reading). This allowed multiple readers to read from it. Whenever a writer decided it needed access, it would hit write, and wait for all the current readers to free themselves from the list. Then it would have exclusive access and write. The reason I mention this is the multiple readers. Allowing multiple reads would increase performance imo. Why is it that you don't do this? The other thing is, what if one core becomes completely free, are you waiting for an event to be mapped to a core? why not have each core steal events as needed. This would allow that no core becomes overloaded with long tasks and that all cores get to work even when new events aren't being mapped but long running events are currently mapped.
Jul 11 2011
eris Wrote:My library uses a straight-forward reactor approach to handle incoming events (IO, timer etc). The library is structured as one-thread-per-core and as many co-routines (fibers) per thread as memory will allow. The threads communicate with each other via lock-free single-writer, single-reader FIFOs. Each threadCore has it's own reactor with the only limitation that only the threadCore0 reactor (the default thread) can wait for signals / child processes. Windows uses a "proactor" model instead of reactor, so it schedules I/O first and then waits for an IO completion flag. I've modified my reactor so that it presents a reactor facade even on Windows systems. Performance suffers a bit, but you pick your platform and you takes your chances.I will admit that I don't understand lock-free that well, but I have implemented thread safe lists before. In my case, it allowed multiple readers (they just increased one locked variable that held the count of how many readers, and decreased on finish reading). This allowed multiple readers to read from it. Whenever a writer decided it needed access, it would hit write, and wait for all the current readers to free themselves from the list. Then it would have exclusive access and write. The reason I mention this is the multiple readers. Allowing multiple reads would increase performance imo. Why is it that you don't do this? The other thing is, what if one core becomes completely free, are you waiting for an event to be mapped to a core? why not have each core steal events as needed. This would allow that no core becomes overloaded with long tasks and that all cores get to work even when new events aren't being mapped but long running events are currently mapped.
Jul 11 2011
eris Wrote:Windows uses a "proactor" model instead of reactor, so it schedules I/O first and then waits for an IO completion flag. I've modified my reactor so that it presents a reactor facade even on Windows systems.Huh? What does it change? IO is done pretty much the same on all systems: client requests an io operation, OS send the thread to sleep until the operation is complete.
Jul 12 2011
Kagamin wrote:eris Wrote:You mean synchronous blocking IO. Proactor in Windows means asynchronous non-blocking IO (overlapped IO) and completion ports. Client may request multiple IO operations and its thread is not put to sleep. Instead, client receives all completed operations using GetQueuedCompletionStatus() or using callback function.Windows uses a "proactor" model instead of reactor, so it schedules I/O first and then waits for an IO completion flag. I've modified my reactor so that it presents a reactor facade even on Windows systems.Huh? What does it change? IO is done pretty much the same on all systems: client requests an io operation, OS send the thread to sleep until the operation is complete.
Jul 12 2011
Piotr Szturmaj Wrote:Kagamin wrote:From what I understand, reactor is meant to be synchronous?eris Wrote:You mean synchronous blocking IO. Proactor in Windows means asynchronous non-blocking IO (overlapped IO) and completion ports. Client may request multiple IO operations and its thread is not put to sleep. Instead, client receives all completed operations using GetQueuedCompletionStatus() or using callback function.Windows uses a "proactor" model instead of reactor, so it schedules I/O first and then waits for an IO completion flag. I've modified my reactor so that it presents a reactor facade even on Windows systems.Huh? What does it change? IO is done pretty much the same on all systems: client requests an io operation, OS send the thread to sleep until the operation is complete.
Jul 12 2011
Kagamin wrote:Piotr Szturmaj Wrote:Reactor is event handling pattern, when client specify event handlers like onRecv() and just wait for the events - these are delivered as soon as they arrive. Proactor on the other side requires that client issue each asynchronous operation manually - there will be no events delivered until client requests them. For example in linux's epoll, events are delivered as soon as there is available data in the buffer (in level triggered mode). In Windows NT family recv event is delivered only after successfull call to WSARecv(). Proactor model has (a debatable) advantage of specifying data buffer before issuing async IO. This could avoid data copying in some circumstances, because IO manager can read data directly into that buffer. In reactor models (epoll) buffer is provided after IO completes. This involves copying of data from internal buffer to user's buffer.Kagamin wrote:From what I understand, reactor is meant to be synchronous?eris Wrote:You mean synchronous blocking IO. Proactor in Windows means asynchronous non-blocking IO (overlapped IO) and completion ports. Client may request multiple IO operations and its thread is not put to sleep. Instead, client receives all completed operations using GetQueuedCompletionStatus() or using callback function.Windows uses a "proactor" model instead of reactor, so it schedules I/O first and then waits for an IO completion flag. I've modified my reactor so that it presents a reactor facade even on Windows systems.Huh? What does it change? IO is done pretty much the same on all systems: client requests an io operation, OS send the thread to sleep until the operation is complete.
Jul 12 2011
Piotr Szturmaj Wrote:Reactor is event handling pattern, when client specify event handlers like onRecv() and just wait for the events - these are delivered as soon as they arrive. Proactor on the other side requires that client issue each asynchronous operation manually - there will be no events delivered until client requests them. For example in linux's epoll, events are delivered as soon as there is available data in the buffer (in level triggered mode). In Windows NT family recv event is delivered only after successfull call to WSARecv().epoll seems like windows synchronization api except that in windows one shouldn't wait on file handles directly.
Jul 12 2011