
digitalmars.D - Lock-Free Actor-Based Flow Programming in D2 for GSOC2011?

reply eris <jvburnes gmail.com> writes:
Walter et al...

I'm interested in finishing my proof-of-concept multi-core actor/flow-based
programming system and I'd like to do it for D2.  I've already implemented a
0.9 version in D1 and would like to leverage the additional features and
concurrency system in D2.

I'm currently tweaking it for a lock-free, multi-core, reactor-based
environment. The lock-free inter-core message FIFOs are currently running at
around 50 MT (mega-transfers) per second.

Is it too late?

eris
Jul 07 2011
next sibling parent reply David Nadlinger <see klickverbot.at> writes:
On 7/7/11 10:50 PM, eris wrote:
 Is it too late?
Unfortunately, yes, by about three months:
http://www.google-melange.com/gsoc/events/google/gsoc2011

David
Jul 07 2011
next sibling parent eris <jvburnes gmail.com> writes:
 Unfortunately, yes, by about three months:
Damn their eyes. I'll have to do it anyway just for the fun of it. :-)
Jul 07 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/7/11 1:47 PM, David Nadlinger wrote:
 On 7/7/11 10:50 PM, eris wrote:
 Is it too late?
Unfortunately, yes, by about three months:
http://www.google-melange.com/gsoc/events/google/gsoc2011

David
Or early by nine!

Andrei
Jul 07 2011
parent eris <jvburnes gmail.com> writes:
Andrei wrote:

 Or early by nine!
Ah, yes. Ahem. That's what I meant. I typo'd it. That should read 'for GSOC2012'.
Jul 07 2011
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from eris (jvburnes gmail.com)'s article
 Walter et al...
 I'm interested in finishing my proof-of-concept multi-core actor/flow-based
 programming system and I'd like to do it for D2.  I've already implemented a
 .9 version in D1 and would like to leverage the additional features and
 concurrency system in D2.
 I'm currently tweaking it for lock-free, multi-core, reactor-based
 environment. The lock-free intercore message FIFOs are currently running
 around 50 MT (mega transfers) a second.
 Is it too late?
 eris
Can you please explain what this brings to the table beyond what's currently available in std.concurrency and std.parallelism? I'm very curious.
Jul 07 2011
parent reply David Nadlinger <see klickverbot.at> writes:
On 7/7/11 11:47 PM, dsimcha wrote:
 == Quote from eris (jvburnes gmail.com)'s article
 Walter et al...
 I'm interested in finishing my proof-of-concept multi-core actor/flow-based
 programming system and I'd like to do it for D2.  I've already implemented a
 .9 version in D1 and would like to leverage the additional features and
 concurrency system in D2.
 I'm currently tweaking it for lock-free, multi-core, reactor-based
 environment. The lock-free intercore message FIFOs are currently running
 around 50 MT (mega transfers) a second.
 Is it too late?
 eris
Can you please explain what this brings to the table beyond what's currently available in std.concurrency and std.parallelism? I'm very curious.
Yes, it would be great if you could elaborate a bit on the (high-level) design of your library – actor-based programming has become one of my top interests lately.

David
Jul 07 2011
parent reply eris <jvburnes gmail.com> writes:
Sorry for the hasty overview, I was trying to squeeze my request in under the
wire (and failed).

My previous system (Dendrite) was a D1/Tango implementation of a flow-based
programming environment similar to the BBC Kamaelia/Axon system implemented in
Python.  It was moderately challenging and fun, but it was really hard to avoid
the Turing Tarpit of a universal programming system.

I used the Tango Fibers implementation (thanks Sean Kelly I believe) and various
reactor libraries to implement the actor engine.

After I finished 0.9, one of the chat members (mwarning maybe?) asked me if it
was multicore. (It wasn't.)  This seemed like such an obvious and
straightforward addition that (like a fool) I set about implementing it.
(Tarpit 2)

The last thing I finished on that branch was a high-speed lock-free message
FIFO so the fiber cores could communicate easily.  I achieved about 50-60 MT/s
and figured I needn't go any further on that. I also integrated the portable
hwloc library so I could introspect the runtime multicore topology and set the
thread affinity.

In any case, the system basically allows you to define pipe-like systems (with
simple mailbox message transfer) that process data by having concurrent actors
read from their inputs, process the data and write to their outputs.  Fun for
the whole family.
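
As a rough illustration (not Dendrite's actual API), a tiny pipeline like that
can be sketched with D2's std.concurrency mailboxes, each stage running as its
own actor:

import std.concurrency;
import std.stdio : writeln;

// Transform actor: reads ints from its mailbox, doubles them, forwards them.
void transformStage(Tid sink, int count)
{
    foreach (i; 0 .. count)
        sink.send(receiveOnly!int() * 2);
}

// Sink actor: prints whatever arrives on its mailbox.
void sinkStage(int count)
{
    foreach (i; 0 .. count)
        writeln("got ", receiveOnly!int());
}

void main()
{
    enum n = 5;
    auto sink  = spawn(&sinkStage, n);
    auto xform = spawn(&transformStage, sink, n);
    foreach (i; 0 .. n)      // the "source" stage: feed the pipeline
        xform.send(i);
}

A real flow-based system would presumably wire such stages together from a
component graph rather than hard-coding them in main().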

So this is nothing tremendously new; flow-based actor systems have been
implemented in a number of languages and libraries, and higher-level
concurrency systems like LINDA can be built from actor systems, etc.

By the time I came up for air after implementing the multicore flow-based
components, D2 had really come into its own.  I read TDPL (yea!) with special
attention to the concurrency system, and I'm really excited to see how I can
improve on my Dendrite system using D2-supported constructs (I know Fiber is in
Phobos2, so that's good).  Elegant mailbox support and lock-free patterns are
built right into the concurrency system, so that makes my life a lot easier if
I can use them.  (Thanks Andrei et al.)

Right now I'm trying to port to ldc2/phobos2.  As soon as I get a working build
things will go a lot better.

That's basically the idea.  There's a lot more that I haven't mentioned, like
the fact that the reactor, scheduler and other engines are starting to take on
aspects of an operating system or virtual machine.  Am I re-inventing Mach?
Poorly?

That's the "overview".

eris




So, I started working back up my list of outstanding multicore patches.
Jul 07 2011
parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
eris wrote:
 I used the Tango Fibers implementation (thanks Sean Kelly I believe) and
 various reactor libraries to implement the actor engine.
I'm working on something similar, i.e. event-driven programming using fibers. I need it for my upcoming network library. But I don't use a queue for each fiber; instead, each fiber "polls" for events, so there's no possibility of queue overflow.
Jul 09 2011
parent reply nonexistent <non non.com> writes:
Piotr Szturmaj Wrote:

 eris wrote:
 I used the Tango Fibers implementation (thanks Sean Kelly I believe) and
 various reactor libraries to implement the actor engine.
I'm working on something similar, e.g. event-driven programming using fibers. I need it for my upcoming network library. But I don't use queue for each fiber, but instead each fiber "poll" for events so there's no possibility of queue overflow.
Why not implement cross-thread messaging and then use task stealing? You could set up a couple of default threads, add maybe up to 10 more on top as needed, and have each thread work through a list (its own list) of fibers to be executed. When one thread runs out of fibers to execute, instead of sitting and waiting for something else to get mapped to it, it just goes ahead and steals a fiber from the thread in front of it (which, if it is in turn empty, steals 2 from the one in front of it so it can keep one and pass the other along). You could of course change the granularity of steals so as to keep each thread with enough tasks.

The only thing is that each of these threads would be a "StealThread", and something would have to keep a list of them and map events dynamically to each "StealThread" in the list. Instead of you mapping to each thread individually, the list holder would do that for you in a way that tries to keep them full. Since each thing in the program could actually just be a fiber to be run by a "StealThread", you could map one function to a "StealThread" that recursively maps itself and other functions back into the threads so as to continue execution of the program; you'd have a program that is multithreaded and pretty much runs on its own, without making actual calls beyond "map this or that to a thread". This could turn out to be very flexible (especially if you could find a way to map as many "StealThread" functions as possible into fibers to be passed to a "StealThread").

This is a very good method for servers, because you don't want any thread to sit idle just because no tasks are mapped to it. The efficiency comes from stealing tasks and keeping all the little threads going.
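
A rough sketch of that idea (hypothetical names, and a mutex-guarded deque
rather than anything cleverer):

import core.sync.mutex : Mutex;
import core.thread : Fiber, Thread;
import std.container.dlist : DList;

// Each StealThread drains its own deque of fibers and, when it runs dry,
// steals work from its neighbour.
class StealThread : Thread
{
    private DList!Fiber queue;   // this thread's own work
    private Mutex queueLock;     // guards the deque
    StealThread neighbour;       // thread to steal from when idle
    bool running = true;         // naive stop flag, fine for a sketch

    this() { queueLock = new Mutex; super(&run); }

    void push(Fiber f) { synchronized (queueLock) queue.insertBack(f); }

    private Fiber pop()
    {
        synchronized (queueLock)
        {
            if (queue.empty) return null;
            auto f = queue.front;
            queue.removeFront();
            return f;
        }
    }

    private void run()
    {
        while (running)
        {
            auto f = pop();
            if (f is null && neighbour !is null)
                f = neighbour.pop();          // steal from the next thread
            if (f is null) { Thread.yield(); continue; }
            f.call();                         // resume the fiber
            if (f.state != Fiber.State.TERM)
                push(f);                      // not finished: requeue it
        }
    }
}

(Migrating fibers between threads has caveats in practice; this is only meant
to show the stealing loop.)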
Jul 09 2011
parent reply eris <jvburnes gmail.com> writes:
My library uses a straightforward reactor approach to handle incoming events
(IO, timer etc.).  The library is structured as one thread per core and as
many co-routines (fibers) per thread as memory will allow.  The threads
communicate with each other via lock-free single-writer, single-reader FIFOs.

Each threadCore has its own reactor, with the only limitation that only the
threadCore0 reactor (the default thread) can wait for signals / child
processes.

Windows uses a "proactor" model instead of a reactor, so it schedules I/O
first and then waits for an I/O completion flag. I've modified my reactor so
that it presents a reactor facade even on Windows systems.  Performance
suffers a bit, but you pick your platform and you takes your chances.
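
For the curious, a single-writer/single-reader ring FIFO along those lines
might look roughly like this (a simplified sketch, not the actual Dendrite
code):

import core.atomic : atomicLoad, atomicStore, MemoryOrder;

// Simplified single-producer/single-consumer ring buffer.
// `tail` is written only by the producer and `head` only by the consumer,
// so no compare-and-swap is needed, just acquire/release ordering.
struct SpscFifo(T, size_t capacity = 1024)
{
    static assert((capacity & (capacity - 1)) == 0, "capacity must be a power of two");

    private T[capacity] buf;
    private shared size_t head;   // next slot to read
    private shared size_t tail;   // next slot to write

    // Producer side: returns false when the queue is full.
    bool push(T item)
    {
        immutable t = atomicLoad!(MemoryOrder.raw)(tail);
        immutable h = atomicLoad!(MemoryOrder.acq)(head);
        if (t - h == capacity) return false;        // full
        buf[t & (capacity - 1)] = item;
        atomicStore!(MemoryOrder.rel)(tail, t + 1); // publish the slot
        return true;
    }

    // Consumer side: returns false when the queue is empty.
    bool pop(out T item)
    {
        immutable h = atomicLoad!(MemoryOrder.raw)(head);
        immutable t = atomicLoad!(MemoryOrder.acq)(tail);
        if (t == h) return false;                   // empty
        item = buf[h & (capacity - 1)];
        atomicStore!(MemoryOrder.rel)(head, h + 1);
        return true;
    }
}

In practice the queue would live in __gshared storage (or be allocated once
and handed to both threads), with exactly one thread calling push() and
exactly one calling pop().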
Jul 11 2011
next sibling parent Nonexistent <non non.com> writes:
eris Wrote:

 My library uses a straightforward reactor approach to handle incoming events
 (IO, timer etc.).  The library is structured as one thread per core and as
 many co-routines (fibers) per thread as memory will allow.  The threads
 communicate with each other via lock-free single-writer, single-reader FIFOs.

 Each threadCore has its own reactor, with the only limitation that only the
 threadCore0 reactor (the default thread) can wait for signals / child
 processes.

 Windows uses a "proactor" model instead of a reactor, so it schedules I/O
 first and then waits for an I/O completion flag. I've modified my reactor so
 that it presents a reactor facade even on Windows systems.  Performance
 suffers a bit, but you pick your platform and you takes your chances.

I will admit that I don't understand lock-free that well, but I have implemented thread-safe lists before. In my case, the list allowed multiple readers: each reader just incremented one locked variable holding the count of current readers, and decremented it when it finished reading. Whenever a writer needed access, it would request a write and wait for all the current readers to release the list; then it would have exclusive access and write.

The reason I mention this is the multiple readers. Allowing multiple reads would increase performance, IMO. Why is it that you don't do this?

The other thing is: what if one core becomes completely free? Are you waiting for an event to be mapped to it? Why not have each core steal events as needed? That would ensure that no core becomes overloaded with long tasks and that all cores keep working even when no new events are being mapped but long-running events are already in flight.
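
Roughly, the scheme I mean looks like this (a busy-waiting sketch with made-up
names, assuming only one writer):

import core.atomic : atomicLoad, atomicOp, atomicStore;
import core.thread : Thread;

// Readers bump a shared counter; the (single) writer raises a flag and spins
// until the reader count drains to zero before writing.
struct ReaderCountLock
{
    private shared int  readers;
    private shared bool writerWaiting;

    void lockRead()
    {
        for (;;)
        {
            while (atomicLoad(writerWaiting)) Thread.yield();
            atomicOp!"+="(readers, 1);
            if (!atomicLoad(writerWaiting)) return; // no writer slipped in
            atomicOp!"-="(readers, 1);              // back off and retry
        }
    }

    void unlockRead() { atomicOp!"-="(readers, 1); }

    void lockWrite()    // assumes a single writer at a time
    {
        atomicStore(writerWaiting, true);
        while (atomicLoad(readers) > 0) Thread.yield();
    }

    void unlockWrite() { atomicStore(writerWaiting, false); }
}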
Jul 11 2011
prev sibling parent reply Kagamin <spam here.lot> writes:
eris Wrote:

 Windows uses a "proactor" model instead of a reactor, so it schedules I/O
 first and then waits for an I/O completion flag. I've modified my reactor so
 that it presents a reactor facade even on Windows systems.
Huh? What does it change? I/O is done pretty much the same on all systems: the client requests an I/O operation, and the OS sends the thread to sleep until the operation is complete.
Jul 12 2011
parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Kagamin wrote:
 eris Wrote:

 Windows uses a "proactor" model instead of a reactor, so it schedules I/O
 first and then waits for an I/O completion flag. I've modified my reactor so
 that it presents a reactor facade even on Windows systems.
Huh? What does it change? IO is done pretty much the same on all systems: client requests an io operation, OS send the thread to sleep until the operation is complete.
You mean synchronous blocking I/O. Proactor in Windows means asynchronous non-blocking I/O (overlapped I/O) and completion ports. The client may request multiple I/O operations and its thread is not put to sleep. Instead, the client retrieves the completed operations using GetQueuedCompletionStatus() or a callback function.
Jul 12 2011
parent reply Kagamin <spam here.lot> writes:
Piotr Szturmaj Wrote:

 Kagamin wrote:
 eris Wrote:

 Windows uses a "proactor" model instead of a reactor, so it schedules I/O
 first and then waits for an I/O completion flag. I've modified my reactor so
 that it presents a reactor facade even on Windows systems.
Huh? What does it change? IO is done pretty much the same on all systems: client requests an io operation, OS send the thread to sleep until the operation is complete.
You mean synchronous blocking IO. Proactor in Windows means asynchronous non-blocking IO (overlapped IO) and completion ports. Client may request multiple IO operations and its thread is not put to sleep. Instead, client receives all completed operations using GetQueuedCompletionStatus() or using callback function.
From what I understand, the reactor is meant to be synchronous?
Jul 12 2011
parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Kagamin wrote:
 Piotr Szturmaj Wrote:

 Kagamin wrote:
 eris Wrote:

 Windows uses a "proactor" model instead of a reactor, so it schedules I/O
 first and then waits for an I/O completion flag. I've modified my reactor so
 that it presents a reactor facade even on Windows systems.
Huh? What does it change? IO is done pretty much the same on all systems: client requests an io operation, OS send the thread to sleep until the operation is complete.
You mean synchronous blocking IO. Proactor in Windows means asynchronous non-blocking IO (overlapped IO) and completion ports. Client may request multiple IO operations and its thread is not put to sleep. Instead, client receives all completed operations using GetQueuedCompletionStatus() or using callback function.
From what I understand, reactor is meant to be synchronous?
A reactor is an event-handling pattern where the client specifies event handlers like onRecv() and just waits for events; these are delivered as soon as they arrive. The proactor, on the other hand, requires that the client issue each asynchronous operation manually; no events are delivered until the client requests them.

For example, with Linux's epoll, events are delivered as soon as there is data available in the buffer (in level-triggered mode). In the Windows NT family, a recv event is delivered only after a successful call to WSARecv().

The proactor model has a (debatable) advantage in that the data buffer is specified before issuing the async I/O. This can avoid data copying in some circumstances, because the I/O manager can read data directly into that buffer. In reactor models (epoll), the buffer is provided after the I/O completes, which involves copying data from an internal buffer to the user's buffer.
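
To make the reactor side concrete, a level-triggered epoll read loop might
look roughly like this in D (illustrative only; names are made up and error
handling is omitted):

version (linux):

import core.sys.linux.epoll;
import core.sys.posix.unistd : read;

// Reactor style: wait for readiness events, then supply the buffer and read.
// Note the buffer is provided *after* the kernel says data is available,
// which is exactly the copy the proactor model can avoid.
void reactorLoop(int fd, void delegate(ubyte[]) onRecv)
{
    int ep = epoll_create1(0);

    epoll_event ev;
    ev.events  = EPOLLIN;          // level-triggered read readiness
    ev.data.fd = fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);

    ubyte[4096] buf;
    for (;;)
    {
        epoll_event[16] ready;
        int n = epoll_wait(ep, ready.ptr, cast(int) ready.length, -1);
        foreach (i; 0 .. n)
        {
            auto got = read(ready[i].data.fd, buf.ptr, buf.length);
            if (got > 0)
                onRecv(buf[0 .. cast(size_t) got]);
        }
    }
}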
Jul 12 2011
parent Kagamin <spam here.lot> writes:
Piotr Szturmaj Wrote:

 Reactor is event handling pattern, when client specify event handlers 
 like onRecv() and just wait for the events - these are delivered as soon 
 as they arrive. Proactor on the other side requires that client issue 
 each asynchronous operation manually - there will be no events delivered 
 until client requests them.
 
 For example in linux's epoll, events are delivered as soon as there is 
 available data in the buffer (in level triggered mode). In Windows NT 
 family recv event is delivered only after successfull call to WSARecv().
epoll seems like the Windows synchronization API, except that in Windows one shouldn't wait on file handles directly.
Jul 12 2011