
digitalmars.D - [std.concurrency] prioritySend is 1000 times slower than send?

reply osa <osa aso.osa> writes:
I started using std.concurrency in some projects and overall it feels
like a solid (albeit minimalistic) design. However, the current
implementation has some issues. For example, I've noticed that using
prioritySend slows everything down considerably. Here is a simple
benchmark to demonstrate the problem:
---------
import std.concurrency;
import std.date;
import std.stdio;

struct Message {}

void main() {
     // Run the send/receive loop for five seconds, counting iterations.
     enum TIME_LIMIT = 5 * ticksPerSecond;
     auto started = getUTCtime();
     d_time running = 0;
     long iterations = 0;

     while( running < TIME_LIMIT ) {
         version( priority ) {
             prioritySend( thisTid, Message() );
         }
         else {
             send( thisTid, Message() );
         }
         receive( (Message){} );
         // Check the clock only every 100 iterations to keep timing
         // overhead out of the measurement.
         if( ++iterations % 100 == 0 ) {
             running = getUTCtime() - started;
         }
     }

     auto seconds = cast(double)running / ticksPerSecond;
     writeln( "Benchmark: ", iterations, " iterations in ", seconds,
              " seconds (", iterations / seconds, "/second)" );
}
---------

Using dmd v2.049 on Linux, this produces:
	Benchmark: 4469600 iterations in 5 seconds (893920/second)

But when compiled with -version=priority, the result is quite different:
	Benchmark: 3700 iterations in 5.177 seconds (714.7/second)

This is about 1250 times slower than using send! Is there any reason for
such a penalty for using prioritySend?

Note that the benchmark code is single-threaded. The initial version used
two threads (with a similar discrepancy between send and prioritySend),
but when I tried to run it after compiling with -profile, it did not
work. I assume that profiling is not supported for multi-threaded
programs yet? So I profiled the single-threaded benchmark, and it seems
that the main offender is the PriorityMessageException constructor:
   Num          Tree        Func        Per
   Calls        Time        Time        Call

    1700   777986427   777986427      457639     class
std.concurrency.PriorityMessageException!(struct concur1.Message).PriorityMessageException
std.concurrency.PriorityMessageException!(struct concur1.Message).PriorityMessageException.__ctor(struct concur1.Message)

P.S. The demangle program example at
http://www.digitalmars.com/d/2.0/phobos/std_demangle.html is broken --
it does not compile.

P.P.S. std.demangle fails for some symbols, for example:
	_D3std5array13__T5emptyTyaZ5emptyFNdxAyaZb
	_D3std6format19__T10FormatSpecTyaZ10FormatSpec6__ctorMFNcxAyaZS3std6format19__T10FormatSpecTyaZ10FormatSpec
and many others.
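For reference, here is a minimal demangle program that does compile with
dmd 2.049 (a sketch using std.demangle.demangle, not the documented
example); on the symbols above it currently just echoes the mangled name
back:
---------
import std.demangle;
import std.stdio;

void main() {
    // demangle() returns its input unchanged when it cannot decode
    // the name, which is what happens with the symbols listed above.
    writeln( demangle( "_D3std5array13__T5emptyTyaZ5emptyFNdxAyaZb" ) );
}
---------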
Sep 29 2010
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 29 Sep 2010 14:25:07 -0400, osa <osa aso.osa> wrote:

 P.S. demangle program example at  
 http://www.digitalmars.com/d/2.0/phobos/std_demangle.html is broken --  
 it does not compile.
  P.P.S. std.demangle fails for some symbols, for example:
 	_D3std5array13__T5emptyTyaZ5emptyFNdxAyaZb
 	_D3std6format19__T10FormatSpecTyaZ10FormatSpec6__ctorMFNcxAyaZS3std6format19__T10FormatSpecTyaZ10FormatSpec
 and many other.
Note, core.demangle will probably soon replace std.demangle, and is
actively being developed.  You may need to download the svn version of
druntime.

ref: http://lists.puremagic.com/pipermail/phobos/2010-September/002376.html

-Steve
Sep 29 2010
parent "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 29 Sep 2010 22:31:53 +0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Wed, 29 Sep 2010 14:25:07 -0400, osa <osa aso.osa> wrote:

 P.S. demangle program example at  
 http://www.digitalmars.com/d/2.0/phobos/std_demangle.html is broken --  
 it does not compile.
  P.P.S. std.demangle fails for some symbols, for example:
 	_D3std5array13__T5emptyTyaZ5emptyFNdxAyaZb
 	_D3std6format19__T10FormatSpecTyaZ10FormatSpec6__ctorMFNcxAyaZS3std6format19__T10FormatSpecTyaZ10FormatSpec
 and many other.
 Note, core.demangle will probably soon replace std.demangle, and is
 actively being developed.  You may need to download the svn version of
 druntime.

 ref: http://lists.puremagic.com/pipermail/phobos/2010-September/002376.html

 -Steve
IIRC, core.demangle is already in dmd 2.049 (i.e. no need for an svn
version unless there are significant changes in core.demangle in trunk).
Sep 29 2010
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
osa Wrote:

 I started using std.concurrency in some projects and overall it feels 
 like a solid (albeit minimalistic) design. However, current 
 implementation has some issues. For example, I've noticed that using 
 prioritySend slows everything considerably.
Thanks for this.  I can tell you that prioritySend performs an extra
allocation to account for a design requirement (if a priority message isn't
received it's thrown as PriorityMessageException!(T), and this exception is
generated when the send occurs, since static type info isn't available at
the receive side when it's needed for this).

I had originally thought that the difference was just more garbage
collections, but calling GC.disable only increases the number of priority
messages sent by about 1000.  I'll have to look at the code to see if I can
figure out what's going on.
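In pseudocode, the pattern is roughly this (hypothetical names, not the
actual std.concurrency source):
---------
// The exception is built eagerly at the send site, so every prioritySend
// pays for an allocation even when the message is received normally and
// the exception is never thrown.
void prioritySendSketch(T)( Tid tid, T msg ) {
    auto exc = new PriorityMessageException!(T)( msg );
    enqueuePriority( tid, msg, exc );  // hypothetical mailbox call
}
---------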
Sep 30 2010
parent reply Sean Kelly <sean invisibleduck.org> writes:
Sean Kelly Wrote:

 osa Wrote:
 
 I started using std.concurrency in some projects and overall it feels 
 like a solid (albeit minimalistic) design. However, current 
 implementation has some issues. For example, I've noticed that using 
 prioritySend slows everything considerably.
 Thanks for this.  I can tell you that prioritySend performs an extra
 allocation to account for a design requirement (if a priority message
 isn't received it's thrown as PriorityMessageException!(T), and this
 exception is generated when the send occurs, since static type info isn't
 available at the receive side when it's needed for this).  I had
 originally thought that the difference was just more garbage collections,
 but calling GC.disable only increases the number of priority messages
 sent by about 1000.  I'll have to look at the code to see if I can figure
 out what's going on.
Okay, I've fixed one issue with priority messages that, aside from broken
behavior, has increased performance somewhat.  Here are the timings:

Benchmark: 5944400 iterations in 5 seconds (1.18888e+06/second) -- built without -version=priority
Benchmark: 4900 iterations in 5.119 seconds (957.218/second) -- built with -version=priority before fix
Benchmark: 39700 iterations in 5.001 seconds (7938.41/second) -- built with -version=priority after fix

The remaining issue has to do with the fact that the exception is
constructed when the send is issued, and when this exception is constructed
a stack trace is generated as well.  I'll have to modify Throwable so that
derived classes can specify that no trace be generated.  That or eliminate
constructing the exception at the send site and change how that exception
is represented.
Sep 30 2010
next sibling parent reply osa <osa aso.osa> writes:
On 09/30/2010 01:45 PM, Sean Kelly wrote:
 Benchmark: 5944400 iterations in 5 seconds (1.18888e+06/second) -- built without -version=priority
 Benchmark: 4900 iterations in 5.119 seconds (957.218/second) -- built with -version=priority before fix
 Benchmark: 39700 iterations in 5.001 seconds (7938.41/second) -- built with -version=priority after fix
Seems to be about an order of magnitude improvement. Not too bad.
 The remaining issue has to do with the fact that the exception is
 constructed when the send is issued and when this exception is constructed
 a stack trace is generated as well.  I'll have to modify Throwable so that
 derived classes can specify that no trace be generated.  That or eliminate
 constructing the exception at the send site and change how that exception
 is represented.
I've also thought about switching to 'send' if the receiver queue is empty,
but there is no way in the std.concurrency API to check for that.  Is there
any serious issue with adding such a method?  I understand that in a
multi-threaded environment an empty queue as reported by an 'isEmpty' call
may become non-empty before that fact is used, but in some situations an
approximate result (meaning empty or almost empty) is fine.
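Something like this hypothetical helper is what I have in mind (no such
isEmpty/queue-inspection API exists in std.concurrency today):
---------
void prioritySendIfNeeded(T)( Tid tid, T msg ) {
    if( tid.isQueueEmpty() )        // hypothetical; result may be stale
        send( tid, msg );           // queue (almost) empty, order is moot
    else
        prioritySend( tid, msg );   // something queued: jump ahead of it
}
---------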
Sep 30 2010
parent reply Sean Kelly <sean invisibleduck.org> writes:
osa Wrote:
 
 I've also thought about switching to 'send' if the receiver queue is 
 empty, but there is no way in std.concurrency API to check for that. Is 
 there any serious issue with adding such method? I understand that in 
 multi-threaded environment an empty queue as told by 'isEmpty' call may 
 become non-empty before that fact is used, but in some situations 
 approximate result (means empty or almost empty) is fine.
The current API is designed to apply to in-process and out-of-process
messaging, so a function like that doesn't really fit.  I think this is
really more of just a tuning issue.  And in fact, that the
PriorityMessageException is a template isn't feasible for out-of-process
messaging, so this is an issue that has to be addressed at some point
anyway.

I think I'm going to change the exception to be generated within receive()
only if needed, have it contain a Variant instead of a templated type, and
possibly also not generate a stack trace for it.  I haven't decided whether
a trace is meaningful in this context.  Getting a PriorityMessageException
could imply a failure to receive() a type required by the application
design, so a trace might be a good indication of where the error is... or
maybe that's just wrong.

I'm looking into the hang issue as well... it's just less obvious where the
problem is there.
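A sketch of that design (field and message names assumed, not final):
---------
import std.variant;

// One non-template exception class whose payload travels as a Variant,
// so it can be constructed lazily inside receive() and needs no static
// type info from the sender.
class PriorityMessageException : Exception {
    Variant message;  // the priority message that went unmatched

    this( Variant msg ) {
        super( "Priority message" );
        message = msg;
    }
}
---------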
Sep 30 2010
parent reply osa <osa aso.osa> writes:
On 09/30/2010 03:33 PM, Sean Kelly wrote:
 osa Wrote:
 I've also thought about switching to 'send' if the receiver queue is
 empty, but there is no way in std.concurrency API to check for that. Is
 there any serious issue with adding such method? I understand that in
 multi-threaded environment an empty queue as told by 'isEmpty' call may
 become non-empty before that fact is used, but in some situations
 approximate result (means empty or almost empty) is fine.
The current API is designed to apply to in-process and out-of-process messaging, so a function like that doesn't really fit.
I see. It is reasonable if out-of-process messaging is going to be implemented.
 Getting a PriorityMessageException could imply a failure to receive() a
 type required by the application design, so a trace might be a good
 indication of where the error is... or maybe that's just wrong.
I'd say that having a trace for exceptions thrown by receive() may be
useful only if you have many receive() calls scattered all over the code,
with try...catch at the very top level.  But my (limited) experience with
the std.concurrency way of thread communication tells me that it is a bad
idea; I'd use as few calls to receive() as possible and keep them close to
each other.  But people's mileage may vary.
Sep 30 2010
parent Sean Kelly <sean invisibleduck.org> writes:
osa Wrote:

 On 09/30/2010 03:33 PM, Sean Kelly wrote:
 osa Wrote:
 I've also thought about switching to 'send' if the receiver queue is
 empty, but there is no way in std.concurrency API to check for that. Is
 there any serious issue with adding such method? I understand that in
 multi-threaded environment an empty queue as told by 'isEmpty' call may
 become non-empty before that fact is used, but in some situations
 approximate result (means empty or almost empty) is fine.
The current API is designed to apply to in-process and out-of-process messaging, so a function like that doesn't really fit.
I see. It is reasonable if out-of-process messaging is going to be implemented.
It will be. But I want to get the bumps smoothed out for in-process messaging first.
Sep 30 2010
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Sean Kelly (sean invisibleduck.org)'s article
 Sean Kelly Wrote:
 osa Wrote:

 I started using std.concurrency in some projects and overall it feels
 like a solid (albeit minimalistic) design. However, current
 implementation has some issues. For example, I've noticed that using
 prioritySend slows everything considerably.
 Thanks for this.  I can tell you that prioritySend performs an extra
 allocation to account for a design requirement (if a priority message
 isn't received it's thrown as PriorityMessageException!(T), and this
 exception is generated when the send occurs, since static type info isn't
 available at the receive side when it's needed for this).  I had
 originally thought that the difference was just more garbage collections,
 but calling GC.disable only increases the number of priority messages
 sent by about 1000.  I'll have to look at the code to see if I can figure
 out what's going on.

 Okay, I've fixed one issue with priority messages that, aside from broken
 behavior, has increased performance somewhat.  Here are the timings:

 Benchmark: 5944400 iterations in 5 seconds (1.18888e+06/second) -- built without -version=priority
 Benchmark: 4900 iterations in 5.119 seconds (957.218/second) -- built with -version=priority before fix
 Benchmark: 39700 iterations in 5.001 seconds (7938.41/second) -- built with -version=priority after fix

 The remaining issue has to do with the fact that the exception is
 constructed when the send is issued and when this exception is
 constructed a stack trace is generated as well.  I'll have to modify
 Throwable so that derived classes can specify that no trace be generated.
 That or eliminate constructing the exception at the send site and change
 how that exception is represented.
I just made some functional changes to how priority messages are sent and
added a few performance tweaks to messaging in general.  The only visible
difference should be that PriorityMessageException is no longer a template
class but instead contains a Variant, which is something that would have
been necessary for inter-process messaging anyway.  Here are the timings:

--- Before ---

$ dmd -inline -release -O priority
Benchmark: 5749600 iterations in 5 seconds (1.14992e+06/second)
Benchmark: 5747800 iterations in 5 seconds (1.14956e+06/second)
Benchmark: 5748200 iterations in 5 seconds (1.14964e+06/second)

$ dmd -inline -release -O priority -version=priority
Benchmark: 39100 iterations in 5.01 seconds (7804.39/second)
Benchmark: 39100 iterations in 5.01 seconds (7804.39/second)
Benchmark: 39100 iterations in 5 seconds (7820/second)

--- After ---

$ dmd -inline -release -O priority
Benchmark: 7204200 iterations in 5 seconds (1.44084e+06/second)
Benchmark: 7167000 iterations in 5 seconds (1.4334e+06/second)
Benchmark: 7164400 iterations in 5 seconds (1.43288e+06/second)

$ dmd -inline -release -O priority -version=priority
Benchmark: 7442500 iterations in 5 seconds (1.4885e+06/second)
Benchmark: 7448600 iterations in 5 seconds (1.48972e+06/second)
Benchmark: 7421800 iterations in 5 seconds (1.48436e+06/second)
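For illustration, a receiver-side sketch under the new design (struct and
handler contents are mine, not from the library):
---------
import std.concurrency;
import std.variant;

struct Message {}
struct OtherMessage {}

void receiveOne() {
    try {
        receive( (OtherMessage m) { /* expected traffic */ } );
    } catch( PriorityMessageException e ) {
        // The payload arrives as a Variant, so the catch site checks for
        // and extracts the concrete type itself.
        if( e.message.convertsTo!(Message) ) {
            auto m = e.message.get!(Message);
            // handle the urgent message here
        }
    }
}
---------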
Oct 08 2010
parent osa <osa aso.osa> writes:
On 10/08/2010 04:29 PM, Sean Kelly wrote:
 I just made some functional changes to how priority messages are sent and
 added a few performance tweaks to messaging in general.  The only visible
 difference should be that PriorityMessageException is no longer a template
 class but instead contains a Variant, which is something that would have
 been necessary for inter-process messaging anyway.  Here are the timings:

 --- After ---

 $ dmd -inline -release -O priority
 Benchmark: 7204200 iterations in 5 seconds (1.44084e+06/second)
 Benchmark: 7167000 iterations in 5 seconds (1.4334e+06/second)
 Benchmark: 7164400 iterations in 5 seconds (1.43288e+06/second)

 $ dmd -inline -release -O priority -version=priority
 Benchmark: 7442500 iterations in 5 seconds (1.4885e+06/second)
 Benchmark: 7448600 iterations in 5 seconds (1.48972e+06/second)
 Benchmark: 7421800 iterations in 5 seconds (1.48436e+06/second)
Wow!  This is a really good improvement.  Thanks!

I assume this is in Phobos SVN already, so I'll try to build my application
(not the simplified benchmark) using the updated std.concurrency to see how
it performs now.  I'll let you know if something is wrong ;)
Oct 08 2010