digitalmars.D.learn - std.concurrency, speed, etc.

Adam Conner-Sax <adam_conner_sax yahoo.com> writes:
Attached (over this and the next post) are 3 D files and sample output
(compiled with "dmd -O -release -inline Queue_Tester.d Queue_Examples.d
PrettyPrint.d" on my system, OSX 10.6.6, 2x2.8GHz quad-core Xeon) from a
multi-threaded queuing test-bed.

I wrote the tester as an exercise in learning D.  The language is great;
perfect for me as someone who loved generics in C++ but found that all the
cool things you could do got ugly and messy fast.

The tester loops over a few different sorts of queues (a locking queue using a
lock in the queue/dequeue functions, a lock-free queue using hazard pointers,
and std.concurrency message passing) and a few different test scenarios (number
of threads, number of messages, message rates) and measures the statistics of
the latencies (I stamp the time when the message is formed and enqueued and
then again when it's dequeued).

Anyway, after writing and debugging this, I'm left with some questions (note,
I imagine the answers to all these may be that I coded things wrong or badly.
I welcome that answer as long as it comes with a hint for how to do better!)

1) I couldn't get the synchronized class version (as opposed to using
synchronized statements in the functions) to run.  It would hang in odd ways.
 This may be related to a bug I reported earlier (and Sean was helpful enough
to fix!) so this may be moot.

2) Message passing is slow in my tests.  Often an order of magnitude or more
slower than the fastest (lock-free queue).  I expect to pay some price for the
convenience, etc. but that seems excessive.

3) I've built and run on Windows and Linux.  The Windows version works fine
but the Linux version seems to have some issue with the timestamping (using
systime()); often the latencies come through as 0 (and I'm using
toMicroseconds!double() so I should see any ticks at all).  Is there a known
Linux bug or issue with systime()?

4) I still don't totally understand shared.  It does what I expect when
variables are static.  That's why all the queues are static objects.  But that
doesn't scale so well (I know I could set up static factories for static
objects but that seems like it shouldn't be necessary).  When I put an
unshared variable in a non-static class and then use the class from multiple
threads, the variable acts shared.  Is that a bug or a feature?

Thanks for any and all thoughts.

Adam
Feb 04 2011
bearophile <bearophileHUGS lycos.com> writes:
Adam Conner-Sax:

 I wrote the tester as an exercise in learning D.  The language is great;
 perfect for me as someone who loved generics in C++ but found that all the
 cool things you could do got ugly and messy fast.

Few notes on the form of your code:
- I suggest to use module names all in lowercase

Some alternative ways to write some of your code:

    auto sum = reduce!("a+b")(0.0,latency_data);
    ==>
    auto sum = reduce!q{a + b}(0.0, latencyData);

    auto sd = reduce!(f)(0.0,latency_data);
    ==>
    auto sd = reduce!f(0.0, latencyData);

    Test_Parameters[] tests;
    tests ~= Test_Parameters(1,10,0,0);
    tests ~= Test_Parameters(4,10,0,0);
    ...
    ==>
    auto tests = [TestParameters(1,10,0,0),
                  TestParameters(4,10,0,0),
                  ...

    immutable int[] widths = [11,10,10,10,10,10,10,10];
    ==>
    enum int[] widths = [11, 10, 10, 10, 10, 10, 10, 10];

    debug (5) { printf("Rec'd: (pkt %i) %.*s\n",received,QT.package_tostring(p)); }
    ==>
    debug(5) printf("Rec'd: (pkt %i) %.*s\n", received, QT.packageToString(p));

Bye,
bearophile
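
The reduce calls above can be sketched as a small, self-contained program. This is only an illustration: the function names meanOf and stdDevOf and the sample latencies are made up here, the deviation is the population form, and the function f from the original tester is not shown in the thread, so a lambda stands in for it.

```d
import std.algorithm : reduce;
import std.math : sqrt;
import std.stdio : writeln;

// Hypothetical helper: arithmetic mean via reduce with a token-string lambda.
double meanOf(const(double)[] xs)
{
    auto sum = reduce!q{a + b}(0.0, xs);
    return sum / xs.length;
}

// Hypothetical helper: population standard deviation.
// The lambda stands in for the tester's `f`, which is not shown in the thread.
double stdDevOf(const(double)[] xs)
{
    immutable m = meanOf(xs);
    auto sumSq = reduce!((a, b) => a + (b - m) * (b - m))(0.0, xs);
    return sqrt(sumSq / xs.length);
}

void main()
{
    // Made-up latencies in microseconds, for illustration only.
    double[] latencyData = [1.0, 2.0, 3.0, 4.0];
    writeln("mean: ", meanOf(latencyData));
    writeln("sd:   ", stdDevOf(latencyData));
}
```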
Feb 04 2011
Sean Kelly <sean invisibleduck.org> writes:
Adam Conner-Sax Wrote:
 
 1) I couldn't get the synchronized class version (as opposed to using
 synchronized statements in the functions) to run.  It would hang in odd ways.
  This may be related to a bug I reported earlier (and Sean was helpful enough
 to fix!) so this may be moot.

'synchronized' as a class label may not be implemented in the compiler yet. I'd stick to explicitly labeling methods as 'synchronized' for now.
 2) Message passing is slow in my tests.  Often an order of magnitude or more
 slower than the fastest (lock-free queue).  I expect to pay some price for the
 convenience, etc. but that seems excessive.

The limiting factor at this point is the cost of copying the Message struct around during processing. I've eliminated nearly all copies by passing by ref internally, but I believe an unnecessary copy or two may still remain. I'll see about tuning this further. Tuning the ctor and copy ops in Variant and Tuple would help as well, since nearly all the time spent is in those routines. For what it's worth, it's fairly easy to time this by building with -profile and having the main thread send messages to itself (since -profile doesn't yet work in multithreaded apps).
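
A minimal sketch of the self-send approach described above, assuming a current Phobos where std.concurrency provides thisTid, send and receiveOnly. The function name selfSendSum and the message count are made up for illustration. Everything runs on the main thread, so building with "dmd -profile" should attribute the time to the send/receive machinery.

```d
import std.concurrency : receiveOnly, send, thisTid;
import std.stdio : writeln;

// Hypothetical helper: the calling thread posts n ints to its own
// mailbox and reads each one back immediately, so no second thread
// is involved and a single-threaded profiler can see all the work.
long selfSendSum(int n)
{
    long total = 0;
    foreach (i; 0 .. n)
    {
        send(thisTid, i);            // enqueue into our own mailbox
        total += receiveOnly!int();  // dequeue the same message
    }
    return total;
}

void main()
{
    // The message count is arbitrary; pick one large enough to profile.
    writeln(selfSendSum(100_000));
}
```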
 4) I still don't totally understand shared.  It does what I expect when
 variables are static.  That's why all the queues are static objects.  But that
 doesn't scale so well (I know I could set up static factories for static
 objects but that seems like it shouldn't be necessary).  When I put an
 unshared variable in a non-static class and then use the class from multiple
 threads, the variable acts shared.  Is that a bug or a feature?

Maybe you're just lucky? It's hard to reason about behavior without an example.
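
As a starting point for such an example, here is a small sketch of one possible explanation; it is not confirmed anywhere in this thread and does not reproduce the original tester. D's thread-local default applies to module-level and static variables, while an instance field lives inside its object, so every thread that reaches the same object sees the same field. All names below are made up, and __gshared is used only to make the sharing visible; it bypasses the type system and is not a recommendation.

```d
import core.thread : Thread;
import std.stdio : writeln;

int perThread;             // module-level: each thread gets its own copy
__gshared int perProcess;  // one copy for the whole process

class Holder { int field; }  // plain field, not marked shared
__gshared Holder holder;     // one object reachable from every thread

void worker()
{
    // A new thread starts with a freshly initialized perThread (0),
    // but reaches the same Holder object as the main thread.
    perThread = 42;
    perProcess = 42;
    holder.field = 42;
}

// Returns true if only the thread-local variable kept its old value.
bool demo()
{
    holder = new Holder;
    perThread = 1;
    perProcess = 1;
    holder.field = 1;

    auto t = new Thread(&worker);
    t.start();
    t.join();

    return perThread == 1 && perProcess == 42 && holder.field == 42;
}

void main()
{
    writeln(demo() ? "field was visible across threads"
                   : "unexpected result");
}
```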
Feb 04 2011
Adam Conner-Sax <adam_conner_sax yahoo.com> writes:
== Quote from Sean Kelly (sean invisibleduck.org)'s article
 Adam Conner-Sax Wrote:
 1) I couldn't get the synchronized class version (as opposed to using
 synchronized statements in the functions) to run.  It would hang in odd ways.
  This may be related to a bug I reported earlier (and Sean was helpful enough
 to fix!) so this may be moot.


I couldn't get that to work either. What does work is a "synchronized" block of code. That seems potentially more efficient also.
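
A minimal sketch of a queue guarded by "synchronized" blocks, to illustrate the approach mentioned above. It is not the queue from the attached files; the class name LockingQueue and its methods are made up, and the demo is single-threaded. Handing the object to other threads would additionally need shared or __gshared plumbing, which is left out here.

```d
import std.stdio : writeln;

// Hypothetical locking queue: each operation takes the object's
// monitor through a synchronized statement rather than a
// synchronized class or method label.
class LockingQueue(T)
{
    private T[] items;

    void enqueue(T item)
    {
        synchronized (this)
        {
            items ~= item;
        }
    }

    // Returns false when the queue is empty instead of blocking.
    bool tryDequeue(out T item)
    {
        synchronized (this)
        {
            if (items.length == 0)
                return false;
            item = items[0];
            items = items[1 .. $];
            return true;
        }
    }
}

void main()
{
    auto q = new LockingQueue!int;
    q.enqueue(1);
    q.enqueue(2);

    int v;
    while (q.tryDequeue(v))
        writeln(v);
}
```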
 2) Message passing is slow in my tests.  Often an order of magnitude or more
 slower than the fastest (lock-free queue).  I expect to pay some price for the
 convenience, etc. but that seems excessive.


 internally, but I believe an unnecessary copy or two may still remain. I'll see about tuning this further. Tuning the ctor and copy ops in Variant and Tuple would help as well, since nearly all the time spent is in those routines. For what it's worth, it's fairly easy to time this by building with -profile and having the main thread send messages to itself (since -profile doesn't yet work in multithreaded apps).

Right. I've run into the multithreaded profiling issue. What you're describing makes sense: message passing has a much higher minimum time (7-8 us) than any of the others (1-2 us). That could be copying. I had thought it was some sort of wakeup to the receiver. The other methods just have a while loop waiting on new data rather than the blocking "receive" so I imagined there was some cost to waking up the receive thread.
 4) I still don't totally understand shared.  It does what I expect when
 variables are static.  That's why all the queues are static objects.  But that
 doesn't scale so well (I know I could set up static factories for static
 objects but that seems like it shouldn't be necessary).  When I put an
 unshared variable in a non-static class and then use the class from multiple
 threads, the variable acts shared.  Is that a bug or a feature?


Maybe. I'd rather it not work this way (sharing even though not marked shared). Then I could put the queues into non-static structures and get the shared and TLS the way I expect. That would make using them from a spawned function a bit trickier but I think that could be handled.

Thanks for the thoughts.

Adam
Feb 04 2011
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 04 February 2011 16:09:08 Sean Kelly wrote:
 Adam Conner-Sax Wrote:
 1) I couldn't get the synchronized class version (as opposed to using
 synchronized statements in the functions) to run.  It would hang in odd
 ways.  This may be related to a bug I reported earlier (and Sean was
 helpful enough to fix!) so this may be moot.

'synchronized' as a class label may not be implemented in the compiler yet. I'd stick to explicitly labeling methods as 'synchronized' for now.

IIRC, according to TDPL, it's supposed to be the whole class or none of it, not a per-function thing. So, if that's not how it works at the moment, then it's another of the things that hasn't been fixed to match TDPL yet.

- Jonathan M Davis
Feb 04 2011