
digitalmars.D.learn - Death by concurrency

Manfred Nowak <svv1999 hotmail.com> writes:
The well known shootout shows a negative mark for concurrency for D:

http://shootout.alioth.debian.org/benchmark.php?test=message&lang=all&sort=fullcpu

What is the reason?

-manfred
Nov 08 2005
"Ben Hinkle" <bhinkle mathworks.com> writes:
"Manfred Nowak" <svv1999 hotmail.com> wrote in message 
news:Xns97086B99494CDsvv1999hotmailcom 63.105.9.61...
 The well known shootout shows a negative mark for concurrency for D:

 http://shootout.alioth.debian.org/benchmark.php?test=message&lang=all&sort=fullcpu

 What is the reason?

 -manfred

It could be the busy-waiting. Instead of looping and yielding, a waiting thread should park itself (block until another thread signals it). The ReentrantLock and Condition classes from http://home.comcast.net/~benhinkle/locks/locks.html should help, but I don't know whether the shootout allows user libraries like that.
Nov 08 2005
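Ben's suggestion, that a waiting thread should park instead of looping on yield(), can be sketched with a condition variable. This is a minimal sketch assuming modern D's druntime (core.sync.mutex, core.sync.condition, core.thread), not the 2005 Phobos std.thread API or the ReentrantLock/Condition library linked above; Mailbox, makeLink and relay are hypothetical names. Each link thread sleeps inside Condition.wait() until a value arrives, so idle links use no CPU:

```d
// Sketch only: modern druntime, not the 2005 Phobos/Ares APIs.
// Mailbox, makeLink and relay are hypothetical names.
import core.sync.condition : Condition;
import core.sync.mutex : Mutex;
import core.thread : Thread;
import std.stdio : writeln;

// One-slot mailbox: take() parks on a condition variable
// instead of spinning on Thread.yield().
final class Mailbox
{
    private Mutex mutex;
    private Condition filled;   // signalled when a value arrives
    private Condition emptied;  // signalled when the slot is freed
    private bool full;
    private int value;

    this()
    {
        mutex = new Mutex;
        filled = new Condition(mutex);
        emptied = new Condition(mutex);
    }

    void put(int m)
    {
        mutex.lock();
        scope (exit) mutex.unlock();
        while (full)
            emptied.wait();     // sleep until a taker frees the slot
        value = m;
        full = true;
        filled.notify();        // wake one parked taker
    }

    int take()
    {
        mutex.lock();
        scope (exit) mutex.unlock();
        while (!full)
            filled.wait();      // parked: no CPU used while waiting
        full = false;
        emptied.notify();
        return value;
    }
}

// Each link takes one message, adds 1 and passes it on.
// A separate function gives every delegate its own captured frame.
Thread makeLink(Mailbox inbox, Mailbox outbox)
{
    return new Thread({ outbox.put(inbox.take() + 1); });
}

// Sends `start` through `links` parked threads and returns what arrives.
int relay(int links, int start)
{
    auto boxes = new Mailbox[links + 1];
    foreach (ref b; boxes)
        b = new Mailbox;

    Thread[] threads;
    foreach (i; 0 .. links)
        threads ~= makeLink(boxes[i], boxes[i + 1]).start();

    boxes[0].put(start);
    int result = boxes[links].take();
    foreach (t; threads)
        t.join();
    return result;
}

void main()
{
    writeln(relay(500, 0)); // prints 500: each of the 500 links adds 1
}
```

The while loops around wait() guard against spurious wakeups, and makeLink exists so each delegate captures its own inbox/outbox rather than variables from a loop body, which D closures have not always copied per iteration.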
Sean Kelly <sean f4.ca> writes:
Ben Hinkle wrote:
 "Manfred Nowak" <svv1999 hotmail.com> wrote in message 
 news:Xns97086B99494CDsvv1999hotmailcom 63.105.9.61...
 
The well known shootout shows a negative mark for concurrency for D:

http://shootout.alioth.debian.org/benchmark.php?test=message&lang=all&sort=fullcpu

What is the reason?


 It could be the busy-waiting. Instead of looping and yielding a waiting 
 thread should park itself. The ReentrantLock and Condition classes from 
 http://home.comcast.net/~benhinkle/locks/locks.html should help - but I 
 don't know if user libraries are allowed in the shootout like that. 

I'm not sure what's wrong with their test. I modified the shootout code to run on Ares with DMD .139 (since I'm too lazy to rebuild Phobos just for this test), and ptime reported it completing in 0.625 seconds on my laptop. And this was with quite a lot of stuff running in the background. In case anyone is interested, here is the test code. I simply renamed 'wait' to 'join' and did output via printf instead of streams:

    import std.thread, std.c.stdio, std.c.stdlib;

    int main(char[][] args)
    {
        const int length = 500;
        int n = args.length > 1 ? atoi(args[1]) : 1;
        EndLink chainEnd = new EndLink(length * n);
        chainEnd.start();
        Link chain = chainEnd;
        while(n--)
        {
            for(int i = 1; i < length; i++)
            {
                Link link = new Link(chain);
                chain = link;
            }
            chain.put(0);
            while(chain.next)
            {
                chain.start();
                chain.join();
                chain = chain.next;
            }
        }
        chainEnd.join();
        printf("%i\n", chainEnd.count);
        return 0;
    }

    class Link: Thread
    {
    private:
        int message = -1;

    public:
        Link next;

        this(Link t)
        {
            next = t;
        }

        void run()
        {
            next.put(this.take());
        }

        synchronized void put(int m)
        {
            message = m;
            yield();
        }

    protected:
        synchronized int take()
        {
            if(message != -1)
            {
                int m = message;
                message = -1;
                return m + 1;
            }
            yield();
            return 0;
        }
    }

    class EndLink: Link
    {
    private:
        int finalCount;

    public:
        int count = 0;

        this(int i)
        {
            super(null);
            finalCount = i;
        }

        void run()
        {
            while(count < finalCount)
            {
                count += this.take();
                yield();
            }
        }
    }
Nov 08 2005
Sean Kelly <sean f4.ca> writes:
Oh, the shootout says the code should print '5000' and mine printed 
'500'.  I haven't taken the time to figure out why the result was 
different, though it's likely a bug in the shootout code.


Sean
Nov 08 2005
Sean Kelly <sean f4.ca> writes:
Oops.  I just noticed that N is a command-line parameter.  For an N of 
10, ptime clocks this test at 5.210 seconds on my laptop, and '5000' is 
printed as expected.


Sean
Nov 08 2005
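The arithmetic behind the two outputs, worked from the test code Sean posted: each round puts 0 into the head of a chain of length - 1 Link threads, each Link's take() returns m + 1, and EndLink's own take() adds one more before the value is added to count. EndLink stops once count reaches finalCount = length * N, so the printed value scales with N:

```latex
\begin{aligned}
\text{per round: } & (\mathit{length} - 1) + 1 = \mathit{length} = 500 \\
\text{after } N \text{ rounds: } & \mathit{count} = \mathit{length} \cdot N = 500N \\
& N = 1 \;\Rightarrow\; 500, \qquad N = 10 \;\Rightarrow\; 5000
\end{aligned}
```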
"Ben Hinkle" <bhinkle mathworks.com> writes:
If the code was making 500 threads, it could also be that they ran the
benchmark on Linux and bumped into Phobos's limit on the number of
threads allowed at once:
    static Thread[/*_POSIX_THREAD_THREADS_MAX*/ 100] allThreads;


"Sean Kelly" <sean f4.ca> wrote in message 
news:dkr4ls$ljk$1 digitaldaemon.com...
 Ben Hinkle wrote:
 "Manfred Nowak" <svv1999 hotmail.com> wrote in message 
 news:Xns97086B99494CDsvv1999hotmailcom 63.105.9.61...

The well known shootout shows a negative mark for concurrency for D:

http://shootout.alioth.debian.org/benchmark.php?test=message&lang=all&sort=fullcpu

What is the reason?


 It could be the busy-waiting. Instead of looping and yielding a waiting 
 thread should park itself. The ReentrantLock and Condition classes from 
 http://home.comcast.net/~benhinkle/locks/locks.html should help - but I 
 don't know if user libraries are allowed in the shootout like that.

<snip>

Nov 08 2005
Sean Kelly <sean f4.ca> writes:
Ben Hinkle wrote:
 If the code was making 500 threads it could also be that they ran the 
 benchmark on linux and bumped into phobos's limitation on the number of 
 threads allowed at once:
     static Thread[/*_POSIX_THREAD_THREADS_MAX*/ 100] allThreads;

Ah, good point. Ares doesn't have this limitation, as it uses an AA (associative array) to store thread references.

Sean
Nov 08 2005
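To illustrate the difference Ben and Sean describe, here is a hypothetical pair of registries: one with a fixed 100-entry array like the Phobos declaration quoted above, and one backed by an associative array, as Sean says Ares uses. Neither is the real library code; Object stands in for Thread and locking is left out for brevity:

```d
// Hypothetical sketch, not the actual Phobos or Ares source.
// Object stands in for Thread; no locking.
import std.stdio : writeln;

// Fixed capacity, like `static Thread[100] allThreads`.
final class FixedRegistry
{
    private Object[100] slots;

    // Returns false once every slot is taken.
    bool add(Object t)
    {
        foreach (ref s; slots)
        {
            if (s is null)
            {
                s = t;
                return true;
            }
        }
        return false;
    }
}

// Grows on demand: an associative array keyed by an id.
final class AARegistry
{
    private Object[size_t] byId;
    private size_t nextId;

    size_t add(Object t)
    {
        byId[nextId] = t;
        return nextId++;
    }

    void remove(size_t id)
    {
        byId.remove(id);
    }

    size_t length() const
    {
        return byId.length;
    }
}

// Tries to register `n` objects and reports how many were accepted.
size_t registerFixed(size_t n)
{
    auto reg = new FixedRegistry;
    size_t accepted;
    foreach (_; 0 .. n)
        if (reg.add(new Object))
            ++accepted;
    return accepted;
}

size_t registerAA(size_t n)
{
    auto reg = new AARegistry;
    foreach (_; 0 .. n)
        reg.add(new Object);
    return reg.length;
}

void main()
{
    writeln(registerFixed(500)); // prints 100: the array is full
    writeln(registerAA(500));    // prints 500: the AA just grows
}
```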
pragma <pragma_member pathlink.com> writes:
In article <dkr5ac$me5$2 digitaldaemon.com>, Sean Kelly says...
Ben Hinkle wrote:
 If the code was making 500 threads it could also be that they ran the 
 benchmark on linux and bumped into phobos's limitation on the number of 
 threads allowed at once:
     static Thread[/*_POSIX_THREAD_THREADS_MAX*/ 100] allThreads;

Ah, good point. Ares doesn't have this limitation as it used an AA for storing thread references.

Sean,

Out of curiosity, have you tried using Ares' Atomic lib for this task? I wonder what the difference in time would be compared to 'synchronized'.

- EricAnderton at yahoo
Nov 08 2005
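Eric's question can be explored with a lock-free version of the shootout's one-slot message, built on compare-and-swap. This sketch uses modern druntime's core.atomic (cas, atomicLoad) as a stand-in, since Ares' Atomic module had its own API that is not reproduced here; AtomicSlot, spawnLink and atomicRelay are hypothetical names. It removes the lock but still busy-waits with Thread.yield(), so it addresses a different cost than the parking Ben suggested:

```d
// Sketch with modern druntime core.atomic, not Ares' Atomic module.
// AtomicSlot, spawnLink and atomicRelay are hypothetical names.
import core.atomic : atomicLoad, cas;
import core.thread : Thread;
import std.stdio : writeln;

// Lock-free one-slot mailbox; -1 means "empty", as with Link.message,
// so -1 itself can never be sent.
final class AtomicSlot
{
    private shared int message = -1;

    void put(int m)
    {
        // Succeeds only if the slot is empty; otherwise yield and retry.
        while (!cas(&message, -1, m))
            Thread.yield();
    }

    int take()
    {
        for (;;)
        {
            int m = atomicLoad(message);
            // Claim the value only if the slot still holds it.
            if (m != -1 && cas(&message, m, -1))
                return m;
            Thread.yield(); // still a busy-wait, just without a lock
        }
    }
}

Thread spawnLink(AtomicSlot inbox, AtomicSlot outbox)
{
    return new Thread({ outbox.put(inbox.take() + 1); });
}

// Passes 0 through `links` threads, each adding 1.
int atomicRelay(int links)
{
    auto slots = new AtomicSlot[links + 1];
    foreach (ref s; slots)
        s = new AtomicSlot;

    Thread[] threads;
    foreach (i; 0 .. links)
        threads ~= spawnLink(slots[i], slots[i + 1]).start();

    slots[0].put(0);
    int result = slots[links].take();
    foreach (t; threads)
        t.join();
    return result;
}

void main()
{
    writeln(atomicRelay(100)); // prints 100
}
```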
Sean Kelly <sean f4.ca> writes:
pragma wrote:
 Out of curiosity, have you tried using Ares' Atomic lib for this task?  I wonder
 what the difference in time would be when compared to 'synchronized'?

See my reply to the OP. I tried simply removing the 'synchronized' attributes entirely and saw only a small performance increase (less than 0.1 seconds on average). I suspect this is because the real time sink in this case is thread creation. I also tried disabling the GC, and the test ran slower on average than with it enabled. It would probably be difficult to make this test perform noticeably better, as the 500 threads need to be created no matter what.

Sean
Nov 08 2005
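Sean's point that thread creation dominates can be checked with a rough timing loop. This sketch assumes modern druntime (core.thread, core.time) and compares creating, running and finishing N empty threads against N empty fibers; no timings are claimed here, and results depend heavily on the OS and machine:

```d
// Rough micro-benchmark sketch using modern druntime, not the 2005 APIs.
import core.thread : Fiber, Thread;
import core.time : Duration, MonoTime;
import std.stdio : writeln;

// Creates, starts and joins `count` do-nothing kernel threads.
Duration timeThreads(int count)
{
    auto start = MonoTime.currTime;
    foreach (_; 0 .. count)
    {
        auto t = new Thread(() {}); // OS thread with its own stack
        t.start();
        t.join();
    }
    return MonoTime.currTime - start;
}

// Creates and runs `count` do-nothing fibers on the current thread.
Duration timeFibers(int count)
{
    auto start = MonoTime.currTime;
    foreach (_; 0 .. count)
    {
        auto f = new Fiber(() {});  // user-level context, no kernel thread
        f.call();                   // runs to completion right here
    }
    return MonoTime.currTime - start;
}

void main()
{
    writeln("500 threads: ", timeThreads(500));
    writeln("500 fibers:  ", timeFibers(500));
}
```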
Sean Kelly <sean f4.ca> writes:
Manfred Nowak wrote:
 The well known shootout shows a negative mark for concurrency for D

I don't really like the way this test is structured, as what it is really testing is the efficiency of thread creation. For any language whose threads map onto OS-level threads, the performance should be pretty much equivalent. I suspect the functional languages perform so well because they do user-level concurrency rather than kernel-level concurrency (and probably also because they don't allocate large chunks of memory for stack space and such in the process). I'm quite surprised by the abysmal performance of the Scheme and OCaml tests, however. Is it simply because their interpreters stink?

Sean
Nov 08 2005
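The user-level alternative Sean mentions can be sketched in D with core.thread.Fiber (modern druntime; Fiber was not part of the 2005 Phobos discussed here). The same 500-link relay runs on a single OS thread, with each link a Fiber and a plain round-robin loop acting as the scheduler. Slot, fiberLink and fiberRelay are hypothetical names:

```d
// Sketch of user-level concurrency with modern druntime's Fiber.
// Slot, fiberLink and fiberRelay are hypothetical names.
import core.thread : Fiber;
import std.stdio : writeln;

// One-slot mailbox. No lock: all fibers share one OS thread
// and only one runs at a time.
final class Slot
{
    bool full;
    int value;
}

// A link that forwards `rounds` messages, adding 1 each time.
Fiber fiberLink(Slot inbox, Slot outbox, int rounds)
{
    return new Fiber({
        foreach (_; 0 .. rounds)
        {
            while (!inbox.full)
                Fiber.yield();      // cheap user-level context switch
            immutable m = inbox.value;
            inbox.full = false;
            while (outbox.full)
                Fiber.yield();
            outbox.value = m + 1;
            outbox.full = true;
        }
    });
}

// Same shape as the shootout test: `rounds` messages, each passing
// through `links` fibers; returns the sum of what reaches the end.
int fiberRelay(int links, int rounds)
{
    auto slots = new Slot[links + 1];
    foreach (ref s; slots)
        s = new Slot;

    Fiber[] fibers;
    foreach (i; 0 .. links)
        fibers ~= fiberLink(slots[i], slots[i + 1], rounds);

    int fed, received, total;
    while (received < rounds)
    {
        if (fed < rounds && !slots[0].full)  // inject the next 0
        {
            slots[0].value = 0;
            slots[0].full = true;
            ++fed;
        }
        foreach (f; fibers)                  // round-robin scheduler
            if (f.state != Fiber.State.TERM)
                f.call();
        if (slots[links].full)               // collect at the chain end
        {
            total += slots[links].value;
            slots[links].full = false;
            ++received;
        }
    }
    return total;
}

void main()
{
    writeln(fiberRelay(500, 10)); // prints 5000 (500 links x 10 rounds)
}
```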
Georg Wrede <georg.wrede nospam.org> writes:
Sean Kelly wrote:
 Manfred Nowak wrote:
 
 The well known shootout shows a negative mark for concurrency for D


<snip>
 I suspect the functional languages perform
 so well because they do user-level concurrency rather than
 kernel-level concurrency 

<snip>

I think we should have both in D. I don't think it's too hard to imagine a situation where one would want to use a few real OS threads and, _within_ some of them, a bunch of simple cooperating lightweight threads ("fibers", if you like). Equally, preemptive threading is overkill for a lot of other things.
Nov 09 2005
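Georg's mix of a few preemptive OS threads, each hosting many cooperating fibers, might look like this. It is a sketch assuming modern druntime (core.thread, core.atomic); runFibers and hybrid are hypothetical names, and the shared counter exists only to show that every fiber on every thread ran:

```d
// Sketch: a few OS threads, each running its own set of fibers.
// Modern druntime; runFibers and hybrid are hypothetical names.
import core.atomic : atomicLoad, atomicOp, atomicStore;
import core.thread : Fiber, Thread;
import std.stdio : writeln;

shared int stepsDone;   // counted across all fibers on all threads

// Runs `count` cooperating fibers on the calling OS thread until all end.
void runFibers(int count, int steps)
{
    Fiber[] fibers;
    foreach (_; 0 .. count)
        fibers ~= new Fiber({
            foreach (s; 0 .. steps)
            {
                atomicOp!"+="(stepsDone, 1);
                Fiber.yield();   // let the next fiber on this thread run
            }
        });

    bool anyAlive = true;
    while (anyAlive)
    {
        anyAlive = false;
        foreach (f; fibers)
        {
            if (f.state != Fiber.State.TERM)
            {
                f.call();
                anyAlive = true;
            }
        }
    }
}

// Starts `osThreads` threads, each hosting `fibersPerThread` fibers,
// and returns the total number of steps taken.
int hybrid(int osThreads, int fibersPerThread, int steps)
{
    atomicStore(stepsDone, 0);
    Thread[] pool;
    foreach (_; 0 .. osThreads)
    {
        auto t = new Thread({ runFibers(fibersPerThread, steps); });
        t.start();
        pool ~= t;
    }
    foreach (t; pool)
        t.join();
    return atomicLoad(stepsDone);
}

void main()
{
    writeln(hybrid(4, 100, 10)); // prints 4000
}
```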
Dawid Ciężarkiewicz <dawid.ciezarkiewicz gmail.com> writes:
Georg Wrede wrote:

 I suspect the functional languages perform
 so well because they do user-level concurrency rather than
 kernel-level concurrency

<snip> I think we should have both in D. I don't think it's too hard to imagine a situation where one would want to use a few real OS threads, and _within_ some of them a bunch of simple cooperating light weight threads. ("Fibers, if you like.")

I would like to see it too. I have to write my own "tasks" for a TCP server I'm working on, because real threads are just too heavy to create in large numbers. Things like this are often needed, IMO, so having them in the standard library would be great.
Nov 12 2005