digitalmars.D - More on Multithreading Performance

Our multithreading performance problems can probably be mitigated, at least on
Windows, by using InitializeCriticalSectionAndSpinCount instead of
InitializeCriticalSection to implement synchronized blocks.  According to
http://msdn.microsoft.com/en-us/library/ms683476%28VS.85%29.aspx this causes
the waiting thread to spin a specified number of times before being context
switched, but only on multiprocessor computers.
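
To make the spin-then-block idea concrete, here is a rough, portable sketch in D. It is not how Windows implements the spin count internally, and lockWithSpin and runCounters are hypothetical names made up for illustration. It only assumes core.sync.mutex.Mutex with its tryLock, lock and unlock methods.

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

// Hypothetical helper (not a druntime or Win32 API): poll the lock up to
// spinCount times, then fall back to a blocking acquire.
void lockWithSpin(Mutex m, uint spinCount)
{
    foreach (_; 0 .. spinCount)
    {
        if (m.tryLock())
            return;        // acquired while spinning, no blocking wait needed
    }
    m.lock();              // still contended: block until the lock is free
}

// Hypothetical driver: nThreads threads each bump a shared counter
// `iterations` times under the lock; returns the final count.
int runCounters(uint spinCount, int nThreads, int iterations)
{
    auto m = new Mutex;
    int counter = 0;       // only touched while m is held
    auto threads = new Thread[nThreads];
    foreach (ref t; threads)
    {
        t = new Thread({
            foreach (i; 0 .. iterations)
            {
                lockWithSpin(m, spinCount);
                counter++;
                m.unlock();
            }
        });
        t.start();
    }
    foreach (t; threads)
        t.join();
    return counter;
}
```

Calling runCounters(4000, 2, 100_000) should return 200000 whatever spin count is passed; only the running time is expected to change, since a short spin can avoid a context switch when the lock is held very briefly.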

This seems like a no-brainer for the GC lock.  Having a small amount of
spinning before the context switch also seems like a pretty good default for
synchronized blocks in general.  People who really want to customize things
like this will use something more customizable than a plain old
synchronized block.

Here's a test program that measures the speed-up.

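// Several lines were lost from the archived post.  Lines marked
// "reconstructed" are a best guess at what was cut.
import core.thread, std.stdio, std.perf, core.sys.windows.windows, std.conv,
    std.string;  // reconstructed: strip() lives in std.string

extern(Windows) BOOL InitializeCriticalSectionAndSpinCount(CRITICAL_SECTION*,
    DWORD);  // reconstructed: the Win32 spin count parameter is a DWORD

enum nThreads = 2;
__gshared int num = 0;
__gshared CRITICAL_SECTION lock;

void main(string[] args) {
    stderr.writeln("Give me a spin count.");
    int spinCount = to!int( readln().strip() );

    InitializeCriticalSectionAndSpinCount(&lock, spinCount);

    auto pc = new PerformanceCounter;
    pc.start();  // reconstructed

    auto threads = new Thread[nThreads];
    for(int i = 0; i < nThreads; i++) {
        threads[i] = new Thread(&doStuff);
        threads[i].start();  // reconstructed
    }

    foreach(thread; threads) {
        thread.join();  // reconstructed
    }

    pc.stop();  // reconstructed
    writeln(pc.milliseconds, " ms");  // reconstructed
}

void doStuff() {
    for(int i = 0; i < 10_000_000; i++) {
        // reconstructed: increment the shared counter under the lock
        EnterCriticalSection(&lock);
        num++;
        LeaveCriticalSection(&lock);
    }
}
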
spin count = 0:  3843 ms
spin count = 4000:  2095 ms

core.sync.Mutex doesn't use this feature.  Neither do synchronized blocks.
Based on looking at the source files for these, it seems trivial to start
using it.  Anyone see a good reason not to, or should I Bugzilla/patch this?

Dec 16 2009