digitalmars.D.learn - Speed of synchronized

Christian Köstlin (121/121) Oct 16 2016 Hi,

tcak (23/27) Oct 16 2016 Could you try that:

Christian Köstlin (6/43) Oct 16 2016 thanks for the implementation. i think this is nicer, than using __gshar...

Daniel Kozak via Digitalmars-d-learn (3/14) Oct 16 2016 Can you post your timings (both D and Java)? And can you post your java...

Christian Köstlin (39/57) Oct 16 2016 Hi,

Daniel Kozak via Digitalmars-d-learn (18/25) Oct 16 2016 I am still unable to get your java code working:

Daniel Kozak (3/23) Oct 16 2016 I have it, it is in

Daniel Kozak via Digitalmars-d-learn (50/57) Oct 17 2016 So I have done some testing, on my pc:

Christian Köstlin (5/79) Oct 17 2016 thank you for looking into it.

Christian Köstlin (31/111) Oct 17 2016 Thanks for the hint about the OS. I rerun the tests on a linux machine,

Daniel Kozak via Digitalmars-d-learn (3/6) Oct 17 2016 Can you try it on OSX with ldc compiler:

Christian Köstlin (47/48) Oct 18 2016 on my machine i get the following output (using ldc2)

Christian Köstlin <christian.koestlin gmail.com> writes:

Hi,

For an exercise, I had to implement a thread-safe counter.
This is what I came up with:

---SNIP---

import std.stdio;
import core.thread;
import std.conv;
import std.datetime;
static import core.atomic;
import core.sync.mutex;

int NR_OF_THREADS = 100;
int NR_OF_INCREMENTS = 10000;

interface Counter {
  void increment() shared;
  long get() shared;
}
class ThreadUnsafeCounter : Counter {
  long counter;
  void increment() shared {
    counter++;
  }
  long get() shared {
    return counter;
  }
}

class ThreadSafe1Counter : Counter {
  private long counter;
  synchronized void increment() shared {
    counter++;
  }
  long get() shared {
    return counter;
  }
}

class ThreadSafe2Counter : Counter {
  private long counter;
  // Note: a __gshared field is static, so all instances share this one lock.
  // See http://forum.dlang.org/post/rzyooanimrynpmqlywmf forum.dlang.org
  __gshared Mutex lock;
  this() shared {
    lock = new Mutex;
  }
  void increment() shared {
    synchronized (lock) {
      counter++;
    }
  }
  long get() shared {
    return counter;
  }
}

class AtomicCounter : Counter {
  private long counter;
  void increment() shared {
    core.atomic.atomicOp!"+="(this.counter, 1);
  }
  long get() shared {
    return counter;
  }
}
void main() {
  void runWith(Counter)() {
    shared Counter counter = new shared Counter();
    void doIt() {
      Thread[] threads;
      for (int i=0; i<NR_OF_THREADS; ++i) {
        threads ~= new Thread({
            for (int i=0; i<NR_OF_INCREMENTS; ++i) {
              counter.increment();
            }
          });
      }
      foreach (Thread t; threads) {
        t.start();
      }
      foreach (Thread t; threads) {
        t.join();
      }
    }
    auto duration = benchmark!(doIt)(1);
    writeln(typeid(counter), ": got: ", counter.get(), " expected: ",
            NR_OF_THREADS * NR_OF_INCREMENTS, " in ", to!Duration(duration[0]));
  }

  runWith!(AtomicCounter)();
  runWith!(ThreadSafe1Counter)();
  runWith!(ThreadSafe2Counter)();
  runWith!(ThreadUnsafeCounter)();

  void doIt2() {
    auto mutex      = new Mutex;
    int  numThreads = NR_OF_THREADS;
    int  numTries   = NR_OF_INCREMENTS;
    int  lockCount  = 0;

    void testFn() {
      for( int i = 0; i < numTries; ++i ) {
        synchronized( mutex ) {
          ++lockCount;
        }
      }
    }

    auto group = new ThreadGroup;

    for( int i = 0; i < numThreads; ++i )
      group.create( &testFn );

    group.joinAll();
    assert( lockCount == numThreads * numTries );
  }

  auto duration = benchmark!(doIt2)(1);
  writeln("from example got: ", to!Duration(duration[0]));
}


---SNIP---

For completeness, I also added the example from core.sync.mutex
(https://dlang.org/phobos/core_sync_mutex.html) at the end.

My question now is: why is each mutex-based thread-safe variant so slow
compared to a similar Java program? The only hint I have found is
https://blogs.oracle.com/dave/entry/java_util_concurrent_reentrantlock_vs, which
mentions that there is some magic going on underneath.
For the atomic and the non-thread-safe variants, the D solution seems to
be twice as fast as my Java program; for the locked variants, the Java
program seems to be 40 times faster.

BTW, I run the code with dub run --build=release.

Thanks in advance,
Christian

Oct 16 2016

tcak <1ltkrs+3wyh1ow7kzn1k sharklasers.com> writes:

On Sunday, 16 October 2016 at 08:41:26 UTC, Christian Köstlin 
wrote:
 Hi,

 for an exercise I had to implement a thread safe counter. This 
 is what I came up with:

 [...]

Could you try that:

class ThreadSafe3Counter : Counter {
   private long counter;
   private core.sync.mutex.Mutex mtx;

   public this() shared {
      mtx = cast(shared)(new core.sync.mutex.Mutex);
   }

   void increment() shared {
      (cast()mtx).lock();
      scope(exit) { (cast()mtx).unlock(); }

      core.atomic.atomicOp!"+="(this.counter, 1);
   }

   long get() shared {
      return counter;
   }
}


Unfortunately, there are some stupid design decisions in D about
"shared", and some people do not want to accept them.

For example, while you are holding a mutex, you shouldn't be forced to
use atomicOp there. As a programmer, you know that the variable is
already protected. That is a loss of performance in the long run.
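A minimal sketch of the point above, assuming every access to `counter` happens while the mutex is held (the class name `MutexOnlyCounter` is hypothetical, not from the thread). The lock alone serializes access, so the increment goes through an unshared pointer instead of `atomicOp`. Casting away `shared` is the programmer's promise that the lock protects the field; the compiler does not check it.

```d
import core.sync.mutex : Mutex;
import core.thread : ThreadGroup;
import std.stdio : writeln;

// Hypothetical counter guarded by a Mutex only, with no atomic operations.
class MutexOnlyCounter {
    private long counter;
    private Mutex mtx;

    this() shared {
        // Same construction trick as ThreadSafe3Counter above.
        mtx = cast(shared)(new Mutex);
    }

    void increment() shared {
        auto m = cast() mtx;      // drop shared to call lock/unlock
        m.lock();
        scope (exit) m.unlock();
        // The lock is held, so no other thread touches counter here;
        // a plain increment through an unshared pointer is enough.
        auto p = cast(long*) &counter;
        ++*p;
    }

    long get() shared {
        auto m = cast() mtx;
        m.lock();
        scope (exit) m.unlock();
        return *cast(long*) &counter;
    }
}

void main() {
    enum threads = 8, increments = 10_000;
    auto c = new shared MutexOnlyCounter();
    auto group = new ThreadGroup;
    foreach (t; 0 .. threads)
        group.create({ foreach (k; 0 .. increments) c.increment(); });
    group.joinAll();
    writeln(c.get()); // prints 80000
}
```

Whether this is measurably faster than adding `atomicOp` inside the lock will depend on the platform; the main gain is that the intent (one or the other, not both) is explicit.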

Oct 16 2016

Christian Köstlin <christian.koestlin gmail.com> writes:

On 16/10/16 19:50, tcak wrote:
 On Sunday, 16 October 2016 at 08:41:26 UTC, Christian Köstlin wrote:
 Hi,

 for an exercise I had to implement a thread safe counter. This is what
 I came up with:

 [...]

 Could you try that:

 [...]

 Unfortunately, there are some stupid design decisions in D about
 "shared", and some people does not want to accept them.
 
 Example while you are using mutex, so you shouldn't be forced to use
 atomicOp there. As a programmer, you know that it will be protected
 already. That is a loss of performance in the long run.

Thanks for the implementation. I think this is nicer than using __gshared.
I think using atomic operations and mutexes at the same time does not
make any sense: use one or the other.

thanks,
Christian

Oct 16 2016

Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

On 16.10.2016 at 10:41, Christian Köstlin via Digitalmars-d-learn wrote:

 My question now is, why is each mutex based thread safe variant so slow
 compared to a similar java program? The only hint could be something
 like:
 https://blogs.oracle.com/dave/entry/java_util_concurrent_reentrantlock_vs  that
 mentions, that there is some magic going on underneath.
 For the atomic and the non thread safe variant, the d solution seems to
 be twice as fast as my java program, for the locked variant, the java
 program seems to be 40 times faster?

 btw. I run the code with dub run --build=release

 Thanks in advance,
 Christian

Can you post your timings (both D and Java)? And can you post your Java
code?

Oct 16 2016

Christian Köstlin <christian.koestlin gmail.com> writes:

On 17/10/16 06:55, Daniel Kozak via Digitalmars-d-learn wrote:
 On 16.10.2016 at 10:41, Christian Köstlin via Digitalmars-d-learn wrote:
 [...]

 Can you post your timings (both D and Java)?  And can you post your java
 code?

Hi,

thanks for asking. I attached my Java and D sources.
Both try to do more or less the same thing: they spawn 100 threads that
each call increment on a counter object 10000 times. The counter
implementation is swapped between an obviously broken thread-unsafe
implementation, some with atomic operations, and some with
mutex implementations.

To run the Java code, call ./gradlew clean build
->
counter.AtomicIntCounter 25992ae3 expected: 2000000 got: 1000000 in: 22ms
counter.AtomicLongCounter 2539f946 expected: 2000000 got: 1000000 in: 17ms
counter.ThreadSafe2Counter 527d56c2 expected: 2000000 got: 1000000 in: 33ms
counter.ThreadSafe1Counter 6fd8b1a expected: 2000000 got: 1000000 in: 173ms
counter.ThreadUnsafeCounter 6bb33878 expected: 2000000 got: 562858 in: 10ms

Obviously the unsafe implementation is the fastest, followed by the atomics.
The version with reentrant locks performs very well, whereas the
implementation with synchronized is the slowest.

To run the D code, call dub test (note that the dub test build is
configured like this:
buildType "unittest" {
  buildOptions "releaseMode" "optimize" "inline" "unittests" "debugInfo"
}
so it should have release-mode speed and quality).

->
app.AtomicCounter: got: 1000000 expected: 1000000 in 23 ms, 852 μs, and 6 hnsecs
app.ThreadSafe1Counter: got: 1000000 expected: 1000000 in 3 secs, 673 ms, 232 μs, and 6 hnsecs
app.ThreadSafe2Counter: got: 1000000 expected: 1000000 in 3 secs, 684 ms, 416 μs, and 2 hnsecs
app.ThreadUnsafeCounter: got: 690073 expected: 1000000 in 8 ms and 540 μs
from example got: 3 secs, 806 ms, and 258 μs

Here again, the unsafe implementation is the fastest, and
atomic performs in the same ballpark as Java;
only the thread-safe variants are far off.

thanks for looking into this,
best regards,
christian

Oct 16 2016

Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

On 17.10.2016 at 07:55, Christian Köstlin via Digitalmars-d-learn wrote:

 to run java call ./gradlew clean build
 ->
 counter.AtomicIntCounter 25992ae3 expected: 2000000 got: 1000000 in: 22ms
 counter.AtomicLongCounter 2539f946 expected: 2000000 got: 1000000 in: 17ms
 counter.ThreadSafe2Counter 527d56c2 expected: 2000000 got: 1000000 in: 33ms
 counter.ThreadSafe1Counter 6fd8b1a expected: 2000000 got: 1000000 in: 173ms
 counter.ThreadUnsafeCounter 6bb33878 expected: 2000000 got: 562858 in: 10ms

I am still unable to get your Java code working:
[kozak dajinka threads]$ ./gradlew clean build
:clean
:compileJava
:processResources UP-TO-DATE
:classes
:jar
:assemble
:compileTestJava
:processTestResources UP-TO-DATE
:testClasses
:test
:check
:build

BUILD SUCCESSFUL

Total time: 3.726 secs


How can I run it?

Oct 16 2016

Daniel Kozak <kozzi11 gmail.com> writes:

On Monday, 17 October 2016 at 06:38:08 UTC, Daniel Kozak wrote:
 On 17.10.2016 at 07:55, Christian Köstlin via
 Digitalmars-d-learn wrote:

[...]

 I am still unable to get your java code working:
 [...]

 How I can run it?

Found it: the output is in
build/test-results/test/TEST-counter.CounterTest.xml

Oct 16 2016

Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

On 16.10.2016 at 10:41, Christian Köstlin via Digitalmars-d-learn wrote:
 Hi,

 for an exercise I had to implement a thread safe counter.
 This is what I came up with:
 ....

 btw. I run the code with dub run --build=release

 Thanks in advance,
 Christian

So I have done some testing on my PC.
Java results:
counter.AtomicLongCounter 7ff5e7d8 expected: 2000000 got: 1000000 in: 83ms
counter.ThreadSafe2Counter 59b44e4b expected: 2000000 got: 1000000 in: 77ms
counter.ThreadSafe1Counter 2e5f6b4b expected: 2000000 got: 1000000 in: 154ms
counter.ThreadUnsafeCounter 762b155d expected: 2000000 got: 730428 in: 13ms

And my D results (code: http://dpaste.com/3QFXACY):
snip.AtomicCounter: got: 1000000 expected: 1000000 in 77 ms and 783 μs
snip.ThreadSafe1Counter: got: 1000000 expected: 1000000 in 287 ms, 727 μs, and 3 hnsecs
snip.ThreadSafe2Counter: got: 1000000 expected: 1000000 in 281 ms, 117 μs, and 1 hnsec
snip.ThreadSafe3Counter: got: 1000000 expected: 1000000 in 158 ms, 480 μs, and 2 hnsecs
snip.ThreadUnsafeCounter: got: 1000000 expected: 1000000 in 6 ms, 682 μs, and 1 hnsec

So atomic is the same as in Java, and pthread_mutex is the same speed as
Java synchronized. D mutexes and D synchronized are almost the same; I
believe that if I could set up the same attributes as in the pthread
version, they would be around 160 ms too.

Unsafe is almost the same for D and Java. Only Java's ReentrantLock seems
to work better. I believe there is some trick, so that it ends up not
really using mutexes at all. For example, consider this change in the D
code, which joins each thread right after starting it, so the threads
never run at the same time:

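void doIt(alias counter)() {
   auto thg = new ThreadGroup();
   for (int i = 0; i < NR_OF_THREADS; ++i) {
      // all threads are started first and run concurrently
      thg.create(&threadFuncBody!(counter));
   }
   thg.joinAll();
}

change it to

void doIt(alias counter)() {
   auto thg = new ThreadGroup();
   for (int i = 0; i < NR_OF_THREADS; ++i) {
      // create() starts the thread; joining it immediately means only
      // one worker runs at a time, so the lock is never contended
      auto tc = thg.create(&threadFuncBody!(counter));
      tc.join();
   }
}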

and the results are:

snip.AtomicCounter: got: 1000000 expected: 1000000 in 22 ms, 251 μs, and 6 hnsecs
snip.ThreadSafe1Counter: got: 1000000 expected: 1000000 in 46 ms, 146 μs, and 3 hnsecs
snip.ThreadSafe2Counter: got: 1000000 expected: 1000000 in 44 ms, 961 μs, and 5 hnsecs
snip.ThreadSafe3Counter: got: 1000000 expected: 1000000 in 42 ms, 512 μs, and 8 hnsecs
snip.ThreadUnsafeCounter: got: 1000000 expected: 1000000 in 2 ms, 108 μs, and 5 hnsecs
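To try the same comparison without the dpaste code (which is not reproduced in this thread), here is a self-contained sketch. The global `mtx` and `counter`, the stand-in `threadFuncBody`, and the helpers `contended`, `serialized` and `run` are all hypothetical. It assumes a compiler new enough to ship `std.datetime.stopwatch` (the 2016 code above used `std.datetime.benchmark` instead). Only the counts are deterministic; the timings depend on the OS and machine.

```d
import core.sync.mutex : Mutex;
import core.thread : ThreadGroup;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

enum NR_OF_THREADS = 100;
enum NR_OF_INCREMENTS = 10_000;

__gshared Mutex mtx;     // one lock shared by all workers
__gshared long counter;  // only modified while mtx is held

// Hypothetical stand-in for threadFuncBody from the dpaste code.
void threadFuncBody() {
    foreach (k; 0 .. NR_OF_INCREMENTS) {
        synchronized (mtx) {  // a Mutex acts as its own monitor
            ++counter;
        }
    }
}

// All workers are started, then joined: they contend for mtx.
void contended() {
    auto thg = new ThreadGroup();
    foreach (t; 0 .. NR_OF_THREADS)
        thg.create(&threadFuncBody);
    thg.joinAll();
}

// Each worker is joined right after it starts: no two ever overlap,
// so every lock acquisition finds mtx free.
void serialized() {
    auto thg = new ThreadGroup();
    foreach (t; 0 .. NR_OF_THREADS)
        thg.create(&threadFuncBody).join();
}

long run(void function() variant, string name) {
    counter = 0;
    auto sw = StopWatch(AutoStart.yes);
    variant();
    writeln(name, ": ", counter, " in ", sw.peek());
    return counter;
}

void main() {
    mtx = new Mutex;
    run(&contended, "contended");
    run(&serialized, "serialized");
}
```

Both variants do the same number of lock/unlock pairs; the only difference is whether other threads are waiting on the lock, which is what the gap between the two result sets above reflects.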

Oct 17 2016

Christian Köstlin <christian.koestlin gmail.com> writes:

On 17/10/16 14:09, Daniel Kozak via Digitalmars-d-learn wrote:
 [...]

Thank you for looking into it.
These numbers look quite good.
I expected something along those lines, but got the numbers mentioned above on
my OS X MacBook. Perhaps it's an OS X glitch.

Oct 17 2016

Christian Köstlin <christian.koestlin gmail.com> writes:

On 17/10/16 14:44, Christian Köstlin wrote:
 On 17/10/16 14:09, Daniel Kozak via Digitalmars-d-learn wrote:
 [...]

 thank you for looking into it.
 this seems to be quite good.
 I did expect something in those lines, but got the mentioned numbers on
 my os x macbook. perhaps its a os x glitch.
 

Thanks for the hint about the OS. I reran the tests on a Linux machine,
and there everything is fine!
Linux, D code:
app.AtomicCounter: got: 1000000 expected: 1000000 in 24 ms, 387 μs, and 3 hnsecs
app.ThreadSafe1Counter: got: 1000000 expected: 1000000 in 143 ms, 534 μs, and 9 hnsecs
app.ThreadSafe2Counter: got: 1000000 expected: 1000000 in 159 ms, 685 μs, and 1 hnsec
app.ThreadUnsafeCounter: got: 399937 expected: 1000000 in 9 ms and 556 μs
from example got: 156 ms, 198 μs, and 9 hnsecs


Linux, Java code:
counter.CounterTest > testAtomicIntCounter STANDARD_OUT
    counter.AtomicIntCounter 1f2a2347 expected: 1000000 got: 1000000 in: 29ms

counter.CounterTest > testAtomicLongCounter STANDARD_OUT
    counter.AtomicLongCounter 675ad891 expected: 1000000 got: 1000000 in: 24ms

counter.CounterTest > testThreadSafe2Counter STANDARD_OUT
    counter.ThreadSafe2Counter 3043c6d2 expected: 1000000 got: 1000000 in: 38ms

counter.CounterTest > testThreadSafeCounter STANDARD_OUT
    counter.ThreadSafe1Counter bac4ba3 expected: 1000000 got: 1000000 in: 145ms

counter.CounterTest > testThreadUnsafeCounter STANDARD_OUT
    counter.ThreadUnsafeCounter 2fe82bf8 expected: 1000000 got: 433730 in: 9ms


Could someone check the numbers on another OS X machine? Unfortunately, I
only have one available.

Thanks in advance!

Oct 17 2016

Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

On 17.10.2016 at 23:40, Christian Köstlin via Digitalmars-d-learn wrote:

 Could someone check the numbers on another OS-X machine? Unfortunately I
 only have one available.

 Thanks in advance!

Can you try it on OS X with the LDC compiler:

dub run --build=release --compiler=ldc

Oct 17 2016

Christian Köstlin <christian.koestlin gmail.com> writes:

On 18/10/16 07:04, Daniel Kozak via Digitalmars-d-learn wrote:
 dub run --build=release --compiler=ldc

On my machine I get the following output (using ldc2):
ldc2 --version
LDC - the LLVM D compiler (1.0.0):
  based on DMD v2.070.2 and LLVM 3.8.1
  built with LDC - the LLVM D compiler (0.17.1)
  Default target: x86_64-apple-darwin15.6.0
  Host CPU: haswell
  http://dlang.org - http://wiki.dlang.org/LDC

  Registered Targets:
    amdgcn  - AMD GCN GPUs
    arm     - ARM
    armeb   - ARM (big endian)
    nvptx   - NVIDIA PTX 32-bit
    nvptx64 - NVIDIA PTX 64-bit
    r600    - AMD GPUs HD2XXX-HD6XXX
    thumb   - Thumb
    thumbeb - Thumb (big endian)
    x86     - 32-bit X86: Pentium-Pro and above
    x86-64  - 64-bit X86: EM64T and AMD64


dub test --compiler=ldc2 (my unittest configuration now includes the
proper release flags, thanks to Sönke).
No source files found in configuration 'library'. Falling back to "dub -b unittest".
Performing "unittest" build using ldc2 for x86_64.
05-threads ~master: building configuration "application"...
source/app.d(18): Deprecation: read-modify-write operations are not allowed for shared variables. Use core.atomic.atomicOp!"+="(this.counter, 1) instead.
source/app.d(28): Deprecation: read-modify-write operations are not allowed for shared variables. Use core.atomic.atomicOp!"+="(this.counter, 1) instead.
source/app.d(43): Deprecation: read-modify-write operations are not allowed for shared variables. Use core.atomic.atomicOp!"+="(this.counter, 1) instead.
Running ./05-threads
app.AtomicCounter: got: 1000000 expected: 1000000 in 21 ms, 692 μs, and 6 hnsecs
app.ThreadSafe1Counter: got: 1000000 expected: 1000000 in 3 secs, 909 ms, 137 μs, and 3 hnsecs
app.ThreadSafe2Counter: got: 1000000 expected: 1000000 in 3 secs, 724 ms, 201 μs, and 9 hnsecs
app.ThreadUnsafeCounter: got: 759497 expected: 1000000 in 8 ms, 841 μs, and 9 hnsecs
from example got: 3 secs, 840 ms, 387 μs, and 2 hnsecs




Looks similar to me.

Thanks, Christian
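The deprecations above are triggered by `counter++` on a `shared` field, even inside the `synchronized` method of ThreadSafe1Counter. A sketch of one way to keep the `synchronized` method and avoid them, assuming the object's monitor is the only thing guarding `counter` (the class name `SynchronizedCounter` is hypothetical): inside the locked method, cast away `shared` and increment through a plain pointer, rather than adding `atomicOp` on top of the lock.

```d
import core.thread : ThreadGroup;
import std.stdio : writeln;

// Hypothetical variant of ThreadSafe1Counter without the deprecation.
class SynchronizedCounter {
    private long counter;

    // A synchronized member function locks this object's monitor, so only
    // one thread at a time runs increment() or get() on a given instance.
    synchronized void increment() shared {
        // The monitor is held: drop shared and increment non-atomically.
        auto p = cast(long*) &counter;
        ++*p;
    }

    synchronized long get() shared {
        return *cast(long*) &counter;
    }
}

void main() {
    auto c = new shared SynchronizedCounter();
    auto group = new ThreadGroup;
    foreach (t; 0 .. 4)
        group.create({ foreach (k; 0 .. 10_000) c.increment(); });
    group.joinAll();
    writeln(c.get()); // prints 40000
}
```

This only removes the warning; it does not change how the monitor itself is acquired, so it should not be expected to fix the OS X timings above.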

Oct 18 2016
