digitalmars.D - More on GC Spinlocks

dsimcha (40/40) Dec 06 2008 A few days ago, I commented that I thought that maybe the GC should be u...

Fawzi Mohamed (8/61) Dec 06 2008 Nice, I think indeed for the allocation I think that spinlock are

The Anh Tran (4/4) Dec 06 2008 Hi,

bearophile (5/7) Dec 07 2008 This is a D version designed to be simple (it's like the Java version):
dsimcha (9/13) Dec 07 2008 I tried to reply to your post last night with my modified gcx.d, but app...

The Anh Tran (6/22) Dec 09 2008 I've failed to recompile druntime with your gcx.d. I'm still a D newbie

dsimcha <dsimcha yahoo.com> writes:

A few days ago, I commented that I thought that maybe the GC should be using
spinlocks, given how little time a typical allocation takes compared to
context switches, etc.  I've created a version of the D 2.21 druntime GC with
spinlocks instead of synchronized, and created the following simple benchmark
to just generate a ton of contention for the GC:

import core.thread, core.memory, std.perf, std.stdio, std.c.time, std.c.stdio;

void main() {
    readln;  //Allow for affinity to be changed.
    GC.disable;
    auto T = new Thread(&foo);
    T.start;
    scope auto pc = new PerformanceCounter;
    pc.start;
    foo();
    T.join;
    pc.stop;
    writeln(pc.milliseconds);
}

void foo() {
   foreach(i; 0..10_000_000) {
       auto foo = GC.malloc(8);
       GC.free(foo);
   }
}

Here are the times:

Using both of my CPU cores, meaning serious contention, in milliseconds:

Spinlock:  10006
Synchronized:  28563

The synchronized version uses ~25-30% CPU, because of OS rescheduling, while
the spinlock version uses 100%.

Setting the affinity to only one CPU to simulate a single-CPU environment:

Spinlock:  4356
Synchronized:  4758

Replacing one thread's foo() by a dummy function so that the lock is never
even contested:

Spinlock:  1876
Synchronized:  2589


I will acknowledge that this is an extremely simple benchmark, but I think
it's reasonably representative of a severely contested memory allocation lock.
 The spinlock I used was the simplest possible atomic CAS lock, nothing fancy.

Dec 06 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-06 21:31:34 +0100, dsimcha <dsimcha yahoo.com> said:

 A few days ago, I commented that I thought that maybe the GC should be using
 spinlocks, given how little time a typical allocation takes compared to
 context switches, etc.  I've created a version of the D 2.21 druntime GC with
 spinlocks instead of synchronized, and created the following simple benchmark
 to just generate a ton of contention for the GC:
 
 import core.thread, core.memory, std.perf, std.stdio, std.c.time, std.c.stdio;
 
 void main() {
     readln;  //Allow for affinity to be changed.
     GC.disable;
     auto T = new Thread(&foo);
     T.start;
     scope auto pc = new PerformanceCounter;
     pc.start;
     foo();
     T.join;
     pc.stop;
     writeln(pc.milliseconds);
 }
 
 void foo() {
    foreach(i; 0..10_000_000) {
        auto foo = GC.malloc(8);
        GC.free(foo);
    }
 }
 
 Here are the times:
 
 Using both of my CPU cores, meaning serious contention, in milliseconds:
 
 Spinlock:  10006
 Synchronized:  28563
 
 The synchronized version uses ~25-30% CPU, because of OS rescheduling, while
 the spinlock version uses 100%.
 
 Setting the affinity to only one CPU to simulate a single-CPU environment:
 
 Spinlock:  4356
 Synchronized:  4758
 
 Replacing one thread's foo() by a dummy function so that the lock is never
 even contested:
 
 Spinlock:  1876
 Synchronized:  2589
 
 
 I will acknowledge that this is an extremely simple benchmark, but I think
 it's reasonably representative of a severely contested memory allocation lock.
  The spinlock I used was the simplest possible atomic CAS lock, nothing fancy.

Nice, I think indeed for the allocation I think that spinlock are 
definitely preferable, but I think that they should be integrated in 
the runtime, so that the runtime can  be built also on platforms that 
do not support them (maybe a spinlock module that can be implemented on 
the top of normal locks if the need arises, it should not add too much 
overhead I hope.

Fawzi

Dec 06 2008

The Anh Tran <trtheanh gmail.com> writes:

Hi,
Could you measure spinlock with this bench?
http://shootout.alioth.debian.org/u32q/benchmark.php?test=binarytrees&lang=all

Thanks.

Dec 06 2008

bearophile <bearophileHUGS lycos.com> writes:

The Anh Tran:
 Could you measure spinlock with this bench?
 http://shootout.alioth.debian.org/u32q/benchmark.php?test=binarytrees&lang=all

This is a D version designed to be simple (it's like the Java version):
http://shootout.alioth.debian.org/gp4/benchmark.php?test=binarytrees&lang=dlang&id=1

Bye,
bearophile

Dec 07 2008

dsimcha <dsimcha yahoo.com> writes:

== Quote from The Anh Tran (trtheanh gmail.com)'s article
 Hi,
 Could you measure spinlock with this bench?
 http://shootout.alioth.debian.org/u32q/benchmark.php?test=binarytrees&lang=all
 Thanks.

I tried to reply to your post last night with my modified gcx.d, but apparently
posting attachments of that size (~80kb) silently fails.

There is no multithreaded implementation of the binary trees benchmark for D
that
I was able to find.  As far as the single-threaded version, spinlocks would make
absolutely no difference because the druntime GC uses thread_needLock() to avoid
any kind of lock on single-threaded code.

If you or anyone else wants to play around w/ my modified gcx.d code and try it
under different use cases, I've posted it to http://cis.jhu.edu/~dsimcha/gcx.d.

Dec 07 2008

The Anh Tran <trtheanh gmail.com> writes:

dsimcha wrote:
 == Quote from The Anh Tran (trtheanh gmail.com)'s article
 Hi,
 Could you measure spinlock with this bench?
 http://shootout.alioth.debian.org/u32q/benchmark.php?test=binarytrees&lang=all
 Thanks.

 
 I tried to reply to your post last night with my modified gcx.d, but apparently
 posting attachments of that size (~80kb) silently fails.
 
 There is no multithreaded implementation of the binary trees benchmark for D
that
 I was able to find.  As far as the single-threaded version, spinlocks would
make
 absolutely no difference because the druntime GC uses thread_needLock() to
avoid
 any kind of lock on single-threaded code.
 
 If you or anyone else wants to play around w/ my modified gcx.d code and try it
 under different use cases, I've posted it to http://cis.jhu.edu/~dsimcha/gcx.d.

I've failed to recompile druntime with your gcx.d. I'm still a D newbie 
:|. Ie: need someone to hold my hand and guide steps by steps ;)

This is my multithread alloc implementation.
If it uses std.gc, and runs with 2 threads; it is 5 times slower than 1 
thread !???

Dec 09 2008

D Programming

C/C++ Programming

Other

digitalmars.D - More on GC Spinlocks