
digitalmars.D - Concurrent GC (for Windows)

reply Rainer Schuetze <r.sagitario gmx.de> writes:
Hi,

more GC talk: the last couple of days, I've been experimenting with 
implementing a concurrent GC on Windows inspired by Leandro's CDGC. 
Here's a report on my experiments:

http://rainers.github.io/visuald/druntime/concurrentgc.html

tl;dr: there is a working(?) partially concurrent GC here: 
https://github.com/rainers/druntime/tree/concurrent_gc2
but it eats a whole lot of memory.

Rainer
Jun 03 2014
next sibling parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
On 03.06.2014 09:35, Rainer Schuetze wrote:
 Hi,

 more GC talk: the last couple of days, I've been experimenting with
 implementing a concurrent GC on Windows inspired by Leandro's CDGC.
 Here's a report on my experiments:

 http://rainers.github.io/visuald/druntime/concurrentgc.html

 tl;dr: there is a working(?) partially concurrent GC here:
 https://github.com/rainers/druntime/tree/concurrent_gc2
 but it eats a whole lot of memory.

 Rainer
This sounds quite promising. Did you think about using a write barrier instead of doing the check per page? The way you propose it, all writes, whether relevant to the GC or not, will cause the GC to reconsider the page. If we implemented a write barrier in the front end that is only called when a write through a pointer to a pointer happens, the GC could rescan only the memory blocks whose pointers actually changed. The GC might also be able to remember the old pointer value before the write is executed, resulting in a consistent snapshot of the heap, which would eliminate the need to rescan after the background thread has finished.

Kind regards,
Benjamin Thaut
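[A rough Python sketch of the proposed barrier, for illustration only. All names here are hypothetical; in D this would be compiler-emitted code calling into druntime, not a plain function.]

```python
# Hedged sketch of the proposed write barrier (hypothetical names).
# The front end would emit a call to write_barrier() for every store
# through a pointer to a pointer; the GC then rescans only the recorded
# blocks instead of reconsidering every dirty page.

dirty_blocks = set()   # blocks whose pointer fields changed during marking
snapshot_log = []      # saved old values, for a consistent heap snapshot

def write_barrier(block, field, new_ptr):
    dirty_blocks.add(id(block))                            # remember the block for rescanning
    snapshot_log.append((id(block), field, block[field]))  # save the old pointer before the write
    block[field] = new_ptr

# the mutator overwrites a pointer field while the collector is marking
obj = {"next": "old_target"}
write_barrier(obj, "next", "new_target")
```

Because the old pointer value is logged before being overwritten, the marker can see a snapshot of the heap as it was when marking started, which is what would eliminate the final rescan.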
Jun 03 2014
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 04.06.2014 07:44, Benjamin Thaut wrote:
 On 03.06.2014 09:35, Rainer Schuetze wrote:
 Hi,

 more GC talk: the last couple of days, I've been experimenting with
 implementing a concurrent GC on Windows inspired by Leandro's CDGC.
 Here's a report on my experiments:

 http://rainers.github.io/visuald/druntime/concurrentgc.html

 tl;dr: there is a working(?) partially concurrent GC here:
 https://github.com/rainers/druntime/tree/concurrent_gc2
 but it eats a whole lot of memory.

 Rainer
This sounds quite promising. Did you think about using a write barrier instead of doing the check per page? The way you propose it, all writes, whether relevant to the GC or not, will cause the GC to reconsider the page. If we implemented a write barrier in the front end that is only called when a write through a pointer to a pointer happens, the GC could rescan only the memory blocks whose pointers actually changed. The GC might also be able to remember the old pointer value before the write is executed, resulting in a consistent snapshot of the heap, which would eliminate the need to rescan after the background thread has finished.
Yes, write barriers are a good addition to this, but they are way more difficult to implement and will meet more resistance from Walter. It's still something I'd like to look into.
Jun 04 2014
parent Benjamin Thaut <code benjamin-thaut.de> writes:
On 04.06.2014 22:26, Rainer Schuetze wrote:
 Yes, write barriers are a good addition to this, but they are way more
 difficult to implement and will meet more resistance from Walter. It's
 still something I'd like to look into.
I assume they are hard to implement because the backend no longer has type information when generating the writes? I remember that Walter stated a few weeks ago that he was considering implementing write barriers.
Jun 04 2014
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 03/06/14 09:35, Rainer Schuetze wrote:
 Hi,

 more GC talk: the last couple of days, I've been experimenting with
 implementing a concurrent GC on Windows inspired by Leandro's CDGC.
 Here's a report on my experiments:

 http://rainers.github.io/visuald/druntime/concurrentgc.html

 tl;dr: there is a working(?) partially concurrent GC here:
 https://github.com/rainers/druntime/tree/concurrent_gc2
 but it eats a whole lot of memory.
How do other GCs handle these memory problems? I'm thinking of Java. -- /Jacob Carlborg
Jun 03 2014
next sibling parent reply "Paulo Pinto" <pjmlp progtools.org> writes:
On Wednesday, 4 June 2014 at 06:51:04 UTC, Jacob Carlborg wrote:
 On 03/06/14 09:35, Rainer Schuetze wrote:
 Hi,

 more GC talk: the last couple of days, I've been experimenting 
 with
 implementing a concurrent GC on Windows inspired by Leandro's 
 CDGC.
 Here's a report on my experiments:

 http://rainers.github.io/visuald/druntime/concurrentgc.html

 tl;dr: there is a working(?) partially concurrent GC here:
 https://github.com/rainers/druntime/tree/concurrent_gc2
 but it eats a whole lot of memory.
How do other GCs handle these memory problems? I'm thinking of a virtual machine?
Jun 04 2014
parent Jacob Carlborg <doob me.com> writes:
On 2014-06-04 09:14, Paulo Pinto wrote:


My question still remains :) -- /Jacob Carlborg
Jun 04 2014
prev sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 04.06.2014 08:51, Jacob Carlborg wrote:
 On 03/06/14 09:35, Rainer Schuetze wrote:
 Hi,

 more GC talk: the last couple of days, I've been experimenting with
 implementing a concurrent GC on Windows inspired by Leandro's CDGC.
 Here's a report on my experiments:

 http://rainers.github.io/visuald/druntime/concurrentgc.html

 tl;dr: there is a working(?) partially concurrent GC here:
 https://github.com/rainers/druntime/tree/concurrent_gc2
 but it eats a whole lot of memory.
How does other GC's handle these memory problems? I'm thinking of Java
All more sophisticated GCs have write or read barriers. That makes it much easier to keep track of modifications during concurrent collection.

One reason for the huge memory footprint is that the application is allowed to continue allocating during collection, but it does this by requesting more memory from the OS. This needs to be throttled somehow to avoid going too far.

Most of the remaining pause time is spent sweeping garbage. I'm thinking about deferring sweeping into allocations by running finalizers just before reusing the memory for another object. This would throttle allocations a bit while at the same time reducing pauses.
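[The deferred-sweeping idea described above could be sketched like this. This is a hedged Python model with hypothetical names, not actual druntime code: dead blocks found by the marker are queued, and malloc finalizes and recycles one just before handing memory out.]

```python
# Hedged sketch of lazy sweeping (hypothetical names): instead of sweeping
# all garbage during the pause, dead blocks are queued and recycled lazily.
from collections import deque

class Block:
    def __init__(self, size, finalizer=None):
        self.size = size
        self.finalizer = finalizer
        self.finalized = False

class LazySweepHeap:
    def __init__(self):
        self.pending = deque()   # unmarked (dead) blocks left over from marking

    def end_collection(self, unmarked):
        # no sweep pause: just queue the garbage and restart the world
        self.pending.extend(unmarked)

    def malloc(self, size, finalizer=None):
        # sweep lazily: finalize and recycle one dead block per allocation
        if self.pending:
            dead = self.pending.popleft()
            if dead.finalizer and not dead.finalized:
                dead.finalizer()         # destructor runs outside stop-the-world
            dead.size = size             # reuse the memory for the new object
            dead.finalizer = finalizer
            dead.finalized = False
            return dead
        return Block(size, finalizer)    # nothing to recycle: fresh allocation

# usage: marking found one dead block; the next malloc finalizes and reuses it
log = []
heap = LazySweepHeap()
heap.end_collection([Block(32, finalizer=lambda: log.append("finalized"))])
recycled = heap.malloc(16)
```

This spreads the finalization cost over many allocations, so each allocation pays a small, bounded tax instead of the world stopping for one long sweep.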
Jun 04 2014
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2014-06-04 22:37, Rainer Schuetze wrote:

 All more sophisticated GCs have write or read barriers. That makes it
 much easier to keep track of modifications during concurrent collection.
Right, now I start to remember. -- /Jacob Carlborg
Jun 05 2014
prev sibling parent reply Martin Nowak <code dawg.eu> writes:
On 06/04/2014 10:37 PM, Rainer Schuetze wrote:
 Most of the remaining pause time is sweeping garbage. I think about
 deferring sweeping into allocations by running finalizers just before
 reusing the memory for another object. This can throttle allocations a
 bit while at the same time reduce pauses.
That's pausing the thread which triggered the collection, right? Other threads don't need to wait for sweeping. Dividing the cost to run finalizers among many allocations is a charming idea.
Jun 15 2014
parent Rainer Schuetze <r.sagitario gmx.de> writes:
On 15.06.2014 23:30, Martin Nowak wrote:
 On 06/04/2014 10:37 PM, Rainer Schuetze wrote:
 Most of the remaining pause time is sweeping garbage. I think about
 deferring sweeping into allocations by running finalizers just before
 reusing the memory for another object. This can throttle allocations a
 bit while at the same time reduce pauses.
That's pausing the thread which triggered the collection, right? Other threads don't need to wait for sweeping.
Yes, the world is not stopped during sweeping. The GC lock still blocks any thread that tries to allocate, though.
 Dividing the cost to run finalizers among many allocations is a charming
 idea.
Another nice property: if memory is recycled during malloc, you can run the finalizers without the GC lock held, just before returning, so there are fewer restrictions on what can happen in a destructor.
Jun 15 2014
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 03-Jun-2014 11:35, Rainer Schuetze wrote:
 Hi,

 more GC talk: the last couple of days, I've been experimenting with
 implementing a concurrent GC on Windows inspired by Leandro's CDGC.
 Here's a report on my experiments:

 http://rainers.github.io/visuald/druntime/concurrentgc.html

 tl;dr: there is a working(?) partially concurrent GC here:
 https://github.com/rainers/druntime/tree/concurrent_gc2
 but it eats a whole lot of memory.
I'm not sure how it would perform, but what about the following idea for CoW snapshotting:

1. Don't use a separate process and shared view; use a background thread.
2. Allocate heap space with MapViewOfFile, but private, not shared.

During collection, create a snapshot (while the world is stopped) as follows:

1. Make a new read-write view of the heap with MapViewOfFile; this is what the background thread will use to scan memory.
2. Save the base address of the original view, and unmap it.
3. Call MapViewOfFileEx with MAP_COPY and the old base address (that gives us back the same heap, but in CoW mode).

... then move on with the application and let the background thread do the marking. Once marking is done, stop the world again and:

1. VirtualQuery (or QueryWorkingSetEx) over the application's CoW view of the heap. The ranges of pages that have flipped to PAGE_READWRITE are the ones that were modified and duplicated. If new allocations are served from this view, it will include newly allocated objects as well.
2. Copy these page ranges over to the normal read-write view.
3. Close the CoW mapping (thereby freeing the duplicates) and remap it again as a read-write view at the same address with MapViewOfFileEx.

I wasn't able to get QueryWorkingSetEx to behave, but I believe there must be a better way than calling VirtualQuery on every page range to death.

See a sketch of the idea here:
https://gist.github.com/DmitryOlshansky/5e32057e047425480f0e
 Rainer
-- Dmitry Olshansky
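[The core property of the scheme above (mutator writes made after the snapshot stay invisible to the scanning view) can be demonstrated with a POSIX/Python analogue: mmap's ACCESS_COPY gives MAP_PRIVATE copy-on-write semantics, standing in for MapViewOfFileEx with MAP_COPY. This is an illustration of the idea, not the Windows implementation.]

```python
# Hedged analogue of the CoW snapshot: writes through a copy-on-write view
# go to private page copies and never reach the backing mapping, so a
# second, read-only view keeps seeing the state at snapshot time.
import mmap
import tempfile

PAGE = 4096
backing = tempfile.TemporaryFile()               # stands in for the file mapping
backing.write(b"live-pointer".ljust(PAGE, b"\0"))
backing.flush()

# the background marker scans through a plain read view: the snapshot
snapshot = mmap.mmap(backing.fileno(), PAGE, access=mmap.ACCESS_READ)

# the application's heap is remapped copy-on-write (like MAP_COPY):
# its writes create private page copies instead of touching the backing
heap = mmap.mmap(backing.fileno(), PAGE, access=mmap.ACCESS_COPY)
heap[0:7] = b"mutated"
```

After the write, the CoW view holds the mutated page while the snapshot view is untouched; finding and merging back exactly those privately copied pages is what the VirtualQuery/QueryWorkingSetEx step does on Windows.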
Jun 11 2014
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 11.06.2014 18:59, Dmitry Olshansky wrote:
 On 03-Jun-2014 11:35, Rainer Schuetze wrote:
 Hi,

 more GC talk: the last couple of days, I've been experimenting with
 implementing a concurrent GC on Windows inspired by Leandro's CDGC.
 Here's a report on my experiments:

 http://rainers.github.io/visuald/druntime/concurrentgc.html

 tl;dr: there is a working(?) partially concurrent GC here:
 https://github.com/rainers/druntime/tree/concurrent_gc2
 but it eats a whole lot of memory.
I'm not sure how it would perform, but what about the following idea for CoW snapshotting:

1. Don't use a separate process and shared view; use a background thread.
2. Allocate heap space with MapViewOfFile, but private, not shared.

During collection, create a snapshot (while the world is stopped) as follows:

1. Make a new read-write view of the heap with MapViewOfFile; this is what the background thread will use to scan memory.
2. Save the base address of the original view, and unmap it.
3. Call MapViewOfFileEx with MAP_COPY and the old base address (that gives us back the same heap, but in CoW mode).

... then move on with the application and let the background thread do the marking. Once marking is done, stop the world again and:

1. VirtualQuery (or QueryWorkingSetEx) over the application's CoW view of the heap. The ranges of pages that have flipped to PAGE_READWRITE are the ones that were modified and duplicated. If new allocations are served from this view, it will include newly allocated objects as well.
2. Copy these page ranges over to the normal read-write view.
3. Close the CoW mapping (thereby freeing the duplicates) and remap it again as a read-write view at the same address with MapViewOfFileEx.

I wasn't able to get QueryWorkingSetEx to behave, but I believe there must be a better way than calling VirtualQuery on every page range to death.

See a sketch of the idea here:
https://gist.github.com/DmitryOlshansky/5e32057e047425480f0e
Cool stuff! I remember trying something similar, but IIRC forcing the same address with MapViewOfFile somehow failed (maybe that was across processes). I tried your version on both Win32 and Win64 successfully, though.

I implemented the QueryWorkingSetEx version like this (you need a converted psapi.lib for Win32):

enum PAGES = 512; // SIZE / 4096;
PSAPI_WORKING_SET_EX_INFORMATION[PAGES] info;
foreach(i, ref inf; info)
    inf.VirtualAddress = heap + i * 4096;
if (!QueryWorkingSetEx(GetCurrentProcess(), info.ptr, info.sizeof))
    throw new Exception(format("Could not query info (%d).\n", GetLastError()));

foreach(i, ref inf; info)
    writefln("flags page %d: %x", i, inf.VirtualAttributes);

and you can check the "shared" field to detect copied pages. This function is not supported on XP, though.

A short benchmark shows that VirtualQuery needs 55/42 ms for your test on Win32/Win64 on my mobile i7, while QueryWorkingSetEx takes about 17 ms for both. If I add the actual copy into heap2 (i.e. every fourth page of 512 MB is copied), I get 80-90 ms more.

The numbers are not great, but I guess the usual memory usage and number of modified pages will be much lower. I'll see if I can integrate this into the concurrent implementation.
Jun 11 2014
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12-Jun-2014 10:34, Rainer Schuetze wrote:
 On 11.06.2014 18:59, Dmitry Olshansky wrote:
 On 03-Jun-2014 11:35, Rainer Schuetze wrote:
 Hi,

 more GC talk: the last couple of days, I've been experimenting with
 implementing a concurrent GC on Windows inspired by Leandro's CDGC.
 Here's a report on my experiments:

 http://rainers.github.io/visuald/druntime/concurrentgc.html
[snip]
 See the sketch of the idea here :
 https://gist.github.com/DmitryOlshansky/5e32057e047425480f0e
Cool stuff! I remember trying something similar, but IIRC forcing the same address with MapViewOfFile somehow failed (maybe this was across processes). I tried your version on both Win32 and Win64 successfully, though.
 I implemented the QueryWorkingSetEx version like this (you need a
 converted psapi.lib for Win32):
Yes, exactly, but I forgot the recipe to convert COFF/OMF import libraries.
 enum PAGES = 512; //SIZE / 4096;
 PSAPI_WORKING_SET_EX_INFORMATION[PAGES] info;
 foreach(i, ref inf; info)
      inf.VirtualAddress = heap + i * 4096;
 if (!QueryWorkingSetEx(GetCurrentProcess(), info.ptr, info.sizeof))
      throw new Exception(format("Could not query info (%d).\n",
 GetLastError()));

 foreach(i, ref inf; info)
      writefln("flags page %d: %x", i, inf.VirtualAttributes);


 and you can check the "shared" field to get copied pages.
 This function
 is not supported on XP, though.
I wouldn't worry about it; it's not like XP users are growing in numbers. Also, it looks like only the 64-bit version is good to go, as on 32-bit it would reduce the usable memory by half.
 A short benchmark shows that VirtualQuery needs 55/42 ms for your test
 on Win32/Win64 on my mobile i7, while QueryWorkingSetEx takes about 17
 ms for both.
Seems in line with my measurements. Strictly speaking, 1/2 of the pages, interleaved, should give an estimate of the worst case. Together with remapping (freeing the duplicated pages) it doesn't go beyond 250 ms on 640 MB of heap.
 If I add the actual copy into heap2 (i.e. every fourth page of 512 MB is
 copied), I get 80-90 ms more.
Aye... this is a lot. For me it also turns out that unmapping the CoW view in the last step takes most of the time. It might help to split the full heap into multiple views. Also, using VirtualProtect in the first step to turn the mapping into a CoW one is faster than unmap/map (by a factor of 2).

One thing that may help is saving a pointer to the end of the used heap at the moment of the scan, then remapping only this portion as CoW.

The last issue I see is the adjustment of pointers: in a GC, the snapshot view is mapped at a new address, so pointers would need a fixup during scanning.
 The numbers are not great, but I guess the usual memory usage and number
 of modified pages will be much lower. I'll see if I can integrate this
 into the concurrent implementation.
Wish you luck, I'm still not sure if it will help. -- Dmitry Olshansky
Jun 12 2014
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 13.06.2014 02:38, Dmitry Olshansky wrote:
 On 12-Jun-2014 10:34, Rainer Schuetze wrote:
 I implemented the QueryWorkingSetEx version like this (you need a
 converted psapi.lib for Win32):
Yes, exactly, but I forgot the recipe to convert COFF/OMF import libraries.
Grab coffimplib.exe.
 This function
 is not supported on XP, though.
I wouldn't worry about it; it's not like XP users are growing in numbers. Also, it looks like only the 64-bit version is good to go, as on 32-bit it would reduce the usable memory by half.
There could also be a fallback to VirtualQuery if QueryWorkingSetEx doesn't exist.
 A short benchmark shows that VirtualQuery needs 55/42 ms for your test
 on Win32/Win64 on my mobile i7, while QueryWorkingSetEx takes about 17
 ms for both.
Seems in line with my measurements. Strictly speaking, 1/2 of the pages, interleaved, should give an estimate of the worst case. Together with remapping (freeing the duplicated pages) it doesn't go beyond 250 ms on 640 MB of heap.
 If I add the actual copy into heap2 (i.e. every fourth page of 512 MB is
 copied), I get 80-90 ms more.
Aye... this is a lot. For me it also turns out that unmapping the CoW view in the last step takes most of the time.
Maybe the memory needs to be actually flushed to the file if no mapping exists. If that is the case, we could avoid it by creating another temporary mapping.
 It might help to split the full heap into multiple views.
The current GC uses pools of max 32 MB, so that already exists.
 Also using VirtualProtect during the first step - turning a mapping into
 CoW one is faster then unmap/map (by factor of 2).

 One thing that may help is saving a pointer to the end of used heap at
 the moment of scan, then remaping only this portion as COW.
The pool architecture already does this if scanning just ignores new pools during collection. Another optimization is to segregate the heap into memory with references and memory with plain data (NO_SCAN) at page/pool granularity. NO_SCAN pages won't need CoW. My hope is that this reduces the amount of memory that needs to be duplicated considerably.
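[The NO_SCAN segregation could be modeled like this. A hedged Python sketch with hypothetical names: pointer-free blocks are routed to pools that are neither scanned nor remapped copy-on-write.]

```python
# Hedged sketch (hypothetical names): segregating scan and NO_SCAN memory
# at pool granularity, so plain-data pools need neither scanning nor CoW.
NO_SCAN = 0x2   # attribute: block is guaranteed to contain no pointers

class Heap:
    def __init__(self):
        self.scan_pools = []    # may hold pointers: scanned, remapped CoW
        self.noscan_pools = []  # plain data (byte/float arrays): skipped

    def alloc(self, size, attrs=0):
        block = bytearray(size)
        pools = self.noscan_pools if attrs & NO_SCAN else self.scan_pools
        pools.append(block)
        return block

    def snapshot_targets(self):
        # only pointer-bearing pools need the copy-on-write snapshot
        return self.scan_pools

heap = Heap()
small_obj = heap.alloc(64)            # object that may hold references
payload = heap.alloc(1024, NO_SCAN)   # e.g. a float[] payload
```

Since large allocations are often pointer-free data, keeping them out of the CoW snapshot should cut both the number of duplicated pages and the scanning work.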
 Last issue I see is adjustment of pointers - in a GC, the mapped view is
 mapped at new address so it would need a fixup them during scanning.
Agreed, that's a slight additional cost for scanning, but I don't think it will be too difficult to implement.
 The numbers are not great, but I guess the usual memory usage and number
 of modified pages will be much lower. I'll see if I can integrate this
 into the concurrent implementation.
Wish you luck, I'm still not sure if it will help.
Jun 13 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 13-Jun-2014 12:22, Rainer Schuetze wrote:
 On 13.06.2014 02:38, Dmitry Olshansky wrote:
 On 12-Jun-2014 10:34, Rainer Schuetze wrote:
 I implemented the QueryWorkingSetEx version like this (you need a
 converted psapi.lib for Win32):
Yes, exactly, but I forgot the recipe to convert COFF/OMF import libraries.
Grab coffimplib.exe.
Cool, thanks.
 If I add the actual copy into heap2 (i.e. every fourth page of 512 MB is
 copied), I get 80-90 ms more.
Aye... this is a lot. For me it also turns out that unmapping the CoW view in the last step takes most of the time.
Maybe the memory needs to be actually flushed to the file if no mapping exists. If that is the case, we could avoid that if we create another temporary mapping.
I thought that's what I do by keeping the original view mapped. The time taken is roughly proportional to the number of pages modified in the CoW view; since these are private copies, I think it might take this long to recycle them. It's ~1 ms if no changes are made. -- Dmitry Olshansky
Jun 13 2014
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 13.06.2014 12:08, Dmitry Olshansky wrote:
 On 13-Jun-2014 12:22, Rainer Schuetze wrote:
 If I add the actual copy into heap2 (i.e. every fourth page of 512
 MB is
 copied), I get 80-90 ms more.
Aye... this is a lot. For me it also turns out that unmapping the CoW view in the last step takes most of the time.
Maybe the memory needs to be actually flushed to the file if no mapping exists. If that is the case, we could avoid that if we create another temporary mapping.
I thought that's what I do by keeping the original view mapped. The time taken is roughly proportional to the number of pages modified in the CoW view; since these are private copies, I think it might take this long to recycle them. It's ~1 ms if no changes are made.
I added CoW to the concurrent GC ( https://github.com/rainers/druntime/tree/concurrent_gc2 ) and it seems more stable with respect to pause times than the version that reads write watches.

I also see considerable time spent unmapping the CoW view, about as much as copying the data.

One issue with calling OS functions while the world is stopped: if these functions require a lock on some synchronization object, and that object is held by another thread that is suspended, we create a deadlock. Do you know if MapViewOfFileEx and UnmapViewOfFile are lock-free? The same can happen with GetWriteWatch and ResetWriteWatch, though.
Jun 14 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 14-Jun-2014 20:34, Rainer Schuetze wrote:
 On 13.06.2014 12:08, Dmitry Olshansky wrote:
 On 13-Jun-2014 12:22, Rainer Schuetze wrote:
 If I add the actual copy into heap2 (i.e. every fourth page of 512
 MB is
 copied), I get 80-90 ms more.
Aye... this is a lot. For me it also turns out that unmapping the CoW view in the last step takes most of the time.
Maybe the memory needs to be actually flushed to the file if no mapping exists. If that is the case, we could avoid that if we create another temporary mapping.
I thought that's what I do by keeping the original view mapped. The time taken is roughly proportional to the number of pages modified in the CoW view; since these are private copies, I think it might take this long to recycle them. It's ~1 ms if no changes are made.
I added COW to the concurrent GC ( https://github.com/rainers/druntime/tree/concurrent_gc2 ) and it seems more stable with respect to pause times than the version with reading write watches.
How do I run benchmarks and what are the newest results for you?
 I also see considerable time spent during unmapping the CoW view, about
 as much as copying the data.

 One issue with calling OS functions while the world is stopped: if these
 function require a lock on some synchronization object, and this object
 is locked by another thread that is suspended, we create a dead lock.

 Do you know if MapViewOfFileEx and UnmapViewOfFile are lock free?
No idea. It may also depend on the OS version; given the fuss about Win7 being the first with a lock-free thread scheduler, there may be giant locks in memory mapping, at least in some versions.
 The same can happen with GetWriteWatch and ResetWriteWatch, though.
-- Dmitry Olshansky
Jun 14 2014
parent Rainer Schuetze <r.sagitario gmx.de> writes:
On 14.06.2014 20:17, Dmitry Olshansky wrote:
 I added COW to the concurrent GC (
 https://github.com/rainers/druntime/tree/concurrent_gc2 ) and it seems
 more stable with respect to pause times than the version with reading
 write watches.
How do I run benchmarks and what are the newest results for you?
The benchmarks have been recently merged into druntime. Compile and run runbench.d in the benchmark folder. I have updated the report and added new benchmarks at http://rainers.github.io/visuald/druntime/concurrentgc.html
Jun 15 2014
prev sibling parent Martin Nowak <code dawg.eu> writes:
On 06/12/2014 08:34 AM, Rainer Schuetze wrote:
 If I add the actual copy into heap2 (i.e. every fourth page of 512 MB is
 copied), I get 80-90 ms more.

 The numbers are not great, but I guess the usual memory usage and number
 of modified pages will be much lower. I'll see if I can integrate this
 into the concurrent implementation.
Is there a chance that we can pause mutator threads when they trigger too many page copies?
Jun 15 2014