digitalmars.D.announce - DConf 2013 Day 1 Talk 6: Concurrent Garbage Collection for D by Leandro

Andrei Alexandrescu (4/4) May 20 2013 On reddit:

bearophile (5/6) May 20 2013 This link on the YouTube page seems to not be online:
Regan Heath (10/12) May 20 2013 This may be the Windows Copy On Write feature mentioned in the Q&A at th...

Diggory (20/30) May 20 2013 Fork isn't needed at all really in the technique described, this

deadalnix (2/6) May 20 2013 Copy on WRITE usually don't happen on immutable memory.

Diggory (4/10) May 20 2013 I never said it did... I said that when using fork the immutable

Leandro Lucarella (16/54) May 21 2013 I'm interested in what you're describing, but I don't know how can you

Diggory (12/26) May 21 2013 I do mean mapping the same physical block of memory into two
Diggory (28/42) May 23 2013 After doing some further investigation I think I've found a

Vladimir Panteleev (7/36) May 24 2013 I've tried writing a generational GC for D that used page

Leandro Lucarella (16/28) May 27 2013 Yeah, using memory protection to do what fork does manually is a known

Artur Skawina (7/16) May 24 2013 A page fault per every page written to between every GC run + a user

Vladimir Panteleev (6/13) May 20 2013 Yes, basically. I can't find where I've read that mapping COW

Diggory (16/31) May 20 2013 Either way, at least on windows the separate process would have

Vladimir Panteleev (4/10) May 23 2013 What would be some of the potential downsides of a long-running

Diggory (19/29) May 23 2013 You'd either have to distribute a separate .exe for the GC

Vladimir Panteleev (18/36) May 24 2013 I don't know why you say that. The satellite process would be the

Diggory (23/41) May 24 2013 I'm fairly sure windows only shares memory between instances of

Diggory (3/5) May 24 2013 OK, nvm, it seems it's just that the "SUBSYSTEM" setting in
Vladimir Panteleev (18/41) May 24 2013 Oh? I thought Windows treats all modules in the same way. Do you

Diggory (8/18) May 24 2013 True but assuming windows isn't completely degenerate the

Nick Sabalausky (4/8) May 20 2013 Torrents up, as well as links:
Jacob Carlborg (4/6) May 20 2013 Great talk. What's up with the extra minute added in the beginning?

Nick Sabalausky (3/10) May 20 2013 Suspense!

Dicebot (3/3) May 21 2013 Can't wait to see a prototype for D2 :) I have a feeling that

Leandro Lucarella (10/13) May 21 2013 I hope I can start porting it to D2 at some (not so far in the future)

Juan Manuel Cabo (22/26) May 24 2013 I know that this is slightly offtopic, since the topic lately

Jonathan M Davis (8/14) May 24 2013 There is interest in it, and Rainer Schütze did a talk on it at DConf. ...

Juan Manuel Cabo (3/22) May 24 2013 Thanks, that is great news!

Diggory (4/4) May 24 2013 On 64-bit windows there is also the "GetWriteWatch" function

Sean Cavanaugh (8/12) May 27 2013 There's all sorts of interesting stuff in 64 bit windows :) The user

Brian Rogoff (7/22) May 27 2013 That's very promising. The lack of precise garbage collection and

Diggory (15/39) May 27 2013 It's actually possible to improve the precision of the GC without

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On reddit:

http://www.reddit.com/r/programming/comments/1eovfu/dconf_2013_day_1_talk_6_concurrent_garbage/


Enjoy! Discuss!! Vote!!!

Andrei

May 20 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 http://www.reddit.com/r/programming/comments/1eovfu/dconf_2013_day_1_talk_6_concurrent_garbage/

This link on the YouTube page seems to not be online:

http://dconf.org/2013/talks/lucarella.pdf

Bye,
bearophile

May 20 2013

"Regan Heath" <regan netmail.co.nz> writes:

On Mon, 20 May 2013 13:50:25 +0100, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On reddit:
 http://www.reddit.com/r/programming/comments/1eovfu/dconf_2013_day_1_talk_6_concurrent_garbage/

This may be the Windows Copy On Write feature mentioned in the Q&A at the  
end:
http://support.microsoft.com/kb/103858

.. but it's not clear to me how useful this is for fork emulation or  
similar.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

May 20 2013

"Diggory" <diggsey googlemail.com> writes:

On Monday, 20 May 2013 at 13:55:05 UTC, Regan Heath wrote:
 On Mon, 20 May 2013 13:50:25 +0100, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 On reddit:
 http://www.reddit.com/r/programming/comments/1eovfu/dconf_2013_day_1_talk_6_concurrent_garbage/

 This may be the Windows Copy On Write feature mentioned in the 
 Q&A at the end:
 http://support.microsoft.com/kb/103858

 .. but it's not clear to me how useful this is for fork 
 emulation or similar.

 R

Fork isn't needed at all really in the technique described, this 
is all that's needed:
- Map a copy of the memory using copy-on-write
- Run some code concurrently

It just happens that fork does both of these things, but you can 
equally well do the two things using separate calls.

In fact you should be able to avoid the deadlock issue by not 
using fork but just remapping some shared memory using copy on 
write. The GC can exist in a separate thread which pauses itself 
after every run. To run the GC it's then just a case of:
- stop the world
- copy registers to stack
- remap shared memory using COW
- resume the world
- resume the GC thread

And that would work on all modern OSes, plus you don't have the 
overhead of creating a new process or even a new thread. Also 
immutable memory doesn't need to be mapped, the GC thread can 
access it directly.

May 20 2013

"deadalnix" <deadalnix gmail.com> writes:

On Monday, 20 May 2013 at 22:52:33 UTC, Diggory wrote:
 And that would work on all modern OSes, plus you don't have the 
 overhead of creating a new process or even a new thread. Also 
 immutable memory doesn't need to be mapped, the GC thread can 
 access it directly.

Copy on WRITE usually don't happen on immutable memory.

May 20 2013

"Diggory" <diggsey googlemail.com> writes:

On Tuesday, 21 May 2013 at 00:00:13 UTC, deadalnix wrote:
 On Monday, 20 May 2013 at 22:52:33 UTC, Diggory wrote:
 And that would work on all modern OSes, plus you don't have 
 the overhead of creating a new process or even a new thread. 
 Also immutable memory doesn't need to be mapped, the GC thread 
 can access it directly.

 Copy on WRITE usually don't happen on immutable memory.

I never said it did... I said that when using fork the immutable 
memory must be mapped into the new process, when using a thread 
it does not since threads share the same address space.

May 20 2013

Leandro Lucarella <luca llucax.com.ar> writes:

Diggory, el 21 de May a las 00:52 me escribiste:
On Monday, 20 May 2013 at 13:55:05 UTC, Regan Heath wrote:
On Mon, 20 May 2013 13:50:25 +0100, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

On reddit:
http://www.reddit.com/r/programming/comments/1eovfu/dconf_2013_day_1_talk_6_concurrent_garbage/

This may be the Windows Copy On Write feature mentioned in the Q&A
at the end:
http://support.microsoft.com/kb/103858

.. but it's not clear to me how useful this is for fork emulation
or similar.

Fork isn't needed at all really in the technique described, this is
all that's needed:
- Map a copy of the memory using copy-on-write
- Run some code concurrently

It just happens that fork does both of these things, but you can
equally well do the two things using separate calls.

In fact you should be able to avoid the deadlock issue by not using
fork but just remapping some shared memory using copy on write. The
GC can exist in a separate thread which pauses itself after every
run. To run the GC it's then just a case of:
- stop the world
- copy registers to stack
- remap shared memory using COW
- resume the world
- resume the GC thread

And that would work on all modern OSes, plus you don't have the
overhead of creating a new process or even a new thread. Also
immutable memory doesn't need to be mapped, the GC thread can access
it directly.

I'm interested in what you're describing, but I don't know how can you
achieve this without fork()ing (or clone()ing in Linux). What does
"remap shared memory using COW" in a context where fork() doesn't
happen? Why do you even need shared memory if fork() doesn't happen? If
"remap shared memory using COW" means get a different address for the
same block of memory until a write happens in that block, then you can't
scan the roots anymore.

I'm *very* interested in your suggestion.

--
Leandro Lucarella (AKA luca) http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
CONDUCTOR BORRACHO CASI PROVOCA UNA TRAGEDIA: BATMAN UNICO TESTIGO
-- Crónica TV

May 21 2013

"Diggory" <diggsey googlemail.com> writes:

On Tuesday, 21 May 2013 at 20:08:16 UTC, Leandro Lucarella wrote:
 I'm interested in what you're describing, but I don't know how 
 can you
 achieve this without fork()ing (or clone()ing in Linux). What 
 does
 "remap shared memory using COW" in a context where fork() 
 doesn't
 happen? Why do you even need shared memory if fork() doesn't 
 happen? If
 "remap shared memory using COW" means get a different address 
 for the
 same block of memory until a write happens in that block, then 
 you can't
 scan the roots anymore.

 I'm *very* interested in your suggestion.

I do mean mapping the same physical block of memory into two 
virtual memory blocks, such that a write will cause the OS to 
make a copy and sever the link.

Given that we're dealing with significantly large blocks of 
memory all in multiples of the page size, it should be a simple 
matter to map from one copy of the memory to the other and back 
again using basic pointer arithmetic, so scanning the roots 
should not be a problem.

You need shared memory because a file handle is needed when 
calling "mmap" to enable the COW functionality, and shared memory 
lets you get a file handle to a block of memory.

May 21 2013

"Diggory" <diggsey googlemail.com> writes:

On Tuesday, 21 May 2013 at 20:08:16 UTC, Leandro Lucarella wrote:
 I'm interested in what you're describing, but I don't know how 
 can you
 achieve this without fork()ing (or clone()ing in Linux). What 
 does
 "remap shared memory using COW" in a context where fork() 
 doesn't
 happen? Why do you even need shared memory if fork() doesn't 
 happen? If
 "remap shared memory using COW" means get a different address 
 for the
 same block of memory until a write happens in that block, then 
 you can't
 scan the roots anymore.

 I'm *very* interested in your suggestion.

After doing some further investigation I think I've found a 
fairly awesome way of doing garbage collection on windows, 
hopefully linux has a similar mechanism. It doesn't require 
memory mapped files or anything special, it can be done 
retroactively, and it allows a kind of reverse copy on write 
which is what's actually needed.

When the GC is run:
- Use VirtualProtect to mark all mutable memory pages as read-only
- Add a vectored exception handler to handle the access violation 
exception
- Resume the GC thread

In the exception handler:
- Copy the page being modified to a new page
- Mark the original page as writable again
- Tell the GC to use the new page instead
- Continue

The main problem I see is the need to synchronise between the 
exception handler and the GC, when it would be nice if the 
exception handler was as light-weight as possible. However this 
can be largely solved by doing finer grained synchronisation/some 
clever stuff. For example, the handler can add the new mapping 
first, then do the page copy, and only then wait for the GC to 
acknowledge the new mapping, so the GC has the whole time that 
the copying is taking place to see the change. The GC would also 
of course have to wait for the copying to complete before 
continuing but pausing the GC for a short time isn't really an 
issue at all.

May 23 2013

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Friday, 24 May 2013 at 06:52:19 UTC, Diggory wrote:
 On Tuesday, 21 May 2013 at 20:08:16 UTC, Leandro Lucarella 
 wrote:
 I'm interested in what you're describing, but I don't know how 
 can you
 achieve this without fork()ing (or clone()ing in Linux). What 
 does
 "remap shared memory using COW" in a context where fork() 
 doesn't
 happen? Why do you even need shared memory if fork() doesn't 
 happen? If
 "remap shared memory using COW" means get a different address 
 for the
 same block of memory until a write happens in that block, then 
 you can't
 scan the roots anymore.

 I'm *very* interested in your suggestion.

 After doing some further investigation I think I've found a 
 fairly awesome way of doing garbage collection on windows, 
 hopefully linux has a similar mechanism. It doesn't require 
 memory mapped files or anything special, it can be done 
 retroactively, and it allows a kind of reverse copy on write 
 which is what's actually needed.

 When the GC is run:
 - Use VirtualProtect to mark all mutable memory pages as 
 read-only
 - Add a vectored exception handler to handle the access 
 violation exception
 - Resume the GC thread

I've tried writing a generational GC for D that used page 
protection for write barriers a while ago. IIRC, I ran into 
performance issues (the page faults were rather expensive).

This approach does have the benefit that it will not cause pages 
that have been moved to swap to be pulled out in order to be 
scanned every time, though.

May 24 2013

Leandro Lucarella <luca llucax.com.ar> writes:

Vladimir Panteleev, el 24 de May a las 09:55 me escribiste:
When the GC is run:
- Use VirtualProtect to mark all mutable memory pages as read-only
- Add a vectored exception handler to handle the access violation
exception
- Resume the GC thread

 
 I've tried writing a generational GC for D that used page protection
 for write barriers a while ago. IIRC, I ran into performance issues
 (the page faults were rather expensive).

Yeah, using memory protection to do what fork does manually is a known
approach, discussed even in the "Garbage Collection" book[1].

The good thing about fork is it's sooooo much easier to implement, and
the OS is already highly tuned to do this for you. That's why, even when
it might be good to explore, is not a very tempting approach for me (but
I have it in mind as an alternative way to "fix" the potential deadlock
caused by glibc internal mutex).

 This approach does have the benefit that it will not cause pages
 that have been moved to swap to be pulled out in order to be scanned
 every time, though.

[1] http://books.google.de/books/about/Garbage_Collection.html?id=UdtQAAAAMAAJ&redir_esc=y

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Se ha dicho tanto que las apariencias engañan
Por supuesto que engañarán a quien sea tan vulgar como para creerlo

May 27 2013

Artur Skawina <art.08.09 gmail.com> writes:

On 05/24/13 08:52, Diggory wrote:
 After doing some further investigation I think I've found a fairly awesome way
of doing garbage collection on windows, hopefully linux has a similar
mechanism. It doesn't require memory mapped files or anything special, it can
be done retroactively, and it allows a kind of reverse copy on write which is
what's actually needed.
 
 When the GC is run:
 - Use VirtualProtect to mark all mutable memory pages as read-only
 - Add a vectored exception handler to handle the access violation exception
 - Resume the GC thread
 
 In the exception handler:
 - Copy the page being modified to a new page

A page fault per every page written to between every GC run + a user
space callback for every such fault + a page sized copy every time is
not going to perform very well...

(There are ways to avoid these costs, but no idea if it's easily doable on
windows...)

artur

May 24 2013

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Monday, 20 May 2013 at 13:55:05 UTC, Regan Heath wrote:
 On Mon, 20 May 2013 13:50:25 +0100, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 On reddit:
 http://www.reddit.com/r/programming/comments/1eovfu/dconf_2013_day_1_talk_6_concurrent_garbage/

 This may be the Windows Copy On Write feature mentioned in the 
 Q&A at the end:
 http://support.microsoft.com/kb/103858

Yes, basically. I can't find where I've read that mapping COW 
memory from within the same process is only supported on NT 
versions. However, doing it from within the same process is not 
practical anyway, since that would imply at least halving the 
address space.

May 20 2013

"Diggory" <diggsey googlemail.com> writes:

On Tuesday, 21 May 2013 at 04:26:18 UTC, Vladimir Panteleev wrote:
 On Monday, 20 May 2013 at 13:55:05 UTC, Regan Heath wrote:
 On Mon, 20 May 2013 13:50:25 +0100, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 On reddit:
 http://www.reddit.com/r/programming/comments/1eovfu/dconf_2013_day_1_talk_6_concurrent_garbage/

 This may be the Windows Copy On Write feature mentioned in the 
 Q&A at the end:
 http://support.microsoft.com/kb/103858

 Yes, basically. I can't find where I've read that mapping COW 
 memory from within the same process is only supported on NT 
 versions. However, doing it from within the same process is not 
 practical anyway, since that would imply at least halving the 
 address space.

Either way, at least on windows the separate process would have 
to be persistent as creating a new process has a lot more 
overhead attached to it than on posix systems. Implicitly 
creating a child process is also something a D programmer might 
not want, again this is more windows specific where processes do 
not have the same strict hierarchy as on posix.

On 64-bit systems there shouldn't be a problem with address 
space, and even on 32-bit systems, the objects which take up the 
vast majority of the address space are usually arrays of 
primitive types which don't need to be scanned by the GC.

Certainly for 64-bit systems I think it's worth at least doing 
some performance comparisons, and I think using a thread will 
turn out to be more efficient. If it turns out that there are 
merits to both options then having a GC option to use a process 
or a thread might make sense.

May 20 2013

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tuesday, 21 May 2013 at 04:52:25 UTC, Diggory wrote:
 Either way, at least on windows the separate process would have 
 to be persistent as creating a new process has a lot more 
 overhead attached to it than on posix systems. Implicitly 
 creating a child process is also something a D programmer might 
 not want, again this is more windows specific where processes 
 do not have the same strict hierarchy as on posix.

What would be some of the potential downsides of a long-running 
satellite process?

A concurrent GC would probably be opt-in or opt-out, either way.

May 23 2013

"Diggory" <diggsey googlemail.com> writes:

On Thursday, 23 May 2013 at 12:39:04 UTC, Vladimir Panteleev 
wrote:
 On Tuesday, 21 May 2013 at 04:52:25 UTC, Diggory wrote:
 Either way, at least on windows the separate process would 
 have to be persistent as creating a new process has a lot more 
 overhead attached to it than on posix systems. Implicitly 
 creating a child process is also something a D programmer 
 might not want, again this is more windows specific where 
 processes do not have the same strict hierarchy as on posix.

 What would be some of the potential downsides of a long-running 
 satellite process?

 A concurrent GC would probably be opt-in or opt-out, either way.

You'd either have to distribute a separate .exe for the GC 
process or embed the second .exe inside the first and extract it 
at runtime to a temporary location, or call the same .exe with 
some flag which tells it to turn into a GC. None of those options 
are particularly nice, all of them require significant 
interference from the GC, it's not just a case of calling some 
function to start the GC. This is especially a problem in cases 
where the runtime can't insert the init handler such as when 
using WinMain, which you currently have to do unless you want a 
console window to pop up.

Next there's the fact that you have a separate top level process 
for every D process. This is going to cause problems with 
security software since you now have to give both processes 
permissions in order to run. In addition you have to consider the 
case where one or other of the two processes is paused or killed 
unexpectedly - you don't want orphaned GC processes hanging 
around wasting resources.

May 23 2013

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Thursday, 23 May 2013 at 23:49:20 UTC, Diggory wrote:
 You'd either have to distribute a separate .exe for the GC 
 process or embed the second .exe inside the first and extract 
 it at runtime to a temporary location, or call the same .exe 
 with some flag which tells it to turn into a GC. None of those 
 options are particularly nice, all of them require significant 
 interference from the GC, it's not just a case of calling some 
 function to start the GC.

I don't know why you say that. The satellite process would be the 
same executable file as the main process, but started with a 
special switch (or environment variable), which will be handled 
by the D runtime. Since the two processes use the same image, the 
executable code will share the same pages in memory, resulting in 
a small overhead.

 This is especially a problem in cases where the runtime can't 
 insert the init handler such as when using WinMain,

When implementing WinMain, you have to call the runtime 
initialization function, which will initialize the GC. GC 
initialization for the satellite process need not exit.

 which you currently have to do unless you want a console window 
 to pop up.

I don't think this is true. Although the image subsystem is 
auto-detected by the entry point you're using (CONSOLE for main, 
WINDOWS for WinMain), you can specify it explicitly using the 
/SUBSYSTEM linker switch (/SUBSYSTEM:WINDOWS).

 Next there's the fact that you have a separate top level 
 process for every D process. This is going to cause problems 
 with security software since you now have to give both 
 processes permissions in order to run.

Sorry? Could you provide an example (with a specific security 
software package)?

 In addition you have to consider the case where one or other of 
 the two processes is paused or killed unexpectedly - you don't 
 want orphaned GC processes hanging around wasting resources.

Why is this an issue? Windows provides simple ways to wait or 
check for a process's termination.

May 24 2013

"Diggory" <diggsey googlemail.com> writes:

On Friday, 24 May 2013 at 07:48:49 UTC, Vladimir Panteleev wrote:
 I don't know why you say that. The satellite process would be 
 the same executable file as the main process, but started with 
 a special switch (or environment variable), which will be 
 handled by the D runtime. Since the two processes use the same 
 image, the executable code will share the same pages in memory, 
 resulting in a small overhead.

I'm fairly sure windows only shares memory between instances of 
the same DLL, not executables.

 When implementing WinMain, you have to call the runtime 
 initialization function, which will initialize the GC. GC 
 initialization for the satellite process need not exit.

You're still executing part of the user's code and it's going to 
be exceedingly hard to satisfactorily explain why the main 
function will be called twice except the second time it will 
mysteriously stop half-way through because the runtime init 
function doesn't return...

 I don't think this is true. Although the image subsystem is 
 auto-detected by the entry point you're using (CONSOLE for 
 main, WINDOWS for WinMain), you can specify it explicitly using 
 the /SUBSYSTEM linker switch (/SUBSYSTEM:WINDOWS).

The only time I've been able to get it to not show a console 
window is when using WinMain...

 Sorry? Could you provide an example (with a specific security 
 software package)?

Any security software with the active defense type stuff, Commodo 
does it for example. The things they ask you about vary and it's 
completely pointless IMO, but they do ask for permission for 
practically everything and with two processes that's twice as 
much to ask you about.

Anyway, my main point is that it's not the kind of thing you want 
to impose as standard on all D applications, there may be other 
problems it causes depending on what the program is for. If it 
exists at all it should be opt-in and the default should still be 
as performant as possible.

 I've tried writing a generational GC for D that used page 
 protection for write barriers a while ago. IIRC, I ran into 
 performance issues (the page faults were rather expensive).

Hmm, it shouldn't really be much slower than the technique 
Leandro was using - both rely on the exact same number of page 
faults - and his results looked very promising.

May 24 2013

"Diggory" <diggsey googlemail.com> writes:

On Friday, 24 May 2013 at 09:56:25 UTC, Diggory wrote:
 The only time I've been able to get it to not show a console 
 window is when using WinMain...

OK, nvm, it seems it's just that the "SUBSYSTEM" setting in 
VisualD has no effect, you have to set it through a .def file.

May 24 2013

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Friday, 24 May 2013 at 09:56:25 UTC, Diggory wrote:
 I'm fairly sure windows only shares memory between instances of 
 the same DLL, not executables.

Oh? I thought Windows treats all modules in the same way. Do you 
have a source?

 You're still executing part of the user's code and it's going 
 to be exceedingly hard to satisfactorily explain why the main 
 function will be called twice except the second time it will 
 mysteriously stop half-way through because the runtime init 
 function doesn't return...

I don't see that as a problem, considering such a GC would need 
to be opt-in.

 Any security software with the active defense type stuff, 
 Commodo does it for example. The things they ask you about vary 
 and it's completely pointless IMO, but they do ask for 
 permission for practically everything and with two processes 
 that's twice as much to ask you about.

The reason why I asked for a specific example is that I don't 
know of any actual case when this will be a problem in practice.

For one, I've never seen security software that reacts 
differently on two instances of the same executable - usually, 
any permissions you set are for the .exe file, or more generic 
rules (like port numbers).

For two, the satellite process does not need any kinds of 
permissions. Even the requirement of accessing other processes' 
memory that the straight-forward approach needs can be avoided by 
allowing the satellite process to inherit a handle to the main 
process.

 Anyway, my main point is that it's not the kind of thing you 
 want to impose as standard on all D applications, there may be 
 other problems it causes depending on what the program is for. 
 If it exists at all it should be opt-in and the default should 
 still be as performant as possible.

Yes, which is why I also think it should be opt-in.

 I've tried writing a generational GC for D that used page 
 protection for write barriers a while ago. IIRC, I ran into 
 performance issues (the page faults were rather expensive).

 Hmm, it shouldn't really be much slower than the technique 
 Leandro was using - both rely on the exact same number of page 
 faults - and his results looked very promising.

Leandro wasn't using Windows ;)

May 24 2013

"Diggory" <diggsey googlemail.com> writes:

On Friday, 24 May 2013 at 10:20:12 UTC, Vladimir Panteleev wrote:
 Leandro wasn't using Windows ;)

True but assuming windows isn't completely degenerate the
overhead of a page fault is going to be pretty close to identical
on the same cpu architecture.

On Friday, 24 May 2013 at 11:15:21 UTC, Artur Skawina wrote:
 A page fault per every page written to between every GC run + a 
 user
 space callback for every such fault + a page sized copy every 
 time is
 not going to perform very well...

 (There are ways to avoid these costs, but no idea if it's 
 easily doable on
 windows...)

 artur

It's only between the time a GC run is started and the time it
finishes, exactly the same as in the talk. A generational GC
should be able to do complete mark cycles fairly infrequently.

May 24 2013

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On Mon, 20 May 2013 08:50:25 -0400
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On reddit:
 
 http://www.reddit.com/r/programming/comments/1eovfu/dconf_2013_day_1_talk_6_concurrent_garbage/
 

Torrents up, as well as links:

http://semitwist.com/download/misc/dconf2013/

May 20 2013

Jacob Carlborg <doob me.com> writes:

On 2013-05-20 14:50, Andrei Alexandrescu wrote:
 On reddit:

 http://www.reddit.com/r/programming/comments/1eovfu/dconf_2013_day_1_talk_6_concurrent_garbage/

Great talk. What's up with the extra minute added in the beginning?

-- 
/Jacob Carlborg

May 20 2013

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On Tue, 21 May 2013 08:27:23 +0200
Jacob Carlborg <doob me.com> wrote:

 On 2013-05-20 14:50, Andrei Alexandrescu wrote:
 On reddit:

 http://www.reddit.com/r/programming/comments/1eovfu/dconf_2013_day_1_talk_6_concurrent_garbage/

 
 Great talk. What's up with the extra minute added in the beginning?
 

Suspense!

May 20 2013

"Dicebot" <m.strashun gmail.com> writes:

Can't wait to see a prototype for D2 :) I have a feeling that 
this may solve at least some of vibe.d latency issues at high 
concurrency levels.

May 21 2013

Leandro Lucarella <luca llucax.com.ar> writes:

Dicebot, el 21 de May a las 09:55 me escribiste:
 Can't wait to see a prototype for D2 :) I have a feeling that this
 may solve at least some of vibe.d latency issues at high concurrency
 levels.

I hope I can start porting it to D2 at some (not so far in the future)
point...

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Did you know the originally a Danish guy invented the burglar-alarm
unfortunately, it got stolen

May 21 2013

"Juan Manuel Cabo" <juanmanuel.cabo gmail.com> writes:

On Monday, 20 May 2013 at 12:50:23 UTC, Andrei Alexandrescu wrote:
 On reddit:

 http://www.reddit.com/r/programming/comments/1eovfu/dconf_2013_day_1_talk_6_concurrent_garbage/


 Enjoy! Discuss!! Vote!!!

 Andrei

I know that this is slightly offtopic, since the topic lately 
seems to be how to make the GC run generationally or with small 
footprint (don't stop the world, etc.).

I'd like to know if there is interest in a precise garbage 
collector. Anyways, here is how .NET does it:

http://blogs.msdn.com/b/abhinaba/archive/2009/03/03/back-to-basics-how-does-the-gc-find-object-references.aspx

It uses a mask stored in the type information of a class. D 
doesn't have this kind of type info in runtime I guess, but since 
D is on the verge of supporting multiple dlls/so, the time is now 
for a small modification to be made in the ABI to support this 
(if it is ever going to be made).

I know that in 64bits there is less of a problem with data as 
false pointers, but having a precise garbage collector would make 
two things possible:

   1) Defragmenting the heap by being able to move references.
   2) Easier to make a generational GC

The following link explains (in the first comment) how .NET 
distinguishes its own stack frames from non-managed stack frames, 
by adding a "cookie":

http://stackoverflow.com/questions/10669173/how-does-the-gc-update-references-after-compaction-occurs


--jm

May 24 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Friday, May 24, 2013 20:30:54 Juan Manuel Cabo wrote:

 I know that this is slightly offtopic, since the topic lately
 seems to be how to make the GC run generationally or with small
 footprint (don't stop the world, etc.).
 
 I'd like to know if there is interest in a precise garbage
 collector.

There is interest in it, and Rainer Schütze did a talk on it at DConf. At the 
current pace (assuming that Andrei actually posts one on Monday even though 
it's a federal holiday in the US), it'll be posted on June 3rd (and if he 
skips Monday, then it'll probably be June 5th). And actually, the precise GC 
changes stand a much better chance of making it into druntime in the short 
term than any concurrency changes do.

- Jonathan M Davis

May 24 2013

"Juan Manuel Cabo" <juanmanuel.cabo gmail.com> writes:

On Friday, 24 May 2013 at 19:44:19 UTC, Jonathan M Davis wrote:
 On Friday, May 24, 2013 20:30:54 Juan Manuel Cabo wrote:

 I know that this is slightly offtopic, since the topic lately
 seems to be how to make the GC run generationally or with small
 footprint (don't stop the world, etc.).
 
 I'd like to know if there is interest in a precise garbage
 collector.

 There is interest in it, and Rainer Schütze did a talk on it at 
 DConf. At the
 current pace (assuming that Andrei actually posts one on Monday 
 even though
 it's a federal holiday in the US), it'll be posted on June 3rd 
 (and if he
 skips Monday, then it'll probably be June 5th). And actually, 
 the precise GC
 changes stand a much better chance of making it into druntime 
 in the short
 term than any concurrency changes do.

 - Jonathan M Davis

Thanks, that is great news!

--jm

May 24 2013

"Diggory" <diggsey googlemail.com> writes:

On 64-bit windows there is also the "GetWriteWatch" function 
which lets you access the dirty flag in the page table = no page 
faults = super efficient concurrent generational GC. Just a shame 
it doesn't exist on 32-bit systems for some reason.

May 24 2013

Sean Cavanaugh <WorksOnMyMachine gmail.com> writes:

On 5/24/2013 11:12 PM, Diggory wrote:
 On 64-bit windows there is also the "GetWriteWatch" function which lets
 you access the dirty flag in the page table = no page faults = super
 efficient concurrent generational GC. Just a shame it doesn't exist on
 32-bit systems for some reason.

There's all sorts of interesting stuff in 64 bit windows :)  The user 
mode thread scheduler is pretty cool.

On the flip side: 32 bit is in its twilight days, and I am reasonably 
confident the next game I work on will be the last one that even 
supports 32 bits.  Then I can finally use all the new 64 bit goodies :) 
   32 bits will be reserved for phones and tablets (and even then 
tablets will probably be making the switch pretty soon-ish)

May 27 2013

"Brian Rogoff" <brogoff gmail.com> writes:

On Friday, 24 May 2013 at 19:44:19 UTC, Jonathan M Davis wrote:
 On Friday, May 24, 2013 20:30:54 Juan Manuel Cabo wrote:
 I'd like to know if there is interest in a precise garbage
 collector.

 There is interest in it, and Rainer Schütze did a talk on it at 
 DConf. At the
 current pace (assuming that Andrei actually posts one on Monday 
 even though
 it's a federal holiday in the US), it'll be posted on June 3rd 
 (and if he
 skips Monday, then it'll probably be June 5th). And actually, 
 the precise GC
 changes stand a much better chance of making it into druntime 
 in the short
 term than any concurrency changes do.

 - Jonathan M Davis

That's very promising. The lack of precise garbage collection and 
the unclear story with regards to programming sans-GC (maybe it's 
clear to someone, but not to me) is far more of a "deal breaker" 
for me than the lack of non-nullable pointers. I hope that you're 
right and that this gets sorted out soon.

-- Brian

May 27 2013

"Diggory" <diggsey googlemail.com> writes:

On Monday, 27 May 2013 at 17:56:10 UTC, Brian Rogoff wrote:
 On Friday, 24 May 2013 at 19:44:19 UTC, Jonathan M Davis wrote:
 On Friday, May 24, 2013 20:30:54 Juan Manuel Cabo wrote:
 I'd like to know if there is interest in a precise garbage
 collector.

 There is interest in it, and Rainer Schütze did a talk on it 
 at DConf. At the
 current pace (assuming that Andrei actually posts one on 
 Monday even though
 it's a federal holiday in the US), it'll be posted on June 3rd 
 (and if he
 skips Monday, then it'll probably be June 5th). And actually, 
 the precise GC
 changes stand a much better chance of making it into druntime 
 in the short
 term than any concurrency changes do.

 - Jonathan M Davis

 That's very promising. The lack of precise garbage collection 
 and the unclear story with regards to programming sans-GC 
 (maybe it's clear to someone, but not to me) is far more of a 
 "deal breaker" for me than the lack of non-nullable pointers. I 
 hope that you're right and that this gets sorted out soon.

 -- Brian

It's actually possible to improve the precision of the GC without 
any additional type info. As long as you can give some unique ID 
to each type when you allocate it then the GC can learn the 
layout of that type on the fly.

For example a simple algorithm would be:
- When a new ID is first see create new type-info that is all 
pointers.
- While scanning an instance of that type, if a pointer points to 
a value in the higher half, or a sufficiently low value which is 
not equal to zero, then remove this pointer from the type-info.

You would have to disable this for unions, but for the rest it 
should work fine. Plus with more intelligent algorithms you can 
handle more cases. You could even save the type-info to a file 
and reuse it later to improve performance.

May 27 2013

D Programming

C/C++ Programming

Other

digitalmars.D.announce - DConf 2013 Day 1 Talk 6: Concurrent Garbage Collection for D by Leandro