digitalmars.D - Question about D, garbage collection and fork()

Jerry Quinn <jlquinn optonline.net> writes:
Where I work, we find it very useful to start a process, load data, then fork()
to parallelize.  Our data is large, such that we'd run out of memory  trying to
run a complete copy on each core.  Once the process is loaded, we don't need
that much writable memory, so fork is appealing to share the loaded pages. 
It's possible to use mmap for some of the data, but inconvenient for other
data, even though it's read-only at runtime.

So here's my question:  In D, if I create a lot of data in the
garbage-collected heap that will be read-only, then fork the process, will I
get the benefit of the operating system's copy-on-write and only use a small
amount of additional memory per process?

In case you're wondering why I wouldn't use threading, one argument is that if
you have a bug and the process crashes, you only lose one process instead of N
threads.  That's actually useful for robustness.

Thoughts?
Mar 09 2011
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 09 Mar 2011 17:56:54 -0500, Jerry Quinn <jlquinn optonline.net>
wrote:

> Where I work, we find it very useful to start a process, load data, then
> fork() to parallelize.  Our data is large, such that we'd run out of
> memory trying to run a complete copy on each core.  Once the process is
> loaded, we don't need that much writable memory, so fork is appealing to
> share the loaded pages.  It's possible to use mmap for some of the data,
> but inconvenient for other data, even though it's read-only at runtime.
>
> So here's my question:  In D, if I create a lot of data in the
> garbage-collected heap that will be read-only, then fork the process,
> will I get the benefit of the operating system's copy-on-write and only
> use a small amount of additional memory per process?
Do you know what causes the OS to regard that memory as read-only? Since fork() is a C system call, and D gets its heap memory the same as any other unix process (brk()), I can't see why it wouldn't work. As long as you do the same thing you do in C, I think it will work.

-Steve
Mar 10 2011
Jerry Quinn <jlquinn optonline.net> writes:
Steven Schveighoffer Wrote:

> Do you know what causes the OS to regard that memory as read-only?  Since
> fork() is a C system call, and D gets its heap memory the same as any
> other unix process (brk()), I can't see why it wouldn't work.  As long as
> you do the same thing you do in C, I think it will work.
It's not that the OS considers the memory actually read-only. It uses copy-on-write, so the pages will be shared between the processes until one or the other attempts to write to the page. So if the garbage collector moves things around, it will cause the pages to be copied and unshared.

So my question is really whether the garbage collector will tend to dirty shared pages or not.

Jerry
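
To make the copy-on-write behaviour described above concrete, here is a minimal sketch, assuming a POSIX system and a program that is still single-threaded when it forks. The helper name parentViewAfterChildWrite is made up for illustration. It only shows that a write in the child lands in a private copy of the page; it says nothing about how many pages the GC itself dirties.

```d
import core.sys.posix.sys.wait : waitpid;
import core.sys.posix.unistd : fork, _exit;
import std.exception : enforce;
import std.stdio : writeln;

/// Forks, lets the child overwrite data[0], and returns the value the
/// parent sees in data[0] once the child has exited.
/// (Hypothetical helper, not part of any library.)
ubyte parentViewAfterChildWrite(ubyte[] data)
{
    immutable original = data[0];
    auto pid = fork();
    enforce(pid >= 0, "fork failed");
    if (pid == 0)
    {
        // Child: this write triggers copy-on-write for the touched page,
        // so only the child's private copy changes.
        data[0] = cast(ubyte)(original + 1);
        _exit(0); // leave without running the D runtime shutdown
    }
    int status;
    waitpid(pid, &status, 0);
    return data[0];
}

void main()
{
    auto data = new ubyte[](1 << 20); // 1 MiB on the GC heap
    data[] = 42;
    writeln(parentViewAfterChildWrite(data)); // prints 42
}
```

Pages the child only reads stay shared with the parent; the open question in this thread is which pages the runtime writes to behind your back.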
Mar 10 2011
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 10 Mar 2011 14:44:40 -0500, Jerry Quinn <jlquinn optonline.net>
wrote:

> Steven Schveighoffer Wrote:
>
>> Do you know what causes the OS to regard that memory as read-only?  Since
>> fork() is a C system call, and D gets its heap memory the same as any
>> other unix process (brk()), I can't see why it wouldn't work.  As long as
>> you do the same thing you do in C, I think it will work.
>
> It's not that the OS considers the memory actually read-only.  It uses
> copy-on-write so the pages will be shared between the processes until one
> or the other attempts to write to the page.  So if the garbage collector
> moves things around, it will cause the pages to be copied and unshared.
> So my question is really probably whether the garbage collector will tend
> to dirty shared pages or not.
Some pages are made of bins of smaller blocks. For example, a page may be a set of 16-byte blocks. In this case, it's entirely possible that both process-local and process-shared data can be in the same page. To get around this, allocate blocks larger than PAGESIZE/2, then use those to hold your read-only data.

The GC stores its metadata in separate pages from the actual data, so you don't have to worry about the GC dirtying pages that hold static data (for example, during a collection).

You also always have the option of using C malloc if you prefer to avoid GC involvement.

-Steve
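
The "allocate blocks larger than PAGESIZE/2" advice could be sketched roughly as follows. This is only an illustration under stated assumptions: the 4096-byte page size is assumed rather than queried from the runtime, the NO_SCAN attribute is an extra that is only appropriate when the data holds no pointers, and roundUpToPages and allocReadMostly are hypothetical names.

```d
import core.memory : GC;
import std.stdio : writeln;

// Assumption: the GC works in 4096-byte pages. Check this against the
// runtime you actually use.
enum size_t assumedPageSize = 4096;

/// Rounds n up to a whole number of assumed pages.
size_t roundUpToPages(size_t n)
{
    return (n + assumedPageSize - 1) / assumedPageSize * assumedPageSize;
}

/// Allocates a GC block that spans whole pages, so it is not binned
/// together with small process-local objects. Returns a slice of the
/// requested length n. (Hypothetical helper.)
ubyte[] allocReadMostly(size_t n)
{
    immutable size = roundUpToPages(n == 0 ? 1 : n);
    // NO_SCAN: the block is treated as pointer-free (an assumption here).
    auto p = cast(ubyte*) GC.malloc(size, GC.BlkAttr.NO_SCAN);
    return p[0 .. n];
}

void main()
{
    auto table = allocReadMostly(10);
    table[] = 7; // fill the data before forking, then only read it
    writeln(roundUpToPages(10), " ", table.length); // prints 4096 10
}
```

Whether a given D runtime really keeps such blocks apart from the small-object bins is an implementation detail, so it is worth measuring shared versus private memory after the fork rather than relying on this sketch.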
Mar 10 2011
Lionello Lunesu <lio lunesu.remove.com> writes:
On 10-3-2011 6:56, Jerry Quinn wrote:
> Where I work, we find it very useful to start a process, load data, then
> fork() to parallelize.  Our data is large, such that we'd run out of memory
> trying to run a complete copy on each core.  Once the process is loaded, we
> don't need that much writable memory, so fork is appealing to share the loaded
> pages.  It's possible to use mmap for some of the data, but inconvenient for
> other data, even though it's read-only at runtime.
>
> So here's my question:  In D, if I create a lot of data in the
> garbage-collected heap that will be read-only, then fork the process, will I
> get the benefit of the operating system's copy-on-write and only use a small
> amount of additional memory per process?
>
> In case you're wondering why I wouldn't use threading, one argument is that if
> you have a bug and the process crashes, you only lose one process instead of N
> threads.  That's actually useful for robustness.
>
> Thoughts?
D's try-catch will catch all errors, even access violations and stack overflow:

    import std.stdio;

    void so() { so(); }

    void main()
    {
        try
        {
            so();
        }
        catch (Throwable)
        {
            // swallow the error so main can continue
        }
        writeln("graceful exit");
    }

By wrapping each thread's code in try-catch you can handle each thread going down. Of course, a thread can still corrupt the memory of another thread.

To share memory between processes you'd have to use an OS-specific API. On Windows you'd use a file mapping.

L.
Mar 10 2011
"Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Thu, 10 Mar 2011 16:13:16 +0200, Lionello Lunesu
<lio lunesu.remove.com> wrote:

> D's try-catch will catch all errors, even access violations and stack
> overflow:

Only on Windows.

--
Best regards,
Vladimir  mailto:vladimir thecybershadow.net
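
On POSIX systems an access violation normally arrives as a signal rather than as a D exception, which is part of why the process-per-worker design from the original post is attractive there. A minimal sketch, assuming POSIX and the default signal disposition: raise(SIGSEGV) stands in for a real bad access, and signalThatKilledWorker is a hypothetical name.

```d
import core.stdc.signal : raise, SIGSEGV;
import core.sys.posix.sys.wait : waitpid, WIFSIGNALED, WTERMSIG;
import core.sys.posix.unistd : fork, _exit;
import std.exception : enforce;
import std.stdio : writeln;

/// Forks a worker that dies from SIGSEGV and returns the signal number
/// the parent observes, or 0 if the worker was not killed by a signal.
/// (Hypothetical helper.)
int signalThatKilledWorker()
{
    auto pid = fork();
    enforce(pid >= 0, "fork failed");
    if (pid == 0)
    {
        raise(SIGSEGV); // stand-in for a real access violation
        _exit(0);       // not reached with the default disposition
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

void main()
{
    immutable sig = signalThatKilledWorker();
    // The parent keeps running even though the worker crashed.
    writeln("parent still running; worker died from SIGSEGV: ",
            sig == SIGSEGV);
}
```

The parent can use the reported status to restart the lost worker, which a try-catch around a thread body does not offer outside Windows.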
Mar 10 2011