www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - StackThreads and Phobos

reply mclysenk mtu.edu writes:
Hello fellow D users!  I've been working diligently on improving StackThreads
(http://assertfalse.com), and so far D has worked wonderfully.  The same can not
be said for Phobos...  (This may get a bit long winded.)

---

1.  removeRange is really slow.

In order to prevent the garbage collector from accidentally deleting valid
objects, we need to mark the stack of each user-context.

One naive solution is to simply mark the entire stack region, however this isn't
a very good idea.

Say I create a user context which calls a function and then yields.  If that
function performed any allocations, then they would remain on the stack, and get
scanned by the GC.  Until I overwrite those values, the GC will always see them,
and therefore never collect their memory. 

This results in memory leaks, and a dramatic loss of performance.  The solution
is to dynamically resize the range used by the stack.  When we switch into a
context, we remove the range, and before we leave, we add it back.  This nicely
solves the problem, except for one thing - removeRange requires O(n) time.

This performance penalty applies to each Context, increasing the cost of
switching dramatically.

A simple solution is to use a hash set or another efficient data structure.
This would also be able to detect range address conflicts, such as adding a
range twice.

As a short term fix, I propose changing line 987 of internal/gcx.d from:

memmove(ranges + i, ranges + i + 1, (nranges - i) * ranges[0].sizeof);

to:

memmove(ranges + i, ranges + nranges, ranges[0].sizeof);

While this does not remove the sluggish linear search, it does eliminate a very
large memory copy.  The cost of this approach is that the ordering of ranges is
not preserved, which may increase look up time in some circumstances.

Another possibility is to allow for 'dynamic' ranges, which the user would be
able to resize.  In this situation, we could still use a linear search for
removal, however changing the range's dimensions is a constant time operation.
Thread safety could be insured via an atomic operation, ie CMPXCHG.


2.  It is impossible to safely handle a page fault.

Another problem with StackThreads right now is stack growth.  In a regular
context, the operating system automatically grows the stack once the program
hits the bottom.  It keeps on doing this until there is no more address
space/memory left, and then it throws an overflow.

In StackThreads, I'd like to do the same thing.  However, DMD transforms any
page faults into exceptions, which it then unwinds up the call stack.  This
makes it impossible to trap, since the exception handler will unwind the stack
until it hits a try block with a matching catch - and only then will it get
handled.  From this state, there is no way to recover the program, since we
cannot undo the unwind.  The only tenable option is a global fault and
termination of the context.

In windows, you could try to install a custom exception handler, and trap any
stack overflows yourself - except DMD automatically installs an SEH for each
try..catch block.  It is pretty much inevitable that a hapless user will eat
your page fault when he is trying to catch some other sort of exception.

Call backs are one solution, user programs can carefully handle page faults and
stack overflows without disturbing the state of the program.


---

There are a few more issues, but these are the only two I have encountered that
I couldn't hack around.

My personal view is that integrated user-context switching would fix most of
those issues, and provide a great deal of flexibility.  Once the GC issues are
resolved, it can be made very fast.  In fact, the only overhead (outside GC
management) is equivalent to a single function call.

Standard user level contexts make all sorts of things possible, like coroutines
or micro threads.  They can be used to iterate over complex data structures
trivially.  They are much simpler and faster than threads for GUIs, and
eliminate the need for many state machine objects.

In conjunction with standard preemptive threading, user-contexts provide an
elegant solution to many practical programming problems.  Standard library
integration will result in a much faster context switching and better future
support.


-Mikola Lysenko
Jun 30 2006
parent reply Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
mclysenk mtu.edu wrote:
 Hello fellow D users!  I've been working diligently on improving StackThreads
 (http://assertfalse.com), and so far D has worked wonderfully.  The same can
not
 be said for Phobos...  (This may get a bit long winded.)
 
 ---
 
 1.  removeRange is really slow.
 
 In order to prevent the garbage collector from accidentally deleting valid
 objects, we need to mark the stack of each user-context.
 
 One naive solution is to simply mark the entire stack region, however this
isn't
 a very good idea.
 
 Say I create a user context which calls a function and then yields.  If that
 function performed any allocations, then they would remain on the stack, and
get
 scanned by the GC.  Until I overwrite those values, the GC will always see
them,
 and therefore never collect their memory. 
 
 This results in memory leaks, and a dramatic loss of performance.  The solution
 is to dynamically resize the range used by the stack.  When we switch into a
 context, we remove the range, and before we leave, we add it back.  This nicely
 solves the problem, except for one thing - removeRange requires O(n) time.
 
But isn't that the correct behavior? If one can return to that function instance(context), shouldn't all allocations made previously by that function remain valid? (note: I haven't used your library, I just read your post and this question came up) Also: Where do you keep the data from each context? In the heap? -- Bruno Medeiros - CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 02 2006
parent reply mclysenk mtu.edu writes:
In article <e88ltq$kvc$3 digitaldaemon.com>, Bruno Medeiros says...
But isn't that the correct behavior? If one can return to that function 
instance(context), shouldn't all allocations made previously by that 
function remain valid?
(note: I haven't used your library, I just read your post and this 
question came up)
It is not correct behavior. Here's an example which may illustrate this better: #void func1() #void func2() Here is a trace of the execution of func1 alongside it's stack In this situation, it is doubtful that arr will ever get collected. In order for the GC to remove it, it must not find ANY references to arr in memory. Period. If the GC scans the entire stack, it will see a reference to arr, and therefore it will never collect arr. This is a trivial example, but it is not difficult to see that these sorts of memory leaks can become very substantial and very difficult to track down. The problem is how to tell the garbage collector not to scan past the stack pointer. The only way at the moment is to use addRange and removeRange, or zero out all of the unused stack memory every time you yield. Either option has a substantial overhead.
Also: Where do you keep the data from each context? In the heap?
Heap. The next version of stackthreads will replace malloc/free with system specific allocators, such as mmap and VirtualAlloc. -Mikola Lysenko
Jul 02 2006
parent reply Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
mclysenk mtu.edu wrote:
 In article <e88ltq$kvc$3 digitaldaemon.com>, Bruno Medeiros says...
 But isn't that the correct behavior? If one can return to that function 
 instance(context), shouldn't all allocations made previously by that 
 function remain valid?
 (note: I haven't used your library, I just read your post and this 
 question came up)
It is not correct behavior. Here's an example which may illustrate this better: #void func1() #void func2() Here is a trace of the execution of func1 alongside it's stack In this situation, it is doubtful that arr will ever get collected. In order for the GC to remove it, it must not find ANY references to arr in memory. Period. If the GC scans the entire stack, it will see a reference to arr, and therefore it will never collect arr. This is a trivial example, but it is not difficult to see that these sorts of memory leaks can become very substantial and very difficult to track down. The problem is how to tell the garbage collector not to scan past the stack pointer. The only way at the moment is to use addRange and removeRange, or zero out all of the unused stack memory every time you yield. Either option has a substantial overhead.
 Also: Where do you keep the data from each context? In the heap?
Heap. The next version of stackthreads will replace malloc/free with system specific allocators, such as mmap and VirtualAlloc. -Mikola Lysenko
Unless my understanding of this is wrong, the use of the word "never" (gets collected) is too strong. If during normal program flow the stack is overwritten again, the reference becomes free. Or am I missing something? Also, doesn't this happen with any regular function? -- Bruno Medeiros - CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 03 2006
parent reply mclysenk mtu.edu writes:
In article <e8artd$16e0$1 digitaldaemon.com>, Bruno Medeiros says...
Unless my understanding of this is wrong, the use of the word "never" 
(gets collected) is too strong. If during normal program flow the stack 
is overwritten again, the reference becomes free. Or am I missing something?
Also, doesn't this happen with any regular function?
Well, you are correct, that it may still get collected in the future - regardless, its prospects are fairly dim. You are also right in your statement that this is not abnormal behavior for functions, and the GC knows this. It has a special case which handles the stack of a regular function. When the GC does a full collect, it pauses all threads and marks the range from the stack pointer to the bottom of the stack. This prevents it from scanning stuff which is no longer valid, since that range never gets marked. This works very well. The bad part happens when we create multiple stacks for one thread. In order for the GC to properly scan them, we need to mark their stack when it is not in use. If the GC didn't scan their stack, then we'd accidentally over-collect stuff they were using. If we mark the entire stack range, then we'd under-collect and get a bunch of memory leaks. To fix this, we use addRange to mark the active part of their stack before we switch out their context, and removeRange to delete their old - and potentially invalid - stack range when we switch into their context. This part is tricky, and it burned me many times while I was working on stack threads before I finally caught on. It is not a big deal to fix once you know what's happening. The main disadvantage at the moment is removeRange - it's just too slow! It can be fixed, but it requires changing phobos. -Mikola Lysenko
Jul 03 2006
parent reply Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
mclysenk mtu.edu wrote:
 In article <e8artd$16e0$1 digitaldaemon.com>, Bruno Medeiros says...
 Unless my understanding of this is wrong, the use of the word "never" 
 (gets collected) is too strong. If during normal program flow the stack 
 is overwritten again, the reference becomes free. Or am I missing something?
 Also, doesn't this happen with any regular function?
Well, you are correct, that it may still get collected in the future - regardless, its prospects are fairly dim. You are also right in your statement that this is not abnormal behavior for functions, and the GC knows this. It has a special case which handles the stack of a regular function. When the GC does a full collect, it pauses all threads and marks the range from the stack pointer to the bottom of the stack. This prevents it from scanning stuff which is no longer valid, since that range never gets marked. This works very well. The bad part happens when we create multiple stacks for one thread. In order for the GC to properly scan them, we need to mark their stack when it is not in use. If the GC didn't scan their stack, then we'd accidentally over-collect stuff they were using. If we mark the entire stack range, then we'd under-collect and get a bunch of memory leaks. To fix this, we use addRange to mark the active part of their stack before we switch out their context, and removeRange to delete their old - and potentially invalid - stack range when we switch into their context. This part is tricky, and it burned me many times while I was working on stack threads before I finally caught on. It is not a big deal to fix once you know what's happening. The main disadvantage at the moment is removeRange - it's just too slow! It can be fixed, but it requires changing phobos. -Mikola Lysenko
That was clear now, but for what I understand, then your problem is simply that you must use removeRange(and it is slow). That is, it doesn't really have to do with function contexts and stacks. Anyone who allocates manually managed memory for any purpose, and then uses addRange on it, will have the same problem, right? (because they too will have to use removeRange) -- Bruno Medeiros - CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 03 2006
parent mclysenk mtu.edu writes:
In article <e8baer$1ond$1 digitaldaemon.com>, Bruno Medeiros says...
That was clear now, but for what I understand, then your problem is 
simply that you must use removeRange(and it is slow).
That is, it doesn't really have to do with function contexts and stacks. 
Anyone who allocates manually managed memory for any purpose, and then 
uses addRange on it, will have the same problem, right? (because they 
too will have to use removeRange)
Correct. This problem will affect everyone who uses addRange/removeRange. -Mikola Lysenko
Jul 03 2006