www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Advice requested for fixing issue 17914

reply Brian Schott <briancschott gmail.com> writes:
Context: https://issues.dlang.org/show_bug.cgi?id=17914

I need to get this issue resolved as soon as possible so that the 
fix makes it into the next compiler release. Because it involves 
cleanup code in a class destructor a design change may be 
necessary. Who should I contact to determine the best way to fix 
this bug?
Oct 23
next sibling parent Kagamin <spam here.lot> writes:
Call destroy(fiber) when it completed execution? Who manages 
fibers?
Oct 24
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/23/17 12:56 PM, Brian Schott wrote:
 Context: https://issues.dlang.org/show_bug.cgi?id=17914
 
 I need to get this issue resolved as soon as possible so that the fix 
 makes it into the next compiler release. Because it involves cleanup 
 code in a class destructor a design change may be necessary. Who should 
 I contact to determine the best way to fix this bug?
A failing use case would help. Fixing a bug when you can't reproduce is difficult. -Steve
Oct 24
parent Brian Schott <briancschott gmail.com> writes:
On Tuesday, 24 October 2017 at 14:28:01 UTC, Steven Schveighoffer 
wrote:
 A failing use case would help. Fixing a bug when you can't 
 reproduce is difficult.

 -Steve
I've attached one to the bug report.
Oct 24
prev sibling next sibling parent reply qznc <qznc web.de> writes:
On Monday, 23 October 2017 at 16:56:32 UTC, Brian Schott wrote:
 Context: https://issues.dlang.org/show_bug.cgi?id=17914

 I need to get this issue resolved as soon as possible so that 
 the fix makes it into the next compiler release. Because it 
 involves cleanup code in a class destructor a design change may 
 be necessary. Who should I contact to determine the best way to 
 fix this bug?
Looking at git blame [0], I guess Martin Nowak and Nemanja Boric seem to be pretty involved. Not sure how deep Petar Kirov and Sean Kelly are into Fibers. My question wrt to the bug: Why is munmap/freeStack called in the destructor? Could be done right after termination? [0] https://github.com/dlang/druntime/blame/ec9a79e15d446863191308fd5e20febce2053546/src/core/thread.d#L4077
Oct 24
parent reply Brian Schott <briancschott gmail.com> writes:
On Tuesday, 24 October 2017 at 21:49:10 UTC, qznc wrote:
 My question wrt to the bug: Why is munmap/freeStack called in 
 the destructor? Could be done right after termination?
I've been reading the Fiber code and (so far) that seems seems to be reasonable. Can anybody think of a reason that this would be a bad idea? I'd rather not create a pull request for a design that's not going to work because of a detail I've overlooked.
Oct 24
parent safety0ff <safety0ff.dev gmail.com> writes:
On Wednesday, 25 October 2017 at 01:26:10 UTC, Brian Schott wrote:
 I've been reading the Fiber code and (so far) that seems seems 
 to be reasonable. Can anybody think of a reason that this would 
 be a bad idea? I'd rather not create a pull request for a 
 design that's not going to work because of a detail I've 
 overlooked.
Just skimming the Fiber code I found the reset(...) API functions whose purpose is to re-use Fibers once they've terminated. Eager stack deallocation would have to coexist with the Fiber reuse API. Perhaps the Fiber reuse API could simply be polished & made easy to integrate so that your original use case no longer hits system limits. I.e. Perhaps an optional delegate could be called upon termination, making it easier to hook in Fiber recycling. The reason my thoughts head in that direction is that I've read that mmap/unmmap 'ing frequently isn't recommended in performance conscious programs.
Oct 24
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/23/17 12:56 PM, Brian Schott wrote:
 Context: https://issues.dlang.org/show_bug.cgi?id=17914
 
 I need to get this issue resolved as soon as possible so that the fix 
 makes it into the next compiler release. Because it involves cleanup 
 code in a class destructor a design change may be necessary. Who should 
 I contact to determine the best way to fix this bug?
It appears that the limitation applies to mmap calls as well, and mmap call to allocate the stack has been in Fiber since as far as I can tell the beginning. How has this not shown up before? Regardless of the cause, this puts a limitation on the number of simultaneous Fibers one can have. In other words, this is not just a problem with Fibers not being cleaned up properly, because one may need more than 65k fibers actually running simultaneously. We should try to prevent that as a limitation. For example, even the following code I would think is something we should support: void main() { import std.concurrency : Generator, yield; import std.stdio : File, writeln; auto f = File("/proc/sys/vm/max_map_count", "r"); ulong n; f.readf("%d", &n); writeln("/proc/sys/vm/max_map_count = ", n); Generator!int[] gens; // retain pointers to all the generators foreach (i; 0 .. n + 1000) { if (i % 1000 == 0) writeln("i = ", i); gens ~= new Generator!int({ yield(1); }); } } If we *can't* do this, then we should provide a way to manage the limits I.e. there should be a way to be able to create more than the limit's number of fibers, but only allocate stacks when we can (and have a way to tell the user what's going on). -Steve
Oct 25
next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via 
Digitalmars-d wrote:
 On 10/23/17 12:56 PM, Brian Schott wrote:
 Context: https://issues.dlang.org/show_bug.cgi?id=17914

 I need to get this issue resolved as soon as possible so that the fix
 makes it into the next compiler release. Because it involves cleanup
 code in a class destructor a design change may be necessary. Who should
 I contact to determine the best way to fix this bug?
It appears that the limitation applies to mmap calls as well, and mmap call to allocate the stack has been in Fiber since as far as I can tell the beginning. How has this not shown up before?
Maybe there was a change in the OS(es) being used that affected the limit? - Jonathan M Davis
Oct 25
next sibling parent reply Nemanja Boric <4burgos gmail.com> writes:
On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis 
wrote:
 On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer 
 via Digitalmars-d wrote:
 On 10/23/17 12:56 PM, Brian Schott wrote:
 Context: https://issues.dlang.org/show_bug.cgi?id=17914

 I need to get this issue resolved as soon as possible so 
 that the fix makes it into the next compiler release. 
 Because it involves cleanup code in a class destructor a 
 design change may be necessary. Who should I contact to 
 determine the best way to fix this bug?
It appears that the limitation applies to mmap calls as well, and mmap call to allocate the stack has been in Fiber since as far as I can tell the beginning. How has this not shown up before?
Maybe there was a change in the OS(es) being used that affected the limit? - Jonathan M Davis
Yes, the stack is not immediately unmapped because it's very common just to reset the fiber and reuse it for handling the new connection - creating new fibers (and doing unmap on termination) is a problem in the real life (as this is as well). At sociomantic we already had this issue: https://github.com/sociomantic-tsunami/tangort/issues/2 Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.
Oct 25
next sibling parent reply Nemanja Boric <4burgos gmail.com> writes:
On Wednesday, 25 October 2017 at 15:12:27 UTC, Nemanja Boric 
wrote:
 On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis 
 wrote:
 On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer 
 via Digitalmars-d wrote:
 [...]
Maybe there was a change in the OS(es) being used that affected the limit? - Jonathan M Davis
Yes, the stack is not immediately unmapped because it's very common just to reset the fiber and reuse it for handling the new connection - creating new fibers (and doing unmap on termination) is a problem in the real life (as this is as well). At sociomantic we already had this issue: https://github.com/sociomantic-tsunami/tangort/issues/2 Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.
I'm not sure that would allow us to mprotect the page guards, though.
Oct 25
parent reply Nemanja Boric <4burgos gmail.com> writes:
On Wednesday, 25 October 2017 at 15:22:18 UTC, Nemanja Boric 
wrote:
 On Wednesday, 25 October 2017 at 15:12:27 UTC, Nemanja Boric 
 wrote:
 On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M 
 Davis wrote:
 On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer 
 via Digitalmars-d wrote:
 [...]
Maybe there was a change in the OS(es) being used that affected the limit? - Jonathan M Davis
Yes, the stack is not immediately unmapped because it's very common just to reset the fiber and reuse it for handling the new connection - creating new fibers (and doing unmap on termination) is a problem in the real life (as this is as well). At sociomantic we already had this issue: https://github.com/sociomantic-tsunami/tangort/issues/2 Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.
I'm not sure that would allow us to mprotect the page guards, though.
I think the easiest way to proceed from here is to default the stack protection page to 0, so we avoid this regression. Once that's in, we can think about different allocation strategy for the fiber's stacks or throwing the exceptions on too many fibers (too bad mmap doesn't return error code, but you get SIGABRT instead :( ) What do you think?
Oct 25
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/25/17 11:27 AM, Nemanja Boric wrote:
 On Wednesday, 25 October 2017 at 15:22:18 UTC, Nemanja Boric wrote:
 On Wednesday, 25 October 2017 at 15:12:27 UTC, Nemanja Boric wrote:
 On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
 On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via 
 Digitalmars-d wrote:
 [...]
Maybe there was a change in the OS(es) being used that affected the limit? - Jonathan M Davis
Yes, the stack is not immediately unmapped because it's very common just to reset the fiber and reuse it for handling the new connection - creating new fibers (and doing unmap on termination) is a problem in the real life (as this is as well). At sociomantic we already had this issue: https://github.com/sociomantic-tsunami/tangort/issues/2 Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.
I'm not sure that would allow us to mprotect the page guards, though.
I think the easiest way to proceed from here is to default the stack protection page to 0, so we avoid this regression. Once that's in, we can think about different allocation strategy for the fiber's stacks or throwing the exceptions on too many fibers (too bad mmap doesn't return error code, but you get SIGABRT instead :( )
mmap does return an error. And onOutMemoryError is called when it fails. https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4518 Which should throw an error: https://github.com/dlang/druntime/blob/144c9e6e9a3c00aba82b92da527a52190fe91c97/src/core/exception.d#L542 however, when mprotect fails, it calls abort(): https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4540
 What do you think?
I think we should reverse the mprotect default, and try and determine a better way to opt-in to the limit. Is this a Linux-specific problem? Are there issues with mprotect on other OSes? Or is Linux the only OS that supports mprotect? -Steve
Oct 25
parent Nemanja Boric <4burgos gmail.com> writes:
On Wednesday, 25 October 2017 at 15:43:12 UTC, Steven 
Schveighoffer wrote:
 On 10/25/17 11:27 AM, Nemanja Boric wrote:
 On Wednesday, 25 October 2017 at 15:22:18 UTC, Nemanja Boric 
 wrote:
 On Wednesday, 25 October 2017 at 15:12:27 UTC, Nemanja Boric 
 wrote:
 On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M 
 Davis wrote:
 On Wednesday, October 25, 2017 09:26:26 Steven 
 Schveighoffer via Digitalmars-d wrote:
 [...]
Maybe there was a change in the OS(es) being used that affected the limit? - Jonathan M Davis
Yes, the stack is not immediately unmapped because it's very common just to reset the fiber and reuse it for handling the new connection - creating new fibers (and doing unmap on termination) is a problem in the real life (as this is as well). At sociomantic we already had this issue: https://github.com/sociomantic-tsunami/tangort/issues/2 Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.
I'm not sure that would allow us to mprotect the page guards, though.
I think the easiest way to proceed from here is to default the stack protection page to 0, so we avoid this regression. Once that's in, we can think about different allocation strategy for the fiber's stacks or throwing the exceptions on too many fibers (too bad mmap doesn't return error code, but you get SIGABRT instead :( )
mmap does return an error. And onOutMemoryError is called when it fails. https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4518 Which should throw an error: https://github.com/dlang/druntime/blob/144c9e6e9a3c00aba82b92da527a52190fe91c97/src/core/exception.d#L542 however, when mprotect fails, it calls abort(): https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4540
 What do you think?
I think we should reverse the mprotect default, and try and determine a better way to opt-in to the limit. Is this a Linux-specific problem? Are there issues with mprotect on other OSes? Or is Linux the only OS that supports mprotect? -Steve
Reading FreeBSD man pages, it looks like at least FreeBSD has the same limitation in the mmap, but the man pages for the mprotect doesn't specify this. However, I believe it's just the issue in the man pages, as it would be the same thing really as with Linux. Linux's manpage for mprotect specifically says this:
        ENOMEM Changing the protection of a memory region would 
 result in the
total number of mappings with distinct attributes (e.g., read versus read/write protection) exceeding the allowed maximum. (For example, making the protection of a range PROT_READ in the middle of a region currently protected as PROT_READ|PROT_WRITE would result in three mappings: two read/write mappings at each end and a read-only mapping in the middle.) so I was right about doubling the mappings.
 I think we should reverse the mprotect default, and try and
determine a better way to opt-in to the limit.
I agree: https://github.com/dlang/druntime/pull/1956
Oct 25
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/25/17 11:12 AM, Nemanja Boric wrote:
 On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
 On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via 
 Digitalmars-d wrote:
 On 10/23/17 12:56 PM, Brian Schott wrote:
 Context: https://issues.dlang.org/show_bug.cgi?id=17914

 I need to get this issue resolved as soon as possible so > that the 
fix makes it into the next compiler release. > Because it involves cleanup code in a class destructor a > design change may be necessary. Who should I contact to > determine the best way to fix this bug? It appears that the limitation applies to mmap calls as well, and mmap call to allocate the stack has been in Fiber since as far as I can tell the beginning. How has this not shown up before?
Maybe there was a change in the OS(es) being used that affected the limit?
Yes, the stack is not immediately unmapped because it's very common just to reset the fiber and reuse it for handling the new connection - creating new fibers (and doing unmap on termination) is a problem in the real life (as this is as well). At sociomantic we already had this issue: https://github.com/sociomantic-tsunami/tangort/issues/2 Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.
Hm... the mprotect docs specifically state that calling mprotect on something that's not allocated via mmap is undefined. So if you use GC to allocate Fiber stacks, you can't mprotect it. I think what we need is a more configurable way to allocate stacks. There is a tradeoff to mprotect vs. simple allocation, and it's not obvious to choose one over the other. I still am baffled as to why this is now showing up. Perhaps if you are using mmap as an allocator (as Fiber seems to be doing), it doesn't count towards the limit? Maybe it's just glommed into the standard allocator's space? -Steve
Oct 25
parent Nemanja Boric <4burgos gmail.com> writes:
On Wednesday, 25 October 2017 at 15:32:36 UTC, Steven 
Schveighoffer wrote:
 On 10/25/17 11:12 AM, Nemanja Boric wrote:
 On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M 
 Davis wrote:
 On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer 
 via Digitalmars-d wrote:
 [...]
Maybe there was a change in the OS(es) being used that affected the limit?
Yes, the stack is not immediately unmapped because it's very common just to reset the fiber and reuse it for handling the new connection - creating new fibers (and doing unmap on termination) is a problem in the real life (as this is as well). At sociomantic we already had this issue: https://github.com/sociomantic-tsunami/tangort/issues/2 Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.
Hm... the mprotect docs specifically state that calling mprotect on something that's not allocated via mmap is undefined. So if you use GC to allocate Fiber stacks, you can't mprotect it. I think what we need is a more configurable way to allocate stacks. There is a tradeoff to mprotect vs. simple allocation, and it's not obvious to choose one over the other. I still am baffled as to why this is now showing up. Perhaps if you are using mmap as an allocator (as Fiber seems to be doing), it doesn't count towards the limit? Maybe it's just glommed into the standard allocator's space? -Steve
I'm sorry I wrote several messages in the row, as the thoughts were coming to me. I think the reason is that mprotect creates a new range, since it needs to have distinct protection attributes, hence doubling the number of mappings.
 Maybe it's just glommed into the standard allocator's space?
No, you get to see each fiber's stack allocated separately when you cat /proc/pid/mappings.
Oct 25
prev sibling parent Nemanja Boric <4burgos gmail.com> writes:
On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis 
wrote:
 On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer 
 via Digitalmars-d wrote:
 On 10/23/17 12:56 PM, Brian Schott wrote:
 Context: https://issues.dlang.org/show_bug.cgi?id=17914

 I need to get this issue resolved as soon as possible so 
 that the fix makes it into the next compiler release. 
 Because it involves cleanup code in a class destructor a 
 design change may be necessary. Who should I contact to 
 determine the best way to fix this bug?
It appears that the limitation applies to mmap calls as well, and mmap call to allocate the stack has been in Fiber since as far as I can tell the beginning. How has this not shown up before?
Maybe there was a change in the OS(es) being used that affected the limit? - Jonathan M Davis
On linux it's controllable by: `sysctl vm.max_map_count`
Oct 25
prev sibling parent Nemanja Boric <4burgos gmail.com> writes:
On Wednesday, 25 October 2017 at 13:26:26 UTC, Steven 
Schveighoffer wrote:
 On 10/23/17 12:56 PM, Brian Schott wrote:
 Context: https://issues.dlang.org/show_bug.cgi?id=17914
 
 I need to get this issue resolved as soon as possible so that 
 the fix makes it into the next compiler release. Because it 
 involves cleanup code in a class destructor a design change 
 may be necessary. Who should I contact to determine the best 
 way to fix this bug?
It appears that the limitation applies to mmap calls as well, and mmap call to allocate the stack has been in Fiber since as far as I can tell the beginning. How has this not shown up before?
Although after the stack overflow protection for fibers number of mmap calls is the same, _I think_ mprotect actually splits the single mmapped region into two (as they share different protection bits). This effectively doubles the number of the regions. As a workaround (if you have lots of fibers and doesn't care about stack overflow protection) you can just pass zero for the guardPageSize: https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4037
Oct 25