digitalmars.D - Stack frames larger than 4K should be rejected, but what if I want
- IGotD- (20/20) Jun 27 2021 This is a breakout thread from this thread.
- Ola Fosheim Grøstad (4/7) Jun 27 2021 It makes no sense and would kill a system level language. The
- Ola Fosheim Grøstad (5/8) Jun 27 2021 It makes no sense and would kill a system level language. The
- Dennis (8/9) Jun 27 2021 While it's not specified in "The compiler should reject any stack
- Dennis (18/20) Jun 27 2021 Actually, that's hard to realize, since the check for `@safe` is
- IGotD- (6/23) Jun 27 2021 That's a good observation. Does this mean that the point of the
- Ola Fosheim Grøstad (17/19) Jun 28 2021 No, because the compiler can refuse to inline, but it is a bad
- Luis (6/25) Jun 28 2021 I'm pretty worried how this would affect the actual
- Ola Fosheim Grøstad (12/16) Jun 28 2021 Did you mean the proposed 4K limitation? Your project sounds like
- IGotD- (19/27) Jun 28 2021 Hitting guard pages and you will get an exception and a core
- Ola Fosheim Grøstad (17/27) Jun 28 2021 You should be able to trap it and use the program counter/stack
- Ola Fosheim Grøstad (3/6) Jun 28 2021 Isn't this what D is doing now, anyway?
- SealabJaster (8/10) Jun 28 2021 Slightly related, but I was recently learning how to implement
- IGotD- (12/18) Jun 28 2021 printf and conversion functions are usually a typical example of
- Walter Bright (3/6) Jun 27 2021 dmd already does that. But very few programmers are aware of this, and i...
- ag0aep6g (6/12) Jun 28 2021 ??
- Walter Bright (2/4) Jun 28 2021 https://github.com/dlang/dmd/blob/master/src/dmd/backend/cod3.d#L3599
- ag0aep6g (9/10) Jun 28 2021 From there:
- IGotD- (12/16) Jun 27 2021 Yes kind of, and I can think of use cases when you want big
- Walter Bright (7/22) Jun 27 2021 There's a huge difference between the virtual address range set aside fo...
- Elronnd (5/13) Jun 27 2021 If you need to quickly allocate very large buffers, you _can_
- Ola Fosheim Grøstad (8/11) Jun 27 2021 You think more than 4K for a stack frame is very large? That is a
- rikki cattermole (2/5) Jun 27 2021 You can set it for fibers too!
- Steven Schveighoffer (7/32) Jun 28 2021 Could this be fixed by not allowing *uninitialized* stack segments
- IGotD- (4/10) Jun 28 2021 Why? When you decide not to initialize you also surpass the
- Steven Schveighoffer (10/22) Jun 28 2021 The point is to ensure the guard page is triggered. This is not about
- IGotD- (5/16) Jun 28 2021 If you want stack overflow safety for whatever reason, then you
- sighoya (2/4) Jun 28 2021 Ditto. D shouldn't become a vendor.
- Ola Fosheim Grøstad (10/15) Jun 28 2021 Trapping page 0 is not foolproof. For instance if you have a
- Vladimir Panteleev (7/14) Jun 28 2021 I recall that Pascal debug builds added manual checks for stack
- Ola Fosheim Grøstad (5/11) Jun 29 2021 You could, if all your code is written in D, but D has to adapt
This is a breakout thread from this thread.

https://forum.dlang.org/thread/ghodronzxeirokyoqeag@forum.dlang.org

According to the comment from Walter:

*As per Walter's recent comment in that thread, he asserts that stack allocations beyond 4k should be rejected.*

in this issue:

https://issues.dlang.org/show_bug.cgi?id=17566

In general I can agree with the rationale to not allow stack frames larger than 4K when it comes to normal programming, it makes sense. However, since D is supposed to be a "systems programming language", the language should not dictate what the programmer is able to do with the stack. It also assumes that all systems have 4K pages and that the stack should only be used moderately. There might be cases when the programmer wants to go nuts with the stack and put a lot of storage on it, for whatever reason.

Therefore I think this limit should not be part of the language definition. First, this 4K limit should be configurable per operating system. Also, there should be an option to override this limit, for example with a compiler option.
Jun 27 2021
On Sunday, 27 June 2021 at 18:56:57 UTC, IGotD- wrote:
> In general I can agree with the rationale to not allow stack frames larger than 4K when it comes to normal programming, it makes sense.

It makes no sense and would kill a system level language. The default stack size on Linux is 8MiB. 4KiB isn't even enough to fit a commonly sized FFT buffer.
Jun 27 2021
On Sunday, 27 June 2021 at 18:56:57 UTC, IGotD- wrote:
> In general I can agree with the rationale to not allow stack frames larger than 4K when it comes to normal programming, it makes sense.

It makes no sense and would kill a system level language. The default stack size on Linux is 8MiB. 4KiB isn't even enough to fit a commonly sized FFT buffer; anything less than 16KiB is a joke IMHO.
Jun 27 2021
On Sunday, 27 June 2021 at 19:41:40 UTC, Ola Fosheim Grøstad wrote:
> It makes no sense and would kill a system level language.

While it's not specified in "The compiler should reject any stack frame that's larger than 4K", I think it's only meant to apply to @safe functions, not @system or @trusted ones.

Also, instead of straight up rejecting large stack frames in @safe code, the compiler could also start such a function by probing the guard page with writes at intervals of 4K.
Jun 27 2021
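A minimal sketch of the probing Dennis describes, assuming a downward-growing stack and a 4 KiB guard page. A prologue touches one byte in every page of a large frame, nearest page first, so the first access beyond the committed stack always lands on the guard page instead of jumping past it. `probeOffsets` is a hypothetical helper, not dmd's or LDC's actual code; it only computes the offsets below the incoming stack pointer that such a prologue would write to.

```D
/// Hypothetical helper: byte offsets below the incoming stack pointer that a
/// stack-probing prologue would write to, nearest first.
/// Assumes the stack grows downward and the guard area is `pageSize` bytes.
size_t[] probeOffsets(size_t frameSize, size_t pageSize = 4096) pure nothrow @safe
{
    size_t[] offsets;
    // One probe per page, walking away from the caller's frame, so no two
    // consecutive probes are more than one page apart.
    for (size_t off = pageSize; off < frameSize; off += pageSize)
        offsets ~= off;
    // Finally touch the far end of the frame itself.
    if (frameSize > 0)
        offsets ~= frameSize;
    return offsets;
}
```

For a 10,000-byte frame this yields `[4096, 8192, 10000]`; a frame of one page or less needs only a single probe.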
On Sunday, 27 June 2021 at 20:17:48 UTC, Dennis wrote:
> I think it's only meant to apply to @safe functions, not @system or @trusted ones.

Actually, that's hard to realize, since the check for `@safe` is a semantic check in the frontend, while final stack sizes are only known by the backend. Making the frontend guess an upper bound is hard because of tail calls and/or inlining, e.g.:

```D
void f(ubyte[] x)
{
    ubyte[4000] bufA = void; // 4000 bytes, not initialized
    g(bufA[]);
}

void g(ubyte[] bufA)
{
    ubyte[4000] bufB = void; // another 4000 bytes once g is inlined into f
    h(bufA, bufB);
}

void h(ubyte[] x, ubyte[] y); // declared only, so the optimizer cannot see into it
```

With ldc -O3, the stack frame of `f` is 8008 bytes because it has `g` inlined.
Jun 27 2021
On Sunday, 27 June 2021 at 22:01:22 UTC, Dennis wrote:
> Actually, that's hard to realize, since the check for `@safe` is a semantic check in the frontend, while final stack sizes are only known by the backend. Making the frontend guess an upper bound is hard because of tail calls and/or inlining, e.g.:
>
> ```D
> void f(ubyte[] x)
> {
>     ubyte[4000] bufA = void;
>     g(bufA[]);
> }
>
> void g(ubyte[] bufA)
> {
>     ubyte[4000] bufB = void;
>     h(bufA, bufB);
> }
>
> void h(ubyte[] x, ubyte[] y);
> ```
>
> With ldc -O3, the stack frame of `f` is 8008 bytes because it has `g` inlined.

That's a good observation. Does this mean that the point of the suggested 4K limit falls apart? In practice, if you want to prevent stack overflow and be sure about it, I think you need a check for every new frame. This has a performance impact, but safety usually has that.
Jun 27 2021
On Sunday, 27 June 2021 at 22:16:12 UTC, IGotD- wrote:
> That's a good observation. Does this mean that the point of the suggested 4K limit falls apart?

No, because the compiler can refuse to inline, but it is a bad idea. It also means that @safe functions cannot call FFI; how can you prove that FFI does not exceed 4K after inlining?

D is trying to become a high level language, but is also aiming to be system level, and that is not really an attainable goal. You have to make a choice. It would be better for D to not go for 100% safe, but instead prevent common mistakes.

I've never run out of stack for any code I've written, ever. Maybe that means allocating larger stacks so that guard pages are never hit, adding more guard pages, and simply terminating when guard pages are hit.

Other alternatives: do stack depth analysis, add stackless coroutines.

Anyway, for a system level language, those choices should be in the hands of the programmer. I agree with you on that.
Jun 28 2021
On Monday, 28 June 2021 at 07:10:05 UTC, Ola Fosheim Grøstad wrote:
> On Sunday, 27 June 2021 at 22:16:12 UTC, IGotD- wrote:
>> That's a good observation. Does this mean that the point of the suggested 4K limit falls apart?
>
> No, because the compiler can refuse to inline, but it is a bad idea. It also means that @safe functions cannot call FFI; how can you prove that FFI does not exceed 4K after inlining?
>
> D is trying to become a high level language, but is also aiming to be system level, and that is not really an attainable goal. You have to make a choice. It would be better for D to not go for 100% safe, but instead prevent common mistakes.
>
> I've never run out of stack for any code I've written, ever. Maybe that means allocating larger stacks so that guard pages are never hit, adding more guard pages, and simply terminating when guard pages are hit.
>
> Other alternatives: do stack depth analysis, add stackless coroutines.
>
> Anyway, for a system level language, those choices should be in the hands of the programmer. I agree with you on that.

I'm pretty worried about how this would affect the current coroutines/fibers implementation. I'm slowly trying to write a retro game engine that uses a lot of fibers (hundreds!) on a single thread.
Jun 28 2021
On Monday, 28 June 2021 at 07:27:46 UTC, Luis wrote:
> I'm pretty worried about how this would affect the current coroutines/fibers implementation. I'm slowly trying to write a retro game engine that uses a lot of fibers (hundreds!) on a single thread.

Did you mean the proposed 4K limitation? Your project sounds like fun! I started programming on a CBM64 8-bit computer with hardware sprites… ;-)

Seems like your project could be a use case for stackless coroutines. I was quite surprised to see that embedded programmers also care a lot about stackless coroutines, as they can implement state machines with predictable memory consumption. There is also a presentation on YouTube that explains how database search optimizations can make good use of them to speed up parallel binary searches in indexes. Also surprising to me.

Seems to be a lot of room for innovation in the concurrency design space.
Jun 28 2021
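To make the stackless-coroutine idea concrete, here is a hand-written sketch of what such a coroutine amounts to: a state machine whose "locals" live in a struct, so its memory use is fixed at compile time and no per-coroutine stack or guard page is needed. `SpriteBlink` is a hypothetical game-loop example, not an existing D library type; languages with stackless coroutines (for example C++20 coroutines or Rust's async functions) generate a comparable state machine from ordinary-looking code.

```D
/// Hypothetical stackless coroutine, written by hand as a state machine.
/// Everything a fiber would keep on its own stack lives in this struct.
struct SpriteBlink
{
    private int state;      // resume point: 0 = not started, 1 = blinking, 2 = done
    private int toggles;    // would be a loop variable on a fiber's stack
    bool visible = true;

    /// Runs until the next yield point; returns false once finished.
    bool step()
    {
        switch (state)
        {
        case 0:                     // coroutine entry
            toggles = 0;
            state = 1;
            goto case 1;
        case 1:                     // body of "repeat 6 times { toggle; yield; }"
            if (toggles == 6)
            {
                state = 2;
                return false;       // fell off the end of the loop
            }
            visible = !visible;
            ++toggles;
            return true;            // yield back to the game loop
        default:
            return false;           // already finished
        }
    }
}
```

A game loop could keep hundreds of these in an array and call `step()` once per frame; each costs `SpriteBlink.sizeof` bytes rather than a whole fiber stack.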
On Monday, 28 June 2021 at 07:10:05 UTC, Ola Fosheim Grøstad wrote:
> D is trying to become a high level language, but is also aiming to be system level, and that is not really an attainable goal. You have to make a choice. It would be better for D to not go for 100% safe, but instead prevent common mistakes.
>
> I've never run out of stack for any code I've written, ever. Maybe that means allocating larger stacks so that guard pages are never hit, adding more guard pages, and simply terminating when guard pages are hit.

If you hit a guard page, you will get an exception and a core dump. That's one way to do it, but there is no explicit message that you ran out of stack. By checking the stack limit for each frame you can exit gracefully and print a message. Any of these methods gives you a performance hit, and I am a bit skeptical that it is worth it. I think it should be an opt-in feature.

This kind of feature is similar to the Visual Studio C/C++ debug mode, which has extensive stack analysis for each frame. The performance hit is very high for branchy code, but I have discovered a lot of bugs with the debug stack analysis. VS also puts guard patterns between stack variables so that you can detect any overwrite within a frame. These kinds of bugs are common in C/C++, mainly because there is no bounds checking for arrays.

I don't like limits on how I use the stack, and "the programmer is an amateur" is a weak argument. I really question how effective this limit is. I'd rather go the VS route, which means proper stack analysis in debug mode.
Jun 28 2021
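A rough sketch of a guard pattern around a stack buffer, assuming a fill byte of 0xCC and 8-byte guard areas; both are chosen for illustration and are not a description of MSVC's exact layout. A debug build would place such patterns around a buffer and verify them before the function returns. In D, ordinary slice accesses are already bounds-checked by default, so this mainly matters for raw-pointer code. `Guarded` is a hypothetical name.

```D
/// Hypothetical debug-mode guard areas around a buffer. All fields are
/// ubyte-based, so they are laid out contiguously with no padding.
struct Guarded(size_t N)
{
    enum ubyte pattern = 0xCC;      // assumed fill byte for the guard areas
    ubyte[8] before = pattern;
    ubyte[N] data;
    ubyte[8] after = pattern;

    /// True if neither guard area has been overwritten.
    bool intact() const
    {
        foreach (b; before)
            if (b != pattern) return false;
        foreach (b; after)
            if (b != pattern) return false;
        return true;
    }
}
```

An off-by-one write through a raw pointer, such as `g.data.ptr[16] = 0` on a `Guarded!16 g`, lands on `after[0]`, and `intact()` then reports the corruption.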
On Monday, 28 June 2021 at 08:54:59 UTC, IGotD- wrote:
> If you hit a guard page, you will get an exception and a core dump. That's one way to do it, but there is no explicit message that you ran out of stack. By checking the stack limit for each frame you can exit gracefully and print a message. Any of these methods gives you a performance hit, and I am a bit skeptical that it is worth it. I think it should be an opt-in feature.

You should be able to trap it and use the program counter/stack pointer to figure out where it happened. Then you can extend the stack if you are able to; if not, call a cleanup handler.

The best thing is obviously to prove that the stack usage will stay within a worst case estimate. That is what you want in embedded and comparable applications. That should obviously be opt-in, as it prevents arbitrary recursion, but then you also need no guard pages or other complications.

> I don't like limits on how I use the stack, and "the programmer is an amateur" is a weak argument. I really question how effective this limit is. I'd rather go the VS route, which means proper stack analysis in debug mode.

The «amateur» claim is 100% bogus and a made up excuse. There is nothing wrong with stack allocating buffers near the leaves of the call stack. When you know what the call depth is, allocating temporary buffers on the stack is a good, high performing strategy if the runtime and code-gen are sensible.

If you specify what your estimated worst case stack depth is, then you shouldn't hit the guard pages anyway. For a system level language the programmer should decide this, absolutely!
Jun 28 2021
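Proving a worst-case stack bound can be sketched as a walk over the call graph: a function's depth is its own frame plus the deepest of its callees, and any cycle (recursion) means no static bound exists. The names below are hypothetical, and frame sizes are assumed to be known, for example as reported by the backend after inlining, which is the information the frontend lacks.

```D
/// Hypothetical worst-case stack estimate over an acyclic call graph.
/// `frames[f]` is f's frame size in bytes; `calls[f]` lists f's callees.
/// Throws if the graph contains recursion, since no static bound exists then.
size_t worstCaseStack(string root, const size_t[string] frames,
                      const string[][string] calls)
{
    size_t[string] memo;    // already computed depths
    bool[string] onPath;    // functions on the current call chain

    size_t visit(string f)
    {
        if (auto p = f in memo)
            return *p;
        if (f in onPath)
            throw new Exception("recursion through " ~ f ~ ": no static bound");
        onPath[f] = true;
        size_t deepest = 0;
        if (auto callees = f in calls)
            foreach (g; *callees)
            {
                immutable d = visit(g);
                if (d > deepest)
                    deepest = d;
            }
        onPath.remove(f);
        return memo[f] = frames[f] + deepest;
    }

    return visit(root);
}
```

With a chain `f -> g -> h` and illustrative frames of 4016, 4016 and 64 bytes, the estimate for `f` is 8096 bytes; adding an edge back from `h` to `f` makes the function refuse to give a bound.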
On Monday, 28 June 2021 at 09:17:46 UTC, Ola Fosheim Grøstad wrote:
> You should be able to trap it and use the program counter/stack pointer to figure out where it happened. Then you can extend the stack if you are able to; if not, call a cleanup handler.

Isn't this what D is doing now, anyway?
Jun 28 2021
On Monday, 28 June 2021 at 07:10:05 UTC, Ola Fosheim Grøstad wrote:
> It also means that @safe functions cannot call FFI; how can you prove that FFI does not exceed 4K after inlining?

Slightly related, but I was recently learning how to implement coroutines myself, and it was crashing because MSVC's `printf` allocates a rather large buffer on the stack. I can't remember the exact size, but I'm pretty sure it was at least 4K.

So in other words, you can't prove it for foreign functions, I believe.
Jun 28 2021
On Monday, 28 June 2021 at 11:39:27 UTC, SealabJaster wrote:
> Slightly related, but I was recently learning how to implement coroutines myself, and it was crashing because MSVC's `printf` allocates a rather large buffer on the stack. I can't remember the exact size, but I'm pretty sure it was at least 4K.
>
> So in other words, you can't prove it for foreign functions, I believe.

printf and conversion functions are typical examples of functions that use large buffers on the stack. There is usually no problem with this, as rich OSes have plenty of virtual stack space. If this weren't allowed, you would need to dynamically allocate memory for every printf call, if not several times depending on what you print. This would slow down printing, and another problem for embedded systems/systems programming is that you might want to print something before malloc/free are initialized.

Big buffers on the stack are not always there because the programmer is an amateur.
Jun 28 2021
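The pattern described here, a scratch buffer on the stack at the leaf of the call chain instead of a heap allocation per call, looks roughly like this in D. It is a sketch built on C's `snprintf` from `core.stdc.stdio`; `formatInt` is a hypothetical name, and the 32-byte buffer is simply large enough for any `int` in decimal.

```D
import core.stdc.stdio : snprintf;

/// Hypothetical leaf function: formats `value` into a stack buffer and hands
/// the text to `sink`, so no malloc/free is involved.
void formatInt(int value, scope void delegate(const(char)[]) sink)
{
    char[32] buf = void;                        // scratch space on the stack
    immutable n = snprintf(buf.ptr, buf.length, "%d", value);
    if (n < 0 || cast(size_t) n >= buf.length)
        return;                                 // encoding error or truncation
    sink(buf[0 .. n]);                          // sink must not keep the slice
}
```

Because the buffer dies with the call, the result is passed to a callback instead of being returned; a caller that needs to keep the text copies it (for example with `.idup`).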
On 6/27/2021 1:17 PM, Dennis wrote:
> Also, instead of straight up rejecting large stack frames in @safe code, the compiler could also start such a function by probing the guard page with writes at intervals of 4K.

dmd already does that. But very few programmers are aware of this, and it's pretty inefficient.
Jun 27 2021
On Sunday, 27 June 2021 at 23:12:44 UTC, Walter Bright wrote:
> On 6/27/2021 1:17 PM, Dennis wrote:
>> Also, instead of straight up rejecting large stack frames in @safe code, the compiler could also start such a function by probing the guard page with writes at intervals of 4K.
>
> dmd already does that. But very few programmers are aware of this, and it's pretty inefficient.

??

DMD does not emit stack probes. If it did, issue 17566 (and this forum thread) wouldn't exist.

(I already posted this via the newsgroup a few hours ago, but it doesn't show up.)
Jun 28 2021
On 6/28/2021 3:05 AM, ag0aep6g wrote:
> DMD does not emit stack probes. If it did, issue 17566 (and this forum thread) wouldn't exist.

https://github.com/dlang/dmd/blob/master/src/dmd/backend/cod3.d#L3599
Jun 28 2021
On 28.06.21 21:34, Walter Bright wrote:
> https://github.com/dlang/dmd/blob/master/src/dmd/backend/cod3.d#L3599

From there:

----
if (config.exe & (EX_LINUX | EX_LINUX64))
    check = false;      // seems that Linux doesn't need to fault in stack pages
----

Looks like it's not done on Linux. You might be able to fix issue 17566 by actually enabling that code.
Jun 28 2021
On Sunday, 27 June 2021 at 19:41:40 UTC, Ola Fosheim Grøstad wrote:
> It makes no sense and would kill a system level language. The default stack size on Linux is 8MiB. 4KiB isn't even enough to fit a commonly sized FFT buffer; anything less than 16KiB is a joke IMHO.

Yes, kind of, and I can think of use cases where you want big arrays on the stack. The question is whether the approach is correct. Walter wants to catch potential memory corruption by limiting the stack usage (per function, I assume), but in this case the real solution would be reading the stack limit (probably with an expensive system API, but it can be stored in a TLS variable).

On 64-bit systems there is usually enough virtual space to put several megabytes of guard regions. On 32-bit systems it is more cramped and perhaps only 4K. The question is whether this is something that should be dealt with by the language.
Jun 27 2021
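A sketch of the per-frame check suggested here, assuming the stack grows downward and that the lowest usable stack address has already been cached in a thread-local variable (module-level variables are thread-local by default in D). How that limit is obtained from the OS is left out; `stackLimit`, `StackOverflow` and `checkStack` are hypothetical names, and the address of a local only approximates the real stack pointer. A compiler could emit the equivalent of `checkStack` in each function's prologue, turning an overflow into an ordinary D exception with a message.

```D
/// Hypothetical lowest usable stack address for this thread (thread-local);
/// 0 means "no limit known".
size_t stackLimit;

/// Hypothetical exception type for a detected overflow.
class StackOverflow : Exception
{
    this(string msg) { super(msg); }
}

/// Hypothetical prologue check: before a function reserves `frameSize`
/// bytes, verify they fit between the current stack position and the limit.
void checkStack(size_t frameSize)
{
    ubyte marker;                           // its address approximates the stack pointer
    immutable sp = cast(size_t) &marker;
    if (sp < stackLimit || sp - stackLimit < frameSize)
        throw new StackOverflow("stack overflow: frame does not fit below the stack limit");
}
```

The cost is a load and a compare per call, which is the performance hit weighed in this thread.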
On 6/27/2021 2:08 PM, IGotD- wrote:
> On Sunday, 27 June 2021 at 19:41:40 UTC, Ola Fosheim Grøstad wrote:
>> It makes no sense and would kill a system level language. The default stack size on Linux is 8MiB. 4KiB isn't even enough to fit a commonly sized FFT buffer; anything less than 16KiB is a joke IMHO.
>
> Yes, kind of, and I can think of use cases where you want big arrays on the stack. The question is whether the approach is correct. Walter wants to catch potential memory corruption by limiting the stack usage (per function, I assume), but in this case the real solution would be reading the stack limit (probably with an expensive system API, but it can be stored in a TLS variable).
>
> On 64-bit systems there is usually enough virtual space to put several megabytes of guard regions. On 32-bit systems it is more cramped and perhaps only 4K. The question is whether this is something that should be dealt with by the language.

There's a huge difference between the virtual address range set aside for the entire thread stack, and the amount of stack consumed by one function invocation.

Large amounts of stack allocated for one function's frame are almost always the result of an inexperienced system coder who doesn't realize what it means to allocate memory on the stack, how guard pages work, and how physical memory gets allocated as the stack grows.
Jun 27 2021
On Sunday, 27 June 2021 at 19:41:40 UTC, Ola Fosheim Grøstad wrote:
> On Sunday, 27 June 2021 at 18:56:57 UTC, IGotD- wrote:
>> In general I can agree with the rationale to not allow stack frames larger than 4K when it comes to normal programming, it makes sense.
>
> It makes no sense and would kill a system level language. The default stack size on Linux is 8MiB. 4KiB isn't even enough to fit a commonly sized FFT buffer; anything less than 16KiB is a joke IMHO.

If you need to quickly allocate very large buffers, you _can_ quickly allocate very large buffers. But the stack is not the right place to do that.
Jun 27 2021
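If a large buffer should not live on the stack, one D alternative is to take it from the C heap and release it with `scope(exit)`, so its lifetime is still tied to the enclosing scope. This is only a sketch (`sumOfSquares` is a hypothetical example); it trades the stack's near-zero allocation cost for a `malloc`/`free` pair per call, which is the trade-off argued over in this thread.

```D
import core.stdc.stdlib : malloc, free;

/// Hypothetical example: a large scratch buffer from the C heap instead of
/// the stack, released deterministically when the function returns or throws.
ulong sumOfSquares(size_t n)
{
    if (n == 0)
        return 0;
    auto p = cast(uint*) malloc(n * uint.sizeof);
    if (p is null)
        throw new Exception("allocation failed");
    scope (exit) free(p);               // runs on every exit path
    uint[] buf = p[0 .. n];             // bounds-checked view of the allocation
    foreach (i, ref x; buf)
        x = cast(uint) i;
    ulong total = 0;
    foreach (x; buf)
        total += cast(ulong) x * x;
    return total;
}
```

A call with `n = 1 << 20` uses a 4 MiB buffer, far beyond any per-frame limit, without touching the thread's stack.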
On Monday, 28 June 2021 at 01:18:57 UTC, Elronnd wrote:
> If you need to quickly allocate very large buffers, you _can_ quickly allocate very large buffers. But the stack is not the right place to do that.

You think more than 4K for a stack frame is very large? That is a crazy and unworkable restriction. Even when 2K is enough for a single function, it will be a complete disaster, as inlining could make this 16K in a heartbeat.

The stack is the correct place to put fast allocations, as that memory is in the cache.

Well, D is not a system level language, that is for sure.
Jun 27 2021
On 28/06/2021 6:56 AM, IGotD- wrote:
> First, this 4K limit should be configurable per operating system. Also, there should be an option to override this limit, for example with a compiler option.

You can set it for fibers too!
Jun 27 2021
On 6/27/21 2:56 PM, IGotD- wrote:
> This is a breakout thread from this thread.
>
> https://forum.dlang.org/thread/ghodronzxeirokyoqeag@forum.dlang.org
>
> According to the comment from Walter:
>
> *As per Walter's recent comment in that thread, he asserts that stack allocations beyond 4k should be rejected.*
>
> in this issue:
>
> https://issues.dlang.org/show_bug.cgi?id=17566
>
> In general I can agree with the rationale to not allow stack frames larger than 4K when it comes to normal programming, it makes sense. However, since D is supposed to be a "systems programming language", the language should not dictate what the programmer is able to do with the stack. It also assumes that all systems have 4K pages and that the stack should only be used moderately. There might be cases when the programmer wants to go nuts with the stack and put a lot of storage on it, for whatever reason.
>
> Therefore I think this limit should not be part of the language definition. First, this 4K limit should be configurable per operating system. Also, there should be an option to override this limit, for example with a compiler option.

Could this be fixed by not allowing *uninitialized* stack segments larger than 4k?

Basically, if you can't create a stack which contains a contiguous 4k of uninitialized space, then you can't skip over the guard page.

void-initialized data is pretty rare in D.

-Steve
Jun 28 2021
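Steven's rule can be sketched as a scan over a frame's locals, assuming the frontend knows each local's size and whether it is `= void`, and that locals sit contiguously in declaration order (a simplification; real backends reorder and pad). The reasoning is that initialized locals get written when they are initialized and so are likely to touch the guard page, while a long run of never-written bytes could let the stack pointer step over it; the sketch follows that reasoning and does not model the order of the writes. `Local` and `mightSkipGuardPage` are hypothetical names.

```D
/// Hypothetical description of one local in a frame.
struct Local
{
    size_t size;        // bytes
    bool voidInit;      // declared `= void`, so nothing writes it on entry
}

/// Reports whether the frame contains a contiguous run of void-initialized
/// bytes at least one guard area long (assumed to be 4096 bytes).
bool mightSkipGuardPage(const Local[] locals, size_t guardSize = 4096)
{
    size_t run = 0;
    foreach (l; locals)
    {
        run = l.voidInit ? run + l.size : 0;   // an initialized local breaks the run
        if (run >= guardSize)
            return true;
    }
    return false;
}
```

Dennis's `f` with `g` inlined would hold two adjacent `ubyte[4000] = void` buffers and be flagged, while the same buffers separated by an initialized local would pass.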
On Monday, 28 June 2021 at 12:41:20 UTC, Steven Schveighoffer wrote:
> Could this be fixed by not allowing *uninitialized* stack segments larger than 4k?
>
> Basically, if you can't create a stack which contains a contiguous 4k of uninitialized space, then you can't skip over the guard page.
>
> void-initialized data is pretty rare in D.
>
> -Steve

Why? When you decide not to initialize, you also give up the safety benefits of initialized values.
Jun 28 2021
On 6/28/21 9:09 AM, IGotD- wrote:
> On Monday, 28 June 2021 at 12:41:20 UTC, Steven Schveighoffer wrote:
>> Could this be fixed by not allowing *uninitialized* stack segments larger than 4k?
>>
>> Basically, if you can't create a stack which contains a contiguous 4k of uninitialized space, then you can't skip over the guard page.
>>
>> void-initialized data is pretty rare in D.
>
> Why? When you decide not to initialize, you also give up the safety benefits of initialized values.

The point is to ensure the guard page is triggered. This is not about the safety of initialized values. It's about making sure the stack pointer stays sane.

I don't know about you, but I don't want to start having to worry about stack pointer correctness, even in @system code.

This would be like saying null pointer dereferences only trigger a segfault in @safe code, so now all @system code that doesn't want to corrupt some mmapped data at the null page must first check that a pointer is not null before using it. It's nonsense.

-Steve
Jun 28 2021
On Monday, 28 June 2021 at 14:01:50 UTC, Steven Schveighoffer wrote:
> The point is to ensure the guard page is triggered. This is not about the safety of initialized values. It's about making sure the stack pointer stays sane.
>
> I don't know about you, but I don't want to start having to worry about stack pointer correctness, even in @system code.
>
> This would be like saying null pointer dereferences only trigger a segfault in @safe code, so now all @system code that doesn't want to corrupt some mmapped data at the null page must first check that a pointer is not null before using it. It's nonsense.
>
> -Steve

If you want stack overflow safety for whatever reason, then you should do proper bounds checking for each frame. I consider the 4K limitation and poking stack pages ahead to be just hacks.
Jun 28 2021
On Monday, 28 June 2021 at 14:40:12 UTC, IGotD- wrote:
> I consider the 4K limitation and poking stack pages ahead to be just hacks.

Ditto. D shouldn't become a vendor.
Jun 28 2021
On Monday, 28 June 2021 at 14:01:50 UTC, Steven Schveighoffer wrote:
> This would be like saying null pointer dereferences only trigger a segfault in @safe code, so now all @system code that doesn't want to corrupt some mmapped data at the null page must first check that a pointer is not null before using it. It's nonsense.

Trapping page 0 is not foolproof. For instance, if you have a pointer to a static array, you could easily be outside the range that will trap. Not all hardware can trap page 0 either. So you do end up with a solution that works often, but not always.

But for stacks specifically, there is nothing that prevents you from having multiple guard pages. Just make it user configurable: min stack depth, max stack depth, number of guard pages.
Jun 28 2021
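Per-stack guard configuration already exists for fibers in druntime: `core.thread.Fiber`'s constructor takes a stack size and a `guardPageSize` in bytes. The sketch below runs a function with a 16 KiB frame on a fiber with a 64-page stack and a four-page guard area; the sizes are arbitrary choices for illustration, and `fiberBody`/`runOnGuardedFiber` are hypothetical names.

```D
import core.thread : Fiber, PAGESIZE;

size_t fiberResult;     // thread-local; the fiber runs on this same thread

/// Hypothetical fiber body with a frame far larger than a 4K limit.
void fiberBody()
{
    ubyte[16 * 1024] scratch;           // 16 KiB, zero-initialized
    foreach (i, ref b; scratch)
        b = cast(ubyte) i;
    fiberResult = scratch[$ - 1];
}

/// Runs `fiberBody` on a fiber with an explicit stack size and a guard
/// area of four pages instead of the default single page.
size_t runOnGuardedFiber()
{
    auto f = new Fiber(&fiberBody, 64 * PAGESIZE, 4 * PAGESIZE);
    f.call();
    return fiberResult;
}
```

A larger guard area costs only address space, not physical memory, which is why it is cheap to make configurable on 64-bit systems.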
On Sunday, 27 June 2021 at 18:56:57 UTC, IGotD- wrote:
> This is a breakout thread from this thread.
>
> https://forum.dlang.org/thread/ghodronzxeirokyoqeag@forum.dlang.org
>
> According to the comment from Walter:
>
> *As per Walter's recent comment in that thread, he asserts that stack allocations beyond 4k should be rejected.*
>
> in this issue:
>
> https://issues.dlang.org/show_bug.cgi?id=17566

I recall that Pascal debug builds added manual checks for stack overflow. This was especially necessary in real mode, where a stack overflow would just crash the PC.

I don't see why D couldn't do the same. It would be useful in protected mode with small variables too, as it would allow generating a normal D exception instead of a segmentation fault.
Jun 28 2021
On Tuesday, 29 June 2021 at 04:00:24 UTC, Vladimir Panteleev wrote:
> I recall that Pascal debug builds added manual checks for stack overflow. This was especially necessary in real mode, where a stack overflow would just crash the PC.
>
> I don't see why D couldn't do the same. It would be useful in protected mode with small variables too, as it would allow generating a normal D exception instead of a segmentation fault.

You could, if all your code is written in D, but D has to adapt its stack layout to what FFI code expects. For instance, if 30% of your code is FFI, then you only detect 70% of stack overflows.
Jun 29 2021