
digitalmars.D - Stack frames larger than 4K should be rejected, but what if I want

reply IGotD- <nise nise.com> writes:
This is a breakout thread from this thread.

https://forum.dlang.org/thread/ghodronzxeirokyoqeag forum.dlang.org

According to the comment from Walter:

*As per Walter's recent comment in that thread, he asserts that 
stack allocations beyond 4k should be rejected.*

in this issue:
https://issues.dlang.org/show_bug.cgi?id=17566

In general I can agree with the rationale for not allowing stack 
frames larger than 4K when it comes to normal programming; it 
makes sense. However, since D is supposed to be a "systems 
programming language", the language should not dictate what the 
programmer should be able to do with the stack. It also assumes 
things like all systems having 4K page sizes and that the stack 
should be used moderately. There might be cases where the 
programmer wants to go nuts with the stack and do a lot of 
storage on it, for whatever reason. Therefore I think this limit 
should not be part of the language definition.

First this 4K limit should be configurable per operating system. 
Also there should be an option to override this limit, for 
example with a compiler option.
Jun 27
prev sibling next sibling parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 27 June 2021 at 18:56:57 UTC, IGotD- wrote:
 In general I can agree with the rationale to not allow stack 
 frames larger than 4K when it comes to normal programming, it 
 makes sense.
It makes no sense and would kill a system level language. The stack depth for Linux is 8MiB. 4KiB isn't even enough to fit a commonly sized FFT buffer, anything less than 16KiB is a joke IMHO.
Jun 27
next sibling parent reply Dennis <dkorpel gmail.com> writes:
On Sunday, 27 June 2021 at 19:41:40 UTC, Ola Fosheim Grøstad 
wrote:
 It makes no sense and would kill a system level language.
While it's not specified in "The compiler should reject any stack frame that's larger than 4K", I think it's only meant to apply to @safe functions, not @system or @trusted ones. Also, instead of rejecting large stack frames in @safe code outright, the compiler could start such a function by probing the guard page, making writes at intervals of 4K.
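To make the probing idea concrete, here is a hand-written sketch of what compiler-emitted stack probes amount to (illustrative only; a 4 KiB page size is an assumption, and a real compiler would emit this in the function prologue):

```d
// Hand-written equivalent of compiler-inserted stack probes (a sketch,
// assuming 4 KiB pages). Touching one byte per page, in stack-growth
// order, guarantees the guard page faults before any access lands
// beyond it.
ubyte bigFrame() @system
{
    enum pageSize = 4096;
    ubyte[64 * 1024] buf = void; // 16 pages; could leap past a lone guard page

    // probe high addresses first (x86 stacks grow downward), one page at a time
    foreach_reverse (page; 0 .. buf.length / pageSize)
        buf[page * pageSize] = 0;

    buf[$ - 1] = 42; // every page of buf has now been faulted in
    return buf[$ - 1];
}
```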
Jun 27
next sibling parent reply Dennis <dkorpel gmail.com> writes:
On Sunday, 27 June 2021 at 20:17:48 UTC, Dennis wrote:
 I think it's only meant to apply to @safe functions, not 
 @system or @trusted ones.
Actually, that's hard to realize, since the check for `@safe` is a semantic check in the frontend, while final stack sizes are only known by the backend. Making the frontend guess an upper bound is hard because of tail calls and/or inlining, e.g.:

```D
void f(ubyte[] x) {
    ubyte[4000] bufA = void;
    g(bufA[]);
}

void g(ubyte[] bufA) {
    ubyte[4000] bufB = void;
    h(bufA, bufB);
}

void h(ubyte[] x, ubyte[] y);
```

With ldc -O3, the stack frame of `f` is 8008 bytes because it has `g` inlined.
Jun 27
parent reply IGotD- <nise nise.com> writes:
On Sunday, 27 June 2021 at 22:01:22 UTC, Dennis wrote:
 Actually, that's hard to realize, since the check for `@safe` 
 is a semantic check in the frontend, while final stack sizes 
 are only known by the backend. Making the frontend guess an 
 upper bound is hard because of tail calls and/or inlining, e.g:

 ```D
 void f(ubyte[] x) {
     ubyte[4000] bufA = void;
     g(bufA[]);
 }

 void g(ubyte[] bufA) {
     ubyte[4000] bufB = void;
     h(bufA, bufB);
 }

 void h(ubyte[] x, ubyte[] y);
 ```
 With ldc -O3, the stack frame of `f` is 8008 bytes because it 
 has `g` inlined.
That's a good observation. Does this mean that the point of the suggested 4K limit falls? In practice, if you want to prevent stack overflow and be sure about it, I think you need a check for every new frame. This has a performance impact, but safety usually has a cost.
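A sketch of what such a per-frame check could look like if written by hand (the names and the thread-local `stackLimit` are hypothetical; real code would set it at thread startup, e.g. from pthread_getattr_np on Linux):

```d
// Hypothetical thread-local stack limit; D module-level variables are
// thread-local by default. A conservative default of null is used here
// purely so the sketch is runnable.
void* stackLimit = null;

// Approximate the current stack pointer with a local's address and check
// whether frameNeed more bytes would cross the limit.
bool stackHasRoom(size_t frameNeed) @system
{
    ubyte probe; // its address approximates the current stack pointer
    return cast(void*)&probe - frameNeed >= stackLimit;
}

void bigFrameChecked() @system
{
    if (!stackHasRoom(16 * 1024))
        assert(0, "out of stack in bigFrameChecked"); // explicit diagnostic
    ubyte[16 * 1024] buf = void;
    buf[0] = 1; // ... real work with the large frame
}
```

This is the cost being discussed: one address comparison and branch per call, paid by every checked function.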
Jun 27
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 27 June 2021 at 22:16:12 UTC, IGotD- wrote:
 That's a good observation. Does this mean that the point of the 
 suggested 4K limit falls?
No, because the compiler can refuse to inline, but it is a bad idea. It also means that @safe functions cannot call FFI; how can you prove that FFI does not exceed 4K after inlining?

D is trying to become a high level language, but is also aiming to be system level, and that is not really an attainable goal. You have to make a choice. It would be better for D to not go for 100% safe, but instead prevent common mistakes. I've never run out of stack for any code I've written, ever.

Maybe that means allocating larger stacks so that guard pages are never hit, adding more guard pages, and simply terminating when guard pages are hit. Other alternatives: do stack depth analysis, or add stackless coroutines.

Anyway, for a system level language, those choices should be in the hands of the programmer. I agree with you on that.
Jun 28
next sibling parent reply Luis <Luis.panadero gmail.com> writes:
On Monday, 28 June 2021 at 07:10:05 UTC, Ola Fosheim Grøstad 
wrote:
 On Sunday, 27 June 2021 at 22:16:12 UTC, IGotD- wrote:
 That's a good observation. Does this mean that the point of 
 the suggested 4K limit falls?
 No, because the compiler can refuse to inline, but it is a bad 
 idea. [...] Other alternatives: do stack depth analysis, add 
 stackless coroutines. Anyway, for a system level language, 
 those choices should be at the hand of the programmer. I agree 
 with you on that.
I'm pretty worried about how this would affect the current coroutines/fibers implementation. I'm slowly trying to write a retro game engine that uses a lot of fibers (hundreds!) on a single thread.
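For reference, druntime's `Fiber` already takes the stack size as an explicit constructor argument, so many small-stack fibers on one thread are expressible today (the 16 KiB figure below is an illustrative choice, not a recommendation):

```d
import core.thread : Fiber;

// Spin up n fibers on the current thread, each with a deliberately
// small 16 KiB stack, and run each to completion.
int runFibers(size_t n)
{
    int counter;
    Fiber[] fibers;
    foreach (i; 0 .. n)
        fibers ~= new Fiber({ counter += 1; }, 16 * 1024);
    foreach (f; fibers)
        f.call(); // each body runs on its own small stack
    return counter;
}
```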
Jun 28
parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 28 June 2021 at 07:27:46 UTC, Luis wrote:
 I'm pretty worried how this would affect the actual 
 coroutines/fibers implementation. I'm slowly trying to write a 
 retro game engine that uses a lot of fibers (hundreds!) on a 
 single thread.
Did you mean the proposed 4K limitation?

Your project sounds like a fun project! I started programming on a CBM64 8-bit computer with hardware sprites… ;-) Seems like your project could be a use case for stackless coroutines.

I was quite surprised to see that embedded programmers also care a lot about stackless coroutines, as they can implement state machines with predictable memory consumption. There is also a presentation on youtube that explains how database search optimizations can make good use of them to speed up parallel binary searches in indexes. Also surprising to me. Seems to be a lot of room for innovation in the concurrency design space.
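A minimal illustration of the embedded appeal: a generator "coroutine" hand-flattened into a D input range, where the entire suspended state is a fixed-size struct known at compile time (the names here are made up for the example):

```d
// A stackless "coroutine" flattened into an input range: all suspended
// state lives in this fixed-size struct, so the memory cost per
// coroutine is known at compile time -- the property that makes the
// pattern attractive for state machines on embedded targets.
struct Squares
{
    int i, limit;
    @property bool empty() const { return i >= limit; }
    @property int front() const { return i * i; }
    void popFront() { ++i; } // "resume": step to the next yield point
}

static assert(Squares.sizeof == 8); // the whole activation record
```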
Jun 28
prev sibling next sibling parent reply IGotD- <nise nise.com> writes:
On Monday, 28 June 2021 at 07:10:05 UTC, Ola Fosheim Grøstad 
wrote:
 D is trying to become a high level language, but is also aiming 
 to be system level and that is not really an attainable goal. 
 You have to make a choice. It would be better for D to not go 
 for 100% safe, but instead prevent common mistakes.

 I've never ran out of stack for any code I've written ever.

 Maybe that means allocating larger stacks so that guard pages 
 are never hit, add more guard pages and simply terminate when 
 guard pages are hit.
Hit the guard pages and you will get an exception and a core dump. That's one way to do it, but there is no explicit message that you ran out of stack. By checking the stack limit for each frame you can gracefully exit and print a message. Any of these methods gives you a performance hit, and I am a bit skeptical that it is worth it. I think it should be an opt-in feature.

This kind of feature is similar to Visual Studio C/C++ debug mode, which has extensive stack analysis for each frame. The performance hit is very high for branchy code, but I have discovered a lot of bugs with the debug stack analysis. VS also puts guard patterns between stack variables so that you can detect any overwrite within a frame. These kinds of bugs are common in C/C++, mainly because there is no bounds checking on arrays.

I don't like limits on how I use the stack, and "the programmer is an amateur" is a weak argument. I really question how effective this limit is. I would rather go the VS route, which means proper stack analysis in debug mode.
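A sketch of the guard-pattern idea described above, written by hand (illustrative only; the real thing is compiler-generated, since only the compiler controls stack layout and these locals could in principle be reordered):

```d
// VS-style "guard pattern" sketch: surround a stack buffer with known
// fill values and verify them before returning, catching intra-frame
// overwrites. The 0xCCCCCCCC value mirrors the MSVC debug fill pattern.
bool frameOverwritten()
{
    uint canaryLo = 0xCCCC_CCCC;
    char[16] buf;
    uint canaryHi = 0xCCCC_CCCC;

    import core.stdc.string : strcpy;
    strcpy(buf.ptr, "fits in buffer."); // 15 chars + NUL: exactly fills buf

    // an overflow of buf would have trashed one of the canaries
    return canaryLo != 0xCCCC_CCCC || canaryHi != 0xCCCC_CCCC;
}
```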
Jun 28
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 28 June 2021 at 08:54:59 UTC, IGotD- wrote:
 Hitting guard pages and you will get an exception and a core 
 dump. That's one way to do it but there is no explicit message 
 that you ran out of stack. By checking the stack limit for each 
 frame you can gracefully exit and print a message. Any of these 
 methods gives you a performance hit and I am a bit skeptical 
 that it is worth it. I think it should be an opt in feature.
You should be able to trap it and use the program counter/stack pointer to figure out where it happened. Then you can extend the stack if you are able to; if not, call a cleanup handler.

The best thing is obviously to prove that the stack usage will stay within a worst case estimate. That is what you want in embedded and comparable applications. That should obviously be opt in, as it prevents arbitrary recursion, but then you also need no guards and other complications.
 I don't like limits on how I use the stack and if the 
 programmer is an amateur is a weak argument. I really question 
 how effective this limit is. I rather go the VS route which 
 mean proper stack analysis in debug mode.
The «amateur» claim is 100% bogus and a made-up excuse. There is nothing wrong with stack allocating buffers near the leaves of the callstack. When you know what the call depth is, allocating temporary buffers on the stack is a good, high performing strategy if the runtime and code-gen are sensible.

If you specify what your estimated worst case stack depth is, then you shouldn't hit the guard pages anyway. For a system level language the programmer should decide this, absolutely!
Jun 28
parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 28 June 2021 at 09:17:46 UTC, Ola Fosheim Grøstad 
wrote:
 You should be able to trap it and use the program counter/stack 
 pointer to figure out where it happened. Then you can extend 
 the stack if you are able to, if not call a cleanup handler.
Isn't this what D is doing now, anyway?
Jun 28
prev sibling parent reply SealabJaster <sealabjaster gmail.com> writes:
On Monday, 28 June 2021 at 07:10:05 UTC, Ola Fosheim Grøstad 
wrote:
 It also means that safe functions cannot call FFI, how can you 
 prove that FFI does not exceed 4K after inlining?
Slightly related, but I was recently learning how to implement coroutines myself, and it was crashing because MSVC's `printf` allocates a rather large buffer on the stack. Can't remember the exact size, but I'm pretty sure it was at least 4K. So in other words, you can't prove it for foreign functions, I believe.
Jun 28
parent IGotD- <nise nise.com> writes:
On Monday, 28 June 2021 at 11:39:27 UTC, SealabJaster wrote:
 Slightly related, but I was recently learning how to implement 
 coroutines myself and it was crashing because MSVC's `printf` 
 allocates a rather large buffer on the stack. Can't remember 
 the exact size, but I'm pretty sure it was at least 4K.

 So in other words, you can't prove it for foreign functions I 
 believe.
printf and conversion functions are a typical example of functions that use large buffers on the stack. There is usually no problem with this, as rich OSes have plenty of virtual stack space. If this wasn't allowed, you would need to dynamically allocate memory on every printf call, if not several times depending on what you print. This would slow down the printing function, and another problem for embedded systems/systems programming is that you might want to print something before malloc/free are initialized. Big buffers on the stack are not always there because the programmer is an amateur.
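The allocation-free property is easy to demonstrate: formatting into a caller-supplied stack buffer needs no allocator at all, so it works `@nogc` and before malloc/free are up (the 256-byte size below is an arbitrary illustrative choice):

```d
// Format into a caller-supplied buffer: no allocator involved, so this
// is @nogc, usable before malloc is initialized, and costs only a
// stack bump in the caller.
@nogc nothrow
int formatValue(char[] dst, double x)
{
    import core.stdc.stdio : snprintf;
    return snprintf(dst.ptr, dst.length, "x = %.2f", x);
}
```

A caller would typically write `char[256] buf = void;` and pass `buf[]`, keeping the whole operation on the stack.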
Jun 28
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/27/2021 1:17 PM, Dennis wrote:
 Also, instead of straight up rejecting large stack frames in 
 @safe code, the compiler could also start such a function with 
 probing the guard page by making writes at intervals of 4K.
dmd already does that. But very few programmers are aware of this, and it's pretty inefficient.
Jun 27
parent reply ag0aep6g <anonymous example.com> writes:
On Sunday, 27 June 2021 at 23:12:44 UTC, Walter Bright wrote:
 On 6/27/2021 1:17 PM, Dennis wrote:
 Also, instead of straight up rejecting large stack frames in 
 @safe code, the compiler could also start such a function with 
 probing the guard page by making writes at intervals of 4K.
dmd already does that. But very few programmers are aware of this, and it's pretty inefficient.
?? DMD does not emit stack probes. If it did, issue 17566 (and this forum thread) wouldn't exist. (I already posted this via the newsgroup a few hours ago, but it doesn't show up.)
Jun 28
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/28/2021 3:05 AM, ag0aep6g wrote:
 DMD does not emit stack probes. If it did, issue 17566 (and this forum thread) 
 wouldn't exist.
https://github.com/dlang/dmd/blob/master/src/dmd/backend/cod3.d#L3599
Jun 28
parent ag0aep6g <anonymous example.com> writes:
On 28.06.21 21:34, Walter Bright wrote:
 https://github.com/dlang/dmd/blob/master/src/dmd/backend/cod3.d#L3599
From there:

----
if (config.exe & (EX_LINUX | EX_LINUX64))
    check = false;        // seems that Linux doesn't need to fault in stack pages
----

Looks like it's not done on Linux. You might be able to fix issue 17566 by actually enabling that code.
Jun 28
prev sibling next sibling parent reply IGotD- <nise nise.com> writes:
On Sunday, 27 June 2021 at 19:41:40 UTC, Ola Fosheim Grøstad 
wrote:
 It makes no sense and would kill a system level language. The 
 stack depth for Linux is 8MiB. 4KiB isn't even enough to fit a 
 commonly sized FFT buffer, anything less than 16KiB is a joke 
 IMHO.
Yes, kind of, and I can think of use cases where you want big arrays on the stack. The question is if the approach is correct. Walter wants to catch potential memory corruption by limiting the stack usage (per function, I assume), but in this case the real solution would be reading the stack limit (probably with an expensive system API, but it can be stored as a TLS variable). On 64-bit systems there is usually enough virtual space to put several megabytes of guard regions. On 32-bit systems it is more cramped, and perhaps only 4K. The question is if this is something that should be dealt with by the language.
Jun 27
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/27/2021 2:08 PM, IGotD- wrote:
 On Sunday, 27 June 2021 at 19:41:40 UTC, Ola Fosheim Grøstad wrote:
 It makes no sense and would kill a system level language. The stack depth for 
 Linux is 8MiB. 4KiB isn't even enough to fit a commonly sized FFT buffer, 
 anything less than 16KiB is a joke IMHO.
 Yes kind of, and I can think of use cases when you want big 
 arrays on the stack. [...] On 64-bit systems there usually 
 enough virtual space to put several megabytes of guard regions. 
 On 32-bit systems it is more cramped and perhaps only 4K. The 
 question is if this is something that should be dealt by the 
 language.
There's a huge difference between the virtual address range set aside for the entire thread stack and the amount of stack consumed by one function invocation.

Large amounts of stack allocated for one function's frame are almost always the result of an inexperienced system coder who doesn't realize what it means to allocate memory on the stack, how guard pages work, and how physical memory gets allocated as the stack grows.
Jun 27
prev sibling parent reply Elronnd <elronnd elronnd.net> writes:
On Sunday, 27 June 2021 at 19:41:40 UTC, Ola Fosheim Grøstad 
wrote:
 On Sunday, 27 June 2021 at 18:56:57 UTC, IGotD- wrote:
 In general I can agree with the rationale to not allow stack 
 frames larger than 4K when it comes to normal programming, it 
 makes sense.
 It makes no sense and would kill a system level language. The 
 stack depth for Linux is 8MiB. 4KiB isn't even enough to fit a 
 commonly sized FFT buffer, anything less than 16KiB is a joke 
 IMHO.
If you need to quickly allocate very large buffers, you _can_ quickly allocate very large buffers. But the stack is not the right place to do that.
Jun 27
parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 28 June 2021 at 01:18:57 UTC, Elronnd wrote:
 If you need to quickly allocate very large buffers, you _can_ 
 quickly allocate very large buffers.  But the stack is not the 
 right place to do that.
You think more than 4K for a stack frame is very large? That is a crazy and unworkable restriction. Even when 2K is enough for a single function, inlining could turn this into 16K in a heartbeat.

The stack is the correct place to put fast allocations, as that memory is in the cache. Well, D is not a system level language, that is for sure.
Jun 27
prev sibling next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 28/06/2021 6:56 AM, IGotD- wrote:
 First this 4K limit should be configurable per operating system. Also 
 there should be an option to override this limit, for example with a 
 compiler option.
You can set it for fibers too!
Jun 27
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/27/21 2:56 PM, IGotD- wrote:
 This is a breakout thread from this thread.
 
 https://forum.dlang.org/thread/ghodronzxeirokyoqeag forum.dlang.org
 
 According to the comment from Walter.
 
 *As per Walter's recent comment in that thread, he asserts that stack 
 allocations beyond 4k should be rejected.*
 
 in this issue:
 https://issues.dlang.org/show_bug.cgi?id=17566
 
 In general I can agree with the rationale to not allow stack frames 
 larger than 4K when it comes to normal programming, it makes sense. 
 However, since D is supposed to be a "systems programming language" the 
 language should not dictate what the programmer should be able to do 
 with the stack. It also assumes things that all systems have 4K page 
 sizes, that the stack should be used moderately. There might be cases 
 when the programmer wants to go nuts with stack and do a lot storage on 
 it, for whatever reason. Therefore I think this limit should not be a 
 language definition.
 
 First this 4K limit should be configurable per operating system. Also 
 there should be an option to override this limit, for example with a 
 compiler option.
Could this be fixed by not allowing *uninitialized* stack segments larger than 4k? Basically, if you can't create a stack frame which contains a contiguous 4k of uninitialized space, then you can't skip over the guard page.

void-initialized data is pretty rare in D.

-Steve
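The distinction in code (a sketch; actual codegen varies by compiler and optimization level): default initialization writes the whole array, touching its pages up front, whereas `= void` emits no writes, so the first real access may land well past a 4K guard page.

```d
// Default-initialized: the compiler zero-fills the array, so every page
// of it is written before use and a guard page in range will fault.
ubyte defaultInit()
{
    ubyte[8192] a;
    a[$ - 1] = 1;
    return a[$ - 1];
}

// void-initialized: nothing is written for initialization, so the first
// touch can in principle skip straight over a 4K guard page.
ubyte voidInit() @system
{
    ubyte[8192] b = void;
    b[$ - 1] = 1;
    return b[$ - 1];
}
```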
Jun 28
parent reply IGotD- <nise nise.com> writes:
On Monday, 28 June 2021 at 12:41:20 UTC, Steven Schveighoffer 
wrote:
 Could this be fixed by not allowing *uninitialized* stack 
 segments larger than 4k? Basically, if you can't create a stack 
 which contains a contiguous 4k of uninitialized space, then you 
 can't skip over the guard page.

 void-initialized data is pretty rare in D.

 -Steve
Why? When you decide not to initialize, you also bypass the safety benefits of initialized values.
Jun 28
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/28/21 9:09 AM, IGotD- wrote:
 On Monday, 28 June 2021 at 12:41:20 UTC, Steven Schveighoffer wrote:
 Could this be fixed by not allowing *uninitialized* stack segments 
 larger than 4k? Basically, if you can't create a stack which contains 
 a contiguous 4k of uninitialized space, then you can't skip over the 
 guard page.

 void-initialized data is pretty rare in D.
Why? When you decide not to initialize you also surpass the safety benefits of initialized values.
The point is to ensure the guard page is triggered. This is not about the safety of initialized values. It's about making sure the stack pointer stays sane. I don't know about you, but I don't want to start having to worry about stack pointer correctness, even in @system code.

This would be like saying null pointer dereferences only trigger a segfault in @safe code, so now all @system code that doesn't want to corrupt some mmapped data at the null page must first check that a pointer is not null before using it. It's nonsense.

-Steve
Jun 28
next sibling parent reply IGotD- <nise nise.com> writes:
On Monday, 28 June 2021 at 14:01:50 UTC, Steven Schveighoffer 
wrote:
 The point is to ensure the guard page is triggered. This is not 
 about the safety of initialized values. It's about making sure 
 the stack pointer stays sane. I don't know about you, but I 
 don't want to start having to worry about stack pointer 
 correctness, even in system code.

 This would be like saying null pointer dereferences only 
 trigger a segfault in @safe code, so now all @system code that 
 doesn't want to corrupt some mmapped data at the null page must 
 first check that a pointer is not null before using. It's 
 nonsense.

 -Steve
If you want stack overflow safety for whatever reason, then you should do proper bounds checking for each frame. I consider the 4K limitation and poking stack pages ahead to be just hacks.
Jun 28
parent sighoya <sighoya gmail.com> writes:
On Monday, 28 June 2021 at 14:40:12 UTC, IGotD- wrote:
 I consider the 4K limitation and poking stack pages ahead to be 
 just hacks.
Ditto. D shouldn't become a vendor.
Jun 28
prev sibling parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 28 June 2021 at 14:01:50 UTC, Steven Schveighoffer 
wrote:
 This would be like saying null pointer dereferences only 
 trigger a segfault in @safe code, so now all @system code that 
 doesn't want to corrupt some mmapped data at the null page must 
 first check that a pointer is not null before using. It's 
 nonsense.
Trapping page 0 is not foolproof. For instance, if you have a pointer to a static array, you could easily be outside the range that will trap. Not all hardware can trap page 0 either. So you end up with a solution that works often, but not always.

But for stacks specifically, there is nothing that prevents you from having multiple guard pages. Just make it user configurable: min stack depth, max stack depth, number of guard pages.
Jun 28
prev sibling parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Sunday, 27 June 2021 at 18:56:57 UTC, IGotD- wrote:
 This is a breakout thread from this thread.

 https://forum.dlang.org/thread/ghodronzxeirokyoqeag forum.dlang.org

 According to the comment from Walter.

 *As per Walter's recent comment in that thread, he asserts that 
 stack allocations beyond 4k should be rejected.*

 in this issue:
 https://issues.dlang.org/show_bug.cgi?id=17566
I recall that Pascal debug builds added manual checks for stack overflow. This was especially necessary in real mode, where a stack overflow would just crash the PC. I don't see why D couldn't do the same. It would be useful in protected mode with small variables too, as it would allow generating a normal D exception instead of a segmentation fault.
Jun 28
parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 29 June 2021 at 04:00:24 UTC, Vladimir Panteleev 
wrote:
 I recall that Pascal debug builds added manual checks for stack 
 overflow. This was especially necessary in real mode, where a 
 stack overflow would just crash the PC. I don't see why D 
 couldn't do the same. It would be useful in protected mode with 
 small variables too, as it would allow generating a normal D 
 exception instead of a segmentation fault.
You could, if all your code is written in D, but D has to adapt stack layout to what FFI code expects. For instance, if 30% of your code is FFI, then you only detect 70% of stack overflows.
Jun 29