www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - alloca is slow and dangerous

reply welkam <wwwelkam gmail.com> writes:
Over the years I saw several people asking why D doesn't have 
alloca. I just want to share a video where it states that alloca 
is slower than static array so we have one more arguments against 
it.
https://youtu.be/FY9SbqTO5GQ?t=468

In summary alloca is:
1. Hard to implement
2. security problem
3. slower than static array on the stack
Jan 01
next sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 1 January 2021 at 14:48:11 UTC, welkam wrote:
 Over the years I saw several people asking why D doesn't have 
 alloca. I just want to share a video where it states that 
 alloca is slower than static array
13% slower is nothing compared to the alternative which is malloc or running out of stack (which will happen real fast if you keep piling up worst case fixed size).
Jan 01
prev sibling next sibling parent reply IGotD- <nise nise.com> writes:
On Friday, 1 January 2021 at 14:48:11 UTC, welkam wrote:
 Over the years I saw several people asking why D doesn't have 
 alloca. I just want to share a video where it states that 
 alloca is slower than static array so we have one more 
 arguments against it.
 https://youtu.be/FY9SbqTO5GQ?t=468

 In summary alloca is:
 1. Hard to implement
Why is it hard to implement?
 2. security problem
Not entirely true. You need to check so that the amount elements is within reasonable limits. You can have stack overflows but most operating systems have guard pages detecting such cases. In kernels it might not be the case but in kernels you need pay attention more than in user space programs. Keep in mind static arrays on the stack are prone to overflows just as much as a VLA. In D static arrays have an additional problem and that is that D will initialize the array by default. For stability this is great but performance this takes more time. VLA can be a better option here as you initialize exactly the amount elements you need.
 3. slower than static array on the stack
The reason was strangely a lot of code for just allocating on the stack. He said he wasn't even using -O2 optimization and with -O2 it would be smaller. In general it shouldn't be that bad. The fear of alloca in my opinion is exaggerated. Doesn't D implement alloca for those who want it? https://dlang.org/phobos/rt_alloca.html
Jan 01
parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 1 January 2021 at 15:19:07 UTC, IGotD- wrote:
 On Friday, 1 January 2021 at 14:48:11 UTC, welkam wrote:
 Over the years I saw several people asking why D doesn't have 
 alloca. I just want to share a video where it states that 
 alloca is slower than static array so we have one more 
 arguments against it.
 https://youtu.be/FY9SbqTO5GQ?t=468

 In summary alloca is:
 1. Hard to implement
Why is it hard to implement?
 2. security problem
Not entirely true. You need to check so that the amount elements is within reasonable limits. You can have stack overflows but most operating systems have guard pages detecting such cases. In kernels it might not be the case but in kernels you need pay attention more than in user space programs. Keep in mind static arrays on the stack are prone to overflows just as much as a VLA.
I think it is even worse than for VLA, because for the VLA you will have used a computed length depending on the parameters. So a real size. With static arrays, one will generally use a "worst case" size which value is much more prone to errors than the real length required. Anecdote: recent gcc versions (since version 7) have for each version better and better heuristic to check the potential size of a buffers with string combination functions (sprintf, strcpy, strcat etc.). The number of potential worst case buffer overflows in legacy code is staggering.
 In D static arrays have an additional problem and that is that 
 D will initialize the array by default. For stability this is 
 great but performance this takes more time. VLA can be a better 
 option here as you initialize exactly the amount elements you 
 need.
yes
 3. slower than static array on the stack
The reason was strangely a lot of code for just allocating on the stack. He said he wasn't even using -O2 optimization and with -O2 it would be smaller. In general it shouldn't be that bad.
afaik VLA get in the way of clever frame pointer optimisations. Local variables and parameters will have to be accessed with slightly more costly code than if the code knows at compile time exactly where they are. Better optimizers and code generation can alleviate it in a lot of cases.
 The fear of alloca in my opinion is exaggerated.

 Doesn't D implement alloca for those who want it?
 https://dlang.org/phobos/rt_alloca.html
While I'm neither afraid nor sceptical towards VLA/alloca, I have to say after 10 years of use that there are really very few cases where it was really a substantial improvement. It often spared a malloc/free pair but more often than expected added memory copies. YMMV.
Jan 01
parent reply welkam <wwwelkam gmail.com> writes:
Im going to respond to both messages

On Friday, 1 January 2021 at 15:19:07 UTC, IGotD- wrote:
 Why is it hard to implement?
Because it requires backend help. alloca is a special function On Friday, 1 January 2021 at 17:48:22 UTC, Patrick Schluter wrote:
 On Friday, 1 January 2021 at 15:19:07 UTC, IGotD- wrote:
 In D static arrays have an additional problem and that is that 
 D will initialize the array by default. For stability this is 
 great but performance this takes more time. VLA can be a 
 better option here as you initialize exactly the amount 
 elements you need.
yes
Usually this forum has excellent technical responses but this time you dropped ball on this one. Behold byte[128] = void; Walter did a talk on this one. DConf 2019 Day 1 Keynote: Allocating Memory with the D Programming Language -- Walter Bright https://www.youtube.com/watch?t=2210&v=_PB6Hdi4R7M On Friday, 1 January 2021 at 15:19:07 UTC, IGotD- wrote:
 3. slower than static array on the stack
The reason was strangely a lot of code for just allocating on the stack. He said he wasn't even using -O2 optimization and with -O2 it would be smaller. In general it shouldn't be that bad.
Later he said that when people replaced VLA in kernel they found 13% speed up. While it was not stated kernel should be benchmarked with optimizations enabled.
Jan 03
parent reply IGotD- <nise nise.com> writes:
On Sunday, 3 January 2021 at 19:12:47 UTC, welkam wrote:
 Usually this forum has excellent technical responses but this 
 time you dropped ball on this one. Behold

 byte[128] = void;
I think the default initialization in D is an excellent feature. MS has already stated that they want to do this in C++. Sure your example circumvents that but still the default initialization help. My point was that more that you will allocate exactly the elements you need on the stack. For example you have have a maximum of 1024 elements but usually you only use one or two elements. Then static allocation of 1024 elements become crazy big and also it generate more cache misses. I'm not sure why the stack allocation needs to be so enormously complicated like the seminar. In general you just need to subtract the stack pointer.
Jan 03
parent reply welkam <wwwelkam gmail.com> writes:
On Sunday, 3 January 2021 at 19:33:10 UTC, IGotD- wrote:
 I think the default initialization in D is an excellent feature.
Agree. Also Walter put an escape hatch because he knew that some times you dont want it for performance reasons.
 MS has already stated that they want to do this in C++.
Not only MS but also linux people want default initialize everything so they both put effort in compilers to improve optimizations so they dont pay performance penalty where they should not. And in the end we benefit too.
 My point was that more that you will allocate exactly the 
 elements you need on the stack. For example you have have a 
 maximum of 1024 elements but usually you only use one or two 
 elements. Then static allocation of 1024 elements become crazy 
 big and also it generate more cache misses.
If that's the case then you do like Walter showed. Common case on the stack and uncommon big allocations on heap.
 I'm not sure why the stack allocation needs to be so enormously 
 complicated like the seminar. In general you just need to 
 subtract the stack pointer.
I'm not following you
Jan 03
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Sunday, 3 January 2021 at 20:59:00 UTC, welkam wrote:
 MS has already stated that they want to do this in C++.
Not only MS but also linux people want default initialize everything so they both put effort in compilers to improve optimizations so they dont pay performance penalty where they should not. And in the end we benefit too.
The standard approach in PL design is to either default initialize or to statically establish that variables are not used until initialized (better, but more tricky). C++ was just a small addition to C, so that is where the semantics come from. C is the outlier here, not the norm...
Jan 03
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/1/21 9:48 AM, welkam wrote:
 Over the years I saw several people asking why D doesn't have alloca. I 
 just want to share a video where it states that alloca is slower than 
 static array so we have one more arguments against it.
 https://youtu.be/FY9SbqTO5GQ?t=468
 
 In summary alloca is:
 1. Hard to implement
 2. security problem
 3. slower than static array on the stack
D has alloca. It's in core.std.stdlib https://dlang.org/phobos/core_stdc_stdlib.html#.alloca In my experience, using it isn't extremely beneficial -- since it's in stdc, it requires the c library, which means you have malloc/free, which is much safer/usable. I'd much rather use a static array, or malloc/scope(exit) free, or a combination between the two that uses the stack when it can, and expands to malloc when needed. Or just use the GC... -Steve
Jan 01
next sibling parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Friday, 1 January 2021 at 17:55:33 UTC, Steven Schveighoffer 
wrote:
 In my experience, using it isn't extremely beneficial -- since 
 it's in stdc, it requires the c library, which means you have 
 malloc/free, which is much safer/usable.
I never use alloca directly, but in C you can just use an int variable for the dynamic array size. I find that useful for things like building a zero terminated path that only will be used with one function call. Or for a FFT buffer that is only known at runtime and I want to use hot memory (already in cache). To do that fixed size would take maybe 200 KB, could easily be 100 times more than needed... And the next stack frame would be cold as hell...
Jan 01
prev sibling parent reply welkam <wwwelkam gmail.com> writes:
On Friday, 1 January 2021 at 17:55:33 UTC, Steven Schveighoffer 
wrote:
 D has alloca. It's in core.std.stdlib

 https://dlang.org/phobos/core_stdc_stdlib.html#.alloca
Does it work on all compilers and all platforms?
Jan 03
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/3/21 2:15 PM, welkam wrote:
 On Friday, 1 January 2021 at 17:55:33 UTC, Steven Schveighoffer wrote:
 D has alloca. It's in core.std.stdlib

 https://dlang.org/phobos/core_stdc_stdlib.html#.alloca
Does it work on all compilers and all platforms?
I don't know. I would expect it to work anywhere D is supported. Looking at the code, DMD supports it, with GNU it's an intrinsic, and with LDC it's given a pragma to tag it (presumably so it can be recognized as an intrinsic). Are there other compilers or platforms that work with D besides DMD, LDC, and GDC? Are there some GDC or LDC platforms that don't support alloca? If so, it's possible that some don't support it. But it's not versioned that way in the code. -Steve
Jan 04
parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On Monday, 4 January 2021 at 15:02:32 UTC, Steven Schveighoffer 
wrote:
 Are there other compilers or platforms that work with D besides 
 DMD, LDC, and GDC? Are there some GDC or LDC platforms that 
 don't support alloca? If so, it's possible that some don't 
 support it. But it's not versioned that way in the code.
There's no such thing as a platform that doesn't support alloca as far as I'm aware. As for C compilers that don't support alloca, they are niche and few.
Jan 04
prev sibling parent reply Johan <j j.nl> writes:
On Sunday, 3 January 2021 at 19:15:05 UTC, welkam wrote:
 On Friday, 1 January 2021 at 17:55:33 UTC, Steven Schveighoffer 
 wrote:
 D has alloca. It's in core.std.stdlib

 https://dlang.org/phobos/core_stdc_stdlib.html#.alloca
Does it work on all compilers and all platforms?
Why wouldn't it? In C/C++/D/... language land, there is no such thing as "_the_ stack". Yeah, the _compiler_ may decide to use the special instructions/register of the CPU that address "the stack", but there is no guarantee it will; nor is there a guarantee that the binary executable with CPU stack instructions will actually use the stack and not just dynamically allocate things. There are CPUs that do not have such special instructions, and there are platforms that do not use a "stack" in the common interpretation of the word to execute a D program. Examples: Webassembly, an executable running with ASan's FakeStack enabled, a binary running in an emulator, ... LDC emits the same LLVM IR "alloca" instruction for local variables (`int i;`) as for `alloca` function calls. Simplified: if you can write `int i;` for your platform, `core.stdc.stdlib.alloca` also works. ;) -Johan
Jan 04
next sibling parent Max Haughton <maxhaton gmail.com> writes:
On Monday, 4 January 2021 at 19:38:48 UTC, Johan wrote:
 On Sunday, 3 January 2021 at 19:15:05 UTC, welkam wrote:
 [...]
Why wouldn't it? In C/C++/D/... language land, there is no such thing as "_the_ stack". Yeah, the _compiler_ may decide to use the special instructions/register of the CPU that address "the stack", but there is no guarantee it will; nor is there a guarantee that the binary executable with CPU stack instructions will actually use the stack and not just dynamically allocate things. There are CPUs that do not have such special instructions, and there are platforms that do not use a "stack" in the common interpretation of the word to execute a D program. Examples: Webassembly, an executable running with ASan's FakeStack enabled, a binary running in an emulator, ... LDC emits the same LLVM IR "alloca" instruction for local variables (`int i;`) as for `alloca` function calls. Simplified: if you can write `int i;` for your platform, `core.stdc.stdlib.alloca` also works. ;) -Johan
Nothing D will ever be used on will have this issue but I think some niche architectures don't have alloca a la C. I've seen some VLIW that can't use it as easily but I can't remember which one's exactly
Jan 04
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2021-01-04 20:38, Johan wrote:

 there are platforms that do not use a "stack" in the 
 common interpretation of the word to execute a D program.
 Examples: Webassembly, an executable running with ASan's FakeStack 
 enabled, a binary running in an emulator, ...
You can always implement a stack on the heap ;) -- /Jacob Carlborg
Jan 06