digitalmars.D - alloca is slow and dangerous

welkam (9/9) Jan 01 2021 Over the years I saw several people asking why D doesn't have

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (4/7) Jan 01 2021 13% slower is nothing compared to the alternative which is malloc
IGotD- (18/27) Jan 01 2021 Not entirely true. You need to check so that the amount elements

Patrick Schluter (22/53) Jan 01 2021 I think it is even worse than for VLA, because for the VLA you

welkam (15/28) Jan 03 2021 Because it requires backend help. alloca is a special function

IGotD- (12/15) Jan 03 2021 I think the default initialization in D is an excellent feature.

welkam (10/20) Jan 03 2021 Agree. Also Walter put an escape hatch because he knew that some

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (6/11) Jan 03 2021 The standard approach in PL design is to either default

Steven Schveighoffer (11/20) Jan 01 2021 D has alloca. It's in core.std.stdlib

Ola Fosheim Grostad (10/13) Jan 01 2021 I never use alloca directly, but in C you can just use an int
welkam (3/5) Jan 03 2021 Does it work on all compilers and all platforms?

Steven Schveighoffer (10/16) Jan 04 2021 I don't know. I would expect it to work anywhere D is supported. Looking...

Iain Buclaw (5/9) Jan 04 2021 There's no such thing as a platform that doesn't support alloca

Johan (18/25) Jan 04 2021 Why wouldn't it?

Max Haughton (5/24) Jan 04 2021 Nothing D will ever be used on will have this issue but I think
Jacob Carlborg (4/8) Jan 06 2021 You can always implement a stack on the heap ;)

welkam <wwwelkam gmail.com> writes:

Over the years I saw several people asking why D doesn't have 
alloca. I just want to share a video where it states that alloca 
is slower than static array so we have one more arguments against 
it.
https://youtu.be/FY9SbqTO5GQ?t=468

In summary alloca is:
1. Hard to implement
2. security problem
3. slower than static array on the stack

Jan 01 2021

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Friday, 1 January 2021 at 14:48:11 UTC, welkam wrote:
 Over the years I saw several people asking why D doesn't have 
 alloca. I just want to share a video where it states that 
 alloca is slower than static array

13% slower is nothing compared to the alternative which is malloc 
or running out of stack (which will happen real fast if you keep 
piling up worst case fixed size).

Jan 01 2021

IGotD- <nise nise.com> writes:

On Friday, 1 January 2021 at 14:48:11 UTC, welkam wrote:
 Over the years I saw several people asking why D doesn't have 
 alloca. I just want to share a video where it states that 
 alloca is slower than static array so we have one more 
 arguments against it.
 https://youtu.be/FY9SbqTO5GQ?t=468

 In summary alloca is:
 1. Hard to implement

Why is it hard to implement?

 2. security problem

Not entirely true. You need to check so that the amount elements 
is within reasonable limits. You can have stack overflows but 
most operating systems have guard pages detecting such cases. In 
kernels it might not be the case but in kernels you need pay 
attention more than in user space programs. Keep in mind static 
arrays on the stack are prone to overflows just as much as a VLA.

In D static arrays have an additional problem and that is that D 
will initialize the array by default. For stability this is great 
but performance this takes more time. VLA can be a better option 
here as you initialize exactly the amount elements you need.

 3. slower than static array on the stack

The reason was strangely a lot of code for just allocating on the 
stack. He said he wasn't even using -O2 optimization and with -O2 
it would be smaller. In general it shouldn't be that bad.


The fear of alloca in my opinion is exaggerated.

Doesn't D implement alloca for those who want it?
https://dlang.org/phobos/rt_alloca.html

Jan 01 2021

Patrick Schluter <Patrick.Schluter bbox.fr> writes:

On Friday, 1 January 2021 at 15:19:07 UTC, IGotD- wrote:
 On Friday, 1 January 2021 at 14:48:11 UTC, welkam wrote:
 Over the years I saw several people asking why D doesn't have 
 alloca. I just want to share a video where it states that 
 alloca is slower than static array so we have one more 
 arguments against it.
 https://youtu.be/FY9SbqTO5GQ?t=468

 In summary alloca is:
 1. Hard to implement

 Why is it hard to implement?

 2. security problem

 Not entirely true. You need to check so that the amount 
 elements is within reasonable limits. You can have stack 
 overflows but most operating systems have guard pages detecting 
 such cases. In kernels it might not be the case but in kernels 
 you need pay attention more than in user space programs. Keep 
 in mind static arrays on the stack are prone to overflows just 
 as much as a VLA.

I think it is even worse than for VLA, because for the VLA you 
will have used a computed length depending on the parameters. So 
a real size. With static arrays, one will generally use a "worst 
case" size which value is much more prone to errors than the real 
length required.
Anecdote: recent gcc versions (since version 7) have for each 
version better and better heuristic to check the potential size 
of a buffers with string combination functions (sprintf, strcpy, 
strcat etc.). The number of potential worst case buffer overflows 
in legacy code is staggering.

 In D static arrays have an additional problem and that is that 
 D will initialize the array by default. For stability this is 
 great but performance this takes more time. VLA can be a better 
 option here as you initialize exactly the amount elements you 
 need.

yes

 3. slower than static array on the stack

 The reason was strangely a lot of code for just allocating on 
 the stack. He said he wasn't even using -O2 optimization and 
 with -O2 it would be smaller. In general it shouldn't be that 
 bad.

afaik VLA get in the way of clever frame pointer optimisations. 
Local variables and parameters will have to be accessed with 
slightly more costly code than if the code knows at compile time 
exactly where they are. Better optimizers and code generation can 
alleviate it in a lot of cases.

 The fear of alloca in my opinion is exaggerated.

 Doesn't D implement alloca for those who want it?
 https://dlang.org/phobos/rt_alloca.html

While I'm neither afraid nor sceptical towards VLA/alloca, I have 
to say after 10 years of use that there are really very few cases 
where it was really a substantial improvement. It often spared a 
malloc/free pair but more often than expected added memory 
copies. YMMV.

Jan 01 2021

welkam <wwwelkam gmail.com> writes:

Im going to respond to both messages

On Friday, 1 January 2021 at 15:19:07 UTC, IGotD- wrote:
 Why is it hard to implement?

Because it requires backend help. alloca is a special function

On Friday, 1 January 2021 at 17:48:22 UTC, Patrick Schluter wrote:
 On Friday, 1 January 2021 at 15:19:07 UTC, IGotD- wrote:
 In D static arrays have an additional problem and that is that 
 D will initialize the array by default. For stability this is 
 great but performance this takes more time. VLA can be a 
 better option here as you initialize exactly the amount 
 elements you need.

 yes

Usually this forum has excellent technical responses but this 
time you dropped ball on this one. Behold

byte[128] = void;

Walter did a talk on this one.
DConf 2019 Day 1 Keynote: Allocating Memory with the D 
Programming Language -- Walter Bright
https://www.youtube.com/watch?t=2210&v=_PB6Hdi4R7M

On Friday, 1 January 2021 at 15:19:07 UTC, IGotD- wrote:
 3. slower than static array on the stack

 The reason was strangely a lot of code for just allocating on 
 the stack. He said he wasn't even using -O2 optimization and 
 with -O2 it would be smaller. In general it shouldn't be that 
 bad.

Later he said that when people replaced VLA in kernel they found 
13% speed up. While it was not stated kernel should be 
benchmarked with optimizations enabled.

Jan 03 2021

IGotD- <nise nise.com> writes:

On Sunday, 3 January 2021 at 19:12:47 UTC, welkam wrote:
 Usually this forum has excellent technical responses but this 
 time you dropped ball on this one. Behold

 byte[128] = void;

I think the default initialization in D is an excellent feature. 
MS has already stated that they want to do this in C++. Sure your 
example circumvents that but still the default initialization 
help. My point was that more that you will allocate exactly the 
elements you need on the stack. For example you have have a 
maximum of 1024 elements but usually you only use one or two 
elements. Then static allocation of 1024 elements become crazy 
big and also it generate more cache misses.

I'm not sure why the stack allocation needs to be so enormously 
complicated like the seminar. In general you just need to 
subtract the stack pointer.

Jan 03 2021

welkam <wwwelkam gmail.com> writes:

On Sunday, 3 January 2021 at 19:33:10 UTC, IGotD- wrote:
 I think the default initialization in D is an excellent feature.

Agree. Also Walter put an escape hatch because he knew that some 
times you dont want it for performance reasons.

 MS has already stated that they want to do this in C++.

Not only MS but also linux people want default initialize 
everything so they both put effort in compilers to improve 
optimizations so they dont pay performance penalty where they 
should not. And in the end we benefit too.

 My point was that more that you will allocate exactly the 
 elements you need on the stack. For example you have have a 
 maximum of 1024 elements but usually you only use one or two 
 elements. Then static allocation of 1024 elements become crazy 
 big and also it generate more cache misses.

If that's the case then you do like Walter showed. Common case on 
the stack and uncommon big allocations on heap.

 I'm not sure why the stack allocation needs to be so enormously 
 complicated like the seminar. In general you just need to 
 subtract the stack pointer.

I'm not following you

Jan 03 2021

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Sunday, 3 January 2021 at 20:59:00 UTC, welkam wrote:
 MS has already stated that they want to do this in C++.

 Not only MS but also linux people want default initialize 
 everything so they both put effort in compilers to improve 
 optimizations so they dont pay performance penalty where they 
 should not. And in the end we benefit too.

The standard approach in PL design is to either default 
initialize or to statically establish that variables are not used 
until initialized (better, but more tricky). C++ was just a small 
addition to C, so that is where the semantics come from. C is the 
outlier here, not the norm...

Jan 03 2021

Steven Schveighoffer <schveiguy gmail.com> writes:

On 1/1/21 9:48 AM, welkam wrote:
 Over the years I saw several people asking why D doesn't have alloca. I 
 just want to share a video where it states that alloca is slower than 
 static array so we have one more arguments against it.
 https://youtu.be/FY9SbqTO5GQ?t=468
 
 In summary alloca is:
 1. Hard to implement
 2. security problem
 3. slower than static array on the stack

D has alloca. It's in core.std.stdlib

https://dlang.org/phobos/core_stdc_stdlib.html#.alloca

In my experience, using it isn't extremely beneficial -- since it's in 
stdc, it requires the c library, which means you have malloc/free, which 
is much safer/usable.

I'd much rather use a static array, or malloc/scope(exit) free, or a 
combination between the two that uses the stack when it can, and expands 
to malloc when needed.

Or just use the GC...

-Steve

Jan 01 2021

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Friday, 1 January 2021 at 17:55:33 UTC, Steven Schveighoffer 
wrote:
 In my experience, using it isn't extremely beneficial -- since 
 it's in stdc, it requires the c library, which means you have 
 malloc/free, which is much safer/usable.

I never use alloca directly, but in C you can just use an int 
variable for the dynamic array size. I find that useful for 
things like building a zero terminated path that only will be 
used with one function call. Or for a FFT buffer that is only 
known at runtime and I want to use hot memory (already in cache).

To do that fixed size would take maybe 200 KB, could easily be 
100 times more than needed... And the next stack frame would be 
cold as hell...

Jan 01 2021

welkam <wwwelkam gmail.com> writes:

On Friday, 1 January 2021 at 17:55:33 UTC, Steven Schveighoffer 
wrote:
 D has alloca. It's in core.std.stdlib

 https://dlang.org/phobos/core_stdc_stdlib.html#.alloca

Does it work on all compilers and all platforms?

Jan 03 2021

Steven Schveighoffer <schveiguy gmail.com> writes:

On 1/3/21 2:15 PM, welkam wrote:
 On Friday, 1 January 2021 at 17:55:33 UTC, Steven Schveighoffer wrote:
 D has alloca. It's in core.std.stdlib

 https://dlang.org/phobos/core_stdc_stdlib.html#.alloca

 Does it work on all compilers and all platforms?

I don't know. I would expect it to work anywhere D is supported. Looking 
at the code, DMD supports it, with GNU it's an intrinsic, and with LDC 
it's given a pragma to tag it (presumably so it can be recognized as an 
intrinsic).

Are there other compilers or platforms that work with D besides DMD, 
LDC, and GDC? Are there some GDC or LDC platforms that don't support 
alloca? If so, it's possible that some don't support it. But it's not 
versioned that way in the code.

-Steve

Jan 04 2021

Iain Buclaw <ibuclaw gdcproject.org> writes:

On Monday, 4 January 2021 at 15:02:32 UTC, Steven Schveighoffer 
wrote:
 Are there other compilers or platforms that work with D besides 
 DMD, LDC, and GDC? Are there some GDC or LDC platforms that 
 don't support alloca? If so, it's possible that some don't 
 support it. But it's not versioned that way in the code.

There's no such thing as a platform that doesn't support alloca 
as far as I'm aware.  As for C compilers that don't support 
alloca, they are niche and few.

Jan 04 2021

Johan <j j.nl> writes:

On Sunday, 3 January 2021 at 19:15:05 UTC, welkam wrote:
 On Friday, 1 January 2021 at 17:55:33 UTC, Steven Schveighoffer 
 wrote:
 D has alloca. It's in core.std.stdlib

 https://dlang.org/phobos/core_stdc_stdlib.html#.alloca

 Does it work on all compilers and all platforms?

Why wouldn't it?

In C/C++/D/... language land, there is no such thing as "_the_ 
stack". Yeah, the _compiler_ may decide to use the special 
instructions/register of the CPU that address "the stack", but 
there is no guarantee it will; nor is there a guarantee that the 
binary executable with CPU stack instructions will actually use 
the stack and not just dynamically allocate things. There are 
CPUs that do not have such special instructions, and there are 
platforms that do not use a "stack" in the common interpretation 
of the word to execute a D program.
Examples: Webassembly, an executable running with ASan's 
FakeStack enabled, a binary running in an emulator, ...

LDC emits the same LLVM IR "alloca" instruction for local 
variables (`int i;`) as for `alloca` function calls. Simplified: 
if you can write `int i;` for your platform, 
`core.stdc.stdlib.alloca` also works. ;)

-Johan

Jan 04 2021

Max Haughton <maxhaton gmail.com> writes:

On Monday, 4 January 2021 at 19:38:48 UTC, Johan wrote:
 On Sunday, 3 January 2021 at 19:15:05 UTC, welkam wrote:
 [...]

 Why wouldn't it?

 In C/C++/D/... language land, there is no such thing as "_the_ 
 stack". Yeah, the _compiler_ may decide to use the special 
 instructions/register of the CPU that address "the stack", but 
 there is no guarantee it will; nor is there a guarantee that 
 the binary executable with CPU stack instructions will actually 
 use the stack and not just dynamically allocate things. There 
 are CPUs that do not have such special instructions, and there 
 are platforms that do not use a "stack" in the common 
 interpretation of the word to execute a D program.
 Examples: Webassembly, an executable running with ASan's 
 FakeStack enabled, a binary running in an emulator, ...

 LDC emits the same LLVM IR "alloca" instruction for local 
 variables (`int i;`) as for `alloca` function calls. 
 Simplified: if you can write `int i;` for your platform, 
 `core.stdc.stdlib.alloca` also works. ;)

 -Johan

Nothing D will ever be used on will have this issue but I think 
some niche architectures don't have alloca a la C. I've seen some 
VLIW that can't use it as easily but I can't remember which one's 
exactly

Jan 04 2021

Jacob Carlborg <doob me.com> writes:

On 2021-01-04 20:38, Johan wrote:

 there are platforms that do not use a "stack" in the 
 common interpretation of the word to execute a D program.
 Examples: Webassembly, an executable running with ASan's FakeStack 
 enabled, a binary running in an emulator, ...

You can always implement a stack on the heap ;)

-- 
/Jacob Carlborg

Jan 06 2021

D Programming

C/C++ Programming

Other

digitalmars.D - alloca is slow and dangerous