digitalmars.D - Lack of optimisation with returning address of inner functions
- Cecil Ward (33/33) Sep 03 2020 This question is about a peculiar lack of optimisation in a
- Cecil Ward (1/1) Sep 03 2020 For the LDC version, see https://d.godbolt.org/z/x4rhbe
- Stefan Koch (2/3) Sep 03 2020 Compile with -O3 and -Oz.
- Jackel (13/14) Sep 03 2020 Not sure what your intention here is, but you are returning a
- Adam D. Ruppe (27/34) Sep 03 2020 I think this is a frontend/backend thing.
This question is about a peculiar lack of optimisation in a certain weird case only. Example, see https://d.godbolt.org/z/54eaGd ; either LDC or GDC may be used, results are the same here:

    auto test2()
    {
        int a = 20;
        int foo() { return a + 5; } // inner function
        return &foo; // other way to construct delegate
    }

    auto bar()
    {
        return foo();
    }

Now with LDC or GDC, inspecting the code generated, the code for foo is simply literally { return 25; }, yet if test2 is called, the code generated for the foo2 routine is not used; rather the generated code is:

    call _d_allocmemory
    mov dword ptr [rax], 20
    mov rdx, foo
    ret

1. So why the lack of optimisation? - could simply have got rid of the delegate generation in test2a as implementations when it is inlined in bar (and which is done sanely [!] in the generated code for test2a).

2. Even weirder, if you delete the & from &foo leaving simply "return foo;" then this fixes the non-optimisation bug. Why?

3. What’s the difference between foo and &foo ?

4. Leaving aside the special case above where the inner function’s address is returned, surely in many cases an inner function can be converted into an ordinary function, or simply _inlined_ so there is no function at all, no? As is seen in the standalone code generated for foo.
Sep 03 2020
For the LDC version, see https://d.godbolt.org/z/x4rhbe
Sep 03 2020
On Friday, 4 September 2020 at 01:13:53 UTC, Cecil Ward wrote:
> For the LDC version, see https://d.godbolt.org/z/x4rhbe

Compile with -O3 and -Oz.
Sep 03 2020
On Friday, 4 September 2020 at 01:13:53 UTC, Cecil Ward wrote:
> For the LDC version, see https://d.godbolt.org/z/x4rhbe

Not sure what your intention here is, but you are returning a delegate. You aren't actually calling it. There could be side effects, e.g. it has to create the context for the delegate, and another function could then access the delegate's data. So it can't simply optimize it out in this case.

When you actually call the function though, it does just narrow down to returning 25.

https://d.godbolt.org/z/6f87aT

    auto bar()
    {
        return test2()();
    }

    pure nothrow safe int example.bar():
        mov eax, 25
        ret
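To make the point about the delegate's context concrete, here is a minimal sketch. The names makeCounter and next are hypothetical and only for illustration, and test2 is given an explicit return type here, which the original did not have. It shows captured state surviving after the outer function has returned, which is the reason the allocation cannot simply be dropped when the delegate escapes.

```d
// Sketch only: makeCounter and next are hypothetical names, not from the thread.
int delegate() makeCounter()
{
    int count = 0;                  // captured, so it must outlive this call
    int next() { return ++count; }  // nested function that mutates the capture
    return &next;                   // the delegate escapes together with its context
}

// The function from the original post, with an explicit return type (an assumption).
int delegate() test2()
{
    int a = 20;
    int foo() { return a + 5; }
    return &foo;
}

void main()
{
    auto c = makeCounter();
    assert(c() == 1);
    assert(c() == 2);   // state persisted between calls through the context

    auto d = makeCounter();
    assert(d() == 1);   // every call to makeCounter creates a fresh context

    assert(test2()() == 25);  // calling the returned delegate yields 25
}
```

When the delegate is called on the spot, as in test2()(), the optimizer can see the whole lifetime of the context. When the delegate is handed to a caller the compiler cannot see, the context has to stay alive.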
Sep 03 2020
On Friday, 4 September 2020 at 01:10:48 UTC, Cecil Ward wrote:
> 1. So why the lack of optimisation? - could simply have got rid of the delegate generation in test2a as implementations when it is inlined in bar (and which is done sanely [!] in the generated code for test2a).

I think this is a frontend/backend thing. That optimization is done by the back end, but the front end doesn't know that and still assumes there's a full-blown delegate required.

> 2. Even weirder, if you delete the & from &foo leaving simply "return foo;" then this fixes the non-optimisation bug. Why?

That just calls the function and returns its value, which obviously needs no delegate since the function doesn't outlive the surrounding context.

> 3. What’s the difference between foo and &foo ?

Huge, huge difference. &foo returns a function pointer or delegate referring to the function. The function is not called here. foo is just foo() without the optional parentheses; the function is actually immediately called.

Whenever the compiler frontend sees a `return &some_nested_function` it assumes a longer lifetime is required and allocates the captured variables on the heap up front. So by the time it gets to the optimizer in the back end, it sees all that allocation and pointer code already existing.

With certain settings, it might be able to see through it and optimize anyway, but its job got a lot harder since it might not know what happens with that return value later in the program. I suspect the best you'd see in practice is that all usages get inlined and the linker can then discard the actual function that allocates as unused, but even that can be harder than it seems for the backend to figure out given the information it has. It doesn't really understand *why* it is calling this other function, it just knows it is.
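The foo versus &foo distinction can be checked directly from the types. This is a minimal sketch: the names callsFoo and returnsDelegate are hypothetical, and the explicit return types are an assumption added for clarity (the original used auto).

```d
import std.stdio : writeln;

// `foo` without parentheses is a call, so this returns an int.
int callsFoo()
{
    int a = 20;
    int foo() { return a + 5; }
    return foo;     // optional parentheses: same as `return foo();`
}

// `&foo` does not call anything; it yields a delegate that captures `a`.
int delegate() returnsDelegate()
{
    int a = 20;
    int foo() { return a + 5; }
    return &foo;
}

void main()
{
    // Compile-time checks on what each function hands back.
    static assert(is(typeof(callsFoo()) == int));
    static assert(is(typeof(returnsDelegate()) == delegate));

    int delegate() dg = returnsDelegate();
    writeln(callsFoo());  // foo already ran inside callsFoo
    writeln(dg());        // foo runs only now, through the delegate
}
```

Both lines print the same value, but only returnsDelegate needs a context that outlives its stack frame, which is where the allocation in the generated code comes from.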
Sep 03 2020