digitalmars.D - An LLVM bug that affects both LDC and SDC. Worth pushing for

reply "deadalnix" <deadalnix gmail.com> writes:
http://llvm.org/bugs/show_bug.cgi?id=20049

Basically, when you have a closure inside a closure and the whole 
thing gets inlined, LLVM messes up, which results in the compiler 
not being able to optimize the GC allocation away.

Probably worth pushing for. It probably affects other 
functional languages as well, but I haven't checked.
Jun 15 2014
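For readers unfamiliar with the pattern, here is a rough C++ sketch of what a closure nested in a closure can look like once a front end has lowered it to explicit heap-allocated contexts. The names OuterCtx, InnerCtx, and nested_closure_sum are hypothetical, and plain heap allocation stands in for the GC; this is not the IR attached to the bug report.

```cpp
#include <memory>

// Hypothetical context for the outer closure: holds its captured variable.
struct OuterCtx {
    int a;
};

// Hypothetical context for the inner closure: captures b plus a link to the
// enclosing context, which is what makes this a closure in a closure.
struct InnerCtx {
    OuterCtx* parent;
    int b;
};

// Body of the inner closure: reads through both contexts.
static int inner_body(const InnerCtx* ctx) {
    return ctx->parent->a + ctx->b;
}

// Body of the outer closure: allocates the inner context and calls into it.
static int outer_body(OuterCtx* ctx) {
    std::unique_ptr<InnerCtx> inner(new InnerCtx{ctx, 2});
    return inner_body(inner.get());
}

// Entry point: two heap allocations, neither of which escapes.
int nested_closure_sum() {
    std::unique_ptr<OuterCtx> outer(new OuterCtx{40});
    return outer_body(outer.get());
}
```

Once everything is inlined into nested_closure_sum, both contexts are only stored to and loaded from locally, so in principle the loads can be forwarded and the allocations become removable.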
parent reply "safety0ff" <safety0ff.dev gmail.com> writes:
This is the corresponding D code: https://github.com/deadalnix/SDC/blob/master/tests/test0156.d Correct?
Jun 16 2014
next sibling parent "deadalnix" <deadalnix gmail.com> writes:
Not exactly, but this is the kind of code that will trigger the bug.
Jun 16 2014
prev sibling parent reply Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
That code shouldn't create a GC allocated closure. :o)
Jun 16 2014
parent reply "deadalnix" <deadalnix gmail.com> writes:
Change "return bar" to "return &bar" and you have one possible candidate for triggering the bug.
Jun 16 2014
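The difference between the two forms is whether the nested function escapes its enclosing frame. A small C++ analogy, assuming a nested function named bar as in the post (the linked test itself is not reproduced here, and std::shared_ptr merely stands in for a GC-allocated context):

```cpp
#include <functional>
#include <memory>

// Non-escaping case (analogous to "return bar"): the nested function is
// called before returning, so its context can live on the stack.
int call_nested() {
    int x = 42;
    auto bar = [&x] { return x; };
    return bar();
}

// Escaping case (analogous to "return &bar"): the closure outlives the
// frame, so the captured state has to move to the heap.
std::function<int()> return_nested() {
    auto ctx = std::make_shared<int>(42);
    return [ctx] { return *ctx; };
}
```

Only the second form needs a heap-allocated context up front, which is why it is the one that can expose a missed allocation-elimination opportunity after inlining.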
parent reply Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
Yeah, I did get that bit. I'm not sure of the optimisation though. IMO, the closure/frame generation should occur *after* inlining.
Jun 18 2014
next sibling parent reply "David Nadlinger" <code klickverbot.at> writes:
How would that work if your inliner operates on some language-independent IR?

David
Jun 18 2014
parent reply Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
I don't know LLVM well enough to comment. But GCC operates at a higher level, so that all information is available to use (the inlined function is just duplicated with all its parameters remapped into variables, and the return expression is turned into an assignment to a dedicated return-value variable).

That said, the same is true of GDC: its IR is generated before the optimisation passes run.
Jun 18 2014
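The inlining scheme described in the parenthesis can be illustrated by hand. This is only a sketch of the transformation as described in the post, not GCC's actual internal representation; scale_add and the caller names are made up for the example.

```cpp
// Callee, as written.
static int scale_add(int x, int y) {
    return x * 2 + y;
}

// Caller before inlining.
int caller_before(int n) {
    return scale_add(n, 3) + 1;
}

// Caller after inlining by hand, following the description above.
int caller_after(int n) {
    // Parameters of scale_add remapped into local variables.
    int x = n;
    int y = 3;
    // The return expression becomes an assignment to a return-value variable.
    int retval = x * 2 + y;
    return retval + 1;
}
```

Both callers compute the same value; the second simply makes the duplicated body and the remapped parameters visible.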
parent reply "David Nadlinger" <code klickverbot.at> writes:
You stated that closure/frame generation should occur after inlining. I doubt that this is feasible to implement in the current LDC architecture, and probably also in GDC (although I don't know its internals well enough to be sure).

What we do in LDC, by the way, is just to optimize the closure GC allocations into a stack allocation if we can prove the context is not escaped after inlining. This happens in a custom optimization pass on the IR level. deadalnix is presumably talking about something very similar he is working on for SDC.

David
Jun 18 2014
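As a source-level picture of that heap-to-stack promotion (the real pass works on IR, and this is not LDC's code), consider a hypothetical context struct Ctx, with new/delete standing in for the GC allocation:

```cpp
// Hypothetical closure context with a single captured variable.
struct Ctx {
    int captured;
};

// Before: the context is heap-allocated even though it never leaves
// this function.
int before_promotion() {
    Ctx* ctx = new Ctx{7};
    int result = ctx->captured * 6;
    delete ctx;
    return result;
}

// After: since the pointer provably does not escape, the context can be
// placed in the stack frame instead.
int after_promotion() {
    Ctx ctx{7};
    return ctx.captured * 6;
}
```

The promotion is only valid when no pointer to the context is stored or returned anywhere that outlives the call, which is the "not escaped" condition mentioned above.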
parent reply "deadalnix" <deadalnix gmail.com> writes:
Yes, but the problem is not limited to SDC. LDC exhibits the same behavior (because it is an LLVM bug, not an SDC or LDC one).
Jun 18 2014
parent "David Nadlinger" <code klickverbot.at> writes:
Yes, certainly. To me, this looks like a limitation in GVN or so.

But coming back to the D side of things, do you have an actual D test case showing the problem? The remaining load in your example shouldn't be enough to trip up LDC's optimizer pass by itself, but I'm rather certain that there might be more complex code with missed optimization opportunities due to this.

David
Jun 18 2014
prev sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 18 June 2014 at 09:29:14 UTC, Iain Buclaw via 
Digitalmars-d wrote:
 Yeah, I did get that bit. I'm not sure of the optimisation 
 though.

 IMO, the closure/frame generation should occur *after* inlining.
It doesn't really work that way for LLVM. You generate language-independent IR and optimization passes run on it. The front end can add passes of its own to the optimization process to do language-dependent optimizations.
Jun 18 2014
parent reply Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
Likewise here. But unless I'm missing something (I'm not sure what magic happens with allocate, for instance), I'm not sure how you could expect the optimisation passes to squash closures together.

Am I correct in that it's asking for:
------
int *i = new int;
*i = 42;
return *i;

To be folded into:
------
return 42;
Jun 18 2014
next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
That is the final goal.

A first goal should be:

int *i = new int;
*i = 42;
return 42;

That first step is supposed to be done by the LLVM infrastructure itself (and it is for such a simple example, but with several new expressions it gets confused). It is necessary because at that point, the language-specific pass will be able to detect that nobody ever reads from the allocated memory and that it doesn't escape, so the allocation can be optimized away.

If the first step does not happen, then the second step won't either, and it cascades down to pretty stupid code generation.
Jun 18 2014
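The dependency between the two steps can be sketched on a toy instruction list. Everything below (the Op enum, Inst, and both pass functions) is invented for illustration and is neither LLVM IR nor an LLVM API; the toy IR has a single implicit register and no way for a pointer to escape.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Toy instruction set, invented for this sketch.
enum class Op { Alloc, Store, Load, RetReg, RetConst };

struct Inst {
    Op op;
    std::string ptr; // pointer operand; empty for the two Ret forms
    int value;       // constant for Store and RetConst; otherwise 0
};

using Program = std::vector<Inst>;

// Step 1 (generic): forward stored constants to later loads of the same
// pointer, then rewrite "return register" into "return constant".
Program forward_stores(const Program& in) {
    std::map<std::string, int> known; // last constant stored per pointer
    bool reg_known = false;
    int reg = 0;
    Program out;
    for (const Inst& i : in) {
        if (i.op == Op::Store) {
            known[i.ptr] = i.value;
            out.push_back(i);
        } else if (i.op == Op::Load && known.count(i.ptr)) {
            reg = known[i.ptr]; // load eliminated, value forwarded
            reg_known = true;
        } else if (i.op == Op::RetReg && reg_known) {
            out.push_back({Op::RetConst, "", reg});
        } else {
            if (i.op == Op::Load) reg_known = false; // unknown value loaded
            out.push_back(i);
        }
    }
    return out;
}

// Step 2 (language-specific): an allocation that is never loaded from is
// dead, so drop it together with the stores into it.
Program remove_dead_allocs(const Program& in) {
    Program out;
    for (const Inst& i : in) {
        bool touches_ptr = (i.op == Op::Alloc || i.op == Op::Store);
        bool is_read = std::any_of(in.begin(), in.end(), [&](const Inst& j) {
            return j.op == Op::Load && j.ptr == i.ptr;
        });
        if (touches_ptr && !is_read) continue;
        out.push_back(i);
    }
    return out;
}

// The example from the thread: allocate i, store 42, load it, return it.
Program example_program() {
    return {{Op::Alloc, "i", 0}, {Op::Store, "i", 42},
            {Op::Load, "i", 0}, {Op::RetReg, "", 0}};
}
```

Running remove_dead_allocs directly on example_program() removes nothing, because the load still reads the allocation. Running forward_stores first leaves the allocation unread, and the second pass then reduces the program to a single constant return.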
parent Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
I just tried doing something simple in gdc to see if I could trigger this, and got the optimisation passes to compile it down to:

_d_allocmemory (16);
_d_allocmemory (16);
return 36;

Which is more than I expected... it managed to const-fold all operations into a single return; it just hasn't dropped the (now) useless GC allocations for the closures that were removed as dead code.
Jun 18 2014
prev sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
On 18.06.2014 23:22, Iain Buclaw via Digitalmars-d wrote:
 Likewise here.  But unless I'm missing something (I'm not sure what
 magic happens with  allocate, for instance), I'm not sure how you
 could expect the optimisation passes to squash closures together.

 Am I correct in that it's asking for:
 ------
 int *i = new int;
 *i = 42;
 return *i;


 To be folded into:
 ------
 return 42;
just to show what clang 3.5 svn and libc++ can currently optimize down

patches
clang: http://reviews.llvm.org/rL210137
libc++: http://reviews.llvm.org/rL210211

#example 1

#include <vector>
#include <numeric>

int main()
{
   const std::vector<int> a{1,2};
   const std::vector<int> b{4,5};
   const std::vector<int> ints
   {
     std::accumulate(a.begin(),a.end(),1),
     std::accumulate(b.begin(),b.end(),2),
   };
   return std::accumulate(ints.begin(),ints.end(),100);
}

asm result:

movl $115, %eax
retq

#example 2

#include <string>

int main()
{
   return std::string("hello").size();
}

asm result:

movl $5, %eax
retq

an older clang/libc++, gcc 4.9.x, and VS2013 produce much much (much) more asm code in these situations
Jun 18 2014
parent reply "deadalnix" <deadalnix gmail.com> writes:
If they go for a clang-specific solution, that isn't gonna cut it 
for us :(
Jun 18 2014
parent dennis luehring <dl.soluz gmx.net> writes:
only as an indication of what a weaker language + optimizer can reach :)
Jun 18 2014