digitalmars.D - An LLVM bug that affect both LDC and SDC. Worth pushing for


"deadalnix" <deadalnix gmail.com> writes:

http://llvm.org/bugs/show_bug.cgi?id=20049

Basically, when you have a closure inside a closure and the whole
thing gets inlined, LLVM messes up, which results in the compiler not
being able to optimize the GC allocation away.

It is probably worth pushing for. It probably affects other
functional languages as well, but I haven't checked.
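
To make the shape of the problem concrete, here is a minimal C++ sketch. It assumes that a frontend lowers each closure context to a heap-allocated struct and that the inner context holds a pointer to the outer one. The names `OuterCtx`, `InnerCtx`, `after_inlining` and `ideal` are hypothetical and are not taken from the bug report.

```cpp
#include <cassert>

// Hypothetical lowering of two nested closures: each closure context is a
// heap-allocated struct, and the inner context points back to the outer one.
struct OuterCtx { int a; };
struct InnerCtx { OuterCtx* parent; int b; };

// Roughly what the code looks like once both closure bodies are inlined.
int after_inlining() {
    OuterCtx* outer = new OuterCtx{30};        // context of the outer closure
    InnerCtx* inner = new InnerCtx{outer, 6};  // context of the inner closure
    // The read of `a` goes through two levels of indirection.
    int result = inner->parent->a + inner->b;
    delete inner;  // stands in for the GC; a D frontend would not emit this
    delete outer;
    return result;
}

// What an ideal optimizer would leave behind: no allocation at all.
int ideal() { return 36; }
```

Both functions return the same value. The question in this thread is whether the optimizer can get from the first form to the second once the loads pass through more than one allocation.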

Jun 15 2014

"safety0ff" <safety0ff.dev gmail.com> writes:

On Monday, 16 June 2014 at 06:09:28 UTC, deadalnix wrote:
 http://llvm.org/bugs/show_bug.cgi?id=20049

 Basically, when you have a closure inside a closure and the whole
 thing gets inlined, LLVM messes up, which results in the compiler not
 being able to optimize the GC allocation away.

 It is probably worth pushing for. It probably affects other
 functional languages as well, but I haven't checked.

This is the corresponding D code: 
https://github.com/deadalnix/SDC/blob/master/tests/test0156.d

Correct?

Jun 16 2014

"deadalnix" <deadalnix gmail.com> writes:

On Monday, 16 June 2014 at 16:31:20 UTC, safety0ff wrote:
 On Monday, 16 June 2014 at 06:09:28 UTC, deadalnix wrote:
 http://llvm.org/bugs/show_bug.cgi?id=20049

 Basically, when you have a closure inside a closure and the whole
 thing gets inlined, LLVM messes up, which results in the compiler not
 being able to optimize the GC allocation away.

 It is probably worth pushing for. It probably affects other
 functional languages as well, but I haven't checked.

 This is the corresponding D code: 
 https://github.com/deadalnix/SDC/blob/master/tests/test0156.d

 Correct?

Not exactly, but this is the kind of code that will trigger the 
bug.

Jun 16 2014

Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 16 June 2014 17:31, safety0ff via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Monday, 16 June 2014 at 06:09:28 UTC, deadalnix wrote:
 http://llvm.org/bugs/show_bug.cgi?id=20049

 Basically, when you have a closure inside a closure and the whole
 thing gets inlined, LLVM messes up, which results in the compiler not
 being able to optimize the GC allocation away.

 It is probably worth pushing for. It probably affects other
 functional languages as well, but I haven't checked.


 This is the corresponding D code:
 https://github.com/deadalnix/SDC/blob/master/tests/test0156.d

 Correct?

That code shouldn't create a GC allocated closure. :o)

Jun 16 2014

"deadalnix" <deadalnix gmail.com> writes:

On Monday, 16 June 2014 at 19:18:29 UTC, Iain Buclaw via 
Digitalmars-d wrote:
 On 16 June 2014 17:31, safety0ff via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On Monday, 16 June 2014 at 06:09:28 UTC, deadalnix wrote:
 http://llvm.org/bugs/show_bug.cgi?id=20049

 Basically, when you have a closure inside a closure and the whole
 thing gets inlined, LLVM messes up, which results in the compiler not
 being able to optimize the GC allocation away.

 It is probably worth pushing for. It probably affects other
 functional languages as well, but I haven't checked.


 This is the corresponding D code:
 https://github.com/deadalnix/SDC/blob/master/tests/test0156.d

 Correct?

 That code shouldn't create a GC allocated closure. :o)

Change "return bar" to "return &bar" and you get one possible
candidate for triggering the bug.

Jun 16 2014

Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 16 June 2014 20:37, deadalnix via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Monday, 16 June 2014 at 19:18:29 UTC, Iain Buclaw via Digitalmars-d
 wrote:
 On 16 June 2014 17:31, safety0ff via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On Monday, 16 June 2014 at 06:09:28 UTC, deadalnix wrote:
 http://llvm.org/bugs/show_bug.cgi?id=20049

 Basically, when you have a closure inside a closure and the whole
 thing gets inlined, LLVM messes up, which results in the compiler not
 being able to optimize the GC allocation away.

 It is probably worth pushing for. It probably affects other
 functional languages as well, but I haven't checked.



 This is the corresponding D code:
 https://github.com/deadalnix/SDC/blob/master/tests/test0156.d

 Correct?


 That code shouldn't create a GC allocated closure. :o)


 Change "return bar" to "return &bar" and you get one possible
 candidate for triggering the bug.


Yeah, I did get that bit. I'm not sure of the optimisation though.

IMO, the closure/frame generation should occur *after* inlining.

Jun 18 2014

"David Nadlinger" <code klickverbot.at> writes:

On Wednesday, 18 June 2014 at 09:29:14 UTC, Iain Buclaw via 
Digitalmars-d wrote:
 Yeah, I did get that bit. I'm not sure of the optimisation 
 though.

 IMO, the closure/frame generation should occur *after* inlining.

How would that work if your inliner operates on some 
language-independent IR?

David

Jun 18 2014

Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 18 June 2014 14:18, David Nadlinger via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Wednesday, 18 June 2014 at 09:29:14 UTC, Iain Buclaw via Digitalmars-d
 wrote:
 Yeah, I did get that bit. I'm not sure of the optimisation though.

 IMO, the closure/frame generation should occur *after* inlining.


 How would that work if your inliner operates on some language-independent
 IR?

I don't know LLVM well enough to comment. But GCC operates at a higher
level, so that all information is available to use (the inlined
function is just duplicated with all its parameters remapped into
variables, and the return expression is turned into an assignment to a
dedicated return-value variable).

Though the fact remains that the same is true of GDC: its IR is
generated before the optimisation passes run.
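
The inlining scheme described above can be sketched by hand at the source level. This is only an illustration in C++, not GCC's actual internal representation, and the function names `add_scaled`, `caller` and `caller_inlined` are made up for the example.

```cpp
#include <cassert>

// Callee: a small function we want to inline.
int add_scaled(int x, int y) {
    return x + 2 * y;
}

// Original caller, with an ordinary call.
int caller(int n) {
    return add_scaled(n, 3) + 1;
}

// The same caller after inlining by hand, following the description above.
int caller_inlined(int n) {
    int retval;              // dedicated return-value variable
    {
        int x = n;           // parameter remapped into a local variable
        int y = 3;           // parameter remapped into a local variable
        retval = x + 2 * y;  // the return expression becomes an assignment
    }
    return retval + 1;
}
```

Because the duplicated body still uses ordinary variables, later passes see the same kind of code they would see for a hand-written function.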

Jun 18 2014

"David Nadlinger" <code klickverbot.at> writes:

On Wednesday, 18 June 2014 at 21:14:48 UTC, Iain Buclaw via 
Digitalmars-d wrote:
 On 18 June 2014 14:18, David Nadlinger via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On Wednesday, 18 June 2014 at 09:29:14 UTC, Iain Buclaw via Digitalmars-d
 wrote:
 IMO, the closure/frame generation should occur *after* inlining.


 How would that work if your inliner operates on some language-independent
 IR?

 I don't know LLVM well enough to comment. But GCC operates at a higher
 level, so that all information is available to use (the inlined
 function is just duplicated with all its parameters remapped into
 variables, and the return expression is turned into an assignment to a
 dedicated return-value variable).

You stated that closure/frame generation should occur after 
inlining. I doubt that this is feasible to implement in the 
current LDC architecture, and probably also in GDC (although I 
don't know its internals well enough to be sure).

What we do in LDC, by the way, is just to optimize the closure GC
allocations into stack allocations if we can prove that the context
does not escape after inlining. This happens in a custom
optimization pass at the IR level. deadalnix is presumably
talking about something very similar that he is working on for SDC.
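
As a rough source-level picture of that kind of promotion (a sketch in C++, not LDC's actual pass, with the hypothetical names `Context`, `read_context`, `before_promotion`, `after_promotion` and `escaping`):

```cpp
#include <cassert>

struct Context { int captured; };

// Hypothetical stand-in for an inlined closure body that only reads its context.
static int read_context(const Context* ctx) { return ctx->captured + 1; }

// Before: the context lives on the heap (the GC heap, in D).
int before_promotion() {
    Context* ctx = new Context{41};
    int r = read_context(ctx);
    delete ctx;  // stands in for the GC
    return r;
}

// After: the context never leaves this function, so it can live on the stack.
int after_promotion() {
    Context ctx{41};
    return read_context(&ctx);
}

// Counter-example: the pointer escapes through the return value, so the
// allocation must stay on the heap and cannot be promoted.
Context* escaping() {
    return new Context{41};
}
```

The promotion is only valid in the second function because no pointer to the context outlives the call. Proving that is the escape analysis mentioned above.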

David

Jun 18 2014

"deadalnix" <deadalnix gmail.com> writes:

On Wednesday, 18 June 2014 at 22:33:03 UTC, David Nadlinger wrote:
 You stated that closure/frame generation should occur after 
 inlining. I doubt that this is feasible to implement in the 
 current LDC architecture, and probably also in GDC (although I 
 don't know its internals well enough to be sure).

 What we do in LDC, by the way, is just to optimize the closure GC
 allocations into stack allocations if we can prove that the context
 does not escape after inlining. This happens in a custom
 optimization pass at the IR level. deadalnix is presumably
 talking about something very similar that he is working on for SDC.

 David

Yes, but the problem is not limited to SDC. LDC exhibits the same
behavior (because it is an LLVM bug, not an SDC or LDC one).

Jun 18 2014

"David Nadlinger" <code klickverbot.at> writes:

On Wednesday, 18 June 2014 at 23:08:06 UTC, deadalnix wrote:
 Yes, but the problem is not limited to SDC. LDC exhibits the same
 behavior (because it is an LLVM bug, not an SDC or LDC one).

Yes, certainly. To me, this looks like a limitation in GVN or so.

But coming back to the D side of things, do you have an actual D 
test case showing the problem? The remaining load in your example 
shouldn't be enough to trip up LDC's optimizer pass by itself, 
but I'm rather certain that there might be more complex code with 
missed optimization opportunities due to this.

David

Jun 18 2014

"deadalnix" <deadalnix gmail.com> writes:

On Wednesday, 18 June 2014 at 09:29:14 UTC, Iain Buclaw via 
Digitalmars-d wrote:
 Yeah, I did get that bit. I'm not sure of the optimisation 
 though.

 IMO, the closure/frame generation should occur *after* inlining.

It doesn't really work that way for LLVM. You generate
language-independent IR and optimization passes run on it. The front
end can add passes of its own to the optimization process to do
language-dependent optimizations.

Jun 18 2014

Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 18 June 2014 19:20, deadalnix via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Wednesday, 18 June 2014 at 09:29:14 UTC, Iain Buclaw via Digitalmars-d
 wrote:
 Yeah, I did get that bit. I'm not sure of the optimisation though.

 IMO, the closure/frame generation should occur *after* inlining.


 It doesn't really work that way for LLVM. You generate
 language-independent IR and optimization passes run on it. The front
 end can add passes of its own to the optimization process to do
 language-dependent optimizations.

Likewise here.  But unless I'm missing something (I'm not sure what
magic happens with  allocate, for instance), I'm not sure how you
could expect the optimisation passes to squash closures together.

Am I correct in that it's asking for:
------
int *i = new int;
*i = 42;
return *i;


To be folded into:
------
return 42;

Jun 18 2014

"deadalnix" <deadalnix gmail.com> writes:

On Wednesday, 18 June 2014 at 21:22:44 UTC, Iain Buclaw via
Digitalmars-d wrote:
 Likewise here.  But unless I'm missing something (I'm not sure 
 what
 magic happens with  allocate, for instance), I'm not sure how 
 you
 could expect the optimisation passes to squash closures 
 together.

 Am I correct in that it's asking for:
 ------
 int *i = new int;
 *i = 42;
 return *i;


 To be folded into:
 ------
 return 42;

That is the final goal. A first goal should be:

int *i = new int;
*i = 42;
return 42;

That first step is supposed to be done by the LLVM infrastructure
itself (and it is for such a simple example, but if you multiply the
new expressions, it gets confused). It is necessary because at that
point, the language-specific pass will be able to detect that nobody
ever reads from the allocated memory and that it doesn't escape, so
the allocation can be optimized away.

If the first step does not happen, then the second step won't
either, and it cascades down to pretty stupid code generation.
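
The dependency between the two steps can be shown on a toy instruction list. This is a minimal C++ sketch under heavy simplifying assumptions (one block, constant stores, no escapes). The `Op`, `Inst`, `forward_stores` and `remove_dead_allocs` names are invented for the illustration and have nothing to do with LLVM's real API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy IR: `slot` names an allocation, `value` is a constant.
enum class Op { Alloc, Store, RetLoad, RetConst };
struct Inst { Op op; int slot; int value; };
using Program = std::vector<Inst>;

// Step 1 (generic): replace a load by the constant last stored to that slot.
Program forward_stores(const Program& in) {
    std::map<int, int> last_store;
    Program out;
    for (const Inst& i : in) {
        if (i.op == Op::Store) last_store[i.slot] = i.value;
        if (i.op == Op::RetLoad && last_store.count(i.slot))
            out.push_back({Op::RetConst, -1, last_store[i.slot]});
        else
            out.push_back(i);
    }
    return out;
}

// Step 2 (language-specific): drop allocations that are never read,
// together with the stores into them.
Program remove_dead_allocs(const Program& in) {
    std::set<int> read;
    for (const Inst& i : in)
        if (i.op == Op::RetLoad) read.insert(i.slot);
    Program out;
    for (const Inst& i : in) {
        bool touches_slot = i.op == Op::Alloc || i.op == Op::Store;
        if (touches_slot && !read.count(i.slot)) continue;
        out.push_back(i);
    }
    return out;
}

// int *i = new int; *i = 42; return *i;
Program example() {
    return {{Op::Alloc, 0, 0}, {Op::Store, 0, 42}, {Op::RetLoad, 0, 0}};
}
```

Running `remove_dead_allocs(forward_stores(example()))` leaves a single constant return. Running `remove_dead_allocs(example())` on its own removes nothing, because the remaining load still counts as a read of the allocation.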

Jun 18 2014

Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 18 June 2014 22:29, deadalnix via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Wednesday, 18 June 2014 at 21:22:44 UTC, Iain Buclaw via

 Digitalmars-d wrote:
 Likewise here.  But unless I'm missing something (I'm not sure what
 magic happens with  allocate, for instance), I'm not sure how you
 could expect the optimisation passes to squash closures together.

 Am I correct in that it's asking for:
 ------
 int *i = new int;
 *i = 42;
 return *i;


 To be folded into:
 ------
 return 42;


 That is the final goal. A first goal should be:


 int *i = new int;
 *i = 42;
 return 42;

 That first step is supposed to be done by the LLVM infrastructure
 itself (and it is for such a simple example, but if you multiply the
 new expressions, it gets confused). It is necessary because at that
 point, the language-specific pass will be able to detect that nobody
 ever reads from the allocated memory and that it doesn't escape, so
 the allocation can be optimized away.

 If the first step does not happen, then the second step won't
 either, and it cascades down to pretty stupid code generation.

I just tried doing something simple in gdc to see if I could
trigger this, and got the optimisation passes to compile it down to:

_d_allocmemory (16);
_d_allocmemory (16);
return 36;

Which is more than what I expected... it managed to const-fold all
operations into a single return; it just hasn't dropped the (now) useless
GC allocations for the closures that were removed as dead code.

Jun 18 2014

dennis luehring <dl.soluz gmx.net> writes:

On 18.06.2014 23:22, Iain Buclaw via Digitalmars-d wrote:
 Likewise here.  But unless I'm missing something (I'm not sure what
 magic happens with  allocate, for instance), I'm not sure how you
 could expect the optimisation passes to squash closures together.

 Am I correct in that it's asking for:
 ------
 int *i = new int;
 *i = 42;
 return *i;


 To be folded into:
 ------
 return 42;

just to show what clang 3.5 svn and libc++ can currently optimize down

patches
clang: http://reviews.llvm.org/rL210137
libc++: http://reviews.llvm.org/rL210211

#example 1

#include <vector>
#include <numeric>
int main()
{
     const std::vector<int> a{1,2};
     const std::vector<int> b{4,5};
     const std::vector<int> ints
     {
       std::accumulate(a.begin(),a.end(),1),
       std::accumulate(b.begin(),b.end(),2),
     };
     return std::accumulate(ints.begin(),ints.end(),100);
}

asm result:


    movl  $115, %eax
    retq

#example 2

#include <string>
int main()
{
    return std::string("hello").size();
}

asm result:


    movl $5, %eax
    retq

older clang/libc++, gcc 4.9.x, and VS2013 produce much much (much)
more asm code in these situations

Jun 18 2014

"deadalnix" <deadalnix gmail.com> writes:

If they go for a clang-specific solution, that isn't gonna cut it
for us :(

Jun 18 2014

dennis luehring <dl.soluz gmx.net> writes:

On 19.06.2014 07:16, deadalnix wrote:
 If they go for a clang-specific solution, that isn't gonna cut it
 for us :(

only as an indication of what a weaker language + optimizer can reach :)

Jun 18 2014
