www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.ldc - Bad codegen for ARM (and maybe others) when optimizing

reply Dan Olson <zans.is.for.cans yahoo.com> writes:
Just a warning to ARM and PPC users.  And potentially other CPUs with
similar register files.  I've discovered a codegen problem with -O1 and
higher.  What happens is that load instructions are being incorrectly
reordered before stores.  It is a problem in the LLVM backend.  Versions
LLVM-3.5, 3.4, 3.3 have the problem based on my testing, and maybe
earlier.

The result is a program with incorrect results or crashes.

After much digging I discovered it happens in the "top-down list latency
scheduler" pass and a search of llvm bugs brought up for PPC:

http://llvm.org/bugs/show_bug.cgi?id=20280

This looks to be the same problem.  All the fixes should be in LLVM-3.6
which I have not tried yet (does LDC work with 3.6?).

Here is a small program which exhibits the problem on ARM (iOS in my case).

---- badopt.d ----
struct A {int value;}

A fun(A a) {return a;}

void foo(A a)
{
    A z = fun(a);

    import core.stdc.stdio: printf;
    printf("%d %d\n", a.value, z.value);
    // -O1 prints: 12345678 1
    // -O0 prints: 12345678 12345678
}

void main()
{
    A a = A(12345678);
    foo(a);
}
----

In this case the, the -O1 assembly for foo() as it calls fun() is:

$ ldc2 -mtriple=thumbv7-apple-ios -mcpu=cortex-a8 -float-abi=softfp -O1
-output-s -c badopt.d

// foo gets arg a.value in r0
__D6badopt3fooFS6badopt1AZv:

	push	{r7, lr}
	mov	r7, sp



	mov	r0, sp
	bl	__D6badopt3funFS6badopt1AZS6badopt1A
Feb 04 2015
parent reply "Kai Nacke" <kai redstar.de> writes:
Hi Dan!

On Wednesday, 4 February 2015 at 16:15:47 UTC, Dan Olson wrote:
 This looks to be the same problem.  All the fixes should be in 
 LLVM-3.6
 which I have not tried yet (does LDC work with 3.6?).
Thanks for the hint! LDC works with LLVM 3.6 out of the box. Regards, Kai
Feb 04 2015
parent reply Dan Olson <zans.is.for.cans yahoo.com> writes:
"Kai Nacke" <kai redstar.de> writes:

 Hi Dan!

 On Wednesday, 4 February 2015 at 16:15:47 UTC, Dan Olson wrote:
 This looks to be the same problem.  All the fixes should be in
 LLVM-3.6
 which I have not tried yet (does LDC work with 3.6?).
Thanks for the hint! LDC works with LLVM 3.6 out of the box. Regards, Kai
Thanks. I updated to LLVM 3.6 and it does not fix the ARM problem. I looked at the changes for http://llvm.org/bugs/show_bug.cgi?id=20280 and they are specific to PPC target :-( I have been looking at IR. The bug is connected to using the byval attribute. I checked out clang's IR and it does not use byval for structs. It passes structs as an array. I am wondering if LDC could use a abi-arm.cpp to generate something like clang. For example, with these declarations: struct A {int value, x,y, z;} A fun(A a); clang IR for ARM: %struct.A = type { i32, i32, i32, i32 } where as LDC is generating: %badopt.A = type { i32, i32, i32, i32 } declare fastcc void _D6badopt3funFS6badopt1AZS6badopt1A(%badopt.A* noalias sret, %badopt.A* byval) The other alternative is to try to duplicate the PPC LLCM fix in my forked LLVM repo and submit a llvm bug report. -- Dan
Feb 05 2015
next sibling parent reply Dan Olson <zans.is.for.cans yahoo.com> writes:
Dan Olson <zans.is.for.cans yahoo.com> writes:
 I have been looking at IR.  The bug is connected to using the byval
 attribute.  I checked out clang's IR and it does not use byval for
 structs.  It passes structs as an array.  I am wondering if LDC could
 use a abi-arm.cpp to generate something like clang.
I think an ARM specific abi class is the right solution. On a whim, I tried one small change in abi.cpp bool passByVal(Type* t) { // was // return t->toBasetype()->ty == Tstruct; return false; } This gets rid of LLVM byval attribute and fixes this codegen problem. The generated instructions are similar to clang. The parameter is not an array like clang, it is the IR struct type as a value, but it works. "%badopt.A" instead of "%badopt.A* byval" Another thing I noticed about clang is that structs that fit in 32-bits are returned directly in r0 instead of an sret parameter. There may be more, I need to study clang. -- Dan
Feb 06 2015
parent reply "Kai Nacke" <kai redstar.de> writes:
Hi Dan!

On Friday, 6 February 2015 at 15:03:11 UTC, Dan Olson wrote:
 I think an ARM specific abi class is the right solution.  On a 
 whim, I tried one small change in abi.cpp

     bool passByVal(Type* t)
     {
         // was
         // return t->toBasetype()->ty == Tstruct;
         return false;
     }

 This gets rid of LLVM byval attribute and fixes this codegen 
 problem.  The generated instructions are similar to clang.  The 
 parameter is not an array like clang, it is the IR struct type 
 as a value, but it works.

 "%badopt.A"  instead of "%badopt.A* byval"

 Another thing I noticed about clang is that structs that fit in 
 32-bits are returned directly in r0 instead of an sret 
 parameter.  There may be more, I need to study clang.
We can add an ARM specific abi class. Could you just provide your change as a pull request? (I tried similar things in the past with the PPC backend. But at some point it was easier to fix LLVM. :-) Regards, Kai
Feb 07 2015
parent reply Dan Olson <zans.is.for.cans yahoo.com> writes:
"Kai Nacke" <kai redstar.de> writes:

 Hi Dan!

 We can add an ARM specific abi class. Could you just provide your
 change as a pull request?
 (I tried similar things in the past with the PPC backend. But at some
 point it was easier to fix LLVM. :-)

 Regards,
 Kai
I am working on it. After reading the D ABI http://dlang.org/abi.html which says: Function Calling Conventions The extern (C) and extern (D) calling convention matches the C calling convention used by the supported C compiler on the host system. Should I follow that? I noticed the default abi.cpp TargetABI::callingConv() returns fastcc for D calling convention and instead of ccc. I am wondering if I should cripple LDC for iOS to follow this rule. It seems to me, that when D is calling D, function calls should be as efficient as possible. -- Dan
Feb 08 2015
parent reply "Kai Nacke" <kai redstar.de> writes:
On Sunday, 8 February 2015 at 21:59:02 UTC, Dan Olson wrote:
 I am working on it. After reading the D ABI 
 http://dlang.org/abi.html
 which says:

 Function Calling Conventions
 The extern (C) and extern (D) calling convention matches the C 
 calling
 convention used by the supported C compiler on the host system.

 Should I follow that? I noticed the default abi.cpp
 TargetABI::callingConv() returns fastcc for D calling 
 convention and
 instead of ccc. I am wondering if I should cripple LDC for iOS 
 to follow
 this rule. It seems to me, that when D is calling D, function 
 calls
 should be as efficient as possible.
Hi Dan! I prefer if you follow the D ABI. My impression is that the difference between fastcc and the standard calling convention is not really great, at least on non-x86 systems. Regards, Kai
Feb 09 2015
parent reply "David Nadlinger" <code klickverbot.at> writes:
On Tuesday, 10 February 2015 at 06:07:54 UTC, Kai Nacke wrote:
 I prefer if you follow the D ABI. My impression is that the 
 difference between fastcc and the standard calling convention 
 is not really great, at least on non-x86 systems.
To further put this into context: LDC supported non-x86_32 platforms before the D spec ABI had that "just as C" clause. Walter's position once explicitly was that on non-x86 platforms, other compilers are "free to innovate", but that seems to have changed. And in my opinion, rightfully so. We need C ABI compatibility on all the platforms anyway, and there is little reason to also maintain a second ABI implementation. We can still relax the ABI on the IR transformation level for internal function when we are doing LTO anyway (LLVM does this by default). David
Feb 10 2015
parent Dan Olson <zans.is.for.cans yahoo.com> writes:
"David Nadlinger" <code klickverbot.at> writes:

 On Tuesday, 10 February 2015 at 06:07:54 UTC, Kai Nacke wrote:
 I prefer if you follow the D ABI. My impression is that the
 difference between fastcc and the standard calling convention is not
 really great, at least on non-x86 systems.
To further put this into context: LDC supported non-x86_32 platforms before the D spec ABI had that "just as C" clause. Walter's position once explicitly was that on non-x86 platforms, other compilers are "free to innovate", but that seems to have changed. And in my opinion, rightfully so. We need C ABI compatibility on all the platforms anyway, and there is little reason to also maintain a second ABI implementation. We can still relax the ABI on the IR transformation level for internal function when we are doing LTO anyway (LLVM does this by default). David
So then, I would assume that abi.cpp TargetABI::callingConv() should be changed to return llvm::CallingConv::C for Linkd? https://github.com/ldc-developers/ldc/blob/master/gen/abi.cpp#L52 I would have thought extern(C) is enough to be compatible with native C ABI. We don't expect C or C++ to call extern(D) functions. Or is the reason for extern(D) C ABI compatibilty more to do with being compatible with other tools like debuggers. I now realize that up until this point on iOS, fastcc for D linkage has been used. I did an experiment and changed to ccc for D linkage. For some reason I start failing unittests. I'll have to see why. One peculiarity of fastcc (unlike ccc) is that it passes floating types in FP registers even though I am selecting softfp (i.e. use hardware FPU, but pass float types in GP registers). The assembly code for D math functions looks awesome for fastcc when I compare to ccc. -- Dan
Feb 10 2015
prev sibling parent reply "Kai Nacke" <kai redstar.de> writes:
Hi Dan!

On Thursday, 5 February 2015 at 16:28:37 UTC, Dan Olson wrote:
 The other alternative is to try to duplicate the PPC LLCM fix 
 in my
 forked LLVM repo and submit a llvm bug report.
IMHO you should submit the bug report as LLVM does not work as specified. Regards, Kai
Feb 07 2015
parent Dan Olson <zans.is.for.cans yahoo.com> writes:
"Kai Nacke" <kai redstar.de> writes:

 Hi Dan!

 On Thursday, 5 February 2015 at 16:28:37 UTC, Dan Olson wrote:
 The other alternative is to try to duplicate the PPC LLCM fix in my
 forked LLVM repo and submit a llvm bug report.
IMHO you should submit the bug report as LLVM does not work as specified. Regards, Kai
True, I'll do that.
Feb 08 2015