digitalmars.D.ldc - Android/ARM codegen

"Joakim" <dlang joakim.fea.st> writes:

Alright, I've been stepping through some failing tests.  One case 
that looks like bad codegen: many calls to 
std.algorithm.iteration.map fail, for example when running the 
tests for std.zip.  One of the std.zip tests calls 
std.random.uniform, which then gets its second parameter stomped 
by map from rndGen().  I've tracked it down to this function in 
the llvm IR:

define weak_odr void @_D3std9algorithm9iteration47__T3mapS363std6random6rndGenFNcNdNfZ9__lambda4Z42__T3mapTS3std5range13__T6RepeatTiZ6RepeatZ3mapMFNaNbNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult(%"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"* noalias nocapture sret %.sret_arg,
i8* %.nest_arg, %"std.range.Repeat!int.Repeat"* byval nocapture readonly %r_arg)
{
   %.structliteral = alloca %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult", align 4 ; [#uses = 3 type = %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"*]
   %1 = getelementptr inbounds %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"* %.structliteral, i32 0, i32 0, i32 0 ; [#uses = 1 type = i32*]
   store i32 0, i32* %1, align 4
   %2 = getelementptr %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"* %.structliteral, i32 0, i32 1 ; [#uses = 1 type = i8**]
   store i8* %.nest_arg, i8** %2, align 4
   %tmp = call %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"* @_D3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult6__ctorMFNaNbNcNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult(%"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"* returned %.structliteral,
%"std.range.Repeat!int.Repeat"* byval %r_arg) ; [#uses = 1 type = %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"*]
   %3 = bitcast %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"* %tmp to i64* ; [#uses = 1 type = i64*]
   %4 = bitcast %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"* %.sret_arg to i64* ; [#uses = 1 type = i64*]
   %5 = load i64* %3, align 1 ; [#uses = 1 type = i64]
   store i64 %5, i64* %4, align 4
   ret void
}

which gets translated to the following ARM assembly:

_D3std9algorithm9iteration47__T3mapS363std6random6rndGenFNcNdNfZ9__lambda4Z42__T3mapTS3std5range13__T6RepeatTiZ6RepeatZ3mapMFNaNbNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult:
         .fnstart
.Leh_func_begin94:


         .save   {r4, lr}
         push    {r4, lr}
         sub     sp, sp, #8

         mov     r4, r0


         stmib   sp, {r0, r1}


         bl      
_D3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult6__ctorMFNaNbNcNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult(PLT)
         ldmib   sp, {r0, r1}
         stm     r4, {r0, r1}

         pop     {r4, lr}

         bx      lr

The problem appears to be that even though r4 is saved at the 
beginning of the function, it is overwritten by r1 in the stmib 
instruction afterwards.  Obviously there's no point in pushing r4 
and popping it at the end of the function, if you've lost it by 
overwriting in between.  Specifically, "sub sp, sp, #8" decrements 
the stack pointer by 8, then stmib increments the address by 4 
_before_ storing r0 then r1, overwriting the contents of r4 saved 
at the beginning of the function.  Other calls to map also seem to 
fail in other instances of the same template function, but 
interestingly in different ways, i.e. no uses of stmib there.  I 
haven't tracked down exactly why those other instances fail.
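
To make the clobbering concrete, here is a minimal C++ sketch that models the stack as an array of words and replays the prologue: push {r4, lr}, an 8-byte stack adjustment, then stmib sp, {r0, r1}.  The Machine type, the register values, and savedR4Survives are invented for illustration; this is a toy model of the addressing only, not output from any tool.

```cpp
#include <array>
#include <cstdint>

// Toy model of a full-descending ARM stack. Addresses are byte offsets
// into a small word-addressed memory. All names here are illustrative.
struct Machine {
    std::array<std::uint32_t, 16> mem{};  // 64 bytes of stack memory
    std::uint32_t sp = 64;                // stack starts at the top

    std::uint32_t& word(std::uint32_t addr) { return mem.at(addr / 4); }

    // push {a, b}: the lower-numbered register lands at the lower address.
    void push2(std::uint32_t lo, std::uint32_t hi) {
        sp -= 8;
        word(sp) = lo;
        word(sp + 4) = hi;
    }

    // stmib base, {a, b}: increment before each store, no base writeback.
    void stmib2(std::uint32_t base, std::uint32_t a, std::uint32_t b) {
        word(base + 4) = a;
        word(base + 8) = b;
    }
};

// Replays the prologue and reports whether the saved r4 slot is intact.
bool savedR4Survives() {
    const std::uint32_t r0 = 0x1000, r1 = 0x2000, r4 = 0xAAAA, lr = 0xBBBB;
    Machine m;
    m.push2(r4, lr);                      // push {r4, lr}
    const std::uint32_t r4Slot = m.sp;    // where r4 was saved
    m.sp -= 8;                            // sub sp, sp, #8
    m.stmib2(m.sp, r0, r1);               // stmib sp, {r0, r1}
    return m.word(r4Slot) == r4;
}
```

In this model savedR4Survives() returns false: stmib writes to sp+4 and sp+8, and sp+8 is exactly the slot where push stored r4.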

I'm not sure how to reduce this further and if I need to file a 
bug for llvm: any pointers?

Jul 16 2015

Dan Olson <gorox comcast.net> writes:

"Joakim" <dlang joakim.fea.st> writes:
 The problem appears to be that even though r4 is saved at the
 beginning of the function, it is overwritten by r1 in the stmib
 instruction afterwards.  Obviously there's no point in pushing r4 and
 popping it at the end of the function, if you've lost it by
 overwriting in between.  Specifically, "sub sp, sp, #8" decrements the
 stack pointer by 8, then stmib increments it 4 _before_ storing r0
 then r1, overwriting the contents of r4 saved at the beginning of the
 function.  Other calls to map also seem to fail in other instances of
 the same template function, but interestingly in different ways, ie no
 uses of stmib there.  I haven't tracked down exactly why those other
 instances fail.

That doesn't look right.

What is your triple and other options (optimization), llvm version your
patched src is based on, and ldc branch (merge-2.067 branch I think)?
I'd like to compare the asm snippet with what I get for
thumbv7-apple-ios, because it should be similar, but it passes std.zip.

It could be I just haven't stumbled into the combination you are using.

-- 
Dan

Jul 16 2015

"Joakim" <dlang joakim.fea.st> writes:

On Thursday, 16 July 2015 at 16:40:31 UTC, Dan Olson wrote:
 "Joakim" <dlang joakim.fea.st> writes:
 The problem appears to be that even though r4 is saved at the 
 beginning of the function, it is overwritten by r1 in the 
 stmib instruction afterwards.  Obviously there's no point in 
 pushing r4 and popping it at the end of the function, if 
 you've lost it by overwriting in between.  Specifically, "sub 
 sp, sp, #8" decrements the stack pointer by 8, then stmib 
 increments it 4 _before_ storing r0 then r1, overwriting the 
 contents of r4 saved at the beginning of the function.  Other 
 calls to map also seem to fail in other instances of the same 
 template function, but interestingly in different ways, ie no 
 uses of stmib there.  I haven't tracked down exactly why those 
 other instances fail.

 That doesn't look right.

 What is your triple and other options (optimization), llvm 
 version your patched src is based on, and ldc branch 
 (merge-2.067 branch I think)? I'd like to compare the asm 
 snippet with what I get for thumbv7-apple-ios, because it 
 should be similar, but it passes std.zip.

 It could be I just haven't stumbled into the combination you 
 are using.

llvm triple and other flags used:

--output-o -w -d -mtriple=armv7-none-linux-androideabi 
-relocation-model=pic -O3 -release -unittest

ldc was compiled against a locally-compiled llvm 3.6 from the 
Android NDK repo, which has some modifications:

https://android.googlesource.com/toolchain/llvm/+log/release_36

Yes, I'm using the merge-2.067 branch of ldc as of commit 
122ea372d from a couple weeks ago, with inlining turned off, as 
noted in the earlier EH thread.  I would be curious to see what 
llvm IR and ARM assembly is generated for you on iOS and what 
other flags you're using.  The actual function shown is from the 
std.random module.

Jul 16 2015

"Dan Olson" <zans4cans yahoo.com> writes:

On Thursday, 16 July 2015 at 17:07:33 UTC, Joakim wrote:
 On Thursday, 16 July 2015 at 16:40:31 UTC, Dan Olson wrote:
 [...]

 llvm triple and other flags used:

 --output-o -w -d -mtriple=armv7-none-linux-androideabi 
 -relocation-model=pic -O3 -release -unittest

 ldc was compiled against a locally-compiled llvm 3.6 from the 
 Android NDK repo, which has some modifications:

 https://android.googlesource.com/toolchain/llvm/+log/release_36

 Yes, I'm using the merge-2.067 branch of ldc as of commit 
 122ea372d from a couple weeks ago, with inlining turned off, as 
 noted in the earlier EH thread.  I would be curious to see what 
 llvm IR and ARM assembly is generated for you on iOS and what 
 other flags you're using.  The actual function shown is from 
 the std.random module.

And the nice thing about LLVM / LDC is that I can specify your 
triple and other options with one compiler.  I love that about 
LLVM: all cross compilers in one.

Jul 16 2015

Dan Olson <gorox comcast.net> writes:

"Joakim" <dlang joakim.fea.st> writes:

 Alright, I've been stepping through some failing tests: one that seems
 to be bad codegen is that many calls to std.algorithm.iteration.map
 seem to fail, for example, when running the tests for std.zip.  One of
 the std.zip tests calls std.random.uniform, which then gets its second
 parameter stomped by map from rndGen().

Joakim, I have a hunch.  Can you try changing your abi-android.cpp so
passByVal() returns false always?  That is the only difference in our
generated IR and in my experience byval causes incorrect code on ARM.

As an experiment, I changed my passByVal() back to return true for
Tstruct (the LDC default), and then my IR is identical to yours and my
ARM code, though slightly different, also is clobbering a saved reg on
the stack.
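
As a rough illustration of that experiment, the sketch below shows the general shape of such an ABI hook in C++.  Every type here (Type, TypeKind, TargetABISketch and the two subclasses) is a hypothetical stand-in written for this example; LDC's actual abi-android.cpp and its TargetABI interface are not reproduced and may look quite different.

```cpp
// Hypothetical stand-ins for a compiler's type model; LDC's real
// classes and its TargetABI interface differ from this sketch.
enum class TypeKind { Int, Pointer, Struct, StaticArray };

struct Type {
    TypeKind kind;
};

struct TargetABISketch {
    virtual ~TargetABISketch() = default;
    // True means: mark the parameter with LLVM's byval attribute.
    virtual bool passByVal(const Type& t) const = 0;
};

// Mirrors the default described in the thread: byval for structs.
struct DefaultABISketch : TargetABISketch {
    bool passByVal(const Type& t) const override {
        return t.kind == TypeKind::Struct;
    }
};

// The suggested experiment: never request byval, whatever the type.
struct AndroidABISketch : TargetABISketch {
    bool passByVal(const Type&) const override { return false; }
};
```

With the second variant no parameter is ever tagged byval, so the backend path that produced the bad stmib sequence should not be reached for these struct arguments.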

Anyway, I pasted my GOOD assembly below yours for comparison.

 which gets translated to the following ARM assembly:

 _D3std9algorithm9iteration47__T3mapS363std6random6rndGenFNcNdNfZ9__lambda4Z42__T3mapTS3std5range13__T6RepeatTiZ6RepeatZ3mapMFNaNbNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult:
         .fnstart
 .Leh_func_begin94:


         .save   {r4, lr}
         push    {r4, lr}
         sub     sp, sp, #8

         mov     r4, r0


         stmib   sp, {r0, r1}


         bl
 _D3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult6__ctorMFNaNbNcNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult(PLT)
         ldmib   sp, {r0, r1}
         stm     r4, {r0, r1}

         pop     {r4, lr}

         bx      lr

Better assembly produced with -output-s -w -d -mtriple=armv7-apple-ios
-relocation-model=pic -O3 -release -unittest -disable-inlining (needed
to match up with your change).  It is similar but uses stm (i.e. stmia)
instead of stmib.

__D3std9algorithm9iteration47__T3mapS363std6random6rndGenFNcNdNfZ9__lambda4Z42__T3mapTS3std5range13__T6RepeatTiZ6RepeatZ3mapMFNaNbNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult:
	push	{r4, r7, lr}


	mov	r4, r0

	stm	sp, {r0, r1}
	mov	r0, sp
	mov	r1, r2
	bl	__D3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult6__ctorMFNaNbNcNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult
	ldm	sp, {r0, r1}
	strd	r0, r1, [r4]

	pop	{r4, r7, pc}

I wanted to use your triple with the OS set to android, but std/random.d
needs Android-specific D code from elsewhere, so the compile failed.

As an aside, with inlining enabled, that function shrinks to 3 instructions:

__D3std9algorithm9iteration47__T3mapS363std6random6rndGenFNcNdNfZ9__lambda4Z42__T3mapTS3std5range13__T6RepeatTiZ6RepeatZ3mapMFNaNbNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult:
	mov	r3, r1
	strd	r2, r3, [r0]
	bx	lr

Hope it helps

Jul 17 2015

"Joakim" <dlang joakim.fea.st> writes:

On Friday, 17 July 2015 at 08:58:30 UTC, Dan Olson wrote:
 "Joakim" <dlang joakim.fea.st> writes:

 Alright, I've been stepping through some failing tests: one 
 that seems to be bad codegen is that many calls to 
 std.algorithm.iteration.map seem to fail, for example, when 
 running the tests for std.zip.  One of the std.zip tests calls 
 std.random.uniform, which then gets its second parameter 
 stomped by map from rndGen().

 Joakim, I have a hunch.  Can you try changing your 
 abi-android.cpp so passByVal() returns false always?  That is 
 the only difference in our generated IR and in my experience 
 byval causes incorrect code on ARM.

 As an experiment, I changed my passByVal() back to return true 
 for Tstruct (the LDC default), and then my IR is identical to 
 yours and my ARM code, though slightly different, also is 
 clobbering a saved reg on the stack.

Yep, that fixed it, std.zip passes its tests now. :) I'll run the 
rest of the tests and report back, thanks for your help.

Jul 17 2015

Dan Olson <gorox comcast.net> writes:

"Joakim" <dlang joakim.fea.st> writes:
 Yep, that fixed it, std.zip passes its tests now. :) I'll run the rest
 of the tests and report back, thanks for your help.

:-)

Jul 17 2015

"Joakim" <dlang joakim.fea.st> writes:

On Friday, 17 July 2015 at 16:33:07 UTC, Dan Olson wrote:
 "Joakim" <dlang joakim.fea.st> writes:
 Yep, that fixed it, std.zip passes its tests now. :) I'll run 
 the rest of the tests and report back, thanks for your help.

 :-)

Around 30 more modules from std.phobos now pass their tests with 
that change, most of which used to segfault somewhere in a test 
before.  core.time from druntime also passes all its tests now.  
That one change made a big difference, thanks for pointing it out.

Now to fix the rest.  Many are related to the real type, which I 
haven't patched for 64-bit yet.

Jul 17 2015

"Joakim" <dlang joakim.fea.st> writes:

On Friday, 17 July 2015 at 17:57:01 UTC, Joakim wrote:
 On Friday, 17 July 2015 at 16:33:07 UTC, Dan Olson wrote:
 "Joakim" <dlang joakim.fea.st> writes:
 Yep, that fixed it, std.zip passes its tests now. :) I'll run 
 the rest of the tests and report back, thanks for your help.

 :-)

 Around 30 more modules from std.phobos now pass their tests 
 with that change, most of which used to segfault somewhere in a 
 test before.  core.time from druntime also passes all its tests 
 now.  That one change made a big difference, thanks for 
 pointing it out.

 Now to fix the rest.  Many are related to the real type, which 
 I haven't patched for 64-bit yet.

Dan, as I said in the main forum, most of the druntime/phobos 
modules' tests pass on Android/ARM now.  However, I had to turn 
off optimizations for a handful of modules, have you had to do 
the same?  As noted before, one optimization pass was screwing up 
ldc.eh.  I also had to turn off all optimizations, ie -O0, for 
std.random and std.stream to get their unit tests to pass.  For 
one phobos module, std.regex, turning off all optimizations for 
druntime's core.memory got the regex tests to pass.

Other than those four modules, everything is compiled with -O3 
and seems to work, except for the two modules that still 
segfault, std.net.isemail and std.regex.internal.tests, where 
compiling those modules with -O0 doesn't make a difference.  I 
haven't spent any time tracking down if other optimized modules 
might be causing those two to segfault, as was the case with 
std.regex and core.memory, or exactly which llvm optimizations 
are causing problems with core.memory, std.random, and std.stream.

Are you seeing similar results with your 2.067 branch of ldc with 
iOS?  Since ARM codegen should be similar for the two, I wonder 
if I'm the only one seeing this.

Aug 03 2015

Dan Olson <gorox comcast.net> writes:

"Joakim" <dlang joakim.fea.st> writes:

 On Friday, 17 July 2015 at 17:57:01 UTC, Joakim wrote:
 On Friday, 17 July 2015 at 16:33:07 UTC, Dan Olson wrote:
 "Joakim" <dlang joakim.fea.st> writes:


 Dan, as I said in the main forum, most of the druntime/phobos modules'
 tests pass on Android/ARM now.  However, I had to turn off
 optimizations for a handful of modules, have you had to do the same?
 As noted before, one optimization pass was screwing up ldc.eh.  I also
 had to turn off all optimizations, ie -O0, for std.random and
 std.stream to get their unit tests to pass.  For one phobos module,
 std.regex, turning off all optimizations for druntime's core.memory
 got the regex tests to pass.

 Other than those four modules, everything is compiled with -O3 and
 seems to work, except for the two modules that still segfault,
 std.net.isemail and std.regex.internal.tests, where compiling those
 modules with -O0 doesn't make a difference.  I haven't spent any time
 tracking down if other optimized modules might be causing those two to
 segfault, as was the case with std.regex and core.memory, or exactly
 which llvm optimizations are causing problems with core.memory,
 std.random, and std.stream.

 Are you seeing similar results with your 2.067 branch of ldc with iOS?
 Since ARM codegen should be similar for the two, I wonder if I'm the
 only one seeing this.

Hi Joakim - I have -O3 optimization on for all modules in the release
build and I think I have tested with most of the other -O levels.  I did
run into an alignment error with neon instructions in std.random
unittest with LLVM 3.5.1 and LDC 0.15.1 (2.066) and eventually disabled neon
(-mattr=-neon) during optimization as a workaround.  I have not tried
reenabling neon for merge-2.067 and LLVM 3.6.

The problem was a vst1.64 instruction requesting 128-bit alignment when
the data was only 64-bit aligned:

0x52197a:  vst1.64 {d16, d17}, [r5:128]  // r5 addr not properly aligned

If it can happen in std.random it could happen elsewhere.
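
The mismatch above comes down to simple address arithmetic.  The C++ sketch below assumes the :128 qualifier means the address must be a multiple of 16 bytes; isAligned, bitsToBytes and the sample address 0x1008 are made up for illustration and are not taken from the actual crash.

```cpp
#include <cstdint>

// True when addr is a multiple of alignBytes (alignBytes must be a power of two).
constexpr bool isAligned(std::uintptr_t addr, std::uintptr_t alignBytes) {
    return (addr & (alignBytes - 1)) == 0;
}

// A :128 qualifier on a NEON load/store is given in bits: 128 bits is 16 bytes.
constexpr std::uintptr_t bitsToBytes(std::uintptr_t bits) { return bits / 8; }

// Illustrative address: 8-byte aligned but not 16-byte aligned.
constexpr std::uintptr_t exampleAddr = 0x1008;

static_assert(isAligned(exampleAddr, bitsToBytes(64)), "64-bit aligned");
static_assert(!isAligned(exampleAddr, bitsToBytes(128)), "not 128-bit aligned");
```

An address like this is fine for a 64-bit aligned store but does not satisfy a [r5:128] operand, which is the situation described for the vst1.64 above.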

You might try -mattr=-neon and see what happens.

One other difference to think about is that Android is AAPCS and iOS is
a variant of the older APCS.  LLVM has some different paths for these.
I studied it some a few weeks ago as I made the extern(C) ABI compatible
with clang.
-- 
Dan

Aug 03 2015
