D.gnu - ARM Cortex-M Broken Binaries with -O2 and -O3 (-fschedule-insns)

Mike <none none.com> writes:
My stm32 demo has now been updated and is working with GDC/GCC 
7.1.0.  Thanks for all your improvements.

However, I'm getting broken binaries with -O2 and -O3.  I've 
nailed the culprit down to -fschedule-insns (i.e. if I add 
-fno-schedule-insns to -O2 or -O3, the binary works fine).

I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared 
them, but they were quite different all the way through.  Not only 
because of address locations, but also different registers and 
even different opcodes.  (e.g. 'str r2, [sp, #12]' vs 'strd r1, 
r2, [sp, #8]')

Is there anything I can do to provide more actionable information 
to help identify the underlying cause?

Mike
Jul 21
Mike <none none.com> writes:
On Friday, 21 July 2017 at 23:44:53 UTC, Mike wrote:

 I'm getting broken binaries with -O2 and -O3.  I've nailed the 
 culprit down to -fschedule-insns (i.e. if I add 
 -fno-schedule-insns to -O2 or -O3, the binary works fine).

 I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared 
 them, but they were quite different all the way through.  Not 
 only because of address locations, but also different registers 
 and even different opcodes.  (e.g. 'str r2, [sp, #12]' vs 'strd 
 r1, r2, [sp, #8]')

Interestingly, I added a strategically placed `asm { "nop"; }` and my binary was able to execute further. Comparing the disassembly of the function I modified still showed quite a significant difference.

Working Binary
--------------
ldr   r2, [pc, #188]  ; (8000c50 <hardwareInit+0x104>)
ldr   r1, [pc, #188]  ; (8000c54 <hardwareInit+0x108>)
ldr   r3, [r2, #0]
and.w r3, r3, #780    ; 0x30c
orr.w r3, r3, #37888  ; 0x9400
movs  r0, #0
str   r3, [r2, #0]
strb  r0, [r1, #0]
;-------------------------------------------------------
nop                   ; My strategically placed nop
;-------------------------------------------------------
ldr   r3, [pc, #172]  ; (8000c58 <hardwareInit+0x10c>)
ldr   r0, [pc, #176]  ; (8000c5c <hardwareInit+0x110>)
ldr   r4, [pc, #176]  ; (8000c60 <hardwareInit+0x114>)
ldr   r2, [pc, #180]  ; (8000c64 <hardwareInit+0x118>)
movs  r1, #1
strb  r1, [r3, #0]
ldr   r3, [r0, #0]
orr.w r3, r3, #49152  ; 0xc000
str   r3, [r0, #0]
strb  r1, [r4, #0]

Not Working Binary
------------------
ldr   r0, [pc, #184]  ; (8000c4c <hardwareInit+0x100>)
ldr   r1, [pc, #184]  ; (8000c50 <hardwareInit+0x104>)
ldr   r2, [pc, #188]  ; (8000c54 <hardwareInit+0x108>)
ldr   r3, [r1, #0]
ldr   r4, [pc, #188]  ; (8000c58 <hardwareInit+0x10c>)
movs  r5, #0
strb  r5, [r0, #0]
movs  r0, #1
strb  r0, [r2, #0]
ldr   r2, [r4, #0]
ldr   r5, [pc, #180]  ; (8000c5c <hardwareInit+0x110>)
orr.w r2, r2, #49152  ; 0xc000
and.w r3, r3, #780    ; 0x30c
str   r2, [r4, #0]
orr.w r3, r3, #37888  ; 0x9400
ldr   r2, [pc, #168]  ; (8000c60 <hardwareInit+0x114>)
strb  r0, [r5, #0]
str   r3, [r1, #0]

By "Not Working" I mean this code gets stuck in the while loop

    PWR.CR.ODEN.value = true;
    while(!PWR.CSR.ODRDY.value) { }

This is simply setting the "Overdrive Enable" register on the power control peripheral of my hardware. The documentation states:

    To set or reset the ODEN bit, the HSI or HSE must be selected as system clock.

I'm setting the HSI prior to setting ODEN, but it appears that maybe the compiler is reordering the instructions.

I still need to investigate that further, but hopefully that provides a little more insight.

Mike
Jul 21
Timo Sintonen <t.sintonen luukku.com> writes:
On Saturday, 22 July 2017 at 01:11:02 UTC, Mike wrote:
 [...]
 By "Not Working" I mean this code gets stuck in the while loop

     PWR.CR.ODEN.value = true;
     while(!PWR.CSR.ODRDY.value) { }
 [...]

A quick answer without looking at your code: this never worked properly because the compiler thinks the value is not changing, so the read may be moved out of the loop. After this the test inside the while may also be moved outside. Then the whole while statement, which now has nothing to test and nothing inside the body, can be optimized out.

I got this working because 'shared' meant 'volatile' in gdc, but this is not true any more. I have not yet looked at how you define your data type and how you access the data, but it seems the compiler thinks it is an ordinary variable.

I made a custom Volatile data type that uses the new compiler intrinsics, and my sample program seems to work with gdc 7. The only thing that does not work is exceptions. The exception code in the runtime has changed a lot, so I do not know whether it should work or not.
Jul 22
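
The loop hoisting described above, and the idea of a custom Volatile type, can be sketched roughly as follows. This is not Timo's actual type: `Volatile`, `load`, `store` and `waitForBits` are hypothetical names chosen for illustration, and an ordinary variable stands in for the hardware register so the sketch runs on a host. At the time of this thread the intrinsics lived in core.bitop; the sketch assumes a newer druntime, where they are provided by core.volatile.

```d
// Assumption: newer druntime. With GDC 7 the import would be core.bitop instead.
import core.volatile : volatileLoad, volatileStore;

/// Hypothetical wrapper: every read and write goes through the volatile
/// intrinsics, so the compiler may not cache or drop the access.
struct Volatile(T)
{
    private T raw;

    T load()
    {
        return volatileLoad(&raw);
    }

    void store(T v)
    {
        volatileStore(&raw, v);
    }
}

/// Polls `reg` until any bit in `mask` is set, giving up after `maxSpins` reads.
bool waitForBits(ref Volatile!uint reg, uint mask, uint maxSpins)
{
    foreach (_; 0 .. maxSpins)
    {
        if (reg.load() & mask)
            return true;
    }
    return false;
}

void main()
{
    Volatile!uint status;        // stands in for a status register
    status.store(1u << 16);
    assert(waitForBits(status, 1u << 16, 10));
    assert(!waitForBits(status, 1u << 17, 10));
}
```

Because each iteration performs a fresh volatile read, the loop condition cannot be treated as loop-invariant, which is the failure mode described in the post above.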
Johannes Pfau <nospam example.com> writes:
On Sat, 22 Jul 2017 07:07:33 +0000,
Timo Sintonen <t.sintonen luukku.com> wrote:

 A quick answer without looking at your code: this never worked properly because the compiler thinks the value is not changing and may be optimized out of the loop. [...]

There's a small thinko here ;-) In Mike's code, value is a property using volatileLoad/volatileStore internally. So the real problem is likely more complicated.

-- Johannes
Jul 22
"Iain Buclaw via D.gnu" <d.gnu puremagic.com> writes:
On 22 July 2017 at 10:09, Johannes Pfau via D.gnu <d.gnu puremagic.com> wrote:
 A quick answer without looking at your code: this never worked properly because the compiler thinks the value is not changing and may be optimized out of the loop. [...]

 There's a small thinko here ;-) In Mike's code, value is a property using volatileLoad/volatileStore internally. So the real problem is likely more complicated.

While I'm confident that the current implementation of volatileLoad/volatileStore should prevent such reordering, a memory barrier could also be inserted before each generated volatileLoad/volatileStore to really hammer it home to the gcc optimizer.

Iain.
Jul 22
Mike <none none.com> writes:
On Friday, 21 July 2017 at 23:44:53 UTC, Mike wrote:

 However, I'm getting broken binaries with -O2 and -O3.  I've 
 nailed the culprit down to -fschedule-insns (i.e. if I add 
 -fno-schedule-insns to -O2 or -O3, the binary works fine).

I've confirmed that -fschedule-insns is reordering register accesses even though they are being made with volatileLoad/volatileStore. Read the comments in the following code for understanding. FYI, a bit-banded address is a 32-bit address to a single bit.

Here's the D code
-----------------
// This is a single atomic store to bit-banded address 0x42470048
RCC.CR.HSEBYP.value = false;

// This is a single read-modify-write to non-bit-banded 32-bit address 0x40023808
with(RCC.CFGR)
{
    setValue
    !(
          MCO2,     0
        , MCO2PRE,  0
        , MCO1PRE,  0
        , I2SSRC,   0
        , MCO1,     0
        , RTCPRE,   0
        , HPRE,     0b000
        , PPRE2,    0b100
        , PPRE1,    0b101
        , SW,       0
    )();
}

And here's the disassembly
--------------------------
8000b92:  ldr   r2, [pc, #188]  ; (8000c50 <hardwareInit+0x104>)  0x40023808 - RCC.CFGR
8000b94:  ldr   r1, [pc, #188]  ; (8000c54 <hardwareInit+0x108>)  0x42470048 - RCC.CR.HSEBYP

; Read-modify of RCC.CFGR
8000b96:  ldr   r3, [r2, #0]
8000b98:  and.w r3, r3, #780    ; 0x30c
8000b9c:  orr.w r3, r3, #37888  ; 0x9400

8000ba0:  movs  r0, #0          ; #0 is `false` value for RCC.CR.HSEBYP

8000ba2:  str   r3, [r2, #0]    ; This is the store to RCC.CFGR
8000ba4:  strb  r0, [r1, #0]    ; This is the store to RCC.CR.HSEBYP
...
8000c50:  .word 0x40023808
8000c54:  .word 0x42470048

You can see that at 8000ba2 and 8000ba4 RCC.CFGR is written first. But in the D code RCC.CR.HSEBYP should be written first.

Mike
Jul 22
Johannes Pfau <nospam example.com> writes:
On Sat, 22 Jul 2017 08:11:28 +0000,
Mike <none none.com> wrote:

 I've confirmed that -fschedule-insns is reordering register access even though they are being accessed with volatileLoad/Store. [...]

I guess this doesn't happen for a reduced example directly using volatileLoad/volatileStore? If you could provide such a reduced example that'd be very useful.

-- Johannes
Jul 22
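
A reduced example of the kind requested above might look something like the following sketch. It is not taken from Mike's repository: `configure` is a hypothetical name, the mask constants are the ones visible in the posted disassembly (0x30c and 0x9400), and the pointers are passed in so the same function can be exercised on a host with ordinary variables. The import assumes a newer druntime (core.volatile); with GDC 7 the intrinsics were in core.bitop.

```d
// Assumption: newer druntime. With GDC 7 the import would be core.bitop instead.
import core.volatile : volatileLoad, volatileStore;

/// Mirrors the access pattern from the post: a byte store to one location,
/// followed by a read-modify-write of a second, 32-bit location.
void configure(ubyte* hsebyp, uint* cfgr)
{
    volatileStore(hsebyp, cast(ubyte) 0);   // corresponds to HSEBYP = false

    uint v = volatileLoad(cfgr);            // read
    v = (v & 0x30C) | 0x9400;               // modify (constants from the disassembly)
    volatileStore(cfgr, v);                 // write
}

void main()
{
    // Ordinary variables stand in for the two registers on a host machine.
    ubyte fakeBit = 1;
    uint fakeCfgr = 0xFFFF_FFFF;

    configure(&fakeBit, &fakeCfgr);

    assert(fakeBit == 0);
    assert(fakeCfgr == 0x970C);
}
```

On the target, the call would presumably be `configure(cast(ubyte*) 0x42470048, cast(uint*) 0x40023808)`, using the addresses from the post. Compiling such a file with -O2, with and without -fno-schedule-insns, and comparing the order of the two stores in the disassembly would show whether the intrinsics alone are affected.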
Johannes Pfau <nospam example.com> writes:
On Fri, 21 Jul 2017 23:44:53 +0000,
Mike <none none.com> wrote:

 I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared 
 them, but they were quite different all the way through.  Not only 
 because of address locations, but also different registers and 
 even different opcodes.  (e.g. 'str r2, [sp, #12]' vs 'strd r1, 
 r2, [sp, #8]')

This can unfortunately happen if scheduling allows further optimizations. Then the generated code might look nothing like the original code. It's also possible that this is only caused by a combination of optimization passes. I guess it doesn't happen using -fschedule-insns without other -O flags?
 
 Is there anything I can do to provide more actionable information 
 to help identify the underlying cause?

As I don't have an ARM bare metal compiler ready to test: the output of -fdump-tree-all and -fdump-rtl-all would be useful. The tree output is usually quite readable, rtl not so much...

There might be some more useful switches on https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html

-- Johannes
Jul 22
"Iain Buclaw via D.gnu" <d.gnu puremagic.com> writes:
On 22 July 2017 at 01:44, Mike via D.gnu <d.gnu puremagic.com> wrote:
 Is there anything I can do to provide more actionable information to help
 identify the underlying cause?

Hi Mike,

Is the stm discovery repository up to date on Github?

https://github.com/JinShil/stm32f42_discovery_demo/search?utf8=%E2%9C%93&q=cast%28shared&type=

Those should probably be volatileLoad, as they look to be used by setValue().

Iain.
Jul 22
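
The difference Iain points at can be illustrated with a small read-modify-write helper. This is only a sketch and not the repository's setValue(): `modifyRegister` is a hypothetical name, and an ordinary variable stands in for the register. The import assumes a newer druntime (core.volatile); with GDC 7 the intrinsics were in core.bitop.

```d
// Assumption: newer druntime. With GDC 7 the import would be core.bitop instead.
import core.volatile : volatileLoad, volatileStore;

/// Hypothetical helper: replaces the bits selected by `mask` with `bits`,
/// using exactly one volatile read and one volatile write.
void modifyRegister(uint* reg, uint mask, uint bits)
{
    // A read written as `*cast(shared uint*) reg` is, per the thread, no
    // longer treated as volatile by gdc; volatileLoad is used instead.
    immutable current = volatileLoad(reg);
    volatileStore(reg, (current & ~mask) | (bits & mask));
}

void main()
{
    uint fakeReg = 0x0000_00F0;   // stands in for a 32-bit register

    modifyRegister(&fakeReg, 0x0000_000F, 0x0000_0005);
    assert(fakeReg == 0xF5);

    modifyRegister(&fakeReg, 0x0000_00F0, 0);
    assert(fakeReg == 0x05);
}
```

With both halves of the read-modify-write expressed as volatile operations, the compiler has to keep them ordered relative to other volatile accesses, such as the preceding bit-band store in the example from earlier in the thread.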
Mike <none none.com> writes:
On Saturday, 22 July 2017 at 09:07:31 UTC, Iain Buclaw wrote:

 https://github.com/JinShil/stm32f42_discovery_demo/search?utf8=%E2%9C%93&q=cast%28shared&type=

 Those should probably be volatileLoad, as they look to be used 
 by setValue().

I am such an idiot. Problem solved. Thank you, and I'm terribly sorry for the noise.

Mike
Jul 22
"Iain Buclaw via D.gnu" <d.gnu puremagic.com> writes:
On 22 July 2017 at 11:22, Mike via D.gnu <d.gnu puremagic.com> wrote:
 I am such an idiot. Problem solved. Thank you, and I'm terribly sorry for the noise.

Fantastic, and don't worry about it. It's always welcome to have more people raising issues with gdc's codegen.

Iain.
Jul 22