www.digitalmars.com         C & C++   DMDScript  

D.gnu - ARM Cortex-M Broken Binaries with -O2 and -O3 (-fschedule-insns)

Mike <none none.com> writes:
My stm32 demo has now been updated and working with GDC/GCC 
7.1.0.  Thanks for all your improvements.

However, I'm getting broken binaries with -O2 and -O3.  I've 
nailed the culprit down to -fschedule-insns (i.e. if I add 
-fno-schedule-insns to -O2 or -O3, the binary works fine).

I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared 
them, but they were quite different all the way through.  Not only 
because of address locations, but also different registers and 
even different opcodes.

Is there anything I can do to provide more actionable information 
to help identify the underlying cause?

Mike
Jul 21 2017
Mike <none none.com> writes:
On Friday, 21 July 2017 at 23:44:53 UTC, Mike wrote:

 I'm getting broken binaries with -O2 and -O3.  I've nailed the 
 culprit down to -fschedule-insns (i.e. if I add 
 -fno-schedule-insns to -O2 or -O3, the binary works fine).

 I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared 
 them, but they were quite different all the way through.  Not 
 only because of address locations, but also different registers 
 and even different opcodes.

Interestingly, I added a strategically placed `asm { "nop"; }` and my 
binary was able to execute further.  Comparing the disassembly of the 
function I modified still showed quite a significant difference.

Working Binary
--------------
;-------------------------------------------------------
nop  ; My strategically placed nop
;-------------------------------------------------------

Not Working Binary
------------------

By "Not Working" I mean this code gets stuck in the while loop:

    PWR.CR.ODEN.value = true;
    while(!PWR.CSR.ODRDY.value) { }

This is simply setting the "Overdrive Enable" register on the power 
control peripheral of my hardware.  The documentation states:

    To set or reset the ODEN bit, the HSI or HSE must be selected as 
    system clock.

I'm setting the HSI prior to setting ODEN, but it appears that maybe 
the compiler is reordering the instructions.  I still need to 
investigate that further, but hopefully that provides a little more 
insight.

Mike
Jul 21 2017
Timo Sintonen <t.sintonen luukku.com> writes:
On Saturday, 22 July 2017 at 01:11:02 UTC, Mike wrote:
 By "Not Working" I mean this code gets stuck in the while loop

     PWR.CR.ODEN.value = true;
     while(!PWR.CSR.ODRDY.value) { }

A quick answer without looking at your code: this never worked 
properly, because the compiler thinks the value is not changing, so 
the read may be optimized out of the loop.  After this, the test 
inside the while may also be moved outside.  Then the whole while 
statement, which now has nothing to test and nothing inside the 
body, can be optimized out.

I got this working because 'shared' meant 'volatile' in gdc, but 
this is not true any more.  I did not yet look at how you define 
your data type and how you access the data, but it seems the 
compiler thinks it is an ordinary variable.

I made a custom Volatile data type that uses the new compiler 
intrinsics, and my sample program seems to work with gdc 7.  The 
only thing that does not work is exceptions.  The exception code in 
the runtime has changed a lot, so I do not know whether it should 
work or not.
Jul 22 2017
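
A minimal sketch of the kind of wrapper type described in the post above. The struct name, its layout, and the test variable are hypothetical and are not Timo's actual code. It assumes the intrinsics are imported from core.volatile, as in current druntime; the 2017-era runtime shipped with gdc 7 exposed them from core.bitop instead.

```d
// Assumption: current druntime. In 2017-era druntime, import from core.bitop.
import core.volatile : volatileLoad, volatileStore;

/// Hypothetical wrapper: every access goes through the volatile intrinsics,
/// so the optimizer may not cache the value or hoist the read out of a loop.
struct Volatile(T)
    if (is(T == ubyte) || is(T == ushort) || is(T == uint) || is(T == ulong))
{
    private T raw;

    /// Volatile read of the wrapped location.
    @property T value()
    {
        return volatileLoad(&raw);
    }

    /// Volatile write to the wrapped location.
    @property void value(T newValue)
    {
        volatileStore(&raw, newValue);
    }
}

unittest
{
    // An ordinary variable stands in for a hardware register here.
    Volatile!uint fakeRegister;
    fakeRegister.value = 1;
    assert(fakeRegister.value == 1);
}
```

On real hardware the struct would presumably be overlaid on a register address (for example by casting the address to `Volatile!uint*`), so that a polling loop such as `while (!reg.value) { }` performs a fresh load on every iteration.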
Johannes Pfau <nospam example.com> writes:
On Sat, 22 Jul 2017 07:07:33 +0000, Timo Sintonen <t.sintonen luukku.com> wrote:

 A quick answer without looking at your code: this never worked 
 properly because the compiler thinks the value is not changing and 
 may be optimized out of the loop. [...]

There's a small thinko here ;-) In Mike's code, value is a property 
using volatileLoad/volatileStore internally.  So the real problem is 
likely more complicated.

-- Johannes
Jul 22 2017
"Iain Buclaw via D.gnu" <d.gnu puremagic.com> writes:
On 22 July 2017 at 10:09, Johannes Pfau via D.gnu <d.gnu puremagic.com> wrote:
 There's a small thinko here ;-) In Mike's code, value is a property 
 using volatileLoad/volatileStore internally.  So the real problem is 
 likely more complicated.

While I'm confident that the current implementation of 
volatileLoad/volatileStore should prevent such reordering, 
inserting a memory barrier before generating our 
volatileLoad/Store's can also be done to really hammer it in to the 
gcc optimizer.

Iain.
Jul 22 2017
Mike <none none.com> writes:
On Friday, 21 July 2017 at 23:44:53 UTC, Mike wrote:

 However, I'm getting broken binaries with -O2 and -O3.  I've 
 nailed the culprit down to -fschedule-insns (i.e. if I add 
 -fno-schedule-insns to -O2 or -O3, the binary works fine).
I've confirmed that -fschedule-insns is reordering register access 
even though they are being accessed with volatileLoad/Store.  Read 
the comments in the following code for understanding.  FYI A 
bit-banded address is a 32-bit address to a single bit.

Here's the D code
-----------------
    // This is a single atomic store to bit-banded address 0x42470048
    RCC.CR.HSEBYP.value = false;

    // This is a single read-modify-write to non-bit-banded 32-bit address 0x40023808
    with(RCC.CFGR)
    {
        setValue
        !(
              MCO2,     0
            , MCO2PRE,  0
            , MCO1PRE,  0
            , I2SSRC,   0
            , MCO1,     0
            , RTCPRE,   0
            , HPRE,     0b000
            , PPRE2,    0b100
            , PPRE1,    0b101
            , SW,       0
        )();
    }

And here's the dis-assembly
---------------------------
    <hardwareInit+0x104>)   0x40023808 - RCC.CFGR
    <hardwareInit+0x108>)   0x40023808 - RCC.CR.HSEBYP
    ; Read-modify of RCC.CFGR
    RCC.CR.HSEBYP
    RCC.CFGR
    RCC.CR.HSEBYP
    ...
    8000c50:  .word 0x40023808
    8000c54:  .word 0x42470048

You can see that at 8000ba2 and 8000ba4 RCC.CFGR is written first.  
But in the D code RCC.CR.HSEBYP should be written first.

Mike
Jul 22 2017
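
For readers unfamiliar with bit-banding, here is a small sketch of how the alias address in the post above can be derived. The function name is hypothetical. The formula is the standard Cortex-M peripheral bit-band mapping; the RCC.CR address (0x40023800) and the HSEBYP bit position (18) are assumptions taken from the STM32F4 reference manual rather than from this thread.

```d
/// Start of the Cortex-M peripheral bit-band region and of its alias region.
enum uint peripheralBase   = 0x4000_0000;
enum uint bitBandAliasBase = 0x4200_0000;

/// Hypothetical helper: returns the 32-bit alias address that maps to a
/// single bit of a peripheral register.
/// alias = aliasBase + (byte offset * 32) + (bit number * 4)
uint bitBandAlias(uint registerAddress, uint bit) pure nothrow @nogc @safe
{
    immutable uint byteOffset = registerAddress - peripheralBase;
    return bitBandAliasBase + byteOffset * 32 + bit * 4;
}

// Assumed values: RCC.CR at 0x4002_3800, HSEBYP is bit 18 of that register.
// The result matches the bit-banded address quoted in the post.
static assert(bitBandAlias(0x4002_3800, 18) == 0x4247_0048);
```

A word-sized store to the alias address sets or clears just that one bit, which is why the HSEBYP write is a single store while the RCC.CFGR update is a read-modify-write.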
Johannes Pfau <nospam example.com> writes:
On Sat, 22 Jul 2017 08:11:28 +0000, Mike <none none.com> wrote:

 On Friday, 21 July 2017 at 23:44:53 UTC, Mike wrote:
 
 However, I'm getting broken binaries with -O2 and -O3.  I've 
 nailed the culprit down to -fschedule-insns (i.e. if I add 
 -fno-schedule-insns to -O2 or -O3, the binary works fine).  
 I've confirmed that -fschedule-insns is reordering register access 
 even though they are being accessed with volatileLoad/Store. [...]

I guess this doesn't happen for a reduced example directly using 
volatileLoad/volatileStore?  If you could provide such a reduced 
example that'd be very useful.

-- Johannes
Jul 22 2017
Johannes Pfau <nospam example.com> writes:
On Fri, 21 Jul 2017 23:44:53 +0000, Mike <none none.com> wrote:

 My stm32 demo has now been updated and working with GDC/GCC 
 7.1.0.  Thanks for all your improvements.
 
 However, I'm getting broken binaries with -O2 and -O3.  I've 
 nailed the culprit down to -fschedule-insns (i.e. if I add 
 -fno-schedule-insns to -O2 or -O3, the binary works fine).
 
 I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared 
 them, but they were quite different all the way through.  Not only 
 because of address locations, but also different registers and 
 even different opcodes.

This can unfortunately happen if scheduling allows further 
optimizations.  Then the generated code might look nothing like the 
original code.  It's also possible that this is only caused by a 
combination of optimization passes.  I guess it doesn't happen when 
using -fschedule-insns without other -O flags?
 
 Is there anything I can do to provide more actionable information 
 to help identify the underlying cause?
As I don't have an ARM bare metal compiler ready to test: The output 
of -fdump-tree-all and -fdump-rtl-all would be useful.  The tree 
output is usually quite readable, rtl not so much...  There might be 
some more useful switches on 
https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html

-- Johannes
Jul 22 2017
"Iain Buclaw via D.gnu" <d.gnu puremagic.com> writes:
On 22 July 2017 at 01:44, Mike via D.gnu <d.gnu puremagic.com> wrote:
 My stm32 demo has now been updated and working with GDC/GCC 7.1.0.  Thanks
 for all your improvements.

 However, I'm getting broken binaries with -O2 and -O3.  I've nailed the
 culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2
 or -O3, the binary works fine).

 I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared them, but
 they were quite different all the way through.  Not only because of address
 locations, but also different registers and even different opcodes.

 Is there anything I can do to provide more actionable information to help
 identify the underlying cause?

 Mike
Hi Mike,

Is the stm discovery repository up to date on GitHub?

https://github.com/JinShil/stm32f42_discovery_demo/search?utf8=%E2%9C%93&q=cast%28shared&type=

Those should probably be volatileLoad, as they look to be used by 
setValue().

Iain.
Jul 22 2017
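
A hedged sketch of the kind of change suggested in the post above: a read-modify-write in which both the read and the write go through the volatile intrinsics, rather than a `cast(shared ...)` dereference. The helper name and signature are hypothetical and are not the repository's actual setValue(). The import assumes current druntime (core.volatile); with gdc 7 the intrinsics were in core.bitop.

```d
// Assumption: current druntime. In 2017-era druntime, import from core.bitop.
import core.volatile : volatileLoad, volatileStore;

/// Hypothetical read-modify-write helper: replaces the bits selected by
/// `mask` with the corresponding bits of `bits`.
void modifyBits(uint* register, uint mask, uint bits)
{
    // Per this thread, reading through `*cast(shared uint*) register` is
    // treated as an ordinary load by gdc 7, so the scheduler may move it.
    // A volatile load keeps its order relative to other volatile accesses.
    immutable uint current = volatileLoad(register);
    volatileStore(register, (current & ~mask) | (bits & mask));
}

unittest
{
    // An ordinary variable stands in for a hardware register here.
    uint fakeRegister = 0xFFFF_0000;
    modifyBits(&fakeRegister, 0x0000_FC00, 0b100_101 << 10);
    assert(fakeRegister == 0xFFFF_9400);
}
```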
Mike <none none.com> writes:
On Saturday, 22 July 2017 at 09:07:31 UTC, Iain Buclaw wrote:

 https://github.com/JinShil/stm32f42_discovery_demo/search?utf8=%E2%9C%93&q=cast%28shared&type=

 Those should probably be volatileLoad, as they look to be used 
 by setValue().
I am such an idiot.  Problem solved.  Thank you, and I'm terribly 
sorry for the noise.

Mike
Jul 22 2017
"Iain Buclaw via D.gnu" <d.gnu puremagic.com> writes:
On 22 July 2017 at 11:22, Mike via D.gnu <d.gnu puremagic.com> wrote:
 I am such an idiot.  Problem solved.  Thank you, and I'm terribly 
 sorry for the noise.

Fantastic, and don't worry about it.  It's always welcome to have 
more people raising issues with gdc's codegen.

Iain.
Jul 22 2017