D.gnu - ARM Cortex-M Broken Binaries with -O2 and -O3 (-fschedule-insns)
- Mike (13/13) Jul 21 2017 My stm32 demo has now been updated and working with GDC/GCC
- Mike (61/69) Jul 21 2017 Interestingly, I added a strategically placed `asm { "nop"; }` and
- Timo Sintonen (17/87) Jul 22 2017 A quick answer without looking at your code: this never worked
- Johannes Pfau (6/91) Jul 22 2017 There's a small thinko here ;-) In Mike's code, value is a property
- Iain Buclaw via D.gnu (6/96) Jul 22 2017 While I'm confident that the current implementation of
- Mike (50/53) Jul 22 2017 I've confirmed that -fschedule-insns is reordering register
- Johannes Pfau (6/42) Jul 22 2017 I guess this doesn't happen for a reduced example directly using
- Johannes Pfau (13/28) Jul 22 2017 This can unfortunately happen if scheduling allows further
- Iain Buclaw via D.gnu (6/18) Jul 22 2017 Hi Mike,
- Mike (4/7) Jul 22 2017 I am such an idiot. Problem solved. Thank you, and I'm terribly
- Iain Buclaw via D.gnu (4/12) Jul 22 2017 Fantastic, and don't worry about it. It's always welcome to have more
My stm32 demo has now been updated and is working with GDC/GCC 7.1.0. Thanks for all your improvements.

However, I'm getting broken binaries with -O2 and -O3. I've nailed the culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2 or -O3, the binary works fine).

I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared them, but they were quite different all the way through. Not only because of address locations, but also different registers and even different opcodes (e.g. 'str [...]).

Is there anything I can do to provide more actionable information to help identify the underlying cause?

Mike
Jul 21 2017
On Friday, 21 July 2017 at 23:44:53 UTC, Mike wrote:
> I'm getting broken binaries with -O2 and -O3. I've nailed the culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2 or -O3, the binary works fine).
>
> I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared them, but they were quite different all the way through. Not only because of address locations, but also different registers [...]

Interestingly, I added a strategically placed `asm { "nop"; }` and my binary was able to execute further. Comparing the disassembly of the function I modified still showed quite a significant difference.

Working Binary
--------------
[...]
;-------------------------------------------------------
nop ; My strategically placed nop
;-------------------------------------------------------
[...]

Not Working Binary
------------------
[...]

By "Not Working" I mean this code gets stuck in the while loop:

    PWR.CR.ODEN.value = true;
    while(!PWR.CSR.ODRDY.value) { }

This is simply setting the "Overdrive Enable" bit in the power control peripheral of my hardware. The documentation states:

    To set or reset the ODEN bit, the HSI or HSE must be selected as system clock.

I'm selecting the HSI prior to setting ODEN, but it appears that maybe the compiler is reordering the instructions. I still need to investigate that further, but hopefully that provides a little more insight.

Mike
Jul 21 2017
On Saturday, 22 July 2017 at 01:11:02 UTC, Mike wrote:
> By "Not Working" I mean this code gets stuck in the while loop
>
>     PWR.CR.ODEN.value = true;
>     while(!PWR.CSR.ODRDY.value) { }
>
> [...] I'm setting the HSI prior to setting ODEN, but it appears that maybe the compiler is reordering the instructions.

A quick answer without looking at your code: this never worked properly, because the compiler thinks the value is not changing, so the read may be optimized out of the loop. After that, the test in the while condition may also be moved outside the loop. Then the whole while statement, which now has nothing to test and nothing inside its body, can be optimized out.

I got this working in the past because 'shared' meant 'volatile' in gdc, but this is not true any more.

I have not yet looked at how you define your data type and how you access the data, but it seems the compiler thinks it is an ordinary variable.

I made a custom Volatile data type that uses the new compiler intrinsics, and my sample program seems to work with gdc 7. The only thing that does not work is exceptions. The exception code in the runtime has changed a lot, so I do not know whether it should work or not.
Jul 22 2017
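To make the hoisting concern in the post above concrete, here is a minimal host-side sketch of a polling loop that reads through volatileLoad on every iteration. It is an illustration only, not code from Mike's or Timo's projects: `waitReady`, `READY_BIT` and the local `fakeRegister` are hypothetical names, and the poll limit exists only so the example terminates on a PC. At the time of the thread the intrinsics lived in core.bitop; current druntime exposes them from core.volatile, which is what the sketch assumes.

```d
import core.volatile : volatileLoad, volatileStore;

// Hypothetical bit position for a "ready" flag; not taken from the thread.
enum uint READY_BIT = 1u << 16;

/// Polls `*reg` until READY_BIT is set or `maxPolls` reads have been made.
/// Returns the number of reads performed.
uint waitReady(uint* reg, uint maxPolls)
{
    uint polls = 0;
    while (polls < maxPolls)
    {
        ++polls;
        // volatileLoad requests a real memory read on every iteration, so
        // the load should not be hoisted out of the loop.
        if (volatileLoad(reg) & READY_BIT)
            break;
    }
    return polls;
}

void main()
{
    uint fakeRegister = 0; // stands in for a memory-mapped register

    // Flag never set: the loop gives up after 4 reads.
    assert(waitReady(&fakeRegister, 4) == 4);

    // Simulate the hardware setting the flag.
    volatileStore(&fakeRegister, READY_BIT);
    assert(waitReady(&fakeRegister, 4) == 1);
}
```

On real hardware the pointer would be the peripheral register's address and the loop would usually have no poll limit.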
On Sat, 22 Jul 2017 07:07:33 +0000, Timo Sintonen <t.sintonen luukku.com> wrote:
> A quick answer without looking your code: this never worked properly because the compiler thinks the value is not changing and may be optimized out of the loop. [...]

There's a small thinko here ;-) In Mike's code, value is a property using volatileLoad/volatileStore internally. So the real problem is likely more complicated.

-- Johannes
Jul 22 2017
On 22 July 2017 at 10:09, Johannes Pfau via D.gnu <d.gnu puremagic.com> wrote:
> There's a small thinko here ;-) In Mike's code, value is a property using volatileLoad/volatileStore internally. So the real problem is likely more complicated.

While I'm confident that the current implementation of volatileLoad/volatileStore should prevent such reordering, we could also insert a memory barrier before the generated volatileLoad/volatileStore accesses to really hammer it home to the GCC optimizer.

Iain.
Jul 22 2017
On Friday, 21 July 2017 at 23:44:53 UTC, Mike wrote:
> However, I'm getting broken binaries with -O2 and -O3. I've nailed the culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2 or -O3, the binary works fine).

I've confirmed that -fschedule-insns is reordering register accesses even though they are made with volatileLoad/volatileStore. Read the comments in the following code for details. FYI: a bit-banded address is a 32-bit address that maps to a single bit.

Here's the D code
-----------------
    // This is a single atomic store to bit-banded address 0x42470048
    RCC.CR.HSEBYP.value = false;

    // This is a single read-modify-write to non-bit-banded 32-bit address 0x40023808
    with(RCC.CFGR)
    {
        setValue
        !(
              MCO2,    0
            , MCO2PRE, 0
            , MCO1PRE, 0
            , I2SSRC,  0
            , MCO1,    0
            , RTCPRE,  0
            , HPRE,    0b000
            , PPRE2,   0b100
            , PPRE1,   0b101
            , SW,      0
        )();
    }

And here's the disassembly
--------------------------
    [...] <hardwareInit+0x104>)  ; 0x40023808 - RCC.CFGR
    [...] <hardwareInit+0x108>)  ; 0x42470048 - RCC.CR.HSEBYP
    [...]                        ; Read-modify of RCC.CFGR
    [...]                        ; RCC.CR.HSEBYP
    [...]                        ; RCC.CFGR
    [...]                        ; RCC.CR.HSEBYP
    ...
    8000c50: .word 0x40023808
    8000c54: .word 0x42470048

You can see that at 8000ba2 and 8000ba4 RCC.CFGR is written first. But in the D code RCC.CR.HSEBYP should be written first.

Mike
Jul 22 2017
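The bit-banded address in the post above can be checked with the standard Cortex-M3/M4 alias formula for the peripheral region: alias = 0x42000000 + (byte offset from 0x40000000) * 32 + bit * 4. The sketch below is only an illustration; `bitBandAlias` is a hypothetical helper, and the facts that RCC.CR sits at 0x40023800 and that HSEBYP is bit 18 are assumptions taken from the STM32F4 reference manual rather than from the thread. Under those assumptions the result matches the 0x42470048 quoted by Mike.

```d
// Cortex-M3/M4 bit-band constants for the peripheral region.
enum uint PERIPH_BASE    = 0x4000_0000;
enum uint PERIPH_BB_BASE = 0x4200_0000;

/// Hypothetical helper: returns the bit-band alias address for bit `bit`
/// of the register at `regAddress`, which must lie in the 1 MB bit-band
/// peripheral region starting at PERIPH_BASE.
uint bitBandAlias(uint regAddress, uint bit)
{
    immutable uint byteOffset = regAddress - PERIPH_BASE;
    // Each bit of the region maps to one 32-bit word in the alias region.
    return PERIPH_BB_BASE + byteOffset * 32 + bit * 4;
}

void main()
{
    // Assumed, not stated in the thread: RCC.CR is at 0x40023800 and
    // HSEBYP is bit 18 of that register.
    assert(bitBandAlias(0x4002_3800, 18) == 0x4247_0048);
}
```

Writing 0 or 1 to the alias word clears or sets only that bit, which is why the store to HSEBYP is a single atomic write while the update of RCC.CFGR needs a read-modify-write.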
On Sat, 22 Jul 2017 08:11:28 +0000, Mike <none none.com> wrote:
> I've confirmed that -fschedule-insns is reordering register access even though they are being accessed with volatileLoad/Store. [...]

I guess this doesn't happen for a reduced example directly using volatileLoad/volatileStore? If you could provide such a reduced example, that'd be very useful.

-- Johannes
Jul 22 2017
On Fri, 21 Jul 2017 23:44:53 +0000, Mike <none none.com> wrote:
> I disassembled '-O2' and '-O2 -fno-schedule-insns' and compared them, but they were quite different all the way through. Not only because of address locations, but also different registers [...]

This can unfortunately happen if scheduling allows further optimizations. Then the generated code might look nothing like the original code. It's also possible that this is only caused by a combination of optimization passes. I guess it doesn't happen using -fschedule-insns without other -O flags?

> Is there anything I can do to provide more actionable information to help identify the underlying cause?

As I don't have an ARM bare metal compiler ready to test: the output of -fdump-tree-all and -fdump-rtl-all would be useful. The tree output is usually quite readable, the RTL output not so much... There might be some more useful switches on https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html

-- Johannes
Jul 22 2017
On 22 July 2017 at 01:44, Mike via D.gnu <d.gnu puremagic.com> wrote:
> However, I'm getting broken binaries with -O2 and -O3. I've nailed the culprit down to -fschedule-insns (i.e. if I add -fno-schedule-insns to -O2 or -O3, the binary works fine). [...]
>
> Is there anything I can do to provide more actionable information to help identify the underlying cause?

Hi Mike,

Is the stm discovery repository up to date on Github?

https://github.com/JinShil/stm32f42_discovery_demo/search?utf8=%E2%9C%93&q=cast%28shared&type=

Those should probably be volatileLoad, as they look to be used by setValue().

Iain.
Jul 22 2017
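As a rough illustration of the suggestion above, the following sketch shows a read-modify-write helper that goes through volatileLoad/volatileStore instead of dereferencing a `cast(shared ...)` pointer. It is not the setValue() implementation from the repository: `modifyRegister` and `fakeRegister` are hypothetical names, and the import assumes a current druntime where the intrinsics are in core.volatile (core.bitop at the time of the thread).

```d
import core.volatile : volatileLoad, volatileStore;

/// Hypothetical helper: replaces the bits selected by `mask` with `value`
/// using exactly one volatile read and one volatile write.
void modifyRegister(uint* reg, uint mask, uint value)
{
    // A plain dereference such as `*cast(shared uint*) reg` is an ordinary
    // memory access now that 'shared' no longer implies 'volatile' in gdc,
    // so the sketch uses the intrinsics for both accesses.
    immutable uint old = volatileLoad(reg);
    volatileStore(reg, (old & ~mask) | (value & mask));
}

void main()
{
    uint fakeRegister = 0xFFFF_0000; // stands in for a peripheral register

    // Set a 4-bit field without touching the other bits.
    modifyRegister(&fakeRegister, 0x0000_00F0, 0x0000_00A0);
    assert(fakeRegister == 0xFFFF_00A0);

    // Clear the top byte.
    modifyRegister(&fakeRegister, 0xFF00_0000, 0);
    assert(fakeRegister == 0x00FF_00A0);
}
```

Because both accesses are volatile, the compiler is expected to keep them ordered relative to other volatile accesses, such as the bit-banded store to RCC.CR.HSEBYP discussed earlier in the thread.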
On Saturday, 22 July 2017 at 09:07:31 UTC, Iain Buclaw wrote:
> https://github.com/JinShil/stm32f42_discovery_demo/search?utf8=%E2%9C%93&q=cast%28shared&type=
>
> Those should probably be volatileLoad, as they look to be used by setValue().

I am such an idiot. Problem solved. Thank you, and I'm terribly sorry for the noise.

Mike
Jul 22 2017
On 22 July 2017 at 11:22, Mike via D.gnu <d.gnu puremagic.com> wrote:
> On Saturday, 22 July 2017 at 09:07:31 UTC, Iain Buclaw wrote:
>> https://github.com/JinShil/stm32f42_discovery_demo/search?utf8=%E2%9C%93&q=cast%28shared&type=
>>
>> Those should probably be volatileLoad, as they look to be used by setValue().
>
> I am such an idiot. Problem solved. Thank you, and I'm terribly sorry for the noise.

Fantastic, and don't worry about it. It's always welcome to have more people raising issues with gdc's codegen.

Iain.
Jul 22 2017