www.digitalmars.com

D.gnu - Using Link Time Optimization (LTO)

"Mike" <none none.com> writes:
Hello,

I have some code generating the following assembly:
{OnReset}:
  8000010:       b508            push    {r3, lr}
  8000012:       20ff            movs    r0, #255        ; 0xff
  8000014:       f000 f828       bl      8000068 <{MyFunction}>
  8000018:       e7fe            b.n     8000018 <{OnReset}+0x8>
  800001a:       bf00            nop

08000068
{MyFunction}:
  8000068:       f44f 5380       mov.w   r3, #4096       ; 0x1000
  800006c:       f2c2 0300       movt    r3, #8192       ; 0x2000
  8000070:       7018            strb    r0, [r3, #0]
  8000072:       4770            bx      lr

"MyFunction" and "OnReset" are in different source files and 
are therefore compiled to different object files.  I would like to 
get "MyFunction" fully inlined into "OnReset" to remove the extra 
branch instructions (bl and bx).

It's my understanding that because the two functions are compiled 
into separate object files, this must be done using LTO.  If I 
compile them into the same object file, I get the full inlining 
I'm looking for, but that's not going to scale well for my 
project.

** Beautiful, isn't it? **
{OnReset}:
  8000010:       f44f 5380       mov.w   r3, #4096       ; 0x1000
  8000014:       f2c2 0300       movt    r3, #8192       ; 0x2000
  8000018:       22ff            movs    r2, #255        ; 0xff
  800001a:       701a            strb    r2, [r3, #0]
  800001c:       e7fe            b.n     800001c <{OnReset}+0xc>
  800001e:       bf00            nop


I've tried adding -flto to my compiler and linker flags, along with a 
number of other things, without success.  The compiler seems to 
generate extra information in my object files, but the linker 
doesn't seem to do the optimization.  I don't get any of the ICEs 
described in Bugs 61 and 88, however.  I just don't get the result 
I'm after.

Here are my compiler commands:
arm-none-eabi-gdc -mthumb -mcpu=cortex-m4 -fno-emit-moduleinfo 
-ffunction-sections -fdata-sections -O3 -c -flto ...
arm-none-eabi-ld -T link/link.ld -Map binary/memory.map 
--gc-sections -flto ...

I'm using my arm-none-eabi cross toolchain built from the GDC 4.8 
branch.  I tried adding --enable-lto to my toolchain's configure, 
but that had no effect.  It's my understanding that it's enabled 
by default anyway.

Does anyone know how I can get this level of inlining without 
compiling all my source into one object file?

Thanks for any help,
Mike
Mar 22 2014
Johannes Pfau <nospam example.com> writes:
On Sun, 23 Mar 2014 02:14:20 +0000,
"Mike" <none none.com> wrote:

 Here are my compiler commands:
 arm-none-eabi-gdc -mthumb -mcpu=cortex-m4 -fno-emit-moduleinfo 
 -ffunction-sections -fdata-sections -O3 -c -flto ...
 arm-none-eabi-ld -T link/link.ld -Map binary/memory.map 
 --gc-sections -flto ...
 
 I'm using my arm-none-eabi cross toolchain built from the GDC 4.8 
 branch.  I tried adding --enable-lto to my toolchain's configure, 
 but that had no effect.  It's my understanding that it's enabled 
 by default anyway.
 

Some time ago LTO was only supported by the gold linker, so you might
need to configure binutils with --enable-gold --enable-plugins
--enable-lto

GCC should also be compiled with --enable-gold --enable-plugins
--enable-lto

http://gcc.gnu.org/onlinedocs/gcc-4.8.0/gcc/Optimize-Options.html
also says if you link manually you must use gcc to link, not ld, and
pass -flto when linking as well:
gcc -o myprog -flto -O2 foo.o bar.o

You can also try passing -fuse-linker-plugin to all gcc commands.

I never used LTO though, so I'm not sure if this will actually help :-)
Mar 23 2014
Iain Buclaw <ibuclaw gdcproject.org> writes:
On 23 March 2014 07:49, Johannes Pfau <nospam example.com> wrote:
 Some time ago LTO was only supported by the gold linker, so you might
 need to configure binutils with --enable-gold --enable-plugins
 --enable-lto

 GCC should also be compiled with --enable-gold --enable-plugins
 --enable-lto

 http://gcc.gnu.org/onlinedocs/gcc-4.8.0/gcc/Optimize-Options.html
 also says if you link manually you must use gcc to link, not ld, and
 pass -flto when linking as well:
 gcc -o myprog -flto -O2 foo.o bar.o

 You can also try passing -fuse-linker-plugin to all gcc commands.

 I never used LTO though, so I'm not sure if this will actually help :-)

I'd rather we'd fix the outstanding LTO bug before we start testing with it. :o)
Mar 23 2014
"Mike" <none none.com> writes:
On Sunday, 23 March 2014 at 07:51:14 UTC, Johannes Pfau wrote:
 Some time ago LTO was only supported by the gold linker, so you might
 need to configure binutils with --enable-gold --enable-plugins
 --enable-lto

 GCC should also be compiled with --enable-gold --enable-plugins
 --enable-lto

 http://gcc.gnu.org/onlinedocs/gcc-4.8.0/gcc/Optimize-Options.html
 also says if you link manually you must use gcc to link, not ld, and
 pass -flto when linking as well:
 gcc -o myprog -flto -O2 foo.o bar.o

 You can also try passing -fuse-linker-plugin to all gcc commands.

 I never used LTO though, so I'm not sure if this will actually help :-)

You were right, I have to link with gcc to get LTO to kick in. And sure enough Bug 88 symptoms appeared. At least I now know why nothing was happening. Thank you!
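
For anyone following along, the change amounts to roughly the sketch below: compile each module with -flto as before, then link through the gdc driver instead of calling ld directly, handing the ld-only options over with -Wl,. The source file names, the output name binary/firmware.elf, and the use of -nostdlib are assumptions made for illustration (the real command lines above are elided with "..."), so treat this as a starting point and not as the exact commands. It defaults to a dry run that only prints the commands. On the GDC 4.8 branch, actually running the link step is where the Bug 88 symptoms showed up.

```shell
#!/bin/sh
# Sketch only. Hypothetical: start.d, myfunction.d, binary/firmware.elf, -nostdlib.
set -eu

GDC="${GDC:-arm-none-eabi-gdc}"
ARCH_FLAGS="-mthumb -mcpu=cortex-m4"
LTO_FLAGS="-O3 -flto"

# Dry run by default: print the commands only. Set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and run it unless this is a dry run.
run() {
    printf '%s\n' "$*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

build() {
    # Compile every module separately, with -flto so each object carries LTO data.
    for src in start.d myfunction.d; do
        run "$GDC" $ARCH_FLAGS $LTO_FLAGS -fno-emit-moduleinfo \
            -ffunction-sections -fdata-sections -c "$src" -o "${src%.d}.o"
    done

    # Link through the compiler driver, not ld, repeating -flto and -O3.
    # Options that only ld understands are passed through with -Wl,
    run "$GDC" $ARCH_FLAGS $LTO_FLAGS -nostdlib \
        -T link/link.ld -Wl,-Map=binary/memory.map -Wl,--gc-sections \
        start.o myfunction.o -o binary/firmware.elf
}

build
```

The -nostdlib flag is there only because linking with ld directly never pulled in startup files or default libraries, whereas the driver adds them unless told otherwise. Drop it if your project does want them.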
Mar 23 2014