D.gnu - Cross-module inlining in gdc

Mike Farnsworth <mike.farnsworth gmail.com> writes:
So, as I've been working on getting the gcc builtins available to D code
(somewhat successfully as of last night, I might add), I've run into a fairly
significant inlining problem.

Given a function definition in D, where I want to force inlining:

// Assume __v4sf is defined by the compiler
pragma(set_attribute, _mm_add_ps, always_inline, artificial);
__v4sf _mm_add_ps (__v4sf __A, __v4sf __B)
{
    return __builtin_ia32_addps(__A, __B);
}

When this occurs in the module I care about, it works dandy.  It gets inlined,
the generated code is pretty optimal, etc.  When it is defined in another
module, and I call the function, I get messages like "sorry, unimplemented:
inlining failed" where it states it doesn't have the body of the function.

I was compiling each file as a separate module, one at a time, so I used
-combine to give it multiple source files at once and allow it to link it right
away.  That didn't make any difference.  If I take away the pragma, it will
then compile, but it never inlines.

When doing -combine, is there a way to get gdc to feed all of the source to the
frontend all at once, such that all the definitions/bodies/etc. are all present
so that inlining can occur?  I would imagine even this strategy falls apart
when linking against a library; is there any way we can support something like
-flto so that at codegen time gcc has more opportunity to do inlining?

Having intrinsic wrappers defined in a different module and then never
inlined kinda defeats the purpose of the intrinsics.  It'd be nice if we could
find a way to get cross-module inlining to work, even if it means using
link-time optimization.

-Mike
Feb 09 2011
Trass3r <un known.com> writes:
> // Assume __v4sf is defined by the compiler
> pragma(set_attribute, _mm_add_ps, always_inline, artificial);
> __v4sf _mm_add_ps (__v4sf __A, __v4sf __B)
> {
>     return __builtin_ia32_addps(__A, __B);
> }

Two notes: Isn't it pragma(GNU_set_attribute, ...)? And you should also be able to write pragma(GNU_attribute, always_inline, artificial) __v4sf _mm_add_ps.... in front of the declaration.
Feb 09 2011
Mike Farnsworth <mike.farnsworth gmail.com> writes:
Trass3r Wrote:

>> // Assume __v4sf is defined by the compiler
>> pragma(set_attribute, _mm_add_ps, always_inline, artificial);
>> __v4sf _mm_add_ps (__v4sf __A, __v4sf __B)
>> {
>>     return __builtin_ia32_addps(__A, __B);
>> }
> 2 notes: Isn't it pragma(GNU_set_attribute? And you should be able to do pragma(GNU_attribute, always_inline, artificial) __v4sf _mm_add_ps.... as well.

That's the syntax that ibuclaw gave me, and it does indeed work. GNU_set_attribute is deprecated now, as far as I know (from spelunking through the code).

-Mike
Feb 09 2011
Iain Buclaw <ibuclaw ubuntu.com> writes:
== Quote from Mike Farnsworth (mike.farnsworth gmail.com)'s article
> When doing -combine, is there a way to get gdc to feed all of the source to the
> frontend all at once, such that all the definitions/bodies/etc. are all present
> so that inlining can occur?  I would imagine even this strategy falls apart
> when linking against a library; is there any way we can support something like
> -flto so that at codegen time gcc has more opportunity to do inlining?

-combine does feed all of the source to the frontend at once.  The reason it
doesn't get inlined is likely that the gcc backend decides not to (e.g.
because code size would grow).

-flto should be supported if gcc was built with it enabled
(--enable-languages=lto).  I've never tried it, though, so that's only a guess.
Feb 09 2011