digitalmars.D - Inlining asm functions

bearophile <bearophileHUGS lycos.com> writes:
While implementing some code I noticed that LDC (which uses Tango) isn't able
to inline expi, which is little more than an fsincos asm instruction. LDC is
now able to inline sqrt etc. (which makes them much more efficient), but expi
contains asm, and functions containing asm aren't allowed to be inlined.

LDC has a way to allow such inlining of asm-containing functions anyway:
pragma(allow_inline)

So my code got faster (with LDC) when I used the following expi instead of
the Tango one (I removed the part that handles the no-asm case because I was
just doing a quick test):

creal expi(real y) {
    version (LDC) pragma(allow_inline);
    asm {
        fld y;
        fsincos;
        fxch ST(1), ST(0);
    }
    // add code here if asm isn't allowed
}
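The asm-free branch left as a comment above only has to compute e^(iy) = cos(y) + i*sin(y). A minimal sketch of that branch follows; it is not the Tango or Phobos code, the name expiFallback is hypothetical, and it returns today's std.complex Complex!real instead of the built-in creal type, which later D versions deprecated.

```d
import std.complex : Complex;
import std.math : cos, sin;

/// Hypothetical asm-free fallback for expi: e^(iy) = cos(y) + i*sin(y).
/// Uses std.complex.Complex!real instead of the built-in creal type.
Complex!real expiFallback(real y)
{
    // Real part is the cosine, imaginary part the sine: the same pair
    // that the x87 fsincos instruction produces in a single step.
    return Complex!real(cos(y), sin(y));
}
```

Because it contains no asm, nothing stops a compiler from inlining this version.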

Instead of just adding "version (LDC) pragma(allow_inline);" at the top of some
Tango functions (array operations can benefit from such inlining too: if you
do a[]+b[] and the arrays have length 4, calling the Tango/Phobos functions
adds a big overhead), isn't it better to add something standard to the D
language (i.e. something that works on DMD too) to state that an asm function
can be inlined?
(In general, the idea of porting tiny things back from LDC to DMD sounds nice,
especially when the DMD back-end is able to support them.)

Bye,
bearophile
Jun 10 2009
Don <nospam nospam.com> writes:
bearophile wrote:
> Instead of just adding "version (LDC) pragma(allow_inline);" at the top of some
> Tango functions (array operations can benefit from such inlining too: if you
> do a[]+b[] and the arrays have length 4, calling the Tango/Phobos functions
> adds a big overhead), isn't it better to add something standard to the D
> language (i.e. something that works on DMD too) to state that an asm function
> can be inlined?
> (In general, the idea of porting tiny things back from LDC to DMD sounds nice,
> especially when the DMD back-end is able to support them.)

Nice idea, but it'd be pretty hard to do asm inlining in DMD. Consider that to get much benefit, it needs to get rid of the "fld y". Tricky.

expi could be an intrinsic; it's the only place where fsincos would ever be used.

Array operations deserve to be treated specially when the length is known (and short). In trying to exterminate the internal compiler error bugs, I've been looking at the compiler's treatment of array ops over the past few days. There's a heap of room for improvement. (It's also one of the primary sources of bad code generation bugs.)
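To make the known-length case concrete: when the length is a compile-time constant, the element-wise loop can be fully unrolled with no call into a generic runtime array-op routine. This is only a library-level sketch (the helper addFixed is hypothetical, and this is not how DMD generates array-op code internally); static foreach also needs a D compiler much newer than those discussed in this thread.

```d
/// Hypothetical helper: element-wise a + b for fixed-length arrays.
/// N is a compile-time constant, so static foreach unrolls the loop
/// into N separate additions; no runtime array-op helper is called.
/// Parameters are taken by value, which is cheap for short arrays.
T[N] addFixed(T, size_t N)(T[N] a, T[N] b)
{
    T[N] r;
    static foreach (i; 0 .. N)
        r[i] = a[i] + b[i];
    return r;
}
```

For float[4] arguments this expands to four scalar additions, which is the kind of code bearophile's a[]+b[] example would ideally compile to.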
 
Jun 10 2009
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
bearophile wrote:
> creal expi(real y) {
>     version (LDC) pragma(allow_inline);
>     asm {
>         fld y;
>         fsincos;
>         fxch ST(1), ST(0);
>     }
>     // add code here if asm isn't allowed
> }
 
Note that for LDC, an even more optimal arrangement is something like[1]:
-----
version(LDC)
    import ldc.llvmasm;

creal expi(real y) {
    return __asm!(creal)("fsincos", "={st(0)},={st(1)},0", y);
}
-----
(That works for both x86 and x86-64.)

This allows LLVM to load the real into the register any way it wants (not just an fld right before the fsincos), which may be useful when inlining. On x86 it also automatically inserts the fxch (the LLVM IR generated by LDC includes an explicit swap due to ABI issues), but it may omit it after inlining.

The pragma(allow_inline) is easier to add to code that needs to support other compilers too, though ;).

[1]: The asm may need some extra clobbers (probably either st(7) or st(2)-st(7)) to be correct; I'm not entirely sure. (Clobbers are specified by appending something like ",~{st(7)}" to the comma-separated second string argument.)
Jun 11 2009
Brad Roberts <braddr bellevue.puremagic.com> writes:
On Thu, 11 Jun 2009, Frits van Bommel wrote:

> bearophile wrote:
>> creal expi(real y) {
>>     version (LDC) pragma(allow_inline);
>>     asm {
>>         fld y;
>>         fsincos;
>>         fxch ST(1), ST(0);
>>     }
>>     // add code here if asm isn't allowed
>> }
>
> Note that for LDC, an even more optimal arrangement is something like[1]:
> -----
> version(LDC)
>     import ldc.llvmasm;
>
> creal expi(real y) {
>     return __asm!(creal)("fsincos", "={st(0)},={st(1)},0", y);
> }
> -----

My apologies if I cut too much context, but one of the things Walter explicitly wanted with D's asm syntax was that it be portable across compilers. Having to have special syntax for each compiler is a problem.

I strongly agree that functions using asm should be inlineable.

Some will almost certainly point out that the proposed macros for d-future will solve the problem, and yes, they could be used here. But should that be necessary just to accomplish what should already be doable with standard inlining techniques?

Anyway, food for thought.

Later,
Brad
Jun 11 2009
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Brad Roberts wrote:
> On Thu, 11 Jun 2009, Frits van Bommel wrote:
>> Note that for LDC, an even more optimal arrangement is something like[1]:
>> -----
>> version(LDC)
>>     import ldc.llvmasm;
>>
>> creal expi(real y) {
>>     return __asm!(creal)("fsincos", "={st(0)},={st(1)},0", y);
>> }
>> -----
> My apologies if I cut too much context, but one of the things Walter
> explicitly wanted with D's asm syntax was that it be portable across
> compilers. Having to have special syntax for each compiler is a problem.

The problem with the standard asm syntax is that it only allows access to variables in memory (except in naked functions), while it would often be beneficial for the compiler to load values into registers instead. Currently, "standard asm" can lead to silly sequences like storing a value to the stack (outside the asm), then immediately loading it again (inside the asm), possibly into the same register...

I suppose this could be worked around by allowing the compiler to "strip off" loads from variables at the start of the asm and stores to variables at the end, and replace them with register constraints. (And/or allow it to put the variable in a register altogether when all instructions that use it also accept a register operand and the register is otherwise unused by the inline asm.)

This "Do what I mean, not what I say" approach is not allowed by the specification for the standard asm syntax, I think. (Though both GDC and LDC already do some gymnastics that come close to this, since IIRC both need to translate "somevar[EBP]" into something their respective backends can understand, which may or may not result in actually indexing off of EBP in the resulting code.) (Can you tell I was thinking about something like that when Tomas implemented the __asm hack? :) )

Or does D have an "as if" rule like C++ does? (i.e.: the compiler is allowed to transform code any way it likes, as long as observable behavior is "as if" it did what the programmer specified) And if so, does it apply to inline asm too?

Note that GDC actually took a similar approach to LDC by allowing an alternate asm syntax in addition to the standard one, and it uses explicit constraints instead of requiring the programmer to manually load and store to variables.

Another benefit of the LDC asm syntax is that it maps 1-to-1 onto the LLVM inline asm primitive, so it needs no adjustment to the LDC source to support ARM, MIPS, PowerPC, or any other platform LLVM supports but for which no inline asm is implemented otherwise.

So while I agree a standard asm syntax is desirable, I just think the current syntax may not be the best one for all jobs.

> I strongly agree that functions using asm should be inlineable.

This is very hard to allow with the standard syntax without an explicit signal that it's okay (like pragma(allow_inline)), because it would require the compiler to check that it's safe, meaning it needs a pretty good understanding of each and every asm instruction, as opposed to just being able to translate it into binary code or AT&T syntax, which is what current compilers do. (Both GDC and LDC already do some analysis of the asm while translating to AT&T syntax to figure out which registers are modified so they can put together a clobber list, but that's not enough to allow inlining.)

The fact that everything in std.intrinsic could easily be replaced by an asm one-liner[1] if only they could be inlined tells me Walter hasn't figured out how to do it either[2]...

[1]: Plus either set-up & tear-down or constraints, that is.
[2]: Though this may have to do with the internals of the DMD inliner, which is a whole other topic...
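For a sense of how little work such an intrinsic does, here is a portable loop computing what the x86 bsf instruction (std.intrinsic's bsf, later core.bitop.bsf) returns: the index of the lowest set bit. It is only an illustrative sketch, and bsfPortable is a hypothetical name. Calling an out-of-line asm function just to execute that one instruction is the overhead that inlining would remove.

```d
import core.bitop : bsf;

/// Hypothetical portable bit-scan-forward: index of the lowest set bit.
/// Like the x86 bsf instruction, the result is undefined for v == 0,
/// so that case is rejected up front.
int bsfPortable(uint v)
{
    assert(v != 0, "bit scan of zero is undefined");
    int index = 0;
    // Shift right until the lowest set bit reaches position 0.
    while ((v & 1) == 0)
    {
        v >>= 1;
        ++index;
    }
    return index;
}

unittest
{
    // Should agree with druntime's bsf for every value checked.
    foreach (uint v; 1 .. 4096)
        assert(bsfPortable(v) == bsf(v));
}
```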
Jun 11 2009