
digitalmars.D - Stack Alignment for Numerics

dsimcha <dsimcha yahoo.com> writes:
Does the freezing of the D2 language spec and the publication of TDPL preclude
fixing low-level ABI issues like stack alignment?  I have some numerics code
that is taking a massive performance hit because the stack keeps ending up
aligned such that none of my doubles are aligned on 8-byte boundaries,
resulting in something like a 2x performance hit.

If not, this is a pretty serious performance problem.  Is there a "standard"
solution to the stack alignment problem that will allow consistently good
performance on numerics code that uses double-precision floats?
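One quick way to confirm this symptom is to print the addresses of a few stack-allocated doubles and test each against an 8-byte boundary. The following is a minimal D sketch; `isAligned` and `reportLocals` are hypothetical helper names, and what it prints will vary with compiler, target and flags.

```d
import std.stdio : writefln;

/// Hypothetical helper: true if `p` is a multiple of `alignment`,
/// which is assumed to be a power of two.
bool isAligned(const(void)* p, size_t alignment) pure nothrow @nogc
{
    return (cast(size_t) p & (alignment - 1)) == 0;
}

/// Hypothetical helper: reports where the compiler placed two local doubles.
/// Taking their addresses forces them onto the stack rather than into registers.
void reportLocals()
{
    double x = 1.0;
    double y = 2.0;
    writefln("x at %s, 8-byte aligned: %s", &x, isAligned(&x, 8));
    writefln("y at %s, 8-byte aligned: %s", &y, isAligned(&y, 8));
}

void main()
{
    reportLocals();
}
```

Calling `reportLocals` from different call depths is a cheap way to see whether the alignment changes from one frame to the next.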
Mar 03 2010
Don <nospam nospam.com> writes:
dsimcha wrote:
> Does the freezing of the D2 language spec and the publication of TDPL preclude
> fixing low-level ABI issues like stack alignment?  I have some numerics code
> that is taking a massive performance hit because the stack keeps ending up
> aligned such that none of my doubles are aligned on 8-byte boundaries,
> resulting in something like a 2x performance hit.
>
> If not, this is a pretty serious performance problem.  Is there a "standard"
> solution to the stack alignment problem that will allow consistently good
> performance on numerics code that uses double-precision floats?
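I agree. See bugzilla 2278.

Something that has changed since this issue was last raised is that the DMD backend now has 8-byte stack alignment for the Mac compiler. So the hard work has already been done. All that would be required to support it on Windows and Linux as well is to enable it and to align the stack to 8 bytes around every extern(C) call.

As a workaround, I've been doing things like:

    // Align the stack to a multiple of 64 bytes (32-bit x86 only).
    void main()
    {
        asm
        {
            naked;
            push EBP;             // EBP is callee-saved, so preserve the caller's value
            mov EBP, ESP;         // remember the original ESP
            and ESP, 0xFFFF_FFC0; // round ESP down to a multiple of 64
            call alignedmain;     // run the real program on the aligned stack
            mov ESP, EBP;         // restore the original ESP
            pop EBP;
            xor EAX, EAX;         // return 0 as the exit status
            ret;
        }
    }

    void alignedmain()
    {
        // ... the actual program ...
    }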
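The `and ESP, 0xFFFF_FFC0` step works because 0xFFFF_FFC0 is the 32-bit complement of 63, so the AND clears the low six bits and rounds ESP down to a multiple of 64. Rounding down is the safe direction because the x86 stack grows toward lower addresses, so the bytes skipped over are unused, and the original ESP saved in EBP is what gets restored afterwards. A minimal sketch of the same arithmetic in plain D (`alignDown` is a hypothetical name, not a library function; the stack pointer value is made up):

```d
import std.stdio : writefln;

/// Hypothetical helper: round `addr` down to a multiple of `alignment`.
/// `alignment` must be a power of two; for 64 the mask ~(64 - 1) is
/// 0xFFFF_FFC0 in 32 bits, the constant used in the asm workaround.
size_t alignDown(size_t addr, size_t alignment) pure nothrow @nogc @safe
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0,
           "alignment must be a power of two");
    return addr & ~(alignment - 1);
}

void main()
{
    uint mask = ~(64u - 1);
    writefln("mask = 0x%08X", mask); // prints "mask = 0xFFFFFFC0"
    size_t esp = 0x0012_FF7C;        // made-up example stack pointer value
    writefln("0x%08X -> 0x%08X", esp, alignDown(esp, 64));
}
```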
Mar 03 2010
dsimcha <dsimcha yahoo.com> writes:
== Quote from Don (nospam nospam.com)'s article
> I agree. See bugzilla 2278.
> Something that has changed since this issue was last raised is that the
> DMD backend now has 8-byte stack alignment for the Mac compiler. So the
> hard work has already been done. All that would be required to support it
> on Windows and Linux as well is to enable it and to align the stack to
> 8 bytes around every extern(C) call.
> [...]
Possibly stupid question: Would aligning each stack frame on 8-byte boundaries be enough to ensure that each individual stack-allocated double is aligned on 8-byte boundaries?
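One way to reason about this question: a local's address is the frame base plus an offset chosen by the compiler, so an 8-byte-aligned frame is necessary but not sufficient, and each double's offset must also be a multiple of 8. On 32-bit x86 there is a further wrinkle: `call` pushes a 4-byte return address, so a stack that is 8-byte aligned just before the call is only 4-byte aligned at the callee's entry; after the usual `push EBP` prologue it is back on an 8-byte boundary. The toy model below (hypothetical `localAddress` and `isMultipleOf8`, not how any particular compiler lays out frames) just does that arithmetic.

```d
import std.stdio : writefln;

/// Toy model (hypothetical): address of a local = frame base + offset.
size_t localAddress(size_t frameBase, size_t offset) pure nothrow @nogc @safe
{
    return frameBase + offset;
}

/// True if `addr` lies on an 8-byte boundary.
bool isMultipleOf8(size_t addr) pure nothrow @nogc @safe
{
    return (addr & 7) == 0;
}

void main()
{
    enum size_t frameBase = 0x1000; // assume the frame itself is 8-byte aligned
    // Candidate offsets a compiler might assign to a double.
    immutable size_t[] offsets = [0, 4, 8, 12, 16];
    foreach (offset; offsets)
    {
        auto addr = localAddress(frameBase, offset);
        writefln("offset %2s -> 0x%X, 8-byte aligned: %s",
                 offset, addr, isMultipleOf8(addr));
    }
}
```

Only the offsets that are multiples of 8 come out aligned, which is why frame alignment has to be paired with a frame layout that places doubles at 8-byte offsets.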
Mar 03 2010
bearophile <bearophileHUGS lycos.com> writes:
Don:
> Something that has changed since this issue
> was last raised is that the DMD backend now has 8-byte stack alignment
> for the Mac compiler.
Isn't the default stack alignment on OS X 16 bytes?
http://blogs.embarcadero.com/eboling/2009/05/20/5607

I think it would be good to do some experiments and benchmarks (on Linux or Windows) to compare a few alternative implementation ideas, for example using LLVM.

Bye,
bearophile
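As a starting point for that kind of experiment, here is a rough D microbenchmark sketch that sums the same doubles through an 8-byte-aligned pointer and through one shifted by 4 bytes. It assumes an x86 target (where unaligned double loads are permitted), a current Phobos with `std.datetime.stopwatch`, and hypothetical helper names `sumAt` and `timeSum`. The timings depend heavily on CPU, compiler and flags, so no particular result is implied.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

/// Hypothetical helper: sum `n` doubles that start `byteOffset` bytes into `buf`.
/// Assumes the target tolerates unaligned double loads (true on x86).
double sumAt(const(ubyte)[] buf, size_t byteOffset, size_t n)
{
    assert(byteOffset + n * double.sizeof <= buf.length);
    auto p = cast(const(double)*)(buf.ptr + byteOffset);
    double s = 0;
    foreach (i; 0 .. n)
        s += p[i];
    return s;
}

/// Hypothetical helper: time `reps` passes of sumAt at one offset.
Duration timeSum(const(ubyte)[] buf, size_t byteOffset, size_t n, size_t reps)
{
    double sink = 0;
    auto sw = StopWatch(AutoStart.yes);
    foreach (r; 0 .. reps)
        sink += sumAt(buf, byteOffset, n);
    sw.stop();
    // Use the result so the optimizer cannot drop the loop.
    writefln("offset %s: sink = %s", byteOffset, sink);
    return sw.peek();
}

void main()
{
    enum size_t n = 1 << 16;
    enum size_t reps = 200;
    // 16 bytes of slack so both offsets below stay in bounds.
    auto buf = new ubyte[n * double.sizeof + 16];
    // All-zero bytes are valid doubles (0.0), so no initialization is needed.

    // First offset from buf.ptr that lies on an 8-byte boundary.
    immutable size_t base = (8 - (cast(size_t) buf.ptr & 7)) & 7;

    auto aligned = timeSum(buf, base, n, reps);
    auto misaligned = timeSum(buf, base + 4, n, reps);
    writefln("aligned: %s, misaligned by 4 bytes: %s", aligned, misaligned);
}
```

Computing `base` from the buffer address, rather than assuming the allocator returns 8-byte-aligned memory, keeps the "aligned" case aligned regardless of allocator behavior.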
Mar 03 2010