digitalmars.D - How to set struct alignment on the stack?

Brian Chapman <nospam-for-brian see-post-for-address.net> writes:
Was going to optimize my vector functions for SSE capable CPUs but I 
ran into a problem. How does one set the alignment for a struct? Not 
the byte packing alignment for the member data, but how the struct gets 
aligned on the stack? This is very important for SIMD operations. In 
the code that follows, there are two main functions. The first one will 
crash. The second one works, but is not optimal and sucks. Am I doing 
something lame? Thanks for any time you take to reply! - Brian

version = ia32simd; // this is the version being tested.
/********************************************************/
align (16) struct vector
{
	float x,y,z,w;
	void set (float a, float b, float c) {x=a;y=b;z=c;w=1;}
	void print () {printf ("[ %g, %g, %g, %g ]\n",x,y,z,w);}
}

void add (inout vector result, inout vector a, inout vector b)
{
	version (ia32simd) asm
	{
		mov ESI,a;
		mov EDI,b;
		movaps XMM0,[ESI];
		addps XMM0,[EDI];
		mov ESI,result;
		movaps [ESI],XMM0;
	}
	else
	{
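		// scalar fallback: write through `result` (there is no `c` in this scope)
		result.x = a.x + b.x;
		result.y = a.y + b.y;
		result.z = a.z + b.z;
		result.w = a.w + b.w; // match addps, which also sums w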
	}
}

/********************************************************/
/* This Main Doesn't Work */

static assert (vector.sizeof == 16);
//static assert (vector.alignof == 16); // FAILS! ???

void main1 ()
{
	vector a,b,c;
	//assert ((cast(int)(&a) & 0b1111) == 0); // FAILS!
	//assert ((cast(int)(&b) & 0b1111) == 0); // FAILS!
	//assert ((cast(int)(&c) & 0b1111) == 0); // FAILS!
	a.set (1,2,3);
	b.set (4,5,6);
	add (c,a,b); // Error: Win32 Exception !!!
	c.print();
}

/********************************************************/
/* This Main Works, but SUCKS! */

vector *alloc16aligned ()
{
	/* allocate a vector off the heap 16 bytes aligned */
	byte *p = new byte [vector.sizeof+0b1111];
	return cast(vector*)(((cast(int)(p))+0b1111)&~0b1111);
}

void main2 ()
{
	vector *a = alloc16aligned();
	vector *b = alloc16aligned();
	vector *c = alloc16aligned();
	a.set (1,2,3);
	b.set (4,5,6);
	add (*c,*a,*b);
	c.print();
	assert (c.x == 5);
	assert (c.y == 7);
	assert (c.z == 9);
	assert (c.w == 2);
}
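
For comparison, the round-up trick used in alloc16aligned above can be sketched in C. This is only a sketch under stated assumptions: it assumes C99 <stdint.h>, and the names align_up16, alloc16 and free16 are hypothetical helpers, not part of any library. It uses uintptr_t instead of int for the pointer arithmetic, and keeps the raw pointer so the block can be released again.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round an address up to the next multiple of 16 (hypothetical helper). */
static uintptr_t align_up16(uintptr_t addr)
{
    return (addr + 15u) & ~(uintptr_t)15u;
}

/* Over-allocate, align the result to 16 bytes, and stash the raw pointer
   in the slot just before the aligned block so free16 can find it. */
static void *alloc16(size_t size)
{
    unsigned char *raw = malloc(size + 15 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    uintptr_t aligned = align_up16((uintptr_t)(raw + sizeof(void *)));
    assert((aligned & 15u) == 0);   /* sanity check on the rounding */
    ((void **)aligned)[-1] = raw;   /* remember what malloc returned */
    return (void *)aligned;
}

/* Release a block obtained from alloc16. */
static void free16(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

This still wastes a few bytes per allocation and does nothing for locals on the stack, so it only covers the heap half of the problem.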
Feb 08 2005
Anders F Björklund <afb algonet.se> writes:
Brian Chapman wrote:

 Was going to optimize my vector functions for SSE capable CPUs but I ran 
 into a problem. How does one set the alignment for a struct? Not the 
 byte packing alignment for the member data, but how the struct gets 
 aligned on the stack? This is very important for SIMD operations.

It's just as important for AltiVec as it is for SSE.

It would be nice to avoid having to use assembler*, but then
D would have to have the same vector extensions that C has...
http://developer.apple.com/hardware/ve/model.html

And since the PowerPC G4+ has 32 vector registers, in addition
to the 32 integer and the 32 floating-point registers, passing
vector data on the stack does suck in comparison with registers.

But a first small step is aligning the thing to 16-byte boundaries.
Otherwise one would have to permute all loads, and that sucks worse.
http://developer.apple.com/hardware/ve/alignment.html

--anders

* not that GDC supports any inline assembler yet anyway, but...
Feb 08 2005
"Craig Black" <cblack ara.com> writes:
SIMD extensions for D would be really cool.
Feb 08 2005
Brian Chapman <nospam-for-brian see-post-for-address.net> writes:
On 2005-02-08 16:37:54 -0600, Anders F Björklund <afb algonet.se> said:

 It's equally important for AltiVec, as well as it is for SSE.

Yeah, I was wanting to do some AltiVec too, but that's going to require an external asm file since, as you mentioned, GDC doesn't support inline asm. Which means that it's only worthwhile on longer operations with more data (like matrices). But since it's external, I may as well do it in C and use the compiler intrinsics, as you also mentioned. But then suddenly I'm back to using C again, which I was wanting to get away from. *sigh*
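
As a rough illustration of the "do it in C with compiler intrinsics" route, here is a minimal sketch for the SSE case from the first post (the AltiVec version would use different intrinsics and is not shown). The vec4 type and the vec4_add name are hypothetical stand-ins for the D struct and add function. The sketch assumes a compiler that defines __SSE__ and ships <xmmintrin.h>, and falls back to scalar code otherwise. It uses unaligned loads and stores, so it does not depend on how the caller's structs are aligned.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical C mirror of the D `vector` struct: four packed floats. */
typedef struct { float x, y, z, w; } vec4;

/* result = a + b, component-wise (hypothetical name). */
void vec4_add(vec4 *result, const vec4 *a, const vec4 *b)
{
#if defined(__SSE__)
    /* Unaligned load/store: correct for any address, unlike movaps. */
    __m128 va = _mm_loadu_ps(&a->x);
    __m128 vb = _mm_loadu_ps(&b->x);
    _mm_storeu_ps(&result->x, _mm_add_ps(va, vb));
#else
    /* Scalar fallback for targets without SSE. */
    result->x = a->x + b->x;
    result->y = a->y + b->y;
    result->z = a->z + b->z;
    result->w = a->w + b->w;
#endif
}
```

Compiled to an object file, a function like this would normally be declared on the D side with extern (C) and linked in. The call cannot be inlined across the language boundary, which is the cost mentioned above for short operations.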
 It would be nice to avoid having to use assembler*, but then
 D would have to have the same vector extensions that C has...
 http://developer.apple.com/hardware/ve/model.html
 
 And since the PowerPC G4+ has 32 vector registers, in addition
 to the 32 integer and the 32 floating-point registers, passing
 vector data on the stack does suck in comparison with registers.
 
 But a first small step is aligning the thing to 16-byte boundaries.
 Otherwise one would have to permute all loads, and that sucks worse.
 http://developer.apple.com/hardware/ve/alignment.html
 
 --anders
 
 * not that GDC supports any inline assembler yet anyway, but...

It would be nice if at the very least there was a way, perhaps via the command line, to globally set the data alignment to an arbitrary value (in this case 16 bytes).
Feb 08 2005
Anders F Björklund <afb algonet.se> writes:
Brian Chapman wrote:

 Yeah, I was wanting to do some altivec too, but that's going to require 
 an external asm file since, as you mentioned, GDC doesn't support inline 
 asm. Which means that it's only worthwhile on longer operations 
 with more data (like matrices). But since it's external, I may as well do 
 it in C and use the compiler intrinsics, as you also mentioned. But then 
 suddenly I'm back to using C again which I was wanting to get away from. 
 *sigh*

D doesn't let you get away from C. It lets you get away from *C++* :-)

AltiVec works fine if you compile it with /usr/bin/gcc, and then link
in the objects in the D source? (it'll require a PPC G4/G5, of course)

It might be possible (with a few months or something of work) to get the
AltiVec patches and the D patches to co-exist in the GCC 3.3 base...
See this changelog for all the patches that are being applied to it:
http://www.opensource.apple.com/darwinsource/DevToolsAug2004/gcc-1762/CHANGES.Apple

(some examples)
 Owner     Status     Name of change
 -----     ------     --------------
 zlaski    local      -Wno-altivec-long-deprecated
 shebs     mixed      AltiVec
 shebs     unknown    Altivec related
 shebs     unknown    darwin native, AltiVec
 shebs     local      disable generic AltiVec patterns

And a ton of other patches, mostly related to:
1) Objective-C
2) Objective-C++
3) Macintosh legacy
4) Fat i386/ppc builds
(the sources are modified, so you need to use "diff" a lot)

To my local GCC/GDC copy, I have applied the Apple framework patches
(so that "#include <Carbon/Carbon.h>" and -framework Carbon work)
as well as the -mcpu patches so that G3, G4 and G5 are recognized.
http://dstress.kuehne.cn/raw_results/mac-OS-X-10.3.7_gdc-0.10-patch/

But perhaps a worthier effort would be to port GDC to GCC 4.0?

--anders
Feb 09 2005