digitalmars.D.learn - Help needed on inline assembly

reply Hendrik Renken <funsheep -[no-spam]-gmx.net> writes:
Hi,

I'd like to work with the SSE instructions in assembly. I wrote some
test routines (with my limited knowledge). Some of them work, others
don't. I'd like to know why. Can someone help me out?


First thing:
#void main()
#{
#	float[4] array = [ 1f, 2f, 3f, 4f ];
#	float* a = &array[0];
#//	float t;
#
#	asm
#	{
#		mov EBX, a;
#		movaps XMM1, [EBX];
#	}
#}

doesn't work. Uncomment the line
//	float t;

and it works. Why? Does the assembler code need to be aligned to
something? If so, how can I do this without allocating
another float on the stack?


Second thing: I don't know how to address a public const variable. With
my limited knowledge I would do something like this:

#float[4] array = [ 1f, 2f, 3f, 4f ];
#
#void main()
#{
#	float* a = &array[0];
#
#	asm
#	{
#		mov EAX, [a];
#		movaps XMM1, [EAX];		
#	}
#}


But I get a segfault. Why? I would interpret the code like this: a holds
the address of the first array element. mov EAX, [a]; copies the address
of the first array element to EAX, which is then used to access the array.

I'm using DMD 1.024 (since later versions broke Derelict on my platform).
Jan 30 2008
parent reply downs <default_357-line yahoo.de> writes:
 
 #float[4] array = [ 1f, 2f, 3f, 4f ];
 #
 #void main()
 #{
 #	float* a = &array[0];
 #
 #	asm
 #	{
 #		mov EAX, [a];
 #		movaps XMM1, [EAX];		
 #	}
 #}
 
 
 But I get a segfault. Why? I would interpret the code like this: a holds
 the address of the first array element. mov EAX, [a]; copies the address
 of the first array element to EAX.

Nope :) Remember, [] dereferences. mov EAX, [a] copies the value, i.e. "a dereferenced", to EAX. So EAX now contains the first value in array, 1f. Trying to dereference that floating point number leads understandably to a segfault.

--downs
Jan 30 2008
parent reply Hendrik Renken <funsheep -[no-spam]-gmx.net> writes:
downs schrieb:
 #float[4] array = [ 1f, 2f, 3f, 4f ];
 #
 #void main()
 #{
 #	float* a = &array[0];
 #
 #	asm
 #	{
 #		mov EAX, [a];
 #		movaps XMM1, [EAX];		
 #	}
 #}


 But I get a segfault. Why? I would interpret the code like this: a holds
 the address of the first array element. mov EAX, [a]; copies the address
 of the first array element to EAX.

Nope :) Remember, [] dereferences. mov EAX, [a] copies the value, i.e. "a dereferenced", to EAX. So EAX now contains the first value in array, 1f. Trying to dereference that floating point number leads understandably to a segfault.

OK, yeah. I didn't post the right example (I have a bunch of asm test files). But this doesn't work either:

#float[4] array = [ 1f, 2f, 3f, 4f ];
#
#void main()
#{
#	float* a = &array[0];
#
#	asm
#	{
#		mov EAX, a;
#		movaps XMM1, [EAX];
#	}
#}
Jan 30 2008
next sibling parent downs <default_357-line yahoo.de> writes:
The problem is that a memory operand of movaps needs to be aligned on a
16-byte boundary. That's what the 'a' in movaps means: "aligned".

So you can either use movups (unaligned), which is slower, or explicitly
allocate your memory so that it lies on a 16-byte boundary.


Example:

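 import std.gc, std.stdio;

 // Over-allocate by 15 bytes, then round the address up to a 16-byte boundary.
 void* malloc_align16(size_t count) {
 	void* res = malloc(count+15).ptr;
 	// ~cast(size_t) 15 clears the low four bits on 32- and 64-bit targets
 	return cast(void*) ((cast(size_t)(res + 15)) & ~cast(size_t) 15);
 }

 float[4] array = [ 1f, 2f, 3f, 4f ];

 void main()
 {
 	// view the aligned block as a float[4] slice and copy the data into it
 	auto _array = (cast(float*)malloc_align16(4*float.sizeof))[0 .. 4];
 	_array[] = array;
 	auto a = &_array[0];
 	asm
 	{
 		mov EAX, a;
 		movaps XMM1, [EAX];
 	}
 	writefln("Done");
 }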
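
The rounding step in malloc_align16 above (add 15, then clear the low four bits) can be looked at in isolation. The following is only a sketch for a current D2 compiler, not the DMD 1.x used in this thread; the name alignUp16 is hypothetical, and the mask is written as ~cast(size_t) 15 so that it also holds on 64-bit targets.

```d
import std.stdio : writefln;

/// Round addr up to the next multiple of 16 (unchanged if already a multiple).
/// alignUp16 is a hypothetical helper name, not part of any library.
size_t alignUp16(size_t addr)
{
    // Adding 15 pushes any unaligned address past the next boundary;
    // clearing the low four bits then snaps it back onto that boundary.
    return (addr + 15) & ~cast(size_t) 15;
}

void main()
{
    size_t[] samples = [0x1000, 0x1001, 0x100F, 0x1010];
    foreach (addr; samples)
        writefln("0x%X -> 0x%X", addr, alignUp16(addr));
}
```

Because at most 15 bytes are skipped, over-allocating by 15 bytes is enough to keep the rounded-up block inside the allocation.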


Hope it helps. --downs
Jan 30 2008
prev sibling parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Hendrik Renken" <funsheep -[no-spam]-gmx.net> wrote in message 
news:fnprum$6oh$1 digitalmars.com...
 float[4] array = [ 1f, 2f, 3f, 4f ];

 void main()
 {
     float* a = &array[0];

     asm
     {
         mov EAX, a;
         movaps XMM1, [EAX];
     }
 }

If you're using the newest DMD, this should work, it does for me. If you're using anything older than 1.023 (like, hm, 1.015? GRRGH), this will probably fail. 1.023 made anything in the static data segment >= 16 bytes paragraph aligned, so that data is already aligned properly. I don't know what GDC does in this case.

Another way to get an aligned allocation is to use a struct with the float[4] in it.

 struct vec
 {
     float[4] array;
 }

 void main()
 {
     vec* v = new vec;

     // ptr will get you the pointer to the 0th element too
     float* a = v.array.ptr;

     asm
     {
         mov EAX, a;
         movaps XMM1, [EAX];
     }
 }

This also doesn't rely on any standard library stuff.
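
Whether a given allocation really ends up on a 16-byte boundary can be checked rather than assumed. The sketch below assumes a current D2 compiler, where align(16) can be put on a struct declaration; that may not behave the same way on the DMD 1.x versions discussed in this thread. Vec and isAligned16 are hypothetical names.

```d
import std.stdio : writeln;

// Assumption: D2. align(16) raises the struct's alignment requirement.
align(16) struct Vec
{
    float[4] array = [1f, 2f, 3f, 4f];
}

/// True when p lies on a 16-byte boundary.
bool isAligned16(const(void)* p)
{
    return (cast(size_t) p & 15) == 0;
}

void main()
{
    // The declared alignment is visible at compile time.
    static assert(Vec.alignof == 16);

    // The heap address is only reported here, not asserted, since it
    // depends on the allocator in use.
    auto v = new Vec;
    writeln("heap Vec on 16-byte boundary: ", isAligned16(v.array.ptr));
}
```

A check like isAligned16 is cheap enough to put in front of any movaps while debugging.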
Jan 30 2008
next sibling parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message 
news:fnq21n$mvk$1 digitalmars.com...

 Another way to get an aligned allocation is to use a struct with the 
 float[4] in it.

 struct vec
 {
    float[4] array;
 }

 void main()
 {
    vec* v = new vec;

    // ptr will get you the pointer to the 0th element too
    float* a = v.array.ptr;

    asm
    {
        mov EAX, a;
        movaps XMM1, [EAX];
    }
 }

 This also doesn't rely on any standard library stuff.

A third way is to wrap the second way in a function, allowing you to allocate statically-sized arrays directly:

 T* alloc(T)()
 {
     struct S
     {
         T t;
     }

     return &(new S).t;
 }

 void main()
 {
     float[4]* array = alloc!(float[4]);
     float* a = array.ptr;

     asm
     {
         mov EAX, a;
         movaps XMM1, [EAX];
     }
 }
Jan 30 2008
prev sibling parent reply Hendrik Renken <funsheep -[no-spam]-gmx.net> writes:
Jarrett Billingsley schrieb:
 "Hendrik Renken" <funsheep -[no-spam]-gmx.net> wrote in message 
 news:fnprum$6oh$1 digitalmars.com...
 float[4] array = [ 1f, 2f, 3f, 4f ];

 void main()
 {
     float* a = &array[0];

     asm
     {
         mov EAX, a;
         movaps XMM1, [EAX];
     }
 }

If you're using the newest DMD, this should work, it does for me.

Now I've updated to 1.026, and the above example doesn't work (still not aligned). movups works.

 If you're 
 using anything older than 1.023 (like, hm, 1.015?  GRRGH), this will 
 probably fail. 

Yeah. I used 1.015 before...

 1.023 made anything in the static data segment >= 16 bytes 
 paragraph aligned, so that data is already aligned properly.

Doesn't seem to work for me (using DMD 1.026 on Linux). Or the alignment is broken again in 1.026...

 I don't know what GDC does in this case.
 
 Another way to get an aligned allocation is to use a struct with the 
 float[4] in it.
 
 struct vec
 {
     float[4] array;
 }
 
 void main()
 {
     vec* v = new vec;
 
     // ptr will get you the pointer to the 0th element too
     float* a = v.array.ptr;
 
     asm
     {
         mov EAX, a;
         movaps XMM1, [EAX];
     }
 }
 
 This also doesn't rely on any standard library stuff. 
 
 

Jan 30 2008
parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Hendrik Renken" <funsheep -[no-spam]-gmx.net> wrote in message 
news:fnqet1$1on0$1 digitalmars.com...

 Doesn't seem to work for me (using DMD 1.026 on Linux). Or the alignment
 is broken again in 1.026...

Ah, it's probably because of Linux. That code works on Windows. I forgot that DMD uses ELF on Linux like GDC. DMD maybe can't control the alignment of the data there. Either that, or it's a genuine bug. :\
Jan 30 2008
parent reply Hendrik Renken <funsheep gmx.net> writes:
Jarrett Billingsley wrote:
 "Hendrik Renken" <funsheep -[no-spam]-gmx.net> wrote in message 
 news:fnqet1$1on0$1 digitalmars.com...
 
 Doesn't seem to work for me (using DMD 1.026 on Linux). Or the alignment
 is broken again in 1.026...

Ah, it's probably because of Linux. That code works on Windows. I forgot that DMD uses ELF on Linux like GDC. DMD maybe can't control the alignment of the data there. Either that, or it's a genuine bug. :\

I did some more testing. It seems that dynamically allocated data is aligned, so for that I can use movaps. Statically allocated data, however, is not aligned. But we can allocate 1 to 3 ints/floats/etc. before the data until it is aligned ;)

Thanks for the help, got it working, and with a speedup factor of 240! Yeah, from 480 ms down to 2 ms with SSE instructions. That rocks!

Regards,
Hendrik
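
The "allocate 1 to 3 floats before the data" trick can be made independent of the layout by over-allocating and skipping elements at run time. This is a minimal sketch assuming a current D2 compiler; alignedSlice4 is a hypothetical helper, and it relies on float data starting on a 4-byte boundary, so the padding is always a whole number of floats.

```d
import std.stdio : writeln;

/// Returns a 4-float slice of buf that starts on a 16-byte boundary.
/// buf must hold at least 7 floats so that up to 3 can be skipped.
float[] alignedSlice4(float[] buf)
{
    assert(buf.length >= 7, "need 4 floats plus up to 3 floats of padding");
    const addr = cast(size_t) buf.ptr;
    // Bytes from buf.ptr to the next 16-byte boundary (0 if already on one).
    const padBytes = (16 - (addr & 15)) & 15;
    // float addresses are multiples of 4, so this division is exact (0..3).
    const skip = padBytes / float.sizeof;
    return buf[skip .. skip + 4];
}

void main()
{
    float[7] raw = 0;                 // 4 payload floats + 3 spare
    float[] v = alignedSlice4(raw[]);
    v[] = [1f, 2f, 3f, 4f];           // copy the data into the aligned window
    writeln("floats skipped: ", v.ptr - raw.ptr);
    writeln("on 16-byte boundary: ", (cast(size_t) v.ptr & 15) == 0);
}
```

The pointer v.ptr is then a candidate operand for movaps, at the cost of 12 spare bytes per buffer.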
Jan 31 2008
parent Don Clugston <dac nospam.com.au> writes:
Hendrik Renken wrote:
 Jarrett Billingsley wrote:
 "Hendrik Renken" <funsheep -[no-spam]-gmx.net> wrote in message 
 news:fnqet1$1on0$1 digitalmars.com...

 Doesn't seem to work for me (using DMD 1.026 on Linux). Or the alignment
 is broken again in 1.026...

Ah, it's probably because of Linux. That code works on Windows. I forgot that DMD uses ELF on Linux like GDC. DMD maybe can't control the alignment of the data there. Either that, or it's a genuine bug. :\

I did some more testing. It seems that dynamically allocated data is aligned, so for that I can use movaps. Statically allocated data, however, is not aligned. But we can allocate 1 to 3 ints/floats/etc. before the data until it is aligned ;)

That's what I found on Windows, and persuaded Walter to fix it. I didn't realise it wasn't working on Linux yet. I hope that eventually we'll get stack data properly aligned; if it gets into the D ABI, then we only have to worry about callbacks from C, i.e., only extern() functions would need to align the stack.

 Thanks for the help, got it working, and with a speedup factor of 240! Yeah,
 from 480 ms down to 2 ms with SSE instructions.
 
 That rocks!

Oh yeah! DMD's floating point code generation is very basic and does almost no optimisation, and its excellent support for inline asm makes asm particularly attractive. But a factor of 240 is pretty extreme.
Jan 31 2008