digitalmars.D.learn - ASM access to array

reply Heinz <malagana15 yahoo.com> writes:
Hi there,
Although I've been coding in D for the last 5 years, I had never touched ASM
until now. I'm trying to use the inline assembler to apply opcodes to a class
member, a uint array, but I don't know how. This is what I've been trying to
accomplish:

class MyClass
{
    private uint[] array;

    private void MyFunc()
    {
        asm
        {
            rol array[1], 8;
            rol array[2], 16;
        }
    }
}

The above code complains about type/size. However, it seems to work with
single uint variables local to the function. I've tried moving the values to
EBX and ECX and then applying rol to these registers. The code compiles and
runs but does nothing at all.
Any ideas or guidance? Is there a good doc about asm with D?
BIG THXS!!!
Feb 01 2011
parent reply bearophile <bearophileHUGS lycos.com> writes:
Heinz:

Not tested much:

import std.stdio: writeln;

class MyClass {
    uint[] array;

    this() {
        // The literal allocates the array, so no separate `new` is needed.
        array = [10, 20, 30, 40];
    }

    void myFunc() {
        version (D_InlineAsm_X86) {
            auto aptr = array.ptr;
            enum dsize = typeof(array[0]).sizeof;

            asm {
                mov EAX, aptr;

                mov ECX, [EAX + dsize * 1];
                rol ECX, 8;
                mov [EAX + dsize * 1], ECX;

                mov ECX, [EAX + dsize * 2];
                rol ECX, 16;
                mov [EAX + dsize * 2], ECX;
            }
        } else
             assert(0);
    }
}

void main() {
    auto c = new MyClass();
    writeln(c.array);
    c.myFunc();
    writeln(c.array);
}

Bye,
bearophile
Feb 01 2011
parent reply Heinz <malagana15 yahoo.com> writes:
Wow, thanks for the reply. Changing the 'enum dsize' to 'uint dsize = 4;' seems
to produce some results... I guess it's working now (I have no way to verify
it, but it looks like bits are being rotated by rol). One thing: if I replace
dsize with 4 in all lines, the code compiles but again it does nothing. Weird,
huh?

Is D_InlineAsm_X86 really needed? I saw it listed in the predefined versions
table. Is this version for custom use, or is it set internally by the compiler
to specify whether inline ASM is available? Inline ASM is always available,
right?

Anyway, thanks for the code. I was wondering: is there any other way to use
operands directly with class members, array items in this case? For variables
local to a function I can use the variables as if they were in a D statement,
but in your code we have to create local variables and move the values between
registers.

// LOCAL VARIABLE EXAMPLE
void MyFunct()
{
    uint temp;
    asm
    {
        rol temp, 8;
    }
}

If I'm going to create local pointers and constants and move them across
registers anyway, would it be better (or the same) to create 2 local uints,
rol them and then move them back to their respective places? This way the
compiler might generate better code:

// EXAMPLE
class MyClass
{
    private uint[] array;

    private void MyFunc()
    {
        uint a = array[0], b = array[1];
        asm
        {
            rol a, 8;
            rol b, 16;
        }
        array[0] = a;
        array[1] = b;
    }
}

Thanks 4 everything!!!!!!!!!!!
Feb 01 2011
parent reply bearophile <bearophileHUGS lycos.com> writes:
Heinz:

> (i have no way to verify it is working but looks like bits are being rotated
> by rol).

Create a unittest and test whether the output-input pairs are correct.
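
A minimal sketch of such a test. The names rotl and rotateElems are hypothetical
and not from this thread: rotateElems stands in for myFunc, and its plain D body
can be swapped for the asm version to test that instead. Compile with -unittest
so the unittest block runs before main.

```d
// Reference rotate in plain D; only valid for 0 < n < 32.
uint rotl(uint v, uint n)
{
    assert(n > 0 && n < 32);
    return (v << n) | (v >> (32 - n));
}

// Hypothetical stand-in for myFunc: rotates elements 1 and 2 in place.
void rotateElems(uint[] a)
{
    a[1] = rotl(a[1], 8);
    a[2] = rotl(a[2], 16);
}

unittest
{
    // Input values chosen so that bits wrap around the top of the word.
    uint[] a = [10, 0x80000001, 0x12345678, 40];
    rotateElems(a);
    assert(a[0] == 10);          // untouched
    assert(a[1] == 0x00000180);  // 0x80000001 rotated left by 8
    assert(a[2] == 0x56781234);  // 0x12345678 rotated left by 16
    assert(a[3] == 40);          // untouched
}

void main() {}
```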
> Is D_InlineAsm_X86 really needed?

Someday D will run on other CPUs, like 64-bit ones, where 32-bit X86 asm will
not work. To avoid problems it's better to get used to protecting asm blocks
with that version. So it's not strictly necessary, but it's a good habit and it
doesn't hurt.

> Inline ASM is always available right?

I think the (partial) D implementation for Dotnet didn't support asm. But even
if all implementations support it, there are various kinds of CPUs, and each
one needs a different asm.

> If i'm going to create local pointers and constants and move them across
> registers then, to accomplish the same result, would it be better (or the
> same) to create 2 local ints, rol them and then move them to their respective
> place (this way the compiler might generate better code):

When the D compiler sees asm code, it locally switches off optimizations. I
suggest you compile your two versions and take a look at the asm the D compiler
produces (with obj2asm or with a free disassembler on Linux).

Bye,
bearophile
Feb 01 2011
parent reply Matthias Pleh <sufu alter.com> writes:
As bearophile noted, unittests are always good!!
But for a start, I always like to write some pretty-printing functions ...
... so this is my version


import std.stdio;

uint rotl_d(uint value,ubyte rotation){
     // Mask both shift counts so rotation == 0 never shifts by the full width.
     return (value << (rotation & 31)) | (value >> (-rotation & 31));
}

uint rotl_asm(uint value,ubyte rotation){
     asm{
         mov EAX, value;   // get first argument
         mov CL , rotation; // how many bits to move
         rol EAX, CL;
     }// return with result in EAX
}

void bin_writeln(string info,uint value, bool nl){
     writefln("%1s: %02$32b%3$s",info,value,nl?"\n":"");
}

int main(string[] argv){
     uint a=0xc0def00d;
     bin_writeln("value a",a           ,false);
     bin_writeln("value b",rotl_d(a,1),true);
     //
     bin_writeln("value a",a             ,false);
     bin_writeln("value b",rotl_asm(a,1),true);
     return 0;
}

greets
Matthias
Feb 02 2011
parent reply Joel Christensen <joelcnz gmail.com> writes:
What about my edited version:

import std.stdio;

uint rotl_d(uint value,ubyte rotation){
     // Mask both shift counts so rotation == 0 never shifts by the full width.
     return (value << (rotation & 31)) | (value >> (-rotation & 31));
}

uint rotl_asm(uint value,ubyte rotation){
     asm{
         mov EAX, value;   // get first argument
         mov CL , rotation; // how many bits to move
         rol EAX, CL;
     }// return with result in EAX
}

void bin_writeln(string info,uint value, bool nl){
     writefln("%1s: %02$32b%3$s",info,value,nl?"\n":"");
}

int main(string[] argv){
     uint a=0xc0def00d;
     bin_writeln("value a",a           ,false);
     bin_writeln("value b",rotl_d(a,1),true);
     //
     bin_writeln("value a",a             ,false);
     bin_writeln("value b",rotl_asm(a,1),true);
	
	uint b;
	ubyte c = 0;
	while ( 1 == 1 ) { // Press Ctrl + C to quit
		b = rotl_asm(0xc0def00d, c);
		foreach (rst; 0 .. 5_000 )
			writef("%032b %2d\r",b, c );
		c = cast(ubyte)( c + 1 == 32 ? 0 : c + 1 );
		
	}
     return 0;
}
Feb 02 2011
parent reply Heinz <malagana15 yahoo.com> writes:
Thanks for the code (this goes to Matthias Pleh too):

Both programs work amazingly well after modifying them a bit to work with
DMD 1.030. This helped me a lot!
I still don't know how the asm version implicitly returns a value (no return
keyword needed). It seems that the returned value is EAX, not the variable
"value". To return "value", the content of EAX should first be moved to
"value", right? (mov value, EAX;)
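
For what it's worth, on 32-bit x86 the D calling convention returns integer
results in EAX, which would explain why no return statement is needed. A hedged
sketch of the explicit alternative, copying EAX into a local and returning it
the normal way. The names rotlExplicit and result are illustrative, and the
plain D branch is an assumed fallback for targets without 32-bit x86 inline asm.

```d
import std.stdio : writefln;

uint rotlExplicit(uint value, ubyte rotation)
{
    uint result;
    version (D_InlineAsm_X86)
    {
        asm
        {
            mov EAX, value;     // load the value to rotate
            mov CL, rotation;   // rotate count must be in CL
            rol EAX, CL;
            mov result, EAX;    // copy EAX into an ordinary D variable
        }
    }
    else
    {
        // Assumed fallback: same rotation written in plain D.
        immutable n = rotation & 31;
        result = n ? (value << n) | (value >> (32 - n)) : value;
    }
    return result;              // ordinary D return
}

void main()
{
    writefln("%08x", rotlExplicit(0xc0def00d, 8)); // prints def00dc0
}
```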

I found this site: http://www.swansontec.com/sregisters.html
It helped me work out optimal register usage. It also details which registers
can use offsets.

I finally ended with this ASM code:

// I have to rotate every int of a uint[3]. For some reason I can't directly
// reference the array pointer, so I create a variable local to the function.

void myFunct()
{
    uint* p = myarray.ptr;
    asm
    {
        mov EBX, p;

        mov EAX, [EBX + 4];
        rol EAX, 8;
        mov [EBX + 4], EAX;

        mov EAX, [EBX + 8];
        rol EAX, 16;
        mov [EBX + 8], EAX;

        mov EAX, [EBX + 12];
        rol EAX, 24;
        mov [EBX + 12], EAX;
    }
}

Hope this helps someone else.
Cheers.

Heinz
Feb 03 2011
parent reply bearophile <bearophileHUGS lycos.com> writes:
Heinz:

 void myFunct()
 {
     uint* p = myarray.ptr;
     asm
     {
         mov EBX, p;
 
         mov EAX, [EBX + 4];
         rol EAX, 8;
         mov [EBX + 4], EAX;
 
         mov EAX, [EBX + 8];
         rol EAX, 16;
         mov [EBX + 8], EAX;
 
         mov EAX, [EBX + 12];
         rol EAX, 24;
         mov [EBX + 12], EAX;
     }
 }
I see you have removed the asm guard I showed you. I suggest you benchmark it
against a normal D function. Keep in mind that asm blocks kill inlining. Also
try a load-load-load processing-processing-processing store-store-store order
instead of load-processing-store load-processing-store load-processing-store,
because this often helps the processor's pipelining (especially when you use
SSE/AVX registers).

Bye,
bearophile
Feb 04 2011
parent reply Heinz <malagana15 yahoo.com> writes:
bearophile,

Thank you so much for all your help. It seems you're very into ASM.
I kept the D_InlineAsm_X86 guard in my code as you suggested. The code I gave
here was just an example. My actual code uses the version guard like this:

version(D_InlineAsm_X86)
{
    // ASM Code.
}
else
{
    // D code.
}

This results in much more robust code. You were right about it.
You are right too about using a "load-load-load processing-processing-processing
store-store-store" order instead of repeating "load-processing-store" three
times. I'll modify my code to follow this model, though it will require moving
some elements to the stack. No big deal; I think this won't hurt performance,
as it is designed to work this way.
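
A sketch of how the reordered version could look, assuming the three values fit
in EAX, ECX and EDX so nothing has to go to the stack. The function name
rotateThree, the rotl helper and the plain D fallback are illustrative
assumptions, and the slice is assumed to hold at least four elements to match
the offsets 4, 8 and 12 used earlier.

```d
import std.stdio : writefln;

// Plain D rotate used by the fallback; valid for 0 < n < 32.
uint rotl(uint v, uint n) { return (v << n) | (v >> (32 - n)); }

void rotateThree(uint[] a)
{
    assert(a.length >= 4);
    version (D_InlineAsm_X86)
    {
        uint* p = a.ptr;
        asm
        {
            mov EBX, p;
            // load-load-load
            mov EAX, [EBX + 4];
            mov ECX, [EBX + 8];
            mov EDX, [EBX + 12];
            // processing-processing-processing
            rol EAX, 8;
            rol ECX, 16;
            rol EDX, 24;
            // store-store-store
            mov [EBX + 4], EAX;
            mov [EBX + 8], ECX;
            mov [EBX + 12], EDX;
        }
    }
    else
    {
        // Assumed fallback for targets without 32-bit x86 inline asm.
        a[1] = rotl(a[1], 8);
        a[2] = rotl(a[2], 16);
        a[3] = rotl(a[3], 24);
    }
}

void main()
{
    uint[] a = [1, 0x11223344, 0x11223344, 0x11223344];
    rotateThree(a);
    // prints 00000001 22334411 33441122 44112233
    writefln("%08x %08x %08x %08x", a[0], a[1], a[2], a[3]);
}
```

Whether this ordering is actually faster than the interleaved one is something
only a measurement on the target machine can tell.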

- Does ASM kill inlining only for the function where the asm block is present,
  or for the whole compilation?
- In your opinion, how bad is it if function inlining is not available? Some
  docs from the net: http://www.parashift.com/c++-faq-lite/inline-functions.html

Cheers,

Heinz
Feb 04 2011
parent bearophile <bearophileHUGS lycos.com> writes:
Heinz:

> This results in a much robust code.

That's the right way to do it, with a D fallback.

> You are right too about the "load-load-load processing-processing-processing
> store-store-store instead a load-processing-store load-processing-store
> load-processing-store" thing. I'll modify my code to this model, though it
> will require to move some elements to the stack but no big deal, i think this
> won't hurt performance as it is designed to work this way.

That may be slow, so you need to benchmark.
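
One way such a benchmark could be set up, as a rough sketch that assumes a
recent D2 compiler with Phobos' std.datetime.stopwatch module. The names data,
rotl, viaD and viaAsm are illustrative. On targets without 32-bit x86 inline
asm, viaAsm just calls viaD, so the two timings will then be about the same.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

uint[4] data = [10, 20, 30, 40];

// Plain D rotate; valid for 0 < n < 32.
uint rotl(uint v, uint n) { return (v << n) | (v >> (32 - n)); }

// Candidate 1: plain D, which the compiler is free to inline and optimize.
void viaD()
{
    data[1] = rotl(data[1], 8);
    data[2] = rotl(data[2], 16);
    data[3] = rotl(data[3], 24);
}

// Candidate 2: the inline asm version of the same three rotations.
void viaAsm()
{
    version (D_InlineAsm_X86)
    {
        uint* p = data.ptr;
        asm
        {
            mov EBX, p;
            mov EAX, [EBX + 4];
            rol EAX, 8;
            mov [EBX + 4], EAX;
            mov EAX, [EBX + 8];
            rol EAX, 16;
            mov [EBX + 8], EAX;
            mov EAX, [EBX + 12];
            rol EAX, 24;
            mov [EBX + 12], EAX;
        }
    }
    else
        viaD(); // no x86 inline asm available on this target
}

void main()
{
    // Runs each function 10 million times and returns one Duration per function.
    auto results = benchmark!(viaD, viaAsm)(10_000_000);
    writeln("D:   ", results[0]);
    writeln("asm: ", results[1]);
}
```

The numbers depend heavily on compiler flags and hardware, so it is worth
running it several times with the same flags you use for the real build.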
> -Does ASM kill inlining for the function where the asm block is present or
> for the whole compilation?

It only prevents the function that contains the assembly from being inlined.

> -In your opinion, How badly can be if function inlining is not present?

Inlining is an important optimization if your function does very little;
otherwise it's not important, or it makes the code slower. Your function
contains only a few asm instructions, so inlining becomes important. This is
why I suggest you benchmark your asm code against equivalent D code compiled
with -O -release -inline. The D code plus inlining may turn out to be faster;
here this is the most probable outcome, in my opinion. If you use the LDC
compiler there are two different ways (pragma(allow_inline) and inline asm
expressions) to have inlining even when you use asm code, so the situation is
better.

Bye,
bearophile
Feb 04 2011