
digitalmars.D.learn - Mixin in Inline Assembly

reply Chris M. <chrismohrfeld comcast.net> writes:
Right now I'm working on a project where I'm implementing a VM in 
D. I'm on the rotate instructions, and realized I could *almost* 
abstract the ror and rol instructions with the following function

private void rot(string ins)(int *op1, int op2)
{
     int tmp = *op1;
     asm
     {
         mov EAX, tmp; // I'd also like to know if I could just load *op1 directly into EAX
         mov ECX, op2[EBP];
         mixin(ins ~ " EAX, CL;"); // Issue here
         mov tmp, EAX;
     }
     *op1 = tmp;
}

However, the inline assembler doesn't like me trying to do a 
mixin. Is there a way around this?

(There is a reason op1 is a pointer instead of a ref int, please 
don't ask about it)
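(For reference, the rotation semantics being abstracted can also be written in portable D. This is only an illustrative sketch, not part of the original post; the helper names `ror32`/`rol32` are made up, and compilers commonly recognize this idiom and emit ror/rol directly.)

```d
import std.stdio;

// Portable reference versions of the two instructions being abstracted.
// The `& 31` masks keep the shift counts in range for 32-bit operands,
// matching the hardware behavior of ror/rol with a CL count.
uint ror32(uint x, uint n) { n &= 31; return (x >>> n) | (x << ((32 - n) & 31)); }
uint rol32(uint x, uint n) { n &= 31; return (x << n) | (x >>> ((32 - n) & 31)); }

void main()
{
    writefln("%08X", ror32(0x12345678, 8)); // 78123456
    writefln("%08X", rol32(0x12345678, 8)); // 34567812
}
```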
Jan 08 2017
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Monday, 9 January 2017 at 02:31:42 UTC, Chris M. wrote:
 [...]
Yes: make the whole inline asm block a mixin.
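(A sketch of that suggestion, untested and not from the thread: the entire asm block is built as one string, so the spliced-in mnemonic is parsed together with the rest of the block. It assumes DMD-style x86 inline asm, and that a plain `op2` operand works where the original used `op2[EBP]`.)

```d
// Build the whole asm block as a string; the iasm parser then sees the
// final instruction name, so no mixin is needed *inside* the block.
private void rot(string ins)(int* op1, int op2)
{
    int tmp = *op1;
    mixin("asm
    {
        mov EAX, tmp;
        mov ECX, op2;
        " ~ ins ~ " EAX, CL;
        mov tmp, EAX;
    }");
    *op1 = tmp;
}

void main()
{
    int v = 0x12345678;
    rot!"ror"(&v, 8); // rotate right by one byte
    assert(v == 0x78123456);
}
```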
Jan 08 2017
parent Chris M. <chrismohrfeld comcast.net> writes:
On Monday, 9 January 2017 at 02:38:01 UTC, Stefan Koch wrote:
 On Monday, 9 January 2017 at 02:31:42 UTC, Chris M. wrote:
 [...]
Yes: make the whole inline asm block a mixin.
Awesome, got it working. Thanks to both replies.
Jan 08 2017
prev sibling next sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
On Monday, 9 January 2017 at 02:31:42 UTC, Chris M. wrote:
 However, the inline assembler doesn't like me trying to do a 
 mixin.
yep. iasm is completely independent from the rest of the frontend; it has its own lexer, parser and so on. don't expect those things to work. the only way is to mixin the whole iasm block, including the `asm{}`.
Jan 08 2017
prev sibling next sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 9 January 2017 at 02:31:42 UTC, Chris M. wrote:
 [...]
 However, the inline assembler doesn't like me trying to do a 
 mixin. Is there a way around this?
You should be able to break it up, too:

asm { mov EAX, tmp; }
mixin("asm { " ~ ins ~ " EAX, CL; }");
asm { mov tmp, EAX; }

You get the idea. It should compile to the same thing.
Jan 08 2017
prev sibling parent reply Basile B. <b2.temp gmx.com> writes:
On Monday, 9 January 2017 at 02:31:42 UTC, Chris M. wrote:
 [...]
don't forget to flag the block as `asm pure nothrow {}`, otherwise it's slow.
Jan 10 2017
next sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
On Tuesday, 10 January 2017 at 10:41:54 UTC, Basile B. wrote:
 don't forget to flag

 asm pure nothrow {}

 otherwise it's slow.
Why?
Jan 10 2017
parent reply Basile B. <b2.temp gmx.com> writes:
On Tuesday, 10 January 2017 at 11:38:43 UTC, Guillaume Piolat 
wrote:
 On Tuesday, 10 January 2017 at 10:41:54 UTC, Basile B. wrote:
 don't forget to flag

 asm pure nothrow {}

 otherwise it's slow.
Why?
It's an empirical observation. In September I tried to work out why an inline asm function was slow; it turned out I hadn't marked the asm block as nothrow: https://forum.dlang.org/post/xznocpxtalpayvkrwxey forum.dlang.org I opened an issue asking for the specification to explain this clearly.
Jan 10 2017
next sibling parent Guillaume Piolat <first.last gmail.com> writes:
On Tuesday, 10 January 2017 at 13:13:17 UTC, Basile B. wrote:
 [...]
Interesting, thanks.
Jan 10 2017
prev sibling parent reply Chris M <chrismohrfeld comcast.net> writes:
On Tuesday, 10 January 2017 at 13:13:17 UTC, Basile B. wrote:
 [...]
Huh, that's really interesting, thanks for posting. I guess my other question would be: how do I determine whether a block of assembly is pure?

I also figured out moving *op1 directly into RAX; I guess it makes sense that a 64-bit pointer needs a 64-bit register :)

private void rot(string ins)(int *op1, int op2)
{
    mixin("asm
    {
        mov RAX, op1;
        mov ECX, op2[EBP];
        " ~ ins ~ " [RAX], CL;
    }");
}
Jan 10 2017
parent Basile B. <b2.temp gmx.com> writes:
On Wednesday, 11 January 2017 at 00:11:50 UTC, Chris M wrote:
 [...]
Huh, that's really interesting, thanks for posting. I guess my other question would be how do I determine if a block of assembly is pure?
The game changer for performance is just "nothrow".
Jan 10 2017
prev sibling parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Tuesday, 10 January 2017 at 10:41:54 UTC, Basile B. wrote:
 don't forget to flag

 asm pure nothrow {}

 otherwise it's slow.
This suddenly reminds me of some of the speedup assembly I was writing for wideint, but it seems I lost my code. Too bad; the 128-bit multiply had sped up, and the division still needed some work.
Jan 10 2017
parent reply Guillaume Piolat <first.last gmail.com> writes:
On Wednesday, 11 January 2017 at 06:14:35 UTC, Era Scarecrow 
wrote:
 This suddenly reminds me of some of the speedup assembly I was 
 writing for wideint, but it seems I lost my code. Too bad; the 
 128-bit multiply had sped up, and the division still needed 
 some work.
I'd gladly take it if you have an algorithm that reuses the 32-bit divide in wideint division instead of scanning bits :)
Jan 11 2017
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Wednesday, 11 January 2017 at 15:39:49 UTC, Guillaume Piolat 
wrote:
 [...]
I'd gladly take it if you have an algorithm that reuses the 32-bit divide in wideint division instead of scanning bits :)
I remember the divide was giving me some trouble. The idea was to use the built-in registers and the limits of the assembly to take advantage of the full 128-bit division; unfortunately, if the result is too large to fit in a 64-bit register, the divide faults rather than giving me half the result and letting me work with it. Still, I think I'll implement my own version, and if it's faster I'll submit it.
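(For what it's worth, the "reuse the 32-bit divide" approach asked about above can be sketched in portable D as classic short division: chain 64-by-32-bit divides from the most significant limb down. The helper name `divmodSmall` is hypothetical, not from the thread, and it only handles a single 32-bit divisor, not full wide-by-wide division.)

```d
// Short division: divide a little-endian array of 32-bit limbs by a single
// 32-bit divisor. Each step divides (remainder:limb) as a 64-bit value,
// so the quotient digit always fits in 32 bits and the hardware divide
// cannot overflow. Returns the remainder; the quotient replaces `limbs`.
uint divmodSmall(uint[] limbs, uint divisor)
{
    ulong rem = 0;
    foreach_reverse (ref limb; limbs) // most significant limb first
    {
        ulong cur = (rem << 32) | limb;
        limb = cast(uint)(cur / divisor);
        rem  = cur % divisor;
    }
    return cast(uint)rem;
}

void main()
{
    // 0x1_00000001 = 4294967297; divided by 3 -> 1431655765 remainder 2
    uint[] n = [1, 1]; // little-endian limbs
    uint r = divmodSmall(n, 3);
    assert(r == 2 && n[0] == 1431655765 && n[1] == 0);
}
```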
Jan 11 2017
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Wednesday, 11 January 2017 at 17:32:35 UTC, Era Scarecrow 
wrote:
  Still, I think I'll implement my own version, and if it's 
 faster I'll submit it.
Decided I'd give my hand at writing a 'ScaledInt', which is intended to basically allow any larger unsigned type. Coming across some assembly confusion. Using mixin with assembly, here's the 'result' of the mixin (as a final result):

alias UCent = ScaledInt!(uint, 4);

struct ScaledInt(I, int Size)
if (isUnsigned!(I) && Size > 1)
{
    I[Size] val;

    ScaledInt opBinary(string op)(const ScaledInt rhs) const
    if (op == "+")
    {
        ScaledInt t;
        asm pure nothrow
        {
            // mixin generated from another function, for simplicity
            mov EBX, this;
            clc;
            mov EAX, rhs[EBP+0];
            adc EAX, val[EBX+0];
            mov t[EBP+0], EAX;
            mov EAX, rhs[EBP+4];
            adc EAX, val[EBX+4];
            mov t[EBP+4], EAX;
            mov EAX, rhs[EBP+8];
            adc EAX, val[EBX+8];
            mov t[EBP+8], EAX;
            mov EAX, rhs[EBP+12];
            adc EAX, val[EBX+12];
            mov t[EBP+12], EAX;
        }
        return t;
    }
}

Raw disassembly for my asm code shows this:

mov EBX,-4[EBP]
clc
mov EAX,0Ch[EBP]
adc EAX,[EBX]
mov -014h[EBP],EAX
mov EAX,010h[EBP]
adc EAX,4[EBX]
mov -010h[EBP],EAX
mov EAX,014h[EBP]
adc EAX,8[EBX]
mov -0Ch[EBP],EAX
mov EAX,018h[EBP]
adc EAX,0Ch[EBX]
mov -8[EBP],EAX

From what I'm seeing, it should be 8, 0Ch, 10h, then 14h, all positive. I'm really scratching my head over why I'm having this issue... Doing an add of t.val[0] = val[0] + rhs.val[0]; I get this disassembly:

mov EDX,-4[EBP]  // mov EDX, this;
mov EBX,[EDX]    // val[0]
add EBX,0Ch[EBP] // + rhs.val[0]
mov ECX,8[EBP]   // mov ECX, ???[???]
mov [ECX],EBX    // t.val[0] =

If I do "mov ECX, t[EBP]", I get "mov ECX,-014h[EBP]". If I try to reference the exact variable val within t, it complains it isn't known at compile time (although it's a fixed location). What am I missing here?
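(Aside: the clc/adc chain in that asm is, in portable D, just limb-wise addition with carry propagation. This equivalent sketch is not the author's code; the helper name `addWide` is made up.)

```d
// Portable equivalent of the clc/adc carry chain: add two little-endian
// uint[4] limb arrays, carrying through a 64-bit accumulator. Each limb
// sum is at most 2^33 - 1, so `s >> 32` is exactly the carry bit.
uint[4] addWide(const uint[4] a, const uint[4] b)
{
    uint[4] r;
    ulong carry = 0;
    foreach (i; 0 .. 4)
    {
        ulong s = cast(ulong)a[i] + b[i] + carry;
        r[i] = cast(uint)s;   // low 32 bits of the limb sum
        carry = s >> 32;      // carry into the next limb
    }
    return r;
}

void main()
{
    uint[4] a = [0xFFFFFFFF, 0, 0, 0];
    uint[4] b = [1, 0, 0, 0];
    assert(addWide(a, b) == [0, 1, 0, 0]); // carry ripples into limb 1
}
```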
May 22 2017
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Tuesday, 23 May 2017 at 03:33:38 UTC, Era Scarecrow wrote:
 From what I'm seeing, it should be 8, 0ch, 10h, then 14h, all 
 positive. I'm really scratching my head why I'm having this 
 issue...

 What am i missing here?
More experiments, and I think it comes down to static arrays. The following function:

int[4] fun2()
{
    int[4] x = void;
    asm { mov dword ptr x, 100; }
    x[0] = 200; // get example of real offset
    return x;
}

produces the following (from obj2asm):

int[4] x.fun2() comdat
        assume CS:int[4] x.fun2()
        enter 014h,0
        mov -4[EBP],EAX
        mov dword ptr -014h[EBP],064h
        mov EAX,-4[EBP]
        mov dword ptr [EAX],0C8h // x[0] = 200, offset +0
        mov EAX,-4[EBP]
        leave
        ret
int[4] x.fun2() ends

So why is the offset off by 14h (20 bytes)? It's not like we need to set a ptr first. Go figure, I probably found a bug...
Jun 01 2017
parent Era Scarecrow <rtcvb32 yahoo.com> writes:
On Thursday, 1 June 2017 at 12:00:45 UTC, Era Scarecrow wrote:
  So why is the offset off by 14h (20 bytes)? It's not like we 
 need to set a ptr first.

  Go figure i probably found a bug...
Well, as a side note, a simple but unsatisfying workaround is making a new array slice of the memory and then using that pointer directly. Looking at the Intel opcodes and addressing modes, I could have used a very compact encoding with scaling. Instead I'm forced to ignore scaling, and I'm also forced to push/pop the flags to save the carry when advancing the two pointers in parallel. Plus there are 3 instructions that don't need to be there. Yeah, this is probably nitpicking... I can't help wanting everything to be as optimized and small as possible.
Jun 02 2017