
digitalmars.D.learn - LDC2 win64 calling convention

reply realhet <real_het hotmail.com> writes:
Hi,

Is there any documentation on the Win64 calling convention used 
by the LDC2 compiler?

So far I have tried to follow the Microsoft x64 calling convention, 
but I'm not sure that's the one I have to use. It doesn't seem to 
match exactly, because I think LDC2 uses the stack only.

https://en.wikipedia.org/wiki/X86_calling_conventions#Microsoft_x64_calling_convention

I'm asking for your help with the following things:

1. Are there register parameters? (I think no)
2. What are the volatile regs? RAX, RCX, RDX, XMM6..XMM15?
3. Is the stack pointer aligned to 16?
4. Is there a 32 byte shadow area on the stack?

Thank you!
Nov 28 2018
parent reply kinke <noone nowhere.com> writes:
On Wednesday, 28 November 2018 at 18:56:14 UTC, realhet wrote:
 1. Are there register parameters? (I think no)
Of course, e.g., POD structs of power-of-2 sizes <= 8 bytes and integral scalars as well as float/double/vectors. The stack isn't used at all, aggregates > 8 bytes are passed by ref (caller makes a copy on its stack and passes a pointer to it to the callee); that seems not to be mentioned at all in the Wiki article.
 2. What are the volatile regs? RAX, RCX, RDX, XMM6..XMM15?
See Microsoft's docs.
 3. Is the stack pointer aligned to 16?
It is IIRC.
 4. Is there a 32 byte shadow area on the stack?
Yes, IIRC.

LDC conforms to the regular Win64 ABI (incl. the __vectorcall extension for vectors). The biggest difference is that `extern(D)` (as opposed to `extern(C)` or `extern(C++)`) reverses the arguments: `foo(1, 2, 3, 4)` becomes `foo(4, 3, 2, 1)`, so it is the last 4 args, not the first 4, that are passed in registers if possible (incl. special cases wrt. struct-return + `this` pointers). Other than that, there are just a few special cases for delegates and dynamic arrays, which only apply to `extern(D)`.
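
The argument reversal described above can be illustrated with a minimal sketch. The function names `weightedC` and `weightedD` are hypothetical, and the register assignments in the comments simply restate what this thread says about the LDC version under discussion on Win64; the code itself does not verify them.

```d
// Two functions with identical bodies but different linkage.
// The names are made up for illustration only.

// extern(C) follows the plain Win64 ABI:
//   a -> RCX, b -> RDX, c -> R8, d -> R9
extern (C) long weightedC(long a, long b, long c, long d)
{
    return a + 2 * b + 3 * c + 4 * d;
}

// extern(D), as described in this thread, passes the arguments in
// reversed order, so the assumed assignment is:
//   d -> RCX, c -> RDX, b -> R8, a -> R9
extern (D) long weightedD(long a, long b, long c, long d)
{
    return a + 2 * b + 3 * c + 4 * d;
}

void main()
{
    // At the source level both behave identically; the difference
    // only shows up in the generated assembly.
    assert(weightedC(1, 2, 3, 4) == 30);
    assert(weightedD(1, 2, 3, 4) == 30);
}
```

Compiling this with `ldc2 -output-s` and comparing the two function bodies in the resulting assembly file should show which registers each parameter arrives in for the compiler version at hand.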
Nov 28 2018
parent reply kinke <noone nowhere.com> writes:
On Wednesday, 28 November 2018 at 20:17:53 UTC, kinke wrote:
 The stack isn't used at all
To prevent confusion: it's used of course, e.g., if there are more than 4 total parameters. Just not in the classical sense, i.e., a 16-byte struct isn't pushed directly onto the stack; instead, the caller makes the copy and passes a pointer, either in a register or on the stack.
Nov 28 2018
parent reply realhet <real_het hotmail.com> writes:
Thank You for the explanation!

But my tests show different results:

void* SSE_sobelRow(ubyte* src, ubyte* dst, size_t srcStride){ asm{
   push RDI;

   mov RAX, 0; mov RDX, 0; mov RCX, 0; //clear 'parameter' registers

   mov RAX, src;
   mov RDI, dst;

   //gen
   movups XMM0,[RAX];
   movaps XMM1,XMM0;
   pslldq XMM0,1;
   movaps XMM2,XMM1;
   psrldq XMM1,1;
   pavgb XMM1,XMM0;
   pavgb XMM1,XMM2;
   movups [RDI],XMM1;
   //gen end

   pop RDI;
}}

When I clear those volatile regs that are used for register 
calling, I'm still able to get good results.
However when I put "mov [RBP+8], 0" into the code it generates an 
access violation, so this is why I think parameters are on the 
stack.

What I'm really unsure about is which registers I HAVE TO save in 
my asm routine.
Currently I think I'm only allowed to trash the contents of RAX, 
RCX, RDX, XMM0..XMM5, based on the Microsoft calling model. But I'm 
not sure what the actual case is with LDC2 Win64.

If my code is surrounded by the SSE optimizations of the LDC2 
compiler and I can't satisfy their requirements, I will get 
random errors in the future. I'd better avoid those.

On the 32-bit target the rule is simple: you can do what you want 
with all the XMM regs and a, c, d. Now at 64-bit I'm quite unsure. 
:S
Nov 28 2018
parent reply kinke <noone nowhere.com> writes:
You're not using naked asm; this entails a prologue (spilling the 
params to stack etc.). Additionally, LDC doesn't really like 
accessing params and locals in DMD-style inline asm, see 
https://github.com/ldc-developers/ldc/issues/2854.

You can check the final asm trivially online, e.g., 
https://run.dlang.io/is/e0c2Ly (click the ASM button). You'll see 
that your params are in R8, RDX and RCX (reversed order as 
mentioned earlier).
Nov 28 2018
parent reply realhet <real_het hotmail.com> writes:
On Wednesday, 28 November 2018 at 21:58:16 UTC, kinke wrote:
 You're not using naked asm; this entails a prologue (spilling 
 the params to stack etc.). Additionally, LDC doesn't really 
 like accessing params and locals in DMD-style inline asm, see 
 https://github.com/ldc-developers/ldc/issues/2854.

 You can check the final asm trivially online, e.g., 
 https://run.dlang.io/is/e0c2Ly (click the ASM button). You'll 
 see that your params are in R8, RDX and RCX (reversed order as 
 mentioned earlier).
Hi again.

I just tried a new debugger: x64dbg. I really like it; it is not the bloatware I got used to nowadays.

It turns out that LDC2's parameter/register handling is really clever:

- Register saving/restoring: fully automatic. It analyzes my asm and saves/restores only those regs I overwrite.
- Parameters: reversed Microsoft x64 calling convention, just as you said. Parameters in the registers will be 'spilled' onto the stack no matter whether I'm using them by their names or by the register. Maybe this is not too clever, but as I can use the params by their name from anywhere, it can make my code nicer.
- Must not use the "ret" instruction, because it will be taken literally and will skip the auto-generated exit code.

In conclusion: maybe LDC2 generates a lot of extra code, but I always make longer asm routines, so it's not a problem for me at all, while it helps me a lot.
Nov 29 2018
parent Johan Engelen <j j.nl> writes:
On Thursday, 29 November 2018 at 15:10:41 UTC, realhet wrote:
 In conclusion: Maybe LDC2 generates a lot of extra code, but I 
 always make longer asm routines, so it's not a problem for me 
 at all while it helps me a lot.
An extra note: I recommend you look into using `ldc.llvmasm.__asm` to write inline assembly. Some advantages: no worrying about calling conventions (portability), and you'll have more instructions available.

If you care about performance, usually you should _not_ write assembly, but for the 1% of other cases: the compiler also understands your asm much better if you use __asm.

LDC's __asm syntax is very similar (if not the same) to what GDC uses for inline assembly.

-Johan
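
A minimal sketch of what an `ldc.llvmasm.__asm` call can look like, assuming LDC on an x86-64 target. The helper name `addInts` is hypothetical; the assembly string uses LLVM's default AT&T operand order, and the constraint string follows LLVM's inline-asm constraint syntax rather than anything specific to this thread.

```d
import ldc.llvmasm; // LDC-only module; this sketch will not build with DMD or GDC

// Hypothetical helper: adds two ints with a single x86 `addl`.
int addInts(int a, int b)
{
    // Constraint string, operand by operand:
    //   "=r"       $0: the result, in any general-purpose register
    //   "0"        $1: first input (a), tied to the same register as $0
    //   "r"        $2: second input (b), in any general-purpose register
    //   "~{flags}" the instruction clobbers the CPU flags
    // The compiler picks the registers itself, so no assumptions about
    // the calling convention are needed here.
    return __asm!int("addl $2, $0", "=r,0,r,~{flags}", a, b);
}

void main()
{
    assert(addInts(40, 2) == 42);
}
```

Because the inputs and outputs are declared through constraints, the compiler handles getting the parameters into registers and saving anything that needs saving, which is the portability advantage mentioned above.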
Dec 01 2018