digitalmars.D.learn - Using YMM registers causes an undefined label error

reply z <z z.com> writes:
XMM registers work, but as soon as they are changed into YMM DMD 
outputs "bad type/size of operands %s" and LDC outputs an "label 
YMM0 is undefined" error. Are they not supported?
To illustrate: https://run.dlang.io/is/IqDHlK

By the way, how can I use instructions that are not listed in 
[1] (vfmaddxxxps, for example)? And how are function parameters 
accessed if they are not on the stack? (Looking at my own code in 
a debugger, I see that the majority of pointer parameters are 
already in registers rather than on the stack.)
I need those so that i can write a better answer for [2].

Big thanks
[1] https://dlang.org/spec/iasm.html#supported_opcodes
[2] 
https://forum.dlang.org/thread/qyybpvwvbfkhlvulvuxa forum.dlang.org
Mar 05 2021
next sibling parent reply Rumbu <rumbu rumbu.ro> writes:
On Friday, 5 March 2021 at 12:57:43 UTC, z wrote:
 XMM registers work, but as soon as they are changed into YMM 
 DMD outputs "bad type/size of operands %s" and LDC outputs an 
 "label YMM0 is undefined" error. Are they not supported?
 To illustrate: https://run.dlang.io/is/IqDHlK

 By the way, how can i use instructions that are not listed in 
 [1]?(vfmaddxxxps for example) And how are function parameters 
 accessed if they are not on the stack?(looking up my own code 
 in a debugger, i see that the majority of pointer parameters 
 are already in registers rather than being on the stack.)
 I need those so that i can write a better answer for [2].

 Big thanks
 [1] https://dlang.org/spec/iasm.html#supported_opcodes
 [2] 
 https://forum.dlang.org/thread/qyybpvwvbfkhlvulvuxa forum.dlang.org
First of all, in the 64-bit ABI the first few parameters are passed in registers, not on the stack, therefore a[RBP] is nonsense.

void complement32(simdbytes* a, simdbytes* b)

a is in RCX, b is in RDX on Windows
a is in RDI, b is in RSI on Linux

Secondly, there is no such thing as movaps YMMX, [RAX], but vmovaps YMM3, [RAX]. Same for vxorps, but there are 3 operands, not 2.
Mar 05 2021
parent reply z <z z.com> writes:
On Friday, 5 March 2021 at 16:10:02 UTC, Rumbu wrote:
 First of all, in 64 bit ABI, parameters are not passed on 
 stack, therefore a[RBP] is a nonsense.

 void complement32(simdbytes* a, simdbytes* b)

 a is in RCX, b is in RDX on Windows
 a is in RDI, b is in RSI on Linux
I'm confused. With your help I've been able to find the function calling convention, but in LDC-generated code I sometimes see the layout being reversed (the function I was looking at is a 7-argument function, all pointers; the first argument is on the stack, the seventh and last is in RCX), and the offsets don't seem to make sense either (first argument at ss:[rsp+38], second at ss:[rsp+30], and third at ss:[rsp+28]).
 Secondly, there is no such thing as movaps YMMX, [RAX], but 
 vmovaps YMM3, [RAX]
 Same for vxorps, but there are 3 operands, not 2.
You're absolutely right, but apparently it only accepts the two-operand version from SSE. Other AVX/AVX2/AVX512 instructions that have «v» prefixed aren't recognized either("Error: unknown opcode vmovaps"), is AVX(2) with YMM registers supported for «asm{}» statements?
Mar 05 2021
parent reply Rumbu <rumbu rumbu.ro> writes:
On Friday, 5 March 2021 at 21:47:49 UTC, z wrote:
 On Friday, 5 March 2021 at 16:10:02 UTC, Rumbu wrote:
 First of all, in 64 bit ABI, parameters are not passed on 
 stack, therefore a[RBP] is a nonsense.

 void complement32(simdbytes* a, simdbytes* b)

 a is in RCX, b is in RDX on Windows
 a is in RDI, b is in RSI on Linux
I'm confused. With your help I've been able to find the function calling convention, but in LDC-generated code I sometimes see the layout being reversed (the function I was looking at is a 7-argument function, all pointers; the first argument is on the stack, the seventh and last is in RCX), and the offsets don't seem to make sense either (first argument at ss:[rsp+38], second at ss:[rsp+30], and third at ss:[rsp+28]).
 Secondly, there is no such thing as movaps YMMX, [RAX], but 
 vmovaps YMM3, [RAX]
 Same for vxorps, but there are 3 operands, not 2.
You're absolutely right, but apparently it only accepts the two-operand version from SSE. Other AVX/AVX2/AVX512 instructions that have «v» prefixed aren't recognized either("Error: unknown opcode vmovaps"), is AVX(2) with YMM registers supported for «asm{}» statements?
I just made some tests; it seems that D has invented its own calling convention, and it's not documented. If you decorate your function with extern(C), it should respect the x86-64 ABI conventions.

This is what I got for a 7-parameter function. The two compilers seem to do the same thing:

param no.   extern(C)   extern(D)
1           RCX         RSP + 56
2           RDX         RSP + 48
3           R8          RSP + 40
4           R9          R9
5           RSP + 40    R8
6           RSP + 48    RDX
7           RSP + 56    RCX

I would stick to extern(C); the extern(D) convention seems completely illogical: they push the first 3 parameters on the stack from left to right, but if there are fewer than 4, they use register transfer. WTF.

Note: tested on Windows; on Linux both conventions will probably use the Linux ABI's conventional registers and will not reserve 32 bytes on the stack.

Now, on the other side, it seems that LDC is one step behind DMD because - you are right - it doesn't support AVX-2 instructions operating on ymm registers.
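To make the extern(C) advice concrete, here is a minimal sketch of a naked DMD-style asm function that reads its arguments from the registers listed above. The function name addTwo is made up for illustration, and the register choice assumes the platform C ABI really is followed for extern(C) (RCX/RDX on Win64, RDI/RSI on System V targets such as Linux); treat it as an untested-on-your-setup illustration rather than a reference.

```d
// Hypothetical example: addTwo is not from the thread.
version (D_InlineAsm_X86_64) {}
else static assert(0, "this sketch needs DMD-style x86-64 inline asm");

// extern(C) pins the calling convention to the platform's C ABI,
// so the argument registers are predictable.
extern(C) ulong addTwo(ulong a, ulong b)
{
    version (Win64)
    {
        // Microsoft x64 ABI: a in RCX, b in RDX, result in RAX.
        asm { naked; mov RAX, RCX; add RAX, RDX; ret; }
    }
    else
    {
        // System V AMD64 ABI: a in RDI, b in RSI, result in RAX.
        asm { naked; mov RAX, RDI; add RAX, RSI; ret; }
    }
}

void main()
{
    assert(addTwo(2, 3) == 5);
}
```

With extern(D) instead of extern(C), the same body would read the wrong registers if the parameter order is reversed as in the table, which is the reason for pinning the convention here.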
Mar 06 2021
parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Saturday, 6 March 2021 at 10:45:08 UTC, Rumbu wrote:
 On Friday, 5 March 2021 at 21:47:49 UTC, z wrote:
 [...]
I just made some tests, it seems that D has invented his own calling convention. And it's not documented. If you decorate your function with extern(C) it should respect the x86-64 ABI conventions. This is what I got for a 7 parameters function. The two compilers seems to do the same thing: [...]
What... Is this really how it's supposed to be? Makes no sense to not use any of the existing conventions.
Mar 06 2021
parent reply Mike Parker <aldacron gmail.com> writes:
On Saturday, 6 March 2021 at 11:57:13 UTC, Imperatorn wrote:

 What... Is this really how it's supposed to be? Makes no sense 
 to not use any of the existing conventions.
extern(C) and extern(D) are both documented to be the same as the platform's C calling convention everywhere except x86 windows: https://dlang.org/spec/abi.html#function_calling_conventions There have been times when differences were noted (I recall a particularly bad one related to passing structs by value on 64-bit linux) and there may be more. When they are, they should be reported in Bugzilla.
Mar 06 2021
next sibling parent reply kinke <noone nowhere.com> writes:
On Saturday, 6 March 2021 at 12:15:43 UTC, Mike Parker wrote:
 On Saturday, 6 March 2021 at 11:57:13 UTC, Imperatorn wrote:

 What... Is this really how it's supposed to be? Makes no sense 
 to not use any of the existing conventions.
extern(C) and extern(D) are both documented to be the same as the platform's C calling convention everywhere except x86 windows: https://dlang.org/spec/abi.html#function_calling_conventions There have been times when differences were noted (I recall a particularly bad one related to passing structs by value on 64-bit linux) and there may be more. When they are, they should be reported in Bugzilla.
The main difference is that the params are reversed for extern(D), at least with DMD and LDC, not with GDC. And that can't be easily changed because of all the naked DMD-style inline asm code (GDC doesn't support that, so no problem for GDC). This comes up regularly here in this forum whenever people experiment with DMD-style asm. There are other slight breakages of that 'spec', e.g., LDC's extern(D) ABI is very similar to Microsoft's __vectorcall (so that e.g. vectors are passed in registers).
Mar 06 2021
parent kinke <noone nowhere.com> writes:
On Saturday, 6 March 2021 at 12:29:07 UTC, kinke wrote:
 There are other slight breakages of that 'spec', e.g., LDC's 
 extern(D) ABI is very similar to Microsoft's __vectorcall (so 
 that e.g. vectors are passed in registers).
[Windows only, to prevent any more confusion.]
Mar 06 2021
prev sibling parent reply Rumbu <rumbu rumbu.ro> writes:
On Saturday, 6 March 2021 at 12:15:43 UTC, Mike Parker wrote:
 On Saturday, 6 March 2021 at 11:57:13 UTC, Imperatorn wrote:

 What... Is this really how it's supposed to be? Makes no sense 
 to not use any of the existing conventions.
extern(C) and extern(D) are both documented to be the same as the platform's C calling convention everywhere except x86 windows: https://dlang.org/spec/abi.html#function_calling_conventions There have been times when differences were noted (I recall a particularly bad one related to passing structs by value on 64-bit linux) and there may be more. When they are, they should be reported in Bugzilla.
Where exactly is the extern(D) x86-64 calling convention documented? Because currently it seems like a mess according to the disassembly. The first X parameters go on the stack from left to right, the last 4 in registers. But wait, if you have fewer than 4 parameters, they are passed in registers. Again, WTF?
Mar 06 2021
parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Saturday, 6 March 2021 at 15:40:56 UTC, Rumbu wrote:
 On Saturday, 6 March 2021 at 12:15:43 UTC, Mike Parker wrote:
 [...]
Where exactly is the extern(D) x86-64 calling convention documented? Because currently it seems like a mess according to the disassembly. The first X parameters go on the stack from left to right, the last 4 in registers. But wait, if you have fewer than 4 parameters, they are passed in registers. Again, WTF?
Reading this, I'm experiencing true fear for the first time in my life.
Mar 06 2021
parent Guillaume Piolat <first.name spam.org> writes:
On Saturday, 6 March 2021 at 16:09:03 UTC, Imperatorn wrote:
 On Saturday, 6 March 2021 at 15:40:56 UTC, Rumbu wrote:
 On Saturday, 6 March 2021 at 12:15:43 UTC, Mike Parker wrote:
 [...]
Where exactly is the extern(D) x86-64 calling convention documented? Because currently it seems like a mess according to the disassembly. The first X parameters go on the stack from left to right, the last 4 in registers. But wait, if you have fewer than 4 parameters, they are passed in registers. Again, WTF?
Reading this, I'm experiencing true fear for the first time in my life.
I'm also learning that extern(D) is different across compilers in some cases, but it isn't that bad. The preferred ABI boundary across executables is extern(C). If you deal with static libraries, then they are likely built with the same compiler too. When LDC changes the extern(D) ABI, it is rightfully a minor change, as everything will get rebuilt. https://github.com/ldc-developers/ldc/releases/tag/v1.25.0 Besides, such changes are there for efficiency :)
Mar 06 2021
prev sibling next sibling parent kinke <noone nowhere.com> writes:
On Friday, 5 March 2021 at 12:57:43 UTC, z wrote:
 XMM registers work, but as soon as they are changed into YMM 
 DMD outputs "bad type/size of operands %s" and LDC outputs an 
 "label YMM0 is undefined" error. Are they not supported?
 To illustrate: https://run.dlang.io/is/IqDHlK
LDC's support for DMD-style inline asm is limited; GDC-style inline asm is the preferred way (e.g., not restricted to x86[_64] and no need to worry about calling convention details). Your example can be reduced to a trivial:

import core.simd;

ubyte32 complement32(ubyte32 a, ubyte32 b)
{
    return a ^ b;
}

which yields the following asm with `ldc2 -mattr=avx -O` (see https://d.godbolt.org/z/ex7YE7):

_D7example12complement32FNhG32hQgZQj:
	vxorps	ymm0, ymm1, ymm0
	ret
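For anyone who wants to try the reduced function, a small self-contained sketch with a check might look like the following. The main function and the test values are my own additions, and it assumes the compiler exposes the 256-bit ubyte32 type (e.g. LDC with `-mattr=avx`, or DMD with `-mcpu=avx`); without AVX enabled the alias may not exist.

```d
import core.simd;

// Same function as above: the compiler picks the vector instruction,
// so no inline asm and no calling-convention details are involved.
ubyte32 complement32(ubyte32 a, ubyte32 b)
{
    return a ^ b;
}

void main()
{
    ubyte32 a = cast(ubyte) 2;     // scalar is broadcast to all 32 lanes
    ubyte32 b = cast(ubyte) 0xFF;  // all bits set in every lane
    ubyte32 r = complement32(a, b);

    // 0x02 ^ 0xFF == 0xFD in every lane.
    foreach (x; r.array)
        assert(x == 0xFD);
}
```

XOR with an all-ones vector gives the bitwise complement, which matches what the original inline-asm version was trying to do.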
Mar 06 2021
prev sibling parent reply z <z z.com> writes:
On Friday, 5 March 2021 at 12:57:43 UTC, z wrote:
 ...
Then it seems the only way to get AVX-compatible inline assembly(ldc.llvmasm excluded) is to use an external assembler. For example :
import std.stdio;
extern(C) void vxorps_d(ubyte[32]*);

void main() {
	ubyte[32] a = 2;
	writefln!"Contents of a before : %( %s %)"(a);
	vxorps_d(&a);
	writefln!"Contents of a after  : %( %s %)"(a);
}

BITS 64

global vxorps_d

section .text2
vxorps_d:
	vmovups ymm0, [rcx]
	mov rdx, zerofilled
	vbroadcastss ymm1, [rdx]
	vxorps ymm0, ymm0, ymm1
	vmovups [rcx], ymm0
	ret
zerofilled:
db 0xFF,0xFF,0xFF,0xFF

nasm -g -f win64 asmfile.asm
dmd vxorpstest.d asmfile.obj -m64
ldc vxorpstest.d asmfile.obj -m64
vxorpstest.exe

Contents of a before :  2  2  2... (0x02/0b0000_0010)
Contents of a after  :  253  253  253...(0xFD/0b1111_1101)
Mar 09 2021
parent reply z <z z.com> writes:
On Tuesday, 9 March 2021 at 20:23:48 UTC, z wrote:
 On Friday, 5 March 2021 at 12:57:43 UTC, z wrote:
 ...
Then it seems the only way to get AVX-compatible inline assembly(ldc.llvmasm excluded) is to use an external assembler. For example :
...
But i'm not really sure how to integrate that into a dub project, it seems «lflags "filename.obj"» and preGenerateCommands/preBuildCommands would work but i haven't tested that.(«dflags "filename.obj"» doesn't work for sure)
Mar 09 2021
parent z <z z.com> writes:
On Tuesday, 9 March 2021 at 20:33:01 UTC, z wrote:
 On Tuesday, 9 March 2021 at 20:23:48 UTC, z wrote:
 On Friday, 5 March 2021 at 12:57:43 UTC, z wrote:
 ...
Then it seems the only way to get AVX-compatible inline assembly(ldc.llvmasm excluded) is to use an external assembler. For example :
...
But i'm not really sure how to integrate that into a dub project, it seems «lflags "filename.obj"» and preGenerateCommands/preBuildCommands would work but i haven't tested that.(«dflags "filename.obj"» doesn't work for sure)
In dub.sdl :
lflags "source/asmfunctions.obj"
preGenerateCommands " cd source && *command or bat/sh file that 
builds the asm object file(s)*"
It works, but if the package is imported by another one it will fail, because the way lflags works means that the linker will look for source/asmfunctions.obj relative to the importer's working directory. This can be circumvented with relative paths (if possible).
lflags "../importedpackagesname/source/asmfunctions.obj"
Mar 19 2021