digitalmars.D.learn - Using YMM registers causes an undefined label error
- z (14/14) Mar 05 2021 XMM registers work, but as soon as they are changed into YMM DMD
- Rumbu (9/23) Mar 05 2021 First of all, in 64 bit ABI, parameters are not passed on stack,
- z (13/21) Mar 05 2021 I'm confused, with your help i've been able to find the function
- Rumbu (24/47) Mar 06 2021 I just made some tests, it seems that D has invented his own
- Imperatorn (3/11) Mar 06 2021 What... Is this really how it's supposed to be? Makes no sense to
- Mike Parker (8/10) Mar 06 2021 extern(C) and extern(D) are both documented to be the same as the
- kinke (10/21) Mar 06 2021 The main difference is that the params are reversed for
- kinke (2/5) Mar 06 2021 [Windows only, to prevent any more confusion.]
- Rumbu (6/17) Mar 06 2021 Where exactly is documented the extern(D) x86-64 calling
- Imperatorn (3/10) Mar 06 2021 Reading this, I'm experiencing true fear for the first time in my
- Guillaume Piolat (10/21) Mar 06 2021 I'm also learning that extern(D) is different across compilers in
- kinke (15/19) Mar 06 2021 LDC's support for DMD-style inline asm is limited; GDC-style
- z (4/31) Mar 09 2021 Then it seems the only way to get AVX-compatible inline
XMM registers work, but as soon as they are changed to YMM, DMD outputs "bad type/size of operands %s" and LDC outputs a "label YMM0 is undefined" error. Are they not supported?
To illustrate: https://run.dlang.io/is/IqDHlK

By the way, how can I use instructions that are not listed in [1] (vfmaddxxxps, for example)?
And how are function parameters accessed if they are not on the stack? (Looking at my own code in a debugger, I see that the majority of pointer parameters are already in registers rather than on the stack.)
I need those so that I can write a better answer for [2].

Big thanks

[1] https://dlang.org/spec/iasm.html#supported_opcodes
[2] https://forum.dlang.org/thread/qyybpvwvbfkhlvulvuxa forum.dlang.org
Mar 05 2021
On Friday, 5 March 2021 at 12:57:43 UTC, z wrote:
> XMM registers work, but as soon as they are changed to YMM, DMD outputs "bad type/size of operands %s" and LDC outputs a "label YMM0 is undefined" error. Are they not supported?
> [...]
> And how are function parameters accessed if they are not on the stack?
> [...]

First of all, in the 64-bit ABI, parameters are not passed on the stack, therefore a[RBP] is nonsense.

    void complement32(simdbytes* a, simdbytes* b)

a is in RCX, b is in RDX on Windows
a is in RDI, b is in RSI on Linux

Secondly, there is no such thing as movaps YMMX, [RAX]; the instruction is vmovaps YMM3, [RAX].
The same goes for vxorps, but it takes 3 operands, not 2.
Mar 05 2021
On Friday, 5 March 2021 at 16:10:02 UTC, Rumbu wrote:
> First of all, in the 64-bit ABI, parameters are not passed on the stack, therefore a[RBP] is nonsense.
> void complement32(simdbytes* a, simdbytes* b)
> a is in RCX, b is in RDX on Windows
> a is in RDI, b is in RSI on Linux

I'm confused. With your help I've been able to find the function calling convention, but in LDC-generated code I sometimes see the layout being reversed (the function I was looking at takes 7 arguments, all of them pointers; the first argument is on the stack, the seventh and last is in RCX), and the offsets don't seem to make sense either (first argument at ss:[rsp+38], second at ss:[rsp+30], and third at ss:[rsp+28]).

> Secondly, there is no such thing as movaps YMMX, [RAX]; the instruction is vmovaps YMM3, [RAX].
> The same goes for vxorps, but it takes 3 operands, not 2.

You're absolutely right, but apparently it only accepts the two-operand version from SSE.
Other AVX/AVX2/AVX512 instructions that have the «v» prefix aren't recognized either ("Error: unknown opcode vmovaps"). Is AVX(2) with YMM registers supported for «asm{}» statements?
Mar 05 2021
On Friday, 5 March 2021 at 21:47:49 UTC, z wrote:
> On Friday, 5 March 2021 at 16:10:02 UTC, Rumbu wrote:
>> First of all, in the 64-bit ABI, parameters are not passed on the stack, therefore a[RBP] is nonsense.
>> [...]
> I'm confused. With your help I've been able to find the function calling convention, but in LDC-generated code I sometimes see the layout being reversed [...]
>> Secondly, there is no such thing as movaps YMMX, [RAX]; the instruction is vmovaps YMM3, [RAX].
>> [...]
> You're absolutely right, but apparently it only accepts the two-operand version from SSE.
> Other AVX/AVX2/AVX512 instructions that have the «v» prefix aren't recognized either ("Error: unknown opcode vmovaps"). Is AVX(2) with YMM registers supported for «asm{}» statements?

I just made some tests; it seems that D has invented its own calling convention. And it's not documented.

If you decorate your function with extern(C), it should respect the x86-64 ABI conventions.

This is what I got for a 7-parameter function. The two compilers seem to do the same thing:

    param no.   extern(C)   extern(D)
    1           RCX         RSP + 56
    2           RDX         RSP + 48
    3           R8          RSP + 40
    4           R9          R9
    5           RSP + 40    R8
    6           RSP + 48    RDX
    7           RSP + 56    RCX

I would stick to extern(C). The extern(D) convention seems completely illogical: the first 3 parameters are pushed on the stack from left to right, but if there are fewer than 4 parameters, they are all passed in registers. WTF.

Note: tested on Windows. On Linux, both conventions will probably use the conventional Linux ABI registers and will not reserve 32 bytes on the stack.

Now, on the other side, it seems that LDC is one step behind DMD because - you are right - it doesn't support AVX-2 instructions operating on ymm registers.
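
The register assignments reported for extern(C) can be relied on in naked DMD-style inline asm roughly as follows. This is only a minimal sketch, assuming an x86-64 target and a compiler that accepts DMD-style asm; `secondArg` is a hypothetical name that does not come from the thread, and the register choices are the ones documented by the Win64 and System V ABIs (second integer argument in RDX and RSI respectively, return value in RAX).

```d
import std.stdio : writeln;

version (D_InlineAsm_X86_64) {}
else static assert(0, "this sketch assumes x86-64 DMD-style inline asm");

// Hypothetical helper: returns its second argument.
// extern(C) pins the platform C calling convention, so the register
// holding `b` is known without relying on the extern(D) layout.
extern(C) ulong secondArg(ulong a, ulong b)
{
    version (Windows)
        asm { naked; mov RAX, RDX; ret; } // Win64: 2nd integer arg in RDX
    else
        asm { naked; mov RAX, RSI; ret; } // System V: 2nd integer arg in RSI
}

void main()
{
    assert(secondArg(1, 2) == 2);
    writeln(secondArg(1, 2));
}
```

With extern(D) instead of extern(C), the same body would read a different parameter on DMD and LDC, given the reversed ordering shown in the table.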
Mar 06 2021
On Saturday, 6 March 2021 at 10:45:08 UTC, Rumbu wrote:
> On Friday, 5 March 2021 at 21:47:49 UTC, z wrote:
>> [...]
> I just made some tests; it seems that D has invented its own calling convention. And it's not documented. If you decorate your function with extern(C), it should respect the x86-64 ABI conventions. This is what I got for a 7-parameter function. The two compilers seem to do the same thing: [...]

What... Is this really how it's supposed to be? It makes no sense not to use any of the existing conventions.
Mar 06 2021
On Saturday, 6 March 2021 at 11:57:13 UTC, Imperatorn wrote:
> What... Is this really how it's supposed to be? It makes no sense not to use any of the existing conventions.

extern(C) and extern(D) are both documented to be the same as the platform's C calling convention everywhere except x86 Windows:

https://dlang.org/spec/abi.html#function_calling_conventions

There have been times when differences were noted (I recall a particularly bad one related to passing structs by value on 64-bit Linux) and there may be more. When they are found, they should be reported in Bugzilla.
Mar 06 2021
On Saturday, 6 March 2021 at 12:15:43 UTC, Mike Parker wrote:
> On Saturday, 6 March 2021 at 11:57:13 UTC, Imperatorn wrote:
>> What... Is this really how it's supposed to be? It makes no sense not to use any of the existing conventions.
> extern(C) and extern(D) are both documented to be the same as the platform's C calling convention everywhere except x86 Windows:
> https://dlang.org/spec/abi.html#function_calling_conventions
> [...]

The main difference is that the params are reversed for extern(D), at least with DMD and LDC, not with GDC. And that can't be easily changed because of all the naked DMD-style inline asm code (GDC doesn't support that, so no problem for GDC). This comes up regularly in this forum whenever people experiment with DMD-style asm.

There are other slight breakages of that 'spec', e.g., LDC's extern(D) ABI is very similar to Microsoft's __vectorcall (so that e.g. vectors are passed in registers).
Mar 06 2021
On Saturday, 6 March 2021 at 12:29:07 UTC, kinke wrote:
> There are other slight breakages of that 'spec', e.g., LDC's extern(D) ABI is very similar to Microsoft's __vectorcall (so that e.g. vectors are passed in registers).

[Windows only, to prevent any more confusion.]
Mar 06 2021
On Saturday, 6 March 2021 at 12:15:43 UTC, Mike Parker wrote:
> On Saturday, 6 March 2021 at 11:57:13 UTC, Imperatorn wrote:
>> What... Is this really how it's supposed to be? It makes no sense not to use any of the existing conventions.
> extern(C) and extern(D) are both documented to be the same as the platform's C calling convention everywhere except x86 Windows:
> https://dlang.org/spec/abi.html#function_calling_conventions
> [...]

Where exactly is the extern(D) x86-64 calling convention documented? Because currently it seems like a mess, according to the disassembly: the first X parameters go on the stack from left to right, the last 4 in registers. But wait, if you have fewer than 4 parameters, they are passed in registers. Again, WTF?
Mar 06 2021
On Saturday, 6 March 2021 at 15:40:56 UTC, Rumbu wrote:
> On Saturday, 6 March 2021 at 12:15:43 UTC, Mike Parker wrote:
>> [...]
> Where exactly is the extern(D) x86-64 calling convention documented? Because currently it seems like a mess, according to the disassembly: the first X parameters go on the stack from left to right, the last 4 in registers. But wait, if you have fewer than 4 parameters, they are passed in registers. Again, WTF?

Reading this, I'm experiencing true fear for the first time in my life.
Mar 06 2021
On Saturday, 6 March 2021 at 16:09:03 UTC, Imperatorn wrote:
> On Saturday, 6 March 2021 at 15:40:56 UTC, Rumbu wrote:
>> On Saturday, 6 March 2021 at 12:15:43 UTC, Mike Parker wrote:
>>> [...]
>> Where exactly is the extern(D) x86-64 calling convention documented? [...]
> Reading this, I'm experiencing true fear for the first time in my life.

I'm also learning that extern(D) is different across compilers in some cases, but it isn't that bad.

The preferred ABI boundary across executables is extern(C). If you deal with static libraries, then they are likely built with the same compiler too.

When LDC changes the extern(D) ABI, it is rightfully a minor change, as everything will get rebuilt.

https://github.com/ldc-developers/ldc/releases/tag/v1.25.0

Besides, such changes are there for efficiency :)
Mar 06 2021
On Friday, 5 March 2021 at 12:57:43 UTC, z wrote:
> XMM registers work, but as soon as they are changed to YMM, DMD outputs "bad type/size of operands %s" and LDC outputs a "label YMM0 is undefined" error. Are they not supported?
> To illustrate: https://run.dlang.io/is/IqDHlK

LDC's support for DMD-style inline asm is limited; GDC-style inline asm is the preferred way (e.g., not restricted to x86[_64] and no need to worry about calling convention details).

Your example can be reduced to a trivial:

    import core.simd;

    ubyte32 complement32(ubyte32 a, ubyte32 b)
    {
        return a ^ b;
    }

which yields the following asm with `ldc2 -mattr=avx -O` (see https://d.godbolt.org/z/ex7YE7):

    _D7example12complement32FNhG32hQgZQj:
        vxorps ymm0, ymm1, ymm0
        ret
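
As a follow-up to the core.simd suggestion, here is a small sketch of the bit-complement case (XOR with all ones) written without any inline asm. The name `complementAll` is hypothetical and not from the thread; the sketch assumes a build where `ubyte32` is available, e.g. `ldc2 -mattr=avx` as used above, or `dmd -mcpu=avx`.

```d
import core.simd;
import std.stdio : writeln;

// Hypothetical helper: flips every bit of a 256-bit vector.
ubyte32 complementAll(ubyte32 v)
{
    ubyte32 ones = 0xFF;   // scalar initializer is broadcast to all 32 lanes
    return v ^ ones;       // expected to lower to a single AVX xor
}

void main()
{
    ubyte32 v = 2;
    ubyte32 r = complementAll(v);

    // 0x02 ^ 0xFF == 0xFD in every lane
    foreach (e; r.array)
        assert(e == 0xFD);

    writeln(r.array[0]);
}
```

Since the compiler chooses the registers and the instruction, neither the calling convention nor the inline assembler's opcode table matters here.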
Mar 06 2021
On Friday, 5 March 2021 at 12:57:43 UTC, z wrote:
> ...

Then it seems the only way to get AVX-compatible inline assembly (ldc.llvmasm excluded) is to use an external assembler.

For example:

    import std.stdio;
    extern(C) void vxorps_d(ubyte[32]*);

    void main() {
        ubyte[32] a = 2;
        writefln!"Contents of a before : %( %s %)"(a);
        vxorps_d(&a);
        writefln!"Contents of a after  : %( %s %)"(a);
    }

    BITS 64
    global vxorps_d
    section .text2
    vxorps_d:
        vmovups ymm0, [rcx];
        mov rdx, zerofilled
        vbroadcastss ymm1, [rdx]
        vxorps ymm0, ymm0, ymm1
        vmovups [rcx], ymm0
        ret
    zerofilled:
        db 0xFF,0xFF,0xFF,0xFF

    nasm -g -f win64 asmfile.asm

    dmd vxorpstest.d asmfile.obj -m64
    ldc vxorpstest.d asmfile.obj -m64

    vxorpstest.exe
    Contents of a before : 2 2 2... (0x02/0b0000_0010)
    Contents of a after  : 253 253 253...(0xFD/0b1111_1101)
Mar 09 2021
On Tuesday, 9 March 2021 at 20:23:48 UTC, z wrote:
> On Friday, 5 March 2021 at 12:57:43 UTC, z wrote:
>> ...
> Then it seems the only way to get AVX-compatible inline assembly (ldc.llvmasm excluded) is to use an external assembler.
> For example:
> ...

But I'm not really sure how to integrate that into a dub project. It seems «lflags "filename.obj"» and preGenerateCommands/preBuildCommands would work, but I haven't tested that. («dflags "filename.obj"» doesn't work for sure.)
Mar 09 2021
On Tuesday, 9 March 2021 at 20:33:01 UTC, z wrote:
> On Tuesday, 9 March 2021 at 20:23:48 UTC, z wrote:
>> On Friday, 5 March 2021 at 12:57:43 UTC, z wrote:
>>> ...
>> Then it seems the only way to get AVX-compatible inline assembly (ldc.llvmasm excluded) is to use an external assembler.
>> For example:
>> ...
> But I'm not really sure how to integrate that into a dub project. It seems «lflags "filename.obj"» and preGenerateCommands/preBuildCommands would work, but I haven't tested that. («dflags "filename.obj"» doesn't work for sure.)

In dub.sdl:

    lflags "source/asmfunctions.obj"
    preGenerateCommands " cd source && *command or bat/sh file that builds the asm object file(s)*"

It works, but if the package is imported by another package, it will fail: because of the way lflags works, the linker will try to find source/asmfunctions.obj relative to the working directory of the importer.
This can be circumvented with relative paths (if possible):

    lflags "../importedpackagesname/source/asmfunctions.obj"
Mar 19 2021