digitalmars.D.bugs - [Issue 15255] New: Generated better code for saturation arithmetic
- via Digitalmars-d-bugs (165/165) Oct 28 2015 https://issues.dlang.org/show_bug.cgi?id=15255
https://issues.dlang.org/show_bug.cgi?id=15255 Issue ID: 15255 Summary: Generated better code for saturation arithmetic Product: D Version: D2 Hardware: All OS: All Status: NEW Severity: enhancement Priority: P1 Component: dmd Assignee: nobody puremagic.com Reporter: bugzilla digitalmars.com I've just spent a couple of hours trying to convince the compiler to generate the code I want to see for saturating arithmetic. DMD generates a lot of extra work. LDC does pretty well, but I still couldn't convince it to produce the code I wanted to see. GDC is the same as LDC, except one of my permutations seems to have hit the sweet spot! ;) The asm I'm chasing is: add %eax,%ebx sbb %eax,%eax or %ebx,%eax I'm pretty sure this is the best result, but I can't convince DMD or LDC to emit that code. Code: pure nothrow safe nogc: template Promote(U) { static if(is(U == ubyte)) alias Promote = ushort; else static if(is(U == ushort)) alias Promote = uint; else static if(is(U == uint)) alias Promote = ulong; else static if(is(U == ulong)) alias Promote = ucent; else static assert(false, "!"); } // note; I cast results everywhere because D's integer promotion seemed to interfere U addSat(U)(U a, U b) if(isIntegral!U && isUnsigned!U) { return cast(U)((a > cast(U)(U.max - b)) ? cast(U)U.max : cast(U)(a + b)); } U addSat2(U)(U a, U b) if(isIntegral!U && isUnsigned!U) { Promote!U c = cast(Promote!U)a+b; return cast(U)(-cast(U)(c >> (U.sizeof*8)) | cast(U)c); // expect: // add %eax,%ebx // sbb %eax,%eax // or %ebx,%eax } U addSat3(U)(U a, U b) if(isIntegral!U && isUnsigned!U) { U c = a + b; if (c < a) // can only happen due to overflow c = U(-1); return c; } U addSat4(U)(U a, U b) if(isIntegral!U && isUnsigned!U) { U c = a + b; c |= -U(c < a); return c; } When I generate those functions for uint, I get these: DMD's output: pure nothrow nogc safe uint std.experimental.color.internal.saturate.addSat!(uint).addSat(uint, uint): 0000000000000000: push rbp 0000000000000001: mov rbp,rsp 0000000000000004: mov dword ptr [rbp+10h],ecx 0000000000000007: mov dword ptr [rbp+18h],edx 000000000000000A: mov eax,0FFFFFFFFh 000000000000000F: sub eax,dword ptr [rbp+10h] 0000000000000012: cmp eax,dword ptr [rbp+18h] 0000000000000015: mov eax,0FFFFFFFFh 000000000000001A: jb 0000000000000022 000000000000001C: mov ecx,dword ptr [rbp+10h] 000000000000001F: lea eax,[rdx+rcx] 0000000000000022: pop rbp 0000000000000023: ret pure nothrow nogc safe uint std.experimental.color.internal.saturate.addSat2!(uint).addSat2(uint, uint): 0000000000000000: push rbp 0000000000000001: mov rbp,rsp 0000000000000004: sub rsp,10h 0000000000000008: mov dword ptr [rbp+10h],ecx 000000000000000B: mov dword ptr [rbp+18h],edx 000000000000000E: mov eax,dword ptr [rbp+18h] 0000000000000011: mov edx,dword ptr [rbp+10h] 0000000000000014: add rax,rdx 0000000000000017: mov qword ptr [rbp-8],rax 000000000000001B: mov eax,dword ptr [rbp-4] 000000000000001E: neg eax 0000000000000020: or eax,dword ptr [rbp-8] 0000000000000023: lea rsp,[rbp] 0000000000000027: pop rbp 0000000000000028: ret pure nothrow nogc safe uint std.experimental.color.internal.saturate.addSat3!(uint).addSat3(uint, uint): 0000000000000000: push rbp 0000000000000001: mov rbp,rsp 0000000000000004: lea eax,[rdx+rcx] 0000000000000007: cmp eax,edx 0000000000000009: jae 0000000000000010 000000000000000B: mov eax,0FFFFFFFFh 0000000000000010: pop rbp 0000000000000011: ret pure nothrow nogc safe uint std.experimental.color.internal.saturate.addSat4!(uint).addSat4(uint, uint): 0000000000000000: push rbp 0000000000000001: mov rbp,rsp 0000000000000004: lea r8d,[rdx+rcx] 0000000000000008: cmp r8d,edx 000000000000000B: setb al 000000000000000E: movzx eax,al 0000000000000011: neg eax 0000000000000013: or r8d,eax 0000000000000016: mov eax,r8d 0000000000000019: pop rbp 000000000000001A: ret LDC's output: addSat: leal (%rdx,%rcx), %r8d notl %ecx cmpl %ecx, %edx movl $-1, %eax cmovbel %r8d, %eax retq addSat2: movl %edx, %edx movl %ecx, %eax addq %rdx, %rax movq %rax, %rcx shrq $32, %rcx negl %ecx orl %ecx, %eax retq addSat3: addl %edx, %ecx movl $-1, %eax cmovael %ecx, %eax retq addSat4: addl %edx, %ecx movl $-1, %eax cmovael %ecx, %eax retq GDC is almost identical to LDC, except: addSat4: addl %edi, %esi sbbl %eax, %eax orl %esi, %eax ret I think this is what I was chasing! (not sure what the deal is with the choice of registers though?) Anyway, I'm surprised by almost all these results, I thought all compilers would do better on such simple code. Thought you might be interested. Cheers. - Manu --
Oct 28 2015