digitalmars.D.bugs - [Issue 19663] New: On x86_64 the fabs intrinsic should use SSE

https://issues.dlang.org/show_bug.cgi?id=19663

          Issue ID: 19663
           Summary: On x86_64 the fabs intrinsic should use SSE
           Product: D
           Version: D2
          Hardware: x86_64
                OS: All
            Status: NEW
          Keywords: performance
          Severity: enhancement
          Priority: P1
         Component: dmd
          Assignee: nobody puremagic.com
          Reporter: b2.temp gmx.com

Currently, on x86_64, the dmd backend implements the fabs intrinsic with the
FPU instruction of the same name, FABS. But since `float` and `double`
parameters are passed in SSE registers, as defined by the ABI, they have to
travel from these SSE registers to GP registers and only then to FPU registers.
Depending on what is done with the resulting absolute value, it then goes back
to a GP register (all of this just to clear one bit!) and then back again to an
SSE register if the function has to return the value, etc.

It would be wiser to use an SSE logical AND with a mask that clears the sign bit.
This would be done only for the `float` and `double` types.

Several options exist:
1. generate the mask, then ANDPS/ANDPD
2. ANDPS/ANDPD with a constant mask (LDC2 does that, by the way)
3. left shift, then right shift, by one
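
To make the bit manipulation behind options 2 and 3 concrete, here is a minimal sketch in portable C. It is not what dmd or LDC2 emit: it does the same operation on the integer bit pattern of the value, which is only a scalar analogue of what ANDPS/ANDPD or a packed shift pair would do inside an SSE register. The function names (`fabs_mask`, `fabsf_mask`, `fabs_shift`, `check_fabs_variants`) are made up for this illustration, and IEEE 754 `float`/`double` is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Option 2 analogue: AND the bit pattern with a constant mask that has
   every bit set except the sign bit (bit 63 for double). */
double fabs_mask(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);          /* reinterpret without UB */
    bits &= UINT64_C(0x7FFFFFFFFFFFFFFF);    /* clear the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Same idea for float: the sign bit is bit 31. */
float fabsf_mask(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= UINT32_C(0x7FFFFFFF);
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Option 3 analogue: shift left by one so the sign bit falls off, then
   do a logical shift right by one so a zero takes its place. */
double fabs_shift(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = (bits << 1) >> 1;                 /* unsigned, so the shift is logical */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Small usage example: both variants agree on a few exact values. */
void check_fabs_variants(void)
{
    assert(fabs_mask(-1.5) == 1.5);
    assert(fabs_shift(-1.5) == 1.5);
    assert(fabs_mask(2.0) == 2.0);
    assert(fabsf_mask(-0.25f) == 0.25f);
}
```

In both variants the value never needs to leave the register class it arrived in, which is the point of the request: only the sign bit changes, so no FPU round trip is required.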


Forum discussion:
https://forum.dlang.org/post/diljelbvmenuxtaqbuxw forum.dlang.org

Reference for the possible solutions:
https://stackoverflow.com/questions/32408665/fastest-way-to-compute-absolute-value-using-sse

--
Feb 09 2019