
digitalmars.D.bugs - [Issue 23013] New: generate optimized SIMD register assignment


          Issue ID: 23013
           Summary: generate optimized SIMD register assignment
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Keywords: backend, performance
          Severity: enhancement
          Priority: P1
         Component: dmd
          Assignee: nobody puremagic.com
          Reporter: dkorpel live.nl

See: https://github.com/dlang/dmd/pull/13977#issuecomment-1098199644

import core.simd;

double2 set0(double2 x, double* a)
{
    x[0] = *a;
    return x;
}

double2 set1(double2 x, double* a)
{
    x[1] = *a;
    return x;
}

GDC generates this optimized code, a single load per function:
        movlpd  xmm0, QWORD PTR [rdi]   ; set0
        movhpd  xmm0, QWORD PTR [rdi]   ; set1

But DMD -O still does a roundtrip to stack memory:
assume  CS:.text.set1
    push    RBP
    mov     RBP,RSP
    sub     RSP,010h
    movapd  -010h[RBP],XMM0
    movsd   XMM1,[RDI]
    movsd   -8[RBP],XMM1
    movapd  XMM0,-010h[RBP]
    mov     RSP,RBP
    pop     RBP

In dmd.backend.cod1.getlvalue, vector variables are prevented from being in a
register because the backend doesn't generate the correct assignment
instructions yet. For example, it would use movsd for `x[0] = *a`, which clears
the upper 64 bits of the XMM0 register and so accidentally sets `x[1] = 0` (see
issue 21673 and issue 23009).
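
To make the difference concrete, here is a minimal sketch that models the
lane behaviour of the three load instructions on a plain two-element array.
It is not backend code: the `Xmm` alias and the helper names `movsdLoad`,
`movlpdLoad` and `movhpdLoad` are hypothetical, chosen only for illustration,
and the model covers just the memory-operand forms of the instructions.

```d
// Model an XMM register as two 64-bit lanes; lane 0 is the low half.
alias Xmm = double[2];

// movsd xmm, m64: loads the low lane and zeroes the upper lane.
// The previous register contents (reg) are ignored entirely.
Xmm movsdLoad(Xmm reg, const(double)* mem)
{
    return [*mem, 0.0];
}

// movlpd xmm, m64: loads the low lane and preserves the upper lane.
Xmm movlpdLoad(Xmm reg, const(double)* mem)
{
    return [*mem, reg[1]];
}

// movhpd xmm, m64: loads the upper lane and preserves the low lane.
Xmm movhpdLoad(Xmm reg, const(double)* mem)
{
    return [reg[0], *mem];
}
```

In this model, starting from x = [1.0, 2.0] with *a == 9.0, movsdLoad loses
x[1], while movlpdLoad and movhpdLoad each change only the lane that set0 and
set1 assign to.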

When this is fixed, SIMD code gen can be improved by allowing vector variables
to be put in registers again.

Apr 13 2022