
digitalmars.D.bugs - [Issue 21027] New: Backend: DMD use 'rep stosb' even for ulong arrays

https://issues.dlang.org/show_bug.cgi?id=21027

          Issue ID: 21027
           Summary: Backend: DMD use 'rep stosb' even for ulong arrays
           Product: D
           Version: D2
          Hardware: x86_64
                OS: Linux
            Status: NEW
          Keywords: performance
          Severity: normal
          Priority: P1
         Component: dmd
          Assignee: nobody puremagic.com
          Reporter: pro.mathias.lang gmail.com

Take the following code:
```
alias Content = ulong[256];
void main ()
{
    Content v;
}
```

On Linux x86_64 (tested on `run.dlang.org`), DMD generates the following for this code:
```
.text._Dmain    segment
        assume  CS:.text._Dmain
_Dmain:
                push    RBP
                mov     RBP,RSP
                sub     RSP,0808h
                mov     ECX,0800h
                mov     qword ptr -8[RBP],0
                lea     RAX,-8[RBP]
                mov     AL,[RAX]
                lea     RDI,0FFFFF7F8h[RBP]
                rep
                stosb
                xor     EAX,EAX
                leave
                ret
                add     [RAX],AL
.text._Dmain    ends
```

The best option here would be to call `memset` or `memcpy`, which is what LDC
does.
The second best would be to use `rep stosq` 0x100 times (one QWORD store per
`ulong`), as it is faster than `rep stosb` 0x800 times.
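The repeat counts follow directly from the array size: `ulong[256]` occupies 256 × 8 = 0x800 bytes (the value DMD loads into `ECX`), and a `REP STOS` loop needs size / store-width iterations. The helper `repCount` below is hypothetical, written only to check that arithmetic.

```d
import std.stdio : writefln;

alias Content = ulong[256];

// 256 elements * 8 bytes = 0x800 bytes, the count DMD loads into ECX.
static assert(Content.sizeof == 0x800);

/// Hypothetical helper: iterations a REP STOS loop needs to clear `bytes`
/// bytes when each iteration stores `width` bytes
/// (1 = STOSB, 4 = STOSD, 8 = STOSQ).
size_t repCount(size_t bytes, size_t width)
{
    assert(width != 0 && bytes % width == 0,
           "size must be a multiple of the store width");
    return bytes / width;
}

void main()
{
    writefln("rep stosb: 0x%x", repCount(Content.sizeof, 1)); // 0x800
    writefln("rep stosd: 0x%x", repCount(Content.sizeof, 4)); // 0x200
    writefln("rep stosq: 0x%x", repCount(Content.sizeof, 8)); // 0x100
}
```

So QWORD stores cut the iteration count by a factor of eight compared to the byte loop DMD currently emits.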

Source:
- Agner Fog, "Optimizing subroutines in assembly language"
(https://www.agner.org/optimize/optimizing_assembly.pdf), section 16.9, "String
instructions (all processors)":

> `REP MOVSD` and `REP STOSD` are quite fast if the repeat count is not too
> small. The largest word size (DWORD in 32-bit mode, QWORD in 64-bit mode) is
> preferred. Both source and destination should be aligned by the word size or
> better. In many cases, however, it is faster to use vector registers. Moving
> data in the largest available registers is faster than `REP MOVSD` and `REP
> STOSD` in most cases, especially on older processors. See page 150 for details.
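Until the backend emits wider stores, one possible user-level workaround (a sketch, not verified against DMD's actual output) is to opt out of default initialization with `= void` and clear the array through C's `memset`, which is closer to what LDC emits. C library `memset` implementations typically use wide or vector stores, in line with the advice quoted above. Whether DMD keeps this as a real library call depends on the compiler version and flags, so check the generated assembly before relying on it.

```d
import core.stdc.string : memset;

alias Content = ulong[256];

void main()
{
    // `= void` skips DMD's default zero-initialization (the `rep stosb` loop).
    Content v = void;

    // Clear the array explicitly. Assumption: the platform's memset uses
    // wide or vector stores for a 2 KiB buffer, as common libc versions do.
    memset(v.ptr, 0, v.sizeof);

    // Sanity check: every element is zero, same as default initialization.
    foreach (x; v)
        assert(x == 0);
}
```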

Related: https://issues.dlang.org/show_bug.cgi?id=14458

Jul 07 2020