digitalmars.D.bugs - [Issue 21027] New: Backend: DMD use 'rep stosb' even for ulong arrays
- d-bugmail puremagic.com (53/54) Jul 07 2020 https://issues.dlang.org/show_bug.cgi?id=21027
https://issues.dlang.org/show_bug.cgi?id=21027

Issue ID: 21027
Summary: Backend: DMD use 'rep stosb' even for ulong arrays
Product: D
Version: D2
Hardware: x86_64
OS: Linux
Status: NEW
Keywords: performance
Severity: normal
Priority: P1
Component: dmd
Assignee: nobody puremagic.com
Reporter: pro.mathias.lang gmail.com

Take the following code:
```
alias Content = ulong[256];
void main ()
{
    Content v;
}
```

This is what DMD generates for it on Linux x86_64 (using `run.dlang.org`):
```
.text._Dmain segment
assume CS:.text._Dmain
_Dmain:
    push RBP
    mov RBP,RSP
    sub RSP,0808h
    mov ECX,0800h
    mov qword ptr -8[RBP],0
    lea RAX,-8[RBP]
    mov AL,[RAX]
    lea RDI,0FFFFF7F8h[RBP]
    rep stosb
    xor EAX,EAX
    leave
    ret
    add [RAX],AL
.text._Dmain ends
```

The best thing to do here would be to call `memset` or `memcpy`, which is what LDC does.
The second best would be to use `rep stosq` 0x100 times (one 8-byte store per `ulong` element), as it is faster than `rep stosb` 0x800 times.

Source:
- Agner Fog, optimizing assembly (https://www.agner.org/optimize/optimizing_assembly.pdf), 16.9 String instructions (all processors):

> `REP MOVSD` and `REP STOSD` are quite fast if the repeat count is not too small. The largest word size (DWORD in 32-bit mode, QWORD in 64-bit mode) is preferred. Both source and destination should be aligned by the word size or better. In many cases, however, it is faster to use vector registers. Moving data in the largest available registers is faster than `REP MOVSD` and `REP STOSD` in most cases, especially on older processors. See page 150 for details.

Related: https://issues.dlang.org/show_bug.cgi?id=14458
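As a rough illustration of the sizes involved and of the `memset` route suggested above, here is a minimal D sketch. The helper name `zeroFill` is hypothetical and not part of the report; the sketch assumes the C runtime's `memset` (via `core.stdc.string`) is available, and it makes no claim about what code any particular compiler emits for it.

```d
import core.stdc.string : memset;

alias Content = ulong[256];

// 256 elements * 8 bytes each = 0x800 bytes, the count DMD loads into ECX.
static assert(Content.sizeof == 0x800);
// Filling in 8-byte words instead would need only 0x100 iterations.
static assert(Content.sizeof / ulong.sizeof == 0x100);

// Hypothetical helper: zero the array through the C runtime's memset.
void zeroFill(ref Content c)
{
    memset(c.ptr, 0, c.sizeof);
}

void main()
{
    Content v = void; // `= void` skips the compiler-generated default initialization
    zeroFill(v);      // explicit zero fill instead

    foreach (x; v)
        assert(x == 0);
}
```

Whether this is actually faster than the `rep stosb` sequence depends on the C library and the CPU; the sketch only shows the manual workaround, not a measurement.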