digitalmars.D.bugs - [Issue 17484] New: high penalty for vbroadcastsd with -mcpu=avx
- via Digitalmars-d-bugs (25/25) Jun 08 2017 https://issues.dlang.org/show_bug.cgi?id=17484
https://issues.dlang.org/show_bug.cgi?id=17484

          Issue ID: 17484
           Summary: high penalty for vbroadcastsd with -mcpu=avx
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P3
         Component: dmd
          Assignee: nobody puremagic.com
          Reporter: code dawg.eu

With -mcpu=avx, the compiler emits

    vbroadcastsd ymm2, qword ptr [rsp]

even when initializing only 128-bit wide double2 variables.

This incurs a high penalty of 50-80 cycles when a legacy SSE instruction is
later used with such a register value (or a value derived from it), because
the CPU does not know that the upper bits are zero, and apparently preserves
them in an internal register buffer.

https://software.intel.com/en-us/articles/intel-avx-state-transitions-migrating-sse-code-to-avx

We should

(a) not write to 256-bit wide YMM registers when only 128-bit wide XMM
    registers are used, and
(b) avoid mixing legacy-encoded SSE instructions (movsd) with VEX-encoded
    AVX-128 instructions, i.e. use vmovsd instead of movsd.

--
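The report does not include a test case. The following is a minimal sketch of the kind of D code that could exercise this pattern, assuming the broadcast comes from initializing a double2 with a scalar. The helper name splat is hypothetical and not part of the issue. Whether this exact snippet yields the vbroadcastsd shown above would have to be checked by compiling it with -mcpu=avx and inspecting the generated code.

```d
import core.simd;

// Hypothetical helper (not from the issue): fills both lanes of a
// 128-bit double2 vector with the same scalar value.
double2 splat(double x)
{
    // A scalar initializer is broadcast to every lane of the vector.
    // Assumption: this is the kind of initialization the report refers to.
    double2 v = x;
    return v;
}

void main()
{
    double2 v = splat(1.5);
    // Both 64-bit lanes should hold the scalar.
    assert(v.array[0] == 1.5 && v.array[1] == 1.5);
}
```

Only the 128-bit lanes are ever read here, so a 256-bit write to a YMM register is not needed for correctness.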
Jun 08 2017