digitalmars.D.announce - Godbolt.org: mir-algorithm was added

Sep 21 2017
Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
```Mir Algorithm and Mir GLAS (GLAS is experimental) were added to
https://d.godbolt.org
by Johan Engelen. Thank you, Johan!

Try it:
1. Select mir-algorithm 0.6.13 from the libraries list (after the
Intel button).
2. Select LDC 1.4.0.
3. Add the compiler flags: -O -release -mcpu=cannonlake
4. Enjoy AVX-512 instructions with fused math :)
-------
// Euclidean norm
import mir.ndslice;
import mir.math.common;

@fastmath double norm2(ContiguousVector!double x) {
    // sum of squares seeded with 0.0, then the square root
    return 0.0.reduce!"a + b * b"(x).sqrt;
}

--------
Output:

double example.norm2(mir.ndslice.slice.Slice!(2, [1], double*).Slice):
        mov rax, qword ptr [rsp + 8]
        test rax, rax
        je .LBB0_1
        lea rcx, [rsp + 8]
        mov rcx, qword ptr [rcx + 8]
        vxorpd xmm0, xmm0, xmm0
        cmp rax, 32
        jb .LBB0_12
        mov r8, rax
        and r8, -32
        mov rsi, rax
        and rsi, -32
        je .LBB0_12
        lea rdi, [rsi - 32]
        mov rdx, rdi
        shr rdx, 5
        bt edi, 5
        jb .LBB0_5
        vmovupd zmm0, zmmword ptr [rcx]
        vmovupd zmm1, zmmword ptr [rcx + 64]
        vmovupd zmm2, zmmword ptr [rcx + 128]
        vmovupd zmm3, zmmword ptr [rcx + 192]
        vmulpd zmm0, zmm0, zmm0
        vmulpd zmm1, zmm1, zmm1
        vmulpd zmm2, zmm2, zmm2
        vmulpd zmm3, zmm3, zmm3
        mov r9d, 32
        test rdx, rdx
        jne .LBB0_8
        jmp .LBB0_10
.LBB0_1:
        vxorps xmm0, xmm0, xmm0
        vsqrtsd xmm0, xmm0, xmm0
        ret
.LBB0_5:
        vxorpd zmm0, zmm0, zmm0
        xor r9d, r9d
        vxorpd zmm1, zmm1, zmm1
        vxorpd zmm2, zmm2, zmm2
        vxorpd zmm3, zmm3, zmm3
        test rdx, rdx
        je .LBB0_10
.LBB0_8:
        mov rdi, rsi
        sub rdi, r9
        lea rdx, [rcx + 8*r9 + 448]
.LBB0_9:
        vmovupd zmm4, zmmword ptr [rdx - 448]
        vmovupd zmm5, zmmword ptr [rdx - 384]
        vmovupd zmm6, zmmword ptr [rdx - 320]
        vmovupd zmm7, zmmword ptr [rdx - 256]
        vmovupd zmm0, zmmword ptr [rdx - 192]
        vmovupd zmm1, zmmword ptr [rdx - 128]
        vmovupd zmm2, zmmword ptr [rdx - 64]
        vmovupd zmm3, zmmword ptr [rdx]
        jne .LBB0_9
.LBB0_10:
        vshuff64x2 zmm1, zmm0, zmm0, 14
        vpermpd zmm1, zmm0, 238
        vpermilpd zmm1, zmm0, 1
        cmp rax, rsi
        je .LBB0_13
        sub rax, r8
        lea rcx, [rcx + 8*rsi]
.LBB0_12:
        vmovsd xmm1, qword ptr [rcx]
        jne .LBB0_12
.LBB0_13:
        vsqrtsd xmm0, xmm0, xmm0
        ret

Best regards,
Ilya
```
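In the example above, `0.0.reduce!"a + b * b"(x)` folds over `x` starting from the seed `0.0`, with `a` the running sum and `b` the current element, so `norm2` returns the square root of the sum of squares. For readers without mir at hand, here is a minimal sketch of the same computation using only Phobos, assuming a built-in `double[]` slice instead of `ContiguousVector!double` and leaving out the `fastmath` attribute (`norm2Plain` is a hypothetical name, not a mir function). Without `fastmath`, LDC would generally keep the additions in strict left-to-right order and not vectorize the reduction.

```d
import std.algorithm.iteration : fold;
import std.math : sqrt;

// Hypothetical plain-Phobos counterpart of the mir `norm2` example.
// Assumes a built-in slice instead of mir's ContiguousVector!double
// and omits the fastmath attribute.
double norm2Plain(const(double)[] x)
{
    // Seed 0.0; `acc` is the running sum, `v` the current element.
    return x.fold!((acc, v) => acc + v * v)(0.0).sqrt;
}
```

For example, `norm2Plain([3.0, 4.0])` returns `5.0`.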
Sep 21 2017
Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
```On Friday, 22 September 2017 at 03:51:36 UTC, Ilya Yaroshenko wrote:
Mir Algorithm and Mir GLAS (GLAS is experimental) were added to
https://d.godbolt.org

```
Sep 21 2017
Arun Chandrasekaran <aruncxy gmail.com> writes:
```On Friday, 22 September 2017 at 03:51:36 UTC, Ilya Yaroshenko wrote:
Mir Algorithm and Mir GLAS (GLAS is experimental) were added to
https://d.godbolt.org
by Johan Engelen. Thank you, Johan!

[...]

Honestly, how do you guys understand these assembly instructions
that have been further optimized by the compiler? Am I alone here?
```
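As a rough guide to reading the listing in the first post: `cmp rax, 32` and `jb .LBB0_12` send vectors shorter than 32 elements straight to the scalar loop, `and rsi, -32` rounds the length down to a multiple of 32, and the vector code works on the four 512-bit registers zmm0 to zmm3 (8 doubles each, so 32 partial sums). `.LBB0_10` folds those lanes into one with `vshuff64x2`, `vpermpd` and `vpermilpd`, `.LBB0_12` handles the leftover elements one at a time, `vsqrtsd` takes the root, and `.LBB0_1` covers the empty vector. The listing as posted appears to omit some instructions (the loop bodies show loads and branches but no additions), so this is an interpretation rather than a line-by-line account. The sketch below is a hypothetical scalar D model of that structure; `norm2Blocked` is an illustrative name, not part of mir, and the lane and accumulator counts are assumptions read off the registers in the listing.

```d
import std.math : sqrt;

// Hypothetical scalar model of the AVX-512 loop structure, for reading the
// listing only. Lane and accumulator counts are assumptions taken from the
// registers used in the listing (4 zmm registers x 8 doubles each).
double norm2Blocked(const(double)[] x)
{
    enum lanes = 8;                    // doubles per 512-bit zmm register
    enum accumulators = 4;             // zmm0 .. zmm3
    enum block = lanes * accumulators; // 32, cf. `and rsi, -32`

    double[block] acc = 0.0;           // 32 independent partial sums
    // Length rounded down to a multiple of 32.
    immutable size_t bulk = x.length - x.length % block;

    size_t i = 0;
    // Main loop: 32 elements per step, each added into its own partial sum.
    for (; i < bulk; i += block)
        foreach (k; 0 .. block)
            acc[k] += x[i + k] * x[i + k];

    // Horizontal reduction: fold the partial sums into one scalar
    // (the listing uses vshuff64x2 / vpermpd / vpermilpd shuffles here).
    double sum = 0.0;
    foreach (v; acc)
        sum += v;

    // Scalar tail: the remaining x.length % 32 elements, one at a time.
    for (; i < x.length; ++i)
        sum += x[i] * x[i];

    return sqrt(sum);
}
```

Regrouping the additions into 32 partial sums can change the result in the last bits compared with a strictly left-to-right sum. LLVM only reassociates floating-point additions like this when fast-math flags allow it, which is what the `fastmath` attribute on `norm2` provides.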
Sep 21 2017
rikki cattermole <rikki cattermole.co.nz> writes:
```On 22/09/2017 5:36 AM, Arun Chandrasekaran wrote:
On Friday, 22 September 2017 at 03:51:36 UTC, Ilya Yaroshenko wrote:
Mir Algorithm and Mir GLAS (GLAS is experimental) were added to
https://d.godbolt.org
by Johan Engelen. Thank you, Johan!

[...]

Honestly, how do you guys understand these assembly instructions that have been
further optimized by the compiler? Am I alone here?