digitalmars.D.learn - Virtual value types during compile-time for static type safety, static

Tamas (53/53) Jul 17 2015 I got inspired by Andrei's "Generic Programming Must Go" talk:

ZombineDev (5/8) Jul 17 2015 Did you try looking at assembly generated by GDC or LDC with full

ZombineDev (3/11) Jul 17 2015 Also, see the table at the bottom of this comment:

Tamas (9/21) Jul 17 2015 Thanks for the pointers! My first priority a performant library,

Adam D. Ruppe (2/3) Jul 17 2015 what compile flags?

Tamas (198/198) Jul 18 2015 I made a thorough comparison using multiple compilers and a

Tamas (9/9) Jul 18 2015 Sorry, the main function of positive0.d correctly looks like this:
Adam D. Ruppe (4/7) Jul 18 2015 Try adding the automatic optimize flags in all your cases. For

Tamas (8/15) Jul 18 2015 Thanks, indeed, after -O -inline the bodies of the two abs

Baz (92/99) Jul 17 2015 Hi, I don't see the runtime penalities you talk about.

"Tamas" <foldenyi.tamas gmail.com> writes:

I got inspired by Andrei's "Generic Programming Must Go" talk:
https://www.youtube.com/watch?v=mCrVYYlFTrA

I.e. writing functions with static if-s, based on what we know 
about the input.
So the question is how to annotate the input variables?

Adam showed an excellent solution for this, by wrapping the value 
into a struct, and using alias: 
http://stackoverflow.com/questions/31442059/virtual-static-value-types

Unfortunately, it's not without runtime penalty. I have checked 
this idea with ints, against the generated asm.dlang.org code. 
Unfortunately the struct wrapper results in longer asm code. In 
theory such run-time penalty is not necessary, as everything is 
available compile-time.

To understand my use-case, here is an example "math library" that 
skips certain validations or computations altogether, whether 
some properties of the input can be inferred from how the value 
was produced:

import std.math : sqrt;
import std.conv;

struct Positive {
	int num;
	alias num this;
}

Positive abs(T)(T n) {
	static if (is(T == Positive)) {
		return n;
	}
	if (n >= 0) return Positive(n);
	return Positive(-n);
}

Positive isqrt(T)(T x) {
	static if (is(T == Positive)) {
		return Positive(sqrt(x.to!float).to!int);
	}
	if (x<0) {
		throw new Exception("no root for negative numbers");
	}
	return Positive(sqrt(x.to!float).to!int);
}

unittest {
	assert(abs(-4) == 4); // full abs runs
	assert(isqrt(4).abs() == 2); // full isqrt runs, abs returns n 
immediately.
	assert(abs(-4).isqrt() == 2); // full abs runs, isqrt return 
immediately, skipping validation;
}

Although this code is fully operational, presents nice api and 
compile-time optimizations, the extra Struct wrapper is not 
without runtime penalty.

Is there a solution that results the same static optimizations, 
but has no runtime penalty, i.e. the functions just operates with 
ints? (At least when compiled)

Thanks, Tamas

Jul 17 2015

"ZombineDev" <valid_email he.re> writes:

On Friday, 17 July 2015 at 21:20:41 UTC, Tamas wrote:
 Is there a solution that results the same static optimizations, 
 but has no runtime penalty, i.e. the functions just operates 
 with ints? (At least when compiled)

Did you try looking at assembly generated by GDC or LDC with full 
optimizations? For example GDC does quite better than DMD for the 
proposed SafeInt type:
https://github.com/D-Programming-Language/phobos/pull/3389#issuecomment-119005595

Jul 17 2015

"ZombineDev" <valid_email he.re> writes:

On Friday, 17 July 2015 at 23:15:31 UTC, ZombineDev wrote:
 On Friday, 17 July 2015 at 21:20:41 UTC, Tamas wrote:
 Is there a solution that results the same static 
 optimizations, but has no runtime penalty, i.e. the functions 
 just operates with ints? (At least when compiled)

 Did you try looking at assembly generated by GDC or LDC with 
 full optimizations? For example GDC does quite better than DMD 
 for the proposed SafeInt type:
 https://github.com/D-Programming-Language/phobos/pull/3389#issuecomment-119005595

Also, see the table at the bottom of this comment:
https://github.com/D-Programming-Language/phobos/pull/3389#issuecomment-117450524

Jul 17 2015

"Tamas" <foldenyi.tamas gmail.com> writes:

On Friday, 17 July 2015 at 23:16:51 UTC, ZombineDev wrote:
 On Friday, 17 July 2015 at 23:15:31 UTC, ZombineDev wrote:
 On Friday, 17 July 2015 at 21:20:41 UTC, Tamas wrote:
 Is there a solution that results the same static 
 optimizations, but has no runtime penalty, i.e. the functions 
 just operates with ints? (At least when compiled)

 Did you try looking at assembly generated by GDC or LDC with 
 full optimizations? For example GDC does quite better than DMD 
 for the proposed SafeInt type:
 https://github.com/D-Programming-Language/phobos/pull/3389#issuecomment-119005595

 Also, see the table at the bottom of this comment:
 https://github.com/D-Programming-Language/phobos/pull/3389#issuecomment-117450524

Thanks for the pointers! My first priority a  performant library, 
secondary is a nice api, 3rd is a nice implementation. So that 
kind of rules put Amy degradation compared to ints. I used DMD, 
BTW.

I see no reason for the necessity of performance degradation. 
Essentially I just want to assign a similar qualifyer like const 
or immutable. They are checked and used at compile time, but 
erased for runtime. Same here.

Jul 17 2015

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Friday, 17 July 2015 at 23:44:25 UTC, Tamas wrote:
 I used DMD, BTW.

what compile flags?

Jul 17 2015

"Tamas" <foldenyi.tamas gmail.com> writes:

I made a thorough comparison using multiple compilers and a 
summary of the findings. In short, there is a runtime overhead.

I reduced the code to cut out the imports and made two versions 
with equivalent semantic content.
positive0.d contains the hand written specializations of the abs 
function.
positive.d contains the solution with function templates / static 
type analysis.

///////

/* positive0.d:

Compile & execute:
$ dmd positive0.d; ./positive0; echo $?
$ ldc2 positive0.d; ./positive0; echo $?

generate ASM source:
$ dmd positive0.d; gobjdump -d positive0.o > positive0.dmd.s
$ ldc2 positive0.d -output-s

*/

int absPositive(int n) {
   return n;
}

int abs(int n) {
   return (n>=0) ? n : -n;
}

int square(int x) {
   return x * x;
}

int main() {
   return !((abs(-16) == 16)
     && (abs(3) == 3)
     && (square(5).abs == 25)
     && (square(-4).abs == 16));
}

///////

/* positive.d:

Compile & execute:
$ dmd positive.d; ./positive; echo $?
$ ldc2 positive.d; ./positive; echo $?

generate ASM source:
$ dmd positive.d; gobjdump -d positive.o > positive.dmd.s
$ ldc2 positive.d -output-s

*/
struct Positive {
   int num;
   alias num this;
}

Positive abs(T)(T n) {
   static if (is(T == Positive)) {
     return n;
   } else {
     return Positive((n >= 0) ? n : -n);
   }
}

Positive square(int x) {
   return Positive(x * x);
}

int main() {
   return !((abs(-16) == 16)
     && (abs(3) == 3)
     && (square(5).abs == 25)
     && (square(-4).abs == 16));
}

///////

I compared the generated asms. The asm code was substantially 
longer in case of non-hand written specializations of the abs 
function.

The 'optimized' versions of the abs function were equivalent, but 
the 'non-optimzed' versions shows the runtime overhead for dmd 
and ldc2 as well, a double 'mov' commands instead of a single 
ones;

The compiled hand written code was roughly half the size for both 
compilers:

File sizes:
ldc:
2678 positive0.s
4313 positive.s

dmd:
3442 positive0.dmd.s
8701 positive.dmd.s

You can see the abs functions below, and you can spot the double 
'mov' operations:

positive.dmd.s:
0000000000000230 
<_D8positive10__T3absTiZ3absFNaNbNiNfiZS8positive8Positive>:
  230:	55                   	push   %rbp
  231:	48 8b ec             	mov    %rsp,%rbp
  234:	48 83 ec 10          	sub    $0x10,%rsp
  238:	85 ff                	test   %edi,%edi
  23a:	78 02                	js     23e 
<_D8positive10__T3absTiZ3absFNaNbNiNfiZS8positive8Positive+0xe>
  23c:	eb 02                	jmp    240 
<_D8positive10__T3absTiZ3absFNaNbNiNfiZS8positive8Positive+0x10>
  23e:	f7 df                	neg    %edi
  240:	89 7d f0             	mov    %edi,-0x10(%rbp)
  243:	48 89 f8             	mov    %rdi,%rax
  246:	c9                   	leaveq
  247:	c3                   	retq

0000000000000248 
<_D8positive28__T3absTS8positive8PositiveZ3absFNaNbNiNfS8positive8PositiveZS8positive8Positive>:
  248:	55                   	push   %rbp
  249:	48 8b ec             	mov    %rsp,%rbp
  24c:	48 83 ec 10          	sub    $0x10,%rsp
  250:	48 89 f8             	mov    %rdi,%rax
  253:	c9                   	leaveq
  254:	c3                   	retq
  255:	0f 1f 00             	nopl   (%rax)



positive0.dmd.s:
00000000000000a0 <_D9positive011absPositiveFiZi>:
   a0:	55                   	push   %rbp
   a1:	48 8b ec             	mov    %rsp,%rbp
   a4:	48 83 ec 10          	sub    $0x10,%rsp
   a8:	48 89 f8             	mov    %rdi,%rax
   ab:	c9                   	leaveq
   ac:	c3                   	retq
   ad:	0f 1f 00             	nopl   (%rax)

00000000000000b0 <_D9positive03absFiZi>:
   b0:	55                   	push   %rbp
   b1:	48 8b ec             	mov    %rsp,%rbp
   b4:	48 83 ec 10          	sub    $0x10,%rsp
   b8:	85 ff                	test   %edi,%edi
   ba:	78 05                	js     c1 <_D9positive03absFiZi+0x11>
   bc:	48 89 f8             	mov    %rdi,%rax
   bf:	eb 05                	jmp    c6 <_D9positive03absFiZi+0x16>
   c1:	48 89 f8             	mov    %rdi,%rax
   c4:	f7 d8                	neg    %eax
   c6:	c9                   	leaveq
   c7:	c3                   	retq


ldc2:
positive.s:

__D8positive10__T3absTiZ3absFNaNbNiNfiZS8positive8Positive:
	.cfi_startproc
	movl	%edi, -4(%rsp)
	cmpl	$0, -4(%rsp)
	jl	LBB2_2
	leaq	-4(%rsp), %rax
	movq	%rax, -16(%rsp)
	jmp	LBB2_3
LBB2_2:
	leaq	-20(%rsp), %rax
	xorl	%ecx, %ecx
	subl	-4(%rsp), %ecx
	movl	%ecx, -20(%rsp)
	movq	%rax, -16(%rsp)
LBB2_3:
	movq	-16(%rsp), %rax
	movl	(%rax), %ecx
	movl	%ecx, -8(%rsp)
	movl	%ecx, %eax
	retq
	.cfi_endproc

	.globl	__D8positive28__T3absTS8positive8PositiveZ3absFNaNbNiNfS8positive8PositiveZS8positive8Positive
	.weak_definition	__D8positive28__T3absTS8positive8PositiveZ3absFNaNbNiNfS8positive8PositiveZS8positive8Positive
	.align	4, 0x90
__D8positive28__T3absTS8positive8PositiveZ3absFNaNbNiNfS8positive8PositiveZS8positive8Positive:
	.cfi_startproc
	movl	%edi, -8(%rsp)
	movl	%edi, %eax
	retq
	.cfi_endproc

	.section	__TEXT,__text,regular,pure_instructions
	.align	4, 0x90


positive0.s:
__D9positive011absPositiveFiZi:
	.cfi_startproc
	movl	%edi, -4(%rsp)
	movl	-4(%rsp), %eax
	retq
	.cfi_endproc

	.globl	__D9positive03absFiZi
	.align	4, 0x90
__D9positive03absFiZi:
	.cfi_startproc
	movl	%edi, -4(%rsp)
	cmpl	$0, -4(%rsp)
	jl	LBB1_2
	leaq	-4(%rsp), %rax
	movq	%rax, -16(%rsp)
	jmp	LBB1_3
LBB1_2:
	leaq	-20(%rsp), %rax
	xorl	%ecx, %ecx
	subl	-4(%rsp), %ecx
	movl	%ecx, -20(%rsp)
	movq	%rax, -16(%rsp)
LBB1_3:
	movq	-16(%rsp), %rax
	movl	(%rax), %eax
	retq
	.cfi_endproc

	.globl	__D9positive06squareFiZi
	.align	4, 0x90


my compilers:

$ ldc2 -version
LDC - the LLVM D compiler (6d3923):
   based on DMD v2.066.1 and LLVM 3.6.1
   Default target: x86_64-apple-darwin14.4.0
   Host CPU: core-avx2

$ dmd --version
DMD64 D Compiler v2.067

Jul 18 2015

"Tamas" <foldenyi.tamas gmail.com> writes:

Sorry, the main function of positive0.d correctly looks like this:

int main() {
   return !((abs(-16) == 16)
     && (abs(3) == 3)
     && (square(5).absPositive == 25)
     && (square(-4).absPositive == 16));
}

But this does not affect the results, the asm file sizs or the 
asm abs function bodies.

Jul 18 2015

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Saturday, 18 July 2015 at 10:06:07 UTC, Tamas wrote:
 Compile & execute:
 $ dmd positive0.d; ./positive0; echo $?
 $ ldc2 positive0.d; ./positive0; echo $?

Try adding the automatic optimize flags in all your cases. For 
dmd, `-O -inline`. Not sure about ldc but I think it is `-O` as 
well.

Jul 18 2015

"Tamas" <foldenyi.tamas gmail.com> writes:

On Saturday, 18 July 2015 at 13:16:26 UTC, Adam D. Ruppe wrote:
 On Saturday, 18 July 2015 at 10:06:07 UTC, Tamas wrote:
 Compile & execute:
 $ dmd positive0.d; ./positive0; echo $?
 $ ldc2 positive0.d; ./positive0; echo $?

 Try adding the automatic optimize flags in all your cases. For 
 dmd, `-O -inline`. Not sure about ldc but I think it is `-O` as 
 well.

Thanks, indeed, after -O -inline the bodies of the two abs 
functions are the same! :)

The asm code of the templated version is still longer overall, 
but I think it's only some garbage that is not really executed. 
(e.g some with assert and unittest in the name, although I have 
none such)

Soo thank you, it's really awesome! :)

Jul 18 2015

"Baz" <bb.temp gmx.com> writes:

On Friday, 17 July 2015 at 21:20:41 UTC, Tamas wrote:
 Although this code is fully operational, presents nice api and 
 compile-time optimizations, the extra Struct wrapper is not 
 without runtime penalty.

 Is there a solution that results the same static optimizations, 
 but has no runtime penalty, i.e. the functions just operates 
 with ints? (At least when compiled)

 Thanks, Tamas

Hi, I don't see the runtime penalities you talk about.

I've defined these alias:

---
alias abs1 = abs!int;
alias abs2 = abs!Positive;
alias isqrt1 = isqrt!int;
alias isqrt2 = isqrt!Positive;
---

and the asm produced for the `Positive` type is clearly faster:

;------- abs1 -------
00402074h  enter 000Ch, 00h
00402078h  test eax, eax
0040207Ah  js 00402081h
0040207Ch  mov dword ptr [ebp-0Ch], eax
0040207Fh  leave
00402080h  ret
00402081h  neg eax
00402083h  mov dword ptr [ebp-08h], eax
00402086h  leave
00402087h  ret
;----------------------------

;------- abs2 (Positive)-------
00402088h  enter 0004h, 00h
0040208Ch  leave
0040208Dh  ret
;----------------------------

;------- isqrt1 -------
00402090h  enter 0008h, 00h
00402094h  test eax, eax
00402096h  jns 004020CFh
00402098h  mov ecx, 00454D60h
0040209Dh  push ecx
0040209Eh  call 00403410h
004020A3h  add esp, 04h
004020A6h  push dword ptr [004540A4h]
004020ACh  push dword ptr [004540A0h]
004020B2h  push dword ptr [004540E4h]
004020B8h  push dword ptr [004540E0h]
004020BEh  push 0000001Bh
004020C0h  push 00000000h
004020C2h  call 004035C0h
004020C7h  push eax
004020C8h  call 00403400h
004020CDh  jmp 004020E6h
004020CFh  call 0040247Ch
004020D4h  fsqrt
004020D6h  sub esp, 04h
004020D9h  fstp dword ptr [esp]
004020DCh  call 00402494h
004020E1h  mov dword ptr [ebp-08h], eax
004020E4h  leave
004020E5h  ret
004020E6h  leave
004020E7h  ret
;----------------------------

;------- isqrt2 (Positive)-------
004020E8h  enter 0008h, 00h
004020ECh  call 00402560h
004020F1h  fsqrt
004020F3h  sub esp, 04h
004020F6h  fstp dword ptr [esp]
004020F9h  call 00402494h
004020FEh  mov dword ptr [ebp-08h], eax
00402101h  leave
00402102h  ret
;----------------------------

(dmd win32 bit, -w -wi).

Also note that i've tweaked the code to get rid of a few warnings:
---
Positive abs(T)(T n) {
     static if (is(T == Positive)) {
         return n;
     }
     else
     {
         if (n >= 0) return Positive(n);
         else return Positive(-n);
     }
}

Positive isqrt(T)(T x) {
     static if (is(T == Positive)) {
         return Positive(sqrt(x.to!float).to!int);
     }
     else
     {
         if (x<0)
             throw new Exception("no root for negative numbers");
         else return Positive(sqrt(x.to!float).to!int);
     }

}
---

Jul 17 2015

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Virtual value types during compile-time for static type safety, static