www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Virtual value types during compile-time for static type safety, static

reply "Tamas" <foldenyi.tamas gmail.com> writes:
I got inspired by Andrei's "Generic Programming Must Go" talk:
https://www.youtube.com/watch?v=mCrVYYlFTrA

I.e. writing functions with static if-s, based on what we know 
about the input.
So the question is how to annotate the input variables?

Adam showed an excellent solution for this, by wrapping the value 
into a struct, and using alias: 
http://stackoverflow.com/questions/31442059/virtual-static-value-types

Unfortunately, it's not without runtime penalty. I have checked 
this idea with ints, against the generated asm.dlang.org code. 
Unfortunately the struct wrapper results in longer asm code. In 
theory such run-time penalty is not necessary, as everything is 
available compile-time.

To understand my use-case, here is an example "math library" that 
skips certain validations or computations altogether, whether 
some properties of the input can be inferred from how the value 
was produced:

import std.math : sqrt;
import std.conv;

struct Positive {
	int num;
	alias num this;
}

Positive abs(T)(T n) {
	static if (is(T == Positive)) {
		return n;
	}
	if (n >= 0) return Positive(n);
	return Positive(-n);
}

Positive isqrt(T)(T x) {
	static if (is(T == Positive)) {
		return Positive(sqrt(x.to!float).to!int);
	}
	if (x<0) {
		throw new Exception("no root for negative numbers");
	}
	return Positive(sqrt(x.to!float).to!int);
}

unittest {
	assert(abs(-4) == 4); // full abs runs
	assert(isqrt(4).abs() == 2); // full isqrt runs, abs returns n 
immediately.
	assert(abs(-4).isqrt() == 2); // full abs runs, isqrt return 
immediately, skipping validation;
}

Although this code is fully operational, presents nice api and 
compile-time optimizations, the extra Struct wrapper is not 
without runtime penalty.

Is there a solution that results the same static optimizations, 
but has no runtime penalty, i.e. the functions just operates with 
ints? (At least when compiled)

Thanks, Tamas
Jul 17 2015
next sibling parent reply "ZombineDev" <valid_email he.re> writes:
On Friday, 17 July 2015 at 21:20:41 UTC, Tamas wrote:
 Is there a solution that results the same static optimizations, 
 but has no runtime penalty, i.e. the functions just operates 
 with ints? (At least when compiled)
Did you try looking at assembly generated by GDC or LDC with full optimizations? For example GDC does quite better than DMD for the proposed SafeInt type: https://github.com/D-Programming-Language/phobos/pull/3389#issuecomment-119005595
Jul 17 2015
parent reply "ZombineDev" <valid_email he.re> writes:
On Friday, 17 July 2015 at 23:15:31 UTC, ZombineDev wrote:
 On Friday, 17 July 2015 at 21:20:41 UTC, Tamas wrote:
 Is there a solution that results the same static 
 optimizations, but has no runtime penalty, i.e. the functions 
 just operates with ints? (At least when compiled)
Did you try looking at assembly generated by GDC or LDC with full optimizations? For example GDC does quite better than DMD for the proposed SafeInt type: https://github.com/D-Programming-Language/phobos/pull/3389#issuecomment-119005595
Also, see the table at the bottom of this comment: https://github.com/D-Programming-Language/phobos/pull/3389#issuecomment-117450524
Jul 17 2015
parent reply "Tamas" <foldenyi.tamas gmail.com> writes:
On Friday, 17 July 2015 at 23:16:51 UTC, ZombineDev wrote:
 On Friday, 17 July 2015 at 23:15:31 UTC, ZombineDev wrote:
 On Friday, 17 July 2015 at 21:20:41 UTC, Tamas wrote:
 Is there a solution that results the same static 
 optimizations, but has no runtime penalty, i.e. the functions 
 just operates with ints? (At least when compiled)
Did you try looking at assembly generated by GDC or LDC with full optimizations? For example GDC does quite better than DMD for the proposed SafeInt type: https://github.com/D-Programming-Language/phobos/pull/3389#issuecomment-119005595
Also, see the table at the bottom of this comment: https://github.com/D-Programming-Language/phobos/pull/3389#issuecomment-117450524
Thanks for the pointers! My first priority a performant library, secondary is a nice api, 3rd is a nice implementation. So that kind of rules put Amy degradation compared to ints. I used DMD, BTW. I see no reason for the necessity of performance degradation. Essentially I just want to assign a similar qualifyer like const or immutable. They are checked and used at compile time, but erased for runtime. Same here.
Jul 17 2015
parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 17 July 2015 at 23:44:25 UTC, Tamas wrote:
 I used DMD, BTW.
what compile flags?
Jul 17 2015
parent reply "Tamas" <foldenyi.tamas gmail.com> writes:
I made a thorough comparison using multiple compilers and a 
summary of the findings. In short, there is a runtime overhead.

I reduced the code to cut out the imports and made two versions 
with equivalent semantic content.
positive0.d contains the hand written specializations of the abs 
function.
positive.d contains the solution with function templates / static 
type analysis.

///////

/* positive0.d:

Compile & execute:
$ dmd positive0.d; ./positive0; echo $?
$ ldc2 positive0.d; ./positive0; echo $?

generate ASM source:
$ dmd positive0.d; gobjdump -d positive0.o > positive0.dmd.s
$ ldc2 positive0.d -output-s

*/

int absPositive(int n) {
   return n;
}

int abs(int n) {
   return (n>=0) ? n : -n;
}

int square(int x) {
   return x * x;
}

int main() {
   return !((abs(-16) == 16)
     && (abs(3) == 3)
     && (square(5).abs == 25)
     && (square(-4).abs == 16));
}

///////

/* positive.d:

Compile & execute:
$ dmd positive.d; ./positive; echo $?
$ ldc2 positive.d; ./positive; echo $?

generate ASM source:
$ dmd positive.d; gobjdump -d positive.o > positive.dmd.s
$ ldc2 positive.d -output-s

*/
struct Positive {
   int num;
   alias num this;
}

Positive abs(T)(T n) {
   static if (is(T == Positive)) {
     return n;
   } else {
     return Positive((n >= 0) ? n : -n);
   }
}

Positive square(int x) {
   return Positive(x * x);
}

int main() {
   return !((abs(-16) == 16)
     && (abs(3) == 3)
     && (square(5).abs == 25)
     && (square(-4).abs == 16));
}

///////

I compared the generated asms. The asm code was substantially 
longer in case of non-hand written specializations of the abs 
function.

The 'optimized' versions of the abs function were equivalent, but 
the 'non-optimzed' versions shows the runtime overhead for dmd 
and ldc2 as well, a double 'mov' commands instead of a single 
ones;

The compiled hand written code was roughly half the size for both 
compilers:

File sizes:
ldc:
2678 positive0.s
4313 positive.s

dmd:
3442 positive0.dmd.s
8701 positive.dmd.s

You can see the abs functions below, and you can spot the double 
'mov' operations:

positive.dmd.s:
0000000000000230 
<_D8positive10__T3absTiZ3absFNaNbNiNfiZS8positive8Positive>:
  230:	55                   	push   %rbp
  231:	48 8b ec             	mov    %rsp,%rbp
  234:	48 83 ec 10          	sub    $0x10,%rsp
  238:	85 ff                	test   %edi,%edi
  23a:	78 02                	js     23e 
<_D8positive10__T3absTiZ3absFNaNbNiNfiZS8positive8Positive+0xe>
  23c:	eb 02                	jmp    240 
<_D8positive10__T3absTiZ3absFNaNbNiNfiZS8positive8Positive+0x10>
  23e:	f7 df                	neg    %edi
  240:	89 7d f0             	mov    %edi,-0x10(%rbp)
  243:	48 89 f8             	mov    %rdi,%rax
  246:	c9                   	leaveq
  247:	c3                   	retq

0000000000000248 
<_D8positive28__T3absTS8positive8PositiveZ3absFNaNbNiNfS8positive8PositiveZS8positive8Positive>:
  248:	55                   	push   %rbp
  249:	48 8b ec             	mov    %rsp,%rbp
  24c:	48 83 ec 10          	sub    $0x10,%rsp
  250:	48 89 f8             	mov    %rdi,%rax
  253:	c9                   	leaveq
  254:	c3                   	retq
  255:	0f 1f 00             	nopl   (%rax)



positive0.dmd.s:
00000000000000a0 <_D9positive011absPositiveFiZi>:
   a0:	55                   	push   %rbp
   a1:	48 8b ec             	mov    %rsp,%rbp
   a4:	48 83 ec 10          	sub    $0x10,%rsp
   a8:	48 89 f8             	mov    %rdi,%rax
   ab:	c9                   	leaveq
   ac:	c3                   	retq
   ad:	0f 1f 00             	nopl   (%rax)

00000000000000b0 <_D9positive03absFiZi>:
   b0:	55                   	push   %rbp
   b1:	48 8b ec             	mov    %rsp,%rbp
   b4:	48 83 ec 10          	sub    $0x10,%rsp
   b8:	85 ff                	test   %edi,%edi
   ba:	78 05                	js     c1 <_D9positive03absFiZi+0x11>
   bc:	48 89 f8             	mov    %rdi,%rax
   bf:	eb 05                	jmp    c6 <_D9positive03absFiZi+0x16>
   c1:	48 89 f8             	mov    %rdi,%rax
   c4:	f7 d8                	neg    %eax
   c6:	c9                   	leaveq
   c7:	c3                   	retq


ldc2:
positive.s:

__D8positive10__T3absTiZ3absFNaNbNiNfiZS8positive8Positive:
	.cfi_startproc
	movl	%edi, -4(%rsp)
	cmpl	$0, -4(%rsp)
	jl	LBB2_2
	leaq	-4(%rsp), %rax
	movq	%rax, -16(%rsp)
	jmp	LBB2_3
LBB2_2:
	leaq	-20(%rsp), %rax
	xorl	%ecx, %ecx
	subl	-4(%rsp), %ecx
	movl	%ecx, -20(%rsp)
	movq	%rax, -16(%rsp)
LBB2_3:
	movq	-16(%rsp), %rax
	movl	(%rax), %ecx
	movl	%ecx, -8(%rsp)
	movl	%ecx, %eax
	retq
	.cfi_endproc

	.globl	__D8positive28__T3absTS8positive8PositiveZ3absFNaNbNiNfS8positive8PositiveZS8positive8Positive
	.weak_definition	__D8positive28__T3absTS8positive8PositiveZ3absFNaNbNiNfS8positive8PositiveZS8positive8Positive
	.align	4, 0x90
__D8positive28__T3absTS8positive8PositiveZ3absFNaNbNiNfS8positive8PositiveZS8positive8Positive:
	.cfi_startproc
	movl	%edi, -8(%rsp)
	movl	%edi, %eax
	retq
	.cfi_endproc

	.section	__TEXT,__text,regular,pure_instructions
	.align	4, 0x90


positive0.s:
__D9positive011absPositiveFiZi:
	.cfi_startproc
	movl	%edi, -4(%rsp)
	movl	-4(%rsp), %eax
	retq
	.cfi_endproc

	.globl	__D9positive03absFiZi
	.align	4, 0x90
__D9positive03absFiZi:
	.cfi_startproc
	movl	%edi, -4(%rsp)
	cmpl	$0, -4(%rsp)
	jl	LBB1_2
	leaq	-4(%rsp), %rax
	movq	%rax, -16(%rsp)
	jmp	LBB1_3
LBB1_2:
	leaq	-20(%rsp), %rax
	xorl	%ecx, %ecx
	subl	-4(%rsp), %ecx
	movl	%ecx, -20(%rsp)
	movq	%rax, -16(%rsp)
LBB1_3:
	movq	-16(%rsp), %rax
	movl	(%rax), %eax
	retq
	.cfi_endproc

	.globl	__D9positive06squareFiZi
	.align	4, 0x90


my compilers:

$ ldc2 -version
LDC - the LLVM D compiler (6d3923):
   based on DMD v2.066.1 and LLVM 3.6.1
   Default target: x86_64-apple-darwin14.4.0
   Host CPU: core-avx2

$ dmd --version
DMD64 D Compiler v2.067
Jul 18 2015
next sibling parent "Tamas" <foldenyi.tamas gmail.com> writes:
Sorry, the main function of positive0.d correctly looks like this:

int main() {
   return !((abs(-16) == 16)
     && (abs(3) == 3)
     && (square(5).absPositive == 25)
     && (square(-4).absPositive == 16));
}

But this does not affect the results, the asm file sizs or the 
asm abs function bodies.
Jul 18 2015
prev sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Saturday, 18 July 2015 at 10:06:07 UTC, Tamas wrote:
 Compile & execute:
 $ dmd positive0.d; ./positive0; echo $?
 $ ldc2 positive0.d; ./positive0; echo $?
Try adding the automatic optimize flags in all your cases. For dmd, `-O -inline`. Not sure about ldc but I think it is `-O` as well.
Jul 18 2015
parent "Tamas" <foldenyi.tamas gmail.com> writes:
On Saturday, 18 July 2015 at 13:16:26 UTC, Adam D. Ruppe wrote:
 On Saturday, 18 July 2015 at 10:06:07 UTC, Tamas wrote:
 Compile & execute:
 $ dmd positive0.d; ./positive0; echo $?
 $ ldc2 positive0.d; ./positive0; echo $?
Try adding the automatic optimize flags in all your cases. For dmd, `-O -inline`. Not sure about ldc but I think it is `-O` as well.
Thanks, indeed, after -O -inline the bodies of the two abs functions are the same! :) The asm code of the templated version is still longer overall, but I think it's only some garbage that is not really executed. (e.g some with assert and unittest in the name, although I have none such) Soo thank you, it's really awesome! :)
Jul 18 2015
prev sibling parent "Baz" <bb.temp gmx.com> writes:
On Friday, 17 July 2015 at 21:20:41 UTC, Tamas wrote:
 Although this code is fully operational, presents nice api and 
 compile-time optimizations, the extra Struct wrapper is not 
 without runtime penalty.

 Is there a solution that results the same static optimizations, 
 but has no runtime penalty, i.e. the functions just operates 
 with ints? (At least when compiled)

 Thanks, Tamas
Hi, I don't see the runtime penalities you talk about. I've defined these alias: --- alias abs1 = abs!int; alias abs2 = abs!Positive; alias isqrt1 = isqrt!int; alias isqrt2 = isqrt!Positive; --- and the asm produced for the `Positive` type is clearly faster: ;------- abs1 ------- 00402074h enter 000Ch, 00h 00402078h test eax, eax 0040207Ah js 00402081h 0040207Ch mov dword ptr [ebp-0Ch], eax 0040207Fh leave 00402080h ret 00402081h neg eax 00402083h mov dword ptr [ebp-08h], eax 00402086h leave 00402087h ret ;---------------------------- ;------- abs2 (Positive)------- 00402088h enter 0004h, 00h 0040208Ch leave 0040208Dh ret ;---------------------------- ;------- isqrt1 ------- 00402090h enter 0008h, 00h 00402094h test eax, eax 00402096h jns 004020CFh 00402098h mov ecx, 00454D60h 0040209Dh push ecx 0040209Eh call 00403410h 004020A3h add esp, 04h 004020A6h push dword ptr [004540A4h] 004020ACh push dword ptr [004540A0h] 004020B2h push dword ptr [004540E4h] 004020B8h push dword ptr [004540E0h] 004020BEh push 0000001Bh 004020C0h push 00000000h 004020C2h call 004035C0h 004020C7h push eax 004020C8h call 00403400h 004020CDh jmp 004020E6h 004020CFh call 0040247Ch 004020D4h fsqrt 004020D6h sub esp, 04h 004020D9h fstp dword ptr [esp] 004020DCh call 00402494h 004020E1h mov dword ptr [ebp-08h], eax 004020E4h leave 004020E5h ret 004020E6h leave 004020E7h ret ;---------------------------- ;------- isqrt2 (Positive)------- 004020E8h enter 0008h, 00h 004020ECh call 00402560h 004020F1h fsqrt 004020F3h sub esp, 04h 004020F6h fstp dword ptr [esp] 004020F9h call 00402494h 004020FEh mov dword ptr [ebp-08h], eax 00402101h leave 00402102h ret ;---------------------------- (dmd win32 bit, -w -wi). Also note that i've tweaked the code to get rid of a few warnings: --- Positive abs(T)(T n) { static if (is(T == Positive)) { return n; } else { if (n >= 0) return Positive(n); else return Positive(-n); } } Positive isqrt(T)(T x) { static if (is(T == Positive)) { return Positive(sqrt(x.to!float).to!int); } else { if (x<0) throw new Exception("no root for negative numbers"); else return Positive(sqrt(x.to!float).to!int); } } ---
Jul 17 2015