digitalmars.D - Re: Programming language benchmark

bearophile <bearophileHUGS lycos.com> writes:

Don:

Sorry for my slow answer, I was quite busy for days.


 I've never heard that claim before. Do you have evidence for that?


I compare/convert code to D every day, so I am aware that D code compiled with
DMD is often slower than C/C++ code compiled with GCC. For some years I have
even kept a collection of snippets of slow code.

But I am also aware that the low performance has many different causes, such
as missing inlining, missing loop unrolling, etc., so it is not easy to spot a
clear and small case of integer arithmetic code that causes a slowdown and
give it to you as evidence. So I am sorry for my overly broad claim.


 If it is true, there's a strong possibility that it's a small, fixable issue
 (for example, DMD used to have terrible performance for ulong multiplication).


You are right, the case I'm going to show is a precise problem that's fixable. 

-----------------------

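// C code
#include <limits.h>
#include <stdio.h>

// Division by a compile-time constant: the case under test.
int divideBySeven(int x) {
    return x / 7;
}

int main(void) {
    int i = INT_MAX;
    int r;
    // The loop body runs INT_MAX times, so r is always assigned.
    while (i--)
        r = divideBySeven(i);

    // Use the last result so the computation is not dead code.
    printf("%d\n", r);
    return 0;
}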

-----------------------

// D code
import core.stdc.stdio : printf;

// Same function as the C version.
int divideBySeven(int x) {
    return x / 7;
}

void main() {
    int i = int.max;
    int r;
    while (i--)
        r = divideBySeven(i);
    printf("%d\n", r);
}

-----------------------

Asm from the C version:

_divideBySeven:
	pushl	%ebx
	movl	$-1840700269, %ebx
	movl	8(%esp), %ecx
	movl	%ebx, %eax
	popl	%ebx
	imull	%ecx
	leal	(%edx,%ecx), %eax
	sarl	$31, %ecx
	sarl	$2, %eax
	subl	%ecx, %eax
	ret

_main:
	leal	4(%esp), %ecx
	andl	$-16, %esp
	pushl	-4(%ecx)
	pushl	%ebx
	movl	$-1840700269, %ebx
	pushl	%ecx
	subl	$20, %esp
	call	___main
	movl	$2147483646, %ecx
	.p2align 4,,10
L4:
	movl	%ecx, %eax
	imull	%ebx
	movl	%ecx, %eax
	addl	%ecx, %edx
	sarl	$31, %eax
	sarl	$2, %edx
	decl	%ecx
	subl	%eax, %edx
	cmpl	$-1, %ecx
	jne	L4
	movl	%edx, 4(%esp)
	movl	$LC0, (%esp)
	call	_printf
	addl	$20, %esp
	xorl	%eax, %eax
	popl	%ecx
	popl	%ebx
	leal	-4(%ecx), %esp
	ret
	.def	_printf;	.scl	2;	.type	32;	.endef

-----------------------

Asm from the D version:

_D9int_div_d13divideBySevenFiZi	comdat
		mov	ECX,7
		cdq
		idiv	ECX
		ret

__Dmain	comdat
L0:		push	EAX
		push	EBX
		mov	EBX,07FFFFFFFh
		push	ESI
		xor	ESI,ESI
		test	EBX,EBX
		lea	EBX,-1[EBX]
		je	L24
L11:		mov	EAX,EBX
		mov	ECX,7
		cdq
		idiv	ECX
		test	EBX,EBX
		mov	ESI,EAX
		lea	EBX,-1[EBX]
		jne	L11
L24:		push	ESI
		mov	EDX,offset FLAT:_DATA
		push	EDX
		call	near ptr _printf
		add	ESP,8
		xor	EAX,EAX
		pop	ESI
		pop	EBX
		pop	ECX
		ret
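
The difference between the two listings is that GCC replaces the division by the constant 7 with a multiplication by a precomputed reciprocal followed by shifts, while DMD emits idiv. What follows is a minimal C sketch of what the GCC sequence computes, not a description of GCC's internals. It assumes a 32-bit int and arithmetic right shifts on negative values (implementation-defined in C, but what GCC provides). The function names are hypothetical; the constant is the one visible in the GCC listing.

```c
#include <assert.h>
#include <stdint.h>

/* Signed division by 7 using a multiply and shifts, mirroring the
   imull / addl / sarl / subl sequence in the GCC listing.
   Assumption: >> on a negative value is an arithmetic shift. */
static int32_t div7_by_multiply(int32_t x) {
    const int32_t magic = -1840700269;         /* 0x92492493 read as signed */
    int64_t product = (int64_t)magic * x;      /* imull: full 64-bit product */
    int32_t high = (int32_t)(product >> 32);   /* upper half, i.e. EDX */
    int32_t q = (high + x) >> 2;               /* addl, then sarl $2 */
    return q - (x >> 31);                      /* adds 1 when x is negative */
}

/* Compare against the built-in division over [first, last] and
   return the number of inputs where the two results differ. */
static long count_mismatches(int32_t first, int32_t last) {
    long mismatches = 0;
    for (int64_t x = first; x <= last; x++) {
        if (div7_by_multiply((int32_t)x) != (int32_t)x / 7)
            mismatches++;
    }
    return mismatches;
}

/* Small self-check over a range around zero. */
static void self_check(void) {
    assert(count_mismatches(-100000, 100000) == 0);
}
```

Calling count_mismatches(INT32_MIN, INT32_MAX) checks every 32-bit input; the 64-bit loop counter is there so the upper bound does not overflow.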

-----------------------

For a more realistic case see:
http://d.puremagic.com/issues/show_bug.cgi?id=5607

Bye,
bearophile

Jun 24 2011

Don <nospam nospam.com> writes:

bearophile wrote:
 Don:
 
 Sorry for my slow answer, I was quite busy for days.
 
 
 I've never heard that claim before. Do you have evidence for that?


 I compare/convert code to D every day, so I am aware that D code compiled with
 DMD is often slower than C/C++ code compiled with GCC. For some years I have
 even kept a collection of snippets of slow code.
 
 But I am also aware that the low performance has many different causes, such
 as missing inlining, missing loop unrolling, etc., so it is not easy to spot a
 clear and small case of integer arithmetic code that causes a slowdown and
 give it to you as evidence. So I am sorry for my overly broad claim.


It is true in general that DMD's inliner is not very good. I _suspect_ 
that it is the primary cause of most instances of poor integer 
performance. It's actually part of the front-end, not the back-end. So 
many of those performance problems won't apply to DMC.

It's also true that the DMD/DMC instruction scheduler doesn't schedule 
for modern processors. But last I checked, GCC wasn't really much better 
in practice: you have to be almost perfect to get a benefit from 
instruction scheduling these days, because the hardware does a very good 
job on unscheduled code.

Otherwise, I don't think there's any major optimization it misses. But 
it's quite likely that it misses several very specific minor optimizations.

 If it is true, there's a strong possibility that it's a small, fixable issue
 (for example, DMD used to have terrible performance for ulong multiplication).


 You are right, the case I'm going to show is a precise problem that's fixable. 


[snip]
 -----------------------
 
 For a more realistic case see:
 http://d.puremagic.com/issues/show_bug.cgi?id=5607



Thanks, that's helpful. It's a major speed difference (factor of 20, 
maybe) so it wouldn't have to occur very often to be noticeable.
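
One way to check a figure like this on a given machine is to time the same loop with a divisor the compiler can see and with one it cannot. A rough C sketch follows. The function names are hypothetical, and the volatile divisor is only a device to force a real divide instruction. The measured ratio depends on the CPU, the compiler, and its flags, so it should not be expected to reproduce the factor estimated above.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Divisor hidden behind volatile: the compiler has to reload it on
   every iteration and cannot turn the division into a multiply. */
static volatile int32_t runtime_divisor = 7;

/* Sum of i / 7 for i in [0, n), using the hidden divisor. */
static int64_t sum_with_runtime_divisor(int32_t n) {
    int64_t sum = 0;
    for (int32_t i = 0; i < n; i++)
        sum += i / runtime_divisor;
    return sum;
}

/* Same sum with a literal divisor the optimizer is free to rewrite. */
static int64_t sum_with_constant_divisor(int32_t n) {
    int64_t sum = 0;
    for (int32_t i = 0; i < n; i++)
        sum += i / 7;
    return sum;
}

/* Run fn(n), store its result, and return the CPU seconds it took. */
static double seconds_for(int64_t (*fn)(int32_t), int32_t n, int64_t *result) {
    clock_t start = clock();
    *result = fn(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Print both timings; the sums are printed so the loops are not dead code. */
static void report(int32_t n) {
    int64_t a, b;
    double t_runtime = seconds_for(sum_with_runtime_divisor, n, &a);
    double t_constant = seconds_for(sum_with_constant_divisor, n, &b);
    printf("runtime divisor:  %.3f s (sum %lld)\n", t_runtime, (long long)a);
    printf("constant divisor: %.3f s (sum %lld)\n", t_constant, (long long)b);
}
```

Calling report(INT32_MAX) from main gives a loop count close to the one in the original test program. An optimizing compiler may transform the constant-divisor loop further (for example by vectorizing it), which is one more reason to read the ratio as indicative only.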

Jun 26 2011

Timon Gehr <timon.gehr gmx.ch> writes:

Don wrote:
 [snip]

 Thanks, that's helpful. It's a major speed difference (factor of 20,
 maybe) so it wouldn't have to occur very often to be noticeable.


You may also want to have a look at this paper:

http://www.agner.org/optimize/optimizing_cpp.pdf

I don't know if it still accurately reflects the current state though.
Interestingly, it says that DMC is already able to perform the optimization
requested by bearophile.

A table starting on page 73 is quite specific about which optimizations the
DMC backend is lacking.

Cheers,
-Timon

Jun 26 2011

Don <nospam nospam.com> writes:

Timon Gehr wrote:
 [snip]

 You may also want to have a look at this paper:
 
 http://www.agner.org/optimize/optimizing_cpp.pdf
 
 I don't know if it still accurately reflects the current state though.


It's a little out of date, DMD now does a couple of things it didn't do 
when Agner did the testing. Incidentally I contributed a bit to that 
paper <g>.

Jun 26 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 6/26/2011 2:24 PM, Don wrote:
 You may also want to have a look at this paper:

 http://www.agner.org/optimize/optimizing_cpp.pdf

 I don't know if it still accurately reflects the current state though.


 It's a little out of date, DMD now does a couple of things it didn't do when
 Agner did the testing. Incidentally I contributed a bit to that paper <g>.


The table has several errors wrt DMC++. For example, DMC++ certainly does 
function inlining, constant propagation and the branch optimizations.

Jun 26 2011

Caligo <iteronvexor gmail.com> writes:

Kind of off topic, but a good place to get benchmark results for many
of the programming languages is Sphere Online Judge:
http://www.spoj.pl/problems/classical/

They accept solutions in D, but not many have been submitted.  I found a few:

http://www.spoj.pl/ranks/FCTRL/lang=D
http://www.spoj.pl/ranks/HASHIT/lang=D
http://www.spoj.pl/ranks/ONP/lang=D

Most of the fastest solutions are in C++, but D is pretty close.
Maybe we could start submitting solutions :-)

Jun 28 2011
