digitalmars.D - Programming language benchmark


Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Hi,

I've just found this benchmark which includes GDC and LDC compilers. I 
don't know why there's no DMD flavor, though. Here's the link:

http://attractivechaos.wordpress.com/2011/06/22/my-programming-language-benchmark-analyses/

It clearly shows that D is really fast (comparable to C), which I like a 
lot! However, the RegExp implementation is missing from the D benchmark. The 
author "could not get it to work".

Piotr

Jun 22 2011

bearophile <bearophileHUGS lycos.com> writes:

Piotr Szturmaj:

> I don't know why there's no DMD flavor, though.

The author of that bench has said:
> I have not evaluated DMD because I am running the programs on a Linux server I
> have no control of. The "libc" is quite old and incompatible with the binary
> release of dmd.


> It clearly shows that D is really fast (comparable to C) which I like a lot!

Only LDC and GDC. DMD is not up to GCC/ICC/LLVM even on integer math benchmarks.

Bye,
bearophile

Jun 22 2011

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 22.06.2011 15:41, bearophile wrote:
> Piotr Szturmaj:
>
>> I don't know why there's no DMD flavor, though.
>
> The author of that bench has said:
>> I have not evaluated DMD because I am running the programs on a Linux server I
>> have no control of. The "libc" is quite old and incompatible with the binary
>> release of dmd.
>
>> It clearly shows that D is really fast (comparable to C) which I like a lot!


Yeah, if that's supposed to show anything.
I personally dislike the way the author benchmarks regexes anyway, e.g. Perl:

while (<>) {
    chomp;
    print $_, "\n" if /$re/;
}

chomp??
And printing each line biases this test towards the performance of the text 
printing facilities.
Same thing with C: gets, puts and chomping that have nothing to do 
with pattern matching:

while (fgets(buf, BUF_SIZE - 1, stdin)) {
    ++l;
    for (q = buf; *q; ++q); if (q > buf) *(q-1) = 0; // was that trimming '\n'?
    if (regexec(&r, buf, 10, match, 0) != REG_NOMATCH)
        puts(buf);
}
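
To make the objection concrete, here is a minimal sketch (not the benchmark author's code) of a loop that only counts matches, so that output speed stays out of the measurement. The names chomp_newline and count_matches are made up for illustration, the fixed line width of 64 is arbitrary, and the code assumes POSIX <regex.h> is available.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Cut the line at its first '\n', if there is one. A line without a
   newline is left untouched, unlike the loop quoted above, which always
   drops the last character. */
static void chomp_newline(char *line) {
    assert(line != NULL);
    line[strcspn(line, "\n")] = '\0';
}

/* Count how many of the n lines match the POSIX extended regex `pattern`.
   Returns -1 if the pattern does not compile. Nothing is printed, so the
   cost of puts()/printf() stays out of any timing wrapped around this. */
static long count_matches(const char *pattern, char lines[][64], size_t n) {
    regex_t re;
    long hits = 0;
    size_t i;

    /* REG_NOSUB: only a yes/no answer is needed, no submatch positions. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    for (i = 0; i < n; ++i) {
        chomp_newline(lines[i]);
        if (regexec(&re, lines[i], 0, NULL, 0) == 0)
            ++hits;
    }
    regfree(&re);
    return hits;
}
```

Reading the input into memory first and timing only count_matches would keep fgets out of the measurement as well.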



> Only LDC and GDC. DMD is not up to GCC/ICC/LLVM even on integer math
> benchmarks.

Mm, proof link? :)

> Bye,
> bearophile


-- 
Dmitry Olshansky

Jun 22 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 6/22/11 6:52 AM, Dmitry Olshansky wrote:
> for (q = buf; *q; ++q); if (q > buf) *(q-1) = 0; // was that trimming '\n'?

Now that's an interesting line.

Andrei

Jun 22 2011

Don <nospam nospam.com> writes:

bearophile wrote:
> Piotr Szturmaj:
>
>> I don't know why there's no DMD flavor, though.
>
> The author of that bench has said:
>> I have not evaluated DMD because I am running the programs on a Linux server I
>> have no control of. The "libc" is quite old and incompatible with the binary
>> release of dmd.
>
>> It clearly shows that D is really fast (comparable to C) which I like a lot!
>
> Only LDC and GDC. DMD is not up to GCC/ICC/LLVM even on integer math
> benchmarks.

I've never heard that claim before. Do you have evidence for that? If it 
is true, there's a strong possibility that it's a small, fixable issue 
(for example, DMD used to have terrible performance for ulong 
multiplication).

Jun 23 2011

bearophile <bearophileHUGS lycos.com> writes:

Don:

Sorry for my slow answer, I was quite busy for days.


> I've never heard that claim before. Do you have evidence for that?

I compare/convert code to D every day, so I am aware that D code compiled with
DMD is often slower than C/C++ code compiled with GCC. For some years I have
even kept a collection of snippets of slow code.

But I am also aware that the low performance has many different causes, like
some missing inlining, missing loop unrolling, etc, so spotting a clear and
small case of integer arithmetic code that causes a slowdown, to give you
evidence, is not easy. So I am sorry for my overly broad claim.


> If it is true, there's a strong possibility that it's a small, fixable issue
> (for example, DMD used to have terrible performance for ulong multiplication).

You are right, the case I'm going to show is a precise problem that's fixable. 

-----------------------

// C code
#include <limits.h>
#include <stdio.h>

int divideBySeven(int x) {
    return x / 7;
}

int main() {
    int i = INT_MAX;
    int r;
    while (i--)
        r = divideBySeven(i);

    printf("%d\n", r);
    return 0;
}

-----------------------

// D code
import core.stdc.stdio; // for printf
int divideBySeven(int x) {
    return x / 7;
}

void main() {
    int i = int.max;
    int r;
    while (i--)
        r = divideBySeven(i);
    printf("%d\n", r);
}

-----------------------

Asm from the C version:

_divideBySeven:
	pushl	%ebx
	movl	$-1840700269, %ebx
	movl	8(%esp), %ecx
	movl	%ebx, %eax
	popl	%ebx
	imull	%ecx
	leal	(%edx,%ecx), %eax
	sarl	$31, %ecx
	sarl	$2, %eax
	subl	%ecx, %eax
	ret

_main:
	leal	4(%esp), %ecx
	andl	$-16, %esp
	pushl	-4(%ecx)
	pushl	%ebx
	movl	$-1840700269, %ebx
	pushl	%ecx
	subl	$20, %esp
	call	___main
	movl	$2147483646, %ecx
	.p2align 4,,10
L4:
	movl	%ecx, %eax
	imull	%ebx
	movl	%ecx, %eax
	addl	%ecx, %edx
	sarl	$31, %eax
	sarl	$2, %edx
	decl	%ecx
	subl	%eax, %edx
	cmpl	$-1, %ecx
	jne	L4
	movl	%edx, 4(%esp)
	movl	$LC0, (%esp)
	call	_printf
	addl	$20, %esp
	xorl	%eax, %eax
	popl	%ecx
	popl	%ebx
	leal	-4(%ecx), %esp
	ret
	.def	_printf;	.scl	2;	.type	32;	.endef

-----------------------

Asm from the D version:

_D9int_div_d13divideBySevenFiZi	comdat
		mov	ECX,7
		cdq
		idiv	ECX
		ret

__Dmain	comdat
L0:		push	EAX
		push	EBX
		mov	EBX,07FFFFFFFh
		push	ESI
		xor	ESI,ESI
		test	EBX,EBX
		lea	EBX,-1[EBX]
		je	L24
L11:		mov	EAX,EBX
		mov	ECX,7
		cdq
		idiv	ECX
		test	EBX,EBX
		mov	ESI,EAX
		lea	EBX,-1[EBX]
		jne	L11
L24:		push	ESI
		mov	EDX,offset FLAT:_DATA
		push	EDX
		call	near ptr _printf
		add	ESP,8
		xor	EAX,EAX
		pop	ESI
		pop	EBX
		pop	ECX
		ret

-----------------------
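
What GCC does in its listing can be written back in C. The following is only a sketch of the multiply-and-shift sequence for x / 7, reconstructed from the asm above rather than taken from any compiler source. The names div7_magic and check_div7 are made up, and the code assumes that >> on a negative signed integer is an arithmetic shift (true for GCC and Clang, but implementation-defined in the C standard).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x / 7 without a divide instruction, mirroring the GCC listing:
   take the high half of magic * x, add x, shift right by 2, then
   subtract the sign of x so the result truncates toward zero. */
static int32_t div7_magic(int32_t x) {
    const int32_t magic = -1840700269;                  /* 0x92492493 */
    int32_t hi = (int32_t)(((int64_t)magic * x) >> 32); /* EDX after imull */
    int32_t q = (hi + x) >> 2;                          /* leal, then sarl $2 */
    return q - (x >> 31);                               /* x >> 31 is -1 if x < 0 */
}

/* Spot-check the sequence against the ordinary division operator. */
static void check_div7(void) {
    static const int32_t samples[] = {
        INT32_MIN, -15, -14, -8, -7, -6, -1, 0, 1, 6, 7, 8, 13, 14, INT32_MAX
    };
    size_t i;
    for (i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(div7_magic(samples[i]) == samples[i] / 7);
}
```

The DMD listing instead keeps the general idiv instruction inside the loop, which is where the time goes.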

For a more realistic case see:
http://d.puremagic.com/issues/show_bug.cgi?id=5607

Bye,
bearophile

Jun 24 2011

Don <nospam nospam.com> writes:

bearophile wrote:
> Don:
>
> Sorry for my slow answer, I was quite busy for days.
>
>> I've never heard that claim before. Do you have evidence for that?
>
> I compare/convert code to D every day, so I am aware that D code compiled with
> DMD is often slower than C/C++ code compiled with GCC. For some years I have
> even kept a collection of snippets of slow code.
>
> But I am also aware that the low performance has many different causes, like
> some missing inlining, missing loop unrolling, etc, so spotting a clear and
> small case of integer arithmetic code that causes a slowdown, to give you
> evidence, is not easy. So I am sorry for my overly broad claim.

It is true in general that DMD's inliner is not very good. I _suspect_ 
that it is the primary cause of most instances of poor integer 
performance. It's actually part of the front-end, not the back-end. So 
many of those performance problems won't apply to DMC.

It's also true that the DMD/DMC instruction scheduler doesn't schedule 
for modern processors. But last I checked, GCC wasn't really much better 
in practice (you have to be almost perfect to get a benefit from 
instruction scheduling these days, the hardware does a very good job on 
unscheduled code).

Otherwise, I don't think there's any major optimisation it misses. But 
it's quite likely that it misses several very specific minor optimizations.

>> If it is true, there's a strong possibility that it's a small, fixable issue
>> (for example, DMD used to have terrible performance for ulong multiplication).
>
> You are right, the case I'm going to show is a precise problem that's fixable.

[snip]
> -----------------------
>
> For a more realistic case see:
> http://d.puremagic.com/issues/show_bug.cgi?id=5607


Thanks, that's helpful. It's a major speed difference (factor of 20, 
maybe) so it wouldn't have to occur very often to be noticeable.
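
Anyone who wants to check that factor on their own machine could time the two kinds of division with something like the sketch below. It is a rough harness under stated assumptions, not a careful benchmark: the names runtime_divisor, sum_div_runtime, sum_div_const and time_call are invented here, the volatile divisor is just one way to stop the compiler from treating 7 as a constant, and an optimizing compiler may also vectorize the constant version, so the ratio measures more than the divide instruction alone.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Divisor read through a volatile object, so the compiler cannot treat it
   as a known constant and has to emit a real divide instruction. */
static volatile int32_t runtime_divisor = 7;

/* Sum of i / d for i in [0, n), with d only known at run time. */
static int64_t sum_div_runtime(int32_t n) {
    int64_t sum = 0;
    int32_t i;
    for (i = 0; i < n; ++i)
        sum += i / runtime_divisor;
    return sum;
}

/* Same sum with the literal 7, which the compiler is free to turn into
   a multiply and shift. */
static int64_t sum_div_const(int32_t n) {
    int64_t sum = 0;
    int32_t i;
    for (i = 0; i < n; ++i)
        sum += i / 7;
    return sum;
}

/* CPU seconds spent in f(n). The result is stored through out so that
   the call is not optimized away. */
static double time_call(int64_t (*f)(int32_t), int32_t n, int64_t *out) {
    clock_t start;
    assert(f != NULL && out != NULL);
    start = clock();
    *out = f(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_call once with each function and a large n, then comparing the two returned times, gives a first impression. The two sums should be identical, which is a cheap sanity check that both loops did the same work.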

Jun 26 2011

Timon Gehr <timon.gehr gmx.ch> writes:

Don wrote:
> bearophile wrote:
>> Don:
>>
>> Sorry for my slow answer, I was quite busy for days.
>>
>>> I've never heard that claim before. Do you have evidence for that?
>>
>> I compare/convert code to D every day, so I am aware that D code compiled with
>> DMD is often slower than C/C++ code compiled with GCC. For some years I have
>> even kept a collection of snippets of slow code.
>>
>> But I am also aware that the low performance has many different causes, like
>> some missing inlining, missing loop unrolling, etc, so spotting a clear and
>> small case of integer arithmetic code that causes a slowdown, to give you
>> evidence, is not easy. So I am sorry for my overly broad claim.
>
> It is true in general that DMD's inliner is not very good. I _suspect_
> that it is the primary cause of most instances of poor integer
> performance. It's actually part of the front-end, not the back-end. So
> many of those performance problems won't apply to DMC.
>
> It's also true that the DMD/DMC instruction scheduler doesn't schedule
> for modern processors. But last I checked, GCC wasn't really much better
> in practice (you have to be almost perfect to get a benefit from
> instruction scheduling these days, the hardware does a very good job on
> unscheduled code).
>
> Otherwise, I don't think there's any major optimisation it misses. But
> it's quite likely that it misses several very specific minor optimizations.
>
>>> If it is true, there's a strong possibility that it's a small, fixable issue
>>> (for example, DMD used to have terrible performance for ulong multiplication).
>>
>> You are right, the case I'm going to show is a precise problem that's fixable.
>
> [snip]
>> -----------------------
>>
>> For a more realistic case see:
>> http://d.puremagic.com/issues/show_bug.cgi?id=5607
>
> Thanks, that's helpful. It's a major speed difference (factor of 20,
> maybe) so it wouldn't have to occur very often to be noticeable.

You may also want to have a look at this paper:

http://www.agner.org/optimize/optimizing_cpp.pdf

I don't know if it still accurately reflects the current state though.
Interestingly, it says that DMC is already able to perform the optimization
requested by bearophile.

A table starting on page 73 is quite specific about which optimizations the
DMC backend is lacking.

Cheers,
-Timon

Jun 26 2011

Don <nospam nospam.com> writes:

Timon Gehr wrote:
> Don wrote:
>> bearophile wrote:
>>> Don:
>>>
>>> Sorry for my slow answer, I was quite busy for days.
>>>
>>>> I've never heard that claim before. Do you have evidence for that?
>>>
>>> I compare/convert code to D every day, so I am aware that D code compiled with
>>> DMD is often slower than C/C++ code compiled with GCC. For some years I have
>>> even kept a collection of snippets of slow code.
>>>
>>> But I am also aware that the low performance has many different causes, like
>>> some missing inlining, missing loop unrolling, etc, so spotting a clear and
>>> small case of integer arithmetic code that causes a slowdown, to give you
>>> evidence, is not easy. So I am sorry for my overly broad claim.
>>
>> It is true in general that DMD's inliner is not very good. I _suspect_
>> that it is the primary cause of most instances of poor integer
>> performance. It's actually part of the front-end, not the back-end. So
>> many of those performance problems won't apply to DMC.
>>
>> It's also true that the DMD/DMC instruction scheduler doesn't schedule
>> for modern processors. But last I checked, GCC wasn't really much better
>> in practice (you have to be almost perfect to get a benefit from
>> instruction scheduling these days, the hardware does a very good job on
>> unscheduled code).
>>
>> Otherwise, I don't think there's any major optimisation it misses. But
>> it's quite likely that it misses several very specific minor optimizations.
>>
>>>> If it is true, there's a strong possibility that it's a small, fixable issue
>>>> (for example, DMD used to have terrible performance for ulong multiplication).
>>>
>>> You are right, the case I'm going to show is a precise problem that's fixable.
>>
>> [snip]
>>> -----------------------
>>>
>>> For a more realistic case see:
>>> http://d.puremagic.com/issues/show_bug.cgi?id=5607
>>
>> Thanks, that's helpful. It's a major speed difference (factor of 20,
>> maybe) so it wouldn't have to occur very often to be noticeable.
>
> You may also want to have a look at this paper:
>
> http://www.agner.org/optimize/optimizing_cpp.pdf
>
> I don't know if it still accurately reflects the current state though.

It's a little out of date; DMD now does a couple of things it didn't do 
when Agner did the testing. Incidentally I contributed a bit to that 
paper <g>.

> Interestingly, it says that DMC is already able to perform the optimization
> requested by bearophile.
>
> A table starting on page 73 is quite specific about which optimizations the
> DMC backend is lacking.
>
> Cheers,
> -Timon

Jun 26 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 6/26/2011 2:24 PM, Don wrote:
>> You may also want to have a look at this paper:
>>
>> http://www.agner.org/optimize/optimizing_cpp.pdf
>>
>> I don't know if it still accurately reflects the current state though.
>
> It's a little out of date; DMD now does a couple of things it didn't do when
> Agner did the testing. Incidentally I contributed a bit to that paper <g>.

The table has several errors wrt DMC++. For example, DMC++ certainly does 
function inlining, constant propagation and the branch optimizations.

Jun 26 2011

Caligo <iteronvexor gmail.com> writes:

Kind of off topic, but a good place to get benchmark results for many
of the programming languages is Sphere Online Judge:
http://www.spoj.pl/problems/classical/

They accept solutions in D, but not many have been submitted.  I found a few:

http://www.spoj.pl/ranks/FCTRL/lang=D
http://www.spoj.pl/ranks/HASHIT/lang=D
http://www.spoj.pl/ranks/ONP/lang=D

Most of the fastest solutions are in C++, but D is pretty close.
Maybe we could start submitting solutions :-)

Jun 28 2011
