
digitalmars.D - Re: Programming language benchmark

reply bearophile <bearophileHUGS lycos.com> writes:
Don:

Sorry for my slow answer, I was quite busy for days.


 I've never heard that claim before. Do you have evidence for that?

I compare/convert code to D every day, so I am aware that D code compiled with DMD is often slower than C/C++ code compiled with GCC. For some years I have even kept a collection of snippets of slow code. But I am also aware that the low performance has many different causes, such as missing inlining, missing loop unrolling, etc., so spotting a clear and small case of integer arithmetic code that causes a slowdown, to give you evidence, is not easy. So I am sorry for my overly broad claim.

 If it is true, there's a strong possibility that it's a small, fixable issue
 (for example, DMD used to have terrible performance for ulong multiplication).

You are right, the case I'm going to show is a precise problem that's fixable.

-----------------------

// C code
#include <limits.h>
#include <stdio.h>

int divideBySeven(int x) {
    return x / 7;
}

int main() {
    int i = INT_MAX;
    int r;
    while (i--)
        r = divideBySeven(i);
    printf("%d\n", r);
    return 0;
}

-----------------------

// D code
import core.stdc.stdio : printf; // current compilers need this import for printf

int divideBySeven(int x) {
    return x / 7;
}

void main() {
    int i = int.max;
    int r;
    while (i--)
        r = divideBySeven(i);
    printf("%d\n", r);
}

-----------------------

Asm from the C version:

_divideBySeven:
    pushl   %ebx
    movl    $-1840700269, %ebx
    movl    8(%esp), %ecx
    movl    %ebx, %eax
    popl    %ebx
    imull   %ecx
    leal    (%edx,%ecx), %eax
    sarl    $31, %ecx
    sarl    $2, %eax
    subl    %ecx, %eax
    ret

_main:
    leal    4(%esp), %ecx
    andl    $-16, %esp
    pushl   -4(%ecx)
    pushl   %ebx
    movl    $-1840700269, %ebx
    pushl   %ecx
    subl    $20, %esp
    call    ___main
    movl    $2147483646, %ecx
    .p2align 4,,10
L4:
    movl    %ecx, %eax
    imull   %ebx
    movl    %ecx, %eax
    addl    %ecx, %edx
    sarl    $31, %eax
    sarl    $2, %edx
    decl    %ecx
    subl    %eax, %edx
    cmpl    $-1, %ecx
    jne     L4
    movl    %edx, 4(%esp)
    movl    $LC0, (%esp)
    call    _printf
    addl    $20, %esp
    xorl    %eax, %eax
    popl    %ecx
    popl    %ebx
    leal    -4(%ecx), %esp
    ret
    .def    _printf;    .scl    2;    .type    32;    .endef

-----------------------

Asm from the D version:

_D9int_div_d13divideBySevenFiZi comdat
        mov     ECX,7
        cdq
        idiv    ECX
        ret

__Dmain comdat
L0:     push    EAX
        push    EBX
        mov     EBX,07FFFFFFFh
        push    ESI
        xor     ESI,ESI
        test    EBX,EBX
        lea     EBX,-1[EBX]
        je      L24
L11:    mov     EAX,EBX
        mov     ECX,7
        cdq
        idiv    ECX
        test    EBX,EBX
        mov     ESI,EAX
        lea     EBX,-1[EBX]
        jne     L11
L24:    push    ESI
        mov     EDX,offset FLAT:_DATA
        push    EDX
        call    near ptr _printf
        add     ESP,8
        xor     EAX,EAX
        pop     ESI
        pop     EBX
        pop     ECX
        ret

-----------------------

For a more real case see:
http://d.puremagic.com/issues/show_bug.cgi?id=5607

Bye,
bearophile
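
For readers who have not seen the trick: the GCC listing above avoids the divide instruction by multiplying by the constant -1840700269 (0x92492493), keeping the high 32 bits of the product, and then correcting with an add and two shifts. A minimal C sketch of that same sequence follows. It assumes a 32-bit int and an arithmetic right shift on negative values (implementation-defined in C, though it is what GCC does); the names div7_magic and check_div7 are made up for illustration and do not come from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signed division by 7 without a divide instruction, mirroring the
   instruction sequence in the GCC listing (hypothetical helper name). */
static int32_t div7_magic(int32_t x) {
    const int32_t magic = -1840700269;  /* 0x92492493, the constant GCC emits */
    /* imull: high 32 bits of the 64-bit signed product.
       Assumes >> on a negative value is an arithmetic shift. */
    int32_t hi = (int32_t)(((int64_t)magic * x) >> 32);
    hi += x;                /* addl %ecx, %edx */
    hi >>= 2;               /* sarl $2, %edx   */
    return hi - (x >> 31);  /* subtract the sign word (-1 if x < 0, else 0) */
}

/* Compare against the plain division on a few boundary values. */
static void check_div7(void) {
    const int32_t samples[] = {0, 1, 6, 7, 8, -1, -6, -7, -8,
                               100, -100, INT32_MAX, INT32_MIN};
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(div7_magic(samples[i]) == samples[i] / 7);
}
```

Calling check_div7() from a main function exercises the boundary cases. The D listing, by contrast, executes idiv on every iteration, which is the slow part.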
Jun 24 2011
parent reply Don <nospam nospam.com> writes:
bearophile wrote:
I compare/convert code to D every day, so I am aware that D code compiled with DMD is often slower than C/C++ code compiled with GCC. For some years I have even kept a collection of snippets of slow code. But I am also aware that the low performance has many different causes, such as missing inlining, missing loop unrolling, etc., so spotting a clear and small case of integer arithmetic code that causes a slowdown, to give you evidence, is not easy. So I am sorry for my overly broad claim.

It is true in general that DMD's inliner is not very good. I _suspect_ that it is the primary cause of most instances of poor integer performance. It's actually part of the front-end, not the back-end. So many of those performance problems won't apply to DMC.

It's also true that the DMD/DMC instruction scheduler doesn't schedule for modern processors. But last I checked, GCC wasn't really much better in practice (you have to be almost perfect to get a benefit from instruction scheduling these days, the hardware does a very good job on unscheduled code).

Otherwise, I don't think there's any major optimisation it misses. But it's quite likely that it misses several very specific minor optimizations.

 If it is true, there's a strong possibility that it's a small, fixable issue
(for example, DMD used to have terrible performance for ulong multiplication).<

You are right, the case I'm going to show is a precise problem that's fixable.

[snip]
 -----------------------
 
 For a more real case see:
 http://d.puremagic.com/issues/show_bug.cgi?id=5607

Thanks, that's helpful. It's a major speed difference (factor of 20, maybe) so it wouldn't have to occur very often to be noticeable.
Jun 26 2011
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
Don wrote:
Thanks, that's helpful. It's a major speed difference (factor of 20, maybe) so it wouldn't have to occur very often to be noticeable.

You may also want to have a look at this paper:
http://www.agner.org/optimize/optimizing_cpp.pdf

I don't know if it still accurately reflects the current state though.

Interestingly, it says that DMC is already able to perform the optimization requested by bearophile.

On page 73 starts a table that is quite specific about which optimizations the DMC backend is lacking.

Cheers,
-Timon
Jun 26 2011
parent reply Don <nospam nospam.com> writes:
Timon Gehr wrote:
You may also want to have a look at this paper: http://www.agner.org/optimize/optimizing_cpp.pdf I don't know if it still accurately reflects the current state though.

It's a little out of date, DMD now does a couple of things it didn't do when Agner did the testing. Incidentally I contributed a bit to that paper <g>.
 Interestingly, it says that DMC is already able to perform the optimization
 requested by bearophile.
 
 On page 73 starts a table that is quite specific about which optimizations
 the DMC backend is lacking.
 
 Cheers,
 -Timon

Jun 26 2011
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/26/2011 2:24 PM, Don wrote:
 You may also want to have a look at this paper:

 http://www.agner.org/optimize/optimizing_cpp.pdf

 I don't know if it still accurately reflects the current state though.

It's a little out of date, DMD now does a couple of things it didn't do when Agner did the testing. Incidentally I contributed a bit to that paper <g>.

The table has several errors wrt DMC++. For example, DMC++ certainly does function inlining, constant propagation and the branch optimizations.
Jun 26 2011
prev sibling parent Caligo <iteronvexor gmail.com> writes:
Kind of off topic, but a good place to get benchmark results for many
of the programming languages is Sphere Online Judge:
http://www.spoj.pl/problems/classical/

They accept solutions in D, but not many have been submitted.  I found a few:

http://www.spoj.pl/ranks/FCTRL/lang=D
http://www.spoj.pl/ranks/HASHIT/lang=D
http://www.spoj.pl/ranks/ONP/lang=D

Most of the fastest solutions are in C++, but D is pretty close.
Maybe we could start submitting solutions :-)
Jun 28 2011