digitalmars.D - Re: Programming language benchmark
- bearophile <bearophileHUGS lycos.com> Jun 24 2011
- Don <nospam nospam.com> Jun 26 2011
- Timon Gehr <timon.gehr gmx.ch> Jun 26 2011
- Don <nospam nospam.com> Jun 26 2011
- Walter Bright <newshound2 digitalmars.com> Jun 26 2011
- Caligo <iteronvexor gmail.com> Jun 28 2011
Don: Sorry for my slow answer, I was quite busy for days.I've never heard that claim before. Do you have evidence for that?
I compare/convert code to D every day, so I am aware that D code compiled with DMD is often slower than C/C++ code compiled with GCC. Since some years I even keep a collection of snippets of slow code. But I am also aware that the low performance has many different causes, like some missing inlining, missing loop unrolling, etc, so spotting a clear and small case of integer arithmetic code that causes a slow down, to give you evidence, is not easy. So I am sorry for my overly broad claim.If it is true, there's a strong possibility that it's a small, fixable issue (for example, DMD used to have terrible performance for ulong multiplication).<
You are right, the case I'm going to show is a precise problem that's fixable. ----------------------- // C code #include "limits.h" #include "stdio.h" int divideBySeven(int x) { return x / 7; } int main() { int i = INT_MAX; int r; while (i--) r = divideBySeven(i); printf("%d\n", r); return 0; } ----------------------- // D code int divideBySeven(int x) { return x / 7; } void main() { int i = int.max; int r; while (i--) r = divideBySeven(i); printf("%d\n", r); } ----------------------- Asm from the C version: _divideBySeven: pushl %ebx movl $-1840700269, %ebx movl 8(%esp), %ecx movl %ebx, %eax popl %ebx imull %ecx leal (%edx,%ecx), %eax sarl $31, %ecx sarl $2, %eax subl %ecx, %eax ret _main: leal 4(%esp), %ecx andl $-16, %esp pushl -4(%ecx) pushl %ebx movl $-1840700269, %ebx pushl %ecx subl $20, %esp call ___main movl $2147483646, %ecx .p2align 4,,10 L4: movl %ecx, %eax imull %ebx movl %ecx, %eax addl %ecx, %edx sarl $31, %eax sarl $2, %edx decl %ecx subl %eax, %edx cmpl $-1, %ecx jne L4 movl %edx, 4(%esp) movl $LC0, (%esp) call _printf addl $20, %esp xorl %eax, %eax popl %ecx popl %ebx leal -4(%ecx), %esp ret .def _printf; .scl 2; .type 32; .endef ----------------------- Asm from the D version: _D9int_div_d13divideBySevenFiZi comdat mov ECX,7 cdq idiv ECX ret __Dmain comdat L0: push EAX push EBX mov EBX,07FFFFFFFh push ESI xor ESI,ESI test EBX,EBX lea EBX,-1[EBX] je L24 L11: mov EAX,EBX mov ECX,7 cdq idiv ECX test EBX,EBX mov ESI,EAX lea EBX,-1[EBX] jne L11 L24: push ESI mov EDX,offset FLAT:_DATA push EDX call near ptr _printf add ESP,8 xor EAX,EAX pop ESI pop EBX pop ECX ret ----------------------- For a more real case see: http://d.puremagic.com/issues/show_bug.cgi?id=5607 Bye, bearophile
Jun 24 2011
bearophile wrote:Don: Sorry for my slow answer, I was quite busy for days.I've never heard that claim before. Do you have evidence for that?
I compare/convert code to D every day, so I am aware that D code compiled with DMD is often slower than C/C++ code compiled with GCC. Since some years I even keep a collection of snippets of slow code. But I am also aware that the low performance has many different causes, like some missing inlining, missing loop unrolling, etc, so spotting a clear and small case of integer arithmetic code that causes a slow down, to give you evidence, is not easy. So I am sorry for my overly broad claim.
It is true in general that DMD's inliner is not very good. I _suspect_ that it is the primary cause of most instances of poor integer performance. It's actually part of the front-end, not the back-end. So many of those performance problems won't apply to DMC. It's also true that the DMD/DMC instruction scheduler doesn't schedule for modern processors. But last I checked, GCC wasn't really much better in practice (you have to be almost perfect to get a benefit from instruction scheduling these days, the hardware does a very good job on unscheduled code). Otherwise, I don't think there's any major optimisation it misses. But it's quite likely that it misses several very specific minor optimizations.If it is true, there's a strong possibility that it's a small, fixable issue (for example, DMD used to have terrible performance for ulong multiplication).<
You are right, the case I'm going to show is a precise problem that's fixable.
[snip]----------------------- For a more real case see: http://d.puremagic.com/issues/show_bug.cgi?id=5607
Thanks, that's helpful. It's a major speed difference (factor of 20, maybe) so it wouldn't have to occur very often to be noticeable.
Jun 26 2011
Don wrote:bearophile wrote:Don: Sorry for my slow answer, I was quite busy for days.I've never heard that claim before. Do you have evidence for that?
I compare/convert code to D every day, so I am aware that D code compiled with
collection of snippets of slow code. But I am also aware that the low performance has many different causes, like
case ofinteger arithmetic code that causes a slow down, to give you evidence, is not
It is true in general that DMD's inliner is not very good. I _suspect_ that it is the primary cause of most instances of poor integer performance. It's actually part of the front-end, not the back-end. So many of those performance problems won't apply to DMC. It's also true that the DMD/DMC instruction scheduler doesn't schedule for modern processors. But last I checked, GCC wasn't really much better in practice (you have to be almost perfect to get a benefit from instruction scheduling these days, the hardware does a very good job on unscheduled code). Otherwise, I don't think there's any major optimisation it misses. But it's quite likely that it misses several very specific minor optimizations.If it is true, there's a strong possibility that it's a small, fixable issue
You are right, the case I'm going to show is a precise problem that's fixable.
[snip]----------------------- For a more real case see: http://d.puremagic.com/issues/show_bug.cgi?id=5607
Thanks, that's helpful. It's a major speed difference (factor of 20, maybe) so it wouldn't have to occur very often to be noticeable.
You may also want to have a look at this paper: http://www.agner.org/optimize/optimizing_cpp.pdf I don't know if it still accurately reflects the current state though. Interestingly, it says that DMC is already able to perform the optimization requested by bearophile. On page 73 starts a tabular that is quite specific about which optimizations the DMC backend is lacking. Cheers, -Timon
Jun 26 2011
Timon Gehr wrote:Don wrote:bearophile wrote:Don: Sorry for my slow answer, I was quite busy for days.I've never heard that claim before. Do you have evidence for that?
collection of snippets of slow code. But I am also aware that the low performance has many different causes, like
case ofinteger arithmetic code that causes a slow down, to give you evidence, is not
It is true in general that DMD's inliner is not very good. I _suspect_ that it is the primary cause of most instances of poor integer performance. It's actually part of the front-end, not the back-end. So many of those performance problems won't apply to DMC. It's also true that the DMD/DMC instruction scheduler doesn't schedule for modern processors. But last I checked, GCC wasn't really much better in practice (you have to be almost perfect to get a benefit from instruction scheduling these days, the hardware does a very good job on unscheduled code). Otherwise, I don't think there's any major optimisation it misses. But it's quite likely that it misses several very specific minor optimizations.If it is true, there's a strong possibility that it's a small, fixable issue
You are right, the case I'm going to show is a precise problem that's fixable.
----------------------- For a more real case see: http://d.puremagic.com/issues/show_bug.cgi?id=5607
Thanks, that's helpful. It's a major speed difference (factor of 20, maybe) so it wouldn't have to occur very often to be noticeable.
You may also want to have a look at this paper: http://www.agner.org/optimize/optimizing_cpp.pdf I don't know if it still accurately reflects the current state though.
It's a little out of date, DMD now does a couple of things it didn't do when Agner did the testing. Incidentally I contributed a bit to that paper <g>.Interestingly, it says that DMC is already able to perform the optimization requested by bearophile. On page 73 starts a tabular that is quite specific about which optimizations the DMC backend is lacking. Cheers, -Timon
Jun 26 2011
On 6/26/2011 2:24 PM, Don wrote:You may also want to have a look at this paper: http://www.agner.org/optimize/optimizing_cpp.pdf I don't know if it still accurately reflects the current state though.
It's a little out of date, DMD now does a couple of things it didn't do when Agner did the testing. Incidentally I contributed a bit to that paper <g>.
The table has several errors wrt DMC++. For example, DMC++ certainly does function inlining, constant propagation and the branch optimizations.
Jun 26 2011
Kind of off topic, but a good place to get benchmark results for many of the programming languages is Sphere Online Judge: http://www.spoj.pl/problems/classical/ They accept solutions in D, but not many have been submitted. I found a few: http://www.spoj.pl/ranks/FCTRL/lang=D http://www.spoj.pl/ranks/HASHIT/lang=D http://www.spoj.pl/ranks/ONP/lang=D Most of the fastest solutions are in C++, but D is pretty close. Maybe we could start submitting solutions :-)
Jun 28 2011