digitalmars.D.learn - Why DMD is so slow?
- Marco <marco.falda gmail.com> Jun 02 2008
- Marco <marco.falda gmail.com> Jun 02 2008
- Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Jun 02 2008
- "Saaa" <empty needmail.com> Jun 02 2008
- janderson <askme me.com> Jun 02 2008
- janderson <askme me.com> Jun 03 2008
- "Unknown W. Brackets" <unknown simplemachines.org> Jun 03 2008
- Robert Fraser <fraserofthenight gmail.com> Jun 03 2008
- Chris Wright <dhasenan gmail.com> Jun 03 2008
- "Koroskin Denis" <2korden gmail.com> Jun 03 2008
- "Saaa" <empty needmail.com> Jun 03 2008
- "Unknown W. Brackets" <unknown simplemachines.org> Jun 03 2008
- "Saaa" <empty needmail.com> Jun 03 2008
- "Dave" <Dave_member pathlink.com> Jun 03 2008
- Fawzi Mohamed <fmohamed mac.com> Jun 04 2008
I have written the code below to test the execution speed of D on Windows, and I have found that the same code is about 10 times slower when compiled with DMD than with GDC: 1.9 minutes with DMD vs. 1.84 seconds with GDC! Is there perhaps something wrong? Why such a difference?
Thank you.
// begin of file mandel_d1.d
/*
DMD: dmd -inline -release -O mandel_d1.d
GDC: gdc -O3 --fast-math -inline -lgphobos --expensive-optimizations mandel_d1.d
*/
import std.stdio;

int main()
{
    cdouble a, b, c, z;
    double mand_re = 0, mand_im = 0;
    for (double y = -2; y < 2; y += 0.01) {
        for (double x = -2; x < 2; x += 0.01) {
            z = (x + mand_re) + (y + mand_im) * 1i;
            c = z;
            for (int i = 0; i < 10000; i++) {
                z = z * z + c;
                if (z.re * z.re + z.im * z.im > 4.0) {
                    break;
                }
            }
        }
    }
    return 0;
}
// end of file mandel_d1.d
------------------
H:\Codici\Benchmarks> ..\timethis.exe mandel_d1
TimeThis : Command Line : mandel_d1
TimeThis : Start Time : Mon Jun 02 11:28:41 2008
TimeThis : Command Line : mandel_d1
TimeThis : Start Time : Mon Jun 02 11:28:41 2008
TimeThis : End Time : Mon Jun 02 11:30:35 2008
TimeThis : Elapsed Time : 00:01:54.234
H:\Codici\Benchmarks> ..\timethis mandel_gdc1
TimeThis : Command Line : mandel_gdc1
TimeThis : Start Time : Mon Jun 02 11:42:27 2008
TimeThis : Command Line : mandel_gdc1
TimeThis : Start Time : Mon Jun 02 11:42:27 2008
TimeThis : End Time : Mon Jun 02 11:42:29 2008
TimeThis : Elapsed Time : 00:00:01.843
Jun 02 2008
I am sorry: 1 minute vs. 1 second is 60 times, not 10! (I was reasoning in terms of order of magnitude; the actual ratio here, 114.2 s / 1.84 s, is about 62.)
Jun 02 2008
Marco wrote:I have written the code reported below to test execution speed of D in Windows and I have found that the same code is about 10 times slower if compiled using DMD w.r.t. GCD: 1.9 minutes in DMD vs 1.84 seconds in GDC ! Is there perhaps something wrong? why such a difference?
cdouble a, b, c, z; double mand_re = 0, mand_im = 0;
IIRC DMD's backend is just not as good at optimizing floating-point code as GDC's backend (GCC) is.
Jun 02 2008
This outputs ~2660 ms on my Pentium 4.
Try profiling your code, or comparing the asm.
------------------
import std.stdio;
import std.perf; // PerformanceCounter (D1 Phobos)

void main()
{
    auto timer = new PerformanceCounter;
    timer.start();
    cdouble a, b, c, z;
    double mand_re = 0, mand_im = 0;
    for (double y = -2; y < 2; y += 0.01) {
        for (double x = -2; x < 2; x += 0.01) {
            z = (x + mand_re) + (y + mand_im) * 1i;
            c = z;
            for (int i = 0; i < 10000; i++) {
                z = z * z + c;
                if (z.re * z.re + z.im * z.im > 4.0) {
                    break;
                }
            }
        }
    }
    timer.stop();
    long elapsedMsec = timer.milliseconds; // milliseconds() returns a 64-bit count
    writefln("Time elapsed: %s msec", elapsedMsec);
}
Jun 02 2008
It would be interesting to compare the asm produced. In my experience DMD is not that great at floating point, doesn't unroll loops as well, and has a longer startup time than GDC. -Joel
Jun 02 2008
DMD does seem to beat GDC on some tests. From memory, I think it's better than GDC at integer code. -Joel
Jun 03 2008
As everyone has said, these are problems in DMD and DMC. DMD is not a bad compiler, but it doesn't have all of the optimizations that some other compilers have. For example, Microsoft's cl and Intel's icc might well beat gcc at this too (although they don't currently compile D code). Anyway, it's a matter of priorities. Improving the performance of DMD-compiled programs is great, but making the D language work is more important. If GDC can do a good optimization job, that's great for it imho. -[Unknown]
Jun 03 2008
Unknown W. Brackets wrote:As everyone has said, these are problems in DMD and DMC. DMD is not a bad compiler, but it doesn't have all of the optimizations that some other compilers are. For example, Microsoft's cl and Intel's icc might possibly beat gcc at this too (although they don't currently compile D code.) Anyway, it's a matter of priorities. Improving the performance of DMD compiled programs is great, but making the D language work is more important. If GDC can do a good optimization job, that's great for it imho.
Which is why I think the LLVM project is so important. Many languages -> one optimizer -> many targets
Jun 03 2008
Robert Fraser wrote:
Which is why I think the LLVM project is so important. Many languages -> one optimizer -> many targets
GCC already offers that. On the other hand, I've read papers where people modified GCC for research purposes. One month spent on algorithms, one month on implementation, four months just learning how GCC works and where to insert the code. I would guess that LLVM is currently well factored and much more extensible.
Jun 03 2008
On Tue, 03 Jun 2008 16:23:07 +0400, Robert Fraser <fraserofthenight gmail.com> wrote:
Which is why I think the LLVM project is so important. Many languages -> one optimizer -> many targets
The same goes for GCC as well.
Jun 03 2008
Unknown W. Brackets wrote:
As everyone has said, these are problems in DMD and DMC.
Did anybody verify that DMC is faster? I don't have DMC, but my DMD build ran in 2.6 s instead of more than a minute.
Jun 03 2008
I'm sure DMC is no faster than DMD here - the problem is the backend optimizer. Many benchmarks (especially concerning floating point) have shown this. -[Unknown]
Jun 03 2008
I meant GDC :/ The original post reports a runtime of more than one minute with DMD, and I can't replicate that (on a reasonable CPU). Or did I miss something?
Jun 03 2008
"Saaa" <empty needmail.com> wrote in message news:g240q6$14cm$1 digitalmars.com...I meant GDC :/ The original post reports a more than one minute runtime using DMD, I can't replicate that (with a reasonable cpu). Or did I miss something ..
I think you're on to something. I get wildly different timings over several runs, and sometimes get a (much) faster time _without_ the -O switch on a P4. There is no way that code should be that much slower with DMD than with GDC... It's a bug. It's probably an alignment issue, but I wouldn't be surprised to see incorrect results from DMD either. The OP should report that code and the results as a bug. C++ code compiled with DMC probably wouldn't reproduce it, because the D version uses the built-in complex type, which is probably the heart of the bug. http://d.puremagic.com/issues/enter_bug.cgi - Dave
Jun 03 2008
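Dave's guess that the built-in complex type is at the heart of the problem can be checked by writing the same iteration with two plain doubles. Expanding the square by hand, (zr + i*zi)^2 + (cr + i*ci) = (zr*zr - zi*zi + cr) + i*(2*zr*zi + ci), so the loop below should perform the same arithmetic as the cdouble version in mandel_d1.d without ever touching cdouble. This is only a sketch: the helper name escapeIterations is hypothetical, and, like the original benchmark, it starts from z = c.

```d
import std.stdio;

// Hypothetical helper: iterate z = z*z + c starting from z = c (as in
// mandel_d1.d), using two doubles instead of cdouble.
// Returns the iteration index at which |z|^2 first exceeded 4,
// or maxIter if the point never escaped.
int escapeIterations(double cr, double ci, int maxIter)
{
    double zr = cr, zi = ci;
    int i;
    for (i = 0; i < maxIter; i++)
    {
        // (zr + i*zi)^2 + (cr + i*ci), expanded into real and imaginary parts
        double newRe = zr * zr - zi * zi + cr;
        double newIm = 2.0 * zr * zi + ci;
        zr = newRe;
        zi = newIm;
        if (zr * zr + zi * zi > 4.0)
            break;
    }
    return i;
}

void main()
{
    writefln("%s", escapeIterations(0.0, 0.0, 10000)); // never escapes: 10000
    writefln("%s", escapeIterations(2.0, 0.0, 10000)); // escapes on the first step: 0
}
```

If a DMD build of a grid loop around this function runs in roughly the same time as the GDC build, that would point at the complex-type code path rather than at DMD's floating-point code generation in general.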
On 2008-06-04 02:49:33 +0200, "Dave" <Dave_member pathlink.com> said:
It's a bug. It's probably an alignment issue, but I wouldn't be surprised to see incorrect results for DMD either. The OP should post that code and the results as a bug.
Yes, I think this is probably a bug and should be reported. Maybe the unoptimized version is actually faster. In any case, one should *never* benchmark code that does not print or otherwise use something that depends on what has been calculated: 1) there is no way to verify that the calculation was correct; 2) a smart compiler might even optimize away the whole calculation (correctly, because it is not needed). In this case it is possible that some bug makes NaNs appear, and depending on the processor's IEEE compliance settings, NaNs can slow down the calculation a lot (I have seen a factor of 100 in some calculations). Fawzi
Jun 04 2008
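Following Fawzi's advice, a benchmark can accumulate something that depends on every computed point and print it, so the work cannot legally be discarded and the DMD and GDC builds can be checked against each other. The sketch below assumes the same grid and iteration limit as mandel_d1.d but uses plain doubles; GridResult and runGrid are hypothetical names. It also counts points whose final z contains a NaN: because NaN > 4.0 is false, such a point never escapes and always runs the full iteration limit, which would inflate the runtime even before any slowdown from NaN arithmetic itself.

```d
import std.math : isNaN;
import std.stdio;

// Hypothetical result type: a checksum over the whole grid plus a NaN count.
struct GridResult
{
    long totalIterations; // sum of escape iterations over all points
    int nanPoints;        // points whose final z has a NaN component
}

// Hypothetical driver: same loop structure as mandel_d1.d, with the
// complex arithmetic written out on two doubles.
GridResult runGrid(double step, int maxIter)
{
    GridResult r;
    for (double y = -2; y < 2; y += step)
    {
        for (double x = -2; x < 2; x += step)
        {
            double zr = x, zi = y; // z starts at c = x + i*y
            int i;
            for (i = 0; i < maxIter; i++)
            {
                double newRe = zr * zr - zi * zi + x;
                double newIm = 2.0 * zr * zi + y;
                zr = newRe;
                zi = newIm;
                if (zr * zr + zi * zi > 4.0)
                    break;
            }
            r.totalIterations += i;
            if (isNaN(zr) || isNaN(zi))
                r.nanPoints++;
        }
    }
    return r;
}

void main()
{
    // Same grid step and iteration limit as the original benchmark.
    GridResult r = runGrid(0.01, 10000);
    // Printing the result keeps the work observable and lets two builds be compared.
    writefln("total iterations: %s, NaN points: %s", r.totalIterations, r.nanPoints);
}
```

Timing can still be done externally (for example with timethis, as in the original post); if the two builds print different totals or a nonzero NaN count, that is a correctness bug rather than just a slow optimizer.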