digitalmars.D.learn - Why DMD is so slow?

Marco <marco.falda gmail.com> writes:

I have written the code reported below to test the execution speed of D on Windows,
and I have found that the same code is about 10 times slower when compiled with
DMD than with GDC: 1.9 minutes with DMD vs 1.84 seconds with GDC! Is there perhaps
something wrong? Why such a difference?
Thank you.

// begin of file mandel_d1.d
/*
DMD: dmd -inline -release -O mandel_d1.d
GDC: gdc -O3 --fast-math -inline -lgphobos --expensive-optimizations mandel_d1.d
*/

import std.stdio;

int main()
{
 cdouble a, b, c, z;
 double mand_re = 0, mand_im = 0;

 for (double y = -2; y < 2; y += 0.01) {
  for (double x = -2; x < 2; x += 0.01) {
   z = (x + mand_re) + (y + mand_im) * 1i;
   c = z;
   for (int i = 0; i < 10000; i++) {
    z = z * z + c;
    if(z.re * z.re + z.im * z.im > 4.0) {
     break;
    }
   }
  }
 }
 return 0;
}
// end of file mandel_d1.d
------------------

H:\Codici\Benchmarks> ..\timethis.exe mandel_d1

TimeThis :  Command Line :  mandel_d1
TimeThis :    Start Time :  Mon Jun 02 11:28:41 2008


TimeThis :  Command Line :  mandel_d1
TimeThis :    Start Time :  Mon Jun 02 11:28:41 2008
TimeThis :      End Time :  Mon Jun 02 11:30:35 2008
TimeThis :  Elapsed Time :  00:01:54.234

H:\Codici\Benchmarks> ..\timethis mandel_gdc1

TimeThis :  Command Line :  mandel_gdc1
TimeThis :    Start Time :  Mon Jun 02 11:42:27 2008


TimeThis :  Command Line :  mandel_gdc1
TimeThis :    Start Time :  Mon Jun 02 11:42:27 2008
TimeThis :      End Time :  Mon Jun 02 11:42:29 2008
TimeThis :  Elapsed Time :  00:00:01.843
Jun 02 2008
Marco <marco.falda gmail.com> writes:
I am sorry: 1 minute vs 1 second is 60 times! (I was reasoning in terms of
order of magnitude, but in this case the multiple is 60).
Jun 02 2008
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Marco wrote:
 I have written the code reported below to test execution speed of D in Windows
and I have found that the same code is about 10 times slower if compiled using
DMD w.r.t. GDC: 1.9 minutes in DMD vs 1.84 seconds in GDC ! Is there perhaps
something wrong? why such a difference?

  cdouble a, b, c, z;
  double mand_re = 0, mand_im = 0;

IIRC DMD's backend is just not as good at optimizing floating-point code as GDC's backend (GCC) is.
Jun 02 2008
"Saaa" <empty needmail.com> writes:
This outputs ~2660ms on my Pentium 4.
Try profiling your code, or comparing the asm.

------------------

import std.perf;   // assumed: the D1 Phobos module that provides PerformanceCounter
import std.stdio;

void main()
{
 auto timer = new PerformanceCounter;
 timer.start();

 cdouble a, b, c, z;
 double mand_re = 0, mand_im = 0;

 // Same loops as in the original post, wrapped in a timer.
 for (double y = -2; y < 2; y += 0.01) {
  for (double x = -2; x < 2; x += 0.01) {
   z = (x + mand_re) + (y + mand_im) * 1i;
   c = z;
   for (int i = 0; i < 10000; i++) {
    z = z * z + c;
    if(z.re * z.re + z.im * z.im > 4.0) {
     break;
    }
   }
  }
 }

 timer.stop();
 auto elapsedMsec = timer.milliseconds;
 writefln("Time elapsed: %s msec", elapsedMsec);
}
Jun 02 2008
janderson <askme me.com> writes:
It would be interesting to compare the ASM produced. DMD is not that great at floating point, doesn't unroll so well and has a longer startup time than GDC in my experience.

-Joel
Jun 02 2008
janderson <askme me.com> writes:
janderson wrote:
It would be interesting to compare the ASM produced. DMD is not that great at floating point, doesn't unroll so well and has a longer startup time than GDC in my experience. -Joel

DMD does seem to beat GDC on some tests. From memory, I think it's better at integer code than GDC.

-Joel
Jun 03 2008
"Unknown W. Brackets" <unknown simplemachines.org> writes:
As everyone has said, these are problems in DMD and DMC.

DMD is not a bad compiler, but it doesn't have all of the optimizations 
that some other compilers have.  For example, Microsoft's cl and Intel's 
icc might possibly beat gcc at this too (although they don't currently 
compile D code).

Anyway, it's a matter of priorities.  Improving the performance of 
DMD-compiled programs is great, but making the D language work is more 
important.  If GDC can do a good optimization job, that's great for it imho.

-[Unknown]



Jun 03 2008
Robert Fraser <fraserofthenight gmail.com> writes:
Unknown W. Brackets wrote:
 As everyone has said, these are problems in DMD and DMC.
 
 DMD is not a bad compiler, but it doesn't have all of the optimizations 
 that some other compilers are.  For example, Microsoft's cl and Intel's 
 icc might possibly beat gcc at this too (although they don't currently 
 compile D code.)
 
 Anyway, it's a matter of priorities.  Improving the performance of DMD 
 compiled programs is great, but making the D language work is more 
 important.  If GDC can do a good optimization job, that's great for it 
 imho.

Which is why I think the LLVM project is so important. Many languages -> one optimizer -> many targets
Jun 03 2008
Chris Wright <dhasenan gmail.com> writes:
Robert Fraser wrote:
Which is why I think the LLVM project is so important. Many languages -> one optimizer -> many targets

GCC already offers that. On the other hand, I've read papers where people modified GCC for research purposes. One month spent on algorithms, one month on implementation, four months just learning how GCC works and where to insert the code. I would guess that LLVM is currently well factored and much more extensible.
Jun 03 2008
"Koroskin Denis" <2korden gmail.com> writes:
On Tue, 03 Jun 2008 16:23:07 +0400, Robert Fraser  
<fraserofthenight gmail.com> wrote:

Which is why I think the LLVM project is so important. Many languages -> one optimizer -> many targets

The same goes for GCC as well.
Jun 03 2008
"Saaa" <empty needmail.com> writes:
Did anybody verify DMC being faster?
I don't have DMC, but my DMD code ran in 2.6s instead of more than a minute.


 As everyone has said, these are problems in DMD and DMC.

 

Jun 03 2008
"Unknown W. Brackets" <unknown simplemachines.org> writes:
I'm sure DMC is no faster than DMD here - the problem is the backend 
optimizer.  Many benchmarks (especially concerning floating point) have 
shown this.

-[Unknown]


Saaa wrote:
 Did anybody verify DMC being faster?
 I don't have DMC, but my DMD code ran in 2.6s iso more than a minute.
 
 
Jun 03 2008
"Saaa" <empty needmail.com> writes:
I meant GDC :/
The original post reports a runtime of more than one minute using DMD;
I can't replicate that (with a reasonable CPU).
Or did I miss something?

Jun 03 2008
"Dave" <Dave_member pathlink.com> writes:
"Saaa" <empty needmail.com> wrote in message 
news:g240q6$14cm$1 digitalmars.com...
I meant GDC :/
 The original post reports a more than one minute runtime using DMD,
 I can't replicate that (with a reasonable cpu).
 Or did I miss something ..

I think you're on to something. I get wildly different timings over several runs, and sometimes get a (much) faster time _without_ the -O switch on a P4.

No way should that code be that much slower between DMD and GDC... It's a bug. It's probably an alignment issue, but I wouldn't be surprised to see incorrect results for DMD either.

The OP should post that code and the results as a bug. C++ code with DMC probably wouldn't reproduce it because the D version is using the built-in complex type, which is probably the heart of the bug.

http://d.puremagic.com/issues/enter_bug.cgi

- Dave
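
One way to test the guess that the built-in complex type is involved would be to run the same iteration with two plain doubles per complex number and compare timings. The following is only a sketch under that assumption and is not code from the thread; the function name escapeCount is made up for illustration. It also prints a total so the work cannot be discarded by the optimizer.

```d
import std.stdio;

// Iterates z = z*z + c with separate real and imaginary doubles instead
// of the built-in cdouble type. Returns the loop index at which |z|^2
// first exceeded 4, or maxIter if that never happened.
// (escapeCount is a hypothetical helper, not part of the original post.)
int escapeCount(double cre, double cim, int maxIter)
{
    double zre = cre, zim = cim;
    for (int i = 0; i < maxIter; i++) {
        // (zre + zim*i)^2 = (zre^2 - zim^2) + (2*zre*zim)*i
        double nre = zre * zre - zim * zim + cre;
        double nim = 2.0 * zre * zim + cim;
        zre = nre;
        zim = nim;
        if (zre * zre + zim * zim > 4.0)
            return i;
    }
    return maxIter;
}

void main()
{
    long total = 0;
    // Same grid and iteration limit as the original benchmark.
    for (double y = -2; y < 2; y += 0.01)
        for (double x = -2; x < 2; x += 0.01)
            total += escapeCount(x, y, 10000);
    writefln("total iterations: %s", total);
}
```

If this version runs at a similar speed under both compilers while the cdouble version does not, that would point at the complex-number code generation rather than at floating point in general.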
Jun 03 2008
Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-06-04 02:49:33 +0200, "Dave" <Dave_member pathlink.com> said:

I think you're on to something. I get wildly different timings over several runs, and sometimes get a (much) faster time _without_ the -O switch on a P4. No way should that code be that much slower between DMD and GDC... It's a bug. It's probably an alignment issue, but I wouldn't be surprised to see incorrect results for DMD either. The OP should post that code and the results as a bug. C++ code with DMC probably wouldn't reproduce it because the D version is using the built-in complex type, which is probably the heart of the bug. http://d.puremagic.com/issues/enter_bug.cgi - Dave

Yes I think this is probably a bug, and should be reported. Maybe actually the unoptimized version is faster.

In any case one should *never* benchmark something that does not print/use something depending on what has been calculated:
1) no way to verify if the calculation was correct
2) some smart compiler might even optimize away the whole calculation (correctly because it is not needed)

In this case it is possible that some bug makes NaNs appear, and depending on the IEEE compliance settings of the processor NaNs might slow down the calculation very much (I saw a factor 100 in some calculations).

Fawzi
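
The two points above can be sketched in code: make the benchmark produce a value that depends on every iteration, print it, and count any NaN results along the way. This is a minimal illustration, not code from the thread; the helper names isNotANumber and finalNormSquared are hypothetical, and plain doubles are used in place of cdouble so the NaN check stays explicit.

```d
import std.stdio;

// NaN is the only floating-point value that compares unequal to itself.
// (isNotANumber is a hypothetical helper name.)
bool isNotANumber(double v)
{
    return v != v;
}

// Runs the inner loop for c = cre + cim*i and returns the last computed
// |z|^2, so the caller receives a value that depends on the calculation.
// (finalNormSquared is a hypothetical helper name.)
double finalNormSquared(double cre, double cim, int maxIter)
{
    double zre = cre, zim = cim;
    double norm = zre * zre + zim * zim;
    for (int i = 0; i < maxIter; i++) {
        double nre = zre * zre - zim * zim + cre;
        zim = 2.0 * zre * zim + cim;
        zre = nre;
        norm = zre * zre + zim * zim;
        if (norm > 4.0)
            break;
    }
    return norm;
}

void main()
{
    double checksum = 0;
    int nanCount = 0;
    for (double y = -2; y < 2; y += 0.01) {
        for (double x = -2; x < 2; x += 0.01) {
            double n = finalNormSquared(x, y, 10000);
            if (isNotANumber(n))
                nanCount++;
            checksum += n;
        }
    }
    // Printing values derived from the loops gives the optimizer a reason
    // to keep them and gives the reader something to compare across compilers.
    writefln("checksum = %s, NaN results = %s", checksum, nanCount);
}
```

The checksum may differ in the last digits between compilers because of differences in floating-point precision and evaluation order, so it is best compared approximately. A nonzero NaN count would support the NaN hypothesis.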
Jun 04 2008