digitalmars.D.learn - Why DMD is so slow?

Marco (42/42) Jun 02 2008 I have written the code reported below to test execution speed of D in W...

Marco (1/1) Jun 02 2008 I am sorry: 1 minute vs 1 second is 60 times! (I was reasoning in terms ...
Frits van Bommel (5/8) Jun 02 2008 [snip]
Saaa (22/22) Jun 02 2008 This outputs ~2660ms on my pentium 4.
janderson (5/60) Jun 02 2008 It would be interesting to compare the ASM produced. DMD is not that

janderson (4/70) Jun 03 2008 DMD does seem to beat GDC on some tests. From memory I think its better...

Unknown W. Brackets (10/65) Jun 03 2008 As everyone has said, these are problems in DMD and DMC.

Robert Fraser (3/14) Jun 03 2008 Which is why I think the LLVM project is so important. Many languages ->...

Koroskin Denis (3/15) Jun 03 2008 The same goes for GCC as well.
Chris Wright (6/21) Jun 03 2008 GCC already offers that. On the other hand, I've read papers where

Saaa (2/4) Jun 03 2008

Unknown W. Brackets (5/14) Jun 03 2008 I'm sure DMC is no faster than DMD here - the problem is the backend

Saaa (4/17) Jun 03 2008 I meant GDC :/

Dave (13/34) Jun 03 2008 I think you're on to something.

Fawzi Mohamed (13/37) Jun 04 2008 Yes I think this is probably a bug, and should be reported.

Marco <marco.falda gmail.com> writes:

I have written the code reported below to test execution speed of D in Windows
and I have found that the same code is about 10 times slower if compiled using
DMD w.r.t. GDC: 1.9 minutes in DMD vs 1.84 seconds in GDC! Is there perhaps
something wrong? Why such a difference?
Thank you.

// begin of file mandel_d1.d
/*
DMD: dmd -inline -release -O mandel_d1.d
GDC: gdc -O3 --fast-math -inline -lgphobos --expensive-optimizations mandel_d1.d
*/

import std.stdio;

int main()
{
 cdouble a, b, c, z;
 double mand_re = 0, mand_im = 0;

 for (double y = -2; y < 2; y += 0.01) {
  for (double x = -2; x < 2; x += 0.01) {
   z = (x + mand_re) + (y + mand_im) * 1i;
   c = z;
   for (int i = 0; i < 10000; i++) {
    z = z * z + c;
    if(z.re * z.re + z.im * z.im > 4.0) {
     break;
    }
   }
  }
 }
 return 0;
}
// end of file mandel_d1.d
------------------

H:\Codici\Benchmarks> ..\timethis.exe mandel_d1

TimeThis :  Command Line :  mandel_d1
TimeThis :    Start Time :  Mon Jun 02 11:28:41 2008


TimeThis :  Command Line :  mandel_d1
TimeThis :    Start Time :  Mon Jun 02 11:28:41 2008
TimeThis :      End Time :  Mon Jun 02 11:30:35 2008
TimeThis :  Elapsed Time :  00:01:54.234

H:\Codici\Benchmarks> ..\timethis mandel_gdc1

TimeThis :  Command Line :  mandel_gdc1
TimeThis :    Start Time :  Mon Jun 02 11:42:27 2008


TimeThis :  Command Line :  mandel_gdc1
TimeThis :    Start Time :  Mon Jun 02 11:42:27 2008
TimeThis :      End Time :  Mon Jun 02 11:42:29 2008
TimeThis :  Elapsed Time :  00:00:01.843

Jun 02 2008

Marco <marco.falda gmail.com> writes:

I am sorry: 1 minute vs 1 second is 60 times! (I was reasoning in terms of
order of magnitude, but in this case the multiple is 60).
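
For reference, the two `Elapsed Time` values printed by timethis above give the exact ratio:

```latex
\frac{1\,\mathrm{min}\;54.234\,\mathrm{s}}{1.843\,\mathrm{s}}
  = \frac{114.234\,\mathrm{s}}{1.843\,\mathrm{s}} \approx 62
```

which agrees with the corrected figure of roughly 60.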

Jun 02 2008

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Marco wrote:
 I have written the code reported below to test execution speed of D in Windows
and I have found that the same code is about 10 times slower if compiled using
DMD w.r.t. GCD: 1.9 minutes in DMD vs 1.84 seconds in GDC ! Is there perhaps
something wrong? why such a difference?

[snip]
  cdouble a, b, c, z;
  double mand_re = 0, mand_im = 0;

[snip]

IIRC DMD's backend is just not as good at optimizing floating-point code 
as GDC's backend (GCC) is.

Jun 02 2008

"Saaa" <empty needmail.com> writes:

This outputs ~2660ms on my Pentium 4.
Try profiling your code, or comparing the asm.

------------------

import std.perf;   // PerformanceCounter (D1 Phobos)
import std.stdio;  // writefln

void main()
{
 auto timer = new PerformanceCounter;
 timer.start();

 cdouble a, b, c, z;
 double mand_re = 0, mand_im = 0;

 for (double y = -2; y < 2; y += 0.01) {
  for (double x = -2; x < 2; x += 0.01) {
   z = (x + mand_re) + (y + mand_im) * 1i;
   c = z;
   for (int i = 0; i < 10000; i++) {
    z = z * z + c;
    if(z.re * z.re + z.im * z.im > 4.0) {
     break;
    }
   }
  }
 }

 timer.stop();
 // auto keeps whatever integer type milliseconds returns (no narrowing to int)
 auto elapsedMsec = timer.milliseconds;
 writefln("Time elapsed: %s msec", elapsedMsec);
}

Jun 02 2008

janderson <askme me.com> writes:

Marco wrote:
 [snip]

It would be interesting to compare the ASM produced.  DMD is not that 
great at floating point, doesn't unroll loops as well, and has a longer 
startup time than GDC in my experience.

-Joel

Jun 02 2008

janderson <askme me.com> writes:

janderson wrote:
 Marco wrote:
 [snip]
 
 It would be interesting to compare the ASM produced.  DMD is not that 
 great at floating point, doesn't unroll loops as well, and has a longer 
 startup time than GDC in my experience.
 
 -Joel

DMD does seem to beat GDC on some tests.  From memory, I think it's better 
at integer code than GDC.

-Joel

Jun 03 2008

"Unknown W. Brackets" <unknown simplemachines.org> writes:

As everyone has said, these are problems in DMD and DMC.

DMD is not a bad compiler, but it doesn't have all of the optimizations 
that some other compilers have.  For example, Microsoft's cl and Intel's 
icc might beat gcc at this too (although they don't currently 
compile D code).

Anyway, it's a matter of priorities.  Improving the performance of DMD 
compiled programs is great, but making the D language work is more 
important.  If GDC can do a good optimization job, that's great for it imho.

-[Unknown]


Marco wrote:
 [snip]

Jun 03 2008

Robert Fraser <fraserofthenight gmail.com> writes:

Unknown W. Brackets wrote:
 [snip]
 
 Anyway, it's a matter of priorities.  Improving the performance of DMD 
 compiled programs is great, but making the D language work is more 
 important.  If GDC can do a good optimization job, that's great for it 
 imho.

Which is why I think the LLVM project is so important. Many languages -> 
one optimizer -> many targets

Jun 03 2008

"Koroskin Denis" <2korden gmail.com> writes:

On Tue, 03 Jun 2008 16:23:07 +0400, Robert Fraser  
<fraserofthenight gmail.com> wrote:

 [snip]

 Which is why I think the LLVM project is so important. Many languages ->  
 one optimizer -> many targets

The same goes for GCC as well.

Jun 03 2008

Chris Wright <dhasenan gmail.com> writes:

Robert Fraser wrote:
 [snip]
 
 Which is why I think the LLVM project is so important. Many languages -> 
 one optimizer -> many targets

GCC already offers that. On the other hand, I've read papers where 
people modified GCC for research purposes. One month spent on 
algorithms, one month on implementation, four months just learning how 
GCC works and where to insert the code. I would guess that LLVM is 
currently well factored and much more extensible.

Jun 03 2008

"Saaa" <empty needmail.com> writes:

Did anybody verify DMC being faster?
I don't have DMC, but my DMD code ran in 2.6 s instead of more than a minute.


 As everyone has said, these are problems in DMD and DMC.

Jun 03 2008

"Unknown W. Brackets" <unknown simplemachines.org> writes:

I'm sure DMC is no faster than DMD here - the problem is the backend 
optimizer.  Many benchmarks (especially concerning floating point) have 
shown this.

-[Unknown]


Saaa wrote:
 Did anybody verify DMC being faster?
 I don't have DMC, but my DMD code ran in 2.6 s instead of more than a minute.
 
 
 As everyone has said, these are problems in DMD and DMC.

Jun 03 2008

"Saaa" <empty needmail.com> writes:

I meant GDC :/
The original post reports a more than one minute runtime using DMD,
I can't replicate that (with a reasonable cpu).
Or did I miss something ..

 I'm sure DMC is no faster than DMD here - the problem is the backend 
 optimizer.  Many benchmarks (especially concerning floating point) have 
 shown this.

 -[Unknown]


 Saaa wrote:
 Did anybody verify DMC being faster?
 I don't have DMC, but my DMD code ran in 2.6 s instead of more than a minute.


 As everyone has said, these are problems in DMD and DMC.

Jun 03 2008

"Dave" <Dave_member pathlink.com> writes:

"Saaa" <empty needmail.com> wrote in message 
news:g240q6$14cm$1 digitalmars.com...
I meant GDC :/
 The original post reports a more than one minute runtime using DMD,
 I can't replicate that (with a reasonable cpu).
 Or did I miss something ..

I think you're on to something.

I get wildly different timings over several runs, and sometimes get a (much) 
faster time _without_ the -O switch on a P4.

No way should that code be that much slower between DMD and GDC... It's a 
bug. It's probably an alignment issue, but I wouldn't be surprised to see 
incorrect results for DMD either.

The OP should file that code and the results as a bug report. C++ code with
DMC probably wouldn't reproduce it because the D version is using the
built-in complex type, which is probably the heart of the bug.
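
One way to test the complex-type hypothesis is to take `cdouble` out of the picture and expand z*z + c by hand. This is a minimal sketch written for a current D2 compiler (the thread itself used D1); `escapeCount` is a hypothetical name, not something from the thread. If a DMD build of this runs far faster than the `cdouble` version, the complex-type code path becomes the prime suspect; no timings are claimed here.

```d
import std.stdio;

// Hypothetical helper: escape-time count for c = cr + ci*i using plain
// doubles instead of the built-in cdouble type. It mirrors the original
// loop: z starts at c, then z = z*z + c until |z|^2 > 4.
int escapeCount(double cr, double ci, int maxIter)
{
    double zr = cr, zi = ci;
    int i = 0;
    for (; i < maxIter; i++)
    {
        // z*z + c = (zr*zr - zi*zi + cr) + (2*zr*zi + ci)i
        double nr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;   // uses the old zr, which is still unchanged
        zr = nr;
        if (zr * zr + zi * zi > 4.0)
            break;
    }
    return i;
}

void main()
{
    // 0 never escapes; -2-2i escapes on the first iteration.
    writefln("%s %s", escapeCount(0.0, 0.0, 10_000), escapeCount(-2.0, -2.0, 10_000));
    // prints "10000 0"
}
```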

http://d.puremagic.com/issues/enter_bug.cgi

- Dave

Jun 03 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-06-04 02:49:33 +0200, "Dave" <Dave_member pathlink.com> said:

 
 "Saaa" <empty needmail.com> wrote in message 
 news:g240q6$14cm$1 digitalmars.com...
 I meant GDC :/
 The original post reports a more than one minute runtime using DMD,
 I can't replicate that (with a reasonable cpu).
 Or did I miss something ..

 
 I think you're on to something.
 
 I get wildly different timings over several runs, and sometimes get a 
 (much) faster time _without_ the -O switch on a P4.
 
 No way should that code be that much slower between DMD and GDC... It's 
 a bug. It's probably an alignment issue, but I wouldn't be surprised to 
 see incorrect results for DMD either.
 
 The OP should post that code and the results as a bug. C++ code with 
 DMC probably wouldn't reproduce it because the D version is using the 
 built-in complex type, which is probably the heart of the bug.
 
 http://d.puremagic.com/issues/enter_bug.cgi
 
 - Dave

Yes, I think this is probably a bug and should be reported.
Maybe the unoptimized version actually is faster.
In any case, one should *never* benchmark code that does not print or 
otherwise use a value that depends on what has been calculated:

1) there is no way to verify that the calculation was correct
2) a smart compiler might even optimize away the whole calculation 
(correctly, because its result is never needed)

In this case it is possible that some bug makes NaNs appear, and, 
depending on the IEEE compliance settings of the processor, NaNs might 
slow down the calculation dramatically (I have seen a factor of 100 in 
some calculations).
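
Applied to the benchmark in this thread, that advice could look like the sketch below: it sums the escape counts over the whole grid, counts points whose final z contains a NaN, and prints both, so the work cannot be discarded and the DMD and GDC builds can be cross-checked. It assumes a current D2 compiler; `Stats` and `mandelStats` are hypothetical names, and the grid uses integer counters instead of `y += 0.01`, so the sampled points differ slightly from the original.

```d
import std.stdio;

// Result of one benchmark pass; printing it keeps the computation observable.
struct Stats
{
    long totalIterations; // sum of escape counts over the grid
    int nanPoints;        // points whose final z contained a NaN
}

// Hypothetical driver: gridSize x gridSize points over [-2, 2)^2.
Stats mandelStats(int gridSize, int maxIter)
{
    Stats s;
    // Integer counters avoid the rounding that accumulates in `y += 0.01`.
    foreach (row; 0 .. gridSize)
    {
        double ci = -2.0 + 4.0 * row / gridSize;
        foreach (col; 0 .. gridSize)
        {
            double cr = -2.0 + 4.0 * col / gridSize;
            double zr = cr, zi = ci;  // z starts at c, as in the original
            int i = 0;
            for (; i < maxIter; i++)
            {
                double nr = zr * zr - zi * zi + cr;
                zi = 2.0 * zr * zi + ci;
                zr = nr;
                if (zr * zr + zi * zi > 4.0)
                    break;
            }
            s.totalIterations += i;
            // NaN is the only value that compares unequal to itself.
            if (zr != zr || zi != zi)
                s.nanPoints++;
        }
    }
    return s;
}

int main()
{
    // 400 x 400 points (0.01 spacing) and 10000 iterations, as in the original.
    Stats s = mandelStats(400, 10_000);
    writefln("total iterations: %s, NaN points: %s", s.totalIterations, s.nanPoints);
    return s.nanPoints == 0 ? 0 : 1; // non-zero exit status if any NaN showed up
}
```

If the two compilers print different totals, the timing comparison is meaningless until that is explained.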

Fawzi

Jun 04 2008
