digitalmars.D - Woeful performance of D compared to C++

rael seesig.com writes:
In other benchmarks I've seen, D seems quite competitive with C/C++.

I seem to have written a very simple program that shows D in a very
poor light compared to C++.  I wonder if it is my inexperience.

I am using dmd 1.0, and g++ 4.1.1 under Linux Fedora Core 6, running
on a 3.0 GHz Pentium 4 with 1 gig of RAM.

The program is a simulation of the Monty Hall problem (see Wikipedia).

Here is the D program:

import std.random;
import std.stdio;

void main() {
    const uint n = 10_000_000;
    ubyte doors;
    uint wins, wins_switching;

    for (uint i; i < n; ++i) {
        doors |= cast(ubyte)(1 << rand() % 3);

        if (doors & 1) {
            ++wins;
        } else {
            ++wins_switching;
        }

        doors = 0;
    }

    writefln("Wins switching: %d [%f%%]", wins_switching,
             (wins_switching / cast(double) n) * 100);
    writefln("Wins without switching: %d [%f%%]", wins,
             (wins / cast(double) n) * 100);
}

Compiled with:

% dmd -I/opt/dmd/src/phobos -O -inline monty.d -ofmonty_d
gcc monty.o -o monty_d -m32 -lphobos -lpthread -lm 

and here is the C++:

#include <iostream>
#include <cstdlib>

int main() {
    unsigned char doors = 0;
    const unsigned int n = 10000000;
    unsigned int wins = 0, wins_switching = 0;

    for (unsigned int i = 0; i < n; ++i) {
        unsigned char r = 1 << (rand() % 3);
        doors |= r; // place the car behind a random door

        if (doors & 1) { // choose zero'th door, same as random choice
            ++wins;
        } else {
            ++wins_switching;
        }

        doors ^= r; // zero the door with car
    }

    const double d = n / 100;
    std::cout << "Win % switching: " << (wins_switching / d)
              << "\nWin % no switching: " << (wins / d) << '\n';
}

Compiled with:

% g++ -O3 -o monty_cc

Execution times (best of 5):

% time monty_d
Wins switching: 6665726 [66.657260%]
Wins without switching: 3334274 [33.342740%]

real    0m2.444s
user    0m2.442s
sys     0m0.002s

% time monty_cc
Win % switching: 66.6766
Win % no switching: 33.3234

real    0m0.433s
user    0m0.432s
sys     0m0.001s


Any help would be appreciated.

Thanks.


Bill
--
Bill Lear
r * e *   * o * y * a * c * m
* a * l * z * p * r * . * o *
Jan 18 2007
Kirk McDonald <kirklin.mcdonald gmail.com> writes:
rael seesig.com wrote:
 Compiled with:
 
 % dmd -I/opt/dmd/src/phobos -O -inline monty.d -ofmonty_d

Perhaps the -release flag would make a difference? Also, that -I option
should be redundant if you've set up your dmd.conf file properly.

--
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org
Jan 18 2007
Bill Lear <rael pppp.zopyra.com> writes:
Kirk McDonald <kirklin.mcdonald gmail.com> writes:

 rael seesig.com wrote:
 Compiled with:
 % dmd -I/opt/dmd/src/phobos -O -inline monty.d -ofmonty_d

Perhaps the -release flag would make a difference? Also, that -I option should be redundant if you've set up your dmd.conf file properly.

Compiling with -release seems to make no appreciable difference.

And, I have tried and tried to set up my dmd.conf file:

% ls -l /etc/dmd.conf
% cat /etc/dmd.conf
[Environment]
DFLAGS=-I/opt/dmd/src/phobos

But it doesn't seem to work. Do you see anything I've done wrong here?
This, in fact, is driving me nuts...


Bill
--
Bill Lear
r * e *   * o * y * a * c * m
* a * l * z * p * r * . * o *
Jan 18 2007
Kirk McDonald <kirklin.mcdonald gmail.com> writes:
Bill Lear wrote:
 Kirk McDonald <kirklin.mcdonald gmail.com> writes:
 
 rael seesig.com wrote:
 Compiled with:
 % dmd -I/opt/dmd/src/phobos -O -inline monty.d -ofmonty_d

Also, that -I option should be redundant if you've set up your dmd.conf file properly.

 Compiling with -release seems to make no appreciable difference.

 And, I have tried and tried to set up my dmd.conf file:

 % ls -l /etc/dmd.conf
 % cat /etc/dmd.conf
 [Environment]
 DFLAGS=-I/opt/dmd/src/phobos

 But it doesn't seem to work. Do you see anything I've done wrong here?
 This, in fact, is driving me nuts...

 Bill

Here is the dmd.conf search path (as documented on
http://www.digitalmars.com/d/dcompiler.html):

1. current working directory
2. $HOME
3. the directory the dmd executable is in
4. /etc/dmd.conf

If you simply extracted the dmd archive into /opt, then it will find
the dmd.conf file alongside the binary before it finds the one at
/etc/dmd.conf. Either remove the one next to the binary or edit it.

--
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org
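
The lookup order described above amounts to a first-match search. The following is only an illustrative sketch in C++, not dmd's actual code: find_conf and its parameters are hypothetical names, and the existence check is passed in as a function so the logic can be exercised without touching the filesystem.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (not part of dmd): returns the first candidate for
// which exists() is true, or an empty string if none is found. The order
// follows the list quoted above: current directory, $HOME, the directory
// holding the dmd executable, then /etc.
std::string find_conf(const std::string& cwd,
                      const std::string& home,
                      const std::string& dmd_dir,
                      const std::function<bool(const std::string&)>& exists) {
    const std::vector<std::string> candidates = {
        cwd + "/dmd.conf",
        home + "/dmd.conf",
        dmd_dir + "/dmd.conf",
        "/etc/dmd.conf",
    };
    for (const std::string& path : candidates) {
        if (exists(path)) {
            return path;  // first match wins; later files are never consulted
        }
    }
    return "";
}
```

With a dmd.conf sitting next to the binary in an unpacked archive, the third candidate matches and /etc/dmd.conf is never reached, which would explain why edits to the latter appeared to have no effect.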
Jan 18 2007
rael seesig.com writes:
Kirk McDonald <kirklin.mcdonald gmail.com> writes:

 Bill Lear wrote:
 [elided stupidity]

 Here is the dmd.conf search path (as documented on
 http://www.digitalmars.com/d/dcompiler.html):
 
 1. current working directory
 2. $HOME
 3. the directory the dmd executable is in
 4. /etc/dmd.conf
 
 If you simply extracted the dmd archive into /opt, then it will find
 the dmd.conf file alongside the binary before it finds the one at
 /etc/dmd.conf. Either remove the one next to the binary or edit it.

I think I once knew this but somehow forgot. Score one for stupidity.

Works perfectly. Thank you.


Bill
--
Bill Lear
r * e *   * o * y * a * c * m
* a * l * z * p * r * . * o *
Jan 18 2007
Pragma <ericanderton yahoo.removeme.com> writes:
rael seesig.com wrote:
 In other benchmarks I've seen, D seems quite competitive with C/C++.
 
 I seem to have written a very simple program that shows D in a very
 poor light compared to C++.  I wonder if it is my inexperience.

 Any help would be appreciated.

I'm gobsmacked. No array concatenation, strings, large allocations, or
even floating point. Just integer math and comparisons.

Check the obvious stuff first: disable the GC, compile with "-inline
-release" for GDC to match the "-O3 -o" that you're using on GCC.

The only part of that loop that is of any consequence is the call to
rand() - odds are they are two completely different algorithms, with
D's being slower (performance test anyone?). Everything else should
reduce to almost the same exact machine code.

--
- EricAnderton at yahoo
Jan 18 2007
Bill Lear <rael pppp.zopyra.com> writes:
Pragma <ericanderton yahoo.removeme.com> writes:

 rael seesig.com wrote:
 In other benchmarks I've seen, D seems quite competitive with C/C++.
 I seem to have written a very simple program that shows D in a very
 poor light compared to C++.  I wonder if it is my inexperience.

 Any help would be appreciated.

I'm gobsmacked. No array concatenation, strings, large allocations, or even floating point. Just integer math and comparisons. Check the obvious stuff first: disable the GC, compile with "-inline -release" for GDC to match the "-O3 -o" that you're using on GCC.

Hmm, tried -release and -O -inline, but not disable GC. I'll throw a spare whirl that way and see how that goes.
 The only part of that loop that is of any consequence is the call to
 rand() - odds are they are two completely different algorithms, with
 D's being slower (performance test anyone?).  Everything else should
 reduce to almost the same exact machine code.

The rand() call is definitely the most expensive. When I remove it
from both the C++ and the D program, the times plummet (to 0.003
and 0.013 seconds, respectively --- still, however, leaving the D
program running in 4.3 times that of the C++ program;-).


Bill
--
Bill Lear
r * e *   * o * y * a * c * m
* a * l * z * p * r * . * o *
Jan 18 2007
Dave <Dave_member pathlink.com> writes:
Bill Lear wrote:
 
 The rand() call is definitely the most expensive.  When I remove it
 from both the C++ and the D program, the times plummet (to 0.003
 and 0.013 seconds, respectively --- still, however, leaving the D
 program running in 4.3 times that of the C++ program;-).
 

Yes, but if you make it so that the C++ compiler can't so easily remove the
loop, then they are the same :)

int main(int argc, char *argv[]) {
    unsigned char doors = 0;
    //const unsigned int n = 100000000;
    unsigned int n = argc > 1 ? atoi(argv[1]) : 10000000;

<IMHO, that's almost always a worthless optimization for "real-world" code
and even "good" benchmarks :)>.
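
The idea in this reply, taking the iteration count from the command line so it is not a compile-time constant, can be sketched as below. This is a hedged illustration rather than the poster's full program: iterations_from_args and count_wins are hypothetical names, and since the thread does not show what replaced rand() in the rand()-free variant, the sketch simply assumes the car is always behind door 0.

```cpp
#include <cstdlib>

// Hypothetical helper: read the iteration count at run time so the compiler
// cannot treat it as a known constant. Falls back to the thread's default
// of 10,000,000 when no argument is given.
unsigned int iterations_from_args(int argc, char* argv[]) {
    return argc > 1
        ? static_cast<unsigned int>(std::strtoul(argv[1], nullptr, 10))
        : 10000000u;
}

// rand()-free version of the loop. ASSUMPTION for this sketch: the car is
// always behind door 0, so every round counts as a win without switching.
unsigned int count_wins(unsigned int n) {
    unsigned char doors = 0;
    unsigned int wins = 0;
    for (unsigned int i = 0; i < n; ++i) {
        doors |= 1;              // place the car behind door 0
        if (doors & 1) ++wins;   // player picks door 0
        doors = 0;               // reset for the next round
    }
    return wins;
}
```

A program would call count_wins(iterations_from_args(argc, argv)) from main. An optimizer may still simplify a loop this trivial, but it can no longer precompute the final result for a fixed n.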
 
Jan 18 2007
Sean Kelly <sean f4.ca> writes:
Bill Lear wrote:
 
 The rand() call is definitely the most expensive.  When I remove it
 from both the C++ and the D program, the times plummet (to 0.003
 and 0.013 seconds, respectively --- still, however, leaving the D
 program running in 4.3 times that of the C++ program;-).

With execution times that short, you're really comparing the startup
time of a D application vs. a C++ application. And D application
startup time includes the initialization of a garbage collector, in the
default case. If you really wanted to compare apples to apples here I'd
rip out the default GC and replace it with one that has no
initialization cost.


Sean
Jan 18 2007
Walter Bright <newshound digitalmars.com> writes:
Sean Kelly wrote:
 Bill Lear wrote:
 The rand() call is definitely the most expensive.  When I remove it
 from both the C++ and the D program, the times plummet (to 0.003
 and 0.013 seconds, respectively --- still, however, leaving the D
 program running in 4.3 times that of the C++ program;-).

With execution times that short, you're really comparing the startup time of a D application vs. a C++ application. And D application startup time includes the initialization of a garbage collector, in the default case. If you really wanted to compare apples to apples here I'd rip out the default GC and replace it with one that has no initialization cost.

There are easier solutions to get better timings. See: http://www.digitalmars.com/techtips/timing_code.html
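
The linked page is not reproduced here. The general idea of timing inside the program, so that process startup and runtime initialization fall outside the measured interval, can be sketched in later C++ (C++11 onward) roughly as follows. simulate and time_simulation are hypothetical names chosen for this example.

```cpp
#include <chrono>
#include <cstdlib>

// The simulation loop from the original post, returning the number of wins
// when switching. Kept in its own function so only this call is timed.
unsigned int simulate(unsigned int n) {
    unsigned int wins_switching = 0;
    for (unsigned int i = 0; i < n; ++i) {
        // Place the car behind a random door; the player always picks door 0.
        const unsigned char doors =
            static_cast<unsigned char>(1 << (std::rand() % 3));
        if (!(doors & 1)) ++wins_switching;
    }
    return wins_switching;
}

// Hypothetical helper: returns the elapsed seconds for simulate(n) alone and
// stores the simulation result in `result`. Anything that ran before the
// first clock reading (program startup, library setup) is not included.
double time_simulation(unsigned int n, unsigned int& result) {
    const auto start = std::chrono::steady_clock::now();
    result = simulate(n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_simulation several times and keeping the smallest value mirrors the "best of 5" approach used with the shell's time command earlier in the thread.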
Jan 18 2007
Dave <Dave_member pathlink.com> writes:
D's rand() is slow.

//import std.random;
extern (C) int rand();
import std.stdio;
import std.conv;

void main()
{
     const uint n = 10_000_000;
     ubyte doors;
     uint wins, wins_switching;

     for (uint i; i < n; ++i) {
         doors |= cast(ubyte)(1 << rand() % 3);

         if (doors & 1) {
             ++wins;
         } else {
             ++wins_switching;
         }

         doors = 0;
     }

     writefln("Wins switching: %d [%f%%]", wins_switching,
              (wins_switching / cast(double) n) * 100);
     writefln("Wins without switching: %d [%f%%]", wins,
              (wins / cast(double) n) * 100);
}

Jan 18 2007
Walter Bright <newshound digitalmars.com> writes:
Dave wrote:
 
 D's rand() is slow.

True. C's rand() is fast, but is known to be not very random. As you pointed out, D users can use either as required.
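
One well-known way a fast generator can be "not very random" is weak low-order bits in a linear congruential generator (LCG) with a power-of-two modulus. The sketch below is a toy LCG for illustration only and is NOT the rand() of any particular C library. ToyLcg and low_bit_alternates are hypothetical names. The multiplier and increment are the ones from the sample rand() in the C standard, but unlike that sample this toy returns the raw state so the weakness is visible.

```cpp
#include <cstdint>

// Toy generator for illustration only: state' = (a * state + c) mod 2^31.
// Because a and c are both odd, the lowest bit of the state flips on every
// step, no matter what the seed is.
struct ToyLcg {
    std::uint32_t state;
    std::uint32_t next() {
        state = (state * 1103515245u + 12345u) & 0x7fffffffu;
        return state;
    }
};

// Returns true if the lowest bit strictly alternates over `count` draws.
bool low_bit_alternates(std::uint32_t seed, int count) {
    ToyLcg g{seed};
    std::uint32_t prev = g.next() & 1u;
    for (int i = 1; i < count; ++i) {
        const std::uint32_t bit = g.next() & 1u;
        if (bit == prev) return false;
        prev = bit;
    }
    return true;
}
```

A generator like this would make an expression such as x % 2 perfectly predictable, which is why implementations of this family typically return only the higher bits of the state.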
Jan 18 2007
"Lionello Lunesu" <lionello lunesu.remove.com> writes:
"Walter Bright" <newshound digitalmars.com> wrote in message 
news:eoomm5$20qk$1 digitaldaemon.com...
 Dave wrote:
 D's rand() is slow.

True. C's rand() is fast, but is known to be not very random. As you pointed out, D users can use either as required.

This might solve the performance problem in this case, but Walter, have you
checked the thread "Why is this D code slower than C++" in
digitalmars.D.learn? It's comparing DMD to DMC, and DMD's exe takes more
than twice as long to complete as DMC's. Compiler flags, GC, the obvious
things have been checked. Your insight would be appreciated.

L.
Jan 18 2007
Walter Bright <newshound digitalmars.com> writes:
Lionello Lunesu wrote:
 This might solve the performance in this case, but Walter, have you checked 
 the thread "Why is this D code slower than C++" in digitalmars.D.learn ?

The first thing I'd try is using DMD's built-in profiler: dmd -profile test.d
Jan 18 2007
Bill Baxter <dnewsgroup billbaxter.com> writes:
Walter Bright wrote:
 Lionello Lunesu wrote:
 This might solve the performance in this case, but Walter, have you 
 checked the thread "Why is this D code slower than C++" in 
 digitalmars.D.learn ?

The first thing I'd try is using DMD's built-in profiler: dmd -profile test.d

Been done. The main thing it shows is that the Sphere.Intersect routine
is a hotspot. The other hotspot is the big recursive Raytrace function
itself, but that's not so useful without a line-by-line breakdown since
basically everything happens inside there.

The D trace.log is at:
http://www.webpages.uidaho.edu/~shro8822/trace.log

The C++ log was attached to a post:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=5958

Though I'm not sure it's useful to compare them, because I think it was
two different machines that ran the two.

--bb
Jan 19 2007
janderson <askme me.com> writes:
Walter Bright wrote:
 Dave wrote:
 D's rand() is slow.

True. C's rand() is fast, but is known to be not very random. As you pointed out, D users can use either as required.

Maybe there should be a randfast() in the standard lib? I imagine this
confusion will come up again.

-Joel
Jan 18 2007
Jeff McGlynn <d jeffrules.com> writes:
janderson Wrote:

 Walter Bright wrote:
 Dave wrote:
 D's rand() is slow.

True. C's rand() is fast, but is known to be not very random. As you pointed out, D users can use either as required.

Maybe there should be a randfast() in the standard lib? I imagine this confusion will come up again. -Joel

I recommend a built-in Mersenne Twister function, usually called mt_rand().

--
Jeff
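
For comparison, later C++ standards (C++11 onward) ship a Mersenne Twister as std::mt19937. The sketch below reruns the thread's experiment with it. MontyResult and simulate_mt are hypothetical names, and the seed is an explicit parameter so that a run can be repeated.

```cpp
#include <random>

struct MontyResult {
    unsigned int wins_staying;
    unsigned int wins_switching;
};

// Same experiment as the programs in this thread: the car is placed behind
// one of three doors at random and the player always picks door 0. Staying
// wins when the car is behind door 0; switching wins otherwise.
MontyResult simulate_mt(unsigned int n, unsigned int seed) {
    std::mt19937 gen(seed);                         // Mersenne Twister engine
    std::uniform_int_distribution<int> door(0, 2);  // unbiased choice of 0..2
    MontyResult r{0, 0};
    for (unsigned int i = 0; i < n; ++i) {
        if (door(gen) == 0) {
            ++r.wins_staying;
        } else {
            ++r.wins_switching;
        }
    }
    return r;
}
```

Using a distribution object instead of % 3 also avoids the slight modulo bias that arises when the generator's range is not a multiple of three. No claim is made here about how its speed compares with either rand() discussed in the thread.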
Jan 19 2007
Paulo Herrera <pauloh81 yahoo.ca> writes:
Hi,
I've been investigating the performance of different programming
languages/compilers using some micro-benchmarks like the one posted in this
thread. I observed that in many of them library implementations are much
more important than the language itself. Some of my results are posted here
http://pauloherrera.blogspot.com/ .

In the case of random number generators, the performance difference among
different implementations/algorithms in the same language can be orders of
magnitude.
I don't know why all libraries do not implement the Mersenne Twister
algorithm, which is considered the fastest and highest quality (most
random).

Paulo



Walter Bright wrote:

 Dave wrote:
 
 D's rand() is slow.

True. C's rand() is fast, but is known to be not very random. As you pointed out, D users can use either as required.

Jan 19 2007
Dave <Dave_member pathlink.com> writes:
Paulo Herrera wrote:
 Hi,
 I've been investigating the performance of different programming
 languages/compilers using some micro-benchmarks like the one posted in this
 thread. I observed that in many of them library implementations are much
 more important than the language itself. Some of my results are posted here
 http://pauloherrera.blogspot.com/ .
 
 In the case of random number generators the performance difference among
 different implementations/algorithms in the same language can be orders of
 magnitude.
 I don't know why all libraries do not implement the Mersenne-Twister
 algorithm that is considered as the fastest and highest quality (most
 random).
 
 Paulo
 

Nice blog. Hopefully in the near future or so DMD will get improved
floating point code generation. If so, that should put it at/near the top
for each test you cited.

D itself has an advantage that may turn out to be very important for
numerical codes; the real data type supports the hardware maximum, so for
example D supports 80 bit precision on x86 where other
languages/compilers don't. Plus there isn't a limit in the D spec. on
maximum precision so D compilers can optimize more aggressively.

Performance aside, what was your impression on writing the code for each
language?

Thanks,

- Dave
Jan 19 2007
Paulo Herrera <pauloh81 yahoo.ca> writes:
Dave wrote:

 Paulo Herrera wrote:
 Hi,
 I've been investigating the performance of different programming
 languages/compilers using some micro-benchmarks like the one posted in
 this thread. I observed that in many of them library implementations are
 much more important than the language itself. Some of my results are
 posted here http://pauloherrera.blogspot.com/ .
 
 In the case of random number generators the performance difference among
 different implementations/algorithms in the same language can be orders
 of magnitude.
 I don't know why all libraries do not implement the Mersenne-Twister
 algorithm that is considered as the fastest and highest quality (most
 random).
 
 Paulo
 

Nice blog. Hopefully in the near future or so DMD will get improved floating point code generation. If so, that should put it at/near the top for each test you cited. D itself has an advantage that may turn out to be very important for numerical codes; the real data type supports the hardware maximum, so for example D supports 80 bit precision on x86 where other languages/compilers don't. Plus there isn't a limit in the D spec. on maximum precision so D compilers can optimize more aggressively. Performance aside, what was your impression on writing the code for each language? Thanks, - Dave

Hi Dave,

I didn't write the tests, I only downloaded them from
http://shootout.alioth.debian.org/. I really wanted to see if I could
observe some difference among compilers, and if I could reproduce the
results posted on that site. I've been frustrated by the fact that so many
people discuss language performance without facts.

I really do have to run lots of simulations that can take several hours.
Therefore, performance is really important to me, and a 20% or 30%
difference can help me to graduate some months earlier, ;D.

I have some experience with other languages such as Python, Java, C++,
and Fortran95. Compared to them, I think D has a lot of advantages and I'd
really like to use it instead of any of the other ones. It's cleaner, more
concise, templates are great, relatively fast, etc.

However, I see two problems with using D for number crunching:

1) Lack of multidimensional arrays. I know that has been mentioned several
times in this forum. My first idea was to write my own class. So, I did
it, but it performed much worse than some Fortran compilers.... How bad?
Well, nested loops were 8-9 times slower. I couldn't believe that
difference. I tried/checked many things to fix that: inlining, memory
order, etc, but I couldn't get better performance. I also checked that the
Fortran compiler was not simply being smart enough to skip the loop. My
conclusion was that to get good performance, like with complex numbers,
multidimensional arrays must be implemented as a language feature. Maybe
I'm completely wrong.

2) D performance for floating point operations is relatively slow compared
to good C (not C++) and Fortran compilers. I would say differences of
60-80% or even more in intensive loops are not unusual.

For those reasons, I'm just using Fortran95 with features of Fortran 2003
for now. By the way, new Fortran is not that bad. IMHO, it just lacks
templates.

I think D is the best language for a lot of other applications. It would be
nice if it could be the best for scientific applications, too.

Paulo
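
For readers unfamiliar with the kind of hand-written array class mentioned in point 1, here is a minimal sketch in C++ (the post's own D class is not shown in the thread, so this is not a reconstruction of it). Grid and sum are hypothetical names. The class stores everything in one contiguous buffer in row-major order, and the traversal keeps the column index in the inner loop so that consecutive iterations touch adjacent memory.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal 2-D array: a single contiguous buffer, row-major.
// Element (r, c) lives at offset r * cols + c.
class Grid {
public:
    Grid(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& operator()(std::size_t r, std::size_t c) {
        return data_[r * cols_ + c];
    }
    double operator()(std::size_t r, std::size_t c) const {
        return data_[r * cols_ + c];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// Sums all elements, iterating rows in the outer loop and columns in the
// inner loop to match the row-major layout.
double sum(const Grid& g) {
    double total = 0.0;
    for (std::size_t r = 0; r < g.rows(); ++r)
        for (std::size_t c = 0; c < g.cols(); ++c)
            total += g(r, c);
    return total;
}
```

Fortran arrays are column-major, so the cache-friendly loop order there is the opposite one, which is presumably the "memory order" check mentioned in the post. Whether a class like this can match a Fortran compiler depends on how well the index arithmetic is inlined and optimized, and no performance claim is made for this sketch.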
Jan 19 2007
janderson <askme me.com> writes:
 I think D is the best language for a lot of other applications. It would be
 nice if it could be the best for scientific applications, too.
 
 Paulo 

I agree, I wish something was done to fix these well-known performance holes
in DMD. Then it could even best languages like Fortran. I think
performance would be the best sales pitch for DMD.

-Joel
Jan 19 2007
Bill Lear <rael pppp.zopyra.com> writes:
Dave <Dave_member pathlink.com> writes:
 D's rand() is slow.
 
 //import std.random;
 extern (C) int rand();
 import std.stdio;
 import std.conv;
[...]

Wow, what a difference! Now D is 0.623 seconds. Huge difference.

Many thanks for solving this mystery for me.


Bill
--
Bill Lear
r * e *   * o * y * a * c * m
* a * l * z * p * r * . * o *
Jan 18 2007
Sean Kelly <sean f4.ca> writes:
I tried running these under Tango with DMD on Win32 (as it's the setup I 
currently have).  Here are my slightly altered programs to make the two 
a bit more comparable.  First, the D code:

import tango.stdc.stdlib;
import tango.stdc.stdio;

void main() {
     const uint n = 10_000_000;
     ubyte doors;
     uint wins, wins_switching;

     for (uint i; i < n; ++i) {
         doors |= cast(ubyte)(1 << rand() % 3);

         if (doors & 1) {
             ++wins;
         } else {
             ++wins_switching;
         }

         doors = 0;
     }

     printf("Wins switching: %d [%f%%]\n", wins_switching,
              (wins_switching / cast(double) n) * 100);
     printf("Wins without switching: %d [%f%%]\n", wins,
              (wins / cast(double) n) * 100);
}

And now the C++ code:

#include <cstdlib>
#include <cstdio>

int main() {
     unsigned char doors = 0;
     const unsigned int n = 10000000;
     unsigned int wins = 0, wins_switching = 0;

     for (unsigned int i = 0; i < n; ++i) {
         unsigned char r = 1 << (rand() % 3);
         doors |= r; // place the car behind a random door

         if (doors & 1) { // choose zero'th door, same as random choice
             ++wins;
         } else {
             ++wins_switching;
         }

         doors ^= r; // zero the door with car
     }

     const double d = n / 100;

     printf("Wins switching: %d [%f%%]\n", wins_switching,
              (wins_switching / (double) n) * 100);
     printf("Wins without switching: %d [%f%%]\n", wins,
              (wins / (double) n) * 100);
}

C:> dmd -O -inline -release dtest
C:> dmc -o ctest.cpp

Here are the results for three runs of the D app:

Execution time: 1.323 s
Execution time: 1.005 s
Execution time: 1.125 s

And three runs of the C++ app:

Execution time: 1.149 s
Execution time: 1.202 s
Execution time: 1.304 s

The numbers above aren't quite as accurate as those using "time" on 
Unix, but they're sufficient for a rough comparison.  That said, DMD and 
DMC perform pretty much the same once the variable of IOStreams vs. 
writefln is removed.


Sean
Jan 18 2007