digitalmars.D - Woeful performance of D compared to C++
- rael seesig.com Jan 18 2007
- Kirk McDonald <kirklin.mcdonald gmail.com> Jan 18 2007
- Bill Lear <rael pppp.zopyra.com> Jan 18 2007
- Kirk McDonald <kirklin.mcdonald gmail.com> Jan 18 2007
- rael seesig.com Jan 18 2007
- Pragma <ericanderton yahoo.removeme.com> Jan 18 2007
- Bill Lear <rael pppp.zopyra.com> Jan 18 2007
- Dave <Dave_member pathlink.com> Jan 18 2007
- Sean Kelly <sean f4.ca> Jan 18 2007
- Walter Bright <newshound digitalmars.com> Jan 18 2007
- Dave <Dave_member pathlink.com> Jan 18 2007
- Walter Bright <newshound digitalmars.com> Jan 18 2007
- "Lionello Lunesu" <lionello lunesu.remove.com> Jan 18 2007
- Walter Bright <newshound digitalmars.com> Jan 18 2007
- Bill Baxter <dnewsgroup billbaxter.com> Jan 19 2007
- janderson <askme me.com> Jan 18 2007
- Jeff McGlynn <d jeffrules.com> Jan 19 2007
- Paulo Herrera <pauloh81 yahoo.ca> Jan 19 2007
- Dave <Dave_member pathlink.com> Jan 19 2007
- Paulo Herrera <pauloh81 yahoo.ca> Jan 19 2007
- janderson <askme me.com> Jan 19 2007
- Bill Lear <rael pppp.zopyra.com> Jan 18 2007
- Sean Kelly <sean f4.ca> Jan 18 2007
In other benchmarks I've seen, D seems quite competitive with C/C++.
I seem to have written a very simple program that shows D in a very
poor light compared to C++. I wonder if it is my inexperience.
I am using dmd 1.0, and g++ 4.1.1 under Linux Fedora Core 6, running
on a 3.0 GHz Pentium 4 with 1 gig of RAM.
The program is a simulation of the Monty Hall problem (see Wikipedia).
Here is the D program:
import std.random;
import std.stdio;
void main() {
const uint n = 10_000_000;
ubyte doors;
uint wins, wins_switching;
for (uint i; i < n; ++i) {
doors |= cast(ubyte)(1 << rand() % 3);
if (doors & 1) {
++wins;
} else {
++wins_switching;
}
doors = 0;
}
writefln("Wins switching: %d [%f%%]", wins_switching,
(wins_switching / cast(double) n) * 100);
writefln("Wins without switching: %d [%f%%]", wins,
(wins / cast(double) n) * 100);
}
Compiled with:
% dmd -I/opt/dmd/src/phobos -O -inline monty.d -ofmonty_d
gcc monty.o -o monty_d -m32 -lphobos -lpthread -lm
and here is the C++:
#include <iostream>
#include <cstdlib>
int main() {
unsigned char doors = 0;
const unsigned int n = 10000000;
unsigned int wins = 0, wins_switching = 0;
for (unsigned int i = 0; i < n; ++i) {
unsigned char r = 1 << (rand() % 3);
doors |= r; // place the car behind a random door
if (doors & 1) { // choose zero'th door, same as random choice
++wins;
} else {
++wins_switching;
}
doors ^= r; // zero the door with car
}
const double d = n / 100;
std::cout << "Win % switching: " << (wins_switching / d)
<< "\nWin % no switching: " << (wins / d) << '\n';
}
Compiled with:
% g++ -O3 -o monty_cc
Execution times (best of 5):
% time monty_d
Wins switching: 6665726 [66.657260%]
Wins without switching: 3334274 [33.342740%]
real 0m2.444s
user 0m2.442s
sys 0m0.002s
% time monty_cc
Win % switching: 66.6766
Win % no switching: 33.3234
real 0m0.433s
user 0m0.432s
sys 0m0.001s
Any help would be appreciated.
Thanks.
Bill
--
Bill Lear
r * e * * o * y * a * c * m
* a * l * z * p * r * . * o *
Jan 18 2007
rael seesig.com wrote:
Compiled with: % dmd -I/opt/dmd/src/phobos -O -inline monty.d -ofmonty_d
Perhaps the -release flag would make a difference? Also, that -I option should be redundant if you've set up your dmd.conf file properly.
--
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org
Jan 18 2007
Kirk McDonald <kirklin.mcdonald gmail.com> writes:rael seesig.com wrote:Compiled with: % dmd -I/opt/dmd/src/phobos -O -inline monty.d -ofmonty_d
Perhaps the -release flag would make a difference? Also, that -I option should be redundant if you've set up your dmd.conf file properly.
Compiling with -release seems to make no appreciable difference. And, I have tried and tried to set up my dmd.conf file:
% ls -l /etc/dmd.conf
% cat /etc/dmd.conf
[Environment]
DFLAGS=-I/opt/dmd/src/phobos
But it doesn't seem to work. Do you see anything I've done wrong here? This, in fact, is driving me nuts...
Bill
--
Bill Lear
r * e * * o * y * a * c * m
* a * l * z * p * r * . * o *
Jan 18 2007
Bill Lear wrote:Kirk McDonald <kirklin.mcdonald gmail.com> writes:rael seesig.com wrote:Compiled with: % dmd -I/opt/dmd/src/phobos -O -inline monty.d -ofmonty_d
Also, that -I option should be redundant if you've set up your dmd.conf file properly.
Compiling with -release seems to make no appreciable difference. And, I have tried and tried to set up my dmd.conf file: % ls -l /etc/dmd.conf % cat /etc/dmd.conf [Environment] DFLAGS=-I/opt/dmd/src/phobos But it doesn't seem to work. Do you see anything I've done wrong here? This, in fact, is driving me nuts... Bill -- Bill Lear r * e * * o * y * a * c * m * a * l * z * p * r * . * o *
Here is the dmd.conf search path (as documented on http://www.digitalmars.com/d/dcompiler.html):
1. current working directory
2. $HOME
3. the directory the dmd executable is in
4. /etc/dmd.conf
If you simply extracted the dmd archive into /opt, then it will find the dmd.conf file alongside the binary before it finds the one at /etc/dmd.conf. Either remove the one next to the binary or edit it.
--
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org
Jan 18 2007
Kirk McDonald <kirklin.mcdonald gmail.com> writes:Bill Lear wrote: [elided stupidity] Here is the dmd.conf search path (as documented on http://www.digitalmars.com/d/dcompiler.html): 1. current working directory 2. $HOME 3. the directory the dmd executable is in 4. /etc/dmd.conf If you simply extracted the dmd archive into /opt, then it will find the dmd.conf file alongside the binary before it finds the one at /etc/dmd.conf. Either remove the one next to the binary or edit it.
I think I once knew this but somehow forgot. Score one for stupidity. Works perfectly. Thank you.
Bill
--
Bill Lear
r * e * * o * y * a * c * m
* a * l * z * p * r * . * o *
Jan 18 2007
rael seesig.com wrote:In other benchmarks I've seen, D seems quite competitive with C/C++. I seem to have written a very simple program that shows D in a very poor light compared to C++. I wonder if it is my inexperience.
Any help would be appreciated.
I'm gobsmacked. No array concatenation, strings, large allocations, or even floating point. Just integer math and comparisons. Check the obvious stuff first: disable the GC, compile with "-inline -release" for GDC to match the "-O3 -o" that you're using on GCC. The only part of that loop that is of any consequence is the call to rand() - odds are they are two completely different algorithms, with D's being slower (performance test anyone?). Everything else should reduce to almost the same exact machine code. -- - EricAnderton at yahoo
Jan 18 2007
Pragma <ericanderton yahoo.removeme.com> writes:rael seesig.com wrote:In other benchmarks I've seen, D seems quite competitive with C/C++. I seem to have written a very simple program that shows D in a very poor light compared to C++. I wonder if it is my inexperience.
Any help would be appreciated.
I'm gobsmacked. No array concatenation, strings, large allocations, or even floating point. Just integer math and comparisons. Check the obvious stuff first: disable the GC, compile with "-inline -release" for GDC to match the "-O3 -o" that you're using on GCC.
Hmm, tried -release and -O -inline, but not disable GC. I'll throw a spare whirl that way and see how that goes.
The only part of that loop that is of any consequence is the call to rand() - odds are they are two completely different algorithms, with D's being slower (performance test anyone?). Everything else should reduce to almost the same exact machine code.
The rand() call is definitely the most expensive. When I remove it from both the C++ and the D program, the times plummet (to 0.003 and 0.013 seconds, respectively --- still, however, leaving the D program running in 4.3 times that of the C++ program;-). Bill -- Bill Lear r * e * * o * y * a * c * m * a * l * z * p * r * . * o *
Jan 18 2007
Bill Lear wrote:The rand() call is definitely the most expensive. When I remove it from both the C++ and the D program, the times plummet (to 0.003 and 0.013 seconds, respectively --- still, however, leaving the D program running in 4.3 times that of the C++ program;-).
Yes, but if you make it so that the C++ compiler can't so easily remove the loop, then they are the same :)
int main(int argc, char *argv[]) {
unsigned char doors = 0;
//const unsigned int n = 100000000;
unsigned int n = argc > 1 ? atoi(argv[1]) : 10000000;
<IMHO, that's almost always a worthless optimization for "real-world" code and even "good" benchmarks :)>.
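A minimal C++ sketch of the point above, assuming the rand() call is replaced by the loop counter (the thread does not say what was substituted in the timed runs). The helper names iterations_from_args and run_without_rand are hypothetical. With the count read at run time, the compiler can no longer treat the trip count as a constant, although an optimizer may still simplify a loop this regular.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper (not from the thread): take the iteration count from
// the command line so that it is not a compile-time constant.
unsigned int iterations_from_args(int argc, char* argv[], unsigned int fallback) {
    if (argc > 1) {
        const long parsed = std::strtol(argv[1], nullptr, 10);
        if (parsed > 0) {
            return static_cast<unsigned int>(parsed);
        }
    }
    return fallback;
}

struct Tally {
    unsigned int wins;
    unsigned int wins_switching;
};

// The loop from the thread with rand() replaced by the loop counter.
// That substitution is an assumption made for this sketch.
Tally run_without_rand(unsigned int n) {
    assert(n > 0);
    unsigned char doors = 0;
    Tally tally = {0, 0};
    for (unsigned int i = 0; i < n; ++i) {
        const unsigned char r = static_cast<unsigned char>(1 << (i % 3));
        doors |= r;             // place the car behind door i % 3
        if (doors & 1) {        // always choose door zero
            ++tally.wins;
        } else {
            ++tally.wins_switching;
        }
        doors ^= r;             // clear the door that held the car
    }
    return tally;
}
```

A main() would call run_without_rand(iterations_from_args(argc, argv, 10000000)) and print both counters, so that the result is actually used.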
Jan 18 2007
Bill Lear wrote:The rand() call is definitely the most expensive. When I remove it from both the C++ and the D program, the times plummet (to 0.003 and 0.013 seconds, respectively --- still, however, leaving the D program running in 4.3 times that of the C++ program;-).
With execution times that short, you're really comparing the startup time of a D application vs. a C++ application. And D application startup time includes the initialization of a garbage collector, in the default case. If you really wanted to compare apples to apples here I'd rip out the default GC and replace it with one that has no initialization cost. Sean
Jan 18 2007
Sean Kelly wrote:Bill Lear wrote:The rand() call is definitely the most expensive. When I remove it from both the C++ and the D program, the times plummet (to 0.003 and 0.013 seconds, respectively --- still, however, leaving the D program running in 4.3 times that of the C++ program;-).
With execution times that short, you're really comparing the startup time of a D application vs. a C++ application. And D application startup time includes the initialization of a garbage collector, in the default case. If you really wanted to compare apples to apples here I'd rip out the default GC and replace it with one that has no initialization cost.
There are easier solutions to get better timings. See: http://www.digitalmars.com/techtips/timing_code.html
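One way to keep startup cost out of the measurement is to time only the loop from inside the program. The sketch below uses std::chrono::steady_clock from the C++ standard library; it is not the code from the linked page, and the function name time_loop_seconds is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>

// Times only the simulation loop, so process startup (including any
// runtime or garbage collector initialization) is excluded.
// Returns elapsed seconds; the win count is written to wins_out so the
// loop has an observable result.
double time_loop_seconds(unsigned int n, unsigned int& wins_out) {
    const auto start = std::chrono::steady_clock::now();
    unsigned int wins = 0;
    for (unsigned int i = 0; i < n; ++i) {
        const unsigned char doors =
            static_cast<unsigned char>(1 << (std::rand() % 3));
        if (doors & 1) {
            ++wins;
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    wins_out = wins;
    const double elapsed = std::chrono::duration<double>(stop - start).count();
    assert(elapsed >= 0.0);  // steady_clock is monotonic
    return elapsed;
}
```

Calling this several times and reporting the best run gives a figure comparable to the "best of 5" numbers in the original post, without the process startup included.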
Jan 18 2007
D's rand() is slow.
//import std.random;
extern (C) int rand();
import std.stdio;
import std.conv;
void main()
{
const uint n = 10_000_000;
ubyte doors;
uint wins, wins_switching;
for (uint i; i < n; ++i) {
doors |= cast(ubyte)(1 << rand() % 3);
if (doors & 1) {
++wins;
} else {
++wins_switching;
}
doors = 0;
}
writefln("Wins switching: %d [%f%%]", wins_switching,
(wins_switching / cast(double) n) * 100);
writefln("Wins without switching: %d [%f%%]", wins,
(wins / cast(double) n) * 100);
}
Jan 18 2007
Dave wrote:D's rand() is slow.
True. C's rand() is fast, but is known to be not very random. As you pointed out, D users can use either as required.
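To illustrate why a very cheap generator can be "not very random", here is a sketch of a generic linear congruential generator with a power-of-two modulus. The constants are illustrative and this is not claimed to be the algorithm behind any particular C library's rand(). With an odd multiplier and an odd increment, the lowest bit of the state simply alternates.

```cpp
#include <cassert>
#include <cstdint>

// Generic 32-bit LCG: state = state * a + c (mod 2^32).
// The constants are illustrative only.
struct Lcg {
    std::uint32_t state;
    std::uint32_t next() {
        state = state * 1664525u + 1013904223u;
        return state;
    }
};

// Counts how often the lowest bit of an output equals the lowest bit of
// the previous output. A good generator would repeat about half the time.
unsigned int repeated_low_bits(Lcg gen, unsigned int samples) {
    assert(samples > 0);
    unsigned int repeats = 0;
    std::uint32_t prev = gen.next() & 1u;
    for (unsigned int i = 1; i < samples; ++i) {
        const std::uint32_t bit = gen.next() & 1u;
        if (bit == prev) {
            ++repeats;
        }
        prev = bit;
    }
    return repeats;
}
```

For this generator repeated_low_bits returns 0 for any seed, because the low bit flips on every step. Code that relies on the low-order bits of such a generator sees far less randomness than the full 32-bit value suggests.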
Jan 18 2007
"Walter Bright" <newshound digitalmars.com> wrote in message news:eoomm5$20qk$1 digitaldaemon.com...Dave wrote:D's rand() is slow.
True. C's rand() is fast, but is known to be not very random. As you pointed out, D users can use either as required.
This might solve the performance in this case, but Walter, have you checked the thread "Why is this D code slower than C++" in digitalmars.D.learn? It's comparing DMD to DMC, and DMD's exe takes more than twice as long to complete as DMC's. Compiler flags, GC, the obvious things have been checked. Your insight would be appreciated.
L.
Jan 18 2007
Lionello Lunesu wrote:This might solve the performance in this case, but Walter, have you checked the thread "Why is this D code slower than C++" in digitalmars.D.learn?
The first thing I'd try is using DMD's built-in profiler: dmd -profile test.d
Jan 18 2007
Walter Bright wrote:Lionello Lunesu wrote:This might solve the performance in this case, but Walter, have you checked the thread "Why is this D code slower than C++" in digitalmars.D.learn?
The first thing I'd try is using DMD's built-in profiler: dmd -profile test.d
Been done. The main thing it shows is that the Sphere.Intersect routine is a hotspot. The other hotspot is the big recursive Raytrace function itself, but that's not so useful without a line-by-line breakdown since basically everything happens inside there. The D trace.log is at: http://www.webpages.uidaho.edu/~shro8822/trace.log The C++ log was attached to a post: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=5958 Though I'm not sure it's useful to compare them, because I think it was two different machines that ran the two. --bb
Jan 19 2007
Walter Bright wrote:Dave wrote:D's rand() is slow.
True. C's rand() is fast, but is known to be not very random. As you pointed out, D users can use either as required.
Maybe there should be a randfast() in the standard lib? I imagine this confusion will come up again. -Joel
Jan 18 2007
janderson Wrote:Walter Bright wrote:Dave wrote:D's rand() is slow.
True. C's rand() is fast, but is known to be not very random. As you pointed out, D users can use either as required.
Maybe there should be a randfast() in the standard lib? I imagine this confusion will come up again. -Joel
I recommend a built-in Mersenne Twister function, usually called mt_rand().
-- Jeff
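For comparison, the same simulation can be driven by a Mersenne Twister. The sketch below uses std::mt19937 and std::uniform_int_distribution from the C++ standard library rather than a function named mt_rand(); the struct and function names are hypothetical, and no claim is made here about its speed relative to rand().

```cpp
#include <cassert>
#include <random>

struct MontyResult {
    unsigned int wins;
    unsigned int wins_switching;
};

// Monty Hall simulation as in the thread, with the car's door drawn
// from a seeded Mersenne Twister instead of rand() % 3.
MontyResult simulate(unsigned int n, unsigned int seed) {
    assert(n > 0);
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> door(0, 2);  // inclusive range 0..2
    MontyResult result = {0, 0};
    for (unsigned int i = 0; i < n; ++i) {
        const unsigned char doors =
            static_cast<unsigned char>(1 << door(gen));
        if (doors & 1) {        // contestant always picks door zero
            ++result.wins;
        } else {
            ++result.wins_switching;
        }
    }
    return result;
}
```

Using a distribution object instead of the % operator also avoids the slight bias that a modulo reduction introduces when the generator's range is not a multiple of 3.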
Jan 19 2007
Hi, I've been investigating the performance of different programming languages/compilers using some micro-benchmarks like the one posted in this thread. I observed that in many of them library implementations are much more important than the language itself. Some of my results are posted here http://pauloherrera.blogspot.com/ . In the case of random number generators the performance difference among different implementations/algorithms in the same language can be orders of magnitude. I don't know why all libraries do not implement the Mersenne Twister algorithm, which is considered the fastest and highest quality (most random).
Paulo
Walter Bright wrote:Dave wrote:D's rand() is slow.
True. C's rand() is fast, but is known to be not very random. As you pointed out, D users can use either as required.
Jan 19 2007
Paulo Herrera wrote:Hi, I've been investigating the performance of different programming languages/compilers using some micro-benchmarks like the one posted in this thread. I observed that in many of them library implementations are much more important than the language itself. Some of my results are posted here http://pauloherrera.blogspot.com/ . In the case of random number generators the performance difference among different implementations/algorithms in the same language can be orders of magnitude. I don't know why all libraries do not implement the Mersenne Twister algorithm, which is considered the fastest and highest quality (most random). Paulo
Nice blog. Hopefully in the near future DMD will get improved floating point code generation. If so, that should put it at or near the top for each test you cited. D itself has an advantage that may turn out to be very important for numerical codes: the real data type supports the hardware maximum, so for example D supports 80 bit precision on x86 where other languages/compilers don't. Plus there isn't a limit in the D spec on maximum precision, so D compilers can optimize more aggressively. Performance aside, what was your impression of writing the code for each language?
Thanks,
- Dave
Jan 19 2007
Dave wrote:Paulo Herrera wrote:Hi, I've been investigating the performance of different programming languages/compilers using some micro-benchmarks like the one posted in this thread. I observed that in many of them library implementations are much more important than the language itself. Some of my results are posted here http://pauloherrera.blogspot.com/ . In the case of random number generators the performance difference among different implementations/algorithms in the same language can be orders of magnitude. I don't know why all libraries do not implement the Mersenne Twister algorithm, which is considered the fastest and highest quality (most random). Paulo
Nice blog. Hopefully in the near future DMD will get improved floating point code generation. If so, that should put it at or near the top for each test you cited. D itself has an advantage that may turn out to be very important for numerical codes: the real data type supports the hardware maximum, so for example D supports 80 bit precision on x86 where other languages/compilers don't. Plus there isn't a limit in the D spec on maximum precision, so D compilers can optimize more aggressively. Performance aside, what was your impression of writing the code for each language? Thanks, - Dave
Hi Dave,
I didn't write the tests, I only downloaded them from http://shootout.alioth.debian.org/. I really wanted to see if I could observe some difference among compilers, and if I could reproduce the results posted on that site. I've been frustrated by the fact that so many people discuss language performance without facts.
I really do have to run lots of simulations that can take several hours. Therefore, performance is really important to me, and a 20% or 30% difference can help me graduate some months earlier, ;D.
I have some experience with other languages such as Python, Java, C++ and Fortran95. Compared to them, I think D has a lot of advantages and I'd really like to use it instead of any of the other ones. It's cleaner, more concise, templates are great, relatively fast, etc. However, I see two problems with using D for number crunching:
1) Lack of multidimensional arrays. I know that has been mentioned several times in this forum. My first idea was to write my own class. So, I did it, but it performed much worse than some Fortran compilers.... How bad? Well, nested loops were 8-9 times slower. I couldn't believe that difference. I tried/checked many things to fix that: inlining, memory order, etc., but I couldn't get better performance. I also checked that the Fortran compiler was not smart enough to just skip the loop. My conclusion was that to get good performance, as with complex numbers, multidimensional arrays must be implemented as a language feature. Maybe I'm completely wrong.
2) D performance for floating point operations is relatively slow compared to good C (not C++) and Fortran compilers. I would say differences of 60-80% or even more in intensive loops are not unusual.
For those reasons, I'm just using Fortran95 with features of Fortran 2003 for now. By the way, new Fortran is not that bad. IMHO, it just lacks templates.
I think D is the best language for a lot of other applications. It would be nice if it could be the best for scientific applications, too.
Paulo
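The "write my own class" approach mentioned above usually means a flat buffer indexed by row and column. The following C++ sketch shows that layout and a traversal that matches its memory order; the class name Matrix2D and function sum_row_major are hypothetical, and this is not the code that was benchmarked. Note that Fortran arrays are column-major, so the loop nesting that is fast there is the slow one for this row-major layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two-dimensional array stored in one contiguous row-major buffer.
class Matrix2D {
public:
    Matrix2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];   // row-major index
    }
    double at(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// Sums all elements with the column index in the inner loop, so memory
// is read sequentially for this layout.
double sum_row_major(const Matrix2D& m) {
    double total = 0.0;
    for (std::size_t r = 0; r < m.rows(); ++r) {
        for (std::size_t c = 0; c < m.cols(); ++c) {
            total += m.at(r, c);
        }
    }
    return total;
}
```

Whether a class like this matches built-in arrays depends on the compiler inlining at() and removing the bounds checks in optimized builds (the asserts here disappear when NDEBUG is defined).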
Jan 19 2007
I think D is the best language for a lot of other applications. It would be nice if it could be the best for scientific applications, too. Paulo
I agree, I wish something was done to fix these well-known performance holes in DMD. Then it could even best languages like Fortran. I think performance would be the best sales pitch for DMD.
-Joel
Jan 19 2007
Dave <Dave_member pathlink.com> writes:D's rand() is slow. //import std.random; extern (C) int rand(); import std.stdio; import std.conv; [...]
Wow, what a difference! Now D is 0.623 seconds. Huge difference. Many thanks for solving this mystery for me. Bill -- Bill Lear r * e * * o * y * a * c * m * a * l * z * p * r * . * o *
Jan 18 2007
I tried running these under Tango with DMD on Win32 (as it's the setup I
currently have). Here are my slightly altered programs to make the two
a bit more comparable. First, the D code:
import tango.stdc.stdlib;
import tango.stdc.stdio;
void main() {
const uint n = 10_000_000;
ubyte doors;
uint wins, wins_switching;
for (uint i; i < n; ++i) {
doors |= cast(ubyte)(1 << rand() % 3);
if (doors & 1) {
++wins;
} else {
++wins_switching;
}
doors = 0;
}
printf("Wins switching: %d [%f%%]\n", wins_switching,
(wins_switching / cast(double) n) * 100);
printf("Wins without switching: %d [%f%%]\n", wins,
(wins / cast(double) n) * 100);
}
And now the C++ code:
#include <cstdlib>
#include <cstdio>
int main() {
unsigned char doors = 0;
const unsigned int n = 10000000;
unsigned int wins = 0, wins_switching = 0;
for (unsigned int i = 0; i < n; ++i) {
unsigned char r = 1 << (rand() % 3);
doors |= r; // place the car behind a random door
if (doors & 1) { // choose zero'th door, same as random choice
++wins;
} else {
++wins_switching;
}
doors ^= r; // zero the door with car
}
const double d = n / 100;
printf("Wins switching: %d [%f%%]\n", wins_switching,
(wins_switching / (double) n) * 100);
printf("Wins without switching: %d [%f%%]\n", wins,
(wins / (double) n) * 100);
}
C:> dmd -O -inline -release dtest
C:> dmc -o ctest.cpp
Here are the results for three runs of the D app:
Execution time: 1.323 s
Execution time: 1.005 s
Execution time: 1.125 s
And three runs of the C++ app:
Execution time: 1.149 s
Execution time: 1.202 s
Execution time: 1.304 s
The numbers above aren't quite as accurate as those using "time" on
Unix, but they're sufficient for a rough comparison. That said, DMD and
DMC perform pretty much the same once the variable of IOStreams vs.
writefln is removed.
Sean
Jan 18 2007








