
D - [Performance] dmd outperforms gcc C and several others in trigonometric functions

Manfred Nowak <svv1999 hotmail.com> writes:
Results of a reported Benchmark on a Pentium 4-M, 2 GHz, WinXP Pro SP1: 

Python/Psyco        13.1
gcc C               14.9 
Java 1.3.1          22.1 
Python/interpreted  47.1
Java 1.4.2          57.1

My result on a Duron, 700 MHz, Win98SE:

dmd 0.79             9.54

All results in seconds. Details will follow.

So long.
Feb 12 2004
"Ben Hinkle" <bhinkle4 juno.com> writes:
Hmm. Are you sure you are testing the same thing? I find it hard to believe
C on a 2GHz machine getting beaten by D on a 700MHz machine. Can you run the
D code on the same machine as the others?

"Manfred Nowak" <svv1999 hotmail.com> wrote in message
news:c0gnum$1ugh$1 digitaldaemon.com...
 Results of a reported Benchmark on a Pentium 4-M, 2 GHz, WinXP Pro SP1:

 Python/Psyco        13.1
 gcc C               14.9
 Java 1.3.1          22.1
 Python/interpreted  47.1
 Java 1.4.2          57.1

 My result on a Duron, 700 MHz, Win98SE:

 dmd 0.79             9.54

 All results in seconds. Details will follow.

 So long.

Feb 12 2004
Manfred Nowak <svv1999 hotmail.com> writes:
Ben Hinkle wrote:

 Hmm. Are you sure you are testing the same thing? I find it hard to believe
 C on a 2GHz machine getting beaten by D on a 700MHz machine. Can you run the
 D code on the same machine as the others?

I provisionally checked your complaint. According to SiSoft Sandra's FPU benchmark, the Pentium 4 at 2 GHz achieves 1480 Whetstone MFLOPS while the Duron at 700 MHz achieves 1102. The reported trig result for gcc is 14.9 s and my result with gcc is 19.6 s. The expression (1480/1102) / (19.6/14.9) yields 1.021, so the measured times differ almost exactly as the FPU ratings predict. That is close enough to trust the result for dmd.

Three of the functions used are intrinsic to dmd, one is coded in assembler, and only one comes from std.c.math.

No, I do not have access to the benchmarking machine. However, the author says:

| I could also extend the range of languages or variants tested.

But wait until you see the other results.

So long.
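
The plausibility check above can be reproduced in a few lines of Python. This is only a sketch: the variable names are illustrative, and the numbers are the ones quoted in the post.

```python
# Figures quoted in the post; the variable names are illustrative only.
p4_mflops = 1480.0     # Pentium 4, 2 GHz (SiSoft Sandra Whetstone rating)
duron_mflops = 1102.0  # Duron, 700 MHz
gcc_duron_s = 19.6     # gcc trig time measured on the Duron
gcc_p4_s = 14.9        # gcc trig time reported for the Pentium 4

fpu_ratio = p4_mflops / duron_mflops  # how much faster the P4's FPU is rated
time_ratio = gcc_duron_s / gcc_p4_s   # how much slower gcc ran on the Duron
agreement = fpu_ratio / time_ratio    # 1.0 would mean a perfect match

print(f"FPU ratio  {fpu_ratio:.3f}")
print(f"time ratio {time_ratio:.3f}")
print(f"agreement  {agreement:.3f}")
```

A value near 1 only says that the gcc timings scale roughly like the Whetstone ratings. It does not prove that every test scales the same way.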
Feb 12 2004
Sean Kelly <sean ffwd.cx> writes:
Manfred Nowak wrote:
 Now the relative results for gcc and dmd:
 
             int    long   double  trig      geometric mean
 Visual C++  1      1       1      1         1
 gcc C       1.021  1.532   1.484  4.257     1.77
 dmd(est)    1.039  5.691   3.688  2.074     2.59

Looks like D's performance with longs is low across the board, but there's a very important distinction to be made here. VC++ has 32-bit longs and gcc on Windows may as well. For an accurate measure on this test, all compilers should have used types with a guaranteed 64-bit width.

Also, since this includes compile time, I expect that is a factor in D's performance. VC++ is pretty fast but I doubt it could compare to D, even with trivial programs. It would be nice if the performance evaluations split out run time vs. compile time for each test.

Sean
Feb 13 2004
Sean Kelly <sean ffwd.cx> writes:
Sean Kelly wrote:
 Also, since this includes compile-time, I expect that is a factor in D's 
 performance.

It doesn't include compile time. My mistake. I misread Mr. Nowak's post.

Sean
Feb 13 2004
Manfred Nowak <svv1999 hotmail.com> writes:
Sean Kelly wrote:

[...]
 VC++ has 32 bit longs and gcc-windows may as well.

A look at the sources would have convinced you that the computations were actually carried out with `long long' by VC++ and gcc.

There is another issue that I mentioned only briefly: the Pentium 4 incorporates Intel SSE2. According to SiSoft Sandra's database, being able to use it raises the Whetstone performance from 1480 MFLOPS to 2706 MFLOPS, i.e. a factor of 1.865. It is very unlikely that VC++ does not use SSE2, and gcc, although instructed to use it, seems to have failed to do so.

So long.
Feb 13 2004
Patrick Down <Patrick_member pathlink.com> writes:
In article <Xns948F1CC05B07svv1999hotmailcom 127.0.0.1>, Manfred Nowak says...
Following the benchmark, which can be found under
http://osnews.com/story.php?news_id=5602&page=1
the benchmark tested five criteria:
integer math, long math, double math, trigonometric functions, file io

It should be noted that D's longs are 64 bit. http://www.digitalmars.com/d/index.html
Feb 13 2004
Manfred Nowak <svv1999 hotmail.com> writes:
Patrick Down wrote:

 It should be noted that D's longs are 64 bit.

.. and in the benchmark code for C and C++, `long long', i.e. 64 bit, was actually used. The sources for the other compilers can be checked at the address given.

So long.
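
For anyone who wants to check the widths on their own machine without a C compiler at hand, Python's ctypes module reports the sizes of the platform's C types. This is a hedged sketch: it shows the sizes used by the C compiler that built the Python interpreter, which need not be the compiler used in the benchmark.

```python
import ctypes

# Width in bits of the platform's C `long` and `long long`,
# as seen by the C compiler that built this Python interpreter.
long_bits = 8 * ctypes.sizeof(ctypes.c_long)
long_long_bits = 8 * ctypes.sizeof(ctypes.c_longlong)

print(f"long:      {long_bits} bit")
print(f"long long: {long_long_bits} bit")
```

On Windows, `long` typically comes out as 32 bit, while `long long` is 64 bit on all common platforms, which is why the benchmark's use of `long long' matters for a fair comparison with D's 64-bit long.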
Feb 13 2004
Sean Kelly <sean ffwd.cx> writes:
Manfred Nowak wrote:
 .. and in the benchmark code for C and C++ actually `long long', i.e. 64
 bit was used. The sources for the other compilers can be checked at the
 submitted address.

Teach me to post on a Friday evening. I completely missed the second "long" when I looked at the source.

Sean
Feb 13 2004
Ilya Minkov <minkov cs.tum.edu> writes:
Manfred Nowak wrote:
 NOTE:
 
 also gcc was advised in the original benchmark to make use of the SSE2,
 this seemed not to work. Otherwise dmd should even score better

GCC cannot vectorize normal code. But there are a number of GCC-specific intrinsic functions that work on vectors; only they profit from SSE2, SSE, 3DNow and other SIMD extensions.

What I would also like to see is a comparison with DMC. Obviously, D's semantics do not contain anything that would cause much lower performance than C on math, but it may be the fault of the current front-end or of the back-end used.

-eye
Feb 14 2004
"Manfred Nowak" <svv1999 hotmail.com> writes:
"Ilya Minkov" wrote:
 but it may be the fault of current front-end
 or of the used back-end.

It is the back-end. I did only one run with dmc on the C source and got the same impressive performance for long math as with dmd.

Astonishingly, dmc and gcc rejected the C++ source because the run index might be out of bounds. So I wonder what VC++ did with the source.

So long.
Feb 14 2004
"Matthew" <matthew.hat stlsoft.dot.org> writes:
It's meaningless if you're performing them on different machines.

"Manfred Nowak" <svv1999 hotmail.com> wrote in message
news:c0gnum$1ugh$1 digitaldaemon.com...
 Results of a reported Benchmark on a Pentium 4-M, 2 GHz, WinXP Pro SP1:

 Python/Psyco        13.1
 gcc C               14.9
 Java 1.3.1          22.1
 Python/interpreted  47.1
 Java 1.4.2          57.1

 My result on a Duron, 700 MHz, Win98SE:

 dmd 0.79             9.54

 All results in seconds. Details will follow.

 So long.

Feb 19 2004
Manfred Nowak <svv1999 hotmail.com> writes:
On Thu, 19 Feb 2004 19:48:51 +1100, Matthew wrote:

 It's meaningless if you're performing them on different machines.

Meanwhile I have given the results on the same machine: 19.6 vs 9.54. What do you mean by meaningless?

So long.
Feb 19 2004
Manfred Nowak <svv1999 hotmail.com> writes:
The benchmark, which can be found at
http://osnews.com/story.php?news_id=5602&page=1
tested five criteria:
integer math, long math, double math, trigonometric functions, file I/O

I take into account only the first four. The unweighted geometric mean
of the quotients of the time needed by the program compiled with dmd and
the time needed by the program compiled with Visual C++ yields an
estimated performance for dmd of

        2.59-fold

the time needed by the reference compiler Visual C++.

The ranking of the compilers used follows:

Visual C++     1
Visual Basic   1.43
Visual C#      1.43
Visual J#      1.43
gcc C          1.77
Java 1.4.2     2.04
Java 1.3.1     2.58
dmd 0.79(est)  2.59
Python/psyco   8.78
Python        34.1


Now the relative results for gcc and dmd:

            int    long   double  trig      geometric mean
Visual C++  1      1       1      1         1
gcc C       1.021  1.532   1.484  4.257     1.77
dmd(est)    1.039  5.691   3.688  2.074     2.59
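
The geometric means in the last column can be recomputed from the four per-test ratios. A minimal sketch follows, with the ratios copied from the table above; the helper name geometric_mean is illustrative.

```python
import math

# Per-test times relative to Visual C++ (int, long, double, trig),
# copied from the table in the post.
relative = {
    "gcc C": [1.021, 1.532, 1.484, 4.257],
    "dmd(est)": [1.039, 5.691, 3.688, 2.074],
}

def geometric_mean(values):
    """Unweighted geometric mean, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

means = {name: geometric_mean(ratios) for name, ratios in relative.items()}
for name, mean in means.items():
    print(f"{name:10s} {mean:.2f}")
```

The geometric mean is the usual choice for averaging ratios, because a factor of 2 slower on one test and a factor of 2 faster on another cancel out exactly.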

The estimated timing results for dmd on the benchmark machine are:

            int    long   double  trig 
dmd(est)    9.97   107    23.6    7.26   (all values in seconds)   

These estimated timing results were calculated by compiling the source
code of the benchmark with gcc (same version) on my machine and running
it. The runs yielded the following timing results `gcc.mine' in seconds
for the four tests:

            int    long   double  trig 
gcc.mine    18.1   46.9   16.6   19.6

Then the code of the benchmark was adapted to the syntax of D,
compiled with dmd 0.79 and run on my machine, yielding the following
timing results `dmd.mine' in seconds for the four tests:

            int    long   double  trig 
dmd.mine    18.4   175    41.3    9.54

The reported benchmark results for gcc on the four tests are `gcc.rep'
in seconds:

            int    long   double  trig 
gcc.rep     9.8    28.8   9.5    14.9  

From these three values the estimated benchmark result for each test is
computed using the formula

      dmd(est) := dmd.mine * gcc.rep / gcc.mine
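
Applied to the three rows of timings, the formula can be sketched as follows. The list names are illustrative, and the dmd long time of 175 s is taken from the run averages listed further down.

```python
tests = ["int", "long", "double", "trig"]

# All times in seconds, copied from the post.
gcc_mine = [18.1, 46.9, 16.6, 19.6]   # gcc on the Duron
dmd_mine = [18.4, 175.0, 41.3, 9.54]  # dmd on the Duron
gcc_rep = [9.8, 28.8, 9.5, 14.9]      # gcc as reported on the Pentium 4

# dmd(est) := dmd.mine * gcc.rep / gcc.mine, test by test
dmd_est = [d * r / g for d, r, g in zip(dmd_mine, gcc_rep, gcc_mine)]

for name, est in zip(tests, dmd_est):
    print(f"{name:7s} {est:7.2f} s")
```

The results agree with the posted estimates up to small rounding differences, since the inputs here are already rounded to three digits. The underlying assumption is that dmd code would speed up on the Pentium 4 by the same factor as gcc code does, test by test.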


The values for gcc.mine and dmd.mine were obtained as follows:

Compiling the source with 'gcc -march=pentium -mno-cygwin -s -O3' under
cygwin and running it three times from a command prompt on a nearly idle
machine yielded the following:

                               gcc.mine
int:     18130, 18130, 18180 -> 18.1
long:    46850, 46900, 46850 -> 46.9
double:  16640, 16590, 16640 -> 16.6
trig:    19610, 19610, 19670 -> 19.6

Compiling the adapted source with 'dmd -O' and running it three times
from a command prompt yielded the following:

                                    dmd.mine
int:     18410,  18404,  18434 ->  18.4
long:   174497, 174534, 174563 -> 175
double:  41283,  41278,  41255 ->  41.3
trig:     9537,   9544,   9541 ->   9.54
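
The reduction from three runs to one figure can be sketched like this. It assumes that the raw values are milliseconds and that the figure after the arrow is the arithmetic mean rounded to three significant digits. The post does not say so explicitly, but the numbers are consistent with it.

```python
# Raw dmd run times from the post (assumed to be milliseconds).
runs_ms = {
    "int": [18410, 18404, 18434],
    "long": [174497, 174534, 174563],
    "double": [41283, 41278, 41255],
    "trig": [9537, 9544, 9541],
}

# Arithmetic mean of the three runs, converted to seconds.
mean_s = {name: sum(ms) / len(ms) / 1000.0 for name, ms in runs_ms.items()}

for name, seconds in mean_s.items():
    print(f"{name:7s} {seconds:.3g} s")  # three significant digits
```

The three runs of each test differ by well under one percent, so the choice between mean and median makes no visible difference at this precision.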



The code of the benchmark is located at:
 http://www.ocf.berkeley.edu/~cowell/research/benchmark/code/


The source adapted for D is not attached because doing so might infringe
the benchmark creator's copyright.


NOTE:

Although gcc was advised in the original benchmark to make use of SSE2,
this seemed not to work. Otherwise dmd should score even better.

So long.
Mar 13 2004