D - Is the current D slow or Java fast ?

"Mike Wynn" <mike.wynn l8night.co.uk> writes:
I've been testing some crypto code, basic port of some Java crypto to D
(the C versions are all macro'ed).

and I got some disturbing results

PC used: Athlon 1.33 GHz, 512 MB DDR266

java version "1.4.0_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0_03-b04)
Java HotSpot(TM) Client VM (build 1.4.0_03-b04, mixed mode)
testing md5 expect a 3 to 10 minute delay
1M blocks - 24065ms 24s
1K blocks - 21381ms 21s
1G hashed in
1M blocks - 24065ms 24s
1K blocks - 21381ms 21s
100K hashed in
1B blocks - 16704ms 16s
- -- - -- - -- - -- - -- - -- - -- - -- - -- -
testing sha0 expect a 3 to 10 minute delay
1M blocks - 38215ms 38s
1K blocks - 38085ms 38s
1G hashed in
1M blocks - 38215ms 38s
1K blocks - 38085ms 38s
100K hashed in
1B blocks - 18757ms 18s
- -- - -- - -- - -- - -- - -- - -- - -- - -- -

compiled with dmd, no options
testing md5 expect a 3 to 10 minute delay
start =1044598753, tm=1044598753
1M blocks - 40s (0m)
1K blocks - 39s (0m)
1G hashed in
1M blocks - 40s (0m)
1K blocks - 39s (0m)
100K hashed in
1B blocks - 35s (0m)
- -- - -- - -- - -- - -- - -- - -- - -- - -- -
testing sha0 expect a 3 to 10 minute delay
start =1044598867, tm=1044598867
1M blocks - 76s (1m)
1K blocks - 76s (1m)
1G hashed in
1M blocks - 76s (1m)
1K blocks - 76s (1m)
100K hashed in
1B blocks - 44s (0m)
- -- - -- - -- - -- - -- - -- - -- - -- - -- -

with dmd -release things are a little better
testing md5 expect a 3 to 10 minute delay
start =1044599115, tm=1044599115
1M blocks - 36s (0m)
1K blocks - 36s (0m)
1G hashed in
1M blocks - 36s (0m)
1K blocks - 36s (0m)
100K hashed in
1B blocks - 9s (0m)
- -- - -- - -- - -- - -- - -- - -- - -- - -- -
testing sha0 expect a 3 to 10 minute delay
start =1044599196, tm=1044599196
1M blocks - 50s (0m)
1K blocks - 49s (0m)
1G hashed in
1M blocks - 50s (0m)
1K blocks - 49s (0m)
100K hashed in
1B blocks - 12s (0m)
- -- - -- - -- - -- - -- - -- - -- - -- - -- -

apart from the 100K hashed as single bytes, it is still slower than Java

a very odd thing happened when I recompiled and ran with the jdk 1.1.8
testing md5 expect a 3 to 10 minute delay
1M blocks - 14511ms 14s
1K blocks - 14370ms 14s
1G hashed in
1M blocks - 14511ms 14s
1K blocks - 14370ms 14s
100K hashed in
1B blocks - 7030ms 7s
- -- - -- - -- - -- - -- - -- - -- - -- - -- -
testing sha0 expect a 3 to 10 minute delay
1M blocks - 29593ms 29s
1K blocks - 26869ms 26s
1G hashed in
1M blocks - 29593ms 29s
1K blocks - 26869ms 26s
100K hashed in
1B blocks - 8041ms 8s
- -- - -- - -- - -- - -- - -- - -- - -- - -- -

the same class files run under Java 2

testing md5 expect a 3 to 10 minute delay
1M blocks - 24054ms 24s
1K blocks - 21301ms 21s
1G hashed in
1M blocks - 24054ms 24s
1K blocks - 21301ms 21s
100K hashed in
1B blocks - 16443ms 16s
- -- - -- - -- - -- - -- - -- - -- - -- - -- -
testing sha0 expect a 3 to 10 minute delay
1M blocks - 40878ms 40s
1K blocks - 38185ms 38s
1G hashed in
1M blocks - 40878ms 40s
1K blocks - 38185ms 38s
100K hashed in
1B blocks - 17916ms 17s
- -- - -- - -- - -- - -- - -- - -- - -- - -- -

I have to try out C# and C (I know I can read from disk and hash at over
45M/s with the C version of MD5 I have).

I've included the source, so can someone else please verify my findings,
and ideally work out where the performance hit is :)
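
For anyone wanting to reproduce the shape of this test without the attached source, a minimal sketch follows. It is written in Python with the standard hashlib module rather than the D or Java code discussed here, so absolute times are not comparable with the figures above. The function name time_hash, the zero-filled input and the scaled-down sizes are assumptions for illustration, not taken from the original source.

```python
import hashlib
import time


def time_hash(total_bytes, block_size, algo="md5"):
    """Feed total_bytes of zero-filled data to the hash in block_size chunks.

    Returns (elapsed_seconds, hex_digest). total_bytes should be a multiple
    of block_size, otherwise the remainder is silently dropped.
    """
    block = bytes(block_size)          # zero-filled buffer (assumed input)
    h = hashlib.new(algo)
    start = time.perf_counter()
    for _ in range(total_bytes // block_size):
        h.update(block)
    return time.perf_counter() - start, h.hexdigest()


if __name__ == "__main__":
    # Scaled down from the thread's 1G so the demo finishes quickly.
    runs = [
        ("1M blocks", 16 << 20, 1 << 20),
        ("1K blocks", 16 << 20, 1 << 10),
        ("1B blocks", 100_000, 1),
    ]
    for label, total, block in runs:
        secs, _ = time_hash(total, block)
        print(f"{label} - {secs * 1000:.0f}ms")
```

The digest is the same whatever the block size; only the number of update calls changes, which is what the 1M/1K/1B comparison isolates.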
Feb 06 2003
Burton Radons <loth users.sourceforge.net> writes:
Mike Wynn wrote:
> I've been testing some crypto code, basic port of some Java crypto to D
> (the C versions are all macro'ed).
>
> and I got some disturbing results

Uh, yeah. You need to enable optimisations. When I enabled -O the first test went from 36s to 21s; when I enabled -inline as well it went from 21s to 14s.
Feb 07 2003
"Mike Wynn" <mike.wynn l8night.co.uk> writes:
well, with dmd -release it's as good as C#
(C# basic edition, which hasn't got the full optimiser etc.)
(I've only ported MD5 so far)
testing MD5 expect a 3 to 10 minute delay

start =10786002

1M blocks - 35670ms (35s)
1K blocks - 33165ms (33s)
1G hashed in
1M blocks - 35670ms (35s)
1K blocks - 33165ms (33m)

100K hashed in
1B blocks - 7130ms (7m)

I never noticed -inline or -O. yes, that helps; much better results
-release -inline
testing md5 expect a 3 to 10 minute delay
start =1044643391, tm=1044643391
1M blocks - 20s (0m)
1K blocks - 19s (0m)
1G hashed in
1M blocks - 20s (0m)
1K blocks - 19s (0m)
100K hashed in
1B blocks - 8s (0m)
- -- - -- - -- - -- - -- - -- - -- - -- - -- -

-release -inline -O

testing md5 expect a 3 to 10 minute delay
start =1044644195, tm=1044644195
1M blocks - 14s (0m)
1K blocks - 14s (0m)
1G hashed in
1M blocks - 14s (0m)
1K blocks - 14s (0m)
100K hashed in
1B blocks - 8s (0m)

now that's a lot more acceptable, and much closer to what I expected.
(not tried just -O)
looks like we're running on similar hardware
time to find out what a C version can do.

Mike.
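
To set these timings against the 45M/s disk-and-hash figure from the first post, it helps to convert them to throughput. A small sketch in Python, assuming "1G" means 2^30 bytes; the helper name throughput_mib_s is made up for illustration.

```python
GIB = 1 << 30  # assumed meaning of "1G" in the test output


def throughput_mib_s(total_bytes, seconds):
    """Return hashing throughput in MiB per second."""
    return total_bytes / (1 << 20) / seconds


if __name__ == "__main__":
    # Timings for the 1M-block md5 runs quoted in this thread.
    for label, secs in [
        ("dmd -release", 36),
        ("dmd -release -inline", 20),
        ("dmd -release -inline -O", 14),
        ("Java 1.4.0_03", 24.065),
    ]:
        print(f"{label}: {throughput_mib_s(GIB, secs):.1f} MiB/s")
```

On these numbers the fully optimised build works out to about 73 MiB/s, against about 28 MiB/s for -release alone.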
Feb 07 2003
"Mike Wynn" <mike.wynn l8night.co.uk> writes:
D is actually faster than dmc :)

similar MD5 test (C version)

compiled with `dmc`
MD5 speed test (C)
time for 1073741824 bytes (1048576k [1024M])
 is 27s (block:1048576 [1024k], count:1024)
time for 1073741824 bytes (1048576k [1024M])
 is 24s (block:1024 [1k], count:1048576)
time for 1073741824 bytes (1048576k [1024M])
 is 83s (block:1 [0k], count:1073741824)

compiled with `dmc -o+speed`
MD5 speed test (C)
time for 1073741824 bytes (1048576k [1024M])
 is 17s (block:1048576 [1024k], count:1024)
time for 1073741824 bytes (1048576k [1024M])
 is 14s (block:1024 [1k], count:1048576)
time for 1073741824 bytes (1048576k [1024M])
 is 60s (block:1 [0k], count:1073741824)

gcc --version    -> 2.95.3-6  (mingw)
MD5 speed test (C)
time for 1073741824 bytes (1048576k [1024M])
 is 30s (block:1048576 [1024k], count:1024)
time for 1073741824 bytes (1048576k [1024M])
 is 27s (block:1024 [1k], count:1048576)
time for 1073741824 bytes (1048576k [1024M])
 is 79s (block:1 [0k], count:1073741824)

gcc -O3
MD5 speed test (C)
time for 1073741824 bytes (1048576k [1024M])
 is 10s (block:1048576 [1024k], count:1024)
time for 1073741824 bytes (1048576k [1024M])
 is 9s (block:1024 [1k], count:1048576)
time for 1073741824 bytes (1048576k [1024M])
 is 52s (block:1 [0k], count:1073741824)


compiled with VC++6 release
MD5 speed test (C)
time for 1073741824 bytes (1048576k [1024M])
 is 10s (block:1048576 [1024k], count:1024)
time for 1073741824 bytes (1048576k [1024M])
 is 10s (block:1024 [1k], count:1048576)
time for 1073741824 bytes (1048576k [1024M])
 is 55s (block:1 [0k], count:1073741824)

and I was interested to find gcc -O3 is the same or faster than VC++

has anyone tried to port the linux D gcc front end to mingw (I've never
managed to get gcc to build so I'm not going to even start to try) ?
Feb 07 2003
"Nic Tiger" <nictiger progtech.ru> writes:
If MD5 uses floating-point math, you should specify the -ff switch.
It gives a real speed gain.

Nic Tiger.
Feb 08 2003
"Walter" <walter digitalmars.com> writes:
"Mike Wynn" <mike.wynn l8night.co.uk> wrote in message
news:b210qd$fqe$1 digitaldaemon.com...
> looks like we're running on similar hardware
> time to find out what a C version can do.

When comparing to C, use DMC++. The reason is that DMD and DMC++ share the optimizer and back end code generator, so you really are comparing the languages rather than the back ends.
Feb 07 2003
"Mike Wynn" <mike.wynn l8night.co.uk> writes:
I used the dmc.exe that comes with the D alpha.
do you mean use dmc -cpp?
and is -o+speed the right option to get the fastest code?

the C version is a bit of a devil's advocate version really, and realistically I
would expect any OO version to always be a bit slower hashing big blocks
(overhead of virtual methods, and the code reuse) and anything when hashing
1-byte entities. it just shows how efficient the virtual call code is.

I think I started off trying to compare langs, and ended up comparing
backends :)
with D, Java and C# the differences in the code are subtle. I've got to try
a Java version that uses an interface, as Sun's VM used to be very poor with
interface methods; a C# version that uses COM interfaces (unless it does
anyway); and a C# version with `unsafe` code, as the D version currently uses
pointers.
which means I should write a pointer-free D version, along with a D version
that uses interfaces too.

I think what was yesterday a random query about performance will be
converted into a real utility to test languages and backends, and methods of
optimising for lang X impl Y.

at first I was very concerned that D was unexpectedly slow. it is not; I was
doing the wrong things. however it has reasserted my faith in dynamic
compilers, and that compilation speed is irrelevant: gcc, which is dog slow,
comes out top.

so the gauntlet has been put down for the gcc front end coders to get D to
compile to code at least as fast as equiv C++.

however, performance of such a dedicated app is almost irrelevant. jview
(the MS java) performs about the same as dmd with no options, yet on a real
world app it outperforms jdk 1.1.8 (or at least its GUI response is better).
likewise dmd outperforms the JDK on the single byte at a time hash, where
the code path is basically a long chain of calls, branches, and reads
rather than the maths-intense md code. that is IMHO a more important
benchmark, as it reflects the types of ops that a real useful app would be
doing.
the 1M hash speed is a test of how good the optimiser for maths code and
the register allocators are.
I've got a couple of other implementations and will have a play with seeing
if the changes to code make any changes to performance.
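
The point about the single-byte test being mostly calls and branches can be made concrete by counting. A rough sketch in Python; the helper names are made up for illustration, and the only outside fact used is that MD5 compresses its input 64 bytes at a time. The sizes are those printed by the C test earlier in the thread.

```python
MD5_BLOCK = 64  # MD5 processes its input in 64-byte blocks


def update_calls(total_bytes, chunk_size):
    """Number of update() calls needed to feed total_bytes in chunk_size pieces."""
    return total_bytes // chunk_size


def full_blocks(total_bytes):
    """Number of complete 64-byte blocks compressed (final padding ignored)."""
    return total_bytes // MD5_BLOCK


def calls_per_block(total_bytes, chunk_size):
    """How many update() calls are made for each 64-byte block compressed."""
    return update_calls(total_bytes, chunk_size) / full_blocks(total_bytes)


if __name__ == "__main__":
    total = 1 << 30  # 1073741824 bytes
    for chunk in (1 << 20, 1 << 10, 1):
        print(chunk, update_calls(total, chunk), calls_per_block(total, chunk))
```

Fed one byte at a time, the hash object is called 64 times for every block it actually compresses, so call and buffering overhead dominates; with 1K or 1M chunks the call count is negligible next to the compression work.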


Mike.
Feb 07 2003
"Walter" <walter digitalmars.com> writes:
In reality, you shouldn't see much difference in that code between
C/C++/D/Java/C#. The reason is it is integer math intensive, and does not do
much with objects, strings, etc. Those languages all treat integer math
about the same.
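
To show what "integer math intensive" means here, a sketch of one MD5 round-1 step in Python. The formulas follow RFC 1321; the function names are chosen for illustration, and Python needs explicit masking to mimic the 32-bit wraparound arithmetic that the languages listed above get from their integer types.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, s):
    """Rotate a 32-bit value left by s bits (0 < s < 32)."""
    x &= MASK32
    return ((x << s) | (x >> (32 - s))) & MASK32


def md5_f(x, y, z):
    """Round-1 auxiliary function F from RFC 1321: bitwise 'if x then y else z'."""
    return ((x & y) | (~x & z)) & MASK32


def md5_round1_step(a, b, c, d, m, k, s):
    """One round-1 step: a = b + rotl(a + F(b, c, d) + m + k, s), mod 2**32."""
    return (b + rotl32(a + md5_f(b, c, d) + m + k, s)) & MASK32
```

Each step is a handful of ands, ors, adds and a rotate on 32-bit words, with no objects or strings involved.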
Feb 07 2003
"Walter" <walter digitalmars.com> writes:
"Mike Wynn" <mike.wynn l8night.co.uk> wrote in message
news:b1vlms$2o31$1 digitaldaemon.com...
> I've been testing some crypto code, basic port of some Java crypto to D
> (the C versions are all macro'ed).

Could you post/email the C version please, just so I'm using the same code? Thanks!
Feb 09 2003