
digitalmars.D - Updated D Benchmarks

reply Robert Clipsham <robert octarineparrot.com> writes:
Hi all,

After reading through your comments from my last post, I have 
implemented most of the changes you have requested.

  * Added Compile times
  * Added Memory and Virtual Memory usage
  * Added final executable size
  * Tests are now run 4 times, and readings are the minimum taken from 
the last three runs only (or maximum in the case of memory usage)
  * Graphs now line up properly
  * More detailed compiler information is given
  * Benchmarks have been tweaked to last longer
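Roughly, the run scheme above looks like this (a sketch only; `time_runs` and the warm-up handling are illustrative, not the actual harness code):

```python
import time

def time_runs(func, runs=4):
    # Run the benchmark `runs` times; drop the first (warm-up) run
    # and keep the minimum of the remaining timings.
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings[1:])
```

For memory usage the harness takes the maximum instead, so `max(...)` would replace the `min(...)` on the last line.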

The only request I believe I've missed (correct me if I'm wrong!) is a C 
or C++ reference. This was planned for inclusion, but I couldn't come to 
a conclusion on which compiler to include and whether to use C or C++. 
Before you suggest having multiple references, I would rather have only 
one reference, otherwise it becomes a set of general language benchmarks 
rather than D benchmarks.

The benchmarks can be found at http://dbench.octarineparrot.com/.

Updated source code can be found at 
http://hg.octarineparrot.com/dbench/file/tip.

Again, if you have any comments or ideas for improvements let me know! 
If you can come to a conclusion on what C/C++ compiler to use as a 
reference I will rerun the benchmarks with a reference.
Mar 14 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Robert Clipsham, the pages are indeed improved a lot. Thank you for your work.

with all benchmarks limited to 256mb memory usage<
- Some benchmarks on the Shootout site will probably need more than 256 MB of RAM.
 The only request I believe I've missed (correct me if I'm wrong!) is a C 
 or C++ reference. This was planned for inclusion, but I couldn't come to 
 a conclusion on which compiler to include and whether to use C or C++.
 Before you suggest having multiple references, I would rather only have 
 one reference otherwise it becomes more general language benchmarks 
 rather than D.
The purpose of the reference is to see how far the D implementations are from a "good enough" compilation. In most cases this means you can time a C or C++ version (in some benchmarks other languages are faster than C/C++, but for the moment we can ignore this). So my suggestion is just to take a look at the Shootout site, where you took your code from (the D implementations aren't present any more, but they are kept elsewhere if you need them), and use the C version whenever it's faster than the C++ one, and the C++ version in the other cases (I give preference to the C versions because they are generally simpler).

A few notes:
- I may also offer you a few more benchmarks not present on the Shootout site.
- I suggest you add more benchmarks.
- Did you strip the executables produced by ldc and gdc? (Whether you did or not, add a note that says so.)
- What is the trouble with nbody under gdc?
- When you give a URL in an email or post, I suggest you don't put a full stop "." at the end, otherwise the person reading the post may have to delete it manually from the URL. If you really want to add a full stop or a closing parenthesis, put a space before it, like this: (http://www.digitalmars.com/webnews/ ).
- The compilation times are very small, so the values are probably noisy. You could add another value: compile all the programs and take the total time required (this isn't the sum of the single compilation times). Otherwise you may need a different benchmark, a much longer D program that you can compile with all three compilers.
- From your results it seems ldc needs more memory to run the programs. The LDC team may want to take a look at this.

Bye,
bearophile
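A sketch of the batch-compile timing I mean, with placeholder compiler and file names (not your real harness):

```python
import subprocess
import time

def time_batch(cmd):
    # Time a single compiler invocation covering all sources, so the
    # noise of many tiny per-file timings doesn't accumulate.
    start = time.time()
    subprocess.run(cmd, check=True)
    return time.time() - start

# Hypothetical usage; substitute dmd or gdc as needed:
# elapsed = time_batch(["ldc", "-O2", "nbody.d", "fannkuch.d", "spectralnorm.d"])
```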
Mar 15 2009
parent reply Robert Clipsham <robert octarineparrot.com> writes:
bearophile wrote:
 Robert Clipsham, the pages are indeed improved a lot. Thank you for your work.
Thanks, I'm glad you approve!
 with all benchmarks limited to 256mb memory usage<
- Some benchmarks of the Shootout site will probably need more than 256 mb of RAM.
None of the ones that I'm currently using are. This is just an arbitrary limit I have put in place so my server doesn't run out of memory.
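For the curious, here is a sketch of one way such a cap can be set on a POSIX system (not necessarily how my harness does it):

```python
import resource

LIMIT = 256 * 1024 * 1024  # 256 MB, matching the cap described above

def limit_memory(nbytes=LIMIT):
    # Cap this process's address space; benchmarks spawned afterwards
    # inherit the limit and fail to allocate beyond it.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (nbytes, hard))

# limit_memory()  # call in the parent before launching a benchmark
```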
 The only request I believe I've missed (correct me if I'm wrong!) is a C 
 or C++ reference. This was planned for inclusion, but I couldn't come to 
 a conclusion on which compiler to include and whether to use C or C++.
 Before you suggest having multiple references, I would rather only have 
 one reference otherwise it becomes more general language benchmarks 
 rather than D.
The purpose of the reference is to see how far the D implementations are from a "good enough" compilation. In most cases this means you can time a C or C++ version (in some benchmarks other languages are faster than C/C++, but for the moment we can ignore this). So my suggestion is just to take a look at the Shootout site, where you took your code from (the D implementations aren't present any more, but they are kept elsewhere if you need them), and use the C version whenever it's faster than the C++ one, and the C++ version in the other cases (I give preference to the C versions because they are generally simpler).
So you suggest I choose whichever performs best out of C and C++? What compiler would you recommend? Before, I was leaning towards C++ (not sure which compiler), purely because it has a more similar feature set to D.
 Few notes:
 - I may also offer you few more benchmarks not present in the Shootout site.
Thanks! I'd love to add more benchmarks; 6 doesn't give a great overview. Someone has already sent me a few which I plan to include, though most of them seem to be x86-32 specific, so I've asked that they be updated before I include them.
 - Now I suggest you to add more benchmarks.
My thoughts exactly!
 - Did you strip the executable produced by ldc and gdc? (If you do it, or you
don't do it, then add a note that says it).
No, I did not strip them. I think I might add another page, so there's one page with executable sizes and another with stripped executable sizes.
 - What is the trouble of nbody with gdc?
I can't remember off the top of my head; I seem to recall it was a linking error though. I did try to debug it when the benchmarks were originally run, but I didn't manage to get anywhere with it.
 - when you give an URL into an email or post I suggest you to not put a full
stop "." at the end, otherwise the person that reads the post may have to
delete it manually later from the URL. If you really want to add a full stop or
a closing parentheses, put a space before it, like this:
(http://www.digitalmars.com/webnews/ ).
I generally do, it was 4am when I posted though ;P
 - The compilation times are very small, so the values are probably noisy. You could add another value: compile all the programs and take the total time required (this isn't the sum of the single compilation times). Otherwise you may need a different benchmark, a much longer D program that you can compile with all three compilers.
I don't see a problem with this. Even if the values are noisy, they still show that the compilation times are tiny! I quite like your idea of timing the compilation of everything in one go, though I suspect it may confuse some people and lead them to think D takes a long time to compile. I should note that the compile times are unlikely to be accurate in any case, as each program is only compiled once.
 - From your results it seems ldc needs more memory to run the programs. The
LDC team may take a look at this.
There doesn't seem to be that much difference, but I'm sure they'd be happy to look into it. One of the aims of the benchmarks is to add a bit of competition and see if we can get the D compilers even faster! :D I think once we have a reference from a C or C++ compiler this will become even more true (provided D isn't faster already).
Mar 15 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Robert Clipsham:

I see you have put all the graphs on one page. This is probably better. When
you have 10-20 benchmarks you may need thinner bars.

You can add the raw timings, formatted into an ASCII table, a bit like this
(don't use an HTML table):
http://zi.fi/shootout/rawresults.txt
There's no strict need for a separate file; a <pre>...</pre> section in the page
is enough too.
It's useful for automatic processing of your data, for example with a small
script.
It's so useful that you yourself can use such a script to generate your HTML
page from that table of numbers.
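For example, a made-up table like this parses in a few lines (invented names and numbers, just to show the idea; the real table would hold your actual timings):

```python
raw = """test  compiler  seconds
nbody  ldc  4.20
nbody  dmd  5.10
binarytrees  gdc  8.70"""

# Split on whitespace: the first line is the header, the rest are rows.
lines = [line.split() for line in raw.splitlines()]
header, rows = lines[0], lines[1:]
table = [dict(zip(header, row)) for row in rows]

print(table[0]["compiler"])  # prints: ldc
```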


None of the ones that I'm currently using are.<
I know, but here you can see C++ benchmarks that use 300+ MB:
http://shootout.alioth.debian.org/u64/benchmark.php?test=all&lang=gpp&lang2=gpp&box=1
So you suggest I choose whichever performs best out of C or C++?<
Yep, it gives a more reliable reference. But if you don't like this suggestion, do as you like; using only C++ is acceptable to me too.
What compiler would you recommend?<
GCC or LLVM-GCC seem fine. They aren't equal, as you may have seen from my benchmarks. GCC is probably better: more developed and more widespread.
- I may also offer you few more benchmarks not present in the Shootout site.<<
Thanks! I'd love to add more benchmarks, 6 doesn't give a great overview.<
OK, I can probably find you 5-10 more small benchmarks. I think a private email is better for this (or I'll put a zip somewhere and give you a URL).
No, I did not strip them. I think I might add another page, one with executable
sizes, another with stripped executable sizes.<
Stripped only versions are enough too.
- What is the trouble of nbody with gdc?<<
 I can't remember off the top of my head; I seem to recall it was a linking
error though. I did try to debug it when the benchmarks were originally run,
but I didn't manage to get anywhere with it.<
Such trouble can probably be fixed.
- From your results it seems ldc needs more memory to run the programs. The LDC
team may take a look at this.<
There doesn't seem to be that much difference,<
This is a small Python script, with data scraped manually from your page (this is why having a raw table is useful):

data = """ldc 0.69 dmd 0.63 gdc 0.63
ldc 30.7 dmd 30.64 gdc 30.65
ldc 140.24 dmd 120.61 gdc 120.62
ldc 16.68 dmd 16.62 gdc 16.63
ldc 0.95 dmd 1.52 gdc 0.87
"""

data = data.replace("ldc", "").replace("dmd", "").replace("gdc", "").splitlines()
data = [map(float, line.split()) for line in data]
results = [int(round(sum(line))) for line in zip(*data)]
for comp_time in zip("ldc dmd gdc".split(), results):
    print "%s: %d MB" % comp_time

Its output:

ldc: 189 MB
dmd: 170 MB
gdc: 169 MB

To me it seems there's some difference.

Bye,
bearophile
Mar 15 2009
parent reply Robert Clipsham <robert octarineparrot.com> writes:
bearophile wrote:
 Robert Clipsham:
 
 I see you have put all the graphs on one page. This is probably better. When
you have 10-20 benchmarks you may need thinner bars.
 
 You can add the raw timings, formatted into an ASCII table, a bit like this
(don't use an HTML table):
 http://zi.fi/shootout/rawresults.txt
 There's no strict need for a separate file; a <pre>...</pre> section in the page
is enough too.
 It's useful for automatic processing of your data, for example with a small
script.
 It's so useful that you yourself can use such a script to generate your HTML
page from that table of numbers.
I've got a better idea. That page is automatically generated from an xml file, I'll just make that available instead.
 None of the ones that I'm currently using are.<
I know, but here you can see C++ benchmarks that use 300+ MB:
http://shootout.alioth.debian.org/u64/benchmark.php?test=all&lang=gpp&lang2=gpp&box=1
I would probably have to exclude tests that use that much memory; there isn't enough RAM in my server to go much higher than 256 MB in the benchmarks (without taking out all the services running on it first).
 So you suggest I choose whichever performs best out of C or C++?<
Yep, it gives a more reliable reference. But if you don't like this suggestion, do as you like; using only C++ is acceptable to me too.
I'll probably go all C++, we'll see what other people want though.
 What compiler would you recommend?< 
GCC or LLVM-GCC seems fine. They aren't equal, as you may have seen from my benchmarks. GCC is probably better, more developed and more widespread.
I'll probably go with GCC then. Again, we'll see what anyone else thinks first.
 OK, I can probably find you 5-10 more small benchmarks.
 I think a private email is better for this (or I'll put a zip somewhere and
I'll give you an URL).
That'd be great! Thanks.
 Stripped only versions are enough too.
But if I go with both, I've got more data up there for not much more effort :P
 - What is the trouble of nbody with gdc?<<
 I can't remember off the top of my head, I seem to recall it was a linking
error though. I did try to debug it when they were originally run I didn't
manage to get anywhere with it though.<
Such trouble can probably be fixed.
Probably. I'll look into it again before the next time I run the benchmarks.
 - From your results it seems ldc needs more memory to run the programs. The
LDC team may take a look at this.<
 There doesn't seem to be that much difference,<
This is a small Python script, with data scraped manually from your page (this is why having a raw table is useful):

data = """ldc 0.69 dmd 0.63 gdc 0.63
ldc 30.7 dmd 30.64 gdc 30.65
ldc 140.24 dmd 120.61 gdc 120.62
ldc 16.68 dmd 16.62 gdc 16.63
ldc 0.95 dmd 1.52 gdc 0.87
"""

data = data.replace("ldc", "").replace("dmd", "").replace("gdc", "").splitlines()
data = [map(float, line.split()) for line in data]
results = [int(round(sum(line))) for line in zip(*data)]
for comp_time in zip("ldc dmd gdc".split(), results):
    print "%s: %d MB" % comp_time

Its output:

ldc: 189 MB
dmd: 170 MB
gdc: 169 MB

To me it seems there's some difference.
OK, it's more difference than I saw with a quick glance... You proved me wrong!
Mar 15 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Robert Clipsham:

I've got a better idea. That page is automatically generated from an xml file,
I'll just make that available instead.<
I don't like XML; a small txt table is so easy to process with three lines of Python... :-) (JSON is fine too).
I would probably have to exclude tests that use that much memory, there isn't
enough ram in my server to go much higher than 256mb in the benchmarks (without
taking out all the services running on it first).<
Your timings may be quite noisy then. You may need more re-runs and/or much longer timings.
That'd be great! Thanks.<
OK. I see you accept both ways. Bye, bearophile
Mar 15 2009
parent reply Robert Clipsham <robert octarineparrot.com> writes:
bearophile wrote:
 I don't like XML; a small txt table is so easy to process with three lines of
Python... :-)
 (Json is fine too).
If you would like to provide me with a script to convert the XML file to a text table, I'll happily run it and make it available to you. As it is, I'm too lazy to write such a thing myself. The XML file will be up in about 5 minutes at http://dbench.octarineparrot.com/results.xml .
Mar 15 2009
parent reply naryl <cy ngs.ru> writes:
Robert Clipsham Wrote:

 bearophile wrote:
 I don't like XML; a small txt table is so easy to process with three lines of
Python... :-)
 (Json is fine too).
If you would like to provide me with a script to convert the xml file to a text table, I'll happily run it and make it available to you. As it is I'm too lazy to write such a thing myself. The xml file will be up in about 5 minutes at http://dbench.octarineparrot.com/results.xml .
I think this will suffice:

$ sed 's/<[^>]*>//g; /^$/d' < data | sed 'N; N; N; N; N; N; s/\n/ /g'

It'll strip XML tags and blank lines, and remove all but every 7th line feed. You'll get the following output: http://stashbox.org/448426/out.txt

You can use AWK to process it. For example:

$ awk '/dmd/ {dmd += $5;} /gdc/ {gdc += $5;} /ldc/ {ldc += $5;} END {print "DMD: " dmd/1024 "M\nGDC: " gdc/1024 "M\nLDC: " ldc/1024 "M"}' < out.txt

Outputs:

DMD: 170.672M
GDC: 169.398M
LDC: 189.977M
Mar 15 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
naryl:
 I think this will suffice:
 $ sed 's/<[^>]*>//g; /^$/d' < data | sed 'N; N; N; N; N; N; s/\n/ /g'
A Python version a little more resilient to changes in that file:

from xml.dom.minidom import parse

results1 = parse("results.xml").getElementsByTagName("results")
results = results1[0].getElementsByTagName("result")

for node in results[0].childNodes:
    if node.nodeType != node.TEXT_NODE:
        print node.localName,
print

for result in results:
    for node in result.childNodes:
        if node.nodeType != node.TEXT_NODE:
            print node.firstChild.data,
    print

Bye,
bearophile
Mar 15 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Sorry, assuming a "tidy XML file" is silly. Better:

from xml.dom.minidom import parse

r = parse("results.xml").getElementsByTagName("results")
results = r[0].getElementsByTagName("result")


fields = [n.localName for n in results[0].childNodes if n.nodeType != n.TEXT_NODE]
print " ".join(fields)


for r in results:
    print " ".join(r.getElementsByTagName(f)[0].firstChild.data for f in fields)

Bye,
bearophile
Mar 15 2009
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Robert Clipsham, eventually your site may become like this page (it may be slow
to load; you may need to try again later):
http://sbcl.boinkor.net/bench/

It's also useful to see how performance evolves across versions, like a sibling
of Bugzilla, to spot performance bugs.

Bye,
bearophile
Mar 15 2009
parent Robert Clipsham <robert octarineparrot.com> writes:
bearophile wrote:
 Robert Clipsham, eventually your site may become like this page (it may be
slow to load, you may need to load it later too):
 http://sbcl.boinkor.net/bench/
 
 It's also useful to see how performance evolves across versions, like a
brother of bugzilla, to spot performance bugs.
 
 Bye,
 bearophile
This looks like the way to go for the benchmarks. When I've added all the tests people have sent me and added a reference C++ result, I will put the results file under revision control with the rest of the source code, and set up a script to generate graphs showing results over time.
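As a rough sketch, grouping such a results file into per-test series for graphing might look like this (the revisions, test names, and timings here are invented, not real results):

```python
# Hypothetical rows as they might sit in a revision-controlled results file.
rows = [
    ("r1", "nbody", 4.8), ("r2", "nbody", 4.5), ("r3", "nbody", 4.6),
    ("r1", "binarytrees", 9.1), ("r2", "binarytrees", 8.7),
]

# Group into one (revision, seconds) series per test, ready to plot.
series = {}
for rev, test, secs in rows:
    series.setdefault(test, []).append((rev, secs))

print(series["nbody"])  # prints: [('r1', 4.8), ('r2', 4.5), ('r3', 4.6)]
```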
Mar 16 2009
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Robert Clipsham:
 This looks like the way to go for the benchmarks. When I've added all 
 the tests people have sent me and added a reference C++ result I will 
 put the results file under revision control with the rest of the source 
 code and set up a script to generate graphs to show results over time.
I have sent you an email with several benchmarks.

In the meantime I have seen this tiny C++ Global Illumination renderer in 99 lines:
http://kevinbeason.com/smallpt/
There are other scenes available too:
http://kevinbeason.com/smallpt/extraScenes.txt

It's not meant to be fast, so surely there are ways to write much faster C++ code. But it's short enough, and the results are nice enough (even if slow), that it may be worth translating to D for comparison.

To run it with my MinGW I had to add:

inline double erand48() { return rand() / (double)RAND_MAX; }

and replace erand48(...) with erand48(). The MinGW I use is based on GCC 4.3.1, but I have not used OpenMP; it's used in this code essentially to divide the running time by the number of available cores.

Bye,
bearophile
Mar 16 2009