digitalmars.D - Port a benchmark to D?
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jun 03 2011
- Timon Gehr <timon.gehr gmx.ch> Jun 03 2011
- bearophile <bearophileHUGS lycos.com> Jun 03 2011
- "Nick Sabalausky" <a a.a> Jun 03 2011
- Timon Gehr <timon.gehr gmx.ch> Jun 03 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jun 03 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jun 03 2011
- Timon Gehr <timon.gehr gmx.ch> Jun 03 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jun 03 2011
- Timon Gehr <timon.gehr gmx.ch> Jun 03 2011
- Timon Gehr <timon.gehr gmx.ch> Jun 03 2011
- Jonathan M Davis <jmdavisProg gmx.com> Jun 06 2011
- "Jonathan M Davis" <jmdavisProg gmx.com> Jun 03 2011
- "Steven Schveighoffer" <schveiguy yahoo.com> Jun 06 2011
- bearophile <bearophileHUGS lycos.com> Jun 03 2011
- Timon Gehr <timon.gehr gmx.ch> Jun 03 2011
- Walter Bright <newshound2 digitalmars.com> Jun 04 2011
- Jeff Nowakowski <jeff dilacero.org> Jun 04 2011
- Walter Bright <newshound2 digitalmars.com> Jun 04 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- "Jonathan M Davis" <jmdavisProg gmx.com> Jun 03 2011
- bearophile <bearophileHUGS lycos.com> Jun 03 2011
- Adam D. Ruppe <destructionator gmail.com> Jun 03 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jun 03 2011
- Adam D. Ruppe <destructionator gmail.com> Jun 03 2011
- Jonathan M Davis <jmdavisProg gmx.com> Jun 03 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jun 03 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jun 03 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jun 03 2011
- bearophile <bearophileHUGS lycos.com> Jun 03 2011
- dsimcha <dsimcha yahoo.com> Jun 03 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jun 04 2011
- Mafi <mafi example.org> Jun 04 2011
- Walter Bright <newshound2 digitalmars.com> Jun 04 2011
- dsimcha <dsimcha yahoo.com> Jun 03 2011
- Timon Gehr <timon.gehr gmx.ch> Jun 04 2011
- Timon Gehr <timon.gehr gmx.ch> Jun 04 2011
- Daniel Gibson <metalcaedes gmail.com> Jun 04 2011
- Timon Gehr <timon.gehr gmx.ch> Jun 04 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jun 04 2011
- Walter Bright <newshound2 digitalmars.com> Jun 04 2011
- bearophile <bearophileHUGS lycos.com> Jun 04 2011
- dsimcha <dsimcha yahoo.com> Jun 04 2011
- Daniel Gibson <metalcaedes gmail.com> Jun 04 2011
- Walter Bright <newshound2 digitalmars.com> Jun 04 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jun 03 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- bearophile <bearophileHUGS lycos.com> Jun 03 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 03 2011
- Caligo <iteronvexor gmail.com> Jun 03 2011
- Caligo <iteronvexor gmail.com> Jun 04 2011
- Caligo <iteronvexor gmail.com> Jun 04 2011
- Iain Buclaw <ibuclaw ubuntu.com> Jun 04 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 04 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Jun 04 2011
This paper: http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/ compares the efficiency of an algorithm implemented in C++, Scala, Java, and Go. I wonder how D would fare. Does anyone have the time and inclination to port the benchmark so we can take a look? Thanks, Andrei
Jun 03 2011
Andrei Alexandrescu wrote:This paper:
compares the efficiency of an algorithm implemented in C++, Scala, Java, and Go. I wonder how D would fare. Does anyone have the time and inclination to port the benchmark so we can take a look? Thanks, Andrei
OK. I'll start right away. Timon
Jun 03 2011
Timon Gehr:OK. I'll start right away.
I have independently started it too :-) But this article written by Google and its benchmarking methodology are not good. Bye, bearophile
Jun 03 2011
"bearophile" <bearophileHUGS lycos.com> wrote in message news:isbd1j$1u9s$1 digitalmars.com...But this article written by Google and its benchmarking methodology are not good.
I realize you'e probably talking about content, but...Pet Peeve: Trying to read a paged multi-column layout outside of dead-tree form.
Jun 03 2011
bearophile wrote:Timon Gehr:OK. I'll start right away.
I have independently started it too :-) But this article written by Google and its benchmarking methodology are not good. Bye, bearophile
Ok, we can compare the two implementations afterwards. BTW, have you already noticed how incredibly stupid some parts of the implementation of cpp are? I think I will also do a d_pro version after porting cpp. Timon
Jun 03 2011
On 6/3/11 2:48 PM, Timon Gehr wrote:bearophile wrote:Timon Gehr:OK. I'll start right away.
I have independently started it too :-) But this article written by Google and its benchmarking methodology are not good. Bye, bearophile
Ok, we can compare the two implementations afterwards. BTW, have you already noticed how incredibly stupid some parts of the implementation of cpp are? I think I will also do a d_pro version after porting cpp. Timon
Terrific, thanks! I think this will be very instructive. Andrei
Jun 03 2011
On 6/3/11 2:48 PM, Timon Gehr wrote:bearophile wrote:Timon Gehr:OK. I'll start right away.
I have independently started it too :-) But this article written by Google and its benchmarking methodology are not good. Bye, bearophile
Ok, we can compare the two implementations afterwards. BTW, have you already noticed how incredibly stupid some parts of the implementation of cpp are? I think I will also do a d_pro version after porting cpp. Timon
One note - some of their std::list uses can be replaced by SList, and others by built-in arrays. Andrei
Jun 03 2011
Andrei Alexandresco wrote:On 6/3/11 2:48 PM, Timon Gehr wrote:bearophile wrote:Timon Gehr:OK. I'll start right away.
I have independently started it too :-) But this article written by Google and its benchmarking methodology are not good. Bye, bearophile
Ok, we can compare the two implementations afterwards. BTW, have you already noticed how incredibly stupid some parts of the implementation of cpp are? I think I will also do a d_pro version after porting cpp. Timon
One note - some of their std::list uses can be replaced by SList, and others by built-in arrays. Andrei
Yes, but built-in arrays are more similar to std::vector, therefore it might already be an optimization over the original benchmark. I'm not sure if that is okay? Timon
Jun 03 2011
On 6/3/11 3:01 PM, Timon Gehr wrote:Andrei Alexandresco wrote:On 6/3/11 2:48 PM, Timon Gehr wrote:bearophile wrote:Timon Gehr:OK. I'll start right away.
I have independently started it too :-) But this article written by Google and its benchmarking methodology are not good. Bye, bearophile
Ok, we can compare the two implementations afterwards. BTW, have you already noticed how incredibly stupid some parts of the implementation of cpp are? I think I will also do a d_pro version after porting cpp. Timon
One note - some of their std::list uses can be replaced by SList, and others by built-in arrays. Andrei
Yes, but built-in arrays are more similar to std::vector, therefore it might already be an optimization over the original benchmark. I'm not sure if that is okay? Timon
I noticed that the C++ code uses std::list without there being any need for a linked list structure. See for example the data structure used in FindSet. It's a list, but it's just appended too and then used for one iteration. Andrei
Jun 03 2011
Andrei Alexandrescu wrote:I noticed that the C++ code uses std::list without there being any need for a linked list structure. See for example the data structure used in FindSet. It's a list, but it's just appended too and then used for one iteration. Andrei
Yes, but the list in FindSet is unnecessary anyways. If I start changing the original implementation, the first thing I will do is to remove that. First however, I will port the code as closely as possible. Is there any associative version of RedBlackTree (I realize it could be made associative quite easily), or should I just use built-in hash maps? Timon
Jun 03 2011
Jonathan M Davis wrote:On 2011-06-03 14:08, Timon Gehr wrote:Andrei Alexandrescu wrote:I noticed that the C++ code uses std::list without there being any need for a linked list structure. See for example the data structure used in FindSet. It's a list, but it's just appended too and then used for one iteration. Andrei
Yes, but the list in FindSet is unnecessary anyways. If I start changing the original implementation, the first thing I will do is to remove that. First however, I will port the code as closely as possible. Is there any associative version of RedBlackTree (I realize it could be made associative quite easily), or should I just use built-in hash maps?
You give RedBlackTree a different predicate if you want to treat it as a map. It defaults to "a < b" with allowDuplicates as false, which makes it a multiset. But If you adjust its template parameters, you can make it a set, map, multimap, or any other type of collection which uses a red-black tree > as its data structure. - Jonathan M Davis
Yes, thats what I had in mind, but I thought it is strange that there is no boilerplate map in std.container. Timon
Jun 03 2011
On 2011-06-06 06:37, Steven Schveighoffer wrote:On Fri, 03 Jun 2011 17:30:54 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:Jonathan M Davis wrote:On 2011-06-03 14:08, Timon Gehr wrote:Andrei Alexandrescu wrote:I noticed that the C++ code uses std::list without there being any
needfor a linked list structure. See for example the data structure
used inFindSet. It's a list, but it's just appended too and then used for
oneiteration. Andrei
Yes, but the list in FindSet is unnecessary anyways. If I start
changingthe original implementation, the first thing I will do is to remove
that.First however, I will port the code as closely as possible. Is there
anyassociative version of RedBlackTree (I realize it could be made
associativequite easily), or should I just use built-in hash maps?
You give RedBlackTree a different predicate if you want to treat it as a map. It defaults to "a < b" with allowDuplicates as false, which makes it a multiset.
This makes it a set, not a multiset. allowDuplicates set to true would make it a multiset.
Bleh. You're right. I guess that I wasn't thinking straight (or was thinking too quickly) on that one. :( - Jonathan M Davis
Jun 06 2011
On 2011-06-03 14:08, Timon Gehr wrote:Andrei Alexandrescu wrote:I noticed that the C++ code uses std::list without there being any need for a linked list structure. See for example the data structure used in FindSet. It's a list, but it's just appended too and then used for one iteration. Andrei
Yes, but the list in FindSet is unnecessary anyways. If I start changing the original implementation, the first thing I will do is to remove that. First however, I will port the code as closely as possible. Is there any associative version of RedBlackTree (I realize it could be made associative quite easily), or should I just use built-in hash maps?
You give RedBlackTree a different predicate if you want to treat it as a map. It defaults to "a < b" with allowDuplicates as false, which makes it a multiset. But If you adjust its template parameters, you can make it a set, map, multimap, or any other type of collection which uses a red-black tree as its data structure. - Jonathan M Davis
Jun 03 2011
On Fri, 03 Jun 2011 17:30:54 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:Jonathan M Davis wrote:On 2011-06-03 14:08, Timon Gehr wrote:Andrei Alexandrescu wrote:I noticed that the C++ code uses std::list without there being any
for a linked list structure. See for example the data structure
FindSet. It's a list, but it's just appended too and then used for
iteration. Andrei
Yes, but the list in FindSet is unnecessary anyways. If I start
the original implementation, the first thing I will do is to remove
First however, I will port the code as closely as possible. Is there
associative version of RedBlackTree (I realize it could be made
quite easily), or should I just use built-in hash maps?
You give RedBlackTree a different predicate if you want to treat it as a map. It defaults to "a < b" with allowDuplicates as false, which makes it a multiset.
This makes it a set, not a multiset. allowDuplicates set to true would make it a multiset.Yes, thats what I had in mind, but I thought it is strange that there is no boilerplate map in std.container.
I think the goal is to eventually have these higher-level types based on the implementation types, but I'm uncertain how they will look or where they will go. Almost certainly, there will be a template based on a set-like template for defining a map (e.g. map!(RedBlackTree, int, int) ). If you want a cookie-cutter map type that uses the exact same implementation as RedBlackTree, with a custom allocator that helps with performance for long-lasting containers, dcollections provides such a type (TreeMap). If you do end up using it, I recommend the latest trunk, I've made some critical bug fixes (I need to release soon...). -Steve
Jun 06 2011
Timon Gehr:have you already noticed how incredibly stupid some parts of the implementation of cpp are?
Some of those "stupid" things come from the very strict Google C++ coding standards. Even if my C++ experience is not extensive, I find this C++ easy to understand. One part of the code: } // node_pool.size } // Step c } // FindLoops private: MaoCFG *CFG_; // current control flow graph. LoopStructureGraph *lsg_; // loop forest. }; // HavlakLoopFinder The Ada language has a syntax to write those names at the closing ends, and the Ada compiler enforces such names to be always coherent and correct. In C/C++/D unfortunately such names (written as comments) may go out of sync. Despite this risk I add similar comments at the end of functions/classe/structs/loops in my D code too, because the risk of them getting out of sync is lower than the advantages they give me. I have not written an enhancement request yet about this in the D Bugzilla yet, also because I don't know what a good C-style syntax for it could be. Bye, bearophile
Jun 03 2011
bearophile wrote:Timon Gehr:have you already noticed how incredibly stupid some parts of the implementation
Some of those "stupid" things come from the very strict Google C++ coding
sure? Eg. UnionFindNode *FindSet() { typedef std::list<UnionFindNode *> NodeListType; NodeListType nodeList; UnionFindNode *node = this; while (node != node->parent()) { if (node->parent() != node->parent()->parent()) nodeList.push_back(node); node = node->parent(); } // Path Compression, all nodes' parents point to the 1st level parent. NodeListType::iterator iter = nodeList.begin(); NodeListType::iterator end = nodeList.end(); for (; iter != end; ++iter) (*iter)->set_parent(node->parent()); return node; } They create a new linked list of nodes on the way to the root. It is not that that is just basically the same they already had when following the parent pointers. They obviously need a copy. Also, they made sure that node==node->parent(). (node is the a root) But then they call node->parent(). etc... Timon
Jun 03 2011
On 6/3/2011 1:24 PM, bearophile wrote:The Ada language has a syntax to write those names at the closing ends, and the Ada compiler enforces such names to be always coherent and correct. In C/C++/D unfortunately such names (written as comments) may go out of sync.
I never understood the point of such. My editor has a single key "find matching { [ < ( ) > ] }" command, and so I never have a need for such ugly comments. In fact I usually delete such comments when I encounter them.
Jun 04 2011
On 06/04/2011 04:25 AM, Walter Bright wrote:On 6/3/2011 1:24 PM, bearophile wrote:The Ada language has a syntax to write those names at the closing ends, and the Ada compiler enforces such names to be always coherent and correct. In C/C++/D unfortunately such names (written as comments) may go out of sync.
I never understood the point of such. My editor has a single key "find matching { [ < ( ) > ] }" command, and so I never have a need for such ugly comments. In fact I usually delete such comments when I encounter them.
I thought you were big on printed out code reviews and not requiring any editing features from the language?
Jun 04 2011
On 6/4/2011 7:14 AM, Jeff Nowakowski wrote:I thought you were big on printed out code reviews and not requiring any editing features from the language?
I don't find those comments useful in printed code either.
Jun 04 2011
Speaking of SList, for some reason std_slist.html is still distributed with DMD.
Jun 03 2011
On 6/3/11, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:Speaking of SList, for some reason std_slist.html is still distributed with DMD.
http://d.puremagic.com/issues/show_bug.cgi?id=6101
Jun 03 2011
On 2011-06-03 14:30, Timon Gehr wrote:Jonathan M Davis wrote:On 2011-06-03 14:08, Timon Gehr wrote:Andrei Alexandrescu wrote:I noticed that the C++ code uses std::list without there being any need for a linked list structure. See for example the data structure used in FindSet. It's a list, but it's just appended too and then used for one iteration. Andrei
Yes, but the list in FindSet is unnecessary anyways. If I start changing the original implementation, the first thing I will do is to remove that. First however, I will port the code as closely as possible. Is there any associative version of RedBlackTree (I realize it could be made associative quite easily), or should I just use built-in hash maps?
You give RedBlackTree a different predicate if you want to treat it as a map. It defaults to "a < b" with allowDuplicates as false, which makes it a multiset. But If you adjust its template parameters, you can make it a set, map, multimap, or any other type of collection which uses a red-black tree > as its data structure. - Jonathan M Davis
Yes, thats what I had in mind, but I thought it is strange that there is no boilerplate map in std.container.
std.container provides data structures. It's up to you to decide how to use the data structures. It does not attempt to create a wrapper or anything of the sort to give you particular uses for a data structure (such as using RedBlackTree as a map). Whether it should do that sort of thing is up for debate, but its focus is on data structures not on "collections" which are organized by how they're used. - Jonathan M Davis
Jun 03 2011
Andrei:Does anyone have the time and inclination to port the benchmark so we can take a look?
My modified C++0x code: http://codepad.org/iBwINTkJ First D2 translation that seems to work: http://codepad.org/sEEFNlAd Notes on the translation: - The program crashes with zero error messages if you don't increase the stack. Time ago I have written an enhancement request on this: http://d.puremagic.com/issues/show_bug.cgi?id=6088 - The final print of the "." done using write() doesn't perform the fflush. So after few tries I have used printf. I have not found a good way to flush. Suggestions welcome. - The C++ code uses classes, but they are often more like D structs. In the first D version to keep things simple I have used final classes everywhere. I will profile the code and I will create a more efficient version that uses structs where needed. - I have had to track down a null reference error (caused by the nodes array, that contains just references in the first D version). Nonnullable type modifiers avoids similar bugs at compile-time. - The C++ code is 25 KB, the D code (first version is 21 KB), the C++ code uses about 100 MB at runtime, while the D code uses about 160 MB. - The C++ loops to iterate on collections is many times worse than the D foreach. - In the main() of C++ code, there is a LoopStructureGraph lsg that shadows another one, I've replaced it with a lsg2 in both D and C++ code. - In the C++ code I have used a std::unordered_map to implement BasicBlockMap, because it's significantly faster. So now the code is C++0x. - The method names start with uppercase, I'd like to change this in a second D version. - No linked lists or sorted dictionaries seem needed in this program. But I have felt a bit of need of a set data structure in D (I have used an AA with bool values). Despite there are some associative arrays of SimpleLoop and BasicBlock objects, I think their classes don't need a hash function and opEquals. Bye, bearophile
Jun 03 2011
bearophile wrote:My modified C++0x code: First D2 translation that seems to work:
What timings did you get? On my computer, the D version ran slightly faster (56 seconds vs 63s for C++) without optimizations turned on. With optimizations turned on, C++ took a nice lead (28 seconds vs 53 seconds for D). This is similar to other benchmarks I've run in the past. Standard D builds beat standard C++ builds, but gcc's optimizer takes the lead vs dmd's optimizer. However, in the past, I've found writing D style instead of C++ style lets standard D beat even optimized g++. I wonder if that's the case here too..- The final print of the "." done using write() doesn't perform the fflush. So after few tries I have used printf. I have not found a good way to flush. Suggestions welcome.
To flush with D's stdout, simply call stdout.flush: write("."); stdout.flush();
Jun 03 2011
On 6/3/11 6:09 PM, Adam D. Ruppe wrote:bearophile wrote:My modified C++0x code: First D2 translation that seems to work:
What timings did you get? On my computer, the D version ran slightly faster (56 seconds vs 63s for C++) without optimizations turned on. With optimizations turned on, C++ took a nice lead (28 seconds vs 53 seconds for D).
What are your timings for the Java, Scala, and Go versions? Thanks, Andrei
Jun 03 2011
Andrei Alexandrescu wrote:What are your timings for the Java, Scala, and Go versions?
I don't have compilers nor the code for those languages, so I don't know.
Jun 03 2011
On 2011-06-03 17:16, Andrej Mitrovic wrote:I must be doing something wrong. Do I have to pipe some files to the module or something? I've compiled it and ran it, it's done in 1.5 seconds. :s
Or maybe you just have a way better machine? - Jonathan M Davis
Jun 03 2011
On 6/3/11 7:42 PM, Andrej Mitrovic wrote:DMD debug: 50s DMD optimized: 49s (-release -noboundscheck -O -inline -L/STACK:1280000000) That's around 1GB of stack memory. Compiling with GDC will make the app throw an exception at runtime due to the stack being blown, the error message isn't special but it's better than no error message. GDC debug: 1m:24s GDC optimized: 1m:12s (gdc -v2 -O3 -Wl,--stack,1280000000) I haven't tested the other ones. Does Go have a Windows compiler yet?
I think they do. But another good data point would be the C++ timings on the same machine. Andrei
Jun 03 2011
On 6/3/11 8:05 PM, Andrej Mitrovic wrote:I've compiled the C++ example from google, but I had to remove the "-lc" option due to errors (what is this option anyway?). gcc 4.4.0 with -O3: 1:50
Hmm, it would be odd if the D version is more than twice faster than the C++ one. Wonder what's going on there. Andrei
Jun 03 2011
On 06/03/2011 08:47 PM, Andrej Mitrovic wrote:Running Ubuntu v10.10 x32 on Virtualbox with hardware virtualization, GCC v4.4.4 and DMD 2.053 installed. DMD debug: 54s DMD optimized: 47s Google's C++ optimized: 27s (gcc -O3 -lc -lstdc++) Bear's C++ optimized: 24s (g++ -O3 -std=gnu++0x) So it's MinGW's fault apparently.
Interesting. So now we have a run time comparable with C++'s run time in the paper. The implementation by bearophile has 721 lines. Once we have compilation times, binary size, and memory footprint, we have a good comparison point with the study in the paper. Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer. Thanks, Andrei
Jun 03 2011
Andrei:Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.
First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d Second version, with all structs: http://codepad.org/etsLsZV5 Tomorrow I'll de-optimize it a bit replacing some structs with classes. And then I'll create one or two more optimized versions (one using a memory pool for the nodes, and one trying to apply some of the C++ improvement ideas from the original paper). The number of instances allocated: Class instances: SimpleLoop_counter 3_936_102 LoopStructureGraph_counter 15_051 UnionFindNode_counter 13_017_663 HavlakLoopFinder_counter 15_051 BasicBlockEdge_counter 378_036 BasicBlock_counter 252_013 MaoCFG_counter 1 UnionFindNode probably will give some gain if allocated from a pool. Later, bearophile
Jun 03 2011
On 6/4/2011 12:01 AM, Caligo wrote:Gentoo/Linux [gcc version 4.4.5, DMD 2.52, latest GDC with GCC 4.4.5, and latest LDC2] g++ -O3 [VIRT: 185MB, RES: 174MB] real 0m28.407s user 0m28.330s sys 0m0.070s DMD -O -release [VIRT: 94MB, RES: 92MB] real 0m43.232s user 0m42.980s sys 0m0.070s GDC -O3 [VIRT: 306MB, RES: 295MB] real 1m10.788s user 1m10.570s sys 0m0.190s LDC2 segmentation fault
Why not -inline on dmd?
Jun 03 2011
On 06/04/2011 02:10 AM, Caligo wrote:And just to be fair to C++: g++ -O2 -m32 [VIRT: 94MB, RES: 92MB] real 0m24.567s user 0m24.500s sys 0m0.060s
Thanks to all who participated to this. I've shared these results on reddit: http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/ Andrei
Jun 04 2011
Am 04.06.2011 16:37, schrieb Andrei Alexandrescu:On 06/04/2011 02:10 AM, Caligo wrote:And just to be fair to C++: g++ -O2 -m32 [VIRT: 94MB, RES: 92MB] real 0m24.567s user 0m24.500s sys 0m0.060s
Thanks to all who participated to this. I've shared these results on reddit: http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/ Andrei
Could you please give a link to the post. I can't find it.
Jun 04 2011
On 06/04/2011 09:43 AM, Mafi wrote:Am 04.06.2011 16:37, schrieb Andrei Alexandrescu:On 06/04/2011 02:10 AM, Caligo wrote:And just to be fair to C++: g++ -O2 -m32 [VIRT: 94MB, RES: 92MB] real 0m24.567s user 0m24.500s sys 0m0.060s
Thanks to all who participated to this. I've shared these results on reddit: http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/ Andrei
Could you please give a link to the post. I can't find it.
My id is andralex. http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/c1xqj0w Andrei
Jun 04 2011
On 6/4/2011 7:37 AM, Andrei Alexandrescu wrote:Thanks to all who participated to this. I've shared these results on reddit: http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/
Direct link: http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/c1xqj0w
Jun 04 2011
On 6/4/2011 12:01 AM, Caligo wrote:Gentoo/Linux [gcc version 4.4.5, DMD 2.52, latest GDC with GCC 4.4.5, and latest LDC2] g++ -O3 [VIRT: 185MB, RES: 174MB] real 0m28.407s user 0m28.330s sys 0m0.070s DMD -O -release [VIRT: 94MB, RES: 92MB] real 0m43.232s user 0m42.980s sys 0m0.070s GDC -O3 [VIRT: 306MB, RES: 295MB] real 1m10.788s user 1m10.570s sys 0m0.190s LDC2 segmentation fault
Oh, also, you way want to re-try the benchmark w/ 2.053. It looks rather allocation heavy and I substantially improved the GC performance for 2.053.
Jun 03 2011
Andrei:Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.
First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d Second version, with all structs: http://codepad.org/etsLsZV5 Tomorrow I'll de-optimize it a bit replacing some structs with classes. And >
the nodes, and one trying to apply some of the C++ improvement ideas > from the original paper).The number of instances allocated: Class instances: SimpleLoop_counter 3_936_102 LoopStructureGraph_counter 15_051 UnionFindNode_counter 13_017_663 HavlakLoopFinder_counter 15_051 BasicBlockEdge_counter 378_036 BasicBlock_counter 252_013 MaoCFG_counter 1 UnionFindNode probably will give some gain if allocated from a pool. Later, bearophile
Your port segfaults DMD 2.053 with the -g flag (at least on linux). Andrei: You may want to point out on reddit that the code is approx. a 1 to 1 port of the C++ code and not specially tuned. Timon
Jun 04 2011
Andrei:Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.
First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d Second version, with all structs: http://codepad.org/etsLsZV5 Tomorrow I'll de-optimize it a bit replacing some structs with classes. And >
the nodes, and one trying to apply some of the C++ improvement ideas > from the original paper).The number of instances allocated: Class instances: SimpleLoop_counter 3_936_102 LoopStructureGraph_counter 15_051 UnionFindNode_counter 13_017_663 HavlakLoopFinder_counter 15_051 BasicBlockEdge_counter 378_036 BasicBlock_counter 252_013 MaoCFG_counter 1 UnionFindNode probably will give some gain if allocated from a pool. Later, bearophile
One simple but very powerful optimization is to minimize the runs of the GC. I have added a call to GC.disable(); in the beginning of main and then added a GC.collect(); after each 10 test runs. Results on my machine (32bit executables): C++ (-O2): 30.7s, ~170MB. D (-release -O -inline): 29.5s, ~520MB Ds GC needs to get faster. A concurrent GC would have hidden away most of the overhead on a multi-core processor ;). Timon
Jun 04 2011
Am 04.06.2011 18:25, schrieb Timon Gehr:Andrei:Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.
First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d Second version, with all structs: http://codepad.org/etsLsZV5 Tomorrow I'll de-optimize it a bit replacing some structs with classes. And >
the nodes, and one trying to apply some of the C++ improvement ideas > from the original paper).The number of instances allocated: Class instances: SimpleLoop_counter 3_936_102 LoopStructureGraph_counter 15_051 UnionFindNode_counter 13_017_663 HavlakLoopFinder_counter 15_051 BasicBlockEdge_counter 378_036 BasicBlock_counter 252_013 MaoCFG_counter 1 UnionFindNode probably will give some gain if allocated from a pool. Later, bearophile
One simple but very powerful optimization is to minimize the runs of the GC. I have added a call to GC.disable(); in the beginning of main and then added a GC.collect(); after each 10 test runs. Results on my machine (32bit executables): C++ (-O2): 30.7s, ~170MB. D (-release -O -inline): 29.5s, ~520MB Ds GC needs to get faster. A concurrent GC would have hidden away most of the overhead on a multi-core processor ;). Timon
What was your time for D without disabling the GC? Probably 40-50s? This certainly is a big improvement, I didn't think the GC slows it down that much. What'd be really interesting is the benchmark with a D-style implementation of the code (if I understood correctly the current versions are more or less direct translations of the C++ code to D). Cheers, - Daniel
Jun 04 2011
Daniel Gibson wrote:What was your time for D without disabling the GC? Probably 40-50s? This certainly is a big improvement, I didn't think the GC slows it down that much. What'd be really interesting is the benchmark with a D-style implementation of the code (if I understood correctly the current versions are more or less direct translations of the C++ code to D). Cheers, - Daniel
Without disabling the GC, it runs through in 38.2s. The D-style implementation would probably be about the same, bearophile has already replaced std::map/std::set with associative arrays and std::list with dynamic arrays. The algorithm cannot take advantage of Eg. array slicing. It solves a graph problem. I will try to tune the code some more. Timon
Jun 04 2011
On 06/04/2011 11:08 AM, Timon Gehr wrote:Andrei:Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.
First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d Second version, with all structs: http://codepad.org/etsLsZV5 Tomorrow I'll de-optimize it a bit replacing some structs with classes. And>
the nodes, and one trying to apply some of the C++ improvement ideas> from the original paper).The number of instances allocated: Class instances: SimpleLoop_counter 3_936_102 LoopStructureGraph_counter 15_051 UnionFindNode_counter 13_017_663 HavlakLoopFinder_counter 15_051 BasicBlockEdge_counter 378_036 BasicBlock_counter 252_013 MaoCFG_counter 1 UnionFindNode probably will give some gain if allocated from a pool. Later, bearophile
Your port segfaults DMD 2.053 with the -g flag (at least on linux). Andrei: You may want to point out on reddit that the code is approx. a 1 to 1 port of the C++ code and not specially tuned. Timon
Better yet, feel free to contribute. The more of us participate, the better. Andrei
Jun 04 2011
On 6/3/2011 4:09 PM, Adam D. Ruppe wrote:This is similar to other benchmarks I've run in the past. Standard D builds beat standard C++ builds, but gcc's optimizer takes the lead vs dmd's optimizer.
It would be nice to figure out what is different. Try using the coverage analyzer and profiler for starters!
Jun 04 2011
Walter:It would be nice to figure out what is different. Try using the coverage analyzer and profiler for starters!
There are little differences and inefficiencies here and there, but in the second D version I think most of the performance difference over the C++ code is caused by the GC. I will do some tests. Bye, bearophile
Jun 04 2011
On 6/4/2011 9:15 AM, bearophile wrote:Walter:It would be nice to figure out what is different. Try using the coverage analyzer and profiler for starters!
There are little differences and inefficiencies here and there, but in the second D version I think most of the performance difference over the C++ code is caused by the GC. I will do some tests. Bye, bearophile
That's probably right, for two reasons: 1. Other than the GC, D doesn't have any "hidden cost" features that would explain it being slower than C++ for similarly written code. 2. A few posts back, it was noted that DMD2.053, which includes my GC optimizations, was substantially faster than 2.052, which doesn't.
Jun 04 2011
Am 04.06.2011 16:54, schrieb dsimcha:On 6/4/2011 9:15 AM, bearophile wrote:Walter:It would be nice to figure out what is different. Try using the coverage analyzer and profiler for starters!
There are little differences and inefficiencies here and there, but in the second D version I think most of the performance difference over the C++ code is caused by the GC. I will do some tests. Bye, bearophile
That's probably right, for two reasons: 1. Other than the GC, D doesn't have any "hidden cost" features that would explain it being slower than C++ for similarly written code.
Besides better optimization by the compiler, see Adam Ruppe's post, 3 posts up:On my computer, the D version ran slightly faster (56 seconds vs 63s >
With optimizations turned on, C++ took a nice lead (28 seconds vs 53 seconds for D).
So it seems like it's not all the GCs fault.2. A few posts back, it was noted that DMD2.053, which includes my GC optimizations, was substantially faster than 2.052, which doesn't.
Jun 04 2011
On 6/4/2011 6:15 AM, bearophile wrote:There are little differences and inefficiencies here and there, but in the second D version I think most of the performance difference over the C++ code is caused by the GC. I will do some tests.
Easy to test, simply disable the gc.
Jun 04 2011
On 6/3/11 5:48 PM, bearophile wrote:Andrei:Does anyone have the time and inclination to port the benchmark so we can take a look?
My modified C++0x code: http://codepad.org/iBwINTkJ First D2 translation that seems to work: http://codepad.org/sEEFNlAd Notes on the translation: - The program crashes with zero error messages if you don't increase the stack. Time ago I have written an enhancement request on this: http://d.puremagic.com/issues/show_bug.cgi?id=6088
I think a better tail call optimization would be in order.- The final print of the "." done using write() doesn't perform the fflush. So after few tries I have used printf. I have not found a good way to flush. Suggestions welcome.
Just call stdout.flush().- The C++ code uses classes, but they are often more like D structs. In the first D version to keep things simple I have used final classes everywhere. I will profile the code and I will create a more efficient version that uses structs where needed.
Makes sense.- I have had to track down a null reference error (caused by the nodes array, that contains just references in the first D version). Nonnullable type modifiers avoids similar bugs at compile-time.
Stay tuned. Good news coming soon. [snip] This is solid work. Looking forward to seeing measurements! Thanks, Andrei
Jun 03 2011
I must be doing something wrong. Do I have to pipe some files to the module or something? I've compiled it and ran it, it's done in 1.5 seconds. :s
Jun 03 2011
Oh I think it's that stack overflow thing bear was talking about? Because it quits after this with no message: Welcome to LoopTesterApp, D edition Constructing App... Constructing Simple CFG... 15000 dummy loops Constructing CFG... Performing Loop Recognition 1 Iteration
Jun 03 2011
Okay, here's how to increase stack size via Optlink: dmd benchmark.d -L/STACK:128000000
Jun 03 2011
Andrej Mitrovic:Okay, here's how to increase stack size via Optlink: dmd benchmark.d -L/STACK:128000000
Keep in mind that sets the max stack size, not the max heap size. -L/STACK:5000000 is plenty :-) Bye, bearophile
Jun 03 2011
DMD debug: 50s DMD optimized: 49s (-release -noboundscheck -O -inline -L/STACK:1280000000) That's around 1GB of stack memory. Compiling with GDC will make the app throw an exception at runtime due to the stack being blown, the error message isn't special but it's better than no error message. GDC debug: 1m:24s GDC optimized: 1m:12s (gdc -v2 -O3 -Wl,--stack,1280000000) I haven't tested the other ones. Does Go have a Windows compiler yet?
Jun 03 2011
On 6/4/11, bearophile <bearophileHUGS lycos.com> wrote:Andrej Mitrovic:Okay, here's how to increase stack size via Optlink: dmd benchmark.d -L/STACK:128000000
Keep in mind that sets the max stack size, not the max heap size. -L/STACK:5000000 is plenty :-) Bye, bearophile
Is that in bytes or kilobytes? Well, I guess Optlink/LD will automatically limit the stack size to something sane if I've blown it out of proportion. The app ends up using about 130megs.
Jun 03 2011
Btw for those wondering, the paper's code is at http://code.google.com/p/multi-language-bench/
Jun 03 2011
I've compiled the C++ example from google, but I had to remove the "-lc" option due to errors (what is this option anyway?). gcc 4.4.0 with -O3: 1:50
Jun 03 2011
SORRY, I meant gcc 4.5.2 not 4.4.0.
Jun 03 2011
I can't compile bear's C++0x example, I get a ton of undefined reference errors. I've tried with: gcc -O3 -std=gnu++0x raw.cpp Maybe he's using gcc 4.6 or I'm missing some flags.
Jun 03 2011
It seems I should have used g++: g++ -O3 -std=gnu++0x -Wl,--stack,1280000000 raw.cpp time: 2m:12s
Jun 03 2011
I'll give a shot at compiling on Linux, maybe this is all MinGW's fault.
Jun 03 2011
What do I use to accurately time an executable under Linux?
Jun 03 2011
Running Ubuntu v10.10 x32 on Virtualbox with hardware virtualization, GCC v4.4.4 and DMD 2.053 installed. DMD debug: 54s DMD optimized: 47s Google's C++ optimized: 27s (gcc -O3 -lc -lstdc++) Bear's C++ optimized: 24s (g++ -O3 -std=gnu++0x) So it's MinGW's fault apparently.
Jun 03 2011
On 6/4/11, bearophile <bearophileHUGS lycos.com> wrote:Second version, with all structs: http://codepad.org/etsLsZV5
38secs. Cut down by 10 secs from last time.
Jun 03 2011
Gentoo/Linux [gcc version 4.4.5, DMD 2.52, latest GDC with GCC 4.4.5, and latest LDC2] g++ -O3 [VIRT: 185MB, RES: 174MB] real 0m28.407s user 0m28.330s sys 0m0.070s DMD -O -release [VIRT: 94MB, RES: 92MB] real 0m43.232s user 0m42.980s sys 0m0.070s GDC -O3 [VIRT: 306MB, RES: 295MB] real 1m10.788s user 1m10.570s sys 0m0.190s LDC2 segmentation fault
Jun 03 2011
On Fri, Jun 3, 2011 at 11:16 PM, dsimcha <dsimcha yahoo.com> wrote:On 6/4/2011 12:01 AM, Caligo wrote:Gentoo/Linux [gcc version 4.4.5, DMD 2.52, latest GDC with GCC 4.4.5, and latest LDC2] g++ -O3 [VIRT: 185MB, =A0RES: 174MB] real =A0 =A00m28.407s user =A0 =A00m28.330s sys =A0 =A0 0m0.070s DMD -O -release [VIRT: 94MB, =A0RES: 92MB] real =A0 =A00m43.232s user =A0 =A00m42.980s sys =A0 =A0 0m0.070s GDC -O3 [VIRT: 306MB, =A0RES: 295MB] real =A0 =A01m10.788s user =A0 =A01m10.570s sys =A0 =A0 0m0.190s LDC2 segmentation fault
Why not -inline on dmd?
I don't like the '-inline' option, but here it is. Besides, I usually use GDC or LDC2 and I was expecting them to outperform DMD because they usually do, but not in this case. DMD-32bit v2.52 -O -release -inline [VIRT: 94MB, RES: 92MB] real 0m42.490s user 0m42.480s sys 0m0.000s DMD-32bit v2.53 -O -release -inline [VIRT: 107MB, RES: 104MB] real 0m34.011s user 0m33.930s sys 0m0.070s DMD-64bit v2.53 -O -release -inline segmentation fault DMD-64bit v2.53 -O -release [VIRT: 232MB, RES: 219MB] real 0m44.715s user 0m44.580s sys 0m0.080s P.S. It's a 64-bit system.
Jun 04 2011
And just to be fair to C++: g++ -O2 -m32 [VIRT: 94MB, RES: 92MB] real 0m24.567s user 0m24.500s sys 0m0.060s
Jun 04 2011
On 4 June 2011 08:03, Caligo <iteronvexor gmail.com> wrote:On Fri, Jun 3, 2011 at 11:16 PM, dsimcha <dsimcha yahoo.com> wrote:On 6/4/2011 12:01 AM, Caligo wrote:Gentoo/Linux [gcc version 4.4.5, DMD 2.52, latest GDC with GCC 4.4.5, and latest LDC2] g++ -O3 [VIRT: 185MB, =A0RES: 174MB] real =A0 =A00m28.407s user =A0 =A00m28.330s sys =A0 =A0 0m0.070s DMD -O -release [VIRT: 94MB, =A0RES: 92MB] real =A0 =A00m43.232s user =A0 =A00m42.980s sys =A0 =A0 0m0.070s GDC -O3 [VIRT: 306MB, =A0RES: 295MB] real =A0 =A01m10.788s user =A0 =A01m10.570s sys =A0 =A0 0m0.190s LDC2 segmentation fault
Why not -inline on dmd?
I don't like the '-inline' option, but here it is. =A0Besides, I usually use GDC or LDC2 and I was expecting them to outperform DMD because they usually do, but not in this case.
Well, I'm not the least bit surprised by DMD outperforming GDC here. Mostly because the program produced by GDC will have 109+ extra array bounds checks, and 300+ extra assertions (hint: the program produced by DMD will have 0). --=20 Iain Buclaw *(p < e ? p++ : p) =3D (c & 0x0f) + '0';
Jun 04 2011
Here's another run on win32 for bear's struct version: DMD optimized, GC enabled: 38s.015 DMD optimized, GC disabled: 30s.890
Jun 04 2011
AFAIK google also messed with the GC in their paper, so I'd say this is fair-game.
Jun 04 2011









"Nick Sabalausky" <a a.a> 