digitalmars.D - Re: Port a benchmark to D?
- bearophile <bearophileHUGS lycos.com> Jun 04 2011
- bearophile <bearophileHUGS lycos.com> Jun 04 2011
UnionFindNode struct instances probably doesn't need a memory pool, it's already allocated in (small) arrays. I have not found a simple enough way to allocate SimpleLoop struct instances with a memory pool. I have not found simple ways to use most of the C++ optimization hints of the original paper. The paper is badly written and its contents are worse. I think they have written this as an internal benchmark and only later have created a paper. I think they have used an old CPU because their PC farms often use similar old CPUs. I have created a D version a bit de-optimized, that uses classes instead of structs for the classes that are instantiated only few times. Generally the original C++ code is not good: it's low level where it doesn't need to be low level, and it wastes memory and time where it matters (like in tiny data structures with lot of overhead.) I do not generally write code like that. I don't have desire to work further on this code. I'd like a Set with a very handy API (copied from Python) in Phobos. I'd like the built-in associative arrays to have an optimization for very small number of items. Python dicts have an optimization for 8 items or less. I have done no attempts to tune the GC. I confirm that if you disable the GC and perform a collection only once in a while, the code gets faster. Regarding the improvements of the D GC, I remember I have found an optimization for LDC1 lot of time ago, but I don't remember it well. I think the GC scans the static memory too, etc, this slows down the GC. Google seems to care a lot about compilation times too. The D code compiles in about 1 second or less with full optimization. The C++0x code compiled with G++ doesn't come even close to that. D code compiled with DMD 2.053 with: dmd -O -release -inline -w -L/STACK:5000000 Using -g doesn't crash my code on Windows, so I suggest someone to minimize the code to find this bug. I also suggest someone to minimize the code for the 64 version to find the 64 bit DMD bug. C++0x code compiled with G++ 4.6 with: g++ -std=c++0x -Ofast -Wall -Wextra -s With LDC and GDC I suggest to look much better for the correct switches. Othrerwise this benchmark becomes even more meaningless (it's already not so meaningful. Google has not released the full C++ code). I have created a D version that uses a TinySet, following one of the C++ optimization hints, one of the D versions use it, but I don't currently have a good PC to actually take the timings: import std.typetuple: TypeTuple; extern(C) pure nothrow void exit (in int status); template Range(int stop) { static if (stop <= 0) alias TypeTuple!() Range; else alias TypeTuple!(Range!(stop-1), stop-1) Range; } struct TinySet(T, int maxLength) { T[maxLength] data; size_t length = 0; // readonly property void add(T x) { /*static*/ foreach (i; Range!maxLength) if (x == data[i]) return; if (length < maxLength) { data[this.length] = x; length++; } else exit(1); } T[] opSlice() { return data[0 .. this.length]; } } C++0x code a bit modified, uses unordered_map instead of silly map: http://codepad.org/9NEL9yPo D code with structs and classes (no manual deallocations), less ugly, probably this is the version to show to people: http://codepad.org/Xqg0Y26S Experimental D code with TinySet, with structs and classes (no manual deallocations): http://codepad.org/4ATOvFsV D code with only structs (no manual deallocations): http://codepad.org/SePFDAaT D code with only classes: http://codepad.org/k0DUzP3D Bye, bearophile
Jun 04 2011
void add(T x) { /*static*/ foreach (i; Range!maxLength) if (x == data[i]) return;
Sorry, this code is wrong, it has to iterate from 0 to length. If some code doesn't have unittests then it's broken. Bye, bearophile
Jun 04 2011