www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Re: std.algorithm issue

reply bearophile <bearophileHUGS lycos.com> writes:
Dee Girl:
 I did not install bearophile library. It would be interesting to see what the
speed is of std.algorithm and d.func.

I am not using D 2.x, but I have performed some benchmarks, with quite interesting results (DMD 1.029 because my libs don't work at all with DMD 1.030, on a Core Duo, using 1 core only): import std.stdio: put = writef, putr = writefln; import d.func: map, range; import d.time: clock; import std.traits: ReturnType; import d.extra: NewVoidCArray, delVoidC, NewVoidGCArray; import std.c.stdlib: alloca; static int inv(int x) { return -x; } ReturnType!(f)[] Map2(alias f)(int[] arr) { auto result = new ReturnType!(f)[arr.length]; foreach (i, el; arr) result[i] = f(el); return result; } ReturnType!(f)[] Map3(alias f)(int[] arr) { auto result = NewVoidGCArray!(int)(arr.length); foreach (i, el; arr) result[i] = f(el); return result; } void Map4(alias f)(int[] arr) { auto result = (cast(int*)alloca(arr.length * int.sizeof))[0 .. arr.length]; if (result.ptr is null) throw new Exception("Not enough free stack"); foreach (i, el; arr) result[i] = f(el); return result; } void main() { auto data = range(6_000_000); double t; typeof(data) r; t = clock(); for (uint i; i < 100U; ++i) { r = new int[data.length]; foreach (pos, el; data) r[pos] = -el; } putr("inline: ", clock() - t, " ", r[$-1]); t = clock(); for (uint i; i < 100U; ++i) r = map(&inv, data); putr("map global func: ", clock() - t, " ", r[$-1]); t = clock(); for (uint i; i < 100U; ++i) r = map((int x) {return -x;}, data); putr("map locally defined delegate: ", clock() - t, " ", r[$-1]); t = clock(); for (uint i; i < 100U; ++i) r = map(function(int x) {return -x;}, data); putr("map locally defined function: ", clock() - t, " ", r[$-1]); t = clock(); for (uint i; i < 100U; ++i) r = Map2!(inv)(data); putr("Map2 with global function: ", clock() - t, " ", r[$-1]); t = clock(); for (uint i; i < 100U; ++i) { auto r2 = NewVoidCArray!(int)(data.length); foreach (pos, el; data) r2[pos] = -el; delVoidC(r2); } putr("inline C malloc: ", clock() - t); t = clock(); for (uint i; i < 100U; ++i) r = Map3!(inv)(data); putr("Map3 with global function: ", clock() - t, " ", r[$-1]); t = clock(); for (uint i; i < 100U; ++i) Map4!(inv)(data); putr("Map4 with global function: ", clock() - t); } Timings results, n = 6_000_000 of ints: inline: 4.59 map global func: 4.95 map locally defined delegate: 4.73 map locally defined function: 4.31 Map2 with global function: 3.97 inline C malloc: 2.01 Map3 with global function: 2.53 Map4 with global function: 2.0 I can see two surprises there, one is that putting "function" in the map avoids it to become a delegate, speeding up the code a little. The other bigger surprise is the first version (inlined) being slower than the version "map locally defined function". Bye, bearophile
May 24 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
More versions with more surprises:

    t = clock();
    for (uint i; i < 100U; ++i)
        r = Map2!((int x) {return -x;})(data);
    putr("Map2 with locally defined delegate: ", clock() - t, " ", r[$-1]);

    t = clock();
    for (uint i; i < 100U; ++i)
        r = Map2!(function(int x) {return -x;})(data);
    putr("Map2 with locally defined function: ", clock() - t, " ", r[$-1]);
}

/*

Timings:
    inline: 4.61 -5999999
    map global func: 4.968 -5999999
    map locally defined delegate: 4.782 -5999999
    map locally defined function: 4.328 -5999999
    Map2 with global function: 4 -5999999
    inline C malloc: 2.015
    Map3 with global function: 2.531 -5999999
    Map4 with global function: 1.969
    Map2 with locally defined delegate: 4.297 -5999999
    Map2 with locally defined function: 4.156 -5999999
May 24 2008
parent Dee Girl <deegirl noreply.com> writes:
bearophile Wrote:

 More versions with more surprises:
 
     t = clock();
     for (uint i; i < 100U; ++i)
         r = Map2!((int x) {return -x;})(data);
     putr("Map2 with locally defined delegate: ", clock() - t, " ", r[$-1]);
 
     t = clock();
     for (uint i; i < 100U; ++i)
         r = Map2!(function(int x) {return -x;})(data);
     putr("Map2 with locally defined function: ", clock() - t, " ", r[$-1]);
 }
 
 /*
 
 Timings:
     inline: 4.61 -5999999
     map global func: 4.968 -5999999
     map locally defined delegate: 4.782 -5999999
     map locally defined function: 4.328 -5999999
     Map2 with global function: 4 -5999999
     inline C malloc: 2.015
     Map3 with global function: 2.531 -5999999
     Map4 with global function: 1.969
     Map2 with locally defined delegate: 4.297 -5999999
     Map2 with locally defined function: 4.156 -5999999

I think you want to measure function call overhead. Then do not put dynamic allocation in function. Otherwise you measure allocate speed. That is why results are odd. Maybe if you change order results will change. Would help to put static allocation inside function, never dynamic. Then you measure loop and call overhead. Also tests should be in separate runs of program and averaged over few runs. Dee Girl
May 24 2008