digitalmars.D - A possible GC optimization
Among the new things done for Python 2.7 (currently in alpha) they have
improved the GC:
http://bugs.python.org/issue4074
http://bugs.python.org/issue4688
Essentially, the cyclic GC stops tracking tuples (and dicts) whose items are all
atomic, untrackable objects, because such containers can never be part of a
reference cycle, so later collections don't have to re-scan them.
There they show a small Python program named "tuple_gc_hell.py"; this is a
reduced version:
import time, gc
def list_of_tuples(n):
    m = n // 10
    l = []
    prev_time = time.time()
    for i in xrange(n):
        if i % m == 0:
            cur_time = time.time()
            print i, round(cur_time - prev_time, 2)
            prev_time = cur_time
        l.append((i % 2, i % 12))
#gc.disable()
#import psyco; psyco.full()
list_of_tuples(10 * 1000 * 1000)
Its output:
0 0.0
1000000 1.2
2000000 2.58
3000000 3.98
4000000 4.89
5000000 6.64
6000000 8.05
7000000 9.44
8000000 9.84
9000000 12.2
Total running time: 73.3 s
Disabling the GC and using the Psyco JIT compiler:
0 0.0
1000000 0.17
2000000 0.19
3000000 0.17
4000000 0.19
5000000 0.16
6000000 0.17
7000000 0.17
8000000 0.17
9000000 0.19
Total running time: 2.67 s
Despite Python being a very different language with a very different GC
(reference counting plus a cycle-detecting collector), I have tried a similar
D program (compiled with DMD v2.029):
import std.c.time: clock; // where's CLOCKS_PER_SEC?
import std.stdio: writeln;
//import std.gc: disable;
//import core.memory: disable; // where's the disable?
void array_of_pairs(int n) {
    class S {
        int x, y;
        this(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }
    int m = n / 10;
    S[] l;
    auto prev_time = clock();
    for (int i; i < n; i++) {
        if (i % m == 0) {
            auto cur_time = clock();
            writeln(i, " ", (cur_time - prev_time) / 1000.0);
            prev_time = cur_time;
        }
        l ~= new S(i % 2, i % 12);
    }
}
void main() {
    //disable();
    array_of_pairs(10_000_000);
}
Its output is not good:
0 0
1000000 1.594
2000000 3.563
3000000 6.156
4000000 8.5
5000000 9.469
6000000 10.047
7000000 13.453
8000000 18.703
9000000 24.203
Total running time: 122.3 s
In DMD 2.029 array appends are very slow, so in a successive version I have
created the array up-front. I have also seen that pulling the class out of the
array_of_pairs function speeds up the program significantly. To improve the
situation even more I have used a struct instead of a class:
import std.c.time: clock;
import std.stdio: writeln;
struct S {
    int x, y;
    this(int x, int y) {
        this.x = x;
        this.y = y;
    }
}
void array_of_pairs(int n) {
    int m = n / 10;
    auto l = new S*[n];
    auto prev_time = clock();
    for (int i; i < n; i++) {
        if (i % m == 0) {
            auto cur_time = clock();
            writeln(i, " ", (cur_time - prev_time) / 1000.0);
            prev_time = cur_time;
        }
        l[i] = new S(i % 2, i % 12);
    }
}
void main() {
    //disable();
    array_of_pairs(10_000_000);
}
Its output:
0 0
1000000 0.266
2000000 0.625
3000000 0.547
4000000 0.641
5000000 0.765
6000000 0.891
7000000 1.031
8000000 1.172
9000000 1.328
Total running time: 9.46 s
Now it's much faster than Python (with its GC enabled and without the JIT),
but the D GC still shows trouble.
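For reference, here is a minimal sketch of how the collections could be paused
around the allocation loop in D2, assuming core.memory's GC.disable/GC.enable
(they are members of the GC struct, which might be what the commented-out
disable import above was looking for). This is not the code used for the
timings above:
import core.memory : GC;
struct S { int x, y; }
// Pause automatic collections while filling the array, then let the
// collector run normally again afterwards.
S*[] fillWithoutCollections(int n) {
    auto l = new S*[n];
    GC.disable();                // stop collections triggered by allocations
    scope(exit) GC.enable();     // restore normal behaviour on exit
    foreach (int i; 0 .. n) {
        l[i] = new S;
        l[i].x = i % 2;
        l[i].y = i % 12;
    }
    return l;
}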
Just to be sure it's not a D2 problem I have done the following test with DMD
v1.042 and Phobos1:
import std.c.time: clock, CLOCKS_PER_SEC;
import std.stdio: writefln;
import std.gc: disable;
struct S { int x, y; }
void list_of_tuples(int n) {
    int m = n / 10;
    auto l = new S*[n];
    auto prev_time = clock();
    for (int i; i < n; i++) {
        if (i % m == 0) {
            auto cur_time = clock();
            writefln(i, " ", (cur_time - prev_time) / cast(float)CLOCKS_PER_SEC);
            prev_time = cur_time;
        }
        l[i] = new S;
        l[i].x = i % 2;
        l[i].y = i % 12;
    }
}
void main() {
    disable();
    list_of_tuples(10_000_000);
}
Its output:
0 0
1000000 0.297
2000000 0.344
3000000 0.359
4000000 0.391
5000000 0.421
6000000 0.438
7000000 0.484
8000000 0.5
9000000 0.532
Total running time: 4.96 s
About twice as fast.
Now that Phobos 2 has Tuple people will use it, and many such tuples will not
contain references, so I think an optimization similar to the one done for
Python 2.7 may become useful.
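To show the kind of thing I mean, here is a rough sketch (just an illustration,
not a patch; it assumes druntime's GC.malloc and BlkAttr.NO_SCAN from
core.memory, and the helper name is mine). A pair of ints contains no
references, so its memory block can be flagged so that the collector never
scans it for pointers; I don't know how much of this the current runtime and
Phobos already do for Tuple:
import core.memory : GC;
struct Pair { int x, y; }
// Hypothetical helper: allocate a reference-free struct in a block
// flagged NO_SCAN, so the collector skips it when looking for pointers.
Pair* newNoScan(int x, int y) {
    auto p = cast(Pair*) GC.malloc(Pair.sizeof, GC.BlkAttr.NO_SCAN);
    p.x = x;
    p.y = y;
    return p;
}
void main() {
    auto l = new Pair*[1_000_000];
    foreach (int i; 0 .. 1_000_000)
        l[i] = newNoScan(i % 2, i % 12);
}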
Bye,
bearophile