www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - why allocation of large amount of small objects so slow (x10) in D?

reply nobody <no where.com> writes:
$ g++     alloc.cpp   -o alloc
$ time ./alloc
real    0m1.946s
user    0m1.688s
sys     0m0.256s

$ dmd -O -release allocd.d
$ time ./allocd
real    0m22.734s
user    0m22.353s
sys     0m0.360s

$ cat alloc.cpp
#include <vector>

typedef std::vector<int> intvec;
typedef intvec* intvecp;

int main() {
  int i, n = 20000000;
  intvecp* iva;
  iva = new intvecp[n];
  for (i = n; i-- > 0; ) {
    iva[i] = new intvec();
  }

  return 0;
}

$ cat allocd.d
int main() {
  int i, n = 20000000;
  Object[] oa;
  oa = new Object[n];
  for (i = n; i-- > 0; ) {
    oa[i] = new Object();
  }

  return 0;
}
May 21 2009
next sibling parent Leandro Lucarella <llucax gmail.com> writes:
nobody, el 22 de mayo a las 00:08 me escribiste:
 $ g++     alloc.cpp   -o alloc
 $ time ./alloc
 real    0m1.946s
 user    0m1.688s
 sys     0m0.256s
 
 $ dmd -O -release allocd.d
 $ time ./allocd
 real    0m22.734s
 user    0m22.353s
 sys     0m0.360s
 
 $ cat alloc.cpp
 #include <vector>
 
 typedef std::vector<int> intvec;
 typedef intvec* intvecp;
 
 int main() {
   int i, n = 20000000;
   intvecp* iva;
   iva = new intvecp[n];
   for (i = n; i-- > 0; ) {
     iva[i] = new intvec();
   }
 
   return 0;
 }
 
 $ cat allocd.d
 int main() {
   int i, n = 20000000;
   Object[] oa;
   oa = new Object[n];
   for (i = n; i-- > 0; ) {
     oa[i] = new Object();
   }
 
   return 0;
 }

You mean why GCC C++ compiler is faster than DM D compiler. You are comparing 2 different compiler implementations for 2 different languages. It's really hard to get any conclusion from that. It would be much more meaningfull if you compare g++ vs gdc or dmd vs dmc or ldc vs llvm-g++ (they at least use the same backend). -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------------- Oiganmen ñatos de corazón, es más posible que un potus florezca en primavera a que un ángel pase con una remera. -- Peperino Pómoro
May 21 2009
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from nobody (no where.com)'s article
 $ g++     alloc.cpp   -o alloc
 $ time ./alloc
 real    0m1.946s
 user    0m1.688s
 sys     0m0.256s
 $ dmd -O -release allocd.d
 $ time ./allocd
 real    0m22.734s
 user    0m22.353s
 sys     0m0.360s
 $ cat alloc.cpp
 #include <vector>
 typedef std::vector<int> intvec;
 typedef intvec* intvecp;
 int main() {
   int i, n = 20000000;
   intvecp* iva;
   iva = new intvecp[n];
   for (i = n; i-- > 0; ) {
     iva[i] = new intvec();
   }
   return 0;
 }
 $ cat allocd.d
 int main() {
   int i, n = 20000000;
   Object[] oa;
   oa = new Object[n];
   for (i = n; i-- > 0; ) {
     oa[i] = new Object();
   }
   return 0;
 }

You've probably hit a corner case for the garbage collector. When you allocate 20,000,000 Object instances and hold onto the references, the garbage collector probably runs about a zillion times and never finds anything to delete. If this is a bottleneck, you should temporarily disable the garbage collector in these situations, until you are done with all these allocations, then re-enable it.
May 21 2009
parent reply nobody <no where.com> writes:
 You've probably hit a corner case for the garbage collector.  When you allocate
 20,000,000 Object instances and hold onto the references, the garbage collector
 probably runs about a zillion times and never finds anything to delete.  If
this
 is a bottleneck, you should temporarily disable the garbage collector in these
 situations, until you are done with all these allocations, then re-enable it.

Thanks. Disble gc improve the speed by 5x: $ dmd -O -release allocd.d $ time ./allocd real 0m4.080s user 0m3.644s sys 0m0.420s $ cat allocd.d import core.memory; int main() { core.memory.GC.disable(); int i, n = 20000000; Object[] oa; oa = new Object[n]; for (i = n; i-- > 0; ) { oa[i] = new Object(); } core.memory.GC.enable(); return 0; }
May 21 2009
parent Sean Kelly <sean invisibleduck.org> writes:
nobody wrote:
 You've probably hit a corner case for the garbage collector.  When you allocate
 20,000,000 Object instances and hold onto the references, the garbage collector
 probably runs about a zillion times and never finds anything to delete.  If
this
 is a bottleneck, you should temporarily disable the garbage collector in these
 situations, until you are done with all these allocations, then re-enable it.

Thanks. Disble gc improve the speed by 5x: $ dmd -O -release allocd.d $ time ./allocd real 0m4.080s user 0m3.644s sys 0m0.420s $ cat allocd.d import core.memory; int main() { core.memory.GC.disable(); int i, n = 20000000; Object[] oa; oa = new Object[n]; for (i = n; i-- > 0; ) { oa[i] = new Object(); } core.memory.GC.enable(); return 0; }

Try: import core.memory; int main() { core.memory.GC.reserve(80000000); core.memory.GC.disable(); int i, n = 20000000; Object[] oa; oa = new Object[n]; for (i = n; i-- > 0; ) { oa[i] = new Object(); } core.memory.GC.enable(); return 0; } If the call to reserve fails you could try breaking it up into two calls for half the space each.
May 21 2009
prev sibling parent reply Robert Fraser <fraserofthenight gmail.com> writes:
nobody wrote:
 $ g++     alloc.cpp   -o alloc
 $ time ./alloc
 real    0m1.946s
 user    0m1.688s
 sys     0m0.256s
 
 $ dmd -O -release allocd.d
 $ time ./allocd
 real    0m22.734s
 user    0m22.353s
 sys     0m0.360s
 
 $ cat alloc.cpp
 #include <vector>
 
 typedef std::vector<int> intvec;
 typedef intvec* intvecp;
 
 int main() {
   int i, n = 20000000;
   intvecp* iva;
   iva = new intvecp[n];
   for (i = n; i-- > 0; ) {
     iva[i] = new intvec();
   }
 
   return 0;
 }
 
 $ cat allocd.d
 int main() {
   int i, n = 20000000;
   Object[] oa;
   oa = new Object[n];
   for (i = n; i-- > 0; ) {
     oa[i] = new Object();
   }
 
   return 0;
 }

I use this a structure for arena-based memory allocation (attached). Example of use: import candy.util.MemPool MemStack!() stack; class MyObject { mixin MemPoolNew!(stack); } int main() { stack.push(); int i, n = 20000000; MyObject[] oa; oa = new MyObject[n]; for (i = n; i-- > 0; ) { oa[i] = new MyObject(); } stack.pop(); return 0; } The push() and pop() allows memory to be allocated and deallocated as large blocks. However, you shouldn't need to deallocate manually -- it's GCed memory, so ideally the GC should free it when it's no longer referenced. That being said, I've run some tests, and the GC will free it *eventually*, but it allocates 4-6x as much memory as it needs before it starts freeing it, even when GC.collect() is called manually. -------------------- /** * Provides a pool of GCed memory to allocate things from a block. * This maintains cache coherency for related types (i.e. tree nodes). * It doesn't garuntee any ordering, though, the array struct should be * used for that. Also, everything has to be freed at once, freeing one * portion of this has no effect. * * Based on a similar concept posted by bearophile at: * http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=88227 */ public struct MemPool(size_t BLOCK_SIZE = 1 << 14) { private void* next; // Next available block private void* end; // End of the current block private void*[] blocks; public void* alloc(size_t sz) { sz = ((sz + 7) & ~7); // Next multiple of 8 if this isn't a multiple of 8 if (this.next + sz >= this.end) { void* blk = GC.calloc(BLOCK_SIZE); this.blocks.length = this.blocks.length + 1; this.blocks[$ - 1] = blk; this.next = blk; this.end = blk + BLOCK_SIZE; } void* ret = this.next; this.next += sz; return ret; } public void free() { foreach(blk; this.blocks) GC.free(blk); this.blocks = null; this.blocks.length = 0; this.next = null; this.end = null; } } /** * Wrapper for MemPool that allocates the given struct */ public struct StructPool(T) { private MemPool!() pool; public T* alloc() { return cast(T*) pool.alloc(T.sizeof); } } public struct MemStack(size_t BLOCK_SIZE = 1 << 14) { private Stack!(MemPool!(BLOCK_SIZE)*, 16, true, true) stack; public static const size_t MAX_ALLOC = BLOCK_SIZE; public void* alloc(size_t sz) { return stack.peek().alloc(sz); } public void push() { stack.push(new MemPool!(BLOCK_SIZE)); } public void pop() { stack.pop().free(); } } /** * Placement new mixin for allocating from a memory pool. Benchmarks show this * as faster than the D new in real usage (i.e. the parser runs about 1.2x * faster using this). */ public template MemPoolNew(alias Pool) { version(NoMemPool) { } else { public final new(uint sz) { return Pool.alloc(sz); } public final delete(void *p) { } } }
May 21 2009
parent Robert Fraser <fraserofthenight gmail.com> writes:
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Robert Fraser wrote:
 nobody wrote:
 $ g++     alloc.cpp   -o alloc
 $ time ./alloc
 real    0m1.946s
 user    0m1.688s
 sys     0m0.256s

 $ dmd -O -release allocd.d
 $ time ./allocd
 real    0m22.734s
 user    0m22.353s
 sys     0m0.360s

 $ cat alloc.cpp
 #include <vector>

 typedef std::vector<int> intvec;
 typedef intvec* intvecp;

 int main() {
   int i, n = 20000000;
   intvecp* iva;
   iva = new intvecp[n];
   for (i = n; i-- > 0; ) {
     iva[i] = new intvec();
   }

   return 0;
 }

 $ cat allocd.d
 int main() {
   int i, n = 20000000;
   Object[] oa;
   oa = new Object[n];
   for (i = n; i-- > 0; ) {
     oa[i] = new Object();
   }

   return 0;
 }

I use this a structure for arena-based memory allocation (attached). Example of use: import candy.util.MemPool MemStack!() stack; class MyObject { mixin MemPoolNew!(stack); } int main() { stack.push(); int i, n = 20000000; MyObject[] oa; oa = new MyObject[n]; for (i = n; i-- > 0; ) { oa[i] = new MyObject(); } stack.pop(); return 0; } The push() and pop() allows memory to be allocated and deallocated as large blocks. However, you shouldn't need to deallocate manually -- it's GCed memory, so ideally the GC should free it when it's no longer referenced. That being said, I've run some tests, and the GC will free it *eventually*, but it allocates 4-6x as much memory as it needs before it starts freeing it, even when GC.collect() is called manually. -------------------- /** * Provides a pool of GCed memory to allocate things from a block. * This maintains cache coherency for related types (i.e. tree nodes). * It doesn't garuntee any ordering, though, the array struct should be * used for that. Also, everything has to be freed at once, freeing one * portion of this has no effect. * * Based on a similar concept posted by bearophile at: * http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmar .D&article_id=88227 */ public struct MemPool(size_t BLOCK_SIZE = 1 << 14) { private void* next; // Next available block private void* end; // End of the current block private void*[] blocks; public void* alloc(size_t sz) { sz = ((sz + 7) & ~7); // Next multiple of 8 if this isn't a multiple of 8 if (this.next + sz >= this.end) { void* blk = GC.calloc(BLOCK_SIZE); this.blocks.length = this.blocks.length + 1; this.blocks[$ - 1] = blk; this.next = blk; this.end = blk + BLOCK_SIZE; } void* ret = this.next; this.next += sz; return ret; } public void free() { foreach(blk; this.blocks) GC.free(blk); this.blocks = null; this.blocks.length = 0; this.next = null; this.end = null; } } /** * Wrapper for MemPool that allocates the given struct */ public struct StructPool(T) { private MemPool!() pool; public T* alloc() { return cast(T*) pool.alloc(T.sizeof); } } public struct MemStack(size_t BLOCK_SIZE = 1 << 14) { private Stack!(MemPool!(BLOCK_SIZE)*, 16, true, true) stack; public static const size_t MAX_ALLOC = BLOCK_SIZE; public void* alloc(size_t sz) { return stack.peek().alloc(sz); } public void push() { stack.push(new MemPool!(BLOCK_SIZE)); } public void pop() { stack.pop().free(); } } /** * Placement new mixin for allocating from a memory pool. Benchmarks show this * as faster than the D new in real usage (i.e. the parser runs about 1.2x * faster using this). */ public template MemPoolNew(alias Pool) { version(NoMemPool) { } else { public final new(uint sz) { return Pool.alloc(sz); } public final delete(void *p) { } } }

Oops, that needs another module. Okay, both are attached in a compile-able form.
May 21 2009