digitalmars.D - DB/DBMS in D

Vladimir A. Reznichenko <kalessil gmail.com> writes:
Dear Mr./Ms.,


I'd like to ask you about the garbage collector.
It slows down an application, doesn't it?

For a DBMS, this is critical. I haven't found any articles or tests
about this.

It would also be great to learn how memory management is implemented in
DMD: fragmentation, allocation, reallocation. And if well-known algorithms are
used there, could they be named?

C/C++ is the classic choice for such projects (DBMS), but the D language
is a great one and the best for me :). I want to find out whether it can be
used for this.


Faithfully yours.
Feb 16 2009
Chris R Miller <lordsauronthegreat gmail.com> writes:
I would argue the opposite: in a long-running process such as an RDBMS you would *want* the garbage collector, to ensure that there are no memory leaks.

You could have either a super-fast database which leaks memory (so your users would have to restart it periodically) OR you could use a garbage collector, take the performance penalty (not that much - quite frankly, complaining about the garbage collector is like complaining that the silverware is gold and not platinum) and have the assurance that your memory leakage will be kept to an absolute minimum (or eliminated entirely, if you remember to properly declare weak references).

Obviously it is possible to use a language like C++ and write code which doesn't leak memory... however, that level of effort isn't going to give you significant increases in performance compared to D. D is just plain fast.
Feb 16 2009
Vladimir A. Reznichenko <kalessil gmail.com> writes:
That's clear. What's not clear to me is the level of memory fragmentation. In C++, memory is deallocated as soon as an object is deleted. With a GC, a deleted object's memory is kept until it is reused. If the GC operates on some range of addresses and places all objects there (like a buffer), we get fragmentation, and the longer the process runs, the harder it is to eliminate. But if the GC stores a collection of pointers to objects located somewhere in memory in no particular order, then of course we can find a deleted object, update it and reuse it; this could be even faster. Which of these two approaches is implemented in the DMD GC?
Feb 16 2009
bearophile <bearophileHUGS lycos.com> writes:
Vladimir A. Reznichenko:
> With a GC, a deleted object's memory is kept until it is reused. If the GC
> operates on some range of addresses and places all objects there (like a
> buffer), we get fragmentation. The longer the process runs, the harder it
> is to eliminate.

When I asked a similar question, people told me that the current D GC allocates memory in blocks whose sizes are powers of two (until they become big enough). One consequence is that saving a few bytes in a struct is often an illusion: if you use 9 bytes, the GC allocator gives you a 16-byte memory block. This happens in associative arrays too, so saving small amounts of memory is sometimes impossible; you have to use the C heap allocator or your own pools, arenas, etc.

I can also suggest that you run some experiments: try to fragment memory and see whether memory use grows or not (and show us the code). Experiments take a bit of time and they can be wrong, but very often they also show you interesting surprises. I have seen such "surprises" very often while doing speed benchmarks.

Bye,
bearophile
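To make the "9 bytes cost 16" point concrete, here is a minimal sketch of power-of-two rounding. The function name `sizeClass` and the `minBlock` parameter are made up for illustration (the 16-byte minimum is an assumption taken from the description later in this thread); this is not the runtime's own code, and it only models small blocks.

```d
import std.stdio : writefln;

/// Rounds a request up to the next power-of-two block size.
/// Hypothetical helper: models the small-block behaviour described
/// in the post, not the actual GC implementation.
size_t sizeClass(size_t requested, size_t minBlock = 16)
{
    size_t block = minBlock;
    while (block < requested)
        block *= 2;
    return block;
}

void main()
{
    size_t[] requests = [1, 9, 16, 17, 100, 2000];
    foreach (n; requests)
    {
        immutable block = sizeClass(n);
        // The difference is memory the program pays for but cannot use.
        writefln("request %4d bytes -> block %4d bytes (%d unused)",
                 n, block, block - n);
    }
}
```

Under this model a 17-byte struct occupies a 32-byte block, so shaving it down to 16 bytes would halve its footprint, while shaving it from 15 to 9 bytes would change nothing.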
Feb 16 2009
grauzone <none example.net> writes:
I suspect that in long-running applications more and more garbage
goes unfreed, because the conservative GC thinks it's still alive.
And that garbage references other garbage, and so on.

Can someone confirm or refute this?
Feb 16 2009
Sean Kelly <sean invisibleduck.org> writes:
Vladimir A. Reznichenko wrote:
> Dear Mr./Ms.,
>
>
> I'd like to ask you about the garbage collector.
> It slows down an application, doesn't it?
>
> For a DBMS, this is critical. I haven't found any articles or tests
> about this.
>
> It would also be great to learn how memory management is implemented in
> DMD: fragmentation, allocation, reallocation. And if well-known algorithms are
> used there, could they be named?

The GC contains a collection of pools, each containing N contiguous pages of memory. Each page can be individually assigned to store blocks of one fixed size, with sizes as powers of two from 16 to 4096 bytes (1 page, on most systems). For allocations beyond 4096 bytes, the minimum necessary number of contiguous pages is used to hold the memory block. All free blocks of a particular size are held in a free list.

When an allocation occurs, the GC first checks the appropriate free list to see if there is a block of the right size available. If not, it looks for an available page in an existing memory pool that can be turned into a page of blocks of the appropriate size. If there is none, a mark/sweep garbage collection cycle occurs. Then the GC looks again for a free block, then for a free page, and if there still aren't any it allocates a new pool from the OS. After garbage collection, the D2 GC checks whether any pools are completely empty and releases these back to the OS.

I'm working from memory, but that's roughly the way the GC works. You can also explicitly delete GC memory via the 'delete' expression, though there's been some contention about whether having a 'delete' operation for GCed memory is actually a good idea.
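The allocation path described above (free list, then free page, then collection, then new pool) can be sketched as a toy model. Everything here is hypothetical: `ToyGC`, `pagesPerPool` and the helper names are invented for illustration, the model only counts blocks and pages rather than managing memory, and `collect` is a stub, so this shows the order of the steps and nothing about the real runtime's internals.

```d
import std.stdio : writeln;

enum size_t pageSize = 4096;  // page size quoted in the post
enum size_t minBlock = 16;    // smallest block size quoted in the post
enum size_t numClasses = 9;   // 16, 32, ..., 4096

/// Toy model of the allocation path. It only counts blocks and pages;
/// no real memory is managed.
struct ToyGC
{
    size_t[numClasses] freeBlocks; // free-list length per size class
    size_t freePages;              // unassigned pages in existing pools
    size_t pools;                  // pools requested from the "OS"
    size_t pagesPerPool = 4;       // hypothetical pool size

    /// Index of the smallest power-of-two class that fits n bytes.
    static size_t classIndex(size_t n)
    {
        size_t idx = 0;
        for (size_t block = minBlock; block < n; block *= 2)
            ++idx;
        return idx;
    }

    /// Returns a label naming the step that satisfied the request.
    string allocate(size_t n)
    {
        assert(n <= pageSize, "multi-page allocations are not modelled");
        immutable idx = classIndex(n);

        if (takeBlock(idx)) return "free list";                   // step 1
        if (carvePage(idx)) { takeBlock(idx); return "free page"; } // step 2
        collect();                                                 // step 3
        if (takeBlock(idx)) return "free list after collection";
        if (carvePage(idx)) { takeBlock(idx); return "free page after collection"; }
        ++pools;                                                   // step 4
        freePages += pagesPerPool;
        carvePage(idx);
        takeBlock(idx);
        return "new pool";
    }

    private bool takeBlock(size_t idx)
    {
        if (freeBlocks[idx] == 0) return false;
        --freeBlocks[idx];
        return true;
    }

    /// Dedicates one free page to a single block size.
    private bool carvePage(size_t idx)
    {
        if (freePages == 0) return false;
        --freePages;
        freeBlocks[idx] += pageSize / (minBlock << idx);
        return true;
    }

    /// Stub: a real mark/sweep pass would return dead blocks to the free lists.
    private void collect() {}
}

void main()
{
    ToyGC gc;
    writeln(gc.allocate(9));   // new pool
    writeln(gc.allocate(9));   // free list
    writeln(gc.allocate(100)); // free page
}
```

The model also hints at the fragmentation question raised earlier: once a page is dedicated to one block size, its free blocks can only serve requests of that size.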
> C/C++ is the classic choice for such projects (DBMS), but the D language
> is a great one and the best for me :). I want to find out whether it can be
> used for this.

D is just as fast as C/C++. For a DBMS, you'll mostly want to be aware of the potential memory overhead of the GC allocation scheme and the cost of the "stop the world" collection cycles. If you decide you don't want to use the GC at all, you can also call C's malloc and free. Only the built-in associative arrays (AA) and some string operations (concatenation, etc.) allocate GC memory behind the scenes. The rest comes from explicit 'new' calls that you make in your own code.
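As a rough illustration of the malloc/free route, here is a minimal sketch that keeps an array of records on the C heap. The `Row` struct and the `allocRows`/`freeRows` helpers are hypothetical names invented for this example; `malloc` and `free` are the C functions exposed through `core.stdc.stdlib` in current D runtimes.

```d
import core.stdc.stdlib : malloc, free;
import std.stdio : writeln;

/// Hypothetical record type, for illustration only.
struct Row
{
    int id;
    double value;
}

/// Allocates `count` rows on the C heap. The GC neither scans nor
/// frees this memory, so the caller must release it with freeRows.
Row[] allocRows(size_t count)
{
    auto p = cast(Row*) malloc(count * Row.sizeof);
    if (p is null)
        return null;
    auto rows = p[0 .. count]; // slice over the malloc'd block, no copy
    rows[] = Row.init;         // malloc returns uninitialized memory
    return rows;
}

void freeRows(Row[] rows)
{
    free(rows.ptr);
}

void main()
{
    auto rows = allocRows(3);
    if (rows.ptr is null)
        return;
    scope (exit) freeRows(rows); // released when main exits

    rows[1] = Row(42, 1.5);
    writeln(rows[1].id); // 42
}
```

Two caveats with this approach: appending to such a slice with `~=` would reallocate on the GC heap, and if the malloc'd block ever holds pointers into GC memory, the collector has to be told about it (for example via `GC.addRange` in `core.memory`).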
Feb 16 2009