www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - GC question

reply osa1 <omeragacan gmail.com> writes:
Hi all,

I was looking at D as the next language to use in my hobby 
projects, but the
"conservative GC" part in the language spec
(http://dlang.org/spec/garbage.html) looks a bit concerning. I'm 
wondering what
are the implications of the fact that current GC is a Boehm-style 
conservative
GC rather than a precise one, I've never worked with a 
conservative GC before.
Are there any disallowed memory operations? Can I break things by 
not following
some unchecked rules etc. ? How often does it leak? Do I need to 
be careful
with some operations to avoid leaks? Is a precise GC in the 
roadmap? any kind
of comments on the GC would be really appreciated.

Thanks
Jan 31
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote:
 I'm wondering what
 are the implications of the fact that current GC is a 
 Boehm-style conservative
 GC rather than a precise one, I've never worked with a 
 conservative GC before.
The GC isn't competitive with the ones you find in GC languages (Java, Go etc). E.g. Go is now aiming for GC pauses in the microseconds range. Resource management in D is rather lacklustre, even C++ does better imho. D seems to move towards using thread local ref-counting and making GC optional. I guess that would be ok on cpus with few cores, but not really adequate on many core CPUs.
Feb 01
parent reply osa1 <omeragacan gmail.com> writes:
On Wednesday, 1 February 2017 at 09:40:17 UTC, Ola Fosheim 
Grøstad wrote:
 On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote:
 I'm wondering what
 are the implications of the fact that current GC is a 
 Boehm-style conservative
 GC rather than a precise one, I've never worked with a 
 conservative GC before.
The GC isn't competitive with the ones you find in GC languages (Java, Go etc). E.g. Go is now aiming for GC pauses in the microseconds range. Resource management in D is rather lacklustre, even C++ does better imho. D seems to move towards using thread local ref-counting and making GC optional. I guess that would be ok on cpus with few cores, but not really adequate on many core CPUs.
Thanks for the answer. Could you elaborate on the lacklustre part? It's fine if I have to do manual memory management, but I don't want any leaks. Ideally I'd have a precise GC + RAII style resource management when needed.
Feb 01
next sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Wednesday, 1 February 2017 at 09:50:42 UTC, osa1 wrote:
 Thanks for the answer. Could you elaborate on the lacklustre 
 part? It's fine if I have to do manual memory management, but I 
 don't want any leaks. Ideally I'd have a precise GC + RAII 
 style resource management when needed.
Rust, Go, Java, Swift etc have a single memory management scheme which is used by libraries and mostly enforced by the compiler. In C++ you tend to go with unique ownership and occasional shared ownership with the ability to have weak pointers and swap out objects without updating pointer, and there is an easy transition form old ad-hoc ownership to shared (the reference counter is in a separate object). It is not enforced by the compiler, but C++ is moving towards having dedicated tools for checking correctness. In D the goal is to have safety enforced by the compiler, but it isn't quite there yet and what is on the map for leak free resource management seems a simple reference counting mechanism (simpler than swift?) with refcount embedded in objects (like intrusive_ptr in Boost except the compiler is intended to be better at optimizing unnecessary updates of the reference count).
Feb 01
prev sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote:
 Are there any disallowed memory operations?
Currently can't touch GC from destructor during collection. Another concern is interoperability with C-allocated memory: GC knows nothing about C heap.
 How often does it leak?
Leaks are likely in 32-bit processes and unlikely in 64-bit processes. See e.g. https://issues.dlang.org/show_bug.cgi?id=15723
 Do I need to be careful with some operations to avoid leaks?
Leaks happen only due to false pointers. But data allocated in GC with new operator and known to have no pointers (e.g. strings) is not scanned. Premature collection happen when GC doesn't see a pointer to the allocated data, happens when such pointer is put in a memory GC doesn't see, like C heap.
 Is a precise GC in the roadmap?
There's an effort on it: https://forum.dlang.org/post/hdwwkzqswwtffjehenjt forum.dlang.org
 It's fine if I have to do manual memory management, but I don't 
 want any leaks.
If you manually deallocate memory, it gets deallocated for sure, shouldn't leak. Comparing to java, D GC trades GC performance for code execution performance, which can result in better overall performance when you don't allocate much and worse performance for allocation-heavy code that java GC is optimized for.
Feb 03
parent reply osa1 <omeragacan gmail.com> writes:
On Friday, 3 February 2017 at 10:49:00 UTC, Kagamin wrote:
 Leaks are likely in 32-bit processes and unlikely in 64-bit 
 processes. See e.g. 
 https://issues.dlang.org/show_bug.cgi?id=15723
This looks pretty bad. I think I'll consider something else until D's memory management story gets better. This is sad because the language otherwise looks quite good, and I'd love to try assertions, contracts, scope guards, macros etc.
Feb 03
parent Dsby <dushibaiyu yahoo.com> writes:
On Friday, 3 February 2017 at 11:36:26 UTC, osa1 wrote:
 On Friday, 3 February 2017 at 10:49:00 UTC, Kagamin wrote:
 Leaks are likely in 32-bit processes and unlikely in 64-bit 
 processes. See e.g. 
 https://issues.dlang.org/show_bug.cgi?id=15723
This looks pretty bad. I think I'll consider something else until D's memory management story gets better. This is sad because the language otherwise looks quite good, and I'd love to try assertions, contracts, scope guards, macros etc.
you can use less auto GC. use the RC to replace the GC. https://github.com/huntlabs/SmartRef
Feb 03
prev sibling parent reply thedeemon <dlang thedeemon.com> writes:
On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote:
 I'm wondering what
 are the implications of the fact that current GC is a 
 Boehm-style conservative
 GC rather than a precise one, I've never worked with a 
 conservative GC before.
 Are there any disallowed memory operations? Can I break things 
 by not following
 some unchecked rules etc. ? How often does it leak? Do I need 
 to be careful
 with some operations to avoid leaks?
Here's some practical perspective from someone who released a 32-bit video processing app in D with thousands of users. When developing with GC in D you need to keep in mind 3 key things: 1) The GC will treat some random stack data as possible pointers, and some of those false pointers will accidentally point to some places in the heap, so for any object in GC heap there is a probability that GC will think it's alive (used) even when it's not, and this probability is directly proportional to the size of your object. 2) Each GC iteration scans the whole GC heap, so the larger your managed heap, the slower it gets. Main consequence of 1 and 2: don't store large objects (images, big file chunks etc.) in the GC heap, use other allocators for them. Leave GC heap just for the small litter. This way you practically don't leak and keep GC pauses short. 3) GC will call destructors (aka finalizers) for the objects that have them, and during the GC phase no allocations are allowed. Also, since you don't know in which order objects are collected, accessing other objects from a destructor is a bad idea, those objects might be collected already. Main consequence of 3: don't do silly things in destructor (like throwing exceptions or doing other operations that might allocate), try avoiding using the destructors at all, if possible. They may be used to ensure you release your resources, but don't make it the primary and only way to release them, since some objects might leak and their destructors won't be called at all. If you follow these principles, your app will be fine, it's not hard really.
Feb 04
parent reply osa1 <omeragacan gmail.com> writes:
On Saturday, 4 February 2017 at 11:09:21 UTC, thedeemon wrote:
 On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote:
 I'm wondering what
 are the implications of the fact that current GC is a 
 Boehm-style conservative
 GC rather than a precise one, I've never worked with a 
 conservative GC before.
 Are there any disallowed memory operations? Can I break things 
 by not following
 some unchecked rules etc. ? How often does it leak? Do I need 
 to be careful
 with some operations to avoid leaks?
Here's some practical perspective from someone who released a 32-bit video processing app in D with thousands of users. When developing with GC in D you need to keep in mind 3 key things: 1) The GC will treat some random stack data as possible pointers, and some of those false pointers will accidentally point to some places in the heap, so for any object in GC heap there is a probability that GC will think it's alive (used) even when it's not, and this probability is directly proportional to the size of your object. 2) Each GC iteration scans the whole GC heap, so the larger your managed heap, the slower it gets. Main consequence of 1 and 2: don't store large objects (images, big file chunks etc.) in the GC heap, use other allocators for them. Leave GC heap just for the small litter. This way you practically don't leak and keep GC pauses short. 3) GC will call destructors (aka finalizers) for the objects that have them, and during the GC phase no allocations are allowed. Also, since you don't know in which order objects are collected, accessing other objects from a destructor is a bad idea, those objects might be collected already. Main consequence of 3: don't do silly things in destructor (like throwing exceptions or doing other operations that might allocate), try avoiding using the destructors at all, if possible. They may be used to ensure you release your resources, but don't make it the primary and only way to release them, since some objects might leak and their destructors won't be called at all. If you follow these principles, your app will be fine, it's not hard really.
Honestly this still sounds horrible. I'd be OK with any of these two: - Percise GC, no manual management, no RAII or destructors etc. - Manual GC, RAII and destructors, smart pointers. but this: - Automatic but conservative. Can leak at any time. You have to implement manual management (managed heaps) to avoid leaks. Leaks are hard to find as any heap value may be causing it. is the worst of both worlds. I'm surprised that D was able to come this far with this.
Feb 04
next sibling parent Kagamin <spam here.lot> writes:
On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote:
 I'm surprised that D was able to come this far with this.
It's used mostly for server software. Things are moving to 64 bit, so this will be less of an issue.
Feb 04
prev sibling next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote:
 - Automatic but conservative. Can leak at any time.
All GCs are prone to leak, including precise ones. The point of garbage collection is not to prevent leaks, but rather to prevent use-after-free bugs. Granted, the D 32 bit GC is more prone to leak than most others (including D 64 bit), but this isn't as horrible as you're believing, it does it's *main* job pretty well just at the cost of higher memory consumption, which we can often afford. And if you can't, manual management of large arrays tends to be relatively simple anyway. For example, my png.d used to leak something nasty in 32 bit because it used GC-allocated large temporary buffers while decompressing images. But, since they were temporary buffers, it was really easy to just `scope(exit) free(buffer);` after allocating to let them be freed at the end of the function. Then the memory consumption cut in half.
Feb 04
next sibling parent osa1 <omeragacan gmail.com> writes:
 All GCs are prone to leak, including precise ones. The point of 
 garbage collection is not to prevent leaks, but rather to 
 prevent use-after-free bugs.
Of course I can have leaks in a GC environment, but having non-deterministic leaks is another thing, and I'd rather make sure to delete my references to let GC do its thing than to pray and hope some random number on my stack won't be in the range of my heap. I don't agree that the point is just preventing use-after-free, which can be guaranteed statically even in a non-GC language (see e.g. Rust).
Feb 04
prev sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Saturday, 4 February 2017 at 15:23:53 UTC, Adam D. Ruppe wrote:
 On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote:
 - Automatic but conservative. Can leak at any time.
All GCs are prone to leak, including precise ones. The point of garbage collection is not to prevent leaks, but rather to prevent use-after-free bugs.
No, the main point of GC is to prevent leaks in the case where you have circular references. Precise GCs don't leak, by definition. If the object is reachable then it isn't a leak. Now, you might claim that objects that provably won't be touched again should be classified as dead and freed and that this is a bug that exhibit the same behaviour as a leak (running out of memory). But it's really nothing like the leaks you experience with manual memory management (e.g. circular references preventing memory from being released in a reference counting management scheme)
Feb 08
prev sibling parent reply thedeemon <dlang thedeemon.com> writes:
On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote:
 - Automatic but conservative. Can leak at any time. You have to 
 implement manual management (managed heaps) to avoid leaks. 
 Leaks are hard to find as any heap value may be causing it.
By "managed heap" I just meant the GC heap, the one used by "new" operator. Besides it, there are already other allocators and container libraries available, don't need to implement this stuff manually.
 the worst of both worlds.
It may look so from a distance. But in my experience it's not that bad. In most software I did in D it did not matter really (it's either 64-bit or short lived programs) and the control D gives to choose how to deal with everything makes it all quite manageable, I can decide what to take from both worlds and hence pick the best, not the worst.
Feb 04
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 05/02/2017 5:02 PM, thedeemon wrote:

snip

 It may look so from a distance. But in my experience it's not that bad.
 In most software I did in D it did not matter really (it's either 64-bit
 or short lived programs) and the control D gives to choose how to deal
 with everything makes it all quite manageable, I can decide what to take
 from both worlds and hence pick the best, not the worst.
The best of both worlds can be done quite simply. Instead of a chain of input ranges like: int[] data = input.filter!"a != 7".map!"a * 2".array; Use: int[] data; data.length = input.length; size_t i; foreach(v; input.filter!"a != 7".map!"a * 2") { data[i] = v; i++; } data.length = i; Of course this is dirt simple example, but instead look at it for e.g. a csv parser with some complex data structure creation + manipulation. I have some real world code here[0] that uses it. Not only is there less allocations and uses the GC but also it ends up being significantly faster! [0] https://gist.github.com/rikkimax/42c3dfa6500155c5e441cbb1437142ea#file-reports-d-L124
Feb 04
parent Cym13 <cpicard openmailbox.org> writes:
On Sunday, 5 February 2017 at 04:22:30 UTC, rikki cattermole 
wrote:
 On 05/02/2017 5:02 PM, thedeemon wrote:

 snip

 It may look so from a distance. But in my experience it's not 
 that bad.
 In most software I did in D it did not matter really (it's 
 either 64-bit
 or short lived programs) and the control D gives to choose how 
 to deal
 with everything makes it all quite manageable, I can decide 
 what to take
 from both worlds and hence pick the best, not the worst.
The best of both worlds can be done quite simply. Instead of a chain of input ranges like: int[] data = input.filter!"a != 7".map!"a * 2".array; Use: int[] data; data.length = input.length; size_t i; foreach(v; input.filter!"a != 7".map!"a * 2") { data[i] = v; i++; } data.length = i; Of course this is dirt simple example, but instead look at it for e.g. a csv parser with some complex data structure creation + manipulation. I have some real world code here[0] that uses it. Not only is there less allocations and uses the GC but also it ends up being significantly faster! [0] https://gist.github.com/rikkimax/42c3dfa6500155c5e441cbb1437142ea#file-reports-d-L124
Some data to weigh that in order to compare different memory management strategies on that simple case: #!/usr/bin/env rdmd import std.conv; import std.stdio; import std.array; import std.range; import std.algorithm; auto input = [1, 2, 7, 3, 7, 8, 8, 9, 7, 1, 0]; void naive() { int[] data = input.filter!(a => a!= 7).map!(a => a*2).array; assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string); } void maxReallocs() { int[] data; size_t i; foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) { data ~= v; } assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string); } void betterOfTwoWorlds() { int[] data; data.length = input.length; size_t i; foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) { data[i] = v; i++; } data.length = i; assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string); } void explicitNew() { int[] data = new int[input.length]; scope(exit) delete data; size_t i; foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) { data[i] = v; i++; } data.length = i; assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string); } void cStyle() nogc { import std.c.stdlib; int* data = cast(int*)malloc(input.length * int.sizeof); scope(exit) free(data); size_t i; foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) { data[i++] = v; } debug assert(data[0..i] == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string); } void onTheStack() nogc { int[100] data; size_t i; foreach(v ; input.filter!(a => a!=7).map!(a => a*2)) { data[i++] = v; } debug assert(data[0..i] == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string); } void main(string[] args) { import std.datetime; benchmark!( naive, maxReallocs, betterOfTwoWorlds, explicitNew, cStyle, onTheStack )(100000).each!writeln; } /* Results: Compiled with dmd -profile=gc test.d ==================================== TickDuration(385731143) // naive, TickDuration(575673615) // maxReallocs, TickDuration(255928562) // betterOfTwoWorlds, TickDuration(270497154) // explicitNew, TickDuration(97596363) // cStyle, TickDuration(96467459) // onTheStack GC usage: bytes allocated, allocations, type, function, file:line 17600000 100000 int[] test.explicitNew test.d:43 4400000 100000 int[] test.betterOfTwoWorlds test.d:30 3200000 800000 int[] test.maxReallocs test.d:22 3200000 100000 int[] test.maxReallocs test.d:25 3200000 100000 int[] test.explicitNew test.d:51 3200000 100000 int[] test.explicitNew test.d:53 3200000 100000 int[] test.betterOfTwoWorlds test.d:37 3200000 100000 int[] test.betterOfTwoWorlds test.d:39 3200000 100000 std.array.Appender!(int[]).Appender.Data std.array.Appender!(int[]).Appender.this /usr/include/dlang/dmd/std/array.d:2675 3200000 100000 int[] test.naive test.d:14 Compiled with dmd -O -inline test.d =================================== TickDuration(159383005) // naive, TickDuration(187192137) // maxReallocs, TickDuration(94094585) // betterOfTwoWorlds, TickDuration(102374657) // explicitNew, TickDuration(41801695) // cStyle, TickDuration(45613954) // onTheStack Compiled with dmd -O -inline -release -boundscheck=off test.d ============================================================= TickDuration(152151439) // naive, TickDuration(140870515) // maxReallocs, TickDuration(46740440) // betterOfTwoWorlds, TickDuration(59089016) // explicitNew, TickDuration(26038060) // cStyle, TickDuration(25984371) // onTheStack */
Feb 05