
digitalmars.D - Is a moving GC really needed?

reply Lionello Lunesu <lio lunesu.remove.com> writes:
I've noticed that some design decisions are made with the possibility of 
a moving GC in mind. Will D indeed end up with a moving GC? If so, why? 
Is a moving GC really worth the extra trouble?

Being able to move memory blocks reduces memory fragmentation, am I 
correct? Is this the only reason? (For the remainder of this post, I'm 
assuming it is.)

I've experienced the problems of memory fragmentation first hand. In the 
project I'm working on (3D visualization software) I've had to track 
out-of-memory errors, which turned out to be because of virtual memory 
fragmentation. At some point, even a malloc/VirtualAlloc (the MS CRT's 
malloc directly calls VirtualAlloc for big memory blocks) for 80MB 
failed. Our problems were resolved by reserving a huge block (~1GB) of 
virtual memory at application start-up, to prevent third-party DLLs from 
fragmenting the virtual address space.

One of the reasons we ran into problems with memory fragmentation was 
that Windows actually gives a process only 2GB of virtual address space. 
Using Windows Address Extension (a flag passed to the linker), however, 
it is possible to make the full 4GB of virtual address space available. 
That's an extra 2GB of contiguous virtual address space! In the (near) 
future we'll have 2^64 bytes of virtual address space, which "should be 
enough for anyone".

Is the extra complexity and run-time overhead of a moving GC worth the 
trouble, at this point in time?

L.
Oct 02 2006
next sibling parent reply "Chris Miller" <chris dprogramming.com> writes:
On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu  
<lio lunesu.remove.com> wrote:

 I've noticed that some design decisions are made with the possibility of  
 a moving GC in mind. Will D indeed end up with a moving GC? If so, why?  
 Is a moving GC really worth the extra trouble?

 Being able to move memory blocks reduces memory fragmentation, am I  
 correct? Is this the only reason? (For the remainder of this post, I'm  
 assuming it is.)

I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in. It may also improve caching as allocated memory is more together.
Oct 02 2006
next sibling parent Lars Ivar Igesund <larsivar igesund.net> writes:
Chris Miller wrote:

 On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu
 <lio lunesu.remove.com> wrote:
 
 I've noticed that some design decisions are made with the possibility of
 a moving GC in mind. Will D indeed end up with a moving GC? If so, why?
 Is a moving GC really worth the extra trouble?

 Being able to move memory blocks reduces memory fragmentation, am I
 correct? Is this the only reason? (For the remainder of this post, I'm
 assuming it is.)

I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in. It may also improve caching as allocated memory is more together.

Also, the moving part can (and most likely will) lead to the implementation of a generational GC, which should be able to reduce the time spent scanning when there are large amounts of live data (I believe today's Java GCs often operate with 6 generations).

All GC traits have pros and cons, and the current D GC is about as close as you come to a naive implementation. Different types of applications need different traits, so having different GCs available will be necessary to secure a future for D.

--
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
Oct 02 2006
prev sibling parent reply Lionello Lunesu <lio lunesu.remove.com> writes:
Chris Miller wrote:
 On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu 
 <lio lunesu.remove.com> wrote:
 
 I've noticed that some design decisions are made with the possibility 
 of a moving GC in mind. Will D indeed end up with a moving GC? If so, 
 why? Is a moving GC really worth the extra trouble?

 Being able to move memory blocks reduces memory fragmentation, am I 
 correct? Is this the only reason? (For the remainder of this post, I'm 
 assuming it is.)

I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.

Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)
 It may also improve caching as allocated memory is more together.

This I understand, but what's the granularity (if that's the correct term) of a CPU cache?

http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 :
"The data cache keeps copies of 64 byte lines of memory."

I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed together..

L.
Oct 02 2006
next sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Lionello Lunesu wrote:
 Chris Miller wrote:
 On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu 
 <lio lunesu.remove.com> wrote:
 Being able to move memory blocks reduces memory fragmentation, am I 
 correct? Is this the only reason? (For the remainder of this post, 
 I'm assuming it is.)

I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.

Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)

"Normal" allocators have to deal with fragmentation. With a moving GC you can allocate everything at the top of the heap and rely on the GC to shrink the heap when you run out of space.

So the allocator can basically be "increment the top-of-heap pointer by the size of the allocation and return the old value" (synchronized if in a multi-threaded environment, of course). That's pretty much the most efficient allocator possible (as long as the GC doesn't need to run, anyway ;) )
 It may also improve caching as allocated memory is more together.

This I understand, but what's the granularity (if that's the correct term) of a CPU cache? http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache keeps copies of 64 byte lines of memory." I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed

IIRC CPUs typically have multiple levels of cache, and the later levels have larger cache lines. Not sure how big they get though.
Oct 02 2006
next sibling parent Dave <Dave_member pathlink.com> writes:
Frits van Bommel wrote:
 Lionello Lunesu wrote:
 Chris Miller wrote:
 On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu 
 <lio lunesu.remove.com> wrote:
 Being able to move memory blocks reduces memory fragmentation, am I 
 correct? Is this the only reason? (For the remainder of this post, 
 I'm assuming it is.)

I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.

Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)

"Normal" allocators have to deal with fragmentation. With a moving GC you can allocate everything to the top of the heap and rely on the GC to shrink the heap when you run out of space. So the allocator can basically be "increment top-of-heap-pointer by size of allocation and return old value" (synchronized if in a multi threaded environment, of course). That's pretty much the most efficient allocator possible (as long as the GC doesn't need to run, anyway ;) )

And then if you add 'generations' of objects by 'age', you can effectively scan only the 'youngest' generation to recover a good percentage of garbage, and only scan the rest if memory is still tight. That seems to make for a pretty good strategy for many programs, including interactive ones (at least for imperative languages).

I think the .NET GC does a really good job from what I've seen, and IIRC it is a moving / generational GC. So is Sun's latest (again, IIRC).
 It may also improve caching as allocated memory is more together.

This I understand, but what's the granularity (if that's the correct term) of a CPU cache? http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache keeps copies of 64 byte lines of memory." I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed

IIRC CPUs typically have multiple leveled caches, and the later levels have larger cache lines. Not sure how big they get though.

Oct 02 2006
prev sibling parent Lionello Lunesu <lio lunesu.remove.com> writes:
Frits van Bommel wrote:
 Lionello Lunesu wrote:
 Chris Miller wrote:
 On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu 
 <lio lunesu.remove.com> wrote:
 Being able to move memory blocks reduces memory fragmentation, am I 
 correct? Is this the only reason? (For the remainder of this post, 
 I'm assuming it is.)

I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.

Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)

"Normal" allocators have to deal with fragmentation. With a moving GC you can allocate everything to the top of the heap and rely on the GC to shrink the heap when you run out of space. So the allocator can basically be "increment top-of-heap-pointer by size of allocation and return old value" (synchronized if in a multi threaded environment, of course). That's pretty much the most efficient allocator possible (as long as the GC doesn't need to run, anyway ;) )

Imagine doing that with 2^64 bytes of virtual memory! Maybe you wouldn't even need to compact (although small blocks would keep entire pages in memory.. bad idea).

L.
Oct 02 2006
prev sibling parent Kevin Bealer <kevinbealer gmail.com> writes:
Lionello Lunesu wrote:
 Chris Miller wrote:
 On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu 
 <lio lunesu.remove.com> wrote:

 I've noticed that some design decisions are made with the possibility 
 of a moving GC in mind. Will D indeed end up with a moving GC? If so, 
 why? Is a moving GC really worth the extra trouble?

 Being able to move memory blocks reduces memory fragmentation, am I 
 correct? Is this the only reason? (For the remainder of this post, 
 I'm assuming it is.)

I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.

Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)
 It may also improve caching as allocated memory is more together.

This I understand, but what's the granularity (if that's the correct term) of a CPU cache? http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache keeps copies of 64 byte lines of memory." I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed together.. L.

If you think about it from the point of view of disk blocks or pages it makes more sense, but even cache lines could benefit some.

What happens is that normal applications typically have linked lists and arrays of pointers to sub-structures. Moving allocators normally traverse these objects in 'depth-first' order, copying the data into some destination area as they scan it, rather than only moving to fill gaps as you might guess.

Copying a linked list in depth-first order means that the nodes of the list are copied to a new region of memory - and end up *sequential* in memory as a result. From then on, they are sequential and contiguous, packed into nearly as few pages as possible. Later, as nodes are added to the middle of the list, the list gets 'scattered', but it is always recollected and re-sequentialized during the next sweep.

The same is true for arrays of pointers, binary trees, and many other structures. It's probably less important for hash tables and "value" arrays. Also, it's more important when memory is full, because disk pages are heavier than cache lines.

Java programmers are taught that "ArrayList" is the best list for most applications -- it's basically C++'s std::vector -- so now arrays have become the norm in Java and C++, and the argument for moving collectors is probably weaker now. For LISP-like languages (postscript, lisp, scheme, sml, ocaml), this is probably a bigger deal.

Kevin
Oct 04 2006
prev sibling next sibling parent reply xs0 <xs0 xs0.com> writes:
Lionello Lunesu wrote:
 I've noticed that some design decisions are made with the possibility of 
 a moving GC in mind. Will D indeed end up with a moving GC? If so, why? 
 Is a moving GC really worth the extra trouble?
 
 Being able to move memory blocks reduces memory fragmentation, am I 
 correct? Is this the only reason? (For the remainder of this post, I'm 
 assuming it is.)
 
 I've experienced the problems of memory fragmentation first hand. In the 
 project I'm working on (3D visualization software) I've had to track 
 out-of-memory errors, which turned out to be because of virtual memory 
 fragmentation. At some point, even a malloc/VirtualAlloc (the MS CRT's 
 malloc directly calls VirtualAlloc for big memory blocks) for 80MB 
 failed. Our problems were resolved by reserving a huge block (~1GB) of 
 virtual memory at application start-up, to prevent third-party DLLs from 
 fragmenting the virtual address space.
 
 One of the reasons we ran into problems with memory fragmentation was 
 that Windows is actually only using 2GB of virtual address space. Using 
 Windows Address Extension (a flag passed to the linker), however, it is 
 possible to get the full 4GB of virtual address space available. That's 
 an extra 2GB of continuous virtual address space! In the (near) future 
 we'll have 2^64 bytes of virtual address space, which "should be enough 
 for anyone".
 
 Is the extra complexity and run-time overhead of a moving GC worth the 
 trouble, at this point in time?

While I'm no expert, I doubt a moving GC is even possible in a systems language like D.

First, if you move things around, you obviously need to be precise when updating pointers, lest all hell break loose. But how do you update

union {
    int a;
    void* b;
}

? While you could forbid overlap of pointers and non-pointer data, what about custom allocators, assembler, C libraries (including the OS runtime!), etc.?

And second, for the generational case, you need an efficient way to track references from older objects to newer objects, otherwise you need to scan them all, defeating the point of having generations in the first place. While a JIT-compiled language/runtime can relatively easily (and efficiently) do this by injecting appropriate code into older objects, I think it's practically impossible to do so with native code.

I've no idea how to overcome those issues without involving the end-user (programmer) and/or losing quite a lot of speed during normal operation, which I'm quite sure are not acceptable trade-offs.

On the bright side, I believe there's considerably less need to heap-allocate in D than, say, in Java, and even when the heap is used, one can overcome a bad (slow) GC in many cases (with stuff like malloc/free, delete, etc.), so the performance of the GC is not as critical.

If the compiler/GC were improved to differentiate between atomic and non-atomic data (the latter contains pointers to other data, the first doesn't), so memory areas that can't contain pointers don't get scanned at all*, I think I'd already be quite happy with the state of things..

xs0

*) that may already be the case, but last time I checked it wasn't :)
Oct 02 2006
parent reply Chad J <""gamerChad\" spamIsBad gmail.com"> writes:
xs0 wrote:
 
 While I'm no expert, I doubt a moving GC is even possible in a systems 
 language like D.
 
 First, if you move things around, you obviously need to be precise when 
 updating pointers, lest all hell breaks loose. But how do you update
 
 union {
     int a;
     void* b;
 }
 
 ? While you could forbid overlap of pointers and non-pointer data, what 
 about custom allocators, assembler, C libraries (including OS runtime!), 
 etc.?

For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the GC which member is selected at any moment during program execution. Whenever an assignment is done to the union, code is inserted to update the union's status. So your union would look more like this:

enum StructStatus
{
    a,
    b,
}

struct
{
    StructStatus status; // or size_t, whichever works
    union
    {
        int a;
        void* b;
    }
}

Now the GC can be precise with unions. Notice also the enum, which would be nice to make available to userland - AFAIK many unions are coded in a struct like that, so this will not be a loss in memory usage for those cases, provided D exposes the implicit union information. At any rate, unions seem pretty rare, so no one would notice the extra memory usage.

Not sure how custom allocators mess up the GC, I thought these were just on their own anyways. If a pointer to something is outside of the GC heap, the GC doesn't bother changing it or collecting it or moving anything.

Assembler is a bit tricky, maybe someone smarter than I can handle it better, but here's a shot with some pseudo-asm:

struct Foo
{
    int member1;
    int member2;
}
Foo bar;

...

Foo* foo = &bar;
int extracted;
// foo spotted in the assembly block, never mind the context
// as such, foo gets pinned.
asm
{
    mov EAX, foo;         // EAX = foo;
    add EAX, 4;           // EAX += 4;
    mov extracted, [EAX]; // extracted = *EAX; or extracted = foo.member2;
}
// foo is unpinned here

As for C libraries, it seems like the same thing as custom allocators. The C heap is outside of the GC's jurisdiction and won't be moved or manipulated in any way. C code that handles D objects will have to be careful, and the D code will have to pin the objects before they go out into the unknown.
 
 On the bright side, I believe there's considerably less need to 
 heap-allocate in D than, say, in Java, and even when used, one can 
 overcome a bad(slow) GC in many cases (with stuff like malloc/free, 
 delete, etc.), so the performance of GC is not as critical.

structs are teh rulez.

I'm still not comfortable with manual memory management in D though, mostly because the standard lib (phobos) is built with the GC in mind and will probably leak the hell out of my program if I trust it too far. Either that or I have to roll my own functions, which sucks, or I have to be stuck with std.c, which also sucks because it's not nearly as nice as phobos IMO. Mostly I agree with this though.

Also, I wonder: if I were to make a tool that does escape analysis on your program, and it finds that a number of classes can either be stack allocated or safely deleted after they reach a certain point in the code, would this change the effectiveness of a generational GC? Perhaps part of why young objects die so often is because they are temporary things that can often be safely deleted at the end of scope or some such.
 
 If the compiler/GC were improved to differentiate between atomic and 
 non-atomic data (the latter contains pointers to other data, the first 
 doesn't), so memory areas that can't contain pointers don't get scanned 
 at all*, I think I'd already be quite happy with the state of things..
 
 
 xs0
 
 *) that may already be the case, but last time I checked it wasn't :)

I'd love this optimization. It doesn't seem too horribly hard to do either. The GC needs a new heap and a new allocation function and the compiler needs to be trained to use the new allocation function.
Oct 02 2006
parent reply xs0 <xs0 xs0.com> writes:
Chad J > wrote:
 xs0 wrote:
 While I'm no expert, I doubt a moving GC is even possible in a systems 
 language like D.

 First, if you move things around, you obviously need to be precise 
 when updating pointers, lest all hell breaks loose. But how do you update

 union {
     int a;
     void* b;
 }

 ? While you could forbid overlap of pointers and non-pointer data, 
 what about custom allocators, assembler, C libraries (including OS 
 runtime!), etc.?

For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the gc which member is selected at any moment during program execution. Whenever an assignment is done to the union, code is inserted to update the union's status. So your union would look more like this:

enum StructStatus { a, b, }

struct
{
    StructStatus status; // or size_t, whichever works
    union
    {
        int a;
        void* b;
    }
}

Well, you can't simply mandate that, I think.. It can be the case that the exact structure of a struct/union is mandated by requirements external to the application..

And, even if it was an option, it doesn't really solve the precision issue. To be precise, the GC must know the internal structure of anything on its heap, which I don't think is possible unless the language was severely restricted or it was required that the programmer supplies that information manually whenever the purpose of a part of memory changes from pointer to non-pointer and vice-versa. Both options suck quite a lot.
 Not sure how custom allocators mess up the GC, I thought these were just 
 on their own anyways.  If a pointer to something is outside of the GC 
 heap, the GC doesn't bother changing it or collecting it or moving 
 anything.

That's true, but a custom allocator can use the GC heap if it wants.

For example, if I know I'll be allocating a million linked list nodes, and will then release them all at once, it can be considerably faster to allocate a large byte[] and use chunks of it for my nodes. But how can the GC tell which of those bytes are pointers and which aren't? At the beginning none are, but later some may be..
 Assembler is a bit tricky, maybe someone smarter than I can handle it 
 better, but here's a shot with some psuedoasm:
 
 struct Foo
 {
   int member1;
   int member2;
 }
 Foo bar;
 
 ...
 
 Foo* foo = &bar;
 int extracted;
 // foo spotted in the assembly block, never mind the context
 // as such, foo gets pinned.
 asm
 {
   mov EAX, foo;         // EAX = foo;
   add EAX, 4;           // EAX += 4;
   mov extracted, [EAX]; // extracted = *EAX; or extracted = foo.member2;
 }
 // foo is unpinned here

I'm sure simple cases can be detected, but am also fairly certain it's impossible to generally determine which registers hold pointers and which don't, and/or what will get referenced and what won't.
 As for C libraries, it seems like the same thing as custom allocators. 
 The C heap is outside of the GC's jurisdiction and won't be moved or 
 manipulated in any way.  C code that handles D objects will have to be 
 careful, and the callee D code will have to pin the objects before the 
 go out into the unknown.

I meant precision again, not C's heap. Even if compiled D code somehow notifies the GC what's on the stack, the C code definitely won't. Then, how can the GC tell which parts of the stack point to its own heap, and which are just integers that happen to look like they could point to the heap?
 Also, I wonder, if I were to make a tool that does escape analysis on 
 your program, then finds that a number of classes can either be stack 
 allocated or safely deleted after they reach a certain point in the 
 code, then would this change the effectiveness of a generational GC? 
 Perhaps part of why young objects die so often is because they are 
 temporary things that can often be safely deleted at the end of scope or 
 some such.

Well, I think it's quite hard to guess whether the result would be better or worse, speed-wise. Obviously, allocating those objects on the stack would speed things up, while using generations in the GC would be less effective, because the GC would see fewer short-lived objects. I've no idea, however, how those two effects would combine.. Most probably it depends very much on the actual application (as do most things ;)

xs0
Oct 04 2006
parent reply Chad J <""gamerChad\" spamIsBad gmail.com"> writes:
xs0 wrote:
 Chad J > wrote:
 
 xs0 wrote:

 While I'm no expert, I doubt a moving GC is even possible in a 
 systems language like D.

 First, if you move things around, you obviously need to be precise 
 when updating pointers, lest all hell breaks loose. But how do you 
 update

 union {
     int a;
     void* b;
 }

 ? While you could forbid overlap of pointers and non-pointer data, 
 what about custom allocators, assembler, C libraries (including OS 
 runtime!), etc.?

For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the gc which member is selected at any moment during program execution. Whenever an assignment is done to the union, code is inserted to update the union's status. So your union would look more like this:

enum StructStatus { a, b, }

struct
{
    StructStatus status; // or size_t, whichever works
    union
    {
        int a;
        void* b;
    }
}

Well, you can't simply mandate that, I think.. It can be the case that the exact structure of a struct/union is mandated by requirements external to the application..

I think you can though, the same way that all D classes are prepended by a pointer to a vtable and a 'monitor', or the same way that dynamic arrays have not just a pointer, but an added length variable.
 And, even if it was an option, it doesn't really solve the precision 
 issue. To be precise, the GC must know the internal structure of 
 anything on its heap, which I don't think is possible unless the 
 language was severely restricted or it was required that the programmer 
 supplies that information manually whenever the purpose of a part of 
 memory changes from pointer to non-pointer and vice-versa. Both options 
 suck quite a lot.
 

But it can be precise with said union. It will know at compile time where all of the unions are stored, in the same way it knows where pointers are stored. Then, when it looks at the unions, it determines at runtime whether or not they contain pointers by using the extra member info. Either the member info says the current member is a pointer, or it says that the current member isn't a pointer. There is no maybe.

IMO this is not a severe restriction, and it does not require any programmer intervention. The compiler writes all of the code that sets unions' member info variables, and to the programmer the member info is readonly and needs no maintenance.
 
 Not sure how custom allocators mess up the GC, I thought these were 
 just on their own anyways.  If a pointer to something is outside of 
 the GC heap, the GC doesn't bother changing it or collecting it or 
 moving anything.

That's true, but a custom allocator can use the GC heap if it wants. For example, if I know I'll be allocating a million linked list nodes, and will then release them all at once, it can be considerably faster to allocate a large byte[] and use chunks of it for my nodes. But how can the GC tell which of those bytes are pointers and which aren't? At the beginning none are, but later some may be..

Hmmm. In the case of the allocator I think it is the allocator writer's responsibility to use gc.addRange() on all of the pointers that they toss into the byte[].

What bugs me now though is other arbitrary data cases, like std.boxer. So far the only answer I can think of is to make people dealing with arbitrarily typed data be very careful about gc.addRange'ing and gc.removeRange'ing any references that go into or out of their arbitrarily typed data, which does suck.
 
 Assembler is a bit tricky, maybe someone smarter than I can handle it 
 better, but here's a shot with some psuedoasm:

 struct Foo
 {
   int member1;
   int member2;
 }
 Foo bar;

 ...

 Foo* foo = &bar;
 int extracted;
 // foo spotted in the assembly block, never mind the context
 // as such, foo gets pinned.
 asm
 {
   mov EAX, foo;         // EAX = foo;
   add EAX, 4;           // EAX += 4;
   mov extracted, [EAX]; // extracted = *EAX; or extracted = foo.member2;
 }
 // foo is unpinned here

I'm sure simple cases can be detected, but am also fairly certain it's impossible to generally determine which registers hold pointers and which don't, and/or what will get referenced and what won't.

Yeah, I realized stuff like ASM that rewrites/rearranges the stack would mess up the GC. Perhaps we just need a comprehensive list of do's, don'ts, and workarounds in D asm programming?
 
 As for C libraries, it seems like the same thing as custom allocators. 
 The C heap is outside of the GC's jurisdiction and won't be moved or 
 manipulated in any way.  C code that handles D objects will have to be 
 careful, and the callee D code will have to pin the objects before the 
 go out into the unknown.

I meant precision again, not C's heap. Even if compiled D code somehow notifies the GC what's on the stack, the C code definitely won't. Then, how can the GC tell which parts of the stack point to its own heap, and which are just integers that happen to look like they could point to the heap?

It already knows where pointers are in each frame of the stack for D code; can't we make it also know which frames aren't D code at all? For the C-code frames it wouldn't look at them at all. Any D code that could let references out into C code (the only way I know of is with extern(C)) should pin that object on the D side and keep a reference until execution has returned to D.
 
 Also, I wonder, if I were to make a tool that does escape analysis on 
 your program, then finds that a number of classes can either be stack 
 allocated or safely deleted after they reach a certain point in the 
 code, then would this change the effectiveness of a generational GC? 
 Perhaps part of why young objects die so often is because they are 
 temporary things that can often be safely deleted at the end of scope 
 or some such.

Well, I think it's quite hard to guess whether the result would be better or worse, speed-wise. Obviously, allocating those objects on the stack would speed things up, while using generations in the GC would be less effective, because the GC would see less short-lived objects. I've no idea, however, how those two effects would combine.. Most probably it depends very much on the actual application (as do most things ;) xs0

Oct 05 2006
parent Don Clugston <dac nospam.com.au> writes:
Chad J > wrote:
 xs0 wrote:
 Chad J > wrote:

 xs0 wrote:

 While I'm no expert, I doubt a moving GC is even possible in a 
 systems language like D.

 First, if you move things around, you obviously need to be precise 
 when updating pointers, lest all hell breaks loose. But how do you 
 update

 union {
     int a;
     void* b;
 }

 ? While you could forbid overlap of pointers and non-pointer data, 
 what about custom allocators, assembler, C libraries (including OS 
 runtime!), etc.?

For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the gc which member is selected at any moment during program execution. Whenever an assignment is done to the union, code is inserted to update the union's status. So your union would look more like this: enum StructStatus { a, b, } struct { StructStatus status; //or size_t, whichever works union { int a; void* b; } }

Well, you can't simply mandate that, I think.. It can be the case that the exact structure of a struct/union is mandated by requirements external to the application..

I think you can though, the same way that all D classes are prepended by a pointer to a vtable and a 'monitor', or the same way that dynamic arrays have not just a pointer, but an added length variable.
 And, even if it was an option, it doesn't really solve the precision 
 issue. To be precise, the GC must know the internal structure of 
 anything on its heap, which I don't think is possible unless the 
 language was severely restricted or it was required that the 
 programmer supplies that information manually whenever the purpose of 
 a part of memory changes from pointer to non-pointer and vice-versa. 
 Both options suck quite a lot.

But it can be precise with said union. It will know at compile time where all of the unions are stored, in the same way it knows where pointers are stored. Then, when it looks at the unions, it determines at runtime whether or not they contain pointers by using the extra member info. Either the member info says the current member is a pointer, or it says that the current member isn't a pointer. There is no maybe.

IMO this is not a severe restriction, and it does not require any programmer intervention. The compiler writes all of the code that sets unions' member info variables, and to the programmer the member info is readonly and needs no maintenance.
 Not sure how custom allocators mess up the GC, I thought these were 
 just on their own anyways.  If a pointer to something is outside of 
 the GC heap, the GC doesn't bother changing it or collecting it or 
 moving anything.

That's true, but a custom allocator can use the GC heap if it wants. For example, if I know I'll be allocating a million linked list nodes, and will then release them all at once, it can be considerably faster to allocate a large byte[] and use chunks of it for my nodes. But how can the GC tell which of those bytes are pointers and which aren't? At the beginning none are, but later some may be..

Hmmm. In the case of the allocator I think it is the allocator writer's responsibility to use gc.addRange() on all of the pointers that they toss into the byte[]. What bugs me now though, is other arbitrary data cases, like std.boxer. So far the only answer I can think of is to make people dealing with arbitrarily typed data be very careful about gc.addRange'ing and gc.removeRange'ing any references that goes into or out of their arbitrarily typed data, which does suck.
 Assembler is a bit tricky, maybe someone smarter than I can handle it 
 better, but here's a shot with some pseudo-asm:

 struct Foo
 {
   int member1;
   int member2;
 }
 Foo bar;

 ...

 Foo* foo = &bar;
 int extracted;
 // foo spotted in the assembly block, never mind the context
 // as such, foo gets pinned.
 asm
 {
   mov EAX, foo;         // EAX = foo;
   add EAX, 4;           // EAX += 4;
   mov extracted, [EAX]; // extracted = *EAX; or extracted = foo.member2;
 }
 // foo is unpinned here

I'm sure simple cases can be detected, but am also fairly certain it's impossible to generally determine which registers hold pointers and which don't, and/or what will get referenced and what won't.

Yeah, I realized stuff like ASM that rewrites/rearranges the stack would mess up the GC. Perhaps we just need a comprehensive list of do's, don'ts, and workarounds in D asm programming?
 As for C libraries, it seems like the same thing as custom 
 allocators. The C heap is outside of the GC's jurisdiction and won't 
 be moved or manipulated in any way.  C code that handles D objects 
 will have to be careful, and the callee D code will have to pin the 
 objects before they go out into the unknown.

I meant precision again, not C's heap. Even if compiled D code somehow notifies the GC what's on the stack, the C code definitely won't. Then, how can the GC tell which parts of the stack point to its own heap, and which are just integers that happen to look like they could point to the heap?

It already knows where pointers are in each frame of the stack for D code, can't we make it also know which frames aren't D code at all? For the C code frames it doesn't look at them at all. Any D code that could let references out into C code (only way I know of is with extern(C)) should pin that object on the D side and keep a reference until execution has returned to D.
 Also, I wonder, if I were to make a tool that does escape analysis on 
 your program, then finds that a number of classes can either be stack 
 allocated or safely deleted after they reach a certain point in the 
 code, then would this change the effectiveness of a generational GC? 
 Perhaps part of why young objects die so often is because they are 
 temporary things that can often be safely deleted at the end of scope 
 or some such.

Well, I think it's quite hard to guess whether the result would be better or worse, speed-wise. Obviously, allocating those objects on the stack would speed things up, while using generations in the GC would be less effective, because the GC would see less short-lived objects. I've no idea, however, how those two effects would combine.. Most probably it depends very much on the actual application (as do most things ;) xs0


Why do you need a 100% moving GC anyway? Wouldn't it possible to combine moving GC with mark-and-sweep for those cases (like unions and asm) where you just can't be sure? In other words, a 'conservative moving gc'. (Might be a nightmare to implement, of course).
Oct 06 2006
prev sibling next sibling parent Sean Kelly <sean f4.ca> writes:
Lionello Lunesu wrote:
 I've noticed that some design decisions are made with the possibility of 
 a moving GC in mind. Will D indeed end up with a moving GC? If so, why? 
 Is a moving GC really worth the extra trouble?
 
 Being able to move memory blocks reduces memory fragmentation, am I 
 correct? Is this the only reason? (For the remainder of this post, I'm 
 assuming it is.)

Fragmentation shouldn't be an issue with the correct allocator strategy, see: http://citeseer.ist.psu.edu/johnstone97memory.html

I think the real reason is to speed up collections, as generational GCs (a class of moving GCs) begin by scanning only a segment of allocated memory which contains 'new' allocations. Only if no free space is found does it bother to scan older generations. On each collection, 'new' memory that is still alive typically graduates to an older generation. The idea behind this strategy is that memory which has been around for a while will probably be around for a long time, while a large number of allocations (in function temporaries) are discarded and available for collection almost immediately.
 Is the extra complexity and run-time overhead of a moving GC worth the 
 trouble, at this point in time?

Probably not, but it's more a goal to make D compatible with moving GCs than to write one at the moment. But doing so is a big pain in some cases--default implementations of toHash and opCmp being particularly sore points. Sean
Oct 02 2006
prev sibling parent reply Marcio <mqmnews123 sglebs.com> writes:
Lionello Lunesu wrote:
 I've noticed that some design decisions are made with the possibility of 
 a moving GC in mind. Will D indeed end up with a moving GC? If so, why? 
 Is a moving GC really worth the extra trouble?

Some people claim that a program that runs for a long time (web server, servlet container etc) would fragment memory so badly that eventually you can run out of RAM even though you still have lots. That is not a theory. This has been reported to me by a close friend who works for a big company which I will not name. A new JVM with a GC that moved objects solved the problem their customer was having. A GC that moves objects can compact memory and avoid the fragmentation.

I also don't see how a moving GC can live with the D features. I know that having a non-moving GC can have people produce code that is not portable across compiler implementations. I have seen this happen with code for SmartEiffel versus ISE Eiffel for example. Using SmartEiffel, people cached pointers to Eiffel objects from C DLLs etc. Worked in SmartEiffel (non-moving GC), but not in ISE Eiffel (moving GC).

marcio
Oct 04 2006
next sibling parent Sean Kelly <sean f4.ca> writes:
Marcio wrote:
 
     I also don't see how a moving GC can live with the D features. I 
 know that having a non-moving GC can have people produce code that is 
 not portable across compiler implementations. I have seen this happen 
 with code for SmartEiffel versus ISE Eiffel for example. Using 
 SmartEiffel, people cached pointers to Eiffel objects from C DLLs etc. 
 Worked in SmartEiffel (non-moving GC), but not in ISE Eiffel (moving GC).

I'll admit I've wondered about this. Without sufficient type information the GC has no way of knowing whether a particular 32-bit (or 64-bit) value that *could be* a pointer actually *is* a pointer and therefore should be altered when a move occurs. Assuming the type information is present however, it should be simple enough to state that pointers must be typed as such (ie. it is illegal to store a pointer in a byte[4], for example). In fact, I think D already has this restriction in place. Sean
Oct 04 2006
prev sibling parent Benji Smith <dlanguage benjismith.net> writes:
Marcio wrote:
     I also don't see how a moving GC can live with the D features. I 
 know that having a non-moving GC can have people produce code that is 
 not portable across compiler implementations.

In C#, a block of code can't create or access pointers unless it uses the "unsafe" keyword. And, any reference types within the method body have to be specifically pinned at one memory location (using a "fixed" block) whenever a pointer might be accessing that memory. It works really well. The vast majority of my code uses the default GC behavior, but every once in a while, I need to do some pointer manipulation, and I temporarily tell the garbage collector to refrain from moving my objects. I think a similar system could work in D. --Benji Smith
Oct 05 2006