
digitalmars.D - Is a moving GC really needed?

reply Lionello Lunesu <lio lunesu.remove.com> writes:
I've noticed that some design decisions are made with the possibility of 
a moving GC in mind. Will D indeed end up with a moving GC? If so, why? 
Is a moving GC really worth the extra trouble?

Being able to move memory blocks reduces memory fragmentation, am I 
correct? Is this the only reason? (For the remainder of this post, I'm 
assuming it is.)

I've experienced the problems of memory fragmentation first hand. In the 
project I'm working on (3D visualization software) I've had to track 
out-of-memory errors, which turned out to be because of virtual memory 
fragmentation. At some point, even a malloc/VirtualAlloc (the MS CRT's 
malloc directly calls VirtualAlloc for big memory blocks) for 80MB 
failed. Our problems were resolved by reserving a huge block (~1GB) of 
virtual memory at application start-up, to prevent third-party DLLs from 
fragmenting the virtual address space.

One of the reasons we ran into problems with memory fragmentation was 
that Windows actually gives a process only 2GB of virtual address space. 
Using Windows Address Extension (a flag passed to the linker), however, 
it is possible to make the full 4GB of virtual address space available. 
That's an extra 2GB of contiguous virtual address space! In the (near) 
future we'll have 2^64 bytes of virtual address space, which "should be 
enough for anyone".

Is the extra complexity and run-time overhead of a moving GC worth the 
trouble, at this point in time?

L.
Oct 02 2006
next sibling parent reply "Chris Miller" <chris dprogramming.com> writes:
On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu  
<lio lunesu.remove.com> wrote:

 I've noticed that some design decisions are made with the possibility of  
 a moving GC in mind. Will D indeed end up with a moving GC? If so, why?  
 Is a moving GC really worth the extra trouble?

 Being able to move memory blocks reduces memory fragmentation, am I  
 correct? Is this the only reason? (For the remainder of this post, I'm  
 assuming it is.)

I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in. It may also improve caching as allocated memory is more together.
Oct 02 2006
next sibling parent Lars Ivar Igesund <larsivar igesund.net> writes:
Chris Miller wrote:

 On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu
 <lio lunesu.remove.com> wrote:
 
 I've noticed that some design decisions are made with the possibility of
 a moving GC in mind. Will D indeed end up with a moving GC? If so, why?
 Is a moving GC really worth the extra trouble?

 Being able to move memory blocks reduces memory fragmentation, am I
 correct? Is this the only reason? (For the remainder of this post, I'm
 assuming it is.)

I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in. It may also improve caching as allocated memory is more together.

Also, the moving part can (and most likely will) lead to the implementation of a generational GC, which should be able to reduce the time spent scanning when there are large amounts of live data (I believe today's Java GCs often operate with 6 generations).

All GC traits have pros and cons, and the current D GC is about as close as you come to a naive implementation. Different types of applications need different traits, so having different GCs available will be necessary to secure a future for D.

--
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
Oct 02 2006
prev sibling parent reply Lionello Lunesu <lio lunesu.remove.com> writes:
Chris Miller wrote:
 On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu 
 <lio lunesu.remove.com> wrote:
 
 I've noticed that some design decisions are made with the possibility 
 of a moving GC in mind. Will D indeed end up with a moving GC? If so, 
 why? Is a moving GC really worth the extra trouble?

 Being able to move memory blocks reduces memory fragmentation, am I 
 correct? Is this the only reason? (For the remainder of this post, I'm 
 assuming it is.)

I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.

Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)
 It may also improve caching as allocated memory is more together.

This I understand, but what's the granularity (if that's the correct term) of a CPU cache?

http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 :
"The data cache keeps copies of 64 byte lines of memory."

I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed together..

L.
Oct 02 2006
next sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Lionello Lunesu wrote:
 Chris Miller wrote:
 On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu 
 <lio lunesu.remove.com> wrote:
 Being able to move memory blocks reduces memory fragmentation, am I 
 correct? Is this the only reason? (For the remainder of this post, 
 I'm assuming it is.)

I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.

Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)

"Normal" allocators have to deal with fragmentation. With a moving GC you can allocate everything at the top of the heap and rely on the GC to shrink the heap when you run out of space.

So the allocator can basically be "increment the top-of-heap pointer by the size of the allocation and return the old value" (synchronized if in a multi-threaded environment, of course). That's pretty much the most efficient allocator possible (as long as the GC doesn't need to run, anyway ;) )
 It may also improve caching as allocated memory is more together.

This I understand, but what's the granularity (if that's the correct term) of a CPU cache? http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache keeps copies of 64 byte lines of memory." I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed

IIRC CPUs typically have multiple levels of cache, and the later levels have larger cache lines. Not sure how big they get though.
Oct 02 2006
next sibling parent Dave <Dave_member pathlink.com> writes:
Frits van Bommel wrote:
 Lionello Lunesu wrote:
 Chris Miller wrote:
 On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu 
 <lio lunesu.remove.com> wrote:
 Being able to move memory blocks reduces memory fragmentation, am I 
 correct? Is this the only reason? (For the remainder of this post, 
 I'm assuming it is.)

I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.

Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)

"Normal" allocators have to deal with fragmentation. With a moving GC you can allocate everything to the top of the heap and rely on the GC to shrink the heap when you run out of space. So the allocator can basically be "increment top-of-heap-pointer by size of allocation and return old value" (synchronized if in a multi threaded environment, of course). That's pretty much the most efficient allocator possible (as long as the GC doesn't need to run, anyway ;) )

And then if you add 'generations' of objects by 'age', you can effectively scan only the 'youngest' generation to recover a good percentage of garbage, and only scan the rest if memory is still tight. That seems to make for a pretty good strategy for many programs, including interactive ones (at least for imperative languages).

I think the .NET GC does a really good job from what I've seen, and IIRC it is a moving / generational GC. So is Sun's latest (again, IIRC).
 It may also improve caching as allocated memory is more together.

This I understand, but what's the granularity (if that's the correct term) of a CPU cache? http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache keeps copies of 64 byte lines of memory." I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed

IIRC CPUs typically have multiple leveled caches, and the later levels have larger cache lines. Not sure how big they get though.

Oct 02 2006
prev sibling parent Lionello Lunesu <lio lunesu.remove.com> writes:
Frits van Bommel wrote:
 Lionello Lunesu wrote:
 Chris Miller wrote:
 On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu 
 <lio lunesu.remove.com> wrote:
 Being able to move memory blocks reduces memory fragmentation, am I 
 correct? Is this the only reason? (For the remainder of this post, 
 I'm assuming it is.)

I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.

Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)

"Normal" allocators have to deal with fragmentation. With a moving GC you can allocate everything to the top of the heap and rely on the GC to shrink the heap when you run out of space. So the allocator can basically be "increment top-of-heap-pointer by size of allocation and return old value" (synchronized if in a multi threaded environment, of course). That's pretty much the most efficient allocator possible (as long as the GC doesn't need to run, anyway ;) )

Imagine doing that with 2^64 bytes of virtual memory! Maybe you wouldn't even need to compact (although small blocks would keep entire pages in memory.. bad idea).

L.
Oct 02 2006
prev sibling parent Kevin Bealer <kevinbealer gmail.com> writes:
Lionello Lunesu wrote:
 Chris Miller wrote:
 On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu 
 <lio lunesu.remove.com> wrote:

 I've noticed that some design decisions are made with the possibility 
 of a moving GC in mind. Will D indeed end up with a moving GC? If so, 
 why? Is a moving GC really worth the extra trouble?

 Being able to move memory blocks reduces memory fragmentation, am I 
 correct? Is this the only reason? (For the remainder of this post, 
 I'm assuming it is.)

I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.

Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)
 It may also improve caching as allocated memory is more together.

This I understand, but what's the granularity (if that's the correct term) of a CPU cache? http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache keeps copies of 64 byte lines of memory." I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed together.. L.

If you think about it from the point of view of disk blocks or pages it makes more sense, but even cache lines could benefit some.

What happens is that normal applications typically have linked lists and arrays of pointers to sub-structures. Moving allocators normally traverse these objects in 'depth-first' order, copying the data into some destination area as they scan it, rather than only moving to fill gaps as you might guess.

Copying a linked list in depth-first order means that the nodes of the list are copied to a new region of memory - and end up *sequential* in memory as a result. From then on, they are sequential and contiguous, packed into nearly as few pages as possible. Later, as nodes are added to the middle of the list, the list gets 'scattered', but it is always recollected and re-sequentialized during the next sweep.

The same is true for arrays of pointers, binary trees, and many other structures. It's probably less important for hash tables and "value" arrays. Also, it's more important when memory is full, because disk pages are heavier than cache lines.

Java programmers are taught that "ArrayList" is the best list for most applications -- it's basically C++'s std::vector -- so now arrays have become the norm in Java and C++, and the argument for moving collectors is probably weaker now. For LISP-like languages (postscript, lisp, scheme, sml, ocaml), this is probably a bigger deal.

Kevin
Oct 04 2006
prev sibling next sibling parent reply xs0 <xs0 xs0.com> writes:
Lionello Lunesu wrote:
 I've noticed that some design decisions are made with the possibility of 
 a moving GC in mind. Will D indeed end up with a moving GC? If so, why? 
 Is a moving GC really worth the extra trouble?
 
 Being able to move memory blocks reduces memory fragmentation, am I 
 correct? Is this the only reason? (For the remainder of this post, I'm 
 assuming it is.)
 
 I've experienced the problems of memory fragmentation first hand. In the 
 project I'm working on (3D visualization software) I've had to track 
 out-of-memory errors, which turned out to be because of virtual memory 
 fragmentation. At some point, even a malloc/VirtualAlloc (the MS CRT's 
 malloc directly calls VirtualAlloc for big memory blocks) for 80MB 
 failed. Our problems were resolved by reserving a huge block (~1GB) of 
 virtual memory at application start-up, to prevent third-party DLLs from 
 fragmenting the virtual address space.
 
 One of the reasons we ran into problems with memory fragmentation was 
 that Windows is actually only using 2GB of virtual address space. Using 
 Windows Address Extension (a flag passed to the linker), however, it is 
 possible to get the full 4GB of virtual address space available. That's 
 an extra 2GB of continuous virtual address space! In the (near) future 
 we'll have 2^64 bytes of virtual address space, which "should be enough 
 for anyone".
 
 Is the extra complexity and run-time overhead of a moving GC worth the 
 trouble, at this point in time?

While I'm no expert, I doubt a moving GC is even possible in a systems language like D.

First, if you move things around, you obviously need to be precise when updating pointers, lest all hell break loose. But how do you update

union {
    int a;
    void* b;
}

? While you could forbid overlap of pointers and non-pointer data, what about custom allocators, assembler, C libraries (including the OS runtime!), etc.?

And second, for the generational case, you need an efficient way to track references from older objects to newer objects, otherwise you need to scan them all, defeating the point of having generations in the first place. While a JIT-compiled language/runtime can relatively easily (and efficiently) do this by injecting appropriate code into older objects, I think it's practically impossible to do so with native code.

I've no idea how to overcome those issues without involving the end-user (programmer) and/or losing quite a lot of speed during normal operation, which I'm quite sure are not acceptable trade-offs.

On the bright side, I believe there's considerably less need to heap-allocate in D than, say, in Java, and even when the heap is used, one can overcome a bad (slow) GC in many cases (with stuff like malloc/free, delete, etc.), so the performance of the GC is not as critical.

If the compiler/GC were improved to differentiate between atomic and non-atomic data (the latter contains pointers to other data, the first doesn't), so memory areas that can't contain pointers don't get scanned at all*, I think I'd already be quite happy with the state of things..

xs0

*) that may already be the case, but last time I checked it wasn't :)
Oct 02 2006
parent reply Chad J <""gamerChad\" spamIsBad gmail.com"> writes:
xs0 wrote:
 
 While I'm no expert, I doubt a moving GC is even possible in a systems 
 language like D.
 
 First, if you move things around, you obviously need to be precise when 
 updating pointers, lest all hell breaks loose. But how do you update
 
 union {
     int a;
     void* b;
 }
 
 ? While you could forbid overlap of pointers and non-pointer data, what 
 about custom allocators, assembler, C libraries (including OS runtime!), 
 etc.?

For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the GC which member is selected at any moment during program execution. Whenever an assignment is done to the union, code is inserted to update the union's status. So your union would look more like this:

enum StructStatus
{
    a,
    b,
}

struct
{
    StructStatus status; // or size_t, whichever works
    union
    {
        int a;
        void* b;
    }
}

Now the GC can be precise with unions. Notice also the enum, which would be nice to make available to userland - AFAIK many unions are coded in a struct like that, so this will not be a loss in memory usage for those cases, provided D exposes the implicit union information. At any rate, unions seem pretty rare, so no one would notice the extra memory usage.

Not sure how custom allocators mess up the GC, I thought these were just on their own anyways. If a pointer to something is outside of the GC heap, the GC doesn't bother changing it or collecting it or moving anything.

Assembler is a bit tricky, maybe someone smarter than I can handle it better, but here's a shot with some pseudo-asm:

struct Foo
{
    int member1;
    int member2;
}
Foo bar;

...

Foo* foo = &bar;
int extracted;
// foo spotted in the assembly block, never mind the context
// as such, foo gets pinned.
asm
{
    mov EAX, foo;         // EAX = foo;
    add EAX, 4;           // EAX += 4;
    mov extracted, [EAX]; // extracted = *EAX; or extracted = foo.member2;
}
// foo is unpinned here

As for C libraries, it seems like the same thing as custom allocators. The C heap is outside of the GC's jurisdiction and won't be moved or manipulated in any way. C code that handles D objects will have to be careful, and the D code will have to pin the objects before they go out into the unknown.
 
 On the bright side, I believe there's considerably less need to 
 heap-allocate in D than, say, in Java, and even when used, one can 
 overcome a bad(slow) GC in many cases (with stuff like malloc/free, 
 delete, etc.), so the performance of GC is not as critical.

structs are teh rulez.

I'm still not comfortable with manual memory management in D though, mostly because the standard lib (phobos) is built with the GC in mind and will probably leak the hell out of my program if I trust it too far. Either that or I have to roll my own functions, which sucks, or I have to be stuck with std.c, which also sucks because it's not nearly as nice as phobos IMO. Mostly I agree with this though.

Also, I wonder: if I were to make a tool that does escape analysis on your program, and it finds that a number of classes can either be stack allocated or safely deleted after they reach a certain point in the code, would this change the effectiveness of a generational GC? Perhaps part of why young objects die so often is because they are temporary things that can often be safely deleted at the end of scope or some such.
 
 If the compiler/GC were improved to differentiate between atomic and 
 non-atomic data (the latter contains pointers to other data, the first 
 doesn't), so memory areas that can't contain pointers don't get scanned 
 at all*, I think I'd already be quite happy with the state of things..
 
 
 xs0
 
 *) that may already be the case, but last time I checked it wasn't :)

I'd love this optimization. It doesn't seem too horribly hard to do either. The GC needs a new heap and a new allocation function and the compiler needs to be trained to use the new allocation function.
Oct 02 2006
parent reply xs0 <xs0 xs0.com> writes:
Chad J > wrote:
 xs0 wrote:
 While I'm no expert, I doubt a moving GC is even possible in a systems 
 language like D.

 First, if you move things around, you obviously need to be precise 
 when updating pointers, lest all hell breaks loose. But how do you update

 union {
     int a;
     void* b;
 }

 ? While you could forbid overlap of pointers and non-pointer data, 
 what about custom allocators, assembler, C libraries (including OS 
 runtime!), etc.?

For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the gc which member is selected at any moment during program execution. Whenever an assignment is done to the union, code is inserted to update the union's status. So your union would look more like this:

enum StructStatus { a, b, }

struct
{
    StructStatus status; // or size_t, whichever works
    union
    {
        int a;
        void* b;
    }
}

Well, you can't simply mandate that, I think.. It can be the case that the exact structure of a struct/union is mandated by requirements external to the application..

And, even if it was an option, it doesn't really solve the precision issue. To be precise, the GC must know the internal structure of anything on its heap, which I don't think is possible unless the language was severely restricted or it was required that the programmer supplies that information manually whenever the purpose of a part of memory changes from pointer to non-pointer and vice-versa. Both options suck quite a lot.
 Not sure how custom allocators mess up the GC, I thought these were just 
 on their own anyways.  If a pointer to something is outside of the GC 
 heap, the GC doesn't bother changing it or collecting it or moving 
 anything.

That's true, but a custom allocator can use the GC heap if it wants.

For example, if I know I'll be allocating a million linked list nodes, and will then release them all at once, it can be considerably faster to allocate a large byte[] and use chunks of it for my nodes. But how can the GC tell which of those bytes are pointers and which aren't? At the beginning none are, but later some may be..
 Assembler is a bit tricky, maybe someone smarter than I can handle it 
 better, but here's a shot with some psuedoasm:
 
 struct Foo
 {
   int member1;
   int member2;
 }
 Foo bar;
 
 ...
 
 Foo* foo = &bar;
 int extracted;
 // foo spotted in the assembly block, never mind the context
 // as such, foo gets pinned.
 asm
 {
   mov EAX, foo;         // EAX = foo;
   add EAX, 4;           // EAX += 4;
   mov extracted, [EAX]; // extracted = *EAX; or extracted = foo.member2;
 }
 // foo is unpinned here

I'm sure simple cases can be detected, but am also fairly certain it's impossible to generally determine which registers hold pointers and which don't, and/or what will get referenced and what won't.
 As for C libraries, it seems like the same thing as custom allocators. 
 The C heap is outside of the GC's jurisdiction and won't be moved or 
 manipulated in any way.  C code that handles D objects will have to be 
 careful, and the callee D code will have to pin the objects before the 
 go out into the unknown.

I meant precision again, not C's heap. Even if compiled D code somehow notifies the GC what's on the stack, the C code definitely won't. Then, how can the GC tell which parts of the stack point to its own heap, and which are just integers that happen to look like they could point to the heap?
 Also, I wonder, if I were to make a tool that does escape analysis on 
 your program, then finds that a number of classes can either be stack 
 allocated or safely deleted after they reach a certain point in the 
 code, then would this change the effectiveness of a generational GC? 
 Perhaps part of why young objects die so often is because they are 
 temporary things that can often be safely deleted at the end of scope or 
 some such.

Well, I think it's quite hard to guess whether the result would be better or worse, speed-wise. Obviously, allocating those objects on the stack would speed things up, while using generations in the GC would be less effective, because the GC would see fewer short-lived objects. I've no idea, however, how those two effects would combine.. Most probably it depends very much on the actual application (as do most things ;)

xs0
Oct 04 2006
parent reply Chad J <""gamerChad\" spamIsBad gmail.com"> writes:
xs0 wrote:
 Chad J > wrote:
 
 xs0 wrote:

 While I'm no expert, I doubt a moving GC is even possible in a 
 systems language like D.

 First, if you move things around, you obviously need to be precise 
 when updating pointers, lest all hell breaks loose. But how do you 
 update

 union {
     int a;
     void* b;
 }

 ? While you could forbid overlap of pointers and non-pointer data, 
 what about custom allocators, assembler, C libraries (including OS 
 runtime!), etc.?

For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the gc which member is selected at any moment during program execution. Whenever an assignment is done to the union, code is inserted to update the union's status. So your union would look more like this:

enum StructStatus { a, b, }

struct
{
    StructStatus status; // or size_t, whichever works
    union
    {
        int a;
        void* b;
    }
}

Well, you can't simply mandate that, I think.. It can be the case that the exact structure of a struct/union is mandated by requirements external to the application..

I think you can though, the same way that all D classes are prepended by a pointer to a vtable and a 'monitor', or the same way that dynamic arrays have not just a pointer, but an added length variable.
 And, even if it was an option, it doesn't really solve the precision 
 issue. To be precise, the GC must know the internal structure of 
 anything on its heap, which I don't think is possible unless the 
 language was severely restricted or it was required that the programmer 
 supplies that information manually whenever the purpose of a part of 
 memory changes from pointer to non-pointer and vice-versa. Both options 
 suck quite a lot.
 

But it can be precise with said union. It will know at compile time where all of the unions are stored, in the same way it knows where pointers are stored. Then, when it looks at the unions, it determines at runtime whether or not they contain pointers by using the extra member info. Either the member info says the current member is a pointer, or it says that the current member isn't a pointer. There is no maybe.

IMO this is not a severe restriction, and it does not require any programmer intervention. The compiler writes all of the code that sets unions' member info variables, and to the programmer the member info is readonly and needs no maintenance.
 
 Not sure how custom allocators mess up the GC, I thought these were 
 just on their own anyways.  If a pointer to something is outside of 
 the GC heap, the GC doesn't bother changing it or collecting it or 
 moving anything.

That's true, but a custom allocator can use the GC heap if it wants. For example, if I know I'll be allocating a million linked list nodes, and will then release them all at once, it can be considerably faster to allocate a large byte[] and use chunks of it for my nodes. But how can the GC tell which of those bytes are pointers and which aren't? At the beginning none are, but later some may be..

Hmmm. In the case of the allocator I think it is the allocator writer's responsibility to use gc.addRange() on all of the pointers that they toss into the byte[].

What bugs me now though is other arbitrary data cases, like std.boxer. So far the only answer I can think of is to make people dealing with arbitrarily typed data be very careful about gc.addRange'ing and gc.removeRange'ing any references that go into or out of their arbitrarily typed data, which does suck.
 
 Assembler is a bit tricky, maybe someone smarter than I can handle it 
 better, but here's a shot with some psuedoasm:

 struct Foo
 {
   int member1;
   int member2;
 }
 Foo bar;

 ...

 Foo* foo = &bar;
 int extracted;
 // foo spotted in the assembly block, never mind the context
 // as such, foo gets pinned.
 asm
 {
   mov EAX, foo;         // EAX = foo;
   add EAX, 4;           // EAX += 4;
   mov extracted, [EAX]; // extracted = *EAX; or extracted = foo.member2;
 }
 // foo is unpinned here

I'm sure simple cases can be detected, but am also fairly certain it's impossible to generally determine which registers hold pointers and which don't, and/or what will get referenced and what won't.

Yeah, I realized stuff like ASM that rewrites/rearranges the stack would mess up the GC. Perhaps we just need a comprehensive list of do's, don'ts, and workarounds in D asm programming?
 
 As for C libraries, it seems like the same thing as custom allocators. 
 The C heap is outside of the GC's jurisdiction and won't be moved or 
 manipulated in any way.  C code that handles D objects will have to be 
 careful, and the callee D code will have to pin the objects before the 
 go out into the unknown.

I meant precision again, not C's heap. Even if compiled D code somehow notifies the GC what's on the stack, the C code definitely won't. Then, how can the GC tell which parts of the stack point to its own heap, and which are just integers that happen to look like they could point to the heap?

It already knows where pointers are in each frame of the stack for D code; can't we make it also know which frames aren't D code at all? For the C-code frames it wouldn't look at them at all. Any D code that could let references out into C code (the only way I know of is with extern(C)) should pin that object on the D side and keep a reference until execution has returned to D.
 
 Also, I wonder, if I were to make a tool that does escape analysis on 
 your program, then finds that a number of classes can either be stack 
 allocated or safely deleted after they reach a certain point in the 
 code, then would this change the effectiveness of a generational GC? 
 Perhaps part of why young objects die so often is because they are 
 temporary things that can often be safely deleted at the end of scope 
 or some such.

Well, I think it's quite hard to guess whether the result would be better or worse, speed-wise. Obviously, allocating those objects on the stack would speed things up, while using generations in the GC would be less effective, because the GC would see less short-lived objects. I've no idea, however, how those two effects would combine.. Most probably it depends very much on the actual application (as do most things ;) xs0

Oct 05 2006
parent Don Clugston <dac nospam.com.au> writes:
Chad J > wrote:
 xs0 wrote:
 Chad J > wrote:

 xs0 wrote:

 While I'm no expert, I doubt a moving GC is even possible in a 
 systems language like D.

 First, if you move things around, you obviously need to be precise 
 when updating pointers, lest all hell breaks loose. But how do you 
 update

 union {
     int a;
     void* b;
 }

 ? While you could forbid overlap of pointers and non-pointer data, 
 what about custom allocators, assembler, C libraries (including OS 
 runtime!), etc.?

For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the gc which member is selected at any moment during program execution. Whenever an assignment is done to the union, code is inserted to update the union's status. So your union would look more like this: enum StructStatus { a, b, } struct { StructStatus status; //or size_t, whichever works union { int a; void* b; } }

Well, you can't simply mandate that, I think.. It can be the case that the exact structure of a struct/union is mandated by requirements external to the application..

I think you can though, the same way that all D classes are prepended by a pointer to a vtable and a 'monitor', or the same way that dynamic arrays have not just a pointer, but an added length variable.
 And, even if it was an option, it doesn't really solve the precision 
 issue. To be precise, the GC must know the internal structure of 
 anything on its heap, which I don't think is possible unless the 
 language was severely restricted or it was required that the 
 programmer supplies that information manually whenever the purpose of 
 a part of memory changes from pointer to non-pointer and vice-versa. 
 Both options suck quite a lot.

But it can be precise with said union. It will know at compile time where all of the unions are stored, in the same way it knows where pointers are stored. Then, when it looks at the unions, it determines at runtime whether or not they contain pointers by using the extra member info. Either the member info says the current member is a pointer, or it says that the current member isn't a pointer. There is no maybe.

IMO this is not a severe restriction, and it does not require any programmer intervention. The compiler writes all of the code that sets unions' member info variables, and to the programmer the member info is readonly and needs no maintenance.
 Not sure how custom allocators mess up the GC, I thought these were 
 just on their own anyways.  If a pointer to something is outside of 
 the GC heap, the GC doesn't bother changing it or collecting it or 
 moving anything.

That's true, but a custom allocator can use the GC heap if it wants. For example, if I know I'll be allocating a million linked list nodes, and will then release them all at once, it can be considerably faster to allocate a large byte[] and use chunks of it for my nodes. But how can the GC tell which of those bytes are pointers and which aren't? At the beginning none are, but later some may be..

Hmmm. In the case of the allocator I think it is the allocator writer's responsibility to use gc.addRange() on all of the pointers that they toss into the byte[]. What bugs me now though, is other arbitrary data cases, like std.boxer. So far the only answer I can think of is to make people dealing with arbitrarily typed data be very careful about gc.addRange'ing and gc.removeRange'ing any references that goes into or out of their arbitrarily typed data, which does suck.
 Assembler is a bit tricky, maybe someone smarter than I can handle it 
 better, but here's a shot with some pseudo-asm:

 struct Foo
 {
   int member1;
   int member2;
 }
 Foo bar;

 ...

 Foo* foo = &bar;
 int extracted;
 // foo spotted in the assembly block, never mind the context
 // as such, foo gets pinned.
 asm
 {
   mov EAX, foo;         // EAX = foo;
   add EAX, 4;           // EAX += 4;
   mov extracted, [EAX]; // extracted = *EAX; or extracted = foo.member2;
 }
 // foo is unpinned here

I'm sure simple cases can be detected, but am also fairly certain it's impossible to generally determine which registers hold pointers and which don't, and/or what will get referenced and what won't.

Yeah, I realized stuff like ASM that rewrites/rearranges the stack would mess up the GC. Perhaps we just need a comprehensive list of do's, don'ts, and workarounds in D asm programming?
 As for C libraries, it seems like the same thing as custom 
 allocators. The C heap is outside of the GC's jurisdiction and won't 
 be moved or manipulated in any way.  C code that handles D objects 
 will have to be careful, and the callee D code will have to pin the 
 objects before they go out into the unknown.

I meant precision again, not C's heap. Even if compiled D code somehow notifies the GC what's on the stack, the C code definitely won't. Then, how can the GC tell which parts of the stack point to its own heap, and which are just integers that happen to look like they could point to the heap?

It already knows where pointers are in each frame of the stack for D code, can't we make it also know which frames aren't D code at all? For the C code frames it doesn't look at them at all. Any D code that could let references out into C code (only way I know of is with extern(C)) should pin that object on the D side and keep a reference until execution has returned to D.
 Also, I wonder, if I were to make a tool that does escape analysis on 
 your program, then finds that a number of classes can either be stack 
 allocated or safely deleted after they reach a certain point in the 
 code, then would this change the effectiveness of a generational GC? 
 Perhaps part of why young objects die so often is because they are 
 temporary things that can often be safely deleted at the end of scope 
 or some such.

Well, I think it's quite hard to guess whether the result would be better or worse, speed-wise. Obviously, allocating those objects on the stack would speed things up, while using generations in the GC would be less effective, because the GC would see less short-lived objects. I've no idea, however, how those two effects would combine.. Most probably it depends very much on the actual application (as do most things ;) xs0


Why do you need a 100% moving GC anyway? Wouldn't it possible to combine moving GC with mark-and-sweep for those cases (like unions and asm) where you just can't be sure? In other words, a 'conservative moving gc'. (Might be a nightmare to implement, of course).
Oct 06 2006
prev sibling next sibling parent Sean Kelly <sean f4.ca> writes:
Lionello Lunesu wrote:
 I've noticed that some design decisions are made with the possibility of 
 a moving GC in mind. Will D indeed end up with a moving GC? If so, why? 
 Is a moving GC really worth the extra trouble?
 
 Being able to move memory blocks reduces memory fragmentation, am I 
 correct? Is this the only reason? (For the remainder of this post, I'm 
 assuming it is.)

Fragmentation shouldn't be an issue with the correct allocator strategy, see: http://citeseer.ist.psu.edu/johnstone97memory.html

I think the real reason is to speed up collections, as generational GCs (a class of moving GCs) begin by scanning only a segment of allocated memory which contains 'new' allocations. Only if no free space is found does it bother to scan older generations. On each collection, 'new' memory that is still alive typically graduates to an older generation. The idea behind this strategy is that memory which has been around for a while will probably be around for a long time, while a large number of allocations (in function temporaries) are discarded and available for collection almost immediately.
 Is the extra complexity and run-time overhead of a moving GC worth the 
 trouble, at this point in time?

Probably not, but it's more a goal to make D compatible with moving GCs than to write one at the moment. But doing so is a big pain in some cases--default implementations of toHash and opCmp being particularly sore points. Sean
Oct 02 2006
prev sibling parent reply Marcio <mqmnews123 sglebs.com> writes:
Lionello Lunesu wrote:
 I've noticed that some design decisions are made with the possibility of 
 a moving GC in mind. Will D indeed end up with a moving GC? If so, why? 
 Is a moving GC really worth the extra trouble?

Some people claim that a program that runs for a long time (web server, servlet container etc) would fragment memory so badly that eventually you can run out of RAM even though you still have lots. That is not a theory. This has been reported to me by a close friend who works for a big company which I will not name. A new JVM with a GC that moved objects solved the problem their customer was having. A GC that moves objects can compact memory and avoid the fragmentation.

I also don't see how a moving GC can live with the D features. I know that having a non-moving GC can have people produce code that is not portable across compiler implementations. I have seen this happen with code for SmartEiffel versus ISE Eiffel for example. Using SmartEiffel, people cached pointers to Eiffel objects from C DLLs etc. Worked in SmartEiffel (non-moving GC), but not in ISE Eiffel (moving GC).

marcio
Oct 04 2006
next sibling parent Sean Kelly <sean f4.ca> writes:
Marcio wrote:
 
     I also don't see how a moving GC can live with the D features. I 
 know that having a non-moving GC can have people produce code that is 
 not portable across compiler implementations. I have seen this happen 
 with code for SmartEiffel versus ISE Eiffel for example. Using 
 SmartEiffel, people cached pointers to Eiffel objects from C DLLs etc. 
 Worked in SmartEiffel (non-moving GC), but not in ISE Eiffel (moving GC).

I'll admit I've wondered about this. Without sufficient type information the GC has no way of knowing whether a particular 32-bit (or 64-bit) value that *could be* a pointer actually *is* a pointer and therefore should be altered when a move occurs. Assuming the type information is present however, it should be simple enough to state that pointers must be typed as such (ie. it is illegal to store a pointer in a byte[4], for example). In fact, I think D already has this restriction in place. Sean
Oct 04 2006
prev sibling parent Benji Smith <dlanguage benjismith.net> writes:
Marcio wrote:
     I also don't see how a moving GC can live with the D features. I 
 know that having a non-moving GC can have people produce code that is 
 not portable across compiler implementations.

In C#, a block of code can't create or access pointers unless it uses the "unsafe" keyword. And, any reference types within the method body have to be specifically pinned at one memory location (using a "fixed" block) whenever a pointer might be accessing that memory. It works really well. The vast majority of my code uses the default GC behavior, but every once in a while, I need to do some pointer manipulation, and I temporarily tell the garbage collector to refrain from moving my objects. I think a similar system could work in D. --Benji Smith
Oct 05 2006