digitalmars.D - Quantifying the Performance of Garbage Collection vs. Explicit Memory Management


"Craig Black" <craigblack2 cox.net> writes:

Interesting article:
http://lambda-the-ultimate.org/node/2552

-Craig

Mar 16 2008

BCS <ao pathlink.com> writes:

Reply to Craig,

 Interesting article:
 http://lambda-the-ultimate.org/node/2552
 -Craig
 

That test sounds bogus to me. The results for manual memory management are
effectively unbeatable. I rather suspect that most programs lose the last
reference to memory at about the same time you can tell it won't be used (so
it's almost as quick as possible there), and it has no overhead at all,
including the accounting needed to keep track of the memory. Add back in the
post-processing time and then tell us the numbers!

What I want to see is: rebuild a Java app with smart pointers or something,
or how about spec a large function block in as much detail as you can without
knowing whether you will be using GC, and then implement it both ways
(several times, by several people). Then compare those.

Mar 16 2008

"Craig Black" <craigblack2 cox.net> writes:

"BCS" <ao pathlink.com> wrote in message 
news:55391cb32a83e8ca55bd841aba20 news.digitalmars.com...
 Reply to Craig,

 Interesting article:
 http://lambda-the-ultimate.org/node/2552
 -Craig

 That test sounds bogus to me. The results for manual memory management are 
 effectively unbeatable. I rather suspect that most program lose the last 
 reference to memory about the same time you can tell it won't be used (so 
 it's almost as quick as possible there) and it has no overhead at all 
 including the accounting needed to keep track of the memory. Add back in 
 the post processing time and then tell us the numbers!

 What I want to see is: rebuild a Java app with smart pointer or something, 
 or how about spec a large function block as detailed as you can without 
 knowing if you will be using GC or not and then implement it both ways 
 (sever times by sever people). Then compare those.

If I understand the article correctly, I think you are missing the main
point of it.  It is not merely saying, "We ran a conclusive test: GC loses to
explicit memory management!"  They went much further than that, and
identified the primary bottleneck in modern GC implementations: namely,
that GC doesn't cooperate well with virtual memory.  GC typically scans far
more memory than the rest of the application, and consequently causes the
most cache misses and page faults.  This article is not bashing GC.  I think
it is showing us how we can best improve it.  Perhaps some GC algorithm
could be developed that is optimized specifically to produce fewer page
faults and cache misses.

Thoughts?

-Craig

Mar 16 2008

renoX <renosky free.fr> writes:

Craig Black wrote:
 "BCS" <ao pathlink.com> wrote in message 
 news:55391cb32a83e8ca55bd841aba20 news.digitalmars.com...
 Reply to Craig,

 Interesting article:
 http://lambda-the-ultimate.org/node/2552
 -Craig

 That test sounds bogus to me. The results for manual memory management 
 are effectively unbeatable. I rather suspect that most program lose 
 the last reference to memory about the same time you can tell it won't 
 be used (so it's almost as quick as possible there) and it has no 
 overhead at all including the accounting needed to keep track of the 
 memory. Add back in the post processing time and then tell us the 
 numbers!

 What I want to see is: rebuild a Java app with smart pointer or 
 something, or how about spec a large function block as detailed as you 
 can without knowing if you will be using GC or not and then implement 
 it both ways (sever times by sever people). Then compare those.

 
 If I understand the article correctly, I think you are missing the main 
 point of it.  It is not merely saying ,"We ran conclusive test: GC loses 
 to explicit memory management!"  They went much further than that, and 
 identified the primary bottleneck in modern GC implementations.  Namely, 
 that GC doesn't cooperate well with virtual memory.  GC typically scans 
 way more memory than the rest of the application, and consequently 
 causes the most cache misses/page faults.  This article is not bashing 
 GC.  I think it is indicating to us how we can best improve it.  Perhaps 
 some GC algorithm could be developed that is optimized specifically to 
 produce the fewer page faults and cache misses.
 
 Thoughts?


The same authors have made a VM-friendly GC; it is linked at the end of the
article:
http://lambda-the-ultimate.org/node/2391

There are two drawbacks to their GC:
1) The OS needs to be modified so that the VM and the GC can communicate.

2) It is a moving GC. In the original paper they compare their GC with
moving and non-moving variants, and the moving one wins.

For Java this isn't an issue, but for D it is: working with C libraries
wouldn't work if the GC is a moving one.

I don't know how to fix this one.

Regards,
renoX



 
 -Craig

Mar 16 2008

"Craig Black" <cblack ara.com> writes:

 The are two drawback of their GC:
 1) The OS needs to be modified so that the VM and the GC communicate.

Perhaps not.  There may be other ways to detect how the VM (virtual memory,
not virtual machine) behaves without specific OS queries.  Determining this
could be part of application initialization.  I don't know enough of the
details to know whether this is possible.

 2) It is a moving GC, in the original paper they compares their GC with 
 moving and non-moving variant and the moving one win.

 For Java this isn't an issue, but for D it is: working with C's library 
 wouldn't work if the GC is a moving one..

 I don't know how to fix this one..

Perhaps we need to transition away from reliance on non-GC-friendly C
library functions?
IMO, a moving GC for D should be a high priority, and well worth the extra
work.

-Craig

Mar 17 2008

renoX <renosky free.fr> writes:

Craig Black Wrote:

 The are two drawback of their GC:
 1) The OS needs to be modified so that the VM and the GC communicate.

 
 Perhaps not.  There may be other ways to detect how the VM for virtual 
 memory (not virtual machine) works without specific OS queries.  It could be 
 part of the application initialization to determine this.  I don't know 
 enough details to know whether this is possible.

I doubt it very much: how would the application know what the memory
pressure is without querying the OS in one way or another?
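
For what it's worth, a process can ask the OS a little about its own pages
without any kernel changes. Here is a minimal sketch, assuming Linux and its
mincore(2) call; the helper name count_resident_pages is made up for
illustration. It only reports which pages are resident right now, which is
much less than the eviction notifications the collector in the paper relies
on, so it does not settle the question either way.

```c
#define _DEFAULT_SOURCE          /* exposes mincore() and MAP_ANONYMOUS in glibc */
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: count how many pages of [addr, addr + len) are
 * currently resident in RAM. addr must be page-aligned (e.g. from mmap).
 * Returns -1 on error. */
long count_resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages);
    long resident = 0;

    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;   /* bit 0 set => page is resident */
    free(vec);
    return resident;
}
```

A collector could call something like this on a heap region before scanning
it and defer the pages that are not resident, but that is polling, not the
OS telling the GC that a page is about to go.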

Their paper is quite detailed about how their GC works.

Regards,
renoX

Mar 18 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from renoX (renosky free.fr)'s article
 The same authors have made a VM friendly GC, it is linked at the end the
 article..
 http://lambda-the-ultimate.org/node/2391
 The are two drawback of their GC:
 1) The OS needs to be modified so that the VM and the GC communicate.
 2) It is a moving GC, in the original paper they compares their GC with
 moving and non-moving variant and the moving one win.
 For Java this isn't an issue, but for D it is: working with C's library
 wouldn't work if the GC is a moving one..
 I don't know how to fix this one..

There is published research regarding VM-aware garbage collectors, and
creating one for D wouldn't be terribly difficult.  Portability is more of
an issue, as you've mentioned.  I believe plugging one into Windows isn't
terribly difficult, but for *nix it can require a kernel re-compile to get
access to the proper hooks.  Because of this, I don't think that VM-aware
GCs are the best general purpose solution, even though their performance
can be quite good.


Sean

Mar 17 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Sean Kelly (sean invisibleduck.org)'s article
 There is published research regarding VM-aware garbage collectors, and
 creating one for D wouldn't be terribly difficult.

Oops.  That blog links to one of the papers I was thinking of, so I guess
you're already aware of it :-)


Sean

Mar 17 2008

"Craig Black" <cblack ara.com> writes:

"Sean Kelly" <sean invisibleduck.org> wrote in message 
news:frmhe4$20jp$1 digitalmars.com...
 == Quote from renoX (renosky free.fr)'s article
 The same authors have made a VM friendly GC, it is linked at the end the
 article..
 http://lambda-the-ultimate.org/node/2391
 The are two drawback of their GC:
 1) The OS needs to be modified so that the VM and the GC communicate.
 2) It is a moving GC, in the original paper they compares their GC with
 moving and non-moving variant and the moving one win.
 For Java this isn't an issue, but for D it is: working with C's library
 wouldn't work if the GC is a moving one..
 I don't know how to fix this one..

 There is published research regarding VM-aware garbage collectors, and 
 creating one for D wouldn't be
 terribly difficult.  Portability is more of an issue, as you've mentioned. 
 I believe plugging one into
 Windows isn't terribly difficult, but for *nix it can require a kernel 
 re-compile to get access to the
 proper hooks.  Because of this, I don't think that VM-aware GCs are the 
 best general purpose solution,
 even though their performance can be quite good.


 Sean

I don't think the *nix kernel recompile issue should be a show stopper.  I 
would be very happy with a prototype that was Windows only.  Once we have it 
working and can measure the benefits, then tackle the *nix issues.  A fast 
GC would really help D to be competitive.  It would be nice if the D
community were leading GC innovation (like Tango just did with the XML
parser).

-Craig

Mar 17 2008

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Mon, 17 Mar 2008 22:23:53 +0200, Craig Black <cblack ara.com> wrote:

 I don't think the *nix kernel recompile issue should be a show stopper.  I
 would be very happy with a prototype that was Windows only.  Once we have it
 working and can measure the benefits, then tackle the *nix issues.  A fast
 GC would really help D to be competitive.  It would be nice if the D
 community was leading GC innovation (like Tango just did with the XML
 parser).

Actually, the article gave me an interesting idea for a generational GC. I may
post a prototype today (Windows-only for now).

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Mar 17 2008

renoX <renosky free.fr> writes:

Craig Black Wrote:
 "Sean Kelly" <sean invisibleduck.org> wrote in message 
 news:frmhe4$20jp$1 digitalmars.com...
 == Quote from renoX (renosky free.fr)'s article
 The same authors have made a VM friendly GC, it is linked at the end the
 article..
 http://lambda-the-ultimate.org/node/2391
 The are two drawback of their GC:
 1) The OS needs to be modified so that the VM and the GC communicate.
 2) It is a moving GC, in the original paper they compares their GC with
 moving and non-moving variant and the moving one win.
 For Java this isn't an issue, but for D it is: working with C's library
 wouldn't work if the GC is a moving one..
 I don't know how to fix this one..

 There is published research regarding VM-aware garbage collectors, and 
 creating one for D wouldn't be
 terribly difficult.  Portability is more of an issue, as you've mentioned. 
 I believe plugging one into
 Windows isn't terribly difficult, but for *nix it can require a kernel 
 re-compile to get access to the
 proper hooks.  Because of this, I don't think that VM-aware GCs are the 
 best general purpose solution,
 even though their performance can be quite good.


 Sean

 
 I don't think the *nix kernel recompile issue should be a show stopper.  I 
 would be very happy with a prototype that was Windows only. 

Uh? Does the Windows VM provide enough information for the GC in the paper,
or are you thinking about a totally different GC?

For the record, the authors of the GC in the paper submitted their VM patch
for inclusion in the standard Linux kernel, but they didn't manage to convince
the kernel maintainers to include it. At the time, the JVM was proprietary; if
their GC goes into the JVM now that it is GPL, maybe that would be a stronger
argument.

Regards,
renoX

Mar 18 2008

"Craig Black" <craigblack2 cox.net> writes:

"renoX" <renosky free.fr> wrote in message 
news:frnvkn$2na5$1 digitalmars.com...
 Craig Black Wrote:
 "Sean Kelly" <sean invisibleduck.org> wrote in message
 news:frmhe4$20jp$1 digitalmars.com...
 == Quote from renoX (renosky free.fr)'s article
 The same authors have made a VM friendly GC, it is linked at the end 
 the
 article..
 http://lambda-the-ultimate.org/node/2391
 The are two drawback of their GC:
 1) The OS needs to be modified so that the VM and the GC communicate.
 2) It is a moving GC, in the original paper they compares their GC 
 with
 moving and non-moving variant and the moving one win.
 For Java this isn't an issue, but for D it is: working with C's 
 library
 wouldn't work if the GC is a moving one..
 I don't know how to fix this one..

 There is published research regarding VM-aware garbage collectors, and
 creating one for D wouldn't be
 terribly difficult.  Portability is more of an issue, as you've 
 mentioned.
 I believe plugging one into
 Windows isn't terribly difficult, but for *nix it can require a kernel
 re-compile to get access to the
 proper hooks.  Because of this, I don't think that VM-aware GCs are the
 best general purpose solution,
 even though their performance can be quite good.


 Sean

 I don't think the *nix kernel recompile issue should be a show stopper. 
 I
 would be very happy with a prototype that was Windows only.

 Uh? Does Windows VM provides enough information as needed by the GC in the 
 paper or are you thinking about a totally different GC?

Are you asking me or Sean?  Sean made the claim that "Windows isn't terribly 
difficult".  If he's right, then Windows provides some sort of provision for 
VM querying for stuff like this.

 For the records, the authors of the GC in the paper submitted their VM 
 patch for inclusion in the standard Linux kernel, but they didn't manage 
 to convince kernel maintainers to include it.. At the time, the JVM was 
 proprietary, if their GC goes into the JVM now that it is GPL maybe it 
 would be a stronger argument.

 Regards,
 renoX

It makes sense to me to solve the problem in Windows first (since Sean says 
it won't be that bad), and then tackle the Unix/Linux kernel recompile 
issue.

-Craig

Mar 18 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Craig Black (craigblack2 cox.net)'s article
 "renoX" <renosky free.fr> wrote in message
 Uh? Does Windows VM provides enough information as needed by the GC in the
 paper or are you thinking about a totally different GC?

 Are you asking me or Sean?  Sean made the claim that "Windows isn't terribly
 difficult".  If he's right, then Windows provides some sort of provision for
 VM querying for stuff like this.

There are a number of papers on this topic, one being the paper linked in
that blog, and I believe Hans Boehm wrote another one.  From what I remember
of them, I believe they said that Windows exposes the necessary hooks
already, but a kernel recompile was necessary to do the same thing on Linux.
That sounds right as well, since a kernel recompile isn't really possible on
Windows anyway, so either the feature would be available or it wouldn't.
The real work would be a fundamental rewrite of how the mark phase operates.
For a dedicated programmer, I'd estimate it would take about a week.


Sean

Mar 19 2008

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Sean Kelly wrote:
 == Quote from Craig Black (craigblack2 cox.net)'s article
 "renoX" <renosky free.fr> wrote in message
 Uh? Does Windows VM provides enough information as needed by the GC in the
 paper or are you thinking about a totally different GC?

 Are you asking me or Sean?  Sean made the claim that "Windows isn't terribly
 difficult".  If he's right, then Windows provides some sort of provision for
 VM querying for stuff like this.

 
 There are a number of papers on this topic, one being the paper linked in
 that blog, and I believe Hans Boehm wrote another one.  From what I remember
 of them, I believe they said that Windows exposes the necessary hooks
 already, but a kernel recompile was necessary to do the same thing on Linux.
 That sounds right as well, since a kernel recompile isn't really possible on
 Windows anyway, so either the feature would be available or it wouldn't.
 The real work would be a fundamental rewrite of how the mark phase operates.
 For a dedicated programmer, I'd estimate it would take about a week.

IIRC (I read the paper a while back) a full implementation of their idea 
is a moving GC[1] which may take a bit longer to implement. It not only 
impacts the mark phase, but also the sweep phase[2] and it requires 
compiler modifications as well; it'll need to supply the GC with exact 
information about pointers in static data, on the heap and on the stack.
Though I might be wrong; anyone care to implement this in GDC (+ gphobos 
and/or tango)? :)


[1]: IIRC: When the OS indicates to their GC that it's about to swap 
some of their pages out, it starts a collection and attempts to move the 
live objects from multiple partially-used pages into fewer pages so the 
now-free pages can be released to the OS to prevent the swapping from 
taking place (or swap out the now-empty page they'll then avoid using, 
I'm not sure exactly).
The cool thing is that they apparently managed to get that working 
pretty well while mostly avoiding pages being swapped back in because of 
collection activity.

[2]: The equivalent of the current sweep phase will need to move objects 
and update all pointers to them, without modifying values that only look 
like pointers.
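
To illustrate why footnote [2] needs exact pointer information from the
compiler, here is a toy sketch of the pointer fix-up step after a move. The
types toy_object and forwarding, and the function fix_up_pointers, are
invented for this example and are not taken from the paper or from any D
runtime; the 8-field layout and the one-byte pointer map are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy object: 8 word-sized fields plus a compiler-supplied map saying
 * which fields are pointers (bit i set => field i is a pointer). */
typedef struct {
    uintptr_t field[8];
    uint8_t   ptr_map;
} toy_object;

/* One forwarding entry: an object was moved from old_addr to new_addr. */
typedef struct {
    uintptr_t old_addr;
    uintptr_t new_addr;
} forwarding;

/* Rewrite only the fields the map marks as pointers. Integer fields that
 * merely happen to equal old_addr are left alone.
 * Returns the number of fields updated. */
int fix_up_pointers(toy_object *obj, const forwarding *fwd, size_t nfwd)
{
    int updated = 0;
    for (int i = 0; i < 8; i++) {
        if (!(obj->ptr_map & (1u << i)))
            continue;                    /* not a pointer: never touch it */
        for (size_t j = 0; j < nfwd; j++) {
            if (obj->field[i] == fwd[j].old_addr) {
                obj->field[i] = fwd[j].new_addr;
                updated++;
                break;
            }
        }
    }
    return updated;
}
```

A conservative collector has no ptr_map, so it cannot tell field 0 (a real
pointer) from field 1 (an integer with the same bits), and rewriting both
would corrupt the integer.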

Mar 19 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Frits van Bommel (fvbommel REMwOVExCAPSs.nl)'s article
 [1]: IIRC: When the OS indicates to their GC that it's about to swap
 some of their pages out, it starts a collection and attempts to move the
 live objects from multiple partially-used pages into fewer pages so the
 now-free pages can be released to the OS to prevent the swapping from
 taking place (or swap out the now-empty page they'll then avoid using,
 I'm not sure exactly).
 The cool thing is that they apparently managed to get that working
 pretty well while mostly avoiding pages being swapped back in because of
 collection activity.

Ah nice.  What I remember about the papers I read was that they simply
scanned through pages being swapped out to bookmark any data being
referenced.  However, it seems like doing things this way would require
every pointer to be evaluated before being followed to avoid hitting an
inactive page.


Sean

Mar 19 2008

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Sean Kelly wrote:
 == Quote from Frits van Bommel (fvbommel REMwOVExCAPSs.nl)'s article
 [1]: IIRC: When the OS indicates to their GC that it's about to swap
 some of their pages out, it starts a collection and attempts to move the
 live objects from multiple partially-used pages into fewer pages so the
 now-free pages can be released to the OS to prevent the swapping from
 taking place (or swap out the now-empty page they'll then avoid using,
 I'm not sure exactly).
 The cool thing is that they apparently managed to get that working
 pretty well while mostly avoiding pages being swapped back in because of
 collection activity.

 
 Ah nice.  What I remember about the papers I read was that they simply
 scanned through pages being swapped out to bookmark any data being
 referenced.  However, it seems like doing things this way would require
 every pointer to be evaluated before being followed to avoid hitting an
 inactive page.

Yes, they'd need to locate the metadata for the memory block a pointer 
refers to. I remember it being stored in the first page of the 
'superpage' containing the block (a superpage is an aligned block of 2^N 
pages for some constant N, essentially a pool of memory blocks of a 
certain size with some metadata[1] in front of it).
They set it up so that they never allowed those first pages to be
swapped out. Since superpages are aligned, a pointer to a GC-ed object
can be converted into a superpage pointer by simply masking out the
lower bits, but they still need to verify that it is in fact a valid
superpage (i.e. that it *is* a GC-ed object), which would require some
sort of lookup (hashtable?) on that pointer. IIRC they used 16-page
superpages, requiring one lookup table entry per 64 KiB allocated (minus
superpage metadata).
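
The mask-then-verify step could look roughly like this. It is only a sketch
under my own assumptions: 4 KiB pages (so a 16-page superpage is 64 KiB), a
fixed-size open-addressing table, and the names register_superpage and
superpage_of, none of which come from the paper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SUPERPAGE_SIZE  ((uintptr_t)64 * 1024)   /* 16 pages x 4 KiB, assumed */
#define TABLE_SLOTS     1024u                     /* power of two, assumed */

/* Open-addressing set of known superpage base addresses (0 = empty slot). */
static uintptr_t known_superpages[TABLE_SLOTS];

static size_t slot_for(uintptr_t base)
{
    return (size_t)((base / SUPERPAGE_SIZE) & (TABLE_SLOTS - 1));
}

/* Record a superpage the collector owns. Returns false if the table is full. */
bool register_superpage(uintptr_t base)
{
    size_t s = slot_for(base);
    for (size_t probes = 0; probes < TABLE_SLOTS; probes++) {
        if (known_superpages[s] == 0 || known_superpages[s] == base) {
            known_superpages[s] = base;
            return true;
        }
        s = (s + 1) & (TABLE_SLOTS - 1);
    }
    return false;
}

/* Map a candidate pointer to its superpage base, or return 0 if the value
 * does not point into GC-managed memory. */
uintptr_t superpage_of(uintptr_t candidate)
{
    uintptr_t base = candidate & ~(SUPERPAGE_SIZE - 1);   /* mask low 16 bits */
    size_t s = slot_for(base);

    if (base == 0)
        return 0;
    for (size_t probes = 0; probes < TABLE_SLOTS; probes++) {
        if (known_superpages[s] == base)
            return base;
        if (known_superpages[s] == 0)
            return 0;                     /* hit an empty slot: not ours */
        s = (s + 1) & (TABLE_SLOTS - 1);
    }
    return 0;
}
```

The masking is one AND; the cost is in the table probe, which is why one
entry per 64 KiB rather than one per object matters.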


[1]: IIRC they considered complete metadata (i.e. storing for each
object the number of references from swapped-out pages) to be too
costly, so they instead implemented a partial scheme: a single reference
count per page (or superpage, I'm not sure) and a "was this ever
referenced from swap" bit per object, only clearing those bits when the
reference counter for the containing (super?)page reached zero.
They then only allowed the GC to move objects when their bit was clear.
(But they could be moved just before the bit was set, so they could be
put into pages that already had such objects in them in an attempt to
limit the number of such pages. This allows more choice for the GC when
it comes time to decide which pages to empty when another swap-out is
imminent.)
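
A rough sketch of that partial scheme, as I understand it from memory. The
struct and function names are invented, 64 objects per page is an arbitrary
choice so the bits fit in one word, and counting per reference (rather than
per referring page) is a simplification of mine.

```c
#include <stdbool.h>
#include <stdint.h>

#define OBJECTS_PER_PAGE 64   /* assumed, so one uint64_t holds all the bits */

typedef struct {
    unsigned swapped_refs;    /* outstanding references from swapped-out pages */
    uint64_t bookmarked;      /* bit i: object i was ever referenced from swap */
} page_meta;

/* A reference to object `obj` (0..63) was found on a page being swapped out. */
void bookmark(page_meta *pm, unsigned obj)
{
    pm->swapped_refs++;
    pm->bookmarked |= (uint64_t)1 << obj;
}

/* One swapped-out reference went away. The per-object bits are only
 * cleared once no swapped-out references to this page remain at all. */
void unbookmark(page_meta *pm)
{
    if (pm->swapped_refs > 0 && --pm->swapped_refs == 0)
        pm->bookmarked = 0;
}

/* The collector may only move objects whose bit is clear. */
bool may_move(const page_meta *pm, unsigned obj)
{
    return (pm->bookmarked & ((uint64_t)1 << obj)) == 0;
}
```

The scheme is conservative by design: after one of two references goes away,
both objects stay pinned until the counter reaches zero, because the bits
alone cannot say which object lost its reference.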

Mar 19 2008

Robert Fraser <fraserofthenight gmail.com> writes:

Sean Kelly wrote:
 == Quote from renoX (renosky free.fr)'s article
 The same authors have made a VM friendly GC, it is linked at the end the
 article..
 http://lambda-the-ultimate.org/node/2391
 The are two drawback of their GC:
 1) The OS needs to be modified so that the VM and the GC communicate.
 2) It is a moving GC, in the original paper they compares their GC with
 moving and non-moving variant and the moving one win.
 For Java this isn't an issue, but for D it is: working with C's library
 wouldn't work if the GC is a moving one..
 I don't know how to fix this one..

 
 There is published research regarding VM-aware garbage collectors, and
 creating one for D wouldn't be terribly difficult.  Portability is more of
 an issue, as you've mentioned.  I believe plugging one into Windows isn't
 terribly difficult, but for *nix it can require a kernel re-compile to get
 access to the proper hooks.  Because of this, I don't think that VM-aware
 GCs are the best general purpose solution, even though their performance
 can be quite good.
 
 
 Sean

What's wrong with having different collectors for different OSes? In 
fact, with Tango's pluggable GC, I can see it being very easy to get 
collectors tailor-made to the abuse a particular piece of software plans 
to put on them. For example, it would be possible to create a moving 
collector or metronomic collector and compile that in. That would impose 
a number of restrictions/requirements on the programmer which a general 
mark-and-sweep would not, but if the user was explicitly compiling that 
in, he would be aware of that.

Mar 17 2008

BCS <BCS pathlink.com> writes:

Craig Black wrote:
 
 If I understand the article correctly, I think you are missing the main 
 point of it.  It is not merely saying ,"We ran conclusive test: GC loses 
 to explicit memory management!"  They went much further than that, and 
 identified the primary bottleneck in modern GC implementations.  Namely, 
 that GC doesn't cooperate well with virtual memory.  GC typically scans 
 way more memory than the rest of the application, and consequently 
 causes the most cache misses/page faults.  This article is not bashing 
 GC.  I think it is indicating to us how we can best improve it.  Perhaps 
 some GC algorithm could be developed that is optimized specifically to 
 produce the fewer page faults and cache misses.
 

I didn't read the whole article (only the segments that were quoted), so
I may have missed the point. The quoting article seems to me to be using
this to claim that GC loses. If that's not the main point... OK, my bad.

 Thoughts?

I built a version of Lisp a few months back that needed some sort of
libsegv library for its GC. It sounds like it might be of use:
something like only scanning pages that have changed and caching the rest.
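
Roughly what I mean, as a sketch only: write-protect the heap after a
collection, catch the fault on the first write to each page, and record that
page as dirty. This assumes Linux (SIGSEGV with si_addr set on a protection
fault); all names here are made up, and calling mprotect from a signal
handler is not formally async-signal-safe, though fault-handling libraries
commonly rely on the same trick.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_TRACKED_PAGES 64

static unsigned char *region_base;
static size_t         region_pages;
static size_t         page_size;
static volatile sig_atomic_t dirty[MAX_TRACKED_PAGES];

/* First write to a protected page lands here: mark it dirty, unprotect it. */
static void on_write_fault(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)region_base;
    (void)sig; (void)ctx;

    if (addr < base || addr >= base + region_pages * page_size)
        _exit(139);                 /* a genuine crash, not one of our pages */

    size_t idx = (addr - base) / page_size;
    dirty[idx] = 1;
    /* Re-enable writes so the faulting store is retried and succeeds. */
    mprotect(region_base + idx * page_size, page_size, PROT_READ | PROT_WRITE);
}

/* Allocate `pages` pages and install the handler. Returns NULL on failure. */
void *tracked_alloc(size_t pages)
{
    struct sigaction sa;

    if (pages == 0 || pages > MAX_TRACKED_PAGES)
        return NULL;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    region_base = mmap(NULL, pages * page_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region_base == MAP_FAILED)
        return NULL;
    region_pages = pages;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return NULL;
    return region_base;
}

/* Call at the end of a collection: clear dirty bits, write-protect again. */
int tracking_reset(void)
{
    for (size_t i = 0; i < region_pages; i++)
        dirty[i] = 0;
    return mprotect(region_base, region_pages * page_size, PROT_READ);
}

/* Nonzero if page `idx` was written since the last tracking_reset(). */
int page_is_dirty(size_t idx)
{
    return idx < region_pages && dirty[idx];
}
```

The next collection would then rescan only the pages for which
page_is_dirty returns nonzero and reuse cached results for the rest.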

 
 -Craig

Mar 17 2008

Dan <murpsoft hotmail.com> writes:

BCS Wrote:

 Craig Black wrote:
 
 If I understand the article correctly, I think you are missing the main 
 point of it.  It is not merely saying ,"We ran conclusive test: GC loses 
 to explicit memory management!"  They went much further than that, and 
 identified the primary bottleneck in modern GC implementations.  Namely, 
 that GC doesn't cooperate well with virtual memory.  GC typically scans 
 way more memory than the rest of the application, and consequently 
 causes the most cache misses/page faults.  This article is not bashing 
 GC.  I think it is indicating to us how we can best improve it.  Perhaps 
 some GC algorithm could be developed that is optimized specifically to 
 produce the fewer page faults and cache misses.


I typically use GC, but when I don't it gets linked anyways.  Bloat.

As for preventing cache/page misses...

Some blocks will be in cache (a), some pages will be in memory (b), and some
will be on swap (c).
Set (b) and set (c) can be differentiated by OS accounting.  
(a) is a subset of (b) and does not overlap (c).
Set (a) and set (b) cannot be differentiated by any explicit accounting.

- Any system which allowed you to skip analyzing memory stored on swap would
definitely improve performance.

- A cache miss leaves the requested data unavailable for a guesstimated
100 cycles.  Prefetching batches of memory that you want to analyze, and
analyzing it as it actually arrives, means you could dramatically reduce
cache miss overhead.

- Not allowing pointers to be implicitly cast from other types might allow you
to store pointers in the GC to all pointers, and store simple patterns for
arrays of structs containing pointers and arrays of pointers.  This could let
you skip any pages that clearly don't have pointer information.
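
The prefetching point could be sketched like this, assuming GCC or Clang
(for __builtin_prefetch) and a queue of addresses the collector already
knows it wants to look at. The function name, the queue layout and the
prefetch distance of 8 are all assumptions of mine, and the summing stands in
for real marking work.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_DISTANCE 8   /* assumed; would need tuning per machine */

/* Hypothetical mark-queue drain: `queue` holds addresses of words to
 * examine. While word i is processed, the word PREFETCH_DISTANCE entries
 * ahead is requested so it is (hopefully) in cache when we reach it. */
uintptr_t scan_with_prefetch(const uintptr_t *const *queue, size_t n)
{
    uintptr_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(queue[i + PREFETCH_DISTANCE], 0 /* read */, 1);
        acc += *queue[i];     /* stand-in for "analyze this word" */
    }
    return acc;
}
```

Whether this helps depends on having enough queued work to cover the miss
latency; with a nearly empty queue there is nothing to prefetch ahead of.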

This would become sensible if pointers were treated as a typedef'd fully
qualified integer type (with shift, add etc working on them), and an alias of
'*'.  As we found with C, allowing one type to be used to mean two different
things is bad, so I would recommend a p32 and a p64.

In fact, causing all pointers and array {ptr,length} structs to be colocated
where possible in a program would *dramatically* improve GC performance as only
a few pages would ever need to be brought into cache during a GC phase - the
same that would probably already be in cache during normal program operation.  

This is probably already true, but the effect would be more dramatic if the GC
didn't need to analyze the contents behind a pointer because it knew those
contents didn't themselves contain another pointer.
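
As a toy sketch of that last idea: if every block carries a flag saying
whether its payload can contain pointers, the mark phase can mark such a
block without ever reading its contents. The block struct and the mark
function below are invented for illustration, and the flag is assumed to be
supplied by the allocator or compiler.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct block {
    bool marked;
    bool has_pointers;        /* assumed: set by the allocator/compiler */
    size_t nrefs;
    struct block **refs;      /* outgoing references; unused if !has_pointers */
} block;

/* Mark everything reachable from b. Returns how many blocks actually had
 * their contents scanned, i.e. how many payloads had to be touched. */
size_t mark(block *b)
{
    size_t scanned = 0;

    if (b == NULL || b->marked)
        return 0;
    b->marked = true;
    if (!b->has_pointers)
        return 0;             /* plain data: never touch its pages */
    scanned = 1;
    for (size_t i = 0; i < b->nrefs; i++)
        scanned += mark(b->refs[i]);
    return scanned;
}
```

With one pointer-bearing root and two plain-data leaves, all three blocks
end up marked but only the root's payload is read.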

Best Regards,
Dan

Mar 17 2008

renoX <renosky free.fr> writes:

Dan Wrote:
 In fact, causing all pointers and array {ptr,length} structs to be colocated
 where possible in a program would *dramatically* improve GC performance as only
 a few pages would ever need to be brought into cache during a GC phase - the
 same that would probably already be in cache during normal program operation.

The question is whether the 'where possible' is that big: if you have an
array of objects which contain pointers, then it isn't possible to colocate
them with the other pointers.

Note that many allocators sort objects by size, so an array {ptr,length}
and its data would usually be on different pages already.

About the typing issue: yes, strong typing helps a GC a lot, but this
would imply that interoperability with C would have to use a JNI-like system
instead of the easy linking that we have now, as C is weakly typed.

Regards,
renoX

Mar 18 2008
