digitalmars.D.announce - Transitioning to a type aware Garbage Collector

Walter Bright (18/18) Jan 22 2007 To improve GC performance, we need to transition to a GC that is aware

Sean Kelly (4/7) Jan 22 2007 I very much agree. This is already an assumption in Tango and a failure...
Bill Baxter (5/30) Jan 23 2007 Yay! Will there also be explicit calls to take specific ranges out of
Andrei Alexandrescu (See Website For Email) (5/30) Jan 22 2007 Sounds great. Any possibility to type memory post factum (useful in e.g....

Walter Bright (2/5) Jan 22 2007 Looks like that will have to be implemented.

kris (2/10) Jan 22 2007 Tango could certainly use this in its IO-Protocol layer
Chris Miller (20/25) Jan 23 2007 What about allocating a bigger chunk and dividing it up as a few types?

Frits van Bommel (32/47) Jan 23 2007 counter-example:

Kirk McDonald (6/31) Jan 22 2007 Very nice!
Kyle Furlong (10/35) Jan 22 2007 As this move happens, some interesting possibilities open for more

Sean Kelly (5/45) Jan 22 2007 From what little I know of the approach, I agree. A strongly typed

janderson (3/5) Jan 22 2007 This is awesome. This is my number one wish for D. I can't wait.
kmk (2/27) Jan 22 2007 Good stuff.
Paul Findlay (4/6) Jan 23 2007 Does this mean the D garbage collector will be an exact (or precise)

Sean Kelly (10/15) Jan 23 2007 If a block contains pointers then the entire block will still be scanned...

Frits van Bommel (41/58) Jan 23 2007 I thought that by "a GC that is aware of the types of what it is
Pragma (10/26) Jan 23 2007 I was going to say: it sounds an awful lot like Walter is going to have ...

Sean Kelly (12/43) Jan 23 2007 That's basically it. The Tango GC already works this way, but it keys

Walter Bright (2/7) Jan 23 2007 No.

Frits van Bommel (8/16) Jan 23 2007 So what exactly *does* it mean?

Walter Bright (5/23) Jan 23 2007 All it means is that a bit gets set per block meaning if it might

Pragma (14/40) Jan 23 2007 Just shooting from the hip here: one possibility for such a list that wo...

Frits van Bommel (36/64) Jan 23 2007 I think there should also be a way to annotate as "array, repeat". This

Kyle Furlong (4/30) Jan 23 2007 Does this incremental change include the RTTI support needed to

Walter Bright (3/6) Jan 23 2007 It's most of the info, but not quite all of it. I plan to add the rest

Chad J (2/27) Jan 24 2007

Walter Bright <newshound digitalmars.com> writes:

To improve GC performance, we need to transition to a GC that is aware 
of the types of what it is allocating.

So problems can happen if the type of allocated memory is changed after 
it is allocated, via casting. The way to avoid this is to allocate as 
void[] memory that will be cast, as in:

struct Foo { ... };
Foo[] f;

p = new void[100];
f = cast(Foo[])p;	// ok
byte[] b = cast(byte[])p;
f = cast(Foo[])b;	// ok, GC still regards memory as void[]

rather than:

p = new byte[100];
f = cast(Foo[])p;	// will likely eventually corrupt memory

The GC will regard void[] as a chunk of memory containing arbitrary data 
of arbitrary types, and so will treat it conservatively.

In general, regard memory allocated by the GC as anything other than 
void[] as being *strongly typed*.

Jan 22 2007

Sean Kelly <sean f4.ca> writes:

Walter Bright wrote:
 
 In general, regard memory allocated by the GC as anything other than 
 void[] as being *strongly typed*.

I very much agree.  This is already an assumption in Tango and a failure 
to comply will result in undefined behavior.


Sean

Jan 22 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is aware 
 of the types of what it is allocating.
 
 So problems can happen if the type of allocated memory is changed after 
 it is allocated, via casting. The way to avoid this is to allocate as 
 void[] memory that will be cast, as in:
 
 struct Foo { ... };
 Foo[] f;
 
 p = new void[100];
 f = cast(Foo[])p;    // ok
 byte[] b = cast(byte[])p;
 f = cast(Foo[])b;    // ok, GC still regards memory as void[]
 
 rather than:
 
 p = new byte[100];
 f = cast(Foo[])p;    // will likely eventually corrupt memory
 
 The GC will regard void[] as a chunk of memory containing arbitrary data 
 of arbitrary types, and so will treat it conservatively.
 
 In general, regard memory allocated by the GC as anything other than 
 void[] as being *strongly typed*.

Yay!  Will there also be explicit calls to take specific ranges out of 
the scan list if needed? (Or to tell the GC what type a void-allocated 
chunk has changed to.)

--bb

Jan 23 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is aware 
 of the types of what it is allocating.
 
 So problems can happen if the type of allocated memory is changed after 
 it is allocated, via casting. The way to avoid this is to allocate as 
 void[] memory that will be cast, as in:
 
 struct Foo { ... };
 Foo[] f;
 
 p = new void[100];
 f = cast(Foo[])p;    // ok
 byte[] b = cast(byte[])p;
 f = cast(Foo[])b;    // ok, GC still regards memory as void[]
 
 rather than:
 
 p = new byte[100];
 f = cast(Foo[])p;    // will likely eventually corrupt memory
 
 The GC will regard void[] as a chunk of memory containing arbitrary data 
 of arbitrary types, and so will treat it conservatively.
 
 In general, regard memory allocated by the GC as anything other than 
 void[] as being *strongly typed*.

Sounds great. Any possibility to type memory post factum (useful in e.g. 
implementing allocators)? That is, allocate memory as void[] and then 
mark it to be of type Foo.

Andrei

Jan 22 2007

Walter Bright <newshound digitalmars.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 Sounds great. Any possibility to type memory post factum (useful in e.g. 
 implementing allocators)? That is, allocate memory as void[] and then 
 mark it to be of type Foo.

Looks like that will have to be implemented.

Jan 22 2007

kris <foo bar.com> writes:

Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 
 Sounds great. Any possibility to type memory post factum (useful in 
 e.g. implementing allocators)? That is, allocate memory as void[] and 
 then mark it to be of type Foo.

 
 
 Looks like that will have to be implemented.

Tango could certainly use this in its IO-Protocol layer

Jan 22 2007

"Chris Miller" <chris dprogramming.com> writes:

On Mon, 22 Jan 2007 18:38:05 -0500, Walter Bright  
<newshound digitalmars.com> wrote:

 Andrei Alexandrescu (See Website For Email) wrote:
 Sounds great. Any possibility to type memory post factum (useful in  
 e.g. implementing allocators)? That is, allocate memory as void[] and  
 then mark it to be of type Foo.

 Looks like that will have to be implemented.

What about allocating a bigger chunk and dividing it up as a few types?

I have an idea how to make this work, all memory allocated could have  
another chunk associated with it that denotes what are pointers. It would  
only need to use one bit per pointer size.

e.g.
struct Foo
{
    int x;
    int* y;
}
would only need 1 byte to track pointer info:  01000000
with a few bits either wasted or used for the allocation at the next  
address.

void[] would probably have all bits 1, and any non-pointer, like byte[],  
would be all bits 0.

This lets you very easily reuse memory. Memory reuse is important for some  
things, such as with realtime programming or general use of memory pools.

- Chris

Jan 23 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Chris Miller wrote:
 On Mon, 22 Jan 2007 18:38:05 -0500, Walter Bright 
 <newshound digitalmars.com> wrote:
 
 Andrei Alexandrescu (See Website For Email) wrote:
 Sounds great. Any possibility to type memory post factum (useful in 
 e.g. implementing allocators)? That is, allocate memory as void[] and 
 then mark it to be of type Foo.

 Looks like that will have to be implemented.

 
 What about allocating a bigger chunk and dividing it up as a few types?
 
 I have an idea how to make this work, all memory allocated could have 
 another chunk associated with it that denotes what are pointers. It 
 would only need to use one bit per pointer size.

counter-example:

union Tree
{
     int leaf;
     Tree* inner_node;
}

You need 2 bits to catch all cases :P.
(At least, if you want to allow a moving GC; your scheme should work for 
a non-moving collector)



A while back I thought up the following encoding for a moving GC:

Two bits: "follow" and "adjustable"

follow   adjustable    Meaning:
    0        0          Not a pointer.
    0        1          Weak pointer[1]
    1        0          May be a pointer[2]
    1        1          A normal pointer

follow: The GC follows these references to determine reachability
adjustable: The GC may adjust this value. A moving GC may only move an 
object if all references to it are adjustable.

(You can also think of "follow" as "GC may read this" and "adjustable" 
as "GC may write this")


[1]: Possible future feature. These should be set to null if the 
referenced object is collected.
I'm not yet sure what to do on explicit deletion; would it be too much 
trouble to require weak pointers to be nulled in that case?
For Object instances, you could use the Object.notify[Un]Register 
functions to achieve that. Unfortunately, that solution doesn't work for 
anything but class instances.
Also: weak pointers should probably be disallowed from occurring in unions.

[2]: For "mixed" union locations, void[], etc.
Can possibly also be used (temporarily) for pinned normal pointers.

Jan 23 2007

Kirk McDonald <kirklin.mcdonald gmail.com> writes:

Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is aware 
 of the types of what it is allocating.
 
 So problems can happen if the type of allocated memory is changed after 
 it is allocated, via casting. The way to avoid this is to allocate as 
 void[] memory that will be cast, as in:
 
 struct Foo { ... };
 Foo[] f;
 
 p = new void[100];
 f = cast(Foo[])p;    // ok
 byte[] b = cast(byte[])p;
 f = cast(Foo[])b;    // ok, GC still regards memory as void[]
 
 rather than:
 
 p = new byte[100];
 f = cast(Foo[])p;    // will likely eventually corrupt memory
 
 The GC will regard void[] as a chunk of memory containing arbitrary data 
 of arbitrary types, and so will treat it conservatively.
 
 In general, regard memory allocated by the GC as anything other than 
 void[] as being *strongly typed*.

Very nice!

-- 
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org

Jan 22 2007

Kyle Furlong <kylefurlong gmail.com> writes:

Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is aware 
 of the types of what it is allocating.
 
 So problems can happen if the type of allocated memory is changed after 
 it is allocated, via casting. The way to avoid this is to allocate as 
 void[] memory that will be cast, as in:
 
 struct Foo { ... };
 Foo[] f;
 
 p = new void[100];
 f = cast(Foo[])p;    // ok
 byte[] b = cast(byte[])p;
 f = cast(Foo[])b;    // ok, GC still regards memory as void[]
 
 rather than:
 
 p = new byte[100];
 f = cast(Foo[])p;    // will likely eventually corrupt memory
 
 The GC will regard void[] as a chunk of memory containing arbitrary data 
 of arbitrary types, and so will treat it conservatively.
 
 In general, regard memory allocated by the GC as anything other than 
 void[] as being *strongly typed*.

As this move happens, some interesting possibilities open for more 
efficient garbage collection. One innovation is CBGC, connectivity based 
garbage collection.

The paper describing this class of algorithm can be found here: 
http://portal.acm.org/citation.cfm?id=949337&coll=GUIDE&dl=ACM&CFID=6079566&CFTOKEN=11708043

I have done an extensive review of the literature regarding garbage 
collection in compiled languages (and others) and this seems to me to be 
the best bet for a modern GC for D.

Thoughts?

Jan 22 2007

Sean Kelly <sean f4.ca> writes:

Kyle Furlong wrote:
 Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is aware 
 of the types of what it is allocating.

 So problems can happen if the type of allocated memory is changed 
 after it is allocated, via casting. The way to avoid this is to 
 allocate as void[] memory that will be cast, as in:

 struct Foo { ... };
 Foo[] f;

 p = new void[100];
 f = cast(Foo[])p;    // ok
 byte[] b = cast(byte[])p;
 f = cast(Foo[])b;    // ok, GC still regards memory as void[]

 rather than:

 p = new byte[100];
 f = cast(Foo[])p;    // will likely eventually corrupt memory

 The GC will regard void[] as a chunk of memory containing arbitrary 
 data of arbitrary types, and so will treat it conservatively.

 In general, regard memory allocated by the GC as anything other than 
 void[] as being *strongly typed*.

 
 As this move happens, some interesting possibilities open for more 
 efficient garbage collection. One innovation is CBGC, connectivity based 
 garbage collection.
 
 The paper describing this class of algorithm can be found here: 
 http://portal.acm.org/citation.cfm?id=949337&coll=GUIDE&dl=ACM&CFID=6079
66&CFTOKEN=11708043 
 
 
 I have done an extensive review of the literature regarding garbage 
 collection in compiled languages (and others) and this seems to me to be 
 the best bet for a modern GC for D.
 
 Thoughts?

 From what little I know of the approach, I agree.  A strongly typed 
moving GC may be a bit unrealistic, but CBGB seems a pretty solid 
compromise.


Sean

Jan 22 2007

janderson <askme me.com> writes:

Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is aware 
 of the types of what it is allocating.

This is awesome.  This is my number one wish for D.  I can't wait.

-Joel

Jan 22 2007

kmk <kmk200us yahoo.com> writes:

Walter Bright Wrote:

 To improve GC performance, we need to transition to a GC that is aware 
 of the types of what it is allocating.
 
 So problems can happen if the type of allocated memory is changed after 
 it is allocated, via casting. The way to avoid this is to allocate as 
 void[] memory that will be cast, as in:
 
 struct Foo { ... };
 Foo[] f;
 
 p = new void[100];
 f = cast(Foo[])p;	// ok
 byte[] b = cast(byte[])p;
 f = cast(Foo[])b;	// ok, GC still regards memory as void[]
 
 rather than:
 
 p = new byte[100];
 f = cast(Foo[])p;	// will likely eventually corrupt memory
 
 The GC will regard void[] as a chunk of memory containing arbitrary data 
 of arbitrary types, and so will treat it conservatively.
 
 In general, regard memory allocated by the GC as anything other than 
 void[] as being *strongly typed*.

Good stuff.

Jan 22 2007

Paul Findlay <r.lph50+d gmail.com> writes:

Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is aware 
 of the types of what it is allocating.

Does this mean the D garbage collector will be an exact (or precise) 
garbage collector?

  - Paul

Jan 23 2007

Sean Kelly <sean f4.ca> writes:

Paul Findlay wrote:
 Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is aware 
 of the types of what it is allocating.

 Does this mean the D garbage collector will be an exact (or precise) 
 garbage collector?

If a block contains pointers then the entire block will still be scanned 
instead of just the pointers in that block, so It will still be a 
conservative collector but with fewer false positives.  I think this 
could be changed so that just the pointers were scanned, but there would 
likely be a noticeable performance hit for doing so.  If false positives 
are still a concern (and I'm not convinced they will be), a better 
approach may be to pursue a different GC design such as the CBGC 
mentioned yesterday.


Sean

Jan 23 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Sean Kelly wrote:
 Paul Findlay wrote:
 Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is 
 aware of the types of what it is allocating.

 Does this mean the D garbage collector will be an exact (or precise) 
 garbage collector?

 
 If a block contains pointers then the entire block will still be scanned 
 instead of just the pointers in that block, so It will still be a 
 conservative collector but with fewer false positives.

I thought that by "a GC that is aware of the types of what it is 
allocating" Walter meant a (mostly) precise GC.

In most cases, the locations of pointers can be deduced from the type, 
as long as the compiler generates appropriate RTTI. The exceptions (that 
I'm aware of) are unions with overlapping pointers and non-pointers, and 
void[]s (deliberately).

Even types of variables in stack frames can be determined I think, as 
long as you format them in the standard way (return eip & previous ebp 
at the start). Then it's basically a linked list:
-----
struct StackFrame
{
     // local variables are *before* this data (stack grows down)
     StackFrame* ebp;    // pointer to next StackFrame
     size_t eip;         // return address
}
-----
If you do a lookup on eip you should be able to determine what variables 
(or at least their "pointerness") the stack frame contains, with some 
help from compiler-generated metadata. Unknown stack frames (and data 
between the defined ranges of two consecutive stackframes) should be 
considered ambiguous.
Again, ambiguous union members and (static) void arrays are exceptions.

An alternative to eip-based lookup would be to push an additional value 
at the beginning of each function call, but that would
(a) break with non-D functions
and
(b) cost performance outside of the GC (even if the GC is disabled)

 I think this 
 could be changed so that just the pointers were scanned, but there would 
 likely be a noticeable performance hit for doing so.

I'm not so sure about that "noticeable performance hit" you speak of. I 
think that would depend on the format of the RTTI and the relative 
amounts of pointers & non-pointers. (the latter is application-specific, 
by the way...)
To be sure about this we'd need to run some benchmarks and see how it 
fares in practice...

 If false positives 
 are still a concern (and I'm not convinced they will be), a better 
 approach may be to pursue a different GC design such as the CBGC 
 mentioned yesterday.

False positives are a concern for some people; look at the thread "The 
problem with the D GC" in digitalmars.D[1]. I'm pretty sure you've seen 
it already. For one thing, it contains about 10 of your posts ;).

Note: I haven't read that article on CBGC yet, but I plan to.


[1]: digitalmars.D:46407, 
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=46407

Jan 23 2007

Pragma <ericanderton yahoo.removeme.com> writes:

Sean Kelly wrote:
 Paul Findlay wrote:
 Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is 
 aware of the types of what it is allocating.

 Does this mean the D garbage collector will be an exact (or precise) 
 garbage collector?

 
 If a block contains pointers then the entire block will still be scanned 
 instead of just the pointers in that block, so It will still be a 
 conservative collector but with fewer false positives.  I think this 
 could be changed so that just the pointers were scanned, but there would 
 likely be a noticeable performance hit for doing so. 


I was going to say: it sounds an awful lot like Walter is going to have the GC 
stamp each allocated block (or its 
corresponding record) with a field that amounts to "bool containsPointers". 
I'm all for it.  Especially since the 
majority of allocations and re-allocations in a typical D program are for
strings, images and other non-pointer related 
data.  This simple change will amount to much smaller GC pauses, especially in
the video games department.

 If false positives
 are still a concern (and I'm not convinced they will be), a better
 approach may be to pursue a different GC design such as the CBGC
 mentioned yesterday.

I haven't had a chance to read the paper myself, but the abstract left me with
the impression that it's a good 
middle-of-the road approach to things.

Just an aside here: I keep reading that as "CBGB".

-- 
- EricAnderton at yahoo

Jan 23 2007

Sean Kelly <sean f4.ca> writes:

Pragma wrote:
 Sean Kelly wrote:
 Paul Findlay wrote:
 Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is 
 aware of the types of what it is allocating.

 Does this mean the D garbage collector will be an exact (or precise) 
 garbage collector?

 If a block contains pointers then the entire block will still be 
 scanned instead of just the pointers in that block, so It will still 
 be a conservative collector but with fewer false positives.  I think 
 this could be changed so that just the pointers were scanned, but 
 there would likely be a noticeable performance hit for doing so. 

 I was going to say: it sounds an awful lot like Walter is going to have 
 the GC  stamp each allocated block (or its corresponding record) with a 
 field that amounts to "bool containsPointers".  I'm all for it.  

That's basically it.  The Tango GC already works this way, but it keys 
off of element size for arrays or block size for non-arrays.  Walter's 
will be using TypeInfo to determine whether the block contains any 
pointers, so it will be far more accurate.  I imagine a situation could 
be contrived whereby struct or class data could cause false positives, 
but I don't expect it to be much of an issue in practice.

 Especially since the majority of allocations and re-allocations in a 
 typical D program are for strings, images and other non-pointer related 
 data.  This simple change will amount to much smaller GC pauses, 
 especially in the video games department.

Yup.  This was the original motivation for the changes in Tango.  I 
figured that by simply eliminating char strings from scanning, things 
would improve significantly.  And they pretty much have.

  > If false positives
  > are still a concern (and I'm not convinced they will be), a better
  > approach may be to pursue a different GC design such as the CBGC
  > mentioned yesterday.

 I haven't had a chance to read the paper myself, but the abstract left 
 me with the impression that it's a good middle-of-the road approach to 
 things.

 Just an aside here: I keep reading that as "CBGB".

Me too :-)

Sean

Jan 23 2007

Walter Bright <newshound digitalmars.com> writes:

Paul Findlay wrote:
 Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is aware 
 of the types of what it is allocating.

 Does this mean the D garbage collector will be an exact (or precise) 
 garbage collector?

No.

Jan 23 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Walter Bright wrote:
 Paul Findlay wrote:
 Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is 
 aware of the types of what it is allocating.

 Does this mean the D garbage collector will be an exact (or precise) 
 garbage collector?

 
 No.

So what exactly *does* it mean?
Is it, as Pragma put it, a "bool containsPointers" per block?
Or is it mostly-precise?

"Not exact/precise" still leaves a pretty big range...


And if it's just a bool, what was the reasoning behind this decision?
Ease of implementation? The overhead of the metadata needed to get to 
mostly-precise? All of the above?

Jan 23 2007

Walter Bright <newshound digitalmars.com> writes:

Frits van Bommel wrote:
 Walter Bright wrote:
 Paul Findlay wrote:
 Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is 
 aware of the types of what it is allocating.

 Does this mean the D garbage collector will be an exact (or precise) 
 garbage collector?

 No.

 
 So what exactly *does* it mean?
 Is it, as Pragma put it, a "bool containsPointers" per block?
 Or is it mostly-precise?
 
 "Not exact/precise" still leaves a pretty big range...

All it means is that a bit gets set per block meaning if it might 
contain pointers or does not contain pointers. In the future, it might 
give a list of which offsets contain pointers.

 And if it's just a bool, what was the reasoning behind this decision?
 Ease of implementation? The overhead of the metadata needed to get to 
 mostly-precise? All of the above?

It's a fairly significant improvement for a small change.

Jan 23 2007

Pragma <ericanderton yahoo.removeme.com> writes:

Walter Bright wrote:
 Frits van Bommel wrote:
 Walter Bright wrote:
 Paul Findlay wrote:
 Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is 
 aware of the types of what it is allocating.

 Does this mean the D garbage collector will be an exact (or precise) 
 garbage collector?

 No.

 So what exactly *does* it mean?
 Is it, as Pragma put it, a "bool containsPointers" per block?
 Or is it mostly-precise?

 "Not exact/precise" still leaves a pretty big range...

 
 All it means is that a bit gets set per block meaning if it might 
 contain pointers or does not contain pointers. In the future, it might 
 give a list of which offsets contain pointers.

Just shooting from the hip here: one possibility for such a list that won't eat
much space would be having each bit of a 
word (or dword) represent the presence of a pointer at that 4-byte offset.  The
last bit set could just signify "this is 
a big object, conservatively scan beyond bits*4 bytes".  This would take
advantage of the fact that most objects 
typically aren't very big.

Example (16 bit list):
0000 0000 0000 0000 - zero = no pointers present
0000 0000 0000 1001 - pointers at offsets 0 and 12
1111 1111 1111 1111 - max = I'm all pointers
1000 0000 1000 0001 - pointers at offsets 0 and 28, and force scan at offsets
60 through end of block.

Another option would be to make the resolution of the high-order bits more
logarithmic, making it behave something like 
a bloom filter.

 
 And if it's just a bool, what was the reasoning behind this decision?
 Ease of implementation? The overhead of the metadata needed to get to 
 mostly-precise? All of the above?

 
 It's a fairly significant improvement for a small change.


-- 
- EricAnderton at yahoo

Jan 23 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Pragma wrote:
 Walter Bright wrote:
 Frits van Bommel wrote:
 Walter Bright wrote:
 Paul Findlay wrote:
 Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is 
 aware of the types of what it is allocating.

 Does this mean the D garbage collector will be an exact (or 
 precise) garbage collector?

 No.

 So what exactly *does* it mean?
 Is it, as Pragma put it, a "bool containsPointers" per block?
 Or is it mostly-precise?

 "Not exact/precise" still leaves a pretty big range...

 All it means is that a bit gets set per block meaning if it might 
 contain pointers or does not contain pointers. In the future, it might 
 give a list of which offsets contain pointers.

 
 Just shooting from the hip here: one possibility for such a list that 
 won't eat much space would be having each bit of a word (or dword) 
 represent the presence of a pointer at that 4-byte offset.  The last bit 
 set could just signify "this is a big object, conservatively scan beyond 
 bits*4 bytes".  This would take advantage of the fact that most objects 
 typically aren't very big.

I think there should also be a way to annotate as "array, repeat". This 
would likely be useful since big objects are likely to be arrays.
This would require one bit for a flag and some way to note the element 
size, though.
If we go by the assumption that most structs/unions aren't very big, 
that could perhaps be implemented as a byte total: 1 bit = flag, 7 bits 
= size - 1 (since 0-length elements aren't useful anyway :) ) would 
allow elements up to 128 bytes, which should cover most arrays.

If we want no more than 4 bytes of meta-info[1], that would leave 23 
bits for pointer flags and one "big object/element" flag. Objects up to 
95 bytes could then be specified exactly (23 * 4 + 3 bytes, the last 3 
of which can't contain a pointer).

And yes, that's the optimum with these parameters; moving 1 bit from 
size to pointer flags leaves the size at a max of 64, which is under 95.

For 8 bytes of meta-info[2], the optimum would be
8 bits of elt/object size
1 bit array flag
1 bit big object flag
54 bits pointer flags
max exactly-specified size: 219 (54 * 4 + 3)


[1]: That's the minimum alignment for anything containing pointers on 32 
bit systems. This would allow it to be put at the start of the block 
without padding for align(4) contents.
[2]: Minimum align of anything containing ptrs on 64 bit systems 
(presumably). Can of course also be used on 32 bit systems; IIRC doubles 
and structs containing them already have align(8) as default.


[...]


Wait... Since size == 0 is useless, we can also use *that* as "big" 
'flag'. That would mean:
32 bits: 1 + 7 + 24, max exact size: 99
64 bits: 1 + 8 + 55, max exact size: 223

And if the array bit is 0, you can use the size bits as extra pointer 
flags (and one for the "big" flag), allowing normal objects up to 
123/251 bytes for 32/64 bits, respectively.


</obsess over details>

Jan 23 2007

Kyle Furlong <kylefurlong gmail.com> writes:

Walter Bright wrote:
 Frits van Bommel wrote:
 Walter Bright wrote:
 Paul Findlay wrote:
 Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is 
 aware of the types of what it is allocating.

 Does this mean the D garbage collector will be an exact (or precise) 
 garbage collector?

 No.

 So what exactly *does* it mean?
 Is it, as Pragma put it, a "bool containsPointers" per block?
 Or is it mostly-precise?

 "Not exact/precise" still leaves a pretty big range...

 
 All it means is that a bit gets set per block meaning if it might 
 contain pointers or does not contain pointers. In the future, it might 
 give a list of which offsets contain pointers.
 
 And if it's just a bool, what was the reasoning behind this decision?
 Ease of implementation? The overhead of the metadata needed to get to 
 mostly-precise? All of the above?

 
 It's a fairly significant improvement for a small change.

Does this incremental change include the RTTI support needed to 
implement algorithms like CBGC (There are others which could use it)? Or 
are you leaving that for a later time?

Jan 23 2007

Walter Bright <newshound digitalmars.com> writes:

Kyle Furlong wrote:
 Does this incremental change include the RTTI support needed to 
 implement algorithms like CBGC (There are others which could use it)? Or 
 are you leaving that for a later time?

It's most of the info, but not quite all of it. I plan to add the rest 
later.

Jan 23 2007

Chad J <gamerChad _spamIsBad_gmail.com> writes:

I was hoping this would happen some day.  Thank you Walter!

Walter Bright wrote:
 To improve GC performance, we need to transition to a GC that is aware 
 of the types of what it is allocating.
 
 So problems can happen if the type of allocated memory is changed after 
 it is allocated, via casting. The way to avoid this is to allocate as 
 void[] memory that will be cast, as in:
 
 struct Foo { ... };
 Foo[] f;
 
 p = new void[100];
 f = cast(Foo[])p;    // ok
 byte[] b = cast(byte[])p;
 f = cast(Foo[])b;    // ok, GC still regards memory as void[]
 
 rather than:
 
 p = new byte[100];
 f = cast(Foo[])p;    // will likely eventually corrupt memory
 
 The GC will regard void[] as a chunk of memory containing arbitrary data 
 of arbitrary types, and so will treat it conservatively.
 
 In general, regard memory allocated by the GC as anything other than 
 void[] as being *strongly typed*.

Jan 24 2007

D Programming

C/C++ Programming

Other

digitalmars.D.announce - Transitioning to a type aware Garbage Collector