www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - std.allocator: false pointers

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Hello,


I'm currently doing a nice optimization in the tracing code: I use the 
same bit (per block) to indicate "this memory is allocated" and "this 
memory is marked as used during tracing".

The way the trick works is, at the beginning of a collection the GC 
marks all of its memory as deallocated (sic!). Then, it traces through 
pointers and marks BACK as allocated the memory that's actually used. At 
the end of tracing, there's no need to do anything - what's used stays 
used, the rest is free by default.

This is unlike more traditional GCs, which use a SEPARATE bit to mean 
"mark during tracing". At the beginning of the collection, these "mark" 
bits are set to 0. Then collection proceeds and marks with 1 all blocks 
that are actually used. As the last step, collection deallocates all 
blocks that were marked with 0 and were previously allocated.

So the optimization consumes less memory and saves one pass. It does 
have a disadvantage, however.

Consider a false pointer. It will claim that a block that's free is 
actually occupied, and the implementation can't distinguish because it 
conflates the "mark" and the "allocated" bit together. So it's possible 
that at the end of collection there are blocks allocated that previously 
weren't. The optimization is therefore sensitive to false pointers.

Thoughts? How bad are false pointers (assuming we fix globals, which 
only leaves the stack and registers)?


Andrei
May 02 2014
next sibling parent reply "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:
On Friday, 2 May 2014 at 15:55:06 UTC, Andrei Alexandrescu wrote:
 Hello,


 I'm currently doing a nice optimization in the tracing code: I 
 use the same bit (per block) to indicate "this memory is 
 allocated" and "this memory is marked as used during tracing".

 The way the trick works is, at the beginning of a collection 
 the GC marks all of its memory as deallocated (sic!). Then, it 
 traces through pointers and marks BACK as allocated the memory 
 that's actually used. At the end of tracing, there's no need to 
 do anything - what's used stays used, the rest is free by 
 default.

 This is unlike more traditional GCs, which use a SEPARATE bit 
 to mean "mark during tracing". At the beginning of the 
 collection, these "mark" bits are set to 0. Then collection 
 proceeds and marks with 1 all blocks that are actually used. As 
 the last step, collection deallocates all blocks that were 
 marked with 0 and were previously allocated.

 So the optimization consumes less memory and saves one pass. It 
 does have a disadvantage, however.

 Consider a false pointer. It will claim that a block that's 
 free is actually occupied, and the implementation can't 
 distinguish because it conflates the "mark" and the "allocated" 
 bit together. So it's possible that at the end of collection 
 there are blocks allocated that previously weren't. The 
 optimization is therefore sensitive to false pointers.

 Thoughts? How bad are false pointers (assuming we fix globals, 
 which only leaves the stack and registers)?

If destructors are still going to be called (see other thread), this trick is dangerous, because the resurrected objects might later on be destroyed again (double free). I'm aware that this is still about untyped allocators, but if they are going to be used as a basis for typed allocators, things like this need to be considered already at this stage.
May 02 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/2/14, 10:12 AM, "Marc Sch├╝tz" <schuetzm gmx.net>" wrote:
 If destructors are still going to be called (see other thread), this
 trick is dangerous, because the resurrected objects might later on be
 destroyed again (double free).

Yah, forgot to mention this trick is only applicable to what I call "the passive heap". Thanks! -- Andrei
May 02 2014
prev sibling next sibling parent reply Orvid King via Digitalmars-d <digitalmars-d puremagic.com> writes:
Well, in a 64-bit address space, the false pointer issue is almost
mute, the issue comes in when you try to apply this design to 32-bit,
where the false pointer issue is more prevelent. Is the volume of
memory saved by this really worth it?

Another thing to consider is that this makes it impossible to
pre-allocate blocks of varying sizes for absurdly fast allocations via
atomic linked lists, in most cases literally a single `lock cmpxchg`.

On 5/2/14, Andrei Alexandrescu via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 Hello,


 I'm currently doing a nice optimization in the tracing code: I use the
 same bit (per block) to indicate "this memory is allocated" and "this
 memory is marked as used during tracing".

 The way the trick works is, at the beginning of a collection the GC
 marks all of its memory as deallocated (sic!). Then, it traces through
 pointers and marks BACK as allocated the memory that's actually used. At
 the end of tracing, there's no need to do anything - what's used stays
 used, the rest is free by default.

 This is unlike more traditional GCs, which use a SEPARATE bit to mean
 "mark during tracing". At the beginning of the collection, these "mark"
 bits are set to 0. Then collection proceeds and marks with 1 all blocks
 that are actually used. As the last step, collection deallocates all
 blocks that were marked with 0 and were previously allocated.

 So the optimization consumes less memory and saves one pass. It does
 have a disadvantage, however.

 Consider a false pointer. It will claim that a block that's free is
 actually occupied, and the implementation can't distinguish because it
 conflates the "mark" and the "allocated" bit together. So it's possible
 that at the end of collection there are blocks allocated that previously
 weren't. The optimization is therefore sensitive to false pointers.

 Thoughts? How bad are false pointers (assuming we fix globals, which
 only leaves the stack and registers)?


 Andrei

May 02 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/2/14, 10:15 AM, Orvid King via Digitalmars-d wrote:
 Well, in a 64-bit address space, the false pointer issue is almost
 mute, the issue comes in when you try to apply this design to 32-bit,
 where the false pointer issue is more prevelent. Is the volume of
 memory saved by this really worth it?

It's the time savings that are most important.
 Another thing to consider is that this makes it impossible to
 pre-allocate blocks of varying sizes for absurdly fast allocations via
 atomic linked lists, in most cases literally a single `lock cmpxchg`.

Those can be accommodated I think. Andrei
May 02 2014
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 02 May 2014 11:55:11 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Hello,


 I'm currently doing a nice optimization in the tracing code: I use the  
 same bit (per block) to indicate "this memory is allocated" and "this  
 memory is marked as used during tracing".

 The way the trick works is, at the beginning of a collection the GC  
 marks all of its memory as deallocated (sic!). Then, it traces through  
 pointers and marks BACK as allocated the memory that's actually used. At  
 the end of tracing, there's no need to do anything - what's used stays  
 used, the rest is free by default.

 This is unlike more traditional GCs, which use a SEPARATE bit to mean  
 "mark during tracing". At the beginning of the collection, these "mark"  
 bits are set to 0. Then collection proceeds and marks with 1 all blocks  
 that are actually used. As the last step, collection deallocates all  
 blocks that were marked with 0 and were previously allocated.

 So the optimization consumes less memory and saves one pass. It does  
 have a disadvantage, however.

 Consider a false pointer. It will claim that a block that's free is  
 actually occupied, and the implementation can't distinguish because it  
 conflates the "mark" and the "allocated" bit together. So it's possible  
 that at the end of collection there are blocks allocated that previously  
 weren't. The optimization is therefore sensitive to false pointers.

 Thoughts? How bad are false pointers (assuming we fix globals, which  
 only leaves the stack and registers)?

False pointers are less of a problem in 64-bit code, but you can run into worse issues. If you are not zeroing the memory when deallocating, then if it mysteriously comes alive again, it has the ghost of what could be a pointer to other code. Your blocks are more likely to resurrect once one of them resurrects. Why not keep the 3 states, but just treat unmarked blocks as free? Then the next time you go through tracing, change the bit to free if it was already marked. This doesn't save on the bit space, but I think the savings there is minimal anyway. However, it does allow the final pass to be saved. -Steve
May 02 2014
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/2/14, 10:26 AM, Steven Schveighoffer wrote:
 False pointers are less of a problem in 64-bit code, but you can run
 into worse issues. If you are not zeroing the memory when deallocating,
 then if it mysteriously comes alive again, it has the ghost of what
 could be a pointer to other code. Your blocks are more likely to
 resurrect once one of them resurrects.

Good point.
 Why not keep the 3 states, but just treat unmarked blocks as free? Then
 the next time you go through tracing, change the bit to free if it was
 already marked.

 This doesn't save on the bit space, but I think the savings there is
 minimal anyway. However, it does allow the final pass to be saved.

That's a great idea. Thanks! Andrei
May 02 2014
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/2/14, 10:33 AM, Steven Schveighoffer wrote:
 On Fri, 02 May 2014 13:26:41 -0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 Why not keep the 3 states, but just treat unmarked blocks as free?
 Then the next time you go through tracing, change the bit to free if
 it was already marked.

Sorry, if it was already *unmarked* (or marked as garbage).

Yah, understood. Unfortunately I just realized that would require either to keep the bits together or to scan two memory areas when trying to allocate, both of which have disadvantages. Well, I guess I'll go with the post-tracing pass. -- Andrei
May 02 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/2/14, 11:07 AM, Steven Schveighoffer wrote:
 On Fri, 02 May 2014 14:00:18 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 5/2/14, 10:33 AM, Steven Schveighoffer wrote:
 On Fri, 02 May 2014 13:26:41 -0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 Why not keep the 3 states, but just treat unmarked blocks as free?
 Then the next time you go through tracing, change the bit to free if
 it was already marked.

Sorry, if it was already *unmarked* (or marked as garbage).

Yah, understood. Unfortunately I just realized that would require either to keep the bits together or to scan two memory areas when trying to allocate, both of which have disadvantages. Well, I guess I'll go with the post-tracing pass. -- Andrei

What is the problem with keeping the bits together?

More implementation (I have a BitVector type but not a KBitsVector!k type), and scanning can't be done with fast primitives. -- Andrei
May 02 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/2/14, 11:50 AM, Steven Schveighoffer wrote:
 On Fri, 02 May 2014 14:42:52 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 5/2/14, 11:07 AM, Steven Schveighoffer wrote:

 What is the problem with keeping the bits together?

More implementation (I have a BitVector type but not a KBitsVector!k type), and scanning can't be done with fast primitives. -- Andrei

Given a bitvector type, a 2bitvector type can be implemented on top of it.

If speed is no issue, sure :o). My intuition is that the TwoBitVector would need certain primitives from BitVector to work well.
 If one bit is "free", and another is "garbage", you just have to look
 for any set bits for free blocks. Yes, you have to look through 2x as
 much memory, but only until you find a free block.

Hmm, so if "garbage" is 0, then to allocate we'd need to scan for a "hole" of contiguous zeros (cheap) instead of a checkered pattern (expensive). I'll think about it. Andrei
May 02 2014
parent David Gileadi <gileadis NSPMgmail.com> writes:
On 5/2/14, 11:56 AM, Andrei Alexandrescu wrote:
 If speed is no issue, sure :o). My intuition is that the TwoBitVector
 would need certain primitives from BitVector to work well.

Heh, however it's implemented, TwoBitVector's very name implies that it's cheap to use ;)
May 02 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 02 May 2014 13:26:41 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 Why not keep the 3 states, but just treat unmarked blocks as free? Then  
 the next time you go through tracing, change the bit to free if it was  
 already marked.

Sorry, if it was already *unmarked* (or marked as garbage). essentially: enum GCState {free, allocated, garbage} GCState memoryBlocks[]; fullCollect() { foreach(ref st; memoryBlocks) { final switch(st) { case GCState.free: break; case GCState.allocated: st = GCState.garbage; break; case GCState.garbage: st = GCState.free; break; } } ... // run mark/sweep setting garbage to allocated for reachable blocks }
May 02 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 02 May 2014 14:00:18 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 5/2/14, 10:33 AM, Steven Schveighoffer wrote:
 On Fri, 02 May 2014 13:26:41 -0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 Why not keep the 3 states, but just treat unmarked blocks as free?
 Then the next time you go through tracing, change the bit to free if
 it was already marked.

Sorry, if it was already *unmarked* (or marked as garbage).

Yah, understood. Unfortunately I just realized that would require either to keep the bits together or to scan two memory areas when trying to allocate, both of which have disadvantages. Well, I guess I'll go with the post-tracing pass. -- Andrei

What is the problem with keeping the bits together? -Steve
May 02 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 02 May 2014 14:42:52 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 5/2/14, 11:07 AM, Steven Schveighoffer wrote:

 What is the problem with keeping the bits together?

More implementation (I have a BitVector type but not a KBitsVector!k type), and scanning can't be done with fast primitives. -- Andrei

Given a bitvector type, a 2bitvector type can be implemented on top of it. If one bit is "free", and another is "garbage", you just have to look for any set bits for free blocks. Yes, you have to look through 2x as much memory, but only until you find a free block. -Steve
May 02 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 02 May 2014 14:56:00 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 5/2/14, 11:50 AM, Steven Schveighoffer wrote:
 On Fri, 02 May 2014 14:42:52 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 5/2/14, 11:07 AM, Steven Schveighoffer wrote:

 What is the problem with keeping the bits together?

More implementation (I have a BitVector type but not a KBitsVector!k type), and scanning can't be done with fast primitives. -- Andrei

Given a bitvector type, a 2bitvector type can be implemented on top of it.

If speed is no issue, sure :o). My intuition is that the TwoBitVector would need certain primitives from BitVector to work well.
 If one bit is "free", and another is "garbage", you just have to look
 for any set bits for free blocks. Yes, you have to look through 2x as
 much memory, but only until you find a free block.

Hmm, so if "garbage" is 0, then to allocate we'd need to scan for a "hole" of contiguous zeros (cheap) instead of a checkered pattern (expensive). I'll think about it.

Well, you are looking for one bit set (free), or another bit set (garbage). So the pattern may not be uniform. What you probably want to do is to store the bits close together, but probably not *right* together. That way you can use logic-or to search for bits. -Steve
May 02 2014
prev sibling next sibling parent "safety0ff" <safety0ff.dev gmail.com> writes:
On Friday, 2 May 2014 at 17:15:20 UTC, Orvid King via 
Digitalmars-d wrote:
 Well, in a 64-bit address space, the false pointer issue is 
 almost
 mute, the issue comes in when you try to apply this design to 
 32-bit,
 where the false pointer issue is more prevelent. Is the volume 
 of
 memory saved by this really worth it?

False pointers don't only cause memory consumption, you're forgetting that the GC will repeatedly scan memory held by false pointers at each collection*. This adds time to each collection and further increases the risk of other false pointers. False pointers can make many reasonable looking D programs behave unexpectedly when run in a 32-bit environment (to the point that I consider that they can "break" programs.) I think false pointers must be addressed to make claims that D is well-behaved on 32-bit systems. * Unless it is marked NO_SCAN of course, but this is not likely the common case.
May 03 2014
prev sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Saturday, 3 May 2014 at 23:41:02 UTC, safety0ff wrote:
 Well, in a 64-bit address space, the false pointer issue is 
 almost
 mute, the issue comes in when you try to apply this design to 
 32-bit,


That assumes that the heap is located at a high address.
 I think false pointers must be addressed to make claims that D 
 is well-behaved on 32-bit systems.

Which means a precise collector, which means faster stack meta info access, which means faster exception handling? Then use a single pass mark-and-don't-sweep collector/allocator? (invert the meaning of the bit for every other full collection cycle) A conservative collector will never be acceptable in a language which is primarily GC based. It is something you can bolt on to a RC system to collrct cycles, IMHO.
May 04 2014