digitalmars.D - std.allocator: false pointers

Andrei Alexandrescu (24/24) May 02 2014 Hello,

"Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (7/33) May 02 2014 If destructors are still going to be called (see other thread),

Andrei Alexandrescu (3/6) May 02 2014 Yah, forgot to mention this trick is only applicable to what I call "the...

Orvid King via Digitalmars-d (9/33) May 02 2014 Well, in a 64-bit address space, the false pointer issue is almost

Andrei Alexandrescu (4/11) May 02 2014 Those can be accommodated I think.
safety0ff (13/20) May 03 2014 False pointers don't only cause memory consumption, you're

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (9/15) May 04 2014 Which means a precise collector, which means faster stack meta

Steven Schveighoffer (13/36) May 02 2014 False pointers are less of a problem in 64-bit code, but you can run int...

Steven Schveighoffer (24/27) May 02 2014 Sorry, if it was already *unmarked* (or marked as garbage).

Andrei Alexandrescu (5/11) May 02 2014 Yah, understood. Unfortunately I just realized that would require either...

Steven Schveighoffer (4/17) May 02 2014 What is the problem with keeping the bits together?

Andrei Alexandrescu (3/20) May 02 2014 More implementation (I have a BitVector type but not a KBitsVector!k

Steven Schveighoffer (7/11) May 02 2014 Given a bitvector type, a 2bitvector type can be implemented on top of i...

Andrei Alexandrescu (7/18) May 02 2014 If speed is no issue, sure :o). My intuition is that the TwoBitVector

Steven Schveighoffer (8/29) May 02 2014 Well, you are looking for one bit set (free), or another bit set
David Gileadi (3/5) May 02 2014 Heh, however it's implemented, TwoBitVector's very name implies that

Andrei Alexandrescu (4/14) May 02 2014 That's a great idea. Thanks!

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Hello,


I'm currently doing a nice optimization in the tracing code: I use the 
same bit (per block) to indicate "this memory is allocated" and "this 
memory is marked as used during tracing".

The way the trick works is, at the beginning of a collection the GC 
marks all of its memory as deallocated (sic!). Then, it traces through 
pointers and marks BACK as allocated the memory that's actually used. At 
the end of tracing, there's no need to do anything - what's used stays 
used, the rest is free by default.

This is unlike more traditional GCs, which use a SEPARATE bit to mean 
"mark during tracing". At the beginning of the collection, these "mark" 
bits are set to 0. Then collection proceeds and marks with 1 all blocks 
that are actually used. As the last step, collection deallocates all 
blocks that were marked with 0 and were previously allocated.

So the optimization consumes less memory and saves one pass. It does 
have a disadvantage, however.

Consider a false pointer. It will claim that a block that's free is 
actually occupied, and the implementation can't distinguish because it 
conflates the "mark" and the "allocated" bit together. So it's possible 
that at the end of collection there are blocks allocated that previously 
weren't. The optimization is therefore sensitive to false pointers.

Thoughts? How bad are false pointers (assuming we fix globals, which 
only leaves the stack and registers)?


Andrei

May 02 2014

"Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:

On Friday, 2 May 2014 at 15:55:06 UTC, Andrei Alexandrescu wrote:
 Hello,


 I'm currently doing a nice optimization in the tracing code: I 
 use the same bit (per block) to indicate "this memory is 
 allocated" and "this memory is marked as used during tracing".

 The way the trick works is, at the beginning of a collection 
 the GC marks all of its memory as deallocated (sic!). Then, it 
 traces through pointers and marks BACK as allocated the memory 
 that's actually used. At the end of tracing, there's no need to 
 do anything - what's used stays used, the rest is free by 
 default.

 This is unlike more traditional GCs, which use a SEPARATE bit 
 to mean "mark during tracing". At the beginning of the 
 collection, these "mark" bits are set to 0. Then collection 
 proceeds and marks with 1 all blocks that are actually used. As 
 the last step, collection deallocates all blocks that were 
 marked with 0 and were previously allocated.

 So the optimization consumes less memory and saves one pass. It 
 does have a disadvantage, however.

 Consider a false pointer. It will claim that a block that's 
 free is actually occupied, and the implementation can't 
 distinguish because it conflates the "mark" and the "allocated" 
 bit together. So it's possible that at the end of collection 
 there are blocks allocated that previously weren't. The 
 optimization is therefore sensitive to false pointers.

 Thoughts? How bad are false pointers (assuming we fix globals, 
 which only leaves the stack and registers)?

If destructors are still going to be called (see other thread), 
this trick is dangerous, because the resurrected objects might 
later on be destroyed again (double free).

I'm aware that this is still about untyped allocators, but if 
they are going to be used as a basis for typed allocators, things 
like this need to be considered already at this stage.

May 02 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/2/14, 10:12 AM, "Marc Schütz" <schuetzm gmx.net>" wrote:
 If destructors are still going to be called (see other thread), this
 trick is dangerous, because the resurrected objects might later on be
 destroyed again (double free).

Yah, forgot to mention this trick is only applicable to what I call "the 
passive heap". Thanks! -- Andrei

May 02 2014

Orvid King via Digitalmars-d <digitalmars-d puremagic.com> writes:

Well, in a 64-bit address space, the false pointer issue is almost
mute, the issue comes in when you try to apply this design to 32-bit,
where the false pointer issue is more prevelent. Is the volume of
memory saved by this really worth it?

Another thing to consider is that this makes it impossible to
pre-allocate blocks of varying sizes for absurdly fast allocations via
atomic linked lists, in most cases literally a single `lock cmpxchg`.

On 5/2/14, Andrei Alexandrescu via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 Hello,


 I'm currently doing a nice optimization in the tracing code: I use the
 same bit (per block) to indicate "this memory is allocated" and "this
 memory is marked as used during tracing".

 The way the trick works is, at the beginning of a collection the GC
 marks all of its memory as deallocated (sic!). Then, it traces through
 pointers and marks BACK as allocated the memory that's actually used. At
 the end of tracing, there's no need to do anything - what's used stays
 used, the rest is free by default.

 This is unlike more traditional GCs, which use a SEPARATE bit to mean
 "mark during tracing". At the beginning of the collection, these "mark"
 bits are set to 0. Then collection proceeds and marks with 1 all blocks
 that are actually used. As the last step, collection deallocates all
 blocks that were marked with 0 and were previously allocated.

 So the optimization consumes less memory and saves one pass. It does
 have a disadvantage, however.

 Consider a false pointer. It will claim that a block that's free is
 actually occupied, and the implementation can't distinguish because it
 conflates the "mark" and the "allocated" bit together. So it's possible
 that at the end of collection there are blocks allocated that previously
 weren't. The optimization is therefore sensitive to false pointers.

 Thoughts? How bad are false pointers (assuming we fix globals, which
 only leaves the stack and registers)?


 Andrei

May 02 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/2/14, 10:15 AM, Orvid King via Digitalmars-d wrote:
 Well, in a 64-bit address space, the false pointer issue is almost
 mute, the issue comes in when you try to apply this design to 32-bit,
 where the false pointer issue is more prevelent. Is the volume of
 memory saved by this really worth it?

It's the time savings that are most important.

 Another thing to consider is that this makes it impossible to
 pre-allocate blocks of varying sizes for absurdly fast allocations via
 atomic linked lists, in most cases literally a single `lock cmpxchg`.

Those can be accommodated I think.


Andrei

May 02 2014

"safety0ff" <safety0ff.dev gmail.com> writes:

On Friday, 2 May 2014 at 17:15:20 UTC, Orvid King via 
Digitalmars-d wrote:
 Well, in a 64-bit address space, the false pointer issue is 
 almost
 mute, the issue comes in when you try to apply this design to 
 32-bit,
 where the false pointer issue is more prevelent. Is the volume 
 of
 memory saved by this really worth it?

False pointers don't only cause memory consumption, you're 
forgetting that the GC will repeatedly scan memory held by false 
pointers at each collection*. This adds time to each collection 
and further increases the risk of other false pointers.

False pointers can make many reasonable looking D programs behave 
unexpectedly when run in a 32-bit environment (to the point that 
I consider that they can "break" programs.)

I think false pointers must be addressed to make claims that D is 
well-behaved on 32-bit systems.

* Unless it is marked NO_SCAN of course, but this is not likely 
the common case.

May 03 2014

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:

On Saturday, 3 May 2014 at 23:41:02 UTC, safety0ff wrote:
 Well, in a 64-bit address space, the false pointer issue is 
 almost
 mute, the issue comes in when you try to apply this design to 
 32-bit,


That assumes that the heap is located at a high address.

 I think false pointers must be addressed to make claims that D 
 is well-behaved on 32-bit systems.

Which means a precise collector, which means faster stack meta 
info access, which means faster exception handling? Then use a 
single pass mark-and-don't-sweep collector/allocator? (invert the 
meaning of the bit for every other full collection cycle)

A conservative collector will never be acceptable in a language 
which is primarily GC based. It is something you can bolt on to a 
RC system to collrct cycles, IMHO.

May 04 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 02 May 2014 11:55:11 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Hello,


 I'm currently doing a nice optimization in the tracing code: I use the  
 same bit (per block) to indicate "this memory is allocated" and "this  
 memory is marked as used during tracing".

 The way the trick works is, at the beginning of a collection the GC  
 marks all of its memory as deallocated (sic!). Then, it traces through  
 pointers and marks BACK as allocated the memory that's actually used. At  
 the end of tracing, there's no need to do anything - what's used stays  
 used, the rest is free by default.

 This is unlike more traditional GCs, which use a SEPARATE bit to mean  
 "mark during tracing". At the beginning of the collection, these "mark"  
 bits are set to 0. Then collection proceeds and marks with 1 all blocks  
 that are actually used. As the last step, collection deallocates all  
 blocks that were marked with 0 and were previously allocated.

 So the optimization consumes less memory and saves one pass. It does  
 have a disadvantage, however.

 Consider a false pointer. It will claim that a block that's free is  
 actually occupied, and the implementation can't distinguish because it  
 conflates the "mark" and the "allocated" bit together. So it's possible  
 that at the end of collection there are blocks allocated that previously  
 weren't. The optimization is therefore sensitive to false pointers.

 Thoughts? How bad are false pointers (assuming we fix globals, which  
 only leaves the stack and registers)?

False pointers are less of a problem in 64-bit code, but you can run into  
worse issues. If you are not zeroing the memory when deallocating, then if  
it mysteriously comes alive again, it has the ghost of what could be a  
pointer to other code. Your blocks are more likely to resurrect once one  
of them resurrects.

Why not keep the 3 states, but just treat unmarked blocks as free? Then  
the next time you go through tracing, change the bit to free if it was  
already marked.

This doesn't save on the bit space, but I think the savings there is  
minimal anyway. However, it does allow the final pass to be saved.

-Steve

May 02 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 02 May 2014 13:26:41 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 Why not keep the 3 states, but just treat unmarked blocks as free? Then  
 the next time you go through tracing, change the bit to free if it was  
 already marked.

Sorry, if it was already *unmarked* (or marked as garbage).

essentially:

enum GCState {free, allocated, garbage}

GCState memoryBlocks[];

fullCollect()
{
    foreach(ref st; memoryBlocks)
    {
       final switch(st)
       {
       case GCState.free:
          break;
       case GCState.allocated:
          st = GCState.garbage;
          break;
       case GCState.garbage:
          st = GCState.free;
          break;
       }
    }
    ... // run mark/sweep setting garbage to allocated for reachable blocks
}

May 02 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/2/14, 10:33 AM, Steven Schveighoffer wrote:
 On Fri, 02 May 2014 13:26:41 -0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 Why not keep the 3 states, but just treat unmarked blocks as free?
 Then the next time you go through tracing, change the bit to free if
 it was already marked.

 Sorry, if it was already *unmarked* (or marked as garbage).

Yah, understood. Unfortunately I just realized that would require either 
to keep the bits together or to scan two memory areas when trying to 
allocate, both of which have disadvantages. Well, I guess I'll go with 
the post-tracing pass. -- Andrei

May 02 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 02 May 2014 14:00:18 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 5/2/14, 10:33 AM, Steven Schveighoffer wrote:
 On Fri, 02 May 2014 13:26:41 -0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 Why not keep the 3 states, but just treat unmarked blocks as free?
 Then the next time you go through tracing, change the bit to free if
 it was already marked.

 Sorry, if it was already *unmarked* (or marked as garbage).

 Yah, understood. Unfortunately I just realized that would require either  
 to keep the bits together or to scan two memory areas when trying to  
 allocate, both of which have disadvantages. Well, I guess I'll go with  
 the post-tracing pass. -- Andrei

What is the problem with keeping the bits together?

-Steve

May 02 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/2/14, 11:07 AM, Steven Schveighoffer wrote:
 On Fri, 02 May 2014 14:00:18 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 5/2/14, 10:33 AM, Steven Schveighoffer wrote:
 On Fri, 02 May 2014 13:26:41 -0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 Why not keep the 3 states, but just treat unmarked blocks as free?
 Then the next time you go through tracing, change the bit to free if
 it was already marked.

 Sorry, if it was already *unmarked* (or marked as garbage).

 Yah, understood. Unfortunately I just realized that would require
 either to keep the bits together or to scan two memory areas when
 trying to allocate, both of which have disadvantages. Well, I guess
 I'll go with the post-tracing pass. -- Andrei

 What is the problem with keeping the bits together?

More implementation (I have a BitVector type but not a KBitsVector!k 
type), and scanning can't be done with fast primitives. -- Andrei

May 02 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 02 May 2014 14:42:52 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 5/2/14, 11:07 AM, Steven Schveighoffer wrote:

 What is the problem with keeping the bits together?

 More implementation (I have a BitVector type but not a KBitsVector!k  
 type), and scanning can't be done with fast primitives. -- Andrei

Given a bitvector type, a 2bitvector type can be implemented on top of it.

If one bit is "free", and another is "garbage", you just have to look for  
any set bits for free blocks. Yes, you have to look through 2x as much  
memory, but only until you find a free block.

-Steve

May 02 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/2/14, 11:50 AM, Steven Schveighoffer wrote:
 On Fri, 02 May 2014 14:42:52 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 5/2/14, 11:07 AM, Steven Schveighoffer wrote:

 What is the problem with keeping the bits together?

 More implementation (I have a BitVector type but not a KBitsVector!k
 type), and scanning can't be done with fast primitives. -- Andrei

 Given a bitvector type, a 2bitvector type can be implemented on top of it.

If speed is no issue, sure :o). My intuition is that the TwoBitVector 
would need certain primitives from BitVector to work well.

 If one bit is "free", and another is "garbage", you just have to look
 for any set bits for free blocks. Yes, you have to look through 2x as
 much memory, but only until you find a free block.

Hmm, so if "garbage" is 0, then to allocate we'd need to scan for a 
"hole" of contiguous zeros (cheap) instead of a checkered pattern 
(expensive). I'll think about it.


Andrei

May 02 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 02 May 2014 14:56:00 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 5/2/14, 11:50 AM, Steven Schveighoffer wrote:
 On Fri, 02 May 2014 14:42:52 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 5/2/14, 11:07 AM, Steven Schveighoffer wrote:

 What is the problem with keeping the bits together?

 More implementation (I have a BitVector type but not a KBitsVector!k
 type), and scanning can't be done with fast primitives. -- Andrei

 Given a bitvector type, a 2bitvector type can be implemented on top of  
 it.

 If speed is no issue, sure :o). My intuition is that the TwoBitVector  
 would need certain primitives from BitVector to work well.

 If one bit is "free", and another is "garbage", you just have to look
 for any set bits for free blocks. Yes, you have to look through 2x as
 much memory, but only until you find a free block.

 Hmm, so if "garbage" is 0, then to allocate we'd need to scan for a  
 "hole" of contiguous zeros (cheap) instead of a checkered pattern  
 (expensive). I'll think about it.

Well, you are looking for one bit set (free), or another bit set  
(garbage). So the pattern may not be uniform.

What you probably want to do is to store the bits close together, but  
probably not *right* together. That way you can use logic-or to search for  
bits.

-Steve

May 02 2014

David Gileadi <gileadis NSPMgmail.com> writes:

On 5/2/14, 11:56 AM, Andrei Alexandrescu wrote:
 If speed is no issue, sure :o). My intuition is that the TwoBitVector
 would need certain primitives from BitVector to work well.

Heh, however it's implemented, TwoBitVector's very name implies that 
it's cheap to use ;)

May 02 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/2/14, 10:26 AM, Steven Schveighoffer wrote:
 False pointers are less of a problem in 64-bit code, but you can run
 into worse issues. If you are not zeroing the memory when deallocating,
 then if it mysteriously comes alive again, it has the ghost of what
 could be a pointer to other code. Your blocks are more likely to
 resurrect once one of them resurrects.

Good point.

 Why not keep the 3 states, but just treat unmarked blocks as free? Then
 the next time you go through tracing, change the bit to free if it was
 already marked.

 This doesn't save on the bit space, but I think the savings there is
 minimal anyway. However, it does allow the final pass to be saved.

That's a great idea. Thanks!


Andrei

May 02 2014

D Programming

C/C++ Programming

Other

digitalmars.D - std.allocator: false pointers