www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - BitArray implementation issue

reply "safety0ff" <safety0ff.dev gmail.com> writes:
Currently the implementation of BitArray's length setter is badly 
broken.
Its implementation contains this code segment:

     if (newdim != olddim) {
         auto b = ptr[0 .. olddim];
         b.length = newdim;    // realloc
         ptr = b.ptr;
         if (newdim & (bitsPerSizeT-1)) // Set any pad bits to 0
             ptr[newdim - 1] &= ~(~0 << (newdim & 
(bitsPerSizeT-1)));
     }

There are many bugs in the last 2 lines:
1) If we're increasing the length, ptr[newdim-1] is necessarily 
zero, so we're doing nothing.
2) If we're reducing the length, we're essentially clearing 
arbitrary bits whenever the branch is taken, furthermore:
3) On 64 bit, we're always clearing the upper 32 bits of the last 
word.
4) On 64 bit, we have undefined behaviour from the left shift.

While trying to fix this the question was raised: should we even 
clear any bits?

The underlying array can be shared between BitArray's (assignment 
behaves like dynamic array assignment.)
This means that clearing the bits might affect other BitArray's 
if the extension does not cross a size_t boundary.
So it is difficult to mimic D dynamic array semantics at a bit 
level without a reference counting mechanism.

Furthermore, a BitArray can function as an arbitrary data 
reinterpreter via the function:
void init(void[] v, size_t numbits)

Help is needed for deciding whether setting length should zero 
the new bits (even if it means stomping other BitArrays.)

Currently I've implemented zero'ing bits upon extension in my PR 
[1], but it is stalled on this design issue.

[1] https://github.com/D-Programming-Language/phobos/pull/2249
Jul 22 2014
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Jul 22, 2014 at 09:16:35PM +0000, safety0ff via Digitalmars-d wrote:
 Currently the implementation of BitArray's length setter is badly broken.
 Its implementation contains this code segment:
 
     if (newdim != olddim) {
         auto b = ptr[0 .. olddim];
         b.length = newdim;    // realloc
         ptr = b.ptr;
         if (newdim & (bitsPerSizeT-1)) // Set any pad bits to 0
             ptr[newdim - 1] &= ~(~0 << (newdim & (bitsPerSizeT-1)));
     }
 
 There are many bugs in the last 2 lines:
 1) If we're increasing the length, ptr[newdim-1] is necessarily zero,
 so we're doing nothing.
 2) If we're reducing the length, we're essentially clearing arbitrary
 bits whenever the branch is taken, furthermore:
 3) On 64 bit, we're always clearing the upper 32 bits of the last
 word.
 4) On 64 bit, we have undefined behaviour from the left shift.
Wow. Looks like this is in dire need of some TLC.
 While trying to fix this the question was raised: should we even clear
 any bits?
You might want to consider implementing a way of tracking how many bits in the final word are valid. That way, you can correctly trigger reallocation if the user tries to write to a bit beyond the current end of the array (instead of stomping over other BitArray's data), and you don't have to clear any bits except newly-allocated words. Assuming we want to keep dynamic-array-like behaviour for BitArray, that is. I think it's a good idea.
 The underlying array can be shared between BitArray's (assignment
 behaves like dynamic array assignment.)
 This means that clearing the bits might affect other BitArray's if the
 extension does not cross a size_t boundary.
 So it is difficult to mimic D dynamic array semantics at a bit level
 without a reference counting mechanism.
I think it's best to keep a bit count of the number of valid bits at the end of the array, in addition to the number of words in the underlying array. You might also want to consider keeping a bit count of the number of valid bits at the *start* of the array, so that it's possible to bit-level slicing in a transparent way.
 Furthermore, a BitArray can function as an arbitrary data
 reinterpreter via the function:
 void init(void[] v, size_t numbits)
 
 Help is needed for deciding whether setting length should zero the new bits
 (even if it means stomping other BitArrays.)
It shouldn't. Keeping track of the number of valid bits in the last word should solve this problem.
 Currently I've implemented zero'ing bits upon extension in my PR [1],
 but it is stalled on this design issue.
 
 [1] https://github.com/D-Programming-Language/phobos/pull/2249
I think the best solution is to keep an additional count of the number of valid bits in the last word of the array. Then you don't have to worry about stomping other BitArrays, and you can potentially implement bit-level slicing too. T -- IBM = I Blame Microsoft
Jul 22 2014
next sibling parent reply "safety0ff" <safety0ff.dev gmail.com> writes:
On Wednesday, 23 July 2014 at 00:59:38 UTC, H. S. Teoh via 
Digitalmars-d wrote:
 You might want to consider implementing a way of tracking how 
 many bits
 in the final word are valid. That way, you can correctly trigger
 reallocation if the user tries to write to a bit beyond the 
 current end
 of the array (instead of stomping over other BitArray's data), 
 and you
 don't have to clear any bits except newly-allocated words. 
 Assuming we
 want to keep dynamic-array-like behaviour for BitArray, that 
 is. I think
 it's a good idea.
In brief, you're suggesting to always realloc on extension, no matter if it's a sub-word extension. This solves the stomping issue nicely, but it would cause a lot of GC churn in a concatenation loop.
Jul 22 2014
next sibling parent "safety0ff" <safety0ff.dev gmail.com> writes:
On Wednesday, 23 July 2014 at 02:33:39 UTC, safety0ff wrote:
 but it would cause a lot of GC churn in a concatenation loop.
Nevermind this. :o)
Jul 22 2014
prev sibling parent "safety0ff" <safety0ff.dev gmail.com> writes:
On Wednesday, 23 July 2014 at 02:33:39 UTC, safety0ff wrote:
 In brief, you're suggesting to always realloc on extension, no 
 matter if it's a sub-word extension.
 This solves the stomping issue nicely
Actually, you'd need to allocate and copy on each extension, not realloc, because realloc would do nothing on subword extension.
Jul 22 2014
prev sibling parent "safety0ff" <safety0ff.dev gmail.com> writes:
On Wednesday, 23 July 2014 at 00:59:38 UTC, H. S. Teoh via 
Digitalmars-d wrote:
 I think it's best to keep a bit count of the number of valid 
 bits at the
 end of the array, in addition to the number of words in the 
 underlying
 array. You might also want to consider keeping a bit count of 
 the number
 of valid bits at the *start* of the array, so that it's 
 possible to
 bit-level slicing in a transparent way.
Sorry, being too eager to reply. We would need to use a secondary pointer to the count to avoid breaking code that uses the void init(void[], size_t) function which allows arbitrary bit twiddling of the passed in array.
Jul 22 2014
prev sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Jul 22, 2014 at 05:57:48PM -0700, H. S. Teoh via Digitalmars-d wrote:
[....]
 You might want to consider implementing a way of tracking how many
 bits in the final word are valid. That way, you can correctly trigger
 reallocation if the user tries to write to a bit beyond the current
 end of the array (instead of stomping over other BitArray's data), and
 you don't have to clear any bits except newly-allocated words.
 Assuming we want to keep dynamic-array-like behaviour for BitArray,
 that is. I think it's a good idea.
Hmm. It looks like the code is already counting bits rather than words. So you should already have enough information to implement the correct solution (take the current length modulo the number of bits per word, and that tells you which bits in the last word are valid). T -- Let's not fight disease by killing the patient. -- Sean 'Shaleh' Perry
Jul 22 2014