www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - BitArray shift left/right confusion.

reply Bastiaan Veelo <Bastiaan Veelo.net> writes:
I suppose the following is not a bug, but confusing it is:

```
void main()
{
     import std.stdio;
     import std.bitmanip;
     BitArray ba = [1, 1, 1, 1, 1, 1, 1, 1];
     writeln(ba);	// [1, 1, 1, 1, 1, 1, 1, 1]
     ba >>= 4;		// right shift
     writeln(ba);	// [1, 1, 1, 1, 0, 0, 0, 0] bits shifted left
}```

I suppose this is because the array is printed left-to-right, 
whereas the bits in a byte are typically ordered right-to-left. I 
suppose I should interpret the bits in the array to increase in 
significance with increasing index (little endian) and that 
right-shift means a shift towards less significance (which is to 
the right in big endian).

The documentation of <<= and >>= [1] however just talks about 
left and right, without defining left and right or clarifying 
that the directions are reversed from how the array is printed.

Is there something I have missed?

[1] 
https://dlang.org/phobos/std_bitmanip.html#.BitArray.opOpAssign.2
Dec 27 2017
parent reply Biotronic <simen.kjaras gmail.com> writes:
On Wednesday, 27 December 2017 at 18:08:19 UTC, Bastiaan Veelo 
wrote:
 I suppose the following is not a bug, but confusing it is:

 ```
 void main()
 {
     import std.stdio;
     import std.bitmanip;
     BitArray ba = [1, 1, 1, 1, 1, 1, 1, 1];
     writeln(ba);	// [1, 1, 1, 1, 1, 1, 1, 1]
     ba >>= 4;		// right shift
     writeln(ba);	// [1, 1, 1, 1, 0, 0, 0, 0] bits shifted left
 }```

 I suppose this is because the array is printed left-to-right, 
 whereas the bits in a byte are typically ordered right-to-left. 
 I suppose I should interpret the bits in the array to increase 
 in significance with increasing index (little endian) and that 
 right-shift means a shift towards less significance (which is 
 to the right in big endian).

 The documentation of <<= and >>= [1] however just talks about 
 left and right, without defining left and right or clarifying 
 that the directions are reversed from how the array is printed.

 Is there something I have missed?

 [1] 
 https://dlang.org/phobos/std_bitmanip.html#.BitArray.opOpAssign.2
BitArray is apparently a mess. As you've pointed out it prints the bits in the wrong order. I won't mince words here, since D has binary literals on the form 0b10011110. Put that in a BitArray and print it with the format string "0b%b", and you'll get 0b01111001. While it may have been intentional, it's bug prone and confusing, and so definitely a bug. It also fucks up royally when it has an exact multiple of 32 bits in its buffer, overwriting the last word with 0s when you try and shift it in any way. It also doesn't remove set bits outside of its covered area when cast to size_t[]. That is, if I do cast(size_t[])(BitArray([1,1,1,1,1,1,1,1]) << 4), the result will be something like [4080], which corresponds to [0b0000_1111_1111_0000]. Lastly (and this is pointed out explicitly in the documentation, but still smells if you ask me), it will overwrite bits in the words it covers, even if it does not cover those exact bits. The first two are definitely bugs. The last two are results of the intended use case for BitArray, I believe. The documentation doesn't explicitly point this out, but it seems BitArray is intended to give a bit-by-bit view of an area of memory that is actually some other type. Something like this: struct S { int n; float f; } void foo(ref S s) { import std.bitmanip; auto a = BitArray((&s)[0..1], S.sizeof); a[7] = true; // Actually sets the corresponding bit in s. } -- Biotronic
Dec 27 2017
next sibling parent Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Wednesday, 27 December 2017 at 20:45:49 UTC, Biotronic wrote:
 BitArray is apparently a mess.
Thanks for your confirmation, digging and reporting issues https://issues.dlang.org/show_bug.cgi?id=18133 and https://issues.dlang.org/show_bug.cgi?id=18134 (Turns out we both live in the same country).
Dec 28 2017
prev sibling parent reply Jakub =?UTF-8?B?xYFhYmFq?= <uaaabbjjkl gmail.com> writes:
I was looking at the issues you reported and was pretty surprised 
how some of the features work. So I'm going to revive this thread 
with an explanation.

On Wednesday, 27 December 2017 at 20:45:49 UTC, Biotronic wrote:
 On Wednesday, 27 December 2017 at 18:08:19 UTC, Bastiaan Veelo 
 wrote:
 I suppose the following is not a bug, but confusing it is:

 ```
 void main()
 {
     import std.stdio;
     import std.bitmanip;
     BitArray ba = [1, 1, 1, 1, 1, 1, 1, 1];
     writeln(ba);	// [1, 1, 1, 1, 1, 1, 1, 1]
     ba >>= 4;		// right shift
     writeln(ba);	// [1, 1, 1, 1, 0, 0, 0, 0] bits shifted left
 }```

 I suppose this is because the array is printed left-to-right, 
 whereas the bits in a byte are typically ordered right-to-left.
BitArray stores its data in a `size_t[]` array - on 64-bit it consists of 8-byte units, each unit having its beginning at the least significant bit. So it works as you would probably expect: bool[128] bits; auto ba = BitArray(bits); // one bool corresponds to one bit in this constructor ba[0] = 1; ba[127] = 1; Now, the exact representation of the array in this BitArray is as follows: index: | 0 | 1 | bit no: | 64 ... 1 0 | 63 ... 1 0 | values: | 0 ... 0 1 | 1 ... 0 0 | You can get this array using cast: size_t[] exact = cast(size_t[]) ba; assert(exact[0] == 1); assert(exact[1] == size_t(1) << 63); Printing works in the human, i.e. logical, way: writeln(ba); // [1, 0, ..., 0, ..., 0, 1] BitArray has also a constructor taking a raw representation of the underlying array: size_t val = 0b1111_0000; auto ba2 = BitArray([val], 8); writeln(ba2); // [0, 0, 0, 0, 1, 1, 1, 1] It may be surprising, as it's opposite order of the binary literal. But this works as intended because it's supposed to be the inversion of casting: assert(cast(size_t[]) ba2 == [val]); It also works like that e.g. in Java's BitSet and C++'s bitset. On the other hand, shifting operators are equally confusing for me, as they are for you - they really work in the other way around! I thought this is a very weird bug, but I found this pull request: https://github.com/dlang/phobos/pull/2844, which states this is the intended behaviour. I don't know if there is anybody that would expect this - it's inconsistent with any other bitset implementation I know from other languages, as well as with what logic suggests. I'm curious if the authors changed their minds in this regard and there is any chance for that to be rewritten? And besides, it's pretty bugged, indeed.
Feb 01 2018
parent Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Friday, 2 February 2018 at 00:33:03 UTC, Jakub Łabaj wrote:
 On the other hand, shifting operators are equally confusing for 
 me, as they are for you - they really work in the other way 
 around! I thought this is a very weird bug, but I found this 
 pull request: https://github.com/dlang/phobos/pull/2844, which 
 states this is the intended behaviour.

 I don't know if there is anybody that would expect this - it's 
 inconsistent with any other bitset implementation I know from 
 other languages, as well as with what logic suggests. I'm 
 curious if the authors changed their minds in this regard and 
 there is any chance for that to be rewritten?
The same confusion existed amongst the authos, that pull was done in reaction to https://github.com/dlang/phobos/pull/2797#discussion-diff-22456052. So the shift direction seems to be as intended (by Martin Nowak at least) but the docs should be clear in what order the bits are printed and indexed. I guess.
Feb 03 2018