www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Bit slicing bug

reply Arcane Jill <Arcane_member pathlink.com> writes:
I have to mention this.

If you declare:

       bit[] b, c;

and have:
       b.length = 128;
       c.length = 16;

then:
       b[32..48] = c[0..16];

will access-violate. It in fact tries to write to (cast (ubyte *)b)[32] ... that is, b[128]. It doesn't fail bounds checking, it just falls over. I just spent a good few hours trying to figure out why my program was crashing. Turned out that was the reason. .. and I haven't even /dared/ try something like:
       // given bit[] b
       return b[5..9];

I suggest that correct behavior for bit slicing should be:
       // pseudocode
       bit[] opSlice(uint i, uint j)
       {
           assert((i & 7) == 0);
           assert((j & 7) == 0);
           return whole number of bytes, with length = (8 * number of bytes)
       }

This will matter in time, when people start coding standard containers, and will expect bit[] to behave pretty much like byte[]. In the meantime, I'm off to try to find a workaround for my problem... Arcane Jill
May 30 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Arcane Jill wrote:
<snip>
      b[32..48] = c[0..16];

will access-violate. It in fact tries to write to (cast (ubyte *)b)[32] ... that is, b[128]. It doesn't fail bounds checking, it just falls over.

Yes, bit slicing has been broken practically since the beginning of time.
 This will matter in time, when people start coding standard containers, and
will
 expect bit[] to behave pretty much like byte[]. In the meantime, I'm off to try
 to find a workaround for my problem...

You can use the functions I posted last week: digitalmars.D.bugs/313 Walter, are you receiving me? Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jun 01 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <c9hj20$fqq$1 digitaldaemon.com>, Stewart Gordon says...
You can use the functions I posted last week:
digitalmars.D.bugs/313

Ah hell. I just (last night) finished writing my OWN workaround. My own code is tiny compared with yours, but that's only because I restricted my workaround to copying on byte boundaries. (It was all I needed). Your code is more general. Of course, now that I *have* a workaround, the actual bug itself has become less important to me. I built the workaround in etc.workaround.bitslice. (I figured that etc.workaround.xxxxx could be used for getting round all sorts of compiler problems, as and when they are discovered). Obviously, it's faster for us coders to write a workaround than it is for us to report and problem and wait for it to be fixed. The cool thing about my bitslice module is that the unittest actually unittests D itself - so these unittests will fail until the bug is fixed! But build it with version(BitSliceWorkaround) enabled and then the workaround kicks in and then the unittests pass. Although your solution is more general than mine, allowing you to copy on non-byte boundaries, I question the wisdom of this. My reasoning is as follows: A bit array is stored in D as an eight-byte structure. A bit /slice/ is itself a bit array, and therefore itself takes eight bytes. The first four bytes are the length (number of bit-elements) of the array, and the second four bytes are the address of the array. Now - crucially - you cannot take the address of a bit, since a byte is the smallest addressable unit. So, if a bit array b is stored as { length, p }, and we slice it to b[8..b,length] then it is transformed into { length-8, p+1 }. But there is no way that this scheme could be extended into fractions of a byte. So Walter has two choices. Either: (1) Restrict bit-slicing to whole numbers of bytes, and throw an exception if you try to bit-slice on non byte boundaries. (2) Use a different structure to represent bit arrays - perhaps something like { length, bitNum, pointer } I can live with (1). What say the rest of you? Arcane Jill
Jun 01 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Arcane Jill wrote:

<snip>
 Although your solution is more general than mine, allowing you to 
 copy on non-byte boundaries, I question the wisdom of this. My 
 reasoning is as follows:

You reason only on implementation issues that are already taken care of. A real reasoning would also take practical value into consideration.
 A bit array is stored in D as an eight-byte structure. A bit /slice/ 
 is itself a bit array, and therefore itself takes eight bytes. The 
 first four bytes are the length (number of bit-elements) of the 
 array, and the second four bytes are the address of the array.

That's an implementation issue, I think you'll find.
 Now - crucially - you cannot take the address of a bit, since a byte 
 is the smallest addressable unit. So, if a bit array b is stored as 
 { length, p }, and we slice it to b[8..b,length] then it is 
 transformed into { length-8, p+1 }. But there is no way that this 
 scheme could be extended into fractions of a byte.

My implementation manages perfectly well to create a copy of a bit array slice if it needs to.
 So Walter has two choices. Either:

Not quite. You've merely given a debatable opinion, not taken away a choice.
 (1) Restrict bit-slicing to whole numbers of bytes, and throw an 
 exception if you try to bit-slice on non byte boundaries.

Being able to generally slice and concatenate bit arrays most certainly has its uses. I've done things with it already. The fact is that arbitrary bit slicing is allowed by the specification. To add an arbitrary restriction at this point would be pointless. If there's any exception to throw, it would be "not yet implemented". But considering that I've gone to the trouble of implementing it, there really is nothing left to throw. Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jun 01 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <c9hu66$vg1$1 digitaldaemon.com>, Stewart Gordon says...
The fact is that arbitrary bit slicing is allowed by the specification. 
  To add an arbitrary restriction at this point would be pointless.  If 
there's any exception to throw, it would be "not yet implemented".  But 
considering that I've gone to the trouble of implementing it, there 
really is nothing left to throw.

I completlely agree. It's just that:
       b[3..9] = c[2..10]

ought to be completely equivalent to:
       bit[] t = c[2..10];
       b[3..9] = t;

So the question fow Walter then becomes, how should t be represented internally. The present implementation cannot represent t at all. Yes, this is an implementation issue, but still, it'll be Walter that has to do the implementing. That's why I was suggesting a different internal representation for bit arrays. Of course, another possibility is that
       bit[] t = c[2..10];

could be achieved within the PRESENT implementation by doing a *copy-by-value*, instead of just manipulating the reference. Perhaps that is the way go? It may be slower, but at least it would work. Jill
Jun 01 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Arcane Jill wrote:

<snip>
 I completlely agree. It's just that:
 
      b[3..9] = c[2..10]

ought to be completely equivalent to:
      bit[] t = c[2..10];
      b[3..9] = t;


If by "completely equivalent" you mean "compiler required to generate the exact same code" then no. Even the most simplistic optimiser would generate only one copy operation for the former, whereas it would take a bit more work to optimise the latter. Indeed, the type analysis ought to consider a slice with hard-coded dimensions to be a static array, and hence you'd get a compile-time error as bit[6] and bit[8] are incompatible types. But in the general case of a 'valid' program, if by equivalent we mean having identical effect, then yes. <snip>
 Of course, another possibility is that
 
      bit[] t = c[2..10];

could be achieved within the PRESENT implementation by doing a *copy-by-value*, instead of just manipulating the reference. Perhaps that is the way go? It may be slower, but at least it would work.

Exactly. With my code, this would call copyBits, which in turn copies bit by bit. Of course, if that were bit[] t = c[8..10]; then copyBits would use the byte-optimised alg. Of course, it's trivial to change it so that t is merely a reference into c iff the slice is byte-aligned. Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jun 01 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <c9ifn6$1pqi$1 digitaldaemon.com>, Stewart Gordon says...

you'd get a compile-time 
error as bit[6] and bit[8] are incompatible types.

True. That was a typo. I intended for them both to be the same size. My apologies.
 Of course, another possibility is that
 
      bit[] t = c[2..10];

could be achieved within the PRESENT implementation by doing a *copy-by-value*, instead of just manipulating the reference. Perhaps that is the way go? It may be slower, but at least it would work.

Exactly. With my code, this would call copyBits, which in turn copies bit by bit. Of course, if that were bit[] t = c[8..10]; then copyBits would use the byte-optimised alg. Of course, it's trivial to change it so that t is merely a reference into c iff the slice is byte-aligned.

Cool. Well, I'd be completely happy with that. Jill
Jun 01 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
Just thought of something. Consider the following code:

       // given that b is a byte[] of length 16
       byte[] c = b[3..6];
       c[1] = 1;
       assert (c[1] == b[4]);

This will modify b[4], since c is copied by reference. For generic programming purposes, we might expect the following to hold for any type T:
       // given that b is a T[] of length 16
       T[] c = b[3..6];
       c[1] = 1;
       assert (c[1] == b[4]);

In particular:
       // given that b is a bit[] of length 16
       bit[] c = b[3..6];
       c[1] = 1;
       assert (c[1] == b[4]);

However, this assert will fail if the bits are copied by value. That was the basis of my concern. Sorry it took me a while to articulate it. I would expect identical behavior for bits as for any other type. That's why I was suggesting that bit array references need a different internal representation. Jill
Jun 01 2004
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Arcane Jill wrote:

<snip>
 This will modify b[4], since c is copied by reference. For generic programming
 purposes, we might expect the following to hold for any type T:

 In particular:
 
 
      // given that b is a bit[] of length 16
      bit[] c = b[3..6];
      c[1] = 1;
      assert (c[1] == b[4]);


Ah, you do have a point there. Requiring bit[] c = b[3..6].dup; would be one possibility, meaning that the template would fail to compile if T == bit. But that would pretty much prevent use of the byte-aligned optimisation. Of course, supporting expressions involving bit slices would then open a new can of worms. Maybe that assert is really necessary to check that someone doesn't use bits and automatically assume it's worked. Even for non-bit types, is slicing by reference actually guaranteed anyway?
 However, this assert will fail if the bits are copied by value. That was the
 basis of my concern. Sorry it took me a while to articulate it. I would expect
 identical behavior for bits as for any other type. That's why I was suggesting
 that bit array references need a different internal representation.

Maybe you're right. My implementation functions could be adapted to a bit-indexed representation, though obviously we'd need access to the internals if we want efficiency. Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jun 02 2004