digitalmars.D - Bit slicing bug

Arcane Jill (14/27) May 30 2004 and have:

Stewart Gordon (12/20) Jun 01 2004

Arcane Jill (30/32) Jun 01 2004 Ah hell. I just (last night) finished writing my OWN workaround. My own ...

Stewart Gordon (22/37) Jun 01 2004 You reason only on implementation issues that are already taken care of....

Arcane Jill (13/22) Jun 01 2004 ought to be completely equivalent to:

Stewart Gordon (23/38) Jun 01 2004 If by "completely equivalent" you mean "compiler required to generate

Arcane Jill (5/20) Jun 01 2004 True. That was a typo. I intended for them both to be the same size. My

Arcane Jill (9/21) Jun 01 2004 This will modify b[4], since c is copied by reference. For generic progr...

Stewart Gordon (20/33) Jun 02 2004

Arcane Jill <Arcane_member pathlink.com> writes:

I have to mention this.

If you declare:

       bit[] b, c;

and have:

       b.length = 128;
       c.length = 16;

then:

       b[32..48] = c[0..16];

will access-violate. It in fact tries to write to (cast (ubyte *)b)[32] ... that
is, b[128]. It doesn't fail bounds checking, it just falls over.

I just spent a good few hours trying to figure out why my program was crashing.
Turned out that was the reason.

.. and I haven't even /dared/ try something like:

       // given bit[] b
       return b[5..9];

I suggest that correct behavior for bit slicing should be:

       // pseudocode
       bit[] opSlice(uint i, uint j)
       {
           assert((i & 7) == 0);
           assert((j & 7) == 0);
           return whole number of bytes, with length = (8 * number of bytes)
       }

This will matter in time, when people start coding standard containers, and will
expect bit[] to behave pretty much like byte[]. In the meantime, I'm off to try
to find a workaround for my problem...

Arcane Jill

May 30 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Arcane Jill wrote:
<snip>
      b[32..48] = c[0..16];

 
 
 will access-violate. It in fact tries to write to (cast (ubyte *)b)[32] ...
that
 is, b[128]. It doesn't fail bounds checking, it just falls over.

<snip>

Yes, bit slicing has been broken practically since the beginning of time.

 This will matter in time, when people start coding standard containers, and
will
 expect bit[] to behave pretty much like byte[]. In the meantime, I'm off to try
 to find a workaround for my problem...

You can use the functions I posted last week:
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/313

Walter, are you receiving me?

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Jun 01 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <c9hj20$fqq$1 digitaldaemon.com>, Stewart Gordon says...
You can use the functions I posted last week:
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/313

Ah hell. I just (last night) finished writing my OWN workaround. My own code is
tiny compared with yours, but that's only because I restricted my workaround to
copying on byte boundaries. (It was all I needed). Your code is more general.

Of course, now that I *have* a workaround, the actual bug itself has become less
important to me. I built the workaround in etc.workaround.bitslice. (I figured
that etc.workaround.xxxxx could be used for getting round all sorts of compiler
problems, as and when they are discovered). Obviously, it's faster for us coders
to write a workaround than it is for us to report and problem and wait for it to
be fixed. The cool thing about my bitslice module is that the unittest actually
unittests D itself - so these unittests will fail until the bug is fixed! But
build it with version(BitSliceWorkaround) enabled and then the workaround kicks
in and then the unittests pass.

Although your solution is more general than mine, allowing you to copy on
non-byte boundaries, I question the wisdom of this. My reasoning is as follows:

A bit array is stored in D as an eight-byte structure. A bit /slice/ is itself a
bit array, and therefore itself takes eight bytes. The first four bytes are the
length (number of bit-elements) of the array, and the second four bytes are the
address of the array. Now - crucially - you cannot take the address of a bit,
since a byte is the smallest addressable unit. So, if a bit array b is stored as
{ length, p }, and we slice it to b[8..b,length] then it is transformed into {
length-8, p+1 }. But there is no way that this scheme could be extended into
fractions of a byte.

So Walter has two choices. Either:
(1) Restrict bit-slicing to whole numbers of bytes, and throw an exception if
you try to bit-slice on non byte boundaries.
(2) Use a different structure to represent bit arrays - perhaps something like {
length, bitNum, pointer }

I can live with (1). What say the rest of you?

Arcane Jill

Jun 01 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Arcane Jill wrote:

<snip>
 Although your solution is more general than mine, allowing you to 
 copy on non-byte boundaries, I question the wisdom of this. My 
 reasoning is as follows:

You reason only on implementation issues that are already taken care of. 
  A real reasoning would also take practical value into consideration.

 A bit array is stored in D as an eight-byte structure. A bit /slice/ 
 is itself a bit array, and therefore itself takes eight bytes. The 
 first four bytes are the length (number of bit-elements) of the 
 array, and the second four bytes are the address of the array.

That's an implementation issue, I think you'll find.

 Now - crucially - you cannot take the address of a bit, since a byte 
 is the smallest addressable unit. So, if a bit array b is stored as 
 { length, p }, and we slice it to b[8..b,length] then it is 
 transformed into { length-8, p+1 }. But there is no way that this 
 scheme could be extended into fractions of a byte.

My implementation manages perfectly well to create a copy of a bit array 
slice if it needs to.

 So Walter has two choices. Either:

Not quite.  You've merely given a debatable opinion, not taken away a 
choice.

 (1) Restrict bit-slicing to whole numbers of bytes, and throw an 
 exception if you try to bit-slice on non byte boundaries.

<snip>

Being able to generally slice and concatenate bit arrays most certainly 
has its uses.  I've done things with it already.

The fact is that arbitrary bit slicing is allowed by the specification. 
  To add an arbitrary restriction at this point would be pointless.  If 
there's any exception to throw, it would be "not yet implemented".  But 
considering that I've gone to the trouble of implementing it, there 
really is nothing left to throw.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Jun 01 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <c9hu66$vg1$1 digitaldaemon.com>, Stewart Gordon says...
The fact is that arbitrary bit slicing is allowed by the specification. 
  To add an arbitrary restriction at this point would be pointless.  If 
there's any exception to throw, it would be "not yet implemented".  But 
considering that I've gone to the trouble of implementing it, there 
really is nothing left to throw.

I completlely agree. It's just that:

       b[3..9] = c[2..10]

ought to be completely equivalent to:

       bit[] t = c[2..10];
       b[3..9] = t;

So the question fow Walter then becomes, how should t be represented internally.
The present implementation cannot represent t at all. Yes, this is an
implementation issue, but still, it'll be Walter that has to do the
implementing. That's why I was suggesting a different internal representation
for bit arrays.

Of course, another possibility is that

       bit[] t = c[2..10];

could be achieved within the PRESENT implementation by doing a *copy-by-value*,
instead of just manipulating the reference. Perhaps that is the way go? It may
be slower, but at least it would work.

Jill

Jun 01 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Arcane Jill wrote:

<snip>
 I completlely agree. It's just that:
 
      b[3..9] = c[2..10]

 
 ought to be completely equivalent to:
 
      bit[] t = c[2..10];
      b[3..9] = t;


If by "completely equivalent" you mean "compiler required to generate 
the exact same code" then no.  Even the most simplistic optimiser would 
generate only one copy operation for the former, whereas it would take a 
bit more work to optimise the latter.

Indeed, the type analysis ought to consider a slice with hard-coded 
dimensions to be a static array, and hence you'd get a compile-time 
error as bit[6] and bit[8] are incompatible types.

But in the general case of a 'valid' program, if by equivalent we mean 
having identical effect, then yes.

<snip>
 Of course, another possibility is that
 
      bit[] t = c[2..10];

 
 could be achieved within the PRESENT implementation by doing a 
 *copy-by-value*, instead of just manipulating the reference. Perhaps 
 that is the way go? It may be slower, but at least it would work.

Exactly.  With my code, this would call copyBits, which in turn copies 
bit by bit.  Of course, if that were

	bit[] t = c[8..10];

then copyBits would use the byte-optimised alg.  Of course, it's trivial 
to change it so that t is merely a reference into c iff the slice is 
byte-aligned.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Jun 01 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <c9ifn6$1pqi$1 digitaldaemon.com>, Stewart Gordon says...

you'd get a compile-time 
error as bit[6] and bit[8] are incompatible types.

True. That was a typo. I intended for them both to be the same size. My
apologies.



 Of course, another possibility is that
 
      bit[] t = c[2..10];

 
 could be achieved within the PRESENT implementation by doing a 
 *copy-by-value*, instead of just manipulating the reference. Perhaps 
 that is the way go? It may be slower, but at least it would work.

Exactly.  With my code, this would call copyBits, which in turn copies 
bit by bit.  Of course, if that were

	bit[] t = c[8..10];

then copyBits would use the byte-optimised alg.  Of course, it's trivial 
to change it so that t is merely a reference into c iff the slice is 
byte-aligned.

Cool. Well, I'd be completely happy with that.

Jill

Jun 01 2004

Arcane Jill <Arcane_member pathlink.com> writes:

Just thought of something. Consider the following code:

       // given that b is a byte[] of length 16
       byte[] c = b[3..6];
       c[1] = 1;
       assert (c[1] == b[4]);

This will modify b[4], since c is copied by reference. For generic programming
purposes, we might expect the following to hold for any type T:

       // given that b is a T[] of length 16
       T[] c = b[3..6];
       c[1] = 1;
       assert (c[1] == b[4]);

In particular:

       // given that b is a bit[] of length 16
       bit[] c = b[3..6];
       c[1] = 1;
       assert (c[1] == b[4]);

However, this assert will fail if the bits are copied by value. That was the
basis of my concern. Sorry it took me a while to articulate it. I would expect
identical behavior for bits as for any other type. That's why I was suggesting
that bit array references need a different internal representation.

Jill

Jun 01 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Arcane Jill wrote:

<snip>
 This will modify b[4], since c is copied by reference. For generic programming
 purposes, we might expect the following to hold for any type T:

<snip>
 In particular:
 
 
      // given that b is a bit[] of length 16
      bit[] c = b[3..6];
      c[1] = 1;
      assert (c[1] == b[4]);


Ah, you do have a point there.  Requiring

	bit[] c = b[3..6].dup;

would be one possibility, meaning that the template would fail to 
compile if T == bit.  But that would pretty much prevent use of the 
byte-aligned optimisation.  Of course, supporting expressions involving 
bit slices would then open a new can of worms.  Maybe that assert is 
really necessary to check that someone doesn't use bits and 
automatically assume it's worked.

Even for non-bit types, is slicing by reference actually guaranteed anyway?

 However, this assert will fail if the bits are copied by value. That was the
 basis of my concern. Sorry it took me a while to articulate it. I would expect
 identical behavior for bits as for any other type. That's why I was suggesting
 that bit array references need a different internal representation.

Maybe you're right.  My implementation functions could be adapted to a 
bit-indexed representation, though obviously we'd need access to the 
internals if we want efficiency.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Jun 02 2004

D Programming

C/C++ Programming

Other

digitalmars.D - Bit slicing bug