www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Bits and Bobs and Bools

reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Back to our good old D friend, the "bit" data type...
(with the two constant bits: true == 1 and false == 0)

While D will not ever have a boolean type like e.g. Java,
it doesn't hurt if the alternative "bit" is documented ?

Here is my attempt... Feel free to comment or extend.


The first thing that is good to know is that bit
is a "unsigned" data type. That is, if you convert
true to an integer, then you get a 1 and not a -1.

(having -1 as true might sound a bit silly, but has its
merits since that means all bits are set when converting
to a larger int type. This convention is in use elsewhere)

The only "universal" convention is that false is 0,
and this is also the .init value for the bit type.
(in boolean contexts, "null" has a value of 0 too)

The second that is good to know is that single bits
are stored in bytes, and in the least significant bit.
That is, "true" is equal to 0x01 when read as a byte.


There are a bunch of things about bits that are
"implementation defined", and that's not all that
good for a language that defines e.g. int sizes ?

It would be much simpler if these were defined ?
This would also allow external languages, such as
C, to access any bit variables in D structures...

http://www.digitalmars.com/d/interface.html:
 Data Type Compatibility
 
 D type    C type
 bit       no equivalent
Which is not true, it should be "signed char" in C/C++ ? (but C "bool" can't be used, since it might be 4 bytes) IMPLEMENTATION DETAIL: (D) assert(typeid(typeof(true)) == typeid(bit)); assert(typeid(typeof(false)) == typeid(bit)); assert(bit.sizeof == 1); assert(true == 1); assert(false == 0); Then, bit arrays. For efficiency reasons, these are stored in 32-bit (register sized) chunks, rounded up. Since D mandates a 32-bit minimum CPU, might as well make this implementation detail fixed in the D spec ? This means that bit[1].sizeof == 4, same as bit[32], while as bit[256] occupies 32 bytes - for the data (as usual there is some overhead for length and ptr) Dividing by 32 is the same as shifting right by 5. IMPLEMENTATION DETAIL: (C) unsigned int *data; void alloc(unsigned int bitdim) { unsigned int allocdim = (bitdim + 31) / 32; // rounding up data = (unsigned int *) calloc(allocdim, sizeof(unsigned int)); } void set(unsigned int bitnum) { data[bitnum >> 5] |= 1L << (bitnum & 31); } void clear(unsigned int bitnum) { data[bitnum >> 5] &= ~(1L << (bitnum & 31)); } int test(unsigned int bitnum) { return data[bitnum >> 5] & (1L << (bitnum & 31)); } Within the chunks, the bytes are stored in little-endian byte order but still using normal bit direction (LSB first) within each byte. That is, the first bit encountered is the least significant, and this differs from e.g. how the bit integer literals work...
void main()
{
  printf("%02x\n", 0b10000000);
  printf("%02x\n", 0b01000000);
  printf("%02x\n", 0b00100000);
  printf("%02x\n", 0b00010000);
  printf("%02x\n", 0b00001000);
  printf("%02x\n", 0b00000100);
  printf("%02x\n", 0b00000010);
  printf("%02x\n", 0b00000001);
}
80 40 20 10 08 04 02 01 That's the reverse bit order, compared to what bit[] uses: (note that the byte ordering is the same on all platforms, just interpreted differently when loaded as 32-bit integers, This byte order needs to be finalized, and written in spec)
void main()
{
  bit[32] ba;

  for (int i = 0; i < 32; i++)
  {
    ba[] = false;
    ba[i] = true;
    printf("%08lx\n", *(cast(uint*) ba.ptr));
  }
}
version (LittleEndian) // e.g. X86 { 00000001 00000002 00000004 00000008 00000010 00000020 00000040 00000080 00000100 00000200 00000400 00000800 00001000 00002000 00004000 00008000 00010000 00020000 00040000 00080000 00100000 00200000 00400000 00800000 01000000 02000000 04000000 08000000 10000000 20000000 40000000 80000000 } version (BigEndian) // e.g. PPC { 01000000 02000000 04000000 08000000 10000000 20000000 40000000 80000000 00010000 00020000 00040000 00080000 00100000 00200000 00400000 00800000 00000100 00000200 00000400 00000800 00001000 00002000 00004000 00008000 00000001 00000002 00000004 00000008 00000010 00000020 00000040 00000080 } Bit pointers (bit*) are allowed, but can only address the first bit of each byte, directly. This means that you can *not* take the address of individual bits in an array, just to the start of the bit[] array itself... (this can easily be accessed by using the .ptr property, or take the address of an individual bit variable : &b) However, you can then use this bit pointer with the regular index, as in bp[4], to set individual bits... (just as you could do with a bit[] bit array ,as well) But incrementing is a little funny, it will be divided by eight in order to fit the next whole byte address.
  bit[32] ba;
  bit* bp = ba.ptr;

  printf("%x\n", bp);
  bp += 4;
  printf("%x\n", bp);
  bp += 4;
  printf("%x\n", bp);
  bp += 8;
  printf("%x\n", bp);
// an example, actual pointer values might differ bffffaa8 bffffaa8 bffffaa8 bffffaa9 For similar reasons, slices of bit arrays are only allowed when the lower bound starts on an even byte (that is: bit number is a multiple of 8, "% 8 == 0") Misaligned bit array slices throws ArrayBoundsError, or silently does nothing for the -release builds. Appending to bit arrays might not be implemented. bool: (bit) Having bit as the default boolean type in D now means that there can be only *one* value for true: 1 (one), all other non-zero values are converted when casting. wbool: (byte) While the shift and bit operators involved in getting and setting individual bits are pretty fast, it might be yet faster to just use byte arrays instead of bit arrays. dbool: (int) And since individual bytes might need masking to fit into a register, it might be faster still to use the int type to eg. return bits from functions. IMPLEMENTATION DETAIL: (D) alias bit bool; // boolean (true/false) alias byte wbool; // wide boolean (like wchar) alias int dbool; // double boolean (like dchar) byte c = cast(byte) 256; // c is now equal to "0" bit b = cast(bit) 2; // b is now equal to "true" int i = cast(int) b; // i is now equal to "1" I will probably stick this whole long post on Wiki4D. (under the GNU Free Documentation License, that is) --anders
Feb 10 2005
next sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message
news:cufvfa$2aet$1 digitaldaemon.com...
 I will probably stick this whole long post on Wiki4D.
 (under the GNU Free Documentation License, that is)
Please do, and thanks. It's a good article.
Feb 10 2005
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Walter wrote:

I will probably stick this whole long post on Wiki4D.
Please do, and thanks. It's a good article.
Thank you, it's now at: http://www.prowiki.org/wiki4d/wiki.cgi?BitsAndBools Here was some of the questions raised: - Should the byte order of bit[] be "fixed", like it seems to be now ? (which could be just an accident...) Or should the uint's use native byte endianness instead ? - Taking the address of a bit in an array seems to a) not complain and b) give a bogus address ? (that is: bit[] ba; bit *bp = &ba[10];) My suggestion is that it should return: ba + (10 / 8) == ba + 1 (rounded to byte) ? Similar to using : bp = ba; bp += 10; - wbool and dbool could be added, as "official" aliases perhaps ? They could all go in std.stdbool... wbool[8191] flags; // from sieve.d dbool opEquals(Object o); // from object.d I've also taken back my previous "boolean" suggestions... The bit type is it for D, with the bool alias for shortness. This probably also nixes the "isnot" keyword, as in: assert(object isnot null); which becomes assert(object); After all, D is about performance and favorite C terseness... If you want cuddly syntax and slowness, you know where Java is. --anders
Feb 10 2005
parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
Deaf ears prepare:

    bit is a useless wart and should be removed.

There, I said it. Now ignore me.

"Anders F Björklund" <afb algonet.se> wrote in message 
news:cugafi$2mrr$1 digitaldaemon.com...
 Walter wrote:

I will probably stick this whole long post on Wiki4D.
Please do, and thanks. It's a good article.
Thank you, it's now at: http://www.prowiki.org/wiki4d/wiki.cgi?BitsAndBools Here was some of the questions raised: - Should the byte order of bit[] be "fixed", like it seems to be now ? (which could be just an accident...) Or should the uint's use native byte endianness instead ? - Taking the address of a bit in an array seems to a) not complain and b) give a bogus address ? (that is: bit[] ba; bit *bp = &ba[10];) My suggestion is that it should return: ba + (10 / 8) == ba + 1 (rounded to byte) ? Similar to using : bp = ba; bp += 10; - wbool and dbool could be added, as "official" aliases perhaps ? They could all go in std.stdbool... wbool[8191] flags; // from sieve.d dbool opEquals(Object o); // from object.d I've also taken back my previous "boolean" suggestions... The bit type is it for D, with the bool alias for shortness. This probably also nixes the "isnot" keyword, as in: assert(object isnot null); which becomes assert(object); After all, D is about performance and favorite C terseness... If you want cuddly syntax and slowness, you know where Java is. --anders
Feb 10 2005
next sibling parent Kris <Kris_member pathlink.com> writes:
Hear, Hear! For that deaf ear!

:-)

In article <cugdv5$2r8g$2 digitaldaemon.com>, Matthew says...
Deaf ears prepare:

    bit is a useless wart and should be removed.

There, I said it. Now ignore me.

"Anders F Björklund" <afb algonet.se> wrote in message 
news:cugafi$2mrr$1 digitaldaemon.com...
 Walter wrote:

I will probably stick this whole long post on Wiki4D.
Please do, and thanks. It's a good article.
Thank you, it's now at: http://www.prowiki.org/wiki4d/wiki.cgi?BitsAndBools Here was some of the questions raised: - Should the byte order of bit[] be "fixed", like it seems to be now ? (which could be just an accident...) Or should the uint's use native byte endianness instead ? - Taking the address of a bit in an array seems to a) not complain and b) give a bogus address ? (that is: bit[] ba; bit *bp = &ba[10];) My suggestion is that it should return: ba + (10 / 8) == ba + 1 (rounded to byte) ? Similar to using : bp = ba; bp += 10; - wbool and dbool could be added, as "official" aliases perhaps ? They could all go in std.stdbool... wbool[8191] flags; // from sieve.d dbool opEquals(Object o); // from object.d I've also taken back my previous "boolean" suggestions... The bit type is it for D, with the bool alias for shortness. This probably also nixes the "isnot" keyword, as in: assert(object isnot null); which becomes assert(object); After all, D is about performance and favorite C terseness... If you want cuddly syntax and slowness, you know where Java is. --anders
Feb 10 2005
prev sibling parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Matthew wrote:

 Deaf ears prepare:
 
     bit is a useless wart and should be removed.
 
 There, I said it. Now ignore me.
I think we had that conversion already :-) I would like a boolean type too, but I don't consider it important enough to drop D for. And it's pretty obvious that Walter won't change this lemon, so I made some limonade... Just consider it the _Bool of D, right ? --anders PS. However, this statement is nonsense and should go: http://www.digitalmars.com/d/overview.html:
 Bit type

 The fundamental data type is the bit, and D has  a bit
 data type. This is most useful in creating arrays of bits:
 
 	bit[] foo;
Bit is *not* the fundamental data type, but byte is... (these days, one could even consider "int" fundamental)
Feb 10 2005
prev sibling parent reply Derek <derek psych.ward> writes:
On Thu, 10 Feb 2005 16:43:38 +0100, Anders F Björklund wrote:


[snip]

 
 Then, bit arrays. For efficiency reasons, these are
 stored in 32-bit (register sized) chunks, rounded up.
 Since D mandates a 32-bit minimum CPU, might as well
 make this implementation detail fixed in the D spec ?
 
 This means that bit[1].sizeof == 4, same as bit[32],
 while as bit[256] occupies 32 bytes - for the data
 (as usual there is some overhead for length and ptr)
 Dividing by 32 is the same as shifting right by 5.
 
While this might work for bits, it doesn't for booleans. An array of individually addressable boolean values is a useful construct. bool[6] Flags; Flags[2] = true; Flags[4] = False; if (Flags[3] || Flags[5]) . . . foobar( Flags[1] ); void foobar(inout bool aSwitch) { . . . } etc... -- Derek Melbourne, Australia
Feb 10 2005
parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Derek wrote:

This means that bit[1].sizeof == 4, same as bit[32],
while as bit[256] occupies 32 bytes - for the data
(as usual there is some overhead for length and ptr)
Dividing by 32 is the same as shifting right by 5.
While this might work for bits, it doesn't for booleans. An array of individually addressable boolean values is a useful construct.
So use wbool instead ? Individual bit pointers just don't exist. alias byte wbool; (well, I do know that Stewart coded an implementation, but IRL... http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/495) There is no spoon. --anders
Feb 11 2005