digitalmars.D - Bits again. A proposal.

reply larrycowan <larrycowan_member pathlink.com> writes:
One more time...

C has it wrong in several ways.  I have programmed C for over 20 years.

Bools are bits, but bits are not only boolean.  Arithmetics, characters, and
addresses are not boolean.  Comparisons produce a boolean value.

Bits are 1:0, true:false, yes:no, flags, switches, checkmarks, ...

Bits are not useful as numbers for arithmetic operations.  Their arrays should
not automatically become so.

I have no problem with numbers and characters being subject to logic operations,
but bits are not directly useful for arithmetic operations.

Perhaps shifting of bit arrays could be useful when used as a rotating mask or
some such.  We have that with slicing of arrays as well as selection out of
contiguous groups of flags or switches.

Comparative operations result in a true:false value entirely interpretable as a
bit.  They should be valid in the right hand side of an assignment or
initialization for bits, but not numerics.

Casting of a bit array as a numeric to be able to compare it to a number does
make some sense and could be implicit.  Assembling a number in a bit array and
then casting it is less useful, but maybe.  We do have logic operations directly
on numerics and characters to do anything that may be needed there.  Explicit
casting of a numeric or character as a bit array is probably necessary for
analytic purposes, but should never be implicit.  Explicit casting of a bit
array as a character or numeric should probably be handled by unions.


I propose the following:

1. All usage of numerics, characters, and addresses as boolean values should be
rejected by the compiler.  The need for === as well as == should blow away use
of addresses, and the evaluation of true as "anything not composed fully of zero
bits" is just nasty.

2. All usage of bits and bit arrays for numeric operations (except comparison as
allowed below) should be rejected by the compiler.  Logic operations suffice and
are more descriptive and are well-defined.

3. Comparisons between numbers and characters with bits and bit arrays are
well-defined and should be supported with implicit casting.  High order 0 bits
should be assumed as necessary for compatible unit sizes.

4. No other implicit casts should be injected back and forth.

5. Shift operations on bit arrays are unnecessary and should not be supported
except by slicing and concatenating.  The compiler should be able to optimize
down to minimal operations when appropriate.

6. There should be no need for compiler requirements of either packed or
not-packed bit arrays or arrays of bit arrays.  Structs and arrays of structs
can fully describe anything there (see below).  Implementation should be left to
the compiler vendor for this.  It's tied too closely to optimization.

7. Structs with successive bits and bit arrays should be packable down to
minimum size|width.  Offsetting can be accomplished explicitly with dummy bits.
Bytes, shorts, ints, chars, etc should not intrude - they have their own
boundary considerations which should still be met with minimum of modulo-8 bit
addresses.  It would be nice if placeholder bits and bytes did not require
names, but that's just syntactic sugar.  

8. Unions which include bits and|or bit arrays can fill in the rest, and
probably should be used rather than casting bit arrays elsewhere anyway.
It therefore would be possible to disallow even explicit casting back and forth
between logic and other representations but I'm not proposing this. 

9. Add on:off as keywords identical to true:false.  Their valuation as 1:0 is
consistent with most other languages and programmatic use (though often allowed
as any:0 instead), so keep that.

10. Fix that C/C++ code.  It won't hurt to check the logic anyway - it may not
be exactly the same when pointers and references are considered properly.
While(1) {} and for(;;) {} both can easily be changed to while(true) {} and be
more safely readable.  [ We have "foreach" now, we could add "forever".  I like
the "forever break;" statement as a no-op. ] We have already blown away
"for(...;...;...) ;" for good reasons, so there's some precedent for changes
which require actually looking at C code (if you can find any without
preprocessor use) before D compiling it.

11. Let's please take at least this step toward making type safety possible.
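
As a rough illustration of item 7 (structs of successive bits packed to minimum width, with unnamed placeholder bits), here is a minimal sketch. It assumes a present-day D toolchain with Phobos's `std.bitmanip`, and it uses `bool` where this thread says `bit`. The `Status` struct and its field names are hypothetical, and this is a library approximation rather than the language feature proposed above.

```d
import std.bitmanip : bitfields;

// Hypothetical status record: three one-bit flags, two unnamed
// placeholder bits, and a three-bit field. The widths add up to 8,
// so the whole struct is stored in a single byte.
struct Status
{
    mixin(bitfields!(
        bool, "ready",   1,
        bool, "error",   1,
        bool, "busy",    1,
        uint, "",        2,   // unnamed placeholder bits (no accessor)
        uint, "channel", 3));
}
```

Each named field gets a getter and a setter, so `s.error = true;` and `s.channel = 5;` read like ordinary member access while the storage stays packed.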


-------------------

Another thread should take up the ignorance of C about arithmetic and logical
overflows and underflows and propose something there. (Not necessarily unaware,
but no provision for access and handling.)  The hardware knows, why can't the
programmer?  ints, uints, oints, ouints?  smart operations?  an _ or an  
standard variable?  this gets tangled with the problems of actual vs. potential
lossy casts as well.  And don't tell me to use assembler when I want this...
Do not reply to this in this thread, please.

-------------------- 





Oct 14 2004
next sibling parent reply Anders F Björklund <afb algonet.se> writes:
larrycowan wrote:

 One more time...

New thread, old discussion...
 I propose the following:
 
 1. All usage of numerics, characters, and addresses as boolean values should be
 rejected by the compiler.  The need for === as well as == should blow away use
 of addresses, and the evaluation of true as "anything not composed fully of
 zero bits" is just nasty.

Note: This D suggestion has been rejected by Walter. D currently follows the same "nasty" rules as C/C++. Type safety as in C# is more likely first in "D 2.0"
 9. Add on:off as keywords identical to true:false.  Their valuation as 1:0 is
 consistent with most other languages and programmatic use (though often allowed
 as any:0 instead), so keep that.

Hear, hear! This is more or less the same thing as I suggested earlier: keywords to the equivalent effect of the following #defines:

 const bit on = 1;
 const bit off = 0;

Preferably, a "bool" type (like C++) can be added to D? Barring that, the same kludge as in C99 would have to do:

 module std.stdbool;
 alias bit bool;
 const bool true = 1;
 const bool false = 0;

 11. Let's please take at least this step toward making type safety possible.

The first step would be *having* two different types for bits and bools? --anders
Oct 14 2004
next sibling parent reply larry cowan <larry_member pathlink.com> writes:
In article <ckm861$2nng$1 digitaldaemon.com>,
Anders F Björklund says...
larrycowan wrote:

 One more time...

New thread, old discussion...
 I propose the following:
 
 1. All usage of numerics, characters, and addresses as boolean values should be
 rejected by the compiler.  The need for === as well as == should blow away use
 of addresses, and the evaluation of true as "anything not composed fully of
 zero bits" is just nasty.

Note: This D suggestion has been rejected by Walter.

I know that. Not just once, many times. Still, too many people have the opinion that this should change.
D currently follows the same "nasty" rules as C/C++.

So? Does that make it good?
Type safety as in C# is more likely first in "D 2.0"

No, not "like in C#", type safety is not a MS concept! ..
 11. Let's please take at least this step toward making type safety possible.

The first step would be *having* two different types for bits and bools? --anders

What justification is there for a bit type that is not only the boolean? Are you sure you don't want signed and unsigned bits? You could use the signed bit in a union with the high bit of an unsigned int to pretend it was signed and invent your own arithmetic. And an unsigned bit similarly to see if you were vulnerable to overflows... No - we don't need bits for arithmetic! or to build utf-13 character sets with! My proposal does not give us type safety (though it would greatly help), but it surely does make it less disruptive to provide it later.
Oct 14 2004
parent reply Anders F Björklund <afb algonet.se> writes:
larry cowan wrote:

D currently follows the same "nasty" rules as C/C++.

So? Does that make it good?

Well, perhaps not. But it does make it familiar to many ? Just like you, I had hoped for D to improve upon this - like it improves on other things that are "old" in C.
Type safety as in C# is more likely first in "D 2.0"

No, not "like in C#", type safety is not a MS concept!

Isn't it? Right after they invented the Internet ? :-)

Seriously, I meant "implemented as in C#". I used to say "as in Java", but they chose the "boolean" keyword instead ?

In C#, bool is just an alias for System.Boolean but who cares. It's not convertible to the other types, like int and such.

I wrote a long essay about the now somewhat boring subject in: digitalmars.D/11757

--anders

BTW; I am using Mono for my C# needs. Not that they are many.
Oct 14 2004
next sibling parent David L. Davis <SpottedTiger yahoo.com> writes:
larry cowan wrote: No, not "like in C#", type safety is not a MS concept!

Anders wrote: Isn't it? Right after they invented the Internet ? :-)

No, I think it was Al Gore who raised his hand first and took the credit for inventing the Internet...soon after he learned how to spell "potato." :) Sorry, I just couldn't help myself. David L. ------------------------------------------------------------------- "Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
Oct 14 2004
prev sibling next sibling parent larrycowan <larrycowan_member pathlink.com> writes:
In article <ckmauc$2qpk$1 digitaldaemon.com>,
Anders F Björklund says...
larry cowan wrote:

D currently follows the same "nasty" rules as C/C++.

So? Does that make it good?

Well, perhaps not. But it does make it familiar to many ?

So is the preprocessor.
Just like you, I had hoped for D to improve upon this -
like it improves on other things that are "old" in C.

Type safety as in C# is more likely first in "D 2.0"

No, not "like in C#", type safety is not a MS concept!

Isn't it? Right after they invented the Internet ? :-)

No, I invented it. I just couldn't spare the time to implement it.
Seriously, I meant "implemented as in C#". I used to say
"as in Java", but they chose the "boolean" keyword instead ?

In C#, bool is just an alias for System.Boolean but who cares.
It's not convertible to the other types, like int and such.

A boolean class would provide whatever we wanted however we wanted it, but most likely not as efficiently. But that doesn't kill the bad C'isms we are carrying forward in regard to expression defaults based on "any nonzero is true" concepts. I don't even want "bit[] xa; ...; if (xa) {}" to be valid as "if any bit in xa is non-zero, then...".
I wrote a long essay about the now somewhat boring subject in:
digitalmars.D/11757

--anders

I read it (and carried forward copies of it). I have been around since last Feb, and have written a good bit of D code. I just don't post much. Did you know D code can randomly deal and find the winner in over 20,000 7-player hands of Texas Hold'em per second while keeping stats? (On my 700 MHz w2k laptop.) Makes Monte Carlo a fun thing. It is 5 times that fast to deal and evaluate 1-player hands.
BTW; I am using Mono for my C# needs. Not that they are many.

Oct 14 2004
prev sibling parent Sjoerd van Leent <svanleent wanadoo.nl> writes:
Anders F Björklund wrote:
 larry cowan wrote:
 
 Type safety as in C# is more likely first in "D 2.0"

No, not "like in C#", type safety is not a MS concept!

Isn't it? Right after they invented the Internet ? :-)

As a matter of fact, while MS was doing a presentation on Win95, they clearly stated that they didn't believe in the Internet. So they did sure *NOT* invent it!
Oct 14 2004
prev sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
[snip]
 Barring that, the same kludge as in C99 would have to do:

 module std.stdbool;
 alias bit bool;
 const bool true = 1;
 const bool false = 0;

These definitions can go in object.d where "alias bit bool" already exists. That way "true" and "false" can be removed as keywords and the code in src/dmd/parse.d

 case TOKtrue:
     e = new IntegerExp(loc, 1, Type::tbit);
     nextToken();
     break;

(similarly for TOKfalse) can be removed. If someone redefines true then the old true can be obtained by cast(bit)1.

The only reason I can see for keeping true/false as keywords is to help editors that do syntax highlighting of keywords. But that is a pretty weak reason. Another possible reason is that it makes the strict boolean people a tad happier to have true/false as keywords. [snip]
Oct 14 2004
parent reply Anders F Björklund <afb algonet.se> writes:
Ben Hinkle wrote:

 The only reason I can see for keeping true/false as keywords is to help
 editors that do syntax highlighting of keywords. But that is a pretty weak
 reason. Another possible reason is that it makes the strict boolean people a
 tad happier to have true/false as keywords.

I think they should go together... Either bool/true/false are *all* keywords (as C++), or none are (as C). And since I think they're good, best would be to add the missing bool ?

The strange part is that "bit" does have some pseudo-boolean qualities. For instance, it behaves strangely when cast from a bigger size integer:
 void main()
 {
   uint i = cast(uint) 0xFFFFFFFF00000000;
   ushort s = cast(ushort) 0xFFFF0000;
   ubyte b = cast(ubyte) 0xFF00;
   bit t = cast(bit) 0xFE;

   printf("%ld\n", i);
   printf("%d\n", s);
   printf("%d\n", b);
   printf("%d\n", t ? 1 : 0);
 }

Also shown by:
 void main()
 {
    bit b;
    for (int i = -2; i <= +2; i++) {
      b = cast(bit) i;
      printf("%+d = %.*s\n", i, b ? "true" : "false");
    }
 }

-2 = true
-1 = true
+0 = false
+1 = true
+2 = true

That sounds like boolean (i != 0) and not like a bit (i & 1) to me!

This, and the fact that "true" and "false" were of the bit type makes me think that the built-in D type bit really is our long lost *bool* type ?

And that bit isn't an integer type at all, but instead a boolean type... Which makes it even more puzzling why the name "bit" was chosen for it ?

--anders
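
The two readings contrasted here, boolean (i != 0) versus bit (i & 1), can be put side by side. This is a minimal sketch for a current D compiler, where the built-in type is spelled `bool` rather than `bit`. The function names `asBool`, `asBit` and `viaCast` are hypothetical and chosen only for this illustration.

```d
// "Boolean" reading: any non-zero value is true.
bool asBool(int i)
{
    return i != 0;
}

// "Bit" reading: keep only the lowest-order binary digit.
bool asBit(int i)
{
    return (i & 1) != 0;
}

// What an explicit cast does in current D: the same test as asBool.
bool viaCast(int i)
{
    return cast(bool) i;
}
```

For i = -2 and i = +2 the two readings disagree: `asBool` gives true while `asBit` gives false, since the lowest bit of both values is 0. The output listed above matches the `asBool` column.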
Oct 14 2004
parent reply Sean Kelly <sean f4.ca> writes:
In article <ckmqnk$8c8$1 digitaldaemon.com>,
Anders F Björklund says...
-2 = true
-1 = true
+0 = false
+1 = true
+2 = true

That sounds like boolean (i != 0) and not like a bit (i & 1) to me!

This is a bit weird now that I think about it. If "bit" is supposed to represent a bit, then shouldn't 2 be false, 3 be true, etc? I grant that this would likely be a significant source of bugs, but logic suggests that bit should behave the same as any other unsigned integer type.
This, and the fact that "true" and "false" were of the bit type makes me
think that the built-in D type bit really is our long lost *bool* type ?

And that bit isn't an integer type at all, but instead a boolean type...
Which makes it even more puzzling why the name "bit" was chosen for it ?

Just about. The only reason I can think to call it "bit" is that it implies a storage size (which is accurate in arrays).

Sean
Oct 14 2004
parent reply Anders F Björklund <afb algonet.se> writes:
Sean Kelly wrote:

And that bit isn't an integer type at all, but instead a boolean type...
Which makes it even more puzzling why the name "bit" was chosen for it ?

Just about. The only reason I can think to call it "bit" is that it implies a storage size (which is accurate in arrays).

This is not always a good thing. Sometimes "char" or "int" are preferable over "bit", for implementation performance reasons. If "bool" was kept as an *abstract* concept, the compiler could then choose a representation that was optimal for the actual task ?

--anders
Oct 14 2004
parent reply Sean Kelly <sean f4.ca> writes:
In article <ckmv1a$dck$1 digitaldaemon.com>,
Anders F Björklund says...
Sean Kelly wrote:

And that bit isn't an integer type at all, but instead a boolean type...
Which makes it even more puzzling why the name "bit" was chosen for it ?

Just about. The only reason I can think to call it "bit" is that it implies a storage size (which is accurate in arrays).

This is not always a good thing. Sometimes "char" or "int" are preferable over "bit", for implementation performance reasons. If "bool" was kept as an *abstract* concept, the compiler could then choose a representation that was optimal for the actual task ?

It seems that one design aspect of D is that all primitive types have well-defined storage attributes. byte is 8 bits, int is 32 bits, etc. C/C++ make no such claims for any type. While part of me does wonder if this is going to be a problem at some point for D (there are some rare systems where a byte is not 8 bits), I think the overall benefit is a good one. And as I said in the other thread, if packing of bits in arrays were removed as a feature in D, then the token name should change to "bool." This is the principal difference in my mind. Sean
Oct 14 2004
parent reply larrycowan <larrycowan_member pathlink.com> writes:
In article <ckn0ii$era$1 digitaldaemon.com>, Sean Kelly says...
In article <ckmv1a$dck$1 digitaldaemon.com>,
Anders F Björklund says...
Sean Kelly wrote:

And that bit isn't an integer type at all, but instead a boolean type...
Which makes it even more puzzling why the name "bit" was chosen for it ?

Just about. The only reason I can think to call it "bit" is that it implies a storage size (which is accurate in arrays).

This is not always a good thing. Sometimes "char" or "int" are preferable over "bit", for implementation performance reasons.


What? I can't think of any logic instructions using more cpu cycles than any equivalent arithmetics. If moving them, and they are already in groups of 8 bits or 2^n multiples of that, the compiler can optimize the same way you would write the code to do it yourself. If you are doing it so the code looks simpler, that's not performance. If bits are properly supported in arrays, the compiler will hide this complexity and it will be faster than using chars or ints in place of bits.
If "bool" was kept as an *abstract* concept, the compiler could
then choose a representation that was optimal for the actual task ?

It seems that one design aspect of D is that all primitive types have well-defined storage attributes. byte is 8 bits, int is 32 bits, etc. C/C++ make no such claims for any type. While part of me does wonder if this is going to be a problem at some point for D (there are some rare systems where a byte is not 8 bits),

1, 4, 9, 10, 12, 16, 18, ... and IBM 9000 series 10-digit words with no bits. I think we can dismiss these for our purposes - they are either obsolete or special purpose cpus, or are likely to have too small a memory to remember D, or C, (tiny C? - possibly).
             I think the overall benefit is a good one.  And as I said in the
other thread, if packing of bits in arrays were removed as a feature in D, then
the token name should change to "bool."  This is the principal difference in my
mind.


Sean

What is wrong with packing of bits in arrays? Please explain. Flags are essentially boolean units and are quite commonly stored 8 to a byte. If you are saying they should have an actual size which includes many unused bits, it seems to me that this must be because addressing of these is not unique without a bit number in a byte. Slicing could also require shifts, perhaps of more than a register size.

To get a set of 16 contiguous flags I should have to have an array of two packed structs with 8 individual bits and 8 different bit-position names? At this point I'd just do away with bit arrays entirely. (I wouldn't ever do this, I would just use logic ops to handle the bits individually and forget bit arrays.)

I can't ever see a need to have 1 bit per byte bit arrays rather than byte arrays using only the low order bit.

There are machine instructions which test or set individual bits, the only problem I see is consistency in the addressing between bits and all larger uniquely addressable (using only the address field of instructions) units such as bytes, chars, longs, etc. This cannot be done away with the way you apparently want without losing more than we gain.

All the above is not to say that there aren't problems in the way our compiler and other C-derived compilers handle bits and bit arrays. Addressing has always been a kludge. Provide a good, self-consistent, understandable and complete solution and the world will thank you.
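
The fallback mentioned above, using logic ops to handle the bits individually with flags stored 8 to a byte, might look roughly like this. It is a minimal sketch for a current D compiler. The names `testFlag` and `setFlag` and the bit numbering (bit 0 of byte 0 first) are assumptions made for the illustration, not anything specified in this thread.

```d
// Flags are stored 8 to a byte; `index` counts from bit 0 of byte 0.
bool testFlag(const(ubyte)[] store, size_t index)
{
    return (store[index / 8] & (1 << (index % 8))) != 0;
}

// Set or clear a single flag without disturbing its neighbours.
void setFlag(ubyte[] store, size_t index, bool value)
{
    immutable ubyte mask = cast(ubyte)(1 << (index % 8));
    if (value)
        store[index / 8] |= mask;
    else
        store[index / 8] &= cast(ubyte) ~mask;
}
```

A flag is identified by an index rather than by an address, which is the addressing inconsistency raised above: there is no pointer that names bit 3 of a byte.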
Oct 14 2004
next sibling parent Sean Kelly <sean f4.ca> writes:
larrycowan wrote:
 In article <ckn0ii$era$1 digitaldaemon.com>, Sean Kelly says...

            I think the overall benefit is a good one.  And as I said in the
other thread, if packing of bits in arrays were removed as a feature in D, then
the token name should change to "bool."  This is the principal difference in my
mind.

What is wrong with packing of bits in arrays? Please explain. Flags are essentially boolean units and are quite commonly stored 8 to a byte. If you are saying they should have an actual size which includes many unused bits, it seems to me that this must be because addressing of these is not unique without a bit number in a byte. Slicing could also require shifts, perhaps of more than a register size.

These are the reasons.
 To get a set of 16 contiguous flags I should have to have an array of two
 packed structs with 8 individual bits and 8 different bit-position names?
 At this point I'd just do away with bit arrays entirely.  (I wouldn't ever
 do this, I would just use logic ops to handle the bits individually and
 forget bit arrays.)

This was the alternative suggestion. Move to "bool" which is always one byte and let a library class handle the packing when needed.
 I can't ever see a need to have 1 bit per byte bit arrays rather than byte
 arrays using only the low order bit.

Agreed.
 There are machine instructions which test or set individual bits, the only
 problem I see is consistency in the addressing between bits and all larger
 uniquely addressable (using only the address field of instructions) units
 such as bytes, chars, longs, etc.  This cannot be done away with the way you
 apparently want without losing more than we gain.

Thus the quandary. I personally don't have any strong preference for either side of the issue. Packed bit arrays have the potential to make (robust) template code more difficult to write, they prohibit addressing elements in bit arrays, and they impose restrictions on slicing. At the same time, they are a convenient feature, it does make logical sense to represent a two-state value as a single bit when possible, and it isn't particularly difficult for a programmer to work around the problems (I already do this quite effortlessly with vector<bool>). But as I said in the other thread: from an idealistic perspective, is the tradeoff worthwhile? Walter certainly thinks it is. I'm undecided. Others don't like it one bit :)
 All the above is not to say that there aren't problems in the way our compiler
 and other C-derived compilers handle bits and bit arrays.  Addressing has
 always been a kludge.  Provide a good, self-consistent, understandable and
 complete solution and the world will thank you.

Definitely. If there were a straightforward and robust way to address these few problems, I would be quite happy.

Sean
Oct 14 2004
prev sibling parent reply Anders F Björklund <afb algonet.se> writes:
larry cowan wrote:

This is not always a good thing. Sometimes "char" or "int" are 
preferable over "bit", for implementation performance reasons.


What? I can't think of any logic instructions using more cpu cycles than any equivalent arithmetics. If moving them, and they are already in groups of 8 bits or 2^n multiples of that, the compiler can optimize the same way you would write the code to do it yourself. If you are doing it so the code looks simpler, that's not performance. If bits are properly supported in arrays, the compiler will hide this complexity and it will be faster than using chars or ints in place of bits.

I just know that in my GCC 3.4, sizeof(bool) equals sizeof(int) ? In C99/C++, the compiler can choose any representation of a "bool". (at least they standardized on a common name for it, in <stdbool.h>)

Just like an "int" is allowed to be anything between short and long, although a lot of new code just ignores 16-bit computers and then breaks if sizeof(int) is not the same as sizeof(long). Thus the <stdint.h>

That being said, a "bit" looks like a perfect choice for bool if only it can have the pointer to it taken and be implemented reasonably sane. But if it walks like a bool and quacks like a bool, why not name it bool?

--anders
Oct 14 2004
parent reply Anders F Björklund <afb algonet.se> writes:
I earlier wrote:
 That being said, a "bit" looks like a perfect choice for bool if only
 it can have the pointer to it taken and be implemented reasonably sane.
 But if it walks like a bool and quacks like a bool, why not name it bool?

More bit / bool inconsistencies: http://www.digitalmars.com/d/htomodule.html
 A little global search and replace will take care of renaming the C types
 to D types. The following table shows a typical mapping for 32 bit C code:
 C type 	D type

 bool 	int

http://www.digitalmars.com/d/ctod.html
 C to D types

       bool               =>        bit 

So at one place, bool is an integer. Of 32-bits, nonetheless. Hoping that the C compiler chose int for bool, and not char... (the first page should probably read: bool => char or int, just like wchar_t currently says: wchar_t => wchar or dchar)

In the other, bool is a bit (which now has a "boolean" cast operator). So the current "bit" type is definitely a bool, being 1 bit in size. (Ignoring whether or not it's a good thing that it converts to int) Why the type name was changed to reflect the storage is still unclear?

A questionable feature is whether a language *needs* any sub-byte integer types, such as e.g. bits (1 bit) and nybbles (4 bits)... ? Possibly because they could (potentially) be useful in packed arrays to avoid having to use any bit-operators (the nybble macros are nasty).

But D's practice of calling the boolean type "bit" is *not good*. The sooner it can be changed, the better! It could *work* the same.

Here is one idea: rename the D keyword from "bit" back to bool again. And then change the definition in object.d to read: "alias bool bit;"

Then all we need is some bool type-safety... Which could be added now, and checked later ? (i.e. start converting code, hope for D 2.0)

--anders

PS. Does anyone have any good usages of bit arrays they like to share? (and I am *not* talking about bool[] arrays, like in "sieve.d")
Oct 15 2004
parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Anders F Björklund wrote:
 ...
 
 But D's practice of calling the boolean type "bit" is *not good*.
 The sooner it can be changed, the better! It could *work* the same.
 
 ...

But currently D's type IS bit. bool is an alias for convenience only. And currently bit arrays are packed, and thus bit[8] is equivalent to a bit addressable byte. This can be quite useful, but since I don't know your boundaries about how you think of bit and how you think of bool, I can't claim that there isn't an overlap, but to my mind, if you care how it's packed, then it's a bit type, otherwise, bool is probably a decent label.
Oct 15 2004
next sibling parent reply Anders F Björklund <afb algonet.se> writes:
Charles Hixson wrote:

 But D's practice of calling the boolean type "bit" is *not good*.
 The sooner it can be changed, the better! It could *work* the same.

But currently D's type IS bit. bool is an alias for convenience only.

And I think that is just plain wrong. It should be the other way around.

If it's all about storage, then we might just as well do away with all the character types too ? char => ubyte, wchar => ushort, dchar => uint

No, my request is for "bool" to be a proper language keyword in D.
 And currently bit arrays are packed, and thus bit[8] is equivalent to a 
 bit addressable byte.  This can be quite useful, but since I don't know 
 your boundaries about how you think of bit and how you think of bool, I 
 can't claim that there isn't an overlap, but to my mind, if you care how 
 it's packed, then it's a bit type, otherwise, bool is probably a decent 
 label.

I think that using a single bit for a boolean is an elegant solution, even if it does have a lot of pain implied on the implementation front...

As for my own "boundaries", I happen to think that:

- "bool" is a boolean type, that can have one of values "true" or "false"
  when you assign an integer to a bool, the end result is: b = (i != 0)

- "bit" is an integer type, size 1 bit, that can contain numbers 1 and 0
  when you assign an integer to a bit, the end result is: b = (i & 1)

Some people (me included) think that integers and booleans should not be assignable at all, but that's another discussion... (about type-safety)

And even if you use a "char" or an "int" to store values of type "bool", in the end it can only hold two result values: zero and non-zero... :-)

Currently D has a boolean type, called "bit". And *that's* confusing. (*especially* for all C99 and C++ programmers that are used to "bool")

--anders
Oct 16 2004
parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Anders F Björklund wrote:
 Charles Hixson wrote:
 
...


I think that using a single bit for a boolean is an elegant solution, even if it does have a lot of pain implied on the implementation front... As for my own "boundaries", I happen to think that: -"bool" is a boolean type, that can have one of values "true" or "false" when you assign an integer to a bool, the end result is: b = (i != 0) -"bit" is an integer type, size 1 bit, that can contain numbers 1 and 0 when you assign an integer to a bit, the end result is: b = (i & 1) ... --anders

Ah. To me NEITHER is an integer type. Bool comes out of Boolean Logic, and is either true or false. Bit comes out of Information Theory, and is the smallest individual piece of information (and thus, strictly speaking, it should be typeless).

But historically arrays of flip-flops (hardwired bit representations) were ganged together to form addressable hunks of memory that were called words (I'm leaving out a lot of steps) and the words were sub-divided into characters...usually with a 6-bit/character code. IBM later changed this into a 32 bit word and 8 bit byte. But do notice that bytes are being built out of bits. So historically, there is an affinity between bits and pieces of bytes, also pieces of numbers. The bit itself is typeless. Having a type of bit is thus slightly anomalous, but not much so if you can use arrays of them to build bytes and words. (That you do this via a union is a bit peculiar, but not overwhelmingly so.)

So to me neither one is an integer type. It would probably be quite reasonable if D so defined them. But bit arrays should be the packed chunks of "smallest piece of information", and have other types that are buildable from them. (I.e., bit[8] should have a to_char method, bit[16] should have a to_ushort method, possibly also a to_short method.) I know that this isn't expressible within the normal context of D, but conceptually, that's how it *OUGHT* to be.
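
The idea of building a byte out of eight one-bit values can be sketched as a free function. This is a minimal illustration for a current D compiler, with `bool[8]` standing in for the `bit[8]` discussed here. The name `toUbyte` and the choice of element 0 as the lowest-order bit are assumptions of the sketch, not part of the suggestion above.

```d
// Assemble a byte from eight one-bit values. Element 0 is taken as
// the lowest-order bit (an arbitrary choice made for this sketch).
ubyte toUbyte(const bool[8] bits)
{
    ubyte result = 0;
    foreach (i, b; bits)
    {
        if (b)
            result |= cast(ubyte)(1 << i);
    }
    return result;
}
```

With elements 0 and 6 set, the result is 1 + 64 = 65, which is the character code for 'A' once cast to char.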
Oct 17 2004
parent reply Anders F Björklund <afb algonet.se> writes:
Charles Hixson wrote:

 Ah.  To me NEITHER is an integer type.
 Bool comes out of Boolean Logic, and is either true or false.
 Bit comes out of Information Theory, and is the smallest individual 
 piece of information (and thus, strictly speaking, it should be 
 typeless).  But historically arrays of flip-flops [..snip...]

Back at the gates and voltages, it seems... (feels like a kid again)

Anyway, it is still strange that "true" and "false" are of type "bit"? (I think they should either be plain 1 or 0 like in C, or type "bool")

I have mostly given in and up in the bit / bool wars, just wanted the D keywords to be somewhat consistent (preferably same as C99 or C++)

And I bet you are "thrilled" that *both* are integers in D, then ? (I must confess I hadn't heard the "bit should not be int" before)

--anders
Oct 17 2004
parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Anders F Björklund wrote:
 Charles Hixson wrote:
 
 Ah.  To me NEITHER is an integer type.
 Bool comes out of Boolean Logic, and is either true or false.
 Bit comes out of Information Theory, and is the smallest individual 
 piece of information (and thus, strictly speaking, it should be 
 typeless).  But historically arrays of flip-flops [..snip...]

 Back at the gates and voltages, it seems... (feels like a kid again)
 
 Anyway, it is still strange that "true" and "false" are of type "bit"?
 (I think they should either be plain 1 or 0 like in C, or type "bool")
 I have mostly given in and up in the bit / bool wars, just wanted the
 D keywords to be somewhat consistent (preferably the same as C99 or C++)
 
 And I bet you are "thrilled" that *both* are integers in D, then ?
 (I must confess I hadn't heard the "bit should not be int" before)
 
 --anders

Well, in Ada one could define a type (as opposed to a sub-type) and get a totally separate type. typedef doesn't seem to separate things quite as thoroughly. But I'll give up a lot for a language that's easier to use (even C++ fits here!) and has a garbage collector. (I never understood why Ada didn't include one.)

I don't really worry about int, etc., which will probably bite me some day, but hasn't so far. (I tend to think of bits as being inherently typeless...but it doesn't bother me to use bit as a boolean value. I never think of either bit or bool as integers.)
Oct 17 2004
parent Derek Parnell <derek psych.ward> writes:
On Sun, 17 Oct 2004 17:59:53 -0700, Charles Hixson wrote:

 Anders F Björklund wrote:
 Charles Hixson wrote:
 
 Ah.  To me NEITHER is an integer type.
 Bool comes out of Boolean Logic, and is either true or false.
 Bit comes out of Information Theory, and is the smallest individual 
 piece of information (and thus, strictly speaking, it should be 
 typeless).  But historically arrays of flip-flops [..snip...]



[snip]
  I never think of either bit or bool as integers.)

Come to think of it, the only time I really use bits is when I'm mapping RAM structures. I can't think of why I'd use a bit or a bit array as a data item for anything else. I guess I don't get out as often as I should ;-)

-- 
Derek
Melbourne, Australia
18/10/2004 11:19:57 AM
Oct 17 2004
prev sibling parent reply Anders F Björklund <afb algonet.se> writes:
Charles Hixson wrote:

 And currently bit arrays are packed, and thus bit[8] is equivalent to a 
 bit addressable byte.  This can be quite useful, [...]

I'm not sure how I would use that ? My first attempt crashed the compiler:

 void main()
 {
     union U { ubyte bite; bit[8] bits; }

     U.bite = 0x80;
     foreach (bit b; U.bits) {
         printf(" %d", b ? 1 : 0);
     }
 }

 bitarray.d: In function `main':
 bitarray.d:5: internal compiler error: in d_expand_expr, at d/d-glue.cc:3000

The second attempt shows different sizes:

 void main()
 {
     ubyte bite;
     bit[8] bits;

     printf("byte: %d\n", bite.sizeof);
     printf("bits: %d\n", bits.sizeof);
 }

 byte: 1
 bits: 4

But maybe bit arrays have some other use I'm not aware of ? (and what about the "nybble" type ? nybble[2] hex_byte;)

Just that it all feels so Pascal to me: "Bytes as bit sets". Isn't sub-byte manipulation what the bit operators are for ?

Currently, I just view bit[] as a nice hack to store arrays of (1-bit) flags in an efficient format, just as char[] is a way of storing arrays of (32-bit) Unicode code points efficiently... (then again, one *could* just use ubyte[] and dchar[] too ?)

--anders
Oct 16 2004
next sibling parent reply Ben Hinkle <bhinkle4 juno.com> writes:
Anders F Björklund wrote:

 Charles Hixson wrote:
 
 And currently bit arrays are packed, and thus bit[8] is equivalent to a
 bit addressable byte.  This can be quite useful, [...]

 I'm not sure how I would use that ? My first attempt crashed the compiler:
 
 void main()
 {
     union U { ubyte bite; bit[8] bits; }
 
     U.bite = 0x80;
     foreach (bit b; U.bits) {
         printf(" %d", b ? 1 : 0);
     }
 }
 
 bitarray.d: In function `main':
 bitarray.d:5: internal compiler error: in d_expand_expr, at
 d/d-glue.cc:3000


It works fine if you replace
    union U { ubyte bite; bit[8] bits; }
with
    union U_t { ubyte bite; bit[8] bits; }
    U_t U;
Your code was trying to access a type like a variable. The compiler shouldn't error, though, so I'd go ahead and post that example to D.bugs so that Walter can try to fix it.
Oct 16 2004
parent reply Anders F Björklund <afb algonet.se> writes:
Ben Hinkle wrote:

 It works fine if you replace
     union U { ubyte bite; bit[8] bits; }
 with
     union U_t { ubyte bite; bit[8] bits; }
     U_t U;
 Your code was trying to access a type like a variable. The compiler
 shouldn't error, though, so I'd go ahead and post that example to D.bugs so
 that Walter can try to fix it.

Argh, you are right... Thinking in C, I guess. (or not at all) Also discovered that bit arrays are little-endian. (LSB first) --anders
Oct 16 2004
parent reply Sean Kelly <sean f4.ca> writes:
Anders F Björklund wrote:
 
 Also discovered that bit arrays are little-endian. (LSB first)

They are on x86 hardware anyway. I would be surprised if this were preserved for big-endian machines. Sean
Oct 16 2004
parent Anders F Björklund <afb algonet.se> writes:
Sean Kelly wrote:

 Also discovered that bit arrays are little-endian. (LSB first)

 They are on x86 hardware anyway. I would be surprised if this were
 preserved for big-endian machines.

This was on a big-endian machine... But I only meant within the byte, that is: bit[0] sets 0x01 and bit[7] sets 0x80 of the byte...

If you do things like unions or casts, then it'll probably preserve the native endianness of the platform (since it just copies the bytes)

--anders
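
The within-byte ordering described here (index 0 maps to 0x01, index 7 maps to 0x80) can be sketched as follows. This is a C++ illustration, not D; `mask_for_index` and `unpack_lsb_first` are hypothetical names, and the sketch only covers a single byte, so it says nothing about how multi-byte values are ordered on a given platform.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Mask selecting bit `index` of a byte, counting from the least significant
// bit: index 0 -> 0x01, index 7 -> 0x80.
inline std::uint8_t mask_for_index(unsigned index) {
    assert(index < 8);
    return static_cast<std::uint8_t>(1u << index);
}

// Expands one byte into eight flags, least significant bit first.
inline std::array<bool, 8> unpack_lsb_first(std::uint8_t byte) {
    std::array<bool, 8> bits{};
    for (unsigned i = 0; i < 8; ++i) {
        bits[i] = (byte & mask_for_index(i)) != 0;
    }
    return bits;
}
```

Because the mask is computed with shifts on the value rather than by reinterpreting memory, the result is the same on little-endian and big-endian machines.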
Oct 17 2004
prev sibling parent reply Ben Hinkle <bhinkle4 juno.com> writes:
 The second attempt shows different sizes:
 
 void main()
 {
     ubyte bite;
     bit[8] bits;
 
     printf("byte: %d\n", bite.sizeof);
     printf("bits: %d\n", bits.sizeof);
 }
 
 byte: 1
 bits: 4


Looks like bit arrays are packed into ints not bytes - so the sizeof will always be a multiple of 4. Where does this matter?
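
If that reading is right, the storage size follows from rounding the bit count up to whole 32-bit words. A minimal sketch of the arithmetic, in C++ with the hypothetical names `words_needed` and `packed_sizeof`; it models the observed sizes and is not taken from the compiler:

```cpp
#include <cstddef>
#include <cstdint>

// Number of 32-bit words needed to hold `bit_count` packed bits.
constexpr std::size_t words_needed(std::size_t bit_count) {
    return (bit_count + 31) / 32;
}

// Storage in bytes, assuming word-granular packing as guessed in the post.
constexpr std::size_t packed_sizeof(std::size_t bit_count) {
    return words_needed(bit_count) * sizeof(std::uint32_t);
}

// bit[8] rounds up to one whole word, matching the "bits: 4" output above.
static_assert(packed_sizeof(8) == 4, "bit[8] should occupy one 32-bit word");
// One bit past a word boundary costs a whole extra word.
static_assert(packed_sizeof(33) == 8, "bit[33] should occupy two words");
```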
Oct 16 2004
parent reply Anders F Björklund <afb algonet.se> writes:
Ben Hinkle wrote:

void main()
{
    ubyte bite;
    bit[8] bits;

    printf("byte: %d\n", bite.sizeof);
    printf("bits: %d\n", bits.sizeof);
}


byte: 1
bits: 4


Looks like bit arrays are packed into ints not bytes - so the sizeof will always be a multiple of 4. Where does this matter?

Not at all, I guess... Changing to int shows that you are right:

 void main()
 {
     uint bite;
     bit[32] bits;

     printf("int: %d\n", bite.sizeof);
     printf("bit: %d\n", bits.sizeof);
 }

 int: 4
 bit: 4

--anders
Oct 16 2004
parent reply Anders F Björklund <afb algonet.se> writes:
I wrote:

 Looks like bit arrays are packed into ints not bytes - so the
 sizeof will always be a multiple of 4. Where does this matter?

Not at all, I guess...

It wasn't obvious to me what the sizes were, so I checked the current D implementation... Bit variables are stored in a byte, unless they occur in arrays. Then they are instead packed into blocks of 32 bits, for speed :
 void Bits::set(unsigned bitnum)
 {
     data[bitnum / 32] |= 1 << (bitnum & 31);
 }
 
 void Bits::clear(unsigned bitnum)
 {
     data[bitnum / 32] &= ~(1 << (bitnum & 31));
 }
 
 int Bits::test(unsigned bitnum)
 {
     return data[bitnum / 32] & (1 << (bitnum & 31));
 }

That should be "bitnum >> 5", but the compiler should be smart enough to optimize the division into a shift anyway...

 struct bit_dynamic { bit[] bits; }
 struct bit_static { bit[2] bits; }
 struct bit_fields { bit a; bit b; }

 bit_dynamic.sizeof: 8

 bit_static.sizeof: 4
 bit_fields.sizeof: 2
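
To try the quoted set/clear/test routines in isolation, here is a self-contained C++ sketch. The constructor, the bounds checks and `storage_bytes` are assumptions added for illustration and are not from the compiler source. It uses the shift form and an unsigned 1 for the mask, so that setting bit 31 does not shift into the sign bit of an int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Standalone stand-in for the quoted Bits routines: bits are packed into
// 32-bit words, word index = bitnum >> 5, position in word = bitnum & 31.
class Bits {
public:
    explicit Bits(std::size_t bit_count)
        : size_(bit_count), data_((bit_count + 31) / 32, 0u) {}

    void set(std::size_t bitnum) {
        assert(bitnum < size_);
        data_[bitnum >> 5] |= std::uint32_t{1} << (bitnum & 31);
    }

    void clear(std::size_t bitnum) {
        assert(bitnum < size_);
        data_[bitnum >> 5] &= ~(std::uint32_t{1} << (bitnum & 31));
    }

    bool test(std::size_t bitnum) const {
        assert(bitnum < size_);
        return ((data_[bitnum >> 5] >> (bitnum & 31)) & 1u) != 0;
    }

    // Bytes of backing storage: always a multiple of 4.
    std::size_t storage_bytes() const {
        return data_.size() * sizeof(std::uint32_t);
    }

private:
    std::size_t size_;
    std::vector<std::uint32_t> data_;
};
```

For unsigned operands `bitnum / 32` and `bitnum >> 5` give the same result, so the choice between them is a matter of style once the optimizer is involved.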

So if you union a ubyte and a bit[8], the union occupies 4 bytes. Same (!) if you union a uint and a bit[32]: 4 bytes. (ulong with bit[64] is expected 8 bytes)

Pointers to bits are funny, they *do* work if you access a byte-stored single bit var - but not if you try to access a single bit in an array ?
 void main()
 {
   static bit[32] t = 0;
 
   t[5] = 1;
   for (int i = 0; i < 32; i++) {
     bit *p = &(t[i]);
     printf("%d ", (*p) ? 1 : 0);
   }
   printf("\n");
 
   static ubyte[32] b = 0;

   b[5] = 1;
   for (int i = 0; i < 32; i++) {
     ubyte *p = &(b[i]);
     printf("%d ", (*p) ? 1 : 0);
   }
   printf("\n");
 }

Probably for the same reasons that bit[] slices have problems: pointers only know of whole bytes (while they would need to know a 0-7 bit offset too) ?

--anders
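
One way to picture the missing piece is a "fat" bit pointer that carries a byte address plus a 0-7 offset. The C++ sketch below is only an illustration of that idea (it resembles the proxy reference used by `std::vector<bool>`); `BitRef` and `bit_at` are hypothetical names and this is not one of the proposals made for D.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "bit pointer": the address of the containing byte plus the
// position of the bit inside that byte.
struct BitRef {
    std::uint8_t* byte;  // address of the byte that holds the bit
    unsigned offset;     // bit position within the byte, 0..7 (LSB first)

    bool get() const { return ((*byte >> offset) & 1u) != 0; }

    void set(bool value) const {
        const std::uint8_t mask = static_cast<std::uint8_t>(1u << offset);
        if (value) {
            *byte = static_cast<std::uint8_t>(*byte | mask);
        } else {
            *byte = static_cast<std::uint8_t>(*byte & ~mask);
        }
    }
};

// Builds a reference to bit `index` of a packed array of bytes.
inline BitRef bit_at(std::uint8_t* bytes, unsigned index) {
    assert(bytes != nullptr);
    return BitRef{bytes + index / 8, index % 8};
}
```

Such a reference is twice the size of a plain pointer and cannot be passed where an ordinary pointer is expected, which may be part of why this is awkward to build into the language.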
Oct 17 2004
parent Sean Kelly <sean f4.ca> writes:
Anders F Björklund wrote:
 Pointers to bits are funny, they *do* work if
 you access a byte-stored single bit var - but
 not if you try access a single bit in an array ?

 Probably for the same reasons that bit[] slices
 has problems, pointers only knows of whole bytes
 (while they need to know a 0-7 bit offset too) ?

Yup. There have been some proposals for addressing this issue (no pun intended) but all seemed a bit kludgy. Sean
Oct 17 2004
prev sibling parent reply Andy Friesen <andy ikagames.com> writes:
larrycowan wrote:
 One more time...
 
 [...smart things...]

bit should probably be done away with entirely: the only thing that makes it useful at all right now is bit arrays, and they can easily be implemented with a struct and a few overloaded operators. Moreover, the fact that bit arrays are packed creates all sorts of special cases and warts (like the behaviour of the .sizeof property), all for a feature that is hardly ever actually put to use! (Phobos itself only uses them in the sense that it implements certain bit[] operations, like bit[].reverse) I would very much like a stricter boolean, for which no implicit conversions exist, but, either way, there isn't a very compelling reason to keep bit at all. -- andy
Oct 14 2004
parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Andy Friesen wrote:
 larrycowan wrote:
 
 One more time...

 [...smart things...]

bit should probably be done away with entirely: the only thing that makes it useful at all right now is bit arrays, and they can easily be implemented with a struct and a few overloaded operators. Moreover, the fact that bit arrays are packed creates all sorts of special cases and warts (like the behaviour of the .sizeof property), all for a feature that is hardly ever actually put to use! (Phobos itself only uses them in the sense that it implements certain bit[] operations, like bit[].reverse) I would very much like a stricter boolean, for which no implicit conversions exist, but, either way, there isn't a very compelling reason to keep bit at all. -- andy

I can see the desire for stricter booleans, and I can see arguments for limiting the ways in which bit arrays can be used (perhaps they could be required to be allocated in groups of, say, 32). But packed bit arrays are so useful that doing away with them seems...., well, just very undesirable.

Limit them if you must. Make it so that slicing isn't implemented on them. Make them a library class. But don't eliminate them.

I rarely want to slice a bit array, but I frequently have need for one. (One CAN get around this by masking and shifting, but that's a quite error-prone approach. At least, *I* find it quite error-prone.)

OTOH, I can certainly see making it a special library class, with constructors that take, say, the other basic types, and methods that return the value as packed into an array of one (or several) of the other basic types.
Oct 14 2004
parent Andy Friesen <andy ikagames.com> writes:
Charles Hixson wrote:
 Andy Friesen wrote:
 
 ?????
 I can see the desire for stricter booleans, and I can see arguments for 
 limiting the ways in which bit arrays can be used (perhaps they could be 
 required to be allocated in groups of, say, 32).  But packed bit arrays 
 are so useful that doing away with them seems...., well, just very 
 undesirable.
 
 Limit them if you must.  Make it so that slicing isn't implemented on 
 them.  Make them a library class.  But don't eliminate them.
 
 I rarely want to slice a bit array, but I frequently have need for one.  
 (One CAN get around this by masking and shifting, but that's a quite 
 error-prone approach.  At least, *I* find it quite error-prone.)
 
 OTOH, I can certainly see making it a special library class, with 
 constructors that take, say, the other basic types, and methods that 
 return the value as packed into an array of one (or several) of the 
 other basic types.

I wasn't arguing that bitsets should be eradicated from existence. I was referring merely to the fact that they are currently built into the core language itself. :)

It would be very easy to write a little struct that implements the indexing and slicing operators and behaves like bit[] in pretty much every way.

-- andy
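
A minimal sketch of what such a library type could look like, written in C++ rather than D. The class name `BitArray` and its members are hypothetical, and the slice here copies its range, whereas a D slice would refer to the original storage, so this only approximates bit[] behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical library type standing in for the built-in bit[].
class BitArray {
public:
    explicit BitArray(std::size_t length)
        : length_(length), words_((length + 31) / 32, 0u) {}

    std::size_t length() const { return length_; }

    // Read-only indexing.
    bool operator[](std::size_t i) const {
        assert(i < length_);
        return ((words_[i / 32] >> (i % 32)) & 1u) != 0;
    }

    // Write access; a named method keeps the sketch free of proxy objects.
    void assign(std::size_t i, bool value) {
        assert(i < length_);
        const std::uint32_t mask = std::uint32_t{1} << (i % 32);
        if (value) {
            words_[i / 32] |= mask;
        } else {
            words_[i / 32] &= ~mask;
        }
    }

    // Copying slice of the half-open range [begin, end). The start need not
    // be word aligned because the bits are moved one at a time.
    BitArray slice(std::size_t begin, std::size_t end) const {
        assert(begin <= end && end <= length_);
        BitArray result(end - begin);
        for (std::size_t i = begin; i < end; ++i) {
            result.assign(i - begin, (*this)[i]);
        }
        return result;
    }

private:
    std::size_t length_;
    std::vector<std::uint32_t> words_;
};
```

Copying bit by bit sidesteps the alignment problem that makes built-in bit[] slices awkward, at the cost of losing the aliasing that ordinary array slices provide.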
Oct 15 2004