
digitalmars.D - Back to basics: integer types

reply "Lionello Lunesu" <lionello.lunesu crystalinter.remove.com> writes:
Hi..

Another thought I had...

I like the type size-suffix for fixed-sized integers (int8, int16, int32, 
etc) and this got me thinking about the most basic of types: bit.

Wouldn't it be nice if "bit[8]" behaved the same as byte / int8, and bit[16] 
the same as the current int16, etc. ? I probably need more arguments than 
"it's nice" to get this one through, but I'll think about it some more while 
you guys fire back at me.

And, am I correct in understanding that subsequent bits in a struct will 
always be packed?

struct something
{
    bit    first;
    bit    second;
};

I.e. what's the size of this struct?

If they will be packed, then I've got a (new?) argument for "bool": 
sizeof(bool) being 1 (mostly for compatibility with old code/structs, as I 
don't think there's any speed gain). Well, passing "second" from the 
above struct to a method taking bool/bit would need a shift operation that 
wouldn't be needed if "second" were a real bool.

-- 
Lio.

-- Get the CACert root certificate (and a personal one) at 
http://cacert.org/ 
Nov 23 2004
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Lionello Lunesu wrote:

 I like the type size-suffix for fixed-sized integers (int8, int16, int32, 
 etc) and this got me thinking about the most basic of types: bit.

The supported type name "spelling" is: int8_t, int16_t, int32_t...
See the D module: import std.stdint;
 Wouldn't it be nice if "bit[8]" behaved the same as byte / int8, and bit[16] 
 the same as the current int16, etc. ? I probably need more arguments than 
 "it's nice" to get this one through, but I'll think about it some more while 
 you guys fire back at me.

bit[#] is optimized for speed, so it's a multiple of 4 bytes (32 bits).
Had it been optimized for storage, it would have worked as you wrote.
 And, am I correct in understanding that subsequent bits in a struct will 
 always be packed?

No. You could have tested it, too ?
 struct something
 {
     bit    first;
     bit    second;
 };
 
 I.e. what's the size of this struct?

2 bytes.
 If they will be packed, then I've got a (new?) argument for "bool": 
 sizeof(bool) being 1 (mostly for compatibility with old code/structs as I 
 don't think there's any speed gain. Well, passing "bit second" from the 
 above struct to a method taking bool/bit would need a shift operation not 
 needed if "second" was a real bool).

There is no "real" boolean type in D.
(Like one that only takes true/false?)

sizeof(bool) differs between 0.125, 1 or 4 depending on who you ask.
The different implementations are fondly known as bit, wbit and dbit.

Otherwise you need sub-byte pointers to access &something.second,
and that's a huge pain to implement and to pass around the block.

Personally, I'm not sure that bit *is* the fundamental data type...

Similar to how the triangle is the fundamental type in 3D graphics,
I think the byte has become the fundamental data type. For speed ?

--anders

PS. Bit is not an integer data type in D. It's a pseudo-bool data type.
See my previous rants and the endless discussions on this newsgroup.
Nov 23 2004
next sibling parent reply Georg Wrede <Georg_member pathlink.com> writes:
Anders F Björklund wrote:
 Lionello Lunesu wrote:
 There is no "real" boolean type in D.
 (Like one that only takes true/false?)
 
 
 sizeof(bool) differs between 0.125, 1 or 4 depending on who you ask.
 The different implementations are fondly known as bit, wbit and dbit
 
 Otherwise you need sub-byte pointers to access &something.second,
 and that's a huge pain to implement and to pass around the block.
 
 
 Personally, I'm not sure that bit *is* the fundamental data type...
 
 Similar to how the triangle is the fundamental type in 3D graphics,
 I think the byte has become the fundamental data type. For speed ?

 PS.
 Bit is not an integer data type in D. It's a pseudo-bool data type.
 See my previous rants and the endless discussions on this newsgroup.

I agree. While bit may be the fundamental unit of information, it is
_not_, I repeat, _not_ the fundamental data type in computers. Contrary
to public belief (IT professionals and laymen alike), these are not the
same thing.

Computer hardware manufacturers have known this for more than 50 years.
If they hadn't, then the memory address space would be a single bit string.

Corollary (slightly off the issue)

This would lead to two things: (a) memory pointers could point to any
bit, and (b) the different CPU sizes (today 16, 32 and 64 bits) could
actually be _any_ number of bits. Thus, we could have a 17 bit CPU or a
67 bit CPU. Probably these computers would fetch several bits at once,
say, those 17 bits on a "17-bit" CPU. But not necessarily.

Opcodes could be of varying size without wasting memory. The CPU would
know where (at which bit address) to fetch the next instruction. Opcodes
(and entire applications) would exist to process bit streams, a la audio
CD reading. (Ever seen '1 bit processor' written on a CD player?)

Most probably different parts of a CPU would be capable of handling
different size "words", depending on the usage. For example, US-ASCII
operations hardware would be on 7 bit circuits, and graphics operations
hardware 24 bits.

Oh, and an amazing thing with the memory: arbitrary size "DMA like" data
transfers (say between main memory and graphics memory) would take only
one clock cycle for the _entire_ transfer !!!! <diabolical laughter> This
I leave as an exercise to the reader to figure out! </>

Yes, very nice. But practical issues in hardware design and
implementation have led us to use a fixed word size and memory addressing
by the byte. (Actually, the word size usually determines the fetched
number of bits, so on some hardware architectures the _actual_
granularity of addressing is not the same as the _apparent_ granularity.
E.g. we may have a CPU that is only capable of addressing memory "as 32
bit entities", but then, using various kludges, it pretends to be able to
do it by the byte (=8 bits). And this only because systems programmers'
minds are stuck to byte as the "only" size.)

Back to the issue at hand

It is regrettable that this was not thought of in the early days of D.
As a result we have this Bit data type that lingers, haunting the entire
D community.

Would I become "the God in D world", on the First Day I would throw away
the Bit data type.

On the second day I would create a Boolean data type, and write in the
specification that its size would be architecture dependent, and suggest
the fastest integral size would be used. (E.g., probably 32 bits on
Pentiums.)

Then I would seriously consider forbidding int casts to/from Boolean.
You'd have to write

    myBoolean = ( myInt != 0 );
    myInt2 = myBoolean ? 1 : 0;

I would not disapprove of arrays of Booleans.

I would not even disallow union wizardry with int on top of packed
Boolean, but I would seriously discourage it.

Of course the compiler writer would be free to use whatever intermediate
values in Boolean calculations etc., but the programmer should perceive
Booleans as distinct from 1 and 0.

And of course, programmers would still be free to do bitwise logical ops
between ints. But that is _really_ unrelated to this Boolean issue.
Isn't it!
Nov 23 2004
next sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <cnvusn$1hh3$1 digitaldaemon.com>, Georg Wrede says...
(Actually, the word
size usually determines the fetched number of bits, so
on some hardware architectures the _actual_ granularity
of adressing is not the same as the _appearing_ granularity.
E.g. we may have a CPU that is only capable of adressing
memory "as 32 bit entities", but then, using various
kludges, it pretends to be able to do it by the byte
(=8bits). And this only because systems programmers'
minds are stuck to byte as the "only" size.)

Technically, since a byte is defined as "the smallest addressable unit" in C/C++, I would expect a byte on such machines to be 32 bits. But then I suppose we've gotten so used to 8 bit bytes that perhaps the kludges would be expected? In any case, D imposes size constraints on all types, so in our world a byte would be 8 bits.
On the second day I would create a Boolean data type, and
write in the specification that its size would be
architecture dependent, and suggest the fastest integral
size would be used. (Eg, probably 32 bits on Pentiums.)

...which would make the boolean type the only D data type with an indeterminate size. I would almost rather require that such a type always be 8 or 32 bits, unless all size restrictions were eliminated (and I don't expect that to happen).

Sean
Nov 23 2004
parent reply Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Sean Kelly wrote:
 Technically, since a byte is defined as "the smallest addressable unit" in
 C/C++, I would expect a byte on such machines to be 32 bits.  But then I
suppose
 we've gotten so used to 8 bit bytes that perhaps the kludges would be expected?
 In any case, D imposes size constraints on all types, so in our world a byte
 would be 8 bits.

Well, to quibble with your quibble: on most 32bit machines, 8bit values are individually addressable. Even on architectures that require aligned reads & writes, they often use byte-addressing, and byte-level primitives are sometimes available.
Nov 23 2004
parent Sean Kelly <sean f4.ca> writes:
In article <co03i7$1nmp$1 digitaldaemon.com>, Russ Lewis says...
Sean Kelly wrote:
 Technically, since a byte is defined as "the smallest addressable unit" in
 C/C++, I would expect a byte on such machines to be 32 bits.  But then I
suppose
 we've gotten so used to 8 bit bytes that perhaps the kludges would be expected?
 In any case, D imposes size constraints on all types, so in our world a byte
 would be 8 bits.

Well, to quibble with your quibble: on most 32bit machines, 8bit values are individually addressable. Even on architectures that require aligned reads & writes, they often use byte-addressing, and byte-level primitives are sometimes available.

Understood. I was actually referring to Georg's imaginary 32 bit machine.

Sean
Nov 23 2004
prev sibling parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Georg Wrede wrote:

 Back to the issue at hand

A somewhat dead and beaten issue, but I'll bite...
 It is regrettable that this was not thought of in the
 early days of D. As a result we have this Bit data type
 that lingers haunting the entire D community.

I don't think the issue is that it wasn't thought of... Maybe bit array slices got a tad harder to implement, but.
 Would I become "the God in D world", on the First Day, I
 would throw away the Bit data type.

The current Deity being the proud bit creator, of course :-)
 On the second day I would create a Boolean data type, and
 write in the specification that its size would be
 architecture dependent, and suggest the fastest integral
 size would be used. (Eg, probably 32 bits on Pentiums.)

Going with the main "fixed-size" idiom of D, one could probably fix it as, say, 1 byte too - without crying. Either way, the main issue is that there would be a boolean type that only accepts true and false values.
 Then I would seriously consider forbidding int casts
 to/from Boolean. You'd have to write
 
 myBoolean = ( myInt != 0 );
 myInt2 = myBoolean ? 1 : 0;

Just as in Java... (the language, not the VM or OS)

Also required: myBoolean = ( myPointer != null );

Note: some people *hate* these "extra" steps
 I would not disapprove of arrays of Booleans.
 
 I would not even disallow union wizardry with int on top
 of packed Boolean, but I would seriously discourage it.

I'm assuming you mean packed bit arrays. Java does those with an Object class, but language support isn't all bad. If it hadn't been for the pain to get the pointers and slices implemented, that is. Probably not worth it, IMHO.
 Of course the compiler writer would be free to use whatever
 intermediate values in Boolean calculations etc., but the
 programmer should percieve Booleans as distinct from 1 and 0.

Heck, I'd settle for true and false being distinct from 1 and 0.
(currently they are hard-coded as bit literals in the D compiler)

But the fact remains that D chose to side with C and C++ on logic.
(or actually more with C, since there is no "bool" primitive type)
And since zero is false and non-zero is true, who's really boolean?
 And of course, programmers would still be free to do bitwise
 logical ops between ints. But that is _really_ unrelated to
 this Boolean issue. Isn't it!

It is. A related issue, however, is what type conditionals etc. expect.
Had there been a boolean type, it's natural to make them require it.

IMHO, Java got this right. (again, the language, not the religion...)
See http://java.sun.com/docs/books/jls/ for that language's details.

Anyway, D is a whole other language and has another such specification.
And in *that* spec, there's three string types and three boolean types.

string types:  char[], wchar[], dchar[]
boolean types: bit, wbit (byte), dbit (int)

You get to pick one of each, to optimize for the task at hand. It's not
a fatal flaw... More of a "missed opportunity", really?

"alias bit bool; alias char[] string; goto on_with_our_lives;"

See also http://www.prowiki.org/wiki4d/wiki.cgi?BooleanNotEquBit

--anders
Nov 23 2004
prev sibling parent reply "Lionello Lunesu" <lionello.lunesu crystalinter.remove.com> writes:
Hi..

 bit[#] is optimized for speed, so it's a multiple of 4 bytes (32 bits)

From what I recall, accessing a byte is just as fast as accessing a dword (on
>=386). Accessing a word (short, 16-bit) will result in a penalty. Is speed 
really _the_ reason for a bit[8] (fixed-sized array) to be allocating 
32 bits (independent of the data that follows it) ?

For bit[] (dynamically sized) I understand it. It will definitely be faster, because accessing a dword is not slower and you do less growing.
 No. You could have tested it, too ?

Yes yes, I should have tested it. But I'm one of those "patient lurkers": reading this newsgroup while VC++ is busy building. Anyway, I did not want to end up discussing differences between dmd and gdc, and they might differ in this respect, so I thought I'd keep it theoretical. (Also that I could have tested :-/ )
 There is no "real" boolean type in D.
 (Like one that only takes true/false?)

Oh no, I've started the discussion again? I really didn't mean to.
 Otherwise you need sub-byte pointers to access &something.second,
 and that's a huge pain to implement and to pass around the block.

Yes, agreed. That's one point where bool and bit would differ: bool* being valid, but bit* not.
 Personally, I'm not sure that bit *is* the fundamental data type...

Theoretically it still is. I mean, it's the basic granularity; there's nothing smaller than a bit. The quark's equivalent in information technology.
 Similar to how the triangle is the fundamental type in 3D graphics,
 I think the byte has become the fundamental data type. For speed ?

Ah, practically yes, but triangles are not theoretically the fundamental type of 3D geometry. Recall ray-tracers; they hardly ever used triangles. But in practice (especially in OGL/D3D-type rendering) triangles are the basis. This shouldn't prevent anybody from discussing 'the basics' though.

I also think that 3D rendering based on triangles is just an efficient hack, a modern approximation to ray-tracing. Wouldn't be surprised if we get rid of them in the future (NURBS and other parameter-based geometries are gaining ground again).

Lionello.
Nov 24 2004
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Lionello Lunesu wrote:

bit[#] is optimized for speed, so it's a multiple of 4 bytes (32 bits)

From what I recall, accessing a byte is just as fast as accessing a dword (on >=386). Accessing a word (short, 16-bit) will result in a penalty. Is speed really _the_ reason for a bit[8] (fixed-sized array) to be allocating 32 bits (independent of the data that follows it) ? For bit[] (dynamically sized) I understand it. It will definitely be faster, because accessing a dword is not slower and you do less growing.

It is faster since you can load/store 32 bits at one time, when setting
several. Otherwise it would load/store 8 bits at a time, by using bytes.

Like this C code: (where "/ 32" converts to ">> 5")

     unsigned *data;   /* bit storage, allocated in 32-bit words */

     void bits_set(unsigned bitnum)
     {
         data[bitnum / 32] |= 1u << (bitnum & 31);
     }

     void bits_clear(unsigned bitnum)
     {
         data[bitnum / 32] &= ~(1u << (bitnum & 31));
     }

     int bits_test(unsigned bitnum)
     {
         return data[bitnum / 32] & (1u << (bitnum & 31));
     }

bit[] is similar, just has a count parameter and then a pointer to data. (which is again allocated in 32-bit chunks, just as it is for bit[#]) You would have to ask Walter what the *real* reason is, though ? http://www.digitalmars.com/d/arrays.html just says:
  Bit vectors can be constructed:
 
 	bit[10] x;		// array of 10 bits
 
 The amount of storage used up is implementation dependent.
 Implementation Note: on Intel CPUs it would be rounded up to the next 32 bit size.
 
 	x.length		// 10, number of bits
 	x.size			// 4,  bytes of storage
 	
 
 So, the size per element is not (x.size / x.length).

It's similar on PowerPC CPUs, by the way. At least with gdc today.
 Oh no, I've started the discussion again? I really didn't mean to.

It goes back under the rug then... "Never mind the little bump" :-)
Otherwise you need sub-byte pointers to access &something.second,
and that's a huge pain to implement and to pass around the block.

Yes, agreed. That's one point where bool and bit would differ: bool* being valid, but bit* not.

bit* is valid too. It's just that it doesn't work properly with arrays? But you can take the address of a bit field, and use it as "inout" too. (the dirty secret of course being that single bits are stored as bytes)
Personally, I'm not sure that bit *is* the fundamental data type...

Theoretically it still is. I mean, it's the basic granularity, there's nothing smaller than bit. The quarks equivalent in information technology.

No argument there. But the difference between theory and practice is a lot greater in practice than in theory... ;-)
Similar to how the triangle is the fundamental type in 3D graphics,
I think the byte has become the fundamental data type. For speed ?

Ah, practically yes, but triangles are not theoretically the fundamental type of 3D geometry. Recall ray-tracers, they hardly ever used triangles. But in practice (especially in OGL/D3D-type rendering) triangles are the basis. This shouldn't prevent anybody from discussing 'the basics' though.

Of course not. Pixels are still the end result... Just that when drawing fast 3D animations, one does it a tri/quad at a time in hardware - instead of submitting each pixel and rendering in software. Similarly, we can access bits using bytes or ints - plus the various bit operators (and, or, shifts) - and have the hardware accumulate the bits in memory ?
 I also do think that 3D rendering based on triangles is just a efficient 
 hack, a modern approximation to ray-tracing. Wouldn't be surprised if we get 
 rid of them in the future (NURBS and other parameter-based geometries are 
 gaining ground again).

There is some cool stuff possible with "Cg",
http://developer.nvidia.com/page/cg_main.html (C for Graphics)

Apple uses it in CoreImage :
http://www.apple.com/macosx/tiger/core.html

No triangles there, just plain old pixels... As a side note, they are
now using float pixels (i.e. RGB is now float values, and not integers).
This works better when doing several filters.

But that's not really about "integer types" is it,

--anders
Nov 24 2004
parent reply "Lionello Lunesu" <lionello.lunesu crystalinter.remove.com> writes:
Hi.

 It is faster since you can load/store 32 bits at one time, when setting 
 several. Otherwise it would load/store 8 bits at a time, by using bytes.

Yeah, but if the array-size is fixed, as is the case with bit[8]? Why would you even want to fetch 32-bits, if you know you only need 8 and fetching a byte is just as fast?
 You would have to ask Walter what the *real* reason is, though ?

Actually, I was hoping he'd provide an answer to my question :-S (Q: why does bit[8] allocate 32 bits?)
 Oh no, I've started the discussion again? I really didn't mean to.


It's amazing how the bool/string-issue pops up every other week. Walter would just have to hard-code the aliases, just to get it over with :-)
 Yes, agreed. That's one point where bool and bit would differ: bool* 
 being valid, but bit* not.

bit* is valid too. It's just that it doesn't work properly with arrays? But you can take the address of a bit field, and use it as "inout" too. (the dirty secret of course being that single bits are stored as bytes)

You mean stored as ints? If not, that would be even stranger: bit allocating a byte, but bit[x] allocating at least 4 bytes. Meaning there's a difference between bit and bit[1]. Is bit[1] allowed at all? Is an array of 1 in general allowed? I will test it myself :-)
 But the difference between theory and practice
 is a lot greater in practice than in theory... ;-)

How true. When designing anything, the "theory" should be checked once in a while though.
 There is some cool stuff possible with "Cg",

Hmm, "D for Graphics" comes to mind. Actually, considering their syntax, they might as well rename it :-)
 But that's not really about "integer types" is it,

Hmm, lets declare a floating-bit, fbit, and float as fbit[32] :-) Don't reply to this, it's a joke.. Lionello.
Nov 25 2004
next sibling parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Lionello Lunesu wrote:

Oh no, I've started the discussion again? I really didn't mean to.

It goes back under the rug then... "Never mind the little bump" :-)

It's amazing how the bool/string-issue pops up every other week. Walter would just have to hard-code the aliases, just to get it over with :-)

Well, that "sort of" worked for the bool alias ?
(only problem being keywords vs. literals vs. types, digitalmars.D/11757)

The main problem with "alias char[] string;" is the existence of
wchar[]... I suggested "ustring": digitalmars.D/11821

It kinda fits in with the whole int/uint theme of D, even if "u" here
would mean unicode and not unsigned. (both types are Unicode, but
char[] is ascii-biased)

Or maybe "str" and "ustr" would be more orthogonal ?
(keeping with: integer -> int, character -> char, etc)

Just as long as it isn't "String", since that's a class.
 You mean stored as ints? If not, that would be even stranger: bit allocating 
 a byte, but bit[x] allocating at least 4 bytes. Meaning there's a difference 
 between bit and bit[1]. Is bit[1] allowed at all? Is an array of 1 in 
 general allowed? I will test it myself :-)

It's allowed, and it's 4 bytes long. (bit[1])

bit[] is 8 bytes, since it has an int length.

bit[0] is also allowed, and is 0 bytes...
(but that could just be a quirk or bug ?)
 Hmm, lets declare a floating-bit, fbit, and float as fbit[32] :-)

Like a fuzzy boolean. "mostly true", "kinda false" --anders
Nov 25 2004
prev sibling parent Georg Wrede <Georg_member pathlink.com> writes:
Lionello Lunesu wrote:
It is faster since you can load/store 32 bits at a time, when setting 
several. Otherwise it would load/store 8 bits at a time, by using bytes.

Yeah, but if the array-size is fixed, as is the case with bit[8]? Why would you even want to fetch 32-bits, if you know you only need 8 and fetching a byte is just as fast?

This "hypothetical" 32 bit machine I mentioned earlier - I was actually
referring to Pentiums (among others).

So, allocating 32 bits for bit[8] is natural, since allocating a byte
would not generate faster execution: the Pentium fetches all 32 bits
anyhow because it just cannot address memory in smaller chunks. There
are machine code instructions to "load a byte from address x", etc., but
those really load 32 bits and then throw away the other 24 bits. (The
kludges I was talking about.) So C and other programming languages could
go on believing they are on a byte-addressable machine.

Since this is the case, the bit-array storage size is rounded up to the
nearest 32 bits. That makes writing the compiler easier and the generated
code faster. And it makes generating the code faster, since you don't
have to check for bit-array sizes of (less-than-or-equal-to) 8, 16, 24
bits.

(Of course, things get interesting when we end up having 64 bit
hardware. This is just one of the issues that has to be dealt with in
the 64 bit version of the D compiler.)
Nov 25 2004