
digitalmars.D - Back to basics: integer types

reply "Lionello Lunesu" <lionello.lunesu crystalinter.remove.com> writes:
Hi..

Another thought I had...

I like the type size-suffix for fixed-sized integers (int8, int16, int32, 
etc) and this got me thinking about the most basic of types: bit.

Wouldn't it be nice if "bit[8]" behaved the same as byte / int8, and bit[16] 
the same as the current int16, etc. ? I probably need more arguments than 
"it's nice" to get this one through, but I'll think about it some more while 
you guys fire back at me.

And, am I correct in understanding that subsequent bits in a struct will 
always be packed?

struct something
{
    bit    first;
    bit    second;
};

I.e. what's the size of this struct?

If they will be packed, then I've got a (new?) argument for "bool": 
sizeof(bool) being 1 (mostly for compatibility with old code/structs, as I 
don't think there's any speed gain). Well, passing "second" from the 
above struct to a method taking bool/bit would need a shift operation that 
wouldn't be needed if "second" were a real bool.

-- 
Lio.

-- Get the CACert root certificate (and a personal one) at 
http://cacert.org/ 
Nov 23 2004
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Lionello Lunesu wrote:

 I like the type size-suffix for fixed-sized integers (int8, int16, int32, 
 etc) and this got me thinking about the most basic of types: bit.

The supported type name "spelling" is: int8_t, int16_t, int32_t...
See the D module: import std.stdint;
 Wouldn't it be nice if "bit[8]" behaved the same as byte / int8, and bit[16] 
 the same as the current int16, etc. ? I probably need more arguments than 
 "it's nice" to get this one through, but I'll think about it some more while 
 you guys fire back at me.

bit[#] is optimized for speed, so it's a multiple of 4 bytes (32 bits).
Had it been optimized for storage, it would have worked as you wrote.
 And, am I correct in understanding that subsequent bits in a struct will 
 always be packed?

No. You could have tested it, too ?
 struct something
 {
     bit    first;
     bit    second;
 };
 
 I.e. what's the size of this struct?

2 bytes.
 If they will be packed, then I've got a (new?) argument for "bool": 
 sizeof(bool) being 1 (mostly for compatibility with old code/structs as I 
 don't think there's any speed gain. Well, passing "bit second" from the 
 above struct to a method taking bool/bit would need a shift operation not 
 needed if "second" was a real bool).

There is no "real" boolean type in D.
(Like one that only takes true/false?)

sizeof(bool) differs between 0.125, 1 or 4 depending on who you ask.
The different implementations are fondly known as bit, wbit and dbit.

Otherwise you need sub-byte pointers to access &something.second,
and that's a huge pain to implement and to pass around the block.

Personally, I'm not sure that bit *is* the fundamental data type...

Similar to how the triangle is the fundamental type in 3D graphics,
I think the byte has become the fundamental data type. For speed ?

--anders

PS. Bit is not an integer data type in D. It's a pseudo-bool data type.
See my previous rants and the endless discussions on this newsgroup.
Nov 23 2004
next sibling parent reply Georg Wrede <Georg_member pathlink.com> writes:
Anders F Björklund wrote:
 Lionello Lunesu wrote:
 There is no "real" boolean type in D.
 (Like one that only takes true/false?)
 
 
 sizeof(bool) differs between 0.125, 1 or 4 depending on who you ask.
 The different implementations are fondly known as bit, wbit and dbit
 
 Otherwise you need sub-byte pointers to access &something.second,
 and that's a huge pain to implement and to pass around the block.
 
 
 Personally, I'm not sure that bit *is* the fundamental data type...
 
 Similar to how the triangle is the fundamental type in 3D graphics,
 I think the byte has become the fundamental data type. For speed ?

 PS.
 Bit is not an integer data type in D. It's a pseudo-bool data type.
 See my previous rants and the endless discussions on this newsgroup.

I agree. While bit may be the fundamental unit of information, it is
_not_, I repeat, _not_ the fundamental data type in computers. Contrary
to public belief (IT professionals and laymen alike), these are not the
same thing.

Computer hardware manufacturers have known this for more than 50 years.
If they hadn't, then the memory address space would be a single bit string.

Corollary (slightly off the issue)

This would lead to two things: (a) memory pointers could point to any
bit, and (b) the different CPU sizes (today 16, 32 and 64 bits) could
actually be _any_ number of bits. Thus, we could have a 17 bit CPU or a
67 bit CPU. Probably these computers would fetch several bits at once,
say, those 17 bits on a "17-bit" CPU. But not necessarily.

Opcodes could be of varying size without wasting memory. The CPU would
know where (at which bit address) to fetch the next instruction. Opcodes
(and entire applications) would exist to process bit streams, a la audio
CD reading. (Ever seen '1 bit processor' written on a CD player?)

Most probably different parts of a CPU would be capable of handling
different size "words", depending on the usage. For example, US-ASCII
operations hardware would be on 7 bit circuits, and graphics operations
hardware 24 bits.

Oh, and an amazing thing with the memory: arbitrary size "DMA like" data
transfers (say between main memory and graphics memory) would take only
one clock cycle for the _entire_ transfer !!!! <diabolical laughter> This
I leave as an exercise to the reader to figure out! </>

Yes, very nice. But practical issues in hardware design and
implementation have led us to use a fixed word size and memory addressing
by the byte. (Actually, the word size usually determines the fetched
number of bits, so on some hardware architectures the _actual_
granularity of addressing is not the same as the _apparent_ granularity.
E.g. we may have a CPU that is only capable of addressing memory "as 32
bit entities", but then, using various kludges, it pretends to be able to
do it by the byte (=8 bits). And this only because systems programmers'
minds are stuck to byte as the "only" size.)

Back to the issue at hand

It is regrettable that this was not thought of in the early days of D.
As a result we have this Bit data type that lingers, haunting the entire
D community.

Would I become "the God in D world", on the First Day I would throw away
the Bit data type.

On the second day I would create a Boolean data type, and write in the
specification that its size would be architecture dependent, and suggest
the fastest integral size would be used. (E.g., probably 32 bits on
Pentiums.)

Then I would seriously consider forbidding int casts to/from Boolean.
You'd have to write

    myBoolean = ( myInt != 0 );
    myInt2 = myBoolean ? 1 : 0;

I would not disapprove of arrays of Booleans.

I would not even disallow union wizardry with int on top of packed
Boolean, but I would seriously discourage it.

Of course the compiler writer would be free to use whatever intermediate
values in Boolean calculations etc., but the programmer should perceive
Booleans as distinct from 1 and 0.

And of course, programmers would still be free to do bitwise logical ops
between ints. But that is _really_ unrelated to this Boolean issue.
Isn't it!
Nov 23 2004
next sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <cnvusn$1hh3$1 digitaldaemon.com>, Georg Wrede says...
(Actually, the word
size usually determines the fetched number of bits, so
on some hardware architectures the _actual_ granularity
of adressing is not the same as the _appearing_ granularity.
E.g. we may have a CPU that is only capable of adressing
memory "as 32 bit entities", but then, using various
kludges, it pretends to be able to do it by the byte
(=8bits). And this only because systems programmers'
minds are stuck to byte as the "only" size.)

Technically, since a byte is defined as "the smallest addressable unit" in C/C++, I would expect a byte on such machines to be 32 bits. But then I suppose we've gotten so used to 8 bit bytes that perhaps the kludges would be expected? In any case, D imposes size constraints on all types, so in our world a byte would be 8 bits.
On the second day I would create a Boolean data type, and
write in the specification that its size would be
architecture dependent, and suggest the fastest integral
size would be used. (Eg, probably 32 bits on Pentiums.)

...which would make the boolean type the only D data type with an indeterminate size. I would almost rather require that such a type always be 8 or 32 bits, unless all size restrictions were eliminated (and I don't expect that to happen).

Sean
Nov 23 2004
parent reply Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Sean Kelly wrote:
 Technically, since a byte is defined as "the smallest addressable unit" in
 C/C++, I would expect a byte on such machines to be 32 bits.  But then I
suppose
 we've gotten so used to 8 bit bytes that perhaps the kludges would be expected?
 In any case, D imposes size constraints on all types, so in our world a byte
 would be 8 bits.

Well, to quibble with your quibble: on most 32bit machines, 8bit values are individually addressable. Even on architectures that require aligned reads & writes, they often use byte-addressing, and byte-level primitives are sometimes available.
Nov 23 2004
parent Sean Kelly <sean f4.ca> writes:
In article <co03i7$1nmp$1 digitaldaemon.com>, Russ Lewis says...
Sean Kelly wrote:
 Technically, since a byte is defined as "the smallest addressable unit" in
 C/C++, I would expect a byte on such machines to be 32 bits.  But then I
suppose
 we've gotten so used to 8 bit bytes that perhaps the kludges would be expected?
 In any case, D imposes size constraints on all types, so in our world a byte
 would be 8 bits.

Well, to quibble with your quibble: on most 32bit machines, 8bit values are individually addressable. Even on architectures that require aligned reads & writes, they often use byte-addressing, and byte-level primitives are sometimes available.

Understood. I was actually referring to Georg's imaginary 32 bit machine.

Sean
Nov 23 2004
prev sibling parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Georg Wrede wrote:

 Back to the issue at hand

A somewhat dead and beaten issue, but I'll bite...
 It is regrettable that this was not thought of in the
 early days of D. As a result we have this Bit data type
 that lingers haunting the entire D community.

I don't think the issue is that it wasn't thought of... Maybe bit array slices got a tad harder to implement, but.
 Would I become "the God in D world", on the First Day, I
 would throw away the Bit data type.

The current Deity being the proud bit creator, of course :-)
 On the second day I would create a Boolean data type, and
 write in the specification that its size would be
 architecture dependent, and suggest the fastest integral
 size would be used. (Eg, probably 32 bits on Pentiums.)

Going with the main "fixed-size" idiom of D, one could probably fix it as, say, 1 byte too - without crying. Either way, the main issue is that there would be a boolean type that only accepts true and false values.
 Then I would seriously consider forbidding int casts
 to/from Boolean. You'd have to write
 
 myBoolean = ( myInt != 0 );
 myInt2 = myBoolean ? 1 : 0;

Just as in Java... (the language, not the VM or OS)

Also required: myBoolean = ( myPointer != null );

Note: some people *hate* these "extra" steps
 I would not disapprove of arrays of Booleans.
 
 I would not even disallow union wizardry with int on top
 of packed Boolean, but I would seriously discourage it.

I'm assuming you mean packed bit arrays. Java does those with an Object class, but language support isn't all bad. If it hadn't been for the pain to get the pointers and slices implemented, that is. Probably not worth it, IMHO.
 Of course the compiler writer would be free to use whatever
 intermediate values in Boolean calculations etc., but the
 programmer should percieve Booleans as distinct from 1 and 0.

Heck, I'd settle for true and false being distinct from 1 and 0.
(currently they are hard-coded as bit literals in the D compiler)

But the fact remains that D chose to side with C and C++ on logic.
(or actually more with C, since there is no "bool" primitive type)
And since zero is false and non-zero is true, who's really boolean?
 And of course, programmers would still be free to do bitwise
 logical ops between ints. But that is _really_ unrelated to
 this Boolean issue. Isn't it!

It is. A related issue, however, is what type conditionals etc. expect.
Had there been a boolean type, it's natural to make them require it.

IMHO, Java got this right. (again, the language, not the religion...)
See http://java.sun.com/docs/books/jls/ for that language's details.

Anyway, D is a whole other language and has another such specification.
And in *that* spec, there's three string types and three boolean types.

string types:  char[], wchar[], dchar[]
boolean types: bit, wbit (byte), dbit (int)

You get to pick one of each, to optimize for the task at hand. It's not
a fatal flaw... More of a "missed opportunity", really?

"alias bit bool; alias char[] string; goto on_with_our_lives;"

See also http://www.prowiki.org/wiki4d/wiki.cgi?BooleanNotEquBit

--anders
Nov 23 2004
prev sibling parent reply "Lionello Lunesu" <lionello.lunesu crystalinter.remove.com> writes:
Hi..

 bit[#] is optimized for speed, so it's a multiple of 4 bytes (32 bits)

From what I recall, accessing a byte is just as fast as accessing a dword (on
>=386). Accessing a word (short, 16-bit) will result in a penalty. Is speed 
really _the_ reason for a bit[8] (fixed-sized array) to be allocating 
32 bits (independent of the data that follows it) ?

For bit[] (dynamically sized) I understand it. It will definitely be faster, because accessing a dword is not slower and you do less growing.
 No. You could have tested it, too ?

Yes yes, I should have tested it. But I'm one of those "patient lurkers": reading this newsgroup while VC++ is busy building. Anyway, I did not want to end up discussing differences between dmd and gdc, and they might differ in this respect, so I thought I'd keep it theoretical. (Also that I could have tested :-/ )
 There is no "real" boolean type in D.
 (Like one that only takes true/false?)

Oh no, I've started the discussion again? I really didn't mean to.
 Otherwise you need sub-byte pointers to access &something.second,
 and that's a huge pain to implement and to pass around the block.

Yes, agreed. That's one point where bool and bit would differ: bool* being valid, but bit* not.
 Personally, I'm not sure that bit *is* the fundamental data type...

Theoretically it still is. I mean, it's the basic granularity; there's nothing smaller than a bit. The quark's equivalent in information technology.
 Similar to how the triangle is the fundamental type in 3D graphics,
 I think the byte has become the fundamental data type. For speed ?

Ah, practically yes, but triangles are not theoretically the fundamental type of 3D geometry. Recall ray-tracers; they hardly ever used triangles. But in practice (especially in OGL/D3D-type rendering) triangles are the basis. This shouldn't prevent anybody from discussing 'the basics' though.

I also think that 3D rendering based on triangles is just an efficient hack, a modern approximation to ray-tracing. Wouldn't be surprised if we get rid of them in the future (NURBS and other parameter-based geometries are gaining ground again).

Lionello.
Nov 24 2004
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Lionello Lunesu wrote:

bit[#] is optimized for speed, so it's a multiple of 4 bytes (32 bits)

From what I recall, accessing a byte is just as fast as accessing a dword (on >=386). Accessing a word (short, 16-bit) will result in a penalty. Is speed really _the_ reason for a bit[8] (fixed-sized array) to be allocating 32 bits (independent of the data that follows it) ? For bit[] (dynamically sized) I understand it. It will definitely be faster, because accessing a dword is not slower and you do less growing.

It is faster since you can load/store 32 bits at one time, when setting
several. Otherwise it would load/store 8 bits at a time, by using bytes.

Like this C code: (where "/ 32" converts to ">> 5")

     unsigned *data;   /* bit storage, allocated in 32-bit words */

     void bits_set(unsigned bitnum)
     {
         data[bitnum / 32] |= 1u << (bitnum & 31);
     }

     void bits_clear(unsigned bitnum)
     {
         data[bitnum / 32] &= ~(1u << (bitnum & 31));
     }

     int bits_test(unsigned bitnum)
     {
         return data[bitnum / 32] & (1u << (bitnum & 31));
     }

bit[] is similar, just has a count parameter and then a pointer to data. (which is again allocated in 32-bit chunks, just as it is for bit[#]) You would have to ask Walter what the *real* reason is, though ? http://www.digitalmars.com/d/arrays.html just says:
  Bit vectors can be constructed:
 
 	bit[10] x;		// array of 10 bits
 
 The amount of storage used up is implementation dependent.
 Implementation Note: on Intel CPUs it would be rounded up to the next 32 bit size.
 
 	x.length		// 10, number of bits
 	x.size			// 4,  bytes of storage
 	
 
 So, the size per element is not (x.size / x.length).

It's similar on PowerPC CPUs, by the way. At least with gdc today.
 Oh no, I've started the discussion again? I really didn't mean to.

It goes back under the rug then... "Never mind the little bump" :-)
Otherwise you need sub-byte pointers to access &something.second,
and that's a huge pain to implement and to pass around the block.

Yes, agreed. That's one point where bool and bit would differ: bool* being valid, but bit* not.

bit* is valid too. It's just that it doesn't work properly with arrays? But you can take the address of a bit field, and use it as "inout" too. (the dirty secret of course being that single bits are stored as bytes)
Personally, I'm not sure that bit *is* the fundamental data type...

Theoretically it still is. I mean, it's the basic granularity, there's nothing smaller than bit. The quarks equivalent in information technology.

No argument there. But the difference between theory and practice is a lot greater in practice than in theory... ;-)
Similar to how the triangle is the fundamental type in 3D graphics,
I think the byte has become the fundamental data type. For speed ?

Ah, practically yes, but triangles are not theoretically the fundamental type of 3D geometry. Recall ray-tracers, they hardly ever used triangles. But in practice (especially in OGL/D3D-type rendering) triangles are the basis. This shouldn't prevent anybody from discussing 'the basics' though.

Of course not. Pixels are still the end result... Just that when drawing fast 3D animations, one does it a tri/quad at a time in hardware - instead of submitting each pixel and rendering in software. Similarly, we can access bits using bytes or ints - plus the various bit operators (and, or, shifts) - and have the hardware accumulate the bits in memory ?
 I also do think that 3D rendering based on triangles is just a efficient 
 hack, a modern approximation to ray-tracing. Wouldn't be surprised if we get 
 rid of them in the future (NURBS and other parameter-based geometries are 
 gaining ground again).

There is some cool stuff possible with "Cg",
http://developer.nvidia.com/page/cg_main.html (C for Graphics)

Apple uses it in CoreImage :
http://www.apple.com/macosx/tiger/core.html

No triangles there, just plain old pixels... As a side note, they are
now using float pixels (i.e. RGB is now float values, and not integers).
This works better when doing several filters.

But that's not really about "integer types" is it,

--anders
Nov 24 2004
parent reply "Lionello Lunesu" <lionello.lunesu crystalinter.remove.com> writes:
Hi.

 It is faster since you can load/store 32 bits at one time, when setting 
 several. Otherwise it would load/store 8 bits at a time, by using bytes.

Yeah, but if the array-size is fixed, as is the case with bit[8]? Why would you even want to fetch 32-bits, if you know you only need 8 and fetching a byte is just as fast?
 You would have to ask Walter what the *real* reason is, though ?

Actually, I was hoping he'd provide an answer to my question :-S (Q: why does bit[8] allocate 32 bits?)
 Oh no, I've started the discussion again? I really didn't mean to.


It's amazing how the bool/string-issue pops up every other week. Walter would just have to hard-code the aliases, just to get it over with :-)
 Yes, agreed. That's one point where bool and bit would differ: bool* 
 being valid, but bit* not.

bit* is valid too. It's just that it doesn't work properly with arrays? But you can take the address of a bit field, and use it as "inout" too. (the dirty secret of course being that single bits are stored as bytes)

You mean stored as ints? If not, that would be even stranger: bit allocating a byte, but bit[x] allocating at least 4 bytes. Meaning there's a difference between bit and bit[1]. Is bit[1] allowed at all? Is an array of 1 in general allowed? I will test it myself :-)
 But the difference between theory and practice
 is a lot greater in practice than in theory... ;-)

How true. When designing anything, the "theory" should be checked once in a while though.
 There is some cool stuff possible with "Cg",

Hmm, "D for Graphics" comes to mind. Actually, considering their syntax, they might as well rename it :-)
 But that's not really about "integer types" is it,

Hmm, lets declare a floating-bit, fbit, and float as fbit[32] :-) Don't reply to this, it's a joke.. Lionello.
Nov 25 2004
next sibling parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Lionello Lunesu wrote:

Oh no, I've started the discussion again? I really didn't mean to.

It goes back under the rug then... "Never mind the little bump" :-)

It's amazing how the bool/string-issue pops up every other week. Walter would just have to hard-code the aliases, just to get it over with :-)

Well, that "sort of" worked for the bool alias ?
(only problem being keywords vs. literals vs. types, digitalmars.D/11757)

The main problem with "alias char[] string;" is the existence of
wchar[]... I suggested "ustring": digitalmars.D/11821

It kinda fits in with the whole int/uint theme of D, even if "u" here
would mean unicode and not unsigned. (both types are Unicode, but
char[] is ascii-biased)

Or maybe "str" and "ustr" would be more orthogonal ?
(keeping with: integer -> int, character -> char, etc)

Just as long as it isn't "String", since that's a class.
 You mean stored as ints? If not, that would be even stranger: bit allocating 
 a byte, but bit[x] allocating at least 4 bytes. Meaning there's a difference 
 between bit and bit[1]. Is bit[1] allowed at all? Is an array of 1 in 
 general allowed? I will test it myself :-)

It's allowed, and it's 4 bytes long. (bit[1])

bit[] is 8 bytes, since it has an int length.

bit[0] is also allowed, and is 0 bytes...
(but that could just be a quirk or bug ?)
 Hmm, lets declare a floating-bit, fbit, and float as fbit[32] :-)

Like a fuzzy boolean. "mostly true", "kinda false" --anders
Nov 25 2004
prev sibling parent Georg Wrede <Georg_member pathlink.com> writes:
Lionello Lunesu wrote:
It is faster since you can load/store 32 bits at a time, when setting 
several. Otherwise it would load/store 8 bits at a time, by using bytes.

Yeah, but if the array-size is fixed, as is the case with bit[8]? Why would you even want to fetch 32-bits, if you know you only need 8 and fetching a byte is just as fast?

This "hypothetical" 32 bit machine I mentioned earlier - I was actually
referring to Pentiums (among others).

So, allocating 32 bits for bit[8] is natural, since allocating a byte
would not generate faster execution: the Pentium fetches all 32 bits
anyhow because it just cannot address memory in smaller chunks. There
are machine code instructions to "load a byte from address x", etc., but
those really load 32 bits and then throw away the other 24 bits. (The
kludges I was talking about.) So C and other programming languages could
go on believing they are on a byte-addressable machine.

Since this is the case, the bit-array storage size is rounded up to the
nearest 32 bits. That makes writing the compiler easier and the generated
code faster. And it makes generating the code faster, since you don't
have to check for bit-array sizes of (less-than-or-equal-to) 8, 16, 24
bits.

(Of course, things get interesting when we end up having 64 bit
hardware. This is just one of the issues that has to be dealt with in
the 64 bit version of the D compiler.)
Nov 25 2004