digitalmars.D.learn - How do I choose the correct primitive?

"Jake Thomas" <jake fake.com> writes:

First, let me say that I am <i>extremely</i> enthused about D. I 
did research on it last year for a project and absolutely fell in 
love with it. But praise should go in another thread...


My question comes down to:
"Does dmd pack non-array primitive variables in memory such that 
they are touching, or are they zero-padded out to the computer's 
native word size?"

I have a fun "little" project I work on when I have time (for 
which D is ridiculously perfect, BTW), and right now I am "merely" 
listing the prototypes of functions that will comprise its API.


On my first go-through of the function prototypes, I thoughtfully 
figured out the smallest primitives I could safely use for inputs 
and outputs. Obviously, when it comes to programming, I'm a 
little OCD - who cares about memory to that degree anymore when 
we have gigabytes of RAM? This might not even come into play on 
the Raspberry Pi.

I also figured that choosing a safe minimum would make the code 
more self-documenting by cueing the reader in to what the expected 
value range for the variable is.

Then I took Architecture & Assembly class. There I learned that 
the load instruction grabs an entire native word size, every 
time, regardless of how many bits your variable takes up.

When we programmed in assembly in that class, for both 
performance and coding ease, we only worked with variables that 
were the native word size.


I found out that it's actually extra work for the processor to 
use values smaller than the native word size: it has to AND off 
the unwanted bits and possibly shift them over.


So, if dmd packs variables together, I would want to always use 
the native word size to avoid that extra work, and I would never 
want to use shorts, ints, or longs. Instead, I'd want to do this:
<code>
version (X86)
{
   alias int native; // ints are 32-bit, the native size in this case.
   alias uint unative;
}

version (X86_64)
{
   alias long native; // longs are 64-bit, the native size in this case.
   alias ulong unative;
}
</code>

And then only use natives, unatives, and booleans (can't avoid 
them) for my primitives.

I really hope this isn't the case because it would make D's 
entire primitive system pointless. In academia, C is often 
scolded for its ints always being the native word size, while 
Java is praised for being consistent from platform to platform. 
But if dmd packs its variables, D is the one that should be 
scolded and C is the one that should be praised, for the same 
reason in reverse.


If, however, dmd always zero-pads its variables so that each load 
instruction only grabs the desired value with no need of extra 
work, I would never have to worry about whether my variable is 
the native word size.

However, this knowledge would still affect my programming:

If I know my code will only ever be compiled for 32-bit machines 
and up, I should never use shorts. Doing so would always waste at 
least 16 bits per short. Even if I think I will never overflow a 
short, why not just take the whole 32 bits? They're allocated for 
the variable anyway; not using those bits would be wasteful.

Also, if I know I don't need any more than 32 bits for a variable, 
I should use an int, never a long. That way, the processor does 
not have to do extra work on a 32-bit machine or a 64-bit machine 
or any wider machine. If I always default to longs like a "good 
academically trained computer scientist fighting crusades against 
hard caps", 32-bit machines (and 64-bit machines still running 
32-bit OSes!!!) would have to do extra work to operate on 64-bit 
values split across two native words.

And lastly, if I absolutely must have more than 32 bits for a 
single value, I have no choice but to use a long.


So, I need to have this question answered to even get past the 
function prototype stage - each answer would result in different 
code.

Thank you very much,
I love D,
Jake

Dec 31 2013

"Rikki Cattermole" <alphaglosined gmail.com> writes:

We have size_t defined as uint on 32-bit and ulong on 64-bit; 
ptrdiff_t is the signed counterpart (int/long).
I don't know how dmd handles it, although you do have the ability 
to align variables.
You may want to consider gdc or ldc rather than dmd, as they have 
better optimization.
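
A minimal sketch of the suggestion above, assuming that "native word size" can be approximated by the pointer width of the target (this holds on common x86 and x86_64 targets, but it is an assumption, not a guarantee). The alias names `native` and `unative` are the hypothetical ones from the question, not anything the language or library provides.

```d
import std.stdio;

// Hypothetical aliases borrowed from the question. Instead of one
// version block per architecture, they follow the pointer-sized types.
alias native = ptrdiff_t;  // signed, pointer-sized
alias unative = size_t;    // unsigned, pointer-sized

// Both aliases track the pointer width of the compilation target.
static assert(native.sizeof == (void*).sizeof);
static assert(unative.sizeof == (void*).sizeof);

void main()
{
    writefln("native is %s bits wide on this target", native.sizeof * 8);
}
```

Compiling the same file for a 32-bit and a 64-bit target should report 32 and 64 respectively, without any version blocks.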

Dec 31 2013

"Jake Thomas" <jake fake.com> writes:

 We have size_t defined as uint on 32bit and ulong on 64bit. 
 ptrdiff_t for int/long.
 I don't know how dmd handles it, although you do have the 
 ability to align variables.
 You may want to consider gdc or ldc more than dmd as they have 
 better optimization.

Sorry for the delayed response - distractions.

Thanks for the tips! I didn't know about aligning or ptrdiff_t.

I looked into and experimented with "align". At least with dmd, 
it seems to only work in structs.

Looking at http://dlang.org/attribute.html, one might conclude 
that this is a bug in dmd - the attributes web page doesn't say 
anywhere that align should be limited to work only within 
structs. The compiler reports no syntax error if align 
is used outside a struct, but the semantics aren't there to back 
it up.


Watch this:
[code]
import std.stdio;

int main()
{
   align(8) int testInt1;
   align(8) int testInt2;
   align(8) int testInt3;

   int* testInt1Locale = &testInt1;
   int* testInt2Locale = &testInt2;
   int* testInt3Locale = &testInt3;


   long testLong1;
   long testLong2;
   long testLong3;

   long* testLong1Locale = &testLong1;
   long* testLong2Locale = &testLong2;
   long* testLong3Locale = &testLong3;

   struct AlignTest
   {
     align (8):
       int intOne;
       int intTwo;
       int intThree;
   }

   AlignTest alignTest;

   int* alignTestIntOneLocale = &alignTest.intOne;
   int* alignTestIntTwoLocale = &alignTest.intTwo;
   int* alignTestIntThreeLocale = &alignTest.intThree;

   writeln(cast(long)testInt2Locale - cast(long)testInt1Locale);
   writeln(cast(long)testInt3Locale - cast(long)testInt2Locale);

   writeln(cast(long)testLong2Locale - cast(long)testLong1Locale);
   writeln(cast(long)testLong3Locale - cast(long)testLong2Locale);

   writeln(cast(long)alignTestIntTwoLocale - 
cast(long)alignTestIntOneLocale);
   writeln(cast(long)alignTestIntThreeLocale - 
cast(long)alignTestIntTwoLocale);
   return 0;
}
[/code]

The above code, compiled with dmd for and on 64-bit Linux, prints 
the following:
[quote]
4 //Ints 1 & 2 are 4 bytes apart, despite specifying them to be 8 bytes apart.
4 //Ints 2 & 3 are 4 bytes apart, despite specifying them to be 8 bytes apart.
8 //Longs 1 & 2 are 8 bytes apart.
8 //Longs 2 & 3 are 8 bytes apart.
8 //Struct's ints 1 & 2 are 8 bytes apart - specified by align attribute.
8 //Struct's ints 2 & 3 are 8 bytes apart - specified by align attribute.
[/quote]
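
The struct part of the experiment above can also be checked at compile time, without taking addresses. This is only a sketch: the struct names `Packed` and `Aligned` are made up for illustration, and the asserted offsets simply restate what the run above printed for dmd.

```d
import std.stdio;

// Three ints with default alignment: they sit back to back.
struct Packed
{
    int intOne;
    int intTwo;
    int intThree;
}

// The same fields, each forced onto an 8-byte boundary.
struct Aligned
{
  align (8):
    int intOne;
    int intTwo;
    int intThree;
}

// .offsetof gives a field's distance from the start of its struct.
static assert(Packed.intTwo.offsetof == 4);
static assert(Packed.intThree.offsetof == 8);
static assert(Aligned.intTwo.offsetof == 8);
static assert(Aligned.intThree.offsetof == 16);

void main()
{
    writeln("Packed.sizeof   = ", Packed.sizeof);
    writeln("Aligned.sizeof  = ", Aligned.sizeof);
    writeln("Aligned.alignof = ", Aligned.alignof);
}
```

If an offset were different on some compiler, the build would fail at the static assert rather than print a surprising number at run time.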



So, short-term, it seems like one would want to use my 
"native/unative" technique. But long-term, hopefully not only 
does this get fixed, but the default behavior for the compiler 
will be to pad things out to the native word size without having 
to specify alignment.

Note: I tried "auto" - they come out as ints on my 64-bit 
architecture, still separated by 4 bytes.


Even with size_t and ptrdiff_t, I would still want to make my own 
aliases "native" and "unative" because size_t is intended for 
array indexes (from what I saw when I briefly read about them) 
and "native" and "unative" sound more generic and readable - to 
me at least - and follow the "u" naming convention.


Stay efficient,
Jake

Jan 03 2014

"TheFlyingFiddle" <kurtyan student.chalmers.se> writes:

On Saturday, 4 January 2014 at 03:14:35 UTC, Jake Thomas wrote:
So, short-term, it seems like one would want to use my
"native/unative" technique. But long-term, hopefully not only
does this get fixed, but the default behavior for the compiler
be to pad things out to the native word size without having to
specify alignment.

According to this
(http://msdn.microsoft.com/en-us/library/windows/hardware/ff5
1499(v=vs.85).aspx)
32-bit registers are automatically zero extended on x64
architecture while 16-bit and 8-bit registers are not.

So 32-bit integers are fast. And since they are smaller than
64 bits, they should benefit from better cache locality. So you
might actually slow things down by using your (u)native sizes.

Andrei has a nice talk about optimizations that talks about this
a bit
(http://shelby.tv/video/vimeo/55639112/facebook-nyc-tech-talk-andrei-alexandrescu-three-optimization-tips-for-c)

How do you folks decide if a number should be an int, long, or
even short?

It depends on what the number is for. In general I stick to (u)int
or size_t, but when values have some special significance, for
example the red color channel or a network port, I use the
corresponding types to signal the bounds on the values.

That being said, I try to minimize the size of my structs as much
as possible.
(Keep in mind that when doing this you should order the fields by
alignment size.)

I tend to use ushorts as indices when I know that an array will
never be larger than ushort.max. But this is only if I use 2 of
them. There is no point in having 1 short if the struct alignment
is 4, since that will only be wasted space anyway.
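
A small sketch of the two points above (field ordering, and one ushort versus two). The struct names are invented for illustration; the sizes assume the usual 4-byte alignment of uint and 2-byte alignment of ushort.

```d
import std.stdio;

// ushort, uint, ushort: padding is needed before the uint and at the end.
struct Unsorted { ushort first; uint middle; ushort last; }

// Same fields, ordered by alignment: no padding at all.
struct Sorted { uint middle; ushort first; ushort last; }

// A single ushort next to a uint saves nothing; the struct is
// padded up to a multiple of 4 bytes anyway.
struct OneShort { uint value; ushort index; }
struct TwoShorts { uint value; ushort indexA; ushort indexB; }

static assert(Unsorted.sizeof == 12);
static assert(Sorted.sizeof == 8);
static assert(OneShort.sizeof == 8);
static assert(TwoShorts.sizeof == 8);

void main()
{
    writeln("Unsorted: ", Unsorted.sizeof, " bytes, Sorted: ", Sorted.sizeof, " bytes");
}
```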

I guess the exact type of variables should remain up in the air
until the whole thing is implemented and tested using different
types?

Yes very much so.

Jan 04 2014

"Jake Thomas" <jake fake.com> writes:

 According to this 
 (http://msdn.microsoft.com/en-us/library/windows/hardware/ff5
1499(v=vs.85).aspx) 
 32-bit registers are automatically zero extended on x64 
 architecture while 16-bit and 8-bit registers are not.

"Operations that output to a 32-bit subregister are automatically 
zero-extended to the entire 64-bit register. Operations that 
output to 8-bit or 16-bit subregisters are not zero-extended 
(this is compatible x86 behavior)."

Hmmm. Sounds like 32-bit compatibility mode stuff. I'm concerned 
that a 64-bit binary outputted by dmd would contain no 32-bit 
mode instructions; therefore, although a load instruction exists 
that would automatically zero-pad, I'm not sure dmd uses it.

The file command (in Linux) tells me that my outputted binary is 
a 64-bit binary, not a 32-bit/64-bit hybrid binary. Then again, 
maybe it's not programmed to check for hybrid binaries and just 
says "64-bit" if it is 64-bit overall.


To try and check for myself, I ran the binary through objdump. I 
found the "LEA" instruction. I don't know if it's a 32-bit load 
instruction or a 64-bit load instruction.

But _wow_ was I shocked at how many lines of assembly were 
generated!

I tried the following:

int main()
{
   int loadMe = void;
   return loadMe;
}

And got 86,421 lines of assembly!! I expected a load instruction 
to load whatever was at loadMe's location into r0 (the return 
register) and not much else. Maybe 10 lines - tops - due to 
compiler fluffiness. I got about 8,641 times that - over 3 more 
orders of magnitude. What is going on here?


I guess the exact type of variables should remain up in the air 
until the whole thing is implemented and tested using different 
types?

 Yes very much so.

Ah ha. Thanks for the tip. I see that this makes one's effort to 
lay down an API/library/module specification a little 
interesting. Can't publish your chickens until they hatch.

Jan 04 2014

"TheFlyingFiddle" <kurtyan student.chalmers.se> writes:

On Sunday, 5 January 2014 at 06:31:38 UTC, Jake Thomas wrote:
 And got 86,421 lines of assembly!! I expected a load 
 instruction to load whatever was at loadMe's location into r0 
 (the return register) and not much else. Maybe 10 lines - tops 
 - due to compiler fluffiness. I got about 8,641 times that - 
 over 3 more orders of magnitude. What is going on here?

Well, the compiler pulls in at minimum the entire D runtime, if I'm 
not mistaken, which makes the standard .exe about 350 KB.

Things like Object.factory also pull in their fair share due to 
not being able to remove classes. So we get a lot of fluff in 
small programs.

The module layout of the standard library is also a problem: 
there are a lot of interconnections between the different modules 
in Phobos. (This will hopefully get better when the modules are 
broken down into submodules.)


I tested your test program on windows x64 and got the following 
result:

mov         ebp,esp
sub         rsp,10h
mov         eax,dword ptr [rbp-8]
lea         rsp,[rbp]
pop         rbp
ret

//This does a 32-bit load into the eax register (return register)
//from the stack.
mov         eax,dword ptr [rbp-8]


//I also ran this to see if there was any difference
int main()
{
    int loadMe = 10;
    return loadMe;
}

--Resulting main function

  mov         ebp,esp
  sub         rsp,10h
  mov         eax,0Ah
  mov         dword ptr [rbp-8],eax
  lea         rsp,[rbp]
  pop         rbp
  ret

//Loads 10, the value to be returned, and
//then stores that value on the stack.
//While this is not really necessary, I ran
//the code in debug mode so it does not
//remove most useless instructions.
  mov         eax,0Ah
  mov         dword ptr [rbp-8],eax


//In optimized mode it is gone
  push        rbp
  mov         rbp,rsp
  mov         eax,0Ah
  pop         rbp
  ret

So it looks like dmd does 32-bit loads at least on windows x64.

Jan 05 2014

"Jake Thomas" <jake fake.com> writes:

On Sunday, 5 January 2014 at 08:23:45 UTC, TheFlyingFiddle wrote:
 On Sunday, 5 January 2014 at 06:31:38 UTC, Jake Thomas wrote:
 And got 86,421 lines of assembly!! I expected a load 
 instruction to load whatever was at loadMe's location into r0 
 (the return register) and not much else. Maybe 10 lines - tops 
 - due to compiler fluffiness. I got about 8,641 times that - 
 over 3 more orders of magnitude. What is going on here?

 Well the compiler pulls in at minimum the entire D runtime if 
 i'm not mistaken which make the standard .exe about 350kb.

   Ah. Thank you for the explanation.

 Things like Object.factory also pull in their fair share due to 
 not being able to remove classes. So we get a lot of fluff in 
 small programs.

   What do you mean by not being able to remove classes?

   Isn't the whole point of offering a language that has both 
structs, which can have functions, and classes to do away with 
classes when inheritance isn't needed?

 The module layout of the standard library is also a problem, 
 there is a lot of interconnections between the different 
 modules in Phobos. (will hopefully be better when the modules 
 are broken down into submodules)

   I'm a big fan of 99% of D's specification, perhaps less a fan 
of its current implementation.
   But implementations hopefully change over time for the better. 
The hope is to one day simply re-compile the same source with a 
better compiler.

 I tested your test program on windows x64 and got the following 
 result:

 mov         ebp,esp
 sub         rsp,10h
 mov         eax,dword ptr [rbp-8]
 lea         rsp,[rbp]
 pop         rbp
 ret

 //This does a 32 bit load into the eax register (return 
 //register) from the stack.
 mov         eax,dword ptr [rbp-8]


What tools and parameters did you use to obtain that disassembly?

I did not find "dword" anywhere in the disassembly of my test 
program.

The last place I found eax used was this line:

   27:	2e 33 00             	xor    %cs:(%rax),%eax

Jake

Jan 06 2014

"Jake Thomas" <jake fake.com> writes:

Ok, I figured out how to use obj2asm. The trick is to cd to the 
directory holding the file you wish to disassemble and _not_ 
specify the whole path, or else it throws a confusing "Fatal 
error: unrecognized flag" error.

I ran:

obj2asm intLoadTest.o intLoadTest.d > intLoadTest.s

and got this:


FLAT	group	
	extrn	_main
	public	_deh_beg
	public	_deh_end
	public	_tlsstart
	public	_tlsend
	public	_Dmain
	public	_D11intLoadTest12__ModuleInfoZ
	public	main
	extrn	_d_run_main
	extrn	_Dmain
	extrn	_d_dso_registry
.text	segment
	assume	CS:.text
.text	ends
.data	segment
_D11intLoadTest12__ModuleInfoZ:
	db	004h,010h,000h,000h,000h,000h,000h,000h	;........
	db	069h,06eh,074h,04ch,06fh,061h,064h,054h	;intLoadT
	db	065h,073h,074h,000h	;est.
.data	ends
.bss	segment
.bss	ends
.rodata	segment
.rodata	ends
.tdata	segment
_tlsstart:
	db	000h,000h,000h,000h,000h,000h,000h,000h	;........
	db	000h,000h,000h,000h,000h,000h,000h,000h	;........
.tdata	ends
.tdata.	segment
.tdata.	ends
.text._Dmain	segment
	assume	CS:.text._Dmain
_Dmain:
		push	RBP
		mov	RBP,RSP
		mov	EAX,0Ah
		pop	RBP
		ret
		0f1f
		add	[RAX],R8B
.text._Dmain	ends
.text.main	segment
	assume	CS:.text.main
main:
		push	RBP
		mov	RBP,RSP
		sub	RSP,010h
		mov	RDX,offset FLAT:_Dmain 64
		call	  _d_run_main PC32
		leave
		ret
.text.main	ends
.data.d_dso_rec	segment
	db	000h,000h,000h,000h,000h,000h,000h,000h	;........
.data.d_dso_rec	ends
.text.d_dso_init	segment
	assume	CS:.text.d_dso_init
L0:		enter	0,0
		lea	RAX,_deh_end PC32[RIP]
		push	RAX
		lea	RAX,_deh_beg PC32[RIP]
		push	RAX
		lea	RAX,FLAT:[00h][RIP]
		push	RAX
		lea	RAX,FLAT:[00h][RIP]
		push	RAX
		lea	RAX,FLAT:.data.d_dso_rec[00h][RIP]
		push	RAX
		push	1
		mov	RDI,RSP
		call	  _d_dso_registry PLT32
		leave
		ret
.text.d_dso_init	ends
	end

Can you tell whether a 32-bit load was used?

Jake

P.S - That's _way_ less output than what objdump gave!

Jan 06 2014

"Jake Thomas" <jake fake.com> writes:

Oh, and that was made from:

int main()
{
   int loadMe = 10;
   return loadMe;
}

Jan 06 2014

"TheFlyingFiddle" <theflyingfiddle gmail.com> writes:

On Monday, 6 January 2014 at 20:08:27 UTC, Jake Thomas wrote:
 Things like Object.factory also pull in their fair share due 
 to not being able to remove classes. So we get a lot of fluff 
 in small programs.

   What do you mean by not being able to remove classes?

   Isn't the whole point of offering a language that has both 
 structs, which can have functions, and classes to do away with 
 classes when inheritance isn't needed?

Well, since you could potentially create classes through 
Object.factory at runtime, the code for unused classes will be 
compiled into the binary anyway, even if you never use 
Object.factory directly in the code. I am not 100% sure, but I 
think the main problem is ModuleInfo, which keeps everything alive. 
It keeps classes alive since they could be used by 
Object.factory. It also keeps other information such as unittest 
locations and static constructors.
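
To illustrate why a linker cannot easily tell which classes are unused, here is a minimal sketch of creating an object from its name held in a string. `Greeter` and `makeByName` are made-up names for this example; `Object.factory` and `typeid(...).name` are the runtime facilities being shown. The class name is taken from `typeid` only so that the example does not depend on what the module happens to be called.

```d
import std.stdio;

// A hypothetical class that nothing in the program instantiates
// with `new`.
class Greeter
{
    string greet() { return "hello"; }
}

// Creates an instance from a fully qualified class name known only
// at run time. The result is null if no such class is registered
// or if it is not a Greeter.
Greeter makeByName(string qualifiedName)
{
    return cast(Greeter) Object.factory(qualifiedName);
}

void main()
{
    // In real use this string might come from a file or user input.
    string name = typeid(Greeter).name;
    auto g = makeByName(name);
    assert(g !is null);
    writeln(name, " -> ", g.greet());
}
```

Because any class name could arrive as a string this way, the class metadata has to stay reachable even when the source never mentions the class directly.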

 What tools and parameters did you use to obtain that 
 disassembly?

I used the Visual Studio disassembly window.


 Can you tell whether a 32-bit load was used?


_Dmain:
		push	RBP
		mov	RBP,RSP
		mov	EAX,0Ah
		pop	RBP
		ret



----   mov EAX,0AH

This is a 32-bit instruction. 64-bit instructions use the RAX 
register.

It's actually the same register, but it's named differently 
depending on whether you use the full 64 bits or just the lower 
32 bits. Writing to EAX automatically zero-extends the value into 
the upper half of RAX.

See https://github.com/yasm/yasm/wiki/AMD64 for a simple intro 
into x64.

Jan 06 2014

"Jake Thomas" <jake fake.com> writes:

 Well, since you could potentially create classes through 
 Object.factory at runtime, the code for unused classes will be 
 compiled into the binary anyway, even if you never use 
 Object.factory directly in the code. I am not 100% sure, but I 
 think the main problem is ModuleInfo, which keeps everything 
 alive. It keeps classes alive since they could be used by 
 Object.factory. It also keeps other information such as unittest 
 locations and static constructors.

   Well then. I hope that changes for the better. It should be 
able to see that I'm not using the object factory or anything. 
Then again, the language has certain exceptions it is capable of 
throwing, which themselves are objects. I wonder if the garbage 
collector must retain the ability to throw an exception and thus 
retain the need of classes.

 What tools and parameters did you use to obtain that 
 disassembly?

 I used the Visual Studio disassembly window.

Thanks for the tip. Always nice to know about an assortment of 
tools.

 Can you tell whether a 32-bit load was used?


 _Dmain:
 		push	RBP
 		mov	RBP,RSP
 		mov	EAX,0Ah
 		pop	RBP
 		ret



 ----   mov EAX,0AH

 This is a 32-bit instruction. 64-bit instructions use the RAX 
 register.

 It's actually the same register, but it's named differently 
 depending on whether you use the full 64 bits or just the lower 
 32 bits. It will automatically zero-extend it.

 See https://github.com/yasm/yasm/wiki/AMD64 for a simple intro 
 into x64.

Excellent! We successfully proved that it does use 32-bit load 
instructions in a 64-bit binary, both for Linux and Windows!

Good to know about RAX/EAX, thanks - I was only familiar with ARM 
assembly.
There is CISC in this world, apparently.


For the full experience, I disassembled the binary from the 
following:

/*
   I always do int mains - except when trying to simplify assembly 
as much as possible for educating myself about instructions the 
compiler outputs. I never even ran this binary, I only made it to 
look at its disassembly.
*/
void main()
{
   longLoadTest();
}

long longLoadTest()
{
   long loadMe = 10;
   return loadMe;
}

   Sure enough, I saw something very similar to what you pointed 
out, but using the RAX name instead of the EAX name (for 
longLoadTest's return).
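
The tests in this thread return a constant, so the compiler only has to move an immediate into the register. A sketch that forces an actual load from memory is to read through a pointer. The function names `loadInt` and `loadLong` are made up for this example; the file could be compiled with `dmd -c -O` and then inspected with obj2asm as described earlier in the thread.

```d
// Reads a 32-bit value from memory. On x86_64 one would expect the
// result in EAX, but that is for the disassembly to confirm.
int loadInt(const(int)* p)
{
    return *p;
}

// Reads a 64-bit value from memory. Here one would expect RAX.
long loadLong(const(long)* p)
{
    return *p;
}

void main()
{
    int i = 10;
    long l = 10;
    // Keep both functions in use so they are not discarded.
    assert(loadInt(&i) == 10);
    assert(loadLong(&l) == 10);
}
```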

Thank you very much,
Jake

Jan 06 2014

Marco Leise <Marco.Leise gmx.de> writes:

C compilers, like D compilers, will pack a struct of two 16-bit
words into a 32-bit type if you don't force an alignment:
http://dlang.org/attribute.html#align
What you should avoid is having a data type start at an
address that is not a multiple of its size, especially when it
comes to SIMD.
Working with 16-bit values is not really supported in today's
x86 CPUs though, and integer math in D typically yields ints
even when you use smaller data types, reflecting what happens
on the hardware. Usually I use uint, size_t, real for things
that will go to CPU registers and the smallest data type that
will work for storage in memory.
Keep in mind that RAM access is slow compared to how fast CPUs
run. It can be beneficial to have "slower" data types if they
allow more data to fit into the CPU cache.
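
The remark above about integer math yielding ints can be seen directly in the type system. A minimal sketch (variable names are arbitrary):

```d
void main()
{
    short a = 1000;
    short b = 2000;

    // Arithmetic on types narrower than int is carried out as int.
    auto sum = a + b;
    static assert(is(typeof(sum) == int));

    // Storing the result back into a short needs an explicit cast.
    short narrowed = cast(short)(a + b);
    assert(narrowed == 3000);

    // The same holds for ubyte: 200 + 100 does not wrap at 256.
    ubyte x = 200;
    ubyte y = 100;
    static assert(is(typeof(x + y) == int));
    assert(x + y == 300);
}
```

So a small type mainly affects how a value is stored, not how the arithmetic on it is performed.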

Typically you sort the fields of a struct by size with the
larger ones (e.g. pointers) at the top followed by ints,
shorts and finally bytes if you want to conserve memory. There
is even a template to do that for you, but I think it is more
of a toy, when you can easily do that manually without the
clutter:

Jan 01 2014

"Jake Thomas" <jake fake.com> writes:

Keep in mind that RAM access is slow compared to how fast CPUs
run. It can be beneficial to have "slower" data types if they
allow more data to fit into the CPU cache.

Absolutely fantastic point, Marco!

Except if everything still fits in cache as "fast" types, it'd be 
worth having faster types.


How do you folks decide if a number should be an int, long, or 
even short?

I guess the exact type of variables should remain up in the air 
until the whole thing is implemented and tested using different 
types?

Wow, this is an active forum, I must say. Much thanks!
Jake

Jan 03 2014

"TheFlyingFiddle" <kurtyan student.chalmers.se> writes:

 I'm a little OCD - who cares about memory to that degree 
 anymore when we have gigabytes of RAM? This might not even come 
 into play on the Raspberry Pi.

Memory is very important when it comes to performance; moving 
memory is the single most energy-demanding task the CPU (and 
the GPU for that matter) has.
See
(http://channel9.msdn.com/Events/Build/2013/4-329 and
  http://media.medfarm.uu.se/play/video/3261 for why this is so)

Anyhow, if you want to get a good understanding of how memory 
works and how it relates to performance, I would highly recommend 
reading this entire PDF: 
http://www.akkadia.org/drepper/cpumemory.pdf

Then I took Architecture & Assembly class. There I learned that 
the load instruction grabs an entire native word size, every 
time, regardless of how many bits your variable takes up.


When we programmed in assembly in that class, for both 
performance and coding ease, we only worked with variables that 
were the native code size.


Keep in mind that schools are usually 5-15 years behind current 
technology (especially introductory classes) and what was true 
then of performance does not have to be true today.

I found out that it's actually extra work for the processor to 
use values smaller than the native word size: it has to AND off 
the unwanted bits and possibly shift them over.


So, if dmd packs variables together, I would want to always use 
the native word size to avoid that extra work, and I would 
never want to use shorts, ints, or longs. Instead, I'd want to 
do this:


This "extra work" is highly unlikely to be your bottleneck. Also 
don't assume that making everything be of native size is going to 
make things faster just because of an AND/SHIFT instruction.

And then only use natives, unatives, and booleans (can't avoid 
them) for my primatives.


You can use normal integers in if statements if you want. 
Anything not 0 will be true.

int flag = someFunc();
if(flag)
{
    //Do something.
}

I really hope this isn't the case because it would make D's 
entire primative system pointless. In acedamia, C is often 
scolded for its ints always being the native word size, while 
Java is praised for being consistent from platform to platform. 
But if dmd packs its variables, D is the one that should be 
scolded and C is the one that should be praised for the same 
reason of the opposite.


D is like Java:
(u)byte  is 8-bit
(u)short is 16-bit
(u)int   is 32-bit
(u)long  is 64-bit

size_t and ptrdiff_t are of platform size.

Note that floating-point intermediates may be calculated at the 
system's highest precision.
float  32-bit
double 64-bit
real   platform dependent, but never lower than 64-bit.
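
The sizes listed above can be written as compile-time checks. This is only a sketch; the program prints the size of real because, as noted, that one varies by platform.

```d
import std.stdio;

// Fixed-width integer types: identical on every D target.
static assert(byte.sizeof == 1 && ubyte.sizeof == 1);
static assert(short.sizeof == 2 && ushort.sizeof == 2);
static assert(int.sizeof == 4 && uint.sizeof == 4);
static assert(long.sizeof == 8 && ulong.sizeof == 8);

// Fixed-width floating-point types.
static assert(float.sizeof == 4);
static assert(double.sizeof == 8);

// real is at least as large as double; the exact size depends on
// the platform.
static assert(real.sizeof >= double.sizeof);

void main()
{
    writeln("real occupies ", real.sizeof, " bytes with ",
            real.mant_dig, " mantissa bits on this target");
}
```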


//So, I need to have this question answered to even get past the
//function prototype stage - each answer would result in different
//code.

You should not care about this in the function prototype stage; 
this is the essence of premature optimization. You can't be sure 
of how fast something is going to be before you profile it. And 
even then it will vary from run to run, computer to computer, and 
compiler to compiler. I think this book gives a nice introduction 
to the subject: 
http://carlos.bueno.org/optimization/mature-optimization.pdf

Jan 01 2014

"Casper Færgemand" <shorttail gmail.com> writes:

On Wednesday, 1 January 2014 at 04:17:30 UTC, Jake Thomas wrote:
 snip

Are you looking for something like int_fast32_t and the likes 
from Boost? If you don't care terribly much for when your numbers 
overflow, then as others suggested, size_t and pttwhatever work 
fine.

Jan 02 2014

"Jake Thomas" <jake fake.com> writes:

On Friday, 3 January 2014 at 05:25:49 UTC, Casper Færgemand wrote:
 On Wednesday, 1 January 2014 at 04:17:30 UTC, Jake Thomas wrote:
 snip

 Are you looking for something like int_fast32_t and the likes 
 from Boost? If you don't care terribly much for when your 
 numbers overflow, then as others suggested, size_t and 
 pttwhatever work fine.

I had never heard of int_fast32_t before, but after Googling and 
finding out what it is, yes, I am looking for just exactly that.
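
For reference, D's runtime exposes the C99 `stdint.h` names through the `core.stdc.stdint` module, so something like the following sketch should work. Which underlying type each alias maps to is decided per platform, so the example only checks the minimum widths.

```d
import core.stdc.stdint : int_fast32_t, int_least16_t;
import std.stdio;

// "fast" types are at least the requested width, possibly wider if
// the platform considers a wider type quicker to work with.
static assert(int_fast32_t.sizeof >= 4);

// "least" types are the smallest available type of at least that width.
static assert(int_least16_t.sizeof >= 2);

void main()
{
    writeln("int_fast32_t is ", int_fast32_t.sizeof * 8, " bits on this target");
}
```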

Jan 03 2014
