www.digitalmars.com

digitalmars.D.learn - Convert some ints into a byte array without allocations?

Samson Smith <fsdf dsfd.com> writes:
I'm trying to make a fast little function that'll give me a 
random looking (but deterministic) value from an x,y position on 
a grid. I'm just going to run each co-ord that I need through an 
FNV-1a hash function as an array of bytes since that seems like a 
fast and easy way to go. I'm going to need to do this a lot and 
quickly for a real time application so I don't want to waste a 
lot of cycles converting data or allocating space for an array.

In a nutshell how do I cast an int into a byte array?

I tried this:

byte[] bytes = cast(byte[])x;
 Error: cannot cast expression x of type int to byte[]
What should I be doing instead?
Jan 16
Yazan D <invalid email.com> writes:
On Sat, 16 Jan 2016 14:34:54 +0000, Samson Smith wrote:

 [...]
 In a nutshell how do I cast an int into a byte array?
You can do this:

ubyte[] b = (cast(ubyte*) &a)[0 .. int.sizeof];

It is casting the pointer to `a` to a ubyte (or byte) pointer and then taking a slice the size of int.
Jan 16
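A minimal sketch of the pointer-cast approach above, assuming an `int` named `a` as in the post (the hex value is only for illustration). It checks that the slice is a view onto `a`'s own four bytes, with no allocation and no copy: the bytes appear in the machine's native order, and writing through `b` changes `a`.

```d
import std.stdio : writeln;

void main()
{
    int a = 0x01020304;

    // Reinterpret the address of `a` as a ubyte pointer and slice it.
    ubyte[] b = (cast(ubyte*) &a)[0 .. int.sizeof];

    assert(b.length == 4);
    assert(b.ptr == cast(ubyte*) &a); // same memory as `a`, nothing copied

    // The bytes are seen in the machine's native byte order.
    ubyte[4] expected;
    version (LittleEndian) expected = [0x04, 0x03, 0x02, 0x01];
    version (BigEndian)    expected = [0x01, 0x02, 0x03, 0x04];
    assert(b == expected[]);

    // Because `b` aliases `a`, writing through the slice changes `a`.
    b[] = 0;
    assert(a == 0);

    writeln("ok");
}
```

The aliasing is what makes this allocation-free, and it is also why the slice must not outlive `a`.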
Yazan D <invalid email.com> writes:
On Sat, 16 Jan 2016 14:42:27 +0000, Yazan D wrote:
 
 [...]
You can also use a union:

union Foo
{
    int i;
    ubyte[4] b;
}

// write to int part
Foo f = Foo(a);
// then read from ubyte part
writeln(f.b);

ps. I am not sure of the aliasing rules in D for unions. In C this is allowed, but in C++ it is undefined behaviour AFAIK.
Jan 16
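A sketch of the union approach wrapped in small helpers; the names `IntBytes`, `toBytes` and `fromBytes` are hypothetical. It assumes, as the post does, that writing one union member and reading the other reinterprets the same bytes (whether D formally guarantees this is exactly the question raised above). The byte array is returned by value, so nothing aliases the caller's variable.

```d
import std.stdio : writeln;

// Both members share the same storage, so `b` holds the bytes of `i`.
union IntBytes
{
    int i;
    ubyte[int.sizeof] b;
}

// Write the int member, read the byte member; returned as a copy.
ubyte[int.sizeof] toBytes(int value)
{
    IntBytes u;
    u.i = value;
    return u.b;
}

// The inverse: write the bytes, read the int.
int fromBytes(ubyte[int.sizeof] bytes)
{
    IntBytes u;
    u.b = bytes;
    return u.i;
}

void main()
{
    auto bytes = toBytes(0x01020304);

    // Native byte order, so the first byte depends on the target.
    version (LittleEndian) assert(bytes[0] == 0x04);
    version (BigEndian)    assert(bytes[0] == 0x01);

    assert(fromBytes(bytes) == 0x01020304);
    writeln(bytes);
}
```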
tsbockman <thomas.bockman gmail.com> writes:
On Saturday, 16 January 2016 at 14:46:47 UTC, Yazan D wrote:
 [...]

 ps. I am not sure of the aliasing rules in D for unions. In C, 
 this is allowed, but in C++, this is undefined behaviour AFAIK.
I sure hope it's not undefined behaviour in D, seeing as this technique is used in several places in the standard library.
Jan 16
bearophile <bearophileHUGS lycos.com> writes:
Yazan D:

On Saturday, 16 January 2016 at 14:42:27 UTC, Yazan D wrote:
 ubyte[] b = (cast(ubyte*) &a)[0 .. int.sizeof];
Better to use the actual size:

ubyte[] b = (cast(ubyte*) &a)[0 .. a.sizeof];

Bye,
bearophile
Jan 16
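A sketch of bearophile's suggestion as a generic helper; `bytesOf` is a hypothetical name. Because the length comes from `value.sizeof`, changing the coordinate type from int to long or short only changes the slice length, with no edits to the helper. Like the original cast, the returned slice aliases the argument and must not outlive it.

```d
import std.stdio : writeln;

// Byte view of any plain value; length is taken from the value itself.
// The slice points into `value`, so it is only valid while `value` lives.
ubyte[] bytesOf(T)(ref T value)
{
    return (cast(ubyte*) &value)[0 .. value.sizeof];
}

void main()
{
    int   i = 1;
    long  l = 1;
    short s = 1;

    assert(bytesOf(i).length == 4);
    assert(bytesOf(l).length == 8);
    assert(bytesOf(s).length == 2);

    writeln(bytesOf(l));
}
```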
Samson Smith <fsdf dsfd.com> writes:
On Saturday, 16 January 2016 at 15:42:39 UTC, bearophile wrote:
 Better to use the actual size:
 ubyte[] b = (cast(ubyte*) &a)[0 .. a.sizeof];
Good thinking, I won't have to change it around if I change the type of my co-ords later. Thanks :)
Jan 16
Samson Smith <fsdf dsfd.com> writes:
On Saturday, 16 January 2016 at 14:42:27 UTC, Yazan D wrote:
 You can do this:
 ubyte[] b = (cast(ubyte*) &a)[0 .. int.sizeof];
 [...]
This seems to work. Thank you!
Jan 16
Johannes Pfau <nospam example.com> writes:
On Sat, 16 Jan 2016 15:46:00 +0000
Samson Smith <fsdf dsfd.com> wrote:

 On Saturday, 16 January 2016 at 14:42:27 UTC, Yazan D wrote:
 ubyte[] b = (cast(ubyte*) &a)[0 .. int.sizeof];
 This seems to work. Thank you!
You need to be careful with that code, though. As you're taking the address of the variable a, b.ptr will point to a. If a is on the stack, you must make sure you do not escape the b reference.

Another option is using static arrays:

ubyte[a.sizeof] b = *(cast(ubyte[a.sizeof]*)&a);

Static arrays are value types: whenever you pass b to a function it's copied, and you don't have to worry about the lifetime of a.

This pointer cast (int => ubyte[4]) is safe, but the inverse operation, casting from ubyte[4] to int, is not safe (a ubyte[4] only has byte alignment, so it may not be suitably aligned for an int). For the inverse operation you'd have to use unions as shown in Yazan's response.
Jan 16
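A sketch of the static-array copy described above; `copyBytes` is a hypothetical name. The function takes `a` by value, so `a` lives on the stack, yet returning the bytes is fine because a `ubyte[4]` is returned and assigned by copy rather than by reference.

```d
import std.stdio : writeln;

// Copies the bytes of `a` into a static array (a value type).
// The caller receives its own copy, not a reference to the stack parameter.
ubyte[int.sizeof] copyBytes(int a)
{
    return *(cast(ubyte[int.sizeof]*) &a);
}

void main()
{
    int a = 42;
    ubyte[a.sizeof] b = copyBytes(a);

    a = 7; // later changes to `a` do not affect the copy in `b`
    version (LittleEndian) assert(b[0] == 42);
    version (BigEndian)    assert(b[3] == 42);

    // Assigning one static array to another copies the bytes as well.
    ubyte[a.sizeof] c = b;
    c[0] = 0xFF;
    assert(b[0] != 0xFF);

    writeln(b);
}
```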
Jonathan M Davis via Digitalmars-d-learn writes:
On Saturday, January 16, 2016 14:34:54 Samson Smith via Digitalmars-d-learn
wrote:
 [...]
 In a nutshell how do I cast an int into a byte array?
For this particular case, since you're hashing rather than doing something like putting the resulting value on the wire, the cast that others suggested may very well be the way to go, but the typesafe way to do the conversion would be to use std.bitmanip:

int i = 12345;
auto arr = nativeToBigEndian(i);

Here the result is ubyte[4], because the argument was an int. If it had been a long, it would have been ubyte[8]. So, you avoid bugs where you get the sizes wrong. The only reason that I can think of to _not_ do this in your case would be speed, simply because you don't care about swapping the endianness like you would when sending the data via a socket or whatnot. Of course, if you knew that you were always going to be on little-endian machines, you could also use nativeToLittleEndian to avoid the swap, though that still might be slower than a simple cast depending on the optimizer (it uses a union internally).

But it will be less error-prone to use those functions, and if you _do_ actually need to swap endianness, then they're exactly what you should be using. We've had cases where using those functions prevented bugs precisely because the person writing the code got the sizes wrong (and the compiler complained, since nativeToBigEndian and friends deal with the sizes in a typesafe manner).

- Jonathan M Davis
Jan 16
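A short check of the `std.bitmanip` behaviour described above, using the same value 12345 (0x3039). It shows that the result size tracks the argument type, that the big- and little-endian byte sequences are the same on every host, and that `bigEndianToNative` reverses the conversion.

```d
import std.bitmanip : bigEndianToNative, nativeToBigEndian, nativeToLittleEndian;
import std.stdio : writeln;

void main()
{
    int i = 12345; // 0x00003039

    // The result type follows the argument type: ubyte[4] for int,
    // ubyte[8] for long, so the size cannot silently be wrong.
    static assert(is(typeof(nativeToBigEndian(i)) == ubyte[4]));
    static assert(is(typeof(nativeToBigEndian(12345L)) == ubyte[8]));

    // Big-endian: most significant byte first, on every machine.
    ubyte[4] be = nativeToBigEndian(i);
    ubyte[4] expectedBE = [0x00, 0x00, 0x30, 0x39];
    assert(be == expectedBE);

    // Little-endian: least significant byte first, on every machine.
    ubyte[4] le = nativeToLittleEndian(i);
    ubyte[4] expectedLE = [0x39, 0x30, 0x00, 0x00];
    assert(le == expectedLE);

    // Converting back recovers the original value.
    assert(bigEndianToNative!int(be) == i);

    writeln(be, " ", le);
}
```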
Samson Smith <fsdf dsfd.com> writes:
On Saturday, 16 January 2016 at 16:28:21 UTC, Jonathan M Davis 
wrote:
 [...] the typesafe way to do the conversion would be to use
 std.bitmanip. [...]
If I'm hoping to have my hash come out the same on both big-endian and little-endian machines, but not send the results between machines, should I take these precautions? I want one machine to send the other a seed (in an endian-safe way) and have both machines generate the same hashes. Here's the relevant code:

enum uint FNV_PRIME_32 = 0x01000193; // 32-bit FNV prime (16777619)

uint coordHash(int x, int y, uint seed){
    seed = FNV1a((cast(ubyte*) &x)[0 .. x.sizeof], seed);
    return FNV1a((cast(ubyte*) &y)[0 .. y.sizeof], seed);
}

// Byte order matters for the below function
uint FNV1a(ubyte[] bytes, uint code){
    for(size_t iii = 0; iii < bytes.length; ++iii){
        code ^= bytes[iii];
        code *= FNV_PRIME_32;
    }
    return code;
}

Am I going to get the same outcome on all machines, or would a byte array be divided up in reverse order to what I'd expect on some machines? If so, I don't mind writing separate versions depending on endianness with version(BigEndian)/version(LittleEndian) to avoid a runtime check... I'm just unsure of how endianness factors into the order of an array.
Jan 16
Johannes Pfau <nospam example.com> writes:
On Sat, 16 Jan 2016 18:05:46 +0000
Samson Smith <fsdf dsfd.com> wrote:

 On Saturday, 16 January 2016 at 16:28:21 UTC, Jonathan M Davis 
 wrote:
 [...]
 Am I going to get the same outcome on all machines or would a byte
 array be divided up in reverse order to what I'd expect on some
 machines? [...] I'm just unsure of how endianness factors into the
 order of an array...
If you use the simple pointer cast, you will end up with different byte orders on little- vs big-endian machines.

Endianness does not affect array order in general:

ubyte[] myArray = [1, 2, 3, 4];
myArray[0] == 1, myArray[1] == 2, ...

This is the same on big- and little-endian machines. Endianness does affect the representation of (multi-byte) numbers:

int a = 42;
ubyte[4] b = *cast(ubyte[4]*)&a;

This will generate [42, 0, 0, 0] on little endian and [0, 0, 0, 42] on big endian.

So if you want the same byte output on all architectures, just choose either big or little endian (which one doesn't matter). Then convert the values on the other architecture (e.g. if you choose little endian, do nothing on little endian and swap bytes on big endian).

TL;DR: Just use nativeToBigEndian or nativeToLittleEndian from std.bitmanip; these functions do the right thing. They do not use runtime checks; they use version(BigEndian)/version(LittleEndian) internally. nativeToBigEndian does not swap anything on big-endian machines, and nativeToLittleEndian doesn't swap anything on little-endian machines.
Jan 16
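Putting the thread's advice together: a sketch of Samson's `coordHash` that serialises each coordinate with `nativeToLittleEndian` before hashing, so the bytes fed to FNV-1a are identical on little- and big-endian machines. Little endian is an arbitrary choice (big endian works equally well, as noted above). `FNV_OFFSET_32` is an added name for the standard 32-bit FNV offset basis, and the loop is rewritten as `foreach`; the 0xe40c292c check is the FNV-1a hash of the single byte 'a'.

```d
import std.bitmanip : nativeToLittleEndian;
import std.stdio : writefln;

// Standard 32-bit FNV constants.
enum uint FNV_OFFSET_32 = 0x811C9DC5; // offset basis (2166136261)
enum uint FNV_PRIME_32  = 0x01000193; // prime (16777619)

// FNV-1a: xor each byte in, then multiply (wrapping mod 2^32).
uint FNV1a(const(ubyte)[] bytes, uint code)
{
    foreach (b; bytes)
    {
        code ^= b;
        code *= FNV_PRIME_32;
    }
    return code;
}

// Each coordinate is converted to a fixed (little-endian) byte order
// first, so the hash no longer depends on the host's endianness.
// nativeToLittleEndian returns ubyte[4] by value: no allocation.
uint coordHash(int x, int y, uint seed)
{
    auto xb = nativeToLittleEndian(x);
    auto yb = nativeToLittleEndian(y);
    seed = FNV1a(xb[], seed);
    return FNV1a(yb[], seed);
}

void main()
{
    // Known FNV-1a test vector for the single byte 'a'.
    assert(FNV1a(cast(const(ubyte)[]) "a", FNV_OFFSET_32) == 0xe40c292c);

    // Array order is the same everywhere ...
    ubyte[] arr = [1, 2, 3, 4];
    assert(arr[0] == 1 && arr[3] == 4);

    // ... and the serialised coordinate bytes are now fixed as well.
    ubyte[4] expected = [42, 0, 0, 0];
    assert(nativeToLittleEndian(42) == expected);

    writefln("%08x", coordHash(3, 4, FNV_OFFSET_32));
}
```

The seed itself still has to be sent between machines in a fixed byte order, for example with the same `nativeToLittleEndian` call.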