
digitalmars.D.learn - bigEndian in std.bitmanip

reply Salih Dincer <salihdb hotmail.com> writes:
Hello,

Why isn't Endian.littleEndian the default setting for read() in 
std.bitmanip?

Okay, we can easily change this if we want (I could use enum LE 
in the example), and the data can also be reversed with 
data.retro.array().

```d
void main()
{
   import std.conv : hexString;
   string helloD = hexString!"48656C6C6F204421";
   // compile-time converted string literal ^

   import std.string : format;
   auto hexF = helloD.format!"%(%02X%)";

   import std.digest: toHexString;
   auto arr = cast(ubyte[])"Hello D!";

   auto hex = arr.toHexString;
   assert(hex == hexF);

   import std.stdio : writeln;
   hex.writeln(": ", helloD);
// 48656C6C6F204421: Hello D!
   assert(helloD == "Hello D!");

   auto data = arr.readBytes!size_t;
   data.code.writeln(": ", data.bytes);
// 2397076564600448328: Hello D!
}

template readBytes(T, R)
{
   union Bytes
   {
     T code;
     char[T.sizeof] bytes;
   }
   import std.bitmanip;
   enum LE = Endian.littleEndian;

   auto readBytes(ref R data)
   {
    import std.range : retro, array;
    auto reverse = data.retro.array;
    return Bytes(reverse.read!T);
   }
}
```

However, I think it is not compatible with Union. Thanks...

SDB 79
Oct 31 2023
next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Tuesday, October 31, 2023 4:09:53 AM MDT Salih Dincer via Digitalmars-d-learn wrote:
 Hello,

 Why isn't Endian.littleEndian the default setting for read() in
 std.bitmanip?
Why would you expect little endian to be the default? The typical thing to do when encoding integral values in a platform-agnostic manner is to use big endian, not little endian. Either way, it supports both big endian and little endian, so if your use case requires little endian, you can do that. You just have to specify the endianness, and if you find that to be too verbose, you can create a wrapper to use in your own code.

- Jonathan M Davis
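A minimal sketch of the two options described above: passing the endianness to std.bitmanip.read explicitly, and hiding that behind a small wrapper. The name readLE is hypothetical and not part of Phobos.

```d
import std.bitmanip : read;
import std.system : Endian;

// Hypothetical convenience wrapper (not part of Phobos): behaves like
// std.bitmanip.read, but interprets the bytes as little endian.
T readLE(T, R)(ref R range)
{
    return range.read!(T, Endian.littleEndian);
}

void main()
{
    ubyte[] a = [0x01, 0x02, 0x03, 0x04];
    // With no endianness given, the bytes are treated as big endian.
    assert(a.read!uint == 0x01020304);
    // read consumes the bytes it converts.
    assert(a.length == 0);

    ubyte[] b = [0x01, 0x02, 0x03, 0x04];
    // The same bytes, explicitly interpreted as little endian.
    assert(b.read!(uint, Endian.littleEndian) == 0x04030201);

    ubyte[] c = [0x01, 0x02, 0x03, 0x04];
    // The wrapper only saves typing the second template argument.
    assert(c.readLE!uint == 0x04030201);
}
```

Because read takes its range by ref and advances it, each call above starts from a fresh buffer.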
Oct 31 2023
parent reply Salih Dincer <salihdb hotmail.com> writes:
On Tuesday, 31 October 2023 at 10:24:56 UTC, Jonathan M Davis 
wrote:
 On Tuesday, October 31, 2023 4:09:53 AM MDT Salih Dincer via 
 Digitalmars-d-learn wrote:
 Hello,

 Why isn't Endian.littleEndian the default setting for read() in
 std.bitmanip?
Why would you expect little endian to be the default? The typical thing to do when encoding integral values in a platform-agnostic manner is to use big endian, not little endian...
Because when we create a structure with a Union, the bytes are inserted in reverse order relative to the index of the static array (bytes); I showed this above. I also have a convenience template like this:

```d
template readBytes(T, bool big = false, R)
{        // pair endian version 2.0
   import bop = std.bitmanip;

   static if(big)
     enum E = bop.Endian.bigEndian;
   else
     enum E = bop.Endian.littleEndian;

   auto readBytes(ref R dat)
    => bop.read!(T, E)(dat);
}
```

Sorry to take up more of your time, since I have already solved the problem with readBytes(). Thank you for your answer, but there is one more problem, or even two! The read() in the library, which is the 2nd function, conflicts with std.write. Yeah, there are many solutions to this, but all it does is read bytes. However, you can insert 4 ushorts into one ulong.

Don't you think the name of the function should be readBytes, not read? Because it doesn't work with any type other than ubyte[]!

SDB 79
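For reference, a small sketch of how the target type of std.bitmanip.read can vary while the source stays a range of ubyte: the same eight bytes can be consumed as four ushort values or as one ulong. The helper name readShorts is hypothetical, and the default big-endian interpretation is assumed.

```d
import std.bitmanip : read;

// Hypothetical helper: splits a byte buffer into big-endian ushorts.
ushort[] readShorts(ubyte[] buf)
{
    ushort[] result;
    while (buf.length >= ushort.sizeof)
        result ~= buf.read!ushort; // each call consumes two bytes
    return result;
}

void main()
{
    ubyte[] bytes = [0x00, 0x01, 0x00, 0x02, 0x00, 0x03, 0x00, 0x04];

    // Four 16-bit values read one after another from the buffer.
    assert(readShorts(bytes) == [1, 2, 3, 4]);

    // The same eight bytes read as a single 64-bit value.
    auto whole = bytes.dup;
    assert(whole.read!ulong == 0x0001_0002_0003_0004);
}
```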
Oct 31 2023
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Tuesday, October 31, 2023 8:23:28 AM MDT Salih Dincer via Digitalmars-d-learn wrote:
 On Tuesday, 31 October 2023 at 10:24:56 UTC, Jonathan M Davis wrote:
 On Tuesday, October 31, 2023 4:09:53 AM MDT Salih Dincer via Digitalmars-d-learn wrote:
 Hello,

 Why isn't Endian.littleEndian the default setting for read() in
 std.bitmanip?
Why would you expect little endian to be the default? The typical thing to do when encoding integral values in a platform-agnostic manner is to use big endian, not little endian...
Because when we create a structure with a Union, the bytes are inserted in reverse order relative to the index of the static array (bytes); I showed this above.
I fail to see what the situation with the union has to do with anything. Sure, you can convert between an array of bytes and an int with a union if you want to, but what that does is going to be dependent on your local architecture. read and its related functions in std.bitmanip are architecture-independent. So, they will convert from little endian or big endian regardless of what your local architecture is. You would typically use it on ranges of bytes that come from the network or from serialized data. The most common scenario there is likely to be that they'll be in big endian, because that's what platform-independent binary formats typically do, but you can explicitly tell read that the range is in little endian if your range of bytes happens to be in little endian. Both scenarios can occur, and it supports both. It just defaults to big endian, because that's the more common scenario when dealing with binary formats.
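The difference can be sketched as follows: a union reinterprets the bytes in whatever order the host uses, while std.bitmanip.peek (the non-consuming counterpart of read) gives the same answer on every host. The helper viaUnion is hypothetical and exists only for this illustration.

```d
import std.bitmanip : peek;
import std.system : Endian, endian;

// Hypothetical helper: reinterprets four bytes as a uint through a union.
// The result depends on the byte order of the machine running the code.
uint viaUnion(ubyte[4] raw)
{
    union Word
    {
        uint value;
        ubyte[4] bytes;
    }
    Word w;
    w.bytes = raw;
    return w.value;
}

void main()
{
    ubyte[4] raw = [0x01, 0x02, 0x03, 0x04];

    // std.system.endian reports the byte order the program was compiled for.
    immutable uint expected =
        endian == Endian.littleEndian ? 0x04030201 : 0x01020304;
    assert(viaUnion(raw) == expected);

    // peek converts from the stated byte order, whatever the host is.
    ubyte[] data = [0x01, 0x02, 0x03, 0x04];
    assert(data.peek!uint == 0x01020304);                        // big endian
    assert(data.peek!(uint, Endian.littleEndian) == 0x04030201); // little endian
}
```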
 I also have a convenience template like this:
 ```d
 template readBytes(T, bool big = false, R)
 {        // pair endian version 2.0
    import bop = std.bitmanip;

    static if(big)
      enum E = bop.Endian.bigEndian;
    else
      enum E = bop.Endian.littleEndian;

    auto readBytes(ref R dat)
     => bop.read!(T, E)(dat);
 }
 ```
 Sorry to take up more of your time, since I have already solved the
 problem with readBytes(). Thank you for your answer, but there is
 one more problem, or even two! The read() in the library, which is
 the 2nd function, conflicts with std.write. Yeah, there are many
 solutions to this, but all it does is read bytes. However,
 you can insert 4 ushorts into one ulong.

 Don't you think the name of the function should be readBytes, not
 read?  Because it doesn't work with any type other than ubyte[]!
D's module system makes it so that names do not need to be unique across modules, and this is not the only case in Phobos where multiple modules use the same function name. It's easy enough to import only the functions you're using or to rename them via the import if you happen to be importing from multiple modules containing functions with the same name. E.g., if you want to do import std.bitmanip : readBytes = read; then you can.

- Jonathan M Davis
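A short sketch of the import options mentioned above, assuming nothing beyond std.bitmanip itself: a renamed selective import, and a static import that forces the fully qualified name.

```d
// Renamed selective import: std.bitmanip.read is visible here only as readBytes.
import std.bitmanip : readBytes = read;
// Static import: the module's symbols must be written out in full.
static import std.bitmanip;

void main()
{
    ubyte[] a = [0x00, 0x00, 0x00, 0x2A];
    assert(a.readBytes!uint == 42);

    ubyte[] b = [0x00, 0x00, 0x00, 0x2A];
    assert(std.bitmanip.read!uint(b) == 42);
}
```

Either form keeps the name read free for a function of the same name from another module.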
Oct 31 2023
prev sibling parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Tuesday, 31 October 2023 at 10:09:53 UTC, Salih Dincer wrote:
 Hello,

 Why isn't Endian.littleEndian the default setting for read() in 
 std.bitmanip?

It might make sense to change, since little endian is the most common when it comes to hardware. But big endian is the most common when it comes to networking. So I guess it depends on your view of what is most common: interacting with your local hardware or networking.
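As an illustration of the two conventions, the sketch below uses the std.bitmanip conversion functions to produce both byte orders for the same value. The big-endian layout is the one usually called network byte order.

```d
import std.bitmanip : bigEndianToNative, nativeToBigEndian,
                      nativeToLittleEndian;

void main()
{
    uint value = 0x01020304;

    // Big endian (network byte order): most significant byte first.
    ubyte[4] wire = nativeToBigEndian(value);
    assert(wire[] == [0x01, 0x02, 0x03, 0x04]);

    // Converting back yields the original value on any host.
    assert(bigEndianToNative!uint(wire) == value);

    // Little endian: least significant byte first.
    ubyte[4] le = nativeToLittleEndian(value);
    assert(le[] == [0x04, 0x03, 0x02, 0x01]);
}
```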
Oct 31 2023
parent reply Salih Dincer <salihdb hotmail.com> writes:
On Tuesday, 31 October 2023 at 14:43:43 UTC, Imperatorn wrote:
 It might make sense to change since little endian is the most 
 common when it comes to hardware. But big endian is most common 
 when it comes to networking. So I guess it depends on your view 
 of what is most common. Interacting with your local hardware or 
 networking.
I realized that I had to base my preference on the most common case. But I have to use Union. That's why I have to choose Endian.littleEndian, because it is compatible with both Union and HexString. My test code works perfectly, as seen below. I'm grateful to everyone who helped here and [on the other thread](https://forum.dlang.org/thread/ekpvajiablcfueyipcal forum.dlang.org).

```d
enum sampleText = "Hello D!"; // length <= 8 char

void main()
{
  //import sdb.string : UnionBytes;
  mixin UnionBytes!size_t;
  bytes.init = sampleText;

  import std.digest: toHexString;
  auto hexF = bytes.cell.toHexString;
  assert(hexF == "48656C6C6F204421");

  import std.string : format;
  auto helloD = sampleText.format!"%(%02X%)";
  assert(hexF == helloD);

  import std.stdio;
  bytes.code.writeln(": ", helloD);
  /* Prints:

  2397076564600448328: 48656C6C6F204421 */

  import std.conv : hexString;
  static assert(sampleText == hexString!"48656C6C6F204421");

  //import sdb.string : readBytes;
  auto code = bytes.cell.readBytes!size_t;
  assert(code == bytes.code);

  bytes.init = code;
  code.writeln(": ", bytes);
  /* Prints:

  2397076564600448328: Hello D! */

  assert(bytes[] == [72, 101, 108, 108, 111, 32, 68, 33]);

  //import sdb.string : HexString
  auto str = "0x";
  auto hex = HexString!size_t(bytes.code);
  hex.each!(chr => str ~= chr);
  str.writeln; // 0x48656C6C6F204421
}
```

My core template (UnionBytes) is initialized like this, and underneath I have the readBytes template, which also works with static arrays:

```d
// ...
      import std.range : front, popFront;
      size_t i;
      do // new version: range support
      {
        char chr;         // default init: 0xFF
        chr &= str.front; // masking
        code |= T(chr) << (i * 8); // shifting
        str.popFront;     // next char
      } while(++i < size);
    }

    auto opCast(Cast : T)() const
      => code;

    auto opCast(Cast : string)() const
      => this.format!"%s";

    auto toString(void delegate(in char[]) sink) const
      => sink.formattedWrite("%s", cast(char[])cell);
  }
  UnionBytes bytes; // for mixin
}

template readBytes(T, bool big = false, R)
{        // pair endian version 2.1
  import std.bitmanip;

  static if(big) enum E = Endian.bigEndian;
  else enum E = Endian.littleEndian;

  import std.range : ElementType;
  alias ET = ElementType!R;

  auto readBytes(ref R dat)
  {
    auto data = cast(ET[])dat;
    return read!(T, E)(data);
  }
}
```

SDB 79
Nov 02 2023
parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 2 November 2023 at 11:29:05 UTC, Salih Dincer wrote:
 On Tuesday, 31 October 2023 at 14:43:43 UTC, Imperatorn wrote:
 It might make sense to change since little endian is the most 
 common when it comes to hardware. But big endian is most 
 common when it comes to networking. So I guess it depends on 
 your view of what is most common. Interacting with your local 
 hardware or networking.
I realized that I had to base my preference on the most common case. But I have to use Union. That's why I have to choose Endian.littleEndian, because it is compatible with both Union and HexString. My test code works perfectly, as seen below. I'm grateful to everyone who helped here and [on the other thread](https://forum.dlang.org/thread/ekpvajiablcfueyipcal forum.dlang.org).
Nice to hear you found a solution. Little endian is *most common* in hardware but big endian is *most common* in networking, so defining a default endianness can be tricky.
Nov 02 2023