digitalmars.D.learn - bigEndian in std.bitmanip

Salih Dincer <salihdb hotmail.com> writes:

Hello,

Why isn't Endian.littleEndian the default setting for read() in 
std.bitmanip?

Okay, we can easily change this if we want (I could use enum LE 
in the example), and the data can also be reversed with 
data.retro.array().

```d
void main()
{
   import std.conv : hexString;
   string helloD = hexString!"48656C6C6F204421";
   // compile time converted literal string -ˆ

   import std.string : format;
   auto hexF = helloD.format!"%(%02X%)";

   import std.digest: toHexString;
   auto arr = cast(ubyte[])"Hello D!";

   auto hex = arr.toHexString;
   assert(hex == hexF);

   import std.stdio : writeln;
   hex.writeln(": ", helloD);
// 48656C6C6F204421: Hello D!
   assert(helloD == "Hello D!");

   auto data = arr.readBytes!size_t;
   data.code.writeln(": ", data.bytes);
// 2397076564600448328: Hello D!
}

template readBytes(T, R)
{
   union Bytes
   {
     T code;
     char[T.sizeof] bytes;
   }
   import std.bitmanip;
   enum LE = Endian.littleEndian;

   auto readBytes(ref R data)
   {
    import std.range : retro, array;
    auto reverse = data.retro.array;
    return Bytes(reverse.read!T);
   }
}
```

However, I think it is not compatible with Union. Thanks...

SDB 79

Oct 31 2023

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Tuesday, October 31, 2023 4:09:53 AM MDT Salih Dincer via
Digitalmars-d-learn wrote:
 Hello,

 Why isn't Endian.littleEndian the default setting for read() in
 std.bitmanip?

Why would you expect little endian to be the default? The typical thing to
do when encoding integral values in a platform-agnostic manner is to use big
endian, not little endian. Either way, it supports both big endian and
little endian, so if your use case requires little endian, you can do that.
You just have to specify the endianness, and if you find that to be too
verbose, you can create a wrapper to use in your own code.
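
A minimal sketch of such a wrapper follows. The name `readLE` is hypothetical and not part of Phobos; it only forwards to `std.bitmanip.read` with the default endianness flipped to little endian.

```d
import std.bitmanip : read;
import std.range.primitives : ElementType, isInputRange;
import std.system : Endian;

// Hypothetical wrapper (not in Phobos): same as read(), but little endian
// unless the caller asks for something else.
T readLE(T, Endian e = Endian.littleEndian, R)(ref R range)
    if (isInputRange!R && is(ElementType!R : const ubyte))
{
    return read!(T, e)(range);
}

void main()
{
    ubyte[] buf = [0x01, 0x02, 0x03, 0x04];

    auto big = buf;                        // separate slice, consumed on its own
    assert(big.read!uint == 0x01020304);   // library default: Endian.bigEndian
    assert(big.length == 0);               // read() pops the bytes it used

    auto little = buf;
    assert(little.readLE!uint == 0x04030201);
}
```

Note that `read` takes its range by `ref` and consumes it, which is why each call above works on its own slice of `buf`.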

- Jonathan M Davis

Oct 31 2023

Salih Dincer <salihdb hotmail.com> writes:

On Tuesday, 31 October 2023 at 10:24:56 UTC, Jonathan M Davis 
wrote:
 On Tuesday, October 31, 2023 4:09:53 AM MDT Salih Dincer via 
 Digitalmars-d-learn wrote:
 Hello,

 Why isn't Endian.littleEndian the default setting for read() in
 std.bitmanip?

 Why would you expect little endian to be the default? The 
 typical thing to do when encoding integral values in a 
 platform-agnostic manner is to use big endian, not little 
 endian...

Because when we build a struct around a union, the bytes are inserted 
in reverse order with respect to the static array (bytes) index; I 
showed this above.  I also have a convenience template like this:

```d
template readBytes(T, bool big = false, R)
{        // pair endian version 2.0
   import bop = std.bitmanip;

   static if(big)
     enum E = bop.Endian.bigEndian;
   else
     enum E = bop.Endian.littleEndian;

   auto readBytes(ref R dat)
    => bop.read!(T, E)(dat);
}
```
Sorry to take up more of your time, since I have already solved the 
problem with readBytes(). Thank you for your answer, but there is 
one more problem, or even two! The read() in the library, which is 
the second function, conflicts with std.write. Yes, there are many 
solutions to this, but all it does is read bytes. However, 
you can insert 4 ushorts into one ulong.

Don't you think the function should be named readBytes, not 
read?  Because it doesn't work with any type other than ubyte[]!

SDB 79

Oct 31 2023

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Tuesday, October 31, 2023 8:23:28 AM MDT Salih Dincer via
Digitalmars-d-learn wrote:
 On Tuesday, 31 October 2023 at 10:24:56 UTC, Jonathan M Davis wrote:
 On Tuesday, October 31, 2023 4:09:53 AM MDT Salih Dincer via
 Digitalmars-d-learn wrote:
 Hello,

 Why isn't Endian.littleEndian the default setting for read() in
 std.bitmanip?

 Why would you expect little endian to be the default? The
 typical thing to do when encoding integral values in a
 platform-agnostic manner is to use big endian, not little
 endian...

 Because when we build a struct around a union, the bytes are inserted
 in reverse order with respect to the static array (bytes) index; I
 showed this above.

I fail to see what the situation with the union has to do with anything.
Sure, you can convert between an array of bytes and an int with a union if
you want to, but what that does is going to be dependent on your local
architecture. read and its related functions in std.bitmanip are
architecture-independent. So, they will convert from little endian or big
endian regardless of what your local architecture is. You would typically
use it on ranges of bytes that come from the network or from serialized
data. The most common scenario there is likely to be that they'll be in big
endian, because that's what platform-independent binary formats typically
do, but you can explicitly tell read that the range is in little endian if
your range of bytes happens to be in little endian. Both scenarios can
occur, and it supports both. It just defaults to big endian, because that's
the more common scenario when dealing with binary formats.
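
To illustrate the difference, here is a small sketch. The `Overlay` union is a hypothetical helper written only for this example; `peek` is the non-consuming sibling of `read` in std.bitmanip. The union result depends on the host CPU, while `peek` returns the same value on any machine.

```d
import std.bitmanip : peek;
import std.system : Endian;

// Hypothetical helper type, for illustration only.
union Overlay
{
    uint value;
    ubyte[4] bytes;
}

void main()
{
    ubyte[4] raw = [0x01, 0x02, 0x03, 0x04];

    // The union reinterprets memory, so the result follows the host CPU.
    Overlay o;
    o.bytes = raw;
    immutable viaUnion = o.value;

    // peek() decodes with the byte order you name, on any host.
    const(ubyte)[] data = raw[];
    assert(data.peek!(uint, Endian.bigEndian) == 0x01020304);
    assert(data.peek!(uint, Endian.littleEndian) == 0x04030201);

    // The union agrees with whichever byte order the host happens to use.
    version (LittleEndian)
        assert(viaUnion == 0x04030201);
    else
        assert(viaUnion == 0x01020304);
}
```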

 I also have a convenience template like this:
 ```d
 template readBytes(T, bool big = false, R)
 {        // pair endian version 2.0
    import bop = std.bitmanip;

    static if(big)
      enum E = bop.Endian.bigEndian;
    else
      enum E = bop.Endian.littleEndian;

    auto readBytes(ref R dat)
     => bop.read!(T, E)(dat);
 }
 ```
 Sorry to take up more of your time, since I have already solved the
 problem with readBytes(). Thank you for your answer, but there is
 one more problem, or even two! The read() in the library, which is
 the second function, conflicts with std.write. Yes, there are many
 solutions to this, but all it does is read bytes. However,
 you can insert 4 ushorts into one ulong.

 Don't you think the function should be named readBytes, not
 read?  Because it doesn't work with any type other than ubyte[]!

D's module system makes it so that names do not need to be unique across
modules, and this is not the only case in Phobos where multiple modules use
the same function name. It's easy enough to import only the functions you're
using or to rename them via the import if you happen to be importing from
multiple modules containing functions with the same name. E.g. if you want
to do

import std.bitmanip : readBytes = read;

then you can.
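
A short sketch of how a renamed import behaves in practice. It assumes, purely for illustration, that the other `read` in scope is `std.file.read`; the thread does not say which module was actually involved.

```d
import std.bitmanip : readBytes = read;  // std.bitmanip.read under a new name
import std.file : read;                  // assumed other read(); keeps its name
import std.system : Endian;

void main()
{
    ubyte[] buf = [0x12, 0x34, 0x56, 0x78];

    // readBytes is std.bitmanip.read, so it consumes buf as it goes.
    assert(buf.readBytes!ushort == 0x1234);                         // big endian default
    assert(buf.readBytes!(ushort, Endian.littleEndian) == 0x7856);  // explicit order
    assert(buf.length == 0);
}
```

With the rename in place, an unqualified `read` refers only to the other module's function, so the two never collide.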

- Jonathan M Davis

Oct 31 2023

Imperatorn <johan_forsberg_86 hotmail.com> writes:

On Tuesday, 31 October 2023 at 10:09:53 UTC, Salih Dincer wrote:
 Hello,

 Why isn't Endian.littleEndian the default setting for read() in 
 std.bitmanip?

 Okay, we can easily change this if we want (I could use enum LE 
 in the example), and the data can also be reversed with 
 data.retro.array().

 ```d
 void main()
 {
   import std.conv : hexString;
   string helloD = hexString!"48656C6C6F204421";
   // compile time converted literal string -ˆ

   import std.string : format;
   auto hexF = helloD.format!"%(%02X%)";

   import std.digest: toHexString;
   auto arr = cast(ubyte[])"Hello D!";

   auto hex = arr.toHexString;
   assert(hex == hexF);

   import std.stdio : writeln;
   hex.writeln(": ", helloD);
 // 48656C6C6F204421: Hello D!
   assert(helloD == "Hello D!");

   auto data = arr.readBytes!size_t;
   data.code.writeln(": ", data.bytes);
 // 2397076564600448328: Hello D!
 }

 template readBytes(T, R)
 {
   union Bytes
   {
     T code;
     char[T.sizeof] bytes;
   }
   import std.bitmanip;
   enum LE = Endian.littleEndian;

   auto readBytes(ref R data)
   {
    import std.range : retro, array;
    auto reverse = data.retro.array;
    return Bytes(reverse.read!T);
   }
 }
 ```

 However, I think it is not compatible with Union. Thanks...

 SDB 79

It might make sense to change since little endian is the most 
common when it comes to hardware. But big endian is most common 
when it comes to networking. So I guess it depends on your view 
of what is most common. Interacting with your local hardware or 
networking.
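
For the networking side, a minimal sketch of how the two byte orders look for the same value, using the conversion helpers in std.bitmanip. The variable names are only illustrative.

```d
import std.bitmanip : bigEndianToNative, nativeToBigEndian, nativeToLittleEndian;

void main()
{
    uint value = 0x01020304;

    // Network byte order: most significant byte first.
    ubyte[4] wire = nativeToBigEndian(value);
    assert(wire == [0x01, 0x02, 0x03, 0x04]);

    // Little endian: least significant byte first.
    ubyte[4] le = nativeToLittleEndian(value);
    assert(le == [0x04, 0x03, 0x02, 0x01]);

    // Converting back yields the original value on any host.
    assert(bigEndianToNative!uint(wire) == value);
}
```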

Oct 31 2023

Salih Dincer <salihdb hotmail.com> writes:

On Tuesday, 31 October 2023 at 14:43:43 UTC, Imperatorn wrote:
 It might make sense to change since little endian is the most 
 common when it comes to hardware. But big endian is most common 
 when it comes to networking. So I guess it depends on your view 
 of what is most common. Interacting with your local hardware or 
 networking.

I realized that I had to base my choice on what is most common. 
But I have to use a union. That's why I have to choose 
Endian.littleEndian, because it is compatible with both the union and 
HexString. My test code works perfectly, as seen below. I'm 
grateful to everyone who helped here and [on the other 
thread](https://forum.dlang.org/thread/ekpvajiablcfueyipcal forum.dlang.org).

```d
enum sampleText = "Hello D!"; // length <= 8 char

void main()
{
   //import sdb.string : UnionBytes;
   mixin UnionBytes!size_t;
   bytes.init = sampleText;

   import std.digest: toHexString;
   auto hexF = bytes.cell.toHexString;
   assert(hexF == "48656C6C6F204421");

   import std.string : format;
   auto helloD = sampleText.format!"%(%02X%)";
   assert(hexF == helloD);

   import std.stdio;
   bytes.code.writeln(": ",  helloD); /* Prints:

   2397076564600448328: 48656C6C6F204421      */

   import std.conv : hexString;
   static assert(sampleText == hexString!"48656C6C6F204421");

   //import sdb.string : readBytes;
   auto code = bytes.cell.readBytes!size_t;
   assert(code == bytes.code);

   bytes.init = code;
   code.writeln(": ", bytes); /* Prints:

   2397076564600448328: Hello D!      */

   assert(bytes[] == [72, 101, 108, 108, 111, 32, 68, 33]);

   //import sdb.string : HexString
   import std.algorithm : each;
   auto str = "0x";
   auto hex = HexString!size_t(bytes.code);
   hex.each!(chr => str ~= chr);
   str.writeln; // 0x48656C6C6F204421
}
```

My core template (UnionBytes) is initialized like this, and 
underneath I have the readBytes template, which also works with 
static arrays:

```d
// ...
       import std.range : front, popFront;
       size_t i;
       do // new version: range support
       {
         char chr;                  // default init: 0xFF
         chr &= str.front;          // masking
         code |= T(chr) << (i * 8); // shifting
         str.popFront;              // next char
       } while(++i < size);
     }

     auto opCast(Cast : T)() const
       => code;

     auto opCast(Cast : string)() const
       => this.format!"%s";

     auto toString(void delegate(in char[]) sink) const
       => sink.formattedWrite("%s", cast(char[])cell);

   }
   UnionBytes bytes;     // for mixin
}

template readBytes(T, bool big = false, R)
{        // pair endian version 2.1
   import std.bitmanip;

   static if(big) enum E = Endian.bigEndian;
   else enum E = Endian.littleEndian;

   import std.range : ElementType;
   alias ET = ElementType!R;

   auto readBytes(ref R dat)
   {
     auto data = cast(ET[])dat;
     return read!(T, E)(data);
   }
}
```
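
As a standalone sketch of the shifting idea used in the loop above: packing bytes with `i * 8` shifts puts the first byte in the lowest bits, which matches a little-endian read. `packLE` is a hypothetical helper written for this example and is not the UnionBytes code itself.

```d
import std.bitmanip : peek;
import std.system : Endian;

// Hypothetical helper: packs up to 8 bytes, first byte in the lowest bits.
ulong packLE(const(ubyte)[] bytes)
{
    assert(bytes.length <= ulong.sizeof);
    ulong code;
    foreach (i, b; bytes)
        code |= ulong(b) << (i * 8);  // byte i lands at bit offset i * 8
    return code;
}

void main()
{
    auto text = cast(const(ubyte)[]) "Hello D!";
    auto code = packLE(text);

    assert(code == 2397076564600448328UL);                   // value from the thread
    assert(code == text.peek!(ulong, Endian.littleEndian));  // same as an LE read
}
```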

SDB 79

Nov 02 2023

Imperatorn <johan_forsberg_86 hotmail.com> writes:

On Thursday, 2 November 2023 at 11:29:05 UTC, Salih Dincer wrote:
 On Tuesday, 31 October 2023 at 14:43:43 UTC, Imperatorn wrote:
 It might make sense to change since little endian is the most 
 common when it comes to hardware. But big endian is most 
 common when it comes to networking. So I guess it depends on 
 your view of what is most common. Interacting with your local 
 hardware or networking.

 I realized that I had to base my choice on what is most 
 common. But I have to use a union. That's why I have to choose 
 Endian.littleEndian, because it is compatible with both the union and 
 HexString. My test code works perfectly, as seen below. I'm 
 grateful to everyone who helped here and [on the other 
 thread](https://forum.dlang.org/thread/ekpvajiablcfueyipcal forum.dlang.org).

Nice to hear you found a solution. Little endian is *most common* 
in hardware but big endian is *most common* in networking, so 
defining a default endianness can be tricky.

Nov 02 2023
