
digitalmars.D - Right way to show numbers in binary/hex/octal in your opinion?

reply rempas <rempas tutanota.com> writes:
So I have this function that converts a number to a string and it 
can return it in any base you want. However, for negative 
numbers, the result may not be what you expected for bases other 
than decimal if you have used "printf" or "wirtef". Let's say 
that we want to convert then number -10 from decimal to hex. 
There are two possible results (check 
[here](https://www.rapidtables.com/convert/number/decimal-to-hex.html)).

The first one is to negate the number, convert it to hex, and 
then add a "-" in front. So the result will be: `-a` 
(which is what my function does).

The second one is what "printf" and "writef" do, which is using 
2's complement. In this case, we first convert the number to 
binary using 2's complement and then we convert that to hex. In 
this example, `-10` to binary is `1111111111110110`, which is 
`fff6` in hex. However, for some reason "printf" and "writef" 
print fffffff6, so go figure....

Here are the advantages of these two methods in my humble opinion:

First method:
1. It is what my function does and I would prefer not to change 
it, obviously. Also, implementing the other behavior would need a 
lot of work (and of course I would have to figure it out first; I 
don't know how to do it), and it would make the function much slower.
2. It is easier on the eyes, as it makes it more obvious whether 
the number is signed, and hence what the equivalent is in other 
systems.

Second method:
1. It is probably what people would expect, and it makes more 
sense scientifically, as decimal was supposed to be the only 
base that makes sense for humans to read, hence the only 
base that has the "-" character.

Anyway, I don't even know why it would be practically useful to 
print a number in another base, so what are your thoughts?
Dec 25 2021
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Saturday, 25 December 2021 at 21:12:33 UTC, rempas wrote:
 The first one is negate the number and convert it to hex and 
 then add a "-" in front of the number. So the result will be: 
 -a (which is what my function does)
This is the correct result.
 The second one is what "printf" and "writef" does which is 
 using 2's complement. So in this case, we first convert the 
 number to binary using 2's complement and then we convert this 
 number to hex. In this example, `-10` to binary is 
 `1111111111110110` which is `fff6` in hex. However, for some 
 reason "printf" and "writef" return fffffff6 so go figure....
The two's complement representation of a number depends on the width of the integer type you're using. For the number -10, the 16-bit two's complement is `fff6` and the 32-bit two's complement is `fffffff6`. The website you link to in your post uses 16 bit integers, but D uses 32-bit `int`s by default. That's where the difference comes from.
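This is easy to verify in D by reinterpreting the same value at different widths (a small sketch):

```d
import std.stdio : writefln;

void main()
{
    // -10 reinterpreted as an unsigned value of each width
    writefln("%x", cast(ushort) cast(short) -10); // fff6             (16-bit)
    writefln("%x", cast(uint) -10);               // fffffff6         (32-bit)
    writefln("%x", cast(ulong) -10);              // fffffffffffffff6 (64-bit)
}
```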
 Anyway, I don't even know why it will be practical useful to 
 print a number in another system so, what are your thoughts?
If you implement the first method, users can still get the two's complement representation very easily with (for example) `cast(uint) -10`. On the other hand, if you implement the second method, it's a lot trickier to get the non-two's-complement result. So I think the first method is the better choice here.
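A sketch of this, using a hypothetical `toBase` function that implements the first (sign-magnitude) method:

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical conversion using the first method: sign + magnitude
string toBase(long n, uint base)
{
    long mag = n < 0 ? -n : n;
    return (n < 0 ? "-" : "") ~ to!string(mag, base);
}

void main()
{
    writeln(toBase(-10, 16));            // -A
    // Callers who want two's complement just cast first:
    writeln(toBase(cast(uint) -10, 16)); // FFFFFFF6
}
```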
Dec 25 2021
parent rempas <rempas tutanota.com> writes:
On Saturday, 25 December 2021 at 22:16:06 UTC, Paul Backus wrote:
 The two's complement representation of a number depends on the 
 width of the integer type you're using. For the number -10, the 
 16-bit two's complement is `fff6` and the 32-bit two's 
 complement is `fffffff6`.

 The website you link to in your post uses 16 bit integers, but 
 D uses 32-bit `int`s by default. That's where the difference 
 comes from.
Interesting. I want to create a system library, but it's funny that I don't know a lot of low-level stuff yet. Though this also makes it a great and even more enjoyable journey :P
 If you implement the first method, users can still get the 
 two's complement representation very easily with (for example) 
 `cast(uint) -10`. On the other hand, if you implement the 
 second method, it's a lot trickier to get the 
 non-two's-complement result. So I think the first method is the 
 better choice here.
That's great! It's much easier for me to just keep the first behavior anyway, so that will do! I also posted on the [cboard forum](https://cboard.cprogramming.com/c-programming/) and I got the same answers, so this is how we'll do it! Thanks a lot for your time and happy holidays!
Dec 25 2021
prev sibling next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Saturday, 25 December 2021 at 21:12:33 UTC, rempas wrote:
 Second method:
 1. It is probably what people would expect and what makes 
 scientifically more sense as decimal was supposed to be the 
 only base that will make sense for humans to read hence be the 
 only base that has the "-" character.
I don't think that this makes any sense. Numbers can be negative regardless of the base that is used to show them. If "decimal was supposed to be the only base that will make sense for humans to read", then why bother implementing "this function that converts a number to a string and it can return it in any base you want"?

BTW, this is how the Ruby and Crystal languages handle conversion between strings and integers, for example:

```Ruby
255.to_s(16)    # => "ff"
(-255).to_s(16) # => "-ff"
"ff".to_i(16)   # => 255
"-ff".to_i(16)  # => -255
```

Basically, strings have the method ".to_i" and integers have the method ".to_s". A single optional argument specifies the base (between 2 and 36) and defaults to 10.
Dec 26 2021
parent rempas <rempas tutanota.com> writes:
On Sunday, 26 December 2021 at 17:35:12 UTC, Siarhei Siamashka 
wrote:
 I don't think that this makes any sense. Numbers can be 
 negative regardless of the base that is used to show them. If 
 "decimal was supposed to be the only base that will make sense 
 for humans to read", then why bother implementing "this 
 function that converts a number to a string and it can return 
 it in any base you want"?
When I say that it is the only base that makes sense to read, I mean that it is the base we learn to read and write in, and that it is used for general purposes and not for specific tasks (like shortening memory addresses, for example). Binary is also very common because it is what the machine understands, and because it is very easy to go from hex to binary and back. This is also why they bothered adding an official way of showing negative binary numbers and doing mathematical operations with them (the most significant bit identifies whether the number is negative or positive).
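The sign bit mentioned above can be observed directly; a small D sketch assuming an 8-bit `byte`:

```d
import std.stdio : writefln;

void main()
{
    // In two's complement, the most significant bit is set for negatives
    writefln("%08b", cast(ubyte) cast(byte) -10); // 11110110
    writefln("%08b", cast(ubyte) cast(byte) 10);  // 00001010
}
```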
 BTW, this is how Ruby and Crystal languages handle conversion 
 between strings and integers:

 ```Ruby
 255.to_s(16)    # => "ff"
 (-255).to_s(16) # => "-ff"
 "ff".to_i(16)   # => 255
 "-ff".to_i(16)  # => -255
 ```

 Basically, strings have method ".to_i" and integers have method 
 ".to_s". A single optional argument specifies base (between 2 
 and 36) and defaults to 10.
This is exactly how my library handles them too!
Dec 30 2021
prev sibling parent reply Rumbu <rumbu rumbu.ro> writes:
On Saturday, 25 December 2021 at 21:12:33 UTC, rempas wrote:
 So I have this function that converts a number to a string and 
 it can return it in any base you want. However, for negative 
 numbers, the result ...
When people dump numbers to strings in any base other than 10, they expect to see the internal representation of that number. Since the sign doesn't have a reserved bit in the representation of integrals (like it has for floats), for me it doesn't make any sense to see a negative sign before a hex, octal or binary value.

The trickiest value for integrals is the one with the most significant bit set (e.g. 0x80). This can be -128 for byte, but also 128 for any type other than byte. Now, if we go the other way around and put a minus before 0x80, how do we convert it back to byte? If we assume that 0x80 is always 128, -0x80 will be -128 and can fit in a byte. On the other side, you cannot store +0x80 in a byte because it is out of range.

This is also an issue in phobos:

https://issues.dlang.org/show_bug.cgi?id=20452
https://issues.dlang.org/show_bug.cgi?id=18290
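The 0x80 round-trip problem is easy to demonstrate with std.conv (a sketch; `ConvException` is the base class of the overflow error Phobos throws here):

```d
import std.conv : to, ConvException;
import std.exception : assertThrown;
import std.stdio : writeln;

void main()
{
    // As raw bits, 0x80 reinterprets to byte.min
    writeln(cast(byte) 0x80); // -128

    // But parsed as a value, 128 does not fit into a signed byte
    assertThrown!ConvException("80".to!byte(16));
}
```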
Dec 26 2021
next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Monday, 27 December 2021 at 06:55:37 UTC, Rumbu wrote:
 When people are dumping numbers to strings în any other base 
 than 10, they are expecting to see the internal representation 
 of that number.
Different people may have different expectations, and their expectations may not be the same as yours. How does this "internal representation" logic make sense for the bases which are not powers of 2? Okay, base 10 is a special snowflake, but what about the others?

If dumping numbers to strings in base 16 is intended to show their internal representation, then why are non-negative numbers not padded with zeroes on the left side (like the negative numbers are padded with Fs) when converted using Dlang's `to!string`?

As for my expectations, each digit in a base 3 number may be used to represent a chosen branch in a ternary tree (similar to how each digit in a base 2 number may represent a chosen branch in a binary tree). The other bases are useful in a similar way. This has nothing to do with the internal representation.
 Since the sign doesn't have a reserved bit in the 
 representation of integrals (like it has for floats), for me it 
 doesn't make any sense if I see a negative sign before a hex, 
 octal or binary value.
Why does the internal representation have to leak out and cause artificial troubles/inconsistencies, when these troubles/inconsistencies are trivially avoidable?
 The trickiest value for integrals is the one with the most 
 significant bit set (e.g. 0x80). This can be -128 for byte, but 
 also 128 for any other type than byte. Now, if we go the other 
 way around and put a minus before 0x80, how do we convert it 
 back to byte? If we assume that 0x80 is always 128, -0x80 will 
 be -128 and can fit a byte. On the other side, you cannot store 
 +0x80 in a byte because is out of range.
I don't understand what the problem is here. It can be easily solved by having a unit test which verifies that "-0x80" gets correctly converted to -128. Or have I missed something?
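Such a check could look like the following, with a hypothetical `parseByteHex` helper (not part of Phobos) that accepts an optional leading sign:

```d
import std.conv : to;

// Hypothetical: parse a base-16 string with an optional leading '-'
byte parseByteHex(string s)
{
    bool neg = s.length > 0 && s[0] == '-';
    if (neg) s = s[1 .. $];
    int v = s.to!int(16);
    // to!byte range-checks the result, so an unsigned "80" (+128) is rejected
    return (neg ? -v : v).to!byte;
}

unittest
{
    assert(parseByteHex("-80") == -128);
    assert(parseByteHex("7f") == 127);
}
```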
 This is also an issue în phobos:

 https://issues.dlang.org/show_bug.cgi?id=20452
 https://issues.dlang.org/show_bug.cgi?id=18290
To me this looks very much like self-inflicted damage and historical baggage, entirely caused by making wrong choices in the past.
Dec 27 2021
parent reply Rumbu <rumbu rumbu.ro> writes:
On Monday, 27 December 2021 at 09:55:46 UTC, Siarhei Siamashka 
wrote:
 On Monday, 27 December 2021 at 06:55:37 UTC, Rumbu wrote:
 When people are dumping numbers to strings în any other base 
 than 10, they are expecting to see the internal representation 
 of that number.
Different people may have different expectations, and their expectations may not be the same.
Your expectations must be congruent with the host architecture, otherwise you can have surprises (like the ones in phobos). The architecture has a limited domain and a certain way to represent numbers; they are not infinite. Otherwise computers would have to perform math ops using strings, and you don't want that for performance reasons.
 I don't understand what's the problem here. It can be easily 
 solved by having a unit test, which verifies that "-0x80" gets 
 correctly converted to -128. Or have I missed something?
How can you convert 0x8000_0000_0000_0000 to long? And if your response is "use a ulong", I have another one: how do you convert -0x8000_0000_0000_0000 to ulong?
 This is also an issue în phobos:

 https://issues.dlang.org/show_bug.cgi?id=20452
 https://issues.dlang.org/show_bug.cgi?id=18290
To me this looks very much like just a self inflicted damage and historical baggage, entirely caused by making wrong choices in the past.
No, it's just the fact that phobos doesn't use the same convention for both directions of conversion. When converting from number to string, it uses the internal representation (2's complement). When converting from string to number, it uses the "human readable" convention.
Dec 27 2021
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Monday, 27 December 2021 at 12:48:52 UTC, Rumbu wrote:
 How can you convert 0x8000_0000_0000_0000 to long?

 And if your response is "use a ulong", I have another one: how 
 do you convert -0x8000_0000_0000_0000 to ulong.
If you actually care about overflow safety, then both of these conversion attempts are invalid and should raise an exception or allow this error to be handled in some different fashion. For example, the Crystal language ensures overflow safety and even provides two varieties of string-to-integer conversion methods (the one with '?' in its name returns nil on error, the other raises an exception):

```Ruby
puts "9223372036854775808".to_i64? || "failed"   # failed
puts "-9223372036854775809".to_i64? || "failed"  # failed
puts "9223372036854775808".to_u64                # 9223372036854775808
puts "-9223372036854775808".to_i64               # -9223372036854775808
puts "8000000000000000".to_i64(16)               # Unhandled exception (ArgumentError)
```

And Dlang does a similar job, though it doesn't seem to be able to handle negative base 16 numbers:

```D
import std;

void main() {
    // prints 9223372036854775808
    writeln("8000000000000000".to!ulong(16));
    // Exception: Unexpected '-' when converting from type string to type long
    writeln("-8000000000000000".to!long(16));
}
```

If you want to get rid of overflow errors, then please consider using a larger 128-bit type or a bigint. Or figure out the source of this out-of-range input and fix the problem there.

But if you don't care about overflow safety, then it's surely possible to implement another library and define conversion operations that wrap around any arbitrarily large input until it fits into the valid range for the target data type. Using this definition, "0x8000_0000_0000_0000" converted to long will become -9223372036854775808 and "-0x8000_0000_0000_0000" converted to ulong will become 9223372036854775808. I think that this is incorrect, but it mimics the two's complement wraparound semantics and some people may like it.
 This is also an issue în phobos:

 https://issues.dlang.org/show_bug.cgi?id=20452
 https://issues.dlang.org/show_bug.cgi?id=18290
To me this looks very much like just a self inflicted damage and historical baggage, entirely caused by making wrong choices in the past.
No, it's just the fact that phobos doesn't use the same convention for both senses of conversion. When converting from number to string, it uses the internal representation - 2's complement. When it is converting from string to number, it uses the "human readable" convention.
The "internal representation" is ambiguous. You can't even figure out if FFFF is a positive or a negative number:

```D
import std;

void main() {
    short a = -1;
    writeln(a.to!string(16)); // prints "FFFF"
    long b = 65535;
    writeln(b.to!string(16)); // prints "FFFF"
}
```

Both -1 and 65535 become exactly the same string after conversion. How are you going to convert it back?

Also you haven't provided any answer to my questions from the earlier message, so I'm repeating them again:

1. How does this "internal representation" logic make sense for the bases which are not powers of 2?

2. If dumping numbers to strings in base 16 is intended to show their internal representation, then why are non-negative numbers not padded with zeroes on the left side (like the negative numbers are padded with Fs) when converted using Dlang's `to!string`?
Dec 28 2021
parent reply Rumbu <rumbu rumbu.ro> writes:
On Tuesday, 28 December 2021 at 23:45:17 UTC, Siarhei Siamashka 
wrote:
 On Monday, 27 December 2021 at 12:48:52 UTC, Rumbu wrote:
 How can you convert 0x8000_0000_0000_0000 to long?

 And if your response is "use a ulong", I have another one: how 
 do you convert -0x8000_0000_0000_0000 to ulong.
If you actually care about overflows safety, then both of these conversion attempts are invalid and should raise an exception or allow to handle this error in some different fashion. For example, Crystal language ensures overflows safety and even provides two varieties of string-to-integer conversion methods (the one with '?' in name returns nil on error, the other raises an exception):
I don't care about overflows, I care about the fact that D must use the same method when it converts numbers to strings and the other way around. Currently D dumps byte.min in hex as "80", but it throws an overflow exception when I try to get my byte back from "80".

Fun fact: when I wrote my decimal library, I had millions of expected values in a file and some of the decimal-int conversions failed according to the tests. The error source was not me, but this line: https://github.com/rumbu13/decimal/blob/a6bae32d75d56be16e82d37af0c8e4a7c08e318a/src/test/test.d#L152. It took me some time to dig through the test file and realise that among the values there are some strings that cannot be parsed in D (the ones starting with "8").

Yes, dumping it as "-80" can be a solution, but the standard lib does not even parse the "-" today for bases other than 10.
 If you want to get rid of overflow errors, then please consider 
 using a larger 128-bit type or a bigint. Or figure out what's 
 the source of this out-of-range input and fix the problem there.
That's why I gave you the "long" example. We don't have a 128-bit type (yet). That was the idea in the first place: the language has a limited range of numbers. And when we get cent, we will lack a 256-bit type.
 ```D
 import std;

 void main() {
   short a = -1;
   writeln(a.to!string(16)); // prints "FFFF"
   long b = 65535;
   writeln(b.to!string(16)); // prints "FFFF"
 }
 ```
 Both -1 and 65535 become exactly the same string after 
 conversion. How are you going to convert it back?
I would like to consider that I know exactly what kind of value I am expecting to read.
 Also you haven't provided any answer to my questions from the 
 earlier message, so I'm repeating them again:

  1. How does this "internal representation" logic make sense 
 for the bases, which are not powers of 2?
Here you have a point :) I never thought about bases other than powers of 2.
  2. If dumping numbers to strings in base 16 is intended to 
 show their internal representation, then why are non-negative 
 numbers not padded with zeroes on the left side (like the 
 negative numbers are padded with Fs) when converted using 
 Dlang's `to!string`?
They are not padded with F's; that's exactly what the number holds in memory as bits. We are on the same side here: the current to/parse implementation is not the best we can get. Happy New Year :)
Dec 28 2021
parent reply rempas <rempas tutanota.com> writes:
On Wednesday, 29 December 2021 at 07:12:02 UTC, Rumbu wrote:
 I don't care about overflows
WHAT DO YOU.... Just kidding :P. But seriously tho, what do you mean you don't care about overflows? Overflows can be sneaky and we must always check for them. If not, then tell me: what happens when you try to save the number "-500" in a "ubyte"? First of all, D will not let you do that; you need to explicitly cast it. But let's say that you do. What do you get? 12!!! Cool, ha?

There are two problems here. First, unsigned types cannot store negative numbers. So our library function should check if the number is negative and in that case not let you do that (an assertion will do). Even if you say "well, in that case the function will negate the number and return it"... this is not logically correct. And second, the value 500 cannot fit into a byte, so we must also check for that. In any case, overflows should be checked in debug builds at least.
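The -500 example above, spelled out in D (the explicit cast is required precisely because the implicit conversion is rejected):

```d
import std.stdio : writeln;

void main()
{
    // ubyte u = -500;          // error: cannot implicitly convert
    ubyte u = cast(ubyte) -500; // wraps modulo 256: -500 + 2 * 256 = 12
    writeln(u); // 12
}
```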
 I care about the fact that D must use the same method when it 
 converts numbers to string and the other way around.
I don't understand what you mean by "method" (method as in code, the same algorithm?). But what I understand is that the example "Siarhei Siamashka" gave, where both -1 and 65535 become exactly the same string ("FFFF") after conversion, is a practical problem where two actual problems are happening. A. When printing the number, we cannot tell if it is positive or negative. B. We cannot get back the original number, because we don't know if it's positive or negative. If we had the "-" sign before the number, then the library could negate the number, add "-" in front of the string, and problem solved!
 Yes, this can be a solution to dump it as "-80", but the 
 standard lib does not even parse the "-" today for other bases 
 than 10.
**RANT ALERT!!!** Yeah, because D's standard library (Phobos) is fucking stupid and is constantly criticized by everyone (including its maintainers), and it is one of the reasons D has the popularity it has (alongside the GC, which I love to hate!). This is why I asked what YOU guys thought and not how the stupid library does it. I don't want to copy a wrong and bad behavior; that is the idea of making something new. This is why I'm trying to make a library that will work and make things easy and simple, as they should be. *RELAXED NOW*
 That's why I gave you the "long" example. We don't have (yet) a 
 128-bit type. That was the idea in the first place, language 
 has a limited range of numbers. And when we will have the cent, 
 we will lack a 256-bit type.
Well, 128-bit, 256-bit and 512-bit are not true types. Our computers have 64-bit registers. To go above that, you use multiple registers that each hold a part of the value. At least that's what I read in a PDF about assembly, and it makes sense.
 I would like to consider that I know exactly what kind of value 
 I am expecting to read.
I don't understand what you mean here, could you please explain?
 Happy New Year :)
Happy New Year buddy! Wish you the best!!! 😁
Dec 30 2021
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 31/12/2021 2:52 AM, rempas wrote:
 That's why I gave you the "long" example. We don't have (yet) a 
 128-bit type. That was the idea in the first place, language has a 
 limited range of numbers. And when we will have the cent, we will lack 
 a 256-bit type.
Well 128-bit, 256-bit, 512-bit are not true types. Our computers have 64-bit registers. To go above that, you use multiple registers that hold a part of the value each. At least that's what I read in a PDF about assembly and it makes sense.
And before 64-bit registers we had (and still do have) emulation for that as well.

All this talk is irrelevant; just add fixed-point types and you can have whatever precision you want.

This also solves the issue with money. So it is a pretty compelling solution.
Dec 30 2021
parent rempas <rempas tutanota.com> writes:
On Thursday, 30 December 2021 at 14:12:23 UTC, rikki cattermole 
wrote:
 And before 64bit registers we had (and still do) emulation for 
 that as well.
Yeah, of course! This is how 32-bit computers can use 64-bit types (and even bigger ones) as well.
 All this talk is irrelevant, just add fixed point types and you 
 can have whatever precision you want.

 This also solves the issue with money. So it is a pretty 
 compelling solution.
Yeah, don't get me wrong. I'm not saying that I don't like the idea. It's actually really the opposite. However, I'm really interested in how stable and production-ready it is.
Dec 30 2021
prev sibling parent rempas <rempas tutanota.com> writes:
On Monday, 27 December 2021 at 06:55:37 UTC, Rumbu wrote:
 When people are dumping numbers to strings în any other base 
 than 10, they are expecting to see the internal representation 
 of that number. Since the sign doesn't have a reserved bit in 
 the representation of integrals (like it has for floats), for 
 me it doesn't make any sense if I see a negative sign before a 
 hex, octal or binary value.
However, binary uses the most significant bit to identify whether the number is positive or negative. I suppose that if you don't want to "show" a sign when converting to hex and octal, you could first convert to binary (signed) and then convert that number to hex or octal. Still, these systems are used for specific purposes and they don't have a real negative representation, so it still doesn't make sense no matter how you see it.
 The trickiest value for integrals is the one with the most 
 significant bit set (e.g. 0x80). This can be -128 for byte, but 
 also 128 for any other type than byte. Now, if we go the other 
 way around and put a minus before 0x80, how do we convert it 
 back to byte? If we assume that 0x80 is always 128, -0x80 will 
 be -128 and can fit a byte. On the other side, you cannot store 
 +0x80 in a byte because is out of range.
Yeah, this is why I think that showing the sign is actually a good way to show that it is a signed negative number. If there is no sign, the number can be either a positive signed number or unsigned. In the end, a "string" will only be used to print the number to a user, so there will be no confusion for the programmers themselves.
 This is also an issue în phobos:

 https://issues.dlang.org/show_bug.cgi?id=20452
 https://issues.dlang.org/show_bug.cgi?id=18290
Yeah, I can see why, lol!
Dec 30 2021