
digitalmars.D - Right way to show numbers in binary/hex/octal in your opinion?

reply rempas <rempas tutanota.com> writes:
So I have this function that converts a number to a string and it 
can return it in any base you want. However, for negative 
numbers, the result may not be what you expected for bases other 
than decimal if you have used "printf" or "wirtef". Let's say 
that we want to convert then number -10 from decimal to hex. 
There are two possible results (check 
[here](https://www.rapidtables.com/convert/number/decimal-to-hex.html)).

The first one is to negate the number, convert it to hex, and 
then add a "-" in front. So the result will be: `-a` 
(which is what my function does).

The second one is what "printf" and "writef" do, which is using 
2's complement. In this case, we first convert the number to 
binary using 2's complement and then we convert that to hex. In 
this example, `-10` to binary is `1111111111110110`, which is 
`fff6` in hex. However, for some reason "printf" and "writef" 
print fffffff6, so go figure....

Here are the advantages of these two methods in my humble opinion:

First method:
1. It is what my function does and I would prefer not to change 
it, obviously. Also, implementing the other behavior would need a 
lot of work (and of course I would have to figure it out first; I 
don't know how to do it), and it would make the function much slower.
2. It is easier on the eyes, as it makes it more obvious whether 
the number is signed, and hence what the equivalent is in other 
systems.

Second method:
1. It is probably what people would expect, and it makes more 
sense scientifically, as decimal was supposed to be the only 
base that makes sense for humans to read, hence the only 
base that has the "-" character.

Anyway, I don't even know why it would be practically useful to 
print a number in another base, so what are your thoughts?
Dec 25 2021
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Saturday, 25 December 2021 at 21:12:33 UTC, rempas wrote:
 The first one is negate the number and convert it to hex and 
 then add a "-" in front of the number. So the result will be: 
 -a (which is what my function does)
This is the correct result.
 The second one is what "printf" and "writef" does which is 
 using 2's complement. So in this case, we first convert the 
 number to binary using 2's complement and then we convert this 
 number to hex. In this example, `-10` to binary is 
 `1111111111110110` which is `fff6` in hex. However, for some 
 reason "printf" and "writef" return fffffff6 so go figure....
The two's complement representation of a number depends on the width of the integer type you're using. For the number -10, the 16-bit two's complement is `fff6` and the 32-bit two's complement is `fffffff6`. The website you link to in your post uses 16 bit integers, but D uses 32-bit `int`s by default. That's where the difference comes from.
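This is easy to verify in D by reinterpreting the same value at different widths (a small sketch):

```d
import std.stdio : writefln;

void main()
{
    // -10 reinterpreted as an unsigned value of each width
    writefln("%x", cast(ushort) cast(short) -10); // fff6             (16-bit)
    writefln("%x", cast(uint) -10);               // fffffff6         (32-bit)
    writefln("%x", cast(ulong) -10);              // fffffffffffffff6 (64-bit)
}
```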
 Anyway, I don't even know why it will be practical useful to 
 print a number in another system so, what are your thoughts?
If you implement the first method, users can still get the two's complement representation very easily with (for example) `cast(uint) -10`. On the other hand, if you implement the second method, it's a lot trickier to get the non-two's-complement result. So I think the first method is the better choice here.
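A sketch of this, using a hypothetical `toBase` function that implements the first (sign-magnitude) method:

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical conversion using the first method: sign + magnitude
string toBase(long n, uint base)
{
    long mag = n < 0 ? -n : n;
    return (n < 0 ? "-" : "") ~ to!string(mag, base);
}

void main()
{
    writeln(toBase(-10, 16));            // -A
    // Callers who want two's complement just cast first:
    writeln(toBase(cast(uint) -10, 16)); // FFFFFFF6
}
```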
Dec 25 2021
parent rempas <rempas tutanota.com> writes:
On Saturday, 25 December 2021 at 22:16:06 UTC, Paul Backus wrote:
 The two's complement representation of a number depends on the 
 width of the integer type you're using. For the number -10, the 
 16-bit two's complement is `fff6` and the 32-bit two's 
 complement is `fffffff6`.

 The website you link to in your post uses 16 bit integers, but 
 D uses 32-bit `int`s by default. That's where the difference 
 comes from.
Interesting. I want to create a system library, but it's funny that I don't know a lot of low-level stuff yet. Though this also makes it a great and even more enjoyable journey :P
 If you implement the first method, users can still get the 
 two's complement representation very easily with (for example) 
 `cast(uint) -10`. On the other hand, if you implement the 
 second method, it's a lot trickier to get the 
 non-two's-complement result. So I think the first method is the 
 better choice here.
That's great! It's much easier for me to just keep the first behavior anyway, so that will do! I also posted on the [cboard forum](https://cboard.cprogramming.com/c-programming/) and I got the same answers, so this is how we'll do it! Thanks a lot for your time and happy holidays!
Dec 25 2021
prev sibling next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Saturday, 25 December 2021 at 21:12:33 UTC, rempas wrote:
 Second method:
 1. It is probably what people would expect and what makes 
 scientifically more sense as decimal was supposed to be the 
 only base that will make sense for humans to read hence be the 
 only base that has the "-" character.
I don't think that this makes any sense. Numbers can be negative regardless of the base that is used to show them. If "decimal was supposed to be the only base that will make sense for humans to read", then why bother implementing "this function that converts a number to a string and it can return it in any base you want"?

BTW, this is how the Ruby and Crystal languages handle conversion between strings and integers, for example:

```Ruby
255.to_s(16)    # => "ff"
(-255).to_s(16) # => "-ff"
"ff".to_i(16)   # => 255
"-ff".to_i(16)  # => -255
```

Basically, strings have the method ".to_i" and integers have the method ".to_s". A single optional argument specifies the base (between 2 and 36) and defaults to 10.
Dec 26 2021
parent rempas <rempas tutanota.com> writes:
On Sunday, 26 December 2021 at 17:35:12 UTC, Siarhei Siamashka 
wrote:
 I don't think that this makes any sense. Numbers can be 
 negative regardless of the base that is used to show them. If 
 "decimal was supposed to be the only base that will make sense 
 for humans to read", then why bother implementing "this 
 function that converts a number to a string and it can return 
 it in any base you want"?
When I say that it is the only base that makes sense to read, I mean that it is the base we learn to read and write in, and that it is used for general purposes and not for specific tasks (like shortening memory addresses, for example). Binary is also very common because it is what the machine understands, and because it is very easy to go from hex to binary and back. This is also why they bothered adding an official way of showing negative binary numbers and doing mathematical operations with them (the most significant bit identifies whether the number is negative or positive).
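The sign bit mentioned above can be observed directly; a small D sketch assuming an 8-bit `byte`:

```d
import std.stdio : writefln;

void main()
{
    // In two's complement, the most significant bit is set for negatives
    writefln("%08b", cast(ubyte) cast(byte) -10); // 11110110
    writefln("%08b", cast(ubyte) cast(byte) 10);  // 00001010
}
```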
 BTW, this is how Ruby and Crystal languages handle conversion 
 between strings and integers:

 ```Ruby
 255.to_s(16)    # => "ff"
 (-255).to_s(16) # => "-ff"
 "ff".to_i(16)   # => 255
 "-ff".to_i(16)  # => -255
 ```

 Basically, strings have method ".to_i" and integers have method 
 ".to_s". A single optional argument specifies base (between 2 
 and 36) and defaults to 10.
This is exactly how my library handles them too!
Dec 30 2021
prev sibling parent reply Rumbu <rumbu rumbu.ro> writes:
On Saturday, 25 December 2021 at 21:12:33 UTC, rempas wrote:
 So I have this function that converts a number to a string and 
 it can return it in any base you want. However, for negative 
 numbers, the result ...
When people dump numbers to strings in any base other than 10, they expect to see the internal representation of that number. Since the sign doesn't have a reserved bit in the representation of integrals (like it has for floats), for me it doesn't make any sense to see a negative sign before a hex, octal or binary value.

The trickiest value for integrals is the one with the most significant bit set (e.g. 0x80). This can be -128 for byte, but also 128 for any type other than byte. Now, if we go the other way around and put a minus before 0x80, how do we convert it back to byte? If we assume that 0x80 is always 128, -0x80 will be -128 and can fit in a byte. On the other side, you cannot store +0x80 in a byte because it is out of range.

This is also an issue in phobos:

https://issues.dlang.org/show_bug.cgi?id=20452
https://issues.dlang.org/show_bug.cgi?id=18290
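The 0x80 round-trip problem is easy to demonstrate with std.conv (a sketch; `ConvException` is the base class of the overflow error Phobos throws here):

```d
import std.conv : to, ConvException;
import std.exception : assertThrown;
import std.stdio : writeln;

void main()
{
    // As raw bits, 0x80 reinterprets to byte.min
    writeln(cast(byte) 0x80); // -128

    // But parsed as a value, 128 does not fit into a signed byte
    assertThrown!ConvException("80".to!byte(16));
}
```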
Dec 26 2021
next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Monday, 27 December 2021 at 06:55:37 UTC, Rumbu wrote:
 When people are dumping numbers to strings în any other base 
 than 10, they are expecting to see the internal representation 
 of that number.
Different people may have different expectations, and their expectations may not be the same as yours. How does this "internal representation" logic make sense for the bases which are not powers of 2? Okay, base 10 is a special snowflake, but what about the others?

If dumping numbers to strings in base 16 is intended to show their internal representation, then why are non-negative numbers not padded with zeroes on the left side (like the negative numbers are padded with Fs) when converted using Dlang's `to!string`?

As for my expectations, each digit in a base 3 number may be used to represent a chosen branch in a ternary tree (similar to how each digit in a base 2 number may represent a chosen branch in a binary tree). The other bases are useful in a similar way. This has nothing to do with the internal representation.
 Since the sign doesn't have a reserved bit in the 
 representation of integrals (like it has for floats), for me it 
 doesn't make any sense if I see a negative sign before a hex, 
 octal or binary value.
Why does the internal representation have to leak out and cause artificial troubles/inconsistencies, when these troubles/inconsistencies are trivially avoidable?
 The trickiest value for integrals is the one with the most 
 significant bit set (e.g. 0x80). This can be -128 for byte, but 
 also 128 for any other type than byte. Now, if we go the other 
 way around and put a minus before 0x80, how do we convert it 
 back to byte? If we assume that 0x80 is always 128, -0x80 will 
 be -128 and can fit a byte. On the other side, you cannot store 
 +0x80 in a byte because is out of range.
I don't understand what the problem is here. It can be easily solved by having a unit test which verifies that "-0x80" gets correctly converted to -128. Or have I missed something?
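Such a check could look like the following, with a hypothetical `parseByteHex` helper (not part of Phobos) that accepts an optional leading sign:

```d
import std.conv : to;

// Hypothetical: parse a base-16 string with an optional leading '-'
byte parseByteHex(string s)
{
    bool neg = s.length > 0 && s[0] == '-';
    if (neg) s = s[1 .. $];
    int v = s.to!int(16);
    // to!byte range-checks the result, so an unsigned "80" (+128) is rejected
    return (neg ? -v : v).to!byte;
}

unittest
{
    assert(parseByteHex("-80") == -128);
    assert(parseByteHex("7f") == 127);
}
```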
 This is also an issue în phobos:

 https://issues.dlang.org/show_bug.cgi?id=20452
 https://issues.dlang.org/show_bug.cgi?id=18290
To me this looks very much like self-inflicted damage and historical baggage, entirely caused by making wrong choices in the past.
Dec 27 2021
parent reply Rumbu <rumbu rumbu.ro> writes:
On Monday, 27 December 2021 at 09:55:46 UTC, Siarhei Siamashka 
wrote:
 On Monday, 27 December 2021 at 06:55:37 UTC, Rumbu wrote:
 When people are dumping numbers to strings în any other base 
 than 10, they are expecting to see the internal representation 
 of that number.
Different people may have different expectations, and their expectations may not be the same.
Your expectations must be congruent with the host architecture, otherwise you can have surprises (like the ones in phobos). The architecture has a limited domain and a certain way to represent numbers; they are not infinite. Otherwise computers would have to perform math ops using strings, and you don't want that for performance reasons.
 I don't understand what's the problem here. It can be easily 
 solved by having a unit test, which verifies that "-0x80" gets 
 correctly converted to -128. Or have I missed something?
How can you convert 0x8000_0000_0000_0000 to long? And if your response is "use a ulong", I have another one: how do you convert -0x8000_0000_0000_0000 to ulong?
 This is also an issue în phobos:

 https://issues.dlang.org/show_bug.cgi?id=20452
 https://issues.dlang.org/show_bug.cgi?id=18290
To me this looks very much like just a self inflicted damage and historical baggage, entirely caused by making wrong choices in the past.
No, it's just the fact that phobos doesn't use the same convention for both directions of conversion. When converting from number to string, it uses the internal representation (2's complement). When converting from string to number, it uses the "human readable" convention.
Dec 27 2021
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Monday, 27 December 2021 at 12:48:52 UTC, Rumbu wrote:
 How can you convert 0x8000_0000_0000_0000 to long?

 And if your response is "use a ulong", I have another one: how 
 do you convert -0x8000_0000_0000_0000 to ulong.
If you actually care about overflow safety, then both of these conversion attempts are invalid and should raise an exception or allow this error to be handled in some different fashion. For example, the Crystal language ensures overflow safety and even provides two varieties of string-to-integer conversion methods (the one with '?' in its name returns nil on error, the other raises an exception):

```Ruby
puts "9223372036854775808".to_i64? || "failed"   # failed
puts "-9223372036854775809".to_i64? || "failed"  # failed
puts "9223372036854775808".to_u64                # 9223372036854775808
puts "-9223372036854775808".to_i64               # -9223372036854775808
puts "8000000000000000".to_i64(16)               # Unhandled exception (ArgumentError)
```

And Dlang does a similar job, though it doesn't seem to be able to handle negative base 16 numbers:

```D
import std;

void main() {
    // prints 9223372036854775808
    writeln("8000000000000000".to!ulong(16));
    // Exception: Unexpected '-' when converting from type string to type long
    writeln("-8000000000000000".to!long(16));
}
```

If you want to get rid of overflow errors, then please consider using a larger 128-bit type or a bigint. Or figure out the source of this out-of-range input and fix the problem there.

But if you don't care about overflow safety, then it's surely possible to implement another library and define conversion operations that wrap around any arbitrarily large input until it fits into the valid range for the target data type. Using this definition, "0x8000_0000_0000_0000" converted to long will become -9223372036854775808 and "-0x8000_0000_0000_0000" converted to ulong will become 9223372036854775808. I think that this is incorrect, but it mimics the two's complement wraparound semantics and some people may like it.
 This is also an issue în phobos:

 https://issues.dlang.org/show_bug.cgi?id=20452
 https://issues.dlang.org/show_bug.cgi?id=18290
To me this looks very much like just a self inflicted damage and historical baggage, entirely caused by making wrong choices in the past.
No, it's just the fact that phobos doesn't use the same convention for both senses of conversion. When converting from number to string, it uses the internal representation - 2's complement. When it is converting from string to number, it uses the "human readable" convention.
The "internal representation" is ambiguous. You can't even figure out if FFFF is a positive or a negative number:

```D
import std;

void main() {
    short a = -1;
    writeln(a.to!string(16)); // prints "FFFF"
    long b = 65535;
    writeln(b.to!string(16)); // prints "FFFF"
}
```

Both -1 and 65535 become exactly the same string after conversion. How are you going to convert it back?

Also you haven't provided any answer to my questions from the earlier message, so I'm repeating them again:

1. How does this "internal representation" logic make sense for the bases which are not powers of 2?

2. If dumping numbers to strings in base 16 is intended to show their internal representation, then why are non-negative numbers not padded with zeroes on the left side (like the negative numbers are padded with Fs) when converted using Dlang's `to!string`?
Dec 28 2021
parent reply Rumbu <rumbu rumbu.ro> writes:
On Tuesday, 28 December 2021 at 23:45:17 UTC, Siarhei Siamashka 
wrote:
 On Monday, 27 December 2021 at 12:48:52 UTC, Rumbu wrote:
 How can you convert 0x8000_0000_0000_0000 to long?

 And if your response is "use a ulong", I have another one: how 
 do you convert -0x8000_0000_0000_0000 to ulong.
If you actually care about overflows safety, then both of these conversion attempts are invalid and should raise an exception or allow to handle this error in some different fashion. For example, Crystal language ensures overflows safety and even provides two varieties of string-to-integer conversion methods (the one with '?' in name returns nil on error, the other raises an exception):
I don't care about overflows, I care about the fact that D must use the same method when it converts numbers to strings and the other way around. Currently D dumps byte.min in hex as "80", but it throws an overflow exception when I try to get my byte back from "80".

Fun fact: when I wrote my decimal library, I had millions of expected values in a file and some of the decimal-int conversions failed according to the tests. The error source was not me, but this line: https://github.com/rumbu13/decimal/blob/a6bae32d75d56be16e82d37af0c8e4a7c08e318a/src/test/test.d#L152. It took me some time to dig through the test file and realise that among the values there are some strings that cannot be parsed in D (the ones starting with "8").

Yes, dumping it as "-80" can be a solution, but the standard lib does not even parse the "-" today for bases other than 10.
 If you want to get rid of overflow errors, then please consider 
 using a larger 128-bit type or a bigint. Or figure out what's 
 the source of this out-of-range input and fix the problem there.
That's why I gave you the "long" example. We don't have a 128-bit type (yet). That was the idea in the first place: the language has a limited range of numbers. And when we get cent, we will lack a 256-bit type.
 ```D
 import std;

 void main() {
   short a = -1;
   writeln(a.to!string(16)); // prints "FFFF"
   long b = 65535;
   writeln(b.to!string(16)); // prints "FFFF"
 }
 ```
 Both -1 and 65535 become exactly the same string after 
 conversion. How are you going to convert it back?
I would like to consider that I know exactly what kind of value I am expecting to read.
 Also you haven't provided any answer to my questions from the 
 earlier message, so I'm repeating them again:

  1. How does this "internal representation" logic make sense 
 for the bases, which are not powers of 2?
Here you have a point :) I never thought about bases other than powers of 2.
  2. If dumping numbers to strings in base 16 is intended to 
 show their internal representation, then why are non-negative 
 numbers not padded with zeroes on the left side (like the 
 negative numbers are padded with Fs) when converted using 
 Dlang's `to!string`?
They are not padded with F's; that's exactly what the number holds in memory as bits. We are on the same side here: the current to/parse implementation is not the best we can get. Happy New Year :)
Dec 28 2021
parent reply rempas <rempas tutanota.com> writes:
On Wednesday, 29 December 2021 at 07:12:02 UTC, Rumbu wrote:
 I don't care about overflows
WHAT DO YOU.... Just kidding :P. But seriously tho, what do you mean you don't care about overflows? Overflows can be sneaky and we must always check for them. If not, then tell me: what happens when you try to save the number "-500" in a "ubyte"? First of all, D will not let you do that; you need to explicitly cast it. But let's say that you do. What do you get? 12!!! Cool, ha?

There are two problems here. First, unsigned types cannot store negative numbers. So our library function should check if the number is negative and in that case not let you do that (an assertion will do). Even if you say "well, in that case the function will negate the number and return it"... this is not logically correct. And second, the value 500 cannot fit into a byte, so we must also check for that. In any case, overflows should be checked in debug builds at least.
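The -500 example above, spelled out in D (the explicit cast is required precisely because the implicit conversion is rejected):

```d
import std.stdio : writeln;

void main()
{
    // ubyte u = -500;          // error: cannot implicitly convert
    ubyte u = cast(ubyte) -500; // wraps modulo 256: -500 + 2 * 256 = 12
    writeln(u); // 12
}
```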
 I care about the fact that D must use the same method when it 
 converts numbers to string and the other way around.
I don't understand what you mean by "method" (method as in code, the same algorithm?). But what I understand is that the example "Siarhei Siamashka" gave, where both -1 and 65535 become exactly the same string ("FFFF") after conversion, is a practical problem where two actual problems are happening. A. When printing the number, we cannot tell if it is positive or negative. B. We cannot get back the original number, because we don't know if it's positive or negative. If we had the "-" sign before the number, then the library could negate the number, add "-" in front of the string, and problem solved!
 Yes, this can be a solution to dump it as "-80", but the 
 standard lib does not even parse the "-" today for other bases 
 than 10.
**RANT ALERT!!!** Yeah, because D's standard library (Phobos) is fucking stupid and is constantly criticized by everyone (including its maintainers), and it is one of the reasons D has the popularity it has (alongside the GC, which I love to hate!). This is why I asked what YOU guys thought and not how the stupid library does it. I don't want to copy a wrong and bad behavior; that is the idea of making something new. This is why I'm trying to make a library that will work and make things easy and simple, as they should be. *RELAXED NOW*
 That's why I gave you the "long" example. We don't have (yet) a 
 128-bit type. That was the idea in the first place, language 
 has a limited range of numbers. And when we will have the cent, 
 we will lack a 256-bit type.
Well, 128-bit, 256-bit and 512-bit are not true types. Our computers have 64-bit registers. To go above that, you use multiple registers that each hold a part of the value. At least that's what I read in a PDF about assembly, and it makes sense.
 I would like to consider that I know exactly what kind of value 
 I am expecting to read.
I don't understand what you mean here, could you please explain?
 Happy New Year :)
Happy New Year buddy! Wish you the best!!! 😁
Dec 30 2021
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 31/12/2021 2:52 AM, rempas wrote:
 That's why I gave you the "long" example. We don't have (yet) a 
 128-bit type. That was the idea in the first place, language has a 
 limited range of numbers. And when we will have the cent, we will lack 
 a 256-bit type.
Well 128-bit, 256-bit, 512-bit are not true types. Our computers have 64-bit registers. To go above that, you use multiple registers that hold a part of the value each. At least that's what I read in a PDF about assembly and it makes sense.
And before 64-bit registers we had (and still do have) emulation for that as well.

All this talk is irrelevant; just add fixed-point types and you can have whatever precision you want.

This also solves the issue with money. So it is a pretty compelling solution.
Dec 30 2021
parent rempas <rempas tutanota.com> writes:
On Thursday, 30 December 2021 at 14:12:23 UTC, rikki cattermole 
wrote:
 And before 64bit registers we had (and still do) emulation for 
 that as well.
Yeah, of course! This is how 32-bit computers can use 64-bit types (and even bigger ones) as well.
 All this talk is irrelevant, just add fixed point types and you 
 can have whatever precision you want.

 This also solves the issue with money. So it is a pretty 
 compelling solution.
Yeah, don't get me wrong. I'm not saying that I don't like the idea. It's actually really the opposite. However, I'm really interested in how stable and production-ready it is.
Dec 30 2021
prev sibling parent rempas <rempas tutanota.com> writes:
On Monday, 27 December 2021 at 06:55:37 UTC, Rumbu wrote:
 When people are dumping numbers to strings în any other base 
 than 10, they are expecting to see the internal representation 
 of that number. Since the sign doesn't have a reserved bit in 
 the representation of integrals (like it has for floats), for 
 me it doesn't make any sense if I see a negative sign before a 
 hex, octal or binary value.
However, binary uses the most significant bit to identify whether the number is positive or negative. I suppose that if you don't want to "show" a sign when converting to hex and octal, you could first convert to binary (signed) and then convert that number to hex or octal. Still, these systems are used for specific purposes and they don't have a real negative representation, so it still doesn't make sense no matter how you see it.
 The trickiest value for integrals is the one with the most 
 significant bit set (e.g. 0x80). This can be -128 for byte, but 
 also 128 for any other type than byte. Now, if we go the other 
 way around and put a minus before 0x80, how do we convert it 
 back to byte? If we assume that 0x80 is always 128, -0x80 will 
 be -128 and can fit a byte. On the other side, you cannot store 
 +0x80 in a byte because is out of range.
Yeah, this is why I think that showing the sign is actually a good way to show that it is a signed negative number. If there is no sign, the number can be either a positive signed number or unsigned. In the end, a "string" will only be used to print the number to a user, so there will be no confusion for the programmers themselves.
 This is also an issue în phobos:

 https://issues.dlang.org/show_bug.cgi?id=20452
 https://issues.dlang.org/show_bug.cgi?id=18290
Yeah, I can see why, lol!
Dec 30 2021