www.digitalmars.com

digitalmars.D.learn - Float rounding (in JSON)

Sergey <kornburn yandex.ru> writes:
I'm not an IEEE 754 expert, but I just found this rounding behavior
when comparing with other languages. I suppose it happens because
in D float numbers are parsed as double and keep the full length of
a double while rounding. But this just doesn't match the behavior
in other languages.

I'm not sure if this is somehow connected with the JSON implementations.

Is it possible in D to get the same results as in the others?
Explicit formatting is not the answer, because the number of
rounded digits can differ. That's why just specifying "%.XXf" will
not resolve the issue: the last two numbers have 14 and 15 digits
after the dot.
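To make the fixed-precision problem concrete, here is a small Python sketch (Python is used only as a reference, since its output is what the D version is compared against). It counts the digits after the dot in Python's own output and checks whether a single `%.15f` precision reproduces each value; the choice of 15 is just an example.

```python
# Sketch: no single fixed "%.Nf" precision reproduces Python's output for all
# four values, because the number of digits after the dot varies per value.
import json

s = ('{ "f1": 43.476379000000065, "f2": 43.499718999999987, '
     '"f3": 43.499718000000087, "f4": 43.418052999999986 }')
values = json.loads(s)

for key, x in values.items():
    shortest = repr(x)                      # what print() and json.dumps show
    decimals = len(shortest.split(".")[1])  # digits after the dot
    fixed = "%.15f" % x                     # one fixed precision for every value
    print(key, shortest, decimals, fixed, fixed == shortest)
```

For f3 the fixed format yields `43.499718000000087` while Python prints `43.49971800000009`, so the digit count really does depend on the value.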

**Code Python**
```python
import json

str = '{ "f1": 43.476379000000065, "f2": 43.499718999999987, "f3": 43.499718000000087, "f4": 43.418052999999986 }'
print(json.loads(str))
```
**Result**
```
{'f1': 43.476379000000065, 'f2': 43.499718999999985, 'f3': 43.49971800000009, 'f4': 43.418052999999986}
```

**Code Crystal**
```crystal
require "json"

str = "{ \"f1\": 43.476379000000065, \"f2\": 43.499718999999987, \"f3\": 43.499718000000087, \"f4\": 43.418052999999986 }"

puts JSON.parse(str)
```
**Result**
```
{"f1" => 43.476379000000065, "f2" => 43.499718999999985, "f3" => 43.49971800000009, "f4" => 43.418052999999986}
```

**Code D**
```d
import std;

void main() {
     string s = `{ "f1": 43.476379000000065, "f2": 43.499718999999987, "f3": 43.499718000000087, "f4": 43.418052999999986 }`;
     JSONValue j = parseJSON(s);
     writeln(j);
}
```
**Result**
```
{"f1":43.4763790000000654,"f2":43.4997189999999847,"f3":43.4997180000000867,"f4":43.4180529999999862}
```
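The three outputs can be compared directly. A quick Python check (used here only because it makes double parsing easy to inspect) suggests they are not different numbers at all: the input text, Python's output and D's output all parse to the same 64-bit double, and only the number of printed digits differs.

```python
# Sketch: the input text, Python's output and D's output all denote the
# same 64-bit double; only the number of printed digits differs.
inputs = ["43.476379000000065", "43.499718999999987",
          "43.499718000000087", "43.418052999999986"]
python_out = ["43.476379000000065", "43.499718999999985",
              "43.49971800000009", "43.418052999999986"]
d_out = ["43.4763790000000654", "43.4997189999999847",
         "43.4997180000000867", "43.4180529999999862"]

for i, p, d in zip(inputs, python_out, d_out):
    # float() parses each string to the nearest IEEE 754 double
    same = float(i) == float(p) == float(d)
    print(i, p, d, same)
```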
Oct 13 2022
Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/13/22 3:00 PM, Sergey wrote:
 I'm not a professional of IEEE 754, but just found this behavior at 
 rounding in comparison with other languages. I supose it happened 
 because in D float numbers parsed as double and have a full length of 
 double while rounding. But this is just doesn't match with behavior in 
 other languages.
 
 I'm not sure if this is somehow connected with JSON realizations.
It doesn't really look that far off. You can't expect floating point parsing to be exact, as floating point does not perfectly represent decimal numbers, especially when you get down to the least significant bits.
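A small illustration of the resolution limit, sketched in Python (3.9+ for `math.ulp`): near 43.5, neighbouring doubles are 2^-47 (about 7.1e-15) apart, so the 15th decimal place of the inputs is already close to the gap between representable values, and the stored value is not the decimal that was typed.

```python
# Sketch: a double near 43.5 cannot hold 17 arbitrary decimal digits.
import math
from decimal import Decimal

x = float("43.499718999999987")   # f2 from the example
print(math.ulp(x))                # gap to the next double: 2**-47, about 7.1e-15
print(Decimal(x))                 # exact binary value written out in decimal
```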
 
 Is it possible in D to have the same results as in others? Because 
 explicit formatting is not the answer, since length of rounding could be 
 different. That's why just specify "%.XXf" will not resolve the issue - 
 two last numbers have 14 and 15 positions after the dot.
It seems like you are looking to output a certain number of digits. If you limit the digits, you can get the outcome you desire. But I want to point out something you may have missed:
 
 **Code Python**
 ```python
 import json
 
 str = '{ "f1": 43.476379000000065, "f2": 43.499718999999987, "f3": 43.499718000000087, "f4": 43.418052999999986 }'
 print(json.loads(str))
 ```
 **Result**
 ```
 {'f1': 43.476379000000065, 'f2': 43.499718999999985, 'f3': 43.49971800000009, 'f4': 43.418052999999986}
 ```
Let's line these up so they are easier to read:

```
f1 in:  43.476379000000065
f1 out: 43.476379000000065
f2 in:  43.499718999999987
f2 out: 43.499718999999985
f3 in:  43.499718000000087
f3 out: 43.49971800000009
f4 in:  43.418052999999986
f4 out: 43.418052999999986
```

Note how the f2 output differs noticeably from the input. This is an artifact of floating point parsing in the least significant digits. Also note that the omission of the 7 in f3 doesn't seem to have to do with rounding, because the output has fewer digits than the original. If that digit were anywhere close to significant, you would have expected it to appear.
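One way to check the f3 case is to compare candidate strings against the exact stored value. The Python sketch below (using `fractions.Fraction` and `math.ulp`, Python 3.9+) suggests that the shorter output is the 16-significant-digit rounding of the stored double, and that it still lies within half a gap of that double, so it parses back to the same value.

```python
# Sketch: the stored double for f3 is about 43.4997180000000867. Rounding it
# to 16 significant digits gives ...09, and that shorter string is still
# within half an ulp of the double, so it parses back to the same value.
import math
from fractions import Fraction

x = float("43.499718000000087")
exact = Fraction(x)                       # exact value of the double
half_gap = Fraction(math.ulp(x)) / 2

for text in ["43.49971800000009", "43.4997180000001"]:
    distance = abs(Fraction(text) - exact)
    print(text, distance < half_gap, float(text) == x)
```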
 
 **Code Crystal**
 ```crystal
 require "json"
 
 str = "{ \"f1\": 43.476379000000065, \"f2\": 43.499718999999987, \"f3\": 43.499718000000087, \"f4\": 43.418052999999986 }"
 
 puts JSON.parse(str)
 ```
 **Result**
 ```
 {"f1" => 43.476379000000065, "f2" => 43.499718999999985, "f3" => 43.49971800000009, "f4" => 43.418052999999986}
 ```
Same here
 
 **Code D**
 ```d
 import std;
 
 void main() {
      string s = `{ "f1": 43.476379000000065, "f2": 43.499718999999987, "f3": 43.499718000000087, "f4": 43.418052999999986 }`;
      JSONValue j = parseJSON(s);
      writeln(j);
 }
 ```
 **Result**
 ```
 {"f1":43.4763790000000654,"f2":43.4997189999999847,"f3":43.4997180000000867,"f4":43.4180529999999862}
 ```
Let's look at D's representation:

```
f1 in:  43.476379000000065
f1 out: 43.4763790000000654
f2 in:  43.499718999999987
f2 out: 43.4997189999999847
f3 in:  43.499718000000087
f3 out: 43.4997180000000867
f4 in:  43.418052999999986
f4 out: 43.4180529999999862
```

Why does it print one more digit than the other languages? That must be the default for `writeln`. You can affect this by changing the number of digits printed, but probably not when printing an entire JSON structure.

But also look at f3: D's output is actually closer to the expected value than the other languages' output.

If you want an exact representation of the data, parse it as a string instead of a double.

I'm assuming you are comparing for testing purposes? If so, just realize you can never be exact here; you have to live with the difference. Typically, when comparing floating point values, you use an epsilon to ensure that the value is "close enough". You can't enforce an exact representation.

-Steve
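Both suggestions can be sketched with Python's standard `json` module: `parse_float` keeps the original decimal text (or an exact `Decimal`) instead of a double, and `math.isclose` compares doubles with a tolerance. The tolerance `1e-12` below is an arbitrary choice for illustration.

```python
# Sketch of the two suggestions: keep the original text, or compare with a tolerance.
import json
import math
from decimal import Decimal

s = '{"f2": 43.499718999999987, "f3": 43.499718000000087}'

# 1. Keep the exact decimal text instead of converting to a double.
as_text = json.loads(s, parse_float=str)
as_decimal = json.loads(s, parse_float=Decimal)
print(as_text["f3"], as_decimal["f3"])

# 2. Compare doubles with a relative tolerance instead of exact equality.
parsed = json.loads(s)
print(parsed["f2"] == 43.499719)                          # exact comparison fails
print(math.isclose(parsed["f2"], 43.499719, rel_tol=1e-12))  # "close enough" succeeds
```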
Oct 13 2022
Sergey <kornburn yandex.ru> writes:
On Thursday, 13 October 2022 at 19:27:22 UTC, Steven Schveighoffer wrote:

Thank you Steven, for your very detailed answer.

 It doesn't look really that far off. You can't expect floating 
 point parsing to be exact, as floating point does not perfectly 
 represent decimal numbers, especially when you get down to the 
 least significant bits.
This is unfortunate, because an exact match is what I need in this toy example.
 But I want to point out something you may have missed:
Actually, I meant those things too :)
 But look also at f3, and how actually D is closer to the 
 expected value than with the other languages.

 If you want exact representation of data, parse it as a string 
 instead of a double.
Unfortunately, that doesn't help me with this task (which is pretty awkward): it parses some GeoData from a JSON file, then creates a string representation of that data and hashes that string. Because a hash is used, I need exactly the same string representation to match the answer.
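A Python sketch of why hashing makes this strict: the hash only matches if the serialized text matches byte for byte, even when both strings parse to the same double. MD5 is used purely as an example; which hash the task actually uses is not stated here, and `md5_hex` is a hypothetical helper.

```python
# Sketch: a hash only matches if the serialized text matches byte for byte.
import hashlib
import json

data = json.loads('{"f3": 43.499718000000087}')

python_style = json.dumps(data, separators=(",", ":"))  # float repr: shortest round-trip digits
d_style = '{"f3":43.4997180000000867}'                   # D std.json output from this thread

def md5_hex(text):
    # hypothetical helper: hex digest of the UTF-8 encoded text
    return hashlib.md5(text.encode("utf-8")).hexdigest()

print(python_style, md5_hex(python_style))
print(d_style, md5_hex(d_style))
```

Both strings decode to the same value, yet their hashes differ.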
 I'm assuming you are comparing for testing purposes? If you 
 are, just realize you can never be accurate here. You just have 
 to live with the difference. Typically when comparing floating 
 point values, you use an epsilon to ensure that the floating 
 point value is "close enough", you can't enforce exact 
 representation.

 -Steve
Actually, this was my attempt to implement the benchmark-game: https://programming-language-benchmarks.vercel.app/problem/json-serde

As you can see, many languages pass the tests, which I assume means they produce exactly the same representation of those float numbers. Maybe I am wrong and did not understand the code of the other implementations. But at least I tested Python and Crystal and found their results pretty confusing (what you wrote about the more accurate f3 example). What surprised me even more is that they produce exactly the same confusing results for those floating point numbers. That's why I concluded that maybe there is some special, specified behavior/rule for this, and I just can't find how to replicate that "well-known behavior" in D.
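The shared behavior is consistent with "shortest round-trip" formatting: print the fewest significant digits that still parse back to the same double. Below is a brute-force Python sketch of that rule (`shortest_repr` is a hypothetical name); real implementations such as Ryu, Grisu or David Gay's dtoa compute this directly and handle edge cases this loop ignores.

```python
# Brute-force sketch of shortest round-trip formatting: try 1..17 significant
# digits and keep the first string that parses back to the same double.
def shortest_repr(x: float) -> str:
    for digits in range(1, 18):
        text = f"{x:.{digits}g}"
        if float(text) == x:
            return text
    return f"{x:.17g}"  # fallback, e.g. for NaN, which never compares equal

values = [43.476379000000065, 43.499718999999987,
          43.499718000000087, 43.418052999999986]
for x in values:
    print(shortest_repr(x), repr(x))
```

For these four values the sketch matches Python's `repr`, including the 16-digit `43.49971800000009` for f3.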
Oct 13 2022
Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Thursday, 13 October 2022 at 19:27:22 UTC, Steven Schveighoffer wrote:
 On 10/13/22 3:00 PM, Sergey wrote:
 [...]
It doesn't look really that far off. You can't expect floating point parsing to be exact, as floating point does not perfectly represent decimal numbers, especially when you get down to the least significant bits. [...]
To me it looks like there is a conversion to `real` (80-bit floats) somewhere in the D code and that the other languages stay in `double` mode everywhere. Maybe forcing `double` by disabling x87 on the D side would yield the same results as the other languages?
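One way to test the `real` hypothesis from outside D: if D's digits come from an ordinary 64-bit double printed with 18 significant digits, a language that only has 64-bit doubles should reproduce them. The Python sketch below does that. That std.json formats with an 18-digit `%g`-style precision is an assumption to check against the Phobos source, not something confirmed in this thread.

```python
# Sketch: format plain 64-bit doubles (Python floats) with 18 significant
# digits and compare with the D std.json output quoted in the thread.
d_out = {"f1": "43.4763790000000654", "f2": "43.4997189999999847",
         "f3": "43.4997180000000867", "f4": "43.4180529999999862"}
inputs = {"f1": "43.476379000000065", "f2": "43.499718999999987",
          "f3": "43.499718000000087", "f4": "43.418052999999986"}

for key, text in inputs.items():
    x = float(text)  # IEEE 754 binary64, no 80-bit real involved
    print(key, "%.18g" % x, "%.18g" % x == d_out[key])
```

If all four match, the extra digit comes from the output precision rather than from 80-bit intermediates.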
Oct 14 2022
bauss <jacobbauss gmail.com> writes:
On Friday, 14 October 2022 at 09:00:11 UTC, Patrick Schluter wrote:
 On Thursday, 13 October 2022 at 19:27:22 UTC, Steven Schveighoffer wrote:
 On 10/13/22 3:00 PM, Sergey wrote:
 [...]
It doesn't look really that far off. You can't expect floating point parsing to be exact, as floating point does not perfectly represent decimal numbers, especially when you get down to the least significant bits. [...]
To me it looks like there is a conversion to `real` (80 bit floats) somewhere in the D code and that the other languages stay in `double` mode everywhere. Maybe forcing `double` by disabling x87 on the D side would yield the same results as the other languages?
Looking through the source code, for floating point values we call `parse!double` when parsing the JSON. I don't see `real` being used anywhere during parsing. So if anything, it would have to be internal to `parse` or dmd. I haven't checked either yet.
Oct 14 2022
Sergey <kornburn yandex.ru> writes:
On Thursday, 13 October 2022 at 19:00:30 UTC, Sergey wrote:
 I'm not a professional of IEEE 754, but just found this 
 behavior at rounding in comparison with other languages. I 
 supose it happened because in D float numbers parsed as double 
 and have a full length of double while rounding. But this is 
 just doesn't match with behavior in other languages.
So there is no luck with std.json for me. But when std is not the solution, third-party libraries can help. I've tried ASDF. It is a somewhat archived library, but it works well and its documentation is small and clear (mir-ion really needs to improve its documentation).

So in asdf you can just serialize the JSON, and it will automatically round numbers with the same **magic** floating point logic as the other languages.

The only catch: some numbers that are usually doubles may appear in JSON as integers, and asdf automatically converts them to double too. If you need to process them exactly as integers, you can use `Variant!(int, double)` as the type of the data and provide a custom serializer/deserializer, as proposed in the asdf documentation example.

PS Thanks to Steven for his suggestions on Discord.
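Why the integer case matters can be sketched with Python's `json` module: a JSON integer and a JSON float with the same value serialize differently, so converting every number to a double changes the output text. Python keeps the distinction automatically, which is roughly what the `Variant!(int, double)` approach restores on the D side (the D code is not shown here; the field names below are hypothetical).

```python
# Sketch: converting every JSON number to a double changes how integers print.
import json

doc = '{"id": 7, "lat": 43.499718000000087, "zoom": 1.0}'
data = json.loads(doc)  # 7 -> int, 1.0 -> float

print(json.dumps(data))                                # integers stay integers
all_doubles = {k: float(v) for k, v in data.items()}   # "everything is a double"
print(json.dumps(all_doubles))                         # 7 now prints as 7.0
```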
Dec 30 2022