digitalmars.D - Casting double to ulong weirdness

reply "Márcio Martins" <marcioapm gmail.com> writes:
I'm posting this here for visibility. This was silently 
corrupting our data, and might be doing the same for others as 
well.

import std.stdio;
void main() {
   double x = 1.2;
   writeln(cast(ulong)(x * 10.0));
   double y = 1.2 * 10.0;
   writeln(cast(ulong)y);
}

Output:
11
12


to!ulong instead of the cast does the right thing, and is a 
viable work-around.

Issue: https://issues.dlang.org/show_bug.cgi?id=14958
Aug 24 2015
next sibling parent anonymous <anonymous example.com> writes:
On Monday 24 August 2015 18:52, Márcio Martins wrote:

 import std.stdio;
 void main() {
    double x = 1.2;
    writeln(cast(ulong)(x * 10.0));
    double y = 1.2 * 10.0;
    writeln(cast(ulong)y);
 }
 
 Output:
 11
 12
 
 
 to!ulong instead of the cast does the right thing, and is a
 viable work-around.
 
 Issue: https://issues.dlang.org/show_bug.cgi?id=14958)
1.2 is not representable exactly in binary. Try printing it with a lot of decimal places:

writefln("%.20f", x); /* prints "1.19999999999999995559" */

Multiply that by 10: ~11.999; cast to ulong: 11.

Interestingly, printing x * 10.0 that way shows exactly 12:

writefln("%.20f", x * 10.0); /* 12.00000000000000000000 */

But cast one operand to real and you're back at 11.9...:

writefln("%.20f", cast(real)x * 10.0); /* 11.99999999999999955591 */

So, apparently, real precision is used in your code. This is not unexpected; compilers are allowed to use higher precision than requested for floating point operations. I think people have argued against it in the past, but so far Walter has been adamant about it being the right choice.
Aug 24 2015
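A side note on "not representable exactly in binary": a fraction has a finite binary expansion only when its reduced denominator is a power of two. The sketch below checks that for a few values from this thread. The helper name hasFiniteBinaryExpansion is made up for illustration, and the check deliberately ignores the limited significand width and exponent range of actual floating-point types.

```d
import std.numeric : gcd;
import std.stdio : writeln;

/// True if num/den, once reduced, has a power-of-two denominator.
/// Assumption of this sketch: significand width and exponent range
/// of real hardware types are ignored.
bool hasFiniteBinaryExpansion(ulong num, ulong den)
{
    assert(den != 0, "denominator must be non-zero");
    immutable ulong reduced = den / gcd(num, den);
    // A power of two has exactly one bit set.
    return (reduced & (reduced - 1)) == 0;
}

void main()
{
    writeln(hasFiniteBinaryExpansion(12, 10)); // 1.2 = 6/5   -> false
    writeln(hasFiniteBinaryExpansion(12, 1));  // 12.0        -> true
    writeln(hasFiniteBinaryExpansion(3, 8));   // 0.375       -> true
}
```

1.2 reduces to 6/5, and 5 is not a power of two, so no binary floating-point type of any width can hold it exactly; 12 and 0.375 can be held exactly.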
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/24/15 12:52 PM, "Márcio Martins" <marcioapm gmail.com> wrote:
 I'm posting this here for visibility. This was silently corrupting our
 data, and might be doing the same for others as well.

 import std.stdio;
 void main() {
    double x = 1.2;
    writeln(cast(ulong)(x * 10.0));
    double y = 1.2 * 10.0;
    writeln(cast(ulong)y);
 }

 Output:
 11
 12
Yes. This is part of the issue of floating point. 1.2 cannot be represented accurately. The second case is done via real, not double, and at compile time (i.e. constant folding). There may be other reasons why this works.

You are better off adding a small epsilon:

writeln(cast(ulong)(x * 10.0 + 0.1));
 to!ulong instead of the cast does the right thing, and is a viable
 work-around.
to!ulong likely adds the epsilon, but you'd have to look to be sure.

Note, this is NOT a D problem, this is a problem with floating point. And by problem, I mean feature-that-you-should-avoid :)

-Steve
Aug 24 2015
next sibling parent "Márcio Martins" <marcioapm gmail.com> writes:
On Monday, 24 August 2015 at 17:26:12 UTC, Steven Schveighoffer 
wrote:
 On 8/24/15 12:52 PM, "Márcio Martins" <marcioapm gmail.com> wrote:
 I'm posting this here for visibility. This was silently 
 corrupting our
 data, and might be doing the same for others as well.

 import std.stdio;
 void main() {
    double x = 1.2;
    writeln(cast(ulong)(x * 10.0));
    double y = 1.2 * 10.0;
    writeln(cast(ulong)y);
 }

 Output:
 11
 12
Yes. This is part of the issue of floating point. 1.2 cannot be represented accurately. The second case is done via real, not double, and at compile time (i.e. constant folding). There may be other reasons why this works. You are better off adding a small epsilon: writeln(cast(ulong)(x * 10.0 + 0.1));
 to!ulong instead of the cast does the right thing, and is a 
 viable
 work-around.
to!ulong likely adds the epsilon, but you'd have to look to be sure. Note, this is NOT a D problem, this is a problem with floating point. And by problem, I mean feature-that-you-should-avoid :) -Steve
I am familiar with floating-point representations and their pitfalls, and I think that is not the issue here. The issue I am trying to illustrate is the fact that the same exact operation returns different results. Both operations are x * 10.0, except one of them passes through the stack before the cast. I would expect this to be consistent, as I believe is the case in C/C++.
Aug 24 2015
prev sibling parent reply "rumbu" <rumbu rumbu.ro> writes:
On Monday, 24 August 2015 at 17:26:12 UTC, Steven Schveighoffer 
wrote:
 Note, this is NOT a D problem, this is a problem with floating 
 point. And by problem, I mean feature-that-you-should-avoid :)

 -Steve
Visual C++ 19.00.23026, x86, x64:

int _tmain(int argc, _TCHAR* argv[])
{
    double x = 1.2;
    printf("%d\r\n", (unsigned long long)(x * 10.0));
    double y = 1.2 * 10.0;
    printf("%d\r\n", ((unsigned long long)y));
    return 0;
}

Output:
12
12

Same output in debugger for an ARM Windows App.

C# 6.0:

static void Main(string[] args)
{
    double x = 1.2;
    WriteLine((ulong)(x * 10.0));
    double y = 1.2 * 10.0;
    WriteLine((ulong)y);
}

Output:
12
12

Same output in debugger for ARM in all flavours (Android, iOS, Windows)

It seems like a D problem.
Aug 24 2015
parent reply "rumbu" <rumbu rumbu.ro> writes:
BTW, 1.2 and 12.0 are directly representable as double

In C++:

printf("%.20f\r\n", 1.2);
printf("%.20f\r\n", 12.0);

will output:

1.20000000000000000000
12.00000000000000000000

Either upcasting to real is the wrong decision here, or the 
writeln string conversion is wrong.
Aug 24 2015
next sibling parent reply Justin Whear <justin economicmodeling.com> writes:
On Mon, 24 Aug 2015 18:06:07 +0000, rumbu wrote:

 BTW, 1.2 and 12.0 are directly representable as double
 
 In C++:
 
 printf("%.20f\r\n", 1.2);
 printf("%.20f\r\n", 12.0);
 
 will output:
 
 1.20000000000000000000 12.00000000000000000000
 
 Either upcasting to real is the wrong decision here, either the writeln
 string conversion is wrong.
No it's not, this must be some sort of constant-folding or precision increase.

$ cat test.c
#include "stdio.h"

int main(int nargs, char** args)
{
    double x = 1.2;
    printf("%.20f\n", x);
}
$ clang test.c && ./a.out
1.19999999999999995559
Aug 24 2015
parent "Warwick" <warwick warwick.com> writes:
On Monday, 24 August 2015 at 18:16:44 UTC, Justin Whear wrote:
 On Mon, 24 Aug 2015 18:06:07 +0000, rumbu wrote:

 BTW, 1.2 and 12.0 are directly representable as double
 
 In C++:
 
 printf("%.20f\r\n", 1.2);
 printf("%.20f\r\n", 12.0);
 
 will output:
 
 1.20000000000000000000 12.00000000000000000000
 
 Either upcasting to real is the wrong decision here, either 
 the writeln string conversion is wrong.
No it's not, this must be some sort of constant-folding or precision increase.
Maybe the constant folding is using a different rounding mode to the runtime?
Aug 24 2015
prev sibling next sibling parent Ali Çehreli <acehreli yahoo.com> writes:
On 08/24/2015 11:06 AM, rumbu wrote:

 BTW, 1.2 and 12.0 are directly representable as double
12 is but 1.2 is not.
 In C++:

 printf("%.20f\r\n", 1.2);
 printf("%.20f\r\n", 12.0);

 will output:

 1.20000000000000000000
 12.00000000000000000000

 Either upcasting to real is the wrong decision here, either the writeln
 string conversion is wrong.
Output is one thing. The issue is with the representation of 1.2. You need infinite digits. D's %a helps with visualizing it:

import std.stdio;

void main()
{
    writefln("%a", 1.2);
    writefln("%a", 12.0);
}

Outputs

0x1.3333333333333p+0
0x1.8p+3

Ali
Aug 24 2015
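The %a output above can be cross-checked against the raw bit pattern. A minimal sketch, assuming double is a 64-bit IEEE 754 value; bitsOf is a hypothetical helper, not a Phobos function.

```d
import std.stdio : writefln;

/// Reinterprets the storage of a double as a 64-bit integer.
/// Assumes the usual IEEE 754 binary64 layout.
ulong bitsOf(double v)
{
    return *cast(ulong*)&v;
}

void main()
{
    // sign (1 bit) | biased exponent (11 bits) | fraction (52 bits)
    writefln("%016x", bitsOf(1.2));  // 3ff3333333333333
    writefln("%016x", bitsOf(12.0)); // 4028000000000000
}
```

The fraction field of 1.2 is the repeating 3333... pattern cut off where the significand ends, while 12.0 needs only a single fraction bit.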
prev sibling next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Monday, 24 August 2015 at 18:06:08 UTC, rumbu wrote:
 BTW, 1.2 and 12.0 are directly representable as double
12.0 is representable, but I'm pretty sure, if you work it out, 1.2 isn't.
Aug 24 2015
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/24/15 2:06 PM, rumbu wrote:
 BTW, 1.2 and 12.0 are directly representable as double

 In C++:

 printf("%.20f\r\n", 1.2);
 printf("%.20f\r\n", 12.0);

 will output:

 1.20000000000000000000
 12.00000000000000000000

 Either upcasting to real is the wrong decision here, either the writeln
 string conversion is wrong.
I don't think they are directly representable as floating point, because they have factors other than 2 in the decimal portion. From my understanding, anything that only has to do with powers of 2 is representable in floating point, just like you cannot represent 1/3 in decimal exactly.

But there is definitely something weird going on with the casting. I wrote this program:

testfp.d:

extern(C) void foo(double x);

void main()
{
    double x = 1.2;
    foo(x);
}

testfp2.d:

extern(C) void foo(double x)
{
    import std.stdio;
    writeln(cast(ulong)(x * 10.0));
}

testfp2.c:

#include <stdio.h>

void foo(double x)
{
    printf("%lld\n", (unsigned long long)(x * 10));
}

If I link testfp.d against testfp2.c, then it outputs 12. If I link against testfp2.d, it outputs 11.

I have faith that printf and writeln properly output ulongs. Something different happens with the cast. There can be no constant folding operations or optimizations going on here, as this is done via separate compilation.

I'll re-open the bug report.

-Steve
Aug 24 2015
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/24/15 2:38 PM, Steven Schveighoffer wrote:
 On 8/24/15 2:06 PM, rumbu wrote:
 BTW, 1.2 and 12.0 are directly representable as double

 In C++:

 printf("%.20f\r\n", 1.2);
 printf("%.20f\r\n", 12.0);

 will output:

 1.20000000000000000000
 12.00000000000000000000

 Either upcasting to real is the wrong decision here, either the writeln
 string conversion is wrong.
I don't think they are directly representable as floating point, because they are have factors other than 2 in the decimal portion. From my understanding, anything that only has to do with powers of 2 are representable in floating point, just like you cannot represent 1/3 in decimal exactly. But there is definitely something weird going on with the casting. I wrote this program: testfp.d: extern(C) void foo(double x); void main() { double x = 1.2; foo(x); } testfp2.d: extern(C) void foo(double x) { import std.stdio; writeln(cast(ulong)(x * 10.0)); } testfp2.c: #include <stdio.h> void foo(double x) { printf("%lld\n", (unsigned long long)(x * 10)); } If I link testfp.d against testfp2.c, then it outputs 12. If I link against testfp2.d, it outputs 11.
More data:

It definitely has something to do with the representation of 1.2 * 10.0 in *real*. I changed the code so that it writes the result of the multiplication to a shared double. In this case it *works* and prints 12, just like C does.

This also works:

double x = 1.2;
double y = x * 10.0;
writeln(cast(ulong)y); // 12

However, change y to a real, and you get 11.

Note that if I first convert from real to double, then convert to ulong, it works.

This code:

double x = 1.2;
double x2 = x * 10.0;
real y = x * 10.0;
real y2 = x2;
double y3 = y;
writefln("%a, %a, %a", y, y2, cast(real)y3);

outputs:

0xb.ffffffffffffep+0, 0xcp+0, 0xcp+0

So some rounding happens in the conversion from real to double, that doesn't happen in the conversion from real to ulong.

All this gets down to: FP cannot accurately represent decimal. Should this be fixed? Can it be fixed? I don't know. But I would be very cautious about converting anything FP to integers without some epsilon.

-Steve
Aug 24 2015
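The value 0xb.ffffffffffffep+0 can be reproduced with plain integer arithmetic, which sidesteps any question of which precision the FPU used. This is a sketch under the assumption that the double nearest 1.2 has the significand 0x13333333333333 (scaled by 2^52), as the %a output 0x1.3333333333333p+0 earlier in the thread suggests. The function names are illustrative only, and the rounding step is written for this one value rather than as a general-purpose routine.

```d
import std.stdio : writefln, writeln;

// 53-bit significand of the double nearest 1.2, scaled by 2^52
// (assumed from the %a output 0x1.3333333333333p+0).
enum ulong significand = 0x13333333333333;

/// Exact value of (double 1.2) * 10, scaled by 2^52.
ulong exactProductScaled()
{
    return significand * 10; // 56 bits, fits in a ulong
}

/// Truncate the exact product straight to an integer
/// (what a conversion from the 80-bit result does).
ulong truncateDirectly()
{
    return exactProductScaled() >> 52;
}

/// Round the exact product to 53 significant bits
/// (round-to-nearest-even), then truncate to an integer.
ulong roundToDoubleThenTruncate()
{
    immutable ulong p = exactProductScaled();
    immutable ulong dropped = p & 0b111; // the 3 bits a double cannot keep
    ulong kept = p >> 3;                 // now scaled by 2^49
    if (dropped > 4 || (dropped == 4 && (kept & 1) == 1))
        ++kept;
    return kept >> 49;
}

void main()
{
    writefln("%x", exactProductScaled()); // bffffffffffffe
    writeln(truncateDirectly());          // 11
    writeln(roundToDoubleThenTruncate()); // 12
}
```

Truncating the exact 56-bit product gives 11. Rounding it first to the 53 bits a double can hold gives exactly 12, which then converts to 12. That matches the 11-versus-12 split reported above.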
parent reply "bachmeier" <no spam.net> writes:
On Monday, 24 August 2015 at 18:59:58 UTC, Steven Schveighoffer 
wrote:
 All this gets down to: FP cannot accurately represent decimal. 
 Should this be fixed? Can it be fixed? I don't know. But I 
 would be very cautious about converting anything FP to integers 
 without some epsilon.

 -Steve
I don't see anything that needs to be fixed, because I don't think anything is broken - there is nothing that violates my understanding of floating point precision in D. cast is not round. What is broken is a program that attempts to convert a double to an integer type using a cast rather than the functions that were written to do it correctly.
Aug 24 2015
next sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/24/15 3:15 PM, bachmeier wrote:
 On Monday, 24 August 2015 at 18:59:58 UTC, Steven Schveighoffer wrote:
 All this gets down to: FP cannot accurately represent decimal. Should
 this be fixed? Can it be fixed? I don't know. But I would be very
 cautious about converting anything FP to integers without some epsilon.

 -Steve
I don't see anything that needs to be fixed, because I don't think anything is broken - there is nothing that violates my understanding of floating point precision in D. cast is not round. What is broken is a program that attempts to convert a double to an integer type using a cast rather than the functions that were written to do it correctly.
What is surprising, and possibly buggy, is that none of these operations involve real, but the issue only happens because under the hood, real is used instead of double for the multiplication. I pretty much agree with you that the code is written incorrectly. But it is unfortunate it differs in the way it handles this from C. I think this issue has been brought up before on the newsgroup, especially where CTFE is involved. -Steve
Aug 24 2015
prev sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, Aug 24, 2015 at 07:15:43PM +0000, bachmeier via Digitalmars-d wrote:
 On Monday, 24 August 2015 at 18:59:58 UTC, Steven Schveighoffer wrote:
All this gets down to: FP cannot accurately represent decimal. Should
this be fixed? Can it be fixed? I don't know. But I would be very
cautious about converting anything FP to integers without some
epsilon.

-Steve
I don't see anything that needs to be fixed, because I don't think anything is broken - there is nothing that violates my understanding of floating point precision in D. cast is not round. What is broken is a program that attempts to convert a double to an integer type using a cast rather than the functions that were written to do it correctly.
+1. Floating-point != mathematical real numbers. Don't expect it to behave the same.

T

-- 
It's amazing how careful choice of punctuation can leave you hanging:
Aug 24 2015
prev sibling next sibling parent reply "bachmeier" <no spam.net> writes:
On Monday, 24 August 2015 at 16:52:54 UTC, Márcio Martins wrote:
 I'm posting this here for visibility. This was silently 
 corrupting our data, and might be doing the same for others as 
 well.

 import std.stdio;
 void main() {
   double x = 1.2;
   writeln(cast(ulong)(x * 10.0));
   double y = 1.2 * 10.0;
   writeln(cast(ulong)y);
 }

 Output:
 11
 12


 to!ulong instead of the cast does the right thing, and is a 
 viable work-around.

 Issue: https://issues.dlang.org/show_bug.cgi?id=14958)
I would not describe to!ulong as a "work-around". You just discovered one of the reasons to! exists: it is the right way to do it and cast(ulong) is the wrong way. As the others have noted, floating point is tricky business, and you need to use the right tools for the job. std.math.round also works.
Aug 24 2015
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/24/15 1:43 PM, bachmeier wrote:
 On Monday, 24 August 2015 at 16:52:54 UTC, Márcio Martins wrote:
 I'm posting this here for visibility. This was silently corrupting our
 data, and might be doing the same for others as well.

 import std.stdio;
 void main() {
   double x = 1.2;
   writeln(cast(ulong)(x * 10.0));
   double y = 1.2 * 10.0;
   writeln(cast(ulong)y);
 }

 Output:
 11
 12


 to!ulong instead of the cast does the right thing, and is a viable
 work-around.

 Issue: https://issues.dlang.org/show_bug.cgi?id=14958)
I would not describe to!ulong as a "work-around". You just discovered one of the reasons to! exists: it is the right way to do it and cast(ulong) is the wrong way. As the others have noted, floating point is tricky business, and you need to use the right tools for the job.
real y = x * 10.0;
writeln(y.to!ulong); // 11

to! does not do anything different than cast. What is happening here is the implicit cast from real to double. D treats the result of x * 10.0 as type double, but it's done at real precision. In that conversion, the error is hidden by a rounding automatically done by the processor I think.

-Steve
Aug 24 2015
next sibling parent "bachmeier" <no spam.net> writes:
On Monday, 24 August 2015 at 19:23:44 UTC, Steven Schveighoffer 
wrote:

 real y = x * 10.0;
 writeln(y.to!ulong); // 11

 to! does not do anything different than cast. What is happening 
 here is the implicit cast from real to double. D treats the 
 result of x * 10.0 as type double, but it's done at real 
 precision. In that conversion, the error is hidden by a 
 rounding automatically done by the processor I think.

 -Steve
Yes, I was mistaken. You have to use roundTo or std.math.round. to! and cast both truncate.
Aug 24 2015
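For reference, a small sketch contrasting the truncating and rounding conversions mentioned here. It assumes non-negative inputs that fit in a ulong; truncated and rounded are hypothetical wrapper names, while to, roundTo and round are the Phobos functions discussed in the thread.

```d
import std.conv : roundTo, to;
import std.math : round;
import std.stdio : writeln;

/// Truncates toward zero, like cast(ulong).
ulong truncated(double v)
{
    return v.to!ulong;
}

/// Rounds to the nearest integer.
ulong rounded(double v)
{
    return roundTo!ulong(v);
}

void main()
{
    double v = 11.9999;
    writeln(cast(ulong) v);        // 11
    writeln(truncated(v));         // 11
    writeln(rounded(v));           // 12
    writeln(cast(ulong) round(v)); // 12
}
```

Rounding changes the meaning of the conversion, so it only helps when the nearest integer is what the program wants; it is not a drop-in replacement for truncation.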
prev sibling parent reply "Márcio Martins" <marcioapm gmail.com> writes:
On Monday, 24 August 2015 at 19:23:44 UTC, Steven Schveighoffer 
wrote:
 On 8/24/15 1:43 PM, bachmeier wrote:
 On Monday, 24 August 2015 at 16:52:54 UTC, Márcio Martins 
 wrote:
 I'm posting this here for visibility. This was silently 
 corrupting our
 data, and might be doing the same for others as well.

 import std.stdio;
 void main() {
   double x = 1.2;
   writeln(cast(ulong)(x * 10.0));
   double y = 1.2 * 10.0;
   writeln(cast(ulong)y);
 }

 Output:
 11
 12


 to!ulong instead of the cast does the right thing, and is a 
 viable
 work-around.

 Issue: https://issues.dlang.org/show_bug.cgi?id=14958)
I would not describe to!ulong as a "work-around". You just discovered one of the reasons to! exists: it is the right way to do it and cast(ulong) is the wrong way. As the others have noted, floating point is tricky business, and you need to use the right tools for the job.
real y = x * 10.0; writeln(y.to!ulong); // 11 to! does not do anything different than cast. What is happening here is the implicit cast from real to double. D treats the result of x * 10.0 as type double, but it's done at real precision. In that conversion, the error is hidden by a rounding automatically done by the processor I think. -Steve
Whatever the issue is, it is not unavoidable, because as has been shown, other languages do it correctly.

From the data presented so far, it seems like the issue is that the mul is performed in 80-bit precision, and storing it before the cast forces a truncation down to 64-bit. Similarly, passing it to a function will also truncate to 64-bit, due to ABIs. This is why to! works as expected.

Please do keep in mind that the issue is not one of precision, but one of inconsistency. They are not the same thing. The result being 11 or 12 is irrelevant to this issue. It should just be the same for two instances of the same expression.

In an attempt to make things more obvious, consider this example, which also illustrates why to! works, despite apparently doing nothing extra at all.

double noop(double z) {
   return z;
}

void main() {
   double x = 1.2;
   writeln(cast(ulong)(x * 10.0));
   writeln(cast(ulong)noop(x * 10.0));
}

Outputs:
11
12
Aug 24 2015
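Instead of relying on a store or a function call to narrow the value as a side effect, the narrowing can be requested explicitly. The sketch below assumes a druntime recent enough to provide core.math.toPrec (it postdates this thread), which is documented as forcing rounding to the precision of the given type. scaleAndTruncate is a made-up name.

```d
import core.math : toPrec;
import std.stdio : writeln;

/// Multiplies, rounds the product to double precision, then truncates.
/// Assumes a non-negative result that fits in a ulong.
ulong scaleAndTruncate(double x, double factor)
{
    // Explicitly drop any extra precision the multiplication carried.
    immutable double product = toPrec!double(x * factor);
    return cast(ulong) product;
}

void main()
{
    double x = 1.2;
    writeln(scaleAndTruncate(x, 10.0));
}
```

With the product rounded to double first, both spellings of the expression from the example above should agree, whether or not the multiplication itself ran at 80-bit precision.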
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/24/15 4:34 PM, "Márcio Martins" <marcioapm gmail.com> wrote:
 On Monday, 24 August 2015 at 19:23:44 UTC, Steven Schveighoffer wrote:
 On 8/24/15 1:43 PM, bachmeier wrote:
 On Monday, 24 August 2015 at 16:52:54 UTC, Márcio Martins wrote:
 I'm posting this here for visibility. This was silently corrupting our
 data, and might be doing the same for others as well.

 import std.stdio;
 void main() {
   double x = 1.2;
   writeln(cast(ulong)(x * 10.0));
   double y = 1.2 * 10.0;
   writeln(cast(ulong)y);
 }

 Output:
 11
 12


 to!ulong instead of the cast does the right thing, and is a viable
 work-around.

 Issue: https://issues.dlang.org/show_bug.cgi?id=14958)
I would not describe to!ulong as a "work-around". You just discovered one of the reasons to! exists: it is the right way to do it and cast(ulong) is the wrong way. As the others have noted, floating point is tricky business, and you need to use the right tools for the job.
real y = x * 10.0; writeln(y.to!ulong); // 11 to! does not do anything different than cast. What is happening here is the implicit cast from real to double. D treats the result of x * 10.0 as type double, but it's done at real precision. In that conversion, the error is hidden by a rounding automatically done by the processor I think.
Whatever the issue is, it is not unavoidable, because as has been shown, other languages do it correctly.
Your other examples use doubles, not reals. It's not apples to apples.
  From the data presented so far, it seems like the issue is that the mul
 is performed in 80-bit precision, storing it before the cast forces a
 truncation down to 64-bit.
Not just truncation, rounding too.
 Similarly, passing it to a function will also
 truncate to 64-bit, due to ABIs. This is why to! works as expected.

 Please do keep in mind that the issue is not one of precision, but one
 of inconsistency.
It is an issue of precision. In order to change from real to double, some bits must be lost. Since certain numbers cannot be represented, the CPU must round or truncate.
 They are not the same thing. The result being 11 or 12
 is irrelevant to this issue. It should just be the same for two
 instances of the same expression.
They are not the same expression. One goes from double through multiplication to real, then back to double, then to ulong. The other skips the real to double conversion and goes directly to ulong. The real issue here is that you are not correctly converting from a floating point number to an integer.
 In an attempt to make things more obvious, consider this example, which
 also illustrates why to! works, despite apparently doing nothing extra
 at all.

 double noop(double z) {
    return z;
 }

 void main() {
    double x = 1.2;
    writeln(cast(ulong)(x * 10.0));
    writeln(cast(ulong)noop(x * 10.0));
 }

 Outputs:
 11
 12
I understand the inconsistency, and I agree it is an issue that should be examined. But the issue is entirely avoidable by not using incorrect methods to convert from floating point to integer after floating point operations introduce some small level of error. Perhaps there is some way to make it properly round in this case, but I guarantee it will not fix all floating point errors. -Steve
Aug 24 2015
next sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/24/15 5:03 PM, Steven Schveighoffer wrote:
 Whatever the issue is, it is not unavoidable, because as has been shown,
 other languages do it correctly.
Your other examples use doubles, not reals. It's not apples to apples.
#include <stdio.h>

int main()
{
    long double x = 1.2;
    x *= 10.0;
    printf("%lld\n", (unsigned long long)x);
}

output:
11

-Steve
Aug 24 2015
prev sibling parent reply "Márcio Martins" <marcioapm gmail.com> writes:
On Monday, 24 August 2015 at 21:03:50 UTC, Steven Schveighoffer 
wrote:
 On 8/24/15 4:34 PM, "Márcio Martins" <marcioapm gmail.com> wrote:
 On Monday, 24 August 2015 at 19:23:44 UTC, Steven 
 Schveighoffer wrote:
 On 8/24/15 1:43 PM, bachmeier wrote:
 On Monday, 24 August 2015 at 16:52:54 UTC, Márcio Martins 
 wrote:
 I'm posting this here for visibility. This was silently 
 corrupting our
 data, and might be doing the same for others as well.

 import std.stdio;
 void main() {
   double x = 1.2;
   writeln(cast(ulong)(x * 10.0));
   double y = 1.2 * 10.0;
   writeln(cast(ulong)y);
 }

 Output:
 11
 12


 to!ulong instead of the cast does the right thing, and is a 
 viable
 work-around.

 Issue: https://issues.dlang.org/show_bug.cgi?id=14958)
I would not describe to!ulong as a "work-around". You just discovered one of the reasons to! exists: it is the right way to do it and cast(ulong) is the wrong way. As the others have noted, floating point is tricky business, and you need to use the right tools for the job.
real y = x * 10.0; writeln(y.to!ulong); // 11 to! does not do anything different than cast. What is happening here is the implicit cast from real to double. D treats the result of x * 10.0 as type double, but it's done at real precision. In that conversion, the error is hidden by a rounding automatically done by the processor I think.
Whatever the issue is, it is not unavoidable, because as has been shown, other languages do it correctly.
Your other examples use doubles, not reals. It's not apples to apples.
All my examples are doubles, and I have tested them all in C++ as well, using doubles. It is indeed apples to apples :)
  From the data presented so far, it seems like the issue is 
 that the mul
 is performed in 80-bit precision, storing it before the cast 
 forces a
 truncation down to 64-bit.
Not just truncation, rounding too.
What? If rounding was performed, then it would work as expected. i.e. both outputs would be 12.
 Similarly, passing it to a function will also
 truncate to 64-bit, due to ABIs. This is why to! works as 
 expected.

 Please do keep in mind that the issue is not one of precision, 
 but one
 of inconsistency.
It is an issue of precision. In order to change from real to double, some bits must be lost. Since certain numbers cannot be represented, the CPU must round or truncate.
There is no mention of real anywhere in any code. The intent is clearly stated in the code and while I accept precision and rounding errors, especially because DMD has no way to select a floating point model, that I am aware of, at least, it's very hard for me to accept the inconsistency.
 They are not the same thing. The result being 11 or 12
 is irrelevant to this issue. It should just be the same for two
 instances of the same expression.
They are not the same expression. One goes from double through multiplication to real, then back to double, then to ulong. The other skips the real to double conversion and goes directly to ulong.
There is only 1 floating-point operation and one cast per expression. They are effectively the same except one value is stored in a temporary before casting. The intent expressed in the code is absolutely the same. All values are the same, operation order is the same, and types are all the same.
 The real issue here is that you are not correctly converting 
 from a floating point number to an integer.

 In an attempt to make things more obvious, consider this 
 example, which
 also illustrates why to! works, despite apparently doing 
 nothing extra
 at all.

 double noop(double z) {
    return z;
 }

 void main() {
    double x = 1.2;
    writeln(cast(ulong)(x * 10.0));
    writeln(cast(ulong)noop(x * 10.0));
 }

 Outputs:
 11
 12
I understand the inconsistency, and I agree it is an issue that should be examined. But the issue is entirely avoidable by not using incorrect methods to convert from floating point to integer after floating point operations introduce some small level of error. Perhaps there is some way to make it properly round in this case, but I guarantee it will not fix all floating point errors. -Steve
What is the correct way to truncate, not round, a floating-point value to an integer?
Aug 24 2015
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, Aug 24, 2015 at 09:34:22PM +0000, via Digitalmars-d wrote:
[...]
 What is the correct way to truncate, not round, a floating-point value
 to an integer?
std.math.trunc. T -- Having a smoking section in a restaurant is like having a peeing section in a swimming pool. -- Edward Burr
Aug 24 2015
parent "Márcio Martins" <marcioapm gmail.com> writes:
On Monday, 24 August 2015 at 22:12:42 UTC, H. S. Teoh wrote:
 On Mon, Aug 24, 2015 at 09:34:22PM +0000, via Digitalmars-d 
 wrote: [...]
 What is the correct way to truncate, not round, a 
 floating-point value to an integer?
std.math.trunc. T
import std.stdio;
import std.math;

void main() {
    double x = 1.2;
    writeln(std.math.trunc(x * 10.0));
    double y = x * 10.0;
    writeln(std.math.trunc(y));
}

Outputs:
11
12
Aug 24 2015
prev sibling next sibling parent "bachmeier" <no spam.net> writes:
On Monday, 24 August 2015 at 21:34:23 UTC, Márcio Martins wrote:
 Whatever the issue is, it is not unavoidable, because as has 
 been shown,
 other languages do it correctly.
There's no guarantee that it will be done consistently or correctly in C or C++ to my knowledge. Some compilers will do it consistently, but it's absolutely not portable.
 It is an issue of precision. In order to change from real to 
 double, some bits must be lost. Since certain numbers cannot 
 be represented, the CPU must round or truncate.
There is no mention of real anywhere in any code. The intent is clearly stated in the code and while I accept precision and rounding errors, especially because DMD has no way to select a floating point model, that I am aware of, at least, it's very hard for me to accept the inconsistency.
It's fully consistent with what DMD claims to do: http://dlang.org/portability.html While a compiler can guarantee consistency, I don't know of any way to guarantee correctness, which makes the question of consistency irrelevant. There's no way to know what will happen when you run the program.
 What is the correct way to truncate, not round, a 
 floating-point value to an integer?
If you can be an epsilon above or below the exact answer, there's no way to guarantee correctness unless you know you're not doing something that resembles integer operations. If the exact answer is 12.2 or 12.6, you can do it correctly. If it is 12.0 or 23.0, you can get the wrong answer.
Aug 24 2015
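One way to act on this is to decide up front how close to an integer a value must be before it is treated as that integer, and to truncate only otherwise. The sketch below does that. The name truncateWithTolerance and the tolerance 1e-9 are assumptions for illustration; a sensible tolerance depends on how much error the preceding arithmetic can accumulate.

```d
import std.math : fabs, round, trunc;
import std.stdio : writeln;

/// Truncates v toward zero, except that values within `tolerance`
/// of an integer are snapped to that integer first.
/// Assumes v is non-negative and fits in a ulong.
ulong truncateWithTolerance(double v, double tolerance)
{
    immutable double nearest = round(v);
    if (fabs(v - nearest) <= tolerance)
        return cast(ulong) nearest;
    return cast(ulong) trunc(v);
}

void main()
{
    writeln(truncateWithTolerance(11.9999999999, 1e-9)); // 12
    writeln(truncateWithTolerance(12.6, 1e-9));          // 12
    writeln(truncateWithTolerance(11.5, 1e-9));          // 11
}
```

This does not remove the ambiguity described above: a value that is legitimately a hair below an integer gets snapped upward too, so the tolerance is a statement about the data, not a fix for floating point.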
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/24/15 5:34 PM, "Márcio Martins" <marcioapm gmail.com> wrote:
 On Monday, 24 August 2015 at 21:03:50 UTC, Steven Schveighoffer wrote:
 I understand the inconsistency, and I agree it is an issue that should
 be examined. But the issue is entirely avoidable by not using
 incorrect methods to convert from floating point to integer after
 floating point operations introduce some small level of error.

 Perhaps there is some way to make it properly round in this case, but
 I guarantee it will not fix all floating point errors.
What is the correct way to truncate, not round, a floating-point value to an integer?
auto result = cast(ulong)(x * 10.0 + x.epsilon); -Steve
Aug 25 2015
next sibling parent reply "Márcio Martins" <marcioapm gmail.com> writes:
On Tuesday, 25 August 2015 at 11:14:35 UTC, Steven Schveighoffer 
wrote:
 On 8/24/15 5:34 PM, "Márcio Martins" <marcioapm gmail.com> wrote:
 On Monday, 24 August 2015 at 21:03:50 UTC, Steven 
 Schveighoffer wrote:
 I understand the inconsistency, and I agree it is an issue 
 that should
 be examined. But the issue is entirely avoidable by not using
 incorrect methods to convert from floating point to integer 
 after
 floating point operations introduce some small level of error.

 Perhaps there is some way to make it properly round in this 
 case, but
 I guarantee it will not fix all floating point errors.
What is the correct way to truncate, not round, a floating-point value to an integer?
auto result = cast(ulong)(x * 10.0 + x.epsilon); -Steve
import std.stdio;

void main() {
    double x = 1.2;
    writeln(cast(ulong)(x * 10.0 + x.epsilon));

    double y = x * 10.0;
    writeln(cast(ulong)(y + x.epsilon));

    double z = x * 10.0 + x.epsilon;
    writeln(cast(ulong)(z));
}

Outputs:
11
12
12

I leave it at this. It seems like this only bothers me, and I have no more time to argue.
The workaround is not that bad, and at the end of the day, it is just one more thing on the list.
Aug 25 2015
next sibling parent "bachmeier" <no spam.com> writes:
On Tuesday, 25 August 2015 at 13:51:18 UTC, Márcio Martins wrote:

 import std.stdio;
 void main() {
 	double x = 1.2;
 	writeln(cast(ulong)(x * 10.0 + x.epsilon));

 	double y = x * 10.0;
 	writeln(cast(ulong)(y + x.epsilon));
 	
 	double z = x * 10.0 + x.epsilon;
 	writeln(cast(ulong)(z));
 }

 Outputs:
 11
 12
 12

 I leave it at this. It seems like this only bothers me, and I 
 have no more time to argue.
 The workaround is not that bad, and at the end of the day, it 
 is just one more thing on the list.
What you are attempting to do is impossible. Is there a reason you can't use std.math.round, which is the tool that was made for the task?
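A minimal sketch of the rounding approach suggested here, assuming the value is non-negative and fits in a ulong. The helper name roundToUlong is hypothetical; only std.math.round comes from Phobos.

```d
import std.math : round;
import std.stdio : writeln;

/// Hypothetical helper: rounds to the nearest integer (halfway cases go
/// away from zero) and then converts. Assumes a non-negative, in-range value.
ulong roundToUlong(double value)
{
    return cast(ulong) round(value);
}

void main()
{
    double x = 1.2;
    // The product is a hair below 12 in extended precision and exactly 12
    // once rounded to double; rounding to nearest gives the same answer
    // for both.
    writeln(roundToUlong(x * 10.0));
    writeln(roundToUlong(11.4));
}
```

Rounding should yield 12 for x * 10.0 whether the product is held in double or in real precision, which is why it sidesteps the inconsistency. It is not a truncation, though: 11.6 would also become 12.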
Aug 25 2015
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/25/15 9:51 AM, "Márcio Martins"
<marcioapm gmail.com> wrote:
 On Tuesday, 25 August 2015 at 11:14:35 UTC, Steven Schveighoffer wrote:
 On 8/24/15 5:34 PM, "Márcio Martins"
 <marcioapm gmail.com> wrote:
 On Monday, 24 August 2015 at 21:03:50 UTC, Steven Schveighoffer wrote:
 I understand the inconsistency, and I agree it is an issue that should
 be examined. But the issue is entirely avoidable by not using
 incorrect methods to convert from floating point to integer after
 floating point operations introduce some small level of error.

 Perhaps there is some way to make it properly round in this case, but
 I guarantee it will not fix all floating point errors.
What is the correct way to truncate, not round, a floating-point value to an integer?
auto result = cast(ulong)(x * 10.0 + x.epsilon);
import std.stdio; void main() { double x = 1.2; writeln(cast(ulong)(x * 10.0 + x.epsilon));
Sorry, I misunderstood what epsilon was (I think it's the smallest incremental value for a given floating point type with an exponent of 1). Because your number is further away than this value, it doesn't help. You need to add something to correct for the error that might exist. The best thing to do is to add a very small number, as that will only adjust truly close numbers. In this case, the number you could add is 0.1, since it's not going to affect anything other than a slightly-off value. It depends on where you expect the error to be. As bachmeier says, it's not something that's easy to get right.
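To illustrate why adding epsilon is not enough here: double.epsilon is the gap between 1.0 and the next double (about 2.2e-16), while x * 10.0 in extended precision sits about 4.4e-16 below 12, so the sum still truncates to 11. The sketch below prints the gaps and shows a truncation with a caller-chosen tolerance. The helper names gapAbove and truncWithTolerance, and the tolerance 1e-9, are assumptions for illustration, not something from the thread.

```d
import std.math : floor, nextUp;
import std.stdio : writefln, writeln;

/// Distance from `value` to the next larger representable double.
double gapAbove(double value)
{
    return nextUp(value) - value;
}

/// Hypothetical helper: nudges `value` up by `tol`, then truncates.
/// Assumes a non-negative, in-range value; `tol` must be larger than the
/// expected accumulated error and smaller than any fraction that matters.
ulong truncWithTolerance(double value, double tol)
{
    return cast(ulong) floor(value + tol);
}

void main()
{
    writefln("gap at 1.0:  %.3g", gapAbove(1.0));   // equals double.epsilon
    writefln("gap at 12.0: %.3g", gapAbove(12.0));  // eight times larger

    double x = 1.2;
    writeln(truncWithTolerance(x * 10.0, 1e-9));
}
```

The tolerance is a judgment call: anything within tol below an integer is treated as that integer, so a large value such as 0.1 would also turn 11.95 into 12.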
      double y = x * 10.0;
      writeln(cast(ulong)(y + x.epsilon));

      double z = x * 10.0 + x.epsilon;
      writeln(cast(ulong)(z));
these work because you have converted to double, which appears to round up. -Steve
Aug 25 2015
next sibling parent reply "Márcio Martins" <marcioapm gmail.com> writes:
On Tuesday, 25 August 2015 at 14:54:41 UTC, Steven Schveighoffer 
wrote:
 On 8/25/15 9:51 AM, "Márcio Martins"
 <marcioapm gmail.com> wrote:
      [...]
Sorry, I misunderstood what epsilon was (I think it's the smallest incremental value for a given floating point type with an exponent of 1). Because your number is further away than this value, it doesn't help. You need to add something to correct for the error that might exist. The best thing to do is to add a very small number, as that will only adjust truly close numbers. In this case, the number you could add is 0.1, since it's not going to affect anything other than a slightly-off value. It depends on where you expect the error to be. As bachmeier says, it's not something that's easy to get right.
      [...]
these work because you have converted to double, which appears to round up. -Steve
I didn't convert to double. My computations are all in double to start with, as you can see from my explicit types everywhere. If you compile it with *GDC* it works fine. If you compile a port with clang, gcc or msvc, it works right as well. I suspect it will also work fine with LDC.
Aug 25 2015
parent reply "Matthias Bentrup" <matthias.bentrup googlemail.com> writes:
On Tuesday, 25 August 2015 at 15:19:41 UTC, Márcio Martins wrote:
 If you compile it with *GDC* it works fine. If you compile a 
 port with clang, gcc or msvc, it works right as well. I suspect 
 it will also work fine with LDC.
The same program "fails" in gcc too, if you use x87 math. Usually C compilers allow excess precision for intermediate results, because the extra precision seldom hurts and changing precision on x87 is very expensive (depends on the CPU, but it is more expensive than the trigonometric functions on some models).
Aug 25 2015
parent "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 25 August 2015 at 21:21:59 UTC, Matthias Bentrup 
wrote:
 On Tuesday, 25 August 2015 at 15:19:41 UTC, Márcio Martins 
 wrote:
 If you compile it with *GDC* it works fine. If you compile a 
 port with clang, gcc or msvc, it works right as well. I 
 suspect it will also work fine with LDC.
The same program "fails" in gcc too, if you use x87 math. Usually C compilers allow excess precision for intermediate results, because the extra precision seldom hurts and changing precision on x87 is very expensive (depends on the CPU, but it is more expensive than the trigonometric functions on some models).
That's because of floating-point exceptions. They are very constraining for the hardware.
Aug 25 2015
prev sibling parent reply "Ola Fosheim Grøstad" writes:
On Tuesday, 25 August 2015 at 14:54:41 UTC, Steven Schveighoffer 
wrote:
 As bachmeier says, it's not something that's easy to get right.
Are you sure you follow IEEE 754 recommendations? Floating-point arithmetic should be reproducible according to the chosen rounding mode. https://en.wikipedia.org/wiki/IEEE_floating_point#Reproducibility «The reproducibility clause recommends that language standards should provide a means to write reproducible programs (i.e., programs that will produce the same result in all implementations of a language), and describes what needs to be done to achieve reproducible results.»
Aug 25 2015
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/25/15 11:56 AM, "Ola Fosheim Grøstad"
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On Tuesday, 25 August 2015 at 14:54:41 UTC, Steven Schveighoffer wrote:
 As bachmeier says, it's not something that's easy to get right.
Are you sure you follow IEEE 754 recommendations? Floating-point arithmetic should be reproducible according to the chosen rounding mode. https://en.wikipedia.org/wiki/IEEE_floating_point#Reproducibility «The reproducibility clause recommends that language standards should provide a means to write reproducible programs (i.e., programs that will produce the same result in all implementations of a language), and describes what needs to be done to achieve reproducible results.»
I'm not an expert on floating point, but I have written code that uses it, and I have gotten it very wrong because I didn't take into account the floating point error (worst was causing an infinite loop in a corner case).

I'll note that D does exactly what C does in the case where you are using 80-bit floating point numbers.

There is definitely an issue with the fact that storing it as a double causes a change in the behavior, and that D doesn't treat expressions that are typed as doubles, as doubles. I see an issue with this:

    double x = 1.2;
    auto y = x * 10.0; // typed as double
    writefln("%s %s", cast(ulong)y, cast(ulong)(x * 10.0)); // 12 11

IMO, these two operations should be the same. If the result of an expression is detected to be double, then it should behave like one. You can't have the calculation done in 80-bit mode, and then magically throw away the rounding to get to 64-bit mode.

I think Marcio has a point that this is both surprising and troublesome. But I think this is an anecdotal instance of a toy example. I'd expect real code to use adjustments when truncating to avoid the FP error (this obviously isn't his real code).

-Steve
Aug 25 2015
parent reply "Ola Fosheim Grøstad" writes:
On Tuesday, 25 August 2015 at 17:40:06 UTC, Steven Schveighoffer 
wrote:
 I'll note that D does exactly what C does in the case where you 
 are using 80-bit floating point numbers.
I don't think C specifies how it should be done, but some compilers have a "precise" compilation flag that is supposed to retain order and accurate intermediate rounding.
 IMO, these two operations should be the same. If the result of 
 an expression is detected to be double, then it should behave 
 like one. You can't have the calculation done in 80-bit mode, 
 and then magically throw away the rounding to get to 64-bit 
 mode.
Yes, that is rather obvious. IEEE754-2008 goes much further than that, though. It requires that all arithmetic have correct rounding. Yes, I am aware that the D specification allows higher precision, but it seems to me that this gets you neither predictable results nor maximum performance. And what is the point of being able to set the rounding mode if you don't know the bit width used? It is a practical issue in all simulations where you want reproducible results. If D is meant for scientific computing it should support correct rounding and reproducible results. If D is meant for gaming it should provide ways of expressing minimum precision or other ways of loosening the accuracy where needed. I'm not really sure which group the current semantics appeals to. I personally either want reproducible or very fast...
Aug 25 2015
parent reply "bachmeier" <no spam.com> writes:
On Tuesday, 25 August 2015 at 18:15:03 UTC, Ola Fosheim Grøstad 
wrote:

 It is a practical issue in all simulations where you want 
 reproducible results. If D is meant for scientific computing it 
 should support correct rounding and reproducible results. If D 
 is meant for gaming it should provide ways of expressing 
 minimum precision or other ways of loosening the accuracy where 
 needed.  I'm not really sure which group the current semantics 
 appeals to. I personally either want reproducible or very 
 fast...
As long as it doesn't change from one release of the compiler to the next, we have reproducibility. In many cases though, reproducibility doesn't mean exact reproducibility, at least in the old days it didn't, due to floating point issues. You generally want to allow for replication of the results using other languages, so you have to allow for some differences. I'm pretty sure Walter has stated the reason that you cannot count on exact precision, but I don't remember what it is.
Aug 25 2015
next sibling parent reply "Warwick" <warwick warwick.com> writes:
On Tuesday, 25 August 2015 at 20:00:11 UTC, bachmeier wrote:
 On Tuesday, 25 August 2015 at 18:15:03 UTC, Ola Fosheim Grøstad 
 wrote:

 I'm pretty sure Walter has stated the reason that you cannot 
 count on exact precision, but I don't remember what it is.
Probably because DMD is spewing out x87 code. The x87 FPU converts everything to its internal working bit depth before it does the math op. You can set it to work at different bit depths, but IIRC it's a fairly expensive operation to change the FPU flags. You really don't want to be doing it every time someone mixes a double and a float. The compilers that don't exhibit this problem might set the x87 to work at 64 bit at startup, or more likely they are using scalar SSE. You can't mix different depth operands in SSE. You can't multiply a float by a double, for example; you have to convert one of them so they have the same type. So in SSE the bit depth of every op is always explicit.
Aug 25 2015
parent "rumbu" <rumbu rumbu.ro> writes:
On Tuesday, 25 August 2015 at 21:30:03 UTC, Warwick wrote:

 The compilers that don't exhibit this problem might set the x87 
 to work at 64 bit at startup, or more likely they are using 
 scalar SSE. You can't mix different depth operands in SSE. You 
 can't multiply a float by a double, for example; you have to 
 convert one of them so they have the same type. So in SSE the 
 bit depth of every op is always explicit.
True word: This is msvc compiler generated code (default configuration, debug):

double x = 1.2;
012F174E  movsd  xmm0,mmword ptr ds:[12F6B30h]
012F1756  movsd  mmword ptr [x],xmm0
unsigned long long u = (unsigned long long)(x * 10);
012F175B  movsd  xmm0,mmword ptr [x]
012F1760  mulsd  xmm0,mmword ptr ds:[12F6B40h]
012F1768  call   __dtoul3 (012F102Dh)
012F176D  mov    dword ptr [u],eax
012F1770  mov    dword ptr [ebp-18h],edx
Aug 25 2015
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 08/25/2015 10:00 PM, bachmeier wrote:
 As long as it doesn't change from one release of the compiler to the
 next, we have reproducibility.
No, we don't. There are multiple platforms.
Aug 25 2015
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 08/26/2015 12:46 AM, Timon Gehr wrote:
 On 08/25/2015 10:00 PM, bachmeier wrote:
 As long as it doesn't change from one release of the compiler to the
 next, we have reproducibility.
No, we don't. There are multiple platforms.
Oh, and multiple compilers. We don't "have" reproducibility unless it's in the spec, and the opposite is in the spec.
Aug 25 2015
prev sibling parent reply "Ola Fosheim Grøstad" writes:
On Tuesday, 25 August 2015 at 20:00:11 UTC, bachmeier wrote:
 to the next, we have reproducibility. In many cases though, 
 reproducibility doesn't mean exact reproducibility, at least in 
 the old days it didn't, due to floating point issues. You 
 generally want to allow for replication of the results using 
 other languages, so you have to allow for some differences.
You don't get portable results for some builtin float functions, but otherwise I believe the 2008 edition of IEEE is exact. The latest version of ECMAScript also uses the 2008 version of IEEE.
Aug 25 2015
parent "bachmeier" <no spam.net> writes:
On Tuesday, 25 August 2015 at 23:09:07 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 25 August 2015 at 20:00:11 UTC, bachmeier wrote:
 to the next, we have reproducibility. In many cases though, 
 reproducibility doesn't mean exact reproducibility, at least 
 in the old days it didn't, due to floating point issues. You 
 generally want to allow for replication of the results using 
 other languages, so you have to allow for some differences.
You don't get portable results for some builtin float functions, but otherwise I believe the 2008 edition of IEEE is exact. The latest version of ECMAScript also uses the 2008 version of IEEE.
I haven't looked at any of this in years. It sounds like the situation is better now.
Aug 25 2015
prev sibling parent "bachmeier" <no spam.com> writes:
On Tuesday, 25 August 2015 at 11:14:35 UTC, Steven Schveighoffer 
wrote:
 On 8/24/15 5:34 PM, "Márcio Martins"
 <marcioapm gmail.com> wrote:
 On Monday, 24 August 2015 at 21:03:50 UTC, Steven 
 Schveighoffer wrote:
 I understand the inconsistency, and I agree it is an issue 
 that should
 be examined. But the issue is entirely avoidable by not using
 incorrect methods to convert from floating point to integer 
 after
 floating point operations introduce some small level of error.

 Perhaps there is some way to make it properly round in this 
 case, but
 I guarantee it will not fix all floating point errors.
What is the correct way to truncate, not round, a floating-point value to an integer?
auto result = cast(ulong)(x * 10.0 + x.epsilon); -Steve
That will work in this case (or maybe not, as Marcio's other post shows) but it's still not a general solution. You're imposing the assumption that anything sufficiently close to an integer value is that integer. Truncating a floating point number is not a well-defined exercise because you only know an interval that holds the true value.
Aug 25 2015
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Monday, 24 August 2015 at 16:52:54 UTC, Márcio Martins wrote:
 I'm posting this here for visibility. This was silently 
 corrupting our data, and might be doing the same for others as 
 well.

 import std.stdio;
 void main() {
   double x = 1.2;
   writeln(cast(ulong)(x * 10.0));
   double y = 1.2 * 10.0;
   writeln(cast(ulong)y);
 }

 Output:
 11
 12


 to!ulong instead of the cast does the right thing, and is a 
 viable work-around.

 Issue: https://issues.dlang.org/show_bug.cgi?id=14958)
http://www.smbc-comics.com/?id=2999
Aug 24 2015
prev sibling parent "Matthias Bentrup" <matthias.bentrup googlemail.com> writes:
On Monday, 24 August 2015 at 16:52:54 UTC, Márcio Martins wrote:
 I'm posting this here for visibility. This was silently 
 corrupting our data, and might be doing the same for others as 
 well.

 import std.stdio;
 void main() {
   double x = 1.2;
   writeln(cast(ulong)(x * 10.0));
   double y = 1.2 * 10.0;
   writeln(cast(ulong)y);
 }

 Output:
 11
 12
Internally the first case calculates x * 10.0 in real precision and casts it to ulong in truncating mode directly. As 1.2 is not representable, x is really 1.199999999999999956 and the result is trunc(11.99999999999999956) = 11. In the second case x * 10.0 is calculated in real precision, but first converted to double in round-to-nearest mode and then the result is truncated.
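The two paths described above can be written out explicitly as a rough sketch. The function names truncateWide and truncateViaDouble are hypothetical, and the sketch assumes a target where real is the 80-bit x87 format; where real is the same as double, both paths should agree. A compiler is also allowed to keep the intermediate double in a wider register, so the second path is not guaranteed to round.

```d
import std.stdio : writeln;

/// Path 1: multiply in `real` precision and truncate the wide result directly.
ulong truncateWide(double x)
{
    real wide = cast(real) x * 10.0L;
    return cast(ulong) wide;
}

/// Path 2: multiply in `real` precision, round the result to double
/// (round-to-nearest by default), then truncate.
ulong truncateViaDouble(double x)
{
    real wide = cast(real) x * 10.0L;
    double narrowed = cast(double) wide;
    return cast(ulong) narrowed;
}

void main()
{
    double x = 1.2;
    writeln(truncateWide(x));       // corresponds to cast(ulong)(x * 10.0)
    writeln(truncateViaDouble(x));  // corresponds to storing in `double y` first
}
```

With 80-bit reals the wide product is just below 12, and the nearest double to that product is exactly 12, which would account for the 11 and 12 reported at the top of the thread.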
Aug 25 2015