digitalmars.D.learn - float.max + 1.0 does not overflow

rumbu <rumbu rumbu.ro> writes:

Is that normal?

import std.math;
float f = float.max;
f += 1.0;
assert(ieeeFlags.overflow); // fails
assert(f == float.infinity); // fails, f is in fact float.max

By contrast, float.max + float.max does overflow. The 
behavior is the same for double and real.

Dec 27 2017

Benjamin Thaut <code benjamin-thaut.de> writes:

On Wednesday, 27 December 2017 at 13:40:28 UTC, rumbu wrote:
 Is that normal?

 import std.math;
 float f = float.max;
 f += 1.0;
 assert(ieeeFlags.overflow); // fails
 assert(f == float.infinity); // fails, f is in fact float.max

 By contrast, float.max + float.max does overflow. The 
 behavior is the same for double and real.

This is actually correct floating point behavior. Consider the 
following program:

import std.stdio;

float nextRepresentableToMax = float.max;
// decrement the bit pattern to get the next smaller representable float
(*cast(int*)&nextRepresentableToMax)--;
writefln("%f", float.max - nextRepresentableToMax);

It computes the difference between float.max and the next smaller 
representable floating point number. The difference printed by 
the program is:
20282409603651670423947251286016.0
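
The same gap can be obtained without the pointer cast. This is only a minimal sketch: nextDown comes from std.math, while gapBelowMax is a name made up for this illustration. The printed value above is exactly 2^104, which is what the assert checks.

```d
import std.math : nextDown;
import std.stdio : writefln;

// Hypothetical helper: the gap between float.max and the next smaller float.
float gapBelowMax()
{
    return float.max - nextDown(float.max);
}

void main()
{
    writefln("%f", gapBelowMax());
    // 0x1p+104 is the hex literal for 2^104, the spacing of floats near float.max.
    assert(gapBelowMax() == 0x1p+104);
}
```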

As you might notice, this is significantly bigger than 1.0. 
Floating point operations work like this: they perform the 
operation and then round the result to the nearest representable 
floating point number. So adding 1.0 to float.max and then 
rounding to the nearest representable number just gives you back 
float.max. If, however, you add float.max and float.max, the 
nearest representable result is float.infinity.
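
Both cases can be checked at run time with something like the sketch below. It assumes an x86-64 build without aggressive optimization, so that the additions really happen at run time in float precision; addAsFloat is a hypothetical helper, and ieeeFlags / resetIeeeFlags are the std.math functions for reading and clearing the hardware status flags.

```d
import std.math : ieeeFlags, resetIeeeFlags;

// Hypothetical helper: add two floats and store the sum as a float,
// so the result is rounded to float precision.
float addAsFloat(float a, float b)
{
    float sum = a + b;
    return sum;
}

void main()
{
    float big = float.max;
    float one = 1.0f;

    resetIeeeFlags();
    float r1 = addAsFloat(big, one);
    assert(r1 == float.max);         // rounds back to float.max
    assert(!ieeeFlags.overflow);     // no overflow was signalled

    resetIeeeFlags();
    float r2 = addAsFloat(big, big);
    assert(r2 == float.infinity);    // the sum is out of range
    assert(ieeeFlags.overflow);      // overflow flag is set
}
```

Note that the original snippet asserts on the type IeeeFlags; the flags themselves are read through the lowercase ieeeFlags property.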

When trying to understand how floating point works I would highly 
recommend that you read these articles (oldest first): 
https://randomascii.wordpress.com/category/floating-point/

Kind Regards
Benjamin Thaut

Dec 27 2017

Dave Jones <dave jones.com> writes:

On Wednesday, 27 December 2017 at 14:14:42 UTC, Benjamin Thaut 
wrote:
 On Wednesday, 27 December 2017 at 13:40:28 UTC, rumbu wrote:
 Is that normal?

 It computes the difference between float.max and the next 
 smaller representable floating point number. The difference 
 printed by the program is:
 20282409603651670423947251286016.0

 As you might notice, this is significantly bigger than 1.0. 
 Floating point operations work like this: they perform the 
 operation and then round the result to the nearest representable 
 floating point number. So adding 1.0 to float.max and then 
 rounding to the nearest representable number just gives you back 
 float.max. If, however, you add float.max and float.max, the 
 nearest representable result is float.infinity.

The float with the lower exponent has to be shifted to match the 
higher one, which means 1.0 would be shifted about 127 bits to 
the right (the difference between the two exponents) before the 
addition can be done. If you shift right by more bits than there 
are in the mantissa, the value gets rounded to zero. Hence, once 
the two values are lined up for the actual operation, it becomes 
float.max + 0.0.
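
The size of that shift can be read off the exponents directly. A rough sketch, assuming std.math.ilogb (which returns the unbiased binary exponent) and a made-up helper name alignmentShift:

```d
import std.math : ilogb;

// Hypothetical helper: how many bit positions 1.0 must be shifted right
// to line up with float.max, i.e. the difference of the two exponents.
int alignmentShift()
{
    return ilogb(float.max) - ilogb(1.0f);
}

void main()
{
    assert(alignmentShift() == 127);
    // A float significand has only float.mant_dig (24) bits, so every bit
    // of 1.0 ends up below the lowest bit of float.max after alignment.
    assert(alignmentShift() > float.mant_dig);
}
```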

That said, I suspect the OP was expecting the FPU to catch that, 
in theory, it should overflow: not that the actual operation 
would overflow, but that the FPU would check the values on input. 
Maybe.

Dec 28 2017
