digitalmars.D - Nice document on IEEE 754 floating point arithmetic

reply Norbert Nemec <Norbert Nemec-online.de> writes:
Hi there,

I just found a really nice document at:

 http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF

It really gives a lot of insight into the rationale behind the IEEE 754 design,
and also focuses a lot on language and compiler design.

One of the most relevant points with respect to D is probably the behavior
of comparison operators. The D spec takes a rather practical approach by
defining comparisons based on mathematical semantics. For floating-point
values, though, this is often not correct.

There probably are a few more points to consider when evaluating D for
numerical purposes. Some of the demands given in the document are probably
unrealistic for a general purpose language, but quite a number seem
perfectly reasonable to me.

Anyhow: instead of spending many words on the topic at this point, I would
rather advise anyone interested in numerics to have a look at the document,
if only to get an understanding of what the concerns might be.

Ciao,
Norbert
Jan 10 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Norbert Nemec" <Norbert Nemec-online.de> wrote in message
news:crv1i3$2bn0$1 digitaldaemon.com...
 Hi there,

 I just found a really nice document at:

  http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF

 It really gives a lot of insight on the rationale behind IEEE754 design,
 also focusing a lot on language and compiler design.

 One of the most relevant points with respect to D is probably the behavior
 of comparison operators. The D specs go a rather practical way by defining
 comparisons based on mathematical semantics. For floating points, though,
 this is often not correct.

 There probably are a few more points to consider when evaluating D for
 numerical purposes. Some of the demands given in the document are probably
 unrealistic for a general purpose language, but quite a number seem
 perfectly reasonable to me.

 Anyhow: instead of spending many words on the topic at this point, I would
 rather advise anyone interested in numerics to have a look at the document
 - be it only to get an understanding what the concerns might be about.
Thanks for the pointer. As far as I know, D (and Digital Mars C/C++) are the only languages that properly support NaN's in comparison operators. This is deliberate on my part, even at the slight cost of performance it entails. Digital Mars compilers have always leaned towards doing accurate and correct floating point as a priority over performance.
Jan 12 2005
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Walter wrote:
 As far as I know, D (and Digital Mars C/C++) are
 the only languages that properly support NaN's in comparison operators.
 This is deliberate on my part, even at the slight cost of performance it
 entails. Digital Mars compilers have always leaned towards doing accurate
 and correct floating point as a priority over performance.
Great! "Full IEEE 754 conformance" would definitely be a tremendous argument for using D in the numerics field.

Anyhow, there are a few details I wonder about:

A) The chapter "Expressions", section "Equality Expressions", states

  "If either or both operands are NaN, then both the == and != comparisons return false."

which contradicts the table in "Relational Expressions" (and the IEEE 754 standard as well).

B) Note 1 under the same table states that "For floating point comparison operators, (a !op b) is not the same as !(a op b)." It should be noted that this refers only to the question of whether they signal on NaNs; otherwise, this sentence is hard to understand.

C) Raising Invalid exceptions should be just an option, not the default. IEEE 754 states that a *flag* should be raised, which can then be checked and reset by the user later on. Raising an exception would be equivalent to what the document calls "trapping". Unless you are debugging your code, that is hardly ever what you want. The whole power of the NaN concept is that NaNs do not interrupt the calculation but are instead handled just like ordinary numbers. My output data from numerical calculations is usually a long block of floating-point numbers which may contain some NaNs. These simply tell me that I hit singularities or other special points in some cases, and I either drop or specially mark these points in the resulting plots.

D) Furthermore: even if the behavior for native floats is correct, operator overloading is still not capable of mimicking this behavior.

In any case, I think that operator overloading is not quite flexible enough in several respects. Matlab, for example, allows comparing two arrays of numbers, returning an array of bools, which can then be used in many ways. In D, this would not be possible, since the comparison operators are based on opCmp. Furthermore, ! is not overloadable at all, so even if I write an opEquals for two arrays returning an array of bools, I could never mimic the correct behavior for !=.

My suggestion would be:
* Introduce opNot (unlike && and ||, there is no compelling reason why it should not be overloadable)
* Introduce opLess, opGreater, opLessOrEqual, opGreaterOrEqual. If these are not defined, the compiler can still fall back to opCmp. By the same token, even == could fall back to opCmp if opEquals does not exist.
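
To make points A) and B) concrete, here is a small sketch of how the ordinary comparison operators behave on a NaN operand. It is written in Python rather than D, on the assumption that Python floats are IEEE 754 doubles (true on all common platforms); the helper name compare_all is made up for illustration. Python comparisons are quiet, so this shows only the truth values, not the signaling behavior that D's extra (a !op b) operators are concerned with.

```python
def compare_all(a, b):
    """Return the result of each built-in comparison operator for a and b."""
    return {
        "==": a == b,
        "!=": a != b,
        "<": a < b,
        "<=": a <= b,
        ">": a > b,
        ">=": a >= b,
    }

nan = float("nan")
results = compare_all(nan, 1.0)

# With a NaN operand, every comparison is false except !=, which is true.
for op, value in results.items():
    print(f"nan {op} 1.0 -> {value}")

# So negating "<" is not the same as ">=" once a NaN is involved.
negated_less = not (nan < 1.0)
greater_equal = nan >= 1.0
print(negated_less, greater_equal)
```

The last two values differ, which is why code that rewrites !(a < b) as (a >= b) silently changes meaning for NaN inputs.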
Jan 12 2005
next sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Norbert Nemec wrote:

 My suggestion would be:
 * Introduce opNot (unlike && and ||, there is no compelling reason why it
 should not be overloadable)
 * Introduce opLess, opGreater, opLessOrEqual, opGreaterOrEqual. If these
 are not defined, the compiler can still fall back to opCmp. In the same
 course, even == could fall back to opCmp if opEquals does not exist.
On second thought: if this were done, one would clearly have to take care of .sort and other builtins that are based on opCmp.

How is .sort currently supposed to behave on hitting a NaN in an array of floats?

One way to deal with this would be to offer both: opLess etc. for exact comparisons that follow IEEE 754 (or whatever standard applies), and opCmp, which might not be mathematically correct but guarantees a partial ordering of the list. For float-like objects, opCmp would then just do a binary comparison, which is mostly accurate and good for sorting floats, but not IEEE 754 conformant. Still, <, >, <=, etc. would do the correct thing, since they are mapped to opLess etc. instead of the over-simplistic opCmp.
Jan 13 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Norbert Nemec" <Norbert Nemec-online.de> wrote in message
news:cs5bt7$1s3s$1 digitaldaemon.com...
 How is .sort currently supposed to behave on hitting a NaN in an array of
 floats?
Putting all the NaN's at the end is one reasonable solution.
 One way to deal with this would be to offer both: opLess etc. for doing
 exact comparisons that may follow IEEE754 standards or whatever exactly,
 and opCmp that might not be mathematically correct but guarantees a partial
 ordering of the list. For float-like objects, opCmp would then just do
 binary sorting which is mostly accurate and good for sortings floats, but
 not IEEE754 conformant. Still, <, >, <=, etc. would do the correct thing
 since they are mapped to the opLess, etc. functions instead of the
 over-simplistic opCmp
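
As an illustration of the "all NaNs at the end" policy suggested in this reply, here is a minimal Python sketch (not D, and not how .sort is actually implemented). The function name sort_nans_last is hypothetical. The idea is to keep NaN out of the comparisons entirely by sorting on a (is-NaN, value) pair.

```python
import math

def sort_nans_last(values):
    """Return a sorted copy of values with all NaNs moved to the end."""
    # Key is a pair: non-NaNs (False) sort before NaNs (True). NaNs get a
    # dummy 0.0 as second component, so a NaN itself is never compared.
    return sorted(
        values,
        key=lambda x: (math.isnan(x), 0.0 if math.isnan(x) else x),
    )

data = [3.0, float("nan"), -1.0, float("inf"), float("nan"), 0.5]
ordered = sort_nans_last(data)
print(ordered)
```

Sorting the raw list with the plain < operator instead would give an order that depends on where the NaNs happen to sit, since every comparison against a NaN is false.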
Jan 13 2005
parent reply Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Walter wrote:
 "Norbert Nemec" <Norbert Nemec-online.de> wrote in message
 news:cs5bt7$1s3s$1 digitaldaemon.com...
 
 How is .sort currently supposed to behave on hitting a NaN in an array of
 floats?
 Putting all the NaN's at the end is one reasonable solution.
I noticed in that document that the spec was designed such that you could sort floats using an ordinary sort (as though they were ints). I don't remember any exception for NaN's, but maybe I missed or forgot it.
Jan 13 2005
parent Norbert Nemec <Norbert Nemec-online.de> writes:
Russ Lewis wrote:

 Walter wrote:
 "Norbert Nemec" <Norbert Nemec-online.de> wrote in message
 news:cs5bt7$1s3s$1 digitaldaemon.com...
 
 How is .sort currently supposed to behave on hitting a NaN in an array of
 floats?
 Putting all the NaN's at the end is one reasonable solution.
 I noticed in that document that the spec was designed such that you could sort floats using an ordinary sort (as though they were ints). I don't remember any exception for NaN's, but maybe I missed or forgot it.
True, if you sort floats by their binary representation, you get the correct ordering for all finite and infinite values, with the NaNs sorted to some special position. This would mean that opCmp for floats could just do a binary comparison, while the relational operators would have to be decoupled from it. This is exactly what my proposal amounts to: relational operators should be overloadable individually (using opLess, opGreater, etc.) and only fall back to opCmp if the former do not exist.
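
A rough Python sketch of such a binary comparison follows. It assumes IEEE 754 doubles; the name total_order_key is invented for this example. One detail worth spelling out: because the format is sign-magnitude, comparing the raw bit patterns as integers only works directly for non-negative values, so the key below applies the usual fix-up (flip all bits of negative values, set the top bit of the others).

```python
import math
import struct

def total_order_key(x):
    """Map a double to an unsigned 64-bit integer whose order matches the float order."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    if bits >> 63:
        # Sign bit set: flip all bits so larger magnitudes sort lower.
        return bits ^ 0xFFFFFFFFFFFFFFFF
    # Sign bit clear: set the top bit so these sort above all negatives.
    return bits | 0x8000000000000000

# Force a NaN with a clear sign bit so its position is predictable.
pos_nan = math.copysign(float("nan"), 1.0)
data = [1.5, pos_nan, -0.0, float("-inf"), 0.0, -2.0, float("inf")]
ordered = sorted(data, key=total_order_key)
print(ordered)
```

With this key, -0.0 sorts just before 0.0 and a NaN with a clear sign bit lands after +infinity. A NaN with the sign bit set would land before -infinity instead, which is the "special position" caveat: the order is total and usable for sorting, but it is not what the IEEE 754 relational operators report.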
Jan 13 2005
prev sibling parent "Walter" <newshound digitalmars.com> writes:
"Norbert Nemec" <Norbert Nemec-online.de> wrote in message
news:cs41vr$2vgl$1 digitaldaemon.com...
 Walter wrote:
 As far as I know, D (and Digital Mars C/C++) are
 the only languages that properly support NaN's in comparison operators.
 This is deliberate on my part, even at the slight cost of performance it
 entails. Digital Mars compilers have always leaned towards doing accurate
 and correct floating point as a priority over performance.
 Great! "Full IEEE 754 conformance" definitely would be a tremendous argument
 for using D in the numerics field.

 Anyhow, there is a few details I wonder about:

 A) The chapter "Expressions", section "Equality Expressions" states

  "If either or both operands are NaN, then both the == and != comparisons
  return false."

 which is in contrast to the table in "Relational Expressions" (and to the
 IEEE754 standard as well)
I believe D has this correct. Note that there are two different equality operators: one that returns false when an operand is NaN, and one that returns true when an operand is NaN. IEEE 754 suggests "=" for the former (lining up with D's "=="), and "?=" for the latter (in D it is "!<>").
 B) The note 1. under the same table states that "For floating point
 comparison operators, (a !op b) is not the same as !(a op b)." It should be
 noted that this refers only to the question whether they signal on NaNs -
 otherwise, this sentence is hard to understand.
NaNs in general are hard to get used to <g>. But it's worthwhile.
 C) Raising of Invalid Exceptions on should just be an option but not the
 default. IEEE754 states that a *flag* should be raised which can then be
 checked and reset by the user lateron. Raising an exception would be
 equivalent to what the document calls "trapping". Unless you are debugging
 your code, that is hardly ever what you want. The whole power of the
 NaN-concept is that they do not interrupt the calculation but instead are
 handled just like similar numbers. My output data from numerical
 calculations usually is a long block of floating points which may contain
 some NaNs. This simply tells me that I hit singularities or other special
 points in some cases and either drop or specially mark these points in the
 resulting plots.
True, and this is exactly what D does - set the invalid operation flag, which is sticky, and can be tested/set/cleared under programmer control.
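
The difference between a sticky flag and a trap can be sketched in Python. This is only an analogue, not D code: the Python standard library does not expose the hardware floating-point status flags, so the sketch uses the decimal module, whose contexts have the same flag/trap split. The function name run_untrapped is made up for the example.

```python
from decimal import Decimal, InvalidOperation, localcontext

def run_untrapped(pairs):
    """Divide each (a, b) pair without trapping and report the Invalid flag."""
    with localcontext() as ctx:
        ctx.traps[InvalidOperation] = False  # record the event, do not raise
        ctx.clear_flags()
        results = [Decimal(a) / Decimal(b) for a, b in pairs]
        # Sticky: the flag is still set although later divisions were valid.
        invalid_seen = bool(ctx.flags[InvalidOperation])
        ctx.clear_flags()  # the program decides when to reset it
        cleared = not ctx.flags[InvalidOperation]
    return results, invalid_seen, cleared

results, invalid_seen, cleared = run_untrapped([(1, 2), (0, 0), (3, 4)])
print(results, invalid_seen, cleared)
```

The whole batch runs to completion, the 0/0 entry comes back as NaN, and the flag can be inspected once at the end. With the trap enabled instead, the same 0/0 would raise InvalidOperation and interrupt the calculation, which is the "trapping" behavior discussed in point C).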
 D) Furthermore: even if the behavior for native floats is correct, operator
 overloading still is not capable to mimic this behavior.

 In any case, I think that operator overloading is not quite flexible enough
 in several respects. Matlab, for example, allows comparing two arrays of
 numbers, returning an array of bools, which can then be used in many ways.
 In D, this would not be possible, since the comparison operators are based
 on opCmp. Furthermore, ! is not overloadable at all, so even if I write an
 opEquals for two arrays returning an array of bools, I could never mimic
 the correct behavior for !=

 My suggestion would be:
 * Introduce opNot (unlike && and ||, there is no compelling reason why it
 should not be overloadable)
 * Introduce opLess, opGreater, opLessOrEqual, opGreaterOrEqual. If these are
 not defined, the compiler can still fall back to opCmp. In the same course,
 even == could fall back to opCmp if opEquals does not exist.
You're right that the current opCmp overloading cannot handle NaN operands. Overloading opNot won't help, either, and I firmly believe that it is a mistake to overload opNot. The problem is that opCmp returns one of 3 states, but 4 states are needed. I think the correct approach is to have an opFCmp overload that returns one of 4 states (less than, greater than, equal, unordered) which the code generator can use to support each of the extended relational operators. The operator overloading mechanism would first look for opFCmp, only if that does not exist will it look for opCmp.
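
To illustrate the four-state idea, here is a hedged sketch in Python (not D, and not an actual compiler mechanism). The names Ordering, fcmp, OPERATORS and relate are all hypothetical. Each relational operator is modeled as the set of states for which it yields true; the rows for != and !<> follow IEEE 754 and the meanings given earlier in this thread.

```python
import math
from enum import Enum

class Ordering(Enum):
    LESS = "less"
    EQUAL = "equal"
    GREATER = "greater"
    UNORDERED = "unordered"

def fcmp(a, b):
    """Hypothetical four-state comparison in the spirit of the proposed opFCmp."""
    if math.isnan(a) or math.isnan(b):
        return Ordering.UNORDERED
    if a < b:
        return Ordering.LESS
    if a > b:
        return Ordering.GREATER
    return Ordering.EQUAL

# Each operator is the set of states for which it is true.
OPERATORS = {
    "<": {Ordering.LESS},
    "<=": {Ordering.LESS, Ordering.EQUAL},
    "==": {Ordering.EQUAL},
    "!=": {Ordering.LESS, Ordering.GREATER, Ordering.UNORDERED},
    "!<>": {Ordering.EQUAL, Ordering.UNORDERED},  # unordered or equal
    "!<>=": {Ordering.UNORDERED},                 # unordered
}

def relate(op, a, b):
    """Evaluate operator op on a and b using only the four-state result."""
    return fcmp(a, b) in OPERATORS[op]

nan = float("nan")
print(relate("==", nan, nan), relate("!<>", nan, 1.0), relate("!<>=", 1.0, 2.0))
```

A three-state opCmp cannot express the UNORDERED row, which is why no choice of return value makes both < and >= come out false for a NaN operand.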
Jan 13 2005