## digitalmars.D - Questions about IEEE754 floating point in D

- Trip Volpe <mraccident gmail.com> Feb 21 2010
- Don <nospam nospam.com> Feb 22 2010
- bearophile <bearophileHUGS lycos.com> Feb 22 2010
- Don <nospam nospam.com> Feb 23 2010
- "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> Feb 23 2010
- Walter Bright <newshound1 digitalmars.com> Feb 23 2010
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Feb 23 2010
- Don <nospam nospam.com> Feb 23 2010

I'm currently writing a compiler for my own language in D, and one of the things I'm implementing at the moment is the processing of floating-point literals. My primary reference is William Clinger's "How to read floating point numbers accurately," which is available here: ftp://ftp.ccs.neu.edu/pub/people/will/howtoread.pdf

Clinger describes a method for guaranteeing the selection of the binary floating-point number most closely approximating the number input in decimal (or any other base).

So far so good, but along the way I've had occasion to consider what D itself is doing, and I have a couple of questions:

1. Does D guarantee the closest approximation for decimal floating-point literals? I ask mainly because for unit testing it would be convenient to be able to write

       expect( 0.001 == nearestDouble( 1, -3, 10 ) );

   as opposed to manually checking the mantissa and exponent. :-)

2. Minimum exponent. In D, double.min_exp is equal to -1021. However, the Wikipedia article on IEEE754-2008 and appendix D in Sun's Numerical Computation Guide ("What Every Computer Scientist Should Know About Floating-Point Arithmetic", http://docs.sun.com/source/806-3568/ncg_goldberg.html) list Emin for the IEEE754 double format as -1022. Is this an error?

As expected under the standard, D has no trouble producing a normalized double with exponent less than -1021:

    DoubleRep dr;
    dr.value = 0x1p-1022;
    writefln("f = %d, e = %d", dr.fraction, dr.exponent);

This prints "f = 0, e = 1", which corresponds to a mantissa of 1.0 and an exponent of -1022, as expected. If you try 0x1p-1023, you get a denormal, also as expected, with an exponent field of 0. Subtract DoubleRep.bias and you get -1023, which according to the standard must be Emin - 1.

So why isn't double.min_exp equal to -1022?
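For anyone who wants to repeat the experiment from question 2, here is a minimal self-contained sketch. It is not the poster's code: instead of DoubleRep it pulls the fields out of the raw bits by hand, and the helper names `exponentField` and `fractionField` are made up for this illustration. It also uses `double.min_normal` (2^-1022) and half of it (2^-1023) in place of the hex literals.

```d
import std.stdio : writefln;

/// Raw 11-bit biased exponent field of an IEEE 754 binary64 value.
/// (Hypothetical helper, named for this sketch only.)
int exponentField(double x)
{
    immutable ulong bits = *cast(ulong*) &x;
    return cast(int)((bits >> 52) & 0x7FF);
}

/// Raw 52-bit fraction field of an IEEE 754 binary64 value.
/// (Hypothetical helper, named for this sketch only.)
ulong fractionField(double x)
{
    immutable ulong bits = *cast(ulong*) &x;
    return bits & ((1UL << 52) - 1);
}

void main()
{
    enum int bias = 1023; // binary64 exponent bias (DoubleRep.bias in the post)

    // double.min_normal is 2^-1022, the smallest normalized double;
    // halving it gives the subnormal 2^-1023.
    foreach (x; [double.min_normal, double.min_normal / 2])
        writefln("f = %d, e = %d, e - bias = %d",
                 fractionField(x), exponentField(x), exponentField(x) - bias);
}
```

The first value has exponent field 1 and fraction 0; the second has exponent field 0 and a nonzero fraction, which is the denormal case described in the post.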

Feb 21 2010

Trip Volpe wrote:

> 1. Does D guarantee the closest approximation for decimal floating-point literals?

Not at present.

> 2. Minimum exponent. In D, double.min_exp is equal to -1021. However, the Wikipedia article on IEEE754-2008 and appendix D in Sun's Numerical Computation Guide ("What Every Computer Scientist Should Know About Floating-Point Arithmetic", http://docs.sun.com/source/806-3568/ncg_goldberg.html) list Emin for the IEEE754 double format as -1022. Is this an error?

There are no errors anywhere. However, double.min_exp is defined in the spec as "the minimum value such that 2^^(min_exp-1) is representable as a normalized value". This means that 2^^(min_exp-1) == 2^^Emin. And indeed min_exp-1 == -1022 == Emin.

I have no idea why min_exp is defined in such a peculiar way. In particular, I don't know why it's different from the definition of min_10_exp. It seems bizarre and useless.

I've had a sudden thought though -- DMC/DMD used to have an out-by-1 bug in the %a format for denormals. Maybe this behaviour isn't intentional, but was rather a mistake, caused by that?

Note to Walter: I changed min --> min_normal in the ddoc for my 'floatingpoint' article long ago, but it hasn't been copied into the download.
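The relationship above can be checked directly. The following is only a sketch: the constant `Emin` is a name introduced here for the IEEE 754 value, and `ldexp` from std.math is used to build the powers of two. It also shows the contrast with min_10_exp, which carries no offset: 10^^min_10_exp is itself a normalized value.

```d
import std.math : ldexp;
import std.stdio : writeln;

void main()
{
    enum int Emin = -1022; // IEEE 754 binary64 minimum normal exponent (name assumed here)

    static assert(double.min_exp == -1021);
    static assert(double.min_exp - 1 == Emin);

    // 2^^(min_exp - 1) is exactly the smallest normalized double ...
    assert(ldexp(1.0, double.min_exp - 1) == double.min_normal);

    // ... and one power of two lower is nonzero but no longer normalized.
    immutable below = ldexp(1.0, double.min_exp - 2);
    assert(below > 0 && below < double.min_normal);

    // min_10_exp is not offset by one: 10^^min_10_exp is still normalized,
    // while one decade lower falls below min_normal.
    static assert(double.min_10_exp == -307);
    assert(1e-307 >= double.min_normal);
    assert(1e-307 / 10 < double.min_normal);

    writeln("all checks passed");
}
```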

Feb 22 2010

Don:

> I have no idea why min_exp is defined in such a peculiar way. In particular, I don't know why it's different from the definition of min_10_exp. It seems bizarre and useless.

If you spot problems in such things it's MUCH better if you try to discuss and fix them now than never :-)

Bye and thank you,
bearophile

Feb 22 2010

Don wrote:

> There are no errors anywhere. However, double.min_exp is defined in the spec as "the minimum value such that 2^^(min_exp-1) is representable as a normalized value". This means that 2^^(min_exp-1) == 2^^Emin. And indeed min_exp-1 == -1022 == Emin. I have no idea why min_exp is defined in such a peculiar way. In particular, I don't know why it's different from the definition of min_10_exp. It seems bizarre and useless.

For C++, DBL_MIN_EXP = -1021 (see http://www.qnx.com/developers/docs/6.4.1/dinkum_en/ecpp/float.html). It is defined as the minimum integer such that FLT_RADIX^^(DBL_MIN_EXP - 1) is a normalized, finite representable value of type double.

I have a vague recollection that this bizarre definition is for compatibility with an ancient mistake in C. Some clown miscalculated it, and by the time people realized, they felt it was too late to fix it.
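One observation that nobody in the thread makes, offered here only as a possible reading: `frexp` normalizes the mantissa to the range [0.5, 1) rather than [1, 2), and under that convention the exponent of the smallest normalized double comes out as exactly DBL_MIN_EXP. The sketch below checks this with `std.math.frexp` and the `DBL_MIN_EXP` constant from core.stdc.float_. It does not settle whether the definition started out as a mistake.

```d
import core.stdc.float_ : DBL_MIN_EXP;
import std.math : frexp;
import std.stdio : writefln;

void main()
{
    // D's C bindings expose the same value as double.min_exp.
    static assert(DBL_MIN_EXP == double.min_exp);
    static assert(DBL_MIN_EXP == -1021);

    // frexp splits x into m * 2^^e with 0.5 <= |m| < 1.
    int e;
    immutable m = frexp(double.min_normal, e);
    assert(m == 0.5);
    assert(e == DBL_MIN_EXP); // -1021, one more than the IEEE Emin of -1022

    writefln("min_normal = %s * 2^^%s", m, e);
}
```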

Feb 23 2010

Don wrote:

> For C++, DBL_MIN_EXP = -1021 (see http://www.qnx.com/developers/docs/6.4.1/dinkum_en/ecpp/float.html). It is defined as the minimum integer such that FLT_RADIX^^(DBL_MIN_EXP - 1) is a normalized, finite representable value of type double.
>
> I have a vague recollection that this bizarre definition is for compatibility with an ancient mistake in C. Some clown miscalculated it, and by the time people realized, they felt it was too late to fix it.

Then it sounds like something D should get right.

-Lars

Feb 23 2010

Lars T. Kyllingstad wrote:

> Don wrote:
>
> > I have a vague recollection that this bizarre definition is for compatibility with an ancient mistake in C. Some clown miscalculated it, and by the time people realized, they felt it was too late to fix it.
>
> Then it sounds like something D should get right.

There's a problem with that - porting working C numerics code to D.

Feb 23 2010

Walter Bright wrote:

> Lars T. Kyllingstad wrote:
>
> > Don wrote:
> >
> > > I have a vague recollection that this bizarre definition is for compatibility with an ancient mistake in C. Some clown miscalculated it, and by the time people realized, they felt it was too late to fix it.
> >
> > Then it sounds like something D should get right.
>
> There's a problem with that - porting working C numerics code to D.

Let's do what we know works - define the right thing with a different name, and deprecate the existing name.

Andrei

Feb 23 2010

Walter Bright wrote:

> Lars T. Kyllingstad wrote:
>
> > Don wrote:
> >
> > > I have a vague recollection that this bizarre definition is for compatibility with an ancient mistake in C. Some clown miscalculated it, and by the time people realized, they felt it was too late to fix it.
> >
> > Then it sounds like something D should get right.
>
> There's a problem with that - porting working C numerics code to D.

That's not a concern. The syntax is completely different, so it always requires thought. We could perhaps define float.min_2_exp with the correct value (by analogy with float.min_10_exp) and get rid of .min_exp.

If porting C code mechanically, you'll just import core.stdc.float_. It contains the line:

    enum DBL_MIN_EXP = double.min_exp;

which could be changed to:

    enum DBL_MIN_EXP = double.min_2_exp + 1;

It's not a big deal, but it's certainly something we could fix.
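To make the proposal concrete, here is a rough sketch of how it would behave. `.min_2_exp` does not exist as a built-in property, so the template `min2Exp` below is a hypothetical stand-in defined in user code; the static asserts only confirm that the C constants would keep their current values under the suggested redefinition.

```d
import core.stdc.float_ : DBL_MIN_EXP, FLT_MIN_EXP;

/// Hypothetical stand-in for the proposed `.min_2_exp` property:
/// the smallest e such that 2^^e is a normalized value of type T.
enum int min2Exp(T) = T.min_exp - 1;

// The proposed property would match the IEEE 754 Emin values ...
static assert(min2Exp!double == -1022);
static assert(min2Exp!float == -126);

// ... and the C constants could be derived from it without changing value.
static assert(DBL_MIN_EXP == min2Exp!double + 1);
static assert(FLT_MIN_EXP == min2Exp!float + 1);

void main()
{
    import std.stdio : writeln;

    writeln("double: min_exp = ", double.min_exp,
            ", min2Exp = ", min2Exp!double);
}
```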

Feb 23 2010