## digitalmars.D - Submission: Floating point near-equality

- Don Clugston <Don_member pathlink.com> Aug 14 2005
- BCS <BCS_member pathlink.com> Aug 15 2005
- Dave <Dave_member pathlink.com> Aug 15 2005
- "Ben Hinkle" <bhinkle mathworks.com> Aug 15 2005
- "Ben Hinkle" <bhinkle mathworks.com> Aug 15 2005
- Don Clugston <Don_member pathlink.com> Aug 15 2005
- "Ben Hinkle" <ben.hinkle gmail.com> Aug 15 2005
- Don Clugston <dac nospam.com.au> Aug 15 2005
- "Ben Hinkle" <ben.hinkle gmail.com> Aug 16 2005
- Don Clugston <dac nospam.com.au> Aug 16 2005
- Don Clugston <dac nospam.com.au> Aug 16 2005
- "Ben Hinkle" <ben.hinkle gmail.com> Aug 17 2005
- Shammah Chancellor <Shammah_member pathlink.com> Aug 17 2005
- Don Clugston <dac nospam.com.au> Aug 17 2005
- "Ben Hinkle" <ben.hinkle gmail.com> Aug 17 2005
- "Walter" <newshound digitalmars.com> Aug 15 2005
- Don Clugston <Don_member pathlink.com> Aug 15 2005
- "Ben Hinkle" <ben.hinkle gmail.com> Aug 15 2005
- Don Clugston <Don_member pathlink.com> Aug 15 2005

Submission: probably for std.math int precisionEquality(real x, real y) "How equal" are two reals x and y? Rationale: The use of == to test floating point numbers for equality is somewhat dangerous, because == is completely intolerant of round-off error. For example, almost every unit test involving floating point numbers has an assertion of floating-point equality; but operator == is usually far too strict. Instead, some form of "close enough" test must be used. If not provided by the language, programmers must write this test themselves; but unfortunately, it's easy to write this test incorrectly. In std.math and std.math2 there are two floating point closeness tests: feq() and mfeq(). Ultimately these are: return fabs(x - y) <= precision; where precision is (say) 0.000001 This is incorrect: for very small numbers, this is a very weak assertion eg mfeq(1e-30, 7.8e-300) returns true! but for large numbers, eg x=1e60, it is an *extremely* strong assertion. The "correct" way to do it, according to Knuth, is to combine a ratio test with a difference test. But, this is a bit slow, and it still has the arbitrary magic precision number ("0.000001" in the code above). An interesting idea by Bruce Dawson (http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm) is to calculate the number of representable IEEE reals which lie between x and y. Unfortunately.. * the answer depends on whether the numbers are float, double, or real. * the resulting value gets _very_ large if x and y are far apart... * ... and with an IEEE extended float, it won't even fit in a ulong! * it's not very intuitive. IMHO, what you actually want is a function which tells you: "to how many significant figures (bits) is x==y?" The function below works on 80-bit reals. It is very fast (with possibilities for further optimisation), and it provides sensible results for any input, including subnormals, infinities, and NaNs. It could easily be converted for reals with different precision, (you'd just need to know where the exponent lies). For example, this function ensures that there are at most 4 bits of round-off error: int feq(real x, real y) { return precisionEquality(x, y, real.mant_dig-4); } As well as providing a 'closeness' test, the function allows you to evaluate the accuracy of a calculation. More on this in a future post. writefln("It is accurate to %d bits.", precisionEquality(pifinder(), PI)); I believe that some equivalent to this function belongs in std.math. It would be used in almost every unit test involving floating point numbers! Some design choices which are open to debate: * the name. precisionEquality() is the best I've come up with so far. * behaviour with NaNs. Like ==, the version below states that two NaNs are completely unequal. But it would also be reasonable to state that they are equal to 64 bits! * My expansion of the notion of "significant figures" to negative values is unconventional. This is my first D contribution, my style may be inadequate. I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK, no language has ever got this right. -Don. ================================================ import std.math2; // for abs import std.math; // for isNan /* int precisionEquality(real x, real y) "How equal" are two reals x and y? Returns the number of mantissa bits which are equal in x and y; this is similar to the concept of "significant figures". For example, 0x1.F8p+60 and 0x1.F1p+60 are equal to 5 bits of precision. If x==y, then precisionEquality(a,b) == real.mant_dig. If x and y differ by a factor of two or more, then they have no bits in common, so this function returns a value which is <=0. Returns 0 if they differ by a factor of >=2 but < 4, returns 1 for factors >=4 but <8, etc. If the numbers have opposite sign, the difference in orders of magnitude is based on the IEEE binary encoding. For example, -1 and +1 differ by half the number line. Pretend the numbers were fixed point. In the IEEE extended format, there are about 32768 digits in front of the decimal point, and the same number behind. But only the first 64 bits after the first '1' bit are recorded. If the exponents are different, then the return value is negative, and gives the number of (binary) orders of magnitude of difference between a and b. */ int precisionEquality(real a, real b) { real diff = a-b; ushort *pa = cast(ushort *)(&a); ushort *pb = cast(ushort *)(&b); ushort *pd = cast(ushort *)(&diff); // The difference in exponent between 'a' and 'a-b' // is equal to the number of mantissa bits of a which are // equal to b. If the difference is negative, then a and b // have different exponents. If it is positive, then a and b // are equal to that number of decimal places. // AND with 0x7FFF to form the absolute value. // Subtract 1 so that we round downwards. int bitsdiff = ( ((pa[4]&0x7FFF) + (pb[4]&0x7FFF)-1)>>1) - (pd[4]&0x7FFF); if ((pd[4]&0x7FFF) == 0) { // Difference is zero or denormal if (a==b) return real.mant_dig; // 'diff' is denormal, so the number of equal bits is higher; // we need to add the number of zeros that lie at the start of // its mantissa. We do this by multiplying by 2^real.mant_dig diff*=0x1p+63; return bitsdiff + real.mant_dig-(pd[4]&0x7FFF); } if (bitsdiff>0) return bitsdiff+1; // Check for NAN or Infinity. if (pd[4]&0x7FFF==0x7FFF) { // Result is NAN or INF if (a==b) { // both were +INF or both were -INF. return real.mant_dig; } if (isnan(diff)) { return -65535; // at least one was a NAN. } // fall through for comparisons of INF with large reals. } // Return the negative of the absolute value of // the difference in exponents. if ((pa[4]^pb[4])&0x8000) { // do they have opposite signs? // Convert the 'offset' exponent format to twos-complement, // then do a normal subtraction. return 1-abs(pa[4]-(0x8000-pb[4])); } return 1-abs(pa[4]-pb[4]); } unittest { // Exact equality assert(precisionEquality(real.max,real.max)==real.mant_dig); assert(precisionEquality(0,0)==real.mant_dig); assert(precisionEquality(7.1824,7.1824)==real.mant_dig); assert(precisionEquality(real.infinity,real.infinity)==real.mant_dig); // one bit off exact equality assert(precisionEquality(1+real.epsilon,1)==real.mant_dig-1); assert(precisionEquality(1,1+real.epsilon)==real.mant_dig-1); // one bit below. Must round off correctly when exponents are different. assert(precisionEquality(1,1-real.epsilon)==real.mant_dig-1); assert(precisionEquality(1-real.epsilon,1)==real.mant_dig-1); assert(precisionEquality(-1-real.epsilon,-1)==real.mant_dig-1); assert(precisionEquality(-1+real.epsilon,-1)==real.mant_dig-1); // Numbers that are close assert(precisionEquality(0x1.Bp+5, 0x1.B8p+5)==5); assert(precisionEquality(0x1.8p+10, 0x1.Cp+10)==2); assert(precisionEquality(1, 1.1249)==4); assert(precisionEquality(1.5, 1)==1); // Extreme inequality assert(precisionEquality(real.nan,0)==-65535); assert(precisionEquality(0,-real.nan)==-65535); assert(precisionEquality(real.nan,real.infinity)==-65535); assert(precisionEquality(real.infinity,-real.infinity)==-65533); assert(precisionEquality(-real.max,real.infinity)==-65532); assert(precisionEquality(-real.max,real.max)==-65531); // Numbers that are half the number line apart // Note that (real.max_exp-real.min_exp) = 32765 assert(precisionEquality(-real.max,0)==-32765); assert(precisionEquality(-1,1)==-32765); assert(precisionEquality(real.min,1)==-16381); // Numbers that differ by one order of magnitude assert(precisionEquality(real.max,real.infinity)==0); assert(precisionEquality(1, 2)==0); assert(precisionEquality(2, 1)==0); assert(precisionEquality(1.5*(1-real.epsilon), 1)==2); assert(precisionEquality(4*(1-real.epsilon), 1)==0); assert(precisionEquality(4, 1)==-1); }

Aug 14 2005

#$% you!!! ;-) I was going to post on this EXACT subject today!! :) I was thinking of making it an operator, maybe "~~".

Aug 15 2005

In article <ddp3eh$b3$1 digitaldaemon.com>, Don Clugston says...

I'm no mathematician, but your proposal seems very reasonable and would be nice to incorporate into an operator of some kind too.

Aug 15 2005

Comments inline "Don Clugston" <Don_member pathlink.com> wrote in message news:ddp3eh$b3$1 digitaldaemon.com...Submission: probably for std.math int precisionEquality(real x, real y) "How equal" are two reals x and y? Rationale: The use of == to test floating point numbers for equality is somewhat dangerous, because == is completely intolerant of round-off error. For example, almost every unit test involving floating point numbers has an assertion of floating-point equality; but operator == is usually far too strict. Instead, some form of "close enough" test must be used. If not provided by the language, programmers must write this test themselves; but unfortunately, it's easy to write this test incorrectly.

Agreed - though the notion of 'close enough' depends on the situation. I think there was an earlier thread about this kind of thing, too. There's a really short thread in the archives with Walter's opinion: http://www.digitalmars.com/d/archives/digitalmars/D/6397.htmlIn std.math and std.math2 there are two floating point closeness tests: feq() and mfeq(). Ultimately these are: return fabs(x - y) <= precision; where precision is (say) 0.000001 This is incorrect: for very small numbers, this is a very weak assertion eg mfeq(1e-30, 7.8e-300) returns true! but for large numbers, eg x=1e60, it is an *extremely* strong assertion. The "correct" way to do it, according to Knuth, is to combine a ratio test with a difference test. But, this is a bit slow, and it still has the arbitrary magic precision number ("0.000001" in the code above).

It'll be hard to avoid magic numbers. Even a statement of "4 bits of precision" is a magic number - just in binary. I actually wouldn't mind being up-front about the fact that there are two notions of equality absolute and relative and have the test take two parameters (the relative and absute tolerances) and go with those. I'm personally not worried about the performance of a division or multiplication when doing a floating pt test - if that's the performance hit you're worried about.An interesting idea by Bruce Dawson (http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm) is to calculate the number of representable IEEE reals which lie between x and y. Unfortunately.. * the answer depends on whether the numbers are float, double, or real. * the resulting value gets _very_ large if x and y are far apart... * ... and with an IEEE extended float, it won't even fit in a ulong! * it's not very intuitive. IMHO, what you actually want is a function which tells you: "to how many significant figures (bits) is x==y?"

With special treatment for 0? or around 0?The function below works on 80-bit reals. It is very fast (with possibilities for further optimisation), and it provides sensible results for any input, including subnormals, infinities, and NaNs. It could easily be converted for reals with different precision, (you'd just need to know where the exponent lies).

It's actually very useful to have equality tests that know about float and double. That way the promotion to real doesn't introduce extra "precision" in a number that it never had to start with and it's easier to just type feq(x,y) when x and y are doubles instead of something like feq(x,y,double.mant_dig-4)For example, this function ensures that there are at most 4 bits of round-off error: int feq(real x, real y) { return precisionEquality(x, y, real.mant_dig-4); }

The precisionEquality function below doesn't take 3 inputs. Did you mean something like return precisionEquality(x,y) >= real.mant_dig-4 Also glancing over the code does it assume little-endian architecture? I see the ushort* used to grab different parts of the real.As well as providing a 'closeness' test, the function allows you to evaluate the accuracy of a calculation. More on this in a future post. writefln("It is accurate to %d bits.", precisionEquality(pifinder(), PI)); I believe that some equivalent to this function belongs in std.math. It would be used in almost every unit test involving floating point numbers! Some design choices which are open to debate: * the name. precisionEquality() is the best I've come up with so far.

yeah - that name is pretty long. 'feq' would be the best IMHO.* behaviour with NaNs. Like ==, the version below states that two NaNs are completely unequal. But it would also be reasonable to state that they are equal to 64 bits!

NaN's should always be unequal. A function 'fis' could be defined that returned true if both are NaN.* My expansion of the notion of "significant figures" to negative values is unconventional.

I'm not sure what you are referring to.This is my first D contribution, my style may be inadequate. I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK, no language has ever got this right. -Don.

It would be nice to combine that with an abstol parameter near 0. That way a test like feq(0,double.epsilon) will be true. I could see something like feq checking if one of the two values is exactly 0 and doing absolute checking there - or just doing a straightforward abs/rel check.

Aug 15 2005

In std.math and std.math2 there are two floating point closeness tests: feq() and mfeq(). Ultimately these are: return fabs(x - y) <= precision; where precision is (say) 0.000001 This is incorrect: for very small numbers, this is a very weak assertion eg mfeq(1e-30, 7.8e-300) returns true! but for large numbers, eg x=1e60, it is an *extremely* strong assertion. The "correct" way to do it, according to Knuth, is to combine a ratio test with a difference test. But, this is a bit slow, and it still has the arbitrary magic precision number ("0.000001" in the code above).

It'll be hard to avoid magic numbers. Even a statement of "4 bits of precision" is a magic number - just in binary. I actually wouldn't mind being up-front about the fact that there are two notions of equality absolute and relative and have the test take two parameters (the relative and absute tolerances) and go with those. I'm personally not worried about the performance of a division or multiplication when doing a floating pt test - if that's the performance hit you're worried about.

To be more concrete here's what I was thinking of: const float TinyFloat = 1e-4f; const double TinyDouble = 1e-8; const real TinyReal = 1e-10L; // depends on real.epsilon int feq(real x, real y, real reltol = TinyReal, real abstol = TinyReal) { real diff = fabs(x-y); if (diff <= abstol) return 1; real fx = fabs(x); real fy = fabs(y); real scale = fx>fy ? fx : fy; return diff/scale <= reltol; } int feq(double x, double y, double reltol = TinyDouble, double abstol = TinyDouble) { return feq(cast(real)x,cast(real)y,cast(real)reltol,cast(real)abstol); } int feq(float x, float y, float reltol = TinyFloat, float abstol = TinyFloat) { return feq(cast(real)x,cast(real)y,cast(real)reltol,cast(real)abstol); } That way if you know you'll be dealing with very small numbers (less than TinyReal, for example) then one can pass 0 for the abstol (and let the optimzer remove any dead code) and only use the relative equality tests. If one wants a pure absolute test (suppose one knows the numbers will always be near 1) then pass 0 for reltol and let the optimzer remove that code. For "normal usage", though, numbers that are "close" either in absolute or relative terms will test as equal.

Aug 15 2005

From your first post:Also glancing over the code does it assume little-endian architecture? I see the ushort* used to grab different parts of the real.

Yes, it's just like the isfinite(), isnan() functions in std.math. It's also only valid for 80-bit reals. I'm just happy that I managed to do it without reverting to asm!It'll be hard to avoid magic numbers. Even a statement of "4 bits of precision" is a magic number - just in binary. I actually wouldn't mind being up-front about the fact that there are two notions of equality absolute and relative and have the test take two parameters (the relative and absute tolerances) and go with those.

The point is that in IEEE arithmetic, the relative and absolute tolerances are related via the number of bits of precision. Near zero, "bits of precision" looks like an absolute tolerance. Away from zero, it looks like relative tolerance. And there's a region in between where it's really difficult to think in terms of absolute or relative tolerance, they kind of blur together. Note that there are many uses beyond simple "equal-enough" tests. This function allows you to say: At what value of x are f(x) and g(x) most different? eg when is exp(ln(x)) most different to x? You can just try a range of different x values, and find the one where precisionEquality(exp(ln(x)), x) is smallest. You don't need to worry about whether you should be using absolute or relative tolerance, that's already taken care of. So, if writing a function, you can find where the precision is least adequate. Once you've done the best you can, you can put that into a unit test. This gives you a simple way of reporting the accuracy of library functions and identities, and for performing more powerful "second-order" unit tests. (eg, acos(cos(x)) must be close to x for all -PI/2 <= x <= PI/2). Then, if you know that cos(x) is accurate, you've proved that acos(x) is also accurate. My goal is to provide one-line unit tests for most of the math functions. You're right: for equality tests, it probably is necessary to have seperate implementations for double and float, because subnormals become normal when moved to higher precision. It's only a problem if your intermediate calculations were done in double or float precision, instead of reals. What is true, though, is that if precisionEquality(real x, real y) >= double.mant.dig then cast(double)x == cast(double)y. The converse does not apply because of the aforementioned subnormals. (But in this case you kind of "got lucky"). I did the real case first because it is the most difficult. It's trivial to extend it to other precisions. -Don.

Aug 15 2005

It'll be hard to avoid magic numbers. Even a statement of "4 bits of precision" is a magic number - just in binary. I actually wouldn't mind being up-front about the fact that there are two notions of equality absolute and relative and have the test take two parameters (the relative and absute tolerances) and go with those.

The point is that in IEEE arithmetic, the relative and absolute tolerances are related via the number of bits of precision. Near zero, "bits of precision" looks like an absolute tolerance. Away from zero, it looks like relative tolerance.

Do you mean "near one"? Near zero life gets very interesting. If I understand your algorithm correctly the number 1e-200 is still very far away from 0 or even 1e-201 or -1e-200. Then again I might not understand the algorithm.And there's a region in between where it's really difficult to think in terms of absolute or relative tolerance, they kind of blur together.

Agreed (I think). Over number that aren't "too small or too large" the absolute and relative measure pretty much the same thing. The distinction only gets important at the extremes.Note that there are many uses beyond simple "equal-enough" tests. This function allows you to say: At what value of x are f(x) and g(x) most different? eg when is exp(ln(x)) most different to x? You can just try a range of different x values, and find the one where precisionEquality(exp(ln(x)), x) is smallest. You don't need to worry about whether you should be using absolute or relative tolerance, that's already taken care of. So, if writing a function, you can find where the precision is least adequate. Once you've done the best you can, you can put that into a unit test. This gives you a simple way of reporting the accuracy of library functions and identities, and for performing more powerful "second-order" unit tests. (eg, acos(cos(x)) must be close to x for all -PI/2 <= x <= PI/2). Then, if you know that cos(x) is accurate, you've proved that acos(x) is also accurate. My goal is to provide one-line unit tests for most of the math functions.

In another post I pointed out that sin(PI) (which I just actually tried and got something around -1e-20) tests and having no digits of precision in common with 0. So I think in general an absolute component to the notion of equality will be important if you want trig functions to behave "normally".You're right: for equality tests, it probably is necessary to have seperate implementations for double and float, because subnormals become normal when moved to higher precision. It's only a problem if your intermediate calculations were done in double or float precision, instead of reals. What is true, though, is that if precisionEquality(real x, real y) >= double.mant.dig then cast(double)x == cast(double)y. The converse does not apply because of the aforementioned subnormals. (But in this case you kind of "got lucky").

Yup. I have no problem with doing things with real - I think that's the way to go. I'm just suggesting adding some overloads to make the common case of double and float easier to use.I did the real case first because it is the most difficult. It's trivial to extend it to other precisions. -Don.

Aug 15 2005

Ben Hinkle wrote:The point is that in IEEE arithmetic, the relative and absolute tolerances are related via the number of bits of precision. Near zero, "bits of precision" looks like an absolute tolerance. Away from zero, it looks like relative tolerance.

Do you mean "near one"? Near zero life gets very interesting. If I understand your algorithm correctly the number 1e-200 is still very far away from 0 or even 1e-201 or -1e-200. Then again I might not understand the algorithm.

I did mean "near zero". You just need to go much smaller with 80-bit reals! real.min = 3.3e-4932 !! And the smallest representable number is real.min * real.epsilon = 3.3 e-4951. It's just amazing how much range you have with long doubles. So 1e-200 is a very poor estimation to 0; there are *billions* of better IEEE reals! My algorithm only gives 4e-4932 as close to 0 (and it's also close to -4e-4932). It measures closeness in "representability space". You could reasonably argue that 1e-4900 is close to 1e-4930. But in representability space they are no closer than 1 vs 1e30. The question then is, is representability space what we want? (I suspect the answer is: sometimes, but not always. Maybe about half the time?)In another post I pointed out that sin(PI) (which I just actually tried and got something around -1e-20) tests and having no digits of precision in common with 0. So I think in general an absolute component to the notion of equality will be important if you want trig functions to behave "normally".

Ah. I'm learning a lot here. 1e-20 is normally an exceedingly poor approximation to 0. But, since sin(x) maps "linearly" to the interval (-1, 1), 1e-20 is close to 0 as sin(x) will get. Or alternatively, sin(PI) is small but not close to 0, because PI is not close enough to the mathematical pi. sin(n*PI) ==0 is not an identity which is preserved! But sin(x)*sin(x) + cos(x)*cos(x) == 1 seems to be preserved for all x. Yet there are clearly many contexts when you want to treat sin(PI) as zero. The relevant quantity seems to be the difference divided by the range. For trig functions (sin + cos, at least), it seems you only want an absolute difference, regardless of how close to 0 you are. A few weeks with D, and I've already learned more about numerical analysis than years with C++ :-) -Don.

Aug 15 2005

"Don Clugston" <dac nospam.com.au> wrote in message news:ddrvuo$2e0g$1 digitaldaemon.com...Ben Hinkle wrote:The point is that in IEEE arithmetic, the relative and absolute tolerances are related via the number of bits of precision. Near zero, "bits of precision" looks like an absolute tolerance. Away from zero, it looks like relative tolerance.

Do you mean "near one"? Near zero life gets very interesting. If I understand your algorithm correctly the number 1e-200 is still very far away from 0 or even 1e-201 or -1e-200. Then again I might not understand the algorithm.

I did mean "near zero". You just need to go much smaller with 80-bit reals! real.min = 3.3e-4932 !! And the smallest representable number is real.min * real.epsilon = 3.3 e-4951. It's just amazing how much range you have with long doubles. So 1e-200 is a very poor estimation to 0; there are *billions* of better IEEE reals! My algorithm only gives 4e-4932 as close to 0 (and it's also close to -4e-4932). It measures closeness in "representability space".

Ah ok. I didn't think you meant the underflow numbers as the the only things near 0. To me that's waaaay too near (typically). I'm talking just about normalized (ie "proper") numbers. In regular everyday use the concept of how close to zero is "close" is context dependent. When you have a simple number like x = 1+eps (which is very close to 1 in anyone's book) and then you ask "is x-1 near zero?" the answer depends on if you consider eps to be near zero. By your measurement just about all the time nothing will be near zero except zero (in particular eps will be extremely far away). Or to put it another way a given application might consider 3 and 3.001 to be close and 2 and 2.001 close and 1 and 1.001 close but as soon as they try to say 0 and .001 are close your test would say they are infinitely far apart. In my experience a rough measurement of near zero is abs(x)<sqrt(eps). That's why people need to be able to control both the relative tolerance (ie - bits of precision in agreement) and the absolute tolerance (ie - distance to zero).You could reasonably argue that 1e-4900 is close to 1e-4930. But in representability space they are no closer than 1 vs 1e30. The question then is, is representability space what we want? (I suspect the answer is: sometimes, but not always. Maybe about half the time?)

You mean in your measurement they are not close. In typical numerical apps that manipulate numbers near 1 a value of 1e-4900 is definetly near zero. It all depends on the scale of the problem being solved. The numerical applications I can think of in fact force the user to choose what they mean when they say "are these roughly equal". There isn't just one notion of equal or approximate so maybe the thing to do is make feqabs and feqrel. I'd say your precision test would make a great feqrel.

Aug 16 2005

Ben Hinkle wrote:you consider eps to be near zero. By your measurement just about all the time nothing will be near zero except zero (in particular eps will be extremely far away). Or to put it another way a given application might consider 3 and 3.001 to be close and 2 and 2.001 close and 1 and 1.001 close but as soon as they try to say 0 and .001 are close your test would say they are infinitely far apart.

Yes. There is so much precision available near zero, it's insane. It's clear that with IEEE extendeds, you'll almost never underflow to zero.In my experience a rough measurement of near zero is abs(x)<sqrt(eps). That's why people need to be able to control both the relative tolerance (ie - bits of precision in agreement) and the absolute tolerance (ie - distance to zero).

Agreed.You could reasonably argue that 1e-4900 is close to 1e-4930. But in representability space they are no closer than 1 vs 1e30. The question then is, is representability space what we want? (I suspect the answer is: sometimes, but not always. Maybe about half the time?)

You mean in your measurement they are not close. In typical numerical apps that manipulate numbers near 1 a value of 1e-4900 is definetly near zero.

Well, it is a genuine feature of the represention. I think that almost all of the time, having 64 bits available at 1e-4900 is not much use. They are definitely less useful than having 64 bits near 1!It all depends on the scale of the problem being solved. The numerical applications I can think of in fact force the user to choose what they mean when they say "are these roughly equal". There isn't just one notion of equal or approximate so maybe the thing to do is make feqabs and feqrel. I'd say your precision test would make a great feqrel.

I concede -- you're right. There are actually *three* 'close enough tests, not two: (1) relative error (2) absolute error near zero, to cope with subnormals/limited precision of FP numbers (3) absolute error near zero, which is application-specific. My code deals with (1) and (2) which are intimately related via the machine representation. But test (3) is important. As you say, it is going to be very rare that the difference between 1e-4900 and 1e-4930 is significant. It seems like a difficult problem in general, though. Consider the case you mentioned earlier, sin(PI). Consider writing a unit test that sin(n*PI)==0 for all n. sin(0) = 0 sin(PI) = -5e-20 sin(PI*1e5) = 3.7e-15 sin(PI*1e10) = 1.4 e-9 sin(PI*1e15) = 6.7 e-5 sin(PI*1e18) = 0.041 sin(PI*1e20) = 3.14159 e+20 ?????? Hmmm. also cos(PI*1e20) = 3.14159 e+20 sin(infinity) = nan. It appears to be impossible to come up with an absolute tolerance for what "close enough" means for all these cases; clearly it depends on the tolerance of the input. Can this be codified somehow? I think you'd hope that the following should be true for all x, n: -1 <= sin(x) <= 1 sin(x)*sin(x) + cos(x)*cos(x) = 1 sin (n*PI) is closer to zero than sin(n*PI+x.epsilon) and sin(n*PI-x.epsilon), if n*PI + PI >= n*PI * (1+ 4*epsilon) (ie, sin(x) crosses zero between sin(n*PI) and sin(n*PI +- epsilon) ). Anyway, sin(PI*1e20) fails any test I can think of. I think it should return nan. -Don.

Aug 16 2005

Ben Hinkle wrote:In typical numerical apps that manipulate numbers near 1 a value of

It all depends on the scale of the problem being solved.

IDEA: Do you think the same applies for very large numbers? For example, I think 1e4900 would often be acceptably close to 1e4930, and both are reasonable approximations to infinity (and in fact they'll be equal to infinity if you later truncate it to double or float precision). The IEEE number line is basically -inf -1 0 1 +inf |-----------|-----------|-----------|-----------| with all four sections of almost equal length. If this is the case, then actually you don't want an absolute tolerance: you want a comparison of the exponent bits. So, the appropriate algorithm is something like: if it is near 1, then compare the mantissa bits. Result: 0 to mant_dig. if it is near 0 or infinity, compare the exponent bits (4930 vs 4900). This would mean that 0 and real.min would be equal to 16 bits, even before taking the mantissa into account. But you'd only start counting from the first set bit in the exponent. eg. 0x1.0p+1 and 1.1111p+1 would have only one bit in common from the exponent + 1 bit from the mantissa = 2 bits in total. But 0x1.00p+0x3FFF (6e4931) and ( 1.1111p+0x3F00 (1e4850) would have 6bits in common from the exponent + 0 from the mantissa. Q. Is there any way to combine these two? You might hope that precisionEquality(x, y) >= double.something if and only if cast(double)x == cast(double)y; and likewise for float. This means that double.min would have to have as much closeness to 0 as 1+double.epsilon does to 1. I think that when you add the requirements for float, it becomes mathematically impossible. But it might lead in the right direction. -Don.

Aug 16 2005

"Don Clugston" <dac nospam.com.au> wrote in message news:ddudr8$1nqo$1 digitaldaemon.com...Ben Hinkle wrote:In typical numerical apps that manipulate numbers near 1 a value of

It all depends on the scale of the problem being solved.

IDEA: Do you think the same applies for very large numbers? For example, I think 1e4900 would often be acceptably close to 1e4930, and both are reasonable approximations to infinity (and in fact they'll be equal to infinity if you later truncate it to double or float precision). The IEEE number line is basically -inf -1 0 1 +inf |-----------|-----------|-----------|-----------| with all four sections of almost equal length.

Actually I like your original relative equality for large numbers. It's only at zero that relative inequality diverges from people's expectations of the real line (which is how one thinks of floating point numbers casually). Now if the real line were a real circle instead (connect -inf to +inf) then I'd say numbers near +-inf should be treated like near zero.If this is the case, then actually you don't want an absolute tolerance: you want a comparison of the exponent bits. So, the appropriate algorithm is something like: if it is near 1, then compare the mantissa bits. Result: 0 to mant_dig. if it is near 0 or infinity, compare the exponent bits (4930 vs 4900). This would mean that 0 and real.min would be equal to 16 bits, even before taking the mantissa into account. But you'd only start counting from the first set bit in the exponent. eg. 0x1.0p+1 and 1.1111p+1 would have only one bit in common from the exponent + 1 bit from the mantissa = 2 bits in total. But 0x1.00p+0x3FFF (6e4931) and ( 1.1111p+0x3F00 (1e4850) would have 6bits in common from the exponent + 0 from the mantissa. Q. Is there any way to combine these two? You might hope that precisionEquality(x, y) >= double.something if and only if cast(double)x == cast(double)y; and likewise for float. This means that double.min would have to have as much closeness to 0 as 1+double.epsilon does to 1. I think that when you add the requirements for float, it becomes mathematically impossible. But it might lead in the right direction. -Don.

I still think the way to go is either 1) have one function with two knobs - one for abs tol and one for rel tol or 2) have two functions each with one knob - one for abs tol and one for rel tol The function I posted originally was feq with two knobs but now I'm thinking its probably a better idea to have two functions. With one it's easy to forget which is the abstol and which is the reltol since D doesn't have named optional parameters. So I'd prefer two functions feqrel and feqabs. The name feqrel tells the user that essentially nothing is close to zero since a relative tolerance is used (I don't mind that feqrel doesn't actually use a relative tolerance for denormalized numbers - it might just be easier to say underflow values all are approx 0 but who knows). The name feqabs tells users not to use it for large numbers and that it matches regular closeness notions for numbers near zero.

Aug 17 2005

In article <ddudr8$1nqo$1 digitaldaemon.com>, Don Clugston says...The IEEE number line is basically -inf -1 0 1 +inf |-----------|-----------|-----------|-----------| with all four sections of almost equal length.

What is the formula that defines this coordinate system? Why not use this to redefine the numbers in terms of this system and then take their distance appart and compare that to an absolute tolerance for approximately equal. That would scale correctly. -Sha

Aug 17 2005

Shammah Chancellor wrote:In article <ddudr8$1nqo$1 digitaldaemon.com>, Don Clugston says...The IEEE number line is basically -inf -1 0 1 +inf |-----------|-----------|-----------|-----------| with all four sections of almost equal length.

What is the formula that defines this coordinate system? Why not use this to redefine the numbers in terms of this system and then take their distance appart and compare that to an absolute tolerance for approximately equal. That would scale correctly. -Sha

Yes, and that's exactly what my code does, but as Ben has pointed out, the internal representation frequently doesn't match your application. Specifically, IEEE reals are fantastic at distinguishing really small numbers, but you often don't care whether a number is 1e-2000 or 1e-4000: it's just "really really small". I think the problem only really comes up when comparing numbers to zero, or possibly to infinity. Even 0.000000000000000000001 is very close to 1, and very very far from 0 on the IEEE number line. Actually comparison with 0 is complicated, because sometimes you demand exact equality, and sometimes 'tiny' is acceptable. Also sometimes sign matters (a tiny positive number might be completely different to a positive negative number). -Don.

Aug 17 2005

"Don Clugston" <dac nospam.com.au> wrote in message news:de0ms5$o94$1 digitaldaemon.com...Shammah Chancellor wrote:In article <ddudr8$1nqo$1 digitaldaemon.com>, Don Clugston says...The IEEE number line is basically -inf -1 0 1 +inf |-----------|-----------|-----------|-----------| with all four sections of almost equal length.

What is the formula that defines this coordinate system? Why not use this to redefine the numbers in terms of this system and then take their distance appart and compare that to an absolute tolerance for approximately equal. That would scale orrectly. -Sha

Yes, and that's exactly what my code does, but as Ben has pointed out, the internal representation frequently doesn't match your application. Specifically, IEEE reals are fantastic at distinguishing really small numbers, but you often don't care whether a number is 1e-2000 or 1e-4000: it's just "really really small". I think the problem only really comes up when comparing numbers to zero, or possibly to infinity. Even 0.000000000000000000001 is very close to 1, and very very far from 0 on the IEEE number line. Actually comparison with 0 is complicated, because sometimes you demand exact equality, and sometimes 'tiny' is acceptable. Also sometimes sign matters (a tiny positive number might be completely different to a positive negative number). -Don.

Another way to describe your metric is like the usual Euclidean metric on the following nubmer lines: 4 2 1 1/2 1/4 ---|-----------|-----------|-----------|-----------|---- subnormal 0 -subnormal --|--|--|--|--|--|--|--|--|-- -1/4 -1/2 -1 -2 -4 ---|-----------|-----------|-----------|-----------|---- For example if two numbers like 2.001 and 2.002 are d distance apart then 4.002 and 4.004 are also d distance apart because they have the same mantissas but just differ in the exponents. That's also why essentially nothing is close to 0 and why the positive and negative axes are "not comparable". It's also why 0 is so special. That is, the formula for the coordinate system is log2(x).

Aug 17 2005

I think this is pretty cool. I've thought about building such a thing, but never got around to it. You've got the right idea, which is the number of significant bits that match. I agree that if one or both of the operands are NaN, the result should be 0. And you've got unit tests, too, which I love to see! Need to add a test for (real.nan, real.nan). I need to know what license you pick for it - public domain?

Aug 15 2005

In article <ddr5og$1phb$1 digitaldaemon.com>, Walter says...I think this is pretty cool. I've thought about building such a thing, but never got around to it. You've got the right idea, which is the number of significant bits that match. I agree that if one or both of the operands are NaN, the result should be 0. And you've got unit tests, too, which I love to see! Need to add a test for (real.nan, real.nan).

I need to add a few more tests for denormals, too. There are so many corner cases!I need to know what license you pick for it - public domain?

Please note that I haven't finished with it. It works, but some of the comments are wrong! And some unit tests are missing, and there's some further optimisation/ cleanup to do. I just wanted to check whether people thought this was the correct approach before I spent any more time on it. I will post an update soon.

Aug 15 2005

I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK, no language has ever got this right. -Don.

Well Walter seems gung-ho with it but please add in a difference test with feq (at least near 0 where it matters) so that, for example, sin(PI) and sin(-PI) and 0 all feq with each other. I think using a pure relative test is too narrow.

Aug 15 2005

In article <ddre2a$1vck$1 digitaldaemon.com>, Ben Hinkle says...I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK, no language has ever got this right. -Don.

Well Walter seems gung-ho with it but please add in a difference test with feq (at least near 0 where it matters) so that, for example, sin(PI) and sin(-PI) and 0 all feq with each other. I think using a pure relative test is too narrow.

It's not a pure relative test. Subnormals will compare equal to zero. So it should work. But... I just checked, and was surprised to find that sin(PI) = 1p-64 ~ 1e-20. If you define real reducedsin(real x) { return sin(x-PI); } then reducedsin(PI) ==0. I would have thought that sin(n*PI) ==0 was an important identity to preserve. Could there be an out-by-1 bug in the argument reduction function used by the trig functions? This is exactly the sort of thing I'm hoping to uncover. -Don.

Aug 15 2005