## digitalmars.D - Submission: Floating point near-equality

• Don Clugston (165/165) Aug 14 2005 Submission: probably for std.math
• BCS (3/3) Aug 15 2005 #\$%@ you!!! ;-)
• Dave (3/3) Aug 15 2005 In article , Don Clugston says...
• Ben Hinkle (34/107) Aug 15 2005 Comments inline
• Ben Hinkle (28/46) Aug 15 2005 To be more concrete here's what I was thinking of:
• Don Clugston (35/42) Aug 15 2005 Yes, it's just like the isfinite(), isnan() functions in std.math.
• Ben Hinkle (14/66) Aug 15 2005 Do you mean "near one"? Near zero life gets very interesting. If I
• Don Clugston (29/45) Aug 15 2005 I did mean "near zero". You just need to go much smaller with 80-bit
• Ben Hinkle (25/51) Aug 16 2005 Ah ok. I didn't think you meant the underflow numbers as the the only th...
• Don Clugston (42/67) Aug 16 2005 Yes. There is so much precision available near zero, it's insane. It's
• Don Clugston (32/34) Aug 16 2005 IDEA: Do you think the same applies for very large numbers?
• Ben Hinkle (21/55) Aug 17 2005 Actually I like your original relative equality for large numbers. It's ...
• Shammah Chancellor (7/11) Aug 17 2005 What is the formula that defines this coordinate system? Why not use th...
• Don Clugston (15/34) Aug 17 2005 Yes, and that's exactly what my code does, but as Ben has pointed out,
• Ben Hinkle (16/48) Aug 17 2005 Another way to describe your metric is like the usual Euclidean metric o...
• Walter (7/7) Aug 15 2005 I think this is pretty cool. I've thought about building such a thing, b...
• Don Clugston (9/16) Aug 15 2005 I need to add a few more tests for denormals, too. There are so many cor...
• Ben Hinkle (4/8) Aug 15 2005 Well Walter seems gung-ho with it but please add in a difference test wi...
• Don Clugston (11/20) Aug 15 2005 It's not a pure relative test. Subnormals will compare equal to zero. So...
```Submission: probably for std.math
int precisionEquality(real x, real y)

"How equal" are two reals x and y?

Rationale:
The use of == to test floating point numbers for equality is somewhat dangerous,
because == is completely intolerant of round-off error.
For example, almost every unit test involving floating point numbers has an
assertion of floating-point equality; but operator == is usually far too strict.
Instead, some form of "close enough" test must be used. If not provided by the
language, programmers must write this test themselves; but unfortunately,
it's easy to write this test incorrectly.

In std.math and std.math2 there are two floating point closeness tests: feq()
and mfeq(). Ultimately these are:
return fabs(x - y) <= precision;
where precision is (say) 0.000001
This is incorrect: for very small numbers, this is a very weak assertion
eg mfeq(1e-30, 7.8e-300) returns true!
but for large numbers, eg x=1e60, it is an *extremely* strong assertion.

The "correct" way to do it, according to Knuth, is to combine a ratio
test with a difference test. But, this is a bit slow, and it still has
the arbitrary magic precision number ("0.000001" in the code above).

An interesting idea by Bruce Dawson
(http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm)
is to calculate the number of representable IEEE reals which lie between x and
y.
Unfortunately..
* the answer depends on whether the numbers are float, double, or real.
* the resulting value gets _very_ large if x and y are far apart...
* ... and with an IEEE extended float, it won't even fit in a ulong!
* it's not very intuitive.

IMHO, what you actually want is a function which tells you: "to how many
significant figures (bits) is x==y?"

The function below works on 80-bit reals. It is very fast (with possibilities
for further optimisation), and it provides sensible results for any input,
including subnormals, infinities, and NaNs. It could easily be converted for
reals with different precision, (you'd just need to know where the exponent
lies).

For example, this function ensures that there are at most 4 bits of round-off
error:

int feq(real x, real y)
{
return precisionEquality(x, y, real.mant_dig-4);
}

As well as providing a 'closeness' test, the function allows you to evaluate the
accuracy of a calculation. More on this in a future post.

writefln("It is accurate to %d bits.", precisionEquality(pifinder(), PI));

I believe that some equivalent to this function belongs in std.math. It would
be used in almost every unit test involving floating point numbers!

Some design choices which are open to debate:
* the name. precisionEquality() is the best I've come up with so far.
* behaviour with NaNs. Like ==,  the version below states that two NaNs are
completely unequal. But it would also be reasonable to state that they are equal
to 64 bits!
* My expansion of the notion of "significant figures" to negative values is
unconventional.

This is my first D contribution, my style may be inadequate.
I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK, no
language has ever got this right.

-Don.

================================================

import std.math2; // for abs
import std.math; // for isNan

/*
int precisionEquality(real x, real y)

"How equal" are two reals x and y?
Returns the number of mantissa bits which are equal in x and y;
this is similar to the concept of "significant figures".
For example, 0x1.F8p+60 and 0x1.F1p+60 are equal to 5 bits of precision.
If x==y, then precisionEquality(a,b) == real.mant_dig.

If x and y differ by a factor of two or more, then they have no
bits in common, so this function returns a value which is <=0.
Returns 0 if they differ by a factor of >=2 but < 4,
returns 1 for factors >=4 but <8, etc.

If the numbers have opposite sign, the difference in orders
of magnitude is based on the IEEE binary encoding.
For example, -1 and +1 differ by half the number line.

Pretend the numbers were fixed point. In the IEEE extended format,
there are about 32768 digits in front of the decimal point, and
the same number behind. But only the first 64 bits after the first '1'
bit are recorded.

If the exponents are different, then the return value
is negative, and gives the number of (binary) orders of
magnitude of difference between a and b.
*/
int precisionEquality(real a, real b)
{
real diff = a-b;
ushort *pa = cast(ushort *)(&a);
ushort *pb = cast(ushort *)(&b);
ushort *pd = cast(ushort *)(&diff);

// The difference in exponent between 'a' and 'a-b'
// is equal to the number of mantissa bits of a which are
// equal to b. If the difference is negative, then a and b
// have different exponents. If it is positive, then a and b
// are equal to that number of decimal places.
// AND with 0x7FFF to form the absolute value.
// Subtract 1 so that we round downwards.
int bitsdiff = ( ((pa[4]&0x7FFF) + (pb[4]&0x7FFF)-1)>>1) - (pd[4]&0x7FFF);

if ((pd[4]&0x7FFF) == 0) { // Difference is zero or denormal
if (a==b) return real.mant_dig;
// 'diff' is denormal, so the number of equal bits is higher;
// we need to add the number of zeros that lie at the start of
// its mantissa. We do this by multiplying by 2^real.mant_dig
diff*=0x1p+63;
return bitsdiff + real.mant_dig-(pd[4]&0x7FFF);
}

if (bitsdiff>0) return bitsdiff+1;

// Check for NAN or Infinity.
if (pd[4]&0x7FFF==0x7FFF) { // Result is NAN or INF
if (a==b) { 			 // both were +INF or both were -INF.
return real.mant_dig;
}
if (isnan(diff)) {
return -65535;  // at least one was a NAN.
}
// fall through for comparisons of INF with large reals.
}
// Return the negative of the absolute value of
// the difference in exponents.

if ((pa[4]^pb[4])&0x8000) { // do they have opposite signs?
// Convert the 'offset' exponent format to twos-complement,
// then do a normal subtraction.
return 1-abs(pa[4]-(0x8000-pb[4]));
}
return 1-abs(pa[4]-pb[4]);
}

unittest {
// Exact equality
assert(precisionEquality(real.max,real.max)==real.mant_dig);
assert(precisionEquality(0,0)==real.mant_dig);
assert(precisionEquality(7.1824,7.1824)==real.mant_dig);
assert(precisionEquality(real.infinity,real.infinity)==real.mant_dig);

// one bit off exact equality
assert(precisionEquality(1+real.epsilon,1)==real.mant_dig-1);
assert(precisionEquality(1,1+real.epsilon)==real.mant_dig-1);
// one bit below. Must round off correctly when exponents are different.
assert(precisionEquality(1,1-real.epsilon)==real.mant_dig-1);
assert(precisionEquality(1-real.epsilon,1)==real.mant_dig-1);
assert(precisionEquality(-1-real.epsilon,-1)==real.mant_dig-1);
assert(precisionEquality(-1+real.epsilon,-1)==real.mant_dig-1);

// Numbers that are close
assert(precisionEquality(0x1.Bp+5, 0x1.B8p+5)==5);
assert(precisionEquality(0x1.8p+10, 0x1.Cp+10)==2);
assert(precisionEquality(1, 1.1249)==4);
assert(precisionEquality(1.5, 1)==1);

// Extreme inequality
assert(precisionEquality(real.nan,0)==-65535);
assert(precisionEquality(0,-real.nan)==-65535);
assert(precisionEquality(real.nan,real.infinity)==-65535);
assert(precisionEquality(real.infinity,-real.infinity)==-65533);
assert(precisionEquality(-real.max,real.infinity)==-65532);
assert(precisionEquality(-real.max,real.max)==-65531);

// Numbers that are half the number line apart
// Note that (real.max_exp-real.min_exp) = 32765
assert(precisionEquality(-real.max,0)==-32765);
assert(precisionEquality(-1,1)==-32765);
assert(precisionEquality(real.min,1)==-16381);

// Numbers that differ by one order of magnitude
assert(precisionEquality(real.max,real.infinity)==0);
assert(precisionEquality(1, 2)==0);
assert(precisionEquality(2, 1)==0);
assert(precisionEquality(1.5*(1-real.epsilon), 1)==2);
assert(precisionEquality(4*(1-real.epsilon), 1)==0);
assert(precisionEquality(4, 1)==-1);
}
```
Aug 14 2005
```#\$%  you!!!   ;-)

I was going to post on this EXACT subject today!! :)

I was thinking of making it an operator, maybe "~~".
```
Aug 15 2005
```In article <ddp3eh\$b3\$1 digitaldaemon.com>, Don Clugston says...

I'm no mathematician, but your proposal seems very reasonable and would be nice
to incorporate into an operator of some kind too.
```
Aug 15 2005
"Ben Hinkle" <bhinkle mathworks.com> writes:
```Comments inline

"Don Clugston" <Don_member pathlink.com> wrote in message
news:ddp3eh\$b3\$1 digitaldaemon.com...
Submission: probably for std.math
int precisionEquality(real x, real y)

"How equal" are two reals x and y?

Rationale:
The use of == to test floating point numbers for equality is somewhat
dangerous,
because == is completely intolerant of round-off error.
For example, almost every unit test involving floating point numbers has
an
assertion of floating-point equality; but operator == is usually far too
strict.
Instead, some form of "close enough" test must be used. If not provided by
the
language, programmers must write this test themselves; but unfortunately,
it's easy to write this test incorrectly.

Agreed - though the notion of 'close enough' depends on the situation. I
really short thread in the archives with Walter's opinion:
http://www.digitalmars.com/d/archives/digitalmars/D/6397.html

In std.math and std.math2 there are two floating point closeness tests:
feq()
and mfeq(). Ultimately these are:
return fabs(x - y) <= precision;
where precision is (say) 0.000001
This is incorrect: for very small numbers, this is a very weak assertion
eg mfeq(1e-30, 7.8e-300) returns true!
but for large numbers, eg x=1e60, it is an *extremely* strong assertion.

The "correct" way to do it, according to Knuth, is to combine a ratio
test with a difference test. But, this is a bit slow, and it still has
the arbitrary magic precision number ("0.000001" in the code above).

It'll be hard to avoid magic numbers. Even a statement of "4 bits of
precision" is a magic number - just in binary. I actually wouldn't mind
being up-front about the fact that there are two notions of equality
absolute and relative and have the test take two parameters (the relative
and absute tolerances) and go with those. I'm personally not worried about
the performance of a division or multiplication when doing a floating pt
test - if that's the performance hit you're worried about.

An interesting idea by Bruce Dawson
(http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm)
is to calculate the number of representable IEEE reals which lie between x
and
y.
Unfortunately..
* the answer depends on whether the numbers are float, double, or real.
* the resulting value gets _very_ large if x and y are far apart...
* ... and with an IEEE extended float, it won't even fit in a ulong!
* it's not very intuitive.

IMHO, what you actually want is a function which tells you: "to how many
significant figures (bits) is x==y?"

With special treatment for 0? or around 0?

The function below works on 80-bit reals. It is very fast (with
possibilities
for further optimisation), and it provides sensible results for any input,
including subnormals, infinities, and NaNs. It could easily be converted
for
reals with different precision, (you'd just need to know where the
exponent lies).

It's actually very useful to have equality tests that know about float and
double. That way the promotion to real doesn't introduce extra "precision"
feq(x,y) when x and y are doubles instead of something like
feq(x,y,double.mant_dig-4)

For example, this function ensures that there are at most 4 bits of
round-off
error:

int feq(real x, real y)
{
return precisionEquality(x, y, real.mant_dig-4);
}

The precisionEquality function below doesn't take 3 inputs. Did you mean
something like
return precisionEquality(x,y) >= real.mant_dig-4
Also glancing over the code does it assume little-endian architecture? I see
the ushort* used to grab different parts of the real.

As well as providing a 'closeness' test, the function allows you to
evaluate the
accuracy of a calculation. More on this in a future post.

writefln("It is accurate to %d bits.", precisionEquality(pifinder(), PI));

I believe that some equivalent to this function belongs in std.math. It
would
be used in almost every unit test involving floating point numbers!

Some design choices which are open to debate:
* the name. precisionEquality() is the best I've come up with so far.

yeah - that name is pretty long. 'feq' would be the best IMHO.

* behaviour with NaNs. Like ==,  the version below states that two NaNs
are
completely unequal. But it would also be reasonable to state that they are
equal
to 64 bits!

NaN's should always be unequal. A function 'fis' could be defined that
returned true if both are NaN.

* My expansion of the notion of "significant figures" to negative values
is unconventional.

I'm not sure what you are referring to.

This is my first D contribution, my style may be inadequate.
I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK,
no
language has ever got this right.

-Don.

It would be nice to combine that with an abstol parameter near 0. That way a
test like feq(0,double.epsilon) will be true. I could see something like feq
checking if one of the two values is exactly 0 and doing absolute checking
there - or just doing a straightforward abs/rel check.

```
Aug 15 2005
"Ben Hinkle" <bhinkle mathworks.com> writes:
``` In std.math and std.math2 there are two floating point closeness tests:
feq() and mfeq(). Ultimately these are:
return fabs(x - y) <= precision;
where precision is (say) 0.000001
This is incorrect: for very small numbers, this is a very weak assertion
eg mfeq(1e-30, 7.8e-300) returns true!
but for large numbers, eg x=1e60, it is an *extremely* strong assertion.

The "correct" way to do it, according to Knuth, is to combine a ratio
test with a difference test. But, this is a bit slow, and it still has
the arbitrary magic precision number ("0.000001" in the code above).

It'll be hard to avoid magic numbers. Even a statement of "4 bits of
precision" is a magic number - just in binary. I actually wouldn't mind
being up-front about the fact that there are two notions of equality
absolute and relative and have the test take two parameters (the relative
and absute tolerances) and go with those. I'm personally not worried about
the performance of a division or multiplication when doing a floating pt
test - if that's the performance hit you're worried about.

To be more concrete here's what I was thinking of:

const float TinyFloat = 1e-4f;
const double TinyDouble = 1e-8;
const real TinyReal = 1e-10L; // depends on real.epsilon
int feq(real x, real y, real reltol = TinyReal, real abstol = TinyReal) {
real diff = fabs(x-y);
if (diff <= abstol)
return 1;
real fx = fabs(x);
real fy = fabs(y);
real scale = fx>fy ? fx : fy;
return diff/scale <= reltol;
}
int feq(double x, double y, double reltol = TinyDouble, double abstol =
TinyDouble) {
return feq(cast(real)x,cast(real)y,cast(real)reltol,cast(real)abstol);
}
int feq(float x, float y, float reltol = TinyFloat, float abstol =
TinyFloat) {
return feq(cast(real)x,cast(real)y,cast(real)reltol,cast(real)abstol);
}

That way if you know you'll be dealing with very small numbers (less than
TinyReal, for example) then one can pass 0 for the abstol (and let the
optimzer remove any dead code) and only use the relative equality tests. If
one wants a pure absolute test (suppose one knows the numbers will always be
near 1) then pass 0 for reltol and let the optimzer remove that code. For
"normal usage", though, numbers that are "close" either in absolute or
relative terms will test as equal.
```
Aug 15 2005
```From your first post:
Also glancing over the code does it assume little-endian architecture? I see
the ushort* used to grab different parts of the real.

Yes, it's just like the isfinite(), isnan() functions in std.math.
It's also only valid for 80-bit reals.
I'm just happy that I managed to do it without reverting to asm!

It'll be hard to avoid magic numbers. Even a statement of "4 bits of
precision" is a magic number - just in binary. I actually wouldn't mind
being up-front about the fact that there are two notions of equality
absolute and relative and have the test take two parameters (the relative
and absute tolerances) and go with those.

The point is that in IEEE arithmetic, the relative and absolute tolerances are
related via the number of bits of precision. Near zero, "bits of precision"
looks like an absolute tolerance. Away from zero, it looks like relative
tolerance. And there's a region in between where it's really difficult to think
in terms of absolute or relative tolerance, they kind of blur together.

Note that there are many uses beyond simple "equal-enough" tests.
This function allows you to say: At what value of x are f(x) and g(x) most
different? eg when is exp(ln(x)) most different to x?
You can just try a range of different x values, and find the one where
precisionEquality(exp(ln(x)), x) is smallest. You don't need to worry about
whether you should be using absolute or relative tolerance, that's already taken
care of. So, if writing a function, you can find where the precision is least
adequate. Once you've done the best you can, you can put that into a unit test.

This gives you a simple way of reporting the accuracy of library functions and
identities, and for performing more powerful "second-order" unit tests.
(eg, acos(cos(x)) must be close to x for all -PI/2 <= x <= PI/2).
Then, if you know that cos(x) is accurate, you've proved that acos(x) is also
accurate. My goal is to provide one-line unit tests for most of the math
functions.

You're right: for equality tests, it probably is necessary to have seperate
implementations for double and float, because subnormals become normal when
moved to higher precision. It's only a problem if your intermediate calculations
were done in double or float precision, instead of reals.
What is true, though, is that
if precisionEquality(real x, real y) >= double.mant.dig
then cast(double)x == cast(double)y.
The converse does not apply because of the aforementioned subnormals. (But in
this case you kind of "got lucky").

I did the real case first because it is the most difficult. It's trivial to
extend it to other precisions.

-Don.
```
Aug 15 2005
"Ben Hinkle" <ben.hinkle gmail.com> writes:
``` It'll be hard to avoid magic numbers. Even a statement of "4 bits of
precision" is a magic number - just in binary. I actually wouldn't mind
being up-front about the fact that there are two notions of equality
absolute and relative and have the test take two parameters (the
relative
and absute tolerances) and go with those.

The point is that in IEEE arithmetic, the relative and absolute tolerances
are
related via the number of bits of precision.
Near zero, "bits of precision"
looks like an absolute tolerance. Away from zero, it looks like relative
tolerance.

Do you mean "near one"? Near zero life gets very interesting. If I
understand your algorithm correctly the number 1e-200 is still very far away
from 0 or even 1e-201 or -1e-200. Then again I might not understand the
algorithm.

And there's a region in between where it's really difficult to think
in terms of absolute or relative tolerance, they kind of blur together.

Agreed (I think). Over number that aren't "too small or too large" the
absolute and relative measure pretty much the same thing. The distinction
only gets important at the extremes.

Note that there are many uses beyond simple "equal-enough" tests.
This function allows you to say: At what value of x are f(x) and g(x) most
different? eg when is exp(ln(x)) most different to x?
You can just try a range of different x values, and find the one where
precisionEquality(exp(ln(x)), x) is smallest. You don't need to worry
whether you should be using absolute or relative tolerance, that's already
taken
care of. So, if writing a function, you can find where the precision is
least
adequate. Once you've done the best you can, you can put that into a unit
test.

This gives you a simple way of reporting the accuracy of library functions
and
identities, and for performing more powerful "second-order" unit tests.
(eg, acos(cos(x)) must be close to x for all -PI/2 <= x <= PI/2).
Then, if you know that cos(x) is accurate, you've proved that acos(x) is
also
accurate. My goal is to provide one-line unit tests for most of the math
functions.

In another post I pointed out that sin(PI) (which I just actually tried and
got something around -1e-20) tests and having no digits of precision in
common with 0. So I think in general an absolute component to the notion of
equality will be important if you want trig functions to behave "normally".

You're right: for equality tests, it probably is necessary to have
seperate
implementations for double and float, because subnormals become normal
when
moved to higher precision. It's only a problem if your intermediate
calculations
were done in double or float precision, instead of reals.
What is true, though, is that
if precisionEquality(real x, real y) >= double.mant.dig
then cast(double)x == cast(double)y.
The converse does not apply because of the aforementioned subnormals. (But
in
this case you kind of "got lucky").

Yup. I have no problem with doing things with real - I think that's the way
to go. I'm just suggesting adding some overloads to make the common case of
double and float easier to use.

I did the real case first because it is the most difficult. It's trivial
to
extend it to other precisions.

-Don.

```
Aug 15 2005
Don Clugston <dac nospam.com.au> writes:
```Ben Hinkle wrote:
The point is that in IEEE arithmetic, the relative and absolute tolerances
are
related via the number of bits of precision.
Near zero, "bits of precision"
looks like an absolute tolerance. Away from zero, it looks like relative
tolerance.

Do you mean "near one"? Near zero life gets very interesting. If I
understand your algorithm correctly the number 1e-200 is still very far away
from 0 or even 1e-201 or -1e-200. Then again I might not understand the
algorithm.

I did mean "near zero". You just need to go much smaller with 80-bit
reals! real.min = 3.3e-4932 !!
And the smallest representable number is real.min * real.epsilon = 3.3
e-4951. It's just amazing how much range you have with long doubles.
So 1e-200 is a very poor estimation to 0; there are *billions* of better
IEEE reals!
My algorithm only gives 4e-4932 as close to 0 (and it's also close to
-4e-4932). It measures closeness in "representability space".

You could reasonably argue that 1e-4900 is close to 1e-4930.
But in representability space they are no closer than 1 vs 1e30. The
question then is, is representability space what we want?
(I suspect the answer is: sometimes, but not always. Maybe about half
the time?)

In another post I pointed out that sin(PI) (which I just actually tried and
got something around -1e-20) tests and having no digits of precision in
common with 0. So I think in general an absolute component to the notion of
equality will be important if you want trig functions to behave "normally".

Ah. I'm learning a lot here.
1e-20 is normally an exceedingly poor approximation to 0. But, since
sin(x) maps "linearly" to the interval (-1, 1), 1e-20 is close to 0 as
sin(x) will get.

Or alternatively, sin(PI) is small but not close to 0, because PI is not
close enough to the mathematical pi.

sin(n*PI) ==0 is not an identity which is preserved!
But sin(x)*sin(x) + cos(x)*cos(x) == 1 seems to be preserved for all x.

Yet there are clearly many contexts when you want to treat sin(PI) as
zero. The relevant quantity seems to be the difference divided by the
range. For trig functions (sin + cos, at least), it seems you only want
an absolute difference, regardless of how close to 0 you are.

A few weeks with D, and I've already learned more about numerical
analysis than years with C++ :-)

-Don.
```
Aug 15 2005
"Ben Hinkle" <ben.hinkle gmail.com> writes:
```"Don Clugston" <dac nospam.com.au> wrote in message
news:ddrvuo\$2e0g\$1 digitaldaemon.com...
Ben Hinkle wrote:
The point is that in IEEE arithmetic, the relative and absolute
tolerances are
related via the number of bits of precision.
Near zero, "bits of precision"
looks like an absolute tolerance. Away from zero, it looks like relative
tolerance.

Do you mean "near one"? Near zero life gets very interesting. If I
understand your algorithm correctly the number 1e-200 is still very far
away from 0 or even 1e-201 or -1e-200. Then again I might not understand
the algorithm.

I did mean "near zero". You just need to go much smaller with 80-bit
reals! real.min = 3.3e-4932 !!
And the smallest representable number is real.min * real.epsilon = 3.3
e-4951. It's just amazing how much range you have with long doubles.
So 1e-200 is a very poor estimation to 0; there are *billions* of better
IEEE reals!
My algorithm only gives 4e-4932 as close to 0 (and it's also close
to -4e-4932). It measures closeness in "representability space".

Ah ok. I didn't think you meant the underflow numbers as the the only things
near 0. To me that's waaaay too near (typically). I'm talking just about
normalized (ie "proper") numbers. In regular everyday
use the concept of how close to zero is "close" is context dependent. When
you have a simple number like x = 1+eps (which is very close to 1 in
anyone's book) and then you ask "is x-1 near zero?" the answer depends on if
you consider eps to be near zero. By your measurement just about all the
time nothing will be near zero except zero (in particular eps will be
extremely far away). Or to put it another way a given application might
consider 3 and 3.001 to be close and 2 and 2.001 close and 1 and 1.001 close
but as soon as they try to say 0 and .001 are close your test would say they
are infinitely far apart.
In my experience a rough measurement of near zero is
abs(x)<sqrt(eps). That's why people need to be able to control both the
relative tolerance (ie - bits of precision in agreement) and the absolute
tolerance (ie - distance to zero).

You could reasonably argue that 1e-4900 is close to 1e-4930.
But in representability space they are no closer than 1 vs 1e30. The
question then is, is representability space what we want?
(I suspect the answer is: sometimes, but not always. Maybe about half the
time?)

You mean in your measurement they are not close. In typical numerical apps
that manipulate numbers near 1 a value of 1e-4900 is definetly near zero. It
all depends on the scale of the problem being solved. The numerical
applications I can think of in fact force the user to choose what they mean
when they say "are these roughly equal". There isn't just one notion of
equal or approximate so maybe the thing to do is make feqabs and feqrel. I'd
say your precision test would make a great feqrel.
```
Aug 16 2005
Don Clugston <dac nospam.com.au> writes:
```Ben Hinkle wrote:
you consider eps to be near zero. By your measurement just about all the
time nothing will be near zero except zero (in particular eps will be
extremely far away). Or to put it another way a given application might
consider 3 and 3.001 to be close and 2 and 2.001 close and 1 and 1.001 close
but as soon as they try to say 0 and .001 are close your test would say they
are infinitely far apart.

Yes. There is so much precision available near zero, it's insane. It's
clear that with IEEE extendeds, you'll almost never underflow to zero.

In my experience a rough measurement of near zero is
abs(x)<sqrt(eps). That's why people need to be able to control both the
relative tolerance (ie - bits of precision in agreement) and the absolute
tolerance (ie - distance to zero).

Agreed.

You could reasonably argue that 1e-4900 is close to 1e-4930.
But in representability space they are no closer than 1 vs 1e30. The
question then is, is representability space what we want?
(I suspect the answer is: sometimes, but not always. Maybe about half the
time?)

You mean in your measurement they are not close. In typical numerical apps
that manipulate numbers near 1 a value of 1e-4900 is definetly near zero.

Well, it is a genuine feature of the represention. I think that almost
all of the time, having 64 bits available at 1e-4900 is not much use.
They are definitely less useful than having 64 bits near 1!

It
all depends on the scale of the problem being solved. The numerical
applications I can think of in fact force the user to choose what they mean
when they say "are these roughly equal". There isn't just one notion of
equal or approximate so maybe the thing to do is make feqabs and feqrel. I'd
say your precision test would make a great feqrel.

I concede -- you're right.
There are actually *three* 'close enough tests, not two:
(1) relative error
(2) absolute error near zero, to cope with subnormals/limited precision
of FP numbers
(3) absolute error near zero, which is application-specific.

My code deals with (1) and (2) which are intimately related via the
machine representation.
But test (3) is important. As you say, it is going to be very rare that
the difference between 1e-4900 and 1e-4930 is significant.
It seems like a difficult problem in general, though.

Consider the case you mentioned earlier, sin(PI).
Consider writing a unit test that sin(n*PI)==0 for all n.

sin(0) = 0
sin(PI) = -5e-20
sin(PI*1e5) = 3.7e-15
sin(PI*1e10) = 1.4 e-9
sin(PI*1e15) = 6.7 e-5
sin(PI*1e18) = 0.041
sin(PI*1e20) = 3.14159 e+20    ?????? Hmmm.
also cos(PI*1e20) = 3.14159 e+20
sin(infinity) = nan.

It appears to be impossible to come up with an absolute tolerance
for what "close enough" means for all these cases; clearly it
depends on the tolerance of the input. Can this be codified somehow?

I think you'd hope that the following should be true for all x, n:
-1 <= sin(x) <= 1
sin(x)*sin(x) + cos(x)*cos(x) = 1
sin (n*PI) is closer to zero
than sin(n*PI+x.epsilon) and sin(n*PI-x.epsilon), if
n*PI + PI >= n*PI * (1+ 4*epsilon)
(ie, sin(x) crosses zero between sin(n*PI) and sin(n*PI +- epsilon) ).

Anyway, sin(PI*1e20) fails any test I can think of. I think it should
return nan.

-Don.
```
Aug 16 2005
Don Clugston <dac nospam.com.au> writes:
```Ben Hinkle wrote:
In typical numerical apps that manipulate numbers near 1 a value of

1e-4900 is definetly near zero.
It all depends on the scale of the problem being solved.

IDEA: Do you think the same applies for very large numbers?
For example, I think 1e4900 would often be acceptably close to 1e4930,
and both are reasonable approximations to infinity (and in fact they'll
be equal to infinity if you later truncate it to double or float precision).

The IEEE number line is basically

-inf       -1           0           1          +inf
|-----------|-----------|-----------|-----------|

with all four sections of almost equal length.

If this is the case, then actually you don't want an absolute tolerance:
you want a comparison of the exponent bits.

So, the appropriate algorithm is something like:
if it is near 1, then compare the mantissa bits. Result: 0 to mant_dig.
if it is near 0 or infinity, compare the exponent bits (4930 vs 4900).

This would mean that 0 and real.min would be equal to 16 bits, even
before taking the mantissa into account.
But you'd only start counting from the first set bit in the exponent.
eg. 0x1.0p+1 and 1.1111p+1 would have only one bit in common from the
exponent + 1 bit from the mantissa = 2 bits in total.
But 0x1.00p+0x3FFF (6e4931) and ( 1.1111p+0x3F00 (1e4850) would have
6bits in common from the exponent + 0 from the mantissa.

Q. Is there any way to combine these two?
You might hope that
precisionEquality(x, y) >=  double.something
if and only if cast(double)x == cast(double)y;
and likewise for float.
This means that double.min would have to have as much closeness to 0 as
1+double.epsilon does to 1.
I think that when you add the requirements for float, it becomes
mathematically impossible. But it might lead in the right direction.

-Don.
```
Aug 16 2005
"Ben Hinkle" <ben.hinkle gmail.com> writes:
```"Don Clugston" <dac nospam.com.au> wrote in message
news:ddudr8\$1nqo\$1 digitaldaemon.com...
Ben Hinkle wrote:
In typical numerical apps that manipulate numbers near 1 a value of

1e-4900 is definetly near zero.
It all depends on the scale of the problem being solved.

IDEA: Do you think the same applies for very large numbers?
For example, I think 1e4900 would often be acceptably close to 1e4930,
and both are reasonable approximations to infinity (and in fact they'll be
equal to infinity if you later truncate it to double or float precision).

The IEEE number line is basically

-inf       -1           0           1          +inf
|-----------|-----------|-----------|-----------|

with all four sections of almost equal length.

Actually I like your original relative equality for large numbers. It's only
at zero that relative inequality diverges from people's expectations of the
real line (which is how one thinks of floating point numbers casually). Now
if the real line were a real circle instead (connect -inf to +inf) then I'd
say numbers near +-inf should be treated like near zero.

If this is the case, then actually you don't want an absolute tolerance:
you want a comparison of the exponent bits.

So, the appropriate algorithm is something like:
if it is near 1, then compare the mantissa bits. Result: 0 to mant_dig.
if it is near 0 or infinity, compare the exponent bits (4930 vs 4900).

This would mean that 0 and real.min would be equal to 16 bits, even before
taking the mantissa into account.
But you'd only start counting from the first set bit in the exponent.
eg. 0x1.0p+1 and 1.1111p+1 would have only one bit in common from the
exponent + 1 bit from the mantissa = 2 bits in total.
But 0x1.00p+0x3FFF (6e4931) and ( 1.1111p+0x3F00 (1e4850) would have 6bits
in common from the exponent + 0 from the mantissa.

Q. Is there any way to combine these two?
You might hope that
precisionEquality(x, y) >=  double.something
if and only if cast(double)x == cast(double)y;
and likewise for float.
This means that double.min would have to have as much closeness to 0 as
1+double.epsilon does to 1.
I think that when you add the requirements for float, it becomes
mathematically impossible. But it might lead in the right direction.

-Don.

I still think the way to go is either
1) have one function with two knobs - one for abs tol and one for rel tol or
2) have two functions each with one knob - one for abs tol and one for rel
tol
The function I posted originally was feq with two knobs but now I'm thinking
its probably a better idea to have two functions. With one it's easy to
forget which is the abstol and which is the reltol since D doesn't have
named optional parameters. So I'd prefer two functions feqrel and feqabs.
The name feqrel tells the user that essentially nothing is close to zero
since a relative tolerance is used (I don't mind that feqrel doesn't
actually use a relative tolerance for denormalized numbers - it might just
be easier to say underflow values all are approx 0 but who knows). The name
feqabs tells users not to use it for large numbers and that it matches
regular closeness notions for numbers near zero.
```
Aug 17 2005
```In article <ddudr8\$1nqo\$1 digitaldaemon.com>, Don Clugston says...
The IEEE number line is basically

-inf       -1           0           1          +inf
|-----------|-----------|-----------|-----------|

with all four sections of almost equal length.

What is the formula that defines this coordinate system?  Why not use this to
redefine the numbers in
terms of this system and then take their distance appart and compare that to an
absolute tolerance for
approximately equal.  That would scale correctly.

-Sha
```
Aug 17 2005
Don Clugston <dac nospam.com.au> writes:
```Shammah Chancellor wrote:
In article <ddudr8\$1nqo\$1 digitaldaemon.com>, Don Clugston says...

The IEEE number line is basically

-inf       -1           0           1          +inf
|-----------|-----------|-----------|-----------|

with all four sections of almost equal length.

What is the formula that defines this coordinate system?  Why not use this to
redefine the numbers in
terms of this system and then take their distance appart and compare that to an
absolute tolerance for
approximately equal.  That would scale correctly.

-Sha

Yes, and that's exactly what my code does, but as Ben has pointed out,
the internal representation frequently doesn't match your application.
Specifically, IEEE reals are fantastic at distinguishing really small
numbers, but you often don't care whether a number is 1e-2000 or
1e-4000: it's just "really really small".
I think the problem only really comes up when comparing numbers to zero,
or possibly to infinity.
Even 0.000000000000000000001 is very close to 1, and very very far from
0 on the IEEE number line.
Actually comparison with 0 is complicated, because sometimes you demand
exact equality, and sometimes 'tiny' is acceptable. Also sometimes sign
matters (a tiny positive number might be completely different to a
positive negative number).

-Don.
```
Aug 17 2005
"Ben Hinkle" <ben.hinkle gmail.com> writes:
```"Don Clugston" <dac nospam.com.au> wrote in message
news:de0ms5\$o94\$1 digitaldaemon.com...
Shammah Chancellor wrote:
In article <ddudr8\$1nqo\$1 digitaldaemon.com>, Don Clugston says...

The IEEE number line is basically

-inf       -1           0           1          +inf
|-----------|-----------|-----------|-----------|

with all four sections of almost equal length.

What is the formula that defines this coordinate system?  Why not use
this to
redefine the numbers in terms of this system and then take their distance
appart and compare that to an
absolute tolerance for approximately equal.  That would scale
orrectly.  -Sha

Yes, and that's exactly what my code does, but as Ben has pointed out, the
internal representation frequently doesn't match your application.
Specifically, IEEE reals are fantastic at distinguishing really small
numbers, but you often don't care whether a number is 1e-2000 or 1e-4000:
it's just "really really small".
I think the problem only really comes up when comparing numbers to zero,
or possibly to infinity.
Even 0.000000000000000000001 is very close to 1, and very very far from 0
on the IEEE number line.
Actually comparison with 0 is complicated, because sometimes you demand
exact equality, and sometimes 'tiny' is acceptable. Also sometimes sign
matters (a tiny positive number might be completely different to a
positive negative number).

-Don.

Another way to describe your metric is like the usual Euclidean metric on
the following nubmer lines:

4           2           1          1/2         1/4
---|-----------|-----------|-----------|-----------|----

subnormal     0    -subnormal
--|--|--|--|--|--|--|--|--|--

-1/4         -1/2         -1         -2          -4
---|-----------|-----------|-----------|-----------|----

For example if two numbers like 2.001 and 2.002 are d distance apart then
4.002 and 4.004 are also d distance apart because they have the same
mantissas but just differ in the exponents. That's also why essentially
nothing is close to 0 and why the positive and negative axes are "not
comparable". It's also why 0 is so special.
That is, the formula for the coordinate system is log2(x).
```
Aug 17 2005
"Walter" <newshound digitalmars.com> writes:
```I think this is pretty cool. I've thought about building such a thing, but
never got around to it. You've got the right idea, which is the number of
significant bits that match.

I agree that if one or both of the operands are NaN, the result should be 0.

And you've got unit tests, too, which I love to see! Need to add a test for
(real.nan, real.nan).

I need to know what license you pick for it - public domain?
```
Aug 15 2005
```In article <ddr5og\$1phb\$1 digitaldaemon.com>, Walter says...
I think this is pretty cool. I've thought about building such a thing, but
never got around to it. You've got the right idea, which is the number of
significant bits that match.

I agree that if one or both of the operands are NaN, the result should be 0.

And you've got unit tests, too, which I love to see! Need to add a test for
(real.nan, real.nan).

I need to add a few more tests for denormals, too. There are so many corner
cases!

I need to know what license you pick for it - public domain?

Public domain.

Please note that I haven't finished with it. It works, but some of the comments
are wrong! And some unit tests are missing, and there's some further
optimisation/ cleanup to do. I just wanted to check whether people thought this
was the correct approach before I spent any more time on it.
I will post an update soon.
```
Aug 15 2005
"Ben Hinkle" <ben.hinkle gmail.com> writes:
``` I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK,
no
language has ever got this right.

-Don.

Well Walter seems gung-ho with it but please add in a difference test with
feq (at least near 0 where it matters) so that, for example, sin(PI) and
sin(-PI) and 0 all feq with each other. I think using a pure relative test
is too narrow.
```
Aug 15 2005
```In article <ddre2a\$1vck\$1 digitaldaemon.com>, Ben Hinkle says...
I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK,
no
language has ever got this right.

-Don.

Well Walter seems gung-ho with it but please add in a difference test with
feq (at least near 0 where it matters) so that, for example, sin(PI) and
sin(-PI) and 0 all feq with each other. I think using a pure relative test
is too narrow.

It's not a pure relative test. Subnormals will compare equal to zero. So it
should work. But...
I just checked, and was surprised to find that sin(PI) = 1p-64 ~ 1e-20.
If you define
real reducedsin(real x) { return sin(x-PI); }
then reducedsin(PI) ==0.

I would have thought that sin(n*PI) ==0 was an important identity to preserve.
Could there be an out-by-1 bug in the argument reduction function used by the
trig functions? This is exactly the sort of thing I'm hoping to uncover.

-Don.
```
Aug 15 2005