digitalmars.D - Sorting floating-point values, and NaN

Vladimir Panteleev (16/16) Nov 11 2013 double[] arr = [1, 2, 3, double.nan, 1, 2];

qznc (10/27) Nov 11 2013 You cannot sort NaNs. A comparison with NaN must always evaluate

tn (11/40) Nov 12 2013 But that is exactly what it does.

tn (9/52) Nov 12 2013 Actually, you could also use "!>=" instead of "<" so that the

Walter Bright (4/6) Nov 12 2013 That's because they just don't translate to opCmp calls. At one point I ...

tn (21/27) Nov 12 2013 Well, they seem to translate to something, because this works as

Jonathan M Davis (5/8) Nov 12 2013 As I understand it, they only work with the built-in floating point type...

Andrei Alexandrescu (4/43) Nov 12 2013 I think NaNs are singular unordered values that make invalid inputs for

Vladimir Panteleev (5/8) Nov 12 2013 Considering their effect on the program state, NaNs can be seen

Andrei Alexandrescu (11/16) Nov 12 2013 I should know, since I dedicated a chapter to the topic in TDPL :o).

tn (7/13) Nov 12 2013 But sort and isSorted both allow user to provide a custom "less"

Andrei Alexandrescu (7/18) Nov 12 2013 I think this is a misunderstanding. I never advocated inserting

Jonathan M Davis (8/28) Nov 12 2013 The check could be special-cased for floating point values so that it ta...

Vladimir Panteleev (21/55) Nov 12 2013 Here is the problem:

Andrei Alexandrescu (16/29) Nov 12 2013 They are still singular values. Code has no business comparing them

Vladimir Panteleev (30/50) Nov 12 2013 Both of the above hinge on the relative obscurity of NaNs and the

Bigsandwich (7/13) Nov 12 2013 Please, please, please just no. As someone who works with

Walter Bright (6/13) Nov 12 2013 I think it's apropos to reference this famous document:

Walter Bright (7/11) Nov 12 2013 NaNs are valid values for floating point numbers. Trying to pretend they...

Vladimir Panteleev (37/52) Nov 12 2013 That doesn't change the reality that not everyone is aware of

Andrei Alexandrescu (15/47) Nov 12 2013 I agree. I'll just add that people who plan to get work done with

Walter Bright (17/19) Nov 12 2013 I want to add that NaN has been a reality on x86 machines since about 19...
Vladimir Panteleev (9/14) Nov 12 2013 The distinction between good and bad documentation in this

tn (23/35) Nov 12 2013 But 'a

Vladimir Panteleev (10/34) Nov 12 2013 Whoops, you're right. I meant transitivity of the negated

tn (7/19) Nov 12 2013 Yes, I was expecting this. :)

Andrei Alexandrescu (21/66) Nov 12 2013 This is a bit much.

Vladimir Panteleev (9/40) Nov 12 2013 (I partially disagree here - see my reply to Walter.)

Andrei Alexandrescu (12/25) Nov 12 2013 (... the disadvantage being b is evaluated even if a is false.

Vladimir Panteleev (7/12) Nov 12 2013 I've thought about this as well, but then there are cases like...

monarch_dodra (24/38) Nov 13 2013 I may be late to the party, but I think it might be a bit unfair

tn (12/29) Nov 12 2013 Shouldn't the implication be to the other direction? Then it

Andrei Alexandrescu (7/34) Nov 12 2013 It is in the other direction, but the latter doesn't compile :o). Again,...

tn (11/50) Nov 12 2013 If you want the isSorted function to always fail if the input

Vladimir Panteleev (4/10) Nov 12 2013 <= as in ≤ (less or equal), not ⇐ (reverse implies). <= can be

tn (4/16) Nov 12 2013 Thanks for the explanation. I thought it was pseudocode. That is

Vladimir Panteleev (3/18) Nov 13 2013 https://github.com/D-Programming-Language/phobos/pull/1691

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

double[] arr = [1, 2, 3, double.nan, 1, 2];
assert(arr.isSorted);
arr.sort();

This code will fail with an assert error, but not the one on the 
second line. Rather, it will fail inside std.range, when sort 
calls assumeSorted, which checks elements in an order different 
than sort and isSorted.

Here's a case where the odd way NaN interacts with generic code 
messes things up in an ugly way.

This is concerning. It's very easy to overlook this while writing 
an application, and it can become a hidden vulnerability.

We can't fix the operators... but, perhaps, we could define a 
safe default comparison predicate (e.g. std.algorithm.less) for 
use with sorting and related tasks? Aside from bitwise 
comparison, is it even possible to efficiently compare 
floating-point values in a way suitable for sorting?

Nov 11 2013

"qznc" <qznc web.de> writes:

On Tuesday, 12 November 2013 at 04:41:56 UTC, Vladimir Panteleev 
wrote:
 double[] arr = [1, 2, 3, double.nan, 1, 2];
 assert(arr.isSorted);
 arr.sort();

 This code will fail with an assert error, but not the one on 
 the second line. Rather, it will fail inside std.range, when 
 sort calls assumeSorted, which checks elements in an order 
 different than sort and isSorted.

 Here's a case where the odd way NaN interacts with generic code 
 messes things up in an ugly way.

 This is concerning. It's very easy to overlook this while 
 writing an application, and it can become a hidden 
 vulnerability.

 We can't fix the operators... but, perhaps, we could define a 
 safe default comparison predicate (e.g. std.algorithm.less) for 
 use with sorting and related tasks? Aside from bitwise 
 comparison, is it even possible to efficiently compare 
 floating-point values in a way suitable for sorting?

You cannot sort NaNs. A comparison with NaN must always evaluate 
to false.

One could change isSorted to check for false instead of true, so 
it would accept NaN. That would probably break too much code, 
though.

   arr[n] < arr[n+1]  => !(arr[n+1] < arr[n])

Apparently, isSorted contains an antisymmetry check, which is not 
triggered, because it relies on true results.

Nov 11 2013

"tn" <no email.com> writes:

On Tuesday, 12 November 2013 at 07:33:57 UTC, qznc wrote:
 On Tuesday, 12 November 2013 at 04:41:56 UTC, Vladimir 
 Panteleev wrote:
 double[] arr = [1, 2, 3, double.nan, 1, 2];
 assert(arr.isSorted);
 arr.sort();

 This code will fail with an assert error, but not the one on 
 the second line. Rather, it will fail inside std.range, when 
 sort calls assumeSorted, which checks elements in an order 
 different than sort and isSorted.

 Here's a case where the odd way NaN interacts with generic 
 code messes things up in an ugly way.

 This is concerning. It's very easy to overlook this while 
 writing an application, and it can become a hidden 
 vulnerability.

 We can't fix the operators... but, perhaps, we could define a 
 safe default comparison predicate (e.g. std.algorithm.less) 
 for use with sorting and related tasks? Aside from bitwise 
 comparison, is it even possible to efficiently compare 
 floating-point values in a way suitable for sorting?

 You cannot sort NaNs. A comparison with NaN must always 
 evaluate to false.

 One could change isSorted to check for false instead of true, 
 so it would accept NaN. That would probably break too much 
 code, though.

   arr[n] < arr[n+1]  => !(arr[n+1] < arr[n])

But that is exactly what it does.

https://github.com/D-Programming-Language/phobos/blob/8372da44f66f20c67b69b4b8fb39233ce42d493c/std/algorithm.d#L10021

And that is the reason the second line "assert(arr.isSorted);" 
passes, while it should fail as the array is clearly not sorted. 
Instead modifying

   !(arr[n+1] < arr[n])  =>  arr[n] <= arr[n+1]

would make the function work correctly (return false if the range 
contains nans), but then the template parameter "less" should be 
replaced by "lessOrEqual". An alternative would be to check for 
nans explicitly.

Nov 12 2013

"tn" <no email.com> writes:

On Tuesday, 12 November 2013 at 08:54:35 UTC, tn wrote:
 On Tuesday, 12 November 2013 at 07:33:57 UTC, qznc wrote:
 On Tuesday, 12 November 2013 at 04:41:56 UTC, Vladimir 
 Panteleev wrote:
 double[] arr = [1, 2, 3, double.nan, 1, 2];
 assert(arr.isSorted);
 arr.sort();

 This code will fail with an assert error, but not the one on 
 the second line. Rather, it will fail inside std.range, when 
 sort calls assumeSorted, which checks elements in an order 
 different than sort and isSorted.

 Here's a case where the odd way NaN interacts with generic 
 code messes things up in an ugly way.

 This is concerning. It's very easy to overlook this while 
 writing an application, and it can become a hidden 
 vulnerability.

 We can't fix the operators... but, perhaps, we could define a 
 safe default comparison predicate (e.g. std.algorithm.less) 
 for use with sorting and related tasks? Aside from bitwise 
 comparison, is it even possible to efficiently compare 
 floating-point values in a way suitable for sorting?

 You cannot sort NaNs. A comparison with NaN must always 
 evaluate to false.

 One could change isSorted to check for false instead of true, 
 so it would accept NaN. That would probably break too much 
 code, though.

  arr[n] < arr[n+1]  => !(arr[n+1] < arr[n])

 But that is exactly what it does.

 https://github.com/D-Programming-Language/phobos/blob/8372da44f66f20c67b69b4b8fb39233ce42d493c/std/algorithm.d#L10021

 And that is the reason the second line "assert(arr.isSorted);" 
 passes, while it should fail as the array is clearly not 
 sorted. Instead modifying

   !(arr[n+1] < arr[n])  =>  arr[n] <= arr[n+1]

 would make the function work correctly (return false if the 
 range contains nans), but then the template parameter "less" 
 should be replaced by "lessOrEqual". An alternative would be to 
 check for nans explicitly.

Actually, you could also use "!>=" instead of "<" so that the 
compasison would become:

   !(arr[n+1] !>= arr[n])

This is also correct for floating point numbers. However, it 
might be problematic for user defined types that also implement 
something similar to nan (eg. bigfloat). I could not find any 
documentation on how the unordered comparison operators (<>, 
!<>=, !<=, !<, !>=, !>, !<>) translate into opCmp calls.

Nov 12 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 11/12/2013 1:33 AM, tn wrote:
 I could not find any documentation on how the unordered
 comparison operators (<>, !<>=, !<=, !<, !>=, !>, !<>) translate into opCmp
calls.

That's because they just don't translate to opCmp calls. At one point I was 
going to set up an opExtendedCmp or something like that for those operators,
but 
then we went ahead and deprecated those operators, rendering the idea moot.

Nov 12 2013

"tn" <no email.com> writes:

On Tuesday, 12 November 2013 at 09:40:52 UTC, Walter Bright wrote:
 On 11/12/2013 1:33 AM, tn wrote:
 I could not find any documentation on how the unordered
 comparison operators (<>, !<>=, !<=, !<, !>=, !>, !<>) 
 translate into opCmp calls.

 That's because they just don't translate to opCmp calls.

Well, they seem to translate to something, because this works as 
expected:

   struct S {
       private double v;
       auto opCmp(S rhs) { return v - rhs.v; }
   }

   S v1 = S(1);
   S v2 = S(2);
   S vn = S(double.nan);
   assert((v2 < v1) == false);
   assert((v1 < v2) == true);
   assert((v1 < v1) == false);
   assert((v1 < vn) == false);
   assert((v2 !>= v1) == false);
   assert((v1 !>= v2) == true);
   assert((v1 !>= v1) == false);
   assert((v1 !>= vn) == true);


 but then we went ahead and deprecated those operators

Maybe the spec then needs to be updated:

http://dlang.org/expression.html#RelExpression
http://dlang.org/expression.html#floating_point_comparisons

Nov 12 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, November 12, 2013 10:33:26 tn wrote:
  I could not find any
 documentation on how the unordered comparison operators (<>,
 !<>=, !<=, !<, !>=, !>, !<>) translate into opCmp calls.

As I understand it, they only work with the built-in floating point types and 
are supposed to be deprecated. So, writing new code which uses them - 
particularly generic code - wouldn't make sense.

- Jonathan M Davis

Nov 12 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/12/13 12:54 AM, tn wrote:
 On Tuesday, 12 November 2013 at 07:33:57 UTC, qznc wrote:
 On Tuesday, 12 November 2013 at 04:41:56 UTC, Vladimir Panteleev wrote:
 double[] arr = [1, 2, 3, double.nan, 1, 2];
 assert(arr.isSorted);
 arr.sort();

 This code will fail with an assert error, but not the one on the
 second line. Rather, it will fail inside std.range, when sort calls
 assumeSorted, which checks elements in an order different than sort
 and isSorted.

 Here's a case where the odd way NaN interacts with generic code
 messes things up in an ugly way.

 This is concerning. It's very easy to overlook this while writing an
 application, and it can become a hidden vulnerability.

 We can't fix the operators... but, perhaps, we could define a safe
 default comparison predicate (e.g. std.algorithm.less) for use with
 sorting and related tasks? Aside from bitwise comparison, is it even
 possible to efficiently compare floating-point values in a way
 suitable for sorting?

 You cannot sort NaNs. A comparison with NaN must always evaluate to
 false.

 One could change isSorted to check for false instead of true, so it
 would accept NaN. That would probably break too much code, though.

   arr[n] < arr[n+1]  => !(arr[n+1] < arr[n])

 But that is exactly what it does.

 https://github.com/D-Programming-Language/phobos/blob/8372da44f66f20c67b69b4b8fb39233ce42d493c/std/algorithm.d#L10021


 And that is the reason the second line "assert(arr.isSorted);" passes,
 while it should fail as the array is clearly not sorted. Instead modifying

    !(arr[n+1] < arr[n])  =>  arr[n] <= arr[n+1]

 would make the function work correctly (return false if the range
 contains nans), but then the template parameter "less" should be
 replaced by "lessOrEqual". An alternative would be to check for nans
 explicitly.

I think NaNs are singular unordered values that make invalid inputs for 
either sort or isSorted. We should simply add an assert to isSorted.

Andrei

Nov 12 2013

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tuesday, 12 November 2013 at 15:56:25 UTC, Andrei Alexandrescu 
wrote:
 I think NaNs are singular unordered values that make invalid 
 inputs for either sort or isSorted. We should simply add an 
 assert to isSorted.

Considering their effect on the program state, NaNs can be seen 
as invalid input, and asserts are not suitable for input 
validation.

Nov 12 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/12/13 8:05 AM, Vladimir Panteleev wrote:
 On Tuesday, 12 November 2013 at 15:56:25 UTC, Andrei Alexandrescu wrote:
 I think NaNs are singular unordered values that make invalid inputs
 for either sort or isSorted. We should simply add an assert to isSorted.

 Considering their effect on the program state, NaNs can be seen as
 invalid input, and asserts are not suitable for input validation.

I should know, since I dedicated a chapter to the topic in TDPL :o). 
It's all a matter of boundaries, i.e. what you consider "input" and what 
you consider "precondition". When user code reads floating point data 
off of disk, they should do real validation. For isSorted we could say 
that the output is only defined for inputs that are comparable. Or 
indeed we could specify that isSorted throws if unordered data is 
presented, and verify everything at runtime, at a cost in efficiency. 
It's a judgment call, not a cut and dried decision. In this case I think 
assert is the better call.

Andrei

Nov 12 2013

"tn" <no email.com> writes:

On Tuesday, 12 November 2013 at 15:56:25 UTC, Andrei Alexandrescu 
wrote:
 On 11/12/13 12:54 AM, tn wrote:
 ...
 An alternative would be to check for nans explicitly.

 I think NaNs are singular unordered values that make invalid 
 inputs for either sort or isSorted. We should simply add an 
 assert to isSorted.

But sort and isSorted both allow user to provide a custom "less" 
function. What if the user needs NaNs and, for example, wants to 
sort them before all the other values? Then he calls 
"arr.sort!((a,b) => a < b || (isnan(a) && !isnan(b)))" and the 
code asserts?

Nov 12 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/12/13 10:31 AM, tn wrote:
 On Tuesday, 12 November 2013 at 15:56:25 UTC, Andrei Alexandrescu wrote:
 On 11/12/13 12:54 AM, tn wrote:
 ...
 An alternative would be to check for nans explicitly.

 I think NaNs are singular unordered values that make invalid inputs
 for either sort or isSorted. We should simply add an assert to isSorted.

 But sort and isSorted both allow user to provide a custom "less"
 function. What if the user needs NaNs and, for example, wants to sort
 them before all the other values? Then he calls "arr.sort!((a,b) => a <
 b || (isnan(a) && !isnan(b)))" and the code asserts?

I think this is a misunderstanding. I never advocated inserting 
specialized asserts for floating-point numbers. http://goo.gl/3dGGkf 
takes care of that the right way - by making sure that the less 
predicate is sensible for the kind of work isSorted and sort do. That is 
regardless of the nature of the objects being sorted.

Andrei

Nov 12 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, November 12, 2013 05:41:55 Vladimir Panteleev wrote:
 double[] arr = [1, 2, 3, double.nan, 1, 2];
 assert(arr.isSorted);
 arr.sort();
 
 This code will fail with an assert error, but not the one on the
 second line. Rather, it will fail inside std.range, when sort
 calls assumeSorted, which checks elements in an order different
 than sort and isSorted.
 
 Here's a case where the odd way NaN interacts with generic code
 messes things up in an ugly way.
 
 This is concerning. It's very easy to overlook this while writing
 an application, and it can become a hidden vulnerability.
 
 We can't fix the operators... but, perhaps, we could define a
 safe default comparison predicate (e.g. std.algorithm.less) for
 use with sorting and related tasks? Aside from bitwise
 comparison, is it even possible to efficiently compare
 floating-point values in a way suitable for sorting?

The check could be special-cased for floating point values so that it takes NaN 
into account, but NaN pretty much screws with everything. Once it's there, 
pretty much anything involving math or comparisons is going to go bad. So, you 
could argue that it's a bug in the program if sort is passed a NaN and that 
the assertion in sort is therefore perfectly valid. You can't really sort 
anything with NaN in it anyway.

- Jonathan M Davis

Nov 12 2013

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tuesday, 12 November 2013 at 08:10:16 UTC, Jonathan M Davis 
wrote:
 On Tuesday, November 12, 2013 05:41:55 Vladimir Panteleev wrote:
 double[] arr = [1, 2, 3, double.nan, 1, 2];
 assert(arr.isSorted);
 arr.sort();
 
 This code will fail with an assert error, but not the one on 
 the
 second line. Rather, it will fail inside std.range, when sort
 calls assumeSorted, which checks elements in an order different
 than sort and isSorted.
 
 Here's a case where the odd way NaN interacts with generic code
 messes things up in an ugly way.
 
 This is concerning. It's very easy to overlook this while 
 writing
 an application, and it can become a hidden vulnerability.
 
 We can't fix the operators... but, perhaps, we could define a
 safe default comparison predicate (e.g. std.algorithm.less) for
 use with sorting and related tasks? Aside from bitwise
 comparison, is it even possible to efficiently compare
 floating-point values in a way suitable for sorting?

 The check could be special-cased for floating point values so 
 that it takes NaN
 into account, but NaN pretty much screws with everything. Once 
 it's there,
 pretty much anything involving math or comparisons is going to 
 go bad. So, you
 could argue that it's a bug in the program if sort is passed a 
 NaN and that
 the assertion in sort is therefore perfectly valid. You can't 
 really sort
 anything with NaN in it anyway.

Here is the problem:

1) NaN values are accepted as input almost anywhere where doubles 
are. std.conv recognizes the string "nan" and converts it to 
double.nan,
2) It is unreasonable to expect every D user out there dealing 
with floating point numbers to be forced to check every source of 
floating point values in their program against NaNs.
3) If the program is compiled in debug mode, it will crash 
quasi-randomly, since the assumeSorted assert does not check 
every pair.

For these reasons, this problem can become a sort of time bomb in 
their application: even with 100% test coverage, one NaN in the 
wrong place inputted by a malicious end user can cause a 
situation the program's authors have not foreseen.

It can be argued that it is a flaw in the language in that it 
allowed such a situation to arise in the first place.

P.S. Please enable RFC 2646 (format=flowed) support in your email 
client. It is emitting unreflowable text, which causes jagged 
quotes as seen above.

Nov 12 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/12/13 8:00 AM, Vladimir Panteleev wrote:
 1) NaN values are accepted as input almost anywhere where doubles are.
 std.conv recognizes the string "nan" and converts it to double.nan,

They are still singular values. Code has no business comparing them 
unwittingly.

 2) It is unreasonable to expect every D user out there dealing with
 floating point numbers to be forced to check every source of floating
 point values in their program against NaNs.

I find that perfectly reasonable. How do you mean that?

 3) If the program is compiled in debug mode, it will crash
 quasi-randomly, since the assumeSorted assert does not check every pair.

It should always crash. As I said: insert an assert in assumeSorted and 
be done with it.

 For these reasons, this problem can become a sort of time bomb in their
 application: even with 100% test coverage, one NaN in the wrong place
 inputted by a malicious end user can cause a situation the program's
 authors have not foreseen.

 It can be argued that it is a flaw in the language in that it allowed
 such a situation to arise in the first place.

The language allows NaN floating point number with the semantics of IEEE 
754. We hardly have any leeway in the matter unless we want to irate a 
lot of people for no good reason.

I don't understand your entire point here. I agree with something like 
"NaNs are a nuisance." We need to fix the corresponding bugs in 
assumeSorted, isSorted etc. by inserting sanity checks such as:

if (a[i] < a[i + 1]) { assert(!(a[i + 1] < a[i])); ... }
else { assert(a[i + 1] >= a[i]); ... }

etc.


Andrei

Nov 12 2013

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tuesday, 12 November 2013 at 16:48:01 UTC, Andrei Alexandrescu 
wrote:
 On 11/12/13 8:00 AM, Vladimir Panteleev wrote:
 1) NaN values are accepted as input almost anywhere where 
 doubles are.
 std.conv recognizes the string "nan" and converts it to 
 double.nan,

 They are still singular values. Code has no business comparing 
 them unwittingly.

 2) It is unreasonable to expect every D user out there dealing 
 with
 floating point numbers to be forced to check every source of 
 floating
 point values in their program against NaNs.

 I find that perfectly reasonable. How do you mean that?

Both of the above hinge on the relative obscurity of NaNs and the 
problems they could cause. People who are not specifically 
familiar with NaNs and how they interact with other 
floating-point values will treat floating-point values as "just 
numbers", and expect them to compare and sort in the same way as 
integers.

 The language allows NaN floating point number with the 
 semantics of IEEE 754. We hardly have any leeway in the matter 
 unless we want to irate a lot of people for no good reason.

 It's a judgment call, not a cut and dried decision.

Agreed - however, I think there might be more than one way to 
deal with this.

1) As above, introduce a "less" function which guarantees 
transitivity for basic types, and use it in examples throughout.

2) Improve documentation and visibility of this problem. For 
example, add this to the documentation of sort:

"The predicate must be transitive (`a<b && b<c` implies `a<c`) in 
order for `sort` to behave properly - otherwise, the program may 
fail on certain inputs (but not others) when not compiled in 
release mode. Note that `a<b` is not transitive for floating 
points, because the expression will always be `false` when either 
`a` or `b` is `NaN`."

 We should simply add an assert to isSorted.

How would this be implemented, anyway? I stumbled upon this 
problem with this code:

enum sortPred = q{a.hits > b.hits || (a.hits == b.hits && a.value 
< b.value)};
all.sort!sortPred();

(I know about multiSort, but it currently seems to be buggy - I 
do have a test case, but it has a few million entries and I need 
to reduce it.)

 if (a[i] < a[i + 1]) { assert(!(a[i + 1] < a[i])); ... }
 else { assert(a[i + 1] >= a[i]); ... }

Sort et al don't know how to check "a>=b"... and they can't use 
"a<b" either, because !(a<b) && !(b<a) implies a==b.

Nov 12 2013

"Bigsandwich" <bigsandwich gmail.com> writes:

 Both of the above hinge on the relative obscurity of NaNs and 
 the problems they could cause. People who are not specifically 
 familiar with NaNs and how they interact with other 
 floating-point values will treat floating-point values as "just 
 numbers", and expect them to compare and sort in the same way 
 as integers.

Please, please, please just no.  As someone who works with 
floating point daily, you cannot idiot proof it like this.  They 
will never behave like "just numbers" and you can't make them.

Even leaving out NAN, you have issues with precision, accumulated 
error, denormals, equality comparison, and on and on.

If you don't know what you are doing with them, then you just 
shouldn't be touching them.  Unicode has similar issues.

Nov 12 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 11/12/2013 10:25 AM, Bigsandwich wrote:
 Please, please, please just no.  As someone who works with floating point
daily,
 you cannot idiot proof it like this.  They will never behave like "just
numbers"
 and you can't make them.

 Even leaving out NAN, you have issues with precision, accumulated error,
 denormals, equality comparison, and on and on.

 If you don't know what you are doing with them, then you just shouldn't be
 touching them.  Unicode has similar issues.

I think it's apropos to reference this famous document:

"What Every Computer Scientist Should Know About Floating-Point Arithmetic"

http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

I know, I know, RTFM, but one cannot ignore this stuff and write professional 
quality fp code, nor is it practical to have the language ignore it for you.

Nov 12 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 11/12/2013 9:59 AM, Vladimir Panteleev wrote:
 Both of the above hinge on the relative obscurity of NaNs and the problems they
 could cause. People who are not specifically familiar with NaNs and how they
 interact with other floating-point values will treat floating-point values as
 "just numbers", and expect them to compare and sort in the same way as
integers.

NaNs are valid values for floating point numbers. Trying to pretend they don't 
exist is a doomed strategy for programmers. I regard this as analogous to the 
regular proposals to hide the fact that char[] are sequences of unicode code 
points, attempts to pretend that integers don't overflow, etc.

Floating point math has some strange characteristics (NaNs are only one such), 
and anyone writing a non-toy fp app needs to learn that stuff. There's no other
way.

Nov 12 2013

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tuesday, 12 November 2013 at 18:31:40 UTC, Walter Bright wrote:
 NaNs are valid values for floating point numbers. Trying to 
 pretend they don't exist is a doomed strategy for programmers. 
 I regard this as analogous to the regular proposals to hide the 
 fact that char[] are sequences of unicode code points, attempts 
 to pretend that integers don't overflow, etc.

 Floating point math has some strange characteristics (NaNs are 
 only one such), and anyone writing a non-toy fp app needs to 
 learn that stuff. There's no other way.

On Tuesday, 12 November 2013 at 18:25:10 UTC, Bigsandwich wrote:
 Please, please, please just no.  As someone who works with 
 floating point daily, you cannot idiot proof it like this.  
 They will never behave like "just numbers" and you can't make 
 them.

 Even leaving out NAN, you have issues with precision, 
 accumulated error, denormals, equality comparison, and on and 
 on.

That doesn't change the reality that not everyone is aware of 
these issues, and that even people who are aware of them may 
overlook something somewhere. We can't pretend that there is no 
problem just because it's something you "have to know" and "have 
to be careful about" - if we can improve the situation without 
making a compromise elsewhere, then it must be done.

I don't see this stance as any different than shouting "RTFM!!!" 
at anyone who makes a mistake which, in theory, could be avoided 
by studying, memorization and careful application of the 
appropriate documentation located ... somewhere. This approach 
does not work - even if that user has learned his lesson the hard 
way, more new users will make the same mistake. Instead, you must 
actually make the problem less likely to manifest - improve your 
documentation, or improve the design of your system such that 
such errors will be less likely to occur.

This particular situation affects only one specific property of 
floating-point numbers, so other properties are not relevant to 
this discussion. Personally, I've been taught about 
floating-point precision issues back in school, but I've only 
learned about NaNs while learning D. Call me an unlearned idiot 
if that makes you feel better, but it is rather likely that there 
will be more users making the same mistake in the future if the 
situation is not addressed.

Furthermore, the problem is aggravated by that it is hidden by 
generic code. This thread isn't a complaint that "the line `if 
(a<b)` in my program behaves in a funny way" - there are no 
explicit floating-point comparisons in the example in the 
original post. We are calling sort, a function in the standard 
library, with its default predicate, as specified in the standard 
library, using a built-in type, which is part of the language, 
and it fails its own assertion on certain input. This is a 
problem!

I have suggested possible improvements earlier in this thread, so 
I'd like to ask you to comment on those rather than hand-waving 
the problem away.

Nov 12 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/12/13 11:06 AM, Vladimir Panteleev wrote:
 That doesn't change the reality that not everyone is aware of these
 issues, and that even people who are aware of them may overlook
 something somewhere. We can't pretend that there is no problem just
 because it's something you "have to know" and "have to be careful about"
 - if we can improve the situation without making a compromise elsewhere,
 then it must be done.

I agree. I'll just add that people who plan to get work done with 
floating point will inevitably stumble upon NaN. There's no two ways 
about it.

 I don't see this stance as any different than shouting "RTFM!!!" at
 anyone who makes a mistake which, in theory, could be avoided by
 studying, memorization and careful application of the appropriate
 documentation located ... somewhere. This approach does not work - even
 if that user has learned his lesson the hard way, more new users will
 make the same mistake. Instead, you must actually make the problem less
 likely to manifest - improve your documentation, or improve the design
 of your system such that such errors will be less likely to occur.

I agree here, too. Again, I'll add that "improving the documentation" 
helps little unless RTFM is at work :o).

 This particular situation affects only one specific property of
 floating-point numbers, so other properties are not relevant to this
 discussion. Personally, I've been taught about floating-point precision
 issues back in school, but I've only learned about NaNs while learning
 D. Call me an unlearned idiot if that makes you feel better, but it is
 rather likely that there will be more users making the same mistake in
 the future if the situation is not addressed.

You're young. I guarantee you were to hit NaN whether or not D had 
anything to do with it.

 Furthermore, the problem is aggravated by that it is hidden by generic
 code. This thread isn't a complaint that "the line `if (a<b)` in my
 program behaves in a funny way" - there are no explicit floating-point
 comparisons in the example in the original post. We are calling sort, a
 function in the standard library, with its default predicate, as
 specified in the standard library, using a built-in type, which is part
 of the language, and it fails its own assertion on certain input. This
 is a problem!

Yes.

 I have suggested possible improvements earlier in this thread, so I'd
 like to ask you to comment on those rather than hand-waving the problem
 away.

I think your solutions are not good. I also think my solution in 
http://goo.gl/3dGGkf is good. (I know it sounds awful, but I hope you'll 
appreciate the brevity :o).) The solution I propose explains exactly how 
NaN messes up the ordering comparison operator, in a way that doesn't 
make the tests refer to NaN or floating point numbers in particular.


Andrei

Nov 12 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 11/12/2013 12:24 PM, Andrei Alexandrescu wrote:
 You're young. I guarantee you were to hit NaN whether or not D had anything to
 do with it.

I want to add that NaN has been a reality on x86 machines since about 1983. C 
(and C++) compilers for decades have utterly ignored its existence - but that 
didn't make it go away. It just meant that things like the fp functions in the 
Standard library exhibited undefined behavior with NaNs, which is far worse
than 
at least having sensible documented behavior.

The C compilers I wrote (and C++) always made an effort to handle NaN correctly 
(and overflows, underflows, and subnormals), and I often felt that I was the 
only one who cared about it :-)

What D does is simply recognize that NaNs are an inevitable characteristic of 
the floating point hardware, and deal with them in the most straightforward 
manner practical. Deciding and documenting what .sort should do with NaNs is 
part of that. Trying to pretend that NaNs aren't a fact of life with IEEE 
floating point hardware is not going to work.

Note:



and how the behavior with NaN arguments is all carefully documented.

Nov 12 2013

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tuesday, 12 November 2013 at 20:24:11 UTC, Andrei Alexandrescu 
wrote:
 Again, I'll add that "improving the documentation" helps little 
 unless RTFM is at work :o).

The distinction between good and bad documentation in this 
context is whether the respective information is discoverable. 
That is, don't make users ask "How should I have known that?". A 
warning on the documentation of std.algorithm.sort is good, but 
an article on floating-points merely included together with the 
documentation, and not referenced from anywhere, isn't.

 I think your solutions are not good. I also think my solution 
 in http://goo.gl/3dGGkf is good. (I know it sounds awful, but I 
 hope you'll appreciate the brevity :o).)

Brevity appreciated :D

Nov 12 2013

"tn" <no email.com> writes:

On Tuesday, 12 November 2013 at 17:59:37 UTC, Vladimir Panteleev 
wrote:
 1) As above, introduce a "less" function which guarantees 
 transitivity for basic types, and use it in examples throughout.

 2) Improve documentation and visibility of this problem. For 
 example, add this to the documentation of sort:

 "The predicate must be transitive (`a<b && b<c` implies `a<c`) 
 in order for `sort` to behave properly - otherwise, the program 
 may fail on certain inputs (but not others) when not compiled 
 in release mode. Note that `a<b` is not transitive for floating 
 points, because the expression will always be `false` when 
 either `a` or `b` is `NaN`."

But 'a<b' is transitive for floating points. The transitivity 
condition (`a<b && b<c` implies `a<c`) says nothing about the 
cases where a and b (or b and c) are incomparable. Transitivity 
is not the problem

 Sort et al don't know how to check "a>=b"... and they can't use 
 "a<b" either, because !(a<b) && !(b<a) implies a==b.

This is the problem, that is, the fact that with '<' you can't 
tell the difference between equal values and incomparable values 
(one or both are NaNs). On the other hand, with '<=' you can. See:

  (a<b) && !(b<a)  -->  a<b
!(a<b) &&  (b<a)  -->  b<a
  (a<b) &&  (b<a)  -->  (not possible)
!(a<b) && !(b<a)  -->  a==b OR incomparable

but

  (a<=b) && !(b<=a)  -->  a<b
!(a<=b) &&  (b<=a)  -->  b<a
  (a<=b) &&  (b<=a)  -->  a==b
!(a<=b) && !(b<=a)  -->  incomparable

Thus, the correct solution would be to modify the functions to 
take "lessOrEqual" as a template parameter instead of "less".

If "less" is kept, then we need another template argument to 
separate equality and incomparability (something like 
"equals(a,b)", or "incomparable(a,b)").

Nov 12 2013

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tuesday, 12 November 2013 at 19:26:12 UTC, tn wrote:
 But 'a<b' is transitive for floating points. The transitivity 
 condition (`a<b && b<c` implies `a<c`) says nothing about the 
 cases where a and b (or b and c) are incomparable. Transitivity 
 is not the problem

Whoops, you're right. I meant transitivity of the negated 
expression: !(a>b) && !(b>c) implies !(a>c). This fails for [3, 
NaN, 1].

 Sort et al don't know how to check "a>=b"... and they can't 
 use "a<b" either, because !(a<b) && !(b<a) implies a==b.

 This is the problem, that is, the fact that with '<' you can't 
 tell the difference between equal values and incomparable 
 values (one or both are NaNs). On the other hand, with '<=' you 
 can. See:

  (a<b) && !(b<a)  -->  a<b
 !(a<b) &&  (b<a)  -->  b<a
  (a<b) &&  (b<a)  -->  (not possible)
 !(a<b) && !(b<a)  -->  a==b OR incomparable

 but

  (a<=b) && !(b<=a)  -->  a<b
 !(a<=b) &&  (b<=a)  -->  b<a
  (a<=b) &&  (b<=a)  -->  a==b
 !(a<=b) && !(b<=a)  -->  incomparable

 Thus, the correct solution would be to modify the functions to 
 take "lessOrEqual" as a template parameter instead of "less".

Too late for that now... just like it's too late to change IEEE 
comparison logic.

 If "less" is kept, then we need another template argument to 
 separate equality and incomparability (something like 
 "equals(a,b)", or "incomparable(a,b)").

Is this to allow generic partial ordering (rather than to fix 
sorting NaNs specifically)? I remember seeing a thread about 
partial ordering and opCmp:
http://forum.dlang.org/post/dpfbtu$1ocf$1 digitaldaemon.com

Nov 12 2013

"tn" <no email.com> writes:

On Tuesday, 12 November 2013 at 19:39:19 UTC, Vladimir Panteleev 
wrote:
 On Tuesday, 12 November 2013 at 19:26:12 UTC, tn wrote:
 Thus, the correct solution would be to modify the functions to 
 take "lessOrEqual" as a template parameter instead of "less".

 Too late for that now... just like it's too late to change IEEE 
 comparison logic.

Yes, I was expecting this. :)

 If "less" is kept, then we need another template argument to 
 separate equality and incomparability (something like 
 "equals(a,b)", or "incomparable(a,b)").

 Is this to allow generic partial ordering (rather than to fix 
 sorting NaNs specifically)? I remember seeing a thread about 
 partial ordering and opCmp:
 http://forum.dlang.org/post/dpfbtu$1ocf$1 digitaldaemon.com

I had generic partial orderings in my mind too, but I think it is 
also needed for user defined floating point types such as 
bigfloat. (Unless "isnan" function is defined for the bigfloat 
type too and detected automatically.)

Nov 12 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/12/13 9:59 AM, Vladimir Panteleev wrote:
 On Tuesday, 12 November 2013 at 16:48:01 UTC, Andrei Alexandrescu wrote:
 On 11/12/13 8:00 AM, Vladimir Panteleev wrote:
 1) NaN values are accepted as input almost anywhere where doubles are.
 std.conv recognizes the string "nan" and converts it to double.nan,

 They are still singular values. Code has no business comparing them
 unwittingly.

 2) It is unreasonable to expect every D user out there dealing with
 floating point numbers to be forced to check every source of floating
 point values in their program against NaNs.

 I find that perfectly reasonable. How do you mean that?

 Both of the above hinge on the relative obscurity of NaNs and the
 problems they could cause. People who are not specifically familiar with
 NaNs and how they interact with other floating-point values will treat
 floating-point values as "just numbers", and expect them to compare and
 sort in the same way as integers.

I think that's their problem, not ours.

 The language allows NaN floating point number with the semantics of
 IEEE 754. We hardly have any leeway in the matter unless we want to
 irate a lot of people for no good reason.

 It's a judgment call, not a cut and dried decision.

 Agreed - however, I think there might be more than one way to deal with
 this.

 1) As above, introduce a "less" function which guarantees transitivity
 for basic types, and use it in examples throughout.

This is a bit much.

 2) Improve documentation and visibility of this problem. For example,
 add this to the documentation of sort:

 "The predicate must be transitive (`a<b && b<c` implies `a<c`) in order
 for `sort` to behave properly - otherwise, the program may fail on
 certain inputs (but not others) when not compiled in release mode. Note
 that `a<b` is not transitive for floating points, because the expression
 will always be `false` when either `a` or `b` is `NaN`."

Great idea, just file an enh request or just write the pull request.

 We should simply add an assert to isSorted.

 How would this be implemented, anyway? I stumbled upon this problem with
 this code:

 enum sortPred = q{a.hits > b.hits || (a.hits == b.hits && a.value <
 b.value)};
 all.sort!sortPred();

 (I know about multiSort, but it currently seems to be buggy - I do have
 a test case, but it has a few million entries and I need to reduce it.)

 if (a[i] < a[i + 1]) { assert(!(a[i + 1] < a[i])); ... }
 else { assert(a[i + 1] >= a[i]); ... }

 Sort et al don't know how to check "a>=b"... and they can't use "a<b"
 either, because !(a<b) && !(b<a) implies a==b.

It really implies "a and b are in the same equivalence class". Building 
on that notion, we can figure what's wrong with NaNs and how to assert 
against their failings.

Consider we define equiv(a, b) as (a, b) => !less(a, b) && !less(b, a).

If at least one of the numbers is NaN, all comparisons return false. 
That puts NaN in the same equivalence class with all numbers, including 
numbers that are deemed in distinct equivalence classes. That's where 
NaNs mess things up, and that's what we need to assert. Consider three 
numbers a, b, and c. Then the following must pass:

assert((equiv(a, b) && equiv(b, c)) <= equiv(a, c));

("<=" on Booleans is actually implication.)

That test will fail with NaNs and should be part of isSorted, sort etc.

We should also check such as:

assert(!less(a, a));
assert(less(a, b) <= !less(b, a));

These should be checked in sort only; isSorted does not need these.


Andrei

Nov 12 2013

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tuesday, 12 November 2013 at 20:03:37 UTC, Andrei Alexandrescu 
wrote:
 Both of the above hinge on the relative obscurity of NaNs and 
 the
 problems they could cause. People who are not specifically 
 familiar with
 NaNs and how they interact with other floating-point values 
 will treat
 floating-point values as "just numbers", and expect them to 
 compare and
 sort in the same way as integers.

 I think that's their problem, not ours.

(I partially disagree here - see my reply to Walter.)

 Sort et al don't know how to check "a>=b"... and they can't 
 use "a<b"
 either, because !(a<b) && !(b<a) implies a==b.

 It really implies "a and b are in the same equivalence class". 
 Building on that notion, we can figure what's wrong with NaNs 
 and how to assert against their failings.

 Consider we define equiv(a, b) as (a, b) => !less(a, b) && 
 !less(b, a).

 If at least one of the numbers is NaN, all comparisons return 
 false. That puts NaN in the same equivalence class with all 
 numbers, including numbers that are deemed in distinct 
 equivalence classes. That's where NaNs mess things up, and 
 that's what we need to assert. Consider three numbers a, b, and 
 c. Then the following must pass:

 assert((equiv(a, b) && equiv(b, c)) <= equiv(a, c));

OK, we're on the same page so far (although you've presented the 
problem more eloquently).

 ("<=" on Booleans is actually implication.)

(Cool!)

 That test will fail with NaNs and should be part of isSorted, 
 sort etc.

OK, but which a, b and c will be checked? Taking all adjacent 
triples will not work with two adjacent NaNs.

 We should also check such as:

 assert(!less(a, a));
 assert(less(a, b) <= !less(b, a));

Again, for which a/b?

Nov 12 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/12/13 12:45 PM, Vladimir Panteleev wrote:
 assert((equiv(a, b) && equiv(b, c)) <= equiv(a, c));

 OK, we're on the same page so far (although you've presented the problem
 more eloquently).

 ("<=" on Booleans is actually implication.)

 (Cool!)

(... the disadvantage being b is evaluated even if a is false. 
Shouldn't; false implies everything.)

 That test will fail with NaNs and should be part of isSorted, sort etc.

 OK, but which a, b and c will be checked? Taking all adjacent triples
 will not work with two adjacent NaNs.

We can make that work if we insert the tests at the end of a run of 
equivalent values. That would still miss other cases though in isSorted 
(I think sort() can actually be much more thorough there). The point 
here is that we should work together on an innovative solution. Getting 
bogged in the "there's a problem with the language here" mindset 
prevents the forming of good ideas.

 We should also check such as:

 assert(!less(a, a));
 assert(less(a, b) <= !less(b, a));

 Again, for which a/b?

In the isSorted case, for all adjacent values inspected. For sort, 
assertions can be added after each less() that does work.


Andrei

Nov 12 2013

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tuesday, 12 November 2013 at 21:15:39 UTC, Andrei Alexandrescu 
wrote:
 We can make that work if we insert the tests at the end of a 
 run of equivalent values.

I've thought about this as well, but then there are cases like...

[4, 5, NaN, 3, NaN, 5, 6]

"5, NaN, 3, NaN, 5" is a run of equivalent values as long as 
neighboring pairs are concerned.

 The point here is that we should work together on an innovative 
 solution. Getting bogged in the "there's a problem with the 
 language here" mindset prevents the forming of good ideas.

Agreed.

Nov 12 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Tuesday, 12 November 2013 at 22:17:10 UTC, Vladimir Panteleev
wrote:
 On Tuesday, 12 November 2013 at 21:15:39 UTC, Andrei 
 Alexandrescu wrote:
 We can make that work if we insert the tests at the end of a 
 run of equivalent values.

 I've thought about this as well, but then there are cases 
 like...

 [4, 5, NaN, 3, NaN, 5, 6]

 "5, NaN, 3, NaN, 5" is a run of equivalent values as long as 
 neighboring pairs are concerned.

 The point here is that we should work together on an 
 innovative solution. Getting bogged in the "there's a problem 
 with the language here" mindset prevents the forming of good 
 ideas.

 Agreed.

I may be late to the party, but I think it might be a bit unfair
for sort to be able to reliably "catch" this issue.

Heck, if you call *sort* with NaN, how should it even behave?
Error? Exception?

With AssumeSorted, I think it should throw an Error, if it can
*reliably* do so.

--------

That said, we may have been looking at this problem the wrong
way. Instead of trying to verify the comparison inside the sort,
why not do the check in the comparator itself? It is the
comparator that should "know" how to deal with NaN: Either it
considers it bigger/smaller than everything, and sorts
accordingly, or doesn't know how to deal with it, and throws an
Error/Exception, depending on the users' wishes.

We could make sort/assumeSorted default comparator simply assert
on NaN: This should be able to keep the handling of the NaN
relatively customizable, and cheap in release, without having to
hack into sort/assumeSorted, to do some weird bidirectional
checks.

This keeps the default fast and safe, and if users want more
speed/customization, they can do whatever the heck they like.

My 0.02$

Nov 13 2013

"tn" <no email.com> writes:

On Tuesday, 12 November 2013 at 20:03:37 UTC, Andrei Alexandrescu 
wrote:
 Consider we define equiv(a, b) as (a, b) => !less(a, b) && 
 !less(b, a).

 If at least one of the numbers is NaN, all comparisons return 
 false. That puts NaN in the same equivalence class with all 
 numbers, including numbers that are deemed in distinct 
 equivalence classes. That's where NaNs mess things up, and 
 that's what we need to assert. Consider three numbers a, b, and 
 c. Then the following must pass:

 assert((equiv(a, b) && equiv(b, c)) <= equiv(a, c));

 ("<=" on Booleans is actually implication.)

Shouldn't the implication be to the other direction? Then it 
becomes

assert((equiv(a, b) && equiv(b, c)) => equiv(a, c));

 That test will fail with NaNs and should be part of isSorted, 
 sort etc.

But it requires that the input contains at least three elements. 
And it has to test a lot of possible triples (a,b,c), which I 
think requires at least O(n^2) time. And it does not fail 
consistently whenever the input contains at least one NaN 
(consider an input that contains only NaNs).

 We should also check such as:

 assert(!less(a, a));
 assert(less(a, b) <= !less(b, a));

 These should be checked in sort only; isSorted does not need 
 these.

The latter fails also if a==b. (Unless the direction of the 
implication is again wrong.)

Nov 12 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/12/13 1:03 PM, tn wrote:
 On Tuesday, 12 November 2013 at 20:03:37 UTC, Andrei Alexandrescu wrote:
 Consider we define equiv(a, b) as (a, b) => !less(a, b) && !less(b, a).

 If at least one of the numbers is NaN, all comparisons return false.
 That puts NaN in the same equivalence class with all numbers,
 including numbers that are deemed in distinct equivalence classes.
 That's where NaNs mess things up, and that's what we need to assert.
 Consider three numbers a, b, and c. Then the following must pass:

 assert((equiv(a, b) && equiv(b, c)) <= equiv(a, c));

 ("<=" on Booleans is actually implication.)

 Shouldn't the implication be to the other direction? Then it becomes

 assert((equiv(a, b) && equiv(b, c)) => equiv(a, c));

It is in the other direction, but the latter doesn't compile :o). Again, 
it just turns out "a <= b" has the meaning of "a implies b". It's not a 
particularly intuitive way to express it.

 That test will fail with NaNs and should be part of isSorted, sort etc.

 But it requires that the input contains at least three elements. And it
 has to test a lot of possible triples (a,b,c), which I think requires at
 least O(n^2) time. And it does not fail consistently whenever the input
 contains at least one NaN (consider an input that contains only NaNs).

I agree. Ideas?

 We should also check such as:

 assert(!less(a, a));
 assert(less(a, b) <= !less(b, a));

 These should be checked in sort only; isSorted does not need these.

 The latter fails also if a==b. (Unless the direction of the implication
 is again wrong.)

Try it!


Andrei

Nov 12 2013

"tn" <no email.com> writes:

On Tuesday, 12 November 2013 at 21:07:48 UTC, Andrei Alexandrescu 
wrote:
 On 11/12/13 1:03 PM, tn wrote:
 On Tuesday, 12 November 2013 at 20:03:37 UTC, Andrei 
 Alexandrescu wrote:
 Consider we define equiv(a, b) as (a, b) => !less(a, b) && 
 !less(b, a).

 If at least one of the numbers is NaN, all comparisons return 
 false.
 That puts NaN in the same equivalence class with all numbers,
 including numbers that are deemed in distinct equivalence 
 classes.
 That's where NaNs mess things up, and that's what we need to 
 assert.
 Consider three numbers a, b, and c. Then the following must 
 pass:

 assert((equiv(a, b) && equiv(b, c)) <= equiv(a, c));

 ("<=" on Booleans is actually implication.)

 Shouldn't the implication be to the other direction? Then it 
 becomes

 assert((equiv(a, b) && equiv(b, c)) => equiv(a, c));

 It is in the other direction, but the latter doesn't compile 
 :o). Again, it just turns out "a <= b" has the meaning of "a 
 implies b". It's not a particularly intuitive way to express it.

 That test will fail with NaNs and should be part of isSorted, 
 sort etc.

 But it requires that the input contains at least three 
 elements. And it
 has to test a lot of possible triples (a,b,c), which I think 
 requires at
 least O(n^2) time. And it does not fail consistently whenever 
 the input
 contains at least one NaN (consider an input that contains 
 only NaNs).

 I agree. Ideas?

If you want the isSorted function to always fail if the input 
contains NaNs, then I have no other ideas than what I have 
already mentioned. It is just not possible by using only "<" 
comparison. Something more is needed.

The other way would be to define isSorted to return true if and 
only if the _comparable_ elements in the input are sorted (thus 
ignoring possible NaNs in between). This is actually 
implementable with only "<" comparison, but probably requires 
O(n^2) time in the worst case.

Nov 12 2013

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tuesday, 12 November 2013 at 21:03:03 UTC, tn wrote:
 assert((equiv(a, b) && equiv(b, c)) <= equiv(a, c));

 ("<=" on Booleans is actually implication.)

 Shouldn't the implication be to the other direction? Then it 
 becomes

 assert((equiv(a, b) && equiv(b, c)) => equiv(a, c));

<= as in ≤ (less or equal), not ⇐ (reverse implies). <= can be 
used on booleans as "implies" due to the way it treats booleans 
as integers (true as 1, false as 0).

Nov 12 2013

"tn" <no email.com> writes:

On Tuesday, 12 November 2013 at 21:09:34 UTC, Vladimir Panteleev 
wrote:
 On Tuesday, 12 November 2013 at 21:03:03 UTC, tn wrote:
 assert((equiv(a, b) && equiv(b, c)) <= equiv(a, c));

 ("<=" on Booleans is actually implication.)

 Shouldn't the implication be to the other direction? Then it 
 becomes

 assert((equiv(a, b) && equiv(b, c)) => equiv(a, c));

 <= as in ≤ (less or equal), not ⇐ (reverse implies). <= can be 
 used on booleans as "implies" due to the way it treats booleans 
 as integers (true as 1, false as 0).

Thanks for the explanation. I thought it was pseudocode. That is 
a horribly confusing trick.

Nov 12 2013

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tuesday, 12 November 2013 at 20:03:37 UTC, Andrei Alexandrescu 
wrote:
 2) Improve documentation and visibility of this problem. For 
 example,
 add this to the documentation of sort:

 "The predicate must be transitive (`a<b && b<c` implies `a<c`) 
 in order
 for `sort` to behave properly - otherwise, the program may 
 fail on
 certain inputs (but not others) when not compiled in release 
 mode. Note
 that `a<b` is not transitive for floating points, because the 
 expression
 will always be `false` when either `a` or `b` is `NaN`."

 Great idea, just file an enh request or just write the pull 
 request.

https://github.com/D-Programming-Language/phobos/pull/1691

Nov 13 2013

D Programming

C/C++ Programming

Other

digitalmars.D - Sorting floating-point values, and NaN