digitalmars.D - sort API design

Johannes Riecken (31/31) Nov 20 2019 I had looked at how to do case-insensitive string sorting with

mipri (35/42) Nov 20 2019 I remember reading that Ada/SPARK did some checking that could've
Dennis (10/12) Nov 20 2019 Try to make it a habit to use `const` instead of `auto` if you
Dukc (7/14) Nov 20 2019 Theoretically it could do the same thing as `std.algortihm.move`

Jonathan M Davis (7/21) Nov 20 2019 The fact that sort returns a SortedRange is useful for some algorithms

Johannes Riecken (6/38) Nov 21 2019 Thanks, that's some nice ideas in this thread! I should

Jonathan M Davis (48/95) Nov 21 2019 There's nothing erroneous about sorting a range in two different ways

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (6/11) Nov 21 2019 This has come up several times before, here are two of them:
mipri (64/69) Nov 21 2019 I don't disagree, but the problem also wasn't presented as a

Johannes Riecken <johannes.riecken gmail.com> writes:

I had looked at how to do case-insensitive string sorting with 
Phobos and without looking at the documentation much, I went with 
the following code and then wrongly concluded that `<` does 
case-insensitive comparisons on strings by default:

```
import std.algorithm;
import std.stdio;
import std.string;

void main() {
     auto arr0 = ["foo", "bar", "Baz"];
     auto case_sensitive   = sort!((a, b) => a           < b       
    )(arr0);
     auto case_insensitive = sort!((a, b) => a.toLower() < 
b.toLower())(arr0);
     writeln(case_sensitive);
     writeln(case_insensitive);
}
```
Output:
```
["bar", "Baz", "foo"]
["bar", "Baz", "foo"]
```

Later I learned that I was trapped by `sort` being in-place but 
still returning a value. If this API were to be designed from 
scratch, I wonder if it would be possible to rule out this 
incorrect usage at compile-time while still retaining the other 
nice features of the `sort` function like being in-place and 
giving strong types which can be used to choose faster algorithms 
to further transform the result, etc.? Or are there any ideas in 
programming language research that would help with such a case?

Nov 20 2019

mipri <mipri minimaltype.com> writes:

On Wednesday, 20 November 2019 at 09:33:25 UTC, Johannes Riecken 
wrote:
 Or are there
 any ideas in programming language research that would help with 
 such a case?

I remember reading that Ada/SPARK did some checking that could've
helped here, but I can't find a reference now, and I'm not sure of
the terminology. It might be one of the implications of the data
flow analysis that it does.

Suppose you had this in your code:

   int x;
   x = 2;
   x = 3;
   do_something(x);

What is the point of the first assignment? Waste heat from the 
CPU?
Of course there's no point, and SPARK makes it an error to do 
that:
your program is doing some *work*, it is generating some outputs,
so it must go on to use those outputs, and not throw them away.

And in your program that led to a confusing result, you're also
performing work that is lost:

     auto case_sensitive   = sort!((a, b) => a           < b
    )(arr0);
     auto case_insensitive = sort!((a, b) => a.toLower() <
 b.toLower())(arr0);

That first sort() has a return value, but a conceptual output is
the order that it provides to arr0, and the second sort() can be
statically known to destroy order in an arr0 provided to it. So:

   1. sort() discards arr0's order!0 and then provides a new 
order!1
   2. sort() discards arr0's order!1 and then provides a new 
order!2
   3. writeln reads order!2
   4. writeln reads order!2

Or, written like the first example above:

   Order o;
   o = order!1;
   o = order!2;
   do_something(o);

What is the point of the first assignment? Waste heat from the 
CPU?

Nov 20 2019

Dennis <dkorpel gmail.com> writes:

On Wednesday, 20 November 2019 at 09:33:25 UTC, Johannes Riecken 
wrote:
 Or are there any ideas in programming language research that 
 would help with such a case?

Try to make it a habit to use `const` instead of `auto` if you 
don't expect the variable to be mutated.

```
const arr0 = ["foo", "bar", "Baz"];
```

This is obviously not a fancy beginner-friendly solution like the 
thing mipri describes, but it is something that you can start 
doing today that might help.

Nov 20 2019

Dukc <ajieskola gmail.com> writes:

On Wednesday, 20 November 2019 at 09:33:25 UTC, Johannes Riecken 
wrote:
 Later I learned that I was trapped by `sort` being in-place but 
 still returning a value. If this API were to be designed from 
 scratch, I wonder if it would be possible to rule out this 
 incorrect usage at compile-time while still retaining the other 
 nice features of the `sort` function like being in-place and 
 giving strong types which can be used to choose faster 
 algorithms to further transform the result, etc.?

Theoretically it could do the same thing as `std.algortihm.move` 
- assign the original range to it's init value, so if it's 
accidently reused it'd be more obvious. But of course it won't be 
worth doing anymore because of the breakage. Perhaps in the std 
v2 Wilzbach has talked about, if the authors so wish.

Nov 20 2019

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, November 20, 2019 6:15:12 AM MST Dukc via Digitalmars-d wrote:
 On Wednesday, 20 November 2019 at 09:33:25 UTC, Johannes Riecken

 wrote:
 Later I learned that I was trapped by `sort` being in-place but
 still returning a value. If this API were to be designed from
 scratch, I wonder if it would be possible to rule out this
 incorrect usage at compile-time while still retaining the other
 nice features of the `sort` function like being in-place and
 giving strong types which can be used to choose faster
 algorithms to further transform the result, etc.?

 Theoretically it could do the same thing as `std.algortihm.move`
 - assign the original range to it's init value, so if it's
 accidently reused it'd be more obvious. But of course it won't be
 worth doing anymore because of the breakage. Perhaps in the std
 v2 Wilzbach has talked about, if the authors so wish.

The fact that sort returns a SortedRange is useful for some algorithms
(since they can then statically detect that the range is sorted), but it's
also useful to be able to use sort the range in-place and continue to use it
with the exact same type. The current API is only a problem if you don't
read the documentation.

- Jonathan M Davis

Nov 20 2019

Johannes Riecken <johannes.riecken gmail.com> writes:

On Wednesday, 20 November 2019 at 09:33:25 UTC, Johannes Riecken 
wrote:
 I had looked at how to do case-insensitive string sorting with 
 Phobos and without looking at the documentation much, I went 
 with the following code and then wrongly concluded that `<` 
 does case-insensitive comparisons on strings by default:

 ```
 import std.algorithm;
 import std.stdio;
 import std.string;

 void main() {
     auto arr0 = ["foo", "bar", "Baz"];
     auto case_sensitive   = sort!((a, b) => a           < b
    )(arr0);
     auto case_insensitive = sort!((a, b) => a.toLower() < 
 b.toLower())(arr0);
     writeln(case_sensitive);
     writeln(case_insensitive);
 }
 ```
 Output:
 ```
 ["bar", "Baz", "foo"]
 ["bar", "Baz", "foo"]
 ```

 Later I learned that I was trapped by `sort` being in-place but 
 still returning a value. If this API were to be designed from 
 scratch, I wonder if it would be possible to rule out this 
 incorrect usage at compile-time while still retaining the other 
 nice features of the `sort` function like being in-place and 
 giving strong types which can be used to choose faster 
 algorithms to further transform the result, etc.? Or are there 
 any ideas in programming language research that would help with 
 such a case?

Thanks, that's some nice ideas in this thread! I should 
definitely get into the habit of using const wherever possible.
Jonathan, I think a major selling point of Phobos and D is that 
in lots of cases it can catch erroneous API usage at compile time.

Nov 21 2019

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Thursday, November 21, 2019 3:19:45 PM MST Johannes Riecken via 
Digitalmars-d wrote:
 On Wednesday, 20 November 2019 at 09:33:25 UTC, Johannes Riecken

 wrote:
 I had looked at how to do case-insensitive string sorting with
 Phobos and without looking at the documentation much, I went
 with the following code and then wrongly concluded that `<`
 does case-insensitive comparisons on strings by default:

 ```
 import std.algorithm;
 import std.stdio;
 import std.string;

 void main() {

     auto arr0 = ["foo", "bar", "Baz"];
     auto case_sensitive   = sort!((a, b) => a           < b

    )(arr0);

     auto case_insensitive = sort!((a, b) => a.toLower() <

 b.toLower())(arr0);

     writeln(case_sensitive);
     writeln(case_insensitive);

 }
 ```
 Output:
 ```
 ["bar", "Baz", "foo"]
 ["bar", "Baz", "foo"]
 ```

 Later I learned that I was trapped by `sort` being in-place but
 still returning a value. If this API were to be designed from
 scratch, I wonder if it would be possible to rule out this
 incorrect usage at compile-time while still retaining the other
 nice features of the `sort` function like being in-place and
 giving strong types which can be used to choose faster
 algorithms to further transform the result, etc.? Or are there
 any ideas in programming language research that would help with
 such a case?

 Thanks, that's some nice ideas in this thread! I should
 definitely get into the habit of using const wherever possible.
 Jonathan, I think a major selling point of Phobos and D is that
 in lots of cases it can catch erroneous API usage at compile time.

There's nothing erroneous about sorting a range in two different ways
(though it obviously wouldn't make sense to do so twice in row without doing
anything with the range in between). And having sort return a new range that
was sorted while not affecting the one that was passed in would require
either sorting lazily (which really doesn't work) or allocating an entirely
new range in order to sort into - neither of which makes much sense. So, I'm
honestly very surprised that anyone would think that calling sort would
result in a new range without affecting the one that was passed in.

Regardless, the documentation makes the behavior of sort clear, and anyone
using a function needs to read its documentation if they don't want to run
into problems. If you don't read documentation, you're going to make bad
assumptions about how a function behaves eventually - if not frequently -
and end up with bugs in your code.

It is of course nice when the language is able to catch incorrect behavior
for you at compile time, but I don't see how that could be done here. Sure,
you used the function incorrectly, and if sort allocated a new range to sort
into and returned that, you wouldn't have been bitten. However, pretty much
every sort function I have ever seen sorts in-place, so I would expect most
folks to assume that sort sorted the range in-place. And if sort returned a
newly allocated range, anyone making that assumption and used sort without
reading the documentation would end up with buggy code and could have
complaints similiar to yours. Whether sort returns a newly allocated range
or sorts in-place, _someone_ is going to make the wrong assumption about how
it works and get bitten without the compiler being able to help them out.
And the current implementation is by far more efficient than the version
that returned a newly allocated range would be. It's also more flexible,
since it allows you to sort the same range multiple times, multiple ways if
that makes sense for the code. Anyone who wants to get a new range can
simply allocate a new array to hold the data - either via .dup if the range
to be sorted is an array or by calling save on the range and passing it to
std.array.array, making your code something like

    auto case_sensitive   = sort!((a, b) => a < b)(arr0.dup);
    auto case_insensitive =
        sort!((a, b) => a.toLower() < b.toLower())(arr0.dup);

So, I'm sorry that you gotten bitten by sort not working the way you
expected, but I don't see how that could have been prevented without
implementing the range in the way that you expected, which would then screw
over anyone who assumed that the range would be sorted in-place. The
compiler can't fix this for you. Ultimately, you have to read the
documentation. And given the inefficiency of returning a sorted range
without affecting the one that was passed in, the current implementation
makes far more sense than how you assumed that sort worked.

Even if D is able to catch more problems at compile time than another
language, there's still a limit to how many bugs it can catch. There's a
reason that unit tests are a feature that's built into D.

- Jonathan M Davis

Nov 21 2019

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Thursday, 21 November 2019 at 23:28:48 UTC, Jonathan M Davis 
wrote:
 So, I'm sorry that you gotten bitten by sort not working the 
 way you expected, but I don't see how that could have been 
 prevented without implementing the range in the way that you 
 expected, which would then screw over anyone who assumed that 
 the range would be sorted in-place.

This has come up several times before, here are two of them:

https://forum.dlang.org/search?q=Sorted+past+tense&search=Search

It can be fixed with careful naming conventions.

(lazy sorting make sense actually... Heap... Bucket...)

Nov 21 2019

mipri <mipri minimaltype.com> writes:

On Thursday, 21 November 2019 at 23:28:48 UTC, Jonathan M Davis 
wrote:
 So, I'm sorry that you gotten bitten by sort not working the
 way you expected, but I don't see how that could have been
 prevented without implementing the range in the way that you
 expected, which would then screw over anyone who assumed that
 the range would be sorted in-place.

I don't disagree, but the problem also wasn't presented as a 
defect
of Phobos, but as "I did wrote this code. I learned it was wrong.
How might my error have been averted?"

 From some other languages,

| Language | Sort        | Mutates? | Returns? | Both? |
|----------+-------------+----------+----------+-------|
| Perl     | sort        | No       | Yes      | No    |
| Python   | list.sort   | Yes      | No       | No    |
| Python   | list.sorted | No       | Yes      | No    |
| C        | qsort       | Yes      | No       | No    |
| C++      | std::sort   | Yes      | No       | No    |
| Rust     | vec::sort   | Yes      | No       | No    |

The norm is to do one or the other, and not both.

Why does D do both? Because it has an efficiency focus (avoid
unnecessary copying) and also an expressiveness focus (go ahead 
and
do things with long pipelines of transformations), and this
combination is uncommon.

Actually I'm surprised Rust doesn't do both... and come to think 
of
it, I *was* surprised by that, when I rewrote a benchmark in it: I
tried putting the .sort_by() in the middle of a pipeline of
function calls and was confused by the error. So I had the exact
opposite "I did a thing. I learned it was wrong." problem with
sorting in Rust.

There's another reason D does both: 
thing.change().change().change()
is how you build pipelines in D, and each of these change()
functions needs to return the input to the next change(). But 
note,
this isn't a deliberate language feature, it's not taking 
advantage
of "pipeline syntax", it's just a consequence of how methods work
in OOP languages.

So it's more like a pattern :) One of those awful things that
highlights a failure of language design, where programmers, 
instead
of directly expressing their intentions with code, must 
laboriously
work around a void in the language. Or so I hear.

https://www.martinfowler.com/articles/collection-pipeline/
https://clojure.org/guides/threading_macros

Suppose, instead of expecting D programmers to make use of the
"collection pipelien" pattern, D just had a collection pipeline
feature? Like:

   thing |> change1() |> change2() |> change3()

If it's a language feature, D can, since it knows the types of all
these functions, apply a rule like

   if a function is void, reuse the last input for the next 
function
   otherwise, behave like . chaining

With this, Phobos could have a void sort that could still appear 
in
the middle of a long pipeline of transformations, and confusions
like the one in this thread about Phobos, and *also* the one I had
about Rust's sort, would both not be possible.

Instead you'd have the new potential confusion of thinking that
sort doesn't mutate because you saw it in a pipeline :)


To be clear, I don't recommend changing anything. I just think 
it's
not true that D has nothing to do with this.

Nov 21 2019

D Programming

C/C++ Programming

Other

digitalmars.D - sort API design