## digitalmars.D - Another algo for faster sorting

• Dmitry Olshansky (4/4) Apr 07 2016 Coincidentally with another NG thread I'm curious if we can
• Andrea Fontana (9/13) Apr 07 2016 Radix sort is a very fast way to sort strings and integers . But
• tsbockman (25/27) Apr 07 2016 I was planning to respond to your original question, I just
• Andrea Fontana (15/23) Apr 08 2016 Ditto
• tsbockman (12/16) Apr 08 2016 There are tons of algorithms in Phobos that *could* be made
• tsbockman (9/11) Apr 08 2016 Also, your response makes it sound like I was advocating for just
• deadalnix (6/16) Apr 08 2016 You can make it work with float as well with some horrible hacks.
• Andrea Fontana (7/26) Apr 08 2016 Using your functions to radixify float this seems to run faster
• Andrei Alexandrescu (4/31) Apr 08 2016 Guess we need a pull request. In order to distinguish the "less than"
• Andrea Fontana (4/8) Apr 08 2016 Why not a sort with SortAscending.yes/no or an enum as
• Jonathan Villa (12/16) Apr 07 2016 The other day I had problems using Quicksort (trying to sorting
• tsbockman (8/12) Apr 07 2016 It would be cool to have a shell sort implementation available
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
```Coincidentally with another NG thread I'm curious if we can
special-case our sort for
strings to Three-Way Radix QuickSort which is more efficient:

```
Apr 07 2016
Andrea Fontana <nospam example.com> writes:
```On Thursday, 7 April 2016 at 10:58:21 UTC, Dmitry Olshansky wrote:
Coincidentally with another NG thread I'm curious if we can
special-case our sort for
strings to Three-Way Radix QuickSort which is more efficient:

Radix sort is a very fast way to sort strings and integers . But
it does not work with a custom " less" function . It just sorts
date to lexical way .

If phobos had a lexicalSort ( T [ ] , Ascending.YES ) ; probably
a lot of optimizations could be done for many data types.

Anyway my topic about sort optimization seems not to have a good
luck if not for benchmark inaccuracy :)

Andrea
```
Apr 07 2016
tsbockman <thomas.bockman gmail.com> writes:
```On Thursday, 7 April 2016 at 13:32:43 UTC, Andrea Fontana wrote:
Anyway my topic about sort optimization seems not to have a
good luck if not for benchmark inaccuracy :)

I was planning to respond to your original question, I just
didn't have time last night...

I don't think that the simple "counting sort" you demonstrated in
the other thread is particularly useful to have in Phobos, by
itself:
- It doesn't scale well at all to higher-entropy element
types, like `int` or `string`.
- It only works on value types, not reference types.
- It's so trivial to implement, that anyone who really needs
it can write it themselves.

However, what might be really useful is to have a good generic
implementation of "bucket sort". Using a user-supplied fast hash
function whose output follows the same ordering as the original
elements:
1. Move/copy each element into a bucket list chosen by the
hash function.
2. Sort each individual bucket list using another algorithm.
3. Iterate through all the buckets lists, and move/copy each
element to the output.

With good design, it should be possible to express counting sort
for small value types like `bool` and `byte` simply as a
parameterization of bucket sort. (This would require that the
bucket list type be parameterized, in addition to the hash
function.)
```
Apr 07 2016
Andrea Fontana <nospam example.com> writes:
```On Thursday, 7 April 2016 at 18:15:32 UTC, tsbockman wrote:
I don't think that the simple "counting sort" you demonstrated
in the other thread is particularly useful to have in Phobos,
by itself:
- It doesn't scale well at all to higher-entropy element
types, like `int` or `string`.

I mean to replace sort() on phobos if T == byte || T == ...

- It only works on value types, not reference types.

Ditto

- It's so trivial to implement, that anyone who really
needs it can write it themselves.

Ok, but it's not a good reason to keep an inefficient sort() on
phobos. I would use algorithms in phobos without thinking if I
have to reimplement them or not. If not we should implement

... sort(T)(T[]) if (T == byte) assert(0, "hey this is a bad
implementation but is trivial to implement the right one"); :)

Moreover: many algorithms are generic in phobos and in D
ecosystem, and it could happen that byte's sort (or even bool)
isn't used directly, but instead inside other algorithms. Let's
say we need to write a burrows-wheeler transform for bzip2 and we
need to sort byte... Won't you use the default sort function for
it?

Andrea
```
Apr 08 2016
tsbockman <thomas.bockman gmail.com> writes:
```On Friday, 8 April 2016 at 07:28:38 UTC, Andrea Fontana wrote:
I mean to replace sort() on phobos if T == byte || T == ...

I know that's what you meant.

Ok, but it's not a good reason to keep an inefficient sort() on
phobos. I would use algorithms in phobos without thinking if I
have to reimplement them or not.

There are tons of algorithms in Phobos that *could* be made
faster for some combinations of inputs, but it doesn't make sense
to add every possible template specialization because of the
long-term maintenance costs.

My point is that I doubt specializing `sort()` for `bool` and
`byte` is actually worth the costs. That's just my opinion
though, and I could be swayed easily enough given some evidence
of a specific real-world need for the optimization.

In any case, you are free to submit a Phobos pull request if you
think it's worth it.
```
Apr 08 2016
tsbockman <thomas.bockman gmail.com> writes:
```On Friday, 8 April 2016 at 07:28:38 UTC, Andrea Fontana wrote:
Ok, but it's not a good reason to keep an inefficient sort() on
phobos...

Also, your response makes it sound like I was advocating for just
keeping everything like it is now.

But, my main point was actually that counting sort is just a
variant of bucket sort, and it would (in my opinion) make more
sense to write a good generic bucket sort implementation. This
would be applicable to a much wider range of use cases, and could
still be used to speed up `sort()` for `bool`, `byte`, and
`ubyte` if needed.
```
Apr 08 2016
```On Thursday, 7 April 2016 at 13:32:43 UTC, Andrea Fontana wrote:
On Thursday, 7 April 2016 at 10:58:21 UTC, Dmitry Olshansky
wrote:
Coincidentally with another NG thread I'm curious if we can
special-case our sort for
strings to Three-Way Radix QuickSort which is more efficient:

Radix sort is a very fast way to sort strings and integers .
But it does not work with a custom " less" function . It just
sorts date to lexical way .

You can make it work with float as well with some horrible hacks.
I tried to implement it, but performances were not that great at
the end. Maybe I missed something, or maybe this is just not CPU
friendly enough for the sample I had to throw at it.

```
Apr 08 2016
Andrea Fontana <nospam example.com> writes:
```On Friday, 8 April 2016 at 07:33:32 UTC, deadalnix wrote:
On Thursday, 7 April 2016 at 13:32:43 UTC, Andrea Fontana wrote:
On Thursday, 7 April 2016 at 10:58:21 UTC, Dmitry Olshansky
wrote:
Coincidentally with another NG thread I'm curious if we can
special-case our sort for
strings to Three-Way Radix QuickSort which is more efficient:

Radix sort is a very fast way to sort strings and integers .
But it does not work with a custom " less" function . It just
sorts date to lexical way .

You can make it work with float as well with some horrible
hacks. I tried to implement it, but performances were not that
great at the end. Maybe I missed something, or maybe this is
just not CPU friendly enough for the sample I had to throw at
it.

for me than phobos sort (for large arrays 2x, 3x it seems... not
so many benchmarks)

http://dpaste.dzfl.pl/1b4752ff37b1

I don't optimize it that much. Just did a try.

Andrea
```
Apr 08 2016
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 4/8/16 6:06 AM, Andrea Fontana wrote:
On Friday, 8 April 2016 at 07:33:32 UTC, deadalnix wrote:
On Thursday, 7 April 2016 at 13:32:43 UTC, Andrea Fontana wrote:
On Thursday, 7 April 2016 at 10:58:21 UTC, Dmitry Olshansky wrote:
Coincidentally with another NG thread I'm curious if we can
special-case our sort for
strings to Three-Way Radix QuickSort which is more efficient:

Radix sort is a very fast way to sort strings and integers . But it
does not work with a custom " less" function . It just sorts date to
lexical way .

You can make it work with float as well with some horrible hacks. I
tried to implement it, but performances were not that great at the
end. Maybe I missed something, or maybe this is just not CPU friendly
enough for the sample I had to throw at it.

Using your functions to radixify float this seems to run faster for me
than phobos sort (for large arrays 2x, 3x it seems... not so many
benchmarks)

http://dpaste.dzfl.pl/1b4752ff37b1

I don't optimize it that much. Just did a try.

Andrea

Guess we need a pull request. In order to distinguish the "less than"
(default) predicate from arbitrary predicates, just define two overloads
of sort, one without the predicate parameter. -- Andrei
```
Apr 08 2016
Andrea Fontana <nospam example.com> writes:
```On Friday, 8 April 2016 at 13:35:53 UTC, Andrei Alexandrescu
wrote:
Guess we need a pull request. In order to distinguish the "less
than" (default) predicate from arbitrary predicates, just
define two overloads of sort, one without the predicate
parameter. -- Andrei

Why not a sort with SortAscending.yes/no or an enum as
(template?) param?
```
Apr 08 2016
Jonathan Villa <jv_vortex msn.com> writes:
```On Thursday, 7 April 2016 at 10:58:21 UTC, Dmitry Olshansky wrote:
Coincidentally with another NG thread I'm curious if we can
special-case our sort for
strings to Three-Way Radix QuickSort which is more efficient:

The other day I had problems using Quicksort (trying to sorting
100_000 doubles) causing stackoverflow for so many recursive
calls (using .NET), then searching for other sorting algorithms I
found SHELL-SORT, it's not recursive and it ended being even
faster than Quicksort (what the heck? xd, well probably the JIT
compiler).

Now you want to sort strings, I don't know in that case, but
maybe that can be useful, who knows? c:

Here's the link where I got the algorithm (is in visual basic D:)
https://support.microsoft.com/en-us/kb/169617

JV
```
Apr 07 2016
tsbockman <thomas.bockman gmail.com> writes:
```On Thursday, 7 April 2016 at 16:23:31 UTC, Jonathan Villa wrote:
...then searching for other sorting algorithms I found
SHELL-SORT, it's not recursive and it ended being even faster
than Quicksort (what the heck? xd, well probably the JIT
compiler).

It would be cool to have a shell sort implementation available
for small data sets. It's a bad choice in the general case
though, because its asymptotic scaling is worse than the standard
O(N log(N)) achieved by stuff like Timsort.

Quicksort is actually bad too, because its worst case asymptotic
complexity is an embarrassing O(N^2), which makes it highly
vulnerable to denial-of-service attacks.
```
Apr 07 2016