
digitalmars.D - Sorting algorithm

reply Xinok <xinok live.com> writes:
Hi, it's been years since I've posted here. I just wanted to share 
something I worked on, a new sorting algorithm. It's a variant of merge 
sort, but much more memory efficient with only a small loss in 
performance. The most similar algorithm I know of is Timsort.

I had need for a stable sorting algorithm, but the performance of stable 
sort in Phobos is terrible. This pretty much left merge sort, which has 
good performance but requires O(n) space. That persuaded me to design my 
own sorting algorithm.

Here, you can find the code, details of the algorithm, and benchmarks 
(introSort is the unstable sort in Phobos).
http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/
Oct 07 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/11 11:42 AM, Xinok wrote:
 Hi, it's been years since I've posted here. I just wanted to share
 something I worked on, a new sorting algorithm. It's a variant of merge
 sort, but much more memory efficient with only a small loss in
 performance. The most similar algorithm I know of is Timsort.

 I had need for a stable sorting algorithm, but the performance of stable
 sort in Phobos is terrible. This pretty much left merge sort, which has
 good performance but requires O(n) space. That persuaded me to design my
 own sorting algorithm.

 Here, you can find the code, details of the algorithm, and benchmarks
 (introSort is the unstable sort in Phobos).
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/
This is interesting. What do the numbers in the benchmark represent? Andrei
Oct 07 2011
parent reply Xinok <xinok live.com> writes:
On 10/7/2011 1:03 PM, Andrei Alexandrescu wrote:
 On 10/7/11 11:42 AM, Xinok wrote:
 Hi, it's been years since I've posted here. I just wanted to share
 something I worked on, a new sorting algorithm. It's a variant of merge
 sort, but much more memory efficient with only a small loss in
 performance. The most similar algorithm I know of is Timsort.

 I had need for a stable sorting algorithm, but the performance of stable
 sort in Phobos is terrible. This pretty much left merge sort, which has
 good performance but requires O(n) space. That persuaded me to design my
 own sorting algorithm.

 Here, you can find the code, details of the algorithm, and benchmarks
 (introSort is the unstable sort in Phobos).
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/
This is interesting. What do the numbers in the benchmark represent? Andrei
I'll just post the code I used for benchmarking. Simply put, smaller numbers are faster.

ubyte[16] digest;

uint[] base;
base.length = 1024 * 1024 * 16;
foreach(i, ref v; base) v = i;
randomShuffle(base);
writeln("Is sorted: ", isSorted(base));
writeln("Is length: ", base.length);

SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
long start, finish;

auto copy = base.dup;
QueryPerformanceCounter(&start);
xinokSort(copy);
QueryPerformanceCounter(&finish);
sum(digest, copy);
writeln("xinokSort: ", finish - start, "\t", digestToString(digest));
assert(isSorted(copy), "Array not sorted");
Oct 07 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/11 12:23 PM, Xinok wrote:
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/
This is interesting. What do the numbers in the benchmark represent? Andrei
I'll just post the code I used for benchmarking. Simply put, smaller numbers are faster.
[snip]

Thanks. It would be great if you wanted to contribute your stable sort to Phobos via a pull request.

Also, which version of D are you using? I'm seeing that std.algorithm.sort (introSort) performs quite badly; for example, it's twice as slow on shuffled data against quickSort, and it also deals badly with already sorted data. Generally it does much worse than the quickSort baseline. Wonder why.

Andrei
Oct 07 2011
next sibling parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 10/07/2011 01:20 PM, Andrei Alexandrescu wrote:
 Also, which version of D are you using? I'm seeing that
 std.algorithm.sort (introSort) performs quite badly; for example, it's
 twice as slow on shuffled data against quickSort, and it also deals
 badly with already sorted data. Generally it does much worse than the
 quickSort baseline. Wonder why.
 
 
 Andrei
stable sort is flat out broken http://d.puremagic.com/issues/show_bug.cgi?id=4584
Oct 07 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/07/11 13:36, Ellery Newcomer wrote:
 On 10/07/2011 01:20 PM, Andrei Alexandrescu wrote:
 Also, which version of D are you using? I'm seeing that
 std.algorithm.sort (introSort) performs quite badly; for example, it's
 twice as slow on shuffled data against quickSort, and it also deals
 badly with already sorted data. Generally it does much worse than the
 quickSort baseline. Wonder why.


 Andrei
stable sort is flat out broken http://d.puremagic.com/issues/show_bug.cgi?id=4584
Then the contribution would be even more welcome! Andrei
Oct 07 2011
prev sibling parent reply Xinok <xinok live.com> writes:
On 10/7/2011 2:20 PM, Andrei Alexandrescu wrote:
 On 10/7/11 12:23 PM, Xinok wrote:
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/
This is interesting. What do the numbers in the benchmark represent? Andrei
I'll just post the code I used for benchmarking. Simply put, smaller numbers are faster.
[snip] Thanks. It would be great if you wanted to contribute your stable sort to Phobos via a pull request. Also, which version of D are you using? I'm seeing that std.algorithm.sort (introSort) performs quite badly; for example, it's twice as slow on shuffled data against quickSort, and it also deals badly with already sorted data. Generally it does much worse than the quickSort baseline. Wonder why. Andrei
I'm not familiar with the process. What all would I need to do to contribute my sort to Phobos?

On another note, my sort is most efficient on random access ranges. A simple merge sort would be more practical for other data structures like linked lists.

I used DMD 2.051 for those benchmarks, so I did new benchmarks using DMD 2.055. Unstable sort saw a big improvement, but stable sort still did poorly. I find it unusual since I thought stable sort was supposed to use merge sort.

xinokSort:         7564654
shellSort:         8843484
quickSort:         6005074
mergeSort:         6625306
radixSort:         2837697
phobos Unstable:   5559726
phobos Stable:   113924852
Oct 07 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/07/11 13:50, Xinok wrote:
 On 10/7/2011 2:20 PM, Andrei Alexandrescu wrote:
 On 10/7/11 12:23 PM, Xinok wrote:
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/
This is interesting. What do the numbers in the benchmark represent? Andrei
I'll just post the code I used for benchmarking. Simply put, smaller numbers are faster.
[snip] Thanks. It would be great if you wanted to contribute your stable sort to Phobos via a pull request. Also, which version of D are you using? I'm seeing that std.algorithm.sort (introSort) performs quite badly; for example, it's twice as slow on shuffled data against quickSort, and it also deals badly with already sorted data. Generally it does much worse than the quickSort baseline. Wonder why. Andrei
I'm not familiar with the process. What all would I need to do to contribute my sort to Phobos?
D's standard library is on github: https://github.com/D-Programming-Language/phobos. Anyone can fork it, modify it as they please, and then submit the changes for review via a so-called "pull request". Here's an example of a pull request with comments and all: https://github.com/D-Programming-Language/phobos/pull/272.

There is documentation available on github.com about how to use the site. It's some work, but it's time well invested - the kin of git and github are here to stay.

Would anyone in the community be willing to shepherd Xinok through the first pull request?

Andrei
Oct 07 2011
parent reply Xinok <xinok live.com> writes:
On 10/7/2011 5:08 PM, Andrei Alexandrescu wrote:
 On 10/07/11 13:50, Xinok wrote:
 On 10/7/2011 2:20 PM, Andrei Alexandrescu wrote:
 On 10/7/11 12:23 PM, Xinok wrote:
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/
This is interesting. What do the numbers in the benchmark represent? Andrei
I'll just post the code I used for benchmarking. Simply put, smaller numbers are faster.
[snip] Thanks. It would be great if you wanted to contribute your stable sort to Phobos via a pull request. Also, which version of D are you using? I'm seeing that std.algorithm.sort (introSort) performs quite badly; for example, it's twice as slow on shuffled data against quickSort, and it also deals badly with already sorted data. Generally it does much worse than the quickSort baseline. Wonder why. Andrei
I'm not familiar with the process. What all would I need to do to contribute my sort to Phobos?
D's standard library is on github: https://github.com/D-Programming-Language/phobos. Anyone can fork it, modify it as they please, and then submit the changes for review via a so-called "pull request". Here's an example of a pull request with comments and all: https://github.com/D-Programming-Language/phobos/pull/272. There is documentation available on github.com about how to use the site. It's some work but it's time well invested - the kin of git and github are here to stay. Would anyone in the community be willing to shepherd Xinok through the first pull request? Andrei
Thanks, I'll look into it when I have a little more time. If my algorithm were to be implemented in Phobos, the template arguments for std.algorithm.sort would need to be modified. For my algorithm to be stable, it requires defining two conditions, both a 'less' and 'greater'.
Oct 07 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/07/11 23:00, Xinok wrote:
 Thanks, I'll look into it when I have a little more time.
Looking forward to it.
 If my algorithm were to be implemented in Phobos, the template arguments
 for std.algorithm.sort would need to be modified. For my algorithm to be
 stable, it requires defining two conditions, both a 'less' and 'greater'.
Wouldn't swapping the argument to less work? Andrei
Oct 08 2011
parent reply Xinok <xinok live.com> writes:
On 10/8/2011 9:15 AM, Andrei Alexandrescu wrote:
 On 10/07/11 23:00, Xinok wrote:
 Thanks, I'll look into it when I have a little more time.
Looking forward to it.
 If my algorithm were to be implemented in Phobos, the template arguments
 for std.algorithm.sort would need to be modified. For my algorithm to be
 stable, it requires defining two conditions, both a 'less' and 'greater'.
Wouldn't swapping the argument to less work? Andrei
The merge step in my algorithm is similar to Timsort, in that it can merge left to right as well as right to left. This needs two conditions to do a stable sort, 'less or equal' and 'greater or equal'.

By swapping the arguments, you can only get 'greater or equal'. You're still missing 'less or equal', which is required for my algorithm to be stable.

It wouldn't be difficult to overload the template. When using an unstable sort, the greater argument can be left empty. sort!("a < b") would simply translate to:

sort!("a < b", "", SwapStrategy.unstable)
Oct 08 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/08/11 10:01, Xinok wrote:
 On 10/8/2011 9:15 AM, Andrei Alexandrescu wrote:
 On 10/07/11 23:00, Xinok wrote:
 Thanks, I'll look into it when I have a little more time.
Looking forward to it.
 If my algorithm were to be implemented in Phobos, the template arguments
 for std.algorithm.sort would need to be modified. For my algorithm to be
 stable, it requires defining two conditions, both a 'less' and
 'greater'.
Wouldn't swapping the argument to less work? Andrei
The merge step in my algorithm is similar to Timsort, in that it can merge left to right as well as right to left. This needs two conditions to do a stable sort, 'less or equal' and 'greater or equal'. By swapping the arguments, you can only get 'greater or equal'. You're still missing 'less or equal', which is required for my algorithm to be stable.
I don't understand. Say all you have is "<". Then you can do everything:

a <= b  is  !(b < a)
a >  b  is  b < a

This is an absolute classic. Probably it's negation that is missing from your assumptions? You can always negate a predicate.

Andrei
Oct 08 2011
next sibling parent reply Xinok <xinok live.com> writes:
On 10/8/2011 11:04 AM, Andrei Alexandrescu wrote:
 On 10/08/11 10:01, Xinok wrote:
 On 10/8/2011 9:15 AM, Andrei Alexandrescu wrote:
 On 10/07/11 23:00, Xinok wrote:
 Thanks, I'll look into it when I have a little more time.
Looking forward to it.
 If my algorithm were to be implemented in Phobos, the template
 arguments
 for std.algorithm.sort would need to be modified. For my algorithm
 to be
 stable, it requires defining two conditions, both a 'less' and
 'greater'.
Wouldn't swapping the argument to less work? Andrei
The merge step in my algorithm is similar to Timsort, in that it can merge left to right as well as right to left. This needs two conditions to do a stable sort, 'less or equal' and 'greater or equal'. By swapping the arguments, you can only get 'greater or equal'. You're still missing 'less or equal', which is required for my algorithm to be stable.
I don't understand. Say all you have is "<". Then you can do everything: a <= b is !(b < a) a > b is b < a This is an absolute classic. Probably it's negation that is missing from your assumptions? You can always negate a predicate. Andrei
I actually wasn't aware you could do that O.o, although you got the operators wrong in your example.

It does add overhead though. It requires two comparisons, and possibly twice the number of lookups, such as in this statement:

lef[c] > rig[off - c]

How about: if the greater argument is empty, then it fills it automatically with this. But also give the programmer the option to define both conditions for more efficient code.
Oct 08 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/8/11 10:36 AM, Xinok wrote:
 On 10/8/2011 11:04 AM, Andrei Alexandrescu wrote:
 On 10/08/11 10:01, Xinok wrote:
 On 10/8/2011 9:15 AM, Andrei Alexandrescu wrote:
 On 10/07/11 23:00, Xinok wrote:
 Thanks, I'll look into it when I have a little more time.
Looking forward to it.
 If my algorithm were to be implemented in Phobos, the template
 arguments
 for std.algorithm.sort would need to be modified. For my algorithm
 to be
 stable, it requires defining two conditions, both a 'less' and
 'greater'.
Wouldn't swapping the argument to less work? Andrei
The merge step in my algorithm is similar to Timsort, in that it can merge left to right as well as right to left. This needs two conditions to do a stable sort, 'less or equal' and 'greater or equal'. By swapping the arguments, you can only get 'greater or equal'. You're still missing 'less or equal', which is required for my algorithm to be stable.
I don't understand. Say all you have is "<". Then you can do everything: a <= b is !(b < a) a > b is b < a This is an absolute classic. Probably it's negation that is missing from your assumptions? You can always negate a predicate. Andrei
I actually wasn't aware you could do that O.o, although you got the operators wrong in your example.
Where?
 It does add overhead though. It requires two comparisons, and possibly
 twice the number of lookups, such as in this statement:
 lef[c] > rig[off - c]
That's rig[off - c] < lef[c].
 How about, if the greater argument is empty, then it fills it
 automatically with this. But also give the programmer the option to
 define both conditions for more efficient code.
I think I'm missing something. Again: if you have "<" then "<=", ">", and ">=" are zero cost.

What is more expensive is deciding a == b. You need to do it by saying !(a < b) && !(b < a). That's a cost worth paying because the second expression is more general - it allows you to define equivalence classes by using "<" even though the objects are not equal.

Andrei
Oct 08 2011
parent Xinok <xinok live.com> writes:
On 10/8/2011 11:56 AM, Andrei Alexandrescu wrote:
 I think I'm missing something. Again: if you have "<" then "<=", ">",
 and ">=" are zero cost.

 What is more expensive is deciding a == b. You need to do it by saying
 !(a < b) && !(b < a). That's a cost worth paying because the second
 expression is more general - it allows you to define equivalence classes
 by using "<" even though the objects are not equal.


 Andrei
nm, I got it now. >.< I can't believe I didn't know this.
Oct 08 2011
prev sibling parent Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
it'd be nice if there were some convenience templates somewhere for
converting a lt function to le,gt,ge, etc


On 10/08/2011 10:04 AM, Andrei Alexandrescu wrote:
 
 I don't understand. Say all you have is "<". Then you can do everything:
 
 a <= b is !(b < a)
 a > b is b < a
Oct 08 2011
prev sibling next sibling parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
tinkered with timsort a bit a few months ago. comparing that to your
sort, I get numbers like

xinokSort
random: 77   ascending: 0   descending: 21

timsort
random: 354   ascending: 1   descending: 4


where each are sorting a 500k element array of int, times are msecs,
compilation flags were -O -inline -release, sources are

http://personal.utulsa.edu/~ellery-newcomer/timsort.d
http://personal.utulsa.edu/~ellery-newcomer/xinokSort.d


Nice job, Xinok.

anyone want to try to optimize my timsort? :)
Oct 07 2011
parent Xinok <xinok live.com> writes:
On 10/7/2011 2:27 PM, Ellery Newcomer wrote:
 tinkered with timsort a bit a few months ago. comparing that to your
 sort, I get numbers like

 xinokSort
 random: 77   ascending: 0   descending: 21

 timsort
 random: 354   ascending: 1   descending: 4


 where each are sorting a 500k element array of int, times are msecs,
 compilation flags were -O -inline -release, sources are

 http://personal.utulsa.edu/~ellery-newcomer/timsort.d
 http://personal.utulsa.edu/~ellery-newcomer/xinokSort.d


 Nice job, Xinok.

 anyone want to try to optimize my timsort? :)
The benchmark for descending is no surprise, since timsort explicitly searches for runs in descending order. However, the random benchmark isn't what I expected. My sort is a bit slower than a standard merge sort, so I would expect Timsort to be faster.

While part of the reason may be that your code is unoptimized, it may also be the fault of a bug. I ran some more benchmarks and I found a few cases where your code failed, specifically when the data is mostly random. Just a guess, but your code may have a problem when the "runs" are too small.

I also found a case where std.algorithm.sort performs poorly, under Small Shuffle + Descending.

Numbers represent time taken; smaller numbers are faster.
An MD5 hash is generated for each sort, to verify it was sorted correctly.
phoboSort is the unstable std.algorithm.sort.

Ascending
Is length: 16777216
xinokSort:     50722    C8B8D3A2B4D9882ABCFC31721EF27145
shellSort:    726046    C8B8D3A2B4D9882ABCFC31721EF27145
quickSort:    943475    C8B8D3A2B4D9882ABCFC31721EF27145
mergeSort:   1778944    C8B8D3A2B4D9882ABCFC31721EF27145
radixSort:   3082901    C8B8D3A2B4D9882ABCFC31721EF27145
phoboSort:    955285    C8B8D3A2B4D9882ABCFC31721EF27145
tim Sort:      89201    C8B8D3A2B4D9882ABCFC31721EF27145

Descending
Is length: 16777216
xinokSort:   1916452    C8B8D3A2B4D9882ABCFC31721EF27145
shellSort:   2238446    C8B8D3A2B4D9882ABCFC31721EF27145
quickSort:   1581095    C8B8D3A2B4D9882ABCFC31721EF27145
mergeSort:   3390033    C8B8D3A2B4D9882ABCFC31721EF27145
radixSort:   3067271    C8B8D3A2B4D9882ABCFC31721EF27145
phoboSort:   1035827    C8B8D3A2B4D9882ABCFC31721EF27145
tim Sort:     240896    C8B8D3A2B4D9882ABCFC31721EF27145

Complete Shuffle
Is length: 16777216
xinokSort:   7607511    C8B8D3A2B4D9882ABCFC31721EF27145
shellSort:   8814887    C8B8D3A2B4D9882ABCFC31721EF27145
quickSort:   5623837    C8B8D3A2B4D9882ABCFC31721EF27145
mergeSort:   6704733    C8B8D3A2B4D9882ABCFC31721EF27145
radixSort:   2825567    C8B8D3A2B4D9882ABCFC31721EF27145
phoboSort:   5532275    C8B8D3A2B4D9882ABCFC31721EF27145
tim Sort:   29142913    E03C4778321F69D8AD10F624E6093599

Small Shuffle (1024 pairs were picked at random and swapped)
Is length: 16777216
xinokSort:    589882    C8B8D3A2B4D9882ABCFC31721EF27145
shellSort:   3651222    C8B8D3A2B4D9882ABCFC31721EF27145
quickSort:   2655391    C8B8D3A2B4D9882ABCFC31721EF27145
mergeSort:   2162840    C8B8D3A2B4D9882ABCFC31721EF27145
radixSort:   2988630    C8B8D3A2B4D9882ABCFC31721EF27145
phoboSort:    963103    C8B8D3A2B4D9882ABCFC31721EF27145
tim Sort:    6523251    C8B8D3A2B4D9882ABCFC31721EF27145

Large Shuffle (1,048,576 pairs were picked at random and swapped)
Is length: 16777216
xinokSort:   1956126    C8B8D3A2B4D9882ABCFC31721EF27145
shellSort:   7617207    C8B8D3A2B4D9882ABCFC31721EF27145
quickSort:   3089674    C8B8D3A2B4D9882ABCFC31721EF27145
mergeSort:   2606200    C8B8D3A2B4D9882ABCFC31721EF27145
radixSort:   2932306    C8B8D3A2B4D9882ABCFC31721EF27145
phoboSort:   1619484    C8B8D3A2B4D9882ABCFC31721EF27145
tim Sort:   14883251    DE54E94D1A058459FEA3A80096FCBB18

Small Shuffle + Descending
Is length: 16777216
xinokSort:   2288869    C8B8D3A2B4D9882ABCFC31721EF27145
shellSort:   4599417    C8B8D3A2B4D9882ABCFC31721EF27145
quickSort:   2604222    C8B8D3A2B4D9882ABCFC31721EF27145
mergeSort:   3457241    C8B8D3A2B4D9882ABCFC31721EF27145
radixSort:   3055080    C8B8D3A2B4D9882ABCFC31721EF27145
phoboSort: 261768564    C8B8D3A2B4D9882ABCFC31721EF27145
tim Sort:   10149160    C8B8D3A2B4D9882ABCFC31721EF27145

Large Shuffle + Descending
xinokSort:   2923813    C8B8D3A2B4D9882ABCFC31721EF27145
shellSort:   7678966    C8B8D3A2B4D9882ABCFC31721EF27145
quickSort:   3833138    C8B8D3A2B4D9882ABCFC31721EF27145
mergeSort:   3971124    C8B8D3A2B4D9882ABCFC31721EF27145
radixSort:   2941206    C8B8D3A2B4D9882ABCFC31721EF27145
phoboSort:   3749512    C8B8D3A2B4D9882ABCFC31721EF27145
tim Sort:   22709918    191BD30B3D0E52C922A7E6A16E5E63A5
Oct 07 2011
prev sibling parent reply Xinok <xinok live.com> writes:
On 10/7/2011 12:42 PM, Xinok wrote:
 Hi, it's been years since I've posted here. I just wanted to share
 something I worked on, a new sorting algorithm. It's a variant of merge
 sort, but much more memory efficient with only a small loss in
 performance. The most similar algorithm I know of is Timsort.

 I had need for a stable sorting algorithm, but the performance of stable
 sort in Phobos is terrible. This pretty much left merge sort, which has
 good performance but requires O(n) space. That persuaded me to design my
 own sorting algorithm.

 Here, you can find the code, details of the algorithm, and benchmarks
 (introSort is the unstable sort in Phobos).
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/
To follow up on my original post, I wrote a text file which explains the algorithm in detail. The most important thing to understand is the "range swap", which I tried to explain as simply as possible. http://cl.ly/0H193k2s0G2T1A002v3I/xinokSort.txt
Oct 08 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/8/11 10:11 AM, Xinok wrote:
 On 10/7/2011 12:42 PM, Xinok wrote:
 Hi, it's been years since I've posted here. I just wanted to share
 something I worked on, a new sorting algorithm. It's a variant of merge
 sort, but much more memory efficient with only a small loss in
 performance. The most similar algorithm I know of is Timsort.

 I had need for a stable sorting algorithm, but the performance of stable
 sort in Phobos is terrible. This pretty much left merge sort, which has
 good performance but requires O(n) space. That persuaded me to design my
 own sorting algorithm.

 Here, you can find the code, details of the algorithm, and benchmarks
 (introSort is the unstable sort in Phobos).
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/
To follow up on my original post, I wrote a text file which explains the algorithm in detail. The most important thing to understand is the "range swap", which I tried to explain as simply as possible. http://cl.ly/0H193k2s0G2T1A002v3I/xinokSort.txt
Nice writeup, but I found it quite difficult to get into. What would help is anchoring it with already known stuff (if it's not, the reader must assume it's unrelated, which makes things difficult). So it would be great if you compared and contrasted range swap with the in-place merge algorithm (e.g. http://www.sgi.com/tech/stl/inplace_merge.html), STL's stable sort (http://www.sgi.com/tech/stl/stable_sort.html), which is O(N log(N) log(N)), and possibly with std.algorithm.bringToFront.

Simply presenting a stylized implementation of swap range would be helpful.

Also there are a few oddities in the text:

* "- Constant additional memory (one memory allocation per thread)" -> the parenthesis does not sustain the point. There could be one memory allocation but it might allocate a non-constant amount.

* All discussion about tail call optimization is unneeded. Tail calls can be converted trivially to loops, so don't mention anything. Feel free to convert to loops if needed.

Andrei
Oct 08 2011
parent reply Xinok <xinok live.com> writes:
On 10/8/2011 12:47 PM, Andrei Alexandrescu wrote:
 Nice writeup, but I found it quite difficult to get into. What would
 help is anchoring it with already known stuff (if it's not, the reader
 must assume it's unrelated, which makes things difficult). So it would
 be great if you compared and contrasted range swap with the in-place
 merge algorithm (e.g. http://www.sgi.com/tech/stl/inplace_merge.html),
 STL's stable sort (http://www.sgi.com/tech/stl/stable_sort.html) which
 is O(N log(N) log(N)), and possibly with std.algorithm.bringToFront.

 Simply presenting a stylized implementation of swap range would be helpful.
I didn't mean for this text to be anything official. I just felt it was important to provide an explanation of my algorithm so others could better understand it and its implications. That's all. There's also the issue of, "what if I'm not the first?" I couldn't find anything similar to the "range swap", but that doesn't mean it didn't already exist.

Writing papers isn't my forte; I'm a self-taught programmer. So if my algorithm ever gains popularity, I'll let the experts deal with it.
 Also there are a few oddities in the text:

 * "- Constant additional memory (one memory allocation per thread)" ->
 the parenthesis does not sustain the point. There could be one memory
 allocation but it might allocate a non-constant amount.
I thought it was important to clarify that. My algorithm is easy to parallelize, but it does require allocating a unique block of memory for each thread. It is relatively constant as well, as it would make sense that the number of running threads matches the number of cores in the hardware. The only reason to allocate a non-constant amount is if you include the optimization I mentioned, to allocate O(n/1024) space.
 * All discussion about tail call optimization is unneeded. Tail calls
 can be converted trivially to loops, so don't mention anything. Feel
 free to convert to loops if needed.
I think it's an issue worth addressing though. Some programmers might assume that, because it's a variant of merge sort, stack overflows won't be an issue. When I originally implemented my algorithm, I didn't use tail calls and I had problems with stack overflows on partially sorted data. So it is an issue.
Oct 08 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/8/11 12:30 PM, Xinok wrote:
 On 10/8/2011 12:47 PM, Andrei Alexandrescu wrote:
 Nice writeup, but I found it quite difficult to get into. What would
 help is anchoring it with already known stuff (if it's not, the reader
 must assume it's unrelated, which makes things difficult). So it would
 be great if you compared and contrasted range swap with the in-place
 merge algorithm (e.g. http://www.sgi.com/tech/stl/inplace_merge.html),
 STL's stable sort (http://www.sgi.com/tech/stl/stable_sort.html) which
 is O(N log(N) log(N)), and possibly with std.algorithm.bringToFront.

 Simply presenting a stylized implementation of swap range would be
 helpful.
I didn't mean for this text to be anything official. I just felt it was important to provide an explanation of my algorithm so others could better understand my algorithm and it's implications. That's all. There's also the issue of, "what if I'm not the first?" I couldn't find anything similar to the "range swap", but that doesn't mean it didn't already exist. Writing papers isn't my forte, I'm a self taught programmer. So if my algorithm ever gains popularity, I'll let the experts deal with it.
My attempt was to make a few suggestions that would improve your writeup, which in turn not only helps the popularity of the algorithm, but also helps you hone a useful skill.
 Also there are a few oddities in the text:

 * "- Constant additional memory (one memory allocation per thread)" ->
 the parenthesis does not sustain the point. There could be one memory
 allocation but it might allocate a non-constant amount.
I thought it was important to clarify that. My algorithm is easy to parallelize, but it does require allocating a unique block of memory for each thread. It is relatively constant as well, as it would make sense that the number of running threads matches the number of cores in the hardware. The only reason to allocate a non-constant amount is if you include the optimization I mentioned, to allocate O(n/1024) space.
Then you may want to say:

* Constant additional memory and one call to an allocation routine per thread.
 * All discussion about tail call optimization is unneeded. Tail calls
 can be converted trivially to loops, so don't mention anything. Feel
 free to convert to loops if needed.
I think it's an issue worth addressing though. Some programmers might assume that, because it's a variant of merge sort, stack overflows won't be an issue. When I originally implemented my algorithm, I didn't use tail calls and I had problems with stack overflows on partially sorted data. So it is an issue.
No, it's not. Please think through: tail recursion IS A loop, period. (BTW I see you use simple tail recursion as opposed to the more general tail calls.) It's cut and dried. So no discussion is needed. Just mention there is no risk of stack overflow, without ever mentioning tail calls.

In fact there's not even a need to mention that, because otherwise the algorithm would have non-constant space complexity, which you clarify it doesn't.

Andrei
Oct 08 2011
parent Xinok <xinok live.com> writes:
On 10/8/2011 1:56 PM, Andrei Alexandrescu wrote:
 On 10/8/11 12:30 PM, Xinok wrote:
 I didn't mean for this text to be anything official. I just felt it was
 important to provide an explanation of my algorithm so others could
 better understand my algorithm and it's implications. That's all.
 There's also the issue of, "what if I'm not the first?" I couldn't find
 anything similar to the "range swap", but that doesn't mean it didn't
 already exist.

 Writing papers isn't my forte, I'm a self taught programmer. So if my
 algorithm ever gains popularity, I'll let the experts deal with it.
My attempt was to make a few suggestions that would improve your writeup, which in turn not only helps popularity of the algorithm, but also you to hone a useful skill.
Thanks for that. :) I can't update the original link, so the text is what it is. I'll look into making a better write-up of my algorithm and posting it somewhere more permanent.
 Also there are a few oddities in the text:

 * "- Constant additional memory (one memory allocation per thread)" ->
 the parenthesis does not sustain the point. There could be one memory
 allocation but it might allocate a non-constant amount.
I thought it was important to clarify that. My algorithm is easy to parallelize, but it does require allocating a unique block of memory for each thread. It is relatively constant as well, as it would make sense that the number of running threads matches the number of cores in the hardware. The only reason to allocate a non-constant amount is if you include the optimization I mentioned, to allocate O(n/1024) space.
Then you may want to say: * Constant additional memory and one call to an allocation routine per thread.
I'll do that.
 I think it's an issue worth addressing though. Some programmers might
 assume that, because it's a variant of merge sort, stack overflows won't
 be an issue. When I originally implemented my algorithm, I didn't use
 tail calls and I had problems with stack overflows on partially sorted
 data. So it is an issue.
No, it's not. Please think through: tail recursion IS A loop, period. (BTW I see you use simple tail recursion as opposed to the more general tail calls.) It's cut and dried. So no discussion is needed. Just mention there is no risk of stack overflow, without ever mentioning tail calls. In fact there's not even a need to mention that because otherwise the algorithm would have non-constant space complexity, which you clarify it doesn't.
Take a look at the Wikipedia page for quick sort. It mentions issues with the algorithm, such as using the first element as the pivot causing worst-case behavior on sorted data, and even something I didn't know about: integer overflow when choosing a pivot. It also mentions tail calls / tail recursion five times.

https://secure.wikimedia.org/wikipedia/en/wiki/Quick_sort#Implementation_issues

I'll reduce talk about tail calls; I'll probably just list it under optimizations, but I'm not removing it completely. Otherwise, I'll make a note that, depending on the implementation, the range swap can result in a stack overflow, but I won't mention tail calls / tail recursion as a solution.
Oct 08 2011
prev sibling parent Xinok <xinok live.com> writes:
On 10/8/2011 11:11 AM, Xinok wrote:
 On 10/7/2011 12:42 PM, Xinok wrote:
 Hi, it's been years since I've posted here. I just wanted to share
 something I worked on, a new sorting algorithm. It's a variant of merge
 sort, but much more memory efficient with only a small loss in
 performance. The most similar algorithm I know of is Timsort.

 I had need for a stable sorting algorithm, but the performance of stable
 sort in Phobos is terrible. This pretty much left merge sort, which has
 good performance but requires O(n) space. That persuaded me to design my
 own sorting algorithm.

 Here, you can find the code, details of the algorithm, and benchmarks
 (introSort is the unstable sort in Phobos).
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/
To follow up on my original post, I wrote a text file which explains the algorithm in detail. The most important thing to understand is the "range swap", which I tried to explain as simply as possible. http://cl.ly/0H193k2s0G2T1A002v3I/xinokSort.txt
And to follow up on this post, I created a permanent page for xinok sort on Sourceforge. There's nothing new to see on this page yet, but I'll work on that in due time.

https://sourceforge.net/projects/xinoksort/

My first goal is to write an implementation which I can contribute as the new stable sorting algorithm for Phobos. This includes string mixins, ranges, asserts, unittests, and anything else that is required. If anybody wants to assist me with this, I welcome the help.
Oct 09 2011