www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Sorting algorithm

reply Xinok <xinok live.com> writes:
Hi, it's been years since I've posted here. I just wanted to share 
something I worked on, a new sorting algorithm. It's a variant of merge 
sort, but much more memory efficient with only a small loss in 
performance. The most similar algorithm I know of is Timsort.

I had need for a stable sorting algorithm, but the performance of stable 
sort in Phobos is terrible. This pretty much left merge sort, which has 
good performance but requires O(n) space. That persuaded me to design my 
own sorting algorithm.

Here, you can find the code, details of the algorithm, and benchmarks 
(introSort is the unstable sort in Phobos).
http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/
Oct 07 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/11 11:42 AM, Xinok wrote:
 Hi, it's been years since I've posted here. I just wanted to share
 something I worked on, a new sorting algorithm. It's a variant of merge
 sort, but much more memory efficient with only a small loss in
 performance. The most similar algorithm I know of is Timsort.

 I had need for a stable sorting algorithm, but the performance of stable
 sort in Phobos is terrible. This pretty much left merge sort, which has
 good performance but requires O(n) space. That persuaded me to design my
 own sorting algorithm.

 Here, you can find the code, details of the algorithm, and benchmarks
 (introSort is the unstable sort in Phobos).
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/

This is interesting. What do the numbers in the benchmark represent? Andrei
Oct 07 2011
parent reply Xinok <xinok live.com> writes:
On 10/7/2011 1:03 PM, Andrei Alexandrescu wrote:
 On 10/7/11 11:42 AM, Xinok wrote:
 Hi, it's been years since I've posted here. I just wanted to share
 something I worked on, a new sorting algorithm. It's a variant of merge
 sort, but much more memory efficient with only a small loss in
 performance. The most similar algorithm I know of is Timsort.

 I had need for a stable sorting algorithm, but the performance of stable
 sort in Phobos is terrible. This pretty much left merge sort, which has
 good performance but requires O(n) space. That persuaded me to design my
 own sorting algorithm.

 Here, you can find the code, details of the algorithm, and benchmarks
 (introSort is the unstable sort in Phobos).
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/

This is interesting. What do the numbers in the benchmark represent? Andrei

I'll just post the code I used for benchmarking. Simply put, smaller numbers are faster. ubyte[16] digest; uint[] base; base.length = 1024 * 1024 * 16; foreach(i, ref v; base) v = i; randomShuffle(base); writeln("Is sorted: ", isSorted(base)); writeln("Is length: ", base.length); SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL); long start, finish; auto copy = base.dup; QueryPerformanceCounter(&start); xinokSort(copy); QueryPerformanceCounter(&finish); sum(digest, copy); writeln("xinokSort: ", finish - start, "\t", digestToString(digest)); assert(isSorted(copy), "Array not sorted");
Oct 07 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/11 12:23 PM, Xinok wrote:
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/

This is interesting. What do the numbers in the benchmark represent? Andrei

I'll just post the code I used for benchmarking. Simply put, smaller numbers are faster.

Thanks. It would be great if you wanted to contribute your stable sort to Phobos via a pull request. Also, which version of D are you using? I'm seeing that std.algorithm.sort (introSort) performs quite badly; for example, it's twice as slow on shuffled data against quickSort, and it also deals badly with already sorted data. Generally it does much worse than the quickSort baseline. Wonder why. Andrei
Oct 07 2011
next sibling parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 10/07/2011 01:20 PM, Andrei Alexandrescu wrote:
 Also, which version of D are you using? I'm seeing that
 std.algorithm.sort (introSort) performs quite badly; for example, it's
 twice as slow on shuffled data against quickSort, and it also deals
 badly with already sorted data. Generally it does much worse than the
 quickSort baseline. Wonder why.
 
 
 Andrei

stable sort is flat out broken http://d.puremagic.com/issues/show_bug.cgi?id=4584
Oct 07 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/07/11 13:36, Ellery Newcomer wrote:
 On 10/07/2011 01:20 PM, Andrei Alexandrescu wrote:
 Also, which version of D are you using? I'm seeing that
 std.algorithm.sort (introSort) performs quite badly; for example, it's
 twice as slow on shuffled data against quickSort, and it also deals
 badly with already sorted data. Generally it does much worse than the
 quickSort baseline. Wonder why.


 Andrei

stable sort is flat out broken http://d.puremagic.com/issues/show_bug.cgi?id=4584

Then the contribution would be even more welcome! Andrei
Oct 07 2011
prev sibling parent reply Xinok <xinok live.com> writes:
On 10/7/2011 2:20 PM, Andrei Alexandrescu wrote:
 On 10/7/11 12:23 PM, Xinok wrote:
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/

This is interesting. What do the numbers in the benchmark represent? Andrei

I'll just post the code I used for benchmarking. Simply put, smaller numbers are faster.

Thanks. It would be great if you wanted to contribute your stable sort to Phobos via a pull request. Also, which version of D are you using? I'm seeing that std.algorithm.sort (introSort) performs quite badly; for example, it's twice as slow on shuffled data against quickSort, and it also deals badly with already sorted data. Generally it does much worse than the quickSort baseline. Wonder why. Andrei

I'm not familiar with the process. What all would I need to do to contribute my sort to Phobos? On another note, my sort is most efficient on random access ranges. A simple merge sort would be more practical for other data structures like linked lists. I used DMD 2.051 for those benchmarks, so I did new benchmarks using DMD 2.055. Unstable sort saw a big improvement, but stable sort still did poorly. I find it unusual since I thought stable sort was supposed to use merge sort. xinokSort: 7564654 shellSort: 8843484 quickSort: 6005074 mergeSort: 6625306 radixSort: 2837697 phobos Unstable: 5559726 phobos Stable: 113924852
Oct 07 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/07/11 13:50, Xinok wrote:
 On 10/7/2011 2:20 PM, Andrei Alexandrescu wrote:
 On 10/7/11 12:23 PM, Xinok wrote:
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/

This is interesting. What do the numbers in the benchmark represent? Andrei

I'll just post the code I used for benchmarking. Simply put, smaller numbers are faster.

Thanks. It would be great if you wanted to contribute your stable sort to Phobos via a pull request. Also, which version of D are you using? I'm seeing that std.algorithm.sort (introSort) performs quite badly; for example, it's twice as slow on shuffled data against quickSort, and it also deals badly with already sorted data. Generally it does much worse than the quickSort baseline. Wonder why. Andrei

I'm not familiar with the process. What all would I need to do to contribute my sort to Phobos?

D's standard library is on github: https://github.com/D-Programming-Language/phobos. Anyone can fork it, modify it as they please, and then submit the changes for review via a so-called "pull request". Here's an example of a pull request with comments and all: https://github.com/D-Programming-Language/phobos/pull/272. There is documentation available on github.com about how to use the site. It's some work but it's time well invested - the kin of git and github are here to stay. Would anyone in the community be willing to shepherd Xinok through the first pull request? Andrei
Oct 07 2011
parent reply Xinok <xinok live.com> writes:
On 10/7/2011 5:08 PM, Andrei Alexandrescu wrote:
 On 10/07/11 13:50, Xinok wrote:
 On 10/7/2011 2:20 PM, Andrei Alexandrescu wrote:
 On 10/7/11 12:23 PM, Xinok wrote:
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/

This is interesting. What do the numbers in the benchmark represent? Andrei

I'll just post the code I used for benchmarking. Simply put, smaller numbers are faster.

Thanks. It would be great if you wanted to contribute your stable sort to Phobos via a pull request. Also, which version of D are you using? I'm seeing that std.algorithm.sort (introSort) performs quite badly; for example, it's twice as slow on shuffled data against quickSort, and it also deals badly with already sorted data. Generally it does much worse than the quickSort baseline. Wonder why. Andrei

I'm not familiar with the process. What all would I need to do to contribute my sort to Phobos?

D's standard library is on github: https://github.com/D-Programming-Language/phobos. Anyone can fork it, modify it as they please, and then submit the changes for review via a so-called "pull request". Here's an example of a pull request with comments and all: https://github.com/D-Programming-Language/phobos/pull/272. There is documentation available on github.com about how to use the site. It's some work but it's time well invested - the kin of git and github are here to stay. Would anyone in the community be willing to shepherd Xinok through the first pull request? Andrei

Thanks, I'll look into it when I have a little more time. If my algorithm were to be implemented in Phobos, the template arguments for std.algorithm.sort would need to be modified. For my algorithm to be stable, it requires defining two conditions, both a 'less' and 'greater'.
Oct 07 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/07/11 23:00, Xinok wrote:
 Thanks, I'll look into it when I have a little more time.

Looking forward to it.
 If my algorithm were to be implemented in Phobos, the template arguments
 for std.algorithm.sort would need to be modified. For my algorithm to be
 stable, it requires defining two conditions, both a 'less' and 'greater'.

Wouldn't swapping the argument to less work? Andrei
Oct 08 2011
parent reply Xinok <xinok live.com> writes:
On 10/8/2011 9:15 AM, Andrei Alexandrescu wrote:
 On 10/07/11 23:00, Xinok wrote:
 Thanks, I'll look into it when I have a little more time.

Looking forward to it.
 If my algorithm were to be implemented in Phobos, the template arguments
 for std.algorithm.sort would need to be modified. For my algorithm to be
 stable, it requires defining two conditions, both a 'less' and 'greater'.

Wouldn't swapping the argument to less work? Andrei

The merge step in my algorithm is similar to Timsort, in that it can merge left to right as well as right to left. This needs two conditions to do a stable sort, 'less or equal' and 'greater or equal'. By swapping the arguments, you can only get 'greater or equal'. You're still missing 'less or equal', which is required for my algorithm to be stable. It wouldn't be difficult to overload the template. When using an unstable sort, the greater argument can be left empty. sort!("a < b") would simply translate to: sort!("a < b", "", SwapStrategy.unstable)
Oct 08 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/08/11 10:01, Xinok wrote:
 On 10/8/2011 9:15 AM, Andrei Alexandrescu wrote:
 On 10/07/11 23:00, Xinok wrote:
 Thanks, I'll look into it when I have a little more time.

Looking forward to it.
 If my algorithm were to be implemented in Phobos, the template arguments
 for std.algorithm.sort would need to be modified. For my algorithm to be
 stable, it requires defining two conditions, both a 'less' and
 'greater'.

Wouldn't swapping the argument to less work? Andrei

The merge step in my algorithm is similar to Timsort, in that it can merge left to right as well as right to left. This needs two conditions to do a stable sort, 'less or equal' and 'greater or equal'. By swapping the arguments, you can only get 'greater or equal'. You're still missing 'less or equal', which is required for my algorithm to be stable.

I don't understand. Say all you have is "<". Then you can do everything: a <= b is !(b < a) a > b is b < a This is an absolute classic. Probably it's negation that is missing from your assumptions? You can always negate a predicate. Andrei
Oct 08 2011
next sibling parent reply Xinok <xinok live.com> writes:
On 10/8/2011 11:04 AM, Andrei Alexandrescu wrote:
 On 10/08/11 10:01, Xinok wrote:
 On 10/8/2011 9:15 AM, Andrei Alexandrescu wrote:
 On 10/07/11 23:00, Xinok wrote:
 Thanks, I'll look into it when I have a little more time.

Looking forward to it.
 If my algorithm were to be implemented in Phobos, the template
 arguments
 for std.algorithm.sort would need to be modified. For my algorithm
 to be
 stable, it requires defining two conditions, both a 'less' and
 'greater'.

Wouldn't swapping the argument to less work? Andrei

The merge step in my algorithm is similar to Timsort, in that it can merge left to right as well as right to left. This needs two conditions to do a stable sort, 'less or equal' and 'greater or equal'. By swapping the arguments, you can only get 'greater or equal'. You're still missing 'less or equal', which is required for my algorithm to be stable.

I don't understand. Say all you have is "<". Then you can do everything: a <= b is !(b < a) a > b is b < a This is an absolute classic. Probably it's negation that is missing from your assumptions? You can always negate a predicate. Andrei

I actually wasn't aware you could do that O.o, although you got the operators wrong in your example. It does add overhead though. It requires two comparisons, and possibly twice the number of lookups, such as in this statement: lef[c] > rig[off - c] How about, if the greater argument is empty, then it fills it automatically with this. But also give the programmer the option to define both conditions for more efficient code.
Oct 08 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/8/11 10:36 AM, Xinok wrote:
 On 10/8/2011 11:04 AM, Andrei Alexandrescu wrote:
 On 10/08/11 10:01, Xinok wrote:
 On 10/8/2011 9:15 AM, Andrei Alexandrescu wrote:
 On 10/07/11 23:00, Xinok wrote:
 Thanks, I'll look into it when I have a little more time.

Looking forward to it.
 If my algorithm were to be implemented in Phobos, the template
 arguments
 for std.algorithm.sort would need to be modified. For my algorithm
 to be
 stable, it requires defining two conditions, both a 'less' and
 'greater'.

Wouldn't swapping the argument to less work? Andrei

The merge step in my algorithm is similar to Timsort, in that it can merge left to right as well as right to left. This needs two conditions to do a stable sort, 'less or equal' and 'greater or equal'. By swapping the arguments, you can only get 'greater or equal'. You're still missing 'less or equal', which is required for my algorithm to be stable.

I don't understand. Say all you have is "<". Then you can do everything: a <= b is !(b < a) a > b is b < a This is an absolute classic. Probably it's negation that is missing from your assumptions? You can always negate a predicate. Andrei

I actually wasn't aware you could do that O.o, although you got the operators wrong in your example.

Where?
 It does add overhead though. It requires two comparisons, and possibly
 twice the number of lookups, such as in this statement:
 lef[c] > rig[off - c]

That's rig[off - c] < lef[c].
 How about, if the greater argument is empty, then it fills it
 automatically with this. But also give the programmer the option to
 define both conditions for more efficient code.

I think I'm missing something. Again: if you have "<" then "<=", ">", and ">=" are zero cost. What is more expensive is deciding a == b. You need to do it by saying !(a < b) && !(b < a). That's a cost worth paying because the second expression is more general - it allows you to define equivalence classes by using "<" even though the objects are not equal. Andrei
Oct 08 2011
parent Xinok <xinok live.com> writes:
On 10/8/2011 11:56 AM, Andrei Alexandrescu wrote:
 I think I'm missing something. Again: if you have "<" then "<=", ">",
 and ">=" are zero cost.

 What is more expensive is deciding a == b. You need to do it by saying
 !(a < b) && !(b < a). That's a cost worth paying because the second
 expression is more general - it allows you to define equivalence classes
 by using "<" even though the objects are not equal.


 Andrei

nm, I got it now. >.< I can't believe I didn't know this.
Oct 08 2011
prev sibling parent Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
it'd be nice if there were some convenience templates somewhere for
converting a lt function to le,gt,ge, etc


On 10/08/2011 10:04 AM, Andrei Alexandrescu wrote:
 
 I don't understand. Say all you have is "<". Then you can do everything:
 
 a <= b is !(b < a)
 a > b is b < a

Oct 08 2011
prev sibling next sibling parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
tinkered with timsort a bit a few months ago. comparing that to your
sort, I get numbers like

xinokSort
random: 77   ascending: 0   descending: 21

timsort
random: 354   ascending: 1   descending: 4


where each are sorting a 500k element array of int, times are msecs,
compilation flags were -O -inline -release, sources are

http://personal.utulsa.edu/~ellery-newcomer/timsort.d
http://personal.utulsa.edu/~ellery-newcomer/xinokSort.d


Nice job, Xinok.

anyone want to try to optimize my timsort? :)
Oct 07 2011
parent Xinok <xinok live.com> writes:
On 10/7/2011 2:27 PM, Ellery Newcomer wrote:
 tinkered with timsort a bit a few months ago. comparing that to your
 sort, I get numbers like

 xinokSort
 random: 77   ascending: 0   descending: 21

 timsort
 random: 354   ascending: 1   descending: 4


 where each are sorting a 500k element array of int, times are msecs,
 compilation flags were -O -inline -release, sources are

 http://personal.utulsa.edu/~ellery-newcomer/timsort.d
 http://personal.utulsa.edu/~ellery-newcomer/xinokSort.d


 Nice job, Xinok.

 anyone want to try to optimize my timsort? :)

The benchmark for descending is no surprise since timsort explicitly searches for runs in descending order. However, the random benchmark isn't what I expected. My sort is a bit slower than a standard merge sort, so I would expect Timsort to be faster. While part of the reason may be your code is unoptimized, it may also be the fault of a bug. I ran some more benchmarks and I found a few cases where your code failed, specifically when the data is mostly random. Just a guess, but your code may have a problem when the "runs" are too small. I also found a case when std.algorithm.sort performs poorly, under Small Shuffle + Descending. Numbers represent time taken; Smaller numbers are faster An MD5 hash is generated for each sort, to verify it was sorted correctly phoboSort is unstable std.algorithm.sort Ascending Is length: 16777216 xinokSort: 50722 C8B8D3A2B4D9882ABCFC31721EF27145 shellSort: 726046 C8B8D3A2B4D9882ABCFC31721EF27145 quickSort: 943475 C8B8D3A2B4D9882ABCFC31721EF27145 mergeSort: 1778944 C8B8D3A2B4D9882ABCFC31721EF27145 radixSort: 3082901 C8B8D3A2B4D9882ABCFC31721EF27145 phoboSort: 955285 C8B8D3A2B4D9882ABCFC31721EF27145 tim Sort: 89201 C8B8D3A2B4D9882ABCFC31721EF27145 Descending Is length: 16777216 xinokSort: 1916452 C8B8D3A2B4D9882ABCFC31721EF27145 shellSort: 2238446 C8B8D3A2B4D9882ABCFC31721EF27145 quickSort: 1581095 C8B8D3A2B4D9882ABCFC31721EF27145 mergeSort: 3390033 C8B8D3A2B4D9882ABCFC31721EF27145 radixSort: 3067271 C8B8D3A2B4D9882ABCFC31721EF27145 phoboSort: 1035827 C8B8D3A2B4D9882ABCFC31721EF27145 tim Sort: 240896 C8B8D3A2B4D9882ABCFC31721EF27145 Complete Shuffle Is length: 16777216 xinokSort: 7607511 C8B8D3A2B4D9882ABCFC31721EF27145 shellSort: 8814887 C8B8D3A2B4D9882ABCFC31721EF27145 quickSort: 5623837 C8B8D3A2B4D9882ABCFC31721EF27145 mergeSort: 6704733 C8B8D3A2B4D9882ABCFC31721EF27145 radixSort: 2825567 C8B8D3A2B4D9882ABCFC31721EF27145 phoboSort: 5532275 C8B8D3A2B4D9882ABCFC31721EF27145 tim Sort: 29142913 E03C4778321F69D8AD10F624E6093599 Small Shuffle (1024 pairs were picked at random and swapped) Is length: 16777216 xinokSort: 589882 C8B8D3A2B4D9882ABCFC31721EF27145 shellSort: 3651222 C8B8D3A2B4D9882ABCFC31721EF27145 quickSort: 2655391 C8B8D3A2B4D9882ABCFC31721EF27145 mergeSort: 2162840 C8B8D3A2B4D9882ABCFC31721EF27145 radixSort: 2988630 C8B8D3A2B4D9882ABCFC31721EF27145 phoboSort: 963103 C8B8D3A2B4D9882ABCFC31721EF27145 tim Sort: 6523251 C8B8D3A2B4D9882ABCFC31721EF27145 Large Shuffle (1,048,576 pairs were picked at random and swapped) Is length: 16777216 xinokSort: 1956126 C8B8D3A2B4D9882ABCFC31721EF27145 shellSort: 7617207 C8B8D3A2B4D9882ABCFC31721EF27145 quickSort: 3089674 C8B8D3A2B4D9882ABCFC31721EF27145 mergeSort: 2606200 C8B8D3A2B4D9882ABCFC31721EF27145 radixSort: 2932306 C8B8D3A2B4D9882ABCFC31721EF27145 phoboSort: 1619484 C8B8D3A2B4D9882ABCFC31721EF27145 tim Sort: 14883251 DE54E94D1A058459FEA3A80096FCBB18 Small Shuffle + Descending Is length: 16777216 xinokSort: 2288869 C8B8D3A2B4D9882ABCFC31721EF27145 shellSort: 4599417 C8B8D3A2B4D9882ABCFC31721EF27145 quickSort: 2604222 C8B8D3A2B4D9882ABCFC31721EF27145 mergeSort: 3457241 C8B8D3A2B4D9882ABCFC31721EF27145 radixSort: 3055080 C8B8D3A2B4D9882ABCFC31721EF27145 phoboSort: 261768564 C8B8D3A2B4D9882ABCFC31721EF27145 tim Sort: 10149160 C8B8D3A2B4D9882ABCFC31721EF27145 Large Shuffle + Descending xinokSort: 2923813 C8B8D3A2B4D9882ABCFC31721EF27145 shellSort: 7678966 C8B8D3A2B4D9882ABCFC31721EF27145 quickSort: 3833138 C8B8D3A2B4D9882ABCFC31721EF27145 mergeSort: 3971124 C8B8D3A2B4D9882ABCFC31721EF27145 radixSort: 2941206 C8B8D3A2B4D9882ABCFC31721EF27145 phoboSort: 3749512 C8B8D3A2B4D9882ABCFC31721EF27145 tim Sort: 22709918 191BD30B3D0E52C922A7E6A16E5E63A5
Oct 07 2011
prev sibling parent reply Xinok <xinok live.com> writes:
On 10/7/2011 12:42 PM, Xinok wrote:
 Hi, it's been years since I've posted here. I just wanted to share
 something I worked on, a new sorting algorithm. It's a variant of merge
 sort, but much more memory efficient with only a small loss in
 performance. The most similar algorithm I know of is Timsort.

 I had need for a stable sorting algorithm, but the performance of stable
 sort in Phobos is terrible. This pretty much left merge sort, which has
 good performance but requires O(n) space. That persuaded me to design my
 own sorting algorithm.

 Here, you can find the code, details of the algorithm, and benchmarks
 (introSort is the unstable sort in Phobos).
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/

To follow up on my original post, I wrote a text file which explains the algorithm in detail. The most important thing to understand is the "range swap", which I tried to explain as simply as possible. http://cl.ly/0H193k2s0G2T1A002v3I/xinokSort.txt
Oct 08 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/8/11 10:11 AM, Xinok wrote:
 On 10/7/2011 12:42 PM, Xinok wrote:
 Hi, it's been years since I've posted here. I just wanted to share
 something I worked on, a new sorting algorithm. It's a variant of merge
 sort, but much more memory efficient with only a small loss in
 performance. The most similar algorithm I know of is Timsort.

 I had need for a stable sorting algorithm, but the performance of stable
 sort in Phobos is terrible. This pretty much left merge sort, which has
 good performance but requires O(n) space. That persuaded me to design my
 own sorting algorithm.

 Here, you can find the code, details of the algorithm, and benchmarks
 (introSort is the unstable sort in Phobos).
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/

To follow up on my original post, I wrote a text file which explains the algorithm in detail. The most important thing to understand is the "range swap", which I tried to explain as simply as possible. http://cl.ly/0H193k2s0G2T1A002v3I/xinokSort.txt

Nice writeup, but I found it quite difficult to get into. What would help is anchoring it with already known stuff (if it's not, the reader must assume it's unrelated, which makes things difficult). So it would be great if you compared and contrasted range swap with the in-place merge algorithm (e.g. http://www.sgi.com/tech/stl/inplace_merge.html), STL's stable sort (http://www.sgi.com/tech/stl/stable_sort.html) which is O(N log(N) log(N)), and possibly with std.algorithm.bringToFront. Simply presenting a stylized implementation of swap range would be helpful. Also there are a few oddities in the text: * "- Constant additional memory (one memory allocation per thread)" -> the parenthesis does not sustain the point. There could be one memory allocation but it might allocate a non-constant amount. * All discussion about tail call optimization is unneeded. Tail calls can be converted trivially to loops, so don't mention anything. Feel free to convert to loops if needed. Andrei
Oct 08 2011
parent reply Xinok <xinok live.com> writes:
On 10/8/2011 12:47 PM, Andrei Alexandrescu wrote:
 Nice writeup, but I found it quite difficult to get into. What would
 help is anchoring it with already known stuff (if it's not, the reader
 must assume it's unrelated, which makes things difficult). So it would
 be great if you compared and contrasted range swap with the in-place
 merge algorithm (e.g. http://www.sgi.com/tech/stl/inplace_merge.html),
 STL's stable sort (http://www.sgi.com/tech/stl/stable_sort.html) which
 is O(N log(N) log(N)), and possibly with std.algorithm.bringToFront.

 Simply presenting a stylized implementation of swap range would be helpful.

I didn't mean for this text to be anything official. I just felt it was important to provide an explanation of my algorithm so others could better understand my algorithm and it's implications. That's all. There's also the issue of, "what if I'm not the first?" I couldn't find anything similar to the "range swap", but that doesn't mean it didn't already exist. Writing papers isn't my forte, I'm a self taught programmer. So if my algorithm ever gains popularity, I'll let the experts deal with it.
 Also there are a few oddities in the text:

 * "- Constant additional memory (one memory allocation per thread)" ->
 the parenthesis does not sustain the point. There could be one memory
 allocation but it might allocate a non-constant amount.

I thought it was important to clarify that. My algorithm is easy to parallelize, but it does require allocating a unique block of memory for each thread. It is relatively constant as well, as it would make sense that the number of running threads matches the number of cores in the hardware. The only reason to allocate a non-constant amount is if you include the optimization I mentioned, to allocate O(n/1024) space.
 * All discussion about tail call optimization is unneeded. Tail calls
 can be converted trivially to loops, so don't mention anything. Feel
 free to convert to loops if needed.

I think it's an issue worth addressing though. Some programmers might assume that, because it's a variant of merge sort, stack overflows won't be an issue. When I originally implemented my algorithm, I didn't use tail calls and I had problems with stack overflows on partially sorted data. So it is an issue.
Oct 08 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/8/11 12:30 PM, Xinok wrote:
 On 10/8/2011 12:47 PM, Andrei Alexandrescu wrote:
 Nice writeup, but I found it quite difficult to get into. What would
 help is anchoring it with already known stuff (if it's not, the reader
 must assume it's unrelated, which makes things difficult). So it would
 be great if you compared and contrasted range swap with the in-place
 merge algorithm (e.g. http://www.sgi.com/tech/stl/inplace_merge.html),
 STL's stable sort (http://www.sgi.com/tech/stl/stable_sort.html) which
 is O(N log(N) log(N)), and possibly with std.algorithm.bringToFront.

 Simply presenting a stylized implementation of swap range would be
 helpful.

I didn't mean for this text to be anything official. I just felt it was important to provide an explanation of my algorithm so others could better understand my algorithm and it's implications. That's all. There's also the issue of, "what if I'm not the first?" I couldn't find anything similar to the "range swap", but that doesn't mean it didn't already exist. Writing papers isn't my forte, I'm a self taught programmer. So if my algorithm ever gains popularity, I'll let the experts deal with it.

My attempt was to make a few suggestions that would improve your writeup, which in turn not only helps popularity of the algorithm, but also you to hone a useful skill.
 Also there are a few oddities in the text:

 * "- Constant additional memory (one memory allocation per thread)" ->
 the parenthesis does not sustain the point. There could be one memory
 allocation but it might allocate a non-constant amount.

I thought it was important to clarify that. My algorithm is easy to parallelize, but it does require allocating a unique block of memory for each thread. It is relatively constant as well, as it would make sense that the number of running threads matches the number of cores in the hardware. The only reason to allocate a non-constant amount is if you include the optimization I mentioned, to allocate O(n/1024) space.

Then you may want to say: * Constant additional memory and one call to an allocation routine per thread.
 * All discussion about tail call optimization is unneeded. Tail calls
 can be converted trivially to loops, so don't mention anything. Feel
 free to convert to loops if needed.

I think it's an issue worth addressing though. Some programmers might assume that, because it's a variant of merge sort, stack overflows won't be an issue. When I originally implemented my algorithm, I didn't use tail calls and I had problems with stack overflows on partially sorted data. So it is an issue.

No, it's not. Please think through: tail recursion IS A loop, period. (BTW I see you use simple tail recursion as opposed to the more general tail calls.) It's cut and dried. So no discussion is needed. Just mention there is no risk of stack overflow, without ever mentioning tail calls. In fact there's not even a need to mention that because otherwise the algorithm would have non-constant space complexity, which you clarify it doesn't. Andrei
Oct 08 2011
parent Xinok <xinok live.com> writes:
On 10/8/2011 1:56 PM, Andrei Alexandrescu wrote:
 On 10/8/11 12:30 PM, Xinok wrote:
 I didn't mean for this text to be anything official. I just felt it was
 important to provide an explanation of my algorithm so others could
 better understand my algorithm and it's implications. That's all.
 There's also the issue of, "what if I'm not the first?" I couldn't find
 anything similar to the "range swap", but that doesn't mean it didn't
 already exist.

 Writing papers isn't my forte, I'm a self taught programmer. So if my
 algorithm ever gains popularity, I'll let the experts deal with it.

My attempt was to make a few suggestions that would improve your writeup, which in turn not only helps popularity of the algorithm, but also you to hone a useful skill.

Thanks for that. :) I can't update the original link, so the text is what it is. I'll look into making a better write-up of my algorithm and posting it somewhere more permanent.
 Also there are a few oddities in the text:

 * "- Constant additional memory (one memory allocation per thread)" ->
 the parenthesis does not sustain the point. There could be one memory
 allocation but it might allocate a non-constant amount.

I thought it was important to clarify that. My algorithm is easy to parallelize, but it does require allocating a unique block of memory for each thread. It is relatively constant as well, as it would make sense that the number of running threads matches the number of cores in the hardware. The only reason to allocate a non-constant amount is if you include the optimization I mentioned, to allocate O(n/1024) space.

Then you may want to say: * Constant additional memory and one call to an allocation routine per thread.

I'll do that.
 I think it's an issue worth addressing though. Some programmers might
 assume that, because it's a variant of merge sort, stack overflows won't
 be an issue. When I originally implemented my algorithm, I didn't use
 tail calls and I had problems with stack overflows on partially sorted
 data. So it is an issue.

No, it's not. Please think through: tail recursion IS A loop, period. (BTW I see you use simple tail recursion as opposed to the more general tail calls.) It's cut and dried. So no discussion is needed. Just mention there is no risk of stack overflow, without ever mentioning tail calls. In fact there's not even a need to mention that because otherwise the algorithm would have non-constant space complexity, which you clarify it doesn't.

Take a look at the Wikipedia page for quick sort. It mentions issues with the algorithm, such as using the first element as the pivot causes worst-case behavior on sorted data, and even something I didn't know about is integer overflow when choosing a pivot. It also mentions tail calls / tail recursion five times. https://secure.wikimedia.org/wikipedia/en/wiki/Quick_sort#Implementation_issues I'll reduce talk about tail calls, I'll probably just list it under optimizations, but I'm not removing it completely. Otherwise, I'll make a note that, depending on the implementation, the range swap can result in a stack overflow, but I won't mention tail calls / tail recursion as a solution.
Oct 08 2011
prev sibling parent Xinok <xinok live.com> writes:
On 10/8/2011 11:11 AM, Xinok wrote:
 On 10/7/2011 12:42 PM, Xinok wrote:
 Hi, it's been years since I've posted here. I just wanted to share
 something I worked on, a new sorting algorithm. It's a variant of merge
 sort, but much more memory efficient with only a small loss in
 performance. The most similar algorithm I know of is Timsort.

 I had need for a stable sorting algorithm, but the performance of stable
 sort in Phobos is terrible. This pretty much left merge sort, which has
 good performance but requires O(n) space. That persuaded me to design my
 own sorting algorithm.

 Here, you can find the code, details of the algorithm, and benchmarks
 (introSort is the unstable sort in Phobos).
 http://www.neowin.net/forum/blog/422/entry-3737-sort-algorithm-complete/

To follow up on my original post, I wrote a text file which explains the algorithm in detail. The most important thing to understand is the "range swap", which I tried to explain as simply as possible. http://cl.ly/0H193k2s0G2T1A002v3I/xinokSort.txt

And to follow up on this post, I created a permanent page for xinok sort on Sourceforge. There's nothing new to see on this page yet, but I'll work on that in due time. https://sourceforge.net/projects/xinoksort/ My first goal is to write an implementation which I can contribute as the new stable sorting algorithm for Phobos. This includes string mixins, ranges, asserts, unittests, and anything else that is required. If anybody wants to assist me with this, I welcome the help.
Oct 09 2011