## digitalmars.D - Timsort vs some others

- bearophile (7/7) Dec 17 2012 Regarding the recent Phobos im...
- Xinok (25/33) Dec 17 2012 While Smoothsort may be mathem...
- Peter Alexander (5/10) Dec 18 2012 Different implementations use ...
- Andrei Alexandrescu (41/49) Dec 18 2012 We should use a deferred pivot...
- Xinok (13/21) Dec 18 2012 Perhaps a "median of log n" is...
- Andrei Alexandrescu (7/27) Dec 18 2012 You don't need to choose a med...
- bearophile (11/13) Dec 18 2012 Do you mean to use it when the...
- Andrei Alexandrescu (4/15) Dec 18 2012 That's a commonly used approac...
- ixid (5/47) Dec 18 2012 A while ago in another thread ...
- Andrei Alexandrescu (3/6) Dec 18 2012 Not that I know of....
- Philippe Sigaud (4/10) Dec 19 2012 My bad, that was me and I got ...
- Xinok (4/7) Dec 18 2012 I don't think it would work we...
- Andrei Alexandrescu (3/8) Dec 18 2012 I mostly fear cache touching i...
- Xinok (37/49) Dec 21 2012 I based my little experiment o...
- Joseph Rushton Wakeling (5/10) Dec 18 2012 ... but I would guess that giv...
- bearophile (4/7) Dec 18 2012 Why?...
- Joseph Rushton Wakeling (6/7) Dec 18 2012 Because if you have to allocat...
- deadalnix (4/12) Dec 18 2012 Unless you have the data someh...
- Joseph Rushton Wakeling (6/8) Dec 19 2012 I was probably imprecise with ...
- Xinok (28/42) Dec 18 2012 Heap sort actually performs re...

Regarding the recent Phobos improvements that introduce a Timsort:
http://forum.dlang.org/thread/50c8a4e67f79_3fdd19b7ae814691e sh3.rs.github.com.mail

I have found a blog post that compares the performance of Timsort, Smoothsort, and std::stable_sort:
http://www.altdevblogaday.com/2012/06/15/smoothsort-vs-timsort/

Bye,
bearophile

Dec 17 2012

On Monday, 17 December 2012 at 15:28:36 UTC, bearophile wrote:
> Regarding the recent Phobos improvements that introduce a Timsort: http://forum.dlang.org/thread/50c8a4e67f79_3fdd19b7ae814691e sh3.rs.github.com.mail
>
> I have found a blog post that compares the performance of Timsort, Smoothsort, and std::stable_sort: http://www.altdevblogaday.com/2012/06/15/smoothsort-vs-timsort/
>
> Bye, bearophile

While Smoothsort may be mathematically sound, it simply doesn't translate well to computer hardware. It's a variant of heap sort, which requires a great deal of random access, whereas Timsort is a variant of merge sort, which is largely sequential and benefits from the CPU cache. Furthermore, the Leonardo heap is much more computationally expensive than a typical binary or ternary heap.

Both Timsort and Smoothsort are what are called "natural sorts," meaning they typically require fewer comparisons on data with low entropy (e.g. partially sorted data). They're also rather complex, which means added overhead. When sorting primitive types like int, comparisons are inexpensive, so the overhead makes these algorithms slower. But had he tested them on a data type like strings, we'd likely see Timsort take the lead. On purely random data, quick sort and merge sort will win most of the time.

But Timsort has an advantage over Smoothsort in being an adaptive algorithm: the so-called "galloping mode," which is computationally expensive, is only activated when it is likely to be beneficial, i.e. once one run has supplied minGallop elements in a row during a merge. Otherwise, a simple linear merge is used (i.e. merge sort).

On another note, I highly doubt that std::sort uses a "median of medians" algorithm, which would add much overhead and essentially double the number of comparisons required with little to no benefit. More likely, it simply chooses the pivot from a median of three.
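To make the galloping idea concrete, here is a minimal D sketch of the search that galloping mode performs: probe exponentially growing offsets, then binary-search the last gap. `gallopSearch` is a hypothetical name; this is not the Phobos Timsort code, and a real Timsort also starts from a hint position and adjusts minGallop as it goes.

```d
/// Hypothetical helper (not Phobos API): returns how many leading elements of
/// the sorted slice `run` are <= `key` (the "upper bound" of key).
/// Cost is O(log k) comparisons for an answer of k, instead of k for a linear scan.
size_t gallopSearch(T)(T[] run, T key)
{
    // Exponential phase: probe indices 0, 2, 6, 14, ... (hi - 1 for hi = 1, 3, 7, 15, ...).
    size_t lo = 0; // invariant: run[0 .. lo] are all <= key
    size_t hi = 1;
    while (hi <= run.length && !(key < run[hi - 1]))
    {
        lo = hi;         // run[0 .. hi] are all <= key
        hi = 2 * hi + 1;
    }
    if (hi > run.length)
        hi = run.length;

    // Binary phase: the answer lies in [lo, hi].
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (key < run[mid])
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}
```

During a merge, the returned count says how many elements of one run can be moved as a single block before an element of the other run must be taken. When runs interleave finely, the plain one-at-a-time merge is cheaper, which is why galloping is only switched on after a streak of wins.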

Dec 17 2012

On Tuesday, 18 December 2012 at 06:52:27 UTC, Xinok wrote:
> On another note, I highly doubt that std::sort uses a "median of medians" algorithm, which would add much overhead and essentially double the number of comparisons required with little to no benefit. More likely, it simply chooses the pivot from a median of three.

Different implementations use different strategies. libstdc++ seems to use median of 3. The Dinkumware standard library (which ships with MSVC) uses median of 9.
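For comparison, a median of 3 and a median of 9 (Tukey's "ninther", the median of three medians of three) might look like the D sketch below. The function names and the evenly spaced sample positions are assumptions for illustration; this is not taken from libstdc++ or Dinkumware, whose exact sampling positions may differ.

```d
/// Returns the index (one of i, j, k) of the median of a[i], a[j], a[k].
/// Uses at most three comparisons and does not move any element.
size_t medianOf3(T)(T[] a, size_t i, size_t j, size_t k)
{
    if (a[i] < a[j])
    {
        if (a[j] < a[k]) return j;    // a[i] < a[j] < a[k]
        return (a[i] < a[k]) ? k : i; // a[k] <= a[j]: median is max(a[i], a[k])
    }
    else
    {
        if (a[i] < a[k]) return i;    // a[j] <= a[i] < a[k]
        return (a[j] < a[k]) ? k : j; // a[k] <= a[i]: median is max(a[j], a[k])
    }
}

/// Tukey's ninther: median of the medians of three evenly spaced triples,
/// i.e. a 9-element sample. Requires a.length >= 9.
size_t ninther(T)(T[] a)
{
    assert(a.length >= 9);
    immutable size_t s = (a.length - 1) / 8; // spacing between the 9 sample points
    immutable m1 = medianOf3(a, 0, s, 2 * s);
    immutable m2 = medianOf3(a, 3 * s, 4 * s, 5 * s);
    immutable m3 = medianOf3(a, 6 * s, 7 * s, 8 * s);
    return medianOf3(a, m1, m2, m3);
}
```

The ninther costs at most 12 comparisons, against 3 for a median of 3, which is why implementations tend to reserve it for larger ranges.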

Dec 18 2012

On 12/18/12 5:50 AM, Peter Alexander wrote:
> On Tuesday, 18 December 2012 at 06:52:27 UTC, Xinok wrote:
>> On another note, I highly doubt that std::sort uses a "median of medians" algorithm, which would add much overhead and essentially double the number of comparisons required with little to no benefit. More likely, it simply chooses the pivot from a median of three.
>
> Different implementations use different strategies. libstdc++ seems to use median of 3. The Dinkumware standard library (which ships with MSVC) uses median of 9.

We should use a deferred pivot algorithm that I thought of a long time ago but never got around to testing. One thing about current pivot selection approaches is that they decide on a strategy (middle, median of 3, median of 9, random, etc.) _before_ ever looking at the data. A different approach would be to take a look at a bit of data and _then_ decide what the pivot is. Consider the following approach:

```d
size_t partition(T)(T[] d)
{
    import std.algorithm : swap;
    assert(d.length);
    size_t a = 0, z = d.length - 1;
    // maxa/minz track the largest element on the left and the smallest on the right
    auto maxa = a, minz = z;
    for (; a < z && d[maxa] <= d[minz];)
    {
        if (d[a] > d[z]) { swap(d[a], d[z]); }
        if (d[maxa] < d[++a]) maxa = a;
        if (d[minz] > d[--z]) minz = z;
    }
    --a; ++z;
    /* Here data is already partitioned wrt d[a] or d[z]. If enough progress
       has been made (i.e. a is large enough compared to d.length), choose one
       of these as the pivot and only partition d[a .. z + 1]. Otherwise, use
       a classic pivot choice criterion. */
    ...
}
```

Another approach I wanted to try was to choose the median of K with K increasing with the size of the array. This is because a good pivot is more important for large arrays than for small arrays. As such, a possibility would be to simply sort a stride of the input (std.range.stride is random access and can be sorted right away without any particular implementation effort) and then choose the middle of the stride as the pivot. If anyone has the time and inclination, have at it!

Andrei
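The stride idea could be sketched like this in D. It is only an illustration under stated assumptions: `stridePivot` is a hypothetical name, the sample size k is taken as floor(log2 n) forced odd (in the spirit of the "median of log n" discussed in this thread), and the strided positions are sorted with a small hand-written insertion sort instead of calling `sort` on `std.range.stride`, to keep the example free of Phobos range details. The sample stays sorted in place, which is the "progress toward the end goal".

```d
import std.algorithm.mutation : swap;
import core.bitop : bsr;

/// Hypothetical helper: sorts the sample d[0], d[step], ..., d[(k-1)*step]
/// in place and returns the index (in d) of the sample's middle element.
/// k = floor(log2(d.length)) forced odd, so the middle is unique.
size_t stridePivot(T)(T[] d)
{
    assert(d.length > 0);
    size_t k = bsr(d.length) | 1;         // bsr = index of highest set bit
    if (k > d.length) k = d.length;       // defensive; cannot happen for length >= 1
    immutable size_t step = d.length / k; // >= 1 because k <= d.length

    // Insertion sort over the strided positions only.
    foreach (i; 1 .. k)
    {
        for (size_t j = i; j > 0 && d[j * step] < d[(j - 1) * step]; --j)
            swap(d[j * step], d[(j - 1) * step]);
    }
    // The sample is now sorted, so its middle element is the sample median.
    return (k / 2) * step;
}
```

For 1024*1024 elements this samples 21 positions, the same sample size Xinok's experiment later in the thread ends up with.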

Dec 18 2012

On Tuesday, 18 December 2012 at 15:55:17 UTC, Andrei Alexandrescu wrote:
> Another approach I wanted to try was to choose the median of K with K increasing with the size of the array. This is because a good pivot is more important for large arrays than for small arrays. As such, a possibility would be to simply sort a stride of the input (std.range.stride is random access and can be sorted right away without any particular implementation effort) and then choose the middle of the stride as the pivot. If anyone has the time and inclination, have at it!

Perhaps a "median of log n" is in order, but the trouble is finding an algorithm for picking the median from n elements. The so-called "median of medians" algorithm can choose a pivot that lies between roughly the 30th and 70th percentiles. Otherwise, there doesn't seem to be any algorithm for choosing the absolute median other than, say, an insertion sort. I came up with this clever little guy a while ago, which I used in my implementation of quick sort: http://dpaste.dzfl.pl/b85e7ad8 I would love to enhance it to work on a variable number of elements, but from what I can comprehend, the result is essentially a partial heap sort.

Dec 18 2012

On 12/18/12 8:42 PM, Xinok wrote:
> On Tuesday, 18 December 2012 at 15:55:17 UTC, Andrei Alexandrescu wrote:
>> Another approach I wanted to try was to choose the median of K with K increasing with the size of the array. This is because a good pivot is more important for large arrays than for small arrays. As such, a possibility would be to simply sort a stride of the input (std.range.stride is random access and can be sorted right away without any particular implementation effort) and then choose the middle of the stride as the pivot. If anyone has the time and inclination, have at it!
>
> Perhaps a "median of log n" is in order,

Yah I thought so!

> but the trouble is finding an algorithm for picking the median from n elements. The so called "median of medians" algorithm can choose a pivot within 30-70% of the range of the median. Otherwise, there doesn't seem to be any algorithm for choosing the absolute median other than, say, an insertion sort.

You don't need to choose a median - just sort the data (thereby making progress toward the end goal) and choose the middle element.

> I came up with this clever little guy a while ago which I used in my implementation of quick sort: http://dpaste.dzfl.pl/b85e7ad8 I would love to enhance it to work on a variable number of elements, but from what I can comprehend, the result is essentially a partial heap sort.

A very efficient sort for various small fixed sizes is a great complement for quicksort. Andrei

Dec 18 2012

Andrei Alexandrescu:
> A very efficient sort for various small fixed sizes is a great complement for quicksort.

Do you mean to use it when the input is very short, or when the QuickSort recursion has produced a very small subarray? In the first case it's useful, but in the second case I've found it more efficient (maybe not for huge arrays, but for normal arrays in RAM) not to call a specialized sort on such small subarrays. Instead, just stop the QuickSort recursion, leaving an array that is unsorted only locally, and then run one insertion sort over the whole data. Bye, bearophile
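bearophile's alternative can be sketched in D as follows: quicksort stops recursing once a subarray is at most `cutoff` elements long, so every element ends up fewer than `cutoff` positions from its final place, and a single insertion sort over the whole array then finishes in roughly O(n * cutoff) time. The cutoff of 16, the median-of-three Lomuto partition, and all names are assumptions chosen for brevity, not Phobos code.

```d
import std.algorithm.mutation : swap;

enum size_t cutoff = 16; // hypothetical threshold: shorter subarrays are left unsorted

/// Quicksort that leaves subarrays of length <= cutoff unsorted.
void roughQuicksort(T)(T[] a)
{
    while (a.length > cutoff)
    {
        // Order a[0], a[mid], a[$-1], then move the median to the end as pivot.
        immutable mid = a.length / 2;
        if (a[mid] < a[0]) swap(a[mid], a[0]);
        if (a[$ - 1] < a[0]) swap(a[$ - 1], a[0]);
        if (a[$ - 1] < a[mid]) swap(a[$ - 1], a[mid]);
        swap(a[mid], a[$ - 1]);
        auto pivot = a[$ - 1];

        // Lomuto partition: elements < pivot go to the front.
        size_t store = 0;
        foreach (i; 0 .. a.length - 1)
        {
            if (a[i] < pivot) { swap(a[i], a[store]); ++store; }
        }
        swap(a[store], a[$ - 1]); // pivot lands in its final position

        // Recurse on the smaller side, loop on the larger one (bounded stack depth).
        if (store < a.length - store - 1)
        {
            roughQuicksort(a[0 .. store]);
            a = a[store + 1 .. $];
        }
        else
        {
            roughQuicksort(a[store + 1 .. $]);
            a = a[0 .. store];
        }
    }
}

/// Plain insertion sort; cheap here because elements only move locally.
void insertionSort(T)(T[] a)
{
    foreach (i; 1 .. a.length)
        for (size_t j = i; j > 0 && a[j] < a[j - 1]; --j)
            swap(a[j], a[j - 1]);
}

/// One rough quicksort pass, then one insertion sort over all the data.
void hybridSort(T)(T[] a)
{
    roughQuicksort(a);
    insertionSort(a);
}
```

Andrei's suggestion corresponds to calling a fast fixed-size sort inside the loop when `a.length <= cutoff` instead of deferring that work to the final pass.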

Dec 18 2012

On 12/18/12 9:21 PM, bearophile wrote:
> Andrei Alexandrescu:
>> A very efficient sort for various small fixed sizes is a great complement for quicksort.
>
> Do you mean to use it when the input is very short, or when the QuickSort recursion has produced a very small subarray?

The latter.

> In the first case it's useful, but in the second case I've seen it's more efficient (maybe not for huge arrays, but it is more efficient for normal arrays in RAM) to not call a specialized sort for such small sub-arrays, and instead just stop the QuickSort recursion and produce a locally unsorted array, and then call an insertion sort on the whole data.

That's a commonly used approach, but I think it can be improved. Andrei

Dec 18 2012

On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei Alexandrescu wrote:
> On 12/18/12 8:42 PM, Xinok wrote:
>> On Tuesday, 18 December 2012 at 15:55:17 UTC, Andrei Alexandrescu wrote:
>>> Another approach I wanted to try was to choose the median of K with K increasing with the size of the array. This is because a good pivot is more important for large arrays than for small arrays. As such, a possibility would be to simply sort a stride of the input (std.range.stride is random access and can be sorted right away without any particular implementation effort) and then choose the middle of the stride as the pivot. If anyone has the time and inclination, have at it!
>>
>> Perhaps a "median of log n" is in order,
>
> Yah I thought so!
>
>> but the trouble is finding an algorithm for picking the median from n elements. The so called "median of medians" algorithm can choose a pivot within 30-70% of the range of the median. Otherwise, there doesn't seem to be any algorithm for choosing the absolute median other than, say, an insertion sort.
>
> You don't need to choose a median - just sort the data (thereby making progress toward the end goal) and choose the middle element.
>
>> I came up with this clever little guy a while ago which I used in my implementation of quick sort: http://dpaste.dzfl.pl/b85e7ad8 I would love to enhance it to work on a variable number of elements, but from what I can comprehend, the result is essentially a partial heap sort.
>
> A very efficient sort for various small fixed sizes is a great complement for quicksort. Andrei

A while ago in another thread about sorting I believe you mentioned the possibility of having templated sorting networks for small numbers of items, did that idea come to anything?
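As a rough idea of what a templated sorting network could look like in present-day D, the sketch below takes the comparator pairs as a compile-time parameter and unrolls them with `static foreach` (a feature added to D after this thread). `networkSort` and `net4` are hypothetical names, and only the standard 5-comparator network for 4 elements is shown.

```d
import std.algorithm.mutation : swap;

/// Hypothetical "templated sorting network": `pairs` is a compile-time list of
/// comparator index pairs, and `static foreach` unrolls it into a straight
/// sequence of compare-exchange steps with no loop overhead.
void networkSort(alias pairs, T, size_t n)(ref T[n] a)
{
    static foreach (p; pairs)
    {
        static assert(p[0] < p[1] && p[1] < n, "bad comparator");
        // Compare-exchange: afterwards a[p[0]] <= a[p[1]].
        if (a[p[1]] < a[p[0]])
            swap(a[p[0]], a[p[1]]);
    }
}

/// The standard 5-comparator sorting network for 4 inputs.
enum size_t[2][] net4 = [[0, 1], [2, 3], [0, 2], [1, 3], [1, 2]];
```

Usage is `networkSort!net4(arr)` on an `int[4]`. Larger sizes would only need longer pair lists, which have to come from a network known to be correct.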

Dec 18 2012

On 12/18/12 10:35 PM, ixid wrote:
> A while ago in another thread about sorting I believe you mentioned the possibility of having templated sorting networks for small numbers of items, did that idea come to anything?

Not that I know of. Andrei

Dec 18 2012

On Wed, Dec 19, 2012 at 6:47 AM, Andrei Alexandrescu < SeeWebsiteForEmail erdani.org> wrote:
> On 12/18/12 10:35 PM, ixid wrote:
>> A while ago in another thread about sorting I believe you mentioned the possibility of having templated sorting networks for small numbers of items, did that idea come to anything?
>
> Not that I know of.

My bad, that was me and I got sidetracked. I'll modify std.algo.sort to see if I get any speed-up.

Dec 19 2012

On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei Alexandrescu wrote:
> You don't need to choose a median - just sort the data (thereby making progress toward the end goal) and choose the middle element.

I don't think it would work well in practice, but I'll put something together to see if the idea does have merit.

Dec 18 2012

On 12/18/12 11:37 PM, Xinok wrote:
> On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei Alexandrescu wrote:
>> You don't need to choose a median - just sort the data (thereby making progress toward the end goal) and choose the middle element.
>
> I don't think it would work well in practice, but I'll put something together to see if the idea does have merit.

I mostly fear cache touching issues. Andrei

Dec 18 2012

On Wednesday, 19 December 2012 at 05:48:04 UTC, Andrei Alexandrescu wrote:
> On 12/18/12 11:37 PM, Xinok wrote:
>> On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei Alexandrescu wrote:
>>> You don't need to choose a median - just sort the data (thereby making progress toward the end goal) and choose the middle element.
>>
>> I don't think it would work well in practice, but I'll put something together to see if the idea does have merit.
>
> I mostly fear cache touching issues. Andrei

I based my little experiment on my 'unstablesort' module, located here: https://github.com/Xinok/XSort/blob/master/unstablesort.d

The results (sorting a random array of 1024*1024 uints):

- Median of five: 142 ms, 21627203 comparisons
- Median of log n: 152 ms, 20778577 comparisons

The code:

```d
size_t choosePivot(R)(R range)
{
    import std.algorithm : swap;
    import std.math : log2;

    // Reserve a slice of the range (about log2(n) elements, forced odd) for choosing the pivot
    R sub = range[0 .. cast(uint)log2(cast(real) range.length) | 1];

    // Pull in equally distributed elements
    swap(sub[sub.length - 1], range[range.length - 1]);
    foreach (i; 1 .. sub.length - 1)
        swap(sub[i], range[range.length / sub.length * i]);

    // Sort the sublist to choose the pivot (insertionSort is assumed to be the unstablesort helper)
    insertionSort(sub);
    immutable piv = sub.length / 2;

    // Move the sample elements above the pivot to the end of the range
    sub = sub[piv + 1 .. sub.length];
    foreach (i; 0 .. sub.length)
        swap(sub[i], range[range.length - sub.length + i]);

    // Return index of pivot
    return piv;
}
```

My thoughts: while the idea does have merit, I think the median of five does a good job as it is. If you're interested in reducing the number of comparisons, replacing "optimisticInsertionSort" in std.algorithm with a binary insertion sort will do much more to achieve that goal. And if you're interested in a guaranteed O(n log n) worst-case running time, then add heap sort as a fall-back algorithm, as I did in my module (I actually plan to do this myself ... eventually).
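A binary insertion sort, as suggested above, might be sketched like this. `binaryInsertionSort` is a hypothetical generic version, not the code from XSort or Phobos: it uses about log2(i) comparisons to place element i, but still moves O(n^2) elements in the worst case, so it helps most when comparisons are expensive.

```d
/// Binary insertion sort: binary-search each element's slot in the sorted
/// prefix, then shift the larger elements right by one position.
void binaryInsertionSort(T)(T[] a)
{
    foreach (i; 1 .. a.length)
    {
        auto x = a[i];
        // Upper bound of x in a[0 .. i]: the first element > x. Inserting
        // after equal elements keeps the sort stable.
        size_t lo = 0, hi = i;
        while (lo < hi)
        {
            immutable mid = lo + (hi - lo) / 2;
            if (x < a[mid])
                hi = mid;
            else
                lo = mid + 1;
        }
        // Shift a[lo .. i] one slot to the right and drop x into place.
        for (size_t j = i; j > lo; --j)
            a[j] = a[j - 1];
        a[lo] = x;
    }
}
```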

Dec 21 2012

On 12/18/2012 07:52 AM, Xinok wrote:
> While Smoothsort may be mathematically sound, it simply doesn't translate well to computer hardware. It's a variant of heap sort which requires a great deal of random access, whereas Timsort is a variant of merge sort which is largely sequential and benefits from the CPU cache. Furthermore, the Leonardo heap is much more computationally expensive than a typical binary or ternary heap.

... but I would guess that, given its O(1) memory requirements, it probably scales much better to sorting really, really large data, no? That was surely a much, much bigger issue back in 1981 when Dijkstra proposed it, but it still has a place today.

Dec 18 2012

Joseph Rushton Wakeling:
> ... but I would guess that given the O(1) memory requirements it probably scales much better to sorting really, really large data, no?

Why? Bye, bearophile

Dec 18 2012

On 12/18/2012 04:30 PM, bearophile wrote:
> Why?

Because if you have to allocate O(n) memory for another algorithm, that might either give you a memory hit that you can't take (less likely these days, to be fair), or simply take a large amount of time to allocate that degrades the performance. Happy to learn if my guess is wrong, though.

Dec 18 2012

On Tuesday, 18 December 2012 at 15:41:28 UTC, Joseph Rushton Wakeling wrote:
> On 12/18/2012 04:30 PM, bearophile wrote:
>> Why?
>
> Because if you have to allocate O(n) memory for another algorithm, that might either give you a memory hit that you can't take (less likely these days, to be fair), or simply take a large amount of time to allocate that degrades the performance. Happy to learn if my guess is wrong, though.

Unless you have the data somehow presorted, or you get them one by one, other sorts are faster.

Dec 18 2012

On 12/19/2012 06:00 AM, deadalnix wrote:
> Unless you have the data somehow presorted, or you get them one by one, other sort are faster.

I was probably imprecise with my use of the word "scales". Obviously other algorithms have superior O() for the general case, but if memory use also scales with n, you are surely going to run into some kind of performance issues as n increases -- whereas if memory use is O(1), not so. Again, I imagine that was a more urgent issue in 1981 ...

Dec 19 2012

On Tuesday, 18 December 2012 at 15:27:07 UTC, Joseph Rushton Wakeling wrote:
> On 12/18/2012 07:52 AM, Xinok wrote:
>> While Smoothsort may be mathematically sound, it simply doesn't translate well to computer hardware. It's a variant of heap sort which requires a great deal of random access, whereas Timsort is a variant of merge sort which is largely sequential and benefits from the CPU cache. Furthermore, the Leonardo heap is much more computationally expensive than a typical binary or ternary heap.
>
> ... but I would guess that given the O(1) memory requirements it probably scales much better to sorting really, really large data, no?

Heap sort actually performs really well if the entirety of the data is small enough to fit into the CPU cache. But on larger data sets, heap sort jumps all over the place, resulting in a significant number of cache misses. When a block of memory is loaded into the cache, it doesn't remain there for long, and very little work is done on it while it is cached. (I mention heap sort because the Leonardo heap of Smoothsort is still very computationally expensive.)

Although merge sort requires O(n) extra memory, its sequential nature results in far fewer cache misses. There are three blocks of memory being operated on at any time: the two blocks to be merged, and a temporary buffer to store the merged elements. Three (or four) small pieces of memory can be worked on in the cache without any cache misses. Furthermore, thanks to the divide-and-conquer nature of merge sort, fairly large sublists can be sorted entirely in the CPU cache; this is even more so if you parallelize on a multi-core CPU which has a dedicated L1 cache for each core. Merge sort can be further optimized by using insertion sort on small sublists... which happens entirely in the CPU cache...

Put another way, merge sort is an ideal algorithm for sorting linked lists, and it was even practical for sorting large lists stored on tape drives. Quick sort is a sequential sorting algorithm with O(log n) space complexity, which is likely the reason it outperforms merge sort in most cases, although it is not stable.
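The merge pattern described above (two input runs plus one temporary buffer, each streamed front to back) can be sketched as a bottom-up merge sort in D. `mergeRuns` and `mergeSort` are hypothetical names; this is a plain textbook merge sort under those assumptions, without the run detection or galloping that Timsort adds.

```d
/// Merges the sorted runs a[0 .. mid] and a[mid .. $] using `buf`
/// (length >= mid). Only three regions are touched, each scanned
/// sequentially: the buffered left run, the right run, and the output.
void mergeRuns(T)(T[] a, size_t mid, T[] buf)
{
    assert(mid <= a.length && buf.length >= mid);
    buf[0 .. mid] = a[0 .. mid]; // copy the left run out of the way
    size_t i = 0, j = mid, k = 0; // i: buffer, j: right run, k: output
    while (i < mid && j < a.length)
    {
        // Take from the right run only if strictly smaller: keeps the merge stable.
        if (a[j] < buf[i])
            a[k++] = a[j++];
        else
            a[k++] = buf[i++];
    }
    // Leftover right-run elements are already in place; copy any leftover buffer.
    while (i < mid)
        a[k++] = buf[i++];
}

/// Bottom-up merge sort: merge runs of width 1, 2, 4, ...
void mergeSort(T)(T[] a)
{
    auto buf = new T[a.length]; // the O(n) extra memory discussed above
    for (size_t width = 1; width < a.length; width *= 2)
    {
        for (size_t lo = 0; lo + width < a.length; lo += 2 * width)
        {
            immutable hi = lo + 2 * width < a.length ? lo + 2 * width : a.length;
            mergeRuns(a[lo .. hi], width, buf);
        }
    }
}
```

Because the output index never overtakes the right-run index, the merge can write back into `a` directly, so only the left run needs copying.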

Dec 18 2012