www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Timsort vs some others

reply "bearophile" <bearophileHUGS lycos.com> writes:
Regarding the recent Phobos improvements that introduce a Timsort:

http://forum.dlang.org/thread/50c8a4e67f79_3fdd19b7ae814691e sh3.rs.github.com.mail

I have found a blog post that compares the performance of 
Timsort, Smoothsort, and std::stable_sort:

http://www.altdevblogaday.com/2012/06/15/smoothsort-vs-timsort/

Bye,
bearophile
Dec 17 2012
next sibling parent reply "Xinok" <xinok live.com> writes:
On Monday, 17 December 2012 at 15:28:36 UTC, bearophile wrote:
 Regarding the recent Phobos improvements that introduce a 
 Timsort:

 http://forum.dlang.org/thread/50c8a4e67f79_3fdd19b7ae814691e sh3.rs.github.com.mail

 I have found a blog post that compares the performance of 
 Timsort, Smoothsort, and std::stable_sort:

 http://www.altdevblogaday.com/2012/06/15/smoothsort-vs-timsort/

 Bye,
 bearophile

While Smoothsort may be mathematically sound, it simply doesn't translate well to computer hardware. It's a variant of heap sort which requires a great deal of random access, whereas Timsort is a variant of merge sort which is largely sequential and benefits from the CPU cache. Furthermore, the Leonardo heap is much more computationally expensive than a typical binary or ternary heap. Both Timsort and Smoothsort are what you call "natural sorts," meaning they typically require fewer comparisons on data with low entropy. They're also rather complex which means added overhead. When sorting primitive types like int, comparisons are inexpensive, so the overhead makes these algorithms slower. But had he tested it on a data type like strings, then we'd likely see Timsort take the lead. On purely random data, quick sort and merge sort will win most of the time. But Timsort has an advantage over Smoothsort of being an adaptive algorithm; the so called "galloping mode," which is computationally expensive, is only activated when minGallop reaches a certain threshold and therefore beneficial. Otherwise, a simple linear merge is used (i.e. merge sort). On another note, I highly doubt that std::sort uses a "median of medians" algorithm, which would add much overhead and essentially double the number of comparisons required with little to no benefit. More likely, it simply chooses the pivot from a median of three.
Dec 17 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/18/12 5:50 AM, Peter Alexander wrote:
 On Tuesday, 18 December 2012 at 06:52:27 UTC, Xinok wrote:
 On another note, I highly doubt that std::sort uses a "median of
 medians" algorithm, which would add much overhead and essentially
 double the number of comparisons required with little to no benefit.
 More likely, it simply chooses the pivot from a median of three.

Different implementations use different strategies. libstdc++ seems to use median of 3. The Dinkumware standard library (which ships with MSVC) uses median of 9.

We should use a deferred pivot algorithm that I thought of a long time ago but never got around to test. One thing about current pivot selection approaches is that they decide on a strategy (middle, median of 3, median of 9, random etc) _before_ ever looking at the data. A different approach would be to take a look at a bit of data and _then_ decide what the pivot is. Consider the following approach: size_t partition(T[] d) { assert(a.length); size_t a = 0, z = arr.length - 1; auto maxa = a, minz = z; for (; a < z && mina <= maxz;) { if (d[a] > d[z]) { swap(d[a], d[z]); } if (d[maxa] < d[++a]) maxa = a; if (d[minz] > d[--z]) minz = z; } --a; ++z; /* Here data is already partitioned wrt d[a] or d[z]. If enough progress has been made (i.e. a is large enough compared to d.length), choose one of these as the pivot and only partition d[a .. z + 1]. Otherwise, use a classic pivot choice criterion. */ ... } Another approach I wanted to try was to choose the median of K with K increasing with the size of the array. This is because a good pivot is more important for large arrays than for small arrays. As such, a possibility would be to simply sort a stride of the input (std.range.stride is random access and can be sorted right away without any particular implementation effort) and then choose the middle of the stride as the pivot. If anyone has the time and inclination, have at it! Andrei
Dec 18 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/18/12 8:42 PM, Xinok wrote:
 On Tuesday, 18 December 2012 at 15:55:17 UTC, Andrei Alexandrescu wrote:
 Another approach I wanted to try was to choose the median of K with K
 increasing with the size of the array. This is because a good pivot is
 more important for large arrays than for small arrays. As such, a
 possibility would be to simply sort a stride of the input
 (std.range.stride is random access and can be sorted right away
 without any particular implementation effort) and then choose the
 middle of the stride as the pivot.

 If anyone has the time and inclination, have at it!

Perhaps a "median of log n" is in order,

Yah I thought so!
 but the trouble is finding an
 algorithm for picking the median from n elements. The so called "median
 of medians" algorithm can choose a pivot within 30-70% of the range of
 the median. Otherwise, there doesn't seem to be any algorithm for
 choosing the absolute median other than, say, an insertion sort.

You don't need to choose a median - just sort the data (thereby making progress toward the end goal) and choose the middle element.
 I came up with this clever little guy a while ago which I used in my
 implementation of quick sort: http://dpaste.dzfl.pl/b85e7ad8
 I would love to enhance it to work on a variable number of elements, but
 from what I can comprehend, the result is essentially a partial heap sort.

A very efficient sort for various small fixed sizes is a great complement for quicksort. Andrei
Dec 18 2012
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/18/12 10:35 PM, ixid wrote:
 A while ago in another thread about sorting I believe you mentioned the
 possibility of having templated sorting networks for small numbers of
 items, did that idea come to anything?

Not that I know of. Andrei
Dec 18 2012
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/18/12 9:21 PM, bearophile wrote:
 Andrei Alexandrescu:

 A very efficient sort for various small fixed sizes is a great
 complement for quicksort.

Do you mean to use it when the input is very short, or when the QuickSort recursion has produced a very small subarray?

The latter.
 In the first
 case it's useful, but in the second case I've seen it's more efficient
 (maybe not for huge arrays, but it is more efficient for normal arrays
 in RAM) to not call a specialized sort for such small sub-arrays, and
 instead just stop the QuickSort recursion and produce a locally unsorted
 array, and then call an insertion sort on the whole data.

That's a commonly used approach, but I think it can be improved. Andrei
Dec 18 2012
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/18/12 11:37 PM, Xinok wrote:
 On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei Alexandrescu wrote:
 You don't need to choose a median - just sort the data (thereby making
 progress toward the end goal) and choose the middle element.

I don't think it would work well in practice, but I'll put something together to see if the idea does have merit.

I mostly fear cache touching issues. Andrei
Dec 18 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Tuesday, 18 December 2012 at 06:52:27 UTC, Xinok wrote:
 On another note, I highly doubt that std::sort uses a "median 
 of medians" algorithm, which would add much overhead and 
 essentially double the number of comparisons required with 
 little to no benefit. More likely, it simply chooses the pivot 
 from a median of three.

Different implementations use different strategies. libstdc++ seems to use median of 3. The Dinkumware standard library (which ships with MSVC) uses median of 9.
Dec 18 2012
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 12/18/2012 07:52 AM, Xinok wrote:
 While Smoothsort may be mathematically sound, it simply doesn't translate well
 to computer hardware. It's a variant of heap sort which requires a great deal
of
 random access, whereas Timsort is a variant of merge sort which is largely
 sequential and benefits from the CPU cache. Furthermore, the Leonardo heap is
 much more computationally expensive than a typical binary or ternary heap.

... but I would guess that given the O(1) memory requirements it probably scales much better to sorting really, really large data, no? That was surely a much, much bigger issue back in 1981 when Dijkstra proposed it, but still has a place today.
Dec 18 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Joseph Rushton Wakeling:

 ... but I would guess that given the O(1) memory requirements 
 it probably scales much better to sorting really, really large 
 data, no?

Why? Bye, bearophile
Dec 18 2012
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 12/18/2012 04:30 PM, bearophile wrote:
 Why?

Because if you have to allocate O(n) memory for another algorithm, that might either give you a memory hit that you can't take (less likely these days, to be fair), or simply take a large amount of time to allocate that degrades the performance. Happy to learn if my guess is wrong, though.
Dec 18 2012
prev sibling next sibling parent "Xinok" <xinok live.com> writes:
On Tuesday, 18 December 2012 at 15:27:07 UTC, Joseph Rushton 
Wakeling wrote:
 On 12/18/2012 07:52 AM, Xinok wrote:
 While Smoothsort may be mathematically sound, it simply 
 doesn't translate well
 to computer hardware. It's a variant of heap sort which 
 requires a great deal of
 random access, whereas Timsort is a variant of merge sort 
 which is largely
 sequential and benefits from the CPU cache. Furthermore, the 
 Leonardo heap is
 much more computationally expensive than a typical binary or 
 ternary heap.

... but I would guess that given the O(1) memory requirements it probably scales much better to sorting really, really large data, no?

Heap sort actually performs really well if the entirety of the data is small enough to fit into the CPU cache. But on larger data sets, heap sort is jumping all over the place resulting in a significant number of cache misses. When a block of memory is stored in cache, it doesn't remain there for long and very little work is done on it when it is cached. (I mention heap sort because the leonardo heap of smoothsort is still very computationally expensive) Although merge sort is O(n), it's sequential nature results in far fewer cache misses. There are three blocks of memory being operated on at anytime: the two blocks to be merged, and a temporary buffer to store the merged elements. Three (or four) small pieces of memory can be sorted in the cache without any cache misses. Furthermore, thanks to the divide-and-conquer nature of merge sort, fairly large sublists can be sorted entirely in the CPU cache; this is even more so if you parallelize on a multi-core CPU which has a dedicated L1 cache for each CPU. Merge sort can be further optimized by using insertion sort on small sublists... which happens entirely in the CPU cache... Another way to put it, merge sort is an ideal algorithm for sorting linked lists, and it was even practical for sorting large lists stored on tape drives. Quick sort is a sequential sorting algorithm with O(log n) space complexity which is likely the reason it outperforms merge sort in most cases, albeit not being stable.
Dec 18 2012
prev sibling next sibling parent "Xinok" <xinok live.com> writes:
On Tuesday, 18 December 2012 at 15:55:17 UTC, Andrei Alexandrescu 
wrote:
 Another approach I wanted to try was to choose the median of K 
 with K increasing with the size of the array. This is because a 
 good pivot is more important for large arrays than for small 
 arrays. As such, a possibility would be to simply sort a stride 
 of the input (std.range.stride is random access and can be 
 sorted right away without any particular implementation effort) 
 and then choose the middle of the stride as the pivot.

 If anyone has the time and inclination, have at it!

Perhaps a "median of log n" is in order, but the trouble is finding an algorithm for picking the median from n elements. The so called "median of medians" algorithm can choose a pivot within 30-70% of the range of the median. Otherwise, there doesn't seem to be any algorithm for choosing the absolute median other than, say, an insertion sort. I came up with this clever little guy a while ago which I used in my implementation of quick sort: http://dpaste.dzfl.pl/b85e7ad8 I would love to enhance it to work on a variable number of elements, but from what I can comprehend, the result is essentially a partial heap sort.
Dec 18 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 A very efficient sort for various small fixed sizes is a great 
 complement for quicksort.

Do you mean to use it when the input is very short, or when the QuickSort recursion has produced a very small subarray? In the first case it's useful, but in the second case I've seen it's more efficient (maybe not for huge arrays, but it is more efficient for normal arrays in RAM) to not call a specialized sort for such small sub-arrays, and instead just stop the QuickSort recursion and produce a locally unsorted array, and then call an insertion sort on the whole data. Bye, bearophile
Dec 18 2012
prev sibling next sibling parent "ixid" <nuaccount gmail.com> writes:
On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei 
Alexandrescu wrote:
 On 12/18/12 8:42 PM, Xinok wrote:
 On Tuesday, 18 December 2012 at 15:55:17 UTC, Andrei 
 Alexandrescu wrote:
 Another approach I wanted to try was to choose the median of 
 K with K
 increasing with the size of the array. This is because a good 
 pivot is
 more important for large arrays than for small arrays. As 
 such, a
 possibility would be to simply sort a stride of the input
 (std.range.stride is random access and can be sorted right 
 away
 without any particular implementation effort) and then choose 
 the
 middle of the stride as the pivot.

 If anyone has the time and inclination, have at it!

Perhaps a "median of log n" is in order,

Yah I thought so!
 but the trouble is finding an
 algorithm for picking the median from n elements. The so 
 called "median
 of medians" algorithm can choose a pivot within 30-70% of the 
 range of
 the median. Otherwise, there doesn't seem to be any algorithm 
 for
 choosing the absolute median other than, say, an insertion 
 sort.

You don't need to choose a median - just sort the data (thereby making progress toward the end goal) and choose the middle element.
 I came up with this clever little guy a while ago which I used 
 in my
 implementation of quick sort: http://dpaste.dzfl.pl/b85e7ad8
 I would love to enhance it to work on a variable number of 
 elements, but
 from what I can comprehend, the result is essentially a 
 partial heap sort.

A very efficient sort for various small fixed sizes is a great complement for quicksort. Andrei

A while ago in another thread about sorting I believe you mentioned the possibility of having templated sorting networks for small numbers of items, did that idea come to anything?
Dec 18 2012
prev sibling next sibling parent "Xinok" <xinok live.com> writes:
On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei 
Alexandrescu wrote:
 You don't need to choose a median - just sort the data (thereby 
 making progress toward the end goal) and choose the middle 
 element.

I don't think it would work well in practice, but I'll put something together to see if the idea does have merit.
Dec 18 2012
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 18 December 2012 at 15:41:28 UTC, Joseph Rushton 
Wakeling wrote:
 On 12/18/2012 04:30 PM, bearophile wrote:
 Why?

Because if you have to allocate O(n) memory for another algorithm, that might either give you a memory hit that you can't take (less likely these days, to be fair), or simply take a large amount of time to allocate that degrades the performance. Happy to learn if my guess is wrong, though.

Unless you have the data somehow presorted, or you get them one by one, other sort are faster.
Dec 18 2012
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 12/19/2012 06:00 AM, deadalnix wrote:
 Unless you have the data somehow presorted, or you get them one by one, other
 sort are faster.

I was probably imprecise with my use of the word "scales". Obviously other algorithms have superior O() for the general case, but if memory use also scales with n, you are surely going to run into some kind of performance issues as n increases -- whereas if memory use is O(1), not so. Again, I imagine that was a more urgent issue in 1981 ...
Dec 19 2012
prev sibling next sibling parent Philippe Sigaud <philippe.sigaud gmail.com> writes:
--20cf3074b4581f8db004d13a46bd
Content-Type: text/plain; charset=UTF-8

On Wed, Dec 19, 2012 at 6:47 AM, Andrei Alexandrescu <
SeeWebsiteForEmail erdani.org> wrote:

 On 12/18/12 10:35 PM, ixid wrote:

 A while ago in another thread about sorting I believe you mentioned the
 possibility of having templated sorting networks for small numbers of
 items, did that idea come to anything?

Not that I know of.

My bad, that was me and I got sidetracked. I'll modifiy std.algo.sort to see if I get any speed-up. --20cf3074b4581f8db004d13a46bd Content-Type: text/html; charset=UTF-8 <br><br><div class="gmail_quote">On Wed, Dec 19, 2012 at 6:47 AM, Andrei Alexandrescu <span dir="ltr">&lt;<a href="mailto:SeeWebsiteForEmail erdani.org" target="_blank">SeeWebsiteForEmail erdani.org</a>&gt;</span> wrote:<br> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 12/18/12 10:35 PM, ixid wrote:<br> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> A while ago in another thread about sorting I believe you mentioned the<br> possibility of having templated sorting networks for small numbers of<br> items, did that idea come to anything?<br> </blockquote> <br></div> Not that I know of.</blockquote><div><br></div><div>My bad, that was me and I got sidetracked. I&#39;ll modifiy std.algo.sort to see if I get any speed-up.</div><div><br></div><div><br></div></div> --20cf3074b4581f8db004d13a46bd--
Dec 19 2012
prev sibling parent "Xinok" <xinok live.com> writes:
On Wednesday, 19 December 2012 at 05:48:04 UTC, Andrei 
Alexandrescu wrote:
 On 12/18/12 11:37 PM, Xinok wrote:
 On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei 
 Alexandrescu wrote:
 You don't need to choose a median - just sort the data 
 (thereby making
 progress toward the end goal) and choose the middle element.

I don't think it would work well in practice, but I'll put something together to see if the idea does have merit.

I mostly fear cache touching issues. Andrei

I based my little experiment on my 'unstablesort' module, located here: https://github.com/Xinok/XSort/blob/master/unstablesort.d The results (sorting a random array of 1024*1024 uints): Median of Five: 142ms 21627203 comps Median of log n: 152ms 20778577 comps The code: size_t choosePivot(R range) { import std.math; // Reserve slice of range for choosing pivot R sub = range[0 .. cast(uint)log2(range.length) | 1]; // Pull in equally distributed elements swap(sub[sub.length - 1], range[range.length - 1]); foreach(i; 1 .. sub.length - 1) swap(sub[i], range[range.length / sub.length * i]); // Sort sublist to choose pivot insertionSort(sub); // Move partitionable elements sub = sub[piv + 1 .. sub.length]; foreach(i; 0 .. sub.length) swap(sub[i], range[range.length - sub.length + i]); // Return index of pivot return sub.length / 2; } My thoughts, while the idea does have merit, I think the median of five does a good job as it is. If you're interested in reducing the number of comparisons, replacing "optimisticInsertionSort" in std.algorithm with a binary insertion sort will do much more to achieve that goal. And if you're interested in O(n log n) running time, then add heap sort as a fall-back algorithm, as I did in my module (I actually plan to do this myself ... eventually).
Dec 21 2012