digitalmars.D - Timsort vs some others

bearophile (7/7) Dec 17 2012 Regarding the recent Phobos improvements that introduce a Timsort:

Xinok (25/33) Dec 17 2012 While Smoothsort may be mathematically sound, it simply doesn't

Peter Alexander (5/10) Dec 18 2012 Different implementations use different strategies.

Andrei Alexandrescu (41/49) Dec 18 2012 We should use a deferred pivot algorithm that I thought of a long time

Xinok (13/21) Dec 18 2012 Perhaps a "median of log n" is in order, but the trouble is

Andrei Alexandrescu (7/27) Dec 18 2012 You don't need to choose a median - just sort the data (thereby making

bearophile (11/13) Dec 18 2012 Do you mean to use it when the input is very short, or when the

Andrei Alexandrescu (4/15) Dec 18 2012 That's a commonly used approach, but I think it can be improved.

ixid (5/47) Dec 18 2012 A while ago in another thread about sorting I believe you

Andrei Alexandrescu (3/6) Dec 18 2012 Not that I know of.

Philippe Sigaud (4/10) Dec 19 2012 My bad, that was me and I got sidetracked. I'll modifiy std.algo.sort to

Xinok (4/7) Dec 18 2012 I don't think it would work well in practice, but I'll put

Andrei Alexandrescu (3/8) Dec 18 2012 I mostly fear cache touching issues.

Xinok (37/49) Dec 21 2012 I based my little experiment on my 'unstablesort' module, located

Joseph Rushton Wakeling (5/10) Dec 18 2012 ... but I would guess that given the O(1) memory requirements it probabl...

bearophile (4/7) Dec 18 2012 Why?

Joseph Rushton Wakeling (6/7) Dec 18 2012 Because if you have to allocate O(n) memory for another algorithm, that ...

deadalnix (4/12) Dec 18 2012 Unless you have the data somehow presorted, or you get them one

Joseph Rushton Wakeling (6/8) Dec 19 2012 I was probably imprecise with my use of the word "scales". Obviously ot...

Xinok (28/42) Dec 18 2012 Heap sort actually performs really well if the entirety of the

"bearophile" <bearophileHUGS lycos.com> writes:

Regarding the recent Phobos improvements that introduce a Timsort:

http://forum.dlang.org/thread/50c8a4e67f79_3fdd19b7ae814691e sh3.rs.github.com.mail

I have found a blog post that compares the performance of 
Timsort, Smoothsort, and std::stable_sort:

http://www.altdevblogaday.com/2012/06/15/smoothsort-vs-timsort/

Bye,
bearophile

Dec 17 2012

"Xinok" <xinok live.com> writes:

On Monday, 17 December 2012 at 15:28:36 UTC, bearophile wrote:
 Regarding the recent Phobos improvements that introduce a 
 Timsort:

 http://forum.dlang.org/thread/50c8a4e67f79_3fdd19b7ae814691e sh3.rs.github.com.mail

 I have found a blog post that compares the performance of 
 Timsort, Smoothsort, and std::stable_sort:

 http://www.altdevblogaday.com/2012/06/15/smoothsort-vs-timsort/

 Bye,
 bearophile

While Smoothsort may be mathematically sound, it simply doesn't 
translate well to computer hardware. It's a variant of heap sort 
which requires a great deal of random access, whereas Timsort is 
a variant of merge sort which is largely sequential and benefits 
from the CPU cache. Furthermore, the Leonardo heap is much more 
computationally expensive than a typical binary or ternary heap.

Both Timsort and Smoothsort are what you call "natural sorts," 
meaning they typically require fewer comparisons on data with low 
entropy. They're also rather complex which means added overhead. 
When sorting primitive types like int, comparisons are 
inexpensive, so the overhead makes these algorithms slower. But 
had he tested it on a data type like strings, then we'd likely 
see Timsort take the lead.

On purely random data, quick sort and merge sort will win most of 
the time. But Timsort has an advantage over Smoothsort of being 
an adaptive algorithm; the so called "galloping mode," which is 
computationally expensive, is only activated when minGallop 
reaches a certain threshold and therefore beneficial. Otherwise, 
a simple linear merge is used (i.e. merge sort).

On another note, I highly doubt that std::sort uses a "median of 
medians" algorithm, which would add much overhead and essentially 
double the number of comparisons required with little to no 
benefit. More likely, it simply chooses the pivot from a median 
of three.

Dec 17 2012

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Tuesday, 18 December 2012 at 06:52:27 UTC, Xinok wrote:
 On another note, I highly doubt that std::sort uses a "median 
 of medians" algorithm, which would add much overhead and 
 essentially double the number of comparisons required with 
 little to no benefit. More likely, it simply chooses the pivot 
 from a median of three.

Different implementations use different strategies.

libstdc++ seems to use median of 3.

The Dinkumware standard library (which ships with MSVC) uses 
median of 9.

Dec 18 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/18/12 5:50 AM, Peter Alexander wrote:
 On Tuesday, 18 December 2012 at 06:52:27 UTC, Xinok wrote:
 On another note, I highly doubt that std::sort uses a "median of
 medians" algorithm, which would add much overhead and essentially
 double the number of comparisons required with little to no benefit.
 More likely, it simply chooses the pivot from a median of three.

 Different implementations use different strategies.

 libstdc++ seems to use median of 3.

 The Dinkumware standard library (which ships with MSVC) uses median of 9.

We should use a deferred pivot algorithm that I thought of a long time 
ago but never got around to test.

One thing about current pivot selection approaches is that they decide 
on a strategy (middle, median of 3, median of 9, random etc) _before_ 
ever looking at the data.

A different approach would be to take a look at a bit of data and _then_ 
decide what the pivot is.

Consider the following approach:

size_t partition(T[] d)
{
     assert(a.length);
     size_t a = 0, z = arr.length - 1;
     auto maxa = a, minz = z;

     for (; a < z && mina <= maxz;)
     {
         if (d[a] > d[z])
         {
             swap(d[a], d[z]);
         }
         if (d[maxa] < d[++a]) maxa = a;
         if (d[minz] > d[--z]) minz = z;
     }
     --a;
     ++z;

/* Here data is already partitioned wrt d[a] or d[z]. If enough progress 
has been made (i.e. a is large enough compared to d.length), choose one 
of these as the pivot and only partition d[a .. z + 1]. Otherwise, use a 
classic pivot choice criterion.
*/
     ...
}

Another approach I wanted to try was to choose the median of K with K 
increasing with the size of the array. This is because a good pivot is 
more important for large arrays than for small arrays. As such, a 
possibility would be to simply sort a stride of the input 
(std.range.stride is random access and can be sorted right away without 
any particular implementation effort) and then choose the middle of the 
stride as the pivot.

If anyone has the time and inclination, have at it!


Andrei

Dec 18 2012

"Xinok" <xinok live.com> writes:

On Tuesday, 18 December 2012 at 15:55:17 UTC, Andrei Alexandrescu 
wrote:
 Another approach I wanted to try was to choose the median of K 
 with K increasing with the size of the array. This is because a 
 good pivot is more important for large arrays than for small 
 arrays. As such, a possibility would be to simply sort a stride 
 of the input (std.range.stride is random access and can be 
 sorted right away without any particular implementation effort) 
 and then choose the middle of the stride as the pivot.

 If anyone has the time and inclination, have at it!

Perhaps a "median of log n" is in order, but the trouble is 
finding an algorithm for picking the median from n elements. The 
so called "median of medians" algorithm can choose a pivot within 
30-70% of the range of the median. Otherwise, there doesn't seem 
to be any algorithm for choosing the absolute median other than, 
say, an insertion sort.

I came up with this clever little guy a while ago which I used in 
my implementation of quick sort: http://dpaste.dzfl.pl/b85e7ad8
I would love to enhance it to work on a variable number of 
elements, but from what I can comprehend, the result is 
essentially a partial heap sort.

Dec 18 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/18/12 8:42 PM, Xinok wrote:
 On Tuesday, 18 December 2012 at 15:55:17 UTC, Andrei Alexandrescu wrote:
 Another approach I wanted to try was to choose the median of K with K
 increasing with the size of the array. This is because a good pivot is
 more important for large arrays than for small arrays. As such, a
 possibility would be to simply sort a stride of the input
 (std.range.stride is random access and can be sorted right away
 without any particular implementation effort) and then choose the
 middle of the stride as the pivot.

 If anyone has the time and inclination, have at it!

 Perhaps a "median of log n" is in order,

Yah I thought so!

 but the trouble is finding an
 algorithm for picking the median from n elements. The so called "median
 of medians" algorithm can choose a pivot within 30-70% of the range of
 the median. Otherwise, there doesn't seem to be any algorithm for
 choosing the absolute median other than, say, an insertion sort.

You don't need to choose a median - just sort the data (thereby making 
progress toward the end goal) and choose the middle element.

 I came up with this clever little guy a while ago which I used in my
 implementation of quick sort: http://dpaste.dzfl.pl/b85e7ad8
 I would love to enhance it to work on a variable number of elements, but
 from what I can comprehend, the result is essentially a partial heap sort.

A very efficient sort for various small fixed sizes is a great 
complement for quicksort.


Andrei

Dec 18 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 A very efficient sort for various small fixed sizes is a great 
 complement for quicksort.

Do you mean to use it when the input is very short, or when the 
QuickSort recursion has produced a very small subarray? In the 
first case it's useful, but in the second case I've seen it's 
more efficient (maybe not for huge arrays, but it is more 
efficient for normal arrays in RAM) to not call a specialized 
sort for such small sub-arrays, and instead just stop the 
QuickSort recursion and produce a locally unsorted array, and 
then call an insertion sort on the whole data.

Bye,
bearophile

Dec 18 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/18/12 9:21 PM, bearophile wrote:
 Andrei Alexandrescu:

 A very efficient sort for various small fixed sizes is a great
 complement for quicksort.

 Do you mean to use it when the input is very short, or when the
 QuickSort recursion has produced a very small subarray?

The latter.

 In the first
 case it's useful, but in the second case I've seen it's more efficient
 (maybe not for huge arrays, but it is more efficient for normal arrays
 in RAM) to not call a specialized sort for such small sub-arrays, and
 instead just stop the QuickSort recursion and produce a locally unsorted
 array, and then call an insertion sort on the whole data.

That's a commonly used approach, but I think it can be improved.


Andrei

Dec 18 2012

"ixid" <nuaccount gmail.com> writes:

On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei 
Alexandrescu wrote:
 On 12/18/12 8:42 PM, Xinok wrote:
 On Tuesday, 18 December 2012 at 15:55:17 UTC, Andrei 
 Alexandrescu wrote:
 Another approach I wanted to try was to choose the median of 
 K with K
 increasing with the size of the array. This is because a good 
 pivot is
 more important for large arrays than for small arrays. As 
 such, a
 possibility would be to simply sort a stride of the input
 (std.range.stride is random access and can be sorted right 
 away
 without any particular implementation effort) and then choose 
 the
 middle of the stride as the pivot.

 If anyone has the time and inclination, have at it!

 Perhaps a "median of log n" is in order,

 Yah I thought so!

 but the trouble is finding an
 algorithm for picking the median from n elements. The so 
 called "median
 of medians" algorithm can choose a pivot within 30-70% of the 
 range of
 the median. Otherwise, there doesn't seem to be any algorithm 
 for
 choosing the absolute median other than, say, an insertion 
 sort.

 You don't need to choose a median - just sort the data (thereby 
 making progress toward the end goal) and choose the middle 
 element.

 I came up with this clever little guy a while ago which I used 
 in my
 implementation of quick sort: http://dpaste.dzfl.pl/b85e7ad8
 I would love to enhance it to work on a variable number of 
 elements, but
 from what I can comprehend, the result is essentially a 
 partial heap sort.

 A very efficient sort for various small fixed sizes is a great 
 complement for quicksort.


 Andrei

A while ago in another thread about sorting I believe you 
mentioned the possibility of having templated sorting networks 
for small numbers of items, did that idea come to anything?

Dec 18 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/18/12 10:35 PM, ixid wrote:
 A while ago in another thread about sorting I believe you mentioned the
 possibility of having templated sorting networks for small numbers of
 items, did that idea come to anything?

Not that I know of.

Andrei

Dec 18 2012

Philippe Sigaud <philippe.sigaud gmail.com> writes:

On Wed, Dec 19, 2012 at 6:47 AM, Andrei Alexandrescu <
SeeWebsiteForEmail erdani.org> wrote:

 On 12/18/12 10:35 PM, ixid wrote:

 A while ago in another thread about sorting I believe you mentioned the
 possibility of having templated sorting networks for small numbers of
 items, did that idea come to anything?

 Not that I know of.


My bad, that was me and I got sidetracked. I'll modifiy std.algo.sort to
see if I get any speed-up.

Dec 19 2012

"Xinok" <xinok live.com> writes:

On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei 
Alexandrescu wrote:
 You don't need to choose a median - just sort the data (thereby 
 making progress toward the end goal) and choose the middle 
 element.

I don't think it would work well in practice, but I'll put 
something together to see if the idea does have merit.

Dec 18 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/18/12 11:37 PM, Xinok wrote:
 On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei Alexandrescu wrote:
 You don't need to choose a median - just sort the data (thereby making
 progress toward the end goal) and choose the middle element.

 I don't think it would work well in practice, but I'll put something
 together to see if the idea does have merit.

I mostly fear cache touching issues.

Andrei

Dec 18 2012

"Xinok" <xinok live.com> writes:

On Wednesday, 19 December 2012 at 05:48:04 UTC, Andrei 
Alexandrescu wrote:
 On 12/18/12 11:37 PM, Xinok wrote:
 On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei 
 Alexandrescu wrote:
 You don't need to choose a median - just sort the data 
 (thereby making
 progress toward the end goal) and choose the middle element.

 I don't think it would work well in practice, but I'll put 
 something
 together to see if the idea does have merit.

 I mostly fear cache touching issues.

 Andrei

I based my little experiment on my 'unstablesort' module, located 
here:
https://github.com/Xinok/XSort/blob/master/unstablesort.d

The results (sorting a random array of 1024*1024 uints):

Median of Five:
142ms     21627203 comps

Median of log n:
152ms     20778577 comps

The code:

size_t choosePivot(R range)
{
	import std.math;
	// Reserve slice of range for choosing pivot
	R sub = range[0 .. cast(uint)log2(range.length) | 1];
	// Pull in equally distributed elements
	swap(sub[sub.length - 1], range[range.length - 1]);
	foreach(i; 1 .. sub.length - 1) swap(sub[i], range[range.length 
/ sub.length * i]);
	// Sort sublist to choose pivot
	insertionSort(sub);
	// Move partitionable elements
	sub = sub[piv + 1 .. sub.length];
	foreach(i; 0 .. sub.length) swap(sub[i], range[range.length - 
sub.length + i]);
	// Return index of pivot
	return sub.length / 2;
}

My thoughts, while the idea does have merit, I think the median 
of five does a good job as it is. If you're interested in 
reducing the number of comparisons, replacing 
"optimisticInsertionSort" in std.algorithm with a binary 
insertion sort will do much more to achieve that goal. And if 
you're interested in O(n log n) running time, then add heap sort 
as a fall-back algorithm, as I did in my module (I actually plan 
to do this myself ... eventually).

Dec 21 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 12/18/2012 07:52 AM, Xinok wrote:
 While Smoothsort may be mathematically sound, it simply doesn't translate well
 to computer hardware. It's a variant of heap sort which requires a great deal
of
 random access, whereas Timsort is a variant of merge sort which is largely
 sequential and benefits from the CPU cache. Furthermore, the Leonardo heap is
 much more computationally expensive than a typical binary or ternary heap.

... but I would guess that given the O(1) memory requirements it probably
scales 
much better to sorting really, really large data, no?

That was surely a much, much bigger issue back in 1981 when Dijkstra proposed 
it, but still has a place today.

Dec 18 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Joseph Rushton Wakeling:

 ... but I would guess that given the O(1) memory requirements 
 it probably scales much better to sorting really, really large 
 data, no?

Why?

Bye,
bearophile

Dec 18 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 12/18/2012 04:30 PM, bearophile wrote:
 Why?

Because if you have to allocate O(n) memory for another algorithm, that might 
either give you a memory hit that you can't take (less likely these days, to be 
fair), or simply take a large amount of time to allocate that degrades the 
performance.

Happy to learn if my guess is wrong, though.

Dec 18 2012

"deadalnix" <deadalnix gmail.com> writes:

On Tuesday, 18 December 2012 at 15:41:28 UTC, Joseph Rushton 
Wakeling wrote:
 On 12/18/2012 04:30 PM, bearophile wrote:
 Why?

 Because if you have to allocate O(n) memory for another 
 algorithm, that might either give you a memory hit that you 
 can't take (less likely these days, to be fair), or simply take 
 a large amount of time to allocate that degrades the 
 performance.

 Happy to learn if my guess is wrong, though.

Unless you have the data somehow presorted, or you get them one 
by one, other sort are faster.

Dec 18 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 12/19/2012 06:00 AM, deadalnix wrote:
 Unless you have the data somehow presorted, or you get them one by one, other
 sort are faster.

I was probably imprecise with my use of the word "scales".  Obviously other 
algorithms have superior O() for the general case, but if memory use also
scales 
with n, you are surely going to run into some kind of performance issues as n 
increases -- whereas if memory use is O(1), not so.

Again, I imagine that was a more urgent issue in 1981 ...

Dec 19 2012

"Xinok" <xinok live.com> writes:

On Tuesday, 18 December 2012 at 15:27:07 UTC, Joseph Rushton 
Wakeling wrote:
 On 12/18/2012 07:52 AM, Xinok wrote:
 While Smoothsort may be mathematically sound, it simply 
 doesn't translate well
 to computer hardware. It's a variant of heap sort which 
 requires a great deal of
 random access, whereas Timsort is a variant of merge sort 
 which is largely
 sequential and benefits from the CPU cache. Furthermore, the 
 Leonardo heap is
 much more computationally expensive than a typical binary or 
 ternary heap.

 ... but I would guess that given the O(1) memory requirements 
 it probably scales much better to sorting really, really large 
 data, no?

Heap sort actually performs really well if the entirety of the 
data is small enough to fit into the CPU cache. But on larger 
data sets, heap sort is jumping all over the place resulting in a 
significant number of cache misses. When a block of memory is 
stored in cache, it doesn't remain there for long and very little 
work is done on it when it is cached. (I mention heap sort 
because the leonardo heap of smoothsort is still very 
computationally expensive)

Although merge sort is O(n), it's sequential nature results in 
far fewer cache misses. There are three blocks of memory being 
operated on at anytime: the two blocks to be merged, and a 
temporary buffer to store the merged elements. Three (or four) 
small pieces of memory can be sorted in the cache without any 
cache misses. Furthermore, thanks to the divide-and-conquer 
nature of merge sort, fairly large sublists can be sorted 
entirely in the CPU cache; this is even more so if you 
parallelize on a multi-core CPU which has a dedicated L1 cache 
for each CPU. Merge sort can be further optimized by using 
insertion sort on small sublists... which happens entirely in the 
CPU cache...

Another way to put it, merge sort is an ideal algorithm for 
sorting linked lists, and it was even practical for sorting large 
lists stored on tape drives.

Quick sort is a sequential sorting algorithm with O(log n) space 
complexity which is likely the reason it outperforms merge sort 
in most cases, albeit not being stable.

Dec 18 2012

D Programming

C/C++ Programming

Other

digitalmars.D - Timsort vs some others