www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.announce - Sorting algorithms benchmark

reply Stewart Gordon <smjg_1998 yahoo.com> writes:
I've written a program to benchmark a number of sorting algorithms.

http://pr.stewartsplace.org.uk/d/sortbench.d

It includes options to output the timings to a CSV file and to control the
number of tests of each algorithm to perform (likely to be needed on modern
machines).

Stewart.
Feb 27 2007
parent reply Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
Stewart Gordon wrote:
 I've written a program to benchmark a number of sorting algorithms.
 
 http://pr.stewartsplace.org.uk/d/sortbench.d
 
 It includes options to output the timings to a CSV file and to control the
number of tests of each algorithm to perform (likely to be needed on modern
machines).
 
 Stewart.
 

Nice. One of the very first D programs I wrote was a class Sorter which aimed to do the same, but I never published it since most of the code was just pretty much directly copied from somewhere. Besides, I didn't really understand most of the algorithms, and couldn't get all the ones I wanted to work. Pigeonhole sort, in particular, couldn't reliably work for all types without opAssign overloading, which was necessary. Some comments from what I do (think I do) know, however: Your comb sort uses a shrink factor of 0.7. Comb sort 11 uses 1 / (1 - 1 / (e ^ phi)), where e is Napier's constant and phi the golden ratio. This leads to possible optimization in the algorithm. But alas, the page that used to contain the article describing it is gone from the web: http://world.std.com/~jdveale/combsort.htm. Dammit! I haven't copied the source code for it from there, either. Oh well, a less optimal version but still, I wager, better than yours: http://yagni.com/combsort/index.php#cpp-combsort What I used: const real SHRINK_FACTOR = 1.2473309501039786540366528676643474234433714124826; You're missing shell sort: http://en.wikipedia.org/wiki/Shell_sort See also: http://www.cs.princeton.edu/~rs/talks/shellsort.pdf My 1.5-year old code (no idea how it works - I'm particularly frightened of the bit math, I think it's the formula here: http://www.research.att.com/~njas/sequences/A033622) is at the bottom of this post, if you want to have a look. Other sorts that could be added: - Gnome sort, http://en.wikipedia.org/wiki/Gnome_sort - Introspective sort: http://en.wikipedia.org/wiki/Introsort - Three-way merge sort: http://www.cs.queensu.ca/home/cisc121/2002w/assignments/assn8/solution/MergeThreeWay.java - Counting sort, http://en.wikipedia.org/wiki/Counting_sort - Bucket sort, http://en.wikipedia.org/wiki/Bucket_sort - Pigeonhole sort, http://en.wikipedia.org/wiki/Pigeonhole_sort I didn't look too much at your radix sorts, they might have a lot in common with the last three of the above. Also, your "buffered insertion sort" is perhaps more commonly known as "binary insertion sort", since you mention it uses binary search. -- // TYPE is just a template parameter // comp defaults to a simple less-than comparison // the inout is probably unnecessary, I wouldn't have known that back then void shellSort(inout TYPE[] a, size_t beg = 0, size_t end = 0, cmpType cmp = null) { if (cmp is null) cmp = this.comp; if (end == 0) end = a.length; // can't figure out how to get it working for arbitrary beg and end, hence kept separate here void actualShellSort(inout TYPE[] ra) { size_t h = 1, hh = 1; while (h < ra.length) { // magic numbers from formula: // http://www.research.att.com/cgi-bin/access.cgi/as/njas/sequences/eisA.cgi?Anum=A033622 if (hh % 2) h = 8 << hh - 6 << ((hh + 1) / 2) + 1; else h = 9 << hh - 9 << ( hh / 2) + 1; ++hh; } while (h > 0) { // for each set, of which there are h for (size_t i = h - 1; i < ra.length; ++i) { // pick last element in set TYPE tmp = ra[i]; size_t j = i; // compare tmp to the one before it in the set // if they are not in order, continue this loop, moving // elements back to make room for tmp for (; j >= h && !cmp(tmp, ra[j-h]); j -= h) ra[j] = ra[j - h]; ra[j] = tmp; } // all h-sets sorted h /= 3; } } if (!smartToSort(a, beg, end, cmp)) return; TYPE[] foo = a[beg..end]; foo.actualShellSort(); } // other relevant stuff to get the above to compile (not tested, though): bit smartToSort(inout TYPE[] a, size_t beg, size_t end, cmpType cmp) { if (end <= beg) return false; if (end - beg > a.length) return false; if (end - beg < 2) return false; else if (end - beg == 2) { if (!cmp(a[end - 1], a[beg])) a.swap(beg, end - 1); return false; } return true; } alias int function(in TYPE left, in TYPE right) cmpType;
Feb 28 2007
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Deewiant Wrote:

<snip>
 Your comb sort uses a shrink factor of 0.7.  Comb sort 11 uses 
 1 / (1 - 1 / (e ^ phi)), where e is Napier's constant and phi 
 the golden ratio.  This leads to possible optimization in the 
 algorithm.

Hang on - isn't 0.7 a grow factor, rather than a shrink factor? <snip>
 You're missing shell sort: 
 http://en.wikipedia.org/wiki/Shell_sort

Indeed, that could be implemented as well. I noticed there: "For instance, if a list was 5-sorted and then 3-sorted, the list is now not only 3-sorted, but both 5- and 3-sorted." Guess I should read up on the proof sometime. <snip>
 Other sorts that could be added:
 	- Gnome sort, http://en.wikipedia.org/wiki/Gnome_sort

Not sure if this one is worth it....
 	- Introspective sort: http://en.wikipedia.org/wiki/Introsort
 	- Three-way merge sort:
 http://www.cs.queensu.ca/home/cisc121/2002w/assignments/assn8/solution/MergeThreeWay.java

Certainly worth considering.
 	- Counting sort, http://en.wikipedia.org/wiki/Counting_sort
 	- Bucket sort, http://en.wikipedia.org/wiki/Bucket_sort
 	- Pigeonhole sort, 
 http://en.wikipedia.org/wiki/Pigeonhole_sort

It took me a moment to figure the difference between these three. But I would need to reduce the range of the random sequences a lot before any of them will be remotely feasible. Stewart.
Feb 28 2007
parent reply Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
Stewart Gordon wrote:
 Deewiant Wrote:
 Your comb sort uses a shrink factor of 0.7.  Comb sort 11 uses 
 1 / (1 - 1 / (e ^ phi)), where e is Napier's constant and phi 
 the golden ratio.  This leads to possible optimization in the 
 algorithm.

Hang on - isn't 0.7 a grow factor, rather than a shrink factor?

I took a more detailed (but still quite cursory) look at your code, and it seems you do things differently from the "traditional" method. I'm not sure if that's even equivalent to the normal comb sort. But yes, since 0.7 is less than 1, I suppose you could call it a grow factor. It still shrinks the size of the gap you're looking at, though. Up to you, but all references to comb sort I've seen talk about shrink factors. :-)
 Other sorts that could be added:
 	- Gnome sort, http://en.wikipedia.org/wiki/Gnome_sort

Not sure if this one is worth it....

It's only a few lines of code, so there's not much to be worth. And at least it's faster than bubble sort. ;-) -- Remove ".doesnotlike.spam" from the mail address.
Mar 06 2007
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Deewiant Wrote:
<snip>
 I took a more detailed (but still quite cursory) look at your 
 code, and it seems you do things differently from the 
 "traditional" method.  I'm not sure if that's even equivalent 
 to the normal comb sort.

What do you consider to be the "normal comb sort"? OK, so the call to lastSwapBubbleSort was my idea, but other than that, there doesn't seem to be any difference at all, at least from this: http://en.wikipedia.org/wiki/Comb_sort <snip>
 Other sorts that could be added:
     - Gnome sort, http://en.wikipedia.org/wiki/Gnome_sort

Not sure if this one is worth it....

It's only a few lines of code, so there's not much to be worth. And at least it's faster than bubble sort. ;-)

Looking at it on Wikipedia, it appears to be just a way of implementing insertion sort without nesting loops. Strange - before, I'd understood it to be something even slower - only one index variable, and so after moving an element down you step forward one by one just to get back to where you moved it from. Stewart.
Mar 06 2007
parent reply Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
Stewart Gordon wrote:
 Deewiant Wrote:
 <snip>
 I took a more detailed (but still quite cursory) look at your 
 code, and it seems you do things differently from the 
 "traditional" method.  I'm not sure if that's even equivalent 
 to the normal comb sort.

What do you consider to be the "normal comb sort"? OK, so the call to lastSwapBubbleSort was my idea, but other than that, there doesn't seem to be any difference at all, at least from this: http://en.wikipedia.org/wiki/Comb_sort

That's exactly what's confusing me. I suppose the reason you need that call is that you're not counting the number of swaps made in the loop, and you're stopping just when the gap is 1, not when swaps are also zero. Which I think makes the algorithm a bit different, since the number of "combs" you do is limited to once per gap size, plus the one bubble sort after you're done. I added my comb sort 11 implementation to your benchmark, and it's noticeably faster for random data than your comb sort, at least on my machine. So something's different, to be sure. Even without the optimization related to gap size 11, mine is faster. Here's the source, changed to work with your benchmark, if you're interested: void combSort11(uint[] a) out { debug (100) printList(a); checkSorted(a); } body { // empirically found to be good at: // http://world.std.com/~jdveale/combsort.htm // 1 / (1 - 1 / (e ^ phi)), phi being golden ratio = (sqrt(5) + 1)/2 const real SHRINK_FACTOR = 1.2473309501039786540366528676643474234433714124826L; bit swapped = false; size_t gap = a.length; do { if (gap > 1) { gap = cast(size_t)(cast(real) gap / SHRINK_FACTOR); // hence, comb sort 11 if (gap == 9 || gap == 10) gap = 11; } swapped = false; for (size_t i = 0; i < a.length - gap; ++i) { size_t j = i + gap; if (a[i] > a[j]) { swap(a[i], a[j]); swapped = true; } } } while (swapped || gap > 1); } -- Remove ".doesnotlike.spam" from the mail address.
Mar 07 2007
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Deewiant Wrote:

<snip>
 That's exactly what's confusing me.  I suppose the reason you 
 need that call is that you're not counting the number of swaps 
 made in the loop, and you're stopping just when the gap is 1, 
 not when swaps are also zero.  Which I think makes the 
 algorithm a bit different, since the number of "combs" you do 
 is limited to once per gap size, plus the one bubble sort 
 after you're done.

It's more the other way round - calling bubble sort is the way I chose to implement the passes of gap size 1. It saves having to maintain a swap flag during the passes of gap > 1.
 I added my comb sort 11 implementation to your benchmark, and 
 it's noticeably faster for random data than your comb sort, at 
 least on my machine.  So something's different, to be sure.  
 Even without the optimization related to gap size 11, mine is 
 faster.

That appears to be because you've implemented the optimum shrink factor. If I change my implementation to use it as well, it runs at about the same speed as yours. Stewart.
Mar 07 2007
parent Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
Stewart Gordon wrote:
 Deewiant Wrote:
 I added my comb sort 11 implementation to your benchmark, and 
 it's noticeably faster for random data than your comb sort, at 
 least on my machine.  So something's different, to be sure.  
 Even without the optimization related to gap size 11, mine is 
 faster.

That appears to be because you've implemented the optimum shrink factor. If I change my implementation to use it as well, it runs at about the same speed as yours.

Ah, of course. That clears it up. I ran the whole benchmark: for these cases, merge sort and comb sort 11 are the only two to beat the built-in sort (taking the average of the results of all 9 runs). Interesting. -- Remove ".doesnotlike.spam" from the mail address.
Mar 07 2007