digitalmars.D.announce - Sorting algorithms benchmark

Stewart Gordon (4/4) Feb 27 2007 I've written a program to benchmark a number of sorting algorithms.

Deewiant (93/101) Feb 28 2007 Nice. One of the very first D programs I wrote was a class Sorter which ...

Stewart Gordon (15/30) Feb 28 2007 Hang on - isn't 0.7 a grow factor, rather than a shrink factor?

Deewiant (11/22) Mar 06 2007 I took a more detailed (but still quite cursory) look at your code, and ...

Stewart Gordon (7/18) Mar 06 2007 What do you consider to be the "normal comb sort"? OK, so the call to l...

Deewiant (41/52) Mar 07 2007 That's exactly what's confusing me. I suppose the reason you need that c...

Stewart Gordon (6/18) Mar 07 2007 It's more the other way round - calling bubble sort is the way I chose t...

Deewiant (7/17) Mar 07 2007 Ah, of course. That clears it up.

Stewart Gordon <smjg_1998 yahoo.com> writes:

I've written a program to benchmark a number of sorting algorithms.

http://pr.stewartsplace.org.uk/d/sortbench.d

It includes options to output the timings to a CSV file and to control the
number of tests of each algorithm to perform (likely to be needed on modern
machines).

Stewart.

Feb 27 2007

Deewiant <deewiant.doesnotlike.spam gmail.com> writes:

Stewart Gordon wrote:
 I've written a program to benchmark a number of sorting algorithms.
 
 http://pr.stewartsplace.org.uk/d/sortbench.d
 
 It includes options to output the timings to a CSV file and to control the
number of tests of each algorithm to perform (likely to be needed on modern
machines).
 
 Stewart.
 

Nice. One of the very first D programs I wrote was a class Sorter which aimed to
do the same, but I never published it since most of the code was just pretty
much directly copied from somewhere. Besides, I didn't really understand most of
the algorithms, and couldn't get all the ones I wanted to work. Pigeonhole sort,
in particular, couldn't reliably work for all types without opAssign
overloading, which was necessary.

Some comments from what I do (think I do) know, however:

Your comb sort uses a shrink factor of 0.7. Comb sort 11 uses 1 / (1 - 1 / (e ^
phi)), where e is Napier's constant and phi the golden ratio. This leads to
possible optimization in the algorithm.

But alas, the page that used to contain the article describing it is gone from
the web: http://world.std.com/~jdveale/combsort.htm. Dammit! I haven't copied
the source code for it from there, either. Oh well, a less optimal version but
still, I wager, better than yours:
http://yagni.com/combsort/index.php#cpp-combsort

What I used: const real SHRINK_FACTOR =
1.2473309501039786540366528676643474234433714124826;

You're missing shell sort: http://en.wikipedia.org/wiki/Shell_sort
See also: http://www.cs.princeton.edu/~rs/talks/shellsort.pdf
My 1.5-year old code (no idea how it works - I'm particularly frightened of the
bit math, I think it's the formula here:
http://www.research.att.com/~njas/sequences/A033622) is at the bottom of this
post, if you want to have a look.

Other sorts that could be added:
	- Gnome sort, http://en.wikipedia.org/wiki/Gnome_sort
	- Introspective sort: http://en.wikipedia.org/wiki/Introsort
	- Three-way merge sort:
http://www.cs.queensu.ca/home/cisc121/2002w/assignments/assn8/solution/MergeThreeWay.java
	- Counting sort, http://en.wikipedia.org/wiki/Counting_sort
	- Bucket sort, http://en.wikipedia.org/wiki/Bucket_sort
	- Pigeonhole sort, http://en.wikipedia.org/wiki/Pigeonhole_sort

I didn't look too much at your radix sorts, they might have a lot in common with
the last three of the above.

Also, your "buffered insertion sort" is perhaps more commonly known as "binary
insertion sort", since you mention it uses binary search.

--

// TYPE is just a template parameter
// comp defaults to a simple less-than comparison
// the inout is probably unnecessary, I wouldn't have known that back then
void shellSort(inout TYPE[] a, size_t beg = 0, size_t end = 0, cmpType cmp =
null) {
	if (cmp is null) cmp = this.comp;
	if (end == 0) end = a.length;

	// can't figure out how to get it working for arbitrary beg and end, hence kept
separate here
	void actualShellSort(inout TYPE[] ra) {
		size_t h = 1, hh = 1;
		while (h < ra.length) {
			// magic numbers from formula:
			//
http://www.research.att.com/cgi-bin/access.cgi/as/njas/sequences/eisA.cgi?Anum=A033622
			if (hh % 2)
				h = 8 << hh - 6 << ((hh + 1) / 2) + 1;
			else
				h = 9 << hh - 9 << ( hh      / 2) + 1;
			++hh;
		}

		while (h > 0) {
			// for each set, of which there are h
			for (size_t i = h - 1; i < ra.length; ++i) {
				// pick last element in set
				TYPE tmp = ra[i];
				size_t j = i;

				// compare tmp to the one before it in the set
				// if they are not in order, continue this loop, moving
				// elements back to make room for tmp
				for (; j >= h && !cmp(tmp, ra[j-h]); j -= h)
					ra[j] = ra[j - h];

				ra[j] = tmp;
			}

			// all h-sets sorted
			h /= 3;
		}
	}

	if (!smartToSort(a, beg, end, cmp)) return;

	TYPE[] foo = a[beg..end];
	foo.actualShellSort();
}

// other relevant stuff to get the above to compile (not tested, though):

bit smartToSort(inout TYPE[] a, size_t beg, size_t end, cmpType cmp) {
	if (end <= beg)
		return false;
	if (end - beg > a.length)
		return false;

	if (end - beg < 2)
		return false;
	else if (end - beg == 2) {
		if (!cmp(a[end - 1], a[beg]))
			a.swap(beg, end - 1);
		return false;
	}

	return true;
}

alias int function(in TYPE left, in TYPE right) cmpType;

Feb 28 2007

Stewart Gordon <smjg_1998 yahoo.com> writes:

Deewiant Wrote:

<snip>
 Your comb sort uses a shrink factor of 0.7.  Comb sort 11 uses 
 1 / (1 - 1 / (e ^ phi)), where e is Napier's constant and phi 
 the golden ratio.  This leads to possible optimization in the 
 algorithm.

Hang on - isn't 0.7 a grow factor, rather than a shrink factor?

<snip>
 You're missing shell sort: 
 http://en.wikipedia.org/wiki/Shell_sort

Indeed, that could be implemented as well.

I noticed there:
"For instance, if a list was 5-sorted and then 3-sorted, the 
list is now not only 3-sorted, but both 5- and 3-sorted."
Guess I should read up on the proof sometime.

<snip>
 Other sorts that could be added:
 	- Gnome sort, http://en.wikipedia.org/wiki/Gnome_sort

Not sure if this one is worth it....

 	- Introspective sort: http://en.wikipedia.org/wiki/Introsort
 	- Three-way merge sort:
 http://www.cs.queensu.ca/home/cisc121/2002w/assignments/assn8/solution/MergeThreeWay.java

Certainly worth considering.

 	- Counting sort, http://en.wikipedia.org/wiki/Counting_sort
 	- Bucket sort, http://en.wikipedia.org/wiki/Bucket_sort
 	- Pigeonhole sort, 
 http://en.wikipedia.org/wiki/Pigeonhole_sort

<snip>

It took me a moment to figure the difference between these three.  But I would
need to reduce the range of the random sequences a lot before any of them will
be remotely feasible.

Stewart.

Feb 28 2007

Deewiant <deewiant.doesnotlike.spam gmail.com> writes:

Stewart Gordon wrote:
 Deewiant Wrote:
 Your comb sort uses a shrink factor of 0.7.  Comb sort 11 uses 
 1 / (1 - 1 / (e ^ phi)), where e is Napier's constant and phi 
 the golden ratio.  This leads to possible optimization in the 
 algorithm.

 
 Hang on - isn't 0.7 a grow factor, rather than a shrink factor?

I took a more detailed (but still quite cursory) look at your code, and it seems
you do things differently from the "traditional" method. I'm not sure if that's
even equivalent to the normal comb sort.

But yes, since 0.7 is less than 1, I suppose you could call it a grow factor. It
still shrinks the size of the gap you're looking at, though. Up to you, but all
references to comb sort I've seen talk about shrink factors. :-)

 Other sorts that could be added:
 	- Gnome sort, http://en.wikipedia.org/wiki/Gnome_sort

 
 Not sure if this one is worth it....

It's only a few lines of code, so there's not much to be worth. And at least
it's faster than bubble sort. ;-)

-- 
Remove ".doesnotlike.spam" from the mail address.

Mar 06 2007

Stewart Gordon <smjg_1998 yahoo.com> writes:

Deewiant Wrote:
<snip>
 I took a more detailed (but still quite cursory) look at your 
 code, and it seems you do things differently from the 
 "traditional" method.  I'm not sure if that's even equivalent 
 to the normal comb sort.

What do you consider to be the "normal comb sort"?  OK, so the call to
lastSwapBubbleSort was my idea, but other than that, there doesn't seem to be
any difference at all, at least from this:

http://en.wikipedia.org/wiki/Comb_sort

<snip>
 Other sorts that could be added:
     - Gnome sort, http://en.wikipedia.org/wiki/Gnome_sort

 
 Not sure if this one is worth it....

 
 It's only a few lines of code, so there's not much to be 
 worth.  And at least it's faster than bubble sort.  ;-)

Looking at it on Wikipedia, it appears to be just a way of implementing
insertion sort without nesting loops.  Strange - before, I'd understood it to
be something even slower - only one index variable, and so after moving an
element down you step forward one by one just to get back to where you moved it
from.

Stewart.

Mar 06 2007

Deewiant <deewiant.doesnotlike.spam gmail.com> writes:

Stewart Gordon wrote:
 Deewiant Wrote:
 <snip>
 I took a more detailed (but still quite cursory) look at your 
 code, and it seems you do things differently from the 
 "traditional" method.  I'm not sure if that's even equivalent 
 to the normal comb sort.

 
 What do you consider to be the "normal comb sort"?  OK, so the call to
lastSwapBubbleSort was my idea, but other than that, there doesn't seem to be
any difference at all, at least from this:
 
 http://en.wikipedia.org/wiki/Comb_sort
 

That's exactly what's confusing me. I suppose the reason you need that call is
that you're not counting the number of swaps made in the loop, and you're
stopping just when the gap is 1, not when swaps are also zero. Which I think
makes the algorithm a bit different, since the number of "combs" you do is
limited to once per gap size, plus the one bubble sort after you're done.

I added my comb sort 11 implementation to your benchmark, and it's noticeably
faster for random data than your comb sort, at least on my machine. So
something's different, to be sure. Even without the optimization related to gap
size 11, mine is faster.

Here's the source, changed to work with your benchmark, if you're interested:

void combSort11(uint[] a)
out {
	debug (100) printList(a);
	checkSorted(a);
} body {
	// empirically found to be good at:
	// http://world.std.com/~jdveale/combsort.htm
	// 1 / (1 - 1 / (e ^ phi)), phi being golden ratio = (sqrt(5) + 1)/2
	const real SHRINK_FACTOR =
1.2473309501039786540366528676643474234433714124826L;

	bit swapped = false;
	size_t gap = a.length;

	do {
		if (gap > 1) {
			gap = cast(size_t)(cast(real) gap / SHRINK_FACTOR);

			// hence, comb sort 11
			if (gap == 9 || gap == 10)
				gap = 11;
		}

		swapped = false;

		for (size_t i = 0; i < a.length - gap; ++i) {
			size_t j = i + gap;

			if (a[i] > a[j]) {
				swap(a[i], a[j]);
				swapped = true;
			}
		}

	} while (swapped || gap > 1);
}

-- 
Remove ".doesnotlike.spam" from the mail address.

Mar 07 2007

Stewart Gordon <smjg_1998 yahoo.com> writes:

Deewiant Wrote:

<snip>
 That's exactly what's confusing me.  I suppose the reason you 
 need that call is that you're not counting the number of swaps 
 made in the loop, and you're stopping just when the gap is 1, 
 not when swaps are also zero.  Which I think makes the 
 algorithm a bit different, since the number of "combs" you do 
 is limited to once per gap size, plus the one bubble sort 
 after you're done.

It's more the other way round - calling bubble sort is the way I chose to
implement the passes of gap size 1.  It saves having to maintain a swap flag
during the passes of gap > 1.

 I added my comb sort 11 implementation to your benchmark, and 
 it's noticeably faster for random data than your comb sort, at 
 least on my machine.  So something's different, to be sure.  
 Even without the optimization related to gap size 11, mine is 
 faster.

<snip>

That appears to be because you've implemented the optimum shrink factor.  If I
change my implementation to use it as well, it runs at about the same speed as
yours.

Stewart.

Mar 07 2007

Deewiant <deewiant.doesnotlike.spam gmail.com> writes:

Stewart Gordon wrote:
 Deewiant Wrote:
 I added my comb sort 11 implementation to your benchmark, and 
 it's noticeably faster for random data than your comb sort, at 
 least on my machine.  So something's different, to be sure.  
 Even without the optimization related to gap size 11, mine is 
 faster.

 <snip>
 
 That appears to be because you've implemented the optimum shrink factor.  If I
change my implementation to use it as well, it runs at about the same speed as
yours.
 

Ah, of course. That clears it up.

I ran the whole benchmark: for these cases, merge sort and comb sort 11 are the
only two to beat the built-in sort (taking the average of the results of all 9
runs). Interesting.

-- 
Remove ".doesnotlike.spam" from the mail address.

Mar 07 2007

D Programming

C/C++ Programming

Other

digitalmars.D.announce - Sorting algorithms benchmark