www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Randomness in built-in .sort

reply "Bill Baxter" <wbaxter gmail.com> writes:
It seems to me that the built-in .sort uses a randomized algorithm
that leads to results that differ from run-to-run when there are
elements with identical keys.

This seems like a bad idea to me for the built-in algorithm to have
non-deterministic behavior because it means you can have bugs that
aren't consistently repeatable.

I think the built-in sort should be some kind of stable sort.   Also
the stability or lack thereof is not mentioned in the spec, and it
probably should be because stability is critical for some uses of
sorting.  One shouldn't have to guess whether a given implementation
of D will use a stable sort or not.

--bb
Jan 04 2009
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Bill Baxter (wbaxter gmail.com)'s article
 It seems to me that the built-in .sort uses a randomized algorithm
 that leads to results that differ from run-to-run when there are
 elements with identical keys.
 This seems like a bad idea to me for the built-in algorithm to have
 non-deterministic behavior because it means you can have bugs that
 aren't consistently repeatable.
 I think the built-in sort should be some kind of stable sort.   Also
 the stability or lack thereof is not mentioned in the spec, and it
 probably should be because stability is critical for some uses of
 sorting.  One shouldn't have to guess whether a given implementation
 of D will use a stable sort or not.
 --bb
IDK why sorting is builtin anyhow. I did some benchmarks a while back, and the builtin sort is slower than std.algorithm. IIRC the builtin sort uses function pointers for comparison, while the std.algorithm sort uses template parameters that can be inlined at compile time. I know this was mentioned before, but it didn't really get fully discussed. Why is sorting builtin? With property syntax, there's not even any syntactic sugar advantage. Furthermore, the builtin sort is substantially less flexible than a library sort. On another note, it would be nice if std.algorithm implemented a stable O(N log N) sort, like a merge sort. Right now, IIRC it uses an interesting stable swapping algorithm on top of a quick sort for O(N log^2 N) performance. This is good when space is tight, but I think in most cases a merge sort is better as a stable sort. On the other hand, if builtin sort is kept, then I agree with you: It should follow the D convention of doing the safe thing by default (stable sort with no O(N^2) corner cases) and allowing less safe behavior (unstable but possibly faster sort that may or may not have some O(N^2) corner cases) in libraries.
Jan 04 2009
next sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Mon, Jan 5, 2009 at 12:05 PM, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Bill Baxter (wbaxter gmail.com)'s article
 It seems to me that the built-in .sort uses a randomized algorithm
 that leads to results that differ from run-to-run when there are
 elements with identical keys.
 This seems like a bad idea to me for the built-in algorithm to have
 non-deterministic behavior because it means you can have bugs that
 aren't consistently repeatable.
 I think the built-in sort should be some kind of stable sort.   Also
 the stability or lack thereof is not mentioned in the spec, and it
 probably should be because stability is critical for some uses of
 sorting.  One shouldn't have to guess whether a given implementation
 of D will use a stable sort or not.
 --bb
IDK why sorting is builtin anyhow. I did some benchmarks a while back, and the builtin sort is slower than std.algorithm. IIRC the builtin sort uses function pointers for comparison, while the std.algorithm sort uses template parameters that can be inlined at compile time. I know this was mentioned before, but it didn't really get fully discussed. Why is sorting builtin? With property syntax, there's not even any syntactic sugar advantage. Furthermore, the builtin sort is substantially less flexible than a library sort.
I agree. Builtin sort should probably just go away in D2. But it's there in D1 and can't be removed now. The best we can hope for there is for the specification to be tightened up to specify that .sort must be a stable algorithm.
 On another note, it would be nice if std.algorithm implemented a stable O(N
log N)
 sort, like a merge sort.  Right now, IIRC it uses an interesting stable
swapping
 algorithm on top of a quick sort for O(N log^2 N) performance.  This is good
when
 space is tight, but I think in most cases a merge sort is better as a stable
sort.

 On the other hand, if builtin sort is kept, then I agree with you:  It should
 follow the D convention of doing the safe thing by default (stable sort with no
 O(N^2) corner cases) and allowing less safe behavior (unstable but possibly
faster
 sort that may or may not have some O(N^2) corner cases) in libraries.
For my purposes, I just found Oskar Linde's stable sort implementation (based on merge sort), and am using that now. It supports a delegate as a predicate, too, which made my code a lot simpler (got rid of a dummy struct who's only purpose was to provide the necessary opCmp I wanted). So yeh, there's not much point for built-in .sort (and .reverse)... except the ever-present "it it's easier to get CTFE working if it's built-in", because the compiler can have a special case CTFE implementation of those operations. --bb
Jan 04 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 On another note, it would be nice if std.algorithm implemented a stable O(N
log N)
 sort, like a merge sort.  Right now, IIRC it uses an interesting stable
swapping
 algorithm on top of a quick sort for O(N log^2 N) performance.  This is good
when
 space is tight, but I think in most cases a merge sort is better as a stable
sort.
I agree. I didn't have the time to implement a very good stable sort, but I also think the extra slowness is not critical. If anyone comes with a better implementation, I'd be glad to integrate it. Andrei
Jan 04 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 On another note, it would be nice if std.algorithm implemented a stable O(N
log N)
 sort, like a merge sort.  Right now, IIRC it uses an interesting stable
swapping
 algorithm on top of a quick sort for O(N log^2 N) performance.  This is good
when
 space is tight, but I think in most cases a merge sort is better as a stable
sort.
I agree. I didn't have the time to implement a very good stable sort, but I also think the extra slowness is not critical. If anyone comes with a better implementation, I'd be glad to integrate it. Andrei
You could try the merge sort implementation from my dstats library at http://dsource.org/projects/dstats/browser/trunk/sort.d. This is very heavily tested and optimized because some important higher level dstats functionality depends on it. You'd basically just have to add a template parameter to allow a user-provided swap function to make it conform to std.algorithm conventions. Obviously, since it's by its nature a stable sort, you don't really need the swapStrategy alias. One thing to note, though, is that this function is designed to allow for sorting parallel arrays in lockstep by simply passing them in as additional parameters. This adds some complexity to the implementation. Also, it uses the TempAlloc region allocator for temp space. You could put this in Phobos too (This allocator was actually your idea a while back, though you called it a SuperStack), or you could change the temp space allocation to simply use the regular heap allocation scheme.
Jan 05 2009
next sibling parent reply Don <nospam nospam.com> writes:
dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 On another note, it would be nice if std.algorithm implemented a stable O(N
log N)
 sort, like a merge sort.  Right now, IIRC it uses an interesting stable
swapping
 algorithm on top of a quick sort for O(N log^2 N) performance.  This is good
when
 space is tight, but I think in most cases a merge sort is better as a stable
sort.
I agree. I didn't have the time to implement a very good stable sort, but I also think the extra slowness is not critical. If anyone comes with a better implementation, I'd be glad to integrate it. Andrei
You could try the merge sort implementation from my dstats library at http://dsource.org/projects/dstats/browser/trunk/sort.d. This is very heavily tested and optimized because some important higher level dstats functionality depends on it. You'd basically just have to add a template parameter to allow a user-provided swap function to make it conform to std.algorithm conventions. Obviously, since it's by its nature a stable sort, you don't really need the swapStrategy alias. One thing to note, though, is that this function is designed to allow for sorting parallel arrays in lockstep by simply passing them in as additional parameters. This adds some complexity to the implementation. Also, it uses the TempAlloc region allocator for temp space. You could put this in Phobos too (This allocator was actually your idea a while back, though you called it a SuperStack), or you could change the temp space allocation to simply use the regular heap allocation scheme.
I want TempAlloc in dcore! Eventually, anyway.
Jan 05 2009
parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Don (nospam nospam.com)'s article
 I want TempAlloc in dcore! Eventually, anyway.
The only problem is that, in its current form, TempAlloc is kind of dangerous because it puts the onus on the programmer to not store the only reference to a reference type in a TempAlloc-allocated block. If TempAlloc were more tightly integrated with the GC, this could be solved easily with negligible overhead. One would simply have to keep a thread-local variable that tracks how many blocks are currently in use, kind of like the stack pointer in the call stack, and a bit array that tracks whether each block that is currently in use should be scanned. However, without GC integration, just making everything scanned would cause so many false pointer issues and slowdowns when GC is run that I decided that, for now, given the anticipated use cases, making it a little dangerous and scanning nothing was the lesser of two evils.
Jan 05 2009
prev sibling parent reply "Bill Baxter" <wbaxter gmail.com> writes:
On Tue, Jan 6, 2009 at 12:16 AM, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 On another note, it would be nice if std.algorithm implemented a stable O(N
log N)
 sort, like a merge sort.  Right now, IIRC it uses an interesting stable
swapping
 algorithm on top of a quick sort for O(N log^2 N) performance.  This is good
when
 space is tight, but I think in most cases a merge sort is better as a stable
sort.
I agree. I didn't have the time to implement a very good stable sort, but I also think the extra slowness is not critical. If anyone comes with a better implementation, I'd be glad to integrate it. Andrei
You could try the merge sort implementation from my dstats library at http://dsource.org/projects/dstats/browser/trunk/sort.d. This is very heavily tested and optimized because some important higher level dstats functionality depends on it. You'd basically just have to add a template parameter to allow a user-provided swap function to make it conform to std.algorithm conventions. Obviously, since it's by its nature a stable sort, you don't really need the swapStrategy alias.
Thanks for the pointer. Actually when I said I "found" Oskar Linde's sorting code, that really meant I found it sitting in my source tree where I had incorporated it long ago after Oskar made it available. So I just used that.
 One thing to note, though, is that this function is designed to allow for
sorting
 parallel arrays in lockstep by simply passing them in as additional parameters.
 This adds some complexity to the implementation.  Also, it uses the TempAlloc
 region allocator for temp space.  You could put this in Phobos too (This
allocator
 was actually your idea a while back, though you called it a SuperStack), or you
 could change the temp space allocation to simply use the regular heap
allocation
 scheme.
Actually, a function to sort multiple arrays in parallel was exactly what I was implementing using .sort. So that doesn't sound like a limitation to me at all. :-) --bb
Jan 05 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Bill Baxter (wbaxter gmail.com)'s article
 Actually, a function to sort multiple arrays in parallel was exactly
 what I was implementing using .sort.  So that doesn't sound like a
 limitation to me at all.   :-)
 --bb
Am I (and possibly you) the only one(s) who think that sorting multiple arrays in parallel should be standard library functionality? The standard rebuttal might be "use arrays of structs instead of parallel arrays". This is a good idea in some situations, but for others, parallel arrays are just plain better. Furthermore, with D's handling of variadic functions, generalizing any sort to handle parallel arrays is easy.
Jan 05 2009
next sibling parent reply "Bill Baxter" <wbaxter gmail.com> writes:
On Tue, Jan 6, 2009 at 6:15 AM, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Bill Baxter (wbaxter gmail.com)'s article
 Actually, a function to sort multiple arrays in parallel was exactly
 what I was implementing using .sort.  So that doesn't sound like a
 limitation to me at all.   :-)
 --bb
Am I (and possibly you) the only one(s) who think that sorting multiple arrays in parallel should be standard library functionality?
I use this all the time in NumPy in the form of "argsort" which returns a list of indices giving the sort order. That can then be used as to index other arrays (thereby permuting them to the sorted order). order = npy.argsort(keys) sortedA = A[order] sortedB = B[order]
 The standard rebuttal might be
 "use arrays of structs instead of parallel arrays".
 This is a good idea in some
 situations, but for others, parallel arrays are just plain better.
Yeh, that's just a silly argument. Sometimes a column-oriented organization is more suitable than a row-oriented one. Sort can be made to work on column-oriented data, so there's no reason not to make it so.
 Furthermore, with D's handling of variadic functions, generalizing any sort to
handle parallel
 arrays is easy.
Yeh, this works pretty nicely. --bb
Jan 05 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Bill Baxter:

I use this all the time in NumPy in the form of "argsort" which returns a list
of indices giving the sort order.  That can then be used as to index other
arrays (thereby permuting them to the sorted order).
order = npy.argsort(keys) sortedA = A[order] sortedB = B[order]< My dlibs already contain an efficient sorting routine, much faster than the built in one. So to my dlibs I have just added sortedIndexes and remapOrder, you can find their docs here: http://www.fantascienza.net/leonardo/so/dlibs/func.html The code: http://www.fantascienza.net/leonardo/so/libs_d.zip The usage is very simple, sortedIndexes is similar to sorted(): auto a = [-5, 2, 3, 1, -11].dup; auto order1 = a.sortedIndexes(); // now order1 == [4U, 0, 3, 1, 2] auto order2 = a.sortedIndexes((int x){return x < 0 ? -x : x;}); // Now order2 == [3U, 1, 2, 0, 4] int[2][] c = [[3, 4], [1,2], [1, 1]]; auto order3 = c.sortedIndexes(); // now order3 == [2U, 1, 0] auto a = [-5, 2, 3, 1, -11].dup; a.remapOrder([4, 0, 3, 1, 2]) ==> [-11, -5, 1, 2, 3] // a isn't changed here a.remapOrder([3, 1, 2, 0, 4]) ==> [1, 2, 3, -5, -11] auto b = ["oranges"[], "for", "apples", "is", "right"].dup; b.remapOrder([3, 1, 2, 0, 4]) ==> ["is"[], "for", "apples", "oranges", "right"].dup); (Maybe I have to rename remapOrder as remappedOrder). -------------------------------- While writing the unittest for those functions, I have seen this isn't accepted by DMD: int[2][] c = [[3, 4], [1, 2], [1, 1]].dup; While the compiler accepts this, I don't know why: int[2][] c0 = [[3, 4], [1, 2], [1, 1]]; int[2][] c2 = c0.dup; Bye, bearophile
Jan 05 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
bearophile:
 (Maybe I have to rename remapOrder as remappedOrder).
'reordered' is better still, I've changed the name. Bye, bearophile
Jan 05 2009
prev sibling next sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Tue, Jan 6, 2009 at 10:18 AM, bearophile <bearophileHUGS lycos.com> wrote:
 Bill Baxter:

I use this all the time in NumPy in the form of "argsort" which returns a list
of indices giving the sort order.  That can then be used as to index other
arrays (thereby permuting them to the sorted order).
order = npy.argsort(keys) sortedA = A[order] sortedB = B[order]< My dlibs already contain an efficient sorting routine, much faster than the built in one. So to my dlibs I have just added sortedIndexes and remapOrder, you can find their docs here: http://www.fantascienza.net/leonardo/so/dlibs/func.html The code: http://www.fantascienza.net/leonardo/so/libs_d.zip The usage is very simple, sortedIndexes is similar to sorted(): auto a = [-5, 2, 3, 1, -11].dup; auto order1 = a.sortedIndexes(); // now order1 == [4U, 0, 3, 1, 2] auto order2 = a.sortedIndexes((int x){return x < 0 ? -x : x;}); // Now order2 == [3U, 1, 2, 0, 4] int[2][] c = [[3, 4], [1,2], [1, 1]]; auto order3 = c.sortedIndexes(); // now order3 == [2U, 1, 0] auto a = [-5, 2, 3, 1, -11].dup; a.remapOrder([4, 0, 3, 1, 2]) ==> [-11, -5, 1, 2, 3] // a isn't changed here a.remapOrder([3, 1, 2, 0, 4]) ==> [1, 2, 3, -5, -11] auto b = ["oranges"[], "for", "apples", "is", "right"].dup; b.remapOrder([3, 1, 2, 0, 4]) ==> ["is"[], "for", "apples", "oranges", "right"].dup); (Maybe I have to rename remapOrder as remappedOrder). -------------------------------- While writing the unittest for those functions, I have seen this isn't accepted by DMD: int[2][] c = [[3, 4], [1, 2], [1, 1]].dup; While the compiler accepts this, I don't know why: int[2][] c0 = [[3, 4], [1, 2], [1, 1]]; int[2][] c2 = c0.dup;
Making a permutation array like that is simple and nice, but another alternative which may be more efficient is to override the "swap(i,j)" routine used by the sort implementation, and make it swap the i'th and jth elements of all the arrays that are being sorted. That can be done with O(1) extra storage instead of the O(n) required by making to make a permutation array. I think it's also hard to avoid O(n) extra memory for permuting an array. At least I can't think of how to do it with out using an extra allocation. --bb
Jan 05 2009
prev sibling parent reply mw <mingwu gmail.com> writes:
On Tuesday, 6 January 2009 at 01:18:21 UTC, bearophile wrote:
 Bill Baxter:

I use this all the time in NumPy in the form of "argsort" which 
returns a list of indices giving the sort order.  That can then 
be used as to index other arrays (thereby permuting them to the 
sorted order).
order = npy.argsort(keys) sortedA = A[order] sortedB = B[order]< My dlibs already contain an efficient sorting routine, much faster than the built in one. So to my dlibs I have just added sortedIndexes and remapOrder, you can find their docs here: http://www.fantascienza.net/leonardo/so/dlibs/func.html The code: http://www.fantascienza.net/leonardo/so/libs_d.zip The usage is very simple, sortedIndexes is similar to sorted(): auto a = [-5, 2, 3, 1, -11].dup; auto order1 = a.sortedIndexes(); // now order1 == [4U, 0, 3, 1, 2]
I'm now looking for np.argsort, and found this post. bearophile, the doc page above is gone, although the zip file is still there (~10 years old), do you want to create a github repo, and contribute the code to dub (https://code.dlang.org/)? BTW, does anyone know if there is a np.argsort function in some other d libraries? Thanks.
Mar 23 2021
parent jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 23 March 2021 at 21:22:25 UTC, mw wrote:
 [snip]

 I'm now looking for np.argsort, and found this post.

  bearophile, the doc page above is gone, although the zip file 
 is still there (~10 years old), do you want to create a github 
 repo, and contribute the code to dub (https://code.dlang.org/)?


 BTW, does anyone know if there is a np.argsort function in some 
 other d libraries?


 Thanks.
import std.algorithm.sorting: makeIndex; import std.algorithm.comparison: equal; import std.range: indexed; void main() { int[] arr0 = [ 2, 3, 1, 5, 0 ]; int[] arr1 = [ 1, 5, 4, 2, -1 ]; auto index = new int[arr0.length]; makeIndex!("a < b")(arr0, index); assert(index == [4, 2, 0, 1, 3]); auto view = arr1.indexed(index); assert(view.equal([-1, 4, 1, 5, 2])); }
Mar 23 2021
prev sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
dsimcha wrote:
 == Quote from Bill Baxter (wbaxter gmail.com)'s article
 Actually, a function to sort multiple arrays in parallel was exactly
 what I was implementing using .sort.  So that doesn't sound like a
 limitation to me at all.   :-)
 --bb
Am I (and possibly you) the only one(s) who think that sorting multiple arrays in parallel should be standard library functionality? The standard rebuttal might be "use arrays of structs instead of parallel arrays". This is a good idea in some situations, but for others, parallel arrays are just plain better. Furthermore, with D's handling of variadic functions, generalizing any sort to handle parallel arrays is easy.
I've written my own parallel-array quicksort implementation (several times over, in many different languages). Parallel sorting is one of my favorite tricks, and I think it definitely belongs in the standard library. --benji
Jan 05 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Benji Smith wrote:
 dsimcha wrote:
 == Quote from Bill Baxter (wbaxter gmail.com)'s article
 Actually, a function to sort multiple arrays in parallel was exactly
 what I was implementing using .sort.  So that doesn't sound like a
 limitation to me at all.   :-)
 --bb
Am I (and possibly you) the only one(s) who think that sorting multiple arrays in parallel should be standard library functionality? The standard rebuttal might be "use arrays of structs instead of parallel arrays". This is a good idea in some situations, but for others, parallel arrays are just plain better. Furthermore, with D's handling of variadic functions, generalizing any sort to handle parallel arrays is easy.
I've written my own parallel-array quicksort implementation (several times over, in many different languages). Parallel sorting is one of my favorite tricks, and I think it definitely belongs in the standard library. --benji
std.algorithm's parameterized swap primitive is motivated in part by parallel array manipulation. See the schwartz routines in there for examples. Andrei
Jan 05 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 std.algorithm's parameterized swap primitive is motivated in part by 
 parallel array manipulation. See the schwartz routines in there for 
 examples.
And may the schwartz be with your sorts!
Jan 05 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu wrote:
 std.algorithm's parameterized swap primitive is motivated in part by 
 parallel array manipulation. See the schwartz routines in there for 
 examples.
And may the schwartz be with your sorts!
A Schwartz joke has been present in http://digitalmars.com/d/2.0/phobos/std_algorithm.html since its early days. I am appalled nobody has noticed it. Andrei
Jan 05 2009
prev sibling next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Bill Baxter wrote:
<snip>
 I think the built-in sort should be some kind of stable sort.   Also
 the stability or lack thereof is not mentioned in the spec, and it
 probably should be because stability is critical for some uses of
 sorting.  One shouldn't have to guess whether a given implementation
 of D will use a stable sort or not.
 
 --bb
A stable sort can be slower than an unstable sort. There are many cases where the order of equally-ranking keys does not matter or it has no effect whatsoever (e.g. sorting an array of integers). Then why take the extra overhead of making the sort stable? We could have an extra property .stableSort, for those cases where you need stability. Of course, it would be valid to implement .sort and .stableSort with the same algorithm, though to do so would be naive unless someone comes up with a stable sorting algorithm with O(n log n) or better time complexity and O(log n) or better extra memory requirement. But in the absence of such an algorithm, the compiler could still replace .stableSort with .sort in those cases where it can guarantee it will make no difference to the end result. Or you could argue that requiring sort to be stable is sufficiently uncommon that it might as well be left to a library function. Stewart.
Jan 05 2009
parent "Bill Baxter" <wbaxter gmail.com> writes:
On Mon, Jan 5, 2009 at 8:40 PM, Stewart Gordon <smjg_1998 yahoo.com> wrote:
 Bill Baxter wrote:
 <snip>
 I think the built-in sort should be some kind of stable sort.   Also
 the stability or lack thereof is not mentioned in the spec, and it
 probably should be because stability is critical for some uses of
 sorting.  One shouldn't have to guess whether a given implementation
 of D will use a stable sort or not.

 --bb
 A stable sort can be slower than an unstable sort.  There are many cases
 where the order of equally-ranking keys does not matter or it has no effect
 whatsoever (e.g. sorting an array of integers).  Then why take the extra
 overhead of making the sort stable?
 We could have an extra property .stableSort, for those cases where you need
 stability.  Of course, it would be valid to implement .sort and .stableSort
 with the same algorithm, though to do so would be naive unless someone comes
 up with a stable sorting algorithm with O(n log n) or better time complexity
 and O(log n) or better extra memory requirement.  But in the absence of such
 an algorithm, the compiler could still replace .stableSort with .sort in
 those cases where it can guarantee it will make no difference to the end
 result.
Actually yeh, I don't really care so much if .sort is stable or not (as long as it's clear in the spec what to expect). But I do care that it is deterministic. The current implementation seems to use some randomness (random partitions?) so if I run my program 5 times I get 5 different sort orders when there are duplicated keys. This makes it really hard to debug things that only happen with certain orders.
 Or you could argue that requiring sort to be stable is sufficiently uncommon
 that it might as well be left to a library function.
There is no one sort algorithm that is fastest for all inputs. So it would make more sense to me to argue that if you want the fastest possible sort for your problem then you should go with a library solution. Built-ins should be efficient, yes, but to argue that efficiency is more important than generality in a built-in seems wrong to me. A least for .sort. There's no performance benefit to having a built-in sort, so it's really only there for convenience. Using a stable sort would make .sort applicable to more problems, thus increasing the integral of convenience. --bb
Jan 05 2009
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Bill Baxter wrote:
 It seems to me that the built-in .sort uses a randomized algorithm
 that leads to results that differ from run-to-run when there are
 elements with identical keys.
No, it doesn't use random sorts. The source code is included, so this should be easy to verify.
Jan 06 2009
parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Bill Baxter wrote:
 It seems to me that the built-in .sort uses a randomized algorithm
 that leads to results that differ from run-to-run when there are
 elements with identical keys.
No, it doesn't use random sorts. The source code is included, so this should be easy to verify.
I don't know if it helps, but the built-in sort uses a similar algorithm to the sort routine in tango.core.Array. It picks a "median of 3" value for the pivot. As Walter said, I don't believe any randomness is involved. Sean
Jan 06 2009
parent reply Ary Borenszweig <ary esperanto.org.ar> writes:
Sean Kelly wrote:
 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Bill Baxter wrote:
 It seems to me that the built-in .sort uses a randomized algorithm
 that leads to results that differ from run-to-run when there are
 elements with identical keys.
No, it doesn't use random sorts. The source code is included, so this should be easy to verify.
I don't know if it helps, but the built-in sort uses a similar algorithm to the sort routine in tango.core.Array. It picks a "median of 3" value for the pivot. As Walter said, I don't believe any randomness is involved. Sean
I think Bill speaks about a stable sort. You can have an unstable sort algorithm without having explicity a random invocation. Note that he's saying "leads to results that differ from run-to-run when there are elements with identical keys".
Jan 06 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Ary Borenszweig wrote:
 I think Bill speaks about a stable sort. You can have an unstable sort 
 algorithm without having explicity a random invocation. Note that he's 
 saying "leads to results that differ from run-to-run when there are 
 elements with identical keys".
And I believe that the sort implementation is completely deterministic. If Bill is getting different results from different runs, something else is going on.
Jan 06 2009
parent "Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Jan 7, 2009 at 4:09 AM, Walter Bright
<newshound1 digitalmars.com> wrote:
 Ary Borenszweig wrote:
 I think Bill speaks about a stable sort. You can have an unstable sort
 algorithm without having explicity a random invocation. Note that he's
 saying "leads to results that differ from run-to-run when there are elements
 with identical keys".
And I believe that the sort implementation is completely deterministic. If Bill is getting different results from different runs, something else is going on.
Really? Hmm, I wonder what the reason was, then. It did go away when I switched to a different sort routine. Don't have time to investigate right now, unfortunately. But thank you very much for the info that it should be deterministic. --bb
Jan 06 2009