digitalmars.D - Randomness in built-in .sort

Bill Baxter (12/12) Jan 04 2009 It seems to me that the built-in .sort uses a randomized algorithm

dsimcha (16/28) Jan 04 2009 IDK why sorting is builtin anyhow. I did some benchmarks a while back, ...

Bill Baxter (15/43) Jan 04 2009 I agree. Builtin sort should probably just go away in D2.
Andrei Alexandrescu (5/9) Jan 04 2009 I agree. I didn't have the time to implement a very good stable sort,

dsimcha (15/24) Jan 05 2009 You could try the merge sort implementation from my dstats library at

Don (2/28) Jan 05 2009 I want TempAlloc in dcore! Eventually, anyway.

dsimcha (12/13) Jan 05 2009 The only problem is that, in its current form, TempAlloc is kind of dang...

Bill Baxter (9/33) Jan 05 2009 Thanks for the pointer. Actually when I said I "found" Oskar Linde's

dsimcha (7/11) Jan 05 2009 Am I (and possibly you) the only one(s) who think that sorting multiple ...

Bill Baxter (14/27) Jan 05 2009 I use this all the time in NumPy in the form of "argsort" which

bearophile (34/35) Jan 05 2009 order = npy.argsort(keys)

bearophile (4/5) Jan 05 2009 'reordered' is better still, I've changed the name.
Bill Baxter (10/43) Jan 05 2009 Making a permutation array like that is simple and nice, but another
mw (8/26) Mar 23 2021 I'm now looking for np.argsort, and found this post.

jmh530 (13/21) Mar 23 2021 import std.algorithm.sorting: makeIndex;

Benji Smith (6/18) Jan 05 2009 I've written my own parallel-array quicksort implementation (several

Andrei Alexandrescu (5/31) Jan 05 2009 std.algorithm's parameterized swap primitive is motivated in part by

Walter Bright (2/5) Jan 05 2009 And may the schwartz be with your sorts!

Andrei Alexandrescu (5/11) Jan 05 2009 A Schwartz joke has been present in

Stewart Gordon (17/24) Jan 05 2009 A stable sort can be slower than an unstable sort. There are many cases...

Bill Baxter (18/42) Jan 05 2009 Actually yeh, I don't really care so much if .sort is stable or not

Walter Bright (3/6) Jan 06 2009 No, it doesn't use random sorts. The source code is included, so this

Sean Kelly (5/11) Jan 06 2009 I don't know if it helps, but the built-in sort uses a similar algorithm

Ary Borenszweig (5/19) Jan 06 2009 I think Bill speaks about a stable sort. You can have an unstable sort

Walter Bright (4/8) Jan 06 2009 And I believe that the sort implementation is completely deterministic.

Bill Baxter (7/16) Jan 06 2009 Really? Hmm, I wonder what the reason was, then. It did go away when

"Bill Baxter" <wbaxter gmail.com> writes:

It seems to me that the built-in .sort uses a randomized algorithm
that leads to results that differ from run-to-run when there are
elements with identical keys.

This seems like a bad idea to me for the built-in algorithm to have
non-deterministic behavior because it means you can have bugs that
aren't consistently repeatable.

I think the built-in sort should be some kind of stable sort.   Also
the stability or lack thereof is not mentioned in the spec, and it
probably should be because stability is critical for some uses of
sorting.  One shouldn't have to guess whether a given implementation
of D will use a stable sort or not.

--bb

Jan 04 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from Bill Baxter (wbaxter gmail.com)'s article
 It seems to me that the built-in .sort uses a randomized algorithm
 that leads to results that differ from run-to-run when there are
 elements with identical keys.
 This seems like a bad idea to me for the built-in algorithm to have
 non-deterministic behavior because it means you can have bugs that
 aren't consistently repeatable.
 I think the built-in sort should be some kind of stable sort.   Also
 the stability or lack thereof is not mentioned in the spec, and it
 probably should be because stability is critical for some uses of
 sorting.  One shouldn't have to guess whether a given implementation
 of D will use a stable sort or not.
 --bb

IDK why sorting is builtin anyhow.  I did some benchmarks a while back, and the
builtin sort is slower than std.algorithm.  IIRC the builtin sort uses function
pointers for comparison, while the std.algorithm sort uses template parameters
that can be inlined at compile time.  I know this was mentioned before, but it
didn't really get fully discussed.  Why is sorting builtin?  With property
syntax,
there's not even any syntactic sugar advantage.  Furthermore, the builtin sort
is
substantially less flexible than a library sort.

On another note, it would be nice if std.algorithm implemented a stable O(N log
N)
sort, like a merge sort.  Right now, IIRC it uses an interesting stable swapping
algorithm on top of a quick sort for O(N log^2 N) performance.  This is good
when
space is tight, but I think in most cases a merge sort is better as a stable
sort.

On the other hand, if builtin sort is kept, then I agree with you:  It should
follow the D convention of doing the safe thing by default (stable sort with no
O(N^2) corner cases) and allowing less safe behavior (unstable but possibly
faster
sort that may or may not have some O(N^2) corner cases) in libraries.

Jan 04 2009

"Bill Baxter" <wbaxter gmail.com> writes:

On Mon, Jan 5, 2009 at 12:05 PM, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Bill Baxter (wbaxter gmail.com)'s article
 It seems to me that the built-in .sort uses a randomized algorithm
 that leads to results that differ from run-to-run when there are
 elements with identical keys.
 This seems like a bad idea to me for the built-in algorithm to have
 non-deterministic behavior because it means you can have bugs that
 aren't consistently repeatable.
 I think the built-in sort should be some kind of stable sort.   Also
 the stability or lack thereof is not mentioned in the spec, and it
 probably should be because stability is critical for some uses of
 sorting.  One shouldn't have to guess whether a given implementation
 of D will use a stable sort or not.
 --bb

 IDK why sorting is builtin anyhow.  I did some benchmarks a while back, and the
 builtin sort is slower than std.algorithm.  IIRC the builtin sort uses function
 pointers for comparison, while the std.algorithm sort uses template parameters
 that can be inlined at compile time.  I know this was mentioned before, but it
 didn't really get fully discussed.  Why is sorting builtin?  With property
syntax,
 there's not even any syntactic sugar advantage.  Furthermore, the builtin sort
is
 substantially less flexible than a library sort.

I agree.  Builtin sort should probably just go away in D2.
But it's there in D1 and can't be removed now.  The best we can hope
for there is for the specification to be tightened up to specify that
.sort must be a stable algorithm.

 On another note, it would be nice if std.algorithm implemented a stable O(N
log N)
 sort, like a merge sort.  Right now, IIRC it uses an interesting stable
swapping
 algorithm on top of a quick sort for O(N log^2 N) performance.  This is good
when
 space is tight, but I think in most cases a merge sort is better as a stable
sort.

 On the other hand, if builtin sort is kept, then I agree with you:  It should
 follow the D convention of doing the safe thing by default (stable sort with no
 O(N^2) corner cases) and allowing less safe behavior (unstable but possibly
faster
 sort that may or may not have some O(N^2) corner cases) in libraries.


For my purposes, I just found Oskar Linde's stable sort implementation
(based on merge sort), and am using that now.
It supports a delegate as a predicate, too, which made my code a lot
simpler (got rid of a dummy struct who's only purpose was to provide
the necessary opCmp I wanted).

So yeh, there's not much point for built-in .sort (and .reverse)...
except the ever-present "it it's easier to get CTFE working if it's
built-in", because the compiler can have a special case CTFE
implementation of those operations.

--bb

Jan 04 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

dsimcha wrote:
 On another note, it would be nice if std.algorithm implemented a stable O(N
log N)
 sort, like a merge sort.  Right now, IIRC it uses an interesting stable
swapping
 algorithm on top of a quick sort for O(N log^2 N) performance.  This is good
when
 space is tight, but I think in most cases a merge sort is better as a stable
sort.

I agree. I didn't have the time to implement a very good stable sort, 
but I also think the extra slowness is not critical. If anyone comes 
with a better implementation, I'd be glad to integrate it.

Andrei

Jan 04 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 On another note, it would be nice if std.algorithm implemented a stable O(N
log N)
 sort, like a merge sort.  Right now, IIRC it uses an interesting stable
swapping
 algorithm on top of a quick sort for O(N log^2 N) performance.  This is good
when
 space is tight, but I think in most cases a merge sort is better as a stable
sort.

 I agree. I didn't have the time to implement a very good stable sort,
 but I also think the extra slowness is not critical. If anyone comes
 with a better implementation, I'd be glad to integrate it.
 Andrei

You could try the merge sort implementation from my dstats library at
http://dsource.org/projects/dstats/browser/trunk/sort.d.  This is very heavily
tested and optimized because some important higher level dstats functionality
depends on it.  You'd basically just have to add a template parameter to allow a
user-provided swap function to make it conform to std.algorithm conventions.
Obviously, since it's by its nature a stable sort, you don't really need the
swapStrategy alias.

One thing to note, though, is that this function is designed to allow for
sorting
parallel arrays in lockstep by simply passing them in as additional parameters.
This adds some complexity to the implementation.  Also, it uses the TempAlloc
region allocator for temp space.  You could put this in Phobos too (This
allocator
was actually your idea a while back, though you called it a SuperStack), or you
could change the temp space allocation to simply use the regular heap allocation
scheme.

Jan 05 2009

Don <nospam nospam.com> writes:

dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 On another note, it would be nice if std.algorithm implemented a stable O(N
log N)
 sort, like a merge sort.  Right now, IIRC it uses an interesting stable
swapping
 algorithm on top of a quick sort for O(N log^2 N) performance.  This is good
when
 space is tight, but I think in most cases a merge sort is better as a stable
sort.

 I agree. I didn't have the time to implement a very good stable sort,
 but I also think the extra slowness is not critical. If anyone comes
 with a better implementation, I'd be glad to integrate it.
 Andrei

 
 You could try the merge sort implementation from my dstats library at
 http://dsource.org/projects/dstats/browser/trunk/sort.d.  This is very heavily
 tested and optimized because some important higher level dstats functionality
 depends on it.  You'd basically just have to add a template parameter to allow
a
 user-provided swap function to make it conform to std.algorithm conventions.
 Obviously, since it's by its nature a stable sort, you don't really need the
 swapStrategy alias.
 
 One thing to note, though, is that this function is designed to allow for
sorting
 parallel arrays in lockstep by simply passing them in as additional parameters.
 This adds some complexity to the implementation.  Also, it uses the TempAlloc
 region allocator for temp space.  You could put this in Phobos too (This
allocator
 was actually your idea a while back, though you called it a SuperStack), or you
 could change the temp space allocation to simply use the regular heap
allocation
 scheme.

I want TempAlloc in dcore! Eventually, anyway.

Jan 05 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from Don (nospam nospam.com)'s article
 I want TempAlloc in dcore! Eventually, anyway.

The only problem is that, in its current form, TempAlloc is kind of dangerous
because it puts the onus on the programmer to not store the only reference to a
reference type in a TempAlloc-allocated block.  If TempAlloc were more tightly
integrated with the GC, this could be solved easily with negligible overhead. 
One
would simply have to keep a thread-local variable that tracks how many blocks
are
currently in use, kind of like the stack pointer in the call stack, and a bit
array that tracks whether each block that is currently in use should be scanned.
However, without GC integration, just making everything scanned would cause so
many false pointer issues and slowdowns when GC is run that I decided that, for
now, given the anticipated use cases, making it a little dangerous and scanning
nothing was the lesser of two evils.

Jan 05 2009

"Bill Baxter" <wbaxter gmail.com> writes:

On Tue, Jan 6, 2009 at 12:16 AM, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 dsimcha wrote:
 On another note, it would be nice if std.algorithm implemented a stable O(N
log N)
 sort, like a merge sort.  Right now, IIRC it uses an interesting stable
swapping
 algorithm on top of a quick sort for O(N log^2 N) performance.  This is good
when
 space is tight, but I think in most cases a merge sort is better as a stable
sort.

 I agree. I didn't have the time to implement a very good stable sort,
 but I also think the extra slowness is not critical. If anyone comes
 with a better implementation, I'd be glad to integrate it.
 Andrei

 You could try the merge sort implementation from my dstats library at
 http://dsource.org/projects/dstats/browser/trunk/sort.d.  This is very heavily
 tested and optimized because some important higher level dstats functionality
 depends on it.  You'd basically just have to add a template parameter to allow
a
 user-provided swap function to make it conform to std.algorithm conventions.
 Obviously, since it's by its nature a stable sort, you don't really need the
 swapStrategy alias.

Thanks for the pointer.  Actually when I said I "found" Oskar Linde's
sorting code, that really meant I found it sitting in my source tree
where I had incorporated it long ago after Oskar made it available.
So I just used that.

 One thing to note, though, is that this function is designed to allow for
sorting
 parallel arrays in lockstep by simply passing them in as additional parameters.
 This adds some complexity to the implementation.  Also, it uses the TempAlloc
 region allocator for temp space.  You could put this in Phobos too (This
allocator
 was actually your idea a while back, though you called it a SuperStack), or you
 could change the temp space allocation to simply use the regular heap
allocation
 scheme.

Actually, a function to sort multiple arrays in parallel was exactly
what I was implementing using .sort.  So that doesn't sound like a
limitation to me at all.   :-)

--bb

Jan 05 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from Bill Baxter (wbaxter gmail.com)'s article
 Actually, a function to sort multiple arrays in parallel was exactly
 what I was implementing using .sort.  So that doesn't sound like a
 limitation to me at all.   :-)
 --bb

Am I (and possibly you) the only one(s) who think that sorting multiple arrays
in
parallel should be standard library functionality?  The standard rebuttal might
be
"use arrays of structs instead of parallel arrays".  This is a good idea in some
situations, but for others, parallel arrays are just plain better.  Furthermore,
with D's handling of variadic functions, generalizing any sort to handle
parallel
arrays is easy.

Jan 05 2009

"Bill Baxter" <wbaxter gmail.com> writes:

On Tue, Jan 6, 2009 at 6:15 AM, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Bill Baxter (wbaxter gmail.com)'s article
 Actually, a function to sort multiple arrays in parallel was exactly
 what I was implementing using .sort.  So that doesn't sound like a
 limitation to me at all.   :-)
 --bb

 Am I (and possibly you) the only one(s) who think that sorting multiple arrays
in
 parallel should be standard library functionality?

I use this all the time in NumPy in the form of "argsort" which
returns a list of indices giving the sort order.  That can then be
used as to index other arrays (thereby permuting them to the sorted
order).
order = npy.argsort(keys)
sortedA = A[order]
sortedB = B[order]

 The standard rebuttal might be
 "use arrays of structs instead of parallel arrays".
 This is a good idea in some
 situations, but for others, parallel arrays are just plain better.

Yeh, that's just a silly argument.  Sometimes a column-oriented
organization is more suitable than a row-oriented one.
Sort can be made to work on column-oriented data, so there's no reason
not to make it so.

 Furthermore, with D's handling of variadic functions, generalizing any sort to
handle parallel
 arrays is easy.

Yeh, this works pretty nicely.

--bb

Jan 05 2009

bearophile <bearophileHUGS lycos.com> writes:

Bill Baxter:

I use this all the time in NumPy in the form of "argsort" which returns a list
of indices giving the sort order.  That can then be used as to index other
arrays (thereby permuting them to the sorted order).

order = npy.argsort(keys)
sortedA = A[order]
sortedB = B[order]<

My dlibs already contain an efficient sorting routine, much faster than the
built in one.

So to my dlibs I have just added sortedIndexes and remapOrder, you can find
their docs here:
http://www.fantascienza.net/leonardo/so/dlibs/func.html

The code:
http://www.fantascienza.net/leonardo/so/libs_d.zip

The usage is very simple, sortedIndexes is similar to sorted():

auto a =  [-5, 2, 3, 1, -11].dup;
auto order1 = a.sortedIndexes();
// now order1 == [4U, 0, 3, 1, 2]

auto order2 = a.sortedIndexes((int x){return x < 0 ? -x : x;});
// Now order2 == [3U, 1, 2, 0, 4]

int[2][] c = [[3, 4], [1,2], [1, 1]];
auto order3 = c.sortedIndexes();
// now order3 == [2U, 1, 0]

auto a =  [-5, 2, 3, 1, -11].dup;

a.remapOrder([4, 0, 3, 1, 2]) ==> [-11, -5, 1, 2, 3]
// a isn't changed here
a.remapOrder([3, 1, 2, 0, 4]) ==> [1, 2, 3, -5, -11]

auto b = ["oranges"[], "for", "apples", "is", "right"].dup;
b.remapOrder([3, 1, 2, 0, 4]) ==>
["is"[], "for", "apples", "oranges", "right"].dup);

(Maybe I have to rename remapOrder as remappedOrder).

--------------------------------

While writing the unittest for those functions, I have seen this isn't accepted
by DMD:
int[2][] c = [[3, 4], [1, 2], [1, 1]].dup;

While the compiler accepts this, I don't know why:
int[2][] c0 = [[3, 4], [1, 2], [1, 1]];
int[2][] c2 = c0.dup;

Bye,
bearophile

Jan 05 2009

bearophile <bearophileHUGS lycos.com> writes:

bearophile:
 (Maybe I have to rename remapOrder as remappedOrder).

'reordered' is better still, I've changed the name.

Bye,
bearophile

Jan 05 2009

"Bill Baxter" <wbaxter gmail.com> writes:

On Tue, Jan 6, 2009 at 10:18 AM, bearophile <bearophileHUGS lycos.com> wrote:
 Bill Baxter:

I use this all the time in NumPy in the form of "argsort" which returns a list
of indices giving the sort order.  That can then be used as to index other
arrays (thereby permuting them to the sorted order).

 order = npy.argsort(keys)
 sortedA = A[order]
 sortedB = B[order]<

 My dlibs already contain an efficient sorting routine, much faster than the
built in one.

 So to my dlibs I have just added sortedIndexes and remapOrder, you can find
their docs here:
 http://www.fantascienza.net/leonardo/so/dlibs/func.html

 The code:
 http://www.fantascienza.net/leonardo/so/libs_d.zip

 The usage is very simple, sortedIndexes is similar to sorted():

 auto a =  [-5, 2, 3, 1, -11].dup;
 auto order1 = a.sortedIndexes();
 // now order1 == [4U, 0, 3, 1, 2]

 auto order2 = a.sortedIndexes((int x){return x < 0 ? -x : x;});
 // Now order2 == [3U, 1, 2, 0, 4]

 int[2][] c = [[3, 4], [1,2], [1, 1]];
 auto order3 = c.sortedIndexes();
 // now order3 == [2U, 1, 0]

 auto a =  [-5, 2, 3, 1, -11].dup;

 a.remapOrder([4, 0, 3, 1, 2]) ==> [-11, -5, 1, 2, 3]
 // a isn't changed here
 a.remapOrder([3, 1, 2, 0, 4]) ==> [1, 2, 3, -5, -11]

 auto b = ["oranges"[], "for", "apples", "is", "right"].dup;
 b.remapOrder([3, 1, 2, 0, 4]) ==>
 ["is"[], "for", "apples", "oranges", "right"].dup);

 (Maybe I have to rename remapOrder as remappedOrder).

 --------------------------------

 While writing the unittest for those functions, I have seen this isn't
accepted by DMD:
 int[2][] c = [[3, 4], [1, 2], [1, 1]].dup;

 While the compiler accepts this, I don't know why:
 int[2][] c0 = [[3, 4], [1, 2], [1, 1]];
 int[2][] c2 = c0.dup;


Making a permutation array like that is simple and nice, but another
alternative which may be more efficient is to override the "swap(i,j)"
routine used by the sort implementation, and make it swap the i'th and
jth elements of all the arrays that are being sorted.  That can be
done with O(1) extra storage instead of the O(n) required by making to
make a permutation array.  I think it's also hard to avoid O(n) extra
memory for permuting an array.  At least I can't think of how to do it
with out using an extra allocation.

--bb

Jan 05 2009

mw <mingwu gmail.com> writes:

On Tuesday, 6 January 2009 at 01:18:21 UTC, bearophile wrote:
 Bill Baxter:

I use this all the time in NumPy in the form of "argsort" which 
returns a list of indices giving the sort order.  That can then 
be used as to index other arrays (thereby permuting them to the 
sorted order).

 order = npy.argsort(keys)
 sortedA = A[order]
 sortedB = B[order]<

 My dlibs already contain an efficient sorting routine, much 
 faster than the built in one.

 So to my dlibs I have just added sortedIndexes and remapOrder, 
 you can find their docs here: 
 http://www.fantascienza.net/leonardo/so/dlibs/func.html

 The code: http://www.fantascienza.net/leonardo/so/libs_d.zip

 The usage is very simple, sortedIndexes is similar to sorted():

 auto a =  [-5, 2, 3, 1, -11].dup;
 auto order1 = a.sortedIndexes();
 // now order1 == [4U, 0, 3, 1, 2]


I'm now looking for np.argsort, and found this post.

 bearophile, the doc page above is gone, although the zip file is 
still there (~10 years old), do you want to create a github repo, 
and contribute the code to dub (https://code.dlang.org/)?


BTW, does anyone know if there is a np.argsort function in some 
other d libraries?


Thanks.

Mar 23 2021

jmh530 <john.michael.hall gmail.com> writes:

On Tuesday, 23 March 2021 at 21:22:25 UTC, mw wrote:
 [snip]

 I'm now looking for np.argsort, and found this post.

  bearophile, the doc page above is gone, although the zip file 
 is still there (~10 years old), do you want to create a github 
 repo, and contribute the code to dub (https://code.dlang.org/)?


 BTW, does anyone know if there is a np.argsort function in some 
 other d libraries?


 Thanks.

import std.algorithm.sorting: makeIndex;
import std.algorithm.comparison: equal;
import std.range: indexed;

void main() {
     int[] arr0 = [ 2, 3, 1, 5, 0 ];
     int[] arr1 = [ 1, 5, 4, 2, -1 ];

     auto index = new int[arr0.length];
     makeIndex!("a < b")(arr0, index);
     assert(index == [4, 2, 0, 1, 3]);
     auto view = arr1.indexed(index);
     assert(view.equal([-1, 4, 1, 5, 2]));
}

Mar 23 2021

Benji Smith <dlanguage benjismith.net> writes:

dsimcha wrote:
 == Quote from Bill Baxter (wbaxter gmail.com)'s article
 Actually, a function to sort multiple arrays in parallel was exactly
 what I was implementing using .sort.  So that doesn't sound like a
 limitation to me at all.   :-)
 --bb

 
 Am I (and possibly you) the only one(s) who think that sorting multiple arrays
in
 parallel should be standard library functionality?  The standard rebuttal
might be
 "use arrays of structs instead of parallel arrays".  This is a good idea in
some
 situations, but for others, parallel arrays are just plain better. 
Furthermore,
 with D's handling of variadic functions, generalizing any sort to handle
parallel
 arrays is easy.

I've written my own parallel-array quicksort implementation (several 
times over, in many different languages).

Parallel sorting is one of my favorite tricks, and I think it definitely 
belongs in the standard library.

--benji

Jan 05 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Benji Smith wrote:
 dsimcha wrote:
 == Quote from Bill Baxter (wbaxter gmail.com)'s article
 Actually, a function to sort multiple arrays in parallel was exactly
 what I was implementing using .sort.  So that doesn't sound like a
 limitation to me at all.   :-)
 --bb

 Am I (and possibly you) the only one(s) who think that sorting 
 multiple arrays in
 parallel should be standard library functionality?  The standard 
 rebuttal might be
 "use arrays of structs instead of parallel arrays".  This is a good 
 idea in some
 situations, but for others, parallel arrays are just plain better.  
 Furthermore,
 with D's handling of variadic functions, generalizing any sort to 
 handle parallel
 arrays is easy.

 
 I've written my own parallel-array quicksort implementation (several 
 times over, in many different languages).
 
 Parallel sorting is one of my favorite tricks, and I think it definitely 
 belongs in the standard library.
 
 --benji

std.algorithm's parameterized swap primitive is motivated in part by 
parallel array manipulation. See the schwartz routines in there for 
examples.

Andrei

Jan 05 2009

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 std.algorithm's parameterized swap primitive is motivated in part by 
 parallel array manipulation. See the schwartz routines in there for 
 examples.

And may the schwartz be with your sorts!

Jan 05 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 Andrei Alexandrescu wrote:
 std.algorithm's parameterized swap primitive is motivated in part by 
 parallel array manipulation. See the schwartz routines in there for 
 examples.

 
 And may the schwartz be with your sorts!

A Schwartz joke has been present in 
http://digitalmars.com/d/2.0/phobos/std_algorithm.html since its early 
days. I am appalled nobody has noticed it.


Andrei

Jan 05 2009

Stewart Gordon <smjg_1998 yahoo.com> writes:

Bill Baxter wrote:
<snip>
 I think the built-in sort should be some kind of stable sort.   Also
 the stability or lack thereof is not mentioned in the spec, and it
 probably should be because stability is critical for some uses of
 sorting.  One shouldn't have to guess whether a given implementation
 of D will use a stable sort or not.
 
 --bb

A stable sort can be slower than an unstable sort.  There are many cases 
where the order of equally-ranking keys does not matter or it has no 
effect whatsoever (e.g. sorting an array of integers).  Then why take 
the extra overhead of making the sort stable?

We could have an extra property .stableSort, for those cases where you 
need stability.  Of course, it would be valid to implement .sort and 
.stableSort with the same algorithm, though to do so would be naive 
unless someone comes up with a stable sorting algorithm with O(n log n) 
or better time complexity and O(log n) or better extra memory 
requirement.  But in the absence of such an algorithm, the compiler 
could still replace .stableSort with .sort in those cases where it can 
guarantee it will make no difference to the end result.

Or you could argue that requiring sort to be stable is sufficiently 
uncommon that it might as well be left to a library function.

Stewart.

Jan 05 2009

"Bill Baxter" <wbaxter gmail.com> writes:

On Mon, Jan 5, 2009 at 8:40 PM, Stewart Gordon <smjg_1998 yahoo.com> wrote:
 Bill Baxter wrote:
 <snip>
 I think the built-in sort should be some kind of stable sort.   Also
 the stability or lack thereof is not mentioned in the spec, and it
 probably should be because stability is critical for some uses of
 sorting.  One shouldn't have to guess whether a given implementation
 of D will use a stable sort or not.

 --bb


 A stable sort can be slower than an unstable sort.  There are many cases
 where the order of equally-ranking keys does not matter or it has no effect
 whatsoever (e.g. sorting an array of integers).  Then why take the extra
 overhead of making the sort stable?

 We could have an extra property .stableSort, for those cases where you need
 stability.  Of course, it would be valid to implement .sort and .stableSort
 with the same algorithm, though to do so would be naive unless someone comes
 up with a stable sorting algorithm with O(n log n) or better time complexity
 and O(log n) or better extra memory requirement.  But in the absence of such
 an algorithm, the compiler could still replace .stableSort with .sort in
 those cases where it can guarantee it will make no difference to the end
 result.

Actually yeh, I don't really care so much if .sort is stable or not
(as long as it's clear in the spec what to expect).  But I do care
that it is deterministic.  The current implementation seems to use
some randomness (random partitions?) so if I run my program 5 times I
get 5 different sort orders when there are duplicated keys.  This
makes it really hard to debug things that only happen with certain
orders.

 Or you could argue that requiring sort to be stable is sufficiently uncommon
 that it might as well be left to a library function.

There is no one sort algorithm that is fastest for all inputs.  So it
would make more sense to me to argue that if you want the fastest
possible sort for your problem then you should go with a library
solution.  Built-ins should be efficient, yes, but to argue that
efficiency is more important than generality in a built-in seems wrong
to me.  A least for .sort.   There's no performance benefit to having
a built-in sort, so it's really only there for convenience.   Using a
stable sort would make .sort applicable to more problems, thus
increasing the integral of convenience.

--bb

Jan 05 2009

Walter Bright <newshound1 digitalmars.com> writes:

Bill Baxter wrote:
 It seems to me that the built-in .sort uses a randomized algorithm
 that leads to results that differ from run-to-run when there are
 elements with identical keys.

No, it doesn't use random sorts. The source code is included, so this 
should be easy to verify.

Jan 06 2009

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Bill Baxter wrote:
 It seems to me that the built-in .sort uses a randomized algorithm
 that leads to results that differ from run-to-run when there are
 elements with identical keys.

 No, it doesn't use random sorts. The source code is included, so this
 should be easy to verify.

I don't know if it helps, but the built-in sort uses a similar algorithm
to the sort routine in tango.core.Array.  It picks a "median of 3" value
for the pivot.  As Walter said, I don't believe any randomness is involved.


Sean

Jan 06 2009

Ary Borenszweig <ary esperanto.org.ar> writes:

Sean Kelly wrote:
 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Bill Baxter wrote:
 It seems to me that the built-in .sort uses a randomized algorithm
 that leads to results that differ from run-to-run when there are
 elements with identical keys.

 No, it doesn't use random sorts. The source code is included, so this
 should be easy to verify.

 
 I don't know if it helps, but the built-in sort uses a similar algorithm
 to the sort routine in tango.core.Array.  It picks a "median of 3" value
 for the pivot.  As Walter said, I don't believe any randomness is involved.
 
 
 Sean

I think Bill speaks about a stable sort. You can have an unstable sort 
algorithm without having explicity a random invocation. Note that he's 
saying "leads to results that differ from run-to-run when there are 
elements with identical keys".

Jan 06 2009

Walter Bright <newshound1 digitalmars.com> writes:

Ary Borenszweig wrote:
 I think Bill speaks about a stable sort. You can have an unstable sort 
 algorithm without having explicity a random invocation. Note that he's 
 saying "leads to results that differ from run-to-run when there are 
 elements with identical keys".

And I believe that the sort implementation is completely deterministic. 
If Bill is getting different results from different runs, something else 
is going on.

Jan 06 2009

"Bill Baxter" <wbaxter gmail.com> writes:

On Wed, Jan 7, 2009 at 4:09 AM, Walter Bright
<newshound1 digitalmars.com> wrote:
 Ary Borenszweig wrote:
 I think Bill speaks about a stable sort. You can have an unstable sort
 algorithm without having explicity a random invocation. Note that he's
 saying "leads to results that differ from run-to-run when there are elements
 with identical keys".

 And I believe that the sort implementation is completely deterministic. If
 Bill is getting different results from different runs, something else is
 going on.

Really?  Hmm, I wonder what the reason was, then.  It did go away when
I switched to a different sort routine.
Don't have time to investigate right now, unfortunately.  But thank
you very much for the info that it should be deterministic.

--bb

Jan 06 2009

D Programming

C/C++ Programming

Other

digitalmars.D - Randomness in built-in .sort