www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - How to sort a range

reply rcorre <ryan rcorre.net> writes:
I was in a situation where I wanted to remove duplicates from an 
OnlyResult.
To do this with uniq, I needed to sort it. OnlyResult doesn't 
satisfy the template constraints of sort, but this seems easy 
enough to fix. I made front, back, and opIndex return by ref. 
With this, the following compiles:

     assert(only(3,1,2).sort.equal(only(1,2,3)));

However, it fails with:

     core.exception.AssertError std/algorithm/sorting.d(1052): 
Failed to sort range of type OnlyResult!(int, 3LU)

So, if you have a range for which sort compiles, what does it 
take to make sorting actually work?

For reference, my two attempts were:

https://github.com/rcorre/phobos/commit/d89b3cfab7a0938e178a506b4ceb8faae6ecbfe2

https://github.com/rcorre/phobos/commit/512d9b8db6f311db6a9b6ccb077a691cec66ce70
Mar 08 2016
parent reply Edwin van Leeuwen <edder tkwsping.nl> writes:
On Wednesday, 9 March 2016 at 03:05:52 UTC, rcorre wrote:
 I was in a situation where I wanted to remove duplicates from 
 an OnlyResult.
 To do this with uniq, I needed to sort it. OnlyResult doesn't 
 satisfy the template constraints of sort, but this seems easy 
 enough to fix. I made front, back, and opIndex return by ref. 
 With this, the following compiles:

     assert(only(3,1,2).sort.equal(only(1,2,3)));

 However, it fails with:

     core.exception.AssertError std/algorithm/sorting.d(1052): 
 Failed to sort range of type OnlyResult!(int, 3LU)

 So, if you have a range for which sort compiles, what does it 
 take to make sorting actually work?

 For reference, my two attempts were:

 https://github.com/rcorre/phobos/commit/d89b3cfab7a0938e178a506b4ceb8faae6ecbfe2

 https://github.com/rcorre/phobos/commit/512d9b8db6f311db6a9b6ccb077a691cec66ce70
I'm not sure why your fix didn't work, but generally I work around this by converting the OnlyResult into an array: import std.array : array; assert(only(3,1,2).array.sort.equal(only(1,2,3))); If you are looking for a lazy uniq that works on non sorted ranges, I implemented one not to long ago: http://github.com/BlackEdder/ggplotd/blob/master/source/ggplotd/range.d
Mar 09 2016
parent reply rcorre <ryan rcorre.net> writes:
On Wednesday, 9 March 2016 at 09:15:01 UTC, Edwin van Leeuwen 
wrote:
 I'm not sure why your fix didn't work, but generally I work 
 around this by converting the OnlyResult into an array:

 import std.array : array;
 assert(only(3,1,2).array.sort.equal(only(1,2,3)));
I'd like to avoid allocating here.
 If you are looking for a lazy uniq that works on non sorted 
 ranges, I implemented one not to long ago:
 http://github.com/BlackEdder/ggplotd/blob/master/source/ggplotd/range.d
That sounds like the kind of thing I was looking for. I'll take a look, thanks!
Mar 09 2016
next sibling parent reply Edwin van Leeuwen <edder tkwsping.nl> writes:
On Wednesday, 9 March 2016 at 12:21:55 UTC, rcorre wrote:
 If you are looking for a lazy uniq that works on non sorted 
 ranges, I implemented one not to long ago:
 http://github.com/BlackEdder/ggplotd/blob/master/source/ggplotd/range.d
That sounds like the kind of thing I was looking for. I'll take a look, thanks!
Well that one does allocate, because it keeps track of which values have already been seen.
Mar 09 2016
parent reply rcorre <ryan rcorre.net> writes:
On Wednesday, 9 March 2016 at 12:31:18 UTC, Edwin van Leeuwen 
wrote:
 On Wednesday, 9 March 2016 at 12:21:55 UTC, rcorre wrote:
 If you are looking for a lazy uniq that works on non sorted 
 ranges, I implemented one not to long ago:
 http://github.com/BlackEdder/ggplotd/blob/master/source/ggplotd/range.d
That sounds like the kind of thing I was looking for. I'll take a look, thanks!
Well that one does allocate, because it keeps track of which values have already been seen.
Yup, just noticed that >.<
Mar 09 2016
parent Edwin van Leeuwen <edder tkwsping.nl> writes:
On Wednesday, 9 March 2016 at 13:04:31 UTC, rcorre wrote:
 On Wednesday, 9 March 2016 at 12:31:18 UTC, Edwin van Leeuwen 
 wrote:
 On Wednesday, 9 March 2016 at 12:21:55 UTC, rcorre wrote:
 If you are looking for a lazy uniq that works on non sorted 
 ranges, I implemented one not to long ago:
 http://github.com/BlackEdder/ggplotd/blob/master/source/ggplotd/range.d
That sounds like the kind of thing I was looking for. I'll take a look, thanks!
Well that one does allocate, because it keeps track of which values have already been seen.
Yup, just noticed that >.<
Of course it only allocates when the actual result is used, so this will probably be more efficient if you only need a small number of unique results or need to keep the unsorted range around/intact. Sorting without allocating and then using uniq should indeed be more efficient in other cases. Did you try different SwapStrategy values in your original?
Mar 09 2016
prev sibling parent reply cym13 <cpicard openmailbox.org> writes:
On Wednesday, 9 March 2016 at 12:21:55 UTC, rcorre wrote:
 On Wednesday, 9 March 2016 at 09:15:01 UTC, Edwin van Leeuwen 
 wrote:
 I'm not sure why your fix didn't work, but generally I work 
 around this by converting the OnlyResult into an array:

 import std.array : array;
 assert(only(3,1,2).array.sort.equal(only(1,2,3)));
I'd like to avoid allocating here.
Note that an input range isn't even remotely a container, it's a way to iterate on a container. As you don't have all elements at hand you can't sort them, that's why you have to use array here. If you don't want to allocate using the GC just allocate your own memory and store your range elements in it before sorting but sort has to have access to all elements one way or another.
Mar 09 2016
next sibling parent reply rcorre <ryan rcorre.net> writes:
On Wednesday, 9 March 2016 at 14:28:11 UTC, cym13 wrote:
 On Wednesday, 9 March 2016 at 12:21:55 UTC, rcorre wrote:
 On Wednesday, 9 March 2016 at 09:15:01 UTC, Edwin van Leeuwen 
 wrote:
 I'm not sure why your fix didn't work, but generally I work 
 around this by converting the OnlyResult into an array:

 import std.array : array;
 assert(only(3,1,2).array.sort.equal(only(1,2,3)));
I'd like to avoid allocating here.
Note that an input range isn't even remotely a container, it's a way to iterate on a container. As you don't have all elements at hand you can't sort them, that's why you have to use array here.
In the general case, yes. However only is a range wrapper around a static array, and does have all elements at hand. Maybe I should just be using a static array... Still curious as to why it fails; maybe the range is getting copied at some point? I guess I need to step through it. I did try different SwapStrategies with no luck.
Mar 09 2016
next sibling parent Edwin van Leeuwen <edder tkwsping.nl> writes:
On Wednesday, 9 March 2016 at 15:39:55 UTC, rcorre wrote:
 On Wednesday, 9 March 2016 at 14:28:11 UTC, cym13 wrote:

 Still curious as to why it fails; maybe the range is getting 
 copied at some point? I guess I need to step through it.

 I did try different SwapStrategies with no luck.
Since you are adapting phobos anyway you could try commenting out the assert and see what the result of the sort is. That might give you some clue: //assert(isSorted!lessFun(r), "Failed to sort range of type " ~ Range.stringof); Also I notice the line numbering is different in my sorted.d file. Did you test the latest version of dmd/phobos?
Mar 09 2016
prev sibling parent reply Xinok <xinok live.com> writes:
On Wednesday, 9 March 2016 at 15:39:55 UTC, rcorre wrote:
 Still curious as to why it fails; maybe the range is getting 
 copied at some point? I guess I need to step through it.
That's my suspicion as well. It seems that OnlyResult is pass-by-value so every time it gets passed to another function, it creates a copy of all the elements. A simple solution is to provide a wrapper type which refers to the elements in the original container. However, I'm going to argue that the sort function is fine but the modifications you made to OnlyResult are incorrect. I tried running your example of only(...).sort but got a compilation error. Similarly, trying to sort a static array also gives a compilation error. However, if I slice the static array before passing it to sort (thus passing by reference), then it works just fine.
Mar 09 2016
parent reply rcorre <ryan rcorre.net> writes:
On Wednesday, 9 March 2016 at 16:53:08 UTC, Xinok wrote:
 On Wednesday, 9 March 2016 at 15:39:55 UTC, rcorre wrote:
 Still curious as to why it fails; maybe the range is getting 
 copied at some point? I guess I need to step through it.
That's my suspicion as well. It seems that OnlyResult is pass-by-value so every time it gets passed to another function, it creates a copy of all the elements. A simple solution is to provide a wrapper type which refers to the elements in the original container. However, I'm going to argue that the sort function is fine but the modifications you made to OnlyResult are incorrect. I tried running your example of only(...).sort but got a compilation error. Similarly, trying to sort a static array also gives a compilation error. However, if I slice the static array before passing it to sort (thus passing by reference), then it works just fine.
Got it. sort calls to quicksort (for unstable, at least) which uses swapAt. swapAt takes the range by value, so it just swaps the values in its local copy. The original OnlyResult is untouched. I guess a static array slice maintains a pointer to the underlying array (which is why returning one from a function errors with 'escaping reference to local variable'). Meanwhile, I've realized my code probably doensn't need to remove duplicates anyways, so its a moot point, but still an interesting discovery :)
Mar 09 2016
parent =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 03/09/2016 06:50 PM, rcorre wrote:

 sort calls to quicksort (for unstable, at least) which uses
 swapAt. swapAt takes the range by value, so it just swaps the values in
 its local copy.
Remembering that a range is not the collection, swapAt takes the range by value but it does not copy the elements. So, sort() does sort the original array: import std.algorithm; void main() { auto a = [ 9, -1, 2, 0 ]; a.sort(); assert(a == [ -1, 0, 2, 9 ]); }
 The original OnlyResult is untouched. I guess a static
 array slice maintains a pointer to the underlying array (which is why
 returning one from a function errors with 'escaping reference to local
 variable').
Yes: A static array is just a collection of elements, which implicitly converts to a slice and a slice is nothing but a pair of "pointer to the first element" and "number of elements". Ali
Mar 09 2016
prev sibling next sibling parent rcorre <ryan rcorre.net> writes:
On Wednesday, 9 March 2016 at 14:28:11 UTC, cym13 wrote:
 Note that an input range isn't even remotely a container, it's 
 a way to iterate on a container. As you don't have all elements 
 at hand you can't sort them, that's why you have to use array 
 here.
Oh, I think it just clicked. I was thinking 'sort takes a range, so it must be used for sorting ranges', but I should have thought 'sort takes a range so it can sort a container via a range over that container'.
Mar 09 2016
prev sibling parent Chris Wright <dhasenan gmail.com> writes:
On Wed, 09 Mar 2016 14:28:11 +0000, cym13 wrote:

 Note that an input range isn't even remotely a container
Which is why sort() has template constraints beyond isInputRange. The constraints ensure that it is possible to swap values in the range.
Mar 09 2016