www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Stable topN?

reply "Xinok" <xinok live.com> writes:
I've managed to build a small laundry list the past couple days, 
so why not add another item? I've been tasked with implementing a 
"deterministic topN", but I also noticed that topN is lacking a 
stable implementation, so this seemed like an ideal opportunity 
to kill two birds with one stone. However, for me, it begged the 
question:

What exactly does stable topN imply?

A stable sort only implies that equal elements retain their 
original order. However, topN is a partitioning function whose 
pivot is unknown beforehand. To me, it would make more sense that 
not just equal elements but ALL elements retain their original 
order in their respective partitions.

However, to accomplish the above, the Nth element would need to 
be discovered before any other elements are reordered. This 
suggests making a full copy of the range to search for the Nth 
element. The unstable topN could be applied to the copy to 
discover the Nth element, then the stable partition function 
could finish the job with the correct pivot.

What do you all think? Do you agree with my definition of stable 
topN? Do you know of a better algorithm/approach for implementing 
this function?
Jun 23 2014
next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Monday, 23 June 2014 at 22:47:20 UTC, Xinok wrote:
 What do you all think? Do you agree with my definition of 
 stable topN? Do you know of a better algorithm/approach for 
 implementing this function?

I agree with your definition, and can't think of a better algorithm, but wouldn't a stable sort have the same effect as your algorithm? Both are O(n lg(n)) time and O(n) space.
Jun 23 2014
prev sibling next sibling parent "Xinok" <xinok live.com> writes:
On Tuesday, 24 June 2014 at 04:40:48 UTC, Peter Alexander wrote:
 On Monday, 23 June 2014 at 22:47:20 UTC, Xinok wrote:
 What do you all think? Do you agree with my definition of 
 stable topN? Do you know of a better algorithm/approach for 
 implementing this function?

I agree with your definition, and can't think of a better algorithm, but wouldn't a stable sort have the same effect as your algorithm? Both are O(n lg(n)) time and O(n) space.

A stable sort would lose the original ordering of the elements though. Furthermore, the entire point of the topN function is to be faster than simply sorting the range.
Jun 24 2014
prev sibling next sibling parent "matovitch" <camille.brugel laposte.net> writes:
On Tuesday, 24 June 2014 at 11:50:37 UTC, Xinok wrote:
 On Tuesday, 24 June 2014 at 04:40:48 UTC, Peter Alexander wrote:
 On Monday, 23 June 2014 at 22:47:20 UTC, Xinok wrote:
 What do you all think? Do you agree with my definition of 
 stable topN? Do you know of a better algorithm/approach for 
 implementing this function?

I agree with your definition, and can't think of a better algorithm, but wouldn't a stable sort have the same effect as your algorithm? Both are O(n lg(n)) time and O(n) space.

A stable sort would lose the original ordering of the elements though. Furthermore, the entire point of the topN function is to be faster than simply sorting the range.

Yes you can't use stable sort. However you can use a priority queue, which would made the algorithm worst case O(M log N) in time and O(N) in space. (M is the number of elements in the original array and N the number of top elements)
Jun 24 2014
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Tuesday, 24 June 2014 at 11:50:37 UTC, Xinok wrote:
 On Tuesday, 24 June 2014 at 04:40:48 UTC, Peter Alexander wrote:
 On Monday, 23 June 2014 at 22:47:20 UTC, Xinok wrote:
 What do you all think? Do you agree with my definition of 
 stable topN? Do you know of a better algorithm/approach for 
 implementing this function?

I agree with your definition, and can't think of a better algorithm, but wouldn't a stable sort have the same effect as your algorithm? Both are O(n lg(n)) time and O(n) space.

A stable sort would lose the original ordering of the elements though. Furthermore, the entire point of the topN function is to be faster than simply sorting the range.

Yeah ignore me. Just woke up realising I'd made that mistake, but you've already replied :-) However, is your version faster? The last stable partition step is O(n lg n), same as stable sort.
Jun 24 2014
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/23/14, 3:47 PM, Xinok wrote:
 I've managed to build a small laundry list the past couple days, so why
 not add another item? I've been tasked with implementing a
 "deterministic topN", but I also noticed that topN is lacking a stable
 implementation, so this seemed like an ideal opportunity to kill two
 birds with one stone. However, for me, it begged the question:

 What exactly does stable topN imply?

 A stable sort only implies that equal elements retain their original
 order. However, topN is a partitioning function whose pivot is unknown
 beforehand. To me, it would make more sense that not just equal elements
 but ALL elements retain their original order in their respective
 partitions.

 However, to accomplish the above, the Nth element would need to be
 discovered before any other elements are reordered. This suggests making
 a full copy of the range to search for the Nth element. The unstable
 topN could be applied to the copy to discover the Nth element, then the
 stable partition function could finish the job with the correct pivot.

 What do you all think? Do you agree with my definition of stable topN?
 Do you know of a better algorithm/approach for implementing this function?

There's a notion of "semiStable" in std.algorithm that you may use for further refining topN guarantees. I'd say use semiStable for stability among the top N items (seeing as those are the most interesting), and stable for stability of all items. Andrei
Jun 24 2014
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/24/14, 11:50 AM, Xinok wrote:
  From what I know of partitioning algorithms, they're either
 intrinsically stable or intrinsically unstable. Implementing an
 algorithm with the attribute you described would likely come with little
 benefit. I could be wrong, but this is my intuition.

I'd say start spinning some code and code will lead you to good insights regarding the relative costs of different stability guarantees. -- Andrei
Jun 24 2014
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/24/14, 11:50 AM, Xinok wrote:
[snip]

See also https://issues.dlang.org/show_bug.cgi?id=12987. Thanks! -- Andrei
Jun 24 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/24/14, 4:41 PM, Xinok wrote:
 On Tuesday, 24 June 2014 at 22:09:39 UTC, Andrei Alexandrescu wrote:
 On 6/24/14, 11:50 AM, Xinok wrote:
 [snip]

 See also https://issues.dlang.org/show_bug.cgi?id=12987. Thanks! --
 Andrei

Given that's a breaking change, I'd rather leave it open to discussion for a while. It would be really bad to silently change its behavior without so much as a deprecation warning. For now, I'll work on implementing deterministic & stable topN.

I think the breakage brought about from changing return types from void to something else is acceptable. -- Andrei
Jun 24 2014
prev sibling next sibling parent "Xinok" <xinok live.com> writes:
On Tuesday, 24 June 2014 at 16:03:26 UTC, Andrei Alexandrescu 
wrote:
 On 6/23/14, 3:47 PM, Xinok wrote:
 I've managed to build a small laundry list the past couple 
 days, so why
 not add another item? I've been tasked with implementing a
 "deterministic topN", but I also noticed that topN is lacking 
 a stable
 implementation, so this seemed like an ideal opportunity to 
 kill two
 birds with one stone. However, for me, it begged the question:

 What exactly does stable topN imply?

 A stable sort only implies that equal elements retain their 
 original
 order. However, topN is a partitioning function whose pivot is 
 unknown
 beforehand. To me, it would make more sense that not just 
 equal elements
 but ALL elements retain their original order in their 
 respective
 partitions.

 However, to accomplish the above, the Nth element would need 
 to be
 discovered before any other elements are reordered. This 
 suggests making
 a full copy of the range to search for the Nth element. The 
 unstable
 topN could be applied to the copy to discover the Nth element, 
 then the
 stable partition function could finish the job with the 
 correct pivot.

 What do you all think? Do you agree with my definition of 
 stable topN?
 Do you know of a better algorithm/approach for implementing 
 this function?

There's a notion of "semiStable" in std.algorithm that you may use for further refining topN guarantees. I'd say use semiStable for stability among the top N items (seeing as those are the most interesting), and stable for stability of all items. Andrei

From what I know of partitioning algorithms, they're either intrinsically stable or intrinsically unstable. Implementing an algorithm with the attribute you described would likely come with little benefit. I could be wrong, but this is my intuition.
Jun 24 2014
prev sibling next sibling parent "Xinok" <xinok live.com> writes:
On Tuesday, 24 June 2014 at 22:09:39 UTC, Andrei Alexandrescu 
wrote:
 On 6/24/14, 11:50 AM, Xinok wrote:
 [snip]

 See also https://issues.dlang.org/show_bug.cgi?id=12987. 
 Thanks! -- Andrei

Given that's a breaking change, I'd rather leave it open to discussion for a while. It would be really bad to silently change its behavior without so much as a deprecation warning. For now, I'll work on implementing deterministic & stable topN.
Jun 24 2014
prev sibling next sibling parent "David Nadlinger" <code klickverbot.at> writes:
On Wednesday, 25 June 2014 at 00:50:48 UTC, Andrei Alexandrescu 
wrote:
 On 6/24/14, 4:41 PM, Xinok wrote:
 On Tuesday, 24 June 2014 at 22:09:39 UTC, Andrei Alexandrescu 
 wrote:
 On 6/24/14, 11:50 AM, Xinok wrote:
 [snip]

 See also https://issues.dlang.org/show_bug.cgi?id=12987. 
 Thanks! --
 Andrei

Given that's a breaking change, I'd rather leave it open to discussion for a while. It would be really bad to silently change its behavior without so much as a deprecation warning. For now, I'll work on implementing deterministic & stable topN.

I think the breakage brought about from changing return types from void to something else is acceptable. -- Andrei

Agreed. Let's add that. David
Jun 24 2014
prev sibling parent "Xinok" <xinok live.com> writes:
On Wednesday, 25 June 2014 at 00:50:48 UTC, Andrei Alexandrescu 
wrote:
 On 6/24/14, 4:41 PM, Xinok wrote:
 On Tuesday, 24 June 2014 at 22:09:39 UTC, Andrei Alexandrescu 
 wrote:
 On 6/24/14, 11:50 AM, Xinok wrote:
 [snip]

 See also https://issues.dlang.org/show_bug.cgi?id=12987. 
 Thanks! --
 Andrei

Given that's a breaking change, I'd rather leave it open to discussion for a while. It would be really bad to silently change its behavior without so much as a deprecation warning. For now, I'll work on implementing deterministic & stable topN.

I think the breakage brought about from changing return types from void to something else is acceptable. -- Andrei

Ahhh, I missed that minor detail. :-P I had mistakenly assumed it returns the full range.
Jun 24 2014