www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - std.partition is fucked

reply superdan <super dan.org> writes:
dis must be related to bug 2966 sayin' std.sort is fucked. problem must be with
std.partition. i tested and unstable partition is 10 times slower than stable.
should be faster akshully. so looked into tat and found in da loop for
std.partition unstable and found da range r1 is fucked.

        for (;;)
        {
            for (;;)
            {
                if (r.empty) return r;
                if (!pred(r.front)) break;
                r.popFront;
            }
            // found the left bound
            assert(!r.empty);
            auto r1 = r;
            for (;;)
            {
                if (pred(r1.back)) break;
                r1.popBack;
                if (r1.empty) return r;
            }
            // found the right bound, swap & make progress
            swap(r.front, r1.back);
            r.popFront;
            r1.popBack;
        }

r1 is popbacked a few times but then all tat is forgotten the next time around
da loop coz r1 is local to da loop. so da loop is quadratic methinks. what u
need iz save r outside loop & then popfront & popback from da same range 'n'
shit.
May 13 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
superdan wrote:
 dis must be related to bug 2966 sayin' std.sort is fucked. problem
 must be with std.partition. i tested and unstable partition is 10
 times slower than stable. should be faster akshully. so looked into
 tat and found in da loop for std.partition unstable and found da
 range r1 is fucked.
That is correct. Where were you yesterday when I was looking for this? :o) The fixed loop is: // Inspired from www.stepanovpapers.com/PAM3-partition_notes.pdf, // section "Bidirectional Partition Algorithm (Hoare)" auto result = r; for (;;) { for (;;) { if (r.empty) return result; if (!pred(r.front)) break; r.popFront; result.popFront; } // found the left bound assert(!r.empty); for (;;) { if (pred(r.back)) break; r.popBack; if (r.empty) return result; } // found the right bound, swap & make progress swap(r.front, r.back); r.popFront; result.popFront; r.popBack; } Note how the left edge of result follows the left edge of r, but the right edge stays put because partition() returns the right-hand-side range. r shrinks from both ends to exhaustion. The performance of std.sort is back to normal - still slower than the built-in sort (but not by orders of magnitude), probably because bumping ranges is at a disadvantage. Thanks David and superdan. Andrei
May 13 2009
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 
 Note how the left edge of result follows the left edge of r, but the 
 right edge stays put because partition() returns the right-hand-side 
 range. r shrinks from both ends to exhaustion.
So all the elements that satisfy the predicate end up at the end of the original range instead of the beginning? Was that an arbitrary choice, or is there a reason for it?
May 13 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Note how the left edge of result follows the left edge of r, but the 
 right edge stays put because partition() returns the right-hand-side 
 range. r shrinks from both ends to exhaustion.
So all the elements that satisfy the predicate end up at the end of the original range instead of the beginning? Was that an arbitrary choice, or is there a reason for it?
The elements satisfying the predicate are at the beginning, see e.g. the unittests. The range returned is (as always) the right-hand side, i.e. the range containing the elements that don't satisfy the predicate. Andrei
May 13 2009
parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Note how the left edge of result follows the left edge of r, but the
 right edge stays put because partition() returns the right-hand-side
 range. r shrinks from both ends to exhaustion.
So all the elements that satisfy the predicate end up at the end of the original range instead of the beginning? Was that an arbitrary choice, or is there a reason for it?
The elements satisfying the predicate are at the beginning, see e.g. the unittests. The range returned is (as always) the right-hand side, i.e. the range containing the elements that don't satisfy the predicate.
Weird. I'd always thought that the standard behavior was the opposite. That way partition could be passed a lessThan predicate and be used for sorting. Though I guess you can just use a greaterThan predicate instead.
May 13 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Note how the left edge of result follows the left edge of r, but the
 right edge stays put because partition() returns the right-hand-side
 range. r shrinks from both ends to exhaustion.
So all the elements that satisfy the predicate end up at the end of the original range instead of the beginning? Was that an arbitrary choice, or is there a reason for it?
The elements satisfying the predicate are at the beginning, see e.g. the unittests. The range returned is (as always) the right-hand side, i.e. the range containing the elements that don't satisfy the predicate.
Weird. I'd always thought that the standard behavior was the opposite. That way partition could be passed a lessThan predicate and be used for sorting. Though I guess you can just use a greaterThan predicate instead.
Nono, lessThan is a binary predicate whereas partition takes a unary predicate. The spec of partition is very simple: do whatever the hell it takes to put elements e that satisfy pred(e) before elements that don't. If you follow the way std.sort uses partition, you'll see that it passes a unary predicate that compares for less than against the chosen pivot. Andrei
May 13 2009
parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Sean Kelly wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Note how the left edge of result follows the left edge of r, but the
 right edge stays put because partition() returns the right-hand-side
 range. r shrinks from both ends to exhaustion.
So all the elements that satisfy the predicate end up at the end of the original range instead of the beginning? Was that an arbitrary choice, or is there a reason for it?
The elements satisfying the predicate are at the beginning, see e.g. the unittests. The range returned is (as always) the right-hand side, i.e. the range containing the elements that don't satisfy the predicate.
Weird. I'd always thought that the standard behavior was the opposite. That way partition could be passed a lessThan predicate and be used for sorting. Though I guess you can just use a greaterThan predicate instead.
Nono, lessThan is a binary predicate whereas partition takes a unary predicate. The spec of partition is very simple: do whatever the hell it takes to put elements e that satisfy pred(e) before elements that don't. If you follow the way std.sort uses partition, you'll see that it passes a unary predicate that compares for less than against the chosen pivot.
Okay, I think we're talking at cross-purposes :-) It seems we both agree that partition should place the elements satisfying pred before those that do not. And I should have been more clear about pred being a unary function. In any case, what I'm wondering is why partition returns the range representing the elements that do not satisfy pred. When I call partition, wouldn't I be interested in the result containing the elements that have satisfied the supplied predicate? Or does this cause problems for the sort routine.
May 13 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Sean Kelly wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Note how the left edge of result follows the left edge of r, but the
 right edge stays put because partition() returns the right-hand-side
 range. r shrinks from both ends to exhaustion.
So all the elements that satisfy the predicate end up at the end of the original range instead of the beginning? Was that an arbitrary choice, or is there a reason for it?
The elements satisfying the predicate are at the beginning, see e.g. the unittests. The range returned is (as always) the right-hand side, i.e. the range containing the elements that don't satisfy the predicate.
Weird. I'd always thought that the standard behavior was the opposite. That way partition could be passed a lessThan predicate and be used for sorting. Though I guess you can just use a greaterThan predicate instead.
Nono, lessThan is a binary predicate whereas partition takes a unary predicate. The spec of partition is very simple: do whatever the hell it takes to put elements e that satisfy pred(e) before elements that don't. If you follow the way std.sort uses partition, you'll see that it passes a unary predicate that compares for less than against the chosen pivot.
Okay, I think we're talking at cross-purposes :-) It seems we both agree that partition should place the elements satisfying pred before those that do not. And I should have been more clear about pred being a unary function. In any case, what I'm wondering is why partition returns the range representing the elements that do not satisfy pred. When I call partition, wouldn't I be interested in the result containing the elements that have satisfied the supplied predicate? Or does this cause problems for the sort routine.
Oh, now I understand. Sorry. As a general rule, when they could return either the left or the right subrange of a range, functions in std.algorithm return the right range. This is because sentinel-terminated ranges cannot express the left-hand-side range. Partition is in fact the perfect example because it works with forward ranges. If you want to partition a singly-linked list, you'd have to return the right-hand sublist. There's nothing else you could possibly return! If you wanted to return the left-hand sublist, you'd have to return a different type of range (something like "list up to this node"). So for generality's sake, whenever you have a choice, return the right-hand part of a range. There is growing interest in defining additional ranges that express (at a cost) things like "the portion of a singly-linked list starting at node X and ending at node Y". For example, people want to do a find() and then deal with the portion _before_ the found element. I'd love to discuss that further, but I guess I'll have to wait for the d.next newsgroup. :oD Andrei
May 13 2009
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 As a general rule, when they could return either the left or the right
 subrange of a range, functions in std.algorithm return the right range.
 This is because sentinel-terminated ranges cannot express the
 left-hand-side range.
Gotcha. That's unfortunate, but makes perfect sense.
 Partition is in fact the perfect example because it works with forward
 ranges. If you want to partition a singly-linked list, you'd have to
 return the right-hand sublist. There's nothing else you could possibly
 return! If you wanted to return the left-hand sublist, you'd have to
 return a different type of range (something like "list up to this node").
 So for generality's sake, whenever you have a choice, return the
 right-hand part of a range.
It looks like there may be another bug in partition then. The static else clause (ss == SwapStrategy.unstable) is supposed to work for forward ranges but it calls popBack(). It looks like only the semistable option is actually available to forward ranges. Is this correct?
 There is growing interest in defining additional ranges that express (at
 a cost) things like "the portion of a singly-linked list starting at
 node X and ending at node Y". For example, people want to do a find()
 and then deal with the portion _before_ the found element. I'd love to
 discuss that further, but I guess I'll have to wait for the d.next
 newsgroup. :oD
Yeah, it would be great if it were possible to slice and dice a range in any manner of different ways.
May 13 2009
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 13 May 2009 17:17:33 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Partition is in fact the perfect example because it works with forward
 ranges. If you want to partition a singly-linked list, you'd have to
 return the right-hand sublist. There's nothing else you could possibly
 return! If you wanted to return the left-hand sublist, you'd have to
 return a different type of range (something like "list up to this node").
It depends on if the sentinel is static. For example, it would be perfectly legal to specify a linked list as the nodes between node x and y, where y is null at runtime in the case of a full list. I'm not even sure the performance would suffer significantly. dcollections' linked list is a doubly linked list, but I use a sentinel dummy element to denote both the beginning and the end. It makes insertion/deletion code completely uniform because there is *always* a valid previous and next node for each node. No checking for "if this is the head node, update the original list pointer" Not being able to return a subrange of a container as fundamental as a linked list is a pretty significant oversight in my opinion. -Steve
May 13 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Wed, 13 May 2009 17:17:33 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Partition is in fact the perfect example because it works with forward
 ranges. If you want to partition a singly-linked list, you'd have to
 return the right-hand sublist. There's nothing else you could possibly
 return! If you wanted to return the left-hand sublist, you'd have to
 return a different type of range (something like "list up to this node").
It depends on if the sentinel is static. For example, it would be perfectly legal to specify a linked list as the nodes between node x and y, where y is null at runtime in the case of a full list. I'm not even sure the performance would suffer significantly. dcollections' linked list is a doubly linked list, but I use a sentinel dummy element to denote both the beginning and the end. It makes insertion/deletion code completely uniform because there is *always* a valid previous and next node for each node. No checking for "if this is the head node, update the original list pointer" Not being able to return a subrange of a container as fundamental as a linked list is a pretty significant oversight in my opinion.
Well it's not quite an oversight because I was well aware of the issue. The only thing is, I started with the narrowest interface I could to figure how far I can get. Primitives can be added to select range categories as needed. Andrei
May 13 2009
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 
 The performance of std.sort is back to normal - still slower than the 
 built-in sort (but not by orders of magnitude), probably because bumping 
 ranges is at a disadvantage.
The built-in sort uses an internal stack rather than recursion, which makes its performance on a best-case dataset hard to beat. I finally satisfied myself by having a constant time overhead vs. the built-in sort for best-case and beat it by polynomial time for worst-case (the built-in sort doesn't adapt to worst-case datasets very well).
May 13 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 The performance of std.sort is back to normal - still slower than the 
 built-in sort (but not by orders of magnitude), probably because 
 bumping ranges is at a disadvantage.
The built-in sort uses an internal stack rather than recursion, which makes its performance on a best-case dataset hard to beat. I finally satisfied myself by having a constant time overhead vs. the built-in sort for best-case and beat it by polynomial time for worst-case (the built-in sort doesn't adapt to worst-case datasets very well).
Yeah, I saw that fixed-size stack. I'm not sure whether that's responsible for much of its performance, one of these days I need to measure. My speculation is that the depth of the stack is small enough to not really contribute much to the running time. Andrei
May 13 2009
parent reply Lutger <lutger.blijdestijn gmail.com> writes:
Andrei Alexandrescu wrote:

 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 The performance of std.sort is back to normal - still slower than the 
 built-in sort (but not by orders of magnitude), probably because 
 bumping ranges is at a disadvantage.
The built-in sort uses an internal stack rather than recursion, which makes its performance on a best-case dataset hard to beat. I finally satisfied myself by having a constant time overhead vs. the built-in sort for best-case and beat it by polynomial time for worst-case (the built-in sort doesn't adapt to worst-case datasets very well).
Yeah, I saw that fixed-size stack. I'm not sure whether that's responsible for much of its performance, one of these days I need to measure. My speculation is that the depth of the stack is small enough to not really contribute much to the running time. Andrei
Some time ago I reinvented these wheels for study purpose. A custom stack was a little faster but not that much. std.swap can't be inlined because it uses ref params, that cost also a bit since it's called many times. Switching to another sort if a certain recursion depth is reached helped, but mostly in degenerate cases. I still have the thing somewhere, it is about twice as fast as builtin sort. It's not a lot of work but I could dig it up and provide some timings if you want.
May 13 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Lutger wrote:
 Andrei Alexandrescu wrote:
 
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 The performance of std.sort is back to normal - still slower than the 
 built-in sort (but not by orders of magnitude), probably because 
 bumping ranges is at a disadvantage.
The built-in sort uses an internal stack rather than recursion, which makes its performance on a best-case dataset hard to beat. I finally satisfied myself by having a constant time overhead vs. the built-in sort for best-case and beat it by polynomial time for worst-case (the built-in sort doesn't adapt to worst-case datasets very well).
Yeah, I saw that fixed-size stack. I'm not sure whether that's responsible for much of its performance, one of these days I need to measure. My speculation is that the depth of the stack is small enough to not really contribute much to the running time. Andrei
Some time ago I reinvented these wheels for study purpose. A custom stack was a little faster but not that much. std.swap can't be inlined because it uses ref params, that cost also a bit since it's called many times. Switching to another sort if a certain recursion depth is reached helped, but mostly in degenerate cases. I still have the thing somewhere, it is about twice as fast as builtin sort. It's not a lot of work but I could dig it up and provide some timings if you want.
Would be interesting if it's not too much trouble. If swap is not inlined that would be serious. Sort does a lot of swapping. Andrei
May 13 2009
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 If swap is not inlined that would be serious. Sort does a lot of swapping.
It only works for arrays, but what I ended up doing to get swap inlined was to pass it two array indices instead of two references. There must be some way to address this for ranges if ref parameters prevent inlining with DMD.
May 13 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 If swap is not inlined that would be serious. Sort does a lot of swapping.
It only works for arrays, but what I ended up doing to get swap inlined was to pass it two array indices instead of two references. There must be some way to address this for ranges if ref parameters prevent inlining with DMD.
Well the least we can do is to say static if (is(Range R == T[], T)) { ... use array-specific swap ... } else { ... use regular swap ... } Andrei
May 13 2009
prev sibling parent Lutger <lutger.blijdestijn gmail.com> writes:
fwiw, I found the code and attached it with a benchmark found in the bugreport
(2966). It works only on arrays, might be buggy too. But it does sort ;)

Timings based on 1000_000 elements:

std.algorithm:  624 // modified with the fix posted in this thread
Builtin sort:  540
iterative introsort:  244
May 13 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 13 May 2009 22:24:51 +0400, Lutger <lutger.blijdestijn gmail.com> wrote:

 std.swap can't be inlined because it uses ref params
Can you elaborate on that one? I'm not so sure that it can't be inlined.
May 13 2009
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Lutger (lutger.blijdestijn gmail.com)'s article
 Some time ago I reinvented these wheels for study purpose. A custom stack was
a little faster but not
that much. std.swap can't be inlined because it uses ref params, that cost also a bit since it's called
 many times. Switching to another sort if a certain recursion depth is reached
helped, but mostly in
degenerate cases.
 I still have the thing somewhere, it is about twice as fast as builtin sort.
It's not a lot of work but I could
dig it up and provide some timings if you want. The sort I wrote for Tango uses the same basic heuristics, thanks to a ticket that either you or Stewart Gordon submitted long ago. I've been meaning to compare it against the sort routine in std.algorithm to see whether the Phobos routine could benefit from any of the same tweaks.
May 13 2009
next sibling parent reply Matti Niemenmaa <see_signature for.real.address> writes:
Sean Kelly wrote:
 The sort I wrote for Tango uses the same basic heuristics, thanks to
 a ticket that either you or Stewart Gordon submitted long ago.
*Ahem*, I believe that http://www.dsource.org/projects/tango/ticket/571 was one of mine ;-) -- E-mail address: matti.niemenmaa+news, domain is iki (DOT) fi
May 13 2009
parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Matti Niemenmaa (see_signature for.real.address)'s article
 Sean Kelly wrote:
 The sort I wrote for Tango uses the same basic heuristics, thanks to
 a ticket that either you or Stewart Gordon submitted long ago.
*Ahem*, I believe that http://www.dsource.org/projects/tango/ticket/571 was one of mine ;-)
Darnit, I knew if I didn't look it up I'd get it wrong. Sorry about that :-)
May 13 2009
prev sibling parent Lutger <lutger.blijdestijn gmail.com> writes:
Sean Kelly wrote:

 == Quote from Lutger (lutger.blijdestijn gmail.com)'s article
 Some time ago I reinvented these wheels for study purpose. A custom stack was
a little faster but not
that much. std.swap can't be inlined because it uses ref params, that cost also a bit since it's called
 many times. Switching to another sort if a certain recursion depth is reached
helped, but mostly in
degenerate cases.
 I still have the thing somewhere, it is about twice as fast as builtin sort.
It's not a lot of work but I could
dig it up and provide some timings if you want. The sort I wrote for Tango uses the same basic heuristics, thanks to a ticket that either you or Stewart Gordon submitted long ago. I've been meaning to compare it against the sort routine in std.algorithm to see whether the Phobos routine could benefit from any of the same tweaks.
It wasn't me, I used the Tango code as one the sources to study this algorithm :)
May 13 2009
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Sean Kelly:

The built-in sort uses an internal stack rather than recursion, which makes its
performance on a best-case dataset hard to beat.<
The built-in sort is snail slow, and its stack overflows if your array isn't small: void main() { auto a = new uint[0x8F_FFFF]; a.sort; } So the current built-in sort is bad, and it must be fixed or removed. Bye, bearophile (Sorry for the answering delay, I was busy for about a week. Tons of posts to catch up!)
May 15 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Sean Kelly:
 
 The built-in sort uses an internal stack rather than recursion, which makes
its performance on a best-case dataset hard to beat.<
The built-in sort is snail slow, and its stack overflows if your array isn't small: void main() { auto a = new uint[0x8F_FFFF]; a.sort; } So the current built-in sort is bad, and it must be fixed or removed.
That pretty much settles it. Seeing sort's fixed stack, I had this feeling for a while, but only now I saw the proof :o). Andrei
May 15 2009