## digitalmars.D - Idempotent partition around median of 5?

- Andrei Alexandrescu (17/17) Feb 03 2016 This appears a simple problem: given numbers a, b, c, d, e, swap them
- Era Scarecrow (7/16) Feb 03 2016 Well if we look at it, the max compares for optimal sorting
- Xinok (16/34) Feb 03 2016 The order of a,b and d,e don't matter because we're partitioning,
- Andrei Alexandrescu (2/4) Feb 03 2016 Thanks very much! -- Andrei
- Era Scarecrow (11/17) Feb 03 2016 Hmmm if i wrote it (this way) it would probably brute force the
- Timon Gehr (4/10) Feb 04 2016 Three swaps are always enough. Using the first swap, put the median
- Timon Gehr (244/261) Feb 04 2016 At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):
- Era Scarecrow (4/21) Feb 04 2016 That's about what i expected for the actual function (like this)
- Xinok (6/11) Feb 04 2016 Great, we can all go home!
- tn (45/48) Feb 05 2016 Inspired by this, I made four other versions of the function that
- Xinok (5/17) Feb 05 2016 Very nice! I'm curious, to both you and Timon, how did you go
- Timon Gehr (16/33) Feb 05 2016 I quickly hacked together a brute-force search.
- Andrei Alexandrescu (2/5) Feb 05 2016 I was thinking the same exact things. -- Andrei
- tn (32/52) Feb 06 2016 I basically just started from Timons solution and made is smaller
- Andrei Alexandrescu (12/34) Feb 06 2016 Whoa, I missed this last night. Awesome work! Thanks! I'm kinda running
- tn (103/108) Feb 06 2016 This now prints the numbers of comparison and swap lines and
- Andrei Alexandrescu (4/17) Feb 06 2016 Awesome, thanks. Will need to try at least a few of these out. At a
- Fool (13/15) Feb 05 2016 One swap usually decomposes into three moves. A cycle of length n
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (4/6) Feb 05 2016 Yes, well, this all breaks down when you realize that this is all
- Era Scarecrow (11/12) Feb 05 2016 Recently from reading some interesting hacks and super code, a
- Xinok (9/21) Feb 05 2016 It's a cool trick but I would be surprised if this were actually
- Andrei Alexandrescu (2/3) Feb 05 2016 That changes every few years. Last time I measured it was slower. -- And...
- Timon Gehr (3/17) Feb 05 2016 The goal is to partition though, not to sort.
- Fool (2/4) Feb 05 2016 Indeed. I was wondering about that. Great job!
- Andrei Alexandrescu (8/9) Feb 05 2016 Timon, Ivan, this is amazing work. I don't know how your minds work
- Andrei Alexandrescu (3/4) Feb 05 2016 What is the minimum number of comparisons? Thx! -- Andrei
- Timon Gehr (4/8) Feb 05 2016 There is no partition algorithm that uses <= 5 comparisons in all cases.
- Andrei Alexandrescu (5/12) Feb 06 2016 Was asking about this particular one. For example, the maximum
- Timon Gehr (23/36) Feb 06 2016 All versions posted in this thread do 6 comparisons on all code paths.
- Andrei Alexandrescu (10/13) Feb 06 2016 Thanks. Tried this just now, it's better than the pre-discussion
- Andrei Alexandrescu (5/6) Feb 05 2016 Oh, also: could you let that bad boy run and let it find anything that
- tn (7/15) Feb 06 2016 That is kind of what I tried to do by hand. My function
- Ivan Kazmenko (58/62) Feb 05 2016 Here's my take at it.
- Andrei Alexandrescu (2/4) Feb 05 2016 Under what circumstances isn't your function unstable? -- Andrei
- Andrei Alexandrescu (3/7) Feb 05 2016 I meant "is your function unstable". Double negation, but if you speak
- Ivan Kazmenko (26/30) Feb 05 2016 For example, if elements are "value | id", compared by value,
- Ivan Kazmenko (5/14) Feb 05 2016 The code I used to check stability:
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (19/27) Feb 06 2016 That's right, but other factors are more important: preventing
- Andrei Alexandrescu (6/34) Feb 06 2016 Yah. I think stability should be solved if there's a need/application

This appears a simple problem: given numbers a, b, c, d, e, swap them around so as to place the median in c and partition the others around it. I.e. the postcondition is: a <= c && b <= c && c <= d && c <= e. Searching the Net for this isn't easy. Fortunately "our own" Xinok has the best of breed result, see http://stackoverflow.com/questions/11065963/possible-to-partition-five-elements-by-median-with-six-comparisons. His algorithm is a little suboptimal in that when given a sorted array, it sometimes leaves it unsorted. So I changed it to http://dpaste.dzfl.pl/5fb2238d9891 to fix that. If I count right, it also saves one swap: does 0-9 swaps instead of 0-10. Ideally however, such an algorithm would do 0 swaps if the median is already in c. My algorithm may still swap d and e without necessity. So there's got to be a better solution. Your challenge - should you choose to accept it :o) - is an algorithm that does the partitioning in 6 comparisons and <= 9 swaps, which is idempotent: when applied twice, it always does 0 swaps. Andrei

Feb 03 2016

On Thursday, 4 February 2016 at 01:24:15 UTC, Andrei Alexandrescu wrote:This appears a simple problem: given numbers a, b, c, d, e, swap them around so as to place the median in c and partition the others around it. I.e. the postcondition is: a <= c && b <= c && c <= d && c <= e. <snip> So there's got to be a better solution. Your challenge - should you choose to accept it :o) - is an algorithm that does the partitioning in 6 comparisons and <= 9 swaps, which is idempotent: when applied twice, it always does 0 swaps.Well if we look at it, the max compares for optimal sorting algorithm is the log2 of the max combinations in how the elements could be arranged; !5 = 120 or 7 compares. I'm not sure of the max swaps, seems like it could be 4 if done right, but that might be impractical.

Feb 03 2016

On Thursday, 4 February 2016 at 01:33:54 UTC, Era Scarecrow wrote:On Thursday, 4 February 2016 at 01:24:15 UTC, Andrei Alexandrescu wrote:The order of a,b and d,e don't matter because we're partitioning, not sorting. So this problem can be solved in six comparisons.This appears a simple problem: given numbers a, b, c, d, e, swap them around so as to place the median in c and partition the others around it. I.e. the postcondition is: a <= c && b <= c && c <= d && c <= e. <snip> So there's got to be a better solution. Your challenge - should you choose to accept it :o) - is an algorithm that does the partitioning in 6 comparisons and <= 9 swaps, which is idempotent: when applied twice, it always does 0 swaps.Well if we look at it, the max compares for optimal sorting algorithm is the log2 of the max combinations in how the elements could be arranged; !5 = 120 or 7 compares.I'm not sure of the max swaps, seems like it could be 4 if done right, but that might be impractical.I believe that's correct. Each swap can move at least one element into it's final position. After three swaps, there may be two elements which are still out of place, and swapping them will move them both into their final position. Thus, four swaps max. :-) A solution to this problem does exist: Build a tree of if-else branches for all the different possibilities (2^6=64) and hard-code the optimal sequence of swaps. However, this function would be huge so this isn't very practical. Ideally, we want to minimize the size of the function as much as possible, adding in a few extra swaps if necessary. I'll take a crack at it this weekend. It sounds like an interesting problem.

Feb 03 2016

On 02/03/2016 09:46 PM, Xinok wrote:I'll take a crack at it this weekend. It sounds like an interesting problem.Thanks very much! -- Andrei

Feb 03 2016

On Thursday, 4 February 2016 at 02:46:42 UTC, Xinok wrote:A solution to this problem does exist: Build a tree of if-else branches for all the different possibilities (2^6=64) and hard-code the optimal sequence of swaps. However, this function would be huge so this isn't very practical.Yeah that's kinda what i was coming to a conclusion to.Ideally, we want to minimize the size of the function as much as possible, adding in a few extra swaps if necessary.Hmmm if i wrote it (this way) it would probably brute force the proper way to build the function rather than doing it by hand myself. So a helper function to make the optimal if/else statements to deal with it. BUT this only makes sense if it's always going to be 5 elements, if it's fewer or more, then it's sorta pointless and a more generic algorithm would be preferred. Although writing a brute force program creation helper function could still be of some use.

Feb 03 2016

On 02/04/2016 03:46 AM, Xinok wrote:Three swaps are always enough. Using the first swap, put the median where it belongs, and each subsequent swap can be chosen such that it moves two elements to their final position.I'm not sure of the max swaps, seems like it could be 4 if done right, but that might be impractical.I believe that's correct. Each swap can move at least one element into it's final position. After three swaps, there may be two elements which are still out of place, and swapping them will move them both into their final position. Thus, four swaps max. :-)

Feb 04 2016

On 02/04/2016 02:24 AM, Andrei Alexandrescu wrote:This appears a simple problem: given numbers a, b, c, d, e, swap them around so as to place the median in c and partition the others around it. I.e. the postcondition is: a <= c && b <= c && c <= d && c <= e. Searching the Net for this isn't easy. Fortunately "our own" Xinok has the best of breed result, see http://stackoverflow.com/questions/11065963/possible-to-partition-five-elements-by-median-with-six-comparisons. His algorithm is a little suboptimal in that when given a sorted array, it sometimes leaves it unsorted. So I changed it to http://dpaste.dzfl.pl/5fb2238d9891 to fix that. If I count right, it also saves one swap: does 0-9 swaps instead of 0-10. Ideally however, such an algorithm would do 0 swaps if the median is already in c. My algorithm may still swap d and e without necessity. So there's got to be a better solution. Your challenge - should you choose to accept it :o) - is an algorithm that does the partitioning in 6 comparisons and <= 9 swaps, which is idempotent: when applied twice, it always does 0 swaps. AndreiAt most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps): void partition5(ref int[5] a){ if(a[0]<a[1]){ if(a[2]<a[3]){ if(a[0]<a[2]){ if(a[1]<a[4]){ if(a[1]<a[2]){ if(!(a[2]<a[4])){ swap(a[4],a[2]); } }else if(a[1]<a[3]){ swap(a[1],a[2]); }else{ swap(a[3],a[2]); swap(a[1],a[3]); } }else if(a[2]<a[4]){ if(a[3]<a[4]){ swap(a[3],a[2]); swap(a[1],a[3]); }else{ swap(a[4],a[2]); swap(a[1],a[4]); } }else if(a[1]<a[2]){ swap(a[1],a[2]); swap(a[1],a[4]); }else{ swap(a[1],a[4]); } }else if(a[3]<a[4]){ if(a[0]<a[3]){ if(a[1]<a[3]){ swap(a[1],a[2]); }else{ swap(a[3],a[2]); swap(a[1],a[3]); } }else if(a[0]<a[4]){ swap(a[0],a[2]); swap(a[1],a[3]); }else{ swap(a[4],a[2]); swap(a[0],a[3]); swap(a[1],a[4]); } }else if(a[0]<a[4]){ if(a[1]<a[4]){ swap(a[1],a[2]); }else{ swap(a[4],a[2]); swap(a[1],a[4]); } }else if(a[0]<a[3]){ swap(a[0],a[2]); swap(a[1],a[4]); }else{ swap(a[3],a[2]); swap(a[0],a[3]); swap(a[1],a[4]); } }else if(a[0]<a[3]){ if(a[1]<a[4]){ if(a[1]<a[3]){ if(a[3]<a[4]){ swap(a[3],a[2]); }else{ swap(a[4],a[2]); } }else if(a[1]<a[2]){ swap(a[1],a[2]); swap(a[1],a[3]); }else{ swap(a[1],a[3]); } }else if(a[3]<a[4]){ if(a[2]<a[4]){ swap(a[1],a[3]); }else{ swap(a[4],a[2]); swap(a[1],a[3]); } }else if(a[1]<a[3]){ swap(a[1],a[2]); swap(a[1],a[4]); }else{ swap(a[3],a[2]); swap(a[1],a[4]); } }else if(a[2]<a[4]){ if(a[0]<a[2]){ if(a[1]<a[2]){ swap(a[1],a[2]); swap(a[1],a[3]); }else{ swap(a[1],a[3]); } }else if(a[0]<a[4]){ swap(a[0],a[2]); swap(a[1],a[3]); }else{ swap(a[4],a[2]); swap(a[0],a[3]); swap(a[1],a[4]); } }else if(a[0]<a[4]){ if(a[1]<a[4]){ swap(a[1],a[2]); swap(a[1],a[3]); }else{ swap(a[4],a[2]); swap(a[1],a[3]); } }else if(a[0]<a[2]){ swap(a[0],a[2]); swap(a[0],a[3]); swap(a[1],a[4]); }else{ swap(a[0],a[3]); swap(a[1],a[4]); } }else if(a[2]<a[3]){ if(a[0]<a[3]){ if(a[2]<a[4]){ if(a[0]<a[4]){ if(!(a[0]<a[2])){ swap(a[0],a[2]); } }else if(a[1]<a[4]){ swap(a[4],a[2]); swap(a[0],a[4]); }else{ swap(a[1],a[2]); swap(a[0],a[4]); } }else if(a[0]<a[2]){ if(a[0]<a[4]){ swap(a[4],a[2]); }else{ swap(a[0],a[2]); swap(a[0],a[4]); } }else if(a[1]<a[2]){ swap(a[0],a[4]); }else{ swap(a[1],a[2]); swap(a[0],a[4]); } }else if(a[1]<a[4]){ if(a[3]<a[4]){ if(a[1]<a[3]){ swap(a[3],a[2]); swap(a[0],a[3]); }else{ swap(a[1],a[2]); swap(a[0],a[3]); } }else if(a[2]<a[4]){ swap(a[4],a[2]); swap(a[0],a[4]); }else{ swap(a[0],a[4]); } }else if(a[1]<a[3]){ if(a[1]<a[2]){ swap(a[0],a[4]); }else{ swap(a[1],a[2]); swap(a[0],a[4]); } }else if(a[3]<a[4]){ swap(a[4],a[2]); swap(a[0],a[3]); swap(a[1],a[4]); }else{swap(a[3],a[2]); swap(a[0],a[3]); swap(a[1],a[4]); } }else if(a[0]<a[2]){ if(a[3]<a[4]){ if(a[0]<a[4]){ if(a[0]<a[3]){ swap(a[3],a[2]); }else{ swap(a[0],a[2]); swap(a[0],a[3]); } }else if(a[1]<a[4]){ swap(a[4],a[2]); swap(a[0],a[3]); }else{ swap(a[1],a[2]); swap(a[0],a[3]); swap(a[1],a[4]); } }else if(a[0]<a[3]){ if(a[0]<a[4]){ swap(a[4],a[2]); }else{ swap(a[0],a[2]); swap(a[0],a[4]); } }else if(a[1]<a[3]){ swap(a[3],a[2]); swap(a[0],a[4]); }else{ swap(a[1],a[2]); swap(a[0],a[3]); swap(a[1],a[4]); } }else if(a[1]<a[4]){ if(a[2]<a[4]){ if(a[1]<a[2]){ swap(a[0],a[3]); }else{ swap(a[1],a[2]); swap(a[0],a[3]); } }else if(a[3]<a[4]){ swap(a[4],a[2]); swap(a[0],a[3]); }else{ swap(a[3],a[2]); swap(a[0],a[4]); } }else if(a[1]<a[2]){ if(a[1]<a[3]){ swap(a[3],a[2]); swap(a[0],a[4]); }else{ swap(a[1],a[2]); swap(a[0],a[3]); swap(a[1],a[4]); } }else if(a[2]<a[4]){ swap(a[4],a[2]); swap(a[0],a[3]); swap(a[1],a[4]); }else{ swap(a[0],a[3]); swap(a[1],a[4]); } }

Feb 04 2016

On Thursday, 4 February 2016 at 20:30:57 UTC, Timon Gehr wrote:At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps): void partition5(ref int[5] a){ if(a[0]<a[1]){ if(a[2]<a[3]){ if(a[0]<a[2]){ if(a[1]<a[4]){ if(a[1]<a[2]){ if(!(a[2]<a[4])){ swap(a[4],a[2]); } }else if(a[1]<a[3]){ swap(a[1],a[2]); }else{ swap(a[3],a[2]); swap(a[1],a[3]); <snip>That's about what i expected for the actual function (like this) to look like. Course the only way to test this is to brute force the combinations and confirm it's all in the order they should be.

Feb 04 2016

On Thursday, 4 February 2016 at 20:30:57 UTC, Timon Gehr wrote:At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps): void partition5(ref int[5] a){ if(a[0]<a[1]){ ...Great, we can all go home! I was curious so I did a crude measurement: when compiled for 64-bit with all optimizations turned on (-release -O -inline -boundscheck=off), the machine code for this function is about 3KB in size.

Feb 04 2016

On Thursday, 4 February 2016 at 20:30:57 UTC, Timon Gehr wrote:At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps): ...Inspired by this, I made four other versions of the function that are shorter but make more swaps (at most 4, 6, 7 and 8 swaps respectively). Each makes 6 comparisons and should be idempotent. http://dpaste.dzfl.pl/1c53d8f432f7 Here are the distributions of the number of swaps when tested with all 120 possible permutations as the input: testing 'partition5a' (by Timon Gehr) 0 swaps: 4 orderings 1 swaps: 32 orderings 2 swaps: 68 orderings 3 swaps: 16 orderings testing 'partition5b' 0 swaps: 4 orderings 1 swaps: 20 orderings 2 swaps: 42 orderings 3 swaps: 40 orderings 4 swaps: 14 orderings testing 'partition5c' 0 swaps: 4 orderings 1 swaps: 14 orderings 2 swaps: 25 orderings 3 swaps: 30 orderings 4 swaps: 26 orderings 5 swaps: 16 orderings 6 swaps: 5 orderings testing 'partition5d' 0 swaps: 4 orderings 1 swaps: 14 orderings 2 swaps: 24 orderings 3 swaps: 29 orderings 4 swaps: 26 orderings 5 swaps: 16 orderings 6 swaps: 6 orderings 7 swaps: 1 orderings testing 'partition5e' 0 swaps: 4 orderings 1 swaps: 12 orderings 2 swaps: 19 orderings 3 swaps: 25 orderings 4 swaps: 25 orderings 5 swaps: 18 orderings 6 swaps: 11 orderings 7 swaps: 5 orderings 8 swaps: 1 orderings

Feb 05 2016

On Friday, 5 February 2016 at 15:13:56 UTC, tn wrote:On Thursday, 4 February 2016 at 20:30:57 UTC, Timon Gehr wrote:Very nice! I'm curious, to both you and Timon, how did you go about writing these and coming up with the solutions? I'm not sure if I could come up with these results myself and so quickly at that.At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps): ...Inspired by this, I made four other versions of the function that are shorter but make more swaps (at most 4, 6, 7 and 8 swaps respectively). Each makes 6 comparisons and should be idempotent. http://dpaste.dzfl.pl/1c53d8f432f7 ...

Feb 05 2016

On 02/05/2016 10:42 PM, Xinok wrote:On Friday, 5 February 2016 at 15:13:56 UTC, tn wrote:I quickly hacked together a brute-force search. Code: http://dpaste.dzfl.pl/43503913ac1d The search only makes sure that it works for distinct input values. Other input orders come for free; the verification code I have included checks this. The code recursively splits the set of all permutations on 5 elements by all possible comparisons between two elements up to a maximal depth of 6 comparisons, using some basic pruning. The recursion terminates if all permutations in the set can be partitioned using the same swaps. (I.e., the index of the median is fixed and there are exactly two possible indices for the first two elements, and exactly two possible indices for the last two elements.) The partition5 function I have posted is the first result that the search found. The above code also supports optimizing for source code size (-version=SHORT). The result is not that much shorter though.On Thursday, 4 February 2016 at 20:30:57 UTC, Timon Gehr wrote:Very nice! I'm curious, to both you and Timon, how did you go about writing these and coming up with the solutions? I'm not sure if I could come up with these results myself and so quickly at that.At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps): ...Inspired by this, I made four other versions of the function that are shorter but make more swaps (at most 4, 6, 7 and 8 swaps respectively). Each makes 6 comparisons and should be idempotent. http://dpaste.dzfl.pl/1c53d8f432f7 ...

Feb 05 2016

On 02/05/2016 04:42 PM, Xinok wrote:Very nice! I'm curious, to both you and Timon, how did you go about writing these and coming up with the solutions? I'm not sure if I could come up with these results myself and so quickly at that.I was thinking the same exact things. -- Andrei

Feb 05 2016

On Friday, 5 February 2016 at 21:42:41 UTC, Xinok wrote:On Friday, 5 February 2016 at 15:13:56 UTC, tn wrote:I basically just started from Timons solution and made is smaller by replacing branches by swaps. The outermost structure of Timons code looks like this: if (a[0] < a[1]) { do something 1 } else { do something 2 } I replaced the above by if (a[0] < a[1]) { swap(a[0], a[1]); } do something 1 So adding one swap just about halved the size of the code. Now "do something 1" again has two branches: if (a[2] < a[3]) { do something 1.1 } else { do something 1.2 } So we can do the same trick again and replace it by if (a[2] < a[3]) { swap(a[2], a[3]); } do something 1.1 And so on. (When doing subsequent swaps we also need to make sure to not break the conditions achieved by the previous swaps.) However, the swap of a[0] and a[1] in the first replacement would break idempotence, so I also had to change which elements are compared (and swapped). So in my code we first compare a[0] and a[2], then a[1] and a[3], and so on.On Thursday, 4 February 2016 at 20:30:57 UTC, Timon Gehr wrote:Very nice! I'm curious, to both you and Timon, how did you go about writing these and coming up with the solutions? I'm not sure if I could come up with these results myself and so quickly at that.At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps): ...Inspired by this, I made four other versions of the function that are shorter but make more swaps (at most 4, 6, 7 and 8 swaps respectively). Each makes 6 comparisons and should be idempotent. http://dpaste.dzfl.pl/1c53d8f432f7 ...

Feb 06 2016

On 02/05/2016 10:13 AM, tn wrote:On Thursday, 4 February 2016 at 20:30:57 UTC, Timon Gehr wrote:Whoa, I missed this last night. Awesome work! Thanks! I'm kinda running out of time budget here for this particular subproblem, but I've got to do these justice and try them out as well.At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps): ...Inspired by this, I made four other versions of the function that are shorter but make more swaps (at most 4, 6, 7 and 8 swaps respectively). Each makes 6 comparisons and should be idempotent. http://dpaste.dzfl.pl/1c53d8f432f7Here are the distributions of the number of swaps when tested with all 120 possible permutations as the input: testing 'partition5a' (by Timon Gehr) 0 swaps: 4 orderings 1 swaps: 32 orderings 2 swaps: 68 orderings 3 swaps: 16 orderings testing 'partition5b' 0 swaps: 4 orderings 1 swaps: 20 orderings 2 swaps: 42 orderings 3 swaps: 40 orderings 4 swaps: 14 orderings[...] Could you please add two simple calculations? Assuming uniform random distribution of data, compute the average number of swaps as a weighted average of orderings. Also, show the number of lines (one test or one swap per line) as a proxy for generated code size. That should provide good insights. Thanks! Andrei

Feb 06 2016

On Saturday, 6 February 2016 at 13:06:37 UTC, Andrei Alexandrescu wrote:Could you please add two simple calculations? Assuming uniform random distribution of data, compute the average number of swaps as a weighted average of orderings. Also, show the number of lines (one test or one swap per line) as a proxy for generated code size. That should provide good insights.This now prints the numbers of comparison and swap lines and computes the average numbers of swaps. In addition, it also runs the same tests for permutations of 5 elements some of which might be equal (an example of such permutation would be [1, 2, 0, 2, 3], on the other hand [1, 2, 0, 2, 4] should not be included since it is identical). However, in this case the average number of swaps is perhaps not so meaningful. http://dpaste.dzfl.pl/2012caf872ec Output: testing 'partition5a' Code: comparisons = 63, swaps = 107, total = 170 With all orderings of distinct elements: 0 swaps: 4 instances 1 swaps: 32 instances 2 swaps: 68 instances 3 swaps: 16 instances Average number of swaps: 1.8 With all orderings of potentially nondistinct elements: Error, not idempotent: [0, 0, 0, 0, 0] => [0, 0, 0, 0, 0] testing 'partition5b' Code: comparisons = 17, swaps = 21, total = 38 With all orderings of distinct elements: 0 swaps: 4 instances 1 swaps: 20 instances 2 swaps: 42 instances 3 swaps: 40 instances 4 swaps: 14 instances Average number of swaps: 2.33333 With all orderings of potentially nondistinct elements: 0 swaps: 36 instances 1 swaps: 122 instances 2 swaps: 200 instances 3 swaps: 146 instances 4 swaps: 37 instances Average number of swaps: 2.04806 testing 'partition5c' Code: comparisons = 10, swaps = 12, total = 22 With all orderings of distinct elements: 0 swaps: 4 instances 1 swaps: 14 instances 2 swaps: 25 instances 3 swaps: 30 instances 4 swaps: 26 instances 5 swaps: 16 instances 6 swaps: 5 instances Average number of swaps: 3.06667 With all orderings of potentially nondistinct elements: 0 swaps: 36 instances 1 swaps: 92 instances 2 swaps: 130 instances 3 swaps: 138 instances 4 swaps: 92 instances 5 swaps: 42 instances 6 swaps: 11 instances Average number of swaps: 2.60628 testing 'partition5d' Code: comparisons = 7, swaps = 8, total = 15 With all orderings of distinct elements: 0 swaps: 4 instances 1 swaps: 14 instances 2 swaps: 24 instances 3 swaps: 29 instances 4 swaps: 26 instances 5 swaps: 16 instances 6 swaps: 6 instances 7 swaps: 1 instances Average number of swaps: 3.13333 With all orderings of potentially nondistinct elements: 0 swaps: 36 instances 1 swaps: 100 instances 2 swaps: 128 instances 3 swaps: 133 instances 4 swaps: 94 instances 5 swaps: 39 instances 6 swaps: 10 instances 7 swaps: 1 instances Average number of swaps: 2.57486 testing 'partition5e' Code: comparisons = 6, swaps = 8, total = 14 With all orderings of distinct elements: 0 swaps: 4 instances 1 swaps: 12 instances 2 swaps: 19 instances 3 swaps: 25 instances 4 swaps: 25 instances 5 swaps: 18 instances 6 swaps: 11 instances 7 swaps: 5 instances 8 swaps: 1 instances Average number of swaps: 3.53333 With all orderings of potentially nondistinct elements: 0 swaps: 36 instances 1 swaps: 88 instances 2 swaps: 100 instances 3 swaps: 123 instances 4 swaps: 106 instances 5 swaps: 53 instances 6 swaps: 25 instances 7 swaps: 9 instances 8 swaps: 1 instances Average number of swaps: 2.89649

Feb 06 2016

On 02/06/2016 01:42 PM, tn wrote:On Saturday, 6 February 2016 at 13:06:37 UTC, Andrei Alexandrescu wrote:Awesome, thanks. Will need to try at least a few of these out. At a quick glance, partition5d seems to be the sweet spot - it's small and does only 3.13/2.57 swaps on average. -- AndreiCould you please add two simple calculations? Assuming uniform random distribution of data, compute the average number of swaps as a weighted average of orderings. Also, show the number of lines (one test or one swap per line) as a proxy for generated code size. That should provide good insights.This now prints the numbers of comparison and swap lines and computes the average numbers of swaps. In addition, it also runs the same tests for permutations of 5 elements some of which might be equal (an example of such permutation would be [1, 2, 0, 2, 3], on the other hand [1, 2, 0, 2, 4] should not be included since it is identical). However, in this case the average number of swaps is perhaps not so meaningful. http://dpaste.dzfl.pl/2012caf872ec

Feb 06 2016

On Thursday, 4 February 2016 at 20:30:57 UTC, Timon Gehr wrote:At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps)[...]One swap usually decomposes into three moves. A cycle of length n can be sorted with n + 1 moves. Corresponding to the seven integer partitions of 5 we obtain: 5: 6 4 + 1: 5 3 + 2: 4 + 3 3 + 1 + 1: 4 2 + 2 + 1: 3 + 3 2 + 1 + 1 + 1: 3 1 + 1 + 1 + 1 + 1: 0 So we can do with seven moves instead of three swaps (nine moves). ;-)

Feb 05 2016

On Friday, 5 February 2016 at 15:41:11 UTC, Fool wrote:So we can do with seven moves instead of three swaps (nine moves). ;-)Yes, well, this all breaks down when you realize that this is all completely dominated by cache misses and that you can do median more effectively using SIMD instructions. :^P

Feb 05 2016

On Friday, 5 February 2016 at 15:41:11 UTC, Fool wrote:One swap usually decomposes into three moves.Recently from reading some interesting hacks and super code, a good swap can also be done via 3 xor operations (and avoiding an intermediate storage unit). http://aggregate.ee.engr.uky.edu/MAGIC/#Swap%20Values%20Without%20a%20Temporary x ^= y; y ^= x; x ^= y; Since these can be used directly as registers it might be faster (assuming all 5 are mapped to registers until the final write out, which might not work).

Feb 05 2016

On Friday, 5 February 2016 at 17:33:55 UTC, Era Scarecrow wrote:On Friday, 5 February 2016 at 15:41:11 UTC, Fool wrote:It's a cool trick but I would be surprised if this were actually faster. Modern x64 processors have 16 "general purpose" registers so saving a single register for a simple set of instructions is likely to not have any benefit. My other thought is that the compiler may not recognize this pattern, making it more difficult to optimize. On the other hand, compilers are very good at rearranging simple reads and writes to avoid stalling the pipeline in the CPU.One swap usually decomposes into three moves.Recently from reading some interesting hacks and super code, a good swap can also be done via 3 xor operations (and avoiding an intermediate storage unit). http://aggregate.ee.engr.uky.edu/MAGIC/#Swap%20Values%20Without%20a%20Temporary x ^= y; y ^= x; x ^= y; Since these can be used directly as registers it might be faster (assuming all 5 are mapped to registers until the final write out, which might not work).

Feb 05 2016

On 02/05/2016 04:35 PM, Xinok wrote:It's a cool trick but I would be surprised if this were actually faster.That changes every few years. Last time I measured it was slower. -- Andrei

Feb 05 2016

On 02/05/2016 04:41 PM, Fool wrote:On Thursday, 4 February 2016 at 20:30:57 UTC, Timon Gehr wrote:The goal is to partition though, not to sort. Six moves are sufficient. (And necessary. E.g. for 42310.)At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps)[...]One swap usually decomposes into three moves. A cycle of length n can be sorted with n + 1 moves. Corresponding to the seven integer partitions of 5 we obtain: 5: 6 4 + 1: 5 3 + 2: 4 + 3 3 + 1 + 1: 4 2 + 2 + 1: 3 + 3 2 + 1 + 1 + 1: 3 1 + 1 + 1 + 1 + 1: 0 So we can do with seven moves instead of three swaps (nine moves). ;-)

Feb 05 2016

On Friday, 5 February 2016 at 21:48:58 UTC, Timon Gehr wrote:The goal is to partition though, not to sort. Six moves are sufficient. (And necessary. E.g. for 42310.)Indeed. I was wondering about that. Great job!

Feb 05 2016

On 02/04/2016 03:30 PM, Timon Gehr wrote:At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):Timon, Ivan, this is amazing work. I don't know how your minds work folks - I sat for like two hours on it yesterday and couldn't crack it. Surprisingly in a test on millions of random numbers, Ivan's version wins by a hair. Might be the tighter code which has icache effects. Thanks very much! I should say that at these point no better definitions can be found on the Net. Andrei

Feb 05 2016

On 02/04/2016 03:30 PM, Timon Gehr wrote:At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):What is the minimum number of comparisons? Thx! -- Andrei P.S. The sythesized searcher is genius.

Feb 05 2016

On 02/06/2016 02:39 AM, Andrei Alexandrescu wrote:On 02/04/2016 03:30 PM, Timon Gehr wrote:There is no partition algorithm that uses <= 5 comparisons in all cases. (If that is what you're asking.)At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):What is the minimum number of comparisons? Thx! -- AndreiP.S. The sythesized searcher is genius.Thanks!

Feb 05 2016

On 02/05/2016 08:58 PM, Timon Gehr wrote:On 02/06/2016 02:39 AM, Andrei Alexandrescu wrote:Was asking about this particular one. For example, the maximum comparisons is 6, but it would be good to know the average number of comparisons. I know I could read through the code, but it's hairy. Thanks! -- AndreiOn 02/04/2016 03:30 PM, Timon Gehr wrote:There is no partition algorithm that uses <= 5 comparisons in all cases. (If that is what you're asking.)At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):What is the minimum number of comparisons? Thx! -- Andrei

Feb 06 2016

On 02/06/2016 02:00 PM, Andrei Alexandrescu wrote:On 02/05/2016 08:58 PM, Timon Gehr wrote:All versions posted in this thread do 6 comparisons on all code paths. (It is not possible to do better.) BTW, the code I had posted earlier suffers from the following flaws: - Branches were cut off too early, so -version=SHORT didn't actually find the shortest code. [1] - The code (by necessity) only performs the optimal number of swaps if all input elements are different. While it leaves already partitioned integer arrays in the same state as they were encountered, if multiple input elements are equal, the generated algorithm will sometimes swap them, violating idempotence if input elements can compare equal but be different. However, tn's versions fix this: they never perform any swaps if the input array is already partitioned. [1] This seems to be the shortest code that satisfies the specification I have given (<=6 comparisons, optimal number of swaps) for permutations and that performs all the comparing before all the swapping: void partition5(ref int[5] a){ if(a[0]<a[2]){if(a[1]<a[3]){if(a[0]<a[1]){if(a[2]<a[4]){if(a[1]<a[2]){if(a[3]<a[2]){swap(a[3],a[2]);}}else{if(a[1]<a[4]){swap(a[1],a[2]);}else{swap(a[4],a[2]);swap(a[1],a[4]);}}}else{if(a[1]<a[4]){if(a[3]<a[4]){swap(a[3],a[2]);}else{swap(a[4],a[2]);}}else{if(a[1]<a[2]){swap(a[1],a[2]);swap(a[1],a[4]);}else{swap(a[1],a[4]);}}}}else{if(a[3]<a[4]){if(a[0]<a[3]){if(a[3]<a[2]){swap(a[3],a[2]);}}else{if(a[0]<a[4]){swap(a[0],a[2]);swap(a[0],a[3]);}else{swap(a[4],a[2]);swap(a[0],a[3]);}}}else{if(a[0]<a[4]){if(a[4]<a[2]){swap(a[4],a[2]);}}else{if(a[0]<a[3]){swap(a[0],a[2]);swap(a[0],a[4]);}else{swap(a[3],a[2]);swap(a[0],a[4]);}}}}}else{if(a[0]<a[3]){if(a[2]<a[4]){if(a[2]<a[3]){if(a[3]<a[4]){swap(a[3],a[2]);swap(a[1],a[3]);}else{swap(a[4],a[2]);swap(a[1],a[4]);}}else{if(a[1]<a[2]){swap(a[1],a[2]);swap(a[1],a[3]);}else{swap(a[1],a[3]);}}}else{if(a[3]<a[4]){if(a[1]<a[4]){swap(a[1],a[2]);swap(a[1],a[3]);}else{swap(a[4],a[2]);swap(a[1],a[3]);}}else{if(a[2]<a[3]){swap(a[1],a[4]);}else{swap(a[3],a[2] );swap(a[1],a[4]);}}}}else{if(a[1]<a[4]){if(a[0]<a[1]){if(a[1]<a[2]){swap(a[1],a[2]);swap(a[1],a[3]);}else{swap(a[1],a[3]);}}else{if(a[0]<a[4]){swap(a[0],a[2]);swap(a[0],a[3]);}else{swap(a[4],a[2]);swap(a[0],a[3]);}}}else{if(a[0]<a[4]){if(a[2]<a[4]){swap(a[1],a[3]);}else{swap(a[4],a[2]);swap(a[1],a[3]);}}else{if(a[0]<a[1]){swap(a[0],a[2]);swap(a[0],a[3]);swap(a[1],a[4]);}else{swap(a[1],a[2]);swap(a[0],a[3]);swap(a[1],a[4]);}}}}}}else{if(a[3]<a[4]){if(a[0]<a[4]){if(a[1]<a[3]){if(a[0]<a[3]){if(a[0]<a[1]){swap(a[1],a[2]);}else{swap(a[0],a[2]);}}else{if(a[2]<a[3]){swap(a[3],a[2]);swap(a[0],a[3]);}else{swap(a[0],a[3]);}}}else{if(a[0]<a[1]){if(a[0]<a[3]){swap(a[3],a[2]);swap(a[1],a[3]);}else{swap(a[0],a[2]);swap(a[1],a[3]);}}else{if(a[1]<a[2]){swap(a[0],a[3]);}else{swap(a[1],a[2]);swap(a[0],a[3]);}}}}else{if(a[1]<a[2]){if(a[2]<a[4]){if(a[2]<a[3]){swap(a[3],a[2]);swap(a[0],a[3]);}else{swap(a[0],a[3]);}}else{if(a[1]<a[4]){swap(a[4],a[2]);swap(a[0],a[3]);}else{swap(a[1],a[2]);swap(a[0],a[3]);s wap(a[1],a[4]);}}}else{if(a[1]<a[4]){if(a[1]<a[3]){swap(a[3],a[2]);swap(a[0],a[3]);}else{swap(a[1],a[2]);swap(a[0],a[3]);}}else{if(a[2]<a[4]){swap(a[4],a[2]);swap(a[0],a[3]);swap(a[1],a[4]);}else{swap(a[0],a[3]);swap(a[1],a[4]);}}}}}else{if(a[0]<a[3]){if(a[1]<a[4]){if(a[0]<a[4]){if(a[0]<a[1]){swap(a[1],a[2]);}else{swap(a[0],a[2]);}}else{if(a[2]<a[4]){swap(a[4],a[2]);swap(a[0],a[4]);}else{swap(a[0],a[4]);}}}else{if(a[0]<a[1]){if(a[0]<a[4]){swap(a[4],a[2]);swap(a[1],a[4]);}else{swap(a[0],a[2]);swap(a[1],a[4]);}}else{if(a[1]<a[2]){swap(a[0],a[4]);}else{swap(a[1],a[2]);swap(a[0],a[4]);}}}}else{if(a[1]<a[2]){if(a[2]<a[3]){if(a[2]<a[4]){swap(a[4],a[2]);swap(a[0],a[4]);}else{swap(a[0],a[4]);}}else{if(a[1]<a[3]){swap(a[3],a[2]);swap(a[0],a[4]);}else{swap(a[1],a[2]);swap(a[0],a[3]);swap(a[1],a[4]);}}}else{if(a[1]<a[3]){if(a[1]<a[4]){swap(a[4],a[2]);swap(a[0],a[4]);}else{swap(a[1],a[2]);swap(a[0],a[4]);}}else{if(a[2]<a[3]){swap(a[3],a[2]);swap(a[0],a[3]);swap(a[1],a[4]);}else{swap(a[0],a[3]);sw ap(a[1],a[4]);}}}}}} }On 02/06/2016 02:39 AM, Andrei Alexandrescu wrote:Was asking about this particular one. For example, the maximum comparisons is 6, but it would be good to know the average number of comparisons. I know I could read through the code, but it's hairy. Thanks! -- AndreiOn 02/04/2016 03:30 PM, Timon Gehr wrote:There is no partition algorithm that uses <= 5 comparisons in all cases. (If that is what you're asking.)At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):What is the minimum number of comparisons? Thx! -- Andrei

Feb 06 2016

On 2/6/16 9:26 AM, Timon Gehr wrote:[1] This seems to be the shortest code that satisfies the specification I have given (<=6 comparisons, optimal number of swaps) for permutations and that performs all the comparing before all the swapping:Thanks. Tried this just now, it's better than the pre-discussion baselines but Ivan's still beats it (by a little). Then I just tried tn's partition5c which beats Ivan's. I should also add that returns are starting to diminish. Idempotence did make a large difference. Then reducing max swaps made a smaller difference (at least for the uints I'm working with). BTW my testbed is a careful implementation of BFPRT on random arrays of uint of various sizes. Andrei

Feb 06 2016

On 02/04/2016 03:30 PM, Timon Gehr wrote:At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):Oh, also: could you let that bad boy run and let it find anything that does idempotent partition in 6 compares and 0-7 swaps? Then slowly tighten the number of swaps until you find the best balance between number of swaps and code size. -- Andrei

Feb 05 2016

On Saturday, 6 February 2016 at 01:46:58 UTC, Andrei Alexandrescu wrote:On 02/04/2016 03:30 PM, Timon Gehr wrote:That is kind of what I tried to do by hand. My function partition5e seems to be practically identical to Ivans solution and partition5a is just a copy of Timons solution. Function partition5b, partition5c and partition5d are in between of those with various tradeoffs between the number of swaps and code size.At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):Oh, also: could you let that bad boy run and let it find anything that does idempotent partition in 6 compares and 0-7 swaps? Then slowly tighten the number of swaps until you find the best balance between number of swaps and code size. -- Andrei

Feb 06 2016

On Thursday, 4 February 2016 at 01:24:15 UTC, Andrei Alexandrescu wrote:So there's got to be a better solution. Your challenge - should you choose to accept it :o) - is an algorithm that does the partitioning in 6 comparisons and <= 9 swaps, which is idempotent: when applied twice, it always does 0 swaps.Here's my take at it. It's idempotent and does exactly 6 comparisons and 0-8 swaps. The advantages to counter the not-so-good stats are that it has a flat structure, and is not hard to reason about. The idea is to process the following steps: 1. Find the minimum of {b, c, d, e} in three comparisons, and put it into b. The structure is: b < d, c < e, b < c. Note that d and e were not compared if no swaps were made. 2. Find the minimum of {a, c, d, e} in two more comparisons, and put it into a. The structure is: a < d, c < e, a < c. Note that a and b were not compared if no swaps were made. 3. Find the minimum of {c, d, e} in one more comparison, and put it into c. The structure is: c < d, c < e. Note that d and e were not compared if no swaps were made. In the end, we have a < c, b < c, c < d, c < e. Additionally, if these inequalities held initially, everything is left in place regardless of the results of comparisons of (a, b) and (d, e). In code: ----- import std.algorithm; import std.exception; import std.range; import std.stdio; void p5 (ref int a, ref int b, ref int c, ref int d, ref int e) { if (d < b) {swap (b, d);} if (e < c) {swap (c, e);} if (c < b) {swap (b, c); swap (d, e);} if (d < a) {swap (a, d);} if (c < a) {swap (a, c); swap (d, e);} if (d < c) {swap (c, d);} } void main () { auto a = 5.iota.array; do { auto b = a.dup; p5 (b[0], b[1], b[2], b[3], b[4]); auto c = b.dup; p5 (c[0], c[1], c[2], c[3], c[4]); writeln (a, " -> ", b, " -> ", c); enforce (b[0] < b[2] && b[1] < b[2] && b[2] < b[3] && b[2] < b[4]); enforce (equal (b, c)); } while (nextPermutation (a)); } ----- Another interesting task would be to make the function stable, but I don't see how it is possible with such flat structure. Ivan Kazmenko.

Feb 05 2016

On 02/05/2016 06:36 AM, Ivan Kazmenko wrote:Another interesting task would be to make the function stable, but I don't see how it is possible with such flat structure.Under what circumstances isn't your function unstable? -- Andrei

Feb 05 2016

On 02/05/2016 07:59 PM, Andrei Alexandrescu wrote:On 02/05/2016 06:36 AM, Ivan Kazmenko wrote:I meant "is your function unstable". Double negation, but if you speak Russian those count as one :o). -- AndreiAnother interesting task would be to make the function stable, but I don't see how it is possible with such flat structure.Under what circumstances isn't your function unstable? -- Andrei

Feb 05 2016

On Saturday, 6 February 2016 at 00:59:17 UTC, Andrei Alexandrescu wrote:On 02/05/2016 06:36 AM, Ivan Kazmenko wrote:For example, if elements are "value | id", compared by value, then: 0|0, 0|1, 1|2, 0|3, 0|4 is transformed into 0|0, 0|1, 0|4, 0|3, 1|2 and 0|4 is now placed earlier than 0|3, which violates the stability requirement. Things can be reordered a bit, but it seems that no possible order eliminates the need to remember a part of the history. On the other hand, when we track our whole path in the decision tree (e.g. at the leaves of Timon Gehr's tree of ifs), we have all information to make the partition stable. In any case, finding the shortest-code stable partition-of-5 algorithm looks like a problem solvable by an automated searcher. Regarding the speed, there are different use cases with different requirements, for example: 1. Primitive types (cheap swap, cheap comparison). 2. Heavy structs A (expensive swap, cheap comparison - e.g., compare one field of primitive type). 3. Heavy structs B (expensive swap, expensive comparison - e.g., call a heavy external function). 4. Heavy classes (cheap swap - pointers only, expensive comparison). So there's perhaps no single best solution.Another interesting task would be to make the function stable, but I don't see how it is possible with such flat structure.Under what circumstances isn't your function unstable? -- Andrei

Feb 05 2016

On Saturday, 6 February 2016 at 07:06:27 UTC, Ivan Kazmenko wrote:On Saturday, 6 February 2016 at 00:59:17 UTC, Andrei Alexandrescu wrote:The code I used to check stability: http://dpaste.dzfl.pl/2b950cb3e5d8 The "y" and "n" at end of lines are "yes"/"no", as in "stable"/"unstable".

Feb 05 2016

On Saturday, 6 February 2016 at 07:06:27 UTC, Ivan Kazmenko wrote:1. Primitive types (cheap swap, cheap comparison). 2. Heavy structs A (expensive swap, cheap comparison - e.g., compare one field of primitive type). 3. Heavy structs B (expensive swap, expensive comparison - e.g., call a heavy external function). 4. Heavy classes (cheap swap - pointers only, expensive comparison). So there's perhaps no single best solution.That's right, but other factors are more important: preventing pipeline stalls. If you are collecting from 5 different cachelines in an array you are likely to get several 40-120 cycles delays unless you do prefetching, and if you do, you need to have other instructions to fill in the latency gaps. But also instructions have latency and concurrency issues. Which is why your version did reasonably well as it made the compares/swaps independent so that they could be concurrently scheduled. Yet, Haswell has SIMD instructions that can do 8-16x 32-bit max/min operations per cycle, with a latency of only 1 cycle, and 4-8x 64bit compares with a latency of 1 cycle. So if you use as small fixed N, like 5, it makes very little sense to count compares/swaps. What makes sense is to focus on how you can avoid branching and build an algorithm with no pipeline stalls. If sorting large arrays you also might want to look at multi-threaded parallel sort functions.

Feb 06 2016

On 02/06/2016 02:06 AM, Ivan Kazmenko wrote:On Saturday, 6 February 2016 at 00:59:17 UTC, Andrei Alexandrescu wrote:Yah. I think stability should be solved if there's a need/application for it, e.g. is there a larger algorithm that would be stable if partition5 were stable?Regarding the speed, there are different use cases with different requirements, for example: 1. Primitive types (cheap swap, cheap comparison). 2. Heavy structs A (expensive swap, cheap comparison - e.g., compare one field of primitive type). 3. Heavy structs B (expensive swap, expensive comparison - e.g., call a heavy external function). 4. Heavy classes (cheap swap - pointers only, expensive comparison). So there's perhaps no single best solution.Yah, good point. I should also add that more comparisons generally mean more branching and more code. -- Andrei

Feb 06 2016