
digitalmars.D - Stable partition3 implementation

reply "Xinok" <xinok live.com> writes:
I wanted to get some feedback on a stable partition3 
implementation for Phobos before I work on a pull request. I 
wrote this implementation which is in-place (zero memory 
allocations) but runs in O(n lg n) time.

http://dpaste.dzfl.pl/e12f50ad947d

I found this paper which describes an in-place algorithm with 
O(n) time complexity but it's over my head at the moment.

http://link.springer.com/article/10.1007%2FBF01994842

It is trivial to implement an algorithm with O(n) time complexity 
and O(n) space complexity, but that requires allocating memory. 
Furthermore, using a buffer requires the range to have assignable 
elements.
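The trivial buffered approach mentioned above can be sketched in a few lines of Python (the function name and signature are mine for illustration, not the proposed Phobos API):

```python
def stable_partition3(xs, pivot, less=lambda a, b: a < b):
    """Stable three-way partition using O(n) time and O(n) extra space.

    Elements comparing less than, equal to, and greater than the pivot
    keep their relative input order within each group.
    """
    lo = [x for x in xs if less(x, pivot)]
    eq = [x for x in xs if not less(x, pivot) and not less(pivot, x)]
    hi = [x for x in xs if less(pivot, x)]
    return lo + eq + hi

print(stable_partition3([5, 1, 5, 3, 9, 5, 2], 5))
# -> [1, 3, 2, 5, 5, 5, 9]
```

This is what the buffer-based version amounts to conceptually; the hard part is getting the same result stably without the extra arrays.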


Any thoughts?
Jul 09 2015
next sibling parent reply "Xinok" <xinok live.com> writes:
On Thursday, 9 July 2015 at 21:57:39 UTC, Xinok wrote:
 I found this paper which describes an in-place algorithm with 
 O(n) time complexity but it's over my head at the moment.

 http://link.springer.com/article/10.1007%2FBF01994842
I apologize, I didn't realize this link was behind a paywall. The paper is freely available here:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.25.5554&rep=rep1&type=pdf
Jul 09 2015
parent reply "Nick B" <nick.barbalich gmail.com> writes:
On Friday, 10 July 2015 at 00:39:16 UTC, Xinok wrote:
 On Thursday, 9 July 2015 at 21:57:39 UTC, Xinok wrote:
 I found this paper which describes an in-place algorithm with 
 O(n) time complexity but it's over my head at the moment.
[snip]
 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.25.5554&rep=rep1&type=pdf
from the pdf, above, in case readers, like me, don't know the context of a Stable Partition implementation:

Stable minimum space partitioning in linear time
Jyrki Katajainen and Tomi Pasanen

Abstract. In the stable 0-1 sorting problem the task is to sort an array of n elements with two distinct values such that equal elements retain their relative input order. Recently, Munro, Raman and Salowe [BIT 1990] gave an algorithm which solves this problem in O(n log* n) time and constant extra space. We show that by a modification of their method the stable 0-1 sorting is possible in O(n) time and O(1) extra space. Stable three-way partitioning can be reduced to stable 0-1 sorting. This immediately yields a stable minimum space quicksort, which sorts multisets in asymptotically optimal time with high probability. CR categories: E.5, F.2.2.

The stable 0-1 sorting problem is defined as follows: Given an array of n elements and a function f mapping each element to the set {0,1}, the task is to rearrange the elements such that all elements whose f-value is zero come before elements whose f-value is one. Moreover, the relative order of elements with equal f-values should be maintained. For the sake of simplicity, we hereafter refer to bits instead of the f-values of elements. Stable partitioning is a special case of stable 0-1 sorting, where the f-values are obtained by comparing every element xi to some pivot element xj (which will not take part in partitioning):
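The reduction the abstract mentions can be sketched in Python. Here the paper's in-place 0-1 sort is stood in for by a simple O(n)-space version; names are mine, and only the reduction itself is from the paper:

```python
def stable_01_sort(xs, f):
    """Stable 0-1 sort: all elements with f-value 0 come before those
    with f-value 1, equal f-values keeping their relative order.
    (O(n) extra space here; the paper does this in place.)"""
    return [x for x in xs if f(x) == 0] + [x for x in xs if f(x) == 1]

def partition3_via_01(xs, pivot):
    """Stable three-way partition via two stable 0-1 sorts."""
    # First pass: "greater" elements get bit 1 and move to the back.
    xs = stable_01_sort(xs, lambda x: 1 if x > pivot else 0)
    k = sum(1 for x in xs if not x > pivot)
    # Second pass, within the front block: "less" elements get bit 0
    # and move to the front, leaving the "equal" block in the middle.
    return stable_01_sort(xs[:k], lambda x: 0 if x < pivot else 1) + xs[k:]

print(partition3_via_01([3, 1, 3, 5, 2], 3))
# -> [1, 2, 3, 3, 5]
```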
Jul 09 2015
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 07/10/2015 03:48 AM, Nick B wrote:
 On Friday, 10 July 2015 at 00:39:16 UTC, Xinok wrote:
 On Thursday, 9 July 2015 at 21:57:39 UTC, Xinok wrote:
 I found this paper which describes an in-place algorithm with O(n)
 time complexity but it's over my head at the moment.
... stable 0-1 sorting is possible in O(n) time and O(1) extra space.
Note how /they/ don't mention "complexity". This is because algorithms don't have "complexities". Problems do. (Sorry, pet peeve of mine.)
Jul 10 2015
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 07/09/2015 11:57 PM, Xinok wrote:
 I found this paper which describes an in-place algorithm with O(n) time
 complexity but it's over my head at the moment.

 http://link.springer.com/article/10.1007%2FBF01994842

 It is trivial to implement an algorithm with O(n) time complexity and
 O(n) space complexity, but that requires allocating memory. Furthermore,
 using a buffer requires the range to have assignable elements.


 Any thoughts?
I think this method is likely to be less practically relevant than the one they improve upon. (log* n really is constant for all practical purposes; it is the number of times one needs to iteratively take the logarithm until a number below 1 is obtained.)

That paper appears to be available here:

https://github.com/lispc/lispc.github.com/raw/master/assets/files/situ.pdf

No idea what the constant is though, I have not read the paper (yet).
Jul 10 2015
parent reply "Xinok" <xinok live.com> writes:
On Friday, 10 July 2015 at 21:26:50 UTC, Timon Gehr wrote:
 I think this method is likely to be less practically relevant 
 than the one they improve upon. (log* n really is constant for 
 all practical purposes, it is the number of times one needs to 
 iteratively take the logarithm until a number below 1 is 
 obtained.)
log* n grows fast for small inputs, so we can't simply ignore it. For example, if n = 2^16 = 65536, then n*lg(n) = 16*2^16 = 2^20 = 1048576.
 That paper appears to be available here: 
 https://github.com/lispc/lispc.github.com/raw/master/assets/files/situ.pdf

 No idea what the constant is though, I have not read the paper 
 (yet).
I don't know if there is anything relevant but that paper focuses on a different problem involving sorting.
Jul 10 2015
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 07/11/2015 02:32 AM, Xinok wrote:
 On Friday, 10 July 2015 at 21:26:50 UTC, Timon Gehr wrote:
 I think this method is likely to be less practically relevant than the
 one they improve upon. (log* n really is constant for all practical
 purposes, it is the number of times one needs to iteratively take the
 logarithm until a number below 1 is obtained.)
log* n grows fast for small inputs,
No, it does not. It has no space to grow. With base e, it is 3 or 4 for all input sizes that matter:

http://www.wolframalpha.com/input/?i=plot+IteratedLog+from+0+to+2^8
http://www.wolframalpha.com/input/?i=plot+IteratedLog+from+0+to+2^32
http://www.wolframalpha.com/input/?i=plot+IteratedLog+from+0+to+2^64
http://www.wolframalpha.com/input/?i=IteratedLog%282^2^2^2^2%29
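For concreteness, a quick Python check of the iterated logarithm (base e), matching the plots above:

```python
import math

def log_star(n, base=math.e):
    """Iterated logarithm: the number of times log must be applied
    before the value drops below 1."""
    count = 0
    x = float(n)
    while x >= 1.0:
        x = math.log(x, base)
        count += 1
    return count

for n in (2**8, 2**16, 2**32, 2**64):
    print(n, log_star(n))
# log*(2^8) = log*(2^16) = 3, log*(2^32) = log*(2^64) = 4
```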
 so we can't simply ignore it.
There is, a priori, no practical difference between an O(n) algorithm and an O(n*log*(n)) algorithm. It all depends on the constants, and it is hence not clear-cut that the O(n) algorithm will run faster. I'm just saying that this should be taken into consideration.
 For example, if n = 2^16 = 65536, then n*lg(n) = 16*2^16 = 2^20 = 1048576.
 ...
The algorithm runs in O(n*log*(n)). It's not n log(n).
 That paper appears to be available here:
 https://github.com/lispc/lispc.github.com/raw/master/assets/files/situ.pdf


 No idea what the constant is though, I have not read the paper (yet).
I don't know if there is anything relevant but that paper focuses on a different problem involving sorting.
No, it focuses on a few closely related problems, and the O(n*log*(n)) algorithm solves a problem that is a straightforward generalization of the problem you are looking at. Stable three-way partition is stable sorting where there are only three "distinct" values (smaller, equal, greater).

The paper you provided builds on top of this algorithm. It is the main reference, and part (ii) of the "Algorithm B" part of the O(n)/O(1) algorithm does not occur in the paper you provided, but only in that other paper, so yes, it is relevant. :-)
Jul 10 2015
parent reply "Xinok" <xinok live.com> writes:
On Saturday, 11 July 2015 at 02:56:55 UTC, Timon Gehr wrote:
 ...
 The algorithm runs in O(n*log*(n)). It's not n log(n).
 ...
I understand now. I had never heard of an iterated logarithm before. I took the asterisk to mean some constant, like a wild card if you will. Sorry for the confusion.

I wonder if the true O(n) algorithm is still worth pursuing. Although log*(n) may only be a factor of 2 or 3, the O(n) algorithm may have a small constant factor which makes it 2-3x faster.
 That paper appears to be available here:
 https://github.com/lispc/lispc.github.com/raw/master/assets/files/situ.pdf


 No idea what the constant is though, I have not read the 
 paper (yet).
I don't know if there is anything relevant but that paper focuses on a different problem involving sorting.
No, it focuses on a few closely related problems, and the O(n*log*(n))-algorithms solves a problem that is a straightforward generalization of the problem you are looking at. Stable three way partition is stable sorting where there are only three "distinct" values (smaller, equal, greater). The paper you provided builds on top of this algorithm. It is the main reference and part (ii) of the "Algorithm B" part of the O(n)/O(1) algorithm does not occur in the paper you provided, but only in that other paper, so yes, it is relevant. :-)
I get that much now, but this paper is still way above my head. There's some notation in the sample code that I don't understand. In this statement:

e <- #{ j : L[j] = x, 1 <= j <= n }

I'm not sure what the pound # signifies or what exactly is being assigned to e. I'm assuming it performs some operation on the set, like "max", but I'm not sure what.
Jul 22 2015
parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 22 July 2015 at 16:59:29 UTC, Xinok wrote:
 On Saturday, 11 July 2015 at 02:56:55 UTC, Timon Gehr wrote:
 [...]
I understand now. I had never heard of an iterated logarithm before. I took the asterisk to mean some constant, like a wild card if you will. Sorry for the confusion. I wonder if the true O(n) algorithm is still worth pursuing. Although log*(n) may only be a factor of 2 or 3, the O(n) algorithm may have a small constant factor which makes it 2-3x faster.
 [...]
I get that much now, but this paper is still way above my head. There's some notation in the sample code that I don't understand. In this statement: e <- #{ j : L[j] = x, 1 <= j <= n } I'm not sure what the pound # signifies or what exactly is being assigned to e. I'm assuming it performs some operation on the set, like "max", but I'm not sure what.
I haven't looked at the paper, but # followed by a set often means cardinality, i.e. (assuming the set is finite) the number of elements in the set.
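Under that reading, the statement simply counts how many entries of L equal x. In Python terms (data made up for illustration):

```python
# e <- #{ j : L[j] = x, 1 <= j <= n } read as "e gets the cardinality
# of the set of indices j for which L[j] equals x", i.e. the number
# of occurrences of x in L.
L = [4, 2, 4, 7, 4]
x = 4
e = sum(1 for j in range(len(L)) if L[j] == x)
print(e)  # -> 3
```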
Jul 22 2015
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/9/15 5:57 PM, Xinok wrote:
 I wanted to get some feedback on a stable partition3 implementation for
 Phobos before I work on a pull request. I wrote this implementation
 which is in-place (zero memory allocations) but runs in O(n lg n) time.

 http://dpaste.dzfl.pl/e12f50ad947d

 I found this paper which describes an in-place algorithm with O(n) time
 complexity but it's over my head at the moment.

 http://link.springer.com/article/10.1007%2FBF01994842

 It is trivial to implement an algorithm with O(n) time complexity and
 O(n) space complexity, but that requires allocating memory. Furthermore,
 using a buffer requires the range to have assignable elements.


 Any thoughts?
I'd say both would be nice (not to mention the one in the paper), so how about both are present, selectable with a policy a la Yes.tightMemory or something. -- Andrei
Jul 10 2015
parent reply "Xinok" <xinok live.com> writes:
On Saturday, 11 July 2015 at 00:00:47 UTC, Andrei Alexandrescu 
wrote:
 On 7/9/15 5:57 PM, Xinok wrote:
 ...

 Any thoughts?
I'd say both would be nice (not to mention the one in the paper) so how about both are present selectable with a policy ala Yes.tightMemory or something. -- Andrei
I'm hesitant to add yet another template parameter; IMHO, it's bad enough that we have to manually write the predicate just to reach the swap strategy.

There's a fourth option I didn't think to mention. It's easy to utilize a small buffer to partition small sublists instead of performing insertions. partition3 would not allocate its own memory, so it would be in-place by default, but the user could provide their own buffer and pass it as an extra function argument. For example:

int[] arr = iota(0, 100000).array;
int[] buf = new int[100];
partition3!(...)(arr, 1000, buf);

Interestingly, for some constant k, if you implement this algorithm to use O(n/k) space, then it runs in O(n lg n) time because it performs at most O(lg k) rotations.

Regarding the algorithm in the paper, I don't have it completely figured out yet because it refers to algorithms in other papers which I can't find.
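For what it's worth, the rotation-based O(n lg n) divide-and-conquer scheme can be sketched in Python. This is my reconstruction of the general idea, not the dpaste code:

```python
def rotate(xs, first, mid, last):
    # Left-rotate xs[first:last] so that xs[mid] lands at xs[first].
    # Slicing is used for brevity; a truly in-place version would use
    # the triple-reversal trick.
    xs[first:last] = xs[mid:last] + xs[first:mid]

def stable_partition3_inplace(xs, lo, hi, pivot):
    """Stable three-way partition of xs[lo:hi] by divide and conquer.

    Returns (a, b) with xs[lo:a] < pivot, xs[a:b] == pivot and
    xs[b:hi] > pivot, each group keeping its input order.  Runs in
    O(n lg n) time with no auxiliary buffer.
    """
    if hi - lo == 0:
        return lo, lo
    if hi - lo == 1:
        if xs[lo] < pivot:
            return hi, hi
        if xs[lo] > pivot:
            return lo, lo
        return lo, hi
    mid = (lo + hi) // 2
    a1, b1 = stable_partition3_inplace(xs, lo, mid, pivot)
    a2, b2 = stable_partition3_inplace(xs, mid, hi, pivot)
    # The halves now look like [<1 =1 >1][<2 =2 >2].  Two rotations
    # merge them: first bring <2 in front of =1, giving
    # [<1 <2 =1 >1 =2 >2], then swap >1 with =2.
    nl2 = a2 - mid                 # length of the <2 block
    rotate(xs, a1, mid, a2)        # -> [<1 <2 =1 >1 =2 >2]
    rotate(xs, b1 + nl2, a2, b2)   # -> [<1 <2 =1 =2 >1 >2]
    a = a1 + nl2
    b = a + (b1 - a1) + (b2 - a2)
    return a, b

xs = [3, 1, 3, 5, 2]
print(stable_partition3_inplace(xs, 0, len(xs), 3), xs)
# -> (2, 4) [1, 2, 3, 3, 5]
```

Each merge step does O(n) work across a level and there are O(lg n) levels, hence the O(n lg n) bound; a small buffer for short sublists, as suggested above, would cut off the bottom of the recursion.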
Jul 10 2015
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/10/15 8:55 PM, Xinok wrote:
 I'm hesitant to add yet another template parameter; IMHO, it's bad
 enough that we have to manually write the predicate just to reach the
 swap strategy.
Then give them different names. -- Andrei
Jul 10 2015