
digitalmars.D - [OT] Algorithm question

reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
I've been thinking about the following problem, and thought I'd pick the
brains of the bright people around these parts...

Given a set A of n elements (let's say it's a random-access range of
size n, where n is relatively large), and a predicate P(x) that
specifies some subset of A of elements that we're interested in, what's
the best algorithm (in terms of big-O time complexity) for selecting a
random element x satisfying P(x), such that elements that satisfy P(x)
have equal probability of being chosen? (Elements that do not satisfy
P(x) are disregarded.)

Which elements of A satisfy P(x) is not known ahead of time, nor is the
relative proportion of elements that satisfy P(x) or not.

Note that A is considered to be immutable (swapping elements or
otherwise changing A is not allowed).

So far, I have come up with the following algorithms:

1) "Random shooting":

	auto r = ... /* elements of A */
	for (;;) {
		auto i = uniform(0, r.length);
		if (P(r[i])) return r[i];
	}

   Advantages: If a large percentage of elements in A satisfy P(x), then
   the loop will terminate within a small number of iterations; if the
   majority of elements satisfy P(x), this will terminate well below n
   iterations.

   Disadvantages: If only a small percentage of elements satisfy P(x),
   then this loop could take arbitrarily long to terminate, and it will
   not terminate at all if no elements satisfy P(x).

2) "Turtle walk":

	auto r = ... /* elements of A */
	int nSatisfy = 0;
	ElementType!A currentChoice;
	bool found = false;
	foreach (e; r) {
		if (P(e)) {
			nSatisfy++;
			// the k-th matching element replaces the current
			// choice with probability 1/k
			if (uniform(0, nSatisfy) == 0)
			{
				currentChoice = e;
				found = true;
			}
		}
	}
	if (found) return currentChoice;
	else ... /* no elements satisfy P(x) */

   Advantages: If there is any element at all in A that satisfies P(x),
   it will be found. The loop always terminates after n iterations.

   Disadvantages: Even if 99% of elements in A satisfy P(x), we still
   have to traverse the entire data set before we terminate. (We cannot
   terminate before that, otherwise the probability of elements
   satisfying P(x) being selected will not be even.)

3) Permutation walk:

	auto r = ... /* elements of A */
	foreach (i; iota(0, r.length).randomPermutation) {
		if (P(r[i])) return r[i];
	}
	/* no elements satisfy P(x) */

   Advantages: if an element that satisfies P(x) is found early, the
   loop will terminate before n iterations. This seems like the best of
   both worlds of (1) and (2), except:

   Disadvantages: AFAIK, generating a random permutation of indices from
   0 .. n requires at least O(n) time, so any advantage we may have had
   seems to be negated.

Is there an algorithm for *incrementally* generating a random
permutation of indices? If there is, we could use that in (3) and thus
achieve the best of both worlds between early termination if an element
satisfying P(x) is found, and guaranteeing termination after n
iterations if no elements satisfying P(x) exists.


T

-- 
The early bird gets the worm. Moral: ewww...
Apr 30 2017
Jack Stouffer <jack jackstouffer.com> writes:
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
 2) "Turtle walk":
I'd just like to point out that this algorithm doesn't satisfy

 such that elements that satisfy P(x) have equal probability of
 being chosen

as the first element found in A will be chosen, and then each
subsequent element has a decreasing probability, P = 1/nSatisfy, of
replacing the current choice.
Apr 30 2017
parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, May 01, 2017 at 05:03:17AM +0000, Jack Stouffer via Digitalmars-d wrote:
 On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
 2) "Turtle walk":
 I'd just like to point out that this algorithm doesn't satisfy

  such that elements that satisfy P(x) have equal probability of being
  chosen

 as the first element found in A will be chosen, and then each
 subsequent element has a decreasing probability, P = 1/nSatisfy, of
 replacing the current choice.
Actually, this is exactly why it *does* satisfy the requirement.

If there is exactly 1 element that satisfies P(x), then the probability
of it being picked is 1. If there are 2 elements, the first element is
picked with probability 1 initially, but the 2nd element has a 1/2
chance of replacing it, meaning the overall probability distribution is
1/2 and 1/2.

If there are 3 elements, then by the time the 2nd element is processed,
the first 2 elements are equally likely to be currently chosen (with
probability 1/2 for each); when the 3rd element is found, it has 1/3
chance of replacing the previous choice, meaning there is a 2/3 chance
the previous choice prevails. In that case, the 2/3 chance is equally
distributed between the first two elements (they were picked with 1/2
and 1/2 probability), so the effective probability is 1/3 each after
the 3rd element is processed.

In general, by the time you process the n'th element, the previous
elements would have been picked with 1/(n-1) chance each, and the n'th
element would be picked with 1/n chance, meaning there is a (n-1)/n
chance the previous choice prevails. That (n-1)/n chance is equally
distributed among the previous (n-1) elements, so effectively they are
picked with 1/n chance each.

The key here is that the probability that the n'th element is *not*
picked is exactly (n-1)/n, which, by inductive hypothesis, must be
evenly distributed among the previous (n-1) candidates, thereby making
their *eventual* probability of remaining as the chosen element exactly
1/n.


T

-- 
Two wrongs don't make a right; but three rights do make a left...
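For anyone who wants to check the argument empirically, here is a small
self-contained simulation of the "turtle walk" selection (a sketch: the
0/1 test array and the trial count are arbitrary choices, not from the
thread):

	import std.random : uniform;
	import std.stdio : writeln;

	void main()
	{
	    // 1 = satisfies P, 0 = does not (arbitrary test data)
	    immutable int[] a = [1, 0, 1, 1, 0];
	    auto hits = new int[a.length];
	    enum trials = 1_000_000;

	    foreach (t; 0 .. trials)
	    {
	        int nSatisfy = 0;
	        size_t choice = size_t.max;
	        foreach (i, e; a)
	        {
	            if (!e) continue;
	            ++nSatisfy;
	            // keep the k-th match with probability 1/k
	            if (uniform(0, nSatisfy) == 0)
	                choice = i;
	        }
	        ++hits[choice];
	    }

	    // indices 0, 2 and 3 each end up with roughly trials/3 hits
	    writeln(hits);
	}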
Apr 30 2017
Ola Fosheim Grøstad writes:
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
 Which elements of A satisfy P(x) is not known ahead of time, 
 nor is the
 relative proportion of elements that satisfy P(x) or not.
O(N) given P(x)===false for all x... What you want is probably average case analysis, not worst case?
May 01 2017
John Colvin <john.loughran.colvin gmail.com> writes:
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
 I've been thinking about the following problem, and thought I'd 
 pick the brains of the bright people around these parts...

 Given a set A of n elements (let's say it's a random-access 
 range of
 size n, where n is relatively large), and a predicate P(x) that
 specifies some subset of A of elements that we're interested 
 in, what's
 the best algorithm (in terms of big-O time complexity) for 
 selecting a
 random element x satisfying P(x), such that elements that 
 satisfy P(x)
 have equal probability of being chosen? (Elements that do not 
 satisfy
 P(x) are disregarded.)

 Which elements of A satisfy P(x) is not known ahead of time, 
 nor is the
 relative proportion of elements that satisfy P(x) or not.

 Note that A is considered to be immutable (swapping elements or
 otherwise changing A is not allowed).

 So far, I have come up with the following algorithms:

 1) "Random shooting":

 	auto r = ... /* elements of A */
 	for (;;) {
 		auto i = uniform(0, r.length);
 		if (P(r[i])) return r[i];
 	}

    Advantages: If a large percentage of elements in A satisfy 
 P(x), then
    the loop will terminate within a small number of iterations; 
 if the
    majority of elements satisfy P(x), this will terminate well 
 below n
    iterations.

    Disadvantages: If only a small percentage of elements 
 satisfy P(x),
    then this loop could take arbitrarily long to terminate, and 
 it will
    not terminate at all if no elements satisfy P(x).
You can recover O(n) calls to P by caching the results.

You can recover O(n) for both P and rng by progressively removing
elements from r or any other method that results in an easily samplable
set of all elements of r not yet visited (and therefore possible
candidates).

I like this one (warning, just thought of, untested):

auto r = ... /* elements of A */
auto nRemaining = r.length;
while (nRemaining) {
	auto i = uniform(0, nRemaining);
	if (P(r[i])) return r[i];
	else swap(r[i], r[--nRemaining]);
}
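For reference, the same sketch filled out into a self-contained
function (illustrative only: the name pickRandomSatisfying, the alias
predicate parameter, and the Nullable return for the no-match case are
choices made here, not in the post; note that a call reorders r, though
the contents stay the same):

import std.algorithm.mutation : swap;
import std.random : uniform;
import std.typecons : Nullable;

// Uniformly pick an element of r satisfying pred; empty result if none.
// Visited non-matching elements are swapped behind nRemaining.
Nullable!T pickRandomSatisfying(alias pred, T)(T[] r)
{
    auto nRemaining = r.length;
    while (nRemaining)
    {
        auto i = uniform(0, nRemaining);
        if (pred(r[i]))
            return Nullable!T(r[i]);
        swap(r[i], r[--nRemaining]);
    }
    return Nullable!T.init;
}

// e.g.:  auto hit = pickRandomSatisfying!(x => x % 2 == 0)(numbers);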
 3) Permutation walk:

 	auto r = ... /* elements of A */
 	foreach (i; iota(0, r.length).randomPermutation) {
 		if (P(r[i])) return r[i];
 	}
 	/* no elements satisfy P(x) */

    Advantages: if an element that satisfies P(x) is found 
 early, the
    loop will terminate before n iterations. This seems like the 
 best of
    both worlds of (1) and (2), except:

    Disadvantages: AFAIK, generating a random permutation of 
 indices from
    0 .. n requires at least O(n) time, so any advantage we may 
 have had
    seems to be negated.

 Is there an algorithm for *incrementally* generating a random 
 permutation of indices? If there is, we could use that in (3) 
 and thus achieve the best of both worlds between early 
 termination if an element satisfying P(x) is found, and 
 guaranteeing termination after n iterations if no elements 
 satisfying P(x) exists.


 T
Yes. As a matter of fact it would be the logical extension of my
algorithm above for when you can't or don't want to change r (again,
only just thought of this, untested...):

auto r = ... /* elements of A */
auto indices = iota(r.length).array;
auto nRemaining = r.length;
while (nRemaining) {
	auto i = uniform(0, nRemaining);
	if (P(r[indices[i]])) return r[indices[i]];
	else swap(indices[i], indices[--nRemaining]);
}

You could also do the same thing but with an array of pointers to
elements of r instead of indices.
May 01 2017
Ola Fosheim Grøstad writes:
On Monday, 1 May 2017 at 08:42:05 UTC, John Colvin wrote:
 I like this one (warning, just thought of, untested):

 auto r = ... /* elements of A */
 auto nRemaining = r.length;
 while (nRemaining) {
 	auto i = uniform(0, nRemaining);
 	if (P(r[i])) return r[i];
 	else swap(r[i], r[--nRemaining]);
 }
Yes, this is the standard text-book way of doing it, still O(N) of
course. The other standard text-book way is to generate an array of
indexes and mutate that instead, still O(N). If you cache in a heap you
get O(N log N).

Anyway, this kind of worst-case analysis doesn't really help. And
neither does average case, which will be O(1) assuming 50% match P. So
you need to specify what kind of average-case analysis you want, for
instance expressed as f(N,S) where N is the number of elements and S is
the number of elements that satisfy P.
May 01 2017
Ola Fosheim Grøstad writes:
On Monday, 1 May 2017 at 09:01:33 UTC, Ola Fosheim Grøstad wrote:
 array of indexes and mutate that instead, still O(N). If you 
 cache in a heap you get O(N log N).
To avoid confusion: I didn't mean a literal "heap", but some kind of
search structure that tracks and draws from unused index ranges in a
way that does not require O(N) initialization and can be implemented as
a single array on the stack. I don't remember what it is called, but
you draw on each node based on the number of children (kind of like the
ones used in markov-chain type situations). I assume this can be
maintained in O(log N) per iteration with O(1) initialization. E.g.

1. push_indices(0,N-1)                 // data (0,N-1)
2. i = draw_and_remove_random_index()  // data could now be (0,i-1), (i+1,N-1)
3. if( i == -1 ) return
4. dostuff(i)
5. goto 2

Even though it is O(N log N) it could outperform other solutions if you
only want to draw a small number of elements. And a small array with a
linear search (tracking used indices) would outperform that if you only
want to draw say 8 elements. If you want to draw close to N elements
then you would be better off just randomizing a full array (with
elements from 0 to N-1). Or you could do all of these: start with
linear search, then convert to a tree-like search, then convert to full
enumeration.

In general I don't think asymptotic analysis will do much good. I think
you need to look at actual costs for typical N, P and different
distributions of satisfied P, e.g.

- cost of loop iteration: 10
- average cost of P(x)=>false: 1
- average cost of P(x)=>true: 1000
- probability of P(a[i]) for each i
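For illustration, here is a flat-array version of that idea (a sketch
under stated assumptions: the names are made up, the draw weights each
interval by its length so the selection stays uniform, and each draw
costs O(number of intervals) rather than the O(log N) the tree
structure described above would give):

import std.random : uniform;

struct Interval { long lo, hi; }   // inclusive range of unused indices

struct IntervalDrawer
{
    Interval[] ivals;
    long remaining;

    this(long n)                   // assumes n >= 1
    {
        ivals = [Interval(0, n - 1)];
        remaining = n;
    }

    // Draw a uniformly random not-yet-drawn index; -1 when exhausted.
    long draw()
    {
        if (remaining == 0) return -1;
        auto t = uniform(0L, remaining);   // weight intervals by length
        foreach (k, iv; ivals)
        {
            immutable len = iv.hi - iv.lo + 1;
            if (t >= len) { t -= len; continue; }
            immutable idx = iv.lo + t;
            // split the chosen interval around idx
            Interval[] repl;
            if (iv.lo < idx) repl ~= Interval(iv.lo, idx - 1);
            if (idx < iv.hi) repl ~= Interval(idx + 1, iv.hi);
            ivals = ivals[0 .. k] ~ repl ~ ivals[k + 1 .. $];
            --remaining;
            return idx;
        }
        assert(0);
    }
}

Replacing the linear scan with a tree keyed on subtree counts, as
described above, would bring each draw down to O(log N).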
May 01 2017
prev sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, May 01, 2017 at 08:42:05AM +0000, John Colvin via Digitalmars-d wrote:
[...]
 You can recover O(n) calls to P by caching the results.
Sorry, I forgot to specify that P(x) can be different each time. Also, the input A may also be different each time, albeit remain the same length (though the expectation is that it will be mostly the same -- not that it matters, though, I don't think any of the proposed solutions here count on that).
 You can recover O(n) for both P and rng by progressively removing
 elements from r or any other method that results in an easily
 samplable set of all elements of r not yet visited (and therefore
 possible candidates).
[...] Yes, I realize that if r was mutable, there are existing simple solutions. The problem is that it's not mutable, or rather, should not be mutated by this selection algorithm. [...]
 Yes. As a matter of fact it would be the logical extension of my
 algorithm above for when you can't or don't want to change r (again,
 only just thought of this, untested...):
 
 auto r = ... /* elements of A */
 auto indices = iota(r.length).array;
 auto nRemaining = r.length;
 while (nRemaining) {
 	auto i = uniform(0, nRemaining);
 	if (P(r[indices[i]])) return r[indices[i]];
 	else swap(indices[i], indices[--nRemaining]);
 }
 
 You could also do the same thing but with an array of pointers to
 elements of r instead of indices.
The trouble with this is that you have to allocate an array of indices,
and r.length is rather large, so creating this array is O(n) each time.

But I think Ivan Kazmenko has an excellent idea to cache this index
array. Since, by symmetry, it doesn't matter if the starting indices
array was the identity permutation, we could just initialize this array
once, and the next time just skip to the loop directly (reusing the
previous (possibly partially) permuted indices as the starting point).
It does require O(n) memory, which is unfortunate, but at least we only
pay for that once.

I was hoping for something more along the lines of a random permutation
algorithm with sublinear (if not O(1)) space complexity that can return
elements of the permutation consecutively on-demand, like a lazy range.
Something along the lines of:

	auto seed = ... /* get random number */
	auto lazyRange = generatePermutation(seed, n);

where generatePermutation is some incremental algorithm that produces a
unique permutation of 0 .. n per seed value. Ideally, this algorithm
would only do as much work as is necessary to produce the first k
elements of the permutation, so that if the main loop terminates early
(we found an element satisfying P(x)) we don't waste time generating
the rest of the permutation.


T

-- 
"No, John. I want formats that are actually useful, rather than
over-featured megaliths that address all questions by piling on
ridiculous internal links in forms which are hideously over-complex."
-- Simon St. Laurent on xml-dev
May 01 2017
Ola Fosheim Grøstad writes:
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
 Is there an algorithm for *incrementally* generating a random 
 permutation of indices?
Does it exist? Yes, because you can build permutation tables for it,
but you'll have to find the closed form for it that is efficient... Can
you do that without selecting a specific N? It is easy for N=2 and N=3
at least.

E.g.

For array length 2 you get the 2 permutations: 0 1, 1 0
Then you just select one of them at random (you know the number of
permutations)

So if it exists you should probably do a search for permutations. "How
do I efficiently generate permutation number N"

Of course for smaller N you could generate a big array of bytes and
compress it by having arrays of slices into it (as many permutations
will share index sequences).
May 01 2017
Ola Fosheim Grøstad writes:
On Monday, 1 May 2017 at 08:44:25 UTC, Ola Fosheim Grøstad wrote:
 Does it exist? Yes, because you can build permutation tables 
 for it, but you'll have to find the closed form for it that is 
 efficient... Can you do that without selecting a specific N? It 
 is easy for N=2 and N=3 at least.

 E.g.

 For array length 2 you get the 2 permutations: 0 1, 1 0
 Then you just select one of them at random (you know the number 
 of permutations)

 So if it exists you should probably do a search for 
 permutations. "How do I efficiently generate permutation number 
 N"

 Of course for smaller N you could generate a big array of bytes 
 and compress it by having arrays of slices into it (as many 
 permutations will share index sequences).
Keep also in mind that you can split the original array in two, so it
might be sufficient to have the permutations for various 2^M sizes if
those are easier to generate (my guess, maybe through some xor magic?).

E.g. if N = 21 then it can be split into 16 + (4 + 1).

So you can draw like this:

1. randomly select group_16 over group_5 (with 16/21 probability)

2. if group_5 selected: randomly select group_4 over group_1 (with 4/5
probability)

etc.

That way you only need log(N) permutation sequences to keep track of.
Which for all practical purposes are going to stay pretty low (<20?)

Right?
May 01 2017
Ola Fosheim Grøstad writes:
On Monday, 1 May 2017 at 10:51:46 UTC, Ola Fosheim Grøstad wrote:
 Keep also in mind that you can split the original array in two, 
 so it might be sufficient to  have the permutations for various 
 2^M sizes if those are easier to generate (my guess, maybe 
 through some xor magic?)

 E.g. if N = 21 then it can be split into 16 + (4 + 1)

 So you can draw like this:

 1. randomly select group_16 over group_5 (with 16/21
 probability)

 2. if group_5 selected: randomly select group_4 over group_1
 (with 4/5 probability)

 etc.

 That way you only need log(N) permutation sequences to keep 
 track of. Which for all practical purposes are going to stay 
 pretty low (<20?)

 Right?
But if you don't use a hardware random generator and are happy with
more sloppy pseudo-randomness then you would be better off ditching the
whole concept of permutations and replacing it with some cyclic random
generators instead, using the same technique I just outlined.

E.g. find a set of cyclic random generators and break down N into a sum
of these cycle-sizes, then just keep track of the seed for each. If
they are 2^N then you could use xor to get more randomness between
runs.

Also in the algorithms above you need to change the probabilities each
time you take away one index from a group (you don't want to draw from
an empty group).
May 01 2017
Ola Fosheim Grøstad writes:
On Monday, 1 May 2017 at 11:08:56 UTC, Ola Fosheim Grøstad wrote:
 E.g. find a set of cyclic random generators and break down N 
 into a sum of these cycle-sizes, then just keep track of the 
 seed for each. If they are 2^N then you could use xor to get 
 more randomness between runs.

 Also in the algorithms above you need to change the 
 probabilities each time you take away one index from a group 
 (you don't want to draw from an empty group).
Well, actually, just select the single cyclic generator that is larger
than or equal to N, then just redraw if it returns a value >= N. Duh!
Sloppy thinking.

I would think you should be able to find some prime-sized ones that
will get the next index with an insignificant number of redraws.

But permutations are the way to go if you want scientific quality.
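For concreteness, here is a sketch of this single-generator redraw idea
(all of the following are assumptions made for illustration, not from
the post: a linear congruential generator with power-of-two modulus
m >= n has full period by the Hull-Dobell theorem when the increment is
odd and the multiplier is congruent to 1 mod 4, so skipping states >= n
yields a permutation of 0 .. n-1; since m < 2n the expected number of
redraws per step stays below one; and only a tiny family of
permutations is reachable -- nowhere near all n! -- which is exactly
the "sloppy pseudo-randomness" trade-off):

// An input range lazily yielding a permutation of 0 .. n-1.
struct CycleWalk
{
    ulong n, m, a, c, state, emitted;

    this(ulong n, ulong seed)
    {
        this.n = n;
        m = 1;
        while (m < n) m <<= 1;        // smallest power of two >= n
        a = 6364136223846793005UL;    // a % 4 == 1: full period mod 2^k
        c = 2 * seed + 1;             // any odd increment works
        state = seed & (m - 1);
        if (n) skip();
    }

    private void skip()               // redraw while out of range
    {
        while (state >= n)
            state = (a * state + c) & (m - 1);
    }

    bool empty() const { return emitted == n; }
    ulong front() const { return state; }

    void popFront()
    {
        if (++emitted == n) return;
        state = (a * state + c) & (m - 1);
        skip();
    }
}

Used against the original problem (r and P as in the opening post, seed
from std.random):

    import std.random : uniform;

    foreach (i; CycleWalk(r.length, uniform!ulong()))
        if (P(r[i])) return r[i];
    /* no element satisfies P(x) */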
May 01 2017
Ola Fosheim Grøstad writes:
On Monday, 1 May 2017 at 11:15:28 UTC, Ola Fosheim Grøstad wrote:
 But permutations are the way to go if you want scientific 
 quality.
I take that back:

«Polynomial pseudo-random number generator via cyclic phase»
http://www.sciencedirect.com/science/article/pii/S0378475409001463

Might be what you are looking for, and if not it probably references
other papers that fit the bill.
May 01 2017
Ola Fosheim Grøstad writes:
Ah, found a sensible search term: «pseudorandom permutation»

http://cse.iitkgp.ac.in/~debdeep/courses_iitkgp/FCrypto/slides/LubyRackoff.pdf

If you go down this path, please share links. Useful stuff 
actually.
May 01 2017
Ola Fosheim Grøstad writes:
On Monday, 1 May 2017 at 11:57:32 UTC, Ola Fosheim Grøstad wrote:
 Ah, found a sensible search term: «pseudorandom permutation»

 http://cse.iitkgp.ac.in/~debdeep/courses_iitkgp/FCrypto/slides/LubyRackoff.pdf

 If you go down this path, please share links. Useful stuff 
 actually.
A readable paper for most people:

http://www.pcg-random.org/pdf/toms-oneill-pcg-family-v1.02.pdf

«some users would prefer a much stronger property analogous to
uniformity: that over the full period of the generator, every possible
k-tuple will occur, and it will occur the same number of times.»

PCG claims to have arbitrary cycle sizes. I haven't looked at it, but
it is available as C/C++.
May 01 2017
Moritz Maxeiner <moritz ucworks.org> writes:
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
 Given a set A of n elements (let's say it's a random-access 
 range of
 size n, where n is relatively large), and a predicate P(x) that
 specifies some subset of A of elements that we're interested 
 in, what's
 the best algorithm (in terms of big-O time complexity) for 
 selecting a
 random element x satisfying P(x), such that elements that 
 satisfy P(x)
 have equal probability of being chosen? (Elements that do not 
 satisfy
 P(x) are disregarded.)

 Which elements of A satisfy P(x) is not known ahead of time, 
 nor is the
 relative proportion of elements that satisfy P(x) or not.
Works only with a random access range and I haven't done the full
analysis (and I'm a bit rusty on probability), but my first thought was
random divide and conquer:

ElementType!A* select(A,P)(A r) {
  // Recursion anchor
  if (r.length == 1) {
    if (P(r[0])) return r[0];
    else return null;
  // Recurse randomly with p=0.5 into either the left, or right half of r
  } else {
    ElementType!A* e;
    ElementType!A[][2] half;
    half[0] = r[0..floor(r.length/2)];
    half[1] = r[ceil(r.length/2)..$];
    ubyte coinFlip = uniform(0,1) > 0;
    // Recurse in one half and terminate if e is found there
    e = select(half[coinFlip]);
    if (e) return e;
    // Otherwise, recurse into other half
    return select(half[1 - coinFlip]);
  }
}

As stated above, I haven't done the full analysis, but intuitively
speaking (which can be wrong, of course), the p=0.5 recursion ought to
satisfy the condition of elements satisfying P(x) being chosen
uniformly; also intuitively, I'd expect the expected runtime for a
uniform distribution of elements satisfying P(x) to be around O(log N).
Worst-case would be that it has to inspect every element in r once =>
O(N).
May 01 2017
Timon Gehr <timon.gehr gmx.ch> writes:
On 01.05.2017 11:51, Moritz Maxeiner wrote:
 On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
 Given a set A of n elements (let's say it's a random-access range of
 size n, where n is relatively large), and a predicate P(x) that
 specifies some subset of A of elements that we're interested in, what's
 the best algorithm (in terms of big-O time complexity) for selecting a
 random element x satisfying P(x), such that elements that satisfy P(x)
 have equal probability of being chosen? (Elements that do not satisfy
 P(x) are disregarded.)

 Which elements of A satisfy P(x) is not known ahead of time, nor is the
 relative proportion of elements that satisfy P(x) or not.
 Works only with a random access range and I haven't done the full
 analysis (and I'm a bit rusty on probability), but my first thought
 was random divide and conquer:

 ElementType!A* select(A,P)(A r) {
   // Recursion anchor
   if (r.length == 1) {
     if (P(r[0])) return r[0];
     else return null;
   // Recurse randomly with p=0.5 into either the left, or right half of r
   } else {
     ElementType!A* e;
     ElementType!A[][2] half;
     half[0] = r[0..floor(r.length/2)];
     half[1] = r[ceil(r.length/2)..$];
     ubyte coinFlip = uniform(0,1) > 0;
This deterministically chooses 0. (uniform is right-exclusive.) I assume you mean uniform(0,2).
     // Recurse in one half and terminate if e is found there
     e = select(half[coinFlip]);
     if (e) return e;
     // Otherwise, recurse into other half
     return select(half[1 - coinFlip]);
   }
 }

 As stated above, I haven't done the full analysis, but intuitively
 speaking (which can be wrong, of course), the p=0.5 recursion ought to
 satisfy the condition of elements satisfying P(x) being chosen
 uniformly;
Counterexample: [1,0,1,1]. The first element is chosen with probability 1/2, which is not 1/3.
 also intuitively, I'd expect the expected runtime for a
 uniform distribution of elements satisfying P(x)  to be around O(log N).
I don't understand this input model.
 Worst-case would be that it has to inspect every element in r once => O(N)
May 01 2017
Moritz Maxeiner <moritz ucworks.org> writes:
On Monday, 1 May 2017 at 09:58:39 UTC, Timon Gehr wrote:
 On 01.05.2017 11:51, Moritz Maxeiner wrote:
     [...]
This deterministically chooses 0. (uniform is right-exclusive.) I assume you mean uniform(0,2).
Yes.
 [...]
Counterexample: [1,0,1,1]. The first element is chosen with probability 1/2, which is not 1/3.
Dang. Serves me right for not doing the full analysis. Thanks for the counter example and sorry for wasting your time.
 [...]
I don't understand this input model.
Since the idea was nonsense, it doesn't matter, anyway.
May 01 2017
Mike B Johnson <Mikey Ikes.com> writes:
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
 I've been thinking about the following problem, and thought I'd 
 pick the brains of the bright people around these parts...

 [...]
Since most real world problems would require selecting elements more
than once, it may be far more efficient to sort by P(x) (filter) then
simply select a random element. E.g.,

a = A.filter(x->P(x))  // Creates a new set a where only the elements
                       // of A that satisfy P(x) are added

...

e = a.random;

This is O(1) if you only have to filter once (you can create a
container that always "sorts" on P(x), so to speak, like a sorted
dictionary).
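In Phobos terms the filter-then-pick idea might be sketched like this
(A and P as in the thread's pseudocode; std.random.choice does the
uniform pick):

    import std.algorithm.iteration : filter;
    import std.array : array;
    import std.random : choice;

    // Pay O(n) once per distinct P to materialize the matching elements...
    auto satisfying = A.filter!(x => P(x)).array;

    // ...then each subsequent query is O(1).
    if (satisfying.length)
    {
        auto e = satisfying.choice;
        // use e
    }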
May 01 2017
parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, May 01, 2017 at 11:32:19AM +0000, Mike B Johnson via Digitalmars-d
wrote:
[...]
 Since most real world problems would require selecting elements more
 than once it may be far more efficient to sort by P(x)(filter) then
 simply select a random element.
 
 e.g.,
 
 a = A.filter(x->P(x))  // Creates a new set a where only the elements
 of A that satisfy P(x) are added
 
 ...
 
 e = a.random;
 
 
 this is O(1) if you only have to filter once(you can create a
 container that always "sorts" on P(x) so to speak.. like a sorted
 dictionary).
[...]

Sorry, I forgot to specify that P(x) may change each time, as may the
input A. So caching would be of little help in this case.


T

-- 
This is a tpyo.
May 01 2017
Andrea Fontana <nospam example.com> writes:
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
 1) "Random shooting":

 	auto r = ... /* elements of A */
 	for (;;) {
 		auto i = uniform(0, r.length);
 		if (P(r[i])) return r[i];
 	}

    Advantages: If a large percentage of elements in A satisfy 
 P(x), then
    the loop will terminate within a small number of iterations; 
 if the
    majority of elements satisfy P(x), this will terminate well 
 below n
    iterations.

    Disadvantages: If only a small percentage of elements 
 satisfy P(x),
    then this loop could take arbitrarily long to terminate, and 
 it will
    not terminate at all if no elements satisfy P(x).
If you find an element that doesn't satisfy P(x) move it on top of
array (swap!) and then try again with uniform(1, r.length - 1) and so
on. It terminates in this way.

Andrea
May 01 2017
Andrea Fontana <nospam example.com> writes:
On Monday, 1 May 2017 at 12:31:48 UTC, Andrea Fontana wrote:
 If you find an element that doesn't satisfy P(x) move it on top 
 of array (swap!) and then try again with uniform(1, r.length - 
 1) and so on.
Whoops, I mean uniform(1, r.length) of course :)
May 01 2017
Ola Fosheim Grøstad writes:
On Monday, 1 May 2017 at 12:38:32 UTC, Andrea Fontana wrote:
 On Monday, 1 May 2017 at 12:31:48 UTC, Andrea Fontana wrote:
 If you find an element that doesn't satisfy P(x) move it on 
 top of array (swap!) and then try again with uniform(1, 
 r.length - 1) and so on.
Whoops i mean uniform(1, r.length) of course :)
I guess you meant to the end, but if mutating the array is ok, then you
don't have to swap. Just move in the last element and chop 1 off the
length.

Another cache-friendly mutation solution (assuming the PRNG is cheap)
is to have two phases:

Phase 1: linear walk over the array and select elements with
P(a[i])==true with probability 1/N, then overwrite the ones with
P(a[i])==false. When done, truncate the array.

Phase 2: regular random selection from the array where all elements
satisfy P.

The OP underspecified the requirements... There is a gazillion
variations... :-P
May 01 2017
prev sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, May 01, 2017 at 12:31:48PM +0000, Andrea Fontana via Digitalmars-d
wrote:
 On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
 1) "Random shooting":
 
 	auto r = ... /* elements of A */
 	for (;;) {
 		auto i = uniform(0, r.length);
 		if (P(r[i])) return r[i];
 	}
 
    Advantages: If a large percentage of elements in A satisfy P(x), then
    the loop will terminate within a small number of iterations; if the
    majority of elements satisfy P(x), this will terminate well below n
    iterations.
 
    Disadvantages: If only a small percentage of elements satisfy P(x),
    then this loop could take arbitrarily long to terminate, and it will
    not terminate at all if no elements satisfy P(x).
 
 If you find an element that doesn't satisfy P(x) move it on top of
 array (swap!) and then try again with uniform(1, r.length - 1) and so
 on. It terminates in this way.

[...]

The problem is that swapping elements is not allowed for this
particular problem. If swapping was allowed this would be a much easier
problem to solve. :-)


T

-- 
Chance favours the prepared mind. -- Louis Pasteur
May 01 2017
Ivan Kazmenko <gassa mail.ru> writes:
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
 Given a set A of n elements (let's say it's a random-access 
 range of
 size n, where n is relatively large), and a predicate P(x) that
 specifies some subset of A of elements that we're interested 
 in, what's
 the best algorithm (in terms of big-O time complexity) for 
 selecting a
 random element x satisfying P(x), such that elements that 
 satisfy P(x)
 have equal probability of being chosen? (Elements that do not 
 satisfy
 P(x) are disregarded.)
I'd like to note here that, if you make use of the same P(x) many times (instead of different predicates on each call), it makes sense to spend O(n) time and memory filtering by that predicate and storing the result, and then answer each query in O(1).
 3) Permutation walk:

 	auto r = ... /* elements of A */
 	foreach (i; iota(0, r.length).randomPermutation) {
 		if (P(r[i])) return r[i];
 	}
 	/* no elements satisfy P(x) */

    Advantages: if an element that satisfies P(x) is found 
 early, the
    loop will terminate before n iterations. This seems like the 
 best of
    both worlds of (1) and (2), except:

    Disadvantages: AFAIK, generating a random permutation of 
 indices from
    0 .. n requires at least O(n) time, so any advantage we may 
 have had
    seems to be negated.

 Is there an algorithm for *incrementally* generating a random 
 permutation of indices? If there is, we could use that in (3) 
 and thus achieve the best of both worlds between early 
 termination if an element satisfying P(x) is found, and 
 guaranteeing termination after n iterations if no elements 
 satisfying P(x) exists.
Yes, there is.

There are actually two variations of Fisher-Yates shuffle:
(https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle)

1.
    auto p = n.iota.array;
    foreach (pos; 0..n) {
        auto otherPos = uniform (0, pos + 1);
        swap (p[pos], p[otherPos]);
    }

When we look at this after k-th iteration, the first k elements are
randomly and uniformly permuted, and the rest (n-k) are left untouched.

2.
    auto p = n.iota.array;
    foreach (pos; 0..n) {
        auto otherPos = uniform (pos, n);
        swap (p[pos], p[otherPos]);
    }

When we look at this after k-th iteration, the first k elements are a
random combination of all n elements, and this combination is randomly
and uniformly permuted.  So, the second variation is what we need: each
new element is randomly and uniformly selected from all the elements
left.  Once we get the first element satisfying the predicate, we can
just terminate the loop.  If there are m out of n elements satisfying
the predicate, the average number of steps is n/m.

Now, the problem is that both of these allocate n "size_t"-s of memory
to start with.  And your problem does not allow to shuffle the elements
of the original array in place, so we do need an external permutation
for these algorithms.  However, there are at least two ways to mitigate
that:

(I)
We can allocate the permutation once using n time and memory, and then,
on every call, just reuse it in its current state in n/m time.  It does
not matter if the permutation is not identity permutation: by symmetry
argument, any starting permutation will do just fine.

(II)
We can store the permutation p in an associative array instead of a
regular array, actually storing only the elements accessed at least
once, and assuming other elements to satisfy the identity p[x] = x.
So, if we finish in n/m steps on average, the time and extra memory
used will be O(n/m) too.  I can put together an example implementation
if this best satisfies your requirements.

Ivan Kazmenko.
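In the spirit of that offer, here is a minimal sketch of option (II)
(hedged: untested, and the struct name is made up; it runs variation 2
lazily, holding the sparse permutation p in a built-in associative
array, so time and extra memory stay proportional to the number of
positions actually drawn):

import std.random : uniform;

struct LazyPermutation
{
    size_t n;            // a random permutation of 0 .. n-1
    size_t pos;          // number of elements emitted so far
    size_t[size_t] p;    // sparse storage; an absent key x means p[x] == x

    this(size_t n) { this.n = n; }

    bool empty() const { return pos == n; }

    // Emit the next element, doing one Fisher-Yates swap lazily.
    size_t next()
    {
        assert(pos < n);
        auto otherPos = uniform(pos, n);
        auto a = p.get(pos, pos);             // value currently at pos
        auto b = p.get(otherPos, otherPos);   // value currently at otherPos
        p[otherPos] = a;                      // the swap, stored sparsely
        p[pos] = b;
        ++pos;
        return b;
    }
}

Against the original problem (r and P as in the opening post):

    auto perm = LazyPermutation(r.length);
    while (!perm.empty)
    {
        auto i = perm.next();
        if (P(r[i])) return r[i];   // expected O(n/m) time and memory
    }
    /* no element satisfies P(x) */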
May 01 2017
parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, May 01, 2017 at 02:38:09PM +0000, Ivan Kazmenko via Digitalmars-d wrote:
 On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
 Given a set A of n elements (let's say it's a random-access range of
 size n, where n is relatively large), and a predicate P(x) that
 specifies some subset of A of elements that we're interested in,
 what's the best algorithm (in terms of big-O time complexity) for
 selecting a random element x satisfying P(x), such that elements
 that satisfy P(x) have equal probability of being chosen? (Elements
 that do not satisfy P(x) are disregarded.)
I'd like to note here that, if you make use of the same P(x) many times (instead of different predicates on each call), it makes sense to spend O(n) time and memory filtering by that predicate and storing the result, and then answer each query in O(1).
Unfortunately, P(x) could be different each time (sorry, should have stated this explicitly), and A may also change in between calls to this random selection algorithm. So filtering would not be a good approach. [...]
 There are actually two variations of Fisher-Yates shuffle:
 (https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle)
 
 1.
     auto p = n.iota.array;
     foreach (pos; 0..n) {
         auto otherPos = uniform (0, pos + 1);
         swap (p[pos], p[otherPos]);
     }
 
 When we look at this after k-th iteration, the first k elements are
 randomly and uniformly permuted, and the rest (n-k) are left
 untouched.
 
 2.
     auto p = n.iota.array;
     foreach (pos; 0..n) {
         auto otherPos = uniform (pos, n);
         swap (p[pos], p[otherPos]);
     }
 
 When we look at this after k-th iteration, the first k elements are a
 random combination of all n elements, and this combination is randomly
 and uniformly permuted.  So, the second variation is what we need:
 each new element is randomly and uniformly selected from all the
 elements left.  Once we get the first element satisfying the
 predicate, we can just terminate the loop.  If there are m out of n
 elements satisfying the predicate, the average number of steps is n/m.
But wouldn't creating the array p already be O(n)? Albeit, somewhat faster than n iterations of the main loop since copying iota ought to be somewhat cheaper than generating a random number and swapping two elements.
 Now, the problem is that both of these allocate n "size_t"-s of memory
 to start with.  And your problem does not allow to shuffle the
 elements of the original array in place, so we do need an external
 permutation for these algorithms.  However, there are at least two
 ways to mitigate that:
 
 (I)
 We can allocate the permutation once using n time and memory, and
 then, on every call, just reuse it in its current state in n/m time.
 It does not matter if the permutation is not identity permutation: by
 symmetry argument, any starting permutation will do just fine.
I like this idea very much. This lets us only pay once for creating the index array, and reuse it almost "for free" thereafter. Still, the O(n) space requirement could potentially be onerous, since n is expected to be rather large.
 (II)
 We can store the permutation p in an associative array instead of a
 regular array, actually storing only the elements accessed at least
 once, and assuming other elements to satisfy the identity p[x] = x.
 So, if we finish in n/m steps on average, the time and extra memory
 used will be O(n/m) too.  I can put together an example implementation
 if this best satisfies your requirements.
[...]

This is interesting, and relaxes the O(n) space requirement initially.
After enough queries, though, I'd expect the AA to eventually grow to n
elements (as there is a chance some queries will end up finding no
elements that satisfy P(x) so it would generate the entire
permutation).

But I like your idea of reusing the index array. It's definitely a
possibility!


T

-- 
Your inconsistency is the only consistent thing about you! -- KD
May 01 2017
MysticZach <reachzach ggmail.com> writes:
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
 Given a set A of n elements (let's say it's a random-access 
 range of
 size n, where n is relatively large), and a predicate P(x) that
 specifies some subset of A of elements that we're interested 
 in, what's
 the best algorithm (in terms of big-O time complexity) for 
 selecting a
 random element x satisfying P(x), such that elements that 
 satisfy P(x)
 have equal probability of being chosen? (Elements that do not 
 satisfy
 P(x) are disregarded.)
Here's how I would do it:

// returns a random index of array satisfying P(x), -1 if not found
int randomlySatisfy(A[] array)
{
   if (array.length == 0)
      return -1;
   int i = uniform(0, array.length);
   return randomlySatisfyImpl(array, i);
}

// recursive function
private int randomlySatisfyImpl(A[] da, int i)
{
   if (P(da[i]))
      return i;
   if (da.length == 1)
      return -1;

   // choose a random partition proportionally
   auto j = uniform(da.length - 1);
   if (j < i) {
      // the lower partition
      int a = randomlySatisfyImpl(da[0..i], j);
      if (a != -1) return a;
      else return randomlySatisfyImpl(da[i+1 .. da.length], j - (i + 1));
   }
   else {
      // higher partition, investigate in reverse order
      int a = randomlySatisfyImpl(da[i+1 .. da.length], j - (i + 1));
      if (a != -1) return i + 1 + a;
      else return i + 1 + randomlySatisfyImpl(da[0..i], j);
   }
}

The goal is to have the first hit be the one you return. The method: if
a random pick doesn't satisfy, randomly choose the partition greater
than or less than based on uniform(0..array.length-1), and do the same
procedure on that partition, reusing the random index to avoid having
to call uniform twice on each recursion (one to choose a partition and
one to choose an index within that partition). If the probability of
choosing a partition is precisely proportional to the number of
elements in that partition, it seems to me that the result will be
truly random, but I'm not sure.
May 01 2017
MysticZach <reachzach ggmail.com> writes:
On Monday, 1 May 2017 at 16:56:58 UTC, MysticZach wrote:
    // choose a random partition proportionally
    auto j = uniform(da.length - 1);
    if (j < i) {
       // the lower partition
       int a = randomlySatisfyImpl(da[0..i], j);
       if (a != -1) return a;
       else return randomlySatisfyImpl(da[i+1 .. da.length], j - 
 (i + 1));
    }
    else {
       // higher partition, investigate in reverse order
       int a = randomlySatisfyImpl(da[i+1 .. da.length], j - (i 
 + 1));
       if (a != -1) return i +1 + a;
       else return i + 1 + randomlySatisfyImpl(da[0..i], j);
The line above has a bug. Replace it with:

   else {
      a = randomlySatisfyImpl(da[0..i], j);
      return (a == -1) ? -1 : i + 1 + a;
   }
    }
 }
But the idea's the same. Hopefully it's clear.
May 01 2017
MysticZach <reachzach ggmail.com> writes:
On Monday, 1 May 2017 at 16:56:58 UTC, MysticZach wrote:
 The goal is to have the first hit be the one you return. The 
 method: if a random pick doesn't satisfy, randomly choose the 
 partition greater than or less than based on 
 uniform(0..array.length-1), and do the same procedure on that 
 partition, reusing the random index to avoid having to call 
 uniform twice on each recursion (one to choose a partition and 
 one to choose an index within that partition). If the 
 probability of choosing a partition is precisely proportional 
 to the number of elements in that partition, it seems to me 
 that the result will be truly random, but I'm not sure.
Now I'm questioning this, because if the positive cases are unevenly distributed, i.e., [111111000100], the last one has about 50% chance to get picked instead of a 1 in 7 chance with my method. I guess you'd need to build a whole new array like the others are saying.
May 01 2017
Ivan Kazmenko <gassa mail.ru> writes:
On Monday, 1 May 2017 at 21:54:43 UTC, MysticZach wrote:
 On Monday, 1 May 2017 at 16:56:58 UTC, MysticZach wrote:
 The goal is to have the first hit be the one you return. The 
 method: if a random pick doesn't satisfy, randomly choose the 
 partition greater than or less than based on 
 uniform(0..array.length-1), and do the same procedure on that 
 partition, reusing the random index to avoid having to call 
 uniform twice on each recursion (one to choose a partition and 
 one to choose an index within that partition). If the 
 probability of choosing a partition is precisely proportional 
 to the number of elements in that partition, it seems to me 
 that the result will be truly random, but I'm not sure.
Now I'm questioning this, because if the positive cases are unevenly distributed, i.e., [111111000100], the last one has about 50% chance to get picked instead of a 1 in 7 chance with my method. I guess you'd need to build a whole new array like the others are saying.
A pity; it sounded nice!  But yeah, once we pick the right ~half, we
have to completely traverse it before paying any attention to the left
~half.

I hope some part of the idea is still salvageable.  For example, what
if we put the intervals in a queue instead of a stack?

Ivan Kazmenko.
May 02 2017
Ivan Kazmenko <gassa mail.ru> writes:
On Tuesday, 2 May 2017 at 10:35:46 UTC, Ivan Kazmenko wrote:
 I hope some part of the idea is still salvageable.
 For example, what if we put the intervals in a queue instead of 
 a stack?
I tried to implement a similar approach, but instead of a queue or a
stack, I used a random-access array of intervals.  Sadly, it is still
far from uniform, since it does not take interval lengths into account,
and I don't see how to do that without at least O(log n) time per
interval insertion or deletion.

Implementation and empiric frequencies for n=5 elements in a
permutation: http://ideone.com/3zSxLN

Ivan Kazmenko.
May 02 2017
MysticZach <reachzach ggmail.com> writes:
On Tuesday, 2 May 2017 at 11:27:17 UTC, Ivan Kazmenko wrote:
 On Tuesday, 2 May 2017 at 10:35:46 UTC, Ivan Kazmenko wrote:
 I hope some part of the idea is still salvageable.
 For example, what if we put the intervals in a queue instead 
 of a stack?
 I tried to implement a similar approach, but instead of a queue or a
 stack, I used a random-access array of intervals.  Sadly, it is still
 far from uniform, since it does not take interval lengths into
 account, and I don't see how to do that without at least O(log n)
 time per interval insertion or deletion.

 Implementation and empiric frequencies for n=5 elements in a
 permutation: http://ideone.com/3zSxLN

 Ivan Kazmenko.
Well, I thought of another idea that may not be technically random, but
which I imagine would get pretty close in real world use cases:

1. Test a random element in the array. If it satisfies P, return it.

2. Choose a hopper interval, h, that is relatively prime to the number
of elements in the array, n. You could do this by randomly selecting
from a pre-made list of primes of various orders of magnitude larger
than n, with h = the prime % n.

3. Hop along the array, testing each element as you go. Increment a
counter. If you reach the end of the array, cycle back to the beginning
starting with the remainder of h that you didn't use. I think that h
being relatively prime means it will thereby hit every element in the
array.

4. Return the first hit. If the counter reaches n, return failure.

The way I see it, with a random start position, and a *mostly* random
interval, the only way to defeat the randomness of the result would be
if the elements that satisfy P were dispersed precisely according to
the random interval specified for that particular invocation of the
function -- in which case the first in the dispersed set would be
chosen more often. This would require a rare convergence depending on
both n and h, and would not repeat from one call to the next.
May 02 2017
MysticZach <reachzach ggmail.com> writes:
On Tuesday, 2 May 2017 at 13:44:03 UTC, MysticZach wrote:
 1. Test a random element in the array. If it satisfies P, 
 return it.
 2. Choose a hopper interval, h, that is relatively prime to the 
 number of elements in the array, n. You could do this by 
 randomly selecting from a pre-made list of primes of various 
 orders of magnitude larger than n, with h = the prime % n.
 3. Hop along the array, testing each element as you go. 
 Increment a counter. If you reach the end of the array, cycle 
 back to the beginning starting with the remainder of h that you 
 didn't use. I think that h being relatively prime means it will 
 thereby hit every element in the array.
 4. Return the first hit. If the counter reaches n, return 
 failure.
Taking this a step further, it occurred to me that you could use *any*
hopping interval from 1 to array.length, if the length of the array
were prime. So artificially extend the array and just skip a jump when
you land in the extended area. And since you'd skip lots of jumps if
the extension were any bigger than the hopping interval, reduce its
length to modulo the hopping interval.

// returns a random index of array satisfying P(x), -1 if not found
int randomlySatisfy(A[] array)
{
   auto length = array.length;
   if (!length)
      return -1;
   auto i = uniform(0, length);
   auto hop = uniform(1, length);
   // greatest prime < 2^^31, for simplicity
   enum prime = 2147483477;
   assert (length <= prime);
   auto skipLength = ((prime - length) % hop) + length;
   auto count = 0;
   for (;;) {
      if (P(a[i]))
         return i;
      ++count;
      if (count == length)
         return -1;
      i += hop;
      if (i < length)
         continue;
      if (i < skipLength)
         i += hop;
      i -= skipLength;
   }
   return -1;
}

This solution is stupidly simple and I haven't tested it. But I believe
it's truly random, since the hopping interval is arbitrary. Who knows,
it might work.
May 02 2017
Timon Gehr <timon.gehr gmx.ch> writes:
On 02.05.2017 22:42, MysticZach wrote:
 On Tuesday, 2 May 2017 at 13:44:03 UTC, MysticZach wrote:
 1. Test a random element in the array. If it satisfies P, return it.
 2. Choose a hopper interval, h, that is relatively prime to the number
 of elements in the array, n. You could do this by randomly selecting
 from a pre-made list of primes of various orders of magnitude larger
 than n, with h = the prime % n.
 3. Hop along the array, testing each element as you go. Increment a
 counter. If you reach the end of the array, cycle back to the
 beginning starting with the remainder of h that you didn't use. I
 think that h being relatively prime means it will thereby hit every
 element in the array.
 4. Return the first hit. If the counter reaches n, return failure.
 Taking this a step further, it occurred to me that you could use *any*
 hopping interval from 1 to array.length, if the length of the array
 were prime. So artificially extend the array and just skip a jump when
 you land in the extended area. And since you'd skip lots of jumps if
 the extension were any bigger than the hopping interval, reduce its
 length to modulo the hopping interval.

 // returns a random index of array satisfying P(x), -1 if not found
 int randomlySatisfy(A[] array)
 {
    auto length = array.length;
    if (!length)
       return -1;
    auto i = uniform(0, length);
    auto hop = uniform(1, length);
    // greatest prime < 2^^31, for simplicity
    enum prime = 2147483477;
    assert (length <= prime);
    auto skipLength = ((prime - length) % hop) + length;
    auto count = 0;
    for (;;) {
       if (P(a[i]))
          return i;
       ++count;
       if (count == length)
          return -1;
       i += hop;
       if (i < length)
          continue;
       if (i < skipLength)
          i += hop;
(I guess this should be 'while'.)
       i -= skipLength;
    }
    return -1;
 }

 This solution is stupidly simple and I haven't tested it. But I believe
 it's truly random, since the hopping interval is arbitrary. Who knows,
 it might work.
Counterexample: [1,1,1,0,0]

Your algorithm returns 0 with probability 7/20, 1 with probability 6/20
and 2 with probability 7/20.

Note that there is a simple reason why your algorithm cannot work for
this case: it picks one of 20 numbers at random. The resulting
probability mass cannot be evenly distributed among the three elements,
because 20 is not divisible by 3.
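The 7/20, 6/20, 7/20 figures can be reproduced by brute force over all
20 equally likely (start, hop) pairs (a small check program, not from
the thread; it uses the fact that for length 5 the skipLength
expression equals the length for every hop, so the walk reduces to
i -> (i + hop) % length):

import std.stdio : writeln;

void main()
{
    immutable int[] a = [1, 1, 1, 0, 0];   // the counterexample above
    immutable n = cast(int) a.length;
    auto counts = new int[n];

    foreach (start; 0 .. n)        // every value of uniform(0, length)
        foreach (hop; 1 .. n)      // every value of uniform(1, length)
        {
            int i = start;
            foreach (step; 0 .. n)
            {
                if (a[i]) { ++counts[i]; break; }
                i = (i + hop) % n;
            }
        }

    writeln(counts);   // prints [7, 6, 7, 0, 0] across the 20 pairs
}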
May 02 2017
MysticZach <reachzach ggmail.com> writes:
On Tuesday, 2 May 2017 at 21:00:36 UTC, Timon Gehr wrote:
 On 02.05.2017 22:42, MysticZach wrote:
 On Tuesday, 2 May 2017 at 13:44:03 UTC, MysticZach wrote:
    for (;;) {
       if (P(a[i]))
          return i;
       ++count;
       if (count == length)
          return -1;
       i += hop;
       if (i < length)
          continue;
       if (i < skipLength)
          i += hop;
(I guess this should be 'while'.)
skipLength is determined modulo hop, thus it can't be more than one hop away.
 Counterexample: [1,1,1,0,0]

 Your algorithm returns 0 with probability 7/20, 1 with 
 probability 6/20 and 2 with probability 7/20.

 Note that there is a simple reason why your algorithm cannot 
 work for this case: it picks one of 20 numbers at random. The 
 resulting probability mass cannot be evenly distributed among 
 the three elements, because 20 is not divisible by 3.
That's true. Two points, though:

1. If the range of error is within 1/(n*(n-1)), with array length n, it
may be close enough for a given real world use case.

2. n is known to be large, which makes 1/(n*(n-1)) even smaller.

You'd probably have to trade accuracy for performance. I wonder how
much less performant a truly accurate algorithm would be. Also, whether
there's a way to make an algorithm of this style completely accurate.
May 02 2017
MysticZach <reachzach ggmail.com> writes:
On Tuesday, 2 May 2017 at 23:09:54 UTC, MysticZach wrote:
 On Tuesday, 2 May 2017 at 21:00:36 UTC, Timon Gehr wrote:
 On 02.05.2017 22:42, MysticZach wrote:
 On Tuesday, 2 May 2017 at 13:44:03 UTC, MysticZach wrote:
    for (;;) {
       if (P(a[i]))
          return i;
       ++count;
       if (count == length)
          return -1;
       i += hop;
       if (i < length)
          continue;
       if (i < skipLength)
          i += hop;
(I guess this should be 'while'.)
skipLength is determined modulo hop, thus it can't be more than one hop away.
Actually, I did botch the implementation. The hopping interval must be
based on the prime rather than on n. But you could still short-circuit
the while loop, I think:

   if (prime - n > hop) {
      skipLength = n + ((prime - n) % hop);
   }
May 03 2017
Timon Gehr <timon.gehr gmx.ch> writes:
On 03.05.2017 01:09, MysticZach wrote:
 Counterexample: [1,1,1,0,0]

 Your algorithm returns 0 with probability 7/20, 1 with probability
 6/20 and 2 with probability 7/20.

 Note that there is a simple reason why your algorithm cannot work for
 this case: it picks one of 20 numbers at random. The resulting
 probability mass cannot be evenly distributed among the three
 elements, because 20 is not divisible by 3.
That's true. Two points, though: If the range of error is within 1/(n*(n-1)), with array length n,
It's not though. For example, [1,1,1,0,...,0] (length 29), you get 0 and 2 each with probability 43/116, but 1 only with probability 30/116. It might be interesting to figure out how far from uniform the distribution can get.
 it may be close enough for a given
 real world use case. 2. n is known to be large, which makes 1/(n*(n-1))
 even smaller.  You'd probably have to trade accuracy for performance. I
 wonder how much less performant a truly accurate algorithm would be.
 Also, whether there's a way to make an algorithm of this style
 completely accurate.
For an accurate algorithm based on (unconditional) uniformly random sampling, the number of outcomes from sampling needs to be divisible by all numbers from 1 to n. (Because each of those could conceivably be the number of elements satisfying the predicate.)
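To make that divisibility requirement concrete (standard number theory,
not stated in the post): the size of the uniform sample space must be a
multiple of lcm(1, 2, ..., n), and by the prime number theorem

    lcm(1, 2, ..., n) = e^((1 + o(1)) n),

so any scheme that draws from a sample space of fixed polynomial size
cannot be exactly uniform for every possible count of satisfying
elements.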
May 04 2017
MysticZach <reachzach ggmail.com> writes:
On Thursday, 4 May 2017 at 08:04:22 UTC, Timon Gehr wrote:
 On 03.05.2017 01:09, MysticZach wrote:
 Counterexample: [1,1,1,0,0]

 Your algorithm returns 0 with probability 7/20, 1 with 
 probability
 6/20 and 2 with probability 7/20.

 Note that there is a simple reason why your algorithm cannot 
 work for
 this case: it picks one of 20 numbers at random. The resulting
 probability mass cannot be evenly distributed among the three
 elements, because 20 is not divisible by 3.
That's true. Two points, though: If the range of error is within 1/(n*(n-1)), with array length n,
It's not though. For example, [1,1,1,0,...,0] (length 29), you get 0 and 2 each with probability 43/116, but 1 only with probability 30/116. It might be interesting to figure out how far from uniform the distribution can get.
Or how close it can get, depending on the range of intervals used. My math skill is shaky here. Maybe there's no way to deterministically jump to every element of an array with equal probability of hitting any given element satisfying a given predicate first. It sure would be cool if there were.
May 04 2017
MysticZach <reachzach ggmail.com> writes:
On Thursday, 4 May 2017 at 13:19:43 UTC, MysticZach wrote:
 On Thursday, 4 May 2017 at 08:04:22 UTC, Timon Gehr wrote:
 On 03.05.2017 01:09, MysticZach wrote:
 That's true. Two points, though: If the range of error is 
 within
 1/(n*(n-1)), with array length n,
It's not though. For example, [1,1,1,0,...,0] (length 29), you get 0 and 2 each with probability 43/116, but 1 only with probability 30/116. It might be interesting to figure out how far from uniform the distribution can get.
Or how close it can get, depending on the range of intervals used. My math skill is shaky here. Maybe there's no way to deterministically jump to every element of an array with equal probability of hitting any given element satisfying a given predicate first. It sure would be cool if there were.
Within a small range of error, I mean.
May 04 2017
prev sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Thu, May 04, 2017 at 01:19:43PM +0000, MysticZach via Digitalmars-d wrote:
[...]
 Maybe there's no way to deterministically jump to every element of an
 array with equal probability of hitting any given element satisfying a
 given predicate first. It sure would be cool if there were.
Of course there is.  The random permutation approach works.  In this
case, the data itself is not allowed to be permuted, but equivalent
semantics can be obtained by permuting an array of indices into the
data.

But the challenge is how efficiently you can generate a random
permutation so that the cost of generating one doesn't outweigh the
O(n) linear scan algorithm.  I'm surprised there are no (known)
incremental algorithms for generating a random permutation of 0..n that
require less than O(n) space.


T

-- 
Life is complex. It consists of real and imaginary parts. -- YHL
May 12 2017
Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Friday, 12 May 2017 at 18:43:53 UTC, H. S. Teoh wrote:
 I'm surprised there are no (known) incremental algorithms for
 generating a random permutation of 0..n that require less than O(n)
 space.
I've told you all you need to know...
May 12 2017