www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Mir Random and Dlang Ranges [Example]

reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
The discussion [2] about Mir Random [1] and Range API become a 
holy war [2]. I am putting the following example here. It also 
can be found at GitHub[1].

------
import std.range, std.stdio;

import random;
import random.variable: NormalVariable;

// Engines are allocated on stack or global

auto rng = Random(unpredictableSeed);

// Engines are passed by reference to algorithms

auto sample = rng

     // Random variables are passed by value

     .randomRange(NormalVariable!double(0, 1))

      // Fix sample length equal to 1000 elements (Input Range API)

     .take(1000)

     // Allocates memory and performs computation

     .array;

writeln(sample);
------

If you still think that Dlang Random API should be a Range API 
_by default_ instead of opCall, please submit a solid Range API 
solution, that will be fine for the real world applications such 
as Normal Variance Mean Mixtures [3], be  nogc  safe and be 
without data duplication. Only predefined ranges should be used 
for NormalVariable and the second distribution. It is an API 
incapsulation problem that have a nice solution using opCall 
because RandomVariables are tuples of params with opCall(Engine) 
method.

[1] https://github.com/libmir/mir-random
[2] 
http://forum.dlang.org/thread/zlhuomzishrqrjjpxfho forum.dlang.org
[3] https://en.wikipedia.org/wiki/Normal_variance-mean_mixture

Best regards,
Ilya
Nov 25 2016
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/25/16 9:27 AM, Ilya Yaroshenko wrote:
 The discussion [2] about Mir Random [1] and Range API become a holy war [2].
I thought I agreed that a noncopyable struct with opCall is fine in conjunction with a range API adapter that uses a pointer. -- Andrei
Nov 25 2016
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Friday, 25 November 2016 at 15:04:24 UTC, Andrei Alexandrescu 
wrote:
 I thought I agreed that a noncopyable struct with opCall is 
 fine in conjunction with a range API adapter that uses a 
 pointer. -- Andrei
FWIW, I suspect that Ilya means simply that it became long and convoluted, not that anyone is taking a religious position on particular issues.
Nov 25 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Friday, 25 November 2016 at 15:22:36 UTC, Joseph Rushton 
Wakeling wrote:
 On Friday, 25 November 2016 at 15:04:24 UTC, Andrei 
 Alexandrescu wrote:
 I thought I agreed that a noncopyable struct with opCall is 
 fine in conjunction with a range API adapter that uses a 
 pointer. -- Andrei
FWIW, I suspect that Ilya means simply that it became long and convoluted, not that anyone is taking a religious position on particular issues.
Yes, thanks
Nov 25 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/25/16 10:30 AM, Ilya Yaroshenko wrote:
 On Friday, 25 November 2016 at 15:22:36 UTC, Joseph Rushton Wakeling wrote:
 On Friday, 25 November 2016 at 15:04:24 UTC, Andrei Alexandrescu wrote:
 I thought I agreed that a noncopyable struct with opCall is fine in
 conjunction with a range API adapter that uses a pointer. -- Andrei
FWIW, I suspect that Ilya means simply that it became long and convoluted, not that anyone is taking a religious position on particular issues.
Yes, thanks
I'd say we can bury the hatchet and move on with Ilya's API. -- Andrei
Nov 25 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Friday, 25 November 2016 at 16:04:07 UTC, Andrei Alexandrescu 
wrote:
 On 11/25/16 10:30 AM, Ilya Yaroshenko wrote:
 On Friday, 25 November 2016 at 15:22:36 UTC, Joseph Rushton 
 Wakeling wrote:
 On Friday, 25 November 2016 at 15:04:24 UTC, Andrei 
 Alexandrescu wrote:
 I thought I agreed that a noncopyable struct with opCall is 
 fine in
 conjunction with a range API adapter that uses a pointer. -- 
 Andrei
FWIW, I suspect that Ilya means simply that it became long and convoluted, not that anyone is taking a religious position on particular issues.
Yes, thanks
I'd say we can bury the hatchet and move on with Ilya's API. -- Andrei
Yes, this was clear. There are also others who may disagree. This is the reason for this thread. --Ilya
Nov 25 2016
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Friday, 25 November 2016 at 16:08:15 UTC, Ilya Yaroshenko 
wrote:
 On Friday, 25 November 2016 at 16:04:07 UTC, Andrei 
 Alexandrescu wrote:
 I'd say we can bury the hatchet and move on with Ilya's API. 
 -- Andrei
Yes, this was clear. There are also others who may disagree. This is the reason for this thread. --Ilya
Without wanting to re-open the debate now, it seems to me that it's worth distinguishing carefully between: * the changes necessary to support Ilya's goals in terms of generating really high quality random variates (where I'm not sure the switch from ranges to functors really matters that much, except as a preference), and * the changes necessary to avoid the known issues in terms of accidental copying-by-value of RNG state (where the functor-based proposal is interesting, but not necessarily the best or only solution). On the second point, I would suggest maintaining an open mind both towards Ilya's proposal but also to the possibility that there might be better alternatives. (Minor remark: the copying-by-value of RNG state is not limited to random engines [i.e. pseudo-random algorithms]; it also applies to e.g. a random device which caches a buffer of bits read from a 'true' random source.)
Nov 25 2016
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Friday, 25 November 2016 at 14:27:13 UTC, Ilya Yaroshenko 
wrote:
 The discussion [2] about Mir Random [1] and Range API become a 
 holy war [2]. I am putting the following example here. It also 
 can be found at GitHub[1].
It's not a holy war, but your code example doesn't really address the key issues to do with the interaction of random number generation and ranges, which are not really much about how you create a reference-type range for RNGs themselves. (Your solution is one of several.) What would convince me of the merits of your approach is an implementation of `RandomSample` (as in, the range that extracts a subset of n elements out of the first N elements of an input range, with equal probability of selecting any particular element) that deals with the following challenges: * its own internal state (or at least, those parts of it that correspond to state underlying the pseudo-random process, which is NOT limited to the RNG state) is not at risk of being copied by value; * it's possible to generate multiple `RandomSample` instances deep in the inner loop of a program, without creating slow-down related to memory allocation (this was the reason I abandoned a class-based approach); * it's easy to fit the `RandomSample` implementation into a range chain. The last point doesn't have to be quite as simple as the sort of thing you can do currently, such as: iota(100).randomSample(10, rng).filter!(a => a > 50).writeln; Ideally it would be possible to create a function that creates that kind of range chain and returns it, as in: auto myMagicRange(RNG)(size_t n, size_t m, ref RNG rng) { assert(n < m); return iota(m).randomSample(n, rng).filter!(a => a > n/2); } // Now output 10,000 unique filtered samples ... foreach (_; 0 .. 10_000) { myMagicRange(10, 100, rng).writeln; }
Nov 25 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Friday, 25 November 2016 at 15:09:46 UTC, Joseph Rushton 
Wakeling wrote:
 * its own internal state (or at least, those parts of it that 
 correspond to state underlying the pseudo-random process, which 
 is NOT limited to the RNG state) is not at risk of being copied 
 by value;
What kind of state should not be copied by value? I thought it is only an Engine. Engines can not be copied by value in Mir Random. If you mean counters, than it is not correct for Dlang. Almost all D counters like ranges are copied by value. A `ref` for randomSample should be used to functions if thay want to modify its counters (the same logic exists in std.experimental.primitives like popFront ).
 * it's possible to generate multiple `RandomSample` instances 
 deep in the inner loop of a program, without creating slow-down 
 related to memory allocation (this was the reason I abandoned a 
 class-based approach);
No problem, the same approach like for RandomRange should work.
 * it's easy to fit the `RandomSample` implementation into a 
 range chain.
No problem, ditto
 The last point doesn't have to be quite as simple as the sort 
 of thing you can do currently, such as:

     iota(100).randomSample(10, rng).filter!(a => a > 
 50).writeln;
Why not? rng can be passed by ref and and its pointer can be preserved internally. So the same syntax will work in Mir Random.
 Ideally it would be possible to create a function that creates 
 that kind of range chain and returns it, as in:

     auto myMagicRange(RNG)(size_t n, size_t m, ref RNG rng)
     {
         assert(n < m);

         return iota(m).randomSample(n, rng).filter!(a => a > 
 n/2);
     }

     // Now output 10,000 unique filtered samples ...
     foreach (_; 0 .. 10_000)
     {
         myMagicRange(10, 100, rng).writeln;
     }
This works for RandomRange and can be done for RandomSample the same way. BTW, I am looking for your RandomSample PR! :-) range.algorithm [1] seems to be a proper module for it (it is small for now, so your name should be the first here). [1] https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
Nov 25 2016
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Friday, 25 November 2016 at 15:56:45 UTC, Ilya Yaroshenko 
wrote:
 What kind of state should not be copied by value? I thought it 
 is only an Engine.
Unfortunately that's not true. The sampling algorithm pulls some tricks to try to reduce the number of calls to the random number generator (as well as to avoid calls to some more expensive calculations to transform those variates into what is wanted). The variable in question is here: https://github.com/dlang/phobos/blob/f6b4e89d64eb998b80a70f23426e0e509e1933d8/std/random.d#L2413 And you can see here the places where regenerating it from the RNG may be avoided: https://github.com/dlang/phobos/blob/f6b4e89d64eb998b80a70f23426e0e509e1933d8/std/random.d#L2699-L2702 https://github.com/dlang/phobos/blob/f6b4e89d64eb998b80a70f23426e0e509e1933d8/std/random.d#L2763 Essentially the sampling algorithm carries out its own little internal pseudo-random process which falls back to using the RNG when necessary. Now, in this specific case, it might be possible to avoid that. The paper where this algorithm is introduced is more than 30 years old and it's possible that optimization is no longer very relevant on modern hardware with modern RNGs, so it might be fine to just generate the value concerned from the RNG with every popFront(). However, that would need to be benchmarked; and more generally, it doesn't seem to me to be reasonable to assume that other random algorithms might not have similar requirements to maintain some state associated with an internal pseudo-random process.
 Engines can not be copied by value in Mir Random. If you mean 
 counters, than it is not correct for Dlang. Almost all D 
 counters like ranges are copied by value. A `ref` for 
 randomSample should be used to functions if thay want to modify 
 its counters (the same logic exists in 
 std.experimental.primitives like popFront ).
All of the "No problem" in your remarks comes out of the assumption that it's just the RNG whose state must never be copied by value. But even if we can manage to get rid of the problematic state from this particular algorithm, I don't think that's an assumption that can necessarily be generalized to all random algorithms, which may do all sorts of clever state-based tricks to optimize their performance. Can that kind of state also be wrapped up inside a struct that can't be copied by value? Possibly, but then that makes it rather more tricky to use RandomSample in a friendly way inside extended range chains as described in my previous post. Stepping back from the specifics: one of my gripes, not with you personally, but with the general discussion around this issue is that people almost always seem to assume it's all about the RNGs, and that if you stop them from being copied by value, then everything else will fall out just fine. It's very rare that people ever give consideration to the random algorithms except as an afterthought. That means that the solutions proposed very often seem nice and promising when looked at in terms of RNGs, and sometimes also in terms of random distributions, but fall flat the moment random algorithms come into play. I've had a lot of conversations with different people around this issue, and I've tried out a bunch of different ideas, including functor RNGs which would be wrapped with a range API accessing them via pointer (a suggestion Dicebot among others made to me). In every case, I've run into difficulty when moving from RNGs and distributions to random algorithms like RandomCover and RandomSample. Now, that's very possibly just my failure -- it would be great if you could identify a solution where I could not. But the reason I described the RandomSample-related problem above was not to get assurances that it would not be a problem. It was meant to encourage you to look in detail yourself at this problem, to experience its nuances for yourself. It may not be as unproblematic as you think.
 BTW, I am looking for your RandomSample PR! :-)
Well, I would love to give you one, but in this case I do think it may be worth your examining the algorithm and its implementation in more detail yourself. Please understand, I'm not trying to be obstructive here. I genuinely think there are factors you have not considered in this case, that would be useful for you to look into in greater depth before you finalize your designs. BTW the original research papers on the algorithm are here, if you want to take a look: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.329.6400&rep=rep1&type=pdf http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.1689&rep=rep1&type=pdf What I will follow up with is to benchmark the sampling algorithm without the optimization trick. I'll also try to submit a PR for the xoroshiro and xorshift* RNGs, and post issues containing some of the feedback promised on other aspects of the design. You may have to be a little patient with me, though ... it's a busy time right now :-\
Nov 25 2016