www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - A simple solution for randomness copy problem

reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
Hello,

The problem is that a structure for a random algorithm or a 
random variable may holds its own precomputed random state, which 
is not correct to copy.

 From [1] by Joseph Rushton Wakeling:
 Essentially the sampling algorithm carries out its own little 
 internal pseudo-random process which falls back to using the 
 RNG when necessary.
 I've had a lot of conversations with different people around 
 this issue, and I've tried out a bunch of different ideas, 
 including functor RNGs which would be wrapped with a range API 
 accessing them via pointer (a suggestion Dicebot among others 
 made to me).  In every case, I've run into difficulty when 
 moving from RNGs and distributions to random algorithms like 
 RandomCover and RandomSample.
The solution is to add a `hot` flag that indicates that precomputed random values can be used and reset this flag in copy constructor. It works without performance issues for the Vitter's algorithm and Normal sampling (of course if you don't copy the struct for each call). Both Vitter's random sample algorithm and normal distribution needs this flag or analog anyway (at least in my implementations). Below are 2 reduced code samples. The first one is for Vitter's strides for random sample and the second one is for Normal distribution. The actual code for random sample (`sample`) can be found at [2], and for `NormalVairable` at 3. [1] http://forum.dlang.org/post/gpsilefdillwtleuwohl forum.dlang.org [2] https://github.com/libmir/mir-random/blob/master/source/mir/random/algorithm.d [3] https://github.com/libmir/mir-random/blob/master/source/mir/random/variable.d ----------- Vitter's Algorithm D --------- struct VitterStrides { private double vprime; // A future/pregenerated random value private bool hot; // A flag that indicates that vprime can be used this(this) { hot = false; } sizediff_t opCall(G)(ref G gen) { ... } } ---------- Normal Distribution ---------- RandomVariable struct NormalVariable(T) if (isFloatingPoint!T) { private T y = 0; private bool hot; this(this) { hot = false; } /// T opCall(G)(ref G gen) if (isSaturatedRandomEngine!G) { T x = void; if (hot) { hot = false; x = y; } else { // compute independent x, y at once } ... } }
Nov 29 2016
next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Tuesday, 29 November 2016 at 08:50:52 UTC, Ilya Yaroshenko 
wrote:
 The solution is to add a `hot` flag that indicates that 
 precomputed random values can be used and reset this flag in 
 copy constructor. It works without performance issues for the 
 Vitter's algorithm and Normal sampling (of course if you don't 
 copy the struct for each call).
Oh, nice. This feels like a bit of a "facepalm" moment as in, "How did no one think of that before?" :-P It could be a heavy performance hit for larger state variables (e.g. a Ziggurat), but on that note ...
 Both Vitter's random sample algorithm and normal distribution 
 needs this flag or analog anyway (at least in my 
 implementations).
I'm not sure that it's so important for distributions. Unlike the general case of random algorithms (which likely need to be created multiple times in inner program loops), distributions are much more likely to be created at a relatively high level and then _used_ multiple times in an inner loop. That means that in principle, distributions could be treated in the same way as RNGs, i.e. just ban copying by value and expect them to be passed around by ref or by pointer. Of course, including the `hot` flag could offer best of both worlds: if you pass around by ref then you avoid needing to recreate the internal state, if you pass around by value then the internal state is regenerated for statistical safety. I'll try this out in detail with the Phobos std.random stuff and see how it plays. Thank you _very_ much for focusing on this problem.
         if (isSaturatedRandomEngine!G)
Question on your terminology here: while saturated makes sense, is it really your intention to restrict things to random _engines_ (i.e. pseudo-random algorithms)? Surely it should also be possible for this functionality to work with random devices?
Nov 30 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 30 November 2016 at 11:29:25 UTC, Joseph Rushton 
Wakeling wrote:
 On Tuesday, 29 November 2016 at 08:50:52 UTC, Ilya Yaroshenko
         if (isSaturatedRandomEngine!G)
Question on your terminology here: while saturated makes sense, is it really your intention to restrict things to random _engines_ (i.e. pseudo-random algorithms)? Surely it should also be possible for this functionality to work with random devices?
Of course they can work with random devices. It looks strange to me to have explicit API difference between engines and devices. A random devices can be marked as random engines or we can add a simple generic adaptor (Device->Engine) for them. --Ilya I will be happy to se your PR for random devices to mir-random. Thanks, Ilya
Nov 30 2016
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 30 November 2016 at 12:28:28 UTC, Ilya Yaroshenko 
wrote:
 Of course they can work with random devices. It looks strange 
 to me to have explicit  API difference between engines and 
 devices. A random devices can be marked as random engines or we 
 can add a simple generic adaptor (Device->Engine) for them.
The terminology here (from the C++11 <random> standard) is: * generator == source of uniformly-distributed random bits * engine == pseudo-random generator (i.e. seedable sequences) * device == 'truly random' generator
 I will be happy to se your PR for random devices to mir-random.
I'll see what I can do.
Nov 30 2016
prev sibling parent Dicebot <public dicebot.lv> writes:
One minor nitpick - please avoid calling postblit constructor a 
"copy constructor", it tends to mislead developers with C++ 
origins into expecting copy constructor they are used to - and 
disappointment when it proves to not be the case.
Nov 30 2016