www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 8247] New: Inconsistent behaviour of randomSample depending on whether a random number generator is specified

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8247

           Summary: Inconsistent behaviour of randomSample depending on
                    whether a random number generator is specified
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: joseph.wakeling webdrake.net


--- Comment #0 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net>
2012-06-14 12:27:53 PDT ---
Created an attachment (id=1116)
Working minimal example illustrating the inconsistencies described.

The randomSample function in std.random can be called with or without
specifying a random number generator to use.

If no RNG is specified, then each lazy evaluation of the sample evaluates
differently, i.e. if you do

   sample1 = randomSample(iota(0, 100), 5);
   writeln(sample1);
   writeln(sample1);
   writeln(sample1);

you will get 3 different samples.

Conversely, if a random number generator is specified, you will get 3 times the
same result:

   sample2 = randomSample(iota(0, 100), 5, Random(unpredictableSeed));
   writeln(sample2);
   writeln(sample2);
   writeln(sample2);

Note that the seeding of the RNG is important, because if an already-existing
RNG is provided to create multiple different samples, they will evaluate
identically, e.g.

   sample3 = randomSample(iota(0, 100), 5, rndGen);
   writeln(sample3);
   sample4 = randomSample(iota(0, 100), 5, rndGen);
   writeln(sample4);
   sample5 = randomSample(iota(0, 100), 5, rndGen);
   writeln(sample5);

... will produce the same output 3 times.  This happens because the RNG passed
to randomSample is copied rather than used by reference.

These inconsistencies lead to a lot of potential confusion and sources of bugs.
 So, first of all, we need a firm decision on how the lazy evaluation of
RandomSample should behave -- should it

  (1) always evaluate to the same sample, or

  (2) always evaluate to a different sample?

... and depending on the answer, we then need to address how to specify and
seed an RNG for RandomSample.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 14 2012
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8247



--- Comment #1 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net>
2012-06-14 12:35:44 PDT ---
Online discussion on this:
http://forum.dlang.org/thread/4FD735EB.70404 webdrake.net

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 14 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8247


jens.k.mueller gmx.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jens.k.mueller gmx.de


--- Comment #2 from jens.k.mueller gmx.de 2012-06-14 13:41:13 PDT ---
I opt for the returning the same sample (option 1). I want the sample to stay
the same.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 14 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8247


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg gmx.com


--- Comment #3 from Jonathan M Davis <jmdavisProg gmx.com> 2012-06-14 15:27:33
PDT ---
If you want randomSample to be consistent as to which you get, it needs to be
made to handle both reference and value type random number generating ranges
identically, since they could be either. At present, all of those in std.random
are value types, which is actually a problem in general. They really should
reference types. But regardless of which they are, there's nothing stopping
someone from implementing either a value type or reference type range which is
a random number generator, in which case you'll get inconsistent behavior if
randomSample doesn't code for using both by using save where appropriate.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 14 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8247



--- Comment #4 from jens.k.mueller gmx.de 2012-06-15 00:54:20 PDT ---
Now I see why you want to pass RNG by reference. Because you may want that two
functions share the same generator.
But then I would go with passing them all by reference for consistency reasons.
And all functions have as default argument rndGen() which could be renamed to
defaultRNG().
randomShuffle is already doing it this way. Though I don't see why it sets the
template argument RandomGen to Random by default. This should be inferred
automatically by the default argument rndGen() anyway.
So randomCover and randomSample should follow the same approach.

I do not see why one needs to pass a RNG by value then. Admittedly I have never
used std.random. So I may have wrong use cases in mind.
But having a thread local RNG that is used by default should be okay.

 Jonathan
Why should a RNG type have reference semantics? I think it's fine to pass them
by reference where needed.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 15 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8247



--- Comment #5 from Jonathan M Davis <jmdavisProg gmx.com> 2012-06-15 01:05:40
PDT ---
 Why should a RNG type have reference semantics? I think it's fine to pass them
by reference where needed. Because it makes no sense for it to have value semantics. Take this for example auto func(R)(R r) { r.popFront(); auto var1 = r.front; ///... } func(generator); generator.popFront(); auto var2 = generator.front; Both var1 and var2 will have the exact same value. This is an easy mistake to make, and since random number generators are supposed to be returning random numbers, having them return the _same_ number after popFront has been called is definitely problematic. By making them reference types, the only time that you get the same number multiple times in a row is when you do it on purpose (e.g. by storing the value of front or by calling save on the range). There's a discussion on this in issue# 7067. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jun 15 2012
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8247



--- Comment #6 from jens.k.mueller gmx.de 2012-06-15 01:22:31 PDT ---
(In reply to comment #5)
 Why should a RNG type have reference semantics? I think it's fine to pass them
by reference where needed. Because it makes no sense for it to have value semantics. Take this for example auto func(R)(R r) { r.popFront(); auto var1 = r.front; ///... } func(generator); generator.popFront(); auto var2 = generator.front; Both var1 and var2 will have the exact same value. This is an easy mistake to make, and since random number generators are supposed to be returning random numbers, having them return the _same_ number after popFront has been called is definitely problematic. By making them reference types, the only time that you get the same number multiple times in a row is when you do it on purpose (e.g. by storing the value of front or by calling save on the range). There's a discussion on this in issue# 7067.
I see. Thanks. Since passing around RNGs should be by default by reference RNGs should be reference types. Otherwise everybody writing own functions accepting RNGs has to use ref which is error-prone. Using ref when passing RNGs in std.random won't solve this general design issue. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jun 15 2012