digitalmars.D.bugs - [Issue 7067] New: std.random.RandomSample and RandomCover are poorly designed
- d-bugmail puremagic.com (30/30) Dec 05 2011 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (10/10) Dec 05 2011 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (15/29) Dec 05 2011 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (7/17) Dec 05 2011 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (6/7) Dec 06 2011 We have backward compatibility to worry about.
- d-bugmail puremagic.com (22/22) Dec 06 2011 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (8/10) Dec 06 2011 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (14/17) Dec 07 2011 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (12/12) Dec 08 2011 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (14/14) Jun 15 2012 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (21/51) Jun 19 2012 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (14/30) Jun 19 2012 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (7/11) Sep 27 2012 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (34/42) Sep 27 2012 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (50/50) Jun 20 2013 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (9/10) Jun 20 2013 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (16/19) Jun 20 2013 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (8/14) Aug 21 2013 http://d.puremagic.com/issues/show_bug.cgi?id=7067
- d-bugmail puremagic.com (17/17) Aug 29 2013 http://d.puremagic.com/issues/show_bug.cgi?id=7067
http://d.puremagic.com/issues/show_bug.cgi?id=7067 Summary: std.random.RandomSample and RandomCover are poorly designed Product: D Version: D2 Platform: Other OS/Version: Windows Status: NEW Severity: normal Priority: P2 Component: Phobos AssignedTo: nobody puremagic.com ReportedBy: thecybershadow gmail.com CC: andrei metalanguage.com --- Comment #0 from Vladimir Panteleev <thecybershadow gmail.com> 2011-12-05 04:00:22 PST --- The following tests will always fail: int[] a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8 ]; assert(!equal(randomCover(a, rndGen()), randomCover(a, rndGen()))); assert(!equal(randomSample(a, 5, rndGen()), randomSample(a, 5, rndGen()))); The reason why these tests will fail is that both functions take the RNG by value. Not only is this unintuitive, this is also a security liability - someone depending on these functions to generate random identifiers can be surprised that two successive calls generate the same ID. I strongly suggest that RNG types are disallowed from being implicitly copied. Even though pseudo-random number generators shouldn't be used for security purposes, they're still likely to be used in such contexts - especially considering lack of better sources of random data in Phobos. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 05 2011
http://d.puremagic.com/issues/show_bug.cgi?id=7067 bearophile_hugs eml.cc changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |bearophile_hugs eml.cc --- Comment #1 from bearophile_hugs eml.cc 2011-12-05 04:09:15 PST --- See also issue 6593 -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 05 2011
http://d.puremagic.com/issues/show_bug.cgi?id=7067 --- Comment #2 from Andrei Alexandrescu <andrei metalanguage.com> 2011-12-05 21:28:32 PST --- (In reply to comment #0)The following tests will always fail: int[] a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8 ]; assert(!equal(randomCover(a, rndGen()), randomCover(a, rndGen()))); assert(!equal(randomSample(a, 5, rndGen()), randomSample(a, 5, rndGen())));Good point.The reason why these tests will fail is that both functions take the RNG by value. Not only is this unintuitive, this is also a security liability - someone depending on these functions to generate random identifiers can be surprised that two successive calls generate the same ID.I think we can safely eliminate this argument from the discussion.I strongly suggest that RNG types are disallowed from being implicitly copied. Even though pseudo-random number generators shouldn't be used for security purposes, they're still likely to be used in such contexts - especially considering lack of better sources of random data in Phobos.The problem with taking a random generator by reference is that it then needs to be escaped. So people would be quite surprised to see that: auto gen = rndGen; return randomSample(a, 5, gen); has undefined behavior. One way or another we need to solve this, e.g. by creating a wrapper with reference semantics over generators. Ideas are welcome. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 05 2011
http://d.puremagic.com/issues/show_bug.cgi?id=7067 --- Comment #3 from bearophile_hugs eml.cc 2011-12-05 23:53:48 PST --- (In reply to comment #2)The problem with taking a random generator by reference is that it then needs to be escaped. So people would be quite surprised to see that: auto gen = rndGen; return randomSample(a, 5, gen); has undefined behavior. One way or another we need to solve this, e.g. by creating a wrapper with reference semantics over generators. Ideas are welcome.Turn random generators into final classes? -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 05 2011
http://d.puremagic.com/issues/show_bug.cgi?id=7067 --- Comment #4 from Andrei Alexandrescu <andrei metalanguage.com> 2011-12-06 07:53:28 PST ---Turn random generators into final classes?We have backward compatibility to worry about. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 06 2011
http://d.puremagic.com/issues/show_bug.cgi?id=7067 --- Comment #5 from Vladimir Panteleev <thecybershadow gmail.com> 2011-12-06 08:04:49 PST --- The disadvantages of breaking backwards compatibility need to be considered on a case-by-case basis. I think that turning RNGs into reference types has the potential to be a relatively low-impact change, while also having a good chance of revealing broken code. The typical usage of std.random does not involve using the RNG types directly, and rather using the implicit thread-local RNG. An example of affected code: Random r; // ... use r I think that generally allowing such code is a mistake, because it's not clear that the RNG is not seeded. auto r = Random(42); We can implement this for backwards compatibility using static opCall (which shall be scheduled for deprecation). The biggest problem is intentional usage of value semantics (it would transparently turn into reference semantics). Perhaps there's something we could do with the help of opAssign? -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 06 2011
http://d.puremagic.com/issues/show_bug.cgi?id=7067 --- Comment #6 from bearophile_hugs eml.cc 2011-12-06 09:49:20 PST --- (In reply to comment #5)The biggest problem is intentional usage of value semantics (it would transparently turn into reference semantics).I suggest to ignore such cases, they are probably uncommon, and add a warning note to the changelog. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 06 2011
http://d.puremagic.com/issues/show_bug.cgi?id=7067 Alex Rønne Petersen <xtzgzorex gmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |xtzgzorex gmail.com --- Comment #7 from Alex Rønne Petersen <xtzgzorex gmail.com> 2011-12-07 04:21:37 PST --- (In reply to comment #4)Sometimes breaking changes can be a good thing. It seems to me like with the current design, it's more likely that people are Doing It Wrong than the opposite. Keeping backwards compatibility also means allowing people to continue doing the same mistakes. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------Turn random generators into final classes?We have backward compatibility to worry about.
Dec 07 2011
http://d.puremagic.com/issues/show_bug.cgi?id=7067 Bernard Helyer <blood.of.life gmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |blood.of.life gmail.com --- Comment #8 from Bernard Helyer <blood.of.life gmail.com> 2011-12-08 00:02:29 PST --- To add an anecdote to this, when I first tried passing Random instances around, I was surprised to find they weren't reference types; it wasn't obvious to me at all. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 08 2011
http://d.puremagic.com/issues/show_bug.cgi?id=7067 Jonathan M Davis <jmdavisProg gmx.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |jmdavisProg gmx.com --- Comment #9 from Jonathan M Davis <jmdavisProg gmx.com> 2012-06-15 01:02:22 PDT --- We can make the random number generator ranges reference types without breaking code simply by creating inner structs for them which hold their member variables and sit on the heap. The only code which would break because of this would be code which relied on them being value types, which (as discussed) is almost certainly a bug. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jun 15 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7067 jens.k.mueller gmx.de changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |jens.k.mueller gmx.de --- Comment #10 from jens.k.mueller gmx.de 2012-06-19 07:30:43 PDT --- (In reply to comment #2)(In reply to comment #0)Code like this is always incorrect, i.e. the problem is more general. I wonder whether a compiler can always flag those programs as invalid. Is it possible to solve the problem of escaping references to local variables in general? If so it should probably be done that way. And a RNG should be made non-copyable to force pass by ref. If it isn't possible (or inefficient or too time consuming, etc.), then std.random should be deprecated and std.random2 should replace it in the long run. I believe this is the best solution (but far from perfect) to handle design problems like this. I wish there was a better way to fix a wrong design decision. But working around (kind of against the language) is not future-proof. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------The following tests will always fail: int[] a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8 ]; assert(!equal(randomCover(a, rndGen()), randomCover(a, rndGen()))); assert(!equal(randomSample(a, 5, rndGen()), randomSample(a, 5, rndGen())));Good point.The reason why these tests will fail is that both functions take the RNG by value. Not only is this unintuitive, this is also a security liability - someone depending on these functions to generate random identifiers can be surprised that two successive calls generate the same ID.I think we can safely eliminate this argument from the discussion.I strongly suggest that RNG types are disallowed from being implicitly copied. Even though pseudo-random number generators shouldn't be used for security purposes, they're still likely to be used in such contexts - especially considering lack of better sources of random data in Phobos.The problem with taking a random generator by reference is that it then needs to be escaped. So people would be quite surprised to see that: auto gen = rndGen; return randomSample(a, 5, gen); has undefined behavior.
Jun 19 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7067 Dmitry Olshansky <dmitry.olsh gmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |dmitry.olsh gmail.com --- Comment #11 from Dmitry Olshansky <dmitry.olsh gmail.com> 2012-06-19 07:40:01 PDT ---I had similar problems with redesign of std.regex. It has few less then ideal tradeoffs because of that but I've come to appreciate backwards compatibility. In any case big value types like RNG that seldom get modified but often forwarded could use ref-counted COW. In D we are thread-local by default and thus COW is fun and cheap. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------auto gen = rndGen; return randomSample(a, 5, gen); has undefined behavior.Code like this is always incorrect, i.e. the problem is more general. I wonder whether a compiler can always flag those programs as invalid. Is it possible to solve the problem of escaping references to local variables in general? If so it should probably be done that way. And a RNG should be made non-copyable to force pass by ref. If it isn't possible (or inefficient or too time consuming, etc.), then std.random should be deprecated and std.random2 should replace it in the long run. I believe this is the best solution (but far from perfect) to handle design problems like this. I wish there was a better way to fix a wrong design decision. But working around (kind of against the language) is not future-proof.
Jun 19 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7067 --- Comment #12 from bearophile_hugs eml.cc 2012-09-27 04:46:57 PDT --- (In reply to comment #10)then std.random should be deprecated and std.random2 should replace it in the long run. I believe this is the best solution (but far from perfect) to handle design problems like this.std.random2 seems a good solution. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 27 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7067 monarchdodra gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |monarchdodra gmail.com --- Comment #13 from monarchdodra gmail.com 2012-09-27 13:51:58 PDT --- (In reply to comment #12)(In reply to comment #10)As I've mentioned, I am almost ready with a non-breaking reference semantic fix. To avoid breaking code, the ranges are basically lazilly initialized. As I've stated in the forums, I do not want to add ANY new functionality in random, which would have to be ditched for a migration to random2 anyways. My two ideas were: * Explicit opSlice that returns an agressivilly initialized PRNG, for higher performance. They'd also have tighter safety. * A PRNG.Payload alias for users that *really* want stack allocation. Out of curiosity, if you had to write random2, what would you want in random2? * I'd remove all constructors, and require an explicit call to either a seed function, or a free Make!PRNG fuction. I don't think classes is quite the right solution. ** This would address both the explicit initialization issue, as well as the "no-arg" initialization issue * I'd also make them "only" input ranges. It doesn't make much sense to save a prng (though I'd still have .dup), but they wouldn't be forwardRanges. Not much else actually. I was planning on asking community feedback on potential evolutions when my developement was ready to go through (either evolving std.random, or forking to std.random2), but I wouldn't mind a pre-release opinion from users who have more hindsight of the issue ;) PS: Related, I'd want std.container to require the exact same initialization scheme >:( PPS: No spellchecker here, sorry. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------then std.random should be deprecated and std.random2 should replace it in the long run. I believe this is the best solution (but far from perfect) to handle design problems like this.std.random2 seems a good solution.
Sep 27 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7067 Joseph Rushton Wakeling <joseph.wakeling webdrake.net> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |joseph.wakeling webdrake.ne | |t --- Comment #14 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net> 2013-06-20 16:02:07 PDT --- I just want to put on record some remarks about short- and long-term solutions to this problem. Long-term as I think we all agree we need RNGs to be reference types. In this case the general design of ranges like RandomCover and RandomSample will be not entirely different from what RandomCover has now: the constructor will take a RNG as one of its arguments, and (as it's a reference type) this can be copied (by reference!) to an internal variable: struct SomeRandomRange(Rng) { private Rng _gen; ... this(/*stuff*/, Rng gen) { _gen = gen; ... } ... } ... and then you'd have some helper functions that would handle the case where the user doesn't specify an RNG by passing rndGen to the constructor. However, while we're stuck with the situation we have, the design of RandomCover is statistically completely unsafe, as you'll inevitably get unwanted correlations due to storing the RNG by value. As RandomCover's constructor insists on receiving an RNG, the only way to escape this is to pass it a new RNG seeded with unpredictableSeed. What we can do, though, is to salvage things minimally by patching RandomCover to use the same technique as RandomSample -- that is, if the constructor gets passed an RNG, copy it; but if not -- call rndGen directly. I'm going to prepare some patches to that effect, which will also try and do something about the .save methods of Random{Cover,Sample} -- both of which are currently broken. It's putting a sticking plaster on a gaping wound, but at least it should mean that the simplest case available to the user -- doing something like this: auto c = randomCover(input); ... will be statistically reliable. The caveat here is that we have to remember to tweak the constructors back to insisting on getting an RNG once we can guarantee that RNGs will behave as reference types. (Is there any kind of template constraint that could be used to guarantee reference copying semantics?) Anyway, I'll get on with writing the patches. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jun 20 2013
http://d.puremagic.com/issues/show_bug.cgi?id=7067 --- Comment #15 from bearophile_hugs eml.cc 2013-06-20 16:13:04 PDT --- (In reply to comment #14)It's putting a sticking plaster on a gaping wound,Why do you/we care so much for breaking backwards compatibility with something that is so broken? If you let it take the generator by ref I think you will remove from bugs from user code. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jun 20 2013
http://d.puremagic.com/issues/show_bug.cgi?id=7067 --- Comment #16 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net> 2013-06-20 16:27:56 PDT --- (In reply to comment #15)Why do you/we care so much for breaking backwards compatibility with something that is so broken? If you let it take the generator by ref I think you will remove from bugs from user code.You can pass the generator to the constructor by ref, but you can't safely store it as a reference type, and so far as I can see you DO need to store it internally -- RandomCover is a range object, not a function where you can just use it and return. _If_ you could safely store a reference to the RNG, then of course what you say would be right. But you can't, not while they are value types. My proposal won't break backwards compatibility (which is not in itself a good thing -- this back deserves to be broken:-) and won't fix the bigger picture, but it will improve the status quo without having to wait for the large-scale redesign that random number generation needs. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jun 20 2013
http://d.puremagic.com/issues/show_bug.cgi?id=7067 --- Comment #17 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net> 2013-08-21 07:38:13 PDT --- (In reply to comment #14)What we can do, though, is to salvage things minimally by patching RandomCover to use the same technique as RandomSample -- that is, if the constructor gets passed an RNG, copy it; but if not -- call rndGen directly. I'm going to prepare some patches to that effect, which will also try and do something about the .save methods of Random{Cover,Sample} -- both of which are currently broken.Just to note, I've submitted a pull request with updates along these lines: https://github.com/D-Programming-Language/phobos/pull/1499 -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 21 2013
http://d.puremagic.com/issues/show_bug.cgi?id=7067 --- Comment #18 from github-bugzilla puremagic.com 2013-08-29 00:34:46 PDT --- Commit pushed to master at https://github.com/D-Programming-Language/phobos https://github.com/D-Programming-Language/phobos/commit/8d9233cf8b9e4d27bd70dd0fcd171d2f6dc2f2c0 Partial fix for Issues 7067 and 10434 - std.random.RandomCover The existing RandomCover design is fatally flawed because it requires a RNG as input, which it then copies internally by value. So, unless the user is smart enough to pass something like e.g. SomeRNG(unpredictableSeed), there will be unintended correlations in random behaviour. This partial fix follows the design of RandomSample in allowing RandomCover to use the thread-global default RNG rndGen. It also improves the choice of template parameter and variable names in line with Issue 10434. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 29 2013