## digitalmars.D.announce - legacy code retreat's triva game : the D version

- marcpmichel (8/8) Dec 20 2013 I participated in the "global day of code retreat 2013", and we
- bearophile (4/7) Dec 20 2013 Bye,
- marcpmichel (4/11) Dec 20 2013 Do you mean I should have used :
- bearophile (4/7) Dec 20 2013 Right. Using % introduces a bias.
- Chris Cain (38/53) Dec 20 2013 TL;DR version:
- Ivan Kazmenko (48/53) Dec 21 2013 Looks like your new implementation has one modulo operator,
- marcpmichel (20/29) Dec 21 2013 Indeed, your're right, thanks.
- bearophile (11/12) Dec 21 2013 From page 6:
- Meta (3/15) Dec 21 2013 I know immutable is a good thing, but don't you think `immutable
- bearophile (23/25) Dec 21 2013 Some answer, choose the one you prefer:
- ilya-stromberg (17/28) Dec 21 2013 Why did Walter reject this idea?
- John Colvin (4/33) Dec 22 2013 Those are quite different. The first one does have an effect,
- Marco Leise (45/56) Dec 21 2013 bool notAWinner;
- Timon Gehr (5/12) Dec 21 2013 Yes it does. (The probability that some value is never generated is 0.)
- Timon Gehr (2/10) Dec 21 2013 'pseudo random number generators' would be a more accurate term.
- Marco Leise (10/22) Dec 22 2013 Can you elaborate a bit? How do you know that the Java LCG
- Chris Cain (29/35) Dec 22 2013 If I may,
- Timon Gehr (17/38) Dec 22 2013 The probability that a certain number does not occur in one round is
- Marco Leise (11/92) Dec 22 2013 Am Sun, 22 Dec 2013 13:09:51 +0100

I participated in the "global day of code retreat 2013", and we had to do refactoring on a very ugly piece of code which was available on many languages. But there was no D version, so I made one (based on the java version) and pull-requested it. Here is the ugly thing : https://github.com/jbrains/trivia/tree/master/d EOT

Dec 20 2013

marcpmichel:Here is the ugly thing : https://github.com/jbrains/trivia/tree/master/dAnd wrong:if (rand.front() % 9 == 7) {Bye, bearophile

Dec 20 2013

On Friday, 20 December 2013 at 15:05:07 UTC, bearophile wrote:marcpmichel:Do you mean I should have used : if (uniform(0,10) == 7) { instead ?Here is the ugly thing : https://github.com/jbrains/trivia/tree/master/dAnd wrong:if (rand.front() % 9 == 7) {Bye, bearophile

Dec 20 2013

marcpmichel:Do you mean I should have used : if (uniform(0,10) == 7) { instead ?Right. Using % introduces a bias. Bye, bearophile

Dec 20 2013

On Friday, 20 December 2013 at 16:20:44 UTC, marcpmichel wrote:On Friday, 20 December 2013 at 15:05:07 UTC, bearophile wrote:TL;DR version: Actually, the equivalent would be uniform(0, 9), but yes, that'd be the preferable approach there. (also note https://github.com/jbrains/trivia/blob/7b473f9fbbd125b0ab1c2e82582b8a8c414ca501/d source/trivia.d#L19 too should be changed to `uniform(1, 6)` which will give numbers in the range [1 .. 6) ... that's what you want, right?) Long version: For more information, I've written a document on an implementation of uniform (which should be coming in 2.065, btw) which discusses the issue with just using the modulus operator: https://dl.dropboxusercontent.com/u/2206555/uniformUpgrade.pdf Generally speaking, this new uniform will be _extremely_ close to the same speed of just using the modulus operator, but avoids the bias issue. I think there is no real good reason to not use the standard function anymore. That said, the bias with such a small number (9) won't be significant. If rand gives you a uniform 32-bit number, then the distribution of rand % 9 will be [477218589, 477218589, 477218589, 477218589, 477218588, 477218588, 477218588, 477218588, 477218588] (notice how the first 4 have 1 more occurrence than the rest?)... so the bias is miniscule in this case. The bias issue matters a lot more with larger numbers where some numbers could actually occur twice as often as others, or if your application demands high quality random numbers (think gambling games). Related to those reasons, even if your code doesn't use large numbers and isn't used for a gambling game now, it's still possible for it to eventually be used for such things (or to influence others to follow your example for the bad situations). For those reasons alone, it's pretty important to get in the habit of using the standard function. But that's not all since the standard function is probably easier to read, too. Let's say you want to emulate a standard 6-sided die. If you want numbers in the range [1..6] (note inclusive bounds) it's easier to see immediately when you say `uniform!"[]"(1, 6)' rather than `rand % 6 + 1` That's probably all TMI, but maybe all of that will be useful for you.marcpmichel:Do you mean I should have used : if (uniform(0,10) == 7) { instead ?Here is the ugly thing : https://github.com/jbrains/trivia/tree/master/dAnd wrong:if (rand.front() % 9 == 7) {Bye, bearophile

Dec 20 2013

On Saturday, 21 December 2013 at 05:12:57 UTC, Chris Cain wrote:For more information, I've written a document on an implementation of uniform (which should be coming in 2.065, btw) which discusses the issue with just using the modulus operator: https://dl.dropboxusercontent.com/u/2206555/uniformUpgrade.pdfLooks like your new implementation has one modulo operator, compared to the previous one having two divisions. That may be the cause of speedup. The previous implementation was, by its looks, copied from C++ Boost which also uses two divisions. Do you know the reason for that? They seem to have been solving the exact same problem (strict uniformness provided that the underlying RNG is uniform). I'd like to touch a relevant point here that matters for me. In a mature randomness library, one important quality is reproducibility: there are applications where you want to use pseudo-random values, but generate the exact same pseudo-random values across different versions, computers, operating systems and language implementations. So far I have seen very few languages which provide such reproducibility guarantees for their standard library. For example, in C and C++ standard randomness library, the details were implementation-dependent all the way until the recent C++11. Python stood for long but finally broke it between 3.1 and 3.2 because of the exact same non-uniformness problem. A positive example in this regard is Java which enforces the implementation of Random since at least version 1.5. If you break the reproducibility of uniform in dmd 2.065, there should be at least a note on that in its documentation. For a mature library, I think the old implementation should also have been made available somehow. (well, there's always an option to include an old library version in your project, but...) Perhaps that's not the case for D and Phobos since they are still not stabilized. Especially so for std.random which is due to more breakage anyway because of the value/reference issues with RNG types. Regarding that, I have a point on designing a randomness library. Right now, most of what I have seen has at most two layers: the core RNG providing random bits, and the various uses of these bits, like uniform distribution on a segment, random shuffle and so on. It is comfortable when the elements of the two layers are independent, and you can compose different first layers (LCG, MT19937, or maybe some interface to /dev/*random) with different second layer functions (uniform[0,9], random_shuffle, etc.). Still, many of the useful second level functions build upon uniform distribution for integers on a segment. Thus I would like to have an explicit intermediate layer consisting of uniform and maybe other distributions which could also have different (fast vs. exact) implementations to choose from. In the long run, such design could also solve reproducibility problems: we can provide another implementation of uniform as the default, but it is still easy to set the previous one as the preferred intermediate level. Ivan Kazmenko.

Dec 21 2013

Indeed, your're right, thanks. I used the modulo trick for multiple reasons : * I ported the java source, which used the basic java.util.random's Random.nextInt() then a modulo to cap the output. * D's std.random had me scratching my head for minutes; like : "What is this mess ? And where is the simple rand() function ?" * I didn't care about speed or uniformness of the generated numbers. * While in the code retreat event, we tried to get a "golden master" ( the output of the program ), to be able to test that refactoring didn't change anything. One trick is to set the seed of the random number generator to guarantee we always got the same dice rolls. And the std.random complexity didn't help to choose the right method. That being said, there are worse things in game.d : I introduced new bugs in this already buggy program, by using D's array slices. https://github.com/jbrains/trivia/blob/7b473f9fbbd125b0ab1c2e82582b8a8c414ca501/d/source/game.d#L101 Lastly, this tiny contribution is just a drop in the ocean of "spreading the world about D".Do you mean I should have used : if (uniform(0,10) == 7) { instead ?TL;DR version: Actually, the equivalent would be uniform(0, 9), but yes, that'd be the preferable approach there. (also note https://github.com/jbrains/trivia/blob/7b473f9fbbd125b0ab1c2e82582b8a8c414ca501/d source/trivia.d#L19 too should be changed to `uniform(1, 6)` which will give numbers in the range [1 .. 6) ... that's what you want, right?)

Dec 21 2013

Chris Cain:https://dl.dropboxusercontent.com/u/2206555/uniformUpgrade.pdfFrom page 6: size_t[] counts = new size_t[](top); foreach(i; 0 .. 500_000_000) counts[uniform(0, top)] += 1; Modern D allows you to write better code: size_t[N] counts; foreach (immutable _; 0 .. 500_000_000) counts[uniform(0, $)]++; Bye, bearophile

Dec 21 2013

On Saturday, 21 December 2013 at 15:03:34 UTC, bearophile wrote:Chris Cain:I know immutable is a good thing, but don't you think `immutable _` is a bit unnecessary in this case?https://dl.dropboxusercontent.com/u/2206555/uniformUpgrade.pdfFrom page 6: size_t[] counts = new size_t[](top); foreach(i; 0 .. 500_000_000) counts[uniform(0, top)] += 1; Modern D allows you to write better code: size_t[N] counts; foreach (immutable _; 0 .. 500_000_000) counts[uniform(0, $)]++; Bye, bearophile

Dec 21 2013

Meta:I know immutable is a good thing, but don't you think `immutable _` is a bit unnecessary in this case?Some answer, choose the one you prefer: 1) Yes, it's totally useless because the _ variable is not even used inside the loop body! So sorry, I'm always so pedantic. 2) It's necessary, don't you see that? You don't need to mutate that variable, so it's better for it be immutable. A simple rule to follow is to make const/immutable all variables that don't need to mutate, to make code simpler and safer. There's no real reason to break that general rule in this case. 3) Just like the integer '5' a range of values as 0 .. 1000 is an immutable value. So a variable that scans such range should be immutable. If you really want to mutate such variable you should add a modifier like "mutable" or "mut" or something. Another common trap in D coding is iterating on an array of structs with foreach, mutating the current struct and forgetting that you are mutating only a _copy_ of the items. Unfortunately there is no mutable keyword in D, and Walter rejected all this idea. So the next best thing it to always put "immutable" at the foreach variable, unless you want to mutate it or if you can't use const/immutable for some other reason. Probably I can invent you more creative answers if you want. Bear hugs, bearophile

Dec 21 2013

On Saturday, 21 December 2013 at 20:43:27 UTC, bearophile wrote:3) Just like the integer '5' a range of values as 0 .. 1000 is an immutable value. So a variable that scans such range should be immutable. If you really want to mutate such variable you should add a modifier like "mutable" or "mut" or something. Another common trap in D coding is iterating on an array of structs with foreach, mutating the current struct and forgetting that you are mutating only a _copy_ of the items. Unfortunately there is no mutable keyword in D, and Walter rejected all this idea. So the next best thing it to always put "immutable" at the foreach variable, unless you want to mutate it or if you can't use const/immutable for some other reason.Why did Walter reject this idea? BTW, we don't need `mutable` keyword to implement this idea. We should just deny any mutation of item copy. If you really need to store temporary result, add new variable. For example: foreach(i; arr) { ++i; //error - this variable contains copy of data, not a ref to the original data auto temp_i = i + 1; //OK } We already have similar errors, for example: void foo() { int i; i; //Error: var has no effect in expression (i) }

Dec 21 2013

On Sunday, 22 December 2013 at 07:29:22 UTC, ilya-stromberg wrote:On Saturday, 21 December 2013 at 20:43:27 UTC, bearophile wrote:Those are quite different. The first one does have an effect, it's just that the effect is only local to the loop scope. Even that isn't guaranteed, as ++i could have side-effects.3) Just like the integer '5' a range of values as 0 .. 1000 is an immutable value. So a variable that scans such range should be immutable. If you really want to mutate such variable you should add a modifier like "mutable" or "mut" or something. Another common trap in D coding is iterating on an array of structs with foreach, mutating the current struct and forgetting that you are mutating only a _copy_ of the items. Unfortunately there is no mutable keyword in D, and Walter rejected all this idea. So the next best thing it to always put "immutable" at the foreach variable, unless you want to mutate it or if you can't use const/immutable for some other reason.Why did Walter reject this idea? BTW, we don't need `mutable` keyword to implement this idea. We should just deny any mutation of item copy. If you really need to store temporary result, add new variable. For example: foreach(i; arr) { ++i; //error - this variable contains copy of data, not a ref to the original data auto temp_i = i + 1; //OK } We already have similar errors, for example: void foo() { int i; i; //Error: var has no effect in expression (i) }

Dec 22 2013

Am Fri, 20 Dec 2013 15:53:08 +0100 schrieb "marcpmichel" <marc.p.michel gmail.com>:I participated in the "global day of code retreat 2013", and we had to do refactoring on a very ugly piece of code which was available on many languages. But there was no D version, so I made one (based on the java version) and pull-requested it. Here is the ugly thing : https://github.com/jbrains/trivia/tree/master/d EOTbool notAWinner; do { game.roll(rand.front() % 5 + 1); rand.popFront(); if (rand.front() % 9 == 7) { // <-- WARNING! WARNING! notAWinner = game.wrongAnswer(); } else { notAWinner = game.wasCorrectlyAnswered(); } rand.popFront(); } while (notAWinner); This kind of code is a dangerous gamble. This is a story about my student time: I once sat in a Java class and one of the students had an issue with their code not outputting anything and not quitting either. When the teacher came around, we found only one obvious point for an infinite loop could occur and it looked like this: Random rng = new Random(); int count = 0; // Visit all items once while (count < list.size()) { bool found = false; while (!found) { int idx = rng.nextInt() % list.size(); if (list[idx].visited == false) { list[idx].visited = true; found = true; count++; } } } [I don't remember the exact lines, but this is the gist of it.] The teacher himself wrote this code and presented it to the class as a simple way to iterate over a list in random order which was part of todays programming task. It didn't cause issues for any of the other students, but on this particular computer the random seed that the Random ctor chose caused a degenerate case where it never hit any of the 3 remaining indexes of the list. The morale is that "uniform" random numbers doesn't imply that every value in the range will eventually be generated once! -- Marco

Dec 21 2013

On 12/22/2013 01:07 AM, Marco Leise wrote:... It didn't cause issues for any of the other students, but on this particular computer the random seed that the Random ctor chose caused a degenerate case where it never hit any of the 3 remaining indexes of the list. The morale is that "uniform" random numbers doesn't imply that every value in the range will eventually be generated once!Yes it does. (The probability that some value is never generated is 0.) The actual morale is that random number generators do not generate true randomness, and poor random number generators may generate sequences that do not look remotely random.

Dec 21 2013

On 12/22/2013 02:09 AM, Timon Gehr wrote:'pseudo random number generators' would be a more accurate term.The morale is that "uniform" random numbers doesn't imply that every value in the range will eventually be generated once!Yes it does. (The probability that some value is never generated is 0.) The actual morale is that random number generators do not generate true randomness, and poor random number generators may generate sequences that do not look remotely random.

Dec 21 2013

Am Sun, 22 Dec 2013 02:12:51 +0100 schrieb Timon Gehr <timon.gehr gmx.ch>:On 12/22/2013 02:09 AM, Timon Gehr wrote:Can you elaborate a bit? How do you know that the Java LCG can produce every 32-bit integer once? If that's true then the problem with the Java code was something different and I was just biased, because I was already expecting the code to fail before the fact. (Expectations can do strange things to your perception.) -- Marco'pseudo random number generators' would be a more accurate term.The morale is that "uniform" random numbers doesn't imply that every value in the range will eventually be generated once!Yes it does. (The probability that some value is never generated is 0.) The actual morale is that random number generators do not generate true randomness, and poor random number generators may generate sequences that do not look remotely random.

Dec 22 2013

On Sunday, 22 December 2013 at 08:06:30 UTC, Marco Leise wrote:Can you elaborate a bit? How do you know that the Java LCG can produce every 32-bit integer once? If that's true then the problem with the Java code was something different and I was just biased, because I was already expecting the code to fail before the fact. (Expectations can do strange things to your perception.)If I may, http://en.wikipedia.org/wiki/Linear_congruential_generator Definition of an LCG: ``` Xnext = (a * Xprev + c) % m ``` An LCG is said to have a "full period" if the length of the period is m. If the period is m, we know the LCG must produce every number between 0 and m because if there was even one repeated number then the generator as defined above would repeat the entire sequence up to that point and, thus, the period would not be m, which is a contradiction. According to the Hull-Dobell Theorem, an LCG will have a full period iff: 1. `c` and `m` are relatively prime. For Java, c = 11 and m = 2^48 This condition applies. 2. `(a - 1)` is divisible by all prime factors of m` For Java, a = 25214903917 and thus a-1 is even which means the prime factors of m (just 2) do divide it. This condition applies. 3. `a - 1` is a multiple of 4 if `m` is a multiple of 4. For Java, m is a multiple of 4. `(a - 1)/4` is 6303725979, so it's also a multiple of 4. This condition applies as well. Since Java's LCG has a full period over 2^48, we know that taking the top 32 bits (which is what Java does to get "better" randomness) would also all be represented.

Dec 22 2013

On 12/22/2013 09:06 AM, Marco Leise wrote:Am Sun, 22 Dec 2013 02:12:51 +0100 schrieb Timon Gehr <timon.gehr gmx.ch>:The probability that a certain number does not occur in one round is (n-1)/n. ((n-1)/n)^k goes to 0 rather fast as k goes to infinity. In fact, the expected number of trials until all numbers are covered is ~ n log n, and the probability that the process runs significantly longer is very small. See also: http://en.wikipedia.org/wiki/Coupon_collector%27s_problemOn 12/22/2013 02:09 AM, Timon Gehr wrote:Can you elaborate a bit?'pseudo random number generators' would be a more accurate term.The morale is that "uniform" random numbers doesn't imply that every value in the range will eventually be generated once!Yes it does. (The probability that some value is never generated is 0.) The actual morale is that random number generators do not generate true randomness, and poor random number generators may generate sequences that do not look remotely random.How do you know that the Java LCG can produce every 32-bit integer once?Typically constants are chosen such that this holds, but your code would require something stronger to fail, namely, that a certain congruence class does not occur. Typically pseudo random number generators are chosen such that the generated sequences look close to true randomness. If such a simple process can be used to reliably distinguish true randomness and the pseudo random number generator, then the pseudo random number generator is not very good.If that's true then the problem with the Java code was something different and I was just biased, because I was already expecting the code to fail before the fact.Maybe. There is a vast number of ways that this could have failed.(Expectations can do strange things to your perception.)Indeed. :)

Dec 22 2013

Am Sun, 22 Dec 2013 09:19:48 +0000 schrieb "Chris Cain" <clcain uncg.edu>:On Sunday, 22 December 2013 at 08:06:30 UTC, Marco Leise wrote:Am Sun, 22 Dec 2013 13:09:51 +0100 schrieb Timon Gehr <timon.gehr gmx.ch>:Can you elaborate a bit? How do you know that the Java LCG can produce every 32-bit integer once? If that's true then the problem with the Java code was something different and I was just biased, because I was already expecting the code to fail before the fact. (Expectations can do strange things to your perception.)If I may, http://en.wikipedia.org/wiki/Linear_congruential_generator Definition of an LCG: ``` Xnext = (a * Xprev + c) % m ``` An LCG is said to have a "full period" if the length of the period is m. If the period is m, we know the LCG must produce every number between 0 and m because if there was even one repeated number then the generator as defined above would repeat the entire sequence up to that point and, thus, the period would not be m, which is a contradiction. According to the Hull-Dobell Theorem, an LCG will have a full period iff: 1. `c` and `m` are relatively prime. For Java, c = 11 and m = 2^48 This condition applies. 2. `(a - 1)` is divisible by all prime factors of m` For Java, a = 25214903917 and thus a-1 is even which means the prime factors of m (just 2) do divide it. This condition applies. 3. `a - 1` is a multiple of 4 if `m` is a multiple of 4. For Java, m is a multiple of 4. `(a - 1)/4` is 6303725979, so it's also a multiple of 4. This condition applies as well. Since Java's LCG has a full period over 2^48, we know that taking the top 32 bits (which is what Java does to get "better" randomness) would also all be represented.On 12/22/2013 09:06 AM, Marco Leise wrote:Thank you two for explaining LCGs to me. That's good information for reasoning about code. Every good (full period) LCG is a specific permutation of the numbers [0..m). The next time I wonder how I can iterate in random order over a list of length n^2, I know what I'll use ;) -- MarcoAm Sun, 22 Dec 2013 02:12:51 +0100 schrieb Timon Gehr <timon.gehr gmx.ch>:The probability that a certain number does not occur in one round is (n-1)/n. ((n-1)/n)^k goes to 0 rather fast as k goes to infinity. In fact, the expected number of trials until all numbers are covered is ~ n log n, and the probability that the process runs significantly longer is very small. See also: http://en.wikipedia.org/wiki/Coupon_collector%27s_problemHow do you know that the Java LCG can produce every 32-bit integer once?Typically constants are chosen such that this holds, but your code would require something stronger to fail, namely, that a certain congruence class does not occur. Typically pseudo random number generators are chosen such that the generated sequences look close to true randomness. If such a simple process can be used to reliably distinguish true randomness and the pseudo random number generator, then the pseudo random number generator is not very good.

Dec 22 2013