www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 10550] New: Xorshift32 and Xorshift160 do not generate uniformly-distributed random numbers

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550

           Summary: Xorshift32 and Xorshift160 do not generate
                    uniformly-distributed random numbers
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: joseph.wakeling webdrake.net


--- Comment #0 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net>
2013-07-05 07:11:03 PDT ---
The output of Xorshift32 and Xorshift160 departs strongly from uniformity.

The issue for Xorshift32 can be seen clearly even in this very simple piece of
code:

    auto rng = Xorshift32(unpredictableSeed);

    foreach(r; rng.take(20))
    {
        writeln(r, "\t", rng.min, "\t", rng.max);
    }

... where we can see that the generated values are always much closer to
rng.min than rng.max.

A more sophisticated example (but still very simple) is provided in the test of
uniformity in https://github.com/WebDrake/std.random.test -- specifically, in:
https://github.com/WebDrake/std.random.test/blob/master/uniform.d
https://github.com/WebDrake/std.random.test/blob/master/test/stats.d

... which generates millions of random numbers using uniform(0.0, 0.1, rng),
divides them up into bins of width 0.05, and outputs the histogram and
cumulative distribution values.

For Xorshift32, this gives:

Generating 10000000 random numbers in [0, 1) with XorshiftEngine!(uint, 32, 13,
17, 5)
    5    1    1
    10    0    1
    15    0    1
    20    0    1
    25    0    1
    30    0    1
    35    0    1
    40    0    1
    45    0    1
    50    0    1
    55    0    1
    60    0    1
    65    0    1
    70    0    1
    75    0    1
    80    0    1
    85    0    1
    90    0    1
    95    0    1
    100    0    1

... while for Xorshift160 it gives:

Generating 10000000 random numbers in [0, 1) with XorshiftEngine!(uint, 160, 2,
1, 4)
    5    0.0535713    0.0535713
    10    0.0208334    0.0744047
    15    0.0446429    0.119048
    20    0.047619    0.166667
    25    0.0714286    0.238095
    30    0.0238096    0.261905
    35    0.0208333    0.282738
    40    0.0416667    0.324405
    45    0.0595238    0.383929
    50    0.0446429    0.428572
    55    0.0476191    0.476191
    60    0.0595239    0.535714
    65    0.0714285    0.607143
    70    0.0535714    0.660714
    75    0.0535713    0.714286
    80    0.0744045    0.78869
    85    0.0744049    0.863095
    90    0.0357143    0.898809
    95    0.0238095    0.922619
    100    0.0773811    1

Other RNGs, including other Xorshift types, appear to generate correct
proportions.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 05 2013
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #1 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net>
2013-07-05 07:35:07 PDT ---
The Xorshift32 non-uniformity can be fixed by correcting the update rule in
popFront(), from:

        static if (bits == 32)
        {
            temp      = seeds_[0] ^ (seeds_[0] << a);
            temp      = temp >> b;
            seeds_[0] = temp ^ (temp << c);
        }

to:

        static if (bits == 32)
        {
            temp      = seeds_[0] ^ (seeds_[0] << a);
            temp      = temp ^ (temp >> b);
            seeds_[0] = temp ^ (temp << c);
        }

See p.3 of http://www.jstatsoft.org/v08/i14/paper -- the current implementation
appears to be a typo when copying the first from the list of possible update
rules.

However, if this change is made, the Xorshift unittests fail for the checks
against the reference edition:

    auto checking = [
        [2463534242UL, 267649, 551450, 53765, 108832, 215250, 435468, 860211,
660133, 263375],
        [362436069UL, 2113136921, 19051112, 3010520417, 951284840, 1213972223,
3173832558, 2611145638, 2515869689, 2245824891],
        [521288629UL, 1950277231, 185954712, 1582725458, 3580567609,
2303633688, 2394948066, 4108622809, 1116800180, 3357585673],
        [88675123UL, 3701687786, 458299110, 2500872618, 3633119408, 516391518,
2377269574, 2599949379, 717229868, 137866584],
        [5783321UL, 93724048, 491642011, 136638118, 246438988, 238186808,
140181925, 533680092, 285770921, 462053907],
        [0UL, 246875399, 3690007200, 1264581005, 3906711041, 1866187943,
2481925219, 2464530826, 1604040631, 3653403911]
    ];

    alias TypeTuple!(Xorshift32, Xorshift64, Xorshift96, Xorshift128,
Xorshift160, Xorshift192) XorshiftTypes;

    foreach (I, Type; XorshiftTypes)
    {
        Type rnd;

        foreach (e; checking[I])
        {
            assert(rnd.front == e);
            rnd.popFront();
        }
    }

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 05 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #2 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net>
2013-07-05 07:35:38 PDT ---
I also think the choice of a, b, c values may be in error: currently we have

    alias XorshiftEngine!(uint, 32,  13, 17, 5)  Xorshift32;

... but I think this is most likely a typo for

    alias XorshiftEngine!(uint, 32,  13, 17, 15)  Xorshift32;

as the paper states that there should be a < c and the triple 13, 17, 5 is not
found among the list of valid triples (while 13, 17, 15 is).

However, correcting this does not prevent the unittest fail described in the
previous comment.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 05 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #3 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net>
2013-07-05 07:38:56 PDT ---
(In reply to comment #2)
 I also think the choice of a, b, c values may be in error: currently we have
 
     alias XorshiftEngine!(uint, 32,  13, 17, 5)  Xorshift32;
This seems to stem from a descriptive passage at the top of p.4 of the paper, where this choice of a, b, c values is described as one of the author's favourites. However, I suspect that this is a typo as it violates the a < c rule described elsewhere in the paper and (as already stated) the 13, 17, 5 triple is not found in the table of appropriate triples for 32-bit Xorshift. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 05 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #4 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net>
2013-07-05 08:12:24 PDT ---
Uniformity in Xorshift160 can be restored by tweaking the update rules:

        else static if (bits == 160)
        {
            temp      = seeds_[0] ^ (seeds_[0] >> a);
            seeds_[0] = seeds_[1];
            seeds_[1] = seeds_[2];
            seeds_[2] = seeds_[3];
            seeds_[3] = seeds_[4];
            seeds_[4] = seeds_[4] ^ (seeds_[4] >> c) ^ temp ^ (temp >> b);
        }

... which faithfully reproduce what is given at the bottom of p.4 of the paper,
and changing the first line to:

            temp      = seeds_[0] ^ (seeds_[0] << a);

Note that this change was pure guesswork on the grounds that other bit-values
of the algorithm had this alternative formulation.  It also results in a
failure of the unittest on line 1032 of std.random.

Unfortunately George Marsaglia has died, so we can't ask him about typos in his
papers. :-(  I was not able to find any erratum to the published article, but
the discrepancies already identified make me suspect that it must be in error
in several places.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 05 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs eml.cc


--- Comment #5 from bearophile_hugs eml.cc 2013-07-08 01:46:42 PDT ---
Maybe this bug should have a "major" importance. And maybe a warning note in
site ddocs should be added in the meantime.

These tests can help:
http://en.wikipedia.org/wiki/Diehard_tests


Unfortunately George Marsaglia has died, so we can't ask him about typos in his
papers.< Some people should live longer. Probably some of his collaborators or people that have used his generators have some errata list or some suggestions to help. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 08 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550


Joseph Rushton Wakeling <joseph.wakeling webdrake.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |major


--- Comment #6 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net>
2013-07-08 02:46:40 PDT ---
(In reply to comment #5)
 Maybe this bug should have a "major" importance. And maybe a warning note in
 site ddocs should be added in the meantime.
Agree with the "major" importance, tweaked accordingly.
 These tests can help:
 http://en.wikipedia.org/wiki/Diehard_tests
These are now a little out of date, but there is the "dieharder" suite which is actually available as a utility in many Linux distros. I plan on incorporating that into my test suite -- I think it should be as simple as just getting D to generate random variates using whatever method and then piping them through to dieharder.
 Some people should live longer. Probably some of his collaborators or people
 that have used his generators have some errata list or some suggestions to
 help.
I did not find anything yet, but I'll keep looking. I assumed that his homepage would still exist with remarks like this on it, but didn't track down anything useful so far. There are other people who've written follow-up papers on Xorshift who could be worth contacting. I may do that if I can't find obvious documentary material. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 08 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #7 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net>
2013-07-08 05:58:49 PDT ---
What I'd really like is to have a source for the checking values for Xorshift
used in the unittests.  Masahiro, do you recall how you obtained these values? 
They're not in Marsaglia's paper, and Google searches to track down their
source are proving fruitless. :-(

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 08 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #8 from Masahiro Nakagawa <repeatedly gmail.com> 2013-07-08
07:28:26 PDT ---
(In reply to comment #7)
 What I'd really like is to have a source for the checking values for Xorshift
 used in the unittests.  Masahiro, do you recall how you obtained these values? 
 They're not in Marsaglia's paper, and Google searches to track down their
 source are proving fruitless. :-(
I remember correctly, I generated test cases from paper based C implementation. I implemented C version first and generated test cases for D unittest. After that, I implemented XorshiftEngine for D-ish code. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 08 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #9 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net>
2013-07-08 07:51:04 PDT ---
(In reply to comment #8)
 I remember correctly, I generated test cases from paper based C implementation.
 I implemented C version first and generated test cases for D unittest.
 After that, I implemented XorshiftEngine for D-ish code.
Ahhh, OK. Then I think we can reasonably assume that these test sequences, like the code in std.random, reflect typos in Marsaglia's paper. For Xorshift32 we can see that Marsaglia's C implementation at the very top of p.32 contains at least one typo -- the second bitshift y=(y>>17) should be y^=(y>>17). I am strongly suspicious that there is a second typo for the value of c, which should be 15 rather than 5. It's definitely not the only such typo in the paper -- Panneton and L'Ecuyer (2005) note that the (a, b, c) triple (9, 5, 1) should be (9, 5, 14). In any case, we lose nothing by using the triple (13, 17, 15) since it's acknowledged in the paper as a valid choice. For Xorshift160 I think that the code given has a typo for the bitshift with respect to a, as in all other such code examples the bitshift is in the opposite direction to that for b and c. In general, the bitshifts seem to follow a rule of two in one direction, one in the other. In summary, I think we can proceed as follows: * confirm with experts in the field the typos in the paper * generate new checking sequences with corrected versions of the C code * correct the D code and unittests accordingly I am actually inclined to jump ahead with getting the patches to Phobos done, because I'm pretty confident my analysis here is correct :-) -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 08 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #10 from Masahiro Nakagawa <repeatedly gmail.com> 2013-07-08
09:26:03 PDT ---
(In reply to comment #9)
 ...
 In summary, I think we can proceed as follows:
 
    * confirm with experts in the field the typos in the paper
 
    * generate new checking sequences with corrected versions of the C code
 
    * correct the D code and unittests accordingly
 
 I am actually inclined to jump ahead with getting the patches to Phobos done,
 because I'm pretty confident my analysis here is correct :-)
I rechecked the paper and I agree with you (y^=, a < c and 160's >>) So I think we can fix this paper derived bugs. Could you send the pull request? After passed auto tester, I will merge it. On the other hand, confirm the typos is good for future generations. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 08 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #11 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net>
2013-07-08 09:47:04 PDT ---
(In reply to comment #10)
 I rechecked the paper and I agree with you (y^=, a < c and 160's >>)
 So I think we can fix this paper derived bugs.
 Could you send the pull request?
 After passed auto tester, I will merge it.
Yes, I'll get that submitted either later this evening or tomorrow. :-)
 On the other hand, confirm the typos is good for future generations.
Agree. I also think it could be a good way to begin outreach to the PRNG academic community -- see: http://d.puremagic.com/issues/show_bug.cgi?id=10572 -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 08 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #12 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net>
2013-07-08 12:49:09 PDT ---
(In reply to comment #11)
 (In reply to comment #10)
 I rechecked the paper and I agree with you (y^=, a < c and 160's >>)
 So I think we can fix this paper derived bugs.
 Could you send the pull request?
 After passed auto tester, I will merge it.
Yes, I'll get that submitted either later this evening or tomorrow. :-)
First step -- corrected reference code: https://github.com/WebDrake/xorshift I first implemented versions that match _exactly_ Marsaglia's sample code, and that reproduce the checking values from std.random: https://github.com/WebDrake/xorshift/blob/4305bd5a0ac4d94f59505713a085b34d2e4d482e/xorshift.c Subsequent commits correct the typos we've observed, and so the current version can be used to generate updated checking values. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 08 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #13 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net>
2013-07-08 14:31:53 PDT ---
Pull request submitted:
https://github.com/D-Programming-Language/phobos/pull/1403

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 08 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #14 from github-bugzilla puremagic.com 2013-07-08 22:48:03 PDT ---
Commit pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/e1504f70b8cfdf4cd1082408fe2452a71e299ab3
Merge pull request #1403 from WebDrake/xorshift

Fix Issue 10550 - Xorshift32 and Xorshift160 do not generate
uniformly-distributed random numbers

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 08 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550



--- Comment #15 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net>
2013-07-09 01:04:56 PDT ---
Masahiro -- thanks so much for your fast attention to this. :-)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 09 2013
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10550


Joseph Rushton Wakeling <joseph.wakeling webdrake.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


--- Comment #16 from Joseph Rushton Wakeling <joseph.wakeling webdrake.net>
2013-07-09 11:16:30 PDT ---
I think we can now close the bug, but I will try and follow up with some people
expert in RNG design to see if we can confirm the fixes really are correct.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 09 2013