www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - std.random and mersenne twister

reply Andrea Fontana <nospam example.com> writes:
I read that default RNG in phobos is Mersenne Twister.=20

I think it would be a good idea to replace it with
complementary-multiply-with-carry (cmwc). CMWC is faster (use simpler
math), has a longer period (standard implementation has a 2^131104
period vs 2^19937 of current MT implementation in phobos) and passed
diehard tests (mt passes them too)

And of course it's very easy to implement:
http://en.wikipedia.org/wiki/Multiply-with-carry#Implementation
Jul 20 2012
next sibling parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 20 July 2012 at 09:47:52 UTC, Andrea Fontana wrote:
 I read that default RNG in phobos is Mersenne Twister.

 I think it would be a good idea to replace it with
 complementary-multiply-with-carry (cmwc). CMWC is faster (use 
 simpler
 math), has a longer period (standard implementation has a 
 2^131104
 period vs 2^19937 of current MT implementation in phobos) and 
 passed
 diehard tests (mt passes them too)

 And of course it's very easy to implement:
 http://en.wikipedia.org/wiki/Multiply-with-carry#Implementation
I'd say the problem is that it is not a very widespread or known PRNG. While I wouldn't be against adding it to the library ("I see no reason not to add it"), making it the _default_ PRNG is a whole other story. Users that choose "default" want something reliable, documented, trustworthy etc... Multiply With Carry seems like it is Cutting Edge to me, not yet widespread, known or tested. I'd say it should only be used by those that explicitly request it's usage.
Jul 20 2012
parent reply Andrea Fontana <nospam example.com> writes:
CMWC is  proven to be a valid method and it passes diehard tests. It was
written by prof. George Marsiglia (he developed xorshift too - included
in std.random). He was one of the best experts in PRNG.

He also developed the "Marsiglia's Theorem" where he demonstrate that
LCG (that is the default algorithm for many languages  for ex: glibc and
vc++ lib, ...) has big issues.
LCG is very widespread but d doesn't use it (phew!). If a user know
difference between PRNG algorithms can use MT, but as default for people
that use weak C rand() function as default (that neither passes diehard
tests) it can just be a good improvement (why should we give them MT
that is slower than CMWC if they neither know default rand() weakness?)=20

MT has a complex implementation, I hope std.random MT was tested :)=20

Il giorno ven, 20/07/2012 alle 13.16 +0200, monarch_dodra ha scritto:

 On Friday, 20 July 2012 at 09:47:52 UTC, Andrea Fontana wrote:
 I read that default RNG in phobos is Mersenne Twister.

 I think it would be a good idea to replace it with
 complementary-multiply-with-carry (cmwc). CMWC is faster (use=20
 simpler
 math), has a longer period (standard implementation has a=20
 2^131104
 period vs 2^19937 of current MT implementation in phobos) and=20
 passed
 diehard tests (mt passes them too)

 And of course it's very easy to implement:
 http://en.wikipedia.org/wiki/Multiply-with-carry#Implementation
=20 I'd say the problem is that it is not a very widespread or known=20 PRNG. =20 While I wouldn't be against adding it to the library ("I see no=20 reason not to add it"), making it the _default_ PRNG is a whole=20 other story. =20 Users that choose "default" want something reliable, documented,=20 trustworthy etc... =20 Multiply With Carry seems like it is Cutting Edge to me, not yet=20 widespread, known or tested. I'd say it should only be used by=20 those that explicitly request it's usage.
Jul 20 2012
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 20/07/12 14:51, Andrea Fontana wrote:
 He also developed the "Marsiglia's Theorem" where he demonstrate that LCG (that
 is the default algorithm for many languages  for ex: glibc and vc++ lib, ...)
 has big issues.
C and C++ have the bad luck to have been created before many of the high-quality PRNGs were developed. A lot of scientific software now has MT as the default. I'd personally never heard of CMWC before this email exchange -- a nice discovery :-)
 MT has a complex implementation, I hope std.random MT was tested :)
I doubt there's anything wrong with the MT implementation per se; it looks largely like a close match to that in the Boost C++ random library, and they certainly generate identical output. There is an issue with the seeding, that D's MT can only take a number as seed, not a sequence of numbers -- see: http://forum.dlang.org/thread/jr0luj$1ctj$1 digitalmars.com But I agree that there should be a proper set of unittests for the PRNGs that really check their implementation. Personally if there is a flaw in the current D random number generation, my money would be on the unpredictableSeed being responsible. It's a hunch rather than something I can justify technically, but the method of generating that seed looks too simple to me -- I'd be worried that for some PRNGs, different threads might wind up with correlated random sequences by accident, because the different unpredictableSeeds would not be different enough. So, given that D looks to be gaining traction in various areas of scientific simulation, I think it'd be well worth trying to make sure that multithreaded pseudo-random number generation is really rigorously tested. Unfortunately I don't know what kinds of test would be appropriate here.
Jul 20 2012
prev sibling next sibling parent reply "Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:
On Friday, 20 July 2012 at 12:51:25 UTC, Andrea Fontana wrote:
 CMWC is  proven to be a valid method and it passes diehard 
 tests. It was
 written by prof. George Marsiglia (he developed xorshift too - 
 included
 in std.random). He was one of the best experts in PRNG.

 He also developed the "Marsiglia's Theorem" where he 
 demonstrate that
 LCG (that is the default algorithm for many languages  for ex: 
 glibc and
 vc++ lib, ...) has big issues.
 LCG is very widespread but d doesn't use it (phew!). If a user 
 know
 difference between PRNG algorithms can use MT, but as default 
 for people
 that use weak C rand() function as default (that neither passes 
 diehard
 tests) it can just be a good improvement (why should we give 
 them MT
 that is slower than CMWC if they neither know default rand() 
 weakness?)

 MT has a complex implementation, I hope std.random MT was 
 tested :)

 Il giorno ven, 20/07/2012 alle 13.16 +0200, monarch_dodra ha 
 scritto:

 On Friday, 20 July 2012 at 09:47:52 UTC, Andrea Fontana wrote:
 I read that default RNG in phobos is Mersenne Twister.

 I think it would be a good idea to replace it with
 complementary-multiply-with-carry (cmwc). CMWC is faster 
 (use simpler
 math), has a longer period (standard implementation has a 
 2^131104
 period vs 2^19937 of current MT implementation in phobos) 
 and passed
 diehard tests (mt passes them too)

 And of course it's very easy to implement:
 http://en.wikipedia.org/wiki/Multiply-with-carry#Implementation
I'd say the problem is that it is not a very widespread or known PRNG. While I wouldn't be against adding it to the library ("I see no reason not to add it"), making it the _default_ PRNG is a whole other story. Users that choose "default" want something reliable, documented, trustworthy etc... Multiply With Carry seems like it is Cutting Edge to me, not yet widespread, known or tested. I'd say it should only be used by those that explicitly request it's usage.
But Mersenne Twister has a cooler name :o) 'Multiply with carry' is so blah. You'll need to come up with a sexy new name for it. Cheers, Craig
Jul 20 2012
parent reply Andrea Fontana <nospam example.com> writes:
Il giorno ven, 20/07/2012 alle 16.05 +0200, Craig Dillabaugh ha scritto:

 On Friday, 20 July 2012 at 12:51:25 UTC, Andrea Fontana wrote:
 CMWC is  proven to be a valid method and it passes diehard=20
 tests. It was
 written by prof. George Marsiglia (he developed xorshift too -=20
 included
 in std.random). He was one of the best experts in PRNG.

 He also developed the "Marsiglia's Theorem" where he=20
 demonstrate that
 LCG (that is the default algorithm for many languages  for ex:=20
 glibc and
 vc++ lib, ...) has big issues.
 LCG is very widespread but d doesn't use it (phew!). If a user=20
 know
 difference between PRNG algorithms can use MT, but as default=20
 for people
 that use weak C rand() function as default (that neither passes=20
 diehard
 tests) it can just be a good improvement (why should we give=20
 them MT
 that is slower than CMWC if they neither know default rand()=20
 weakness?)

 MT has a complex implementation, I hope std.random MT was=20
 tested :)

 Il giorno ven, 20/07/2012 alle 13.16 +0200, monarch_dodra ha=20
 scritto:

 On Friday, 20 July 2012 at 09:47:52 UTC, Andrea Fontana wrote:
 I read that default RNG in phobos is Mersenne Twister.

 I think it would be a good idea to replace it with
 complementary-multiply-with-carry (cmwc). CMWC is faster=20
 (use simpler
 math), has a longer period (standard implementation has a=20
 2^131104
 period vs 2^19937 of current MT implementation in phobos)=20
 and passed
 diehard tests (mt passes them too)

 And of course it's very easy to implement:
 http://en.wikipedia.org/wiki/Multiply-with-carry#Implementation
=20 I'd say the problem is that it is not a very widespread or=20 known PRNG. =20 While I wouldn't be against adding it to the library ("I see=20 no reason not to add it"), making it the _default_ PRNG is a=20 whole other story. =20 Users that choose "default" want something reliable,=20 documented, trustworthy etc... =20 Multiply With Carry seems like it is Cutting Edge to me, not=20 yet widespread, known or tested. I'd say it should only be=20 used by those that explicitly request it's usage.
=20 But Mersenne Twister has a cooler name :o) =20 'Multiply with carry' is so blah. You'll need to come up with a sexy new name for it. =20 Cheers, Craig =20
Marsaglia Tornado?
Jul 20 2012
parent "Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:
On Friday, 20 July 2012 at 14:11:18 UTC, Andrea Fontana wrote:
 Il giorno ven, 20/07/2012 alle 16.05 +0200, Craig Dillabaugh ha 
 scritto:

 On Friday, 20 July 2012 at 12:51:25 UTC, Andrea Fontana wrote:
clip...
 
 But Mersenne Twister has a cooler name :o)
 
 'Multiply with carry' is so blah. You'll need to come up with a
 sexy new name for it.
 
 Cheers,
 Craig
 
Marsaglia Tornado?
Very nice ... and it even uses the same acronym. Craig
Jul 20 2012
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/20/12 8:51 AM, Andrea Fontana wrote:
 CMWC is proven to be a valid method and it passes diehard tests. It was
 written by prof. George Marsiglia (he developed xorshift too - included
 in std.random). He was one of the best experts in PRNG.
Would be great if you wanted to contribute an implementation of CMWC to std.random. Thanks, Andrei
Jul 20 2012
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 20 July 2012 at 21:45:17 UTC, Andrei Alexandrescu 
wrote:
 On 7/20/12 8:51 AM, Andrea Fontana wrote:
 CMWC is proven to be a valid method and it passes diehard 
 tests. It was
 written by prof. George Marsiglia (he developed xorshift too - 
 included
 in std.random). He was one of the best experts in PRNG.
Would be great if you wanted to contribute an implementation of CMWC to std.random. Thanks, Andrei
That's something I could undertake confidently. Writing both an actual engine, and creating aliases for pre-optimized schemes. I'd just have to wait to finish my current development (I don't know how to have parallel forks).
Jul 21 2012
next sibling parent "David Nadlinger" <see klickverbot.at> writes:
On Saturday, 21 July 2012 at 10:28:58 UTC, monarch_dodra wrote:
 I'd just have to wait to finish my current development (I don't 
 know how to have parallel forks).
Just create a single GitHub fork with multiple branches, one for each feature/bug you are working on. After pushing them to GitHub, you can create separate pull requests for each of them. David
Jul 21 2012
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/21/12 6:28 AM, monarch_dodra wrote:
 On Friday, 20 July 2012 at 21:45:17 UTC, Andrei Alexandrescu wrote:
 On 7/20/12 8:51 AM, Andrea Fontana wrote:
 CMWC is proven to be a valid method and it passes diehard tests. It was
 written by prof. George Marsiglia (he developed xorshift too - included
 in std.random). He was one of the best experts in PRNG.
Would be great if you wanted to contribute an implementation of CMWC to std.random. Thanks, Andrei
That's something I could undertake confidently. Writing both an actual engine, and creating aliases for pre-optimized schemes. I'd just have to wait to finish my current development (I don't know how to have parallel forks).
That would be awesome (both RNG and getting fluent with parallel forks). Andrei
Jul 21 2012
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Saturday, 21 July 2012 at 16:32:56 UTC, Andrei Alexandrescu 
wrote:
 On 7/21/12 6:28 AM, monarch_dodra wrote:
 On Friday, 20 July 2012 at 21:45:17 UTC, Andrei Alexandrescu 
 wrote:
 On 7/20/12 8:51 AM, Andrea Fontana wrote:
 CMWC is proven to be a valid method and it passes diehard 
 tests. It was
 written by prof. George Marsiglia (he developed xorshift too 
 - included
 in std.random). He was one of the best experts in PRNG.
Would be great if you wanted to contribute an implementation of CMWC to std.random. Thanks, Andrei
That's something I could undertake confidently. Writing both an actual engine, and creating aliases for pre-optimized schemes. I'd just have to wait to finish my current development (I don't know how to have parallel forks).
That would be awesome (both RNG and getting fluent with parallel forks). Andrei
I gave it a shot, but I just couldn't find enough documentation on CMWC to write a correctly parametrizable engine. I honestly don't have much to go by other than wikipedia's example. I could force the implementation, but I'd have zero faith in it's reliability. That said, I don't think it would be a bad idea to add a Lagged Fibonacci generator first. The generator already exists in boost, and is much more documented. The actual development would just require a port. I'll try to get *that* done, but for CMWC, I'm out. Sorry. The bright side is I learned to parallel fork :D
Jul 24 2012
next sibling parent reply "Andrea Fontana" <nospam example.com> writes:
On Tuesday, 24 July 2012 at 16:09:15 UTC, monarch_dodra wrote:
 On Saturday, 21 July 2012 at 16:32:56 UTC, Andrei Alexandrescu 
 wrote:
 On 7/21/12 6:28 AM, monarch_dodra wrote:
 On Friday, 20 July 2012 at 21:45:17 UTC, Andrei Alexandrescu 
 wrote:
 On 7/20/12 8:51 AM, Andrea Fontana wrote:
 CMWC is proven to be a valid method and it passes diehard 
 tests. It was
 written by prof. George Marsiglia (he developed xorshift 
 too - included
 in std.random). He was one of the best experts in PRNG.
Would be great if you wanted to contribute an implementation of CMWC to std.random. Thanks, Andrei
That's something I could undertake confidently. Writing both an actual engine, and creating aliases for pre-optimized schemes. I'd just have to wait to finish my current development (I don't know how to have parallel forks).
That would be awesome (both RNG and getting fluent with parallel forks). Andrei
I gave it a shot, but I just couldn't find enough documentation on CMWC to write a correctly parametrizable engine. I honestly don't have much to go by other than wikipedia's example. I could force the implementation, but I'd have zero faith in it's reliability. That said, I don't think it would be a bad idea to add a Lagged Fibonacci generator first. The generator already exists in boost, and is much more documented. The actual development would just require a port. I'll try to get *that* done, but for CMWC, I'm out. Sorry. The bright side is I learned to parallel fork :D
I see there's an implementation in tango for d1, with params too :) Have you looked for Marsaglia's paper?
Jul 24 2012
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, July 25, 2012 00:01:03 Andrea Fontana wrote:
 I see there's an implementation in tango for d1, with params too
Which does Phobos no good unless you can get permission from the author(s) of that code to switch the license to Boost. Without that, you should probably avoid even looking at it if you're going to be working on an implementation for it so that you eliminate any risk of licensing issues. There's a lot of great work which went into Tango, but as long as its license is incompatible with Boost, it can't be put into Phobos at all, though it can still obviously be used by those who want to (especially now that there's a D2 fork of Tango). - Jonathan M Davis
Jul 24 2012
parent reply Andrea Fontana <nospam example.com> writes:
He want docs, here he can understand how it works.=20

Some good params for CMWC can be found here and was provided by G.
Marsaglia:
http://blacklen.wordpress.com/2011/05/15/prng-3-complementary-multiply-with=
-carry/

A code for java "you're free to use this code as you want" can be find
here:
http://www.javaprogrammingforums.com/blogs/helloworld922/11-complimentary-m=
ultiply-carry-better-way-generate-pseudo-random-numbers.html

Here some code and overview from george marsaglia:
http://school.anhb.uwa.edu.au/personalpages/kwessen/shared/Marsaglia03.html

Some considerations:
http://www.powerbasic.com/support/pbforums/showthread.php?t=3D50216


Il giorno mar, 24/07/2012 alle 18.18 -0400, Jonathan M Davis ha scritto:

 On Wednesday, July 25, 2012 00:01:03 Andrea Fontana wrote:
 I see there's an implementation in tango for d1, with params too
=20 Which does Phobos no good unless you can get permission from the author(s=
) of=20
 that code to switch the license to Boost. Without that, you should probab=
ly=20
 avoid even looking at it if you're going to be working on an implementati=
on=20
 for it so that you eliminate any risk of licensing issues. There's a lot =
of=20
 great work which went into Tango, but as long as its license is incompati=
ble=20
 with Boost, it can't be put into Phobos at all, though it can still obvio=
usly=20
 be used by those who want to (especially now that there's a D2 fork of Ta=
ngo).
=20
 - Jonathan M Davis
Jul 25 2012
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
Thanks for the links, and the info regarding copyrights. I'll 
stick with my decision to *start* with Lagged Fibonacci. I'll let 
the experience of that decide for me if I want to tackle CMWC or 
not (provided I even succeed with LF).

PS: I like the name Marsaglia Tornado, but the initials are mt...
Jul 25 2012
parent Andrea Fontana <nospam example.com> writes:
Of course it was a joke :)

Il giorno mer, 25/07/2012 alle 10.41 +0200, monarch_dodra ha scritto:

 PS: I like the name Marsaglia Tornado, but the initials are mt...
Jul 25 2012
prev sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 24 July 2012 at 16:09:15 UTC, monarch_dodra wrote:
 That said, I don't think it would be a bad idea to add a Lagged 
 Fibonacci generator first. The generator already exists in 
 boost, and is much more documented. The actual development 
 would just require a port.
Hi There! I wrote the port. https://github.com/D-Programming-Language/phobos/pull/727 It works as intended, AFAIK, as it generates the exact same sequence that boost does. I'd be mighty pleased to have code review done on it. The details are on the pull page, but I guess you can also leave some feedback here.
Jul 30 2012
prev sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 20/07/12 11:47, Andrea Fontana wrote:
 I think it would be a good idea to replace it with
 complementary-multiply-with-carry (cmwc). CMWC is faster (use simpler math),
has
 a longer period (standard implementation has a 2^131104 period vs 2^19937 of
 current MT implementation in phobos) and passed diehard tests (mt passes them
too)
That reminds me ... it might be an idea to implement the Diehard tests for D, as part of a test suite for std.random. Rigorously testing pseudorandom functionality is not really my area of expertise, but it's something that is important to do for any code that might have a scientific application (quite early on in my experience of writing simulations, I had the lovely experience of having to rewrite and rerun a whole load of code because of poor RNG choice; fortunately it didn't affect anything that had been published). Some years ago, there was an observed departure from randomness in the default MATLAB RNG which must have resulted in all kinds of false conclusions and results being out there in the scientific literature. On the subject of default RNG -- the use of Mersenne Twister is very widespread as a default method (used in MATLAB/Octave, R, and it's the recommended method in GSL and Boost). So it may be a good default choice for D just by virtue of easy comparison with other tools.
Jul 20 2012