www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Mersenne Twister Seeding and UUIDs

reply Andrew Talbot <andrew.talbot talbotville.com> writes:
Forgive what may be the unintelligible ramblings of an ignorant hobbyist, 
but, if I am not mistaken, the Mersenne Twister implementation in std.random 
currently can be seeded only with a 32-bit unsigned integer, which I presume 
gives it 2^32 starting points, whereas I believe there should also be an 
alternative option to seed it with an array of up to 624 uintS, so that 
potentially it can be started in any one of its 19,937 internal states. I 
may be wrong, but I believe that to have instant access to enough of the 
"state space" to be able to generate a large number of unique random UUIDs 
this second seeding option may be necessary.

-- 
Andy Talbot.
Jun 09 2012
next sibling parent reply Andrew Talbot <andrew.talbot talbotville.com> writes:
Andrew Talbot wrote:

 which I presume gives it 2^32 starting points, whereas I believe there
 should also be an alternative option to seed it with an array of up to 624
 uintS, so that potentially it can be started in any one of its 19,937
 internal states.

Of course I meant "...in any one of its 2^19,937 internal states". -- Andy Talbot.
Jun 10 2012
parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Sunday, 10 June 2012 at 08:20:37 UTC, Andrew Talbot wrote:
 Andrew Talbot wrote:

 which I presume gives it 232 starting points, whereas I 
 believe there should also be an alternative option to seed it 
 with an array of up to 624 uintS, so that potentially it can 
 be started in any one of its 19,937 internal states.

Of course I meant "...in any one of its 2^19,937 internal states".

This does take me back a few years when I was making a PRNG that accepted an int for seeding (to be compatible with C's srand). My solution ended up being that if you didn't use the RNG before you seeded it again it would append to the seed internally; So 2 int's worth would be a 64bit seed. Anyways it's mostly a thought. I'm sure you'll get far better answers from everyone else.
Jun 13 2012
prev sibling next sibling parent reply Johannes Pfau <nospam example.com> writes:
Am Sun, 10 Jun 2012 00:24 +0100
schrieb Andrew Talbot <andrew.talbot talbotville.com>:

 Forgive what may be the unintelligible ramblings of an ignorant
 hobbyist, but, if I am not mistaken, the Mersenne Twister
 implementation in std.random currently can be seeded only with a
 32-bit unsigned integer, which I presume gives it 2^32 starting
 points, whereas I believe there should also be an alternative option
 to seed it with an array of up to 624 uintS, so that potentially it
 can be started in any one of its 19,937 internal states. I may be
 wrong, but I believe that to have instant access to enough of the
 "state space" to be able to generate a large number of unique random
 UUIDs this second seeding option may be necessary.
 

Could someone who's familiar with RNGs answer this question? This seems to be important for st.uuid, we should get this right.
Jun 11 2012
next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 11/06/12 18:15, Johannes Pfau wrote:
 Am Sun, 10 Jun 2012 00:24 +0100
 schrieb Andrew Talbot<andrew.talbot talbotville.com>:

 Forgive what may be the unintelligible ramblings of an ignorant
 hobbyist, but, if I am not mistaken, the Mersenne Twister
 implementation in std.random currently can be seeded only with a
 32-bit unsigned integer, which I presume gives it 2^32 starting
 points, whereas I believe there should also be an alternative option
 to seed it with an array of up to 624 uintS, so that potentially it
 can be started in any one of its 19,937 internal states. I may be
 wrong, but I believe that to have instant access to enough of the
 "state space" to be able to generate a large number of unique random
 UUIDs this second seeding option may be necessary.

Could someone who's familiar with RNGs answer this question? This seems to be important for st.uuid, we should get this right.

In the Boost C++ implementation it certainly accepts a range as input: template<class It> void seed(It& first, It last) { int j; for(j = 0; j < n && first != last; ++j, ++first) x[j] = *first; i = n; if(first == last && j < n) throw std::invalid_argument("mersenne_twister::seed"); } Some information on the seeding issue can be found here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html
Jun 11 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/11/12 12:32 PM, Joseph Rushton Wakeling wrote:
 On 11/06/12 18:15, Johannes Pfau wrote:
 Could someone who's familiar with RNGs answer this question? This seems
 to be important for st.uuid, we should get this right.

In the Boost C++ implementation it certainly accepts a range as input: template<class It> void seed(It& first, It last) { int j; for(j = 0; j < n && first != last; ++j, ++first) x[j] = *first; i = n; if(first == last && j < n) throw std::invalid_argument("mersenne_twister::seed"); } Some information on the seeding issue can be found here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html

We should have the same in std.random. Could anyone please initiate a pull request? Thanks, Andrei
Jun 11 2012
parent Andrew Talbot <andrew.talbot talbotville.com> writes:
Regarding the mass production of random UUIDs, I believe that one would have 
to provide a seed that was at least 128 bits (or 16 ubyteS) wide in order to 
have uniform probability of generating any one random UUID from the entire 
set of 2^128 possible values. (If I read the following web page 
[http://randomlib.sourceforge.net/html/seeds.html] correctly, there would 
seem to be a strong probability of collisions when generating more than 2^16 
such UUIDs if only a 32-bit seed were used.)

In the same vein, when shuffling playing cards I reckon one would need a 
226-bit (or 29-ubyte) seed to cover the entire set of 52! (or approximately 
8 x 10^67) possible permutations. It may be that using even more bits is 
advisable.

I wish I understood PRNGs. I don't, but I think the above is true.

Thanks to everyone!

-- 
Andy.
Jun 13 2012
prev sibling next sibling parent Jens Mueller <jens.k.mueller gmx.de> writes:
Johannes Pfau wrote:
 Am Mon, 11 Jun 2012 13:09:26 -0500
 schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 
 On 6/11/12 12:32 PM, Joseph Rushton Wakeling wrote:
 On 11/06/12 18:15, Johannes Pfau wrote:
 Could someone who's familiar with RNGs answer this question? This
 seems to be important for st.uuid, we should get this right.

In the Boost C++ implementation it certainly accepts a range as input: template<class It> void seed(It& first, It last) { int j; for(j = 0; j < n && first != last; ++j, ++first) x[j] = *first; i = n; if(first == last && j < n) throw std::invalid_argument("mersenne_twister::seed"); } Some information on the seeding issue can be found here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html

We should have the same in std.random. Could anyone please initiate a pull request? Thanks, Andrei

Here's a first try: https://gist.github.com/2916360 Three questions: * As seed is a normal function right now, I can't overload it with a template. Is it safe to make the original seed a template as well, so seedRange could be named seed, or would that break the API?

What do you mean by you can't overload it? Does it not compile? I think it should.
 * How should the seed array/range be generated? By repeatedly calling
   unpredictableSeed?

No idea what is recommended from cryptographic point of view.
 * Is there a Range which repeatedly calls a function? Similar to
   repeat, but not repeating a value.

Here is one way to do it. seedRange(map!((a) => unpredictableSeed)(iota(0, 624))); Probably, there is a more straightforward way. I simplified your code a bit. enum n = 624; enforce(range.length >= n, "MersenneTwisterEngine.seedRange: Input range didn't provide enough" " elements"); uint mt[]; mt.length = n; copy(range, mt); But I'm unsure whether it still does what you intended. Jens
Jun 12 2012
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 11/06/12 19:09, Andrei Alexandrescu wrote:
 We should have the same in std.random. Could anyone please initiate a pull
request?

I'll see what I can do. Is it OK to pretty much copy the Boost code, bar D-ifying it a bit? The licence is identical, so this shouldn't be an issue AFAICS.
Jun 12 2012
prev sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 12/06/12 13:15, Joseph Rushton Wakeling wrote:
 I'll see what I can do. Is it OK to pretty much copy the Boost code, bar
 D-ifying it a bit? The licence is identical, so this shouldn't be an issue
AFAICS.

... hit Reply before all the other overnight (for me) emails arrived. D'oh. :-\
Jun 12 2012
prev sibling next sibling parent reply Johannes Pfau <nospam example.com> writes:
Am Mon, 11 Jun 2012 13:09:26 -0500
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 On 6/11/12 12:32 PM, Joseph Rushton Wakeling wrote:
 On 11/06/12 18:15, Johannes Pfau wrote:
 Could someone who's familiar with RNGs answer this question? This
 seems to be important for st.uuid, we should get this right.

In the Boost C++ implementation it certainly accepts a range as input: template<class It> void seed(It& first, It last) { int j; for(j = 0; j < n && first != last; ++j, ++first) x[j] = *first; i = n; if(first == last && j < n) throw std::invalid_argument("mersenne_twister::seed"); } Some information on the seeding issue can be found here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html

We should have the same in std.random. Could anyone please initiate a pull request? Thanks, Andrei

Here's a first try: https://gist.github.com/2916360 Three questions: * As seed is a normal function right now, I can't overload it with a template. Is it safe to make the original seed a template as well, so seedRange could be named seed, or would that break the API? * How should the seed array/range be generated? By repeatedly calling unpredictableSeed? * Is there a Range which repeatedly calls a function? Similar to repeat, but not repeating a value.
Jun 12 2012
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, June 12, 2012 11:48:11 Jens Mueller wrote:
 Johannes Pfau wrote:
 * As seed is a normal function right now, I can't overload it with a
 
   template. Is it safe to make the original seed a template as well, so
   seedRange could be named seed, or would that break the API?

What do you mean by you can't overload it? Does it not compile? I think it should.

You can't currently overload templated functions with non-templated functions or vice versa: http://d.puremagic.com/issues/show_bug.cgi?id=1528 The solution is to just give the non-templated functions an empty template parameter list. It shouldn't break any code. Worst case, you might have to add constraints to the fully templated version to make it so that it doesn't match any of the arguments that the formerly non-templated function accepts. - Jonathan M Davis
Jun 12 2012
prev sibling parent Jens Mueller <jens.k.mueller gmx.de> writes:
Jonathan M Davis wrote:
 On Tuesday, June 12, 2012 11:48:11 Jens Mueller wrote:
 Johannes Pfau wrote:
 * As seed is a normal function right now, I can't overload it with a
 
   template. Is it safe to make the original seed a template as well, so
   seedRange could be named seed, or would that break the API?

What do you mean by you can't overload it? Does it not compile? I think it should.

You can't currently overload templated functions with non-templated functions or vice versa: http://d.puremagic.com/issues/show_bug.cgi?id=1528 The solution is to just give the non-templated functions an empty template parameter list. It shouldn't break any code. Worst case, you might have to add constraints to the fully templated version to make it so that it doesn't match any of the arguments that the formerly non-templated function accepts.

Thanks, I didn't know. Better said I forgot. Because I already voted on this issue. Jens
Jun 12 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Tue, 12 Jun 2012 11:48:11 +0200
schrieb Jens Mueller <jens.k.mueller gmx.de>:

 * As seed is a normal function right now, I can't overload it with a
   template. Is it safe to make the original seed a template as
 well, so seedRange could be named seed, or would that break the API?

What do you mean by you can't overload it? Does it not compile? I think it should.
 * How should the seed array/range be generated? By repeatedly
 calling unpredictableSeed?

No idea what is recommended from cryptographic point of view.
 * Is there a Range which repeatedly calls a function? Similar to
   repeat, but not repeating a value.

Here is one way to do it. seedRange(map!((a) => unpredictableSeed)(iota(0, 624))); Probably, there is a more straightforward way.

Thanks, that's good enough. It would be even better if it was an infinite range, though.
 I simplified your code a bit.
 
 enum n = 624;
 enforce(range.length >= n, "MersenneTwisterEngine.seedRange: Input
 range didn't provide enough" " elements");

I can't really use .length, the code has to work with infinite ranges as well.
 uint mt[];
 mt.length = n;
 copy(range, mt);
 
 But I'm unsure whether it still does what you intended.

Well, the code was a simple 1:1 port of the boost function and as I'm not familiar with RNGs I don't really want to change (and probably break) it.
 
 Jens

Jun 12 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Tue, 12 Jun 2012 13:14:23 +0200
schrieb Johannes Pfau <nospam example.com>:

 Am Tue, 12 Jun 2012 11:48:11 +0200
 schrieb Jens Mueller <jens.k.mueller gmx.de>:
 
 * As seed is a normal function right now, I can't overload it
 with a template. Is it safe to make the original seed a template
 as well, so seedRange could be named seed, or would that break
 the API?

What do you mean by you can't overload it? Does it not compile? I think it should.
 * How should the seed array/range be generated? By repeatedly
 calling unpredictableSeed?

No idea what is recommended from cryptographic point of view.
 * Is there a Range which repeatedly calls a function? Similar to
   repeat, but not repeating a value.

Here is one way to do it. seedRange(map!((a) => unpredictableSeed)(iota(0, 624))); Probably, there is a more straightforward way.

Thanks, that's good enough. It would be even better if it was an infinite range, though.

gen.seed(map!((a) => unpredictableSeed)(repeat(0))); seems to work.
Jun 12 2012
prev sibling parent Johannes Pfau <nospam example.com> writes:
Am Mon, 11 Jun 2012 13:09:26 -0500
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 We should have the same in std.random. Could anyone please initiate a 
 pull request?
 
 Thanks,
 
 Andrei

https://github.com/D-Programming-Language/phobos/pull/627
Jun 12 2012