
digitalmars.D - isUniformRNG

reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
In std.random, is the "isUniformRNG" intended to determine whether the 
given type is *some* RNG or just a *specific* form of RNG? Because I 
don't see any "isRNG" that's more general.

More importantly, should a crypto RNG count as "isUniformRNG"?
May 03 2014
parent reply Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 04/05/14 04:03, Nick Sabalausky via Digitalmars-d wrote:
 In std.random, is the "isUniformRNG" intended to determine whether the given
 type is *some* RNG or just a *specific* form of RNG? Because I don't see any
 "isRNG" that's more general.
Yes, it is meant to denote that the type is a _uniform_ random number generator, that is, it's an input range whose elements are numbers drawn from a uniform distribution. One could even be more strict as per the C++11 standard and specify that it's a range whose elements are unsigned integral types uniformly distributed in the closed interval [min, max].

Personally speaking, I have spent quite a lot of time being uncertain about whether the constraint to only unsigned integral types is entirely appropriate (e.g. Boost::Random contains RNGs whose underlying type is floating-point), but it _does_ greatly simplify many design factors. For example, the fact that you're given a guarantee about bounds makes things much simpler when you move on to (say) random distributions, such as the uniform distribution, where you want to control precisely the upper and lower bound behaviour (closed or open).

About a more general "isRNG" template: can you be more precise about what you are interested in achieving with this? Generally speaking I would find it rather dangerous to go passing around sources of randomness without having some understanding of their properties :-)

I should also add: the C++11 standard distinguishes between uniform random number generators (the aforementioned uniformly-distributed-unsigned-integers-in-[min, max]) and uniform _distributions_, which take the output of a uniform RNG and transform it into random values drawn from other distributions. Again, here we can see the importance of C++11's strict definition of a uniform RNG: suppose we take a uniform _distribution_ of floating-point numbers in the half-open range [0, 1) and use it as the _source_ of randomness for a uniform distribution in the closed interval [0, 1]. It won't work, because the assumptions about bounds won't hold.
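For concreteness, this is how the definition plays out against the existing std.random (a quick sketch; it assumes a reasonably recent Phobos where isUniformRNG and Mt19937 have their documented interfaces):

```d
import std.random : isUniformRNG, Mt19937;

void main()
{
    // A uniform RNG in std.random's sense: an input range of unsigned
    // integers, uniformly distributed over the closed interval [min, max].
    static assert(isUniformRNG!Mt19937);

    // The inclusive bounds are part of the interface:
    static assert(Mt19937.min == 0);
    static assert(Mt19937.max == uint.max);

    // An arbitrary input range of uints is *not* enough to qualify:
    import std.range : iota;
    static assert(!isUniformRNG!(typeof(iota(0u, 10u))));
}
```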
 More importantly, should a crypto RNG count as "isUniformRNG"?
If it's producing unsigned integral types uniformly distributed in a closed interval [min, max], why not?
May 04 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/4/2014 3:47 AM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 About a more general "isRNG" template: can you be more precise about
 what you are interested in achieving with this?  Generally speaking I
 would find it rather dangerous to go passing around sources of
 randomness without having some understanding of their properties :-)
Fair enough point, I'll explain my situation:

On a general level, I'm trying to grok the whole intent of isUniformRNG and see whether or not anything else may ever be needed in addition to isUniformRNG. I'm not trying to push an "isRNG", just trying to understand std.random's current intentions and reasoning, so I know how to work with it appropriately.

But more immediately, since Phobos lacks a crypto-secure RNG, I'm implementing NIST's Hash_DRBG (backed by the OS-provided RtlGenRandom/CryptGenRandom and "/dev/random" as entropy sources). Hopefully I can get it into a Phobos-acceptable state.

Now, I can follow the spec for the Hash_DRBG algorithm well enough, but I'm not really solid on random-number theory, so I can't be certain whether or not isUniformRNG is appropriate for this. I would assume "yes", but I don't want to assume. Hence my inquiries.
May 04 2014
parent reply Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 04/05/14 16:28, Nick Sabalausky via Digitalmars-d wrote:
 On a general level, I'm trying to grok the whole intent of isUniformRNG and see
 whether or not anything else may ever be needed in addition to isUniformRNG.
I'm
 not trying to push an "isRNG", just trying to understand std.random's current
 intentions and reasoning, so I know how to work with it appropriately.
Yea, it's a good question. I think I would answer it as follows.

From a design point of view, it's helpful to distinguish between code that acts as a fundamental source of (usually pseudo-) randomness, versus stuff that transforms the output of those random sources into other forms (say, uniformly-distributed floating point numbers, or rolls of a 6-sided die, or permutations of an array, or whatever else). In other words, it's helpful to distinguish between _generators_ of randomness, versus _users_ of randomness.

Now, in principle, there are many different kinds of random generator available, not all of them drawing from a uniform distribution. But again, from a design point of view, it's helpful to constrain the range of functionality assumed, and uniformly-distributed random numbers are a very good foundation from which to build, because they can reliably be used to generate numbers drawn from any other probability distribution. On top of that, it's useful to have the constraint of a distribution on a closed interval that has explicitly included minimum and maximum values, because that greatly simplifies the checks and assumptions one has to make in writing functions that make use of the random generator.

Hence you come to this definition of a "uniform random number generator", which is the fundamental building block of all other random-number functionality. What the isUniformRNG template is _really_ asking is, "Is this going to give me a valid source of randomness to plug in to all the other random-number-generation functions, so that they'll do what I expect?"

Simple example of how that can be important: suppose you have a random generator that gives you uniform numbers in the half-open interval [min, max). Now run that through std.random.uniform!"[]"(). You will not in fact get random numbers drawn from the closed interval you have specified.
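The boundary behaviour can be checked directly with std.random's uniform (a small sketch; the closed-bound guarantee relies on the generator honestly reporting inclusive min/max, which Mt19937 does):

```d
import std.random : Mt19937, uniform;

void main()
{
    auto rng = Mt19937(42);

    foreach (i; 0 .. 10_000)
    {
        // Closed interval: 5 is a legal return value...
        auto x = uniform!"[]"(0, 5, rng);
        assert(x >= 0 && x <= 5);

        // ...half-open interval: 5 must never appear.
        auto y = uniform!"[)"(0, 5, rng);
        assert(y >= 0 && y < 5);
    }
}
```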
 But more immediately, since Phobos lacks a crypto-secure RNG, I'm implementing
 NIST's Hash_DRBG (backed by the OS-provided RtlGenRandom/CryptGenRandom and
 "/dev/random" as entropy sources). Hopefully I can get it into a
 Phobos-acceptable state.
That's great -- I really look forward to seeing your work :-) Can you give me some links to the spec?
 Now, I can follow the spec for the Hash_DRBG algorithm well enough, but I'm not
 really solid on random-number theory, so I can't be certain whether or not
 isUniformRNG is appropriate for this. I would assume "yes", but I don't want to
 assume. Hence my inquiries.
I don't know that this necessarily needs too much in the way of random-number theory, but there are 3 questions that really need answering here:

 * What's the type of number that this algorithm produces?

 * Are these numbers uniformly distributed?

 * What's the min and max values produced, and are these inclusive bounds, i.e. are the random numbers drawn from the closed [min, max] interval?

Can you clarify?
May 04 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/4/2014 11:38 AM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 04/05/14 16:28, Nick Sabalausky via Digitalmars-d wrote:
 On a general level, I'm trying to grok the whole intent of
 isUniformRNG and see
 whether or not anything else may ever be needed in addition to
 isUniformRNG. I'm
 not trying to push an "isRNG", just trying to understand std.random's
 current
 intentions and reasoning, so I know how to work with it appropriately.
Yea, it's a good question. I think I would answer it as follows. [...]
Ok, I see. Thanks.
 But more immediately, since Phobos lacks a crypto-secure RNG, I'm
 implementing
 NIST's Hash_DRBG (backed by the OS-provided
 RtlGenRandom/CryptGenRandom and
 "/dev/random" as entropy sources). Hopefully I can get it into a
 Phobos-acceptable state.
That's great -- I really look forward to seeing your work :-) Can you give me some links to the spec?
The spec for Hash_DRBG is in "NIST Special Publication 800-90A", which also includes a few other crypto RNGs (including, unfortunately, the one with a known NSA backdoor, Dual_EC_DRBG, but it's easy enough to just not bother implementing that one):
http://csrc.nist.gov/publications/nistpubs/800-90A/SP800-90A.pdf

That paper is linked to from here (seems to be down right now though):
http://csrc.nist.gov/groups/ST/toolkit/random_number.html

Which I found from here:
http://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator#Standards
 Now, I can follow the spec for the Hash_DRBG algorithm well enough,
 but I'm not
 really solid on random-number theory, so I can't be certain whether or
 not
 isUniformRNG is appropriate for this. I would assume "yes", but I
 don't want to
 assume. Hence my inquiries.
I don't know that this necessarily needs too much in the way of random-number theory, but there are 3 questions that really need answering here:

     * What's the type of number that this algorithm produces?
Just a string of random bits. Effectively unsigned integers.
     * What's the min and max values produced, and are these
       inclusive bounds, i.e. are the random numbers drawn
       from the closed [min, max] interval?
It's all based on SHA hashes (of seeds, internal state, and some other stuff) and it doesn't explicitly exclude any output values. So if SHA is working as expected, then yea, I'd imagine it should be a closed interval over [0, 2^numBitsRequested - 1] where "numBitsRequested" is permitted by the spec to be any positive integer.
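That implied inclusive upper bound is easy to state in code (an illustrative helper; "numBitsRequested" is the spec's parameter name, the function itself is just mine):

```d
// Inclusive maximum for an n-bit request, i.e. 2^^n - 1,
// computed without overflowing for n up to 64.
ulong maxForBits(uint numBitsRequested)
{
    assert(numBitsRequested >= 1 && numBitsRequested <= 64);
    return numBitsRequested == 64
         ? ulong.max
         : (1uL << numBitsRequested) - 1;
}

void main()
{
    assert(maxForBits(1) == 1);       // one random bit: [0, 1]
    assert(maxForBits(8) == 255);     // one random byte: [0, 255]
    assert(maxForBits(64) == ulong.max);
}
```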
     * Are these numbers uniformly distributed?
This is the part I've been a little unclear on, although maybe I'm just being paranoid about it. Doesn't a uniform distribution carry certain reliable statistical properties? I don't know whether that could be used to improve the likelihood of guessing future values. If so, then maybe the algorithm is designed to avoid that?

Then again, wouldn't the only alternative to uniform distribution be a weighted distribution? I can't imagine an RNG intended for crypto would be deliberately weighted (unless maybe there were some randomness to the weights...if that even makes any sense at all).

Maybe I'm just overthinking it?
May 04 2014
next sibling parent reply Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 04/05/14 19:42, Nick Sabalausky via Digitalmars-d wrote:
 Just a string of random bits. Effectively unsigned integers.
Ahh, OK. So in practice you can probably template it on an unsigned integral type (which could include bool) and it'll just take the appropriate number of bits from the stream, no ... ?

Cf. what I did with /dev/urandom etc.:
https://github.com/WebDrake/std.random2/blob/master/std/random2/device.d#L122
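The pattern being suggested can be sketched like this (illustrative names only: "Source" here is any hypothetical type with a read(ubyte[]) method that fills the buffer with random bytes, as the linked /dev/urandom wrapper and HashDRBGStream both do):

```d
import std.traits : isUnsigned;

// Draw exactly T.sizeof random bytes from the stream and reinterpret
// them as the requested unsigned integral type.
T nextTyped(T, Source)(ref Source src)
    if (isUnsigned!T)
{
    union Buf { ubyte[T.sizeof] bytes; T value; }
    Buf b;
    src.read(b.bytes[]);   // assumed: fills the whole buffer
    return b.value;
}
```

The union keeps the byte-to-integer reinterpretation alignment-safe; the resulting value is endianness-dependent, which is fine for randomness but worth noting.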
 It's all based on SHA hashes (of seeds, internal state, and some other stuff)
 and it doesn't explicitly exclude any output values. So if SHA is working as
 expected, then yea, I'd imagine it should be a closed interval over [0,
 2^numBitsRequested - 1] where "numBitsRequested" is permitted by the spec to be
 any positive integer.
Yea, sounds right to me.
 This is the part I've been a little unclear on, although maybe I'm just being
 paranoid about it. Doesn't a uniform distribution carry certain reliable
 statistical properties? I don't know whether that could be used to improve the
 likelihood of guessing future values. If so, then maybe the algorithm is
 designed to avoid that?

 Then again, wouldn't the only alternative to uniform distribution be a weighted
 distribution? I can't imagine an RNG intended for crypto would be deliberately
 weighted (unless maybe there were some randomness to the weights...if that even
 makes any sense at all).

 Maybe I'm just overthinking it?
Probably :-)

Let's put it this way: if you think in terms of the individual bits being generated, then from the point of view of the user of the algorithm there obviously has to be no way to decide which bit value is more likely, which corresponds to a uniform distribution of individual bit values. And that in turn will map to a uniform distribution of bit sequences of any length.
May 04 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/4/2014 2:10 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 04/05/14 19:42, Nick Sabalausky via Digitalmars-d wrote:
 Just a string of random bits. Effectively unsigned integers.
Ahh, OK. So in practice you can probably template it on an unsigned integral type (which could include bool) and it'll just take the appropriate number of bits from the stream, no ... ? Cf. what I did with /dev/urandom etc.: https://github.com/WebDrake/std.random2/blob/master/std/random2/device.d#L122
Well, Hash_DRBG isn't really a normal stream since, based on my reading of its spec, it sounds like (for example) requesting one byte four times will give a different result than requesting four bytes all at once (assuming you're starting from the same internal state and there's no reseeding). But aside from that minor detail, yes, that's essentially correct.

And much like /dev/(u)random, you could even make the number of bytes/bits requested a per-call runtime parameter (although that would diverge from the existing std.random interfaces and would require either allocating or taking an output sink, so I don't know whether I'll bother).
 Then again, wouldn't the only alternative to uniform distribution be a
 weighted
 distribution? I can't imagine an RNG intended for crypto would be
 deliberately
 weighted (unless maybe there were some randomness to the weights...if
 that even
 makes any sense at all).

 Maybe I'm just overthinking it?
Probably :-) Let's put it this way: if you think in terms of the individual bits being generated, there obviously has to be, from the point of view of the user of the algorithm, no way to decide which bit value is more likely, which corresponds to a uniform distribution of individual bit values. And that in turn will map to a uniform distribution of bit sequences of any length.
Yea. Plus, this doc about testing these crypto PRNGs... http://csrc.nist.gov/groups/ST/toolkit/rng/documents/SP800-22rev1a.pdf ...does mention the importance of "uniformity". So I think it's probably safe to figure this is a uniform distribution unless some expert chimes in and says otherwise. Thanks for the help.
May 04 2014
next sibling parent reply Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 04/05/14 20:56, Nick Sabalausky via Digitalmars-d wrote:
 So I think it's probably safe to figure this is a uniform distribution unless
 some expert chimes in and says otherwise.

 Thanks for the help.
You're very welcome. Keep me posted on how things go with your implementation! :-)
May 04 2014
next sibling parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/4/2014 3:26 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 04/05/14 20:56, Nick Sabalausky via Digitalmars-d wrote:
 So I think it's probably safe to figure this is a uniform distribution
 unless
 some expert chimes in and says otherwise.

 Thanks for the help.
You're very welcome. Keep me posted on how things go with your implementation! :-)
Will do. Speaking of, this algo uses SHA-1 and SHA-2 (depending on the desired crypto strength of the pseudo-randomness). SHA-1 of course has been in Phobos for awhile. I've recently expanded it to support SHA-2, pending Phobos code review:
https://github.com/D-Programming-Language/phobos/pull/2129
May 04 2014
prev sibling parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/4/2014 3:26 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 04/05/14 20:56, Nick Sabalausky via Digitalmars-d wrote:
 So I think it's probably safe to figure this is a uniform distribution
 unless
 some expert chimes in and says otherwise.

 Thanks for the help.
You're very welcome. Keep me posted on how things go with your implementation! :-)
It's now implemented and working in DAuth's HEAD:
https://github.com/Abscissa/DAuth/blob/master/src/dauth/hashdrbg.d

Usage is pretty simple:

----------------------------
import std.algorithm : take;
import std.array : array;
import dauth.sha;
import dauth.hashdrbg;

HashDRBG!uint rand;
// The above is equivalent to:
// HashDRBG!(uint, SHA512, "DAuth") rand;

// Now use rand just like any other RNG in std.random:
uint[] values1 = rand.take(7).array();

// The algorithm specifically supports generating
// arbitrary lengths at once, so can also do:
HashDRBGStream!uint randStream;
ubyte[] values2;
values2.length = 42;
randStream.read(values2);
----------------------------

Next steps are to support Hash_DRBG's optional "additional input" feature, and to see about Phobos-ifying it and making a pull request.

I'll take a look at your proposed std.random2 too, sorry I haven't had a chance to yet.
May 07 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/7/2014 4:06 PM, Nick Sabalausky wrote:
 // The algorithm specifically supports generating
 // arbitrary lengths at once, so can also do:
 HashDRBGStream!uint randStream;
Oops, that's supposed to be:

HashDRBGStream!() randStream;
//aka:
// HashDRBGStream!(SHA512, "D Crypto RNG") randStream;

The stream version isn't a range and only supports filling a provided ubyte[] buffer. So no element type.
May 07 2014
parent reply "David Nadlinger" <code klickverbot.at> writes:
On Wednesday, 7 May 2014 at 20:11:13 UTC, Nick Sabalausky wrote:
 HashDRBGStream!() randStream;
 //aka:
 // HashDRBGStream!(SHA512, "D Crypto RNG") randStream;

 The stream version isn't a range and only supports filling a 
 provided ubyte[] buffer. So no element type.
Shouldn't that take an ubyte output range instead? David
May 07 2014
parent reply "David Nadlinger" <code klickverbot.at> writes:
On Wednesday, 7 May 2014 at 21:01:20 UTC, David Nadlinger wrote:
 Shouldn't that take an ubyte output range instead?
Erm, scrap that, wasn't thinking. David
May 07 2014
parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/7/2014 5:09 PM, David Nadlinger wrote:
 On Wednesday, 7 May 2014 at 21:01:20 UTC, David Nadlinger wrote:
 Shouldn't that take an ubyte output range instead?
Erm, scrap that, wasn't thinking.
Actually, maybe it should, even if only as an option. Unless I'm the one not thinking now...?
May 07 2014
prev sibling parent reply Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 04/05/14 21:26, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 04/05/14 20:56, Nick Sabalausky via Digitalmars-d wrote:
 So I think it's probably safe to figure this is a uniform distribution unless
 some expert chimes in and says otherwise.

 Thanks for the help.
You're very welcome. Keep me posted on how things go with your implementation! :-)
I should also be thanking you -- this reminded me that I needed to make some key changes to my std.random2 proposal, including this stricter isUniformRNG template: https://github.com/WebDrake/std.random2/commit/a071d8d182eb28516198bb292a0b45f86f4425fe How does this look to people? And would there be interest in seeing this land in the existing std.random ... ?
May 04 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/4/2014 4:08 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 I should also be thanking you -- this reminded me that I needed to make
 some key changes to my std.random2 proposal, including this stricter
 isUniformRNG template:
 https://github.com/WebDrake/std.random2/commit/a071d8d182eb28516198bb292a0b45f86f4425fe


 How does this look to people?  And would there be interest in seeing
 this land in the existing std.random ... ?
Some good looking stuff in there. The separation is a nice improvement, and having a "shuffle" is something we could certainly use.

About avoiding problems from accidentally duplicating internal state, I'm not sure that changing to classes is really necessary for that. I addressed it in HashDRBG by simply making the internal state static. (Oops, or at least I meant to...fixing now...). The various InputRange instantiations (for the different possible element types) all draw from the same source (HashDRBGStream), so this should work out pretty well. I'd imagine the existing std.random algos could do something similar.
May 08 2014
parent reply Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 08/05/14 19:18, Nick Sabalausky via Digitalmars-d wrote:
 Some good looking stuff in there. The separation is a nice improvement, and
 having a "shuffle" is something we could certainly use.
Thanks! :-) There is already a shuffle function in the existing std.random, but it's called randomShuffle:
 About avoiding problems from accidentally duplicating internal state, I'm not
 sure that changing to classes is really necessary for that. I addressed it in
 HashDRBG by simply making the internal state static. (Oops, or at least I meant
 to...fixing now...). The various InputRange instantiations (for the different
 possible elemement types) all draw from the same source (HashDRBGStream), so
 this should work out pretty well. I'd imagine the existing std.random algos
 could do something similar.
That seems a problematic fix to me -- doesn't it mean that there can only ever be one instance of any individual RNG? While many applications wouldn't notice the difference, it's a pretty major breach of encapsulation (and purity).

On a related note -- I like your concept of a random stream. I think it may be useful to adapt as part of the way in which std.random2.device works. I need to compare to other Phobos stream code -- isn't Steven Schveighoffer working on up-to-date stream functionality as part of std.io ... ?
May 08 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/8/2014 5:29 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 08/05/14 19:18, Nick Sabalausky via Digitalmars-d wrote:
 Some good looking stuff in there. The separation is a nice
 improvement, and
 having a "shuffle" is something we could certainly use.
Thanks! :-) There is already a shuffle function in the existing std.random, but it's called randomShuffle:
How'd I either miss or forget that?! ;)
 About avoiding problems from accidentally duplicating internal state,
 I'm not
 sure that changing to classes is really necessary for that. I
 addressed it in
 HashDRBG by simply making the internal state static. (Oops, or at
 least I meant
 to...fixing now...). The various InputRange instantiations (for the
 different
 possible elemement types) all draw from the same source
 (HashDRBGStream), so
 this should work out pretty well. I'd imagine the existing std.random
 algos
 could do something similar.
That seems a problematic fix for me -- doesn't it mean that there can only ever be one instance of any individual RNG?
There can technically be multiple "instances", but yea, they're all effectively tied together. However, I'm leaning towards the belief that's "correct behavior" for a RNG. It's *definitely* correct for a crypto RNG - you certainly wouldn't want two crypto RNGs ever generating the same sequence, not even deliberately. That would defeat the whole point of a crypto RNG.

As for ordinary non-crypto RNGs, I honestly can't imagine any purpose for reliably generating the same values other than "playing back" a previous sequence. But if that's what you want, then IMO it's better to record the sequence of emitted values, or even record/replay the higher-level decisions which the randomness was used to influence.

For randomness, record/replay is just less fiddly and error-prone than reliably regenerating values from an algorithm that intentionally tries to imitate non-predictability. One slip-up while regenerating the sequence (example: trying to replay from the same seed on a RNG that has since had a bug fixed, or on a different architecture which the RNG wasn't well-tested on) and then the inaccuracies just cascade. Plus this way you can swap in a non-deterministic RNG and everything will still work. Or more easily skip back/ahead.

I'm just not seeing a legitimate use-case for multiple states of the same RNG engine (at least on the same thread) that wouldn't be better served by different approach.
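The record/replay idea, as a minimal illustration with std.random's Mt19937 (this is just a sketch of the approach, not anyone's actual implementation):

```d
import std.array : array;
import std.random : Mt19937;
import std.range : take;

void main()
{
    auto rng = Mt19937(99);

    // Record the values actually consumed during the "live" run.
    // (Aside: since Mt19937 is a value type, take() advances a *copy*
    // of rng here, which is exactly the accidental state-duplication
    // pitfall under discussion in this thread.)
    uint[] recorded = rng.take(5).array();

    // Later, feed `recorded` anywhere an input range of uints is
    // accepted; no attempt to regenerate from the seed is needed,
    // so RNG bug fixes or a swapped-in non-deterministic generator
    // can't perturb the replay.
    assert(recorded.length == 5);
}
```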
 While many
 applications wouldn't notice the difference, it's a pretty major breach
 of encapsulation (and purity).
AIUI an RNG is impure by definition. (Well, unless you consider the internal state to be part of the input, but I'd say *that* view breaks encapsulation.)

I'm also unconvinced that it breaks encapsulation anyway. If anything, I think it's better encapsulated since separate instances *won't* behave like they're under "quantum entanglement". It forces separate instances to stay separate.

It *is* a fairly strange situation though. Unlike most encapsulation units, the whole point of an RNG is to be *inconsistent* rather than consistent. So that kinda turns everything on its ear.
 On a related note -- I like your concept of a random stream.  I think it
 may be useful to adapt as part of the way in which std.random2.device
 works.  I need to compare to other Phobos stream code -- isn't Steven
 Schweighoffer working on up-to-date stream functionality as part of
 std.io ... ?
Yea, when I wrote it I figured on proposing an optional stream-like interface for std.random. The Hash_DRBG algorithm is inherently based around the idea that each generated number can be arbitrary-length (not only that - generating e.g. two 8-byte values with Hash_DRBG is fundamentally different from generating eight 2-byte values, and produces different results). And the OS-specific entropy generators are basically streams, too (on both Win and Posix, you request any number of bytes of randomness). So that was just a natural way to implement it. But it did strike me as being generally useful, so I'm glad that you agree.

You're right though, the current state of Phobos streams is kind of a stumbling block. Naturally it'd be best to use the new stream interface, but since that isn't in place, that's an issue. I don't know what the state of the rewrite is. Been awhile since I've heard anything about it.

I figure, at the very least, we could just make the stream interfaces for RNGs private. Then we can open it up whenever we finally do have a new std.stream to conform to. Or we could do an OutputRange instead...but the recent discussion on those makes me a bit hesitant to do so. And I definitely don't wanna perform the contortions to invert it into a non-coroutine-like InputRange.

Hmm, yea, I'm more and more liking the "keep it private for now, and then see what happens" approach...unfortunately... :/
May 08 2014
next sibling parent "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:
On Friday, 9 May 2014 at 00:43:10 UTC, Nick Sabalausky wrote:
 On 5/8/2014 5:29 PM, Joseph Rushton Wakeling via Digitalmars-d 
 wrote:
 That seems a problematic fix for me -- doesn't it mean that 
 there can
 only ever be one instance of any individual RNG?
There can technically be multiple "instances", but yea, they're all effectively tied together. However, I'm leaning towards the belief that's "correct behavior" for a RNG. It's *definitely* correct for a crypto RNG - you certainly wouldn't want two crypto RNGs ever generating the same sequence, not even deliberately. That would defeat the whole point of a crypto RNG. As for ordinary non-crypto RNGs, I honestly can't imaging any purpose for reliably generating the same values other than "playing back" a previous sequence. But if that's what you want, then IMO it's better to record the sequence of emitted values, or even record/replay the higher-level decisions which the randomness was used to influence. For randomness, record/replay is just less fiddly and error-prone than reliably regenerating values from an algorithm that intentionally tries to imitate non-predictability. One slip-up while regenerating the sequence (example: trying to replay from the same seed on a RNG that has since had a bug fixed, or on a different architecture which the RNG wasn't well-tested on) and then the inaccuracies just cascade. Plus this way you can swap in a non-deterministic RNG and everything will still work. Or more easily skip back/ahead. I'm just not seeing a legitimate use-case for multiple states of the same RNG engine (at least on the same thread) that wouldn't be better served by different approach.
I strongly disagree here. If a user has explicitly requested a PRNG, they should be able to rely on its most basic property, being deterministic.
May 09 2014
prev sibling next sibling parent "Colin Grogan" <grogan.colin gmail.com> writes:
On Friday, 9 May 2014 at 00:43:10 UTC, Nick Sabalausky wrote:

 As for ordinary non-crypto RNGs, I honestly can't imaging any 
 purpose for reliably generating the same values other than 
 "playing back" a previous sequence. But if that's what you 
 want, then IMO it's better to record the sequence of emitted 
 values, or even record/replay the higher-level decisions which 
 the randomness was used to influence.
There's lots of uses, e.g. Minecraft.
May 09 2014
prev sibling parent reply Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 09/05/14 02:42, Nick Sabalausky via Digitalmars-d wrote:
 There can technically be multiple "instances", but yea, they're all effectively
 tied together. However, I'm leaning towards the belief that's "correct
behavior"
 for a RNG. It's *definitely* correct for a crypto RNG - you certainly wouldn't
 want two crypto RNGs ever generating the same sequence, not even deliberately.
 That would defeat the whole point of a crypto RNG.
I'm not sure I follow your point about crypto RNGs here. Assuming it's deterministic at all, if someone else knows the seed state (which would be required to get two instances generating the same sequences), aren't you buggered anyway? And isn't the point of a deterministic crypto RNG that its seed state is kept secret, or that it is seeded with non-deterministic values upon instantiation?
 As for ordinary non-crypto RNGs, I honestly can't imaging any purpose for
 reliably generating the same values other than "playing back" a previous
 sequence. But if that's what you want, then IMO it's better to record the
 sequence of emitted values, or even record/replay the higher-level decisions
 which the randomness was used to influence.
Even that will not save you in Minecraft, where the results of different seeds will produce different results in updated versions of the game where there are more random possibilities available (more monsters, more flora and fauna...). ;-)
 For randomness, record/replay is just less fiddly and error-prone than reliably
 regenerating values from an algorithm that intentionally tries to imitate
 non-predictability. One slip-up while regenerating the sequence (example:
trying
 to replay from the same seed on a RNG that has since had a bug fixed, or on a
 different architecture which the RNG wasn't well-tested on) and then the
 inaccuracies just cascade. Plus this way you can swap in a non-deterministic
RNG
 and everything will still work. Or more easily skip back/ahead.
There's a difference between _relying_ on the deterministic sequence from a pseudo-RNG and the principle of design that says if I declare two separate RNG instances,

    SomeRNG rng1;
    SomeRNG rng2;

then the calls to rng1 should not have any effect on the results of the calls to rng2. Or alternatively, if I do,

    SomeRNG rng1;
    SomeRNG rng2;

    rng1.seed(n);
    rng2.seed(n);

    foreach (a, b; lockstep(rng1, rng2).take(10))
    {
        writefln("%s\t%s", a, b);
    }

then the values I get should be the same, because rng1 and rng2 are independent instances that have been identically seeded.
 I'm just not seeing a legitimate use-case for multiple states of the same RNG
 engine (at least on the same thread) that wouldn't be better served by a
 different approach.
We could get into broader discussions, but I don't think it needs to be more complicated than: if I create an instance of a variable of type T, then I should expect that it will be independent of any other instance of a variable of type T, unless I have deliberately created that link.
 AIUI an RNG is impure by definition. (Well, unless you consider the internal
 state to be part of the input, but I'd say *that* view breaks encapsulation.)
Not at all. In fact a good theoretical model of a pseudo-RNG is

  * a state variable, and

  * a pure function that transforms one state variable into another such that the sequence of states is indistinguishable from a random sequence.

You can also add other pure transformations to map the state variable to particular random variates (e.g. unsigned integers in a certain range).

You can encapsulate this quite neatly as the state variable being a private member of a struct or class, with the pure transformation function being .popFront and the state-to-variate mapping being handled by .front. Note that .popFront _is_ pure for all the pseudo-RNGs in std.random2 :-)
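That state-plus-pure-transition model can be sketched as a tiny D input range. (TinyLcg is a hypothetical toy generator, not anything from std.random2; the LCG constants are the well-known Numerical Recipes values, used purely for illustration - it is not a good RNG.)

```d
import std.stdio;

/// Toy pseudo-RNG: a state variable plus a pure transition function.
struct TinyLcg
{
    private uint state;  // the state variable

    this(uint seed) { state = seed; }

    enum bool empty = false;  // infinite input range

    /// Pure mapping from the current state to a variate
    /// (here, trivially, the state itself).
    @property uint front() const pure { return state; }

    /// Pure transformation from one state to the next.
    void popFront() pure
    {
        state = state * 1664525u + 1013904223u;
    }
}

void main()
{
    // Two independent instances seeded identically march through
    // identical state sequences.
    auto a = TinyLcg(42);
    auto b = TinyLcg(42);
    foreach (i; 0 .. 5)
    {
        assert(a.front == b.front);
        a.popFront();
        b.popFront();
    }
    writeln("identical sequences, as expected");
}
```

Because popFront only ever applies the same pure transformation to the private state, the independence of identically-seeded instances falls out of the model for free.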
 I'm also unconvinced that it breaks encapsulation anyway. If anything, I think
 it's better encapsulated since separate instances *won't* behave like they're
 under "quantum entanglement". It forces separate instances to stay separate.
Cf. my earlier example with two individual RNG instances initialized with the same seed. This ought to produce two identical sequences; with your proposed static internals, it won't, because calls to rng1 will affect the results of subsequent calls to rng2, and vice versa. This is pretty much the definition of breaking encapsulation, no?
 It *is* a fairly strange situation though. Unlike most encapsulation units, the
 whole point of an RNG is to be *inconsistent* rather than consistent. So that
 kinda turns everything on its ear.
The point of a pseudo-RNG is that if you analyse the sequence of variates produced, you can't distinguish it from a genuine random sequence. But it's also intended that different instances seeded with different values ought to produce statistically independent sequences with very high probability. Your proposal would make it impossible to _have_ multiple effective instances seeded differently (unless in different threads).
 Yea, when I wrote it I figured on proposing an optional stream-like interface
 for std.random. The Hash_DRBG algorithm is inherently based around the idea that
 each generated number can be arbitrary-length (not only that - generating ex.
 two 8-byte values with Hash_DRBG is fundamentally different from generating
 eight 2-byte values, and produces different results). And the OS-specific
 entropy generators are basically streams, too (on both Win and Posix, you
 request any number of bytes of randomness).

 So that was just a natural way to implement it. But it did strike me as being
 generally useful, so I'm glad that you agree.

 You're right though, the current state of Phobos streams is kind of a stumbling
 block. Naturally it'd be best to use the new stream interface, but since that
 isn't in place, that's an issue. I don't know what the state of the rewrite is.
 Been awhile since I've heard anything about it.
Yup, me neither. I will try and quiz Steven on it at some point.
 I figure, at the very least, we could just make the stream interfaces for RNGs
 private. Then we can open it up whenever we finally do have a new std.stream to
 conform to.

 Or we could do an OutputRange instead...but the recent discussion on those
 makes me a bit hesitant to do so. And I definitely don't wanna perform the
 contortions to invert it into a non-coroutine-like InputRange.

 Hmm, yea, I'm more and more liking the "keep it private for now, and then see
 what happens" approach...unfortunately... :/
Yes, that's probably for the best.
May 10 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/10/2014 12:34 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 09/05/14 02:42, Nick Sabalausky via Digitalmars-d wrote:
 There can technically be multiple "instances", but yea, they're all
 effectively
 tied together. However, I'm leaning towards the belief that's "correct
 behavior"
 for a RNG. It's *definitely* correct for a crypto RNG - you certainly
 wouldn't
 want two crypto RNGs ever generating the same sequence, not even
 deliberately.
 That would defeat the whole point of a crypto RNG.
I'm not sure I follow your point about crypto RNGs here. Assuming it's deterministic at all, if someone else knows the seed state (which would be required to get two instances generating the same sequences), aren't you buggered anyway? And isn't the point of a deterministic crypto RNG that its seed state is kept secret, or that it is seeded with non-deterministic values upon instantiation?
The seed doesn't need to be compromised for synchronized RNGs to fubar security. It's bad enough just to accidentally generate the same byte sequence for multiple things:

struct CryptoRng { ulong internalState; ...etc... }
CryptoRng myRand;

// Oops! Duplicates internal state!
ubyte[] getRandomJunk(CryptoRng rand) {
    return rand.take(64).array();
}

// Oops! These are all identical! Passwords may as well be unsalted!
auto saltForStevensPassword = getRandomJunk(myRand);
auto saltForSallysPassword = getRandomJunk(myRand);
auto saltForGregsPassword = getRandomJunk(myRand);

// Oops! Also identical!
auto oneTimeURL1 = "https://verify-email/a a.a/"~getRandomJunk(myRand);
auto oneTimeURL2 = "https://verify-email/b b.b/"~getRandomJunk(myRand);
auto oneTimeURL3 = "https://verify-email/c c.c/"~getRandomJunk(myRand);

May as well have just defined getRandomJunk as a constant ;)

Maybe not *too* terrible for randomized alien invader attack patterns (unless it makes the game boring and doesn't sell), but a fairly major failure for security purposes.

There's another issue, too:

First a little background: With Hash_DRBG (and I would assume other crypto RNGs as well), the RNG is periodically reseeded, both automatically and on-demand. This reseeding is *not* like normal non-crypto reseeding: With non-crypto reseeding, you're fully resetting the internal state to a certain specific value and thus resetting the existing entropy, replacing it with the fresh new entropy. But with Hash_DRBG, reseeding only *adds* additional fresh entropy - nothing gets reset. Any existing entropy still remains. So reseeding two different Hash_DRBGs with the *same* reseed data *still* results in two completely different internal states. This is deliberately defined by the NIST spec.

There are reasons for that, but one noteworthy side-consequence is this: If all your Hash_DRBG's share the same internal state, then anytime one is reseeded with fresh entropy, they *all* benefit from the fresh entropy, for free.
And as a (maybe smaller) additional benefit, if different parts of your program are using an RNG independently of each other, but the RNGs internally share the same state, then *each* component contributes additional (semi-)non-determinism to the pool. That part's true of other RNGs too, but the effect is compounded further by Hash_DRBG which updates the internal state differently depending on how much data is requested at each, umm...at each request.
 As for ordinary non-crypto RNGs, I honestly can't imagine any purpose for
 reliably generating the same values other than "playing back" a previous
 sequence. But if that's what you want, then IMO it's better to 
record the
 sequence of emitted values, or even record/replay the higher-level
 decisions
 which the randomness was used to influence.
Even that will not save you in Minecraft, where even the same seed will produce different results in updated versions of the game where there are more random possibilities available (more monsters, more flora and fauna...). ;-)
Yea, exactly, that's why it's best to record high-level *results* (like "spawn a cypress tree at position (12,73,4)" or "alien fighter ID 742 changed velocity to (...) at time (...)") instead of relying on recreating it all from a given seed and deterministic RNG algorithm.

However...the mentions of Minecraft did raise one point that I'll concede: If you have a multiplayer environment (or more generally, a large chunk of shared multi-computer data) that's randomly-generated, then a very careful usage of a shared seed and RNG algorithm could drastically reduce the network bandwidth requirements.

I have no idea whether Minecraft actually works that way, or even if it's random-generated (I've only played for a few minutes - I like the style, but I don't have the attention span for sandbox stuff ;) ), but it is enough to convince me that there might be legitimate uses after all for individual RNG instances that emit the same sequence. (But not crypto-RNGs though. Those exist purely for one very specific purpose - to maximize unpredictability when there's limited "true" randomness.)

Granted, in that particular case of "shared random-generated data", I'm not sure I'd feel comfortable relying on an RNG from a third-party lib, or even a standard library. One tiny little thing changes and causes one tiny little variance in the outputs...and everything gets shot to hell. And it won't even be detected unless you thought ahead enough to verify everyone's results with checksums (or some such). So if it were me, I'd either write or borrow an RNG, but then keep it in the project's own codebase where I can more easily limit any chance of changed behavior.
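The shared-seed idea can be sketched like this (generateMap and the tile scheme are hypothetical; and, as noted above, the scheme is only safe if every peer runs a byte-for-byte identical RNG implementation):

```d
import std.random;

/// Hypothetical deterministic world generation: every peer that runs this
/// with the same seed reconstructs the same map, so only the seed needs
/// to cross the network, not the map itself.
int[][] generateMap(uint seed, size_t width, size_t height)
{
    auto rng = Mt19937(seed);           // independent, identically-seedable instance
    auto map = new int[][](height, width);
    foreach (ref row; map)
        foreach (ref tile; row)
            tile = uniform(0, 4, rng);  // e.g. 4 hypothetical tile types
    return map;
}

void main()
{
    // "Server" and "client" agree on a seed; the generated worlds match.
    assert(generateMap(12345, 8, 8) == generateMap(12345, 8, 8));
    // Different seeds give (almost surely) different worlds.
    assert(generateMap(12345, 8, 8) != generateMap(54321, 8, 8));
}
```

Note this is exactly the use-case that requires independent, explicitly-seedable RNG instances rather than a shared static state.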
 There's a difference between _relying_ on the deterministic sequence
 from a pseudo-RNG and the principle of design that says if I declare two
 separate RNG instances,

      SomeRNG rng1;
      SomeRNG rng2;

 then the calls to rng1 should not have any effect on the results of the
 calls to rng2.  Or alternatively, if I do,

      SomeRNG rng1;
      SomeRNG rng2;

      rng1.seed(n);
      rng2.seed(n);

      foreach (a, b; lockstep(rng1, rng2).take(10))
      {
          writefln("%s\t%s", a, b);
      }

 then the values I get should be the same, because rng1 and rng2 are
 independent instances that have been identically seeded.
I do agree strongly that there's a major difference between "relying on deterministic sequence from an RNG" and "principle of design".

However, I also maintain that the "principle of design" case demands you should *NOT* get the same results from RNGs unless you *specifically intend* to be "relying on deterministic sequence from an RNG". After all, they *are* supposed to be "random number generators". That's their primary function.

I do believe strongly in the same principles of encapsulation that you do. But the thing is, the usual rules of proper encapsulated design are deliberately and fundamentally predicated on the assumption that you *want* reliably predictable deterministic behavior. The vast majority of the time, that's a correct assumption - but the whole *point* of an RNG is to *not* be deterministic. Or at least to get as close to non-determinism as reasonably possible within a fundamentally deterministic system.

Since the basic goals of RNGs are in fundamental disagreement with the underlying assumptions of usual "well encapsulated" design, that calls into question the applicability of usual encapsulation rules to RNGs.

At the risk of being presumptuous, I think you do agree on some level. Isn't that the whole reason you've proposed that RNGs change from structs to classes? Because you don't want identical sequences from RNGs unless *specifically* intended, right? The only parts of that I've intended to question are:

A. Does that really necessitate moving to classes?

B. Are there any legitimate use-cases for "specifically relying on deterministic sequence from an RNG"?

For B, the above example of "reducing bandwidth/storage requirements for a large set of shared random-generated data" is making me question my former stance of "No, there's no legitimate use-case". If B falls (and I suppose maybe it does), then A does too, and classes would indeed be necessary.

But that's all for non-crypto RNGs.
For crypto-RNGs, determinism is *never* anything more than a concession forced by the deterministic nature of the machines themselves. FWIW, this does mean crypto-RNGs can go either struct or class, since the internal states really should be tied together anyway (in order to prevent their *outputs* from being tied together).
 We could get into broader discussions, but I don't think it needs to be
 more complicated than: if I create an instance of a variable of type T,
 then I should expect that it will be independent of any other instance
 of a variable of type T, unless I have deliberately created that link.
Yes, I can agree with that much, but the difficulty is that synchronized "random" number generation *is* effectively an implicit link, even if the internal implementation doesn't explicitly link them. Having the same internal state *is* the link, whether it was intentional or not. To make them logically "unlink", some care is required and *may* involve a concrete link on the implementation side. (I say "may involve" instead of "will involve" because, naturally, just using classes as you've proposed also does the job of avoiding unintentional linking.)

Again, the nature of "random" or "pseudo-random" is at odds with normal rules and kinda flips things around. If I copy a normal struct, then yes, I expect most operations on both to be deterministic and return the same results - but that's because most structs *model* deterministic things. If I do:

auto rngB = rngA;

then they're both still pseudo-random number generators, so I think for such a simple operation as basic assignment (or passing to a func, etc) it's questionable to start getting the same values from an abstraction intended to model (more or less) randomness. I'd expect to have to go further out of my way:

auto rngB = rngA.clone;

But, you're right, that does ultimately suggest that structs may be a questionable choice for an RNG anyway.
 I'm also unconvinced that it breaks encapsulation anyway. If anything,
 I think
 it's better encapsulated since separate instances *won't* behave like
 they're
 under "quantum entanglement". It forces separate instances to stay
 separate.
Cf. my earlier example with two individual RNG instances initialized with the same seed. This ought to produce two identical sequences; with your proposed static internals, it won't, because calls to rng1 will affect the results of subsequent calls to rng2, and vice versa.
This is a "concrete vs logical" issue.

First of all, consider the basic idea of encapsulation: It's a black box. You use it, you get the result described on its label, no peeking inside. Note that the label on an RNG basically says "Retrieve a seemingly random value."

Now your question, "Are two RNGs with a shared internal state encapsulated?" That's where the "concrete vs logical" comes in:

Concrete: In a concrete sense, yes, each one will cause the other to emit a different *specific* value, so *concretely* maybe they're not encapsulated. But remember, these are pseudo-*random* number generators - minimal predictability is the whole point. Getting the same sequence of values from two different things is correlation - ie, predictability. Predictability is the antithesis of random. Violates our primary purpose. Encapsulation or not, the label on the box lied.

Logical: The logical meaning of "return a (pseudo-)random number" is "gimme a seemingly random number, avoid any correlation with anything meaningful". In a logical sense, both RNGs (with shared internal state) are *still* emitting pseudo-random numbers, which is *exactly* what their encapsulation unit defines. Even better still, the internal (*not* external) interplay between the two black boxes (which "encapsulation" dictates we shouldn't be worrying about anyway) is ensuring we get *minimal* correlation between the two RNGs' outputs, ie. less predictability, ie. once again *exactly* what we expect from a black box labeled "Push my button and I'll give you a seemingly random value with minimal correlation to anything meaningful".

It didn't give you the "random" number you wanted? Well then you weren't really looking for random anyway. You're peeking inside the box and saying "I want *that* random value, because that's the one the other randomizer gave me". Not very encapsulated.

*That* is the primary purpose of an RNG, deterministic or not.
But that brings us to the secondary purpose: Maybe we really do occasionally want to be able to recreate the same sequence. Well ok, fine. But that's a special case. Expecting it as a side-effect of using an encapsulation unit labeled "pseudo-random number generator" isn't respecting encapsulation. Encapsulation is using the button labeled "Get an RNG that emits the same sequence as some other RNG."
 This is
 pretty much the definition of breaking encapsulation, no?
Well, again, the normal rules of encapsulation are predicated on the assumption that you want reliable and predictable, not random and unpredictable. With a RNG you DO want (seemingly) random and unpredictable. Thus, encapsulation is only applicable until it starts interfering with an RNG's raison d'etre. Getting predictably similar values from two RNGs when you haven't *specifically* requested so, is certainly in opposition to an RNG's very purpose.
 The point of a pseudo-RNG is that if you analyse the sequence of
 variates produced, you can't distinguish it from a genuine random
 sequence.  But it's also intended that different instances seeded with
 different values ought to produce statistically independent sequences
 with very high probability.

 Your proposal would make it impossible to _have_ multiple effective
 instances seeded differently (unless in different threads).
That's why I brought up the question of whether there was a significantly valid use-case for doing so. If there really is, then yea, my idea blows ;)

For non-crypto-RNGs, I guess I'm kinda on the fence about whether there's a worthwhile use-case (but I guess I'm outnumbered, so I don't mind as long as your class idea is adopted in order to prevent accidental correlations).

For crypto-RNGs, I maintain that the ability to duplicate and generate identical sequences is an anti-feature. (For that matter, crypto-RNGs are specifically *supposed* to seed, and periodically reseed, *themselves* with data that contains at least some true randomness. So hopefully, none of this should be objectionable for crypto-RNGs.) Note, FWIW, this does *not* prevent crypto-RNGs from being class-based, so that aspect isn't even an issue anyway (just to be clear).
 Hmm, yea, I'm more and more liking the "keep it private for now, and
 then see
 what happens" approach...unfortunately... :/
Yes, that's probably for the best.
Another option is to still permit the stream versions, but label them with a big red warning that they're a temporary design and may change when std.stream gets revamped. I think maybe that'd be a more pragmatic, user-friendly way to go. If they hate breakage, well, they've been warned and nobody's forcing them to use it. If they have reason to use it and don't mind potential breakage, well, then nobody's forcing them NOT to use it.
May 10 2014
parent reply Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 11/05/14 05:58, Nick Sabalausky via Digitalmars-d wrote:
 The seed doesn't need to be compromised for synchronized RNGs to fubar
 security.
Before we go onto the detail of discussion, thanks very much for the extensive explanation. I was slightly worried that my previous email might have come across as dismissive of your (completely understandable) concerns. I'm actually quite keen that we can find a mutually agreeable solution :-)
 It's bad enough just to accidentally generate the same byte sequence for
 multiple things:

 struct CryptoRng { ulong internalState; ...etc... }
 CryptoRng myRand;

 // Oops! Duplicates internal state!
 ubyte[] getRandomJunk(CryptoRng rand) {
      return rand.take(64).array();
 }

 // Oops! These are all identical! Passwords may as well be unsalted!
 auto saltForStevensPassword =  getRandomJunk(myRand);
 auto saltForSallysPassword =  getRandomJunk(myRand);
 auto saltForGregsPassword =  getRandomJunk(myRand);

 // Oops! Also identical!
 auto oneTimeURL1 =  "https://verify-email/a a.a/"~getRandomJunk(myRand);
 auto oneTimeURL2 =  "https://verify-email/b b.b/"~getRandomJunk(myRand);
 auto oneTimeURL3 =  "https://verify-email/c c.c/"~getRandomJunk(myRand);

 May as well have just defined getRandomJunk as a constant ;)
Sure, but this is a consequence of two things: (i) CryptoRng is a value type, and (ii) it gets passed by value, not by reference.

In your example, obviously one can fix the problem by having the function declared instead as

    ubyte[] getRandomJunk(ref CryptoRng rand) { ... }

but I suspect you'd say (and I would agree) that this is inadequate: it's relying on programmer virtue to ensure the correct behaviour, and sooner or later someone will forget that "ref". (It will also not handle other cases, such as an entity that needs to internally store the RNG, as we've discussed many times on the list with reference to e.g. RandomSample or RandomCover.)

Obviously one _can_ solve the problem by making the internal state variables of the RNG static, but I'd suggest to you that RNG-as-reference-type (which doesn't necessarily need to mean "class") solves that particular problem without the constraints that static internal variables have.
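The value-versus-reference hazard can be seen with a minimal struct-based sketch (ValueRng is a hypothetical toy type with made-up LCG constants, not anything in Phobos):

```d
/// Toy value-type RNG: copying it duplicates the internal state.
struct ValueRng
{
    uint state = 1;
    uint next() { state = state * 1664525u + 1013904223u; return state; }
}

// Passed by value: the caller's state is never advanced.
uint drawByValue(ValueRng r) { return r.next(); }

// Passed by reference: the caller's state advances as intended.
uint drawByRef(ref ValueRng r) { return r.next(); }

void main()
{
    ValueRng rng;
    auto a = drawByValue(rng);
    auto b = drawByValue(rng);
    assert(a == b);   // oops: identical "random" values, state was copied

    auto c = drawByRef(rng);
    auto d = drawByRef(rng);
    assert(c != d);   // state actually advanced between calls
}
```

The first pair of asserts is precisely the salt-duplication bug from the earlier CryptoRng example, reduced to its essentials; the "ref" version fixes it but, as said, relies on nobody ever forgetting the "ref".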
 Maybe not *too* terrible for randomized alien invader attack patterns (unless
 it makes the game boring and doesn't sell), but a fairly major failure for
 security purposes.

 There's another issue, too:

 First a little background: With Hash_DRBG (and I would assume other crypto RNGs
 as well), the RNG is periodically reseeded, both automatically and on-demand.
 This reseeding is *not* like normal non-crypto reseeding: With non-crypto
 reseeding, you're fully resetting the internal state to a certain specific
 value and thus resetting the existing entropy, replacing it with the fresh new
 entropy. But with Hash_DRBG, reseeding only *adds* additional fresh entropy -
 nothing gets reset. Any existing entropy still remains. So reseeding two
 different Hash_DRBGs with the *same* reseed data *still* results in two
 completely different internal states. This is deliberately defined by the NIST
 spec.

 There are reasons for that, but one noteworthy side-consequence is this: If all
 your Hash_DRBG's share the same internal state, then anytime one is reseeded
 with fresh entropy, they *all* benefit from the fresh entropy, for free.

 And as a (maybe smaller) additional benefit, if different parts of your program
 are using an RNG independently of each other, but the RNGs internally share the
 same state, then *each* component contributes additional (semi-)non-determinism
 to the pool. That part's true of other RNGs too, but the effect is compounded
 further by Hash_DRBG which updates the internal state differently depending on
 how much data is requested at each, umm...at each request.
This is a very interesting point.

My understanding is that in many (most?) programs, each individual thread does typically operate using a single RNG instance anyway. The question for me is whether that's a design choice of the developer or something that is an unavoidable outcome of the RNG design.

For example, it's readily possible to just instantiate a single RNG (crypto or otherwise, as required) at the start of the program and pass that around: if it's a reference type (as I would argue it should be) then the effects you desire will still take place.

Alternatively, one can also implement a thread-global RNG instance which functions will use. This is already done with rndGen, which is effectively a singleton pattern. There's no reason why one couldn't also define a cryptoGen which works in a similar way, just using an instance of a crypto RNG where rndGen uses a Mersenne Twister:

https://github.com/D-Programming-Language/phobos/blob/d2f2d34d52f89dc9854069b13f52523965a7107e/std/random.d#L1161-L1174
https://github.com/WebDrake/std.random2/blob/a071d8d182eb28516198bb292a0b45f86f4425fe/std/random2/generator.d#L60-L81

The benefit of doing it this way, as opposed to static internal variables, is that you aren't then constrained to have only one single, effectively thread-global instance of _every_ RNG you create.
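A cryptoGen along those lines could look roughly like the following sketch (cryptoGen and CryptoRngType are hypothetical names; a Mersenne Twister stands in for a real Hash_DRBG purely so the example is self-contained - it is NOT cryptographically safe):

```d
import std.random;

// Hypothetical placeholder: a real implementation would alias a Hash_DRBG.
alias CryptoRngType = Mt19937;

/// One lazily-initialized instance per thread, mirroring how
/// std.random.rndGen works: static variables are thread-local in D.
@property ref CryptoRngType cryptoGen()
{
    static bool initialized;
    static CryptoRngType instance;
    if (!initialized)
    {
        instance.seed(unpredictableSeed);
        initialized = true;
    }
    return instance;
}

void main()
{
    // All callers in this thread share one underlying state...
    auto x = cryptoGen.front;
    cryptoGen.popFront();

    // ...but nothing stops you from creating independent instances too.
    auto separate = CryptoRngType(42);
    assert(separate.front == CryptoRngType(42).front);
}
```

This gives the shared-state behaviour by convention (everyone calls cryptoGen) rather than by force (static internals inside the RNG type itself).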
 However...the mentions of Minecraft did raise one point that I'll concede: If
 you have a multiplayer environment (or more generally, a large chunk of shared
 multi-computer data) that's randomly-generated, then a very careful usage of a
 shared seed and RNG algorithm could drastically reduce the network bandwidth
 requirements.
You may also recall earlier games like Elite which were able to "store" whole universes on a single 5 1/4" floppy by virtue of auto-generating that universe from a repeatable random sequence. It was amazing what could be done on the old BBC Micro :-)
 I do agree strongly that there's a major difference between "relying on
 deterministic sequence from an RNG" and "principle of design".

 However, I also maintain that the "principle of design" case demands you should
 *NOT* get the same results from RNGs unless you *specifically intend* to be
 "relying on deterministic sequence from an RNG". After all, they *are* supposed
 to be "random number generators". That's their primary function.

 I do believe strongly in the same principles of encapsulation that you do. But
 the thing is, the usual rules of proper encapsulated design are deliberately
 and fundamentally predicated on the assumption that you *want* reliably predictable
 deterministic behavior. The vast majority of the time, that's a correct
 assumption - but the whole *point* of an RNG is to *not* be deterministic. Or
 at least to get as close to non-determinism as reasonably possible within a
 fundamentally deterministic system.

 Since the basic goals of RNGs are in fundamental disagreement with the
 underlying assumptions of usual "well encapsulated" design, that calls into
 question the applicability of usual encapsulation rules to RNGs.

 At the risk of being presumptuous, I think you do agree on some level. Isn't
 that the whole reason you've proposed that RNGs change from structs to classes?
 Because you don't want identical sequences from RNGs unless *specifically*
 intended, right? The only parts of that I've intended to question are:
I completely agree that _unintended_ identical RNG sequences are a very bad thing that must be avoided. I think that we're in disagreement only about how best to achieve that in a sane/safe/uncomplicated way. With luck we will find something that works for both of us :-)

What I feel is that static internal variables go too far in the opposite direction: they make it impossible to have even _intentionally_ identical RNG sequences. There's no way, for example, to .save or .dup an RNG instance with this design.

However, the effect of a common and reliable source of randomness, used by all parts of the program, can be implemented instead by a thread-global singleton instance of an RNG as described above. This seems to me to be a better solution as it gives you what you achieve using static variables, but without preventing you from instantiating as many other independent RNG instances as you like.
 A. Does that really necessitate moving to classes?
No, the important distinction is value vs. reference types, not structs vs. classes. One could implement the RNGs as struct-based reference types.

I wrote a class-based implementation because I wanted to see how it compared to various struct-based approaches monarch_dodra and I had tried. I do think that it offers much value in terms of simplicity and elegance of design, but there may be other costs that make it untenable. I am not sure the correct choice will be apparent until the current work on memory allocation settles down.
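A struct-based reference type can be sketched by parking the engine state behind a heap pointer (a minimal illustration of the idea with made-up LCG constants, not the actual std.random2 design):

```d
/// Struct with reference semantics: copies of RefRng alias the same state,
/// so passing one around by value can never silently fork the sequence.
struct RefRng
{
    private static struct Payload { uint state; }
    private Payload* payload;  // shared by all copies of this RefRng

    static RefRng create(uint seed)
    {
        RefRng r;
        r.payload = new Payload;
        r.payload.state = seed;
        return r;
    }

    uint next()
    {
        payload.state = payload.state * 1664525u + 1013904223u;
        return payload.state;
    }

    /// Duplicating the state must be requested explicitly.
    RefRng dup() const { return RefRng.create(payload.state); }
}

void main()
{
    auto a = RefRng.create(7);
    auto b = a;        // b aliases a's state: no accidental lockstep sequence
    a.next();
    auto c = a.dup;    // explicit copy: c now tracks a's current state
    assert(c.next() == b.next());
}
```

Plain assignment gives the "one logical generator" behaviour argued for earlier in the thread, while the explicit .dup still permits the intentionally-identical-sequence use-cases.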
 B. Are there any legitimate use-cases for "specifically relying on
 deterministic sequence from an RNG"?
Well, when I was writing large-scale scientific simulations, it was certainly useful to be able to re-run them with the same random sequence. Given the number of random variates I was generating in the process, it was not really practical to store the sequence in memory or on disk. The same is true of many circumstances where one is writing a program that relies on randomness, but wants to test it reliably.

However, fundamentally this comes down to a design point. Pseudo-random RNGs are _always_ deterministic in their operation once seeded. The difference we're really talking about here is whether the source of randomness used by different parts of a program is thread-global or not. And so, the question becomes rather, "Is it legitimate to enforce a thread-global source of randomness in all circumstances?" I'd say the answer to that is a definite no.

Bear in mind that one nasty consequence of using static internal variables to force thread-global behaviour is that the presence or absence of unittests might affect the actual running program. Consider:

struct SomeRNG
{
    private static InternalState _internals;
    ...
}

unittest
{
    SomeRNG rng;
    rng.seed(KnownSeed);
    // verify that SomeRNG produces expected sequence of variates
}

Now when you start your _actual_ program, the internal state of SomeRNG will already have been set by the unittests. Obviously if you are doing things properly you will scrub this by re-seeding the RNG at the start of your program proper, but the risk is clearly still there.
 For B, the above example of "reducing bandwidth/storage requirements for a
 large set of shared random-generated data" is making me question my former
 stance of "No, there's no legitimate use-case". If B falls (and I suppose
 maybe it does), then A does too, and classes would indeed be necessary.
Yes, I think that is one very good example of a use-case, which is probably one shared between a bunch of different applications in practice (scientific simulation, games, probably other stuff too). As I said before, I don't think it mandates _classes_, but it does mandate reference-type RNGs that are properly encapsulated.
 But that's all for non-crypto RNGs. For crypto-RNGs, determinism is *never*
 anything more than a concession forced by the deterministic nature of the
 machines themselves. FWIW, this does mean crypto-RNGs can go either struct or
 class, since the internal states really should be tied together anyway (in
 order to prevent their *outputs* from being tied together).
What do you think about the global-singleton-instance pattern as a way of achieving what you want?

I think I will stop my reply here for now, because I think that your response to this question (and the explanations leading up to it) probably determines how I will respond to your later remarks. What are your thoughts?

Best wishes,

    -- Joe
May 11 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/11/2014 8:16 AM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 11/05/14 05:58, Nick Sabalausky via Digitalmars-d wrote:
 The seed doesn't need to be compromised for synchronized RNGs to fubar
 security.
Before we go onto the detail of discussion, thanks very much for the extensive explanation. I was slightly worried that my previous email might have come across as dismissive of your (completely understandable) concerns.
Oh, not at all. I've been finding the discussion rather interesting. :)
 I'm actually quite keen that we can find a mutually agreeable
 solution :-)
I agree. And I think our stances aren't quite as opposed as they may seem.
 Sure, but this is a consequence of two things (i) CryptoRng is a value
 type and (ii) it gets passed by value, not by reference.

 In your example, obviously one can fix the problem by having the
 function declared instead as

      ubyte[] getRandomJunk(ref CryptoRng rand) { ... }

 but I suspect you'd say (and I would agree) that this is inadequate:
 it's relying on programmer virtue to ensure the correct behaviour, and
 sooner or later someone will forget that "ref".  (It will also not
 handle other cases, such as an entity that needs to internally store the
 RNG, as we've discussed many times on the list with reference to e.g.
 RandomSample or RandomCover.)
Right. Agreed on all points.
 Obviously one _can_ solve the problem by the internal state variables of
 the RNG being static, but I'd suggest to you that RNG-as-reference-type
 (which doesn't necessarily need to mean "class")
Yea, doesn't necessarily mean class, but if it is made a reference type then class is likely the best option. For example, I'd typically regard struct* in a D API as a code smell.
 solves that particular
 problem without the constraints that static internal variables have.
Pretty much agreed, but the only question is whether those constraints of using static internals are good/bad/inconsequential:

For non-crypto RNGs: While I've tended to think the usefulness of a library-provided RNG that permits independent-but-identically-seeded instances is small and debatable, through this discussion I have become sufficiently convinced that they're worth permitting. Besides, even if there weren't need for it, the downsides of permitting such a feature (as long as it's not accident-prone) are minimal, if not zero. So I'm fine going the class route (or otherwise reference-based) and making internal state per-instance. Or even having a "duplicate this RNG with identical state" function, if people want it. I think we're pretty well agreed on non-crypto RNGs. Your stance is convincing here.

For crypto-RNGs: A crypto-RNG exists for exactly one reason: to stretch the useful lifetime of a limited source of truly non-deterministic randomness/entropy (or nearly-true randomness, if that's the best available). Because of this, any and all determinism is a concession, not a feature (unlike for non-crypto deterministic RNGs). Even *how* you use it deliberately affects the internal state, not just "how many times you asked for a value". These things go all-out to throw any wrenches they can into any sources of determinism. I was actually quite impressed :)

In fact, the seeding/reseeding is specifically defined to be completely internal to the crypto-RNG (at least with Hash_DRBG anyway, probably others) - the user *never* provides a seed - which intentionally makes it that much harder for an application to use deterministic seeds (which would compromise the security implications of the crypto-RNG, and therefore defeat the whole point of using a crypto-RNG instead of a normal RNG). All this is because determinism is NOT what a crypto-RNG is for; it's exactly what a crypto-RNG is specifically designed to fight against.
What all that implies: A crypto-RNG shouldn't *explicitly* provide a way to get different instances with identical internal states, and definitely shouldn't let it happen by accident. It's also under absolutely no obligation whatsoever for relying on determinism to even be possible at all (it would carry certain difficulties anyway). Luckily though, that doesn't imply anything particularly profound. *If* it's even possible to get identical crypto-RNGs at all, then as long as you have to work to do it (memcopying raw class data, providing a custom entropy source that's written to be deterministic, or even using multiple threads/processes, etc.), then everything's all good. Therefore, a class or otherwise reference-based approach is fine for crypto-RNGs, too.

I think my preference would still be to keep the internal state static here though (again, just speaking for crypto-RNGs only). As I've argued, the determinism is a non-feature for crypto-RNGs (they deliberately fight it every step of the way), and the shared state carries a couple of entropy-related benefits (reseeding one, i.e. accumulating additional fresh entropy, benefits all others; and all RNG activity within the thread acts as additional entropy). So I only see upsides, not downsides.

The only potential issue I see: Are there going to be people who are looking for determinism in an RNG and intentionally want to use something like Hash_DRBG for that, maybe for the sake of its usage of SHA? They would have to either limit the number of values they generate (to avoid the algorithm's internal reseeding) or else provide their own entropy source (not too difficult though, since I've recently parametrized HashDRBG on that). But I suppose it could be done. Shoot, now I'm not so certain anymore...
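For concreteness, the kind of entropy-source parametrization I mean is roughly the following shape. This is only a sketch with hypothetical names (and a grossly simplified state-update; the real Hash_DRBG algorithm is far more involved), just to show where the entropy source plugs in:

```d
import std.digest.sha : sha256Of;

/// Sketch: a DRBG templated on its entropy source (names hypothetical).
/// EntropySource just needs a read(ubyte[]) that fills the buffer with
/// fresh entropy -- from the OS, hardware, or a deterministic test stub.
struct DrbgSketch(EntropySource)
{
    // Static, hence shared per thread (per instantiation), as discussed.
    private static ubyte[32] state;
    private EntropySource source;

    /// Reseeding is internal to the RNG -- the user never supplies a seed.
    void reseed()
    {
        ubyte[32] fresh;
        source.read(fresh[]);
        state = sha256Of(state[] ~ fresh[]); // mix fresh entropy into state
    }

    ubyte front() @property { return state[0]; }

    void popFront()
    {
        // Advance deterministically; the real algorithm also reseeds
        // itself internally after a bounded number of outputs.
        state = sha256Of(state[]);
    }
}
```

The point being that a deterministic-minded user could instantiate it with a deterministic `EntropySource`, but has to go explicitly out of their way to do so.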
Well, I suppose permitting a crypto-RNG to be used as a replacement for a simpler deterministic RNG would be perfectly reasonable *provided that* it wasn't done in a way that made it easier for crypto users to make the wrong choice of non-global instances. Although, I don't know whether that would ever be appropriate anyway, if we already have RNGs intended for non-crypto deterministic purposes.
 This is a very interesting point.  My understanding is that in many
 (most?) programs, each individual thread does typically operate using a
 single RNG instance anyway.  The question for me is whether that's a
 design choice of the developer or something that is an unavoidable
 outcome of the RNG design.

 For example, it's readily possible to just instantiate a single RNG
 (crypto or otherwise, as required) at the start of the program and pass
 that around: if it's a reference type (as I would argue it should be)
 then the effects you desire will still take place.
Yea, that is true.
 Alternatively, one can also implement a thread-global RNG instance which
 functions will use.  This is already done with rndGen, which is
 effectively a singleton pattern.  There's no reason why one couldn't
 also define a cryptoGen which works in a similar way, just using an
 instance of a cryto RNG where rndGen uses a Mersenne Twister:
 
https://github.com/D-Programming-Language/phobos/blob/d2f2d34d52f89dc9854069b13f52523965a7107e/std/random.d#L1161-L1174
 
https://github.com/WebDrake/std.random2/blob/a071d8d182eb28516198bb292a0b45f86f4425fe/std/random2/generator.d#L60-L81
 The benefit of doing it this way, as opposed to static internal
 variables, is that you aren't then constrained to have only one single,
 effectively thread-global instance of _every_ RNG you create.
Non-crypto RNGs: Yea, that's good, I like it. Crypto RNGs: Hmm, good question. My concern is that it's something the user has to specifically choose to use. It's not what automatically happens when you ask for an instance of a specific RNG.
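Just so we're picturing the same thing: I take the rndGen-style thread-global singleton pattern to be roughly this (a sketch only, using Mt19937 as a stand-in for whatever RNG type is chosen):

```d
import std.random : Mt19937, unpredictableSeed;

/// Sketch of the rndGen-style singleton: one lazily-seeded
/// instance per thread, returned by reference.
ref Mt19937 rndGenSketch() @property
{
    static bool initialized;   // static locals are thread-local in D
    static Mt19937 instance;
    if (!initialized)
    {
        instance.seed(unpredictableSeed);
        initialized = true;
    }
    return instance;
}
```

A hypothetical `cryptoGen` would have the same shape, just wrapping a crypto RNG instance instead of a Mersenne Twister.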
 You may also recall earlier games like Elite which were able to "store"
 whole universes on a single 5 1/4" floppy by virtue of auto-generating
 that universe from a repeatable random sequence.  It was amazing what
 could be done on the old BBC Micro :-)
Yea, I have a deep appreciation for those 8-bit-era machines. I was an Apple-II guy myself. (Apple-IIc).
 I completely agree that _unintended_ identical RNG sequences are a very
 bad thing that must be avoided.  I think that we're in disagreement only
 about how best to achieve that in a sane/safe/uncomplicated way.  With
 luck we will find something that works for both of us :-)

 What I feel is that static internal variables go too far in the opposite
 direction: they make it impossible to have even _intentionally_
 identical RNG sequences.  There's no way, for example, to .save or .dup
 an RNG instance with this design.
Fair enough, at least for non-crypto RNGs. For crypto RNGs, I'm thinking now that static state can be avoided as long as the design still does a sufficient job of steering actual crypto-purpose users away from multiple separate instances.
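For the non-crypto case, the .dup you describe becomes trivial once RNGs are reference types, since copying state is then an explicit operation rather than something that happens by accident. A sketch with a hypothetical type:

```d
/// Sketch: explicit duplication on a class-based RNG (hypothetical type).
final class RngSketch
{
    private uint state;

    this(uint seed) { state = seed; }

    uint front() @property { return state; }
    void popFront() { state = state * 1103515245 + 12345; } // toy LCG step
    enum bool empty = false;

    /// Duplicate with identical internal state -- intentional, never accidental.
    RngSketch dup() @property { return new RngSketch(state); }
}
```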
 However, the effect of a common and reliable source of randomness, used
 by all parts of the program, can be implemented instead by a
 thread-global singleton instance of an RNG as described above.  This
 seems to me to be a better solution as it gives you what you achieve
 using static variables, but without preventing you from instantiating as
 many other independent RNG instances as you like.
Yea, it may. For crypto-RNGs though, I'd have to give it some more thought.
 A. Does that really necessitate moving to classes?
 No, the important distinction is value vs. reference types, not structs vs.
 classes. One could implement the RNGs as struct-based reference types.
True, although in general I think any reference-based design that doesn't use classes requires fairly strong justification for doing so. I think we're unlikely to see that happen here. So I'm good with classes (for both crypto and non-crypto RNGs) unless some big unexpected issue pops up.
 I wrote a class-based implementation because I wanted to see how that
 came out to various struct-based approaches monarch_dodra and I had
 tried.  I do think that it offers much value in terms of simplicity and
 elegance of design, but there may be other costs that make it
 untenable.
Have you found any such costs yet, or anything in particular that suggests there may be some? Intuitively, I wouldn't think the minor amount of (by default) GC heap usage would matter (just as one particular aspect of classes).
 B. Are there any legitimate use-cases for "specifically relying on
 deterministic
 sequence from an RNG"?
Well, when I was writing large-scale scientific simulations, it was certainly useful to be able to re-run them with the same random sequence. Given the number of random variates I was generating in the process, it was not really practical to store the sequence on memory or disk. The same is true of many circumstances where one is writing a program that relies on randomness, but wants to test it reliably.
Fair enough. Between that and games, I'm convinced.
 And so, the question becomes rather, "Is it legitimate to enforce a
 thread-global source of randomness in all circumstances?"  I'd say the
 answer to that is a definite no.
For "...in **all** circumstances", I'm now sold on your answer, too. What I now think is unclear is: "Is there *some* circumstance where it's legitimate to enforce a thread-global source of randomness?"
      struct SomeRNG { private static InternalState _internals; ... }

      unittest
      {
          SomeRNG rng;
          rng.seed(KnownSeed);
          // verify that SomeRNG produces expected sequence of variates
      }

 Now when you start your _actual_ program, the internal state of SomeRNG
 will already have been set by the unittests.  Obviously if you are doing
 things properly you will scrub this by re-seeding the RNG at the start
 of your program proper, but the risk is clearly still there.
Good point.
 What do you think about the global-singleton-instance pattern as a way
 of achieving what you want?

 I think I will stop my reply here for now, because I think that your
 response to this question (and the explanations leading up to it)
 probably determines how I will respond to your later remarks.  What are
 your thoughts?
Again, for non-crypto RNGs, I'm totally with you now.

For crypto-RNGs, I think it's unclear. The question hinges on:

A. "Can we permit deterministic usages without making crypto users more likely to have multiple independent instances within a thread? (regardless of whether crypto users do it intentionally or unintentionally)"

That question, of course, needs to be balanced with:

B. "Is 'non-crypto deterministic' an inappropriate misuse of crypto-RNGs (when we already have RNGs designed for non-crypto deterministic purposes anyway)?"

and C. "How much should we even care about B?"

I'm concerned that the global-singleton-instance pattern may be insufficient for "A". What I think I'd like to see is this: For *all* RNGs (crypto and non-crypto), a reference-based design where obtaining a common thread-global instance is always the implicit default behavior, and individual or duplicate instances can always be obtained *explicitly* in a way that cannot be misread/misinterpreted. That would succeed at "A" and render "B" and "C" non-issues. Plus, I think it would be generally appropriate, even for deterministic non-crypto RNGs, anyway.

Also, I don't want to forget the issue of stream interfaces. What do you think about including them, but just with a big red "Subject to change pending a new std.stream" banner in the docs? I think that's a perfectly pragmatic "best of both worlds" compromise. Think it would be well/poorly-received?
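To spell out the "implicit thread-global default, explicit independent instances" idea, here's roughly the API shape I'm imagining (entirely hypothetical names; just a sketch, not a proposal for actual Phobos signatures):

```d
/// Hypothetical API sketch: thread-global instance by default,
/// loudly-named explicit construction for independent instances.
final class SomeRngSketch
{
    private uint state;
    private this(uint seed) { state = seed; }

    /// The implicit default: a common, lazily-created thread-global instance.
    static SomeRngSketch instance() @property
    {
        static SomeRngSketch global;   // static locals are thread-local in D
        if (global is null)
            global = new SomeRngSketch(0xDEADBEEF); // placeholder seed
        return global;
    }

    /// The explicit path -- the name makes it hard to misread as the default.
    static SomeRngSketch createIndependentInstance(uint seed)
    {
        return new SomeRngSketch(seed);
    }
}
```

The private constructor is the key part of the sketch: you can't accidentally end up with a fresh instance just by writing `new`.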
May 12 2014
parent reply Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 12/05/14 20:17, Nick Sabalausky via Digitalmars-d wrote:
 On 5/11/2014 8:16 AM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 > On 11/05/14 05:58, Nick Sabalausky via Digitalmars-d wrote:
  >> The seed doesn't need to be compromised for synchronized RNGs to fubar
  >> security.
  >
  > Before we go onto the detail of discussion, thanks very much for the
  > extensive explanation.  I was slightly worried that my previous email
  > might have come across as dismissive of your (completely understandable)
  > concerns.

 Oh, not at all. I've been finding the discussion rather interesting. :)
Me too -- sorry for the late response, things have been busy :-)
  > Obviously one _can_ solve the problem by the internal state variables of
  > the RNG being static, but I'd suggest to you that RNG-as-reference-type
  > (which doesn't necessarily need to mean "class")

 Yea, doesn't necessarily mean class, but if it is made a reference type then
 class is likely the best option. For example, I'd typically regard struct* in a
 D API as a code smell.
Well, I wasn't going to suggest struct* as the type. There have been various proposals here for a RefTypeOf template that stores an internal pointer to a struct instance and exposes its public interface via alias this. Unfortunately this approach is probably problematic because of this issue: https://issues.dlang.org/show_bug.cgi?id=10996 One could also just have an internal pointer to the RNG's private state variable(s). monarch_dodra and I have both prototyped some designs for this, but I do agree that class is largely preferable, because it avoids the need for the developer to take responsibility for ensuring the reference type semantics.
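Just to illustrate the "internal pointer to the private state" approach (a sketch with hypothetical names, not monarch_dodra's or my actual prototype): the outer struct owns only a pointer, so copies share one state.

```d
/// Sketch: a struct-based reference-type RNG (hypothetical names).
struct RefRngSketch
{
    private struct Payload { uint state; }
    private Payload* payload;   // copies of RefRngSketch share this

    static RefRngSketch create(uint seed)
    {
        return RefRngSketch(new Payload(seed));
    }

    uint front() @property { return payload.state; }
    void popFront() { payload.state = payload.state * 1103515245 + 12345; }
    enum bool empty = false;
}
```

This gets the semantics right, but as noted it relies on the developer consistently routing every access through the pointer, which is exactly the "remembering to do the right thing" burden that a class removes.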
 So I'm fine going the class route (or otherwise reference-based) and making
 internal state per-instance. Or even having a "duplicate this RNG with
 identical state" function, if people want it.

 I think we're pretty well agreed on non-crypto RNGs. Your stance is convincing
 here.
Excellent. :-)
 For crypto-RNGs:

   [ ... snip ... ]

 I think my preference would still be to keep the internal state static here
 though (again, just speaking for crypto-RNGs only). As I've argued, the
 determinism is a non-feature for crypto-RNGs (they deliberately fight it every
 step of the way), and the shared state carries a couple entropy-related
 benefits (reseeding one, i.e. accumulating additional fresh entropy, benefits
 all others; and all RNG activity within the thread acts as additional
 entropy). So I only see upsides, not downsides.
Thanks for the excellent and detailed explanation here. Your case is also pretty convincing. A few remarks, though. First, one concern I still have with static internals is essentially the same as the issue I have with the reference-types-made-of-structs: it's relying on the programmer to "do the right thing", and you know that someone is going to forget to mark as static a variable that needs it. With any luck that will be an easily-spotted and fixed issue, but using a class avoids the need. So, I'd still feel more comfortable with the idea of crypto-RNGs being classes and not structs -- you can still have the static internals to deal with your desire for uniqueness, of course. Second, I think your idea about separating the deterministic part of the algorithm from the source of entropy, and allowing arbitrary sources (or combinations of sources) to be plugged in, is an interesting one and worth pursuing.
 The only potential issue I see: Are there going to be people who are looking
 for determinism in an RNG and intentionally want to use something like
 Hash_DRBG for that, maybe for the sake of its usage of SHA? They would have to
 either limit the number of values they generate (to avoid the algorithm's
 internal reseeding) or else provide their own entropy source (not too
 difficult though, since I've recently parametrized HashDRBG on that). But I
 suppose it could be done.
To be honest, I wouldn't worry about anyone wanting to use a crypto RNG algorithm deterministically. Wait for someone to request the feature. :-)
  > The benefit of doing it this way, as opposed to static internal
  > variables, is that you aren't then constrained to have only one single,
  > effectively thread-global instance of _every_ RNG you create.

 Non-crypto RNGs: Yea, that's good, I like it.

 Crypto RNGs: Hmm, good question. My concern is that it's something the user has
 to specifically choose to use. It's not what automatically happens when you ask
 for an instance of a specific RNG.
Yea, this is a good point. I think you've convinced me that the natural state of a crypto RNG is that its state should essentially be unique -- not per-instance. One remark: if you can separate out your algorithm into a deterministic algorithm templated on sources of entropy, then note that each instantiation will be unique _relative to the sources of entropy_ but that one could create multiple independent instances relying on _different_ sources of entropy.
 Fair enough, at least for non-crypto RNGs. For crypto RNGs, I'm thinking now
 that static state can be avoided as long as the design still does a sufficient
 job of steering actual crypto-purpose users away from multiple separate
 instances.
Having got this far through the discussion, I feel that I'm happy with the idea of static state for crypto RNGs, but equally I'll be happy with alternatives. I probably do have a bit of a personal inclination to avoid static if at all possible, but in this case I think you've made a very reasonable argument for it. (Some of the arguments I cited earlier, like the effect on function purity, etc., don't apply here because crypto RNGs' .popFront() is of necessity going to be non-pure.)
  > I wrote a class-based implementation because I wanted to see how that
  > came out to various struct-based approaches monarch_dodra and I had
  > tried.  I do think that it offers much value in terms of simplicity and
  > elegance of design, but there may be other costs that make it
  > untenable.
  >

 Have you found any such costs yet, or anything in particular that suggests
 there may be some? Intuitively, I wouldn't think the minor amount of (by
 default) GC heap usage would matter (just as one particular aspect of classes).
Well, the main concern would be if using classes made it impossible (or frustratingly difficult) to use the RNG package's full functionality in non-GC-using code. I don't think there are any issues like speed hits.
 Again, for non-crypto RNGs, I'm totally with you now.

 For crypto-RNGs, I think it's unclear. The question hinges on:

 A. "Can we permit deterministic usages without making crypto users more likely
 to have multiple independent instances within a thread? (regardless of whether
 crypto users do it intentionally or unintentionally)"
Honestly, if an RNG is designed as a crypto RNG, I don't think you need to worry about supporting deterministic usage. The main concern of my arguments was related to non-crypto pseudo-RNG use-cases.
 That question, of course, needs to be balanced with:

 B. "Is 'non-crypto deterministic' an inappropriate misuse of crypto-RNGs (when
 we already have RNGs designed for non-crypto deterministic purposes anyway)?"
Probably. :-)
 and C. "How much should we even care about B?"

 I'm concerned that the global-singleton-instance pattern may be insufficient
 for "A".
I don't think that matters. We don't need to support deterministic usages for crypto RNGs.
 What I think I'd like to see is this: For *all* RNGs (crypto and non-crypto),
 a reference-based design where obtaining a common thread-global instance is
 always the implicit default behavior, and individual or duplicate instances
 can always be obtained *explicitly* in a way that cannot be
 misread/misinterpreted. That would succeed at "A" and render "B" and "C"
 non-issues. Plus, I think it would be generally appropriate, even for
 deterministic non-crypto RNGs, anyway.
Well, I guess that what I feel is: the general class-based approach of std.random2 handles most of this. Where crypto RNGs are concerned I'm fine with the idea of the internal state being static if you feel that will maximize effective use of the entropy supplied. If we combine that with the idea of templating the deterministic parts of crypto RNGs on their sources of entropy, then it should be clear that there _can_ be multiple independent instances of a crypto RNG if that's desired, but the user needs to provide different sources of entropy to each in order to make that happen. Then, provide a sensible "default" version of each crypto RNG type, with explicitly specified entropy sources, so that the typical version that will be instantiated by users will (i) be suitable for crypto and (ii) assuming it does have static internals, will be unique per thread.
 Also, I don't want to forget the issue of stream interfaces. What do you think
 about including them, but just with a big red "Subject to change pending a new
 std.stream" banner in the docs? I think that's a perfectly pragmatic "best of
 both worlds" compromise. Think it would be well/poorly-received?
If the DConf discussion related to an experimental part of the standard library are anything to go by, I think we will have plenty of opportunity to implement functionality that is subject to change, so I don't think we need fear doing that.
May 22 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/22/2014 5:01 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 12/05/14 20:17, Nick Sabalausky via Digitalmars-d wrote:
 Yea, doesn't necessarily mean class, but if it is made a reference type then
 class is likely the best option. For example, I'd typically regard struct* in
 a D API as a code smell.
Well, I wasn't going to suggest struct* as the type. There have been various proposals here for a RefTypeOf template that stores an internal pointer to a struct instance and exposes its public interface via alias this. Unfortunately this approach is probably problematic because of this issue: https://issues.dlang.org/show_bug.cgi?id=10996 One could also just have an internal pointer to the RNG's private state variable(s). monarch_dodra and I have both prototyped some designs for this, but I do agree that class is largely preferable, because it avoids the need for the developer to take responsibility for ensuring the reference type semantics.
Cool, yea, we're agreed on classes looking like probably the best approach. (Pending your concern below about "does it work for people not using the GC"...)
 Thanks for the excellent and detailed explanation here.  Your case is
 also pretty convincing.  A few remarks, though.

 First, one concern I still have with static internals is essentially the
 same as the issue I have with the reference-types-made-of-structs: it's
 relying on the programmer to "do the right thing", and you know that
 someone is going to forget to mark as static a variable that needs it.
 With any luck that will be an easily-spotted and fixed issue, but using
 a class avoids the need.

 So, I'd still feel more comfortable with the idea of crypto-RNGs being
 classes and not structs -- you can still have the static internals to
 deal with your desire for uniqueness, of course.
Yea, again, classes FTW.
 Second, I think your idea about separating the deterministic part of the
 algorithm from the source of entropy, and allowing arbitrary sources (or
 combinations of sources) to be plugged in, is an interesting one and
 worth pursuing.
Yup, as one example, I felt it was a good idea to keep the door open for anyone wanting to directly use one o' them fancy hardware-based true-randomness generators. Or facilitate anyone who may need to avoid reliance on the randomness provided by their OS. Or enhancing unittests/debugging with dependency injection.
 To be honest, I wouldn't worry about anyone wanting to use a crypto RNG
 algorithm deterministically.  Wait for someone to request the feature. :-)
Alright. Sounds good :)
 Yea, this is a good point.  I think you've convinced me that the natural
 state of a crypto RNG is that its state should essentially be unique --
 not per-instance.

 One remark: if you can separate out your algorithm into a deterministic
 algorithm templated on sources of entropy, then note that each
 instantiation will be unique _relative to the sources of entropy_ but
 that one could create multiple independent instances relying on
 _different_ sources of entropy.
That's certainly true. But I don't think it's a problem, since using multiple types of crypto-secure entropy in the same program would be a rather oddball use case (and it wouldn't actually *be* a security flaw in and of itself). And in the rare case where someone actually might want to use multiple types of entropy from a single RNG, they can still just provide a custom "aggregate" entropy source, to draw from other sources however they deem fit.
 Fair enough, at least for non-crypto RNGs. For crypto RNGs, I'm thinking now
 that static state can be avoided as long as the design still does a sufficient
 job of steering actual crypto-purpose users away from multiple separate
 instances.
Having got this far through the discussion, I feel that I'm happy with the idea of static state for crypto RNGs, but equally I'll be happy with alternatives. I probably do have a bit of a personal inclination to avoid static if at all possible,
Avoiding statics *is* usually a prudent approach in most situations, I agree.
 but in this case I think you've made a
 very reasonable argument for it.  (Some of the arguments I cited
 earlier, like the effect on function purity, etc., don't apply here
Sounds good then :)
 because crypto RNGs' .popFront() is of necessity going to be non-pure.)
To make sure I understand (it seems my understanding of D's pure isn't quite as strong as I'd thought): It cannot be pure *because* of the static internal state, right?
 Have you found any such costs yet, or anything in particular that suggests
 there may be some? Intuitively, I wouldn't think the minor amount of (by
 default) GC heap usage would matter (just as one particular aspect of classes).
Well, the main concern would be if using classes made it impossible (or frustratingly difficult) to use the RNG package's full functionality in non-GC-using code. I don't think there are any issues like speed hits.
I see. FWIW, if simply making them class-based *did* cause any problems using them "in non-GC-using code", I would think that could only be due to a much more general failing in D. Because D's classes aren't supposed to require GC allocation. GC allocation is only supposed to be the default (for classes).
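For example, even today one can place a class instance in malloc'd memory via std.conv.emplace, with no GC involvement at all (a minimal sketch):

```d
import core.stdc.stdlib : malloc, free;
import std.conv : emplace;

class Foo
{
    int x;
    this(int x) { this.x = x; }
}

/// Sketch: class semantics without GC allocation.
void useWithoutGC()
{
    enum size = __traits(classInstanceSize, Foo);
    void* mem = malloc(size);
    scope (exit) free(mem);           // caller owns the lifetime now

    Foo f = emplace!Foo(mem[0 .. size], 42);
    assert(f.x == 42);
}
```

So if class-based RNGs turned out to be unusable in non-GC code, that would indeed point at a general language/library failing rather than anything specific to std.random.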
 I don't think that matters.  We don't need to support deterministic
 usages for crypto RNGs.
[In the legendary words of Sgt Rick Hunter:] "Works for me!"
 Well, I guess that what I feel is: the general class-based approach of
 std.random2 handles most of this.  Where crypto RNGs are concerned I'm
 fine with the idea of the internal state being static if you feel that
 will maximize effective use of the entropy supplied.

 If we combine that with the idea of templating the deterministic parts
 of crypto RNGs on their sources of entropy, then it should be clear that
 there _can_ be multiple independent instances of a crypto RNG if that's
 desired, but the user needs to provide different sources of entropy to
 each in order to make that happen.
Agreed.
 Then, provide a sensible "default" version of each crypto RNG type, with
 explicitly specified entropy sources, so that the typical version that
 will be instantiated by users will (i) be suitable for crypto and (ii)
 assuming it does have static internals, will be unique per thread.
Right. And my Hash_DRBG implementation already does this "sensible default".
 Also, I don't want to forget the issue of stream interfaces. What do you think
 about including them, but just with a big red "Subject to change pending a new
 std.stream" banner in the docs? I think that's a perfectly pragmatic "best of
 both worlds" compromise. Think it would be well/poorly-received?
If the DConf discussion related to an experimental part of the standard library are anything to go by, I think we will have plenty of opportunity to implement functionality that is subject to change, so I don't think we need fear doing that.
I think I may have missed that particular discussion (I've only been catching the livestreams of certain talks). Recap?

If truly "no need to fear adding functionality that's subject to change", then that also answers another question I was about to raise: Submit a PR for this algorithm now, or simply incorporate it as part of your new std.random yet to go through the approval process? May as well go with "now".

One last matter: Should my initial PR for Hash_DRBG be struct-based or class-based? While we both agree that class-based is a better approach overall, the current std.random is still struct-based and your class-based version hasn't gone through the approvals process yet (I assume it needs to, since it's a fairly big reworking of a whole Phobos module).
May 22 2014
parent reply Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
Hi Nick,

Short reply because I think things are converging pretty well decision-wise :-)

 because crypto RNGs' .popFront() is of necessity going to be non-pure.)
To make sure I understand (it seems my understanding of D's pure isn't quite as strong as I'd thought): It cannot be pure *because* of the static internal state, right?
The main issue is because your popFront will be relying on external sources of entropy which I presume to be non-deterministic (otherwise, what entropy are they adding?). That's what makes purity impossible. As we've discussed, you could avoid having the static variables, but as the function is going to be impure anyway, they shouldn't be avoided over the purity issue.
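Schematically (with a hypothetical entropy call, just to make the point concrete):

```d
/// Hypothetical: reads one byte from OS/hardware entropy -- inherently impure,
/// since its result doesn't depend solely on its (nonexistent) arguments.
ubyte readEntropyByte();

struct CryptoSketch
{
    private ubyte state;

    ubyte front() @property { return state; }

    // Cannot be marked `pure`: it calls an impure function, and its effect
    // depends on external entropy rather than on arguments and prior state.
    void popFront()
    {
        state ^= readEntropyByte();
    }
}
```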
 If the DConf discussion related to an experimental part of the standard
 library are anything to go by, I think we will have plenty of
 opportunity to implement functionality that is subject to change, so I
 don't think we need fear doing that.
I think I may have missed that particular discussion (I've only been catching the livestreams of certain talks). Recap?
Andrei's opening keynote seemed pretty clear that we need a space in the standard library where people can experiment freely. He called it std.experimental but I suspect the name will be something different :-)
 If truly "no need to fear adding functionality that's subject to change", then
 that also answers another question I was about to raise: Submit a PR for this
 algorithm now, or just simply incorporate it as part of your new std.random
 yet to go through the approval process? May as well go with "now".

 This is one last matter: Should my initial PR for Hash_DRBG be struct-based or
 class-based? While we both agree that class-based is a better approach
 overall, the current std.random is still struct-based and your class-based
 version hasn't gone through the approvals process yet (I assume it needs to,
 since it's a fairly big reworking of a whole phobos module).
Well, what I'm inclined to do is to submit my Phobos package as an experimental module (probably call it exp.random or something like that). I can and probably will do that in the very near future.

I think that you should similarly feel unrestricted, but maybe give it a day or two post-DConf to see what the actual plan is about the experimental repo.

By the way, what are your thoughts on having a std.random.crypto module versus a general std.crypto module that includes crypto RNGs and other crypto functionality? I'd been considering a std.random.crypto, but I'm starting to incline towards the view that std.random and crypto should be kept deliberately separate.

Best wishes,

    -- Joe
May 23 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/23/2014 3:04 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 Hi Nick,

 Short reply because I think things are converging pretty well
 decision-wise :-)

 because crypto RNGs' .popFront() is of necessity going to be non-pure.)
To make sure I understand (it seems my understanding of D's pure isn't quite as strong as I'd thought): It cannot be pure *because* of the static internal state, right?
The main issue is that your popFront will be relying on external sources of entropy, which I presume to be non-deterministic (otherwise, what entropy are they adding?). That's what makes purity impossible. As we've discussed, you could avoid having the static variables, but as the function is going to be impure anyway, there's no need to avoid them on purity grounds.
Oh, right. For some reason I was thinking "front", not "popFront". My bad.
 I think I may have missed that particular discussion (I've only been
 catching
 the livestreams of certain talks). Recap?
Andrei's opening keynote seemed pretty clear that we need a space in the standard library where people can experiment freely. He called it std.experimental but I suspect the name will be something different :-)
Didn't we already have exp.* for that? (Or something like that.) That's one talk I sadly missed. Anxiously awaiting the rerun on YouTube.
 Well, what I'm inclined to do is to submit my Phobos package as an
 experimental module (probably call it exp.random or something like
 that).  I can and probably will do that in the very near future.

 I think that you should similarly feel unrestricted, but maybe give it a
 day or two post-Dconf to see what the actual plan is about the
 experimental repo.
Ok.
 By the way, what are your thoughts on having a std.random.crypto module
 versus a general std.crypto module that includes crypto RNGs and other
 crypto functionality?  I'd been considering a std.random.crypto but I'm
 starting to incline towards the view that std.random and crypto should
 be kept deliberately separate.
Well, I don't think crypto RNGs should be put together with non-RNG crypto stuff like SHA, etc. (Unless maybe it's all under a "std.crypto.*" package... but then you still have weird questions like "Does MD5 go in std.digest.md or std.crypto.digest.md? What about SHA-1?" So still maybe kinda hairy.)

As for whether crypto RNGs should be together with non-crypto RNGs, I dunno. I can't really say I'd have much objection either way. I suppose there is a general preference in D at this point to break up Phobos modules wherever reasonable, so I guess something like this might generally be preferred:

std/random/package.d
std/random/deterministic.d
std/random/crypto.d

But I don't have any strong opinions on the matter. I'd say just pick something, run with it, and see if any bikeshedding erupts ;)
May 23 2014
parent reply Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 23/05/14 21:38, Nick Sabalausky via Digitalmars-d wrote:
 Oh, right. For some reason I was thinking "front", not "popFront". My bad.
Actually it's probably irrelevant in your case anyway, isn't it, because you'll be outputting raw binary data to a provided output range ... ?
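For reference, the output-range shape alluded to here might look something like the following sketch (the function name and signature are hypothetical, purely for illustration): instead of exposing front/popFront, the generator writes raw random bytes directly into a caller-provided sink, so the purity question never arises in the first place.

```d
import std.range.primitives : isOutputRange, put;
import std.stdio : File;

// Hypothetical output-range-based crypto RNG API (illustrative only):
// fill a caller-provided sink with nBytes of raw random data.
void readRandom(Output)(ref Output sink, size_t nBytes)
    if (isOutputRange!(Output, ubyte))
{
    auto f = File("/dev/urandom", "rb"); // stand-in entropy source
    ubyte[64] buf;
    while (nBytes > 0)
    {
        // Read and forward one chunk at a time.
        immutable chunk = nBytes < buf.length ? nBytes : buf.length;
        f.rawRead(buf[0 .. chunk]);
        put(sink, buf[0 .. chunk]);
        nBytes -= chunk;
    }
}
```

The whole operation is I/O, so it is impure regardless of whether any state is kept between calls.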
 Well, I don't think crypto RNGs should be put together with non-RNG crypto
 stuff like SHA, etc. (Unless maybe it's all under a "std.crypto.*" package...
 but then you still have weird questions like "Does MD5 go in std.digest.md or
 std.crypto.digest.md? What about SHA-1?" So still maybe kinda hairy.)

 As for whether crypto RNGs should be together with non-crypto RNGs, I dunno. I
 can't really say I'd have much objection either way. I suppose there is general
 preference in D at this point to break up phobos modules wherever reasonable,
 so I guess something like this might generally be preferred:

 std/random/package.d
 std/random/deterministic.d
 std/random/crypto.d
OK, so let's run with the idea that at some point crypto RNGs will be a submodule of std.random.
May 23 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/23/2014 3:43 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 OK, so let's run with the idea that at some point crypto RNGs will be a
 submodule of std.random.
So to recap my understanding of it: An initial PR for Hash_DRBG being struct-based and directly part of "std.random", and then the submodule and conversion to class being part of your std.random2?
May 24 2014
parent reply Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 24/05/14 19:46, Nick Sabalausky via Digitalmars-d wrote:
 An initial PR for Hash_DRBG being struct-based and directly part of
 "std.random"
I think that's up to you. I don't want to hold you back here, but equally, I feel that crypto functionality probably should be prototyped in an experimental module before being finalized in the standard library. It's something that's too important not to get right (and properly reviewed).
May 24 2014
parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/24/2014 5:19 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 24/05/14 19:46, Nick Sabalausky via Digitalmars-d wrote:
 An initial PR for Hash_DRBG being struct-based and directly part of
 "std.random"
I think that's up to you. I don't want to hold you back here, but equally, I feel that crypto functionality probably should be prototyped in an experimental module before being finalized in the standard library.
Perhaps. In any case, I've tossed up a PR for it (it contains a few changes from the latest DAuth version). Further HashDRBG discussion should probably go there:

https://github.com/D-Programming-Language/phobos/pull/2208

"Destroy"
 It's something that's too important not to get right (and properly reviewed).
May 27 2014
prev sibling parent Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 04/05/14 20:10, Joseph Rushton Wakeling via Digitalmars-d wrote:
 Probably :-)  Let's put it this way:
... we can always empirically verify the uniformity of the distribution :-)
May 04 2014