www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - isUniformRNG

reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
In std.random, is the "isUniformRNG" intended to determine whether the 
given type is *some* RNG or just a *specific* form of RNG? Because I 
don't see any "isRNG" that's more general.

More importantly, should a crypto RNG count as "isUniformRNG"?
May 03 2014
next sibling parent reply Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 04/05/14 04:03, Nick Sabalausky via Digitalmars-d wrote:
 In std.random, is the "isUniformRNG" intended to determine whether the given
 type is *some* RNG or just a *specific* form of RNG? Because I don't see any
 "isRNG" that's more general.

Yes, it is meant to denote that the type is a _uniform_ random number generator, that is, it's an input range whose elements are numbers drawn from a uniform distribution. One could even be more strict as per the C++11 standard and specify that it's a range whose elements are unsigned integral types uniformly distributed in the closed interval [min, max]. Personally speaking, I have spent quite a lot of time being uncertain about whether the constraint to only unsigned integral types is entirely appropriate (e.g. Boost::Random contains RNGs whose underlying type is floating-point), but it _does_ greatly simplify many design factors. For example, the fact that you're given a guarantee about bounds makes things much simpler when you move on to (say) random distributions, such as the uniform distribution, where you want to control precisely the upper and lower bound behaviour (closed or open). About a more general "isRNG" template: can you be more precise about what you are interested in achieving with this? Generally speaking I would find it rather dangerous to go passing around sources of randomness without having some understanding of their properties :-) I should also add: the C++11 standard distinguishes between uniform random number generators (the aforementioned uniformly-distributed-unsigned-integers-in-[min, max]) and uniform _distributions_, which take the output of a uniform RNG and transform it into random values drawn from other distributions. Again, here we can see the importance of C++11's strict definition of a uniform RNG: suppose we take a uniform _distribution_ of floating-point numbers in the half-open range [0, 1) and use it as the _source_ of randomness for a uniform distribution in the closed interval [0, 1]. It won't work, because the assumptions about bounds won't work.
 More importantly, should a crypto RNG count as "isUniformRNG"?

If it's producing unsigned integral types uniformly distributed in a closed interval [min, max], why not?
May 04 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/4/2014 3:47 AM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 About a more general "isRNG" template: can you be more precise about
 what you are interested in achieving with this?  Generally speaking I
 would find it rather dangerous to go passing around sources of
 randomness without having some understanding of their properties :-)

Fair enough point, I'll explain my situation: On a general level, I'm trying to grok the whole intent of isUniformRNG and see whether or not anything else may ever be needed in addition to isUniformRNG. I'm not trying to push an "isRNG", just trying to understand std.random's current intentions and reasoning, so I know how to work with it appropriately. But more immediately, since Phobos lacks a crypto-secure RNG, I'm implementing NIST's Hash_DRBG (backed by the OS-provided RtlGenRandom/CryptGenRandom and "/dev/random" as entropy sources). Hopefully I can get it into a Phobos-acceptable state. Now, I can follow the spec for the Hash_DRBG algorithm well enough, but I'm not really solid on random-number theory, so I can't be certain whether or not isUniformRNG is appropriate for this. I would assume "yes", but I don't want to assume. Hence my inquiries.
May 04 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/4/2014 11:38 AM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 04/05/14 16:28, Nick Sabalausky via Digitalmars-d wrote:
 On a general level, I'm trying to grok the whole intent of
 isUniformRNG and see
 whether or not anything else may ever be needed in addition to
 isUniformRNG. I'm
 not trying to push an "isRNG", just trying to understand std.random's
 current
 intentions and reasoning, so I know how to work with it appropriately.

Yea, it's a good question. I think I would answer it as follows. [...]

Ok, I see. Thanks.
 But more immediately, since Phobos lacks a crypto-secure RNG, I'm
 implementing
 NIST's Hash_DRBG (backed by the OS-provided
 RtlGenRandom/CryptGenRandom and
 "/dev/random" as entropy sources). Hopefully I can get it into a
 Phobos-acceptable state.

That's great -- I really look forward to seeing your work :-) Can you give me some links to the spec?

The spec for Hash_DRBG is in "NIST Special Publication 800-90A" which also includes a few other crypto RNGs (including, unfortunately, the one with a known NSA backdoor, Dual_EC_DRBG, but it's easy enough to just not bother implementing that one): http://csrc.nist.gov/publications/nistpubs/800-90A/SP800-90A.pdf That paper is linked to from here (seems to be down right now though): http://csrc.nist.gov/groups/ST/toolkit/random_number.html Which I found from here: http://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator#Standards
 Now, I can follow the spec for the Hash_DRBG algorithm well enough,
 but I'm not
 really solid on random-number theory, so I can't be certain whether or
 not
 isUniformRNG is appropriate for this. I would assume "yes", but I
 don't want to
 assume. Hence my inquiries.

I don't know that this necessarily needs too much in the way of random-number theory, but there are 3 questions that really need answering here: * What's the type of number that this algorithm produces?

Just a string of random bits. Effectively unsigned integers.
     * What's the min and max values produced, and are these
       inclusive bounds, i.e. are the random numbers drawn
       from the closed [min, max] interval?

It's all based on SHA hashes (of seeds, internal state, and some other stuff) and it doesn't explicitly exclude any output values. So if SHA is working as expected, then yea, I'd imagine it should be a closed interval over [0, 2^numBitsRequested - 1] where "numBitsRequested" is permitted by the spec to be any positive integer.
     * Are these numbers uniformly distributed?

This is the part I've been a little unclear on, although maybe I'm just being paranoid about it. Doesn't a uniform distribution carry certain reliable statistical properties? I don't know whether that could be used to improve the likelihood of guessing future values. If so, then maybe the algorithm is designed to avoid that? Then again, wouldn't the only alternative to uniform distribution be a weighted distribution? I can't imagine an RNG intended for crypto would be deliberately weighted (unless maybe there were some randomness to the weights...if that even makes any sense at all). Maybe I'm just overthinking it?
May 04 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/4/2014 2:10 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 04/05/14 19:42, Nick Sabalausky via Digitalmars-d wrote:
 Just a string of random bits. Effectively unsigned integers.

Ahh, OK. So in practice you can probably template it on an unsigned integral type (which could include bool) and it'll just take the appropriate number of bits from the stream, no ... ? Cf. what I did with /dev/urandom etc.: https://github.com/WebDrake/std.random2/blob/master/std/random2/device.d#L122

Well, Hash_DRBG isn't really a normal stream since, based on my reading of its spec, it sounds like (for example) requesting one byte four times will give a different result than requesting four bytes all at once (assuming you're starting from the same internal state and there's no reseeding). But aside from that minor detail, yes, that's essentially correct. And much like /dev/(u)random, you could even make the number of bytes/bits requested a per-call runtime parameter (although that would diverge from the existing std.random interfaces and would require either allocating or taking an output sink, so I don't know whether I'll bother).
 Then again, wouldn't the only alternative to uniform distribution be a
 weighted
 distribution? I can't imagine an RNG intended for crypto would be
 deliberately
 weighted (unless maybe there were some randomness to the weights...if
 that even
 makes any sense at all).

 Maybe I'm just overthinking it?

Probably :-) Let's put it this way: if you think in terms of the individual bits being generated, there obviously has to be, from the point of view of the user of the algorithm, no way to decide which bit value is more likely, which corresponds to a uniform distribution of individual bit values. And that in turn will map to a uniform distribution of bit sequences of any length.

Yea. Plus, this doc about testing these crypto PRNGs... http://csrc.nist.gov/groups/ST/toolkit/rng/documents/SP800-22rev1a.pdf ...does mention the importance of "uniformity". So I think it's probably safe to figure this is a uniform distribution unless some expert chimes in and says otherwise. Thanks for the help.
May 04 2014
next sibling parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/4/2014 3:26 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 04/05/14 20:56, Nick Sabalausky via Digitalmars-d wrote:
 So I think it's probably safe to figure this is a uniform distribution
 unless
 some expert chimes in and says otherwise.

 Thanks for the help.

You're very welcome. Keep me posted on how things go with your implementation! :-)

Will do. Speaking of, this algo uses SHA-1 and SHA-2 (depending on the desired crypto strength of the psuedo-randomness). SHA-1 of course has been in Phobos for awhile. I've recently expanded it to support SHA-2, pending sufficient Phobos pull code reviews: https://github.com/D-Programming-Language/phobos/pull/2129
May 04 2014
prev sibling next sibling parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/4/2014 3:26 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 04/05/14 20:56, Nick Sabalausky via Digitalmars-d wrote:
 So I think it's probably safe to figure this is a uniform distribution
 unless
 some expert chimes in and says otherwise.

 Thanks for the help.

You're very welcome. Keep me posted on how things go with your implementation! :-)

It's now implemented and working in DAuth's HEAD: https://github.com/Abscissa/DAuth/blob/master/src/dauth/hashdrbg.d Usage is pretty simple: ---------------------------- import std.algorithm : take; import std.array : array; import dauth.sha; import dauth.hashdrbg; HashDRBG!uint rand; // The above is equivalent to: // HashDRBG!(uint, SHA512, "DAuth") rand; // Now use rand just like any other RNG in std.random: uint[] values1 = rand.take(7).array(); // The algorithm specifically supports generating // arbitrary lengths at once, so can also do: HashDRBGStream!uint randStream; ubyte[] values2; values2.length = 42; randStream.read(values2); ---------------------------- Next steps are to support Hash_DRBG's optional "additional input" feature, and to see about Phobos-ifying it and making a pull request. I'll take a look at your proposed std.random too, sorry I haven't had a chance to yet.
May 07 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/7/2014 4:06 PM, Nick Sabalausky wrote:
 // The algorithm specifically supports generating
 // arbitrary lengths at once, so can also do:
 HashDRBGStream!uint randStream;

Oops, that's supposed to be: HashDRBGStream!() randStream; //aka: // HashDRBGStream!(SHA512, "D Crypto RNG") randStream; The stream version isn't a range and only supports filling a provided ubyte[] buffer. So no element type.
May 07 2014
parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/7/2014 5:09 PM, David Nadlinger wrote:
 On Wednesday, 7 May 2014 at 21:01:20 UTC, David Nadlinger wrote:
 Shouldn't that take an ubyte output range instead?

Erm, scrap that, wasn't thinking.

Actually, maybe it should, even if only as an option. Unless I'm the one not thinking now...?
May 07 2014
prev sibling parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/4/2014 4:08 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 I should also be thanking you -- this reminded me that I needed to make
 some key changes to my std.random2 proposal, including this stricter
 isUniformRNG template:
 https://github.com/WebDrake/std.random2/commit/a071d8d182eb28516198bb292a0b45f86f4425fe


 How does this look to people?  And would there be interest in seeing
 this land in the existing std.random ... ?

Some good looking stuff in there. The separation is a nice improvement, and having a "shuffle" is something we could certainly use. About avoiding problems from accidentally duplicating internal state, I'm not sure that changing to classes is really necessary for that. I addressed it in HashDRBG by simply making the internal state static. (Oops, or at least I meant to...fixing now...). The various InputRange instantiations (for the different possible elemement types) all draw from the same source (HashDRBGStream), so this should work out pretty well. I'd imagine the existing std.random algos could do something similar.
May 08 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/8/2014 5:29 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 08/05/14 19:18, Nick Sabalausky via Digitalmars-d wrote:
 Some good looking stuff in there. The separation is a nice
 improvement, and
 having a "shuffle" is something we could certainly use.

Thanks! :-) There is already a shuffle function in the existing std.random, but it's called randomShuffle: http://dlang.org/phobos/std_random.html#.randomShuffle

How'd I either miss or forget that?! ;)
 About avoiding problems from accidentally duplicating internal state,
 I'm not
 sure that changing to classes is really necessary for that. I
 addressed it in
 HashDRBG by simply making the internal state static. (Oops, or at
 least I meant
 to...fixing now...). The various InputRange instantiations (for the
 different
 possible elemement types) all draw from the same source
 (HashDRBGStream), so
 this should work out pretty well. I'd imagine the existing std.random
 algos
 could do something similar.

That seems a problematic fix for me -- doesn't it mean that there can only ever be one instance of any individual RNG?

There can technically be multiple "instances", but yea, they're all effectively tied together. However, I'm leaning towards the belief that's "correct behavior" for a RNG. It's *definitely* correct for a crypto RNG - you certainly wouldn't want two crypto RNGs ever generating the same sequence, not even deliberately. That would defeat the whole point of a crypto RNG. As for ordinary non-crypto RNGs, I honestly can't imaging any purpose for reliably generating the same values other than "playing back" a previous sequence. But if that's what you want, then IMO it's better to record the sequence of emitted values, or even record/replay the higher-level decisions which the randomness was used to influence. For randomness, record/replay is just less fiddly and error-prone than reliably regenerating values from an algorithm that intentionally tries to imitate non-predictability. One slip-up while regenerating the sequence (example: trying to replay from the same seed on a RNG that has since had a bug fixed, or on a different architecture which the RNG wasn't well-tested on) and then the inaccuracies just cascade. Plus this way you can swap in a non-deterministic RNG and everything will still work. Or more easily skip back/ahead. I'm just not seeing a legitimate use-case for multiple states of the same RNG engine (at least on the same thread) that wouldn't be better served by different approach.
 While many
 applications wouldn't notice the difference, it's a pretty major breach
 of encapsulation (and purity).

AIUI an RNG is impure by definition. (Well, unless you consider the internal state to be part of the input, but I'd say *that* view breaks encapsulation.) I'm also unconvinced that it breaks encapsulation anyway. If anything, I think it's better encapsulated since separate instances *won't* behave like they're under "quantum entanglement". It forces separate instances to stay separate. It *is* a fairly strange situation though. Unlike most encapsulation units, the whole point of an RNG is to be *inconsistent* rather than consistent. So that kinda turns everything on its ear.
 On a related note -- I like your concept of a random stream.  I think it
 may be useful to adapt as part of the way in which std.random2.device
 works.  I need to compare to other Phobos stream code -- isn't Steven
 Schweighoffer working on up-to-date stream functionality as part of
 std.io ... ?

Yea, when I wrote it I figured on proposing an optional stream-like interface for std.random. The Hash_DRBG algorithm is inherently based around the idea that each generated number can be arbitrary-length (not only that - generating ex. two 8-byte values with Hash_DRGB is fundamentally different from generating eight 2-byte values, and produces different results). And the OS-specific entropy generators are basically streams, too (on both Win and Posix, you request any number of bytes of randomness). So that was just a natural way to implement it. But it did strike me as being generally useful, so I'm glad that you agree. You're right though, the current state of Phobos streams is kind of a stumbling block. Naturally it'd be best to use the new stream interface, but since that isn't in place, that's an issue. I don't know what the state of the rewrite is. Been awhile since I've heard anything about it. I figure, at the very least, we could just make the stream interfaces for RNGs private. Then we can open it up whenever we finally do have a new std.stream to conform to. Or we could do an OutputRange instead...but the recent discussion on those makes me a bit hesitant to do so. And I definitely don't wanna perform the contortions to invert it into a non-coroutine-like InputRange. Hmm, yea, I'm more and more liking the "keep it private for now, and then see what happens" approach...unfortunately... :/
May 08 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/10/2014 12:34 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 09/05/14 02:42, Nick Sabalausky via Digitalmars-d wrote:
 There can technically be multiple "instances", but yea, they're all
 effectively
 tied together. However, I'm leaning towards the belief that's "correct
 behavior"
 for a RNG. It's *definitely* correct for a crypto RNG - you certainly
 wouldn't
 want two crypto RNGs ever generating the same sequence, not even
 deliberately.
 That would defeat the whole point of a crypto RNG.

I'm not sure I follow your point about crypto RNGs here. Assuming it's deterministic at all, if someone else knows the seed state (which would be required to get two instances generating the same sequences), aren't you buggered anyway? And isn't the point of a deterministic crypto RNG that its seed state is kept secret, or that it is seeded with non-deterministic values upon instantiation?

The seed doesn't need to be compromised for synchronized RNGs to fubar security. It's bad enough just to accidentally generate the same byte sequence for multiple things: struct CryptoRng { ulong internalState; ...etc... } CryptoRng myRand; // Oops! Duplicates internal state! ubyte[] getRandomJunk(CryptoRng rand) { return rand.take(64).array(); } // Oops! These are all identical! Passwords may as well be unsalted! auto saltForStevensPassword = getRandomJunk(myRand); auto saltForSallysPassword = getRandomJunk(myRand); auto saltForGregsPassword = getRandomJunk(myRand); // Oops! Also identical! auto oneTimeURL1 = "https://verify-email/a a.a/"~getRandomJunk(myRand); auto oneTimeURL2 = "https://verify-email/b b.b/"~getRandomJunk(myRand); auto oneTimeURL3 = "https://verify-email/c c.c/"~getRandomJunk(myRand); May as well have just defined getRandomJunk as a constant ;) Maybe not *too* terrible for randomized alien invader attack patterns (unless it makes the game boring and doesn't sell), but a fairly major failure for security purposes. There's another issue, too: First a little background: With Hash_DRBG (and I would assume other crypto RNGs as well), the RNG is periodically reseeded, both automatically and on-demand. This reseeding is *not* like normal non-crypto reseeding: With non-crypto reseeding, you're fully resetting the internal state to a certain specific value and thus resetting the existing entropy, replacing it with the fresh new entropy. But with Hash_DRBG, reseeding only *adds* additional fresh entropy - nothing gets reset. Any existing entropy still remains. So reseeding two different Hash_DRBGs with the *same* reseed data *still* results in two completely different internal states. This is deliberately defined by the NIST spec. There are reasons for that, but one noteworthy side-consequence is this: If all your Hash_DRBG's share the same internal state, then anytime one is reseeded with fresh entropy, they *all* benefit from the fresh entropy, for free. And as a (maybe smaller) additional benefit, if different parts of your program are using an RNG independently of each other, but the RNGs internally share the same state, then *each* component contributes additional (semi-)non-determinism to the pool. That part's true of other RNGs too, but the effect is compounded further by Hash_DRBG which updates the internal state differently depending on how much data is requested at each, umm...at each request.
 As for ordinary non-crypto RNGs, I honestly can't imaging any 


 reliably generating the same values other than "playing back" a previous
 sequence. But if that's what you want, then IMO it's better to 


 sequence of emitted values, or even record/replay the higher-level
 decisions
 which the randomness was used to influence.

Even that will not save you in Minecraft, where the results of different seeds will produce different results in updated versions of the game where there are more random possibilities available (more monsters, more flora and fauna...). ;-)

Yea, exactly, that's why it's best to record high-level *results* (like "spawn a cypress tree at position (12,73,4)" or "alien fighter ID 742 changed velocity to (...) at time (...)") instead of relying on recreating it all from a given seed and deterministic RNG algorithm. However...the mentions of Minecraft did raise one point that I'll concede: If you have a multiplayer environment (or more generally, a large chunk of shared multi-computer data) that's randomly-generated, then a very careful usage of a shared seed and RNG algorithm could drastically reduce the network bandwidth requirements. I have no idea whether minecraft actually works that way, or even it's even random-generated (I've only played for a few minutes - I like the style, but I don't have the attention span for sandbox stuff ;) ), but it is enough to convince me that there might be legitimate uses after all for individual RNG instances that emit the same sequence. (But not crypto-RNGs though. Those exist purely for one very specific purpose - to maximize unpredictability when there's limited "true" randomness.) Granted, in that particular case of "shared random-generated data", I'm not sure I'd feel comfortable relying on an RNG from a third-party lib, or even a standard library. One tiny little thing changes and causes one tiny little variance in the outputs...and everything gets shot to hell. And it won't even be detected unless you thought ahead enough to verify everyone's results with checksums (or some such). So if it were me, I'd either write or borrow an RNG, but then keep it in the project's own codebase where I can more easily limit any chance of changed behavior.
 There's a difference between _relying_ on the deterministic sequence
 from a pseudo-RNG and the principle of design that says if I declare two
 separate RNG instances,

      SomeRNG rng1;
      SomeRNG rng2;

 then the calls to rng1 should not have any effect on the results of the
 calls to rng2.  Or alternatively, if I do,

      SomeRNG rng1;
      SomeRNG rng2;

      rng1.seed(n);
      rng2.seed(n);

      foreach (a, b; lockstep(rng1, rng2).take(10))
      {
          writefln("%s\t%s", a, b);
      }

 then the values I get should be the same, because rng1 and rng2 are
 independent instances that have been identically seeded.

I do agree strongly that there's a major difference between "relying on deterministic sequence from an RNG" and "principle of design". However, I also maintain that the "principle of design" case demands you should *NOT* get the same results from RNGs unless you *specifically intend* to be "relying on deterministic sequence from an RNG". After all, they *are* supposed to be "random number generators". That's their primary function. I do believe strongly in the same principles of encapsulation that you do. But the thing is, the usual rules of proper encapsulated design are deliberately and fundamentally predicated on the assumption that you *want* reliably predictable deterministic behavior. The vast majority of the time, that's a correct assumption - but the whole *point* of an RNG is to *not* be deterministic. Or at least to get as close to non-determinism as reasonably possible within a fundamentally deterministic system. Since the basic goals of RNGs are in fundamental disagreement with the underlying assumptions of usual "well encapsulated" design, that calls into question the applicability of usual encapsulation rules to RNGs. At the risk of being presumptuous, I think you do agree on some level. Isn't that the whole reason you've proposed that RNGs change from structs to classes? Because you don't want identical sequences from RNGs unless *specifically* intended, right? The only parts of that I've intended to question are: A. Does that really necessitate moving to classes? B. Are there any legitimate use-cases for "specifically relying on deterministic sequence from an RNG"? For B, the above example of "reducing bandwidth/storage requirements for a large set of shared random-generated data" is making me question my former stance of "No, there's no legitimate use-case". If B falls (and I suppose it maybe it does), then A does too, and classes would indeed be necessary. But that's all for non-crypto RNGs. For crypto-RNGs, determinism is *never* anything more than a concession forced by the deterministic nature of the machines themselves. FWIW, this does mean crypto-RNGs can go either struct or class, since the internal states really should be tied together anyway (in order to prevent their *outputs* from being tied together).
 We could get into broader discussions, but I don't think it needs to be
 more complicated than: if I create an instance of a variable of type T,
 then I should expect that it will be independent of any other instance
 of a variable of type T, unless I have deliberately created that link.

Yes, I can agree with that much, but the difficulty is that synchronized "random" number generation *is* effectively an implicit link, even if the internal implementation doesn't explicitly link them. Having the same internal state *is* the link, whether it was intentional or not. To make them logically "unlink", some care is required and *may* involve a concrete link on the implementation side. (I say "may involve" instead of "will involve" because, naturally, just using classes as you've proposed also does the job of avoiding unintentional linking.) Again, the nature of "random" or "psuedo-random" is at odds with normal rules and kinda flips things around. If I copy a normal struct, then yes, I expect most operations on both to be deterministic and return the same results - but that's because most structs *model* deterministic things. If I do: auto rngB = rngA; Then they're both still psuedo-random number generators, so I think for such a simple operation as basic assignment (or passing to a func, etc) it's questionable to start getting the same values from abstraction intended to model (more or less) randomness. I'd expect to have to go further out of my way: auto rngB = rngA.clone; But, you're right, that does ultimately suggest that structs may be a questionable choice for an RNG anyway.
 I'm also unconvinced that it breaks encapsulation anyway. If anything,
 I think
 it's better encapsulated since separate instances *won't* behave like
 they're
 under "quantum entanglement". It forces separate instances to stay
 separate.

Cf. my earlier example with two individual RNG instances initialized with the same seed. This ought to produce two identical sequences; with your proposed static internals, it won't, because calls to rng1 will affect the results of subsequent calls to rng2, and vice versa.

This is a "concrete vs logical" issue. First of all, consider the basic idea of encapsulation: It's a black box. You use it, you get the result described on its label, no peeking inside. Note that the label on an RNG basically says "Retrieve a seemingly random value." Now your question, "Are two RNGs with a shared internal state encapsulated?" That's where the "concrete vs logical" comes in: Concrete: In a concrete sense, yes each one will cause the other to emit a different *specific* value, so *concretely* maybe they're not encapsulated. But remember, these are psuedo-*random* number generators - minimal predictability is the whole point. Getting the same sequence of values from two different things is correlation - ie, predictability. Predictability is the antithesis of random. Violates our primary purpose. Encapsulation or not, the label on the box lied. Logical: The logical meaning of "return a (psudo-)random number" is "gimme a seemingly random number, avoid any correlation with anything meaningful". In a logical sense, both RNGs (with shared internal state) are *still* emitting psuedo-random numbers, which is *exactly* what their encapsulation unit defines. Even better still, the internal (*not* external) interplay between the two black boxes (which "encapsulation" dictates we shouldn't be worrying about anyway) is ensuring we get *minimal* correlation between the two RNG's outputs, ie. less predictability, ie. once again *exactly* what we expect from a black box labeled "Push my button and I'll give you a seemingly random value with minimal correlation to anything meaningful". It didn't give you the "random" number you wanted? Well then you weren't really looking for random anyway. You're peeking inside the box and saying "I want *that* random value, because that's the one the other randomizer gave me". Not very encapsulated. *That* is the primary purpose of an RNG, deterministic or not. But that brings us to the secondary purpose: Maybe we really do occasionally want to be able to recreate the same sequence. Well ok, fine. But that's a special case. Expecting it as a side-effect of using an encapsulation unit labeled "psuedo-random number generator" isn't respecting encapsulation. Encapsulation is using the button labeled "Get an RNG that emits the same sequence as some other RNG."
 This is
 pretty much the definition of breaking encapsulation, no?

Well, again, the normal rules of encapsulation are predicated on the assumption that you want reliable and predictable, not random and unpredictable. With a RNG you DO want (seemingly) random and unpredictable. Thus, encapsulation is only applicable until it starts interfering with an RNG's raison d'etre. Getting predictably similar values from two RNGs when you haven't *specifically* requested so, is certainly in opposition to an RNG's very purpose.
 The point of a pseudo-RNG is that if you analyse the sequence of
 variates produced, you can't distinguish it from a genuine random
 sequence.  But it's also intended that different instances seeded with
 different values ought to produce statistically independent sequences
 with very high probability.

 Your proposal would make it impossible to _have_ multiple effective
 instances seeded differently (unless in different threads).

That's why I brought up the question of whether there was a significantly valid use-case for doing do. If there really is, then yea, my idea blows ;) For non-crypto-RNGs, I guess I'm kinda on the fence about whether there's a worthwhile use-case (but I guess I'm outnumbered, so I don't mind as long as your class idea is adopted in order to prevent accidental correlations). For crypto-RNGs, I maintain that the ability to duplicate and generate identical sequences is an anti-feature. (For that matter, crypto-RNGs are specifically *supposed* to seed, and periodically reseed, *themselves* with data that contains at least some true randomness. So hopefully, none of this should be objectionable for crypto-RNGs.) Note, FWIW, this does *not* prevent crypto-RNGs from being class-based, so that aspect isn't even an issue anyway (just to be clear).
 Hmm, yea, I'm more and more liking the "keep it private for now, and
 then see
 what happens" approach...unfortunately... :/

Yes, that's probably for the best.

Another option is to still permit the stream versions, but label them with a big red warning that they're a temporary design and may change when std.stream gets revamped. I think maybe that'd be a more pragmatic way, user-friendly to go. If they hate breakage, well, they've been warned and nobody's forcing them to use it. If they have reason to use it and don't mind potential breaking, well, then nobody's forcing them to NOT use it.
May 10 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/11/2014 8:16 AM, Joseph Rushton Wakeling via Digitalmars-d wrote:> 
On 11/05/14 05:58, Nick Sabalausky via Digitalmars-d wrote:
 The seed doesn't need to be compromised for synchronized RNGs to fubar
 security.

Before we go onto the detail of discussion, thanks very much for the extensive explanation. I was slightly worried that my previous email might have come across as dismissive of your (completely understandable) concerns.

Oh, not at all. I've been finding the discussion rather interesting. :)
 I'm actually quite keen that we can find a mutually agreeable
 solution :-)

I agree. And I think our stances aren't quite as opposed as they may seem.
 Sure, but this is a consequence of two things (i) CryptoRng is a value
 type and (ii) it gets passed by value, not by reference.

 In your example, obviously one can fix the problem by having the
 function declared instead as

      ubyte[] getRandomJunk(ref CryptoRng rand) { ... }

 but I suspect you'd say (and I would agree) that this is inadequate:
 it's relying on programmer virtue to ensure the correct behaviour, and
 sooner or later someone will forget that "ref".  (It will also not
 handle other cases, such as an entity that needs to internally store the
 RNG, as we've discussed many times on the list with reference to e.g.
 RandomSample or RandomCover.)

Right. Agreed on all points.
 Obviously one _can_ solve the problem by the internal state variables of
 the RNG being static, but I'd suggest to you that RNG-as-reference-type
 (which doesn't necessarily need to mean "class")

Yea, doesn't necessarily mean class, but if it is made a reference type then class is likely the best option. For example, I'd typically regard struct* in a D API as a code smell.
 solves that particular
 problem without the constraints that static internal variables have.

Pretty much agreed, but the only question is whether those constraints of using static internals are good/bad/inconsequential: For non-crypto RNGs: While I've tended to think the usefulness of a library-provided RNG that permits independent-but-identically-seeded instances is small and debatable, through this discussion I have become sufficiently convinced that they're worth permitting. Besides, even if there weren't need for it, the downsides of permitting such a feature (as long as it's not accident-prone) are minimal, if not zero. So I'm fine going the class route (or otherwise reference-based) and making internal state per-instance. Or even having a "duplicate this RNG with identical state" function, if people want it. I think we're pretty well agreed on non-crypto RNGs. Your stance is convincing here. For crypto-RNGs: A crypto-RNG exists for exactly one reason only: To stretch the useful lifetime of a limited source of truely non-deterministic randomness/entropy (or nearly-true randomness, if that's the best available). Because of this, any and all determinism is a concession, not a feature (unlike for non-crypto deterministic-RNGs). Even *how* you use it deliberately affects the internal state, not just "how many times you asked for a value". These things go all-out to throw any wrenches they can into any sources of determinism. I was actually quite impressed :) In fact, the seeding/reseeding is specifically defined to be completely internal to the crypto-RNG (at least with Hash_DRBG anyway, probably others) - the user *never* provides a seed - which intentionally makes it that much harder for an application to use deterministic seeds (which would compromise the security implications of the crypto-RNG, and therefore defeat the whole point of using a crypto-RNG instead of a normal RNG). All this is because determinism is NOT what a crypto-RNG is for, it's exactly what a crypto-RNG is specifically designed to fight against. What all that implies: A crypto-RNG shouldn't *explicitly* provide a way to get different instances with identical internal states, and definitely shouldn't let it happen by accident. It's also under absolutely no obligation whatsoever for relying on determinism to even be possible at all (it would carry certain difficulties anyway). Luckily though, that doesn't imply anything particularly profound. *If* it's even possible to get identical crypto-RNGs at all, then as long you have to work to do it (memcopying raw class data, providing a custom entropy source that's written to be deterministic, or even using muliple threads/processes, etc), then everything's all good. Therefore, a class or otherwise reference-based approach is fine for crypt-RNGs, too. I think my preference would still be to keep the internal state static here though (again, just speaking for crypto-RNGs only). As I've argued, the determinism is a non-feature for crypto-RNGs (they deliberately fight it every step of the way), and the shared state carries a couple entropy-related benefits (Reseeding one, ie accumulating additional fresh entropy, benefits all others. And all RNG activity within in the thread acts as additional entropy). So I only see upsides, not downsides. The only potential issue I see: Are there going to be people who are looking for determinism in an RNG and intentionally want to use something like Hash_DRBG for that. Maybe for the sake of it's usage of SHA. They would have to either limit the number of values they generate (to avoid the algorithm's internal reseeding) or else provide their own entropy source (not too difficult though, since I've recently parametrized HashDRBG on that). But I suppose it could be done. Shoot, now I'm not so certain anymore... Well, I suppose permitting a crypto-RNG to be used in replacement for a simpler deterministic RNG would be perfectly reasonable to permit *provided that* it wasn't done in a way that would make it easier for the crypto-uses to make the wrong choice of non-global instances. Although, I don't know whether that would ever be appropriate anyway, if we already have RNGs intended for non-crypto deterministic purposes.
 This is a very interesting point.  My understanding is that in many
 (most?) programs, each individual thread does typically operate using a
 single RNG instance anyway.  The question for me is whether that's a
 design choice of the developer or something that is an unavoidable
 outcome of the RNG design.

 For example, it's readily possible to just instantiate a single RNG
 (crypto or otherwise, as required) at the start of the program and pass
 that around: if it's a reference type (as I would argue it should be)
 then the effects you desire will still take place.

Yea, that is true.
 Alternatively, one can also implement a thread-global RNG instance which
 functions will use.  This is already done with rndGen, which is
 effectively a singleton pattern.  There's no reason why one couldn't
 also define a cryptoGen which works in a similar way, just using an
 instance of a cryto RNG where rndGen uses a Mersenne Twister:
 

 

 The benefit of doing it this way, as opposed to static internal
 variables, is that you aren't then constrained to have only one single,
 effectively thread-global instance of _every_ RNG you create.

Non-crypto RNGs: Yea, that's good, I like it. Crypto RNGs: Hmm, good question. My concern is that it's something the user has to specifically choose to use. It's not what automatically happens when you ask for an instance of a specific RNG.
 You may also recall earlier games like Elite which were able to "store"
 whole universes on a single 5 1/4" floppy by virtue of auto-generating
 that universe from a repeatable random sequence.  It was amazing what
 could be done on the old BBC Micro :-)

Yea, I have a deep appreciation for those 8-bit-era machines. I was an Apple-II guy myself. (Apple-IIc).
 I completely agree that _unintended_ identical RNG sequences are a very
 bad thing that must be avoided.  I think that we're in disagreement only
 about how best to achieve that in a sane/safe/uncomplicated way.  With
 luck we will find something that works for both of us :-)

 What I feel is that static internal variables go too far in the opposite
 direction: they make it impossible to have even _intentionally_
 identical RNG sequences.  There's no way, for example, to .save or .dup
 an RNG instance with this design.

Fair enough, at least for non-crypto RNGs. For crypto RNGs, I'm thinking now that static state can be avoided as long as the design still does a sufficient job of steering actual crypto-purpose users away from multiple separate instances.
 However, the effect of a common and reliable source of randomness, used
 by all parts of the program, can be implemented instead by a
 thread-global singleton instance of an RNG as described above.  This
 seems to me to be a better solution as it gives you what you achieve
 using static variables, but without preventing you from instantiating as
 many other independent RNG instances as you like.

Yea, it may. For crypto-RNGs though I'd have it give it some more thought.
 A. Does that really necessitate moving to classes?

No, the important distinction is value vs. reference types, not structs v. classes. One could implement the RNGs as struct-based reference

True, although in general I think any reference-based design that doesn't use classes requires fairly strong justification for doing so. I think we're unlikely to see that happen here. So I'm good with classes (for both crypto and non-crypto RNGs) unless some big unexpected issue pops up.
 I wrote a class-based implementation because I wanted to see how that
 came out to various struct-based approaches monarch_dodra and I had
 tried.  I do think that it offers much value in terms of simplicity and
 elegance of design, but there may be other costs that make it
 untenable.

Have you found any such costs yet, or anything in particular that suggests there may be some? Intuitively, I wouldn't think the minor amount of (by default) GC heap usage would matter (just as one particular aspect of classes).
 B. Are there any legitimate use-cases for "specifically relying on
 deterministic
 sequence from an RNG"?

Well, when I was writing large-scale scientific simulations, it was certainly useful to be able to re-run them with the same random sequence. Given the number of random variates I was generating in the process, it was not really practical to store the sequence on memory or disk. The same is true of many circumstances where one is writing a program that relies on randomness, but wants to test it reliably.

Fair enough. Between that and games, I'm convinced.
 And so, the question becomes rather, "Is it legitimate to enforce a
 thread-global source of randomness in all circumstances?"  I'd say the
 answer to that is a definite no.

For "...in **all** circumstances", I'm now sold on your answer, too. What I now think is unclear is: "Is there *some* circumstance where it's legitimate to enforce a thread-global source of randomness?"
      struct SomeRNG { private static InternalState _internals; ... }

      unittest
      {
          SomeRNG rng;
          rng.seed(KnownSeed);
          // verify that SomeRNG produces expected sequence of variates
      }

 Now when you start your _actual_ program, the internal state of SomeRNG
 will already have been set by the unittests.  Obviously if you are doing
 things properly you will scrub this by re-seeding the RNG at the start
 of your program proper, but the risk is clearly still there.

Good point.
 What do you think about the global-singleton-instance pattern as a way
 of achieving what you want?

 I think I will stop my reply here for now, because I think that your
 response to this question (and the explanations leading up to it)
 probably determines how I will respond to your later remarks.  What are
 your thoughts?

Again, for non-crypto RNGs, I'm totally with you now. For crypto-RNGs, I think it's unclear. The question hinges on: A. "Can we permit deterministic usages without making crypto users more likely to have multiple independent instances within a thread? (regardless of whether crypto users do it intentionally or unintentionally)" That question, of course, needs to be balanced with: B. "Is 'non-crypo deterministic' an inappropriate misuse of crypto-RNGs (when we already have RNGs designed for non-crypo deterministic purposes anyway?)" and C. "How much should we even care about B?" I'm concerned that the global-singleton-instance pattern may be insufficient for "A". What I think I'd like to see is this: For *all* RNGs (crypto and non-crypto), a reference-based design where obtaining a common thread-global instance is always the implicit default behavior, and individual or duplicate instances can always be obtained *explicitly* in a way that cannot be misread/misinterpreted. That would succeed at "A" and render "B" and "C" non-issues. Plus, I think it would be generally appropriate, even for deterministic non-crypto RNGs, anyway. Also, I don't want to forget the issue of stream interfaces. What do you think about including them, but just with a big red "Subject to change pending a new std.stream" banner in the docs? I think that's a perfectly pragmatic "best of both worlds" compromise. Think it would be well/poorly-received?
May 12 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/22/2014 5:01 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 12/05/14 20:17, Nick Sabalausky via Digitalmars-d wrote:
 Yea, doesn't necessarily mean class, but if it is made a reference
 type then
 class is likely the best option. For example, I'd typically regard
 struct* in a
 D API as a code smell.

Well, I wasn't going to suggest struct* as the type. There have been various proposals here for a RefTypeOf template that stores an internal pointer to a struct instance and exposes its public interface via alias this. Unfortunately this approach is probably problematic because of this issue: https://issues.dlang.org/show_bug.cgi?id=10996 One could also just have an internal pointer to the RNG's private state variable(s). monarch_dodra and I have both prototyped some designs for this, but I do agree that class is largely preferable, because it avoids the need for the developer to take responsibility for ensuring the reference type semantics.

Cool, yea, we're agreed on classes looking like probably the best approach. (Pending your concern below about "does it work for people not using the GC"...)
 Thanks for the excellent and detailed explanation here.  Your case is
 also pretty convincing.  A few remarks, though.

 First, one concern I still have with static internals is essentially the
 same as the issue I have with the reference-types-made-of-structs: it's
 relying on the programmer to "do the right thing", and you know that
 someone is going to forget to mark as static a variable that needs it.
 With any luck that will be an easily-spotted and fixed issue, but using
 a class avoids the need.

 So, I'd still feel more comfortable with the idea of crypto-RNGs being
 classes and not structs -- you can still have the static internals to
 deal with your desire for uniqueness, of course.

Yea, again, classes FTW.
 Second, I think your idea about separating the deterministic part of the
 algorithm from the source of entropy, and allowing arbitrary sources (or
 combinations of sources) to be plugged in, is an interesting one and
 worth pursuing.

Yup, as one example, I felt it was a good idea to keep the door open for anyone wanting to directly use one o' them fancy hardware-based true-randomness generators. Or facilitate anyone who may need to avoid reliance on the randomness provided by their OS. Or enhancing unittests/debugging with dependency injection.
 To be honest, I wouldn't worry about anyone wanting to use a crypto RNG
 algorithm deterministically.  Wait for someone to request the feature. :-)

Alright. Sounds good :)
 Yea, this is a good point.  I think you've convinced me that the natural
 state of a crypto RNG is that its state should essentially be unique --
 not per-instance.

 One remark: if you can separate out your algorithm into a deterministic
 algorithm templated on sources of entropy, then note that each
 instantiation will be unique _relative to the sources of entropy_ but
 that one could create multiple independent instances relying on
 _different_ sources of entropy.

That's certainly true. But I don't think it's a problem since using multiple types of crypto-secure entropy in the same program would be a rather oddball usecase (and it wouldn't actually *be* a security flaw in and of itself). And in the rare case where someone actually might want use multiple types of entropy from a single RNG, they can still just provide a custom "aggregate" entropy source, to draw from other sources however they deem fit.
 Fair enough, at least for non-crypto RNGs. For crypto RNGs, I'm
 thinking now
 that static state can be avoided as long as the design still does a
 sufficient
 job of steering actual crypto-purpose users away from multiple
 separate instances.

Having got this far through the discussion, I feel that I'm happy with the idea of static state for crypto RNGs, but equally I'll be happy with alternatives. I probably do have a bit of a personal inclination to avoid static if at all possible,

Avoiding statics *is* usually a prudent approach in most situations, I agree.
 but in this case I think you've made a
 very reasonable argument for it.  (Some of the arguments I cited
 earlier, like the effect on function purity, etc., don't apply here

Sounds good then :)
 because crypto RNGs' .popFront() is of necessity going to be non-pure.)

To make sure I understand (it seems my understanding of D's pure isn't quite as strong as I'd thought): It cannot be pure *because* of the static internal state, right?
 Have you found any such costs yet, or anything in particular that
 suggests there
 may be some? Intuitively, I wouldn't think the minor amount of (by
 default) GC
 heap usage would matter (just as one particular aspect of classes).

Well, the main concern would be if using classes made it impossible (or frustratingly difficult) to use the RNG package's full functionality in non-GC-using code. I don't think there are any issues like speed hits.

I see. FWIW, if simply making them class-based *did* cause any problems using them "in non-GC-using code", I would think that could only be due to a much more general failing in D. Because D's classes aren't supposed to require GC allocation. GC allocation is only supposed to be the default (for classes).
 I don't think that matters.  We don't need to support deterministic
 usages for crypto RNGs.

[In the legendary words of Sgt Rick Hunter:] "Works for me!"
 Well, I guess that what I feel is: the general class-based approach of
 std.random2 handles most of this.  Where crypto RNGs are concerned I'm
 fine with the idea of the internal state being static if you feel that
 will maximize effective use of the entropy supplied.

 If we combine that with the idea of templating the deterministic parts
 of crypto RNGs on their sources of entropy, then it should be clear that
 there _can_ be multiple independent instances of a crypto RNG if that's
 desired, but the user needs to provide different sources of entropy to
 each in order to make that happen.

Agreed.
 Then, provide a sensible "default" version of each crypto RNG type, with
 explicitly specified entropy sources, so that the typical version that
 will be instantiated by users will (i) be suitable for crypto and (ii)
 assuming it does have static internals, will be unique per thread.

Right. And my Hash_DRBG implementation already does this "sensible default".
 Also, I don't want to forget the issue of stream interfaces. What do
 you think
 about including them, but just with a big red "Subject to change
 pending a new
 std.stream" banner in the docs? I think that's a perfectly pragmatic
 "best of
 both worlds" compromise. Think it would be well/poorly-received?

If the DConf discussion related to an experimental part of the standard library are anything to go by, I think we will have plenty of opportunity to implement functionality that is subject to change, so I don't think we need fear doing that.

I think I may have missed that particular discussion (I've only been catching the livestreams of certain talks). Recap? If truly "no need to fear adding functionality that's subject to change", then that also answers another question I was about to raise: Submit a PR for this algorithm now, or just simply incorporate it as part of your new std.random yet to go through the approval process? May as well go with "now". This is one last matter: Should my initial PR for Hash_DRBG be struct-based or class-based? While we both agree that class-based is a better approach overall, the current std.random is still struct-based and your class-based version hasn't gone through the approvals process yet (I assume it needs to, since it's a fairly big reworking of a whole phobos module).
May 22 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/23/2014 3:04 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 Hi Nick,

 Short reply because I think things are converging pretty well
 decision-wise :-)

 because crypto RNGs' .popFront() is of necessity going to be non-pure.)

To make sure I understand (it seems my understanding of D's pure isn't quite as strong as I'd thought): It cannot be pure *because* of the static internal state, right?

The main issue is because your popFront will be relying on external sources of entropy which I presume to be non-deterministic (otherwise, what entropy are they adding?). That's what makes purity impossible. As we've discussed, you could avoid having the static variables, but as the function is going to be impure anyway, they shouldn't be avoided over the purity issue.

Oh, right. For some reason I was thinking "front", not "popFront". My bad.
 I think I may have missed that particular discussion (I've only been
 catching
 the livestreams of certain talks). Recap?

Andrei's opening keynote seemed pretty clear that we need a space in the standard library where people can experiment freely. He called it std.experimental but I suspect the name will be something different :-)

Didn't we already have exp.* for that? (Or something like that.) That's one talk I sadly missed. Anxiously awaiting the rerun on YouTube.
 Well, what I'm inclined to do is to submit my Phobos package as an
 experimental module (probably call it exp.random or something like
 that).  I can and probably will do that in the very near future.

 I think that you should similarly feel unrestricted, but maybe give it a
 day or two post-Dconf to see what the actual plan is about the
 experimental repo.

Ok.
 By the way, what are your thoughts on having a std.random.crypto module
 versus a general std.crypto module that includes crypto RNGs and other
 crypto functionality?  I'd been considering a std.random.crypto but I'm
 starting to incline towards the view that std.random and crypto should
 be kept deliberately separate.

Well, I don't think crypto RNGs should be put together with non-RNG crypto stuff like SHA, etc. (Unless maybe it's all under a "std.crypto.*" package...but then you still have weird questions like "Does MD5 go in std.digest.md or std.crypto.digest.md? What about SHA-1?" So still maybe kinda hairy.) As for whether crypto RNGs should be together with non-crypto RNGs, I dunno. I can't really say I'd have much objection either way. I suppose there is general preference in D at this point to break up phobos modules wherever reasonable, so I guess something like this might generally be preferred: std/random/package.d std/random/deterministic.d std/random/crypto.d But I don't have any strong opinions on the matter. I'd say just pick something, run with it, and see if any bikeshedding erupts ;)
May 23 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/23/2014 3:43 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 OK, so let's run with the idea that at some point crypto RNGs will be a
 submodule of std.random.

So to recap my understanding of it: An initial PR for Hash_DRBG being struct-based and directly part of "std.random", and then the submodule and conversion to class being part of your std.random2?
May 24 2014
parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 5/24/2014 5:19 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 24/05/14 19:46, Nick Sabalausky via Digitalmars-d wrote:
 An initial PR for Hash_DRBG being struct-based and directly part of
 "std.random"

I think that's up to you. I don't want to hold you back here, but equally, I feel that crypto functionality probably should be prototyped in an experimental module before being finalized in the standard library.

Perhaps. In any case, I've tossed up a PR for it (it contains a few changes since the latest DAuth version of it). Further HashDRBG discussion should probably go there: https://github.com/D-Programming-Language/phobos/pull/2208 "Destroy"
 It's something that's too important to get right (and properly reviewed).

May 27 2014
prev sibling next sibling parent Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 04/05/14 16:28, Nick Sabalausky via Digitalmars-d wrote:
 On a general level, I'm trying to grok the whole intent of isUniformRNG and see
 whether or not anything else may ever be needed in addition to isUniformRNG.
I'm
 not trying to push an "isRNG", just trying to understand std.random's current
 intentions and reasoning, so I know how to work with it appropriately.

Yea, it's a good question. I think I would answer it as follows. From a design point of view, it's helpful to distinguish between code that acts as a fundamental source of (usually pseudo-) randomness, versus stuff that transforms the output of those random sources into other forms (say, uniformly-distributed floating point numbers, or rolls of a 6-sided die, or permutations of an array, or whatever else). In other words, it's helpful to distinguish between _generators_ of randomness, versus _users_ of randomness. Now, in principle, there are many different kinds of random generator available, not all of them drawing from a uniform distribution. But again, from a design point of view, it's helpful to constrain the range of functionality assumed, and uniformly-distributed random numbers are a very good foundation from which to build, because they can reliably be used to generate numbers drawn from any other probability distribution. On top of that, it's useful to have the constraint of a distribution on a closed interval that has explicitly included minimum and maximum values, because that greatly simplifies the checks and assumptions one has to make in writing functions that make use of the random generator. Hence you come to this definition of a "uniform random number generator", which is the fundamental building block of all other random-number functionality. What the isUniformRNG template is _really_ asking is, "Is this going to give me a valid source of randomness to plug in to all the other random-number-generation functions, so that they'll do what I expect?" Simple example of how that can be important: suppose you have a random generator that gives you uniform numbers in the half-open interval [min, max). Now run that through std.random.uniform!"[]"(). You will not in fact get random numbers drawn from the closed interval you have specified.
 But more immediately, since Phobos lacks a crypto-secure RNG, I'm implementing
 NIST's Hash_DRBG (backed by the OS-provided RtlGenRandom/CryptGenRandom and
 "/dev/random" as entropy sources). Hopefully I can get it into a
 Phobos-acceptable state.

That's great -- I really look forward to seeing your work :-) Can you give me some links to the spec?
 Now, I can follow the spec for the Hash_DRBG algorithm well enough, but I'm not
 really solid on random-number theory, so I can't be certain whether or not
 isUniformRNG is appropriate for this. I would assume "yes", but I don't want to
 assume. Hence my inquiries.

I don't know that this necessarily needs too much in the way of random-number theory, but there are 3 questions that really need answering here: * What's the type of number that this algorithm produces? * Are these numbers uniformly distributed? * What's the min and max values produced, and are these inclusive bounds, i.e. are the random numbers drawn from the closed [min, max] interval? Can you clarify?
May 04 2014
prev sibling next sibling parent Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 04/05/14 19:42, Nick Sabalausky via Digitalmars-d wrote:
 Just a string of random bits. Effectively unsigned integers.

Ahh, OK. So in practice you can probably template it on an unsigned integral type (which could include bool) and it'll just take the appropriate number of bits from the stream, no ... ? Cf. what I did with /dev/urandom etc.: https://github.com/WebDrake/std.random2/blob/master/std/random2/device.d#L122
 It's all based on SHA hashes (of seeds, internal state, and some other stuff)
 and it doesn't explicitly exclude any output values. So if SHA is working as
 expected, then yea, I'd imagine it should be a closed interval over [0,
 2^numBitsRequested - 1] where "numBitsRequested" is permitted by the spec to be
 any positive integer.

Yea, sounds right to me.
 This is the part I've been a little unclear on, although maybe I'm just being
 paranoid about it. Doesn't a uniform distribution carry certain reliable
 statistical properties? I don't know whether that could be used to improve the
 likelihood of guessing future values. If so, then maybe the algorithm is
 designed to avoid that?

 Then again, wouldn't the only alternative to uniform distribution be a weighted
 distribution? I can't imagine an RNG intended for crypto would be deliberately
 weighted (unless maybe there were some randomness to the weights...if that even
 makes any sense at all).

 Maybe I'm just overthinking it?

Probably :-) Let's put it this way: if you think in terms of the individual bits being generated, there obviously has to be, from the point of view of the user of the algorithm, no way to decide which bit value is more likely, which corresponds to a uniform distribution of individual bit values. And that in turn will map to a uniform distribution of bit sequences of any length.
May 04 2014
prev sibling next sibling parent Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 04/05/14 20:10, Joseph Rushton Wakeling via Digitalmars-d wrote:
 Probably :-)  Let's put it this way:

... we can always empirically verify the uniformity of the distribution :-)
May 04 2014
prev sibling next sibling parent Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 04/05/14 20:56, Nick Sabalausky via Digitalmars-d wrote:
 So I think it's probably safe to figure this is a uniform distribution unless
 some expert chimes in and says otherwise.

 Thanks for the help.

You're very welcome. Keep me posted on how things go with your implementation! :-)
May 04 2014
prev sibling next sibling parent Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 04/05/14 21:26, Joseph Rushton Wakeling via Digitalmars-d wrote:
 On 04/05/14 20:56, Nick Sabalausky via Digitalmars-d wrote:
 So I think it's probably safe to figure this is a uniform distribution unless
 some expert chimes in and says otherwise.

 Thanks for the help.

You're very welcome. Keep me posted on how things go with your implementation! :-)

I should also be thanking you -- this reminded me that I needed to make some key changes to my std.random2 proposal, including this stricter isUniformRNG template: https://github.com/WebDrake/std.random2/commit/a071d8d182eb28516198bb292a0b45f86f4425fe How does this look to people? And would there be interest in seeing this land in the existing std.random ... ?
May 04 2014
prev sibling next sibling parent "David Nadlinger" <code klickverbot.at> writes:
On Wednesday, 7 May 2014 at 20:11:13 UTC, Nick Sabalausky wrote:
 HashDRBGStream!() randStream;
 //aka:
 // HashDRBGStream!(SHA512, "D Crypto RNG") randStream;

 The stream version isn't a range and only supports filling a 
 provided ubyte[] buffer. So no element type.

Shouldn't that take an ubyte output range instead? David
May 07 2014
prev sibling next sibling parent "David Nadlinger" <code klickverbot.at> writes:
On Wednesday, 7 May 2014 at 21:01:20 UTC, David Nadlinger wrote:
 Shouldn't that take an ubyte output range instead?

Erm, scrap that, wasn't thinking. David
May 07 2014
prev sibling next sibling parent Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 08/05/14 19:18, Nick Sabalausky via Digitalmars-d wrote:
 Some good looking stuff in there. The separation is a nice improvement, and
 having a "shuffle" is something we could certainly use.

Thanks! :-) There is already a shuffle function in the existing std.random, but it's called randomShuffle: http://dlang.org/phobos/std_random.html#.randomShuffle
 About avoiding problems from accidentally duplicating internal state, I'm not
 sure that changing to classes is really necessary for that. I addressed it in
 HashDRBG by simply making the internal state static. (Oops, or at least I meant
 to...fixing now...). The various InputRange instantiations (for the different
 possible elemement types) all draw from the same source (HashDRBGStream), so
 this should work out pretty well. I'd imagine the existing std.random algos
 could do something similar.

That seems a problematic fix for me -- doesn't it mean that there can only ever be one instance of any individual RNG? While many applications wouldn't notice the difference, it's a pretty major breach of encapsulation (and purity). On a related note -- I like your concept of a random stream. I think it may be useful to adapt as part of the way in which std.random2.device works. I need to compare to other Phobos stream code -- isn't Steven Schweighoffer working on up-to-date stream functionality as part of std.io ... ?
May 08 2014
prev sibling next sibling parent "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:
On Friday, 9 May 2014 at 00:43:10 UTC, Nick Sabalausky wrote:
 On 5/8/2014 5:29 PM, Joseph Rushton Wakeling via Digitalmars-d 
 wrote:
 That seems a problematic fix for me -- doesn't it mean that 
 there can
 only ever be one instance of any individual RNG?

There can technically be multiple "instances", but yea, they're all effectively tied together. However, I'm leaning towards the belief that's "correct behavior" for a RNG. It's *definitely* correct for a crypto RNG - you certainly wouldn't want two crypto RNGs ever generating the same sequence, not even deliberately. That would defeat the whole point of a crypto RNG. As for ordinary non-crypto RNGs, I honestly can't imaging any purpose for reliably generating the same values other than "playing back" a previous sequence. But if that's what you want, then IMO it's better to record the sequence of emitted values, or even record/replay the higher-level decisions which the randomness was used to influence. For randomness, record/replay is just less fiddly and error-prone than reliably regenerating values from an algorithm that intentionally tries to imitate non-predictability. One slip-up while regenerating the sequence (example: trying to replay from the same seed on a RNG that has since had a bug fixed, or on a different architecture which the RNG wasn't well-tested on) and then the inaccuracies just cascade. Plus this way you can swap in a non-deterministic RNG and everything will still work. Or more easily skip back/ahead. I'm just not seeing a legitimate use-case for multiple states of the same RNG engine (at least on the same thread) that wouldn't be better served by different approach.

I strongly disagree here. If a user has explicitly requested a PRNG, they should be able to rely on its most basic property, being deterministic.
May 09 2014
prev sibling next sibling parent "Colin Grogan" <grogan.colin gmail.com> writes:
On Friday, 9 May 2014 at 00:43:10 UTC, Nick Sabalausky wrote:

 As for ordinary non-crypto RNGs, I honestly can't imaging any 
 purpose for reliably generating the same values other than 
 "playing back" a previous sequence. But if that's what you 
 want, then IMO it's better to record the sequence of emitted 
 values, or even record/replay the higher-level decisions which 
 the randomness was used to influence.

Theres lots of uses. e.g. Minecraft.
May 09 2014
prev sibling next sibling parent Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 09/05/14 02:42, Nick Sabalausky via Digitalmars-d wrote:
 There can technically be multiple "instances", but yea, they're all effectively
 tied together. However, I'm leaning towards the belief that's "correct
behavior"
 for a RNG. It's *definitely* correct for a crypto RNG - you certainly wouldn't
 want two crypto RNGs ever generating the same sequence, not even deliberately.
 That would defeat the whole point of a crypto RNG.

I'm not sure I follow your point about crypto RNGs here. Assuming it's deterministic at all, if someone else knows the seed state (which would be required to get two instances generating the same sequences), aren't you buggered anyway? And isn't the point of a deterministic crypto RNG that its seed state is kept secret, or that it is seeded with non-deterministic values upon instantiation?
 As for ordinary non-crypto RNGs, I honestly can't imaging any purpose for
 reliably generating the same values other than "playing back" a previous
 sequence. But if that's what you want, then IMO it's better to record the
 sequence of emitted values, or even record/replay the higher-level decisions
 which the randomness was used to influence.

Even that will not save you in Minecraft, where the results of different seeds will produce different results in updated versions of the game where there are more random possibilities available (more monsters, more flora and fauna...). ;-)
 For randomness, record/replay is just less fiddly and error-prone than reliably
 regenerating values from an algorithm that intentionally tries to imitate
 non-predictability. One slip-up while regenerating the sequence (example:
trying
 to replay from the same seed on a RNG that has since had a bug fixed, or on a
 different architecture which the RNG wasn't well-tested on) and then the
 inaccuracies just cascade. Plus this way you can swap in a non-deterministic
RNG
 and everything will still work. Or more easily skip back/ahead.

There's a difference between _relying_ on the deterministic sequence from a pseudo-RNG and the principle of design that says if I declare two separate RNG instances, SomeRNG rng1; SomeRNG rng2; then the calls to rng1 should not have any effect on the results of the calls to rng2. Or alternatively, if I do, SomeRNG rng1; SomeRNG rng2; rng1.seed(n); rng2.seed(n); foreach (a, b; lockstep(rng1, rng2).take(10)) { writefln("%s\t%s", a, b); } then the values I get should be the same, because rng1 and rng2 are independent instances that have been identically seeded.
 I'm just not seeing a legitimate use-case for multiple states of the same RNG
 engine (at least on the same thread) that wouldn't be better served by
different
 approach.

We could get into broader discussions, but I don't think it needs to be more complicated than: if I create an instance of a variable of type T, then I should expect that it will be independent of any other instance of a variable of type T, unless I have deliberately created that link.
 AIUI an RNG is impure by definition. (Well, unless you consider the internal
 state to be part of the input, but I'd say *that* view breaks encapsulation.)

Not at all. In fact a good theoretical model of a pseudo-RNG is * a state variable, and * a pure function that transforms one state variable into another such that the sequence of states is indistinguishable from a random sequence. You can also add other pure transformations to map the state variable to particular random variates (e.g. unsigned integers in a certain range). You can encapsulate this quite neatly as the state variable being a private member of a struct or class, with the pure transformation function being .popFront and the state-to-variate mapping being handled by .front. Note that .popFront _is_ pure for all the pseudo-RNGs in std.random2 :-)
 I'm also unconvinced that it breaks encapsulation anyway. If anything, I think
 it's better encapsulated since separate instances *won't* behave like they're
 under "quantum entanglement". It forces separate instances to stay separate.

Cf. my earlier example with two individual RNG instances initialized with the same seed. This ought to produce two identical sequences; with your proposed static internals, it won't, because calls to rng1 will affect the results of subsequent calls to rng2, and vice versa. This is pretty much the definition of breaking encapsulation, no?
 It *is* a fairly strange situation though. Unlike most encapsulation units, the
 whole point of an RNG is to be *inconsistent* rather than consistent. So that
 kinda turns everything on its ear.

The point of a pseudo-RNG is that if you analyse the sequence of variates produced, you can't distinguish it from a genuine random sequence. But it's also intended that different instances seeded with different values ought to produce statistically independent sequences with very high probability. Your proposal would make it impossible to _have_ multiple effective instances seeded differently (unless in different threads).
 Yea, when I wrote it I figured on proposing an optional stream-like interface
 for std.random. The Hash_DRBG algorithm is inherently based around the idea
that
 each generated number can be arbitrary-length (not only that - generating ex.
 two 8-byte values with Hash_DRGB is fundamentally different from generating
 eight 2-byte values, and produces different results). And the OS-specific
 entropy generators are basically streams, too (on both Win and Posix, you
 request any number of bytes of randomness).

 So that was just a natural way to implement it. But it did strike me as being
 generally useful, so I'm glad that you agree.

 You're right though, the current state of Phobos streams is kind of a stumbling
 block. Naturally it'd be best to use the new stream interface, but since that
 isn't in place, that's an issue. I don't know what the state of the rewrite is.
 Been awhile since I've heard anything about it.

Yup, me neither. I will try and quiz Steven on it at some point.
 I figure, at the very least, we could just make the stream interfaces for RNGs
 private. Then we can open it up whenever we finally do have a new std.stream to
 conform to.

 Or we could do an OutputRange instead...but the recent discussion on those
makes
 me a bit hesitant to do so. And I definitely don't wanna perform the
contortions
 to invert it into a non-coroutine-like InputRange.

 Hmm, yea, I'm more and more liking the "keep it private for now, and then see
 what happens" approach...unfortunately... :/

Yes, that's probably for the best.
May 10 2014
prev sibling next sibling parent Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 11/05/14 05:58, Nick Sabalausky via Digitalmars-d wrote:
 The seed doesn't need to be compromised for synchronized RNGs to fubar
security.

Before we go onto the detail of discussion, thanks very much for the extensive explanation. I was slightly worried that my previous email might have come across as dismissive of your (completely understandable) concerns. I'm actually quite keen that we can find a mutually agreeable solution :-)
 It's bad enough just to accidentally generate the same byte sequence for
 multiple things:

 struct CryptoRng { ulong internalState; ...etc... }
 CryptoRng myRand;

 // Oops! Duplicates internal state!
 ubyte[] getRandomJunk(CryptoRng rand) {
      return rand.take(64).array();
 }

 // Oops! These are all identical! Passwords may as well be unsalted!
 auto saltForStevensPassword =  getRandomJunk(myRand);
 auto saltForSallysPassword =  getRandomJunk(myRand);
 auto saltForGregsPassword =  getRandomJunk(myRand);

 // Oops! Also identical!
 auto oneTimeURL1 =  "https://verify-email/a a.a/"~getRandomJunk(myRand);
 auto oneTimeURL2 =  "https://verify-email/b b.b/"~getRandomJunk(myRand);
 auto oneTimeURL3 =  "https://verify-email/c c.c/"~getRandomJunk(myRand);

 May as well have just defined getRandomJunk as a constant ;)

Sure, but this is a consequence of two things (i) CryptoRng is a value type and (ii) it gets passed by value, not by reference. In your example, obviously one can fix the problem by having the function declared instead as ubyte[] getRandomJunk(ref CryptoRng rand) { ... } but I suspect you'd say (and I would agree) that this is inadequate: it's relying on programmer virtue to ensure the correct behaviour, and sooner or later someone will forget that "ref". (It will also not handle other cases, such as an entity that needs to internally store the RNG, as we've discussed many times on the list with reference to e.g. RandomSample or RandomCover.) Obviously one _can_ solve the problem by the internal state variables of the RNG being static, but I'd suggest to you that RNG-as-reference-type (which doesn't necessarily need to mean "class") solves that particular problem without the constraints that static internal variables have.
 Maybe not *too* terrible for randomized alien invader attack patterns (unless
it
 makes the game boring and doesn't sell), but a fairly major failure for
security
 purposes.

 There's another issue, too:

 First a little background: With Hash_DRBG (and I would assume other crypto RNGs
 as well), the RNG is periodically reseeded, both automatically and on-demand.
 This reseeding is *not* like normal non-crypto reseeding: With non-crypto
 reseeding, you're fully resetting the internal state to a certain specific
value
 and thus resetting the existing entropy, replacing it with the fresh new
 entropy. But with Hash_DRBG, reseeding only *adds* additional fresh entropy -
 nothing gets reset. Any existing entropy still remains. So reseeding two
 different Hash_DRBGs with the *same* reseed data *still* results in two
 completely different internal states. This is deliberately defined by the NIST
 spec.

 There are reasons for that, but one noteworthy side-consequence is this: If all
 your Hash_DRBG's share the same internal state, then anytime one is reseeded
 with fresh entropy, they *all* benefit from the fresh entropy, for free.

 And as a (maybe smaller) additional benefit, if different parts of your program
 are using an RNG independently of each other, but the RNGs internally share the
 same state, then *each* component contributes additional (semi-)non-determinism
 to the pool. That part's true of other RNGs too, but the effect is compounded
 further by Hash_DRBG which updates the internal state differently depending on
 how much data is requested at each, umm...at each request.

This is a very interesting point. My understanding is that in many (most?) programs, each individual thread does typically operate using a single RNG instance anyway. The question for me is whether that's a design choice of the developer or something that is an unavoidable outcome of the RNG design. For example, it's readily possible to just instantiate a single RNG (crypto or otherwise, as required) at the start of the program and pass that around: if it's a reference type (as I would argue it should be) then the effects you desire will still take place. Alternatively, one can also implement a thread-global RNG instance which functions will use. This is already done with rndGen, which is effectively a singleton pattern. There's no reason why one couldn't also define a cryptoGen which works in a similar way, just using an instance of a cryto RNG where rndGen uses a Mersenne Twister: https://github.com/D-Programming-Language/phobos/blob/d2f2d34d52f89dc9854069b13f52523965a7107e/std/random.d#L1161-L1174 https://github.com/WebDrake/std.random2/blob/a071d8d182eb28516198bb292a0b45f86f4425fe/std/random2/generator.d#L60-L81 The benefit of doing it this way, as opposed to static internal variables, is that you aren't then constrained to have only one single, effectively thread-global instance of _every_ RNG you create.
 However...the mentions of Minecraft did raise one point that I'll concede: If
 you have a multiplayer environment (or more generally, a large chunk of shared
 multi-computer data) that's randomly-generated, then a very careful usage of a
 shared seed and RNG algorithm could drastically reduce the network bandwidth
 requirements.

You may also recall earlier games like Elite which were able to "store" whole universes on a single 5 1/4" floppy by virtue of auto-generating that universe from a repeatable random sequence. It was amazing what could be done on the old BBC Micro :-)
 I do agree strongly that there's a major difference between "relying on
 deterministic sequence from an RNG" and "principle of design".

 However, I also maintain that the "principle of design" case demands you should
 *NOT* get the same results from RNGs unless you *specifically intend* to be
 "relying on deterministic sequence from an RNG". After all, they *are* supposed
 to be "random number generators". That's their primary function.

 I do believe strongly in the same principles of encapsulation that you do. But
 the thing is, the usual rules of proper encapsulated design are deliberately
and
 fundamentally predicated on the assumption that you *want* reliably predictable
 deterministic behavior. The vast majority of the time, that's a correct
 assumption - but the whole *point* of an RNG is to *not* be deterministic. Or
at
 least to get as close to non-determinism as reasonably possible within a
 fundamentally deterministic system.

 Since the basic goals of RNGs are in fundamental disagreement with the
 underlying assumptions of usual "well encapsulated" design, that calls into
 question the applicability of usual encapsulation rules to RNGs.

 At the risk of being presumptuous, I think you do agree on some level. Isn't
 that the whole reason you've proposed that RNGs change from structs to classes?
 Because you don't want identical sequences from RNGs unless *specifically*
 intended, right? The only parts of that I've intended to question are:

I completely agree that _unintended_ identical RNG sequences are a very bad thing that must be avoided. I think that we're in disagreement only about how best to achieve that in a sane/safe/uncomplicated way. With luck we will find something that works for both of us :-) What I feel is that static internal variables go too far in the opposite direction: they make it impossible to have even _intentionally_ identical RNG sequences. There's no way, for example, to .save or .dup an RNG instance with this design. However, the effect of a common and reliable source of randomness, used by all parts of the program, can be implemented instead by a thread-global singleton instance of an RNG as described above. This seems to me to be a better solution as it gives you what you achieve using static variables, but without preventing you from instantiating as many other independent RNG instances as you like.
 A. Does that really necessitate moving to classes?

No, the important distinction is value vs. reference types, not structs v. classes. One could implement the RNGs as struct-based reference types. I wrote a class-based implementation because I wanted to see how that came out to various struct-based approaches monarch_dodra and I had tried. I do think that it offers much value in terms of simplicity and elegance of design, but there may be other costs that make it untenable. I am not sure the correct choice will be apparent until the current work on memory allocation settles down.
 B. Are there any legitimate use-cases for "specifically relying on
deterministic
 sequence from an RNG"?

Well, when I was writing large-scale scientific simulations, it was certainly useful to be able to re-run them with the same random sequence. Given the number of random variates I was generating in the process, it was not really practical to store the sequence on memory or disk. The same is true of many circumstances where one is writing a program that relies on randomness, but wants to test it reliably. However, fundamentally this comes down to a design point. Pseudo-random RNGs are _always_ deterministic in their operation once seeded. The difference we're really talking about here is whether the source of randomness used by different parts of a program is thread-global or not. And so, the question becomes rather, "Is it legitimate to enforce a thread-global source of randomness in all circumstances?" I'd say the answer to that is a definite no. Bear in mind that one nasty consequence of using static internal variables to force thread-global behaviour is that it means that the presence or absence of unittests might affect the actual running program. Consider: struct SomeRNG { private static InternalState _internals; ... } unittest { SomeRNG rng; rng.seed(KnownSeed); // verify that SomeRNG produces expected sequence of variates } Now when you start your _actual_ program, the internal state of SomeRNG will already have been set by the unittests. Obviously if you are doing things properly you will scrub this by re-seeding the RNG at the start of your program proper, but the risk is clearly still there.
 For B, the above example of "reducing bandwidth/storage requirements for a
large
 set of shared random-generated data" is making me question my former stance of
 "No, there's no legitimate use-case". If B falls (and I suppose it maybe it
 does), then A does too, and classes would indeed be necessary.

Yes, I think that is one very good example of a use-case, which is probably one shared between a bunch of different applications in practice (scientific simulation, games, probably other stuff too). As I said before, I don't think it mandates _classes_, but it does mandate reference-type RNGs that are properly encapsulated.
 But that's all for non-crypto RNGs. For crypto-RNGs, determinism is *never*
 anything more than a concession forced by the deterministic nature of the
 machines themselves. FWIW, this does mean  crypto-RNGs can go either struct or
 class, since the internal states really should be tied together anyway (in
order
 to prevent their *outputs* from being tied together).

What do you think about the global-singleton-instance pattern as a way of achieving what you want? I think I will stop my reply here for now, because I think that your response to this question (and the explanations leading up to it) probably determines how I will respond to your later remarks. What are your thoughts? Best wishes, -- Joe
May 11 2014
prev sibling next sibling parent Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 12/05/14 20:17, Nick Sabalausky via Digitalmars-d wrote:
 On 5/11/2014 8:16 AM, Joseph Rushton Wakeling via Digitalmars-d wrote:> On
 11/05/14 05:58, Nick Sabalausky via Digitalmars-d wrote:
  >> The seed doesn't need to be compromised for synchronized RNGs to fubar
  >> security.
  >
  > Before we go onto the detail of discussion, thanks very much for the
  > extensive explanation.  I was slightly worried that my previous email
  > might have come across as dismissive of your (completely understandable)
  > concerns.

 Oh, not at all. I've been finding the discussion rather interesting. :)

Me too -- sorry for the late response, things have been busy :-)
  > Obviously one _can_ solve the problem by the internal state variables of
  > the RNG being static, but I'd suggest to you that RNG-as-reference-type
  > (which doesn't necessarily need to mean "class")

 Yea, doesn't necessarily mean class, but if it is made a reference type then
 class is likely the best option. For example, I'd typically regard struct* in a
 D API as a code smell.

Well, I wasn't going to suggest struct* as the type. There have been various proposals here for a RefTypeOf template that stores an internal pointer to a struct instance and exposes its public interface via alias this. Unfortunately this approach is probably problematic because of this issue: https://issues.dlang.org/show_bug.cgi?id=10996 One could also just have an internal pointer to the RNG's private state variable(s). monarch_dodra and I have both prototyped some designs for this, but I do agree that class is largely preferable, because it avoids the need for the developer to take responsibility for ensuring the reference type semantics.
 So I'm fine going the class route (or otherwise reference-based) and making
 internal state per-instance. Or even having a "duplicate this RNG with
identical
 state" function, if people want it.

 I think we're pretty well agreed on non-crypto RNGs. Your stance is convincing
 here.

Excellent. :-)
 For crypto-RNGs:

   [ ... snip ... ]

 I think my preference would still be to keep the internal state static here
 though (again, just speaking for crypto-RNGs only). As I've argued, the
 determinism is a non-feature for crypto-RNGs (they deliberately fight it every
 step of the way), and the shared state carries a couple entropy-related
benefits
 (Reseeding one, ie accumulating additional fresh entropy, benefits all others.
 And all RNG activity within in the thread acts as additional entropy). So I
only
 see upsides, not downsides.

Thanks for the excellent and detailed explanation here. Your case is also pretty convincing. A few remarks, though. First, one concern I still have with static internals is essentially the same as the issue I have with the reference-types-made-of-structs: it's relying on the programmer to "do the right thing", and you know that someone is going to forget to mark as static a variable that needs it. With any luck that will be an easily-spotted and fixed issue, but using a class avoids the need. So, I'd still feel more comfortable with the idea of crypto-RNGs being classes and not structs -- you can still have the static internals to deal with your desire for uniqueness, of course. Second, I think your idea about separating the deterministic part of the algorithm from the source of entropy, and allowing arbitrary sources (or combinations of sources) to be plugged in, is an interesting one and worth pursuing.
 The only potential issue I see: Are there going to be people who are looking
for
 determinism in an RNG and intentionally want to use something like Hash_DRBG
for
 that. Maybe for the sake of it's usage of SHA. They would have to either limit
 the number of values they generate (to avoid the algorithm's internal
reseeding)
 or else provide their own entropy source (not too difficult though, since I've
 recently parametrized HashDRBG on that). But I suppose it could be done.

To be honest, I wouldn't worry about anyone wanting to use a crypto RNG algorithm deterministically. Wait for someone to request the feature. :-)
  > The benefit of doing it this way, as opposed to static internal
  > variables, is that you aren't then constrained to have only one single,
  > effectively thread-global instance of _every_ RNG you create.

 Non-crypto RNGs: Yea, that's good, I like it.

 Crypto RNGs: Hmm, good question. My concern is that it's something the user has
 to specifically choose to use. It's not what automatically happens when you ask
 for an instance of a specific RNG.

Yea, this is a good point. I think you've convinced me that the natural state of a crypto RNG is that its state should essentially be unique -- not per-instance. One remark: if you can separate out your algorithm into a deterministic algorithm templated on sources of entropy, then note that each instantiation will be unique _relative to the sources of entropy_ but that one could create multiple independent instances relying on _different_ sources of entropy.
 Fair enough, at least for non-crypto RNGs. For crypto RNGs, I'm thinking now
 that static state can be avoided as long as the design still does a sufficient
 job of steering actual crypto-purpose users away from multiple separate
instances.

Having got this far through the discussion, I feel that I'm happy with the idea of static state for crypto RNGs, but equally I'll be happy with alternatives. I probably do have a bit of a personal inclination to avoid static if at all possible, but in this case I think you've made a very reasonable argument for it. (Some of the arguments I cited earlier, like the effect on function purity, etc., don't apply here because crypto RNGs' .popFront() is of necessity going to be non-pure.)
  > I wrote a class-based implementation because I wanted to see how that
  > came out to various struct-based approaches monarch_dodra and I had
  > tried.  I do think that it offers much value in terms of simplicity and
  > elegance of design, but there may be other costs that make it
  > untenable.
  >

 Have you found any such costs yet, or anything in particular that suggests
there
 may be some? Intuitively, I wouldn't think the minor amount of (by default) GC
 heap usage would matter (just as one particular aspect of classes).

Well, the main concern would be if using classes made it impossible (or frustratingly difficult) to use the RNG package's full functionality in non-GC-using code. I don't think there are any issues like speed hits.
 Again, for non-crypto RNGs, I'm totally with you now.

 For crypto-RNGs, I think it's unclear. The question hinges on:

 A. "Can we permit deterministic usages without making crypto users more likely
 to have multiple independent instances within a thread? (regardless of whether
 crypto users do it intentionally or unintentionally)"

Honestly, if an RNG is designed as a crypto RNG, I don't think you need to worry about supporting deterministic usage. The main concern of my arguments was related to non-crypto pseudo-RNG use-cases.
 That question, of course, needs to be balanced with:

 B. "Is 'non-crypo deterministic' an inappropriate misuse of crypto-RNGs (when
we
 already have RNGs designed for non-crypo deterministic purposes anyway?)"

Probably. :-)
 and C. "How much should we even care about B?"

 I'm concerned that the global-singleton-instance pattern may be insufficient
for
 "A".

I don't think that matters. We don't need to support deterministic usages for crypto RNGs.
 What I think I'd like to see is this: For *all* RNGs (crypto and non-crypto), a
 reference-based design where obtaining a common thread-global instance is
always
 the implicit default behavior, and individual or duplicate instances can always
 be obtained *explicitly* in a way that cannot be misread/misinterpreted. That
 would succeed at "A" and render "B" and "C" non-issues. Plus, I think it would
 be generally appropriate, even for deterministic non-crypto RNGs, anyway.

Well, I guess that what I feel is: the general class-based approach of std.random2 handles most of this. Where crypto RNGs are concerned I'm fine with the idea of the internal state being static if you feel that will maximize effective use of the entropy supplied. If we combine that with the idea of templating the deterministic parts of crypto RNGs on their sources of entropy, then it should be clear that there _can_ be multiple independent instances of a crypto RNG if that's desired, but the user needs to provide different sources of entropy to each in order to make that happen. Then, provide a sensible "default" version of each crypto RNG type, with explicitly specified entropy sources, so that the typical version that will be instantiated by users will (i) be suitable for crypto and (ii) assuming it does have static internals, will be unique per thread.
 Also, I don't want to forget the issue of stream interfaces. What do you think
 about including them, but just with a big red "Subject to change pending a new
 std.stream" banner in the docs? I think that's a perfectly pragmatic "best of
 both worlds" compromise. Think it would be well/poorly-received?

If the DConf discussion related to an experimental part of the standard library are anything to go by, I think we will have plenty of opportunity to implement functionality that is subject to change, so I don't think we need fear doing that.
May 22 2014
prev sibling next sibling parent Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
Hi Nick,

Short reply because I think things are converging pretty well decision-wise :-)

 because crypto RNGs' .popFront() is of necessity going to be non-pure.)

To make sure I understand (it seems my understanding of D's pure isn't quite as strong as I'd thought): It cannot be pure *because* of the static internal state, right?

The main issue is because your popFront will be relying on external sources of entropy which I presume to be non-deterministic (otherwise, what entropy are they adding?). That's what makes purity impossible. As we've discussed, you could avoid having the static variables, but as the function is going to be impure anyway, they shouldn't be avoided over the purity issue.
 If the DConf discussion related to an experimental part of the standard
 library are anything to go by, I think we will have plenty of
 opportunity to implement functionality that is subject to change, so I
 don't think we need fear doing that.

I think I may have missed that particular discussion (I've only been catching the livestreams of certain talks). Recap?

Andrei's opening keynote seemed pretty clear that we need a space in the standard library where people can experiment freely. He called it std.experimental but I suspect the name will be something different :-)
 If truly "no need to fear adding functionality that's subject to change", then
 that also answers another question I was about to raise: Submit a PR for this
 algorithm now, or just simply incorporate it as part of your new std.random yet
 to go through the approval process? May as well go with "now".

 This is one last matter: Should my initial PR for Hash_DRBG be struct-based or
 class-based? While we both agree that class-based is a better approach overall,
 the current std.random is still struct-based and your class-based version
hasn't
 gone through the approvals process yet (I assume it needs to, since it's a
 fairly big reworking of a whole phobos module).

Well, what I'm inclined to do is to submit my Phobos package as an experimental module (probably call it exp.random or something like that). I can and probably will do that in the very near future. I think that you should similarly feel unrestricted, but maybe give it a day or two post-Dconf to see what the actual plan is about the experimental repo. By the way, what are your thoughts on having a std.random.crypto module versus a general std.crypto module that includes crypto RNGs and other crypto functionality? I'd been considering a std.random.crypto but I'm starting to incline towards the view that std.random and crypto should be kept deliberately separate. Best wishes, -- Joe
May 23 2014
prev sibling next sibling parent Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 23/05/14 21:38, Nick Sabalausky via Digitalmars-d wrote:
 Oh, right. For some reason I was thinking "front", not "popFront". My bad.

Actually it's probably irrelevant in your case anyway, isn't it, because you'll be outputting raw binary data to a provided output range ... ?
 Well, I don't think crypto RNGs should be put together with non-RNG crypto
stuff
 like SHA, etc. (Unless maybe it's all under a "std.crypto.*" package...but then
 you still have weird questions like "Does MD5 go in std.digest.md or
 std.crypto.digest.md? What about SHA-1?" So still maybe kinda hairy.)

 As for whether crypto RNGs should be together with non-crypto RNGs, I dunno. I
 can't really say I'd have much objection either way. I suppose there is general
 preference in D at this point to break up phobos modules wherever reasonable,
so
 I guess something like this might generally be preferred:

 std/random/package.d
 std/random/deterministic.d
 std/random/crypto.d

OK, so let's run with the idea that at some point crypto RNGs will be a submodule of std.random.
May 23 2014
prev sibling parent Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 24/05/14 19:46, Nick Sabalausky via Digitalmars-d wrote:
 An initial PR for Hash_DRBG being struct-based and directly part of
 "std.random"

I think that's up to you. I don't want to hold you back here, but equally, I feel that crypto functionality probably should be prototyped in an experimental module before being finalized in the standard library. It's something that's too important to get right (and properly reviewed).
May 24 2014