
digitalmars.D - Mir Random [WIP]

reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
https://github.com/libmir/mir-random

# mir-random
Dlang Random Number Generators

#### Comparison with Phobos
##### Generators
  - `opCall` API instead of range interface is used (similar to C++)
  - Default and copy constructors are disabled for generators.
  - 64-bit Mt19937 initialization is fixed
  - 64-bit Mt19937 is the default for 64-bit targets
  - `unpredictableSeed` has no state and returns `ulong`
  - Does not depend on DRuntime (Better C concept)
  - [WIP] additional Xorshift generators
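
For illustration, a minimal sketch of what an `opCall`-style engine looks like in use. The engine shown is SplitMix64 (a real, well-known PRNG), but the struct name and details are illustrative, not the actual mir-random API:

```d
import std.random : unpredictableSeed;

// Illustrative opCall-style engine: each call advances the state and
// returns the next raw 64-bit variate, C++-<random>-style.
struct SplitMix64
{
    ulong state;
    @disable this();         // no default construction: must be seeded
    @disable this(this);     // no copies: unique ownership of the state
    this(ulong seed) { state = seed; }

    ulong opCall()
    {
        ulong z = (state += 0x9E3779B97F4A7C15UL);
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9UL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBUL;
        return z ^ (z >> 31);
    }
}

void main()
{
    auto gen = SplitMix64(unpredictableSeed);
    ulong x = gen();   // one call, one variate; no front/popFront pair
}
```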

##### Integer uniform generators
[WIP]

##### Real uniform generators
[WIP]

##### Nonuniform generators
[WIP]

Please contact me if you are interested in contributing!
https://gitter.im/libmir/public
Nov 21 2016
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar to C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
Nov 22 2016
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu 
wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar to 
 C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
Yes, I think this is avoiding the existing problems with RNGs and ranges rather than solving them. I don't blame anyone for _wanting_ to avoid them; they are nasty, subtle issues that seem to keep getting more complex the more one looks at them (for example, after my DConf talk last year, I realized that there was a whole set of other potential complications related to how ranges typically treat laziness). But I think they can be solved, and should be.

OTOH, there's no reason per se why there should not be an `opCall` for random number generators along the lines of,

```d
UIntType opCall()
{
    this.popFront();
    return this.front;
}
```

... just to provide options to the user. (BTW, note the order there, which touches on the issues related to what lazy evaluation means not just for RNGs but for any non-deterministic IO.)
Nov 22 2016
prev sibling next sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu 
wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar to 
 C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
I'm pretty sure everyone *wants* it to be a range, but it turns out it's a difficult design space. Lots of nasty trade-offs that can be (and have been) argued back and forth depending on your priorities. Perhaps you have an insight that has been missed that could make a good RNG range without causing less-than-optimal performance or statistically unsafe default behaviour? IIRC you think people are making too much fuss about the statistically safe default behaviour, but it would be interesting to hear a more expanded version of that view.
Nov 22 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/22/16 7:30 PM, John Colvin wrote:
 On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar to C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
I'm pretty sure everyone *wants* it to be a range, but it turns out it's a difficult design space. Lot's of nasty trade-offs that can be (and have been) argued back and forth depending on your priorities.
The proposed design has disabled copy ctor and uses opCall() for a new element. That seems to be a difference without a distinction from an input range that has disabled copy ctor and offers the input range primitives.
 Perhaps you have an insight that has been missed that can make a good
 rng range without causing less than optimal performance or statistically
 unsafe default behaviour?
We should add a reference RNG that transforms any other RNG into an input range that shares the actual RNG.
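
For concreteness, such a reference wrapper might be sketched like this (the `RngRef` name and design are illustrative, not an existing Phobos or mir-random type):

```d
import std.random : Mt19937, unpredictableSeed;

// Hypothetical adaptor: wraps a pointer to an RNG so that copies of the
// range all advance the one shared engine instead of duplicating state.
struct RngRef(G)
{
    private G* gen;
    this(ref G g) { gen = &g; }

    enum bool empty = false;              // infinite range
    auto front() @property { return gen.front; }
    void popFront() { gen.popFront(); }
}

void main()
{
    auto engine = Mt19937(unpredictableSeed);
    auto a = RngRef!Mt19937(engine);
    auto b = a;            // copies share the same underlying engine
    a.popFront();          // advances the state seen by both a and b
}
```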
 IIRC you think people are making too much fuss about the statistically
 safe default behaviour, but it would be interesting to hear a more
 expanded version of that view.
I'm unclear on what that statistically unsafe default behavior is - my understanding is it has to do with RNGs being inadvertently copied. It would be great to formalize that in a well-explained issue. Andrei
Nov 22 2016
next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei 
Alexandrescu wrote:
 I'm unclear on what that statistically unsafe default behavior 
 is - my understanding is it has to do with RNGs being 
 inadvertently copied. It would be great to formalize that in a 
 well-explained issue.
I'll see if I can write that up in depth some time soon. TBH though I think the problem is less about RNGs and more about stuff like RandomSample and RandomCover (and, in future, random distributions that have internal state, like a struct implementing a normal distribution using the Ziggurat algorithm internally). It's not so difficult to stop RNG state being copied inadvertently, but when you have ranges wrapping ranges wrapping ranges, each containing their own extra state that cannot be copied by value, things get a bit more complicated.
Nov 23 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 05:47 AM, Joseph Rushton Wakeling wrote:
 On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei Alexandrescu wrote:
 I'm unclear on what that statistically unsafe default behavior is - my
 understanding is it has to do with RNGs being inadvertently copied. It
 would be great to formalize that in a well-explained issue.
I'll see if I can write that up in depth some time soon. TBH though I think the problem is less about RNGs and more about stuff like RandomSample and RandomCover (and, in future, random distributions that have internal state, like a struct implementing a normal distribution using the Ziggurat algorithm internally). It's not so difficult to stop RNG state being copied inadvertently, but when you have ranges wrapping ranges wrapping ranges, each containing their own extra state that cannot be copied by value, things get a bit more complicated.
Well we need to get to the bottom of this if we're to make progress. Otherwise it's copypasta with little changes followed by disappointment all the way down. -- Andrei
Nov 23 2016
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 13:44:00 UTC, Andrei 
Alexandrescu wrote:
 Well we need to get to the bottom of this if we're to make 
 progress. Otherwise it's copypasta with little changes followed 
 by disappointment all the way down. -- Andrei
Would you be up for a direct discussion of this some time -- maybe next week once I've had time to re-review the fine details of my concerns? I can provide a summary of issues as a starting point for that discussion, if it would be useful. I suggest this because (i) I'm interested in your insight and (ii) even if we end up disagreeing on concerns and solutions, it would be good to make sure that we're on the same page in terms of understanding each other's point of view.
Nov 23 2016
prev sibling next sibling parent Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei 
Alexandrescu wrote:
 We should add a reference RNG that transforms any other RNG 
 into an input range that shares the actual RNG.
Didn't we have a generic Ref!T wrapper in std.typecons that would work this way? Can't find it now.
Nov 23 2016
prev sibling parent reply Martin Nowak <code dawg.eu> writes:
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei 
Alexandrescu wrote:
 On 11/22/16 7:30 PM, John Colvin wrote:
 On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei 
 Alexandrescu wrote:
The proposed design has disabled copy ctor and uses opCall() for a new element. That seems to be a difference without a distinction from an input range that has disabled copy ctor and offers the input range primitives.
Yes, the problems are inadvertent copies, and a disabled this(this) would prevent that. RNGs should have unique ownership of their internal state. Using InputRanges with Phobos is somewhat clumsy. Maybe people have been burned by those and now generally consider InputRanges broken. Maybe non-copyability needs to become a requirement for InputRanges.
 We should add a reference RNG that transforms any other RNG 
 into an input range that shares the actual RNG.
We already have refRange for that, no? Also refCounted(Random(unpredictableSeed)) should work.
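
A small usage sketch of the existing `std.range.refRange` route:

```d
import std.random : Mt19937, unpredictableSeed;
import std.range : refRange, take;
import std.array : array;

void main()
{
    auto engine = Mt19937(unpredictableSeed);
    // refRange stores a pointer to engine, so consuming the wrapped
    // range advances the original engine, not a hidden copy of it.
    auto rng = refRange(&engine);
    auto five = rng.take(5).array;   // pulls 5 variates from engine
    // engine itself has now advanced by five steps.
}
```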
Nov 25 2016
next sibling parent reply Martin Nowak <code dawg.eu> writes:
On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:
 Maybe non-copyability needs to become a requirement for 
 InputRanges.
Could have an optional .clone if copying is supported. What would be an InputRange where copying is correct?
 We already have refRange for that, no?
 Also refCounted(Random(unpredictableSeed)) should work.
The same scheme works for files/streams with only plain int fds: non-copyable/unique ownership, plus move, refRange, or refCounted for passing and sharing. Should we split off this discussion to a dlang-study thread?
Nov 25 2016
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Saturday, 26 November 2016 at 06:55:24 UTC, Martin Nowak wrote:
 Should we split off this discussion to a dlang-study thread?
I would personally really welcome that, but subject to the understanding that people agree to look seriously at random algorithms (like RandomSample) and not focus the discussion solely on RNGs.
Nov 26 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/26/2016 01:55 AM, Martin Nowak wrote:
 On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:
 Maybe non-copyability needs to become a requirement for InputRanges.
Could have an optional .clone if copying is supported. What would be an InputRange where copying is correct?
Input ranges with disabled this(this) would be a major breaking change that we probably cannot bear (and shouldn't because the gain is small). I think an input range would be at best "pure reference" in the sense that popFront advances all copies to the same position. -- Andrei
Nov 26 2016
parent Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Saturday, November 26, 2016 12:24:47 Andrei Alexandrescu via Digitalmars-
d wrote:
 On 11/26/2016 01:55 AM, Martin Nowak wrote:
 On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:
 Maybe non-copyability needs to become a requirement for InputRanges.
Could have an optional .clone if copying is supported. What would be an InputRange where copying is correct?
Input ranges with disabled this(this) would be a major breaking change that we probably cannot bear (and shouldn't because the gain is small). I think an input range would be at best "pure reference" in the sense that popFront advances all copies to the same position. -- Andrei
For a while now, I've been tempted to argue that we should require that well-behaved input ranges be reference types (or at least have reference semantics). It just seems like too much misbehaves if they're not, and arguably, the whole reason that they're input ranges and not forward ranges (aside from the programmer just not bothering to implement save) is that they effectively have reference semantics whether they're actually coded that way or not. We obviously can't test for that behavior at compile time, but it could easily be on the list of things that are documented as a requirement for well-behaved ranges. I'm also tempted to argue that well-behaved forward ranges and greater should have value semantics (in the sense that dynamic arrays do), but that's a much bigger change (it effectively makes save unnecessary), and proper use of save works around that problem (though it's very common that save isn't used as often as it should be). It would clean up a lot of the mistakes that occur with forward ranges though. However, regardless, a non-copyable range would not play nicely at all with how range-based code works right now. You'd have to use ref all over the place, which range-based code simply doesn't do and would not play well with function chaining, since you'd need lvalues, and since you couldn't copy the range to the stack to make it an lvalue, that seems like it would go nowhere fast. - Jonathan M Davis
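The chaining problem is easy to sketch (illustrative `Counter` range; `map` takes its range by value, so a non-copyable range cannot enter a pipeline):

```d
import std.algorithm : map;

struct Counter
{
    int i;
    @disable this(this);                   // forbid copies
    enum bool empty = false;
    int front() const @property { return i; }
    void popFront() { ++i; }
}

void main()
{
    Counter c;
    // The line below does not compile: map receives its argument by
    // value, which would require copying the non-copyable range.
    //auto doubled = c.map!(a => a * 2);
    c.popFront();                          // direct use still works
}
```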
Nov 27 2016
prev sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:
 Yes the problems are inadvertent copies and a disabled 
 this(this) would prevent that. RNGs should have unique 
 ownership of their internal state.
 Using InputRanges with phobos is somewhat clumsy. Maybe people 
 have been burned by those and now generally consider 
 InputRanges as broken.
 Maybe non-copyability needs to become a requirement for 
 InputRanges.
I remember you made that suggestion of disabled this(this) to me a while back, and I did use it for this small collection of RNGs: https://github.com/WebDrake/dxorshift
 We should add a reference RNG that transforms any other RNG 
 into an input range that shares the actual RNG.
We already have refRange for that, no? Also refCounted(Random(unpredictableSeed)) should work.
RefRange still has the issue that it only forwards the range primitives and not other properties such as the `isUniformRandom` enum. But those are probably fixable. However, these approaches basically cover the case of something where an instance is initialized high up in the program and passed around by ref throughout its lifetime. That doesn't address the question of how random _algorithms_ like `RandomSample` could be updated for safety (since they might often need to be initialized multiple times in the inner loops of a program). See: https://forum.dlang.org/post/gnlvyogolkmocujtnxjj forum.dlang.org https://forum.dlang.org/post/gpsilefdillwtleuwohl forum.dlang.org for some discussion of the problem.
Nov 26 2016
prev sibling next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu 
wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar to 
 C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
It is a safe low-level architecture without performance and API issues. It prevents users from implicitly doing stupid things (like copying RNGs). A high-level range interface can be added in the future (it will hold a _pointer_ to an RNG). In addition, when you need to write algorithms or distributions, opCall is much more convenient than the range API. Also, in 99% of cases users would not use the Engine API at all: they will just want to call `rand` or `uniform`, or another distribution. I am sure that almost any library should first have a low-level API that fits its implementation. Additional API levels may also be added. Current Phobos evolution is generic degradation: more generic and "universal" code hides more uncovered bugs. The std.range is a good example of degradation; it has a lot of API and implementation bugs.

### Example of API+implementation bug:

#### Bug: RNGs have min and max params (hello C++). But they are not used when a uniform integer number is generated: `uniform!ulong` / `uniform!ulong(0, 100)`.

#### Solution: In Mir Random any RNG must generate all 8/16/32/64 bits uniformly. It is the RNG's problem how to do it.

I will not file this bug, nor another dozen std.random bugs, because the module should be rewritten anyway and I am working on it. std.random is a collection of bugs from C/C++ libraries extended with D generic idioms. For example, there is no reason for a 64-bit Xorshift: it is 32-bit by design. Furthermore, a 64-bit expansion of a 32-bit algorithm must be proved theoretically before we allow it for end users. 64-bit analogs exist, but they have different implementations.

Phobos degrades because we add a lot of generic specializations and small utilities without understanding use cases. Phobos really follows a stupid idealistic idea: more generic is better, more API is better, more universal algorithms are better. The problem is that Phobos/DRuntime is a soup where everything (because of its "universality") interacts with everything else.
Nov 22 2016
next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko 
wrote:
 On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei 
 Alexandrescu wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar 
 to C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
It is a safe low-level architecture without performance and API issues. It prevents users from implicitly doing stupid things (like copying RNGs). A high-level range interface can be added in the future (it will hold a _pointer_ to an RNG). In addition, when you need to write algorithms or distributions, opCall is much more convenient than the range API. Also, in 99% of cases users would not use the Engine API at all: they will just want to call `rand` or `uniform`, or another distribution. I am sure that almost any library should first have a low-level API that fits its implementation. Additional API levels may also be added. Current Phobos evolution is generic degradation: more generic and "universal" code hides more uncovered bugs. The std.range is a good example of degradation; it has a lot of API and implementation bugs.
EDIT: std.range -> std.random
Nov 22 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 01:16 AM, Ilya Yaroshenko wrote:
 The std.range is a good example of degradation; it has a lot of API and
 implementation bugs.
EDIT: std.range -> std.random
Oh, that narrows it down. Thx. -- Andrei
Nov 23 2016
prev sibling next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko 
wrote:
 ### Example of API+implementation bug:

  #### Bug: RNGs have min and max params (hello C++). But they 
  are not used when a uniform integer number is generated: 
  `uniform!ulong` / `uniform!ulong(0, 100)`.

  #### Solution: In Mir Random any RNG must generate all 
  8/16/32/64 bits uniformly. It is the RNG's problem how to do it.
Alternative solution: functionality that expects the full unsigned integer word (UIntType) to be filled with random bits, should validate that the min/max values of the generator correspond to UIntType.min and UIntType.max. That introduces less breaking change, creates less divergence with the C++11 standard, and preserves more flexibility for the future.
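
Such a validation can be sketched as a one-line compile-time check (illustrative helper, not an existing Phobos or mir-random facility):

```d
import std.random : Mt19937, MinstdRand0;

// Accepts only generators whose raw output covers every bit pattern of
// their word type, i.e. whose min/max coincide with the type's bounds.
enum isFullWordGenerator(G) = G.min == typeof(G.min).min
                           && G.max == typeof(G.max).max;

// Mt19937 emits full 32-bit words...
static assert( isFullWordGenerator!Mt19937);
// ...while a linear-congruential generator like MinstdRand0 does not
// (its range is [1, 2147483646], not the full uint range).
static assert(!isFullWordGenerator!MinstdRand0);

void main() {}
```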
Nov 23 2016
next sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 10:27:00 UTC, Joseph Rushton 
Wakeling wrote:
 On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko 
 wrote:
 ### Example of API+implementation bug:

  #### Bug: RNGs have min and max params (hello C++). But they 
  are not used when a uniform integer number is generated: 
  `uniform!ulong` / `uniform!ulong(0, 100)`.

  #### Solution: In Mir Random any RNG must generate all 
  8/16/32/64 bits uniformly. It is the RNG's problem how to do it.
Alternative solution: functionality that expects the full unsigned integer word (UIntType) to be filled with random bits, should validate that the min/max values of the generator correspond to UIntType.min and UIntType.max. That introduces less breaking change, creates less divergence with the C++11 standard, and preserves more flexibility for the future.
Good point, will add this. --Ilya
Nov 23 2016
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 05:27 AM, Joseph Rushton Wakeling wrote:
 Alternative solution: functionality that expects the full unsigned
 integer word (UIntType) to be filled with random bits, should validate
 that the min/max values of the generator correspond to UIntType.min and
 UIntType.max.

 That introduces less breaking change, creates less divergence with the
 C++11 standard, and preserves more flexibility for the future.
Good idea - static assert is your friend. -- Andrei
Nov 23 2016
prev sibling next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko 
wrote:
 It is safe low level architecture without performance and API 
 issues. It prevents users to do stupid things implicitly (like 
 copying RNGs). A hight level range interface can be added in 
 the future (it will hold a _pointer_ to an RNG).
Note that if you want to do this, it's convenient to still preserve a separation between popping the RNG state versus getting the current variate. Otherwise, the range interface will wind up having to separately cache the variate value, which is wasteful. Something like this:

```d
struct MyRNG
{
    void popFront() { /* update internal state */ }

    UIntType front() @property { return /* whatever part of internal state */; }

    UIntType opCall()
    {
        this.popFront();
        return this.front;
    }
}
```

(The above is basically just the input range API less the `empty` property, and the `front` and `popFront()` are arguably a lower-level API than `opCall`.)
 In additional, when you need to write algorithms or 
 distributions opCall is much more convenient than range API.
Can you give some examples of what you mean here?
  In additions, users would not use Engine API in 99% cases: 
 they will just want to call `rand` or `uniform`, or other 
 distribution.
I don't think that's necessarily true, but in any case, we shouldn't restrict the use-cases of the 1% unless we have to.
Nov 23 2016
next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 10:57:04 UTC, Joseph Rushton 
Wakeling wrote:
 On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko 
 wrote:
 It is safe low level architecture without performance and API 
 issues. It prevents users to do stupid things implicitly (like 
 copying RNGs). A hight level range interface can be added in 
 the future (it will hold a _pointer_ to an RNG).
Note that if you want to do this, it's convenient to still preserve a separation between popping the RNG state versus getting the current variate. Otherwise, the range interface will wind up having to separately cache the variate value, which is wasteful. Something like this: struct MyRNG { void popFront() { /* update internal state */ } UIntType front() property { return /* whatever part of internal state */; } UIntType opCall() { this.popFront(); return this.front; } } (The above is basically just the input range API less the `empty` property, and the `front` and `popFront()` are arguably a lower-level API than `opCall`.)
 In additional, when you need to write algorithms or 
 distributions opCall is much more convenient than range API.
Can you give some examples of what you mean here?
For example, Mir Random basic utilities (more low-level than distributions): https://github.com/libmir/mir-random/blob/master/source/random/package.d Also you can explore the std.random code. opCall would almost always be more convenient to use.
  In additions, users would not use Engine API in 99% cases: 
 they will just want to call `rand` or `uniform`, or other 
 distribution.
I don't think that's necessarily true, but in any case, we shouldn't restrict the use-cases of the 1% unless we have to.
We don't restrict anything. It takes a minute to write a Range wrapper. This wrapper can be added to the library if we find a real-world use case. Again, I have not seen a real-world algorithm which looks better with the Range API for generators. RandomSample can be created without the Range API, and it would look more convenient than it is now. I think the Range API is bad and useless overengineering for RNGs.
Nov 23 2016
next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 11:14:38 UTC, Ilya Yaroshenko 
wrote:
 For example, Mir Random basic utilities (more low level then 
 distributions):
 https://github.com/libmir/mir-random/blob/master/source/random/package.d

 Also you can explore std.random code. opCall would almost 
 always more convenient to use.
Yes, but as described, `opCall` can easily be created as a composition of `popFront` and `front` for these convenience purposes.
 We don't restrict. It is 1 minute to write an Range wrapper.
If all you have is `opCall` then the range wrapper is less efficient than it need be.
 This wrapper can be added to the library, if we found a real 
 world use case. Again, I have not seen a real world algorithm, 
 which looks better with Range API for generators. RandomSample 
 can be created without Range API, and it would look more 
 convenient then it is now. I think that Range API is bad and 
 useless overengineering for RNGs.
Yes, most uses of RNGs in std.random involve calling `front` and then `popFront()` (although it would probably be better the other way round). But it's readily possible to imagine range-based use-cases for random distributions along the lines of, myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 0).somethingElse.take(20); But what I'd say more broadly is that of what I've seen so far, mir.random is conflating too many breaking changes without giving thought for their impact (for example, converting the `isUniformRNG` check to rely on a UDA is IMO problematic; I can file a GitHub issue explaining the reasons for this). Allowing for the wider goals of the exercise, it could be worth giving some thought to which of these breakages is really needed to support your use-cases, and which can be avoided.
Nov 23 2016
next sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 11:44:44 UTC, Joseph Rushton 
Wakeling wrote:
 On Wednesday, 23 November 2016 at 11:14:38 UTC, Ilya Yaroshenko 
 wrote:
 For example, Mir Random basic utilities (more low level then 
 distributions):
 https://github.com/libmir/mir-random/blob/master/source/random/package.d

 Also you can explore std.random code. opCall would almost 
 always more convenient to use.
Yes, but as described, `opCall` can easily be created as a composition of `popFront` and `front` for these convenience purposes.
 We don't restrict. It is 1 minute to write an Range wrapper.
If all you have is `opCall` then the range wrapper is less efficient than it need be.
Inlining will solve the data-duplication problem (if any) for most real-world cases.
 This wrapper can be added to the library, if we found a real 
 world use case. Again, I have not seen a real world algorithm, 
 which looks better with Range API for generators. RandomSample 
 can be created without Range API, and it would look more 
 convenient then it is now. I think that Range API is bad and 
 useless overengineering for RNGs.
Yes, most uses of RNGs in std.random involve calling `front` and then `popFront()` (although it would probably be better the other way round). But it's readily possible to imagine range-based use-cases for random distributions along the lines of, myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 0).somethingElse.take(20); But what I'd say more broadly is that of what I've seen so far, mir.random is conflating too many breaking changes without giving thought for their impact (for example, converting the `isUniformRNG` check to rely on a UDA is IMO problematic; I can file a GitHub issue explaining the reasons for this). Allowing for the wider goals of the exercise, it could be worth giving some thought to which of these breakages is really needed to support your use-cases, and which can be avoided.
Yes, please file a GitHub issue.
Nov 23 2016
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 11:44:44 UTC, Joseph Rushton 
Wakeling wrote:
 Yes, most uses of RNGs in std.random involve calling `front` 
 and then `popFront()` (although it would probably be better the 
 other way round).  But it's readily possible to imagine 
 range-based use-cases for random distributions along the lines 
 of,

     myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 
 0).somethingElse.take(20);

 But what I'd say more broadly is that of what I've seen so far, 
 mir.random is conflating too many breaking changes without 
 giving thought for their impact (for example, converting the 
 `isUniformRNG` check to rely on a UDA is IMO problematic; I can 
 file a GitHub issue explaining the reasons for this).  Allowing 
 for the wider goals of the exercise, it could be worth giving 
 some thought to which of these breakages is really needed to 
 support your use-cases, and which can be avoided.
Added RandomRangeAdaptor for URBGs: https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
Nov 23 2016
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko 
wrote:
 Added RandomRangeAdaptor for URBGs:
 https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
This has exactly the problem I identified above, though: you're unnecessarily caching the latest variate rather than just using the RNG state directly. Not the biggest deal in the world, but avoidable if you allow a separation between updating the RNG state and accessing it.
Nov 23 2016
next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 13:26:41 UTC, Joseph Rushton 
Wakeling wrote:
 On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko 
 wrote:
 Added RandomRangeAdaptor for URBGs:
 https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
This has exactly the problem I identified above, though: you're unnecessarily cacheing the latest variate rather than just using the RNG state directly. Not the biggest deal in the world, but avoidable if you allow a separation between updating RNG state and accessing it.
1. The current default RNG (Mt19937) has no state for the latest value. 2. The structure is allocated on the stack, and compilers can recognise loop patterns and eliminate additional memory movements in many cases.
Nov 23 2016
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 13:35:33 UTC, Ilya Yaroshenko 
wrote:
 1. The current default RNG (Mt19937) has no state for the latest 
 value.
That's a detail of your implementation, though, and it's not true for lots of other pseudo-RNGs.
 2. The structure is allocated on stack and compilers can 
 recognise loop patterns and eliminate addition memory movements 
 for many cases.
Fair enough. I'm still not sure, though, why you would object to providing public methods for pseudo-RNGs to (i) update the internal state and (ii) access the most recently generated variate, which the `opCall` would then wrap.
Nov 23 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 08:26 AM, Joseph Rushton Wakeling wrote:
 On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko wrote:
 Added RandomRangeAdaptor for URBGs:
 https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
This has exactly the problem I identified above, though: you're unnecessarily cacheing the latest variate rather than just using the RNG state directly. Not the biggest deal in the world, but avoidable if you allow a separation between updating RNG state and accessing it.
Well I do see another small problem with https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d#L19. It creates the first value eagerly upon construction. That's just a bit awkward. It's not the first time I'm seeing this issue, and to be honest am a bit bummed about it. It's a consequence of all ranges needing to buffer at least one element. -- Andrei
Nov 23 2016
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 13:56:27 UTC, Andrei 
Alexandrescu wrote:
 Well I do see another small problem with 
 https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d#L19.
 It creates the first value eagerly upon construction. That's just a bit 
 awkward. It's not the first time I'm seeing this issue, and to be honest am 
 a bit bummed about it. It's a consequence of all ranges needing to buffer 
 at least one element. -- Andrei
Yea, this touches on one of my more general concerns related to ranges, randomness and laziness. Part of the trouble is we don't have a really good general design for how to implement _truly_ lazy ranges, where the value of `front` is not determined until the point where you first access it. This has particular relevance to e.g. RandomSample and RandomCover.
Nov 23 2016
parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wednesday, November 23, 2016 14:02:36 Joseph Rushton Wakeling via 
Digitalmars-d wrote:
 On Wednesday, 23 November 2016 at 13:56:27 UTC, Andrei

 Alexandrescu wrote:
 Well I do see another small problem with
 https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d#L19.
 It creates the first value eagerly upon construction. That's
 just a bit awkward. It's not the first time I'm seeing this issue, and
 to be honest am a bit bummed about it. It's a consequence of all ranges
 needing to buffer at least one element. -- Andrei
Yea, this touches on one of my more general concerns related to ranges, randomness and laziness. Part of the trouble is we don't have a really good general design for how to implement _truly_ lazy ranges, where the value of `front` is not determined until the point where you first access it. This has particular relevance to e.g. RandomSample and RandomCover.
It's quite feasible if you don't calculate front until it's called or popFront is called, and then you cache the result so that subsequent calls to front prior to the call to popFront return the same result, but it creates additional overhead, because then every call to front and popFront has to check whether the value has been calculated yet. And if every range in the chain of ranges has to do that, then that additional overhead is going to add up. In general, it's just cleaner to either calculate front on every call to front (which doesn't make sense in the case of random number generators) or to calculate the first front upon construction and be done with it. And I think that the general consensus has been that calculating front in the constructor and popFront and caching works better than calculating it on every call to front, but that doesn't always work (e.g. map), and that caching does cause issues occasionally. - Jonathan M Davis
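For concreteness, the check-on-every-call overhead can be sketched like this (a hypothetical wrapper; `next()` is an assumed method on the wrapped generator, not a Phobos API):

```d
// Hypothetical sketch of a "truly lazy" range over a generator: front is not
// computed until first accessed, at the cost of a per-call flag check.
// `Gen` is assumed to expose a `next()` method returning the next variate.
struct LazyVariates(Gen)
{
    private Gen gen;
    private ulong cached;
    private bool primed;      // has the current front been generated yet?

    enum bool empty = false;  // infinite range

    ulong front() @property
    {
        if (!primed)          // the check every call must pay for
        {
            cached = gen.next();
            primed = true;
        }
        return cached;
    }

    void popFront()
    {
        if (!primed)
            gen.next();       // element was never observed; still consume it
        primed = false;       // force regeneration on next front access
    }
}

// Dummy generator for the sketch.
struct CountingGen
{
    ulong n;
    ulong next() { return ++n; }
}

void main()
{
    auto r = LazyVariates!CountingGen(CountingGen(0));
    auto x = r.front;   // first variate generated here, not at construction
    r.popFront();
}
```

Every range in a chain built this way pays the `primed` branch on each access, which is exactly the cumulative cost described above.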
Nov 23 2016
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 21:19:51 UTC, Jonathan M Davis 
wrote:
 It's quite feasible if you don't calculate front until it's 
 called or popFront is called, and then you cache the result so 
 that subsequent calls to front prior to the call to popFront 
 return the same result, but it creates additional overhead, 
 because then every call to front and popFront has to check 
 whether the value has been calculated yet. And if every range 
 in the chain of ranges has to do that, then that additional 
 overhead is going to add up.
Yes, indeed. I actually came up with a general purpose solution for that a while back via template mixin (although the version in the post linked to is not the final version: I updated it to use HaraldZealot's suggestion of making the frontImplementation and popFrontImplementation the template parameters): https://forum.dlang.org/post/mailman.263.1439926667.13986.digitalmars-d puremagic.com
 In general, it's just cleaner to either calculate front on 
 every call to front (which doesn't make sense in the case of 
 random number generators) or to calculate the first front upon 
 construction and be done with it.
In some ways I think it's cleaner to not privilege the first front value and go for "true" laziness for all front values -- particularly when dealing with stuff like RandomSample. The downside is that `.front` becomes non-const. Sometimes I wonder whether it wouldn't have been better to require that `popFront()` be called before even the first call to `.front`, and place the onus on `popFront()` to handle any special-casing of the first `.front` value.
 And I think that the general consensus has been that 
 calculating front in the constructor and popFront and caching 
 works better than calculating it on every call to front, but 
 that doesn't always work (e.g. map), and that caching does 
 cause issues occasionally.
The case I always think of is: what if your input range is designed to correspond to sensor observations made at a particular time? This is a case where cacheing the first value on construction is very problematic. For RNGs it's really not such a big deal so long as successive variates are independent, but for random algorithms I think the laziness matters, since their popFront is essentially IO-dependent (in the Haskell-style sense that randomness is IO).
Nov 25 2016
parent Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Friday, November 25, 2016 08:32:07 Joseph Rushton Wakeling via 
Digitalmars-d wrote:
 On Wednesday, 23 November 2016 at 21:19:51 UTC, Jonathan M Davis

 wrote:
 It's quite feasible if you don't calculate front until it's
 called or popFront is called, and then you cache the result so
 that subsequent calls to front prior to the call to popFront
 return the same result, but it creates additional overhead,
 because then every call to front and popFront has to check
 whether the value has been calculated yet. And if every range
 in the chain of ranges has to do that, then that additional
 overhead is going to add up.
Yes, indeed. I actually came up with a general purpose solution for that a while back via template mixin (although the version in the post linked to is not the final version: I updated it to use HaraldZealot's suggestion of making the frontImplementation and popFrontImplementation the template parameters): https://forum.dlang.org/post/mailman.263.1439926667.13986.digitalmars-d puremagic.com
 In general, it's just cleaner to either calculate front on
 every call to front (which doesn't make sense in the case of
 random number generators) or to calculate the first front upon
 construction and be done with it.
In some ways I think it's cleaner to not privilege the first front value and go for "true" laziness for all front values -- particularly when dealing with stuff like RandomSample.
It's much messier from an implementation standpoint. If the constructor checks for empty and caches front if the range it was given was non-empty, then every call to front can just return front, and every call to popFront can just pop the element off and do whatever it does to turn the front from the wrapped range into the front of this range and then cache it. If the constructor doesn't do that work, then every call to front and every call to popFront have to check whether they're the first time that either has been called and _then_ they end up doing all of the same work anyway. So, there's more code, and it's less efficient. And if every range in the chain does that, then you're getting a lot of extra checks just to determine whether it's the first time that front or popFront is being called. Now, if there's never any caching of the front value, and front always calculates it, then that might actually be more efficient if front is only ever called once per element, because you don't get the cost of copying the element to cache it, but if front gets called multiple times (which happens fairly often, I think), then the cost is definitely greater. You're never going to be able to rely on a fully lazy front anyway unless we specified it as part of the range API that all ranges work that way, and as it stands, very few ranges do.
 The downside is that `.front` becomes non-const.
Well, realistically, as it stands, ranges are utterly useless with const anyway. We'd have to have a standard mechanism for getting a tail-const copy of a range from a const or immutable range, and we don't.
 Sometimes I
 wonder whether it wouldn't have been better to require that
 `popFront()` be called before even the first call to `.front`,
 and place the onus on `popFront()` to handle any special-casing
 of the first `.front` value.
That would be terrible with chaining ranges. It also would be _very_ error-prone. It's also heading into two-phase initialization territory, which is almost always a bad idea.
 And I think that the general consensus has been that
 calculating front in the constructor and popFront and caching
 works better than calculating it on every call to front, but
 that doesn't always work (e.g. map), and that caching does
 cause issues occasionally.
The case I always think of is: what if your input range is designed to correspond to sensor observations made at a particular time? This is a case where cacheing the first value on construction is very problematic. For RNGs it's really not such a big deal so long as successive variates are independent, but for random algorithms I think the laziness matters, since their popFront is essentially IO-dependent (in the Haskell-style sense that randomness is IO).
Honestly, I don't think that it really works to have a range where the value of its elements depends on when you call front or popFront. The API lets you do it, but you're going to run into plenty of problems if you do, unless you don't care about the frequency at which the values are generated. Part of the problem here is that ranges and the algorithms that use them are really designed with arrays in mind. A non-random-access range reduces those capabilities, and a basic input range doesn't actually require that the elements be reproducible aside from multiple calls to front giving the same result, but it's still the case that it's pretty much assumed that a range is a fixed set of values, and pretty much nothing is concerned with how fast front or popFront is called aside from efficiency concerns. So, if someone has a range that calculates a value when front or popFront is called, and that value depends on when the function is called, I think that they're just going to have to deal with the consequences of that. It's already pretty problematic when you have a range that's a basic input range rather than a forward range. Another thing to consider here is that those sorts of concerns really only apply to basic input ranges. As soon as you have a forward range, the values of the elements need to be completely predictable - or at least completely reproducible - meaning that aside from caching a potentially infinite number of elements to guarantee that save works, nothing that's doing something like grabbing changing values from a sensor is going to be anything but a basic input range. So, maybe we should do something special with regards to basic input ranges and how they're treated in order to better deal with cases like that, but if we went that route, then we pretty much couldn't treat them like forward ranges without save anymore. In many cases, we'd have to have separate implementations that avoided things like caching, and that would become problematic fast. 
I think that there are always going to be cases where certain things have some funky corner cases with ranges - especially the further that they get from being dynamic arrays. We probably need to do a better job of formalizing some of this stuff to better avoid some of those corner cases, but I think that we're always going to be able to find something that we can define as a range that works in many cases but starts doing weird things if we use it on a large enough range of algorithms. - Jonathan M Davis
Nov 25 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 06:14 AM, Ilya Yaroshenko wrote:
 I think that Range API is bad and useless overengineering for RNGs.
So it either "takes 1 minute" to change the interface from opCall to ranges (i.e. they're virtually the same thing), or it's the above. There's precious little overengineering that can be done in 1 minute :o). I see you did that in a dozen lines in RandomRangeAdaptor. I understand you believe the range interface is unnecessary or overkill for random number generators. I may even agree for certain cases. However, there are a few arguments you may want to consider: * By virtue of working with D, everybody knows what a range is. Presenting the RNGs that way instantly evokes a host of knowledge, understanding, and idioms. * D users (or at least a fraction) and several algorithms understand the notion of an infinite range and make salient decisions based on that. A range interface automatically taps into that. Andrei
Nov 23 2016
parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wednesday, November 23, 2016 08:54:30 Andrei Alexandrescu via 
Digitalmars-d wrote:
 On 11/23/2016 06:14 AM, Ilya Yaroshenko wrote:
 I think that Range API is bad and useless overengineering for RNGs.
So it either "takes 1 minute" to change the interface from opCall to ranges (i.e. they're virtually the same thing), or it's the above. There's precious little overengineering that can be done in 1 minute :o). I see you did that in a dozen lines in RandomRangeAdaptor. I understand you believe the range interface is unnecessary or overkill for random number generators. I may even agree for certain cases. However, there are a few arguments you may want to consider: * By virtue of working with D, everybody know what a range is. Presenting the RNGs that way instantly evokes a host of knowledge, understanding, and idioms. * D users (or at least a fraction) and several algorithms understand the notion of an infinite range and make salient decisions based on that. A range interface automatically taps into that.
I've frequently found that in cases where I'm not operating on a range of values, I really just want a way to get a random number, and having to call both popFront and front to get it is a bit annoying (it's actually one of the few places that I've ever used the comma operator). That being said, I think that the correct solution to that is to add a function to std.random that takes a range, calls popFront on it, and then returns front so that it can work with any of the random number generators and avoids problems with whether or not popFront was called prior to calling front. I have to agree that creating a different API that uses opCall or whatever instead of the range API is a bad idea, particularly when a simple helper function would make it possible to use a random number generator in a fashion more like rand() for the cases where that's preferable, and for a lot of cases, having the range API is _extremely_ useful. - Jonathan M Davis
Nov 23 2016
parent Somebody <ajieskola gmail.com> writes:
 I have to agree that creating a different API that uses opCall 
 or whatever instead of the range API is a bad idea, 
 particularly when a simple helper function would make it 
 possible to use a random number generator in a fashion more 
 like rand() for the cases where that's preferable, and for a 
 lot of cases, having the range API is _extremely_ useful.

 - Jonathan M Davis
Would, assuming the opCall() interface, generate!(() => rng()) make an input range out of it easily? Or am I missing something?
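It does appear to compose that way: std.range.generate turns a nullary callable into an infinite input range. A sketch, with a made-up opCall generator (an LCG step for illustration, not mir-random's actual API):

```d
import std.range : generate, take;

// Hypothetical opCall-style generator (a simple LCG step, for illustration only).
struct OpCallRng
{
    ulong state;
    ulong opCall()
    {
        state = state * 6364136223846793005UL + 1442695040888963407UL;
        return state;
    }
}

void main()
{
    auto rng = OpCallRng(42);
    // generate wraps the nullary callable as an infinite input range,
    // so it composes with the usual range algorithms:
    auto variates = generate!(() => rng())();
    auto three = variates.take(3);
}
```

Note that the lambda captures `rng` by reference to the enclosing frame, so the generator state advances as the range is consumed.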
Nov 23 2016
prev sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 10:57:04 UTC, Joseph Rushton 
Wakeling wrote:
 struct MyRNG
 {
     void popFront() { /* update internal state */ }

     UIntType front() @property { return /* whatever part of 
  internal state */; }

     UIntType opCall()
     {
         this.popFront();
         return this.front;
     }
 }
xorshift128+ returns a temporary value computed during state update, so it can't efficiently separate front from popFront. Also, this API is prone to reusing some values and discarding others; call popFront after front.
Nov 23 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 09:12 AM, Kagamin wrote:
 xorshift128+ returns a temporary value computed during state update, so
 it can't efficiently separate front from popFront.
That seems to be a minor matter. Of course at best some measurements would be in order.
 Also this API is prone
 to reuse some values and discard others, call popFront after front.
This claim would apply to all ranges. There seems to be evidence it is unfounded. The main argument for using the range interface for RNGs is reuse of abstraction. Minute implementation matters are much less important. The main counter-argument is that the abstraction is not fitting well and another abstraction (such as opCall) is more advantageous. Andrei
Nov 23 2016
next sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei 
Alexandrescu wrote:
 This claim would apply to all ranges. There seems to be 
 evidence it is unfounded.

 The main argument for using the range interface for RNGs is 
 reuse of abstraction. Minute implementation matters are much 
 less important. The main counter-argument is that the 
 abstraction is not fitting well and another abstraction (such 
 as opCall) is more advantageous.
Consider this okayish looking code: consume(rng()); consume(rng.take(2)); //reuses previous value consume(rng()); //discards unused value
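Made concrete with a hypothetical hybrid type (names invented for illustration) that offers both interfaces, the reuse is easy to trigger:

```d
// Hypothetical generator exposing both an opCall and a range interface.
struct HybridRng
{
    ulong state = 1;
    ulong front;               // cached current variate
    enum bool empty = false;

    void popFront()
    {
        state = state * 2862933555777941757UL + 3037000493UL;  // LCG step
        front = state;
    }

    ulong opCall()             // popFront-then-front convention
    {
        popFront();
        return front;
    }
}

void main()
{
    HybridRng rng;
    auto a = rng();     // opCall advances and returns variate #1
    auto b = rng.front; // range-style read reuses variate #1
    assert(a == b);     // same value consumed by two call sites
    rng.popFront();     // a pop here discards variate #2 unobserved
    auto c = rng();     // opCall then skips ahead to variate #3
}
```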
Nov 23 2016
next sibling parent Ryan <clumsycodemonkey gmail.com> writes:
On Wednesday, 23 November 2016 at 15:29:14 UTC, Kagamin wrote:
 On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei

 Consider this okayish looking code:
 consume(rng());
 consume(rng.take(2)); //reuses previous value
 consume(rng()); //discards unused value
Also consider the case of using 1 generator in your program, but calling it from different places. In one place, you use opCall with the popFront -> front order of calls, and in the other you use the range interface directly with the order reversed. You would re-use values there too. By offering both interfaces together it makes it pretty easy to create silly bugs like this, especially in a large project. Might be useful to have a basic RNG interface and then wrap it with a template that provides either, but not both, of the other interfaces at a time.
Nov 23 2016
prev sibling parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wednesday, November 23, 2016 15:29:14 Kagamin via Digitalmars-d wrote:
 On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei

 Alexandrescu wrote:
 This claim would apply to all ranges. There seems to be
 evidence it is unfounded.

 The main argument for using the range interface for RNGs is
 reuse of abstraction. Minute implementation matters are much
 less important. The main counter-argument is that the
 abstraction is not fitting well and another abstraction (such
 as opCall) is more advantageous.
Consider this okayish looking code: consume(rng()); consume(rng.take(2)); //reuses previous value consume(rng()); //discards unused value
In the cases where I don't really need a range of random numbers, I've typically ended up doing stuff like

auto nextNum = rndGen().popFront(), rndGen().front;

though I think that using the comma operator like that is deprecated now. Adding a helper function such as

auto getNext(R)(ref R range)
    if(isInputRange!R)
{
    range.popFront();
    return range.front;
}

would solve that problem. And there are plenty of cases where having a range of random numbers is extremely useful. After having dealt with std.random, it would really suck to have to deal with a random number generator that was not range-based. std.random has problems such as not using reference types for its ranges, but using the range API in the way it does is extremely useful. - Jonathan M Davis
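Fleshed out, such a helper composes with the existing Phobos generators (a sketch; getNext is the proposed name here, not an existing std.random function):

```d
import std.random : Mt19937, unpredictableSeed;
import std.range.primitives : isInputRange;

// Advance the range first, then read: avoids using a possibly stale front.
auto getNext(R)(ref R range)
    if (isInputRange!R)
{
    range.popFront();
    return range.front;
}

void main()
{
    auto gen = Mt19937(unpredictableSeed);
    uint a = getNext(gen);  // one fresh variate
    uint b = getNext(gen);  // another independent draw
}
```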
Nov 23 2016
parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 21:33:53 UTC, Jonathan M Davis 
wrote:
 though I think that using the comma operator like that is 
 deprecated now. Adding a helper function such as

 auto getNext(R)(ref R range)
     if(isInputRange!R)
 {
     range.popFront();
     return range.front;
 }

 would solve that problem.
It's the same behavior and suffers from the same problem of reuse of RNG output.
Nov 24 2016
parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thursday, November 24, 2016 09:05:34 Kagamin via Digitalmars-d wrote:
 On Wednesday, 23 November 2016 at 21:33:53 UTC, Jonathan M Davis

 wrote:
 though I think that using the comma operator like that is
 deprecated now. Adding a helper function such as

 auto getNext(R)(ref R range)
     if(isInputRange!R)
 {
     range.popFront();
     return range.front;
 }

 would solve that problem.
It's the same behavior and suffers from the same problem of reuse of RNG output.
How so? Because someone might call range.front again without bothering to call popFront? If you're paranoid about that, then it can call popFront again before returning (though that would be wasteful). My take on it is that if you just call popFront before using the random number generator range, then you don't have to worry about what any other code that used it did. Regardless, the range API is _way_ more useful in general than something like rand(). And if anyone is trying to avoid ranges with random numbers, I think that they're just making their life harder. Occasionally, it's useful to get just one random number, in which case, having to deal with the range API over rand() is kind of annoying, but in general, it's not a problem, and the wrapper function that I suggested basically gives you rand() from a range of random numbers. Alternatively, you could just do rndGen().take(1).front, and as long as rndGen() gives you a reference type, it works just fine. Unfortunately, std.random did not use reference types for its ranges. _That_ is the big mistake of std.random and the main item that needs to be fixed. There are some other subtleties (e.g. it's useful to be able to save the state of a random number generating range, but you don't necessarily really want it to be a forward range), but those are minor in comparison to the mistake of std.random using value types rather than reference types for ranges. - Jonathan M Davis
Nov 24 2016
next sibling parent reply Kagamin <spam here.lot> writes:
On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis 
wrote:
 How so? Because someone might call range.front again without 
 bothering to call popFront?
That's what everything in std.algorithm does.
Nov 24 2016
parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thursday, November 24, 2016 13:54:50 Kagamin via Digitalmars-d wrote:
 On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis

 wrote:
 How so? Because someone might call range.front again without
 bothering to call popFront?
That's what everything in std.algorithm does.
Then call popFront or drop before passing it along if you're paranoid about it. If it's a serious concern that this be fixed in general, then the obvious fix to that would be to make it so that rndGen() just called popFront before returning it. Heck, if rndGen() were guaranteed to call popFront when it was called rather than simply returning the range, then you could just do rndGen().front whenever you wanted the equivalent of rand(), making it trivial to use rndGen() both for cases where you wanted a single number and for cases where you wanted a range of them - and without worrying about front being stale. - Jonathan M Davis
Nov 24 2016
parent Kagamin <spam here.lot> writes:
On Thursday, 24 November 2016 at 14:22:05 UTC, Jonathan M Davis 
wrote:
 Then call popFront or drop before passing it along if you're 
 paranoid about it.
There's little need for it; as was pointed out earlier, the generate algorithm does the right thing and needs only opCall, which is also a nice example of orthogonality and composability.
Nov 25 2016
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis 
wrote:
 Alternatively, you could just do rndGen().take(1).front, and as 
 long as rndGen() gives you a reference type, it works just 
 fine. Unfortunately, std.random did not use reference types for 
 its ranges. _That_ is the big mistake of std.random and the 
 main item that needs to be fixed. There are some other 
 subtleties (e.g. it's useful to be able to save the state of a 
 random number generating range, but you don't necessarily 
 really want it to be a forward range), but those are minor in 
 comparison to the mistake of std.random using value types 
 rather than reference types for ranges.

 - Jonathan M Davis
The fix is opCall syntax. Reference types are classes, which would not work without linking DRuntime and Phobos (BetterC). Refcounting is too slow for users to implement. Note that engines are only the backend RNGs; there are also nonuniform distributions (WIP). See the example of user code: https://forum.dlang.org/post/nyvtoejvmsaolzaqyche forum.dlang.org . It is very difficult to implement both user random variables and engines using the range API. Note that the code should work without DRuntime and should be simple (the example is user code). Ilya
Nov 24 2016
parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Thursday, 24 November 2016 at 16:09:23 UTC, Ilya Yaroshenko 
wrote:
 On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis 
 wrote:
 Alternatively, you could just do rndGen().take(1).front, and 
 as long as rndGen() gives you a reference type, it works just 
 fine. Unfortunately, std.random did not use reference types 
 for its ranges. _That_ is the big mistake of std.random and 
 the main item that needs to be fixed. There are some other 
 subtleties (e.g. it's useful to be able to save the state of a 
 random number generating range, but you don't necessarily 
 really want it to be a forward range), but those are minor in 
 comparison to the mistake of std.random using value types 
 rather than reference types for ranges.

 - Jonathan M Davis
The fix is opCall syntax. Reference types are classes, which would not work without linking DRuntime and Phobos (BetterC).
If druntime was initialised by default using __attribute__((constructor)) for static and linker .init for shared libraries, would that be good enough for you*? I feel like you're limiting your design choices because of a relatively small and simple implementation shortcoming. * remember, we don't currently guarantee ABI compatibility between compiler versions even if you don't use druntime/phobos.
Nov 24 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Thursday, 24 November 2016 at 17:04:43 UTC, John Colvin wrote:
 If druntime was initialised by default using 
 __attribute__((constructor)) for static and linker .init for 
 shared libraries, would that be good enough for you*? I feel 
 like you're limiting your design choices because of a 
 relatively small and simple implementation shortcoming.

 * remember, we don't currently guarantee ABI compatibility 
 between compiler versions even if you don't use druntime/phobos.
No, it should work without DRuntime. Nicholas wrote that it will work on GPU using dcompute :-) BetterC is the goal for the future Mir-Phobos, hehe
Nov 24 2016
prev sibling next sibling parent Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei 
Alexandrescu wrote:
 That seems to be a minor matter.
Yes, it's just that Joseph worries about micro-optimizations.
 The main counter-argument is that the abstraction is not 
 fitting well and another abstraction (such as opCall) is more 
 advantageous.
Ranges are designed for reuse of front, which is not desirable for RNGs.
Nov 23 2016
prev sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei 
Alexandrescu wrote:
 That seems to be a minor matter. Of course at best some 
 measurements would be in order.
 Yes, it's just that Joseph worries about micro-optimizations.
Though we saw that the compiler can't optimize some simple operations like division by 2.
Nov 23 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/16 11:10 AM, Kagamin wrote:
 On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei Alexandrescu wrote:
 That seems to be a minor matter. Of course at best some measurements
 would be in order.
 Yes, it's just that Joseph worries about micro-optimizations.
Though we saw that the compiler can't optimize some simple operations like division by 2.
[Details needed] I just took a look at https://godbolt.org/g/EMy6X4; it's happening. Andrei
Nov 23 2016
parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 21:18:27 UTC, Andrei 
Alexandrescu wrote:
 [Details needed]

 I just took a look https://godbolt.org/g/EMy6X4, it's happening.
That's three instructions instead of a single shr eax, 1.
Nov 24 2016
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 24.11.2016 09:55, Kagamin wrote:
 On Wednesday, 23 November 2016 at 21:18:27 UTC, Andrei Alexandrescu wrote:
 [Details needed]

 I just took a look https://godbolt.org/g/EMy6X4, it's happening.
That's three instructions instead of one shr eax,1
That would be wrong code. Cast to int after dividing.
Nov 24 2016
prev sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Thursday, 24 November 2016 at 08:55:18 UTC, Kagamin wrote:
 On Wednesday, 23 November 2016 at 21:18:27 UTC, Andrei 
 Alexandrescu wrote:
 [Details needed]

 I just took a look https://godbolt.org/g/EMy6X4, it's 
 happening.
That's three instructions instead of one shr eax,1
That's because of the cast(int); dividing by two is optimised just fine. You can even do this:

int foo(int a) { return a / 2; }
uint foo(uint a) { return a / 2; }
int bar(int a) { if (a > 0) return a / 2; else assert(0); }

and get

int example.foo(int):    // slow, to deal with sign
        mov eax, edi
        shr eax, 31
        add eax, edi
        sar eax
        ret
uint example.foo(uint):  // fast because no sign
        mov eax, edi
        shr eax
        ret
int example.bar(int):    // single shift because sign always 0
        test edi, edi
        jle .L8
        mov eax, edi
        sar eax
        ret
.L8:
        ud2

This should help explain why the extra/different instructions are necessary for signed:

int foo0(int a)
{
    asm { naked; mov EAX, EDI; shr EAX, 31; add EAX, EDI; sar EAX, 1; ret; }
}
int foo1(int a)
{
    asm { naked; mov EAX, EDI; sar EAX, 1; ret; }
}
int foo2(int a)
{
    asm { naked; mov EAX, EDI; shr EAX, 1; ret; }
}

void main()
{
    import std.stdio;
    foreach(i; [int.min, -1, 0, 1, int.max])
        writeln(foo0(i), ' ', foo1(i), ' ', foo2(i));
}

output:

-1073741824 -1073741824 1073741824
0 -1 2147483647
0 0 0
0 0 0
1073741823 1073741823 1073741823
Nov 24 2016
parent reply Kagamin <spam here.lot> writes:
On Thursday, 24 November 2016 at 09:46:16 UTC, John Colvin wrote:
 That's because of the cast(int), dividing by two is optimised 
 just fine.
What about Andrei's example?
Nov 24 2016
parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Thursday, 24 November 2016 at 10:14:27 UTC, Kagamin wrote:
 On Thursday, 24 November 2016 at 09:46:16 UTC, John Colvin 
 wrote:
 That's because of the cast(int), dividing by two is optimised 
 just fine.
What about Andrei's example?
I was talking about Andrei's example. He has a cast(int) in it before the division.
Nov 24 2016
parent reply Kagamin <spam here.lot> writes:
On Thursday, 24 November 2016 at 10:16:11 UTC, John Colvin wrote:
 I was talking about andrei's example. He has a cast(int) in it 
 before the division.
And you think that compilation of cast(int)a.length/2 to 3 instructions is optimized just fine? How is it better than one shift?
Nov 24 2016
parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Thursday, 24 November 2016 at 10:41:44 UTC, Kagamin wrote:
 On Thursday, 24 November 2016 at 10:16:11 UTC, John Colvin 
 wrote:
 I was talking about andrei's example. He has a cast(int) in it 
 before the division.
And you think that compilation of cast(int)a.length/2 to 3 instructions is optimized just fine? How is it better than one shift?
Because it's correct. If a.length is larger than int.max, then cast(int)a.length will be negative half the time, and therefore a simple right shift would not be equivalent to division by 2.
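A quick way to see why (a sketch with illustrative values): for negative operands an arithmetic right shift rounds toward negative infinity, while D's integer division truncates toward zero, so the compiler cannot substitute one for the other once the sign bit may be set.

```d
// Sketch: signed division by 2 vs. a bare right shift.
void main()
{
    import std.stdio;

    int a = -3;
    writeln(a / 2);   // -1: division truncates toward zero
    writeln(a >> 1);  // -2: arithmetic shift rounds toward -infinity

    // A length above int.max becomes negative after the narrowing cast,
    // so the sign-aware (3-instruction) sequence is required:
    size_t len = 0x8000_0001;      // > int.max on a 64-bit target
    writeln(cast(int) len / 2);    // negative result, as described above
}
```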
Nov 24 2016
parent reply Kagamin <spam here.lot> writes:
On Thursday, 24 November 2016 at 12:02:22 UTC, John Colvin wrote:
 Because it's correct. If a.length is larger than int.max then 
 cast(int)a.length will half the time be negative and therefore 
 a simple rightshift would not be equivalent to division by 2.
It can't possibly be correct when a.length is larger than int.max, because it doesn't recover information lost in a narrowing conversion.
Nov 24 2016
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 24.11.2016 14:50, Kagamin wrote:
 On Thursday, 24 November 2016 at 12:02:22 UTC, John Colvin wrote:
 Because it's correct. If a.length is larger than int.max then
 cast(int)a.length will half the time be negative and therefore a
 simple rightshift would not be equivalent to division by 2.
It can't possibly be correct when a.length is larger than int.max, because it doesn't recover information lost in a narrowing conversion.
The code does not specify that this information should be recovered. Why is this the compiler's problem?
Nov 24 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 12:58 AM, Ilya Yaroshenko wrote:
 On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar to C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
It is a safe low-level architecture without performance or API issues.
I don't understand this. Can you please be more specific? I don't see a major issue wrt offering opCall() vs. front/popFront. (empty is always true.)
 It
 prevents users from doing stupid things implicitly (like copying RNGs).
An input range can be made noncopyable.
 A
 high-level range interface can be added in the future (it will hold a
 _pointer_ to an RNG).
Is there a reason to not have that now?

Again, I'm literally talking about offering front/popFront in lieu of opCall(). The only implementation difference is you keep the currently-generated number as a member instead of returning it from opCall. I doubt one could measure a performance difference.

If you implement it as a noncopyable input range, you get a large support network working for you. With opCall, we have virtually no such support - you need to do everything once again.
 In addition, when you need to write algorithms
 or distributions opCall is much more convenient than range API.
Could you please be more specific? On the face of it I'd agree one call is less than two, but I don't see a major drawback here.
 In
 addition, users would not use the Engine API in 99% of cases: they will just
 want to call `rand` or `uniform`, or other distribution.

 I am sure that almost any library should have a low-level API that fits
 its implementation first. Additional API levels may also be added.
Is there a large difference between opCall and front/popFront? Actually I can think of one - the matter of getting things started.

Ranges have this awkwardness of starting the iteration: either you fill the current front eagerly in the constructor, or you have some sort of means to detect initialization has not yet been done and do it lazily upon the first use of front. The best strategy would depend on the actual generator, and admittedly would be a bit more of a headache compared to opCall. Was this the motivation?
 Current Phobos evolution is generic degradation: more generic and
 "universal" code hides more uncovered bugs. The std.range module is a
 good example of degradation: it has a lot of API and implementation bugs.
Do you have examples of issues outside random number generators?
 ### Example of API+implementation bug:

 #### Bug: RNGs have min and max params (hello C++). But they are not
 used when a uniform integer number is generated: `uniform!ulong` /
 `uniform!ulong(0, 100)`.

 #### Solution: In Mir Random, all RNGs must generate all 8/16/32/64 bits
 uniformly. It is the RNG's problem how to do it.
Min and max are not parameters, they are bounds provided by each generator. I agree their purpose is unclear.

We could require all generators to provide min = 0 and max = UIntType.max without breaking APIs. In that case we only need to renounce LinearCongruentialEngine with c = 0 (see https://github.com/dlang/phobos/blob/master/std/random.d#L258) - in fact that's the main reason for introducing min and max in the first place. All other code stays unchanged, and we can easily deprecate min and max for RNGs.

(I do see min and max used by uniform at https://github.com/dlang/phobos/blob/master/std/random.d#L1281 so I'm not sure I get what you mean, but anyhow the idea that we require RNGs to fill an uint/ulong with all random bits simplifies a lot of matters.)
 I will not file this bug, nor another dozen std.random bugs, because
 the module should be rewritten anyway and I am working on it. std.random
 is a collection of bugs from C/C++ libraries extended with D generic
 idioms. For example, there is no reason for a 64-bit Xorshift. It is 32-bit
 by design. Furthermore, a 64-bit expansion of a 32-bit algorithm must be
 proved theoretically before we allow it for end users. 64-bit analogs
 exist, but they have different implementations.
One matter that I see is there's precious little difference between mir.random and std.random. Much of the code seems copied, which is an inefficient way to go about things. We shouldn't fork everything if we don't like a bit of it, though admittedly the path toward making changes in std is more difficult.

Is your intent to work on mir.random on the side and then submit it as a wholesale replacement of std.random under a different name? In that case you'd have my support, but you'd need to convince me the replacement is necessary. You'd probably have a good case for eliminating xorshift/64, but then we may simply deprecate that outright. You'd possibly have a more difficult time with opCall.
 Phobos degrades because
 we add a lot of generic specializations and small utilities without
 understanding use cases.
This is really difficult to parse. Are you using "degrades" the way it's meant? What is a "generic specialization"? What are examples of "small utilities without understanding use cases"?
 Phobos really follows a stupid idealistic idea: more generic is better,
 more API is better, more universal algorithms are better. The problem is
 that Phobos/DRuntime is a soup where everything (because of its
 "universality") interacts with everything.
I do think more generic is better, of course within reason. It would be a tenuous statement that generic designs in Phobos such as ranges, algorithms, and allocators are stupid and idealistic. So I'd be quite interested in hearing more about this. What's that bouillabaisse about?

Thanks,

Andrei
Nov 23 2016
next sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 13:41:25 UTC, Andrei 
Alexandrescu wrote:
 An input range can be made noncopyable.

 If you implement it as a noncopyable input range, you get a 
 large support network working for you.
struct A
{
    bool empty;
    int front;
    void popFront(){}
    @disable this(this);
}

import std.algorithm;
void f()
{
    A r;
    auto m = r.map!(a=>1);
}

Does this compile for you?
Nov 23 2016
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 15:25:29 UTC, Kagamin wrote:
 struct A
 {
 	bool empty;
 	int front;
 	void popFront(){}
 	@disable this(this);
 }

 import std.algorithm;
 void f()
 {
 	A r;
 	auto m=r.map!(a=>1);
 }

 Does this compile for you?
auto m = (&r).map!(a=>1); ...? (Untested, but I think it should work.)
Nov 23 2016
next sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton 
Wakeling wrote:
 auto m = (&r).map!(a=>1);

 ...?  (Untested, but I think it should work.)
Or RefRange https://issues.dlang.org/show_bug.cgi?id=10888 :)
Nov 23 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 15:59:39 UTC, Kagamin wrote:
 On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton 
 Wakeling wrote:
 auto m = (&r).map!(a=>1);

 ...?  (Untested, but I think it should work.)
Or RefRange https://issues.dlang.org/show_bug.cgi?id=10888 :)
What is the principal difference from randomRangeAdaptor? Anyway, a noncopyable input range cannot be used with Phobos. --Ilya
Nov 23 2016
prev sibling parent reply Ryan <clumsycodemonkey gmail.com> writes:
On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton 
Wakeling wrote:
 auto m = (&r).map!(a=>1);

 ...?  (Untested, but I think it should work.)
Tested, works with 2.071.1
Nov 23 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 16:01:14 UTC, Ryan wrote:
 On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton 
 Wakeling wrote:
 auto m = (&r).map!(a=>1);

 ...?  (Untested, but I think it should work.)
Tested, works with 2.071.1
r.randomRangeAdaptor.map!(a=>1);
Nov 23 2016
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 13:41:25 UTC, Andrei 
Alexandrescu wrote:
 On 11/23/2016 12:58 AM, Ilya Yaroshenko wrote:
 On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei 
 Alexandrescu wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar 
 to C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
It is a safe low-level architecture without performance or API issues.
I don't understand this. Can you please be more specific? I don't see a major issue wrt offering opCall() vs. front/popFront. (empty is always true.)
A range must be copyable to be used with std.algorithm and std.range (it is passed by value).
 It
 prevents users from doing stupid things implicitly (like copying 
 RNGs).
An input range can be made noncopyable.
Ditto. A noncopyable input range is useless.
 A
 high-level range interface can be added in the future (it 
 will hold a
 _pointer_ to an RNG).
Is there a reason to not have that now?
Done. See `RandomRangeAdaptor`: https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
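For readers following along, the shape of such an adaptor might look like this (a minimal sketch with illustrative names -- `Engine`, `RandomRange`, `randomRange` -- not the actual mir-random code): the range is a thin, freely copyable wrapper over a pointer, so the engine itself is never copied.

```d
// Sketch: an infinite-range adaptor over a noncopyable opCall engine.
struct Engine
{
    @disable this(this);        // engines must not be copied
    private uint state;
    this(uint seed) { state = seed; }
    uint opCall()               // xorshift32 step, for illustration
    {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }
}

struct RandomRange(G)
{
    private G* gen;             // pointer: copying the range is cheap
    private uint _front;
    this(G* g) { gen = g; popFront(); }   // front filled eagerly
    enum bool empty = false;    // infinite range
    uint front() const { return _front; }
    void popFront() { _front = (*gen)(); }
}

auto randomRange(G)(ref G gen) { return RandomRange!G(&gen); }

void main()
{
    import std.stdio;
    import std.range : take;

    auto gen = Engine(1);
    foreach (x; gen.randomRange.take(3))  // works with Phobos ranges
        writeln(x);
}
```

The usual caveat from later in this thread applies: the pointer must not outlive the engine, so such an adaptor is only safe in function scope.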
 In addition, when you need to write algorithms
 or distributions opCall is much more convenient than range API.
Could you please be more specific? On the face of it I'd agree one call is less than two, but I don't see a major drawback here.
The main reason is implementation simplicity. Engines should be simple to create, simple to maintain, and simple to use. opCall is simpler than the range interface because:

1. One declaration instead of 4 (3 range functions plus the latest generated value (optional)).
2. An input range is useless if the range is not copyable.
3. `randomRangeAdaptor` is implemented for Engines and will be done for Distributions too. So the range API is supported better than in std.range (because Engines are copied).
 In
 addition, users would not use the Engine API in 99% of cases: they 
 will just
 want to call `rand` or `uniform`, or other distribution.

 I am sure that almost any library should have a low-level API that
 fits its implementation first. Additional API levels may also be added.
Is there a large difference between opCall and front/popFront? Actually I can think of one - the matter of getting things started. Ranges have this awkwardness of starting the iteration: either you fill the current front eagerly in the constructor, or you have some sort of means to detect initialization has not yet been done and do it lazily upon the first use of front. The best strategy would depend on the actual generator, and admittedly would be a bit more of a headache compared to opCall. Was this the motivation?
Simplicity is main motivation.
 ### Example of API+implementation bug:

 #### Bug: RNGs have min and max params (hello C++). But they are not
 used when a uniform integer number is generated: `uniform!ulong` /
 `uniform!ulong(0, 100)`.

 #### Solution: In Mir Random, all RNGs must generate all 8/16/32/64
 bits uniformly. It is the RNG's problem how to do it.
Min and max are not parameters, they are bounds provided by each generator. I agree their purpose is unclear. We could require all generators to provide min = 0 and max = UIntType.max without breaking APIs. In that case we only need to renounce LinearCongruentialEngine with c = 0 (see https://github.com/dlang/phobos/blob/master/std/random.d#L258) - in fact that's the main reason for introducing min and max in the first place. All other code stays unchanged, and we can easily deprecate min and max for RNGs. (I do see min and max used by uniform at https://github.com/dlang/phobos/blob/master/std/random.d#L1281 so I'm not sure I get what you mean, but anyhow the idea that we require RNGs to fill an uint/ulong with all random bits simplifies a lot of matters.)
The current Mir solution looks like a pair: isURBG and isSURBG. The `S` prefix means `T.max == ReturnType!T.max`, where T is an Engine. So, functions use isSURBG now. The min property is not required: we can just subtract the actual min from the returned value. An adaptor can be added to convert a URBG to a saturated URBG.
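The min-subtraction idea can be sketched like this (toy types and hypothetical names, not mir-random's actual traits):

```d
// Toy engine whose raw output range is [1, 6] (illustrative only).
struct DieGen
{
    enum uint min = 1, max = 6;
    uint state = 1;
    uint opCall() { state = state % 6 + 1; return state; }
}

// Normalize to [0, max - min] by subtracting min, as suggested above.
uint normalized(G)(ref G gen) { return gen() - G.min; }

void main()
{
    import std.stdio;
    DieGen g;
    foreach (_; 0 .. 6)
        writeln(normalized(g));   // each value in [0, 5]
}
```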
 I will not file this bug, nor another dozen std.random bugs, because
 the module should be rewritten anyway and I am working on it.
 std.random is a collection of bugs from C/C++ libraries extended
 with D generic idioms. For example, there is no reason for a 64-bit
 Xorshift. It is 32-bit by design. Furthermore, a 64-bit expansion of
 a 32-bit algorithm must be proved theoretically before we allow it
 for end users. 64-bit analogs exist, but they have different
 implementations.
One matter that I see is there's precious little difference between mir.random and std.random. Much of the code seems copied, which is an inefficient way to go about things. We shouldn't fork everything if we don't like a bit of it, though admittedly the path toward making changes in std is more difficult. Is your intent to work on mir.random on the side and then submit it as a wholesale replacement of std.random under a different name? In that case you'd have my support, but you'd need to convince me the replacement is necessary. You'd probably have a good case for eliminating xorshift/64, but then we may simply deprecate that outright. You'd possibly have a more difficult time with opCall.
I started with Engines as a basis. The library will be very different compared with Phobos and _any_ other RNG library in terms of floating point generation quality. All FP generation I have seen is not saturated (the number of possible unique FP values is very small compared with the ideal situation, because of IEEE arithmetic). I have not found the idea described by others, so it may become an article in the future.

A set of new modern Engines will be added (Nicholas Wilson, and maybe Joseph). Also Seb and I will add a set of distributions.
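For context on the saturation point, here is the widely used 53-bit construction (a standard technique, not necessarily what mir-random does): it reaches at most 2^53 distinct doubles in [0, 1), a small subset of the roughly 2^62 representable doubles in that interval.

```d
void main()
{
    import std.stdio;

    ulong bits = 0x1234_5678_9ABC_DEF0;   // pretend these are random bits
    // Keep the top 53 bits and scale by 2^-53: uniform in [0, 1),
    // but only 2^53 distinct values are ever produced.
    double u = (bits >> 11) * 0x1p-53;
    writeln(u);
    assert(u >= 0.0 && u < 1.0);
}
```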
 Phobos degrades because
 we add a lot of generic specializations and small utilities 
 without
 understanding use cases.
This is really difficult to parse. Are you using "degrades" the way it's meant? What is a "generic specialization"? What are examples of "small utilities without understanding use cases"?
Sorry, my English is ... .

It is not clear to me what subset of generic code is nothrow (reduce, for example). The same is true for the BetterC concept: it is hard to predict when an algorithm requires DRuntime to be linked / initialised. It is not clear what modules are imported by a module.

"small utilities without understanding use cases" - numeric code in std.algorithm: minElement, sum. They should not be in std.algorithm. A user can use `reduce`. Or, if speed is required, we need to move to a numeric solution suitable for vectorization. And std.algorithm seems to be the wrong module for vectorised numeric code.
 Phobos really follows a stupid idealistic idea: more generic is
 better, more API is better, more universal algorithms are better.
 The problem is that Phobos/DRuntime is a soup where everything
 (because of its "universality") interacts with everything.
I do think more generic is better, of course within reason. It would be a tenuous statement that generic designs in Phobos such as ranges, algorithms, and allocators are stupid and idealistic. So I'd be quite interested in hearing more about this. What's that bouillabaisse about?
For example, std.allocator. It is awesome! But I can not use it in GLAS, because I don't understand if it will work without linking with DRuntime. So, I copy-pasted and modified your code for AlignedMallocator: https://github.com/libmir/mir-glas/blob/master/source/glas/internal/memory.d

Ranges and algorithms seem good to me, except that it is not clear when code is nothrow / BetterC. std.math is a problem: we are adding new API without solving existing API problems and C compatibility. std.complex prevents math optimisations (this can not be solved without compiler hacks), so GLAS migrated to native (old) complex numbers.

I like generics when they make D usage simpler. If one adds random number generation to a Phobos sorting algorithm, it will make it useless for BetterC (because it will require linking an RNG). Such issues are not reviewed during the Phobos review process. Linking Phobos / DRuntime is not an option because it has no backward binary compatibility, so packages can not be distributed as precompiled libraries.

std.traits, std.meta, std.range.primitives, std.ndslice, and part of std.math are the only modules I am using in Mir libraries. It is very important to me to have BetterC guarantees between different Phobos versions. Super generic code where different modules import each other is hard to review.

Best regards,
Ilya
Nov 23 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 10:52 AM, Ilya Yaroshenko wrote:
 I started with Engines as basis. The library will be very different
 comparing with Phobos and _any_ other RNG libraries in terms of floating
 point generation quality. All FP generation I have seen are not
 saturated (amount of possible unique FP values are very small comparing
 with ideal situation because of IEEE arithmetic). I have not found the
 idea described by others, so it may be an article in the future.
Is this a design matter, or an implementation matter? Could you implement the superior FP generation on the existing std.random API? To not be able to vs. to not want to is a different matter; I have an appreciation for each, but we need to clarify. -- Andrei
Nov 23 2016
next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 16:54:44 UTC, Andrei 
Alexandrescu wrote:
 On 11/23/2016 10:52 AM, Ilya Yaroshenko wrote:
 I started with Engines as basis. The library will be very 
 different
 comparing with Phobos and _any_ other RNG libraries in terms 
 of floating
 point generation quality. All FP generation I have seen are not
 saturated (amount of possible unique FP values are very small 
 comparing
 with ideal situation because of IEEE arithmetic). I have not 
 found the
 idea described by others, so it may be an article in the 
 future.
Is this a design matter, or an implementation matter? Could you implement the superior FP generation on the existing std.random API? To not be able to vs. to not want to is a different matter; I have an appreciation for each, but we need to clarify. -- Andrei
Floating point generation is an implementation matter. Distributions and algorithms are a design matter.

Assume you are building a modification Mod_X ~ 1 / X + X for a distribution. This is how it will look in Mir Random:

-------------------------
struct Mod(D)
    if (isRandomVariable!D && isFloatingPoint!(ReturnType!D))
{
    D irv;
    alias irv this;

    ReturnType!D opCall(G)(ref G gen)
        if (isSURBG!G)
    {
        ReturnType!D x1 = void;
        do x1 = irv(gen); while (x1 == 0);
        ReturnType!D x2 = irv(gen);
        // X1 and X2 are independent!
        return 1 / x1 + x2;
    }
}

unittest
{
    auto gen = Xorshift(1);
    auto rv = Mod!GammaRandomVariable(...);
    auto x = rv(gen); // the generator is never copied by value
}
-------------------------

How can it be done with a Range interface instead of opCall? Please note that users would use Distributions, not Engines. So, the Range API requirements are:

1. An Engine must not have implicit copy semantics: it is not correct for RNGs and has performance issues. Engines also should not be copied explicitly in this example.
2. Do not keep pointers to the Engine in a RandomVariable, because RandomVariables can easily be passed outside of a function: they are just small tuples of params.
3. Do not use classes, because of BetterC and performance issues.
4. Engines must have a Range interface.
5. RandomVariables (both Mod and the underlying one) must have a Range interface.

I don't think a Range API for random variables can be done easily and without performance or security issues.

Thanks,
Ilya
Nov 23 2016
next sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 18:13:52 UTC, Ilya Yaroshenko 
wrote:
 Assume your are building a modification Mod_X ~ 1 / X + X for a 
 distribution. This is how it will look in Mir Random:
EDIT: Mod_X ~ Y + X, Y ~ X. (X & Y are independent)
Nov 23 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 01:13 PM, Ilya Yaroshenko wrote:
 1. Engine must not have implicit copy semantic: it is not correct for
 RNGs and has performance issues. They also should not be copied
 explicitly in this example.
OK, so (lack of) copyability is a good argument. I'm ready to live with random generators that are noncopyable values and use opCall; then use a range adaptor on top of a pointer to that. It seems providing the range primitives for a noncopyable object is not useful and is liable to create more confusion and frustration than a distinct API.

A tip for the range adaptor: have it allocate and own the generator internally. That way it's easy to make it reference counted economically.

Andrei
Nov 23 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei 
Alexandrescu wrote:
 A tip for the range adaptor: have it allocate and own the 
 generator internally. That way it's easy to make it reference 
 counted economically.
The current adaptor should be used in function scope. (It would be great to have a DIP for that kind of semantic check.) An RC adaptor can be added too. First we need to find a real-world use case where we need to store a random range outside of a function. -- Ilya
Nov 23 2016
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/16 2:40 PM, Ilya Yaroshenko wrote:
 On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei Alexandrescu wrote:
 A tip for the range adaptor: have it allocate and own the generator
 internally. That way it's easy to make it reference counted economically.
Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya
Yah, problem here is we can't make that safe. -- Andrei
Nov 23 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 19:51:50 UTC, Andrei 
Alexandrescu wrote:
 On 11/23/16 2:40 PM, Ilya Yaroshenko wrote:
 On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei 
 Alexandrescu wrote:
 A tip for the range adaptor: have it allocate and own the 
 generator
 internally. That way it's easy to make it reference counted 
 economically.
Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya
Yah, problem here is we can't make that safe. -- Andrei
Can we improve D to make it safe?
Nov 23 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/16 3:05 PM, Ilya Yaroshenko wrote:
 On Wednesday, 23 November 2016 at 19:51:50 UTC, Andrei Alexandrescu wrote:
 On 11/23/16 2:40 PM, Ilya Yaroshenko wrote:
 On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei Alexandrescu
 wrote:
 A tip for the range adaptor: have it allocate and own the generator
 internally. That way it's easy to make it reference counted
 economically.
Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya
Yah, problem here is we can't make that safe. -- Andrei
Can we improve D to make it safe?
It's difficult in the general case for objective reasons (modular analysis). For this, we may be able to take an argument by ref, save its address inside, add some scope attributes. It's a project. -- Andrei
Nov 23 2016
prev sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 19:40:47 UTC, Ilya Yaroshenko 
wrote:
 Current adaptor should be used in a function scope. (Would be 
 great to have a DIP for that kind of semantic check). An RC 
 adaptor can be added too. First we need to find a real world 
 use case where we need to store a random range outside of a 
 function. -- Ilya
You want to instantiate and seed Mt every time you call a function that may need random numbers?
Nov 24 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Thursday, 24 November 2016 at 08:59:03 UTC, Kagamin wrote:
 On Wednesday, 23 November 2016 at 19:40:47 UTC, Ilya Yaroshenko 
 wrote:
 Current adaptor should be used in a function scope. (Would be 
 great to have a DIP for that kind of semantic check). An RC 
 adaptor can be added too. First we need to find a real world 
 use case where we need to store a random range outside of a 
 function. -- Ilya
You want to instantiate and seed Mt every time you call a function that may need random numbers?
A function can use a global RNG defined by a user, or accept an RNG by reference. The range adaptor stores a pointer, not a value.
Nov 24 2016
prev sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 16:54:44 UTC, Andrei 
Alexandrescu wrote:
 On 11/23/2016 10:52 AM, Ilya Yaroshenko wrote:
 I started with Engines as basis. The library will be very 
 different
 comparing with Phobos and _any_ other RNG libraries in terms 
 of floating
 point generation quality. All FP generation I have seen are not
 saturated (amount of possible unique FP values are very small 
 comparing
 with ideal situation because of IEEE arithmetic). I have not 
 found the
 idea described by others, so it may be an article in the 
 future.
Is this a design matter, or an implementation matter? Could you implement the superior FP generation on the existing std.random API?
Real uniform `rand` (-2^^exp, +2^^exp) and real UniformVariable [a, b) were added. `rand` does not use IEEE arithmetic to generate a real random number. --Ilya
Nov 24 2016
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 23.11.2016 00:55, Andrei Alexandrescu wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar to C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
I would posit that it is not actually natural to model random numbers as ranges. Not every random object is a sequence. Different samples are (ideally) independent, so there is little conceptual motivation to group them together -- it just increases the logistic overhead needed to get at the single samples. I.e., the most natural way to use a PRNG range is as follows:

PRNG range;
auto sample()
{
    auto r = range.front;
    range.popFront();
    return r;
}

PRNGs are an example where global state is actually desirable, because in contrast to most other programming tasks, usually less predictability is good.
Nov 24 2016
prev sibling next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko 
wrote:
 ##### Integer uniform generators
 [WIP]

 ##### Real uniform generators
 [WIP]

 ##### Nonuniform generators
 [WIP]
As we discussed in relation to Seb's project, I think this is a problematic conceptualization of the best way to structure functionality related to randomness. An arguably better way (as outlined in the C++11 standard) is to think in terms of:

  * random _generators_, i.e. sources of uniformly distributed random bits:

    - random _engines_ (seedable, pseudo-random algorithms)

    - random _devices_ (non-deterministic sources of uniformly distributed bits)

  * random _distributions_, which transform uniformly-distributed random bits into variates with some other type and distribution

    - note _this includes uniformly-distributed integers_!

    - also uniformly-distributed floats, enums, etc.

    - and also non-uniform distributions

  * random _algorithms_, i.e. algorithms in the sense of std.random, but where their popFront() includes random decision-making

    - randomCover, randomSample, etc.

The point of the above breakdown is that it gives a nice and clear separation of concerns that allows for easily replaceable components. Separating out stuff just by the ultimate result you want (integers vs. floats, uniform vs. non-uniform, etc.) isn't helpful in that way.
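The separation above can be sketched roughly as follows (all names illustrative, not from Phobos, C++11, or mir-random):

```d
// Sketch of the generator / distribution split (illustrative names only).
struct Engine                   // source of uniformly distributed bits
{
    private ulong s;
    this(ulong seed) { s = seed ? seed : 1; }
    ulong opCall()              // xorshift64 step
    {
        s ^= s << 13; s ^= s >> 7; s ^= s << 17;
        return s;
    }
}

struct UniformIntDistribution   // turns raw bits into bounded integers
{
    ulong a, b;                 // inclusive bounds
    ulong opCall(G)(ref G gen)
    {
        return a + gen() % (b - a + 1);  // modulo bias ignored in this sketch
    }
}

void main()
{
    import std.stdio;
    auto gen = Engine(42);
    auto die = UniformIntDistribution(1, 6);
    foreach (_; 0 .. 5)
        writeln(die(gen));      // each value in [1, 6]
}
```

The same generator can then feed any distribution, and either component can be swapped independently, which is the separation-of-concerns point above.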
Nov 22 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/22/16 7:36 PM, Joseph Rushton Wakeling wrote:
   * random _generators_, i.e. sources of uniformly distributed random bits:

     - random _engines_ (seedable, pseudo-random algorithms)

     - random _devices_ (non-deterministic sources of uniformly
 distributed bits)

   * random _distributions_, which transform uniformly-distributed random
 bits
     into variates with some other type and distribution

     - note _this includes uniformly-distributed integers_!

     - also uniformly-distributed floats, enums, etc.

     - and also non-uniform distributions

   * random _algorithms_, i.e. algorithms in the sense of std.random, but
 where
     their popFront() includes random decision-making

     - randomCover, randomSample, etc.
Yah, I think that would be nice. -- Andrei
Nov 22 2016
prev sibling next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko 
wrote:
  - 64-bit Mt19937 is default for 64-bit targets
This means that seemingly identical code will produce different 
results depending on whether it's compiled for 64-bit or 32-bit. 
Is that really worth it, when anyone who cares about the 
difference between 32-bit vs. 64-bit random words is quite 
capable of specifying the RNG they want to use and not just 
relying on the default?

Having a different default RNG would make sense for targets 
where there are serious performance issues at stake (e.g. minimal 
memory available for RNG state), but just for the sake of 32- vs. 
64-bit Mersenne Twister it seems an unnecessary complication.

These days it's debatable whether Mersenne Twister of _any_ word 
size is the optimal choice for a default RNG, so if the default 
is going to be changed, it might as well be to something 
significantly better (rather than worrying about numbers of 
bits).
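The reproducibility concern can be made concrete with the C++ 
standard library's two Mersenne Twisters (standing in for Mir's 
Mt19937_32/Mt19937_64, whose streams likewise differ):

```cpp
#include <cstdint>
#include <random>

// "Identical" user code, differing only in which default engine
// it picks up. The two default-seeded Mersenne Twisters produce
// unrelated streams, so a Random alias that switches per target
// silently changes program output.
std::uint64_t first_draw_32() {
    std::mt19937 mt;     // 32-bit engine, default seed 5489
    return mt();
}

std::uint64_t first_draw_64() {
    std::mt19937_64 mt;  // 64-bit engine, same default seed,
                         // different parameters, different stream
    return mt();
}
```

A program seeding "the default" engine identically on both 
targets would still get entirely different variates.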
Nov 22 2016
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/22/16 7:44 PM, Joseph Rushton Wakeling wrote:
 These days it's debatable whether Mersenne Twister of _any_ word size is
 the optimal choice for a default RNG
Interesting. Could you please add a couple of links about that? -- Andrei
Nov 22 2016
next sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei 
Alexandrescu wrote:
 On 11/22/16 7:44 PM, Joseph Rushton Wakeling wrote:
 These days it's debatable whether Mersenne Twister of _any_ 
 word size is
 the optimal choice for a default RNG
Interesting. Could you please add a couple of links about that? -- Andrei
http://www.pcg-random.org/other-rngs.html ;-)
Nov 22 2016
prev sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei 
Alexandrescu wrote:
 Interesting. Could you please add a couple of links about that? 
 -- Andrei
http://xoroshiro.di.unimi.it/
Nov 23 2016
parent reply Andrea Fontana <nospam example.com> writes:
On Wednesday, 23 November 2016 at 17:31:58 UTC, Kagamin wrote:
 On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei 
 Alexandrescu wrote:
 Interesting. Could you please add a couple of links about 
 that? -- Andrei
http://xoroshiro.di.unimi.it/
Very short public domain implementation: http://xoroshiro.di.unimi.it/xoroshiro128plus.c
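For reference, a C++ transcription of the linked public-domain C 
code. The rotation/shift constants (55, 14, 36) are those of the 
original 2016 xoroshiro128+; later revisions of the generator use 
different constants:

```cpp
#include <cstdint>

// xoroshiro128+ sketch. State must be seeded to anything other
// than all zeros.
struct Xoroshiro128Plus {
    std::uint64_t s[2];

    static std::uint64_t rotl(std::uint64_t x, int k) {
        return (x << k) | (x >> (64 - k));
    }

    std::uint64_t next() {
        const std::uint64_t s0 = s[0];
        std::uint64_t s1 = s[1];
        const std::uint64_t result = s0 + s1; // the "+" variant

        s1 ^= s0;
        s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14); // a, b
        s[1] = rotl(s1, 36);                   // c
        return result;
    }
};
```

Note how little state and work is involved compared with the 
2.5 KB of Mersenne Twister state.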
Nov 24 2016
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Thursday, 24 November 2016 at 08:36:41 UTC, Andrea Fontana 
wrote:
 On Wednesday, 23 November 2016 at 17:31:58 UTC, Kagamin wrote:
 On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei 
 Alexandrescu wrote:
 Interesting. Could you please add a couple of links about 
 that? -- Andrei
http://xoroshiro.di.unimi.it/
Very short public domain implementation: http://xoroshiro.di.unimi.it/xoroshiro128plus.c
Implementation in D, written during DConf 2016 ;-) https://github.com/WebDrake/dxorshift
Nov 24 2016
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 00:44:26 UTC, Joseph Rushton 
Wakeling wrote:
 On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko 
 wrote:
  - 64-bit Mt19937 is default for 64-bit targets
This means that seemingly identical code will produce different results depending on whether it's compiled for 64-bit or 32-bit. Is that really worth it, when anyone who cares about the difference between 32-bit vs. 64-bit random words is quite capable of specifying the RNG they want to use and not just relying on the default? Having a different default RNG would make sense for targets where there are serious performance issues at stake (e.g. minimal memory available for RNG state) but just for the sake of 32- vs. 64-bit Mersenne Twister seems an unnecessary complication. These days it's debatable whether Mersenne Twister of _any_ word size is the optimal choice for a default RNG, so if the default is going to be changed, it might as well be to something significantly better (rather than worrying about numbers of bits).
Mir Random is going to be a library with saturated uniform FP 
RNGs and almost-saturated exponential FP RNGs. Compared with all 
other libraries (in any language), the basic uniform FP numbers 
will be generated in the interval (-1, +1) and contain _all_ 
possible values, including all subnormal numbers. 64-bit 
generators are 2 times faster for this task if you need to 
generate a 64-bit floating-point number. An explanation of the 
technique will be in my post/article. --Ilya
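This is not the saturated (-1, +1) technique described above 
(which is left to the promised article), but the common, simpler 
mapping below shows why a 64-bit word is the natural input when 
producing a double:

```cpp
#include <cstdint>

// Map a 64-bit uniform word to a double in [0, 1).
// A double carries 53 significand bits, so we keep the top 53
// bits of the word; a 32-bit engine would need two draws to
// fill them, which is where the ~2x speed difference comes from.
double to_unit_double(std::uint64_t x) {
    return (x >> 11) / 9007199254740992.0;  // divide by 2^53
}
```

The saturated scheme goes further, reaching every representable 
double in the interval (including subnormals) rather than only 
multiples of 2^-53.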
Nov 22 2016
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 05:26:12 UTC, Ilya Yaroshenko 
wrote:
 Mir Random is going to be a library with saturated uniform FP 
 RNGs and almost saturated exponential FP RNGs. Comparing with 
 all other libraries (any language) the basic uniform FP numbers 
 will be generated in interval (-1, +1) and contains _all_ 
 possible values including all subnormal numbers. 64 bit 
 generators are 2 times faster for this task if you need to 
 generate a 64 bit floating point number. Explanation of 
 technique will be in my post/article. --Ilya
All of which is fine in its own terms, but why prioritize the 
speed of the default behaviour over its reliability and 
reproducibility? Anyone who cares about that combination of speed 
and statistical quality will have enough information in the 
documentation to know what to do. By contrast, producing 
different results for identical user code depending on whether 
you're making a 32- or 64-bit build is an unpleasant complication 
that would be better to avoid.

In any case, given what you say above, shouldn't the choice of 
32- vs. 64-bit RNG depend on whether one is using a distribution 
that generates 32- vs. 64-bit floating-point variates, rather 
than on whether one is building for a 32- or 64-bit target? In 
which case it needs to be a user choice, since only the user 
knows what distribution they're using.
Nov 23 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 10:33:21 UTC, Joseph Rushton 
Wakeling wrote:
 On Wednesday, 23 November 2016 at 05:26:12 UTC, Ilya Yaroshenko 
 wrote:
 [...]
All of which is fine in its own terms, but why prioritize the speed of the default behaviour over its reliability and reproducibility? Anyone who cares about that combination of speed and statistical quality will have enough information in the documentation to know what to do. By contrast producing different results for identical user code depending on whether you're making a 32- or 64-bit build is an unpleasant complication it could be better to avoid. In any case, given what you say above, shouldn't the choice of 32- vs. 64-bit RNG depend on whether one is using a distribution that generates 32- vs. 64-bit floating-point variates, rather than whether one is building for a 32- or 64-bit target? In which case it needs to be a user choice, since it's only the user who knows what distribution they're using.
We have a Random alias. I think it is OK if it generates 
different numbers for different architectures and library 
versions. If a user wants exactly the same behaviour, he can use 
explicit names like Mt19937_32 and Mt19937_64. Both are defined 
for all architectures.

A 64-bit generator has no significant speed degradation for 
32-bit random number generation on 64-bit machines -- maybe only 
a few %. So it is the better choice for 64-bit machines.

It is only a question of the `Random` alias, which can be changed 
in the future anyway. Both Mt19937_32 and Mt19937_64 are defined.
Nov 23 2016
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 11:03:33 UTC, Ilya Yaroshenko 
wrote:
 It is only question of `Random` alias, which can be changed in 
 the future anyway. Both Mt19937_32 and Mt19937_64 are defined.
I think we're at an impasse in terms of priorities, because that's exactly the reason that I think you should leave the Random alias pointing to the same generator, and let the people with speed/quality concerns select the alternative generator ;-)
Nov 23 2016
prev sibling parent reply Andrea Fontana <nospam example.com> writes:
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko 
wrote:
  - 64-bit Mt19937 initialization is fixed
  - 64-bit Mt19937 is default for 64-bit targets
I wonder why Mersenne Twister is *always* chosen over other 
algorithms. My vote goes for CMWC, of course. Andrea
Nov 23 2016
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 13:01:22 UTC, Andrea Fontana 
wrote:
 I wonder why Mersenne Twister is *always* choosen over other 
 algorithms.
The weight of history, I suspect. Mersenne Twister was the major new high-quality RNG back when people started getting really concerned about having good defaults, and when the Diehard Tests were the state of the art in tests of randomness. IIRC there's also a benefit in terms of dimensionality, which some more recent generators don't address, which can make it a safer "all-round default". Agree that there are probably better choices for today, though.
Nov 23 2016
prev sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 13:01:22 UTC, Andrea Fontana 
wrote:
 On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko 
 wrote:
  - 64-bit Mt19937 initialization is fixed
  - 64-bit Mt19937 is default for 64-bit targets
I wonder why Mersenne Twister is *always* choosen over other algorithms. My vote goes for CMWC, of course. Andrea
A PR for CMWC is highly welcome!
Nov 23 2016