
digitalmars.D - Mir Random [WIP]

reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
https://github.com/libmir/mir-random

# mir-random
Dlang Random Number Generators

#### Comparison with Phobos
##### Generators
  - `opCall` API instead of range interface is used (similar to C++)
  - Default and copy constructors are disabled for generators.
  - 64-bit Mt19937 initialization is fixed
  - 64-bit Mt19937 is the default for 64-bit targets
  - `unpredictableSeed` has no state and returns `ulong`
  - Does not depend on DRuntime (Better C concept)
  - [WIP] additional Xorshift generators
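
For illustration, a minimal sketch of what an `opCall`-style engine looks like in use. The engine shown is SplitMix64 (a real, well-known PRNG), but the struct name and details are illustrative, not the actual mir-random API:

```d
import std.random : unpredictableSeed;

// Illustrative opCall-style engine: each call advances the state and
// returns the next raw 64-bit variate, C++-<random>-style.
struct SplitMix64
{
    ulong state;
    @disable this();         // no default construction: must be seeded
    @disable this(this);     // no copies: unique ownership of the state
    this(ulong seed) { state = seed; }

    ulong opCall()
    {
        ulong z = (state += 0x9E3779B97F4A7C15UL);
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9UL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBUL;
        return z ^ (z >> 31);
    }
}

void main()
{
    auto gen = SplitMix64(unpredictableSeed);
    ulong x = gen();   // one call, one variate; no front/popFront pair
}
```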

##### Integer uniform generators
[WIP]

##### Real uniform generators
[WIP]

##### Nonuniform generators
[WIP]

Please contact me if you are interested in contributing!
https://gitter.im/libmir/public
Nov 21 2016
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar to C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
Nov 22 2016
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu 
wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar to 
 C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
Yes, I think this is avoiding the existing problems with RNGs and ranges rather than solving them. I don't blame anyone for _wanting_ to avoid them; they are nasty, subtle issues that seem to keep getting more complex the more one looks at them (for example, after my DConf talk last year, I realized that there was a whole set of other potential complications related to how ranges typically treat laziness). But I think they can be solved, and should be.

OTOH, there's no reason per se why there should not be an `opCall` for random number generators along the lines of,

```d
UIntType opCall()
{
    this.popFront();
    return this.front;
}
```

... just to provide options to the user. (BTW, note the order there, which touches on the issues related to what lazy evaluation means not just for RNGs but for any non-deterministic IO.)
Nov 22 2016
prev sibling next sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu 
wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar to 
 C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
I'm pretty sure everyone *wants* it to be a range, but it turns out it's a difficult design space. Lots of nasty trade-offs that can be (and have been) argued back and forth depending on your priorities. Perhaps you have an insight that has been missed that could make a good RNG range without causing less-than-optimal performance or statistically unsafe default behaviour? IIRC you think people are making too much fuss about the statistically safe default behaviour, but it would be interesting to hear a more expanded version of that view.
Nov 22 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/22/16 7:30 PM, John Colvin wrote:
 On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar to C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
I'm pretty sure everyone *wants* it to be a range, but it turns out it's a difficult design space. Lot's of nasty trade-offs that can be (and have been) argued back and forth depending on your priorities.
The proposed design has disabled copy ctor and uses opCall() for a new element. That seems to be a difference without a distinction from an input range that has disabled copy ctor and offers the input range primitives.
 Perhaps you have an insight that has been missed that can make a good
 rng range without causing less than optimal performance or statistically
 unsafe default behaviour?
We should add a reference RNG that transforms any other RNG into an input range that shares the actual RNG.
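
For concreteness, such a reference wrapper might be sketched like this (the `RngRef` name and design are illustrative, not an existing Phobos or mir-random type):

```d
import std.random : Mt19937, unpredictableSeed;

// Hypothetical adaptor: wraps a pointer to an RNG so that copies of the
// range all advance the one shared engine instead of duplicating state.
struct RngRef(G)
{
    private G* gen;
    this(ref G g) { gen = &g; }

    enum bool empty = false;              // infinite range
    auto front() @property { return gen.front; }
    void popFront() { gen.popFront(); }
}

void main()
{
    auto engine = Mt19937(unpredictableSeed);
    auto a = RngRef!Mt19937(engine);
    auto b = a;            // copies share the same underlying engine
    a.popFront();          // advances the state seen by both a and b
}
```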
 IIRC you think people are making too much fuss about the statistically
 safe default behaviour, but it would be interesting to hear a more
 expanded version of that view.
I'm unclear on what that statistically unsafe default behavior is - my understanding is it has to do with RNGs being inadvertently copied. It would be great to formalize that in a well-explained issue. Andrei
Nov 22 2016
next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei 
Alexandrescu wrote:
 I'm unclear on what that statistically unsafe default behavior 
 is - my understanding is it has to do with RNGs being 
 inadvertently copied. It would be great to formalize that in a 
 well-explained issue.
I'll see if I can write that up in depth some time soon. TBH though I think the problem is less about RNGs and more about stuff like RandomSample and RandomCover (and, in future, random distributions that have internal state, like a struct implementing a normal distribution using the Ziggurat algorithm internally). It's not so difficult to stop RNG state being copied inadvertently, but when you have ranges wrapping ranges wrapping ranges, each containing their own extra state that cannot be copied by value, things get a bit more complicated.
Nov 23 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 05:47 AM, Joseph Rushton Wakeling wrote:
 On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei Alexandrescu wrote:
 I'm unclear on what that statistically unsafe default behavior is - my
 understanding is it has to do with RNGs being inadvertently copied. It
 would be great to formalize that in a well-explained issue.
I'll see if I can write that up in depth some time soon. TBH though I think the problem is less about RNGs and more about stuff like RandomSample and RandomCover (and, in future, random distributions that have internal state, like a struct implementing a normal distribution using the Ziggurat algorithm internally). It's not so difficult to stop RNG state being copied inadvertently, but when you have ranges wrapping ranges wrapping ranges, each containing their own extra state that cannot be copied by value, things get a bit more complicated.
Well we need to get to the bottom of this if we're to make progress. Otherwise it's copypasta with little changes followed by disappointment all the way down. -- Andrei
Nov 23 2016
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 13:44:00 UTC, Andrei 
Alexandrescu wrote:
 Well we need to get to the bottom of this if we're to make 
 progress. Otherwise it's copypasta with little changes followed 
 by disappointment all the way down. -- Andrei
Would you be up for a direct discussion of this some time -- maybe next week once I've had time to re-review the fine details of my concerns? I can provide a summary of issues as a starting point for that discussion, if it would be useful. I suggest this because (i) I'm interested in your insight and (ii) even if we end up disagreeing on concerns and solutions, it would be good to make sure that we're on the same page in terms of understanding each other's point of view.
Nov 23 2016
prev sibling next sibling parent Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei 
Alexandrescu wrote:
 We should add a reference RNG that transforms any other RNG 
 into an input range that shares the actual RNG.
Didn't we have a generic Ref!T wrapper in std.typecons that would work this way? Can't find it now.
Nov 23 2016
prev sibling parent reply Martin Nowak <code dawg.eu> writes:
On Wednesday, 23 November 2016 at 01:34:23 UTC, Andrei 
Alexandrescu wrote:
 On 11/22/16 7:30 PM, John Colvin wrote:
 On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei 
 Alexandrescu wrote:
The proposed design has disabled copy ctor and uses opCall() for a new element. That seems to be a difference without a distinction from an input range that has disabled copy ctor and offers the input range primitives.
Yes, the problems are inadvertent copies, and a disabled this(this) would prevent that. RNGs should have unique ownership of their internal state. Using InputRanges with Phobos is somewhat clumsy. Maybe people have been burned by those and now generally consider InputRanges broken. Maybe non-copyability needs to become a requirement for InputRanges.
 We should add a reference RNG that transforms any other RNG 
 into an input range that shares the actual RNG.
We already have refRange for that, no? Also refCounted(Random(unpredictableSeed)) should work.
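
A small usage sketch of the existing `std.range.refRange` route:

```d
import std.random : Mt19937, unpredictableSeed;
import std.range : refRange, take;
import std.array : array;

void main()
{
    auto engine = Mt19937(unpredictableSeed);
    // refRange stores a pointer to engine, so consuming the wrapped
    // range advances the original engine, not a hidden copy of it.
    auto rng = refRange(&engine);
    auto five = rng.take(5).array;   // pulls 5 variates from engine
    // engine itself has now advanced by five steps.
}
```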
Nov 25 2016
next sibling parent reply Martin Nowak <code dawg.eu> writes:
On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:
 Maybe non-copyability needs to become a requirement for 
 InputRanges.
Could have an optional .clone if copying is supported. What would be an InputRange where copying is correct?
 We already have refRange for that, no?
 Also refCounted(Random(unpredictableSeed)) should work.
The same scheme works for files/streams with only plain int fds: non-copyable/unique ownership, plus move, refRange, or refCounted for passing and sharing. Should we split off this discussion to a dlang-study thread?
Nov 25 2016
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Saturday, 26 November 2016 at 06:55:24 UTC, Martin Nowak wrote:
 Should we split off this discussion to a dlang-study thread?
I would personally really welcome that, but subject to the understanding that people agree to look seriously at random algorithms (like RandomSample) and not focus the discussion solely on RNGs.
Nov 26 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/26/2016 01:55 AM, Martin Nowak wrote:
 On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:
 Maybe non-copyability needs to become a requirement for InputRanges.
Could have an optional .clone if copying is supported. What would be an InputRange where copying is correct?
Input ranges with disabled this(this) would be a major breaking change that we probably cannot bear (and shouldn't because the gain is small). I think an input range would be at best "pure reference" in the sense that popFront advances all copies to the same position. -- Andrei
Nov 26 2016
parent Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Saturday, November 26, 2016 12:24:47 Andrei Alexandrescu via Digitalmars-
d wrote:
 On 11/26/2016 01:55 AM, Martin Nowak wrote:
 On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:
 Maybe non-copyability needs to become a requirement for InputRanges.
Could have an optional .clone if copying is supported. What would be an InputRange where copying is correct?
Input ranges with disabled this(this) would be a major breaking change that we probably cannot bear (and shouldn't because the gain is small). I think an input range would be at best "pure reference" in the sense that popFront advances all copies to the same position. -- Andrei
For a while now, I've been tempted to argue that we should require that well-behaved input ranges be reference types (or at least have reference semantics). It just seems like too much misbehaves if they're not, and arguably, the whole reason that they're input ranges and not forward ranges (aside from the programmer just not bothering to implement save) is that they effectively have reference semantics whether they're actually coded that way or not. We obviously can't test for that behavior at compile time, but it could easily be on the list of things that are documented as a requirement for well-behaved ranges. I'm also tempted to argue that well-behaved forward ranges and greater should have value semantics (in the sense that dynamic arrays do), but that's a much bigger change (it effectively makes save unnecessary), and proper use of save works around that problem (though it's very common that save isn't used as often as it should be). It would clean up a lot of the mistakes that occur with forward ranges though. However, regardless, a non-copyable range would not play nicely at all with how range-based code works right now. You'd have to use ref all over the place, which range-based code simply doesn't do and would not play well with function chaining, since you'd need lvalues, and since you couldn't copy the range to the stack to make it an lvalue, that seems like it would go nowhere fast. - Jonathan M Davis
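The chaining problem is easy to sketch (illustrative `Counter` range; `map` takes its range by value, so a non-copyable range cannot enter a pipeline):

```d
import std.algorithm : map;

struct Counter
{
    int i;
    @disable this(this);                   // forbid copies
    enum bool empty = false;
    int front() const @property { return i; }
    void popFront() { ++i; }
}

void main()
{
    Counter c;
    // The line below does not compile: map receives its argument by
    // value, which would require copying the non-copyable range.
    //auto doubled = c.map!(a => a * 2);
    c.popFront();                          // direct use still works
}
```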
Nov 27 2016
prev sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Saturday, 26 November 2016 at 06:46:19 UTC, Martin Nowak wrote:
 Yes the problems are inadvertent copies and a disabled 
 this(this) would prevent that. RNGs should have unique 
 ownership of their internal state.
 Using InputRanges with phobos is somewhat clumsy. Maybe people 
 have been burned by those and now generally consider 
 InputRanges as broken.
 Maybe non-copyability needs to become a requirement for 
 InputRanges.
I remember you made that suggestion of disabled this(this) to me a while back, and I did use it for this small collection of RNGs: https://github.com/WebDrake/dxorshift
 We should add a reference RNG that transforms any other RNG 
 into an input range that shares the actual RNG.
We already have refRange for that, no? Also refCounted(Random(unpredictableSeed)) should work.
RefRange still has the issue that it only forwards the range primitives and not other properties such as the `isUniformRandom` enum. But those are probably fixable. However, these approaches basically cover the case of something where an instance is initialized high up in the program and passed around by ref throughout its lifetime. That doesn't address the question of how random _algorithms_ like `RandomSample` could be updated for safety (since they might often need to be initialized multiple times in the inner loops of a program). See: https://forum.dlang.org/post/gnlvyogolkmocujtnxjj forum.dlang.org https://forum.dlang.org/post/gpsilefdillwtleuwohl forum.dlang.org for some discussion of the problem.
Nov 26 2016
prev sibling next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu 
wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar to 
 C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
It is a safe low-level architecture without performance and API issues. It prevents users from implicitly doing stupid things (like copying RNGs). A high-level range interface can be added in the future (it will hold a _pointer_ to an RNG). In addition, when you need to write algorithms or distributions, opCall is much more convenient than the range API. Also, in 99% of cases users would not use the Engine API at all: they will just want to call `rand` or `uniform`, or another distribution. I am sure that almost any library should first have a low-level API that fits its implementation. Additional API levels may also be added. Current Phobos evolution is generic degradation: more generic and "universal" code hides more uncovered bugs. The std.range is a good example of degradation; it has a lot of API and implementation bugs.

### Example of API+implementation bug:

#### Bug: RNGs have min and max params (hello C++). But they are not used when a uniform integer number is generated: `uniform!ulong` / `uniform!ulong(0, 100)`.

#### Solution: In Mir Random any RNG must generate all 8/16/32/64 bits uniformly. It is the RNG's problem how to do it.

I will not file this bug, nor another dozen std.random bugs, because the module should be rewritten anyway and I am working on it. std.random is a collection of bugs from C/C++ libraries extended with D generic idioms. For example, there is no reason for a 64-bit Xorshift: it is 32-bit by design. Furthermore, a 64-bit expansion of a 32-bit algorithm must be proved theoretically before we allow it for end users. 64-bit analogs exist, but they have different implementations.

Phobos degrades because we add a lot of generic specializations and small utilities without understanding use cases. Phobos really follows a stupid idealistic idea: more generic is better, more API is better, more universal algorithms are better. The problem is that Phobos/DRuntime is a soup where everything (because of its "universality") interacts with everything else.
Nov 22 2016
next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko 
wrote:
 On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei 
 Alexandrescu wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar 
 to C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
It is a safe low-level architecture without performance and API issues. It prevents users from implicitly doing stupid things (like copying RNGs). A high-level range interface can be added in the future (it will hold a _pointer_ to an RNG). In addition, when you need to write algorithms or distributions, opCall is much more convenient than the range API. Also, in 99% of cases users would not use the Engine API at all: they will just want to call `rand` or `uniform`, or another distribution. I am sure that almost any library should first have a low-level API that fits its implementation. Additional API levels may also be added. Current Phobos evolution is generic degradation: more generic and "universal" code hides more uncovered bugs. The std.range is a good example of degradation; it has a lot of API and implementation bugs.
EDIT: std.range -> std.random
Nov 22 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 01:16 AM, Ilya Yaroshenko wrote:
 The std.range is a good example of degradation; it has a lot of API and
 implementation bugs.
EDIT: std.range -> std.random
Oh, that narrows it down. Thx. -- Andrei
Nov 23 2016
prev sibling next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko 
wrote:
 ### Example of API+implementation bug:

  #### Bug: RNGs have min and max params (hello C++). But they 
  are not used when a uniform integer number is generated: 
  `uniform!ulong` / `uniform!ulong(0, 100)`.

  #### Solution: In Mir Random any RNG must generate all 
  8/16/32/64 bits uniformly. It is the RNG's problem how to do it.
Alternative solution: functionality that expects the full unsigned integer word (UIntType) to be filled with random bits, should validate that the min/max values of the generator correspond to UIntType.min and UIntType.max. That introduces less breaking change, creates less divergence with the C++11 standard, and preserves more flexibility for the future.
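
Such a validation can be sketched as a one-line compile-time check (illustrative helper, not an existing Phobos or mir-random facility):

```d
import std.random : Mt19937, MinstdRand0;

// Accepts only generators whose raw output covers every bit pattern of
// their word type, i.e. whose min/max coincide with the type's bounds.
enum isFullWordGenerator(G) = G.min == typeof(G.min).min
                           && G.max == typeof(G.max).max;

// Mt19937 emits full 32-bit words...
static assert( isFullWordGenerator!Mt19937);
// ...while a linear-congruential generator like MinstdRand0 does not
// (its range is [1, 2147483646], not the full uint range).
static assert(!isFullWordGenerator!MinstdRand0);

void main() {}
```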
Nov 23 2016
next sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 10:27:00 UTC, Joseph Rushton 
Wakeling wrote:
 On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko 
 wrote:
 ### Example of API+implementation bug:

  #### Bug: RNGs have min and max params (hello C++). But they 
  are not used when a uniform integer number is generated: 
  `uniform!ulong` / `uniform!ulong(0, 100)`.

  #### Solution: In Mir Random any RNG must generate all 
  8/16/32/64 bits uniformly. It is the RNG's problem how to do it.
Alternative solution: functionality that expects the full unsigned integer word (UIntType) to be filled with random bits, should validate that the min/max values of the generator correspond to UIntType.min and UIntType.max. That introduces less breaking change, creates less divergence with the C++11 standard, and preserves more flexibility for the future.
Good point, will add this. --Ilya
Nov 23 2016
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 05:27 AM, Joseph Rushton Wakeling wrote:
 Alternative solution: functionality that expects the full unsigned
 integer word (UIntType) to be filled with random bits, should validate
 that the min/max values of the generator correspond to UIntType.min and
 UIntType.max.

 That introduces less breaking change, creates less divergence with the
 C++11 standard, and preserves more flexibility for the future.
Good idea - static assert is your friend. -- Andrei
Nov 23 2016
prev sibling next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko 
wrote:
 It is safe low level architecture without performance and API 
 issues. It prevents users to do stupid things implicitly (like 
 copying RNGs). A hight level range interface can be added in 
 the future (it will hold a _pointer_ to an RNG).
Note that if you want to do this, it's convenient to still preserve a separation between popping the RNG state versus getting the current variate. Otherwise, the range interface will wind up having to separately cache the variate value, which is wasteful. Something like this:

```d
struct MyRNG
{
    void popFront() { /* update internal state */ }

    UIntType front() @property { return /* whatever part of internal state */; }

    UIntType opCall()
    {
        this.popFront();
        return this.front;
    }
}
```

(The above is basically just the input range API less the `empty` property, and the `front` and `popFront()` are arguably a lower-level API than `opCall`.)
 In additional, when you need to write algorithms or 
 distributions opCall is much more convenient than range API.
Can you give some examples of what you mean here?
  In additions, users would not use Engine API in 99% cases: 
 they will just want to call `rand` or `uniform`, or other 
 distribution.
I don't think that's necessarily true, but in any case, we shouldn't restrict the use-cases of the 1% unless we have to.
Nov 23 2016
next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 10:57:04 UTC, Joseph Rushton 
Wakeling wrote:
 On Wednesday, 23 November 2016 at 05:58:47 UTC, Ilya Yaroshenko 
 wrote:
 It is safe low level architecture without performance and API 
 issues. It prevents users to do stupid things implicitly (like 
 copying RNGs). A hight level range interface can be added in 
 the future (it will hold a _pointer_ to an RNG).
Note that if you want to do this, it's convenient to still preserve a separation between popping the RNG state versus getting the current variate. Otherwise, the range interface will wind up having to separately cache the variate value, which is wasteful. Something like this: struct MyRNG { void popFront() { /* update internal state */ } UIntType front() property { return /* whatever part of internal state */; } UIntType opCall() { this.popFront(); return this.front; } } (The above is basically just the input range API less the `empty` property, and the `front` and `popFront()` are arguably a lower-level API than `opCall`.)
 In additional, when you need to write algorithms or 
 distributions opCall is much more convenient than range API.
Can you give some examples of what you mean here?
For example, Mir Random basic utilities (more low-level than distributions): https://github.com/libmir/mir-random/blob/master/source/random/package.d Also you can explore the std.random code. opCall would almost always be more convenient to use.
  In additions, users would not use Engine API in 99% cases: 
 they will just want to call `rand` or `uniform`, or other 
 distribution.
I don't think that's necessarily true, but in any case, we shouldn't restrict the use-cases of the 1% unless we have to.
We don't restrict anything. It takes a minute to write a Range wrapper. This wrapper can be added to the library if we find a real-world use case. Again, I have not seen a real-world algorithm which looks better with the Range API for generators. RandomSample can be created without the Range API, and it would look more convenient than it is now. I think the Range API is bad and useless overengineering for RNGs.
Nov 23 2016
next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 11:14:38 UTC, Ilya Yaroshenko 
wrote:
 For example, Mir Random basic utilities (more low level then 
 distributions):
 https://github.com/libmir/mir-random/blob/master/source/random/package.d

 Also you can explore std.random code. opCall would almost 
 always more convenient to use.
Yes, but as described, `opCall` can easily be created as a composition of `popFront` and `front` for these convenience purposes.
 We don't restrict. It is 1 minute to write an Range wrapper.
If all you have is `opCall` then the range wrapper is less efficient than it need be.
 This wrapper can be added to the library, if we found a real 
 world use case. Again, I have not seen a real world algorithm, 
 which looks better with Range API for generators. RandomSample 
 can be created without Range API, and it would look more 
 convenient then it is now. I think that Range API is bad and 
 useless overengineering for RNGs.
Yes, most uses of RNGs in std.random involve calling `front` and then `popFront()` (although it would probably be better the other way round). But it's readily possible to imagine range-based use-cases for random distributions along the lines of, myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 0).somethingElse.take(20); But what I'd say more broadly is that of what I've seen so far, mir.random is conflating too many breaking changes without giving thought for their impact (for example, converting the `isUniformRNG` check to rely on a UDA is IMO problematic; I can file a GitHub issue explaining the reasons for this). Allowing for the wider goals of the exercise, it could be worth giving some thought to which of these breakages is really needed to support your use-cases, and which can be avoided.
Nov 23 2016
next sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 11:44:44 UTC, Joseph Rushton 
Wakeling wrote:
 On Wednesday, 23 November 2016 at 11:14:38 UTC, Ilya Yaroshenko 
 wrote:
 For example, Mir Random basic utilities (more low level then 
 distributions):
 https://github.com/libmir/mir-random/blob/master/source/random/package.d

 Also you can explore std.random code. opCall would almost 
 always more convenient to use.
Yes, but as described, `opCall` can easily be created as a composition of `popFront` and `front` for these convenience purposes.
 We don't restrict. It is 1 minute to write an Range wrapper.
If all you have is `opCall` then the range wrapper is less efficient than it need be.
Inlining will solve the data-duplication problem (if any) for most real-world cases.
 This wrapper can be added to the library, if we found a real 
 world use case. Again, I have not seen a real world algorithm, 
 which looks better with Range API for generators. RandomSample 
 can be created without Range API, and it would look more 
 convenient then it is now. I think that Range API is bad and 
 useless overengineering for RNGs.
Yes, most uses of RNGs in std.random involve calling `front` and then `popFront()` (although it would probably be better the other way round). But it's readily possible to imagine range-based use-cases for random distributions along the lines of, myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 0).somethingElse.take(20); But what I'd say more broadly is that of what I've seen so far, mir.random is conflating too many breaking changes without giving thought for their impact (for example, converting the `isUniformRNG` check to rely on a UDA is IMO problematic; I can file a GitHub issue explaining the reasons for this). Allowing for the wider goals of the exercise, it could be worth giving some thought to which of these breakages is really needed to support your use-cases, and which can be avoided.
Yes, please file a GitHub issue.
Nov 23 2016
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 11:44:44 UTC, Joseph Rushton 
Wakeling wrote:
 Yes, most uses of RNGs in std.random involve calling `front` 
 and then `popFront()` (although it would probably be better the 
 other way round).  But it's readily possible to imagine 
 range-based use-cases for random distributions along the lines 
 of,

     myRNG.normalDistribution(0.0, 5.0).filter!(a => a > 
 0).somethingElse.take(20);

 But what I'd say more broadly is that of what I've seen so far, 
 mir.random is conflating too many breaking changes without 
 giving thought for their impact (for example, converting the 
 `isUniformRNG` check to rely on a UDA is IMO problematic; I can 
 file a GitHub issue explaining the reasons for this).  Allowing 
 for the wider goals of the exercise, it could be worth giving 
 some thought to which of these breakages is really needed to 
 support your use-cases, and which can be avoided.
Added RandomRangeAdaptor for URBGs: https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
Nov 23 2016
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko 
wrote:
 Added RandomRangeAdaptor for URBGs:
 https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
This has exactly the problem I identified above, though: you're unnecessarily caching the latest variate rather than just using the RNG state directly. Not the biggest deal in the world, but avoidable if you allow a separation between updating the RNG state and accessing it.
Nov 23 2016
next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 13:26:41 UTC, Joseph Rushton 
Wakeling wrote:
 On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko 
 wrote:
 Added RandomRangeAdaptor for URBGs:
 https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
This has exactly the problem I identified above, though: you're unnecessarily cacheing the latest variate rather than just using the RNG state directly. Not the biggest deal in the world, but avoidable if you allow a separation between updating RNG state and accessing it.
1. The current default RNG (Mt19937) has no state for the latest value. 2. The structure is allocated on the stack, and compilers can recognise loop patterns and eliminate additional memory movements in many cases.
Nov 23 2016
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 13:35:33 UTC, Ilya Yaroshenko 
wrote:
 1. The current default RNG (Mt19937) has no state for the latest 
 value.
That's a detail of your implementation, though, and it's not true for lots of other pseudo-RNGs.
 2. The structure is allocated on stack and compilers can 
 recognise loop patterns and eliminate addition memory movements 
 for many cases.
Fair enough. I'm still not sure, though, why you would object to providing public methods for pseudo-RNGs to (i) update the internal state and (ii) access the most recently generated variate, which the `opCall` would then wrap.
Nov 23 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 08:26 AM, Joseph Rushton Wakeling wrote:
 On Wednesday, 23 November 2016 at 13:03:04 UTC, Ilya Yaroshenko wrote:
 Added RandomRangeAdaptor for URBGs:
 https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
This has exactly the problem I identified above, though: you're unnecessarily cacheing the latest variate rather than just using the RNG state directly. Not the biggest deal in the world, but avoidable if you allow a separation between updating RNG state and accessing it.
Well I do see another small problem with https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d#L19. It creates the first value eagerly upon construction. That's just a bit awkward. It's not the first time I'm seeing this issue, and to be honest am a bit bummed about it. It's a consequence of all ranges needing to buffer at least one element. -- Andrei
Nov 23 2016
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 13:56:27 UTC, Andrei 
Alexandrescu wrote:
 Well I do see another small problem with 
 https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d#L19.
 It creates the first value eagerly upon construction. That's just a bit 
 awkward. It's not the first time I'm seeing this issue, and to be honest am 
 a bit bummed about it. It's a consequence of all ranges needing to buffer 
 at least one element. -- Andrei
Yea, this touches on one of my more general concerns related to ranges, randomness and laziness. Part of the trouble is we don't have a really good general design for how to implement _truly_ lazy ranges, where the value of `front` is not determined until the point where you first access it. This has particular relevance to e.g. RandomSample and RandomCover.
Nov 23 2016
parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wednesday, November 23, 2016 14:02:36 Joseph Rushton Wakeling via 
Digitalmars-d wrote:
 On Wednesday, 23 November 2016 at 13:56:27 UTC, Andrei

 Alexandrescu wrote:
 Well I do see another small problem with
 https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d#L19.
 It creates the first value eagerly upon construction. That's
 just a bit awkward. It's not the first time I'm seeing this issue, and
 to be honest am a bit bummed about it. It's a consequence of all ranges
 needing to buffer at least one element. -- Andrei
Yea, this touches on one of my more general concerns related to ranges, randomness and laziness. Part of the trouble is we don't have a really good general design for how to implement _truly_ lazy ranges, where the value of `front` is not determined until the point where you first access it. This has particular relevance to e.g. RandomSample and RandomCover.
It's quite feasible if you don't calculate front until it's called or popFront is called, and then you cache the result so that subsequent calls to front prior to the call to popFront return the same result, but it creates additional overhead, because then every call to front and popFront has to check whether the value has been calculated yet. And if every range in the chain of ranges has to do that, then that additional overhead is going to add up. In general, it's just cleaner to either calculate front on every call to front (which doesn't make sense in the case of random number generators) or to calculate the first front upon construction and be done with it. And I think that the general consensus has been that calculating front in the constructor and popFront and caching works better than calculating it on every call to front, but that doesn't always work (e.g. map), and that caching does cause issues occasionally. - Jonathan M Davis
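For concreteness, the check-on-every-call overhead can be sketched like this (a hypothetical wrapper; `next()` is an assumed method on the wrapped generator, not a Phobos API):

```d
// Hypothetical sketch of a "truly lazy" range over a generator: front is not
// computed until first accessed, at the cost of a per-call flag check.
// `Gen` is assumed to expose a `next()` method returning the next variate.
struct LazyVariates(Gen)
{
    private Gen gen;
    private ulong cached;
    private bool primed;      // has the current front been generated yet?

    enum bool empty = false;  // infinite range

    ulong front() @property
    {
        if (!primed)          // the check every call must pay for
        {
            cached = gen.next();
            primed = true;
        }
        return cached;
    }

    void popFront()
    {
        if (!primed)
            gen.next();       // element was never observed; still consume it
        primed = false;       // force regeneration on next front access
    }
}

// Dummy generator for the sketch.
struct CountingGen
{
    ulong n;
    ulong next() { return ++n; }
}

void main()
{
    auto r = LazyVariates!CountingGen(CountingGen(0));
    auto x = r.front;   // first variate generated here, not at construction
    r.popFront();
}
```

Every range in a chain built this way pays the `primed` branch on each access, which is exactly the cumulative cost described above.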
Nov 23 2016
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 21:19:51 UTC, Jonathan M Davis 
wrote:
 It's quite feasible if you don't calculate front until it's 
 called or popFront is called, and then you cache the result so 
 that subsequent calls to front prior to the call to popFront 
 return the same result, but it creates additional overhead, 
 because then every call to front and popFront has to check 
 whether the value has been calculated yet. And if every range 
 in the chain of ranges has to do that, then that additional 
 overhead is going to add up.
Yes, indeed. I actually came up with a general purpose solution for that a while back via template mixin (although the version in the post linked to is not the final version: I updated it to use HaraldZealot's suggestion of making the frontImplementation and popFrontImplementation the template parameters): https://forum.dlang.org/post/mailman.263.1439926667.13986.digitalmars-d puremagic.com
 In general, it's just cleaner to either calculate front on 
 every call to front (which doesn't make sense in the case of 
 random number generators) or to calculate the first front upon 
 construction and be done with it.
In some ways I think it's cleaner to not privilege the first front value and go for "true" laziness for all front values -- particularly when dealing with stuff like RandomSample. The downside is that `.front` becomes non-const. Sometimes I wonder whether it wouldn't have been better to require that `popFront()` be called before even the first call to `.front`, and place the onus on `popFront()` to handle any special-casing of the first `.front` value.
 And I think that the general consensus has been that 
 calculating front in the constructor and popFront and caching 
 works better than calculating it on every call to front, but 
 that doesn't always work (e.g. map), and that caching does 
 cause issues occasionally.
The case I always think of is: what if your input range is designed to correspond to sensor observations made at a particular time? This is a case where cacheing the first value on construction is very problematic. For RNGs it's really not such a big deal so long as successive variates are independent, but for random algorithms I think the laziness matters, since their popFront is essentially IO-dependent (in the Haskell-style sense that randomness is IO).
Nov 25 2016
parent Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Friday, November 25, 2016 08:32:07 Joseph Rushton Wakeling via 
Digitalmars-d wrote:
 On Wednesday, 23 November 2016 at 21:19:51 UTC, Jonathan M Davis

 wrote:
 It's quite feasible if you don't calculate front until it's
 called or popFront is called, and then you cache the result so
 that subsequent calls to front prior to the call to popFront
 return the same result, but it creates additional overhead,
 because then every call to front and popFront has to check
 whether the value has been calculated yet. And if every range
 in the chain of ranges has to do that, then that additional
 overhead is going to add up.
Yes, indeed. I actually came up with a general purpose solution for that a while back via template mixin (although the version in the post linked to is not the final version: I updated it to use HaraldZealot's suggestion of making the frontImplementation and popFrontImplementation the template parameters): https://forum.dlang.org/post/mailman.263.1439926667.13986.digitalmars-d puremagic.com
 In general, it's just cleaner to either calculate front on
 every call to front (which doesn't make sense in the case of
 random number generators) or to calculate the first front upon
 construction and be done with it.
In some ways I think it's cleaner to not privilege the first front value and go for "true" laziness for all front values -- particularly when dealing with stuff like RandomSample.
It's much messier from an implementation standpoint. If the constructor checks for empty and caches front if the range it was given was non-empty, then every call to front can just return front, and every call to popFront can just pop the element off and do whatever it does to turn the front from the wrapped range into the front of this range and then cache it. If the constructor doesn't do that work, then every call to front and every call to popFront have to check whether they're the first time that either has been called and _then_ they end up doing all of the same work anyway. So, there's more code, and it's less efficient. And if every range in the chain does that, then you're getting a lot of extra checks just to determine whether it's the first time that front or popFront is being called. Now, if there's never any caching of the front value, and front always calculates it, then that might actually be more efficient if front is only ever called once per element, because you don't get the cost of copying the element to cache it, but if front gets called multiple times (which happens fairly often, I think), then the cost is definitely greater. You're never going to be able to rely on a fully lazy front anyway unless we specified it as part of the range API that all ranges work that way, and as it stands, very few ranges do.
 The downside is that `.front` becomes non-const.
Well, realistically, as it stands, ranges are utterly useless with const anyway. We'd have to have a standard mechanism for getting a tail-const copy of a range from a const or immutable range, and we don't.
 Sometimes I
 wonder whether it wouldn't have been better to require that
 `popFront()` be called before even the first call to `.front`,
 and place the onus on `popFront()` to handle any special-casing
 of the first `.front` value.
That would be terrible with chaining ranges. It also would be _very_ error-prone. It's also heading into two-phase initialization territory, which is almost always a bad idea.
 And I think that the general consensus has been that
 calculating front in the constructor and popFront and caching
 works better than calculating it on every call to front, but
 that doesn't always work (e.g. map), and that caching does
 cause issues occasionally.
The case I always think of is: what if your input range is designed to correspond to sensor observations made at a particular time? This is a case where cacheing the first value on construction is very problematic. For RNGs it's really not such a big deal so long as successive variates are independent, but for random algorithms I think the laziness matters, since their popFront is essentially IO-dependent (in the Haskell-style sense that randomness is IO).
Honestly, I don't think that it really works to have a range where the value of its elements depends on when you call front or popFront. The API lets you do it, but you're going to run into plenty of problems if you do, unless you don't care about the frequency at which the values are generated. Part of the problem here is that ranges and the algorithms that use them are really designed with arrays in mind. A non-random-access range reduces those capabilities, and a basic input range doesn't actually require that the elements be reproducible aside from multiple calls to front giving the same result, but it's still the case that it's pretty much assumed that a range is a fixed set of values, and pretty much nothing is concerned with how fast front or popFront is called aside from efficiency concerns. So, if someone has a range that calculates a value when front or popFront is called, and that value depends on when the function is called, I think that they're just going to have to deal with the consequences of that. It's already pretty problematic when you have a range that's a basic input range rather than a forward range. Another thing to consider here is that those sorts of concerns really only apply to basic input ranges. As soon as you have a forward range, the values of the elements need to be completely predictable - or at least completely reproducible - meaning that aside from caching a potentially infinite number of elements to guarantee that save works, nothing that's doing something like grabbing changing values from a sensor is going to be anything but a basic input range. So, maybe we should do something special with regards to basic input ranges and how they're treated in order to better deal with cases like that, but if we went that route, then we pretty much couldn't treat them like forward ranges without save anymore. In many cases, we'd have to have separate implementations that avoided things like caching, and that would become problematic fast. 
I think that there are always going to be cases where certain things have some funky corner cases with ranges - especially the further that they get from being dynamic arrays. We probably need to do a better job of formalizing some of this stuff to better avoid some of those corner cases, but I think that we're always going to be able to find something that we can define as a range that works in many cases but starts doing weird things if we use it on a large enough range of algorithms. - Jonathan M Davis
Nov 25 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 06:14 AM, Ilya Yaroshenko wrote:
 I think that Range API is bad and useless overengineering for RNGs.
So it either "takes 1 minute" to change the interface from opCall to ranges (i.e. they're virtually the same thing), or it's the above. There's precious little overengineering that can be done in 1 minute :o). I see you did that in a dozen lines in RandomRangeAdaptor. I understand you believe the range interface is unnecessary or overkill for random number generators. I may even agree for certain cases. However, there are a few arguments you may want to consider: * By virtue of working with D, everybody knows what a range is. Presenting the RNGs that way instantly evokes a host of knowledge, understanding, and idioms. * D users (or at least a fraction) and several algorithms understand the notion of an infinite range and make salient decisions based on that. A range interface automatically taps into that. Andrei
Nov 23 2016
parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wednesday, November 23, 2016 08:54:30 Andrei Alexandrescu via 
Digitalmars-d wrote:
 On 11/23/2016 06:14 AM, Ilya Yaroshenko wrote:
 I think that Range API is bad and useless overengineering for RNGs.
So it either "takes 1 minute" to change the interface from opCall to ranges (i.e. they're virtually the same thing), or it's the above. There's precious little overengineering that can be done in 1 minute :o). I see you did that in a dozen lines in RandomRangeAdaptor. I understand you believe the range interface is unnecessary or overkill for random number generators. I may even agree for certain cases. However, there are a few arguments you may want to consider: * By virtue of working with D, everybody know what a range is. Presenting the RNGs that way instantly evokes a host of knowledge, understanding, and idioms. * D users (or at least a fraction) and several algorithms understand the notion of an infinite range and make salient decisions based on that. A range interface automatically taps into that.
I've frequently found that in cases where I'm not operating on a range of values, I really just want a way to get a random number, and having to call both popFront and front to get it is a bit annoying (it's actually one of the few places that I've ever used the comma operator). That being said, I think that the correct solution to that is to add a function to std.random that takes a range, calls popFront on it, and then returns front so that it can work with any of the random number generators and avoids problems with whether or not popFront was called prior to calling front. I have to agree that creating a different API that uses opCall or whatever instead of the range API is a bad idea, particularly when a simple helper function would make it possible to use a random number generator in a fashion more like rand() for the cases where that's preferable, and for a lot of cases, having the range API is _extremely_ useful. - Jonathan M Davis
Nov 23 2016
parent Somebody <ajieskola gmail.com> writes:
 I have to agree that creating a different API that uses opCall 
 or whatever instead of the range API is a bad idea, 
 particularly when a simple helper function would make it 
 possible to use a random number generator in a fashion more 
 like rand() for the cases where that's preferable, and for a 
 lot of cases, having the range API is _extremely_ useful.

 - Jonathan M Davis
Would, assuming the opCall() interface, generate!(() => rng()) make an input range out of it easily? Or am I missing something?
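It does appear to compose that way: std.range.generate turns a nullary callable into an infinite input range. A sketch, with a made-up opCall generator (an LCG step for illustration, not mir-random's actual API):

```d
import std.range : generate, take;

// Hypothetical opCall-style generator (a simple LCG step, for illustration only).
struct OpCallRng
{
    ulong state;
    ulong opCall()
    {
        state = state * 6364136223846793005UL + 1442695040888963407UL;
        return state;
    }
}

void main()
{
    auto rng = OpCallRng(42);
    // generate wraps the nullary callable as an infinite input range,
    // so it composes with the usual range algorithms:
    auto variates = generate!(() => rng())();
    auto three = variates.take(3);
}
```

Note that the lambda captures `rng` by reference to the enclosing frame, so the generator state advances as the range is consumed.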
Nov 23 2016
prev sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 10:57:04 UTC, Joseph Rushton 
Wakeling wrote:
 struct MyRNG
 {
     void popFront() { /* update internal state */ }

     UIntType front() @property { return /* whatever part of 
  internal state */; }

     UIntType opCall()
     {
         this.popFront();
         return this.front;
     }
 }
xorshift128+ returns a temporary value computed during state update, so it can't efficiently separate front from popFront. Also, this API is prone to reusing some values and discarding others; call popFront after front.
Nov 23 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 09:12 AM, Kagamin wrote:
 xorshift128+ returns a temporary value computed during state update, so
 it can't efficiently separate front from popFront.
That seems to be a minor matter. Of course at best some measurements would be in order.
 Also this API is prone
 to reuse some values and discard others, call popFront after front.
This claim would apply to all ranges. There seems to be evidence it is unfounded. The main argument for using the range interface for RNGs is reuse of abstraction. Minute implementation matters are much less important. The main counter-argument is that the abstraction is not fitting well and another abstraction (such as opCall) is more advantageous. Andrei
Nov 23 2016
next sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei 
Alexandrescu wrote:
 This claim would apply to all ranges. There seems to be 
 evidence it is unfounded.

 The main argument for using the range interface for RNGs is 
 reuse of abstraction. Minute implementation matters are much 
 less important. The main counter-argument is that the 
 abstraction is not fitting well and another abstraction (such 
 as opCall) is more advantageous.
Consider this okayish looking code: consume(rng()); consume(rng.take(2)); //reuses previous value consume(rng()); //discards unused value
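Made concrete with a hypothetical hybrid type (names invented for illustration) that offers both interfaces, the reuse is easy to trigger:

```d
// Hypothetical generator exposing both an opCall and a range interface.
struct HybridRng
{
    ulong state = 1;
    ulong front;               // cached current variate
    enum bool empty = false;

    void popFront()
    {
        state = state * 2862933555777941757UL + 3037000493UL;  // LCG step
        front = state;
    }

    ulong opCall()             // popFront-then-front convention
    {
        popFront();
        return front;
    }
}

void main()
{
    HybridRng rng;
    auto a = rng();     // opCall advances and returns variate #1
    auto b = rng.front; // range-style read reuses variate #1
    assert(a == b);     // same value consumed by two call sites
    rng.popFront();     // a pop here discards variate #2 unobserved
    auto c = rng();     // opCall then skips ahead to variate #3
}
```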
Nov 23 2016
next sibling parent Ryan <clumsycodemonkey gmail.com> writes:
On Wednesday, 23 November 2016 at 15:29:14 UTC, Kagamin wrote:
 On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei

 Consider this okayish looking code:
 consume(rng());
 consume(rng.take(2)); //reuses previous value
 consume(rng()); //discards unused value
Also consider the case of using 1 generator in your program, but calling it from different places. In one place, you use opCall with the popFront -> front order of calls, and in the other you use the range interface directly with the order reversed. You would re-use values there too. By offering both interfaces together it makes it pretty easy to create silly bugs like this, especially in a large project. Might be useful to have a basic RNG interface and then wrap it with a template that provides either, but not both, of the other interfaces at a time.
Nov 23 2016
prev sibling parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wednesday, November 23, 2016 15:29:14 Kagamin via Digitalmars-d wrote:
 On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei

 Alexandrescu wrote:
 This claim would apply to all ranges. There seems to be
 evidence it is unfounded.

 The main argument for using the range interface for RNGs is
 reuse of abstraction. Minute implementation matters are much
 less important. The main counter-argument is that the
 abstraction is not fitting well and another abstraction (such
 as opCall) is more advantageous.
Consider this okayish looking code: consume(rng()); consume(rng.take(2)); //reuses previous value consume(rng()); //discards unused value
In the cases where I don't really need a range of random numbers, I've typically ended up doing stuff like

auto nextNum = rndGen().popFront(), rndGen().front;

though I think that using the comma operator like that is deprecated now. Adding a helper function such as

auto getNext(R)(ref R range)
    if(isInputRange!R)
{
    range.popFront();
    return range.front;
}

would solve that problem. And there are plenty of cases where having a range of random numbers is extremely useful. After having dealt with std.random, it would really suck to have to deal with a random number generator that was not range-based. std.random has problems such as not using reference types for its ranges, but using the range API in the way it does is extremely useful. - Jonathan M Davis
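Fleshed out, such a helper composes with the existing Phobos generators (a sketch; getNext is the proposed name here, not an existing std.random function):

```d
import std.random : Mt19937, unpredictableSeed;
import std.range.primitives : isInputRange;

// Advance the range first, then read: avoids using a possibly stale front.
auto getNext(R)(ref R range)
    if (isInputRange!R)
{
    range.popFront();
    return range.front;
}

void main()
{
    auto gen = Mt19937(unpredictableSeed);
    uint a = getNext(gen);  // one fresh variate
    uint b = getNext(gen);  // another independent draw
}
```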
Nov 23 2016
parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 21:33:53 UTC, Jonathan M Davis 
wrote:
 though I think that using the comma operator like that is 
 deprecated now. Adding a helper function such as

 auto getNext(R)(ref R range)
     if(isInputRange!R)
 {
     range.popFront();
     return range.front;
 }

 would solve that problem.
It's the same behavior and suffers from the same problem of reuse of RNG output.
Nov 24 2016
parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thursday, November 24, 2016 09:05:34 Kagamin via Digitalmars-d wrote:
 On Wednesday, 23 November 2016 at 21:33:53 UTC, Jonathan M Davis

 wrote:
 though I think that using the comma operator like that is
 deprecated now. Adding a helper function such as

 auto getNext(R)(ref R range)
     if(isInputRange!R)
 {
     range.popFront();
     return range.front;
 }

 would solve that problem.
It's the same behavior and suffers from the same problem of reuse of RNG output.
How so? Because someone might call range.front again without bothering to call popFront? If you're paranoid about that, then it can call popFront again before returning (though that would be wasteful). My take on it is that if you just call popFront before using the random number generator range, then you don't have to worry about what any other code that used it did. Regardless, the range API is _way_ more useful in general than something like rand(). And if anyone is trying to avoid ranges with random numbers, I think that they're just making their life harder. Occasionally, it's useful to get just one random number, in which case, having to deal with the range API over rand() is kind of annoying, but in general, it's not a problem, and the wrapper function that I suggested basically gives you rand() from a range of random numbers. Alternatively, you could just do rndGen().take(1).front, and as long as rndGen() gives you a reference type, it works just fine. Unfortunately, std.random did not use reference types for its ranges. _That_ is the big mistake of std.random and the main item that needs to be fixed. There are some other subtleties (e.g. it's useful to be able to save the state of a random number generating range, but you don't necessarily really want it to be a forward range), but those are minor in comparison to the mistake of std.random using value types rather than reference types for ranges. - Jonathan M Davis
Nov 24 2016
next sibling parent reply Kagamin <spam here.lot> writes:
On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis 
wrote:
 How so? Because someone might call range.front again without 
 bothering to call popFront?
That's what everything in std.algorithm does.
Nov 24 2016
parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thursday, November 24, 2016 13:54:50 Kagamin via Digitalmars-d wrote:
 On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis

 wrote:
 How so? Because someone might call range.front again without
 bothering to call popFront?
That's what everything in std.algorithm does.
Then call popFront or drop before passing it along if you're paranoid about it. If it's a serious concern that this be fixed in general, then the obvious fix to that would be to make it so that rndGen() just called popFront before returning it. Heck, if rndGen() were guaranteed to call popFront when it was called rather than simply returning the range, then you could just do rndGen().front whenever you wanted the equivalent of rand(), making it trivial to use rndGen() both for cases where you wanted a single number and for cases where you wanted a range of them - and without worrying about front being stale. - Jonathan M Davis
Nov 24 2016
parent Kagamin <spam here.lot> writes:
On Thursday, 24 November 2016 at 14:22:05 UTC, Jonathan M Davis 
wrote:
 Then call popFront or drop before passing it along if you're 
 paranoid about it.
There's little need for it; as was pointed out earlier, the generate algorithm does the right thing and needs only opCall, which is also a nice example of orthogonality and composability.
Nov 25 2016
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis 
wrote:
 Alternatively, you could just do rndGen().take(1).front, and as 
 long as rndGen() gives you a reference type, it works just 
 fine. Unfortunately, std.random did not use reference types for 
 its ranges. _That_ is the big mistake of std.random and the 
 main item that needs to be fixed. There are some other 
 subtleties (e.g. it's useful to be able to save the state of a 
 random number generating range, but you don't necessarily 
 really want it to be a forward range), but those are minor in 
 comparison to the mistake of std.random using value types 
 rather than reference types for ranges.

 - Jonathan M Davis
The fix is opCall syntax. Reference types are classes, which would not work without linking DRuntime and Phobos (BetterC). Refcounting is too slow for users to implement. Note that engines are only the backend RNGs; there are also nonuniform distributions (WIP). See the example of user code: https://forum.dlang.org/post/nyvtoejvmsaolzaqyche forum.dlang.org . It is very difficult to implement both user random variables and engines using the range API. Note that the code should work without DRuntime and should be simple (the example is user code). Ilya
Nov 24 2016
parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Thursday, 24 November 2016 at 16:09:23 UTC, Ilya Yaroshenko 
wrote:
 On Thursday, 24 November 2016 at 13:45:40 UTC, Jonathan M Davis 
 wrote:
 Alternatively, you could just do rndGen().take(1).front, and 
 as long as rndGen() gives you a reference type, it works just 
 fine. Unfortunately, std.random did not use reference types 
 for its ranges. _That_ is the big mistake of std.random and 
 the main item that needs to be fixed. There are some other 
 subtleties (e.g. it's useful to be able to save the state of a 
 random number generating range, but you don't necessarily 
 really want it to be a forward range), but those are minor in 
 comparison to the mistake of std.random using value types 
 rather than reference types for ranges.

 - Jonathan M Davis
The fix is opCall syntax. Reference types are classes, which would not work without linking DRuntime and Phobos (BetterC).
If druntime was initialised by default using __attribute__((constructor)) for static and linker .init for shared libraries, would that be good enough for you*? I feel like you're limiting your design choices because of a relatively small and simple implementation shortcoming. * remember, we don't currently guarantee ABI compatibility between compiler versions even if you don't use druntime/phobos.
Nov 24 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Thursday, 24 November 2016 at 17:04:43 UTC, John Colvin wrote:
 If druntime was initialised by default using 
 __attribute__((constructor)) for static and linker .init for 
 shared libraries, would that be good enough for you*? I feel 
 like you're limiting your design choices because of a 
 relatively small and simple implementation shortcoming.

 * remember, we don't currently guarantee ABI compatibility 
 between compiler versions even if you don't use druntime/phobos.
No, it should work without DRuntime. Nicholas wrote that it will work on GPU using dcompute :-) BetterC is the goal for the future Mir-Phobos, hehe
Nov 24 2016
prev sibling next sibling parent Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei 
Alexandrescu wrote:
 That seems to be a minor matter.
Yes, it's just that Joseph worries about micro-optimizations.
 The main counter-argument is that the abstraction is not 
 fitting well and another abstraction (such as opCall) is more 
 advantageous.
Ranges are designed for reuse of front, which is not desirable for RNGs.
Nov 23 2016
prev sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei 
Alexandrescu wrote:
 That seems to be a minor matter. Of course at best some 
 measurements would be in order.
 Yes, it's just that Joseph worries about micro-optimizations.
Though we saw that the compiler can't optimize some simple operations like division by 2.
Nov 23 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/16 11:10 AM, Kagamin wrote:
 On Wednesday, 23 November 2016 at 14:30:53 UTC, Andrei Alexandrescu wrote:
 That seems to be a minor matter. Of course at best some measurements
 would be in order.
 Yes, it's just that Joseph worries about micro-optimizations.
Though we saw that the compiler can't optimize some simple operations like division by 2.
[Details needed] I just took a look at https://godbolt.org/g/EMy6X4; it's happening. Andrei
Nov 23 2016
parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 21:18:27 UTC, Andrei 
Alexandrescu wrote:
 [Details needed]

 I just took a look https://godbolt.org/g/EMy6X4, it's happening.
That's three instructions instead of a single shr eax, 1.
Nov 24 2016
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 24.11.2016 09:55, Kagamin wrote:
 On Wednesday, 23 November 2016 at 21:18:27 UTC, Andrei Alexandrescu wrote:
 [Details needed]

 I just took a look https://godbolt.org/g/EMy6X4, it's happening.
That's three instructions instead of one shr eax,1
That would be wrong code. Cast to int after dividing.
Nov 24 2016
prev sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Thursday, 24 November 2016 at 08:55:18 UTC, Kagamin wrote:
 On Wednesday, 23 November 2016 at 21:18:27 UTC, Andrei 
 Alexandrescu wrote:
 [Details needed]

 I just took a look https://godbolt.org/g/EMy6X4, it's 
 happening.
That's three instructions instead of one shr eax,1
That's because of the cast(int); dividing by two is optimised just fine. You can even do this:

int foo(int a) { return a / 2; }
uint foo(uint a) { return a / 2; }
int bar(int a) { if (a > 0) return a / 2; else assert(0); }

and get

int example.foo(int):    // slow, to deal with sign
        mov eax, edi
        shr eax, 31
        add eax, edi
        sar eax
        ret
uint example.foo(uint):  // fast because no sign
        mov eax, edi
        shr eax
        ret
int example.bar(int):    // single shift because sign always 0
        test edi, edi
        jle .L8
        mov eax, edi
        sar eax
        ret
.L8:
        ud2

This should help explain why the extra/different instructions are necessary for signed:

int foo0(int a)
{
    asm { naked; mov EAX, EDI; shr EAX, 31; add EAX, EDI; sar EAX, 1; ret; }
}
int foo1(int a)
{
    asm { naked; mov EAX, EDI; sar EAX, 1; ret; }
}
int foo2(int a)
{
    asm { naked; mov EAX, EDI; shr EAX, 1; ret; }
}

void main()
{
    import std.stdio;
    foreach(i; [int.min, -1, 0, 1, int.max])
        writeln(foo0(i), ' ', foo1(i), ' ', foo2(i));
}

output:

-1073741824 -1073741824 1073741824
0 -1 2147483647
0 0 0
0 0 0
1073741823 1073741823 1073741823
Nov 24 2016
parent reply Kagamin <spam here.lot> writes:
On Thursday, 24 November 2016 at 09:46:16 UTC, John Colvin wrote:
 That's because of the cast(int), dividing by two is optimised 
 just fine.
What about Andrei's example?
Nov 24 2016
parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Thursday, 24 November 2016 at 10:14:27 UTC, Kagamin wrote:
 On Thursday, 24 November 2016 at 09:46:16 UTC, John Colvin 
 wrote:
 That's because of the cast(int), dividing by two is optimised 
 just fine.
What about Andrei's example?
I was talking about Andrei's example. He has a cast(int) in it before the division.
Nov 24 2016
parent reply Kagamin <spam here.lot> writes:
On Thursday, 24 November 2016 at 10:16:11 UTC, John Colvin wrote:
 I was talking about andrei's example. He has a cast(int) in it 
 before the division.
And you think that compilation of cast(int)a.length/2 to 3 instructions is optimized just fine? How is it better than one shift?
Nov 24 2016
parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Thursday, 24 November 2016 at 10:41:44 UTC, Kagamin wrote:
 On Thursday, 24 November 2016 at 10:16:11 UTC, John Colvin 
 wrote:
 I was talking about andrei's example. He has a cast(int) in it 
 before the division.
And you think that compilation of cast(int)a.length/2 to 3 instructions is optimized just fine? How is it better than one shift?
Because it's correct. If a.length is larger than int.max, then cast(int)a.length will be negative half the time, and therefore a simple right shift would not be equivalent to division by 2.
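A quick way to see why (a sketch with illustrative values): for negative operands an arithmetic right shift rounds toward negative infinity, while D's integer division truncates toward zero, so the compiler cannot substitute one for the other once the sign bit may be set.

```d
// Sketch: signed division by 2 vs. a bare right shift.
void main()
{
    import std.stdio;

    int a = -3;
    writeln(a / 2);   // -1: division truncates toward zero
    writeln(a >> 1);  // -2: arithmetic shift rounds toward -infinity

    // A length above int.max becomes negative after the narrowing cast,
    // so the sign-aware (3-instruction) sequence is required:
    size_t len = 0x8000_0001;      // > int.max on a 64-bit target
    writeln(cast(int) len / 2);    // negative result, as described above
}
```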
Nov 24 2016
parent reply Kagamin <spam here.lot> writes:
On Thursday, 24 November 2016 at 12:02:22 UTC, John Colvin wrote:
 Because it's correct. If a.length is larger than int.max then 
 cast(int)a.length will half the time be negative and therefore 
 a simple rightshift would not be equivalent to division by 2.
It can't possibly be correct when a.length is larger than int.max, because it doesn't recover information lost in a narrowing conversion.
Nov 24 2016
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 24.11.2016 14:50, Kagamin wrote:
 On Thursday, 24 November 2016 at 12:02:22 UTC, John Colvin wrote:
 Because it's correct. If a.length is larger than int.max then
 cast(int)a.length will half the time be negative and therefore a
 simple rightshift would not be equivalent to division by 2.
It can't possibly be correct when a.length is larger than int.max, because it doesn't recover information lost in a narrowing conversion.
The code does not specify that this information should be recovered. Why is this the compiler's problem?
Nov 24 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 12:58 AM, Ilya Yaroshenko wrote:
 On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei Alexandrescu wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar to C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
It is a safe low-level architecture without performance or API issues.
I don't understand this. Can you please be more specific? I don't see a major issue wrt offering opCall() vs. front/popFront. (empty is always true.)
 It
 prevents users from doing stupid things implicitly (like copying RNGs).
An input range can be made noncopyable.
 A
 high-level range interface can be added in the future (it will hold a
 _pointer_ to an RNG).
Is there a reason to not have that now?

Again, I'm literally talking about offering front/popFront in lieu of opCall(). The only implementation difference is you keep the currently-generated number as a member instead of returning it from opCall. I doubt one could measure a performance difference.

If you implement it as a noncopyable input range, you get a large support network working for you. With opCall, we have virtually no such support - you need to do everything once again.
 In addition, when you need to write algorithms
 or distributions opCall is much more convenient than range API.
Could you please be more specific? On the face of it I'd agree one call is less than two, but I don't see a major drawback here.
 In
 addition, users would not use the Engine API in 99% of cases: they will just
 want to call `rand` or `uniform`, or other distribution.

 I am sure that almost any library should have a low-level API that fits
 its implementation first. Additional API levels may also be added.
Is there a large difference between opCall and front/popFront? Actually I can think of one - the matter of getting things started.

Ranges have this awkwardness of starting the iteration: either you fill the current front eagerly in the constructor, or you have some sort of means to detect initialization has not yet been done and do it lazily upon the first use of front. The best strategy would depend on the actual generator, and admittedly would be a bit more of a headache compared to opCall. Was this the motivation?
 Current Phobos evolution is generic degradation: more generic and
 "universal" code hides more uncovered bugs. The std.range module is a
 good example of degradation: it has a lot of API and implementation bugs.
Do you have examples of issues outside random number generators?
 ### Example of API+implementation bug:

 #### Bug: RNGs have min and max params (hello C++). But they are not
 used when a uniform integer number is generated: `uniform!ulong` /
 `uniform!ulong(0, 100)`.

 #### Solution: In Mir Random, all RNGs must generate all 8/16/32/64 bits
 uniformly. It is the RNG's problem how to do it.
Min and max are not parameters, they are bounds provided by each generator. I agree their purpose is unclear.

We could require all generators to provide min = 0 and max = UIntType.max without breaking APIs. In that case we only need to renounce LinearCongruentialEngine with c = 0 (see https://github.com/dlang/phobos/blob/master/std/random.d#L258) - in fact that's the main reason for introducing min and max in the first place. All other code stays unchanged, and we can easily deprecate min and max for RNGs.

(I do see min and max used by uniform at https://github.com/dlang/phobos/blob/master/std/random.d#L1281 so I'm not sure I get what you mean, but anyhow the idea that we require RNGs to fill an uint/ulong with all random bits simplifies a lot of matters.)
 I will not file this bug, nor another dozen std.random bugs, because
 the module should be rewritten anyway and I am working on it. std.random
 is a collection of bugs from C/C++ libraries extended with D generic
 idioms. For example, there is no reason for a 64-bit Xorshift. It is 32-bit
 by design. Furthermore, a 64-bit expansion of a 32-bit algorithm must be
 proved theoretically before we allow it for end users. 64-bit analogs
 exist, but they have different implementations.
One matter that I see is there's precious little difference between mir.random and std.random. Much of the code seems copied, which is an inefficient way to go about things. We shouldn't fork everything if we don't like a bit of it, though admittedly the path toward making changes in std is more difficult.

Is your intent to work on mir.random on the side and then submit it as a wholesale replacement of std.random under a different name? In that case you'd have my support, but you'd need to convince me the replacement is necessary. You'd probably have a good case for eliminating xorshift/64, but then we may simply deprecate that outright. You'd possibly have a more difficult time with opCall.
 Phobos degrades because
 we add a lot of generic specializations and small utilities without
 understanding use cases.
This is really difficult to parse. Are you using "degrades" the way it's meant? What is a "generic specialization"? What are examples of "small utilities without understanding use cases"?
 Phobos really follows a stupid idealistic idea: more generic is better,
 more API is better, more universal algorithms are better. The problem is
 that Phobos/DRuntime is a soup where everything (because of its
 "universality") interacts with everything.
I do think more generic is better, of course within reason. It would be a tenuous statement that generic designs in Phobos such as ranges, algorithms, and allocators are stupid and idealistic. So I'd be quite interested in hearing more about this. What's that bouillabaisse about?

Thanks,

Andrei
Nov 23 2016
next sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 13:41:25 UTC, Andrei 
Alexandrescu wrote:
 An input range can be made noncopyable.

 If you implement it as a noncopyable input range, you get a 
 large support network working for you.
struct A
{
    bool empty;
    int front;
    void popFront(){}
    @disable this(this);
}

import std.algorithm;
void f()
{
    A r;
    auto m = r.map!(a=>1);
}

Does this compile for you?
Nov 23 2016
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 15:25:29 UTC, Kagamin wrote:
 struct A
 {
 	bool empty;
 	int front;
 	void popFront(){}
 	@disable this(this);
 }

 import std.algorithm;
 void f()
 {
 	A r;
 	auto m=r.map!(a=>1);
 }

 Does this compile for you?
auto m = (&r).map!(a=>1); ...? (Untested, but I think it should work.)
Nov 23 2016
next sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton 
Wakeling wrote:
 auto m = (&r).map!(a=>1);

 ...?  (Untested, but I think it should work.)
Or RefRange https://issues.dlang.org/show_bug.cgi?id=10888 :)
Nov 23 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 15:59:39 UTC, Kagamin wrote:
 On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton 
 Wakeling wrote:
 auto m = (&r).map!(a=>1);

 ...?  (Untested, but I think it should work.)
Or RefRange https://issues.dlang.org/show_bug.cgi?id=10888 :)
What is the principal difference from randomRangeAdaptor? Anyway, a noncopyable input range cannot be used with Phobos. --Ilya
Nov 23 2016
prev sibling parent reply Ryan <clumsycodemonkey gmail.com> writes:
On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton 
Wakeling wrote:
 auto m = (&r).map!(a=>1);

 ...?  (Untested, but I think it should work.)
Tested, works with 2.071.1
Nov 23 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 16:01:14 UTC, Ryan wrote:
 On Wednesday, 23 November 2016 at 15:44:10 UTC, Joseph Rushton 
 Wakeling wrote:
 auto m = (&r).map!(a=>1);

 ...?  (Untested, but I think it should work.)
Tested, works with 2.071.1
r.randomRangeAdaptor.map!(a=>1);
Nov 23 2016
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 13:41:25 UTC, Andrei 
Alexandrescu wrote:
 On 11/23/2016 12:58 AM, Ilya Yaroshenko wrote:
 On Tuesday, 22 November 2016 at 23:55:01 UTC, Andrei 
 Alexandrescu wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar 
 to C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
It is a safe low-level architecture without performance or API issues.
I don't understand this. Can you please be more specific? I don't see a major issue wrt offering opCall() vs. front/popFront. (empty is always true.)
A range must be copyable to be used with std.algorithm and std.range (it is passed by value).
 It
 prevents users from doing stupid things implicitly (like copying 
 RNGs).
An input range can be made noncopyable.
Ditto. A noncopyable input range is useless.
 A
 high-level range interface can be added in the future (it 
 will hold a
 _pointer_ to an RNG).
Is there a reason to not have that now?
Done. See `RandomRangeAdaptor`: https://github.com/libmir/mir-random/blob/master/source/random/algorithm.d
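For readers following along, the shape of such an adaptor might look like this (a minimal sketch with illustrative names -- `Engine`, `RandomRange`, `randomRange` -- not the actual mir-random code): the range is a thin, freely copyable wrapper over a pointer, so the engine itself is never copied.

```d
// Sketch: an infinite-range adaptor over a noncopyable opCall engine.
struct Engine
{
    @disable this(this);        // engines must not be copied
    private uint state;
    this(uint seed) { state = seed; }
    uint opCall()               // xorshift32 step, for illustration
    {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }
}

struct RandomRange(G)
{
    private G* gen;             // pointer: copying the range is cheap
    private uint _front;
    this(G* g) { gen = g; popFront(); }   // front filled eagerly
    enum bool empty = false;    // infinite range
    uint front() const { return _front; }
    void popFront() { _front = (*gen)(); }
}

auto randomRange(G)(ref G gen) { return RandomRange!G(&gen); }

void main()
{
    import std.stdio;
    import std.range : take;

    auto gen = Engine(1);
    foreach (x; gen.randomRange.take(3))  // works with Phobos ranges
        writeln(x);
}
```

The usual caveat from later in this thread applies: the pointer must not outlive the engine, so such an adaptor is only safe in function scope.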
 In addition, when you need to write algorithms
 or distributions opCall is much more convenient than range API.
Could you please be more specific? On the face of it I'd agree one call is less than two, but I don't see a major drawback here.
The main reason is implementation simplicity. Engines should be simple to create, simple to maintain, and simple to use. opCall is simpler than the range interface because:

1. One declaration instead of 4 (3 range functions plus the latest generated value (optional)).
2. An input range is useless if the range is not copyable.
3. `randomRangeAdaptor` is implemented for Engines and will be done for Distributions too. So the range API is supported better than in std.range (because Engines are copied).
 In
 addition, users would not use the Engine API in 99% of cases: they 
 will just
 want to call `rand` or `uniform`, or other distribution.

 I am sure that almost any library should have a low-level API that
 fits its implementation first. Additional API levels may also be added.
Is there a large difference between opCall and front/popFront? Actually I can think of one - the matter of getting things started. Ranges have this awkwardness of starting the iteration: either you fill the current front eagerly in the constructor, or you have some sort of means to detect initialization has not yet been done and do it lazily upon the first use of front. The best strategy would depend on the actual generator, and admittedly would be a bit more of a headache compared to opCall. Was this the motivation?
Simplicity is main motivation.
 ### Example of API+implementation bug:

 #### Bug: RNGs have min and max params (hello C++). But they are not
 used when a uniform integer number is generated: `uniform!ulong` /
 `uniform!ulong(0, 100)`.

 #### Solution: In Mir Random, all RNGs must generate all 8/16/32/64
 bits uniformly. It is the RNG's problem how to do it.
Min and max are not parameters, they are bounds provided by each generator. I agree their purpose is unclear. We could require all generators to provide min = 0 and max = UIntType.max without breaking APIs. In that case we only need to renounce LinearCongruentialEngine with c = 0 (see https://github.com/dlang/phobos/blob/master/std/random.d#L258) - in fact that's the main reason for introducing min and max in the first place. All other code stays unchanged, and we can easily deprecate min and max for RNGs. (I do see min and max used by uniform at https://github.com/dlang/phobos/blob/master/std/random.d#L1281 so I'm not sure I get what you mean, but anyhow the idea that we require RNGs to fill an uint/ulong with all random bits simplifies a lot of matters.)
The current Mir solution looks like a pair: isURBG and isSURBG. The `S` prefix means `T.max == ReturnType!T.max`, where T is an Engine. So, functions use isSURBG now. The min property is not required: we can just subtract the actual min from the returned value. An adaptor can be added to convert a URBG to a saturated URBG.
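The min-subtraction idea can be sketched like this (toy types and hypothetical names, not mir-random's actual traits):

```d
// Toy engine whose raw output range is [1, 6] (illustrative only).
struct DieGen
{
    enum uint min = 1, max = 6;
    uint state = 1;
    uint opCall() { state = state % 6 + 1; return state; }
}

// Normalize to [0, max - min] by subtracting min, as suggested above.
uint normalized(G)(ref G gen) { return gen() - G.min; }

void main()
{
    import std.stdio;
    DieGen g;
    foreach (_; 0 .. 6)
        writeln(normalized(g));   // each value in [0, 5]
}
```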
 I will not file this bug, nor another dozen std.random bugs, because
 the module should be rewritten anyway and I am working on it.
 std.random is a collection of bugs from C/C++ libraries extended
 with D generic idioms. For example, there is no reason for a 64-bit
 Xorshift. It is 32-bit by design. Furthermore, a 64-bit expansion of
 a 32-bit algorithm must be proved theoretically before we allow it
 for end users. 64-bit analogs exist, but they have different
 implementations.
One matter that I see is there's precious little difference between mir.random and std.random. Much of the code seems copied, which is an inefficient way to go about things. We shouldn't fork everything if we don't like a bit of it, though admittedly the path toward making changes in std is more difficult. Is your intent to work on mir.random on the side and then submit it as a wholesale replacement of std.random under a different name? In that case you'd have my support, but you'd need to convince me the replacement is necessary. You'd probably have a good case for eliminating xorshift/64, but then we may simply deprecate that outright. You'd possibly have a more difficult time with opCall.
I started with Engines as a basis. The library will be very different compared with Phobos and _any_ other RNG library in terms of floating point generation quality. All FP generation I have seen is not saturated (the number of possible unique FP values is very small compared with the ideal situation, because of IEEE arithmetic). I have not found the idea described by others, so it may become an article in the future.

A set of new modern Engines will be added (Nicholas Wilson, and maybe Joseph). Also Seb and I will add a set of distributions.
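For context on the saturation point, here is the widely used 53-bit construction (a standard technique, not necessarily what mir-random does): it reaches at most 2^53 distinct doubles in [0, 1), a small subset of the roughly 2^62 representable doubles in that interval.

```d
void main()
{
    import std.stdio;

    ulong bits = 0x1234_5678_9ABC_DEF0;   // pretend these are random bits
    // Keep the top 53 bits and scale by 2^-53: uniform in [0, 1),
    // but only 2^53 distinct values are ever produced.
    double u = (bits >> 11) * 0x1p-53;
    writeln(u);
    assert(u >= 0.0 && u < 1.0);
}
```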
 Phobos degrades because
 we add a lot of generic specializations and small utilities 
 without
 understanding use cases.
This is really difficult to parse. Are you using "degrades" the way it's meant? What is a "generic specialization"? What are examples of "small utilities without understanding use cases"?
Sorry, my English is ... .

It is not clear to me what subset of generic code is nothrow (reduce, for example). The same is true for the BetterC concept: it is hard to predict when an algorithm requires DRuntime to be linked / initialised. It is not clear what modules are imported by a module.

"small utilities without understanding use cases" - numeric code in std.algorithm: minElement, sum. They should not be in std.algorithm. A user can use `reduce`. Or, if speed is required, we need to move to a numeric solution suitable for vectorization. And std.algorithm seems to be the wrong module for vectorised numeric code.
 Phobos really follows a stupid idealistic idea: more generic is
 better, more API is better, more universal algorithms are better.
 The problem is that Phobos/DRuntime is a soup where everything
 (because of its "universality") interacts with everything.
I do think more generic is better, of course within reason. It would be a tenuous statement that generic designs in Phobos such as ranges, algorithms, and allocators are stupid and idealistic. So I'd be quite interested in hearing more about this. What's that bouillabaisse about?
For example, std.allocator. It is awesome! But I can not use it in GLAS, because I don't understand if it will work without linking with DRuntime. So, I copy-pasted and modified your code for AlignedMallocator: https://github.com/libmir/mir-glas/blob/master/source/glas/internal/memory.d

Ranges and algorithms seem good to me, except that it is not clear when code is nothrow / BetterC. std.math is a problem: we are adding new API without solving existing API problems and C compatibility. std.complex prevents math optimisations (this can not be solved without compiler hacks), so GLAS migrated to native (old) complex numbers.

I like generics when they make D usage simpler. If one adds random number generation to a Phobos sorting algorithm, it will make it useless for BetterC (because it will require linking an RNG). Such issues are not reviewed during the Phobos review process. Linking Phobos / DRuntime is not an option because it has no backward binary compatibility, so packages can not be distributed as precompiled libraries.

std.traits, std.meta, std.range.primitives, std.ndslice, and part of std.math are the only modules I am using in Mir libraries. It is very important to me to have BetterC guarantees between different Phobos versions. Super generic code where different modules import each other is hard to review.

Best regards,
Ilya
Nov 23 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 10:52 AM, Ilya Yaroshenko wrote:
 I started with Engines as basis. The library will be very different
 comparing with Phobos and _any_ other RNG libraries in terms of floating
 point generation quality. All FP generation I have seen are not
 saturated (amount of possible unique FP values are very small comparing
 with ideal situation because of IEEE arithmetic). I have not found the
 idea described by others, so it may be an article in the future.
Is this a design matter, or an implementation matter? Could you implement the superior FP generation on the existing std.random API? To not be able to vs. to not want to is a different matter; I have an appreciation for each, but we need to clarify. -- Andrei
Nov 23 2016
next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 16:54:44 UTC, Andrei 
Alexandrescu wrote:
 On 11/23/2016 10:52 AM, Ilya Yaroshenko wrote:
 I started with Engines as basis. The library will be very 
 different
 comparing with Phobos and _any_ other RNG libraries in terms 
 of floating
 point generation quality. All FP generation I have seen are not
 saturated (amount of possible unique FP values are very small 
 comparing
 with ideal situation because of IEEE arithmetic). I have not 
 found the
 idea described by others, so it may be an article in the 
 future.
Is this a design matter, or an implementation matter? Could you implement the superior FP generation on the existing std.random API? To not be able to vs. to not want to is a different matter; I have an appreciation for each, but we need to clarify. -- Andrei
Floating point generation is an implementation matter. Distributions and algorithms are a design matter.

Assume you are building a modification Mod_X ~ 1 / X + X for a distribution. This is how it will look in Mir Random:

-------------------------
struct Mod(D)
    if (isRandomVariable!D && isFloatingPoint!(ReturnType!D))
{
    D irv;
    alias irv this;

    ReturnType!D opCall(G)(ref G gen)
        if (isSURBG!G)
    {
        ReturnType!D x1 = void;
        do x1 = irv(gen); while (x1 == 0);
        ReturnType!D x2 = irv(gen);
        // X1 and X2 are independent!
        return 1 / x1 + x2;
    }
}

unittest
{
    auto gen = Xorshift(1);
    auto rv = Mod!GammaRandomVariable(...);
    auto x = rv(gen); // the generator is never copied by value
}
-------------------------

How can it be done with a Range interface instead of opCall? Please note that users would use Distributions, not Engines. So, the Range API requirements are:

1. An Engine must not have implicit copy semantics: it is not correct for RNGs and has performance issues. Engines also should not be copied explicitly in this example.
2. Do not keep pointers to the Engine in a RandomVariable, because RandomVariables can easily be passed outside of a function: they are just small tuples of params.
3. Do not use classes, because of BetterC and performance issues.
4. Engines must have a Range interface.
5. RandomVariables (both Mod and the underlying one) must have a Range interface.

I don't think a Range API for random variables can be done easily and without performance or security issues.

Thanks,
Ilya
Nov 23 2016
next sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 18:13:52 UTC, Ilya Yaroshenko 
wrote:
 Assume your are building a modification Mod_X ~ 1 / X + X for a 
 distribution. This is how it will look in Mir Random:
EDIT: Mod_X ~ Y + X, Y ~ X. (X & Y are independent)
Nov 23 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/2016 01:13 PM, Ilya Yaroshenko wrote:
 1. Engine must not have implicit copy semantic: it is not correct for
 RNGs and has performance issues. They also should not be copied
 explicitly in this example.
OK, so (lack of) copyability is a good argument. I'm ready to live with random generators that are noncopyable values and use opCall; then use a range adaptor on top of a pointer to that. It seems providing the range primitives for a noncopyable object is not useful and is liable to create more confusion and frustration than a distinct API.

A tip for the range adaptor: have it allocate and own the generator internally. That way it's easy to make it reference counted economically.

Andrei
Nov 23 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei 
Alexandrescu wrote:
 A tip for the range adaptor: have it allocate and own the 
 generator internally. That way it's easy to make it reference 
 counted economically.
The current adaptor should be used in function scope. (It would be great to have a DIP for that kind of semantic check.) An RC adaptor can be added too. First we need to find a real-world use case where we need to store a random range outside of a function. -- Ilya
Nov 23 2016
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/16 2:40 PM, Ilya Yaroshenko wrote:
 On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei Alexandrescu wrote:
 A tip for the range adaptor: have it allocate and own the generator
 internally. That way it's easy to make it reference counted economically.
Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya
Yah, problem here is we can't make that safe. -- Andrei
Nov 23 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 19:51:50 UTC, Andrei 
Alexandrescu wrote:
 On 11/23/16 2:40 PM, Ilya Yaroshenko wrote:
 On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei 
 Alexandrescu wrote:
 A tip for the range adaptor: have it allocate and own the 
 generator
 internally. That way it's easy to make it reference counted 
 economically.
Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya
Yah, problem here is we can't make that safe. -- Andrei
Can we improve D to make it safe?
Nov 23 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/23/16 3:05 PM, Ilya Yaroshenko wrote:
 On Wednesday, 23 November 2016 at 19:51:50 UTC, Andrei Alexandrescu wrote:
 On 11/23/16 2:40 PM, Ilya Yaroshenko wrote:
 On Wednesday, 23 November 2016 at 19:16:29 UTC, Andrei Alexandrescu
 wrote:
 A tip for the range adaptor: have it allocate and own the generator
 internally. That way it's easy to make it reference counted
 economically.
Current adaptor should be used in a function scope. (Would be great to have a DIP for that kind of semantic check). An RC adaptor can be added too. First we need to find a real world use case where we need to store a random range outside of a function. -- Ilya
Yah, problem here is we can't make that safe. -- Andrei
Can we improve D to make it safe?
It's difficult in the general case for objective reasons (modular analysis). For this, we may be able to take an argument by ref, save its address inside, add some scope attributes. It's a project. -- Andrei
Nov 23 2016
prev sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 19:40:47 UTC, Ilya Yaroshenko 
wrote:
 Current adaptor should be used in a function scope. (Would be 
 great to have a DIP for that kind of semantic check). An RC 
 adaptor can be added too. First we need to find a real world 
 use case where we need to store a random range outside of a 
 function. -- Ilya
You want to instantiate and seed Mt every time you call a function that may need random numbers?
Nov 24 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Thursday, 24 November 2016 at 08:59:03 UTC, Kagamin wrote:
 On Wednesday, 23 November 2016 at 19:40:47 UTC, Ilya Yaroshenko 
 wrote:
 Current adaptor should be used in a function scope. (Would be 
 great to have a DIP for that kind of semantic check). An RC 
 adaptor can be added too. First we need to find a real world 
 use case where we need to store a random range outside of a 
 function. -- Ilya
You want to instantiate and seed Mt every time you call a function that may need random numbers?
A function can use a global RNG defined by a user, or accept an RNG by reference. The range adaptor stores a pointer, not a value.
Nov 24 2016
prev sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 16:54:44 UTC, Andrei 
Alexandrescu wrote:
 On 11/23/2016 10:52 AM, Ilya Yaroshenko wrote:
 I started with Engines as basis. The library will be very 
 different
 comparing with Phobos and _any_ other RNG libraries in terms 
 of floating
 point generation quality. All FP generation I have seen are not
 saturated (amount of possible unique FP values are very small 
 comparing
 with ideal situation because of IEEE arithmetic). I have not 
 found the
 idea described by others, so it may be an article in the 
 future.
Is this a design matter, or an implementation matter? Could you implement the superior FP generation on the existing std.random API?
Real uniform `rand` (-2^^exp, +2^^exp) and real UniformVariable [a, b) were added. `rand` does not use IEEE arithmetic to generate a real random number. --Ilya
Nov 24 2016
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 23.11.2016 00:55, Andrei Alexandrescu wrote:
 On 11/22/16 1:31 AM, Ilya Yaroshenko wrote:
  - `opCall` API instead of range interface is used (similar to C++)
This seems like a gratuitous departure from common D practice. Random number generators are most naturally modeled in D as infinite ranges. -- Andrei
I would posit that it is not actually natural to model random numbers as ranges. Not every random object is a sequence. Different samples are (ideally) independent, so there is little conceptual motivation to group them together -- it just increases the logistic overhead needed to get at the single samples. I.e., the most natural way to use a PRNG range is as follows:

PRNG range;
auto sample()
{
    auto r = range.front;
    range.popFront();
    return r;
}

PRNGs are an example where global state is actually desirable, because in contrast to most other programming tasks, usually less predictability is good.
Nov 24 2016
prev sibling next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko 
wrote:
 ##### Integer uniform generators
 [WIP]

 ##### Real uniform generators
 [WIP]

 ##### Nonuniform generators
 [WIP]
As we discussed in relation to Seb's project, I think this is a problematic conceptualization of the best way to structure functionality related to randomness. An arguably better way (as outlined in the C++11 standard) is to think in terms of:

  * random _generators_, i.e. sources of uniformly distributed random bits:

    - random _engines_ (seedable, pseudo-random algorithms)

    - random _devices_ (non-deterministic sources of uniformly distributed bits)

  * random _distributions_, which transform uniformly-distributed random bits into variates with some other type and distribution

    - note _this includes uniformly-distributed integers_!

    - also uniformly-distributed floats, enums, etc.

    - and also non-uniform distributions

  * random _algorithms_, i.e. algorithms in the sense of std.random, but where their popFront() includes random decision-making

    - randomCover, randomSample, etc.

The point of the above breakdown is that it gives a nice and clear separation of concerns that allows for easily replaceable components. Separating out stuff just by the ultimate result you want (integers vs. floats, uniform vs. non-uniform, etc.) isn't helpful in that way.
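The separation above can be sketched roughly as follows (all names illustrative, not from Phobos, C++11, or mir-random):

```d
// Sketch of the generator / distribution split (illustrative names only).
struct Engine                   // source of uniformly distributed bits
{
    private ulong s;
    this(ulong seed) { s = seed ? seed : 1; }
    ulong opCall()              // xorshift64 step
    {
        s ^= s << 13; s ^= s >> 7; s ^= s << 17;
        return s;
    }
}

struct UniformIntDistribution   // turns raw bits into bounded integers
{
    ulong a, b;                 // inclusive bounds
    ulong opCall(G)(ref G gen)
    {
        return a + gen() % (b - a + 1);  // modulo bias ignored in this sketch
    }
}

void main()
{
    import std.stdio;
    auto gen = Engine(42);
    auto die = UniformIntDistribution(1, 6);
    foreach (_; 0 .. 5)
        writeln(die(gen));      // each value in [1, 6]
}
```

The same generator can then feed any distribution, and either component can be swapped independently, which is the separation-of-concerns point above.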
Nov 22 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/22/16 7:36 PM, Joseph Rushton Wakeling wrote:
   * random _generators_, i.e. sources of uniformly distributed random bits:

     - random _engines_ (seedable, pseudo-random algorithms)

     - random _devices_ (non-deterministic sources of uniformly
 distributed bits)

   * random _distributions_, which transform uniformly-distributed random
 bits
     into variates with some other type and distribution

     - note _this includes uniformly-distributed integers_!

     - also uniformly-distributed floats, enums, etc.

     - and also non-uniform distributions

   * random _algorithms_, i.e. algorithms in the sense of std.random, but
 where
     their popFront() includes random decision-making

     - randomCover, randomSample, etc.
Yah, I think that would be nice. -- Andrei
Nov 22 2016
prev sibling next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko 
wrote:
  - 64-bit Mt19937 is default for 64-bit targets
This means that seemingly identical code will produce different 
results depending on whether it's compiled for 64-bit or 32-bit. 
Is that really worth it, when anyone who cares about the 
difference between 32-bit vs. 64-bit random words is quite 
capable of specifying the RNG they want to use and not just 
relying on the default?

Having a different default RNG would make sense for targets 
where there are serious performance issues at stake (e.g. minimal 
memory available for RNG state), but just for the sake of 32- vs. 
64-bit Mersenne Twister it seems an unnecessary complication.

These days it's debatable whether Mersenne Twister of _any_ word 
size is the optimal choice for a default RNG, so if the default 
is going to be changed, it might as well be to something 
significantly better (rather than worrying about numbers of 
bits).
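The reproducibility concern can be made concrete with the C++ 
standard library's two Mersenne Twisters (standing in for Mir's 
Mt19937_32/Mt19937_64, whose streams likewise differ):

```cpp
#include <cstdint>
#include <random>

// "Identical" user code, differing only in which default engine
// it picks up. The two default-seeded Mersenne Twisters produce
// unrelated streams, so a Random alias that switches per target
// silently changes program output.
std::uint64_t first_draw_32() {
    std::mt19937 mt;     // 32-bit engine, default seed 5489
    return mt();
}

std::uint64_t first_draw_64() {
    std::mt19937_64 mt;  // 64-bit engine, same default seed,
                         // different parameters, different stream
    return mt();
}
```

A program seeding "the default" engine identically on both 
targets would still get entirely different variates.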
Nov 22 2016
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/22/16 7:44 PM, Joseph Rushton Wakeling wrote:
 These days it's debatable whether Mersenne Twister of _any_ word size is
 the optimal choice for a default RNG
Interesting. Could you please add a couple of links about that? -- Andrei
Nov 22 2016
next sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei 
Alexandrescu wrote:
 On 11/22/16 7:44 PM, Joseph Rushton Wakeling wrote:
 These days it's debatable whether Mersenne Twister of _any_ 
 word size is
 the optimal choice for a default RNG
Interesting. Could you please add a couple of links about that? -- Andrei
http://www.pcg-random.org/other-rngs.html ;-)
Nov 22 2016
prev sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei 
Alexandrescu wrote:
 Interesting. Could you please add a couple of links about that? 
 -- Andrei
http://xoroshiro.di.unimi.it/
Nov 23 2016
parent reply Andrea Fontana <nospam example.com> writes:
On Wednesday, 23 November 2016 at 17:31:58 UTC, Kagamin wrote:
 On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei 
 Alexandrescu wrote:
 Interesting. Could you please add a couple of links about 
 that? -- Andrei
http://xoroshiro.di.unimi.it/
Very short public domain implementation: http://xoroshiro.di.unimi.it/xoroshiro128plus.c
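For reference, a C++ transcription of the linked public-domain C 
code. The rotation/shift constants (55, 14, 36) are those of the 
original 2016 xoroshiro128+; later revisions of the generator use 
different constants:

```cpp
#include <cstdint>

// xoroshiro128+ sketch. State must be seeded to anything other
// than all zeros.
struct Xoroshiro128Plus {
    std::uint64_t s[2];

    static std::uint64_t rotl(std::uint64_t x, int k) {
        return (x << k) | (x >> (64 - k));
    }

    std::uint64_t next() {
        const std::uint64_t s0 = s[0];
        std::uint64_t s1 = s[1];
        const std::uint64_t result = s0 + s1; // the "+" variant

        s1 ^= s0;
        s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14); // a, b
        s[1] = rotl(s1, 36);                   // c
        return result;
    }
};
```

Note how little state and work is involved compared with the 
2.5 KB of Mersenne Twister state.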
Nov 24 2016
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Thursday, 24 November 2016 at 08:36:41 UTC, Andrea Fontana 
wrote:
 On Wednesday, 23 November 2016 at 17:31:58 UTC, Kagamin wrote:
 On Wednesday, 23 November 2016 at 01:28:11 UTC, Andrei 
 Alexandrescu wrote:
 Interesting. Could you please add a couple of links about 
 that? -- Andrei
http://xoroshiro.di.unimi.it/
Very short public domain implementation: http://xoroshiro.di.unimi.it/xoroshiro128plus.c
Implementation in D, written during DConf 2016 ;-) https://github.com/WebDrake/dxorshift
Nov 24 2016
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 00:44:26 UTC, Joseph Rushton 
Wakeling wrote:
 On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko 
 wrote:
  - 64-bit Mt19937 is default for 64-bit targets
This means that seemingly identical code will produce different results depending on whether it's compiled for 64-bit or 32-bit. Is that really worth it, when anyone who cares about the difference between 32-bit vs. 64-bit random words is quite capable of specifying the RNG they want to use and not just relying on the default? Having a different default RNG would make sense for targets where there are serious performance issues at stake (e.g. minimal memory available for RNG state) but just for the sake of 32- vs. 64-bit Mersenne Twister seems an unnecessary complication. These days it's debatable whether Mersenne Twister of _any_ word size is the optimal choice for a default RNG, so if the default is going to be changed, it might as well be to something significantly better (rather than worrying about numbers of bits).
Mir Random is going to be a library with saturated uniform FP 
RNGs and almost-saturated exponential FP RNGs. Compared with all 
other libraries (in any language), the basic uniform FP numbers 
will be generated in the interval (-1, +1) and contain _all_ 
possible values, including all subnormal numbers. 64-bit 
generators are 2 times faster for this task if you need to 
generate a 64-bit floating-point number. An explanation of the 
technique will be in my post/article. --Ilya
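This is not the saturated (-1, +1) technique described above 
(which is left to the promised article), but the common, simpler 
mapping below shows why a 64-bit word is the natural input when 
producing a double:

```cpp
#include <cstdint>

// Map a 64-bit uniform word to a double in [0, 1).
// A double carries 53 significand bits, so we keep the top 53
// bits of the word; a 32-bit engine would need two draws to
// fill them, which is where the ~2x speed difference comes from.
double to_unit_double(std::uint64_t x) {
    return (x >> 11) / 9007199254740992.0;  // divide by 2^53
}
```

The saturated scheme goes further, reaching every representable 
double in the interval (including subnormals) rather than only 
multiples of 2^-53.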
Nov 22 2016
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 05:26:12 UTC, Ilya Yaroshenko 
wrote:
 Mir Random is going to be a library with saturated uniform FP 
 RNGs and almost saturated exponential FP RNGs. Comparing with 
 all other libraries (any language) the basic uniform FP numbers 
 will be generated in interval (-1, +1) and contains _all_ 
 possible values including all subnormal numbers. 64 bit 
 generators are 2 times faster for this task if you need to 
 generate a 64 bit floating point number. Explanation of 
 technique will be in my post/article. --Ilya
All of which is fine in its own terms, but why prioritize the 
speed of the default behaviour over its reliability and 
reproducibility? Anyone who cares about that combination of speed 
and statistical quality will have enough information in the 
documentation to know what to do. By contrast, producing 
different results for identical user code depending on whether 
you're making a 32- or 64-bit build is an unpleasant complication 
that would be better to avoid.

In any case, given what you say above, shouldn't the choice of 
32- vs. 64-bit RNG depend on whether one is using a distribution 
that generates 32- vs. 64-bit floating-point variates, rather 
than on whether one is building for a 32- or 64-bit target? In 
which case it needs to be a user choice, since only the user 
knows what distribution they're using.
Nov 23 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 10:33:21 UTC, Joseph Rushton 
Wakeling wrote:
 On Wednesday, 23 November 2016 at 05:26:12 UTC, Ilya Yaroshenko 
 wrote:
 [...]
All of which is fine in its own terms, but why prioritize the speed of the default behaviour over its reliability and reproducibility? Anyone who cares about that combination of speed and statistical quality will have enough information in the documentation to know what to do. By contrast producing different results for identical user code depending on whether you're making a 32- or 64-bit build is an unpleasant complication it could be better to avoid. In any case, given what you say above, shouldn't the choice of 32- vs. 64-bit RNG depend on whether one is using a distribution that generates 32- vs. 64-bit floating-point variates, rather than whether one is building for a 32- or 64-bit target? In which case it needs to be a user choice, since it's only the user who knows what distribution they're using.
We have a Random alias. I think it is OK if it generates 
different numbers for different architectures and library 
versions. If a user wants exactly the same behaviour, he can use 
explicit names like Mt19937_32 and Mt19937_64. Both are defined 
for all architectures.

A 64-bit generator has no significant speed degradation for 
32-bit random number generation on 64-bit machines -- maybe only 
a few %. So it is the better choice for 64-bit machines.

It is only a question of the `Random` alias, which can be changed 
in the future anyway. Both Mt19937_32 and Mt19937_64 are defined.
Nov 23 2016
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 11:03:33 UTC, Ilya Yaroshenko 
wrote:
 It is only question of `Random` alias, which can be changed in 
 the future anyway. Both Mt19937_32 and Mt19937_64 are defined.
I think we're at an impasse in terms of priorities, because that's exactly the reason that I think you should leave the Random alias pointing to the same generator, and let the people with speed/quality concerns select the alternative generator ;-)
Nov 23 2016
prev sibling parent reply Andrea Fontana <nospam example.com> writes:
On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko 
wrote:
  - 64-bit Mt19937 initialization is fixed
  - 64-bit Mt19937 is default for 64-bit targets
I wonder why Mersenne Twister is *always* chosen over other 
algorithms. My vote goes for CMWC, of course. Andrea
Nov 23 2016
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 23 November 2016 at 13:01:22 UTC, Andrea Fontana 
wrote:
 I wonder why Mersenne Twister is *always* choosen over other 
 algorithms.
The weight of history, I suspect. Mersenne Twister was the major new high-quality RNG back when people started getting really concerned about having good defaults, and when the Diehard Tests were the state of the art in tests of randomness. IIRC there's also a benefit in terms of dimensionality, which some more recent generators don't address, which can make it a safer "all-round default". Agree that there are probably better choices for today, though.
Nov 23 2016
prev sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 23 November 2016 at 13:01:22 UTC, Andrea Fontana 
wrote:
 On Tuesday, 22 November 2016 at 06:31:45 UTC, Ilya Yaroshenko 
 wrote:
  - 64-bit Mt19937 initialization is fixed
  - 64-bit Mt19937 is default for 64-bit targets
I wonder why Mersenne Twister is *always* choosen over other algorithms. My vote goes for CMWC, of course. Andrea
A PR for CMWC is highly welcome!
Nov 23 2016