www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Dconf 2015 talks...

reply Era Scarecrow <rtcvb32 yahoo.com> writes:
  Finally getting around to watching all the talks, all good stuff 
:) Figured I'd make a thread talking about things I came across 
and perhaps get into a discussion on them in detail.

  Anyways I'm watching the talk with Joseph Wakeling involving RNG 
and I wonder, is that still a problem or not?

  I ask since the simplest and most correct thing I can think of 
is to use the heap for a referenced state, and dup would then 
duplicate the state properly while otherwise the RNG problems go 
away that most of the talk went on about.

  It's possible I've been out of the loop and this has already 
been solved. Not sure yet, I haven't looked closely at any of the 
sources or the language updates.

  So... time for some rusty D prototyping.

[code]
struct RNG {
   int* state;

   this(int seed) {
     state = new int;
     *state = seed;
   }

   //if you provide a pointer, we don't rely on the heap/gc at 
all...
   //maybe a small stack based allocator could be useful for this?
   this(int* seed) { state = seed; }

   //dup instead of save, so it's not a forward range
   RNG dup() { return RNG(*state); }

   //popfront, front & empty as expected
}
[/code]
Jan 25
next sibling parent Dicebot <public dicebot.lv> writes:
Main problem is with both making it  safe and avoid unforced 
allocations (i.e. allowing shared state to be kept on stack of 
`main`).
Jan 25
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Monday, 25 January 2016 at 13:05:46 UTC, Era Scarecrow wrote:
  Finally getting around to watching all the talks, all good 
 stuff :) Figured I'd make a thread talking about things I came 
 across and perhaps get into a discussion on them in detail.

  Anyways I'm watching the talk with Joseph Wakeling involving 
 RNG and I wonder, is that still a problem or not?
It's still an issue, at least partly because amid all other things in life, I haven't had much opportunity to sit down and focus on it :-(
  I ask since the simplest and most correct thing I can think of 
 is to use the heap for a referenced state, and dup would then 
 duplicate the state properly while otherwise the RNG problems 
 go away that most of the talk went on about.

  It's possible I've been out of the loop and this has already 
 been solved. Not sure yet, I haven't looked closely at any of 
 the sources or the language updates.

  So... time for some rusty D prototyping.

 [code]
 struct RNG {
   int* state;

   this(int seed) {
     state = new int;
     *state = seed;
   }

   //if you provide a pointer, we don't rely on the heap/gc at 
 all...
   //maybe a small stack based allocator could be useful for 
 this?
   this(int* seed) { state = seed; }

   //dup instead of save, so it's not a forward range
   RNG dup() { return RNG(*state); }

   //popfront, front & empty as expected
 }
 [/code]
The major problems are not so much with RNGs per se (note that your approach can actually be much more nicely implemented as a final class). It's all the functionality that _wraps_ RNGs, such as random algorithms like randomCover and randomSample, or a D equivalent of C++'s random distributions. These need to also be reference types (for their internal state as well as the RNG they wrap), and here it gets very tricky to avoid nasty situations with lots of allocations and frees (when ideally, you'd like stuff like this to be stack only). Don't even ask about how complicated a `.dup` method gets when you start trying to extend to such entities. :-(
Jan 25
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Monday, 25 January 2016 at 14:31:12 UTC, Joseph Rushton 
Wakeling wrote:
 It's all the functionality that _wraps_ RNGs, such as random 
 algorithms like randomCover and randomSample, or a D equivalent 
 of C++'s random distributions.  These need to also be reference 
 types (for their internal state as well as the RNG they wrap), 
 and here it gets very tricky to avoid nasty situations with 
 lots of allocations and frees (when ideally, you'd like stuff 
 like this to be stack only).
Hmm i wonder... If recognizes it as infinite, could it avoid treating them as forward ranges? As a struct it still wouldn't work, but as a class/reference type it would work then...
Jan 25
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Monday, 25 January 2016 at 15:38:45 UTC, Era Scarecrow wrote:
  Hmm i wonder... If recognizes it as infinite, could it avoid 
 treating them as forward ranges? As a struct it still wouldn't 
 work, but as a class/reference type it would work then...
They shouldn't be forward ranges anyway, because if their content is randomly generated then it's not legit for them to implement the .save property. The whole implementation of stuff in std.random via forward ranges is a massive design mistake. Implementing the random algorithms/other wrappers as a class is problematic because then you get into the hassle of potentially having to new/free a lot of individual heap objects deep in the inner loop of your program. I already tried this in hap.random, and came to the conclusion that it wasn't a valid approach.
Jan 25
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
BTW, I apologize for the rather terse replies so far; busy day 
:-)  I'll try and write out a more complete summary of the 
problem at some point in the near future (though it might have to 
wait 'til the weekend).

Thanks for the interest in contributing to solving this problem 
:-)
Jan 25
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, 25 January 2016 at 17:19:05 UTC, Joseph Rushton 
Wakeling wrote:
 On Monday, 25 January 2016 at 15:38:45 UTC, Era Scarecrow wrote:
  Hmm i wonder... If recognizes it as infinite, could it avoid 
 treating them as forward ranges? As a struct it still wouldn't 
 work, but as a class/reference type it would work then...
They shouldn't be forward ranges anyway, because if their content is randomly generated then it's not legit for them to implement the .save property. The whole implementation of stuff in std.random via forward ranges is a massive design mistake.
As long as the numbers are pseudo-random, then in theory, there's no problem with a range of random numbers being a forward range. That could be useful upon occasion (similar to how it can be useful that the same seed results in the same sequence of numbers). The problem is that if they're a forward range, then you tend to get code that accidentally ends up reusing numbers in the sequence and that definitely is a problem. So ultimately, they probably should be input ranges rather than forward ranges, but I think that it's more of an issue with human error than with what makes sense from a technical perspective. Regardless, non-pseudo random number generators obviously can't be forward ranges though, since their state isn't savable or repeatable. - Jonathan M Davis
Jan 25
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Monday, 25 January 2016 at 18:40:59 UTC, Jonathan M Davis 
wrote:
 As long as the numbers are pseudo-random, then in theory, 
 there's no problem with a range of random numbers being a 
 forward range.
I thought that too, for a long time, but I came to the conclusion it's not the case. Here's the essence of the problem: a forward range is a promise to the code that uses it, "I am a deterministic sequence." But the whole point of pseudo-random sequences is that you're not supposed to be able to tell them from actual randomness. If you _tell_ downstream code "I am deterministic", that code may make assumptions about how it can use you, that are valid for "normal" deterministic sequences, but aren't valid for what are supposedly random sequences. Consider the typical handling of forward ranges: auto foo (R) (R range) if (isForwardRange!R) { auto rcopy = range.save; // do something with rcopy } That's fine if you're dealing with something whose behaviour is meant to be deterministic, but if you call this with a pseudo-random sequence, than every time you call it, you will get the same result. It won't matter if what you pass is a reference-type -- the .save (which is correct behaviour for handling a forward range) means you're just repeatedly using a copy of the random sequence.
 That could be useful upon occasion (similar to how it can be 
 useful that the same seed results in the same sequence of 
 numbers).
Right, but the point is that re-use of a pseudo-random sequence should happen at programmer request only, not under the hood in library code they call -- which is what unfortunately happens if you implement RNGs and other random sequences as forward ranges :-(
 The problem is that if they're a forward range, then you tend 
 to get code that accidentally ends up reusing numbers in the 
 sequence and that definitely is a problem. So ultimately, they 
 probably should be input ranges rather than forward ranges, but 
 I think that it's more of an issue with human error than with 
 what makes sense from a technical perspective.
Again, I thought that too, but I came to the conclusion that I'd been wrong. It's not about fallible humans, it's about the promise a forward range makes. Superficially, it looks like that promise is: "You can use my deterministic nature if you need to." But actually, I think it is really something stronger: "You can use my deterministic nature freely and nothing will go wrong in doing so." That's a much stronger promise than can be allowed for a pseudo-RNG, and I think it's borne out in the way in which e.g. phobos functionality makes use of forward ranges.
 Regardless, non-pseudo random number generators obviously can't 
 be forward ranges though, since their state isn't savable or 
 repeatable.
Right -- non-PRNGs must be input ranges by design. I came to the conclusion that pseudo-RNGs need to be input ranges, but that implement an alternative to .save: a .dup method whose promise is, "You can duplicate the state and hence behaviour of this range, but you can't make any assumptions that this is a safe or sane thing to do."
Jan 25
next sibling parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Monday, 25 January 2016 at 21:20:09 UTC, Joseph Rushton 
Wakeling wrote:
 I thought that too, for a long time, but I came to the 
 conclusion it's not the case.

 <snip>

 That's fine if you're dealing with something whose behaviour is 
 meant to be deterministic, but if you call this with a 
 pseudo-random sequence, than every time you call it, you will 
 get the same result.  It won't matter if what you pass is a  
 reference-type -- the .save (which is correct behaviour for 
 handling a forward range) means you're just repeatedly using a 
 copy of the random sequence.

 <snip>

 Right -- non-PRNGs must be input ranges by design.  I came to 
 the conclusion that pseudo-RNGs need to be input ranges, but 
 that implement an alternative to .save: a .dup method whose 
 promise is, "You can duplicate the state and hence behaviour of 
 this range, but you can't make any assumptions that this is a 
 safe or sane thing to do."
So in short, the RNG shouldn't be a range at all. Of course it could be a struct (for sanity and other reasons), but not a range. I wonder then, assuming we remove RNG from being a range, the a RNG could give out a delegate of it's current data, and that delegate get passed to a range-like wrapper? Or maybe the RNG returns a Voldemort range that referrs to the original RNG which isn't a range anymore... Actually that sounds promising. Aliasing could deal with it automatically converting/creating the range.
Jan 25
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, 25 January 2016 at 21:30:47 UTC, Era Scarecrow wrote:
 On Monday, 25 January 2016 at 21:20:09 UTC, Joseph Rushton 
 Wakeling wrote:
 Right -- non-PRNGs must be input ranges by design.  I came to 
 the conclusion that pseudo-RNGs need to be input ranges, but 
 that implement an alternative to .save: a .dup method whose 
 promise is, "You can duplicate the state and hence behaviour 
 of this range, but you can't make any assumptions that this is 
 a safe or sane thing to do."
So in short, the RNG shouldn't be a range at all. Of course it could be a struct (for sanity and other reasons), but not a range.
Why? They work fine as input ranges regardless of whether having them be forward ranges make sense. By its very nature, an input range cannot be saved, and it's not assumed that it's deterministic. The issue that we run into isn't that they're ranges but that they're not implemented as reference types, and copies of them accidentally get used, resulting in subtle bugs. But I'd strongly argue that we should at least consider requiring that all input ranges be reference types, since they don't make sense as value types, and pseudo-reference types are just plain buggy. That's a wider discussion than random number generator ranges though. - Jonathan M Davis
Jan 25
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Monday, 25 January 2016 at 21:30:47 UTC, Era Scarecrow wrote:
  So in short, the RNG shouldn't be a range at all. Of course it 
 could be a struct (for sanity and other reasons), but not a 
 range.

  I wonder then, assuming we remove RNG from being a range, the 
 a RNG could give out a delegate of it's current data, and that 
 delegate get passed to a range-like wrapper? Or maybe the RNG 
 returns a Voldemort range that referrs to the original RNG 
 which isn't a range anymore... Actually that sounds promising. 
 Aliasing could deal with it automatically converting/creating 
 the range.
Well, an idea that was suggested to me at DConf (by several people; thanks in particular to Steve S. and to DiceBot!) was indeed to separate RNG state from the range interface, and to disable copy-by-value for the state data structure. One option would be to implement the basic RNG data structor à la C++, as a functor: so it'd be something like: struct SomeRNG(UIntType, ...) { private: // the RNG state variables public: UIntType opCall() { // returns a random variate } } ... and again, you'd disable copy-by-value for this entity. I had some fun a while ago playing with this and the appropriate technique to wrap it into a range (it's not as trivial as you think, because you need to use some tricks to ensure truly lazy evaluation of the range, where D ranges typically evaluate the first element eagerly). Where I ran into trouble was considering how to extend this approach in a viable way to stuff like randomSample, i.e. the stuff which wraps randomness, which also needs to ensure its internal state is never copied by value, and yet which needs (ideally) to fit nicely into a UFCS chain of ranges: someInput.filter!(WHATEVER).randomSample(n, rng).map!(...).writeln; I suppose it might be possible to implement a functor-based approach for this, that could be wrapped in a range, but it feels nasty for something which really ought to fit more naturally into the range syntax: a random sample is just an algorithm (similar to those in std.algorithm), but one whose elements need to be truly lazily evaluated and whose values are not deterministic but depend on a source of randomness. The entire complication comes out of the fact that in order to play nice in a statistical sense, you need the RandomSample range to be a reference type, but in order to play nice with the kind of in-the-middle-of-the-inner-loop use above, you need it to be cheap and quick to instantiate and destroy (so classes and heap allocation are a problem). Basically, what you probably want is for RandomSample to be a struct, but with a reference-type internal state that is nevertheless allocated on the stack and that is cheap to create and let go of when you're done with it.
Jan 25
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 01/25/2016 05:05 PM, Joseph Rushton Wakeling wrote:
 One option would be to implement the basic RNG data structor à la C++,
 as a functor
That's semantically the same as an input range. -- Andrei
Jan 25
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Monday, 25 January 2016 at 23:37:25 UTC, Andrei Alexandrescu 
wrote:
 On 01/25/2016 05:05 PM, Joseph Rushton Wakeling wrote:
 One option would be to implement the basic RNG data structor à 
 la C++,
 as a functor
That's semantically the same as an input range. -- Andrei
“Yes, but...” :-P There are actually some interesting subtleties required for the input range design, not just for RNGs but for ANY range whose elements are generated randomly. I will try and write this up properly later, but the TL;DR is, it involves doing extra work to ensure every element is _truly_ lazily evaluated. To some extent, splitting into functor and range wrapper can help clarify the code design there (even if the functor stays hidden as an implementation detail).
Jan 25
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 01/25/2016 04:20 PM, Joseph Rushton Wakeling wrote:
 I thought that too, for a long time, but I came to the conclusion it's
 not the case.
I see no problem with adding a category of rngs that are not forward. Naming is of course the hardest problem in our community :o). A good stard would be a /dev/random wrapper. -- Andrei
Jan 25
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Monday, 25 January 2016 at 23:34:56 UTC, Andrei Alexandrescu 
wrote:
 I see no problem with adding a category of rngs that are not 
 forward. Naming is of course the hardest problem in our 
 community :o). A good stard would be a /dev/random wrapper. -- 
 Andrei
It's not about different categories of RNG in this case -- really, NO RNGs should be forward ranges. If ONLY it was just a naming problem... :-(
Jan 25
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 01/26/2016 02:07 AM, Joseph Rushton Wakeling wrote:
 On Monday, 25 January 2016 at 23:34:56 UTC, Andrei Alexandrescu wrote:
 I see no problem with adding a category of rngs that are not forward.
 Naming is of course the hardest problem in our community :o). A good
 stard would be a /dev/random wrapper. -- Andrei
It's not about different categories of RNG in this case -- really, NO RNGs should be forward ranges.
I think a pseudo-rng as a forward range is useful. It's good in testing and experimentation to fork a sequence of pseudo-random numbers, turn the clock back, etc. Essentially I see no harm in it; it's always easy to make a forward range into an input range, it's the opposite that's hard. -- Andrei
Jan 26
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Tuesday, 26 January 2016 at 12:07:59 UTC, Andrei Alexandrescu 
wrote:
 I think a pseudo-rng as a forward range is useful. It's good in 
 testing and experimentation to fork a sequence of pseudo-random 
 numbers, turn the clock back, etc. Essentially I see no harm in 
 it; it's always easy to make a forward range into an input 
 range, it's the opposite that's hard. -- Andrei
Yes, but there are other ways to fork that sequence that don't involve implementing the .save primitive. That's why in my talk (and here) I distinguish between the .save primitive versus some other possible primitive -- say, .dup -- whose promise is, "You can use me to duplicate the state of this range, but you must use me carefully and in an appropriate context." This is important, because if you just define pseudo-RNGs as forward ranges, most people won't read the small print about when you need to wrap it in an input range (which is pretty much any time you want to pass it into phobos code and have it behave like an actual source of randomness). As a result, they'll get unintended and possibly unnoticed statistical correlations -- annoying for something like a computer game, possibly a serious consequence if occurring in something like a research simulation.
Jan 26
prev sibling parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Monday, 25 January 2016 at 17:19:05 UTC, Joseph Rushton 
Wakeling wrote:
 Implementing the random algorithms/other wrappers as a class is 
 problematic because then you get into the hassle of potentially 
 having to new/free a lot of individual heap objects deep in the 
 inner loop of your program.  I already tried this in 
 hap.random, and came to the conclusion that it wasn't a valid 
 approach.
What about an alternative allocator? Specifically I'm thinking in C's equivalent which is alloca (allocates directly on the stack with the rest of the variables). If the constructor is a function then that won't work; but if it's inlined then it should work. I suppose the alternate is an option to skip/throw away some numbers that should've been consumed (assuming you want to keep using the same seed), or seeding each per use.
Jan 25
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Monday, 25 January 2016 at 20:26:12 UTC, Era Scarecrow wrote:
  What about an alternative allocator? Specifically I'm thinking 
 in C's equivalent which is alloca (allocates directly on the 
 stack with the rest of the variables). If the constructor is a 
 function then that won't work; but if it's inlined then it 
 should work.
I have been wondering about how allocators could help to deal with these problems. Could you put forward a minimal example of how you would see it working?
  I suppose the alternate is an option to skip/throw away some 
 numbers that should've been consumed (assuming you want to keep 
 using the same seed), or seeding each per use.
I'm not sure I follow what you mean here or why you think this would work? Could you give a concrete example?
Jan 25
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Monday, 25 January 2016 at 21:22:13 UTC, Joseph Rushton 
Wakeling wrote:
 On Monday, 25 January 2016 at 20:26:12 UTC, Era Scarecrow wrote:
  What about an alternative allocator? Specifically I'm 
 thinking in C's equivalent which is alloca (allocates directly 
 on the stack with the rest of the variables). If the 
 constructor is a function then that won't work; but if it's 
 inlined then it should work.
I have been wondering about how allocators could help to deal with these problems. Could you put forward a minimal example of how you would see it working?
Most likely alloca would have to be built into the compiler. Here's a crash course in how the stack memory management works. sp=stack pointer, bp=base pointer (more relevant pre 386). When you prepare to enter a function (or block) the bp register is backed up, the sp is saved in the bp register, then you add your arguments, then the function is entered. After the function is entered it adds to the sp register which give it a block of space for all known local variables at once; then does any assignments it needs to. When you leave the function/block you simply replace the sp register with your backup the bp register. Once you return you restore the old bp register. So something close to this: [code] ... push bp push calling_argument(s) call func sub sp, -N //size of total arguments pushed pop bp //completely done with function call. ... func: mov bp, sp add sp, +N //size of local variables // setup variables // code // cleanup/return mov sp, bp ret [/code] Technically alloca simply returns the current sp, then adds to it the number of bytes you requested. This means you have to run it at the function stack where you want to use it (and not in a called function, otherwise corruption). So inlined functions where alloca's data would remain would be a must. Then it comes down to a simple: [code] struct RNG { int *state; //add assert to functions that this isn't null this(int seed) { state = alloca(sizeof(int)); *state = seed; } } [/code]
  I suppose the alternate is an option to skip/throw-away some 
 numbers that should've been consumed (assuming you want to 
 keep using the same seed), or seeding each per use.
I'm not sure I follow what you mean here or why you think this would work? Could you give a concrete example?
certainly. [code] struct RNG { ... void skip(int x) {assert(x>0); while(x--) popfront();} } RNG rnd = RND(seed); rnd.take(10).writeln; //loosely based on talk examples rnd.skip(10); //skips the 'consumed' numbers. rnd.take(10).writeln; //won't have identical output [/code]
Jan 25
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Monday, 25 January 2016 at 22:06:31 UTC, Era Scarecrow wrote:
  Most likely alloca would have to be built into the compiler. 
 Here's a crash course in how the stack memory management works. 
 sp=stack pointer, bp=base pointer (more relevant pre 386).
An apology in advance: I have an early start tomorrow, and a busy few days coming up, so I probably won't be able to follow up on this discussion until the weekend. In the meantime, thanks for the food for thought. I'll try to be in touch about this topic soon :-)
Jan 25
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Monday, 25 January 2016 at 22:06:31 UTC, Era Scarecrow wrote:
 On Monday, 25 January 2016 at 21:22:13 UTC, Joseph Rushton 
 Wakeling wrote:
 I have been wondering about how allocators could help to deal 
 with these problems.  Could you put forward a minimal example 
 of how you would see it working?
Most likely alloca would have to be built into the compiler. Here's a crash course in how the stack memory management works. sp=stack pointer, bp=base pointer (more relevant pre 386).
Apologies for the delay in writing back about this. What you describe makes sense, but I don't quite follow what you mean in one particular case:
  Technically alloca simply returns the current sp, then adds to 
 it the number of bytes you requested. This means you have to 
 run it at the function stack where you want to use it (and not 
 in a called function, otherwise corruption). So inlined 
 functions where alloca's data would remain would be a must.
I don't quite follow your remark about inlined functions; do you mean that the function where the RNG instance is generated must be inlined? (That would make sense in order to avoid the internal state being deallocated immediately.) I think there might be more complications here than just allocating individual RNG instances, though (which can happen quite early on in the program); what about stuff like random algorithms (RandomCover, RandomSample) which might be generated deep in internal loops, passed to other functionality as rvalues, etc. etc.?
 Then it comes down to a simple:

 [code]
 struct RNG {
   int *state; //add assert to functions that this isn't null
   this(int seed) {
     state = alloca(sizeof(int));
     *state = seed;
   }
 }
 [/code]
Yes, missing your understanding of the details of how it would have to happen, this is pretty much what I had in mind for random ranges; a pointer to internal state nevertheless allocated on the stack. But the already-mentioned concerns about some of the ways that stack could be deallocated make for some concerns. It might be simpler, in practice, to just have the state refcounted.
  I suppose the alternate is an option to skip/throw-away some 
 numbers that should've been consumed (assuming you want to 
 keep using the same seed), or seeding each per use.
I'm not sure I follow what you mean here or why you think this would work? Could you give a concrete example?
certainly. [code] struct RNG { ... void skip(int x) {assert(x>0); while(x--) popfront();} } RNG rnd = RND(seed); rnd.take(10).writeln; //loosely based on talk examples rnd.skip(10); //skips the 'consumed' numbers. rnd.take(10).writeln; //won't have identical output [/code]
I'm afraid that's not really viable :-( In the best case, it's just working around the fundamental design problem via programmer virtue. But the problem is, in the general case, you can't anticipate how many random variates may be popped from your random number generator inside a function (it might depend on other input factors over which you as programmer have no control).
Feb 07
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Sunday, 7 February 2016 at 22:27:40 UTC, Joseph Rushton 
Wakeling wrote:
 On Monday, 25 January 2016 at 22:06:31 UTC, Era Scarecrow wrote:
 What you describe makes sense, but I don't quite follow what 
 you mean in one particular case:

  Technically alloca simply returns the current sp, then adds 
 to it the number of bytes you requested. This means you have 
 to run it at the function stack where you want to use it (and 
 not in a called function, otherwise corruption). So inlined 
 functions where alloca's data would remain would be a must.
That's the low level assembly language that's generated i'm referring to; At least based on what i've read and seen for code output from a compiler to a .s file or similar.
 I don't quite follow your remark about inlined functions; do 
 you mean that the function where the RNG instance is generated 
 must be inlined?  (That would make sense in order to avoid the 
 internal state being deallocated immediately.)
Assuming alloca moves to the inlined function. Although i had another idea thrown in my head where the memory would be pre-allocated and you could just point to it when requested via an allocator. So assume alloca(sizeof(int)) struct RNG { During instantiation it would know the size ahead of time and just append that to the end of the structure. That extra padding space could be handled manually instead. this(int seed) { state = cast(void*)(this+1); } But this forced type breaking is quite clunky (and obviously the wrong way to write it). I recall in C there was suppose to be a way to attach an array (of unknown size) immediately following a struct by making the length of the array 0, then accessing it directly. But you'd still need to guarantee somehow that the access rights are in place and not referencing other data which you could screw up (via optimizations or something).
 I think there might be more complications here than just 
 allocating individual RNG instances, though (which can happen 
 quite early on in the program); what about stuff like random 
 algorithms (RandomCover, RandomSample) which might be generated 
 deep in internal loops, passed to other functionality as 
 rvalues, etc. etc.?
Either they use more stack space, or they act normally after their call is done and are deallocated normally (automatically, unless they are passed outside of the scope where they were generated).
 It might be simpler, in practice, to just have the state 
 refcounted.

  I suppose the alternate is an option to skip/throw-away 
 some numbers that should've been consumed
I'm not sure I follow what you mean here or why you think this would work? Could you give a concrete example?
   void skip(int x) {assert(x>0); while(x--) popfront();}
 rnd.take(10).writeln;  //loosely based on talk examples
 rnd.skip(10);          //skips the 'consumed' numbers.
 rnd.take(10).writeln;  //won't have identical output
 I'm afraid that's not really viable :-(
 But the problem is, in the general case, you can't anticipate 
 how many random variates may be popped from your random number 
 generator inside a function.
True; Perhaps have one RNG for seeding and one RNG for passing, then reseed after passing the function off, although how far deep some of this could go with it's deeper copying; I don't know. Perhaps RNG should be a class outright, which probably removes a lot of these problems.
Feb 07
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Monday, 8 February 2016 at 00:54:24 UTC, Era Scarecrow wrote:
  Either they use more stack space, or they act normally after 
 their call is done and are deallocated normally (automatically, 
 unless they are passed outside of the scope where they were 
 generated).
It's that "passed outside of the scope where they were generated" that is bothering me. Consider: auto foo (Range) (Range r) if (isInputRange!Range) { return r.randomSample(100).take(10); } void main () { iota(1000).foo.writeln; } If the RandomSample internals are allocated on the stack using the technique you propose, what's going to happen here?
  True; Perhaps have one RNG for seeding and one RNG for 
 passing, then reseed after passing the function off, although 
 how far deep some of this could go with it's deeper copying; I 
 don't know.
No, I really don't think that's an approach that's worth pursuing, except as a desperate workaround for the faults of the existing design. RNGs should as much as possible "just work" and the user shouldn't have to concern themselves like this with
  Perhaps RNG should be a class outright, which probably removes 
 a lot of these problems.
It does indeed give you the desired reference semantics very cheaply. But now suppose you have lots of RandomSamples being instantiated in the inner loop of your program. If _they_ are also a class, how do you handle their instantiation and deallocation?
Feb 08
parent John Colvin <john.loughran.colvin gmail.com> writes:
On Monday, 8 February 2016 at 19:46:19 UTC, Joseph Rushton 
Wakeling wrote:
 [snip]
This might be a stupid idea, but perhaps there's something useful in it: Determinism isn't the same thing as "one long chain of numbers that everybody reads from". It can be acceptable to seed a set of reasonable pseudo-random number generators with consecutive integers (indeed seeding randomly can be dangerous because of the birthday problem). More generally, any change of the state of the rng in "seed-space" should produce an output equivalent to taking a sample from the output distribution. Can you not have a random number generator make a copy of itself like this: struct RNG { State state; static State.ModifierT modifier; this(this) { this.state.alterBy(modifier++); //recalculate output sample etc... } } Then any time you copy a RNG, the copy is kicked on to a new path in state-space. Basically we're deterministically re-seeding on copy.
Feb 08